In this section, we will see how to use the methods of the documents
client.
Create Messages
A DocumentMessage object with the messages created from the document.
A unique identifier for the document loading.
The type of object being loaded. Always “document.message”.
messages
array[ChatCompletionUiformMessage]
A list of messages containing the document content and metadata.
The Unix timestamp (in seconds) of when the document was loaded.
The modality of the document to load.
The document being loaded.
Returns the items in the document as a list of strings or PIL Images.
openai_messages
array[ChatCompletionMessageParam]
Returns the messages in OpenAI’s format.
Returns the system message in Anthropic’s Claude format.
Returns the messages in Anthropic’s Claude format.
Returns the messages in Google’s Gemini format.
from uiform import UiForm
from openai import OpenAI
uiclient = UiForm( )
doc_msg = uiclient. documents. create_messages(
document = "freight/booking_confirmation.jpg" ,
modality = "text" ,
image_settings = {
"correct_image_orientation" : True ,
"dpi" : 72 ,
"image_to_text" : "ocr" ,
"browser_canvas" : "A4"
}
)
Use doc_msg.items
to have a list of [PIL.Image.Image | str]
objects
Correct image orientation
The orientation-corrected image as a PIL Image object
from uiform import UiForm
from openai import OpenAI
uiclient = UiForm( )
image = uiclient. documents. correct_image_orientation(
document = "freight/booking_confirmation.jpg" ,
)
An OpenAI ParsedChatCompletion object with the extracted data.
Request (parse)
Request (stream)
Response
from uiform import UiForm
uiclient = UiForm( )
doc_msg = uiclient. documents. extractions. parse(
document = "freight/booking_confirmation.jpg" ,
model= "gpt-4o-mini" ,
json_schema = {
'X-SystemPrompt' : 'You are a useful assistant.' ,
'properties' : {
'name' : {
'X-FieldPrompt' : 'Provide a descriptive and concise name for the event.' ,
'description' : 'The name of the calendar event.' ,
'title' : 'Name' ,
'type' : 'string'
} ,
'date' : {
'X-FieldPrompt' : 'Specify the event date in YYYY-MM-DD format.' ,
'description' : 'The date of the calendar event in ISO 8601 format.' ,
'title' : 'Date' ,
'type' : 'string'
}
} ,
'required' : [ 'name' , 'date' ] ,
'title' : 'CalendarEvent' ,
'type' : 'object'
} ,
modality= "text"
)
Templates
templates.documentai
The Document AI templates allow you to extract structured data from documents using predefined templates inspired by Google Document AI. These templates are designed to handle common document types like invoices, receipts, IDs and more.
templates.documentai.parse
templates.documentai.parse
Extract structured data from a document using a predefined template inspired by Google Document AI
The template to use for extraction. Must be one of:
“bank_statement” - Extract data from bank statements
“contract” - Extract data from contracts
“driver_license” - Extract data from driver licenses
“expense” - Extract data from expense reports
“identity_proofing” - Extract data for identity verification
“invoice” - Extract data from invoices
“passport” - Extract data from passports
“pay_slip” - Extract data from pay slips
“w2” - Extract data from W-2 forms
document
Path | str | IO[bytes] | MIMEData
required The document to extract from. Can be a file path, string, bytes IO object, or MIMEData.
Optional image preprocessing operations:
correct_image_orientation
Whether to correct the image orientation
image_to_text
'ocr' | 'llm_description'
Whether to convert the image to text
The canvas of the browser (default = A4)
model
default: "gpt-4o-2024-08-06"
The model to use for extraction.
The sampling temperature to use.
messages
array[ChatCompletionUiformMessage]
Optional list of previous messages to include. Each message must have:
The role of the message sender. Must be one of:
“user”
“system”
“assistant”
content
string | array[ChatCompletionContentPartParam]
The content of the message. Can be a string or array of content parts:
Show ChatCompletionContentPartParam
An image content part
URL or base64 encoded image data
Detail level: “auto”, “low”, or “high”
An audio content part
Base64 encoded audio data
The modality to use for processing the document. Can be:
“native” (default) - Uses the document’s native modality based on file type
“image+text” - Uses both the document’s native modality and processes as text
“text” - Process as text
“image” - Process as image
“audio” - Process as audio
“video” - Process as video
Whether to store the document and extraction results.
The extraction response containing the structured data.
Request (parse)
Request (stream)
Response
from uiform import UiForm
uiclient = UiForm( )
doc_msg = uiclient. documents. templates. documentai. parse(
document = "freight/booking_confirmation.jpg" ,
model= "gpt-4o-mini" ,
template= "invoice" ,
modality= "text"
)