Documents Methods
In this section, we will see how to use the methods of the documents
client.
Create Messages
Creates messages from a document for use with LLM models
The document to process. Can be a file path, string content, bytes IO object, or PIL Image object.
The modality to use for processing the document. Can be:
- “native” (default) - Uses the document’s native modality based on file type
- “text” - Process as text
- “image” - Process as image
- “audio” - Process as audio
- “video” - Process as video
Optional image preprocessing operations:
A DocumentMessage object with the messages created from the document.
A unique identifier for the document loading.
The type of object being loaded. Always “document.message”.
A list of messages containing the document content and metadata.
The Unix timestamp (in seconds) of when the document was loaded.
The modality of the document to load.
The document being loaded.
Returns the items in the document as a list of strings or PIL Images.
Returns the messages in OpenAI’s format.
Returns the system message in Anthropic’s Claude format.
Returns the messages in Anthropic’s Claude format.
Returns the messages in Google’s Gemini format.
doc_msg.items
to have a list of [PIL.Image.Image | str]
objectsCorrect image orientation
Corrects the orientation of an image using the UiForm API.
The input image to correct. Can be:
- A file path (Path or str)
- A file-like object (IOBase)
- A MIMEData object
- A PIL Image object
The orientation-corrected image as a PIL Image object
Extractions
extractions.parse
Extract structured data from a document using a JSON schema
The JSON schema defining the structure to extract. Can be a dict, file path, or string.
The document to extract from. Can be a file path, string, or bytes IO object.
Optional image preprocessing operations:
The model to use for extraction.
The sampling temperature to use.
Optional list of previous messages to include. Each message must have:
The role of the message sender. Must be one of:
- “user”
- “system”
- “assistant”
The content of the message. Can be a string or array of content parts:
The modality to use for processing the document. Can be:
- “native” (default) - Uses the document’s native modality based on file type
- “text” - Process as text
- “image” - Process as image
- “audio” - Process as audio
- “video” - Process as video
An OpenAI ParsedChatCompletion object with the extracted data.
extractions.stream
Extract structured data from a document using a JSON schema
The JSON schema defining the structure to extract. Can be a dict, file path, or string.
The document to extract from. Can be a file path, string, or bytes IO object.
Optional image preprocessing operations:
The model to use for extraction.
The sampling temperature to use.
Optional list of previous messages to include. Each message must have:
The role of the message sender. Must be one of:
- “user”
- “system”
- “assistant”
The content of the message. Can be a string or array of content parts:
The modality to use for processing the document. Can be:
- “native” (default) - Uses the document’s native modality based on file type
- “text” - Process as text
- “image” - Process as image
- “audio” - Process as audio
- “video” - Process as video
An OpenAI AsyncChatCompletionStreamManager[ResponseFormatT] object with the extracted data.