Introduction

The client.documents module offers a consolidated, production-grade pipeline for converting heterogeneous documents into chat-ready payloads for OpenAI, Anthropic, and Gemini endpoints. Our model read documents the way humans do. It accepts native digital files (Images, PDFs, DOCX, XLSX, E-mail) and parses text, detects visual structure across pages, tables, forms, and figures, and re-assembles the content in logical reading order before delivering it in the exact message schema required by each provider. Please check the API Reference for more details.

The module exposes three high-level methods:

MethodPurposeTypical Scenario
create_messagesGenerates a verbatim, chat-formatted rendition of the document.Retrieval-augmented generation or “chat with your PDF”.
create_inputsWraps the document in a developer prompt targeting a supplied JSON schema.Function-calling or structured extraction with JSON mode.
extractions.parse / extractions.streamExecutes the extraction and returns the parsed object (optionally with consensus voting).One-step OCR + LLM parsing when only the structured output is required.

In practice, the workflow is straightforward: load a file, invoke the method that matches the desired outcome, and pass the returned openai_messages array directly to client.chat.completions.create(). The complexities of OCR, layout reconstruction, and prompt scaffolding are handled internally, allowing to focus solely on downstream model logic.

Create Messages

Converts any document into OpenAI-compatible chat messages. You can choose between different preprocessing parameters according to your needs: modalities (text, image, native) and image settings (dpi, browser_canvas, etc..).

Returns
DocumentMessage Object

A DocumentMessage object with the messages created from the document.

from uiform import UiForm
from openai import OpenAI

uiclient = UiForm()
doc_msg = uiclient.documents.create_messages(
    document = "freight/booking_confirmation.jpg",
    modality = "text",
    image_resolution_dpi = 72,
    browser_canvas = "A4"
)
Use doc_msg.items to have a list of [PIL.Image.Image | str] objects

Create Inputs

Converts any document and a json schema into OpenAI-compatible responses input. You can choose between different preprocessing parameters according to your needs: modalities (text, image, native) and image settings (dpi, browser_canvas, etc..).

Returns
DocumentMessage Object

A DocumentMessage object with the document content structured according to the provided JSON schema.

from uiform import UiForm

uiclient = UiForm()
doc_input = uiclient.documents.create_inputs(
    document = "freight/invoice.pdf",
    json_schema = {
        "properties": {
            "invoice_number": {
                "type": "string",
                "description": "The invoice number"
            },
            "total_amount": {
                "type": "number",
                "description": "The total invoice amount"
            },
            "issue_date": {
                "type": "string",
                "description": "The date the invoice was issued"
            }
        },
        "required": ["invoice_number", "total_amount", "issue_date"]
    },
    modality = "text",
    image_resolution_dpi = 72,
    browser_canvas = "A4"
)

Extractions

Returns
ParsedChatCompletion

An OpenAI ParsedChatCompletion object with the extracted data.


from uiform import UiForm

uiclient = UiForm()

doc_msg = uiclient.documents.extractions.parse(
    document = "freight/booking_confirmation.jpg", 
    model="gpt-4.1-nano",
    json_schema = {
      'X-SystemPrompt': 'You are a useful assistant.',
      'properties': {
          'name': {
              'description': 'The name of the calendar event.',
              'title': 'Name',
              'type': 'string'
          },
          'date': {
              'description': 'The date of the calendar event in ISO 8601 format.',
              'title': 'Date',
              'type': 'string'
          }
      },
      'required': ['name', 'date'],
      'title': 'CalendarEvent',
      'type': 'object'
    },
    modality="text",
    n_consensus=1 # 1 means disabled (default), if greater than 1 it will run the extraction with n-consensus mode
)