Introduction

UiForm’s document processing pipeline automatically converts various file types into LLM-ready formats, eliminating the need for custom parsers. This guide explains how to process different document types and understand the resulting output format.

Supported File Types

UiForm supports a wide range of document formats:

  • Text Documents: PDF, DOC, DOCX, TXT
  • Spreadsheets: XLS, XLSX, CSV
  • Emails: EML, MSG
  • Images: JPG, PNG, TIFF
  • Presentations: PPT, PPTX
  • And more: HTML, XML, JSON

Basic Document Processing

Here’s how to convert a document into an LLM-ready format:

from uiform import UiForm

# Create a client and convert a local file into LLM-ready messages
uiclient = UiForm()
doc_msg = uiclient.documents.create_messages(
    document="path/to/your/document.jpg"
)

The create_messages method returns a standardized message format:

{
    "id": "doc_dd003f95-81ce-4a55-9180-00c5a58d82ec",
    "object": "document.message",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Document content here..."
                }
            ]
        }
    ],
    "created": 1736524416,
    "modality": "text",
    "document": {
        "id": "cf908729402d0796537bb91e63df5e339ce93b4cabdcac2f9a4f90592647e130",
        "name": "document.jpg",
        "mime_type": "image/jpeg"
    }
}
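
To pull the plain text back out of a response shaped like the one above, a small helper can walk the messages list. This is a minimal sketch: it assumes the response has been converted to a plain Python dict (the SDK may return a typed object instead), and extract_text is a hypothetical helper name, not part of the UiForm client.

```python
from typing import Any


def extract_text(doc_msg: dict[str, Any]) -> str:
    """Join every text part found in the messages of the response."""
    parts: list[str] = []
    for message in doc_msg.get("messages", []):
        for item in message.get("content", []):
            # Skip non-text parts (for example image parts)
            if item.get("type") == "text":
                parts.append(item.get("text", ""))
    return "\n".join(parts)


# Sample data copied from the response format shown above
sample = {
    "object": "document.message",
    "messages": [
        {
            "role": "user",
            "content": [{"type": "text", "text": "Document content here..."}],
        }
    ],
    "modality": "text",
}

print(extract_text(sample))
```

The helper ignores any content part whose type is not "text", so it should stay usable if a response mixes text and image parts.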

Document Processing Options

Text Operations

You can provide regex instructions to help identify specific patterns in the text:

doc_msg = uiclient.documents.create_messages(
    document="document.pdf",
    prompting_context={
        "regex_instructions": [
            {
                "name": "vat_number",
                # Two uppercase letters followed by nine digits
                "pattern": r"\b[A-Z]{2}\d{9}\b",
                "description": "VAT number in the format XX999999999"
            }
        ]
    }
)

The regex results are returned in the response under regex_instruction_results. Each entry contains:

  • instruction: The original regex instruction (name, pattern, description)
  • hits: Array of strings matching the pattern
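
Before sending a pattern to the API, it can help to try it locally. The sketch below runs the VAT pattern from the example through Python's re module and builds a dict that imitates the instruction/hits shape described above. It assumes the service uses a regex flavor compatible with Python's, which the source does not state, and the sample text is made up for illustration.

```python
import re

instruction = {
    "name": "vat_number",
    "pattern": r"\b[A-Z]{2}\d{9}\b",
    "description": "VAT number in the format XX999999999",
}

# Invented sample text: two well-formed VAT numbers and one short reference
sample_text = "Supplier VAT: FR123456789. Customer VAT: DE987654321. Ref: AB12345."

hits = re.findall(instruction["pattern"], sample_text)

# Local imitation of one entry of regex_instruction_results (shape assumed)
local_result = {"instruction": instruction, "hits": hits}
print(local_result["hits"])
```

The short reference AB12345 is not returned because the pattern requires exactly nine digits between word boundaries.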

Image Operations

You can configure various image processing options:

doc_msg = uiclient.documents.create_messages(
    document="document.jpg",
    prompting_context={
        "image_operations": {
            "correct_image_orientation": True,
            "dpi": "auto",  # or specific integer value
            "image_to_text": "ocr",  # or "llm_description"
            "browser_canvas": "A4"  # "A3", "A4", or "A5"
        }
    }
)

The image operations support these configurations:

  • correct_image_orientation: Automatically rotates images to the correct orientation if needed
  • dpi: Set image DPI resolution (“auto” or specific integer)
  • image_to_text: Choose text extraction method:
    • ocr: Traditional OCR processing
    • llm_description: AI-generated image description
  • browser_canvas: Set document canvas size:
    • A3: 11.7in x 16.54in
    • A4: 8.27in x 11.7in (default)
    • A5: 5.83in x 8.27in
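
As a rough guide to what these canvas sizes mean in pixels, the arithmetic below multiplies each dimension by a DPI value. It is only an illustration: the source does not say how the service combines dpi and browser_canvas, and CANVAS_INCHES and canvas_pixels are hypothetical names.

```python
# Canvas sizes in inches (width, height), as listed above
CANVAS_INCHES = {
    "A3": (11.7, 16.54),
    "A4": (8.27, 11.7),
    "A5": (5.83, 8.27),
}


def canvas_pixels(canvas: str, dpi: int) -> tuple[int, int]:
    """Return the (width, height) in pixels of a canvas at a given DPI."""
    width_in, height_in = CANVAS_INCHES[canvas]
    return round(width_in * dpi), round(height_in * dpi)


print(canvas_pixels("A4", 96))   # (794, 1123)
print(canvas_pixels("A4", 300))  # (2481, 3510)
```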

Modality Control

You can specify the document processing modality using the modality parameter:

# `schema` is the JSON schema for your extraction, defined elsewhere
response = uiclient.documents.extract(
    document="document.pdf",
    json_schema=schema,
    modality="native"  # or "text", "image", "native+text"
)

The endpoints accept the following modalities:

  • native: Default processing mode
  • text: Text-only processing mode
  • image: Image-only processing mode
  • native+text: Native processing (text or image, depending on the document type) plus the extracted text content

The chosen modality will be reflected in the response under the modality field.

Supported Models

You can use any of these supported models:

  • Claude-3 series (claude-3-5-sonnet-latest, claude-3-opus-20240229, etc.)
  • GPT-4o series (gpt-4o, gpt-4o-mini, etc.)
  • Gemini series (gemini-1.5-pro, gemini-1.5-flash, etc.)
  • Grok-2 series (grok-2-vision-1212, grok-2-1212)

List available models using:

uiclient = UiForm()
models = uiclient.models.list()

Rate Limits

The API has the following rate limits:

  • 300 requests per 60-second window
  • Applies to document processing endpoints (create_messages and extractions)
  • Returns 429 status code when exceeded
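
One way to stay under this limit is to throttle on the client side. The sketch below is a simple sliding-window limiter using only the standard library. It is not part of the UiForm SDK: SlidingWindowLimiter is a hypothetical class, and the source does not say whether the server counts requests in a fixed or a sliding window, so treat it as a conservative approximation.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most max_requests calls in any window_seconds span."""

    def __init__(
        self,
        max_requests: int = 300,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._sent = deque()  # timestamps of recent requests

    def acquire(self) -> None:
        """Block until one more request fits in the current window."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the window
            while self._sent and now - self._sent[0] >= self.window_seconds:
                self._sent.popleft()
            if len(self._sent) < self.max_requests:
                self._sent.append(now)
                return
            # Wait until the oldest request leaves the window
            self._sleep(self.window_seconds - (now - self._sent[0]))


limiter = SlidingWindowLimiter()
limiter.acquire()  # call this before each create_messages or extract request
```

The clock and sleep parameters exist so the limiter can be exercised without real waiting. A 429 response can still occur, for example when several processes share one API key, so callers may also want to retry after a pause.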

Best Practices

  1. Rate Limits: Stay within the 300 requests per minute limit
  2. Text Operations: Use regex instructions when specific pattern matching is needed
  3. Modality: Specify the desired modality when default processing isn’t suitable
