Making Files LLM-Ready
Convert any document type into a format suitable for LLMs
Introduction
UiForm’s document processing pipeline automatically converts various file types into LLM-ready formats, eliminating the need for custom parsers. This guide explains how to process different document types and understand the resulting output format.
Supported File Types
UiForm supports a wide range of document formats:
- Text Documents: PDF, DOC, DOCX, TXT
- Spreadsheets: XLS, XLSX, CSV
- Emails: EML, MSG
- Images: JPG, PNG, TIFF
- Presentations: PPT, PPTX
- And more: HTML, XML, JSON
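As a rough client-side pre-check, the sketch below tests a file's extension against the formats listed above before upload. SUPPORTED_EXTENSIONS and is_supported are hypothetical helpers, not part of UiForm. Because the list ends with "And more", a False result only means the extension is not among those named here. It does not mean the service will reject the file.

```python
from pathlib import Path

# Hypothetical helper: extensions copied from the supported-formats list above.
# The list is not exhaustive, so treat this as a hint, not an authoritative check.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".doc", ".docx", ".txt",   # text documents
    ".xls", ".xlsx", ".csv",           # spreadsheets
    ".eml", ".msg",                    # emails
    ".jpg", ".png", ".tiff",           # images
    ".ppt", ".pptx",                   # presentations
    ".html", ".xml", ".json",          # other listed formats
}


def is_supported(path: str) -> bool:
    """Return True if the file's extension (case-insensitive) is in the list above."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["report.PDF", "inbox/message.eml", "archive.zip"]:
        print(name, is_supported(name))
```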
Basic Document Processing
Here’s how to convert a document into an LLM-ready format:
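The following is a minimal sketch of the local preparation step. It assumes the document is sent as a base64-encoded data URL alongside its filename. encode_document and its "filename"/"url" keys are hypothetical names chosen for illustration, not confirmed UiForm SDK types. The sketch covers only encoding, not the create_messages call itself.

```python
import base64
import mimetypes
from pathlib import Path


def encode_document(path: str) -> dict:
    """Read a local file and package it as a base64 data URL.

    Hypothetical helper: the returned keys are illustrative assumptions,
    not taken from the UiForm API reference.
    """
    file_path = Path(path)
    # Guess the MIME type from the extension; fall back to a generic binary type.
    mime_type, _ = mimetypes.guess_type(file_path.name)
    mime_type = mime_type or "application/octet-stream"
    # Base64-encode the raw bytes so the document can travel inside a JSON body.
    encoded = base64.b64encode(file_path.read_bytes()).decode("ascii")
    return {
        "filename": file_path.name,
        "url": f"data:{mime_type};base64,{encoded}",
    }
```

A data URL keeps the MIME type and the content together in a single string, so the receiving side can choose a parser without a separate content-type field.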
The create_messages method returns a standardized message format:
Document Processing Options
Text Operations
You can provide regex instructions to help identify specific patterns in the text:
The regex results are available in the response under regex_instruction_results, with each entry containing:
- instruction: The original regex instruction (name, pattern, description)
- hits: Array of strings matching the pattern
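The service computes these results itself. To illustrate the result shape described above, the sketch below runs the same kind of instructions locally with Python's re module. RegexInstruction and run_regex_instructions are hypothetical helpers, not part of the UiForm SDK. Each output entry pairs the instruction with its list of hits.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class RegexInstruction:
    """Hypothetical local stand-in for a (name, pattern, description) instruction."""

    name: str
    pattern: str
    description: str = ""


def run_regex_instructions(text: str, instructions: List[RegexInstruction]) -> List[dict]:
    """Apply each instruction to text, returning {"instruction", "hits"} entries."""
    results = []
    for instruction in instructions:
        # finditer + group(0) keeps the full match even if the pattern has groups
        # (re.findall would return the group contents instead).
        hits = [m.group(0) for m in re.finditer(instruction.pattern, text)]
        results.append({"instruction": instruction, "hits": hits})
    return results


if __name__ == "__main__":
    text = "Invoice INV-2024-001 is due 2024-03-15. Contact billing@example.com."
    instructions = [
        RegexInstruction("email", r"[\w.+-]+@[\w-]+\.\w+", "Email addresses"),
        RegexInstruction("date", r"\d{4}-\d{2}-\d{2}", "ISO dates"),
    ]
    for result in run_regex_instructions(text, instructions):
        print(result["instruction"].name, result["hits"])
```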
Image Operations
You can configure various image processing options:
The image operations support these configurations:
- correct_image_orientation: Automatically rotates images to the correct orientation if needed
- dpi: Sets the image DPI resolution (“auto” or a specific integer)
- image_to_text: Chooses the text extraction method:
  - ocr: Traditional OCR processing
  - llm_description: AI-generated image description
- browser_canvas: Sets the document canvas size:
  - A3: 11.7in x 16.54in
  - A4: 8.27in x 11.7in (default)
  - A5: 5.83in x 8.27in
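The sketch below models these options on the client side as a dataclass. It rejects values outside the documented choices and converts the chosen canvas to pixels at an explicit DPI. ImageOperations, its defaults other than A4 (the only one the list marks as default), and the pixel conversion are illustrative assumptions, not the SDK's own types.

```python
from dataclasses import dataclass
from typing import Literal, Tuple, Union

# Canvas sizes in inches, copied from the list above.
CANVAS_SIZES_IN = {
    "A3": (11.7, 16.54),
    "A4": (8.27, 11.7),
    "A5": (5.83, 8.27),
}
IMAGE_TO_TEXT_METHODS = ("ocr", "llm_description")


@dataclass
class ImageOperations:
    """Hypothetical client-side model of the image options described above."""

    correct_image_orientation: bool = True                    # assumed default
    dpi: Union[int, Literal["auto"]] = "auto"                 # assumed default
    image_to_text: Literal["ocr", "llm_description"] = "ocr"  # assumed default
    browser_canvas: Literal["A3", "A4", "A5"] = "A4"          # documented default

    def __post_init__(self) -> None:
        # Type hints are not enforced at runtime, so check the documented choices here.
        if self.dpi != "auto":
            if isinstance(self.dpi, bool) or not isinstance(self.dpi, int) or self.dpi <= 0:
                raise ValueError("dpi must be 'auto' or a positive integer")
        if self.image_to_text not in IMAGE_TO_TEXT_METHODS:
            raise ValueError(f"unknown image_to_text: {self.image_to_text!r}")
        if self.browser_canvas not in CANVAS_SIZES_IN:
            raise ValueError(f"unknown browser_canvas: {self.browser_canvas!r}")

    def canvas_pixels(self) -> Tuple[int, int]:
        """Return the canvas size in pixels; requires an explicit integer dpi."""
        if self.dpi == "auto":
            raise ValueError("pixel size is not defined when dpi is 'auto'")
        width_in, height_in = CANVAS_SIZES_IN[self.browser_canvas]
        return round(width_in * self.dpi), round(height_in * self.dpi)
```

For example, ImageOperations(dpi=100).canvas_pixels() gives (827, 1170) for the default A4 canvas.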
Modality Control
You can specify the document processing modality using the modality parameter:
Available modalities:
- native: Default processing mode
- text: Text-only processing mode
- image: Image-only processing mode
- native+text: Native processing mode (text or image, depending on the document type) plus the text content
The chosen modality is reflected in the modality field of the response.
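The sketch below validates a modality value before it is sent and checks that the response echoes it back. Only the modality name and its four allowed values come from the list above. The request-options dict layout, the dict-shaped response, and the helper names build_request_options and modality_matches are hypothetical.

```python
from typing import Literal, get_args

# The four modalities listed above.
Modality = Literal["native", "text", "image", "native+text"]
VALID_MODALITIES = get_args(Modality)


def build_request_options(modality: str = "native") -> dict:
    """Validate a modality and wrap it in a request-options dict (hypothetical layout)."""
    if modality not in VALID_MODALITIES:
        allowed = ", ".join(VALID_MODALITIES)
        raise ValueError(f"unsupported modality {modality!r}; expected one of: {allowed}")
    return {"modality": modality}


def modality_matches(response: dict, requested: str) -> bool:
    """Check that the response's 'modality' field echoes the requested modality."""
    return response.get("modality") == requested
```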
Supported Models
You can use any of these supported models:
- Claude-3 series (claude-3-5-sonnet-latest, claude-3-opus-20240229, etc.)
- GPT-4o series (gpt-4o, gpt-4o-mini, etc.)
- Gemini series (gemini-1.5-pro, gemini-1.5-flash, etc.)
- Grok-2 series (grok-2-vision-1212, grok-2-1212)
List available models using:
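The listing call remains the authoritative source, but a client can still run a quick sanity check on a model name before sending it. The sketch below maps a name to one of the families listed above by prefix. MODEL_FAMILIES and model_family are hypothetical helpers, and a recognized prefix does not guarantee that a specific version is available.

```python
from typing import Optional

# Family prefixes taken from the supported-model list above.
MODEL_FAMILIES = {
    "claude-3": "Claude-3",
    "gpt-4o": "GPT-4o",
    "gemini-1.5": "Gemini",
    "grok-2": "Grok-2",
}


def model_family(model: str) -> Optional[str]:
    """Return the listed family a model name belongs to, or None if no prefix matches."""
    for prefix, family in MODEL_FAMILIES.items():
        if model.startswith(prefix):
            return family
    return None
```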
Rate Limits
The API has the following rate limits:
- 300 requests per 60-second window
- Applies to document processing endpoints (create_messages and extractions)
- Returns a 429 status code when exceeded
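Exceeding the limit returns 429, so a client benefits from both pacing itself and backing off. The sketch below shows a client-side sliding-window limiter sized to the documented 300 requests per 60 seconds, plus a retry helper with exponential backoff on 429. SlidingWindowLimiter and call_with_retry are hypothetical helpers, and send stands in for whatever function performs the request and returns its HTTP status. The limiter keeps every 60-second span under the limit, which should also satisfy a fixed-window count, although the server's exact accounting is not specified here.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side limiter: at most max_requests within any window_seconds span."""

    def __init__(self, max_requests: int = 300, window_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable for testing
        self._timestamps: Deque[float] = deque()

    def _evict(self, now: float) -> None:
        # Drop timestamps that have left the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits in the window, else False."""
        now = self.clock()
        self._evict(now)
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return True
        return False

    def seconds_until_available(self) -> float:
        """How long to wait before the next request would be allowed."""
        now = self.clock()
        self._evict(now)
        if len(self._timestamps) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._timestamps[0])


def call_with_retry(send: Callable[[], int], max_attempts: int = 5,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() and retry with exponential backoff while it returns 429."""
    status = 429
    for attempt in range(max_attempts):
        status = send()
        if status != 429:
            return status
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ... with the defaults
    return status
```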
Best Practices
- Rate Limits: Stay within the 300 requests per minute limit
- Text Operations: Use regex instructions when specific pattern matching is needed
- Modality: Specify the desired modality when default processing isn’t suitable