TLDR

Quick tips on prompt-engineering for structured generation:

  • Images is the best modality for a lot of tasks
  • try modality="image+text" for challenging documents
  • Be conservative with reasoning fields
  • Check if the chain of thought inside the reasoning fields makes sense
  • The system prompt matters a lot
  • Ensure images are upright and not rotated
  • Focus on prompt engineering optimization before considering fine-tuning

UiForm’s approach

We use the standard JSON Schema format to organize all the directives needed to prompt an LLM for data extraction. The schema includes three custom annotations that provide additional context and guidance for large language models (LLMs):

  • X-SystemPrompt
    A top-level directive that provides general instructions or context for the LLM, ensuring consistent behavior and improving the relevance of its responses during the data extraction process.

  • X-ReasoningPrompt
    This annotation creates an auxiliary field for generating reasoning or explanatory context about a property. It allows the LLM to provide additional insights or justifications for extracted values, which can be helpful in complex or ambiguous scenarios.

These annotations help ensure structured and precise interactions with LLMs while remaining compatible with standard JSON Schema conventions.

Our promptify method can analyze and refine the directives provided in the schema to improve overall performance and response quality. Importantly, this optimization process leaves the extraction parameters, such as the schema’s structure and field definitions, unchanged to ensure consistency in data processing.

X-Directives

Here is a full example of a schema with all the X-Directives:

{
  "X-SystemPrompt": "You are a useful assistant extracting information from documents.",
  "properties": {
    "name": {
      "description": "The name of the calendar event.",
      "title": "Name",
      "type": "string"
    },
    "date": {
      "X-ReasoningPrompt": "The user can mention it in any format, like **next week** or **tomorrow**. Infer the right date format from the user input.",
      "description": "The date of the calendar event in ISO 8601 format.",
      "title": "Date",
      "type": "string"
    }
  },
  "required": [
    "name",
    "date"
  ],
  "title": "CalendarEvent",
  "type": "object"
}

X-SystemPrompt

A top-level directive that provides general instructions or context for the LLM, ensuring consistent behavior and improving the relevance of its responses during the data extraction process.

{
  "X-SystemPrompt": "You are a useful assistant extracting information from documents.",
  ...
}

X-ReasoningPrompt

Generates a reasoning field alongside the data field.

{
  "X-ReasoningPrompt": "The user can mention it in any format, like **next week** or **tomorrow**. Infer the right date format from the user input.",
  ...
}

This schema should validate objects like this:

{
  "name": "Example string value.",
  "date": "Example string in object."
}

However, the LLM will internally produce additional reasoning fields for better extraction, such as:

{
  "name": "Example string value.",
  "reasoning___date": "Reasoning for date.",
  "date": "Example string in object."
}

As you can see, apart from the “reasoning___” fields, the LLM output follows the same structure as your supplied schema.

Python’s Pydantic BaseModel Support

You can define the custom annotations in the pydantic.Field class using the json_schema_extra field.

Here is a minimalistic example with everything you should need:

from pydantic import BaseModel, Field, ConfigDict

class CalendarEvent(BaseModel):
    model_config = ConfigDict(json_schema_extra = {"X-SystemPrompt": "You are a useful assistant."})

    name: str = Field(...,
        description="The name of the calendar event.",
    )
    date: str = Field(...,
        description="The date of the calendar event in ISO 8601 format.",
        json_schema_extra={
            "X-ReasoningPrompt": "The user can mention it in any format, like **next week** or **tomorrow**. Infer the right date format from the user input.",
        }
    )

If you need a json_schema, you can convert the BaseModel to model_json_schema:

CalendarEvent.model_json_schema()

Complete Example

You will need to use uiform’s Schema object to leverage the directives in the schema.

from uiform import UiForm, Schema
from openai import OpenAI
from pydantic import BaseModel, Field, ConfigDict

uiclient = UiForm()
doc_msg = uiclient.documents.create_messages(
    document = "document_1.xlsx"
)

class CalendarEvent(BaseModel):
    model_config = ConfigDict(json_schema_extra = {"X-SystemPrompt": "You are a useful assistant."})

    name: str = Field(...,
        description="The name of the calendar event.",
    )
    date: str = Field(...,
        description="The date of the calendar event in ISO 8601 format.",
        json_schema_extra={
            'X-ReasoningPrompt': 'The user can mention it in any format, like **next week** or **tomorrow**. Infer the right date format from the user input.',
        }
    )

schema_obj =Schema(
    pydantic_model = CalendarEvent
)

# Now you can use your favorite model to analyze your document
client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=schema_obj.openai_messages + doc_msg.openai_messages,
    response_format=schema_obj.response_format_pydantic
)

# Validate the response against the original schema if you want to remove the reasoning fields
from uiform._utils.json_schema import filter_auxiliary_fields_json
assert completion.choices[0].message.content is not None
extraction = schema_obj.pydantic_model.model_validate(
    filter_auxiliary_fields_json(completion.choices[0].message.content, schema_obj.pydantic_model)
)

print("Extraction:",extraction)

Go further