Introduction

We use the standard JSON Schema format to organize all the directives needed to prompt a large language model (LLM) for data extraction. The schema includes three custom annotations that provide additional context and guidance for the model:

  • X-SystemPrompt
    A top-level directive that provides general instructions or context for the LLM, ensuring consistent behavior and improving the relevance of its responses during the data extraction process.

  • X-FieldPrompt
    An optional annotation that replaces or complements the description field for a property. It offers the LLM a more detailed understanding of the field, improving its ability to parse or interpret the data accurately.

  • X-ReasoningPrompt
    This annotation creates an auxiliary field for generating reasoning or explanatory context about a property. It allows the LLM to provide additional insights or justifications for extracted values, which can be helpful in complex or ambiguous scenarios.

These annotations help ensure structured and precise interactions with LLMs while remaining compatible with standard JSON Schema conventions.

Our prompt optimization tool (soon to be released) can analyze and refine the directives provided in the schema, such as the X-SystemPrompt and X-FieldPrompt, to improve overall performance and response quality. Importantly, this optimization process leaves the extraction parameters, such as the schema’s structure and field definitions, unchanged to ensure consistency in data processing.

TLDR

Quick tips on prompt-engineering for structured generation:

  • Images are the best modality for many tasks
  • Try modality="native+text" for challenging documents
  • Be conservative with reasoning fields
  • Check if the chain of thought inside the reasoning fields makes sense
  • The system prompt matters a lot
  • GPT-4o outperforms Claude-Sonnet for vision tasks (as of January 2025)
  • Ensure images are upright and not rotated
  • Focus on prompt engineering optimization before considering fine-tuning

X-Directives

Here is a full example of a schema with all the X-Directives:

{
  "X-SystemPrompt": "You are a useful assistant extracting information from documents.",
  "properties": {
    "name": {
      "X-FieldPrompt": "Provide a descriptive and concise name for the event.",
      "description": "The name of the calendar event.",
      "title": "Name",
      "type": "string"
    },
    "date": {
      "X-ReasoningPrompt": "The user can mention it in any format, like **next week** or **tomorrow**. Infer the right date format from the user input.",
      "description": "The date of the calendar event in ISO 8601 format.",
      "title": "Date",
      "type": "string"
    }
  },
  "required": [
    "name",
    "date"
  ],
  "title": "CalendarEvent",
  "type": "object"
}

X-SystemPrompt

A top-level directive that provides general instructions or context for the LLM, ensuring consistent behavior and improving the relevance of its responses during the data extraction process.

{
  "X-SystemPrompt": "You are a useful assistant extracting information from documents.",
  ...
}

X-FieldPrompt

An optional annotation that replaces or complements the description field for a property. It offers the LLM a more detailed understanding of the field, improving its ability to parse or interpret the data accurately.

{
  "X-FieldPrompt": "Provide a descriptive and concise name for the event.",
  ...
}

X-ReasoningPrompt

Generates a reasoning field alongside the data field.

{
  "X-ReasoningPrompt": "The user can mention it in any format, like **next week** or **tomorrow**. Infer the right date format from the user input.",
  ...
}

This schema should validate objects like this:

{
  "name": "Example string value.",
  "date": "Example string in object."
}

However, the LLM will internally produce additional reasoning fields for better extraction, such as:

{
  "name": "Example string value.",
  "reasoning___date": "Reasoning for date.",
  "date": "Example string in object."
}

As you can see, apart from the “reasoning___” fields, the LLM output follows the same structure as your supplied schema.
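If you consume the raw LLM output yourself, the auxiliary fields are easy to drop before validation. Here is a minimal sketch (the reasoning___ prefix follows the convention shown above; the helper name and example values are illustrative):

def strip_reasoning_fields(llm_output: dict) -> dict:
    # Remove the auxiliary reasoning___* keys so the object matches the supplied schema.
    return {k: v for k, v in llm_output.items() if not k.startswith("reasoning___")}

raw = {
    "name": "Example string value.",
    "reasoning___date": "Reasoning for date.",
    "date": "Example string in object.",
}
print(strip_reasoning_fields(raw))  # {'name': 'Example string value.', 'date': 'Example string in object.'}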

Python’s Pydantic BaseModel Support

You can define the custom annotations at the model level with pydantic’s ConfigDict and on individual fields with the json_schema_extra argument of Field.

Here is a minimalistic example with everything you should need:

from pydantic import BaseModel, Field, ConfigDict

class CalendarEvent(BaseModel):
    # Model-level directive, equivalent to the top-level X-SystemPrompt in the JSON Schema.
    model_config = ConfigDict(json_schema_extra={"X-SystemPrompt": "You are a useful assistant."})

    # Field-level directive, equivalent to X-FieldPrompt on the "name" property.
    name: str = Field(...,
        description="The name of the calendar event.",
        json_schema_extra={"X-FieldPrompt": "Provide a descriptive and concise name for the event."}
    )
    # Field-level directive, equivalent to X-ReasoningPrompt on the "date" property.
    date: str = Field(...,
        description="The date of the calendar event in ISO 8601 format.",
        json_schema_extra={
            "X-ReasoningPrompt": "The user can mention it in any format, like **next week** or **tomorrow**. Infer the right date format from the user input.",
        }
    )

If you need the raw JSON schema, call model_json_schema() on the BaseModel:
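schema = CalendarEvent.model_json_schema()
# The json_schema_extra values are merged into the generated schema, so the
# X-Directives (X-SystemPrompt, X-FieldPrompt, X-ReasoningPrompt) appear in the output.
print(schema)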


Extract information from a document

You will need to use uiform’s Schema object to leverage the directives in the schema.
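Below is a minimal sketch of that flow. The import path, the Schema constructor arguments, and the extraction call shown here are assumptions for illustration and may differ from the actual uiform API; refer to the uiform documentation for the exact signatures.

from uiform import UiForm, Schema  # assumed imports; the SDK may expose these differently

uiclient = UiForm()  # hypothetical client; typically reads the API key from the environment
schema_obj = Schema(json_schema=CalendarEvent.model_json_schema())  # assumed constructor argument

# Hypothetical extraction call: the Schema object expands the X-Directives into the
# system prompt and per-field instructions before the request is sent.
completion = uiclient.documents.extractions.parse(
    json_schema=schema_obj.json_schema,
    document="path/to/your/document.pdf",
    model="gpt-4o",
    modality="native",
)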


Testing and Evaluation

Here are some best practices to test and evaluate your schema:

  1. Simulate Real-world Inputs: Test with documents containing varying levels of complexity and ambiguity.
  2. Evaluate Outputs Systematically: Use criteria such as field accuracy, reasoning clarity, and JSON validity (a minimal evaluation sketch follows this list).
  3. Iterate and Optimize: Continuously refine prompts and schema annotations based on evaluation results.
  4. Optimize prompt-engineering before fine-tuning your model.
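To make point 2 concrete, here is a minimal evaluation sketch. It assumes you have collected LLM outputs alongside hand-labeled gold values; the helper name and the use of the third-party jsonschema package are illustrative choices, not part of uiform.

from jsonschema import validate, ValidationError  # third-party: pip install jsonschema

def evaluate(outputs: list[dict], gold: list[dict], schema: dict) -> dict:
    # Measure JSON validity against the schema and exact-match accuracy per field.
    valid = 0
    field_hits = 0
    field_total = 0
    for out, ref in zip(outputs, gold):
        try:
            validate(instance=out, schema=schema)
            valid += 1
        except ValidationError:
            pass
        for key, expected in ref.items():
            field_total += 1
            if out.get(key) == expected:
                field_hits += 1
    return {
        "json_validity": valid / len(outputs),
        "field_accuracy": field_hits / field_total,
    }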

Conclusion

By leveraging X-SystemPrompt, X-FieldPrompt, and X-ReasoningPrompt, you can significantly enhance structured data extraction tasks. This guide provides a foundational framework to integrate these strategies into your workflows effectively.


Go further