Prompting with the JSON schema
Introduction
We use the standard JSON Schema format to organize all the directives needed to prompt a large language model (LLM) for data extraction. The schema includes three custom annotations that give the LLM additional context and guidance:
- X-SystemPrompt: A top-level directive that provides general instructions or context for the LLM, ensuring consistent behavior and improving the relevance of its responses during the data extraction process.
- X-FieldPrompt: An optional annotation that replaces or complements the description field for a property. It offers the LLM a more detailed understanding of the field, improving its ability to parse or interpret the data accurately.
- X-ReasoningPrompt: This annotation creates an auxiliary field for generating reasoning or explanatory context about a property. It allows the LLM to provide additional insights or justifications for extracted values, which can be helpful in complex or ambiguous scenarios.
These annotations help ensure structured and precise interactions with LLMs while remaining compatible with standard JSON Schema conventions.
Our prompt optimization tool (soon to be released) can analyze and refine the directives provided in the schema, such as the X-SystemPrompt and X-FieldPrompt, to improve overall performance and response quality. Importantly, this optimization process leaves the extraction parameters, such as the schema’s structure and field definitions, unchanged to ensure consistency in data processing.
TLDR
Quick tips on prompt-engineering for structured generation:
- Images are the best modality for many tasks
- Try modality="native+text" for challenging documents
- Be conservative with reasoning fields
- Check that the chain of thought inside the reasoning fields makes sense
- The system prompt matters a lot
- GPT-4o outperforms Claude-Sonnet for vision tasks (as of January 2025)
- Ensure images are upright and not rotated
- Focus on prompt engineering optimization before considering fine-tuning
X-Directives
Here is a full example of a schema with all the X-Directives:
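The original example is not reproduced on this page, so what follows is only a minimal sketch of what such a schema could look like, written as a Python dictionary. The invoice fields and prompt texts are illustrative assumptions; only the three annotation names come from this guide.

```python
import json

# Hypothetical invoice schema; field names and prompt texts are illustrative.
invoice_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "title": "Invoice",
    "type": "object",
    # Top-level instructions that apply to the whole extraction.
    "X-SystemPrompt": "You extract invoice data from documents. Do not guess missing values.",
    "properties": {
        "invoice_number": {
            "type": "string",
            "description": "Identifier printed on the invoice.",
            # Extra guidance for this field only.
            "X-FieldPrompt": "Usually labelled 'Invoice No.' or 'Ref.' near the top of the page.",
        },
        "total_amount": {
            "type": "number",
            "description": "Total amount due, taxes included.",
            # Requests an auxiliary reasoning field for this property.
            "X-ReasoningPrompt": "List the candidate amounts on the page and explain which one is the total.",
        },
    },
    "required": ["invoice_number", "total_amount"],
}

print(json.dumps(invoice_schema, indent=2))
```

Because the annotations are ordinary extra keywords, the dictionary stays a regular JSON Schema document that standard tooling can still read.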
X-SystemPrompt
A top-level directive that provides general instructions or context for the LLM, ensuring consistent behavior and improving the relevance of its responses during the data extraction process.
X-FieldPrompt
An optional annotation that replaces or complements the description field for a property. It offers the LLM a more detailed understanding of the field, improving its ability to parse or interpret the data accurately.
X-ReasoningPrompt
Generates a reasoning field alongside the data field.
This schema should validate objects like this:
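As a sketch, assume a hypothetical invoice schema with an invoice_number string and a total_amount number (these names are not from the original). A matching object, together with a hand-rolled structural check that stands in for a real JSON Schema validator, might look like this:

```python
# Hypothetical object; the field names and values are illustrative assumptions.
invoice = {"invoice_number": "INV-2025-001", "total_amount": 1250.5}

# Expected Python type for each required field of the assumed schema.
expected_types = {"invoice_number": str, "total_amount": (int, float)}

def matches(obj: dict, types: dict) -> bool:
    """Return True if obj has exactly the expected keys with the expected types."""
    return obj.keys() == types.keys() and all(
        isinstance(obj[key], expected) for key, expected in types.items()
    )

print(matches(invoice, expected_types))
```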
However, the LLM will internally produce additional reasoning fields for better extraction, such as:
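A sketch of such an output, using hypothetical invoice fields. The exact key naming is an assumption: this guide only states that the extra fields start with reasoning___, so the example appends the field name to that prefix and places the reasoning key before the field it explains. The helper shows how dropping those keys recovers an object with the schema's structure.

```python
REASONING_PREFIX = "reasoning___"

# Hypothetical raw LLM output; values and key suffixes are illustrative.
llm_output = {
    "invoice_number": "INV-2025-001",
    "reasoning___total_amount": "The page shows 1000.00 (subtotal) and 1250.50 (total due).",
    "total_amount": 1250.5,
}

def strip_reasoning(value):
    """Recursively drop dictionary keys that start with the reasoning prefix."""
    if isinstance(value, dict):
        return {
            key: strip_reasoning(item)
            for key, item in value.items()
            if not key.startswith(REASONING_PREFIX)
        }
    if isinstance(value, list):
        return [strip_reasoning(item) for item in value]
    return value

print(strip_reasoning(llm_output))
```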
As you can see, apart from the “reasoning___” fields, the LLM output follows the same structure as your supplied schema.
Python’s Pydantic BaseModel Support
You can define the custom annotations by passing them to pydantic.Field through its json_schema_extra argument.
Here is a minimal example with everything you need:
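The original example is not reproduced on this page; the following is a minimal sketch assuming Pydantic v2. The Invoice model, its fields, and the prompt texts are illustrative. Placing X-SystemPrompt in the model-level json_schema_extra is an assumption about where a top-level annotation belongs.

```python
from pydantic import BaseModel, ConfigDict, Field

class Invoice(BaseModel):
    # Model-level extras end up at the top level of the generated schema.
    model_config = ConfigDict(
        json_schema_extra={"X-SystemPrompt": "You extract invoice data from documents."}
    )

    invoice_number: str = Field(
        ...,
        description="Identifier printed on the invoice.",
        # Field-level extras end up inside this property's schema.
        json_schema_extra={"X-FieldPrompt": "Usually labelled 'Invoice No.' near the top."},
    )
    total_amount: float = Field(
        ...,
        description="Total amount due, taxes included.",
        json_schema_extra={"X-ReasoningPrompt": "Explain which amount on the page is the total."},
    )

# The annotations do not change how the model validates data.
invoice = Invoice(invoice_number="INV-2025-001", total_amount=1250.5)
print(invoice)
```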
If you need a JSON schema, you can generate one from the BaseModel with model_json_schema():
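A short sketch of that conversion, assuming Pydantic v2. The Event model and its prompt texts are hypothetical and exist only to show where the annotations appear in the generated dictionary.

```python
from pydantic import BaseModel, ConfigDict, Field

class Event(BaseModel):
    # Hypothetical model used only to illustrate the conversion.
    model_config = ConfigDict(
        json_schema_extra={"X-SystemPrompt": "You extract calendar events."}
    )
    name: str = Field(
        ...,
        description="Name of the event.",
        json_schema_extra={"X-FieldPrompt": "Give a short, descriptive name."},
    )

# model_json_schema() returns a plain dict that follows JSON Schema conventions.
schema = Event.model_json_schema()

print(schema["X-SystemPrompt"])
print(schema["properties"]["name"]["X-FieldPrompt"])
```

The resulting dictionary can be serialized with json.dumps if a JSON string is required.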
Extract information from a document
You will need to use uiform's Schema object to leverage the directives in the schema.
Testing and Evaluation
Here are some best practices to test and evaluate your schema:
- Simulate Real-world Inputs: Test with documents containing varying levels of complexity and ambiguity.
- Evaluate Outputs Systematically: Use criteria such as field accuracy, reasoning clarity, and JSON validity.
- Iterate and Optimize: Continuously refine prompts and schema annotations based on evaluation results.
- Optimize prompt-engineering before fine-tuning your model.
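The systematic evaluation step can be sketched with two small helpers: one that checks JSON validity and one that computes exact-match accuracy per field. The function names and the sample data are hypothetical, and exact match is only one possible criterion; numeric or fuzzy comparisons may suit some fields better.

```python
import json

def is_valid_json(raw: str) -> bool:
    """Return True if the raw model output parses as JSON."""
    try:
        json.loads(raw)
    except json.JSONDecodeError:
        return False
    return True

def field_accuracy(predictions: list[dict], references: list[dict]) -> dict[str, float]:
    """Share of documents in which each reference field was extracted exactly."""
    hits: dict[str, int] = {}
    totals: dict[str, int] = {}
    for predicted, expected in zip(predictions, references):
        for field, value in expected.items():
            totals[field] = totals.get(field, 0) + 1
            hits[field] = hits.get(field, 0) + (predicted.get(field) == value)
    return {field: hits[field] / totals[field] for field in totals}

# Illustrative labelled data: two documents, one wrong total in the predictions.
references = [
    {"invoice_number": "A1", "total_amount": 10.0},
    {"invoice_number": "B2", "total_amount": 20.0},
]
predictions = [
    {"invoice_number": "A1", "total_amount": 10.0},
    {"invoice_number": "B2", "total_amount": 25.0},
]

print(field_accuracy(predictions, references))
```

Tracking these numbers per field across schema revisions makes it easier to see which prompt change helped which field.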
Conclusion
By leveraging X-SystemPrompt, X-FieldPrompt, and X-ReasoningPrompt, you can significantly enhance structured data extraction tasks. This guide provides a foundational framework to integrate these strategies into your workflows effectively.