Sync & Async Client

UiForm offers both synchronous and asynchronous client interfaces, making it versatile for different application needs. The asynchronous client (AsyncUiForm) is ideal for high-performance, non-blocking applications where multiple tasks run concurrently. For simpler or blocking operations, the synchronous client (UiForm) provides a straightforward approach.

Here’s how you can use both:

# Async client
import asyncio

from uiform import AsyncUiForm

async def fetch_models():
    uiclient = AsyncUiForm()
    models = await uiclient.models.list()
    print(models)

asyncio.run(fetch_models())

# Sync client
from uiform import UiForm

client = UiForm()
models = client.models.list()
print(models)

Both clients provide the same core functionality, enabling you to list models, create messages, extract data from documents, and more, with the flexibility to match your application’s concurrency model.
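To illustrate why the asynchronous client suits concurrent workloads, here is a minimal sketch built on asyncio.gather. The fake_request coroutine is a hypothetical stand-in for an awaited client call such as uiclient.models.list(); it is not part of the UiForm SDK and only sleeps to simulate network latency.

```python
import asyncio
from typing import List


async def fake_request(name: str, delay: float = 0.05) -> str:
    # Hypothetical stand-in for an awaited API call: sleeps instead of doing I/O.
    await asyncio.sleep(delay)
    return f"{name}: done"


async def run_concurrently() -> List[str]:
    # gather() starts all three coroutines together and returns their
    # results in the order the coroutines were passed in.
    return await asyncio.gather(
        fake_request("a"),
        fake_request("b"),
        fake_request("c"),
    )


results = asyncio.run(run_concurrently())
print(results)
```

With real awaited calls in place of fake_request, the waits overlap instead of running back to back, which is where the asynchronous client pays off.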

Pagination

Many top-level resources support bulk fetches through list API methods. For instance, you can list extraction links, email addresses, and logs. These list methods share a common structure and accept at least these four parameters: limit, order, after, and before.

UiForm paginates with the after and before parameters. Both take an existing object ID as a cursor and return objects in either descending or ascending order by creation time.
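As a rough sketch of cursor-based pagination with the after parameter, the loop below keeps requesting pages until a short page comes back. Everything here is an assumption made for illustration: fetch_page is a hypothetical callable, the objects are assumed to be dictionaries with an "id" key, and the in-memory fake_fetch_page only simulates a list endpoint. The real client's method names and response shape may differ.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Page = List[Dict[str, Any]]


def iterate_all(fetch_page: Callable[..., Page], limit: int = 2) -> Iterator[Dict[str, Any]]:
    """Yield every object by following the `after` cursor page by page."""
    after: Optional[str] = None
    while True:
        page = fetch_page(limit=limit, after=after)
        yield from page
        if len(page) < limit:
            break  # a short (or empty) page means nothing is left
        after = page[-1]["id"]  # the last ID seen becomes the next cursor


# Hypothetical in-memory stand-in for a list endpoint, for demonstration only.
_ITEMS = [{"id": f"obj_{i}"} for i in range(5)]


def fake_fetch_page(limit: int, after: Optional[str] = None) -> Page:
    start = 0
    if after is not None:
        # Resume just past the object whose ID was given as the cursor.
        start = next(i for i, obj in enumerate(_ITEMS) if obj["id"] == after) + 1
    return _ITEMS[start:start + limit]


all_ids = [obj["id"] for obj in iterate_all(fake_fetch_page)]
print(all_ids)
```

Passing the fetch function in as an argument keeps the cursor logic independent of any particular resource.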

Idempotency

The UiForm API supports idempotency, which guarantees that performing the same operation multiple times has the same result as performing it once. This is useful when you need to retry a request after a failure, or when you want to prevent accidental duplicate requests from creating more than one resource.

To make a request idempotent, add an Idempotency-Key request header to any UiForm API request, with a unique string as the value. Each subsequent request carrying the same key returns the same response. We suggest using v4 UUIDs for idempotency keys to avoid collisions.

Idempotency key example
curl --request POST \
  --url https://api.uiform.com/v1/emails/tests/webhook \
  -H "Authorization: Bearer sk_test_a2V5XzAxSkgwVjhSN1ZaRTlYUzJYQzhOOTVRVDMzLEJSa3BzTEFuUTRVUWF5dEV5ZHpnRVZpVkI" \
  -H "Idempotency-Key: cd320c5c-e928-4212-a5bd-986c29362867"

Idempotency keys expire after 24 hours. The UiForm API will generate a new response if you submit a request with an expired key.
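For Python callers, a minimal sketch of the same idea: generate one v4 UUID per logical operation and reuse it on every retry of that operation. The idempotent_headers helper and the "YOUR_API_KEY" placeholder are illustrative names, not part of the UiForm SDK; the resulting dictionary can be passed to whichever HTTP client you use.

```python
import uuid
from typing import Dict


def idempotent_headers(api_key: str, idempotency_key: str) -> Dict[str, str]:
    """Build request headers that carry an Idempotency-Key (hypothetical helper)."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Idempotency-Key": idempotency_key,
    }


# Generate the key once per operation, not once per HTTP attempt.
key = str(uuid.uuid4())

first_attempt = idempotent_headers("YOUR_API_KEY", key)
retry_attempt = idempotent_headers("YOUR_API_KEY", key)  # same key on retry

print(first_attempt["Idempotency-Key"] == retry_attempt["Idempotency-Key"])
```

Generating a fresh key inside the retry loop would defeat the purpose, since each attempt would then look like a new operation.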

Rate Limits

UiForm implements rate limiting to ensure stable service for all users. The API uses a rolling window rate limit with the following configuration:

  • 300 requests per 60-second window
  • Applies across the following API endpoints:
    • POST /v1/documents/extractions
    • POST /v1/documents/create_messages

When you exceed the rate limit, the API returns a 429 Too Many Requests response such as:

Status 429 - {'detail': 'Rate limit exceeded. Please try again later.'}

For high-volume applications, we can provide a dedicated plan. Contact us for more information.
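One common way to handle a 429 on the client side is to retry with exponential backoff. The sketch below is an assumption about how a caller might do this, not documented UiForm behavior: send is a hypothetical callable returning a (status, body) pair, and the canned responses only simulate the API.

```python
import random
import time
from typing import Callable, Tuple

Response = Tuple[int, dict]


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
) -> Response:
    """Call `send` until it stops returning 429 or the attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff with jitter: the wait doubles on each attempt.
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))
    return status, body  # still rate limited after the last attempt


# Hypothetical canned responses: two rate-limit errors, then a success.
_responses = iter([
    (429, {"detail": "Rate limit exceeded. Please try again later."}),
    (429, {"detail": "Rate limit exceeded. Please try again later."}),
    (200, {"ok": True}),
])

# A tiny base_delay keeps the demonstration fast.
status, body = call_with_backoff(lambda: next(_responses), base_delay=0.01)
print(status, body)
```

The jitter spreads out retries from many clients so they do not all hit the API again at the same instant.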

Modality

LLMs work with text and image data. UiForm converts documents into different modalities, based on the document type.

Native modalities

Here is the list of native modalities supported by UiForm:

TEXT_TYPES = Literal[".txt", ".csv", ".tsv", ".md", ".log", ".html", ".htm", ".xml", ".json", ".yaml", ".yml", ".rtf", ".ini", ".conf", ".cfg", ".nfo", ".srt", ".sql", ".sh", ".bat", ".ps1", ".js", ".jsx", ".ts", ".tsx", ".py", ".java", ".c", ".cpp", ".cs", ".rb", ".php", ".swift", ".kt", ".go", ".rs", ".pl", ".r", ".m", ".scala"]

You can also use the modality parameter to specify the modality of the document and override the default modality.

import json
from uiform.client import UiForm

with open("booking_confirmation_json_schema.json", "r") as f:
    json_schema = json.load(f)

client = UiForm()

response = client.documents.extract(
    json_schema=json_schema,
    document="booking_confirmation.jpg",
    model="gpt-4o-mini-2024-07-18",
    temperature=0,
    modality="text",  # The image will be converted to text (with an OCR model) before being sent to the LLM
)

Image Settings

When processing images, several factors can affect the LLM’s ability to accurately interpret and extract information. The image_settings parameter lets you tune image settings to improve extraction quality.

API Reference

image_settings (ImageSettings object): Image preprocessing operations to optimize document analysis.

Matching JSON Objects with an internal database

Structured generation outputs JSON objects. To match these JSON objects with objects from an internal database, we recommend using the Levenshtein distance. The steps are as follows:

1. Normalize the values

Normalize the values of the JSON object by:

  • Flattening nested values
  • Removing all spacing
  • Removing accents

This makes them match the format in the internal database.

2. Compare using Levenshtein distance

Compare the normalized values using the Levenshtein distance algorithm to find matches.

Here is a Python example:

from typing import Any, List, Dict, Hashable
import re
import unicodedata
from Levenshtein import distance as levenshtein_distance
import pandas as pd
from rich.table import Table
from rich.console import Console

def normalize_value(val: Any) -> str:
    """Convert a value to uppercase and remove all spacing and accents for comparison."""
    if val is None:
        return ""
    # Convert to string, remove spacing and uppercase
    prep = re.sub(r'\s+', '', str(val).upper())
    # Remove accents (e.g. é -> E)
    return unicodedata.normalize('NFKD', prep).encode('ASCII', 'ignore').decode()

def levenshtein_similarity(val1: Any, val2: Any) -> float:
    """
    Calculate similarity between two values using the Levenshtein distance.
    Returns a similarity score between 0.0 and 1.0.
    """
    # If both values are "empty" or equal
    if (val1 or "") == (val2 or ""):
        return 1.0
    
    # For numeric comparisons, use a tolerance of 5%
    if isinstance(val1, (int, float)) and isinstance(val2, (int, float)):
        return 1.0 if abs(val1 - val2) <= 0.05 * max(abs(val1), abs(val2)) else 0.0

    # For non-numeric values, compare the normalized strings
    str1 = normalize_value(val1)
    str2 = normalize_value(val2)
    
    if str1 == str2:
        return 1.0
        
    if str1 and str2:
        max_len = max(len(str1), len(str2))
        if max_len == 0:
            return 1.0
        dist = levenshtein_distance(str1, str2)
        return 1 - (dist / max_len)
        
    return 0.0

from typing import TypedDict

class MatchResult(TypedDict):
    record: Any
    similarity: float


def find_top_k_neighbors(
    query: Dict[str, Any],
    database: List[Dict[Hashable, Any]], 
    k: int = 5
) -> List[MatchResult]:
    """
    Find the top k closest records in `database` to the `query` based on
    average Levenshtein similarity across all fields present in the query.

    Args:
        query: Dictionary containing the search criteria
        database: List of dictionaries containing the records to search
        k: Number of results to return

    Returns:
        List of dictionaries containing the k most similar records and their scores
    """
    compare_fields = list(query.keys())
    results: List[MatchResult] = []

    for record in database:
        score_sum = 0.0
        count = 0
        
        for field in compare_fields:
            val1 = query.get(field, "")
            val2 = record.get(field, "")
            sim = levenshtein_similarity(val1, val2)
            score_sum += sim
            count += 1

        if count > 0:
            avg_sim = score_sum / count
            results.append({
                "record": record,
                "similarity": avg_sim
            })

    # Sort by descending similarity and return the top k results
    results.sort(key=lambda x: x["similarity"], reverse=True)
    return results[:k]

def print_results_table(results: List[MatchResult]) -> None:
    """Print results in a formatted table"""
    console = Console()
    table = Table(title=f"Top {len(results)} Matches")

    # Get all unique fields from the records, in a stable sorted order
    field_set = set()
    for res in results:
        field_set.update(res["record"].keys())
    fields = sorted(field_set, key=str)

    # Add columns
    table.add_column("Similarity", justify="right", style="cyan")
    for field in fields:
        table.add_column(str(field), style="magenta")

    # Add rows
    for res in results:
        row = [f"{res['similarity']:.3f}"]
        for field in fields:
            row.append(str(res["record"].get(field, "")))
        table.add_row(*row)

    console.print(table)


def main() -> None:
    # Example usage:
    # Read CSV file
    df = pd.read_csv("data.csv")  # Replace with your CSV file path
    database = df.to_dict('records')

    # Example query using fields from your CSV
    query = {
        'Code Client': 'UIFOR001',
        'Nom': 'UiForm',
        'Adresse 1': '5 Parvis Alan Turing',
        'Adresse 2': '',
        'CP': 75013,
        'Ville': 'PARIS',
        'Pays': 'FR',
        'Num TVA': 'FR00348236132',
        'SIRET': '32323219080032'
    }

    # Find matches
    results = find_top_k_neighbors(query, database, k=5)

    # Display results
    print_results_table(results)


if __name__ == "__main__":
    main()
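The example above compares flat records and does not itself implement the "flattening nested values" step. A minimal sketch of that step follows; the flatten helper, the dotted key format, and the sample object are assumptions for illustration, and the flattened keys would still need to be mapped to your database column names.

```python
from typing import Any, Dict


def flatten(obj: Any, prefix: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts and lists into a single-level dict with dotted keys."""
    flat: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            child = f"{prefix}{sep}{key}" if prefix else str(key)
            flat.update(flatten(value, child, sep))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            child = f"{prefix}{sep}{index}" if prefix else str(index)
            flat.update(flatten(value, child, sep))
    else:
        # Leaf value: store it under the accumulated key path.
        flat[prefix] = obj
    return flat


# Hypothetical nested extraction result.
nested = {
    "Nom": "UiForm",
    "Adresse": {"Ville": "PARIS", "CP": 75013},
    "Tags": ["a", "b"],
}
flat = flatten(nested)
print(flat)
```

Once flattened, each value can be passed through normalize_value and compared field by field as in find_top_k_neighbors.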