Generate structured JSON schemas from sample data using AI, with automatic self-correction and intelligent post-processing.
Schema generation turns raw entity data into a typed, annotated JSON schema that defines exactly what information to extract during enrichment. Instead of manually writing schemas, you paste sample JSON and let AI analyze the structure, infer types, assign expertise domains, and suggest improvements.
{"en": "...", "fr": "..."}) are collapsed to a single value, and the property count determines how many expertise domains are allowed.
The self-correction loop is what makes schema generation reliable. After the AI produces a schema, it passes through a validator that checks 8 rules covering type correctness, expertise assignment, reference integrity, and data completeness. If any rule fails, the specific error message is sent back to the AI so it can fix the issue in its next attempt.
revenue: type mismatch — input is number but schema says 'string'
number. All 8 rules pass. Schema is accepted.
This approach is far more reliable than asking the AI to “be careful about types” in the prompt. The validator catches concrete errors and gives the AI precise feedback to fix them. Learn more about each rule in the Validation Rules guide.
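The generate, validate, and retry cycle can be sketched roughly as follows. This is a minimal illustration, not the product's implementation: `validate` covers only a type-match rule rather than all 8 rules, and the names `generate_with_correction`, `fake_generate`, and `max_attempts` are assumptions introduced here.

```python
from typing import Callable


def acceptable_types(value) -> set:
    """Map a sample value to the JSON Schema types that would describe it."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"boolean"}
    if isinstance(value, int):
        return {"integer", "number"}
    if isinstance(value, float):
        return {"number"}
    if isinstance(value, str):
        return {"string"}
    if isinstance(value, list):
        return {"array"}
    if isinstance(value, dict):
        return {"object"}
    return set()  # null: no type constraint in this sketch


def validate(schema: dict, sample: dict) -> list:
    """Return error messages. Only a type-correctness rule is sketched."""
    errors = []
    for name, value in sample.items():
        declared = schema.get("properties", {}).get(name, {}).get("type")
        allowed = acceptable_types(value)
        if allowed and declared not in allowed:
            actual = "number" if "number" in allowed else next(iter(allowed))
            errors.append(
                f"{name}: type mismatch, input is {actual} but schema says '{declared}'"
            )
    return errors


def generate_with_correction(generate: Callable, sample: dict, max_attempts: int = 3):
    """Call the generator, feeding validator errors back until the schema passes."""
    feedback = []
    for attempt in range(1, max_attempts + 1):
        schema = generate(sample, feedback)
        feedback = validate(schema, sample)
        if not feedback:
            return schema, attempt
    raise ValueError(f"schema still invalid after {max_attempts} attempts: {feedback}")


def fake_generate(sample: dict, feedback: list) -> dict:
    """Hypothetical stand-in for the AI call: wrong at first, fixed after feedback."""
    revenue_type = "number" if feedback else "string"
    return {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "revenue": {"type": revenue_type},
        },
    }


schema, attempts = generate_with_correction(
    fake_generate, {"name": "Acme", "revenue": 1200000.0}
)
```

In this sketch the first attempt declares `revenue` as a string, the validator reports the mismatch, and the second attempt passes. The point of the design is that the feedback is a concrete error string rather than a general instruction in the prompt.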
A generated schema is more than a simple type definition. Each property includes metadata that guides the enrichment process:
JSON Schema type (string, number, integer, boolean, array, object)
Contextual description that tells the AI what information to find
Which expert domain (financial, regulatory, etc.) provides this value
Whether this field identifies the entity (search) or deduplicates arrays (merge)
Whether the field can be null, preventing unnecessary retries for optional data
Whether the field should be enriched across multiple languages
Whether to keep the original value unchanged during enrichment
Realistic example values that guide the AI toward the right format
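As a rough illustration, a single property carrying this metadata might look like the dictionary below. The key names (`expertise`, `key`, `nullable`, `multilingual`, `preserve`, `examples`) and all values are assumptions chosen for readability; the actual field names used by the product are not given in this section.

```python
import json

# Hypothetical key names and values: the eight pieces of metadata come from
# the list above, but the exact field names are assumed for illustration.
revenue_property = {
    "type": "number",                                    # JSON Schema type
    "description": "Most recent annual revenue in USD",  # what the AI should find
    "expertise": "Financial Analyst",                    # domain that provides the value
    "key": None,           # would be "search" or "merge" for identifying fields
    "nullable": True,      # a missing value does not trigger retries
    "multilingual": False, # not enriched across languages
    "preserve": False,     # the original value may be replaced during enrichment
    "examples": [1250000000, 48200000],
}

print(json.dumps(revenue_property, indent=2))
```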
The AI groups schema properties into expertise domains based on their semantic meaning. For example, a pharmaceutical company schema might have domains like “Financial Analyst,” “Regulatory Expert,” and “Corporate Information.” These domains are used by the multi-expertise strategy to run parallel, specialized LLM calls for deeper results.
The number of expertise domains is automatically limited based on your data's property count to prevent over-fragmentation:
After the AI generates a valid schema, three deterministic post-processing steps refine it based on your actual input data:
Fields with null values in your input are automatically marked as nullable, so the AI won't waste retries trying to fill them.
Search key flags are removed from fields with empty values (null, empty string, zero) since they can't help identify the entity.
All unique expertise domains are gathered from the schema for metrics and strategy configuration.
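The three steps above can be sketched as one deterministic pass over the schema. This is a minimal sketch under stated assumptions: the property keys `nullable`, `key`, and `expertise`, and the function name `post_process`, are hypothetical, and only top-level properties are handled.

```python
import copy


def is_empty(value) -> bool:
    """Empty means null, empty string, or zero (booleans are not treated as zero)."""
    if value is None or value == "":
        return True
    return isinstance(value, (int, float)) and not isinstance(value, bool) and value == 0


def post_process(schema: dict, sample: dict):
    """Apply the three steps to a copy of the schema; return (schema, domains)."""
    result = copy.deepcopy(schema)
    props = result.get("properties", {})
    for name, prop in props.items():
        if name not in sample:
            continue
        value = sample[name]
        # Step 1: a null in the input marks the field as nullable.
        if value is None:
            prop["nullable"] = True
        # Step 2: an empty value cannot identify the entity, so drop the search flag.
        if prop.get("key") == "search" and is_empty(value):
            del prop["key"]
    # Step 3: gather the unique expertise domains.
    domains = sorted({p["expertise"] for p in props.values() if "expertise" in p})
    return result, domains


sample = {"name": "Acme", "ticker": "", "website": None}
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "key": "search", "expertise": "Corporate Information"},
        "ticker": {"type": "string", "key": "search", "expertise": "Financial Analyst"},
        "website": {"type": "string", "expertise": "Corporate Information"},
    },
}

processed, domains = post_process(schema, sample)
```

Here `website` becomes nullable, `ticker` loses its search flag because its value is an empty string, and `name` keeps its flag. Working on a deep copy leaves the AI-generated schema untouched for comparison.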
After generation, you can modify schemas using natural language instructions. Type a command and the AI applies the change while preserving your existing schema structure. Each edit also produces 5 suggestions for further improvements.
Add an employee_count integer field
Create a nested address object with city and country
Add French descriptions to all text fields
Define a parent company reference using $defs
Mark the website field as nullable
AI edits are validated using a subset of the generation rules (type checking, reference integrity, expertise consistency) without comparing against input data, since you may intentionally add or remove fields.
Both schema generation and AI editing produce 5 targeted suggestions covering different improvement categories:
Suggestions appear as clickable chips in the Schema Editor. Click one to auto-fill the AI edit input and apply it.