Paste any JSON data sample and let AI generate a fully typed enrichment schema -- complete with expertise domains, search keys, multilingual field detection, and validation rules. Generation includes up to 6 self-correction attempts, so the schema is checked against the validation rules before you ever see it.
Paste Sample JSON
Any JSON object or array representing your entity data
AI Generates Schema
LLM analyzes data types, nesting, naming patterns, and domain expertise
8-Rule Validation
If validation fails, errors are sent back to the LLM for self-correction (up to 6 retries)
Post-Processing
Nullable detection, search key demotion, expertise collection
Validated Enrichment Schema
Ready for enrichment with typed properties, expertise domains, and search keys
LLMs occasionally generate schemas with structural issues -- a type mismatch between the schema and input data, a $ref pointing to a non-existent definition, or too many expertise domains. Entity Enricher uses Pydantic-AI's ModelRetry mechanism to catch these issues and feed them back to the LLM for correction within the same generation run.
This happens transparently. The system validates the LLM output against 8 rules, and if any rule fails, the specific errors are sent back to the model with instructions to fix them. This retry loop runs up to 6 times, achieving near-100% valid schema output without manual intervention.
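The feedback loop described above can be sketched in plain Python. This is a minimal illustration, not Entity Enricher's code: it does not call Pydantic-AI, and the names `generate_with_self_correction`, `fake_llm`, and `require_snake_case` are hypothetical stand-ins. It also assumes "up to 6 retries" means one initial attempt plus at most six corrections.

```python
from typing import Callable

MAX_RETRIES = 6  # retry budget stated in the text


class SchemaGenerationError(Exception):
    """Raised when no valid schema is produced within the retry budget."""


def generate_with_self_correction(
    generate: Callable[[list], dict],
    validate: Callable[[dict], list],
    max_retries: int = MAX_RETRIES,
) -> dict:
    """Call `generate` until `validate` reports no errors.

    `generate` receives the errors from the previous attempt (empty on the
    first call), mimicking how validation feedback is sent back to the model.
    """
    errors = []
    # Assumption: one initial attempt plus up to `max_retries` corrections.
    for _ in range(max_retries + 1):
        schema = generate(errors)
        errors = validate(schema)
        if not errors:
            return schema
    raise SchemaGenerationError(
        f"still invalid after {max_retries} retries: {errors}"
    )


# Hypothetical stand-in for the LLM call.
def fake_llm(previous_errors: list) -> dict:
    # The first attempt uses camelCase; the "model" fixes it once told.
    name = "company_name" if previous_errors else "companyName"
    return {"properties": {name: {"type": "string"}}}


# Hypothetical stand-in for the rule checker (one simplified rule).
def require_snake_case(schema: dict) -> list:
    return [
        f"property '{p}' is not snake_case"
        for p in schema["properties"]
        if p != p.lower()
    ]


schema = generate_with_self_correction(fake_llm, require_snake_case)
print(schema)
```

Passing the error list back into the generator is the key design point: the model corrects specific, named problems instead of regenerating blindly.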
Schema property types must match the observed data types from the input JSON.
All $ref pointers must reference entities defined in the $defs section.
Every property must belong to a valid expertise domain.
Total expertise domains must stay within configurable limits.
Search keys must reference existing properties with non-empty values.
Property names must follow snake_case convention.
Nesting depth must stay within limits (default 10 levels).
Required fields, min/max values, and enum constraints are validated.
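To make the rules above concrete, here is a rough sketch of three of them: resolvable $ref pointers, snake_case names, and the nesting limit. It assumes a JSON-Schema-like layout with `properties` and `$defs` keys; the function name `validate_schema` and the error wording are illustrative, not the product's actual checker.

```python
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")
MAX_DEPTH = 10  # default nesting limit stated in the text


def validate_schema(schema: dict, max_depth: int = MAX_DEPTH) -> list:
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    defs = schema.get("$defs", {})

    def walk(properties: dict, path: str, depth: int) -> None:
        if depth > max_depth:
            errors.append(f"{path}: nesting exceeds {max_depth} levels")
            return
        for name, prop in properties.items():
            here = f"{path}.{name}" if path else name
            # Naming rule: property names must be snake_case.
            if not SNAKE_CASE.match(name):
                errors.append(f"{here}: name is not snake_case")
            # Reference rule: every $ref must resolve inside $defs.
            ref = prop.get("$ref")
            if ref is not None and ref.split("/")[-1] not in defs:
                errors.append(f"{here}: $ref '{ref}' has no matching definition")
            # Depth rule: only recurse when there are nested properties.
            nested = prop.get("properties")
            if nested:
                walk(nested, here, depth + 1)

    walk(schema.get("properties", {}), "", 1)
    return errors


sample_schema = {
    "$defs": {"address": {"type": "object"}},
    "properties": {
        "company_name": {"type": "string"},
        "headOffice": {"$ref": "#/$defs/address"},
        "parent": {"$ref": "#/$defs/company"},
    },
}
for error in validate_schema(sample_schema):
    print(error)
```

Returning every error at once, rather than stopping at the first, gives the model a complete list to fix in a single correction pass.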
After the LLM generates and self-corrects the schema, additional data-driven transformations are applied:
If the input data has null values for a field, the schema property is automatically marked as nullable. This allows LLMs to return null for fields where data is unavailable, instead of forcing hallucinated values.
Fields marked as search keys but with empty values in the input data have their search key flag removed. This prevents empty search keys from diluting the enrichment prompt focus.
All expertise domains are collected from nested properties into a top-level list, making it easy to see the domain coverage of your schema at a glance.
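The three post-processing passes could look roughly like this. The sketch handles only a flat schema (no nested properties), and the keys `nullable`, `search_key`, `expertise`, and `expertise_domains` are assumed names chosen for illustration; the real schema format may differ.

```python
def post_process(schema: dict, sample: dict) -> dict:
    """Apply the three data-driven passes to a flat schema, in place."""
    domains = set()
    for name, prop in schema["properties"].items():
        value = sample.get(name)
        # Nullable detection: a null in the sample means null is a legal answer.
        if name in sample and value is None:
            prop["nullable"] = True
        # Search key demotion: an empty sample value makes a poor search key.
        if prop.get("search_key") and value in (None, "", [], {}):
            prop["search_key"] = False
        # Expertise collection: gather every domain that appears.
        if "expertise" in prop:
            domains.add(prop["expertise"])
    schema["expertise_domains"] = sorted(domains)
    return schema


# Hypothetical schema and input sample.
schema = {"properties": {
    "name": {"type": "string", "search_key": True, "expertise": "corporate"},
    "ticker": {"type": "string", "search_key": True, "expertise": "finance"},
    "founded_year": {"type": "integer", "expertise": "corporate"},
}}
sample = {"name": "Acme Corp", "ticker": "", "founded_year": None}

result = post_process(schema, sample)
print(result["expertise_domains"])
```

Here `ticker` loses its search key flag because the sample value is empty, and `founded_year` becomes nullable because the sample value is null.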
After generating a schema, you can modify it using natural language instructions. Type something like "add a parent_company reference with name and ownership_percentage" and AI applies the structural change, maintaining all validation rules and expertise assignments.
Each AI edit also produces 5 improvement suggestions -- things like adding missing fields, improving descriptions, or reorganizing expertise domains. You can apply these suggestions with a single click.
For direct control, the visual schema editor provides drag-and-drop property ordering, inline field editing, keyboard navigation, and full undo/redo support. See the schema editor documentation for details.
Entity Enricher does not just generate a JSON schema document -- it converts your schema into a dynamic Pydantic model at runtime. This model is then used as the structured output type for Pydantic-AI agents, which means the LLM output is validated against your schema at the type level. Invalid outputs trigger automatic retries.
This approach combines the flexibility of user-defined schemas with the type safety of compiled models: define any shape you want, and the system enforces it automatically.
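As an illustration of the runtime conversion, Pydantic's `create_model` can build a model class from a schema dictionary. The sketch below assumes a flat, JSON-Schema-like input; `TYPE_MAP`, `build_model`, and the `nullable` key are hypothetical, and the actual conversion in Entity Enricher is likely more involved (nested objects, $ref resolution, constraints).

```python
from typing import Optional

from pydantic import ValidationError, create_model

# Hypothetical mapping from schema type names to Python types.
TYPE_MAP = {"string": str, "integer": int, "number": float, "boolean": bool}


def build_model(name: str, schema: dict):
    """Build a Pydantic model class from a flat, JSON-Schema-like dict."""
    fields = {}
    for prop_name, prop in schema["properties"].items():
        py_type = TYPE_MAP[prop["type"]]
        if prop.get("nullable"):
            # Nullable fields accept None and default to it.
            fields[prop_name] = (Optional[py_type], None)
        else:
            # Ellipsis marks the field as required.
            fields[prop_name] = (py_type, ...)
    return create_model(name, **fields)


Company = build_model("Company", {"properties": {
    "name": {"type": "string"},
    "founded_year": {"type": "integer", "nullable": True},
}})

# A conforming payload is accepted.
print(Company(name="Acme Corp", founded_year=None))

# A payload with the wrong type is rejected at validation time.
try:
    Company(name="Acme Corp", founded_year="not a year")
except ValidationError as exc:
    print(f"rejected with {len(exc.errors())} error(s)")
```

A class built this way is an ordinary Pydantic model, so it can serve as the structured output type the text describes, with validation failures surfacing as errors that a retry loop can act on.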
Paste a JSON sample, pick a model, and get a validated enrichment schema in seconds. Then refine it with natural language or the visual editor.