AI Schema Generation

Paste any JSON data sample and let AI generate a fully typed enrichment schema -- complete with expertise domains, search keys, multilingual field detection, and validation rules. The generation process includes up to 6 self-correction attempts, ensuring the schema is valid before you ever see it.

Schema Generation Pipeline

STEP 1

Paste Sample JSON

Any JSON object or array representing your entity data

STEP 2

AI Generates Schema

LLM analyzes data types, nesting, naming patterns, and domain expertise

STEP 3

8-Rule Validation

1. Type correctness
2. $ref target validity
3. Expertise assignment
4. Expertise count limits
5. Search key validity
6. Property naming
7. Nested structure depth
8. Required field constraints

If validation fails, errors are sent back to the LLM for self-correction (up to 6 retries)

STEP 4

Post-Processing

Nullable detection, search key demotion, expertise collection

OUTPUT

Validated Enrichment Schema

Ready for enrichment with typed properties, expertise domains, and search keys

Self-Correction via ModelRetry

LLMs occasionally generate schemas with structural issues -- a type mismatch between the schema and input data, a $ref pointing to a non-existent definition, or too many expertise domains. Entity Enricher uses Pydantic-AI's ModelRetry mechanism to catch these issues and feed them back to the LLM for correction within the same generation run.

This happens transparently. The system validates the LLM output against 8 rules, and if any rule fails, the specific errors are sent back to the model with instructions to fix them. This retry loop runs up to 6 times, achieving near-100% valid schema output without manual intervention.
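The control flow of that retry loop can be sketched in a few lines. This is a simplified stand-in: in the real system the errors travel back through Pydantic-AI's ModelRetry mechanism, and the `generate` and `validate` callables here are illustrative placeholders for the LLM call and the 8-rule validator.

```python
MAX_RETRIES = 6


class SchemaValidationError(Exception):
    """Raised when the schema is still invalid after all retries."""


def generate_with_self_correction(sample_json, generate, validate):
    """Ask the model for a schema; on failure, re-prompt with the errors.

    `generate(sample, feedback)` stands in for the LLM call,
    `validate(schema)` for the 8-rule validator returning a list of errors.
    """
    feedback = None
    errors = []
    for _attempt in range(1 + MAX_RETRIES):  # one initial try + 6 retries
        schema = generate(sample_json, feedback)
        errors = validate(schema)
        if not errors:
            return schema  # valid on this attempt
        # Feed the specific rule violations back to the model
        feedback = "Fix these issues:\n" + "\n".join(errors)
    raise SchemaValidationError(errors)
```

A fake generator that succeeds once it has seen feedback shows the loop terminating on the second attempt.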

Validation Rules Applied

Type Correctness

Schema property types must match the observed data types from the input JSON.

$ref Integrity

All $ref pointers must reference entities defined in the $defs section.

Expertise Assignment

Every property must belong to a valid expertise domain.

Expertise Count

Total expertise domains must stay within configurable limits.

Search Key Validity

Search keys must reference existing properties with non-empty values.

Property Naming

Property names must follow snake_case convention.

Structure Depth

Nesting depth must stay within limits (default 10 levels).

Field Constraints

Required fields, min/max values, and enum constraints are validated.
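Two of the cheaper rules, property naming and structure depth, can be illustrated directly. This is a sketch of the idea, not the production rule set; the function names and error strings are assumptions.

```python
import re

SNAKE_CASE = re.compile(r"^[a-z][a-z0-9_]*$")


def check_property_naming(schema):
    """Rule 6: every property name must follow snake_case."""
    return [
        f"property '{name}' is not snake_case"
        for name in schema.get("properties", {})
        if not SNAKE_CASE.match(name)
    ]


def check_depth(schema, limit=10, level=1):
    """Rule 7: nesting depth must stay within the limit (default 10)."""
    if level > limit:
        return [f"nesting exceeds {limit} level limit"]
    errors = []
    for sub in schema.get("properties", {}).values():
        if isinstance(sub, dict) and "properties" in sub:
            errors += check_depth(sub, limit, level + 1)
    return errors
```

Each rule returns a list of human-readable violations, which is what makes the retry loop work: the messages are concatenated and sent straight back to the model as correction instructions.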

Intelligent Post-Processing

After the LLM generates and self-corrects the schema, additional data-driven transformations are applied:

Nullable Detection

If the input data has null values for a field, the schema property is automatically marked as nullable. This allows LLMs to return null for fields where data is unavailable, instead of forcing hallucinated values.
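Nullable detection needs only one pass over the sample. A minimal sketch, assuming a flat JSON object and a `nullable` flag on each property (both illustrative):

```python
def mark_nullable(schema, sample):
    """Mark a schema property nullable when the sample holds null for it."""
    for name, prop in schema.get("properties", {}).items():
        if name in sample and sample[name] is None:
            prop["nullable"] = True
    return schema
```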

Search Key Demotion

Fields marked as search keys but with empty values in the input data have their search key flag removed. This prevents empty search keys from diluting the enrichment prompt focus.
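The demotion step can be sketched the same way: any property flagged as a search key whose sample value is missing or empty loses the flag. The `search_key` field name and the notion of "empty" used here are assumptions for illustration.

```python
def demote_empty_search_keys(schema, sample):
    """Drop the search-key flag from fields that are empty in the sample."""
    for name, prop in schema.get("properties", {}).items():
        if prop.get("search_key") and sample.get(name) in (None, "", [], {}):
            prop["search_key"] = False
    return schema
```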

Expertise Collection

All expertise domains are collected from nested properties into a top-level list, making it easy to see the domain coverage of your schema at a glance.
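A sketch of the collection pass, assuming each property carries an `expertise` label (an illustrative field name): walk the nested properties and gather the domains, deduplicated in first-seen order, into a top-level list.

```python
def collect_expertise(schema):
    """Gather expertise domains from all nested properties into a
    top-level `expertise_domains` list, deduplicated in first-seen order."""
    seen = []

    def walk(node):
        for prop in node.get("properties", {}).values():
            if isinstance(prop, dict):
                domain = prop.get("expertise")
                if domain and domain not in seen:
                    seen.append(domain)
                walk(prop)  # descend into nested objects

    walk(schema)
    schema["expertise_domains"] = seen
    return schema
```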

Edit Schemas with Natural Language

After generating a schema, you can modify it using natural language instructions. Type something like "add a parent_company reference with name and ownership_percentage" and AI applies the structural change, maintaining all validation rules and expertise assignments.

Each AI edit also produces 5 improvement suggestions -- things like adding missing fields, improving descriptions, or reorganizing expertise domains. You can apply these suggestions with a single click.

For direct control, the visual schema editor provides drag-and-drop property ordering, inline field editing, keyboard navigation, and full undo/redo support. See the schema editor documentation for details.

From Schema to Type-Safe Output

Entity Enricher does not just generate a JSON schema document -- it converts your schema into a dynamic Pydantic model at runtime. This model is then used as the structured output type for Pydantic-AI agents, which means the LLM output is validated against your schema at the type level. Invalid outputs trigger automatic retries.

This approach combines the flexibility of user-defined schemas with the type safety of compiled models. You get the best of both worlds: define any shape you want, and the system enforces it automatically.
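As a rough stdlib-only stand-in for that enforcement (the real system builds an actual Pydantic model at runtime and hands it to the agent as its structured output type; the helper below only mimics the type checks):

```python
TYPE_MAP = {"string": str, "integer": int, "number": float, "boolean": bool}


def validate_output(schema, payload):
    """Check an LLM payload against the schema's property types,
    approximating what the generated Pydantic model would enforce."""
    errors = []
    for name, prop in schema.get("properties", {}).items():
        if name not in payload:
            if name in schema.get("required", []):
                errors.append(f"missing required field '{name}'")
            continue
        value = payload[name]
        if value is None and prop.get("nullable"):
            continue  # null is acceptable for nullable fields
        expected = TYPE_MAP.get(prop.get("type"))
        if expected and not isinstance(value, expected):
            errors.append(f"'{name}' should be {prop['type']}")
    return errors
```

A non-empty error list is exactly the kind of signal that triggers an automatic retry of the enrichment call.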

Generate Your First Schema

Paste a JSON sample, pick a model, and get a validated enrichment schema in seconds. Then refine it with natural language or the visual editor.

Get Started Free