Validation Rules - Entity Enricher Documentation

Validation Rules

Eight validation rules ensure schema quality. When any rule fails during AI schema generation, the error is sent back to the AI for automatic self-correction.

How Self-Correction Works

After the AI generates a schema, it passes through all 8 validation rules. If any rule fails, the specific error messages are compiled and sent back to the AI as feedback. The AI then produces a corrected schema, which is validated again. This loop continues for up to 6 total attempts (1 initial + 5 retries).

Correction Flow

1. AI generates: Schema with properties, types, expertise, and descriptions
2. Validator checks: 8 rules applied sequentially, all errors collected
3. If errors: Error messages sent back: “Fix these issues: [revenue: type mismatch...]”
4. AI corrects: Produces updated schema addressing the reported errors
5. Repeat: Until all rules pass or maximum attempts reached
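
The correction flow can be sketched as a bounded retry loop. This is a minimal illustration, not the actual Entity Enricher implementation: `generate_with_correction`, `generate`, and `validate` are hypothetical names, and the feedback string simply follows the "Fix these issues" format quoted above.

```python
from typing import Callable, Dict, List, Optional

MAX_ATTEMPTS = 6  # 1 initial attempt + 5 retries, as described above


def generate_with_correction(
    generate: Callable[[Optional[str]], Dict],
    validate: Callable[[Dict], List[str]],
) -> Dict:
    """Generate a schema, feeding validation errors back until it passes.

    `generate` and `validate` are hypothetical callables standing in for
    the AI call and the rule checks.
    """
    feedback: Optional[str] = None
    errors: List[str] = []
    for _attempt in range(MAX_ATTEMPTS):
        schema = generate(feedback)
        errors = validate(schema)  # all errors are collected, not just the first
        if not errors:
            return schema
        # Compile every error message into a single feedback prompt.
        feedback = "Fix these issues: [" + ", ".join(errors) + "]"
    raise ValueError(f"Schema still invalid after {MAX_ATTEMPTS} attempts: {errors}")
```

Raising after the final attempt is one possible design choice; the text above does not say what happens when the maximum is reached.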

In practice, most schemas pass on the first attempt. The self-correction loop is a safety net that handles edge cases where the AI makes type errors or forgets a field.

Generation vs. Editing Rules

Not all rules apply to both schema generation and AI editing. Rules that compare against input data are skipped during editing because you may intentionally add or remove fields:

| Scope | Rules Applied | Why |
| --- | --- | --- |
| Generation | All 8 rules | Input data is available for comparison |
| AI Editing | Rules 2, 3, 4, 5 only | No input data; user may intentionally modify structure |

The 8 Rules

Rule 1

Expertise Domain Count

Scope: Generation only

The number of expertise domains must not exceed the calculated maximum based on your property count. This prevents the AI from creating too many fine-grained domains for small schemas.

Example error: Too many expertise domains: 6 defined, maximum is 3

The maximum is calculated as floor(property_count / 6), with a minimum of 1. A schema with 12 properties allows up to 2 domains.
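
As a worked example of the formula, the sketch below computes the maximum and produces an error in the format shown above. The function names are hypothetical.

```python
from typing import Optional


def max_expertise_domains(property_count: int) -> int:
    """floor(property_count / 6), with a minimum of 1."""
    return max(1, property_count // 6)


def check_domain_count(domain_count: int, property_count: int) -> Optional[str]:
    """Return an error message if there are too many domains, else None."""
    maximum = max_expertise_domains(property_count)
    if domain_count > maximum:
        return f"Too many expertise domains: {domain_count} defined, maximum is {maximum}"
    return None


print(max_expertise_domains(5))   # 1 (the minimum applies)
print(max_expertise_domains(12))  # 2
print(check_domain_count(6, 18))  # Too many expertise domains: 6 defined, maximum is 3
```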

Rule 2

At Least One Property

Scope: Both

Every schema must define at least one property. An empty schema cannot be used for enrichment.

Example error: Schema must have at least one property

This catches cases where the AI produces a valid JSON structure but forgets to include any actual fields.

Rule 3

Valid JSON Schema Types

Scope: Both

Every property type must be one of the standard JSON Schema types: string, number, integer, boolean, array, object, or null.

Example error: revenue: invalid type 'float'

The AI sometimes invents types like "float", "decimal", or "date". This rule catches those and asks for a correction to a valid type.
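
A check of this kind might look like the following sketch. It assumes each property is a dict with a single string under a `type` key, and that properties using only `$ref` are skipped; both are assumptions for illustration, and `check_types` is a hypothetical name.

```python
from typing import Dict, List

# The seven standard JSON Schema types listed above.
VALID_TYPES = ("string", "number", "integer", "boolean", "array", "object", "null")


def check_types(properties: Dict[str, dict]) -> List[str]:
    """Collect an error for every property whose declared type is not valid."""
    errors = []
    for name, spec in properties.items():
        if "type" not in spec and "$ref" in spec:
            continue  # assumption: pure $ref properties carry no type of their own
        declared = spec.get("type")
        if declared not in VALID_TYPES:
            errors.append(f"{name}: invalid type '{declared}'")
    return errors


print(check_types({"revenue": {"type": "float"}, "name": {"type": "string"}}))
# ["revenue: invalid type 'float'"]
```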

Rule 4

$ref Targets Exist

Scope: Both

All $ref references must point to entities defined in the schema's $defs section. Dangling references break the enrichment pipeline.

Example error: manufacturer: $ref '#/$defs/Company' references undefined definition

When the AI creates a reference like #/$defs/Company, there must be a matching Company definition in the $defs block.
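
The sketch below shows one way to detect dangling references. It only inspects top-level properties and assumes references use the `#/$defs/` prefix; `check_refs` is a hypothetical name, and the real validator may also walk nested objects and array items.

```python
from typing import List

DEFS_PREFIX = "#/$defs/"


def check_refs(schema: dict) -> List[str]:
    """Report every top-level $ref whose target is missing from $defs."""
    defs = schema.get("$defs", {})
    errors = []
    for name, spec in schema.get("properties", {}).items():
        ref = spec.get("$ref")
        if ref is None:
            continue
        # Strip the prefix to get the definition name, e.g. "Company".
        target = ref[len(DEFS_PREFIX):] if ref.startswith(DEFS_PREFIX) else None
        if target not in defs:
            errors.append(f"{name}: $ref '{ref}' references undefined definition")
    return errors


schema = {"properties": {"manufacturer": {"$ref": "#/$defs/Company"}}, "$defs": {}}
print(check_refs(schema))
# ["manufacturer: $ref '#/$defs/Company' references undefined definition"]
```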

Rule 5

Expertise Key Exists

Scope: Both

Every property's expertise value must match one of the defined expertise domains. This prevents typos and inconsistencies.

Example error: revenue: expertise 'finance' not in defined domains: ['financial_analyst']

The AI might use "finance" instead of the defined "financial_analyst" key. This rule catches the mismatch so the AI can correct it.
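
A minimal sketch of this lookup, assuming each property stores its domain under an `expertise` key and the defined domains are available as a list of keys (`check_expertise_keys` is a hypothetical name):

```python
from typing import Dict, Iterable, List


def check_expertise_keys(properties: Dict[str, dict], domains: Iterable[str]) -> List[str]:
    """Report properties whose expertise value is not a defined domain."""
    defined = sorted(domains)
    errors = []
    for name, spec in properties.items():
        key = spec.get("expertise")
        # Properties without an expertise value are outside this rule's scope.
        if key is not None and key not in defined:
            errors.append(f"{name}: expertise '{key}' not in defined domains: {defined}")
    return errors


props = {"revenue": {"type": "number", "expertise": "finance"}}
print(check_expertise_keys(props, ["financial_analyst"]))
# ["revenue: expertise 'finance' not in defined domains: ['financial_analyst']"]
```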

Rule 6

Expertise Required

Scope: Generation only

Non-object, non-preserved properties must have an expertise assignment. This ensures every enrichable field is handled by a specialist domain.

Example error: revenue: expertise is required for non-object types

Object types are exempt because their child properties carry their own expertise. Preserved fields are exempt because they pass through unchanged.
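
The exemptions can be expressed as two early exits. How a field is marked as preserved is not described here, so the boolean `preserved` key below is purely an assumption for illustration, as is the function name.

```python
from typing import Dict, List


def check_expertise_required(properties: Dict[str, dict]) -> List[str]:
    """Require an expertise value on every non-object, non-preserved property."""
    errors = []
    for name, spec in properties.items():
        if spec.get("type") == "object":
            continue  # child properties carry their own expertise
        if spec.get("preserved", False):
            continue  # hypothetical flag: preserved fields pass through unchanged
        if not spec.get("expertise"):
            errors.append(f"{name}: expertise is required for non-object types")
    return errors


props = {
    "revenue": {"type": "number"},
    "address": {"type": "object"},
    "id": {"type": "string", "preserved": True},
}
print(check_expertise_required(props))
# ['revenue: expertise is required for non-object types']
```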

Rule 7

Type Matches Input Data

Scope: Generation only

The schema type for each property must match the actual Python type of the corresponding value in your input data.

Example error: revenue: type mismatch - input is number but schema says 'string'

If your input has "revenue": 42.5, the schema must use type "number" or "integer", not "string". The validator is flexible: it accepts "number" for integers and vice versa.

Rule 8

All Input Properties Present

Scope: Generation only

Every key in your input data must appear as a property in the generated schema. This prevents the AI from silently dropping fields.

Example error: Missing property from input: 'headquarters'

If your input JSON has a "headquarters" key, the generated schema must include it. This ensures complete coverage of your data.
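
In code, this coverage check amounts to a key comparison between the input and the schema. The sketch below covers top-level keys only, and the function name is hypothetical.

```python
from typing import Dict, List


def check_all_inputs_present(input_data: dict, properties: Dict[str, dict]) -> List[str]:
    """Report every input key that has no matching schema property."""
    return [
        f"Missing property from input: '{key}'"
        for key in input_data
        if key not in properties
    ]


input_data = {"name": "Acme", "headquarters": "Berlin"}
properties = {"name": {"type": "string"}}
print(check_all_inputs_present(input_data, properties))
# ["Missing property from input: 'headquarters'"]
```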

Type Inference

Rule 7 (type matching) uses automatic type inference to compare your input values against the schema's declared types. The inference is flexible to avoid false positives:

| Input Value | Inferred Type | Also Accepts |
| --- | --- | --- |
| true / false | boolean | (boolean only) |
| 42 | integer | number |
| 3.14 | number | integer |
| "hello" | string | (string only) |
| [1, 2, 3] | array | (array only) |
| {"key": "val"} | object | (object only) |

Note: booleans are checked before integers because in Python, bool is a subclass of int, so an integer check alone would also match true and false. Checking booleans first prevents true from being inferred as an integer.
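
The table and the ordering note can be sketched together as follows. This is an illustrative approximation of Rule 7, not the actual validator: the function names are hypothetical, and the handling of `None` is an assumption because the table does not cover null values.

```python
from typing import Any, Optional


def infer_type(value: Any) -> str:
    """Map a Python value to a JSON Schema type name."""
    if isinstance(value, bool):  # must come first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    if value is None:  # assumption: not listed in the table above
        return "null"
    raise TypeError(f"Unsupported input value: {value!r}")


# "integer" and "number" accept each other; every other type accepts only itself.
ALSO_ACCEPTS = {"integer": {"integer", "number"}, "number": {"number", "integer"}}


def check_type_match(name: str, value: Any, declared: str) -> Optional[str]:
    """Return a Rule 7 style error if the declared type does not fit the value."""
    inferred = infer_type(value)
    if declared not in ALSO_ACCEPTS.get(inferred, {inferred}):
        return f"{name}: type mismatch - input is {inferred} but schema says '{declared}'"
    return None


print(infer_type(True))                               # boolean
print(check_type_match("revenue", 42.5, "integer"))   # None
print(check_type_match("revenue", 42.5, "string"))
# revenue: type mismatch - input is number but schema says 'string'
```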
