Eight validation rules ensure schema quality. When any rule fails during AI schema generation, the error is sent back to the AI for automatic self-correction.
After the AI generates a schema, the schema is checked against all 8 validation rules. If any rule fails, the specific error messages are compiled and sent back to the AI as feedback. The AI then produces a corrected schema, which is validated again. This loop runs for up to 6 attempts in total (1 initial + 5 retries).
In practice, most schemas pass on the first attempt. The self-correction loop is a safety net that handles edge cases where the AI makes type errors or forgets a field.
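The loop can be sketched roughly as follows. This is a minimal illustration, not the product's implementation. `generate_with_self_correction`, `generate`, and `validate` are hypothetical names: `generate` stands in for the AI call and receives the previous attempt's errors as feedback (or `None` on the first attempt), and `validate` stands in for the eight rules and returns a list of error messages.

```python
from typing import Callable, List, Optional

# 1 initial attempt + 5 retries, as described above
MAX_ATTEMPTS = 6


def generate_with_self_correction(
    generate: Callable[[Optional[str]], dict],
    validate: Callable[[dict], List[str]],
) -> dict:
    """Hypothetical sketch of the generate -> validate -> feedback loop."""
    feedback: Optional[str] = None  # no feedback on the first attempt
    errors: List[str] = []
    for _ in range(MAX_ATTEMPTS):
        schema = generate(feedback)
        errors = validate(schema)
        if not errors:
            return schema  # every rule passed
        # Compile all failing rules' messages into one feedback prompt.
        feedback = "Fix these validation errors:\n" + "\n".join(errors)
    raise ValueError(f"Schema still invalid after {MAX_ATTEMPTS} attempts: {errors}")
```

Raising after the last attempt is one possible design choice; a real pipeline might instead return the last schema along with its errors.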
Not all rules apply to both schema generation and AI editing. Rules that compare against input data are skipped during editing because you may intentionally add or remove fields:
| Scope | Rules Applied | Why |
|---|---|---|
| Generation | All 8 rules | Input data is available for comparison |
| AI Editing | Rules 2, 3, 4, 5 only | No input data; user may intentionally modify structure |
Rule 1 (domain limit): The number of expertise domains must not exceed a maximum calculated from your property count. This prevents the AI from creating too many fine-grained domains for small schemas.
Example error: `Too many expertise domains: 6 defined, maximum is 3`
The maximum is floor(property_count / 6), with a minimum of 1. For example, a schema with 12 properties allows up to 2 domains.
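A minimal sketch of the domain-limit calculation, assuming the property count is passed in as a non-negative integer. The function names are hypothetical; the error text mirrors the example error for this rule.

```python
from typing import Dict, List


def max_expertise_domains(property_count: int) -> int:
    # floor(property_count / 6), but never fewer than 1 domain
    return max(1, property_count // 6)


def check_domain_count(domains: Dict[str, dict], property_count: int) -> List[str]:
    limit = max_expertise_domains(property_count)
    if len(domains) > limit:
        return [f"Too many expertise domains: {len(domains)} defined, maximum is {limit}"]
    return []
```

With 12 properties, `max_expertise_domains` returns 2. Small schemas with fewer than 12 properties therefore get exactly one domain.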
Rule 2 (non-empty schema): Every schema must define at least one property, because an empty schema cannot be used for enrichment.
Example error: `Schema must have at least one property`
This catches cases where the AI produces a valid JSON structure but forgets to include any actual fields.
Rule 3 (valid types): Every property type must be one of the standard JSON Schema types: string, number, integer, boolean, array, object, or null.
Example error: `revenue: invalid type 'float'`
The AI sometimes invents types such as "float", "decimal", or "date". This rule catches them and asks the AI to replace each one with a valid type.
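A sketch of the type check, assuming each property is a dict that declares its type as a single string under `"type"`. Properties without a `"type"` key (for example, ones that only carry a $ref) are skipped here. `check_property_types` is a hypothetical name.

```python
from typing import Dict, List

# The seven standard JSON Schema type names
VALID_TYPES = ("string", "number", "integer", "boolean", "array", "object", "null")


def check_property_types(properties: Dict[str, dict]) -> List[str]:
    errors = []
    for name, spec in properties.items():
        if "type" not in spec:
            continue  # e.g. a property that only carries a $ref
        if spec["type"] not in VALID_TYPES:
            errors.append(f"{name}: invalid type '{spec['type']}'")
    return errors
```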
Rule 4 (resolvable references): Every $ref must point to a definition in the schema's $defs section. Dangling references break the enrichment pipeline.
Example error: `manufacturer: $ref '#/$defs/Company' references undefined definition`
When the AI creates a reference such as #/$defs/Company, the $defs block must contain a matching Company definition.
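A sketch of reference resolution, assuming only top-level properties carry a `$ref` and that every reference uses the local `#/$defs/` prefix. Nested references (for example, inside array `items`) are not checked in this simplified version. `check_refs` is a hypothetical name.

```python
from typing import List

REF_PREFIX = "#/$defs/"


def check_refs(schema: dict) -> List[str]:
    defs = schema.get("$defs", {})
    errors = []
    for name, spec in schema.get("properties", {}).items():
        ref = spec.get("$ref")
        if ref is None:
            continue  # no reference to resolve
        # Strip the local prefix to get the definition name; other forms count as unresolved
        target = ref[len(REF_PREFIX):] if ref.startswith(REF_PREFIX) else None
        if target is None or target not in defs:
            errors.append(f"{name}: $ref '{ref}' references undefined definition")
    return errors
```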
Rule 5 (known expertise): Every property's expertise value must match one of the defined expertise domains. This prevents typos and inconsistencies.
Example error: `revenue: expertise 'finance' not in defined domains: ['financial_analyst']`
For example, the AI might write "finance" instead of the defined key "financial_analyst". This rule catches the mismatch so the AI can correct it.
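A sketch of the membership check, assuming the defined domains are available as a dict keyed by domain name. Where the domains live in the schema is not specified here, so they are passed in separately. `check_expertise_known` is a hypothetical name.

```python
from typing import Dict, List


def check_expertise_known(properties: Dict[str, dict], domains: Dict[str, dict]) -> List[str]:
    known = sorted(domains)  # defined domain keys, listed in the error message
    errors = []
    for name, spec in properties.items():
        expertise = spec.get("expertise")
        # Only assigned values are checked; missing assignments belong to the expertise-required rule
        if expertise is not None and expertise not in domains:
            errors.append(f"{name}: expertise '{expertise}' not in defined domains: {known}")
    return errors
```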
Rule 6 (expertise required): Every property that is neither an object nor a preserved field must have an expertise assignment, so that each enrichable field is handled by a specialist domain.
Example error: `revenue: expertise is required for non-object types`
Object types are exempt because their child properties carry their own expertise. Preserved fields are exempt because they pass through unchanged.
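A sketch of the exemption logic. It assumes, purely for illustration, that a preserved field is marked with a hypothetical `"preserve": true` flag. The real marker may differ. `check_expertise_required` is also a hypothetical name.

```python
from typing import Dict, List


def check_expertise_required(properties: Dict[str, dict]) -> List[str]:
    errors = []
    for name, spec in properties.items():
        # Objects are exempt (their children carry expertise), and so are preserved fields.
        # "preserve" is a hypothetical flag name used only in this sketch.
        if spec.get("type") == "object" or spec.get("preserve", False):
            continue
        if not spec.get("expertise"):
            errors.append(f"{name}: expertise is required for non-object types")
    return errors
```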
Rule 7 (type matching): The schema type of each property must match the Python type of the corresponding value in your input data.
Example error: `revenue: type mismatch - input is number but schema says 'string'`
If your input has "revenue": 42.5, the schema must use type "number" or "integer", not "string". The check treats "number" and "integer" as interchangeable: an integer value passes against "number", and a decimal value passes against "integer".
Rule 8 (input coverage): Every key in your input data must appear as a property in the generated schema. This prevents the AI from silently dropping fields.
Example error: `Missing property from input: 'headquarters'`
If your input JSON has a "headquarters" key, the generated schema must include it, which guarantees complete coverage of your data.
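A minimal sketch of the coverage check, assuming only top-level input keys are compared against top-level schema properties. `check_input_coverage` is a hypothetical name.

```python
from typing import Dict, List


def check_input_coverage(properties: Dict[str, dict], input_data: dict) -> List[str]:
    # Report every top-level input key that the schema does not declare
    return [f"Missing property from input: '{key}'" for key in input_data if key not in properties]
```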
Rule 7 (type matching) uses automatic type inference to compare your input values against the schema's declared types. The inference is flexible to avoid false positives:
| Input Value | Inferred Type | Also Accepts |
|---|---|---|
| true / false | boolean | (boolean only) |
| 42 | integer | number |
| 3.14 | number | integer |
| "hello" | string | (string only) |
| [1, 2, 3] | array | (array only) |
| {"key": "val"} | object | (object only) |
Note: booleans are checked before integers because in Python, `bool` is a subclass of `int` (`isinstance(True, int)` is `True`). Checking booleans first prevents true from being inferred as an integer.
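The inference table and the ordering note above can be sketched as follows. This is a hedged illustration that assumes input values come from Python's `json.loads`, so JSON true/false, numbers, strings, arrays, and objects arrive as `bool`, `int`/`float`, `str`, `list`, and `dict`. `infer_json_type` and `check_type_match` are hypothetical names. The error text mirrors the Rule 7 example. Mapping `None` to "null" is an assumption, because the table does not list it.

```python
from typing import Any, Dict, List

# integer and number are accepted for each other; every other type must match exactly
COMPATIBLE = {"integer": ("integer", "number"), "number": ("number", "integer")}


def infer_json_type(value: Any) -> str:
    # bool must come first: isinstance(True, int) is True in Python
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    if value is None:
        return "null"  # assumption: the table above does not cover null
    raise TypeError(f"unsupported input value: {value!r}")


def check_type_match(properties: Dict[str, dict], input_data: Dict[str, Any]) -> List[str]:
    errors = []
    for key, value in input_data.items():
        spec = properties.get(key)
        if spec is None or "type" not in spec:
            continue  # missing properties are reported by the input-coverage rule
        inferred = infer_json_type(value)
        if spec["type"] not in COMPATIBLE.get(inferred, (inferred,)):
            errors.append(
                f"{key}: type mismatch - input is {inferred} but schema says '{spec['type']}'"
            )
    return errors
```

Because of the boolean-first ordering, an input of true checked against a declared "integer" is reported as a mismatch instead of silently passing.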