Pre-flight classification checks that an entity matches the expected schema type before enrichment begins. This optional step reduces hallucinated data and wasted tokens when an entity does not match your schema.
LLMs are eager to help. When asked to enrich an entity against a schema, they will produce structured output even if the entity does not match the schema type at all. This leads to hallucinated data that looks plausible but is entirely wrong.
Schema: “Planet” — Entity: “Titan”
The LLM treats Titan as a planet and fills in planet-shaped fields: orbital period, atmosphere composition, number of moons. The output looks plausible but rests on a wrong premise, because Titan is a moon of Saturn.
Classification detects: “mismatch — Titan is a moon, not a planet”
The enrichment models receive this context, set irrelevant fields to null, and only fill in properties that genuinely apply to the entity.
Classification runs as a single, fast LLM call before any enrichment models begin. It uses a cheap, quick model (such as Claude Haiku or GPT-4o Mini) to minimize cost.
The entity matches the schema type. Enrichment proceeds with high confidence.
The entity is a different type than the schema expects. The classification explains what the entity actually is.
The entity cannot be identified with certainty. The LLM does not have enough information to classify it.
Multiple valid interpretations exist. The classification lists the alternatives.
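The four outcomes above can be modelled as a small result type. This is a minimal sketch, not the product's actual data model: only "mismatch" appears verbatim in this page, so the other status labels, the field names, and the context_note helper are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ClassificationStatus(Enum):
    # Only "mismatch" is quoted in the documentation; the other three
    # values are assumed names for the outcomes it describes.
    MATCH = "match"
    MISMATCH = "mismatch"
    UNCERTAIN = "uncertain"
    AMBIGUOUS = "ambiguous"


@dataclass
class ClassificationResult:
    status: ClassificationStatus
    confidence: float            # assumed to be in the range 0.0 to 1.0
    entity_description: str      # what the entity actually is
    alternatives: List[str] = field(default_factory=list)  # used when ambiguous


def context_note(result: ClassificationResult) -> Optional[str]:
    """Build the advisory note handed to enrichment models (None for a clean match)."""
    if result.status is ClassificationStatus.MATCH:
        return None
    if result.status is ClassificationStatus.MISMATCH:
        return f"mismatch: {result.entity_description}"
    if result.status is ClassificationStatus.AMBIGUOUS:
        return "ambiguous: " + "; ".join(result.alternatives)
    return "uncertain: entity could not be identified"
```

For the Titan example, a mismatch result with the description "Titan is a moon, not a planet" would produce the note "mismatch: Titan is a moon, not a planet".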
Classification is purely advisory. If the classification call fails for any reason (model error, timeout, rate limit), enrichment proceeds normally without classification context. This ensures that the optional classification step never prevents enrichment from completing.
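The fail-open behaviour described above might look roughly like the following. It is a sketch under stated assumptions: classify and enrich stand in for whatever callables perform the two steps, and both function names are hypothetical.

```python
from typing import Any, Callable, Optional


def classify_or_skip(classify: Callable[[], dict]) -> Optional[dict]:
    """Run the classification call; return None on any failure."""
    try:
        return classify()
    except Exception:
        # Model error, timeout, rate limit: all are treated the same way.
        # Classification is advisory, so the failure is swallowed here.
        return None


def run_enrichment(classify: Callable[[], dict],
                   enrich: Callable[[Optional[dict]], Any]) -> Any:
    """Enrich with classification context when available, without it otherwise."""
    classification = classify_or_skip(classify)
    return enrich(classification)
```

The design choice is that enrichment always receives either a classification result or None, so it never has to handle a classification error itself.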
Classification is designed to run on fast, inexpensive models. It sends a minimal payload (schema name, description, and truncated entity data) and expects a small structured response. The typical cost is a fraction of the cost of the enrichment itself, which is usually a good trade for the accuracy improvement.
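To illustrate how small the payload can be, here is a sketch that assembles the three pieces mentioned above. The key names, the 500-character limit, and the choice to serialize the entity as JSON before truncating are assumptions made for this example, not documented behaviour.

```python
import json
from typing import Any, Dict

MAX_ENTITY_CHARS = 500  # hypothetical truncation limit


def build_classification_payload(schema_name: str,
                                 schema_description: str,
                                 entity: Dict[str, Any],
                                 max_chars: int = MAX_ENTITY_CHARS) -> Dict[str, str]:
    """Assemble schema name, description, and truncated entity data."""
    entity_text = json.dumps(entity, ensure_ascii=False, sort_keys=True)
    if len(entity_text) > max_chars:
        # Keep only the head of the serialized entity to bound token usage.
        entity_text = entity_text[:max_chars]
    return {
        "schema_name": schema_name,
        "schema_description": schema_description,
        "entity_data": entity_text,
    }
```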
The UI shows classification progress in real time via Server-Sent Events. A classification_started event fires when the check begins, followed by a classification_completed event that carries the status, confidence, and entity description. The result appears as a banner above the model results.
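A client consuming the same stream could handle these two events along the following lines. This sketch parses the standard Server-Sent Events text format; the JSON payload keys (status, confidence, entity_description) and the banner wording are assumptions, since the exact event payload is not specified here.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from lines of an SSE stream."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)


def classification_banner(lines: Iterable[str]) -> Optional[str]:
    """Return banner text once classification_completed arrives, else None."""
    for event, data in parse_sse(lines):
        if event == "classification_completed":
            payload = json.loads(data)  # assumed JSON payload shape
            return (f"{payload['status']} ({payload['confidence']:.0%}): "
                    f"{payload['entity_description']}")
    return None
```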
If you cancel the enrichment during the classification phase, the job stops immediately without starting any enrichment models. No unnecessary tokens are spent.
In the Schema Editor or Batch Enrichment sidebar, look for the “Classification” dropdown. Select a fast, inexpensive model (Claude Haiku, GPT-4o Mini, or similar). The classification will run automatically before enrichment begins for each entity.
When using the REST API, include the classification_model field in your enrichment request with the model's composite key (e.g., anthropic::claude-haiku-4-5).
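As a rough illustration, a request body carrying this field could be built as follows. Only the classification_model field and the provider::model key format come from the text above; the entity key, the helper names, and the validation step are hypothetical, and the endpoint itself is deliberately left out.

```python
from typing import Any, Dict, Optional, Tuple


def split_composite_key(key: str) -> Tuple[str, str]:
    """Split a composite key such as 'anthropic::claude-haiku-4-5'."""
    provider, sep, model = key.partition("::")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider::model', got {key!r}")
    return provider, model


def build_enrichment_request(entity: Dict[str, Any],
                             classification_model: Optional[str] = None) -> Dict[str, Any]:
    """Build a request body; classification is opt-in via classification_model."""
    body: Dict[str, Any] = {"entity": entity}  # 'entity' is an assumed field name
    if classification_model is not None:
        split_composite_key(classification_model)  # fail early on a malformed key
        body["classification_model"] = classification_model
    return body
```

Omitting classification_model leaves the body unchanged, which matches the step being optional.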