Enrichment Flow - Entity Enricher Documentation

A step-by-step walkthrough of how Entity Enricher processes a single entity — from input through classification, parallel model execution, to structured output.

The Pipeline at a Glance

  • Input: entity JSON + schema
  • Classification: optional type check
  • Parallel Models: Claude and GPT-4, each covering the financial, regulatory, and general expertises
  • Validate: type check, self-correct
  • Output: structured JSON per model

Step 1: Configure the Enrichment

Open the Schema Editor page and set up your enrichment. You will interact with four key areas:

Input Panel

Paste sample JSON in the “New Schema” tab to generate a schema, or switch to the “Enrich” tab to fill in entity search keys (name, website, country, etc.).

Schema Panel

Interactive property tree showing your schema structure. You can edit properties, add expertise domains, and mark fields as search keys or preserved.

Sidebar Options

Select your enrichment strategy (single-pass or multi-expertise), pick one or more LLM models, choose output languages, and optionally enable pre-flight classification.

Results Panel

Shows real-time progress and results for each model. When using multiple models, a “Merge Results” button appears for fusion.

Step 2: Pre-flight Classification (Optional)

If you selected a classification model, a fast, inexpensive LLM call runs first to verify the entity matches the schema type. This prevents wasting tokens on enrichment when the entity does not match. Read more in the Classification documentation.

Non-blocking: If classification fails for any reason, enrichment proceeds normally. Classification is purely advisory — it adds context to the enrichment prompts but never blocks the pipeline.
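
The advisory behaviour described above can be sketched as follows. This is a minimal illustration, not Entity Enricher's actual code: the function names `classify_safely` and `build_prompt`, and the shape of the classification result, are assumptions made for the example.

```python
from typing import Callable, Optional


def classify_safely(classify: Callable[[dict], dict], entity: dict) -> Optional[dict]:
    """Run the pre-flight check; return None instead of raising on any failure."""
    try:
        return classify(entity)
    except Exception:
        # Advisory only: a failed classification must never stop enrichment.
        return None


def build_prompt(entity: dict, classification: Optional[dict]) -> str:
    """Add classification context to the prompt only when it is available."""
    prompt = f"Enrich this entity: {entity}"
    if classification is not None:
        prompt += f"\nClassification context: {classification}"
    return prompt
```

With this shape, a timeout or provider error in the classifier only means the enrichment prompt is built without the extra context.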

Step 3: Strategy Execution

Each selected model processes the entity using your chosen strategy. When multiple models are selected, models from different providers run in parallel (for example, Claude and GPT-4 run simultaneously), while models from the same provider run sequentially to respect rate limits.
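
One way to picture this scheduling rule is with `asyncio`: one task per provider, and a plain loop inside each task. The sketch below is an assumption about how such scheduling could look, not the product's implementation; the provider and model names and the `run_model` stand-in are hypothetical.

```python
import asyncio
from collections import defaultdict


async def run_model(model: str, log: list) -> str:
    """Hypothetical stand-in for a real LLM call."""
    log.append(("start", model))
    await asyncio.sleep(0.01)
    log.append(("end", model))
    return f"{model}: done"


async def run_provider(models: list, log: list) -> list:
    # Same provider: one model at a time, to stay within rate limits.
    return [await run_model(m, log) for m in models]


async def run_all(selected: list) -> tuple:
    """selected is a list of (provider, model) pairs."""
    by_provider = defaultdict(list)
    for provider, model in selected:
        by_provider[provider].append(model)
    log: list = []
    # Different providers: their queues run concurrently.
    groups = await asyncio.gather(
        *(run_provider(models, log) for models in by_provider.values())
    )
    return [result for group in groups for result in group], log
```

Calling `asyncio.run(run_all([("anthropic", "claude-a"), ("anthropic", "claude-b"), ("openai", "gpt-4")]))` runs the two Anthropic models one after the other while the OpenAI model runs alongside them.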

Multi-Expertise Example (3 domains)
  1. Split schema by expertise: Properties are grouped by their expertise domain: financial fields, regulatory fields, general fields.
  2. Run parallel LLM calls: Each expertise gets its own focused prompt with only the relevant schema properties. All run simultaneously.
  3. Merge results progressively: As each expertise completes, its output is merged into the accumulated result. You see partial results in real-time.
  4. Apply preserve logic: Original values for fields marked as 'preserve' are restored, ensuring your input data stays intact.
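
The four steps above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the schema is modelled as a flat dict whose properties carry hypothetical `expertise` and `preserve` keys, `call_llm` is a placeholder for the real model call, and the expertise calls run one after another here for brevity even though the pipeline runs them simultaneously.

```python
from collections import defaultdict
from typing import Callable


def split_by_expertise(schema: dict) -> dict:
    """Step 1: group schema properties by their expertise domain."""
    groups = defaultdict(dict)
    for name, prop in schema.items():
        groups[prop.get("expertise", "general")][name] = prop
    return dict(groups)


def apply_preserve(result: dict, original: dict, schema: dict) -> dict:
    """Step 4: restore the input value of every field marked as preserved."""
    restored = dict(result)
    for name, prop in schema.items():
        if prop.get("preserve") and name in original:
            restored[name] = original[name]
    return restored


def enrich(entity: dict, schema: dict, call_llm: Callable[[str, dict, dict], dict]) -> dict:
    merged: dict = {}
    for expertise, props in split_by_expertise(schema).items():
        # Steps 2 and 3: one focused call per expertise, merged as it completes.
        merged.update(call_llm(expertise, props, entity))
    return apply_preserve(merged, entity, schema)
```

Because the preserve step runs last, a preserved field keeps its input value even if a model returned something different for it.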

Step 4: Validation and Self-Correction

Each LLM response is validated against your schema in real time. When the output does not match the expected types or constraints, the system automatically sends the validation errors back to the LLM for correction.

What gets corrected automatically:
  • String instead of number: "42.2" becomes 42.2
  • Indexed objects as arrays: {"0": "a", "1": "b"} becomes ["a", "b"]
  • String nulls: "null" or "None" becomes actual null
  • Missing required fields: Error sent back, LLM fills them in

The system makes up to 5 automatic retry attempts per LLM call. Each retry includes the specific validation error so the LLM knows exactly what to fix.
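
The corrections and the retry loop can be illustrated with a small sketch. It assumes a simplified schema format (`type` and `required` keys), a `call_llm` placeholder that receives the previous validation errors, and that "5 retries" means one initial call plus up to five more; none of these details are confirmed by the product documentation.

```python
from typing import Any, Callable

TYPES = {"number": (int, float), "string": str, "array": list}


def coerce(value: Any, expected: str) -> Any:
    """Apply the local fixes listed above before validating."""
    if isinstance(value, str) and value.strip() in ("null", "None"):
        return None
    if expected == "number" and isinstance(value, str):
        try:
            return float(value)
        except ValueError:
            return value  # left as-is so validation reports it
    if expected == "array" and isinstance(value, dict):
        keys = list(value)
        if keys and all(isinstance(k, str) and k.isdigit() for k in keys):
            return [value[k] for k in sorted(keys, key=int)]
    return value


def validate(data: dict, schema: dict) -> list:
    errors = []
    for name, spec in schema.items():
        if name not in data:
            if spec.get("required"):
                errors.append(f"{name}: required field is missing")
            continue
        value = data[name]
        if value is not None and not isinstance(value, TYPES[spec["type"]]):
            errors.append(f"{name}: expected {spec['type']}")
    return errors


def enrich_with_retries(call_llm: Callable[[list], dict], schema: dict, max_retries: int = 5) -> dict:
    errors: list = []
    for _ in range(1 + max_retries):
        raw = call_llm(errors)  # previous errors are sent back to the model
        data = {k: coerce(v, schema[k]["type"]) if k in schema else v for k, v in raw.items()}
        errors = validate(data, schema)
        if not errors:
            return data
    raise ValueError(f"still invalid after {max_retries} retries: {errors}")
```

Cheap fixes such as `"42.2"` to `42.2` are handled locally in `coerce`, so only problems the code cannot repair, such as a missing required field, cost another LLM call.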

Step 5: Real-Time Streaming

Entity Enricher uses Server-Sent Events (SSE) to stream progress in real time. You do not have to wait for all models to complete: results appear progressively as each expertise domain or model finishes.

Event Timeline (example with 2 models, 3 expertises)
Time   Event                      Description
0.0s   started                    Job begins, 2 models queued
0.1s   classification_started     Pre-flight check begins
0.8s   classification_completed   Entity confirmed as "match" (95%)
0.9s   model_started              Claude and GPT-4 start in parallel
1.2s   expertise_completed        Claude: financial done, partial result streamed
1.5s   expertise_completed        Claude: general done, result updated
1.8s   expertise_completed        Claude: regulatory done, full result ready
1.9s   model_completed            Claude finished with full structured output
2.5s   model_completed            GPT-4 finished with full structured output
2.5s   completed                  All models done, stream closes
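
A client consuming such a stream could look roughly like the sketch below. The event names come from the timeline above, but everything else is an assumption: that the name travels in the SSE `event:` field, that each `data:` payload is JSON, and that payloads carry hypothetical `model` and `result` keys. The actual wire format may differ.

```python
import json
from typing import Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[tuple]:
    """Minimal SSE parser: yields (event_name, json_payload) per dispatched event."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current event.
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment / keep-alive line
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)


def follow(lines: Iterable[str]) -> dict:
    """Keep the latest (possibly partial) result per model until the job completes."""
    latest: dict = {}
    for event, payload in parse_sse(lines):
        if event in ("expertise_completed", "model_completed"):
            latest[payload["model"]] = payload["result"]
        elif event == "completed":
            break
    return latest
```

Here `lines` can be any iterable of text lines, for example the decoded lines of a streaming HTTP response.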

Step 6: Review Results

Each model gets its own result panel showing the structured JSON output, per-expertise progress badges, token usage, cost, and processing time. When using the multi-expertise strategy, expertise badges update in real time as each domain completes.

What you see per model:
  • Status badge — Waiting, Running, Success, Failed, or Partial
  • Expertise badges — Colored pills showing per-domain progress (blue = running, green = done, red = failed)
  • Progressive JSON — Output updates after each expertise completes
  • Metrics — Processing time, token count, cost in USD
  • Progress log — Timestamped entries for every event

Handling Partial Success

When using the multi-expertise strategy, some expertises may fail while others succeed. Rather than discarding everything, Entity Enricher returns the merged output from successful expertises with a “Partial” status. You can then retry only the failed expertises without re-running the entire enrichment.

Example: If 2 out of 3 expertises succeed, you get structured output covering the successful domains. The failed expertise can be retried, and its results will be merged into the existing output.
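
The partial-success behaviour can be sketched as follows. The status names match the badges listed earlier, but the data shapes and the function names `summarize`, `retry_failed`, and `run_expertise` are hypothetical and only illustrate the idea of retrying the failed expertises alone.

```python
from typing import Callable


def status_for(succeeded: int, failed: int) -> str:
    if failed == 0:
        return "Success"
    return "Partial" if succeeded else "Failed"


def summarize(outcomes: dict) -> dict:
    """outcomes maps each expertise to its output dict, or to the exception it raised."""
    output, done, failed = {}, [], []
    for expertise, outcome in outcomes.items():
        if isinstance(outcome, Exception):
            failed.append(expertise)
        else:
            output.update(outcome)
            done.append(expertise)
    return {"status": status_for(len(done), len(failed)),
            "output": output, "done": done, "failed": failed}


def retry_failed(previous: dict, run_expertise: Callable[[str], dict]) -> dict:
    """Re-run only the failed expertises and merge them into the existing output."""
    output = dict(previous["output"])
    done, failed = list(previous["done"]), []
    for expertise in previous["failed"]:
        try:
            output.update(run_expertise(expertise))
            done.append(expertise)
        except Exception:
            failed.append(expertise)
    return {"status": status_for(len(done), len(failed)),
            "output": output, "done": done, "failed": failed}
```

The expertises that already succeeded are never called again, so a retry only spends tokens on the domains that failed.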

What Happens Next?

After enrichment completes, your results are saved to the Records page for future reference. If you used multiple models, you can merge the results using Multi-Model Fusion.