Enrichment Flow - Entity Enricher Documentation

Enrichment Flow

A step-by-step walkthrough of how Entity Enricher processes a single entity, from input through classification and parallel model execution to structured output.

The Pipeline at a Glance

Input: entity JSON + schema
Classification: optional type check
Parallel models: Claude and GPT-4, each covering the financial, regulatory, and general expertises
Validate: type check and self-correct
Output: structured JSON per model

Step 1: Configure the Enrichment

Open the Schema Editor page and set up your enrichment. A workflow stepper guides you through the pipeline stages: Sample Data, Schema, Enrichment, and Results.

Schema Panel (left)

Paste sample JSON to auto-generate a schema, then explore the interactive property tree. Edit properties, add expertise domains, and mark fields as search keys or preserved.
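Schema auto-generation can be pictured as a recursive walk over the sample JSON that maps each value to a JSON Schema type. The sketch below assumes that approach. `infer_schema` and `annotate` are hypothetical helpers. The `x-expertise`, `x-search-key`, and `x-preserve` keywords are illustrative stand-ins for however the editor actually stores those flags.

```python
import json
from typing import Any, Optional


def infer_schema(value: Any) -> dict:
    """Infer a JSON Schema fragment from a single sample value."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        # The first element serves as the item template; an empty list gets an open schema.
        return {"type": "array", "items": infer_schema(value[0]) if value else {}}
    if isinstance(value, dict):
        return {"type": "object", "properties": {k: infer_schema(v) for k, v in value.items()}}
    raise TypeError(f"not a JSON value: {value!r}")


def annotate(schema: dict, field: str, expertise: Optional[str] = None,
             search_key: bool = False, preserve: bool = False) -> None:
    """Attach editor flags to a top-level property (hypothetical x-* keywords)."""
    prop = schema["properties"][field]
    if expertise:
        prop["x-expertise"] = expertise
    if search_key:
        prop["x-search-key"] = True
    if preserve:
        prop["x-preserve"] = True


sample = json.loads('{"name": "Acme Corp", "country": "DE", "revenue": 42.2, "tags": ["b2b"]}')
schema = infer_schema(sample)
annotate(schema, "name", search_key=True, preserve=True)
annotate(schema, "revenue", expertise="financial")
print(json.dumps(schema["properties"]["revenue"]))
# {"type": "number", "x-expertise": "financial"}
```

A fuller generator would probably also merge the shapes of all array items and infer required fields. This sketch keeps only the type mapping.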

Enrichment Panel (right)

Configure the enrichment options (strategy, models, languages, classification model, and the response-schema and strict-structured-output toggles), then fill in the search keys that identify the entity (name, website, country, etc.).
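To show roughly what the panel collects, the options can be mapped onto a configuration object like the one below. This is a hypothetical sketch: the class name, field names, and validation rules are invented for illustration. Only the two toggle defaults come from this page (response schema off, strict structured output on).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class EnrichmentOptions:
    """Hypothetical mirror of the Enrichment panel; names are illustrative."""
    strategy: str                    # "single-pass" or "multi-expertise"
    models: List[str]                # one result panel per model
    search_keys: Dict[str, str]      # name, website, country, ...
    languages: List[str] = field(default_factory=list)
    classification_model: Optional[str] = None  # None skips the pre-flight check
    response_schema: bool = False               # documented default: off
    strict_structured_output: bool = True       # documented default: on

    def __post_init__(self) -> None:
        # Basic sanity checks; the real UI may enforce different rules.
        if self.strategy not in ("single-pass", "multi-expertise"):
            raise ValueError(f"unknown strategy: {self.strategy}")
        if not self.models:
            raise ValueError("select at least one model")
        if not self.search_keys:
            raise ValueError("fill in at least one search key")


options = EnrichmentOptions(
    strategy="multi-expertise",
    models=["claude", "gpt-4"],
    search_keys={"name": "Acme Corp", "website": "acme.example", "country": "DE"},
    classification_model="fast-classifier",  # hypothetical model id
)
```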

Results Panel

Shows real-time progress and results for each model. When you select multiple models, a “Merge Results” button appears so you can fuse their outputs.

Step 2: Pre-flight Classification (Optional)

If you selected a classification model, a fast, inexpensive LLM call runs first to verify that the entity matches the schema type. The goal is to catch a mismatch before tokens are spent on a full enrichment. Read more in the Classification documentation.

Non-blocking: If classification fails for any reason, enrichment proceeds normally. Classification is purely advisory — it adds context to the enrichment prompts but never blocks the pipeline.
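One way to make the check non-blocking is to treat any classifier error as "no result" and feed only a successful verdict into the prompts. The sketch below assumes an async classifier callable. `preflight`, `prompt_context`, and the `Classification` fields are hypothetical names, not Entity Enricher's API.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class Classification:
    verdict: str       # e.g. "match" or "mismatch"
    confidence: float  # 0.0 to 1.0


Classifier = Callable[[dict], Awaitable[Classification]]


async def preflight(entity: dict, classify: Optional[Classifier]) -> Optional[Classification]:
    """Run the optional check; no classifier or any failure yields None."""
    if classify is None:
        return None
    try:
        return await classify(entity)
    except Exception:
        return None  # advisory only: a failed check never stops enrichment


def prompt_context(result: Optional[Classification]) -> str:
    """Turn a verdict into extra prompt context; empty when unavailable."""
    if result is None:
        return ""
    return f"Pre-flight classification: {result.verdict} ({result.confidence:.0%} confidence)."


async def good_classifier(entity: dict) -> Classification:
    return Classification("match", 0.95)


async def broken_classifier(entity: dict) -> Classification:
    raise TimeoutError("classifier unavailable")


entity = {"name": "Acme Corp"}
print(prompt_context(asyncio.run(preflight(entity, good_classifier))))
# Pre-flight classification: match (95% confidence).
print(repr(prompt_context(asyncio.run(preflight(entity, broken_classifier)))))
# ''
```

Catching `Exception` rather than `BaseException` still lets task cancellation propagate normally.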

Step 3: Strategy Execution

Each selected model processes the entity using your chosen strategy. When multiple models are selected, they run in parallel across providers (Claude and GPT-4 run simultaneously) while models from the same provider run sequentially to respect rate limits.
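The scheduling rule is concurrent across providers and sequential within a provider. It can be sketched with `asyncio`: group the selected models by provider, run each group as a sequential queue, and gather the queues. The model-to-provider mapping, the model names, and the `run_model` callable are assumptions for illustration.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict, List

RunModel = Callable[[str], Awaitable[dict]]


async def run_models(models: Dict[str, str], run_model: RunModel) -> Dict[str, dict]:
    """models maps model name -> provider name."""
    queues: Dict[str, List[str]] = defaultdict(list)
    for model, provider in models.items():
        queues[provider].append(model)

    async def drain(queue: List[str]) -> Dict[str, dict]:
        results = {}
        for model in queue:  # same provider: one model at a time (rate limits)
            results[model] = await run_model(model)
        return results

    # Different providers: all queues run concurrently.
    merged: Dict[str, dict] = {}
    for results in await asyncio.gather(*(drain(q) for q in queues.values())):
        merged.update(results)
    return merged


log: List[str] = []


async def fake_run(model: str) -> dict:
    log.append(f"start {model}")
    await asyncio.sleep(0.01)  # stands in for an LLM call
    log.append(f"end {model}")
    return {"model": model}


results = asyncio.run(run_models(
    {"claude-a": "anthropic", "claude-b": "anthropic", "gpt-4": "openai"}, fake_run))
```

In the log, `gpt-4` starts before `claude-a` finishes, while `claude-b` waits for `claude-a`.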

Multi-Expertise Example (3 domains)

Split schema by expertise

Properties are grouped by their expertise domain: financial fields, regulatory fields, general fields.

Run parallel LLM calls

Each expertise gets its own focused prompt with only the relevant schema properties. All run simultaneously.

Merge results progressively

As each expertise completes, its output is merged into the accumulated result. You see partial results in real time.

Apply preserve logic

Original values for fields marked as 'preserve' are restored, ensuring your input data stays intact.
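The four steps above can be sketched as one coroutine, under these assumptions:

- Expertise and preserve flags are read from the hypothetical `x-expertise` and `x-preserve` keywords.
- Properties without an expertise fall into `general`.
- `call_llm(domain, entity, sub_schema)` stands in for the real per-expertise prompt.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict, Optional

CallLLM = Callable[[str, dict, dict], Awaitable[dict]]


def split_by_expertise(schema: dict) -> Dict[str, dict]:
    """Step 1: group top-level properties into one sub-schema per domain."""
    groups: Dict[str, dict] = defaultdict(dict)
    for name, prop in schema["properties"].items():
        groups[prop.get("x-expertise", "general")][name] = prop
    return {domain: {"type": "object", "properties": props} for domain, props in groups.items()}


def apply_preserve(result: dict, original: dict, schema: dict) -> dict:
    """Step 4: restore input values for fields flagged as preserved."""
    restored = dict(result)
    for name, prop in schema["properties"].items():
        if prop.get("x-preserve") and name in original:
            restored[name] = original[name]
    return restored


async def enrich(entity: dict, schema: dict, call_llm: CallLLM,
                 on_partial: Optional[Callable[[str, dict], None]] = None) -> dict:
    accumulated: dict = {}

    async def run(domain: str, sub_schema: dict) -> None:
        output = await call_llm(domain, entity, sub_schema)  # Step 2: focused call
        # Step 3: merge only this domain's properties. No lock is needed because
        # nothing awaits between the merge and the callback.
        accumulated.update({k: v for k, v in output.items() if k in sub_schema["properties"]})
        if on_partial:
            on_partial(domain, dict(accumulated))

    # All domain calls run concurrently.
    await asyncio.gather(*(run(d, s) for d, s in split_by_expertise(schema).items()))
    return apply_preserve(accumulated, entity, schema)


schema = {"type": "object", "properties": {
    "name": {"type": "string", "x-preserve": True},
    "revenue": {"type": "number", "x-expertise": "financial"},
    "license": {"type": "string", "x-expertise": "regulatory"},
}}


async def fake_llm(domain: str, entity: dict, sub_schema: dict) -> dict:
    return {k: f"{domain} value" for k in sub_schema["properties"]}


seen = []
result = asyncio.run(enrich({"name": "Acme Corp"}, schema, fake_llm,
                            on_partial=lambda domain, partial: seen.append(domain)))
```

Filtering each output to its own sub-schema keeps one domain from overwriting another domain's fields.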

Step 4: Validation and Self-Correction

Each LLM response is validated against your schema in real time. When the output does not match the expected types or constraints, the system automatically sends the validation errors back to the LLM for correction.

What gets corrected automatically:

String instead of number: "42.2" becomes 42.2
Array returned as an indexed object: {"0": "a", "1": "b"} becomes ["a", "b"]
String nulls: "null" or "None" becomes an actual null
Missing required fields: the error is sent back and the LLM fills them in
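The first three repairs are mechanical enough to apply before validation. The sketch below does that by walking each value alongside its schema. `coerce` is a hypothetical helper, and it is an assumption that these fixes happen locally rather than through the retry loop. Missing required fields cannot be repaired this way, which is why they go back to the LLM.

```python
from typing import Any

STRING_NULLS = {"null", "None"}


def coerce(value: Any, schema: dict) -> Any:
    """Repair common LLM type slips; anything else is left for validation to report."""
    if isinstance(value, str) and value.strip() in STRING_NULLS:
        return None  # "null" / "None" -> real null
    kind = schema.get("type")
    if kind in ("number", "integer") and isinstance(value, str):
        try:
            number = float(value)  # "42.2" -> 42.2
        except ValueError:
            return value  # not numeric: let validation report it
        if kind == "integer":
            return int(number) if number.is_integer() else value
        return number
    if kind == "array" and isinstance(value, dict) and value and all(k.isdecimal() for k in value):
        # {"0": "a", "1": "b"} -> ["a", "b"], ordered by numeric key
        value = [value[k] for k in sorted(value, key=int)]
    if kind == "array" and isinstance(value, list):
        return [coerce(item, schema.get("items", {})) for item in value]
    if kind == "object" and isinstance(value, dict):
        props = schema.get("properties", {})
        return {k: coerce(v, props.get(k, {})) for k, v in value.items()}
    return value
```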

Each LLM call gets up to 5 automatic retry attempts. Each retry includes the specific validation error, so the LLM knows exactly what to fix.
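The retry loop itself is short: validate, and if anything is wrong, call again with the error messages. The sketch makes three assumptions:

- "Up to 5 retries" means one initial attempt plus five retries.
- A deliberately minimal top-level validator stands in for a full JSON Schema validator.
- The LLM is modeled as a hypothetical `call_llm(errors)` function that receives the previous attempt's errors.

```python
from typing import Callable, Dict, List, Optional

MAX_RETRIES = 5

TYPE_CHECKS: Dict[str, Callable[[object], bool]] = {
    "string": lambda v: isinstance(v, str),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
    "array": lambda v: isinstance(v, list),
    "object": lambda v: isinstance(v, dict),
}


def validate(output: dict, schema: dict) -> List[str]:
    """Check required fields and top-level types; null is accepted for any field here."""
    errors = [f"missing required field '{name}'"
              for name in schema.get("required", []) if name not in output]
    for name, prop in schema.get("properties", {}).items():
        kind = prop.get("type")
        check = TYPE_CHECKS.get(kind) if isinstance(kind, str) else None
        value = output.get(name)
        if check and value is not None and not check(value):
            errors.append(f"field '{name}' should be {kind}, got {type(value).__name__}")
    return errors


def call_with_self_correction(call_llm: Callable[[Optional[List[str]]], dict],
                              schema: dict) -> dict:
    errors: Optional[List[str]] = None  # no feedback on the first attempt
    for _ in range(1 + MAX_RETRIES):
        output = call_llm(errors)
        errors = validate(output, schema)
        if not errors:
            return output
    raise ValueError(f"still invalid after {MAX_RETRIES} retries: {errors}")


schema = {"type": "object", "required": ["name", "revenue"],
          "properties": {"name": {"type": "string"}, "revenue": {"type": "number"}}}
scripted = iter([{"name": "Acme Corp"},
                 {"name": "Acme Corp", "revenue": "lots"},
                 {"name": "Acme Corp", "revenue": 42.2}])
feedback: List[Optional[List[str]]] = []


def fake_llm(errors: Optional[List[str]]) -> dict:
    feedback.append(errors)  # record what the "LLM" was told
    return next(scripted)


result = call_with_self_correction(fake_llm, schema)
```

Here the scripted "LLM" needs two corrections: one for the missing field and one for the wrong type.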

Enforcing output at the source

Two optional toggles ask the provider to constrain the output before it comes back, so fewer responses need correcting in the first place. Both apply only to models that support them; everything still falls back to the validation-and-retry loop above.

Response schema

Sends your schema on the provider’s native response-schema channel so the JSON is enforced server-side. Off by default; when it is off, capable models use the tool-call channel instead.

Strict structured output

Constrains decoding to the schema (no drift) on whichever structured channel is used. On by default; quietly ignored by models that can’t enforce it.
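The way the two toggles combine with model capabilities can be summarized as a small decision function. Everything here is an assumption for illustration: the `ModelCaps` flags, the channel names, and the prompt-only fallback for models with neither channel. Provider request formats differ and are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCaps:
    """Hypothetical capability flags for a single model."""
    native_response_schema: bool
    tool_calls: bool
    strict_decoding: bool


def plan_structured_output(caps: ModelCaps, response_schema: bool = False,
                           strict: bool = True) -> dict:
    """Pick the structured channel and decide whether strict decoding applies.
    Defaults mirror the documented toggles: response schema off, strict on."""
    if response_schema and caps.native_response_schema:
        channel = "response_schema"  # schema enforced server-side
    elif caps.tool_calls:
        channel = "tool_call"        # the default channel for capable models
    else:
        channel = "prompt_only"      # assumption: schema only described in the prompt
    # Strict is quietly dropped when the model cannot enforce it.
    enforce = strict and caps.strict_decoding and channel != "prompt_only"
    # Whatever is chosen, the response still goes through validation and retries.
    return {"channel": channel, "strict": enforce}


capable = ModelCaps(native_response_schema=True, tool_calls=True, strict_decoding=True)
limited = ModelCaps(native_response_schema=False, tool_calls=True, strict_decoding=False)
```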

Step 5: Real-Time Streaming

Entity Enricher uses Server-Sent Events (SSE) to stream progress in real time. You do not have to wait for all models to complete: results appear progressively as each expertise domain or model finishes.

Event Timeline (example with 2 models, 3 expertises)

0.0s  started: Job begins, 2 models queued
0.1s  classification_started: Pre-flight check begins
0.8s  classification_completed: Entity confirmed as "match" (95%)
0.9s  model_started: Claude and GPT-4 start in parallel
1.2s  expertise_completed: Claude: financial done, partial result streamed
1.5s  expertise_completed: Claude: general done, result updated
1.8s  expertise_completed: Claude: regulatory done, full result ready
1.9s  model_completed: Claude finished with full structured output
2.5s  model_completed: GPT-4 finished with full structured output
2.5s  completed: All models done, stream closes
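A client can follow this timeline by reading the `text/event-stream` response line by line. `event:` and `data:` fields accumulate until a blank line dispatches the event. The parser below implements that subset of the SSE format and ignores `id:` and `retry:`. The event names come from the timeline above. The JSON payload fields (`model`, `partial`, `result`) and the sample values are hypothetical.

```python
import json
from typing import Dict, Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs; an event is dispatched on a blank line."""
    event = "message"  # the SSE default event type
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment, often used as a keep-alive
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if name == "event":
            event = value
        elif name == "data":
            data.append(value)
        # id: and retry: are ignored in this sketch
    # An event without a closing blank line at end of stream is discarded.


def follow_job(lines: Iterable[str]) -> Dict[str, dict]:
    """Keep the latest (partial or final) result per model."""
    latest: Dict[str, dict] = {}
    for event, data in parse_sse(lines):
        payload = json.loads(data)
        if event == "expertise_completed":
            latest.setdefault(payload["model"], {}).update(payload["partial"])
        elif event == "model_completed":
            latest[payload["model"]] = payload["result"]
        elif event == "completed":
            break  # the server closes the stream after this event
    return latest


STREAM = (
    "event: started\n"
    'data: {"models": ["claude", "gpt-4"]}\n'
    "\n"
    ": keep-alive\n"
    "\n"
    "event: expertise_completed\n"
    'data: {"model": "claude", "partial": {"revenue": 42.2}}\n'
    "\n"
    "event: model_completed\n"
    'data: {"model": "claude", "result": {"revenue": 42.2, "license": "example"}}\n'
    "\n"
    "event: completed\n"
    "data: {}\n"
    "\n"
)
latest = follow_job(STREAM.splitlines(keepends=True))
```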

Step 6: Review Results

Each model gets its own result panel showing the structured JSON output, per-expertise progress badges, token usage, cost, and processing time. With the multi-expertise strategy, the expertise badges update in real time as each domain completes.

What you see per model:

Status badge — Waiting, Running, Success, Failed, or Partial
Expertise badges — Colored pills showing per-domain progress (blue = running, green = done, red = failed)
Progressive JSON — Output updates after each expertise completes
Metrics — Processing time, token count, cost in USD
Progress log — Timestamped entries for every event
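The status badge and the expertise badges are related: a model's overall badge can be derived from the states of its expertises. The mapping below is an assumed reading of the list above. In particular, it treats "Partial" as a finished run with both successes and failures. The state names are hypothetical.

```python
from typing import Dict

# Badge colors as described above; "waiting" has no documented color.
BADGE_COLORS = {"running": "blue", "done": "green", "failed": "red"}


def model_status(expertises: Dict[str, str]) -> str:
    """Derive the model badge from per-expertise states:
    'waiting', 'running', 'done', or 'failed'."""
    states = set(expertises.values())
    if not states or states == {"waiting"}:
        return "Waiting"
    if states & {"waiting", "running"}:
        return "Running"  # work has started and something is still unfinished
    if states == {"done"}:
        return "Success"
    if states == {"failed"}:
        return "Failed"
    return "Partial"      # finished with a mix of done and failed


progress = {"financial": "done", "regulatory": "failed", "general": "done"}
print(model_status(progress), [BADGE_COLORS[s] for s in progress.values()])
# Partial ['green', 'red', 'green']
```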

Handling Partial Success

When using the multi-expertise strategy, some expertises may fail while others succeed. Rather than discarding everything, Entity Enricher returns the merged output from successful expertises with a “Partial” status. You can then retry only the failed expertises without re-running the entire enrichment.

Example: If 2 out of 3 expertises succeed, you get structured output covering the successful domains. The failed expertise can be retried, and its results will be merged into the existing output.
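Retrying only the failed expertises amounts to re-running those domains and merging their output into the existing result. The sketch below assumes the per-domain sub-schemas from the original run are still available. `retry_failed`, the `"done"`/`"failed"` state names, and the `call_llm` signature are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

CallLLM = Callable[[str, dict, dict], Awaitable[dict]]


async def retry_failed(entity: dict, sub_schemas: Dict[str, dict], states: Dict[str, str],
                       merged: dict, call_llm: CallLLM) -> Tuple[dict, Dict[str, str], str]:
    """Re-run failed domains in parallel and fold successes into the merged output."""
    failed = [domain for domain, state in states.items() if state == "failed"]
    outcomes = await asyncio.gather(
        *(call_llm(d, entity, sub_schemas[d]) for d in failed), return_exceptions=True)
    merged, states = dict(merged), dict(states)  # leave the caller's copies untouched
    for domain, outcome in zip(failed, outcomes):
        if isinstance(outcome, BaseException):
            continue  # still failed; it can be retried again later
        allowed = sub_schemas[domain]["properties"]
        merged.update({k: v for k, v in outcome.items() if k in allowed})
        states[domain] = "done"
    status = "Success" if all(s == "done" for s in states.values()) else "Partial"
    return merged, states, status


sub_schemas = {
    "financial": {"properties": {"revenue": {"type": "number"}}},
    "regulatory": {"properties": {"license": {"type": "string"}}},
    "general": {"properties": {"name": {"type": "string"}}},
}
states = {"financial": "done", "regulatory": "failed", "general": "done"}
merged = {"revenue": 42.2, "name": "Acme Corp"}
entity = {"name": "Acme Corp"}


async def fixed_llm(domain: str, entity: dict, sub_schema: dict) -> dict:
    return {"license": "example"}


async def still_broken(domain: str, entity: dict, sub_schema: dict) -> dict:
    raise TimeoutError("provider timeout")


new_merged, new_states, status = asyncio.run(
    retry_failed(entity, sub_schemas, states, merged, fixed_llm))
_, still_states, still_status = asyncio.run(
    retry_failed(entity, sub_schemas, states, merged, still_broken))
```

If the retry succeeds, the run is promoted from "Partial" to "Success". If it fails again, the earlier merged output is kept unchanged.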

What Happens Next?

After enrichment completes, your results are saved to the Records page for future reference. If you used multiple models, you can merge the results using Multi-Model Fusion.

Related pages:
Strategies: single-pass vs multi-expertise
Classification: pre-flight entity type verification
Fusion: merge results from multiple models