Core Concepts - Entity Enricher Documentation

Core Concepts

Understand the building blocks of Entity Enricher: schemas, expertise domains, enrichment strategies, and quality controls.

The Core Idea

Entity Enricher bridges the gap between your incomplete data and the vast knowledge embedded in Large Language Models. Think of LLMs as distilled human knowledge — billions of documents, databases, and web pages compressed into queryable neural networks. Entity Enricher provides the interface to extract this knowledge in a structured, reliable format that fits your data model.

Your Data: company names, partial info, missing fields, raw identifiers
→ Schema + LLM: “What do I want to know?”
→ Enriched Data: full profiles, classifications, relationships, structured facts

Three Pillars

1. The Schema: Your Question to the Knowledge Base

A schema is not just a data structure: it is a formalized question you are asking of the collective knowledge of humanity. When you define a schema with properties like companyName, industry, and headquarters, you are essentially asking: “Given a company identifier, tell me its name, what industry it operates in, and where it is headquartered.”

Schema Concept | Purpose
Properties | The specific facts you want to extract
Types | The format you expect (string, number, object, array)
Expertise Domains | Which specialist should answer (pharmaceutical, financial, geographic)
Search Keys | Identifiers that help locate the entity in the knowledge base
Preserve | Fields to pass through unchanged from your input
Multilingual | Fields that should be translated to multiple languages
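
To make the table concrete, here is a minimal sketch of how such a schema might be written down. The layout and the key names ("type", "expertise", "search_key", "preserve", "multilingual") are assumptions chosen for illustration, not Entity Enricher's actual schema format.

```python
# Hypothetical schema layout: the key names below are illustrative only.
company_schema = {
    "companyName": {"type": "string", "search_key": True, "preserve": True},
    "website": {"type": "string", "search_key": True, "preserve": True},
    "industry": {"type": "string", "expertise": "business_classification"},
    "headquarters": {"type": "object", "expertise": "geographic"},
    "description": {"type": "string", "multilingual": True},
}


def fields_with(schema: dict, flag: str) -> list[str]:
    """Return the property names whose definition sets `flag` to a truthy value."""
    return [name for name, spec in schema.items() if spec.get(flag)]


search_keys = fields_with(company_schema, "search_key")
preserved = fields_with(company_schema, "preserve")
translated = fields_with(company_schema, "multilingual")
```

Keeping every concept as a per-property flag means a single pass over the schema can answer questions such as "which fields identify the entity?" or "which fields need translation?".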

2. The LLM: Queryable Human Knowledge

Large Language Models represent a new kind of knowledge base. Unlike traditional databases that return exact matches on stored records, LLMs understand context, reason about incomplete data, and generalize from patterns.

Entity Enricher treats multiple LLMs as different knowledge perspectives. Each provider brings its own strengths — Claude excels at nuanced reasoning, GPT-4 has broad knowledge, Gemini offers multilingual depth, and local Ollama models keep your data private.

Running the same enrichment across multiple providers lets you compare answers for confidence, aggregate consensus from multiple experts, and balance cost versus quality. Learn more about this in Multi-Model Enrichment.
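
One simple way to aggregate consensus is a majority vote per field. The sketch below assumes each provider returns a plain string for the field; the provider names and answers are made up, and this is not necessarily how Entity Enricher combines results.

```python
from collections import Counter


def consensus(answers: dict[str, str]) -> tuple[str, float]:
    """Return the most common answer and the share of providers that gave it."""
    counts = Counter(answers.values())
    value, votes = counts.most_common(1)[0]
    return value, votes / len(answers)


# Made-up provider outputs for a single field.
industry_by_provider = {
    "provider_a": "Pharmaceutical",
    "provider_b": "Pharmaceutical",
    "provider_c": "Biotechnology",
}
value, agreement = consensus(industry_by_provider)
```

The agreement ratio can serve as a rough confidence signal: a field on which all providers agree is less likely to need review than one where they split.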

3. The Enrichment: Structured Knowledge Extraction

Enrichment is the process of:

  1. Identifying the entity using search keys
  2. Retrieving relevant knowledge from the LLM
  3. Structuring the response according to your schema
  4. Validating that the output matches the expected types
  5. Preserving your original data where specified

Input
{ "name": "Novartis", "website": "novartis.com" }
Extract keys → Query LLM → Validate → Normalize
Output
{ "name": "Novartis", "industry": "Pharmaceutical", "foundedYear": 1996, "headquarters": { "city": "Basel" } }
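
The flow above can be sketched in a few functions. Everything here is a hypothetical stand-in: `query_llm` returns a canned answer instead of calling a provider, and the function names, the `EXPECTED_TYPES` mapping, and the whitespace-trimming normalization are assumptions made for illustration.

```python
EXPECTED_TYPES = {"name": str, "industry": str, "foundedYear": int, "headquarters": dict}


def extract_keys(record, search_keys):
    """Pick the identifying fields that are present in the input record."""
    return {key: record[key] for key in search_keys if key in record}


def query_llm(keys):
    """Stub standing in for a real provider call; returns a canned answer."""
    return {
        "name": "Novartis AG",
        "industry": " Pharmaceutical ",
        "foundedYear": 1996,
        "headquarters": {"city": "Basel"},
    }


def validate(data):
    """Return a list of type errors; an empty list means the data is valid."""
    return [
        f"{field}: expected {expected.__name__}"
        for field, expected in EXPECTED_TYPES.items()
        if not isinstance(data.get(field), expected)
    ]


def normalize(data):
    """Trim stray whitespace from top-level string values."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in data.items()}


def enrich(record, search_keys, preserve):
    answer = query_llm(extract_keys(record, search_keys))
    errors = validate(answer)
    if errors:
        raise ValueError(errors)
    enriched = {**record, **normalize(answer)}
    # Restore preserved fields so the caller's original values win.
    enriched.update({f: record[f] for f in preserve if f in record})
    return enriched


result = enrich(
    {"name": "Novartis", "website": "novartis.com"},
    search_keys=["name", "website"],
    preserve=["name"],
)
```

Note that the stub answers with "Novartis AG", but because `name` is listed in `preserve`, the result keeps the caller's original "Novartis".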

Expertise Domains: Consulting the Right Specialist

Not all knowledge is equal. A question about drug mechanisms requires different expertise than a question about corporate structure. Expertise domains route schema properties to the right specialist within the LLM, activating the relevant knowledge patterns for each domain.

  • pharmaceutical: Drug names, mechanisms, indications, regulatory status
  • business_classification: Industry codes, company types, market segments
  • geographic: Locations, regions, country-specific information
  • financial: Revenue, market cap, funding rounds
  • temporal: Dates, periods, historical events
  • regulatory: Approvals, licenses, compliance status

When using the multi-expertise strategy, each domain gets its own focused LLM call containing only the schema properties relevant to that domain. Keeping each prompt narrow in this way is intended to improve output quality.
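
As a rough illustration of the routing step, the sketch below groups schema properties by their expertise domain so that each group could become one LLM call. The schema layout and the `group_by_domain` helper are hypothetical.

```python
from collections import defaultdict


def group_by_domain(schema):
    """Map each expertise domain to the list of properties it should answer."""
    groups = defaultdict(list)
    for name, spec in schema.items():
        groups[spec["expertise"]].append(name)
    return dict(groups)


# Illustrative schema: every property is tagged with one expertise domain.
schema = {
    "industry": {"expertise": "business_classification"},
    "headquarters": {"expertise": "geographic"},
    "revenue": {"expertise": "financial"},
    "marketCap": {"expertise": "financial"},
}
calls = group_by_domain(schema)
```

With this grouping, the four properties would be answered by three calls, one per domain, instead of a single call covering everything.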

Quality Controls

Validation and Self-Correction

LLMs can make mistakes. Entity Enricher implements multiple layers of quality control to catch and fix errors automatically:

  1. Type Validation — Ensures output matches schema types (string, number, boolean, etc.)
  2. Expertise Validation — Verifies all expertise domains are defined and contain properties
  3. Self-Correction — When validation fails, errors are sent back to the LLM for automatic correction (up to 5 retries)
  4. Preserve Logic — Original values for preserved fields are restored after enrichment, ensuring data integrity
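
The type validation and self-correction steps can be sketched as a retry loop. This is an assumption about the general shape, not the actual implementation: `flaky_llm` is a stub that fixes its answer once it receives error feedback, and only the limit of 5 retries comes from the text above.

```python
MAX_RETRIES = 5  # retry limit stated in the list above

SCHEMA_TYPES = {"industry": str, "foundedYear": int}


def type_errors(data):
    """Describe every field whose value does not match the schema type."""
    return [
        f"{field} should be {expected.__name__}"
        for field, expected in SCHEMA_TYPES.items()
        if not isinstance(data.get(field), expected)
    ]


def enrich_with_correction(ask_llm):
    """Call ask_llm(feedback) until the output validates or retries run out."""
    feedback = []
    for attempt in range(1 + MAX_RETRIES):  # one initial call plus the retries
        data = ask_llm(feedback)
        feedback = type_errors(data)
        if not feedback:
            return data, attempt
    raise ValueError(f"still invalid after {MAX_RETRIES} retries: {feedback}")


def flaky_llm(feedback):
    """Stub: returns a wrongly typed year until it receives error feedback."""
    year = 1996 if feedback else "1996"
    return {"industry": "Pharmaceutical", "foundedYear": year}


data, retries_used = enrich_with_correction(flaky_llm)
```

Passing the error messages back as `feedback` is the key idea: the model is told exactly which fields failed rather than being asked to start over.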

Search Keys: Anchoring Identity

Key fields anchor the enrichment to one specific entity, which reduces the risk that the LLM hallucinates details about the wrong one. They come in two kinds:

  • Search keys (name, website) — Lookup identifiers that help the LLM find the right entity
  • Merge keys (product_name in arrays) — Deduplication keys for matching array items when merging results from multiple models

The enrichment prompt emphasizes: “You are enriching this specific entity identified by these search keys.”
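
To show what a merge key does, the sketch below combines array items returned by two models, matching them on `product_name`. The `merge_items` helper, its "first non-missing value wins" rule, and the sample data are all assumptions for illustration.

```python
def merge_items(result_lists, merge_key):
    """Merge array items from several models, matching them on merge_key."""
    merged = {}
    for items in result_lists:
        for item in items:
            target = merged.setdefault(item[merge_key], {})
            # First non-missing value wins; later models only fill gaps.
            for field, value in item.items():
                if target.get(field) is None:
                    target[field] = value
    return list(merged.values())


# Made-up outputs from two models for the same entity.
model_a = [{"product_name": "Product A", "indication": "Heart failure"}]
model_b = [
    {"product_name": "Product A", "indication": None, "approved": True},
    {"product_name": "Product B", "indication": "Psoriasis"},
]
products = merge_items([model_a, model_b], "product_name")
```

Without a merge key, the two "Product A" items would appear as duplicates; with it, they collapse into one item that carries fields from both models.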

Pre-flight Classification

Before enrichment begins, an optional pre-flight classification step can verify that the entity actually matches the schema type. This prevents hallucination when entities do not match — for example, enriching “Titan” against a “Planet” schema when Titan is actually a moon.
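
A pre-flight check can be thought of as a gate in front of the enrichment call. In this sketch the classifier is a hard-coded stub standing in for an LLM classification call, and the `preflight` function name is hypothetical.

```python
def preflight(entity_name, schema_type, classify):
    """Return (matches, actual_type) so callers can skip mismatched entities."""
    actual_type = classify(entity_name)
    return actual_type == schema_type, actual_type


def stub_classifier(name):
    """Stand-in for an LLM classification call."""
    return {"Titan": "Moon", "Saturn": "Planet"}.get(name, "Unknown")


ok, actual = preflight("Titan", "Planet", stub_classifier)
```

When `ok` is false, the caller can skip enrichment or report the mismatch instead of asking the LLM to fill a "Planet" schema for a moon.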

Cost Awareness

LLM calls have costs. Entity Enricher tracks token usage, cost per provider, cost per enrichment, and organization-scoped spending. This enables budget monitoring, provider comparison (cost vs. quality), and optimization decisions like using cheaper models for simple fields.
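
Cost per call is typically derived from token counts and per-token prices. The sketch below uses placeholder provider names and placeholder prices in USD per million tokens; none of the figures reflect real provider pricing or Entity Enricher's internal accounting.

```python
# Placeholder prices as (input, output) USD per 1M tokens; not real pricing.
PRICES = {"provider_a": (3.00, 15.00), "provider_b": (0.50, 1.50)}


def cost_usd(provider, input_tokens, output_tokens):
    """Compute the cost of one call from its token usage."""
    input_price, output_price = PRICES[provider]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# The same enrichment run against two providers, with identical token usage.
calls = [("provider_a", 2000, 500), ("provider_b", 2000, 500)]

totals = {}
for provider, input_tokens, output_tokens in calls:
    call_cost = cost_usd(provider, input_tokens, output_tokens)
    totals[provider] = totals.get(provider, 0.0) + call_cost
```

Comparing the per-provider totals for the same workload is the basis for decisions such as routing simple fields to a cheaper model.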

Summary

Component | Conceptual Role
Schema | The question you are asking
LLM Providers | Different knowledge perspectives
Search Keys | Entity identity anchors
Expertise Domains | Specialist routing
Strategies | How to orchestrate LLM calls
Enrichment | Knowledge extraction process
Validation | Quality assurance
Preserve | Data integrity protection
