Understand the building blocks of Entity Enricher: schemas, expertise domains, enrichment strategies, and quality controls.
Entity Enricher bridges the gap between your incomplete data and the vast knowledge embedded in Large Language Models. Think of LLMs as distilled human knowledge — billions of documents, databases, and web pages compressed into queryable neural networks. Entity Enricher provides the interface to extract this knowledge in a structured, reliable format that fits your data model.
A schema is more than a data structure: it is a formalized question you are asking of the collective knowledge of humanity. When you define a schema with properties like companyName, industry, and headquarters, you are essentially asking: “Given a company identifier, tell me its name, what industry it operates in, and where it is headquartered.”
| Schema Concept | Purpose |
|---|---|
| Properties | The specific facts you want to extract |
| Types | The format you expect (string, number, object, array) |
| Expertise Domains | Which specialist should answer (pharmaceutical, financial, geographic) |
| Search Keys | Identifiers that help locate the entity in the knowledge base |
| Preserve | Fields to pass through unchanged from your input |
| Multilingual | Fields that should be translated to multiple languages |
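
To make the table concrete, here is a minimal sketch of how these concepts might sit together in one schema definition. The layout and the flag names (`searchKey`, `preserve`, `expertise`, `multilingual`) are assumptions for illustration and are not Entity Enricher's actual schema format; `properties_with` is a hypothetical helper.

```python
# Hypothetical schema layout; the flag names are illustrative only and are
# not taken from Entity Enricher's real schema format.
company_schema = {
    "properties": {
        "companyName": {"type": "string", "searchKey": True, "preserve": True},
        "industry": {"type": "string", "expertise": "financial"},
        "headquarters": {"type": "object", "expertise": "geographic"},
        "description": {"type": "string", "multilingual": True},
    }
}


def properties_with(schema: dict, flag: str) -> list:
    """Return the names of properties whose spec has a truthy value for `flag`."""
    return [
        name
        for name, spec in schema["properties"].items()
        if spec.get(flag)
    ]


search_keys = properties_with(company_schema, "searchKey")
translated = properties_with(company_schema, "multilingual")
print(search_keys, translated)
```

Keeping every concept as a per-property flag means a single pass over the schema can answer questions such as "which fields identify the entity?" or "which fields need translation?".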
Large Language Models represent a new kind of knowledge base. Unlike traditional databases that return exact matches on stored records, LLMs understand context, reason about incomplete data, and generalize from patterns.
Entity Enricher treats multiple LLMs as different knowledge perspectives. Each provider brings its own strengths — Claude excels at nuanced reasoning, GPT-4 has broad knowledge, Gemini offers multilingual depth, and local Ollama models keep your data private.
Running the same enrichment across multiple providers lets you compare answers for confidence, aggregate consensus from multiple experts, and balance cost versus quality. Learn more about this in Multi-Model Enrichment.
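One simple way to aggregate consensus is a majority vote per field. The sketch below assumes each provider has already returned a value for the same field; the provider names and the `consensus` helper are hypothetical, and the product may aggregate answers differently.

```python
from collections import Counter


def consensus(answers: dict) -> tuple:
    """Majority-vote one field across providers.

    `answers` maps a provider name to that provider's (hashable) value.
    Returns the most common value and the share of providers that gave it.
    """
    counts = Counter(answers.values())
    value, votes = counts.most_common(1)[0]
    return value, votes / len(answers)


# Hypothetical answers for the "industry" field from three providers.
industry_answers = {
    "provider_a": "Pharmaceutical",
    "provider_b": "Pharmaceutical",
    "provider_c": "Biotechnology",
}

value, agreement = consensus(industry_answers)
print(value, round(agreement, 2))
```

The agreement share can serve as a rough confidence signal: a field on which providers disagree is a candidate for manual review.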
Enrichment is a five-step process: identifying the entity using search keys, retrieving relevant knowledge from the LLM, structuring the response according to your schema, validating that the output matches the expected types, and preserving your original data where specified.
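The steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: `fake_llm` stands in for a real provider call and deliberately returns some bad data, and the schema layout (`searchKeys`, `types`, `preserve`) is hypothetical.

```python
# Maps schema type names to the Python types used for validation.
TYPE_MAP = {"string": str, "number": (int, float), "object": dict, "array": list}


def fake_llm(search_keys: dict) -> dict:
    """Stand-in for a real LLM call; returns a canned, partly wrong answer."""
    return {"name": "WRONG", "industry": "Pharmaceutical", "foundedYear": "1996"}


def enrich(record: dict, schema: dict, llm=fake_llm) -> dict:
    # 1. Identify the entity via its search keys.
    search_keys = {k: record[k] for k in schema["searchKeys"] if k in record}
    # 2. Retrieve knowledge from the LLM.
    raw = llm(search_keys)
    # 3 and 4. Keep only schema fields whose values match the expected type.
    result = {}
    for field, type_name in schema["types"].items():
        value = raw.get(field)
        if isinstance(value, TYPE_MAP[type_name]):
            result[field] = value
    # 5. Preserved fields always come from the original input.
    for field in schema["preserve"]:
        if field in record:
            result[field] = record[field]
    return result


schema = {
    "searchKeys": ["name", "website"],
    "preserve": ["name", "website"],
    "types": {"name": "string", "industry": "string", "foundedYear": "number"},
}
enriched = enrich({"name": "Novartis", "website": "novartis.com"}, schema)
print(enriched)
```

In this run the string `"1996"` fails the `number` check and is dropped, and the preserved `name` from the input overrides the value the stub returned.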
Input:

{ "name": "Novartis", "website": "novartis.com" }

Output:

{ "name": "Novartis", "industry": "Pharmaceutical", "foundedYear": 1996, "headquarters": { "city": "Basel" } }

Not all knowledge is equal. A question about drug mechanisms requires different expertise than a question about corporate structure. Expertise domains route schema properties to the right specialist within the LLM, activating the relevant knowledge patterns for each domain.
When using the multi-expertise strategy, each domain gets its own focused LLM call containing only the schema properties relevant to that domain, which keeps each prompt narrow and tends to improve output quality.
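A rough sketch of that orchestration follows: group properties by domain, issue one call per group, and merge the partial answers. The `expertise` flag, the `"general"` fallback domain, and the `call_llm(domain, entity, fields)` signature are assumptions made for this example.

```python
from collections import defaultdict


def group_by_domain(properties: dict) -> dict:
    """Group property names by their expertise domain."""
    groups = defaultdict(list)
    for name, spec in properties.items():
        groups[spec.get("expertise", "general")].append(name)
    return dict(groups)


def enrich_multi_expertise(entity: dict, properties: dict, call_llm) -> dict:
    """Issue one focused call per domain and merge the partial answers."""
    merged = {}
    for domain, fields in group_by_domain(properties).items():
        merged.update(call_llm(domain, entity, fields))
    return merged


# Stub provider that records each call instead of contacting a model.
calls = []


def stub_llm(domain: str, entity: dict, fields: list) -> dict:
    calls.append((domain, tuple(fields)))
    return {field: f"<{domain} answer>" for field in fields}


props = {
    "industry": {"expertise": "financial"},
    "revenue": {"expertise": "financial"},
    "headquarters": {"expertise": "geographic"},
    "description": {},
}
result = enrich_multi_expertise({"name": "Novartis"}, props, stub_llm)
print(calls)
```

Four properties across three domains produce three calls here, each seeing only its own fields.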
LLMs can make mistakes. Entity Enricher implements multiple layers of quality control to catch and fix errors automatically:
Search keys prevent the LLM from hallucinating about the wrong entity. They serve two roles:
The enrichment prompt emphasizes: “You are enriching this specific entity identified by these search keys.”
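As an illustration, a prompt builder could place the search keys directly under that instruction. Only the quoted sentence comes from the text above; the rest of the prompt layout and the `build_prompt` helper are assumptions.

```python
def build_prompt(search_keys: dict, fields: list) -> str:
    """Build an enrichment prompt anchored on the entity's search keys."""
    key_lines = "\n".join(f"- {key}: {value}" for key, value in search_keys.items())
    return (
        "You are enriching this specific entity identified by these search keys.\n"
        f"{key_lines}\n"
        f"Return only these fields: {', '.join(fields)}"
    )


prompt = build_prompt(
    {"name": "Novartis", "website": "novartis.com"},
    ["industry", "headquarters"],
)
print(prompt)
```

Listing more than one key (a name plus a website, for example) narrows the match when a name alone is ambiguous.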
Before enrichment begins, an optional pre-flight classification step can verify that the entity actually matches the schema type. This prevents hallucination when entities do not match — for example, enriching “Titan” against a “Planet” schema when Titan is actually a moon.
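The gate itself can be very small. In this sketch `classify` is any callable that returns a type label for an entity name; `stub_classifier` is a hypothetical stand-in with canned labels rather than a real LLM call.

```python
def preflight(entity_name: str, schema_type: str, classify) -> bool:
    """Return True only if the classifier's label matches the schema type."""
    label = classify(entity_name)
    return label.strip().lower() == schema_type.strip().lower()


def stub_classifier(name: str) -> str:
    """Stand-in for an LLM classification call; labels are canned."""
    return {"Titan": "Moon", "Saturn": "Planet"}.get(name, "Unknown")


for name in ("Titan", "Saturn"):
    if preflight(name, "Planet", stub_classifier):
        print(f"{name}: proceed with enrichment")
    else:
        print(f"{name}: skipped, does not match the Planet schema")
```

Skipping a mismatched entity also avoids paying for an enrichment call whose output would be unreliable.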
LLM calls have costs. Entity Enricher tracks token usage, cost per provider, cost per enrichment, and organization-scoped spending. This enables budget monitoring, provider comparison (cost vs. quality), and optimization decisions like using cheaper models for simple fields.
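The arithmetic behind such tracking is straightforward. The sketch below uses made-up prices and a hypothetical call-log format; it only illustrates how token counts roll up into per-provider and per-organization totals.

```python
from collections import defaultdict

# Hypothetical prices in USD per 1,000 tokens as (input, output). Not real rates.
PRICES = {"provider_a": (0.003, 0.015), "provider_b": (0.0005, 0.0015)}


def call_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Cost of one LLM call from its token usage."""
    price_in, price_out = PRICES[provider]
    return input_tokens / 1000 * price_in + output_tokens / 1000 * price_out


def spend_by(calls: list, key: str) -> dict:
    """Total spend grouped by a field of the call log, such as 'org' or 'provider'."""
    totals = defaultdict(float)
    for call in calls:
        totals[call[key]] += call_cost(
            call["provider"], call["input_tokens"], call["output_tokens"]
        )
    return dict(totals)


call_log = [
    {"org": "acme", "provider": "provider_a", "input_tokens": 1000, "output_tokens": 1000},
    {"org": "acme", "provider": "provider_b", "input_tokens": 2000, "output_tokens": 1000},
]
per_provider = spend_by(call_log, "provider")
per_org = spend_by(call_log, "org")
print(per_provider, per_org)
```

Grouping the same log by different keys gives both the provider comparison and the organization-scoped view without storing separate counters.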
| Component | Conceptual Role |
|---|---|
| Schema | The question you are asking |
| LLM Providers | Different knowledge perspectives |
| Search Keys | Entity identity anchors |
| Expertise Domains | Specialist routing |
| Strategies | How to orchestrate LLM calls |
| Enrichment | Knowledge extraction process |
| Validation | Quality assurance |
| Preserve | Data integrity protection |