Enrich the same kind of entity again and again and you keep re-discovering the same real-world things — the same company, the same drug side-effect, the same person — described with slightly different words each time. A semantic ID is a stable, organization-scoped identifier Entity Enricher assigns to an object from its key fields, so those near-duplicates collapse to one identity you can group, deduplicate, and join on.
An object’s identity is built from its key fields — and there can be one or several. Two examples:
name: It shows up as Headache, Céphalée, and Cephalalgia across runs and languages. One key field, three spellings, one real concept.
name + country: Acme Inc. · United States and Acme Incorporated · United States are the same company, while Acme Inc. · Germany is a different one. The second key disambiguates; that’s why an object can carry more than one.
Plain string matching fails on all of these; a human knows which are the same. Semantic IDs encode that judgement automatically.
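To make the failure of plain string matching concrete, here is a minimal sketch. The helper `key_text` and the " · " separator are assumptions made for illustration, not the platform's actual internals.

```python
def key_text(obj: dict, key_fields: list) -> str:
    """Join the values of an object's key fields into one comparable string.

    Hypothetical helper: the real platform may combine key fields differently.
    """
    return " · ".join(str(obj[field]) for field in key_fields)


companies = [
    {"name": "Acme Inc.", "country": "United States"},
    {"name": "Acme Incorporated", "country": "United States"},
    {"name": "Acme Inc.", "country": "Germany"},
]

keys = [key_text(c, ["name", "country"]) for c in companies]

# Exact comparison treats all three strings as different,
# even though the first two describe the same company.
distinct = set(keys)
```

Here exact matching yields three separate identities where a human would see two, which is the gap semantic IDs are meant to close.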
A string property on an object (named id by default), holding an opaque, stable identifier.
… preserve) field: always a string, never a key, never multilingual, at most one per object.
… manufacturer), or each item in an array (e.g. each side_effect).
After the model returns its result, Entity Enricher resolves each semantic ID in four steps, cheapest first:
… “Acme Inc.” and “Acme Incorporated” land next to each other.
… 0.92, tunable per property), that concept’s ID is reused. Otherwise a brand-new ID is minted and stored for next time.
Threshold trade-off: a higher threshold is stricter (fewer accidental merges); a lower one is looser (more aggressive deduplication). Tune it per property when the default 0.92 over- or under-merges.
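The match-or-mint decision can be sketched roughly as follows. This is an illustration under stated assumptions, not Entity Enricher's implementation: the `ConceptStore` class, the use of cosine similarity, the "greater than or equal" comparison, the UUID-based IDs, and the hand-written toy vectors (standing in for real embeddings) are all hypothetical.

```python
import math
import uuid


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class ConceptStore:
    """Toy in-memory store: an exact-match cache plus known concept vectors."""

    def __init__(self, threshold=0.92):
        self.threshold = threshold
        self.exact = {}    # key text -> concept ID (cheapest lookup)
        self.vectors = {}  # concept ID -> vector of the concept's first sighting

    def resolve(self, key_text, vector):
        # Cheapest first: an identical key text seen before needs no comparison.
        if key_text in self.exact:
            return self.exact[key_text]

        # Otherwise find the most similar known concept.
        best_id, best_sim = None, -1.0
        for concept_id, known in self.vectors.items():
            sim = cosine(vector, known)
            if sim > best_sim:
                best_id, best_sim = concept_id, sim

        if best_id is not None and best_sim >= self.threshold:
            resolved = best_id            # close enough: reuse the existing ID
        else:
            resolved = uuid.uuid4().hex   # nothing close enough: mint a new ID
            self.vectors[resolved] = vector

        self.exact[key_text] = resolved   # remember for next time
        return resolved


store = ConceptStore()  # default threshold 0.92
acme_a = store.resolve("Acme Inc.", [1.0, 0.0, 0.1])
acme_b = store.resolve("Acme Incorporated", [0.98, 0.05, 0.12])
globex = store.resolve("Globex", [0.0, 1.0, 0.0])

# A much stricter threshold keeps the two Acme spellings apart.
strict = ConceptStore(threshold=0.999)
strict_a = strict.resolve("Acme Inc.", [1.0, 0.0, 0.1])
strict_b = strict.resolve("Acme Incorporated", [0.98, 0.05, 0.12])
```

With the default threshold the two Acme spellings share one ID and Globex gets its own; raising the threshold to 0.999 splits the Acme spellings, which illustrates the stricter end of the trade-off.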
Whether an ID is generated depends on whether one is already present in the input for that object. This is what lets you round-trip: enrich once to obtain IDs, then pass a known ID back on later runs to attach new facts to the same identity — cheaper and unambiguous.
If the object you send already carries a semantic ID, it’s treated as a lookup: the ID is kept verbatim, the record is linked to that existing concept, and there is no embedding — no cost, no match-or-mint. You’re telling the platform “this object is already identified in our database.”
If the object has no semantic ID, the platform generates one with the four steps above. That ID becomes the object’s stable identifier in your organization’s database from then on.
A present-but-unrecognizable value (not a real concept ID) is ignored, and an ID is generated instead.
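The three input cases above can be summarized in a short sketch. The function `assign_id`, the `known_ids` set, and the `generate` callback are hypothetical names; in particular, treating "recognizable" as "already present in a set of known concept IDs" is an assumption, since the exact check is not specified here.

```python
def assign_id(obj, known_ids, generate, id_field="id"):
    """Return (semantic_id, how) for one object.

    known_ids: set of concept IDs that already exist (assumed check).
    generate:  callback standing in for the four-step resolution.
    """
    provided = obj.get(id_field)
    if isinstance(provided, str) and provided in known_ids:
        # Lookup: keep the ID verbatim, no embedding, no match-or-mint.
        return provided, "lookup"
    # Missing or unrecognizable: ignore the value and generate an ID.
    return generate(obj), "generated"


known = {"c_123"}


def fake_generate(obj):
    # Placeholder for the real resolution; always returns the same ID here.
    return "c_new"


with_id = assign_id({"id": "c_123", "name": "Acme Inc."}, known, fake_generate)
without_id = assign_id({"name": "Acme Inc."}, known, fake_generate)
bad_id = assign_id({"id": "not-a-concept", "name": "Acme Inc."}, known, fake_generate)
```

The round-trip pattern follows from the first branch: once an ID has been obtained, sending it back skips generation entirely.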
Resolution costs a small amount of embedding usage per enrichment (metered like any model call). The exact-match cache makes repeats free, and input-provided IDs cost nothing.
Resolved IDs appear in the enrichment output JSON (the id field on each object) and in the record detail’s semantic concepts. Use them to:
Fusion reconciles disagreements across models within a single run; semantic IDs reconcile the same entity across runs and time. The two work together.
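Because the same concept carries the same id across runs, grouping or joining records can reduce to a dictionary keyed on that field. A minimal sketch follows; the record shapes and the ID values "c1" and "c2" are made up for illustration, and real IDs are opaque.

```python
from collections import defaultdict


def group_by_id(records, id_field="id"):
    """Group enrichment outputs that share the same semantic ID."""
    groups = defaultdict(list)
    for record in records:
        groups[record[id_field]].append(record)
    return dict(groups)


# Outputs from different runs; spellings differ, resolved IDs agree.
side_effects = [
    {"id": "c1", "name": "Headache", "run": 1},
    {"id": "c1", "name": "Céphalée", "run": 2},
    {"id": "c2", "name": "Nausea", "run": 2},
    {"id": "c1", "name": "Cephalalgia", "run": 3},
]

grouped = group_by_id(side_effects)
names_for_c1 = [r["name"] for r in grouped["c1"]]
```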