Document Attachments - Entity Enricher Documentation

Document Attachments

Attach PDFs, images, audio recordings, Office documents, spreadsheets, slides, and text files to any enrichment, schema generation, sample generation, AI schema edit, or playground request. Files reach the model either as native bytes (for PDF-, vision-, and audio-capable models) or as server-extracted text inlined into the prompt — no manual OCR, transcription, conversion, or chunking required.

Where You Can Attach Documents

Where	How attachments are used
Single enrichment	Per-record attachments alongside JSON input
Batch enrichment	Shared attachments applied to every entity in the batch
Schema generation (guided)	Generate a schema from an example document
Sample JSON generation	Extract a sample entity from a source file
AI schema editing	Refine a schema with natural language plus a reference document
Playground	Free-form custom prompts with attachments

Two Delivery Modes

Each supported MIME type has an admin-configured delivery mode. The mode determines how the file reaches the model.

binary: native bytes

The original bytes are passed to the model as BinaryContent. The model reads the file directly, with no server-side preprocessing.

Requires a model with the matching capability flag (supports_pdf_input for PDFs, supports_vision for images, supports_audio_input for audio). The model picker is automatically filtered to show only compatible models.

inline_text: extracted text

A server-side extractor runs once at upload time and caches the resulting text. On every subsequent LLM call the cached text is inlined into the user prompt.

No model capability required — works with every model. Plain text and Markdown skip the extractor and decode the raw bytes directly.
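The split between the two modes can be sketched as a single dispatch function. This is a minimal Python sketch, not Entity Enricher's code: `Attachment`, `to_user_content`, and the `[Attachment: filename]` framing are hypothetical, and `BinaryContent` is defined locally as a stand-in for whatever type the model client actually uses.

```python
from dataclasses import dataclass
from typing import Literal, Optional, Union

# Local stand-in for the BinaryContent type named above, so the sketch runs
# without dependencies; the real class may have a different shape.
@dataclass
class BinaryContent:
    data: bytes
    media_type: str

@dataclass
class Attachment:
    filename: str
    media_type: str
    mode: Literal["binary", "inline_text"]
    data: bytes
    extracted_text: Optional[str] = None  # cached at upload time (inline_text only)

RAW_DECODE_TYPES = {"text/plain", "text/markdown"}

def to_user_content(att: Attachment) -> Union[BinaryContent, str]:
    """Turn one stored attachment into one part of the model's user content."""
    if att.mode == "binary":
        # Native bytes: the model reads the file itself.
        return BinaryContent(data=att.data, media_type=att.media_type)
    if att.media_type in RAW_DECODE_TYPES:
        # Plain text and Markdown: decode the raw bytes, no extractor involved.
        text = att.data.decode("utf-8", errors="replace")
    else:
        # Everything else: reuse the text the extractor cached at upload time.
        text = att.extracted_text or ""
    # Hypothetical framing that keeps the original filename next to the text.
    return f"[Attachment: {att.filename}]\n{text}"
```

A request with several attachments would map each one through a function like this and append the results to the user content, so binary and inline_text files can travel in the same call.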

Supported Formats

19 formats ship enabled by default. System administrators can flip any format between binary and inline_text mode, change its label, or disable it entirely from Model Management → Document policies.

Format	Extensions	Default mode	Capability / extractor
PDF document	.pdf	binary	`supports_pdf_input`
PNG image	.png	binary	`supports_vision`
JPEG image	.jpg, .jpeg	binary	`supports_vision`
MP3 audio	.mp3	binary	`supports_audio_input`
WAV audio	.wav	binary	`supports_audio_input`
M4A audio	.m4a	binary	`supports_audio_input`
OGG audio	.ogg, .oga	binary	`supports_audio_input`
FLAC audio	.flac	binary	`supports_audio_input`
Plain text	.txt	inline_text	raw decode
Markdown	.md, .markdown	inline_text	raw decode
Word (legacy .doc)	.doc	inline_text	docx2txt
Word (.docx)	.docx	inline_text	python-docx
OpenDocument text	.odt	inline_text	odfpy
Rich Text Format	.rtf	inline_text	striprtf
EPUB ebook	.epub	inline_text	ebooklib
HTML	.html, .htm	inline_text	beautifulsoup
CSV	.csv	inline_text	csv (stdlib)
Spreadsheet (.xlsx)	.xlsx	inline_text	openpyxl
Presentation (.pptx)	.pptx	inline_text	python-pptx
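One way to picture the admin-editable policy table is as a mapping from sniffed MIME type to a small policy record. The sketch below is hypothetical (`FormatPolicy`, `resolve_policy`, `required_capability`, and `set_mode` are illustrative names, and only a few rows are shown); it encodes the rule that a capability flag matters only when the format is delivered as binary.

```python
from dataclasses import dataclass, replace
from typing import Dict, Literal, Optional

Mode = Literal["binary", "inline_text"]

@dataclass(frozen=True)
class FormatPolicy:
    label: str
    mode: Mode
    capability: Optional[str] = None  # model flag needed when delivered as binary
    enabled: bool = True

# A hypothetical slice of the default table, keyed by sniffed MIME type.
DEFAULT_POLICIES: Dict[str, FormatPolicy] = {
    "application/pdf": FormatPolicy("PDF document", "binary", "supports_pdf_input"),
    "image/png": FormatPolicy("PNG image", "binary", "supports_vision"),
    "audio/mpeg": FormatPolicy("MP3 audio", "binary", "supports_audio_input"),
    "text/plain": FormatPolicy("Plain text", "inline_text"),
    "text/markdown": FormatPolicy("Markdown", "inline_text"),
}

def resolve_policy(mime: str, policies: Dict[str, FormatPolicy]) -> FormatPolicy:
    """Look up the policy for a sniffed MIME type; unknown or disabled types are rejected."""
    policy = policies.get(mime)
    if policy is None or not policy.enabled:
        raise ValueError(f"unsupported or disabled format: {mime}")
    return policy

def required_capability(policy: FormatPolicy) -> Optional[str]:
    """Only binary delivery needs a model capability; inline_text never does."""
    return policy.capability if policy.mode == "binary" else None

def set_mode(policies: Dict[str, FormatPolicy], mime: str, mode: Mode) -> Dict[str, FormatPolicy]:
    """Admin action: return a copy of the table with one format switched."""
    updated = dict(policies)
    updated[mime] = replace(updated[mime], mode=mode)
    return updated
```

Keeping the capability on the record even after a switch to inline_text means flipping a format back to binary restores its requirement without re-entering it.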

Limits

Limit	Value	Notes
Per file	10 MB	Uploads above this cap are rejected
Per request	50 MB	Sum of all files in a single upload
File count	No limit	Bounded only by the 50 MB per-request total

Extracted text cap: 500 KB per attachment — longer source documents are truncated when extracted server-side. Extractor timeout: 10s wall-clock per attachment (uploads that exceed the timeout still succeed; the file is stored but its extracted text is empty).
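The size rules above amount to two checks at upload time plus a truncation step after extraction. This Python sketch is illustrative only: the function names are hypothetical, and it assumes "MB" and "KB" are binary units (MiB/KiB) and that the 500 KB cap counts UTF-8 bytes, none of which the text above states.

```python
from typing import List

MB = 1024 * 1024                   # assumption: MB = MiB; the service may use 10**6
MAX_FILE_BYTES = 10 * MB
MAX_REQUEST_BYTES = 50 * MB
MAX_EXTRACTED_BYTES = 500 * 1024   # assumption: KB = KiB, counted as UTF-8 bytes

def check_upload(sizes: List[int]) -> None:
    """Reject a multi-file upload that breaks the per-file or per-request cap."""
    for index, size in enumerate(sizes):
        if size > MAX_FILE_BYTES:
            raise ValueError(f"file #{index} exceeds the 10 MB per-file limit")
    # No file-count limit: only the combined size matters.
    if sum(sizes) > MAX_REQUEST_BYTES:
        raise ValueError("upload exceeds the 50 MB per-request total")

def truncate_extracted(text: str, limit: int = MAX_EXTRACTED_BYTES) -> str:
    """Cap extracted text at `limit` UTF-8 bytes without splitting a character."""
    encoded = text.encode("utf-8")
    if len(encoded) <= limit:
        return text
    # errors="ignore" drops a multi-byte character cut in half at the boundary.
    return encoded[:limit].decode("utf-8", errors="ignore")
```

For example, six 9 MB files pass the per-file check individually but fail together, because 54 MB exceeds the per-request total.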

Lifecycle

Upload

Drag and drop files onto the attachment panel of any supported page, or select them with the panel's file picker. The browser-supplied content type is not trusted: the server sniffs magic bytes and rejects anything outside the allow-list. Each file is hashed (SHA-256) and stored on encrypted block storage.
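Magic-byte sniffing means identifying a file from its first few bytes rather than from the client's Content-Type header. The Python sketch below covers a hypothetical subset of signatures and pairs the check with the SHA-256 hash; it is not the server's actual allow-list. ZIP-based formats (.docx, .xlsx, .pptx, .odt, .epub) share one ZIP signature and need a look inside the container, and text formats such as TXT, Markdown, CSV, and HTML have no fixed signature, so a real sniffer needs more than this.

```python
import hashlib
from typing import Optional, Set, Tuple

# A hypothetical subset of well-known signatures ("magic bytes").
SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"fLaC", "audio/flac"),
    (b"OggS", "audio/ogg"),
]

def sniff_mime(data: bytes) -> Optional[str]:
    """Identify a file from its leading bytes, ignoring any client-sent type."""
    for magic, mime in SIGNATURES:
        if data.startswith(magic):
            return mime
    # WAV is a RIFF container: "RIFF", a 4-byte size, then "WAVE".
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "audio/wav"
    return None

def accept_upload(data: bytes, allow_list: Set[str]) -> Tuple[str, str]:
    """Return (sniffed MIME type, SHA-256 hex digest) or reject the file."""
    mime = sniff_mime(data)
    if mime is None or mime not in allow_list:
        raise ValueError("file type is not on the allow-list")
    return mime, hashlib.sha256(data).hexdigest()
```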

Dedup by content

Identical bytes uploaded twice within the same organization deduplicate to a single stored file. Two different organizations uploading the same file produce two independent rows — no cross-tenant leakage. The dedup key is (organization_id, sha256).
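Keying storage on the pair (organization_id, sha256) is what makes dedup tenant-safe: the same bytes in two organizations hash identically but produce different keys. The in-memory `AttachmentStore` below is a hypothetical Python illustration of that rule, with an org-scoped read added for contrast; a database-backed version would more likely rely on a unique constraint over the two columns so concurrent uploads cannot race.

```python
import hashlib
import uuid
from typing import Dict, Tuple

class AttachmentStore:
    """In-memory sketch of content dedup keyed by (organization_id, sha256)."""

    def __init__(self) -> None:
        self._by_key: Dict[Tuple[str, str], str] = {}
        self._owner: Dict[str, str] = {}
        self._blobs: Dict[str, bytes] = {}

    def put(self, organization_id: str, data: bytes) -> str:
        key = (organization_id, hashlib.sha256(data).hexdigest())
        existing = self._by_key.get(key)
        if existing is not None:
            return existing  # same org, same bytes: reuse the stored file
        attachment_id = str(uuid.uuid4())
        self._by_key[key] = attachment_id
        self._owner[attachment_id] = organization_id
        self._blobs[attachment_id] = data
        return attachment_id

    def get(self, organization_id: str, attachment_id: str) -> bytes:
        """Org-scoped read: another tenant's ID behaves as if it did not exist."""
        if self._owner.get(attachment_id) != organization_id:
            raise KeyError(attachment_id)
        return self._blobs[attachment_id]
```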

Extract once (inline_text mode)

For inline_text formats, the extractor runs at upload time and the resulting text is cached on the attachment row. Subsequent LLM calls reuse the cached text, so there is no re-extraction cost. Binary-mode formats skip this step.
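"Extract once" is a write-time cache: the extractor result is stored on the row during upload and only read afterwards. The Python sketch below is hypothetical (`AttachmentRow`, `on_upload`, and the tab-separated rendering of CSV are illustrative choices) and assumes CSV is configured for inline_text; it uses the stdlib `csv` module, which the format table lists as the CSV extractor.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Dict, Optional

def extract_csv(data: bytes) -> str:
    """Render CSV rows as tab-separated lines (hypothetical output format)."""
    reader = csv.reader(io.StringIO(data.decode("utf-8-sig")))
    return "\n".join("\t".join(row) for row in reader)

def decode_raw(data: bytes) -> str:
    """Plain text and Markdown need no extractor: just decode the bytes."""
    return data.decode("utf-8", errors="replace")

EXTRACTORS: Dict[str, Callable[[bytes], str]] = {
    "text/csv": extract_csv,
    "text/plain": decode_raw,
    "text/markdown": decode_raw,
}

@dataclass
class AttachmentRow:
    mime: str
    mode: str
    data: bytes
    extracted_text: Optional[str] = None

def on_upload(row: AttachmentRow) -> AttachmentRow:
    """Run the extractor once; later LLM calls read row.extracted_text."""
    if row.mode == "inline_text" and row.extracted_text is None:
        row.extracted_text = EXTRACTORS[row.mime](row.data)
    return row
```

Because the text is cached, a record that is retried or re-merged pays the extraction cost only on the original upload.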

Reference by ID in any job

Once uploaded, attachments are passed by ID in subsequent enrichment, schema-generation, or playground requests. Each attachment is added to the model's user content as either native bytes (binary mode) or inlined text (inline_text mode), preserving the original filename.

Persisted on the record

When an enrichment record is saved, the attachment IDs are linked to it. The record detail page lists all attachments with a download button. Records can be re-merged or retried without re-uploading.

Delete when done (optional)

Once you no longer need a file, delete it with DELETE /api/attachments/{id} — a handy post-enrichment cleanup step. Deletion is org-scoped and returns { success, id, filename }.

Attachments can be uploaded and deleted programmatically, not just from the web UI: the n8n connector uploads via native multipart, the Make.com and MCP connectors upload via the base64 JSON route, and any client can use the REST API directly (DELETE /api/attachments/{id} for cleanup).
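For clients calling the REST API directly, the cleanup call is an ordinary HTTP DELETE. This Python sketch uses only the standard library; the base URL is a placeholder and the `Authorization: Bearer` header is an assumption, since the authentication scheme is not described here. The request is built separately from sending it so it can be inspected offline.

```python
import json
import urllib.parse
import urllib.request

def build_delete_request(base_url: str, attachment_id: str, token: str) -> urllib.request.Request:
    """Prepare DELETE /api/attachments/{id}; the Bearer auth header is an assumption."""
    path = "/api/attachments/" + urllib.parse.quote(attachment_id, safe="")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        method="DELETE",
        headers={"Authorization": f"Bearer {token}"},
    )

def delete_attachment(base_url: str, attachment_id: str, token: str) -> dict:
    """Send the request and return the documented { success, id, filename } body."""
    request = build_delete_request(base_url, attachment_id, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

A post-enrichment cleanup loop would call `delete_attachment` for each ID once the record is saved; `urlopen` raises `urllib.error.HTTPError` for non-2xx responses, for example when the ID belongs to another organization.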

Automatic Model Filtering

When you attach a binary file with a capability requirement (PDF, image, or audio), the model picker is filtered to only show models that declare that capability. If you attach multiple files with different requirements, only models satisfying all requirements appear.

The API enforces the same rule: pairing an incompatible model with a binary attachment returns 400 model_lacks_attachment_capability, so integrations that bypass the UI get a clear pre-flight error instead of a provider failure mid-job. Inline-text attachments never impose a requirement.

Attached files	Eligible models
1 PDF	`supports_pdf_input`
1 PNG	`supports_vision`
1 MP3	`supports_audio_input`
1 PDF + 1 PNG	`supports_pdf_input` AND `supports_vision`
1 DOCX (binary mode, no capability)	All models — native byte support is assumed when no capability flag is set
1 TXT or 1 MD (inline_text mode)	All models — text is inlined into the prompt
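The table reduces to set arithmetic: collect the capability flags contributed by the attachments, then keep the models whose declared flags are a superset. The Python sketch below is illustrative; the function names and model names are hypothetical, and `ModelLacksAttachmentCapability` only plays the role of the API's 400 error.

```python
from typing import Dict, Iterable, List, Optional, Set

class ModelLacksAttachmentCapability(Exception):
    """Plays the role of the API's 400 model_lacks_attachment_capability error."""

def required_capabilities(flags: Iterable[Optional[str]]) -> Set[str]:
    """Union of the flags contributed by each attachment.

    A binary attachment contributes its flag; inline_text attachments and
    binary formats without a flag contribute None, i.e. no requirement.
    """
    return {flag for flag in flags if flag}

def eligible_models(models: Dict[str, Set[str]], required: Set[str]) -> List[str]:
    """Models whose declared capabilities cover every requirement (AND, not OR)."""
    return sorted(name for name, caps in models.items() if required <= caps)

def preflight(model: str, models: Dict[str, Set[str]], required: Set[str]) -> None:
    """Fail before the job starts instead of mid-job at the provider."""
    missing = required - models.get(model, set())
    if missing:
        raise ModelLacksAttachmentCapability(sorted(missing))
```

With one PDF and one PNG attached, the required set is {supports_pdf_input, supports_vision}, so a vision-only model is filtered out of the picker and rejected by the pre-flight check.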

Pricing & Token Usage

Attachments are billed as input tokens reported by the model provider — Entity Enricher does not charge a separate per-document fee. The cost depends on the file type and the selected model.

PDFs, images & audio (binary mode)

Consume model-specific input tokens. Anthropic charges around 1700 tokens per PDF page; OpenAI prices vision inputs by tile count; audio-capable models meter audio input in proportion to its duration. Check your model's pricing card in Models & Pricing.

Office docs & spreadsheets (extracted text)

The extracted text consumes input tokens at the standard text rate. Large documents are capped at 500 KB of extracted text — longer content is truncated.
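A back-of-the-envelope estimate helps when sizing a batch. Using the roughly 1700 tokens per page quoted above, a 20-page PDF costs about 20 × 1700 = 34,000 input tokens. For extracted text, the sketch below assumes a rule of thumb of about 4 bytes per token for English text, which is a heuristic and not any provider's tokenizer; under it, a document hitting the 500 KB cap (assumed to mean 512,000 bytes) costs at most about 128,000 tokens. Function names are hypothetical.

```python
PDF_TOKENS_PER_PAGE = 1700        # approximate Anthropic figure quoted above
BYTES_PER_TOKEN = 4               # assumption: rough rule of thumb for English text
MAX_EXTRACTED_BYTES = 500 * 1024  # assumption: the 500 KB cap means 500 KiB

def estimate_pdf_tokens(pages: int) -> int:
    """Binary-mode PDF: roughly a fixed number of tokens per page."""
    return pages * PDF_TOKENS_PER_PAGE

def estimate_text_tokens(extracted_text: str) -> int:
    """Inline-text attachment: apply the cap, then divide by bytes per token."""
    capped = extracted_text.encode("utf-8")[:MAX_EXTRACTED_BYTES]
    return -(-len(capped) // BYTES_PER_TOKEN)  # ceiling division
```

Actual counts come from the provider's usage report and can differ noticeably, especially for non-English text or image-heavy PDF pages.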

Security & Tenancy

MIME allow-list with magic-byte sniffing

The browser-supplied content type is ignored. The server inspects file headers and rejects anything outside the configured allow-list.

Organization-scoped storage

Each file is stored under its owning organization. The download endpoint enforces org membership — there is no path through the API to reach another tenant’s files.

Sandboxed extractors

Each extractor runs with a 10-second wall-clock timeout inside a try/except boundary. A misbehaving file cannot stall or crash the API process.
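A timeout plus an exception boundary can be sketched with the standard library's `concurrent.futures`. This is a hypothetical Python illustration (`run_extractor` is not a documented function): it returns an empty string on timeout or error, matching the "stored but extracted text is empty" behavior described under Limits. A worker thread cannot be forcibly stopped and does not protect against a crash in native code, so stronger isolation would run the extractor in a separate process.

```python
import concurrent.futures
import logging
from typing import Callable

EXTRACTOR_TIMEOUT_S = 10.0

def run_extractor(extract: Callable[[bytes], str], data: bytes,
                  timeout: float = EXTRACTOR_TIMEOUT_S) -> str:
    """Return extracted text, or "" if the extractor fails or times out."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(extract, data)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        logging.warning("extractor timed out after %.1fs", timeout)
        return ""
    except Exception:
        logging.exception("extractor raised; storing empty text")
        return ""
    finally:
        # Don't wait for a stuck worker; the thread keeps running until it returns.
        pool.shutdown(wait=False)
```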

Encrypted at rest

Attachment bytes live on encrypted block storage, mounted into the application container with restricted permissions.

Admin-controlled per-MIME policies

System administrators can disable any format globally, change a format from binary to inline_text (or vice versa), or relabel it. Changes take effect on the next upload of that MIME type.

Enrichment Flow

How attachments fit into the pipeline

Schema Generation