Ingest Documents
Add documents to Kappa Graph so their concepts and relationships become queryable.
Supported formats
| Format | Extension |
|---|---|
| Plain text | .txt |
| Markdown | .md |
| PDF | .pdf |
| Word | .docx |
| Images | .png, .jpg, .jpeg, .gif, .webp, .bmp |
Images are described by the configured vision model before chunking; the resulting text is ingested as a normal document.
How ingestion works
Submitting a document creates an ingestion job. The job passes through these stages:
- pending (analyzing) — cost estimation runs without calling the AI extraction model.
- awaiting_approval — estimates are ready; the job waits for approval.
- approved — approved manually or automatically; the lane manager picks it up.
- processing — documents are chunked (~1000 words, 200-word overlap), each chunk is analyzed by the extraction model, and concepts are upserted to the graph.
- completed — all concepts and relationships are in the graph.
The CLI auto-approves by default. The API does not; use auto_approve=true to skip manual approval when calling the API directly.
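The chunking step in the processing stage can be pictured as a sliding window over the words of a document. The sketch below is for intuition only and is not Kappa Graph's actual chunker: `chunk_words` is a hypothetical helper, and the real implementation may split on sentence or paragraph boundaries rather than raw word counts.

```shell
# Sketch only: split stdin into overlapping word windows, one chunk per output line.
# chunk_words is a hypothetical helper, not part of the kg CLI.
# Usage: chunk_words SIZE OVERLAP < input.txt
chunk_words() {
  awk -v size="$1" -v overlap="$2" '
    { for (i = 1; i <= NF; i++) words[++n] = $i }    # collect every word
    END {
      step = size - overlap                          # distance each window advances
      if (step < 1) {
        print "overlap must be smaller than size" > "/dev/stderr"
        exit 1
      }
      for (first = 1; first <= n; first += step) {
        last = first + size - 1
        if (last > n) last = n
        line = words[first]
        for (j = first + 1; j <= last; j++) line = line " " words[j]
        print line
        if (last == n) break                         # this window reached the final word
      }
    }'
}
```

Under this sketch, `chunk_words 1000 200 < document.txt | wc -l` approximates how many chunks a text file would produce; a 2,500-word file yields three windows, each sharing 200 words with its neighbor.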
Ingest a file
The -o/--ontology flag is required. If the named ontology does not exist, it is created.
Wait for completion (polls until done and streams progress):
Require manual approval (job pauses at awaiting_approval):
Re-ingest after a model or prompt update (bypasses duplicate detection):
Ingest a directory
Pass -r/--recurse to scan subdirectories, with --depth <n> or --depth all to control how deep the scan goes. Pass --directories-as-ontologies to create one ontology per subdirectory automatically.
Ingest raw text
Useful for piped output or programmatically generated content. Chunking behavior is identical to file ingestion.
Check job status
Use the web interface
- Navigate to Ingest in the top menu.
- Upload files using the file picker or drag-and-drop.
- Select an ontology (or type a new name to create one).
- Review the cost estimate — tokens and approximate cost appear before processing starts.
- Click Approve to begin processing.
- Monitor progress in the Jobs view.
Use the API
# File upload
curl -X POST "http://localhost:8000/ingest" \
  -H "Authorization: Bearer $TOKEN" \
  -F "file=@document.pdf" \
  -F "ontology=research" \
  -F "auto_approve=true"

# Raw text
curl -X POST "http://localhost:8000/ingest/text" \
  -H "Authorization: Bearer $TOKEN" \
  -F "text=Some text content" \
  -F "ontology=notes" \
  -F "auto_approve=true"
Both endpoints return a job_id. Poll GET /jobs/<job_id> for status or stream progress with GET /jobs/<job_id>/stream.
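A shell loop is one way to wait on a job through the polling endpoint. The sketch below assumes the response from GET /jobs/<job_id> is JSON with a top-level `status` field holding one of the stage names listed earlier, and that `jq` is installed. Neither is confirmed on this page, so adjust the filter to the actual response shape. `fetch_status` and `wait_for_job` are hypothetical helper names.

```shell
# Sketch only: fetch_status and wait_for_job are hypothetical helpers.
# Assumes the job endpoint returns JSON with a top-level "status" field.
BASE_URL="${BASE_URL:-http://localhost:8000}"

fetch_status() {
  body=$(curl -fsS -H "Authorization: Bearer $TOKEN" "$BASE_URL/jobs/$1") || return 1
  printf '%s' "$body" | jq -r '.status'
}

# Usage: wait_for_job JOB_ID [MAX_ATTEMPTS] [DELAY_SECONDS]
# Returns 0 once the job reports "completed", 1 if attempts run out,
# 2 if a status request fails.
wait_for_job() {
  job_id=$1
  attempts=${2:-60}
  delay=${3:-5}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    status=$(fetch_status "$job_id") || return 2
    echo "job $job_id: $status" >&2
    [ "$status" = "completed" ] && return 0
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}
```

With `TOKEN` exported, `wait_for_job "$JOB_ID" 120 5` would check every five seconds for up to ten minutes. The attempt limit is there so a job that never completes, for example one still waiting in awaiting_approval, does not block the script forever.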
Manage ontologies
Ontologies are named collections of related knowledge. Use them to separate topics and control access.
# List all ontologies
kg ontology list
# View details (file count, concept count, evidence)
kg ontology info "climate-research"
An ontology in the frozen lifecycle state rejects new ingestion. Set it back to active before ingesting.
Troubleshooting
Job stuck in awaiting_approval
The job was submitted with --no-approve, or through the API without auto_approve=true. Approve it:
Extraction results look wrong
Check which AI extraction provider and model are active:
Different models produce different extraction quality. See Configure AI Providers to switch providers.
Duplicate detected unexpectedly
The system hashes document content (SHA-256) and skips re-ingestion of identical content in the same ontology. Pass --force when ingesting to override this check.
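To check locally whether two files would hash the same, you can compare their SHA-256 digests. This sketch assumes the hash covers the raw file bytes; this page does not say whether Kappa Graph normalizes content first, so treat a match as a strong hint rather than a guarantee. `same_content` is a hypothetical helper, and `sha256sum` is the GNU coreutils tool (on macOS, `shasum -a 256` prints the same format).

```shell
# Sketch only: same_content is a hypothetical helper, not part of the kg CLI.
# Returns 0 when both files have the same SHA-256 digest, 1 when they differ,
# and 2 when either file cannot be read.
same_content() {
  [ -r "$1" ] && [ -r "$2" ] || return 2
  a=$(sha256sum < "$1" | cut -d ' ' -f 1)
  b=$(sha256sum < "$2" | cut -d ' ' -f 1)
  [ "$a" = "$b" ]
}
```

For example, `same_content report.md report-copy.md && echo "identical bytes"` prints the message only when the two files match byte for byte. Files that differ by a single trailing newline produce different digests.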
Memory exhaustion on large documents
Large documents with many chunks can exhaust available memory. Reduce load by splitting the document into smaller files, or lower MAX_CONCURRENT_JOBS in the platform configuration (requires an API restart).
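For plain-text or Markdown sources, the standard `split` utility is one way to break a large file into smaller ones before ingesting. The sketch below is an illustration, not a Kappa Graph feature: `split_document` is a hypothetical helper, and it cuts on line counts, so a piece may end mid-paragraph. It appends `.txt` to each piece on the assumption that ingestion selects a format by file extension.

```shell
# Sketch only: split_document is a hypothetical helper, not part of the kg CLI.
# Splits a text file into pieces of at most LINES_PER_PART lines each.
# Usage: split_document FILE LINES_PER_PART OUTPUT_DIR
split_document() {
  mkdir -p "$3" || return 1
  split -l "$2" "$1" "$3/part_" || return 1
  # split names its outputs part_aa, part_ab, ...; add .txt to the new pieces only
  for f in "$3"/part_*; do
    case "$f" in
      *.txt) ;;
      *) mv "$f" "$f.txt" ;;
    esac
  done
}
```

Running `split_document big-notes.txt 2000 parts` would leave files such as `parts/part_aa.txt` and `parts/part_ab.txt`, which can then be ingested as a directory. Binary formats such as PDF or Word need a format-aware tool instead.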