Configure AI Providers

Kappa Graph uses an LLM to extract concepts and relationships from documents during ingestion. This page covers how to set an extraction provider, store API keys, tune extraction parameters, and switch providers.

Embeddings are configured separately — see Configure Embeddings. For a side-by-side quality comparison of providers, see Compare Extraction Quality.


Supported providers

| Provider | Default model | Notes |
| --- | --- | --- |
| openai | gpt-4o | Default. Requires OpenAI API key. |
| anthropic | claude-sonnet-4-20250514 | Requires Anthropic API key. Anthropic does not provide embeddings; configure a separate embedding provider. |
| ollama | mistral:7b-instruct | Local inference; no API key required. Install Ollama on the host or wire up your own container. |
| openrouter | openai/gpt-4o | Routes to many providers under one API. Set the routed model slug via --model. |

Provider classes live in api/app/lib/ai_providers.py. The model catalog is stored in kg_api.provider_model_catalog (ADR-800).


Set an extraction provider

# Via operator
./operator.sh ai-provider <provider> [--model <model>] [--max-tokens <n>]

# Via kg CLI
kg admin extraction set --provider <provider> --model <model>

Extraction configuration is read from the database at the start of each ingestion job. Changes apply to the next job you submit — no API restart required.


Store API keys

Keys are validated against the provider before being stored encrypted in kg_api.system_api_keys (ADR-405). Plaintext is never returned after storage.

# Via operator (interactive prompt)
./operator.sh api-key <provider>

# Non-interactive
./operator.sh api-key <provider> --key <key>

# Via kg CLI
kg admin keys set <provider>
kg admin keys set <provider> --key <key>

OpenAI key format: sk-...
Anthropic key format: sk-ant-...
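
The two prefixes above overlap: an Anthropic key also begins with sk-. A minimal sketch of a local sanity check before storing a key is shown below. The function name guess_key_provider is hypothetical and not part of kg, and because only these two formats are documented here, a result of "openai" is only a guess (keys from other providers may also start with sk-).

```shell
# Hypothetical helper: guess the provider from a key's prefix.
# Check sk-ant- first, because an Anthropic key also matches the broader sk- pattern.
guess_key_provider() {
  case "$1" in
    sk-ant-*) echo "anthropic" ;;
    sk-*)     echo "openai" ;;
    *)        echo "unknown" ;;
  esac
}

# Usage idea (commented out so the snippet has no side effects):
# [ "$(guess_key_provider "$key")" = "anthropic" ] && kg admin keys set anthropic --key "$key"
```

The provider still validates the key when it is stored; this check only catches a key pasted under the wrong provider name.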

View key status

kg admin keys list

Output shows validation status, a masked key, and last validated time:

  ✓ openai
    Status:         Valid
    Key:            sk-...abc123
    Last Validated: 10/22/2025, 9:11:14 AM

  ⚠ anthropic
    Status:         Invalid
    Error:          Authentication failed: invalid API key

  ○ ollama
    Not configured

Icons: ✓ valid · ⚠ invalid or expired · ○ not configured

Delete a key

kg admin keys delete <provider>

View current extraction configuration

kg admin extraction config
  Provider:      openai
  Model:         gpt-4o
  Vision Support: Yes
  JSON Mode:     Yes
  Max Tokens:    4096

Extraction parameters

kg admin extraction set [OPTIONS]
| Option | Description | Default |
| --- | --- | --- |
| --provider <provider> | openai, anthropic, ollama, openrouter | openai |
| --model <model> | Model name | Provider default |
| --vision / --no-vision | Enable or disable vision API support | true for gpt-4o |
| --json-mode / --no-json-mode | Enable or disable JSON mode | true |
| --max-tokens <n> | Max output tokens (1024–16384) | 16384 |
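
When --max-tokens is set from a script, checking the documented 1024–16384 range first avoids a round trip to the CLI. The sketch below assumes nothing about how kg itself reports an out-of-range value; valid_max_tokens is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if $1 is an integer in the documented range.
valid_max_tokens() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  [ "$1" -ge 1024 ] && [ "$1" -le 16384 ]
}

# Usage idea (commented out so the snippet has no side effects):
# valid_max_tokens "$n" && kg admin extraction set --max-tokens "$n"
```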

For local providers (Ollama and vLLM), additional options are available:

| Option | Description |
| --- | --- |
| --base-url <url> | Base URL for the local inference server (e.g. http://localhost:11434) |
| --temperature <n> | Sampling temperature (0.0–1.0) |
| --top-p <n> | Nucleus sampling threshold (0.0–1.0) |
| --gpu-layers <n> | GPU layers: -1 = auto, 0 = CPU only, >0 = specific count |
| --num-threads <n> | CPU threads for inference |
| --thinking-mode <mode> | off, low, medium, high. Ollama 0.12.x+ reasoning models only |

Common workflows

Set up OpenAI (default)

# 1. Store the API key
kg admin keys set openai

# 2. Verify the key is valid
kg admin keys list

# 3. Check the active config (gpt-4o is the default)
kg admin extraction config

# 4. Run a test ingestion
kg ingest text "The universe is vast and complex." -o "test"
kg job list done -l 1

Switch to Anthropic

# 1. Store the Anthropic key
kg admin keys set anthropic

# 2. If embeddings use OpenAI, store that key too
kg admin keys set openai

# 3. Switch the extraction provider
kg admin extraction set --provider anthropic --model claude-sonnet-4-20250514

# 4. Verify
kg admin extraction config

Switch to Ollama (local inference)

Ollama must be reachable from the API container before ingestion starts. Install Ollama on the host (https://ollama.com/) and run ollama serve, or configure OLLAMA_BASE_URL to point at your Ollama instance.

# 1. Pull a model into your Ollama instance
#    (command depends on how you're running Ollama)
ollama pull mistral:7b-instruct

# 2. Configure Kappa Graph
kg admin extraction set --provider ollama --model mistral:7b-instruct

# 3. Test ingestion (expect 8–30s per chunk, not 2s)
kg ingest file -o "Test" -y test-document.txt

Recommended models by GPU VRAM:

| VRAM | Model | Pull command |
| --- | --- | --- |
| 8–12 GB | mistral:7b-instruct | ollama pull mistral:7b-instruct |
| 16 GB | qwen2.5:14b-instruct | ollama pull qwen2.5:14b-instruct |
| 48+ GB | llama3.1:70b-instruct | ollama pull llama3.1:70b-instruct |
| CPU only | mistral:7b-instruct | ollama pull mistral:7b-instruct |
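
The recommendations above can be sketched as a small lookup for provisioning scripts. The table does not cover sizes between its rows or GPUs under 8 GB, so this sketch assumes the largest row whose VRAM requirement is fully met, falling back to mistral:7b-instruct otherwise. The function name recommend_ollama_model is hypothetical.

```shell
# Hypothetical helper: map GPU VRAM in whole GB (0 for CPU only) to a model name.
# Assumption: sizes between table rows use the largest row they fully satisfy.
recommend_ollama_model() {
  local vram_gb="$1"
  if [ "$vram_gb" -ge 48 ]; then
    echo "llama3.1:70b-instruct"
  elif [ "$vram_gb" -ge 16 ]; then
    echo "qwen2.5:14b-instruct"
  else
    echo "mistral:7b-instruct"
  fi
}

# Usage idea (commented out so the snippet has no side effects):
# ollama pull "$(recommend_ollama_model 24)"
```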

Switch to a cost-optimized model

kg admin extraction set --provider openai --model gpt-4o-mini --max-tokens 2048

Changes take effect on the next submitted job.

Use a reasoning model with thinking mode (Ollama 0.12.x+)

Reasoning models generate an internal chain of thought before producing the final JSON. The system uses only the JSON output; the reasoning trace is logged for debugging.

# Configure with thinking mode
kg admin extraction set \
  --provider ollama \
  --model gpt-oss:20b \
  --thinking-mode medium

Thinking modes:

| Mode | Speed | Max tokens | Use when |
| --- | --- | --- | --- |
| off | Fastest | 4,096 | Simple documents, speed is critical |
| low | Fast | 4,096 | Standard workloads |
| medium | Slower | 12,288 | Technical or philosophical content |
| high | Slowest | 16,384 | Quality is critical, speed is secondary |

Standard models (Mistral, Llama) treat every mode other than off as simply enabled, so low, medium, and high behave identically for them.
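
For scripting, the mode-to-token pairing in the thinking modes table can be captured as a lookup. This page does not state whether Kappa Graph applies these limits automatically when a mode is selected, so treat the sketch as illustrative only. The function name thinking_mode_max_tokens is hypothetical.

```shell
# Hypothetical helper: print the max-token figure listed for a thinking mode.
thinking_mode_max_tokens() {
  case "$1" in
    off|low) echo 4096 ;;
    medium)  echo 12288 ;;
    high)    echo 16384 ;;
    *)       echo "unknown thinking mode: $1" >&2; return 1 ;;
  esac
}
```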


Choosing a provider

| Factor | OpenAI | Anthropic | Ollama |
| --- | --- | --- | --- |
| Setup | Single API key covers extraction and embeddings | Two keys required (extraction + separate embedding provider) | No API key; install Ollama separately |
| Speed | ~2s/chunk | ~2s/chunk | 8–30s/chunk (GPU); ~60s/chunk (CPU only) |
| Cost | Paid per token | Paid per token | Free |
| Privacy | Cloud | Cloud | Local; data stays on your machine |
| Context window | 128K (gpt-4o) | 200K (claude models) | Model-dependent |

Use Ollama if: you have 100+ documents to process, your documents contain sensitive data, or you need offline capability with a GPU available.

Use a cloud provider if: you have fewer than ~10 documents, need maximum speed, or are running CPU-only.
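
To see how the per-chunk speeds translate into wall-clock time, multiply by the number of chunks. The sketch below is a rough estimate that assumes chunks are processed one after another and uses a made-up workload of 500 chunks; estimate_seconds is a hypothetical helper name.

```shell
# Hypothetical helper: total seconds = chunks x seconds per chunk.
estimate_seconds() {
  local chunks="$1" secs_per_chunk="$2"
  echo $(( chunks * secs_per_chunk ))
}

chunks=500   # hypothetical workload size
echo "Cloud (~2s/chunk):       $(estimate_seconds "$chunks" 2)s"
echo "Ollama GPU (8-30s/chunk): $(estimate_seconds "$chunks" 8)s to $(estimate_seconds "$chunks" 30)s"
echo "Ollama CPU (~60s/chunk):  $(estimate_seconds "$chunks" 60)s"
```

For this workload the estimate is about 17 minutes on a cloud provider, roughly 1 to 4 hours on a GPU, and over 8 hours on CPU only.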


Hybrid configuration

Extraction provider and embedding provider are configured independently. Run extraction on Anthropic and embeddings on local sentence-transformers, for example:

kg admin extraction set --provider anthropic --model claude-sonnet-4-20250514
kg admin embedding activate <embedding-profile-id>

Troubleshooting

"No API key configured"

kg admin keys set openai
kg admin keys list     # confirm status is Valid

"API key validation failed"

The key is rejected before storage, so any previously stored key is left unchanged. Check that the key matches the provider's format (sk-... for OpenAI, sk-ant-... for Anthropic) and try again.

"Extraction model not found"

The model name is invalid or not in the catalog. Check the spelling. Valid examples: gpt-4o, gpt-4o-mini, claude-sonnet-4-20250514, mistral:7b-instruct.

"Rate limit exceeded"

Reduce concurrent ingestion jobs, wait for the provider rate limit window to reset, or switch to a model tier with higher limits.

Ingestion still uses old model after config change

Extraction config is read at the start of each job, so any job submitted before the config change will use the old model. Jobs submitted after the change will use the new model. No restart is needed; confirm the new config with kg admin extraction config.

"Cannot connect to Ollama"

Check that Ollama is running and reachable from the API container:

# If Ollama is on the host, check it's listening
curl http://localhost:11434/api/tags

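If OLLAMA_BASE_URL is not set, the API container defaults to http://localhost:11434. Inside a container, localhost normally refers to the container itself, so this default only works when Ollama is reachable at that address from within the container. Set OLLAMA_BASE_URL to the correct address if Ollama is on a different host or port.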
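
A minimal sketch of a reachability check that honors OLLAMA_BASE_URL and falls back to the documented default. It is most informative when run from inside the API container; how you open a shell there depends on your deployment. The function names ollama_tags_url and check_ollama are hypothetical.

```shell
# Hypothetical helper: build the URL for Ollama's /api/tags endpoint,
# falling back to the documented default base URL.
ollama_tags_url() {
  local base="${OLLAMA_BASE_URL:-http://localhost:11434}"
  echo "${base%/}/api/tags"   # strip one trailing slash, if any
}

# Hypothetical helper: succeed if the endpoint answers within 5 seconds
# with a non-error HTTP status.
check_ollama() {
  curl -fsS --max-time 5 "$(ollama_tags_url)" > /dev/null
}

# Usage idea (commented out so the snippet makes no network call):
# check_ollama && echo "Ollama reachable" || echo "Cannot connect to $(ollama_tags_url)"
```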

Out of VRAM (Ollama)

Switch to a smaller model:

ollama pull mistral:7b-instruct
kg admin extraction set --provider ollama --model mistral:7b-instruct