Security Guide
Operational guide for Knowledge Graph System security infrastructure
This guide explains the security architecture and how to manage sensitive credentials in the knowledge graph system. Learn how to store LLM API keys securely, understand the defense-in-depth approach, and follow security best practices.
Table of Contents
- Security Architecture Overview
- Encrypted API Key Storage
- Cold Start: First-Time Setup
- Managing LLM API Keys
- Production Deployment
- Security Model & Threat Boundaries
- Troubleshooting
- Security Best Practices
Security Architecture Overview
The knowledge graph system implements defense-in-depth security with multiple protection layers:
┌─────────────────────────────────────────────────────────────────┐
│ HTTP API Layer                                                   │
│ • Authentication (OAuth 2.0)                                     │
│ • RBAC authorization                                             │
│ • Rate limiting (future)                                         │
└────────────────────┬────────────────────────────────────────────┘
                     ↓
┌─────────────────────────────────────────────────────────────────┐
│ Job Queue Layer                                                  │
│ • PostgreSQL persistence                                         │
│ • Content deduplication (SHA-256)                                │
│ • Job isolation (one job per worker thread)                      │
└────────────────────┬────────────────────────────────────────────┘
                     ↓
┌─────────────────────────────────────────────────────────────────┐
│ Worker Thread Layer                                              │
│ • Thread isolation                                               │
│ • Capability tokens (internal authentication)                    │
│ • Limited module access to key service                           │
└────────────────────┬────────────────────────────────────────────┘
                     ↓
┌─────────────────────────────────────────────────────────────────┐
│ Encrypted Key Service (ADR-031)                                  │
│ • Fernet encryption (AES-128-CBC + HMAC-SHA256)                  │
│ • Keys encrypted at rest in PostgreSQL                           │
│ • Master encryption key in Docker/Podman secrets                 │
│ • Capability token verification                                  │
└─────────────────────────────────────────────────────────────────┘
Why This Matters:
An attacker needs to compromise multiple isolation boundaries to access LLM API keys:
- HTTP Layer → Exploit API endpoint
- Job Queue → Inject malicious job into PostgreSQL
- Worker Thread → Execute code in worker context
- Key Service → Present valid capability token
This layered approach means a single vulnerability doesn't expose your credentials.
Encrypted API Key Storage
What Gets Protected
The system uses LLM APIs for:
- OpenAI - Text embeddings (text-embedding-3-small) and concept extraction (GPT-4)
- Anthropic - Concept extraction (Claude 3.5 Sonnet)
- Future providers - OpenRouter, local Ollama, custom models
These API keys are shard-scoped (one set per deployment) and used by background workers for document ingestion.
How Protection Works
ADR-031: Encrypted API Key Storage
- Encryption at Rest (see the sketch after this list)
  - Keys encrypted with Fernet (AES-128-CBC + HMAC-SHA256)
  - Stored as binary blobs in PostgreSQL (ag_catalog.system_api_keys)
  - Never stored in plaintext
- Master Key Management
  - Master encryption key stored separately from database
  - Production: Docker/Podman secrets (/run/secrets/encryption_master_key)
  - Development: Environment variable or auto-generated temporary key
- Access Control
  - Only authorized worker threads can decrypt keys
  - Capability token verification (configuration-based shared secret)
  - Module allowlist enforcement
- Validation Before Storage
  - API keys tested against provider API before accepting
  - Rejects invalid keys immediately
  - Prevents storing expired or malformed credentials
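For orientation, the encrypt/decrypt round trip looks like the minimal sketch below. It is illustrative only, not the system's actual key-service code, and the example key value is made up.

# Minimal sketch of the Fernet round trip (illustrative, not the actual key service)
from cryptography.fernet import Fernet

master_key = Fernet.generate_key()                # in production, loaded from Docker/Podman secrets
fernet = Fernet(master_key)

ciphertext = fernet.encrypt(b"sk-proj-example")   # stored as BYTEA in ag_catalog.system_api_keys
plaintext = fernet.decrypt(ciphertext)            # performed only inside an authorized worker
assert plaintext == b"sk-proj-example"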
Backward Compatibility
The system maintains full backward compatibility with existing deployments:
Fallback Chain (Priority Order):
1. Encrypted storage (ADR-031) - Tried first
2. Environment variables - OPENAI_API_KEY, ANTHROPIC_API_KEY
3. .env file - Development fallback
Migration is optional - existing .env configurations continue working without changes.
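The priority order can be pictured with a short Python sketch. load_encrypted_key() is a hypothetical stand-in for the ADR-031 key service, and python-dotenv is assumed for the .env fallback; neither name is taken from the actual codebase.

# Sketch of the fallback chain (hypothetical helper names; not the actual implementation)
import os
from dotenv import load_dotenv  # assumed dependency for the .env fallback

def load_encrypted_key(provider: str) -> str | None:
    """Hypothetical stand-in for the ADR-031 encrypted key lookup."""
    return None  # pretend nothing is configured in encrypted storage

def resolve_api_key(provider: str) -> str | None:
    # 1. Encrypted storage (ADR-031) - tried first
    key = load_encrypted_key(provider)
    if key:
        return key
    # 2. Environment variables (OPENAI_API_KEY, ANTHROPIC_API_KEY)
    env_var = f"{provider.upper()}_API_KEY"
    if os.environ.get(env_var):
        return os.environ[env_var]
    # 3. .env file - development fallback
    load_dotenv()
    return os.environ.get(env_var)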
Cold Start: First-Time Setup
When deploying the system for the first time, you have two options:
Option 1: Use Existing .env Configuration (Legacy)
No changes required! The system works exactly as before:
Workers will load keys from environment variables.
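For reference, a typical legacy .env carries the same variables listed in the Quick Reference at the end of this guide:

# .env (legacy mode - loaded by workers at startup)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...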
Option 2: Migrate to Encrypted Storage (Recommended)
Benefits:
- Keys encrypted at rest (protects against database dumps)
- Centralized key rotation via API
- Audit logging of key access
- Supports key-per-provider without environment pollution
Prerequisites:
1. PostgreSQL container running: docker-compose up -d
2. API server running: python -m uvicorn src.api.main:app --host 0.0.0.0 --port 8000
3. Encryption master key configured (auto-generated in development)
Steps:
# 1. Check API health
curl http://localhost:8000/health
# 2. List current key configuration
curl http://localhost:8000/admin/keys
# Response shows which providers are configured:
# [
# {"provider": "openai", "configured": false, "updated_at": null},
# {"provider": "anthropic", "configured": false, "updated_at": null}
# ]
# 3. Store OpenAI key (validates before accepting)
curl -X POST http://localhost:8000/admin/keys/openai \
-F "api_key=sk-proj-..."
# Response on success:
# {
# "status": "success",
# "message": "openai API key configured for this shard",
# "provider": "openai"
# }
# 4. Store Anthropic key (optional)
curl -X POST http://localhost:8000/admin/keys/anthropic \
-F "api_key=sk-ant-..."
# 5. Verify keys are stored
curl http://localhost:8000/admin/keys
# Response now shows:
# [
# {"provider": "openai", "configured": true, "updated_at": "2025-10-13T..."},
# {"provider": "anthropic", "configured": false, "updated_at": null}
# ]
Development Note: In development mode (no ENCRYPTION_KEY set), the system auto-generates a temporary encryption key on startup. This key is regenerated on every restart, so you'll need to re-store API keys after restarting the server.
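Conceptually, the development fallback amounts to the following few lines (a sketch, not the server's actual startup code):

# Development-only fallback sketch: generate a throwaway key if none is configured
import os
from cryptography.fernet import Fernet

master_key = os.environ.get("ENCRYPTION_KEY")
if master_key is None:
    # Temporary key: anything encrypted with it cannot be decrypted after a restart
    master_key = Fernet.generate_key().decode()

fernet = Fernet(master_key)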
Managing LLM API Keys
Store or Rotate a Key
# Store new key (or rotate existing)
curl -X POST http://localhost:8000/admin/keys/openai \
-F "api_key=sk-proj-NEW_KEY_HERE"
What happens:
1. ✅ Key format validation (must start with sk- or sk-ant-)
2. ✅ Live API test (minimal request to provider)
3. ✅ Encryption with Fernet
4. ✅ Storage in PostgreSQL
5. ❌ Rejects invalid, expired, or malformed keys
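As a rough sketch, the live test in step 2 can be as simple as calling the provider's model-list endpoint (the same check suggested in the Troubleshooting section). This is illustrative, not the server's actual implementation:

# Sketch of format + live validation for an OpenAI-style key (illustrative)
import requests  # assumed HTTP client

def validate_openai_key(api_key: str) -> None:
    if not api_key.startswith("sk-"):
        raise ValueError("Invalid OpenAI API key format (must start with 'sk-')")
    resp = requests.get(
        "https://api.openai.com/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=10,
    )
    if resp.status_code != 200:
        raise ValueError(f"API key validation failed: Error code: {resp.status_code}")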
Error responses:
// Invalid format
{
"detail": "Invalid OpenAI API key format (must start with 'sk-')"
}
// API validation failed
{
"detail": "API key validation failed: Error code: 401 - Incorrect API key provided"
}
List Configured Providers
curl http://localhost:8000/admin/keys

Response:
[
{
"provider": "openai",
"configured": true,
"updated_at": "2025-10-13T13:23:45.539554+00:00"
},
{
"provider": "anthropic",
"configured": false,
"updated_at": null
}
]
Security note: Plaintext keys are never returned via API (only configuration status).
Delete a Key
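Remove a stored key with the DELETE endpoint listed in the Quick Reference:

curl -X DELETE http://localhost:8000/admin/keys/openai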
⚠️ Warning: After deletion, any ingestion jobs using this provider will fail until a new key is configured.
Verify Encryption in Database
To confirm keys are encrypted (not plaintext):
docker exec -i knowledge-graph-postgres psql -U admin -d knowledge_graph <<'EOF'
\x
SELECT
provider,
length(encrypted_key) as encrypted_key_length,
substring(encode(encrypted_key, 'base64'), 1, 50) || '...' as encrypted_preview,
updated_at
FROM ag_catalog.system_api_keys;
EOF
Expected output:
-[ RECORD 1 ]--------+---------------------------------------------------
provider | openai
encrypted_key_length | 312
encrypted_preview | Z0FBQUFBQm83UDFoclozMUlVdlVxRmZrRUE2YjdONzd...
updated_at | 2025-10-13 13:23:45.539554+00
The encrypted_key is a binary blob (BYTEA) - not human-readable plaintext.
Production Deployment
Master Encryption Key Management
In production, never use auto-generated temporary keys. Configure a persistent master encryption key.
Option 1: Docker/Podman Secrets (Recommended)
Why this is best:
- Secrets never written to disk in plaintext
- Not visible in container environment
- Works across container orchestration (Docker Swarm, Kubernetes)
- Automatically mounted at /run/secrets/
Setup:
# 1. Generate master encryption key (Fernet-compatible)
python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
# Output: a 44-character URL-safe base64 string
# 2. Create Docker secret
echo "gAAAAABe..." | docker secret create encryption_master_key -
# 3. Update docker-compose.yml to mount secret
services:
api:
secrets:
- encryption_master_key
secrets:
encryption_master_key:
external: true
# 4. Restart services
docker-compose up -d
The API server will automatically load from /run/secrets/encryption_master_key.
Option 2: Environment Variable
For environments without Docker secrets support:
# Generate key
ENCRYPTION_KEY=$(python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")
# Add to .env (NEVER commit to git!)
echo "ENCRYPTION_KEY=$ENCRYPTION_KEY" >> .env
# Or set in container environment
docker run -e ENCRYPTION_KEY="$ENCRYPTION_KEY" ...
Option 3: File Path
For systems using external secret management (Vault, AWS Secrets Manager):
# Write key to secure file
echo "gAAAAABe..." > /secure/path/encryption.key
chmod 600 /secure/path/encryption.key
# Point to file path in environment
export ENCRYPTION_KEY_FILE=/secure/path/encryption.key
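Putting the three options together, the load order can be sketched as follows (illustrative; not the server's actual startup code):

# Sketch of master-key resolution across the three options (illustrative)
import os
from pathlib import Path

def load_master_key() -> str | None:
    # Option 1: Docker/Podman secret mounted at /run/secrets/
    secret = Path("/run/secrets/encryption_master_key")
    if secret.exists():
        return secret.read_text().strip()
    # Option 3: file path provided by external secret management
    key_file = os.environ.get("ENCRYPTION_KEY_FILE")
    if key_file and Path(key_file).exists():
        return Path(key_file).read_text().strip()
    # Option 2: direct environment variable
    return os.environ.get("ENCRYPTION_KEY")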
Internal Service Authentication
Production deployments should configure the internal capability token for worker-to-key-service authentication:
# Generate random token
INTERNAL_SECRET=$(openssl rand -hex 32)
# Add to .env or Docker secrets
echo "INTERNAL_KEY_SERVICE_SECRET=$INTERNAL_SECRET" >> .env
Why this matters: Prevents arbitrary code from accessing encrypted keys. Workers must present this token to decrypt LLM API keys.
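Verification of such a shared secret is typically a constant-time comparison; the sketch below is illustrative and not the system's actual verification code:

# Sketch of capability-token verification with a constant-time compare (illustrative)
import hmac
import os

def verify_service_token(presented_token: str) -> bool:
    expected = os.environ.get("INTERNAL_KEY_SERVICE_SECRET", "")
    return bool(expected) and hmac.compare_digest(presented_token, expected)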
Key Rotation Strategy
Recommended schedule:
- LLM API keys: Rotate every 90 days
- Master encryption key: Rotate every 6-12 months
- Internal service token: Rotate every 6 months
How to rotate LLM keys:
# 1. Generate new key at provider (OpenAI, Anthropic dashboard)
# 2. Store new key via API
curl -X POST http://localhost:8000/admin/keys/openai \
-F "api_key=sk-proj-NEW_KEY"
# 3. Test ingestion with new key
kg ingest file -o "Test" document.txt
# 4. If successful, revoke old key at provider
How to rotate master encryption key:
⚠️ Complex operation - requires decrypting all keys with old master, re-encrypting with new master.
Recommended approach: Use blue-green deployment:
1. Deploy new instance with new master key
2. Manually configure LLM keys in new instance
3. Migrate traffic to new instance
4. Decommission old instance
(Automated re-encryption script is a future enhancement.)
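For reference, the cryptography library's MultiFernet.rotate() handles the decrypt-with-old / re-encrypt-with-new step; a future script might use it roughly like this (a sketch, not a committed design):

# Sketch of re-encrypting one stored key under a new master key (illustrative)
from cryptography.fernet import Fernet, MultiFernet

old_master = Fernet.generate_key()   # stand-ins; real keys come from secrets
new_master = Fernet.generate_key()

ciphertext = Fernet(old_master).encrypt(b"sk-proj-example")

# MultiFernet decrypts with any listed key and re-encrypts with the first one
rotated = MultiFernet([Fernet(new_master), Fernet(old_master)]).rotate(ciphertext)

assert Fernet(new_master).decrypt(rotated) == b"sk-proj-example"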
Security Model & Threat Boundaries
What This Protects Against
✅ Database Dump Exposure
- Attacker gains read access to PostgreSQL
- LLM API keys are encrypted blobs (unusable without master key)

✅ Backup File Theft
- Database backups contain encrypted keys
- Master encryption key stored separately

✅ Accidental Logging
- Keys never logged in plaintext
- Only encrypted representations logged

✅ Unauthorized Internal Access
- Workers require capability token to decrypt keys
- Prevents arbitrary code from reading keys

✅ Key Leakage via API
- GET /admin/keys never returns plaintext keys
- Only configuration status exposed
What This Does NOT Protect Against
❌ Code Execution in Worker Thread
- If an attacker runs code in the ingestion worker, they can read keys
- Mitigation: Defense-in-depth (requires exploiting multiple layers)

❌ Memory Dumps
- Decrypted keys exist in memory during LLM API calls
- Mitigation: Short-lived exposure, process isolation, system hardening

❌ Master Key Compromise
- If the master encryption key is stolen, all LLM keys can be decrypted
- Mitigation: Docker secrets, secure key management, monitoring

❌ Authenticated Admin Access
- An admin with valid credentials can store/rotate keys
- Mitigation: Strong authentication, audit logging, RBAC
Threat Model Summary
Attacker needs to compromise ALL of:
1. HTTP API authentication (bypass OAuth/RBAC)
2. Job queue isolation (inject malicious job)
3. Worker thread execution (run arbitrary code)
4. Capability token (present valid internal secret)
5. Master encryption key (decrypt stored keys)
Risk reduction: Each layer reduces probability of successful attack by an order of magnitude.
Troubleshooting
Problem: API returns "No encryption key configured"
Symptom: Admin key requests fail with a "No encryption key configured" error
Cause: Master encryption key not available
Solution:
# Development: Auto-generated key (restart API)
pkill -f uvicorn
python -m uvicorn src.api.main:app --reload
# Production: Configure persistent key
# See "Production Deployment" section above
Problem: Keys disappear after server restart
Symptom: Need to re-store API keys after every restart
Cause: Using auto-generated temporary encryption key
Why this happens: Temporary key is regenerated on startup, so previously encrypted keys can't be decrypted with the new key.
Solution: Configure persistent master encryption key (see Production Deployment).
Problem: "API key validation failed"
Symptom:
curl -X POST http://localhost:8000/admin/keys/openai \
-F "api_key=sk-..."
# Response:
# {"detail": "API key validation failed: Error code: 401 - Incorrect API key provided"}
Possible causes:
1. Key is invalid or expired
2. Key is for wrong provider (OpenAI key used for Anthropic endpoint)
3. Network issue connecting to provider API
4. Provider API is down
Solution:
# Test key manually with provider
curl https://api.openai.com/v1/models \
-H "Authorization: Bearer sk-..."
# Check provider status
# OpenAI: https://status.openai.com/
# Anthropic: https://status.anthropic.com/
Problem: Ingestion fails with "No API key configured"
Symptom:
kg ingest file -o "Test" document.txt
# Job fails with: "No openai API key configured for this shard"
Cause: No key configured in encrypted storage OR .env
Solution:
# Option 1: Store in encrypted storage
curl -X POST http://localhost:8000/admin/keys/openai \
-F "api_key=sk-..."
# Option 2: Set in .env (legacy mode)
echo "OPENAI_API_KEY=sk-..." >> .env
pkill -f uvicorn                                  # stop the API server
python -m uvicorn src.api.main:app --reload       # restart to load the new env
Problem: "Internal server error storing API key"
Symptom:
curl -X POST http://localhost:8000/admin/keys/openai \
-F "api_key=sk-..."
# Response:
# {"detail": "Internal server error storing API key"}
Check the API server logs for the underlying error.
Common causes:
- PostgreSQL connection failed
- ag_catalog.system_api_keys table doesn't exist
- Encryption key invalid format (not Fernet-compatible)
Solution:
# Test database connection
docker exec knowledge-graph-postgres psql -U admin -d knowledge_graph -c "\dt ag_catalog.*"
# Verify encryption key format
python3 -c "from cryptography.fernet import Fernet; Fernet(b'${ENCRYPTION_KEY}')"
Problem: Can't access keys from worker
Symptom: Worker logs show "Invalid service token" when trying to decrypt keys
Cause: Internal capability token mismatch or not configured
Solution:
# Ensure consistent internal secret across all services
echo "INTERNAL_KEY_SERVICE_SECRET=$(openssl rand -hex 32)" >> .env
# Restart all services
docker-compose restart
pkill -f uvicorn
python -m uvicorn src.api.main:app --reload
Security Best Practices
✅ DO:
Key Management:
- ✅ Use Docker/Podman secrets for master encryption key in production
- ✅ Rotate LLM API keys every 90 days
- ✅ Store master encryption key separately from database backups
- ✅ Use encrypted storage instead of .env in production
- ✅ Test new keys in staging before production deployment
- ✅ Revoke old keys at provider after rotation
Access Control:
- ✅ Configure internal capability token in production
- ✅ Restrict admin endpoint access with authentication (ADR-027)
- ✅ Use RBAC to limit who can manage keys
- ✅ Monitor API logs for suspicious key access patterns

Operations:
- ✅ Document your key rotation schedule
- ✅ Set up alerts for failed API calls (may indicate key issues)
- ✅ Keep backup of master encryption key in secure vault
- ✅ Test key recovery procedures regularly
❌ DON'T:
Key Management:
- ❌ Commit .env to version control (keep it in .gitignore)
- ❌ Store keys in plaintext anywhere (use encrypted storage)
- ❌ Use the same master encryption key across environments
- ❌ Share master encryption key in Slack, email, or chat
- ❌ Leave expired keys configured (revoke after rotation)
Access Control:
- ❌ Expose admin key endpoints without authentication
- ❌ Use auto-generated temporary keys in production
- ❌ Grant admin access to too many users
- ❌ Skip capability token configuration in production

Operations:
- ❌ Forget to test keys after rotation
- ❌ Include master encryption key in database backups
- ❌ Log decrypted keys (only log "key loaded" events)
- ❌ Skip key rotation (set calendar reminders)
Quick Reference
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| GET /admin/keys | GET | List configured providers |
| POST /admin/keys/{provider} | POST | Store/rotate API key |
| DELETE /admin/keys/{provider} | DELETE | Remove API key |
Supported providers: openai, anthropic
Environment Variables
# Master encryption key (production)
ENCRYPTION_KEY=<fernet-key> # Direct key value
ENCRYPTION_KEY_FILE=/path/to/key # File path
# Or: /run/secrets/encryption_master_key (Docker secrets)
# Internal service authentication
INTERNAL_KEY_SERVICE_SECRET=<hex-secret>
# Legacy mode (backward compatible)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
Key Format Requirements
| Provider | Format | Example |
|---|---|---|
| OpenAI | Starts with sk- or sk-proj- | sk-proj-abc123... |
| Anthropic | Starts with sk-ant- | sk-ant-api03-... |
Database Tables
-- Encrypted keys stored in:
ag_catalog.system_api_keys
- provider VARCHAR(50) PRIMARY KEY
- encrypted_key BYTEA NOT NULL
- updated_at TIMESTAMP WITH TIME ZONE
Architecture References
- ADR-031 - Encrypted API key storage design
- ADR-027 - Authentication system
- ADR-028 - Role-based access control
- 01-AUTHENTICATION.md - User authentication guide
Related Guides
- ../../guides/QUICKSTART.md - Initial system setup
- ../02-configuration/01-AI_PROVIDERS.md - Configure LLM providers
- ../05-maintenance/01-BACKUP_RESTORE.md - Database backup security
Security Questions?
For security concerns or vulnerability reports, please file an issue at the project repository with the security label.