Knowledge Graph System Architecture
Overview
The Knowledge Graph System externalizes the latent space of LLMs into a queryable PostgreSQL database. Instead of discarding what an LLM has understood once it finishes processing a document, the system serializes that understanding into a persistent data structure: concepts as nodes, relationships as edges, embeddings as coordinates.
This creates a Large Concept Model, in which concepts are first-class entities that can be queried, traversed, filtered, and reasoned about directly.
Key characteristics:
- Identity by semantic similarity: Concepts merge when embeddings are ≥85% similar
- Truth as geometry: Grounding calculated as vector projection onto polarity axis (ADR-058)
- Emergent vocabulary: Relationship types generated by LLM, consolidated by usage patterns (ADR-052)
- Evidence accumulation: Grounding scores computed at query time from current evidence
See ADR-044 for probabilistic truth, ADR-058 for geometric grounding, and ADR-063 for authenticity signals.
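As a rough illustration of "truth as geometry", the sketch below computes a scalar projection of a vector onto a polarity axis. This section does not say how ADR-058 builds the axis or which vectors it projects, so both are assumptions here: the axis is taken as the difference between the mean of some "supporting" vectors and the mean of some "contradicting" vectors, and the names `polarity_axis` and `grounding` are hypothetical.

```python
import math


def _mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def polarity_axis(positive, negative):
    """Assumed construction: mean(positive) - mean(negative)."""
    pos_mean, neg_mean = _mean(positive), _mean(negative)
    return [p - q for p, q in zip(pos_mean, neg_mean)]


def grounding(vector, axis):
    """Scalar projection of `vector` onto the direction of `axis`."""
    norm = math.sqrt(sum(a * a for a in axis))
    if norm == 0.0:
        return 0.0  # degenerate axis: no polarity information
    return sum(v * a for v, a in zip(vector, axis)) / norm


# Toy 2-dimensional example; real embeddings have far more dimensions.
axis = polarity_axis(positive=[[1.0, 0.0]], negative=[[-1.0, 0.0]])
supported = grounding([0.5, 0.3], axis)
contradicted = grounding([-0.5, 0.3], axis)
```

Under these assumptions, a positive value means the vector leans toward the supporting pole and a negative value toward the contradicting pole. Because the score is recomputed from the vectors each time, it naturally reflects whatever evidence exists at query time.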
Note: Some sections below reference earlier architecture phases. The current system uses PostgreSQL for all storage (graph, jobs, config) via the operator architecture (ADR-061).
System Architecture
┌──────────────────────────────────────────────────────────────┐
│ Document Ingestion │
│ .txt/.pdf files → API Server → Background Jobs → AGE │
└──────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────┐
│ FastAPI Server (Phase 1) │
│ • REST endpoints (ingest, jobs) │
│ • Job queue (in-memory + SQLite) │
│ • Content deduplication (SHA-256) │
│ • Placeholder auth (X-Client-ID, X-API-Key) │
└──────────────────────────────────────────────────────────────┘
↓
┌──────────────────────────────────────────────────────────────┐
│ Apache AGE + PostgreSQL Graph Database │
│ • Concepts (nodes with vector embeddings) │
│ • Instances (evidence quotes) │
│ • Sources (document paragraphs) │
│ • Relationships (IMPLIES, SUPPORTS, CONTRADICTS, etc.) │
└──────────────────────────────────────────────────────────────┘
↓
┌───────┴───────────┐
│ │
┌──────────▼─────┐ ┌──────────▼─────────┐
│ TypeScript │ │ MCP Server │
│ CLI (kg) │ │ (Phase 2) │
│ • Ingest │ │ • Claude Desktop │
│ • Jobs │ │ • Same codebase │
└────────────────┘ └────────────────────┘
Core Components
1. API Server Layer (src/api/)
FastAPI REST Server (Phase 1):
- Routes: Ingestion (POST /ingest), job management (GET/POST /jobs/*)
- Services: Job queue (abstract interface), content hasher (deduplication)
- Workers: Background ingestion processing with progress updates
- Models: Pydantic request/response schemas matching TypeScript client
- Middleware: Placeholder authentication (X-Client-ID, X-API-Key headers)
Job Queue Pattern:
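# Abstract interface for Phase 1 → Phase 2 migration
from abc import ABC, abstractmethod

class JobQueue(ABC):
    @abstractmethod
    def enqueue(self, job_type, job_data) -> str: ...  # returns job_id (assumed to be a string)
    @abstractmethod
    def get_job(self, job_id) -> "JobStatus": ...
    @abstractmethod
    def update_job(self, job_id, updates) -> None: ...

# Phase 1: InMemoryJobQueue (SQLite persistence)
# Phase 2: RedisJobQueue (distributed workers)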
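To make the Phase 1 variant more concrete, here is a minimal sketch of a queue exposing the same three methods, with a dict for fast reads and SQLite as a write-through store. The table layout, the job record fields, and the use of UUID job IDs are all assumptions for illustration, not the project's actual schema.

```python
import json
import sqlite3
import uuid


class InMemoryJobQueue:
    """Dict-backed job store that mirrors every change into SQLite."""

    def __init__(self, db_path=":memory:"):
        self._jobs = {}
        self._db = sqlite3.connect(db_path)
        # Hypothetical schema: one JSON blob per job.
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS jobs (job_id TEXT PRIMARY KEY, data TEXT)"
        )

    def _persist(self, job_id):
        self._db.execute(
            "INSERT OR REPLACE INTO jobs (job_id, data) VALUES (?, ?)",
            (job_id, json.dumps(self._jobs[job_id])),
        )
        self._db.commit()

    def enqueue(self, job_type, job_data):
        job_id = str(uuid.uuid4())
        self._jobs[job_id] = {
            "job_id": job_id,
            "job_type": job_type,
            "job_data": job_data,
            "status": "queued",
            "progress": None,
            "result": None,
        }
        self._persist(job_id)
        return job_id

    def get_job(self, job_id):
        return self._jobs.get(job_id)

    def update_job(self, job_id, updates):
        self._jobs[job_id].update(updates)
        self._persist(job_id)


queue = InMemoryJobQueue()
job_id = queue.enqueue("ingestion", {"filename": "doc.txt", "ontology": "demo"})
queue.update_job(job_id, {"status": "processing"})
```

Keeping callers on these three methods is what would let a Redis-backed queue replace this class later without touching routes or workers.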
Content Deduplication:
- SHA-256 hash of document content + ontology name
- Prevents expensive re-ingestion ($50-100 per document)
- Returns existing job results if already completed
- Force flag to override when intentional
See ADR-012 for detailed design.
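The deduplication check can be sketched with the standard library. How the content and ontology name are actually combined before hashing is not specified here; the separator byte below, and the names `content_hash` and `find_duplicate`, are assumptions.

```python
import hashlib


def content_hash(content: bytes, ontology: str) -> str:
    """SHA-256 over the document bytes plus the ontology name."""
    digest = hashlib.sha256()
    digest.update(content)
    digest.update(b"\x00")  # assumed separator between content and ontology
    digest.update(ontology.encode("utf-8"))
    return digest.hexdigest()


def find_duplicate(completed_jobs: dict, content: bytes, ontology: str, force: bool = False):
    """Return the earlier job result for this content, or None.

    `completed_jobs` maps content hashes to finished job results.
    With force=True the lookup is skipped so ingestion runs again.
    """
    if force:
        return None
    return completed_jobs.get(content_hash(content, ontology))


completed = {content_hash(b"hello world", "demo"): {"job_id": "job-1"}}
hit = find_duplicate(completed, b"hello world", "demo")
miss = find_duplicate(completed, b"hello world", "other-ontology")
forced = find_duplicate(completed, b"hello world", "demo", force=True)
```

Including the ontology name in the hash means the same file can still be ingested into a different ontology without being treated as a duplicate.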
2. AI Provider Layer (src/api/lib/ai_providers.py)
Modular abstraction for LLM providers:
OpenAI Provider:
- Extraction: GPT-4o, GPT-4o-mini, o1-preview, o1-mini
- Embeddings: text-embedding-3-small, text-embedding-3-large
Anthropic Provider:
- Extraction: Claude Sonnet 4.5, Claude 3.5 Sonnet, Claude 3 Opus
- Embeddings: Delegates to OpenAI (Anthropic doesn't provide embeddings)
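The delegation of embeddings can be illustrated with a small sketch. The class and method names below are hypothetical and the providers are stubs that make no network calls; the real interface in ai_providers.py may look different.

```python
from abc import ABC, abstractmethod


class AIProvider(ABC):
    """Hypothetical provider interface: extraction plus embeddings."""

    @abstractmethod
    def extract_concepts(self, text: str) -> dict: ...

    @abstractmethod
    def generate_embedding(self, text: str) -> list: ...


class StubOpenAIProvider(AIProvider):
    """Stand-in for a provider that supports both operations."""

    def extract_concepts(self, text):
        return {"concepts": [], "instances": [], "relationships": []}

    def generate_embedding(self, text):
        # Fake, deterministic vector so the example runs offline.
        return [float(len(text)), 0.0]


class StubAnthropicProvider(AIProvider):
    """Extraction handled locally; embeddings delegated to another provider."""

    def __init__(self, embedding_provider: AIProvider):
        self._embedder = embedding_provider

    def extract_concepts(self, text):
        return {"concepts": [], "instances": [], "relationships": []}

    def generate_embedding(self, text):
        return self._embedder.generate_embedding(text)


openai_provider = StubOpenAIProvider()
anthropic_provider = StubAnthropicProvider(embedding_provider=openai_provider)
vector = anthropic_provider.generate_embedding("linear thinking")
```

Because both providers return embeddings from the same underlying model, concepts extracted by either one stay comparable in the same vector space.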
3. Ingestion Library (src/api/lib/)
Components:
- parser.py - Document parsing (text, PDF, DOCX)
- llm_extractor.py - LLM-based concept extraction
- age_client.py - Apache AGE graph database operations
- chunker.py - Smart document chunking with semantic boundaries
- ingestion.py - Chunk processing and statistics tracking
- checkpoint.py - Ingestion checkpoint management
Flow:
1. API Submission: Client POSTs file → API returns job_id
2. Background Processing: Worker pulls job from queue
3. Parse & Chunk: Document → semantic chunks with overlap
4. For each chunk:
   - Query recent concepts from graph (context)
   - Extract concepts using LLM
   - Generate embeddings
   - Match against existing concepts (vector similarity ≥ 0.85)
   - Upsert to Apache AGE (create/update nodes and relationships)
   - Update job progress (percent, concepts created)
5. Complete: Worker writes final stats to job result
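The "semantic chunks with overlap" step can be approximated as follows. This is only a sketch under simple assumptions: paragraphs (blank-line separated) stand in for semantic boundaries, size is measured in words, and the function name and parameters are hypothetical rather than taken from chunker.py.

```python
def chunk_paragraphs(text, max_words=200, overlap_paragraphs=1):
    """Group paragraphs into chunks, repeating the tail of each chunk in the next."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for paragraph in paragraphs:
        words_so_far = sum(len(p.split()) for p in current)
        if current and words_so_far + len(paragraph.split()) > max_words:
            chunks.append("\n\n".join(current))
            # Carry the last paragraph(s) forward as context for the next chunk.
            current = current[-overlap_paragraphs:] if overlap_paragraphs else []
        current.append(paragraph)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


sample = "A a a\n\nB b b\n\nC c c"
chunks = chunk_paragraphs(sample, max_words=6, overlap_paragraphs=1)
```

In this toy run the middle paragraph appears in both chunks, which is the overlap that gives the LLM continuity across chunk boundaries.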
4. Client Layer (client/)
Unified TypeScript Client (CLI + MCP in one codebase):
Shared Components:
- src/types/ - TypeScript interfaces matching FastAPI Pydantic models
- src/api/client.ts - HTTP client wrapping REST API endpoints
- Configuration: Environment variables (KG_API_URL, KG_CLIENT_ID)
CLI Mode (Phase 1 - Complete):
- Commands: kg health, kg ingest file/text, kg jobs status/list/cancel
- User experience: Color-coded output, progress spinners, duplicate detection
- Installation: Wrapper script (scripts/kg-cli.sh), direct node, or npm link
MCP Server Mode (Phase 2 - Future):
- Entry point detects MCP_SERVER_MODE=true environment variable
- Runs as MCP server for Claude Desktop/Code
- Tools use same API client as CLI
- Claude Desktop config: Node.js + environment variables
See ADR-013 for detailed design.
5. Graph Database (Apache AGE + PostgreSQL)
Node Types:
// Concept - Core knowledge unit
(:Concept {
concept_id: "linear-scanning-system",
label: "Linear scanning system",
embedding: [0.123, ...], // 1536 dims
search_terms: ["linear thinking", "sequential processing"]
})
// Source - Document location
(:Source {
source_id: "watts-doc-1-para-4",
document: "Watts Doc 1",
paragraph: 4,
full_text: "..."
})
// Instance - Evidence quote
(:Instance {
instance_id: "uuid",
quote: "exact verbatim quote"
})
Relationships:
Structural Relationships (link concepts to evidence):
- APPEARS_IN - Concept → Source
- EVIDENCED_BY - Concept → Instance
- FROM_SOURCE - Instance → Source
Concept Relationships (30 semantically sparse types, see ADR-022):
Categories:
- Logical & Truth (4 types): IMPLIES, CONTRADICTS, PRESUPPOSES, EQUIVALENT_TO
- Causal (5 types): CAUSES, ENABLES, PREVENTS, INFLUENCES, RESULTS_FROM
- Structural (5 types): PART_OF, CONTAINS, COMPOSED_OF, SUBSET_OF, INSTANCE_OF
- Evidential (4 types): SUPPORTS, REFUTES, EXEMPLIFIES, MEASURED_BY
- Similarity (4 types): SIMILAR_TO, ANALOGOUS_TO, CONTRASTS_WITH, OPPOSITE_OF
- Temporal (3 types): PRECEDES, CONCURRENT_WITH, EVOLVES_INTO
- Functional (4 types): USED_FOR, REQUIRES, PRODUCES, REGULATES
- Meta (2 types): DEFINED_AS, CATEGORIZED_AS
Edge Properties:
- confidence (float): LLM confidence score (0.0-1.0)
- category (string): Semantic category for query filtering
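The edge properties make it possible to filter relationships at query time. The sketch below builds the SQL wrapper that Apache AGE uses for Cypher queries. The graph name, the stored category values (such as "causal"), and the function name are assumptions; the session is also assumed to have already run AGE's usual setup (LOAD 'age' and a search_path that includes ag_catalog).

```python
import re

# Only plain identifiers are interpolated, since values are embedded in the query text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def edges_by_category_sql(graph: str, category: str, min_confidence: float) -> str:
    """Build an AGE query for concept edges of one category above a confidence floor."""
    if not _IDENTIFIER.match(graph) or not _IDENTIFIER.match(category):
        raise ValueError("graph and category must be plain identifiers")
    confidence = float(min_confidence)
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("min_confidence must be between 0.0 and 1.0")
    return (
        f"SELECT * FROM cypher('{graph}', $$ "
        "MATCH (a:Concept)-[r]->(b:Concept) "
        f"WHERE r.category = '{category}' AND r.confidence >= {confidence} "
        "RETURN a.label, type(r), b.label "
        "$$) AS (from_label agtype, rel_type agtype, to_label agtype);"
    )


sql = edges_by_category_sql("knowledge_graph", "causal", 0.8)
```

The resulting string could be passed to any PostgreSQL client. The strict identifier check is there because this sketch interpolates values directly into the Cypher text.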
6. Legacy Query Interfaces
Legacy MCP Server (mcp-server/):
- Direct Apache AGE database access
- Claude Desktop integration
- Tools: search_concepts, get_concept_details, find_related_concepts, etc.
- Status: Will migrate to unified TypeScript client (Phase 2)
Note: The legacy Python CLI (scripts/cli.py) has been deprecated and removed in favor of the unified TypeScript client (kg command).
Data Flow
Ingestion Flow (Current Architecture)
Client (kg CLI)
↓ POST /ingest (file + ontology)
API Server
├→ Calculate SHA-256 hash
├→ Check for duplicate (hash + ontology)
│ ├→ Duplicate found: return existing job result
│ └→ No duplicate: continue
├→ Create job in SQLite
├→ Enqueue to in-memory queue
├→ Return job_id immediately
└→ Background worker starts
Background Worker
↓ Parse & chunk document
Chunks with context overlap
↓ for each chunk
├→ Query recent concepts from graph (context for LLM)
├→ Extract concepts (LLM)
│ ├→ concepts: [{id, label, search_terms}]
│ ├→ instances: [{concept_id, quote}]
│ └→ relationships: [{from, to, type, confidence}]
│
├→ Generate embeddings (OpenAI)
│
├→ Match existing concepts (vector search)
│ ├→ similarity ≥ 0.85: use existing
│ └→ else: create new
│
├→ Upsert to Apache AGE
│ ├→ CREATE/UPDATE concepts
│ ├→ CREATE instances
│ └→ CREATE relationships
│
└→ Update job progress (SQLite)
└→ percent, chunks_processed, concepts_created
Client polls GET /jobs/{job_id}
↓ every 2 seconds
Job Status Response
├→ status: queued | processing | completed | failed
├→ progress: {percent, chunks_processed, concepts_created}
└→ result: {stats, cost} (if completed)
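The polling step at the end of this flow can be sketched as below. The real client is written in TypeScript; Python is used here only for illustration, and the function names, timeout, and injected `fetch`/`sleep` parameters are assumptions made so the loop can be exercised without a running server.

```python
import json
import time
import urllib.request


def fetch_job(api_url, job_id):
    """GET /jobs/{job_id} and decode the JSON body."""
    with urllib.request.urlopen(f"{api_url}/jobs/{job_id}") as response:
        return json.load(response)


def wait_for_job(job_id, fetch, interval=2.0, timeout=3600.0, sleep=time.sleep):
    """Poll until the job reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch(job_id)
        if job["status"] in ("completed", "failed"):
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
        sleep(interval)


# Offline demonstration with a scripted sequence of responses.
_responses = iter([
    {"status": "queued"},
    {"status": "processing", "progress": {"percent": 50}},
    {"status": "completed", "result": {"stats": {}, "cost": {}}},
])
final = wait_for_job("job-1", fetch=lambda _job_id: next(_responses), sleep=lambda _s: None)
```

Against a live server, `fetch` could be `lambda job_id: fetch_job("http://localhost:8000", job_id)`.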
Query Flow
User Query
↓
Generate embedding
↓
Vector similarity search
↓
MATCH concepts WHERE similarity > threshold
↓
OPTIONAL MATCH related concepts
↓
Return structured results
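The query flow can be mimicked in memory to show its shape. In this sketch the graph is a plain dict of embeddings plus a list of edge tuples, the 0.7 threshold is an arbitrary assumption (the search threshold is not stated here), and the function names are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query_embedding, concepts, edges, threshold=0.7, limit=10):
    """Rank concepts by similarity, then attach outgoing relationships."""
    scored = [(cosine(query_embedding, emb), cid) for cid, emb in concepts.items()]
    hits = sorted((s for s in scored if s[0] > threshold), reverse=True)[:limit]
    results = []
    for similarity, concept_id in hits:
        # Plays the role of OPTIONAL MATCH: may be an empty list.
        related = [(rel, dst) for src, rel, dst in edges if src == concept_id]
        results.append(
            {"concept_id": concept_id, "similarity": similarity, "related": related}
        )
    return results


concepts = {"a": [1.0, 0.0], "b": [0.0, 1.0]}
edges = [("a", "IMPLIES", "b")]
results = search([1.0, 0.0], concepts, edges)
```

In the real system the query embedding would come from the configured embedding model and the matching would run inside the database instead of a Python loop.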
Configuration
Environment Variables
API Server (.env):
# AI Provider Selection
AI_PROVIDER=openai # or "anthropic"
# OpenAI
OPENAI_API_KEY=sk-...
OPENAI_EXTRACTION_MODEL=gpt-4o
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
# Anthropic (optional)
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_EXTRACTION_MODEL=claude-sonnet-4-20250514
# PostgreSQL + Apache AGE
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=postgres
POSTGRES_PASSWORD=password
POSTGRES_DB=knowledge_graph
# Authentication (Phase 1: disabled)
AUTH_ENABLED=false
AUTH_REQUIRE_CLIENT_ID=false
AUTH_API_KEYS= # Comma-separated keys for Phase 2
TypeScript Client:
# API connection
KG_API_URL=http://localhost:8000
KG_CLIENT_ID=my-client
KG_API_KEY= # Optional, for Phase 2
# Mode selection (CLI vs MCP)
MCP_SERVER_MODE=false # or "true" for MCP server mode
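One way the server side might read and validate these variables at startup is sketched below. The `Settings` class, the fallback defaults, and the rule that only the selected provider's key is required are assumptions, not the project's actual loader.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    ai_provider: str
    postgres: dict
    auth_enabled: bool
    api_keys: tuple


def _flag(value):
    """Interpret common truthy strings such as 'true' or '1'."""
    return value.strip().lower() in ("1", "true", "yes")


def load_settings(env=None):
    """Build Settings from a mapping (defaults to os.environ), failing fast on bad input."""
    env = os.environ if env is None else env
    provider = env.get("AI_PROVIDER", "openai").strip().lower()
    if provider not in ("openai", "anthropic"):
        raise ValueError(f"Unsupported AI_PROVIDER: {provider!r}")
    key_var = f"{provider.upper()}_API_KEY"
    if not env.get(key_var):
        raise ValueError(f"{key_var} must be set when AI_PROVIDER={provider}")
    postgres = {
        "host": env.get("POSTGRES_HOST", "localhost"),
        "port": int(env.get("POSTGRES_PORT", "5432")),
        "user": env.get("POSTGRES_USER", "postgres"),
        "password": env.get("POSTGRES_PASSWORD", ""),
        "dbname": env.get("POSTGRES_DB", "knowledge_graph"),
    }
    keys = tuple(
        key.strip() for key in env.get("AUTH_API_KEYS", "").split(",") if key.strip()
    )
    return Settings(provider, postgres, _flag(env.get("AUTH_ENABLED", "false")), keys)


settings = load_settings(
    {"AI_PROVIDER": "openai", "OPENAI_API_KEY": "sk-test", "AUTH_API_KEYS": "a, b"}
)
```

Passing a plain dict instead of relying on `os.environ` keeps the loader easy to test.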
Concept Matching Algorithm
Multi-stage matching to prevent duplicates:
Stage 1: Exact ID Match
- LLM predicted existing concept_id → use it
- Confidence: 100%
Stage 2: Vector Similarity (Primary)
- Embed: label + search_terms
- Cosine similarity search
- Threshold ≥ 0.85 → match
- Confidence: similarity score
Stage 3: Create New
- No match found
- Generate new concept_id (kebab-case)
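The three stages can be sketched over an in-memory index. The candidate dict shape, the `slugify` helper, and the return tuple are assumptions for illustration; in the real system the similarity search would run against the stored concept embeddings.

```python
import math
import re


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def slugify(label):
    """Kebab-case ID derived from a label, e.g. 'Linear Scan' -> 'linear-scan'."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def match_concept(candidate, existing, threshold=0.85):
    """Return (concept_id, confidence, stage) for an extracted concept.

    `candidate` has "label", "embedding", and optionally "concept_id";
    `existing` maps known concept IDs to their embeddings.
    """
    # Stage 1: the LLM named an ID that already exists.
    predicted = candidate.get("concept_id")
    if predicted in existing:
        return predicted, 1.0, "exact_id"
    # Stage 2: best cosine match at or above the threshold.
    best_id, best_score = None, -1.0
    for concept_id, embedding in existing.items():
        score = cosine(candidate["embedding"], embedding)
        if score > best_score:
            best_id, best_score = concept_id, score
    if best_id is not None and best_score >= threshold:
        return best_id, best_score, "vector"
    # Stage 3: nothing close enough, so mint a new ID.
    return slugify(candidate["label"]), None, "new"


existing = {"linear-scanning-system": [1.0, 0.0]}
exact = match_concept({"concept_id": "linear-scanning-system", "label": "x", "embedding": [0.0, 1.0]}, existing)
similar = match_concept({"label": "Linear scan", "embedding": [0.9, 0.1]}, existing)
fresh = match_concept({"label": "Holistic Awareness!", "embedding": [0.0, 1.0]}, existing)
```

Here the second candidate's embedding is nearly parallel to the stored one, so it reuses the existing concept, while the third is orthogonal and becomes a new node.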
Scalability Considerations
Phase 1 (Current)
- API Server: Single FastAPI instance with BackgroundTasks
- Job Queue: In-memory dict + SQLite persistence
- Database: Single PostgreSQL + Apache AGE instance
- Limitations: No distributed workers, no multi-instance API
Phase 2 (Planned)
- Job Queue: Redis-based distributed queue
- Workers: Separate worker processes (can scale horizontally)
- API Server: Multiple instances behind load balancer
- Real-time Updates: WebSocket/SSE for job progress
- Authentication: Full API key validation and rate limiting
Future Enhancements
- PostgreSQL replication for HA
- Apache AGE performance optimization
- Dedicated vector index or database (pgvector, Pinecone, Weaviate)
- Incremental updates (avoid re-processing)
- Caching layer for query results
Security
API Keys
- Stored in .env (gitignored)
- Never committed to version control
- Validated on startup
Database
- PostgreSQL auth required (no anonymous access)
- Apache AGE graph access via PostgreSQL roles
- Local development: simple password
- Production: strong auth + TLS
Testing Strategy
Unit Tests
- AI provider abstraction
- Concept matching logic
- Graph queries
Integration Tests
- End-to-end ingestion
- MCP tool functionality
- CLI commands
Manual Testing
- Use sample Watts documents
- Verify concept extraction quality
- Test relationship accuracy