Knowledge Graph System Architecture

Overview

The Knowledge Graph System externalizes the latent space of LLMs into a queryable PostgreSQL database. Instead of discarding what an LLM infers while processing a document, the system serializes that understanding into a persistent data structure: concepts as nodes, relationships as edges, embeddings as coordinates.

This creates a Large Concept Model, in which concepts are first-class entities that can be queried, traversed, filtered, and reasoned about directly.

Key characteristics:
- Identity by semantic similarity: Concepts merge when their embeddings have a cosine similarity of at least 0.85
- Truth as geometry: Grounding is calculated as a vector projection onto a polarity axis (ADR-058)
- Emergent vocabulary: Relationship types are generated by the LLM and consolidated by usage patterns (ADR-052)
- Evidence accumulation: Grounding scores are computed at query time from current evidence

See ADR-044 for probabilistic truth, ADR-058 for geometric grounding, and ADR-063 for authenticity signals.
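
To make "grounding as vector projection" concrete, here is a minimal sketch of a scalar projection onto an axis. How the polarity axis is actually built is defined in ADR-058 and is not reproduced here; the `polarity_axis` helper (difference between a supporting and a contradicting embedding) and the two-dimensional vectors are assumptions for illustration only.

```python
import math
from typing import List, Sequence


def scalar_projection(vector: Sequence[float], axis: Sequence[float]) -> float:
    """Signed length of `vector` along `axis` (dot product with the unit axis)."""
    norm = math.sqrt(sum(a * a for a in axis))
    if norm == 0.0:
        raise ValueError("polarity axis must be non-zero")
    return sum(v * a for v, a in zip(vector, axis)) / norm


def polarity_axis(positive: Sequence[float], negative: Sequence[float]) -> List[float]:
    """Hypothetical axis: points from a 'contradicting' pole to a 'supporting' pole."""
    return [p - n for p, n in zip(positive, negative)]


# Toy 2-D embeddings; real embeddings have many more dimensions.
axis = polarity_axis([1.0, 0.0], [-1.0, 0.0])
grounding_for = scalar_projection([0.6, 0.8], axis)       # positive: leans toward support
grounding_against = scalar_projection([-0.6, 0.8], axis)  # negative: leans toward contradiction
```

The sign of the projection indicates which pole the evidence leans toward, and the magnitude indicates how strongly.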

Note: Some sections below reference earlier architecture phases. The current system uses PostgreSQL for all storage (graph, jobs, config) via the operator architecture (ADR-061).

System Architecture

┌──────────────────────────────────────────────────────────────┐
│                    Document Ingestion                         │
│  .txt/.pdf files → API Server → Background Jobs → AGE        │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│                   FastAPI Server (Phase 1)                    │
│  • REST endpoints (ingest, jobs)                              │
│  • Job queue (in-memory + SQLite)                             │
│  • Content deduplication (SHA-256)                            │
│  • Placeholder auth (X-Client-ID, X-API-Key)                  │
└──────────────────────────────────────────────────────────────┘
                            ↓
┌──────────────────────────────────────────────────────────────┐
│         Apache AGE + PostgreSQL Graph Database                │
│  • Concepts (nodes with vector embeddings)                    │
│  • Instances (evidence quotes)                                │
│  • Sources (document paragraphs)                              │
│  • Relationships (IMPLIES, SUPPORTS, CONTRADICTS, etc.)       │
└──────────────────────────────────────────────────────────────┘
                            ↓
                    ┌───────┴───────────┐
                    │                   │
         ┌──────────▼─────┐  ┌──────────▼─────────┐
         │  TypeScript    │  │  MCP Server        │
         │  CLI (kg)      │  │  (Phase 2)         │
         │  • Ingest      │  │  • Claude Desktop  │
         │  • Jobs        │  │  • Same codebase   │
         └────────────────┘  └────────────────────┘

Core Components

1. API Server Layer (`src/api/`)

FastAPI REST Server (Phase 1):
- Routes: Ingestion (POST /ingest), job management (GET/POST /jobs/*)
- Services: Job queue (abstract interface), content hasher (deduplication)
- Workers: Background ingestion processing with progress updates
- Models: Pydantic request/response schemas matching TypeScript client
- Middleware: Placeholder authentication (X-Client-ID, X-API-Key headers)

Job Queue Pattern:

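# Abstract interface for Phase 1 → Phase 2 migration
from abc import ABC, abstractmethod

class JobQueue(ABC):
    @abstractmethod
    def enqueue(self, job_type: str, job_data: dict) -> str: ...  # returns job_id

    @abstractmethod
    def get_job(self, job_id: str) -> "JobStatus": ...  # JobStatus: the job status model

    @abstractmethod
    def update_job(self, job_id: str, updates: dict) -> None: ...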

# Phase 1: InMemoryJobQueue (SQLite persistence)
# Phase 2: RedisJobQueue (distributed workers)
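
As a rough illustration of how a concrete queue could satisfy this interface, the sketch below keeps jobs in a plain dict. The class name `DictJobQueue`, the job record fields, and the use of a dict in place of a `JobStatus` model are assumptions; the real Phase 1 queue also persists to SQLite, which is omitted here.

```python
import uuid
from abc import ABC, abstractmethod
from typing import Any, Dict


class JobQueue(ABC):
    """Restated interface so the example is self-contained."""

    @abstractmethod
    def enqueue(self, job_type: str, job_data: Dict[str, Any]) -> str: ...

    @abstractmethod
    def get_job(self, job_id: str) -> Dict[str, Any]: ...

    @abstractmethod
    def update_job(self, job_id: str, updates: Dict[str, Any]) -> None: ...


class DictJobQueue(JobQueue):
    """Illustrative only: jobs live in a dict and are lost on restart."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Dict[str, Any]] = {}

    def enqueue(self, job_type: str, job_data: Dict[str, Any]) -> str:
        job_id = uuid.uuid4().hex
        self._jobs[job_id] = {
            "job_id": job_id,
            "job_type": job_type,
            "job_data": job_data,
            "status": "queued",
        }
        return job_id

    def get_job(self, job_id: str) -> Dict[str, Any]:
        # Return a copy so callers cannot mutate queue state by accident.
        return dict(self._jobs[job_id])

    def update_job(self, job_id: str, updates: Dict[str, Any]) -> None:
        self._jobs[job_id].update(updates)
```

Because callers depend only on `JobQueue`, swapping in a Redis-backed implementation later should not require changes to routes or workers.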

Content Deduplication:
- SHA-256 hash of document content + ontology name
- Prevents expensive re-ingestion ($50-100 per document)
- Returns existing job results if already completed
- Force flag to override when intentional
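
A minimal sketch of the deduplication check, using Python's standard `hashlib`. How the content and ontology name are combined into one key is an assumption (here the ontology name is hashed first, followed by a separator byte), and `content_key` and `CompletedJobs` are hypothetical names.

```python
import hashlib
from typing import Dict, Optional


def content_key(content: bytes, ontology: str) -> str:
    """SHA-256 over the ontology name and document bytes (combination is assumed)."""
    digest = hashlib.sha256()
    digest.update(ontology.encode("utf-8"))
    digest.update(b"\x00")  # separator so ("ab", b"c") and ("a", b"bc") hash differently
    digest.update(content)
    return digest.hexdigest()


class CompletedJobs:
    """Hypothetical lookup from content key to the job that already ingested it."""

    def __init__(self) -> None:
        self._by_key: Dict[str, str] = {}

    def record(self, key: str, job_id: str) -> None:
        self._by_key[key] = job_id

    def find_duplicate(self, key: str, force: bool = False) -> Optional[str]:
        # With force=True the caller intentionally re-ingests, so report no duplicate.
        return None if force else self._by_key.get(key)
```

The same document ingested into a different ontology produces a different key, so it is not treated as a duplicate.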

See ADR-012 for detailed design.

2. AI Provider Layer (`src/api/lib/ai_providers.py`)

Modular abstraction for LLM providers:

OpenAI Provider:
- Extraction: GPT-4o, GPT-4o-mini, o1-preview, o1-mini
- Embeddings: text-embedding-3-small, text-embedding-3-large

Anthropic Provider:
- Extraction: Claude Sonnet 4.5, Claude 3.5 Sonnet, Claude 3 Opus
- Embeddings: Delegates to OpenAI (Anthropic doesn't provide embeddings)
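
The delegation described above can be sketched as a provider that takes its extraction and embedding backends separately. This does not reflect the actual contents of `ai_providers.py`; `AIProvider`, `DelegatingProvider`, and the callable signatures are assumptions, and the backends are stand-in functions rather than real API calls.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List

ExtractFn = Callable[[str], Dict[str, list]]
EmbedFn = Callable[[str], List[float]]


class AIProvider(ABC):
    """Hypothetical provider interface: one method per capability."""

    @abstractmethod
    def extract_concepts(self, text: str) -> Dict[str, list]: ...

    @abstractmethod
    def embed(self, text: str) -> List[float]: ...


class DelegatingProvider(AIProvider):
    """Extraction from one backend, embeddings from another."""

    def __init__(self, extract_fn: ExtractFn, embed_fn: EmbedFn) -> None:
        self._extract_fn = extract_fn
        self._embed_fn = embed_fn

    def extract_concepts(self, text: str) -> Dict[str, list]:
        return self._extract_fn(text)

    def embed(self, text: str) -> List[float]:
        return self._embed_fn(text)


# Stand-in backends; real ones would call the vendor SDKs.
def fake_claude_extract(text: str) -> Dict[str, list]:
    return {"concepts": [{"label": text.title()}], "instances": [], "relationships": []}


def fake_openai_embed(text: str) -> List[float]:
    return [float(len(text)), 0.0]


provider = DelegatingProvider(fake_claude_extract, fake_openai_embed)
```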

3. Ingestion Library (`src/api/lib/`)

Components:
- parser.py - Document parsing (text, PDF, DOCX)
- llm_extractor.py - LLM-based concept extraction
- age_client.py - Apache AGE graph database operations
- chunker.py - Smart document chunking with semantic boundaries
- ingestion.py - Chunk processing and statistics tracking
- checkpoint.py - Ingestion checkpoint management

Flow:
1. API Submission: Client POSTs file → API returns job_id
2. Background Processing: Worker pulls job from queue
3. Parse & Chunk: Document → semantic chunks with overlap
4. For each chunk:
   - Query recent concepts from graph (context)
   - Extract concepts using LLM
   - Generate embeddings
   - Match against existing concepts (vector similarity ≥ 0.85)
   - Upsert to Apache AGE (create/update nodes and relationships)
   - Update job progress (percent, concepts created)
5. Complete: Worker writes final stats to job result
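
Step 3 can be illustrated with a simple paragraph-based chunker that carries the last paragraph of each chunk into the next one. This is a sketch of the general technique, not the logic in `chunker.py`; the function name, the word budget, and the choice of paragraphs as the "semantic boundary" are assumptions.

```python
from typing import List


def chunk_paragraphs(text: str, max_words: int = 200, overlap_paragraphs: int = 1) -> List[str]:
    """Group paragraphs into chunks of roughly max_words, repeating trailing paragraphs."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    for para in paragraphs:
        words = sum(len(p.split()) for p in current) + len(para.split())
        if current and words > max_words:
            chunks.append("\n\n".join(current))
            # Keep the tail of the finished chunk as context for the next one.
            current = current[-overlap_paragraphs:] if overlap_paragraphs > 0 else []
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks


sample = "a b c\n\nd e f\n\ng h i"
chunks = chunk_paragraphs(sample, max_words=6, overlap_paragraphs=1)
```

With the sample above, the second chunk starts with the paragraph that ended the first, which gives the extractor some continuity across chunk boundaries.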

4. Client Layer (`client/`)

Unified TypeScript Client (CLI + MCP in one codebase):

Shared Components:
- src/types/ - TypeScript interfaces matching FastAPI Pydantic models
- src/api/client.ts - HTTP client wrapping REST API endpoints
- Configuration: Environment variables (KG_API_URL, KG_CLIENT_ID)

CLI Mode (Phase 1 - Complete):
- Commands: kg health, kg ingest file/text, kg jobs status/list/cancel
- User experience: Color-coded output, progress spinners, duplicate detection
- Installation: Wrapper script (scripts/kg-cli.sh), direct node, or npm link

MCP Server Mode (Phase 2 - Future):
- Entry point detects MCP_SERVER_MODE=true environment variable
- Runs as MCP server for Claude Desktop/Code
- Tools use same API client as CLI
- Claude Desktop config: Node.js + environment variables

See ADR-013 for detailed design.

5. Graph Database (Apache AGE + PostgreSQL)

Node Types:

// Concept - Core knowledge unit
(:Concept {
  concept_id: "linear-scanning-system",
  label: "Linear scanning system",
  embedding: [0.123, ...],  // 1536 dims
  search_terms: ["linear thinking", "sequential processing"]
})

// Source - Document location
(:Source {
  source_id: "watts-doc-1-para-4",
  document: "Watts Doc 1",
  paragraph: 4,
  full_text: "..."
})

// Instance - Evidence quote
(:Instance {
  instance_id: "uuid",
  quote: "exact verbatim quote"
})

Relationships:

Structural Relationships (link concepts to evidence):
- APPEARS_IN - Concept → Source
- EVIDENCED_BY - Concept → Instance
- FROM_SOURCE - Instance → Source

Concept Relationships (31 semantically sparse types in 8 categories, see ADR-022):

Categories:
- Logical & Truth (4 types): IMPLIES, CONTRADICTS, PRESUPPOSES, EQUIVALENT_TO
- Causal (5 types): CAUSES, ENABLES, PREVENTS, INFLUENCES, RESULTS_FROM
- Structural (5 types): PART_OF, CONTAINS, COMPOSED_OF, SUBSET_OF, INSTANCE_OF
- Evidential (4 types): SUPPORTS, REFUTES, EXEMPLIFIES, MEASURED_BY
- Similarity (4 types): SIMILAR_TO, ANALOGOUS_TO, CONTRASTS_WITH, OPPOSITE_OF
- Temporal (3 types): PRECEDES, CONCURRENT_WITH, EVOLVES_INTO
- Functional (4 types): USED_FOR, REQUIRES, PRODUCES, REGULATES
- Meta (2 types): DEFINED_AS, CATEGORIZED_AS

Edge Properties:
- confidence (float): LLM confidence score (0.0-1.0)
- category (string): Semantic category for query filtering
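
One way to keep the `category` edge property consistent with the taxonomy above is to derive it from the relationship type. The sketch below uses the types and categories listed in this section; the category key names, `make_edge`, and the dict-shaped edge are assumptions for illustration.

```python
from typing import Dict, List

RELATIONSHIP_CATEGORIES: Dict[str, List[str]] = {
    "logical_truth": ["IMPLIES", "CONTRADICTS", "PRESUPPOSES", "EQUIVALENT_TO"],
    "causal": ["CAUSES", "ENABLES", "PREVENTS", "INFLUENCES", "RESULTS_FROM"],
    "structural": ["PART_OF", "CONTAINS", "COMPOSED_OF", "SUBSET_OF", "INSTANCE_OF"],
    "evidential": ["SUPPORTS", "REFUTES", "EXEMPLIFIES", "MEASURED_BY"],
    "similarity": ["SIMILAR_TO", "ANALOGOUS_TO", "CONTRASTS_WITH", "OPPOSITE_OF"],
    "temporal": ["PRECEDES", "CONCURRENT_WITH", "EVOLVES_INTO"],
    "functional": ["USED_FOR", "REQUIRES", "PRODUCES", "REGULATES"],
    "meta": ["DEFINED_AS", "CATEGORIZED_AS"],
}

# Reverse lookup: relationship type -> category.
TYPE_TO_CATEGORY: Dict[str, str] = {
    rel_type: category
    for category, rel_types in RELATIONSHIP_CATEGORIES.items()
    for rel_type in rel_types
}


def make_edge(from_id: str, to_id: str, rel_type: str, confidence: float) -> dict:
    """Build an edge record, filling in `category` from the relationship type."""
    if rel_type not in TYPE_TO_CATEGORY:
        raise ValueError(f"unknown relationship type: {rel_type}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0.0 and 1.0")
    return {
        "from": from_id,
        "to": to_id,
        "type": rel_type,
        "confidence": confidence,
        "category": TYPE_TO_CATEGORY[rel_type],
    }
```

Queries can then filter on `category` (for example, only causal edges) without enumerating individual relationship types.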

6. Legacy Query Interfaces

Legacy MCP Server (mcp-server/):
- Direct Apache AGE database access
- Claude Desktop integration
- Tools: search_concepts, get_concept_details, find_related_concepts, etc.
- Status: Will migrate to unified TypeScript client (Phase 2)

Note: The legacy Python CLI (scripts/cli.py) has been deprecated and removed in favor of the unified TypeScript client (kg command).

Data Flow

Ingestion Flow (Current Architecture)

Client (kg CLI)
  ↓ POST /ingest (file + ontology)
API Server
  ├→ Calculate SHA-256 hash
  ├→ Check for duplicate (hash + ontology)
  │   ├→ Duplicate found: return existing job result
  │   └→ No duplicate: continue
  ├→ Create job in SQLite
  ├→ Enqueue to in-memory queue
  ├→ Return job_id immediately
  └→ Background worker starts

Background Worker
  ↓ Parse & chunk document
Chunks with context overlap
  ↓ for each chunk
  ├→ Query recent concepts from graph (context for LLM)
  ├→ Extract concepts (LLM)
  │   ├→ concepts: [{id, label, search_terms}]
  │   ├→ instances: [{concept_id, quote}]
  │   └→ relationships: [{from, to, type, confidence}]
  │
  ├→ Generate embeddings (OpenAI)
  │
  ├→ Match existing concepts (vector search)
  │   ├→ similarity ≥ 0.85: use existing
  │   └→ else: create new
  │
  ├→ Upsert to Apache AGE
  │   ├→ CREATE/UPDATE concepts
  │   ├→ CREATE instances
  │   └→ CREATE relationships
  │
  └→ Update job progress (SQLite)
      └→ percent, chunks_processed, concepts_created

Client polls GET /jobs/{job_id}
  ↓ every 2 seconds
Job Status Response
  ├→ status: queued | processing | completed | failed
  ├→ progress: {percent, chunks_processed, concepts_created}
  └→ result: {stats, cost} (if completed)
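
The polling step at the end of this flow can be sketched as a small loop. The real client is written in TypeScript; this Python version only illustrates the logic. `fetch_status` stands in for an HTTP GET to /jobs/{job_id}, and the timeout parameter is an assumption that does not appear in the flow above.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_status: Callable[[str], Dict[str, Any]],
    job_id: str,
    interval: float = 2.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the job reaches a terminal status, then return the last response."""
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout} seconds")
        sleep(interval)
```

Passing `sleep` as a parameter keeps the loop testable without real delays.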

Query Flow

User Query
  ↓
Generate embedding
  ↓
Vector similarity search
  ↓
MATCH concepts WHERE similarity > threshold
  ↓
OPTIONAL MATCH related concepts
  ↓
Return structured results
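
The vector search step in this flow amounts to ranking concepts by cosine similarity and keeping those above a threshold. Below is a brute-force sketch in plain Python. The threshold value and result limit are assumptions, and a real deployment would push this search into the database rather than scanning in application code.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_concepts(
    query_embedding: Sequence[float],
    concepts: Dict[str, Sequence[float]],
    threshold: float = 0.7,  # assumed value; not specified above
    limit: int = 10,
) -> List[Tuple[str, float]]:
    """Return (concept_id, similarity) pairs above the threshold, best first."""
    scored = [(cid, cosine_similarity(query_embedding, emb)) for cid, emb in concepts.items()]
    hits = [(cid, score) for cid, score in scored if score > threshold]
    hits.sort(key=lambda pair: pair[1], reverse=True)
    return hits[:limit]
```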

Configuration

Environment Variables

API Server (.env):

# AI Provider Selection
AI_PROVIDER=openai  # or "anthropic"

# OpenAI
OPENAI_API_KEY=sk-...
OPENAI_EXTRACTION_MODEL=gpt-4o
OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# Anthropic (optional)
ANTHROPIC_API_KEY=sk-ant-...
ANTHROPIC_EXTRACTION_MODEL=claude-sonnet-4-20250514

# PostgreSQL + Apache AGE
POSTGRES_HOST=localhost
POSTGRES_PORT=5432
POSTGRES_USER=postgres
POSTGRES_PASSWORD=password
POSTGRES_DB=knowledge_graph

# Authentication (Phase 1: disabled)
AUTH_ENABLED=false
AUTH_REQUIRE_CLIENT_ID=false
AUTH_API_KEYS=  # Comma-separated keys for Phase 2
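
As an illustration of how the server might read these variables, the sketch below loads a few of them into a typed settings object. `ApiSettings` and `load_settings` are hypothetical names, the fallback values simply mirror the sample .env above, and the password is deliberately left out of the connection string.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ApiSettings:
    ai_provider: str
    extraction_model: str
    postgres_dsn: str
    auth_enabled: bool


def _as_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes"}


def load_settings(env: Optional[Mapping[str, str]] = None) -> ApiSettings:
    """Read settings from `env` (defaults to os.environ), failing fast on bad values."""
    env = os.environ if env is None else env
    provider = env.get("AI_PROVIDER", "openai").strip().lower()
    if provider not in {"openai", "anthropic"}:
        raise ValueError(f"unsupported AI_PROVIDER: {provider!r}")
    # Fallbacks mirror the sample .env; they are not authoritative defaults.
    default_model = "gpt-4o" if provider == "openai" else "claude-sonnet-4-20250514"
    model = env.get(f"{provider.upper()}_EXTRACTION_MODEL", default_model)
    dsn = "postgresql://{user}@{host}:{port}/{db}".format(
        user=env.get("POSTGRES_USER", "postgres"),
        host=env.get("POSTGRES_HOST", "localhost"),
        port=int(env.get("POSTGRES_PORT", "5432")),
        db=env.get("POSTGRES_DB", "knowledge_graph"),
    )
    return ApiSettings(
        ai_provider=provider,
        extraction_model=model,
        postgres_dsn=dsn,
        auth_enabled=_as_bool(env.get("AUTH_ENABLED", "false")),
    )
```

Accepting a mapping instead of reading `os.environ` directly makes the loader easy to exercise in unit tests.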

TypeScript Client:

# API connection
KG_API_URL=http://localhost:8000
KG_CLIENT_ID=my-client
KG_API_KEY=  # Optional, for Phase 2

# Mode selection (CLI vs MCP)
MCP_SERVER_MODE=false  # or "true" for MCP server mode

Concept Matching Algorithm

Multi-stage matching to prevent duplicates:

Stage 1: Exact ID Match
- LLM predicted existing concept_id → use it
- Confidence: 100%

Stage 2: Vector Similarity (Primary)
- Embed: label + search_terms
- Cosine similarity search
- Threshold ≥ 0.85 → match
- Confidence: similarity score

Stage 3: Create New
- No match found
- Generate new concept_id (kebab-case)
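
The three stages can be sketched as a single function over an in-memory map of existing concepts. This is a simplified illustration, not the production matcher: `match_concept`, `new_concept_id`, and the dict-based store are assumptions, and the real system runs the similarity search against the graph.

```python
import math
import re
from typing import Dict, Optional, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def new_concept_id(label: str) -> str:
    """Kebab-case id derived from the label, e.g. 'Linear scanning system'."""
    return re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")


def match_concept(
    predicted_id: str,
    embedding: Sequence[float],
    existing: Dict[str, Sequence[float]],
    threshold: float = 0.85,
) -> Tuple[Optional[str], float]:
    """Return (matched concept_id, confidence), or (None, 0.0) if a new concept is needed."""
    # Stage 1: the LLM named a concept_id that already exists.
    if predicted_id in existing:
        return predicted_id, 1.0
    # Stage 2: best cosine similarity at or above the threshold.
    best_id, best_score = None, -1.0
    for concept_id, other in existing.items():
        score = cosine_similarity(embedding, other)
        if score > best_score:
            best_id, best_score = concept_id, score
    if best_id is not None and best_score >= threshold:
        return best_id, best_score
    # Stage 3: no match; the caller creates a concept using new_concept_id(label).
    return None, 0.0
```

Returning the similarity as the confidence lets callers log how close a merge was to the threshold.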

Scalability Considerations

Phase 1 (Current)

API Server: Single FastAPI instance with BackgroundTasks
Job Queue: In-memory dict + SQLite persistence
Database: Single PostgreSQL + Apache AGE instance
Limitations: No distributed workers, no multi-instance API

Phase 2 (Planned)

Job Queue: Redis-based distributed queue
Workers: Separate worker processes (can scale horizontally)
API Server: Multiple instances behind load balancer
Real-time Updates: WebSocket/SSE for job progress
Authentication: Full API key validation and rate limiting

Future Enhancements

PostgreSQL replication for HA
Apache AGE performance optimization
Dedicated vector search (the pgvector extension, or an external vector database such as Pinecone or Weaviate)
Incremental updates (avoid re-processing)
Caching layer for query results

Security

API Keys

Stored in .env (gitignored)
Never committed to version control
Validated on startup

Database

PostgreSQL auth required (no anonymous access)
Apache AGE graph access via PostgreSQL roles
Local development: simple password
Production: strong auth + TLS

Testing Strategy

Unit Tests

AI provider abstraction
Concept matching logic
Graph queries

Integration Tests

End-to-end ingestion
MCP tool functionality
CLI commands

Manual Testing

Use sample Watts documents
Verify concept extraction quality
Test relationship accuracy