Concept: Why Knowledge Graphs, Not Just RAG
The Problem with Text Retrieval
Traditional Retrieval-Augmented Generation (RAG) systems work by:

1. Breaking documents into chunks
2. Creating vector embeddings for each chunk
3. Finding chunks similar to a query
4. Stuffing those chunks into context
5. Hoping the LLM can figure it out
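Concretely, the whole pipeline fits in a few lines. A minimal sketch in Python, where `embed` stands in for whatever embedding model you use, and the chunk size and top-k values are arbitrary illustrative choices:

```python
import numpy as np

def chunk(text: str, size: int = 500) -> list[str]:
    # Step 1: naive fixed-size chunking.
    return [text[i:i + size] for i in range(0, len(text), size)]

def top_k(query: str, chunks: list[str], embed, k: int = 5) -> list[str]:
    # Steps 2-3: embed everything, rank chunks by cosine similarity.
    q = np.asarray(embed(query))
    def score(c: str) -> float:
        v = np.asarray(embed(c))
        return float(q @ v) / (np.linalg.norm(q) * np.linalg.norm(v))
    return sorted(chunks, key=score, reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    # Steps 4-5: stuff the chunks into context and hope.
    return "Context:\n" + "\n---\n".join(context) + f"\n\nQuestion: {query}"
```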
This works... sometimes. But it has fundamental limitations:
**Ephemeral Knowledge.** Every query rebuilds understanding from scratch. There's no persistent structure, no accumulated insight. Each search is like reading the document for the first time.

**Similarity ≠ Understanding.** Vector similarity finds "related text" but doesn't understand how ideas relate. Does concept A support concept B? Contradict it? Depend on it? RAG can't tell you.

**No Cross-Document Synthesis.** RAG treats documents as silos. If two papers discuss the same concept using different terminology, RAG won't connect them unless the vectors happen to align.

**Lost Provenance.** When you get an answer, where did it come from? Which specific quote? From what context? RAG gives you chunks, not citations.

**No Traversal.** You can't ask "show me what connects to this" or "explore related concepts." RAG is search-only, not exploration.
The Knowledge Graph Approach
A knowledge graph system thinks about documents differently:
**Concepts are First-Class Entities.** Instead of "chunk 47 from document X," you have:

- Label: "Requisite Variety"
- Search terms: ["Ashby's Law", "system control", "variety matching"]
- Relationships: SUPPORTS → "AI Sandwich Systems Model"
- Evidence: 3 source quotes with exact paragraph references
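As a rough sketch, such an entity could be modeled like this; the field names mirror the example above and are illustrative, not this system's exact schema:

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    quote: str      # verbatim quote from the source
    source: str     # document identifier, e.g. "Watts Taoism 02"
    paragraph: int  # exact paragraph reference

@dataclass
class Concept:
    label: str  # e.g. "Requisite Variety"
    search_terms: list[str] = field(default_factory=list)
    # (relationship type, target label), e.g. ("SUPPORTS", "AI Sandwich Systems Model")
    relationships: list[tuple[str, str]] = field(default_factory=list)
    evidence: list[Evidence] = field(default_factory=list)
```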
**Relationships Model Understanding.** The system captures how ideas connect:

- Concept A IMPLIES Concept B
- Concept C CONTRADICTS Concept D
- Concept E SUPPORTS Concept F with 0.85 confidence
These aren't just links—they represent the document's argument structure.
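Persisting a typed, weighted edge like that takes a few lines with the official `neo4j` Python driver. A sketch, where the connection details, labels, and property names are assumptions (Cypher can't parameterize relationship types, so the type is interpolated and must come from a trusted whitelist):

```python
from neo4j import GraphDatabase

# Connection details are placeholders for illustration.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

def link(src: str, rel_type: str, dst: str, confidence: float) -> None:
    # MERGE keeps each concept unique; the relationship carries its confidence.
    query = (
        "MERGE (a:Concept {label: $src}) "
        "MERGE (b:Concept {label: $dst}) "
        f"MERGE (a)-[r:{rel_type}]->(b) "  # rel_type from a trusted whitelist only
        "SET r.confidence = $conf"
    )
    with driver.session() as session:
        session.run(query, src=src, dst=dst, conf=confidence)

link("Concept E", "SUPPORTS", "Concept F", 0.85)
```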
**Persistent, Growing Knowledge.** Once extracted, concepts persist. New documents add to the graph. Similar concepts merge automatically. The graph becomes smarter with each document ingested.

**Evidence-Based Retrieval.** Every concept links to source quotes:
Concept: "Value of Uselessness"
Evidence: "The whole notion of something of life...being useful...
is to a Taoist absurd."
Source: Watts Taoism 02, paragraph 1
**Graph Traversal.** You can explore:

- "What supports this concept?"
- "What does this contradict?"
- "Show me the evidence chain"
- "Find concepts 2 hops away"
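Each of those questions maps to a short Cypher pattern. A sketch, assuming the `Concept` label and relationship types used in the examples above:

```python
# "What supports this concept?"
FIND_SUPPORTERS = """
MATCH (a:Concept)-[:SUPPORTS]->(b:Concept {label: $label})
RETURN a.label
"""

# "What does this contradict?"
FIND_CONTRADICTIONS = """
MATCH (a:Concept {label: $label})-[:CONTRADICTS]-(b:Concept)
RETURN b.label
"""

# "Find concepts 2 hops away"
TWO_HOPS = """
MATCH (a:Concept {label: $label})-[*2]-(b:Concept)
WHERE a <> b
RETURN DISTINCT b.label
"""

def ask(session, query: str, label: str) -> list[str]:
    # Run any of the patterns above against an open driver session.
    return [record[0] for record in session.run(query, label=label)]
```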
What This Enables
For Humans
**Exploration, Not Just Search.** Start with one concept, traverse relationships, discover connections you didn't know to look for.

**Provenance & Trust.** Every claim traces back to specific quotes. You can verify, not just trust.

**Concept Maps.** Visualize how ideas connect across an entire document or corpus.
For LLMs
**Semantic Grounding.** Instead of "here's some similar text," the LLM gets:

- "Here's the concept of Requisite Variety"
- "It SUPPORTS the AI Sandwich model"
- "Evidence: [exact quotes]"
- "It's related to these 5 other concepts"
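A sketch of what that structured handoff could look like, building on the `Concept` record sketched earlier; the payload shape is illustrative:

```python
def grounding_payload(concept) -> dict:
    # Assemble a structured, evidence-backed context block for the LLM,
    # instead of a bag of raw text chunks. `concept` is the Concept
    # record sketched above.
    return {
        "concept": concept.label,
        "relationships": [
            {"type": rel, "target": target}
            for rel, target in concept.relationships
        ],
        "evidence": [
            {"quote": e.quote, "source": e.source, "paragraph": e.paragraph}
            for e in concept.evidence
        ],
    }
```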
**Relationship Awareness.** The LLM can reason about how concepts connect, not just what they say.

**Multi-Document Synthesis.** Concepts from different sources automatically link, enabling cross-reference reasoning.
The Hybrid Architecture
This system combines three approaches:
- **Vector Search**: Find concepts semantically similar to a query
- **Graph Traversal**: Explore relationships between concepts
- **Full-Text Search**: Find exact quotes or terminology
RAG gives you only the first. This system has all three.
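A sketch of the three modes side by side, assuming Neo4j 5.x (which ships both vector and full-text indexes); the index names `concept_embeddings` and `concept_text` and the `embed` function are assumptions:

```python
# 1. Vector search: semantic entry points via Neo4j's vector index.
VECTOR = """
CALL db.index.vector.queryNodes('concept_embeddings', $k, $embedding)
YIELD node, score
RETURN node.label AS label, score
"""

# 2. Graph traversal: expand a concept through its typed relationships.
NEIGHBORS = """
MATCH (a:Concept {label: $label})-[r]-(b:Concept)
RETURN type(r) AS rel, b.label AS label
"""

# 3. Full-text search: exact quotes or terminology.
FULLTEXT = """
CALL db.index.fulltext.queryNodes('concept_text', $query)
YIELD node, score
RETURN node.label AS label, score
"""

def hybrid_query(session, query_text: str, embed, k: int = 10):
    # Find semantic entry points, expand each through the graph,
    # and add exact-match hits from full-text search.
    seeds = session.run(VECTOR, k=k, embedding=embed(query_text)).data()
    expanded = {s["label"]: session.run(NEIGHBORS, label=s["label"]).data()
                for s in seeds}
    exact = session.run(FULLTEXT, query=query_text).data()
    return seeds, expanded, exact
```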
What We're Not Claiming
This is not:

- A replacement for reading
- Perfect extraction (LLMs make mistakes)
- A solved problem (this is experimental)
- The only way to do knowledge graphs

This is:

- A different paradigm: persistent concepts vs. ephemeral retrieval
- A synthesis of LLM extraction + graph storage + semantic search
- An experiment in what becomes possible when you model ideas, not just text
When to Use Each
Use RAG when:

- You need quick, one-off queries
- Documents are homogeneous and well-structured
- You don't need to understand relationships
- You're okay rebuilding context every time

Use Knowledge Graphs when:

- You're building long-term knowledge bases
- Relationships between ideas matter
- You need provenance and evidence tracking
- You want to explore, not just retrieve
- You're synthesizing across multiple documents
The Vision
Imagine ingesting:

- Your entire codebase (concepts = architectural decisions, components, dependencies)
- Research paper collections (concepts = theories, findings, methodologies)
- Company documentation (concepts = policies, procedures, best practices)
- Historical texts (concepts = events, figures, philosophical ideas)

Then querying:

- "Show me all architectural decisions related to authentication"
- "What research findings contradict the embodied cognition hypothesis?"
- "Trace the evolution of our deployment policy across all versions"
- "How do Stoic and Taoist concepts of acceptance relate?"
Not just finding similar text. Understanding the knowledge.
Implementation Reality
This system:

- Uses LLMs for extraction (GPT-4, Claude, etc.)
- Stores concepts in Neo4j with vector embeddings
- Deduplicates via vector similarity (concepts merge across documents; see the sketch below)
- Preserves evidence links to source quotes
- Provides multiple query interfaces (MCP, CLI, Neo4j Browser)
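As one example, deduplication-on-ingest can be sketched as a nearest-neighbor check before each insert; the 0.9 threshold and the index name are assumptions:

```python
SIMILARITY_THRESHOLD = 0.9  # assumed merge cutoff, tune per embedding model

NEAREST = """
CALL db.index.vector.queryNodes('concept_embeddings', 1, $embedding)
YIELD node, score
RETURN node.label AS label, score
"""

def upsert_concept(session, label: str, embedding: list[float]) -> str:
    # Look for the closest existing concept before creating a new one.
    match = session.run(NEAREST, embedding=embedding).single()
    if match and match["score"] >= SIMILARITY_THRESHOLD:
        # Near-duplicate of an existing concept: merge instead of creating,
        # keeping the new label searchable as an alias.
        session.run(
            "MATCH (c:Concept {label: $existing}) "
            "SET c.search_terms = coalesce(c.search_terms, []) + $alias",
            existing=match["label"], alias=label,
        )
        return match["label"]
    session.run(
        "CREATE (c:Concept {label: $label, embedding: $embedding})",
        label=label, embedding=embedding,
    )
    return label
```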
It's not magic. It's structured extraction + graph storage + semantic retrieval.
But the combination creates something qualitatively different from RAG.
The goal isn't to replace RAG. It's to explore what becomes possible when we move from retrieving text to modeling knowledge.