Understanding: Vocabulary Lifecycle & Grounded LLM Decisions
How and why the vocabulary management system works.
The Problem
As documents are ingested, the LLM invents relationship types organically — IMPLIES, SUPPORTS, HISTORICALLY_PRECEDED, etc. Over time the vocabulary sprawls: near-duplicates appear, some types accumulate zero edges, others become structurally irrelevant. Something needs to govern this lifecycle.
The Key Insight
The system computes objective mathematical scores for every relationship type: grounding strength, similarity to other types, edge counts, traversal frequency, bridge importance, polarity positioning, epistemic status. These numbers describe reality.
A human reviewing vocabulary decisions would look at these same numbers and compute a function: "low value, no bridges, similar to another type → merge." If the human ignored the numbers and went with gut feeling, they'd be inconsistent, biased, and worse than the math.
An LLM, given the same numbers as context, computes the same function — but also brings reasoning that a pure threshold cannot: "these types look numerically similar but capture different semantic intent." This is the same external reasoning a human would bring, and a capable LLM does it at comparable quality.
Therefore: math grounds the LLM, the LLM reasons over the math, and the combination is more consistent than a human in the loop.
The human's role is bringing external information into the graph (new documents, new knowledge domains), not reviewing vocabulary hygiene decisions that the system can make objectively.
How Consolidation Works
The consolidation pipeline follows a loop: score candidates, ask the LLM, execute the decision, re-score (since the vocabulary changed), repeat.
sequenceDiagram
participant CLI as CLI / API Client
participant VM as VocabularyManager
participant SD as SynonymDetector
participant PS as PruningStrategy
participant LLM as GPT-4o (Reasoning)
participant DB as PostgreSQL + AGE
CLI->>VM: consolidate(target, threshold, dry_run)
VM->>DB: Query active vocabulary types
DB-->>VM: Types with edge counts
alt vocab_size > target
loop Until target reached or no candidates
VM->>SD: find_synonyms(types)
SD->>DB: Fetch embeddings (768-dim nomic)
SD-->>VM: Ranked synonym candidates
VM->>PS: evaluate_synonym(candidate, scores)
PS->>LLM: Structured prompt (similarity, edges, usage)
LLM-->>PS: MergeDecision(should_merge, reasoning)
PS-->>VM: ActionRecommendation
alt should_merge and not dry_run
VM->>DB: UPDATE edges SET type = target
VM->>DB: SET deprecated type inactive
end
VM->>DB: Re-query vocabulary (state changed)
end
end
VM->>VM: Prune zero-usage custom types
VM-->>CLI: ConsolidationResult
Key behaviors:
- Target-gated: Live mode only works when vocab size exceeds the target.
--target 90 with 63 types is a no-op. Use a target below current size to
trigger merges, or --dry-run to preview candidates regardless of target.
- Re-query loop: After each merge, the vocabulary state changes — types
disappear, edge counts shift. Live mode re-fetches similarity before the
next candidate. Dry-run evaluates all candidates against the initial state.
- Prune after merge: Zero-usage custom types are removed after the merge
loop completes. Builtin types are never pruned.
Decision Model
Candidates route through three tiers based on similarity and the LLM's judgment:
flowchart TD
A[Synonym Candidate] --> B{Similarity >= 0.90?}
B -->|Yes| C{Zero edges on<br/>deprecated type?}
C -->|Yes| D[Auto-prune]
C -->|No| E[LLM Evaluate]
B -->|No| F{Similarity >= 0.70?}
F -->|Yes| E
F -->|No| G[Skip - not synonyms]
E --> H{LLM Decision}
H -->|Merge| I[Execute merge<br/>Update edges + deactivate]
H -->|Skip| J[Reject with reasoning]
D --> K[Remove from vocabulary]
I --> L[Re-query vocabulary]
L --> A
style D fill:#2d5a27,color:#fff
style I fill:#2d5a27,color:#fff
style G fill:#4a4a4a,color:#fff
style J fill:#4a4a4a,color:#fff
style E fill:#1a3a5c,color:#fff
The LLM receives a structured prompt with embedding similarity, edge counts, and usage context. It returns a merge/skip decision with reasoning. The reasoning is displayed in the CLI and stored in the audit trail. The LLM correctly handles cases that pure thresholds miss: - Directional inverses (HAS_PART/PART_OF look similar but are opposites) - Semantic distinctions (CONTRASTS_WITH/EQUIVALENT_TO share embedding space) - Genuinely redundant types (DEFINED_AS/DEFINED, INCREASES/ENHANCES)
Embedding Architecture
All embedding generation flows through a single consistent path. The reasoning provider (used for extraction and consolidation decisions) and the embedding provider (used for vector similarity) are separate but connected:
flowchart LR
subgraph Callers
SW[SynonymDetector]
CC[CategoryClassifier]
EW[EmbeddingWorker]
ING[Ingestion Pipeline]
end
subgraph Provider Layer
GP["get_provider()"] --> RP[ReasoningProvider<br/>e.g. OpenAI GPT-4o]
RP -->|".embedding_provider"| LP
GEP["get_embedding_provider()"] --> LP[LocalEmbeddingProvider<br/>nomic-embed-text-v1.5]
end
subgraph Output
LP -->|sync call| EMB["768-dim embedding"]
end
SW --> GP
CC --> GP
EW --> GP
ING --> GEP
style LP fill:#1a3a5c,color:#fff
style EMB fill:#2d5a27,color:#fff
style RP fill:#4a3a1c,color:#fff
Design constraints:
- generate_embedding() is sync across all providers — never awaited.
- OpenAIProvider.generate_embedding() delegates to self.embedding_provider.
If none is configured, it raises RuntimeError (no silent fallback to
OpenAI's text-embedding-3-small — that would create a dimension mismatch).
- Dimension guards in SynonymDetector._cosine_similarity() catch mismatches
at comparison time with a clear error message.
- Stale embedding detection in _get_edge_type_embedding() auto-regenerates
when a cached embedding's dimensions don't match expectations.
CLI Commands
kg vocab list # Show all types with categories, edges, status
kg vocab consolidate # Execute merges + prune (live)
kg vocab consolidate --dry-run # Preview candidates without executing
kg vocab consolidate --target N # Set target vocab size (default 90)
kg vocab consolidate --threshold 0.85 # LLM auto-execute threshold
kg vocab merge TYPE_A TYPE_B # Manual merge
kg job list -s pending # Status aliases: pending, running, done, failed
kg job cleanup -s done --confirm # Delete completed jobs
Status aliases (pending → awaiting_approval, running → processing,
done → completed) are resolved by a shared resolveStatusFilter() utility
used across all job subcommands.
Known Limitations
- Pending reviews lack persistence:
get_pending_reviews()andapprove_action()exist but are in-memory only. Low priority since the primary path is fully automated.