ADR-032: Automatic Edge Vocabulary Expansion with Intelligent Pruning
Status: Proposed Date: 2025-10-15 Deciders: System Architects Related: ADR-022 (30-Type Taxonomy), ADR-025 (Dynamic Vocabulary), ADR-026 (Autonomous Curation)
Overview
When your vocabulary grows dynamically, you need a strategy to prevent it from becoming unwieldy. Imagine your knowledge graph learns 200 different relationship types—by the time you're prompting the AI with a list of all these options, it gets confused and extraction quality plummets. You need a way to let vocabulary expand when needed but also keep it focused and manageable.
This ADR introduces the concept of vocabulary as a self-regulating cache. When the AI encounters a new relationship type (like "OPTIMIZES" in machine learning documents), the system automatically adds it—no manual approval needed. But the system also tracks usage: types that get used frequently stay in the active vocabulary, while types that were created but never used again (perhaps "CALIBRATES_AGAINST" appeared once in a weird sentence) get automatically pruned. Think of it like your brain learning new words—you remember the ones you use regularly and forget the obscure terms you encountered once. The system maintains a "sweet spot" vocabulary size (30-90 types) through intelligent decisions about what to keep and what to remove, using metrics like usage frequency, semantic similarity to existing types, and how well-grounded the relationships are in evidence. This creates a vocabulary that's both adaptive and self-cleaning.
Context
The current system uses a static 30-type relationship vocabulary defined in src/api/constants.py. While ADR-025 and ADR-026 propose dynamic vocabulary management with manual curator approval, this creates a bottleneck during high-volume ingestion.
Current Limitations
Static Vocabulary (ADR-022):
RELATIONSHIP_TYPES = {
'IMPLIES', 'SUPPORTS', 'CONTRADICTS', 'CAUSES', 'ENABLES',
# ... 25 more fixed types
}
Problems:
1. Ingestion Blocking: Novel edge types from LLM extraction are rejected
2. Lost Semantics: Domain-specific relationships (e.g., TRAINS_ON, OPTIMIZES for ML) get mapped to generic types or skipped
3. Manual Bottleneck: Every new type requires code change and deployment
4. No Self-Regulation: Vocabulary can only grow, never shrink
ADR-025 Proposed Flow (Not Implemented):
LLM extracts "OPTIMIZES" → Skipped → Logged to skipped_relationships
→ Curator reviews → Curator approves → Type added → Backfill process
This works but doesn't scale for rapid iteration or domain-specific ontologies.
Core Insight
Vocabulary should behave like a self-regulating cache:
- Auto-expand on first use (like cache miss → fetch)
- Value-based retention (frequently used types stay, unused types pruned)
- Sliding window (30-90 types, tunable)
- Intelligent pruning (AI or human decides what to remove when limit reached)
Decision
Implement automatic edge vocabulary expansion with three-tier intelligent pruning.
Architecture: Proactive Expansion + Reactive Pruning
1. Auto-Expansion During Ingestion
def upsert_relationship(from_id, to_id, rel_type, confidence):
"""
Auto-expand vocabulary on first use.
"""
# 1. Check if type exists in vocabulary
canonical_type, category = normalize_relationship_type(rel_type)
if canonical_type:
# Known type or fuzzy match
create_graph_edge(from_id, to_id, canonical_type, confidence)
increment_usage_count(canonical_type)
else:
# Unknown type - AUTO-EXPAND VOCABULARY
if is_valid_edge_type(rel_type): # Basic validation
# Add to vocabulary immediately
add_to_vocabulary(
relationship_type=rel_type,
category=infer_category(rel_type), # LLM-assisted
description=f"Auto-added during ingestion",
added_by="system:auto-expansion",
is_builtin=False,
is_active=True
)
# Create edge
create_graph_edge(from_id, to_id, rel_type, confidence)
# Log expansion
log_vocabulary_expansion(rel_type, context={
"from_concept": get_label(from_id),
"to_concept": get_label(to_id),
"job_id": current_job_id
})
# Check if pruning needed
if get_active_vocabulary_size() > VOCAB_MAX:
trigger_pruning_workflow()
else:
# Invalid type (e.g., profanity, malformed)
log_rejected_type(rel_type, reason="validation_failed")
Validation Rules:
- Uppercase alphanumeric + underscores only
- Length: 3-50 characters
- Not in blacklist (profanity, reserved terms)
- Not reverse form (_BY suffix rejected)
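A minimal sketch of is_valid_edge_type consistent with these rules; the blacklist contents below are illustrative, and the real list would live in configuration:

import re

# Illustrative blacklist entries; actual contents would be configured
EDGE_TYPE_BLACKLIST = {"RELATED_TO", "THING"}

def is_valid_edge_type(rel_type: str) -> bool:
    """Basic validation applied before auto-expansion."""
    if not re.fullmatch(r"[A-Z0-9_]{3,50}", rel_type):
        return False  # uppercase alphanumeric + underscores, 3-50 chars
    if rel_type in EDGE_TYPE_BLACKLIST:
        return False  # profanity / reserved terms
    if rel_type.endswith("_BY"):
        return False  # reverse forms rejected; store the forward direction
    return True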
Category Classification for New Edge Types
Note: Category classification is now handled by ADR-047: Probabilistic Vocabulary Categorization using embedding similarity to seed types with satisficing (max similarity). The approach below is superseded.
Two-Tier Vocabulary Structure:
# High-level categories (8 protected groups from ADR-022, refined in ADR-047)
RELATIONSHIP_CATEGORIES = {
"logical_truth": ["IMPLIES", "CONTRADICTS", "PRESUPPOSES", "EQUIVALENT_TO"],
"causal": ["CAUSES", "ENABLES", "PREVENTS", "INFLUENCES", "RESULTS_FROM"],
"structural": ["PART_OF", "CONTAINS", "COMPOSED_OF", "SUBSET_OF", "INSTANCE_OF"],
"evidential": ["SUPPORTS", "REFUTES", "EXEMPLIFIES", "MEASURED_BY"],
"similarity": ["SIMILAR_TO", "ANALOGOUS_TO", "CONTRASTS_WITH", "OPPOSITE_OF"],
"temporal": ["PRECEDES", "CONCURRENT_WITH", "EVOLVES_INTO"],
"functional": ["USED_FOR", "REQUIRES", "PRODUCES", "REGULATES"],
"meta": ["DEFINED_AS", "CATEGORIZED_AS"],
}
Category Assignment Algorithm:
When a new edge type is auto-added, it must be classified into an existing category:
def infer_category(new_edge_type):
"""
Classify new edge type into existing category using semantic analysis.
Only create new category if confidence is extremely low (<0.3) for ALL categories.
"""
# Get embeddings for the new type
new_embedding = generate_embedding(new_edge_type)
# Calculate similarity to each category
category_scores = {}
for category, existing_types in RELATIONSHIP_CATEGORIES.items():
# Average similarity to all types in this category
similarities = []
for existing_type in existing_types:
existing_embedding = generate_embedding(existing_type)
similarity = cosine_similarity(new_embedding, existing_embedding)
similarities.append(similarity)
category_scores[category] = {
"avg_similarity": np.mean(similarities),
"max_similarity": np.max(similarities),
"confidence": np.mean(similarities) # Use average for robustness
}
# Find best-fit category
best_category = max(category_scores.items(), key=lambda x: x[1]["confidence"])
best_confidence = best_category[1]["confidence"]
# HIGH BAR: Only create new category if confidence < 0.3 for ALL categories
if best_confidence < 0.3:
# Extremely poor fit to all existing categories
return propose_new_category(new_edge_type, category_scores)
else:
# Assign to best-fit category
return best_category[0]
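For completeness, a minimal cosine_similarity helper as used above (generate_embedding is assumed to call the configured embedding provider and return a vector):

import numpy as np

def cosine_similarity(a, b) -> float:
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))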
New Category Creation (High Bar):
def propose_new_category(new_edge_type, category_scores):
"""
Propose a new high-level category (requires curator approval).
HIGH BAR: Only if confidence < 0.3 for ALL existing categories.
"""
# Generate category name via LLM reasoning
proposal = {
"new_category_name": suggest_category_name(new_edge_type),
"trigger_type": new_edge_type,
"poor_fit_evidence": {
cat: scores["confidence"]
for cat, scores in category_scores.items()
},
"reasoning": generate_category_justification(new_edge_type, category_scores),
"status": "awaiting_curator_approval"
}
# Log proposal
store_category_proposal(proposal)
# FALLBACK: Temporarily assign to closest category (even if poor fit)
fallback_category = max(category_scores.items(), key=lambda x: x[1]["confidence"])[0]
notify_curator_new_category_proposal(proposal)
return fallback_category # Use fallback until approved
Example LLM Category Reasoning:
prompt = f"""
Analyze the relationship type "{new_edge_type}" and determine if it fits existing categories:
EXISTING CATEGORIES:
- logical_truth: Logical entailment, contradiction, equivalence
- causal: Cause-effect relationships, enablement
- structural: Part-whole, composition, hierarchies
- evidential: Evidence, support, examples
- similarity: Likeness, analogy, contrast
- temporal: Time-based sequences, evolution
- functional: Purpose, requirements, usage
- meta: Definitions, categorizations
CONFIDENCE SCORES:
{json.dumps(category_scores, indent=2)}
All scores < 0.3 suggest poor fit to existing categories.
Should we create a NEW category? If yes:
1. Suggest category name (e.g., "transformation", "attribution")
2. Explain semantic distinction from existing categories
3. Predict other edge types that would belong to this category
Return JSON:
{{
"create_new_category": true|false,
"suggested_name": "category_name",
"semantic_distinction": "Why this doesn't fit existing categories",
"predicted_members": ["OTHER_TYPE_1", "OTHER_TYPE_2"],
"confidence": 0.0-1.0
}}
"""
Category Lifecycle Management:
Just like edge types, categories can be merged:
def merge_categories(source_category, target_category):
"""
Merge two high-level categories.
Example: "transformation" + "temporal" → "temporal" (evolution is temporal)
"""
# Move all edge types from source to target
source_types = RELATIONSHIP_CATEGORIES[source_category]
for edge_type in source_types:
# Update edge type metadata
update_edge_category(edge_type, target_category)
# Update category registry
RELATIONSHIP_CATEGORIES[target_category].extend(source_types)
del RELATIONSHIP_CATEGORIES[source_category]
# Audit trail
log_category_merge(source_category, target_category, len(source_types))
Category Protection Rules:
CATEGORY_PROTECTION = {
"builtin_categories": [
"logical_truth", "causal", "structural", "evidential",
"similarity", "temporal", "functional", "meta"
],
"min_categories": 8, # Never drop below original 8
"max_categories": 15, # HIGH BAR: only 7 additional categories allowed
}
def can_add_category(proposed_name):
"""Check if new category creation is allowed."""
current_count = len(RELATIONSHIP_CATEGORIES)
if current_count >= CATEGORY_PROTECTION["max_categories"]:
# At limit - must merge existing categories first
return False, "Category limit reached (15/15). Merge existing categories first."
return True, "Category creation allowed"
Curator Workflow for Categories:
# Review new category proposals
kg vocab categories review
# Output:
┌─────────────────────────────────────────────────────────────┐
│ Pending Category Proposal │
├─────────────────────────────────────────────────────────────┤
│ Category: "transformation" │
│ Triggered by: TRANSFORMS │
│ │
│ Poor Fit Evidence: │
│ • temporal: 0.28 (closest, but not temporal sequence) │
│ • causal: 0.22 (not pure cause-effect) │
│ • structural: 0.19 (not composition) │
│ │
│ AI Reasoning: │
│ "TRANSFORMS implies state change without implying cause or │
│ temporal sequence. Distinct from EVOLVES_INTO (temporal) │
│ and CAUSES (causal). Predicted members: CONVERTS, │
│ TRANSMUTES, MORPHS_INTO." │
│ │
│ [A]pprove | [R]eject | [M]erge into existing category │
└─────────────────────────────────────────────────────────────┘
# Approve new category
kg vocab categories approve transformation
# Or merge into existing
kg vocab categories merge transformation --into temporal \
--reason "Transformation is a form of temporal evolution"
# View category stats
kg vocab categories list
# Output:
┌────────────────┬───────────────┬─────────────────┐
│ Category │ Edge Types │ Total Edges │
├────────────────┼───────────────┼─────────────────┤
│ causal │ 5 builtin │ 1,247 edges │
│ │ 3 custom │ │
├────────────────┼───────────────┼─────────────────┤
│ structural │ 5 builtin │ 892 edges │
│ │ 1 custom │ │
├────────────────┼───────────────┼─────────────────┤
│ transformation │ 0 builtin │ 34 edges (NEW) │
│ │ 3 custom │ │
└────────────────┴───────────────┴─────────────────┘
Aggressiveness Curve for Categories:
Categories also have a sliding window, but with tighter limits:
CATEGORY_WINDOW = {
'min': 8, # Original 8 categories (protected)
'max': 15, # Maximum 15 categories
'merge_threshold': 12, # Start flagging merge opportunities
}
# When at 12+ categories, flag merge opportunities
if len(RELATIONSHIP_CATEGORIES) >= 12:
merge_suggestions = detect_category_merge_opportunities()
notify_curator_category_merge_suggestions(merge_suggestions)
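detect_category_merge_opportunities() is not specified above; a plausible sketch compares average cross-category member similarity, reusing generate_embedding and cosine_similarity from the classification section (the 0.75 threshold is an assumption):

import numpy as np

def detect_category_merge_opportunities(threshold=0.75):
    """Flag category pairs whose member types are mutually similar."""
    suggestions = []
    cats = list(RELATIONSHIP_CATEGORIES.items())
    for i, (cat_a, types_a) in enumerate(cats):
        for cat_b, types_b in cats[i + 1:]:
            # Average similarity across all cross-category type pairs
            sims = [
                cosine_similarity(generate_embedding(a), generate_embedding(b))
                for a in types_a
                for b in types_b
            ]
            if np.mean(sims) > threshold:
                suggestions.append({
                    "pair": [cat_a, cat_b],
                    "avg_similarity": float(np.mean(sims)),
                })
    return sorted(suggestions, key=lambda s: s["avg_similarity"], reverse=True)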
2. Sliding Window Parameters
VOCABULARY_WINDOW = {
'min': 30, # Protected core (builtin types)
'max': 90, # Soft limit (trigger pruning)
'hard_limit': 200, # Emergency stop (block new types)
'prune_batch_size': 5, # Prune N types per trigger
}
# Tunable via API/config
def set_vocabulary_limits(min_types, max_types):
"""Adjust sliding window (requires curator/admin role)"""
update_config('vocab_min', min_types)
update_config('vocab_max', max_types)
Window Behavior:
- Below min (30): Never prune builtin types
- Between min-max (30-90): Stable operating range
- Above max (90+): Trigger pruning workflow
- Above hard limit (200): Block new types, force human intervention
3. Aggressiveness Curve: Graduated Response System
Problem: Reactive pruning (wait until limit hit → prune) causes frequent optimization invocations and system instability.
Solution: Graduated aggressiveness curve using Cubic Bezier interpolation (same as CSS animations), configurable via control points.
Cubic Bezier Aggressiveness Curve
class CubicBezier:
"""
Cubic Bezier curve for smooth, tunable aggressiveness.
Same math as CSS cubic-bezier(x1, y1, x2, y2).
"""
def __init__(self, x1, y1, x2, y2):
self.x1, self.y1 = x1, y1
self.x2, self.y2 = x2, y2
def bezier(self, t):
"""Calculate Bezier value at t (0.0 to 1.0)"""
# Cubic Bezier formula: B(t) = (1-t)³P₀ + 3(1-t)²tP₁ + 3(1-t)t²P₂ + t³P₃
# Where P₀ = (0, 0), P₃ = (1, 1) are fixed endpoints
cx = 3 * self.x1
bx = 3 * (self.x2 - self.x1) - cx
ax = 1 - cx - bx
cy = 3 * self.y1
by = 3 * (self.y2 - self.y1) - cy
ay = 1 - cy - by
return ((ay * t + by) * t + cy) * t
def solve_x(self, x, epsilon=1e-6):
"""Find t value for given x using Newton-Raphson"""
        # Newton-Raphson: find t where bezier_x(t) ≈ x
t = x
for _ in range(8): # Newton iterations
x_guess = ((((1 - 3 * self.x2 + 3 * self.x1) * t +
(3 * self.x2 - 6 * self.x1)) * t +
(3 * self.x1)) * t)
if abs(x_guess - x) < epsilon:
break
# Derivative for Newton step
dx = (3 * (1 - 3 * self.x2 + 3 * self.x1) * t * t +
2 * (3 * self.x2 - 6 * self.x1) * t +
(3 * self.x1))
if abs(dx) < epsilon:
break
t -= (x_guess - x) / dx
return t
def get_y_for_x(self, x):
"""Get aggressiveness (y) for vocabulary position (x)"""
if x <= 0:
return 0
if x >= 1:
return 1
t = self.solve_x(x)
return self.bezier(t)
# Predefined curve profiles (like CSS ease functions)
AGGRESSIVENESS_CURVES = {
"linear": CubicBezier(0.0, 0.0, 1.0, 1.0), # Constant rate
"ease": CubicBezier(0.25, 0.1, 0.25, 1.0), # CSS ease (default)
"ease-in": CubicBezier(0.42, 0.0, 1.0, 1.0), # Slow start, fast end
"ease-out": CubicBezier(0.0, 0.0, 0.58, 1.0), # Fast start, slow end
"ease-in-out": CubicBezier(0.42, 0.0, 0.58, 1.0), # Smooth S-curve
"aggressive": CubicBezier(0.1, 0.0, 0.9, 1.0), # Sharp acceleration near limit
"gentle": CubicBezier(0.5, 0.5, 0.5, 0.5), # Very gradual
"exponential": CubicBezier(0.7, 0.0, 0.84, 0.0), # Explosive near limit
}
# Configuration (tunable via API)
AGGRESSIVENESS_PROFILE = os.getenv("VOCAB_AGGRESSIVENESS", "aggressive")
def calculate_aggressiveness(current_size):
"""
Calculate aggressiveness (0.0-1.0) using Bezier curve.
Args:
current_size: Current vocabulary size
Returns:
float: Aggressiveness value (0.0 = passive, 1.0 = emergency)
"""
VOCAB_MIN = 30
VOCAB_MAX = 90
EMERGENCY = 200
if current_size <= VOCAB_MIN:
return 0.0 # Comfort zone
if current_size >= EMERGENCY:
return 1.0 # Hard limit
# Normalize position: 0.0 (at min) → 1.0 (at max)
position = (current_size - VOCAB_MIN) / (VOCAB_MAX - VOCAB_MIN)
position = max(0.0, min(1.0, position)) # Clamp to [0, 1]
# Apply Bezier curve
curve = AGGRESSIVENESS_CURVES[AGGRESSIVENESS_PROFILE]
aggressiveness = curve.get_y_for_x(position)
# Boost aggressiveness if beyond soft limit
if current_size > VOCAB_MAX:
overage = (current_size - VOCAB_MAX) / (EMERGENCY - VOCAB_MAX)
aggressiveness = aggressiveness + (1.0 - aggressiveness) * overage
return aggressiveness
from math import ceil  # used for batch sizing below

def calculate_optimization_strategy(current_size):
"""
Determine pruning strategy based on vocabulary size and aggressiveness curve.
Returns (action, aggressiveness, batch_size)
"""
VOCAB_MAX = 90
EMERGENCY = 200
aggressiveness = calculate_aggressiveness(current_size)
# Map aggressiveness to action zones
if aggressiveness < 0.2:
# 0-20%: Comfort zone, just monitor
return ("monitor", aggressiveness, 0)
elif aggressiveness < 0.5:
# 20-50%: Watch zone, flag merge opportunities
return ("watch", aggressiveness, 0)
elif aggressiveness < 0.7:
# 50-70%: Merge zone, prefer synonym merging
batch_size = max(1, ceil(aggressiveness * 10))
return ("merge", aggressiveness, batch_size)
elif aggressiveness < 0.9:
# 70-90%: Mixed zone, merge + prune
batch_size = max(2, ceil(aggressiveness * 15))
return ("mixed", aggressiveness, batch_size)
elif current_size < EMERGENCY:
# 90-100%: Emergency zone
batch_size = max(5, current_size - VOCAB_MAX + 5)
return ("emergency", aggressiveness, batch_size)
else:
# Hard limit reached
return ("block", 1.0, 0)
Curve Profiles Visualization:
Aggressiveness (y)
1.0 ┤ ╭─────── exponential
│ ╭───╯
0.9 ┤ ╭───╯
│ ╭───╯
0.8 ┤ ╭───╯ ╭──── aggressive
│ ╭───╯ ╭──╯
0.7 ┤ ╭───╯ ╭───╯
│ ╭───╯ ╭───╯ ╭───── ease-in-out
0.6 ┤ ╭───╯ ╭───╯ ╭───╯
│ ╭───╯ ╭───╯ ╭───╯
0.5 ┤╭───╯ ╭───╯ ╭───╯ ╭────── linear
│╯ ╭───╯ ╭───╯ ╭───╯
0.4 ┤ ╭───╯ ╭───╯ ╭───╯
│╭──╯ ╭───╯ ╭───╯
0.3 ┤╯ ╭───╯ ╭───╯ ╭──────── gentle
│╭───╯ ╭───╯ ╭───╯
0.2 ┤╯ ╭───╯ ╭───╯
│ ╭───╯ ╭───╯
0.1 ┤╭──╯ ╭───╯
│╯ ╭───╯
0.0 ┼─────────┴────────────────────────────────────────
30 45 60 75 90 (vocab size)
min comfort max
Configuration & Tuning:
# List available profiles
kg vocab config profiles
# Output:
# Available aggressiveness profiles:
# linear - Constant rate increase
# ease - Balanced (CSS default)
# ease-in - Slow start, fast end
# ease-out - Fast start, slow end
# ease-in-out - Smooth S-curve
# aggressive - Sharp near limit (RECOMMENDED)
# gentle - Very gradual
# exponential - Explosive near limit
# Set profile
kg vocab config set aggressiveness aggressive
# View current curve
kg vocab config show aggressiveness
# Output:
# Current profile: aggressive
# Bezier control points: (0.1, 0.0, 0.9, 1.0)
#
# Behavior:
# 30-60: Very gradual (10-20% aggressive)
# 60-75: Moderate (20-40% aggressive)
# 75-85: Accelerating (40-70% aggressive)
# 85-90: Sharp rise (70-95% aggressive)
# 90+: Emergency (95-100% aggressive)
# Custom curve (advanced)
kg vocab config set aggressiveness-custom 0.2,0.1,0.8,0.95
# Test curve without applying
kg vocab simulate --profile gentle --vocab-range 30-95
Curve Selection Guide:
| Profile | Use Case | Behavior |
|---|---|---|
| aggressive | Production (default) | Stay passive until 75, then accelerate sharply |
| ease-in-out | Balanced environments | Smooth S-curve, predictable |
| gentle | High-churn ontologies | Very gradual, minimizes disruption |
| exponential | Strict capacity limits | Explosive response near limit |
| linear | Testing/debugging | Constant rate, easy to predict |
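To compare profiles concretely, the curves defined above can be sampled at a few vocabulary sizes; the printed values are indicative, not a specification:

# Sample each profile at a few vocabulary sizes within the 30-90 window
for profile in ("linear", "ease-in-out", "aggressive", "exponential"):
    curve = AGGRESSIVENESS_CURVES[profile]
    samples = []
    for size in (45, 60, 75, 85):
        position = (size - 30) / (90 - 30)  # normalize to [0, 1]
        samples.append(f"{size}→{curve.get_y_for_x(position):.2f}")
    print(f"{profile:>12}: {', '.join(samples)}")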
Strategy Zones:
30        60        75        85  90               200
├─────────┼─────────┼─────────┼───┼─────────────────┤
│ COMFORT │  WATCH  │  MERGE  │ M │    EMERGENCY    │
│ (0-20%) │ (20-50%)│ (50-70%)│I X│    (90-100%)    │
│         │         │         │ E │                 │
│   No    │ Detect  │ Prefer  │ D │   Aggressive    │
│ Action  │  Only   │ Merging │   │    Pruning      │
└─────────┴─────────┴─────────┴───┴─────────────────┘
                                  │
                             Soft Limit
Decision Logic: Merge vs Prune
def select_optimization_action(current_size, candidates):
"""
Determine whether to merge or prune based on zone and available options.
"""
action_type, aggressiveness, batch_size = calculate_optimization_strategy(current_size)
if action_type == "monitor":
# Just flag opportunities for curator review
synonym_pairs = detect_synonym_opportunities()
if synonym_pairs:
log_merge_opportunities(synonym_pairs, action="flag_only")
return None # Don't act yet
elif action_type == "merge":
# PREFER merging (preserves edges, reduces vocabulary)
synonym_pairs = detect_synonym_opportunities()
if synonym_pairs:
# Select top N pairs by aggressiveness
pairs_to_merge = synonym_pairs[:batch_size]
return {
"action": "merge",
"pairs": pairs_to_merge,
"reason": f"Proactive merging in merge zone ({current_size}/{VOCAB_MAX})"
}
else:
# No merge candidates, prune zero-edge types only
zero_edge_types = [c for c in candidates if c.edge_count == 0]
if zero_edge_types:
return {
"action": "prune",
"types": zero_edge_types[:batch_size],
"reason": "No merge candidates, safe zero-edge pruning"
}
else:
# Can't merge or prune safely - escalate
return {"action": "escalate", "reason": "No safe optimization available"}
elif action_type == "mixed":
# Try both: merge high-similarity pairs AND prune zero-edge types
synonym_pairs = detect_synonym_opportunities()
zero_edge_types = [c for c in candidates if c.edge_count == 0]
actions = []
if synonym_pairs:
actions.append({
"action": "merge",
"pairs": synonym_pairs[:max(2, batch_size // 2)]
})
if zero_edge_types:
actions.append({
"action": "prune",
"types": zero_edge_types[:max(2, batch_size // 2)]
})
if actions:
return {
"action": "mixed",
"sub_actions": actions,
"reason": f"Mixed optimization in prune zone ({current_size}/{VOCAB_MAX})"
}
else:
# Last resort: prune low-value types with edges
return {
"action": "prune",
"types": candidates[:batch_size],
"reason": "Emergency pruning: all safe options exhausted"
}
elif action_type == "emergency":
# Aggressive: prune anything low-value, merge anything similar
return {
"action": "emergency_prune",
"types": candidates[:batch_size],
"reason": f"Emergency: vocabulary at {current_size}/{VOCAB_MAX}"
}
elif action_type == "block":
# Hard stop
raise VocabularyLimitExceeded(
f"Hard limit reached ({current_size}/{EMERGENCY}). "
f"Manual curator intervention required."
)
Merge vs Prune Decision Tree:
┌─────────────────────────────────────────────┐
│ Need to reduce vocabulary by N types │
└─────────────────┬───────────────────────────┘
│
▼
┌────────────────┐
│ Check synonyms │
└────────┬───────┘
│
┌────────┴────────┐
│ │
[Synonyms Found] [No Synonyms]
│ │
▼ ▼
┌──────────────┐ ┌──────────────┐
│ MERGE pairs │ │ Check zero- │
│ (preserves │ │ edge types │
│ edges) │ └──────┬───────┘
└──────┬───────┘ │
│ ┌─────┴──────┐
│ │ │
│ [Found] [None Found]
│ │ │
│ ▼ ▼
│ ┌──────────┐ ┌──────────┐
│ │ PRUNE │ │ PRUNE │
│ │ zero-edge│ │ low-value│
│ │ (safe) │ │ (lossy) │
│ └────┬─────┘ └────┬─────┘
│ │ │
└──────────┴─────────────┘
│
▼
┌───────────────┐
│ Batch actions │
│ to reduce │
│ invocations │
└───────────────┘
Batching Strategy:
Instead of: "Hit 91 → prune 1 → hit 91 again → prune 1 → repeat"
Do this: "Hit 90 → prune/merge 5 → back to 85 → comfortable for longer"
def execute_batched_optimization(current_size):
"""
Batch optimizations to reduce invocation frequency.
"""
if current_size <= VOCAB_MAX:
return # No action needed
# Calculate how much to prune
excess = current_size - VOCAB_MAX
buffer = 5 # Create buffer to avoid immediate re-trigger
target_reduction = excess + buffer # Remove more than minimum
# Get optimization strategy
strategy = select_optimization_action(current_size, get_candidates())
if strategy["action"] == "merge":
# Merging: each pair removes 1 type from active vocabulary
pairs_needed = target_reduction
execute_merges(strategy["pairs"][:pairs_needed])
elif strategy["action"] == "mixed":
# Do both (more efficient)
merges_completed = execute_merges(strategy["sub_actions"][0]["pairs"])
remaining = target_reduction - merges_completed
execute_prunes(strategy["sub_actions"][1]["types"][:remaining])
elif strategy["action"] == "prune":
execute_prunes(strategy["types"][:target_reduction])
log_optimization(
action=strategy["action"],
types_removed=target_reduction,
new_size=current_size - target_reduction,
buffer_created=buffer
)
Benefits of Graduated Approach:
- Reduced invocations: Proactive + batched = fewer optimization runs
- Preference for merging: Preserves graph data while reducing vocabulary
- Predictable behavior: Clear rules for when/how to optimize
- Buffer zones: Creating headroom prevents constant re-triggering
- Early warning: Monitor zone gives visibility before action required
Example Scenario:
Vocabulary grows from 60 → 92 types over 1 week:
Without aggressiveness curve:
- Hit 91 → prune 1 type → back to 90
- Hit 91 → prune 1 type → back to 90
- Hit 91 → prune 1 type → back to 90
- Hit 92 → prune 2 types → back to 90
Total: 4 optimization invocations, 5 types pruned
With aggressiveness curve:
- 60-75: Monitor, no action (flagged 3 synonym pairs)
- 75: Merged 2 synonym pairs → back to 73
- 85: Mixed optimization (merge 2, prune 3) → back to 80
- 90: Emergency batch (prune 7) → back to 83
Total: 3 optimization invocations, 12 types removed
Result: More stable, fewer invocations, better buffer
4. Three-Tier Pruning Modes
Mode Selection:
VOCABULARY_PRUNING_MODE = os.getenv("VOCAB_PRUNING_MODE", "hitl")  # HITL is the default
# Options: "naive" | "hitl" | "aitl"
AITL_CONFIDENCE_THRESHOLD = 0.7 # Fallback to HITL if AI confidence < 0.7
AITL_REASONING_MODEL = "claude-3-5-sonnet-20241022"
Mode 1: Naive (Algorithmic)
Pure bottom-up pruning, no intelligence:
def naive_prune():
"""
Automatic pruning based purely on value scores.
Use cases: Testing, CI/CD, low-stakes environments
"""
candidates = get_custom_types_ordered_by_value() # ASC
prune_count = get_active_vocabulary_size() - VOCAB_MAX
to_prune = candidates[:prune_count]
for type_obj in to_prune:
if type_obj.edge_count == 0:
delete_type(type_obj.relationship_type)
else:
deprecate_type(type_obj.relationship_type,
reason="Naive pruning: low value score")
log_pruning(mode="naive", pruned=to_prune)
Mode 2: HITL (Human-in-the-Loop) - DEFAULT
System recommends, human approves:
def hitl_prune():
"""
Generate recommendation, await curator approval.
Use cases: Production, high-stakes decisions, learning preferences
"""
candidates = get_custom_types_ordered_by_value()
prune_count = get_active_vocabulary_size() - VOCAB_MAX
# Generate recommendation
recommendation = {
"id": generate_recommendation_id(),
"timestamp": now(),
"trigger": "vocabulary_limit_exceeded",
"current_state": {
"active_types": get_active_vocabulary_size(),
"max_limit": VOCAB_MAX,
"prune_needed": prune_count
},
"suggested_actions": [
{
"action": "prune",
"types": [c.relationship_type for c in candidates[:prune_count]],
"rationale": [format_rationale(c) for c in candidates[:prune_count]]
},
{
"action": "merge",
"opportunities": detect_synonym_pairs(candidates),
"impact_analysis": calculate_merge_impact()
}
],
"status": "awaiting_approval"
}
store_recommendation(recommendation)
notify_curator(recommendation)
# Block further auto-expansion until approved
set_expansion_paused(True)
Curator CLI Workflow:
kg vocab review
# Output:
┌─────────────────────────────────────────────────────────────┐
│ Vocabulary Status: 92/90 types (OVER LIMIT) │
├─────────────────────────────────────────────────────────────┤
│ RECOMMENDED ACTIONS: │
│ │
│ [1] PRUNE 2 low-value types: │
│ • CREATES (0 edges, never used) │
│ • FEEDS_INTO (3 edges, 0 traversals, score: 0.02) │
│ │
│ [2] MERGE 1 synonym pair: │
│ • AUTHORED_BY → CREATED_BY (94% similar) │
│ │
│ Approve all? [Y/n] | Review individually? [i] │
└─────────────────────────────────────────────────────────────┘
# One-click approval
kg vocab approve-all
# Or selective
kg vocab approve recommendation 1 # Just prune
kg vocab reject recommendation 2 # Keep synonyms separate
Mode 3: AITL (AI-in-the-Loop)
Tactical decision layer with strategic human oversight:
class AITLVocabularyCurator:
"""
AI makes tactical decisions, human provides strategic oversight.
"""
def __init__(self):
self.reasoning_model = get_provider(AITL_REASONING_MODEL)
self.decision_history = []
self.curator_corrections = self._load_learned_preferences()
def make_pruning_decision(self, context):
"""
AI analyzes context and makes decision with detailed reasoning.
"""
# Build prompt with context
prompt = self._build_reasoning_prompt(context)
# Get AI decision
response = self.reasoning_model.complete(
prompt=prompt,
response_format={"type": "json_object"}
)
decision = parse_decision(response)
# Log with full justification
self._log_decision(decision, context)
# Check confidence threshold
if decision["confidence"] < AITL_CONFIDENCE_THRESHOLD:
# Fallback to HITL
return self._escalate_to_human(decision, context)
# Execute decision
return self._execute_decision(decision)
def _build_reasoning_prompt(self, context):
"""Build prompt with learned preferences."""
return f"""
You are a knowledge graph vocabulary curator. Analyze this optimization scenario:
CURRENT STATE:
- Active types: {context['active_types']} (limit: {context['max_limit']})
- Recent ingestions: {context['recent_ingestions']}
- Domain: {context['domain']}
PRUNING CANDIDATES (by value score):
{json.dumps(context['candidates'], indent=2)}
MERGE OPPORTUNITIES:
{json.dumps(context['merge_opportunities'], indent=2)}
LEARNED CURATOR PREFERENCES:
{json.dumps(self.curator_corrections, indent=2)}
TASKS:
1. Decide: prune, merge, or reject (raise limit)
2. Select specific types/pairs
3. Analyze impact on graph connectivity
4. Assess future regret probability
5. Provide detailed reasoning
Return JSON:
{{
"decision": "prune" | "merge" | "reject",
"selected_actions": [
{{"action": "prune", "type": "CREATES", "reasoning": "..."}}
],
"confidence": 0.0-1.0,
"reasoning": "Comprehensive explanation",
"alternatives_considered": [...],
"risk_assessment": {{
"connectivity_impact": "zero|low|medium|high",
"query_disruption": "none|minimal|moderate|severe",
"future_regret_probability": 0.0-1.0
}},
"human_review_required": true|false
}}
IMPORTANT: Consider learned preferences. Never prune types that humans have previously protected.
"""
def _log_decision(self, decision, context):
"""Store decision with full justification trail."""
audit_entry = {
"decision_id": generate_id(),
"timestamp": now(),
"mode": "aitl",
"model": AITL_REASONING_MODEL,
"trigger": context["trigger"],
"context": context,
"decision": decision,
"human_review_required": decision.get("human_review_required", False)
}
store_audit(audit_entry)
# Notify if flagged for review
if decision.get("human_review_required"):
notify_curator_review_required(audit_entry)
def learn_from_feedback(self, decision_id, curator_feedback):
"""
Human corrected AI decision - extract preference and update.
"""
decision = get_decision(decision_id)
# Infer preference rule
preference = self._infer_preference(decision, curator_feedback)
# Store for future decisions
self.curator_corrections.append({
"decision_id": decision_id,
"original_decision": decision["decision"],
"curator_action": curator_feedback["action"],
"reasoning": curator_feedback["reason"],
"extracted_rule": preference,
"timestamp": now()
})
# Persist
save_learned_preferences(self.curator_corrections)
def _infer_preference(self, decision, feedback):
"""Extract reusable preference rule from correction."""
if feedback["action"] == "reject_prune":
# Human rejected pruning a type
type_name = feedback["protected_type"]
return {
"rule": f"never_prune_{type_name}",
"condition": {
"relationship_type": type_name,
"reason": feedback["reason"]
}
}
elif feedback["action"] == "reject_merge":
# Human wants to keep synonyms separate
pair = feedback["synonym_pair"]
return {
"rule": f"keep_distinct_{pair[0]}_{pair[1]}",
"condition": {
"types": pair,
"semantic_distinction": feedback["reason"]
}
}
# ... more inference patterns
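Tying this to the CLI example later in this section, a rejection like the FEEDS_INTO case would reach the learning loop roughly as follows (a usage sketch; the dictionary keys mirror _infer_preference above):

curator = AITLVocabularyCurator()

# Curator rejected the AI's decision to prune FEEDS_INTO
curator.learn_from_feedback(
    decision_id="vocab_prune_20251015_1432",
    curator_feedback={
        "action": "reject_prune",
        "protected_type": "FEEDS_INTO",
        "reason": "Critical for data pipeline ontology despite low current usage",
    },
)
# Future prompts now carry the extracted rule: never_prune_FEEDS_INTO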
Human Oversight Interface:
# Review AI decisions
kg vocab decisions --since 7d
# Output:
┌──────────────────────────────────────────────────────────────┐
│ AI Vocabulary Decisions (Last 7 Days) │
├──────────────────────────────────────────────────────────────┤
│ 2025-10-15 14:32 [EXECUTED] PRUNED: CREATES, FEEDS_INTO │
│ Confidence: 87% | Impact: 3 edges │
│ AI Reasoning: "Zero usage, no traversals, no future..." │
│ ➜ [A]pprove | [R]eject & Teach | [D]etailed View │
│ │
│ 2025-10-14 03:15 [EXECUTED] MERGED: AUTHORED_BY → CREATED_BY│
│ Confidence: 91% | Impact: 27 edges │
│ AI Reasoning: "94% semantic similarity, stem match..." │
│ ➜ [A]pprove | [R]eject & Teach | [D]etailed View │
│ │
│ 2025-10-13 19:45 [FLAGGED] AWAITING HUMAN REVIEW │
│ Action: Prune OPTIMIZES │
│ Confidence: 62% (below threshold) │
│ ➜ Human decision REQUIRED │
└──────────────────────────────────────────────────────────────┘
# View detailed reasoning
kg vocab decision vocab_prune_20251015_1432 --explain
# Output:
Decision: vocab_prune_20251015_1432
Model: claude-3-5-sonnet-20241022
Confidence: 87%
DECISION: Prune CREATES and FEEDS_INTO
REASONING:
Pruned CREATES (0 edges, never matched during 15 recent ingestions)
and FEEDS_INTO (3 edges but 0 traversals in 30 days, effectively
orphaned). Rejected pruning OPTIMIZES despite borderline score because
it appears in ML-specific contexts and recent ingestions show increasing
usage (trend: +40% over 14 days).
GRAPH IMPACT ANALYSIS:
- Removing these 2 types affects 0% of active queries
- No orphaned concepts created
- Connectivity preserved
ALTERNATIVES CONSIDERED:
1. Merge AUTHORED_BY → CREATED_BY
Rejected: Semantic analysis shows AUTHORED_BY used specifically
for documentation (86% of instances) vs general object creation.
Merger would lose domain specificity.
2. Raise max limit to 100
Rejected: Trend analysis projects 105 types in 60 days, requiring
another adjustment. Better to prune now.
RISK ASSESSMENT:
- Future regret probability: 15%
- Fallback available: Yes (types archived, can restore)
# Provide corrective feedback (teaches AI)
kg vocab decision vocab_prune_20251015_1432 --reject \
--reason "FEEDS_INTO is critical for data pipeline ontology despite low current usage"
# AI learns and adds to preferences:
# - never_prune_FEEDS_INTO (when domain=data_pipeline)
5. Value Scoring Algorithm
Multi-factor scoring prevents catastrophic forgetting:
def calculate_value_score(rel_type):
"""
Value = structural utility, not temporal recency.
Factors:
- Edge count: How many edges use this type
- Traversal frequency: How often edges are queried
- Bridge bonus: Connects low-activation to high-activation concepts
- Trend: Recent usage growth
"""
stats = get_relationship_stats(rel_type)
edge_count = stats.usage_count
avg_traversal = stats.avg_traversal_count or 0
bridge_count = calculate_bridge_importance(rel_type)
trend = calculate_usage_trend(rel_type, days=14)
# Weighted formula
value_score = (
edge_count * 1.0 + # Base: edge existence
(avg_traversal / 100.0) * 0.5 + # Usage weight
(bridge_count / 10.0) * 0.3 + # Bridge preservation
max(0, trend) * 0.2 # Growth momentum
)
return value_score
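The trend factor used above is otherwise unspecified; a minimal sketch, assuming a count_edges_created helper over the usage/audit tables, compares edge creation across two adjacent windows:

def calculate_usage_trend(rel_type, days=14):
    """Fractional growth in new edges vs. the preceding window."""
    recent = count_edges_created(rel_type, since_days=days)
    prior = count_edges_created(rel_type, since_days=2 * days) - recent
    if prior == 0:
        return 1.0 if recent > 0 else 0.0  # brand-new usage counts as growth
    return (recent - prior) / prior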
def calculate_bridge_importance(rel_type):
"""
Bridge bonus: low-activation nodes connecting to high-activation nodes.
Prevents pruning critical pathways.
"""
query = """
SELECT COUNT(*) as bridge_count
FROM kg_api.edge_usage_stats e
JOIN kg_api.concept_access_stats c_from
ON e.from_concept_id = c_from.concept_id
JOIN kg_api.concept_access_stats c_to
ON e.to_concept_id = c_to.concept_id
WHERE e.relationship_type = %s
AND c_from.access_count < 10 -- Low activation source
AND c_to.access_count > 100 -- High activation destination
"""
result = execute_query(query, [rel_type])
return result['bridge_count']
Key Insight: A rarely-used type with a high bridge count (e.g., PRECEDES connecting timeline concepts) can outscore a more frequently-used type with no bridge value, so critical pathways survive pruning.
6. Protected Core Set
30 builtin types are immune to automatic pruning:
def is_protected_type(rel_type):
"""Check if type is in protected core set."""
return db.execute("""
SELECT is_builtin
FROM kg_api.relationship_vocabulary
WHERE relationship_type = %s
""", [rel_type])['is_builtin']
def prune_vocabulary(candidates):
"""Prune low-value types, respecting protections."""
for candidate in candidates:
if is_protected_type(candidate.relationship_type):
log_warning(f"Skipped pruning protected type: {candidate.relationship_type}")
continue
if candidate.edge_count == 0:
delete_type(candidate.relationship_type)
else:
deprecate_type(candidate.relationship_type)
Protected types can be merged (e.g., merge novel OPTIMIZES into builtin IMPROVES), but never deleted.
7. Edge Compaction (Synonym Merging)
When approaching max limit, merge synonyms instead of pruning:
def detect_synonym_opportunities():
"""
Find high-similarity type pairs for merging.
"""
active_types = get_active_custom_types()
synonym_pairs = []
for type_a in active_types:
for type_b in active_types:
if type_a >= type_b:
continue
# Semantic similarity via embeddings
similarity = cosine_similarity(
get_embedding(type_a),
get_embedding(type_b)
)
if similarity > 0.90:
synonym_pairs.append({
"pair": [type_a, type_b],
"similarity": similarity,
"merge_suggestion": suggest_canonical_form(type_a, type_b),
"edge_impact": count_edges(type_a) + count_edges(type_b)
})
return sorted(synonym_pairs, key=lambda x: x['edge_impact'], reverse=True)
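suggest_canonical_form can stay a simple heuristic; a sketch that prefers builtin types, then the more-used type (is_protected_type and get_relationship_stats are the helpers defined elsewhere in this ADR):

def suggest_canonical_form(type_a, type_b):
    """Pick the merge target: builtin wins, then the more-used type."""
    if is_protected_type(type_a):
        return type_a
    if is_protected_type(type_b):
        return type_b
    usage_a = get_relationship_stats(type_a).usage_count
    usage_b = get_relationship_stats(type_b).usage_count
    return type_a if usage_a >= usage_b else type_b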
def merge_relationship_types(source_type, target_type):
"""
Merge source_type into target_type.
Updates all edges in graph + vocabulary tables.
"""
    # 1. Update graph edges (Cypher). Relationship types cannot be
    #    relabeled in place, so recreate each edge under the target type.
    cypher_query = f"""
    MATCH (a)-[r:{source_type}]->(b)
    CREATE (a)-[r2:{target_type}]->(b)
    SET r2 = properties(r)
    DELETE r
    RETURN count(r2) as updated_count
    """
result = execute_graph_query(cypher_query)
# 2. Update vocabulary table
db.execute("""
UPDATE kg_api.relationship_vocabulary
SET synonyms = array_append(synonyms, %s),
usage_count = usage_count + (
SELECT usage_count
FROM kg_api.relationship_vocabulary
WHERE relationship_type = %s
)
WHERE relationship_type = %s
""", [source_type, source_type, target_type])
# 3. Deprecate source type
db.execute("""
UPDATE kg_api.relationship_vocabulary
SET is_active = FALSE,
deprecation_reason = %s
WHERE relationship_type = %s
""", [f"Merged into {target_type}", source_type])
# 4. Audit trail
log_vocabulary_merge(source_type, target_type, result['updated_count'])
8. Deletion History & Rollback
All pruning/merging operations are logged and reversible:
-- Vocabulary history (track all changes)
CREATE TABLE IF NOT EXISTS kg_api.vocabulary_history (
id SERIAL PRIMARY KEY,
relationship_type VARCHAR(100) NOT NULL,
action VARCHAR(50) NOT NULL, -- 'added', 'deprecated', 'deleted', 'merged'
performed_by VARCHAR(100),
performed_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
snapshot JSONB, -- Full type metadata at time of change
merge_target VARCHAR(100), -- If merged, what was target
affected_edges INTEGER,
details JSONB
);
CREATE INDEX idx_vocab_history_type ON kg_api.vocabulary_history(relationship_type);
CREATE INDEX idx_vocab_history_action ON kg_api.vocabulary_history(action);
Rollback support:
# View deletion history
kg vocab history --deleted
# Output:
┌────────────────────────────────────────────────────────────┐
│ Deleted/Merged Relationship Types │
├────────────────────────────────────────────────────────────┤
│ 2025-10-15 14:32 CREATES │
│ Action: Pruned (0 edges) │
│ Reason: Never used │
│ ➜ [R]estore │
│ │
│ 2025-10-14 03:15 AUTHORED_BY → CREATED_BY │
│ Action: Merged (27 edges updated) │
│ Reason: 94% semantic similarity │
│ ➜ [U]nmerge (revert) │
└────────────────────────────────────────────────────────────┘
# Restore pruned type
kg vocab restore CREATES --reason "Needed for new documentation ontology"
# Unmerge (split edges back)
kg vocab unmerge AUTHORED_BY --from CREATED_BY
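Restore can be implemented directly from the vocabulary_history snapshot; a sketch, assuming relationship_type is unique-keyed and log_vocabulary_history is a helper over the history table (db is the same handle used elsewhere in this ADR):

def restore_relationship_type(rel_type, reason):
    """Re-activate a pruned type from its most recent history snapshot."""
    row = db.execute("""
        SELECT snapshot
        FROM kg_api.vocabulary_history
        WHERE relationship_type = %s
          AND action IN ('deleted', 'deprecated')
        ORDER BY performed_at DESC
        LIMIT 1
    """, [rel_type])
    if row is None:
        raise ValueError(f"No deletion history for {rel_type}")
    # Reinsert/reactivate using the snapshot taken at deletion time
    db.execute("""
        INSERT INTO kg_api.relationship_vocabulary
        SELECT * FROM jsonb_populate_record(
            NULL::kg_api.relationship_vocabulary, %s
        )
        ON CONFLICT (relationship_type)
        DO UPDATE SET is_active = TRUE, deprecation_reason = NULL
    """, [row["snapshot"]])
    log_vocabulary_history(rel_type, action="restored", details={"reason": reason})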
9. Vocabulary State Portability (Backup/Restore Integration)
Implementation Insight: During implementation (issue discovered 2025-10-15), we found that vocabulary table state is essential for backup portability.
Problem: Initial backup system (ADR-015) only exported graph data:
{
"data": {
"concepts": [...],
"sources": [...],
"instances": [...],
"relationships": [...] // Contains edge types as strings
}
}
Relationships contain edge type strings (e.g., AUTHORED_BY, OPTIMIZES), but the vocabulary table metadata was not preserved. On restore to a fresh database:
- Graph structure restored ✅
- Edge types present in relationships ✅
- Vocabulary table empty ❌ (only 30 builtin types, missing 60+ custom types)
- Category assignments lost ❌
- Usage statistics lost ❌
- Embeddings lost ❌
- Synonym mappings lost ❌
Core Insight:
Because ADR-032 structures vocabulary as managed state (not just emergent properties), backups become snapshots of TWO things:
1. Graph data (what was ingested)
2. Vocabulary state (what was learned/curated)
This is analogous to backing up both database tables AND schema definitions - you need both for complete restoration.
Solution: Include Vocabulary Table in Backups
Modified backup format to export complete vocabulary state:
{
"version": "1.0",
"type": "full_backup",
"timestamp": "2025-10-15T12:26:31Z",
"statistics": {
"concepts": 807,
"sources": 661,
"instances": 3546,
"relationships": 1699,
"vocabulary": 90 // New: vocabulary count
},
"data": {
"concepts": [...],
"sources": [...],
"instances": [...],
"relationships": [...],
"vocabulary": [ // New: complete vocabulary table
{
"relationship_type": "AUTHORED_BY",
"description": "LLM-generated relationship type",
"category": "attribution",
"added_by": "llm_extractor",
"added_at": "2025-10-15T16:41:26Z",
"usage_count": 27,
"is_active": true,
"is_builtin": false,
"synonyms": ["CREATED_BY"],
"embedding": [0.123, -0.456, ...], // 1536-dim vector
"embedding_model": "text-embedding-ada-002",
"embedding_generated_at": "2025-10-15T16:42:00Z",
"deprecation_reason": null
},
// ... 89 more types
]
}
}
Vocabulary Import During Restore:
Vocabulary must be imported BEFORE relationships to ensure edge types exist:
def import_backup(backup_data):
"""Restore backup with vocabulary-first ordering."""
# 1. Import vocabulary FIRST (ADR-032)
if "vocabulary" in backup_data["data"]:
for entry in backup_data["data"]["vocabulary"]:
# INSERT...ON CONFLICT to handle existing types
db.execute("""
INSERT INTO kg_api.relationship_vocabulary
(relationship_type, category, description, ...)
VALUES (%s, %s, %s, ...)
ON CONFLICT (relationship_type) DO UPDATE SET
category = EXCLUDED.category,
usage_count = EXCLUDED.usage_count,
...
""", entry_values)
# 2. Import concepts (needs vocabulary for validation)
import_concepts(backup_data["data"]["concepts"])
# 3. Import sources
import_sources(backup_data["data"]["sources"])
# 4. Import instances
import_instances(backup_data["data"]["instances"])
# 5. Import relationships (edge types now exist in vocabulary)
import_relationships(backup_data["data"]["relationships"])
Backward Compatibility:
Old backups without vocabulary section still restore correctly:
# Backup integrity checker (backup_integrity.py)
if "vocabulary" in data_section:
# New backup: validate against vocabulary table
vocabulary_types = {v["relationship_type"] for v in data_section["vocabulary"]}
# Validate relationships use known types
for rel in relationships:
if rel["type"] not in vocabulary_types:
# Warn about unknown types
result.add_warning(f"Unknown type: {rel['type']}")
else:
# Old backup: validate against builtin types only
vocabulary_types = BUILTIN_RELATIONSHIP_TYPES
Why This Matters:
- Ontology Portability: Export ontology from dev → import to prod with full vocabulary context
- Disaster Recovery: Complete system state restoration (not just graph data)
- A/B Testing: Clone production vocabulary state to test environment
- Temporal Snapshots: Backup captures "what the system knew" at that moment
- Migration Safety: Vocabulary state travels with graph data during migrations
Example Scenario:
# Export ontology with learned vocabulary
kg admin backup --type ontology --ontology "ML Research Papers"
# Backup contains:
# - 250 concepts from ML domain
# - 45 relationship types (30 builtin + 15 custom)
# - Custom types: TRAINS_ON, OPTIMIZES, OUTPERFORMS, PRETRAINED_ON, ...
# - Category assignments: all 15 custom types → "ml_specific" category
# - Embeddings for synonym detection
# - Usage statistics for value scoring
# Import to fresh database
kg admin restore --file ml_research_papers.json
# Result:
# ✅ All 250 concepts restored
# ✅ All 45 relationship types available
# ✅ Custom ML types immediately usable
# ✅ Category structure preserved
# ✅ Ready for new ingestion without vocabulary re-learning
Implementation Changes:
Modified files:
- src/lib/serialization.py: Added export_vocabulary() method
- src/lib/serialization.py: Modified import_backup() to import vocabulary first
- src/api/lib/backup_integrity.py: Added vocabulary section validation
- Backup format version remains 1.0 (backward compatible)
Statistics Tracking:
Backup integrity checker now reports vocabulary statistics:
✓ Backup validated successfully
Full database backup
- Concepts: 807
- Sources: 661
- Instances: 3546
- Relationships: 1699
- Vocabulary: 90 types (30 builtin, 60 extended)
Data Integrity Note:
During testing, we discovered 851 relationships using edge types not in the vocabulary table (USED_FOR, CONTAINS, DEFINED_AS, etc.). These are pre-ADR-032 data - relationships created before vocabulary tracking was implemented. The backup integrity checker correctly flags these as warnings but still allows restore (they remain as string properties on edges).
Implementation Plan
Phase 1: Auto-Expansion Infrastructure
- Modify upsert_relationship() in age_client.py:
  - Add auto-expansion logic
  - Basic validation (format, blacklist)
  - Trigger pruning check
- Create vocabulary_manager.py service:
  - add_to_vocabulary()
  - get_active_vocabulary_size()
  - trigger_pruning_workflow()
- Add configuration:
  - VOCAB_MIN, VOCAB_MAX, VOCAB_HARD_LIMIT
  - VOCAB_PRUNING_MODE (naive|hitl|aitl)
- Update schema:
  - Add vocabulary_history table
  - Add pruning_recommendations table
Phase 2: Aggressiveness Curve + Naive Mode
- Implement aggressiveness curve:
  - Zone calculations (comfort/watch/merge/mixed/emergency)
  - Batching strategy
  - Merge vs prune decision logic
- Implement naive pruning:
  - Value score calculation
  - Automatic prune on limit exceeded
- Add synonym detection:
  - Embedding-based similarity
  - Merge suggestions in recommendations
Phase 3: HITL Mode
- Implement HITL workflow:
  - Recommendation generation with aggressiveness curve
  - Curator approval API endpoints
  - CLI commands (kg vocab review, kg vocab approve-all)
- Add monitoring:
  - Zone transition alerts
  - Optimization invocation tracking
  - Buffer effectiveness metrics
Phase 4: AITL Mode
- Build AITL curator:
  - Reasoning prompt template
  - Decision logging
  - Confidence thresholds
- Implement learning loop:
  - Curator feedback capture
  - Preference extraction
  - Preference persistence
- Add oversight interface:
  - kg vocab decisions (view AI decisions)
  - kg vocab decision {id} --explain (detailed reasoning)
  - kg vocab decision {id} --reject --reason (teach AI)
Phase 5: Rollback & Analytics
- Implement rollback:
  - kg vocab restore {type}
  - kg vocab unmerge {type}
- Add analytics:
  - kg vocab analytics (trends, value scores, zone history)
  - kg vocab candidates (pruning candidates)
  - Aggressiveness curve visualization
API Endpoints
Vocabulary Management
GET /api/vocabulary/types # List all types with stats
POST /api/vocabulary/types # Manually add type (curator)
PUT /api/vocabulary/types/{type} # Update metadata
DELETE /api/vocabulary/types/{type} # Deprecate type
POST /api/vocabulary/types/{type}/restore # Restore pruned type
POST /api/vocabulary/merge # Merge two types
POST /api/vocabulary/unmerge # Revert merge
Configuration
GET /api/vocabulary/config # Get tuning parameters
PUT /api/vocabulary/config # Update parameters (admin)
HITL Workflow
GET /api/vocabulary/recommendations # Get pending recommendations
POST /api/vocabulary/recommendations/{id}/approve
POST /api/vocabulary/recommendations/{id}/reject
AITL Workflow
GET /api/vocabulary/decisions # List AI decisions
GET /api/vocabulary/decisions/{id} # Detailed decision view
POST /api/vocabulary/decisions/{id}/feedback # Provide correction
Analytics
GET /api/vocabulary/history # Change history
GET /api/vocabulary/analytics # Value scores, trends
GET /api/vocabulary/candidates # Pruning candidates
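A hypothetical client call against these endpoints; the base URL, auth scheme, and response shape are assumptions for illustration:

import requests

BASE = "http://localhost:8000"  # assumed deployment URL
HEADERS = {"Authorization": "Bearer <token>"}  # assumed auth scheme

# List all types with stats
resp = requests.get(f"{BASE}/api/vocabulary/types", headers=HEADERS)
resp.raise_for_status()
for t in resp.json():  # assumed: a list of vocabulary entries
    print(t["relationship_type"], t["usage_count"], t["is_active"])

# Approve a pending HITL recommendation (id is illustrative)
requests.post(f"{BASE}/api/vocabulary/recommendations/rec_123/approve",
              headers=HEADERS)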
Benefits
1. Self-Regulating System
- No manual deployment for new types
- Automatic capacity management (sliding window)
- Data-driven decisions (value scores, not guesswork)
2. Domain Adaptability
- ML ontologies get TRAINS_ON, PREDICTS, OPTIMIZES
- Pipeline ontologies get FEEDS_INTO, TRANSFORMS, VALIDATES
- Semantic ontologies get SYMBOLIZES, REPRESENTS, EMBODIES
Each domain naturally grows its vocabulary through ingestion.
3. Intelligent Oversight
- Naive mode: Fast, deterministic (CI/CD)
- HITL mode: Human control (production)
- AITL mode: Scalable + justifiable (high-volume)
4. Learning System
- AI learns curator preferences over time
- Reduces false positives (e.g., never prune temporal types)
- Improves with usage (self-optimizing)
5. Auditability
- Full justification logs for every decision
- Rollback capability for mistakes
- Compliance-friendly (who, what, when, why)
Trade-offs
Complexity
Cost: More complex than static vocabulary
Mitigation: Start with naive mode, graduate to HITL, enable AITL only when needed
AI Decision Risk
Cost: AITL might make wrong pruning decisions
Mitigation:
- Confidence threshold (fallback to HITL if < 0.7)
- Protected core set (30 builtin types immune)
- Full audit trail + rollback
- Weekly human oversight
Token Cost
Cost: AITL reasoning uses ~500-1000 tokens per decision
Mitigation:
- Only runs when limit exceeded (infrequent)
- Cost: ~$0.01 per decision with Claude Sonnet
- Can disable in cost-sensitive environments
Synonym Detection Accuracy
Cost: Might merge non-synonyms (false positives)
Mitigation:
- High similarity threshold (0.90+)
- HITL/AITL approval required
- Easy unmerge via rollback
Monitoring & Metrics
Key Metrics
- Vocabulary Size Over Time
  - Track active types (should stay 30-90)
  - Alert if exceeds hard limit
- Auto-Expansion Rate
  - New types added per ingestion
  - Alert if > 5 types/job (possible LLM issue)
- Pruning Frequency
  - How often pruning is triggered
  - Target: < 1x per week
- AITL Decision Accuracy
  - % of AI decisions approved by humans
  - Target: > 85%
- Value Score Distribution
  - Histogram of type value scores
  - Identify low-value types proactively
Alerts
- vocab_size > hard_limit → Block ingestion, require curator intervention
- aitl_approval_rate < 70% → AI making poor decisions, review preferences
- auto_expansion_rate > 10/day → Possible LLM extraction issue
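A sketch of these alert checks as a periodic task; the thresholds come from this ADR, while the metric accessors are assumed helpers:

def check_vocabulary_alerts():
    """Evaluate the three alert conditions; return any triggered alerts."""
    alerts = []
    if get_active_vocabulary_size() > VOCABULARY_WINDOW["hard_limit"]:
        alerts.append("vocab_size > hard_limit: block ingestion, page curator")
    if get_aitl_approval_rate(days=30) < 0.70:
        alerts.append("aitl_approval_rate < 70%: review learned preferences")
    if get_auto_expansion_rate(days=1) > 10:
        alerts.append("auto_expansion_rate > 10/day: check LLM extraction")
    return alerts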
Security & Governance
Access Control (RBAC)
- Contributor: Can ingest (triggers auto-expansion)
- Curator: Can approve pruning recommendations
- Admin: Can modify config, force operations
Validation
- Format validation: Prevent malformed types
- Blacklist: Block profanity, reserved terms
- Rate limiting: Max 10 auto-expansions per ingestion job
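The rate limit can be enforced at the auto-expansion call site; a minimal sketch, assuming a count_expansions_for_job helper over the expansion log:

MAX_AUTO_EXPANSIONS_PER_JOB = 10

def can_auto_expand(job_id) -> bool:
    """Cap vocabulary growth per ingestion job."""
    return count_expansions_for_job(job_id) < MAX_AUTO_EXPANSIONS_PER_JOB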
Audit Trail
Every operation logged to vocabulary_audit and vocabulary_history with:
- Who (user/system/ai)
- What (action + details)
- When (timestamp)
- Why (reasoning/context)
Alternatives Considered
1. Manual Approval for Every Type (ADR-025)
Rejected: Doesn't scale for high-volume ingestion or domain-specific ontologies
2. Unlimited Vocabulary Growth
Rejected: Leads to vocabulary explosion, degraded LLM extraction quality
3. Time-Based Pruning
Rejected: Graph value is structural, not temporal. Old types can have high bridge importance.
4. No Pruning (Only Expansion)
Rejected: Eventually hits performance limits, confuses LLM with 200+ type options
5. Hardcoded If/Else Threshold Logic
Rejected: Multiple issues with maintainability and tuning
Original Approach:
# Example of hardcoded threshold logic
def calculate_aggressiveness(vocab_size):
if vocab_size < 60:
return 0.0
elif vocab_size < 70:
return 0.2
elif vocab_size < 80:
return 0.5
elif vocab_size < 90:
return 0.8
else:
return 1.0
Problems:
- Hard to Debug:
  - Which threshold is causing behavior X?
  - What happens at boundary conditions (vocab_size = 79 vs 80)?
  - Discontinuous jumps create unpredictable behavior
- Difficult to Tune:
  - Want a gentler curve? Rewrite all thresholds
  - Want a sharper curve? Add more if/elif branches
  - Every tuning attempt requires code changes and deployment
- Not Visualizable:
  - Can't graph the behavior easily
  - Hard to communicate to non-technical stakeholders
  - No way to preview changes before deploying
- Maintenance Burden:
  - Each environment might need different thresholds
  - Testing requires multiple code paths
  - Adding new zones means rewriting logic
Example Debugging Scenario:
# Bug report: "System pruned aggressively at 78 types"
# Developer has to trace through:
if vocab_size < 60:        # Not here
    ...
elif vocab_size < 70:      # Not here
    ...
elif vocab_size < 80:      # AH! Here's the culprit
    return 0.5             # But why 0.5? Is that right for 78?
# And what about 79? 77? Where's the sweet spot?
Why Bezier is Better:
# Single line configuration
curve = AGGRESSIVENESS_CURVES["aggressive"]
aggressiveness = curve.get_y_for_x(position)
# Debugging: "What's aggressiveness at 78 types?"
# Answer: Plot curve, see exact value (e.g., 0.67)
# Visual, continuous, predictable
# Tuning: "Too aggressive at 78?"
# Change: VOCAB_AGGRESSIVENESS = "gentle"
# No code changes, no deployment
Bezier Benefits:
- ✅ Continuous function (smooth behavior, no jumps)
- ✅ Visually tunable (drag control points, see result)
- ✅ Configuration-based (no code changes)
- ✅ Familiar to developers (CSS animations use the same math)
- ✅ Easy to debug (plot curve, see exact behavior)
- ✅ Environment-specific (dev vs prod can use different profiles)
Trade-off:
- More complex implementation (CubicBezier class)
- But: implementation is one-time, benefits are ongoing
- And: standard algorithm, well-tested, no surprises
Success Criteria
Phase 1 (Auto-Expansion)
- [ ] New types auto-added during ingestion
- [ ] No code deployment required for new types
- [ ] Vocabulary size tracked and alerts functional
Phase 2 (HITL)
- [ ] Curator can approve/reject recommendations in < 2 minutes
- [ ] Pruning maintains vocabulary at 30-90 types
- [ ] Zero false positives (protected types never pruned)
Phase 3 (AITL)
- [ ] AI decision approval rate > 85%
- [ ] AI learns from corrections (preferences applied)
- [ ] Detailed justification logs for compliance
Phase 4 (Rollback)
- [ ] Can restore any pruned type
- [ ] Can unmerge any synonym pair
- [ ] Full change history queryable
References
- ADR-022: 30-Type Semantically Sparse Taxonomy (current static system)
- ADR-025: Dynamic Relationship Vocabulary (skip-and-approve workflow)
- ADR-026: Autonomous Vocabulary Curation (LLM-assisted suggestions)
- ADR-047: Probabilistic Vocabulary Categorization (embedding-based category assignment)
- ADR-046: Grounding-Aware Vocabulary Management (synonym detection, compaction workflow)
- ADR-014: Job Approval Workflow (HITL pattern)
- ADR-021: Live Man Switch (human oversight principles)
Future Enhancements
Phase 5: Advanced Learning
- Cross-ontology type analysis (find domain patterns)
- Predictive type suggestions (recommend types before ingestion)
- Automatic category inference via clustering
Phase 6: Distributed Vocabulary
- Multi-tenant vocabulary namespaces
- Vocabulary inheritance (base + domain-specific)
- Federated type sharing across organizations
Status: Proposed
Next Steps:
1. Review with development team
2. Prototype auto-expansion in feature branch
3. Test naive mode with sample ingestions
4. Pilot HITL workflow with curator
5. Evaluate AITL with safety checks