ADR-026: Autonomous Vocabulary Curation and Ontology Management
Status: Proposed (Theoretical Enhancement) Date: 2025-10-10 Deciders: System Architects Related: ADR-025 (Dynamic Relationship Vocabulary), ADR-024 (Multi-Schema PostgreSQL)
Overview
When your knowledge graph can learn new relationship types automatically (as enabled in ADR-025), you quickly discover a practical challenge: who decides if "ENHANCES", "IMPROVES", and "STRENGTHENS" are really different concepts or just three ways to say the same thing? A human curator could review every new term, but with 50+ new types appearing from large document sets, this becomes a full-time job that slows down your knowledge building.
This ADR explores using AI to help with vocabulary curation—essentially having the system clean up after itself. Imagine the AI extracting concepts uses GPT-4, and when new relationship types pile up, a separate AI agent (using embeddings to measure semantic similarity) reviews them and suggests: "These five types all mean roughly the same thing—should we consolidate them?" The human curator approves or rejects these suggestions, creating a collaborative workflow. The system also tracks which relationship types are actually being used versus just cluttering the vocabulary, enabling smart pruning decisions. This is theoretical enhancement rather than immediate implementation—exploring how to scale vocabulary management without requiring constant human oversight, while keeping humans in the loop for final decisions about the knowledge structure.
Context
ADR-025 establishes a curator-driven workflow for managing relationship vocabulary growth. While effective, the manual curation process has scalability limitations:
Current Curation Workflow (ADR-025)
# Curator manually reviews skipped relationships
kg vocabulary review
# Output: 127 occurrences of "ENHANCES" across 15 documents
# Curator manually decides: new type or synonym?
kg vocabulary add ENHANCES --category augmentation --description "..."
# OR
kg vocabulary alias ENHANCES --maps-to SUPPORTS
Scalability Challenges
- Manual Bottleneck:
- Large ingestion jobs can produce 50+ unique relationship types
- Curator must review each one individually
- Decision fatigue on semantic similarity judgments
-
Slows down knowledge graph growth
-
Limited Semantic Analysis:
- Curator relies on intuition for synonym detection
- No formal similarity metrics between relationship types
- Risk of creating near-duplicate types (ENHANCES vs IMPROVES vs STRENGTHENS)
-
Inconsistent categorization across curators
-
Reactive Ontology Evolution:
- Vocabulary changes discovered post-ingestion
- No proactive trend analysis
- Missing strategic insights about domain evolution
- Ontology "drifts" rather than "evolves" purposefully
Research Context
- LLM-Assisted Schema Matching: Recent research demonstrates LLMs can identify semantic equivalence between schema elements with >85% accuracy (GPT-4 on COMA++ benchmark)
- Ontology Versioning Standards: Semantic Web community uses OWL versioning (owl:versionInfo, owl:priorVersion) for formal schema evolution tracking
- Knowledge Discovery in Databases: Skipped relationships table represents a "schema change recommendation stream" amenable to data mining
Decision
Introduce three autonomous enhancements to vocabulary management, transforming the curator from operator to validator:
1. LLM-Assisted Synonym and Category Suggestion
Automated Analysis Pipeline:
async def suggest_vocabulary_actions(relationship_type: str):
"""
LLM-powered analysis of skipped relationship types.
Returns:
- Synonym confidence scores for existing types
- Suggested category if creating new type
- Semantic description proposal
- Decision recommendation (add vs alias)
"""
# Retrieve context
skipped_instances = get_skipped_instances(relationship_type)
existing_vocab = get_relationship_vocabulary()
# Build LLM prompt
prompt = f"""
You are a knowledge graph ontology curator. Analyze whether the relationship type
'{relationship_type}' should be:
1. Aliased to an existing type (synonym), OR
2. Added as a new distinct semantic relationship
EXISTING VOCABULARY:
{format_vocabulary_with_descriptions(existing_vocab)}
SKIPPED RELATIONSHIP EXAMPLES:
{format_skipped_examples(skipped_instances[:10])}
For each existing type, provide:
- Semantic similarity score (0.0-1.0)
- Reasoning
If similarity > 0.85: RECOMMEND aliasing to most similar type
If similarity < 0.85: RECOMMEND new type with suggested category and description
Output JSON:
{{
"recommendation": "alias" | "new_type",
"similar_types": [
{{"type": "SUPPORTS", "similarity": 0.78, "reasoning": "..."}},
...
],
"if_alias": {{"canonical_type": "SUPPORTS", "confidence": 0.92}},
"if_new": {{
"suggested_category": "augmentation",
"suggested_description": "One concept improves another...",
"confidence": 0.88
}}
}}
"""
response = await llm_client.complete(
prompt=prompt,
model="gpt-4o",
response_format={"type": "json_object"}
)
return parse_suggestion(response)
Enhanced Curator Workflow:
# LLM pre-analyzes and suggests actions
kg vocabulary review --with-suggestions
# Output:
# ┌─────────────┬────────────┬──────────────┬─────────────────────────────┐
# │ Type │ Count │ Suggestion │ Reasoning │
# ├─────────────┼────────────┼──────────────┼─────────────────────────────┤
# │ ENHANCES │ 127 │ NEW TYPE │ Distinct from SUPPORTS │
# │ │ │ │ (similarity: 0.72) │
# │ │ │ │ Category: augmentation │
# ├─────────────┼────────────┼──────────────┼─────────────────────────────┤
# │ IMPROVES │ 89 │ ALIAS │ Synonym of ENHANCES │
# │ │ │ → ENHANCES │ (similarity: 0.94) │
# ├─────────────┼────────────┼──────────────┼─────────────────────────────┤
# │ VALIDATES │ 56 │ ALIAS │ Evidential relationship │
# │ │ │ → SUPPORTS │ (similarity: 0.88) │
# └─────────────┴────────────┴──────────────┴─────────────────────────────┘
#
# Curator validation:
# Press [A]pprove all | [R]eview individually | [C]ancel
# One-click approval for high-confidence suggestions
kg vocabulary approve-batch --confidence-threshold 0.9
Token Efficiency:
- LLM analysis runs once per batch (not per concept)
- Embedding-based pre-filtering reduces LLM calls
- Results cached in
vocabulary_suggestionstable - Cost: ~$0.05 per 100 relationship types analyzed
2. Formal Ontology Versioning
Immutable Version History:
-- Ontology Version Registry
CREATE TABLE kg_api.ontology_versions (
version_id SERIAL PRIMARY KEY,
version_number VARCHAR(20) NOT NULL, -- Semantic versioning: 1.2.3
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by VARCHAR(100),
change_summary TEXT,
is_active BOOLEAN DEFAULT TRUE,
-- Snapshot of vocabulary at this version
vocabulary_snapshot JSONB NOT NULL,
-- Change metadata
types_added TEXT[],
types_aliased JSONB, -- [{"IMPROVES": "ENHANCES"}, ...]
types_deprecated TEXT[],
-- Compatibility
backward_compatible BOOLEAN,
migration_required BOOLEAN,
UNIQUE(version_number)
);
-- Concept Provenance (track which version created each concept)
CREATE TABLE kg_api.concept_version_metadata (
concept_id VARCHAR(100) PRIMARY KEY,
created_in_version INTEGER REFERENCES kg_api.ontology_versions(version_id),
last_modified_version INTEGER REFERENCES kg_api.ontology_versions(version_id)
);
-- Historical Vocabulary View (time-travel queries)
CREATE VIEW kg_api.vocabulary_at_version AS
SELECT
ov.version_number,
ov.created_at,
jsonb_array_elements(ov.vocabulary_snapshot) as vocabulary_entry
FROM kg_api.ontology_versions ov;
Semantic Versioning Rules:
def increment_ontology_version(changes: dict):
"""
Semantic versioning for ontology changes.
MAJOR.MINOR.PATCH:
- MAJOR: Breaking changes (type removed, semantics fundamentally changed)
- MINOR: New relationship types added (backward compatible)
- PATCH: Aliases added, descriptions improved (no schema change)
"""
current_version = get_current_ontology_version() # e.g., "1.2.3"
major, minor, patch = parse_version(current_version)
if changes['types_removed'] or changes['semantics_changed']:
# Breaking change
return f"{major + 1}.0.0"
elif changes['types_added']:
# New types (backward compatible)
return f"{major}.{minor + 1}.0"
else:
# Aliases or description updates only
return f"{major}.{minor}.{patch + 1}"
Automatic Version Creation:
async def approve_vocabulary_change(action: str, details: dict):
"""
Every vocabulary change triggers version increment.
"""
# Calculate version increment
new_version = increment_ontology_version(action, details)
# Create immutable snapshot
current_vocab = get_full_vocabulary()
version_record = {
'version_number': new_version,
'created_by': current_user.username,
'change_summary': generate_change_summary(action, details),
'vocabulary_snapshot': current_vocab,
'backward_compatible': is_backward_compatible(action),
'types_added': details.get('new_types', []),
'types_aliased': details.get('aliases', {}),
'types_deprecated': details.get('deprecated', [])
}
await db.execute(
"INSERT INTO kg_api.ontology_versions (...) VALUES (...)",
version_record
)
# Audit trail
await log_version_change(new_version, action, details)
Time-Travel Queries:
-- Find concepts created with vocabulary from v1.2.x
MATCH (c:Concept)
WHERE c.created_in_version STARTS WITH '1.2'
RETURN c
-- Query graph as it existed at version 1.1.0
-- (using vocabulary snapshot to reinterpret relationship types)
SELECT * FROM kg_api.vocabulary_at_version
WHERE version_number = '1.1.0'
Migration Support:
# When breaking change occurs (v1.x.x → v2.0.0)
kg ontology migrate --from-version 1.5.2 --to-version 2.0.0
# Shows:
# BREAKING CHANGES:
# - Type REMOVED: "VALIDATES" (use "SUPPORTS" instead)
# - Type SEMANTICS CHANGED: "ENHANCES" now requires confidence > 0.8
#
# Migration will:
# 1. Remap all VALIDATES edges → SUPPORTS
# 2. Flag low-confidence ENHANCES edges for review
#
# Affected: 1,247 edges across 3 ontologies
# Estimated time: 5 minutes
#
# Proceed? [Y/n]
3. Advanced Vocabulary Analytics Dashboard
Strategic Knowledge Discovery Interface:
class VocabularyAnalyticsDashboard:
"""
Transform skipped_relationships from maintenance log into strategic insight tool.
"""
def get_emerging_relationship_trends(self, days: int = 30):
"""
Identify relationship types with accelerating occurrence.
"""
return db.execute(f"""
WITH daily_counts AS (
SELECT
relationship_type,
DATE(first_seen) as date,
SUM(occurrence_count) as daily_count
FROM kg_api.skipped_relationships
WHERE first_seen > NOW() - INTERVAL '{days} days'
GROUP BY relationship_type, DATE(first_seen)
),
growth_rates AS (
SELECT
relationship_type,
REGR_SLOPE(daily_count, EXTRACT(EPOCH FROM date)) as growth_rate,
AVG(daily_count) as avg_daily_count
FROM daily_counts
GROUP BY relationship_type
)
SELECT
relationship_type,
growth_rate,
avg_daily_count,
growth_rate * avg_daily_count as trend_score
FROM growth_rates
WHERE growth_rate > 0
ORDER BY trend_score DESC
LIMIT 10
""")
def get_relationship_cooccurrence_network(self):
"""
Which relationship types appear together in the same documents?
Reveals semantic clusters.
"""
return db.execute("""
SELECT
a.relationship_type as type_a,
b.relationship_type as type_b,
COUNT(DISTINCT a.job_id) as cooccurrence_count,
CORR(a.occurrence_count, b.occurrence_count) as correlation
FROM kg_api.skipped_relationships a
JOIN kg_api.skipped_relationships b
ON a.job_id = b.job_id
AND a.relationship_type < b.relationship_type
GROUP BY a.relationship_type, b.relationship_type
HAVING COUNT(DISTINCT a.job_id) > 3
ORDER BY cooccurrence_count DESC
""")
def get_ontology_vocabulary_fingerprint(self, ontology: str):
"""
What makes this ontology's vocabulary unique?
"""
return db.execute(f"""
-- TF-IDF style scoring for relationship types per ontology
WITH type_freq AS (
SELECT
ontology,
relationship_type,
SUM(occurrence_count) as freq
FROM kg_api.skipped_relationships
GROUP BY ontology, relationship_type
),
inverse_doc_freq AS (
SELECT
relationship_type,
LOG(COUNT(DISTINCT ontology)) as idf
FROM kg_api.skipped_relationships
GROUP BY relationship_type
)
SELECT
tf.relationship_type,
tf.freq,
idf.idf,
tf.freq * idf.idf as distinctiveness_score
FROM type_freq tf
JOIN inverse_doc_freq idf ON tf.relationship_type = idf.relationship_type
WHERE tf.ontology = '{ontology}'
ORDER BY distinctiveness_score DESC
LIMIT 20
""")
def predict_vocabulary_growth(self, months: int = 6):
"""
Forecast vocabulary size and recommend pruning thresholds.
"""
historical_growth = self.get_vocabulary_growth_history()
# Simple linear regression (upgrade to Prophet/ARIMA for production)
slope = calculate_growth_rate(historical_growth)
current_size = get_vocabulary_size()
projected_size = current_size + (slope * months * 30)
if projected_size > VOCABULARY_WINDOW['max']:
pruning_needed = projected_size - VOCABULARY_WINDOW['max']
return {
'forecast': projected_size,
'action_required': True,
'pruning_recommendation': {
'types_to_prune': pruning_needed,
'suggested_candidates': get_low_value_types(limit=pruning_needed)
}
}
return {'forecast': projected_size, 'action_required': False}
Visualization Examples:
# Emerging relationship types (trending up)
kg vocabulary analytics trends --days 30
# Output (with sparkline):
# ┌──────────────┬────────────┬─────────────────────────────────┐
# │ Type │ Growth Rate│ Trend (30 days) │
# ├──────────────┼────────────┼─────────────────────────────────┤
# │ OPTIMIZES │ +340% │ ▁▂▃▅▇█ (accelerating) │
# │ MONITORS │ +180% │ ▂▃▅▆▇█ (steady growth) │
# │ DELEGATES │ +90% │ ▃▄▅▆▆█ (recent spike) │
# └──────────────┴────────────┴─────────────────────────────────┘
# Relationship co-occurrence network
kg vocabulary analytics network --min-cooccurrence 5
# Output (ASCII graph):
# ENHANCES ---- INTEGRATES
# | |
# OPTIMIZES ---- DELEGATES
# | |
# MONITORS ---- VALIDATES
#
# Cluster 1: Performance (OPTIMIZES, MONITORS)
# Cluster 2: Integration (INTEGRATES, DELEGATES)
# Cluster 3: Validation (VALIDATES, ENHANCES)
# Ontology-specific vocabulary signature
kg vocabulary analytics fingerprint "ML Systems"
# Output:
# Distinctive relationship types for "ML Systems":
# 1. TRAINS_ON (87% unique to this ontology)
# 2. PREDICTS (76% unique)
# 3. OPTIMIZES (68% unique)
#
# Suggests: ML domain needs specialized vocabulary beyond core types
# Vocabulary growth forecast
kg vocabulary analytics forecast --months 6
# Output:
# Current vocabulary: 87 active types
# Projected (6 months): 142 types
#
# ⚠ EXCEEDS MAX LIMIT (100 types)
# Recommended action: Prune 42 low-value types
# Candidates: [list of least-used types with value scores]
Curator Dashboard UI (Future):
╔═══════════════════════════════════════════════════════════════╗
║ VOCABULARY ANALYTICS DASHBOARD ║
╠═══════════════════════════════════════════════════════════════╣
║ ║
║ 📈 TRENDING TYPES (30 days) 🔗 CO-OCCURRENCE NET ║
║ ┌─────────────────────────┐ ┌──────────────────┐ ║
║ │ OPTIMIZES ▁▃▅▇█ +340%│ │ [Graph View] │ ║
║ │ MONITORS ▂▄▆▇█ +180%│ │ │ ║
║ └─────────────────────────┘ └──────────────────┘ ║
║ ║
║ 🎯 ONTOLOGY FINGERPRINTS 📊 GROWTH FORECAST ║
║ ┌─────────────────────────┐ ┌──────────────────┐ ║
║ │ ML Systems: │ │ Current: 87 │ ║
║ │ - TRAINS_ON (87%) │ │ +6mo: 142 ⚠ │ ║
║ │ - PREDICTS (76%) │ │ Action: Prune 42 │ ║
║ └─────────────────────────┘ └──────────────────┘ ║
╚═══════════════════════════════════════════════════════════════╝
Implementation Plan
Phase 1: LLM-Assisted Suggestions (2-3 weeks)
- Create
vocabulary_suggestionstable for caching LLM analysis - Implement
suggest_vocabulary_actions()function - Add
--with-suggestionsflag tokg vocabulary review - Build
approve-batchcommand for high-confidence suggestions - Test accuracy on historical skipped relationships
Phase 2: Ontology Versioning (2-3 weeks)
- Create
ontology_versionsandconcept_version_metadatatables - Implement semantic versioning logic
- Add version tracking to all vocabulary mutations
- Build time-travel query support
- Create migration tool for breaking changes
Phase 3: Analytics Dashboard (3-4 weeks)
- Implement analytics queries (trends, cooccurrence, fingerprints)
- Build CLI visualization commands
- Create forecast models (linear regression → time series)
- Develop curator dashboard UI (web-based)
- Integrate with monitoring/alerting system
Benefits
1. Curator Efficiency
- 10x faster curation: LLM suggests actions, curator validates (not discovers)
- Reduced cognitive load: Clear recommendations with confidence scores
- Batch operations: Approve 50+ types in one command vs 50 individual decisions
2. Ontology Quality
- Consistent categorization: LLM applies uniform semantic analysis
- Fewer synonyms: Automated similarity detection prevents duplicates
- Formal versioning: Clear evolution history, no "drift"
3. Strategic Insights
- Proactive vocabulary planning: Forecasts prevent reactive scrambling
- Domain discovery: Trending types reveal emerging concepts in corpus
- Ontology profiling: Understand what makes each domain unique
4. System Intelligence
- Self-improving: Analytics feed back into curation recommendations
- Transparent evolution: Version history provides full audit trail
- Migration safety: Breaking changes handled with formal process
Metrics
Curation Speed: - Manual review time: 5 min/type (current) → 30 sec/type (with suggestions) - Batch approval: <1 minute for 50 high-confidence suggestions
Accuracy: - LLM synonym detection: Target >90% precision - Category suggestions: Target >85% accuracy (vs curator ground truth)
Knowledge Discovery: - Trending types identified: 5-10 per month - Ontology fingerprints: 10-20 distinctive types per ontology - Growth forecast: ±15% accuracy over 6 months
Risks and Mitigations
Risk 1: LLM Hallucination in Suggestions
Risk: LLM suggests incorrect synonym mapping or category
Mitigation: - Curator ALWAYS validates (LLM is advisor, not decision-maker) - Confidence thresholds (only show suggestions >0.8) - Audit trail tracks LLM suggestions vs curator decisions - Periodic accuracy review to retrain prompts
Risk 2: Version Explosion
Risk: Every minor change creates new version (table bloat)
Mitigation: - Batch changes into logical releases (weekly/monthly) - Compress patch versions (only major/minor versions get snapshots) - Archive old versions (>1 year) to cold storage
Risk 3: Analytics Complexity
Risk: Too many metrics overwhelm curators
Mitigation: - Start with 3 key dashboards (trends, network, forecast) - Progressive disclosure (simple view by default, details on demand) - Contextual recommendations ("Try expanding vocabulary in emerging areas")
Alternatives Considered
Alternative 1: Fully Automated Vocabulary (No Curator)
Rejected: - Removes human oversight of semantic quality - Risk of vocabulary explosion (LLM may over-create types) - Cannot handle domain-specific nuance - Violates "curator as validator" principle
Decision: Keep human in loop, use LLM as assistant
Alternative 2: Manual Analytics (SQL Queries)
Rejected: - Requires curator to write complex SQL - No predictive capabilities - Insights hidden in raw data
Decision: Pre-built analytics with visualization
Alternative 3: Snapshot-Based Versioning (Not Immutable)
Rejected: - Cannot do time-travel queries reliably - Version history can be accidentally modified - Breaks audit compliance
Decision: Immutable version records with JSONB snapshots
Success Criteria
Phase 1 (LLM Suggestions): - [ ] 80% of suggestions accepted by curator (high trust) - [ ] <30 seconds per relationship type review (vs 5 min manual) - [ ] Zero false positives in high-confidence (>0.9) suggestions
Phase 2 (Versioning): - [ ] Every vocabulary change tracked in version history - [ ] Breaking changes flagged with migration path - [ ] Time-travel queries return correct historical vocabulary
Phase 3 (Analytics): - [ ] Trending types dashboard identifies 5+ actionable insights/month - [ ] Vocabulary growth forecast within ±15% accuracy - [ ] Curator uses dashboard weekly (tracked via usage logs)
References
- LLM Schema Matching:
- Gu et al. (2024): "GPT-4 for Schema Matching" - 87% accuracy on COMA++ benchmark
-
Li et al. (2023): "Large Language Models as Ontology Aligners"
-
Ontology Versioning:
- W3C OWL 2 Web Ontology Language: Versioning semantics
- Klein & Fensel (2001): "Ontology Versioning and Change Detection on the Web"
-
Semantic Web Best Practices: owl:versionInfo, owl:priorVersion
-
Knowledge Discovery:
- Fayyad et al. (1996): "From Data Mining to Knowledge Discovery in Databases"
-
Codd (1993): "Providing OLAP to User-Analysts" - Dimensional analytics
-
Related ADRs:
- ADR-025: Dynamic Relationship Vocabulary Management
- ADR-024: Multi-Schema PostgreSQL Architecture
- ADR-014: Job Approval Workflow (human-in-loop pattern)
Future Enhancements
Phase 4: Active Learning Loop
- Track curator overrides of LLM suggestions
- Retrain suggestion model on curator feedback
- Personalized suggestions per curator (learn their preferences)
Phase 5: Cross-Ontology Alignment
- Detect when different ontologies use different types for same concept
- Suggest canonical mappings across ontologies
- Enable federated graph queries
Phase 6: Natural Language Curation
# Natural language interface for curators
kg vocabulary curate "Make OPTIMIZES a synonym of IMPROVES"
# → System translates to: kg vocabulary alias OPTIMIZES --maps-to IMPROVES
kg vocabulary curate "Show me types related to performance"
# → System searches by semantic category and embeddings
Status: Proposed for discussion Next Steps: 1. Review with knowledge graph team 2. Pilot LLM suggestions on historical skipped_relationships 3. Measure curator time savings 4. Build versioning infrastructure 5. Prototype analytics dashboard