Epistemic Status Query Filtering
Feature: ADR-065 Phase 2
Status: Implemented (2025-11-16)
API: GraphQueryFacade.match_concept_relationships()
Overview
Epistemic status filtering lets you query relationships by their epistemic status (also called a semantic role in this document). The status is a classification derived from grounding patterns that indicates whether a relationship type tends to be affirmative, contested, contradictory, or historical.
This enables powerful dialectical queries such as:
- "Show me only high-confidence relationships" (AFFIRMATIVE)
- "Show me points of tension and contradiction" (CONTESTED + CONTRADICTORY)
- "Exclude outdated relationships" (exclude HISTORICAL)
- "Find relationships that are actively debated" (CONTESTED only)
Epistemic Status Classifications
Roles are automatically detected by measuring grounding patterns across vocabulary types:
| Role | Avg Grounding | Meaning | Example Use Case |
|---|---|---|---|
| AFFIRMATIVE | > 0.8 | High-confidence, well-supported relationships | Building consensus views, finding established connections |
| CONTESTED | 0.2 to 0.8 | Mixed grounding, actively debated | Exploring uncertainty, finding areas needing investigation |
| CONTRADICTORY | < -0.5 | Negative grounding, oppositional | Dialectical analysis, identifying conflicts |
| HISTORICAL | N/A | Temporal vocabulary (detected by name) | Time-based filtering, evolution tracking |
| UNCLASSIFIED | Other | Doesn't fit known patterns | Default fallback |
| INSUFFICIENT_DATA | N/A | < 3 measurements | Need more data |
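The thresholds in the table can be read as a small decision procedure. The following is a minimal sketch, not the project's actual classifier: the function name, the name markers used for HISTORICAL detection, the order of the checks, and the handling of exact boundary values (0.2, 0.8, -0.5) are all assumptions made for illustration.

```python
# Thresholds copied from the table above; boundary handling is an assumption.
AFFIRMATIVE_MIN = 0.8
CONTESTED_MIN = 0.2
CONTRADICTORY_MAX = -0.5
MIN_MEASUREMENTS = 3

# Hypothetical name markers; the real temporal-vocabulary detection is not shown here.
HISTORICAL_MARKERS = ("HISTORICAL", "PRECEDED", "FORMERLY")


def classify_status(name: str, groundings: list[float]) -> str:
    """Return an epistemic status label for one vocabulary type."""
    # HISTORICAL is detected by name, independent of grounding values.
    if any(marker in name.upper() for marker in HISTORICAL_MARKERS):
        return "HISTORICAL"
    if len(groundings) < MIN_MEASUREMENTS:
        return "INSUFFICIENT_DATA"
    avg = sum(groundings) / len(groundings)
    if avg > AFFIRMATIVE_MIN:
        return "AFFIRMATIVE"
    if avg >= CONTESTED_MIN:
        return "CONTESTED"
    if avg < CONTRADICTORY_MAX:
        return "CONTRADICTORY"
    # Averages between -0.5 and 0.2 match no named pattern.
    return "UNCLASSIFIED"
```

Under these assumptions, an average grounding of +0.232 lands in CONTESTED, while +0.165 and -0.049 fall into the gap between the CONTRADICTORY and CONTESTED ranges and come out as UNCLASSIFIED.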
How It Works
- Measurement: Run kg vocab epistemic-status measure to analyze grounding patterns
- Storage: Semantic roles stored as VocabType properties (v.epistemic_status, v.epistemic_stats)
- Querying: Use include_epistemic_status or exclude_epistemic_status parameters in GraphQueryFacade
- Filtering: Facade queries VocabType nodes, builds relationship type list dynamically
- Results: Only relationships matching role criteria are returned
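The filtering step above amounts to turning status criteria into a list of relationship type names. Here is a rough sketch of that resolution over an in-memory mapping. The function resolve_rel_types and its parameters are hypothetical, and the treatment of unmeasured types (status None) and of requested types missing from the vocabulary is an assumption, not documented facade behavior.

```python
from typing import Iterable, Optional


def resolve_rel_types(
    vocab_status: dict[str, Optional[str]],
    rel_types: Optional[Iterable[str]] = None,
    include: Optional[Iterable[str]] = None,
    exclude: Optional[Iterable[str]] = None,
) -> list[str]:
    """Return the relationship type names that satisfy the status criteria."""
    # Start from all known types, or from the caller's explicit subset.
    if rel_types is None:
        candidates = set(vocab_status)
    else:
        candidates = set(rel_types) & set(vocab_status)
    if include is not None:
        allowed = set(include)
        candidates = {t for t in candidates if vocab_status[t] in allowed}
    if exclude is not None:
        blocked = set(exclude)
        candidates = {t for t in candidates if vocab_status[t] not in blocked}
    return sorted(candidates)
```

In this sketch an unmeasured type survives an exclude filter but never matches an include filter, which is one plausible reading of "only relationships matching role criteria are returned".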
Philosophy: Semantic roles are temporal measurements, not permanent classifications. Re-running measurement as your graph evolves can yield different results. This embraces bounded locality + satisficing (ADR-065).
Enabling Epistemic Status Filtering
Step 1: Measure Epistemic Status
Run the measurement command via kg CLI to analyze grounding patterns:
# Basic measurement (stores to database by default)
kg vocab epistemic-status measure
# Measure without storing (analysis only)
kg vocab epistemic-status measure --no-store
# Larger sample for more precision
kg vocab epistemic-status measure --sample-size 500
# Detailed analysis with uncertainty metrics
kg vocab epistemic-status measure --sample-size 200 --verbose
Output Example:
Epistemic Status Measurement Report
=================================
Summary:
CONTESTED: 1
UNCLASSIFIED: 6
INSUFFICIENT_DATA: 28
CONTESTED (1)
• ENABLES
8 measurements from 8/8 edges | avg grounding: +0.232
📝 Storing epistemic statuss to VocabType nodes...
✓ Stored 35/35 epistemic statuss to VocabType nodes
Phase 2 query filtering now available via GraphQueryFacade.match_concept_relationships()
Step 2: Verify Storage
Check that epistemic statuses were stored:
from api.api.lib.age_client import AGEClient
client = AGEClient()
facade = client.facade
# List vocabulary types with epistemic statuses
vocab_types = facade.match_vocab_types(
    where="v.epistemic_status IS NOT NULL"
)
for vt in vocab_types:
    props = vt['v']['properties']
    # The + flag prints an explicit sign, matching the example output
    print(f"{props['name']}: {props['epistemic_status']} (avg: {props['epistemic_stats']['avg_grounding']:+.3f})")
Example Output:
ENABLES: CONTESTED (avg: +0.232)
SUPPORTS: UNCLASSIFIED (avg: +0.165)
INFLUENCES: UNCLASSIFIED (avg: -0.049)
API Usage
Basic Role Filtering
from api.api.lib.age_client import AGEClient
client = AGEClient()
facade = client.facade
# Include only AFFIRMATIVE relationships (high confidence)
affirmative = facade.match_concept_relationships(
    include_epistemic_status=["AFFIRMATIVE"],
    limit=10
)
# Exclude HISTORICAL relationships (current state only)
current = facade.match_concept_relationships(
    exclude_epistemic_status=["HISTORICAL"],
    limit=10
)
Dialectical Queries
# Explore areas of tension and contradiction
dialectical = facade.match_concept_relationships(
    include_epistemic_status=["CONTESTED", "CONTRADICTORY"],
    limit=20
)
# Find well-established connections (thesis)
thesis = facade.match_concept_relationships(
    include_epistemic_status=["AFFIRMATIVE"]
)
# Find points of disagreement (antithesis)
antithesis = facade.match_concept_relationships(
    include_epistemic_status=["CONTESTED", "CONTRADICTORY"]
)
Combined Filtering
# Specific relationship type + epistemic status
enables_contested = facade.match_concept_relationships(
    rel_types=["ENABLES"],
    include_epistemic_status=["CONTESTED"],
    limit=10
)
# Multiple types + role filter
causal_affirmative = facade.match_concept_relationships(
    rel_types=["ENABLES", "CAUSES", "REQUIRES"],
    include_epistemic_status=["AFFIRMATIVE"]
)
# Type filter + exclude historical
current_supports = facade.match_concept_relationships(
    rel_types=["SUPPORTS", "VALIDATES"],
    exclude_epistemic_status=["HISTORICAL"]
)
Backward Compatibility
# Traditional queries still work (no role filtering)
all_supports = facade.match_concept_relationships(
    rel_types=["SUPPORTS"]
)
# No parameters - returns all relationships
all_rels = facade.match_concept_relationships(limit=100)
Use Cases
1. Consensus Building
Goal: Find well-established, high-confidence connections
# Get only AFFIRMATIVE relationships
consensus = facade.match_concept_relationships(
    include_epistemic_status=["AFFIRMATIVE"]
)
# Build consensus graph
for rel in consensus:
    source = rel['c1']['properties']['label']
    target = rel['c2']['properties']['label']
    rel_type = rel['r']['label']
    confidence = rel['r']['properties'].get('confidence', 'N/A')
    print(f"{source} --[{rel_type} (conf: {confidence})]-> {target}")
Use Cases:
- Academic literature reviews (established facts)
- Documentation generation (proven patterns)
- Educational content (consensus knowledge)
2. Research Questions & Investigation
Goal: Identify areas needing further investigation
# Find contested relationships (mixed evidence)
contested = facade.match_concept_relationships(
    include_epistemic_status=["CONTESTED"],
    where="r.confidence > 0.5"  # Still reasonably confident despite mixed grounding
)
# Analyze contested areas
for rel in contested:
    source = rel['c1']['properties']['label']
    target = rel['c2']['properties']['label']
    rel_type = rel['r']['label']
    print(f"Contested: {source} --[{rel_type}]-> {target}")
    # → Suggests areas for further research or validation
Use Cases:
- Identifying research gaps
- Finding areas of active debate
- Prioritizing validation efforts
- Generating research questions
3. Dialectical Analysis
Goal: Explore thesis, antithesis, and synthesis patterns
# Thesis: Established connections
thesis_rels = facade.match_concept_relationships(
    include_epistemic_status=["AFFIRMATIVE"]
)
# Antithesis: Points of contradiction
antithesis_rels = facade.match_concept_relationships(
    include_epistemic_status=["CONTESTED", "CONTRADICTORY"]
)
# Analyze dialectical tension
print(f"Thesis statements: {len(thesis_rels)}")
print(f"Antithesis statements: {len(antithesis_rels)}")
# Guard against division by zero when no AFFIRMATIVE relationships exist
if thesis_rels:
    print(f"Dialectical ratio: {len(antithesis_rels) / len(thesis_rels):.2f}")
else:
    print("Dialectical ratio: undefined (no thesis relationships)")
Use Cases:
- Philosophical analysis
- Argumentative writing
- Critical thinking exercises
- Identifying intellectual tensions
4. Temporal Analysis
Goal: Compare current state vs. historical evolution
# Current state (exclude historical)
current_state = facade.match_concept_relationships(
    exclude_epistemic_status=["HISTORICAL"]
)
# Historical context (only historical)
historical_context = facade.match_concept_relationships(
    include_epistemic_status=["HISTORICAL"]
)
# Evolution analysis
print(f"Current relationships: {len(current_state)}")
print(f"Historical relationships: {len(historical_context)}")
Use Cases:
- Tracking knowledge evolution
- Understanding paradigm shifts
- Documenting deprecated patterns
- Historical research
5. Confidence-Based Filtering
Goal: Filter by reliability level
# High confidence + high grounding
reliable = facade.match_concept_relationships(
    include_epistemic_status=["AFFIRMATIVE"],
    where="r.confidence > 0.8"
)
# Mixed evidence but still valuable
uncertain = facade.match_concept_relationships(
    include_epistemic_status=["CONTESTED"],
    where="r.confidence > 0.5"
)
# Low confidence relationships (may need review)
low_confidence = facade.match_concept_relationships(
    include_epistemic_status=["UNCLASSIFIED"],
    where="r.confidence < 0.5"
)
Use Cases:
- Risk assessment
- Data quality analysis
- Prioritizing verification
- Building trust layers
Advanced Patterns
Pattern 1: Concept-Specific Role Analysis
def analyze_concept_roles(concept_id: str):
    """Analyze epistemic status distribution for a specific concept."""
    roles = ["AFFIRMATIVE", "CONTESTED", "CONTRADICTORY", "HISTORICAL"]
    role_counts = {}
    for role in roles:
        # One query per role; counts relationships touching the concept on either side
        rels = facade.match_concept_relationships(
            include_epistemic_status=[role],
            where=f"c1.concept_id = '{concept_id}' OR c2.concept_id = '{concept_id}'"
        )
        role_counts[role] = len(rels)
    return role_counts
# Example
counts = analyze_concept_roles("sha256:abc123...")
print(f"AFFIRMATIVE: {counts['AFFIRMATIVE']}")
print(f"CONTESTED: {counts['CONTESTED']}")
print(f"CONTRADICTORY: {counts['CONTRADICTORY']}")
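Pattern 1 interpolates the concept ID directly into the where clause. If IDs can come from user input, it may be worth validating them first. The helper below is a hypothetical sketch: the name concept_where and the allowed character set (letters, digits, colon, underscore, hyphen) are assumptions based only on the "sha256:..." shape of the example ID, not on a documented ID format.

```python
import re

# Assumed ID alphabet: letters, digits, colon, underscore, hyphen (e.g. "sha256:abc123").
_SAFE_ID = re.compile(r"[A-Za-z0-9:_-]+")


def concept_where(concept_id: str) -> str:
    """Build a where clause matching the concept on either side of a relationship."""
    if not _SAFE_ID.fullmatch(concept_id):
        raise ValueError(f"Unexpected characters in concept ID: {concept_id!r}")
    # Parentheses keep the OR grouped if the clause is combined with other conditions.
    return f"(c1.concept_id = '{concept_id}' OR c2.concept_id = '{concept_id}')"
```

The returned string could then be passed as the where argument in place of the inline f-string.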
Pattern 2: Dialectical Subgraph Extraction
def extract_dialectical_subgraph(topic_concept_id: str):
    """Extract thesis-antithesis relationships for a topic."""
    # Thesis (well-supported)
    thesis = facade.match_concept_relationships(
        include_epistemic_status=["AFFIRMATIVE"],
        where=f"c1.concept_id = '{topic_concept_id}'"
    )
    # Antithesis (contested/contradictory)
    antithesis = facade.match_concept_relationships(
        include_epistemic_status=["CONTESTED", "CONTRADICTORY"],
        where=f"c1.concept_id = '{topic_concept_id}'"
    )
    return {
        "thesis": thesis,
        "antithesis": antithesis,
        "synthesis_needed": len(antithesis) > 0
    }
Pattern 3: Role Evolution Tracking
import json
from datetime import datetime
def track_role_evolution(vocab_type: str):
    """Track how a vocabulary type's epistemic status changes over time."""
    # Get current role and stats
    vt = facade.match_vocab_types(where=f"v.name = '{vocab_type}'")
    if vt:
        props = vt[0]['v']['properties']
        measurement = {
            "timestamp": datetime.now().isoformat(),
            "vocab_type": vocab_type,
            "epistemic_status": props.get('epistemic_status'),
            "avg_grounding": props.get('epistemic_stats', {}).get('avg_grounding'),
            "measured_concepts": props.get('epistemic_stats', {}).get('measured_concepts')
        }
        # Append to evolution log
        with open(f"role_evolution_{vocab_type}.jsonl", "a") as f:
            f.write(json.dumps(measurement) + "\n")
        return measurement
    return None
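Once a log like the one written by Pattern 3 has a few entries, it can be read back to find the points where a type's status changed. This is a small sketch under the assumption that each line is a JSON object with the "timestamp" and "epistemic_status" keys used above; the function name status_transitions is hypothetical.

```python
import json
from typing import Iterable, Optional


def status_transitions(lines: Iterable[str]) -> list[tuple[str, Optional[str], Optional[str]]]:
    """Return (timestamp, old_status, new_status) for each status change in a JSONL log."""
    records = [json.loads(line) for line in lines if line.strip()]
    transitions = []
    # Compare each record with the one before it.
    for prev, cur in zip(records, records[1:]):
        old, new = prev.get("epistemic_status"), cur.get("epistemic_status")
        if old != new:
            transitions.append((cur["timestamp"], old, new))
    return transitions
```

Because the function accepts any iterable of lines, an open file handle for role_evolution_ENABLES.jsonl can be passed in directly.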
Performance Considerations
Query Overhead
Role filtering adds a VocabType lookup query before the main relationship query:
# Two queries executed:
# 1. MATCH (v:VocabType) WHERE v.epistemic_status IN ['AFFIRMATIVE'] RETURN v.name
# 2. MATCH (c1:Concept)-[r:TYPE1|TYPE2|...]->(c2:Concept) RETURN c1, r, c2
Impact:
- VocabType query: ~1-5ms (35 vocab types → fast)
- Relationship query: Depends on graph size
- Total overhead: Negligible (~1-5ms for vocab lookup)
Optimization:
- VocabType nodes are few (35 in test graph)
- Lookup query is simple (epistemic_status can be indexed if needed)
- Relationship query benefits from reduced type list
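To make the second query concrete, here is a sketch of how a resolved type list might be turned into the relationship pattern shown above. The function build_relationship_query is hypothetical and is not the facade's implementation; the identifier check and the choice to return None for an empty type list (so the caller can skip the query and return no results) are assumptions.

```python
import re
from typing import Optional

# Relationship type names are assumed to be plain identifiers such as ENABLES.
_REL_TYPE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_relationship_query(rel_types: list[str], limit: Optional[int] = None) -> Optional[str]:
    """Build the MATCH query for the given relationship types, or None if there are none."""
    if not rel_types:
        return None
    for rel_type in rel_types:
        if not _REL_TYPE.fullmatch(rel_type):
            raise ValueError(f"Invalid relationship type name: {rel_type!r}")
    # Types are joined with | so one pattern matches any of them.
    pattern = "|".join(rel_types)
    query = f"MATCH (c1:Concept)-[r:{pattern}]->(c2:Concept) RETURN c1, r, c2"
    if limit is not None:
        query += f" LIMIT {int(limit)}"
    return query
```

A shorter type list produces a narrower pattern, which is why the relationship query can benefit from role filtering.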
Sample Size Tradeoffs
| Sample Size | Measurement Time | Precision | Use Case |
|---|---|---|---|
| 20 | ~10 seconds | Low | Quick check |
| 100 (default) | ~30 seconds | Medium | Standard use |
| 500 | ~2 minutes | High | Important decisions |
| 1000 | ~5 minutes | Very High | Research validation |
Recommendation: Use default 100 for most cases. Increase to 500+ when:
- Making critical decisions based on roles
- Publishing research results
- Validating architectural changes
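The precision column is qualitative, but a rough rule of thumb helps when choosing a sample size. Assuming grounding values are sampled independently (an assumption about the measurement, not something the tool guarantees), the standard error of the average shrinks with the square root of the sample size. The helper names below are hypothetical.

```python
import math
import statistics


def grounding_standard_error(samples: list[float]) -> float:
    """Standard error of the mean grounding: sample stdev divided by sqrt(n)."""
    if len(samples) < 2:
        raise ValueError("Need at least two samples to estimate spread")
    return statistics.stdev(samples) / math.sqrt(len(samples))


def precision_gain(n_small: int, n_large: int) -> float:
    """Factor by which the standard error shrinks when growing the sample."""
    return math.sqrt(n_large / n_small)
```

Under this model, quadrupling the sample from 100 to 400 only halves the uncertainty, which is consistent with the advice not to over-optimize sample size.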
Limitations & Considerations
1. Temporal Nature
Semantic roles are temporal measurements, not permanent truths.
# Roles change as graph evolves
# Measurement 1 (Week 1): ENABLES is CONTESTED (+0.232)
# Measurement 2 (Week 4): ENABLES is AFFIRMATIVE (+0.856) # More supporting evidence added
Implication: Re-run measurement periodically to keep roles current.
2. Sample-Based Estimation
Roles are estimated from sampled edges, not exhaustive analysis.
Implication: Larger samples = more precision, but longer measurement time.
3. Bounded Locality
Grounding calculation uses limited recursion depth (bounded locality).
# Grounding is calculated with finite recursion
# Not infinite traversal (satisficing, not optimizing)
Implication: Results are "good enough" estimates, not perfect calculations.
4. Insufficient Data
New or rare vocabulary types may lack sufficient measurements.
Implication: Some types may be INSUFFICIENT_DATA or UNCLASSIFIED until more data exists.
5. No Automatic Updates
Semantic roles are NOT automatically recalculated when graph changes.
Implication: Treat stored roles as "last known measurement" with timestamp.
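One way to act on this is to check the age of the stored measurement before trusting a role filter. The sketch below assumes the measurement time (v.status_measured_at) is available as an ISO 8601 string and that timestamps without an offset are UTC; both are assumptions, and the function name is_stale is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(measured_at: Optional[str], max_age: timedelta, now: Optional[datetime] = None) -> bool:
    """Return True if the measurement is missing or older than max_age."""
    if not measured_at:
        return True  # Never measured counts as stale.
    # Older Python versions do not parse a trailing "Z", so normalize it.
    measured = datetime.fromisoformat(measured_at.replace("Z", "+00:00"))
    if measured.tzinfo is None:
        measured = measured.replace(tzinfo=timezone.utc)  # Assumed UTC.
    current = now or datetime.now(timezone.utc)
    return current - measured > max_age
```

A caller could re-run kg vocab epistemic-status measure whenever this returns True for the types it depends on.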
Best Practices
✅ Do
- Re-measure periodically as your graph evolves (weekly, monthly, or after major ingestion)
- Check timestamps to know when roles were last measured (v.status_measured_at)
- Use appropriate sample sizes for your use case (default 100 is usually fine)
- Combine with confidence filtering for robust queries (include_epistemic_status + where="r.confidence > 0.8")
- Document role-based decisions (e.g., "Used AFFIRMATIVE filter for consensus view on 2025-11-16")
❌ Don't
- Don't treat roles as permanent - they're temporal measurements
- Don't over-optimize sample size - default 100 is sufficient for most cases
- Don't rely solely on roles - combine with other signals (confidence, edge_count, etc.)
- Don't expect 100% coverage - some types will be INSUFFICIENT_DATA or UNCLASSIFIED
- Don't skip --verbose when investigating anomalies - it shows uncertainty metrics
Troubleshooting
Problem: No results with include_epistemic_status
# Query returns empty
results = facade.match_concept_relationships(
    include_epistemic_status=["AFFIRMATIVE"]
)
# → []
Solution:
1. Check if epistemic statuses are stored: facade.match_vocab_types(where="v.epistemic_status IS NOT NULL")
2. Run measurement: kg vocab epistemic-status measure
3. Check if any types have that status: facade.match_vocab_types(where="v.epistemic_status = 'AFFIRMATIVE'")
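The checks above can be combined into a single tally of how many vocabulary types carry each status. This sketch assumes records shaped like the match_vocab_types results shown earlier (a 'v' entry with a 'properties' dict); the function name tally_statuses and the "UNMEASURED" label for types without a stored status are hypothetical.

```python
from collections import Counter


def tally_statuses(vocab_types: list[dict]) -> Counter:
    """Count vocabulary types per epistemic status."""
    counts: Counter = Counter()
    for vt in vocab_types:
        props = vt["v"]["properties"]
        # Types that were never measured have no epistemic_status property.
        counts[props.get("epistemic_status") or "UNMEASURED"] += 1
    return counts
```

If the tally shows zero AFFIRMATIVE types, an empty result for that filter is expected rather than a bug.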
Problem: All relationships are INSUFFICIENT_DATA
Solution:
- Graph is too small or too new
- Increase sample size: --sample-size 500
- Wait for more data to accumulate
- Check grounding calculation is working: Look for non-zero grounding values
Problem: Semantic roles seem incorrect
Solution:
1. Run with --verbose to see detailed stats
2. Check grounding distribution: v.epistemic_stats.grounding_distribution
3. Verify sample size was adequate
4. Re-run measurement with larger sample: --sample-size 500
5. Check if new data shifted grounding patterns
Testing
Tests: tests/test_query_facade.py::TestEpistemicStatusFiltering
# Run epistemic status filtering tests
pytest tests/test_query_facade.py::TestEpistemicStatusFiltering -v
# Expected output:
# ✓ All tests completed
# Phase 2 epistemic status filtering is working correctly
Test Coverage:
- ✅ include_epistemic_status with single role
- ✅ include_epistemic_status with multiple roles
- ✅ exclude_epistemic_status
- ✅ Combined rel_types + include_epistemic_status
- ✅ Backward compatibility (no role parameters)
- ✅ Dialectical queries (CONTESTED + CONTRADICTORY)
Related Documentation
- ADR-065: Vocabulary-Based Provenance Relationships
- ADR-044: Probabilistic Truth Convergence (grounding calculation)
- ADR-058: Polarity Axis Triangulation (grounding methodology)
- VALIDATION-RESULTS.md: Phase 1 validation results
- GraphQueryFacade: api/api/lib/query_facade.py
Future Enhancements (Phase 3)
Potential future work:
- Auto-remeasurement: Background job to periodically recalculate roles
- Role-aware pruning: Preserve dialectical tension when pruning edges
- Temporal queries: Point-in-time semantic state reconstruction
- Role-weighted grounding: Adjust grounding calculation based on relationship roles
- Visualization: Graph coloring by epistemic status
- API endpoints: REST API support for role filtering
- CLI commands: kg search --role AFFIRMATIVE syntax
These await further validation with real-world usage patterns.
Summary
Semantic role filtering enables powerful, nuanced queries that go beyond traditional graph traversal:
- Dialectical analysis (thesis/antithesis)
- Confidence-based filtering (AFFIRMATIVE only)
- Temporal analysis (exclude HISTORICAL)
- Research prioritization (find CONTESTED areas)
The feature is fully backward compatible, well-tested, and production-ready. Roles are temporal measurements that embrace bounded locality and satisficing rather than claiming perfect knowledge.
For questions or issues, see docs/architecture/ADR-065-vocabulary-based-provenance-relationships.md.