
Epistemic Status Query Filtering

Feature: ADR-065 Phase 2
Status: Implemented (2025-11-16)
API: GraphQueryFacade.match_concept_relationships()


Overview

Epistemic status filtering (also called semantic role filtering) lets you query relationships based on their epistemic status, a classification derived from grounding patterns that indicates whether a relationship type tends to be affirmative, contested, contradictory, or historical.

This enables powerful dialectical queries such as:

  • "Show me only high-confidence relationships" (AFFIRMATIVE)
  • "Show me points of tension and contradiction" (CONTESTED + CONTRADICTORY)
  • "Exclude outdated relationships" (exclude HISTORICAL)
  • "Find relationships that are actively debated" (CONTESTED only)

Epistemic Status Classifications

Roles are automatically detected by measuring grounding patterns across vocabulary types:

Role              | Avg Grounding | Meaning                                       | Example Use Case
AFFIRMATIVE       | > 0.8         | High-confidence, well-supported relationships | Building consensus views, finding established connections
CONTESTED         | 0.2 to 0.8    | Mixed grounding, actively debated             | Exploring uncertainty, finding areas needing investigation
CONTRADICTORY     | < -0.5        | Negative grounding, oppositional              | Dialectical analysis, identifying conflicts
HISTORICAL        | N/A           | Temporal vocabulary (detected by name)        | Time-based filtering, evolution tracking
UNCLASSIFIED      | Other         | Doesn't fit known patterns                    | Default fallback
INSUFFICIENT_DATA | N/A           | < 3 measurements                              | Need more data
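
To make the thresholds concrete, here is a minimal, hypothetical sketch of classifying a vocabulary type from its measured statistics. The function name and signature are illustrative, not the actual measurement code, but the cutoffs mirror the table above:

def classify_epistemic_status(avg_grounding: float, n_measurements: int, is_temporal_vocab: bool = False) -> str:
    """Illustrative classification using the thresholds from the table above."""
    if is_temporal_vocab:
        return "HISTORICAL"          # detected by vocabulary name, not grounding
    if n_measurements < 3:
        return "INSUFFICIENT_DATA"   # too few measurements to classify reliably
    if avg_grounding > 0.8:
        return "AFFIRMATIVE"
    if avg_grounding < -0.5:
        return "CONTRADICTORY"
    if 0.2 <= avg_grounding <= 0.8:
        return "CONTESTED"
    return "UNCLASSIFIED"            # grounding between -0.5 and 0.2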

How It Works

  1. Measurement: Run kg vocab epistemic-status measure to analyze grounding patterns
  2. Storage: Semantic roles stored as VocabType properties (v.epistemic_status, v.epistemic_stats)
  3. Querying: Use include_epistemic_status or exclude_epistemic_status parameters in GraphQueryFacade
  4. Filtering: Facade queries VocabType nodes, builds relationship type list dynamically
  5. Results: Only relationships matching role criteria are returned

Philosophy: Semantic roles are temporal measurements, not permanent classifications. Re-running measurement as your graph evolves will yield different results. This embraces bounded locality + satisficing (ADR-065).
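
As a rough illustration of steps 3-5 (not the actual facade internals; run_cypher is a hypothetical helper standing in for whatever executes Cypher), the filtering reduces to a VocabType lookup followed by a relationship query restricted to the matching types:

def filter_by_status_sketch(run_cypher, include_status, limit=10):
    # Look up relationship types whose VocabType carries a matching status
    status_list = ", ".join(f"'{s}'" for s in include_status)
    vocab_rows = run_cypher(
        f"MATCH (v:VocabType) WHERE v.epistemic_status IN [{status_list}] RETURN v.name"
    )
    rel_types = [row["v.name"] for row in vocab_rows]
    if not rel_types:
        return []  # no vocabulary type currently carries the requested status

    # Query relationships restricted to the matching types
    type_pattern = "|".join(rel_types)  # e.g. "ENABLES|SUPPORTS"
    return run_cypher(
        f"MATCH (c1:Concept)-[r:{type_pattern}]->(c2:Concept) "
        f"RETURN c1, r, c2 LIMIT {limit}"
    )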


Enabling Epistemic Status Filtering

Step 1: Measure Epistemic Status

Run the measurement command via kg CLI to analyze grounding patterns:

# Basic measurement (stores to database by default)
kg vocab epistemic-status measure

# Measure without storing (analysis only)
kg vocab epistemic-status measure --no-store

# Larger sample for more precision
kg vocab epistemic-status measure --sample-size 500

# Detailed analysis with uncertainty metrics
kg vocab epistemic-status measure --sample-size 200 --verbose

Output Example:

Epistemic Status Measurement Report
=================================

Summary:
  CONTESTED: 1
  UNCLASSIFIED: 6
  INSUFFICIENT_DATA: 28

CONTESTED (1)
  • ENABLES
    8 measurements from 8/8 edges | avg grounding: +0.232

📝 Storing epistemic statuses to VocabType nodes...
✓ Stored 35/35 epistemic statuses to VocabType nodes
  Phase 2 query filtering now available via GraphQueryFacade.match_concept_relationships()

Step 2: Verify Storage

Check that epistemic statuses were stored:

from api.api.lib.age_client import AGEClient

client = AGEClient()
facade = client.facade

# List vocabulary types with epistemic statuses
vocab_types = facade.match_vocab_types(
    where="v.epistemic_status IS NOT NULL"
)

for vt in vocab_types:
    props = vt['v']['properties']
    print(f"{props['name']}: {props['epistemic_status']} (avg: {props['epistemic_stats']['avg_grounding']:.3f})")

Example Output:

ENABLES: CONTESTED (avg: +0.232)
SUPPORTS: UNCLASSIFIED (avg: +0.165)
INFLUENCES: UNCLASSIFIED (avg: -0.049)


API Usage

Basic Role Filtering

from api.api.lib.age_client import AGEClient

client = AGEClient()
facade = client.facade

# Include only AFFIRMATIVE relationships (high confidence)
affirmative = facade.match_concept_relationships(
    include_epistemic_status=["AFFIRMATIVE"],
    limit=10
)

# Exclude HISTORICAL relationships (current state only)
current = facade.match_concept_relationships(
    exclude_epistemic_status=["HISTORICAL"],
    limit=10
)

Dialectical Queries

# Explore areas of tension and contradiction
dialectical = facade.match_concept_relationships(
    include_epistemic_status=["CONTESTED", "CONTRADICTORY"],
    limit=20
)

# Find well-established connections (thesis)
thesis = facade.match_concept_relationships(
    include_epistemic_status=["AFFIRMATIVE"]
)

# Find points of disagreement (antithesis)
antithesis = facade.match_concept_relationships(
    include_epistemic_status=["CONTESTED", "CONTRADICTORY"]
)

Combined Filtering

# Specific relationship type + epistemic status
enables_contested = facade.match_concept_relationships(
    rel_types=["ENABLES"],
    include_epistemic_status=["CONTESTED"],
    limit=10
)

# Multiple types + role filter
causal_affirmative = facade.match_concept_relationships(
    rel_types=["ENABLES", "CAUSES", "REQUIRES"],
    include_epistemic_status=["AFFIRMATIVE"]
)

# Type filter + exclude historical
current_supports = facade.match_concept_relationships(
    rel_types=["SUPPORTS", "VALIDATES"],
    exclude_epistemic_status=["HISTORICAL"]
)

Backward Compatibility

# Traditional queries still work (no role filtering)
all_supports = facade.match_concept_relationships(
    rel_types=["SUPPORTS"]
)

# No parameters - returns all relationships
all_rels = facade.match_concept_relationships(limit=100)

Use Cases

1. Consensus Building

Goal: Find well-established, high-confidence connections

# Get only AFFIRMATIVE relationships
consensus = facade.match_concept_relationships(
    include_epistemic_status=["AFFIRMATIVE"]
)

# Build consensus graph
for rel in consensus:
    source = rel['c1']['properties']['label']
    target = rel['c2']['properties']['label']
    rel_type = rel['r']['label']
    confidence = rel['r']['properties'].get('confidence', 'N/A')

    print(f"{source} --[{rel_type} (conf: {confidence})]-> {target}")

Use Cases:

  • Academic literature reviews (established facts)
  • Documentation generation (proven patterns)
  • Educational content (consensus knowledge)

2. Research Questions & Investigation

Goal: Identify areas needing further investigation

# Find contested relationships (mixed evidence)
contested = facade.match_concept_relationships(
    include_epistemic_status=["CONTESTED"],
    where="r.confidence > 0.5"  # Still reasonably confident despite mixed grounding
)

# Analyze contested areas
for rel in contested:
    source = rel['c1']['properties']['label']
    target = rel['c2']['properties']['label']
    rel_type = rel['r']['label']

    print(f"Contested: {source} --[{rel_type}]-> {target}")
    # → Suggests areas for further research or validation

Use Cases:

  • Identifying research gaps
  • Finding areas of active debate
  • Prioritizing validation efforts
  • Generating research questions

3. Dialectical Analysis

Goal: Explore thesis, antithesis, and synthesis patterns

# Thesis: Established connections
thesis_rels = facade.match_concept_relationships(
    include_epistemic_status=["AFFIRMATIVE"]
)

# Antithesis: Points of contradiction
antithesis_rels = facade.match_concept_relationships(
    include_epistemic_status=["CONTESTED", "CONTRADICTORY"]
)

# Analyze dialectical tension
print(f"Thesis statements: {len(thesis_rels)}")
print(f"Antithesis statements: {len(antithesis_rels)}")
print(f"Dialectical ratio: {len(antithesis_rels) / len(thesis_rels):.2f}")

Use Cases:

  • Philosophical analysis
  • Argumentative writing
  • Critical thinking exercises
  • Identifying intellectual tensions

4. Temporal Analysis

Goal: Compare current state vs. historical evolution

# Current state (exclude historical)
current_state = facade.match_concept_relationships(
    exclude_epistemic_status=["HISTORICAL"]
)

# Historical context (only historical)
historical_context = facade.match_concept_relationships(
    include_epistemic_status=["HISTORICAL"]
)

# Evolution analysis
print(f"Current relationships: {len(current_state)}")
print(f"Historical relationships: {len(historical_context)}")

Use Cases:

  • Tracking knowledge evolution
  • Understanding paradigm shifts
  • Documenting deprecated patterns
  • Historical research

5. Confidence-Based Filtering

Goal: Filter by reliability level

# High confidence + high grounding
reliable = facade.match_concept_relationships(
    include_epistemic_status=["AFFIRMATIVE"],
    where="r.confidence > 0.8"
)

# Mixed evidence but still valuable
uncertain = facade.match_concept_relationships(
    include_epistemic_status=["CONTESTED"],
    where="r.confidence > 0.5"
)

# Low confidence relationships (may need review)
low_confidence = facade.match_concept_relationships(
    include_epistemic_status=["UNCLASSIFIED"],
    where="r.confidence < 0.5"
)

Use Cases:

  • Risk assessment
  • Data quality analysis
  • Prioritizing verification
  • Building trust layers


Advanced Patterns

Pattern 1: Concept-Specific Role Analysis

def analyze_concept_roles(concept_id: str):
    """Analyze epistemic status distribution for a specific concept."""

    roles = ["AFFIRMATIVE", "CONTESTED", "CONTRADICTORY", "HISTORICAL"]
    role_counts = {}

    for role in roles:
        rels = facade.match_concept_relationships(
            include_epistemic_status=[role],
            where=f"c1.concept_id = '{concept_id}' OR c2.concept_id = '{concept_id}'"
        )
        role_counts[role] = len(rels)

    return role_counts

# Example
counts = analyze_concept_roles("sha256:abc123...")
print(f"AFFIRMATIVE: {counts['AFFIRMATIVE']}")
print(f"CONTESTED: {counts['CONTESTED']}")
print(f"CONTRADICTORY: {counts['CONTRADICTORY']}")

Pattern 2: Dialectical Subgraph Extraction

def extract_dialectical_subgraph(topic_concept_id: str):
    """Extract thesis-antithesis relationships for a topic."""

    # Thesis (well-supported)
    thesis = facade.match_concept_relationships(
        include_epistemic_status=["AFFIRMATIVE"],
        where=f"c1.concept_id = '{topic_concept_id}'"
    )

    # Antithesis (contested/contradictory)
    antithesis = facade.match_concept_relationships(
        include_epistemic_status=["CONTESTED", "CONTRADICTORY"],
        where=f"c1.concept_id = '{topic_concept_id}'"
    )

    return {
        "thesis": thesis,
        "antithesis": antithesis,
        "synthesis_needed": len(antithesis) > 0
    }

Pattern 3: Role Evolution Tracking

import json
from datetime import datetime

def track_role_evolution(vocab_type: str):
    """Track how a vocabulary type's epistemic status changes over time."""

    # Get current role and stats
    vt = facade.match_vocab_types(where=f"v.name = '{vocab_type}'")

    if vt:
        props = vt[0]['v']['properties']
        measurement = {
            "timestamp": datetime.now().isoformat(),
            "vocab_type": vocab_type,
            "epistemic_status": props.get('epistemic_status'),
            "avg_grounding": props.get('epistemic_stats', {}).get('avg_grounding'),
            "measured_concepts": props.get('epistemic_stats', {}).get('measured_concepts')
        }

        # Append to evolution log
        with open(f"role_evolution_{vocab_type}.jsonl", "a") as f:
            f.write(json.dumps(measurement) + "\n")

        return measurement

    return None

Performance Considerations

Query Overhead

Role filtering adds a VocabType lookup query before the main relationship query:

# Two queries executed:
# 1. MATCH (v:VocabType) WHERE v.epistemic_status IN ['AFFIRMATIVE'] RETURN v.name
# 2. MATCH (c1:Concept)-[r:TYPE1|TYPE2|...]->(c2:Concept) RETURN c1, r, c2

Impact:

  • VocabType query: ~1-5ms (35 vocab types → fast)
  • Relationship query: depends on graph size
  • Total overhead: negligible (~1-5ms for the vocab lookup)

Optimization:

  • VocabType nodes are small (35 in the test graph)
  • The lookup query is simple (can be indexed on epistemic_status if needed)
  • The relationship query benefits from the reduced type list

Sample Size Tradeoffs

Sample Size   | Measurement Time | Precision | Use Case
20            | ~10 seconds      | Low       | Quick check
100 (default) | ~30 seconds      | Medium    | Standard use
500           | ~2 minutes       | High      | Important decisions
1000          | ~5 minutes       | Very high | Research validation

Recommendation: Use the default of 100 for most cases. Increase to 500+ when:

  • Making critical decisions based on roles
  • Publishing research results
  • Validating architectural changes


Limitations & Considerations

1. Temporal Nature

Semantic roles are temporal measurements, not permanent truths.

# Roles change as graph evolves
# Measurement 1 (Week 1): ENABLES is CONTESTED (+0.232)
# Measurement 2 (Week 4): ENABLES is AFFIRMATIVE (+0.856)  # More supporting evidence added

Implication: Re-run measurement periodically to keep roles current.

2. Sample-Based Estimation

Roles are estimated from sampled edges, not exhaustive analysis.

# Sample size affects precision
# 100 edges → ±0.05 uncertainty
# 500 edges → ±0.02 uncertainty

Implication: Larger samples = more precision, but longer measurement time.
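
As a rough sanity check on these figures (an assumption, not taken from the measurement code): if grounding values have a spread of roughly sigma ≈ 0.5, the standard error of the sampled mean scales as sigma / sqrt(n), which is consistent with the ±0.05 at 100 edges and ±0.02 at 500 edges quoted above:

import math

def approx_grounding_uncertainty(sample_size: int, sigma: float = 0.5) -> float:
    """Standard error of the mean grounding, assuming spread sigma (illustrative)."""
    return sigma / math.sqrt(sample_size)

print(approx_grounding_uncertainty(100))  # ~0.050
print(approx_grounding_uncertainty(500))  # ~0.022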

3. Bounded Locality

Grounding calculation uses limited recursion depth (bounded locality).

# Grounding is calculated with finite recursion
# Not infinite traversal (satisficing, not optimizing)

Implication: Results are "good enough" estimates, not perfect calculations.

4. Insufficient Data

New or rare vocabulary types may lack sufficient measurements.

# Only 2 edges → INSUFFICIENT_DATA
# Cannot reliably classify with < 3 measurements

Implication: Some types may be INSUFFICIENT_DATA or UNCLASSIFIED until more data exists.

5. No Automatic Updates

Semantic roles are NOT automatically recalculated when graph changes.

# Roles persist until you re-run measurement script
# Adding 1000 new concepts doesn't update roles

Implication: Treat stored roles as "last known measurement" with timestamp.


Best Practices

✅ Do

  1. Re-measure periodically as your graph evolves (weekly, monthly, or after major ingestion)
  2. Check timestamps to know when roles were last measured (v.status_measured_at); see the freshness sketch after this list
  3. Use appropriate sample sizes for your use case (default 100 is usually fine)
  4. Combine with confidence filtering for robust queries (include_epistemic_status + where="r.confidence > 0.8")
  5. Document role-based decisions (e.g., "Used AFFIRMATIVE filter for consensus view on 2025-11-16")
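
A minimal sketch for practice 2, checking how fresh the stored statuses are. It assumes the timestamp lives on the VocabType node as status_measured_at (the property named in the list above); adjust if your schema differs:

# Illustrative freshness check; property names follow the list above.
vocab_types = facade.match_vocab_types(where="v.epistemic_status IS NOT NULL")

for vt in vocab_types:
    props = vt['v']['properties']
    print(
        f"{props['name']}: {props['epistemic_status']} "
        f"(measured at: {props.get('status_measured_at', 'unknown')})"
    )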

❌ Don't

  1. Don't treat roles as permanent - they're temporal measurements
  2. Don't over-optimize sample size - default 100 is sufficient for most cases
  3. Don't rely solely on roles - combine with other signals (confidence, edge_count, etc.)
  4. Don't expect 100% coverage - some types will be INSUFFICIENT_DATA or UNCLASSIFIED
  5. Don't skip --verbose when investigating anomalies - it shows uncertainty metrics

Troubleshooting

Problem: No results with include_epistemic_status

# Query returns empty
results = facade.match_concept_relationships(
    include_epistemic_status=["AFFIRMATIVE"]
)
# → []

Solution:

  1. Check whether epistemic statuses are stored: facade.match_vocab_types(where="v.epistemic_status IS NOT NULL")
  2. Run measurement: kg vocab epistemic-status measure
  3. Check whether any types have that status: facade.match_vocab_types(where="v.epistemic_status = 'AFFIRMATIVE'")
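
The same checks collected into one runnable snippet, using only the facade calls shown in the steps above:

# Diagnose empty results from include_epistemic_status
stored = facade.match_vocab_types(where="v.epistemic_status IS NOT NULL")
print(f"Vocab types with a stored status: {len(stored)}")
# If 0: run `kg vocab epistemic-status measure`, then retry the query

affirmative_types = facade.match_vocab_types(where="v.epistemic_status = 'AFFIRMATIVE'")
print(f"Vocab types classified AFFIRMATIVE: {len(affirmative_types)}")
# If 0: no type currently meets the AFFIRMATIVE threshold, so an empty result is expected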

Problem: All relationships are INSUFFICIENT_DATA

# Measurement output shows:
# INSUFFICIENT_DATA: 35

Solution:

  • Graph is too small or too new
  • Increase sample size: --sample-size 500
  • Wait for more data to accumulate
  • Check that grounding calculation is working: look for non-zero grounding values

Problem: Semantic roles seem incorrect

# ENABLES shows AFFIRMATIVE, but you expected CONTESTED

Solution:

  1. Run with --verbose to see detailed stats
  2. Check the grounding distribution: v.epistemic_stats.grounding_distribution
  3. Verify the sample size was adequate
  4. Re-run measurement with a larger sample: --sample-size 500
  5. Check whether new data shifted grounding patterns
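
To inspect the detailed stats behind a status (step 2 above), you can read epistemic_stats directly from the VocabType node; the property names follow those used elsewhere in this document:

# Inspect the stats that drove a classification (e.g. for ENABLES)
vt = facade.match_vocab_types(where="v.name = 'ENABLES'")
if vt:
    stats = vt[0]['v']['properties'].get('epistemic_stats', {})
    print(f"avg_grounding: {stats.get('avg_grounding')}")
    print(f"grounding_distribution: {stats.get('grounding_distribution')}")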


Testing

Tests: tests/test_query_facade.py::TestEpistemicStatusFiltering

# Run epistemic status filtering tests
pytest tests/test_query_facade.py::TestEpistemicStatusFiltering -v

# Expected output:
# ✓ All tests completed
# Phase 2 epistemic status filtering is working correctly

Test Coverage:

  • ✅ include_epistemic_status with a single role
  • ✅ include_epistemic_status with multiple roles
  • ✅ exclude_epistemic_status
  • ✅ Combined rel_types + include_epistemic_status
  • ✅ Backward compatibility (no role parameters)
  • ✅ Dialectical queries (CONTESTED + CONTRADICTORY)


  • ADR-065: Vocabulary-Based Provenance Relationships
  • ADR-044: Probabilistic Truth Convergence (grounding calculation)
  • ADR-058: Polarity Axis Triangulation (grounding methodology)
  • VALIDATION-RESULTS.md: Phase 1 validation results
  • GraphQueryFacade: api/api/lib/query_facade.py

Future Enhancements (Phase 3)

Potential future work:

  1. Auto-remeasurement: Background job to periodically recalculate roles
  2. Role-aware pruning: Preserve dialectical tension when pruning edges
  3. Temporal queries: Point-in-time semantic state reconstruction
  4. Role-weighted grounding: Adjust grounding calculation based on relationship roles
  5. Visualization: Graph coloring by epistemic status
  6. API endpoints: REST API support for role filtering
  7. CLI commands: kg search --role AFFIRMATIVE syntax

These await further validation with real-world usage patterns.


Summary

Epistemic status filtering enables powerful, nuanced queries that go beyond traditional graph traversal:

  • Dialectical analysis (thesis/antithesis)
  • Confidence-based filtering (AFFIRMATIVE only)
  • Temporal analysis (exclude HISTORICAL)
  • Research prioritization (find CONTESTED areas)

The feature is fully backward compatible, well-tested, and production-ready. Roles are temporal measurements that embrace bounded locality and satisficing rather than claiming perfect knowledge.

For questions or issues, see docs/architecture/ADR-065-vocabulary-based-provenance-relationships.md.