
ADR-069: Semantic FUSE Filesystem

Status: Proposed
Date: 2025-11-28
Related ADRs: ADR-055 (Sharding), ADR-048 (GraphQueryFacade)

"Everything is a file" - Traditional Unix Philosophy "Everything is a file, but which file depends on what you're thinking about" - Semantic Unix Philosophy

Overview

Traditional filesystems force you to organize knowledge in rigid hierarchies: one directory, one path, one canonical location. But knowledge doesn't work that way. A document about embedding models is simultaneously about AI architecture, operational procedures, and bug fixes. Why should it live in only one folder?

The knowledge graph already solves this by letting concepts exist in multiple semantic contexts. But accessing it requires custom tools: CLI commands, web interfaces, MCP integration. Unix users already have powerful tools (grep, find, diff, tar) that they know intimately, but these tools can't touch the graph.

This ADR proposes exposing the knowledge graph as a FUSE (Filesystem in Userspace) mount point, turning standard Unix tools into knowledge graph explorers. Type cd /mnt/knowledge/embedding-models/ and you're executing a semantic query. Run ls and you see concepts with similarity scores. Use grep -r across multiple mounted shards and you're running distributed queries. The same concepts appear in multiple "directories" because they belong to multiple contexts. The filesystem adapts to your exploration patterns, making knowledge navigation feel like browsing files, except the files organize themselves based on what they mean.


Abstract

This ADR proposes exposing the knowledge graph as a FUSE (Filesystem in Userspace) mount point, enabling semantic navigation and querying through standard Unix tools (ls, cd, cat, grep, find). Like /sys/ or /proc/, this is a partial filesystem that implements only operations that make semantic sense, providing a familiar interface to knowledge graph exploration.

Context

The Problem: Hierarchies Don't Fit Knowledge

Traditional filesystems organize knowledge through rigid hierarchies:

/docs/
  /architecture/
    /decisions/
      adr-068.md
  /guides/
    embedding-guide.md

But knowledge doesn't fit in trees. ADR-068 is simultaneously:

  • An architecture decision
  • A guide for operators
  • An embedding system reference
  • A bug fix chronicle
  • A compatibility management strategy

Why force it into one directory when it semantically belongs in multiple conceptual spaces?

The Opportunity: FUSE as Semantic Interface

The knowledge graph already provides:

  • Semantic search (vector similarity)
  • Relationship traversal (graph navigation)
  • Multi-ontology federation (shard/facet architecture from ADR-055)
  • Cross-domain linking (automatic concept merging)

FUSE could expose these capabilities through filesystem metaphors that users already understand.
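
A sketch of what this mapping could look like in Python; path_to_query and SemanticQuery are illustrative names, not existing APIs:

from dataclasses import dataclass

@dataclass
class SemanticQuery:
    text: str         # free-text query derived from the directory name
    threshold: float  # minimum similarity for a concept to "appear"

def path_to_query(path: str, default_threshold: float = 0.7) -> SemanticQuery:
    """Interpret '/search/0.8/embedding+models' or '/embedding-models'."""
    parts = [p for p in path.split("/") if p]
    if not parts:
        return SemanticQuery("", default_threshold)  # mount root
    if parts[0] == "search" and len(parts) >= 3:
        # Explicit threshold encoded in the path: /search/<threshold>/<query>
        return SemanticQuery(parts[2].replace("+", " "), float(parts[1]))
    # Plain directory name: hyphens become spaces, default threshold applies
    return SemanticQuery(parts[-1].replace("-", " "), default_threshold)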

Architectural Validation

This proposal underwent external peer review to validate feasibility against the existing codebase. Key findings:

  • Architectural Fit: The FUSE operations map directly to existing services without requiring new core logic
      • ls (semantic query) → QueryService.build_search_query
      • cd relationships/ (graph traversal) → QueryService.build_concept_details_query
      • Write operations → existing async ingestion pipeline

  • Implementation Feasibility: High - essentially re-skinning existing services into FUSE protocol

  • Discovery Value: Solves the "I don't know what to search for" problem by allowing users to browse valid semantic pathways

  • Standard Tool Integration: Turns every Unix utility (grep, diff, tar) into a knowledge graph tool for free

The review validated this as a "rigorous application of the 'everything is a file' philosophy to high-dimensional data," not a cursed hack.

Performance and Consistency Engineering

External research on high-dimensional semantic file systems identified critical engineering considerations that our architecture already addresses:

1. The Write Latency Trap (Mitigated)
     • Risk: Synchronous embedding generation (15-50ms+) and graph linking (seconds) would block write() syscalls, hanging applications
     • Our Solution: Asynchronous worker pattern (ADR-014) with a job queue (sketched below)
         - Writes accepted immediately to a staging area
         - Background workers handle chunking, embedding, concept matching
         - POSIX-compliant write performance maintained

2. The Read (ls) Bottleneck (Mitigated)
     • Risk: Fresh vector searches or clustering on every readdir would cause sluggish directory listings
     • Our Solution: Query-time retrieval with caching (sketched below)
         - 100-200ms retrieval target (realistic for vector search + graph traversal)
         - PostgreSQL connection pooling for concurrent queries
         - Directory structure is deterministic (ontology-based), not emergent clustering
         - FUSE implementation will cache directory listings with configurable TTL

3. POSIX Stability via Deterministic Structure (Addressed)
     • Risk: Purely emergent clustering causes "cluster jitter": files randomly moving between folders as content shifts
     • Our Solution: Stable four-level hierarchy (Shard → Facet → Ontology → Concepts)
         - Paths are deterministic based on ontology assignment
         - Concepts appear in multiple semantic query directories (intentional non-determinism)
         - But the underlying storage location is stable (ontology-scoped)

4. Eventual Consistency Gap (Acknowledged)
     • Risk: Async processing creates a delay between write and appearance in semantic directories
     • Mitigation: Virtual README.md in empty query results (see Future Extensions)
         - Explains why results are empty
         - Suggests alternative queries or lower thresholds
     • Future: "Processing" indicator for in-flight ingestion

5. Connection Pool Saturation (Addressed)
     • Risk: "Thundering herd" when a user pastes 1,000 files and every readdir hammers the database
     • Our Solution:
         - PostgreSQL connection pooling (existing infrastructure)
         - FUSE TTL-based caching (mount option: cache_ttl=60)
         - Query rate limiting at the API layer
         - Batch ingestion queuing (ADR-014 job scheduler)

Verdict: The architecture decouples high-latency "thinking" (AI processing) from low-latency "acting" (filesystem I/O), which research validates as the primary requirement for functional semantic filesystems.
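
A minimal sketch of that decoupling, assuming a plain in-process queue and cache (the real system would use the ADR-014 job scheduler and PostgreSQL connection pooling):

import queue
import threading
import time
import uuid

ingest_jobs: "queue.Queue[tuple[str, bytes]]" = queue.Queue()
_listing_cache: dict[str, tuple[float, list[str]]] = {}
CACHE_TTL = 60.0  # seconds, cf. the cache_ttl=60 mount option

def fuse_write(path: str, data: bytes) -> int:
    """Fast path: stage the bytes, enqueue a job, return immediately."""
    ingest_jobs.put((str(uuid.uuid4()), data))  # staging id + raw content
    return len(data)

def fuse_readdir(path: str, run_semantic_query) -> list[str]:
    """Serve cached listings; fall back to the 100-200ms vector search."""
    hit = _listing_cache.get(path)
    if hit and time.monotonic() - hit[0] < CACHE_TTL:
        return hit[1]
    names = run_semantic_query(path)
    _listing_cache[path] = (time.monotonic(), names)
    return names

def ingest_worker() -> None:
    """Slow path: chunking, embedding, concept matching happen here."""
    while True:
        staging_id, data = ingest_jobs.get()
        # ... seconds of AI work, without blocking any syscall ...
        ingest_jobs.task_done()

threading.Thread(target=ingest_worker, daemon=True).start()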

Prior Art

This proposal builds on a rich history of semantic filesystems, though none have applied the "Directory = Query" metaphor to vector embeddings and probabilistic similarity.

1. Logic & Query-Based Systems (Direct Ancestors)

Semantic File System (SFS) - MIT, 1991
  • Concept: Original implementation of "transducers" extracting attributes from files
  • Innovation: Virtual directories interpreted as queries (/sfs/author/jdoe dynamically generated)
  • Limitation: Attribute-based (key-value pairs), not semantic
  • Our Extension: Replace discrete attributes with continuous similarity scores

Tagsistant - Linux/FUSE
  • Concept: Directory nesting for boolean logic operations
  • Innovation: Path as query language (/tags/music/+/rock/ for AND operations)
  • Similarity: The /+/ operator is conceptually similar to our relationship traversal
  • Our Extension: Replace boolean logic with semantic similarity thresholds

JOINFS
  • Concept: Dynamic directories populated by metadata query matching
  • Innovation: mkdir "format=mp3" creates persistent searches
  • Similarity: Query definition via directory creation (like our approach)
  • Our Extension: Semantic queries vs. exact metadata matching

2. Tag-Based Systems (Modern Implementations)

TMSU (Tag My Sh*t Up)
  • Concept: SQLite-backed FUSE mount with explicit tagging
  • Architecture: Standard "FUSE + Database" pattern we follow
  • Similarity: Files exist in multiple paths (/mnt/tmsu/tags/music/mp3/)
  • Difference: Deterministic (file is tagged or not), no similarity threshold
  • Our Extension: Probabilistic membership based on semantic similarity

TagFS / SemFS
  • Concept: RDF triples for tag storage (graph-like structure)
  • Similarity: Graph backend architecture (closer to our Knowledge Graph than SQL)
  • Difference: Explicit RDF relationships vs. emergent semantic relationships
  • Our Extension: Vector embeddings replace RDF triples

3. Partial POSIX Precedents

Google Cloud FUSE / rclone
  • Precedent: Explicitly documents "Limitations and differences from POSIX"
  • Validation: Large-scale ML workloads accept non-compliance for utility
  • Similar Violations: Directories disappear, non-deterministic caching, eventual consistency
  • Our Justification: If users accept this for cloud storage, they'll accept it for semantic navigation

Comparison Table

Feature           | Tagsistant          | TMSU           | MIT SFS (1991)        | ADR-069 (This Proposal)
Organization      | Boolean Logic       | Explicit Tags  | Key-Value Attributes  | Vector Embeddings
Navigation        | /tag1/+/tag2/       | /tag1/tag2/    | /author/name/         | /query/threshold/
Determinism       | Deterministic       | Deterministic  | Deterministic         | Probabilistic
Backend           | SQL/Dedup           | SQLite         | Transducers           | Vector DB + LLM
Write Behavior    | Tags file           | Tags file      | Indexing              | Ingest & Grounding
Membership Model  | Binary (tagged/not) | Binary         | Binary                | Continuous (similarity score)

The Key Innovation

Existing systems map discrete values (tags, attributes) → directories:

  • File either has tag "music" or it doesn't
  • Boolean membership: true/false
  • Deterministic listings

Our proposal maps continuous values (similarity scores) → directories:

  • Concept has 73.5% similarity to query "embedding models"
  • Probabilistic membership: threshold-dependent
  • Non-deterministic listings (similarity changes as the graph evolves)

This is the specific innovation that justifies the "POSIX violations" in our design: we're not just organizing files by metadata, we're navigating high-dimensional semantic space through a filesystem interface.
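
A toy illustration of continuous membership, with made-up vectors: the same concepts yield different directory contents at different thresholds.

import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

query_vec = [0.9, 0.1, 0.3]  # toy embedding of the directory's query
concepts = {
    "unified-regeneration": [0.8, 0.2, 0.4],
    "ai-capabilities": [0.2, 0.9, 0.1],
}

for threshold in (0.6, 0.8, 0.95):
    members = [n for n, v in concepts.items() if cosine(query_vec, v) >= threshold]
    print(threshold, members)  # same concepts, different directory contents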


The Proposal

Mount Point

mount -t fuse.knowledge-graph /dev/knowledge /mnt/knowledge

Directory Structure

Directories are semantic queries, not static folders:

/mnt/knowledge/
├── embedding-regeneration/     # Concepts matching "embedding regeneration"
│   ├── unified-regeneration.concept (79.8% similarity)
│   ├── compatibility-checking.concept (75.2% similarity)
│   └── model-migration.concept (78.5% similarity)
├── ai-models/                  # Concepts matching "ai models"
│   ├── embedding-models.concept (89.6% similarity)
│   ├── unified-regeneration.concept (64.5% similarity)  # Same file!
│   └── ai-capabilities.concept (70.6% similarity)
└── search/
    ├── 0.7/                    # 70% similarity threshold
    │   └── embedding+models/
    ├── 0.8/                    # 80% similarity threshold
    │   └── embedding+models/   # Fewer results
    └── 0.6/                    # 60% similarity threshold
        └── embedding+models/   # More results

File Format

Concept files are dynamically generated:

$ cat /mnt/knowledge/embedding-regeneration/unified-regeneration.concept
# Unified Embedding Regeneration

**ID:** sha256:95454_chunk1_76de0274
**Ontologies:** ADR-068-Phase4-Implementation, AI-Applications
**Similarity:** 79.8% (to directory query: "embedding regeneration")
**Grounding:** Weak (0.168, 17%)
**Diversity:** 39.2% (10 related concepts)

## Description

A system for regenerating vector embeddings across all graph text entities,
ensuring compatibility and proper namespace organization.

## Evidence

### Source 1: ADR-068-Phase4-Implementation (para 1)
The knowledge graph system needed a unified approach to regenerating vector
embeddings across all graph text entities (concepts, sources, and vocabulary)...

### Source 2: AI-Applications (para 1)
A unified embedding regeneration system addresses this challenge by treating
all embedded entities consistently...

## Relationships

→ INCLUDES compatibility-checking.concept
→ REQUIRES embedding-management-endpoints.concept
→ VALIDATES testing-verification.concept
← SUPPORTS bug-fix-source-regeneration.concept

## Navigate

ls ../ai-models/           # See related concepts in different semantic space
cd relationships/includes/ # Traverse by relationship type

Relationship Navigation

Traverse the graph via relationships:

$ cd /mnt/knowledge/embedding-regeneration/unified-regeneration/
$ ls relationships/
includes/  requires/  validates/  supported-by/

$ cd relationships/includes/
$ ls
compatibility-checking.concept

$ cat compatibility-checking.concept  # Full concept description

Search Interface

$ cd /mnt/knowledge/search/0.75/
$ mkdir "embedding+migration+compatibility"  # Creates query directory!
$ cd "embedding+migration+compatibility"/
$ ls  # Results ranked by similarity
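
A sketch of how the mkdir handler could register such a query; the handler and helper names are assumptions, not the project's API:

registered_queries: dict[str, float] = {}  # path -> similarity threshold

def fuse_mkdir(path: str, mode: int) -> None:
    """mkdir under /search/<threshold>/ registers a persistent query."""
    parts = [p for p in path.split("/") if p]
    if len(parts) == 3 and parts[0] == "search":
        # e.g. /search/0.75/embedding+migration+compatibility
        registered_queries[path] = float(parts[1])
        return  # later readdir() calls execute the stored query
    raise PermissionError("directories can only be created under /search/")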

POSIX Violations (Features!)

1. Non-Deterministic Directory Listings

$ ls /mnt/knowledge/embedding-models/
unified-regeneration.concept
compatibility-checking.concept
model-migration.concept

# New concept added to graph elsewhere...

$ ls /mnt/knowledge/embedding-models/
unified-regeneration.concept
compatibility-checking.concept
model-migration.concept
embedding-architecture.concept  # New! Without touching this directory!

Why it's beautiful: Your filesystem stays current with your knowledge, automatically.

2. Multiple Canonical Paths

$ pwd
/mnt/knowledge/embedding-regeneration/unified-regeneration.concept

$ cat unified-regeneration.concept
# ... reads file ...

$ pwd  # From the file's perspective
/mnt/knowledge/ai-models/unified-regeneration.concept

# Both are correct! The file exists in multiple semantic spaces!

Why it's beautiful: Concepts belong to multiple contexts simultaneously.

3. Read-Influenced Writes

$ cat concept-a.concept
$ cat concept-b.concept

# Graph notices correlation...

$ ls  # Now concept-c appears because of semantic relatedness!
concept-a.concept
concept-b.concept
concept-c.concept  # ← Appeared based on your read pattern

Why it's beautiful: The filesystem adapts to your workflow.

4. Relationship Symlinks

$ ls -l /mnt/knowledge/embedding-regeneration/
lrwxrwxrwx compatibility → [INCLUDES] ../compatibility-checking/
lrwxrwxrwx testing → [VALIDATES] ../testing-verification/

# These aren't real symlinks, they're semantic relationships!
# Different relationship types could render differently!

Why it's beautiful: Explicit relationship semantics instead of opaque links.

5. Threshold-Dependent Paths

$ cd /mnt/knowledge/search/0.8/ai+models/
$ ls | wc -l
12

$ cd ../0.7/ai+models/  # Same query, lower threshold
$ ls | wc -l
27

$ cd ../0.9/ai+models/  # Higher threshold
$ ls | wc -l
5

Why it's beautiful: Precision vs. recall as a filesystem operation!

6. Temporal Inconsistency

$ stat unified-regeneration.concept
Modified: 2025-11-29 03:59:57  # When concept was created

$ cat unified-regeneration.concept  # Read it

$ stat unified-regeneration.concept
Modified: 2025-11-29 04:15:32  # NOW! Because grounding updated!

Why it's beautiful: Living knowledge, not static files.

Use Cases Where This Is Actually Useful

1. Exploratory Research

# Start with a concept
cd /mnt/knowledge/embedding-models/

# Navigate by relationships
cd unified-regeneration/relationships/requires/

# Follow to related concepts
cd compatibility-checking/relationships/includes/

# Emerge somewhere totally different but semantically connected!
pwd
# /mnt/knowledge/ai-models/compatibility-checking/relationships/includes/

2. Context-Aware Documentation

# You're working on AI models
cd /workspace/ai-stuff/

# Mount context-aware knowledge
ln -s /mnt/knowledge/ai-models/ ./docs

# Everything in ./docs is semantically relevant to AI!

3. Semantic Grep

# Traditional grep
grep -r "embedding" /docs/
# Returns every file mentioning "embedding" (thousands of false positives)

# Semantic filesystem
ls /mnt/knowledge/search/0.8/embedding/
# Returns only concepts semantically related to embedding at 80% threshold

4. AI-Assisted Workflows

# What concepts relate to what I'm working on?
git log --oneline -1
# fix: compatibility checking for embeddings

ls /mnt/knowledge/compatibility+checking/relationships/
requires/  includes/  supports/  related-to/

# Oh, it requires these other concepts!
cd requires/
ls
embedding-models.concept
model-migration.concept

Practical Applications That Sound Insane But Actually Work

TAR as Temporal Snapshots

# Capture your research state RIGHT NOW
tar czf research-$(date +%s).tar.gz /mnt/knowledge/embedding-models/

# Three months later: graph has evolved, new concepts exist
tar czf research-$(date +%s).tar.gz /mnt/knowledge/embedding-models/

# DIFFERENT tar contents!
# Same "directory", different semantic space!
# Each tarball is a temporal snapshot of the knowledge graph

Why this works: The filesystem is a view of the knowledge graph at a point in time. TAR captures that view. Different views = different archives. Version your knowledge semantically!

Practical use: - Archive research findings before pivoting - Create snapshots before major refactoring - Share "knowledge packs" with collaborators - Restore previous understanding states

Living Documentation in Development Workspaces

# Your project workspace
cd /workspace/my-ai-project/

# Symlink semantic knowledge as documentation
ln -s /mnt/knowledge/my-project/ ./docs

# Claude Code (or any IDE) can now:
cat docs/architecture/api-design.concept          # Read current architecture
ls docs/relationships/SUPPORTS/                    # See what supports this design
grep -r "performance" docs/                        # Semantic search in docs!

# As you work and ingest commit messages:
git commit -m "feat: add caching layer"
kg ingest commit HEAD -o my-project

# Moments later:
ls ./docs/
# NEW concepts appear automatically!
# caching-layer.concept
# performance-optimization.concept

Why this works: The symlink points to a semantic query. The query results update as the graph evolves. Your documentation becomes a living, self-organizing entity.

Claude Code integration:

# Claude can literally read your knowledge graph
<Read file="docs/api-design.concept">
# Gets: full concept, relationships, evidence, grounding metrics
# Not just static markdown

# Claude can explore relationships
cd docs/api-design/relationships/REQUIRES/
# Discovers dependencies automatically

Bidirectional Ingestion

# Write support makes this a full knowledge management system
echo "# New Architecture Decision

We're adopting GraphQL for the API layer because..." > /mnt/knowledge/my-project/adr-070.md

# File write triggers:
# 1. Document chunking
# 2. LLM concept extraction
# 3. Semantic matching against existing concepts
# 4. Relationship discovery
# 5. Graph integration

# Seconds later:
ls /mnt/knowledge/api-design/
# adr-070-graphql-adoption.concept appears!

# Batch ingestion:
cp docs/*.md /mnt/knowledge/my-project/
# Processes all files, discovers cross-document relationships automatically

Why this works: Every write is an ingestion trigger. The filesystem becomes a natural interface for knowledge capture.

Anti-pattern prevention:

# Only accept markdown/text
cp binary-file.exe /mnt/knowledge/
# Error: unsupported file type

# Prevent knowledge pollution
cp spam.txt /mnt/knowledge/my-project/
# Ingests but low grounding, won't pollute semantic queries
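
A sketch combining both behaviors, with enqueue_ingest standing in for the async pipeline described above:

import errno

ALLOWED_SUFFIXES = (".md", ".txt")

def enqueue_ingest(path: str, data: bytes) -> None:
    """Stand-in for the async ingestion queue (chunk, embed, match, link)."""

def fuse_create_and_write(path: str, data: bytes) -> int:
    if not path.endswith(ALLOWED_SUFFIXES):
        # Rejected before ingestion; surfaces to cp as an I/O error
        raise OSError(errno.ENOTSUP, "unsupported file type")
    enqueue_ingest(path, data)  # grounding later decides semantic visibility
    return len(data)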

Build System Integration

# Makefile that depends on semantic queries
API_DOCS := $(shell ls /mnt/knowledge/api-endpoints/*.concept)

docs/api.html: $(API_DOCS)
    kg export --format html /mnt/knowledge/api-endpoints/ > $@

# When new API concepts appear (from code ingestion):
# - Build automatically detects new .concept files
# - Regenerates documentation
# - No manual tracking needed

Why this works: The filesystem exposes semantic queries as file paths. Build tools already know how to depend on file paths.

CI/CD integration:

# GitHub Actions
- name: Check documentation coverage
  run: |
    concept_count=$(ls /mnt/knowledge/my-project/*.concept | wc -l)
    if [ "$concept_count" -lt 50 ]; then
      echo "Warning: Only $concept_count concepts documented"
    fi

Event-Driven Workflows

# Watch for knowledge graph changes
fswatch /mnt/knowledge/my-project/ | while read event; do
    echo "Knowledge updated: $event"
    kg admin embedding regenerate --type concept --only-missing
done

# Trigger notifications when concepts appear
inotifywait -m /mnt/knowledge/security-vulnerabilities/ -e create |
while read dir action file; do
    notify-send "Security Alert" "New vulnerability concept: $file"
done

Why this works: Filesystem events map to knowledge graph updates. Standard Linux tools (inotify, fswatch) become knowledge graph event listeners.

Knowledge-driven automation:

# When AI research concepts appear, trigger model retraining
ls /mnt/knowledge/ai-research/*.concept | entr make train-model

# When architecture concepts change, validate against constraints
ls /mnt/knowledge/architecture/*.concept | entr ./validate-architecture.sh

Diff-Based Knowledge Evolution Tracking

# Semantic diff across time
tar czf snapshot-before.tar.gz /mnt/knowledge/my-research/

# ... three months of work ...

tar czf snapshot-after.tar.gz /mnt/knowledge/my-research/
mkdir -p /tmp/before /tmp/after
tar xzf snapshot-before.tar.gz -C /tmp/before/
tar xzf snapshot-after.tar.gz -C /tmp/after/

diff -r /tmp/before/ /tmp/after/
# Shows concept evolution:
# - New concepts (+ files)
# - Strengthened concepts (modified files with higher grounding)
# - Abandoned concepts (- files, fell below similarity threshold)

Why this works: Concepts are files. Files can be diffed. Knowledge evolution becomes visible through standard Unix tools.

Architecture and Hierarchy

Important: This Is NOT a Full Filesystem

Like /sys/ or /proc/, this is a partial filesystem that exposes a specific interface (knowledge graphs) through filesystem semantics. It only implements operations that make semantic sense.

What works:

  • ls (semantic query)
  • cd (navigate semantic space)
  • cat (read concept)
  • find / grep (search)
  • echo > / cp (ingest)
  • tar (snapshot)
  • stat (metadata)

What doesn't work (and won't):

  • mv (concepts don't "move" in semantic space)
  • chmod / chown (use facet-level RBAC instead)
  • ln -s (maybe future: create relationships)
  • touch (timestamps are semantic, not file-based)
  • dd (nonsensical for semantic content)
  • Most other file operations that assume static files

This is a feature, not a limitation. Don't pretend to be a full filesystem. Be an excellent semantic interface.
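
In fusepy terms, the unsupported operations can simply fail fast with a clear errno; a minimal sketch:

import errno

from fuse import FuseOSError, Operations  # fusepy

class PartialSemanticFS(Operations):
    def rename(self, old, new):           # mv: concepts don't "move"
        raise FuseOSError(errno.ENOTSUP)

    def chmod(self, path, mode):          # chmod: RBAC lives at facet level
        raise FuseOSError(errno.ENOTSUP)

    def chown(self, path, uid, gid):      # chown: same reasoning as chmod
        raise FuseOSError(errno.ENOTSUP)

    def utimens(self, path, times=None):  # touch: timestamps are semantic
        raise FuseOSError(errno.ENOTSUP)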

The Four-Level Model

The semantic filesystem has a clear hierarchy that maps infrastructure to semantic content:

Shard (infrastructure: database + API + resources)
  └── Facet (logical grouping of related ontologies)
      └── Ontology (specific knowledge domain)
          └── Concepts (semantic content)

Why this hierarchy matters:

Level     | Purpose                                 | Example                           | Isolation
Shard     | Physical deployment instance            | shard-research, shard-production  | Infrastructure (separate databases)
Facet     | Logical grouping for organization/RBAC  | academic, industrial, engineering | Access control & resource limits
Ontology  | Knowledge domain namespace              | ai-research, api-docs, patents    | Semantic namespace
Concepts  | Individual semantic units               | embedding-models.concept          | Content

Directory Structure

/mnt/knowledge/
├── shard-research/              # Shard: research infrastructure
│   ├── academic/                # Facet: academic research group
│   │   ├── ai-research/         # Ontology: AI papers
│   │   │   └── embedding-models.concept
│   │   ├── neuroscience/        # Ontology: neuroscience papers
│   │   └── ml-papers/           # Ontology: ML literature
│   │
│   └── industrial/              # Facet: industrial R&D group
│       ├── patents/             # Ontology: patent filings
│       └── prototypes/          # Ontology: prototype docs
│
├── shard-production/            # Shard: production infrastructure
│   ├── engineering/             # Facet: engineering team
│   │   ├── api-docs/            # Ontology: API documentation
│   │   ├── architecture/        # Ontology: architecture decisions
│   │   └── runbooks/            # Ontology: operational runbooks
│   │
│   └── compliance/              # Facet: compliance team
│       ├── gdpr/                # Ontology: GDPR documentation
│       └── soc2/                # Ontology: SOC2 compliance
│
└── shard-partners/              # Shard: partner infrastructure (remote)
    └── shared/                  # Facet: shared knowledge
        └── api-integration/     # Ontology: integration docs

Why Facets?

Facets provide logical organization within a shard without requiring separate infrastructure:

  1. Access Control Boundaries:

    # Academic team: read/write to academic/ facet
    # Industrial team: read/write to industrial/ facet
    # Same database, different permissions
    

  2. Resource Isolation:

    # Academic facet: high ingestion rate, low query rate
    # Industrial facet: low ingestion rate, high query rate
    # Same infrastructure, different resource profiles
    

  3. Namespace Management:

    # Both facets can have "documentation" ontology:
    /mnt/knowledge/shard-research/academic/documentation/
    /mnt/knowledge/shard-research/industrial/documentation/
    # No collision!
    

  4. Organizational Clarity:

    ls /mnt/knowledge/shard-research/
    academic/      # University research
    industrial/    # Corporate R&D
    # Clear logical separation
    

Mount Options at Different Levels

# Mount entire shard (all facets, all ontologies)
mount -t fuse.knowledge-graph \
  -o api_url=http://localhost:8000 \
  -o client_id=fuse-client \
  -o client_secret=$FUSE_SECRET \
  -o shard=research \
  /dev/knowledge /mnt/knowledge/research

ls /mnt/knowledge/research/
academic/  industrial/

# Mount specific facet (all ontologies in facet)
mount -t fuse.knowledge-graph \
  -o client_id=fuse-client,client_secret=$FUSE_SECRET \
  -o shard=research,facet=academic \
  /dev/knowledge /mnt/knowledge/academic

ls /mnt/knowledge/academic/
ai-research/  neuroscience/  ml-papers/

# Mount specific ontology (direct semantic access)
mount -t fuse.knowledge-graph \
  -o client_id=fuse-client,client_secret=$FUSE_SECRET \
  -o shard=research,facet=academic,ontology=ai-research \
  /dev/knowledge /mnt/knowledge/ai-research

ls /mnt/knowledge/ai-research/
# Shows semantic query space directly
embedding-models/  neural-networks/  transformers/

Note: All mount operations use OAuth client authentication (ADR-054). The same client credentials work across FUSE, MCP server, and CLI - they're all clients of the same API backend.

Cross-Shard, Cross-Facet Queries

Standard Unix tools traverse the hierarchy automatically:

# Search across all mounted shards, facets, and ontologies
find /mnt/knowledge/ -name "*.concept" | grep "embedding"

# Traverses:
# 1. Shards (local + remote)
#    ├── shard-research (local FUSE → local PostgreSQL)
#    └── shard-partners (SSHFS → remote FUSE → remote PostgreSQL)
#
# 2. Facets within each shard
#    ├── academic
#    ├── industrial
#    └── shared
#
# 3. Ontologies within each facet
#    ├── ai-research
#    ├── patents
#    └── api-integration
#
# 4. Semantic queries within each ontology
#    └── embedding-models.concept (found!)

# All through standard Unix tooling!

The magic: find and grep don't know about: - Knowledge graphs - Semantic queries - Shard boundaries - Local vs. remote mounts

They just traverse directories and read files. The abstraction is perfect.

Distributed Queries Across Mount Boundaries

# Mount local shards
mount -t fuse.knowledge-graph -o shard=research /dev/knowledge /mnt/local/research
mount -t fuse.knowledge-graph -o shard=production /dev/knowledge /mnt/local/production

# Mount remote shards via SSH
sshfs partner-a@remote:/mnt/knowledge/shared /mnt/remote/partner-a
sshfs partner-b@remote:/mnt/knowledge/public /mnt/remote/partner-b

# Now grep across ALL of them:
grep -r "API compatibility" /mnt/{local,remote}/*/

# What actually happens:
# 1. grep traverses /mnt/local/research/
#    → FUSE reads local database
#    → Returns concept files as text
#
# 2. grep traverses /mnt/local/production/
#    → FUSE reads local database
#    → Returns concept files as text
#
# 3. grep traverses /mnt/remote/partner-a/
#    → SSHFS sends reads over SSH
#    → Remote FUSE reads remote database
#    → SSH returns concept files as text
#
# 4. grep traverses /mnt/remote/partner-b/
#    → Same: SSHFS → SSH → remote FUSE → remote database

# Result: distributed semantic search across multiple knowledge graphs
# Using only: grep, mount, and sshfs
# No special distributed query protocol needed

This is profound: Standard Unix tools become distributed knowledge graph query engines simply by mounting semantic filesystems at different paths.

Write Operations Respect Hierarchy

cd /mnt/knowledge/research/academic/ai-research/embedding-models/

# Write here β†’ ingests into:
# - Shard: research
# - Facet: academic
# - Ontology: ai-research
# - Context: embedding-models (semantic query)
echo "# Quantization Techniques..." > quantization.md

# Concept appears in:
# ✓ /mnt/knowledge/research/academic/ai-research/
# ✗ NOT in /mnt/knowledge/research/industrial/patents/
# Same shard, different facet = isolated

Federation and Discovery

# Local shard (FUSE → local knowledge graph)
mount -t fuse.knowledge-graph -o shard=research /dev/knowledge /mnt/local

# Remote shard (SSHFS → remote FUSE → remote knowledge graph)
sshfs partner@partner.com:/mnt/knowledge/shared \
      /mnt/remote

# Now find operates across BOTH:
find /mnt/{local,remote}/ -name "*.concept" | grep "api"

# Returns concepts from:
# - Local research shard (all facets)
# - Remote partner shard (shared facet)
# Distributed knowledge graph queries via standard Unix tools!

Path Semantics

Every path encodes the full context:

/mnt/knowledge/shard-research/academic/ai-research/embedding-models/quantization.concept
│              │              │        │            │                │
│              │              │        │            │                └─ Concept (semantic entity)
│              │              │        │            └─────────────────── Semantic query context
│              │              │        └──────────────────────────────── Ontology (knowledge domain)
│              │              └───────────────────────────────────────── Facet (logical group)
│              └──────────────────────────────────────────────────────── Shard (infrastructure)
└─────────────────────────────────────────────────────────────────────── Mount point

Deterministic structure, semantic content.
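
A sketch of decomposing a mount-relative path into these four levels (names illustrative):

from typing import NamedTuple

class KgPath(NamedTuple):
    shard: str
    facet: str
    ontology: str
    query: str | None  # remaining components form the semantic query

def parse_kg_path(path: str) -> KgPath:
    parts = [p for p in path.split("/") if p]
    if len(parts) < 3:
        raise ValueError("path must include shard/facet/ontology")
    shard, facet, ontology, *rest = parts
    return KgPath(shard, facet, ontology, "/".join(rest) or None)

# parse_kg_path("shard-research/academic/ai-research/embedding-models")
# -> KgPath(shard='shard-research', facet='academic',
#           ontology='ai-research', query='embedding-models')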

Implementation Sketch

Technology Stack

  • FUSE: Filesystem in Userspace (client interface)
  • Backend: FastAPI REST API server
  • Query Engine: Semantic search API (part of backend)
  • Cache: TTL-based concept cache (fights non-determinism slightly)

Note: The FUSE filesystem is a client interface, just like the MCP server, CLI, and web interface. All clients communicate with the same FastAPI backend.

Basic Operations

# fusepy-style sketch; kg, path_to_query, path_to_concept_id and
# format_concept_markdown stand in for the real client and helpers.
from stat import S_IFREG

from fuse import Operations  # fusepy

class SemanticFS(Operations):
    def readdir(self, path, fh):
        """List directory = semantic query"""
        query = path_to_query(path)
        concepts = kg.search(query, threshold=0.7)
        return ['.', '..'] + [f"{c.id}.concept" for c in concepts]

    def read(self, path, size, offset, fh):
        """Read file = get concept details"""
        concept_id = path_to_concept_id(path)
        concept = kg.get_concept(concept_id)
        return format_concept_markdown(concept)[offset:offset + size]

    def getattr(self, path, fh=None):
        """Stat file = concept metadata"""
        concept = kg.get_concept(path_to_concept_id(path))
        return {
            'st_mode': S_IFREG | 0o444,  # Read-only
            'st_size': len(concept.description),
            'st_mtime': concept.last_updated,  # Changes with grounding!
        }

Mount Options

# Options: api_url = API server endpoint; client_id / client_secret = OAuth
# client credentials (ADR-054); threshold = default similarity threshold;
# cache_ttl = cache concepts for 60s; relationship_links = show relationship
# symlinks; dynamic_discovery = concepts appear based on access patterns.
mount -t fuse.knowledge-graph \
  -o api_url=http://localhost:8000 \
  -o client_id=fuse-client \
  -o client_secret=$FUSE_SECRET \
  -o threshold=0.75 \
  -o cache_ttl=60 \
  -o relationship_links=true \
  -o dynamic_discovery=true \
  /dev/knowledge /mnt/knowledge

Authentication: FUSE authenticates as an OAuth client (ADR-054), just like the MCP server and CLI. The same client credentials can be shared across all client interfaces, or each can have its own client ID for granular access control.
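
A hedged sketch of the client-credentials handshake the mount would perform at startup; the /oauth/token route is an assumption, not a documented endpoint:

import requests

def fetch_access_token(api_url: str, client_id: str, client_secret: str) -> str:
    resp = requests.post(
        f"{api_url}/oauth/token",  # assumed token endpoint
        data={
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        },
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]  # sent as a Bearer header thereafter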

Alternative: rclone Backend Implementation

Instead of writing a custom FUSE driver, implement as an rclone backend.

Why rclone? - rclone already handles FUSE mounting, caching, config management - Implement the knowledge graph as "just another backend" (like S3, Google Drive) - Get interop between knowledge graphs and cloud storage for free - Users already understand rclone's model

Implementation:

// rclone backend sketch for knowledge graphs. The Fs and Object type
// definitions, the NewFs constructor, and the API client are omitted;
// only the shape of the backend interface is shown.
package kg

import (
    "context"
    "io"
    "strings"

    "github.com/rclone/rclone/fs"
)

func init() {
    fs.Register(&fs.RegInfo{
        Name:        "kg",
        Description: "Knowledge Graph Backend",
        NewFs:       NewFs,
        Options: []fs.Option{{
            Name:    "api_url",
            Default: "http://localhost:8000",
        }, {
            Name: "shard",
        }, {
            Name: "client_id",
            Help: "OAuth client ID (ADR-054)",
        }, {
            Name: "client_secret",
            Help: "OAuth client secret",
        }},
    })
}

// List directory = semantic query
func (f *Fs) List(ctx context.Context, dir string) (entries fs.DirEntries, err error) {
    _, ontology, query := parsePath(dir) // facet not needed for listing
    concepts, err := f.client.Search(ctx, query, ontology)
    if err != nil {
        return nil, err
    }
    for _, concept := range concepts {
        entries = append(entries, conceptToEntry(concept))
    }
    return entries, nil
}

// Open file = read concept as markdown
func (o *Object) Open(ctx context.Context, options ...fs.OpenOption) (io.ReadCloser, error) {
    concept, err := o.fs.client.GetConcept(ctx, o.conceptID)
    if err != nil {
        return nil, err
    }
    markdown := formatConceptMarkdown(concept)
    return io.NopCloser(strings.NewReader(markdown)), nil
}

// Put file = ingest into knowledge graph
func (f *Fs) Put(ctx context.Context, in io.Reader, src fs.ObjectInfo, options ...fs.OpenOption) (fs.Object, error) {
    data, err := io.ReadAll(in)
    if err != nil {
        return nil, err
    }
    facet, ontology, _ := parsePath(src.Remote())
    if _, err := f.client.Ingest(ctx, data, ontology, facet); err != nil {
        return nil, err
    }
    return &Object{ /* populated from the ingest result */ }, nil
}

Usage:

# Configure knowledge graph backend (OAuth client authentication)
rclone config create kg-research kg \
  api_url=http://localhost:8000 \
  shard=research \
  client_id=rclone-client \
  client_secret=$RCLONE_SECRET

# Mount it
rclone mount kg-research:academic/ai-research /mnt/knowledge

# Works like any rclone mount
ls /mnt/knowledge/
cat /mnt/knowledge/embedding-models.concept
echo "new idea" > /mnt/knowledge/new-concept.md

Note: Uses same OAuth client authentication (ADR-054) as MCP server and CLI. The same client credentials can be reused, or rclone can have its own client ID for separate access control policies.

Bonus: Cross-Backend Operations

# Backup knowledge graph to S3
rclone sync kg-research: s3:backup/kg-snapshot/

# Ingest Google Drive docs into knowledge graph
rclone copy gdrive:Papers/ kg-research:academic/papers/

# Sync between knowledge graph shards
rclone sync kg-shard-a: kg-shard-b:

# Export concepts to git repository
rclone sync kg-research: /tmp/kg-export/
cd /tmp/kg-export && git init && git add . && git commit

# Use rclone browser GUI to explore knowledge graph
rclone rcd --rc-web-gui

Benefits: - Don't write a FUSE layer (rclone handles it) - Get caching, retry logic, rate limiting for free - Instant interop with cloud storage backends - Existing rclone user base understands the model - rclone browser GUI works automatically

Implementation effort: Minimal backend (List/Read/Write) could be prototyped in a weekend.

Why This Will Make Unix Admins Angry

The Angry Tweets We Expect

"This violates everything POSIX stands for. Files shouldn't magically appear and disappear."

Yes. That's the point. Knowledge isn't static.

"How am I supposed to backup a filesystem where tar gives different results each time?"

You backup the knowledge graph, not the filesystem. The filesystem is a view of knowledge.

"My scripts depend on deterministic ls output!"

Your scripts are thinking in hierarchies. Think in semantics instead.

"find . -name '*.concept' | wc -l returns different numbers!"

Correct! The number of concepts matching your context changes as you explore.

"This breaks rsync!"

Have you considered that maybe rsync should understand semantic similarity? πŸ€”

The rclone Defense

"This is just like rclone for Google Drive!"

Yes. Exactly. And millions of people use rclone daily despite its POSIX violations.

rclone for Google Drive exhibits:

  • Non-deterministic listings: Files appear/disappear as others edit shared drives
  • Multiple canonical paths: Same file accessible via /MyDrive/ and /SharedDrives/ (Google's "Add to My Drive")
  • Eventually consistent: Write a file, and a read might return old content (API sync lag)
  • Weird metadata: Fake Unix permissions from Google's ACLs, timestamps from the cloud provider
  • Partial POSIX: No symlinks, no memory mapping, fake chmod/chown

People accept this because the abstraction is useful.

Semantic FUSE is actually BETTER than rclone:

Aspect           | rclone (Google Drive)               | Semantic FUSE
Non-determinism  | Network sync (unpredictable)        | Semantic relevance (intentional)
Multiple paths   | Google's sharing model (confusing)  | Semantic contexts (by design)
Performance      | Network latency, API rate limits    | Local database (consistent)
Metadata         | Fake Unix perms from ACLs (awkward) | Native semantic data (grounding, similarity)
Consistency      | Eventually consistent (network)     | Immediately consistent (local)

rclone documentation literally says:

"Note that many operations are not fully POSIX compliant. This is an inherent limitation of cloud storage systems."

Our documentation:

"Note that many operations are not fully POSIX compliant. This is an inherent limitation of exposing semantic graphs as filesystems."

Same energy. Same usefulness. Same tradeoffs.

If you accept rclone's weirdness for the convenience of grep-ing Google Drive, you'll accept semantic FUSE's weirdness for the convenience of grep-ing knowledge graphs.

The Defenses We Don't Care About

"But the POSIX specification says..."

The POSIX specification doesn't account for semantic knowledge graphs. Times change.

"This would break every tool!"

Good! Those tools assume files are in trees. Knowledge isn't a tree.

"What about make? What about git?"

Don't use this for source code. Use it for knowledge about source code.

"This is cursed."

Yes. Beautifully cursed. Like all the best ideas.

Practical Limitations

What This Is NOT Good For

  • Source code version control (use git)
  • Binary file storage (use object storage)
  • High-performance computing (use tmpfs)
  • Traditional backups (use the graph's native backup)
  • Anything requiring determinism (use a real filesystem)

What This IS Good For

  • Research and exploration
  • Documentation navigation
  • Semantic code search
  • Learning domain knowledge
  • Following conceptual trails
  • AI-assisted development workflows

Future Extensions

Write Support

$ mkdir /mnt/knowledge/my-new-concept/
$ echo "Description: A revolutionary new idea..." > description.md
$ echo "Ontology: MyProject" > .ontology

# Automatically ingested and linked!

Relationship Creation

$ ln -s ../target-concept.concept relationship/supports/
# Creates SUPPORTS relationship in the graph!

Query Operators

$ cd /mnt/knowledge/search/AND/embedding+models/
$ cd /mnt/knowledge/search/OR/ai+ml/
$ cd /mnt/knowledge/search/NOT/embedding-models/

Grounding Filters

$ cd /mnt/knowledge/grounding/strong/embedding-models/
# Only concepts with strong grounding (>0.5)

Decision

Implement knowledge graph access as a FUSE filesystem with the following design choices:

  1. Partial Filesystem Model - Like /sys/ or /proc/, implement only semantically meaningful operations
       • Support: ls (query), cd (navigate), cat (read), grep/find (search), echo/cp (ingest), tar (snapshot)
       • Do not support: mv, chmod, chown, touch, dd (operations that don't map to semantic concepts)

  2. Four-Level Hierarchy - Map infrastructure to semantics:
       • Shard (infrastructure: database + API + resources)
       • Facet (logical grouping: RBAC + resource isolation)
       • Ontology (knowledge domain namespace)
       • Concepts (semantic content)

  3. Directory Creation = Semantic Query - User creates directories with query names
       • mkdir "embedding models" defines a semantic query
       • cd embedding-models/ executes the query
       • ls shows concepts matching the query at the configured similarity threshold

  4. Relationship Navigation - Concepts expose a relationships/ subdirectory
       • cd concept.concept/relationships/SUPPORTS/ traverses graph edges
       • Path represents traversal history (deterministic structure, semantic content)

  5. Write = Ingest - File writes trigger automatic ingestion
       • echo "content" > file.md ingests into the current ontology/facet context
       • File may not reappear with the same name (concept extraction determines the label)
       • Embraces non-determinism as a feature (concepts appear based on semantic relevance)

  6. Implementation Options - Two paths forward:
       • Option A: Custom FUSE driver in Python (full control, more code)
       • Option B: rclone backend in Go (leverage existing infrastructure, instant interop)

Consequences

Benefits

1. Familiar Interface for Semantic Exploration - Users already understand cd, ls, cat, grep - No need to learn custom query language or web UI - Standard Unix tools become knowledge graph query engines

2. Distributed Queries via Standard Tools

# Transparently searches local + remote shards
find /mnt/knowledge/ -name "*.concept" | grep "pattern"
# - Local shards: FUSE → local PostgreSQL
# - Remote shards: SSHFS → SSH → remote FUSE → remote PostgreSQL

3. Cross-Backend Interoperability (if rclone implementation)

# Backup knowledge graph to S3
rclone sync kg:research s3:backup/

# Ingest from Google Drive
rclone copy gdrive:Papers/ kg:research/papers/

# Export to git repository
rclone sync kg:research /tmp/export/

4. TAR as Temporal Snapshots

tar czf snapshot-$(date +%s).tar.gz /mnt/knowledge/my-research/
# Same path, different contents over time
# Version your semantic space

5. Living Documentation in Workspaces

ln -s /mnt/knowledge/my-project/ ./docs
# Documentation auto-updates as concepts evolve
# Claude Code can read semantic graph directly

Drawbacks

1. Non-Determinism Can Be Confusing - ls results change as graph evolves - Same query returns different results over time - Mitigation: Clear documentation, caching options, embrace as feature

2. POSIX Violations Require Education - Many standard file operations won't work - Users expect traditional filesystem behavior - Mitigation: Follow the rclone precedent, document limitations clearly

3. Performance Considerations - Semantic queries slower than filesystem metadata operations - Graph traversal can be expensive for deep relationships - Mitigation: Caching layer, configurable similarity thresholds, limit traversal depth

4. Implementation Complexity - Custom FUSE: ~2000-3000 lines of Python - rclone backend: ~500-1000 lines of Go + API wrapper - Either requires ongoing maintenance

Risks

1. User Confusion - Non-deterministic behavior violates expectations - Mitigation: Clear "partial filesystem" designation, precedent from rclone

2. Performance at Scale - Large knowledge graphs may be slow - Mitigation: Shard/facet architecture limits query scope

3. Adoption Barrier - Requires FUSE support, mount permissions - Mitigation: Provide alternative interfaces (web UI, CLI, MCP)

Alternatives Considered

1. WebDAV/HTTP Filesystem

Pros: Cross-platform, no FUSE required, browser-compatible
Cons: Poorer performance, limited caching, no local integration
Decision: FUSE provides better Unix integration; WebDAV can be added later

2. Git-Like Interface

Pros: Familiar to developers, built-in versioning, distributed
Cons: Concepts aren't commits, relationships aren't branches, poor semantic fit
Decision: Git is for version control, not semantic navigation

3. Custom CLI Only

Pros: Full control, no filesystem abstraction mismatch
Cons: Users must learn new commands, can't use standard Unix tools
Decision: A CLI exists (the kg command); FUSE adds a complementary interface

4. SQL/GraphQL Query Interface

Pros: Powerful queries, precise results, standard protocols
Cons: Requires learning a query language, no filesystem metaphor benefits
Decision: APIs exist; FUSE provides a filesystem convenience layer

5. Database-as-Filesystem (Direct PostgreSQL Mount)

Pros: Tools exist (pgfuse), direct database access
Cons: Exposes tables/rows, not semantic concepts; wrong abstraction level
Decision: We need a semantic layer, not raw database access

Implementation Recommendation

Update (Post Peer Review): After architectural review, we are strongly leaning toward Python FUSE (Option A) for the MVP, though not yet committed.

Reconsidering Python FUSE (Option A)

Advantages for our specific architecture:

  1. Shared Logic Layer - All core services (QueryService, EmbeddingModel, GraphQueryFacade) are Python
       • Can import services directly without HTTP overhead
       • Zero-latency local operations during development
       • No schema drift between FUSE layer and graph layer

  2. Complex Traversal Support - Deep graph schema knowledge (ADR-048)
       • Relationship navigation requires VocabType awareness
       • Dynamic relationship discovery easier in Python
       • Access to full graph context without API round-trips

  3. Tight Integration - Same runtime as API server
       • Can mount on the same machine as the database for testing
       • Direct access to PostgreSQL connection pool
       • Shared caching layer with existing services

Implementation with pyfuse3:

import pyfuse3
from api.services.query_service import QueryService

class SemanticFS(pyfuse3.Operations):
    def __init__(self):
        super().__init__()
        self.query_service = QueryService()  # Direct import!

    async def readdir(self, fh, start_id, token):
        # Direct service call, no HTTP. Resolving the directory's query from
        # fh and building EntryAttributes are omitted for brevity.
        query = self._query_for_handle(fh)
        concepts = await self.query_service.execute_search(query, threshold=0.7)
        for next_id, concept in enumerate(concepts[start_id:], start=start_id + 1):
            name = f"{concept.label}.concept".encode()
            if not pyfuse3.readdir_reply(token, name, self._attrs(concept), next_id):
                break

When to use rclone instead (Option B): - Remote mounting (laptop → cloud server) - OAuth management for remote instances - Cross-backend sync requirements (knowledge graph ↔ S3/Google Drive) - Deployment to users unfamiliar with Python infrastructure

Current stance: Prototype with Python FUSE for local/development use. Both implementations may coexist - Python for tight integration, rclone for remote access and OAuth workflows.

Authentication (applies to both approaches): Both Python FUSE and rclone implementations use the same OAuth client authentication system (ADR-054) as the MCP server and CLI. This means: - Same client credentials can be shared across all client interfaces - Consistent authentication flow regardless of client type - Granular access control via separate client IDs if needed - FUSE authenticates to the API server just like any other client

Future Extensions

Core Features

  • Relationship-based symbolic links (ln -s concept relationships/SUPPORTS/)
  • Query operators (/search/AND/, /search/OR/, /search/NOT/)
  • Grounding filters (/grounding/strong/, /grounding/weak/)
  • Write support for relationship creation
  • Multi-shard federated views

Usability Enhancements (From Peer Review)

1. Empty Directory Problem Solution

When semantic queries return no results, generate a virtual README.md explaining why:

mkdir /mnt/knowledge/research/unicorn-physics/
ls /mnt/knowledge/research/unicorn-physics/
# Empty directory - no matching concepts

cat /mnt/knowledge/research/unicorn-physics/README.md
# Query 'unicorn physics' (Threshold: 0.7) matched 0 concepts in ontology 'research'.
#
# Suggestions:
# - Lower threshold: /mnt/knowledge/search/0.5/unicorn+physics/
# - Try broader query: /mnt/knowledge/research/physics/
# - Check available ontologies: ls /mnt/knowledge/

Benefits: Users understand empty results instead of wondering if the system is broken.
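
A sketch of the generator: the README is never stored; it is synthesized by read() whenever a query directory is empty.

def empty_query_readme(query: str, threshold: float, ontology: str) -> bytes:
    """Build the virtual README.md shown in empty query directories."""
    text = (
        f"Query '{query}' (Threshold: {threshold}) matched 0 concepts "
        f"in ontology '{ontology}'.\n\n"
        "Suggestions:\n"
        f"- Lower threshold: /mnt/knowledge/search/0.5/{query.replace(' ', '+')}/\n"
        "- Try a broader query\n"
        "- Check available ontologies: ls /mnt/knowledge/\n"
    )
    return text.encode("utf-8")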

2. Tarball Snapshots with Temporal Metadata

Include a .manifest file in every tarball to enable "time travel":

tar czf snapshot-$(date +%s).tar.gz /mnt/knowledge/research/

tar tzf snapshot-*.tar.gz | head -5
.manifest
embedding-models.concept
neural-networks.concept
...

cat .manifest
{
  "snapshot_timestamp": "2025-11-28T23:45:00Z",
  "graph_revision": "a3b2c1d4",
  "shard": "research",
  "facet": "academic",
  "ontology": "ai-research",
  "query_threshold": 0.7,
  "concept_count": 127,
  "embedding_model": "nomic-ai/nomic-embed-text-v1.5"
}

Benefits: - Restore semantic state from snapshots - Track knowledge evolution over time - Debug "why did this concept disappear?"
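
A sketch of building the manifest payload (graph_revision omitted, since it would come from the graph itself):

import datetime
import json

def build_manifest(shard: str, facet: str, ontology: str,
                   threshold: float, concept_count: int, model: str) -> str:
    """Serialize the .manifest JSON included in every snapshot tarball."""
    return json.dumps({
        "snapshot_timestamp": datetime.datetime.now(datetime.timezone.utc)
                                               .isoformat(timespec="seconds"),
        "shard": shard,
        "facet": facet,
        "ontology": ontology,
        "query_threshold": threshold,
        "concept_count": concept_count,
        "embedding_model": model,
    }, indent=2)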

3. RBAC Integration via Filesystem Permissions

Map filesystem permission bits to OAuth scopes from ADR-054/055:

ls -l /mnt/knowledge/shard-production/
drwxr-xr-x  engineering/     # User has write:engineering scope
drwxr-xr--  compliance/      # User has read:compliance scope (no write)
d---------  finance/         # User has no access

# Attempting to write without scope:
echo "test" > /mnt/knowledge/shard-production/compliance/test.md
# Permission denied (requires write:compliance scope)

Implementation: Check OAuth scopes during FUSE access() and open() operations.

Benefits: - Familiar Unix permission model - Natural RBAC enforcement - Tools like ls -l show access levels automatically
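
A sketch of the scope-to-mode mapping; the scope naming follows the example above and is an assumption:

import stat

def facet_mode(scopes: set[str], facet: str) -> int:
    """Map OAuth scopes (ADR-054) to the mode bits getattr() reports."""
    mode = stat.S_IFDIR
    if f"read:{facet}" in scopes:
        mode |= 0o555  # r-x for everyone this mount serves
    if f"write:{facet}" in scopes:
        mode |= 0o200  # owner write bit enables ingestion
    return mode        # no scopes -> d--------- as shown above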

References

Internal Architecture

  • ADR-055: Sharding and facet architecture
  • ADR-048: Query safety and namespace isolation
  • ADR-054: OAuth client management

Knowledge doesn't fit in trees. It forms graphs. Your filesystem should too. 🌳→🕸️