ADR-089: Deterministic Node and Edge Creation
Status
DRAFT
Context
The knowledge graph currently populates exclusively through the ingest pipeline: documents are chunked, concepts are extracted via LLM, and nodes/edges are created with embedding-based matching. This works well for document-driven knowledge but doesn't support:
- Manual curation - Humans wanting to directly create/edit/delete concepts via web workstation
- Agent-driven creation - MCP tools that let AI agents build knowledge structures programmatically
- LLM-assisted curation - Using LLM to help humans draft/refine concepts (not document extraction)
- Bulk import - Loading structured data (CSV, JSON) without LLM processing
- Subgraph construction - Creating independent concept clusters for specific purposes
- Foreign graph import - Importing knowledge graphs from external systems (Neo4j exports, RDF, JSON-LD, etc.)
- Filesystem exposure - Concepts accessible via FUSE for file-based editing workflows
Key Workflow: Iterative Graph Enrichment
A primary use case for deterministic creation is the research-then-enrich cycle:
┌─────────────────────────────────────────────────────────────────────────┐
│ 1. INGEST │
│ Documents → LLM extraction → Initial graph │
└─────────────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────────────┐
│ 2. RESEARCH │
│ Query graph → Discover patterns → Identify gaps → Learn structure │
└─────────────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────────────┐
│ 3. ENRICH (Deterministic Creation) │
│ Add bridging concepts → Create missing relationships → │
│ Strengthen weak connections → Add domain expert knowledge │
└─────────────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────────────┐
│ 4. SYNTHESIZE │
│ LLM queries enriched graph → Higher quality reasoning → │
│ Better serialization for downstream tasks │
└─────────────────────────────────────────────────────────────────────────┘
↓
(repeat 2-4)
Why this matters:
- LLM extraction captures what's explicitly in documents
- Human/agent curation adds implicit knowledge, expert judgment, cross-domain connections
- Enriched graphs enable more sophisticated reasoning than raw extraction alone
- Each cycle improves graph quality for future queries
Current Creation Flow (Ingest)
Document → Chunking → LLM Extraction → Embedding → Matching → MERGE
↓
≥0.85: Link existing
0.75-0.84 + label: Link
<0.75: Create new
What gets created:
- Source node (chunk text, embeddings)
- Concept nodes (matched or new)
- Instance nodes (evidence quotes)
- Relationship edges (IMPLIES, SUPPORTS, etc.)
Provenance tracked:
- source: "llm_extraction"
- created_by: user_id
- job_id: ingestion job
- document_id: source document hash
Decision
Implement a deterministic creation API that allows direct node/edge creation while maintaining full compatibility with auto-created graph elements.
Design Principles
- Functionally Identical - Manual nodes are indistinguishable from auto nodes (same properties, same schema)
- Full Embedding Support - Manual concepts get embeddings via same unified worker
- Optional Matching - Can match to existing concepts or force-create standalone
- Pruning Compatible - Manual edges have same properties, subject to same pruning rules
- Provenance Tracked - Clear distinction via source property, not node structure
New Property: creation_method
Add to Concept nodes:
Values:
- llm_extraction - Created through document ingest pipeline
- manual_api - Created via REST API directly
- mcp_tool - Created by AI agents through MCP tools
- workstation - Created via web workstation UI by humans
- graph_import - Imported from foreign graph systems
Note: Backup/restore preserves the original creation_method - it's not overwritten during restore.
This is informational only - all concepts are treated identically by queries, matching, and pruning.
API Endpoints
Create Concept
POST /api/v1/concepts
{
"label": "Quantum Entanglement",
"description": "A quantum mechanical phenomenon...",
"search_terms": ["entanglement", "quantum correlation"],
"ontology": "physics",
"matching_mode": "auto" | "force_create" | "match_only"
}
Response:
{
"concept_id": "manual_abc123",
"matched_existing": false,
"embedding_generated": true
}
Matching modes:
- auto (default): Use standard two-tier matching, create if no match
- force_create: Always create new concept, skip matching
- match_only: Return match or error, never create
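The three matching modes can be sketched as a small decision function. This is a minimal illustration, not the service's implementation: the `Candidate` type and `resolve` function are hypothetical names, the thresholds are taken from the ingest flow above, and treating a 0.75-0.84 score without a label match as "no match" is an assumption the ADR does not state.

```python
from dataclasses import dataclass
from typing import Optional

LINK_THRESHOLD = 0.85   # ingest flow: >=0.85 links to the existing concept
LABEL_THRESHOLD = 0.75  # ingest flow: 0.75-0.84 links only with a label match
MODES = {"auto", "force_create", "match_only"}


@dataclass
class Candidate:
    """Best existing concept found by similarity search (hypothetical type)."""
    concept_id: str
    similarity: float
    label_matches: bool


def resolve(mode: str, best: Optional[Candidate]) -> tuple[str, Optional[str]]:
    """Return (action, concept_id), where action is 'link' or 'create'."""
    if mode not in MODES:
        raise ValueError(f"unknown matching_mode: {mode}")
    if mode == "force_create":
        return ("create", None)  # skip matching entirely

    matched = best is not None and (
        best.similarity >= LINK_THRESHOLD
        or (best.similarity >= LABEL_THRESHOLD and best.label_matches)
    )
    if matched:
        return ("link", best.concept_id)
    if mode == "match_only":
        # match_only never creates; surface the miss as an error
        raise LookupError("no matching concept found")
    return ("create", None)  # auto: create when nothing matched
```

With these assumptions, `resolve("auto", Candidate("concept_123", 0.9, False))` links to `concept_123`, while the same candidate under `force_create` still produces a new concept.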
"Free Thinking" Pattern (Agents and Humans)
When using matching_mode: "auto", creators (agents or humans) can work freely without graph topology knowledge:
┌─────────────────────────────────────────────────────────────────────────┐
│ AGENT REASONING │
│ "I notice these documents discuss supply chain resilience..." │
│ ↓ │
│ Agent emits concept: { label: "Supply Chain Resilience", ... } │
└─────────────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────────────┐
│ GRAPH MATCHING (automatic) │
│ - Embed new concept │
│ - Search existing concepts (≥0.85 similarity) │
│ - Found match? → Link to existing "Supply Chain Robustness" │
│ - No match? → Create new node, auto-attach via relationships │
└─────────────────────────────────────────────────────────────────────────┘
↓
┌─────────────────────────────────────────────────────────────────────────┐
│ RESULT │
│ Agent's thought integrated into graph without explicit wiring │
└─────────────────────────────────────────────────────────────────────────┘
Why this matters:
- Creator focuses on ideation, graph handles integration
- No need to query graph structure before creating
- Duplicate concepts naturally merge via similarity matching
- Rapid concept emission without cognitive overhead of "where does this fit?"
- Graph becomes a thought accumulator that self-organizes
For agents:
- "Think out loud" - emit many concepts during reasoning
- Graph captures agent's evolving understanding
- No API calls to discover existing structure
For humans:
- Domain expert can dump knowledge without learning graph structure
- Brainstorming sessions become graph-building sessions
- Lower barrier to contribution - just describe concepts naturally
- Web workstation becomes a "knowledge notepad" that auto-organizes
Contrast with force_create:
- Use when creator intentionally wants distinct concept (even if similar exists)
- Useful for tracking different perspectives on same topic
- Creates isolated subgraphs when desired
Create Edge
POST /api/v1/edges
{
"from_concept_id": "concept_123",
"to_concept_id": "concept_456",
"relationship_type": "IMPLIES",
"confidence": 0.85,
"ontology": "physics"
}
Response:
{
"edge_id": "edge_abc123",
"relationship_type": "IMPLIES", // May be normalized
"vocabulary_created": false // True if new type added
}
Relationship handling:
- Type normalized via existing mapper (ADR-032)
- Unknown types auto-added to vocabulary with category: "manual"
- Embedding generated for new types
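A rough sketch of the relationship handling above follows. The real mapper is defined in ADR-032 and is not reproduced here; the uppercase-and-underscore rule below is only a stand-in assumption, and `register_relationship_type` and the in-memory `vocabulary` dict are hypothetical names. Embedding generation for new types is omitted.

```python
import re


def register_relationship_type(raw: str, vocabulary: dict[str, dict]) -> tuple[str, bool]:
    """Normalize a relationship type and add it to the vocabulary if unknown.

    Returns (normalized_type, vocabulary_created), mirroring the
    relationship_type and vocabulary_created fields of the edge response.
    """
    # Stand-in normalization (assumption): collapse non-alphanumerics to "_"
    normalized = re.sub(r"[^A-Za-z0-9]+", "_", raw.strip()).strip("_").upper()
    if not normalized:
        raise ValueError("relationship_type must contain letters or digits")
    if normalized in vocabulary:
        return normalized, False
    # Unknown types are auto-added with category "manual"
    vocabulary[normalized] = {"category": "manual"}
    return normalized, True
```

For example, starting from a vocabulary that contains only `IMPLIES`, the input `"strongly implies"` would be stored as `STRONGLY_IMPLIES` with category `"manual"`, and the response would report `vocabulary_created: true`.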
Create Evidence (Instance)
POST /api/v1/concepts/{concept_id}/evidence
{
"quote": "Einstein described entanglement as 'spooky action'",
"source_reference": "Einstein 1935 paper" // Optional metadata
}
Update Concept
PATCH /api/v1/concepts/{concept_id}
{
"label": "Quantum Entanglement (updated)",
"description": "Revised description...",
"search_terms": ["entanglement", "quantum correlation", "spooky action"]
}
Response:
{
"concept_id": "concept_123",
"embedding_regenerated": true,
"modified_by": "user_456",
"modified_at": "2026-01-25T..."
}
Update behavior:
- Partial updates supported (only changed fields)
- Embedding regenerated if label/description/search_terms change
- modified_by and modified_at tracked for audit
- Original created_by and creation_method preserved
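The update rules above can be illustrated with a small in-memory sketch. The `apply_patch` function and the dict-based concept are hypothetical; rejecting a patch that touches `created_by` or `creation_method` (rather than silently ignoring those fields) is an assumption about how "preserved" would be enforced.

```python
from datetime import datetime, timezone

EMBEDDED_FIELDS = {"label", "description", "search_terms"}  # feed the embedding
IMMUTABLE_FIELDS = {"created_by", "creation_method"}         # never overwritten


def apply_patch(concept: dict, patch: dict, user_id: str) -> dict:
    """Apply a partial update in place and report whether to re-embed."""
    rejected = IMMUTABLE_FIELDS.intersection(patch)
    if rejected:
        raise ValueError(f"cannot modify: {sorted(rejected)}")

    # Only fields whose value actually differs count as changed
    changed = {key for key, value in patch.items() if concept.get(key) != value}
    for key in changed:
        concept[key] = patch[key]

    if changed:
        concept["modified_by"] = user_id
        concept["modified_at"] = datetime.now(timezone.utc).isoformat()

    return {
        "concept_id": concept["concept_id"],
        "embedding_regenerated": bool(changed & EMBEDDED_FIELDS),
    }
```

Under this sketch, a patch that changes only a non-embedded field (or repeats existing values) leaves `embedding_regenerated` false, which avoids unnecessary embedding work.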
Update Edge
PATCH /api/v1/edges/{edge_id}
{
"confidence": 0.95,
"relationship_type": "STRONGLY_IMPLIES" // Type change allowed
}
Delete Concept
DELETE /api/v1/concepts/{concept_id}?cascade=false
Response:
{
"deleted": true,
"edges_orphaned": 3, // If cascade=false
"edges_deleted": 0 // If cascade=true, these would be deleted
}
Delete modes:
- cascade=false (default): Delete concept, orphan edges (they remain with dangling references)
- cascade=true: Delete concept and all connected edges
- Returns error if concept has evidence instances (must delete those first)
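The delete modes can be sketched against a toy in-memory graph. The dict layout (`concepts`, `edges`, `instances`) and the `delete_concept` helper are hypothetical stand-ins for the graph store; only the cascade behavior and the evidence check come from the list above.

```python
def delete_concept(graph: dict, concept_id: str, cascade: bool = False) -> dict:
    """Delete a concept, either orphaning or deleting its connected edges."""
    if concept_id not in graph["concepts"]:
        raise KeyError(concept_id)
    # Evidence instances must be removed before the concept itself
    if any(inst["concept_id"] == concept_id for inst in graph["instances"]):
        raise ValueError("concept has evidence instances; delete those first")

    def touches(edge: dict) -> bool:
        return concept_id in (edge["from"], edge["to"])

    connected = sum(1 for edge in graph["edges"] if touches(edge))
    if cascade:
        graph["edges"] = [edge for edge in graph["edges"] if not touches(edge)]
    # With cascade=False the edges stay behind with dangling references

    del graph["concepts"][concept_id]
    return {
        "deleted": True,
        "edges_orphaned": 0 if cascade else connected,
        "edges_deleted": connected if cascade else 0,
    }
```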
Delete Edge
Batch Creation
POST /api/v1/graph/batch
{
"ontology": "physics",
"concepts": [...],
"edges": [...],
"instances": [...]
}
Foreign Graph Import
A significant capability enabled by deterministic creation is importing knowledge from external graph systems. This expands KG beyond document-driven knowledge to incorporate existing structured knowledge.
Supported Import Formats
| Format | Source Systems | Notes |
|---|---|---|
| JSON Graph Format | Generic exports, custom systems | Lightweight, easy to transform |
| Neo4j JSON | Neo4j exports | Direct node/edge mapping |
| RDF/JSON-LD | Semantic web, linked data | Requires vocabulary mapping |
| GraphML | yEd, Gephi, NetworkX | XML-based, widely supported |
| CSV (nodes + edges) | Spreadsheets, databases | Simple tabular format |
Import Pipeline
Foreign Graph Data
↓
Validation (schema check)
↓
Normalization (required properties)
↓
Enrichment (sources, embeddings)
↓
Matching (optional deduplication)
↓
MERGE into Apache AGE
Import API
POST /api/v1/graph/import
{
"format": "json_graph" | "neo4j" | "rdf" | "graphml" | "csv",
"ontology": "imported_knowledge",
"data": { ... } | "file_path",
"options": {
"matching_mode": "auto" | "force_create",
"generate_embeddings": true,
"create_synthetic_sources": true,
"node_mapping": {
"label_field": "name",
"description_field": "description"
},
"edge_mapping": {
"type_field": "relationship",
"confidence_field": "weight"
}
}
}
Response:
{
"job_id": "import_abc123",
"nodes_imported": 150,
"edges_imported": 320,
"nodes_matched": 12,
"nodes_created": 138,
"warnings": ["Unknown relationship type 'RELATED_TO' normalized to 'RELATES_TO'"]
}
Normalization Requirements
Foreign nodes must be enriched to meet KG schema:
| Property | Required | Default/Generation |
|---|---|---|
| label | Yes | Mapped from source field |
| description | No | Empty or mapped |
| embedding | Yes | Generated via unified worker |
| creation_method | Auto | Set to "graph_import" |
| source_graph | Auto | Original system identifier |
| import_job_id | Auto | Job tracking |
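As an illustration of the normalization step, the sketch below maps one foreign node onto the required properties using the `node_mapping` option from the import request. `normalize_node` is a hypothetical helper, and leaving `embedding` as `None` until the unified worker fills it in is an assumption about ordering within the pipeline.

```python
def normalize_node(raw: dict, node_mapping: dict, source_graph: str, job_id: str) -> dict:
    """Map a foreign node onto the KG concept schema (sketch)."""
    label = raw.get(node_mapping["label_field"])
    if not label:
        # label is the only property the source data must supply
        raise ValueError(f"node is missing required field {node_mapping['label_field']!r}")

    description_field = node_mapping.get("description_field", "description")
    return {
        "label": str(label),
        "description": raw.get(description_field) or "",  # optional: empty or mapped
        "embedding": None,  # assumption: generated later by the unified worker
        "creation_method": "graph_import",
        "source_graph": source_graph,
        "import_job_id": job_id,
    }
```

For example, a Neo4j-style node `{"name": "Entanglement"}` with `label_field: "name"` would normalize to a concept with an empty description and `creation_method: "graph_import"`.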
Provenance for Imports
Each import creates a synthetic Source node for provenance:
(:Source {
source_id: "import_{job_id}",
document: "graph_import:{format}:{source_name}",
full_text: "Imported from {source_system} on {date}",
content_type: "graph_import",
original_format: "neo4j",
original_node_count: 150,
original_edge_count: 320
})
This enables:
- Tracking which concepts came from which import
- Filtering queries by import source
- Auditing import history
- Potential rollback of imports
Entry Points
The deterministic creation API supports multiple interfaces suited to different users:
| Interface | Users | Use Cases |
|---|---|---|
| REST API | Developers, scripts | Automation, integration, bulk operations |
| MCP Tools | AI agents (Claude, etc.) | Agent-driven knowledge building |
| Web Workstation | Human curators | Manual curation, visual graph editing |
| CLI | Operators, developers | Quick edits, scripting |
| FUSE Filesystem | Power users, editors | File-based concept editing |
| Import API | Data engineers | Foreign graph ingestion |
LLM-Assisted Curation
Beyond document extraction, LLMs can assist human curators in drafting and refining concepts:
POST /api/v1/concepts/draft
{
"prompt": "Create a concept about quantum entanglement for a physics ontology",
"context": ["related_concept_id_1", "related_concept_id_2"], // Optional
"ontology": "physics"
}
Response:
{
"draft": {
"label": "Quantum Entanglement",
"description": "A quantum mechanical phenomenon where...",
"search_terms": ["entanglement", "EPR paradox", "quantum correlation"],
"suggested_relationships": [
{"to": "concept_123", "type": "IMPLIES", "rationale": "..."}
]
},
"requires_approval": true
}
Use cases:
- Human provides rough idea, LLM refines into proper concept structure
- LLM suggests relationships based on existing graph context
- Human reviews and approves before creation
- Different from ingest: no source document, human-in-the-loop
FUSE Filesystem Exposure
Concepts can be exposed as files via FUSE mount, enabling file-based workflows:
/mnt/kg/
├── ontologies/
│ ├── physics/
│ │ ├── concepts/
│ │ │ ├── quantum-entanglement.json
│ │ │ ├── wave-particle-duality.json
│ │ │ └── ...
│ │ └── edges/
│ │ └── ...
│ └── biology/
│ └── ...
└── queries/
└── ...
File format (concept):
{
"concept_id": "concept_123",
"label": "Quantum Entanglement",
"description": "...",
"search_terms": ["..."],
"creation_method": "workstation",
"created_by": "user_456",
"readonly": false
}
Editing via FUSE:
- Edit JSON file → triggers PATCH /api/v1/concepts/{id}
- Delete file → triggers DELETE /api/v1/concepts/{id}
- Create file → triggers POST /api/v1/concepts
- File permissions reflect user's graph editing rights
- readonly: true for concepts user cannot edit
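The mapping from file operations to API calls could look roughly like the following. `fuse_event_to_request` and the event names (`"create"`, `"edit"`, `"delete"`) are hypothetical; a real FUSE layer would derive them from write, unlink, and create callbacks.

```python
from typing import Optional


def fuse_event_to_request(event: str, concept_id: Optional[str] = None) -> tuple[str, str]:
    """Translate a file-level event into (HTTP method, API path)."""
    if event == "create":
        return ("POST", "/api/v1/concepts")
    if concept_id is None:
        raise ValueError(f"{event!r} requires a concept_id")
    if event == "edit":
        return ("PATCH", f"/api/v1/concepts/{concept_id}")
    if event == "delete":
        return ("DELETE", f"/api/v1/concepts/{concept_id}")
    raise ValueError(f"unsupported event: {event!r}")
```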
CLI and MCP Shared Implementation
The kg CLI and MCP server share the same codebase (cli/src/). New concept CRUD operations will be implemented once and exposed through both interfaces:
| Operation | CLI Command | MCP Tool Action |
|---|---|---|
| Create | kg concept create -l "Label" -d "Description" | concept action: create |
| Update | kg concept edit <id> -l "New Label" | concept action: update |
| Delete | kg concept delete <id> --force | concept action: delete |
| List | kg concept list -o ontology | concept action: list |
Implementation notes:
- API client methods added to cli/src/api/client.ts
- CLI commands in cli/src/cli/concept.ts
- MCP actions extend existing concept tool (action enum)
- Same validation, same error handling, same formatting
MCP Tools
// Create single concept
mcp__kg__create_concept({
label: "Quantum Entanglement",
description: "...",
ontology: "physics",
matching_mode: "auto"
})
// Create relationship between concepts
mcp__kg__create_edge({
from_query: "quantum entanglement", // Semantic search
to_query: "quantum superposition", // Semantic search
relationship_type: "REQUIRES",
confidence: 0.9
})
// Batch create subgraph
mcp__kg__create_subgraph({
ontology: "physics",
nodes: [...],
edges: [...]
})
Semantic ID resolution: MCP tools can use concept queries instead of IDs:
- Search by label/description
- Use embedding similarity
- Fail if ambiguous (multiple matches)
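A minimal sketch of semantic ID resolution with the fail-if-ambiguous rule is shown below. It assumes query and concept embeddings are already available as plain float lists, reuses the 0.85 similarity threshold from matching as the cutoff (an assumption), and uses hypothetical names (`cosine`, `resolve_concept_id`).

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def resolve_concept_id(
    query_vec: list[float],
    concepts: dict[str, list[float]],
    threshold: float = 0.85,
) -> str:
    """Return the single concept ID matching the query, or raise."""
    hits = [cid for cid, vec in concepts.items() if cosine(query_vec, vec) >= threshold]
    if not hits:
        raise LookupError("no concept matches the query")
    if len(hits) > 1:
        # Ambiguity is an error: the caller must supply an explicit ID
        raise LookupError(f"ambiguous query, candidates: {sorted(hits)}")
    return hits[0]
```

Failing on ambiguity rather than picking the top hit keeps an agent from silently wiring an edge to the wrong concept.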
Source Nodes for Manual Concepts
Manual concepts don't have source documents. Options:
Option A: Synthetic Source (Recommended)
Create a placeholder source node for provenance:
(:Source {
source_id: "manual_{user_id}_{timestamp}",
document: "manual_entry",
full_text: "{description}",
content_type: "manual"
})
Option B: Optional Source
Allow concepts without source links. Query patterns must handle NULL.
Option C: Virtual Source
Single shared "manual entries" source per ontology.
Recommendation: Option A - maintains graph integrity, enables evidence attachment later.
Edge Properties for Manual Creation
Manual edges get same properties as auto edges:
[:IMPLIES {
confidence: 0.85, // User-provided or default 1.0
category: "logical_truth", // From vocabulary
source: "manual_api", // Distinguishes from llm_extraction
created_by: "user_123",
created_at: "2026-01-25T...",
job_id: null, // No job for manual
document_id: null // No document for manual
}]
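The property set above could be assembled as in the sketch below. `manual_edge_properties` is a hypothetical helper; the default confidence of 1.0 comes from the comment in the edge example, while the 0.0 to 1.0 range check is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def manual_edge_properties(
    category: str,
    user_id: str,
    confidence: Optional[float] = None,
    source: str = "manual_api",
) -> dict:
    """Build the property map for a manually created edge."""
    value = 1.0 if confidence is None else confidence  # default 1.0 when omitted
    if not 0.0 <= value <= 1.0:  # assumed valid range
        raise ValueError("confidence must be between 0.0 and 1.0")
    return {
        "confidence": value,
        "category": category,  # looked up from the vocabulary
        "source": source,      # distinguishes manual edges from llm_extraction
        "created_by": user_id,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "job_id": None,        # no ingestion job for manual edges
        "document_id": None,   # no source document for manual edges
    }
```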
Roles and Permissions
Graph editing requires explicit authorization. Casual users should not modify the knowledge graph.
Role Hierarchy
| Role | Capabilities |
|---|---|
| viewer | Read-only: search, query, browse concepts |
| contributor | Create concepts/edges (own ontologies only) |
| graph_editor | Full CRUD on concepts/edges across ontologies |
| ontology_admin | Manage specific ontologies + contributor rights |
| admin | Full system access including user management |
OAuth Scopes
kg:read - Query and search (all authenticated users)
kg:write - Create concepts/edges (contributor+)
kg:edit - Update/delete concepts/edges (graph_editor+)
kg:import - Import foreign graphs (graph_editor+)
kg:ontology - Create/delete ontologies (ontology_admin+)
kg:admin - User management, system config (admin)
MCP Server Authorization
MCP tools MUST respect the OAuth token's scopes:
// MCP tool registration includes required scope
mcp__kg__create_concept: {
required_scope: "kg:write",
// ...
}
mcp__kg__update_concept: {
required_scope: "kg:edit",
// ...
}
mcp__kg__delete_concept: {
required_scope: "kg:edit",
// ...
}
Enforcement:
- MCP server validates token scopes before tool execution
- Insufficient scope → tool returns permission error
- Agents cannot escalate beyond their token's permissions
- Audit log captures attempted unauthorized operations
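The enforcement rules can be sketched as a pre-execution check. The tool-to-scope table mirrors the registrations shown above; `authorize_tool`, the in-memory `audit_log`, and the error payload shape are hypothetical. Scopes are treated as flat here (holding `kg:edit` does not imply `kg:write`), and unregistered tools are denied, both of which are assumptions.

```python
REQUIRED_SCOPE = {
    "mcp__kg__create_concept": "kg:write",
    "mcp__kg__update_concept": "kg:edit",
    "mcp__kg__delete_concept": "kg:edit",
}

audit_log: list[dict] = []  # stand-in for a real audit sink


def authorize_tool(tool: str, token_scopes: set[str]) -> dict:
    """Check a token's scopes before a tool runs; log denied attempts."""
    required = REQUIRED_SCOPE.get(tool)
    # Assumption: tools without a registered scope are denied by default
    if required is not None and required in token_scopes:
        return {"ok": True}

    audit_log.append({
        "tool": tool,
        "required_scope": required,
        "token_scopes": sorted(token_scopes),
    })
    return {"ok": False, "error": "permission_denied", "required_scope": required}
```

Running the check before the tool body executes means an agent holding only `kg:read` receives a permission error and leaves an audit entry, without any graph mutation being attempted.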
API Permission Checks
All mutation endpoints verify permissions:
@router.post("/concepts")
async def create_concept(
    request: CreateConceptRequest,
    user: User = Depends(require_scope("kg:write"))
):
    # User has kg:write scope, proceed
    ...

@router.patch("/concepts/{concept_id}")
async def update_concept(
    concept_id: str,
    request: UpdateConceptRequest,
    user: User = Depends(require_scope("kg:edit"))
):
    # Also check ontology-level permissions if user is contributor
    if user.role == "contributor":
        concept = get_concept(concept_id)
        if concept.ontology not in user.allowed_ontologies:
            raise PermissionDenied("Cannot edit concepts in this ontology")
    ...
FUSE Permission Mapping
FUSE filesystem reflects user permissions:
- Files appear read-only if user lacks kg:edit scope
- Directories hidden if user lacks access to ontology
- Write operations fail with EACCES if unauthorized
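One way the permission mapping might be expressed is shown below. `file_mode` and `check_write` are hypothetical helpers, and the specific mode bits (0o444 read-only, plus owner write for editors) are assumptions; only the `kg:edit` requirement and the `EACCES` failure come from the list above.

```python
import errno
import stat


def file_mode(scopes: set[str]) -> int:
    """Mode bits for a concept file, based on the user's OAuth scopes."""
    mode = stat.S_IFREG | 0o444  # regular file, readable by all
    if "kg:edit" in scopes:
        mode |= 0o200            # owner write only for editors
    return mode


def check_write(scopes: set[str]) -> None:
    """Raise the error a FUSE write handler would surface as EACCES."""
    if "kg:edit" not in scopes:
        raise PermissionError(errno.EACCES, "kg:edit scope required")
```

Reporting read-only mode bits up front lets editors and file managers show the file as locked before a write is attempted, while `check_write` still guards the actual write path.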
Embedding Generation
Manual concepts MUST have embeddings for:
- Similarity matching (when matching_mode: "auto")
- Future vector searches
- Grounding calculations
Flow:
Label + Description + Search Terms
↓
Unified Embedding Worker
↓
Same model as ingest (nomic-embed-text-v1.5 or configured)
↓
Stored on Concept node
Compatibility with Existing Features
| Feature | Manual Nodes | Notes |
|---|---|---|
| Vector search | ✓ | Same embeddings |
| Grounding calc | ✓ | Same edge properties |
| Pruning | ✓ | Same confidence/source properties |
| Backup/export | ✓ | Same node structure |
| Polarity analysis | ✓ | Same relationships |
| Query definitions | ✓ | Cypher works identically |
Migration
No schema migration needed. The creation_method property is optional and added only to new manually created nodes. Existing nodes without the property are implicitly treated as creation_method: "llm_extraction".
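Readers of the property therefore need a default. A minimal sketch, assuming nodes are available as plain dicts and using a hypothetical `creation_method_of` helper:

```python
from collections import Counter


def creation_method_of(node: dict) -> str:
    """Return a node's creation_method, defaulting for pre-existing nodes."""
    # Nodes created before this ADR carry no creation_method property
    return node.get("creation_method") or "llm_extraction"


def count_by_creation_method(nodes: list[dict]) -> Counter:
    """Tally nodes per creation method, e.g. for a curation dashboard."""
    return Counter(creation_method_of(node) for node in nodes)
```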
Consequences
Positive
- Users can curate knowledge directly via web workstation
- Agents can build structured knowledge via MCP
- Bulk import possible without LLM costs
- Independent subgraphs for specialized use cases
- Full compatibility with existing features
- Foreign graph import expands knowledge sources beyond documents
- Leverage existing knowledge graphs from other systems (Neo4j, RDF stores)
- Enable knowledge federation and migration scenarios
- Iterative enrichment - ingest → research → enrich cycle produces higher quality graphs
- LLMs querying enriched graphs can synthesize better reasoning than raw extraction alone
- Human expert knowledge can augment automated extraction
- Free thinking for all - humans and agents emit concepts without graph topology knowledge
- Graph becomes a self-organizing thought accumulator for any contributor
- Lowers barrier to knowledge contribution - describe concepts naturally, graph handles wiring
Negative
- More ways to create inconsistent data (user error)
- Need input validation for manual entries
- Potential for orphaned nodes if not careful
- Documentation complexity increases
- Permission system adds complexity to all mutation paths
- Must audit MCP tools for scope enforcement
Risks
- Users creating low-quality concepts (garbage in)
- Duplicate concepts if matching disabled carelessly
- Relationship type explosion if not normalized
- Foreign graph imports may have incompatible semantics
- Large imports could overwhelm embedding generation
Related ADRs
- ADR-032: Relationship vocabulary normalization
- ADR-044: Grounding strength calculation
- ADR-045: Unified embedding generation
- ADR-048: Query safety patterns
- ADR-051: Provenance tracking
- ADR-065: Epistemic status classification
Implementation Notes
Phase 1: Core API + Permissions
- POST /concepts endpoint
- POST /edges endpoint
- PATCH /concepts, PATCH /edges endpoints
- DELETE /concepts, DELETE /edges endpoints
- Embedding generation integration
- Basic validation
- OAuth scope enforcement (kg:read, kg:write, kg:edit)
- graph_editor role definition
Phase 2: MCP Tools
- create_concept tool
- create_edge tool
- update_concept, delete_concept tools
- Semantic ID resolution
- MCP scope validation for all mutation tools
Phase 3: Batch & UI
- Batch creation endpoint
- Web workstation curation UI
- LLM-assisted concept drafting
- Import from CSV/JSON
Phase 4: Foreign Graph Import
- JSON Graph Format importer
- Neo4j export importer
- Format detection and validation
- Mapping configuration UI
- kg:import scope enforcement
Phase 5: FUSE Filesystem
- Mount concepts as JSON files
- Permission-aware file visibility
- Edit-via-save workflow
- Read-only mode for viewers
Phase 6: Advanced
- RDF/JSON-LD importer
- GraphML importer
- Subgraph templates
- Concept merging
- Edge bulk operations
- Import rollback capability
- Ontology-level permission delegation