GraphProgram DSL

A GraphProgram is a JSON AST that describes a bounded sequence of set-algebraic operations against the Kappa Graph knowledge base. Every authoring surface — web UI, CLI, MCP tools, and AI agents — compiles to the same AST and submits it to the same API executor.

Specification: ADR-500 — Graph Program DSL and AST Architecture

Why a shared AST

Before ADR-500, each client maintained its own execution loop. The web UI replayed explorations statement-by-statement through useQueryReplay.ts, calling POST /query/cypher once per statement and merging results client-side. The CLI had no equivalent. Agents had to drive the browser.

That arrangement had three structural problems. First, the operators beyond union (+) and difference (-), namely intersection, optional, and assert, had nowhere to live: the web client implemented only union and difference, and the CLI and agents implemented none. Second, smart blocks (vector search, source search, epistemic status) are internal function calls, not Cypher, so a client-side loop cannot invoke them without additional HTTP round-trips. Third, with execution spread across clients, auditing and resource limits could not be enforced uniformly.

The DSL resolves this by moving execution into the API. Programs are authored anywhere and executed once, server-side.

Two graphs: H and W

Execution operates over two graphs.

H is the persistent, read-only knowledge graph stored in Apache AGE / PostgreSQL 18. It is the source of truth for concepts, relationships, evidence, and sources. All reads come from H. Programs cannot write to H.

W is the ephemeral working subgraph built during execution. It starts empty, grows and shrinks as statements apply their operators, and is returned as the final result. W exists only for the duration of a single POST /programs/execute call.

The program shape

Every program has the same structure regardless of origin:

{
  "version": 1,
  "metadata": { "name": "...", "description": "...", "author": "human" },
  "params": [{ "name": "concept_name", "type": "string", "default": "governance" }],
  "statements": [
    {
      "op": "+",
      "operation": { "type": "cypher", "query": "MATCH (c:Concept) WHERE c.label CONTAINS $concept_name RETURN c LIMIT 50" },
      "label": "Seed concepts"
    }
  ]
}
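
The params array declares named values that a query references as $name. As a minimal sketch of how declared defaults and caller-supplied values could be combined before execution, the hypothetical helper below (resolve_params is not part of the documented API, and the executor's actual binding rules may differ) merges the two and rejects anything undeclared or unresolved:

```python
from typing import Any, Dict, Optional

# Trimmed copy of the program shape shown above.
PROGRAM = {
    "version": 1,
    "params": [{"name": "concept_name", "type": "string", "default": "governance"}],
    "statements": [],
}


def resolve_params(program: dict, supplied: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge caller-supplied values over declared defaults (hypothetical helper)."""
    supplied = supplied or {}
    declared = {p["name"]: p for p in program.get("params", [])}

    # Assumption: values for parameters the program never declared are rejected.
    unknown = sorted(set(supplied) - set(declared))
    if unknown:
        raise KeyError(f"undeclared parameters: {unknown}")

    resolved: Dict[str, Any] = {}
    for name, spec in declared.items():
        if name in supplied:
            resolved[name] = supplied[name]
        elif "default" in spec:
            resolved[name] = spec["default"]
        else:
            # Assumption: a parameter with neither a value nor a default is an error.
            raise KeyError(f"parameter {name!r} has no value and no default")
    return resolved
```

Calling resolve_params(PROGRAM) falls back to the declared default, while resolve_params(PROGRAM, {"concept_name": "ethics"}) overrides it.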

The five set-algebra operators determine how each statement's result is merged into W:

Operator	Effect on W
`+`	Union — merge results, deduplicate by `concept_id`
`-`	Difference — remove matching nodes and their dangling edges
`&`	Intersection — keep only W nodes that appear in results
`?`	Optional union — same as `+`, but empty results are not an error
`!`	Assert — same as `+`, but empty results abort the program
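
The operator table can be read as a small state machine over W. The sketch below is an illustration under stated assumptions, not the executor's code: WorkingGraph, AssertionFailed, and apply_statement are hypothetical names, edges are modeled as (from_id, type, to_id) tuples, and an empty result under `+` is treated as a no-op because the table does not spell that case out.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Edge = Tuple[str, str, str]  # (from concept_id, relationship type, to concept_id); assumed shape


@dataclass
class WorkingGraph:
    nodes: Dict[str, dict] = field(default_factory=dict)  # keyed by concept_id
    edges: Set[Edge] = field(default_factory=set)


class AssertionFailed(Exception):
    """Raised when a `!` statement returns no nodes."""


def _drop_dangling_edges(w: WorkingGraph) -> None:
    # Keep only edges whose two endpoints are still present in W.
    w.edges = {e for e in w.edges if e[0] in w.nodes and e[2] in w.nodes}


def apply_statement(w: WorkingGraph, op: str, result: WorkingGraph) -> None:
    """Merge one statement's result into W according to its operator."""
    if op in ("+", "?", "!"):
        if op == "!" and not result.nodes:
            raise AssertionFailed("assert statement returned no results")
        for concept_id, node in result.nodes.items():
            w.nodes.setdefault(concept_id, node)  # deduplicate by concept_id
        w.edges |= result.edges
    elif op == "-":
        for concept_id in result.nodes:
            w.nodes.pop(concept_id, None)
        _drop_dangling_edges(w)
    elif op == "&":
        w.nodes = {cid: n for cid, n in w.nodes.items() if cid in result.nodes}
        _drop_dangling_edges(w)
    else:
        raise ValueError(f"unknown operator: {op!r}")
```

Keeping the first copy of a node on union is one possible reading of "deduplicate by concept_id"; an implementation could equally prefer the newest copy.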

Three operation types can appear in statements: CypherOp (a read-only openCypher query against H), ApiOp (an internal call to an allowlisted API endpoint such as /search/concepts), and ConditionalOp (branching on the current state of W, with then and optional else branches).

How authoring surfaces compile to the AST

Web UI

The web UI has three authoring surfaces, all producing the same AST.

Smart Search records exploration steps as the user selects concepts, expands neighborhoods, and removes nodes. Each action is an ExplorationStep with an op (+ or -) and a Cypher string generated by stepToCypher() in web/src/utils/cypherGenerator.ts. The session serializes to a GraphProgram by wrapping each step as a CypherOp statement.

Block Builder compiles a visual block graph to Cypher via compileBlocksToOpenCypher() in web/src/lib/blockCompiler.ts. It returns a single CompiledQuery (a Cypher string with errors and warnings arrays). ADR-500 specifies a future output of Statement[] with BlockAnnotation metadata that would allow the stored AST to reconstruct the visual layout — that is not implemented yet.

Cypher Editor accepts multi-statement scripts with +/- operator prefixes. The parser (parseCypherStatements() in web/src/utils/cypherGenerator.ts) extracts { op, cypher }[] pairs; each becomes a Statement with a CypherOp.

The web UI is the only client that can author via visual blocks.

CLI

The CLI constructs GraphProgram JSON directly — from a file on disk, from command-line flags, or from piped input — and submits it to the API. The kg query-def commands handle CRUD on stored definitions; kg program run submits an inline or file-resident program for execution.

MCP tools

An MCP client (AI agent or human-driven MCP host) builds a program incrementally during an exploration session:

1. Use search to find seed concepts.
2. Use concept with action: "related" or action: "connect" to expand.
3. Accumulate statements into a GraphProgram JSON object.
4. Submit via POST /programs/execute, or store first with POST /programs and execute by ID.
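
The accumulation step in this workflow can be sketched as a small builder. ProgramBuilder is a hypothetical helper, the "agent" author value and the second Cypher query are illustrative assumptions, and the sketch stops at producing the program JSON because the request envelope for POST /programs/execute is not described here.

```python
import json
from typing import Dict, List

OPERATORS = {"+", "-", "&", "?", "!"}


class ProgramBuilder:
    """Accumulates statements into a GraphProgram dict (hypothetical helper)."""

    def __init__(self, name: str, description: str = "", author: str = "agent") -> None:
        # "agent" is an assumed author value; the documented example uses "human".
        self.metadata = {"name": name, "description": description, "author": author}
        self.statements: List[Dict] = []

    def add_cypher(self, op: str, query: str, label: str = "") -> "ProgramBuilder":
        if op not in OPERATORS:
            raise ValueError(f"unknown operator: {op!r}")
        self.statements.append(
            {"op": op, "operation": {"type": "cypher", "query": query}, "label": label}
        )
        return self  # allow chaining, one call per exploration step

    def build(self) -> Dict:
        return {
            "version": 1,
            "metadata": dict(self.metadata),
            "statements": list(self.statements),
        }


# One statement per exploration step: seed first, then a neighborhood expansion.
program = (
    ProgramBuilder("governance-neighborhood")
    .add_cypher(
        "+",
        "MATCH (c:Concept) WHERE c.label CONTAINS 'governance' RETURN c LIMIT 50",
        "Seed concepts",
    )
    .add_cypher(
        "+",
        "MATCH (c:Concept)-[r]-(n:Concept) WHERE c.label CONTAINS 'governance' "
        "RETURN c, r, n LIMIT 100",
        "Expand neighbors",
    )
    .build()
)

body = json.dumps(program)  # the program JSON to submit or store
```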

MCP clients can also load and re-execute programs authored by other clients.

Agents

AI agents author programs through MCP tools or direct API calls. The typical agent workflow builds a linear program from search results and neighborhood expansions, optionally adds ConditionalOp branches, binds parameters at execution time, and submits to POST /programs/execute. Agents execute headless — no browser required.

Client capabilities at a glance

Client	Authors	Validates	Stores	Retrieves	Executes
Web UI	Yes (3 surfaces, including visual blocks)	Via API	Via API	Via API	Server-side (target); client-side during transition
CLI	Yes (JSON / flags / pipe)	Via API	Via API	Via API	Server-side or client-side
MCP	Yes (tool calls)	Via API	Via API	Via API	Server-side or client-side
FUSE	No	—	—	Via API	Triggers execution by stored program ID
Agents	Yes (MCP / direct API)	Via API	Via API	Via API	Server-side

FUSE is read-only. It can trigger execution of a stored program by mounting a program ID as a virtual file, but it does not author or modify programs.

Validation and notarization

The API is the sole validation authority. Clients may perform local pre-validation for fast feedback (syntax highlighting, error squiggles), but that is advisory. A program that passes client-side checks can still be rejected server-side if, for example, Cypher safety rules have changed.

POST /programs/validate is a dry-run endpoint — it returns a ValidationResult and stores nothing:

{
  "valid": false,
  "errors": [
    {
      "rule_id": "V030",
      "severity": "error",
      "statement": 3,
      "field": "operation.query",
      "message": "Variable-length path depth 10 exceeds maximum 6: [*1..10]"
    }
  ],
  "warnings": []
}
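
A client that surfaces these results to a user might flatten them into one line per issue. The following is a minimal sketch that relies only on the fields visible in the example above; format_issues is a hypothetical helper, and fields are read defensively because not every rule necessarily reports a statement or field.

```python
from typing import Dict, List


def format_issues(result: Dict) -> List[str]:
    """Render a ValidationResult as one human-readable line per issue."""
    lines = []
    for kind in ("errors", "warnings"):
        for issue in result.get(kind, []):
            location = []
            if issue.get("statement") is not None:
                location.append(f"statement {issue['statement']}")
            if issue.get("field"):
                location.append(issue["field"])
            where = f" ({', '.join(location)})" if location else ""
            lines.append(
                f"[{issue.get('rule_id', '?')}] {issue.get('severity', kind)}{where}: "
                f"{issue.get('message', '')}"
            )
    return lines


# The ValidationResult from the example above.
example = {
    "valid": False,
    "errors": [
        {
            "rule_id": "V030",
            "severity": "error",
            "statement": 3,
            "field": "operation.query",
            "message": "Variable-length path depth 10 exceeds maximum 6: [*1..10]",
        }
    ],
    "warnings": [],
}

for line in format_issues(example):
    print(line)
```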

POST /programs validates and stores in one atomic operation — notarization. It returns 201 with the ProgramCreateResponse on success, or 400 with the structured ValidationResult on failure. Nothing is stored on failure.

Validation checks structural soundness (required fields, valid operators and operation types), boundedness (total operation count ≤ 100, computed statically and including conditional branches), Cypher safety (no write keywords, no MATCH without a LIMIT, variable-length path depth ≤ 6), parameter resolution, API endpoint allowlist membership, and conditional nesting depth (≤ 3). The authoritative rule catalog with individual rule IDs is in the GraphProgram Validation reference.
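
As an illustration of the kind of advisory pre-check a client could run locally, the sketch below flags variable-length paths deeper than 6. It is a regex heuristic, not a Cypher parser and not the server's rule implementation: path_depth_violations is a hypothetical name, string literals or list expressions containing `*` can confuse it, and treating an unbounded path as a violation is an assumption.

```python
import re
from typing import List

MAX_PATH_DEPTH = 6  # limit stated in the validation summary

# Matches a bracketed relationship pattern containing a variable-length marker,
# for example [*1..10], [r:REL*..3], [*4], or [*].
_VAR_LENGTH = re.compile(r"\[[^\]*]*\*\s*(\d*)\s*(\.\.)?\s*(\d*)[^\]]*\]")


def path_depth_violations(query: str, max_depth: int = MAX_PATH_DEPTH) -> List[str]:
    """Return one message per variable-length path that exceeds max_depth."""
    problems = []
    for match in _VAR_LENGTH.finditer(query):
        lower, dots, upper = match.groups()
        if dots:
            # Range form: the upper bound is whatever follows "..", if anything.
            bound = int(upper) if upper else None
        else:
            # Fixed-length form [*n], or bare [*] with no bound at all.
            bound = int(lower) if lower else None
        if bound is None:
            problems.append(f"Unbounded variable-length path: {match.group(0)}")
        elif bound > max_depth:
            problems.append(
                f"Variable-length path depth {bound} exceeds maximum {max_depth}: "
                f"{match.group(0)}"
            )
    return problems
```

A result from this check is only a hint for fast feedback; the server's verdict from POST /programs/validate remains authoritative.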

Storage

Programs are stored in kg_api.query_definitions under definition_type = 'program'. The definition column holds the complete GraphProgram AST as JSONB. Each definition has an owner_id; users see their own definitions and system-owned ones (owner_id IS NULL). Admins see all.
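
The visibility rule can be stated as a predicate. This is a sketch only: can_see and visible_definitions are hypothetical names, integer owner IDs are an assumption, and the real check presumably happens in the database query, not in application code.

```python
from typing import Dict, Iterable, List


def can_see(definition: Dict, user_id: int, is_admin: bool = False) -> bool:
    """Apply the visibility rule: own rows, system-owned rows, or everything for admins."""
    if is_admin:
        return True
    owner = definition.get("owner_id")
    return owner is None or owner == user_id  # None models owner_id IS NULL


def visible_definitions(rows: Iterable[Dict], user_id: int, is_admin: bool = False) -> List[Dict]:
    return [row for row in rows if can_see(row, user_id, is_admin)]


rows = [
    {"id": 1, "owner_id": 7, "definition_type": "program"},
    {"id": 2, "owner_id": None, "definition_type": "program"},  # system-owned
    {"id": 3, "owner_id": 9, "definition_type": "program"},
]
```

With these sample rows, user 7 sees definitions 1 and 2, and an admin sees all three.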

The /programs REST surface validates on create and re-validates on read (GET /programs/{id}). The generic /query-definitions surface is available for compatibility with other definition types but does not validate the AST on create.

Programs subsume the older exploration type. An exploration is a program with only CypherOp statements and only + and - operators. The conversion is mechanical: add version: 1, wrap each { op, cypher } as { op, operation: { type: 'cypher', query: cypher } }, and change definition_type to 'program'. Both formats can coexist during transition — the replay hook detects the format and dispatches accordingly.
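
The conversion described above is small enough to sketch in full. The function name exploration_to_program and the assumption that an exploration keeps its steps under a "statements" key are illustrative; only the { op, cypher } to CypherOp wrapping and the version field come from the text.

```python
from typing import Dict, List


def exploration_to_program(exploration: Dict) -> Dict:
    """Wrap each { op, cypher } step as a CypherOp statement in a version 1 program."""
    statements: List[Dict] = []
    # Assumption: the steps live under a "statements" key in the stored exploration.
    for step in exploration.get("statements", []):
        if step["op"] not in ("+", "-"):
            # Explorations only ever use union and difference.
            raise ValueError(f"unexpected operator in exploration: {step['op']!r}")
        statements.append(
            {"op": step["op"], "operation": {"type": "cypher", "query": step["cypher"]}}
        )
    return {"version": 1, "statements": statements}


exploration = {
    "statements": [
        {"op": "+", "cypher": "MATCH (c:Concept) WHERE c.label CONTAINS 'governance' RETURN c LIMIT 50"},
        {"op": "-", "cypher": "MATCH (c:Concept) WHERE c.label CONTAINS 'corporate' RETURN c LIMIT 50"},
    ]
}

program = exploration_to_program(exploration)
# The stored row's definition_type would then change to 'program', as described above.
```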

Why execution is server-side

Server-side execution is what enables the full operator set, smart blocks, and headless agents to share a single code path.

ApiOp statements (vector search, source search, epistemic status lookups) are implemented as internal function calls, not HTTP round-trips. A 10-statement program with 4 smart blocks is a single HTTP request. A client-side loop cannot match that: it needs at least one request per statement, and each smart block becomes a separate API call.

W lives on the server during execution. Only the final W — RawNode[] and RawLink[] — crosses the wire. Intermediate states do not.

Central execution enables uniform auditing, rate limiting, and resource control. The client-side loop currently used by the web UI's exploration replay will be deprecated once the server-side executor is stable.