ADR-066: Published Query Endpoints

Overview

Imagine you've spent hours using the Visual Block Builder to craft the perfect query that extracts exactly the knowledge you need from your graph—maybe it's all the concepts related to machine learning with high grounding strength, formatted as a clean dataset. Right now, every time you need that data, you have to log into the web interface, load your query flow, click "Run Query," and manually export the results. If you want to use this data in an automated pipeline or let another application access it, you're out of luck.

This ADR introduces Published Query Endpoints, which transform your carefully crafted query flows into reusable REST API endpoints. Once you publish a query flow, external applications can execute it programmatically using OAuth credentials, receiving the results as JSON or CSV without any manual intervention. This is similar to how you might save a complex SQL query as a stored procedure or view in a traditional database, except these flows can include not just graph traversals but also semantic search, enrichment operations, and custom filters.

The key innovation is that these aren't just raw graph queries—they're curated data pipelines that encapsulate your domain expertise about what knowledge matters and how it should be filtered and formatted. You control who can access each published endpoint, making it possible to share specific views of your knowledge graph with different teams or applications while keeping the underlying data secure.

Context

Current State: Interactive-Only Query Execution

The Visual Block Builder creates query flows that can only be executed interactively within the web UI. Users must:

Build query in Block Builder
Execute via "Run Query" button
View results in graph visualization
Repeat for each query invocation

This works well for exploration and analysis, but limits the platform's utility for:

Automated pipelines - CI/CD systems needing knowledge graph data
External applications - Third-party tools integrating with the knowledge graph
Scheduled queries - Periodic data extraction for reporting
Multi-tenant access - Different applications accessing shared knowledge

The Opportunity

Users invest significant effort designing query flows that extract valuable subsets of their knowledge graph. These flows should be reusable as programmatic endpoints, without requiring the creator to be logged in or to run them manually.

Decision

Published Query Endpoints

Introduce the concept of Published Query Endpoints - saved query flows that become accessible via REST API using OAuth 2.0 client credentials.

Architecture Overview

┌─────────────────┐      ┌──────────────────┐      ┌─────────────────┐
│  Block Builder  │─────▶│  Query Registry  │◀─────│   REST API      │
│  (Create Flow)  │      │  (Store Flows)   │      │  (Execute Flow) │
└─────────────────┘      └──────────────────┘      └─────────────────┘
                                                            │
                                                            ▼
                                              ┌─────────────────────────┐
                                              │   External Consumers    │
                                              │  (OAuth client creds)   │
                                              └─────────────────────────┘

Flow Lifecycle

Create: User builds query flow in Block Builder
Publish: User marks flow as "Published" (Start block execution mode)
Register: System generates unique Flow ID and registers endpoint
Configure: User sets output format (End block: JSON, CSV, or Graph data)
Authorize: User grants access to specific OAuth clients
Execute: External systems call endpoint with client credentials + flow ID

Beyond Pure openCypher: Smart Block Operations

Query flows are more than graph traversals. The Block Builder compiles to annotated openCypher - valid Cypher with embedded markers that trigger additional operations:

Smart Blocks (non-Cypher operations):

Vector Search - Semantic similarity via embedding API
Epistemic Filter - Filter by vocabulary epistemic status
Enrich - Fetch concept details (grounding, ontology, search terms)

Cypher Blocks (pure graph operations):

Neighborhood - Graph traversal
Filter - WHERE clauses
Limit - Result constraints

This is conceptually similar to Neo4j's Cypher extensions or stored procedures - the execution engine interprets the annotated query and orchestrates calls to various services (embedding API, concept details API) alongside the graph database.

Implication for Published Endpoints: The execution engine must be an internal worker that can:

1. Parse annotated openCypher
2. Execute Cypher portions against Apache AGE
3. Invoke smart block services (vector search, enrichment)
4. Compose final results

This makes published flows more powerful than raw Cypher endpoints - they're curated data pipelines that encapsulate complex multi-service operations behind a simple API call.
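The parsing step of the execution engine could look roughly like the sketch below. This ADR does not specify the marker syntax, so the `// @smart <block> <JSON params>` line-comment form, the `Step` type, and `parseAnnotatedCypher` are all hypothetical. A line comment is used here only because it keeps the annotated text valid Cypher. Executing the steps against Apache AGE and the smart block services is left out.

```typescript
// Hypothetical marker syntax (an assumption, not defined by this ADR):
//   // @smart <blockName> <optional JSON params>
type Step =
  | { kind: 'cypher'; text: string }
  | { kind: 'smart'; block: string; params: Record<string, unknown> };

const MARKER = /^\s*\/\/\s*@smart\s+(\w+)\s*(\{.*\})?\s*$/;

function parseAnnotatedCypher(source: string): Step[] {
  const steps: Step[] = [];
  let buffer: string[] = [];

  // Emit any accumulated plain-Cypher lines as a single step.
  const flush = (): void => {
    const text = buffer.join('\n').trim();
    if (text.length > 0) steps.push({ kind: 'cypher', text });
    buffer = [];
  };

  for (const line of source.split('\n')) {
    const match = MARKER.exec(line);
    if (match) {
      flush();
      const [, block = '', rawParams] = match;
      const params = rawParams ? (JSON.parse(rawParams) as Record<string, unknown>) : {};
      steps.push({ kind: 'smart', block, params });
    } else {
      buffer.push(line);
    }
  }
  flush();
  return steps;
}

// Example input: a vector search, a neighborhood traversal, then enrichment.
const annotated = [
  '// @smart vector_search {"query": "machine learning", "limit": 10}',
  'MATCH (c:Concept)-[r]-(n:Concept)',
  'WHERE c.id IN $seed_ids',
  'RETURN c, r, n LIMIT 50',
  '// @smart enrich',
].join('\n');

const steps = parseAnnotatedCypher(annotated);
```

Under these assumptions the engine receives an ordered list of steps and can route each `smart` step to the matching service while sending `cypher` steps to the graph database.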

Start Block: Execution Mode

interface StartBlockParams {
  executionMode: 'interactive' | 'published';
  flowName?: string;  // Human-readable name
  // Future: flowSlug, description, tags
}

Interactive (default): Execute in UI, results render to graph
Published: Register as API endpoint, callable externally

End Block: Output Format

interface EndBlockParams {
  outputFormat: 'visualization' | 'json' | 'csv';
  // Future: pagination, field selection, transformations
}

Visualization (default): Render to graph UI
JSON: Return structured node/edge data
CSV: Return flattened tabular data
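One possible way to flatten node data into CSV is sketched below. The `FlowNode` shape and the `nodesToCsv` helper are assumptions for illustration, since this ADR does not fix the row layout. Columns are the union of property keys across all nodes, and fields are quoted following RFC 4180.

```typescript
// Illustrative node shape (an assumption); the ADR only names "nodes" and "edges".
interface FlowNode {
  id: string;
  label: string;
  properties: Record<string, string | number | boolean | null>;
}

type Cell = string | number | boolean | null | undefined;

// Quote a field when it contains a comma, a double quote, or a line break.
function csvField(value: Cell): string {
  if (value === null || value === undefined) return '';
  const text = String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

function nodesToCsv(nodes: FlowNode[]): string {
  // Collect the union of property keys, sorted for a stable column order.
  const keySet = new Set<string>();
  for (const node of nodes) {
    for (const key of Object.keys(node.properties)) keySet.add(key);
  }
  const keys = Array.from(keySet).sort();

  const header: Cell[] = ['id', 'label', ...keys];
  const rows: Cell[][] = nodes.map((node) => {
    const cells: Cell[] = [node.id, node.label];
    for (const key of keys) cells.push(node.properties[key]);
    return cells;
  });

  const table: Cell[][] = [header, ...rows];
  return table.map((row) => row.map((cell) => csvField(cell)).join(',')).join('\r\n');
}

const csv = nodesToCsv([
  { id: 'c1', label: 'Concept', properties: { name: 'Machine learning', grounding: 0.82 } },
  { id: 'c2', label: 'Concept', properties: { name: 'Say "hi", world' } },
]);
```

Missing properties become empty cells, so every row has the same number of columns regardless of which properties a given node carries.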

API Design

POST /api/v1/flows/{flow_id}/execute
Authorization: Bearer <access_token>
Content-Type: application/json

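{
  "parameters": {}
}

The parameters object carries flow-specific parameters, if any. Whether published flows accept runtime parameters is still undecided (see Open Questions).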

Response varies by output format:

JSON: { "nodes": [...], "edges": [...], "metadata": {...} }
CSV: text/csv with appropriate headers
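From a consumer's point of view, calling a published flow might look like the sketch below. The `buildExecuteRequest` helper and the `https://kg.example.com` host are hypothetical. Only the path, method, and headers come from the request shown above.

```typescript
interface ExecuteRequest {
  url: string;
  method: 'POST';
  headers: Record<string, string>;
  body: string;
}

// Build the HTTP request for POST /api/v1/flows/{flow_id}/execute.
function buildExecuteRequest(
  baseUrl: string,
  flowId: string,
  accessToken: string,
  parameters: Record<string, unknown> = {},
): ExecuteRequest {
  const root = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
  return {
    url: `${root}/api/v1/flows/${encodeURIComponent(flowId)}/execute`,
    method: 'POST',
    headers: {
      Authorization: `Bearer ${accessToken}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ parameters }),
  };
}

// Placeholder host, flow ID, and token for illustration only.
const request = buildExecuteRequest(
  'https://kg.example.com/',
  '0b7c1c1e-5f0a-4a53-9a55-2f1d7c3e9a10',
  'example-access-token',
);
```

The resulting object can be handed to any HTTP client, for example `fetch(request.url, request)` in a runtime that provides `fetch`.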

Authentication Model

Uses existing OAuth 2.0 infrastructure (ADR-031):

Client Registration: External applications register as OAuth clients
Client Credentials Grant: client_id + client_secret → access_token
Flow Authorization: Access tokens are scoped to specific published flows
No User Session: Machine-to-machine, not user-interactive
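How a token is tied to a flow is not defined in this ADR. One option is to encode the grant in the token's scope, as in the sketch below. The `flow:<flow_id>:execute` scope name and both helpers are assumptions. The space-delimited, case-sensitive scope format follows RFC 6749 Section 3.3.

```typescript
// Hypothetical scope naming convention (not defined by this ADR).
function flowScope(flowId: string): string {
  return `flow:${flowId}:execute`;
}

// Return true when the token's scope string grants execution of the given flow.
function tokenAllowsFlow(scope: string, flowId: string): boolean {
  // Scope values are space-delimited and compared case-sensitively.
  const granted = scope.split(' ').filter((entry) => entry.length > 0);
  return granted.indexOf(flowScope(flowId)) !== -1;
}

const tokenScope = 'flow:ml-concepts:execute flow:weekly-report:execute';
const allowed = tokenAllowsFlow(tokenScope, 'ml-concepts');
const denied = tokenAllowsFlow(tokenScope, 'internal-audit');
```

An alternative is to keep tokens generic and look up the client/flow pair at execution time, which makes revocation immediate but costs a lookup per request.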

Security Considerations

Flow Ownership: Only flow owner can publish/unpublish
Scoped Access: Clients authorized per-flow, not blanket access
Rate Limiting: Prevent abuse of published endpoints
Audit Logging: Track all external executions
Revocation: Owner can unpublish or revoke client access
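The rate-limiting item could be met with a token bucket per client and flow. This ADR does not choose an algorithm, so the sketch below is one option under assumed limits (a burst of 2 and a refill of 1 request per second). `TokenBucket` and `allowExecution` are illustrative names, and time is passed in explicitly so the behavior is deterministic.

```typescript
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity; // start with a full bucket
    this.lastRefillMs = nowMs;
  }

  // Refill based on elapsed time, then try to spend one token.
  tryConsume(nowMs: number): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefillMs = Math.max(this.lastRefillMs, nowMs);
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }
}

// One bucket per (client, flow) pair, created on first use.
const buckets = new Map<string, TokenBucket>();

function allowExecution(clientId: string, flowId: string, nowMs: number): boolean {
  const key = `${clientId}:${flowId}`;
  let bucket = buckets.get(key);
  if (!bucket) {
    bucket = new TokenBucket(2, 1, nowMs); // assumed limits, for illustration
    buckets.set(key, bucket);
  }
  return bucket.tryConsume(nowMs);
}
```

An in-memory map only works for a single process. A deployment with several API workers would need shared state for the buckets.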

Database Schema (Future)

-- Published query flows
CREATE TABLE published_flows (
  flow_id UUID PRIMARY KEY,
  owner_id UUID REFERENCES users(id),
  name VARCHAR(255) NOT NULL,
  slug VARCHAR(255) UNIQUE,
  description TEXT,
  flow_definition JSONB NOT NULL,  -- Serialized nodes/edges
  output_format VARCHAR(50) DEFAULT 'json',
  is_active BOOLEAN DEFAULT true,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);

-- Client authorization for flows
CREATE TABLE flow_authorizations (
  flow_id UUID REFERENCES published_flows(flow_id),
  client_id UUID REFERENCES oauth_clients(id),
  granted_at TIMESTAMP DEFAULT NOW(),
  granted_by UUID REFERENCES users(id),
  PRIMARY KEY (flow_id, client_id)
);

-- Execution audit log
CREATE TABLE flow_executions (
  execution_id UUID PRIMARY KEY,
  flow_id UUID REFERENCES published_flows(flow_id),
  client_id UUID REFERENCES oauth_clients(id),
  executed_at TIMESTAMP DEFAULT NOW(),
  duration_ms INTEGER,
  result_count INTEGER,
  status VARCHAR(50)
);

Consequences

Positive

Reusability: Query flows become first-class platform resources
Integration: External systems can consume knowledge graph data
Automation: Enables scheduled and triggered query execution
Value extraction: Users can share curated views without sharing raw data
API-first: Moves platform toward headless/API-driven architecture

Negative

Complexity: Adds OAuth scoping, flow registry, execution engine
Security surface: External API access requires careful authorization
Versioning: Published flows may need versioning for breaking changes
Monitoring: Must track performance and usage of published endpoints

Neutral

UI changes: Start/End blocks gain controls (already implemented as placeholders)
Migration path: Existing saved diagrams remain interactive-only until published

Alternatives Considered

1. GraphQL Endpoint

Expose full graph via GraphQL, let consumers write their own queries.

Rejected because:

Exposes entire graph structure to external consumers
No curation - users can't limit what's accessible
Complex query language for non-technical users

2. Webhook Push Model

Flows push results to configured webhooks instead of pull API.

Partially applicable:

Could complement pull API as output option
Useful for event-driven architectures
May add as future output format

3. Export Only (No Live API)

Users export flow results as static files (JSON/CSV) for sharing.

Rejected because:

No live data - results stale immediately
Manual process for updates
Doesn't enable automation

Implementation Phases

Phase 1: UI Placeholders (Current)

Add execution mode to Start block (interactive/published toggle)
Add output format to End block (visualization/json/csv)
Controls are visible but non-functional

Phase 2: Flow Registry

Database schema for published flows
Save/load flows with publication metadata
List published flows in UI

Phase 3: Execution Engine

REST endpoint for flow execution
Query compilation from flow definition
Output format rendering (JSON, CSV)

Phase 4: Authorization

OAuth client registration
Per-flow client authorization
Access token scoping

Phase 5: Operations

Rate limiting
Audit logging
Usage analytics
Flow versioning

Open Questions

Parameterization: Should published flows accept runtime parameters (e.g., search terms)?
Caching: Should results be cached for performance? How to invalidate?
Versioning: How to handle flow updates without breaking consumers?
Pricing: Should published endpoint usage be metered/billed differently?

References

OAuth 2.0 Client Credentials: RFC 6749 Section 4.4
Existing auth infrastructure: ADR-031