ADR-011: CLI and Admin Tooling Separation

Overview

When software grows organically, everything tends to end up in one place. Our knowledge graph system started with a single cli.py file that did everything: searching concepts, backing up databases, restoring data, and managing configurations. This worked at first, but every new feature meant navigating an increasingly complex monolith.

The real problem wasn't just messy code. We had shell scripts duplicating Python logic, no shared libraries for common operations, and backup processes that didn't properly save expensive vector embeddings. If you lost your database, you'd have to re-process all your documents through the AI models again—potentially costing $50-100 in API fees for large document collections.

This ADR establishes a clean separation: query tools for exploring data go in the CLI layer, administrative tools for managing the database go in the admin layer, and shared functionality lives in reusable libraries. The result is a codebase that's easier to extend, properly backs up all your data (including those expensive embeddings), and provides a foundation for future interfaces like web UIs.


Context

The original implementation mixed query operations (search, concept details) and administrative operations (backup, restore, database setup) in a single cli.py file. Shell scripts duplicated logic from Python code. No shared library existed for common operations like console output, JSON formatting, or graph database queries. This made it difficult to add new interfaces (GUI, web) without duplicating functionality.

Additionally, backup/restore operations didn't handle vector embeddings properly, risking expensive re-ingestion costs ($50-100 for large documents) if data was lost.

Decision

Restructure the codebase into three layers, with shared libraries at the base:

  1. Shared Libraries (src/lib/) - Reusable components
  2. CLI Tools (src/cli/) - Data query and exploration
  3. Admin Tools (src/admin/) - Database administration

Proposed Directory Structure

knowledge-graph-system/
├── src/
│   ├── lib/                      # Shared libraries
│   │   ├── __init__.py
│   │   ├── console.py            # Color output, formatting, progress bars
│   │   ├── age_ops.py            # Common Apache AGE operations (was neo4j_ops.py)
│   │   ├── serialization.py      # Export/import with embeddings
│   │   └── config.py             # Configuration management
│   │
│   ├── cli/                      # Query & exploration tools (HTTP API client)
│   │   ├── __init__.py
│   │   ├── main.py               # CLI entry point
│   │   ├── search.py             # Search commands
│   │   ├── concept.py            # Concept operations
│   │   ├── ontology.py           # Ontology inspection
│   │   └── database.py           # Database info/health (read-only)
│   │
│   ├── admin/                    # Administration tools (direct database)
│   │   ├── __init__.py
│   │   ├── backup.py             # Backup operations
│   │   ├── restore.py            # Restore operations
│   │   ├── reset.py              # Database reset
│   │   ├── prune.py              # Prune orphaned nodes
│   │   └── stitch.py             # Semantic restitching
│   │
│   └── api/                      # API server (replaces ingest/)
│       ├── main.py               # FastAPI application
│       ├── lib/age_client.py     # AGE database client
│       └── ...
├── scripts/                      # Thin shell wrappers
│   ├── backup.sh                 # Calls src/admin/backup.py
│   ├── restore.sh                # Calls src/admin/restore.py
│   └── ...
├── cli.py -> src/cli/main.py     # Symlink for backward compat
└── ...

Implementation Strategy

Phase 1: Create shared libraries

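# src/lib/console.py
class Console:
    """ANSI-colored console output helpers."""
    @staticmethod
    def info(msg): print(f"\033[94m{msg}\033[0m")     # blue; called by the admin tools
    @staticmethod
    def success(msg): print(f"\033[92m{msg}\033[0m")  # green
    @staticmethod
    def error(msg): print(f"\033[91m{msg}\033[0m")    # red
    # ... progress bars, tables, etc.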

# src/lib/serialization.py
from typing import Dict

def export_ontology(ontology_name: str) -> Dict:
    """Export ontology with all data including embeddings"""
    return {
        "metadata": {...},
        "concepts": [...],  # Including embeddings as lists
        "sources": [...],
        "instances": [...],
        "relationships": [...]
    }
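
The "embeddings as lists" comment matters because vectors are usually held in array types that json.dump cannot serialize directly. Below is a minimal sketch of that conversion. The helper names `embedding_to_list` and `concept_to_record` are hypothetical (they do not appear in the codebase described here), and the field names are taken from the backup format in this ADR.

```python
import json
from typing import Any, Dict, Iterable, List


def embedding_to_list(embedding: Iterable[float]) -> List[float]:
    """Convert any iterable of numbers into a plain list of Python floats."""
    return [float(x) for x in embedding]


def concept_to_record(concept_id: str, label: str,
                      search_terms: Iterable[str],
                      embedding: Iterable[float]) -> Dict[str, Any]:
    """Build a JSON-serializable concept record (hypothetical helper)."""
    return {
        "concept_id": concept_id,
        "label": label,
        "search_terms": list(search_terms),
        "embedding": embedding_to_list(embedding),
    }


# Example: a tuple stands in for whatever vector type the database returns.
record = concept_to_record(
    "concept_001", "Linear Thinking",
    ("linear", "sequential"), (0.234, -0.123, 0.456),
)
text = json.dumps(record)
```

Python's json module writes floats in a representation that parses back to the same value, so embeddings stored this way should survive a backup and restore unchanged.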

Phase 2: Implement admin tools

# src/admin/backup.py
import json

from src.lib.console import Console
from src.lib.serialization import export_ontology

def backup_ontology(name: str, output_file: str):
    """Export one ontology (embeddings included) and write it as JSON."""
    Console.info(f"Backing up ontology: {name}")
    data = export_ontology(name)
    with open(output_file, 'w') as f:
        json.dump(data, f, indent=2)
    Console.success(f"Backup saved: {output_file}")

Phase 3: Refactor CLI

  • Move query operations to src/cli/
  • Remove admin operations from current cli.py
  • Use shared libraries for output

Phase 4: Update shell scripts

#!/usr/bin/env bash
# scripts/backup.sh
set -e  # stop if the virtualenv cannot be activated
source venv/bin/activate
python -m src.admin.backup "$@"
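
For the wrapper to forward its arguments, src/admin/backup.py needs a command-line entry point. Below is one possible sketch using argparse. The argument names, the `-o/--output` flag, and the default file name scheme in `resolve_output` are assumptions made for illustration, not part of the decision recorded here.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Parser for the hypothetical `python -m src.admin.backup` command."""
    parser = argparse.ArgumentParser(
        prog="backup", description="Back up one ontology to a JSON file")
    parser.add_argument("ontology", help="name of the ontology to back up")
    parser.add_argument("-o", "--output", default=None,
                        help="output file (default: derived from the name)")
    return parser


def resolve_output(ontology: str, output: Optional[str]) -> str:
    """Use the given path, or derive one from the ontology name (assumed scheme)."""
    if output:
        return output
    safe = ontology.lower().replace(" ", "_")
    return f"{safe}_backup.json"


def main(argv: Optional[List[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    output_file = resolve_output(args.ontology, args.output)
    # The real module would call backup_ontology(args.ontology, output_file) here.
    print(f"Would back up {args.ontology!r} to {output_file}")
    return 0

# The real module would end with:
#     if __name__ == "__main__":
#         raise SystemExit(main())
```

Keeping the parsing in `main(argv)` rather than at module level lets tests call it with an explicit argument list, and a future GUI or API layer can import `backup_ontology` without triggering argument parsing.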

Data Format for Backups

JSON format with explicit types:

{
  "version": "1.0",
  "type": "ontology_backup",
  "timestamp": "2025-10-06T14:30:00Z",
  "ontology": "My Ontology",
  "metadata": {
    "file_count": 3,
    "concept_count": 109,
    "source_count": 24
  },
  "concepts": [
    {
      "concept_id": "concept_001",
      "label": "Linear Thinking",
      "search_terms": ["linear", "sequential", "step-by-step"],
      "embedding": [0.234, -0.123, 0.456, ...]  // Full array
    }
  ],
  "sources": [
    {
      "source_id": "doc1_chunk1",
      "document": "My Ontology",
      "file_path": "/path/to/file.md",
      "paragraph": 1,
      "full_text": "..."
    }
  ],
  "relationships": [
    {
      "from": "concept_001",
      "to": "concept_002",
      "type": "IMPLIES",
      "properties": {"confidence": 0.9}
    }
  ]
}
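
Because the format carries explicit `version` and `type` fields, a restore tool can reject unsuitable files before touching the database. The following is a rough sketch of such a check. The function name `check_backup`, the set of required keys, and the rule that all embeddings in one backup share a dimension are assumptions for illustration.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = ("version", "type", "ontology",
                 "concepts", "sources", "relationships")
SUPPORTED_VERSIONS = {"1.0"}  # assumption: only the version shown in this ADR


def check_backup(data: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in a parsed backup; empty means OK."""
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in data]
    if problems:
        return problems  # the checks below need these keys to exist
    if data["type"] != "ontology_backup":
        problems.append(f"unexpected type: {data['type']!r}")
    if data["version"] not in SUPPORTED_VERSIONS:
        problems.append(f"unsupported version: {data['version']!r}")
    # Assumed invariant: every embedding in one backup has the same length.
    dims = {len(c["embedding"]) for c in data["concepts"] if "embedding" in c}
    if len(dims) > 1:
        problems.append(f"inconsistent embedding dimensions: {sorted(dims)}")
    # Relationships that point outside this backup are reported, not fixed.
    known = {c["concept_id"] for c in data["concepts"]}
    for rel in data["relationships"]:
        for end in ("from", "to"):
            if rel[end] not in known:
                problems.append(
                    f"relationship references unknown concept: {rel[end]}")
    return problems
```

Returning a list of problems instead of raising on the first one lets an admin tool print everything wrong with a file in a single pass.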

Consequences

Positive

  1. Separation of Concerns
       • CLI focused on data access
       • Admin focused on database operations
       • Clear boundaries

  2. Reusability
       • Shared libraries avoid duplication
       • Easy to add new interfaces (web UI, API)
       • Testable modules

  3. Portability
       • Backups include all data (embeddings, full text, relationships)
       • JSON format is portable across systems
       • Mix-and-match restore (selective ontology restore)

  4. Cost Protection
       • Save expensive ingestion results ($50-100 for large documents)
       • Restore into clean database without re-processing
       • Share ontologies between team members

  5. Future-Proof
       • GUI can import same modules
       • API server can use same libraries
       • Unit tests for all components

Negative

  • More files/directories (but better organized)
  • Need to update imports in existing code
  • Slight learning curve for new contributors

Neutral

  • Need to maintain backward compatibility during transition
  • Documentation updates required

Alternatives Considered

  1. Keep everything in cli.py - Rejected: becomes unmaintainable kitchen sink
  2. Separate repos for admin tools - Rejected: overkill, makes shared code difficult
  3. Bash-only for admin - Rejected: can't handle embeddings properly, lots of duplication

Migration Path

  1. Backward Compatibility
       • Keep cli.py as symlink to src/cli/main.py
       • Shell scripts continue to work
       • Gradual migration of calling code

  2. Incremental
       • Can implement admin tools first
       • CLI refactor can follow
       • No "big bang" rewrite

  3. Testing
       • Test each component independently
       • Integration tests for workflows
       • Backup/restore round-trip tests
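
A backup/restore round-trip test could look roughly like the sketch below. It exercises only the file layer: `write_backup` and `read_backup` are hypothetical stand-ins for the admin tools, and a real test would also export from and restore into a test database.

```python
import json
import os
import tempfile
from typing import Any, Dict


def write_backup(data: Dict[str, Any], path: str) -> None:
    """Stand-in for the backup tool: write the export as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)


def read_backup(path: str) -> Dict[str, Any]:
    """Stand-in for the restore tool: read the JSON back."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def test_backup_round_trip() -> None:
    original = {
        "version": "1.0",
        "type": "ontology_backup",
        "ontology": "My Ontology",
        "concepts": [{
            "concept_id": "concept_001",
            "label": "Linear Thinking",
            "embedding": [0.234, -0.123, 0.456],
        }],
        "sources": [],
        "relationships": [],
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "backup.json")
        write_backup(original, path)
        restored = read_backup(path)
    # Nothing, including the embedding values, may change on the way through.
    assert restored == original
```

The equality check covers the embedding values too, which is the property the cost-protection argument depends on.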