Troubleshooting
This page covers common failure modes for Kappa Graph self-hosted deployments, including platform containers and client tools.
Platform containers
Containers won't start
Check logs for the failing service, for example with ./operator.sh logs api.
Check overall status with ./operator.sh status.
Look for containers in "Exited" state. The exit code in docker ps -a output narrows the cause before you read logs.
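As a rough aid for reading those exit codes, the sketch below maps the common values to a likely cause. The function name explain_exit_code is hypothetical and not part of Kappa Graph; the codes are the general Docker and POSIX shell conventions, where a value above 128 means the process was killed by signal number (code - 128).

```shell
# Hypothetical helper, not shipped with Kappa Graph.
# Maps a container exit code to a likely cause.
explain_exit_code() {
  case "$1" in
    0)   echo "exited cleanly" ;;
    125) echo "docker run itself failed" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found" ;;
    137) echo "killed by SIGKILL (often the out-of-memory killer)" ;;
    139) echo "segmentation fault (SIGSEGV)" ;;
    143) echo "terminated by SIGTERM" ;;
    *)   echo "application error; read the service logs" ;;
  esac
}
```

For example, explain_exit_code 137 points at the out-of-memory case covered further down this page.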
Port already in use
Port 3000 (web) or 8000 (API) is occupied by another process.
Find the occupying process:
Either stop the conflicting service or change the ports in .env.
Out of memory (exit code 137)
The API container requires memory for ML models. Check current usage:
To address the problem:
- Ensure at least 8 GB RAM is available to Docker.
- Reduce MAX_CONCURRENT_JOBS in .env (default: 4).
- Set GPU_MODE=cpu in .operator.conf if GPU memory is constrained, then restart.
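Changing a value such as MAX_CONCURRENT_JOBS can also be scripted. The following is a minimal sketch, assuming .env holds plain KEY=value lines with no quoting or export prefixes. The helper name set_env_value is hypothetical and not part of the operator tooling.

```shell
# Hypothetical helper: set_env_value KEY VALUE [FILE]
# Replaces an existing KEY=... line, or appends one if KEY is absent.
# Note: the updated assignment always ends up on the last line of the file.
set_env_value() {
  key=$1; value=$2; file=${3:-.env}
  tmp=$(mktemp) || return 1
  # Keep every line except an existing assignment of KEY.
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write back in place so the file keeps its owner and permissions.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}
```

With this in place, set_env_value MAX_CONCURRENT_JOBS 2 would lower the job limit in the .env file of the current directory. A restart is still needed for the change to take effect.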
Container health check failing
Inspect the health detail:
Common causes:
- Database not yet ready: wait and retry.
- API still loading models on startup: allow more time.
- Configuration error: read ./operator.sh logs api.
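The first two causes resolve themselves given time, so "wait and retry" can be automated. The sketch below is a generic retry loop; the function name retry is hypothetical, and the probe command in the trailing comment assumes an API health endpoint at /health on port 8000, which this page does not confirm.

```shell
# Hypothetical helper: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have been used up.
retry() {
  attempts=$1; delay=$2; shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1            # give up: every attempt failed
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example (the /health path is an assumption):
# retry 30 2 curl -fsS http://localhost:8000/health
```

If the loop still fails after a generous number of attempts, the cause is more likely a configuration error than slow startup.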
Database
Connection refused
Confirm PostgreSQL is running:
Confirm the Docker network is intact:
Migration errors
To run migrations manually:
Database corruption
If PostgreSQL fails to start due to corruption:
- Try a query first. If the database responds, back up immediately:
- Restore from backup:
  See Backup and Restore for backup procedures.
- Last resort: reinitialize (destroys all data):
Authentication
Can't log in
Kappa Graph uses OAuth. The POST /auth/login endpoint was removed in ADR-406 — authentication flows through /auth/oauth/login-and-authorize.
Check that an OAuth client is registered:
The registered redirect_uris must match your WEB_HOSTNAME. If they do not match, re-run ./operator.sh init or update the client record directly.
Reset the admin password:
500 error during login
Common causes:
- OAuth client missing the scopes column: fixed in current images; run ./operator.sh upgrade.
- Database connection failure.
- OAUTH_SIGNING_KEY mismatch in .env.
Token expired
Access tokens expire after ACCESS_TOKEN_EXPIRE_MINUTES (default: 60). Log out and log in again, or increase the value in .env and restart the API:
TLS and certificates
Certificate not found
Kappa Graph uses Traefik for TLS termination. See TLS and Certificates for certificate setup procedures.
Verify the certificate files exist in the configured path:
Mixed content warnings
The browser blocks HTTP sub-requests from an HTTPS page.
Check the frontend runtime config:
The apiUrl must use https:// when the site itself is served over HTTPS. Set the correct WEB_HOSTNAME in .env and restart.
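The scheme rule can be checked mechanically once you know both URLs. This is a small sketch only: the function name is_mixed_content is hypothetical, and kg.example.com in the trailing comment is a placeholder hostname.

```shell
# Hypothetical helper: is_mixed_content PAGE_URL API_URL
# Succeeds when an HTTPS page would call its API over plain HTTP,
# which is the combination browsers block.
is_mixed_content() {
  page_url=$1; api_url=$2
  case "$page_url" in
    https://*)
      case "$api_url" in
        http://*) return 0 ;;
      esac ;;
  esac
  return 1
}

# Example with placeholder hostnames:
# is_mixed_content https://kg.example.com http://kg.example.com/api && echo "fix apiUrl"
```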
GPU
GPU not detected
Verify the NVIDIA runtime works outside the container:
Verify GPU access inside the API container:
If the runtime is missing:
Set GPU_MODE in .operator.conf to match your hardware: nvidia, amd-host, mac, or cpu.
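A typo in GPU_MODE is easy to miss. The sketch below validates the configured value against the four modes listed above. It assumes .operator.conf stores the setting as an unquoted GPU_MODE=value line; the function names are hypothetical.

```shell
# Hypothetical helpers, assuming an unquoted GPU_MODE=value line in the file.
valid_gpu_mode() {
  case "$1" in
    nvidia|amd-host|mac|cpu) return 0 ;;
    *) return 1 ;;
  esac
}

# check_gpu_mode [FILE]: read GPU_MODE from FILE and report whether it is valid.
check_gpu_mode() {
  file=${1:-.operator.conf}
  # If the key appears more than once, the last assignment is used.
  mode=$(sed -n 's/^GPU_MODE=//p' "$file" | tail -n 1)
  if valid_gpu_mode "$mode"; then
    echo "GPU_MODE=$mode is valid"
  else
    echo "GPU_MODE='$mode' is not one of: nvidia, amd-host, mac, cpu" >&2
    return 1
  fi
}
```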
CUDA out of memory
Set GPU_MODE=cpu in .operator.conf and restart, or reduce MAX_CONCURRENT_JOBS in .env.
Ingestion
Job stuck in pending
A job enters awaiting_approval state when --no-approve was passed at submission time, or when the server configuration requires manual approval. By default, the CLI auto-approves submitted jobs.
List pending jobs:
Approve a specific job:
Approve all pending jobs at once:
To require manual approval when submitting, pass --no-approve:
Extraction failing
Common causes:
- AI provider API key is invalid or expired: reconfigure via ./operator.sh shell, then configure.py ai-provider.
- Provider rate limit reached: wait and retry.
- Document format not supported.
Large document timeout
Split the document into smaller files before ingestion, or increase the API worker timeout in .env and restart:
CLI and client tools
kg command not found after installation
If you installed via npm install -g @aaronsb/kg-cli, ensure ~/.local/bin (or the npm global bin directory) is on your PATH:
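A minimal sketch of such a line, assuming the kg binary landed in ~/.local/bin. If npm uses a different global prefix, npm config get prefix prints it, and on Linux and macOS the binaries live in its bin subdirectory.

```shell
# Assumes kg was installed into ~/.local/bin; substitute your npm global
# bin directory if it differs.
export PATH="$HOME/.local/bin:$PATH"
```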
Add this line to ~/.bashrc or ~/.zshrc for persistence.
To rebuild and reinstall from source:
Requires Node.js 20.12.0 or later. Check with node --version.
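Comparing version strings by eye is error-prone (20.9.0 is older than 20.12.0, though it sorts later as text). The sketch below compares them numerically. The function name version_ge is hypothetical, and it only handles plain MAJOR.MINOR.PATCH versions with an optional leading v, not pre-release suffixes.

```shell
# Hypothetical helper: version_ge HAVE WANT
# Succeeds when HAVE >= WANT, comparing MAJOR, MINOR, PATCH numerically.
version_ge() {
  have=${1#v}; want=${2#v}
  # Split both versions on dots into six positional parameters.
  set -- $(echo "$have.$want" | tr . ' ')
  [ "$1" -gt "$4" ] && return 0
  [ "$1" -lt "$4" ] && return 1
  [ "$2" -gt "$5" ] && return 0
  [ "$2" -lt "$5" ] && return 1
  [ "$3" -ge "$6" ]
}

# Only run the check when node is actually installed.
if command -v node >/dev/null 2>&1; then
  if version_ge "$(node --version)" 20.12.0; then
    echo "Node.js version is sufficient"
  else
    echo "Node.js is older than 20.12.0; upgrade before rebuilding"
  fi
fi
```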
CLI authentication failed
Re-authenticate:
Confirm the CLI is pointing at the right API:
The CLI stores its configuration at ~/.config/kg/config.json.
Connection refused from CLI
If the health check returns an error, the problem is on the server side — check ./operator.sh status and ./operator.sh logs api.
Collecting diagnostic information
Before opening a GitHub issue, gather:
./operator.sh status
./operator.sh versions
./operator.sh logs api > logs.txt 2>&1
docker ps -a
docker version
grep -Ev 'KEY|SECRET|PASSWORD' .env
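The filter above drops sensitive lines entirely, which also hides whether those settings exist at all. An alternative sketch keeps each variable name and masks only its value. The function name redact_env is hypothetical, and it assumes plain KEY=value lines.

```shell
# Hypothetical helper: reads .env-style lines on stdin and masks the value of
# any variable whose name contains KEY, SECRET, or PASSWORD.
redact_env() {
  sed -E 's/^([A-Za-z0-9_]*(KEY|SECRET|PASSWORD)[A-Za-z0-9_]*)=.*/\1=<redacted>/'
}
```

Running redact_env < .env > env.txt would produce a file for the issue report. Review the output before posting, since secrets stored under other names pass through unchanged.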
Open an issue at https://github.com/aaronsb/knowledge-graph-system/issues and include:
- What you were trying to do.
- What happened instead.
- Relevant log excerpts (sanitize secrets before posting).
- OS, Docker version, and GPU mode.