ADR-103: Distribution strategy: nomic-first thin appliance with app-store tenancy
Context
The system is fully containerized and managed by operator.sh (an apt-style
lifecycle tool) with a standalone install.sh for production deployment.
Despite this, the single biggest constraint on broader adoption is distribution:
standing the platform up requires the operator to understand Docker, Compose,
operator.sh, provider configuration, and secret generation. That is a steep
funnel for a system that is otherwise self-contained.
"Distribution" is not one problem but four, and they have different solutions:
| Axis | What it means | Solved by an appliance? |
|---|---|---|
| Install friction | Docker/Compose/operator literacy required | Yes — flash, boot, paste a key |
| Discovery | People finding the project at all | No — needs to live where self-hosters browse |
| Update lifecycle | Keeping deployed instances current | Partly — operator.sh upgrade is already proto-OTA |
| Support surface | Variance across host environments | Mixed — known-good env helps; tenant management adds surface |
A naive "ship it all as one VM" framing hides a fork. A fat appliance bakes
images + data into a frozen VM image (truly offline, but fights the existing
pull-based operator.sh upgrade lifecycle and ships multi-GB images per
release). A thin appliance ships a minimal host with Docker + operator.sh
preinstalled and pulls images on first boot (keeps the incremental update story,
needs network on first boot). For a system whose deployment philosophy is
already "pull images, migrate, restart," the thin model is the natural fit.
Two technical facts shape the strategy:
- The embedding/reasoning split is already a code path, not a thing to build. LocalEmbeddingProvider does on-device embeddings via sentence-transformers and explicitly cannot extract (ADR-804). Concept extraction/reasoning is a separate provider path that calls a remote LLM. This lets the appliance own the cheap, private, edge-friendly compute (embeddings, vector similarity, and graph storage, i.e. the user's data) while farming out only document chunks to a cloud LLM for extraction.
- Multi-arch images are already produced. publish.sh runs docker buildx --platform linux/amd64,linux/arm64, auto-enabled on release/main. An ARM64 target (Raspberry Pi 4GB+, Home Assistant OS style) is feasible because the CPU/RAM profile fits when reasoning is remote and embeddings use a small local model.
A target deployment is a Raspberry-Pi-class appliance (HAOS-like): a flashable image that runs the containers, does embeddings on-device, and only reaches out for reasoning. Local inference for reasoning (e.g. vLLM passthrough) is explicitly out of scope — if a user wants it, that is their own add-on container, not a core responsibility.
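The split described above can be pictured with a small sketch. Everything here is illustrative: `LocalEmbedder`, `ingest_chunk`, and the injected `encode` and `remote_extract` callables are hypothetical names, not the project's actual provider classes.

```python
from typing import Callable, Sequence


class LocalEmbedder:
    """Hypothetical stand-in for the on-device path: embeds, never extracts."""

    def __init__(self, encode: Callable[[str], Sequence[float]]):
        # In a real deployment this would wrap a local embedding model.
        self._encode = encode

    def embed(self, text: str) -> list:
        return list(self._encode(text))

    def extract_concepts(self, text: str):
        # Mirrors the ADR's statement that the local provider cannot extract.
        raise NotImplementedError("local embedding provider cannot extract")


def ingest_chunk(chunk: str, embedder: LocalEmbedder,
                 remote_extract: Callable[[str], list]) -> dict:
    """Embed on-device; only the chunk text is handed to the remote LLM call."""
    return {
        "embedding": embedder.embed(chunk),
        "concepts": remote_extract(chunk),
    }
```

The point of the shape is that the embedding vector and everything derived from it stay local, while the remote call sees only the chunk text.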
Decision
Adopt a staged, thin-appliance distribution strategy with a nomic-first local-embedding invariant, and defer building a bespoke supervisor OS.
Invariant — nomic-first, reasoning-remote. The out-of-the-box embedding
model is the local nomic-ai/nomic-embed-text-v1.5 profile (768-dim,
on-device, no API key). Cloud OpenAI embeddings remain a one-command
alternative but are no longer the default. Reasoning/extraction continues to
require a remote LLM provider. This makes the appliance self-contained for the
private/edge half of the workload and means a fresh install needs a cloud key
only for extraction, never for embeddings. Both default models — text
(nomic-embed-text-v1.5) and vision (nomic-embed-vision-v1.5, loaded via
the profile's image slot) — are baked into the API image at build time so the
platform boots without a runtime HuggingFace download, the precondition for an
offline / air-gapped / Pi appliance.
Cache-topology constraint (non-obvious). The HF cache path
(~/.cache/huggingface) sits under a mounted volume (hf_cache ->
/home/api/.cache) in every compose variant. Weights baked directly into that
path are shadowed by the empty volume on first boot, re-triggering the download.
So the image bakes both models into a path outside the volume (/opt/hf-seed,
loaded — not merely downloaded — so the trust_remote_code dynamic-module cache
is populated too), and a first-boot entrypoint (docker-entrypoint.sh) copies
them into the live cache with cp -n (preserving any model a user later
switches to or adds). This keeps the deliberate persistent-cache design intact
while guaranteeing offline first boot.
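The no-clobber seeding step can be sketched as follows. The ADR's entrypoint is a shell script using cp -n; this Python version only illustrates the same semantics, and the function name `seed_cache` is an assumption.

```python
import shutil
from pathlib import Path


def seed_cache(seed_dir: Path, cache_dir: Path) -> list:
    """Copy files from seed_dir into cache_dir without overwriting anything.

    Returns the destination paths that were actually written, so a caller
    can tell a first boot (many files) from a later boot (none).
    Note: file symlinks are dereferenced and copied as regular files.
    """
    written = []
    for src in sorted(seed_dir.rglob("*")):
        dest = cache_dir / src.relative_to(seed_dir)
        if src.is_dir():
            dest.mkdir(parents=True, exist_ok=True)
        elif not dest.exists():
            # Existing files win: a model the user switched to is preserved.
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dest)
            written.append(dest)
    return written
```

Under these assumptions, a call such as `seed_cache(Path("/opt/hf-seed"), Path("/home/api/.cache/huggingface"))` would populate an empty volume on first boot and do nothing on later boots.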
The bake and seed live in shared scripts (api/bake_embedding_models.py +
api/docker-entrypoint.sh) invoked by every published image variant — the
standard CPU/x86/arm64/NVIDIA api/Dockerfile (runs as api, seeds the cache
volume) and the AMD api/Dockerfile.rocm-host (runs as root, seeds $HOME).
This was learned the hard way: an inline bake added only to api/Dockerfile
silently skipped the ROCm image, which kept re-downloading at runtime. One
source of truth for the model list is the only way the variants stay in lockstep.
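A single shared model list might look like the sketch below. The real api/bake_embedding_models.py is not shown in this ADR, so `DEFAULT_MODELS`, `bake`, and the injected `load` callable are assumptions chosen for illustration.

```python
from typing import Callable, Iterable

# Assumed name: the one list every image variant's build step would consume.
DEFAULT_MODELS = (
    "nomic-ai/nomic-embed-text-v1.5",
    "nomic-ai/nomic-embed-vision-v1.5",
)


def bake(models: Iterable[str], load: Callable[[str], object]) -> list:
    """Load (not just download) each model and report what was baked.

    `load` is injected so the sketch stays library-agnostic; in a real build
    it would instantiate the model so remote-code modules are cached too.
    """
    baked = []
    for name in models:
        load(name)
        baked.append(name)
    return baked
```

Because both Dockerfiles would call the same function with the same list, adding a model in one place reaches every variant.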
Staged distribution path:
- Stage 1: app-store tenancy. Publish to existing self-host platforms (Portainer template, CasaOS, Umbrel, TrueNAS SCALE app, Proxmox helper script, Home Assistant add-on). These are already "an appliance OS that manages container tenants." Being a tenant solves discovery and install at a fraction of the cost of authoring an OS. Lowest effort, highest leverage.
- Stage 2: thin appliance images. Ship prebuilt OVA/qcow2 plus an arm64 Pi image: a minimal host with Docker + operator.sh preinstalled, nomic weights baked, reasoning cloud-only. Still thin: updates flow through the existing operator.sh upgrade pull lifecycle, not whole-image replacement.
- Stage 3: supervisor model (deferred). Becoming our own appliance OS with a first-class add-on model (e.g. user-supplied vLLM-passthrough container) is explicitly out of scope for now. We only preserve the seam: keep operator.sh add-on-shaped so the door stays open if demand proves out.
This ADR records the strategy and the nomic-first invariant. The nomic-first
flip is implemented on branch survey/nomic-first-defaults (seed migrations
003/008, mig-012 fallback, Dockerfile bake, mock/test dimension alignment,
operator-help/API-doc defaults).
Stage 2 Build Contract — x86 thin appliance
The first Stage-2 artifact is an x86 qcow2/OVA. Its build contract (tooling in
appliance/) records the non-obvious decisions:
- The appliance is ADR-117's "cube" deployment, baked. Not a new installer: a minimal Debian host with Docker + the repo at /opt/kg, image-source=ghcr, reusing the tested operator.sh init --headless path verbatim. install.sh's standalone curl-fetch is staged at bake time instead of run at install time.
- Bake / first-boot split, with a no-baked-secrets invariant. The image carries OS + Docker + repo but no .env and no secrets. A oneshot systemd unit (kg-firstboot, self-disarming via a sentinel) runs operator.sh init --headless --image-source=ghcr --gpu=cpu --skip-ai-config on first power-on, so operator/lib/init-secrets.sh mints unique per-instance ENCRYPTION_KEY/POSTGRES_PASSWORD/etc. A baked .env would ship every appliance with identical secrets, which is the one thing the split exists to prevent. WEB_HOSTNAME is derived from the DHCP lease at first boot.
- Thin, literally. Container images are not baked; they are pulled on first boot (network required once), so updates stay on the operator.sh upgrade pull lifecycle. The only offline asset is the nomic weight set, which rides inside the kg-api image. A warm (images-baked) variant is a deferred build flag, not the default.
- Build tool: virt-customize now, Packer deferred. libguestfs customizes the Debian genericcloud qcow2 in place (no VM boot), which fits a "stage files + install Docker" job. A Packer/QEMU template is the deferred CI-release path.
- Reasoning key is never required to reach a running platform. First boot yields a live box doing local embeddings; the operator pastes a reasoning key in the web UI afterward. This is the "flash, boot, paste a key" curve made concrete.
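The bake / first-boot split hinges on two properties: secrets are minted per instance, and the first-boot step disarms itself. A minimal sketch of that logic follows; the actual mechanism is a systemd unit invoking operator.sh, and the sentinel filename, the `first_boot` function, and the token format are assumptions.

```python
import secrets
from pathlib import Path

# Only the two key names the ADR mentions; the real set is larger ("etc.").
SECRET_KEYS = ("ENCRYPTION_KEY", "POSTGRES_PASSWORD")


def first_boot(state_dir: Path) -> bool:
    """Mint per-instance secrets once; return False on every later boot."""
    sentinel = state_dir / ".firstboot-done"  # hypothetical sentinel name
    env_file = state_dir / ".env"
    if sentinel.exists():
        return False  # self-disarming: nothing is regenerated
    lines = [f"{key}={secrets.token_urlsafe(32)}" for key in SECRET_KEYS]
    env_file.write_text("\n".join(lines) + "\n")
    env_file.chmod(0o600)  # secrets readable by the owner only
    sentinel.touch()
    return True
```

Two appliances flashed from the same image run this independently, so their .env files differ even though the image bytes are identical.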
Consequences
Positive
- Collapses install friction to "flash, boot, paste a reasoning key" on the HAOS adoption curve.
- App-store tenancy (Stage 1) hits discovery + install together without the multi-year cost of building an OS.
- The nomic-first invariant makes the appliance self-contained for embeddings and keeps the private graph local; only document chunks leave the box.
- Keeps the existing pull-based operator.sh upgrade lifecycle intact (thin, not fat): no multi-GB per-release image churn.
- Reuses code paths that already exist (LocalEmbeddingProvider, multi-arch buildx) rather than introducing new architecture.
Negative
- Baking both default models (text ~275MB + vision) adds several hundred MB to the API image, including for cloud-only deployments that will not use local embeddings, and first boot pays a one-time seed-copy into the cache volume. A build ARG to opt out of baking is a possible later refinement.
- Local 768-dim and OpenAI 1536-dim embeddings are incompatible vector spaces; the appliance standardizes on 768 and switching costs a full re-embed.
- A thin appliance still needs network on first boot to pull images (only the embedding model is offline); true air-gap remains a separate, later effort.
- App-store tenancy adds per-platform packaging/maintenance surface (Portainer vs CasaOS vs HA add-on manifests).
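The dimension incompatibility noted above is mechanical: a 768-dim vector and a 1536-dim vector cannot be compared at all, and even same-length vectors from different models are not meaningfully comparable. A small sketch of a guard for the first case (the function is illustrative, not project code):

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity that refuses to compare vectors of different size."""
    if len(a) != len(b):
        raise ValueError(
            f"dimension mismatch: {len(a)} vs {len(b)}; re-embed with one model"
        )
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("cannot compare a zero vector")
    return dot / norm
```

This is why switching the embedding provider costs a full re-embed: stored vectors cannot be converted between the two spaces.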
Neutral
- ARM64 viability depends on two upstream images having arm64 variants — see open gates below.
- Stage 3 deliberately leaves the supervisor/add-on model unbuilt; this is a recorded non-goal, not an oversight.
- No deployed environments exist yet, so the seed-default flip is a direct migration edit rather than a forward reconciliation migration.
Open Questions / Verification Gates
- AGE arm64: the Postgres+AGE image is pinned to a SHA digest (apache/age@sha256:e7de17…). Confirm that build publishes an arm64 variant, or repin/rebuild.
- Garage arm64: dxflrs/garage:v1.0.0. Garage is an edge/self-host product so arm64 almost certainly exists; verify.
- Pi RAM budget: validate Postgres+AGE + Garage + API + nomic (~400MB loaded) fit comfortably in 4GB under real ingestion.
- Offline boot ✅ verified 2026-06-11 (AMD ROCm image, Dockerfile.rocm-host): a freshly built image seeded /opt/hf-seed into the runtime cache and loaded both text + vision models with logs showing "Loaded from local cache" and zero download lines. The appliance offline precondition holds on real hardware.
- QEMU bake: confirm the bake step (which loads both models, populating the dynamic-module cache) completes when building the arm64 image under emulation.
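The two arm64 gates can be checked mechanically once an image's manifest index is in hand. The sketch below assumes the JSON comes from a command such as docker buildx imagetools inspect --raw, which for multi-arch images returns an index with a "manifests" array; the function name `has_platform` is hypothetical.

```python
import json


def has_platform(index_json: str, os_name: str = "linux",
                 arch: str = "arm64") -> bool:
    """Return True if a manifest index lists the given os/architecture.

    A single-platform image has no "manifests" array, so this returns False
    for it, meaning only that the index does not confirm the platform.
    """
    index = json.loads(index_json)
    for manifest in index.get("manifests", []):
        platform = manifest.get("platform", {})
        if (platform.get("os") == os_name
                and platform.get("architecture") == arch):
            return True
    return False
```

A False result for either pinned image is the signal to repin or rebuild, as the gates above describe.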
Alternatives Considered
- Fat VM appliance (bake images + data into a frozen VM). Rejected as the default: fights the pull-based upgrade lifecycle, ships multi-GB images per release, and only earns its weight for a hard air-gapped requirement that no user has yet.
- Build our own supervisor OS now (HAOS from scratch). Rejected for now: multi-year effort; existing self-host app stores already provide the container-tenant substrate. Deferred to Stage 3, seam preserved.
- Keep OpenAI-embedding default, add appliance packaging only. Rejected: would force every appliance to carry a cloud embedding key and send all content out for embeddings, defeating the private/edge value proposition.
- Local reasoning in-appliance (bundled vLLM/Ollama). Rejected as a core responsibility: GPU passthrough into a VM/Pi is host-specific and fragile; left to user-supplied add-on containers.