ADR-119: appliance configuration delivery and first-boot orchestration
Context
ADR-103 ships the platform as a thin appliance (OVA/qcow2). ADR-104 defines what happens once configuration is present — the shared provisioning contract and the console-token-gated first-run claim. This ADR fills the gap before that: how configuration arrives, and how first boot behaves when it is — or isn't — there.
Dogfooding the OVA path surfaced the problem. ADR-104 line 32 recorded the appliance's interactivity as "declarative (cloud-init) / none" — either you hand-craft a cloud-init seed, or you get defaults. That splits the audience badly:
- The seed, as we documented it, is a cloud/IaC idiom mis-applied to the
desktop. Our own runbook told a VirtualBox/qemu user to hand-build a
cidataISO withxorrisoand attach it as a virtio disk — friction a "download the OVA and import it" user will not push through. - "none" is a dead-end, not a first-run. Without a seed the box comes up on a DHCP IP with HTTP and no guided way to set the hostname, TLS, or a reasoning key — the operator has to discover the rest.
Design value driving this ADR: minimize novelty. Appliance provisioning is a well-trodden, well-engineered space; our platform is not special, and the closer its shape is to other appliance VMs the more familiar — and the less we have to invent, document, and debug. So the decision below deliberately reaches for established idioms (autoinstall/preseed, OVF properties, the pfSense/TrueNAS console wizard) and rejects clever-but-novel machinery, even where the novel version looked elegant.
Decision
Adopt the conventional appliance shape: a config artifact's presence decides the path (the autoinstall/kickstart convention), and all richer configuration happens after boot through the console (DCUI) and operator — the always-reachable out-of-band surface every appliance already uses.
A — provision.env is the rendezvous; the carrier is a standard NoCloud seed
/etc/kg/provision.env (ADR-103's declarative control surface) stays the single
meeting point; one consumer — operator.sh init (ADR-104's contract) — reads it.
Configuration is delivered by the standard cloud-init datasources, no bespoke
reader:
| Producer | Context | Mechanism |
|---|---|---|
| cloud-init NoCloud | desktop / offline / orchestrator-attached volume | a vfat volume labeled cidata; cloud-init's write_files drops provision.env (today's mechanism) |
| cloud-init network datasources | EC2 / OpenStack / Proxmox cloud-drive | platform metadata, same write_files |
| interactive console + operator | human, post-boot | DCUI / operator.sh set-* write provision.env and apply |
The external carrier is a vfat volume, FS-label cidata, holding the
standard user-data + meta-data (cloud-init NoCloud recognizes vfat or
iso9660 by that label — not ext4). vfat because it is writable (the box can
export/refresh the config on it) and builds without root or ISO tooling
(mcopy), which drops cleanly into Terraform/Packer. This is precisely the carrier
Ubuntu autoinstall uses — maximally familiar, zero custom discovery code.
Forward-compat needs only a KG_PROVISION_SCHEMA=N key inside provision.env
(unknown keys are already ignored) — no separate volume signature file. The
carrier holds declarative inputs only (hostname, TLS mode, DNS-01 creds,
optionally admin password / reasoning key); the per-instance infra secrets
(ENCRYPTION_KEY, POSTGRES_PASSWORD, signing keys) are always minted fresh
per install (ADR-104 Part A) and never ride the carrier. Reload = same inputs, new
secrets. The carrier is sensitive when it holds creds and is documented "treat like
a private key"; the builder (§D) offers an omit-secrets variant.
Because we build the OVA, the OVF will pre-declare an empty virtio slot at a
predictable address (/dev/vdb) so the user points an existing slot at their
.img rather than adding a controller. (Follow-on, not yet shipped: the current
ovf/kg-appliance.ovf.template declares only the primary disk; until the empty
slot lands, the carrier is attached as an added virtio disk — the same friction
this is meant to remove. Tracked with the first-boot orchestration work below.)
The bus the carrier hangs off is governed by the kernel flavor, which is now a
shipped distribution variant (see "Kernel variants" below). The default cloud
kernel has no AHCI, so its carrier is a virtio disk ("attach this .img as a
disk"). The generic variant brings AHCI/SATA, unlocking the one-click
"attach the config ISO to a CD drive" UX for the hypervisors that expect it.
B — First boot is presence-driven, not interactive
kg-firstboot does not pause, prompt, or run a countdown. It branches on
whether configuration is present — the kickstart/autoinstall convention:
flowchart TD
S[first boot · sentinel-guarded] --> R{cloud-init produced<br/>/etc/kg/provision.env?}
R -- yes --> APPLY[operator.sh init reads provision.env · start stack]
R -- no --> ZC[zero-config: mint secrets · start stack<br/>DHCP · HTTP · box usable]
APPLY --> CLAIM[hand to ADR-104 claim:<br/>UNCLAIMED + token, or pre-CLAIMED if creds supplied]
ZC --> CLAIM
CLAIM --> DONE[write sentinel → getty/console-TUI takes tty1]
This keeps first boot a headless oneshot (no tty ownership, no race with the
console getty), exactly as today — the only change is that the no-config branch
now lands the operator in a real first-run experience, not a dead-end (§C).
cloud-init keeps its standard datasource_list (NoCloud and the network
sources); attached labeled volumes are NoCloud-format — the same format
Proxmox/OpenStack config-drives use — so one path covers hand-rolled vfat and
platform config-drives.
C — All richer configuration is post-boot, via the DCUI + operator
The no-config box boots usable, then the operator configures it through the always-reachable console (DCUI) and the operator container — the pfSense/TrueNAS shape, and the surface ADR-104's claim flow already uses:
- Web first-run / claim (ADR-104): set admin password, paste a reasoning key.
- Console DCUI +
operator.sh set-*(warm-reconfig, follow-on work): public hostname, TLS mode, DNS-01 credentials, Cockpit access — the knobs that today force a pre-boot seed. These are idempotentoperator.shcommands (thecockpit-accessverb is the template) and each encapsulates the full ripple of changing the external URL (Traefik router, the registered OAuthredirect_uri, web runtime config, CockpitOrigins, the ACME cert) so a hostname change can't silently half-apply.
Cold (carrier) and warm (DCUI/operator) paths share the same provision.env
schema and the same apply logic, so they cannot drift. The carrier is
cold-provisioning/reload only; editing it does not re-apply to a running box —
that is the warm path's job.
D — One builder, three contexts (the operator container)
Generating a standard NoCloud seed from inputs is pure (inputs → vfat image; no DB, no running platform), so it is a verb on the operator container — the config authority that already lives baked in the appliance and pullable from GHCR:
docker run kg-operator config-volume --hostname kg.x --tls letsencrypt \
--dns-provider porkbun ... > config.img # standard cidata/NoCloud vfat
The same code runs in someone's Terraform, in the console's interactive flow, and in the console's export flow ("pack and download the config out for re-use"). It emits a stock NoCloud seed any cloud-init consumes — it is "a tool to generate cloud-init user-data easily," not a custom format.
E — Kernel variants: cloud (default) and generic (homelab)
The Debian cloud kernel ships no DRM/framebuffer or AHCI drivers — fine for a
headless cloud instance, but it locks the console to 80×25 VGA. That matters
because of who looks at the console: the homelab/desktop user runs the appliance
in a VirtualBox/VMware/Hyper-V window and reads the console there — it never
occurs to them to switch to tty1 or SSH. For them the cramped 80×25 is the
product. So the image ships as two variants, selected at build time
(build-appliance.sh --kernel):
| Variant | Kernel | Console | For |
|---|---|---|---|
| cloud (default) | linux-image-cloud-amd64 |
80×25 VGA (nobody looks) | headless / real-cloud; least overhead. The "expected shape." |
generic (-generic suffix) |
linux-image-amd64 + GRUB_GFXMODE/gfxpayload=keep |
real hi-res framebuffer (legible in a hypervisor window) | homelab/desktop: the person who needs a hand and reads the console window. Also gains AHCI/SATA → the CD-drive carrier UX (§A). |
cloud stays the default because the appliance's headless/edge path is the common
one and cloud users have other access; generic is the opt-in for the
console-window user. publish.sh appliance ships both OVAs/qcow2s on the
release, and the install docs point VirtualBox/VMware users (the console-window
crowd) at the generic artifact.
Consequences
Positive
- One artifact, both audiences, the familiar way. A single OVA boots zero-touch when a NoCloud seed is attached and into a first-run wizard when it isn't — presence-driven, exactly like autoinstall/kickstart and vendor OVF appliances.
- Almost nothing new is built. cloud-init NoCloud is the reader; first boot
stays the current headless oneshot; the DCUI already exists. The net additions
are small: make "no config" a real first-run, add post-boot
set-*knobs, and a seed-builder verb. - Removes the seed-ISO friction for the download-and-import user (carrier is
mcopy-buildable; the console path needs no carrier at all). - Cattle/pet separation. The OVA is stateless and secret-free; the config seed is the portable, owner-held identity → deterministic wipe-and-reload and a clean Terraform/Packer seam.
Negative
- Richer first-time config moves to a second step (post-boot DCUI/operator) rather than being injectable pre-boot for the desktop user — but that is the appliance convention, and the seed still serves the zero-touch case.
- The config seed is a secrets-bearing artifact when it carries DNS/provider creds; it cannot be encrypted-at-rest for cold boot (no key exists yet — chicken/egg), so it relies on operator handling + the omit-secrets variant.
- The warm
set-*commands must encapsulate the external-URL ripple correctly (Traefik + OAuth redirect + web config + Cockpit + cert) or a hostname change half-applies — security-relevant, needs care.
Neutral
- cloud-init is used as-is (standard datasources), neither demoted nor extended.
- The bus/kernel trade is resolved by shipping two kernel variants (§E): cloud (default, virtio-only) and generic (AHCI + hi-res console). Neither changes this config-delivery model; they differ only in drivers and console.
- The builder lands as an operator verb (ADR-211 sink), not a new tool.
- Convergence contract (ADR-103). The OVA is a thin bootstrap seed:
downloaded once, run, then kept current by
operator.sh upgradepulling fresh GHCR images. The per-release artifacts are the container images, not the OVA — so the OVA is republished only occasionally to move the baseline (publish.sh appliance, decoupled fromrelease). This ADR governs the bootstrap; ADR-103 governs ongoing currency; the seam is the first boot that pulls images.
Alternatives Considered
- A countdown that flips between automated and interactive at boot. Considered
and rejected as novel — real appliances decide by config presence
(kickstart
inst.ks=, OVF property presence), not a timer. The clever version also dragged in a bespoke volume reader, a signature file, and first-boot tty1 ownership; presence-driven removes all of it. Familiarity and a smaller surface beat the elegance. - A bespoke appliance-native config-volume reader +
/.kg-appliancesignature. Rejected: cloud-init's NoCloud datasource already reads a labeled vfat volume — reimplementing it is novelty for its own sake. AKG_PROVISION_SCHEMAkey covers forward-compat. - Keep ADR-104's "declarative (cloud-init) / none". Rejected: "none" as a half-configured dead-end is the gap; a proper first-run (DCUI/web) closes it without inventing anything.
- Two OVA flavors (cloud vs interactive). Rejected: doubles the release surface for a difference that is purely whether a seed is attached.
- iso9660 / ext4 carrier. ext4 is not in cloud-init's NoCloud probe set; iso9660 is recognized but read-only (no export/refresh) and needs ISO tooling. vfat is the writable, tool-light intersection — and what Ubuntu autoinstall uses.
Prior art
The result is an assembly of established, shipping idioms, not an invention — which is the point:
- Autoinstall / preseed / kickstart. One image runs interactive by default and
unattended when a config artifact is present — Ubuntu autoinstall uses the very
same NoCloud
cidatacarrier; RHEL/Fedora kickstart (inst.ks=), Debian preseed, SUSE AutoYaST share the shape. - OVF vApp properties / ISO transport. Vendor virtual appliances (F5, Palo Alto, Cisco, NetScaler) ship a single OVA that reads hypervisor-injected properties for zero-touch deploy, else drops to a console wizard — a polymorphic OVA, shipping for 15+ years.
- First-run appliances. pfSense/OPNsense/TrueNAS/Home Assistant OS boot to a usable default and configure via console (DCUI) / web afterward — the no-config branch and the post-boot warm path.