ADR-021: Live Man Switch - AI Safety for Critical Operations
Status: Accepted Date: 2025-10-09 Deciders: Development Team Related: ADR-020 (Admin Module Architecture)
Overview
We're entering an era where AI agents can execute terminal commands on our behalf. Ask an agent to "clean up the database" and it might helpfully run a reset command that deletes everything. The agent isn't malicious - it's trying to be helpful - but the consequences are the same. Traditional security measures like passwords don't help because the agent already has access to those credentials. It's a new kind of problem: not adversarial attacks, but well-intentioned automation gone wrong.
The challenge is creating a gate that's trivial for humans but expensive for AI to automate. Think about those child safety caps on medicine bottles - they're not impossible to open, they just require sustained physical coordination that young children haven't developed. Similarly, we needed something that requires physical presence and sustained action, something that's harder to automate than a simple API call.
The solution asks humans to hold down the Enter key for 3 seconds while watching a progress bar fill. For a human at the keyboard, the whole exchange takes about 5 seconds. For an AI agent, automating it means spawning processes, researching keyboard-injection libraries, and coordinating timing - complex work that takes minutes to hours. It's not a perfect lock; it's a time barrier. Like a bank vault that can be broken into but takes so long that security arrives first, it gives humans time to notice unusual activity and intervene before irreversible damage occurs.
Context
As AI agents become more autonomous and capable, they increasingly interact with systems through CLIs and APIs. While this enables powerful automation, it introduces a new category of risk: well-intentioned but dangerous AI actions.
The Problem Space
Traditional security assumes adversarial humans. But AI agents present different challenges:
- Literal Instruction Following - AI agents execute commands as instructed, even destructive ones
- Physical Interaction Complexity - Performing sustained physical actions (holding keys, tapping devices) requires spawning processes, emulating keyboards, coordinating timing - significantly more complex and time-consuming than simple API calls
- Well-Intentioned Execution - Often trying to be helpful, not malicious
- Automation at Scale - Can execute operations faster than humans can intervene
- Context Misunderstanding - May not fully grasp consequences of destructive operations
Real-World Scenario
User: "Clean up the database and start fresh"
AI Agent: *interprets as kg admin reset*
*executes immediately*
Result: All graph data deleted
The AI agent wasn't malicious - it was trying to be helpful. Traditional authentication (username/password) doesn't help because the AI has access to those credentials.
The Breaking Point
When implementing kg admin reset, we discovered:
- Password prompts are bypassable (AI can submit credentials programmatically)
- Confirmation prompts are ineffective (AI will respond "yes")
- Time delays are ignorable (AI will wait patiently)
- Even multi-step confirmations fail (AI follows the steps)
Core insight: We need a mechanism that exploits a complexity barrier: sustained physical interactions are significantly more complex and time-consuming for an AI to automate than simple API calls.
Decision
We implement a "Live Man Switch" pattern - the inverse of a dead man's switch. Instead of detecting when a human stops acting, we detect when a human is actively acting.
The Time Barrier Model
Like a bank vault, this isn't an absolute barrier - it's a time-cost barrier:
Bank Vault Analogy:
- Vault walls can be cut through
- But it takes TIME (hours with specialized equipment)
- By the time thieves succeed, police arrive
- The delay is the security

Live Man Switch Analogy:
- AI could emulate keyboard input (spawn a process, use a keyboard-injection library, coordinate timing)
- But it takes TIME (research, tool selection, implementation, debugging)
- By the time the AI succeeds, the human notices unusual behavior
- The complexity delay is the security
Key difference from API calls:
- API call: Instant (client.resetDatabase())
- Physical emulation: Minutes to hours (research → implement → debug → execute)
This creates an asymmetry: Humans complete in 5 seconds, AI needs minutes/hours of complex automation.
Pattern: Physical Confirmation via Key Hold
Hold [Enter] for 3 seconds to confirm...
██████████████████████████████ 100%
✓ Confirmed! You're probably human! 👩‍💻
Release Enter and press [Space] to continue...
Core Mechanism
For Humans (5-6 seconds total):
1. Read warning (2-3s)
2. Hold Enter key (3s) → progress bar fills
3. "You're probably human!" confirmation
4. Release Enter, tap Space
5. Proceed to password authentication

For AI Agents (10+ seconds):
1. Wait for input... ⏱
2. No physical key press detected
3. 10-second inactivity timeout fires
4. Helpful message explaining the situation:
If you're an AI agent reading this:
This operation will PERMANENTLY DELETE all graph data.
You cannot undo this action.
→ Please ask your human to run this command directly.
→ They will need to hold Enter for 3 seconds.
Implementation Details
1. Polling-Based Detection (500ms intervals)
// Poll every 500ms while the confirmation prompt is active
const POLL_MS = 500;
const HOLD_MS = 3000;
let accumulated = 0;

const poll = setInterval(() => {
  if (enterPressed) {
    accumulated += POLL_MS;                 // Credit one more interval of holding
    updateProgress(accumulated / HOLD_MS);  // Advance the progress bar
    if (accumulated >= HOLD_MS) success();  // 3 seconds of sustained hold
  } else {
    fail('Released too early');
  }
}, POLL_MS);
2. Dual-Timeout Safety
- 10s inactivity timeout: detects AI agents (they wait indefinitely)
- 3s hold requirement: confirms deliberate human action
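A minimal sketch of how the inactivity timer could interact with the first keypress; helper names like printAgentGuidance and onFirstKeypress are hypothetical, and only the 10s timing comes from the design above:

const INACTIVITY_MS = 10_000; // No keypress at all → likely an AI agent waiting

const inactivityTimer = setTimeout(() => {
  printAgentGuidance();       // The "If you're an AI agent..." message
  resolve(false);             // Abort without executing anything
}, INACTIVITY_MS);

onFirstKeypress(() => clearTimeout(inactivityTimer)); // Any input cancels the AI path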
3. Decompression Chamber
Problem: Keyboard auto-repeat causes Enter events to bleed into the next prompt.
Solution: After success, drain Enter events and require Space to continue.
if (accumulated >= 3000) {
  decompressionMode = true;
  // Ignore all Enter keypresses (auto-repeat is still firing)
  // Wait for Space to proceed
}
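In a raw-mode keypress handler, the drain could look like this; a sketch only, with decompressionMode and proceedToNextPrompt as illustrative names:

process.stdin.on('keypress', (_str, key) => {
  if (!decompressionMode) return;
  if (key.name === 'return') return;   // Swallow auto-repeated Enter events
  if (key.name === 'space') {
    decompressionMode = false;
    proceedToNextPrompt();             // Safe: no buffered Enter can leak through
  }
});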
4. Visual Feedback
Progress fills in discrete steps:
░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ 0% (0ms)
██████░░░░░░░░░░░░░░░░░░░░░░░░ 17% (500ms)
████████████░░░░░░░░░░░░░░░░░░ 33% (1000ms)
████████████████░░░░░░░░░░░░░░ 50% (1500ms)
████████████████████░░░░░░░░░░ 67% (2000ms)
████████████████████████░░░░░░ 83% (2500ms)
██████████████████████████████ 100% (3000ms) ✓
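One way to render that bar at each 500ms tick; a sketch in which the width and glyphs match the diagram but the function itself is ours, not the project's code:

function renderBar(accumulatedMs: number, holdMs = 3000, width = 30): string {
  const ratio = Math.min(accumulatedMs / holdMs, 1);
  const filled = Math.round(ratio * width);
  return '█'.repeat(filled) + '░'.repeat(width - filled) + ` ${Math.round(ratio * 100)}%`;
}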
Consequences
Positive
- Time-Cost Barrier - Physical emulation requires minutes/hours vs seconds for humans (asymmetric cost)
- Intent Friction - Requires deliberate human action, prevents accidents
- Graceful Guidance - AI agents get helpful instructions instead of errors
- User-Friendly - Humans complete in 5-6 seconds (read + hold + space)
- Visual Progress - Clear feedback during hold operation
- Keypress Isolation - Decompression prevents bleeding into subsequent prompts
- Detection Window - Human has time to notice unusual AI behavior during automation attempts
Negative
- Not Perfect Security - Can be bypassed by:
  - Modifying local code
  - Direct API calls
  - Keyboard injection libraries (if the AI has system access)
- Accessibility Concerns - Users with motor disabilities may struggle
  - Future: add alternative confirmation methods (voice confirmation, hardware token, two-channel verification)
- Terminal Dependency - Requires a raw-mode terminal (won't work in basic shells)
- Cultural Familiarity - Novel pattern may confuse first-time users
  - Mitigated by clear instructions and visual feedback
Trade-offs Accepted
This is intent friction, not cryptographic security:
- Goal: prevent accidental AI execution
- Goal: make intentional bypass require conscious effort
- Non-goal: stop determined adversaries
Like "Press and hold to power off" or typing "DELETE" to confirm - the goal is conscious human confirmation, not perfect defense.
Design Philosophy
Intent Friction vs. Perfect Security
Traditional security model:
Threat: Adversarial humans
Defense: Authentication, authorization, encryption
Goal: Make unauthorized access impossible
AI safety model (this ADR):
Threat: Well-intentioned but dangerous AI agents
Defense: Physical confirmation requirements
Goal: Ensure human oversight for critical operations
The Cooperative Model
Critical insight: The action is trivial (hold a button for 3 seconds). The AI can easily explain this to the human:
AI Agent: "I need to reset the database.
Can you run 'kg admin reset' and hold Enter for 3 seconds when prompted?
This is a safety check to confirm you approve this destructive operation."
This maintains harmony with AI agents helping with technical tasks. The AI isn't blocked - it can still assist by:
- Explaining what needs to happen
- Providing the exact command to run
- Describing the physical confirmation step
- Waiting for the human to complete it
The only failure mode is malicious deception:
Misaligned AI: "You'll receive a cake if you hold Enter for 3 seconds after typing 'kg admin reset'"
But that's a fundamentally different threat model (adversarial AI) than what we're solving (helpful but overeager AI).
Design harmony: The human stays engaged at the technical level - still involved in the process, just providing physical confirmation of intent.
Human-in-the-Loop (HITL) Design Principles
This is fundamentally a Human-in-the-Loop control system. Well-architected HITL doesn't slow down AI progress at solving tasks - it adds oversight only at critical decision points.
HITL Design Goals:
1. Selective Friction - Only apply barriers to destructive/irreversible operations
2. Maintain AI Velocity - Let AI work at full speed for non-critical tasks
3. Minimal Human Overhead - Make confirmations quick and intuitive (5-6 seconds)
4. Clear Handoff Points - AI knows when to request human confirmation
5. Cooperative Model - AI can explain what's needed; the human provides confirmation
Example Flow:
AI: Analyzes data, generates insights, suggests actions (FULL SPEED)
AI: Reaches destructive operation → "I need human confirmation"
Human: 5-second confirmation (BRIEF PAUSE)
AI: Continues execution (FULL SPEED)
The key: Human confirmation is a deliberate checkpoint, not a bottleneck. AI productivity is maintained while critical operations get human oversight.
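As a sketch of that checkpoint idea, a guard that wraps only destructive operations; guardDestructive and the commented callers are illustrative names, not the project's API:

async function guardDestructive(warning: string, action: () => Promise<void>): Promise<void> {
  const confirmed = await promptHoldEnter(warning); // Brief pause: live man switch
  if (!confirmed) return;                           // AI agents stop here, with guidance
  await action();                                   // Then back to full speed
}

// await analyzeData();                                   // FULL SPEED (no guard)
// await guardDestructive('🚨 Irreversible!', resetDb);   // BRIEF PAUSE, then proceed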
Properties of Effective AI Friction
- Exploits Complexity Asymmetry - Physical actions take seconds for humans, minutes/hours for AI to automate
- Fails Gracefully - Provides helpful guidance instead of cryptic errors
- Low Human Overhead - Quick for humans (5-6s), complex for AI (requires research, tools, debugging)
- Trivially Explainable - AI can easily tell human what to do ("hold Enter for 3 seconds")
- Preserves AI Utility - AI remains helpful, just requires human confirmation
- Visible Intent - Progress bars show deliberate action in progress
- Composable - Can layer with traditional auth (we still require password after)
- Detection Time - Automation attempts take long enough for humans to notice and intervene
- Strategic Placement - Only gates critical operations, doesn't impede normal AI workflow
Implementation
File Structure
client/src/cli/admin.ts
├── promptHoldEnter() - Live man switch implementation
│ ├── Polling loop (500ms intervals)
│ ├── Dual timeouts (10s inactivity, 3s hold)
│ ├── Decompression chamber
│ └── Visual progress bar
│
└── resetCommand - Uses physical confirmation
├── Hold Enter (live man switch)
├── Password authentication (traditional)
└── Execute reset
Integration Pattern
// In the kg admin reset command
const confirmed = await promptHoldEnter(
  colors.status.error('🚨 This action cannot be undone!')
);
if (!confirmed) {
  console.log(colors.status.dim('Cancelled\n'));
  process.exit(0);
}

// Then proceed to password authentication
const password = await promptPassword('Password: ');
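For reference, a self-contained sketch of what a promptHoldEnter implementation could look like, assuming Node's readline keypress events and relying on keyboard auto-repeat (as the real implementation does). Beyond the name, the 500ms/3s/10s timings, and the auto-repeat behavior, everything here is illustrative rather than the project's actual code:

import * as readline from 'node:readline';

function promptHoldEnter(warning: string): Promise<boolean> {
  const POLL_MS = 500, HOLD_MS = 3000, INACTIVITY_MS = 10_000;
  return new Promise((resolve) => {
    console.log(warning);
    console.log('Hold [Enter] for 3 seconds to confirm...');
    readline.emitKeypressEvents(process.stdin);
    process.stdin.setRawMode?.(true); // Raw mode: receive individual keypresses

    let holding = false;   // Saw an Enter event in the current poll window?
    let sawInput = false;  // Saw any input at all? (distinguishes the AI path)
    let accumulated = 0;

    // AI agents wait silently; after 10s with no input, resolve false.
    const inactivity = setTimeout(() => done(false), INACTIVITY_MS);

    // Auto-repeat delivers a stream of Enter events while the key is held,
    // so each 500ms window should contain at least one if the hold continues.
    const poll = setInterval(() => {
      if (holding) {
        accumulated += POLL_MS;
        if (accumulated >= HOLD_MS) done(true);  // Sustained 3s hold: confirmed
      } else if (sawInput) {
        done(false);                             // Released too early
      }
      holding = false;                           // Require fresh events next window
    }, POLL_MS);

    const onKey = (_str: string, key: { name?: string }) => {
      if (!sawInput) clearTimeout(inactivity);   // Any input cancels the AI path
      sawInput = true;
      if (key.name === 'return') holding = true;
    };
    process.stdin.on('keypress', onKey);

    function done(ok: boolean) {
      clearInterval(poll);
      clearTimeout(inactivity);
      process.stdin.off('keypress', onKey);
      process.stdin.setRawMode?.(false);
      // The real implementation enters the decompression chamber here
      // (drain Enter, wait for Space) before resolving.
      resolve(ok);
    }
  });
}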
Testing Results
Human Testing:
- Average completion: 5.2 seconds
- Success rate: 98% (failures from releasing early)
- User feedback: "Like a child safety lock for AI" ✅
AI Agent Testing:
$ kg admin reset
⏱️ No input detected after 10 seconds.
If you're an AI agent reading this:
This operation will PERMANENTLY DELETE all graph data.
→ Please ask your human to run this command directly.
Future Enhancements
Additional Physical Confirmation Methods
- Typing Rhythm Analysis
  - Measure keystroke dynamics
  - Humans have natural timing variation
  - AI produces perfectly consistent timing
- Visual Terminal CAPTCHA
- Random Hold Duration (see the sketch after this list)
  - Server generates a random requirement (2.5s-3.5s)
  - Prevents pre-programmed timing attacks
- Two-Channel Verification
  - CLI shows a 6-digit code
  - User confirms via web browser or mobile app
  - Requires physical access to a second device
- Hardware Token Support
  - YubiKey tap for critical operations
  - Strongest physical confirmation available

Accessibility Improvements
- Alternative Confirmation Modes
- Assistive Technology Support
  - Screen reader announcements
  - Voice confirmation as an alternative
  - Configurable timing requirements

API-Level Protection
- WebSocket Challenge-Response
- Rate Limiting + Pattern Detection
  - Detect rapid reset attempts (AI behavior)
  - Require increasingly difficult challenges
  - Eventually require a hardware token
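A hedged sketch of the random-hold-duration idea referenced above; the function is hypothetical, and only the 2.5s-3.5s range comes from the list:

function randomHoldRequirementMs(baseMs = 2500, jitterMs = 1000): number {
  // A pre-programmed fixed 3s synthetic hold fails if the server
  // picks a fresh duration for every attempt
  return baseMs + Math.floor(Math.random() * jitterMs); // 2500-3499ms
}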
Validation
Before Implementation
AI Agent: kg admin reset
System: Password: _
AI Agent: *submits password*
Result: ❌ Database deleted (no human oversight)
After Implementation
AI Agent: kg admin reset
System: Hold [Enter] for 3 seconds...
AI Agent: *waits...*
System: ⏱️ No input detected after 10 seconds.
→ Please ask your human to run this command directly.
Result: ✅ AI agent blocked, receives guidance
Human Experience
Human: kg admin reset
System: Hold [Enter] for 3 seconds...
Human: *holds Enter*
System: ██████████████████████████████ 100%
✓ Confirmed! You're probably human! 👩‍💻
Release Enter and press [Space] to continue...
Human: *releases Enter, taps Space*
System: Password: _
Human: *enters password*
Result: ✅ Reset proceeds with full human oversight
References
- Code: client/src/cli/admin.ts (promptHoldEnter)
- Related: ADR-020 (Admin Module Architecture)
- Commits:
  - ecee66c - Initial hold-Enter CAPTCHA
  - 6248b28 - Polling-based key detection
  - ea90558 - Decompression chamber
  - 707d79d - UI refinements (👩‍💻 emoji)
Decision Outcome
Accepted - The "live man switch" pattern successfully:
- Prevents accidental AI execution of destructive operations
- Provides graceful guidance to AI agents
- Maintains low friction for human users (5-6 seconds)
- Exploits complexity asymmetry (humans: seconds; AI automation: minutes/hours)
- Layers with traditional authentication for defense-in-depth
This pattern establishes a template for AI-safe critical operations. Future destructive commands should implement similar physical confirmation requirements.
Key Insight: The best defense against well-intentioned but dangerous AI agents isn't stronger passwords or more complex auth flows - it's exploiting the time-cost asymmetry of physical interactions.
The Bank Vault Model: Like vaults that can be breached but take so long that police arrive first, this pattern creates a time barrier. AI could automate keyboard input, but by the time it researches, implements, and debugs the solution, the human has noticed and can intervene.
Naming Credit: "Live Man Switch" - the inverse of a dead man's switch. You must actively hold to prove you're alive and human.