How one engineer runs a 24/7 multi-agent AI stack on bare metal.
Opinionated. Dogfooded. Broken-and-fixed in production. Tested in service.
🦞 No fluff. No theory without implementation. Every guide documents what was actually deployed, how to verify it, and what broke along the way.
This is a working cookbook for one specific stack: a single-engineer setup that runs an always-on multi-agent AI orchestrator on bare-metal Linux, with a homelab behind it for self-hosting, security tooling, and knowledge management.
It is not a framework, not a product, not a tutorial series. It is a record of what is actually deployed, why each piece is shaped the way it is, and what broke along the way. Lift any single piece. Adopt the whole thing. Or use it as a counterexample. All three are valid.
The agent layer runs on OpenClaw, but the patterns generalize. Most of the guides below would apply with light adaptation to Hermes Agent, Claude Code's agent SDK, or any orchestrator that wraps a real LLM with real tools.
                 ┌─────────────────────────────────────┐
                 │   Bare metal Linux (single host)    │
                 │      Local LLM stack + agents       │
                 └────────────────┬────────────────────┘
                                  │
    ┌─────────────────────────────┼─────────────────────────────┐
    │                             │                             │
┌───▼──────┐               ┌──────▼──────┐                ┌─────▼─────┐
│ Homelab  │               │ Automation  │                │ Knowledge │
│ (LXC/VM) │               │ (cron/n8n)  │                │ systems   │
└──────────┘               └─────────────┘                └───────────┘
    │                             │                             │
┌───▼─────────┐         ┌─────────▼─────────┐         ┌─────────▼─────┐
│ Self-host:  │         │ Cron, hooks,      │         │ Memory, docs, │
│ media, NAS, │         │ sandbox shims,    │         │ search, sync  │
│ security    │         │ scheduled jobs    │         │ workflows     │
└─────────────┘         └───────────────────┘         └───────────────┘
The guides assume a specific provider mix. You can substitute, but this is the known-good baseline:
- Codex Pro ($200/mo) OAuth: main agent + coder. This is the happy path. One flat subscription covers orchestration, code generation, and most cron work. Codex OAuth slots cleanly into OpenClaw's primary-model path and has been the most stable surface across the 2026.4.x releases. Start here.
- Claude Opus via ACP: escalation only. Intel, design, architecture review, and academic work. Run it through the ACPX plugin, not as a direct OpenClaw provider.
- Ollama (free): embeddings, commit messages, triage. Local, fast, no round-trip.
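
The split above amounts to a small routing table. A minimal sketch, assuming hypothetical task labels and provider keys (none of these names come from OpenClaw's config; only the grouping is taken from the list):

```python
# Hypothetical task labels and provider keys, for illustration only.
# The grouping mirrors the baseline: Codex for day-to-day work,
# Opus via ACP for escalation, Ollama for cheap local jobs.
ROUTES = {
    "orchestration": "codex-oauth",
    "code-generation": "codex-oauth",
    "cron": "codex-oauth",
    "intel": "opus-via-acp",
    "design": "opus-via-acp",
    "architecture-review": "opus-via-acp",
    "academic": "opus-via-acp",
    "embeddings": "ollama",
    "commit-message": "ollama",
    "triage": "ollama",
}

# Assumption: anything unlisted falls back to the flat-rate happy path.
DEFAULT_PROVIDER = "codex-oauth"


def pick_provider(task_kind: str) -> str:
    """Return the provider key for a task label, defaulting to Codex."""
    return ROUTES.get(task_kind, DEFAULT_PROVIDER)
```

Keeping the fallback on the flat subscription means an unclassified task never silently lands on the escalation model.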
As of April 2026, pointing an OpenClaw agent at your Claude Max subscription OAuth has two problems that make it a non-starter:
- Extra usage charges. Anthropic started metering traffic that arrives through third-party harnesses against your subscription, and it shows up as additional usage on top of the normal Max caps. You can burn through quota far faster than the same work would cost through the first-party Claude client.
- System-prompt-level blocking. Claude detects that it's running inside a non-Anthropic harness and injects guidance that degrades behavior (refusals, hedging, dropping tool calls). Prompt-level workarounds don't stick.
The only sensible path to Opus from OpenClaw is ACP. The ACPX plugin launches the official Claude Code CLI as a subprocess. Anthropic's own client handles the OAuth handshake, so the usage accounting and system-prompt behavior stay normal. OpenClaw connects to it over the Agent Client Protocol and treats the session as an escalation sub-agent.
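
The subprocess-plus-protocol shape described above can be sketched as follows. This is not the ACPX plugin's code: the child here is a stand-in echo process rather than the Claude Code CLI, the framing is assumed to be newline-delimited JSON-RPC 2.0 over stdio, and the method name `example/ping` is hypothetical.

```python
import json
import subprocess
import sys

# Stand-in child: answers every JSON-RPC request by echoing its method name.
# In the real stack the child would be the agent CLI launched by the plugin.
ECHO_CHILD = (
    "import sys, json\n"
    "for line in sys.stdin:\n"
    "    req = json.loads(line)\n"
    "    resp = {'jsonrpc': '2.0', 'id': req['id'],\n"
    "            'result': {'echo': req['method']}}\n"
    "    sys.stdout.write(json.dumps(resp) + '\\n')\n"
    "    sys.stdout.flush()\n"
)


def call(proc, request_id, method, params):
    """Send one request line to the child and read one response line back."""
    request = {"jsonrpc": "2.0", "id": request_id,
               "method": method, "params": params}
    proc.stdin.write(json.dumps(request) + "\n")
    proc.stdin.flush()
    return json.loads(proc.stdout.readline())


def demo():
    """Spawn the stand-in child, make one call, and shut it down cleanly."""
    proc = subprocess.Popen(
        [sys.executable, "-c", ECHO_CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )
    try:
        return call(proc, 1, "example/ping", {})
    finally:
        proc.stdin.close()  # EOF ends the child's read loop
        proc.wait(timeout=5)
```

The point of the shape is that the parent never touches credentials: it only exchanges messages with a child that owns its own login.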
Full migration runbook in claude-cli → ACP Migration.
There is nothing to install. This is a collection of standalone guides. Pick the one that solves a problem you have right now:
- automation/cron-patterns.md: decide which layer (systemd, agent cron, n8n) each scheduled task in your stack actually belongs in
- ai-stack/multi-model-orchestration.md: wire one orchestrator across many models with the right model per task
- security/linux-hardening.md: UFW, SSH hardening, fail2ban, and defense in depth for the host
- infrastructure/backup-recovery.md: restic to NAS + cloud, twice daily, with snapshot mounts
Read these in order:
- knowledge/memory-token-optimization.md: the three-tier layout, local embeddings, and why the index stays tiny
- knowledge/memory-architecture.md: how cards decay, when to verify memory against live state, and how stale claims get replaced
- ai-stack/self-improving-agents.md: the memory sweep workflow that promotes recent sessions into durable knowledge
- knowledge/claude-code-memory-handoffs.md: cross-machine handoffs and the ingest path back into canonical memory
- automation/openclaw-cron-deep-dive.md: scheduling patterns for sweep jobs, decay scans, and quiet-hour-safe maintenance
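
The idea running through these guides, memory as point-in-time claims that decay and get re-verified, can be illustrated with a small sketch. The `MemoryCard` type, the flat 30-day window, and the function names are assumptions made for illustration; the guides define the actual rules.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class MemoryCard:
    claim: str
    observed_at: datetime  # when the claim was last checked against live state


# Assumption: one flat age limit. A real setup may vary this per card type.
MAX_AGE = timedelta(days=30)


def needs_verification(card: MemoryCard, now: datetime,
                       max_age: timedelta = MAX_AGE) -> bool:
    """A card older than max_age is treated as stale until re-checked."""
    return now - card.observed_at > max_age


def replace_claim(card: MemoryCard, new_claim: str, now: datetime) -> MemoryCard:
    """Replace a stale claim with a freshly observed one instead of editing in place."""
    return MemoryCard(claim=new_claim, observed_at=now)
```

Storing the observation time next to the claim is what lets a scheduled scan decide which cards to re-verify without reading their content.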
| Guide | Description | Platform |
|---|---|---|
| Multi-Model Orchestration | Run GPT 5.5, ACP Opus, browser-LLM skills, and Ollama in one setup with the right model per task | Any |
| claude-cli → ACP Migration | Move Opus off the main-agent slot after Anthropic's April 2026 subscription-OAuth block | Anthropic |
| Claude Code via ACP | Running Claude Code as an ACP-driven escalation agent after Anthropic's April 2026 harness block | Any |
| Sub-Agent Patterns | Spawn patterns, model assignment, ACP escalation, error handling, and the wrapper script | Any |
| GPT 5.5 Orchestration | Tool-call narration guard, strict-agentic detection gaps, silent-tool-loop triage, action-verb tuning | Any |
| Self-Improving Agents | Correction capture, behavioral-guard plugins (tool-narration-guard, tokenjuice), memory sweeps, and promotion rules | Any |
| Session Management | Why single-chat apps bottleneck your agent, Discord channel layouts, cron isolation, and the hybrid approach | Any |
| Skills Development | Write custom skills, structure for discoverability, real-world examples, and skill management | Any |
| Prompt Caching | Cache hygiene across Anthropic and OpenAI, so you avoid silent cost/quota leaks | Any |
| Compaction & Context Tuning | Compaction, memory flush, context pruning, and session search for long-running agents | Any |
| Guide | Description | Platform |
|---|---|---|
| Cron Patterns | The three-layer cron stack: systemd timers vs agent cron vs n8n schedule triggers, where each scheduled task belongs | Any |
| OpenClaw Cron Deep-Dive | Heartbeat batching, thinking-budget aliases, explicit delivery routing, quiet hours, and real-incident gotchas | OpenClaw |
| Multi-Channel Setup | Discord, Telegram, Signal routing, session isolation, ACP threads, and access control | Any |
| Hooks | Three-layer hook model: boundary (git pre-push, outbound-scrub CLIs), tool-call (PreToolUse/PostToolUse, OpenClaw before_tool_call/tool_result_persist), lifecycle (SessionStart, before_prompt_build, message_sending) | Any |
| n8n Patterns | Three interfaces (n8n-ops-mcp, REST API, direct sqlite), Code node sandbox + task-runner constant-folding trap, failure-classifier topology | n8n |
| Guide | Description | Platform |
|---|---|---|
| Backup & Recovery | Restic to NAS + Google Drive, twice-daily schedule, snapshot mounts, and disaster recovery | Any |
| Upgrade Hygiene | Surviving openclaw update: systemd regeneration, dist patches, OAuth sync, schema drift | Any |
| Guide | Description | Platform |
|---|---|---|
| Memory & Token Optimization | Three-tier memory architecture, local embedding search, memory sweep cadence, and 50-100x token reduction | Any |
| Claude Code Memory Handoffs | Cross-machine sync format and scheduled ingest path that keeps OpenClaw the canonical memory owner | Any |
| Memory Architecture | Operating model: memory as point-in-time claims, trust hierarchy, write/verify/decay loops, and stale-card handling | Any |
| Obsidian Sync Without Conflict Roulette | One canonical vault, one sync layer, and strict writer rules for bidirectional sync that stays boring | Any |
| Session JSONL as Memory Source, Not Noise | Search transcript logs for evidence, then promote only durable facts into memory | OpenClaw |
| Guide | Description | Platform |
|---|---|---|
| Linux Hardening | UFW, SSH hardening, fail2ban, service binding, and defense-in-depth for an OpenClaw host | Ubuntu 24.04 |
| WSL2 Hardening | Windows Firewall, RDP/SSH/SMB lockdown, port proxy hygiene, sleep prevention, and dual-OS defense | Windows 11 + WSL2 |
| Agent Security | API gateway isolation, RBAC, sandboxing, circuit breakers, and a real post-mortem from a sub-agent nuking a database | Any |
The physical layer: choosing the box, partitioning the disk, deciding what the host OS owns vs what gets virtualized. See hardware/.
Index of MCP servers, dashboards, and helpers shipped from this stack. See tools/.
Why this stack is shaped the way it is. What I won't do. See philosophy/.
Drop-in artifacts you can lift without adopting the whole thing. See templates/.
| Template | Used by |
|---|---|
| templates/cron/ | systemd timer, agent cron, n8n schedule trigger skeletons, paired with automation/cron-patterns.md |
| templates/hooks/ | git pre-push, Claude Code PostToolUse, OpenClaw sync plugin skeletons, paired with automation/hooks.md |
Engineers running an always-on AI agent on real infrastructure: bare metal, VPS, homelab, or enterprise. If you have an agent that has access to your systems, you need to lock it down properly. These guides assume you're comfortable with Linux administration and want actionable steps, not vague overviews.
🦞 Built by an engineer who runs this stack 24/7 on bare metal and broke everything at least once so you don't have to.
Every guide follows the same skeleton. See CONTRIBUTING.md for the full template:
- What this is and who it's for
- Why this way: tradeoffs vs the obvious alternatives
- Prerequisites
- Before / After
- Implementation with real commands
- Verification commands you can run right now
- Gotchas from real deployments
- Templates + Related cross-links
Reference implementation: automation/cron-patterns.md.
PRs welcome. See CONTRIBUTING.md. Two non-obvious rules:
- No personal hostnames or IPs in committed text. Use generic terms.
- Every guide ends with a Gotchas section. If nothing broke, the guide is incomplete.
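
Both rules are mechanical enough to check locally. A minimal sketch, assuming guides are markdown with `#`-style headings; this is not content-guard, only catches IPv4 literals (not hostnames), and the function names are hypothetical:

```python
import ipaddress
import re

# Candidate dotted quads; each one is validated with ipaddress below.
IPV4_CANDIDATE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
# Any heading level whose title starts with "Gotchas".
GOTCHAS_HEADING = re.compile(r"^#{1,6}\s+Gotchas\b", re.MULTILINE | re.IGNORECASE)


def find_ip_literals(text: str) -> list[str]:
    """Return every substring that parses as a valid IPv4 address."""
    hits = []
    for match in IPV4_CANDIDATE.finditer(text):
        try:
            ipaddress.IPv4Address(match.group())
        except ValueError:
            continue  # looked like a dotted quad but is not a valid address
        hits.append(match.group())
    return hits


def check_guide(text: str) -> list[str]:
    """Return a list of rule violations; an empty list means the guide passes."""
    problems = []
    ips = find_ip_literals(text)
    if ips:
        problems.append("IP literals found: " + ", ".join(ips))
    if not GOTCHAS_HEADING.search(text):
        problems.append("missing Gotchas section")
    return problems
```

Running `check_guide` over each changed guide before opening a PR gives a quick first pass.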
A pre-push hook ships at hooks/pre-push that runs content-guard over the working tree to catch leaks before they hit the remote. Activate after cloning:
    git config core.hooksPath hooks

- OpenClaw: the AI agent framework this stack runs on
- content-guard: the policy-driven scanner used by the pre-push hook
- ops-deck-oss: self-hosted ops dashboard
- n8n-ops-mcp, jellyfin-mcp, mcporter: MCPs from this stack
- openclaw-overlay: HUD overlay for session monitoring
- usage-tracker: token usage and cost analytics
- Code, scripts, and templates: MIT
- Narrative content (guides, manifestos, prose): CC BY-NC-ND 4.0 🦞
