feat(deploy): version /opt/hermes/* under deploy/eligia-vps/ #3
Before this commit, three files lived only on the eligia VPS at
/opt/hermes/* and would have been lost in disaster recovery:
- /opt/hermes/wamba_build/Dockerfile.eligia-overlay
- /opt/hermes/docker-compose.yml
- /opt/hermes/data/config.yaml
After this commit they are versioned in the fork under deploy/eligia-vps/.
The Dockerfile.eligia-overlay was already built FROM the fork source files
(gateway/, agent/, plugins/, etc.) via a snapshot at /opt/hermes/wamba_build/
— moving it into the fork means the build context IS the fork checkout
directly, eliminating the snapshot.
Also adds a README.md that:
- Documents the wiring (systemd → sops → docker compose → container).
- Lists the build + deploy commands.
- Enumerates the expected deltas between repo state and live filesystem
(decrypted env, mounted data dirs, container runtime state, etc.).
- Provides a step-by-step disaster-recovery procedure for rebuilding
from a blank VPS.
- Calls out follow-ups (versioning /etc/systemd/system/hermes.service,
migrating prod from /opt/hermes/wamba_build/ to /opt/hermes/source/).
The docker-compose.yml comments are also updated to point at the new
build path; no behaviour change. All secrets are still SOPS-encrypted in
Wizarck/eligia-core secrets.env and consumed via env var expansion at
container start time — nothing sensitive lands in this directory.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
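
The claim that nothing sensitive lands in this directory could be checked mechanically. The following is a minimal sketch, not part of this PR: it assumes sensitive keys can be recognised by name (the `SENSITIVE` pattern is a guess) and that a safe value is either empty or a bare `${VAR}` placeholder. The function name `find_inline_secrets` is hypothetical, and the line-based parsing is a simplification of real YAML.

```python
import re
from typing import List

# Assumption: sensitive keys are recognisable by name. Not taken from the PR.
SENSITIVE = re.compile(r"(KEY|TOKEN|SECRET|PASSWORD)", re.IGNORECASE)
# A value is treated as safe when it is empty or a bare ${VAR} placeholder.
PLACEHOLDER = re.compile(r"^\$\{[A-Za-z_][A-Za-z0-9_]*\}$")
ENTRY = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\s*[:=]\s*(.*)$")


def find_inline_secrets(text: str) -> List[str]:
    """Return the names of sensitive-looking keys whose value is a literal."""
    offenders = []
    for raw in text.splitlines():
        # Drop comments and YAML list markers; this is not a full YAML parser.
        line = raw.split("#", 1)[0].strip().lstrip("- ").strip()
        match = ENTRY.match(line)
        if not match:
            continue
        key = match.group(1)
        value = match.group(2).strip().strip("'\"")
        if SENSITIVE.search(key) and value and not PLACEHOLDER.match(value):
            offenders.append(key)
    return offenders
```

Running it over `docker-compose.yml` and `config.yaml` before committing would flag any key that carries a literal value instead of a placeholder or an empty string.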
📝 Walkthrough
This PR introduces a complete production deployment stack for the Hermes agent on ELIGIA's VPS. It includes a Dockerfile overlay that extends the upstream Hermes image with ELIGIA-specific modules and Langfuse observability, Docker Compose configuration for service orchestration, comprehensive agent runtime settings, and detailed deployment documentation with operation and recovery runbooks.
Changes
ELIGIA Hermes VPS Deployment
Sequence Diagrams
sequenceDiagram
participant Systemd as systemd<br/>(hermes.service)
participant SOPS as SOPS Secrets
participant Compose as Docker Compose
participant Container as Hermes Container<br/>(gateway run)
participant Langfuse as Langfuse<br/>(observability)
Systemd->>SOPS: Decrypt secrets
Systemd->>Compose: docker compose up
Compose->>Container: Start with env vars<br/>+ mounted config/data
Container->>Container: Load config.yaml from<br/>/opt/data/config.yaml
Container->>Langfuse: Send traces & costs<br/>(via plugin)
Note over Container: Healthcheck probes<br/>127.0.0.1:8642 (gateway)
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Actionable comments posted: 3
🧹 Nitpick comments (1)
deploy/eligia-vps/Dockerfile.eligia-overlay (1)
47-47: ⚡ Quick win
Pin `langfuse` to a bounded version to avoid breaking API changes.
Using `langfuse>=3.0` allows installation of v4.x, which has breaking changes from v3 (released March 2026). The v4 SDK restructures the observation-centric data model, removes `update_current_trace`, changes OpenTelemetry span export behavior, remaps API namespaces, and requires Pydantic v2. If the application was built against v3 APIs, pulling v4 will cause runtime failures. Pin to `langfuse>=3,<4` to maintain reproducible builds and API compatibility.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deploy/eligia-vps/Dockerfile.eligia-overlay` at line 47, The Dockerfile RUN line installs langfuse using an open-ended spec ("langfuse>=3.0") which may pull v4 and break v3-based code; update the package spec in the RUN pip install invocation to pin a bounded range (for example "langfuse>=3,<4") so builds remain reproducible and compatible with existing v3 APIs, then rebuild the image and verify imports that rely on v3 behavior (e.g., any code calling update_current_trace or relying on v3 OpenTelemetry export behavior).
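
To make the difference between the two specifiers concrete, here is a small sketch of how a lower-bound-only range compares with a bounded one. It is a simplified model of version matching (plain dotted integers, no pre-release handling) and not pip's actual resolver, which follows PEP 440. The helper names and the version numbers used in the usage note are illustrative assumptions.

```python
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '3.2' into (3, 2, 0); only plain dotted integers are handled."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(version: str, lower: str, upper: Optional[str] = None) -> bool:
    """True when lower <= version, and version < upper if upper is given."""
    candidate = parse_version(version)
    if candidate < parse_version(lower):
        return False
    return upper is None or candidate < parse_version(upper)
```

With these helpers, `satisfies("4.0.0", "3.0")` is true, which is how an open-ended `>=3.0` lets a v4 release in, while `satisfies("4.0.0", "3", "4")` is false and `satisfies("3.5.2", "3", "4")` is true.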
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 43e41186-a7b5-43dc-90ca-62ed3229198d
📒 Files selected for processing (4)
- deploy/eligia-vps/Dockerfile.eligia-overlay
- deploy/eligia-vps/README.md
- deploy/eligia-vps/config.yaml
- deploy/eligia-vps/docker-compose.yml
  tool_progress: all
  background_process_notifications: all
privacy:
  redact_pii: false
Enable PII redaction for production traffic.
privacy.redact_pii: false risks leaking user identifiers/content into logs and observability systems.
Proposed change
 privacy:
-  redact_pii: false
+  redact_pii: true
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deploy/eligia-vps/config.yaml` at line 169, The config currently sets
redact_pii: false which can leak user-identifying data; change the setting to
enable PII redaction by setting redact_pii to true (i.e., update the
privacy.redact_pii configuration key in deploy/eligia-vps/config.yaml) so
production logs/observability redact identifiers and sensitive content; ensure
any related logging/observability components read this flag so redaction is
active in production.
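
For readers unfamiliar with what a redaction flag typically guards, here is a rough sketch of log-line masking. How Hermes actually implements `redact_pii` is not shown in this PR; the patterns, the placeholder strings, and the `redact` function below are all hypothetical and deliberately simplistic.

```python
import re

# Hypothetical patterns; real PII detection is considerably broader.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str, enabled: bool) -> str:
    """Mask e-mail addresses and phone-like numbers when the flag is on."""
    if not enabled:
        return text
    text = EMAIL.sub("[email]", text)
    return PHONE.sub("[phone]", text)
```

With the flag off the text passes through untouched, which is the behaviour the review comment is warning about for production logs and traces.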
  tirith_enabled: true
  tirith_path: tirith
  tirith_timeout: 5
  tirith_fail_open: true
Set Tirith to fail-closed in production.
With tirith_fail_open: true, failures in Tirith bypass protections instead of blocking.
Proposed change
 security:
   redact_secrets: true
   tirith_enabled: true
   tirith_path: tirith
   tirith_timeout: 5
-  tirith_fail_open: true
+  tirith_fail_open: false
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deploy/eligia-vps/config.yaml` at line 253, The configuration key
tirith_fail_open is currently set to true which makes Tirith bypass protections
on failure; change tirith_fail_open to false so Tirith operates fail-closed in
production. Locate the tirith_fail_open setting in the config (search for
tirith_fail_open) and update its value to false, then validate the config syntax
and restart/redeploy the service so the new fail-closed behavior takes effect.
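
The fail-open versus fail-closed distinction comes down to what happens when the checker itself breaks. The sketch below is a generic illustration, not Tirith's interface: `scan` is a hypothetical callable that returns whether a command is safe and may raise on a crash or timeout.

```python
from typing import Callable


def command_allowed(scan: Callable[[str], bool], command: str, fail_open: bool) -> bool:
    """Return the scanner's verdict, or the fail_open policy if it errors."""
    try:
        return bool(scan(command))
    except Exception:
        # The scanner crashed, timed out, or is missing.
        # fail_open=True lets the command through; False blocks it.
        return fail_open


def broken_scanner(command: str) -> bool:
    """Stand-in for a scanner that never answers in time."""
    raise TimeoutError("scanner did not answer in time")
```

Here `command_allowed(broken_scanner, "rm -rf /", fail_open=True)` returns `True`, so the command runs unchecked, while the same call with `fail_open=False` returns `False` and the command is blocked.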
```
systemd unit: hermes.service
    │
    ▼
sops exec-env /opt/eligia/eligia-core/secrets/secrets.env
    │   ↑ decrypts the SOPS-encrypted secrets file from the
    │     eligia-core repo and injects all vars into the env
    ▼
docker compose up -d --force-recreate
    │   ↑ reads /opt/hermes/docker-compose.yml (sibling of this README)
    │     which references the env vars (`${ANTHROPIC_API_KEY_HERMES}`,
    │     `${LANGFUSE_PUBLIC_KEY}`, ...) injected above
    ▼
Container `hermes` running image `eligia/hermes-agent:wamba`
    │   ↑ built once from this Dockerfile.eligia-overlay; rebuild
    │     whenever this directory changes
    ▼
Hermes loads /opt/data/config.yaml (mounted from /opt/hermes/data/config.yaml)
    └──► plugins.enabled: [observability/langfuse]
          └──► writes traces to Langfuse Cloud with
                metadata.application = "hermes-bot"
                metadata.consumer    = "HERMES"
```
Add language identifiers to fenced code blocks.
Two fences are missing language tags, which fails markdownlint (MD040).
Proposed change
-```
+```text
systemd unit: hermes.service
...
-```
+```
...
-```
+```text
langfuse_client: Langfuse
application: hermes-bot
consumer: HERMES
-```
+```
Also applies to: 98-102
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 18-18: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deploy/eligia-vps/README.md` around lines 18 - 40, Add language identifiers
to the two fenced code blocks that currently lack them: change the opening
fences for the block starting "systemd unit: hermes.service" and the block
containing "langfuse_client: Langfuse / application: hermes-bot / consumer:
HERMES" to use ```text (or another appropriate language tag) so markdownlint
MD040 passes; ensure the closing fences remain ``` and update the other similar
block mentioned in the comment as well.
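
A check like MD040 can be approximated in a few lines, which may help when verifying the fix locally without markdownlint. This is a simplified sketch: it only considers three-backtick fences (no tildes, no longer fences), and the function name `untagged_fences` is made up for illustration.

```python
from typing import List

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def untagged_fences(markdown: str) -> List[int]:
    """Return 1-based line numbers of opening fences with no language tag."""
    missing = []
    inside = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        if inside:
            inside = False  # this fence closes the current block
        else:
            inside = True  # this fence opens a new block
            if stripped == FENCE:
                missing.append(number)
    return missing
```

Running it over the README text should report the line numbers of the two blocks named in the comment until their opening fences gain a tag such as `text`.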
…rch#25071)

* tui: make URLs clickable + hover-highlight in any terminal

Problem
-------
URLs printed by `hermes --tui` were not clickable in basic macOS Terminal.app. Cmd+click did nothing and the cursor didn't change shape (like nothing was detected), even though arrow buttons and other Box onClick handlers worked fine.

Root cause
----------
Two layers of dead plumbing:

1. `<Link>` only emitted the underlying `<ink-link>` (which carries the hyperlink metadata into the screen buffer) when `supportsHyperlinks()` said yes. On Apple_Terminal that's false, so the per-cell hyperlink field stayed empty, so `Ink.getHyperlinkAt()` had nothing to return on click. The visible underline was just decorative.
2. `Ink.openHyperlink()` calls `this.onHyperlinkClick?.(url)`, but `onHyperlinkClick` was never assigned anywhere in the codebase. The click pipeline (`App.tsx → onOpenHyperlink → Ink.openHyperlink`) ran but bailed silently on the optional chain.

Bonus discovery: even when wired up, there was no hover affordance. Terminal apps can't change the system mouse cursor, so users had no visual signal that a cell was clickable. Arrow buttons in the chrome worked because they had explicit `<Box onClick>` styling; inline link URLs didn't.

Fix
---
- `Link.tsx`: always emit `<ink-link>` regardless of terminal capability. The renderer's `wrapWithOsc8Link` already gates the actual OSC 8 escape on `supportsHyperlinks()` further down, so terminals that don't understand OSC 8 still don't see the escape, but the screen-buffer metadata (which the click dispatcher reads) is now populated everywhere.
- `ink.tsx + root.ts`: add `onHyperlinkClick?: (url: string) => void` to `Options` / `RenderOptions`, wire it to the existing `Ink.onHyperlinkClick` field in the constructor.
- `src/lib/openExternalUrl.ts`: small platform-aware opener using `child_process.spawn` with arg-array (no shell). It is http(s) only and rejects `file:`, `javascript:`, `data:`, etc., so a hostile model can't trigger arbitrary local handlers via `<Link url="file:///...">`. Detached + stdio ignore so closing the TUI doesn't kill the browser and Chrome stderr doesn't leak into the alt screen.
- `entry.tsx`: pass `onHyperlinkClick: openExternalUrl` to `ink.render`.
- `hyperlinkHover.ts` + Ink hover wiring: track the URL under the pointer in `Ink.hoveredHyperlink`, update it from `dispatchHover`, and inverse-highlight every cell of the matching link in the render-pass overlay (same pattern as `applySearchHighlight`). This is the cursor-hover affordance for clickable links: terminals don't expose cursor shape, so we light up the link itself.
- `types/hermes-ink.d.ts`: add `onHyperlinkClick` to the `RenderOptions` shim so consumers (`entry.tsx`) type-check against the new option.

Tests
-----
- `src/lib/openExternalUrl.test.ts` (15 cases): http(s) accepted; file/js/data/mailto/ftp/ssh rejected; macOS open(1), Windows cmd.exe start with empty title slot, Linux xdg-open dispatch; shell-metacharacter URLs pass through unmolested as a single argv element; synchronous spawn failure returns false.

Verified empirically in Apple Terminal 455.1 (macOS 15.7.3): clicking a URL opens in default browser, hovering inverts the link cells, and moving away clears the highlight. Full TUI suite: 713 passing, 0 type errors.

Reverts
-------
The earlier attempt that version-gated Apple_Terminal in `supports-hyperlinks.ts` was based on a wrong assumption: Terminal.app silently strips OSC 8 sequences but does not render them as clickable hyperlinks. Reverted to the original allowlist.

* tui: address Copilot review - explorer.exe on win32 + comment fixes

- openExternalUrl: switch win32 from `cmd.exe /c start` to `explorer.exe`. cmd.exe's `start` builtin reparses the URL through cmd's tokenizer, so `&`, `|`, `^`, `<`, `>` either split the command or get reinterpreted, breaking both the protocol-allowlist safety story AND plain http(s) URLs with `&` in query strings. `explorer.exe <url>` invokes the registered protocol handler directly with no shell.
- openExternalUrl.test.ts: rename the win32 test to reflect the new contract and add two regression tests (one with `&|^<>` metachars, one with the common analytics-URL `&` query-param pattern), both pinned to single-argv-element delivery via explorer.exe.
- Link.tsx: fix misleading comment. OSC 8 escapes are emitted unconditionally by the renderer (`wrapWithOsc8Link` in render-node-to-output.ts, `oscLink` in log-update.ts). Non-supporting terminals silently strip the sequence, which is why hover/click affordance has to come from the in-process overlay rather than the terminal's own link rendering.

Verified: 715/715 tests pass, type-check + build clean.

* tui: address Copilot review #2 - async spawn errors + hover scope + docs

1. openExternalUrl: attach a no-op `'error'` listener on the spawned child BEFORE unref(). spawn() returns a ChildProcess synchronously even when the binary is missing (ENOENT on xdg-open / explorer.exe), unreachable, or otherwise unusable; the failure surfaces later as an 'error' event. An unhandled 'error' on an EventEmitter crashes Node, which would tear down the whole TUI. The listener is a deliberate no-op: we already returned `true` synchronously and the user just doesn't see the browser pop.
2. openExternalUrl.test.ts: add a regression test using a real EventEmitter to simulate the async-error path. Pins both the listener-attached contract and the "doesn't throw on emit" behavior. Was 17/17, now 18/18.
3. ink.tsx dispatchHover: bypass `getHyperlinkAt()` and read `cellAt(...).hyperlink` directly. `getHyperlinkAt` falls back to `findPlainTextUrlAt` for cells without an OSC 8 hyperlink, but the render-pass overlay (`applyHyperlinkHoverHighlight`) only matches on `cell.hyperlink === hoveredUrl`, so plain-text URLs would burn re-renders without ever producing the highlight. Hover is now a strictly 1:1 fit for what the overlay can paint. Plain-text URLs still get the click action via the existing dispatch path.
4. root.ts + ink.tsx doc comments: replace the misleading "typically `open` / `xdg-open` / `start` shell" wording with the actual safe recipe: argv-array spawn into `open` / `xdg-open` / `explorer.exe`, with an explicit warning that `cmd.exe /c start` reparses the URL through cmd's tokenizer and is unsafe + breaks `&`-query URLs.

Verified: 716/716 tests pass, type-check + build clean.

* tui: address Copilot review #3 - hover damage, alt-screen cleanup, opener allowlist

1. ink.tsx onRender: stop folding steady-state hover into hlActive. hlActive forces a full-screen damage diff so previous-frame inverted cells get re-emitted when the highlight set changes. The transition IS the trigger: enter / leave / change-to-other-link. While the pointer just sits on a link the painted cells don't change and the per-cell diff handles the no-op. Folding the steady state in would burn a full-screen diff on every frame. Added a lastRenderedHoveredHyperlink tracker and gate the hlActive bump on `hovered !== lastRendered`.
2. ink.tsx setAltScreenActive: clear hoveredHyperlink (and the tracker) when toggling alt-screen state. Hover dispatch is alt-screen-gated, so once we leave there's no path to clear it. Without this, remounting <AlternateScreen> would paint a phantom hover from the previous session until the next mouse-move arrived.
3. openExternalUrl.ts openCommand: allowlist linux + the BSD family for xdg-open and return null for everything else (aix, sunos, cygwin, haiku, etc.). Previously the default-fallback always returned xdg-open, which made the caller's `if (!command) return false` dead and yielded a misleading `true` on platforms that probably don't have xdg-open. New tests cover the null path AND the openExternalUrl-returns-false-without-spawning behavior.

Verified: 718/718 tests pass, type-check + build clean.

* tui: address Copilot review #4 - doc comment accuracy

1. openExternalUrl return-value doc: now lists all three false paths (URL rejected / no opener for platform / synchronous spawn throw) plus a note that async 'error' events still return true because the spawn was attempted.
2. ink.tsx onHyperlinkClick field doc: clarifies the callback receives either an OSC 8 hyperlink OR a plain-text URL detected by findPlainTextUrlAt; App.tsx routes both into the same callback.
3. hyperlinkHover applyHyperlinkHoverHighlight doc: drops the misleading 'caller forces full-frame damage' promise. Caller decides; for hover the current caller only forces full damage on transitions.

No behavior change. 718/718 tests pass.

* tui: address Copilot review #5 - lint fixes

1. ink.tsx: reorder `./hyperlinkHover.js` import before `./screen.js` to satisfy perfectionist/sort-imports.
2. Link.tsx: drop unused `fallback` parameter destructuring + the trailing `void (null as ...)` dead-statement (would trip no-unused-expressions). Kept `fallback?: ReactNode` on the Props interface as a documented compat shim so existing call sites still compile, with a comment explaining why it's no longer wired up.
3. openExternalUrl.test.ts: replace `typeof import('node:child_process').spawn` inline annotations (forbidden by @typescript-eslint/consistent-type-imports) with a `SpawnLike` type alias backed by a real `import type { spawn as SpawnFn }`.

No behavior change. 718/718 tests pass, type-check clean, lint clean on all modified files.
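
The opener rules described in the commit message above (http(s)-only scheme allowlist, per-platform command, single argv element, no shell) can be summarised in a short sketch. The actual change is TypeScript in `openExternalUrl.ts`; what follows is a Python transliteration of the idea for illustration only. The names `open_command` and `build_argv` are hypothetical, and platform strings follow Node's `process.platform` values as used in the commit text.

```python
from typing import List, Optional
from urllib.parse import urlparse

ALLOWED_SCHEMES = {"http", "https"}


def open_command(platform: str) -> Optional[str]:
    """Pick the opener binary for a Node-style platform name, or None."""
    if platform == "darwin":
        return "open"
    if platform == "win32":
        return "explorer.exe"
    if platform == "linux" or platform.endswith("bsd"):
        return "xdg-open"
    return None  # e.g. aix, sunos, cygwin, haiku


def build_argv(url: str, platform: str) -> Optional[List[str]]:
    """Return [command, url] for an allowed URL, or None if it is rejected."""
    if urlparse(url).scheme.lower() not in ALLOWED_SCHEMES:
        return None
    command = open_command(platform)
    if command is None:
        return None
    # The URL stays one argv element, so characters like & are never
    # interpreted by a shell.
    return [command, url]
```

The returned list is meant to be handed to a process-spawning call without a shell, which is what keeps query strings containing `&` intact and keeps `file:` or `javascript:` URLs from reaching a local handler.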
Summary
Closes the disaster-recovery gap flagged after the T6 deploy: three files were living only on the eligia VPS at `/opt/hermes/*` and would have been lost on a fresh-VPS rebuild. This PR brings them under git in `deploy/eligia-vps/` so the fork is now the full source of truth for the Hermes deployment.

What lands here
- `/opt/hermes/wamba_build/Dockerfile.eligia-overlay` → `deploy/eligia-vps/Dockerfile.eligia-overlay`
- `/opt/hermes/docker-compose.yml` → `deploy/eligia-vps/docker-compose.yml`
- `/opt/hermes/data/config.yaml` → `deploy/eligia-vps/config.yaml`
- `deploy/eligia-vps/README.md` (new)

Build-context simplification
The Dockerfile.eligia-overlay was already `COPY`-ing from paths that exist as the fork's actual source (`gateway/`, `agent/`, `plugins/`). Before this PR, those files lived as a hand-managed snapshot at `/opt/hermes/wamba_build/`; with this PR the build context IS the fork checkout directly, so there is no more snapshot drift between fork main and the VPS.

Security
No secrets land in this directory. All sensitive values (`ANTHROPIC_API_KEY_HERMES`, `LANGFUSE_*`, `TELEGRAM_BOT_TOKEN`, `WA_ACCESS_TOKEN`, ...) are still SOPS-encrypted in `Wizarck/eligia-core/secrets/secrets.env` and injected via `sops exec-env` in the `hermes.service` systemd unit. The compose file uses `${VAR}` placeholders exclusively. `config.yaml` has `api_key: ''` empty strings.

Runbook highlights (in deploy/eligia-vps/README.md)
docker build … && systemctl restart hermes).

Follow-ups (NOT in this PR)
- Migrate prod from the `/opt/hermes/wamba_build/` snapshot → `/opt/hermes/source/` fork checkout. Today the live VPS still uses the snapshot.
- Version `/etc/systemd/system/hermes.service` (probably as `deploy/eligia-vps/hermes.service`).

Test plan
`test/nix-ubuntu` failures remain unchanged (not my concern; established in PR chore(whatsapp-mcp): remove HITL regex receive-side (superseded by waba-mcp payload routing) #1 + feat(langfuse): inject application=hermes-bot + consumer=HERMES into traces #2 as baseline).

Generated with Claude Code.