Skip to content

feat(deploy): hermes.service systemd unit + migrate-from-wamba-build.sh#5

Merged
Wizarck merged 1 commit into
mainfrom
feat/deploy-systemd-and-migration
May 13, 2026
Merged

feat(deploy): hermes.service systemd unit + migrate-from-wamba-build.sh#5
Wizarck merged 1 commit into
mainfrom
feat/deploy-systemd-and-migration

Conversation

@Wizarck

@Wizarck Wizarck commented May 13, 2026

Copy link
Copy Markdown
Owner

Summary

Two more pieces that lived only on the VPS now live in the fork:

File Purpose
deploy/eligia-vps/hermes.service Systemd unit drop-in for /etc/systemd/system/hermes.service. SOPS decrypt + docker-compose up.
deploy/eligia-vps/migrate-from-wamba-build.sh One-shot, idempotent script that migrates the VPS from the legacy /opt/hermes/wamba_build/ snapshot to a git checkout at /opt/hermes/source/. Backs up old files, symlinks compose/config from the repo, installs the systemd unit, rebuilds the image, restarts hermes, polls healthcheck.

After this PR + one run of the script

Future deploys become:

git -C /opt/hermes/source pull &&   docker build -f /opt/hermes/source/deploy/eligia-vps/Dockerfile.eligia-overlay                -t eligia/hermes-agent:wamba                /opt/hermes/source &&   systemctl restart hermes

— no more scp'ing or hand-editing.

README

Updated to:

  • Remove the "TODO: migrate to /opt/hermes/source/" follow-up section (now resolved by the script).
  • Add a curl|bash one-liner for running the migration.
  • List hermes.service in the contents table.
  • Note that the legacy snapshot dir still lives on the VPS until the script flags it for manual deletion (after verifying the new flow).

Test plan

  • Script is executable (100755).
  • No secrets in any of the new files.
  • CI passes (expected: same baseline as previous deploy PRs).
  • To exercise post-merge: run the script on the VPS and verify healthcheck stays green.

Order of operations

  1. Wait for #4 (upstream sync) to merge.
  2. Merge this PR.
  3. Run the migration script on the VPS.
  4. Verify everything is green + remove legacy /opt/hermes/wamba_build/.

Generated with Claude Code.

Summary by CodeRabbit

Release Notes

  • Documentation

    • Updated production deployment documentation to reflect versioned repository-based configuration
  • Chores

    • Added systemd service configuration for automated Hermes gateway deployment and lifecycle management
    • Added automated migration script to transition from legacy snapshot-based deployment to versioned repository approach

Review Change Stack

@coderabbitai

coderabbitai Bot commented May 13, 2026

Copy link
Copy Markdown

Warning

Rate limit exceeded

@Wizarck has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 45 minutes and 50 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4c6fa494-ee00-4ac4-97f8-5159900b631b

📥 Commits

Reviewing files that changed from the base of the PR and between 3c1db255fa2492ed16e1cb18dff6e079f535e447 and 89b30c4.

📒 Files selected for processing (3)
  • deploy/eligia-vps/README.md
  • deploy/eligia-vps/hermes.service
  • deploy/eligia-vps/migrate-from-wamba-build.sh
📝 Walkthrough

Walkthrough

This PR adds production deployment infrastructure for Hermes on the eligia VPS, transitioning from legacy unversioned snapshots to repo-managed configuration. It introduces a systemd service unit, an idempotent migration script, and updates deployment documentation accordingly.

Changes

Hermes VPS deployment infrastructure

Layer / File(s) Summary
Systemd service unit for Hermes
deploy/eligia-vps/hermes.service
Systemd unit configures Hermes to depend on docker and network readiness, uses SOPS to decrypt secrets from eligia-core, and manages Docker Compose lifecycle with forced recreation on start and teardown on stop.
VPS migration and setup script
deploy/eligia-vps/migrate-from-wamba-build.sh
Idempotent Bash script clones repo to /opt/hermes/source, backs up and symlinks compose and config files, conditionally deploys the systemd unit, rebuilds the Docker image, and polls container health status with 10-second retry intervals.
README updates and migration instructions
deploy/eligia-vps/README.md
Documentation adds entries for the new systemd unit and migration script, includes copy-paste curl command to execute the migration, and removes the prior note that hermes.service is not versioned.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • Wizarck/hermes-agent#3: The migration script and systemd unit added here are tightly coupled to the repo artifacts introduced in that PR (docker-compose.yml, config.yaml, and Dockerfile.eligia-overlay), which they deploy and manage in /opt/hermes.

Poem

🐰 A service takes the stage so grand,
With secrets decrypt'd across the land,
And scripts to guide the VPS way—
From snapshots old to repos, hooray! 🚀
Now Hermes runs with versioned care,
And migrations float through systemd air.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main changes: introduction of a systemd unit and migration script for the Hermes deployment on the VPS infrastructure.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/deploy-systemd-and-migration

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

🧹 Nitpick comments (3)
deploy/eligia-vps/hermes.service (1)

16-17: ⚡ Quick win

Specify compose file explicitly in ExecStart for consistency.

ExecStart relies on WorkingDirectory=/opt/hermes to find docker-compose.yml, but ExecStop explicitly specifies -f /opt/hermes/docker-compose.yml. This inconsistency makes the unit harder to understand and could cause issues if WorkingDirectory behavior changes or if the compose file is in a different location.

♻️ Proposed refactor for consistency
-ExecStart=/usr/local/bin/sops exec-env /opt/eligia/eligia-core/secrets/secrets.env 'docker compose up -d --force-recreate'
+ExecStart=/usr/local/bin/sops exec-env /opt/eligia/eligia-core/secrets/secrets.env '/usr/bin/docker compose -f /opt/hermes/docker-compose.yml up -d --force-recreate'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/eligia-vps/hermes.service` around lines 16 - 17, Update the systemd
unit's ExecStart to explicitly pass the compose file path like ExecStop does:
modify the ExecStart line that currently runs "/usr/local/bin/sops exec-env
/opt/eligia/eligia-core/secrets/secrets.env 'docker compose up -d
--force-recreate'" so it includes "-f /opt/hermes/docker-compose.yml"
(preserving the sops wrapper and flags) to match ExecStop and avoid relying on
WorkingDirectory; reference the ExecStart and ExecStop directives when making
the change.
deploy/eligia-vps/migrate-from-wamba-build.sh (1)

39-44: 💤 Low value

Symlink replacement doesn't backup the previous target.

If ${COMPOSE_LIVE} is already a symlink (perhaps pointing to a different location), the condition on line 39 prevents creating a backup, and ln -sf on line 44 silently replaces it. While this might be intentional for idempotency, it could surprise operators who manually changed the symlink target.

📝 Optional: Warn when replacing existing symlink
 if [[ -f "${COMPOSE_LIVE}" && ! -L "${COMPOSE_LIVE}" ]]; then
   echo "  ➤ Backing up ${COMPOSE_LIVE} → ${COMPOSE_LIVE}.bak-${TS}"
   cp "${COMPOSE_LIVE}" "${COMPOSE_LIVE}.bak-${TS}"
+elif [[ -L "${COMPOSE_LIVE}" ]]; then
+  echo "  ⚠ ${COMPOSE_LIVE} is already a symlink ($(readlink "${COMPOSE_LIVE}")); replacing it"
 fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/eligia-vps/migrate-from-wamba-build.sh` around lines 39 - 44, The
script currently only backs up when ${COMPOSE_LIVE} is a regular file and
silently replaces existing symlinks; modify the logic around the COMPOSE_LIVE
handling to also detect when it's a symlink (use readlink on ${COMPOSE_LIVE}),
create a timestamped backup of the symlink target (or at least warn and copy the
resolved file to ${COMPOSE_LIVE}.bak-${TS}) before running ln -sf, and ensure
the echo message distinguishes backing up a symlink target versus a regular
file; refer to the COMPOSE_LIVE variable, the existing backup naming pattern
(.bak-${TS}), the readlink resolution, and the ln -sf invocation when
implementing the change.
deploy/eligia-vps/README.md (1)

84-88: 💤 Low value

Migration script downloads from live main branch.

The curl command fetches from main, which means the script version can change between when these docs are read and when the command is run. While convenient for always getting the latest version, this creates a reproducibility gap and prevents verification of the exact script being executed. Consider documenting a verification step or using a tagged release for production migrations.

📝 Optional: Add verification or use tagged version

Add a verification step before execution:

> ```bash
> # Download and review before executing
> ssh root@178.104.140.21 'cd /tmp && \
>   curl -sL -o migrate.sh https://raw.githubusercontent.com/Wizarck/hermes-agent/main/deploy/eligia-vps/migrate-from-wamba-build.sh && \
>   less migrate.sh && \
>   bash migrate.sh'
> ```

Or reference a specific commit/tag instead of main for reproducible deployments.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/eligia-vps/README.md` around lines 84 - 88, The README's
remote-install curl command downloads migrate-from-wamba-build.sh from the
repo's main branch, which breaks reproducibility and auditability; update the
command in deploy/eligia-vps/README.md to either point to a specific tag or
commit SHA for migrate-from-wamba-build.sh or add a download-and-verify step
that saves the script (e.g., curl -o migrate.sh), lets the operator
inspect/verify its contents/signature, and only then executes it; locate the
curl line that references
https://raw.githubusercontent.com/.../main/deploy/eligia-vps/migrate-from-wamba-build.sh
and replace it with a pinned URL or add the download-review-execute workflow
described above.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deploy/eligia-vps/hermes.service`:
- Line 16: The ExecStart command uses an unqualified "docker compose" inside the
sops exec-env invocation which can fail under a restricted PATH; update the
ExecStart line so the docker executable is referenced via its absolute path (use
/usr/bin/docker) inside the sops exec-env command string (i.e., replace the
unqualified "docker compose" with "/usr/bin/docker compose") to match the usage
on the following line and ensure systemd can find the binary.
- Line 16: Add a prerequisite validation before the migration step that restarts
the service: check that /opt/eligia/eligia-core/secrets/secrets.env exists and
is decryptable before invoking the systemd restart that triggers ExecStart in
hermes.service. Implement a bash check in the migration script (before "step 5"
/ the restart operation) that tests file existence (stat/-f) and attempts a
harmless sops decryption (e.g., sops --decrypt or sops exec-env dry-run) to
verify the AGE key can decrypt; if either check fails, print the two-line error
message from the review and exit nonzero so the healthcheck path (which relies
on ExecStart) never runs. Ensure the error message exactly mentions the
secrets.env path and the README prerequisite to help operators.
- Line 11: The service unit hardcodes
SOPS_AGE_KEY_FILE=/root/.config/sops/age/keys.txt (hermes.service Environment)
but the migration/disaster-recovery flow doesn't validate that file exists;
update the migration script that runs before step 2 to check for the age key at
/root/.config/sops/age/keys.txt and fail with a clear message if missing, or
alternatively add an explicit prerequisite in the README/disaster recovery
instructions that the SOPS age key must be present at that exact path before
performing step 2; reference the hermes.service Environment variable name
SOPS_AGE_KEY_FILE when adding the check/docs so maintainers can locate the
related configuration.

In `@deploy/eligia-vps/migrate-from-wamba-build.sh`:
- Around line 64-67: The docker build step currently overwrites/creates the
production IMAGE_TAG directly and can leave the system inconsistent on failure;
change the flow to build to a temporary/test tag (e.g., IMAGE_TAG_TMP) using
SOURCE_DIR and DOCKERFILE_REL, verify the image exists and passes a quick smoke
test (docker inspect and a short container run that checks service start), and
only after the verification succeeds replace the production IMAGE_TAG, update
symlinks and reload the systemd unit; if verification fails, abort and leave
current production image/unit unchanged and log actionable error/rollback steps.
- Around line 73-80: The health-check loop using the docker inspect hermes
command may exit after 12 attempts without failing the script; after the
for-loop finishes check the final status variable and if it is not "healthy"
emit a prominent error message (including the last observed ${status}) and exit
non‑zero so the deployment fails visibly; otherwise continue normally. Reference
the loop that sets status via docker inspect hermes --format
'{{.State.Health.Status}}' and the status variable to implement the post-loop
check and exit behavior.
- Around line 55-57: The current block compares
SOURCE_DIR/deploy/eligia-vps/hermes.service with
/etc/systemd/system/hermes.service and always attempts to back up the target,
which fails if the target doesn't exist; modify the logic so you only create the
backup if /etc/systemd/system/hermes.service exists (e.g., wrap the cp
"/etc/systemd/system/hermes.service"
"/etc/systemd/system/hermes.service.bak-${TS}" in a test -e or [ -f ] check),
keeping the existing cmp -s comparison and the subsequent copy from
"${SOURCE_DIR}/deploy/eligia-vps/hermes.service" to
/etc/systemd/system/hermes.service unchanged.
- Line 15: The script currently sets strict shell options (set -euo pipefail)
but lacks a root check, so add an early guard that verifies the script is
running as root (check $EUID or id -u) and exits with a clear error if not;
place this check near the top of migrate-from-wamba-build.sh immediately after
the existing set -euo pipefail line and before any operations that touch
/etc/systemd/system/ or install systemd units to prevent cryptic permission
errors.
- Around line 28-29: The script runs git -C "${SOURCE_DIR}" fetch origin main
followed immediately by git -C "${SOURCE_DIR}" reset --hard origin/main which
can discard local changes or act on a stale fetch; modify the logic to verify
the fetch succeeded (check the exit status of the git fetch called in the block
using SOURCE_DIR) and abort with an error message if it failed, and before
running reset --hard detect and warn (or require confirmation/log) if there are
local changes (use git -C "${SOURCE_DIR}" status --porcelain or equivalent) so
the destructive reset is only performed after a successful fetch and an explicit
warning/consent when local work would be lost.

---

Nitpick comments:
In `@deploy/eligia-vps/hermes.service`:
- Around line 16-17: Update the systemd unit's ExecStart to explicitly pass the
compose file path like ExecStop does: modify the ExecStart line that currently
runs "/usr/local/bin/sops exec-env /opt/eligia/eligia-core/secrets/secrets.env
'docker compose up -d --force-recreate'" so it includes "-f
/opt/hermes/docker-compose.yml" (preserving the sops wrapper and flags) to match
ExecStop and avoid relying on WorkingDirectory; reference the ExecStart and
ExecStop directives when making the change.

In `@deploy/eligia-vps/migrate-from-wamba-build.sh`:
- Around line 39-44: The script currently only backs up when ${COMPOSE_LIVE} is
a regular file and silently replaces existing symlinks; modify the logic around
the COMPOSE_LIVE handling to also detect when it's a symlink (use readlink on
${COMPOSE_LIVE}), create a timestamped backup of the symlink target (or at least
warn and copy the resolved file to ${COMPOSE_LIVE}.bak-${TS}) before running ln
-sf, and ensure the echo message distinguishes backing up a symlink target
versus a regular file; refer to the COMPOSE_LIVE variable, the existing backup
naming pattern (.bak-${TS}), the readlink resolution, and the ln -sf invocation
when implementing the change.

In `@deploy/eligia-vps/README.md`:
- Around line 84-88: The README's remote-install curl command downloads
migrate-from-wamba-build.sh from the repo's main branch, which breaks
reproducibility and auditability; update the command in
deploy/eligia-vps/README.md to either point to a specific tag or commit SHA for
migrate-from-wamba-build.sh or add a download-and-verify step that saves the
script (e.g., curl -o migrate.sh), lets the operator inspect/verify its
contents/signature, and only then executes it; locate the curl line that
references
https://raw.githubusercontent.com/.../main/deploy/eligia-vps/migrate-from-wamba-build.sh
and replace it with a pinned URL or add the download-review-execute workflow
described above.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2af760f1-8f7c-4d15-9a16-2d1a22fed643

📥 Commits

Reviewing files that changed from the base of the PR and between 79414a9 and 3c1db255fa2492ed16e1cb18dff6e079f535e447.

📒 Files selected for processing (3)
  • deploy/eligia-vps/README.md
  • deploy/eligia-vps/hermes.service
  • deploy/eligia-vps/migrate-from-wamba-build.sh

Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/opt/hermes
Environment=SOPS_AGE_KEY_FILE=/root/.config/sops/age/keys.txt

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if the migration script or README documents the SOPS key prerequisite

rg -n "SOPS_AGE_KEY_FILE|keys\.txt|age.*key" deploy/eligia-vps/

Repository: Wizarck/hermes-agent

Length of output: 234


🏁 Script executed:

cat -n deploy/eligia-vps/README.md | sed -n '120,150p'

Repository: Wizarck/hermes-agent

Length of output: 2116


🏁 Script executed:

fd -type f -name "*.md" -o -name "*setup*" -o -name "*deploy*" -o -name "*migration*" | head -20

Repository: Wizarck/hermes-agent

Length of output: 235


🏁 Script executed:

rg -A 5 -B 5 "sops|age" deploy/eligia-vps/README.md

Repository: Wizarck/hermes-agent

Length of output: 5541


🏁 Script executed:

cat deploy/eligia-vps/migrate-from-wamba-build.sh

Repository: Wizarck/hermes-agent

Length of output: 3865


Document the SOPS age key prerequisite in the migration instructions.

The hardcoded path /root/.config/sops/age/keys.txt is not validated by the migration script and is not explicitly documented in the disaster recovery steps. If the age key file is missing or located elsewhere, the service will fail to decrypt secrets during startup. Add a check in the migration script or explicitly document in the README that the age key must be present at this path before step 2 of the disaster recovery sequence.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/eligia-vps/hermes.service` at line 11, The service unit hardcodes
SOPS_AGE_KEY_FILE=/root/.config/sops/age/keys.txt (hermes.service Environment)
but the migration/disaster-recovery flow doesn't validate that file exists;
update the migration script that runs before step 2 to check for the age key at
/root/.config/sops/age/keys.txt and fail with a clear message if missing, or
alternatively add an explicit prerequisite in the README/disaster recovery
instructions that the SOPS age key must be present at that exact path before
performing step 2; reference the hermes.service Environment variable name
SOPS_AGE_KEY_FILE when adding the check/docs so maintainers can locate the
related configuration.

# into the docker-compose process so `${VAR}` placeholders in compose.yaml
# resolve to real values. Without sops, the compose env block would expand
# to empty strings and the container would start crippled.
ExecStart=/usr/local/bin/sops exec-env /opt/eligia/eligia-core/secrets/secrets.env 'docker compose up -d --force-recreate'

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use absolute path for docker command in ExecStart.

Line 16 uses docker compose without an absolute path inside the sops exec-env shell command, while line 17 uses /usr/bin/docker. The PATH environment variable inside sops exec-env may differ from the normal systemd PATH, potentially causing command-not-found failures.

🔧 Proposed fix to use absolute path consistently
-ExecStart=/usr/local/bin/sops exec-env /opt/eligia/eligia-core/secrets/secrets.env 'docker compose up -d --force-recreate'
+ExecStart=/usr/local/bin/sops exec-env /opt/eligia/eligia-core/secrets/secrets.env '/usr/bin/docker compose up -d --force-recreate'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ExecStart=/usr/local/bin/sops exec-env /opt/eligia/eligia-core/secrets/secrets.env 'docker compose up -d --force-recreate'
ExecStart=/usr/local/bin/sops exec-env /opt/eligia/eligia-core/secrets/secrets.env '/usr/bin/docker compose up -d --force-recreate'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/eligia-vps/hermes.service` at line 16, The ExecStart command uses an
unqualified "docker compose" inside the sops exec-env invocation which can fail
under a restricted PATH; update the ExecStart line so the docker executable is
referenced via its absolute path (use /usr/bin/docker) inside the sops exec-env
command string (i.e., replace the unqualified "docker compose" with
"/usr/bin/docker compose") to match the usage on the following line and ensure
systemd can find the binary.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if migration script verifies eligia-core repo or secrets.env

rg -n "eligia-core|secrets\.env" deploy/eligia-vps/migrate-from-wamba-build.sh

Repository: Wizarck/hermes-agent

Length of output: 46


🏁 Script executed:

# First, check the migration script content
cat -n deploy/eligia-vps/migrate-from-wamba-build.sh

Repository: Wizarck/hermes-agent

Length of output: 4544


🏁 Script executed:

# Also check the full hermes.service file
cat -n deploy/eligia-vps/hermes.service

Repository: Wizarck/hermes-agent

Length of output: 1018


🏁 Script executed:

# Search for any other references to eligia-core in the entire repository
rg -n "eligia-core" --type-list | head -20

Repository: Wizarck/hermes-agent

Length of output: 454


🏁 Script executed:

# Search for any validation or pre-flight check scripts
fd -type f -name "*check*" -o -name "*valid*" -o -name "*verify*" deploy/

Repository: Wizarck/hermes-agent

Length of output: 235


🏁 Script executed:

# Check if there's any documentation about deployment prerequisites
fd -type f \( -name "README*" -o -name "*.md" \) deploy/ | head -10

Repository: Wizarck/hermes-agent

Length of output: 235


🏁 Script executed:

# Search for README or deployment docs
fd -type f \( -name "README*" -o -name "*.md" \) | grep -E "(deploy|README)" | head -20

Repository: Wizarck/hermes-agent

Length of output: 235


🏁 Script executed:

# Search for any references to eligia-core or secrets.env setup
rg -n "eligia-core|secrets\.env" --type md --type txt

Repository: Wizarck/hermes-agent

Length of output: 981


🏁 Script executed:

# Check if there are any runbook or setup docs
fd -type f -name "*runbook*" -o -name "*setup*" -o -name "*install*"

Repository: Wizarck/hermes-agent

Length of output: 235


🏁 Script executed:

# Read the deploy/eligia-vps/README.md file
cat -n deploy/eligia-vps/README.md

Repository: Wizarck/hermes-agent

Length of output: 8451


Add prerequisite validation before service restart.

The migration script progresses to restart the service without verifying that /opt/eligia/eligia-core/secrets/secrets.env exists and is decryptable. If this file is missing or the SOPS age key cannot decrypt it, the ExecStart command will fail silently during the healthcheck phase (lines 73–80), causing deployment to appear successful while the service actually fails to start.

The README documents this as a prerequisite (lines 133–134 in the disaster recovery section), but operators running the migration script directly may miss it. Add a validation check before step 5 (line 70) to ensure the secrets file is present and readable:

if [[ ! -f "/opt/eligia/eligia-core/secrets/secrets.env" ]]; then
  echo "ERROR: /opt/eligia/eligia-core/secrets/secrets.env not found."
  echo "Required prerequisite: Clone Wizarck/eligia-core and decrypt secrets."
  exit 1
fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/eligia-vps/hermes.service` at line 16, Add a prerequisite validation
before the migration step that restarts the service: check that
/opt/eligia/eligia-core/secrets/secrets.env exists and is decryptable before
invoking the systemd restart that triggers ExecStart in hermes.service.
Implement a bash check in the migration script (before "step 5" / the restart
operation) that tests file existence (stat/-f) and attempts a harmless sops
decryption (e.g., sops --decrypt or sops exec-env dry-run) to verify the AGE key
can decrypt; if either check fails, print the two-line error message from the
review and exit nonzero so the healthcheck path (which relies on ExecStart)
never runs. Ensure the error message exactly mentions the secrets.env path and
the README prerequisite to help operators.

# Run on the VPS as root:
# bash deploy/eligia-vps/migrate-from-wamba-build.sh

set -euo pipefail

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Verify script is running as root.

The script comments (line 12) state "Run on the VPS as root", and operations like installing systemd units and modifying /etc/systemd/system/ require root privileges. However, the script doesn't verify it's actually running as root. If run by a non-root user, it will fail with cryptic permission errors.

🛡️ Proposed fix to add root check
 set -euo pipefail
+
+if [[ $EUID -ne 0 ]]; then
+  echo "Error: This script must be run as root" >&2
+  exit 1
+fi
 
 REPO_URL="https://github.com/Wizarck/hermes-agent.git"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
set -euo pipefail
set -euo pipefail
if [[ $EUID -ne 0 ]]; then
echo "Error: This script must be run as root" >&2
exit 1
fi
REPO_URL="https://github.com/Wizarck/hermes-agent.git"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/eligia-vps/migrate-from-wamba-build.sh` at line 15, The script
currently sets strict shell options (set -euo pipefail) but lacks a root check,
so add an early guard that verifies the script is running as root (check $EUID
or id -u) and exits with a clear error if not; place this check near the top of
migrate-from-wamba-build.sh immediately after the existing set -euo pipefail
line and before any operations that touch /etc/systemd/system/ or install
systemd units to prevent cryptic permission errors.

Comment on lines +28 to +29
git -C "${SOURCE_DIR}" fetch origin main
git -C "${SOURCE_DIR}" reset --hard origin/main

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Verify git fetch success before destructive reset.

git reset --hard origin/main on line 29 will discard any local changes without warning. If the fetch on line 28 fails (network issue, auth problem), the reset will use stale origin/main or fail unexpectedly. Consider verifying the fetch succeeded and warning about discarded local changes.

🛡️ Proposed fix to verify fetch and warn user
   echo "  ✓ ${SOURCE_DIR} is already a git checkout — pulling latest."
-  git -C "${SOURCE_DIR}" fetch origin main
+  git -C "${SOURCE_DIR}" fetch origin main || { echo "  ✗ fetch failed"; exit 1; }
+  echo "  ⚠ Discarding any local changes with hard reset"
   git -C "${SOURCE_DIR}" reset --hard origin/main
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/eligia-vps/migrate-from-wamba-build.sh` around lines 28 - 29, The
script runs git -C "${SOURCE_DIR}" fetch origin main followed immediately by git
-C "${SOURCE_DIR}" reset --hard origin/main which can discard local changes or
act on a stale fetch; modify the logic to verify the fetch succeeded (check the
exit status of the git fetch called in the block using SOURCE_DIR) and abort
with an error message if it failed, and before running reset --hard detect and
warn (or require confirmation/log) if there are local changes (use git -C
"${SOURCE_DIR}" status --porcelain or equivalent) so the destructive reset is
only performed after a successful fetch and an explicit warning/consent when
local work would be lost.

Comment on lines +55 to +57
if ! cmp -s "${SOURCE_DIR}/deploy/eligia-vps/hermes.service" /etc/systemd/system/hermes.service; then
cp /etc/systemd/system/hermes.service "/etc/systemd/system/hermes.service.bak-${TS}"
cp "${SOURCE_DIR}/deploy/eligia-vps/hermes.service" /etc/systemd/system/hermes.service

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Backup will fail if hermes.service doesn't exist yet.

On a fresh install where /etc/systemd/system/hermes.service doesn't exist, cmp on line 55 will evaluate false (files differ), but line 56 will fail trying to backup a non-existent file.

🛡️ Proposed fix to check existence before backup
 if ! cmp -s "${SOURCE_DIR}/deploy/eligia-vps/hermes.service" /etc/systemd/system/hermes.service; then
-  cp /etc/systemd/system/hermes.service "/etc/systemd/system/hermes.service.bak-${TS}"
+  if [[ -f /etc/systemd/system/hermes.service ]]; then
+    cp /etc/systemd/system/hermes.service "/etc/systemd/system/hermes.service.bak-${TS}"
+  fi
   cp "${SOURCE_DIR}/deploy/eligia-vps/hermes.service" /etc/systemd/system/hermes.service
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if ! cmp -s "${SOURCE_DIR}/deploy/eligia-vps/hermes.service" /etc/systemd/system/hermes.service; then
cp /etc/systemd/system/hermes.service "/etc/systemd/system/hermes.service.bak-${TS}"
cp "${SOURCE_DIR}/deploy/eligia-vps/hermes.service" /etc/systemd/system/hermes.service
if ! cmp -s "${SOURCE_DIR}/deploy/eligia-vps/hermes.service" /etc/systemd/system/hermes.service; then
if [[ -f /etc/systemd/system/hermes.service ]]; then
cp /etc/systemd/system/hermes.service "/etc/systemd/system/hermes.service.bak-${TS}"
fi
cp "${SOURCE_DIR}/deploy/eligia-vps/hermes.service" /etc/systemd/system/hermes.service
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/eligia-vps/migrate-from-wamba-build.sh` around lines 55 - 57, The
current block compares SOURCE_DIR/deploy/eligia-vps/hermes.service with
/etc/systemd/system/hermes.service and always attempts to back up the target,
which fails if the target doesn't exist; modify the logic so you only create the
backup if /etc/systemd/system/hermes.service exists (e.g., wrap the cp
"/etc/systemd/system/hermes.service"
"/etc/systemd/system/hermes.service.bak-${TS}" in a test -e or [ -f ] check),
keeping the existing cmp -s comparison and the subsequent copy from
"${SOURCE_DIR}/deploy/eligia-vps/hermes.service" to
/etc/systemd/system/hermes.service unchanged.

Comment on lines +64 to +67
echo
echo "=== Step 4/6: rebuild image from fork checkout ==="
cd "${SOURCE_DIR}"
docker build -f "${DOCKERFILE_REL}" -t "${IMAGE_TAG}" .

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Docker build failure leaves system in inconsistent state.

If docker build fails (missing dependencies, out of disk space, etc.), the script exits via set -e, leaving the system with updated symlinks and systemd unit but no working image. The old image might still exist, but there's no automatic rollback.

Consider verifying the image exists and works before updating production configuration, or document the manual rollback procedure.

Would you like me to suggest a verification step or rollback logic?

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/eligia-vps/migrate-from-wamba-build.sh` around lines 64 - 67, The
docker build step currently overwrites/creates the production IMAGE_TAG directly
and can leave the system inconsistent on failure; change the flow to build to a
temporary/test tag (e.g., IMAGE_TAG_TMP) using SOURCE_DIR and DOCKERFILE_REL,
verify the image exists and passes a quick smoke test (docker inspect and a
short container run that checks service start), and only after the verification
succeeds replace the production IMAGE_TAG, update symlinks and reload the
systemd unit; if verification fails, abort and leave current production
image/unit unchanged and log actionable error/rollback steps.

Comment on lines +73 to +80
for i in $(seq 1 12); do
status=$(docker inspect hermes --format '{{.State.Health.Status}}' 2>/dev/null || echo "unknown")
echo " attempt ${i}/12 → ${status}"
if [[ "${status}" == "healthy" ]]; then
break
fi
sleep 10
done

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Health check loop completes even if service never becomes healthy.

After 12 failed attempts (2 minutes), the loop exits and the script continues to completion without failing. An operator running this script might believe the migration succeeded when the service is actually unhealthy. Consider exiting with an error or at least printing a prominent warning if health check fails after all attempts.

🔧 Proposed fix to fail or warn on unhealthy status
 for i in $(seq 1 12); do
   status=$(docker inspect hermes --format '{{.State.Health.Status}}' 2>/dev/null || echo "unknown")
   echo "    attempt ${i}/12 → ${status}"
   if [[ "${status}" == "healthy" ]]; then
     break
   fi
   sleep 10
 done
+if [[ "${status}" != "healthy" ]]; then
+  echo "  ✗ Health check failed after 12 attempts. Service may be broken!"
+  exit 1
+fi
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for i in $(seq 1 12); do
status=$(docker inspect hermes --format '{{.State.Health.Status}}' 2>/dev/null || echo "unknown")
echo " attempt ${i}/12 → ${status}"
if [[ "${status}" == "healthy" ]]; then
break
fi
sleep 10
done
for i in $(seq 1 12); do
status=$(docker inspect hermes --format '{{.State.Health.Status}}' 2>/dev/null || echo "unknown")
echo " attempt ${i}/12 → ${status}"
if [[ "${status}" == "healthy" ]]; then
break
fi
sleep 10
done
if [[ "${status}" != "healthy" ]]; then
echo " ✗ Health check failed after 12 attempts. Service may be broken!"
exit 1
fi
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deploy/eligia-vps/migrate-from-wamba-build.sh` around lines 73 - 80, The
health-check loop using the docker inspect hermes command may exit after 12
attempts without failing the script; after the for-loop finishes check the final
status variable and if it is not "healthy" emit a prominent error message
(including the last observed ${status}) and exit non‑zero so the deployment
fails visibly; otherwise continue normally. Reference the loop that sets status
via docker inspect hermes --format '{{.State.Health.Status}}' and the status
variable to implement the post-loop check and exit behavior.

Two more pieces that lived only on the VPS now live in the fork:

  - hermes.service: systemd unit (Description, Requires, sops exec-env,
    docker compose up). Drop-in compatible with the current
    /etc/systemd/system/hermes.service on the VPS.

  - migrate-from-wamba-build.sh: one-shot, idempotent migration script.
    Steps:
      1. Clone the fork into /opt/hermes/source (or pull if already present).
      2. Symlink /opt/hermes/docker-compose.yml + /opt/hermes/data/config.yaml
         to the repo-versioned files under deploy/eligia-vps/.
      3. Install the new systemd unit (with backup of the old one).
      4. Rebuild eligia/hermes-agent:wamba from the fork checkout.
      5. systemctl restart hermes + poll healthcheck.
      6. Flag the legacy /opt/hermes/wamba_build/ snapshot for manual
         deletion once the new flow is verified.

After this PR + running the script once on the VPS, future updates are
just:

  git -C /opt/hermes/source pull && \
    docker build -f /opt/hermes/source/deploy/eligia-vps/Dockerfile.eligia-overlay \
                 -t eligia/hermes-agent:wamba \
                 /opt/hermes/source && \
    systemctl restart hermes

— no more scp'ing files around or manually editing wamba_build.

README updated: removes the "TODO: migrate to /opt/hermes/source/"
section, adds a curl|bash one-liner for running the migration, lists
hermes.service in the file table, and folds the systemd-unit follow-up
into the now-completed list.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@Wizarck Wizarck force-pushed the feat/deploy-systemd-and-migration branch from 3c1db25 to 89b30c4 Compare May 13, 2026 18:30
@github-actions

Copy link
Copy Markdown

🔎 Lint report: feat/deploy-systemd-and-migration vs origin/main

ruff

Total: 0 on HEAD, 0 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 0 pre-existing issues carried over.

ty (type checker)

Total: 8263 on HEAD, 8263 on base (➖ 0)

🆕 New issues: none

✅ Fixed issues: none

Unchanged: 4343 pre-existing issues carried over.

Diagnostics are surfaced as warnings — this check never fails the build.

@Wizarck Wizarck merged commit 5924c01 into main May 13, 2026
6 of 9 checks passed
@Wizarck Wizarck deleted the feat/deploy-systemd-and-migration branch May 13, 2026 18:40
Wizarck pushed a commit that referenced this pull request May 15, 2026
…rch#25071)

* tui: make URLs clickable + hover-highlight in any terminal

Problem
-------
URLs printed by `hermes --tui` were not clickable in basic macOS Terminal.app.
Cmd+click did nothing, the cursor didn't change shape — like nothing was
detected — even though arrow buttons and other Box onClick handlers worked
fine.

Root cause
----------
Two layers of dead plumbing:

1. `<Link>` only emitted the underlying `<ink-link>` (which carries the
   hyperlink metadata into the screen buffer) when `supportsHyperlinks()`
   said yes. On Apple_Terminal that's false, so the per-cell hyperlink
   field stayed empty, so `Ink.getHyperlinkAt()` had nothing to return on
   click. The visible underline was just decorative.

2. `Ink.openHyperlink()` calls `this.onHyperlinkClick?.(url)`, but
   `onHyperlinkClick` was never assigned anywhere in the codebase. The
   click pipeline (`App.tsx → onOpenHyperlink → Ink.openHyperlink`) ran
   but bailed silently on the optional chain.

Bonus discovery: even when wired up, there was no hover affordance —
terminal apps can't change the system mouse cursor, so users had no
visual signal that a cell was clickable. Arrow buttons in the chrome
worked because they had explicit `<Box onClick>` styling; inline link
URLs didn't.

Fix
---
- `Link.tsx`: always emit `<ink-link>` regardless of terminal capability.
  The renderer's `wrapWithOsc8Link` already gates the actual OSC 8 escape
  on `supportsHyperlinks()` further down — so terminals that don't
  understand OSC 8 still don't see the escape, but the screen-buffer
  metadata (which the click dispatcher reads) is now populated everywhere.

- `ink.tsx + root.ts`: add `onHyperlinkClick?: (url: string) => void` to
  `Options` / `RenderOptions`, wire it to the existing `Ink.onHyperlinkClick`
  field in the constructor.

- `src/lib/openExternalUrl.ts`: small platform-aware opener using
  `child_process.spawn` with arg-array (no shell) — http(s) only, rejects
  `file:`, `javascript:`, `data:`, etc., so a hostile model can't trigger
  arbitrary local handlers via `<Link url="file:///...">`. Detached + stdio
  ignore so closing the TUI doesn't kill the browser and Chrome stderr
  doesn't leak into the alt screen.

- `entry.tsx`: pass `onHyperlinkClick: openExternalUrl` to `ink.render`.

- `hyperlinkHover.ts` + Ink hover wiring: track the URL under the pointer
  in `Ink.hoveredHyperlink`, update it from `dispatchHover`, and inverse-
  highlight every cell of the matching link in the render-pass overlay
  (same pattern as `applySearchHighlight`). This is the cursor-hover
  affordance for clickable links — terminals don't expose cursor shape,
  so we light up the link itself.

- `types/hermes-ink.d.ts`: add `onHyperlinkClick` to the `RenderOptions`
  shim so consumers (`entry.tsx`) type-check against the new option.

Tests
-----
- `src/lib/openExternalUrl.test.ts` (15 cases): http(s) accepted; file/js/
  data/mailto/ftp/ssh rejected; macOS open(1), Windows cmd.exe start with
  empty title slot, Linux xdg-open dispatch; shell-metacharacter URLs
  pass through unmolested as a single argv element; synchronous spawn
  failure returns false.

Verified empirically in Apple Terminal 455.1 (macOS 15.7.3): clicking a
URL opens in default browser, hovering inverts the link cells, and
moving away clears the highlight. Full TUI suite: 713 passing, 0
type errors.

Reverts
-------
The earlier attempt that version-gated Apple_Terminal in
`supports-hyperlinks.ts` was based on a wrong assumption — Terminal.app
silently strips OSC 8 sequences but does not render them as clickable
hyperlinks. Reverted to the original allowlist.

* tui: address Copilot review — explorer.exe on win32 + comment fixes

- openExternalUrl: switch win32 from `cmd.exe /c start` to `explorer.exe`.
  cmd.exe's `start` builtin reparses the URL through cmd's tokenizer, so
  `&`, `|`, `^`, `<`, `>` either split the command or get reinterpreted —
  breaking both the protocol-allowlist safety story AND plain http(s) URLs
  with `&` in query strings. `explorer.exe <url>` invokes the registered
  protocol handler directly with no shell.

- openExternalUrl.test.ts: rename the win32 test to reflect the new
  contract and add two regression tests — one with `&|^<>` metachars,
  one with the common analytics-URL `&` query-param pattern — both pinned
  to single-argv-element delivery via explorer.exe.

- Link.tsx: fix misleading comment. OSC 8 escapes are emitted
  unconditionally by the renderer (`wrapWithOsc8Link` in
  render-node-to-output.ts, `oscLink` in log-update.ts). Non-supporting
  terminals silently strip the sequence, which is why hover/click
  affordance has to come from the in-process overlay rather than the
  terminal's own link rendering.

Verified: 715/715 tests pass, type-check + build clean.

* tui: address Copilot review #2 — async spawn errors + hover scope + docs

1. openExternalUrl: attach a no-op `'error'` listener on the spawned
   child BEFORE unref(). spawn() returns a ChildProcess synchronously
   even when the binary is missing (ENOENT on xdg-open / explorer.exe),
   unreachable, or otherwise unusable; the failure surfaces later as
   an 'error' event. An unhandled 'error' on an EventEmitter crashes
   Node, which would tear down the whole TUI. The listener is a
   deliberate no-op — we already returned `true` synchronously and the
   user just doesn't see the browser pop.

2. openExternalUrl.test.ts: add a regression test using a real
   EventEmitter to simulate the async-error path. Pins both the
   listener-attached contract and the "doesn't throw on emit" behavior.
   Was 17/17, now 18/18.

3. ink.tsx dispatchHover: bypass `getHyperlinkAt()` and read
   `cellAt(...).hyperlink` directly. `getHyperlinkAt` falls back to
   `findPlainTextUrlAt` for cells without an OSC 8 hyperlink, but the
   render-pass overlay (`applyHyperlinkHoverHighlight`) only matches on
   `cell.hyperlink === hoveredUrl` — so plain-text URLs would burn
   re-renders without ever producing the highlight. Hover is now a
   strictly 1:1 fit for what the overlay can paint. Plain-text URLs
   still get the click action via the existing dispatch path.

4. root.ts + ink.tsx doc comments: replace the misleading "typically
   `open` / `xdg-open` / `start` shell" wording with the actual safe
   recipe — argv-array spawn into `open` / `xdg-open` / `explorer.exe`,
   with an explicit warning that `cmd.exe /c start` reparses the URL
   through cmd's tokenizer and is unsafe + breaks `&`-query URLs.

Verified: 716/716 tests pass, type-check + build clean.

* tui: address Copilot review #3 — hover damage, alt-screen cleanup, opener allowlist

1. ink.tsx onRender: stop folding steady-state hover into hlActive.
   hlActive forces a full-screen damage diff so previous-frame inverted
   cells get re-emitted when the highlight set changes. The transition
   IS the trigger — enter / leave / change-to-other-link. While the
   pointer just sits on a link the painted cells don't change and the
   per-cell diff handles the no-op. Folding the steady state in would
   burn a full-screen diff on every frame. Added a
   lastRenderedHoveredHyperlink tracker and gate the hlActive bump on
   `hovered !== lastRendered`.

2. ink.tsx setAltScreenActive: clear hoveredHyperlink (and the tracker)
   when toggling alt-screen state. Hover dispatch is alt-screen-gated,
   so once we leave there's no path to clear it. Without this, remounting
   <AlternateScreen> would paint a phantom hover from the previous
   session until the next mouse-move arrived.

3. openExternalUrl.ts openCommand: allowlist linux + the BSD family for
   xdg-open and return null for everything else (aix, sunos, cygwin,
   haiku, etc.). Previously the default-fallback always returned
   xdg-open, which made the caller's `if (!command) return false` dead
   and yielded a misleading `true` on platforms that probably don't
   have xdg-open. New tests cover the null path AND the
   openExternalUrl-returns-false-without-spawning behavior.

Verified: 718/718 tests pass, type-check + build clean.

* tui: address Copilot review #4 — doc comment accuracy

1. openExternalUrl return-value doc: now lists all three false paths
   (URL rejected / no opener for platform / synchronous spawn throw)
   plus a note that async 'error' events still return true because the
   spawn was attempted.

2. ink.tsx onHyperlinkClick field doc: clarifies the callback receives
   either an OSC 8 hyperlink OR a plain-text URL detected by
   findPlainTextUrlAt — App.tsx routes both into the same callback.

3. hyperlinkHover applyHyperlinkHoverHighlight doc: drops the misleading
   'caller forces full-frame damage' promise. Caller decides; for hover
   the current caller only forces full damage on transitions.

No behavior change. 718/718 tests pass.

* tui: address Copilot review #5 — lint fixes

1. ink.tsx: reorder `./hyperlinkHover.js` import before `./screen.js` to
   satisfy perfectionist/sort-imports.

2. Link.tsx: drop unused `fallback` parameter destructuring + the
   trailing `void (null as ...)` dead-statement (would trip
   no-unused-expressions). Kept `fallback?: ReactNode` on the Props
   interface as a documented compat shim so existing call sites still
   compile, with a comment explaining why it's no longer wired up.

3. openExternalUrl.test.ts: replace `typeof import('node:child_process').spawn`
   inline annotations (forbidden by @typescript-eslint/consistent-type-imports)
   with a `SpawnLike` type alias backed by a real `import type { spawn as SpawnFn }`.

No behavior change. 718/718 tests pass, type-check clean, lint clean on
all modified files.
Wizarck pushed a commit that referenced this pull request May 15, 2026
…ex models (NousResearch#24182)

* feat(codex-runtime): scaffold optional codex app-server runtime

Foundational commit for an opt-in alternate runtime that hands OpenAI/Codex
turns to a 'codex app-server' subprocess instead of Hermes' tool dispatch.
Default behavior is unchanged.

Lands in three pieces:

1. agent/transports/codex_app_server.py — JSON-RPC 2.0 over stdio speaker
   for codex's app-server protocol (codex-rs/app-server). Spawn, init
   handshake, request/response, notification queue, server-initiated
   request queue (for approval round-trips), interrupt-friendly blocking
   reads. Tested against real codex 0.130.0 binary end-to-end during
   development.

2. hermes_cli/runtime_provider.py:
   - Adds 'codex_app_server' to _VALID_API_MODES.
   - Adds _maybe_apply_codex_app_server_runtime() helper, called at the
     end of _resolve_runtime_from_pool_entry(). Inert unless
     'model.openai_runtime: codex_app_server' is set in config.yaml AND
     provider in {openai, openai-codex}. Other providers cannot be
     rerouted (anthropic, openrouter, etc. preserved).

3. tests/agent/transports/test_codex_app_server_runtime.py — 24 tests
   covering api_mode registration, the rewriter helper (default-off,
   case-insensitive, opt-in, non-eligible providers preserved), version
   parser, missing-binary handling, error class. Does NOT require codex
   CLI installed.

This commit is wire-only: the api_mode is recognized but AIAgent does
not yet branch on it. Followup commits add the session adapter, event
projector, approval bridge, transcript projection (so memory/skill
review still works), plugin migration, and slash command.

Existing tests remain green:
- tests/cli/test_cli_provider_resolution.py (29 passed)
- tests/agent/test_credential_pool_routing.py (included above)

* feat(codex-runtime): add codex item projector for memory/skill review

The translator that lets Hermes' self-improvement loop keep working under the
Codex runtime: converts codex 'item/*' notifications into Hermes' standard
{role, content, tool_calls, tool_call_id} message shape that
agent/curator.py already knows how to read.

Item taxonomy (matches codex-rs/app-server-protocol/src/protocol/v2/item.rs):
  - userMessage          → {role: user, content}
  - agentMessage         → {role: assistant, content: text}
  - reasoning            → stashed in next assistant's 'reasoning' field
  - commandExecution     → assistant tool_call(name='exec_command') + tool result
  - fileChange           → assistant tool_call(name='apply_patch') + tool result
  - mcpToolCall          → assistant tool_call(name='mcp.<server>.<tool>') + tool result
  - dynamicToolCall      → assistant tool_call(name=<tool>) + tool result
  - plan/hookPrompt/etc  → opaque assistant note, no fabricated tool_calls

Invariants preserved:
  - Message role alternation never violated: each tool item produces at most
    one assistant + one tool message in that order, correlated by call_id.
  - Streaming deltas (item/<type>/outputDelta, item/agentMessage/delta)
    don't materialize messages — only item/completed does. Mirrors how
    Hermes already only writes the assistant message after streaming ends.
  - Tool call ids are deterministic (codex item id-based) so replays produce
    identical messages and prefix caches stay valid (AGENTS.md pitfall NousResearch#16).
  - JSON args use sorted_keys for the same reason.

Real wire formats verified against codex 0.130.0 by capturing live
notifications from thread/shellCommand and including one as a fixture
(COMMAND_EXEC_COMPLETED).

23 new tests, all green:
  - Streaming deltas don't materialize (3 paths)
  - Turn/thread frame events are silent
  - commandExecution: 5 tests including non-zero exit annotation +
    deterministic id stability across replays
  - agentMessage + reasoning attachment + reasoning consumption
  - fileChange: summary without inlined content
  - mcpToolCall: namespaced naming + error surfacing
  - userMessage: text fragments only (drops images/etc)
  - opaque items: no fabricated tool_calls
  - Helpers: deterministic id stability + sorted JSON args
  - Role alternation invariant across all four tool-shaped item types

This commit is a pure addition. AIAgent integration (the wire that uses the
projector) is the next commit.

* feat(codex-runtime): add session adapter + approval bridge

The third self-contained module: CodexAppServerSession owns one Codex
thread per Hermes session, drives turn/start, consumes streaming
notifications via CodexEventProjector, handles server-initiated approval
requests, and translates cancellation into turn/interrupt.

The adapter has a single public per-turn method:

    result = session.run_turn(user_input='...', turn_timeout=600)
    # result.final_text          → assistant text for the caller
    # result.projected_messages  → list ready to splice into AIAgent.messages
    # result.tool_iterations     → tick count for _iters_since_skill nudge
    # result.interrupted         → True on Ctrl+C / deadline / interrupt
    # result.error               → error string when the turn cannot complete
    # result.turn_id, thread_id  → for sessions DB / resume

Behavior:

  - ensure_started() spawns codex, does the initialize handshake, and
    issues thread/start with cwd + permissions profile. Idempotent.
  - run_turn() blocks until turn/completed, drains server-initiated
    requests (approvals) before reading notifications so codex never
    deadlocks waiting for us, projects every item/completed via the
    projector, and increments tool_iterations for the skill nudge gate.
  - request_interrupt() is thread-safe (threading.Event); the next loop
    iteration issues turn/interrupt and unwinds.
  - turn_timeout deadlock guard issues turn/interrupt and records an
    error if the turn never completes.
  - close() escalates terminate → kill via the underlying client.

Approval bridge:

  Codex emits server-initiated requests for execCommandApproval and
  applyPatchApproval. The adapter translates Hermes' approval choice
  vocabulary onto codex's decision vocabulary:

    Hermes 'once'                → codex 'approved'
    Hermes 'session' or 'always' → codex 'approvedForSession'
    Hermes 'deny' / anything else → codex 'denied'

  Routing precedence:
    1. _ServerRequestRouting.auto_approve_* flags (cron / non-interactive)
    2. approval_callback wired by the CLI (defers to
       tools.approval.prompt_dangerous_approval())
    3. Fail-closed denial when neither is wired

  Unknown server-request methods are answered with JSON-RPC error -32601
  so codex doesn't hang waiting for us.

Permission profile mapping mirrors AGENTS.md:
    Hermes 'auto'              → codex 'workspace-write'
    Hermes 'approval-required' → codex 'read-only-with-approval'
    Hermes 'unrestricted/yolo' → codex 'full-access'

20 new tests, all green. Combined with prior commits this PR now has
67 tests across three modules:
  - test_codex_app_server_runtime.py: 24 (api_mode + transport surface)
  - test_codex_event_projector.py: 23 (item taxonomy projections)
  - test_codex_app_server_session.py: 20 (turn loop + approvals + interrupts)

Full tests/agent/transports/ directory: 249/249 pass — no regressions
to existing transport tests.

Still no wire into AIAgent.run_conversation(); that integration commit
is small and goes next.

* feat(codex-runtime): wire codex_app_server runtime into AIAgent

The integration commit. AIAgent.run_conversation() now early-returns to a
new helper _run_codex_app_server_turn() when self.api_mode ==
'codex_app_server', bypassing the chat_completions tool loop entirely.

Three small surgical edits to run_agent.py (~105 LOC total):

1. Line ~1204 (constructor api_mode validation set):
   Add 'codex_app_server' so an explicit api_mode='codex_app_server'
   passed to AIAgent() isn't silently rewritten to 'chat_completions'.

2. Line ~12048 (run_conversation, just before the while loop):
   Early-return to _run_codex_app_server_turn() when self.api_mode is
   'codex_app_server'. Placed AFTER all standard pre-loop setup —
   logging context, session DB, surrogate sanitization, _user_turn_count
   and _turns_since_memory increments, _ext_prefetch_cache, memory
   manager on_turn_start — so behavior outside the model-call loop is
   identical between paths. Default Hermes flow is unchanged when the
   flag is off.

3. End-of-class (line ~15497):
   New method _run_codex_app_server_turn(). Lazy-instantiates one
   CodexAppServerSession per AIAgent (reused across turns), runs the
   turn, splices projected_messages into messages, increments
   _iters_since_skill by tool_iterations (since the chat_completions
   loop normally does that per iteration), fires
   _spawn_background_review on the same cadence as the default path.

Counter accounting:

  _turns_since_memory  ← already incremented at run_conversation:11817
                         (gated on memory store configured) — codex
                         helper does NOT touch it (would double-count).
  _user_turn_count     ← already incremented at run_conversation:11793
                         — codex helper does NOT touch it.
  _iters_since_skill   ← incremented in the chat_completions loop per
                         tool iteration. Codex helper increments by
                         turn.tool_iterations since the loop is bypassed.

User message:

  ALREADY appended to messages by run_conversation pre-loop (line 11823)
  before the early-return reaches us. Helper does NOT append again.
  Regression test test_user_message_not_duplicated guards this.

Approval callback wiring:

  Lazy-fetches tools.terminal_tool._get_approval_callback at session
  spawn time, passes to CodexAppServerSession. CLI threads with
  prompt_toolkit get interactive approvals; gateway/cron contexts get
  the codex-side fail-closed deny.

Error path:

  Codex session exceptions become a 'partial' result with completed=False
  and a final_response that explicitly tells the user how to switch back:
  'Codex app-server turn failed: ... Fall back to default runtime with
  /codex-runtime auto.' Same return-dict shape as the chat_completions
  path so all callers (gateway, CLI, batch_runner, ACP) work unchanged.

9 new integration tests in tests/run_agent/test_codex_app_server_integration.py:
  - api_mode='codex_app_server' is accepted on AIAgent construction
  - run_conversation returns the expected codex shape
    (final_response, codex_thread_id, codex_turn_id, completed, partial)
  - Projected messages are spliced into messages list
  - _iters_since_skill ticks per tool iteration
  - _user_turn_count delegated to standard flow (not double-counted)
  - User message appears exactly once (regression guard)
  - _spawn_background_review IS invoked (memory/skill review keeps working)
  - chat.completions.create is NEVER called (loop fully bypassed)
  - Session exception → partial result with /codex-runtime auto hint
  - Interrupted turn → partial result with error preserved

Adjacent test runs confirm no regressions:
  - tests/run_agent/test_memory_nudge_counter_hydration.py: green
  - tests/run_agent/test_background_review.py: green
  - tests/run_agent/test_fallback_model.py: green
  - tests/agent/transports/: 249/249 green

Still missing for full feature: /codex-runtime slash command, plugin
migration helper, docs page, live e2e test gated on codex binary. Those
are the remaining followup commits.

* feat(codex-runtime): add /codex-runtime slash command (CLI + gateway)

User-facing toggle for the optional codex app-server runtime. Follows the
'Adding a Slash Command (All Platforms)' pattern from AGENTS.md exactly:
single CommandDef in the central registry → CLI handler → gateway handler
→ running-agent guard → all surfaces (autocomplete, /help, Telegram menu,
Slack subcommands) update automatically.

Surface:
    /codex-runtime                    — show current state + codex CLI status
    /codex-runtime auto               — Hermes default runtime
    /codex-runtime codex_app_server   — codex subprocess runtime
    /codex-runtime on / off           — synonyms

Files changed:

  hermes_cli/codex_runtime_switch.py (new):
    Pure-Python state machine shared by CLI and gateway. Parse args,
    read/write model.openai_runtime in the config dict, gate enabling
    behind a codex --version check (don't let users opt in to a runtime
    they have no binary for; print npm install hint instead).
    Returns a CodexRuntimeStatus dataclass that callers render however
    suits their surface.

  hermes_cli/commands.py:
    Single CommandDef entry, no aliases (codex-runtime is its own thing).

  cli.py:
    Dispatch in process_command() + _handle_codex_runtime() handler that
    delegates to the shared module and renders results via _cprint.

  gateway/run.py:
    Dispatch in _handle_message() + _handle_codex_runtime_command() that
    returns a string (gateway sends as message). On a successful change
    that requires a new session, _evict_cached_agent() forces the next
    inbound message to construct a fresh AIAgent with the new api_mode —
    avoids prompt-cache invalidation mid-session.

  gateway/run.py running-agent guard:
    /codex-runtime joins /model in the early-intercept block so a runtime
    flip mid-turn can't split a turn across two transports.

Tests:
  tests/hermes_cli/test_codex_runtime_switch.py — 25 tests covering the
  state machine: arg parsing (10 cases incl. case-insensitive and
  synonyms), reading current runtime (5 cases incl. malformed configs),
  writing runtime (3 cases), apply() entry point covering read-only,
  no-op, codex-missing-blocked, codex-present-success, disable-no-binary-check,
  and persist-failure paths (8 cases). All green.

Adjacent test suites confirm no regressions:
  - tests/hermes_cli/test_commands.py + test_codex_runtime_switch.py:
    167/167 green
  - tests/agent/transports/: 283/283 green when combined with prior commits

Still missing: plugin migration helper, docs page, live e2e test gated on
codex binary. Followup commits.

* feat(codex-runtime): auto-migrate Hermes MCP servers to ~/.codex/config.toml

Translates the user's mcp_servers config from ~/.hermes/config.yaml into
the TOML format codex's MCP client expects. Wired into the
/codex-runtime codex_app_server enable path so users get their MCP tool
surface in the spawned subprocess automatically.

The migration runs on every enable. Failures are non-fatal — the runtime
change still proceeds and the user gets a warning so they can fix the
codex config manually.

What translates (mapping verified against codex-rs/core/src/config/edit.rs):
  Hermes mcp_servers.<n>.command/args/env  → codex stdio transport
  Hermes mcp_servers.<n>.url/headers       → codex streamable_http transport
  Hermes mcp_servers.<n>.timeout           → codex tool_timeout_sec
  Hermes mcp_servers.<n>.connect_timeout   → codex startup_timeout_sec
  Hermes mcp_servers.<n>.cwd               → codex stdio cwd
  Hermes mcp_servers.<n>.enabled: false    → codex enabled = false

What does NOT translate (warned + skipped per server):
  Hermes-specific keys (sampling, etc.) — codex's MCP client has no
  equivalent. Listed in the per-server skipped[] field of the report.

What's NOT migrated (intentional):
  AGENTS.md — codex respects this file natively in its cwd. Hermes' own
  AGENTS.md (project-level) is already in the worktree, so codex picks
  it up without translation. No code needed.

Idempotency design:
  All managed content lives between a 'managed by hermes-agent' marker
  and the next non-mcp_servers section header. _strip_existing_managed_block
  removes the prior managed region cleanly, preserving any user-added
  codex config (model, providers.openai, sandbox profiles, etc.) above
  or below.

Files added:
  hermes_cli/codex_runtime_plugin_migration.py — pure-Python migration
    helper. Public API: migrate(hermes_config, codex_home=None,
    dry_run=False) returns MigrationReport with .migrated/.errors/
    .skipped_keys_per_server. No external TOML dependency — minimal
    formatter handles strings/numbers/booleans/lists/inline-tables.

  tests/hermes_cli/test_codex_runtime_plugin_migration.py — 39 tests
  covering:
    - per-server translation (12): stdio/http/sse, cwd, timeouts,
      enabled flag, command+url precedence, sampling drop, unknown keys
    - TOML formatter (8): types, escaping, inline tables, error case
    - existing-block stripping (4): no marker, alone, with user content
      above, with user content below
    - end-to-end migrate() (8): empty, dry-run, round-trip, idempotent
      re-run, preserves user config, error reporting, invalid input,
      summary formatting

Files changed:
  hermes_cli/codex_runtime_switch.py — apply() now calls migrate() in
    the codex_app_server enable branch. Migration failure logs a warning
    in the result message but does NOT fail the runtime change. Disable
    path (auto) explicitly skips migration.

  tests/hermes_cli/test_codex_runtime_switch.py — 3 new tests:
    test_enable_triggers_mcp_migration, test_disable_does_not_trigger_migration,
    test_migration_failure_does_not_block_enable.

All 325 feature tests green:
  - tests/agent/transports/: 249 (incl. 67 new)
  - tests/run_agent/test_codex_app_server_integration.py: 9
  - tests/hermes_cli/test_codex_runtime_switch.py: 28 (3 new)
  - tests/hermes_cli/test_codex_runtime_plugin_migration.py: 39 (new)

* perf(codex-runtime): cache codex --version check within apply()

Single /codex-runtime invocation could spawn 'codex --version' up to 3
times (state report, enable gate, success message). Each spawn is ~50ms,
so the cumulative cost wasn't a crisis, but it was wasteful and turned a
trivial slash command into something noticeably laggy on slower systems.

Refactored to lazy-once via a closure over a nonlocal cache. First call
spawns; subsequent calls in the same apply() reuse the result.

Behavior unchanged — same return shape, same error handling, same install
hint when codex is missing. Just one subprocess per call instead of three.

Two regression-guard tests added:
  - test_binary_check_cached_within_apply: enable path → call_count == 1
  - test_binary_check_cached_on_read_only_call: state-report path → call_count == 1

Total tests for /codex-runtime now 30 (was 28); all 143 codex-runtime
tests still green.

* fix(codex-runtime): correct protocol field names found via live e2e test

Three real bugs caught only by running a turn end-to-end against codex
0.130.0 with a real ChatGPT subscription. Unit tests passed because they
asserted on our own (incorrect) wire shapes; the wire format from
codex-rs/app-server-protocol/src/protocol/v2/* is the source of truth and
my initial reading of the README was incomplete.

Bug 1: thread/start.permissions wire format

Was sending {"profileId": "workspace-write"}.
Real format per PermissionProfileSelectionParams enum (tagged union):
  {"type": "profile", "id": "workspace-write"}
AND requires the experimentalApi capability declared during initialize.
AND requires a matching [permissions] table in ~/.codex/config.toml or
codex fails the request with 'default_permissions requires a [permissions]
table'.

Fix: stop overriding permissions on thread/start. Codex picks its default
profile (read-only unless user configures otherwise), which matches what
codex CLI users expect — they configure their default permission profile
in ~/.codex/config.toml the standard way. Trying to be clever about
profile selection broke every turn we tested.

Live error before fix: 'Invalid request: missing field type' on every
turn/start, even though our turn/start payload was correct — the field
codex was complaining about was inside the permissions sub-object we
shouldn't have been sending.

Bug 2: server-request method names

Was matching 'execCommandApproval' and 'applyPatchApproval'.
Real names per common.rs ServerRequest enum:
  item/commandExecution/requestApproval
  item/fileChange/requestApproval
  item/permissions/requestApproval (new third method)

Fix: match the documented names. Added handler for
item/permissions/requestApproval that always declines — codex sometimes
asks to escalate permissions mid-turn and silent acceptance would surprise
users.

Live symptom before fix: agent.log showed
'Unknown codex server request: item/commandExecution/requestApproval'
and codex stalled because we replied with -32601 (unsupported method)
instead of an approval decision. The agent reported back 'The write
command was rejected' even though Hermes never showed the user an
approval prompt.

Bug 3: approval decision values

Was sending decision strings 'approved'/'approvedForSession'/'denied'.
Real values per CommandExecutionApprovalDecision enum (camelCase):
  accept, acceptForSession, decline, cancel
(also AcceptWithExecpolicyAmendment and ApplyNetworkPolicyAmendment
variants we don't currently use).

Fix: rename _approval_choice_to_codex_decision return values; update
auto_approve_* fallbacks; update fail-closed default from 'denied' to
'decline'. Test mapping table updated to match.

Live test verified after fixes:
  $ hermes (with model.openai_runtime: codex_app_server)
  > Run the shell command: echo hermes-codex-livetest > .../proof.txt
    then read it back

  Approval prompt fired with 'Codex requests exec in <cwd>'.
  User chose 'Allow once'. Codex executed the command, wrote the file,
  read it back. Final response: 'Read back from proof.txt:
  hermes-codex-livetest'. File contents on disk match.

agent.log confirms:
  codex app-server thread started: id=019e200e profile=workspace-write
                                    cwd=/tmp/hermes-codex-livetest/workspace

All 20 session tests still green after wire-format updates.

* fix(codex-runtime): correct apply_patch approval params + ship docs

Live e2e revealed FileChangeRequestApprovalParams doesn't carry the
changeset (just itemId, threadId, turnId, reason, grantRoot) — Codex's
'reason' field describes what the patch wants to do. Test config and
display logic updated to use it. The first 'apply_patch (0 change(s))'
display from the live test is now 'apply_patch: <reason>'.

Adds website/docs/user-guide/features/codex-app-server-runtime.md
covering enable/disable, prerequisites, approval UX, MCP migration
behavior, permission profile delegation to ~/.codex/config.toml, known
limitations, and the architecture diagram. Wired into the Automation
category in sidebars.ts.

Live e2e validation across the path matrix:
  ✓ thread/start handshake
  ✓ turn/start with text input
  ✓ commandExecution items + projection
  ✓ item/commandExecution/requestApproval → Hermes UI → response
  ✓ Approve once → command runs
  ✓ Deny → command rejected, codex falls back to read-only message
  ✓ Multi-turn (codex remembers prior turn's results)
  ✓ apply_patch via Codex's fileChange path
  ✓ item/fileChange/requestApproval → Hermes UI
  ✓ MCP server migration loads inside spawned codex (verified via
    'use the filesystem MCP tool' prompt)
  ✓ /codex-runtime auto → codex_app_server toggle cycle
  ✓ Disable doesn't trigger migration
  ✓ Enable with codex CLI present succeeds + migrates
  ✓ Hermes-side interrupt path (turn/interrupt request issued cleanly
    even if codex finishes before the interrupt lands)

Known live-validated limitations now documented in the docs page:
  - delegate_task subagents unavailable on this runtime
  - permission profile selection delegated to ~/.codex/config.toml
  - apply_patch approval prompt has no inline changeset (codex protocol
    doesn't expose it)

145/145 codex-runtime tests still green.

* feat(codex-runtime): native plugin migration + UX polish (quirks 2/4/5/10/11)

Major: migrate native Codex plugins (#7 in OpenClaw's PR list)

Discovers installed curated plugins via codex's plugin/list RPC and
writes [plugins."<name>@<marketplace>"] entries to ~/.codex/config.toml
so they're enabled in the spawned Codex sessions. This is the
'YouTube-video-worthy' bit Pash highlighted: when a user has
google-calendar, github, etc. installed in their Codex CLI, those
plugins activate automatically when they enable Hermes' codex runtime.

Implementation:
  - hermes_cli/codex_runtime_plugin_migration.py: new _query_codex_plugins()
    helper spawns 'codex app-server' briefly and walks plugin/list. Returns
    (plugins, error) — failures are non-fatal so MCP migration still works.
  - render_codex_toml_section() now takes plugins + permissions args.
  - migrate() defaults: discover_plugins=True, default_permission_profile=
    'workspace-write'. Explicit None on either disables that side.
  - _strip_existing_managed_block() now also strips [plugins.*] and
    [permissions]/[permissions.*] sections inside the managed block, so
    re-runs replace plugins cleanly without touching codex's own config.

Quirk fixes:

#2 Default permissions profile written on enable.
   Without this, Codex's read-only default kicks in and EVERY write
   triggers an approval prompt. Now writes [permissions] default =
   'workspace-write' so the runtime feels normal out of the box. Set
   default_permission_profile=None to opt out.

#4 apply_patch approval prompt now shows what's changing.
   Codex's FileChangeRequestApprovalParams doesn't carry the changeset.
   Session adapter now caches the fileChange item from item/started
   notifications and looks it up by itemId when codex requests approval.
   Prompt shows '1 add, 1 update: /tmp/new.py, /tmp/old.py' instead of
   'apply_patch (0 change(s))'.

   Side benefit: also drains pending notifications BEFORE handling a
   server request, so the projector and per-turn caches are up to date
   when the approval decision fires. Bounded to 8 notifications per
   loop iter to avoid starving codex's response.

#5/#10 Exec approval prompt never shows empty cwd.
   When codex omits cwd in CommandExecutionRequestApprovalParams, fall
   back to the session's cwd. If somehow neither is available, show
   '<unknown>' explicitly instead of an empty string.

   Also surfaces 'reason' from the approval params when codex provides
   it — gives users more context on why codex wants to run something.

#11 Banner indicates the codex_app_server runtime when active.
   New 'Runtime: codex app-server (terminal/file ops/MCP run inside
   codex)' line appears in the welcome banner only when the runtime is
   on. Default banner is unchanged.

Tests:
  - 7 new tests in test_codex_runtime_plugin_migration.py covering
    plugin discovery (mocked), failure handling, dry-run skip, opt-out
    flag, idempotent re-runs, and permissions writing.
  - 3 new tests in test_codex_app_server_session.py covering the
    enriched approval prompts: cwd fallback, change summary on
    apply_patch, fallback when no item/started cache exists.
  - All 26 session tests + 46 migration tests green; 153 total in PR.

* feat(codex-runtime): hermes-tools MCP callback + native plugin migration

The big architectural addition: when codex_app_server runtime is on,
Hermes registers its own tool surface as an MCP server in
~/.codex/config.toml so the codex subprocess can call back into Hermes
for tools codex doesn't ship with — web_search, browser_*, vision,
image_generate, skills, TTS.

Also: 'migrate native codex plugins' (Pash's YouTube-video-worthy bit) —
when the user has plugins like Linear, GitHub, Gmail, Calendar, Canva
installed via 'codex plugin', Hermes discovers them via plugin/list and
writes [plugins.<name>@openai-curated] entries so they activate
automatically.

New module: agent/transports/hermes_tools_mcp_server.py
  FastMCP stdio server exposing 17 Hermes tools. Each call dispatches
  through model_tools.handle_function_call() — same code path as the
  Hermes default runtime. Run with:
    python -m agent.transports.hermes_tools_mcp_server [--verbose]

  Exposed: web_search, web_extract, browser_navigate / _click / _type /
    _press / _snapshot / _scroll / _back / _get_images / _console /
    _vision, vision_analyze, image_generate, skill_view, skills_list,
    text_to_speech.

  NOT exposed (deliberately):
    - terminal/shell/read_file/write_file/patch — codex has built-ins
    - delegate_task/memory/session_search/todo — _AGENT_LOOP_TOOLS in
      model_tools.py:493, require running AIAgent context. Documented
      as a limitation and surfaced in the slash command output.

Migration changes (hermes_cli/codex_runtime_plugin_migration.py):
  - _query_codex_plugins() spawns 'codex app-server' briefly to walk
    plugin/list and pull installed openai-curated plugins. Failures are
    non-fatal — MCP migration still completes.
  - render_codex_toml_section() now takes plugins + permissions args
    AND wraps the managed block with a MIGRATION_END_MARKER comment so
    the stripper can reliably find both ends, even when the block
    contains top-level keys (default_permissions = ...).
  - migrate() defaults: discover_plugins=True, expose_hermes_tools=True,
    default_permission_profile=':workspace' (built-in codex profile name
    — must be prefixed with ':'). All three opt-out via explicit args.
  - _build_hermes_tools_mcp_entry() builds the codex stdio entry with
    HERMES_HOME and PYTHONPATH passthrough so a worktree-launched
    Hermes points the MCP subprocess at the same module layout.

Live-caught wire bugs fixed during this turn:
  1. Permission profile config key is top-level , NOT a [permissions] table. The [permissions] table is
     for *user-defined* profiles with structured fields. Built-in
     profile names start with ':' (':workspace', ':read-only',
     ':danger-no-sandbox'). Was emitting
     which codex rejected with 'invalid type: string "X", expected
     struct PermissionProfileToml'.
  2. Built-in profile is , NOT . Codex
     rejected  with 'unknown built-in profile'.
  3. Codex's MCP layer sends  for
     tool-call confirmation. We weren't handling it, so codex stalled
     and returned 'MCP tool call was rejected'. Now: auto-accept for
     our own hermes-tools server (user already opted in by enabling
     the runtime), decline for third-party servers.

Quirk fixes shipped (from the limitations list):
  #2 default permissions: workspace profile written on enable. No more
     approval prompt on every write.
  #4 apply_patch approval shows what's changing: cache fileChange
     items from item/started, look up by itemId when codex sends
     item/fileChange/requestApproval. Prompt: '1 add, 1 update:
     /tmp/new.py, /tmp/old.py' instead of '0 change(s)'.
  #5/#10 exec approval cwd never empty: fall back to session cwd, then
     '<unknown>'. Also surfaces 'reason' from codex when present.
  #11 banner shows 'Runtime: codex app-server' line when active so
     users understand why tool counts may not match what's reachable.

Tests:
  - 5 new tests in test_codex_runtime_plugin_migration.py covering
    plugin discovery, expose_hermes_tools entry generation, idempotent
    re-runs, opt-out flag, permissions profile.
  - 3 new tests in test_codex_app_server_session.py covering enriched
    approval prompts (cwd fallback, fileChange summary).
  - 2 new tests for mcpServer/elicitation/request handling (accept
    hermes-tools, decline others).
  - New test file test_hermes_tools_mcp_server.py covering module
    surface, EXPOSED_TOOLS safety invariants (no shell/file_ops,
    no agent-loop tools), and main() error paths.
  - 166 codex-runtime tests total, all green.

Live e2e validated against codex 0.130.0 + ChatGPT subscription:
  ✓ /codex-runtime codex_app_server enables, migrates filesystem MCP,
    registers hermes-tools, writes default_permissions = ':workspace'
  ✓ Banner shows 'Runtime: codex app-server' line in subsequent sessions
  ✓ Shell command runs without approval prompt (workspace profile works)
  ✓ Multi-turn — codex remembers prior turn's results
  ✓ apply_patch path via fileChange request approval
  ✓ web_search via hermes-tools MCP callback returns real Firecrawl
    results: 'OpenAI Codex CLI – Getting Started' end-to-end in 13s
  ✓ Disable cycle clean

Docs updated: website/docs/user-guide/features/codex-app-server-runtime.md
  Full re-write covering native plugin migration, the hermes-tools
  callback architecture, the prerequisites change ('codex login is
  separate from hermes auth login codex'), the trade-off table now
  reflecting which Hermes tools work via callback, and the limitations
  list updated with what's actually unavailable on this runtime.

* feat(codex-runtime): pin user-config preservation invariant for quirk #6

Quirk #6 from the limitations list — user MCP servers / overrides /
codex-only sections in ~/.codex/config.toml that live OUTSIDE the
hermes-managed block must survive re-migration verbatim.

This already worked thanks to the MIGRATION_MARKER + MIGRATION_END_MARKER
pair I added when fixing the default_permissions wire format (so the
strip can find both ends of the managed region even with top-level
keys like default_permissions). But it was an emergent property
without a test pinning it.

Now explicitly tested:
  - User MCP server above the managed block survives migration
  - User MCP server below the managed block survives migration
  - Both above + below survive a second re-migration
  - User content (model, providers, sandbox, otel, etc.) outside our
    region is left untouched

Docs added a section "Editing ~/.codex/config.toml safely" explaining
the marker contract — so users know they can add their own MCP
servers, override permissions, configure codex-only options, etc.
without fear of Hermes overwriting their work.

167 codex-runtime tests, all green.

* docs(codex-runtime): clarify the actual tool surface — shell covers terminal/read/write/find

Previous docs and PR description undersold what codex's built-in
toolset actually provides. apply_patch alone made it sound like the
runtime could only edit files in patch format — implying you'd lose
terminal use, read_file, write_file, search/find. That was wrong.

Codex's 'shell' tool runs arbitrary shell commands inside the sandbox,
which covers everything you'd do in bash: cat/head/tail (read), echo>
or heredocs (write), find/rg/grep (search), ls/cd (navigate), build/
test/git/etc. apply_patch is for structured multi-file edits on top
of that. update_plan is its in-runtime todo. view_image loads images.
And codex has its own web_search built in (in addition to the
Firecrawl-backed one Hermes exposes via MCP callback).

Docs now have a 'What tools the model actually has' section right
after Why, breaking the surface into three clearly-labeled buckets:

  1. Codex's built-in toolset (always on) — shell, apply_patch,
     update_plan, view_image, web_search; covers everything terminal-
     adjacent.
  2. Native Codex plugins (auto-migrated from your codex plugin
     install) — Linear, GitHub, Gmail, Calendar, Outlook, Canva, etc.
  3. Hermes tool callback (MCP server in ~/.codex/config.toml) —
     web_search/web_extract via Firecrawl, browser_*, vision_analyze,
     image_generate, skill_view/skills_list, text_to_speech.

Plus a 'What's NOT available' callout listing the four agent-loop tools
(delegate_task, memory, session_search, todo) that need running
AIAgent context and can't reach the codex runtime.

Trade-offs table broken out: shell, apply_patch, update_plan,
view_image, sandbox each get their own row with a one-line description
so users can see at a glance what's available natively.

Architecture diagram updated to list the codex built-ins by name
instead of 'apply_patch + shell + sandbox'.

No code changes — purely docs clarification. 167 codex-runtime tests
still green.

* fix(codex-runtime): _spawn_background_review signature + review fork api_mode downgrade

Two real bugs in the self-improvement loop integration that the previous
test mocked away.

Bug 1: wrong call signature

The codex helper was calling self._spawn_background_review() with no
args after every turn. That function actually requires:
  messages_snapshot=list   (positional or keyword)
  review_memory=bool       (at least one trigger must be True)
  review_skills=bool

So the call would have raised TypeError at runtime — except the only
test that exercised this path mocked _spawn_background_review entirely
and just asserted spawn.called, so the wrong-arg shape never surfaced.

Bug 2: review fork inherits codex_app_server api_mode

The review fork is constructed with:
  api_mode = _parent_runtime.get('api_mode')

So when the parent is codex_app_server, the review fork ALSO runs as
codex_app_server. But the review fork's whole job is to call agent-loop
tools (memory, skill_manage) which require Hermes' own dispatch — they
short-circuit with 'must be handled by the agent loop' on the codex
runtime. So the review fork would have run, decided to save something,
called memory or skill_manage, and silently no-op'd.

Fixed in run_agent.py:_spawn_background_review() — when the parent
api_mode is 'codex_app_server', the review fork is downgraded to
'codex_responses' (same OAuth credentials, same openai-codex provider,
but talks to OpenAI's Responses API directly so Hermes owns the loop).

Also rewrote the codex helper's review wiring to match the
chat_completions path:
  - Computes _should_review_memory in the pre-loop block (was already
    being computed; now passed through to the helper as an arg).
  - Computes _should_review_skills AFTER the codex turn returns +
    counters tick (line ~15432 pattern in chat_completions).
  - Calls _spawn_background_review(messages_snapshot=, review_memory=,
    review_skills=) only when at least one trigger fires.
  - Adds the external memory provider sync (_sync_external_memory_for_turn)
    that the chat_completions path runs after every turn.

Tests:

  Replaced the broken test_background_review_invoked (which only
  asserted spawn.called) with three sharper tests:
    - test_background_review_NOT_invoked_below_threshold:
      single turn at default thresholds → no review fires (would have
      caught the original 'every turn calls spawn with no args' bug)
    - test_background_review_skill_trigger_fires_above_threshold:
      10 tool_iterations at threshold=10 → review fires with
      messages_snapshot=list, review_skills=True, counter resets
    - test_background_review_signature_never_breaks: regression guard
      asserting positional args are always empty and kwargs include
      messages_snapshot

  New TestReviewForkApiModeDowngrade class:
    - test_codex_app_server_parent_downgrades_review_fork: drives the
      real _spawn_background_review function (no mock at that level),
      asserts the review_agent gets api_mode='codex_responses' when
      the parent was codex_app_server.

Live-validated against real run_conversation:
  - Counter ticked from 0 to 5 after a 5-tool-iteration turn
  - _spawn_background_review fired exactly once with kwargs-only signature
  - review_skills=True, review_memory=False
  - messages_snapshot was 12 entries (5 assistant tool_calls + 5 tool
    results + 1 final assistant + initial system/user)
  - Counter reset to 0 after fire

170 codex-runtime tests, all green.

Docs: added a Self-improvement loop section to the codex runtime page
explaining both how the trigger logic stays equivalent and that the
review fork is auto-downgraded to codex_responses for the agent-loop
tools. Also clarified that apply_patch and update_plan ARE codex's
built-in tools (the previous version made it sound like they were
separate from 'codex's stuff' — they're not, all five tools listed
in 'What tools the model actually has' section 1 are codex built-ins).

* feat(codex-runtime): expose kanban tools through Hermes MCP callback

Kanban workers spawn as separate hermes chat -q subprocesses that read
the user's config.yaml. If model.openai_runtime: codex_app_server is set
globally (which is the whole point of opt-in), every dispatched worker
ALSO comes up on the codex runtime.

That mostly works — codex's built-in shell + apply_patch + update_plan
do the actual task work fine — but it had one critical break: the
worker handoff tools (kanban_complete, kanban_block, kanban_comment,
kanban_heartbeat) are Hermes-registered tools, not codex built-ins.
On the codex runtime, codex builds its own tool list and these never
reach the model, so the worker would do the work but not be able to
report back, hanging until the dispatcher's timeout escalates it as
zombie.

Fix: add all 9 kanban tools to the EXPOSED_TOOLS list in the Hermes
MCP callback. They dispatch statelessly through handle_function_call()
just like web_search and the others — they read HERMES_KANBAN_TASK
from env (set by the dispatcher), gate correctly (worker tools require
the env var, orchestrator tools require it unset), and write to
~/.hermes/kanban.db.

Why kanban tools work via stateless dispatch when delegate_task/memory/
session_search/todo don't: those four are listed in _AGENT_LOOP_TOOLS
(model_tools.py:493) and short-circuit in handle_function_call() with
'must be handled by the agent loop' — they need to mutate AIAgent's
mid-loop state. Kanban tools have no such requirement; they're pure
side-effect functions against the kanban.db plus state_meta.

Tools exposed:
  Worker handoff (require HERMES_KANBAN_TASK):
    kanban_complete, kanban_block, kanban_comment, kanban_heartbeat
  Read-only board queries:
    kanban_show, kanban_list
  Orchestrator (require HERMES_KANBAN_TASK unset):
    kanban_create, kanban_unblock, kanban_link

Tests:
  - test_kanban_worker_tools_exposed: complete/block/comment/heartbeat
    in EXPOSED_TOOLS (regression guard for the would-hang-worker bug)
  - test_kanban_orchestrator_tools_exposed: create/show/list/unblock/link

Docs:
  - New 'Workflow features' section in the docs page covering /goal,
    kanban, and cron behavior on this runtime
  - /goal: works fully via run_conversation feedback; only caveat is
    approval-prompt noise on long writes-heavy goals (mitigated by
    the default :workspace permission profile)
  - Kanban: enumerated which tools are reachable via the callback and
    why the env var propagates correctly through the codex subprocess
    to the MCP server subprocess
  - Cron: documented as 'not specifically tested' — same rules as the
    CLI apply since cron runs through AIAgent.run_conversation
  - Trade-offs table gained rows for /goal, kanban worker, kanban
    orchestrator

172/172 codex-runtime tests green (+2 from kanban tests).

* docs(codex-runtime): wire /codex-runtime into slash-commands ref + flag aux token cost

Three docs gaps caught during a final audit:

1. /codex-runtime was only in the feature docs page, not in the
   slash-commands reference. Added rows to both the CLI section and
   the Messaging section so users discover it where they'd look for
   slash command syntax.

2. CODEX_HOME and HERMES_KANBAN_TASK weren't in environment-variables.md.
   CODEX_HOME lets users redirect Codex CLI's config dir (the migration
   honors it). HERMES_KANBAN_TASK is set by the kanban dispatcher and
   propagates to the codex subprocess + the hermes-tools MCP subprocess
   so kanban worker tools gate correctly — documented as 'don't set
   manually' since it's an internal handoff.

3. Aux client behavior on this runtime. When openai_runtime=
   codex_app_server is on with the openai-codex provider, every aux
   task (title generation, context compression, vision auto-detect,
   session search summarization, the background self-improvement review
   fork) flows through the user's ChatGPT subscription by default.

   This is true for the existing codex_responses path too, but it's
   more visible / important here because users explicitly opted in for
   subscription billing. Added a 'Auxiliary tasks and ChatGPT
   subscription token cost' section to the docs page with a YAML
   example showing how to override specific aux tasks to a cheaper
   model (typically google/gemini-3-flash-preview via OpenRouter).

   Also documents how the self-improvement review fork gets
   auto-downgraded from codex_app_server to codex_responses by the
   fix earlier in this PR.

No code changes — pure docs. 172 codex-runtime tests still green.

* docs+test(codex-runtime): pin HOME passthrough, document multi-profile + CODEX_HOME

OpenClaw hit a real footgun in openclaw/openclaw#81562: when spawning
codex app-server they were synthesizing a per-agent HOME alongside
CODEX_HOME. That made every subprocess codex's shell tool launches
(gh, git, aws, npm, gcloud, ...) see a fake $HOME and miss the user's
real config files. They had to back it out in PR #81562 — keep
CODEX_HOME isolation, leave HOME alone.

Audit confirms Hermes' codex spawn doesn't have this problem. We do
os.environ.copy() and only overlay CODEX_HOME (when provided) and
RUST_LOG. HOME passes through unchanged. But it was an emergent
property without a test pinning it, so adding a regression guard:

  test_spawn_env_preserves_HOME — confirms parent HOME survives intact
                                  in the subprocess env
  test_spawn_env_sets_CODEX_HOME_when_provided — confirms codex_home
                                                  arg still isolates
                                                  codex state correctly

Docs additions:

  'HOME environment variable passthrough' section — calls out the
  contract explicitly: CODEX_HOME isolates codex's own state, HOME
  stays user-real so gh/git/aws/npm/etc. find their normal config.
  Cites openclaw#81562 as the cautionary tale.

  'Multi-profile / multi-tenant setups' section — addresses the
  related concern: profiles share ~/.codex/ by default. For users who
  want per-profile codex isolation (separate auth, separate plugins),
  documents the manual CODEX_HOME=<profile-scoped-dir> approach.

  Explains why we DON'T auto-scope CODEX_HOME per profile: doing so
  would silently invalidate existing codex login state for anyone
  upgrading to this PR with tokens already at ~/.codex/auth.json.
  Opt-in is safer than surprising users.

174 codex-runtime tests (+2 from HOME guards), all green.

* fix(codex-runtime): TOML control-char escapes + atomic config.toml write

Two footguns caught in a final audit pass before merge.

Bug 1: TOML control characters not escaped

The _format_toml_value() helper escaped backslashes and double quotes
but passed literal control characters (\n, \t, \r, \f, \b) through
unchanged. TOML basic strings don't allow literal control characters
— a path or env var containing a newline would produce invalid TOML
that codex refuses to load.

Realistic exposure: pathological cases like a HERMES_HOME with a
trailing newline (env var concatenation accident), or a PYTHONPATH
with a tab from a multi-line shell heredoc.

Fix: escape all five TOML basic-string control sequences (\b \t \n
\f \r) in addition to \\ and \" that we already did. Order
matters — backslash must come first or the other escapes get
re-escaped.

Bug 2: config.toml write wasn't atomic

If the python process crashed between target.mkdir() and the
write_text() finishing, a half-written config.toml could be left
behind. On NFS / Windows / some FUSE mounts this is a real concern;
on ext4/APFS small writes are usually atomic in practice but not
guaranteed.

Fix: write to a tempfile.mkstemp() temp file in the same directory,
then Path.replace() (atomic same-dir rename on POSIX, ReplaceFile on
Windows). On rename failure, clean up the temp file so repeated
failed migrations don't pile up .config.toml.* files.

Tests:
  - test_string_with_newline_escaped — \n in value → \n in output
  - test_string_with_tab_escaped — \t in value → \t in output
  - test_string_with_other_controls_escaped — \r, \f, \b
  - test_windows_path_escaped_correctly — backslash doubling
  - test_atomic_write_no_temp_leak_on_success — no .config.toml.*
    left over after a successful write
  - test_atomic_write_cleanup_on_rename_failure — temp file removed
    when Path.replace raises (simulated disk full)

180 codex-runtime tests, all green (+6 from this commit).

Footguns audited but NOT fixed (with rationale):

- Concurrent migrations race. Two Hermes processes hitting
  /codex-runtime codex_app_server within seconds of each other could
  cause one writer to lose entries. Low probability (you'd have to
  enable from two surfaces simultaneously) and low impact (just re-run
  migration). Adding fcntl/msvcrt locking is more code than it's
  worth here. The atomic rename above means each individual write is
  consistent — only the merge step is racy.

- Codex protocol version drift. We pin MIN_CODEX_VERSION=0.125 and
  check at runtime but don't reject too-new versions. Right call —
  the protocol has been stable through 0.125 → 0.130. If OpenAI
  breaks it later we'd see the error in test_codex_app_server_runtime
  on CI before users hit it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant