feat(deploy): hermes.service systemd unit + migrate-from-wamba-build.sh#5
Conversation
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📥 CommitsReviewing files that changed from the base of the PR and between 3c1db255fa2492ed16e1cb18dff6e079f535e447 and 89b30c4. 📒 Files selected for processing (3)
📝 WalkthroughWalkthroughThis PR adds production deployment infrastructure for Hermes on the eligia VPS, transitioning from legacy unversioned snapshots to repo-managed configuration. It introduces a systemd service unit, an idempotent migration script, and updates deployment documentation accordingly. ChangesHermes VPS deployment infrastructure
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (3)
deploy/eligia-vps/hermes.service (1)
16-17: ⚡ Quick winSpecify compose file explicitly in ExecStart for consistency.
ExecStart relies on
WorkingDirectory=/opt/hermesto finddocker-compose.yml, but ExecStop explicitly specifies-f /opt/hermes/docker-compose.yml. This inconsistency makes the unit harder to understand and could cause issues if WorkingDirectory behavior changes or if the compose file is in a different location.♻️ Proposed refactor for consistency
-ExecStart=/usr/local/bin/sops exec-env /opt/eligia/eligia-core/secrets/secrets.env 'docker compose up -d --force-recreate' +ExecStart=/usr/local/bin/sops exec-env /opt/eligia/eligia-core/secrets/secrets.env '/usr/bin/docker compose -f /opt/hermes/docker-compose.yml up -d --force-recreate'🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deploy/eligia-vps/hermes.service` around lines 16 - 17, Update the systemd unit's ExecStart to explicitly pass the compose file path like ExecStop does: modify the ExecStart line that currently runs "/usr/local/bin/sops exec-env /opt/eligia/eligia-core/secrets/secrets.env 'docker compose up -d --force-recreate'" so it includes "-f /opt/hermes/docker-compose.yml" (preserving the sops wrapper and flags) to match ExecStop and avoid relying on WorkingDirectory; reference the ExecStart and ExecStop directives when making the change.deploy/eligia-vps/migrate-from-wamba-build.sh (1)
39-44: 💤 Low valueSymlink replacement doesn't backup the previous target.
If
${COMPOSE_LIVE}is already a symlink (perhaps pointing to a different location), the condition on line 39 prevents creating a backup, andln -sfon line 44 silently replaces it. While this might be intentional for idempotency, it could surprise operators who manually changed the symlink target.📝 Optional: Warn when replacing existing symlink
if [[ -f "${COMPOSE_LIVE}" && ! -L "${COMPOSE_LIVE}" ]]; then echo " ➤ Backing up ${COMPOSE_LIVE} → ${COMPOSE_LIVE}.bak-${TS}" cp "${COMPOSE_LIVE}" "${COMPOSE_LIVE}.bak-${TS}" +elif [[ -L "${COMPOSE_LIVE}" ]]; then + echo " ⚠ ${COMPOSE_LIVE} is already a symlink ($(readlink "${COMPOSE_LIVE}")); replacing it" fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deploy/eligia-vps/migrate-from-wamba-build.sh` around lines 39 - 44, The script currently only backs up when ${COMPOSE_LIVE} is a regular file and silently replaces existing symlinks; modify the logic around the COMPOSE_LIVE handling to also detect when it's a symlink (use readlink on ${COMPOSE_LIVE}), create a timestamped backup of the symlink target (or at least warn and copy the resolved file to ${COMPOSE_LIVE}.bak-${TS}) before running ln -sf, and ensure the echo message distinguishes backing up a symlink target versus a regular file; refer to the COMPOSE_LIVE variable, the existing backup naming pattern (.bak-${TS}), the readlink resolution, and the ln -sf invocation when implementing the change.deploy/eligia-vps/README.md (1)
84-88: 💤 Low valueMigration script downloads from live main branch.
The curl command fetches from
main, which means the script version can change between when these docs are read and when the command is run. While convenient for always getting the latest version, this creates a reproducibility gap and prevents verification of the exact script being executed. Consider documenting a verification step or using a tagged release for production migrations.📝 Optional: Add verification or use tagged version
Add a verification step before execution:
> ```bash > # Download and review before executing > ssh root@178.104.140.21 'cd /tmp && \ > curl -sL -o migrate.sh https://raw.githubusercontent.com/Wizarck/hermes-agent/main/deploy/eligia-vps/migrate-from-wamba-build.sh && \ > less migrate.sh && \ > bash migrate.sh' > ```Or reference a specific commit/tag instead of
mainfor reproducible deployments.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deploy/eligia-vps/README.md` around lines 84 - 88, The README's remote-install curl command downloads migrate-from-wamba-build.sh from the repo's main branch, which breaks reproducibility and auditability; update the command in deploy/eligia-vps/README.md to either point to a specific tag or commit SHA for migrate-from-wamba-build.sh or add a download-and-verify step that saves the script (e.g., curl -o migrate.sh), lets the operator inspect/verify its contents/signature, and only then executes it; locate the curl line that references https://raw.githubusercontent.com/.../main/deploy/eligia-vps/migrate-from-wamba-build.sh and replace it with a pinned URL or add the download-review-execute workflow described above.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@deploy/eligia-vps/hermes.service`:
- Line 16: The ExecStart command uses an unqualified "docker compose" inside the
sops exec-env invocation which can fail under a restricted PATH; update the
ExecStart line so the docker executable is referenced via its absolute path (use
/usr/bin/docker) inside the sops exec-env command string (i.e., replace the
unqualified "docker compose" with "/usr/bin/docker compose") to match the usage
on the following line and ensure systemd can find the binary.
- Line 16: Add a prerequisite validation before the migration step that restarts
the service: check that /opt/eligia/eligia-core/secrets/secrets.env exists and
is decryptable before invoking the systemd restart that triggers ExecStart in
hermes.service. Implement a bash check in the migration script (before "step 5"
/ the restart operation) that tests file existence (stat/-f) and attempts a
harmless sops decryption (e.g., sops --decrypt or sops exec-env dry-run) to
verify the AGE key can decrypt; if either check fails, print the two-line error
message from the review and exit nonzero so the healthcheck path (which relies
on ExecStart) never runs. Ensure the error message exactly mentions the
secrets.env path and the README prerequisite to help operators.
- Line 11: The service unit hardcodes
SOPS_AGE_KEY_FILE=/root/.config/sops/age/keys.txt (hermes.service Environment)
but the migration/disaster-recovery flow doesn't validate that file exists;
update the migration script that runs before step 2 to check for the age key at
/root/.config/sops/age/keys.txt and fail with a clear message if missing, or
alternatively add an explicit prerequisite in the README/disaster recovery
instructions that the SOPS age key must be present at that exact path before
performing step 2; reference the hermes.service Environment variable name
SOPS_AGE_KEY_FILE when adding the check/docs so maintainers can locate the
related configuration.
In `@deploy/eligia-vps/migrate-from-wamba-build.sh`:
- Around line 64-67: The docker build step currently overwrites/creates the
production IMAGE_TAG directly and can leave the system inconsistent on failure;
change the flow to build to a temporary/test tag (e.g., IMAGE_TAG_TMP) using
SOURCE_DIR and DOCKERFILE_REL, verify the image exists and passes a quick smoke
test (docker inspect and a short container run that checks service start), and
only after the verification succeeds replace the production IMAGE_TAG, update
symlinks and reload the systemd unit; if verification fails, abort and leave
current production image/unit unchanged and log actionable error/rollback steps.
- Around line 73-80: The health-check loop using the docker inspect hermes
command may exit after 12 attempts without failing the script; after the
for-loop finishes check the final status variable and if it is not "healthy"
emit a prominent error message (including the last observed ${status}) and exit
non‑zero so the deployment fails visibly; otherwise continue normally. Reference
the loop that sets status via docker inspect hermes --format
'{{.State.Health.Status}}' and the status variable to implement the post-loop
check and exit behavior.
- Around line 55-57: The current block compares
SOURCE_DIR/deploy/eligia-vps/hermes.service with
/etc/systemd/system/hermes.service and always attempts to back up the target,
which fails if the target doesn't exist; modify the logic so you only create the
backup if /etc/systemd/system/hermes.service exists (e.g., wrap the cp
"/etc/systemd/system/hermes.service"
"/etc/systemd/system/hermes.service.bak-${TS}" in a test -e or [ -f ] check),
keeping the existing cmp -s comparison and the subsequent copy from
"${SOURCE_DIR}/deploy/eligia-vps/hermes.service" to
/etc/systemd/system/hermes.service unchanged.
- Line 15: The script currently sets strict shell options (set -euo pipefail)
but lacks a root check, so add an early guard that verifies the script is
running as root (check $EUID or id -u) and exits with a clear error if not;
place this check near the top of migrate-from-wamba-build.sh immediately after
the existing set -euo pipefail line and before any operations that touch
/etc/systemd/system/ or install systemd units to prevent cryptic permission
errors.
- Around line 28-29: The script runs git -C "${SOURCE_DIR}" fetch origin main
followed immediately by git -C "${SOURCE_DIR}" reset --hard origin/main which
can discard local changes or act on a stale fetch; modify the logic to verify
the fetch succeeded (check the exit status of the git fetch called in the block
using SOURCE_DIR) and abort with an error message if it failed, and before
running reset --hard detect and warn (or require confirmation/log) if there are
local changes (use git -C "${SOURCE_DIR}" status --porcelain or equivalent) so
the destructive reset is only performed after a successful fetch and an explicit
warning/consent when local work would be lost.
---
Nitpick comments:
In `@deploy/eligia-vps/hermes.service`:
- Around line 16-17: Update the systemd unit's ExecStart to explicitly pass the
compose file path like ExecStop does: modify the ExecStart line that currently
runs "/usr/local/bin/sops exec-env /opt/eligia/eligia-core/secrets/secrets.env
'docker compose up -d --force-recreate'" so it includes "-f
/opt/hermes/docker-compose.yml" (preserving the sops wrapper and flags) to match
ExecStop and avoid relying on WorkingDirectory; reference the ExecStart and
ExecStop directives when making the change.
In `@deploy/eligia-vps/migrate-from-wamba-build.sh`:
- Around line 39-44: The script currently only backs up when ${COMPOSE_LIVE} is
a regular file and silently replaces existing symlinks; modify the logic around
the COMPOSE_LIVE handling to also detect when it's a symlink (use readlink on
${COMPOSE_LIVE}), create a timestamped backup of the symlink target (or at least
warn and copy the resolved file to ${COMPOSE_LIVE}.bak-${TS}) before running ln
-sf, and ensure the echo message distinguishes backing up a symlink target
versus a regular file; refer to the COMPOSE_LIVE variable, the existing backup
naming pattern (.bak-${TS}), the readlink resolution, and the ln -sf invocation
when implementing the change.
In `@deploy/eligia-vps/README.md`:
- Around line 84-88: The README's remote-install curl command downloads
migrate-from-wamba-build.sh from the repo's main branch, which breaks
reproducibility and auditability; update the command in
deploy/eligia-vps/README.md to either point to a specific tag or commit SHA for
migrate-from-wamba-build.sh or add a download-and-verify step that saves the
script (e.g., curl -o migrate.sh), lets the operator inspect/verify its
contents/signature, and only then executes it; locate the curl line that
references
https://raw.githubusercontent.com/.../main/deploy/eligia-vps/migrate-from-wamba-build.sh
and replace it with a pinned URL or add the download-review-execute workflow
described above.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 2af760f1-8f7c-4d15-9a16-2d1a22fed643
📥 Commits
Reviewing files that changed from the base of the PR and between 79414a9 and 3c1db255fa2492ed16e1cb18dff6e079f535e447.
📒 Files selected for processing (3)
deploy/eligia-vps/README.mddeploy/eligia-vps/hermes.servicedeploy/eligia-vps/migrate-from-wamba-build.sh
| Type=oneshot | ||
| RemainAfterExit=yes | ||
| WorkingDirectory=/opt/hermes | ||
| Environment=SOPS_AGE_KEY_FILE=/root/.config/sops/age/keys.txt |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if the migration script or README documents the SOPS key prerequisite
rg -n "SOPS_AGE_KEY_FILE|keys\.txt|age.*key" deploy/eligia-vps/Repository: Wizarck/hermes-agent
Length of output: 234
🏁 Script executed:
cat -n deploy/eligia-vps/README.md | sed -n '120,150p'Repository: Wizarck/hermes-agent
Length of output: 2116
🏁 Script executed:
fd -type f -name "*.md" -o -name "*setup*" -o -name "*deploy*" -o -name "*migration*" | head -20Repository: Wizarck/hermes-agent
Length of output: 235
🏁 Script executed:
rg -A 5 -B 5 "sops|age" deploy/eligia-vps/README.mdRepository: Wizarck/hermes-agent
Length of output: 5541
🏁 Script executed:
cat deploy/eligia-vps/migrate-from-wamba-build.shRepository: Wizarck/hermes-agent
Length of output: 3865
Document the SOPS age key prerequisite in the migration instructions.
The hardcoded path /root/.config/sops/age/keys.txt is not validated by the migration script and is not explicitly documented in the disaster recovery steps. If the age key file is missing or located elsewhere, the service will fail to decrypt secrets during startup. Add a check in the migration script or explicitly document in the README that the age key must be present at this path before step 2 of the disaster recovery sequence.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deploy/eligia-vps/hermes.service` at line 11, The service unit hardcodes
SOPS_AGE_KEY_FILE=/root/.config/sops/age/keys.txt (hermes.service Environment)
but the migration/disaster-recovery flow doesn't validate that file exists;
update the migration script that runs before step 2 to check for the age key at
/root/.config/sops/age/keys.txt and fail with a clear message if missing, or
alternatively add an explicit prerequisite in the README/disaster recovery
instructions that the SOPS age key must be present at that exact path before
performing step 2; reference the hermes.service Environment variable name
SOPS_AGE_KEY_FILE when adding the check/docs so maintainers can locate the
related configuration.
| # into the docker-compose process so `${VAR}` placeholders in compose.yaml | ||
| # resolve to real values. Without sops, the compose env block would expand | ||
| # to empty strings and the container would start crippled. | ||
| ExecStart=/usr/local/bin/sops exec-env /opt/eligia/eligia-core/secrets/secrets.env 'docker compose up -d --force-recreate' |
There was a problem hiding this comment.
Use absolute path for docker command in ExecStart.
Line 16 uses docker compose without an absolute path inside the sops exec-env shell command, while line 17 uses /usr/bin/docker. The PATH environment variable inside sops exec-env may differ from the normal systemd PATH, potentially causing command-not-found failures.
🔧 Proposed fix to use absolute path consistently
-ExecStart=/usr/local/bin/sops exec-env /opt/eligia/eligia-core/secrets/secrets.env 'docker compose up -d --force-recreate'
+ExecStart=/usr/local/bin/sops exec-env /opt/eligia/eligia-core/secrets/secrets.env '/usr/bin/docker compose up -d --force-recreate'📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ExecStart=/usr/local/bin/sops exec-env /opt/eligia/eligia-core/secrets/secrets.env 'docker compose up -d --force-recreate' | |
| ExecStart=/usr/local/bin/sops exec-env /opt/eligia/eligia-core/secrets/secrets.env '/usr/bin/docker compose up -d --force-recreate' |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deploy/eligia-vps/hermes.service` at line 16, The ExecStart command uses an
unqualified "docker compose" inside the sops exec-env invocation which can fail
under a restricted PATH; update the ExecStart line so the docker executable is
referenced via its absolute path (use /usr/bin/docker) inside the sops exec-env
command string (i.e., replace the unqualified "docker compose" with
"/usr/bin/docker compose") to match the usage on the following line and ensure
systemd can find the binary.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if migration script verifies eligia-core repo or secrets.env
rg -n "eligia-core|secrets\.env" deploy/eligia-vps/migrate-from-wamba-build.shRepository: Wizarck/hermes-agent
Length of output: 46
🏁 Script executed:
# First, check the migration script content
cat -n deploy/eligia-vps/migrate-from-wamba-build.shRepository: Wizarck/hermes-agent
Length of output: 4544
🏁 Script executed:
# Also check the full hermes.service file
cat -n deploy/eligia-vps/hermes.serviceRepository: Wizarck/hermes-agent
Length of output: 1018
🏁 Script executed:
# Search for any other references to eligia-core in the entire repository
rg -n "eligia-core" --type-list | head -20Repository: Wizarck/hermes-agent
Length of output: 454
🏁 Script executed:
# Search for any validation or pre-flight check scripts
fd -type f -name "*check*" -o -name "*valid*" -o -name "*verify*" deploy/Repository: Wizarck/hermes-agent
Length of output: 235
🏁 Script executed:
# Check if there's any documentation about deployment prerequisites
fd -type f \( -name "README*" -o -name "*.md" \) deploy/ | head -10Repository: Wizarck/hermes-agent
Length of output: 235
🏁 Script executed:
# Search for README or deployment docs
fd -type f \( -name "README*" -o -name "*.md" \) | grep -E "(deploy|README)" | head -20Repository: Wizarck/hermes-agent
Length of output: 235
🏁 Script executed:
# Search for any references to eligia-core or secrets.env setup
rg -n "eligia-core|secrets\.env" --type md --type txtRepository: Wizarck/hermes-agent
Length of output: 981
🏁 Script executed:
# Check if there are any runbook or setup docs
fd -type f -name "*runbook*" -o -name "*setup*" -o -name "*install*"Repository: Wizarck/hermes-agent
Length of output: 235
🏁 Script executed:
# Read the deploy/eligia-vps/README.md file
cat -n deploy/eligia-vps/README.mdRepository: Wizarck/hermes-agent
Length of output: 8451
Add prerequisite validation before service restart.
The migration script progresses to restart the service without verifying that /opt/eligia/eligia-core/secrets/secrets.env exists and is decryptable. If this file is missing or the SOPS age key cannot decrypt it, the ExecStart command will fail silently during the healthcheck phase (lines 73–80), causing deployment to appear successful while the service actually fails to start.
The README documents this as a prerequisite (lines 133–134 in the disaster recovery section), but operators running the migration script directly may miss it. Add a validation check before step 5 (line 70) to ensure the secrets file is present and readable:
if [[ ! -f "/opt/eligia/eligia-core/secrets/secrets.env" ]]; then
echo "ERROR: /opt/eligia/eligia-core/secrets/secrets.env not found."
echo "Required prerequisite: Clone Wizarck/eligia-core and decrypt secrets."
exit 1
fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deploy/eligia-vps/hermes.service` at line 16, Add a prerequisite validation
before the migration step that restarts the service: check that
/opt/eligia/eligia-core/secrets/secrets.env exists and is decryptable before
invoking the systemd restart that triggers ExecStart in hermes.service.
Implement a bash check in the migration script (before "step 5" / the restart
operation) that tests file existence (stat/-f) and attempts a harmless sops
decryption (e.g., sops --decrypt or sops exec-env dry-run) to verify the AGE key
can decrypt; if either check fails, print the two-line error message from the
review and exit nonzero so the healthcheck path (which relies on ExecStart)
never runs. Ensure the error message exactly mentions the secrets.env path and
the README prerequisite to help operators.
| # Run on the VPS as root: | ||
| # bash deploy/eligia-vps/migrate-from-wamba-build.sh | ||
|
|
||
| set -euo pipefail |
There was a problem hiding this comment.
Verify script is running as root.
The script comments (line 12) state "Run on the VPS as root", and operations like installing systemd units and modifying /etc/systemd/system/ require root privileges. However, the script doesn't verify it's actually running as root. If run by a non-root user, it will fail with cryptic permission errors.
🛡️ Proposed fix to add root check
set -euo pipefail
+
+if [[ $EUID -ne 0 ]]; then
+ echo "Error: This script must be run as root" >&2
+ exit 1
+fi
REPO_URL="https://github.com/Wizarck/hermes-agent.git"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| set -euo pipefail | |
| set -euo pipefail | |
| if [[ $EUID -ne 0 ]]; then | |
| echo "Error: This script must be run as root" >&2 | |
| exit 1 | |
| fi | |
| REPO_URL="https://github.com/Wizarck/hermes-agent.git" |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deploy/eligia-vps/migrate-from-wamba-build.sh` at line 15, The script
currently sets strict shell options (set -euo pipefail) but lacks a root check,
so add an early guard that verifies the script is running as root (check $EUID
or id -u) and exits with a clear error if not; place this check near the top of
migrate-from-wamba-build.sh immediately after the existing set -euo pipefail
line and before any operations that touch /etc/systemd/system/ or install
systemd units to prevent cryptic permission errors.
| git -C "${SOURCE_DIR}" fetch origin main | ||
| git -C "${SOURCE_DIR}" reset --hard origin/main |
There was a problem hiding this comment.
Verify git fetch success before destructive reset.
git reset --hard origin/main on line 29 will discard any local changes without warning. If the fetch on line 28 fails (network issue, auth problem), the reset will use stale origin/main or fail unexpectedly. Consider verifying the fetch succeeded and warning about discarded local changes.
🛡️ Proposed fix to verify fetch and warn user
echo " ✓ ${SOURCE_DIR} is already a git checkout — pulling latest."
- git -C "${SOURCE_DIR}" fetch origin main
+ git -C "${SOURCE_DIR}" fetch origin main || { echo " ✗ fetch failed"; exit 1; }
+ echo " ⚠ Discarding any local changes with hard reset"
git -C "${SOURCE_DIR}" reset --hard origin/main🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deploy/eligia-vps/migrate-from-wamba-build.sh` around lines 28 - 29, The
script runs git -C "${SOURCE_DIR}" fetch origin main followed immediately by git
-C "${SOURCE_DIR}" reset --hard origin/main which can discard local changes or
act on a stale fetch; modify the logic to verify the fetch succeeded (check the
exit status of the git fetch called in the block using SOURCE_DIR) and abort
with an error message if it failed, and before running reset --hard detect and
warn (or require confirmation/log) if there are local changes (use git -C
"${SOURCE_DIR}" status --porcelain or equivalent) so the destructive reset is
only performed after a successful fetch and an explicit warning/consent when
local work would be lost.
| if ! cmp -s "${SOURCE_DIR}/deploy/eligia-vps/hermes.service" /etc/systemd/system/hermes.service; then | ||
| cp /etc/systemd/system/hermes.service "/etc/systemd/system/hermes.service.bak-${TS}" | ||
| cp "${SOURCE_DIR}/deploy/eligia-vps/hermes.service" /etc/systemd/system/hermes.service |
There was a problem hiding this comment.
Backup will fail if hermes.service doesn't exist yet.
On a fresh install where /etc/systemd/system/hermes.service doesn't exist, cmp on line 55 will evaluate false (files differ), but line 56 will fail trying to backup a non-existent file.
🛡️ Proposed fix to check existence before backup
if ! cmp -s "${SOURCE_DIR}/deploy/eligia-vps/hermes.service" /etc/systemd/system/hermes.service; then
- cp /etc/systemd/system/hermes.service "/etc/systemd/system/hermes.service.bak-${TS}"
+ if [[ -f /etc/systemd/system/hermes.service ]]; then
+ cp /etc/systemd/system/hermes.service "/etc/systemd/system/hermes.service.bak-${TS}"
+ fi
cp "${SOURCE_DIR}/deploy/eligia-vps/hermes.service" /etc/systemd/system/hermes.service📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if ! cmp -s "${SOURCE_DIR}/deploy/eligia-vps/hermes.service" /etc/systemd/system/hermes.service; then | |
| cp /etc/systemd/system/hermes.service "/etc/systemd/system/hermes.service.bak-${TS}" | |
| cp "${SOURCE_DIR}/deploy/eligia-vps/hermes.service" /etc/systemd/system/hermes.service | |
| if ! cmp -s "${SOURCE_DIR}/deploy/eligia-vps/hermes.service" /etc/systemd/system/hermes.service; then | |
| if [[ -f /etc/systemd/system/hermes.service ]]; then | |
| cp /etc/systemd/system/hermes.service "/etc/systemd/system/hermes.service.bak-${TS}" | |
| fi | |
| cp "${SOURCE_DIR}/deploy/eligia-vps/hermes.service" /etc/systemd/system/hermes.service |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deploy/eligia-vps/migrate-from-wamba-build.sh` around lines 55 - 57, The
current block compares SOURCE_DIR/deploy/eligia-vps/hermes.service with
/etc/systemd/system/hermes.service and always attempts to back up the target,
which fails if the target doesn't exist; modify the logic so you only create the
backup if /etc/systemd/system/hermes.service exists (e.g., wrap the cp
"/etc/systemd/system/hermes.service"
"/etc/systemd/system/hermes.service.bak-${TS}" in a test -e or [ -f ] check),
keeping the existing cmp -s comparison and the subsequent copy from
"${SOURCE_DIR}/deploy/eligia-vps/hermes.service" to
/etc/systemd/system/hermes.service unchanged.
| echo | ||
| echo "=== Step 4/6: rebuild image from fork checkout ===" | ||
| cd "${SOURCE_DIR}" | ||
| docker build -f "${DOCKERFILE_REL}" -t "${IMAGE_TAG}" . |
There was a problem hiding this comment.
Docker build failure leaves system in inconsistent state.
If docker build fails (missing dependencies, out of disk space, etc.), the script exits via set -e, leaving the system with updated symlinks and systemd unit but no working image. The old image might still exist, but there's no automatic rollback.
Consider verifying the image exists and works before updating production configuration, or document the manual rollback procedure.
Would you like me to suggest a verification step or rollback logic?
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deploy/eligia-vps/migrate-from-wamba-build.sh` around lines 64 - 67, The
docker build step currently overwrites/creates the production IMAGE_TAG directly
and can leave the system inconsistent on failure; change the flow to build to a
temporary/test tag (e.g., IMAGE_TAG_TMP) using SOURCE_DIR and DOCKERFILE_REL,
verify the image exists and passes a quick smoke test (docker inspect and a
short container run that checks service start), and only after the verification
succeeds replace the production IMAGE_TAG, update symlinks and reload the
systemd unit; if verification fails, abort and leave current production
image/unit unchanged and log actionable error/rollback steps.
| for i in $(seq 1 12); do | ||
| status=$(docker inspect hermes --format '{{.State.Health.Status}}' 2>/dev/null || echo "unknown") | ||
| echo " attempt ${i}/12 → ${status}" | ||
| if [[ "${status}" == "healthy" ]]; then | ||
| break | ||
| fi | ||
| sleep 10 | ||
| done |
There was a problem hiding this comment.
Health check loop completes even if service never becomes healthy.
After 12 failed attempts (2 minutes), the loop exits and the script continues to completion without failing. An operator running this script might believe the migration succeeded when the service is actually unhealthy. Consider exiting with an error or at least printing a prominent warning if health check fails after all attempts.
🔧 Proposed fix to fail or warn on unhealthy status
for i in $(seq 1 12); do
status=$(docker inspect hermes --format '{{.State.Health.Status}}' 2>/dev/null || echo "unknown")
echo " attempt ${i}/12 → ${status}"
if [[ "${status}" == "healthy" ]]; then
break
fi
sleep 10
done
+if [[ "${status}" != "healthy" ]]; then
+ echo " ✗ Health check failed after 12 attempts. Service may be broken!"
+ exit 1
+fi📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for i in $(seq 1 12); do | |
| status=$(docker inspect hermes --format '{{.State.Health.Status}}' 2>/dev/null || echo "unknown") | |
| echo " attempt ${i}/12 → ${status}" | |
| if [[ "${status}" == "healthy" ]]; then | |
| break | |
| fi | |
| sleep 10 | |
| done | |
| for i in $(seq 1 12); do | |
| status=$(docker inspect hermes --format '{{.State.Health.Status}}' 2>/dev/null || echo "unknown") | |
| echo " attempt ${i}/12 → ${status}" | |
| if [[ "${status}" == "healthy" ]]; then | |
| break | |
| fi | |
| sleep 10 | |
| done | |
| if [[ "${status}" != "healthy" ]]; then | |
| echo " ✗ Health check failed after 12 attempts. Service may be broken!" | |
| exit 1 | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deploy/eligia-vps/migrate-from-wamba-build.sh` around lines 73 - 80, The
health-check loop using the docker inspect hermes command may exit after 12
attempts without failing the script; after the for-loop finishes check the final
status variable and if it is not "healthy" emit a prominent error message
(including the last observed ${status}) and exit non‑zero so the deployment
fails visibly; otherwise continue normally. Reference the loop that sets status
via docker inspect hermes --format '{{.State.Health.Status}}' and the status
variable to implement the post-loop check and exit behavior.
Two more pieces that lived only on the VPS now live in the fork:
- hermes.service: systemd unit (Description, Requires, sops exec-env,
docker compose up). Drop-in compatible with the current
/etc/systemd/system/hermes.service on the VPS.
- migrate-from-wamba-build.sh: one-shot, idempotent migration script.
Steps:
1. Clone the fork into /opt/hermes/source (or pull if already present).
2. Symlink /opt/hermes/docker-compose.yml + /opt/hermes/data/config.yaml
to the repo-versioned files under deploy/eligia-vps/.
3. Install the new systemd unit (with backup of the old one).
4. Rebuild eligia/hermes-agent:wamba from the fork checkout.
5. systemctl restart hermes + poll healthcheck.
6. Flag the legacy /opt/hermes/wamba_build/ snapshot for manual
deletion once the new flow is verified.
After this PR + running the script once on the VPS, future updates are
just:
git -C /opt/hermes/source pull && \
docker build -f /opt/hermes/source/deploy/eligia-vps/Dockerfile.eligia-overlay \
-t eligia/hermes-agent:wamba \
/opt/hermes/source && \
systemctl restart hermes
— no more scp'ing files around or manually editing wamba_build.
README updated: removes the "TODO: migrate to /opt/hermes/source/"
section, adds a curl|bash one-liner for running the migration, lists
hermes.service in the file table, and folds the systemd-unit follow-up
into the now-completed list.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
3c1db25 to
89b30c4
Compare
🔎 Lint report:
|
…rch#25071) * tui: make URLs clickable + hover-highlight in any terminal Problem ------- URLs printed by `hermes --tui` were not clickable in basic macOS Terminal.app. Cmd+click did nothing, the cursor didn't change shape — like nothing was detected — even though arrow buttons and other Box onClick handlers worked fine. Root cause ---------- Two layers of dead plumbing: 1. `<Link>` only emitted the underlying `<ink-link>` (which carries the hyperlink metadata into the screen buffer) when `supportsHyperlinks()` said yes. On Apple_Terminal that's false, so the per-cell hyperlink field stayed empty, so `Ink.getHyperlinkAt()` had nothing to return on click. The visible underline was just decorative. 2. `Ink.openHyperlink()` calls `this.onHyperlinkClick?.(url)`, but `onHyperlinkClick` was never assigned anywhere in the codebase. The click pipeline (`App.tsx → onOpenHyperlink → Ink.openHyperlink`) ran but bailed silently on the optional chain. Bonus discovery: even when wired up, there was no hover affordance — terminal apps can't change the system mouse cursor, so users had no visual signal that a cell was clickable. Arrow buttons in the chrome worked because they had explicit `<Box onClick>` styling; inline link URLs didn't. Fix --- - `Link.tsx`: always emit `<ink-link>` regardless of terminal capability. The renderer's `wrapWithOsc8Link` already gates the actual OSC 8 escape on `supportsHyperlinks()` further down — so terminals that don't understand OSC 8 still don't see the escape, but the screen-buffer metadata (which the click dispatcher reads) is now populated everywhere. - `ink.tsx + root.ts`: add `onHyperlinkClick?: (url: string) => void` to `Options` / `RenderOptions`, wire it to the existing `Ink.onHyperlinkClick` field in the constructor. - `src/lib/openExternalUrl.ts`: small platform-aware opener using `child_process.spawn` with arg-array (no shell) — http(s) only, rejects `file:`, `javascript:`, `data:`, etc., so a hostile model can't trigger arbitrary local handlers via `<Link url="file:///...">`. Detached + stdio ignore so closing the TUI doesn't kill the browser and Chrome stderr doesn't leak into the alt screen. - `entry.tsx`: pass `onHyperlinkClick: openExternalUrl` to `ink.render`. - `hyperlinkHover.ts` + Ink hover wiring: track the URL under the pointer in `Ink.hoveredHyperlink`, update it from `dispatchHover`, and inverse- highlight every cell of the matching link in the render-pass overlay (same pattern as `applySearchHighlight`). This is the cursor-hover affordance for clickable links — terminals don't expose cursor shape, so we light up the link itself. - `types/hermes-ink.d.ts`: add `onHyperlinkClick` to the `RenderOptions` shim so consumers (`entry.tsx`) type-check against the new option. Tests ----- - `src/lib/openExternalUrl.test.ts` (15 cases): http(s) accepted; file/js/ data/mailto/ftp/ssh rejected; macOS open(1), Windows cmd.exe start with empty title slot, Linux xdg-open dispatch; shell-metacharacter URLs pass through unmolested as a single argv element; synchronous spawn failure returns false. Verified empirically in Apple Terminal 455.1 (macOS 15.7.3): clicking a URL opens in default browser, hovering inverts the link cells, and moving away clears the highlight. Full TUI suite: 713 passing, 0 type errors. Reverts ------- The earlier attempt that version-gated Apple_Terminal in `supports-hyperlinks.ts` was based on a wrong assumption — Terminal.app silently strips OSC 8 sequences but does not render them as clickable hyperlinks. Reverted to the original allowlist. * tui: address Copilot review — explorer.exe on win32 + comment fixes - openExternalUrl: switch win32 from `cmd.exe /c start` to `explorer.exe`. cmd.exe's `start` builtin reparses the URL through cmd's tokenizer, so `&`, `|`, `^`, `<`, `>` either split the command or get reinterpreted — breaking both the protocol-allowlist safety story AND plain http(s) URLs with `&` in query strings. `explorer.exe <url>` invokes the registered protocol handler directly with no shell. - openExternalUrl.test.ts: rename the win32 test to reflect the new contract and add two regression tests — one with `&|^<>` metachars, one with the common analytics-URL `&` query-param pattern — both pinned to single-argv-element delivery via explorer.exe. - Link.tsx: fix misleading comment. OSC 8 escapes are emitted unconditionally by the renderer (`wrapWithOsc8Link` in render-node-to-output.ts, `oscLink` in log-update.ts). Non-supporting terminals silently strip the sequence, which is why hover/click affordance has to come from the in-process overlay rather than the terminal's own link rendering. Verified: 715/715 tests pass, type-check + build clean. * tui: address Copilot review #2 — async spawn errors + hover scope + docs 1. openExternalUrl: attach a no-op `'error'` listener on the spawned child BEFORE unref(). spawn() returns a ChildProcess synchronously even when the binary is missing (ENOENT on xdg-open / explorer.exe), unreachable, or otherwise unusable; the failure surfaces later as an 'error' event. An unhandled 'error' on an EventEmitter crashes Node, which would tear down the whole TUI. The listener is a deliberate no-op — we already returned `true` synchronously and the user just doesn't see the browser pop. 2. openExternalUrl.test.ts: add a regression test using a real EventEmitter to simulate the async-error path. Pins both the listener-attached contract and the "doesn't throw on emit" behavior. Was 17/17, now 18/18. 3. ink.tsx dispatchHover: bypass `getHyperlinkAt()` and read `cellAt(...).hyperlink` directly. `getHyperlinkAt` falls back to `findPlainTextUrlAt` for cells without an OSC 8 hyperlink, but the render-pass overlay (`applyHyperlinkHoverHighlight`) only matches on `cell.hyperlink === hoveredUrl` — so plain-text URLs would burn re-renders without ever producing the highlight. Hover is now a strictly 1:1 fit for what the overlay can paint. Plain-text URLs still get the click action via the existing dispatch path. 4. root.ts + ink.tsx doc comments: replace the misleading "typically `open` / `xdg-open` / `start` shell" wording with the actual safe recipe — argv-array spawn into `open` / `xdg-open` / `explorer.exe`, with an explicit warning that `cmd.exe /c start` reparses the URL through cmd's tokenizer and is unsafe + breaks `&`-query URLs. Verified: 716/716 tests pass, type-check + build clean. * tui: address Copilot review #3 — hover damage, alt-screen cleanup, opener allowlist 1. ink.tsx onRender: stop folding steady-state hover into hlActive. hlActive forces a full-screen damage diff so previous-frame inverted cells get re-emitted when the highlight set changes. The transition IS the trigger — enter / leave / change-to-other-link. While the pointer just sits on a link the painted cells don't change and the per-cell diff handles the no-op. Folding the steady state in would burn a full-screen diff on every frame. Added a lastRenderedHoveredHyperlink tracker and gate the hlActive bump on `hovered !== lastRendered`. 2. ink.tsx setAltScreenActive: clear hoveredHyperlink (and the tracker) when toggling alt-screen state. Hover dispatch is alt-screen-gated, so once we leave there's no path to clear it. Without this, remounting <AlternateScreen> would paint a phantom hover from the previous session until the next mouse-move arrived. 3. openExternalUrl.ts openCommand: allowlist linux + the BSD family for xdg-open and return null for everything else (aix, sunos, cygwin, haiku, etc.). Previously the default-fallback always returned xdg-open, which made the caller's `if (!command) return false` dead and yielded a misleading `true` on platforms that probably don't have xdg-open. New tests cover the null path AND the openExternalUrl-returns-false-without-spawning behavior. Verified: 718/718 tests pass, type-check + build clean. * tui: address Copilot review #4 — doc comment accuracy 1. openExternalUrl return-value doc: now lists all three false paths (URL rejected / no opener for platform / synchronous spawn throw) plus a note that async 'error' events still return true because the spawn was attempted. 2. ink.tsx onHyperlinkClick field doc: clarifies the callback receives either an OSC 8 hyperlink OR a plain-text URL detected by findPlainTextUrlAt — App.tsx routes both into the same callback. 3. hyperlinkHover applyHyperlinkHoverHighlight doc: drops the misleading 'caller forces full-frame damage' promise. Caller decides; for hover the current caller only forces full damage on transitions. No behavior change. 718/718 tests pass. * tui: address Copilot review #5 — lint fixes 1. ink.tsx: reorder `./hyperlinkHover.js` import before `./screen.js` to satisfy perfectionist/sort-imports. 2. Link.tsx: drop unused `fallback` parameter destructuring + the trailing `void (null as ...)` dead-statement (would trip no-unused-expressions). Kept `fallback?: ReactNode` on the Props interface as a documented compat shim so existing call sites still compile, with a comment explaining why it's no longer wired up. 3. openExternalUrl.test.ts: replace `typeof import('node:child_process').spawn` inline annotations (forbidden by @typescript-eslint/consistent-type-imports) with a `SpawnLike` type alias backed by a real `import type { spawn as SpawnFn }`. No behavior change. 718/718 tests pass, type-check clean, lint clean on all modified files.
…ex models (NousResearch#24182) * feat(codex-runtime): scaffold optional codex app-server runtime Foundational commit for an opt-in alternate runtime that hands OpenAI/Codex turns to a 'codex app-server' subprocess instead of Hermes' tool dispatch. Default behavior is unchanged. Lands in three pieces: 1. agent/transports/codex_app_server.py — JSON-RPC 2.0 over stdio speaker for codex's app-server protocol (codex-rs/app-server). Spawn, init handshake, request/response, notification queue, server-initiated request queue (for approval round-trips), interrupt-friendly blocking reads. Tested against real codex 0.130.0 binary end-to-end during development. 2. hermes_cli/runtime_provider.py: - Adds 'codex_app_server' to _VALID_API_MODES. - Adds _maybe_apply_codex_app_server_runtime() helper, called at the end of _resolve_runtime_from_pool_entry(). Inert unless 'model.openai_runtime: codex_app_server' is set in config.yaml AND provider in {openai, openai-codex}. Other providers cannot be rerouted (anthropic, openrouter, etc. preserved). 3. tests/agent/transports/test_codex_app_server_runtime.py — 24 tests covering api_mode registration, the rewriter helper (default-off, case-insensitive, opt-in, non-eligible providers preserved), version parser, missing-binary handling, error class. Does NOT require codex CLI installed. This commit is wire-only: the api_mode is recognized but AIAgent does not yet branch on it. Followup commits add the session adapter, event projector, approval bridge, transcript projection (so memory/skill review still works), plugin migration, and slash command. Existing tests remain green: - tests/cli/test_cli_provider_resolution.py (29 passed) - tests/agent/test_credential_pool_routing.py (included above) * feat(codex-runtime): add codex item projector for memory/skill review The translator that lets Hermes' self-improvement loop keep working under the Codex runtime: converts codex 'item/*' notifications into Hermes' standard {role, content, tool_calls, tool_call_id} message shape that agent/curator.py already knows how to read. Item taxonomy (matches codex-rs/app-server-protocol/src/protocol/v2/item.rs): - userMessage → {role: user, content} - agentMessage → {role: assistant, content: text} - reasoning → stashed in next assistant's 'reasoning' field - commandExecution → assistant tool_call(name='exec_command') + tool result - fileChange → assistant tool_call(name='apply_patch') + tool result - mcpToolCall → assistant tool_call(name='mcp.<server>.<tool>') + tool result - dynamicToolCall → assistant tool_call(name=<tool>) + tool result - plan/hookPrompt/etc → opaque assistant note, no fabricated tool_calls Invariants preserved: - Message role alternation never violated: each tool item produces at most one assistant + one tool message in that order, correlated by call_id. - Streaming deltas (item/<type>/outputDelta, item/agentMessage/delta) don't materialize messages — only item/completed does. Mirrors how Hermes already only writes the assistant message after streaming ends. - Tool call ids are deterministic (codex item id-based) so replays produce identical messages and prefix caches stay valid (AGENTS.md pitfall NousResearch#16). - JSON args use sorted_keys for the same reason. Real wire formats verified against codex 0.130.0 by capturing live notifications from thread/shellCommand and including one as a fixture (COMMAND_EXEC_COMPLETED). 23 new tests, all green: - Streaming deltas don't materialize (3 paths) - Turn/thread frame events are silent - commandExecution: 5 tests including non-zero exit annotation + deterministic id stability across replays - agentMessage + reasoning attachment + reasoning consumption - fileChange: summary without inlined content - mcpToolCall: namespaced naming + error surfacing - userMessage: text fragments only (drops images/etc) - opaque items: no fabricated tool_calls - Helpers: deterministic id stability + sorted JSON args - Role alternation invariant across all four tool-shaped item types This commit is a pure addition. AIAgent integration (the wire that uses the projector) is the next commit. * feat(codex-runtime): add session adapter + approval bridge The third self-contained module: CodexAppServerSession owns one Codex thread per Hermes session, drives turn/start, consumes streaming notifications via CodexEventProjector, handles server-initiated approval requests, and translates cancellation into turn/interrupt. The adapter has a single public per-turn method: result = session.run_turn(user_input='...', turn_timeout=600) # result.final_text → assistant text for the caller # result.projected_messages → list ready to splice into AIAgent.messages # result.tool_iterations → tick count for _iters_since_skill nudge # result.interrupted → True on Ctrl+C / deadline / interrupt # result.error → error string when the turn cannot complete # result.turn_id, thread_id → for sessions DB / resume Behavior: - ensure_started() spawns codex, does the initialize handshake, and issues thread/start with cwd + permissions profile. Idempotent. - run_turn() blocks until turn/completed, drains server-initiated requests (approvals) before reading notifications so codex never deadlocks waiting for us, projects every item/completed via the projector, and increments tool_iterations for the skill nudge gate. - request_interrupt() is thread-safe (threading.Event); the next loop iteration issues turn/interrupt and unwinds. - turn_timeout deadlock guard issues turn/interrupt and records an error if the turn never completes. - close() escalates terminate → kill via the underlying client. Approval bridge: Codex emits server-initiated requests for execCommandApproval and applyPatchApproval. The adapter translates Hermes' approval choice vocabulary onto codex's decision vocabulary: Hermes 'once' → codex 'approved' Hermes 'session' or 'always' → codex 'approvedForSession' Hermes 'deny' / anything else → codex 'denied' Routing precedence: 1. _ServerRequestRouting.auto_approve_* flags (cron / non-interactive) 2. approval_callback wired by the CLI (defers to tools.approval.prompt_dangerous_approval()) 3. Fail-closed denial when neither is wired Unknown server-request methods are answered with JSON-RPC error -32601 so codex doesn't hang waiting for us. Permission profile mapping mirrors AGENTS.md: Hermes 'auto' → codex 'workspace-write' Hermes 'approval-required' → codex 'read-only-with-approval' Hermes 'unrestricted/yolo' → codex 'full-access' 20 new tests, all green. Combined with prior commits this PR now has 67 tests across three modules: - test_codex_app_server_runtime.py: 24 (api_mode + transport surface) - test_codex_event_projector.py: 23 (item taxonomy projections) - test_codex_app_server_session.py: 20 (turn loop + approvals + interrupts) Full tests/agent/transports/ directory: 249/249 pass — no regressions to existing transport tests. Still no wire into AIAgent.run_conversation(); that integration commit is small and goes next. * feat(codex-runtime): wire codex_app_server runtime into AIAgent The integration commit. AIAgent.run_conversation() now early-returns to a new helper _run_codex_app_server_turn() when self.api_mode == 'codex_app_server', bypassing the chat_completions tool loop entirely. Three small surgical edits to run_agent.py (~105 LOC total): 1. Line ~1204 (constructor api_mode validation set): Add 'codex_app_server' so an explicit api_mode='codex_app_server' passed to AIAgent() isn't silently rewritten to 'chat_completions'. 2. Line ~12048 (run_conversation, just before the while loop): Early-return to _run_codex_app_server_turn() when self.api_mode is 'codex_app_server'. Placed AFTER all standard pre-loop setup — logging context, session DB, surrogate sanitization, _user_turn_count and _turns_since_memory increments, _ext_prefetch_cache, memory manager on_turn_start — so behavior outside the model-call loop is identical between paths. Default Hermes flow is unchanged when the flag is off. 3. End-of-class (line ~15497): New method _run_codex_app_server_turn(). Lazy-instantiates one CodexAppServerSession per AIAgent (reused across turns), runs the turn, splices projected_messages into messages, increments _iters_since_skill by tool_iterations (since the chat_completions loop normally does that per iteration), fires _spawn_background_review on the same cadence as the default path. Counter accounting: _turns_since_memory ← already incremented at run_conversation:11817 (gated on memory store configured) — codex helper does NOT touch it (would double-count). _user_turn_count ← already incremented at run_conversation:11793 — codex helper does NOT touch it. _iters_since_skill ← incremented in the chat_completions loop per tool iteration. Codex helper increments by turn.tool_iterations since the loop is bypassed. User message: ALREADY appended to messages by run_conversation pre-loop (line 11823) before the early-return reaches us. Helper does NOT append again. Regression test test_user_message_not_duplicated guards this. Approval callback wiring: Lazy-fetches tools.terminal_tool._get_approval_callback at session spawn time, passes to CodexAppServerSession. CLI threads with prompt_toolkit get interactive approvals; gateway/cron contexts get the codex-side fail-closed deny. Error path: Codex session exceptions become a 'partial' result with completed=False and a final_response that explicitly tells the user how to switch back: 'Codex app-server turn failed: ... Fall back to default runtime with /codex-runtime auto.' Same return-dict shape as the chat_completions path so all callers (gateway, CLI, batch_runner, ACP) work unchanged. 9 new integration tests in tests/run_agent/test_codex_app_server_integration.py: - api_mode='codex_app_server' is accepted on AIAgent construction - run_conversation returns the expected codex shape (final_response, codex_thread_id, codex_turn_id, completed, partial) - Projected messages are spliced into messages list - _iters_since_skill ticks per tool iteration - _user_turn_count delegated to standard flow (not double-counted) - User message appears exactly once (regression guard) - _spawn_background_review IS invoked (memory/skill review keeps working) - chat.completions.create is NEVER called (loop fully bypassed) - Session exception → partial result with /codex-runtime auto hint - Interrupted turn → partial result with error preserved Adjacent test runs confirm no regressions: - tests/run_agent/test_memory_nudge_counter_hydration.py: green - tests/run_agent/test_background_review.py: green - tests/run_agent/test_fallback_model.py: green - tests/agent/transports/: 249/249 green Still missing for full feature: /codex-runtime slash command, plugin migration helper, docs page, live e2e test gated on codex binary. Those are the remaining followup commits. * feat(codex-runtime): add /codex-runtime slash command (CLI + gateway) User-facing toggle for the optional codex app-server runtime. Follows the 'Adding a Slash Command (All Platforms)' pattern from AGENTS.md exactly: single CommandDef in the central registry → CLI handler → gateway handler → running-agent guard → all surfaces (autocomplete, /help, Telegram menu, Slack subcommands) update automatically. Surface: /codex-runtime — show current state + codex CLI status /codex-runtime auto — Hermes default runtime /codex-runtime codex_app_server — codex subprocess runtime /codex-runtime on / off — synonyms Files changed: hermes_cli/codex_runtime_switch.py (new): Pure-Python state machine shared by CLI and gateway. Parse args, read/write model.openai_runtime in the config dict, gate enabling behind a codex --version check (don't let users opt in to a runtime they have no binary for; print npm install hint instead). Returns a CodexRuntimeStatus dataclass that callers render however suits their surface. hermes_cli/commands.py: Single CommandDef entry, no aliases (codex-runtime is its own thing). cli.py: Dispatch in process_command() + _handle_codex_runtime() handler that delegates to the shared module and renders results via _cprint. gateway/run.py: Dispatch in _handle_message() + _handle_codex_runtime_command() that returns a string (gateway sends as message). On a successful change that requires a new session, _evict_cached_agent() forces the next inbound message to construct a fresh AIAgent with the new api_mode — avoids prompt-cache invalidation mid-session. gateway/run.py running-agent guard: /codex-runtime joins /model in the early-intercept block so a runtime flip mid-turn can't split a turn across two transports. Tests: tests/hermes_cli/test_codex_runtime_switch.py — 25 tests covering the state machine: arg parsing (10 cases incl. case-insensitive and synonyms), reading current runtime (5 cases incl. malformed configs), writing runtime (3 cases), apply() entry point covering read-only, no-op, codex-missing-blocked, codex-present-success, disable-no-binary-check, and persist-failure paths (8 cases). All green. Adjacent test suites confirm no regressions: - tests/hermes_cli/test_commands.py + test_codex_runtime_switch.py: 167/167 green - tests/agent/transports/: 283/283 green when combined with prior commits Still missing: plugin migration helper, docs page, live e2e test gated on codex binary. Followup commits. * feat(codex-runtime): auto-migrate Hermes MCP servers to ~/.codex/config.toml Translates the user's mcp_servers config from ~/.hermes/config.yaml into the TOML format codex's MCP client expects. Wired into the /codex-runtime codex_app_server enable path so users get their MCP tool surface in the spawned subprocess automatically. The migration runs on every enable. Failures are non-fatal — the runtime change still proceeds and the user gets a warning so they can fix the codex config manually. What translates (mapping verified against codex-rs/core/src/config/edit.rs): Hermes mcp_servers.<n>.command/args/env → codex stdio transport Hermes mcp_servers.<n>.url/headers → codex streamable_http transport Hermes mcp_servers.<n>.timeout → codex tool_timeout_sec Hermes mcp_servers.<n>.connect_timeout → codex startup_timeout_sec Hermes mcp_servers.<n>.cwd → codex stdio cwd Hermes mcp_servers.<n>.enabled: false → codex enabled = false What does NOT translate (warned + skipped per server): Hermes-specific keys (sampling, etc.) — codex's MCP client has no equivalent. Listed in the per-server skipped[] field of the report. What's NOT migrated (intentional): AGENTS.md — codex respects this file natively in its cwd. Hermes' own AGENTS.md (project-level) is already in the worktree, so codex picks it up without translation. No code needed. Idempotency design: All managed content lives between a 'managed by hermes-agent' marker and the next non-mcp_servers section header. _strip_existing_managed_block removes the prior managed region cleanly, preserving any user-added codex config (model, providers.openai, sandbox profiles, etc.) above or below. Files added: hermes_cli/codex_runtime_plugin_migration.py — pure-Python migration helper. Public API: migrate(hermes_config, codex_home=None, dry_run=False) returns MigrationReport with .migrated/.errors/ .skipped_keys_per_server. No external TOML dependency — minimal formatter handles strings/numbers/booleans/lists/inline-tables. tests/hermes_cli/test_codex_runtime_plugin_migration.py — 39 tests covering: - per-server translation (12): stdio/http/sse, cwd, timeouts, enabled flag, command+url precedence, sampling drop, unknown keys - TOML formatter (8): types, escaping, inline tables, error case - existing-block stripping (4): no marker, alone, with user content above, with user content below - end-to-end migrate() (8): empty, dry-run, round-trip, idempotent re-run, preserves user config, error reporting, invalid input, summary formatting Files changed: hermes_cli/codex_runtime_switch.py — apply() now calls migrate() in the codex_app_server enable branch. Migration failure logs a warning in the result message but does NOT fail the runtime change. Disable path (auto) explicitly skips migration. tests/hermes_cli/test_codex_runtime_switch.py — 3 new tests: test_enable_triggers_mcp_migration, test_disable_does_not_trigger_migration, test_migration_failure_does_not_block_enable. All 325 feature tests green: - tests/agent/transports/: 249 (incl. 67 new) - tests/run_agent/test_codex_app_server_integration.py: 9 - tests/hermes_cli/test_codex_runtime_switch.py: 28 (3 new) - tests/hermes_cli/test_codex_runtime_plugin_migration.py: 39 (new) * perf(codex-runtime): cache codex --version check within apply() Single /codex-runtime invocation could spawn 'codex --version' up to 3 times (state report, enable gate, success message). Each spawn is ~50ms, so the cumulative cost wasn't a crisis, but it was wasteful and turned a trivial slash command into something noticeably laggy on slower systems. Refactored to lazy-once via a closure over a nonlocal cache. First call spawns; subsequent calls in the same apply() reuse the result. Behavior unchanged — same return shape, same error handling, same install hint when codex is missing. Just one subprocess per call instead of three. Two regression-guard tests added: - test_binary_check_cached_within_apply: enable path → call_count == 1 - test_binary_check_cached_on_read_only_call: state-report path → call_count == 1 Total tests for /codex-runtime now 30 (was 28); all 143 codex-runtime tests still green. * fix(codex-runtime): correct protocol field names found via live e2e test Three real bugs caught only by running a turn end-to-end against codex 0.130.0 with a real ChatGPT subscription. Unit tests passed because they asserted on our own (incorrect) wire shapes; the wire format from codex-rs/app-server-protocol/src/protocol/v2/* is the source of truth and my initial reading of the README was incomplete. Bug 1: thread/start.permissions wire format Was sending {"profileId": "workspace-write"}. Real format per PermissionProfileSelectionParams enum (tagged union): {"type": "profile", "id": "workspace-write"} AND requires the experimentalApi capability declared during initialize. AND requires a matching [permissions] table in ~/.codex/config.toml or codex fails the request with 'default_permissions requires a [permissions] table'. Fix: stop overriding permissions on thread/start. Codex picks its default profile (read-only unless user configures otherwise), which matches what codex CLI users expect — they configure their default permission profile in ~/.codex/config.toml the standard way. Trying to be clever about profile selection broke every turn we tested. Live error before fix: 'Invalid request: missing field type' on every turn/start, even though our turn/start payload was correct — the field codex was complaining about was inside the permissions sub-object we shouldn't have been sending. Bug 2: server-request method names Was matching 'execCommandApproval' and 'applyPatchApproval'. Real names per common.rs ServerRequest enum: item/commandExecution/requestApproval item/fileChange/requestApproval item/permissions/requestApproval (new third method) Fix: match the documented names. Added handler for item/permissions/requestApproval that always declines — codex sometimes asks to escalate permissions mid-turn and silent acceptance would surprise users. Live symptom before fix: agent.log showed 'Unknown codex server request: item/commandExecution/requestApproval' and codex stalled because we replied with -32601 (unsupported method) instead of an approval decision. The agent reported back 'The write command was rejected' even though Hermes never showed the user an approval prompt. Bug 3: approval decision values Was sending decision strings 'approved'/'approvedForSession'/'denied'. Real values per CommandExecutionApprovalDecision enum (camelCase): accept, acceptForSession, decline, cancel (also AcceptWithExecpolicyAmendment and ApplyNetworkPolicyAmendment variants we don't currently use). Fix: rename _approval_choice_to_codex_decision return values; update auto_approve_* fallbacks; update fail-closed default from 'denied' to 'decline'. Test mapping table updated to match. Live test verified after fixes: $ hermes (with model.openai_runtime: codex_app_server) > Run the shell command: echo hermes-codex-livetest > .../proof.txt then read it back Approval prompt fired with 'Codex requests exec in <cwd>'. User chose 'Allow once'. Codex executed the command, wrote the file, read it back. Final response: 'Read back from proof.txt: hermes-codex-livetest'. File contents on disk match. agent.log confirms: codex app-server thread started: id=019e200e profile=workspace-write cwd=/tmp/hermes-codex-livetest/workspace All 20 session tests still green after wire-format updates. * fix(codex-runtime): correct apply_patch approval params + ship docs Live e2e revealed FileChangeRequestApprovalParams doesn't carry the changeset (just itemId, threadId, turnId, reason, grantRoot) — Codex's 'reason' field describes what the patch wants to do. Test config and display logic updated to use it. The first 'apply_patch (0 change(s))' display from the live test is now 'apply_patch: <reason>'. Adds website/docs/user-guide/features/codex-app-server-runtime.md covering enable/disable, prerequisites, approval UX, MCP migration behavior, permission profile delegation to ~/.codex/config.toml, known limitations, and the architecture diagram. Wired into the Automation category in sidebars.ts. Live e2e validation across the path matrix: ✓ thread/start handshake ✓ turn/start with text input ✓ commandExecution items + projection ✓ item/commandExecution/requestApproval → Hermes UI → response ✓ Approve once → command runs ✓ Deny → command rejected, codex falls back to read-only message ✓ Multi-turn (codex remembers prior turn's results) ✓ apply_patch via Codex's fileChange path ✓ item/fileChange/requestApproval → Hermes UI ✓ MCP server migration loads inside spawned codex (verified via 'use the filesystem MCP tool' prompt) ✓ /codex-runtime auto → codex_app_server toggle cycle ✓ Disable doesn't trigger migration ✓ Enable with codex CLI present succeeds + migrates ✓ Hermes-side interrupt path (turn/interrupt request issued cleanly even if codex finishes before the interrupt lands) Known live-validated limitations now documented in the docs page: - delegate_task subagents unavailable on this runtime - permission profile selection delegated to ~/.codex/config.toml - apply_patch approval prompt has no inline changeset (codex protocol doesn't expose it) 145/145 codex-runtime tests still green. * feat(codex-runtime): native plugin migration + UX polish (quirks 2/4/5/10/11) Major: migrate native Codex plugins (#7 in OpenClaw's PR list) Discovers installed curated plugins via codex's plugin/list RPC and writes [plugins."<name>@<marketplace>"] entries to ~/.codex/config.toml so they're enabled in the spawned Codex sessions. This is the 'YouTube-video-worthy' bit Pash highlighted: when a user has google-calendar, github, etc. installed in their Codex CLI, those plugins activate automatically when they enable Hermes' codex runtime. Implementation: - hermes_cli/codex_runtime_plugin_migration.py: new _query_codex_plugins() helper spawns 'codex app-server' briefly and walks plugin/list. Returns (plugins, error) — failures are non-fatal so MCP migration still works. - render_codex_toml_section() now takes plugins + permissions args. - migrate() defaults: discover_plugins=True, default_permission_profile= 'workspace-write'. Explicit None on either disables that side. - _strip_existing_managed_block() now also strips [plugins.*] and [permissions]/[permissions.*] sections inside the managed block, so re-runs replace plugins cleanly without touching codex's own config. Quirk fixes: #2 Default permissions profile written on enable. Without this, Codex's read-only default kicks in and EVERY write triggers an approval prompt. Now writes [permissions] default = 'workspace-write' so the runtime feels normal out of the box. Set default_permission_profile=None to opt out. #4 apply_patch approval prompt now shows what's changing. Codex's FileChangeRequestApprovalParams doesn't carry the changeset. Session adapter now caches the fileChange item from item/started notifications and looks it up by itemId when codex requests approval. Prompt shows '1 add, 1 update: /tmp/new.py, /tmp/old.py' instead of 'apply_patch (0 change(s))'. Side benefit: also drains pending notifications BEFORE handling a server request, so the projector and per-turn caches are up to date when the approval decision fires. Bounded to 8 notifications per loop iter to avoid starving codex's response. #5/#10 Exec approval prompt never shows empty cwd. When codex omits cwd in CommandExecutionRequestApprovalParams, fall back to the session's cwd. If somehow neither is available, show '<unknown>' explicitly instead of an empty string. Also surfaces 'reason' from the approval params when codex provides it — gives users more context on why codex wants to run something. #11 Banner indicates the codex_app_server runtime when active. New 'Runtime: codex app-server (terminal/file ops/MCP run inside codex)' line appears in the welcome banner only when the runtime is on. Default banner is unchanged. Tests: - 7 new tests in test_codex_runtime_plugin_migration.py covering plugin discovery (mocked), failure handling, dry-run skip, opt-out flag, idempotent re-runs, and permissions writing. - 3 new tests in test_codex_app_server_session.py covering the enriched approval prompts: cwd fallback, change summary on apply_patch, fallback when no item/started cache exists. - All 26 session tests + 46 migration tests green; 153 total in PR. * feat(codex-runtime): hermes-tools MCP callback + native plugin migration The big architectural addition: when codex_app_server runtime is on, Hermes registers its own tool surface as an MCP server in ~/.codex/config.toml so the codex subprocess can call back into Hermes for tools codex doesn't ship with — web_search, browser_*, vision, image_generate, skills, TTS. Also: 'migrate native codex plugins' (Pash's YouTube-video-worthy bit) — when the user has plugins like Linear, GitHub, Gmail, Calendar, Canva installed via 'codex plugin', Hermes discovers them via plugin/list and writes [plugins.<name>@openai-curated] entries so they activate automatically. New module: agent/transports/hermes_tools_mcp_server.py FastMCP stdio server exposing 17 Hermes tools. Each call dispatches through model_tools.handle_function_call() — same code path as the Hermes default runtime. Run with: python -m agent.transports.hermes_tools_mcp_server [--verbose] Exposed: web_search, web_extract, browser_navigate / _click / _type / _press / _snapshot / _scroll / _back / _get_images / _console / _vision, vision_analyze, image_generate, skill_view, skills_list, text_to_speech. NOT exposed (deliberately): - terminal/shell/read_file/write_file/patch — codex has built-ins - delegate_task/memory/session_search/todo — _AGENT_LOOP_TOOLS in model_tools.py:493, require running AIAgent context. Documented as a limitation and surfaced in the slash command output. Migration changes (hermes_cli/codex_runtime_plugin_migration.py): - _query_codex_plugins() spawns 'codex app-server' briefly to walk plugin/list and pull installed openai-curated plugins. Failures are non-fatal — MCP migration still completes. - render_codex_toml_section() now takes plugins + permissions args AND wraps the managed block with a MIGRATION_END_MARKER comment so the stripper can reliably find both ends, even when the block contains top-level keys (default_permissions = ...). - migrate() defaults: discover_plugins=True, expose_hermes_tools=True, default_permission_profile=':workspace' (built-in codex profile name — must be prefixed with ':'). All three opt-out via explicit args. - _build_hermes_tools_mcp_entry() builds the codex stdio entry with HERMES_HOME and PYTHONPATH passthrough so a worktree-launched Hermes points the MCP subprocess at the same module layout. Live-caught wire bugs fixed during this turn: 1. Permission profile config key is top-level , NOT a [permissions] table. The [permissions] table is for *user-defined* profiles with structured fields. Built-in profile names start with ':' (':workspace', ':read-only', ':danger-no-sandbox'). Was emitting which codex rejected with 'invalid type: string "X", expected struct PermissionProfileToml'. 2. Built-in profile is , NOT . Codex rejected with 'unknown built-in profile'. 3. Codex's MCP layer sends for tool-call confirmation. We weren't handling it, so codex stalled and returned 'MCP tool call was rejected'. Now: auto-accept for our own hermes-tools server (user already opted in by enabling the runtime), decline for third-party servers. Quirk fixes shipped (from the limitations list): #2 default permissions: workspace profile written on enable. No more approval prompt on every write. #4 apply_patch approval shows what's changing: cache fileChange items from item/started, look up by itemId when codex sends item/fileChange/requestApproval. Prompt: '1 add, 1 update: /tmp/new.py, /tmp/old.py' instead of '0 change(s)'. #5/#10 exec approval cwd never empty: fall back to session cwd, then '<unknown>'. Also surfaces 'reason' from codex when present. #11 banner shows 'Runtime: codex app-server' line when active so users understand why tool counts may not match what's reachable. Tests: - 5 new tests in test_codex_runtime_plugin_migration.py covering plugin discovery, expose_hermes_tools entry generation, idempotent re-runs, opt-out flag, permissions profile. - 3 new tests in test_codex_app_server_session.py covering enriched approval prompts (cwd fallback, fileChange summary). - 2 new tests for mcpServer/elicitation/request handling (accept hermes-tools, decline others). - New test file test_hermes_tools_mcp_server.py covering module surface, EXPOSED_TOOLS safety invariants (no shell/file_ops, no agent-loop tools), and main() error paths. - 166 codex-runtime tests total, all green. Live e2e validated against codex 0.130.0 + ChatGPT subscription: ✓ /codex-runtime codex_app_server enables, migrates filesystem MCP, registers hermes-tools, writes default_permissions = ':workspace' ✓ Banner shows 'Runtime: codex app-server' line in subsequent sessions ✓ Shell command runs without approval prompt (workspace profile works) ✓ Multi-turn — codex remembers prior turn's results ✓ apply_patch path via fileChange request approval ✓ web_search via hermes-tools MCP callback returns real Firecrawl results: 'OpenAI Codex CLI – Getting Started' end-to-end in 13s ✓ Disable cycle clean Docs updated: website/docs/user-guide/features/codex-app-server-runtime.md Full re-write covering native plugin migration, the hermes-tools callback architecture, the prerequisites change ('codex login is separate from hermes auth login codex'), the trade-off table now reflecting which Hermes tools work via callback, and the limitations list updated with what's actually unavailable on this runtime. * feat(codex-runtime): pin user-config preservation invariant for quirk #6 Quirk #6 from the limitations list — user MCP servers / overrides / codex-only sections in ~/.codex/config.toml that live OUTSIDE the hermes-managed block must survive re-migration verbatim. This already worked thanks to the MIGRATION_MARKER + MIGRATION_END_MARKER pair I added when fixing the default_permissions wire format (so the strip can find both ends of the managed region even with top-level keys like default_permissions). But it was an emergent property without a test pinning it. Now explicitly tested: - User MCP server above the managed block survives migration - User MCP server below the managed block survives migration - Both above + below survive a second re-migration - User content (model, providers, sandbox, otel, etc.) outside our region is left untouched Docs added a section "Editing ~/.codex/config.toml safely" explaining the marker contract — so users know they can add their own MCP servers, override permissions, configure codex-only options, etc. without fear of Hermes overwriting their work. 167 codex-runtime tests, all green. * docs(codex-runtime): clarify the actual tool surface — shell covers terminal/read/write/find Previous docs and PR description undersold what codex's built-in toolset actually provides. apply_patch alone made it sound like the runtime could only edit files in patch format — implying you'd lose terminal use, read_file, write_file, search/find. That was wrong. Codex's 'shell' tool runs arbitrary shell commands inside the sandbox, which covers everything you'd do in bash: cat/head/tail (read), echo> or heredocs (write), find/rg/grep (search), ls/cd (navigate), build/ test/git/etc. apply_patch is for structured multi-file edits on top of that. update_plan is its in-runtime todo. view_image loads images. And codex has its own web_search built in (in addition to the Firecrawl-backed one Hermes exposes via MCP callback). Docs now have a 'What tools the model actually has' section right after Why, breaking the surface into three clearly-labeled buckets: 1. Codex's built-in toolset (always on) — shell, apply_patch, update_plan, view_image, web_search; covers everything terminal- adjacent. 2. Native Codex plugins (auto-migrated from your codex plugin install) — Linear, GitHub, Gmail, Calendar, Outlook, Canva, etc. 3. Hermes tool callback (MCP server in ~/.codex/config.toml) — web_search/web_extract via Firecrawl, browser_*, vision_analyze, image_generate, skill_view/skills_list, text_to_speech. Plus a 'What's NOT available' callout listing the four agent-loop tools (delegate_task, memory, session_search, todo) that need running AIAgent context and can't reach the codex runtime. Trade-offs table broken out: shell, apply_patch, update_plan, view_image, sandbox each get their own row with a one-line description so users can see at a glance what's available natively. Architecture diagram updated to list the codex built-ins by name instead of 'apply_patch + shell + sandbox'. No code changes — purely docs clarification. 167 codex-runtime tests still green. * fix(codex-runtime): _spawn_background_review signature + review fork api_mode downgrade Two real bugs in the self-improvement loop integration that the previous test mocked away. Bug 1: wrong call signature The codex helper was calling self._spawn_background_review() with no args after every turn. That function actually requires: messages_snapshot=list (positional or keyword) review_memory=bool (at least one trigger must be True) review_skills=bool So the call would have raised TypeError at runtime — except the only test that exercised this path mocked _spawn_background_review entirely and just asserted spawn.called, so the wrong-arg shape never surfaced. Bug 2: review fork inherits codex_app_server api_mode The review fork is constructed with: api_mode = _parent_runtime.get('api_mode') So when the parent is codex_app_server, the review fork ALSO runs as codex_app_server. But the review fork's whole job is to call agent-loop tools (memory, skill_manage) which require Hermes' own dispatch — they short-circuit with 'must be handled by the agent loop' on the codex runtime. So the review fork would have run, decided to save something, called memory or skill_manage, and silently no-op'd. Fixed in run_agent.py:_spawn_background_review() — when the parent api_mode is 'codex_app_server', the review fork is downgraded to 'codex_responses' (same OAuth credentials, same openai-codex provider, but talks to OpenAI's Responses API directly so Hermes owns the loop). Also rewrote the codex helper's review wiring to match the chat_completions path: - Computes _should_review_memory in the pre-loop block (was already being computed; now passed through to the helper as an arg). - Computes _should_review_skills AFTER the codex turn returns + counters tick (line ~15432 pattern in chat_completions). - Calls _spawn_background_review(messages_snapshot=, review_memory=, review_skills=) only when at least one trigger fires. - Adds the external memory provider sync (_sync_external_memory_for_turn) that the chat_completions path runs after every turn. Tests: Replaced the broken test_background_review_invoked (which only asserted spawn.called) with three sharper tests: - test_background_review_NOT_invoked_below_threshold: single turn at default thresholds → no review fires (would have caught the original 'every turn calls spawn with no args' bug) - test_background_review_skill_trigger_fires_above_threshold: 10 tool_iterations at threshold=10 → review fires with messages_snapshot=list, review_skills=True, counter resets - test_background_review_signature_never_breaks: regression guard asserting positional args are always empty and kwargs include messages_snapshot New TestReviewForkApiModeDowngrade class: - test_codex_app_server_parent_downgrades_review_fork: drives the real _spawn_background_review function (no mock at that level), asserts the review_agent gets api_mode='codex_responses' when the parent was codex_app_server. Live-validated against real run_conversation: - Counter ticked from 0 to 5 after a 5-tool-iteration turn - _spawn_background_review fired exactly once with kwargs-only signature - review_skills=True, review_memory=False - messages_snapshot was 12 entries (5 assistant tool_calls + 5 tool results + 1 final assistant + initial system/user) - Counter reset to 0 after fire 170 codex-runtime tests, all green. Docs: added a Self-improvement loop section to the codex runtime page explaining both how the trigger logic stays equivalent and that the review fork is auto-downgraded to codex_responses for the agent-loop tools. Also clarified that apply_patch and update_plan ARE codex's built-in tools (the previous version made it sound like they were separate from 'codex's stuff' — they're not, all five tools listed in 'What tools the model actually has' section 1 are codex built-ins). * feat(codex-runtime): expose kanban tools through Hermes MCP callback Kanban workers spawn as separate hermes chat -q subprocesses that read the user's config.yaml. If model.openai_runtime: codex_app_server is set globally (which is the whole point of opt-in), every dispatched worker ALSO comes up on the codex runtime. That mostly works — codex's built-in shell + apply_patch + update_plan do the actual task work fine — but it had one critical break: the worker handoff tools (kanban_complete, kanban_block, kanban_comment, kanban_heartbeat) are Hermes-registered tools, not codex built-ins. On the codex runtime, codex builds its own tool list and these never reach the model, so the worker would do the work but not be able to report back, hanging until the dispatcher's timeout escalates it as zombie. Fix: add all 9 kanban tools to the EXPOSED_TOOLS list in the Hermes MCP callback. They dispatch statelessly through handle_function_call() just like web_search and the others — they read HERMES_KANBAN_TASK from env (set by the dispatcher), gate correctly (worker tools require the env var, orchestrator tools require it unset), and write to ~/.hermes/kanban.db. Why kanban tools work via stateless dispatch when delegate_task/memory/ session_search/todo don't: those four are listed in _AGENT_LOOP_TOOLS (model_tools.py:493) and short-circuit in handle_function_call() with 'must be handled by the agent loop' — they need to mutate AIAgent's mid-loop state. Kanban tools have no such requirement; they're pure side-effect functions against the kanban.db plus state_meta. Tools exposed: Worker handoff (require HERMES_KANBAN_TASK): kanban_complete, kanban_block, kanban_comment, kanban_heartbeat Read-only board queries: kanban_show, kanban_list Orchestrator (require HERMES_KANBAN_TASK unset): kanban_create, kanban_unblock, kanban_link Tests: - test_kanban_worker_tools_exposed: complete/block/comment/heartbeat in EXPOSED_TOOLS (regression guard for the would-hang-worker bug) - test_kanban_orchestrator_tools_exposed: create/show/list/unblock/link Docs: - New 'Workflow features' section in the docs page covering /goal, kanban, and cron behavior on this runtime - /goal: works fully via run_conversation feedback; only caveat is approval-prompt noise on long writes-heavy goals (mitigated by the default :workspace permission profile) - Kanban: enumerated which tools are reachable via the callback and why the env var propagates correctly through the codex subprocess to the MCP server subprocess - Cron: documented as 'not specifically tested' — same rules as the CLI apply since cron runs through AIAgent.run_conversation - Trade-offs table gained rows for /goal, kanban worker, kanban orchestrator 172/172 codex-runtime tests green (+2 from kanban tests). * docs(codex-runtime): wire /codex-runtime into slash-commands ref + flag aux token cost Three docs gaps caught during a final audit: 1. /codex-runtime was only in the feature docs page, not in the slash-commands reference. Added rows to both the CLI section and the Messaging section so users discover it where they'd look for slash command syntax. 2. CODEX_HOME and HERMES_KANBAN_TASK weren't in environment-variables.md. CODEX_HOME lets users redirect Codex CLI's config dir (the migration honors it). HERMES_KANBAN_TASK is set by the kanban dispatcher and propagates to the codex subprocess + the hermes-tools MCP subprocess so kanban worker tools gate correctly — documented as 'don't set manually' since it's an internal handoff. 3. Aux client behavior on this runtime. When openai_runtime= codex_app_server is on with the openai-codex provider, every aux task (title generation, context compression, vision auto-detect, session search summarization, the background self-improvement review fork) flows through the user's ChatGPT subscription by default. This is true for the existing codex_responses path too, but it's more visible / important here because users explicitly opted in for subscription billing. Added a 'Auxiliary tasks and ChatGPT subscription token cost' section to the docs page with a YAML example showing how to override specific aux tasks to a cheaper model (typically google/gemini-3-flash-preview via OpenRouter). Also documents how the self-improvement review fork gets auto-downgraded from codex_app_server to codex_responses by the fix earlier in this PR. No code changes — pure docs. 172 codex-runtime tests still green. * docs+test(codex-runtime): pin HOME passthrough, document multi-profile + CODEX_HOME OpenClaw hit a real footgun in openclaw/openclaw#81562: when spawning codex app-server they were synthesizing a per-agent HOME alongside CODEX_HOME. That made every subprocess codex's shell tool launches (gh, git, aws, npm, gcloud, ...) see a fake $HOME and miss the user's real config files. They had to back it out in PR #81562 — keep CODEX_HOME isolation, leave HOME alone. Audit confirms Hermes' codex spawn doesn't have this problem. We do os.environ.copy() and only overlay CODEX_HOME (when provided) and RUST_LOG. HOME passes through unchanged. But it was an emergent property without a test pinning it, so adding a regression guard: test_spawn_env_preserves_HOME — confirms parent HOME survives intact in the subprocess env test_spawn_env_sets_CODEX_HOME_when_provided — confirms codex_home arg still isolates codex state correctly Docs additions: 'HOME environment variable passthrough' section — calls out the contract explicitly: CODEX_HOME isolates codex's own state, HOME stays user-real so gh/git/aws/npm/etc. find their normal config. Cites openclaw#81562 as the cautionary tale. 'Multi-profile / multi-tenant setups' section — addresses the related concern: profiles share ~/.codex/ by default. For users who want per-profile codex isolation (separate auth, separate plugins), documents the manual CODEX_HOME=<profile-scoped-dir> approach. Explains why we DON'T auto-scope CODEX_HOME per profile: doing so would silently invalidate existing codex login state for anyone upgrading to this PR with tokens already at ~/.codex/auth.json. Opt-in is safer than surprising users. 174 codex-runtime tests (+2 from HOME guards), all green. * fix(codex-runtime): TOML control-char escapes + atomic config.toml write Two footguns caught in a final audit pass before merge. Bug 1: TOML control characters not escaped The _format_toml_value() helper escaped backslashes and double quotes but passed literal control characters (\n, \t, \r, \f, \b) through unchanged. TOML basic strings don't allow literal control characters — a path or env var containing a newline would produce invalid TOML that codex refuses to load. Realistic exposure: pathological cases like a HERMES_HOME with a trailing newline (env var concatenation accident), or a PYTHONPATH with a tab from a multi-line shell heredoc. Fix: escape all five TOML basic-string control sequences (\b \t \n \f \r) in addition to \\ and \" that we already did. Order matters — backslash must come first or the other escapes get re-escaped. Bug 2: config.toml write wasn't atomic If the python process crashed between target.mkdir() and the write_text() finishing, a half-written config.toml could be left behind. On NFS / Windows / some FUSE mounts this is a real concern; on ext4/APFS small writes are usually atomic in practice but not guaranteed. Fix: write to a tempfile.mkstemp() temp file in the same directory, then Path.replace() (atomic same-dir rename on POSIX, ReplaceFile on Windows). On rename failure, clean up the temp file so repeated failed migrations don't pile up .config.toml.* files. Tests: - test_string_with_newline_escaped — \n in value → \n in output - test_string_with_tab_escaped — \t in value → \t in output - test_string_with_other_controls_escaped — \r, \f, \b - test_windows_path_escaped_correctly — backslash doubling - test_atomic_write_no_temp_leak_on_success — no .config.toml.* left over after a successful write - test_atomic_write_cleanup_on_rename_failure — temp file removed when Path.replace raises (simulated disk full) 180 codex-runtime tests, all green (+6 from this commit). Footguns audited but NOT fixed (with rationale): - Concurrent migrations race. Two Hermes processes hitting /codex-runtime codex_app_server within seconds of each other could cause one writer to lose entries. Low probability (you'd have to enable from two surfaces simultaneously) and low impact (just re-run migration). Adding fcntl/msvcrt locking is more code than it's worth here. The atomic rename above means each individual write is consistent — only the merge step is racy. - Codex protocol version drift. We pin MIN_CODEX_VERSION=0.125 and check at runtime but don't reject too-new versions. Right call — the protocol has been stable through 0.125 → 0.130. If OpenAI breaks it later we'd see the error in test_codex_app_server_runtime on CI before users hit it.
Summary
Two more pieces that lived only on the VPS now live in the fork:
deploy/eligia-vps/hermes.service/etc/systemd/system/hermes.service. SOPS decrypt + docker-compose up.deploy/eligia-vps/migrate-from-wamba-build.sh/opt/hermes/wamba_build/snapshot to a git checkout at/opt/hermes/source/. Backs up old files, symlinks compose/config from the repo, installs the systemd unit, rebuilds the image, restarts hermes, polls healthcheck.After this PR + one run of the script
Future deploys become:
— no more scp'ing or hand-editing.
README
Updated to:
hermes.servicein the contents table.Test plan
100755).Order of operations
/opt/hermes/wamba_build/.Generated with Claude Code.
Summary by CodeRabbit
Release Notes
Documentation
Chores