fix(hermes): enforce runtime env secret guard on direct hermes gateway (#4975)#4981
Conversation
…ay` (NVIDIA#4975) The Hermes runtime-env secret boundary (refuse raw secret-shaped env vars like SLACK_BOT_TOKEN=xoxb-real-...) ran only on the nemoclaw-start entrypoint and the host-side recovery path. A direct `docker exec ... hermes gateway run` invokes the CLI without crossing either, so the gateway started with raw secrets in its environment and no [SECURITY] rejection. Wrap /usr/local/bin/hermes: relocate the real binary to hermes.real and install a thin wrapper that runs the existing single-source-of-truth validator (validate-env-secret-boundary.py runtime-env) for the `gateway` subcommand before exec'ing the real binary. Every other subcommand passes straight through. SECURITY: the wrapper resolves the validator and the real binary from fixed installed paths, never from the environment — it exists to reject a malicious runtime env, so it must not let that env redirect the guard or the binary it protects (a dev fallback resolves against the script's own directory for checkouts, matching start.sh's _HERMES_BOUNDARY_VALIDATOR resolution). The guard is identical to the one start.sh already runs before launch, and the placeholder refresh only emits allow-listed openshell:resolve:env:* values, so the trusted startup path is not regressed; only a genuinely raw secret-shaped var is rejected. Add Linux-gated tests that run the wrapper against the real validator and a hermes.real stub: gateway+raw-secret is refused (exit 1, value not leaked, real never invoked), gateway+placeholders is allowed, and non-gateway subcommands (dashboard, --version) pass through even with raw secrets present. Signed-off-by: Abhimanyu Kumar <abhimanyukumar7290@gmail.com>
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a Hermes CLI wrapper that gates the ChangesHermes Gateway Env-Guard Wrapper
Sequence Diagram(s)sequenceDiagram
participant Wrapper as /usr/local/bin/hermes (wrapper)
participant Py as python3
participant Validator as validate-hermes-env-secret-boundary.py
participant HermesReal as /usr/local/bin/hermes.real
Wrapper->>Py: invoke Validator runtime-env (only for `gateway`)
Py->>Validator: execute validation
Validator-->>Py: exit code (0 or non-zero)
Py-->>Wrapper: return exit code
alt validator succeeds
Wrapper->>HermesReal: exec with original args
else validator fails
Wrapper-->>Wrapper: exit with validator status (non-zero)
end
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
agents/hermes/Dockerfile (1)
98-109: Run the secret-boundary E2E lane on this branch before merge.This hunk changes the gateway startup security boundary path; run the boundary-focused job explicitly since nightly coverage is not PR-triggered.
gh workflow run nightly-e2e.yaml --ref <branch> -f jobs=hermes-secret-boundary-e2eAs per coding guidelines, changes under
agents/hermes/**should be validated with Hermes E2E coverage, and the referenced workflow docs identifyhermes-secret-boundary-e2eas the boundary contract check.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agents/hermes/Dockerfile` around lines 98 - 109, This change relocates the Hermes binary and installs a wrapper (hermes-wrapper.sh -> /usr/local/bin/hermes, original moved to /usr/local/bin/hermes.real) which affects the gateway secret boundary; before merging, run the Hermes secret-boundary E2E lane to validate behavior by executing the nightly-e2e workflow for the hermes-secret-boundary-e2e job (use: gh workflow run nightly-e2e.yaml --ref <branch> -f jobs=hermes-secret-boundary-e2e), verify the wrapper correctly delegates non-gateway subcommands to hermes.real and that `/usr/local/bin/hermes --version` succeeds in the built image, and address any failures found by updating hermes-wrapper.sh or the Dockerfile install/move steps.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@agents/hermes/hermes-wrapper.sh`:
- Around line 36-37: The wrapper currently invokes the guard with an unpinned
"python3" from PATH; change it to use a fixed interpreter path (e.g.
/usr/bin/python3) and verify it exists/executable before running to enforce the
"never from the environment" intent. Replace the invocation of python3 "$GUARD"
in hermes-wrapper.sh with a pinned interpreter variable (e.g.
PINNED_PYTHON=/usr/bin/python3), check [ -x "$PINNED_PYTHON" ] and then run
"$PINNED_PYTHON" "$GUARD" runtime-env || exit $?, so the guard always uses the
known system Python rather than PATH-resolved python3.
---
Nitpick comments:
In `@agents/hermes/Dockerfile`:
- Around line 98-109: This change relocates the Hermes binary and installs a
wrapper (hermes-wrapper.sh -> /usr/local/bin/hermes, original moved to
/usr/local/bin/hermes.real) which affects the gateway secret boundary; before
merging, run the Hermes secret-boundary E2E lane to validate behavior by
executing the nightly-e2e workflow for the hermes-secret-boundary-e2e job (use:
gh workflow run nightly-e2e.yaml --ref <branch> -f
jobs=hermes-secret-boundary-e2e), verify the wrapper correctly delegates
non-gateway subcommands to hermes.real and that `/usr/local/bin/hermes
--version` succeeds in the built image, and address any failures found by
updating hermes-wrapper.sh or the Dockerfile install/move steps.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 3d414101-6efa-489f-8be6-733a60cd06e4
📒 Files selected for processing (3)
agents/hermes/Dockerfileagents/hermes/hermes-wrapper.shtest/hermes-gateway-wrapper.test.ts
…apper Address CodeRabbit review on NVIDIA#4981: the wrapper invoked the secret-boundary guard via PATH-resolved `python3`. Because PATH is part of the untrusted runtime environment this wrapper exists to reject, an attacker could shadow `python3` (e.g. `docker exec -e PATH=/tmp/evil:...`) with a script that exits 0 and no-op the guard, starting the gateway with raw secrets. Resolve python3 from a fixed set of absolute paths (/usr/bin, /usr/local/bin, the hermes venv), never via PATH, and fail closed if none is executable. The interpreter join the validator and real binary as env-independent, fixed-path resolutions. Add a test that plants a PATH-shadowed python3 and asserts the guard still refuses a raw secret-shaped gateway launch. Signed-off-by: Abhimanyu Kumar <abhimanyukumar7290@gmail.com>
|
Thanks @coderabbitai — good catch on the interpreter. Addressed in 7148a54: the wrapper resolved On the |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
# Check the current state of the wrapper script after the fix in commit 7148a5462
gh pr view 4981 --repo NVIDIA/NemoClaw --json commits,headRefName 2>/dev/null | jq .
# Try to fetch the latest wrapper content
gh api repos/NVIDIA/NemoClaw/contents/agents/hermes/hermes-wrapper.sh?ref=fix/issue-4975-hermes-gateway-secret-guard 2>/dev/null | jq -r '.content' | base64 -d 2>/dev/null || echo "Could not fetch file"Length of output: 4043 🏁 Script executed: #!/bin/bash
# Fetch the current wrapper script content at the latest commit
gh api "repos/NVIDIA/NemoClaw/contents/agents/hermes/hermes-wrapper.sh?ref=7148a54621c16645b6395e043a85ac713cabd870" 2>/dev/null | jq -r '.content' | base64 -d 2>/dev/null || echo "Could not fetch file"Length of output: 2902
The updated wrapper looks correct — the three security-critical resolutions (interpreter, validator, real binary) now all use the same fixed-path-first / dev-fallback pattern and are fully isolated from the untrusted runtime environment: for _candidate in /usr/bin/python3 /usr/local/bin/python3 /opt/hermes/.venv/bin/python3; do
if [ -x "$_candidate" ]; then PYTHON3="$_candidate"; break; fi
done
if [ -z "$PYTHON3" ]; then
echo "[SECURITY] Refusing hermes gateway: no python3 at a trusted absolute path ..." >&2
exit 127
fiFail-closed on a missing interpreter (exit 127 + On the 🐇✨ |
prekshivyas
left a comment
There was a problem hiding this comment.
LGTM on the code — solid security fix for #4975. Wrapping the hermes binary closes the docker exec ... hermes gateway run bypass so every gateway-start path crosses the same env-secret-boundary validator. Security hygiene is right: python3/validator/real-binary all resolved from fixed absolute paths (never $PATH — exactly what this guard must resist), fail-closed if no trusted python3 (exit 127), guard failure propagates (|| exit $?), and exec $REAL_HERMES only runs after the guard passes. Build re-asserts --version through the wrapper, and there's a Linux+python3 test covering the enforcement.
One thing before merge (especially for a security change): CI shows only 4 checks — the bash/python test + shellcheck/vitest suite appears aged on this older PR. Recommend a rebase to re-trigger the full suite and confirm the new test + shellcheck go green.
Apply shfmt (-i 2 -ci -bn) to hermes-wrapper.sh and Biome to the test so static-checks' formatter hooks leave them unchanged. Formatting only; the wrapper's gateway secret-boundary logic is unchanged. Co-authored-by: abhi-0906 <abhimanyukumar7290@gmail.com> Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
@abhi-0906 one thing left to make CI green:
Only you (the author) can sign off. Once that line is in the description, (I pushed a small |
It has a shebang but was committed non-executable, which trips the check-shebang-scripts-are-executable hook (surfaced once the formatter drift was cleared). Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com> Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Thanks @prekshivyas — done. I've added the sign-off line to the PR description: The edit triggered a fresh |
## Summary - Add the v0.0.63 release-note section using the published development note as source context. - Update source docs for sandbox recovery, OpenClaw config restore safety, managed vLLM selection, Slack Socket Mode conflict handling, and host diagnostics. - Refresh generated `nemoclaw-user-*` skills from the updated Fern MDX docs. - Update the release-doc refresh skill so post-release docs for version `n` look up the matching announcement discussion and use the `n+1` patch release label. - Fix CLI/docs parity by avoiding a `--from <Dockerfile>` flag mention inside the `upgrade-sandboxes` command section. ## Source summary - #5034 -> `docs/reference/troubleshooting.mdx`, `docs/about/release-notes.mdx`: Document safer stale-sandbox recovery through `rebuild --yes` before recreating from scratch. - #5091 -> `docs/reference/troubleshooting.mdx`, `docs/about/release-notes.mdx`: Document Docker-driver post-reboot recovery from OpenShell container labels. - #5101, #5174, #5177 -> `docs/manage-sandboxes/backup-restore.mdx`, `docs/about/release-notes.mdx`: Document OpenClaw `openclaw.json` preservation, merge behavior, and fail-safe restore handling. - #5102 -> `docs/reference/commands.mdx`, `docs/reference/commands-nemohermes.mdx`, `docs/manage-sandboxes/lifecycle.mdx`, `docs/about/release-notes.mdx`: Document `upgrade-sandboxes` image-fingerprint drift detection. - #4201 -> `docs/reference/troubleshooting.mdx`, `docs/about/release-notes.mdx`: Document the installer diagnostic for unexpected Docker daemon access outside the `docker` group. - #5038 -> `docs/inference/inference-options.mdx`, `docs/inference/use-local-inference.mdx`, `docs/about/release-notes.mdx`: Document the interactive managed-vLLM model picker and non-interactive override behavior. - #5040, #5041 -> `docs/reference/troubleshooting.mdx`, `docs/about/release-notes.mdx`: Document Ollama auth-proxy recovery and host DNS preflight diagnostics. - #4986, #5039 -> `docs/manage-sandboxes/messaging-channels.mdx`, `docs/about/release-notes.mdx`: Document Slack validation and duplicate Slack Socket Mode sandbox handling. - #4981, #5168 -> `docs/about/release-notes.mdx`: Capture Hermes gateway secret-guard and wrapped-argv startup hardening in the release surface. - Follow-up -> `.agents/skills/nemoclaw-contributor-update-docs/SKILL.md`: Record the post-release docs workflow, discussion-announcement lookup, and next-patch release label rule. - Follow-up -> `docs/reference/commands.mdx`, `docs/reference/commands-nemohermes.mdx`: Reword custom Dockerfile sandbox text so CLI parity does not treat `--from` as an `upgrade-sandboxes` flag. ## Verification - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` - `npm run docs` - `npm run build:cli` - `bash test/e2e/e2e-cloud-experimental/check-docs.sh --only-cli` - Skip-term scan for `docs/.docs-skip` blocked terms across generated user skills <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Documentation** * Enhanced local inference setup with interactive model selection prompts and environment variable overrides * Improved sandbox upgrade detection using build fingerprints and version checks * Clarified configuration restore behavior preserving user settings during rebuild/restore * Added gateway authentication as fifth security layer * Expanded Slack messaging validation with live credential checking * Enhanced troubleshooting guidance for Docker access, DNS issues, and sandbox recovery * Updated release notes for v0.0.63 featuring sandbox recovery and inference improvements <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
The Hermes runtime-env secret boundary (refuse startup when raw secret-shaped env vars like
SLACK_BOT_TOKEN=xoxb-real-…are present) ran only on thenemoclaw-startentrypoint and the host-side recovery path. A directdocker exec … hermes gateway runinvokes the CLI without crossing either, so the gateway started with raw secrets and no[SECURITY]rejection. This wraps thehermesCLI so the same guard runs for thegatewaysubcommand however it is invoked.Related Issue
Fixes #4975
Root cause
validate_hermes_runtime_env_secret_boundary(agents/hermes/start.sh) and the recovery path both call the single-source-of-truth validatoragents/hermes/validate-env-secret-boundary.py runtime-env, but the guard is attached to those entrypoints — not to thehermes gatewaycommand — so a direct invocation bypasses it (as the issue notes, "implemented only for a different entrypoint").Changes
agents/hermes/hermes-wrapper.sh(new) — installed at/usr/local/bin/hermes; runs the existing validator for thegatewaysubcommand, thenexecs the real binary. All other subcommands pass straight through.agents/hermes/Dockerfile— relocate the real binary tohermes.real, install the wrapper, re-asserthermes --versionthrough it so a broken relocation fails the build.test/hermes-gateway-wrapper.test.ts(new) — Linux-gated tests against the real validator + ahermes.realstub.Design notes for review: the wrapper reuses the existing validator (no second copy of the security logic); it resolves the validator and real binary from fixed installed paths, never from env (it guards against a malicious env, so env must not redirect it);
execpreserves the PID sostart.sh'swait/respawn still track the gateway; and the trusted path already runs the identical guard before launch with placeholder-only env, so legitimate startup is not regressed.Type of Change
Verification
Validated on Ubuntu 24.04: reporter's exact repro now refused (non-zero,
[SECURITY], gateway never started, token not leaked); placeholder/allow-listed env still starts the gateway; non-gateway subcommands and--versionpass through; Dockerfile relocation preserves the symlink→venv binary; guard is not env-redirectable;execpreserves PID.npm test/npx prek run --all-filesdeferred to CI (ran targeted vitest + typecheck + test-size budget locally)Summary by CodeRabbit
New Features
Tests
Signed-off-by: abhi-0906 abhimanyukumar7290@gmail.com