fix(hermes): enforce env secret boundary on sandbox recover#4959
Conversation
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
E2E Advisor RecommendationRequired E2E: Dispatch hint: Auto-dispatched E2E: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
E2E Scenario Advisor RecommendationRequired scenario E2E: Dispatch required scenario E2E:
Full scenario advisor summaryE2E Scenario AdvisorBase: Required scenario E2E
Optional scenario E2E
Relevant changed files
|
📝 WalkthroughWalkthroughThis PR implements Hermes secret-boundary validation: a new Python CLI that detects raw secrets in ChangesHermes Secret-Boundary Validation and Enforcement
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsStopped waiting for pipeline failures after 30000ms. One of your pipelines takes longer than our 30000ms fetch window to run, so review may not consider pipeline-failure results for inline comments if any failures occurred after the fetch window. Increase the timeout if you want to wait longer or run a Comment |
Selective E2E Results — ✅ All requested jobs passedRun: 27140313358
|
PR Review AdvisorFindings: 2 needs attention, 6 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Consider writing more tests for
Since last review detailsCurrent findings:
This is an automated advisory review. A human maintainer must make the final merge decision. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
agents/hermes/validate-env-secret-boundary.py (1)
125-146: 💤 Low valueUnreachable return statement.
Line 142 (
return 2) is unreachable becauseargparsewithrequired=Trueon subparsers will exit with an error beforemain()returns when an invalid or missing subcommand is provided.This is harmless but could be removed or replaced with a defensive assertion.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agents/hermes/validate-env-secret-boundary.py` around lines 125 - 146, The final `return 2` in main() is unreachable because argparse's subparsers were created with required=True; remove the dead `return 2` or replace it with a defensive assertion (e.g., assert False, "unreachable") to make intent explicit. Locate the main() function and the parser/sub = parser.add_subparsers(...) block and either delete the `return 2` line or swap it for a short defensive assertion to document that control flow should never reach that point.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@agents/hermes/validate-env-secret-boundary.py`:
- Around line 125-146: The final `return 2` in main() is unreachable because
argparse's subparsers were created with required=True; remove the dead `return
2` or replace it with a defensive assertion (e.g., assert False, "unreachable")
to make intent explicit. Locate the main() function and the parser/sub =
parser.add_subparsers(...) block and either delete the `return 2` line or swap
it for a short defensive assertion to document that control flow should never
reach that point.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a30f636c-86cf-4e85-a343-d9f863af0bb1
📒 Files selected for processing (6)
agents/hermes/Dockerfileagents/hermes/start.shagents/hermes/validate-env-secret-boundary.pysrc/lib/agent/runtime.test.tssrc/lib/agent/runtime.tstest/hermes-start.test.ts
… violation Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/agent/runtime.ts`:
- Around line 180-181: The current shell snippet in runtime.ts kills the process
if the validator script is missing; change the first check that tests for the
presence of `validator` to emit a warning/skip instead of invoking `${kill}` and
`exit 1` so older Hermes images remain recoverable, while leaving the second
check (the actual execution: `python3 ${shellQuote(validator)} env-file
/sandbox/.hermes/.env`) as a hard failure when the validator runs and refuses;
apply the same shape to the corresponding runtime-env helper checks (the other
pair around lines 199-200) and update the related test
`runtime-hermes-secret-boundary.test.ts` to expect a warning/skip for the
missing-validator case rather than a process kill.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 8fb5183e-8ddf-4bbe-ae36-73a55113be74
📒 Files selected for processing (3)
agents/hermes/start.shsrc/lib/agent/runtime-hermes-secret-boundary.test.tssrc/lib/agent/runtime.ts
Selective E2E Results — ✅ All requested jobs passedRun: 27142275082
|
….env on recover Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 27143622706
|
…pand tests Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 27145242873
|
Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
agents/hermes/validate-env-secret-boundary.py (1)
116-121: 💤 Low valueAdd an explanatory comment for the empty except clause.
The CodeQL warning is valid — silent
except: passblocks are a code smell. In this context the intent is correct (nothing useful can be done ifclose()fails on cleanup), but an inline comment makes the intent explicit and silences the static analysis warning.Proposed fix
finally: if fd != -1: try: os.close(fd) except OSError: - pass + pass # Best-effort cleanup; nothing to do if close fails🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@agents/hermes/validate-env-secret-boundary.py` around lines 116 - 121, The except OSError: pass is currently silent; update the finally block that closes fd (the os.close(fd) call) to include a brief explanatory comment explaining that failures during cleanup are intentionally ignored (e.g., "Ignore errors closing fd during cleanup — nothing useful can be done here") so the intent is explicit and static analysis warnings are silenced; keep the except handling as-is (no re-raise) but add the comment directly above or inside the except block referencing fd/os.close to make the rationale clear.Source: Linters/SAST tools
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@agents/hermes/validate-env-secret-boundary.py`:
- Around line 116-121: The except OSError: pass is currently silent; update the
finally block that closes fd (the os.close(fd) call) to include a brief
explanatory comment explaining that failures during cleanup are intentionally
ignored (e.g., "Ignore errors closing fd during cleanup — nothing useful can be
done here") so the intent is explicit and static analysis warnings are silenced;
keep the except handling as-is (no re-raise) but add the comment directly above
or inside the except block referencing fd/os.close to make the rationale clear.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ea08bdc2-9564-4254-a826-7545fc5dfe1a
📒 Files selected for processing (5)
agents/hermes/validate-env-secret-boundary.pysrc/lib/agent/hermes-recovery-boundary.tssrc/lib/agent/runtime-hermes-secret-boundary.test.tssrc/lib/agent/runtime.tstest/hermes-start.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- test/hermes-start.test.ts
Selective E2E Results — ❌ Some jobs failedRun: 27145988310
|
… .env to validator Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Selective E2E Results — ❌ Some jobs failedRun: 27146629162
|
Signed-off-by: Carlos Villela <cvillela@nvidia.com>
Selective E2E Results — ✅ All requested jobs passedRun: 27147381522
|
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/lib/agent/runtime-hermes-secret-boundary-behavioural.test.ts (1)
85-87: ⚡ Quick winPrefer
replaceAlloverRegExpfor literal string replacement.The file path
HERMES_SECRET_BOUNDARY_VALIDATOR_PATHcontains.characters that are regex metacharacters. While not a true ReDoS risk (the path is controlled by the codebase), usingString.prototype.replaceAll()is clearer and avoids the metacharacter issue entirely.♻️ Proposed fix using replaceAll
- const guardWithStubs = opts.guard - .replace(new RegExp(HERMES_SECRET_BOUNDARY_VALIDATOR_PATH, "g"), validatorPath) - .replace(/\/tmp\/gateway-recovery\.log/g, recoveryLogPath); + const guardWithStubs = opts.guard + .replaceAll(HERMES_SECRET_BOUNDARY_VALIDATOR_PATH, validatorPath) + .replaceAll("/tmp/gateway-recovery.log", recoveryLogPath);Apply the same fix at lines 222-234 and 396-397.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/agent/runtime-hermes-secret-boundary-behavioural.test.ts` around lines 85 - 87, The current replacement uses .replace(new RegExp(HERMES_SECRET_BOUNDARY_VALIDATOR_PATH, "g"), validatorPath) which treats dots as regex metacharacters; change these to use String.prototype.replaceAll for literal replacements (e.g. call opts.guard.replaceAll(HERMES_SECRET_BOUNDARY_VALIDATOR_PATH, validatorPath).replaceAll("/tmp/gateway-recovery.log", recoveryLogPath)) so both HERMES_SECRET_BOUNDARY_VALIDATOR_PATH and the recovery log path are replaced literally; apply the same replaceAll change for the other occurrences referenced in the file (the blocks around the current guardWithStubs usage and the occurrences at the later locations noted).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@src/lib/agent/runtime-hermes-secret-boundary-behavioural.test.ts`:
- Around line 85-87: The current replacement uses .replace(new
RegExp(HERMES_SECRET_BOUNDARY_VALIDATOR_PATH, "g"), validatorPath) which treats
dots as regex metacharacters; change these to use String.prototype.replaceAll
for literal replacements (e.g. call
opts.guard.replaceAll(HERMES_SECRET_BOUNDARY_VALIDATOR_PATH,
validatorPath).replaceAll("/tmp/gateway-recovery.log", recoveryLogPath)) so both
HERMES_SECRET_BOUNDARY_VALIDATOR_PATH and the recovery log path are replaced
literally; apply the same replaceAll change for the other occurrences referenced
in the file (the blocks around the current guardWithStubs usage and the
occurrences at the later locations noted).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b1f6e7f1-cd4c-4a9a-8c44-93d43efa892a
📒 Files selected for processing (6)
agents/hermes/start.shagents/hermes/validate-env-secret-boundary.pysrc/lib/agent/hermes-recovery-boundary-fixtures.tssrc/lib/agent/hermes-recovery-boundary.tssrc/lib/agent/runtime-hermes-secret-boundary-behavioural.test.tssrc/lib/agent/runtime-hermes-secret-boundary-shape.test.ts
💤 Files with no reviewable changes (1)
- agents/hermes/start.sh
🚧 Files skipped from review as they are similar to previous changes (2)
- src/lib/agent/hermes-recovery-boundary.ts
- agents/hermes/validate-env-secret-boundary.py
## Summary - Add the v0.0.61 release notes from the GitHub dev announcement. - Document managed vLLM recovery after host reboot and Slack denied-mention feedback. - Refresh generated `nemoclaw-user-*` skills from the source docs. ## Source summary - #4983 -> `docs/about/release-notes.mdx`: Added the v0.0.61 release summary from the dev announcement and linked behavior groups to deeper docs. - #4904 -> `docs/inference/use-local-inference.mdx`: Documented that managed vLLM restarts the `nemoclaw-vllm` container after host reboot during recovery. - #4933 -> `docs/manage-sandboxes/messaging-channels.mdx`: Documented Slack sender feedback for denied channel `@mention` events. - #4879, #4915, #4935, #4759, #4164, #4888, #4897, #4944, #4959 -> `.agents/skills/`: Refreshed generated user skills from the current source docs for release prep. ## Verification - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` - `npm run docs` (passed outside the tool sandbox after `tsx` IPC pipe creation was blocked in the sandbox) - `npm run build:cli` (refreshed local `dist/` for the pre-push TypeScript hook) - Commit and pre-push hooks passed, including docs-to-skills verification, markdownlint, gitleaks, skills YAML tests, and CLI TypeScript. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Documentation** * Updated sandbox security documentation with file descriptor limits. * Changed default inference model for DGX Station profile. * Enhanced agent policy and backup/restore documentation. * Improved command reference examples with clearer formatting. * Clarified Slack messaging denial notice behavior. * Added automatic vLLM container recovery during host reboot. * Updated release notes for v0.0.61. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
The documented Hermes secret-boundary validator only ran from
start.sh, sosandbox recover/connect --probe-onlyrelaunched the gateway without checking/sandbox/.hermes/.envor the recovery shell's environment for raw secret-shaped values. This PR makes the recover path enforce the same boundary as cold start.Related Issue
Fixes #4957
Changes
agents/hermes/start.shinto a standalone CLI:agents/hermes/validate-env-secret-boundary.py(env-file <path>+runtime-envsubcommands). Single source of truth shared between cold start and recover./usr/local/lib/nemoclaw/validate-hermes-env-secret-boundary.py(Dockerfile, chmod 755).start.shnow delegates both validators to the script via_HERMES_BOUNDARY_VALIDATOR(default: install path, dev fallback: script-relative repo path). Cold-start behaviour unchanged.buildHermesSecretBoundaryGuard()insrc/lib/agent/runtime.tsinvokes both subcommands at the very start of the recovery script, before theALREADY_RUNNINGhealth probe. Refuses the relaunch withSECRET_BOUNDARY_REFUSED+exit 1on any violation. Tolerates a missing validator on older images with a warning so partial upgrades do not break recovery. Wired into bothbuildRecoveryScript(Hermes branch) andbuildHermesDashboardProcessRecoveryScript._HERMES_BOUNDARY_VALIDATOR(not_HERMES_SECRET_BOUNDARY_VALIDATOR) so the variable name itself does not match the secret-shaped key regex when the runtime-env validator scansos.environ.Hermes secret-boundary guard on recoverydescribe block insrc/lib/agent/runtime.test.tsasserts the recovery scripts contain the validator invocations, theSECRET_BOUNDARY_REFUSEDshort-circuit, and the correct order relative to the launch. Existingtest/hermes-start.test.tswrappers updated to point_HERMES_BOUNDARY_VALIDATORat the repo Python script so the shell-wrapper behavioural tests exercise the extracted validator end-to-end.Type of Change
Verification
Signed-off-by: Tinson Lai tinsonl@nvidia.com
Summary by CodeRabbit
New Features
Refactor
Tests