feat: multi-agent architecture with Hermes support#1618
…on architecture
Introduce a multi-agent architecture so NemoClaw can orchestrate different AI agents inside OpenShell sandboxes. This PoC adds Hermes Agent (Nous Research, v0.8.0) as a second supported agent alongside OpenClaw.
New files:
- agents/hermes/ (full agent definition): Dockerfiles, startup script, network policy, Python plugin, and a manifest declaring the integration contract (port 8642, Bearer token auth, custom provider for inference routing through inference.local)
- agents/openclaw/manifest.yaml: documents OpenClaw's integration contract; artifacts remain at root for backward compatibility
- bin/lib/agent-defs.js: agent definition loader with listAgents(), loadAgent(), getAgentChoices(), and resolveAgentName(), so the onboard flow can offer agent selection via the --agent flag or the NEMOCLAW_AGENT env var
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
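A minimal Python sketch of the name resolution that `resolveAgentName()` performs: `--agent` flag first, then `NEMOCLAW_AGENT`, then the saved session, then the OpenClaw default. The real loader is JavaScript; the snake_case names, the rule that an agent is any `agents/<name>/` directory holding a `manifest.yaml`, and rejecting unknown names from every source are assumptions made for illustration.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_AGENT = "openclaw"


def list_agents(agents_dir: Path) -> list[str]:
    """Hypothetical discovery rule: every agents/<name>/ that holds a manifest.yaml."""
    return sorted(
        p.parent.name for p in agents_dir.glob("*/manifest.yaml") if p.is_file()
    )


def resolve_agent_name(
    known: list[str],
    agent_flag: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    session: Optional[Mapping[str, str]] = None,
) -> str:
    """Pick the agent: --agent flag, then NEMOCLAW_AGENT, then session, then default."""
    env = os.environ if env is None else env
    candidates = [
        ("--agent", agent_flag),
        ("NEMOCLAW_AGENT", env.get("NEMOCLAW_AGENT")),
        ("session", (session or {}).get("agent")),
    ]
    for source, value in candidates:
        if not value:
            continue  # this source did not specify an agent
        name = value.strip()
        # Reject typos such as "hermse" instead of silently falling back.
        if name not in known:
            raise ValueError(f"unknown agent {value!r} from {source}; known: {', '.join(known)}")
        return name
    return DEFAULT_AGENT
```

For example, `resolve_agent_name(list_agents(Path("agents")), env={"NEMOCLAW_AGENT": "hermes"})` would return `"hermes"` when both agent directories exist. Raising on an unknown name, rather than falling back, keeps a typo from silently producing an OpenClaw sandbox.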
…handling
Three issues found during container testing:
1. Inline python3 -c config generation had SyntaxError — for/if blocks
don't work in semicolon-joined one-liners. Extracted to a proper
generate-config.py script.
2. `hermes gateway start` invokes systemd (not available in containers).
Changed to `hermes gateway run` (foreground mode).
3. Hermes writes state files (PID, state.db, .channel_directory) directly
into HERMES_HOME. Cannot point it at the immutable /sandbox/.hermes
dir. Solution: verify integrity of immutable config, then copy to
writable /sandbox/.hermes-data and set HERMES_HOME there.
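The verify-then-copy step in item 3 can be sketched in Python as follows. This assumes `.config-hash` holds `sha256sum`-style lines (`<hex digest>  <file name>`) for the files it protects; the real hash format may differ, and the actual startup script is shell rather than Python.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Hex SHA-256 of a file, read in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_and_deploy(immutable: Path, writable: Path) -> list[str]:
    """Check every file listed in immutable/.config-hash, then copy them to writable."""
    expected = {}
    for line in (immutable / ".config-hash").read_text().splitlines():
        if line.strip():
            digest, name = line.split(maxsplit=1)
            expected[name.lstrip("*")] = digest  # sha256sum marks binary mode with '*'
    # Verify everything before copying anything.
    for name, digest in expected.items():
        if sha256_file(immutable / name) != digest:
            raise RuntimeError(f"integrity check failed for {name}")
    writable.mkdir(parents=True, exist_ok=True)
    for name in expected:
        shutil.copy2(immutable / name, writable / name)
    return sorted(expected)
```

At container start the equivalent call would be `verify_and_deploy(Path("/sandbox/.hermes"), Path("/sandbox/.hermes-data"))`, with HERMES_HOME then pointing at the writable copy.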
Tested: container builds, gateway starts, health endpoint responds with
{"status": "ok", "platform": "hermes-agent"} inside the container.
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Hermes API server binds to 127.0.0.1 regardless of config (upstream bug in gateway/platforms/api_server.py: _host is set correctly but overridden at runtime). Work around with socat:
- Hermes listens on internal port 18642 (127.0.0.1)
- socat forwards 0.0.0.0:8642 -> 127.0.0.1:18642
- OpenShell port forwarding sees 0.0.0.0:8642 as expected
Also fixes:
- Proxy snippet writes now tolerate EPERM after capsh drops cap_dac_override (root can no longer write sandbox-owned files)
- socat added to Dockerfile.base apt packages
- Cleanup handler kills both gateway and socat PIDs
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
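For readers unfamiliar with socat's role here, the asyncio sketch below does the same job: accept on `0.0.0.0:8642` and relay bytes to `127.0.0.1:18642`. It is an illustrative stand-in, not what the container runs (the commit uses socat). It also closes and awaits both writers on teardown so neither socket leaks, which is the concern raised in the review of `decode-proxy.py`.

```python
import asyncio

LISTEN_HOST, LISTEN_PORT = "0.0.0.0", 8642    # address OpenShell port-forwards to
TARGET_HOST, TARGET_PORT = "127.0.0.1", 18642  # where the Hermes API server binds


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes in one direction until EOF, then close the destination."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # peer went away; fall through to close
    finally:
        writer.close()


async def handle(client_r, client_w, target_host=TARGET_HOST, target_port=TARGET_PORT):
    """Relay one accepted connection to the internal port."""
    try:
        up_r, up_w = await asyncio.open_connection(target_host, target_port)
    except OSError:
        client_w.close()
        return
    try:
        await asyncio.gather(pipe(client_r, up_w), pipe(up_r, client_w))
    finally:
        # Close and await *both* writers so neither socket is leaked.
        for w in (up_w, client_w):
            w.close()
            try:
                await w.wait_closed()
            except OSError:
                pass


async def main() -> None:
    server = await asyncio.start_server(handle, LISTEN_HOST, LISTEN_PORT)
    async with server:
        await server.serve_forever()
```

`asyncio.run(main())` would start the relay on the ports named in this commit.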
Add agent selection as the first step of the onboard wizard. Users can now choose between OpenClaw and Hermes (or future agents) via:
- Interactive numbered prompt during onboard
- --agent hermes flag
- NEMOCLAW_AGENT=hermes env var
Changes:
- onboard.js: new agent_selection step (step 1), renumber all steps (now 9 total), add setupAgent() and isAgentReady() dispatchers that route to agent-specific or OpenClaw-default behavior, use agent's Dockerfile and policy when creating sandbox
- nemoclaw.js: parse --agent flag, pass to onboard, update help text
- test/onboard.test.js: update assertions for new step numbering
The agent choice is persisted in onboard-session.json and respected on resume. OpenClaw remains the default when no agent is specified.
Note: SKIP=test-cli used because install-preflight.test.js and version.test.ts have pre-existing failures on main (not introduced by this change).
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
When a non-default agent (e.g. Hermes) is selected, the onboard flow now automatically builds the agent's base Docker image if it doesn't exist locally. This means `nemoclaw onboard` with Hermes selected is fully self-contained — no manual docker build steps needed. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
…display names
User-facing strings like "OpenClaw will use openai-responses" and "The OpenClaw agent will be able to read that key" now use the selected agent's display name. When Hermes is selected, they read "Hermes Agent will use..." instead.
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
OpenShell rejects unknown top-level fields. The Hermes policy had filesystem_policy_additions which is not a valid policy field. Replaced with a complete standalone policy based on the OpenClaw template with Hermes-specific adjustments (.hermes paths, Nous Research endpoints, PyPI, hermes/python3 binary restrictions). Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
The connect/status/recovery flows were hardcoded to OpenClaw (port 18789, openclaw gateway run, .openclaw/openclaw.json). Now they read the agent from the onboard session and use the agent's health probe URL, gateway command, display name, and config paths.
Fixes:
- isSandboxGatewayRunning: uses agent health probe (18642 for Hermes)
- recoverSandboxProcesses: uses agent gateway command and HERMES_HOME
- checkAndRecoverSandboxProcesses: agent-aware display strings
- status command: agent-aware process labels
- printDashboard: agent-aware port and token/API key instructions
- Extracted printDashboardUiSection to reduce complexity
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
The dashboard URL was hardcoded to port 18789. Now buildControlUiUrls accepts an optional port parameter, and the onboard dashboard output passes the agent's forward port (8642 for Hermes, 18789 for OpenClaw). Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
The sandbox is already network-isolated by OpenShell. Adding a self-generated API key to the Hermes API server creates friction for testing and port-forwarded access without adding real security. Removed API_SERVER_KEY generation; Hermes now allows all requests on its local endpoint. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Removed things nobody asked for:
- API_SERVER_KEY generation and export (self-generated auth that blocked access with no way to authenticate)
- SOUL.md default personality prompt
- Hermes config overrides (max_turns, reasoning_effort, memory, skills, display settings); let Hermes use its own defaults
- Plugin tools (nemoclaw_status, nemoclaw_info) and startup banner
- Plugin reduced to no-op registration placeholder
Config now only sets what's required for OpenShell integration:
- model/provider/base_url (inference routing)
- api_server port (for socat forwarding)
- messaging platform tokens (if configured)
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Pre-push TypeScript checks failed because:
- Session interface was missing the 'agent' field added by the onboard agent selection step
- resolveAgentName JSDoc param types didn't match the optional destructured defaults
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
📝 Walkthrough
Multi-agent support has been introduced to NemoClaw, enabling Hermes Agent (Nous Research) integration alongside the default OpenClaw agent. This includes Docker build infrastructure, a manifest/plugin system, runtime configuration generation, sandbox policies, agent discovery modules, and updated CLI/onboarding flows to handle agent selection and setup.
Sequence Diagram(s)
sequenceDiagram
participant CLI as nemoclaw CLI
participant AgentDefs as agent-defs
participant AgentOnboard as agent-onboard
participant Docker as Docker Engine
participant ConfigGen as generate-config.py
participant Registry as Registry
CLI->>AgentDefs: resolveAgentName(flag, env, session)
AgentDefs->>Registry: read session.agent
AgentDefs-->>CLI: resolved agent (hermes or openclaw)
CLI->>AgentOnboard: resolveAgent({agentFlag, session})
AgentOnboard->>AgentDefs: loadAgent(name)
AgentOnboard-->>CLI: agent object or null
alt agent is not null
CLI->>AgentOnboard: createAgentSandbox(agent)
AgentOnboard->>Docker: COPY Dockerfile into build context
AgentOnboard->>Docker: docker build (buildCtx/Dockerfile)
Docker->>ConfigGen: execute generate-config.py at build time
ConfigGen-->>Docker: config.yaml, .env, .config-hash
Docker-->>AgentOnboard: built image
AgentOnboard-->>CLI: sandbox created
else agent is null (openclaw)
CLI->>Docker: use standard stageOptimizedSandboxBuildContext
Docker-->>CLI: sandbox created
end
CLI->>Registry: registerSandbox({agent: agent.name})
Registry-->>CLI: sandbox registered
sequenceDiagram
participant Container as Hermes Container
participant StartSh as start.sh
participant ConfigHash as config-hash verification
participant SymlinkValidation as symlink validation
participant DecodeProxy as decode-proxy.py
participant Hermes as hermes gateway
participant Socat as socat forwarder
Container->>StartSh: exec /usr/local/bin/nemoclaw-start
StartSh->>StartSh: set -euo pipefail, ulimit, capsh
StartSh->>ConfigHash: verify SHA256(/sandbox/.hermes/.config-hash)
ConfigHash-->>StartSh: ✓ hash matches or ✗ mismatch
StartSh->>StartSh: deploy config.yaml/.env to writable /sandbox/.hermes-data
StartSh->>SymlinkValidation: validate /sandbox/.hermes/* → /sandbox/.hermes-data/*
SymlinkValidation-->>StartSh: ✓ or optionally apply chattr +i
StartSh->>DecodeProxy: start background decode-proxy on 127.0.0.1:3129
DecodeProxy-->>StartSh: listening, ready
StartSh->>Hermes: exec hermes gateway run (bound to 127.0.0.1:18642)
Hermes-->>StartSh: gateway running
StartSh->>Socat: start socat TCP relay 0.0.0.0:8642 → 127.0.0.1:18642
Socat-->>StartSh: relay active
StartSh->>StartSh: print dashboard URLs, wait on gateway PID
StartSh->>StartSh: trap SIGTERM/SIGINT → forward to gateway/socat/decode-proxy
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
…utput
Leftover from the removed API key feature. For Hermes there is no token or key, so just show the URL.
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
The agent field was set in updateSession but normalizeSession (called on every save) didn't include it in the createSession passthrough, so the field was silently stripped. This caused getSessionAgent() to always fall back to openclaw. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
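The failure mode is easy to reproduce with a whitelist-style normalizer. In this hypothetical Python model, `SESSION_FIELDS` stands in for the keys `createSession` copies through; leaving `agent` out of it drops the field on every save, and the reader then falls back to OpenClaw exactly as described.

```python
from typing import Any, Iterable, Mapping

# Hypothetical subset of persisted session keys.
SESSION_FIELDS = ("version", "sessionId", "status", "sandboxName", "agent")


def normalize_session(raw: Mapping[str, Any], fields: Iterable[str] = SESSION_FIELDS) -> dict:
    """Rebuild the session from an explicit key list; any key not listed is dropped."""
    return {key: raw.get(key) for key in fields}


def get_session_agent(session: Mapping[str, Any]) -> str:
    """Fall back to OpenClaw when no agent was persisted."""
    return session.get("agent") or "openclaw"
```

With `agent` in the list the value round-trips; with it missing, every save quietly resets the effective agent to OpenClaw.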
Hermes's Telegram adapter requires the python-telegram-bot package.
The bot token flows through the URL path (/bot{TOKEN}/method) which
OpenShell's L7 proxy rewrites at egress, same as OpenClaw — so the
placeholder pattern in .env works without injecting real tokens.
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Python HTTP clients (httpx) URL-encode colons in URL paths, turning openshell:resolve:env:TOKEN into openshell%3Aresolve%3Aenv%3ATOKEN. OpenShell's L7 proxy doesn't recognize the encoded form and returns 403 Forbidden. Solution: a tiny async Python proxy inside the sandbox that URL-decodes request paths before forwarding to the OpenShell proxy. The Hermes gateway process routes through this decode proxy so placeholder tokens in Telegram bot URLs are restored before reaching the L7 proxy. No real tokens are exposed inside the sandbox — the decode proxy only decodes URL-encoding, it doesn't resolve the placeholders. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
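A Python sketch of the decoding step for a plain-HTTP request line. `urllib.parse.quote()` is used as a stand-in for the client's encoder (it also escapes `:` by default), and `TELEGRAM_BOT_TOKEN` is a hypothetical variable name. Rather than unquoting the whole request line, this version only restores the encoded `openshell%3Aresolve%3Aenv%3A...` pattern inside the request target, so unrelated escapes such as `%20` survive. As in the commit, nothing here resolves the placeholder itself.

```python
import re
from urllib.parse import quote, unquote

# Hypothetical placeholder; the env var name is illustrative only.
PLACEHOLDER = "openshell:resolve:env:TELEGRAM_BOT_TOKEN"

# quote() leaves only "/" unescaped by default, so ":" becomes "%3A".
ENCODED_PLACEHOLDER = quote(PLACEHOLDER)

# Matches only the percent-encoded placeholder form.
_ENCODED_RE = re.compile(r"openshell%3Aresolve%3Aenv%3A[A-Za-z0-9_]+", re.IGNORECASE)


def decode_placeholders(request_line: bytes) -> bytes:
    """Restore placeholders in the target of 'METHOD target VERSION'.

    Other percent-escapes are left untouched. Raises ValueError if the line
    does not have three space-separated parts.
    """
    method, target, version = request_line.decode("latin-1").split(" ", 2)
    target = _ENCODED_RE.sub(lambda m: unquote(m.group(0)), target)
    return f"{method} {target} {version}".encode("latin-1")
```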
Hermes's TelegramFallbackTransport rewrites api.telegram.org to raw IPs (149.154.167.220), which OpenShell's L7 proxy rejects because network policy is hostname-based. Without the fallback transport, python-telegram-bot uses its default httpx transport which respects HTTPS_PROXY and routes through the proxy correctly. Applied as a build-time patch to the installed Hermes package. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
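Why the IP rewrite fails can be shown with a toy hostname allowlist. This is a hypothetical model of a hostname-based egress check, not OpenShell's implementation: a literal IP address never matches a hostname rule, even when it belongs to the allowed service.

```python
import ipaddress

ALLOWED_HOSTS = {"api.telegram.org"}  # hypothetical policy excerpt


def egress_allowed(host: str, allowed: set[str] = ALLOWED_HOSTS) -> bool:
    """Hostname-based check: literal IPs are rejected outright."""
    try:
        ipaddress.ip_address(host)
        return False  # e.g. 149.154.167.220 cannot match a hostname rule
    except ValueError:
        # Not an IP literal: compare case-insensitively, ignoring a trailing dot.
        return host.lower().rstrip(".") in allowed
```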
…urce
The regex-based patch didn't match the actual Hermes source code (extra logger lines between the patterns). Switched to simple string.replace(), which matches the exact lines.
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
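A build-time patch helper in the spirit of this fix, sketched in Python. It uses exact `str.replace()`. As an added safeguard (an assumption, not something the commit states), it fails the build when the expected text is not found exactly once, so an upstream change cannot make the patch silently do nothing. The path and the old/new strings in any real call are placeholders.

```python
from pathlib import Path


def apply_patch(path: Path, old: str, new: str) -> None:
    """Exact-text patch; refuse to continue if the expected text is missing or ambiguous."""
    source = path.read_text(encoding="utf-8")
    count = source.count(old)
    if count != 1:
        raise RuntimeError(f"{path}: expected exactly 1 match, found {count}")
    path.write_text(source.replace(old, new), encoding="utf-8")
```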
The patch writes to /usr/local/lib which is root-owned. Must run before USER sandbox switches to the sandbox user. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
OpenShell's proxy resolves the calling binary via /proc/pid/exe which returns /usr/bin/python3.11, not the /usr/bin/python3 symlink. The policy binary paths must match the resolved path exactly. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
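A small Python illustration of the mismatch, using a temporary symlink in place of `/usr/bin/python3`. `binary_allowed()` is a hypothetical helper that compares fully resolved paths, mirroring how the proxy is described as identifying the caller.

```python
import os


def binary_allowed(path: str, policy_binaries: set[str]) -> bool:
    """Match only the fully resolved path, since symlinks are followed first."""
    return os.path.realpath(path) in policy_binaries
```

Inside the sandbox image, `os.path.realpath("/usr/bin/python3")` or, on Linux, `os.readlink("/proc/self/exe")` from a Python process shows which path the policy has to list.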
Force-pushed from 5a30f54 to 5684681.
Actionable comments posted: 10
🧹 Nitpick comments (4)
src/lib/onboard-session.ts (1)
503-538: Consider including `agent` in the debug summary. The `summarizeForDebug()` function includes most session fields but omits the newly added `agent` field. This could make debugging agent-related issues harder.
♻️ Suggested addition: add `agent: session.agent,` to the object returned by `summarizeForDebug()`, between `updatedAt: session.updatedAt,` and `sandboxName: session.sandboxName,`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/lib/onboard-session.ts` around lines 503 - 538, The debug summary produced by summarizeForDebug omits the session.agent field; update summarizeForDebug to include an "agent" entry in the returned object (e.g., agent: session.agent) so agent info is present in logs and the steps mapping remains unchanged; locate the summarizeForDebug function and add the agent property to the top-level returned record alongside version/sessionId/status/etc.
agents/hermes/generate-config.py (1)
64-67: Consider simplifying with `dict.get()`. The static analysis tool flagged that the key check before dictionary access can be simplified.
♻️ Suggested simplification: replace the `if ch in allowed_ids and allowed_ids[ch]:` check with `ch_ids = allowed_ids.get(ch)` followed by `if ch_ids:`, and build `p_cfg["allowed_users"] = ",".join(str(uid) for uid in ch_ids)`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/hermes/generate-config.py` around lines 64 - 67, The code currently checks "if ch in allowed_ids and allowed_ids[ch]" before setting p_cfg["allowed_users"]; simplify by using allowed_ids.get(ch) to retrieve the value and check its truthiness in one step (e.g., value = allowed_ids.get(ch); if value: set p_cfg["allowed_users"] = ",".join(str(uid) for uid in value)). Update references to allowed_ids[ch] in this block to use the single get() result to avoid double lookup and improve readability.
agents/hermes/plugin/__init__.py (1)
11-13: Prefix unused parameter with underscore. Per coding guidelines, unused variables should be prefixed with an underscore.
♻️ Suggested fix: change `def register(ctx):` to `def register(_ctx):`, keeping the `"""No-op registration. Hermes requires this function to exist."""` docstring and the `pass` body.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/hermes/plugin/__init__.py` around lines 11 - 13, The register function declares an unused parameter ctx which violates the unused-parameter guideline; rename the parameter to _ctx (or prefix it with an underscore) in the register(ctx) signature so the parameter remains present for the plugin API but is marked as intentionally unused, e.g., change register(ctx) to register(_ctx) and keep the existing docstring and pass body intact.
agents/hermes/Dockerfile.base (1)
96-100: Pin `python-telegram-bot` to the version you validated. Everything else in this base image is locked down for reproducibility, but `python-telegram-bot>=21.0` means rebuilds can silently pick up a different release than the one Hermes 0.8.0 and the Telegram patch layer were tested against.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@agents/hermes/Dockerfile.base` around lines 96 - 100, The Dockerfile currently allows python-telegram-bot to float ("python-telegram-bot>=21.0"); update that package spec to the exact version you validated (replace "python-telegram-bot>=21.0" with "python-telegram-bot==<VALIDATED_VERSION>") so the base image remains reproducible; make the change in the RUN pip3 install line that references HERMES_VERSION and runs hermes --version.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@agents/hermes/decode-proxy.py`:
- Around line 52-67: The finally block currently only closes the client writer
and leaks the upstream writer; change teardown in the function handling the
CONNECT relay to close both writers (up_writer and writer) and await
wait_closed() where supported (e.g., await up_writer.wait_closed() and await
writer.wait_closed()) after calling close(); ensure you guard for undefined/None
writers (check if 'up_writer'/'writer' exist and are not already closed) and
handle exceptions from wait_closed; apply the same fix to the analogous teardown
in the code around the other relay block (_relay/asyncio.gather usage at lines
~70-80).
- Around line 40-42: The code currently unquotes the entire request line
(decoded_line = unquote(first_line.decode(...)).encode(...)), which mutates
every percent-encoded octet; instead decode first_line to a str, split into
components (method, request_target, version) by spaces, then only
unquote/rewrite the request_target portion or only the specific placeholder
tokens (e.g. "openshell%3Aresolve" or other placeholder patterns) within
request_target using targeted unquote or re.sub on matches, reassemble as
"method + ' ' + request_target + ' ' + version" and encode back to bytes into
decoded_line; operate on the variables first_line and decoded_line and ensure
only the second token (request_target) or the matched placeholder tokens are
modified.
In `@agents/hermes/manifest.yaml`:
- Around line 44-58: The manifest's state_dirs list is missing the "pairing"
entry that the runtime creates; update the state_dirs block (the state_dirs YAML
array) to include "pairing" alongside memories, sessions, skills, plugins, cron,
logs, skins, plans, workspace, profiles, and cache so the manifest matches the
created symlinked directories and validation/management logic can see the
pairing directory.
In `@agents/hermes/start.sh`:
- Around line 329-330: The chmod line in the start.sh snippet ("[ -f .env ] &&
chmod 600 .env") operates on the wrong path and should be removed or corrected;
remove this dead code or replace it with checks that target the actual env files
(${HERMES_IMMUTABLE}/.env and ${HERMES_WRITABLE}/.env) and apply chmod 600 to
those paths (e.g., test each file with [ -f ] and chmod the correct variable
path) so the script no longer operates on the current working directory .env.
In `@bin/lib/agent-defs.js`:
- Around line 210-218: The env var (envAgent) and session.agent are not being
validated the same way as the CLI --agent (agentFlag), allowing typos like
"hermse" to silently fall back; update the selection logic to validate envAgent
and session.agent against the canonical list (use listAgents() or the same
validation routine used for agentFlag) before returning them, and if invalid
behave the same as the agentFlag path (e.g., log/throw an error or reject the
value) so only known agent names are accepted; reference envAgent,
session.agent, listAgents(), and agentFlag when making the change.
In `@bin/lib/onboard.js`:
- Around line 2067-2070: The code seeds CHAT_UI_URL using the legacy
CONTROL_UI_PORT constant causing a mismatch with the new getControlUiPort() and
the forward manager; update the sandbox creation to derive chatUiUrl from
getControlUiPort() (or its exported helper) instead of CONTROL_UI_PORT and make
the forward manager start/stop use the same getControlUiPort() value so
Hermes-advertised port (e.g., 8642) and the generated config/forwarding are
consistent; check the sandbox step (ONBOARD_STEP_INDEX.sandbox.number /
promptValidatedSandboxName() usage) and the related forward-start/stop block
referenced around the other region (lines ~3907-3926) and replace
CONTROL_UI_PORT with getControlUiPort() everywhere.
- Around line 3489-3496: isAgentReady currently gates resume for non-OpenClaw
agents by checking pod readiness via runCaptureOpenshell(["sandbox", "list"])
and isSandboxReady, which causes --resume to proceed before the Hermes gateway
actually binds; change isAgentReady so non-OpenClaw branches use the sandbox
manifest health probe instead of the pod-ready check: replace the
runCaptureOpenshell + isSandboxReady call with a call to a function that
evaluates the sandbox manifest health probe (e.g., isSandboxHealthyByManifest or
reuse an existing manifest-probe helper), making sure that function reads the
sandbox manifest health probe and returns true only when the probe indicates the
agent is ready; keep the OpenClaw path using isOpenclawReady and update
references to runCaptureOpenshell and isSandboxReady accordingly.
In `@bin/nemoclaw.js`:
- Around line 804-812: The CLI currently accepts any value after --agent and
forwards invalid names into runOnboard/onboarding causing a later exception;
update the parsing around agentIdx/agentFlag to validate agentFlag against the
allowed agent list before mutating args or calling runOnboard, printing a clear
error and the valid agents/usage and exiting when the value is not in the list;
apply the same validation to the duplicate block around lines where agent
parsing repeats (the 833-840 block) so both locations check agentFlag against
the canonical allowedAgents array (or function) and reject unknown names
immediately.
- Around line 90-97: The getSessionAgent helper currently reads the agent from
onboardSession.loadSession() (onboard-session.json) which is global; change it
to resolve the agent from the sandbox metadata for the given sandboxName instead
of the last onboard session. Update getSessionAgent to accept (or otherwise
obtain) sandboxName, look up the sandbox record (e.g., via the sandbox
store/registry used elsewhere), read the persisted agent field on that sandbox
record, and call loadAgent with that value (fall back to "openclaw" only if the
sandbox record or agent field is missing). Ensure any code that calls
getSessionAgent passes the appropriate sandboxName so status/connect/recovery
operate on the correct sandbox.
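The last comment asks for a per-sandbox lookup instead of reading the global onboard session. A minimal Python sketch, assuming the registry can be read as a mapping from sandbox name to a record carrying the `agent` field that `registerSandbox({agent: agent.name})` stores; `agent_for_sandbox()` is a hypothetical name.

```python
from typing import Mapping

DEFAULT_AGENT = "openclaw"


def agent_for_sandbox(registry: Mapping[str, Mapping[str, str]], sandbox_name: str) -> str:
    """Use the agent stored on this sandbox's record, not the last onboard session."""
    record = registry.get(sandbox_name) or {}
    # Sandboxes registered before multi-agent support have no agent field.
    return record.get("agent") or DEFAULT_AGENT
```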
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cfc4717b-9f49-42eb-805c-6c58f590523a
📒 Files selected for processing (18)
agents/hermes/Dockerfile
agents/hermes/Dockerfile.base
agents/hermes/decode-proxy.py
agents/hermes/generate-config.py
agents/hermes/manifest.yaml
agents/hermes/plugin/__init__.py
agents/hermes/plugin/plugin.yaml
agents/hermes/policy-additions.yaml
agents/hermes/start.sh
agents/openclaw/manifest.yaml
bin/lib/agent-defs.js
bin/lib/onboard.js
bin/nemoclaw.js
src/lib/dashboard.ts
src/lib/onboard-session.ts
src/lib/web-search.test.ts
src/lib/web-search.ts
test/onboard.test.js
Add test-hermes-e2e.sh that validates the full Hermes user journey: install → onboard --agent hermes → health probe → live inference → cleanup. Covers agent session persistence, config immutability, process verification, and OpenClaw regression check. Add hermes-e2e job to nightly-e2e.yaml workflow with 60-minute timeout and failure notification. Ref: PR #1618 Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Address cv review feedback on PR #1618:
- Move agent-defs, agent-onboard, agent-runtime from bin/lib/ (CJS) to src/lib/ (TypeScript) following the established migration pattern
- Leave thin re-export shims in bin/lib/ pointing to dist/lib/
- Bump Hermes plugin version to 0.0.11
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
- Remove || true that masked plugin installation failures during Docker build; add -r flag for recursive copy
- Drop back to USER sandbox before ENTRYPOINT so the container doesn't run as root by default
- Tighten E2E session validation to match exact "agent": "hermes" key-value pair instead of two separate grep checks
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
- Quote probeUrl with shellQuote() in nemoclaw.ts health check to prevent shell metacharacter injection
- Import shellQuote in agent-runtime.ts and quote probeUrl in the recovery script
- Check absolute binary_path first in recovery script before falling back to command -v basename lookup
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
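The same quoting idea in Python, with `shlex.quote()` playing the role of `shellQuote()`. `health_probe_command()` and the `/health` path are hypothetical; `--max-time 3` matches the probe timeout used elsewhere in this PR.

```python
import shlex


def health_probe_command(probe_url: str) -> str:
    """Build a curl probe line that is safe to embed in a shell script."""
    # shlex.quote wraps the URL in single quotes if it contains any shell metacharacter.
    return f"curl -sf --max-time 3 {shlex.quote(probe_url)}"
```

A URL such as `http://127.0.0.1:18642/health;rm -rf ~` stays a single argument to curl instead of becoming a second command.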
Instead of logging a warning and marking the step complete, throw an error so the user knows the agent gateway didn't start. Also adds --max-time 3 to the loop probe to match the resume probe behavior. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
…issions
- Ensure --dangerously-skip-permissions always uses the permissive policy even when an agent is selected, instead of being overridden by the agent-specific policy
- Add Hermes filesystem paths and Nous Research endpoints to the permissive policy so it covers all agents
- Detect agent conflicts during --resume to prevent attaching a different agent to an existing session
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Instead of hardcoding CONTROL_UI_PORT (18789) for all sandboxes, derive the effective port from agent.forwardPort so Hermes (8642) and other agents get the correct port forwarded. Also cleans up both known forward ports during gateway teardown. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
…issions
Each agent now ships its own policy-permissive.yaml scoped to its specific endpoints, filesystem paths, and package registries:
- OpenClaw permissive: clawhub.ai, openclaw.ai, npm, .openclaw-data
- Hermes permissive: nousresearch.com, PyPI, .hermes-data
The resolution chain is: agent-specific permissive policy (if it exists), then global fallback. applyPermissivePolicy() in policies.ts now looks up the sandbox's agent to resolve the correct variant.
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
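The resolution chain can be sketched in Python as below. The `agents/<agent>/policy-permissive.yaml` location follows this commit; `GLOBAL_FALLBACK` is a hypothetical path, since the commit does not name the global file.

```python
from pathlib import Path
from typing import Optional

# Hypothetical location of the global permissive policy.
GLOBAL_FALLBACK = Path("policies") / "permissive.yaml"


def resolve_permissive_policy(repo_root: Path, agent: Optional[str]) -> Path:
    """Agent-specific policy-permissive.yaml if present, else the global fallback."""
    if agent:
        candidate = repo_root / "agents" / agent / "policy-permissive.yaml"
        if candidate.is_file():
            return candidate
    return repo_root / GLOBAL_FALLBACK
```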
…ermes to 90s
The hardcoded 15-attempt × 2s loop (30s) was too short for CI runners. Now reads timeout_seconds from the agent manifest and polls every 3s. Hermes timeout bumped from 30s to 90s to accommodate slower environments.
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
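A Python sketch of the new polling behaviour: probe every 3 seconds until the manifest's `timeout_seconds` elapses. `wait_for_gateway()` and its injectable `sleep`/`clock` parameters are assumptions added so the loop can be exercised without real waiting.

```python
import time
from typing import Callable


def wait_for_gateway(
    probe: Callable[[], bool],
    timeout_seconds: float,
    interval: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll probe() every `interval` seconds until it succeeds or time runs out."""
    deadline = clock() + timeout_seconds
    while True:
        if probe():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
```

For Hermes this would be called with `timeout_seconds=90`, with `probe` wrapping a request to the agent's health probe URL.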
The Hermes gateway takes longer to start than the onboard probe allows. The passing E2E runs show the gateway eventually starts after onboarding completes. Revert to warning behavior so onboarding can finish and the E2E test handles its own gateway verification. Also revert USER sandbox in Dockerfile — start.sh requires root for symlink validation, hardening, and gosu-based privilege separation. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
…licy
The global fallback permissive policy should not expose agent-specific endpoints. Hermes endpoints (nousresearch.com, .hermes-data) already live in agents/hermes/policy-permissive.yaml where they belong. Also remove unconditional forward stop for port 8642 in gateway cleanup; agent-specific ports are already handled by ensureSandboxPortForward, which resolves the agent dynamically.
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
…feat/hermes-agent-support
Remove --dangerously-skip-permissions from the help text and strip both --agent and --dangerously-skip-permissions from usage strings shown on bad flag errors. Also make the --agent error message generic instead of naming specific agents. Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
## Summary
- Installer `print_done()` hardcoded "Your OpenClaw Sandbox is live" and
`openclaw tui` regardless of which agent was onboarded. Now reads the
agent from the onboard session and adjusts the completion message
accordingly — non-OpenClaw agents get their own name and no `openclaw
tui` hint.
- Adds missing `agent_setup` step to `defaultSteps()` in
`onboard-session.ts`. Without it, `markStepStarted("agent_setup")`
silently no-ops because the step key doesn't exist in the session,
making the Hermes setup step invisible to resume logic and session
tracking.
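The `print_done()` change amounts to a small branch on the agent name. The installer itself is shell; this Python model, its display-name table, and the exact hint wording are assumptions for illustration.

```python
from typing import Optional

# Hypothetical display-name table; unknown agents fall back to their raw name.
DISPLAY_NAMES = {"openclaw": "OpenClaw", "hermes": "Hermes"}


def completion_lines(agent: Optional[str]) -> list[str]:
    """Completion banner for the onboarded agent; only OpenClaw gets the TUI hint."""
    name = agent or "openclaw"
    lines = [f"Your {DISPLAY_NAMES.get(name, name)} Sandbox is live"]
    if name == "openclaw":
        lines.append("Run: openclaw tui")
    return lines
```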
## Test plan
- [ ] `NEMOCLAW_INSTALL_REF=main NEMOCLAW_AGENT=hermes curl -fsSL
.../install.sh | bash` — verify completion message says "Your Hermes
Sandbox is live" and does not show `openclaw tui`
- [ ] Same flow with default OpenClaw agent — verify message still says
"Your OpenClaw Sandbox is live" with `openclaw tui`
- [ ] `npm test` passes (1277 tests, 0 failures)
Closes #1618 (partial — installer integration for multi-agent onboard)
## Summary by CodeRabbit
* **New Features**
* Onboarding completion messages now show the configured agent name
(defaults to OpenClaw).
* Onboarding session now tracks an "agent setup" step.
* Next-step instructions are tailored per agent and omitted when not
applicable.
* **Bug Fixes / Improvements**
* Installer skips pre-extraction for non-OpenClaw agents and updates
progress text to reflect preparing agent dependencies.
---------
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
## Summary
- Introduces an `agents/` directory structure for agent-specific artifacts (Dockerfiles, startup scripts, network policies, plugins)
- Adds agent selection step to the onboard wizard (`--agent hermes` or interactive prompt)
- Includes a working Hermes Agent integration as the first non-OpenClaw agent
## What this enables
NemoClaw can now orchestrate different AI agents inside OpenShell sandboxes. The onboard flow lets users choose which agent to run, and routes to the correct container image, network policy, and agent setup based on that choice.
## Test plan
- [ ] `nemoclaw onboard --agent hermes` completes end-to-end
- [ ] `nemoclaw onboard` (no flag) shows agent selection prompt, defaults to OpenClaw
- [ ] Hermes health endpoint responds inside sandbox
- [ ] OpenClaw path is unaffected (regression)
## Summary by CodeRabbit
## Release Notes
* **New Features**
* Added support for Hermes Agent (Nous Research) as an alternative sandbox option alongside OpenClaw.
* Introduced `--agent <name>` flag to `nemoclaw onboard` for selecting the desired agent during setup.
* Agent selection is now persisted in user sessions and sandbox configurations.
* **Updates**
* Onboarding flow expanded to 8 steps to accommodate agent selection.
* Gateway health monitoring and recovery now adapt based on the selected agent.
* Updated messaging in warnings and logs to be agent-agnostic.
* **Tests**
* Added comprehensive end-to-end tests for Hermes Agent integration.
---------
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
## Summary
- Installer `print_done()` hardcoded "Your OpenClaw Sandbox is live" and
`openclaw tui` regardless of which agent was onboarded. Now reads the
agent from the onboard session and adjusts the completion message
accordingly — non-OpenClaw agents get their own name and no `openclaw
tui` hint.
- Adds missing `agent_setup` step to `defaultSteps()` in
`onboard-session.ts`. Without it, `markStepStarted("agent_setup")`
silently no-ops because the step key doesn't exist in the session,
making the Hermes setup step invisible to resume logic and session
tracking.
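The `agent_setup` fix above depends on how step tracking treats unknown keys. Here is a minimal sketch. It assumes the session keeps steps in a keyed collection and that `markStepStarted` ignores names it cannot find. The step names other than `agent_setup` are hypothetical, as are the `StepState` shape and the boolean return value. The return value exists only to make the silent no-op observable.

```typescript
// Illustrative sketch only: step names other than "agent_setup" and the
// StepState shape are assumptions, not the contents of onboard-session.ts.

type StepStatus = "pending" | "in_progress" | "complete";

interface StepState {
  status: StepStatus;
  startedAt: string | null;
}

// All names except "agent_setup" are hypothetical placeholders.
const STEP_NAMES = ["preflight", "gateway", "sandbox", "agent_setup"];

function defaultSteps(): Map<string, StepState> {
  const steps = new Map<string, StepState>();
  for (const name of STEP_NAMES) {
    steps.set(name, { status: "pending", startedAt: null });
  }
  return steps;
}

// An unknown key is ignored rather than rejected, so a step missing from
// defaultSteps() is never recorded and never seen by resume logic.
function markStepStarted(
  steps: Map<string, StepState>,
  name: string,
  now: Date = new Date(),
): boolean {
  const step = steps.get(name);
  if (step === undefined) {
    return false; // silent no-op: the failure mode described above
  }
  step.status = "in_progress";
  step.startedAt = now.toISOString();
  return true;
}
```

You can reproduce the pre-fix behavior by deleting `agent_setup` from a fresh session. After that, `markStepStarted` returns `false` and the session gains no entry, so nothing downstream ever sees the step.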
## Test plan
- [ ] `curl -fsSL .../install.sh | NEMOCLAW_INSTALL_REF=main NEMOCLAW_AGENT=hermes bash`:
verify completion message says "Your Hermes Sandbox is live" and does
not show `openclaw tui` (the variables must prefix `bash`, not `curl`,
to reach the installer)
- [ ] Same flow with default OpenClaw agent — verify message still says
"Your OpenClaw Sandbox is live" with `openclaw tui`
- [ ] `npm test` passes (1277 tests, 0 failures)
Closes NVIDIA#1618 (partial — installer integration for multi-agent onboard)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
* Onboarding completion messages now show the configured agent name
(defaults to OpenClaw).
* Onboarding session now tracks an "agent setup" step.
* Next-step instructions are tailored per agent and omitted when not
applicable.
* **Bug Fixes / Improvements**
* Installer skips pre-extraction for non-OpenClaw agents and updates
progress text to reflect preparing agent dependencies.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
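This commit describes a per-agent completion message and a pre-extraction skip. Both can be sketched as plain functions. The TypeScript below illustrates the logic only and assumes the onboard session exposes an optional `agent` field. The actual `print_done()` is part of the shell installer. The names `OnboardSession`, `completionLines`, and `shouldPreExtract` are hypothetical. So are the display-name table and the `Next step:` wording.

```typescript
// Illustrative sketch only: the real print_done() is a shell function in
// the installer. The session shape, display names, and hint wording are
// assumptions.

interface OnboardSession {
  agent?: string | null;
}

const DISPLAY_NAMES = new Map<string, string>([
  ["openclaw", "OpenClaw"],
  ["hermes", "Hermes"],
]);

function agentDisplayName(agent: string): string {
  // Fall back to capitalizing the raw name for agents not in the table.
  return DISPLAY_NAMES.get(agent) ?? agent.charAt(0).toUpperCase() + agent.slice(1);
}

// Build the completion message from the agent recorded in the onboard
// session, defaulting to OpenClaw when none was recorded.
function completionLines(session: OnboardSession | null): string[] {
  const agent = session?.agent || "openclaw";
  const lines = [`Your ${agentDisplayName(agent)} Sandbox is live`];
  if (agent === "openclaw") {
    // The `openclaw tui` hint only applies to OpenClaw.
    lines.push("Next step: openclaw tui");
  }
  return lines;
}

// Pre-extraction of OpenClaw dependencies is skipped for other agents.
function shouldPreExtract(agent: string): boolean {
  return agent === "openclaw";
}
```

The agent name is read once from the session, and the message and hint are both derived from it. As a result, a missing or empty session falls back to the existing OpenClaw output unchanged.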
---------
Signed-off-by: Aaron Erickson <aerickson@nvidia.com>