Validate sandbox sessionId to prevent command injection by ericksoa · Pull Request #4 · NVIDIA/NemoClaw

ericksoa · 2026-03-16T18:26:30Z

Migrated from NVIDIA/openshell-openclaw-plugin#17 by @vincentkoc

Summary

Validate sessionId on the /api/sandbox-chat path before it is used to build the sandbox execution command.

Changes

add normalizeSessionId() and require sessionId to match ^[A-Za-z0-9_-]{1,64}$
reject empty message values and invalid sessionId values with 400
use the normalized session id for logging and for the sandbox execution path

Testing

node --check .jensenclaw/server.js

Signed-off-by: Vincent Koc <vincentkoc@ieee.org>

ericksoa · 2026-03-16T20:26:34Z

Egg was removed, closing issue

Fix Brev CLI link and add Brev account link in README

- Guard runArgv/runArgvCapture against shell:true to prevent security bypass (finding #1) — throws if a caller attempts to re-enable shell interpretation. Added 2 tests. - Document the intentional bash -c exception in getOllamaWarmupCommand explaining why it's safe (finding NVIDIA#2). - Remove dead getOpenshellCommand() from policies.ts (finding NVIDIA#3). - Remove unused shellQuote import from nim.ts (finding NVIDIA#4). - Fix brittle indexOf assertion in onboard-readiness test (finding NVIDIA#5).

- Remove unused getForwardList() call from getActiveSandboxSessions — only pgrep/ps is needed for SSH session detection (warning #1) - Consolidate double-prompt in sandboxDestroy into single enriched confirmation prompt (warning #2) - Remove noisy cleanupGatewayAfterLastSandbox forward check that would always fire due to dashboard forward (warning #3) - Use word-boundary regex in parseSshProcesses to prevent false positives when sandbox names share prefixes (warning #4) - Export SessionClassification as named interface (suggestion #1) - Use cross-platform ps -axo instead of Linux-only pgrep -a for macOS compatibility (suggestion #2) - Add forwardCount to SessionClassification for future consumers - Add tests for word-boundary matching edge cases

) ## Summary Fixes #3138 — On some deployments (e.g. Brev launchables) the sandbox stays in `Running` phase which is functionally equivalent to `Ready`. The CLI previously treated this as "not ready", causing false-negative health reports, dashboard showing the agent as stopped, and recovery logic not triggering. ## Changes | File | Change | |------|--------| | `src/lib/state/gateway.ts` | `isSandboxReady()` now matches both `Ready` and `Running` columns | | `src/lib/actions/sandbox/gateway-state.ts` | `ensureLiveSandboxOrExit()` no longer exits on `Running` phase | | `test/onboard-readiness.test.ts` | +2 tests for Running phase detection (plain text + ANSI-wrapped) | ## Testing - `test/onboard-readiness.test.ts` — 36/36 passed - `test/gateway-state.test.ts` — 37/37 passed ## Context This is the CLI-side prerequisite for [brevdev/nemoclaw-image#10](brevdev/nemoclaw-image#10), which replaces the onboard-ui's reinvented health checks with NemoClaw CLI calls. Originated from #2258 (sub-item #4), which was closed with work redirected to the nemoclaw-image repo.  ## Summary by CodeRabbit ## Release Notes * **New Features** * Sandboxes in "Running" state are now recognized as operational, expanding support for different sandbox phase conditions. * **Tests** * Added test coverage for sandbox readiness detection with "Running" state and ANSI-encoded outputs.

Lands shared fixtures, helpers, assertion helpers, install-method splits, conventions + lint, and the parity-compare CI harness that unblock the per-wave migration phases (2\u201312). Deliverables (per specs/2026-05-11_e2e-test-migration/spec.md Phase 1): Fixtures (test/e2e/lib/fixtures/): - fake-openai.sh: local OpenAI-compatible endpoint (Risk NVIDIA#2 mitigation) - fake-{telegram,discord,slack}.sh: messaging stubs via shared _fake-http-stub.sh harness - older-base-image.sh: tagged ghcr base-image Dockerfile generator Helpers (test/e2e/lib/): - logging.sh: canonical e2e_{section,info,pass,fail} with stable PASS:/FAIL:/=== Phase markers (absorbs reuse category #1) - sandbox-exec.sh: canonical nemoclaw-shell wrapper with safe quoting, exit-code propagation, and dry-run short-circuit (category NVIDIA#10) - env.sh: auto-sources logging.sh so every consumer gets it for free Assertion helpers (test/e2e/lib/assert/): - inference-works.sh: chat-completion round-trip - no-credentials-leaked.sh: credential-pattern scan - policy-preset-applied.sh: gateway policy preset verification - messaging-bridge-reachable.sh: L7 proxy / bridge reachability Install dispatcher splits (test/e2e/lib/setup/): - install-{repo,curl,ollama,launchable}.sh (four profiles) - install.sh: dispatcher routes by profile/method name (category NVIDIA#5) Runtime probe wiring: - run-scenario.sh: adds --validate-only flag (probe-only, no setup) - resolver/index.ts: E2E_PROBE_OVERRIDES_JSON escape hatch for keys with embedded underscores (e.g. security.policy_engine) Convention lint + parity harness: - scripts/e2e/lint-conventions.ts: enforces 6 conventions on suite step scripts + requires parity-map.yaml entries for legacy scripts - scripts/e2e/compare-parity.sh: diffs legacy vs scenario PASS/FAIL via parity-map.yaml; flaky: true marker supported (Risk NVIDIA#4) - test/e2e/parity-map.yaml: seeded with one entry per existing legacy script; migration phases 2\u201312 append assertion mappings - .github/workflows/e2e-parity-compare.yaml: dispatches legacy script + migrated scenario on same runner and diffs outcomes Tests (all passing, 41 total): - test/e2e-lib-helpers.test.ts: +18 tests (1.A\u20131.E) - test/e2e-convention-lint.test.ts: +11 tests (1.G\u20131.H) - test/e2e-expected-state-validator.test.ts: +2 tests (1.F) No regressions: full cli Vitest project (3258 tests) still green.

Resolve the two output threads in #3456 left after the core dead-loop fix landed via #3459 + #3434: Sub-bug #3 — `src/lib/onboard.ts` printed `nemoclaw <name> destroy --yes && nemoclaw onboard --gpu` with a literal `<name>` placeholder, and assumed at least one sandbox was registered. When the GPU-passthrough mismatch hit on the State B re-run path with an empty registry (the dead-loop case), the hint was not actionable. Replace with a registry-aware helper at `src/lib/onboard/gpu-recovery.ts` that renders the right shape: - empty registry → suggest `nemoclaw uninstall && nemoclaw onboard --gpu` - one sandbox → suggest destroy --yes --cleanup-gateway for that name - multiple sandboxes → list each, only the last gets --cleanup-gateway Sub-bug #4 — `src/lib/actions/uninstall/run-plan.ts` printed `Destroyed gateway 'nemoclaw' skipped` when the openshell destroy no-op'd (gateway already gone) — the "Destroyed … skipped" wording was self-contradictory. Extend `runOptional` with an `onSkip` option; route the gateway destroy to emit `Gateway 'nemoclaw' already removed or unreachable` on no-op. Tests: - `src/lib/onboard/gpu-recovery.test.ts` (6 tests): forbid literal `<name>` placeholder anywhere in the output; cover empty / single / multi-sandbox cases; defensive filter on whitespace names so a `nemoclaw destroy` rendering can never happen. - `src/lib/actions/uninstall/run-plan.test.ts`: assert the new "already removed or unreachable" wording and the absence of the "Destroyed gateway 'nemoclaw' skipped" string. The core dead loop itself (sub-bugs #1, #2 and State B GPU mismatch) is already addressed by #3459 + #3434 + #3483; #3456 will close once this lands. See the #3456 status comment for the full mapping. Refs #3456. Mirrors (and tightens) the approach in the closed PR #3464, which left the literal `<name>` placeholder in tests per CodeRabbit feedback that was never addressed. Signed-off-by: Charan Jagwani <charjags100@gmail.com>

…3520) > **Draft for visibility.** Issue-autopilot Stages 4-5 of #3456. Will mark ready once batch self-review + CI complete. ## Summary Closes the two remaining output threads in #3456 after the core dead-loop fix already landed on `main` (via #3459, #3434, #3483). Full sub-bug mapping in the [#3456 status comment](#3456 (comment)). - **Sub-bug #3** — `nemoclaw <name> destroy --yes` recovery hint replaced with a registry-aware helper. - **Sub-bug #4** — `Destroyed gateway 'nemoclaw' skipped` self-contradictory wording replaced with `Gateway 'nemoclaw' already removed or unreachable`. ## Acceptance criteria mapping | Sub-bug | Resolution | Evidence | |---|---|---| | #1 dead loop | Already fixed on main (#3459) | out of scope | | #2 firewall diagnostic | Already fixed on main (#3459) | out of scope | | **#3** literal `<name>` placeholder | **This PR** | `src/lib/onboard/gpu-recovery.ts` + `onboard.ts:10387-10405` | | **#4** misleading "skipped" wording | **This PR** | `src/lib/actions/uninstall/run-plan.ts:210-228, 407-414` | | #5 uninstall residuals | Already fixed on main (#3483) | out of scope | ## Behavior matrix `gpuPassthroughRecoveryLines(names)`: | Input | Suggestion | |---|---| | `null` / `[]` | `nemoclaw uninstall && nemoclaw onboard --gpu` | | one sandbox | `nemoclaw <name> destroy --yes --cleanup-gateway && nemoclaw onboard --gpu` | | many sandboxes | each `destroy --yes`, only the last gets `--cleanup-gateway` | ## Test plan ``` npm run typecheck:cli npx vitest run src/lib/onboard/gpu-recovery.test.ts src/lib/actions/uninstall/run-plan.test.ts ``` 22 tests pass (6 new + 16 existing). ## Notes for reviewers - This is the work [#3464 attempted](#3464); that PR was closed without merging after CodeRabbit asked for the `<name>` placeholder to be forbidden in tests via negative assertion. This PR adopts that refinement. - `runOptional` extension is backwards-compatible — existing callers without `onSkip` get the original wording. Closes #3456 once merged. --------- Signed-off-by: Charan Jagwani <charjags100@gmail.com> Co-authored-by: Charan Jagwani <charjags100@gmail.com> Co-authored-by: Carlos Villela <cvillela@nvidia.com>

…advisor #4) Advisor finding: README and scenario model define 'setup scenario -> expected state -> suite sequence', but the live TS path skipped expected-state validation entirely. The legacy bash runner gated this behind E2E_VALIDATE_EXPECTED_STATE=1 and used env-var probe overrides only; the actual environment-shape contract ran inline as e2e_gateway_assert_healthy / e2e_sandbox_assert_running between onboarding and suite execution. The TS runner did neither. Promote the inline preconditions to a first-class typed phase, in the spirit of EnvironmentOrchestrator/OnboardingOrchestrator: real probes, real clients, framework-owned timeouts and redaction, single source of truth in the typed registry. - types.ts: PhaseName extends to 'state-validation'. New ExpectedState typed shape (cli/gateway/sandbox/inference/credentials with present|absent|optional). New StateProbeId union. ExpectedFailurePhase intentionally excludes state-validation so scenarios cannot declare expected failures against an internal phase. - scenarios/expected-states.ts (new): typed mirror of nemoclaw_scenarios/expected-states.yaml. Source of truth for the TS runner during transition; YAML stays in place for the legacy resolver until that path is fully retired. probesForState() maps the typed contract to the concrete probe ids the orchestrator emits. Inference and credentials remain declared but emit no probe actions yet (probe scripts not implemented); a registry test pins this gap so a future probe-script PR is forced to update the mapping. - nemoclaw_scenarios/probes/ (new): - dispatch.sh exports e2e_state_probe <id>; the typed runner spawns it via the shared dispatch-action.sh launcher. - cli-installed.sh: command -v nemoclaw + executable check. - gateway-healthy.sh: defers to validation_suites/assert/gateway-alive.sh so there's one implementation of the gateway-health contract. - sandbox-running.sh: defers to validation_suites/assert/sandbox-alive.sh. - gateway-absent.sh: nemoclaw gateway status + URL reachability. Typed replacement for the run-scenario.sh inline forbidden-effect check on the gateway axis. - sandbox-absent.sh: nemoclaw list + openshell sandbox list. Typed replacement for the inline 'openshell sandbox list | grep -Fq'. - compiler.ts: state-validation phase actions emitted from scenario.expectedStateId via probesForState(). Hard error on unknown expected_state id (typed runner is stricter than legacy resolver). Phase order is environment -> onboarding -> state-validation -> runtime. - orchestrators/state-validation.ts (new): tiny subclass of PhaseOrchestrator. No new control flow; probe actions reuse the existing shell-fn action machinery (timeouts, redaction, evidence logs). - orchestrators/runner.ts: phase-blocking semantics get one rule refinement. environment failure blocks onboarding, state-validation, and runtime. onboarding failure does NOT block state-validation - negative scenarios deliberately fail onboarding and rely on absent-state probes running afterward to verify forbidden side effects did not occur. state-validation failure blocks runtime so suites never run against a missing/wedged environment. - orchestrators/negative-matcher.ts: state-validation forbidden-effect probe ids (gateway-absent, sandbox-absent) excluded from observed-failure scanning. They are post-failure verification, not the failure mode itself; their pass/fail status is reported separately and feeds the phase chain through normal action-failure semantics. - 16 new tests in e2e-expected-state.test.ts cover: typed registry mirrors YAML structurally, probesForState mapping for ready/absent/ optional states (inference/credentials gap pinned), compiler emits the right probe actions per scenario, phase order, hard error on unknown state id, and the three runner short-circuit cases (env failure blocks state-validation, onboarding failure does not, state-validation failure blocks runtime). Existing tests updated for the new phase ordering (e2e-phase-orchestrators, e2e-negative-matcher). Spec ownership boundaries kept honest: - Typed contract over env-var probes: replaces E2E_PROBE_OVERRIDE_* with structurally-typed expected-state declarations. - One mode, no opt-out: state-validation always runs after onboarding for any scenario with an expectedStateId. - Framework infra, not shell: orchestrator + clients + redaction reused; only the probe scripts are bash, same shape as install/ onboard dispatchers. - Single source of truth: scenario.expectedStateId -> typed registry -> emitted probe actions; one declaration drives everything. - Failures stay visibly real: probes that aren't implemented yet (inference, credentials) stay declared but inert with a registry test pinning the gap. Their first execution lands when the probe ships, not as silent green. Out of scope (deferred follow-ups): - inference-available and credentials-present probe scripts. - Per-suite requires_state gating at the assertion level. - Retiring expected-states.yaml and runtime/resolver/validator.ts; the typed registry is now the SoT for the TS runner, but both artifacts remain alongside until the legacy resolver is unused. - Replacing the runtime.expected-failure.no-side-effects required pending step. State-validation absent probes now do the real work, but the placeholder stays put until a follow-up confirms the switchover and removes it. 418 framework tests pass (16 new). tsc clean. Plan output verified on positive (ubuntu-repo-cloud-openclaw) and negative (ubuntu-no-docker-preflight-negative) scenarios. Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>

…h (PR #4380) Closes the architectural loose end from advisor #4: the typed state-validation phase landed in ff4b0a8 with a parity test pinning scenarios/expected-states.ts to nemoclaw_scenarios/expected-states.yaml. Two sources of truth tied by a brittle parity test is exactly the 'localized fix without source removal' the advisor warns about. Now that nothing live consumes the YAML (e2e-scenarios.yaml workflow goes through scenarios/run.ts; the bash run-scenario.sh is a stub), retire the whole legacy resolver path. Deleted (2,236 lines of source + test): - nemoclaw_scenarios/expected-states.yaml - runtime/resolver/ (load.ts, plan.ts, schema.ts, coverage.ts, index.ts, validator.ts, expected-failure.ts, js-yaml.d.ts) - runtime/coverage-report.sh - 6 framework tests that exercised only the resolver path: e2e-scenario-schema, e2e-scenario-resolver, e2e-coverage-report, e2e-scenario-additional-families, e2e-metadata-final-hygiene, e2e-expected-failure (its typed equivalent now lives in e2e-negative-matcher.test.ts). Trimmed: - e2e-migration-inventory-lock.test.ts: drop the expected-states.yaml-vs-typed-registry test; expected-state migration is complete and parity is now intra-typed. - e2e-expected-state.test.ts: drop the YAML-mirror parity tests; replace with id-coverage assertions against listExpectedStates(). - types.ts / expected-states.ts / compiler.ts comments: state plainly that the typed registry is the single source of truth. Updated: - docs/README.md: typed scenarios/ tree is now the documented source of truth; scenarios.yaml + suites.yaml called out as historical references during cleanup. Out of scope (deferred): - nemoclaw_scenarios/scenarios.yaml and validation_suites/suites.yaml remain on disk. With the resolver gone, both are now non-runtime reference docs. Their final removal needs a follow-up because e2e-assertion-modules.test.ts and e2e-migration-inventory-lock.test.ts still cross-check against suites.yaml, and e2e-scenario-advisor.test.ts references the path string. Removing them is mechanical but its own blast radius. 348 framework tests pass after deletion (was 418 \u2014 70 tests gone with the 5 deleted resolver test files; 16 expected-state tests trimmed to 3 since the YAML is gone). tsc clean. Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>

@cv

…m anchor Addresses @cv's review ask #4 on PR #4175 — runtime validation through the Hermes gateway first-message path — by adding an integration-style test that drives the actual `_pre_llm_call` → `AIAgent._strip_think_blocks` chain Hermes uses, rather than only exercising the normalizer in isolation. The new test: - Stubs `run_agent.AIAgent._strip_think_blocks` as a real method. - Calls `_pre_llm_call(platform="telegram", is_first_turn=True)` to simulate the first Telegram turn arriving via the gateway hook — this both sets the platform anchor and installs the patch via the same code path Hermes uses at runtime. - Calls `AIAgent._strip_think_blocks(...)` twice to assert (a) same-platform body is extracted, (b) cross-platform `to slack: ...` is preserved. - Calls `_pre_llm_call(platform="discord", is_first_turn=True)` to simulate a turn on a different platform and asserts the anchor refreshes per-turn (Discord body extracted, telegram target preserved). Combined with the existing isolated-unit cases and the cited `hermes-e2e` / `hermes-discord-e2e` / `hermes-slack-e2e` scenarios, this gives both "added" and "cited" runtime validation for the gateway first-message path. Signed-off-by: Charan Jagwani <cjagwani@nvidia.com> Co-authored-by: Chengjie Wang <chengjiew@nvidia.com>

Validate sandbox sessionId to prevent command injection

8e3af16

Signed-off-by: Vincent Koc <vincentkoc@ieee.org>

ericksoa closed this Mar 16, 2026

jessesanford pushed a commit to jessesanford/NemoClaw that referenced this pull request Mar 24, 2026

Merge pull request NVIDIA#4 from NVIDIA/fix-brev-cli-link

db409c7

Fix Brev CLI link and add Brev account link in README

zNeill mentioned this pull request Mar 26, 2026

[Linux] Brev deploy: vLLM exits - wrong HF model id, no HF_TOKEN, missing --trust-remote-code #996

Closed

ericksoa mentioned this pull request Apr 26, 2026

chore: upgrade OpenClaw from 2026.4.9 to 2026.4.24 #2484

Merged

6 tasks

cjagwani mentioned this pull request May 14, 2026

[All Platforms][Onboard] ./install.sh and nemoclaw onboard enter dead loop — sandbox can't reach gateway, "firewall" hint is misleading #3456

Closed

cjagwani mentioned this pull request May 14, 2026

fix(onboard,uninstall): replace misleading recovery messages (#3456) #3520

Merged

cjagwani mentioned this pull request May 14, 2026

[Linux][Uninstall] nemoclaw uninstall leaves ~/.local/state/nemoclaw/ behind (residual of #3456 #5c) #3535

Closed

3 tasks

hulynn mentioned this pull request May 28, 2026

[All Platforms][Docs] quickstart.mdx wizard description still out of sync with v0.0.53 (follow-up of NVB 6187477) #4417

Closed

cv mentioned this pull request Jun 5, 2026

feat(hermes): expose built-in dashboard #4811

Merged

12 tasks

wscurran added the feature PR adds or expands user-visible functionality label Jun 9, 2026

jyaunches mentioned this pull request Jun 10, 2026

test(onboard): add helper-level Vitest coverage for docker-unreachable gateway-start abort (#4355) #5109

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Validate sandbox sessionId to prevent command injection#4

Validate sandbox sessionId to prevent command injection#4
ericksoa wants to merge 1 commit into
mainfrom
vincentkoc-validate-sessionid

ericksoa commented Mar 16, 2026

Uh oh!

ericksoa commented Mar 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

ericksoa commented Mar 16, 2026

Summary

Changes

Testing

Uh oh!

ericksoa commented Mar 16, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants