Skip to content

Bootstrap sandbox OpenClaw config after launch and migrate#5

Closed
ericksoa wants to merge 7 commits into
mainfrom
vincentkoc-openclaw-json-fix
Closed

Bootstrap sandbox OpenClaw config after launch and migrate#5
ericksoa wants to merge 7 commits into
mainfrom
vincentkoc-openclaw-json-fix

Conversation

@ericksoa

Copy link
Copy Markdown
Contributor

Migrated from NVIDIA/openshell-openclaw-plugin#12 by @vincentkoc

Summary

  • run openclaw setup inside the sandbox after a fresh launch so NemoClaw creates openclaw.json, the default workspace, and session directories before first use
  • run the same sandbox bootstrap step after migrate copies host state so config-less installs still get a valid sandbox config
  • add a Docker E2E regression with a fake openshell harness that proves the bootstrap path invokes openclaw setup and materializes openclaw.json

Testing

  • cd nemoclaw && npm install
  • cd nemoclaw && npm run build
  • cd nemoclaw && npx eslint src/commands/launch.ts src/commands/migrate.ts src/commands/sandbox-bootstrap.ts
  • docker build -f test/Dockerfile.sandbox -t nemoclaw-e2e .
  • docker run --rm -v "$PWD/test/e2e-test.sh:/tmp/e2e-test.sh:ro" nemoclaw-e2e /tmp/e2e-test.sh

vincentkoc and others added 7 commits March 16, 2026 01:04
Signed-off-by: Vincent Koc <vincentkoc@ieee.org>
Signed-off-by: Vincent Koc <vincentkoc@ieee.org>
Signed-off-by: Vincent Koc <vincentkoc@ieee.org>
Signed-off-by: Vincent Koc <vincentkoc@ieee.org>
Signed-off-by: Vincent Koc <vincentkoc@ieee.org>
(cherry picked from commit 0662c8cc12ca42c6791a771be5d3e5f4fe14bbd9)
Signed-off-by: Vincent Koc <vincentkoc@ieee.org>
@wscurran

Copy link
Copy Markdown
Contributor

Thanks for enhancing the sandbox setup process by running openclaw setup after launch and migrate, ensuring that NemoClaw creates the necessary OpenClaw config and directories for a smooth user experience.

@wscurran wscurran added enhancement: feature integration: openclaw OpenClaw integration behavior and removed OpenClaw Plugin labels Mar 20, 2026
@cv

cv commented Mar 21, 2026

Copy link
Copy Markdown
Collaborator

Hey @ericksoa, wanted to touch base on this PR as well. I see you've been active with several contributions and we really appreciate that. The codebase has changed quite a bit since this was opened — we've landed CI checks and a number of new features that may affect the sandbox bootstrap flow. When you have a moment, a rebase onto main would help us review this with the current state of things in mind. No rush, just whenever you get a chance!

@ericksoa ericksoa closed this Mar 22, 2026
jessesanford pushed a commit to jessesanford/NemoClaw that referenced this pull request Mar 24, 2026
…migrate-hooks

Fix NemoClaw migrate state sync for hooks and local assets
jyaunches pushed a commit to jyaunches/NemoClaw that referenced this pull request Apr 14, 2026
- Guard runArgv/runArgvCapture against shell:true to prevent security
  bypass (finding #1) — throws if a caller attempts to re-enable shell
  interpretation. Added 2 tests.
- Document the intentional bash -c exception in getOllamaWarmupCommand
  explaining why it's safe (finding NVIDIA#2).
- Remove dead getOpenshellCommand() from policies.ts (finding NVIDIA#3).
- Remove unused shellQuote import from nim.ts (finding NVIDIA#4).
- Fix brittle indexOf assertion in onboard-readiness test (finding NVIDIA#5).
jyaunches added a commit to jyaunches/NemoClaw that referenced this pull request May 11, 2026
Lands shared fixtures, helpers, assertion helpers, install-method
splits, conventions + lint, and the parity-compare CI harness that
unblock the per-wave migration phases (2\u201312).

Deliverables (per specs/2026-05-11_e2e-test-migration/spec.md Phase 1):

Fixtures (test/e2e/lib/fixtures/):
- fake-openai.sh: local OpenAI-compatible endpoint (Risk NVIDIA#2 mitigation)
- fake-{telegram,discord,slack}.sh: messaging stubs via shared
  _fake-http-stub.sh harness
- older-base-image.sh: tagged ghcr base-image Dockerfile generator

Helpers (test/e2e/lib/):
- logging.sh: canonical e2e_{section,info,pass,fail} with stable
  PASS:/FAIL:/=== Phase markers (absorbs reuse category #1)
- sandbox-exec.sh: canonical nemoclaw-shell wrapper with safe quoting,
  exit-code propagation, and dry-run short-circuit (category NVIDIA#10)
- env.sh: auto-sources logging.sh so every consumer gets it for free

Assertion helpers (test/e2e/lib/assert/):
- inference-works.sh: chat-completion round-trip
- no-credentials-leaked.sh: credential-pattern scan
- policy-preset-applied.sh: gateway policy preset verification
- messaging-bridge-reachable.sh: L7 proxy / bridge reachability

Install dispatcher splits (test/e2e/lib/setup/):
- install-{repo,curl,ollama,launchable}.sh (four profiles)
- install.sh: dispatcher routes by profile/method name (category NVIDIA#5)

Runtime probe wiring:
- run-scenario.sh: adds --validate-only flag (probe-only, no setup)
- resolver/index.ts: E2E_PROBE_OVERRIDES_JSON escape hatch for keys
  with embedded underscores (e.g. security.policy_engine)

Convention lint + parity harness:
- scripts/e2e/lint-conventions.ts: enforces 6 conventions on suite
  step scripts + requires parity-map.yaml entries for legacy scripts
- scripts/e2e/compare-parity.sh: diffs legacy vs scenario PASS/FAIL
  via parity-map.yaml; flaky: true marker supported (Risk NVIDIA#4)
- test/e2e/parity-map.yaml: seeded with one entry per existing legacy
  script; migration phases 2\u201312 append assertion mappings
- .github/workflows/e2e-parity-compare.yaml: dispatches legacy script
  + migrated scenario on same runner and diffs outcomes

Tests (all passing, 41 total):
- test/e2e-lib-helpers.test.ts: +18 tests (1.A\u20131.E)
- test/e2e-convention-lint.test.ts: +11 tests (1.G\u20131.H)
- test/e2e-expected-state-validator.test.ts: +2 tests (1.F)

No regressions: full cli Vitest project (3258 tests) still green.
cv added a commit that referenced this pull request May 14, 2026
…3520)

> **Draft for visibility.** Issue-autopilot Stages 4-5 of #3456. Will
mark ready once batch self-review + CI complete.

## Summary

Closes the two remaining output threads in #3456 after the core
dead-loop fix already landed on `main` (via #3459, #3434, #3483). Full
sub-bug mapping in the [#3456 status
comment](#3456 (comment)).

- **Sub-bug #3** — `nemoclaw <name> destroy --yes` recovery hint
replaced with a registry-aware helper.
- **Sub-bug #4** — `Destroyed gateway 'nemoclaw' skipped`
self-contradictory wording replaced with `Gateway 'nemoclaw' already
removed or unreachable`.

## Acceptance criteria mapping

| Sub-bug | Resolution | Evidence |
|---|---|---|
| #1 dead loop | Already fixed on main (#3459) | out of scope |
| #2 firewall diagnostic | Already fixed on main (#3459) | out of scope
|
| **#3** literal `<name>` placeholder | **This PR** |
`src/lib/onboard/gpu-recovery.ts` + `onboard.ts:10387-10405` |
| **#4** misleading "skipped" wording | **This PR** |
`src/lib/actions/uninstall/run-plan.ts:210-228, 407-414` |
| #5 uninstall residuals | Already fixed on main (#3483) | out of scope
|

## Behavior matrix

`gpuPassthroughRecoveryLines(names)`:

| Input | Suggestion |
|---|---|
| `null` / `[]` | `nemoclaw uninstall && nemoclaw onboard --gpu` |
| one sandbox | `nemoclaw <name> destroy --yes --cleanup-gateway &&
nemoclaw onboard --gpu` |
| many sandboxes | each `destroy --yes`, only the last gets
`--cleanup-gateway` |

## Test plan

```
npm run typecheck:cli
npx vitest run src/lib/onboard/gpu-recovery.test.ts src/lib/actions/uninstall/run-plan.test.ts
```

22 tests pass (6 new + 16 existing).

## Notes for reviewers

- This is the work [#3464
attempted](#3464); that PR was
closed without merging after CodeRabbit asked for the `<name>`
placeholder to be forbidden in tests via negative assertion. This PR
adopts that refinement.
- `runOptional` extension is backwards-compatible — existing callers
without `onSkip` get the original wording.

Closes #3456 once merged.

---------

Signed-off-by: Charan Jagwani <charjags100@gmail.com>
Co-authored-by: Charan Jagwani <charjags100@gmail.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
tantodefi added a commit to tantodefi/NemoClaw that referenced this pull request May 14, 2026
Audit found ~20 issues across 8 docs; fixed in this commit:

chad-devflow.md:
  - Removed stale chad-spawn-poll cron table row (moved to host launchd
    watchdog 2026-05-14; was double-listed in cron table + watchdogs
    section)
  - Added chad-experiment-night cron row
  - Added chad-webui / chad-webui-mcp / chad-experiment rows to the
    helper-binaries section
  - Noted chad-shim.py v0.2+ per-operator routing + the new
    chad-shim-watchdog supervision

chad-autonomy.md:
  - Updated "five loops" → "six loops" (added experiment lifecycle as
    loop 6, fully cross-referenced to chad-experiments.md)
  - Frontmatter description updated

chad-skills.md:
  - Renamed "Chad-managed skills (3)" → "Chad-managed skills,
    repo-shipped (3)" for clarity
  - Added new section "Runtime-synced chad-managed skills (2)"
    cataloging openwebui (60-cmd control surface) and chad-experiment
    (lifecycle methodology)
  - Updated total skill count from "≈40 + 3" → "~45"

chad-readme.md § 7:
  - Updated "nine cron jobs" → "eleven openclaw cron + three host-side
    launchd watchdogs", restructured the table accordingly
  - Removed spawn-poll cron row (now a host watchdog)
  - Added experiment-night, chad-proposal-apply, chad-skill-watch,
    gbrain-prune rows that were missing
  - Added new "host-side launchd watchdogs" subsection (3 watchdogs)
  - Section-226 chad-spawn-poll reference now points at the watchdog
    with cost rationale

log-locations.md:
  - Added agent-inbox.jsonl to Cron + queue section
  - New section "Experiments (autonomous lifecycle, 2026-05-14+)" with
    config/ledger/active/archive paths
  - New section "Host-side watchdogs" indexing the three host log files
    + per-launchd stdout/stderr

workspace/workspace-files.md:
  - Added rows for identities/, bin/, skills/openwebui+chad-experiment,
    state/experiments/*, state/agent-inbox.jsonl to the runtime-state
    table

resources/backup-policy.md:
  - New subsection 2.2b "Sandbox Runtime State" covering 9 paths added
    2026-05-13/14 with priority + recreatability columns. Distinguishes
    CRITICAL (experiments/config, experiments/active — losing strands
    artifacts in OpenWebUI) from MEDIUM (bin/) from LOW (agent-inbox)

Discoveries flagged for follow-up (not patched in this commit):
- self-improve cron has 3 consecutive errors; should be unstuck by
  yesterday's idleTimeoutSeconds 180→600 fix but won't retry until Sun
- memory-curator cron has 2 consecutive errors, same root cause
- chad-skill-watch hit error today (9h ago); needs investigation
- experiment-night fired manually but errors on the deferred NVIDIA#5
  "delivery channel" issue — fires but reports as error

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
jyaunches added a commit that referenced this pull request May 28, 2026
…re pending steps

Addresses PR Review Advisor finding #2: 'Required security probes and
expected-failure checks still skip without failing live runs'.

Same architectural failure mode as findings #5 (E2E_DRY_RUN) and #6
(phase-1-skeleton): the typed framework declared a contract, but a
code path silently produced non-failing skips that contradicted this
PR's 'one mode, no fake green' invariant.

Three suites (security-shields, security-policy, security-injection)
were wired as kind: 'probe' steps; the orchestrator marked them
status: 'skipped' because the probe registry hasn't landed yet. The
expected-failure side-effect validator (runtime.expected-failure.no-
side-effects) was wired as kind: 'pending'. run.ts only sets
process.exitCode = 1 on status === 'failed', so a live run could
omit security shields, network policy, prompt-injection blocking,
AND the negative-scenario forbidden-side-effect contract while still
appearing non-failing.

Fix follows the framework spirit: declare requirement in the typed
metadata, enforce it once at the orchestrator boundary.

Typed contract extension (types.ts):
  AssertionStep.required?: boolean
  When true, a probe/pending step that resolves as 'skipped' is
  reclassified as 'failed' by the phase orchestrator. Defaults to
  false (existing behavior preserved for diagnostics, docs-validation,
  and other non-security probes).

Orchestrator (phase.ts):
  executeStep: probe and pending branches now check step.required
  and return status: 'failed' with a 'required <kind> not <state>'
  message when set. Non-required probes/pending steps continue to
  surface as visible skipped results so unrelated gaps stay
  diagnostic.

Registry (assertions/registry.ts):
  - probeStep / pendingStep helpers grew an options arg with
    required?: boolean (matches the secretEnv pattern: options come
    in via typed objects, not positional args).
  - security-shields, security-policy, security-injection probes
    annotated required: true. These are the suites the run is not
    safe without; failing closed beats fake green.
  - runtime.expected-failure.no-side-effects pending step annotated
    required: true. Negative scenarios cannot honestly pass while
    their core side-effect contract is unimplemented.
  - diagnostics, docs-validation probes intentionally remain
    non-required: they are informational, not safety-critical, and
    will switch to required when the probe registry lands and they
    have real implementations.

run.ts: no change needed. The new failed-status flows through the
existing exit-code path (process.exitCode = 1 when any phase status
is failed), so the workflow correctly reports red for missing
required probes/pending steps.

Tests (e2e-phase-orchestrators.test.ts, +5):
  - test_required_probe_step_that_is_unregistered_fails_the_phase
  - test_non_required_probe_step_continues_to_skip_visibly
  - test_required_pending_step_fails_closed
  - test_security_suite_groups_in_registry_mark_their_steps_as_required
  - test_expected_failure_no_side_effects_step_in_registry_is_required

318/318 e2e-scenario framework tests pass.

Until the probe registry follow-up PR registers actual
implementations for shieldsConfigProbe / networkPolicyProbe /
injectionBlockedProbe / expectedFailureNoSideEffectsProbe, scenarios
that include those suites/expected-failure groups will fail loudly
with 'required probe not registered' or 'required pending step not
implemented'. That is the correct intermediate state: visible red
until real, instead of green-by-default.

Signed-off-by: Julie Yaunches <jyaunches@nvidia.com>
@wscurran wscurran added area: cli Command line interface, flags, terminal UX, or output feature PR adds or expands user-visible functionality and removed priority: medium labels Jun 3, 2026
@cv cv mentioned this pull request Jun 5, 2026
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: cli Command line interface, flags, terminal UX, or output feature PR adds or expands user-visible functionality integration: openclaw OpenClaw integration behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants