fix(scripts): extract step-down function source verbatim instead of declare -f by laitingsheng · Pull Request #4775 · NVIDIA/NemoClaw

laitingsheng · 2026-06-04T17:02:08Z

Summary

NV QA reopened #4512 after verifying on nemoclaw v0.0.58 that the CAP_DAC_OVERRIDE-dropped startup path still aborts with /tmp/nemoclaw-step-down-*.sh: line 21: syntax error near unexpected token 'fi'.

PR #4528 fixed the bash -c "$(declare -f ...) ..." argv round-trip and the heredoc-as-last-statement shape that bash declare -f reproduces correctly. It did not cover the heredoc-as-if-condition shape: declare -f re-serialises the indented then-body command on the line right after the <<TAG opener, before the heredoc body, so the step-down shell absorbs that command into the heredoc body, leaves the then block empty, and aborts on the closing fi. seed_default_workspace_templates carries that exact shape, which is why the failure resurfaced.

This change bypasses declare -f. run_step_down_as_sandbox now extracts each dispatched function's literal source bytes from its defining file, so every heredoc placement survives intact.

Related Issue

Fixes #4512

Changes

Add _step_down_extract_function. It locates each function's source file and line number through shopt -s extdebug plus declare -F, then copies the definition byte-exact from disk. It handles both the multi-line opener/closer convention and the single-line name() { ... } shape. It tracks heredoc state so a future heredoc body containing a column-0 } cannot terminate extraction early.
Rewrite run_step_down_as_sandbox to assemble the temp script through the new extractor. The assembly runs in a subshell so a single extraction failure aborts cleanly with [step-down] failed to assemble dispatch script. A bash -n syntax check runs on the generated temp script before the step-down invocation as a fail-closed guard against any future opener/closer regression.
Inline _step_down_extract_function alongside run_step_down_as_sandbox at every existing test call site that pastes the helper into a synthetic script, so the extdebug-based extractor resolves through to the synthetic file on disk.
Add a regression test covering the heredoc-as-if-condition shape end-to-end. The stub mirrors seed_default_workspace_templates's broken layout exactly: if ! node - ... <<'NODE' ...; then return 0; fi with the heredoc body between the two. The assertion checks that the temp script runs through step-down with the sentinel written, the unexpected token 'fi' error string absent, and the bash -n guard not firing.

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai tinsonl@nvidia.com

…eclare -f Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

coderabbitai · 2026-06-04T17:02:25Z

📝 Walkthrough

Walkthrough

This PR fixes a bash syntax error (fi near line 20) that occurred when nemoclaw-start dispatched functions containing heredocs through the sandbox user. The root cause was declare -f reshaping heredoc-bearing functions incorrectly. The solution introduces a new _step_down_extract_function utility that extracts function definitions byte-exact from disk using extdebug metadata and awk, avoiding serialization. The dispatcher now uses this utility with bash -n syntax validation before execution.

Changes

Step-down function dispatch fix for heredoc handling

Layer / File(s)	Summary
Bash function extraction utility `scripts/nemoclaw-start.sh`	`_step_down_extract_function` resolves function source file/line via `extdebug` metadata and emits the definition byte-exact from disk using `awk`, with strict heredoc opener/closer validation.
Dispatcher assembly and validation `scripts/nemoclaw-start.sh`	`run_step_down_as_sandbox` now extracts each dispatched function's source via `_step_down_extract_function`, appends the invocation, and validates syntax with `bash -n` before execution; failures clean up temp files and return explicit errors.
Test harness updates and heredoc regression test `test/nemoclaw-start.test.ts`	Test helpers updated to extract both extraction utility and dispatcher; new regression test verifies dispatch succeeds for functions with heredoc-bearing commands in `if` conditions, confirming heredoc body executes, no syntax error near `fi`, and temp script cleanup.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

NVIDIA/NemoClaw#4528: Both PRs modify the step-down mechanism in scripts/nemoclaw-start.sh to prevent syntax/serialization failures when dispatching functions with heredocs under the sandbox user.

Suggested labels

fix, area: sandbox, bug-fix

Suggested reviewers

jyaunches
cv

Poem

🐰 A heredoc walked into a fi,
But declare -f made bash cry,
So we extract from the source file true,
Byte-exact and whole—the syntax shines through!
No more fi surprises, we're sandbox-safe now. 🎉

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main change: replacing declare -f with verbatim source extraction for step-down functions.
Linked Issues check	✅ Passed	The PR directly addresses issue `#4512` by implementing the _step_down_extract_function helper and rewriting run_step_down_as_sandbox to preserve here-doc syntax, fixing the shell syntax error near 'fi' on Ubuntu 24.04.
Out of Scope Changes check	✅ Passed	All changes are focused on fixing the step-down function dispatch mechanism for here-doc handling and adding corresponding tests; no unrelated modifications detected.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/4512-step-down-source-extract

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-06-04T17:05:23Z

E2E Advisor Recommendation

Required E2E: cloud-onboard-e2e, sandbox-survival-e2e, test-e2e-gateway-isolation
Optional E2E: openclaw-onboard-security-posture-e2e, test-non-root-sandbox-smoke

Dispatch hint: cloud-onboard-e2e,sandbox-survival-e2e

Auto-dispatched E2E: cloud-onboard-e2e, sandbox-survival-e2e via nightly-e2e.yaml at c743974d7cca691be24688b5e155ac60f92d4302 — nightly run

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

cloud-onboard-e2e (high): Validates install.sh plus real OpenClaw onboard/startup on a clean environment, exercising the entrypoint path where the changed step-down helper is used.
sandbox-survival-e2e (high): Covers gateway stop/start and sandbox restart recovery; the modified helper runs during startup and could break restart-time entrypoint setup even if initial onboard succeeds.
test-e2e-gateway-isolation (medium): Builds the production sandbox image and validates gateway/sandbox isolation and entrypoint hardening, which is directly adjacent to the changed privilege step-down path.

Optional E2E

openclaw-onboard-security-posture-e2e (high): Useful extra confidence for OpenClaw onboard with runtime guard/security-posture assertions after changing entrypoint privilege and trusted-file assembly behavior.
test-non-root-sandbox-smoke (low): Adjacent entrypoint smoke coverage for the non-root fallback path; the changed helper is root-mode focused, so this is not merge-blocking but can catch accidental entrypoint syntax or setup regressions.

New E2E recommendations

root-mode step-down heredoc regression (high): The PR adds strong unit coverage for here-doc extraction, but there is no clearly targeted E2E that runs the production image root entrypoint and asserts seed_default_workspace_templates_as_sandbox/setup_auth_profile_as_sandbox succeed with here-doc-bearing functions and no generated-script syntax failure.
- Suggested test: Add a targeted root-entrypoint step-down E2E, or extend test-e2e-gateway-isolation.sh, to start the production image through nemoclaw-start in root mode and assert the step-down helper executes workspace seeding/auth-profile setup without 'unexpected token fi' or bash -n failures.

Dispatch hint

Workflow: nightly-e2e.yaml
jobs input: cloud-onboard-e2e,sandbox-survival-e2e

github-actions · 2026-06-04T17:05:24Z

E2E Scenario Advisor Recommendation

Required scenario E2E: ubuntu-repo-cloud-openclaw
Optional scenario E2E: wsl-repo-cloud-openclaw, gpu-repo-local-ollama-openclaw

Dispatch required scenario E2E:

gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

ubuntu-repo-cloud-openclaw: scripts/nemoclaw-start.sh is a startup/install-helper path exercised by repo-current scenario onboarding. The Ubuntu OpenClaw cloud scenario is the smallest primary path that validates sandbox startup, gateway health, credential/auth-profile handling, and baseline onboarding after the step-down helper changes.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Optional scenario E2E

wsl-repo-cloud-openclaw: Optional adjacent platform coverage for the same repo-current OpenClaw startup path under WSL; useful because shell/startup behavior can differ, but it uses a special Windows/WSL runner so it is not the primary required path.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=wsl-repo-cloud-openclaw
gpu-repo-local-ollama-openclaw: Optional adjacent coverage for the same repo-current OpenClaw startup path with local Ollama/GPU runtime; useful if maintainers want to validate startup under the GPU/local-provider path, but it requires a special GPU runner.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw

Relevant changed files

scripts/nemoclaw-start.sh

github-actions · 2026-06-04T17:07:36Z

PR Review Advisor

Findings: 0 needs attention, 3 worth checking, 0 nice ideas
Top item: Make step-down source extraction fail closed on malformed functions

Review findings

🛠️ Needs attention

None.

🔎 Worth checking

Source-of-truth review needed: run_step_down_as_sandbox source extraction workaround: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: The workaround explains the invalid state and adds a focused regression, but the extractor's documented fail-closed cases are not fully implemented/tested; see scripts/nemoclaw-start.sh:2792.
Make step-down source extraction fail closed on malformed functions (scripts/nemoclaw-start.sh:2792): _step_down_extract_function is now part of a sandbox trusted-code boundary, but its implementation does not fully match the documented contract. The comment says extraction returns 1 when the matching closing `}` is not found, yet the awk END block only exits nonzero when EOF occurs while still inside a heredoc. A malformed or future-edited dispatched function that enters extraction and reaches EOF outside heredoc can be treated as successfully extracted until the later `bash -n` guard catches it, if it catches it. The helper also relies on trusted static function names without an explicit identifier allowlist and leaves the generated temp script world-readable, which remains safe only while future call sites never embed secret values in snippets or function bodies.
- Recommendation: Have awk track whether the column-0 closing brace was seen and exit 1 when EOF occurs without it; add an explicit shell-function-name allowlist/regex check before passing `fn` into awk; and add negative tests for missing closers, bad opener/function names, and the `bash -n` failure path. Keep the no-secret-in-temp-script contract documented or enforced for any future call site.
- Evidence: The new extractor at scripts/nemoclaw-start.sh:2737-2793 only has `END { if (in_fn && in_heredoc) exit 1 }`, while the function comment says missing matching `}` returns 1. `run_step_down_as_sandbox` writes the extracted source to `/tmp/nemoclaw-step-down-XXXXXX.sh` with mode 0644 before step-down execution.
Add targeted runtime validation for the CAP_DAC_OVERRIDE startup path (test/nemoclaw-start.test.ts:4507): The unit and synthetic-script tests cover the heredoc-in-if regression and simulate the direct-root CAP_DAC_OVERRIDE helper chain, but the linked issue specifically reproduces in a built sandbox image with `--cap-drop=CAP_DAC_OVERRIDE` invoking `/usr/local/bin/nemoclaw-start` on Ubuntu 24.04. That runtime/container boundary is not exercised by this diff, so the acceptance clause about the normal image entrypoint keeping the container running is only partially covered.
- Recommendation: Add or identify a targeted runtime/integration validation that runs the built sandbox image with CAP_DAC_OVERRIDE dropped through the normal `nemoclaw-start` entrypoint and verifies no `fi` syntax error, token/proxy preparation, read-only RC files, and continued startup. Do not rely solely on the synthetic unit composition for this infrastructure path.
- Evidence: The new regression test at test/nemoclaw-start.test.ts:4199-4238 asserts the heredoc-in-if case through a stubbed `STEP_DOWN_PREFIX_SANDBOX`. The direct-root composition test starts at test/nemoclaw-start.test.ts:4507, but it is still a synthetic script rather than a built image run with actual cap-drop and entrypoint behavior.

🌱 Nice ideas

None.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

github-actions · 2026-06-04T17:12:02Z

Selective E2E Results — ✅ All requested jobs passed

Run: 26967072744
Target ref: c743974d7cca691be24688b5e155ac60f92d4302
Workflow ref: main
Requested jobs: cloud-onboard-e2e,sandbox-survival-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job	Result
cloud-onboard-e2e	✅ success
sandbox-survival-e2e	✅ success

## Summary - Add the v0.0.59 release notes from the GitHub announcement discussion. - Refresh local inference and credential-storage guidance for the current release behavior. - Regenerate the user skills from the updated Fern docs. - Tighten release-prep and docs review guidance for generated skills, PR labels, and shared `$$nemoclaw` command placeholders. ## Verification - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` - `rg "permissive mode|shields down|shields up|shields status|config rotate-token|rotate-token" --glob '*.{md,mdx}'` - `git diff --check` - `npm run docs` (rerun outside sandbox after sandbox-only `tsx` IPC permission failure) - `npm run typecheck:cli` - Pre-commit hooks during commit passed, including markdownlint, docs-to-skills verification, gitleaks, commitlint, and skills YAML tests. ## Source Summary - #3679, #4437, #4681, #4766, #4772, #4775, #4786 -> `docs/about/release-notes.mdx`, `docs/reference/commands.mdx`, `docs/reference/troubleshooting.mdx`: Summarize OpenClaw 2026.5.27 compatibility, runtime path pinning, plugin registry recovery, live gateway reconciliation, and clearer host-alias/startup diagnostics. - #4332, #4402, #4769, #4776, #4779 -> `docs/about/release-notes.mdx`, `docs/inference/inference-options.mdx`, `docs/inference/use-local-inference.mdx`, `docs/inference/switch-inference-providers.mdx`: Document the release inference changes covering Local NIM waits, Hermes Anthropic routing, Nemotron 3 Ultra, the current Ollama starter fallback, and Spark managed-vLLM context length. - #4628, #4652, #4733, #4745 -> `docs/about/release-notes.mdx`, `docs/security/credential-storage.mdx`, `docs/manage-sandboxes/messaging-channels.mdx`, `docs/reference/troubleshooting.mdx`: Capture permission healing, gateway-stored credential reuse, cross-sandbox messaging credential conflict checks, and CDI preflight diagnostics. - #4728, #4737, #4743, #4744, #4782 -> `.agents/skills/nemoclaw-user-*`: Regenerate the user skill references from the updated source docs. - Follow-up maintenance -> `.agents/skills/nemoclaw-contributor-update-docs/SKILL.md`, `.coderabbit.yaml`: Add release-prep area labels for docs and skills PRs, and teach docs review guidance that `$$nemoclaw` is the correct shared command placeholder for examples that work across agent aliases. Note: the `documentation` label was not present in the repository, so this PR is labeled with `v0.0.59` only.  ## Summary by CodeRabbit * **Documentation** * Updated default model for local Ollama inference setup to qwen3.5:9b * Added Nemotron 3 Ultra 550B as an NVIDIA Endpoints model option * Clarified credential storage and reuse behavior for post-deployment (day-two) operations * Added v0.0.59 release notes covering OpenClaw compatibility, inference options, Hermes messaging sync, and troubleshooting * Clarified CLI selection guidance and updated OpenClaw version example in status output * Revised release-prep instructions and docs review guidance for CLI alias usage

fix(scripts): extract step-down function source verbatim instead of d…

c743974

…eclare -f Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

laitingsheng added platform: ubuntu Affects Ubuntu Linux environments bug-fix PR fixes a bug or regression labels Jun 4, 2026

cv approved these changes Jun 4, 2026

View reviewed changes

cv merged commit 5c82cd9 into main Jun 4, 2026
35 checks passed

cv deleted the fix/4512-step-down-source-extract branch June 4, 2026 17:15

cv added the v0.0.59 Release target label Jun 4, 2026

miyoungc mentioned this pull request Jun 5, 2026

docs: refresh 0.0.59 release notes #4790

Merged

coderabbitai Bot mentioned this pull request Jun 8, 2026

fix(sandbox): make workspace template seeding shell-serializable #3787

Open

12 tasks

Conversation

laitingsheng commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Related Issue

Changes

Type of Change

Verification

Uh oh!

coderabbitai Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested labels

Suggested reviewers

Poem

Uh oh!

github-actions Bot commented Jun 4, 2026

E2E Advisor Recommendation

E2E Recommendation Advisor

Required E2E

Optional E2E

New E2E recommendations

Dispatch hint

Uh oh!

github-actions Bot commented Jun 4, 2026

E2E Scenario Advisor Recommendation

E2E Scenario Advisor

Required scenario E2E

Optional scenario E2E

Relevant changed files

Uh oh!

github-actions Bot commented Jun 4, 2026

PR Review Advisor

🛠️ Needs attention

🔎 Worth checking

🌱 Nice ideas

Uh oh!

github-actions Bot commented Jun 4, 2026

Selective E2E Results — ✅ All requested jobs passed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

laitingsheng commented Jun 4, 2026 •

edited

Loading

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading