Skip to content

fix(scripts): extract step-down function source verbatim instead of declare -f#4775

Merged
cv merged 1 commit into
mainfrom
fix/4512-step-down-source-extract
Jun 4, 2026
Merged

fix(scripts): extract step-down function source verbatim instead of declare -f#4775
cv merged 1 commit into
mainfrom
fix/4512-step-down-source-extract

Conversation

@laitingsheng

@laitingsheng laitingsheng commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Summary

NV QA reopened #4512 after verifying on nemoclaw v0.0.58 that the CAP_DAC_OVERRIDE-dropped startup path still aborts with /tmp/nemoclaw-step-down-*.sh: line 21: syntax error near unexpected token 'fi'.

PR #4528 fixed the bash -c "$(declare -f ...) ..." argv round-trip and the heredoc-as-last-statement shape that bash declare -f reproduces correctly. It did not cover the heredoc-as-if-condition shape: declare -f re-serialises the indented then-body command on the line right after the <<TAG opener, before the heredoc body, so the step-down shell absorbs that command into the heredoc body, leaves the then block empty, and aborts on the closing fi. seed_default_workspace_templates carries that exact shape, which is why the failure resurfaced.

This change bypasses declare -f. run_step_down_as_sandbox now extracts each dispatched function's literal source bytes from its defining file, so every heredoc placement survives intact.

Related Issue

Fixes #4512

Changes

  • Add _step_down_extract_function. It locates each function's source file and line number through shopt -s extdebug plus declare -F, then copies the definition byte-exact from disk. It handles both the multi-line opener/closer convention and the single-line name() { ... } shape. It tracks heredoc state so a future heredoc body containing a column-0 } cannot terminate extraction early.
  • Rewrite run_step_down_as_sandbox to assemble the temp script through the new extractor. The assembly runs in a subshell so a single extraction failure aborts cleanly with [step-down] failed to assemble dispatch script. A bash -n syntax check runs on the generated temp script before the step-down invocation as a fail-closed guard against any future opener/closer regression.
  • Inline _step_down_extract_function alongside run_step_down_as_sandbox at every existing test call site that pastes the helper into a synthetic script, so the extdebug-based extractor resolves through to the synthetic file on disk.
  • Add a regression test covering the heredoc-as-if-condition shape end-to-end. The stub mirrors seed_default_workspace_templates's broken layout exactly: if ! node - ... <<'NODE' ...; then return 0; fi with the heredoc body between the two. The assertion checks that the temp script runs through step-down with the sentinel written, the unexpected token 'fi' error string absent, and the bash -n guard not firing.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • npx prek run --all-files passes
  • npm test passes
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • npm run docs builds without warnings (doc changes only)
  • Doc pages follow the style guide (doc changes only)
  • New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: Tinson Lai tinsonl@nvidia.com

…eclare -f

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
@coderabbitai

coderabbitai Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

This PR fixes a bash syntax error (fi near line 20) that occurred when nemoclaw-start dispatched functions containing heredocs through the sandbox user. The root cause was declare -f reshaping heredoc-bearing functions incorrectly. The solution introduces a new _step_down_extract_function utility that extracts function definitions byte-exact from disk using extdebug metadata and awk, avoiding serialization. The dispatcher now uses this utility with bash -n syntax validation before execution.

Changes

Step-down function dispatch fix for heredoc handling

Layer / File(s) Summary
Bash function extraction utility
scripts/nemoclaw-start.sh
_step_down_extract_function resolves function source file/line via extdebug metadata and emits the definition byte-exact from disk using awk, with strict heredoc opener/closer validation.
Dispatcher assembly and validation
scripts/nemoclaw-start.sh
run_step_down_as_sandbox now extracts each dispatched function's source via _step_down_extract_function, appends the invocation, and validates syntax with bash -n before execution; failures clean up temp files and return explicit errors.
Test harness updates and heredoc regression test
test/nemoclaw-start.test.ts
Test helpers updated to extract both extraction utility and dispatcher; new regression test verifies dispatch succeeds for functions with heredoc-bearing commands in if conditions, confirming heredoc body executes, no syntax error near fi, and temp script cleanup.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • NVIDIA/NemoClaw#4528: Both PRs modify the step-down mechanism in scripts/nemoclaw-start.sh to prevent syntax/serialization failures when dispatching functions with heredocs under the sandbox user.

Suggested labels

fix, area: sandbox, bug-fix

Suggested reviewers

  • jyaunches
  • cv

Poem

🐰 A heredoc walked into a fi,
But declare -f made bash cry,
So we extract from the source file true,
Byte-exact and whole—the syntax shines through!
No more fi surprises, we're sandbox-safe now. 🎉

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main change: replacing declare -f with verbatim source extraction for step-down functions.
Linked Issues check ✅ Passed The PR directly addresses issue #4512 by implementing the _step_down_extract_function helper and rewriting run_step_down_as_sandbox to preserve here-doc syntax, fixing the shell syntax error near 'fi' on Ubuntu 24.04.
Out of Scope Changes check ✅ Passed All changes are focused on fixing the step-down function dispatch mechanism for here-doc handling and adding corresponding tests; no unrelated modifications detected.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/4512-step-down-source-extract

Comment @coderabbitai help to get the list of available commands and usage tips.

@laitingsheng laitingsheng added platform: ubuntu Affects Ubuntu Linux environments bug-fix PR fixes a bug or regression labels Jun 4, 2026
@github-actions

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: cloud-onboard-e2e, sandbox-survival-e2e, test-e2e-gateway-isolation
Optional E2E: openclaw-onboard-security-posture-e2e, test-non-root-sandbox-smoke

Dispatch hint: cloud-onboard-e2e,sandbox-survival-e2e

Auto-dispatched E2E: cloud-onboard-e2e, sandbox-survival-e2e via nightly-e2e.yaml at c743974d7cca691be24688b5e155ac60f92d4302nightly run

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • cloud-onboard-e2e (high): Validates install.sh plus real OpenClaw onboard/startup on a clean environment, exercising the entrypoint path where the changed step-down helper is used.
  • sandbox-survival-e2e (high): Covers gateway stop/start and sandbox restart recovery; the modified helper runs during startup and could break restart-time entrypoint setup even if initial onboard succeeds.
  • test-e2e-gateway-isolation (medium): Builds the production sandbox image and validates gateway/sandbox isolation and entrypoint hardening, which is directly adjacent to the changed privilege step-down path.

Optional E2E

  • openclaw-onboard-security-posture-e2e (high): Useful extra confidence for OpenClaw onboard with runtime guard/security-posture assertions after changing entrypoint privilege and trusted-file assembly behavior.
  • test-non-root-sandbox-smoke (low): Adjacent entrypoint smoke coverage for the non-root fallback path; the changed helper is root-mode focused, so this is not merge-blocking but can catch accidental entrypoint syntax or setup regressions.

New E2E recommendations

  • root-mode step-down heredoc regression (high): The PR adds strong unit coverage for here-doc extraction, but there is no clearly targeted E2E that runs the production image root entrypoint and asserts seed_default_workspace_templates_as_sandbox/setup_auth_profile_as_sandbox succeed with here-doc-bearing functions and no generated-script syntax failure.
    • Suggested test: Add a targeted root-entrypoint step-down E2E, or extend test-e2e-gateway-isolation.sh, to start the production image through nemoclaw-start in root mode and assert the step-down helper executes workspace seeding/auth-profile setup without 'unexpected token fi' or bash -n failures.

Dispatch hint

  • Workflow: nightly-e2e.yaml
  • jobs input: cloud-onboard-e2e,sandbox-survival-e2e

@github-actions

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: ubuntu-repo-cloud-openclaw
Optional scenario E2E: wsl-repo-cloud-openclaw, gpu-repo-local-ollama-openclaw

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • ubuntu-repo-cloud-openclaw: scripts/nemoclaw-start.sh is a startup/install-helper path exercised by repo-current scenario onboarding. The Ubuntu OpenClaw cloud scenario is the smallest primary path that validates sandbox startup, gateway health, credential/auth-profile handling, and baseline onboarding after the step-down helper changes.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Optional scenario E2E

  • wsl-repo-cloud-openclaw: Optional adjacent platform coverage for the same repo-current OpenClaw startup path under WSL; useful because shell/startup behavior can differ, but it uses a special Windows/WSL runner so it is not the primary required path.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=wsl-repo-cloud-openclaw
  • gpu-repo-local-ollama-openclaw: Optional adjacent coverage for the same repo-current OpenClaw startup path with local Ollama/GPU runtime; useful if maintainers want to validate startup under the GPU/local-provider path, but it requires a special GPU runner.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=gpu-repo-local-ollama-openclaw

Relevant changed files

  • scripts/nemoclaw-start.sh

@github-actions

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 3 worth checking, 0 nice ideas
Top item: Make step-down source extraction fail closed on malformed functions

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • Source-of-truth review needed: run_step_down_as_sandbox source extraction workaround: The advisor marked localized patch analysis as needs_followup.
    • Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
    • Evidence: The workaround explains the invalid state and adds a focused regression, but the extractor's documented fail-closed cases are not fully implemented/tested; see scripts/nemoclaw-start.sh:2792.
  • Make step-down source extraction fail closed on malformed functions (scripts/nemoclaw-start.sh:2792): _step_down_extract_function is now part of a sandbox trusted-code boundary, but its implementation does not fully match the documented contract. The comment says extraction returns 1 when the matching closing `}` is not found, yet the awk END block only exits nonzero when EOF occurs while still inside a heredoc. A malformed or future-edited dispatched function that enters extraction and reaches EOF outside heredoc can be treated as successfully extracted until the later `bash -n` guard catches it, if it catches it. The helper also relies on trusted static function names without an explicit identifier allowlist and leaves the generated temp script world-readable, which remains safe only while future call sites never embed secret values in snippets or function bodies.
    • Recommendation: Have awk track whether the column-0 closing brace was seen and exit 1 when EOF occurs without it; add an explicit shell-function-name allowlist/regex check before passing `fn` into awk; and add negative tests for missing closers, bad opener/function names, and the `bash -n` failure path. Keep the no-secret-in-temp-script contract documented or enforced for any future call site.
    • Evidence: The new extractor at scripts/nemoclaw-start.sh:2737-2793 only has `END { if (in_fn && in_heredoc) exit 1 }`, while the function comment says missing matching `}` returns 1. `run_step_down_as_sandbox` writes the extracted source to `/tmp/nemoclaw-step-down-XXXXXX.sh` with mode 0644 before step-down execution.
  • Add targeted runtime validation for the CAP_DAC_OVERRIDE startup path (test/nemoclaw-start.test.ts:4507): The unit and synthetic-script tests cover the heredoc-in-if regression and simulate the direct-root CAP_DAC_OVERRIDE helper chain, but the linked issue specifically reproduces in a built sandbox image with `--cap-drop=CAP_DAC_OVERRIDE` invoking `/usr/local/bin/nemoclaw-start` on Ubuntu 24.04. That runtime/container boundary is not exercised by this diff, so the acceptance clause about the normal image entrypoint keeping the container running is only partially covered.
    • Recommendation: Add or identify a targeted runtime/integration validation that runs the built sandbox image with CAP_DAC_OVERRIDE dropped through the normal `nemoclaw-start` entrypoint and verifies no `fi` syntax error, token/proxy preparation, read-only RC files, and continued startup. Do not rely solely on the synthetic unit composition for this infrastructure path.
    • Evidence: The new regression test at test/nemoclaw-start.test.ts:4199-4238 asserts the heredoc-in-if case through a stubbed `STEP_DOWN_PREFIX_SANDBOX`. The direct-root composition test starts at test/nemoclaw-start.test.ts:4507, but it is still a synthetic script rather than a built image run with actual cap-drop and entrypoint behavior.

🌱 Nice ideas

  • None.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@github-actions

github-actions Bot commented Jun 4, 2026

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 26967072744
Target ref: c743974d7cca691be24688b5e155ac60f92d4302
Workflow ref: main
Requested jobs: cloud-onboard-e2e,sandbox-survival-e2e
Summary: 2 passed, 0 failed, 0 skipped

Job Result
cloud-onboard-e2e ✅ success
sandbox-survival-e2e ✅ success

@cv cv merged commit 5c82cd9 into main Jun 4, 2026
35 checks passed
@cv cv deleted the fix/4512-step-down-source-extract branch June 4, 2026 17:15
@cv cv added the v0.0.59 Release target label Jun 4, 2026
cv pushed a commit that referenced this pull request Jun 5, 2026
## Summary
- Add the v0.0.59 release notes from the GitHub announcement discussion.
- Refresh local inference and credential-storage guidance for the
current release behavior.
- Regenerate the user skills from the updated Fern docs.
- Tighten release-prep and docs review guidance for generated skills, PR
labels, and shared `$$nemoclaw` command placeholders.

## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `rg "permissive mode|shields down|shields up|shields status|config
rotate-token|rotate-token" --glob '*.{md,mdx}'`
- `git diff --check`
- `npm run docs` (rerun outside sandbox after sandbox-only `tsx` IPC
permission failure)
- `npm run typecheck:cli`
- Pre-commit hooks during commit passed, including markdownlint,
docs-to-skills verification, gitleaks, commitlint, and skills YAML
tests.

## Source Summary
- #3679, #4437, #4681, #4766, #4772, #4775, #4786 ->
`docs/about/release-notes.mdx`, `docs/reference/commands.mdx`,
`docs/reference/troubleshooting.mdx`: Summarize OpenClaw 2026.5.27
compatibility, runtime path pinning, plugin registry recovery, live
gateway reconciliation, and clearer host-alias/startup diagnostics.
- #4332, #4402, #4769, #4776, #4779 -> `docs/about/release-notes.mdx`,
`docs/inference/inference-options.mdx`,
`docs/inference/use-local-inference.mdx`,
`docs/inference/switch-inference-providers.mdx`: Document the release
inference changes covering Local NIM waits, Hermes Anthropic routing,
Nemotron 3 Ultra, the current Ollama starter fallback, and Spark
managed-vLLM context length.
- #4628, #4652, #4733, #4745 -> `docs/about/release-notes.mdx`,
`docs/security/credential-storage.mdx`,
`docs/manage-sandboxes/messaging-channels.mdx`,
`docs/reference/troubleshooting.mdx`: Capture permission healing,
gateway-stored credential reuse, cross-sandbox messaging credential
conflict checks, and CDI preflight diagnostics.
- #4728, #4737, #4743, #4744, #4782 -> `.agents/skills/nemoclaw-user-*`:
Regenerate the user skill references from the updated source docs.
- Follow-up maintenance ->
`.agents/skills/nemoclaw-contributor-update-docs/SKILL.md`,
`.coderabbit.yaml`: Add release-prep area labels for docs and skills
PRs, and teach docs review guidance that `$$nemoclaw` is the correct
shared command placeholder for examples that work across agent aliases.

Note: the `documentation` label was not present in the repository, so
this PR is labeled with `v0.0.59` only.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Documentation**
  * Updated default model for local Ollama inference setup to qwen3.5:9b
  * Added Nemotron 3 Ultra 550B as an NVIDIA Endpoints model option
* Clarified credential storage and reuse behavior for post-deployment
(day-two) operations
* Added v0.0.59 release notes covering OpenClaw compatibility, inference
options, Hermes messaging sync, and troubleshooting
* Clarified CLI selection guidance and updated OpenClaw version example
in status output
* Revised release-prep instructions and docs review guidance for CLI
alias usage
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug-fix PR fixes a bug or regression platform: ubuntu Affects Ubuntu Linux environments v0.0.59 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Ubuntu 24.04][Security] CAP_DAC_OVERRIDE-dropped startup path fails with nemoclaw-start syntax error

3 participants