fix(runtime-context): try sandbox network/file actions before reporting them unavailable by zyang-dev · Pull Request #4875 · NVIDIA/NemoClaw

zyang-dev · 2026-06-05T23:03:49Z

Summary

Related Issue

Fixes #4850

Changes

- Network guidance now states that allowed endpoints work and instructs the agent to attempt a request rather than assume a host is unreachable. It also clarifies that attempting a restricted endpoint is what raises the operator approval request in OpenShell.
- Removed the literal 403 as the required proof of "blocked". The agent now reports the actual error and distinguishes a proxy/policy denial (operator-approvable) from DNS, timeout, or TLS failures.
- Filesystem guidance keeps the accurate "no host-level access" caveat but adds that the agent can create, edit, and run files inside the sandbox (e.g. `/tmp` or `/sandbox`) and should try the operation and report if it fails.
- Generalized the `Behavior:` directives to "verify before asserting" in both directions: don't claim something is blocked/unavailable without attempting it this turn, and don't claim unrestricted access either.
- Updated `runtime-context.test.ts` assertions to match the new strings and added guards for the grounding directive.
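
The distinction described above, between a proxy/policy denial and a DNS, timeout, or TLS failure, can be sketched as a small classifier. This is a minimal illustration and not the code in this PR: the `Attempt` shape, the `deniedByProxy` flag, and the function names are hypothetical, and the error-code sets are a non-exhaustive sample of codes that Node.js commonly surfaces.

```typescript
// Hypothetical record of what a network tool observed. Not a NemoClaw type.
type Attempt =
  | { kind: "response"; status: number; deniedByProxy: boolean }
  | { kind: "error"; code: string };

type Verdict = "ok" | "policy-denial" | "dns" | "timeout" | "tls" | "other";

// Illustrative, non-exhaustive samples of Node.js-style error codes.
const DNS_CODES = new Set(["ENOTFOUND", "EAI_AGAIN"]);
const TIMEOUT_CODES = new Set(["ETIMEDOUT", "UND_ERR_CONNECT_TIMEOUT"]);
const TLS_CODES = new Set([
  "CERT_HAS_EXPIRED",
  "DEPTH_ZERO_SELF_SIGNED_CERT",
  "ERR_TLS_CERT_ALTNAME_INVALID",
]);

function classifyAttempt(attempt: Attempt): Verdict {
  if (attempt.kind === "response") {
    // Assumption: the caller already knows whether the proxy produced the
    // denial. No specific status code (such as 403) is required.
    if (attempt.deniedByProxy) return "policy-denial";
    return attempt.status >= 200 && attempt.status < 400 ? "ok" : "other";
  }
  if (DNS_CODES.has(attempt.code)) return "dns";
  if (TIMEOUT_CODES.has(attempt.code)) return "timeout";
  if (TLS_CODES.has(attempt.code)) return "tls";
  return "other";
}

// Only a policy denial is something an operator can approve.
function isOperatorApprovable(verdict: Verdict): boolean {
  return verdict === "policy-denial";
}
```

Keeping the verdict separate from the status code mirrors the intent of dropping the literal 403: the report is driven by what actually failed, not by one expected number.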

Type of Change

Code change (feature, bug fix, or refactor)
Code change with doc updates
Doc only (prose changes, no code sample modifications)
Doc only (includes code sample changes)

Verification

npx prek run --all-files passes
npm test passes
Tests added or updated for new or changed behavior
No secrets, API keys, or credentials committed
Docs updated for user-facing behavior changes
npm run docs builds without warnings (doc changes only)
Doc pages follow the style guide (doc changes only)
New doc pages include SPDX header and frontmatter (new pages only)

Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

Summary by CodeRabbit

Release Notes

Bug Fixes
- Clarified network and filesystem policy messages so that policy denials are distinguished from network failures (DNS, timeout, or TLS errors).
- Strengthened the guidance so the agent attempts an action before asserting that access is restricted.


coderabbitai · 2026-06-05T23:04:01Z


Walkthrough

This PR refines agent sandbox policy messages to prevent model hallucination of resource restrictions. It updates network and filesystem policy context strings to emphasize verification-by-action and distinguishes real failures from policy denials. Behavior guidance is clarified to instruct agents to attempt actions before asserting blocks and to report actual failure modes rather than speculated ones.

Changes

Runtime Context Policy Clarity

| Layer / File(s) | Summary |
|---|---|
| Network and filesystem policy strings: `nemoclaw/src/runtime-context.ts`, `nemoclaw/src/runtime-context.test.ts` | `getRuntimeSummary()` policy template strings are rewritten to clarify deny-by-default sandbox scoping, require verification via attempted requests, and distinguish policy/proxy denials from DNS/timeout/TLS failures. Test expectations updated to match new wording. |
| Runtime behavior guidance: `nemoclaw/src/runtime-context.ts`, `nemoclaw/src/runtime-context.test.ts` | `buildRuntimeContextText()` injects clarified behavior directives: attempt actions before asserting blocks, report real failure modes, and avoid false claims of unrestricted access. Test assertions verify new operator-approval and attempt-before-asserting directives. |
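
To make the shape of such a context block concrete, here is a minimal sketch of deterministic prompt construction. It is an assumption-laden illustration: `SandboxPolicySketch` and `buildContextSketch` are hypothetical names, the real signatures of `getRuntimeSummary()` and `buildRuntimeContextText()` are not shown on this page, and the wording below is not the text shipped in `nemoclaw/src/runtime-context.ts`.

```typescript
// Hypothetical policy shape; the real inputs to getRuntimeSummary() are not
// shown on this page.
interface SandboxPolicySketch {
  allowedHosts: string[];
  writablePaths: string[];
}

// Illustrative wording only. These are NOT the strings shipped in
// nemoclaw/src/runtime-context.ts.
function buildContextSketch(policy: SandboxPolicySketch): string {
  const hosts =
    policy.allowedHosts.length > 0
      ? policy.allowedHosts.join(", ")
      : "(none listed)";
  const paths = policy.writablePaths.join(" or ");

  const lines = [
    "<nemoclaw-runtime>",
    `Network: deny-by-default egress. Allowed endpoints: ${hosts}.`,
    "Attempt the request instead of assuming a host is unreachable.",
    "Report the actual error and say whether it was a proxy/policy denial or a DNS, timeout, or TLS failure.",
    `Filesystem: no host-level access. Inside the sandbox you can create, edit, and run files (e.g. ${paths}).`,
    "Behavior: do not claim something is blocked unless you attempted it this turn.",
    "Behavior: do not claim unrestricted access either.",
    "</nemoclaw-runtime>",
  ];
  return lines.join("\n");
}
```

Because the output is a pure function of the policy, substring assertions in a unit test are enough to pin the wording, which matches the testing approach the summary describes.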

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

NVIDIA/NemoClaw#4037: Refactored injection of <nemoclaw-runtime> block via prependSystemContext, which directly relates to this PR's updates to the injected policy and behavior guidance.

Suggested labels

bug-fix, area: policy, enhancement: policy

Suggested reviewers

cv

Poem

🐰 Agents were dreaming of walls that weren't there,
claiming blocked /tmp with decidedly no care,
Now we ask them to try before saying "nope,"
distinguish real failures—don't just hallucinate hope!
Attempt first, report true, sandbox scripted with care. 🏗️

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly summarizes the main change: updating sandbox context to require agents to attempt network/file actions before reporting them unavailable. |
| Linked Issues check | ✅ Passed | The PR changes align with issue `#4850` requirements: agents now attempt actions before asserting blocks, report actual errors (distinguishing proxy/policy denials from DNS/timeout/TLS), and avoid fabricating policy restrictions. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to fixing the guidance context for agents to verify actions before asserting unavailability, matching the PR objectives and linked issue `#4850`. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


github-actions · 2026-06-05T23:06:03Z

E2E Advisor Recommendation

Required E2E: common-egress-agent-e2e
Optional E2E: network-policy-e2e, agent-turn-latency-e2e

Dispatch hint: common-egress-agent-e2e


E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: medium

Required E2E

common-egress-agent-e2e (high; provisions real sandboxes and runs model-backed agent turns): Closest existing E2E for the changed behavior: runs real OpenClaw agent turns that must use web_fetch against policy-allowed endpoints. This validates that NemoClaw/OpenClaw runtime guidance does not cause the assistant to preemptively refuse allowed network access and that real user network-tool flows still work.

Optional E2E

network-policy-e2e (high; full sandbox/network policy exercise): Useful adjacent confidence because the changed text describes deny-by-default egress, allowed endpoints, proxy/policy denials, and operator approval. This job validates the underlying network-policy enforcement and allowed/blocked endpoint behavior, although it does not directly assert the injected prompt context.
agent-turn-latency-e2e (high; provisions OpenClaw and Hermes sandboxes and runs real model-backed turns): Optional broad real-turn smoke for OpenClaw plugin/runtime integration after changing the prompt context hook. It confirms a model-backed OpenClaw turn still works, but it is less targeted than common-egress-agent-e2e because it does not exercise network-tool decisions.

New E2E recommendations

OpenClaw runtime context injection (high): Existing E2E coverage does not appear to assert that the real OpenClaw prompt contains the block or that the before_prompt_build hook is active in an onboarded sandbox. Add a lightweight real-sandbox E2E that triggers an OpenClaw turn with a debug/mock provider or prompt-capture hook and verifies the NemoClaw runtime context is prepended.
- Suggested test: real-openclaw-runtime-context-injection-e2e
Restricted endpoint approval behavior (medium): The PR changes the agent directive from preemptive refusal to attempting restricted endpoints and distinguishing proxy/policy denials from DNS/timeout/TLS failures. Add an E2E with a controlled restricted URL and mock/observable approval path that verifies the assistant attempts the request and reports the actual policy-denial/approval semantics instead of refusing before trying.
- Suggested test: restricted-egress-attempt-and-approval-request-e2e

Dispatch hint

Workflow: nightly-e2e.yaml
jobs input: common-egress-agent-e2e

github-actions · 2026-06-05T23:06:04Z

E2E Scenario Advisor Recommendation

Required scenario E2E: ubuntu-repo-cloud-openclaw
Optional scenario E2E: wsl-repo-cloud-openclaw, macos-repo-cloud-openclaw

Dispatch required scenario E2E:

gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw


E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

ubuntu-repo-cloud-openclaw: The source change modifies NemoClaw's OpenClaw runtime context injected into prompts. The standard Ubuntu repo-current cloud OpenClaw scenario is the smallest ROUTES-backed scenario that installs the current branch with OpenClaw and exercises baseline OpenClaw startup/sandbox/inference behavior.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Optional scenario E2E

wsl-repo-cloud-openclaw: Optional adjacent coverage for the same OpenClaw repo-current surface on WSL; special-runner scenario, so not primary unless WSL-specific behavior is suspected.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=wsl-repo-cloud-openclaw
macos-repo-cloud-openclaw: Optional adjacent coverage for repo-current OpenClaw installation on macOS; special-runner/platform-only scenario and not the primary path for this runtime-context change.
- Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=macos-repo-cloud-openclaw

Relevant changed files

nemoclaw/src/runtime-context.ts

github-actions · 2026-06-05T23:10:42Z

PR Review Advisor

Findings: 1 needs attention, 3 worth checking, 0 nice ideas
Top item: Add unavailable-tool guidance before closing the linked hallucination bug

Review findings

🛠️ Needs attention

Linked issue expects missing-tool honesty, but the prompt only says to try the operation (nemoclaw/src/runtime-context.ts:21): Issue [Ubuntu 24.04][Agent&Skills] Nano Omni 30B hallucinates policy restrictions when no tools configured — claims /tmp blocked, network blocked #4850 identifies the actual failure as absent file-write/bash/fetch tools and expects the model to say it lacks the tool instead of fabricating a policy denial. The new filesystem text says the agent can create/edit/run files "using your file and shell tools" and should try the operation, but it never tells the agent what to do when those tools are not available. That leaves the acceptance-critical no-tool case only partially addressed.
- Recommendation: Add explicit runtime-context guidance such as: if the needed file, shell, web/fetch, or execution tool is not available in the current session, say the tool is unavailable and offer code/instructions instead of describing a sandbox policy denial. Add a unit assertion for that wording.
- Evidence: Issue #4850 expected: "I don't have a file-writing tool available in this session..." and states that the real issue was the absence of file-write/bash-execute tools. The diff adds line 21: "within the sandbox you can create, edit, and run files ... using your file and shell tools; try the operation and report if it fails rather than assuming it is unavailable".
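
One way to picture the recommended missing-tool guidance is a helper that emits a directive for each tool category absent from the session. This is a sketch under stated assumptions: `ToolKind`, `ALL_TOOLS`, and `missingToolDirectives` are hypothetical names, the three categories are a simplification of the "file, shell, web/fetch, or execution" tools named in the recommendation, and the directive wording is illustrative.

```typescript
// Hypothetical tool categories; the session's real tool names are not shown
// on this page.
type ToolKind = "file" | "shell" | "fetch";

const ALL_TOOLS: readonly ToolKind[] = ["file", "shell", "fetch"];

// Returns one illustrative directive per tool category that is missing from
// the current session. An empty result means every category is available.
function missingToolDirectives(available: ReadonlySet<ToolKind>): string[] {
  return ALL_TOOLS.filter((tool) => !available.has(tool)).map(
    (tool) =>
      `No ${tool} tool is available in this session. Say so and offer code or ` +
      `instructions instead of describing a sandbox policy denial.`,
  );
}
```

For a session that only has a fetch tool, `missingToolDirectives(new Set<ToolKind>(["fetch"]))` yields directives for the file and shell categories, which is the no-tool case the linked issue describes.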

🔎 Worth checking

Source-of-truth review needed: Runtime prompt workaround for hallucinated sandbox policy denials: The advisor marked localized patch analysis as needs_followup.
- Recommendation: Identify the invalid state, source boundary, source-fix constraint, regression test, and removal condition before merging the localized behavior.
- Evidence: Issue #4850 states the actual issue was no file-write or bash-execute tools. The new prompt text says to try operations using file and shell tools, but does not address the missing-tool state.
Network-attempt guidance should be bounded to task-relevant, policy-mediated requests (nemoclaw/src/runtime-context.ts:16): The new system context intentionally nudges the agent to make restricted endpoint requests so OpenShell can raise an approval prompt. The policy boundary is not bypassed by this code, but the wording can encourage probing arbitrary user-supplied URLs before considering whether they are task-relevant, sensitive, internal, or metadata-style targets.
- Recommendation: Qualify the directive so attempts are limited to task-relevant requests through normal sandbox tools and do not encourage probing internal, metadata, credential-bearing, or otherwise sensitive destinations unless explicitly required and mediated by policy.
- Evidence: Line 16 says: "attempting a restricted endpoint is productive... so make the request rather than refusing preemptively". Line 67 similarly says not to assert a URL or host is blocked unless it has been attempted this turn.
Prompt tests do not guard the safety and no-tool behaviors this change relies on (nemoclaw/src/runtime-context.test.ts:135): The updated tests check that the grounding directive and approval-request wording appear, but they do not assert the acceptance-critical missing-tool guidance, nor do they guard that the prompt still denies unrestricted host/internet access while encouraging real attempts.
- Recommendation: Add behavior-focused assertions for missing tool reporting, sandbox-writable `/tmp` and `/sandbox` with no host-path access, no unrestricted internet/host claims, and policy-denial reporting without requiring a literal 403.
- Evidence: The test currently asserts general section headers and substrings including "unless you have actually attempted it this turn" and "raises an operator approval request", but there is no assertion about unavailable file/shell/fetch tools or preserving the restrictive host/internet posture.
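
The network-attempt finding above asks that attempts not be encouraged against internal, metadata, or credential-bearing destinations. A minimal sketch of such a pre-check follows. It is not part of this PR: `isSensitiveDestination` is a hypothetical helper, the `.internal` suffix rule and the address ranges are illustrative choices, and a real policy would still be enforced by the sandbox proxy rather than by this function.

```typescript
// Hypothetical pre-check: returns true when a URL should not be probed
// without explicit justification. Illustrative, not exhaustive.
function isSensitiveDestination(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    // Treat unparseable input as something not to probe.
    return true;
  }

  // Credential-bearing URLs (user:password@host).
  if (url.username !== "" || url.password !== "") return true;

  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return true;
  if (host.endsWith(".internal")) return true; // assumption: internal naming
  if (host === "[::1]") return true; // IPv6 loopback

  const m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (m === null) return false;

  const a = Number(m[1]);
  const b = Number(m[2]);
  return (
    a === 10 || // 10.0.0.0/8 private
    a === 127 || // 127.0.0.0/8 loopback
    (a === 169 && b === 254) || // 169.254.0.0/16 link-local, incl. metadata
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12 private
    (a === 192 && b === 168) // 192.168.0.0/16 private
  );
}
```

A check like this only narrows which requests the prompt encourages. It leaves the deny-by-default policy boundary where the finding says it already is.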

🌱 Nice ideas

None.

Consider writing more tests for

**Runtime validation**: the changed production logic is deterministic prompt construction and can be guarded with unit tests, but the linked bug is model behavior under a no-tool runtime configuration, so prompt-string tests alone cannot prove the hallucination is resolved. Cases to cover:
- injects guidance to report missing file, shell, execution, or fetch tools instead of fabricating sandbox policy denials.
- keeps `/tmp` and `/sandbox` described as sandbox-writable while still denying host-path access.
- keeps unrestricted internet and host access prohibited while encouraging task-relevant policy-mediated attempts.
- reports policy denials without requiring a literal proxy 403 status.
- validates the no-file-write/no-shell/no-fetch tool configuration does not produce `/tmp blocked by operator` or total network-blocked wording.
**Prompt tests do not guard the safety and no-tool behaviors this change relies on** — Add behavior-focused assertions for missing tool reporting, sandbox-writable `/tmp` and `/sandbox` with no host-path access, no unrestricted internet/host claims, and policy-denial reporting without requiring a literal 403.
**Acceptance clause:** When `nvidia/nemotron-3-nano-omni-30b-a3b-reasoning` is asked to write a file or fetch a URL, it invents false policy restrictions rather than reporting that no tool is available. — add test evidence or identify existing coverage. The diff discourages speculative blocked/unreachable claims, but it does not add guidance for reporting missing file-write, shell, or fetch tools.
**Acceptance clause:** It claims `/tmp` is "blocked by operator" and "network access is blocked" — both incorrect. — add test evidence or identify existing coverage. The new runtime context says `/tmp` and `/sandbox` are usable inside the sandbox and says not to assert a URL/host is blocked without attempting it. Tests do not specifically guard against the `/tmp blocked by operator` wording recurring.
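
The prompt-string guards suggested above could look roughly like the following sketch. The two entries in `GROUNDED_SUBSTRINGS` are the substrings the review reports as already asserted; everything in `PROPOSED_PATTERNS`, along with the helper name `missingPromptGuards`, is a hypothetical addition whose exact wording is an assumption. As the advisor notes, checks like these guard the prompt text only and cannot prove model behavior.

```typescript
// Substrings the review reports as already asserted by the updated tests.
const GROUNDED_SUBSTRINGS: readonly string[] = [
  "unless you have actually attempted it this turn",
  "raises an operator approval request",
];

// Hypothetical extra guards; the exact wording is an assumption.
const PROPOSED_PATTERNS: readonly RegExp[] = [
  /\/tmp/,
  /\/sandbox/,
  /tool is (not available|unavailable)/i,
];

// Returns a description of every guard the prompt text fails to satisfy.
// An empty array means all guards passed.
function missingPromptGuards(text: string): string[] {
  const missing: string[] = [];
  for (const s of GROUNDED_SUBSTRINGS) {
    if (!text.includes(s)) missing.push(s);
  }
  for (const p of PROPOSED_PATTERNS) {
    if (!p.test(text)) missing.push(p.source);
  }
  return missing;
}
```

In a test file, asserting that `missingPromptGuards(promptText)` is empty gives one failure message listing every missing guard, instead of stopping at the first failed substring check.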


This is an automated advisory review. A human maintainer must make the final merge decision.

## Summary

- Adds the `v0.0.60` section to `docs/about/release-notes.mdx` using the dev announcement from discussion #4877.
- Fills the source-doc gaps found during release-prep review across inference, policy tiers, command behavior, security boundaries, Hermes dashboard/tooling, runtime context, and troubleshooting.
- Refreshes generated agent skills under `.agents/skills/` from the current Fern docs output and upgrades Fern from `5.44.3` to `5.45.0`.

## Source summary

- #4037 -> `docs/reference/architecture.mdx`, `docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents system-only runtime context that stays out of visible chat.
- #4875 -> `docs/reference/architecture.mdx`, `docs/about/how-it-works.mdx`, `docs/about/release-notes.mdx`: Documents try-first sandbox network/filesystem guidance and clearer failure classification.
- #4788 -> `docs/security/best-practices.mdx`, `docs/about/release-notes.mdx`: Documents shared OpenClaw device-approval policy for startup and connect.
- #4768 -> `docs/reference/network-policies.mdx`, `docs/network-policy/integration-policy-examples.mdx`, `docs/get-started/quickstart.mdx`, `docs/get-started/quickstart-hermes.mdx`, `docs/reference/commands.mdx`: Documents `weather`, `public-reference`, and Hermes managed-tool gateway preset behavior.
- #3788 and #4864 -> `docs/reference/network-policies.mdx`, `docs/reference/commands.mdx`: Documents non-interactive policy-tier fail-fast behavior and interactive prompt fallback.
- #4756 and #4866 -> `docs/reference/commands.mdx`: Documents env-aware default sandbox resolution for `list`, `status`, and `tunnel` commands.
- #4320 -> `docs/reference/commands.mdx`: Documents `$$nemoclaw tunnel status` behavior.
- #4328 -> `docs/reference/commands.mdx`: Documents line-scoped policy preset descriptions in `policy-list`.
- #4580 and #4748 -> `docs/reference/architecture.mdx`: Documents package-managed OpenShell gateway service and Docker-driver gateway-marker behavior.
- #4598 -> `docs/manage-sandboxes/lifecycle.mdx`: Documents concurrent gateway/dashboard cleanup isolation by sandbox name and port.
- #4777 -> `docs/reference/troubleshooting.mdx`: Documents Docker GPU patch rollback behavior.
- #4610 -> `docs/reference/troubleshooting.mdx`, `docs/reference/commands.mdx`: Keeps mutable OpenClaw config permission guidance aligned and removes skipped experimental wording.
- #4868 -> `docs/reference/commands.mdx`: Keeps `.dockerignore` handling for custom `onboard --from <Dockerfile>` contexts in generated skills.
- #4870 -> `docs/reference/commands.mdx`, `docs/manage-sandboxes/runtime-controls.mdx`: Documents `NEMOCLAW_MINIMAL_BOOTSTRAP` and generated skill coverage.
- #4641 -> `docs/inference/inference-options.mdx`, `docs/reference/troubleshooting.mdx`: Documents local NVIDIA NIM platform-digest pulls and served-model id adoption.
- #4810 and #4867 -> `docs/inference/inference-options.mdx`: Documents stable NGC managed-vLLM image lineage and DGX Station DeepSeek V4 Flash coverage.
- #4852 -> `docs/inference/use-local-inference.mdx`, `docs/reference/troubleshooting.mdx`: Documents Ollama model fit filtering, 16K context floor, cold-load retry, and failed-model exclusion.
- #4847 -> `docs/inference/switch-inference-providers.mdx`: Documents API-family sync, Hermes `api_mode`, and Bedrock Runtime exception.
- #4800 -> `docs/inference/tool-calling-reliability.mdx`: Documents Nemotron managed-inference native tool-search fallback.
- #4333 -> `docs/inference/switch-inference-providers.mdx`: Documents interactive multimodal input prompting.
- #4086 -> `docs/reference/troubleshooting.mdx`: Keeps proxy bypass normalization in generated troubleshooting coverage.
- #4811 and #4855 -> `docs/get-started/quickstart-hermes.mdx`: Documents prebuilt Hermes dashboard assets and TUI recovery without runtime rebuilds.
- #4854 -> `docs/inference/switch-inference-providers.mdx`, `docs/reference/commands.mdx`: Documents Hermes proxy API-key placeholder preservation during inference switches.
- #4248 -> `docs/manage-sandboxes/messaging-channels.mdx`, `.agents/skills/`: Keeps messaging enrollment behavior aligned with manifest-hook implementation.
- #4771 -> `docs/security/best-practices.mdx`, `docs/security/credential-storage.mdx`: Documents Hermes placeholder-only secret boundary for sandbox-visible runtime files.
- #4787 -> `docs/security/best-practices.mdx`, `docs/about/release-notes.mdx`: Documents expanded memory scanner examples for OpenAI project keys and Slack app-level tokens.
- #4848 -> `docs/reference/commands.mdx`: Documents OpenClaw skill install mirroring into the agent home directory.
- #4790 -> `docs/about/release-notes.mdx`: Uses the prior release-prep structure and generated `.agents/skills/` refresh as the template for this release.

## Verification

- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx`
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ skills/ --prefix nemoclaw-user --doc-platform fern-mdx --dry-run`
- `npm run docs`
- `git diff --check`
- skip-term scan across `docs/`, `.agents/skills/`, and `skills/`
- `npm run build:cli`
- `npm run typecheck:cli`
- Commit and pre-push hook suites, including markdownlint, gitleaks, env-var docs gate, docs-to-skills verification, and skills YAML tests

## Summary by CodeRabbit

## Release Notes

* **New Features**
  * DeepSeek-V4-Flash now available as default inference model for DGX Station.
  * Hermes dashboard improved with dedicated port and OAuth-authenticated tool gateway selection.
  * Added weather and public-reference policy presets for expanded agent capabilities.
  * Enhanced Ollama model selection with GPU memory filtering and automatic retry for timeouts.
* **Bug Fixes**
  * Improved policy tier validation to prevent invalid configurations.
  * Better sandbox cleanup scoping by port to prevent conflicts across deployments.
  * Added GPU patch failure recovery with automatic rollback.
* **Documentation**
  * Expanded troubleshooting guides for inference, security, and sandbox lifecycle.
  * Added .dockerignore best practices for custom deployments.

---------

Co-authored-by: Carlos Villela <cvillela@nvidia.com>

fix(runtime-context): ground sandbox prompt so the agent attempts net…

eb473af

…work and file actions before reporting them unavailable Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>

zyang-dev added the v0.0.60 Release target label Jun 5, 2026

cv approved these changes Jun 5, 2026


cv merged commit 2b68caa into main Jun 5, 2026
42 checks passed

cv deleted the fix/runtime-context-verify-before-asserting branch June 5, 2026 23:18

miyoungc mentioned this pull request Jun 6, 2026

docs: refresh v0.0.60 release notes #4879

Merged

wscurran added the bug-fix PR fixes a bug or regression label Jun 8, 2026
