fix(inference): increase timeout for local providers to 180s #1620

Merged
ericksoa merged 1 commit into NVIDIA:main from paritoshd-nv:fix/update-local-inference-timeout
Apr 8, 2026

Conversation

@paritoshd-nv paritoshd-nv commented Apr 8, 2026

Contributor

Local models on DGX Spark can exceed the default 60s inference timeout, especially with large prompts (~90k tokens of system context). Pass --timeout 180 to openshell inference set for ollama-local and vllm-local providers. The timeout is configurable via NEMOCLAW_LOCAL_INFERENCE_TIMEOUT env var. Also add timeout_secs to blueprint InferenceProfile for the nim-local and vllm profiles.

Closes #1588

Summary

Related Issue

Changes

Type of Change

  • Code change for a new feature, bug fix, or refactor.
  • Code change with doc updates.
  • Doc only. Prose changes without code sample modifications.
  • Doc only. Includes code sample changes.

Testing

  • npx prek run --all-files passes (or equivalently make check).
  • npm test passes.
  • make docs builds without warnings. (for doc-only changes)

Checklist

General

Code Changes

  • Formatters applied — npx prek run --all-files auto-fixes formatting (or make format for targeted runs).
  • Tests added or updated for new or changed behavior.
  • No secrets, API keys, or credentials committed.
  • Doc pages updated for any user-facing behavior changes (new commands, changed defaults, new features, bug fixes that contradict existing docs).

Doc Changes

  • Follows the style guide. Try running the nemoclaw-contributor-update-docs agent skill to draft changes while complying with the style guide. For example, prompt your agent with "/nemoclaw-contributor-update-docs catch up the docs for the new changes I made in this PR."
  • New pages include SPDX license header and frontmatter, if creating a new page.
  • Cross-references and links verified.

Signed-off-by: Your Name your-email@example.com

Summary by CodeRabbit

  • New Features
    • Added configurable inference timeouts for local providers with a default of 180 seconds; timeouts are applied to local inference requests so long-running operations can be controlled.
  • Tests
    • Added tests to validate timeout behavior is applied when configured and omitted when not.

coderabbitai Bot commented Apr 8, 2026

Contributor
Walkthrough

Adds optional per-profile inference timeouts and wires them through onboarding and CLI invocation: inference profiles gain an optional timeout_secs, onboarding parses LOCAL_INFERENCE_TIMEOUT_SECS (default 180s), and openshell inference set calls include --timeout when configured.

Changes

| Cohort / File(s) | Summary |
|------------------|---------|
| Blueprint / Profiles: `nemoclaw-blueprint/blueprint.yaml` | Added `timeout_secs: 180` to `components.inference.profiles.nim-local` and `components.inference.profiles.vllm`. |
| Runner (blueprint apply): `nemoclaw/src/blueprint/runner.ts`, `nemoclaw/src/blueprint/runner.test.ts` | Added optional `InferenceProfile.timeout_secs?: number`; `actionApply` builds `openshell inference set` args dynamically and appends `--timeout <value>` when present. Tests added to assert presence/absence of `--timeout`. |
| Onboarding script: `bin/lib/onboard.js` | Parses `LOCAL_INFERENCE_TIMEOUT_SECS` (env var `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT`, default 180, clamped to a non-negative integer) and passes `--timeout` to `openshell inference set` for local providers (`vllm-local`, `ollama-local`). |
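
The onboarding behavior summarized above can be sketched roughly as follows. This is a minimal TypeScript illustration, not the code in `bin/lib/onboard.js`: the function names `parseLocalInferenceTimeout` and `inferenceSetArgs` are hypothetical, and falling back to the default on unparsable input is an assumption (the summary only says the value defaults to 180 and is clamped to a non-negative integer).

```typescript
// Hypothetical sketch; the real parsing in bin/lib/onboard.js may differ.
const DEFAULT_LOCAL_INFERENCE_TIMEOUT_SECS = 180;

// Providers named in the PR description as "local".
const LOCAL_PROVIDERS = new Set(["vllm-local", "ollama-local"]);

// Parse the raw env value, falling back to the default when it is unset or
// not a finite number (assumption), and clamp to a non-negative integer.
function parseLocalInferenceTimeout(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_LOCAL_INFERENCE_TIMEOUT_SECS;
  }
  const parsed = Number(raw);
  if (!Number.isFinite(parsed)) {
    return DEFAULT_LOCAL_INFERENCE_TIMEOUT_SECS;
  }
  return Math.max(0, Math.floor(parsed));
}

// Build the argument list for `openshell inference set`, adding --timeout
// only for local providers.
function inferenceSetArgs(provider: string, model: string, timeoutSecs: number): string[] {
  const args = ["inference", "set", "--provider", provider, "--model", model];
  if (LOCAL_PROVIDERS.has(provider)) {
    args.push("--timeout", String(timeoutSecs));
  }
  return args;
}
```

A caller would typically pass `process.env.NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` to the parser once at startup and reuse the result for every local provider.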

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Onboard as Onboard Script
    participant Openshell as openshell CLI
    participant Provider as Local Inference Provider

    Onboard->>Openshell: run "inference set" --provider <p> --model <m> [--timeout <t>]
    Openshell->>Provider: configure provider with model and optional timeout
    Provider-->>Openshell: ack
    Openshell-->>Onboard: operation result
```

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

🐰 I nibbled flags and counted ticks,
Timeout seeds in tidy nooks,
Local models given longer breaths,
Commands now carry careful clocks,
Hoppity hop — setups finish quick!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|------------|--------|-------------|------------|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|------------|--------|-------------|
| Title check | ✅ Passed | The title accurately reflects the main change: increasing the timeout for local inference providers from the default 60s to 180s, which is the primary objective of the PR. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #1588 by making local inference timeout configurable via environment variable and setting it to 180s for ollama-local and vllm-local providers. |
| Out of Scope Changes check | ✅ Passed | All changes directly support the timeout configuration objective: environment variable handling, CLI flag passing, blueprint profile updates, and corresponding test additions are all in scope. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@nemoclaw/src/blueprint/runner.ts`:
- Around line 309-311: The code currently uses a truthy check for
inferenceCfg.timeout_secs which skips valid values like 0; update the
conditional around building inferenceArgs (the block that references
inferenceCfg and inferenceArgs in runner.ts) to check presence instead, e.g.
test that inferenceCfg.timeout_secs is not undefined/null
(inferenceCfg.timeout_secs !== undefined && inferenceCfg.timeout_secs !== null
or != null) before pushing "--timeout" and String(inferenceCfg.timeout_secs), so
zero is accepted but missing values are still skipped.
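
The difference between a truthy check and a presence check can be illustrated with a short sketch. It assumes a simplified profile shape: only `timeout_secs` is confirmed by this PR, while the `provider` and `model` fields and the function name `buildInferenceArgs` are hypothetical stand-ins for the code in `runner.ts`.

```typescript
// Only timeout_secs is confirmed by the PR; the other fields are illustrative.
interface InferenceProfile {
  provider: string;
  model: string;
  timeout_secs?: number | null;
}

// Build `openshell inference set` args, appending --timeout only when a
// value is actually present.
function buildInferenceArgs(cfg: InferenceProfile): string[] {
  const args = ["inference", "set", "--provider", cfg.provider, "--model", cfg.model];
  // `!= null` rejects both undefined and null but accepts 0, whereas a
  // truthy check (`if (cfg.timeout_secs)`) would silently drop 0.
  if (cfg.timeout_secs != null) {
    args.push("--timeout", String(cfg.timeout_secs));
  }
  return args;
}
```

With this guard, a profile that sets `timeout_secs: 0` still produces `--timeout 0`, and a profile without the field produces no `--timeout` flag at all.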

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dc6bd835-c9a9-4cc0-a321-3d303085823a

📥 Commits

Reviewing files that changed from the base of the PR and between e2bfdcf and b480fd8.

📒 Files selected for processing (4)
  • bin/lib/onboard.js
  • nemoclaw-blueprint/blueprint.yaml
  • nemoclaw/src/blueprint/runner.test.ts
  • nemoclaw/src/blueprint/runner.ts

Comment thread nemoclaw/src/blueprint/runner.ts Outdated
Local models on DGX Spark can exceed the default 60s inference timeout,
especially with large prompts (~90k tokens of system context). Pass
--timeout 180 to openshell inference set for ollama-local and vllm-local
providers. The timeout is configurable via NEMOCLAW_LOCAL_INFERENCE_TIMEOUT
env var. Also add timeout_secs to blueprint InferenceProfile for the
nim-local and vllm profiles.

Closes NVIDIA#1588

Signed-off-by: Paritosh Dixit <paritoshd@nvidia.com>
@paritoshd-nv paritoshd-nv force-pushed the fix/update-local-inference-timeout branch from b480fd8 to 22b3a89 Compare April 8, 2026 20:11

@coderabbitai coderabbitai Bot left a comment

Contributor


🧹 Nitpick comments (1)
nemoclaw/src/blueprint/runner.test.ts (1)

464-496: LGTM! Test correctly verifies --timeout argument passing.

The test properly sets up a blueprint with timeout_secs: 180, invokes actionApply, and verifies the mocked execa call includes the expected arguments. Good use of try/finally for environment variable cleanup.

One optional improvement: the current assertions verify that both "--timeout" and "180" exist in the argument array but don't confirm their adjacency. For stronger guarantees, you could check positioning:

♻️ Optional: More precise argument verification
```diff
       const inferenceCall = mockExeca.mock.calls.find(
         (c) => Array.isArray(c[1]) && c[1].includes("inference") && c[1].includes("set"),
       );
       if (!inferenceCall) throw new Error("inference set call not found");
-      expect(inferenceCall[1]).toContain("--timeout");
-      expect(inferenceCall[1]).toContain("180");
+      const args = inferenceCall[1] as string[];
+      const idx = args.indexOf("--timeout");
+      expect(idx).toBeGreaterThanOrEqual(0);
+      expect(args[idx + 1]).toBe("180");
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@nemoclaw/src/blueprint/runner.test.ts` around lines 464 - 496, The test
should assert that "--timeout" and "180" are adjacent in the mock execa args
rather than only both present; after locating the mocked call (variable
inferenceCall in the "passes --timeout when timeout_secs is set in profile" test
that invokes actionApply), get the argument array (inferenceCall[1]) and assert
that the index of "--timeout" is found and that the next element (index + 1)
equals "180" to guarantee adjacency.
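
The adjacency check described above could also be factored into a small reusable helper. The sketch below is one possible shape, independent of any test framework; the name `flagValue` is hypothetical and does not appear in the PR.

```typescript
// Return the argument that immediately follows `flag`, or undefined when the
// flag is absent or is the last element (so it has no value after it).
function flagValue(args: readonly string[], flag: string): string | undefined {
  const idx = args.indexOf(flag);
  if (idx === -1 || idx + 1 >= args.length) {
    return undefined;
  }
  return args[idx + 1];
}
```

A test could then assert on `flagValue(args, "--timeout")` directly, which fails both when the flag is missing and when `"180"` appears somewhere other than right after it.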

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: abe34734-326b-4035-9520-b7e2299a0bbe

📥 Commits

Reviewing files that changed from the base of the PR and between b480fd8 and 22b3a89.

📒 Files selected for processing (4)
  • bin/lib/onboard.js
  • nemoclaw-blueprint/blueprint.yaml
  • nemoclaw/src/blueprint/runner.test.ts
  • nemoclaw/src/blueprint/runner.ts
✅ Files skipped from review due to trivial changes (1)
  • nemoclaw-blueprint/blueprint.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • nemoclaw/src/blueprint/runner.ts
  • bin/lib/onboard.js

@paritoshd-nv paritoshd-nv requested review from cv and ericksoa April 8, 2026 20:25

@ericksoa ericksoa left a comment

Contributor


LGTM — clean, focused fix for local inference timeouts on DGX Spark. Correct use of !== undefined guard, configurable via env var, good test coverage.

@ericksoa ericksoa merged commit b8a5245 into NVIDIA:main Apr 8, 2026
13 of 24 checks passed
miyoungc added a commit that referenced this pull request Apr 9, 2026
## Summary
- Document `nemoclaw credentials list` and `nemoclaw credentials reset`
commands in commands reference (#1597)
- Add `--dry-run` flag documentation for `policy-add` (#1276)
- Update policy presets table: remove `docker` (#1647), add `brave` and
`brew`, update HuggingFace endpoint (#1540)
- Document `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` env var for local
providers (#1620)
- Document `NEMOCLAW_PROXY_HOST`/`NEMOCLAW_PROXY_PORT` env vars (#1563)
- Add troubleshooting entries for Docker group permissions (#1614),
sandbox survival after gateway restart (#1587), and proxy configuration
- Regenerate `nemoclaw-user-*` skills from updated docs

## Test plan
- [x] `make docs` builds without warnings
- [x] All pre-commit and pre-push hooks pass
- [ ] Verify rendered pages in docs site preview

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **New Features**
* Added `nemoclaw credentials list` command to display stored credential
names
* Added `nemoclaw credentials reset <KEY>` command with `--yes` flag to
remove credentials
  * Added `--dry-run` flag for policy-add to preview endpoint changes
  * New policy presets: `brave` and `brew`
* New configuration options: `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT`,
`NEMOCLAW_PROXY_HOST`, and `NEMOCLAW_PROXY_PORT`

* **Documentation**
* Expanded troubleshooting guides for Docker permissions, sandbox
connectivity, local inference timeouts, and proxy configuration

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
brandonpelfrey pushed a commit that referenced this pull request Apr 9, 2026
## Summary

- Allow model and provider changes without rebuilding the sandbox image
- The entrypoint patches `openclaw.json` at startup when
`NEMOCLAW_MODEL_OVERRIDE` is set, then recomputes the config hash so
integrity checks still pass
- Same pattern as `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` (PR #1620)

### New env vars

| Env var | Purpose | When needed |
|---------|---------|-------------|
| `NEMOCLAW_MODEL_OVERRIDE` | Override `agents.defaults.model.primary`
and provider model name | Any model switch |
| `NEMOCLAW_INFERENCE_API_OVERRIDE` | Override inference API type
(`openai-completions` or `anthropic-messages`) | Cross-provider switches
only |

### Usage example (NVIDIA → Anthropic)

```bash
# On host: configure gateway route
openshell inference set --provider anthropic-prod --model claude-sonnet-4-6 --no-verify

# Set env vars for the sandbox (via openshell or Docker)
export NEMOCLAW_MODEL_OVERRIDE="anthropic/claude-sonnet-4-6"
export NEMOCLAW_INFERENCE_API_OVERRIDE="anthropic-messages"

# Restart sandbox — no image rebuild needed
```

### Security

- Env vars come from the host (Docker/OpenShell), not from inside the
sandbox
- Config integrity is verified first (detects build-time tampering),
then override is applied
- Config hash is recomputed after patching
- Landlock locks the file after this function runs
- Agent cannot set these env vars
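
The verify, patch, then rehash order listed above can be sketched as follows. This is an illustration only, not the entrypoint's actual code: SHA-256 as the hash, the function names, the `inferenceApi` field, and passing the env vars as a plain object are all assumptions. Only `agents.defaults.model.primary`, the two env var names, and the two allowed API values come from the text.

```typescript
import { createHash } from "node:crypto";

// Allowed API types, taken from the env-var table above.
const ALLOWED_APIS: readonly string[] = ["openai-completions", "anthropic-messages"];

interface OverrideEnv {
  NEMOCLAW_MODEL_OVERRIDE?: string;
  NEMOCLAW_INFERENCE_API_OVERRIDE?: string;
}

interface PatchResult {
  configText: string;
  hash: string;
}

// Assumption: the integrity hash is a SHA-256 hex digest of the file text.
function sha256Hex(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

function applyModelOverride(configText: string, expectedHash: string, env: OverrideEnv): PatchResult {
  // 1. Verify integrity of the baked config before touching it.
  if (sha256Hex(configText) !== expectedHash) {
    throw new Error("config integrity check failed");
  }
  // 2. No-op when the override is unset.
  const model = env.NEMOCLAW_MODEL_OVERRIDE;
  if (model === undefined || model === "") {
    return { configText, hash: expectedHash };
  }
  // 3. Validate the optional API override against the allowlist.
  const api = env.NEMOCLAW_INFERENCE_API_OVERRIDE;
  if (api !== undefined && !ALLOWED_APIS.includes(api)) {
    throw new Error(`unsupported inference API override: ${api}`);
  }
  // 4. Patch the config, then recompute the hash over the patched text.
  const config = JSON.parse(configText);
  config.agents.defaults.model.primary = model;
  if (api !== undefined) {
    config.inferenceApi = api; // hypothetical field name
  }
  const patched = JSON.stringify(config, null, 2);
  return { configText: patched, hash: sha256Hex(patched) };
}
```

Verifying before patching matters here: if the hash were recomputed without the initial check, a config tampered with at build time would be re-blessed by the override step.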

## Related Issue

Closes #759

## Test plan

- [ ] `npm test` passes (39 tests in nemoclaw-start.test.js)
- [ ] Set `NEMOCLAW_MODEL_OVERRIDE` → sandbox starts with overridden
model
- [ ] Unset env var → sandbox starts with original baked model (no
regression)
- [ ] Set invalid model → sandbox starts but inference fails (expected)
- [ ] Config hash passes integrity check on restart after override

Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>

🤖 Generated with [Claude Code](https://claude.com/claude-code)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Env vars to override the active model and (optionally) the inference
API at container startup; when used in privileged startup, the runtime
config is updated and its integrity hash recomputed so startup
verification aligns.

* **Runtime safeguards**
* Input validation, API allowlist, symlink protections, applies only in
privileged mode, and no-op behavior when unset.

* **Tests**
* New unit and end-to-end tests covering override behavior, timing, hash
recomputation, validation, and noop cases.

* **Documentation**
* Guidance added for cross-provider switching and the runtime override
workflow.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Prekshi Vyas <prekshiv@nvidia.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
ericksoa pushed a commit to cheese-head/NemoClaw that referenced this pull request Apr 14, 2026
gemini2026 pushed a commit to gemini2026/NemoClaw that referenced this pull request Apr 14, 2026
…1620)

Local models on DGX Spark can exceed the default 60s inference timeout,
especially with large prompts (~90k tokens of system context). Pass
--timeout 180 to openshell inference set for ollama-local and vllm-local
providers. The timeout is configurable via
NEMOCLAW_LOCAL_INFERENCE_TIMEOUT env var. Also add timeout_secs to
blueprint InferenceProfile for the nim-local and vllm profiles.

Closes NVIDIA#1588

<!-- markdownlint-disable MD041 -->
## Summary
<!-- 1-3 sentences: what this PR does and why. -->

## Related Issue
<!-- Link to the issue: Fixes #NNN or Closes #NNN. Remove this section
if none. -->

## Changes
<!-- Bullet list of key changes. -->

## Type of Change
<!-- Check the one that applies. -->
- [x] Code change for a new feature, bug fix, or refactor.
- [ ] Code change with doc updates.
- [ ] Doc only. Prose changes without code sample modifications.
- [ ] Doc only. Includes code sample changes.

## Testing
<!-- What testing was done? -->
- [x] `npx prek run --all-files` passes (or equivalently `make check`).
- [x] `npm test` passes.
- [ ] `make docs` builds without warnings. (for doc-only changes)

## Checklist

### General

- [x] I have read and followed the [contributing
guide](https://github.com/NVIDIA/NemoClaw/blob/main/CONTRIBUTING.md).
- [x] I have read and followed the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md).
(for doc-only changes)

### Code Changes
<!-- Skip if this is a doc-only PR. -->
- [x] Formatters applied — `npx prek run --all-files` auto-fixes
formatting (or `make format` for targeted runs).
- [x] Tests added or updated for new or changed behavior.
- [x] No secrets, API keys, or credentials committed.
- [ ] Doc pages updated for any user-facing behavior changes (new
commands, changed defaults, new features, bug fixes that contradict
existing docs).

### Doc Changes
<!-- Skip if this PR has no doc changes. -->
- [ ] Follows the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md).
Try running the `nemoclaw-contributor-update-docs` agent skill to draft
changes while complying with the style guide. For example, prompt your
agent with "`/nemoclaw-contributor-update-docs` catch up the docs for
the new changes I made in this PR."
- [ ] New pages include SPDX license header and frontmatter, if creating
a new page.
- [ ] Cross-references and links verified.

---
<!-- DCO sign-off (required by CI). Replace with your real name and
email. -->
Signed-off-by: Your Name <your-email@example.com>


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added configurable inference timeouts for local providers with a
default of 180 seconds; timeouts are applied to local inference requests
so long-running operations can be controlled.
* **Tests**
* Added tests to validate timeout behavior is applied when configured
and omitted when not.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Paritosh Dixit <paritoshd@nvidia.com>
gemini2026 pushed a commit to gemini2026/NemoClaw that referenced this pull request Apr 14, 2026
gemini2026 pushed a commit to gemini2026/NemoClaw that referenced this pull request Apr 14, 2026
cv added a commit that referenced this pull request May 15, 2026
## Summary
`NEMOCLAW_SANDBOX_READY_TIMEOUT` has been a recognised env var since
#2849, but no documentation accompanied it —
`docs/reference/commands.md`, `docs/reference/troubleshooting.md`, and
the inference / deployment guides only mention the companion
`NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` (added in #1620 and documented at
that time). Operators hitting `Sandbox '<name>' was created but did not
become ready within 180s` have no doc-grep path to the workaround, and
the two timeouts are easy to conflate. This closes the documentation gap
left by #2849.

Originally tried under #3435; closed because that PR mis-framed the docs
as resolving #3344 / #3416 (the root cause of both was the GPU policy
bug fixed in #3436, not a timeout misconfiguration). The docs themselves
still have value as a follow-up to the env-var introductions, so
reopening as a new PR with the correct framing.

## Related Issue
<!-- Not closing any issue; this addresses the doc-gap surfaced while
investigating #3344 and #3416 (both already fixed in code by #3436). -->

## Changes
- `docs/reference/commands.md`: add `NEMOCLAW_SANDBOX_READY_TIMEOUT` and
`NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` to the Onboard Timeouts table.
- `docs/reference/troubleshooting.md`: new troubleshooting entry
"Sandbox onboard times out with 'did not become ready within Ns'" that
distinguishes the readiness wait from the inference-probe budget, with a
worked example.
- `docs/inference/use-local-inference.md`: cross-link the two timeouts
from the existing `NEMOCLAW_LOCAL_INFERENCE_TIMEOUT` section so readers
of either knob land on the other.
- `docs/deployment/deploy-to-remote-gpu.md`: new "First-Run Readiness
Budget" section calling out DGX Station / cloud-VM /
large-quantised-model conditions that exceed the default and showing how
to raise it.

No code changes — the readiness behaviour is unchanged.

## Type of Change

- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [x] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification
- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [ ] No secrets, API keys, or credentials committed
- [ ] Docs updated for user-facing behavior changes
- [x] `make docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

---
Signed-off-by: Tinson Lai <tinsonl@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Documentation**
* Added a “First-Run Readiness Budget” note for remote GPU hosts
explaining longer initial sandbox build/upload times and advice to
increase NEMOCLAW_SANDBOX_READY_TIMEOUT.
* Clarified that NEMOCLAW_LOCAL_INFERENCE_TIMEOUT applies to
inference-server validation while sandbox readiness uses
NEMOCLAW_SANDBOX_READY_TIMEOUT (default 180s).
* Expanded examples for exporting both timeouts and onboarding timeout
messaging.
* Added troubleshooting guidance and inspection steps when sandbox
readiness timeouts delete partial sandboxes.

<!-- review_stack_entry_start -->

[![Review Change
Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3440)

<!-- review_stack_entry_end -->
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Tinson Lai <tinsonl@nvidia.com>
Co-authored-by: Carlos Villela <cvillela@nvidia.com>
@wscurran wscurran added the bug-fix PR fixes a bug or regression label Jun 8, 2026

Labels

bug-fix PR fixes a bug or regression

Development

Successfully merging this pull request may close these issues.

How to change timeout for local inference?

3 participants