Skip to content

fix(runtime): probe live gateway to reconcile agent identity (#3175)#3679

Merged
cv merged 6 commits into
mainfrom
worktree-fix-3175-reconciler-uses-live-gateway
Jun 4, 2026
Merged

fix(runtime): probe live gateway to reconcile agent identity (#3175)#3679
cv merged 6 commits into
mainfrom
worktree-fix-3175-reconciler-uses-live-gateway

Conversation

@jason-ma-nv

@jason-ma-nv jason-ma-nv commented May 18, 2026

Copy link
Copy Markdown
Contributor

Summary

The reconcile_agent_model_with_provider safety net added in #3319 reads both sides of its equality check from the same file (primary vs providers.inference.models[0].name). When a user runs openshell inference set — which only writes the gateway and leaves /sandbox/.openclaw/openclaw.json untouched — both fields stay equal to each other and the reconciler short-circuits as a no-op. This PR makes the reconciler probe the live gateway via openshell inference get --json and use that as the source of truth so the file is actually realigned to the routed model on the next sandbox start.

Related Issue

Partial fix for #3175. See the issue comment for why a complete fix also requires an OpenShell-side change.

Changes

  • scripts/nemoclaw-start.sh: reconcile_agent_model_with_provider now probes the live gateway and, when a model is returned, treats it as authoritative — aligning both agents.defaults.model.primary and models.providers.inference.models[0].{id,name} so the file no longer carries a stale entry the gateway can push back on its next reconcile.
  • Legacy in-file reconcile is preserved as a fallback for environments where the openshell binary is missing or the probe fails (no behavioral change for existing test scaffolds without a stubbed openshell).
  • test/nemoclaw-start-reconcile.test.ts: harness now optionally installs a stubbed openshell on PATH and scrubs any inherited host openshell. Adds four cases covering the gateway-mode happy path (the user's repro from [Inference] openshell inference set changes gateway model but sandbox agent still reports old model #3175), inference-prefix idempotency, gateway-matches-file no-op, and empty-probe fallback.

Scope

This PR does not stop the silent runtime revert reported in the #3175 repro (gateway version 3 → 4 without a sandbox restart). That push-back is owned by openshell-gateway and needs an OpenShell-side change; see the linked issue comment.

Type of Change

  • Code change (feature, bug fix, or refactor)
  • Code change with doc updates
  • Doc only (prose changes, no code sample modifications)
  • Doc only (includes code sample changes)

Verification

  • `npx prek run --all-files` passes
  • `npm test` passes for the touched test file and the adjacent `test/nemoclaw-start.test.ts` (81/81)
  • Tests added or updated for new or changed behavior
  • No secrets, API keys, or credentials committed
  • Docs updated for user-facing behavior changes
  • `make docs` builds without warnings (doc changes only)

Notes for reviewers:

  • Local full CLI vitest run had 4 pre-existing failures on this machine unrelated to the diff (ssrf-parity build artifact missing, cli.test.ts gateway-cleanup timeouts, sandbox-build-context umask 775 vs 755). Will rely on PR CI to validate.
  • The gateway-probe Python block uses subprocess.run(timeout=3) to bound the probe and silently falls back on any error.

Signed-off-by: jason-ma-nv jama@nvidia.com

Summary by CodeRabbit

  • New Features

    • Reconciliation now prefers a live gateway probe as the source of truth for model alignment; when available it updates the agent's primary model and the provider's primary model entry, with a legacy in-file fallback if the probe is unavailable.
  • Tests

    • Expanded tests cover probe-driven updates, no-op when already aligned, fallback behavior, malformed probe output, and deterministic probe stubbing.

The reconciler in scripts/nemoclaw-start.sh was meant to be the safety
net for users who run `openshell inference set` instead of `nemoclaw
inference set`. As written it read both sides of its equality check
from the same file (primary vs providers.inference.models[0].name),
so when `openshell inference set` modified only the gateway and left
the sandbox file untouched, the reconciler saw the two file fields
still equal and short-circuited as a no-op.

Probe the live gateway via `openshell inference get --json` and use
it as the source of truth. When the gateway model differs from the
file, align primary AND models[0].{id,name} so the agent identity
and the gateway route stay consistent across the next reconcile
cycle. Fall back to the legacy in-file reconcile when openshell is
unavailable so environments without the probe still resolve drift.

This is a partial fix for #3175: it closes the agent self-report
drift at the next sandbox restart, but the silent runtime revert
in the user's repro also needs an OpenShell-side fix to stop the
gateway from pushing the sandbox's stale config back onto a runtime
`inference set` change.

Signed-off-by: jason-ma-nv <jama@nvidia.com>
@coderabbitai

coderabbitai Bot commented May 18, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5c63d087-4a55-4868-a6b7-c3a3e741f971

📥 Commits

Reviewing files that changed from the base of the PR and between f32465f and d1d4982.

📒 Files selected for processing (1)
  • scripts/nemoclaw-start.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • scripts/nemoclaw-start.sh

📝 Walkthrough

Walkthrough

The PR makes reconcile_agent_model_with_provider probe the live gateway (openshell) for the current model, normalizes and aligns agent and provider model entries when successful, and falls back to legacy in-file reconciliation when the probe is unavailable or invalid. Tests and harness support for gateway stubbing are added.

Changes

Gateway Model Reconciliation

Layer / File(s) Summary
Gateway probe logic and reconciliation algorithm
scripts/nemoclaw-start.sh
Updated reconcile_agent_model_with_provider to call openshell inference get --json, extract/qualify data.model, decide gateway vs legacy write, and in gateway mode update agents.defaults.model.primary and models.providers.inference.models[0] (repairing the models list/first element when missing).
Test harness setup for gateway mocking
test/nemoclaw-start-reconcile.test.ts
Added RunReconcileOptions (gatewayModel, gatewayRawOutput, env), changed runReconcile signature, conditionally install bin/openshell stub that emits controlled stdout, and construct deterministic PATH removing inherited openshell and prepending the stub; spawned process receives merged env.
Test coverage for gateway-driven reconciliation
test/nemoclaw-start-reconcile.test.ts
Added gateway-focused tests: updating both agent and provider when stale, accepting already-qualified gateway models without double-prefixing, no-op when file fields already match gateway model, and legacy fallback when gateway returns empty or malformed output.

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

fix, NemoClaw CLI, OpenShell

Suggested reviewers

  • ericksoa
  • cv

"I hopped through logs and paths aglow,
A stub that answers when probes must go,
I nudged the models so files all show,
When gateways hush, old ways still flow,
A rabbit cheers — the tests now know."

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically describes the main change: probing the live gateway to reconcile agent identity, directly reflecting the core fix described in the PR objectives.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch worktree-fix-3175-reconciler-uses-live-gateway

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented May 18, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: openclaw-inference-switch-e2e, model-router-provider-routed-inference-e2e
Optional E2E: runtime-overrides-e2e, inference-routing-e2e

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • openclaw-inference-switch-e2e (high): Best existing coverage for the affected path: installs OpenClaw, switches inference with nemoclaw inference set, verifies OpenShell route, openclaw.json primary/model entries, config hash, and live OpenClaw requests.
  • model-router-provider-routed-inference-e2e (high): The PR changes provider-routed model reconciliation against the live gateway; this regression job validates Model Router/provider-routed inference through inference.local after onboard.

Optional E2E

  • runtime-overrides-e2e (medium): Useful adjacent confidence for the same startup script config-write/ownership/hash machinery, though it does not specifically exercise the live OpenShell gateway probe.
  • inference-routing-e2e (medium): Optional broader inference routing smoke coverage if maintainers want extra confidence that route selection still behaves with real provider calls.

New E2E recommendations

  • sandbox restart reconciliation after route drift (high): No existing E2E appears to directly reproduce the new [Inference] openshell inference set changes gateway model but sandbox agent still reports old model #3175 startup scenario: live OpenShell gateway route is changed while /sandbox/.openclaw/openclaw.json is stale, then the sandbox restarts and must reconcile primary, models[0].id/name, .config-hash, and live agent responses to the gateway model.
    • Suggested test: Add an OpenClaw inference-switch restart E2E that changes the live gateway model, intentionally verifies/creates stale openclaw.json state, restarts the sandbox, then asserts openshell inference get --json, agents.defaults.model.primary, models.providers.inference.models[0], .config-hash, and an OpenClaw agent turn all agree.

@wscurran

Copy link
Copy Markdown
Contributor

@wscurran

Copy link
Copy Markdown
Contributor

Sprint 5 planning update: we’re organizing this PR as a partial fix for #3175.

Relationship:

This PR should be tracked for Sprint 5 review, but it should not auto-close #3175 when merged.

@wscurran wscurran added the v0.0.58 Release target label Jun 2, 2026
@wscurran wscurran added area: inference Inference routing, serving, model selection, or outputs area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression feature PR adds or expands user-visible functionality and removed OpenShell labels Jun 3, 2026
… JSON

Adds the missing coverage case from a deep-review pass: today the
gateway probe absorbs malformed stdout via Python's `json.loads` →
`SystemExit(0)` → empty `gateway_model` → legacy in-file reconcile.
That behavior is correct but only protected by the implementation
detail of the parser. A future refactor that swaps the parser or
changes the exit handling could silently turn the malformed-output
case into "do nothing" without any unit test catching it.

Extends the test harness with a `gatewayRawOutput` option that lets
the stub emit arbitrary stdout (bypassing the JSON-formatted shape),
then adds one test that drives an HTML-like response and asserts the
legacy-fallback shape (primary aligned to file's first model, models[0]
untouched).

10/10 reconciler tests pass.

Signed-off-by: Charan Jagwani <cjagwani@nvidia.com>

@cjagwani cjagwani left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving. Pushed one follow-up test commit (7f6ea8c) to pin the legacy-fallback shape when the gateway probe emits malformed JSON.

The case is already handled correctly today (json.loadsSystemExit(0) → empty gateway_model → legacy in-file reconcile), but only via implementation detail of the parser. A future refactor that swaps how stdout is parsed or how the exit is handled could silently degrade malformed-output to "do nothing" with no test catching it. The new test drives an HTML-like response through the existing harness (extended with a gatewayRawOutput option) and asserts the legacy-fallback config shape.

10/10 reconciler tests pass.

Heads-up not blocking this PR:

  • command -v openshell at the call site is host-namespaced; if a future packaging change ever ships a shim openshell without inference get --json, the current absorb-via-SystemExit(0) path covers it but worth a code comment.
  • Sync'd-but-non-canonical id: "inference/foo" values pass the equality check without rewriting (write path always writes bare). Cosmetic — config stays non-canonical until next true drift.

Scope check vs #3175: PR is intentionally a partial fix per its body and @wscurran's note — closes the "agent self-report stays old after restart" symptom but not the runtime v3→v4 silent revert that lives in the openshell-gateway layer. Issue should stay open.

@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

E2E Scenario Advisor Recommendation

Required scenario E2E: ubuntu-repo-cloud-openclaw-double-same-provider, ubuntu-repo-cloud-openclaw-double-provider-switch
Optional scenario E2E: ubuntu-repo-cloud-openclaw

Dispatch required scenario E2E:

  • gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw-double-same-provider
  • gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw-double-provider-switch

Workflow run

Full scenario advisor summary

E2E Scenario Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required scenario E2E

  • ubuntu-repo-cloud-openclaw-double-same-provider: scripts/nemoclaw-start.sh changes OpenClaw sandbox startup reconciliation after inference route/model changes. This scenario exercises the same-provider double switch lifecycle on the standard Ubuntu repo-current OpenClaw path.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw-double-same-provider
  • ubuntu-repo-cloud-openclaw-double-provider-switch: The changed reconciliation logic explicitly handles gateway/provider routing drift after inference provider changes. This scenario is the smallest routed scenario that exercises the OpenClaw double-provider-switch lifecycle.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw-double-provider-switch

Optional scenario E2E

  • ubuntu-repo-cloud-openclaw: Adjacent baseline Ubuntu OpenClaw scenario can provide broad startup/inference regression coverage for the modified sandbox start script, but it does not specifically exercise the route-switch lifecycle targeted by this PR.
    • Dispatch: gh workflow run e2e-scenarios.yaml --ref <pr-head-ref> --field scenarios=ubuntu-repo-cloud-openclaw

Relevant changed files

  • scripts/nemoclaw-start.sh

@github-actions

github-actions Bot commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

PR Review Advisor

Findings: 0 needs attention, 1 worth checking, 0 nice ideas
Top item: PR review advisor unavailable

Review findings

🛠️ Needs attention

  • None.

🔎 Worth checking

  • PR review advisor unavailable: The automated advisor could not complete: Could not parse JSON from PR review advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/pr-review-advisor/pr-review-advisor-raw-output.txt
    • Recommendation: Re-run the PR Review Advisor or perform a manual review.
    • Evidence: Could not parse JSON from PR review advisor output; see /home/runner/work/NemoClaw/NemoClaw/artifacts/pr-review-advisor/pr-review-advisor-raw-output.txt

🌱 Nice ideas

  • None.
Consider writing more tests for
  • **Runtime validation** — Add or identify targeted runtime/integration validation for the changed behavior; do not report external E2E job pass/fail here.. Runtime/sandbox/infrastructure paths need behavioral runtime validation: scripts/nemoclaw-start.sh.

Workflow run details

This is an automated advisory review. A human maintainer must make the final merge decision.

@cv cv added v0.0.59 Release target and removed v0.0.58 Release target labels Jun 4, 2026
@cv cv merged commit 30ae331 into main Jun 4, 2026
30 checks passed
@cv cv deleted the worktree-fix-3175-reconciler-uses-live-gateway branch June 4, 2026 20:21
cv pushed a commit that referenced this pull request Jun 5, 2026
## Summary
- Add the v0.0.59 release notes from the GitHub announcement discussion.
- Refresh local inference and credential-storage guidance for the
current release behavior.
- Regenerate the user skills from the updated Fern docs.
- Tighten release-prep and docs review guidance for generated skills, PR
labels, and shared `$$nemoclaw` command placeholders.

## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `rg "permissive mode|shields down|shields up|shields status|config
rotate-token|rotate-token" --glob '*.{md,mdx}'`
- `git diff --check`
- `npm run docs` (rerun outside sandbox after sandbox-only `tsx` IPC
permission failure)
- `npm run typecheck:cli`
- Pre-commit hooks during commit passed, including markdownlint,
docs-to-skills verification, gitleaks, commitlint, and skills YAML
tests.

## Source Summary
- #3679, #4437, #4681, #4766, #4772, #4775, #4786 ->
`docs/about/release-notes.mdx`, `docs/reference/commands.mdx`,
`docs/reference/troubleshooting.mdx`: Summarize OpenClaw 2026.5.27
compatibility, runtime path pinning, plugin registry recovery, live
gateway reconciliation, and clearer host-alias/startup diagnostics.
- #4332, #4402, #4769, #4776, #4779 -> `docs/about/release-notes.mdx`,
`docs/inference/inference-options.mdx`,
`docs/inference/use-local-inference.mdx`,
`docs/inference/switch-inference-providers.mdx`: Document the release
inference changes covering Local NIM waits, Hermes Anthropic routing,
Nemotron 3 Ultra, the current Ollama starter fallback, and Spark
managed-vLLM context length.
- #4628, #4652, #4733, #4745 -> `docs/about/release-notes.mdx`,
`docs/security/credential-storage.mdx`,
`docs/manage-sandboxes/messaging-channels.mdx`,
`docs/reference/troubleshooting.mdx`: Capture permission healing,
gateway-stored credential reuse, cross-sandbox messaging credential
conflict checks, and CDI preflight diagnostics.
- #4728, #4737, #4743, #4744, #4782 -> `.agents/skills/nemoclaw-user-*`:
Regenerate the user skill references from the updated source docs.
- Follow-up maintenance ->
`.agents/skills/nemoclaw-contributor-update-docs/SKILL.md`,
`.coderabbit.yaml`: Add release-prep area labels for docs and skills
PRs, and teach docs review guidance that `$$nemoclaw` is the correct
shared command placeholder for examples that work across agent aliases.

Note: the `documentation` label was not present in the repository, so
this PR is labeled with `v0.0.59` only.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Documentation**
  * Updated default model for local Ollama inference setup to qwen3.5:9b
  * Added Nemotron 3 Ultra 550B as an NVIDIA Endpoints model option
* Clarified credential storage and reuse behavior for post-deployment
(day-two) operations
* Added v0.0.59 release notes covering OpenClaw compatibility, inference
options, Hermes messaging sync, and troubleshooting
* Clarified CLI selection guidance and updated OpenClaw version example
in status output
* Revised release-prep instructions and docs review guidance for CLI
alias usage
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
@wscurran wscurran removed the feature PR adds or expands user-visible functionality label Jun 9, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: inference Inference routing, serving, model selection, or outputs area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression v0.0.59 Release target

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Inference] openshell inference set changes gateway model but sandbox agent still reports old model

4 participants