fix(connect): stop silent gateway inference revert#4611
Conversation
|
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually. Contributors can view more details about this message here. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds route reconciliation types and planner, makes runInferenceSet update the registry atomically while treating in-sandbox config sync as best-effort (exposing inSandboxConfigSynced), and refactors connect to use the planner and emit louder divergence warnings. ChangesGateway inference state reconciliation and robustness
Sequence Diagram(s)sequenceDiagram
participant User
participant InferenceSet as inference set
participant OpenShell
participant Registry as Registry/Sandbox
participant SandboxSide as Sandbox Config
User->>InferenceSet: set provider/model
InferenceSet->>OpenShell: update gateway inference route
OpenShell-->>InferenceSet: success
InferenceSet->>Registry: updateSandbox (provider/model)
Registry-->>InferenceSet: updated or error
alt registry update fails
InferenceSet-->>User: InferenceSetError
else registry update succeeds
InferenceSet->>SandboxSide: writeSandboxConfig (try)
alt config write succeeds
SandboxSide-->>InferenceSet: ok
InferenceSet->>SandboxSide: recomputeSandboxConfigHash (try)
alt hash ok
InferenceSet-->>User: log "inference route synced"
else hash fails
InferenceSet-->>User: log rebuild guidance
end
else config write fails
InferenceSet-->>User: log rebuild guidance
end
InferenceSet-->>User: return InferenceSetResult (provider/model, inSandboxConfigSynced)
end
sequenceDiagram
participant Connect
participant Gateway as OpenShell Gateway
participant Planner as planInferenceRouteReconcile
User->>Connect: connect to sandbox
Connect->>Gateway: fetch live inference state
alt live state exists
Gateway-->>Connect: GatewayInference
Connect->>Planner: plan(live, sandbox-recorded)
else live state is null
Gateway-->>Connect: null
Connect->>Planner: plan(null, sandbox-recorded)
end
alt plan.kind = aligned
Planner-->>Connect: { kind: aligned }
Connect->>User: skip (silent)
else plan.kind = repair
Planner-->>Connect: { kind: repair }
Connect->>Gateway: quiet route setup
else plan.kind = diverged
Planner-->>Connect: { kind: diverged, live, recorded }
Connect-->>User: loud "differs from the recorded route"
Connect-->>User: loud "Aligning the gateway to <provider>/<model>"
Connect->>Gateway: update route
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Auto-dispatched E2E: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
E2E Scenario Advisor RecommendationRequired scenario E2E: Dispatch required scenario E2E:
Full scenario advisor summaryE2E Scenario AdvisorBase: Required scenario E2E
Optional scenario E2E
Relevant changed files
|
PR Review AdvisorFindings: 3 needs attention, 10 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Since last review detailsCurrent findings:
This is an automated advisory review. A human maintainer must make the final merge decision. |
Selective E2E Results — ✅ All requested jobs passedRun: 26750157399
|
|
@hunglp6d can you address the findings in #4611 (comment) please? |
Selective E2E Results — ✅ All requested jobs passedRun: 26766842602
|
|
@cv — one of the advisor's "--no-sync-inference" findings: |
Selective E2E Results — ✅ All requested jobs passedRun: 26806392358
|
Selective E2E Results — ✅ All requested jobs passedRun: 26816819955
|
|
@cv addressed the advisor findings:
PR now is ready |
Selective E2E Results — ✅ All requested jobs passedRun: 26852944551
|
Selective E2E Results — ✅ All requested jobs passedRun: 26854232296
|
## Summary - Add the missing `v0.0.57` release-notes section with links to the detailed docs pages for command, inference, onboarding, messaging, status, installer, and policy changes. - Remove public references to docs-skip terms from source docs and regenerate the NemoClaw user skills from the current Fern MDX docs. - Carry forward generated references for the per-agent documentation split, including Hermes-specific reference files. ## Source summary - #4615 and #4653 -> `docs/about/release-notes.mdx`, `docs/reference/commands.mdx`: Release notes now cover host-side `sessions` and `agents` commands plus `NEMOCLAW_EXTRA_AGENTS_JSON` secondary-agent baking. - #4163, #4204, #4611, #4619, and #4676 -> `docs/about/release-notes.mdx`, `docs/inference/use-local-inference.mdx`: Release notes now cover managed vLLM progress/readiness, DGX Spark model default changes, local Ollama streaming usage, and inference route divergence warnings. - #4267, #4601, #4609, #4642, #4645, and #4661 -> `docs/about/release-notes.mdx`, `docs/reference/commands.mdx`: Release notes now cover UFW auto-remediation, local-inference reachability gates, gateway reuse/binding, cancel rollback, and policy selection persistence. - #4577, #4582, #4607, and #4660 -> `docs/about/release-notes.mdx`, `docs/manage-sandboxes/messaging-channels.mdx`: Release notes now cover Slack validation, atomic `channels add`, WhatsApp QR diagnostics, and Slack placeholder normalization. - #4388, #4600, #4646, and #4647 -> `docs/about/release-notes.mdx`, `docs/reference/commands.mdx`: Release notes now cover status failure layers, paused-container hints, Docker-driver doctor behavior, and non-destructive stale-registry recovery. - #4569, #4579, and #4678 -> `docs/about/release-notes.mdx`, `docs/manage-sandboxes/lifecycle.mdx`, `docs/network-policy/integration-policy-examples.mdx`: Release notes now cover installer tag pinning, PyPI `uv` policy access, and observable Jira validation. - #4632 -> `.agents/skills/`: Regenerated user skills from the current per-agent docs source, including newly generated Hermes reference files. ## Verification - `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx` - `rg "permissive mode|shields down|shields up|shields status|config rotate-token|rotate-token" docs --glob "*.mdx"` - `rg "permissive mode|shields down|shields up|shields status|config rotate-token|rotate-token" .agents/skills --glob "*.md"` - `npm run docs` - `npm run build:cli` - Commit hooks: markdownlint, docs-to-skills verification, gitleaks, skills YAML, commitlint <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Documentation** * Restructured documentation to clearly distinguish OpenClaw and Hermes agent variants throughout user guides. * Enhanced security, credential storage, and deployment guidance with clearer setup flows. * Added Hermes plugin installation and ecosystem documentation. * Improved workspace, messaging, and policy management references with variant-specific command examples. * Refined troubleshooting and CLI reference sections for clarity. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
When the live gateway inference route and a sandbox's recorded route diverge,
connectused to silently reapply the recorded route to the gateway — quietly reverting a user's model change with only a status-style line (#3726). This PR makes that override loud (a warning naming the mismatch and the supported way to change the route) and removes the precondition that made the revert destructive, by persisting the route to the registry before the failure-prone in-sandbox sync ininference set. It intentionally only detects and surfaces the mismatch — it does not try to silently reconcile every layer fromconnect, since some layers cannot be set safely there.Related Issue
Fixes #3726
Related: #3725 — the in-sandbox sync failure that this PR makes non-destructive (no longer leaves the recorded route stale).
Changes
inference set: write the host registry before the in-sandbox config sync, and make that sync best-effort — on failure it warns and points atrebuildinstead of aborting and leaving the gateway/registry split.connect: classify the gateway-vs-recorded route asaligned/repair/divergedvia a new pure helperplanInferenceRouteReconcile(src/lib/inference/config.ts).diverged: print a loud warning (the mismatch + which route is being applied + the supported command), even in--probe-onlyquiet mode, instead of a silent status line. The shared gateway is still re-pointed to the recorded route.inference setregistry-before-sync resilience; updated the connect integration test for the loud output.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesnpm run docsbuilds without warnings (doc changes only)Signed-off-by: Hung Le hple@nvidia.com
Summary by CodeRabbit
New Features
Bug Fixes
Tests