fix(inference): force completions API for providers without /v1/responses #5241
…nses

Switching a sandbox to nvidia-prod/nvidia-nim/gemini-api from a different provider could configure api: "openai-responses" and 404 every request.

On a provider switch resolveRuntimeInferenceApi returns null (its session and config branches are gated on currentProvider === provider), so runInferenceSet falls back to the persisted shared "inference" provider api, which may carry a prior provider's "openai-responses" over. getSandboxInferenceConfig did not reset it for these providers (unlike anthropic-prod, which forces anthropic-messages), so buildProviderConfig wrote a Responses API the endpoint does not expose. The switch path runs no validation probe, so it failed silently on the first turn.

Force inferenceApi to "openai-completions" when shouldSkipResponsesProbe(provider) is true, the canonical no-/responses set.

Closes NVIDIA#5239

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>
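The carryover described in the commit message can be modeled in a few lines. This is a simplified sketch, not the project's code: `PersistedState`, `resolveRuntimeApi`, and `apiBeforeFix` are hypothetical stand-ins for the real state shape, `resolveRuntimeInferenceApi`, and the fallback inside `runInferenceSet`.

```typescript
// Simplified model of the pre-fix carryover. All names and shapes here are
// illustrative assumptions, not the repository's actual definitions.
type InferenceApi = "openai-completions" | "openai-responses" | "anthropic-messages";

interface PersistedState {
  currentProvider: string;          // provider the sandbox uses before the switch
  sessionApi: InferenceApi | null;  // api validated for currentProvider
  sharedInferenceApi: InferenceApi; // persisted shared "inference" provider api
}

// Stand-in for resolveRuntimeInferenceApi: it only answers when the requested
// provider is the one already in use, so a switch always yields null.
function resolveRuntimeApi(state: PersistedState, provider: string): InferenceApi | null {
  return state.currentProvider === provider ? state.sessionApi : null;
}

// Stand-in for the pre-fix fallback: on null, reuse the persisted shared api.
function apiBeforeFix(state: PersistedState, provider: string): InferenceApi {
  return resolveRuntimeApi(state, provider) ?? state.sharedInferenceApi;
}

// A compatible endpoint was previously validated as openai-responses.
const state: PersistedState = {
  currentProvider: "compatible-endpoint",
  sessionApi: "openai-responses",
  sharedInferenceApi: "openai-responses",
};

// Switching to nvidia-nim inherits the stale value from the shared record.
console.log(apiBeforeFix(state, "nvidia-nim")); // openai-responses
```

In this model nothing between the fallback and the written config knows that the target provider lacks `/v1/responses`, which is the gap the forced `openai-completions` closes.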
No actionable comments were generated in the recent review. 🎉
📝 Walkthrough
This PR fixes a bug where switching to inference providers that lack …
Changes
Responses API Provider Guard
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
prekshivyas
left a comment
Reviewed (code + 9-cat security), verifying the logic against source. Forces openai-completions in getSandboxInferenceConfig for providers that lack a /v1/responses endpoint, so a stale openai-responses api can't carry over on a provider switch and 404 every turn (#).
✅ Approve — correct, minimal, well-tested.
Verified against source:
- `shouldSkipResponsesProbe` (validation.ts:251) is exactly `nvidia-prod`/`nvidia-nim`/`gemini-api`.
- The guard runs before the `switch (provider)`, and those three providers' switch cases only set `inferenceCompat`, not `inferenceApi`, so the forced `openai-completions` persists. `anthropic-prod` overrides to `anthropic-messages` in the switch (and isn't in the skip-list, so the guard correctly doesn't fire), and `compatible-endpoint` (not skip-listed) still honors a passed `openai-responses`.
- The 4 new tests map exactly onto that matrix (stale-responses→completions for the 3; responses honored for compatible-endpoint; anthropic-prod→anthropic-messages).
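The ordering verified above (guard first, then a switch whose skip-listed cases leave `inferenceApi` alone) can be sketched as follows. This is an approximation under stated assumptions: `sketchSandboxInferenceConfig`, the `SandboxInferenceConfig` shape, and the placeholder `inferenceCompat` value are hypothetical, and the local `shouldSkipResponsesProbe` is a stand-in for the real one in validation.ts.

```typescript
type InferenceApi = "openai-completions" | "openai-responses" | "anthropic-messages";

// The three providers the review lists for shouldSkipResponsesProbe.
const NO_RESPONSES_PROVIDERS: readonly string[] = ["nvidia-prod", "nvidia-nim", "gemini-api"];

// Local stand-in for the real predicate.
function shouldSkipResponsesProbe(provider: string): boolean {
  return NO_RESPONSES_PROVIDERS.indexOf(provider) !== -1;
}

// Hypothetical result shape; the real config carries more fields.
interface SandboxInferenceConfig {
  inferenceApi: InferenceApi;
  inferenceCompat: Record<string, boolean> | null;
}

function sketchSandboxInferenceConfig(
  provider: string,
  preferredApi: InferenceApi,
): SandboxInferenceConfig {
  let inferenceApi: InferenceApi = preferredApi;
  let inferenceCompat: Record<string, boolean> | null = null;

  // Guard runs before the switch: discard any stale api for no-/responses providers.
  if (shouldSkipResponsesProbe(provider)) {
    inferenceApi = "openai-completions";
  }

  switch (provider) {
    case "nvidia-prod":
    case "nvidia-nim":
    case "gemini-api":
      // These cases touch only inferenceCompat (placeholder value), never inferenceApi.
      inferenceCompat = { placeholder: true };
      break;
    case "anthropic-prod":
      // Not skip-listed; the switch itself forces the Anthropic api.
      inferenceApi = "anthropic-messages";
      break;
    default:
      // For example compatible-endpoint: the passed api is honored as-is.
      break;
  }

  return { inferenceApi, inferenceCompat };
}

console.log(sketchSandboxInferenceConfig("gemini-api", "openai-responses").inferenceApi); // openai-completions
console.log(sketchSandboxInferenceConfig("compatible-endpoint", "openai-responses").inferenceApi); // openai-responses
```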
Security: all 9 pass — pure config-selection change, no secrets/network/injection/auth/crypto surface; forcing the universally-supported completions path for known-no-/responses providers is a safe default.
Nit (non-blocking): the guard relies on those providers' switch cases never assigning inferenceApi. That holds today, but a one-line comment in the nvidia-prod/gemini-api cases (or asserting it in a test) would guard against a future edit silently reintroducing the bug.
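One way to act on the nit is an invariant check that loops over the skip-list and would flag any future edit that reassigns the api. The sketch below is illustrative only: `current` and `regressed` are hypothetical stand-ins (the real test would call `getSandboxInferenceConfig` instead), and `regressed` simulates an imagined edit to the `gemini-api` case.

```typescript
type InferenceApi = "openai-completions" | "openai-responses" | "anthropic-messages";
type ConfigFn = (provider: string, preferredApi: InferenceApi) => InferenceApi;

const SKIP_LISTED: readonly string[] = ["nvidia-prod", "nvidia-nim", "gemini-api"];

// Stand-in for the behavior after this PR.
const current: ConfigFn = (provider, preferredApi) => {
  if (SKIP_LISTED.indexOf(provider) !== -1) return "openai-completions";
  if (provider === "anthropic-prod") return "anthropic-messages";
  return preferredApi;
};

// Imagined future edit: the gemini-api case reassigns the api after the guard.
const regressed: ConfigFn = (provider, preferredApi) => {
  if (provider === "gemini-api") return preferredApi;
  return current(provider, preferredApi);
};

// Returns every skip-listed provider that fails to force completions
// when handed a stale openai-responses value.
function providersBreakingInvariant(fn: ConfigFn): string[] {
  return SKIP_LISTED.filter((p) => fn(p, "openai-responses") !== "openai-completions");
}

console.log(providersBreakingInvariant(current));   // []
console.log(providersBreakingInvariant(regressed)); // [ 'gemini-api' ]
```

Driving the check from the skip-list rather than from hard-coded cases means a provider added to the list later is covered without touching the test.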
## Summary
- Add v0.0.64 release notes from the release announcement and link them to the relevant deeper docs.
- Document that custom policy presets recorded through `policy-add --from-file` and `--from-dir` survive snapshot restore and sandbox recreation.
- Refresh generated NemoClaw user skills from the current source docs.

## Source summary
- #5104 -> `docs/manage-sandboxes/backup-restore.mdx`, `docs/network-policy/customize-network-policy.mdx`: Documents custom policy presets preserved through snapshot restore.
- #4955 -> `docs/about/release-notes.mdx`: Adds release-note coverage for Brave web-search pinning and `BRAVE_API_KEY` placeholder preservation.
- #5116, #5269 -> `docs/about/release-notes.mdx`: Adds release-note coverage for Docker-driver gateway health and rootfs guard stability.
- #5241, #5085 -> `docs/about/release-notes.mdx`: Adds release-note coverage for chat-completions provider selection and Nemotron Ultra 550B tool-less request compatibility.
- #5268, #5210, #5257 -> `docs/about/release-notes.mdx`: Adds release-note coverage for messaging render plan refresh, OpenClaw scope-upgrade approval recovery, and Hermes WhatsApp bridge dependency setup.
- Current source docs -> `.agents/skills/`: Regenerates user-skill references so agent-facing guidance matches the source documentation.

## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx`
- `npm run docs`
- `npm run build:cli`
- `npm run typecheck:cli`
- Commit/pre-push hooks: markdownlint, gitleaks, docs-to-skills verification, TypeScript CLI, and skills YAML checks passed.

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->
## Summary by CodeRabbit
* **Documentation**
  * Clarified sandbox snapshot restore preserves custom policy presets and restores them without original files.
  * Switched sandbox setup and remote deployment guidance to Docker-based workflows and emphasized remote onboarding flow.
  * Expanded troubleshooting for gateway recovery, Docker GPU/WSL issues, and onboarding resume.
  * Added/updated CLI docs: advanced maintenance, session export, upload/download wrappers, and status recovery guidance.
  * Added v0.0.64 release notes and links to NemoClaw Community; fixed command reference formatting.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Summary
Switching a sandbox to `nvidia-prod` / `nvidia-nim` / `gemini-api` from a different provider could configure `api: "openai-responses"` and 404 every request, because those providers do not expose `/v1/responses`. The switch path runs no validation probe, so it failed silently on the first turn.

On a provider switch, `resolveRuntimeInferenceApi` returns `null` (its session and config branches are gated on `currentProvider === provider`), so `runInferenceSet` falls back to `getPreferredInferenceApi(config)`, which reads the persisted shared `"inference"` provider api. A prior compatible-endpoint validated as `openai-responses` is still recorded there and gets carried into the switch. `getSandboxInferenceConfig` did not reset `inferenceApi` for these providers (unlike `anthropic-prod`, which forces `anthropic-messages`), so `buildProviderConfig` wrote a Responses API the endpoint does not expose.

Related Issue
Closes #5239.
Changes
- `src/lib/inference/config.ts`: in `getSandboxInferenceConfig`, force `inferenceApi = "openai-completions"` when `shouldSkipResponsesProbe(provider)` is true (the canonical no-`/responses` set: `nvidia-prod`, `nvidia-nim`, `gemini-api`), mirroring how `anthropic-prod` forces `anthropic-messages`.
- `test/inference-config-responses-api.test.ts`: cover that the three no-responses providers force completions even with a stale `openai-responses` carryover, that a compatible-endpoint still honors `openai-responses`, and that `anthropic-prod` stays on `anthropic-messages`.

Type of Change
Verification
- `npx vitest run --project cli test/inference-config-responses-api.test.ts test/onboard-model-router.test.ts test/nemotron-inference-fix.test.ts`: 15/15 pass
- `npm run typecheck:cli` and `npm run build:cli`: clean

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>