
fix(inference): force completions API for providers without /v1/responses#5241

Merged: cv merged 2 commits into NVIDIA:main from latenighthackathon:fix/inference-set-stale-responses-api on Jun 11, 2026


@latenighthackathon (Contributor) commented Jun 11, 2026

Summary

Switching a sandbox to nvidia-prod / nvidia-nim / gemini-api from a different provider could configure api: "openai-responses", causing every request to 404 because those providers do not expose /v1/responses. The switch path runs no validation probe, so nothing flagged the bad configuration until the first turn failed.
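To see why the failure shows up as a 404 on the first request rather than as an error at switch time, here is a minimal sketch, assuming the client derives its request path from the configured api value. The helpers `requestPathFor` and `simulatedStatus` are hypothetical and the status codes are simulated; only the two OpenAI-style paths `/v1/responses` and `/v1/chat/completions` are standard.

```typescript
// Hypothetical sketch: how the configured api value selects the request path.
// Helper names are illustrative; nothing here calls a real endpoint.
type OpenAiStyleApi = "openai-responses" | "openai-completions";

function requestPathFor(api: OpenAiStyleApi): string {
  return api === "openai-responses" ? "/v1/responses" : "/v1/chat/completions";
}

// Providers named in this PR as not exposing /v1/responses.
const NO_RESPONSES_PROVIDERS: ReadonlySet<string> = new Set([
  "nvidia-prod",
  "nvidia-nim",
  "gemini-api",
]);

// Simulated status: a provider without /v1/responses answers 404 on that path.
function simulatedStatus(provider: string, api: OpenAiStyleApi): number {
  const path = requestPathFor(api);
  if (path === "/v1/responses" && NO_RESPONSES_PROVIDERS.has(provider)) {
    return 404;
  }
  return 200;
}

console.log(simulatedStatus("nvidia-nim", "openai-responses"));   // 404
console.log(simulatedStatus("nvidia-nim", "openai-completions")); // 200
```

Nothing in the switch path exercises the request path, which is why the mismatch only surfaces on the first turn.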

On a provider switch, resolveRuntimeInferenceApi returns null (its session and config branches are gated on currentProvider === provider), so runInferenceSet falls back to getPreferredInferenceApi(config), which reads the api of the persisted shared "inference" provider. If a prior compatible-endpoint validated as openai-responses, that value is still recorded there and gets carried into the switch. getSandboxInferenceConfig did not reset inferenceApi for these providers (unlike anthropic-prod, which forces anthropic-messages), so buildProviderConfig wrote a Responses API configuration that the endpoint does not expose.
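The fallback chain described above can be modeled in a few lines. This is a sketch under stated assumptions: the function names follow the PR text, but their signatures, the `PersistedState` shape, and `resolveApiBeforeFix` are hypothetical stand-ins, not the project's real code (the real getPreferredInferenceApi takes a config object).

```typescript
// Hypothetical model of the pre-fix api resolution in runInferenceSet.
// Signatures and the state shape are assumptions for illustration.
type InferenceApi = "openai-responses" | "openai-completions" | "anthropic-messages";

interface PersistedState {
  currentProvider: string;                 // provider the sandbox is using now
  runtimeApi: InferenceApi | null;         // api recorded for the current session/config
  sharedInferenceApi: InferenceApi | null; // api on the persisted shared "inference" provider
}

// Session and config branches only apply when the provider is unchanged.
function resolveRuntimeInferenceApi(state: PersistedState, provider: string): InferenceApi | null {
  return state.currentProvider === provider ? state.runtimeApi : null;
}

// Falls back to whatever api the shared "inference" provider last recorded.
function getPreferredInferenceApi(state: PersistedState): InferenceApi | null {
  return state.sharedInferenceApi;
}

// Pre-fix behavior: on a switch, the shared value wins regardless of its origin.
function resolveApiBeforeFix(state: PersistedState, provider: string): InferenceApi | null {
  return resolveRuntimeInferenceApi(state, provider) ?? getPreferredInferenceApi(state);
}

// A compatible-endpoint previously validated as openai-responses...
const staleState: PersistedState = {
  currentProvider: "compatible-endpoint",
  runtimeApi: "openai-responses",
  sharedInferenceApi: "openai-responses",
};

// ...leaks that api into a switch to nvidia-prod.
console.log(resolveApiBeforeFix(staleState, "nvidia-prod")); // openai-responses
```

Because the runtime branch returns null on any provider change, the only value left is the one recorded for the previous provider, which is why the fix has to happen downstream in getSandboxInferenceConfig.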

Related Issue

Closes #5239.

Changes

  • src/lib/inference/config.ts: in getSandboxInferenceConfig, force inferenceApi = "openai-completions" when shouldSkipResponsesProbe(provider) is true (the canonical no-/responses set: nvidia-prod, nvidia-nim, gemini-api), mirroring how anthropic-prod forces anthropic-messages.
  • test/inference-config-responses-api.test.ts: cover that the three no-responses providers force completions even with a stale openai-responses carryover, that a compatible-endpoint still honors openai-responses, and that anthropic-prod stays on anthropic-messages.
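A minimal sketch of where the guard sits relative to the provider switch, assuming a simplified signature. The `carriedApi` parameter, the `SandboxInferenceConfig` shape, and the empty `inferenceCompat` placeholder are hypothetical; only the guard condition, the three skip-listed providers, and the anthropic-prod override come from this PR.

```typescript
// Hypothetical sketch of the guard in getSandboxInferenceConfig.
// The real signature and return shape are not shown in the PR.
type InferenceApi = "openai-responses" | "openai-completions" | "anthropic-messages";

// Membership as stated in the PR: the canonical no-/responses set.
function shouldSkipResponsesProbe(provider: string): boolean {
  return provider === "nvidia-prod" || provider === "nvidia-nim" || provider === "gemini-api";
}

interface SandboxInferenceConfig {
  provider: string;
  inferenceApi: InferenceApi;
  inferenceCompat: Record<string, boolean> | undefined; // placeholder shape
}

function getSandboxInferenceConfig(
  provider: string,
  carriedApi: InferenceApi = "openai-completions",
): SandboxInferenceConfig {
  let inferenceApi: InferenceApi = carriedApi;
  let inferenceCompat: Record<string, boolean> | undefined = undefined;

  // The fix: discard a carried-over api for providers with no /v1/responses.
  if (shouldSkipResponsesProbe(provider)) {
    inferenceApi = "openai-completions";
  }

  switch (provider) {
    case "anthropic-prod":
      // Existing behavior: Anthropic always uses its Messages API.
      inferenceApi = "anthropic-messages";
      break;
    case "nvidia-prod":
    case "nvidia-nim":
    case "gemini-api":
      // These cases only set compat options; they must not touch inferenceApi.
      inferenceCompat = {};
      break;
    default:
      // compatible-endpoint and others keep the carried-over api.
      break;
  }

  return { provider, inferenceApi, inferenceCompat };
}

console.log(getSandboxInferenceConfig("gemini-api", "openai-responses").inferenceApi);
// openai-completions
```

Placing the guard before the switch mirrors the PR: anthropic-prod is not skip-listed, so its override is unaffected, and compatible-endpoint falls through to the default case with its validated api intact.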

Type of Change

  • Bug fix (non-breaking change which fixes an issue)

Verification

  • npx vitest run --project cli test/inference-config-responses-api.test.ts test/onboard-model-router.test.ts test/nemotron-inference-fix.test.ts — 15/15 pass
  • npm run typecheck:cli and npm run build:cli — clean

Signed-off-by: latenighthackathon latenighthackathon@users.noreply.github.com

Summary by CodeRabbit

  • Bug Fixes

    • Providers without a /v1/responses endpoint (nvidia-prod, nvidia-nim, gemini-api) now always use the openai-completions API, so a stale openai-responses setting carried over from another provider no longer causes every request to 404.
  • Tests

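    • Added tests for inference API selection: nvidia-prod, nvidia-nim, and gemini-api force openai-completions over a stale openai-responses value, compatible-endpoint keeps openai-responses, and anthropic-prod stays on anthropic-messages.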


Switching a sandbox to nvidia-prod/nvidia-nim/gemini-api from a different
provider could configure api: "openai-responses" and 404 every request.

On a provider switch, resolveRuntimeInferenceApi returns null (its session and
config branches are gated on currentProvider === provider), so runInferenceSet
falls back to the persisted shared "inference" provider api, which may carry a
prior provider's "openai-responses" over. getSandboxInferenceConfig did not
reset it for these providers (unlike anthropic-prod, which forces
anthropic-messages), so buildProviderConfig wrote a Responses API the endpoint
does not expose. The switch path runs no validation probe, so it failed
silently on the first turn.

Force inferenceApi to "openai-completions" when shouldSkipResponsesProbe(provider)
is true, i.e. for the canonical no-/responses provider set.

Closes NVIDIA#5239

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>
@coderabbitai (Bot) commented Jun 11, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d341f5d4-d615-4639-b474-f0466084c9eb

📥 Commits

Reviewing files that changed from the base of the PR and between be0ed7b and 57d1760.

📒 Files selected for processing (2)
  • src/lib/inference/config.ts
  • test/inference-config-responses-api.test.ts

📝 Walkthrough


This PR fixes a bug where switching to an inference provider that lacks a /v1/responses endpoint (NVIDIA Build, NIM, Gemini API) inherited a stale openai-responses API configuration from the previously active provider, causing every request to 404. The fix forces openai-completions for those providers in the configuration function and adds targeted test coverage.

Changes

Responses API Provider Guard

  • Configuration override for unsupported providers (src/lib/inference/config.ts): shouldSkipResponsesProbe is imported and used inside getSandboxInferenceConfig to conditionally force inferenceApi to "openai-completions" for providers that do not support the /v1/responses endpoint, preventing runtime fallback to a persisted "openai-responses" API across provider switches.
  • Provider-specific API override tests (test/inference-config-responses-api.test.ts): A Vitest suite validates that a stale openai-responses API is overridden to openai-completions for nvidia-prod, nvidia-nim, and gemini-api, while compatible endpoints preserve openai-responses and anthropic-prod consistently maps to anthropic-messages.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 The config once fell to its knees,
When switching providers with careless ease—
Now a simple guard stands true and tall,
Forcing completions for those without all,
No more 404s when the providers call!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required 80.00% threshold. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title 'fix(inference): force completions API for providers without /v1/responses' is directly related to the main change and accurately summarizes the primary objective of the PR.
  • Linked Issues check: ✅ Passed. The PR fully implements the requirement from issue #5239 by modifying getSandboxInferenceConfig to force inferenceApi to 'openai-completions' for providers documented by shouldSkipResponsesProbe (nvidia-prod, nvidia-nim, gemini-api) and includes tests validating this behavior.
  • Out of Scope Changes check: ✅ Passed. All changes are scoped to the stated objectives: modifications to src/lib/inference/config.ts to handle the API forcing logic and new tests in test/inference-config-responses-api.test.ts to validate the fix, with no unrelated alterations.


@prekshivyas (Contributor) left a comment

Reviewed (code plus a 9-category security pass), verifying the logic against the source. The PR forces openai-completions in getSandboxInferenceConfig for providers that lack a /v1/responses endpoint, so a stale openai-responses api can no longer carry over on a provider switch and 404 every turn (#).

Approve — correct, minimal, well-tested.

Verified against source:

  • shouldSkipResponsesProbe (validation.ts:251) is exactly nvidia-prod / nvidia-nim / gemini-api.
  • The guard runs before the switch (provider), and those three providers' switch cases only set inferenceCompat, not inferenceApi — so the forced openai-completions persists. anthropic-prod overrides to anthropic-messages in the switch (and isn't in the skip-list, so the guard correctly doesn't fire), and compatible-endpoint (not skip-listed) still honors a passed openai-responses.
  • The 4 new tests map exactly onto that matrix (stale-responses→completions for the 3; responses honored for compatible-endpoint; anthropic-prod→anthropic-messages).

Security: all 9 pass — pure config-selection change, no secrets/network/injection/auth/crypto surface; forcing the universally-supported completions path for known-no-/responses providers is a safe default.

Nit (non-blocking): the guard relies on those providers' switch cases never assigning inferenceApi. That holds today, but a one-line comment in the nvidia-prod/gemini-api cases (or asserting it in a test) would guard against a future edit silently reintroducing the bug.
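One way to turn the nit into a test is to feed every possible carried-over api into the config function and require openai-completions for each skip-listed provider. This is a minimal sketch, assuming the function under test can be wrapped as `(provider, carriedApi) => api`; `findRegressions`, `fixed`, and `regressed` are hypothetical names, and `regressed` simulates the future edit the reviewer warns about.

```typescript
// Hypothetical regression check: every carried-over api must resolve to
// openai-completions for skip-listed providers. ResolveApi stands in for
// a wrapper around getSandboxInferenceConfig.
type InferenceApi = "openai-responses" | "openai-completions" | "anthropic-messages";
type ResolveApi = (provider: string, carriedApi: InferenceApi) => InferenceApi;

const SKIP_LISTED: string[] = ["nvidia-prod", "nvidia-nim", "gemini-api"];
const ALL_APIS: InferenceApi[] = ["openai-responses", "openai-completions", "anthropic-messages"];

function findRegressions(resolveApi: ResolveApi): string[] {
  const failures: string[] = [];
  for (const provider of SKIP_LISTED) {
    for (const carried of ALL_APIS) {
      const got = resolveApi(provider, carried);
      if (got !== "openai-completions") {
        failures.push(`${provider} with carried ${carried} -> ${got}`);
      }
    }
  }
  return failures;
}

// Guard applied and switch cases leave inferenceApi alone: no failures.
const fixed: ResolveApi = (provider, carried) =>
  SKIP_LISTED.includes(provider) ? "openai-completions" : carried;

// Simulated future edit: the nvidia-prod case reassigns the carried api after the guard.
const regressed: ResolveApi = (provider, carried) =>
  provider === "nvidia-prod" ? carried : fixed(provider, carried);

console.log(findRegressions(fixed).length);     // 0
console.log(findRegressions(regressed).length); // 2
```

Iterating over all carried-over values, not just openai-responses, also catches an edit that reassigns inferenceApi from some other source inside one of the three switch cases.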

@prekshivyas self-assigned this on Jun 11, 2026.
@cv merged commit 1fca488 into NVIDIA:main on Jun 11, 2026 (38 checks passed).
@cv added the v0.0.64 (Release target) label on Jun 12, 2026.
@wscurran added the area: inference, area: routing, bug-fix, and provider: nvidia labels on Jun 12, 2026.
cv pushed a commit that referenced this pull request Jun 12, 2026
## Summary
- Add v0.0.64 release notes from the release announcement and link them
to the relevant deeper docs.
- Document that custom policy presets recorded through `policy-add
--from-file` and `--from-dir` survive snapshot restore and sandbox
recreation.
- Refresh generated NemoClaw user skills from the current source docs.

## Source summary
- #5104 -> `docs/manage-sandboxes/backup-restore.mdx`,
`docs/network-policy/customize-network-policy.mdx`: Documents custom
policy presets preserved through snapshot restore.
- #4955 -> `docs/about/release-notes.mdx`: Adds release-note coverage
for Brave web-search pinning and `BRAVE_API_KEY` placeholder
preservation.
- #5116, #5269 -> `docs/about/release-notes.mdx`: Adds release-note
coverage for Docker-driver gateway health and rootfs guard stability.
- #5241, #5085 -> `docs/about/release-notes.mdx`: Adds release-note
coverage for chat-completions provider selection and Nemotron Ultra 550B
tool-less request compatibility.
- #5268, #5210, #5257 -> `docs/about/release-notes.mdx`: Adds
release-note coverage for messaging render plan refresh, OpenClaw
scope-upgrade approval recovery, and Hermes WhatsApp bridge dependency
setup.
- Current source docs -> `.agents/skills/`: Regenerates user-skill
references so agent-facing guidance matches the source documentation.

## Verification
- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix
nemoclaw-user --doc-platform fern-mdx`
- `npm run docs`
- `npm run build:cli`
- `npm run typecheck:cli`
- Commit/pre-push hooks: markdownlint, gitleaks, docs-to-skills
verification, TypeScript CLI, and skills YAML checks passed.

## Summary by CodeRabbit

* **Documentation**
  * Clarified that sandbox snapshot restore preserves custom policy presets
    and restores them without the original files.
  * Switched sandbox setup and remote deployment guidance to Docker-based
    workflows and emphasized the remote onboarding flow.
  * Expanded troubleshooting for gateway recovery, Docker GPU/WSL issues,
    and onboarding resume.
  * Added/updated CLI docs: advanced maintenance, session export,
    upload/download wrappers, and status recovery guidance.
  * Added v0.0.64 release notes and links to NemoClaw Community; fixed
    command reference formatting.

Labels

  • area: inference (Inference routing, serving, model selection, or outputs)
  • area: routing (Request routing, policy routing, model selection, or fallback logic)
  • bug-fix (PR fixes a bug or regression)
  • provider: nvidia (NVIDIA inference endpoint, NIM, or NVIDIA provider behavior)
  • v0.0.64 (Release target)


Development

Successfully merging this pull request may close these issues.

inference set: switching to nvidia-prod/nvidia-nim/gemini-api inherits a stale openai-responses API and 404s every request

4 participants