fix(inference): force completions API for providers without /v1/responses by latenighthackathon · Pull Request #5241 · NVIDIA/NemoClaw

latenighthackathon · 2026-06-11T14:55:30Z

Summary

Switching a sandbox to nvidia-prod / nvidia-nim / gemini-api from a different provider could configure api: "openai-responses" and 404 every request, because those providers do not expose /v1/responses. The switch path runs no validation probe, so it failed silently on the first turn.

On a provider switch, resolveRuntimeInferenceApi returns null (its session and config branches are gated on currentProvider === provider), so runInferenceSet falls back to getPreferredInferenceApi(config), which reads the persisted shared "inference" provider api. A prior compatible-endpoint validated as openai-responses is still recorded there and gets carried into the switch. getSandboxInferenceConfig did not reset inferenceApi for these providers (unlike anthropic-prod, which forces anthropic-messages), so buildProviderConfig wrote a Responses API the endpoint does not expose.
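
The carryover described above can be reproduced with a minimal sketch. The function names come from this PR, but the signatures, the `PersistedConfig` shape, and `selectApiBeforeFix` are simplified assumptions for illustration, not the actual NemoClaw code.

```typescript
// Sketch only: hypothetical, reduced shapes. The real functions take more context.
type InferenceApi = "openai-completions" | "openai-responses" | "anthropic-messages";

interface PersistedConfig {
  currentProvider: string;
  // API recorded for the shared "inference" provider entry.
  inferenceApi: InferenceApi | null;
}

// Trusts the recorded API only when the provider is unchanged.
function resolveRuntimeInferenceApi(config: PersistedConfig, provider: string): InferenceApi | null {
  return config.currentProvider === provider ? config.inferenceApi : null;
}

// Reads the persisted API regardless of which provider validated it.
function getPreferredInferenceApi(config: PersistedConfig): InferenceApi | null {
  return config.inferenceApi;
}

// Hypothetical helper modelling the pre-fix selection in runInferenceSet.
function selectApiBeforeFix(config: PersistedConfig, provider: string): InferenceApi | null {
  return resolveRuntimeInferenceApi(config, provider) ?? getPreferredInferenceApi(config);
}

// A compatible-endpoint was validated as openai-responses earlier.
const persisted: PersistedConfig = {
  currentProvider: "compatible-endpoint",
  inferenceApi: "openai-responses",
};

// Switching to nvidia-prod: the runtime lookup yields null, so the stale value wins.
const staleApi = selectApiBeforeFix(persisted, "nvidia-prod");
```

In this sketch `staleApi` ends up as `"openai-responses"` even though the target provider is `nvidia-prod`, which is the value that was then written into the provider config.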

Related Issue

Closes #5239.

Changes

src/lib/inference/config.ts: in getSandboxInferenceConfig, force inferenceApi = "openai-completions" when shouldSkipResponsesProbe(provider) is true (the canonical no-/responses set: nvidia-prod, nvidia-nim, gemini-api), mirroring how anthropic-prod forces anthropic-messages.
test/inference-config-responses-api.test.ts: cover that the three no-responses providers force completions even with a stale openai-responses carryover, that a compatible-endpoint still honors openai-responses, and that anthropic-prod stays on anthropic-messages.
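
A rough sketch of the shape of the guard, assuming a reduced `getSandboxInferenceConfig(provider, preferredApi)` signature and a local stand-in for `shouldSkipResponsesProbe`. The real function takes and returns more fields; only the provider names and the forced API values are taken from this PR.

```typescript
// Sketch only: simplified stand-ins for the helpers under src/lib/inference.
type InferenceApi = "openai-completions" | "openai-responses" | "anthropic-messages";

// Providers that do not expose /v1/responses (the set named in this PR).
const NO_RESPONSES_PROVIDERS: ReadonlySet<string> = new Set([
  "nvidia-prod",
  "nvidia-nim",
  "gemini-api",
]);

// Stand-in for the real shouldSkipResponsesProbe.
function shouldSkipResponsesProbe(provider: string): boolean {
  return NO_RESPONSES_PROVIDERS.has(provider);
}

// Hypothetical reduced signature and return shape.
function getSandboxInferenceConfig(
  provider: string,
  preferredApi: InferenceApi | null,
): { provider: string; inferenceApi: InferenceApi } {
  let inferenceApi: InferenceApi = preferredApi ?? "openai-completions";

  // The guard: discard any carried-over API for providers without /v1/responses.
  if (shouldSkipResponsesProbe(provider)) {
    inferenceApi = "openai-completions";
  }

  switch (provider) {
    case "anthropic-prod":
      // Not in the skip list; forced here instead.
      inferenceApi = "anthropic-messages";
      break;
    default:
      // Other cases leave inferenceApi untouched in this sketch.
      break;
  }

  return { provider, inferenceApi };
}

// A stale openai-responses value passed in for a skip-listed provider is replaced.
const switched = getSandboxInferenceConfig("nvidia-nim", "openai-responses");
```

Placing the guard before the `switch` keeps the per-provider cases free to override it, which is how `anthropic-prod` still ends up on `anthropic-messages`.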

Type of Change

Bug fix (non-breaking change which fixes an issue)

Verification

npx vitest run --project cli test/inference-config-responses-api.test.ts test/onboard-model-router.test.ts test/nemotron-inference-fix.test.ts — 15/15 pass
npm run typecheck:cli and npm run build:cli — clean

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>

Summary by CodeRabbit

Bug Fixes
- Improved AI provider configuration handling to ensure proper endpoint selection across different providers, enhancing stability and consistency in inference API behavior.
Tests
- Added comprehensive test coverage for provider-specific API endpoint configuration validation.

…nses

Switching a sandbox to nvidia-prod/nvidia-nim/gemini-api from a different provider could configure api: "openai-responses" and 404 every request.

On a provider switch resolveRuntimeInferenceApi returns null (its session and config branches are gated on currentProvider === provider), so runInferenceSet falls back to the persisted shared "inference" provider api, which may carry a prior provider's "openai-responses" over. getSandboxInferenceConfig did not reset it for these providers (unlike anthropic-prod, which forces anthropic-messages), so buildProviderConfig wrote a Responses API the endpoint does not expose. The switch path runs no validation probe, so it failed silently on the first turn.

Force inferenceApi to "openai-completions" when shouldSkipResponsesProbe(provider) is true, the canonical no-/responses set.

Closes NVIDIA#5239

Signed-off-by: latenighthackathon <latenighthackathon@users.noreply.github.com>

coderabbitai · 2026-06-11T14:56:06Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: d341f5d4-d615-4639-b474-f0466084c9eb

📥 Commits

Reviewing files that changed from the base of the PR and between be0ed7b and 57d1760.

📒 Files selected for processing (2)

src/lib/inference/config.ts
test/inference-config-responses-api.test.ts

📝 Walkthrough

This PR fixes a bug where switching to inference providers that lack /v1/responses endpoint support (NVIDIA Build, NIM, Gemini API) inherits a stale openai-responses API configuration from a previously-active provider, causing all requests to 404. The fix forces openai-completions for those providers in the configuration function and adds targeted test coverage.

Changes

Responses API Provider Guard

| Layer / File(s) | Summary |
| --- | --- |
| Configuration override for unsupported providers (`src/lib/inference/config.ts`) | `shouldSkipResponsesProbe` is imported and used inside `getSandboxInferenceConfig` to conditionally force `inferenceApi` to `"openai-completions"` for providers that do not support the `/v1/responses` endpoint, preventing runtime fallback to a persisted `"openai-responses"` API across provider switches. |
| Provider-specific API override tests (`test/inference-config-responses-api.test.ts`) | Vitest suite validates that stale `openai-responses` API is overridden to `openai-completions` for `nvidia-prod`, `nvidia-nim`, and `gemini-api`, while compatible endpoints preserve `openai-responses` and `anthropic-prod` consistently maps to `anthropic-messages`. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐰 The config once fell to its knees,
When switching providers with careless ease—
Now a simple guard stands true and tall,
Forcing completions for those without all,
No more 404s when the providers call!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix(inference): force completions API for providers without /v1/responses' is directly related to the main change and accurately summarizes the primary objective of the PR. |
| Linked Issues check | ✅ Passed | The PR fully implements the requirement from issue `#5239` by modifying getSandboxInferenceConfig to force inferenceApi to 'openai-completions' for providers documented by shouldSkipResponsesProbe (nvidia-prod, nvidia-nim, gemini-api) and includes comprehensive tests validating this behavior. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to the stated objectives: modifications to src/lib/inference/config.ts to handle the API forcing logic and new tests in test/inference-config-responses-api.test.ts to validate the fix, with no unrelated alterations present. |

prekshivyas

Reviewed (code + 9-cat security), verifying the logic against source. Forces openai-completions in getSandboxInferenceConfig for providers that lack a /v1/responses endpoint, so a stale openai-responses api can't carry over on a provider switch and 404 every turn (#).

✅ Approve — correct, minimal, well-tested.

Verified against source:

shouldSkipResponsesProbe (validation.ts:251) is exactly nvidia-prod / nvidia-nim / gemini-api.
The guard runs before the switch (provider), and those three providers' switch cases only set inferenceCompat, not inferenceApi — so the forced openai-completions persists. anthropic-prod overrides to anthropic-messages in the switch (and isn't in the skip-list, so the guard correctly doesn't fire), and compatible-endpoint (not skip-listed) still honors a passed openai-responses.
The 4 new tests map exactly onto that matrix (stale-responses→completions for the 3; responses honored for compatible-endpoint; anthropic-prod→anthropic-messages).

Security: all 9 pass — pure config-selection change, no secrets/network/injection/auth/crypto surface; forcing the universally-supported completions path for known-no-/responses providers is a safe default.

Nit (non-blocking): the guard relies on those providers' switch cases never assigning inferenceApi. That holds today, but a one-line comment in the nvidia-prod/gemini-api cases (or asserting it in a test) would guard against a future edit silently reintroducing the bug.
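
One way the test-side variant of this nit might look, as a sketch: a small checker that feeds a stale `openai-responses` value to every skip-listed provider and reports any that do not come back as `openai-completions`. `ConfigFn`, `findSkipListViolations`, and both stub functions are hypothetical names for illustration; in the real suite the checker would be pointed at `getSandboxInferenceConfig` with its actual signature.

```typescript
// Sketch only: ConfigFn is an assumed, reduced signature.
type InferenceApi = "openai-completions" | "openai-responses" | "anthropic-messages";
type ConfigFn = (provider: string, preferredApi: InferenceApi | null) => { inferenceApi: InferenceApi };

// The no-/responses set named in this PR.
const SKIP_LIST = ["nvidia-prod", "nvidia-nim", "gemini-api"] as const;

// Returns every skip-listed provider that still lets a stale openai-responses through.
function findSkipListViolations(getConfig: ConfigFn): string[] {
  return SKIP_LIST.filter(
    (provider) => getConfig(provider, "openai-responses").inferenceApi !== "openai-completions",
  );
}

// Stub with the guard in place.
const guarded: ConfigFn = (provider, preferredApi) => ({
  inferenceApi: SKIP_LIST.some((p) => p === provider)
    ? "openai-completions"
    : preferredApi ?? "openai-completions",
});

// Stub modelling a future edit that drops the guard.
const regressed: ConfigFn = (_provider, preferredApi) => ({
  inferenceApi: preferredApi ?? "openai-completions",
});

const guardedViolations = findSkipListViolations(guarded);
const regressedViolations = findSkipListViolations(regressed);
```

With these stubs, `guardedViolations` is empty and `regressedViolations` lists all three providers, so a test asserting an empty result would fail as soon as a switch case reassigned `inferenceApi` for one of them.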

wscurran · 2026-06-12T14:13:48Z

✨
Related open issues:

#5239 inference set: switching to nvidia-prod/nvidia-nim/gemini-api inherits a stale openai-responses API and 404s every request

## Summary

- Add v0.0.64 release notes from the release announcement and link them to the relevant deeper docs.
- Document that custom policy presets recorded through `policy-add --from-file` and `--from-dir` survive snapshot restore and sandbox recreation.
- Refresh generated NemoClaw user skills from the current source docs.

## Source summary

- #5104 -> `docs/manage-sandboxes/backup-restore.mdx`, `docs/network-policy/customize-network-policy.mdx`: Documents custom policy presets preserved through snapshot restore.
- #4955 -> `docs/about/release-notes.mdx`: Adds release-note coverage for Brave web-search pinning and `BRAVE_API_KEY` placeholder preservation.
- #5116, #5269 -> `docs/about/release-notes.mdx`: Adds release-note coverage for Docker-driver gateway health and rootfs guard stability.
- #5241, #5085 -> `docs/about/release-notes.mdx`: Adds release-note coverage for chat-completions provider selection and Nemotron Ultra 550B tool-less request compatibility.
- #5268, #5210, #5257 -> `docs/about/release-notes.mdx`: Adds release-note coverage for messaging render plan refresh, OpenClaw scope-upgrade approval recovery, and Hermes WhatsApp bridge dependency setup.
- Current source docs -> `.agents/skills/`: Regenerates user-skill references so agent-facing guidance matches the source documentation.

## Verification

- `python3 scripts/docs-to-skills.py docs/ .agents/skills/ --prefix nemoclaw-user --doc-platform fern-mdx`
- `npm run docs`
- `npm run build:cli`
- `npm run typecheck:cli`
- Commit/pre-push hooks: markdownlint, gitleaks, docs-to-skills verification, TypeScript CLI, and skills YAML checks passed.

## Summary by CodeRabbit

* **Documentation**
  * Clarified sandbox snapshot restore preserves custom policy presets and restores them without original files.
  * Switched sandbox setup and remote deployment guidance to Docker-based workflows and emphasized remote onboarding flow.
  * Expanded troubleshooting for gateway recovery, Docker GPU/WSL issues, and onboarding resume.
  * Added/updated CLI docs: advanced maintenance, session export, upload/download wrappers, and status recovery guidance.
  * Added v0.0.64 release notes and links to NemoClaw Community; fixed command reference formatting.

prekshivyas approved these changes Jun 11, 2026

Merge branch 'main' into fix/inference-set-stale-responses-api

6fbb0ee

prekshivyas self-assigned this Jun 11, 2026

cv merged commit 1fca488 into NVIDIA:main Jun 11, 2026
38 checks passed

cv added the v0.0.64 Release target label Jun 12, 2026

miyoungc mentioned this pull request Jun 12, 2026

docs: refresh v0.0.64 release docs #5358

Merged

coderabbitai Bot mentioned this pull request Jun 13, 2026

fix(ci): prefer completions for hosted inference #5395

Merged

12 tasks

cv merged 2 commits into NVIDIA:main from latenighthackathon:fix/inference-set-stale-responses-api