
fix(onboard): force chat completions for compatible-endpoint providers #1984

Merged
ericksoa merged 3 commits into main from fix/1932-compatible-endpoint-force-completions on Apr 16, 2026

Conversation

@jyaunches

@jyaunches jyaunches commented Apr 16, 2026

Contributor

Bug

When a user selects "Other OpenAI-compatible endpoint" and points it at Ollama (v0.20+), the onboard wizard probes /v1/responses, finds it working, and selects openai-responses mode. That mode sends the system prompt using the developer role, which many backends silently drop — the model receives no tool definitions and no system prompt, and all tool use fails silently.
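
To make the failure mode concrete, here is a minimal sketch of why a backend that ignores the developer role ends up with no system prompt. The message shape, the prompt text, and the `dropUnsupportedRoles` helper are hypothetical illustrations; they are not the actual request code or any backend's implementation.

```typescript
// Hypothetical message shape, for illustration only.
type Role = "system" | "developer" | "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

const SYSTEM_PROMPT = "You are an agent. Available tools: read_file, run_shell.";

// openai-responses mode (as described above): instructions travel under "developer".
const responsesStyle: Message[] = [
  { role: "developer", content: SYSTEM_PROMPT },
  { role: "user", content: "List the files in /tmp." },
];

// openai-completions mode: the same instructions travel under "system".
const completionsStyle: Message[] = [
  { role: "system", content: SYSTEM_PROMPT },
  { role: "user", content: "List the files in /tmp." },
];

// Assumed backend behavior: roles it does not recognize are dropped silently.
function dropUnsupportedRoles(messages: Message[], supported: Role[]): Message[] {
  return messages.filter((m) => supported.indexOf(m.role) !== -1);
}

const backendRoles: Role[] = ["system", "user", "assistant"];
const seenViaResponses = dropUnsupportedRoles(responsesStyle, backendRoles);
const seenViaCompletions = dropUnsupportedRoles(completionsStyle, backendRoles);

console.log(seenViaResponses.length);   // 1: only the user turn survives
console.log(seenViaCompletions.length); // 2: system prompt and user turn
```

No error is raised in either case, which is why the failure is silent from the user's point of view.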

Fix

Force openai-completions API mode for the compatible-endpoint provider path during onboard, matching the existing behavior of ollama-local and vllm-local. Honor NEMOCLAW_PREFERRED_API=openai-responses as an explicit opt-in for users who know their backend supports it.
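
The decision described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in src/lib/onboard.ts: the helper name `resolveCompatibleEndpointApi`, the `ApiDecision` type, and the exact normalization of the env var are hypothetical.

```typescript
type InferenceApi = "openai-completions" | "openai-responses";

interface ApiDecision {
  api: InferenceApi;
  overridden: boolean; // true when the probe result was replaced
}

// Hypothetical helper; the real logic lives inline in src/lib/onboard.ts.
function resolveCompatibleEndpointApi(
  probedApi: InferenceApi,
  env: Record<string, string | undefined>,
): ApiDecision {
  const requested = (env["NEMOCLAW_PREFERRED_API"] || "").trim().toLowerCase();

  // Explicit opt-in: trust the user and keep whatever the probe selected.
  if (requested === "openai-responses") {
    return { api: probedApi, overridden: false };
  }

  // Default: force chat completions, and report whether that changed anything
  // so the caller can emit the informational log message.
  return { api: "openai-completions", overridden: probedApi !== "openai-completions" };
}

const forced = resolveCompatibleEndpointApi("openai-responses", {});
const optedIn = resolveCompatibleEndpointApi("openai-responses", {
  NEMOCLAW_PREFERRED_API: "openai-responses",
});

console.log(forced.api, forced.overridden);   // openai-completions true
console.log(optedIn.api, optedIn.overridden); // openai-responses false
```

Passing the environment in as a parameter, rather than reading `process.env` directly, keeps the sketch easy to exercise for both the default and the opt-in path.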

Why forcing openai-completions is safe for all compatible endpoints

  • Users pointing at actual OpenAI would use the dedicated openai-api provider, not compatible-endpoint
  • Every local/proxy backend tested (Ollama, vLLM, LiteLLM, NIM) has Responses API compatibility issues — openai-completions works universally with the system role
  • The NEMOCLAW_PREFERRED_API=openai-responses env var override remains available for users who explicitly need the Responses API

Reproduction

Confirmed on DGX Spark (Ollama 0.20.7, nemotron-3-super:120b) using an isolated Docker-in-Docker container:

  1. Ollama's /v1/responses endpoint responds successfully — the wizard probe passes
  2. The wizard would select openai-responses mode for compatible-endpoint
  3. The ollama-local path correctly forces openai-completions, but compatible-endpoint did not
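
The three steps above can be modeled as a small sketch of the pre-fix selection. The probe preference order, the `ProbeResult` shape, and both function names are assumptions made for illustration; only the provider names and API mode names come from this PR.

```typescript
type InferenceApi = "openai-completions" | "openai-responses";
type Provider = "ollama-local" | "vllm-local" | "compatible-endpoint";

interface ProbeResult {
  responsesOk: boolean;       // did /v1/responses answer successfully?
  chatCompletionsOk: boolean; // did /v1/chat/completions answer successfully?
}

// Assumed probe preference: Responses API first, chat completions second.
function pickProbedApi(probe: ProbeResult): InferenceApi | null {
  if (probe.responsesOk) return "openai-responses";
  if (probe.chatCompletionsOk) return "openai-completions";
  return null;
}

// Behavior before this PR, as described in the steps above.
function selectApiBeforeFix(provider: Provider, probed: InferenceApi): InferenceApi {
  if (provider === "ollama-local" || provider === "vllm-local") {
    return "openai-completions"; // local paths already forced completions
  }
  return probed; // compatible-endpoint trusted the probe
}

// Stand-in for Ollama 0.20+, which answers on both routes.
const ollamaProbe: ProbeResult = { responsesOk: true, chatCompletionsOk: true };
const probed = pickProbedApi(ollamaProbe);

if (probed !== null) {
  console.log(selectApiBeforeFix("ollama-local", probed));        // openai-completions
  console.log(selectApiBeforeFix("compatible-endpoint", probed)); // openai-responses
}
```

The same backend therefore ended up in different API modes depending only on which provider entry the user picked in the wizard.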

Changes

  • src/lib/onboard.ts: Override preferredInferenceApi to openai-completions in the compatible-endpoint validation block, with an informational log message when the override fires. Honor NEMOCLAW_PREFERRED_API env var as an explicit opt-in escape hatch.
  • test/onboard-selection.test.ts: Two new tests — one verifying the forced-completions default, one verifying the env var override path.

Test plan

  • npx vitest run --project cli test/onboard-selection.test.ts — 32/32 pass
  • Full CLI test suite — no regressions (pre-existing failures in unrelated test files)
  • DinD verification on DGX Spark: onboard with compatible-endpoint → Ollama confirms openai-completions selected
  • Security code review — clean, no findings

Closes #1932

Summary by CodeRabbit

  • Improvements

    • Onboarding better handles custom OpenAI-compatible endpoints, defaulting to completions when appropriate and honoring an explicit environment override.
    • Emits an informational message when a fallback to completions is enforced.
  • Tests

    • Added end-to-end onboarding tests covering custom endpoint behavior and the environment-override path.

@coderabbitai

coderabbitai Bot commented Apr 16, 2026

Contributor

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 30ebb041-95b5-420f-8f80-509a8354999c

📥 Commits

Reviewing files that changed from the base of the PR and between 14542db and 4584292.

📒 Files selected for processing (2)
  • src/lib/onboard.ts
  • test/onboard-selection.test.ts
✅ Files skipped from review due to trivial changes (2)
  • src/lib/onboard.ts
  • test/onboard-selection.test.ts

📝 Walkthrough

Walkthrough

When validating custom OpenAI‑compatible endpoints, onboarding reads NEMOCLAW_PREFERRED_API and, unless that env var explicitly opts in to the Responses API (NEMOCLAW_PREFERRED_API=openai-responses), forces preferredInferenceApi = "openai-completions". When forcing replaces a different API reported by the endpoint, an informational log message is emitted.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Onboard Logic: src/lib/onboard.ts | Update setupNim validation for selected.key === "custom": read and normalize NEMOCLAW_PREFERRED_API; preserve validation.api only when the env var explicitly opts in with openai-responses; otherwise set preferredInferenceApi = "openai-completions" and log an informational message when validation.api differed. |
| Onboard Tests: test/onboard-selection.test.ts | Add two end‑to‑end onboarding tests exercising /v1/responses probes: one verifies the forced openai-completions outcome for a tool-call style response; the other sets NEMOCLAW_PREFERRED_API=openai-responses and verifies that the forced-completions override is bypassed. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant CLI as "CLI (nemoclaw)"
  participant Wizard as "Onboard Wizard\n(src/lib/onboard.ts)"
  participant Endpoint as "Compatible Endpoint\n(/v1/responses)"
  participant Env as "Env / Config\n(NEMOCLAW_PREFERRED_API)"
  CLI->>Wizard: start onboard
  Wizard->>Endpoint: probe /v1/responses (validation)
  Endpoint-->>Wizard: returns valid response (validation.api)
  Wizard->>Env: read NEMOCLAW_PREFERRED_API
  alt env is "openai-responses" (explicit opt-in)
    Wizard->>Wizard: preserve validation.api as preferredInferenceApi
    Wizard-->>CLI: proceed with validation.api
  else env unset or other value
    Wizard->>Wizard: set preferredInferenceApi = "openai-completions"
    alt validation.api != "openai-completions"
      Wizard-->>CLI: log informational message about forcing completions
    end
  end
  Wizard-->>CLI: output preferredInferenceApi in wizard output

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I hopped through endpoints, nose to the breeze,
Found hidden roles dropped by strange APIs.
I nudged prefs to completions, tidy and bright,
Patted logs with a whisper, set things right. 🐇✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: forcing chat completions for compatible-endpoint providers during onboarding. |
| Linked Issues check | ✅ Passed | The PR successfully implements objective 1 from issue #1932: force openai-completions for compatible-endpoint during onboard, plus adds NEMOCLAW_PREFERRED_API override and comprehensive tests. |
| Out of Scope Changes check | ✅ Passed | All changes directly address the issue: onboard.ts modifies compatible-endpoint validation logic; test file adds verification tests. No unrelated changes detected. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai Bot left a comment

Contributor


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/lib/onboard.ts`:
- Around line 3343-3351: The comment claims users can override the chosen API
with NEMOCLAW_PREFERRED_API but the code unconditionally sets
preferredInferenceApi to "openai-completions"; update the assignment to honor
the env var: check process.env.NEMOCLAW_PREFERRED_API and, if present, use that
value for preferredInferenceApi (e.g., "openai-responses"), otherwise fall back
to "openai-completions"; keep the existing validation.api check and log message
around it (symbols: preferredInferenceApi, validation.api,
NEMOCLAW_PREFERRED_API).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: e4f38e50-d1b1-48b0-b0e7-03c7a4568fa1

📥 Commits

Reviewing files that changed from the base of the PR and between f7a3c33 and 03ff649.

📒 Files selected for processing (2)
  • src/lib/onboard.ts
  • test/onboard-selection.test.ts

Comment thread src/lib/onboard.ts Outdated
The compatible-endpoint code path accepted whatever API mode the
validation probe returned. When backends like Ollama (v0.20+) expose a
working /v1/responses endpoint, the wizard selected openai-responses
mode. The Responses API sends the system prompt using the `developer`
role, which many OpenAI-compatible backends (Ollama, vLLM, LiteLLM)
either silently drop or handle incorrectly — causing the model to
receive no tool definitions and no system prompt.

The ollama-local and vllm-local paths already forced openai-completions.
This commit applies the same override to compatible-endpoint.

Forcing openai-completions is safe for compatible-endpoint because:
- Users pointing at actual OpenAI would use the dedicated openai-api
  provider, not compatible-endpoint
- Every local/proxy backend tested (Ollama, vLLM, LiteLLM, NIM) has
  Responses API issues — openai-completions works universally
- The NEMOCLAW_PREFERRED_API=openai-responses env var override remains
  available for users who explicitly need the Responses API

Closes #1932
…dpoint

Address CodeRabbit review: the previous commit claimed users could
override via NEMOCLAW_PREFERRED_API=openai-responses but the code
unconditionally forced openai-completions. Now check the env var and
respect an explicit user preference while still defaulting to
openai-completions for safety.
…ible-endpoint

Verifies that setting NEMOCLAW_PREFERRED_API=openai-responses bypasses
the forced-completions override, proving the escape hatch works for
users who know their backend supports the Responses API.
@ericksoa ericksoa force-pushed the fix/1932-compatible-endpoint-force-completions branch from f868bdd to 4584292 Compare April 16, 2026 22:00

@ericksoa ericksoa left a comment

Contributor


Clean fix — forces chat completions for compatible-endpoint providers where the Responses API developer role is unreliable, with a sensible env var escape hatch. Good test coverage for both paths. LGTM.

@ericksoa ericksoa merged commit 002ae01 into main Apr 16, 2026
11 checks passed
miyoungc added a commit that referenced this pull request Apr 17, 2026
Refresh user-facing docs against the 34 commits merged between v0.0.17
and v0.0.18. Highlights:

- Replace the Ollama 0.0.0.0 binding guidance with the new authenticated
  reverse proxy on 127.0.0.1:11435 (#1922).
- Document the compatible-endpoint provider defaulting to
  /v1/chat/completions and the NEMOCLAW_PREFERRED_API=openai-responses
  opt-in (#1984).
- Add the new nemoclaw upgrade-sandboxes command with --check, --auto,
  and --yes flags (#1943).
- Note the cross-sandbox messaging overlap warning and 409 detection in
  nemoclaw <name> status (#1953).
- Document the messaging-token rotation auto-rebuild flow (#1967).
- Cover new troubleshooting entries for the Ollama auth proxy, IPv6
  localhost resolution, orphan SSH port-forward cleanup on re-onboard,
  and rotated messaging credentials (#1978, #1950).
- Note tar failure exit code for nemoclaw debug --output (#1770) and the
  orphaned openshell process cleanup in nemoclaw uninstall (#1940).

Also:

- Extend docs/.docs-skip to exclude the experimental sandbox-mgmt
  shields and config commands (#1976).
- Fix a sphinx-autobuild infinite rebuild loop in docs/conf.py by
  writing docs/project.json only when its contents change.
- Bump the docs version switcher preferred entry to 0.0.18.
- Regenerate nemoclaw-user-* agent skills from docs/.

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>
Made-with: Cursor
miyoungc added a commit that referenced this pull request Apr 17, 2026
## Summary

Refresh user-facing documentation against the 34 commits merged between
v0.0.17 and v0.0.18, bump the docs version switcher to v0.0.18, and fix a
`sphinx-autobuild` infinite-rebuild loop triggered by `docs/conf.py`.

## Changes

- **Ollama authenticated reverse proxy** (#1922): Replace the
  `0.0.0.0:11434` guidance in `docs/inference/use-local-inference.md` with
  the new token-gated proxy on `127.0.0.1:11435`, including persisted token,
  health-check exemption, and sandbox provider wiring. Replace the matching
  troubleshooting entry in `docs/reference/troubleshooting.md`.
- **Compatible-endpoint default API path** (#1984): Document that the
  compatible-endpoint provider now defaults to `/v1/chat/completions` and
  update `NEMOCLAW_PREFERRED_API` to describe `openai-responses` as the
  opt-in instead of `openai-completions`. Updates in
  `use-local-inference.md`, `switch-inference-providers.md`, and
  `troubleshooting.md`.
- **`nemoclaw upgrade-sandboxes` command** (#1943): Add a new reference
  entry in `docs/reference/commands.md` covering `--check`, `--auto`, and
  `--yes` flags.
- **Messaging token rotation auto-rebuild** (#1967, #1953): Note the
  automatic rebuild behavior and cross-sandbox overlap warning in
  `docs/deployment/set-up-telegram-bridge.md`, `commands.md`, and
  `troubleshooting.md`.
- **Other troubleshooting additions**:
  - `localhost` → `127.0.0.1` IPv6 note (#1978)
  - Orphan SSH port-forward cleanup on re-onboard (#1950)
  - Orphan `openshell` process cleanup in `nemoclaw uninstall` (#1940)
  - Non-zero exit on tar failure in `nemoclaw debug --output` (#1770)
- **Skip list**: Extend `docs/.docs-skip` to exclude the experimental
  sandbox-mgmt shields and config commands feature (#1976), which was
  explicitly merged as not-yet-documented.
- **Build stability**: `docs/conf.py` now writes `docs/project.json` only
  when contents change, so `make docs-live` / `sphinx-autobuild` no longer
  detects its own generated file as a source change and enters an infinite
  rebuild loop.
- **Version switcher**: Bump `docs/versions1.json` and `docs/project.json`
  preferred entry to v0.0.18 so this refresh renders under the new version.
- **Agent skills**: Regenerate `nemoclaw-user-*` skills from `docs/` with
  `scripts/docs-to-skills.py`.

## Type of Change

- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [x] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification

- [x] `npx prek run --all-files` passes (ran via pre-commit hook on staged files)
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [x] `make docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

## AI Disclosure

- [x] AI-assisted — tool: Cursor

---

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

Made with [Cursor](https://cursor.com)


## Summary by CodeRabbit

## Release Notes

* **New Features**
  * Added `nemoclaw upgrade-sandboxes` command to rebuild sandboxes when
    base-image digests change.
  * Introduced authenticated reverse proxy for local Ollama inference with
    token-based access control.
  * Automatic sandbox backup, recreation, and restore when messaging
    credentials are updated.
  * Cross-sandbox messaging token overlap detection with status warnings.

* **Improvements**
  * Compatible-endpoint provider now defaults to `/v1/chat/completions`
    API path.
  * Enhanced troubleshooting documentation with new diagnostics sections.

* **Documentation**
  * Updated onboarding and configuration guides.
  * Expanded version documentation to 0.0.18.


Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>
@wscurran wscurran added the bug-fix PR fixes a bug or regression label Jun 8, 2026
@jyaunches jyaunches deleted the fix/1932-compatible-endpoint-force-completions branch June 12, 2026 13:52

Labels

bug-fix PR fixes a bug or regression

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Responses API + Ollama: developer role silently drops entire system prompt, breaking all tool use

3 participants