fix(onboard): retry compatible endpoint smoke after reasoning-only output by ericksoa · Pull Request #3514 · NVIDIA/NemoClaw

ericksoa · 2026-05-14T12:55:15Z

Summary

- Raise the compatible-endpoint sandbox smoke response budget and retry when a reasoning model returns `finish_reason: length` with only `reasoning_content`.
- Keep route/config/auth failures hard-failing, but report reasoning-budget failures as model output budget problems rather than `inference.local` route failures.
- Add executable regression coverage for the MiniMax-shaped reasoning-only response and the still-failing retry path.
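To make the retry condition in the summary concrete, here is a minimal sketch of how a single chat-completions response body could be sorted into "ok", "retry", or "fail". It is not the checker shipped in this PR: the function name `classify_smoke_response` and the three labels are hypothetical, and it assumes a retry requires both `finish_reason` equal to `length` and non-empty `reasoning_content`.

```python
from typing import Any


def classify_smoke_response(body: Any) -> str:
    """Classify one chat-completions response body (hypothetical helper).

    "ok"    -> the assistant returned non-blank content
    "retry" -> no content, finish_reason is "length", reasoning_content present
    "fail"  -> anything else (malformed body, error payload, empty answer)
    """
    if not isinstance(body, dict):
        return "fail"
    choices = body.get("choices")
    if not isinstance(choices, list) or not choices or not isinstance(choices[0], dict):
        return "fail"
    first = choices[0]

    message = first.get("message")
    if not isinstance(message, dict):
        message = {}

    content = message.get("content")
    if isinstance(content, str) and content.strip():
        return "ok"

    # Assumption: only a reasoning-only, budget-exhausted response is retried.
    reasoning = message.get("reasoning_content")
    has_reasoning = isinstance(reasoning, str) and bool(reasoning.strip())
    if first.get("finish_reason") == "length" and has_reasoning:
        return "retry"
    return "fail"
```

With this shape, a route or auth failure (an error payload with no `choices`) still classifies as "fail" and never reaches the retry path.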

Testing

- `npx vitest run --project cli src/lib/onboard/compatible-endpoint-smoke.test.ts`
- `npx vitest run --project cli src/lib/onboard/compatible-endpoint-smoke.test.ts test/onboard.test.ts -t "compatible-endpoint"`
- `npm run build:cli`
- `npm run source-shape:check`

Local hook notes

- Pre-commit full CLI coverage hook failed on unrelated 5s timeouts in `test/nemoclaw-start-reconcile.test.ts`, `test/nemoclaw-start.test.ts`, and `test/onboard.test.ts`.
- Pre-push hook then failed before tests on local `dist` cleanup with `ENOTEMPTY: directory not empty, rmdir dist`.
- The commit and push were completed with verification bypassed after the targeted checks above passed.

Summary by CodeRabbit

Tests
- Expanded end-to-end smoke tests for endpoint compatibility, adding scenarios for initial reasoning-only responses and retry behavior with token budget adjustments and distinct failure/success outcomes.
- Improved assertions to verify retry attempts, call counts, and surfaced diagnostic messages when content is missing.
Chores
- Enhanced sandbox smoke tooling and test infrastructure to better simulate managed provider routing and capture child-process output for diagnostics.

coderabbitai · 2026-05-14T12:55:28Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3432bef6-14cb-4ccb-8923-56f3fbfa8d28

📥 Commits

Reviewing files that changed from the base of the PR and between 6d659f0 and b6e1c0e.

📒 Files selected for processing (1)

src/lib/onboard/compatible-endpoint-smoke.ts

🚧 Files skipped from review as they are similar to previous changes (1)

src/lib/onboard/compatible-endpoint-smoke.ts

📝 Walkthrough

The compatible-endpoint smoke script builder now accepts configurable token budgets for the initial and retry requests. The generated sandbox script moves its request/response logic into shell functions and implements a two-phase retry: a retry is triggered only when the first response lacks assistant content and has a `finish_reason` of `length`, and the second request is sent with a larger token budget. End-to-end tests cover both the successful-retry and the failed-retry scenarios through a mocked `curl` executable.
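The two-phase flow described in the walkthrough can be sketched as follows. This is an illustration under stated assumptions, not the generated bash script: `run_smoke`, `has_content`, and the injected `send` callable are hypothetical names, and the budgets in the usage note are made-up values rather than the PR's defaults.

```python
from typing import Callable, Tuple


def _first_choice(body: dict) -> dict:
    """Return choices[0] if it exists and is a dict, else an empty dict."""
    choices = body.get("choices")
    if isinstance(choices, list) and choices and isinstance(choices[0], dict):
        return choices[0]
    return {}


def has_content(body: dict) -> bool:
    """True when the first choice carries non-blank assistant content."""
    message = _first_choice(body).get("message")
    content = message.get("content") if isinstance(message, dict) else None
    return isinstance(content, str) and bool(content.strip())


def run_smoke(
    send: Callable[[int], dict],
    initial_max_tokens: int,
    retry_max_tokens: int,
) -> Tuple[bool, int]:
    """Run the smoke check; return (succeeded, number_of_requests_sent)."""
    first = send(initial_max_tokens)
    if has_content(first):
        return True, 1
    # Retry only when the model ran out of budget; other failures stay hard.
    if _first_choice(first).get("finish_reason") != "length":
        return False, 1
    second = send(retry_max_tokens)
    return has_content(second), 2
```

For example, calling `run_smoke(send, 64, 512)` with a `send` that first returns a content-less `length` response and then a normal answer sends exactly two requests, the second with the larger budget. Injecting `send` plays the same role as the mocked `curl` in the tests: the control flow can be exercised without a live endpoint.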

Changes

Smoke Test Retry Configuration and Coverage

| Layer / File(s) | Summary |
| --- | --- |
| Configuration options and defaults `src/lib/onboard/compatible-endpoint-smoke.ts` | `CompatibleEndpointSandboxSmokeScriptOptions` type defines configurable config path, inference URL, and initial/retry token budgets. `positiveInt` helper sanitizes numeric values with fallback defaults. |
| Script builder and spawn helper `src/lib/onboard/compatible-endpoint-smoke.ts`, `src/lib/onboard/compatible-endpoint-smoke.test.ts` | Adds exported `spawnOutputToString` and expands `buildCompatibleEndpointSandboxSmokeScript` to accept options, compute defaults, and inject `CONFIG`, `INFERENCE_URL`, `INITIAL_MAX_TOKENS`, `RETRY_MAX_TOKENS` into the generated bash script. Test helpers for running scripts via `spawnSync` are added. |
| Shell functions and payload handling `src/lib/onboard/compatible-endpoint-smoke.ts` | Generated bash script refactors payload/curl/response flow into `write_payload`, `run_smoke_request`, and `check_response` shell functions using configurable `max_tokens`. |
| Python response checker and retry control `src/lib/onboard/compatible-endpoint-smoke.ts` | Python checker now defensively extracts `choices[0].message.content`, treats missing/blank content as failure, inspects `finish_reason` for `length` exhaustion, and returns distinct exit codes to drive shell-level two-phase retry with a larger token budget. |
| Test helpers, fake curl, and end-to-end cases `src/lib/onboard/compatible-endpoint-smoke.test.ts` | Helpers create temporary OpenClaw config and a fake `curl` that records calls and returns scripted responses. New tests verify retry success when initial response lacks assistant content (status 0, expected stdout/stderr markers) and failure when both attempts lack content (status 1, stderr validation, curl call count). |
| Existing test assertion update `test/onboard.test.ts` | Assertion updated to validate script sets `INFERENCE_URL` to a `/chat/completions` endpoint and that the `curl` command references the `$INFERENCE_URL` variable. |
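The table mentions a `positiveInt` helper that sanitizes numeric options with fallback defaults. The real helper is TypeScript and its exact rules are not shown here, so the following is only a Python analog of the idea: `positive_int`, `resolve_budgets`, the option keys, and both default values are hypothetical.

```python
# Hypothetical defaults for illustration; the PR's actual values are not shown.
DEFAULT_INITIAL_MAX_TOKENS = 64
DEFAULT_RETRY_MAX_TOKENS = 512


def positive_int(value, fallback: int) -> int:
    """Return value as a positive integer, or fallback when it is not one."""
    if isinstance(value, bool):  # bool is a subclass of int; reject it explicitly
        return fallback
    if isinstance(value, int) and value > 0:
        return value
    if isinstance(value, float) and value.is_integer() and value > 0:
        return int(value)
    return fallback


def resolve_budgets(options: dict) -> tuple:
    """Resolve (initial, retry) token budgets from caller-supplied options."""
    initial = positive_int(options.get("initial_max_tokens"), DEFAULT_INITIAL_MAX_TOKENS)
    retry = positive_int(options.get("retry_max_tokens"), DEFAULT_RETRY_MAX_TOKENS)
    return initial, retry
```

Sanitizing before interpolation matters because the values end up inside a generated shell script: anything that is not a plain positive integer is replaced by a known-good default instead of being written into the script.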

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

NemoClaw CLI, E2E

Suggested reviewers

cv

🐰
I hopped to run a tiny test,
Sent a curl to do its best,
When reasoning ran short, we tried again,
More tokens lent to finish when—
The sandbox smiled and logged success.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title directly describes the main change: adding retry logic to the compatible endpoint smoke script when reasoning-only output occurs. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.

github-actions · 2026-05-14T12:57:46Z

E2E Advisor Recommendation

Required E2E: messaging-compatible-endpoint-e2e
Optional E2E: inference-routing-e2e

Dispatch hint: messaging-compatible-endpoint-e2e

Auto-dispatched E2E: messaging-compatible-endpoint-e2e via nightly-e2e.yaml at c67826c54b84cee9df2977669586c892b8fa9ec5

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

messaging-compatible-endpoint-e2e (medium): Direct coverage for the changed path: onboards OpenClaw with Telegram and an OpenAI-compatible endpoint, asserts the compatible-endpoint sandbox smoke check runs, verifies managed inference.local provider config, and proves sandbox-side chat completions reach the mock endpoint.

Optional E2E

inference-routing-e2e (medium): Adjacent confidence for gateway inference routing, credential isolation, and custom OpenAI-compatible endpoint/error-classification behavior. Useful if maintainers want broader inference-route validation, but not as directly targeted as messaging-compatible-endpoint-e2e.

New E2E recommendations

compatible-endpoint reasoning-model smoke retry (medium): Existing messaging-compatible-endpoint-e2e validates the compatible-endpoint smoke happy path, but the new retry branch for finish_reason=length with reasoning_content is only covered by unit tests. Add or extend an E2E mock mode that returns a reasoning-only length response on the first sandbox smoke request and normal assistant content on retry, then assert onboarding succeeds and the retry diagnostic appears.
- Suggested test: Extend test/e2e/test-messaging-compatible-endpoint.sh with a reasoning-only first-response retry case for the onboard compatible-endpoint sandbox smoke.

Dispatch hint

Workflow: nightly-e2e.yaml
jobs input: messaging-compatible-endpoint-e2e

github-actions · 2026-05-14T13:04:16Z

Selective E2E Results — ✅ All requested jobs passed

Run: 25861274793
Target ref: 6d659f0a0d47dad5ab4566fc6372376c84d29324
Workflow ref: main
Requested jobs: messaging-compatible-endpoint-e2e
Summary: 1 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| messaging-compatible-endpoint-e2e | ✅ success |

github-actions · 2026-05-14T17:28:45Z

Selective E2E Results — ✅ All requested jobs passed

Run: 25874539402
Target ref: b6e1c0e7c9d9315bbb363432b0c1d3bb26ca5bd9
Workflow ref: main
Requested jobs: messaging-compatible-endpoint-e2e
Summary: 1 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| messaging-compatible-endpoint-e2e | ✅ success |

github-actions · 2026-05-14T17:42:48Z

Selective E2E Results — ✅ All requested jobs passed

Run: 25875243826
Target ref: c67826c54b84cee9df2977669586c892b8fa9ec5
Workflow ref: main
Requested jobs: messaging-compatible-endpoint-e2e
Summary: 1 passed, 0 failed, 0 skipped

| Job | Result |
| --- | --- |
| messaging-compatible-endpoint-e2e | ✅ success |

## Summary

Refreshes the NemoClaw documentation for the local `main` changes included in the 0.0.42 release. The update adds release notes, updates the affected user-facing setup and troubleshooting pages, bumps docs metadata to 0.0.42, and regenerates the matching user skills.

## Changes

- #3537 -> `docs/reference/commands.md`, `docs/reference/troubleshooting.md`: Documented host-level status fields, cloudflared state-specific recovery hints, and Local Ollama auth proxy status diagnostics.
- #3454 -> `docs/get-started/prerequisites.md`, `docs/get-started/quickstart.md`: Documented macOS Docker-driver onboarding and removed the expectation that standard macOS setup needs a VM driver helper.
- #3514 -> `docs/inference/use-local-inference.md`: Documented compatible-endpoint retry behavior for reasoning-only smoke responses.
- #3448 -> `docs/reference/commands.md`, `docs/manage-sandboxes/messaging-channels.md`: Documented canonical channel names and policy preset hints after `channels add`.
- #3520 -> `docs/about/release-notes.md`: Captured clearer GPU recovery and uninstall wording in the 0.0.42 release notes.
- #3313 -> `docs/get-started/quickstart.md`, `docs/reference/troubleshooting.md`: Documented stronger dashboard port detection and rollback when a forward cannot start.
- #3502 -> `docs/about/release-notes.md`: Captured batched onboarding policy preset application in the 0.0.42 release notes.
- #3505 -> `docs/reference/troubleshooting.md`: Documented the top-level Colima socket path.
- #3421 -> `docs/about/release-notes.md`: Captured idempotent installer shim logging in the 0.0.42 release notes.
- Updated `docs/project.json`, `docs/versions1.json`, and regenerated `.agents/skills/nemoclaw-user-*` outputs.

## Type of Change

- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [x] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification

- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [x] `make docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md) (doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

---

Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

## Summary by CodeRabbit

## Release Notes - v0.0.42

* **Documentation**
  * Enhanced macOS onboarding guidance for Docker gateway setup
  * Improved dashboard port conflict handling with automatic rollback
  * Better local Ollama inference diagnostics and authentication proxy checks
  * Clarified status command output and recovery procedures
  * Refined messaging channel setup documentation
* **Chores**
  * Version bump to 0.0.42

[![Review Change Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3540)

Co-authored-by: Carlos Villela <cvillela@nvidia.com>

fix(onboard): retry compatible smoke after reasoning-only output

6d659f0

ericksoa changed the title ~~Retry compatible endpoint smoke after reasoning-only output~~ fix(onboard): retry compatible endpoint smoke after reasoning-only output May 14, 2026

wscurran added fix labels May 14, 2026

ericksoa requested a review from cv May 14, 2026 16:05

ericksoa added the v0.0.42, bug (Something fails against expected or documented behavior), and integration: openclaw (OpenClaw integration behavior) labels May 14, 2026

ericksoa self-assigned this May 14, 2026

docs(onboard): document compatible smoke helpers

b6e1c0e

Merge branch 'main' into fix/compatible-smoke-reasoning-retry

c67826c

cv approved these changes May 14, 2026

cv merged commit 1c99e5d into main May 14, 2026
25 checks passed

miyoungc mentioned this pull request May 14, 2026

docs(release): refresh 0.0.42 docs #3540

Merged

12 tasks

latenighthackathon mentioned this pull request May 15, 2026

fix(onboard): accept reasoning-mode models in the inference smoke probe #3356

Closed

11 tasks

This was referenced May 16, 2026

test(cli): restore source-shape guard #3636

Merged

fix(inference): auto-detect Bedrock Runtime custom endpoints #3767

Merged

cv merged 3 commits into main from fix/compatible-smoke-reasoning-retry

3 participants
