Skip to content

fix(onboard): retry compatible endpoint smoke after reasoning-only output#3514

Merged
cv merged 3 commits into
mainfrom
fix/compatible-smoke-reasoning-retry
May 14, 2026
Merged

fix(onboard): retry compatible endpoint smoke after reasoning-only output#3514
cv merged 3 commits into
mainfrom
fix/compatible-smoke-reasoning-retry

Conversation

@ericksoa

@ericksoa ericksoa commented May 14, 2026

Copy link
Copy Markdown
Contributor

Summary

  • raise the compatible-endpoint sandbox smoke response budget and retry when a reasoning model returns finish_reason: length with only reasoning_content
  • keep route/config/auth failures hard-failing, but report reasoning-budget failures as model output budget problems rather than inference.local route failures
  • add executable regression coverage for the MiniMax-shaped reasoning-only response and the still-failing retry path

Testing

  • npx vitest run --project cli src/lib/onboard/compatible-endpoint-smoke.test.ts
  • npx vitest run --project cli src/lib/onboard/compatible-endpoint-smoke.test.ts test/onboard.test.ts -t "compatible-endpoint"
  • npm run build:cli
  • npm run source-shape:check

Local hook notes

  • Pre-commit full CLI coverage hook failed on unrelated 5s timeouts in test/nemoclaw-start-reconcile.test.ts, test/nemoclaw-start.test.ts, and test/onboard.test.ts.
  • Pre-push hook then failed before tests on local dist cleanup with ENOTEMPTY: directory not empty, rmdir dist.
  • The commit and push were completed with verification bypassed after the targeted checks above passed.

Summary by CodeRabbit

  • Tests

    • Expanded end-to-end smoke tests for endpoint compatibility, adding scenarios for initial reasoning-only responses and retry behavior with token budget adjustments and distinct failure/success outcomes.
    • Improved assertions to verify retry attempts, call counts, and surfaced diagnostic messages when content is missing.
  • Chores

    • Enhanced sandbox smoke tooling and test infrastructure to better simulate managed provider routing and capture child-process output for diagnostics.

Review Change Stack

@coderabbitai

coderabbitai Bot commented May 14, 2026

Copy link
Copy Markdown
Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 3432bef6-14cb-4ccb-8923-56f3fbfa8d28

📥 Commits

Reviewing files that changed from the base of the PR and between 6d659f0 and b6e1c0e.

📒 Files selected for processing (1)
  • src/lib/onboard/compatible-endpoint-smoke.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/lib/onboard/compatible-endpoint-smoke.ts

📝 Walkthrough

Walkthrough

The compatible endpoint smoke test builder now accepts configurable token budgets for initial and retry requests. The generated sandbox script refactors request/response logic into shell functions and implements two-phase retry: a retry is triggered only when the first response lacks assistant content and has finish reason length, resuming with a higher token allocation. End-to-end test coverage validates both successful retry and failure scenarios via a mocked curl executable.

Changes

Smoke Test Retry Configuration and Coverage

Layer / File(s) Summary
Configuration options and defaults
src/lib/onboard/compatible-endpoint-smoke.ts
CompatibleEndpointSandboxSmokeScriptOptions type defines configurable config path, inference URL, and initial/retry token budgets. positiveInt helper sanitizes numeric values with fallback defaults.
Script builder and spawn helper
src/lib/onboard/compatible-endpoint-smoke.ts, src/lib/onboard/compatible-endpoint-smoke.test.ts
Adds exported spawnOutputToString and expands buildCompatibleEndpointSandboxSmokeScript to accept options, compute defaults, and inject CONFIG, INFERENCE_URL, INITIAL_MAX_TOKENS, RETRY_MAX_TOKENS into the generated bash script. Test helpers for running scripts via spawnSync are added.
Shell functions and payload handling
src/lib/onboard/compatible-endpoint-smoke.ts
Generated bash script refactors payload/curl/response flow into write_payload, run_smoke_request, and check_response shell functions using configurable max_tokens.
Python response checker and retry control
src/lib/onboard/compatible-endpoint-smoke.ts
Python checker now defensively extracts choices[0].message.content, treats missing/blank content as failure, inspects finish_reason for length exhaustion, and returns distinct exit codes to drive shell-level two-phase retry with a larger token budget.
Test helpers, fake curl, and end-to-end cases
src/lib/onboard/compatible-endpoint-smoke.test.ts
Helpers create temporary OpenClaw config and a fake curl that records calls and returns scripted responses. New tests verify retry success when initial response lacks assistant content (status 0, expected stdout/stderr markers) and failure when both attempts lack content (status 1, stderr validation, curl call count).
Existing test assertion update
test/onboard.test.ts
Assertion updated to validate script sets INFERENCE_URL to a /chat/completions endpoint and that the curl command references the $INFERENCE_URL variable.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

NemoClaw CLI, E2E

Suggested reviewers

  • cv

🐰
I hopped to run a tiny test,
Sent a curl to do its best,
When reasoning ran short, we tried again,
More tokens lent to finish when—
The sandbox smiled and logged success.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title directly describes the main change: adding retry logic to the compatible endpoint smoke script when reasoning-only output occurs.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/compatible-smoke-reasoning-retry

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

ESLint skipped: no ESLint configuration detected in root package.json. To enable, add eslint to devDependencies.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented May 14, 2026

Copy link
Copy Markdown
Contributor

E2E Advisor Recommendation

Required E2E: messaging-compatible-endpoint-e2e
Optional E2E: inference-routing-e2e

Dispatch hint: messaging-compatible-endpoint-e2e

Auto-dispatched E2E: messaging-compatible-endpoint-e2e via nightly-e2e.yaml at c67826c54b84cee9df2977669586c892b8fa9ec5

Workflow run

Full advisor summary

E2E Recommendation Advisor

Base: origin/main
Head: HEAD
Confidence: high

Required E2E

  • messaging-compatible-endpoint-e2e (medium): Direct coverage for the changed path: onboards OpenClaw with Telegram and an OpenAI-compatible endpoint, asserts the compatible-endpoint sandbox smoke check runs, verifies managed inference.local provider config, and proves sandbox-side chat completions reach the mock endpoint.

Optional E2E

  • inference-routing-e2e (medium): Adjacent confidence for gateway inference routing, credential isolation, and custom OpenAI-compatible endpoint/error-classification behavior. Useful if maintainers want broader inference-route validation, but not as directly targeted as messaging-compatible-endpoint-e2e.

New E2E recommendations

  • compatible-endpoint reasoning-model smoke retry (medium): Existing messaging-compatible-endpoint-e2e validates the compatible-endpoint smoke happy path, but the new retry branch for finish_reason=length with reasoning_content is only covered by unit tests. Add or extend an E2E mock mode that returns a reasoning-only length response on the first sandbox smoke request and normal assistant content on retry, then assert onboarding succeeds and the retry diagnostic appears.
    • Suggested test: Extend test/e2e/test-messaging-compatible-endpoint.sh with a reasoning-only first-response retry case for the onboard compatible-endpoint sandbox smoke.

Dispatch hint

  • Workflow: nightly-e2e.yaml
  • jobs input: messaging-compatible-endpoint-e2e

@ericksoa ericksoa changed the title Retry compatible endpoint smoke after reasoning-only output fix(onboard): retry compatible endpoint smoke after reasoning-only output May 14, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 25861274793
Target ref: 6d659f0a0d47dad5ab4566fc6372376c84d29324
Workflow ref: main
Requested jobs: messaging-compatible-endpoint-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job Result
messaging-compatible-endpoint-e2e ✅ success

@ericksoa ericksoa requested a review from cv May 14, 2026 16:05
@ericksoa ericksoa added v0.0.42 bug Something fails against expected or documented behavior integration: openclaw OpenClaw integration behavior labels May 14, 2026
@ericksoa ericksoa self-assigned this May 14, 2026
@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 25874539402
Target ref: b6e1c0e7c9d9315bbb363432b0c1d3bb26ca5bd9
Workflow ref: main
Requested jobs: messaging-compatible-endpoint-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job Result
messaging-compatible-endpoint-e2e ✅ success

@github-actions

Copy link
Copy Markdown
Contributor

Selective E2E Results — ✅ All requested jobs passed

Run: 25875243826
Target ref: c67826c54b84cee9df2977669586c892b8fa9ec5
Workflow ref: main
Requested jobs: messaging-compatible-endpoint-e2e
Summary: 1 passed, 0 failed, 0 skipped

Job Result
messaging-compatible-endpoint-e2e ✅ success

@cv cv merged commit 1c99e5d into main May 14, 2026
25 checks passed
@miyoungc miyoungc mentioned this pull request May 14, 2026
12 tasks
miyoungc added a commit that referenced this pull request May 14, 2026
## Summary
Refreshes the NemoClaw documentation for the local `main` changes
included in the 0.0.42 release. The update adds release notes, updates
the affected user-facing setup and troubleshooting pages, bumps docs
metadata to 0.0.42, and regenerates the matching user skills.

## Changes
- #3537 -> `docs/reference/commands.md`,
`docs/reference/troubleshooting.md`: Documented host-level status
fields, cloudflared state-specific recovery hints, and Local Ollama auth
proxy status diagnostics.
- #3454 -> `docs/get-started/prerequisites.md`,
`docs/get-started/quickstart.md`: Documented macOS Docker-driver
onboarding and removed the expectation that standard macOS setup needs a
VM driver helper.
- #3514 -> `docs/inference/use-local-inference.md`: Documented
compatible-endpoint retry behavior for reasoning-only smoke responses.
- #3448 -> `docs/reference/commands.md`,
`docs/manage-sandboxes/messaging-channels.md`: Documented canonical
channel names and policy preset hints after `channels add`.
- #3520 -> `docs/about/release-notes.md`: Captured clearer GPU recovery
and uninstall wording in the 0.0.42 release notes.
- #3313 -> `docs/get-started/quickstart.md`,
`docs/reference/troubleshooting.md`: Documented stronger dashboard port
detection and rollback when a forward cannot start.
- #3502 -> `docs/about/release-notes.md`: Captured batched onboarding
policy preset application in the 0.0.42 release notes.
- #3505 -> `docs/reference/troubleshooting.md`: Documented the top-level
Colima socket path.
- #3421 -> `docs/about/release-notes.md`: Captured idempotent installer
shim logging in the 0.0.42 release notes.
- Updated `docs/project.json`, `docs/versions1.json`, and regenerated
`.agents/skills/nemoclaw-user-*` outputs.

## Type of Change
- [ ] Code change (feature, bug fix, or refactor)
- [ ] Code change with doc updates
- [x] Doc only (prose changes, no code sample modifications)
- [ ] Doc only (includes code sample changes)

## Verification
- [ ] `npx prek run --all-files` passes
- [ ] `npm test` passes
- [ ] Tests added or updated for new or changed behavior
- [x] No secrets, API keys, or credentials committed
- [x] Docs updated for user-facing behavior changes
- [x] `make docs` builds without warnings (doc changes only)
- [x] Doc pages follow the [style
guide](https://github.com/NVIDIA/NemoClaw/blob/main/docs/CONTRIBUTING.md)
(doc changes only)
- [ ] New doc pages include SPDX header and frontmatter (new pages only)

---
Signed-off-by: Miyoung Choi <miyoungc@nvidia.com>

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes - v0.0.42

* **Documentation**
  * Enhanced macOS onboarding guidance for Docker gateway setup
  * Improved dashboard port conflict handling with automatic rollback
* Better local Ollama inference diagnostics and authentication proxy
checks
  * Clarified status command output and recovery procedures
  * Refined messaging channel setup documentation

* **Chores**
  * Version bump to 0.0.42

<!-- review_stack_entry_start -->

[![Review Change
Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/NVIDIA/NemoClaw/pull/3540)

<!-- review_stack_entry_end -->

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Carlos Villela <cvillela@nvidia.com>
@wscurran wscurran added area: e2e End-to-end tests, nightly failures, or validation infrastructure area: inference Inference routing, serving, model selection, or outputs area: install Install, setup, prerequisites, or uninstall flow area: onboarding Onboarding FSM, provider setup, sandbox launch, or first-run flow labels Jun 3, 2026
@wscurran wscurran added area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression feature PR adds or expands user-visible functionality and removed Getting Started bug Something fails against expected or documented behavior feature PR adds or expands user-visible functionality labels Jun 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area: e2e End-to-end tests, nightly failures, or validation infrastructure area: inference Inference routing, serving, model selection, or outputs area: install Install, setup, prerequisites, or uninstall flow area: onboarding Onboarding FSM, provider setup, sandbox launch, or first-run flow area: sandbox OpenShell sandbox lifecycle, runtime, config, or recovery bug-fix PR fixes a bug or regression integration: openclaw OpenClaw integration behavior

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants