fix(onboard): smoke test inference before success#3603
Conversation
Add a post-route inference smoke probe so onboard fails before reporting success when the configured provider/model cannot serve chat completions. Fixes #3253
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR adds a lightweight OpenAI-like endpoint smoke test to onboarding and invokes it immediately after ChangesOnboarding inference smoke check
Sequence Diagram(s)sequenceDiagram
participant SetupInference
participant VerifyOnboardInferenceSmoke
participant CredentialStore
participant ProbeOpenAiLikeEndpoint
SetupInference->>VerifyOnboardInferenceSmoke: verifyOnboardInferenceSmoke({provider, model, endpointUrl, credentialEnv})
VerifyOnboardInferenceSmoke->>CredentialStore: resolveProviderCredential/getCredential
VerifyOnboardInferenceSmoke->>ProbeOpenAiLikeEndpoint: probeOpenAiLikeEndpoint(skipResponsesProbe: true)
alt probe succeeds
ProbeOpenAiLikeEndpoint-->>VerifyOnboardInferenceSmoke: 200 OK
VerifyOnboardInferenceSmoke-->>SetupInference: log ✓
else probe fails
ProbeOpenAiLikeEndpoint-->>VerifyOnboardInferenceSmoke: error
VerifyOnboardInferenceSmoke-->>SetupInference: process.exit(1)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 2044-2082: The new smoke helper functions
(shouldSmokeOpenAiLikeOnboardRoute and verifyOnboardInferenceSmoke) must be
extracted into a new module and imported into src/lib/onboard.ts to reduce that
file's size: create a dedicated file that exports these two functions and
re-export any dependent symbols it needs (REMOTE_PROVIDER_CONFIG,
INFERENCE_ROUTE_URL, probeOpenAiLikeEndpoint, hydrateCredentialEnv,
getCredential, getProbeAuthMode, redact, compactText or import them from their
existing modules), move the implementations exactly as-is into that file, then
replace the local definitions in onboard.ts with imports of the two functions
and update any references; ensure the new module preserves the same signatures
and side-effects (process.exit(1), console logging) and run tests/CI to confirm
no behavioral changes.
- Around line 2051-2064: The helper verifyOnboardInferenceSmoke currently falls
back to INFERENCE_ROUTE_URL and null when options.endpointUrl/credentialEnv are
omitted, which can mismatch the actual provider config used during setup; update
verifyOnboardInferenceSmoke to first resolve endpointUrl and credentialEnv from
the provider's configuration (e.g., via the project's provider config lookup
such as getProviderConfig(options.provider) or
providerConfigs[options.provider]) and only if those are absent fall back to
options.* values and then INFERENCE_ROUTE_URL/null; keep existing behavior when
explicit options are passed, and continue to use
hydrateCredentialEnv/getCredential to resolve the final apiKey from the resolved
credentialEnv.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: a0a245bf-15eb-450c-b873-c9bffb4dda75
📒 Files selected for processing (1)
src/lib/onboard.ts
Selective E2E Results — ✅ All requested jobs passedRun: 25930044916
|
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard.ts`:
- Around line 2044-2057: The local wrapper verifyOnboardInferenceSmoke is adding
unnecessary lines; remove the local function and instead export a trimmed alias
that immediately wires hydrateCredentialEnv into
verifyOnboardInferenceSmokeWithDeps (so callers get the same behavior without
the extra wrapper). Concretely, drop the verifyOnboardInferenceSmoke function
body and replace it with a single export/re-export that binds/passes
hydrateCredentialEnv into verifyOnboardInferenceSmokeWithDeps (keep the existing
require/destructure of verifyOnboardInferenceSmokeWithDeps and
shouldSmokeOpenAiLikeOnboardRoute and continue to reference
hydrateCredentialEnv).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: c258d80e-5988-4066-9083-b8050e1c8881
📒 Files selected for processing (2)
src/lib/onboard.tssrc/lib/onboard/inference-smoke.ts
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/inference/onboard-probes.ts`:
- Around line 835-842: The smoke-check logging prints endpointUrl raw and may
leak credentials; update the log to redact and compact the URL before printing
by passing endpointUrl through redact() and compactText() like the upstream
error, i.e., replace the raw endpointUrl usage in the Onboard probes logging
block that references endpointUrl with compactText(redact(endpointUrl ||
"unknown API base")) so the diagnostics show a redacted URL while preserving
existing provider/model/credentialEnv logging and behavior.
In `@src/lib/onboard.ts`:
- Line 7713: When the onboarding code takes the resume fast-path (when resume &&
isInferenceRouteReady(provider, model)) it skips setupInference and therefore
skips the smoke probe; ensure verifyOnboardInferenceSmoke({ provider, model,
endpointUrl, credentialEnv }) is invoked in the onboard resume fast-path before
returning success (i.e., add the same smoke validation call to the branch
guarded by resume && isInferenceRouteReady in the onboard function), and mirror
the same insertion for the other resume-fast-path occurrence referenced (the
other isInferenceRouteReady/resume branch) so resumed runs run the smoke probe
as well.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 0d444574-c7a0-4d77-a773-f43ed2d01096
📒 Files selected for processing (2)
src/lib/inference/onboard-probes.tssrc/lib/onboard.ts
| const { compactText } = require("../core/url-utils"); | ||
| const { redact } = require("../runner"); | ||
| console.error(" Onboard inference smoke check failed."); | ||
| console.error(` Provider: ${options.provider}`); | ||
| console.error(` Model: ${options.model}`); | ||
| console.error(` API base: ${endpointUrl}`); | ||
| if (credentialEnv) console.error(` Credential env: ${credentialEnv}`); | ||
| console.error(` Upstream error: ${compactText(redact(probe.message || "unknown inference failure"))}`); |
There was a problem hiding this comment.
Redact API base in smoke-failure diagnostics.
Line 840 logs endpointUrl raw. If the configured base URL contains embedded credentials (query param or userinfo), this leaks secrets to console/log capture. Redact before printing.
Suggested patch
const { compactText } = require("../core/url-utils");
const { redact } = require("../runner");
+ const safeEndpointUrl = compactText(redact(String(endpointUrl || "")));
console.error(" Onboard inference smoke check failed.");
console.error(` Provider: ${options.provider}`);
console.error(` Model: ${options.model}`);
- console.error(` API base: ${endpointUrl}`);
+ console.error(` API base: ${safeEndpointUrl}`);
if (credentialEnv) console.error(` Credential env: ${credentialEnv}`);
console.error(` Upstream error: ${compactText(redact(probe.message || "unknown inference failure"))}`);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const { compactText } = require("../core/url-utils"); | |
| const { redact } = require("../runner"); | |
| console.error(" Onboard inference smoke check failed."); | |
| console.error(` Provider: ${options.provider}`); | |
| console.error(` Model: ${options.model}`); | |
| console.error(` API base: ${endpointUrl}`); | |
| if (credentialEnv) console.error(` Credential env: ${credentialEnv}`); | |
| console.error(` Upstream error: ${compactText(redact(probe.message || "unknown inference failure"))}`); | |
| const { compactText } = require("../core/url-utils"); | |
| const { redact } = require("../runner"); | |
| const safeEndpointUrl = compactText(redact(String(endpointUrl || ""))); | |
| console.error(" Onboard inference smoke check failed."); | |
| console.error(` Provider: ${options.provider}`); | |
| console.error(` Model: ${options.model}`); | |
| console.error(` API base: ${safeEndpointUrl}`); | |
| if (credentialEnv) console.error(` Credential env: ${credentialEnv}`); | |
| console.error(` Upstream error: ${compactText(redact(probe.message || "unknown inference failure"))}`); |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/lib/inference/onboard-probes.ts` around lines 835 - 842, The smoke-check
logging prints endpointUrl raw and may leak credentials; update the log to
redact and compact the URL before printing by passing endpointUrl through
redact() and compactText() like the upstream error, i.e., replace the raw
endpointUrl usage in the Onboard probes logging block that references
endpointUrl with compactText(redact(endpointUrl || "unknown API base")) so the
diagnostics show a redacted URL while preserving existing
provider/model/credentialEnv logging and behavior.
| } | ||
|
|
||
| verifyInferenceRoute(provider, model); | ||
| verifyOnboardInferenceSmoke({ provider, model, endpointUrl, credentialEnv }); |
There was a problem hiding this comment.
Run the smoke probe on the resume fast-path too.
These probes run in setupInference, but onboard() can skip that function when resume && isInferenceRouteReady(provider, model), which allows a resumed run to report success without executing the new smoke validation.
Suggested fix
if (isRoutedInferenceProvider(provider)) {
try {
await reconcileModelRouter();
} catch (err) {
console.error(
` ✗ Failed to reconcile model router: ${err instanceof Error ? err.message : String(err)}`,
);
process.exit(1);
}
}
skippedStepMessage("inference", `${provider} / ${model}`);
+ verifyOnboardInferenceSmoke({ provider, model, endpointUrl, credentialEnv });
if (nimContainer && sandboxName) {
registry.updateSandbox(sandboxName, { nimContainer });
}Also applies to: 7989-7989
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/lib/onboard.ts` at line 7713, When the onboarding code takes the resume
fast-path (when resume && isInferenceRouteReady(provider, model)) it skips
setupInference and therefore skips the smoke probe; ensure
verifyOnboardInferenceSmoke({ provider, model, endpointUrl, credentialEnv }) is
invoked in the onboard resume fast-path before returning success (i.e., add the
same smoke validation call to the branch guarded by resume &&
isInferenceRouteReady in the onboard function), and mirror the same insertion
for the other resume-fast-path occurrence referenced (the other
isInferenceRouteReady/resume branch) so resumed runs run the smoke probe as
well.
Selective E2E Results — ✅ All requested jobs passedRun: 25931101243
|
Selective E2E Results — ✅ All requested jobs passedRun: 25931525914
|
Selective E2E Results — ✅ All requested jobs passedRun: 25932296025
|
Selective E2E Results — ✅ All requested jobs passedRun: 25932590930
|
Selective E2E Results — ✅ All requested jobs passedRun: 25932946889
|
Summary
Fixes #3253 by adding a post-route onboard inference smoke check before setup reports success.
After
openshell inference setand route verification, OpenAI-compatible onboard providers now run a one-shot chat completions probe against the configured endpoint/model. If the probe fails, onboard exits before the success summary and prints actionable diagnostics: provider, model, API base, credential env, and upstream probe error.Validation
npm run build:clibash test/e2e/test-onboard-inference-smoke.shnpm test -- --run test/onboard.test.tsRegression guard
Guard PR #3595 is merged. The fix is complete when this branch flips
onboard-inference-smoke-e2egreen:Related: #3253, #3595
Summary by CodeRabbit