fix(inference): use vLLM runtime max_model_len for context window#4689
Conversation
Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughThis PR adds vLLM runtime context window auto-detection: a helper reads ChangesvLLM Runtime Context Window Auto-Detection
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
E2E Advisor RecommendationRequired E2E: Dispatch hint: Full advisor summaryE2E Recommendation AdvisorBase: Required E2E
Optional E2E
New E2E recommendations
Dispatch hint
|
E2E Scenario Advisor RecommendationRequired scenario E2E: None Full scenario advisor summaryE2E Scenario AdvisorBase: Required scenario E2E
Optional scenario E2E
Relevant changed files
|
PR Review AdvisorFindings: 0 needs attention, 2 worth checking, 0 nice ideas Review findings🛠️ Needs attention
🔎 Worth checking
🌱 Nice ideas
Since last review detailsCurrent findings:
This is an automated advisory review. A human maintainer must make the final merge decision. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
test/onboard-selection.test.ts (1)
570-577: ⚡ Quick winUnset
NEMOCLAW_CONTEXT_WINDOWinstead of passing an empty string.This test is meant to cover the "no explicit override" path, but
NEMOCLAW_CONTEXT_WINDOW: ""still presents the variable to the child process. That can mask regressions if the production code ever distinguishes unset from empty. Build the child env first anddeletethe key so the fixture matches the contract you're asserting.♻️ Proposed fix
- const result = spawnSync(process.execPath, [scriptPath], { + const childEnv = { + ...process.env, + HOME: tmpDir, + PATH: `${fakeBin}:${process.env.PATH || ""}`, + NEMOCLAW_EXPERIMENTAL: "", + NEMOCLAW_PROVIDER: "", + }; + delete childEnv.NEMOCLAW_CONTEXT_WINDOW; + + const result = spawnSync(process.execPath, [scriptPath], { cwd: repoRoot, encoding: "utf-8", - env: { - ...process.env, - HOME: tmpDir, - PATH: `${fakeBin}:${process.env.PATH || ""}`, - NEMOCLAW_EXPERIMENTAL: "", - NEMOCLAW_PROVIDER: "", - NEMOCLAW_CONTEXT_WINDOW: "", - }, + env: childEnv, });🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@test/onboard-selection.test.ts` around lines 570 - 577, The test currently sets NEMOCLAW_CONTEXT_WINDOW to an empty string in the child process env, which differs from an unset variable; modify the test to construct the env object (spreading process.env, HOME: tmpDir, PATH: `${fakeBin}:${process.env.PATH || ""}`, etc.), then explicitly remove the key (delete env.NEMOCLAW_CONTEXT_WINDOW) before passing it to the child process so the child sees the variable as unset; update the code around the env construction in onboard-selection.test.ts (look for the env object creation referencing tmpDir and fakeBin) to perform the delete rather than assigning "".
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/onboard.ts`:
- Line 5082: The call to
localInference.applyVllmRuntimeContextWindow(vllmModels, detectedModel) mutates
global onboarding state (NEMOCLAW_CONTEXT_WINDOW) before
validateOpenAiLikeSelection(...) completes, allowing the vLLM context window to
leak if validation does a continue in selectionLoop; move or delay the mutation
so it only runs after validateOpenAiLikeSelection returns success. Specifically,
remove or defer the applyVllmRuntimeContextWindow invocation at the current spot
and instead invoke it immediately after validateOpenAiLikeSelection(...)
confirms the selection (inside the same control path where selectionLoop would
not continue), ensuring the mutation occurs only for confirmed vLLM selections
(reference localInference.applyVllmRuntimeContextWindow,
validateOpenAiLikeSelection, NEMOCLAW_CONTEXT_WINDOW, and selectionLoop).
---
Nitpick comments:
In `@test/onboard-selection.test.ts`:
- Around line 570-577: The test currently sets NEMOCLAW_CONTEXT_WINDOW to an
empty string in the child process env, which differs from an unset variable;
modify the test to construct the env object (spreading process.env, HOME:
tmpDir, PATH: `${fakeBin}:${process.env.PATH || ""}`, etc.), then explicitly
remove the key (delete env.NEMOCLAW_CONTEXT_WINDOW) before passing it to the
child process so the child sees the variable as unset; update the code around
the env construction in onboard-selection.test.ts (look for the env object
creation referencing tmpDir and fakeBin) to perform the delete rather than
assigning "".
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: b5f4aeca-8945-41a4-b526-9d59a52e174e
📒 Files selected for processing (5)
src/lib/inference/local.tssrc/lib/inference/vllm-runtime-context.test.tssrc/lib/inference/vllm-runtime-context.tssrc/lib/onboard.tstest/onboard-selection.test.ts
|
@zyang-dev can you verify the coderabbit/pr advisor findings, please? |
Signed-off-by: zyang-dev <267119621+zyang-dev@users.noreply.github.com>
|
@cv addressed coderabbit/pr advisor findings. |
Summary
Detect vLLM's runtime context window from
/v1/models.max_model_lenduring onboarding instead of leaving local vLLM sandboxes on NemoClaw's default context size. This lets generated OpenClaw config reflect the actual vLLM server limit.Changes
/v1/models.max_model_lentoNEMOCLAW_CONTEXT_WINDOWbefore sandbox config generation.max_model_len.Type of Change
Verification
npx prek run --all-filespassesnpm testpassesnpm run docsbuilds without warnings (doc changes only)Signed-off-by: zyang-dev 267119621+zyang-dev@users.noreply.github.com
Summary by CodeRabbit
New Features
Bug Fixes
Tests