feat(recap): route to protolabs/fast when available, force no-think on Qwen#173
Conversation
…n Qwen The post-agent recap (where-we-left-off card after long turns) was using the main session model — which on Qwen3 / kimi / vLLM-served thinking models meant several seconds of CoT for what should be a 1-3 sentence summary. Two fixes: 1. Recap detects 'protolabs/fast' in the user's configured providers and routes to it. The alias is the gateway's fast non-thinking model; when present, recap latency drops to single-call no-CoT speed. Pipeline.ts gains a small opt-in: `request.config.allowModelOverride` tells it to honor `request.model` instead of the configured default (the existing 'ignore request.model for safety' default stays in place for everyone else, since recap is the one caller that knows the alias resolves). 2. When the alias isn't present, recap falls back to the current model with `thinkingConfig.includeThoughts: false`. Pipeline.ts now translates that into `extra_body.enable_thinking: false` on the wire, which Qwen3 / vLLM actually honor (previously it only skipped the `reasoning` field, which gpt-5/glm respect but Qwen ignored). The injection composes with custom `samplingParams` blocks via a shallow merge into the existing extra_body, so users with custom sampler keys still get no-think for opted-in callers. Tests: - recapGenerator.test.ts: model-picker behavior across present/absent alias plus invariant that thinkingConfig.includeThoughts is always false. - pipeline.test.ts: extra_body.enable_thinking injection on the thinkingConfig path; allowModelOverride respect/ignore; default-ignore preserved when the flag is absent. Net: 5317 core / 3774 cli pass, lint clean. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WalkthroughThis pull request implements model selection and configuration propagation for content generation pipelines. The OpenAI pipeline now supports conditional per-request model overrides and ensures thinking suppression is reliably propagated. The recap generator adds logic to prefer routing requests to a fast gateway alias when available. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Review rate limit: 4/5 reviews remaining, refill in 12 minutes. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
packages/core/src/core/openaiContentGenerator/pipeline.test.ts (1)
1344-1367: ⚡ Quick winAdd a regression test that preserves pre-existing
extra_bodykeys while disabling thinking.Current test only covers injection on empty
extra_body. Add a case wheresamplingParams.extra_bodyalready has fields and assert they survive alongsideenable_thinking: false.✅ Suggested additional test
+ it('merges enable_thinking=false into existing samplingParams.extra_body', async () => { + mockContentGeneratorConfig.samplingParams = { + temperature: 0.7, + extra_body: { foo: 'bar' }, + } as ContentGeneratorConfig['samplingParams']; + pipeline = new ContentGenerationPipeline(mockConfig); + + const request: GenerateContentParameters = { + model: 'test-model', + contents: [{ parts: [{ text: 'q' }], role: 'user' }], + config: { thinkingConfig: { includeThoughts: false } }, + }; + (mockConverter.convertGeminiRequestToOpenAI as Mock).mockReturnValue([]); + (mockConverter.convertOpenAIResponseToGemini as Mock).mockReturnValue( + new GenerateContentResponse(), + ); + (mockClient.chat.completions.create as Mock).mockResolvedValue({ + id: 't', + choices: [], + }); + + await pipeline.execute(request, 'p'); + + expect(mockClient.chat.completions.create).toHaveBeenCalledWith( + expect.objectContaining({ + extra_body: { foo: 'bar', enable_thinking: false }, + }), + expect.anything(), + ); + });🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/core/openaiContentGenerator/pipeline.test.ts` around lines 1344 - 1367, Add a regression test that ensures existing extra_body fields are preserved when thinking is disabled: create a new it(...) similar to the current test but have (mockConverter.convertGeminiRequestToOpenAI as Mock).mockReturnValue([{ samplingParams: { extra_body: { existing_key: 'value' } }, /* other needed fields */ }]); call pipeline.execute(request, 'p') with config.thinkingConfig.includeThoughts = false, and assert mockClient.chat.completions.create was called with expect.objectContaining({ extra_body: { existing_key: 'value', enable_thinking: false } }) (and the same signal expectation), so the pre-existing keys survive alongside enable_thinking:false.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/core/src/recap/recapGenerator.ts`:
- Around line 47-51: The selection is using getAllConfiguredModels() but ignores
auth type, so filter the list to only models matching the active model's
auth/provider before checking PREFERRED_RECAP_MODEL_ID: call
getAllConfiguredModels().filter(...) to keep entries whose authType/provider
matches the current config model (compare each model's authType or provider
field to config.getModel()'s corresponding field), then run the existing
.some((m) => m.id === PREFERRED_RECAP_MODEL_ID) and return { model:
PREFERRED_RECAP_MODEL_ID, isOverride: true } only when found; otherwise return {
model: config.getModel(), isOverride: false } as before.
---
Nitpick comments:
In `@packages/core/src/core/openaiContentGenerator/pipeline.test.ts`:
- Around line 1344-1367: Add a regression test that ensures existing extra_body
fields are preserved when thinking is disabled: create a new it(...) similar to
the current test but have (mockConverter.convertGeminiRequestToOpenAI as
Mock).mockReturnValue([{ samplingParams: { extra_body: { existing_key: 'value' }
}, /* other needed fields */ }]); call pipeline.execute(request, 'p') with
config.thinkingConfig.includeThoughts = false, and assert
mockClient.chat.completions.create was called with expect.objectContaining({
extra_body: { existing_key: 'value', enable_thinking: false } }) (and the same
signal expectation), so the pre-existing keys survive alongside
enable_thinking:false.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 26965a06-66fe-4fcd-8412-bf5659e90901
📒 Files selected for processing (4)
packages/core/src/core/openaiContentGenerator/pipeline.test.tspackages/core/src/core/openaiContentGenerator/pipeline.tspackages/core/src/recap/recapGenerator.test.tspackages/core/src/recap/recapGenerator.ts
| const available = config.getModelsConfig().getAllConfiguredModels(); | ||
| if (available.some((m) => m.id === PREFERRED_RECAP_MODEL_ID)) { | ||
| return { model: PREFERRED_RECAP_MODEL_ID, isOverride: true }; | ||
| } | ||
| return { model: config.getModel(), isOverride: false }; |
There was a problem hiding this comment.
Filter fast-alias selection by active model auth type to avoid silent recap misrouting.
On Line 47, getAllConfiguredModels() returns models across all auth types, but selection on Line 48 checks only ID. In mixed-provider configs, this can choose protolabs/fast from a different auth type and silently fail recap (caught and returned as null).
🔧 Proposed fix
function pickRecapModel(config: Config): {
model: string;
isOverride: boolean;
} {
const available = config.getModelsConfig().getAllConfiguredModels();
- if (available.some((m) => m.id === PREFERRED_RECAP_MODEL_ID)) {
+ const currentModel = config.getModel();
+ const currentAuthType = available.find((m) => m.id === currentModel)?.authType;
+ const hasFastForCurrentAuth = available.some(
+ (m) =>
+ m.id === PREFERRED_RECAP_MODEL_ID &&
+ (currentAuthType === undefined || m.authType === currentAuthType),
+ );
+
+ if (hasFastForCurrentAuth) {
return { model: PREFERRED_RECAP_MODEL_ID, isOverride: true };
}
return { model: config.getModel(), isOverride: false };
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/core/src/recap/recapGenerator.ts` around lines 47 - 51, The
selection is using getAllConfiguredModels() but ignores auth type, so filter the
list to only models matching the active model's auth/provider before checking
PREFERRED_RECAP_MODEL_ID: call getAllConfiguredModels().filter(...) to keep
entries whose authType/provider matches the current config model (compare each
model's authType or provider field to config.getModel()'s corresponding field),
then run the existing .some((m) => m.id === PREFERRED_RECAP_MODEL_ID) and return
{ model: PREFERRED_RECAP_MODEL_ID, isOverride: true } only when found; otherwise
return { model: config.getModel(), isOverride: false } as before.
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
…hinking-style scaffolding (#174) Empirical regression caught while smoke-testing #173 against the live gateway. Injecting extra_body.enable_thinking=false on the OpenAI pipeline's no-think path didn't disable thinking on protolabs/fast — it triggered the model to dump 6KB of "Here's a thinking process: 1. Analyze User Input..." Qwen-style scaffolding as visible content, truncating at max_tokens before reaching the actual answer. Comparison on the same gateway, same prompt: WITH extra_body.enable_thinking=false (#173 behavior): 6817ms | finish=length | 1500 tokens of thinking scaffolding, never reached the actual recap WITHOUT extra_body (this fix): 752ms | finish=stop | 37 tokens, clean 161-char recap Reverts the extra_body injection from buildReasoningConfig. The `thinkingConfig.includeThoughts: false` hint reverts to its prior semantic: skip the `reasoning` field (which still disables thinking on gpt-5 / glm-4.7 via their reasoning APIs), but do NOT try to flip the model-level enable_thinking flag. For Qwen3 / vLLM-served thinking models that can't be disabled via reasoning APIs, the right fix remains model routing — recap → the protolabs/fast alias landed in #173 and works as intended without any extra_body knob. Tests updated to assert the absence of the injection. Co-authored-by: Automaker <automaker@localhost> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Why
The post-agent recap (the "where we left off" 1-3 sentence card after long turns) was using whatever heavyweight model the main session is on. On Qwen3 / kimi / vLLM-served thinking models, that meant several seconds of CoT for what should be a quick summary — and the user reported it failing to send a message at all in their Qwen instance.
Two layered fixes, both opt-in / non-breaking for other callers.
What
1. Route recap to
protolabs/fastwhen availablerecapGenerator.pickRecapModel()scansconfig.getModelsConfig().getAllConfiguredModels()for theprotolabs/fastalias. When present, recap routes to it. Pipeline gets a small opt-in:```ts
request.config.allowModelOverride = true;
```
…tells the OpenAI pipeline to honor
request.modelinstead of the configured default. The existing "ignore request.model for safety" default stays in place for everyone else — recap is the one caller that knows the alias resolves.2. Make
thinkingConfig.includeThoughts: falseactually disable thinkingPreviously the hint only suppressed the
reasoningfield on the wire. That works for gpt-5 (controlled byreasoning.effort) and glm-4.7 (controlled byextra_body.thinking.enabled), but Qwen3 / vLLM-served models default to thinking-on at the model level — they needextra_body.enable_thinking: falsein the request body to actually skip CoT.buildReasoningConfig()now emitsextra_body.enable_thinking: falsewhenincludeThoughts === false. The injection composes with customsamplingParamsblocks via a shallow merge into existingextra_body, so users with custom sampler keys still get the no-think hint propagated for opted-in callers.Other backends ignore unknown
extra_bodykeys; LiteLLM strips per-model.Test plan
recapGenerator.test.ts(new) — model-picker behavior with alias present / absent, plus invariant thatthinkingConfig.includeThoughtsis alwaysfalsepipeline.test.ts— extra_body injection on the no-think path;allowModelOverriderespect/ignore; default-ignore preserved when the flag is absentnpm run typecheckcleannpm run lintcleanmodel=protolabs/fast🤖 Generated with Claude Code
Summary by CodeRabbit
Tests
Bug Fixes
New Features