fix(recap): drop extra_body.enable_thinking injection — it triggers thinking-style scaffolding#174
Conversation
…hinking-style scaffolding Empirical regression caught while smoke-testing #173 against the live gateway. Injecting extra_body.enable_thinking=false on the OpenAI pipeline's no-think path didn't disable thinking on protolabs/fast — it triggered the model to dump 6KB of "Here's a thinking process: 1. Analyze User Input..." Qwen-style scaffolding as visible content, truncating at max_tokens before reaching the actual answer. Comparison on the same gateway, same prompt: WITH extra_body.enable_thinking=false (#173 behavior): 6817ms | finish=length | 1500 tokens of thinking scaffolding, never reached the actual recap WITHOUT extra_body (this fix): 752ms | finish=stop | 37 tokens, clean 161-char recap Reverts the extra_body injection from buildReasoningConfig. The `thinkingConfig.includeThoughts: false` hint reverts to its prior semantic: skip the `reasoning` field (which still disables thinking on gpt-5 / glm-4.7 via their reasoning APIs), but do NOT try to flip the model-level enable_thinking flag. For Qwen3 / vLLM-served thinking models that can't be disabled via reasoning APIs, the right fix remains model routing — recap → the protolabs/fast alias landed in #173 and works as intended without any extra_body knob. Tests updated to assert the absence of the injection. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
WalkthroughThe OpenAI content generator pipeline is modified to simplify thinking configuration handling. When thinking is disabled, the pipeline no longer explicitly sends Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Review rate limit: 4/5 reviews remaining, refill in 12 minutes. Comment |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
packages/core/src/core/openaiContentGenerator/pipeline.test.ts (1)
1344-1369:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winExercise the branch this change actually affects.
This test still inherits
mockContentGeneratorConfig.samplingParamsfrombeforeEach, so it does not hit thebuildReasoningConfig()path that previously injectedextra_body. ClearsamplingParamshere, or add a second case, so the assertion validates the real no-thinking regression path.Suggested test setup tweak
it('does NOT inject extra_body.enable_thinking when thinkingConfig.includeThoughts is false', async () => { + mockContentGeneratorConfig.samplingParams = undefined; + pipeline = new ContentGenerationPipeline(mockConfig); + const request: GenerateContentParameters = { model: 'test-model', contents: [{ parts: [{ text: 'q' }], role: 'user' }], config: { thinkingConfig: { includeThoughts: false } }, };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/core/src/core/openaiContentGenerator/pipeline.test.ts` around lines 1344 - 1369, The test "does NOT inject extra_body.enable_thinking when thinkingConfig.includeThoughts is false" still inherits mockContentGeneratorConfig.samplingParams from the beforeEach and so never exercises the buildReasoningConfig() branch that would have injected extra_body; modify the test to clear or override mockContentGeneratorConfig.samplingParams (e.g., set it to undefined or {}) before calling pipeline.execute(request, 'p') so the reasoning-config path runs and the assertion on callArgs['extra_body'] actually validates the no-thinking regression path; reference mockContentGeneratorConfig.samplingParams, buildReasoningConfig(), and pipeline.execute(...) when making the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@packages/core/src/core/openaiContentGenerator/pipeline.test.ts`:
- Around line 1344-1369: The test "does NOT inject extra_body.enable_thinking
when thinkingConfig.includeThoughts is false" still inherits
mockContentGeneratorConfig.samplingParams from the beforeEach and so never
exercises the buildReasoningConfig() branch that would have injected extra_body;
modify the test to clear or override mockContentGeneratorConfig.samplingParams
(e.g., set it to undefined or {}) before calling pipeline.execute(request, 'p')
so the reasoning-config path runs and the assertion on callArgs['extra_body']
actually validates the no-thinking regression path; reference
mockContentGeneratorConfig.samplingParams, buildReasoningConfig(), and
pipeline.execute(...) when making the change.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8f85dbfb-e805-4d20-b993-46759f5fc5d0
📒 Files selected for processing (2)
packages/core/src/core/openaiContentGenerator/pipeline.test.tspackages/core/src/core/openaiContentGenerator/pipeline.ts
Code Coverage Summary
CLI Package - Full Text ReportCore Package - Full Text ReportFor detailed HTML reports, please see the 'coverage-reports-22.x-ubuntu-latest' artifact from the main CI run. |
Why
Smoke-testing #173 against the live gateway caught a real regression in the `extra_body.enable_thinking: false` injection. Injecting the flag didn't disable thinking on `protolabs/fast` — it caused the model to dump Qwen-style internal scaffolding ("Here's a thinking process: 1. Analyze User Input…") as visible content, blowing past max_tokens before reaching the actual answer.
Empirical evidence
Same gateway, same prompt:
The extra_body injection was strictly worse: 9× slower, truncated, and produced no usable summary.
What
Reverts the `extra_body.enable_thinking: false` emit from `buildReasoningConfig()`. The `thinkingConfig.includeThoughts: false` hint goes back to its prior semantic — skip the `reasoning` field (still disables thinking on gpt-5 / glm-4.7 via their reasoning APIs), do NOT try to flip the model-level enable_thinking flag.
For Qwen3 / vLLM-served thinking models that can't be disabled via reasoning APIs, the right fix remains model routing — recap → `protolabs/fast` alias landed in #173 and works as intended without any extra_body knob.
Tests updated:
Test plan
🤖 Generated with Claude Code
Summary by CodeRabbit
Refactor
Tests