fix(openai): pass samplingParams verbatim (#3458) by mabry1985 · Pull Request #116 · protoLabsAI/protoCLI

mabry1985 · 2026-04-26T07:38:31Z

Summary

Backports upstream QwenLM#3458 — when `samplingParams` is set on the contentGeneratorConfig, pass through verbatim without provider-specific re-mapping. Helps GPT-5 / o-series models routed through LiteLLM where caller-specified params should not be overridden.

Deferred

fix(core): recover from truncated tool calls via multi-turn continuation QwenLM/qwen-code#3313 (truncated tool call multi-turn recovery) — depends on `ESCALATED_MAX_TOKENS` constant + adaptive token escalation infrastructure we don't have. Skipping in this batch; can be ported separately when we want the broader recovery loop.

Test plan

111 affected tests passing locally
CI green

🤖 Generated with Claude Code

…wenLM#3320) Cherry-picked from QwenLM/qwen-code: - QwenLM#3499 fix(core): use empty string instead of null for reasoning-only assistant content. Some OpenAI-compatible providers (e.g. Ollama qwen3.5:9b) reject content: null with HTTP 400 when reasoning_content is also present. Tool-call-only messages keep null per OpenAI spec. - QwenLM#3630 fix(telemetry): switch FileExporter.serialize from JSON.stringify to safeJsonStringify. OTel ReadableSpans hold a BatchSpanProcessor back-reference that forms a cycle and crashed --telemetry-outfile users. - QwenLM#3320 fix(core): cap chokidar depth at 2 in SkillManager and skip .git / special file types. Prevents FD exhaustion when a skill dir contains node_modules etc., which silently broke node-pty I/O. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Backports upstream PR QwenLM#3525 + extends the per-stream context to also cover our fork's <think>-tag parser state. Bug: every caller of Config.getContentGenerator() — foreground turns, fork subagents, run_in_background subagents, ACP concurrent Agent calls (after QwenLM#3463) — shared a single OpenAIContentConverter, which held the StreamingToolCallParser as an instance field. Concurrent streams corrupted each other's tool-call buffers, surfacing as NO_RESPONSE_TEXT. Fix: - New ConverterStreamContext interface holds toolCallParser, thinkBuffer, inThinkTag — one per stream. - createStreamContext() factory replaces resetStreamingToolCalls(). - convertOpenAIChunkToGemini(chunk, ctx) and processThinkChunk(chunk, ctx) thread the context through every parser/think-buffer access. - ContentGenerationPipeline.processStreamWithLogging creates one context at stream entry. The error path no longer manually resets — the context is GC'd when the generator unwinds. Our protoInternal recovery-note logic is preserved on the new shape. Note: upstream's follow-up QwenLM#3550 (full stateless converter refactor) is deferred — it's hygiene without a functional bug; QwenLM#3525 alone fixes the concurrency race. Tests: - New createStreamContext describe replaces resetStreamingToolCalls suite - Streaming <think> tests use a per-test context - pipeline.test.ts mock updated to match the new API - pipeline.concurrent.test.ts (from upstream commit 38edd9d) drives two real concurrent streams and asserts neither corrupts the other's tool-call output (positive control: pre-fix, this test fails with exactly the user-reported bug shape). Refs upstream QwenLM#3516, QwenLM#3525. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…#3304) When switching models mid-session, reasoning_content fields from thinking-capable models leaked into API requests sent to the new provider, causing 422 errors on strict OpenAI-compatible endpoints. Call stripThoughtsFromHistory() in handleModelChange() so thought parts are removed before the next request is built for the new model.

Backport of upstream QwenLM#3505. Propagates MAX_TOKENS truncation from subagent responses into tool requests and rejects truncated edit calls before schema validation can surface misleading missing-parameter errors. Adapted to our fork's coreToolScheduler.ts which already had the truncation rejection block — kept both, dropped the unused clearRetryCountsForTool() call (we don't have that retry-counter machinery yet). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… catch-alls (QwenLM#3467) Backport of upstream QwenLM#3467. A permission rule with unbalanced parens was silently parsed with specifier: undefined, causing matchesRule to treat it as a catch-all. For deny rules this blocked all commands; for allow rules a typo could silently auto-approve everything. - Adds an invalid flag to PermissionRule - parseRule marks unbalanced-paren rules as invalid - matchesRule short-circuits invalid rules to never match - parseRules / addSession*Rule / addPersistentRule warn on malformed input - listRules filters invalid rules from /permissions UI Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…e sessions (GH#3579)

The test from upstream QwenLM#3304 backport assumed an in-place qwen-oauth model switch path that our fork doesn't have; the source-side fix in config.ts (stripThoughtsFromHistory call in handleModelChange) is preserved. Coverage will be re-added when the fork's switch flow stabilizes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…M#3463) Backport of upstream QwenLM#3463. When the model returns multiple Agent tool calls in a single turn, ACP Session was executing them sequentially in a for-loop, multiplying latency by sub-agent count. - Add private runToolCalls() helper that mirrors coreToolScheduler's partition logic: consecutive Agent calls form a parallel batch (safe because sub-agents have no shared mutable state); other tools form sequential batches. - Replace 2 for-loops in Session.ts with runToolCalls() calls. - Switch the AgentTool eventEmitter guard from key-presence check to truthy check (commit 651979c) — the key-presence check passed for { eventEmitter: undefined } and crashed inside SubAgentTracker.setup. Note: upstream replaced 3 for-loops; our fork only had 2 in those code paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

In ACP mode, the Mcp server list sent by the IDE client can include SSE (type: "sse") and HTTP (type: "http") transports, but the previous implementation only handled stdio servers via toStdioServer(). Non-stdio servers were silently skipped (continue), so any SSE/HTTP-configured MCP server would never be registered. Changes: - Add toSseServer() helper: detects type=="sse" servers and maps them to MCPServerConfig(url=..., headers=...) - Add toHttpServer() helper: detects type=="http" servers and maps them to MCPServerConfig(httpUrl=..., headers=...) - Refactor newSessionConfig() loop to handle all three transport types - Declare mcpCapabilities: { sse: true, http: true } in agentCapabilities so IDE clients know this agent supports these transports without needing a transparent proxy - Export the three helper functions for unit testing Tests: - Unit tests for toStdioServer / toSseServer / toHttpServer helpers (type discrimination, mutual exclusion) - Integration-style tests for QwenAgent.initialize() mcpCapabilities - Integration-style tests for newSession() with SSE/HTTP MCP servers, verifying MCPServerConfig is constructed with the correct arguments (url vs httpUrl, headers passthrough, empty-headers → undefined) Fixes QwenLM#3472

Previously pipeline.ts always hardcoded max_tokens as the output-token parameter name on the OpenAI-compatible path, falling back from samplingParams.max_tokens to request.config.maxOutputTokens to provider defaults. This broke GPT-5 / o-series on OpenAI and Azure OpenAI, which require max_completion_tokens and reject max_tokens with a 400 error. Fix: when the user provides samplingParams explicitly, treat it as the complete source of truth for the wire shape and pass its keys through verbatim. No client-injected defaults, no request fallbacks, no hardcoded parameter names. The user describes what the provider wants; the client trusts them. When samplingParams is absent, the historical default behavior (request fallback through temperature/top_p/.../max_tokens plus provider defaults) is preserved unchanged — existing users see no difference. Concretely, users can now set any of: samplingParams: { max_tokens: 4096 } # GPT-4 / Qwen / DeepSeek samplingParams: { max_completion_tokens: 4096 } # GPT-5 / o-series samplingParams: { reasoning_effort: 'medium' } # future knobs without waiting for a qwen-code release that adds model-specific branches. Signed-off-by: Gordon Lam (SH) <yeelam@microsoft.com>

coderabbitai · 2026-04-26T07:38:36Z

Warning

Rate limit exceeded

@mabry1985 has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 3 minutes and 27 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 3 minutes and 27 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 50baca9d-bb99-4884-b1b9-e73c62f0c0c9

📥 Commits

Reviewing files that changed from the base of the PR and between 90ea67d and 34db2bd.

📒 Files selected for processing (26)

packages/cli/src/acp-integration/acpAgent.ts
packages/cli/src/acp-integration/session/Session.ts
packages/core/src/agents/runtime/agent-core.ts
packages/core/src/agents/runtime/agent-headless.test.ts
packages/core/src/config/config.ts
packages/core/src/core/client.ts
packages/core/src/core/contentGenerator.ts
packages/core/src/core/coreToolScheduler.test.ts
packages/core/src/core/coreToolScheduler.ts
packages/core/src/core/openaiContentGenerator/converter.test.ts
packages/core/src/core/openaiContentGenerator/converter.ts
packages/core/src/core/openaiContentGenerator/pipeline.concurrent.test.ts
packages/core/src/core/openaiContentGenerator/pipeline.test.ts
packages/core/src/core/openaiContentGenerator/pipeline.ts
packages/core/src/core/openaiContentGenerator/provider/default.test.ts
packages/core/src/core/openaiContentGenerator/provider/default.ts
packages/core/src/permissions/permission-manager.test.ts
packages/core/src/permissions/permission-manager.ts
packages/core/src/permissions/rule-parser.ts
packages/core/src/permissions/types.ts
packages/core/src/services/sessionService.test.ts
packages/core/src/services/sessionService.ts
packages/core/src/skills/skill-manager.test.ts
packages/core/src/skills/skill-manager.ts
packages/core/src/telemetry/file-exporters.test.ts
packages/core/src/telemetry/file-exporters.ts

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch chore/backport-truncation-sampling

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Automaker and others added 10 commits April 26, 2026 00:08

fix(core): preserve reasoning_content during session resume and activ…

9bb90d7

…e sessions (GH#3579)

mabry1985 merged commit 660af5d into dev Apr 26, 2026
2 of 3 checks passed

mabry1985 deleted the chore/backport-truncation-sampling branch April 26, 2026 07:38

mabry1985 mentioned this pull request Apr 26, 2026

promote: dev -> main (5 upstream backport PRs through v0.26.9) #118

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(openai): pass samplingParams verbatim (#3458)#116

fix(openai): pass samplingParams verbatim (#3458)#116
mabry1985 merged 10 commits into
devfrom
chore/backport-truncation-sampling

mabry1985 commented Apr 26, 2026

Uh oh!

coderabbitai Bot commented Apr 26, 2026

Rate limit exceeded

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

mabry1985 commented Apr 26, 2026

Summary

Deferred

Test plan

Uh oh!

coderabbitai Bot commented Apr 26, 2026

Rate limit exceeded

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants