Skip to content

Fix issue #654 Respect --model parameter when using custom base URL#30

Open
BingqingLyu wants to merge 1 commit into
mainfrom
fork-pr-694-issue-654
Open

Fix issue #654 Respect --model parameter when using custom base URL#30
BingqingLyu wants to merge 1 commit into
mainfrom
fork-pr-694-issue-654

Conversation

@BingqingLyu

@BingqingLyu BingqingLyu commented Apr 27, 2026

Copy link
Copy Markdown
Owner

TLDR

Fixed the --model parameter being ignored when using a custom base URL. The CLI now correctly respects the model specified via -m or --model flags when using alternative API endpoints.

Dive Deeper

The issue was in the createContentGeneratorConfig function in packages/core/src/core/contentGenerator.ts. When using OpenAI auth type (triggered when a custom base URL is set), the code was overriding the model selection with the OPENAI_MODEL environment variable or falling back to DEFAULT_QWEN_MODEL, completely bypassing the model specified through the CLI's --model parameter.

The fix ensures proper priority ordering:

  1. Model from config (includes CLI --model parameter) - highest priority
  2. Environment variable (OPENAI_MODEL or QWEN_MODEL) - fallback
  3. Default model constant - last resort

This same logic was applied to both OpenAI and QWEN OAuth authentication paths to ensure consistency.

Reviewer Test Plan

To validate this fix:

  1. Test with custom base URL and model parameter:

    export OPENAI_API_KEY="your-key"
    export OPENAI_BASE_URL="https://your-custom-endpoint.com"
    qwen --model gpt-4 "test prompt"
    # Should use gpt-4, not qwen3-coder-plus or coder-model
  2. Test that environment variable still works as fallback:

    export OPENAI_MODEL="gpt-3.5-turbo"
    qwen "test prompt"  # No --model flag
    # Should use gpt-3.5-turbo from env var
  3. Test precedence (CLI flag should override env var):

    export OPENAI_MODEL="gpt-3.5-turbo"
    qwen --model gpt-4 "test prompt"
    # Should use gpt-4 from CLI, not gpt-3.5-turbo from env

Testing Matrix

🍏 🪟 🐧
npm run
npx
Docker
Podman - -
Seatbelt - -

Linked issues / bugs

Fixes QwenLM#654

@BingqingLyu

BingqingLyu commented May 7, 2026

Copy link
Copy Markdown
Owner Author

Conflict Group 1

This PR shares modified functions with 2 other PR(s): #10, #25.

These PRs should be reviewed as a batch — merging one may affect the others.

Function File Also modified by
getSeedSource contentGenerator.ts #25
setSource contentGenerator.ts #10, #25
graph LR
    PR30["PR #30"]
    FgetSeedSource_5386["getSeedSource<br>contentGenerator.ts"]
    PR30 -->|modifies| FgetSeedSource_5386
    PR25["PR #25"]
    PR25 -->|modifies| FgetSeedSource_5386
    FsetSource_5386["setSource<br>contentGenerator.ts"]
    PR30 -->|modifies| FsetSource_5386
    PR10["PR #10"]
    PR10 -->|modifies| FsetSource_5386
    PR25 -->|modifies| FsetSource_5386
Loading

Posted by codegraph-ai conflict detection.

BingqingLyu pushed a commit that referenced this pull request May 7, 2026
…rovider (QwenLM#3788)

* fix(core): inject thinking blocks for DeepSeek anthropic-compatible provider

DeepSeek's anthropic-compatible endpoint
(https://api.deepseek.com/anthropic) rejects follow-up requests with
HTTP 400 ("The content[].thinking in the thinking mode must be passed
back to the API.") whenever a prior assistant turn carrying tool_use
omits a thinking block. The model can legitimately return a tool round
without thinking text, so qwen-code stored no thought parts and rebuilt
the next request with no thinking block, tripping the API's check.

Mirroring the existing OpenAI-side fix (QwenLM#3729, QwenLM#3747), the converter
now detects DeepSeek by base URL or model name and prepends an empty
{ type: 'thinking', thinking: '', signature: '' } block to assistant
turns missing one. Other anthropic-protocol providers are unaffected.

Verified against the live api.deepseek.com/anthropic endpoint:
- assistant with tool_use, no thinking → 400 (reproduces QwenLM#3786)
- assistant with tool_use, empty thinking injected → 200 OK

Refs QwenLM#3786

* fix(core): gate DeepSeek thinking-block injection on thinking mode

Address PR review feedback:

1. (Critical) Gate empty-thinking injection on the same per-request
   condition that emits the top-level `thinking` parameter. The previous
   implementation injected unconditionally on DeepSeek providers, but
   `buildThinkingConfig()` may omit `thinking` when reasoning=false or
   `thinkingConfig.includeThoughts=false` — which is exactly what
   suggestionGenerator / ArenaManager / forkedAgent do. Shipping
   thinking blocks without enabling thinking mode is a protocol
   violation that DeepSeek may reject. Move the option from converter
   constructor to a per-request `convertGeminiRequestToAnthropic`
   parameter so the generator can compute the gate correctly.

2. (CodeQL) Replace `baseUrl.includes('api.deepseek.com')` with
   `new URL(baseUrl).hostname` exact-match. The substring check would
   accept spoofed hosts like `api.deepseek.com.evil.com`.

3. Document the empty-signature workaround inline.

4. Rename the misleading "redacted_thinking" test case.

* fix(core): narrow DeepSeek thinking injection to tool_use turns + subdomain test

Address PR review round 2:

1. Narrow injection scope to assistant turns containing tool_use. Live
   verification against api.deepseek.com/anthropic showed plain-text
   assistant turns without thinking are accepted unchanged — only
   tool_use turns trigger the HTTP 400. Injecting on every assistant
   turn unnecessarily bloats replay history with synthetic blocks the
   API does not require. Existing thinking blocks on any turn are still
   preserved untouched.

2. Add test coverage for the subdomain hostname branch
   (us.api.deepseek.com → matches), addressing the gap noted in review.

3. Update existing negative-case tests (non-deepseek / spoofed /
   reasoning=false / includeThoughts=false) to use tool_use scenarios
   so they actually exercise the gating logic instead of trivially
   passing under the narrowed scope.

* docs(core): align DeepSeek thinking-injection comments with narrowed scope

Address PR review round 3 (copilot-pull-request-reviewer × 3): comments
in three locations still described the constraint as applying to "any
prior assistant turn", which was true before commit 8721b41 but no
longer matches the implementation. Update the doc comment on
isDeepSeekAnthropicProvider and the two test-suite header comments to
state the actual narrower contract: the API rejects only tool-use turns
that omit thinking blocks; plain-text assistant turns are accepted
unchanged.

Comment-only change; 58 tests still pass.

* fix(core): per-request DeepSeek detection + strip thinking when off

Address PR review round 4 (copilot-pull-request-reviewer × 2):

1. Stale provider-detection cache (HCia). The constructor cached
   isDeepSeekProvider once, but Config.setModel() mutates
   contentGeneratorConfig.model in place. After a runtime /model switch
   from a non-DeepSeek model to a DeepSeek one on the same auth config,
   buildRequest() would keep using the stale flag. Move the detection
   into buildRequest so each call sees the current model. The detector
   is cheap (URL parse + string compare).

2. Real thought parts leak through when thinking is disabled (HCib).
   The previous gate only blocked synthetic injection — but the
   converter still replayed any existing `thought: true` parts in
   request.contents as thinking blocks. Code paths that disable
   thinking against a session whose history was built with thinking on
   (suggestionGenerator / ArenaManager / forkedAgent) would still emit
   thinking blocks alongside an absent top-level `thinking` config —
   the same protocol mismatch the gate was meant to avoid.

   Add a `stripAssistantThinking` converter option, set in buildRequest
   to `isDeepSeek && !thinking`. The converter strips thinking and
   redacted_thinking blocks from assistant messages before message
   construction completes. Mirror behavior is already proven safe by
   live verification (DeepSeek currently tolerates either shape, but
   stripping makes the request body internally consistent and robust
   to future validation tightening).

3 new tests:
- converter strips thinking from assistant turns when option set
- generator strips real thought parts when reasoning=false
- generator reflects runtime model changes (no stale cache)

61 tests pass; lint + typecheck clean.

* fix(core): preserve thinking-only assistant turns instead of emitting empty content

Address PR review round 5 (copilot-pull-request-reviewer × 2 — code +
test):

stripThinkingFromAssistantMessages previously replaced message.content
with the filtered array unconditionally. For an assistant turn whose
only blocks are thinking/redacted_thinking (e.g. a round cut off by
max_tokens before any text or tool_use was emitted), this left
`content: []` — which Anthropic API rejects.

Dropping the message entirely was considered but would break the
required user/assistant alternation. Instead, fall back to leaving the
original blocks in place when stripping would empty the message.
DeepSeek empirically tolerates the residual `thinking-block +
no-thinking-config` shape (verified against api.deepseek.com/anthropic
in the V2/X scenarios), so leaving the message untouched is the safer
choice than emitting invalid structure.

Add regression test for the thinking-only turn shape.

62 tests pass; lint + typecheck clean.

* fix(core): validate thinking-block signature, rename option, gate output_config

Address PR review round 6 — five substantive items:

1. (Critical) Drop non-compliant thinking blocks lacking a `signature`
   field and replace them with a synthetic one. A `redacted_thinking`
   block round-tripped through Gemini Part format becomes
   `{ text: '', thought: true }` (no thoughtSignature) and converts
   back to `{ type: 'thinking', thinking: '' }` without `signature` —
   not spec-compliant. The previous `hasThinking` check accepted these
   as already-satisfying, leaving non-compliant blocks in the wire
   message. Tighten the check so they're filtered out and the
   synthetic injection runs. Live verification: DeepSeek currently
   tolerates both shapes (lenient), but normalizing is defensively
   correct against future tightening.

2. Rename converter option `ensureAssistantThinking` →
   `ensureThinkingOnToolUseTurns`. The new name reflects the actual
   contract (tool-use turns only, not every assistant turn).

3. Honor `thinkingConfig.includeThoughts: false` in `buildOutputConfig`.
   Previously a per-request opt-out dropped the top-level `thinking`
   parameter but still emitted `output_config.effort`, leaking a
   reasoning-shaped field into side queries that don't want it.

4. Add regression test for mixed text + tool_use assistant turns
   (common shape: model says something, then calls a tool).

5. Add explicit test for the signature-validation path: an existing
   compliant thinking block (with signature) is preserved untouched.

64 tests pass; lint + typecheck clean.

* fix(core): clean up non-compliant thinking blocks on plain-text turns + assert output_config gating

Address PR review round 7 (copilot-pull-request-reviewer × 2):

1. (Hp-x) Round-tripped redacted_thinking blocks were left malformed on
   assistant turns lacking tool_use. The previous structure only ran
   the cleanup pass when a tool_use block was present (early return on
   `!hasToolUse`), so plain-text turns kept the non-compliant
   `{ type: 'thinking', thinking: '' }` shape. Restructure into two
   sequential steps:
     a. Drop non-compliant thinking blocks (no `signature`) on every
        assistant turn — same fallback that avoids `content: []` if
        the message is thinking-only.
     b. Inject the synthetic empty thinking block on tool_use turns
        that still lack a compliant thinking block after step (a).

2. (Hp-7) The includeThoughts=false test asserted that the top-level
   `thinking` field is suppressed but didn't cover `output_config`,
   leaving regressions in the new `buildOutputConfig` gate uncaught.
   Tighten the assertion to also verify `output_config` is absent.

3. New converter test: cleanup runs on plain-text assistant turns too.

65 tests pass; lint + typecheck clean.

* test(core): add explicit redacted_thinking injection-path coverage

Address PR review round 8 (#30 — copilot reviewer). The converter
treats `redacted_thinking` as already satisfying the thinking-block
requirement (no synthetic injected), distinguished from a
signature-less `thinking` block which is non-compliant and gets
dropped/replaced. Existing tests covered the latter path; this adds
explicit coverage of the former.

processContent doesn't synthesize redacted_thinking from Gemini parts,
so the test reaches into the private helper directly. (#31 — subdomain
hostname coverage — already exists at line 602.)

66 tests pass; lint + typecheck clean.

* fix(core): per-request anthropic-beta + normalize thinking-only turns

Address PR review round 9 (copilot-pull-request-reviewer × 2):

1. (Hydz) thinking-only assistant turns (e.g. max_tokens cutoff or
   round-tripped redacted_thinking) hit the cleanup-empties fallback
   and kept the original non-compliant `{ type: 'thinking',
   thinking: '' }` block. The fallback now replaces the message with
   a synthetic empty thinking block (`signature: ''` included), which
   keeps the message non-empty AND spec-compliant.

2. (Hyd4) `anthropic-beta` was set once at construction from the global
   `reasoning` config, so requests with per-request
   `thinkingConfig.includeThoughts=false` still advertised
   interleaved-thinking / effort even though the body had dropped the
   matching fields. Move beta computation to a new
   `buildPerRequestHeaders` that derives the header from the actual
   `thinking` / `output_config` fields present in the request body, and
   pass it via `messages.create(..., { headers })`. The wire shape is
   now internally consistent.

Test updates:
- Drop the three constructor-time beta assertions; they no longer apply.
- Add four per-request header tests covering: both betas present,
  only interleaved-thinking, reasoning=false (no betas), and per-request
  includeThoughts=false (no betas).

67 tests pass; lint + typecheck clean.

* fix(core): preserve thinking text by normalizing in place + merge user beta flags

Address PR review round 10 (copilot-pull-request-reviewer × 2):

1. (H0oF) The previous cleanup filtered out every thinking block missing
   a `signature` field. But that shape is the normal output from
   OpenAI/Gemini/agent-runtime generators, which only set `thought:
   true` without a signature. Users switching providers mid-session
   would silently lose preserved thinking text on the first DeepSeek
   request. Change Step 1 to NORMALIZE in place: when a thinking block
   has no signature, set `signature: ''` rather than dropping the
   block. The original `thinking` text is preserved; DeepSeek
   empirically accepts empty signatures so the wire shape stays valid.

2. (H0oL) `buildPerRequestHeaders()` overwrote
   `customHeaders['anthropic-beta']` whenever the per-request override
   fired, regressing the customHeaders escape hatch for unrelated
   Anthropic beta features. Merge the user's flags into the computed
   list (deduped) so users can stack their own betas alongside
   interleaved-thinking / effort.

Test changes:
- Renamed and rewrote "drops non-compliant... plain-text" test to
  assert in-place normalization that preserves thinking text.
- Updated "replaces a non-compliant thinking block" comment + name to
  describe the normalization (the assertion was already correct because
  the test happened to use empty thinking text).
- The empty-content fallback in Step 1 is no longer reachable under
  the new logic, so the dedicated thinking-only-turn test now exercises
  only the strip path (where it remains relevant).
- Added 3 customHeaders[anthropic-beta] tests: merge with computed,
  passthrough when no thinking/effort, dedupe.

70 tests pass; lint + typecheck clean.

* docs(core): align thinking-injection comments with normalize semantics + add stream test

Address PR review round 11 (copilot-pull-request-reviewer × 3):

1. (H3Iw) Update the `ensureThinkingOnToolUseTurns` option docstring
   to describe in-place normalization (preserving thinking text by
   filling in `signature: ''`) instead of the old drop-and-replace
   semantics.

2. (H3I9) Same update on the `applyEmptyThinkingToToolUseTurns` helper
   JSDoc — clarify that signature-less thinking blocks are normalized
   in place (preserving original text), not dropped. Mention the
   common case of cross-provider history where non-Anthropic
   generators only set `thought: true`.

3. (H3I3) Add a streaming test asserting that
   `generateContentStream()` also attaches the per-request
   `anthropic-beta` header. The previous coverage only exercised
   `generateContent()`, leaving the streaming path's separate code
   path (line 144 in anthropicContentGenerator.ts) unverified.

71 tests pass; lint + typecheck clean.

* refactor(core): split DeepSeek thinking option in two + add header coexistence test

Address PR review round 12 (copilot-pull-request-reviewer × 2):

1. (H6ws) The single `ensureThinkingOnToolUseTurns` option was
   misleadingly narrow: the implementation also rewrote non-tool-use
   turns by normalizing malformed thinking blocks. Future callers
   could enable it expecting only the tool-use behavior. Split into
   two precisely-named options:
     - normalizeAssistantThinkingSignature: fill missing `signature`
       on every assistant `thinking` block (cross-provider history
       compat).
     - injectThinkingOnToolUseTurns: prepend synthetic empty thinking
       on tool_use turns missing one (issue QwenLM#3786 trigger).
   The generator wires both together for DeepSeek when thinking mode
   is on; either can be used independently if a future caller needs
   only one pass.

2. (H6w4) Add a test asserting that the per-request `headers` path
   coexists correctly with `customHeaders`: User-Agent and unrelated
   customHeaders entries stay in `defaultHeaders` while only the
   computed `anthropic-beta` rides on the per-request path. Defends
   against a future regression where header config might be routed
   through a code path that wipes the constructor defaults.

72 tests pass; lint + typecheck clean.

* fix(core): case-insensitive customHeaders[anthropic-beta] merge

Address yiliang114 review feedback (QwenLM#3788).

HTTP header names are case-insensitive by spec, and the Anthropic SDK
lower-cases them during merge. Previously buildPerRequestHeaders only
read the lower-case `anthropic-beta` key from customHeaders, so a
user-configured `Anthropic-Beta` or `ANTHROPIC-BETA` would be silently
overwritten by the per-request computed value.

Replace the direct dict lookup with collectCustomBetaFlags() which
walks all customHeaders entries and matches the key case-insensitively.
Multiple matching entries (unlikely but possible) are concatenated; the
existing dedupe pass handles any duplicates.

Add a regression test for both `Anthropic-Beta` and `ANTHROPIC-BETA`
key shapes.

73 tests pass; lint + typecheck clean.

* docs(core): align thinking-injection docs with normalize-in-place semantics + redacted_thinking strip test

Address PR review round 14 (copilot-pull-request-reviewer × 4):

1. (IAl4) PR description still described "dropped here so synthetic
   injection takes over" but the implementation now normalizes
   signature-less thinking blocks in place (preserving text). PR
   description rewritten to describe the two-pass model:
   normalize-in-place + injection-when-truly-missing.

2. (IAl7) `injectThinkingOnToolUseTurns` option docstring claimed
   signature-less blocks would be "seen as missing" so the synthetic
   replaces them. Updated to describe the actual flow: the
   normalization pass runs first, blocks become compliant in place,
   the injector then sees them as already-satisfying and prepends
   nothing. Helper JSDoc on `injectEmptyThinkingOnToolUseTurns` fixed
   the same way.

3. (IAl8) Strip-path coverage missed `redacted_thinking` blocks. Added
   regression test that verifies both thinking and redacted_thinking
   blocks are removed when `stripAssistantThinking` is set.

4. (IAl-) Renamed the converter test suite from "thinking-mode
   injection + normalization (DeepSeek thinking on)" to "DeepSeek
   thinking-mode normalization, injection, and stripping" so the
   title accurately covers all behavior the block exercises (including
   `stripAssistantThinking` cases later in the same describe).

74 tests pass; lint + typecheck clean.

* fix(core): exclude anthropic-beta variants from defaultHeaders to avoid wire duplication

Address PR review round 15 (copilot-pull-request-reviewer #1).

`buildHeaders()` previously spread the entire `customHeaders` map into
the SDK's `defaultHeaders`. After moving anthropic-beta computation to
the per-request path, a user-configured mixed-case `Anthropic-Beta`
key would survive in defaultHeaders verbatim, while the per-request
override added a lowercase `anthropic-beta`. The wire then carried two
physical headers for the same logical name — SDK behavior on duplicate
headers with different casings is undefined.

`buildPerRequestHeaders()` already merges those user flags
case-insensitively (commit 0d8b5de), so dropping the entry from
defaultHeaders is the right boundary: the per-request path owns the
header end-to-end. Other customHeaders entries continue to pass
through.

Add a regression test asserting no `Anthropic-Beta` (any casing) lands
in defaultHeaders while unrelated customHeaders are kept.

75 tests pass; lint + typecheck clean.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

conflicting-group-1 Conflicting PR group 1 — review as a batch conflicting-pr Shares at least one cross-PR dependency with other PRs

Projects

None yet

Development

Successfully merging this pull request may close these issues.

--model parameter ignored when using a different baseurl

2 participants