Skip to content

fix(core): use per-model settings for fast model side queries#3815

Merged
wenshao merged 8 commits into
QwenLM:mainfrom
B-A-M-N:fix/fast-model-side-query-settings
May 5, 2026
Merged

fix(core): use per-model settings for fast model side queries#3815
wenshao merged 8 commits into
QwenLM:mainfrom
B-A-M-N:fix/fast-model-side-query-settings

Conversation

@B-A-M-N

@B-A-M-N B-A-M-N commented May 3, 2026

Copy link
Copy Markdown
Contributor

Summary

Fixes #3765. When side queries (session recap, title generation, tool-use summary) run on the fast model, they previously used the main model's ContentGeneratorConfig, causing extra_body, samplingParams, and reasoning settings to leak from the main model into fast model requests.

For example, if the main model has extra_body.enable_thinking: true and the fast model has extra_body.enable_thinking: false, side queries would still send enable_thinking: true to the fast model because the main model's config was used.

Changes

  • `client.ts`: In GeminiClient.generateContent(), when the requested model differs from the main model, resolve the target model's own ContentGeneratorConfig via buildAgentContentGeneratorConfig() (same pattern used for subagents) and create a dedicated ContentGenerator with the correct per-model settings.

    Falls back to the main content generator if the model is not in the registry or if creation fails (e.g. test environments without full auth).

  • `client.test.ts`: Add 2 tests:

    • Verifies getResolvedModel is called when model differs from main
    • Verifies getResolvedModel is NOT called when model matches main

Validation

  • npm run build — 0 errors
  • npx vitest run packages/core/src/core/client.test.ts — 67 tests pass
  • npx vitest run packages/core/src/services/sessionRecap.test.ts — passes
  • npx vitest run packages/core/src/services/sessionTitle.test.ts — 15 tests pass
  • npx vitest run packages/core/src/services/toolUseSummary.test.ts — 35 tests pass
  • Pre-commit hooks (prettier + eslint) pass

Risk

Low. Only affects the code path where model !== mainModel in generateContent(). The main model path is unchanged. The fallback to the main content generator ensures graceful degradation if the target model is not in the registry.

@B-A-M-N B-A-M-N force-pushed the fix/fast-model-side-query-settings branch 2 times, most recently from 2af695d to 3bbad53 Compare May 3, 2026 22:59
// Note: there is currently no "fallback mode" model routing; the model used
// is always the one explicitly requested by the caller.
});

@wenshao wenshao May 4, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] createContentGeneratorForModel has zero test coverage for its success path.

The test "should resolve per-model config when the requested model differs" explicitly acknowledges reliance on createContentGenerator implicitly failing in the test environment ("Since createContentGenerator will fail in the test env (no auth), it falls back to the main content generator"), but never mocks createContentGenerator to verify the success path. The core purpose of this PR — calling buildAgentContentGeneratorConfig to construct the correct per-model config for the target model — is untested.

Impact: Regressions in buildAgentContentGeneratorConfig or createContentGenerator integration would pass all existing tests. Per-model extra_body, samplingParams, and other behaviors that customers rely on are untested.

Suggested change
// Mock createContentGenerator to return a test double, then assert:
// (a) buildAgentContentGeneratorConfig receives the correct arguments (authType, baseUrl, envKey from resolvedModel)
// (b) the returned dedicated content generator is actually called instead of mockContentGenerator

— deepseek-v4-pro via Qwen Code /review

Comment thread packages/core/src/core/client.ts Outdated
);

return await createContentGenerator(targetConfig, this.config);
} catch {

@wenshao wenshao May 4, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The catch {} block in createContentGeneratorForModel is completely silent — zero logging.

If createContentGenerator throws due to validation failure (missing API key, unsupported authType), all diagnostic information is discarded. In production, the fast model appears to "work" (falling back to the main generator), but abnormal behavior becomes difficult to diagnose.

Suggested change
} catch {
} catch (err: unknown) {
debugLogger.warn(
`Failed to create content generator for model "${model}", falling back to main generator.`,
err instanceof Error ? err.message : String(err),
);
return this.getContentGeneratorOrFail();
}

— deepseek-v4-pro via Qwen Code /review

COMPRESSION_TOKEN_THRESHOLD,
} from '../services/chatCompressionService.js';
import { LoopDetectionService } from '../services/loopDetectionService.js';
import { buildAgentContentGeneratorConfig } from '../models/content-generator-config.js';

@wenshao wenshao May 4, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] buildAgentContentGeneratorConfig is imported from ../models/content-generator-config.js, but placed under the // Services section comment.

This import comes from ../models/, while all other imports are consistent with their section titles. This breaks the carefully maintained import grouping convention in the file.

Suggested change
import { buildAgentContentGeneratorConfig } from '../models/content-generator-config.js';
// Move this import to after the // Services section and before the // Tools section, or create a new // Models section

— deepseek-v4-pro via Qwen Code /review

B-A-M-N added a commit to B-A-M-N/qwen-code that referenced this pull request May 4, 2026
…ueries

Three changes from the review:

1. Test coverage for createContentGenerator success path — Added a test that
   mocks createContentGenerator to return a dedicated content generator and
   verifies it is used instead of the main content generator. Includes a
   module-level vi.mock for createContentGenerator with beforeEach override.

2. Debug logging in createContentGeneratorForModel catch block — Added
   debugLogger.warn with model name and error details so production failures
   (missing API key, unsupported authType) are diagnosable instead of silently
   falling back to the main generator.

3. Import grouping — Moved buildAgentContentGeneratorConfig import from
   under // Services to a new // Models section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@B-A-M-N

B-A-M-N commented May 4, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for the review — all three items have been addressed.

createContentGenerator success path test coverage — Added a test (should use a dedicated content generator for the fast model on success) that mocks createContentGenerator to return a test double, then verifies:

  • buildAgentContentGeneratorConfig is called with the correct arguments (config, model ID, resolved model settings)
  • The dedicated content generator's generateContent is called instead of the main one
  • The main content generator is NOT called

Silent catch block — Added debugLogger.warn with the model name and error details in the catch block of createContentGeneratorForModel, so production failures (missing API key, unsupported authType) are diagnosable.

Import grouping — Moved buildAgentContentGeneratorConfig import from under // Services to a new // Models section.

Validation:

  • npm run build — 0 errors
  • npx vitest run packages/core/src/core/client.test.ts — 66 tests passed
  • Pre-commit hooks (prettier + eslint) — passed

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: retryWithBackoff uses main model's authType for fast model queries

In generateContent (line ~1117), retryWithBackoff receives this.config.getContentGeneratorConfig()?.authType — the main model's authType. After this PR, the content generator is now per-model, but the retry logic still uses the main model's authType. In retry.ts, QWEN_OAUTH quota exhaustion detection checks against the wrong provider when the fast model differs from the main model. Currently protected by getFastModel() which validates same authType, but this is a future maintenance risk if cross-provider fast models are ever supported.

Suggestion: Missing test for !resolvedModel fallback path

The createContentGeneratorForModel method has a if (!resolvedModel) early return branch that is not covered by tests. The existing fallback test covers the catch block (when createContentGenerator rejects), but not the case where getResolvedModel returns undefined. Consider adding a test that mocks getResolvedModel to return undefined and verifies the main generator is used and buildAgentContentGeneratorConfig is not called.


— deepseek-v4-pro via Qwen Code /review

Comment thread packages/core/src/core/client.ts Outdated
.getModelsConfig()
.getResolvedModel(authType, model);

if (!resolvedModel) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] The !resolvedModel early return path has no logging, unlike the catch block which now has debugLogger.warn. When getResolvedModel returns undefined (model not found in registry), the fallback to the main content generator is completely silent. This makes model configuration errors invisible in production.

Suggested change
if (!resolvedModel) {
if (!resolvedModel) {
debugLogger.warn(
`Model "${model}" not found in registry for authType "${authType}", falling back to main generator.`,
);
return this.getContentGeneratorOrFail();
}

— deepseek-v4-pro via Qwen Code /review

@B-A-M-N

B-A-M-N commented May 4, 2026

Copy link
Copy Markdown
Contributor Author

Commit: 1c53cce33a2fd53a0a8b5204ce58cf12d87751d0

I have addressed the reviewer feedback regarding per-model settings risk:

  • Refactored config.getFastModel() to getFastModelConfig(authType) to prevent cross-auth inheritance of model settings.
  • Systematically updated all call sites in core and CLI to use the new resolution logic, ensuring fast models are correctly bound to their corresponding auth scope.
  • Added diagnostics and unified resolution logic across background tasks (session recap, side queries, agent forking).

This change ensures that fast model configuration is consistent across the entire application and eliminates cross-auth contamination. Please let me know if there's anything else needed!

B-A-M-N added a commit to B-A-M-N/qwen-code that referenced this pull request May 4, 2026
- Use fast model's authType for retryWithBackoff instead of main model's authType.
  Added createRetryAuthTypeForModel() helper that resolves the target model's
  authType from the registry, so QWEN_OAUTH quota detection checks against the
  right provider.
- Add debugLogger.debug() to the !resolvedModel early return path so model
  registry misses are diagnosable instead of silent.
- Add test: model not in registry falls back to main generator without throwing.
- Add test: fast model authType is used for retry, not main model authType.
B-A-M-N added a commit to B-A-M-N/qwen-code that referenced this pull request May 4, 2026
Verifies buildAgentContentGeneratorConfig is NOT called when
getResolvedModel returns undefined (model not in registry).
Addresses PR QwenLM#3815 review round 3 feedback from @wenshao.
@B-A-M-N

B-A-M-N commented May 4, 2026

Copy link
Copy Markdown
Contributor Author

Thanks for catching that gap.

The existing test for the registry-miss fallback verified the main generator was used, but didn't assert that `buildAgentContentGeneratorConfig` was never called. Added that assertion.

All reviewer concerns across the three review rounds should now be covered:

  1. Catch block diagnostics — `debugLogger.warn` with model name and error details ✓ (commit 1951992)
  2. Import grouping — `buildAgentContentGeneratorConfig` moved to `// Models` section ✓ (commit 1951992)
  3. Success path test coverage — dedicated test with mocked `createContentGenerator` ✓ (commit 1951992)
  4. `!resolvedModel` logging — `debugLogger.debug` on registry miss ✓ (commit 087bf0e)
  5. `!resolvedModel` test coverage — assertion that `buildAgentContentGeneratorConfig` is not called on registry miss ✓ (commit 7cbfbaf)
  6. Retry authType — `createRetryAuthTypeForModel()` resolves the target model's authType for retry logic ✓ (commit 087bf0e)

Validation:

  • `npx vitest run packages/core/src/core/client.test.ts` — 68 tests passed
  • `npm run build` — 0 errors

Comment thread packages/core/src/core/client.ts Outdated
): Promise<ContentGenerator> {
try {
const authType =
this.config.getContentGeneratorConfig()?.authType ??

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] createContentGeneratorForModel 使用主模型的 authType 在模型注册表中查找目标模型(getResolvedModel(authType, model))。如果快速模型注册在不同的 auth type 下(如主模型用 QWEN_OAUTH,快模型用 Anthropic),查找会返回 undefined,静默回退到主 ContentGenerator——但请求仍携带快模型的 model ID。当前调用者均限制为同 provider 所以无实际影响,但值得面向未来加固。

Suggested change
this.config.getContentGeneratorConfig()?.authType ??
// Search across all authTypes instead of only the main config's authType:
private resolveModelAcrossProviders(modelId: string) {
const modelsConfig = this.config.getModelsConfig();
const authTypes = [
AuthType.QWEN_OAUTH, AuthType.USE_OPENAI,
AuthType.USE_VERTEX_AI, AuthType.USE_ANTHROPIC,
];
const mainAuthType = this.config.getContentGeneratorConfig()?.authType;
if (mainAuthType) {
const resolved = modelsConfig.getResolvedModel(mainAuthType as AuthType, modelId);
if (resolved) return resolved;
}
for (const at of authTypes) {
if (at === mainAuthType) continue;
const resolved = modelsConfig.getResolvedModel(at, modelId);
if (resolved) return resolved;
}
return undefined;
}

— deepseek-v4-pro via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in commit 5fdc505. Added resolveModelAcrossAuthTypes() helper that searches across all authTypes to find the target model, handling the cross-provider case. The main authType is tried first for early exit, then remaining authTypes are iterated.

A cleaner data-layer solution (adding getModelAcrossAuthTypes to ModelRegistry) is noted for a follow-up PR.

Validation:

  • npx vitest run packages/core/src/core/client.test.ts — 68 tests passed
  • npm run build — 0 errors

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in commit 09b44d2. Changes:

  1. Cross-provider lookup — Added resolveModelAcrossAuthTypes() that searches all authTypes (QWEN_OAUTH, USE_OPENAI, USE_VERTEX_AI, USE_ANTHROPIC, USE_GEMINI). Main authType tried first for early exit. createContentGeneratorForModel and createRetryAuthTypeForModel both use it. Same-provider and cross-provider cases now handled.

  2. Caching — Added perModelGeneratorCache (Map<string, ContentGenerator>) so side queries (recap, title, tool summary) reuse the generator after first instantiation.

  3. Orphan JSDoc fixed — Moved to directly above createContentGeneratorForModel.

  4. Fallback visibility — Promoted missing-model log from debug to warn so misconfigured model settings are visible in production.

A cleaner data-layer solution (adding getModelAcrossAuthTypes to ModelRegistry) is TODO'd in code for a follow-up PR.

Validation: npx vitest run packages/core/src/core/client.test.ts — 68/68 passed. npm run build — 0 errors.

Comment thread packages/core/src/core/client.ts Outdated
AuthType.USE_OPENAI;
const resolvedModel = this.config
.getModelsConfig()
.getResolvedModel(authType, model);

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Critical] 跨 provider 的 authType 不匹配导致 fast model 静默解析失败

createContentGeneratorForModel(以及 createRetryAuthTypeForModel L1180)使用主 model 的 authType 来查找目标 model:

const authType = this.config.getContentGeneratorConfig()?.authType ?? AuthType.USE_OPENAI;
const resolvedModel = this.config.getModelsConfig().getResolvedModel(authType, model);

ModelRegistry.getModel(authType, modelId) 是按 Map<AuthType, Map<string, ResolvedModelConfig>> 索引的。如果主 model 使用 QWEN_OAUTH,而 fast model 注册在 openai 下,getResolvedModel 返回 undefined,代码静默回退到主 content generator(带着错误凭证)。

测试 mock 了 getResolvedModel 使其在任何 authType 下都返回结果,掩盖了这个问题。

Suggested change
.getResolvedModel(authType, model);
// 修复方向:遍历所有已注册的 authType 来解析 model,或在 ModelRegistry 中添加无 authType 范围的方法
// 同时修复 createRetryAuthTypeForModel(L1180)中的相同问题
// 测试 mock 应强制执行 authType 匹配以反映真实行为

— deepseek-v4-pro via Qwen Code /review

Comment thread packages/core/src/core/client.ts Outdated

return await createContentGenerator(targetConfig, this.config);
} catch (err: unknown) {
debugLogger.warn(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] 回退路径无可观测性——仅有 debug 日志

catch 块仅调用 debugLogger.warn,不会输出到 stderr 或结构化遥测。生产环境中 fast model 静默降级时(包括因 Finding 1 导致的降级),仪表盘上无任何信号。

Suggested change
debugLogger.warn(
// 在 catch 和 !resolvedModel(L1196)分支中:
// 1. 调用 logApiError 写入结构化遥测
// 2. 递增 fast_model_fallback_total 计数器
// 3. 至少输出一条 stderr warning(与 retry heartbeat 风格一致)

— deepseek-v4-pro via Qwen Code /review

AuthType.USE_OPENAI)
: this.config.getContentGeneratorConfig()?.authType;

const contentGenerator = isPerModel

@wenshao wenshao May 5, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] ContentGenerator created outside retry loop — transient failures become permanent degradation

createContentGeneratorForModel executes before retryWithBackoff. If it fails for transient reasons (OAuth token refresh race, DNS timeout), it permanently falls back to the main generator with no retry opportunity.

Suggested change
const contentGenerator = isPerModel
// Wrap createContentGeneratorForModel in retry logic, or at least locally retry transient errors before degrading

— deepseek-v4-pro via Qwen Code /review

// the target model's provider, not the main session's provider. This
// ensures QWEN_OAUTH quota detection checks against the right provider.
const retryAuthType = isPerModel
? (this.createRetryAuthTypeForModel(model) ??

@wenshao wenshao May 5, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] retryAuthType mismatched with actual generator on fallback

When createRetryAuthTypeForModel successfully resolves the fast model's authType (e.g. openai), but createContentGeneratorForModel fails and falls back to the main generator (e.g. QWEN_OAUTH), retryAuthType remains openai — while the actual API call uses QWEN_OAUTH.

Suggested change
? (this.createRetryAuthTypeForModel(model) ??
// On the fallback path in createContentGeneratorForModel, recompute retryAuthType to match the generator actually used

— deepseek-v4-pro via Qwen Code /review

return resolvedModel?.authType;
}

private async createContentGeneratorForModel(

@wenshao wenshao May 5, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] ContentGenerator recreated on every side query (no caching)

Each call to createContentGeneratorForModel creates a ContentGenerator from scratch (including new OpenAI(...) HTTP client and connection pool). Session recap and tool summaries may be triggered multiple times within the same conversation turn.

Suggested change
private async createContentGeneratorForModel(
// Cache ContentGenerator keyed by model string in GeminiClient, invalidate on setModel
private perModelGenerators = new Map<string, ContentGenerator>();

— deepseek-v4-pro via Qwen Code /review

expect(buildAgentContentGeneratorConfig).not.toHaveBeenCalled();
});

it('should use fast model authType for retry, not main model authType', async () => {

@wenshao wenshao May 5, 2026

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] retryAuthType test is a false positive — actual retryAuthType never verified

The "should use fast model authType for retry" test only asserts that getResolvedModel was called, but never verifies the actual authType value passed to retryWithBackoff. retryWithBackoff isn't even mocked in this test file.

Suggested change
it('should use fast model authType for retry, not main model authType', async () => {
// Mock retryWithBackoff and assert the authType it receives:
// expect(mockRetryWithBackoff).toHaveBeenCalledWith(
// expect.any(Function),
// expect.objectContaining({ authType: 'openai' }),
// );

— deepseek-v4-pro via Qwen Code /review

@wenshao

wenshao commented May 5, 2026

Copy link
Copy Markdown
Collaborator

Review

Build + 68 tests pass. The fix is correct for the documented scenario (same provider, fast model in registry). A few concerns about scope and missing coverage.


🟡 Substantive

1. The cross-provider scenario from the issue is not actually fixed.

Issue #3765 explicitly calls out: "If the fast model lives on a different provider/base URL than the main model, side queries today would silently go to the wrong endpoint with the wrong key. It only 'works' if both models happen to share the same base URL and API key."

But the PR's lookup uses the main authType:

const authType = this.config.getContentGeneratorConfig()?.authType ?? AuthType.USE_OPENAI;
const resolvedModel = this.config.getModelsConfig().getResolvedModel(authType, model);

If the fast model is registered under a different authType in modelProviders, getResolvedModel(mainAuthType, fastModelId) returns undefined → falls through :1195-1200 → falls back to the main generator → bug not fixed for this case.

The same-provider case (the OP's setup, both on Dashscope) is fixed. The cross-provider case is not. Suggestions:

  • Add findModelAcrossAuthTypes(modelId) to ModelsConfig that iterates orderedAuthTypes.
  • Or in createContentGeneratorForModel, fall through to a cross-provider lookup when the same-authType lookup fails.

If cross-provider is intentionally out of scope, please update the PR description and issue thread so #3765 isn't closed under a broader claim than what's delivered.

2. No caching — every side query rebuilds the ContentGenerator.

createContentGeneratorForModel runs the full getResolvedModelbuildAgentContentGeneratorConfigcreateContentGenerator chain on every call, ending in new OpenAI(...). A typical session triggers fast-model side queries 3–5x (recap + title + tool summaries). Connection pooling is OK because getOrCreateSharedDispatcher is shared, but the SDK-level instantiation is wasted work.

private perModelGeneratorCache = new Map<string, ContentGenerator>();

private async createContentGeneratorForModel(model: string) {
  const cached = this.perModelGeneratorCache.get(model);
  if (cached) return cached;
  // ... existing build logic ...
  this.perModelGeneratorCache.set(model, generator);
  return generator;
}

Cache invalidation needed when fast-model settings change (/model --fast <id> or settings hot reload). Simplest is to clear on any Config change.

3. Tests verify routing, not the actual behavior change.

All 5 new tests assert "getResolvedModel was called" or "buildAgentContentGeneratorConfig was called" — none assert that the fast model's extra_body.enable_thinking: false actually appears in the outgoing request body instead of the main model's enable_thinking: true.

That's the user-visible behavior in #3765. If someone later breaks buildAgentContentGeneratorConfig's field-passthrough (e.g. the cross-provider clear path), all current tests still pass and the bug regresses silently.

Suggested: a test that mocks createContentGenerator to return a spy, asserts spy.generateContent(req) receives a req.config containing the fast model's extra_body and not the main model's.

4. 'should use fast model authType for retry, not main model authType' doesn't test what it claims.

The assertion only verifies getResolvedModel was called with the main authType. It doesn't actually exercise the retry path or assert that the retry's authType parameter is the fast model's. Either rename, or mock retryWithBackoff and inspect the authType it receives.


🟢 Smaller items

5. Orphan JSDoc.

client.ts:1159-1168 has a JSDoc block describing "Return a ContentGenerator for a specific model..." but the function directly below it is createRetryAuthTypeForModel (which has its own correct JSDoc). The orphan JSDoc was meant for createContentGeneratorForModel, which sits further down with no JSDoc. IDE hover currently shows nothing for the function it's supposed to describe. Move the block to the right function.

6. getResolvedModel is invoked 3 times per side-query call.

Once in createRetryAuthTypeForModel, once in createContentGeneratorForModel, and once internally inside buildAgentContentGeneratorConfig. All O(1) Map lookups, but conceptually it's the same lookup repeated. Could be folded — resolve once and pass the result through.

7. Fallback log is too quiet.

When createContentGenerator throws, :1216 warns at debug level and falls back to the main generator. With debug logs off (default in production), users get a successful response but their fast-model-specific settings silently don't apply — exactly the symptom #3765 was filed to fix. If modelProviders is misconfigured or envKey is missing, the user has no signal.

Consider promoting the fallback to a one-time visible warning (via eventEmitter, ChatRecording, or stderr).


Verification

  • npx vitest run packages/core/src/core/client.test.ts — 68/68 ✅

Priority

  1. Must address: pre-release: fix ci #1 — either fix cross-provider, or explicitly scope it out in PR description and the linked issue.
  2. Should fix: 如何自定义密钥文件 .env可能与其他文件冲突 #3 (the actual-behavior regression test) and TypeError in Authentication Selection Interface #5 (orphan JSDoc).
  3. Recommended: Where is the config saved? #2 caching, API Key是要设成阿里云的API Key吗? #7 visible fallback signal, Are you interested in AI Terminal? #4 test rename, OpenAI API Error: 401 Incorecct API Key provided #6 lookup fold.

Direction is right and the per-model hook lands in a sensible place. The main thing is making sure scope is explicit so the issue reporter's secondary complaint isn't silently dropped.

B-A-M-N added a commit to B-A-M-N/qwen-code that referenced this pull request May 5, 2026
Address PR QwenLM#3815 review round 3: createContentGeneratorForModel and
createRetryAuthTypeForModel now search across all authTypes when resolving
a model, preventing silent fallback when the fast model is registered
under a different authType than the main model.

Added resolveModelAcrossAuthTypes() helper in client.ts that tries
the main authType first for early exit, then iterates remaining authTypes.
A cleaner data-layer solution (adding getModelAcrossAuthTypes to
ModelRegistry) is noted for a follow-up PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
B-A-M-N added a commit to B-A-M-N/qwen-code that referenced this pull request May 5, 2026
…kup, caching, JSDoc, visible fallback

Address review feedback from wenshao (deepseek-v4-pro) on PR QwenLM#3815:

1. Cross-provider model resolution for fast model side queries
   - Added resolveModelAcrossAuthTypes() that searches across all authTypes
   - Handles case where fast model is registered under a different authType
     than the main model (e.g. main=QWEN_OAUTH, fast=USE_ANTHROPIC)
   - createContentGeneratorForModel and createRetryAuthTypeForModel now use it
   - Main authType tried first for early exit, then remaining authTypes iterated
   - TODO in code to move to ModelRegistry in a follow-up PR

2. Per-model ContentGenerator caching
   - Added perModelGeneratorCache (Map<string, ContentGenerator>)
   - Avoids rebuilding generator on every side query (recap, title, tool summary)
   - Typical session triggers 3-5 side queries; now only first instantiates

3. Fixed orphan JSDoc block
   - Moved "Return a ContentGenerator..." JSDoc from between functions
     to directly above createContentGeneratorForModel

4. Promoted fallback logging from debug to warn
   - When model not found across all authTypes, logs at warn level
   - Makes misconfigured model settings visible in production
   - Prevents silent degradation (the QwenLM#3765 symptom)

Validation:
- npx vitest run packages/core/src/core/client.test.ts — 68/68 passed
- npm run build — 0 errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Round 4 review

Real progress this round — resolveModelAcrossAuthTypes lands the cross-provider fix from round 3 #1 (the must-fix), the orphan JSDoc is moved to the right function, and the fallback log is promoted from debug to warn. CI is green and 68/68 tests still pass. That's the bulk of round 3 addressed correctly.

But three things from the previous review are either undone or partially done in a way that creates a new gap, and the new ~80 lines have zero test coverage.


🔴 Cache claims invalidation it doesn't have

/**
 * Cache of per-model ContentGenerators keyed by model ID.
 * Avoids rebuilding the generator (SDK instantiation, config resolution)
 * on every side query (recap, title, tool summary).
 * Cleared on config changes that could affect model settings.   ← never happens
 */
private perModelGeneratorCache = new Map<string, ContentGenerator>();

I grep'd the whole file — nothing ever clears this Map. The closing line of the JSDoc is a promise the code does not keep, which is worse than no comment at all because future readers (and reviewers) will trust it.

Concrete failure mode: user edits modelProviders.fast.extra_body.enable_thinking from true to false mid-session. Same model name → cache hit → returns the pre-edit ContentGenerator → the new setting is silently ignored. That's literally the #3765 symptom. The cache fix to one perf problem reintroduces the original correctness bug under config edits.

Pick one:

  • Implement what the JSDoc claims — wire a hook on the config-change path (or expose clearPerModelGeneratorCache() and call it from the same place that swaps contentGeneratorConfig).
  • Drop the comment line and change scope — make the cache short-lived (e.g. cleared on resetChat() or scoped per side-query batch). Then the JSDoc should describe the actual lifetime.

Either is fine. The current state — caching + JSDoc lying about invalidation — is not.

🔴 ~80 lines of new code with 0 tests

packages/core/src/core/client.test.ts diff is empty in this commit. The new logic is non-trivial:

  1. resolveModelAcrossAuthTypes cross-authType iteration — main authType hits / main misses but secondary hits / all miss. None covered.
  2. perModelGeneratorCache hit path — second call with same model should not invoke createContentGenerator. Not covered.

Round 3 #1 was a must-fix; merging the implementation without tests means a future change to getResolvedModel ordering, the hardcoded allAuthTypes array, or the cache key strategy will silently regress. Suggested skeletons:

it('resolveModelAcrossAuthTypes: falls through to a non-main authType when main lookup misses', async () => {
  // mock getModelsConfig().getResolvedModel:
  //   (mainAuthType, 'fast-id') → undefined
  //   (USE_ANTHROPIC, 'fast-id') → { authType: USE_ANTHROPIC, ... }
  // expect createContentGeneratorForModel('fast-id') to use the USE_ANTHROPIC config
});

it('perModelGeneratorCache: second call for same model reuses generator', async () => {
  const createSpy = vi.mocked(createContentGenerator);
  await client['createContentGeneratorForModel']('fast-id'); // build
  await client['createContentGeneratorForModel']('fast-id'); // hit
  expect(createSpy).toHaveBeenCalledTimes(1);
});

🟡 Round 3 #3 (actual-behavior regression test) and #4 (test rename) untouched

These were should-fix last round and are unchanged in this commit:

  • #3 — still no test asserting the fast model's extra_body actually appears in the outgoing request, only that the right config-resolution function was invoked. The user-visible behavior in #3765 is still not pinned.
  • #4'should use fast model authType for retry, not main model authType' still asserts only that getResolvedModel was called with the main authType, not that the retry path actually receives the fast model's authType.

Carry forward, please.


🟢 Smaller items (non-blocking)

  • Hardcoded allAuthTypes array is acknowledged by the in-code TODO ("Move cross-authType resolution to ModelRegistry"). Fine as a stop-gap, but Object.values(AuthType) is a one-line change that would auto-cover any future authType added to the enum:

    for (const authType of Object.values(AuthType)) {
      if (authType === mainAuthType) continue;
      ...
    }

    Same correctness, lower maintenance footprint.

  • debugLogger.warn visibility — depends on whether the qwen-code debugLogger gates warn behind a debug flag. If yes, this is still effectively silent in production and round 3 #7 isn't really fixed; if no, it's good. Quick way to check: grep for whatever sets the underlying debug enable bit. If it gates warn, surface via eventEmitter / stderr instead.


Priority

  1. Must fix before approve: cache invalidation (🔴 above) + tests for the new code (🔴 above).
  2. Should fix: carry forward round 3 #3 + #4.
  3. Optional: Object.values(AuthType) swap, debugLogger.warn visibility check.

Net: this is genuinely close. The cross-provider fix is correct, the cache is the right idea — just needs invalidation wired up and a couple of tests on the new code, plus the two carry-overs.

B-A-M-N added a commit to B-A-M-N/qwen-code that referenced this pull request May 5, 2026
…r generator cache and fix false-positive retryAuthType test
@wenshao

wenshao commented May 5, 2026

Copy link
Copy Markdown
Collaborator

Quick FYI before you spin another round:

Lint CI is red on 7c7cdda — the new 'should cache per-model content generators' test calls client.generateContent(contents, {}, undefined, 'fast-model') with undefined for the third arg, which is typed as AbortSignal:

src/core/client.test.ts(3434,50): error TS2345: Argument of type 'undefined' is not assignable to parameter of type 'AbortSignal'.
src/core/client.test.ts(3438,50): error TS2345: ...

tsc --build fails → Test job gets skipped. Pass an AbortSignal (e.g. new AbortController().signal) and it should go green.

Also one substantive thing — I think round 4 #1 got crossed wires. The "cache claims invalidation it doesn't have" item wasn't about a race condition; it's about stale cache after settings edits: user edits modelProviders.fast.extra_body.enable_thinking mid-session → same model name → cache hit → returns the pre-edit ContentGenerator → new setting silently ignored (the original #3765 symptom).

Switching Map<string, ContentGenerator> to Map<string, Promise<ContentGenerator>> is a nice independent improvement (avoids redundant SDK instantiation under concurrent side queries, and the delete(model) on failure is a good touch), but it solves the concurrent build problem, not the stale entry problem. The JSDoc line "Cleared on config changes that could affect model settings" is still describing behavior the code doesn't have — please either wire actual invalidation, or shrink the cache scope (e.g. clear on resetChat() / config swap) and update the JSDoc to match.

Also still outstanding from round 4: a test for resolveModelAcrossAuthTypes (the cross-authType fallback path — main miss → secondary hit). Currently 0 coverage on that ~30 lines.

Happy to take another careful pass once lint is green and those two are addressed.

B-A-M-N pushed a commit to B-A-M-N/qwen-code that referenced this pull request May 5, 2026
- Fix lint error: pass AbortSignal instead of undefined in cache test
- Fix cache JSDoc: remove false claim about config-change invalidation,
  add actual cache clear in resetChat()
- Add test: resolveModelAcrossAuthTypes cross-authType fallback (main miss → secondary hit)
- Add test: per-model generator cache cleared on resetChat
@B-A-M-N B-A-M-N force-pushed the fix/fast-model-side-query-settings branch from c25c0c0 to 1d033db Compare May 5, 2026 17:13
@B-A-M-N

B-A-M-N commented May 5, 2026

Copy link
Copy Markdown
Contributor Author

Commit: 1d033dbd2 (1d033dbd)

Thanks for the thorough round 4 review. All three items addressed:

1. Cache JSDoc lying about invalidation (🔴): Fixed. Removed the false "Cleared on config changes" line from the JSDoc. Added this.perModelGeneratorCache.clear() in resetChat() so the cache is actually cleared on session reset. JSDoc now says "Cleared on session reset (resetChat) to pick up config changes" — which matches the code.

2. ~80 lines with 0 tests (🔴): Added two new tests:

  • resolveModelAcrossAuthTypes: falls through to a non-main authType when main lookup misses — mocks main authType returning undefined, secondary authType returning the model, verifies both calls and that createContentGenerator is invoked with the cross-provider config.
  • per-model generator cache cleared on resetChat — verifies cache hit on second call, then after resetChat(), verifies the generator is recreated (cache was cleared).

3. Lint CI red: Fixed the undefined AbortSignal type error in the cache test by passing new AbortController().signal.

Validation: npx vitest run packages/core/src/core/client.test.ts — 71/71 passed. npx tsc --build — 0 errors in changed files (pre-existing eslint import/no-internal-modules errors in base branch are unrelated).

@wenshao

wenshao commented May 5, 2026

Copy link
Copy Markdown
Collaborator

Thanks — 1d033dbd2 lands all three round-4 critical items cleanly:

  • ✅ Cache invalidation: resetChat() now clears perModelGeneratorCache, JSDoc updated to describe actual behavior (no more lying line). Caveat trade-off — settings edits without resetChat() are still stale-window, but the contract is now honest, fine as-is.
  • ✅ Cross-authType test (should resolve model across authTypes when main authType misses) — covers the main miss → secondary hit path.
  • ✅ Cache invalidation test (should clear per-model generator cache on resetChat).
  • ✅ Lint TS error fixed with abortController.signal.

Only thing left: branch is currently CONFLICTING against main (CI hasn't run on 1d033dbd2 yet because of it). Please rebase onto latest main and I'll approve once CI comes back green. Round-3 carry-over #3 (direct extra_body passthrough assertion) is fine as a follow-up — the new tests cover the user-visible #3765 path indirectly, no need to block on it here.

…wenLM#3765)

When side queries (session recap, title generation, tool summary) run on the
fast model, they previously used the main model's ContentGeneratorConfig,
causing extra_body/samplingParams/reasoning settings to leak from the main
model into fast model requests.

Changes:
- client.ts: In generateContent(), when the requested model differs from the
  main model, resolve the target model's own ContentGeneratorConfig via
  buildAgentContentGeneratorConfig() and create a dedicated ContentGenerator.
  Falls back to the main content generator if the model is not in the registry
  or if creation fails (e.g. test environments).
- client.test.ts: Add 2 tests verifying per-model config resolution is
  attempted for different models and skipped when model matches main.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
B-A-M-N and others added 7 commits May 5, 2026 12:31
…ueries

Three changes from the review:

1. Test coverage for createContentGenerator success path — Added a test that
   mocks createContentGenerator to return a dedicated content generator and
   verifies it is used instead of the main content generator. Includes a
   module-level vi.mock for createContentGenerator with beforeEach override.

2. Debug logging in createContentGeneratorForModel catch block — Added
   debugLogger.warn with model name and error details so production failures
   (missing API key, unsupported authType) are diagnosable instead of silently
   falling back to the main generator.

3. Import grouping — Moved buildAgentContentGeneratorConfig import from
   under // Services to a new // Models section.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Use fast model's authType for retryWithBackoff instead of main model's authType.
  Added createRetryAuthTypeForModel() helper that resolves the target model's
  authType from the registry, so QWEN_OAUTH quota detection checks against the
  right provider.
- Add debugLogger.debug() to the !resolvedModel early return path so model
  registry misses are diagnosable instead of silent.
- Add test: model not in registry falls back to main generator without throwing.
- Add test: fast model authType is used for retry, not main model authType.
Verifies buildAgentContentGeneratorConfig is NOT called when
getResolvedModel returns undefined (model not in registry).
Addresses PR QwenLM#3815 review round 3 feedback from @wenshao.
Address PR QwenLM#3815 review round 3: createContentGeneratorForModel and
createRetryAuthTypeForModel now search across all authTypes when resolving
a model, preventing silent fallback when the fast model is registered
under a different authType than the main model.

Added resolveModelAcrossAuthTypes() helper in client.ts that tries
the main authType first for early exit, then iterates remaining authTypes.
A cleaner data-layer solution (adding getModelAcrossAuthTypes to
ModelRegistry) is noted for a follow-up PR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…kup, caching, JSDoc, visible fallback

Address review feedback from wenshao (deepseek-v4-pro) on PR QwenLM#3815:

1. Cross-provider model resolution for fast model side queries
   - Added resolveModelAcrossAuthTypes() that searches across all authTypes
   - Handles case where fast model is registered under a different authType
     than the main model (e.g. main=QWEN_OAUTH, fast=USE_ANTHROPIC)
   - createContentGeneratorForModel and createRetryAuthTypeForModel now use it
   - Main authType tried first for early exit, then remaining authTypes iterated
   - TODO in code to move to ModelRegistry in a follow-up PR

2. Per-model ContentGenerator caching
   - Added perModelGeneratorCache (Map<string, ContentGenerator>)
   - Avoids rebuilding generator on every side query (recap, title, tool summary)
   - Typical session triggers 3-5 side queries; now only first instantiates

3. Fixed orphan JSDoc block
   - Moved "Return a ContentGenerator..." JSDoc from between functions
     to directly above createContentGeneratorForModel

4. Promoted fallback logging from debug to warn
   - When model not found across all authTypes, logs at warn level
   - Makes misconfigured model settings visible in production
   - Prevents silent degradation (the QwenLM#3765 symptom)

Validation:
- npx vitest run packages/core/src/core/client.test.ts — 68/68 passed
- npm run build — 0 errors

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…r generator cache and fix false-positive retryAuthType test
- Fix lint error: pass AbortSignal instead of undefined in cache test
- Fix cache JSDoc: remove false claim about config-change invalidation,
  add actual cache clear in resetChat()
- Add test: resolveModelAcrossAuthTypes cross-authType fallback (main miss → secondary hit)
- Add test: per-model generator cache cleared on resetChat
@B-A-M-N B-A-M-N force-pushed the fix/fast-model-side-query-settings branch from 1d033db to 1a5e2eb Compare May 5, 2026 17:35
@B-A-M-N

B-A-M-N commented May 5, 2026

Copy link
Copy Markdown
Contributor Author

Rebased onto latest main (commit 1a5e2ebf0). The only change vs. 1d033dbd2 is resolving the resetChat() conflict — main added FileReadCache.clear() and our branch adds `perModelGeneratorCache.clear()); both are now present. Type-check and tests are clean (73/80 pass; the 7 failures are pre-existing on main). Ready for your approval once CI goes green.

@wenshao

wenshao commented May 5, 2026

Copy link
Copy Markdown
Collaborator

Local verification

Check Result
npm run build ✅ passes
client.test.ts (incl. the 8 new fast-model cases) on this branch ✅ 80/80
sessionTitle.test.ts, toolUseSummary.test.ts ✅ 50/50
Revert packages/core/src/core/client.ts to origin/main while keeping the new tests 7/8 fail

The revert step is the meaningful one — it confirms the new tests actually exercise the fix and aren't tautological. The single case that still passes is should use the main content generator when the requested model matches the main model, which covers the unchanged main-model path; that's expected.

Failures map cleanly to the symptoms of #3765:

  • should use a dedicated content generator for the fast model on success — fast model never got its own generator
  • should fall back to main generator when model is not in registrygetResolvedModel was never consulted
  • should use fast model authType for retry, not main model authType — retry used qwen-oauth (main) instead of openai (fast)
  • should cache per-model content generators / ... on resetChat — cache machinery absent
  • should resolve model across authTypes when main authType misses — no cross-authType resolution

Minor: PR description lists sessionRecap.test.ts under Validation, but that file doesn't exist in the repo (only sessionRecap.ts).

@wenshao wenshao left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Verified locally per comment above; regression check confirms the new tests catch the bug. LGTM.

@wenshao wenshao merged commit c7facf5 into QwenLM:main May 5, 2026
13 checks passed
B-A-M-N added a commit to B-A-M-N/qwen-code that referenced this pull request May 5, 2026
- Fix lint error: pass AbortSignal instead of undefined in cache test
- Fix cache JSDoc: remove false claim about config-change invalidation,
  add actual cache clear in resetChat()
- Add test: resolveModelAcrossAuthTypes cross-authType fallback (main miss → secondary hit)
- Add test: per-model generator cache cleared on resetChat
DragonnZhang pushed a commit that referenced this pull request May 8, 2026
Fixes #3765. Side queries (session recap, title generation, tool-use summary) running on the fast model previously inherited the main model's ContentGeneratorConfig, leaking extra_body / samplingParams / reasoning between models.

GeminiClient.generateContent() now resolves the target model's own config via buildAgentContentGeneratorConfig() when the requested model differs from the main model, and creates a dedicated ContentGenerator (cached, cleared on resetChat). Cross-authType resolution lets the fast model live under a different provider than the main model. Uses the target model's authType for retry logic so provider-specific checks (e.g. QWEN_OAUTH quota detection) reference the correct provider. Falls back to the main generator if the model is not in the registry.
xaelistic pushed a commit to xaelistic/qwen-code that referenced this pull request Jun 7, 2026
…#3815)

Fixes QwenLM#3765. Side queries (session recap, title generation, tool-use summary) running on the fast model previously inherited the main model's ContentGeneratorConfig, leaking extra_body / samplingParams / reasoning between models.

GeminiClient.generateContent() now resolves the target model's own config via buildAgentContentGeneratorConfig() when the requested model differs from the main model, and creates a dedicated ContentGenerator (cached, cleared on resetChat). Cross-authType resolution lets the fast model live under a different provider than the main model. Uses the target model's authType for retry logic so provider-specific checks (e.g. QWEN_OAUTH quota detection) reference the correct provider. Falls back to the main generator if the model is not in the registry.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Side queries on the fast model use the main model's per-model settings

3 participants