Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
83 changes: 83 additions & 0 deletions packages/core/src/core/openaiContentGenerator/converter.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -1812,6 +1812,89 @@ describe('OpenAIContentConverter', () => {

expect(response.candidates).toEqual([]);
});

it('maps DeepSeek prompt cache hit tokens into cached content token count', () => {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] This test covers the non-streaming path (convertOpenAIResponseToGemini), but the streaming path (convertOpenAIChunkToGemini) also calls getCachedPromptTokens and has no corresponding test for prompt_cache_hit_tokens.

DeepSeek's streaming API may include prompt_cache_hit_tokens in the final chunk's usage object. The streaming usage assembly has its own estimation fallback logic (totalTokens > 0 && promptTokens === 0 && completionTokens === 0) that interacts with cachedTokens — this interaction is untested.

Consider adding a mirror test that feeds a chunk with prompt_cache_hit_tokens through convertOpenAIChunkToGemini and asserts cachedContentTokenCount is populated.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in cf0a82c. The streaming convertOpenAIChunkToGemini path now has coverage for provider-specific prompt_cache_hit_tokens in chunk usage metadata.

const response = converter.convertOpenAIResponseToGemini(
{
object: 'chat.completion',
id: 'chatcmpl-deepseek-cache',
created: 123,
model: 'deepseek-v4-pro',
choices: [],
usage: {
prompt_tokens: 21225,
completion_tokens: 45,
total_tokens: 21270,
prompt_cache_hit_tokens: 21120,
},
} as unknown as OpenAI.Chat.ChatCompletion,
requestContext,
);

expect(response.usageMetadata).toEqual(
expect.objectContaining({
promptTokenCount: 21225,
candidatesTokenCount: 45,
totalTokenCount: 21270,
cachedContentTokenCount: 21120,
}),
);
});

it('prefers standard cached token fields over provider-specific fallbacks', () => {
const response = converter.convertOpenAIResponseToGemini(
{
object: 'chat.completion',
id: 'chatcmpl-cache-precedence',
created: 123,
model: 'deepseek-v4-pro',
choices: [],
usage: {
prompt_tokens: 400,
completion_tokens: 20,
total_tokens: 420,
prompt_tokens_details: { cached_tokens: 100 },
cached_tokens: 200,
prompt_cache_hit_tokens: 300,
},
} as unknown as OpenAI.Chat.ChatCompletion,
requestContext,
);

expect(response.usageMetadata).toEqual(
expect.objectContaining({
cachedContentTokenCount: 100,
}),
);
});

it('maps DeepSeek prompt cache hit tokens from streaming chunks', () => {
const chunk = converter.convertOpenAIChunkToGemini(
{
object: 'chat.completion.chunk',
id: 'chunk-deepseek-cache',
created: 456,
choices: [],
model: 'deepseek-v4-pro',
usage: {
prompt_tokens: 21225,
completion_tokens: 45,
total_tokens: 21270,
prompt_cache_hit_tokens: 21120,
},
} as unknown as OpenAI.Chat.ChatCompletionChunk,
withStreamParser(),
);

expect(chunk.usageMetadata).toEqual(
expect.objectContaining({
promptTokenCount: 21225,
candidatesTokenCount: 45,
totalTokenCount: 21270,
cachedContentTokenCount: 21120,
}),
);
});
});

describe('OpenAI -> Gemini reasoning content', () => {
Expand Down
29 changes: 15 additions & 14 deletions packages/core/src/core/openaiContentGenerator/converter.ts
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@ const debugLogger = createDebugLogger('CONVERTER');
*/
interface ExtendedCompletionUsage extends OpenAI.CompletionUsage {
cached_tokens?: number;
prompt_cache_hit_tokens?: number;
}

export interface ExtendedChatCompletionAssistantMessageParam
Expand Down Expand Up @@ -69,6 +70,16 @@ export interface ExtendedCompletionChunkDelta
// so it preserves catch-rate without silently suppressing legitimate chunks.
const CUMULATIVE_DELTA_EXACT_REPEAT_MIN_LENGTH = 64;

function getCachedPromptTokens(usage: OpenAI.CompletionUsage): number {
const extendedUsage = usage as ExtendedCompletionUsage;
return (
usage.prompt_tokens_details?.cached_tokens ??

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] No test verifies the ?? precedence chain: prompt_tokens_details.cached_tokens > cached_tokens > prompt_cache_hit_tokens. This PR adds a third fallback branch, making the relative ordering meaningful.

No existing test sends multiple cache-token fields simultaneously. If a future refactor reorders the chain, the wrong value could silently take precedence without any test catching it.

Consider adding a test with usage: { prompt_tokens_details: { cached_tokens: 100 }, cached_tokens: 200, prompt_cache_hit_tokens: 300 } asserting cachedContentTokenCount === 100.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added in cf0a82c. The non-streaming cache test now covers the precedence chain with prompt_tokens_details.cached_tokens, top-level cached_tokens, and prompt_cache_hit_tokens, asserting the standard nested field wins.

extendedUsage.cached_tokens ??
extendedUsage.prompt_cache_hit_tokens ??
0
);
}

// Once this many bytes have been emitted without entering cumulative mode the
// stream is almost certainly a standard incremental provider. Stop growing
// emittedText beyond this point to bound per-stream memory and CPU. The true
Expand Down Expand Up @@ -1121,13 +1132,8 @@ export function convertOpenAIResponseToGemini(
const promptTokens = usage.prompt_tokens || 0;
const completionTokens = usage.completion_tokens || 0;
const totalTokens = usage.total_tokens || 0;
// Support both formats: prompt_tokens_details.cached_tokens (OpenAI standard)
// and cached_tokens (some models return it at top level)
const extendedUsage = usage as ExtendedCompletionUsage;
const cachedTokens =
usage.prompt_tokens_details?.cached_tokens ??
extendedUsage.cached_tokens ??
0;
// Support OpenAI and provider-specific cache usage fields.
const cachedTokens = getCachedPromptTokens(usage);
const thinkingTokens =
usage.completion_tokens_details?.reasoning_tokens || 0;

Expand Down Expand Up @@ -1320,13 +1326,8 @@ export function convertOpenAIChunkToGemini(
const totalTokens = usage.total_tokens || 0;
const thinkingTokens =
usage.completion_tokens_details?.reasoning_tokens || 0;
// Support both formats: prompt_tokens_details.cached_tokens (OpenAI standard)
// and cached_tokens (some models return it at top level)
const extendedUsage = usage as ExtendedCompletionUsage;
const cachedTokens =
usage.prompt_tokens_details?.cached_tokens ??
extendedUsage.cached_tokens ??
0;
// Support OpenAI and provider-specific cache usage fields.
const cachedTokens = getCachedPromptTokens(usage);

// If we only have total tokens but no breakdown, estimate the split
// Typically input is ~70% and output is ~30% for most conversations
Expand Down
162 changes: 162 additions & 0 deletions packages/core/src/core/openaiContentGenerator/pipeline.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -424,6 +424,168 @@ describe('ContentGenerationPipeline', () => {
);
});

it('sorts DeepSeek tools by function name before sending the wire request', async () => {
mockContentGeneratorConfig.baseUrl = 'https://api.deepseek.com/v1';
const request: GenerateContentParameters = {
model: 'deepseek-v4-pro',
contents: [{ parts: [{ text: 'Hello' }], role: 'user' }],
config: {
tools: [
{
functionDeclarations: [
{
name: 'placeholder',
description: 'Placeholder function',
parameters: { type: Type.OBJECT, properties: {} },
},
],
},
],
},
};

const mockMessages = [
{ role: 'user', content: 'Hello' },
] as OpenAI.Chat.ChatCompletionMessageParam[];
const mockTools = [
{ type: 'function', function: { name: 'zeta' } },
{ type: 'function', function: { name: 'alpha' } },
{ type: 'function', function: { name: 'bravo' } },
] as OpenAI.Chat.ChatCompletionTool[];

(mockConverter.convertGeminiRequestToOpenAI as Mock).mockReturnValue(
mockMessages,
);
(mockConverter.convertGeminiToolsToOpenAI as Mock).mockResolvedValue(
mockTools,
);
(mockConverter.convertOpenAIResponseToGemini as Mock).mockReturnValue(
new GenerateContentResponse(),
);
(mockClient.chat.completions.create as Mock).mockResolvedValue({
id: 'response-id',
choices: [],
} as unknown as OpenAI.Chat.ChatCompletion);

await pipeline.execute(request, 'test-prompt-id');

const apiCall = (mockClient.chat.completions.create as Mock).mock
.calls[0][0];
expect(
apiCall.tools.map(
(tool: OpenAI.Chat.ChatCompletionTool) => tool.function.name,
),
).toEqual(['alpha', 'bravo', 'zeta']);
});

it('preserves tool order for non-DeepSeek hostnames', async () => {
mockContentGeneratorConfig.baseUrl = 'https://example.test/v1';
const request: GenerateContentParameters = {
model: 'test-model',
contents: [{ parts: [{ text: 'Hello' }], role: 'user' }],
config: {
tools: [
{
functionDeclarations: [
{
name: 'placeholder',
description: 'Placeholder function',
parameters: { type: Type.OBJECT, properties: {} },
},
],
},
],
},
};

const mockMessages = [
{ role: 'user', content: 'Hello' },
] as OpenAI.Chat.ChatCompletionMessageParam[];
const mockTools = [
{ type: 'function', function: { name: 'zeta' } },
{ type: 'function', function: { name: 'alpha' } },
] as OpenAI.Chat.ChatCompletionTool[];

(mockConverter.convertGeminiRequestToOpenAI as Mock).mockReturnValue(
mockMessages,
);
(mockConverter.convertGeminiToolsToOpenAI as Mock).mockResolvedValue(
mockTools,
);
(mockConverter.convertOpenAIResponseToGemini as Mock).mockReturnValue(
new GenerateContentResponse(),
);
(mockClient.chat.completions.create as Mock).mockResolvedValue({
id: 'response-id',
choices: [],
} as unknown as OpenAI.Chat.ChatCompletion);

await pipeline.execute(request, 'test-prompt-id');

const apiCall = (mockClient.chat.completions.create as Mock).mock
.calls[0][0];
expect(
apiCall.tools.map(
(tool: OpenAI.Chat.ChatCompletionTool) => tool.function.name,
),
).toEqual(['zeta', 'alpha']);
});

it('sorts self-hosted DeepSeek tools by function name before sending the wire request', async () => {
mockContentGeneratorConfig.baseUrl = 'https://example.test/v1';
mockContentGeneratorConfig.model = 'deepseek-v4-pro';
const request: GenerateContentParameters = {
model: 'deepseek-v4-pro',
contents: [{ parts: [{ text: 'Hello' }], role: 'user' }],
config: {
tools: [
{
functionDeclarations: [
{
name: 'placeholder',
description: 'Placeholder function',
parameters: { type: Type.OBJECT, properties: {} },
},
],
},
],
},
};

const mockMessages = [
{ role: 'user', content: 'Hello' },
] as OpenAI.Chat.ChatCompletionMessageParam[];
const mockTools = [
{ type: 'function', function: { name: 'zeta' } },
{ type: 'function', function: { name: 'alpha' } },
{ type: 'function', function: { name: 'bravo' } },
] as OpenAI.Chat.ChatCompletionTool[];

(mockConverter.convertGeminiRequestToOpenAI as Mock).mockReturnValue(
mockMessages,
);
(mockConverter.convertGeminiToolsToOpenAI as Mock).mockResolvedValue(
mockTools,
);
(mockConverter.convertOpenAIResponseToGemini as Mock).mockReturnValue(
new GenerateContentResponse(),
);
(mockClient.chat.completions.create as Mock).mockResolvedValue({
id: 'response-id',
choices: [],
} as unknown as OpenAI.Chat.ChatCompletion);

await pipeline.execute(request, 'test-prompt-id');

const apiCall = (mockClient.chat.completions.create as Mock).mock
.calls[0][0];
expect(
apiCall.tools.map(
(tool: OpenAI.Chat.ChatCompletionTool) => tool.function.name,
),
).toEqual(['alpha', 'bravo', 'zeta']);
});

it('should skip empty tools array in request', async () => {
// Arrange — tools: [] should NOT be included in the API request
const request: GenerateContentParameters = {
Expand Down
32 changes: 31 additions & 1 deletion packages/core/src/core/openaiContentGenerator/pipeline.ts
Original file line number Diff line number Diff line change
Expand Up @@ -11,14 +11,34 @@ import {
} from '@google/genai';
import type { ContentGeneratorConfig } from '../contentGenerator.js';
import { OpenAIContentConverter } from './converter.js';
import { isDeepSeekHostname } from './provider/deepseek.js';
import { isDeepSeekHostname, isDeepSeekProvider } from './provider/deepseek.js';
import { openaiRequestCaptureContext } from './requestCaptureContext.js';
import { StreamingToolCallParser } from './streamingToolCallParser.js';
import { TaggedThinkingParser } from './taggedThinkingParser.js';
import type { PipelineConfig, RequestContext } from './types.js';
import { redactProxyError } from '../../utils/runtimeFetchOptions.js';
import { runtimeDiagnostics } from '../../utils/runtimeDiagnostics.js';

function compareStableStrings(left: string, right: string): number {
if (left < right) return -1;
if (left > right) return 1;
return 0;
}

function getToolSortKey(tool: OpenAI.Chat.ChatCompletionTool): string {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] getToolSortKey builds a composite key [name, type, JSON.stringify(tool)] but tool.function.name alone is a sufficient total order — function names are unique within a tool set (the Gemini→OpenAI converter enforces this upstream). The JSON.stringify tiebreaker is invoked O(n log n) times per request and serializes the entire tool schema (often the largest part of the request body). Simplifying to return tool.function.name; is correct, cheaper, and clearer about the contract.

Suggested change
function getToolSortKey(tool: OpenAI.Chat.ChatCompletionTool): string {
function getToolSortKey(tool: OpenAI.Chat.ChatCompletionTool): string {
return tool.function.name;
}

— qwen3.7-max via Qwen Code /review

return tool.function.name;
}

function sortToolsForCacheStableRequest(
request: OpenAI.Chat.ChatCompletionCreateParams,
): void {
if (!request.tools || request.tools.length < 2) return;

request.tools = [...request.tools].sort((left, right) =>
compareStableStrings(getToolSortKey(left), getToolSortKey(right)),
);
}

/**
* Error thrown when the API returns an error embedded as stream content
* instead of a proper HTTP error. Some providers (e.g., certain OpenAI-compatible
Expand Down Expand Up @@ -367,6 +387,16 @@ export class ContentGenerationPipeline {
}
}

// DeepSeek's KV cache is prefix-exact: a different tool order changes the
// serialized prompt prefix even when the tool set and schemas are identical.
// Gate on broad provider detection because cache-prefix stability follows
// the DeepSeek model/protocol even for self-hosted deployments. The
// narrower hostname gate above is only for DeepSeek's official V4
// `thinking` wire shape, which self-hosted servers may reject.
if (isDeepSeekProvider(this.contentGeneratorConfig)) {

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Suggestion] Stale comment: the two lines above say "Sort only for official DeepSeek endpoints" but the gate here is isDeepSeekProvider, which also matches self-hosted DeepSeek deployments via model-name fallback (as confirmed by the sorts self-hosted DeepSeek tools by model name test). The comment was accurate when the gate was isDeepSeekHostname but was not updated when the gate was widened in this commit.

Suggested reword:

// DeepSeek's KV cache is prefix-exact: a different tool order changes the
// serialized prompt prefix even when the tool set and schemas are identical.
// Gate on isDeepSeekProvider so both official and self-hosted DeepSeek
// deployments benefit; non-DeepSeek providers keep caller order.

— qwen3.7-max via Qwen Code /review

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 349f7f2. Updated the comment to match the widened isDeepSeekProvider() gate: official and self-hosted DeepSeek deployments get cache-stable ordering, while non-DeepSeek providers keep caller order.

sortToolsForCacheStableRequest(providerRequest);
}

return providerRequest;
}

Expand Down
Loading