test: migrate contract suite to replay harness #109
Conversation
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
📝 Walkthrough
Adds contract replay testing infrastructure and a CI job, introduces ToolCall/FunctionCall types and Anthropic tool extraction, extends provider model listing and responses handling (Gemini models client, recordapi responses endpoints), and converts many provider contract tests to replay-driven, golden-file based suites.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 15
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/recordapi/main.go`:
- Around line 95-111: The routes "responses" and "responses_stream" currently
build requests unconditionally; add a provider-endpoint capability check before
constructing/sending these requests so unsupported providers return a clear
local error. In cmd/recordapi/main.go update the logic that registers/handles
"responses" and "responses_stream" (the map entries shown) to call the provider
capability check (e.g., Provider.SupportsEndpoint or
Provider.Capabilities/Has("responses")) and short-circuit with a 400/501 error
if the provider does not support "/v1/responses"; ensure both the non-stream and
stream handlers use the same validation and return a concise error message
mentioning the missing "responses" capability.
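The gate described above can be sketched as follows. This is a minimal illustration, not the actual recordapi code: the `provider` struct, `SupportsEndpoint` method, and the capability name are hypothetical stand-ins.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// provider is a hypothetical stand-in for a recordapi provider with a
// declared capability set.
type provider struct {
	name         string
	capabilities map[string]bool
}

func (p *provider) SupportsEndpoint(ep string) bool { return p.capabilities[ep] }

var errUnsupported = errors.New(`provider does not support the "responses" capability`)

// guardResponses short-circuits before any /v1/responses request is built,
// returning a clear local error for unsupported providers.
func guardResponses(p *provider) (int, error) {
	if !p.SupportsEndpoint("responses") {
		return http.StatusNotImplemented, fmt.Errorf("%s: %w", p.name, errUnsupported)
	}
	return http.StatusOK, nil
}

func main() {
	code, err := guardResponses(&provider{name: "gemini", capabilities: map[string]bool{"chat": true}})
	fmt.Println(code, err) // 501 plus a descriptive error

	code, err = guardResponses(&provider{name: "openai", capabilities: map[string]bool{"responses": true}})
	fmt.Println(code, err) // 200, no error
}
```

Both the non-stream and stream handlers would call the same guard so the error message stays consistent.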
In `@internal/providers/gemini/gemini.go`:
- Around line 154-162: The models auth header function `headers` currently uses
q.Add and doesn't forward trace headers; make it idempotent by using
q.Set("key", p.apiKey) instead of q.Add so repeated calls replace the key rather
than appending, and copy the incoming "X-Request-Id" into the outgoing request
headers (matching the behavior in `setHeaders`) to preserve trace propagation;
update `headers` to read any existing X-Request-Id from the incoming context/req
and set it on req.Header before returning.
- Around line 152-153: ListModels currently constructs a fresh client via
llmclient.DefaultConfig("gemini", p.modelsURL) and only copies p.hooks, which
bypasses the provider's configured retry/circuit-breaker/resilience settings;
change the implementation to reuse the provider's already-configured llmclient
(or derive the models config from the provider's existing client/config) instead
of DefaultConfig so the retry and circuit-breaker options are preserved—e.g.,
obtain the config or client from p (referencing p.modelsURL, p.hooks, modelsCfg,
llmclient.DefaultConfig) and call ListModels through that existing client or
cloned config that carries over the resilience settings (apply same change for
the other occurrence at lines 164-169).
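The shape of this fix can be illustrated with a toy config type. Everything here is hypothetical (`Config`, `withBaseURL`, the field names); it only demonstrates why deriving from the provider's existing config preserves resilience settings while a fresh default config drops them.

```go
package main

import "fmt"

// Config is a stand-in for an llmclient-style configuration that carries
// resilience settings alongside the endpoint.
type Config struct {
	Name       string
	BaseURL    string
	MaxRetries int
	Breaker    bool
}

// withBaseURL clones the config (value receiver) and swaps only the URL,
// so retry and circuit-breaker settings carry over unchanged.
func (c Config) withBaseURL(u string) Config {
	c.BaseURL = u
	return c
}

func main() {
	providerCfg := Config{Name: "gemini", BaseURL: "https://api.local", MaxRetries: 3, Breaker: true}

	// Deriving from the provider's config keeps the resilience settings...
	modelsCfg := providerCfg.withBaseURL("https://models.local")
	fmt.Println(modelsCfg.MaxRetries, modelsCfg.Breaker) // 3 true

	// ...whereas a fresh default config silently loses them.
	fresh := Config{Name: "gemini", BaseURL: "https://models.local"}
	fmt.Println(fresh.MaxRetries, fresh.Breaker) // 0 false
}
```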
In `@tests/contract/anthropic_test.go`:
- Around line 27-141: Restore golden-file validation in the Anthropic contract
replay tests by replacing/augmenting the current shallow assertions in
TestAnthropicReplayChatCompletion, TestAnthropicReplayStreamChatCompletion,
TestAnthropicReplayResponses, and TestAnthropicReplayStreamResponses with full
normalized output comparisons against committed golden fixtures (use the
existing json/sse fixture routes as the golden artifacts), add the required
build tag comment (`//go:build contract` / `// +build contract`) and a 5-minute
test timeout comment at the top of the file, and ensure stream outputs are
serialized/normalized before comparing; record or update the golden files by
running make record-api after changes.
In `@tests/contract/gemini_test.go`:
- Around line 18-26: The helper newGeminiReplayProvider duplicates common setup;
extract its body into a shared test helper (e.g., newReplayProvider or
newGeminiReplayProvider) in the shared replay harness test file and have
existing provider contract tests call that shared helper. Move the creation
logic that calls newReplayHTTPClient(t, routes),
gemini.NewWithHTTPClient("test-api-key", client, llmclient.Hooks{}), and the
provider.SetBaseURL/SetModelsURL calls into the shared helper while preserving
the function signature (t *testing.T, routes map[string]replayRoute) and return
type core.Provider; update all places that currently define this helper to
import/use the shared helper and remove the duplicate local definitions. Ensure
the shared helper has the same package scope and required imports so tests
compile.
- Around line 28-155: The tests in TestGeminiReplay* only assert selected
fields; update each test (TestGeminiReplayChatCompletion,
TestGeminiReplayStreamChatCompletion, TestGeminiReplayListModels,
TestGeminiReplayResponses, TestGeminiReplayStreamResponses) to perform
normalized golden-file validation: serialize the full normalized response
output, compare against a committed golden file, and fail if different; provide
a way to re-record goldens via make record-api. Also add the required build tag
and 5-minute timeout to the top of the file (per tests/contract/**/*_test.go
guidelines) so the file uses -tags=contract and the test harness timeout. Ensure
stream tests write/normalize stream output before comparing to their respective
golden files.
In `@tests/contract/groq_test.go`:
- Around line 27-154: The tests (TestGroqReplayChatCompletion,
TestGroqReplayStreamChatCompletion, TestGroqReplayListModels,
TestGroqReplayResponses, TestGroqReplayStreamResponses) currently assert
individual fields; replace those field-level checks with full golden-file
validation: produce a normalized representation of the response/stream output
and compare it against a golden artifact (use the repo's golden helpers and
normalization used by other contract tests), add the required `// +build
contract` (or `//go:build contract`) build tag and set the test timeout to 5
minutes, and update/record the new golden files by running `make record-api` so
the tests validate complete outputs rather than only selective fields.
In `@tests/contract/openai_test.go`:
- Around line 114-116: The test currently skips when required golden fixtures
are missing (uses goldenFileExists(...) and t.Skip), which lets CI pass
silently; change the conditional to fail the test instead by replacing
t.Skip(...) with t.Fatalf(...) or t.Fatal(...) including a message that names
the missing fixture ("openai/responses.json") and instructs to record via "make
record-api"; apply the same replacement for the other occurrence that also
checks goldenFileExists in this file so contract tests fail CI when fixtures are
absent.
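The skip-to-fail change can be sketched like this; `requireGolden` and the `failer` interface are illustrative names, not repo helpers, and a fake reporter stands in for *testing.T so the behavior is observable.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// failer captures the t.Fatalf contract; *testing.T satisfies it.
type failer interface {
	Fatalf(format string, args ...interface{})
}

// requireGolden fails the test (instead of skipping) when a fixture is absent,
// so CI cannot pass silently with missing golden files.
func requireGolden(t failer, path string) {
	if _, err := os.Stat(path); errors.Is(err, fs.ErrNotExist) {
		t.Fatalf("missing golden file %s; run `make record-api` to create/update contract fixtures", path)
	}
}

// fakeT records the failure message for demonstration purposes.
type fakeT struct{ msg string }

func (f *fakeT) Fatalf(format string, args ...interface{}) { f.msg = fmt.Sprintf(format, args...) }

func main() {
	ft := &fakeT{}
	requireGolden(ft, "testdata/golden/does-not-exist.json")
	fmt.Println(ft.msg != "") // true: the missing fixture triggers a failure
}
```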
- Around line 27-112: Tests are only asserting selected fields; update
TestOpenAIReplayChatCompletion, TestOpenAIReplayStreamChatCompletion, and
TestOpenAIReplayListModels to perform full normalized output vs. golden-file
validation using the repository's golden helpers (e.g., produce a normalized
response and call the compare/validate golden helper) and add the required
contract build tag comment at the top of the file (//go:build contract) and
ensure the test run is documented to use a 5-minute timeout (tests are intended
to be run with -tags=contract and -timeout=5m); record any new golden artifacts
by running make record-api and commit those golden files.
- Around line 153-160: The test TestOpenAIReplayStreamResponses currently checks
for response events but not that the stream terminated; update the assertions
after parseResponsesStream to also verify the stream termination by asserting
the presence of the "[DONE]" termination marker (e.g. require.Contains(t, raw,
"[DONE]") or assert.True(t, hasResponsesEvent(events, "[DONE]"))), keeping the
existing checks (readAllStream, parseResponsesStream, hasResponsesEvent for
"response.created"/"response.output_text.delta"/"response.completed" and
extractResponsesStreamText) intact.
In `@tests/contract/README.md`:
- Around line 34-46: Update the tests/contract/README.md to align commands with
the contract-test policy: change the contract test run instruction to include
the build tag and five-minute timeout (use "-tags=contract" and "-timeout=5m")
and document the canonical golden-file recording command as "make record-api"
(or reference cmd/recordapi) for updating fixtures; also state that contract
tests under tests/contract/**/*_test.go must use golden file validation and be
run with those flags so local and CI workflows remain consistent.
In `@tests/contract/replay_harness_test.go`:
- Around line 115-125: The flush closure currently appends one sseEvent per
entry in dataLines, causing multi-line SSE `data:` blocks to become multiple
events; update flush (referencing flush, dataLines, events, sseEvent, and
currentName) to join all entries in dataLines with newline separators into a
single sseEvent.Data (e.g., strings.Join(dataLines, "\n")), append that single
event with Name set to currentName to events, then reset currentName and clear
dataLines so multi-line SSE payloads are emitted as one event.
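The suggested flush behavior can be sketched as a small standalone parser. The `sseEvent` shape and the `"data: "` prefix handling are illustrative; the harness's actual field parsing may differ.

```go
package main

import (
	"fmt"
	"strings"
)

type sseEvent struct {
	Name string
	Data string
}

// parseSSE folds consecutive data: lines of one SSE block into a single event,
// instead of emitting one event per data line.
func parseSSE(raw string) []sseEvent {
	var (
		events      []sseEvent
		dataLines   []string
		currentName string
	)
	flush := func() {
		if len(dataLines) == 0 {
			return
		}
		// Join all data lines with newlines into ONE event payload.
		events = append(events, sseEvent{Name: currentName, Data: strings.Join(dataLines, "\n")})
		currentName = ""
		dataLines = nil
	}
	for _, line := range strings.Split(raw, "\n") {
		switch {
		case line == "": // blank line terminates the SSE block
			flush()
		case strings.HasPrefix(line, "event: "):
			currentName = strings.TrimPrefix(line, "event: ")
		case strings.HasPrefix(line, "data: "):
			dataLines = append(dataLines, strings.TrimPrefix(line, "data: "))
		}
	}
	flush()
	return events
}

func main() {
	raw := "event: message\ndata: {\"a\":\ndata: 1}\n\n"
	events := parseSSE(raw)
	fmt.Println(len(events))     // 1: multi-line payload is one event
	fmt.Println(events[0].Name)  // message
}
```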
In `@tests/contract/xai_test.go`:
- Around line 27-98: Add golden-file validation and contract build tag/timeout:
add the contract build tag header to this test file (//go:build contract) and
ensure tests run under the 5-minute contract timeout when executing (run with
-timeout=5m in CI or document running via make record-api). Replace the current
field-level assertions in TestXAIReplayChatCompletion,
TestXAIReplayStreamChatCompletion, and TestXAIReplayListModels with golden
comparisons that serialize/normalize the full responses (resp from
ChatCompletion/ListModels and the reconstructed stream chunks from
StreamChatCompletion) and compare them against dedicated golden files (e.g.,
testdata/golden/xai/*.golden.json); if a RECORD flag is set (or when running
make record-api) update/write those golden files. Use existing helpers like
newXAIReplayProvider, jsonFixtureRoute, sseFixtureRoute, readAllStream,
parseChatStream and extractChatStreamText to obtain the actual outputs to
serialize, and fail the test if the normalized output differs from the golden
artifact.
- Around line 101-103: Replace the t.Skip branch that hides missing golden
fixtures with a failing assertion: in tests/contract/xai_test.go (the test
function using goldenFileExists), remove the t.Skip("golden file not found -
record xai responses first") and instead call t.Fatalf with a clear message that
the golden file xai/responses.json is missing and instruct contributors to
regenerate fixtures (e.g., "missing golden file xai/responses.json; run `make
record-api` to create/update contract fixtures"). Apply the same change to the
other checks around the same test (the occurrences noted at lines ~126-128) so
contract tests fail-fast when fixtures are absent.
- Around line 140-148: The test reads the raw stream via readAllStream and
parses events but doesn't assert the stream termination marker; update
TestXAIReplayStreamResponses to assert the terminal "[DONE]" marker is present
in the raw stream (use the raw variable returned from readAllStream) — e.g., add
a require.Contains(t, raw, "[DONE]") or require.True(t, strings.HasSuffix(raw,
"[DONE]\n")) after events are parsed to ensure StreamResponses emits the final
"[DONE]" sentinel; keep existing checks using parseResponsesStream,
hasResponsesEvent, and extractResponsesStreamText.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (15)
.github/workflows/test.yml
cmd/recordapi/main.go
docs/TESTING_STRATEGY.md
internal/providers/gemini/gemini.go
tests/contract/README.md
tests/contract/anthropic_test.go
tests/contract/gemini_test.go
tests/contract/groq_test.go
tests/contract/openai_test.go
tests/contract/replay_harness_test.go
tests/contract/testdata/openai/responses.json
tests/contract/testdata/openai/responses_stream.txt
tests/contract/testdata/xai/responses.json
tests/contract/testdata/xai/responses_stream.txt
tests/contract/xai_test.go
internal/providers/gemini/gemini.go
Outdated

```go
modelsCfg := llmclient.DefaultConfig("gemini", p.modelsURL)
modelsCfg.Hooks = p.hooks
```
ListModels bypasses configured retry/circuit-breaker settings.
This code creates a fresh client from llmclient.DefaultConfig and only reuses hooks, so resilience options passed through provider construction are not applied to model-list requests.
Also applies to: 164-169
```go
headers := func(req *http.Request) {
	// Add API key as query parameter.
	// NOTE: Passing the API key in the URL query parameter is required by Google's native Gemini API for the models endpoint.
	// This may be a security concern, as the API key can be logged in server access logs, proxy logs, and browser history.
	// See: https://cloud.google.com/vertex-ai/docs/generative-ai/model-parameters#api-key
	q := req.URL.Query()
	q.Add("key", p.apiKey)
	req.URL.RawQuery = q.Encode()
}
```
Make models auth setup idempotent and keep trace propagation.
Use q.Set("key", ...) instead of q.Add(...) and forward X-Request-Id here as well, matching setHeaders behavior.
Suggested fix

```diff
 headers := func(req *http.Request) {
-	q := req.URL.Query()
-	q.Add("key", p.apiKey)
-	req.URL.RawQuery = q.Encode()
+	q := req.URL.Query()
+	q.Set("key", p.apiKey)
+	req.URL.RawQuery = q.Encode()
+
+	if requestID := core.GetRequestID(req.Context()); requestID != "" {
+		req.Header.Set("X-Request-Id", requestID)
+	}
 }
```
tests/contract/anthropic_test.go

The hunk removes the previous golden-file assertion tests (TestAnthropic_MessagesWithParams, TestAnthropic_MessagesWithTools, TestAnthropic_ExtendedThinking, TestAnthropic_MessagesMultiTurn, TestAnthropic_MessagesMultimodal, and the AnthropicToolUseResponse/AnthropicThinkingResponse helper types) and replaces them with the replay-driven suite:

```go
func TestAnthropicReplayChatCompletion(t *testing.T) {
	testCases := []struct {
		name         string
		fixturePath  string
		finishReason string
	}{
		{name: "basic", fixturePath: "anthropic/messages.json"},
		{name: "with-params", fixturePath: "anthropic/messages_with_params.json"},
		{name: "with-tools", fixturePath: "anthropic/messages_with_tools.json", finishReason: "tool_use"},
		{name: "extended-thinking", fixturePath: "anthropic/messages_extended_thinking.json"},
		{name: "multi-turn", fixturePath: "anthropic/messages_multi_turn.json"},
		{name: "multimodal", fixturePath: "anthropic/messages_multimodal.json"},
	}

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			provider := newAnthropicReplayProvider(t, map[string]replayRoute{
				replayKey(http.MethodPost, "/messages"): jsonFixtureRoute(t, tc.fixturePath),
			})

			resp, err := provider.ChatCompletion(context.Background(), &core.ChatRequest{
				Model: "claude-sonnet-4-20250514",
				Messages: []core.Message{{
					Role:    "user",
					Content: "hello",
				}},
			})
			require.NoError(t, err)
			require.NotNil(t, resp)

			assert.NotEmpty(t, resp.ID)
			assert.Equal(t, "chat.completion", resp.Object)
			require.NotEmpty(t, resp.Choices)
			assert.Equal(t, "assistant", resp.Choices[0].Message.Role)
			assert.NotEmpty(t, resp.Choices[0].FinishReason)
			if tc.finishReason != "" {
				assert.Equal(t, tc.finishReason, resp.Choices[0].FinishReason)
			}
			assert.NotEmpty(t, resp.Choices[0].Message.Content)
		})
	}
}

func TestAnthropicReplayStreamChatCompletion(t *testing.T) {
	provider := newAnthropicReplayProvider(t, map[string]replayRoute{
		replayKey(http.MethodPost, "/messages"): sseFixtureRoute(t, "anthropic/messages_stream.txt"),
	})

	stream, err := provider.StreamChatCompletion(context.Background(), &core.ChatRequest{
		Model: "claude-sonnet-4-20250514",
		Messages: []core.Message{{
			Role:    "user",
			Content: "stream",
		}},
	})
	require.NoError(t, err)

	raw := readAllStream(t, stream)
	chunks, done := parseChatStream(t, raw)

	require.True(t, done, "stream should terminate with [DONE]")
	require.NotEmpty(t, chunks)
	assert.NotEmpty(t, extractChatStreamText(chunks))
}

func TestAnthropicReplayResponses(t *testing.T) {
	provider := newAnthropicReplayProvider(t, map[string]replayRoute{
		replayKey(http.MethodPost, "/messages"): jsonFixtureRoute(t, "anthropic/messages.json"),
	})

	resp, err := provider.Responses(context.Background(), &core.ResponsesRequest{
		Model: "claude-sonnet-4-20250514",
		Input: "hello",
	})
	require.NoError(t, err)
	require.NotNil(t, resp)

	assert.Equal(t, "response", resp.Object)
	assert.Equal(t, "completed", resp.Status)
	require.NotEmpty(t, resp.Output)
	require.NotEmpty(t, resp.Output[0].Content)
	assert.NotEmpty(t, resp.Output[0].Content[0].Text)
	require.NotNil(t, resp.Usage)
	assert.GreaterOrEqual(t, resp.Usage.TotalTokens, 0)
}

func TestAnthropicReplayStreamResponses(t *testing.T) {
	provider := newAnthropicReplayProvider(t, map[string]replayRoute{
		replayKey(http.MethodPost, "/messages"): sseFixtureRoute(t, "anthropic/messages_stream.txt"),
	})

	stream, err := provider.StreamResponses(context.Background(), &core.ResponsesRequest{
		Model: "claude-sonnet-4-20250514",
		Input: "stream",
	})
	require.NoError(t, err)

	raw := readAllStream(t, stream)
	events := parseResponsesStream(t, raw)
	require.NotEmpty(t, events)

	assert.True(t, hasResponsesEvent(events, "response.created"))
	assert.True(t, hasResponsesEvent(events, "response.output_text.delta"))
	assert.True(t, hasResponsesEvent(events, "response.completed"))
	assert.NotEmpty(t, extractResponsesStreamText(events))

	hasDone := false
	for _, event := range events {
		if event.Done {
			hasDone = true
			break
		}
	}
	assert.True(t, hasDone, "responses stream should terminate with [DONE]")
}
```
Restore golden-file validation in Anthropic contract replay tests.
These tests assert only selected fields and do not validate normalized outputs against committed golden artifacts, which weakens contract regression detection.
As per coding guidelines tests/contract/**/*_test.go: Contract tests must use golden file validation and include -tags=contract build tag with 5-minute timeout. Record new golden files using make record-api.
tests/contract/gemini_test.go
Outdated

The hunk replaces the previous ModelPrefix assertion with a replay-provider helper:

```go
func newGeminiReplayProvider(t *testing.T, routes map[string]replayRoute) core.Provider {
	t.Helper()

	client := newReplayHTTPClient(t, routes)
	provider := gemini.NewWithHTTPClient("test-api-key", client, llmclient.Hooks{})
	provider.SetBaseURL("https://replay.local")
	provider.SetModelsURL("https://replay.local")
	return provider
}
```
🧹 Nitpick | 🔵 Trivial
Extract common replay-provider setup to shared test helper.
newGeminiReplayProvider duplicates the same setup pattern used across provider contract files; centralizing this in tests/contract/replay_harness_test.go would reduce drift and simplify future changes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/gemini_test.go` around lines 18 - 26, The helper
newGeminiReplayProvider duplicates common setup; extract its body into a shared
test helper (e.g., newReplayProvider or newGeminiReplayProvider) in the shared
replay harness test file and have existing provider contract tests call that
shared helper. Move the creation logic that calls newReplayHTTPClient(t,
routes), gemini.NewWithHTTPClient("test-api-key", client, llmclient.Hooks{}),
and the provider.SetBaseURL/SetModelsURL calls into the shared helper while
preserving the function signature (t *testing.T, routes map[string]replayRoute)
and return type core.Provider; update all places that currently define this
helper to import/use the shared helper and remove the duplicate local
definitions. Ensure the shared helper has the same package scope and required
imports so tests compile.
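As an illustration of the suggested consolidation, here is a minimal self-contained sketch. All types below are stand-ins for the real harness types, and `newReplayProvider` is a hypothetical name; only the deduplication pattern is the point.

```go
package main

import "fmt"

// Stand-ins for the real harness types (core.Provider, replayRoute,
// newReplayHTTPClient); only the dedup pattern matters here.
type Provider interface{ BaseURL() string }

type replayRoute struct{ body string }

type replayHTTPClient struct{ routes map[string]replayRoute }

func newReplayHTTPClient(routes map[string]replayRoute) *replayHTTPClient {
	return &replayHTTPClient{routes: routes}
}

// newReplayProvider centralizes the setup each provider file currently
// repeats: build the replay client once, then delegate construction to a
// provider-specific factory.
func newReplayProvider(routes map[string]replayRoute, construct func(*replayHTTPClient) Provider) Provider {
	return construct(newReplayHTTPClient(routes))
}

type geminiProvider struct{ base string }

func (g geminiProvider) BaseURL() string { return g.base }

func main() {
	p := newReplayProvider(
		map[string]replayRoute{"POST /chat/completions": {body: "{}"}},
		func(c *replayHTTPClient) Provider {
			_ = c // a real factory would install c as the provider's HTTP client
			return geminiProvider{base: "https://replay.local"}
		},
	)
	fmt.Println(p.BaseURL())
}
```

With such a helper, each provider's contract file shrinks to a one-line factory closure.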
The diff drops the old golden-file loader tests (`TestGemini_StreamingFormat_Contract`, `TestGemini_ChatWithParams`, `TestGemini_ChatWithTools`, the models-list prefix checks, and the `hasPrefixIgnoreModels` helper) in favor of replay-driven tests:

```go
func TestGeminiReplayChatCompletion(t *testing.T) {
	testCases := []struct {
		name          string
		fixturePath   string
		expectContent bool
	}{
		{name: "basic", fixturePath: "gemini/chat_completion.json", expectContent: true},
		{name: "params", fixturePath: "gemini/chat_with_params.json", expectContent: true},
		{name: "tools", fixturePath: "gemini/chat_with_tools.json", expectContent: false},
	}

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			provider := newGeminiReplayProvider(t, map[string]replayRoute{
				replayKey(http.MethodPost, "/chat/completions"): jsonFixtureRoute(t, tc.fixturePath),
			})

			resp, err := provider.ChatCompletion(context.Background(), &core.ChatRequest{
				Model: "gemini-2.5-flash",
				Messages: []core.Message{{
					Role:    "user",
					Content: "hello",
				}},
			})
			require.NoError(t, err)
			require.NotNil(t, resp)

			assert.NotEmpty(t, resp.ID)
			assert.Equal(t, "chat.completion", resp.Object)
			require.NotEmpty(t, resp.Choices)
			assert.Equal(t, "assistant", resp.Choices[0].Message.Role)
			assert.NotEmpty(t, resp.Choices[0].FinishReason)
			if tc.expectContent {
				assert.NotEmpty(t, resp.Choices[0].Message.Content)
			}
		})
	}
}

func TestGeminiReplayStreamChatCompletion(t *testing.T) {
	provider := newGeminiReplayProvider(t, map[string]replayRoute{
		replayKey(http.MethodPost, "/chat/completions"): sseFixtureRoute(t, "gemini/chat_completion_stream.txt"),
	})

	stream, err := provider.StreamChatCompletion(context.Background(), &core.ChatRequest{
		Model: "gemini-2.5-flash",
		Messages: []core.Message{{
			Role:    "user",
			Content: "stream",
		}},
	})
	require.NoError(t, err)

	raw := readAllStream(t, stream)
	chunks, done := parseChatStream(t, raw)

	require.True(t, done, "stream should terminate with [DONE]")
	require.NotEmpty(t, chunks)
	assert.NotEmpty(t, extractChatStreamText(chunks))
}

func TestGeminiReplayListModels(t *testing.T) {
	provider := newGeminiReplayProvider(t, map[string]replayRoute{
		replayKey(http.MethodGet, "/models?key=test-api-key"): jsonFixtureRoute(t, "gemini/models.json"),
	})

	resp, err := provider.ListModels(context.Background())
	require.NoError(t, err)
	require.NotNil(t, resp)

	assert.Equal(t, "list", resp.Object)
	require.NotEmpty(t, resp.Data)
	for _, model := range resp.Data {
		assert.NotEmpty(t, model.ID)
		assert.Equal(t, "model", model.Object)
	}
}

func TestGeminiReplayResponses(t *testing.T) {
	provider := newGeminiReplayProvider(t, map[string]replayRoute{
		replayKey(http.MethodPost, "/chat/completions"): jsonFixtureRoute(t, "gemini/chat_completion.json"),
	})

	resp, err := provider.Responses(context.Background(), &core.ResponsesRequest{
		Model: "gemini-2.5-flash",
		Input: "hello",
	})
	require.NoError(t, err)
	require.NotNil(t, resp)

	assert.Equal(t, "response", resp.Object)
	assert.Equal(t, "completed", resp.Status)
	require.NotEmpty(t, resp.Output)
	require.NotEmpty(t, resp.Output[0].Content)
	assert.NotEmpty(t, resp.Output[0].Content[0].Text)
	require.NotNil(t, resp.Usage)
	assert.GreaterOrEqual(t, resp.Usage.TotalTokens, 0)
}

func TestGeminiReplayStreamResponses(t *testing.T) {
	provider := newGeminiReplayProvider(t, map[string]replayRoute{
		replayKey(http.MethodPost, "/chat/completions"): sseFixtureRoute(t, "gemini/chat_completion_stream.txt"),
	})

	stream, err := provider.StreamResponses(context.Background(), &core.ResponsesRequest{
		Model: "gemini-2.5-flash",
		Input: "stream",
	})
	require.NoError(t, err)

	raw := readAllStream(t, stream)
	events := parseResponsesStream(t, raw)
	require.NotEmpty(t, events)

	assert.True(t, hasResponsesEvent(events, "response.created"))
	assert.True(t, hasResponsesEvent(events, "response.output_text.delta"))
	assert.True(t, hasResponsesEvent(events, "response.completed"))
	assert.NotEmpty(t, extractResponsesStreamText(events))

	hasDone := false
	for _, event := range events {
		if event.Done {
			hasDone = true
			break
		}
	}
	assert.True(t, hasDone, "responses stream should terminate with [DONE]")
}
```
Gemini contract replay tests should include golden-file validation.
The suite currently checks selected fields only and does not compare normalized outputs against committed golden artifacts.
As per coding guidelines tests/contract/**/*_test.go: Contract tests must use golden file validation and include -tags=contract build tag with 5-minute timeout. Record new golden files using make record-api.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/gemini_test.go` around lines 28 - 155, The tests in
TestGeminiReplay* only assert selected fields; update each test
(TestGeminiReplayChatCompletion, TestGeminiReplayStreamChatCompletion,
TestGeminiReplayListModels, TestGeminiReplayResponses,
TestGeminiReplayStreamResponses) to perform normalized golden-file validation:
serialize the full normalized response output, compare against a committed
golden file, and fail if different; provide a way to re-record goldens via make
record-api. Also add the required build tag and 5-minute timeout to the top of
the file (per tests/contract/**/*_test.go guidelines) so the file uses
-tags=contract and the test harness timeout. Ensure stream tests write/normalize
stream output before comparing to their respective golden files.
tests/contract/README.md
Outdated
The diff replaces the per-provider `curl` recipes (basic chat, streaming, models list, parameters, function calling) and the old Test Coverage matrix with replay-focused instructions:

## Running

```bash
# Run contract replay tests
go test -v -tags=contract ./tests/contract/...

# Make target
make test-contract
```

## Models Used

| Provider  | Model                            | Notes                                |
| --------- | -------------------------------- | ------------------------------------ |
| OpenAI    | gpt-4o-mini                      | Standard chat model                  |
| OpenAI    | o3-mini                          | Reasoning model with thinking tokens |
| Anthropic | claude-sonnet-4-20250514         | Latest Claude model                  |
| Gemini    | gemini-2.5-flash-preview-09-2025 | Preview flash model                  |
| xAI       | grok-3-mini                      | Latest Grok model                    |
| Groq      | llama-3.3-70b-versatile          | Fast inference Llama model           |

## Updating fixtures

Use `cmd/recordapi` (or provider curl calls) to refresh payloads when provider contracts change, then re-run the contract suite.
Align README commands with the contract-test policy.
The run/update instructions should include the required timeout and the canonical fixture-recording command to keep local and CI workflows consistent.
Suggested doc patch

```diff
 # Run contract replay tests
-go test -v -tags=contract ./tests/contract/...
+go test -v -tags=contract -timeout=5m ./tests/contract/...

 # Make target
 make test-contract
@@
-Use `cmd/recordapi` (or provider curl calls) to refresh payloads when provider contracts change, then re-run the contract suite.
+Use `make record-api` (or `cmd/recordapi`) to refresh payloads when provider contracts change, then re-run the contract suite.
```

As per coding guidelines tests/contract/**/*_test.go: Contract tests must use golden file validation and include -tags=contract build tag with 5-minute timeout. Record new golden files using make record-api.
Based on learnings: Applies to tests/contract/README.md — update tests/contract/README.md when modifying contract testing procedures or golden file validation processes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/README.md` around lines 34 - 46, Update the
tests/contract/README.md to align commands with the contract-test policy: change
the contract test run instruction to include the build tag and five-minute
timeout (use "-tags=contract" and "-timeout=5m") and document the canonical
golden-file recording command as "make record-api" (or reference cmd/recordapi)
for updating fixtures; also state that contract tests under
tests/contract/**/*_test.go must use golden file validation and be run with
those flags so local and CI workflows remain consistent.
```go
flush := func() {
	if len(dataLines) == 0 {
		currentName = ""
		return
	}
	for _, data := range dataLines {
		events = append(events, sseEvent{
			Name: currentName,
			Data: data,
		})
	}
```
Assemble multi-line SSE data: blocks into one event payload.
Current flush logic emits one event per data: line; multi-line SSE events should be emitted as a single event with newline-joined data.
Suggested fix

```diff
 flush := func() {
 	if len(dataLines) == 0 {
 		currentName = ""
 		return
 	}
-	for _, data := range dataLines {
-		events = append(events, sseEvent{
-			Name: currentName,
-			Data: data,
-		})
-	}
+	events = append(events, sseEvent{
+		Name: currentName,
+		Data: strings.Join(dataLines, "\n"),
+	})
 	currentName = ""
 	dataLines = dataLines[:0]
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/replay_harness_test.go` around lines 115 - 125, The flush
closure currently appends one sseEvent per entry in dataLines, causing
multi-line SSE `data:` blocks to become multiple events; update flush
(referencing flush, dataLines, events, sseEvent, and currentName) to join all
entries in dataLines with newline separators into a single sseEvent.Data (e.g.,
strings.Join(dataLines, "\n")), append that single event with Name set to
currentName to events, then reset currentName and clear dataLines so multi-line
SSE payloads are emitted as one event.
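For reference, here is a self-contained sketch of the spec-conformant behavior the fix asks for: consecutive `data:` lines within one blank-line-terminated SSE block join into a single newline-separated payload. The parser below is illustrative, not the harness's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

type sseEvent struct {
	Name string
	Data string
}

// parseSSE flushes one event per blank-line-terminated block, joining its
// data: lines with "\n" as the SSE format prescribes.
func parseSSE(raw string) []sseEvent {
	var (
		events    []sseEvent
		name      string
		dataLines []string
	)
	flush := func() {
		if len(dataLines) == 0 {
			name = ""
			return
		}
		// strings.Join copies the data, so reusing dataLines' backing array is safe.
		events = append(events, sseEvent{Name: name, Data: strings.Join(dataLines, "\n")})
		name = ""
		dataLines = dataLines[:0]
	}
	for _, line := range strings.Split(raw, "\n") {
		switch {
		case line == "":
			flush()
		case strings.HasPrefix(line, "event:"):
			name = strings.TrimSpace(strings.TrimPrefix(line, "event:"))
		case strings.HasPrefix(line, "data:"):
			dataLines = append(dataLines, strings.TrimSpace(strings.TrimPrefix(line, "data:")))
		}
	}
	flush()
	return events
}

func main() {
	events := parseSSE("event: message\ndata: line1\ndata: line2\n\n")
	fmt.Printf("%d event(s); data=%q\n", len(events), events[0].Data)
}
```

With the per-line flush the same input would yield two events; here it yields one event whose `Data` is `"line1\nline2"`.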
The diff drops the old golden-file structure tests (`TestXAI_ModelsResponse_Contract` and the chat-completion field checks, including the Grok model-prefix scan) in favor of replay-driven tests:

```go
func TestXAIReplayChatCompletion(t *testing.T) {
	testCases := []struct {
		name        string
		fixturePath string
	}{
		{name: "basic", fixturePath: "xai/chat_completion.json"},
		{name: "params", fixturePath: "xai/chat_with_params.json"},
	}

	for _, tc := range testCases {
		t.Run(tc.name, func(t *testing.T) {
			provider := newXAIReplayProvider(t, map[string]replayRoute{
				replayKey(http.MethodPost, "/chat/completions"): jsonFixtureRoute(t, tc.fixturePath),
			})

			resp, err := provider.ChatCompletion(context.Background(), &core.ChatRequest{
				Model: "grok-3-mini",
				Messages: []core.Message{{
					Role:    "user",
					Content: "hello",
				}},
			})
			require.NoError(t, err)
			require.NotNil(t, resp)

			assert.NotEmpty(t, resp.ID)
			assert.Equal(t, "chat.completion", resp.Object)
			require.NotEmpty(t, resp.Choices)
			assert.Equal(t, "assistant", resp.Choices[0].Message.Role)
			assert.NotEmpty(t, resp.Choices[0].Message.Content)
		})
	}
}

func TestXAIReplayStreamChatCompletion(t *testing.T) {
	provider := newXAIReplayProvider(t, map[string]replayRoute{
		replayKey(http.MethodPost, "/chat/completions"): sseFixtureRoute(t, "xai/chat_completion_stream.txt"),
	})

	stream, err := provider.StreamChatCompletion(context.Background(), &core.ChatRequest{
		Model: "grok-3-mini",
		Messages: []core.Message{{
			Role:    "user",
			Content: "stream",
		}},
	})
	require.NoError(t, err)

	raw := readAllStream(t, stream)
	chunks, done := parseChatStream(t, raw)

	require.True(t, done, "stream should terminate with [DONE]")
	require.NotEmpty(t, chunks)
	assert.NotEmpty(t, extractChatStreamText(chunks))
}

func TestXAIReplayListModels(t *testing.T) {
	provider := newXAIReplayProvider(t, map[string]replayRoute{
		replayKey(http.MethodGet, "/models"): jsonFixtureRoute(t, "xai/models.json"),
	})

	resp, err := provider.ListModels(context.Background())
	require.NoError(t, err)
	require.NotNil(t, resp)

	assert.Equal(t, "list", resp.Object)
	require.NotEmpty(t, resp.Data)
	for _, model := range resp.Data {
		assert.NotEmpty(t, model.ID)
		assert.Equal(t, "model", model.Object)
	}
}
```
Add golden-file assertions for XAI replay contract coverage.
The current checks are field-level only and do not validate full normalized outputs against golden artifacts.
As per coding guidelines tests/contract/**/*_test.go: Contract tests must use golden file validation and include -tags=contract build tag with 5-minute timeout. Record new golden files using make record-api.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/xai_test.go` around lines 27 - 98, Add golden-file validation
and contract build tag/timeout: add the contract build tag header to this test
file (//go:build contract) and ensure tests run under the 5-minute contract
timeout when executing (run with -timeout=5m in CI or document running via make
record-api). Replace the current field-level assertions in
TestXAIReplayChatCompletion, TestXAIReplayStreamChatCompletion, and
TestXAIReplayListModels with golden comparisons that serialize/normalize the
full responses (resp from ChatCompletion/ListModels and the reconstructed stream
chunks from StreamChatCompletion) and compare them against dedicated golden
files (e.g., testdata/golden/xai/*.golden.json); if a RECORD flag is set (or
when running make record-api) update/write those golden files. Use existing
helpers like newXAIReplayProvider, jsonFixtureRoute, sseFixtureRoute,
readAllStream, parseChatStream and extractChatStreamText to obtain the actual
outputs to serialize, and fail the test if the normalized output differs from
the golden artifact.
```go
if !goldenFileExists(t, "xai/responses.json") {
	t.Skip("golden file not found - record xai responses first")
}
```
Do not skip when required response fixtures are missing.
Skipping turns missing contract artifacts into green CI. Fail fast and direct contributors to regenerate fixtures.
Suggested fix

```diff
-	if !goldenFileExists(t, "xai/responses.json") {
-		t.Skip("golden file not found - record xai responses first")
-	}
+	require.True(t, goldenFileExists(t, "xai/responses.json"),
+		"missing xai/responses.json; run `make record-api`")
@@
-	if !goldenFileExists(t, "xai/responses_stream.txt") {
-		t.Skip("golden file not found - record xai responses stream first")
-	}
+	require.True(t, goldenFileExists(t, "xai/responses_stream.txt"),
+		"missing xai/responses_stream.txt; run `make record-api`")
```

As per coding guidelines tests/contract/**/*_test.go: Contract tests must use golden file validation and include -tags=contract build tag with 5-minute timeout. Record new golden files using make record-api.
Also applies to: 126-128
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/xai_test.go` around lines 101 - 103, Replace the t.Skip branch
that hides missing golden fixtures with a failing assertion: in
tests/contract/xai_test.go (the test function using goldenFileExists), remove
the t.Skip("golden file not found - record xai responses first") and instead
call t.Fatalf with a clear message that the golden file xai/responses.json is
missing and instruct contributors to regenerate fixtures (e.g., "missing golden
file xai/responses.json; run `make record-api` to create/update contract
fixtures"). Apply the same change to the other checks around the same test (the
occurrences noted at lines ~126-128) so contract tests fail-fast when fixtures
are absent.
```go
	raw := readAllStream(t, stream)
	events := parseResponsesStream(t, raw)
	require.NotEmpty(t, events)

	assert.True(t, hasResponsesEvent(events, "response.created"))
	assert.True(t, hasResponsesEvent(events, "response.output_text.delta"))
	assert.True(t, hasResponsesEvent(events, "response.completed"))
	assert.NotEmpty(t, extractResponsesStreamText(events))
}
```
Assert stream termination with [DONE] in StreamResponses.
TestXAIReplayStreamResponses checks events/text but does not verify terminal completion marker.
Suggested fix

```diff
 raw := readAllStream(t, stream)
 events := parseResponsesStream(t, raw)
 require.NotEmpty(t, events)

 assert.True(t, hasResponsesEvent(events, "response.created"))
 assert.True(t, hasResponsesEvent(events, "response.output_text.delta"))
 assert.True(t, hasResponsesEvent(events, "response.completed"))
 assert.NotEmpty(t, extractResponsesStreamText(events))
+
+hasDone := false
+for _, event := range events {
+	if event.Done {
+		hasDone = true
+		break
+	}
+}
+assert.True(t, hasDone, "responses stream should terminate with [DONE]")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/xai_test.go` around lines 140 - 148, The test reads the raw
stream via readAllStream and parses events but doesn't assert the stream
termination marker; update TestXAIReplayStreamResponses to assert the terminal
"[DONE]" marker is present in the raw stream (use the raw variable returned from
readAllStream) — e.g., add a require.Contains(raw, "[DONE]") or
require.True(strings.HasSuffix(raw, "[DONE]\n")) after events are parsed to
ensure StreamResponses emits the final "[DONE]" sentinel; keep existing checks
using parseResponsesStream, hasResponsesEvent, and extractResponsesStreamText.
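The sentinel check the prompt describes can be expressed as a small scan over the raw SSE text. A self-contained sketch (the helper name is an assumption; the harness checks a parsed `event.Done` flag instead):

```go
package main

import (
	"fmt"
	"strings"
)

// streamHasDone reports whether any SSE data line carries the OpenAI-style
// "[DONE]" terminal sentinel.
func streamHasDone(raw string) bool {
	for _, line := range strings.Split(raw, "\n") {
		if !strings.HasPrefix(line, "data:") {
			continue
		}
		if strings.TrimSpace(strings.TrimPrefix(line, "data:")) == "[DONE]" {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(streamHasDone("data: {\"delta\":\"hi\"}\n\ndata: [DONE]\n\n")) // true
	fmt.Println(streamHasDone("data: {\"delta\":\"hi\"}\n\n"))                 // false
}
```

Matching only full `data: [DONE]` lines avoids false positives from payloads that merely mention the string.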
Actionable comments posted: 11
♻️ Duplicate comments (2)
internal/providers/gemini/gemini.go (1)
79-81: 🧹 Nitpick | 🔵 Trivial

`NewWithHTTPClient` uses `DefaultConfig` for resilience settings. Unlike `New()`, which accepts `ProviderOptions` with explicit resilience config, `NewWithHTTPClient` uses `llmclient.DefaultConfig()`, which applies default retry/circuit-breaker settings. This asymmetry could cause unexpected behavior if callers expect consistent resilience configuration. Consider accepting `ProviderOptions` or a resilience config in `NewWithHTTPClient` for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/providers/gemini/gemini.go` around lines 79 - 81, NewWithHTTPClient currently builds modelsCfg via llmclient.DefaultConfig("gemini", defaultModelsBaseURL) (and assigns it to p.modelsClientConf), which applies default retry/circuit-breaker settings and creates an asymmetry with New, which accepts ProviderOptions; modify NewWithHTTPClient to accept the same ProviderOptions (or a resilience config) parameter and use it to construct modelsCfg instead of llmclient.DefaultConfig, or provide an overload that takes a ResilienceConfig and merges it into modelsCfg before assigning to p.modelsClientConf so resilience settings are consistent with New.
121-149: 🧹 Nitpick | 🔵 TrivialAdd
[DONE]termination assertion for consistency.
TestOpenAIReplayStreamResponsesvalidates event types but doesn't verify stream termination with a[DONE]marker, unlike the Anthropic and Groq equivalents which include this check. For contract consistency, add the termination assertion.♻️ Proposed fix to add [DONE] termination check
require.True(t, hasResponsesEvent(events, "response.created")) require.True(t, hasResponsesEvent(events, "response.output_text.delta")) require.True(t, hasResponsesEvent(events, "response.completed")) require.NotEmpty(t, extractResponsesStreamText(events)) + + hasDone := false + for _, event := range events { + if event.Done { + hasDone = true + break + } + } + require.True(t, hasDone, "responses stream should terminate with [DONE]")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/contract/openai_test.go` around lines 121 - 149, TestOpenAIReplayStreamResponses currently validates event types but doesn't assert the stream termination; after reading the raw stream (raw := readAllStream(t, stream)) add a termination assertion to verify the provider emitted the "[DONE]" marker (for example: require.Contains(t, string(raw), "[DONE]") or require.True(t, hasResponsesEvent(events, "[DONE]"))), ensuring the stream is properly terminated like the Anthropic/Groq tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/test.yml:
- Around line 88-102: Update the contract tests README to reflect the migration
to replay-harness behavior: edit the contract tests README
(tests/contract/README.md) to describe the new contributor-facing steps for
running contract replay tests (referencing the CI job name test-contract) and
the exact local command developers should run (go test -v -tags=contract
-timeout=5m ./tests/contract/...) plus updated guidance on golden file
validation and how to regenerate/verify golden outputs under the replay harness.
In `@internal/providers/gemini/gemini.go`:
- Line 269: Replace the plain fmt.Errorf("unexpected Gemini models response
format") return with a structured GatewayError (or existing gateway error type)
so downstream middleware can distinguish parsing failures; create/populate a
GatewayError (e.g., &GatewayError{Code: "parse_error", Message: "unexpected
Gemini models response format", Details: <optional raw payload>}) and return
that (or wrap it with fmt.Errorf("%w: ...", gwErr) if wrappers are required),
updating the site that checks for GatewayError via errors.As accordingly.
In `@tests/contract/gemini_test.go`:
- Around line 71-81: The test TestGeminiReplayListModels couples the API key
into the route key string (replayKey(http.MethodGet,
"/models?key=test-api-key")), making it fragile; update the test to decouple the
key by either using replayKey(http.MethodGet, "/models") (i.e., match the path
without query params) or construct the route key dynamically from the provider
created by newGeminiReplayProvider so it uses the same API key value; modify the
replayRoute entry accordingly (touching TestGeminiReplayListModels,
newGeminiReplayProvider usage, and the replayKey call) so the route matching no
longer hardcodes "?key=test-api-key".
In `@tests/contract/golden_helpers_test.go`:
- Around line 93-105: The normalizeGeneratedID function only treats resp_ and
msg_ suffixes as generated when the remainder parses as a UUID; extend its
normalization to also detect known provider-specific ID formats used in golden
files (e.g., OpenAI 56-char hex for resp_, Anthropic 24-char base62-like for
msg_, and Groq/ULID-like req_ patterns) so those are mapped to resp_<generated>,
msg_<generated>, or req_<generated> respectively. Modify normalizeGeneratedID to
check for these additional patterns (in addition to the existing uuid.Parse
path) using deterministic checks (length + allowed character sets or a ULID
regex) and return the normalized token when matched; keep the existing behavior
for strings.HasPrefix("resp_"), strings.HasPrefix("msg_"), and add support for
"req_" as well so golden-file comparisons are consistent.
In `@tests/contract/testdata/golden/anthropic/messages_with_tools.golden.json`:
- Around line 4-9: The fixture shows "finish_reason": "tool_use" but the message
object only has plain text (message.content) so tooling assertions won't run;
update this golden file to include structured tool-call data under the message
(e.g., add a "tool_calls" array/object alongside message.content) that contains
the invoked tool name and its arguments (matching the adapter's expected keys)
so tests validate the tool name and args mapping when finish_reason is
"tool_use".
In `@tests/contract/testdata/golden/gemini/chat_with_tools.golden.json`:
- Around line 4-9: The fixture's assistant message has finish_reason set to
"tool_calls" but the message object lacks the required tool_calls array; update
the JSON in tests/contract/testdata/golden/gemini/chat_with_tools.golden.json so
that the "message" object includes a "tool_calls" array containing the tool call
entries (with fields like id, type, and function) similar to the OpenAI/Groq
fixtures, ensuring the array reflects the same structure and content used by the
other golden files.
In `@tests/contract/testdata/golden/groq/chat_with_tools.golden.json`:
- Around line 4-9: The golden fixtures declare a tool-calling completion with
"finish_reason": "tool_calls" but lack the required message.tool_calls payload;
update each affected golden file so the assistant message includes a tool_calls
array containing objects with id, type, and function (with name and arguments),
then re-record the fixtures (e.g., run make record-api and/or RECORD=1 go test
-v -tags=contract ./tests/contract/...) to capture real tool-call payloads, and
add a short note in the contract tests README describing that when finish_reason
== "tool_calls" the message must include message.tool_calls with id, type, and
function{name, arguments}.
In `@tests/contract/testdata/golden/openai/chat_completion_reasoning.golden.json`:
- Around line 25-30: The prompt_tokens_details object is inconsistent with
prompt_tokens: update the detail counts so they sum to prompt_tokens (13) —
since this is a text-only prompt, set "text_tokens" to 13 and keep
"audio_tokens", "cached_tokens", and "image_tokens" at 0; ensure the
"prompt_tokens" field and the sum of prompt_tokens_details remain consistent
going forward (verify any generation logic that produces these fields uses the
same source for totals).
In `@tests/contract/testdata/golden/openai/chat_with_tools.golden.json`:
- Around line 4-9: Add a ToolCalls field to the core message schema and wire
through provider parsing: update the core.Message struct to include ToolCalls
[]ToolCall (create a ToolCall type if missing with fields matching the OpenAI
tool_calls payload), then update each provider adapter (the OpenAI adapter
parsing messages and any other providers that emit tool-invocation responses) to
detect finish_reason == "tool_calls" and populate Message.ToolCalls by
normalizing the provider-specific tool call objects into the new ToolCall type;
ensure all downstream code that constructs or marshals core.Message
(serializers, tests like chat_with_tools.golden.json, and any response builders)
handle the new field.
In `@tests/contract/testdata/golden/xai/responses_stream.golden.json`:
- Around line 2322-2449: The stream uses inconsistent message IDs across related
events: response.output_item.added/response.output_item.done use
"msg_<generated>" while response.content_part.added and response.output_text.*
use "msg_9f619ea5-7275-4a31-cb4c-35f8adb5c6b0"; update the golden so all events
for the same output item share one canonical ID (pick either "msg_<generated>"
or "msg_9f619ea5-7275-4a31-cb4c-35f8adb5c6b0") for the entries with Name values
response.output_item.added, response.content_part.added,
response.output_text.delta, response.output_text.done,
response.content_part.done, and response.output_item.done to preserve
correlation semantics.
In `@tests/contract/xai_test.go`:
- Around line 109-110: Replace the hardcoded golden file paths passed to
compareGoldenJSON with calls to goldenPathForFixture so XAI tests use the
centralized fixture naming; specifically, update the compareGoldenJSON(t,
"xai/responses.golden.json", resp) calls (and the similar occurrences around the
later block at lines 133-136) to compareGoldenJSON(t,
goldenPathForFixture("xai/responses.golden.json"), resp) so the test uses
goldenPathForFixture(...) consistently.
---
Duplicate comments:
In `@internal/providers/gemini/gemini.go`:
- Around line 79-81: NewWithHTTPClient currently builds modelsCfg via
llmclient.DefaultConfig("gemini", defaultModelsBaseURL) (and assigns it to
p.modelsClientConf) which applies default retry/circuit-breaker settings and
creates an asymmetry with New that accepts ProviderOptions; modify
NewWithHTTPClient to accept the same ProviderOptions (or a resilience config)
parameter and use that to construct modelsCfg instead of
llmclient.DefaultConfig, or provide an overload that takes a ResilienceConfig
and merges it into modelsCfg before assigning to p.modelsClientConf so
resilience settings are consistent with New.
In `@tests/contract/openai_test.go`:
- Around line 121-149: TestOpenAIReplayStreamResponses currently validates event
types but doesn't assert the stream termination; after reading the raw stream
(raw := readAllStream(t, stream)) add a termination assertion to verify the
provider emitted the "[DONE]" marker (for example: require.Contains(t,
string(raw), "[DONE]") or require.True(t, hasResponsesEvent(events, "[DONE]"))),
ensuring the stream is properly terminated like the Anthropic/Groq tests.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (57)
- .github/workflows/test.yml
- cmd/recordapi/main.go
- docs/TESTING_STRATEGY.md
- internal/providers/gemini/gemini.go
- tests/contract/README.md
- tests/contract/anthropic_test.go
- tests/contract/gemini_test.go
- tests/contract/golden_helpers_test.go
- tests/contract/groq_test.go
- tests/contract/openai_test.go
- tests/contract/replay_harness_test.go
- tests/contract/replay_provider_helpers_test.go
- tests/contract/testdata/golden/anthropic/messages.golden.json
- tests/contract/testdata/golden/anthropic/messages_extended_thinking.golden.json
- tests/contract/testdata/golden/anthropic/messages_multi_turn.golden.json
- tests/contract/testdata/golden/anthropic/messages_multimodal.golden.json
- tests/contract/testdata/golden/anthropic/messages_stream.golden.json
- tests/contract/testdata/golden/anthropic/messages_with_params.golden.json
- tests/contract/testdata/golden/anthropic/messages_with_tools.golden.json
- tests/contract/testdata/golden/anthropic/responses.golden.json
- tests/contract/testdata/golden/anthropic/responses_stream.golden.json
- tests/contract/testdata/golden/gemini/chat_completion.golden.json
- tests/contract/testdata/golden/gemini/chat_completion_stream.golden.json
- tests/contract/testdata/golden/gemini/chat_with_params.golden.json
- tests/contract/testdata/golden/gemini/chat_with_tools.golden.json
- tests/contract/testdata/golden/gemini/models.golden.json
- tests/contract/testdata/golden/gemini/responses.golden.json
- tests/contract/testdata/golden/gemini/responses_stream.golden.json
- tests/contract/testdata/golden/groq/chat_completion.golden.json
- tests/contract/testdata/golden/groq/chat_completion_stream.golden.json
- tests/contract/testdata/golden/groq/chat_with_params.golden.json
- tests/contract/testdata/golden/groq/chat_with_tools.golden.json
- tests/contract/testdata/golden/groq/models.golden.json
- tests/contract/testdata/golden/groq/responses.golden.json
- tests/contract/testdata/golden/groq/responses_stream.golden.json
- tests/contract/testdata/golden/openai/chat_completion.golden.json
- tests/contract/testdata/golden/openai/chat_completion_reasoning.golden.json
- tests/contract/testdata/golden/openai/chat_completion_stream.golden.json
- tests/contract/testdata/golden/openai/chat_json_mode.golden.json
- tests/contract/testdata/golden/openai/chat_multi_turn.golden.json
- tests/contract/testdata/golden/openai/chat_multimodal.golden.json
- tests/contract/testdata/golden/openai/chat_with_params.golden.json
- tests/contract/testdata/golden/openai/chat_with_tools.golden.json
- tests/contract/testdata/golden/openai/models.golden.json
- tests/contract/testdata/golden/openai/responses.golden.json
- tests/contract/testdata/golden/openai/responses_stream.golden.json
- tests/contract/testdata/golden/xai/chat_completion.golden.json
- tests/contract/testdata/golden/xai/chat_completion_stream.golden.json
- tests/contract/testdata/golden/xai/chat_with_params.golden.json
- tests/contract/testdata/golden/xai/models.golden.json
- tests/contract/testdata/golden/xai/responses.golden.json
- tests/contract/testdata/golden/xai/responses_stream.golden.json
- tests/contract/testdata/openai/responses.json
- tests/contract/testdata/openai/responses_stream.txt
- tests/contract/testdata/xai/responses.json
- tests/contract/testdata/xai/responses_stream.txt
- tests/contract/xai_test.go
```yaml
test-contract:
  name: Contract Replay Tests
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v6

    - name: Set up Go
      uses: actions/setup-go@v6
      with:
        go-version: ${{ env.GO_VERSION }}
        cache: true

    - name: Run contract replay tests
      run: go test -v -tags=contract -timeout=5m ./tests/contract/...
```
Please include the contract-testing procedure update in tests/contract/README.md.
This PR migrates contract execution to replay harness behavior, so the contributor-facing contract testing steps should be updated in the same change set.
Based on learnings: Applies to tests/contract/README.md : Update tests/contract/README.md when modifying contract testing procedures or golden file validation processes
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/test.yml around lines 88 - 102, Update the contract tests
README to reflect the migration to replay-harness behavior: edit the contract
tests README (tests/contract/README.md) to describe the new contributor-facing
steps for running contract replay tests (referencing the CI job name
test-contract) and the exact local command developers should run (go test -v
-tags=contract -timeout=5m ./tests/contract/...) plus updated guidance on golden
file validation and how to regenerate/verify golden outputs under the replay
harness.
internal/providers/gemini/gemini.go
Outdated
```diff
 		Object: "list",
 		Data:   models,
 	}, nil
+	return nil, fmt.Errorf("unexpected Gemini models response format")
```
🧹 Nitpick | 🔵 Trivial
Consider returning a structured error type.
Returning a plain fmt.Errorf for an unexpected response format may not integrate well with error handling middleware that expects GatewayError types. Consider wrapping this in a more specific error type if downstream handlers need to distinguish parsing failures from network errors.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/providers/gemini/gemini.go` at line 269, Replace the plain
fmt.Errorf("unexpected Gemini models response format") return with a structured
GatewayError (or existing gateway error type) so downstream middleware can
distinguish parsing failures; create/populate a GatewayError (e.g.,
&GatewayError{Code: "parse_error", Message: "unexpected Gemini models response
format", Details: <optional raw payload>}) and return that (or wrap it with
fmt.Errorf("%w: ...", gwErr) if wrappers are required), updating the site that
checks for GatewayError via errors.As accordingly.
```go
func TestGeminiReplayListModels(t *testing.T) {
	provider := newGeminiReplayProvider(t, map[string]replayRoute{
		replayKey(http.MethodGet, "/models?key=test-api-key"): jsonFixtureRoute(t, "gemini/models.json"),
	})

	resp, err := provider.ListModels(context.Background())
	require.NoError(t, err)
	require.NotNil(t, resp)

	// Check for some expected models (Gemini variants)
	modelIDs := make(map[string]bool)
	for _, model := range resp.Data {
		modelIDs[model.ID] = true
	}

	// At least one Gemini model should exist
	hasGemini := false
	for id := range modelIDs {
		if hasPrefixIgnoreModels(id, "gemini") {
			hasGemini = true
			break
		}
	}
	assert.True(t, hasGemini, "expected at least one Gemini model in models list")
	compareGoldenJSON(t, goldenPathForFixture("gemini/models.json"), resp)
}
```
🧹 Nitpick | 🔵 Trivial
Consider decoupling the API key from the route key.
The route key includes the query parameter ?key=test-api-key which couples the test to the specific API key value used in the provider. If the provider's API key handling changes, this test would break. Consider whether the models endpoint route could be matched without the query parameter, or ensure the key matches what newGeminiReplayProvider uses.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/gemini_test.go` around lines 71 - 81, The test
TestGeminiReplayListModels couples the API key into the route key string
(replayKey(http.MethodGet, "/models?key=test-api-key")), making it fragile;
update the test to decouple the key by either using replayKey(http.MethodGet,
"/models") (i.e., match the path without query params) or construct the route
key dynamically from the provider created by newGeminiReplayProvider so it uses
the same API key value; modify the replayRoute entry accordingly (touching
TestGeminiReplayListModels, newGeminiReplayProvider usage, and the replayKey
call) so the route matching no longer hardcodes "?key=test-api-key".
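One way to do the decoupling is to key replay routes on method plus path only, dropping the query string before matching. replayKeyPathOnly below is a hypothetical helper sketched under that assumption, not the suite's existing replayKey:

```go
package main

import (
	"fmt"
	"net/url"
)

// replayKeyPathOnly builds a replay route key from the method and the
// URL path, ignoring any query parameters such as ?key=....
func replayKeyPathOnly(method, rawURL string) string {
	u, err := url.Parse(rawURL)
	if err != nil {
		return method + " " + rawURL // fall back to the raw string
	}
	return method + " " + u.Path
}

func main() {
	// Both forms map to the same route, so the fixture no longer
	// encodes the test API key.
	fmt.Println(replayKeyPathOnly("GET", "/models?key=test-api-key")) // GET /models
	fmt.Println(replayKeyPathOnly("GET", "/models"))                  // GET /models
}
```

With this scheme, changing the provider's API key handling no longer invalidates recorded routes.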
```go
func normalizeGeneratedID(id string) string {
	if strings.HasPrefix(id, "resp_") {
		if _, err := uuid.Parse(strings.TrimPrefix(id, "resp_")); err == nil {
			return "resp_<generated>"
		}
	}
	if strings.HasPrefix(id, "msg_") {
		if _, err := uuid.Parse(strings.TrimPrefix(id, "msg_")); err == nil {
			return "msg_<generated>"
		}
	}
	return id
}
```
🧩 Analysis chain
🏁 Script executed:
```shell
find tests/contract -name "*.json" -o -name "*.yaml" -o -name "*.yml" | head -20
```

Repository: ENTERPILOT/GOModel
Length of output: 1315
🏁 Script executed:
```shell
rg '"id":\s*"[^"]*"' tests/contract/testdata/golden/ --max-count=30
```

Repository: ENTERPILOT/GOModel
Length of output: 20293
🏁 Script executed:
```shell
# Also check the context around the normalizeGeneratedID function
cat -n tests/contract/golden_helpers_test.go | sed -n '80,115p'
```

Repository: ENTERPILOT/GOModel
Length of output: 764
🏁 Script executed:
```shell
# Check how normalizeGeneratedID is actually used
rg -B5 -A5 'normalizeGeneratedID' tests/contract/
```

Repository: ENTERPILOT/GOModel
Length of output: 1344
The normalizeGeneratedID function doesn't handle non-UUID ID formats used by actual providers.
The function only normalizes IDs where the suffix (after resp_ or msg_) is a valid UUID. However, golden files contain:
- OpenAI: resp_0d8b81b7c805508d0069a31431b68c8194af45c94a6952706b (56-char hex string)
- Anthropic: msg_013LBDy1nvGiRgxiRAWseYRS (base62-like, 24 chars)
- Groq: req_01kfk38ha1ey6bw3b68mcaeddg (ULID-like format)
These will fail uuid.Parse() and remain unchanged in golden files, while other providers have already-normalized IDs (msg_<generated>). This inconsistency can cause comparison failures during golden file updates. Either extend the pattern matching to handle provider-specific formats, or document that only UUID-suffixed IDs are normalized.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/golden_helpers_test.go` around lines 93 - 105, The
normalizeGeneratedID function only treats resp_ and msg_ suffixes as generated
when the remainder parses as a UUID; extend its normalization to also detect
known provider-specific ID formats used in golden files (e.g., OpenAI 56-char
hex for resp_, Anthropic 24-char base62-like for msg_, and Groq/ULID-like req_
patterns) so those are mapped to resp_<generated>, msg_<generated>, or
req_<generated> respectively. Modify normalizeGeneratedID to check for these
additional patterns (in addition to the existing uuid.Parse path) using
deterministic checks (length + allowed character sets or a ULID regex) and
return the normalized token when matched; keep the existing behavior for
strings.HasPrefix("resp_"), strings.HasPrefix("msg_"), and add support for
"req_" as well so golden-file comparisons are consistent.
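A sketch of the extended normalization using deterministic shape checks. The length bounds and alphabets below are guesses derived from the sample IDs in this review, not documented provider guarantees, and the UUID case is matched with a regex instead of uuid.Parse to keep the sketch dependency-free:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Shape checks for ID suffixes seen in the golden files (assumptions).
var (
	uuidRe = regexp.MustCompile(`^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$`)
	hexRe  = regexp.MustCompile(`^[0-9a-f]{40,64}$`)    // OpenAI-style long hex
	b62Re  = regexp.MustCompile(`^[0-9A-Za-z]{20,32}$`) // Anthropic/ULID-like
)

func normalizeGeneratedID(id string) string {
	for _, prefix := range []string{"resp_", "msg_", "req_"} {
		if !strings.HasPrefix(id, prefix) {
			continue
		}
		rest := strings.TrimPrefix(id, prefix)
		if uuidRe.MatchString(rest) || hexRe.MatchString(rest) || b62Re.MatchString(rest) {
			return prefix + "<generated>"
		}
	}
	return id
}

func main() {
	fmt.Println(normalizeGeneratedID("resp_0d8b81b7c805508d0069a31431b68c8194af45c94a6952706b"))
	fmt.Println(normalizeGeneratedID("msg_013LBDy1nvGiRgxiRAWseYRS"))
	fmt.Println(normalizeGeneratedID("req_01kfk38ha1ey6bw3b68mcaeddg"))
	fmt.Println(normalizeGeneratedID("gemini-2.0-flash")) // no prefix: untouched
}
```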
```json
"finish_reason": "tool_use",
"index": 0,
"message": {
  "content": "I'll check the current weather in Paris for you.",
  "role": "assistant"
}
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
# First, check if the file exists and show its content
if [ -f "tests/contract/testdata/golden/anthropic/messages_with_tools.golden.json" ]; then
  echo "=== File content (full) ==="
  cat "tests/contract/testdata/golden/anthropic/messages_with_tools.golden.json"
  echo ""
else
  echo "File not found"
  exit 1
fi
```

Repository: ENTERPILOT/GOModel
Length of output: 545
🏁 Script executed:
```shell
#!/bin/bash
set -euo pipefail
# 1) Inspect the fixture for tool-call fields
echo "=== Searching fixture for finish_reason, tool_calls, content ==="
rg -nP -C2 '"finish_reason"\s*:\s*"tool_use"|"tool_calls"|"content"' tests/contract/testdata/golden/anthropic/messages_with_tools.golden.json
echo ""
echo "=== Searching test assertions related to this fixture ==="
# 2) Locate replay/contract assertions related to this fixture and tool-call parsing
rg -nP --type=go -C3 'messages_with_tools|tool_use|tool_calls|ToolCalls|finish_reason' tests/contract 2>/dev/null || echo "(search completed)"
```

Repository: ENTERPILOT/GOModel
Length of output: 2610
Tool-use fixture is missing structured tool-call data.
Line 4 indicates "finish_reason": "tool_use", but the message object (lines 7–8) contains only plain text content with no tool_calls field. This prevents the test from validating adapter mapping for tool name and arguments, allowing tool-call regressions to pass undetected.
Proposed fixture adjustment
```diff
 "choices": [
   {
     "finish_reason": "tool_use",
     "index": 0,
     "message": {
-      "content": "I'll check the current weather in Paris for you.",
-      "role": "assistant"
+      "content": "",
+      "role": "assistant",
+      "tool_calls": [
+        {
+          "id": "call_1",
+          "type": "function",
+          "function": {
+            "name": "get_weather",
+            "arguments": "{\"location\":\"Paris\"}"
+          }
+        }
+      ]
     }
   }
 ],
```
],📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```json
"finish_reason": "tool_use",
"index": 0,
"message": {
  "content": "",
  "role": "assistant",
  "tool_calls": [
    {
      "id": "call_1",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"location\":\"Paris\"}"
      }
    }
  ]
}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/testdata/golden/anthropic/messages_with_tools.golden.json`
around lines 4 - 9, The fixture shows "finish_reason": "tool_use" but the
message object only has plain text (message.content) so tooling assertions won't
run; update this golden file to include structured tool-call data under the
message (e.g., add a "tool_calls" array/object alongside message.content) that
contains the invoked tool name and its arguments (matching the adapter's
expected keys) so tests validate the tool name and args mapping when
finish_reason is "tool_use".
```json
"finish_reason": "tool_calls",
"index": 0,
"message": {
  "content": "",
  "role": "assistant"
}
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, check if the file exists and verify the snippet
cat -n tests/contract/testdata/golden/groq/chat_with_tools.golden.json | head -20
```

Repository: ENTERPILOT/GOModel
Length of output: 620
🏁 Script executed:
```shell
# Check if tests/contract/README.md exists and what it documents
ls -la tests/contract/README.md 2>/dev/null || echo "README.md not found"
```

Repository: ENTERPILOT/GOModel
Length of output: 135
🏁 Script executed:
```shell
# Search for other golden files with finish_reason="tool_calls"
fd -e json . tests/contract/testdata/golden -type f | while read f; do
  if jq -e '.choices[]?.finish_reason' "$f" 2>/dev/null | grep -q "tool_calls"; then
    echo "File: $f"
    jq '.choices[]?.finish_reason' "$f" 2>/dev/null | head -3
  fi
done
```

Repository: ENTERPILOT/GOModel
Length of output: 232
🌐 Web query:
Groq API chat completions tool_calls response schema finish_reason
💡 Result:
Groq /openai/v1/chat/completions response shape (tool calls + finish_reason)
Groq’s Chat Completions endpoint is OpenAI-compatible and returns a top-level chat.completion object with a choices[] array; each choice includes a message and a finish_reason. When the model chooses to call a tool, the assistant message contains tool_calls, and finish_reason is "tool_calls". [1][2]
Typical tool-calling response (non-streaming)
```json
{
  "id": "chatcmpl_...",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "openai/gpt-oss-120b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "tool_calls": [
          {
            "id": "call_...",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"location\":\"New York, NY\",\"unit\":\"fahrenheit\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ],
  "usage": { "...": "..." }
}
```

Key tool_calls[] fields: id, type ("function"), function.name, function.arguments (a JSON string). [2]
finish_reason values you should handle
Groq follows the OpenAI-style set documented/expected for chat completions:
- "stop" (normal completion) [1][3]
- "length" (token limit hit) [4]
- "tool_calls" (model requests one or more tool calls) [2][4]
- "content_filter" (content filtered) [4]
Streaming note (important edge case)
A Groq community report (Nov 19, 2025) indicates that for some models, streaming chat-completion chunks may omit finish_reason in the final chunk (breaking OpenAI streaming expectations). If you depend on finish_reason in streaming, you may need a defensive fallback. [4]
Sources: Groq API Reference (chat completions) [1], Groq “Local Tool Calling” docs (tool_calls + finish_reason example) [2], Groq built-in tools example showing finish_reason: "stop" [3], Groq community thread on missing streaming finish_reason + expected values [4].
🏁 Script executed:
```shell
# Correct the fd command and search for other golden files with finish_reason="tool_calls"
fd -e json . tests/contract/testdata/golden | while read f; do
  if jq -e '.choices[]?.finish_reason' "$f" 2>/dev/null | grep -q "tool_calls"; then
    echo "=== $f ==="
    jq '.choices[] | select(.finish_reason=="tool_calls") | .message | keys' "$f" 2>/dev/null
  fi
done
```

Repository: ENTERPILOT/GOModel
Length of output: 359
🏁 Script executed:
```shell
# Check the README.md content to see what's documented
cat tests/contract/README.md
```

Repository: ENTERPILOT/GOModel
Length of output: 1764
finish_reason: "tool_calls" missing message.tool_calls across all provider golden files.
This pattern appears in three golden fixtures:
- tests/contract/testdata/golden/openai/chat_with_tools.golden.json
- tests/contract/testdata/golden/gemini/chat_with_tools.golden.json
- tests/contract/testdata/golden/groq/chat_with_tools.golden.json
Each declares a tool-calling completion (finish_reason: "tool_calls") but omits the message.tool_calls array. Per OpenAI/Groq schema, when finish_reason is "tool_calls", the message must include a tool_calls array with id, type, and function (name, arguments). Missing this allows replay tests to pass while silently masking adapter tool-call parsing regressions.
Re-record fixtures using make record-api, then refresh golden outputs via RECORD=1 go test -v -tags=contract ./tests/contract/... to capture actual tool-call payloads.
Additionally, update tests/contract/README.md to document the required tool_calls payload structure when finish_reason is "tool_calls", so future golden fixtures comply with schema expectations.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/testdata/golden/groq/chat_with_tools.golden.json` around lines
4 - 9, The golden fixtures declare a tool-calling completion with
"finish_reason": "tool_calls" but lack the required message.tool_calls payload;
update each affected golden file so the assistant message includes a tool_calls
array containing objects with id, type, and function (with name and arguments),
then re-record the fixtures (e.g., run make record-api and/or RECORD=1 go test
-v -tags=contract ./tests/contract/...) to capture real tool-call payloads, and
add a short note in the contract tests README describing that when finish_reason
== "tool_calls" the message must include message.tool_calls with id, type, and
function{name, arguments}.
```json
"prompt_tokens": 13,
"prompt_tokens_details": {
  "audio_tokens": 0,
  "cached_tokens": 0,
  "image_tokens": 0,
  "text_tokens": 0
```
Inconsistent token accounting in prompt_tokens_details.
The prompt_tokens field shows 13 tokens, but all the detail fields (audio_tokens, cached_tokens, image_tokens, text_tokens) are 0. The detail fields should sum to match the parent total. For a text-only prompt with 13 tokens, text_tokens should be 13.
📊 Proposed fix to correct token details
```diff
 "prompt_tokens": 13,
 "prompt_tokens_details": {
   "audio_tokens": 0,
   "cached_tokens": 0,
   "image_tokens": 0,
-  "text_tokens": 0
+  "text_tokens": 13
 },
```
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/testdata/golden/openai/chat_completion_reasoning.golden.json`
around lines 25 - 30, The prompt_tokens_details object is inconsistent with
prompt_tokens: update the detail counts so they sum to prompt_tokens (13) —
since this is a text-only prompt, set "text_tokens" to 13 and keep
"audio_tokens", "cached_tokens", and "image_tokens" at 0; ensure the
"prompt_tokens" field and the sum of prompt_tokens_details remain consistent
going forward (verify any generation logic that produces these fields uses the
same source for totals).
```json
"finish_reason": "tool_calls",
"index": 0,
"message": {
  "content": "",
  "role": "assistant"
}
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, find the file and understand its context
find . -type f -name "chat_with_tools.golden.json" 2>/dev/null
```

Repository: ENTERPILOT/GOModel
Length of output: 263
🏁 Script executed:
```shell
# List golden test fixtures to understand the structure
fd "golden.*json" tests/ | head -20
```

Repository: ENTERPILOT/GOModel
Length of output: 1418
🏁 Script executed:
```shell
# Check if there's a schema or documentation about golden files
find . -type f -name "*.md" | xargs grep -l "golden" 2>/dev/null | head -5
```

Repository: ENTERPILOT/GOModel
Length of output: 127
🏁 Script executed:
```shell
# Look for test files that reference chat_with_tools
rg "chat_with_tools" --type go --type json 2>/dev/null | head -20
```

Repository: ENTERPILOT/GOModel
Length of output: 364
🏁 Script executed:
```shell
# Read the full OpenAI chat_with_tools golden fixture
cat -n tests/contract/testdata/golden/openai/chat_with_tools.golden.json
```

Repository: ENTERPILOT/GOModel
Length of output: 1024
🏁 Script executed:
```shell
# Read the test file to understand expectations
cat -n tests/contract/openai_test.go | head -100
```

Repository: ENTERPILOT/GOModel
Length of output: 3840
🏁 Script executed:
```shell
# Search for schema definitions or type definitions related to responses
rg "type.*Response" --type go tests/contract/ -A 5 | head -50
```

Repository: ENTERPILOT/GOModel
Length of output: 442
🏁 Script executed:
```shell
# Check if there's any documentation in the contract test directory
cat tests/contract/README.md
```

Repository: ENTERPILOT/GOModel
Length of output: 1764
🏁 Script executed:
```shell
# Find OpenAI provider implementation
find . -type f -path "*/providers/openai*" -name "*.go" | head -10
```

Repository: ENTERPILOT/GOModel
Length of output: 142
🏁 Script executed:
```shell
# Search for tool_calls handling in codebase
rg "tool_calls" --type go -B 2 -A 2
```

Repository: ENTERPILOT/GOModel
Length of output: 715
🏁 Script executed:
```shell
# Check the core types to understand schema
find . -type f -path "*/internal/core*" -name "*.go" | head -10
```

Repository: ENTERPILOT/GOModel
Length of output: 259
🌐 Web query:
OpenAI Chat Completions API message tool_calls field schema finish_reason
💡 Result:
choices[i].message.tool_calls (Chat Completions API)
In a chat completion response, an assistant message may include:
{
"role": "assistant",
"content": null,
"tool_calls": [
{
"id": "string",
"type": "function",
"function": {
"name": "string",
"arguments": "string (JSON text)"
}
}
]
}tool_callsis an optional array ofChatCompletionMessageToolCallobjects. [1]- Each tool call has an
idand atype. For function tools,typeis"function"and the payload is underfunction: { name, arguments }(withargumentsprovided as a string containing JSON). [1] - The schema also allows a custom tool call variant:
{ id, type: "custom", custom: { name, input } }. [2]
Tool result message schema (role: "tool")
To reply with the tool’s output, send a tool message that references the tool call:
{
"role": "tool",
"tool_call_id": "string",
"content": "string (or text parts)"
}

tool_call_id links the tool result back to the corresponding tool_calls[].id. [3]
choices[i].finish_reason when tools are invoked
finish_reason is one of: "stop", "length", "tool_calls", "content_filter", or "function_call" (deprecated). [4]
When the model decides to call a tool, finish_reason will be "tool_calls". [4]
Sources: OpenAI API Reference (Chat Completions). [1][2][3][4]
🏁 Script executed:
# Check OpenAI adapter for tool_calls parsing
cat -n ./internal/providers/openai/openai.go | head -150
Length of output: 6029
🏁 Script executed:
# Check core types definition
cat -n ./internal/core/types.go | grep -A 20 "type Message struct"
Length of output: 815
🏁 Script executed:
# Compare with other provider fixtures that handle tools
cat tests/contract/testdata/golden/gemini/chat_with_tools.golden.json
Length of output: 471
🏁 Script executed:
# Check if anthropic has tool_calls in its fixture
cat tests/contract/testdata/golden/anthropic/messages_with_tools.golden.json | head -50
Length of output: 516
🏁 Script executed:
# Search for ToolCall handling in OpenAI adapter
rg "ToolCall" --type go ./internal/providers/openai/
Length of output: 89
🏁 Script executed:
# Check if tool_calls are parsed anywhere
rg "tool_calls" --type go ./internal/providers/ -B 3 -A 3
Length of output: 89
🏁 Script executed:
# Look at the full Message type definition more carefully
cat -n ./internal/core/types.go | head -100
Length of output: 4443
🏁 Script executed:
# Check if there's a separate response type for tools
rg "type.*Response" ./internal/core/types.go -A 10
Length of output: 1203
🏁 Script executed:
# Check if tool_calls are being parsed but dropped in normalization
grep -r "tool_calls" ./internal/providers/openai/ --include="*.go" -B 5 -A 5
Length of output: 44
🏁 Script executed:
# Check if there are any comment or docs about tool_calls support
rg "tool.call|tool_call" . --type go -i | head -20
Length of output: 741
🏁 Script executed:
# Verify the test expectations - what exactly is being compared
cat -n tests/contract/main_test.go | grep -A 50 "compareGoldenJSON"
Length of output: 44
🏁 Script executed:
# Find the ToolCallResponse and related struct definitions in main_test.go
cat -n tests/contract/main_test.go | grep -A 20 "type ToolCallResponse"
Length of output: 928
🏁 Script executed:
# Look at how the test compares the golden JSON
cat -n tests/contract/main_test.go | grep -B 5 -A 20 "func compareGoldenJSON"
Length of output: 44
🏁 Script executed:
# Check what the actual test expectations are for tool_calls
cat -n tests/contract/main_test.go | head -200
Length of output: 3444
🏁 Script executed:
# Verify that OpenAI provider directly uses core.Message (confirm no transformation)
cat -n ./internal/providers/openai/openai.go | grep -A 30 "ChatCompletion"
Length of output: 1937
🏁 Script executed:
# Check if there's any transformation layer between provider response and core.Message
rg "json.Unmarshal|json.Decoder" ./internal/providers/openai/ -B 3 -A 3
Length of output: 3952
The core.Message schema must add a ToolCalls field to support tool-invocation responses.
The fixture correctly reflects the current core.Message struct (which lacks a ToolCalls field), but when finish_reason="tool_calls" (line 4), the OpenAI API provides a tool_calls array in the message that cannot be captured or exposed. This is a design gap: the core schema must be extended to include ToolCalls []ToolCall in the Message type, and all provider adapters must be updated to parse and normalize tool-call data.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/testdata/golden/openai/chat_with_tools.golden.json` around
lines 4 - 9, Add a ToolCalls field to the core message schema and wire through
provider parsing: update the core.Message struct to include ToolCalls []ToolCall
(create a ToolCall type if missing with fields matching the OpenAI tool_calls
payload), then update each provider adapter (the OpenAI adapter parsing messages
and any other providers that emit tool-invocation responses) to detect
finish_reason == "tool_calls" and populate Message.ToolCalls by normalizing the
provider-specific tool call objects into the new ToolCall type; ensure all
downstream code that constructs or marshals core.Message (serializers, tests
like chat_with_tools.golden.json, and any response builders) handle the new
field.
  "Done": false,
  "Name": "response.output_item.added",
  "Payload": {
    "item": {
      "content": [],
      "id": "msg_\u003cgenerated\u003e",
      "role": "assistant",
      "status": "in_progress",
      "type": "message"
    },
    "output_index": 1,
    "sequence_number": 186,
    "type": "response.output_item.added"
  }
},
{
  "Done": false,
  "Name": "response.content_part.added",
  "Payload": {
    "content_index": 0,
    "item_id": "msg_9f619ea5-7275-4a31-cb4c-35f8adb5c6b0",
    "output_index": 1,
    "part": {
      "annotations": [],
      "logprobs": [],
      "text": "",
      "type": "output_text"
    },
    "sequence_number": 187,
    "type": "response.content_part.added"
  }
},
{
  "Done": false,
  "Name": "response.output_text.delta",
  "Payload": {
    "content_index": 0,
    "delta": "Hello",
    "item_id": "msg_9f619ea5-7275-4a31-cb4c-35f8adb5c6b0",
    "logprobs": [],
    "output_index": 1,
    "sequence_number": 188,
    "type": "response.output_text.delta"
  }
},
{
  "Done": false,
  "Name": "response.output_text.delta",
  "Payload": {
    "content_index": 0,
    "delta": ",",
    "item_id": "msg_9f619ea5-7275-4a31-cb4c-35f8adb5c6b0",
    "logprobs": [],
    "output_index": 1,
    "sequence_number": 189,
    "type": "response.output_text.delta"
  }
},
{
  "Done": false,
  "Name": "response.output_text.delta",
  "Payload": {
    "content_index": 0,
    "delta": " World",
    "item_id": "msg_9f619ea5-7275-4a31-cb4c-35f8adb5c6b0",
    "logprobs": [],
    "output_index": 1,
    "sequence_number": 190,
    "type": "response.output_text.delta"
  }
},
{
  "Done": false,
  "Name": "response.output_text.delta",
  "Payload": {
    "content_index": 0,
    "delta": "!",
    "item_id": "msg_9f619ea5-7275-4a31-cb4c-35f8adb5c6b0",
    "logprobs": [],
    "output_index": 1,
    "sequence_number": 191,
    "type": "response.output_text.delta"
  }
},
{
  "Done": false,
  "Name": "response.output_text.done",
  "Payload": {
    "content_index": 0,
    "item_id": "msg_9f619ea5-7275-4a31-cb4c-35f8adb5c6b0",
    "logprobs": [],
    "output_index": 1,
    "sequence_number": 192,
    "text": "Hello, World!",
    "type": "response.output_text.done"
  }
},
{
  "Done": false,
  "Name": "response.content_part.done",
  "Payload": {
    "content_index": 0,
    "item_id": "msg_9f619ea5-7275-4a31-cb4c-35f8adb5c6b0",
    "output_index": 1,
    "part": {
      "annotations": [],
      "logprobs": [],
      "text": "Hello, World!",
      "type": "output_text"
    },
    "sequence_number": 193,
    "type": "response.content_part.done"
  }
},
{
  "Done": false,
  "Name": "response.output_item.done",
  "Payload": {
    "item": {
      "content": [
        {
          "annotations": [],
          "logprobs": [],
          "text": "Hello, World!",
          "type": "output_text"
        }
      ],
      "id": "msg_\u003cgenerated\u003e",
Keep stream message identifiers consistent across related events.
response.output_item.added/response.output_item.done use msg_<generated>, while the associated response.content_part.* and response.output_text.* events use msg_9f.... Use a single canonical ID across these events to preserve correlation semantics in the golden stream.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/testdata/golden/xai/responses_stream.golden.json` around lines
2322 - 2449, The stream uses inconsistent message IDs across related events:
response.output_item.added/response.output_item.done use "msg_<generated>" while
response.content_part.added and response.output_text.* use
"msg_9f619ea5-7275-4a31-cb4c-35f8adb5c6b0"; update the golden so all events for
the same output item share one canonical ID (pick either "msg_<generated>" or
"msg_9f619ea5-7275-4a31-cb4c-35f8adb5c6b0") for the entries with Name values
response.output_item.added, response.content_part.added,
response.output_text.delta, response.output_text.done,
response.content_part.done, and response.output_item.done to preserve
correlation semantics.
tests/contract/xai_test.go
Outdated
	compareGoldenJSON(t, "xai/responses.golden.json", resp)
}
🧹 Nitpick | 🔵 Trivial
Use goldenPathForFixture(...) consistently for XAI responses tests.
These two hardcoded golden paths are the only outliers; deriving them from fixture paths keeps naming/lookup behavior centralized.
♻️ Proposed refactor
- compareGoldenJSON(t, "xai/responses.golden.json", resp)
+ compareGoldenJSON(t, goldenPathForFixture("xai/responses.json"), resp)
@@
- compareGoldenJSON(t, "xai/responses_stream.golden.json", map[string]any{
+ compareGoldenJSON(t, goldenPathForFixture("xai/responses_stream.txt"), map[string]any{
"events": events,
"text": extractResponsesStreamText(events),
 })

Also applies to: 133-136
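For illustration, the centralized helper could look like the sketch below. This is an assumed implementation of goldenPathForFixture based on how the tests use it (fixture path in, matching .golden.json path out); the repository's real helper may differ.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// goldenPathForFixture derives the golden-file path from a fixture path by
// swapping the extension for ".golden.json", e.g.
// "xai/responses.json" -> "xai/responses.golden.json".
func goldenPathForFixture(fixture string) string {
	ext := filepath.Ext(fixture)
	base := strings.TrimSuffix(fixture, ext)
	return base + ".golden.json"
}

func main() {
	fmt.Println(goldenPathForFixture("xai/responses.json"))
	fmt.Println(goldenPathForFixture("xai/responses_stream.txt"))
}
```

Deriving the golden name this way keeps the fixture-to-golden mapping in one place, so renaming a fixture cannot silently desynchronize the two paths.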
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/xai_test.go` around lines 109 - 110, Replace the hardcoded
golden file paths passed to compareGoldenJSON with calls to goldenPathForFixture
so XAI tests use the centralized fixture naming; specifically, update the
compareGoldenJSON(t, "xai/responses.golden.json", resp) calls (and the similar
occurrences around the later block at lines 133-136) to compareGoldenJSON(t,
goldenPathForFixture("xai/responses.golden.json"), resp) so the test uses
goldenPathForFixture(...) consistently.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/providers/gemini/gemini.go (1)
156-158: ⚠️ Potential issue | 🟠 Major

Native empty models payload is treated as an error.

A valid Gemini native response with {"models":[]} currently falls through to fallback parsing and can return a 502-style provider error. Empty native lists should return an empty core.ModelsResponse.

💡 Proposed fix

 type geminiModelsResponse struct {
-	Models []geminiModel `json:"models"`
+	Models *[]geminiModel `json:"models"`
 }
@@
 	// Preferred path: native Gemini models response.
 	var geminiResp geminiModelsResponse
-	if err := json.Unmarshal(rawResp.Body, &geminiResp); err == nil && len(geminiResp.Models) > 0 {
-		models := make([]core.Model, 0, len(geminiResp.Models))
+	if err := json.Unmarshal(rawResp.Body, &geminiResp); err == nil && geminiResp.Models != nil {
+		models := make([]core.Model, 0, len(*geminiResp.Models))
-	for _, gm := range geminiResp.Models {
+	for _, gm := range *geminiResp.Models {
 		// Extract model ID from name (format: "models/gemini-...")
 		modelID := strings.TrimPrefix(gm.Name, "models/")

Also applies to: 205-206
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/providers/gemini/gemini.go` around lines 156 - 158, The Gemini parser treats a native response with an empty models slice as an error; update the code paths that decode geminiModelsResponse (the struct type named geminiModelsResponse and its Models []geminiModel field) to explicitly handle len(resp.Models)==0 by returning an empty core.ModelsResponse (not falling back to the error/fallback parser). Apply the same fix to the second parsing location that currently checks the native payload (the other block handling geminiModelsResponse around the alternate parse) so both code paths return an empty core.ModelsResponse when Models is empty instead of producing a provider error.
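A minimal standalone sketch of the pointer-slice idea behind the proposed fix: a *[]T field lets the decoder distinguish a missing "models" key from an explicit empty list. The geminiModelsResponse here is a simplified stand-in and isNativeModels is a hypothetical helper, not code from the repository.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// A pointer to a slice stays nil when the key is absent, but becomes a
// non-nil pointer to an empty slice for {"models":[]}.
type geminiModelsResponse struct {
	Models *[]struct {
		Name string `json:"name"`
	} `json:"models"`
}

// isNativeModels reports whether the payload is a native models response,
// treating {"models":[]} as a valid (empty) result rather than an error.
func isNativeModels(body []byte) (bool, int) {
	var resp geminiModelsResponse
	if err := json.Unmarshal(body, &resp); err != nil || resp.Models == nil {
		return false, 0
	}
	return true, len(*resp.Models)
}

func main() {
	ok, n := isNativeModels([]byte(`{"models":[]}`))
	fmt.Println(ok, n) // empty list is still a native response
	ok, _ = isNativeModels([]byte(`{}`))
	fmt.Println(ok) // absent key falls through to fallback parsing
}
```

With the plain []geminiModel field, both payloads decode to a zero-length slice, which is why the len(...) > 0 guard cannot tell them apart.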
♻️ Duplicate comments (3)
tests/contract/testdata/golden/openai/chat_completion_reasoning.golden.json (1)
25-31: ⚠️ Potential issue | 🟡 Minor

prompt_tokens_details is still inconsistent with prompt_tokens.

This is still mismatched (prompt_tokens: 13 vs text_tokens: 0) and should be aligned for reliable token-accounting assertions.

Suggested fix

 "prompt_tokens": 13,
 "prompt_tokens_details": {
   "audio_tokens": 0,
   "cached_tokens": 0,
   "image_tokens": 0,
-  "text_tokens": 0
+  "text_tokens": 13
 },

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/contract/testdata/golden/openai/chat_completion_reasoning.golden.json` around lines 25 - 31, The JSON shows prompt_tokens (13) not matching the sum in prompt_tokens_details (text_tokens: 0); update the prompt_tokens_details so its fields reflect the actual token count, e.g., set prompt_tokens_details.text_tokens to 13 (or adjust prompt_tokens to match the sum of audio_tokens + cached_tokens + image_tokens + text_tokens) so that "prompt_tokens" and "prompt_tokens_details" are consistent.

tests/contract/openai_test.go (1)
136-143: ⚠️ Potential issue | 🟡 Minor

Add explicit stream termination assertion in TestOpenAIReplayStreamResponses.

At Line 136, event checks are present, but the test does not verify the final [DONE] sentinel.

Suggested fix

 raw := readAllStream(t, stream)
 events := parseResponsesStream(t, raw)
 require.NotEmpty(t, events)
 require.True(t, hasResponsesEvent(events, "response.created"))
 require.True(t, hasResponsesEvent(events, "response.output_text.delta"))
 require.True(t, hasResponsesEvent(events, "response.completed"))
 require.NotEmpty(t, extractResponsesStreamText(events))
+require.Contains(t, raw, "[DONE]", "responses stream should terminate with [DONE]")
 compareGoldenJSON(t, "openai/responses_stream.golden.json", map[string]any{

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/contract/openai_test.go` around lines 136 - 143, The test TestOpenAIReplayStreamResponses currently checks parsed events but doesn't assert that the stream ended with the final "[DONE]" sentinel; update the test (after calling readAllStream and parseResponsesStream and before finishing assertions) to explicitly verify stream termination by asserting that the raw stream string ends with the "[DONE]" sentinel or that events include a final sentinel event (use the existing raw variable from readAllStream(t, stream) or events from parseResponsesStream(t, raw) and add a require.True or require.Equal assertion to confirm the final "[DONE]" marker).

tests/contract/xai_test.go (1)
127-132: ⚠️ Potential issue | 🟡 Minor

Assert terminal [DONE] marker in stream responses test.

At Line 127, the test checks parsed events but not stream termination; truncated SSE output could still satisfy current assertions.

Suggested fix

 raw := readAllStream(t, stream)
 events := parseResponsesStream(t, raw)
 require.True(t, hasResponsesEvent(events, "response.created"))
 require.True(t, hasResponsesEvent(events, "response.output_text.delta"))
 require.True(t, hasResponsesEvent(events, "response.completed"))
+require.Contains(t, raw, "[DONE]", "responses stream should terminate with [DONE]")
 compareGoldenJSON(t, goldenPathForFixture("xai/responses_stream.txt"), map[string]any{

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/contract/xai_test.go` around lines 127 - 132, Test currently verifies response events but not that the SSE stream contained the terminal "[DONE]" marker; update the test around readAllStream/parseResponsesStream (functions readAllStream, parseResponsesStream, hasResponsesEvent and variables raw/events) to assert the presence of the terminal marker by checking the raw stream for the "[DONE]" sentinel (e.g., require that raw contains the "[DONE]" line or that the final SSE message equals "[DONE]") so truncated output cannot falsely pass.
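A standalone sketch of the sentinel check these comments ask for, assuming OpenAI-style SSE framing where the stream ends with a final "data: [DONE]" line. The hasDoneSentinel helper is hypothetical; the tests themselves use require.Contains on the raw stream.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasDoneSentinel scans an SSE stream line by line and reports whether a
// terminal "[DONE]" data line is present, so truncated streams fail fast.
func hasDoneSentinel(raw string) bool {
	sc := bufio.NewScanner(strings.NewReader(raw))
	for sc.Scan() {
		line := strings.TrimSpace(strings.TrimPrefix(sc.Text(), "data:"))
		if line == "[DONE]" {
			return true
		}
	}
	return false
}

func main() {
	complete := "data: {\"type\":\"response.completed\"}\n\ndata: [DONE]\n"
	truncated := "data: {\"type\":\"response.created\"}\n"
	fmt.Println(hasDoneSentinel(complete))
	fmt.Println(hasDoneSentinel(truncated))
}
```

Event-level assertions alone cannot catch a stream cut off after response.completed but before the sentinel, which is why the review asks for the extra check.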
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/contract/testdata/golden/gemini/chat_with_tools.golden.json`:
- Line 23: Replace the hard-coded ID value "4zpyaaPLGYXlxN8Pq7jUkQs" under the
"id" key with the normalized placeholder pattern used in other golden fixtures
(e.g., "msg_\u003cgenerated\u003e") so the fixture matches the standardized ID
format; locate the JSON object containing the "id" property and update its value
to the placeholder string to ensure consistency across golden files.
In `@tests/contract/testdata/golden/gemini/responses_stream.golden.json`:
- Around line 26-39: The Gemini streaming golden fixture's response.completed
event is missing the usage object; update the "response.completed"
Payload.response to include a "usage" field matching other providers and Gemini
non-streaming fixtures with keys input_tokens, output_tokens, and total_tokens
(e.g., usage: { "input_tokens": 0, "output_tokens": 0, "total_tokens": 0 }),
ensuring the "response.completed" event in responses_stream.golden.json contains
this usage object alongside created_at, id, model, object, provider, and status.
In `@tests/contract/testdata/golden/groq/models.golden.json`:
- Around line 2-123: The golden fixture models.golden.json is sensitive to
upstream ordering; to prevent non-semantic churn, sort the model list by the
"id" field before writing the fixture (i.e., perform a stable sort on each
entry's id property) wherever the model-list is serialized for tests (the code
that emits the model-list golden/testdata). Ensure the serializer or test
fixture writer sorts entries by entry["id"] (or model.id) deterministically
prior to marshaling so the resulting models.golden.json is consistently ordered.
In `@tests/contract/testdata/golden/openai/chat_completion.golden.json`:
- Around line 25-31: The golden test fixture has inconsistent token accounting:
"prompt_tokens" is 17 while all entries in "prompt_tokens_details" are 0; update
the fixture so the detail fields sum to the total (for a text-only prompt set
"text_tokens": 17 or otherwise distribute the 17 across
"audio_tokens","cached_tokens","image_tokens","text_tokens")—look for the keys
"prompt_tokens", "prompt_tokens_details" and "text_tokens" in the JSON and make
the values consistent.
In `@tests/contract/testdata/golden/openai/chat_multimodal.golden.json`:
- Line 13: Replace the concrete volatile completion id value in the golden
fixture: the "id" field currently set to
"chatcmpl-D0qUlBEghKIASKSVRE4fnjs5QUDGF" should be normalized to a stable
placeholder (e.g., "chatcmpl-000000000000000" or "chatcmpl-<SANITIZED_ID>") so
regenerated fixtures don't cause churn; update the "id" key in
chat_multimodal.golden.json accordingly.
---
Outside diff comments:
In `@internal/providers/gemini/gemini.go`:
- Around line 156-158: The Gemini parser treats a native response with an empty
models slice as an error; update the code paths that decode geminiModelsResponse
(the struct type named geminiModelsResponse and its Models []geminiModel field)
to explicitly handle len(resp.Models)==0 by returning an empty
core.ModelsResponse (not falling back to the error/fallback parser). Apply the
same fix to the second parsing location that currently checks the native payload
(the other block handling geminiModelsResponse around the alternate parse) so
both code paths return an empty core.ModelsResponse when Models is empty instead
of producing a provider error.
---
Duplicate comments:
In `@tests/contract/openai_test.go`:
- Around line 136-143: The test TestOpenAIReplayStreamResponses currently checks
parsed events but doesn't assert that the stream ended with the final "[DONE]"
sentinel; update the test (after calling readAllStream and parseResponsesStream
and before finishing assertions) to explicitly verify stream termination by
asserting that the raw stream string ends with the "[DONE]" sentinel or that
events include a final sentinel event (use the existing raw variable from
readAllStream(t, stream) or events from parseResponsesStream(t, raw) and add a
require.True or require.Equal assertion to confirm the final "[DONE]" marker).
In `@tests/contract/testdata/golden/openai/chat_completion_reasoning.golden.json`:
- Around line 25-31: The JSON shows prompt_tokens (13) not matching the sum in
prompt_tokens_details (text_tokens: 0); update the prompt_tokens_details so its
fields reflect the actual token count—e.g., set
prompt_tokens_details.text_tokens to 13 (or adjust prompt_tokens to match the
sum of audio_tokens + cached_tokens + image_tokens + text_tokens) so that
"prompt_tokens" and "prompt_tokens_details" are consistent.
In `@tests/contract/xai_test.go`:
- Around line 127-132: Test currently verifies response events but not that the
SSE stream contained the terminal "[DONE]" marker; update the test around
readAllStream/parseResponsesStream (functions readAllStream,
parseResponsesStream, hasResponsesEvent and variables raw/events) to assert the
presence of the terminal marker by checking the raw stream for the "[DONE]"
sentinel (e.g., require that raw contains the "[DONE]" line or that the final
SSE message equals "[DONE]") so truncated output cannot falsely pass.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (59)
.github/workflows/test.yml
cmd/recordapi/main.go
docs/TESTING_STRATEGY.md
internal/core/types.go
internal/providers/anthropic/anthropic.go
internal/providers/gemini/gemini.go
tests/contract/README.md
tests/contract/anthropic_test.go
tests/contract/gemini_test.go
tests/contract/golden_helpers_test.go
tests/contract/groq_test.go
tests/contract/openai_test.go
tests/contract/replay_harness_test.go
tests/contract/replay_provider_helpers_test.go
tests/contract/testdata/golden/anthropic/messages.golden.json
tests/contract/testdata/golden/anthropic/messages_extended_thinking.golden.json
tests/contract/testdata/golden/anthropic/messages_multi_turn.golden.json
tests/contract/testdata/golden/anthropic/messages_multimodal.golden.json
tests/contract/testdata/golden/anthropic/messages_stream.golden.json
tests/contract/testdata/golden/anthropic/messages_with_params.golden.json
tests/contract/testdata/golden/anthropic/messages_with_tools.golden.json
tests/contract/testdata/golden/anthropic/responses.golden.json
tests/contract/testdata/golden/anthropic/responses_stream.golden.json
tests/contract/testdata/golden/gemini/chat_completion.golden.json
tests/contract/testdata/golden/gemini/chat_completion_stream.golden.json
tests/contract/testdata/golden/gemini/chat_with_params.golden.json
tests/contract/testdata/golden/gemini/chat_with_tools.golden.json
tests/contract/testdata/golden/gemini/models.golden.json
tests/contract/testdata/golden/gemini/responses.golden.json
tests/contract/testdata/golden/gemini/responses_stream.golden.json
tests/contract/testdata/golden/groq/chat_completion.golden.json
tests/contract/testdata/golden/groq/chat_completion_stream.golden.json
tests/contract/testdata/golden/groq/chat_with_params.golden.json
tests/contract/testdata/golden/groq/chat_with_tools.golden.json
tests/contract/testdata/golden/groq/models.golden.json
tests/contract/testdata/golden/groq/responses.golden.json
tests/contract/testdata/golden/groq/responses_stream.golden.json
tests/contract/testdata/golden/openai/chat_completion.golden.json
tests/contract/testdata/golden/openai/chat_completion_reasoning.golden.json
tests/contract/testdata/golden/openai/chat_completion_stream.golden.json
tests/contract/testdata/golden/openai/chat_json_mode.golden.json
tests/contract/testdata/golden/openai/chat_multi_turn.golden.json
tests/contract/testdata/golden/openai/chat_multimodal.golden.json
tests/contract/testdata/golden/openai/chat_with_params.golden.json
tests/contract/testdata/golden/openai/chat_with_tools.golden.json
tests/contract/testdata/golden/openai/models.golden.json
tests/contract/testdata/golden/openai/responses.golden.json
tests/contract/testdata/golden/openai/responses_stream.golden.json
tests/contract/testdata/golden/xai/chat_completion.golden.json
tests/contract/testdata/golden/xai/chat_completion_stream.golden.json
tests/contract/testdata/golden/xai/chat_with_params.golden.json
tests/contract/testdata/golden/xai/models.golden.json
tests/contract/testdata/golden/xai/responses.golden.json
tests/contract/testdata/golden/xai/responses_stream.golden.json
tests/contract/testdata/openai/responses.json
tests/contract/testdata/openai/responses_stream.txt
tests/contract/testdata/xai/responses.json
tests/contract/testdata/xai/responses_stream.txt
tests/contract/xai_test.go
    }
  ],
  "created": 0,
  "id": "4zpyaaPLGYXlxN8Pq7jUkQs",
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider normalizing the ID for consistency with other golden fixtures.
Other golden files use placeholder patterns like "msg_\u003cgenerated\u003e" for IDs, but this file uses a raw ID value. This inconsistency may cause test failures if the fixture is re-recorded with a different ID.
♻️ Suggested change
- "id": "4zpyaaPLGYXlxN8Pq7jUkQs",
+ "id": "chatcmpl-\u003cgenerated\u003e",

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
- "id": "4zpyaaPLGYXlxN8Pq7jUkQs",
+ "id": "chatcmpl-\u003cgenerated\u003e",
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/testdata/golden/gemini/chat_with_tools.golden.json` at line
23, Replace the hard-coded ID value "4zpyaaPLGYXlxN8Pq7jUkQs" under the "id" key
with the normalized placeholder pattern used in other golden fixtures (e.g.,
"msg_\u003cgenerated\u003e") so the fixture matches the standardized ID format;
locate the JSON object containing the "id" property and update its value to the
placeholder string to ensure consistency across golden files.
{
  "Done": false,
  "Name": "response.completed",
  "Payload": {
    "response": {
      "created_at": 0,
      "id": "resp_\u003cgenerated\u003e",
      "model": "gemini-2.5-flash",
      "object": "response",
      "provider": "gemini",
      "status": "completed"
    },
    "type": "response.completed"
  }
🧩 Analysis chain
🏁 Script executed:
# Check for usage field in Gemini golden fixtures
rg -n "usage" tests/contract/testdata/golden/gemini/ --type json
Length of output: 384
🏁 Script executed:
# Also check the Anthropic streaming fixture to see the difference
rg -B5 -A5 "response.completed" tests/contract/testdata/golden/anthropic/ --type json
Length of output: 2102
🏁 Script executed:
# Get a broader view of the Gemini responses_stream.golden.json file
wc -l tests/contract/testdata/golden/gemini/responses_stream.golden.json
Length of output: 131
🏁 Script executed:
# Read the complete Gemini streaming fixture
cat tests/contract/testdata/golden/gemini/responses_stream.golden.json
Length of output: 1110
🏁 Script executed:
# Check all streaming golden fixtures for usage field
find tests/contract/testdata/golden -name "*stream*.json" -exec echo "=== {} ===" \; -exec rg "usage" {} \;
Length of output: 1067
🏁 Script executed:
# Search for test setup/expectations for Gemini streaming
rg -n "responses_stream" tests/contract/ --type go --type json -A 2 -B 2
Length of output: 3444
🏁 Script executed:
# Check if there's implementation code specific to Gemini streaming
rg -n "gemini.*stream" . --type go -i
Length of output: 774
Add missing usage field to response.completed event.
The Gemini streaming fixture omits usage (input_tokens, output_tokens, total_tokens) in the response.completed event, unlike all other providers (Anthropic, Groq, OpenAI, xAI) which include it even in streaming responses. Additionally, Gemini's own non-streaming fixtures (chat_completion.golden.json, responses.golden.json, etc.) include usage. The missing field should be added to maintain consistency with both Gemini non-streaming behavior and other providers' streaming patterns.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/testdata/golden/gemini/responses_stream.golden.json` around
lines 26 - 39, The Gemini streaming golden fixture's response.completed event is
missing the usage object; update the "response.completed" Payload.response to
include a "usage" field matching other providers and Gemini non-streaming
fixtures with keys input_tokens, output_tokens, and total_tokens (e.g., usage: {
"input_tokens": 0, "output_tokens": 0, "total_tokens": 0 }), ensuring the
"response.completed" event in responses_stream.golden.json contains this usage
object alongside created_at, id, model, object, provider, and status.
"data": [
  {
    "created": 0,
    "id": "canopylabs/orpheus-v1-english",
    "object": "model",
    "owned_by": "Canopy Labs"
  },
  {
    "created": 0,
    "id": "whisper-large-v3",
    "object": "model",
    "owned_by": "OpenAI"
  },
  {
    "created": 0,
    "id": "moonshotai/kimi-k2-instruct",
    "object": "model",
    "owned_by": "Moonshot AI"
  },
  {
    "created": 0,
    "id": "canopylabs/orpheus-arabic-saudi",
    "object": "model",
    "owned_by": "Canopy Labs"
  },
  {
    "created": 0,
    "id": "moonshotai/kimi-k2-instruct-0905",
    "object": "model",
    "owned_by": "Moonshot AI"
  },
  {
    "created": 0,
    "id": "meta-llama/llama-4-maverick-17b-128e-instruct",
    "object": "model",
    "owned_by": "Meta"
  },
  {
    "created": 0,
    "id": "openai/gpt-oss-20b",
    "object": "model",
    "owned_by": "OpenAI"
  },
  {
    "created": 0,
    "id": "meta-llama/llama-guard-4-12b",
    "object": "model",
    "owned_by": "Meta"
  },
  {
    "created": 0,
    "id": "llama-3.1-8b-instant",
    "object": "model",
    "owned_by": "Meta"
  },
  {
    "created": 0,
    "id": "meta-llama/llama-4-scout-17b-16e-instruct",
    "object": "model",
    "owned_by": "Meta"
  },
  {
    "created": 0,
    "id": "allam-2-7b",
    "object": "model",
    "owned_by": "SDAIA"
  },
  {
    "created": 0,
    "id": "groq/compound-mini",
    "object": "model",
    "owned_by": "Groq"
  },
  {
    "created": 0,
    "id": "whisper-large-v3-turbo",
    "object": "model",
    "owned_by": "OpenAI"
  },
  {
    "created": 0,
    "id": "qwen/qwen3-32b",
    "object": "model",
    "owned_by": "Alibaba Cloud"
  },
  {
    "created": 0,
    "id": "meta-llama/llama-prompt-guard-2-22m",
    "object": "model",
    "owned_by": "Meta"
  },
  {
    "created": 0,
    "id": "openai/gpt-oss-120b",
    "object": "model",
    "owned_by": "OpenAI"
  },
  {
    "created": 0,
    "id": "groq/compound",
    "object": "model",
    "owned_by": "Groq"
  },
  {
    "created": 0,
    "id": "meta-llama/llama-prompt-guard-2-86m",
    "object": "model",
    "owned_by": "Meta"
  },
  {
    "created": 0,
    "id": "openai/gpt-oss-safeguard-20b",
    "object": "model",
    "owned_by": "OpenAI"
  },
  {
    "created": 0,
    "id": "llama-3.3-70b-versatile",
    "object": "model",
    "owned_by": "Meta"
  }
],
🧹 Nitpick | 🔵 Trivial
Consider canonical ordering for model-list goldens.
If upstream model order is not guaranteed, sorting entries by id before writing this fixture will reduce non-semantic golden churn on re-record.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/testdata/golden/groq/models.golden.json` around lines 2 - 123,
The golden fixture models.golden.json is sensitive to upstream ordering; to
prevent non-semantic churn, sort the model list by the "id" field before writing
the fixture (i.e., perform a stable sort on each entry's id property) wherever
the model-list is serialized for tests (the code that emits the model-list
golden/testdata). Ensure the serializer or test fixture writer sorts entries by
entry["id"] (or model.id) deterministically prior to marshaling so the resulting
models.golden.json is consistently ordered.
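If the fixture writer adopted this ordering, a minimal sketch could look like the following; the `Model` struct and `sortModels` helper are illustrative names under that assumption, not code from the repo.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Model mirrors the golden-fixture entry shape; the struct itself is a
// stand-in for whatever type the suite actually serializes.
type Model struct {
	Created int64  `json:"created"`
	ID      string `json:"id"`
	Object  string `json:"object"`
	OwnedBy string `json:"owned_by"`
}

// sortModels orders entries by ID with a stable sort so re-recorded
// goldens stay byte-identical even when the upstream API shuffles order.
func sortModels(models []Model) {
	sort.SliceStable(models, func(i, j int) bool {
		return models[i].ID < models[j].ID
	})
}

func main() {
	models := []Model{
		{ID: "whisper-large-v3", Object: "model", OwnedBy: "OpenAI"},
		{ID: "allam-2-7b", Object: "model", OwnedBy: "SDAIA"},
		{ID: "groq/compound", Object: "model", OwnedBy: "Groq"},
	}
	sortModels(models)
	out, _ := json.MarshalIndent(models, "", "  ")
	fmt.Println(string(out)) // "allam-2-7b" first, "whisper-large-v3" last
}
```

Sorting before marshaling means a re-record only diffs when the model set actually changes, not when the upstream listing order does.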
| "prompt_tokens": 17, | ||
| "prompt_tokens_details": { | ||
| "audio_tokens": 0, | ||
| "cached_tokens": 0, | ||
| "image_tokens": 0, | ||
| "text_tokens": 0 | ||
| }, |
Fix inconsistent prompt_tokens_details accounting.
prompt_tokens is 17, but all detail fields are 0. For a text-only prompt, text_tokens should reflect the prompt total.
Suggested fix
"prompt_tokens": 17,
"prompt_tokens_details": {
"audio_tokens": 0,
"cached_tokens": 0,
"image_tokens": 0,
- "text_tokens": 0
+ "text_tokens": 17
},📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "prompt_tokens": 17, | |
| "prompt_tokens_details": { | |
| "audio_tokens": 0, | |
| "cached_tokens": 0, | |
| "image_tokens": 0, | |
| "text_tokens": 0 | |
| }, | |
| "prompt_tokens": 17, | |
| "prompt_tokens_details": { | |
| "audio_tokens": 0, | |
| "cached_tokens": 0, | |
| "image_tokens": 0, | |
| "text_tokens": 17 | |
| }, |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/testdata/golden/openai/chat_completion.golden.json` around
lines 25 - 31, The golden test fixture has inconsistent token accounting:
"prompt_tokens" is 17 while all entries in "prompt_tokens_details" are 0; update
the fixture so the detail fields sum to the total (for a text-only prompt set
"text_tokens": 17 or otherwise distribute the 17 across
"audio_tokens","cached_tokens","image_tokens","text_tokens")—look for the keys
"prompt_tokens", "prompt_tokens_details" and "text_tokens" in the JSON and make
the values consistent.
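A fixture-side sanity check along these lines would catch the mismatch automatically; the field names follow the golden JSON, but the reconcile rule itself is an assumption about how the suite could validate totals (cached_tokens is treated as an overlap of the other buckets, not an addend).

```go
package main

import "fmt"

// PromptTokensDetails mirrors the prompt_tokens_details object from the
// golden fixtures.
type PromptTokensDetails struct {
	AudioTokens  int `json:"audio_tokens"`
	CachedTokens int `json:"cached_tokens"`
	ImageTokens  int `json:"image_tokens"`
	TextTokens   int `json:"text_tokens"`
}

// detailsReconcile reports whether the per-modality counts sum to the
// parent prompt_tokens total. cached_tokens is excluded because cached
// tokens are a subset of the other categories, not an extra bucket.
func detailsReconcile(promptTokens int, d PromptTokensDetails) bool {
	return d.AudioTokens+d.ImageTokens+d.TextTokens == promptTokens
}

func main() {
	broken := PromptTokensDetails{}              // text_tokens: 0, as recorded
	fixed := PromptTokensDetails{TextTokens: 17} // text_tokens carries the total
	fmt.Println(detailsReconcile(17, broken))    // false
	fmt.Println(detailsReconcile(17, fixed))     // true
}
```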
| } | ||
| ], | ||
| "created": 0, | ||
| "id": "chatcmpl-D0qUlBEghKIASKSVRE4fnjs5QUDGF", |
Sanitize volatile completion IDs in the golden fixture.
Line 13 uses a concrete chatcmpl-* id, which can create avoidable golden churn when fixtures are regenerated.
Suggested normalization
- "id": "chatcmpl-D0qUlBEghKIASKSVRE4fnjs5QUDGF",
+ "id": "chatcmpl_<generated>",📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| "id": "chatcmpl-D0qUlBEghKIASKSVRE4fnjs5QUDGF", | |
| "id": "chatcmpl_<generated>", |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/testdata/golden/openai/chat_multimodal.golden.json` at line
13, Replace the concrete volatile completion id value in the golden fixture: the
"id" field currently set to "chatcmpl-D0qUlBEghKIASKSVRE4fnjs5QUDGF" should be
normalized to a stable placeholder (e.g., "chatcmpl-000000000000000" or
"chatcmpl-<SANITIZED_ID>") so regenerated fixtures don't cause churn; update the
"id" key in chat_multimodal.golden.json accordingly.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
docs/TESTING_STRATEGY.md (2)
160-167: ⚠️ Potential issue | 🟠 Major
Use the standardized golden-recording command (`make record-api`) in this strategy doc.
This section currently promotes direct `recordapi` usage, but team guidance standardizes recording via `make record-api`.
🔧 Suggested doc fix
-Use the `recordapi` tool or manual curl commands:
+Use `make record-api` (preferred) or manual curl commands:
```bash
-# Using recordapi
-go run ./cmd/recordapi -provider=openai -endpoint=chat -output=tests/contract/testdata/openai/chat_completion.json
+# Preferred: standardized recording flow
+make record-api
```
Based on learnings: Contract tests must use golden file validation and include the `-tags=contract` build tag with a 5-minute timeout. Record new golden files using `make record-api`.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/TESTING_STRATEGY.md` around lines 160 - 167, Replace the direct recordapi invocation under the "Recording New Golden Files" section with the standardized make target; update the example that currently shows `go run ./cmd/recordapi -provider=openai -endpoint=chat -output=...` to instruct users to run `make record-api` instead and add a short note reminding contract tests to use golden-file validation with the `-tags=contract` build tag and a 5-minute timeout; target the "Recording New Golden Files" header and the example command lines in that section for this change.
104-114: ⚠️ Potential issue | 🟠 Major
Add the required 5-minute timeout to contract test commands.
The contract commands in this section omit the required timeout flag, which diverges from the project's contract test standard.
🔧 Suggested doc fix
# Run contract tests
-go test -v -tags=contract ./tests/contract/...
+go test -v -tags=contract -timeout=5m ./tests/contract/...
# Run specific provider tests
-go test -v -tags=contract ./tests/contract/... -run TestOpenAI
+go test -v -tags=contract -timeout=5m ./tests/contract/... -run TestOpenAI
Based on learnings: Contract tests must use golden file validation and include the `-tags=contract` build tag with a 5-minute timeout.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/TESTING_STRATEGY.md` around lines 104 - 114, Update the contract test command examples in TESTING_STRATEGY.md to include the required 5-minute timeout flag; specifically modify the two command lines that run contract tests (the generic run and the provider-specific run) to add the Go test flag `-timeout 5m` alongside the existing `-v -tags=contract` so they match the project’s contract test standard.
♻️ Duplicate comments (3)
tests/contract/testdata/golden/openai/chat_completion_reasoning.golden.json (1)
25-30: ⚠️ Potential issue | 🟡 Minor
Fix inconsistent prompt token breakdown.
`prompt_tokens` is 13, but on line 30 `text_tokens` is 0. The detail fields should reconcile with the parent total for deterministic contract assertions.
🔧 Proposed fix
"prompt_tokens": 13,
"prompt_tokens_details": {
  "audio_tokens": 0,
  "cached_tokens": 0,
  "image_tokens": 0,
- "text_tokens": 0
+ "text_tokens": 13
},
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/contract/testdata/golden/openai/chat_completion_reasoning.golden.json` around lines 25 - 30, The prompt token breakdown is inconsistent: prompt_tokens is 13 but prompt_tokens_details fields sum to 0; update prompt_tokens_details (e.g., text_tokens, audio_tokens, image_tokens, cached_tokens) so their sum equals the parent prompt_tokens (13). Locate the JSON object with keys "prompt_tokens" and "prompt_tokens_details" in the golden file and set the appropriate detail values (most likely set "text_tokens" to 13 or distribute across the detail keys) so the totals reconcile.
.github/workflows/test.yml (1)
88-102: ⚠️ Potential issue | 🟡 Minor
Please ship the contract README update with this migration.
The new replay CI path changes contributor workflow, so `tests/contract/README.md` should be updated in the same PR with the replay run/validation instructions.
Based on learnings: Applies to tests/contract/README.md : Update `tests/contract/README.md` when modifying contract testing procedures or golden file validation processes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/test.yml around lines 88 - 102, The CI now adds a "test-contract" job that runs "go test -v -tags=contract -timeout=5m ./tests/contract/..." (step name "Run contract replay tests"), so update tests/contract/README.md in the same PR to include the new replay CI path and explicit contributor instructions for running the replay locally and validating golden files (commands, flags, expected outputs, and how to accept/update golden files); ensure the README references the "test-contract" job and the exact go test invocation used by CI so contributors can reproduce the CI run and validation steps.
tests/contract/testdata/golden/gemini/responses_stream.golden.json (1)
30-37: ⚠️ Potential issue | 🟠 Major
Add `usage` to the `response.completed` payload.
`response.completed` should include usage tokens to keep the streaming golden shape consistent with other providers and non-stream responses.
💡 Suggested patch
"Payload": {
  "response": {
    "created_at": 0,
    "id": "resp_\u003cgenerated\u003e",
    "model": "gemini-2.5-flash",
    "object": "response",
    "provider": "gemini",
-   "status": "completed"
+   "status": "completed",
+   "usage": {
+     "input_tokens": 0,
+     "output_tokens": 0,
+     "total_tokens": 0
+   }
  },
  "type": "response.completed"
}
🤖 Prompt for AI Agents
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@cmd/recordapi/main.go`:
- Around line 95-111: Add documentation for the new responses and
responses_stream endpoints to tests/contract/README.md: describe the
/v1/responses endpoint, the two example request bodies shown (keys "model",
"input" and the "stream": true option for responses_stream), and list which
providers support this capability (openai and xai); update the "endpoint
options" section to include the streaming flag and model choice and note any
provider-specific behavior for recording so contract tests and golden-file
validation reference the new endpoints.
In `@internal/providers/anthropic/anthropic.go`:
- Around line 653-656: The code assigns raw b.Input (possibly with varying
whitespace) directly to the local variable arguments which later becomes
Function.Arguments; to avoid golden-file churn, parse b.Input as JSON into an
interface{} and re-marshal it with encoding/json to produce canonicalized JSON
(e.g., json.Unmarshal -> json.Marshal) before assigning to
arguments/Function.Arguments, and if parsing fails fall back to "{}" or the
original string while logging or handling the error appropriately; update the
block around the variables arguments and b.Input in anthropic.go to perform this
canonicalization.
In `@internal/providers/gemini/gemini.go`:
- Around line 205-244: The current logic in the Gemini provider rejects valid
responses like {"models":[]} because the if condition checks both json.Unmarshal
success and len(geminiResp.Models) > 0; update the branch handling
json.Unmarshal into geminiResp so it treats an empty geminiResp.Models slice as
a valid response: remove the len(geminiResp.Models) > 0 requirement (or
otherwise allow zero-length slices), keep the existing loop that builds models
(which will produce an empty models slice), and return a core.ModelsResponse
with Object:"list" and Data:models; adjust code around geminiResp,
ModelsResponse, and the supportsGenerate/supportsEmbed checks accordingly.
In `@tests/contract/openai_test.go`:
- Line 3: Update the contract tests README to reflect the new replay-harness
workflow and changed fixture/golden flow: in the contract tests README (the
README.md next to the contract tests) describe the new execution steps
(including running contract tests with the -tags=contract -timeout=5m flags as
used by openai_test.go), explain how fixtures and golden files are produced and
validated under the replay-harness (how to run replays, how to regenerate
goldens), and update CI/PR guidance (what to check, when to commit updated
golden files and how to run the local validation commands). Ensure the README
clearly lists commands, expected outputs, and the exact developer workflow for
replay vs live runs so contributors can reproduce and update goldens correctly.
In `@tests/contract/testdata/golden/openai/chat_with_params.golden.json`:
- Around line 1-34: Update the contract testing README to reflect the new
replay-harness workflow: add concise, ordered steps for recording new
interactions, replaying tests, and validating golden files (including the
command-line invocations, required env vars, and CI expectations), document how
to regenerate/update golden files such as chat_with_params.golden.json and where
to place them, and include troubleshooting notes about common failures and the
expected validation output so contributors follow the new process consistently.
---
Outside diff comments:
In `@docs/TESTING_STRATEGY.md`:
- Around line 160-167: Replace the direct recordapi invocation under the
"Recording New Golden Files" section with the standardized make target; update
the example that currently shows `go run ./cmd/recordapi -provider=openai
-endpoint=chat -output=...` to instruct users to run `make record-api` instead
and add a short note reminding contract tests to use golden-file validation with
the `-tags=contract` build tag and a 5-minute timeout; target the "Recording New
Golden Files" header and the example command lines in that section for this
change.
- Around line 104-114: Update the contract test command examples in
TESTING_STRATEGY.md to include the required 5-minute timeout flag; specifically
modify the two command lines that run contract tests (the generic run and the
provider-specific run) to add the Go test flag `-timeout 5m` alongside the
existing `-v -tags=contract` so they match the project’s contract test standard.
---
Duplicate comments:
In @.github/workflows/test.yml:
- Around line 88-102: The CI now adds a "test-contract" job that runs "go test
-v -tags=contract -timeout=5m ./tests/contract/..." (step name "Run contract
replay tests"), so update tests/contract/README.md in the same PR to include the
new replay CI path and explicit contributor instructions for running the replay
locally and validating golden files (commands, flags, expected outputs, and how
to accept/update golden files); ensure the README references the "test-contract"
job and the exact go test invocation used by CI so contributors can reproduce
the CI run and validation steps.
In `@tests/contract/testdata/golden/gemini/responses_stream.golden.json`:
- Around line 30-37: The response.completed payload is missing the usage field;
update the "response" object (the completed case with "status": "completed" and
id "resp_<generated>") to include a "usage" key matching the other
providers/non-stream shapes (e.g., token counts like total_tokens and
per-category fields used in other golden files) so the streaming golden shape is
consistent.
In `@tests/contract/testdata/golden/openai/chat_completion_reasoning.golden.json`:
- Around line 25-30: The prompt token breakdown is inconsistent: prompt_tokens
is 13 but prompt_tokens_details fields sum to 0; update prompt_tokens_details
(e.g., text_tokens, audio_tokens, image_tokens, cached_tokens) so their sum
equals the parent prompt_tokens (13). Locate the JSON object with keys
"prompt_tokens" and "prompt_tokens_details" in the golden file and set the
appropriate detail values (most likely set "text_tokens" to 13 or distribute
across the detail keys) so the totals reconcile.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (59)
.github/workflows/test.yml
cmd/recordapi/main.go
docs/TESTING_STRATEGY.md
internal/core/types.go
internal/providers/anthropic/anthropic.go
internal/providers/gemini/gemini.go
tests/contract/README.md
tests/contract/anthropic_test.go
tests/contract/gemini_test.go
tests/contract/golden_helpers_test.go
tests/contract/groq_test.go
tests/contract/openai_test.go
tests/contract/replay_harness_test.go
tests/contract/replay_provider_helpers_test.go
tests/contract/testdata/golden/anthropic/messages.golden.json
tests/contract/testdata/golden/anthropic/messages_extended_thinking.golden.json
tests/contract/testdata/golden/anthropic/messages_multi_turn.golden.json
tests/contract/testdata/golden/anthropic/messages_multimodal.golden.json
tests/contract/testdata/golden/anthropic/messages_stream.golden.json
tests/contract/testdata/golden/anthropic/messages_with_params.golden.json
tests/contract/testdata/golden/anthropic/messages_with_tools.golden.json
tests/contract/testdata/golden/anthropic/responses.golden.json
tests/contract/testdata/golden/anthropic/responses_stream.golden.json
tests/contract/testdata/golden/gemini/chat_completion.golden.json
tests/contract/testdata/golden/gemini/chat_completion_stream.golden.json
tests/contract/testdata/golden/gemini/chat_with_params.golden.json
tests/contract/testdata/golden/gemini/chat_with_tools.golden.json
tests/contract/testdata/golden/gemini/models.golden.json
tests/contract/testdata/golden/gemini/responses.golden.json
tests/contract/testdata/golden/gemini/responses_stream.golden.json
tests/contract/testdata/golden/groq/chat_completion.golden.json
tests/contract/testdata/golden/groq/chat_completion_stream.golden.json
tests/contract/testdata/golden/groq/chat_with_params.golden.json
tests/contract/testdata/golden/groq/chat_with_tools.golden.json
tests/contract/testdata/golden/groq/models.golden.json
tests/contract/testdata/golden/groq/responses.golden.json
tests/contract/testdata/golden/groq/responses_stream.golden.json
tests/contract/testdata/golden/openai/chat_completion.golden.json
tests/contract/testdata/golden/openai/chat_completion_reasoning.golden.json
tests/contract/testdata/golden/openai/chat_completion_stream.golden.json
tests/contract/testdata/golden/openai/chat_json_mode.golden.json
tests/contract/testdata/golden/openai/chat_multi_turn.golden.json
tests/contract/testdata/golden/openai/chat_multimodal.golden.json
tests/contract/testdata/golden/openai/chat_with_params.golden.json
tests/contract/testdata/golden/openai/chat_with_tools.golden.json
tests/contract/testdata/golden/openai/models.golden.json
tests/contract/testdata/golden/openai/responses.golden.json
tests/contract/testdata/golden/openai/responses_stream.golden.json
tests/contract/testdata/golden/xai/chat_completion.golden.json
tests/contract/testdata/golden/xai/chat_completion_stream.golden.json
tests/contract/testdata/golden/xai/chat_with_params.golden.json
tests/contract/testdata/golden/xai/models.golden.json
tests/contract/testdata/golden/xai/responses.golden.json
tests/contract/testdata/golden/xai/responses_stream.golden.json
tests/contract/testdata/openai/responses.json
tests/contract/testdata/openai/responses_stream.txt
tests/contract/testdata/xai/responses.json
tests/contract/testdata/xai/responses_stream.txt
tests/contract/xai_test.go
| "responses": { | ||
| path: "/v1/responses", | ||
| method: http.MethodPost, | ||
| requestBody: map[string]interface{}{ | ||
| "model": "gpt-4o-mini", | ||
| "input": "Say 'Hello, World!' and nothing else.", | ||
| }, | ||
| }, | ||
| "responses_stream": { | ||
| path: "/v1/responses", | ||
| method: http.MethodPost, | ||
| requestBody: map[string]interface{}{ | ||
| "model": "gpt-4o-mini", | ||
| "input": "Say 'Hello, World!' and nothing else.", | ||
| "stream": true, | ||
| }, | ||
| }, |
🧹 Nitpick | 🔵 Trivial
Consider updating tests/contract/README.md with new endpoint documentation.
The new responses and responses_stream endpoints add recording capabilities that users should be aware of. The README should document:
- The new endpoint options
- Which providers support the `/v1/responses` capability (openai, xai)
Based on learnings: "Update tests/contract/README.md when modifying contract testing procedures or golden file validation processes."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cmd/recordapi/main.go` around lines 95 - 111, Add documentation for the new
responses and responses_stream endpoints to tests/contract/README.md: describe
the /v1/responses endpoint, the two example request bodies shown (keys "model",
"input" and the "stream": true option for responses_stream), and list which
providers support this capability (openai and xai); update the "endpoint
options" section to include the streaming flag and model choice and note any
provider-specific behavior for recording so contract tests and golden-file
validation reference the new endpoints.
| arguments := "{}" | ||
| if len(b.Input) > 0 { | ||
| arguments = string(b.Input) | ||
| } |
🧹 Nitpick | 🔵 Trivial
Canonicalize tool-call argument JSON before assigning Function.Arguments.
Line 655 currently preserves raw whitespace from upstream input, which can cause golden churn for semantically identical argument objects.
♻️ Suggested refinement
- arguments := "{}"
- if len(b.Input) > 0 {
- arguments = string(b.Input)
- }
+ arguments := "{}"
+ if len(bytes.TrimSpace(b.Input)) > 0 {
+ var compact bytes.Buffer
+ if err := json.Compact(&compact, b.Input); err == nil {
+ arguments = compact.String()
+ } else {
+ arguments = string(b.Input)
+ }
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| arguments := "{}" | |
| if len(b.Input) > 0 { | |
| arguments = string(b.Input) | |
| } | |
| arguments := "{}" | |
| if len(bytes.TrimSpace(b.Input)) > 0 { | |
| var compact bytes.Buffer | |
| if err := json.Compact(&compact, b.Input); err == nil { | |
| arguments = compact.String() | |
| } else { | |
| arguments = string(b.Input) | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/providers/anthropic/anthropic.go` around lines 653 - 656, The code
assigns raw b.Input (possibly with varying whitespace) directly to the local
variable arguments which later becomes Function.Arguments; to avoid golden-file
churn, parse b.Input as JSON into an interface{} and re-marshal it with
encoding/json to produce canonicalized JSON (e.g., json.Unmarshal ->
json.Marshal) before assigning to arguments/Function.Arguments, and if parsing
fails fall back to "{}" or the original string while logging or handling the
error appropriately; update the block around the variables arguments and b.Input
in anthropic.go to perform this canonicalization.
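As a standalone sketch of the refinement above, `json.Compact` from the standard library is enough to canonicalize whitespace in the tool-input payload; the helper name and fallback behavior here are illustrative, not the provider's actual code.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// canonicalArguments compacts raw tool-input JSON so semantically
// identical payloads serialize identically. Empty input falls back to
// "{}" and non-JSON input is passed through unchanged.
func canonicalArguments(input []byte) string {
	if len(bytes.TrimSpace(input)) == 0 {
		return "{}"
	}
	var compact bytes.Buffer
	if err := json.Compact(&compact, input); err != nil {
		return string(input)
	}
	return compact.String()
}

func main() {
	fmt.Println(canonicalArguments([]byte("{ \"city\":  \"Paris\" }"))) // {"city":"Paris"}
	fmt.Println(canonicalArguments(nil))                                // {}
}
```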
internal/providers/gemini/gemini.go
Outdated
| if err := json.Unmarshal(rawResp.Body, &geminiResp); err == nil && len(geminiResp.Models) > 0 { | ||
| models := make([]core.Model, 0, len(geminiResp.Models)) | ||
| for _, gm := range geminiResp.Models { | ||
| // Extract model ID from name (format: "models/gemini-...") | ||
| modelID := strings.TrimPrefix(gm.Name, "models/") | ||
| // Only include models that support generateContent (chat/completion) | ||
| supportsGenerate := false | ||
| for _, method := range gm.SupportedMethods { | ||
| if method == "generateContent" || method == "streamGenerateContent" { | ||
| supportsGenerate = true | ||
| break | ||
| } | ||
| } | ||
| supportsEmbed := false | ||
| for _, method := range gm.SupportedMethods { | ||
| if method == "embedContent" { | ||
| supportsEmbed = true | ||
| break | ||
| } | ||
| } | ||
| isOpenAICompatModel := strings.HasPrefix(modelID, "gemini-") || strings.HasPrefix(modelID, "text-embedding-") | ||
| if (supportsGenerate || supportsEmbed) && isOpenAICompatModel { | ||
| models = append(models, core.Model{ | ||
| ID: modelID, | ||
| Object: "model", | ||
| OwnedBy: "google", | ||
| Created: now, | ||
| }) | ||
| } | ||
| } | ||
| return &core.ModelsResponse{ | ||
| Object: "list", | ||
| Data: models, | ||
| }, nil | ||
| } | ||
Handle empty native model lists as valid responses.
Line 205 currently requires len(geminiResp.Models) > 0, so a valid native response like {"models":[]} falls through and eventually returns an unexpected-format error instead of an empty list.
💡 Proposed fix
- if err := json.Unmarshal(rawResp.Body, &geminiResp); err == nil && len(geminiResp.Models) > 0 {
+ if err := json.Unmarshal(rawResp.Body, &geminiResp); err == nil && geminiResp.Models != nil {
models := make([]core.Model, 0, len(geminiResp.Models))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if err := json.Unmarshal(rawResp.Body, &geminiResp); err == nil && len(geminiResp.Models) > 0 { | |
| models := make([]core.Model, 0, len(geminiResp.Models)) | |
| for _, gm := range geminiResp.Models { | |
| // Extract model ID from name (format: "models/gemini-...") | |
| modelID := strings.TrimPrefix(gm.Name, "models/") | |
| // Only include models that support generateContent (chat/completion) | |
| supportsGenerate := false | |
| for _, method := range gm.SupportedMethods { | |
| if method == "generateContent" || method == "streamGenerateContent" { | |
| supportsGenerate = true | |
| break | |
| } | |
| } | |
| } | |
| supportsEmbed := false | |
| for _, method := range gm.SupportedMethods { | |
| if method == "embedContent" { | |
| supportsEmbed = true | |
| break | |
| supportsEmbed := false | |
| for _, method := range gm.SupportedMethods { | |
| if method == "embedContent" { | |
| supportsEmbed = true | |
| break | |
| } | |
| } | |
| isOpenAICompatModel := strings.HasPrefix(modelID, "gemini-") || strings.HasPrefix(modelID, "text-embedding-") | |
| if (supportsGenerate || supportsEmbed) && isOpenAICompatModel { | |
| models = append(models, core.Model{ | |
| ID: modelID, | |
| Object: "model", | |
| OwnedBy: "google", | |
| Created: now, | |
| }) | |
| } | |
| } | |
| isOpenAICompatModel := strings.HasPrefix(modelID, "gemini-") || strings.HasPrefix(modelID, "text-embedding-") | |
| if (supportsGenerate || supportsEmbed) && isOpenAICompatModel { | |
| return &core.ModelsResponse{ | |
| Object: "list", | |
| Data: models, | |
| }, nil | |
| } | |
| if err := json.Unmarshal(rawResp.Body, &geminiResp); err == nil && geminiResp.Models != nil { | |
| models := make([]core.Model, 0, len(geminiResp.Models)) | |
| for _, gm := range geminiResp.Models { | |
| // Extract model ID from name (format: "models/gemini-...") | |
| modelID := strings.TrimPrefix(gm.Name, "models/") | |
| // Only include models that support generateContent (chat/completion) | |
| supportsGenerate := false | |
| for _, method := range gm.SupportedMethods { | |
| if method == "generateContent" || method == "streamGenerateContent" { | |
| supportsGenerate = true | |
| break | |
| } | |
| } | |
| supportsEmbed := false | |
| for _, method := range gm.SupportedMethods { | |
| if method == "embedContent" { | |
| supportsEmbed = true | |
| break | |
| } | |
| } | |
| isOpenAICompatModel := strings.HasPrefix(modelID, "gemini-") || strings.HasPrefix(modelID, "text-embedding-") | |
| if (supportsGenerate || supportsEmbed) && isOpenAICompatModel { | |
| models = append(models, core.Model{ | |
| ID: modelID, | |
| Object: "model", | |
| OwnedBy: "google", | |
| Created: now, | |
| }) | |
| } | |
| } | |
| return &core.ModelsResponse{ | |
| Object: "list", | |
| Data: models, | |
| }, nil | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/providers/gemini/gemini.go` around lines 205 - 244, The current
logic in the Gemini provider rejects valid responses like {"models":[]} because
the if condition checks both json.Unmarshal success and len(geminiResp.Models) >
0; update the branch handling json.Unmarshal into geminiResp so it treats an
empty geminiResp.Models slice as a valid response: remove the
len(geminiResp.Models) > 0 requirement (or otherwise allow zero-length slices),
keep the existing loop that builds models (which will produce an empty models
slice), and return a core.ModelsResponse with Object:"list" and Data:models;
adjust code around geminiResp, ModelsResponse, and the
supportsGenerate/supportsEmbed checks accordingly.
| @@ -1,221 +1,149 @@ | |||
| //go:build contract | |||
|
|
|||
| // Contract tests in this file are intended to run with: -tags=contract -timeout=5m. | |||
Update contract testing README for replay-harness workflow changes.
This migration changes contract test execution and fixture/golden flow; please update tests/contract/README.md accordingly so contributors follow the new process.
Based on learnings: Applies to tests/contract/README.md : Update tests/contract/README.md when modifying contract testing procedures or golden file validation processes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/openai_test.go` at line 3, Update the contract tests README to
reflect the new replay-harness workflow and changed fixture/golden flow: in the
contract tests README (the README.md next to the contract tests) describe the
new execution steps (including running contract tests with the -tags=contract
-timeout=5m flags as used by openai_test.go), explain how fixtures and golden
files are produced and validated under the replay-harness (how to run replays,
how to regenerate goldens), and update CI/PR guidance (what to check, when to
commit updated golden files and how to run the local validation commands).
Ensure the README clearly lists commands, expected outputs, and the exact
developer workflow for replay vs live runs so contributors can reproduce and
update goldens correctly.
```json
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "1, 2, 3, 4, ",
        "role": "assistant"
      }
    }
  ],
  "created": 0,
  "id": "chatcmpl-D0qOmQKlnUcfmQ1UR9UNvcYikTjhe",
  "model": "gpt-4o-mini-2024-07-18",
  "object": "chat.completion",
  "provider": "",
  "usage": {
    "completion_tokens": 13,
    "completion_tokens_details": {
      "accepted_prediction_tokens": 0,
      "audio_tokens": 0,
      "reasoning_tokens": 0,
      "rejected_prediction_tokens": 0
    },
    "prompt_tokens": 27,
    "prompt_tokens_details": {
      "audio_tokens": 0,
      "cached_tokens": 0,
      "image_tokens": 0,
      "text_tokens": 0
    },
    "total_tokens": 40
  }
}
```
Update contract docs for the replay-harness migration.
This PR changes the contract testing workflow; please ensure tests/contract/README.md is updated with replay/record/validation steps so contributors follow the new process consistently.
Based on learnings: Applies to tests/contract/README.md : Update tests/contract/README.md when modifying contract testing procedures or golden file validation processes.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/testdata/golden/openai/chat_with_params.golden.json` around
lines 1 - 34, Update the contract testing README to reflect the new
replay-harness workflow: add concise, ordered steps for recording new
interactions, replaying tests, and validating golden files (including the
command-line invocations, required env vars, and CI expectations), document how
to regenerate/update golden files such as chat_with_params.golden.json and where
to place them, and include troubleshooting notes about common failures and the
expected validation output so contributors follow the new process consistently.
Actionable comments posted: 9
♻️ Duplicate comments (6)
tests/contract/testdata/golden/openai/chat_completion_reasoning.golden.json (1)
25-30: ⚠️ Potential issue | 🟡 Minor — Fix inconsistent prompt token details.

`prompt_tokens` is `13`, but `prompt_tokens_details` currently sums to `0`.

Suggested fix:

```diff
  "prompt_tokens": 13,
  "prompt_tokens_details": {
    "audio_tokens": 0,
    "cached_tokens": 0,
    "image_tokens": 0,
-   "text_tokens": 0
+   "text_tokens": 13
  },
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `tests/contract/testdata/golden/openai/chat_completion_reasoning.golden.json` around lines 25-30: "prompt_tokens" is 13 but "prompt_tokens_details" (audio_tokens, cached_tokens, image_tokens, text_tokens) sums to 0; update the "prompt_tokens_details" fields so their summed value equals 13 (or, if the detailed breakdown is unknown, set "text_tokens": 13 and the others to 0) to keep the top-level "prompt_tokens" consistent with the detailed keys.

.github/workflows/test.yml (1)

88-102: ⚠️ Potential issue | 🟡 Minor — Please include the contract-testing procedure update in tests/contract/README.md.

This PR changes contributor-facing contract execution flow in CI; the README should be updated in the same change set.

Based on learnings: Applies to `tests/contract/README.md`: Update `tests/contract/README.md` when modifying contract testing procedures or golden file validation processes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `.github/workflows/test.yml` around lines 88-102: update tests/contract/README.md to reflect the changed CI contract-testing flow introduced by the test-contract job: document the exact command and flags used in CI (go test -v -tags=contract -timeout=5m ./tests/contract/...), any required environment variables (e.g., GO_VERSION) or setup steps, and the updated contributor workflow for running contract replay tests locally; also describe the golden file validation process and how to regenerate or approve golden files so contributors know how to reproduce CI results and fix failures locally.

tests/contract/testdata/golden/openai/chat_multimodal.golden.json (1)

13-13: ⚠️ Potential issue | 🟡 Minor — Normalize volatile completion IDs in this golden fixture.

The concrete `chatcmpl-*` value is run-specific and will create avoidable golden churn when fixtures are re-recorded.

Suggested stabilization:

```diff
- "id": "chatcmpl-D0qUlBEghKIASKSVRE4fnjs5QUDGF",
+ "id": "chatcmpl_<generated>",
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `tests/contract/testdata/golden/openai/chat_multimodal.golden.json` at line 13: replace the run-specific "id" value that starts with "chatcmpl-" by normalizing it to a stable placeholder (e.g., "chatcmpl-<normalized>" or "<CHAT_COMPLETION_ID>") so re-records won't change the golden, and if the test harness performs strict compares ensure it treats this placeholder as the canonical ID (or add normalization logic around the comparison that strips or masks the "chatcmpl-*" pattern).

tests/contract/testdata/golden/gemini/responses_stream.golden.json (1)

26-39: ⚠️ Potential issue | 🟠 Major — Add missing `usage` field to the `response.completed` event.

The `response.completed` event lacks usage metrics (input_tokens, output_tokens, total_tokens), which other providers include in streaming responses. This creates inconsistency with Anthropic/OpenAI streaming fixtures and Gemini's own non-streaming fixtures.

Proposed fix:

```diff
  "Payload": {
    "response": {
      "created_at": 0,
      "id": "resp_\u003cgenerated\u003e",
      "model": "gemini-2.5-flash",
      "object": "response",
      "provider": "gemini",
-     "status": "completed"
+     "status": "completed",
+     "usage": {
+       "input_tokens": 0,
+       "output_tokens": 0,
+       "total_tokens": 0
+     }
    },
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `tests/contract/testdata/golden/gemini/responses_stream.golden.json` around lines 26-39: the golden fixture for the "response.completed" streaming event is missing the "usage" metrics; update the event so payload.response includes a "usage" object with numeric fields "input_tokens", "output_tokens", and "total_tokens" (match the values/format used by other providers' streaming fixtures and Gemini non-streaming fixtures) to keep streaming responses consistent.

tests/contract/xai_test.go (1)

127-136: ⚠️ Potential issue | 🟡 Minor — Add a `[DONE]` termination assertion to `TestXAIReplayStreamResponses`.

Unlike the Groq and Anthropic streaming response tests, this test does not verify that the stream terminates with the `[DONE]` marker. This inconsistency could allow regressions in stream termination behavior to go undetected.

Proposed fix:

```diff
 raw := readAllStream(t, stream)
 events := parseResponsesStream(t, raw)
 require.True(t, hasResponsesEvent(events, "response.created"))
 require.True(t, hasResponsesEvent(events, "response.output_text.delta"))
 require.True(t, hasResponsesEvent(events, "response.completed"))
+
+hasDone := false
+for _, event := range events {
+	if event.Done {
+		hasDone = true
+		break
+	}
+}
+require.True(t, hasDone, "responses stream should terminate with [DONE]")

 compareGoldenJSON(t, goldenPathForFixture("xai/responses_stream.txt"), map[string]any{
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `tests/contract/xai_test.go` around lines 127-136: the test reads the full stream into raw but doesn't assert the stream termination marker; modify TestXAIReplayStreamResponses to assert the stream ends with the expected "[DONE]" marker by checking raw (or the final parsed event) for the "[DONE]" terminator (e.g., use strings.HasSuffix(raw, "[DONE]") or verify the last element of events equals the termination marker) and add a require.True assertion using raw or events to fail the test if the marker is missing.

tests/contract/openai_test.go (1)

136-148: ⚠️ Potential issue | 🟡 Minor — Add a `[DONE]` termination assertion for consistency with the Gemini tests.

The Gemini streaming responses test validates that the stream terminates with a `[DONE]` marker (see `TestGeminiReplayStreamResponses`, lines 115-122), but this test does not include the same validation. This could miss regressions in stream termination behavior.

🛡️ Suggested fix to add a termination check:

```diff
 require.True(t, hasResponsesEvent(events, "response.created"))
 require.True(t, hasResponsesEvent(events, "response.output_text.delta"))
 require.True(t, hasResponsesEvent(events, "response.completed"))
 require.NotEmpty(t, extractResponsesStreamText(events))
+hasDone := false
+for _, event := range events {
+	if event.Done {
+		hasDone = true
+		break
+	}
+}
+require.True(t, hasDone, "responses stream should terminate with [DONE]")
+
 compareGoldenJSON(t, "openai/responses_stream.golden.json", map[string]any{
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `tests/contract/openai_test.go` around lines 136-148: after obtaining the raw stream (variable raw produced by readAllStream), add an assertion that the stream termination marker is present for consistency with the Gemini tests: check that raw ends with or contains the "[DONE]" marker (e.g., assert strings.HasSuffix(raw, "[DONE]\n") or require.Contains(raw, "[DONE]")) before parsing and comparing events, keeping the existing calls to parseResponsesStream, hasResponsesEvent, extractResponsesStreamText, and compareGoldenJSON.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/providers/gemini/gemini.go`:
- Around line 167-174: The current headers closure adds the API key as a query
parameter (q.Set("key", p.apiKey)), which exposes credentials in logs; update
the headers function used in gemini.go to stop writing p.apiKey to
req.URL.RawQuery and instead set the API key on the request header using the
x-goog-api-key header (e.g., call req.Header.Set("x-goog-api-key", p.apiKey));
ensure you remove the query parameter logic so only header-based authentication
is used for the Gemini models endpoint.
In `@tests/contract/gemini_test.go`:
- Around line 83-96: TestGeminiReplayResponses lacks a pre-check that the golden
file exists; add a call to goldenFileExists(t, "gemini/responses.golden.json")
at the start of TestGeminiReplayResponses (before creating the provider and
calling provider.Responses) similar to TestOpenAIReplayResponses so the test
fails with a clear message if the golden file is missing; keep
compareGoldenJSON(t, "gemini/responses.golden.json", resp) as-is.
In `@tests/contract/replay_harness_test.go`:
- Around line 140-143: The test's SSE parsing is stripping significant
whitespace from `data:` payloads by using TrimSpace; change the `data:` branch
to preserve raw payload bytes by replacing dataLines = append(dataLines,
strings.TrimSpace(strings.TrimPrefix(line, "data:"))) with appending the result
of only removing the "data:" prefix (e.g. strings.TrimPrefix(line, "data:") or
line[len("data:"):]) so leading/trailing spaces are kept; leave the `event:`
handling (`currentName = strings.TrimSpace(strings.TrimPrefix(line, "event:"))`)
unchanged unless events need different treatment.
- Around line 38-42: The current fallback logic in the request matching (the
block that checks req.URL.RawQuery and then sets key = replayKey(req.Method,
req.URL.Path) and looks up rt.routes[key]) masks regressions by allowing a
path-only route to match when query parameters differ; remove this fallback so
matching uses the full RequestURI/RawQuery and fails when the query is missing
or different. Update the matching code around replayKey, req.URL.RawQuery,
req.URL.Path and rt.routes to require the original key (including query) and do
not attempt the path-only lookup; if desired, add an explicit test or flag to
opt into query-agnostic matching rather than silently falling back.
In `@tests/contract/testdata/golden/groq/responses_stream.golden.json`:
- Around line 54-60: Normalize the volatile timing fields in the usage payload
by replacing or rounding completion_time, prompt_time, queue_time, and
total_time to a stable value (e.g., 0.0 or a fixed sentinel) before
serializing/generating the golden file; leave token counts (completion_tokens,
prompt_tokens, total_tokens) unchanged. Locate the code path that builds or
serializes the usage payload (the function that emits the JSON containing keys
completion_time, prompt_time, queue_time, total_time) and apply the
normalization there so tests always produce deterministic timing values. Ensure
the change only affects those timing fields and not token metrics or other
payload data.
In `@tests/contract/testdata/golden/openai/chat_completion_stream.golden.json`:
- Around line 17-23: Replace concrete volatile metadata values in each chat
completion chunk by normalizing the "id", "obfuscation", and
"system_fingerprint" fields to stable placeholders (e.g. "<ID>",
"<OBFUSCATION>", "<SYSTEM_FINGERPRINT>") so golden snapshots do not churn;
update every chunk instance (the examples around the shown chunk and the ones at
the other ranges) to use these placeholders or a deterministic generator used by
your test harness that returns stable values for "id", "obfuscation", and
"system_fingerprint".
In `@tests/contract/testdata/golden/openai/chat_multi_turn.golden.json`:
- Line 13: The golden fixture contains a concrete completion ID ("id":
"chatcmpl-D0qOykS3WniK2xU97yFf96Aa4Me8d") which makes tests brittle; replace
that value with a sanitized/deterministic placeholder (e.g.,
"chatcmpl-<SANITIZED>" or "chatcmpl-0000000000") so the "id" field in
tests/contract/testdata/golden/openai/chat_multi_turn.golden.json is stable
across re-recordings and does not contain real/variable IDs.
In `@tests/contract/testdata/golden/openai/responses_stream.golden.json`:
- Line 146: Replace run-specific volatile fields in the stream fixture by
normalizing the "obfuscation" and "completed_at" JSON keys to stable sentinel
values (e.g., fixed string like "REDACTED" or a constant timestamp) wherever
they appear (including the instances at the noted locations), so golden
comparisons are stable; update the entries that currently contain dynamic tokens
(the "obfuscation" values and any "completed_at" timestamps) to the chosen
normalized value and ensure any test helpers that generate fixtures follow the
same normalization convention.
In `@tests/contract/testdata/golden/xai/chat_completion.golden.json`:
- Line 13: Replace the concrete UUID value assigned to the JSON "id" field in
the golden fixture with a stable placeholder to make the snapshot deterministic;
locate the "id" key in the chat completion golden fixture and substitute the
UUID with a fixed token such as 00000000-0000-0000-0000-000000000000 or a clear
placeholder like <ID_PLACEHOLDER>, and ensure any tests or snapshot update logic
expect this placeholder instead of a variable UUID.
---
Duplicate comments:
In @.github/workflows/test.yml:
- Around line 88-102: Update tests/contract/README.md to reflect the changed CI
contract-testing flow introduced by the test-contract job: document the exact
command and flags used in CI (go test -v -tags=contract -timeout=5m
./tests/contract/...), any required environment variables (e.g., GO_VERSION) or
setup steps, and the updated contributor workflow for running contract replay
tests locally; also describe the golden file validation process and how to
regenerate or approve golden files so contributors know how to reproduce CI
results and fix failures locally.
In `@tests/contract/openai_test.go`:
- Around line 136-148: After obtaining the raw stream in
tests/contract/openai_test.go (variable raw produced by readAllStream) add an
assertion that the stream termination marker is present for consistency with
Gemini tests: check that raw ends with or contains the "[DONE]" marker (e.g.,
assert strings.HasSuffix(raw, "[DONE]\n") or require.Contains(raw, "[DONE]"))
before parsing and comparing events (so keep the existing calls to
parseResponsesStream, hasResponsesEvent, extractResponsesStreamText, and
compareGoldenJSON).
In `@tests/contract/testdata/golden/gemini/responses_stream.golden.json`:
- Around line 26-39: The golden fixture for the "response.completed" streaming
event is missing the "usage" metrics; update the "response.completed" event
inside responses_stream.golden.json so the payload.response includes a "usage"
object with numeric fields "input_tokens", "output_tokens", and "total_tokens"
(match values/format used by other providers' streaming fixtures and Gemini
non-streaming fixtures) to keep streaming responses consistent.
In `@tests/contract/testdata/golden/openai/chat_completion_reasoning.golden.json`:
- Around line 25-30: The JSON has inconsistent token counts: "prompt_tokens" is
13 but "prompt_tokens_details" (audio_tokens, cached_tokens, image_tokens,
text_tokens) sums to 0; update the "prompt_tokens_details" fields so their
summed value equals 13 (or if the detailed breakdown is unknown, set
"text_tokens": 13 and others 0) to keep the top-level "prompt_tokens" consistent
with the detailed keys.
In `@tests/contract/testdata/golden/openai/chat_multimodal.golden.json`:
- Line 13: Replace the run-specific "id" value that starts with "chatcmpl-" in
the golden fixture by normalizing it to a stable placeholder (e.g.,
"chatcmpl-<normalized>" or "<CHAT_COMPLETION_ID>") in
tests/contract/testdata/golden/openai/chat_multimodal.golden.json; update the
JSON "id" field instead of keeping the concrete run-specific value so re-records
won’t change the golden, and if the test harness performs strict compares ensure
it treats this placeholder as the canonical ID (or add normalization logic
around the comparison that strips or masks the "chatcmpl-*" pattern).
In `@tests/contract/xai_test.go`:
- Around line 127-136: The test reads the full stream into raw but doesn't
assert the stream termination marker; modify the test
(TestXAIReplayStreamResponses) to assert the stream ends with the expected
"[DONE]" marker by checking raw (or the final parsed event) for the "[DONE]"
terminator (e.g., use strings.HasSuffix(raw, "[DONE]") or verify the last
element of events equals the termination marker) and add a
require.True/assertion using raw or events to fail the test if the marker is
missing.
ℹ️ Review info
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (59)
- .github/workflows/test.yml
- cmd/recordapi/main.go
- docs/TESTING_STRATEGY.md
- internal/core/types.go
- internal/providers/anthropic/anthropic.go
- internal/providers/gemini/gemini.go
- tests/contract/README.md
- tests/contract/anthropic_test.go
- tests/contract/gemini_test.go
- tests/contract/golden_helpers_test.go
- tests/contract/groq_test.go
- tests/contract/openai_test.go
- tests/contract/replay_harness_test.go
- tests/contract/replay_provider_helpers_test.go
- tests/contract/testdata/golden/anthropic/messages.golden.json
- tests/contract/testdata/golden/anthropic/messages_extended_thinking.golden.json
- tests/contract/testdata/golden/anthropic/messages_multi_turn.golden.json
- tests/contract/testdata/golden/anthropic/messages_multimodal.golden.json
- tests/contract/testdata/golden/anthropic/messages_stream.golden.json
- tests/contract/testdata/golden/anthropic/messages_with_params.golden.json
- tests/contract/testdata/golden/anthropic/messages_with_tools.golden.json
- tests/contract/testdata/golden/anthropic/responses.golden.json
- tests/contract/testdata/golden/anthropic/responses_stream.golden.json
- tests/contract/testdata/golden/gemini/chat_completion.golden.json
- tests/contract/testdata/golden/gemini/chat_completion_stream.golden.json
- tests/contract/testdata/golden/gemini/chat_with_params.golden.json
- tests/contract/testdata/golden/gemini/chat_with_tools.golden.json
- tests/contract/testdata/golden/gemini/models.golden.json
- tests/contract/testdata/golden/gemini/responses.golden.json
- tests/contract/testdata/golden/gemini/responses_stream.golden.json
- tests/contract/testdata/golden/groq/chat_completion.golden.json
- tests/contract/testdata/golden/groq/chat_completion_stream.golden.json
- tests/contract/testdata/golden/groq/chat_with_params.golden.json
- tests/contract/testdata/golden/groq/chat_with_tools.golden.json
- tests/contract/testdata/golden/groq/models.golden.json
- tests/contract/testdata/golden/groq/responses.golden.json
- tests/contract/testdata/golden/groq/responses_stream.golden.json
- tests/contract/testdata/golden/openai/chat_completion.golden.json
- tests/contract/testdata/golden/openai/chat_completion_reasoning.golden.json
- tests/contract/testdata/golden/openai/chat_completion_stream.golden.json
- tests/contract/testdata/golden/openai/chat_json_mode.golden.json
- tests/contract/testdata/golden/openai/chat_multi_turn.golden.json
- tests/contract/testdata/golden/openai/chat_multimodal.golden.json
- tests/contract/testdata/golden/openai/chat_with_params.golden.json
- tests/contract/testdata/golden/openai/chat_with_tools.golden.json
- tests/contract/testdata/golden/openai/models.golden.json
- tests/contract/testdata/golden/openai/responses.golden.json
- tests/contract/testdata/golden/openai/responses_stream.golden.json
- tests/contract/testdata/golden/xai/chat_completion.golden.json
- tests/contract/testdata/golden/xai/chat_completion_stream.golden.json
- tests/contract/testdata/golden/xai/chat_with_params.golden.json
- tests/contract/testdata/golden/xai/models.golden.json
- tests/contract/testdata/golden/xai/responses.golden.json
- tests/contract/testdata/golden/xai/responses_stream.golden.json
- tests/contract/testdata/openai/responses.json
- tests/contract/testdata/openai/responses_stream.txt
- tests/contract/testdata/xai/responses.json
- tests/contract/testdata/xai/responses_stream.txt
- tests/contract/xai_test.go
```json
"completion_time": 0.027554055,
"completion_tokens": 4,
"prompt_time": 0.003210785,
"prompt_tokens": 38,
"queue_time": 0.090321431,
"total_time": 0.03076484,
"total_tokens": 42
```
Normalize volatile latency metrics in usage payload.
Timing values are environment-dependent and can create unnecessary golden diffs.
Suggested stabilization
- "completion_time": 0.027554055,
+ "completion_time": 0,
...
- "prompt_time": 0.003210785,
+ "prompt_time": 0,
...
- "queue_time": 0.090321431,
+ "queue_time": 0,
...
- "total_time": 0.03076484,
+ "total_time": 0,

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
- "completion_time": 0.027554055,
+ "completion_time": 0,
  "completion_tokens": 4,
- "prompt_time": 0.003210785,
+ "prompt_time": 0,
  "prompt_tokens": 38,
- "queue_time": 0.090321431,
+ "queue_time": 0,
- "total_time": 0.03076484,
+ "total_time": 0,
  "total_tokens": 42
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/testdata/golden/groq/responses_stream.golden.json` around
lines 54 - 60, Normalize the volatile timing fields in the usage payload by
replacing or rounding completion_time, prompt_time, queue_time, and total_time
to a stable value (e.g., 0.0 or a fixed sentinel) before serializing/generating
the golden file; leave token counts (completion_tokens, prompt_tokens,
total_tokens) unchanged. Locate the code path that builds or serializes the
usage payload (the function that emits the JSON containing keys completion_time,
prompt_time, queue_time, total_time) and apply the normalization there so tests
always produce deterministic timing values. Ensure the change only affects those
timing fields and not token metrics or other payload data.
```json
"id": "chatcmpl-D0q8CZXN1uEzngZGkU7FuV92lnXWq",
"model": "gpt-4o-mini-2024-07-18",
"obfuscation": "iPgRzm",
"object": "chat.completion.chunk",
"service_tier": "default",
"system_fingerprint": "fp_29330a9688"
},
```
Sanitize volatile stream metadata across all chunks.
Concrete id, obfuscation, and system_fingerprint fields are unstable and likely to churn on re-record.
Suggested normalization pattern (apply to each chunk)
- "id": "chatcmpl-D0q8CZXN1uEzngZGkU7FuV92lnXWq",
+ "id": "chatcmpl_<generated>",
...
- "obfuscation": "iPgRzm",
+ "obfuscation": "<generated>",
...
- "system_fingerprint": "fp_29330a9688"
+ "system_fingerprint": "fp_<generated>"

Also applies to: 36-42, 55-61, 74-80, 93-99, 110-116
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/testdata/golden/openai/chat_completion_stream.golden.json`
around lines 17 - 23, Replace concrete volatile metadata values in each chat
completion chunk by normalizing the "id", "obfuscation", and
"system_fingerprint" fields to stable placeholders (e.g. "<ID>",
"<OBFUSCATION>", "<SYSTEM_FINGERPRINT>") so golden snapshots do not churn;
update every chunk instance (the examples around the shown chunk and the ones at
the other ranges) to use these placeholders or a deterministic generator used by
your test harness that returns stable values for "id", "obfuscation", and
"system_fingerprint".
```json
    }
  ],
  "created": 0,
  "id": "chatcmpl-D0qOykS3WniK2xU97yFf96Aa4Me8d",
```
Use a sanitized completion ID in golden data.
A concrete completion ID makes this fixture unstable across re-recordings.
Suggested stabilization
- "id": "chatcmpl-D0qOykS3WniK2xU97yFf96Aa4Me8d",
+ "id": "chatcmpl_<generated>",

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
- "id": "chatcmpl-D0qOykS3WniK2xU97yFf96Aa4Me8d",
+ "id": "chatcmpl_<generated>",
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/testdata/golden/openai/chat_multi_turn.golden.json` at line
13, The golden fixture contains a concrete completion ID ("id":
"chatcmpl-D0qOykS3WniK2xU97yFf96Aa4Me8d") which makes tests brittle; replace
that value with a sanitized/deterministic placeholder (e.g.,
"chatcmpl-<SANITIZED>" or "chatcmpl-0000000000") so the "id" field in
tests/contract/testdata/golden/openai/chat_multi_turn.golden.json is stable
across re-recordings and does not contain real/variable IDs.
```json
"delta": "Hello",
"item_id": "msg_\u003cgenerated\u003e",
"logprobs": [],
"obfuscation": "fIVBptC5V5x",
```
Sanitize volatile timestamp/obfuscation fields in stream fixtures.
obfuscation tokens and completed_at are run-specific; keep them normalized for stable golden comparisons.
Suggested normalization
- "obfuscation": "fIVBptC5V5x",
+ "obfuscation": "<generated>",
...
- "completed_at": 1772295220,
+ "completed_at": 0,

Also applies to: 160-160, 174-174, 188-188, 253-253
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/testdata/golden/openai/responses_stream.golden.json` at line
146, Replace run-specific volatile fields in the stream fixture by normalizing
the "obfuscation" and "completed_at" JSON keys to stable sentinel values (e.g.,
fixed string like "REDACTED" or a constant timestamp) wherever they appear
(including the instances at the noted locations), so golden comparisons are
stable; update the entries that currently contain dynamic tokens (the
"obfuscation" values and any "completed_at" timestamps) to the chosen normalized
value and ensure any test helpers that generate fixtures follow the same
normalization convention.
```json
    }
  ],
  "created": 0,
  "id": "2ac1119b-8e3f-969e-1c85-9a6dd6d91720",
```
Normalize the fixture response ID.
The concrete UUID should be replaced with a stable placeholder for deterministic golden updates.
Suggested stabilization
- "id": "2ac1119b-8e3f-969e-1c85-9a6dd6d91720",
+ "id": "<generated>",

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
- "id": "2ac1119b-8e3f-969e-1c85-9a6dd6d91720",
+ "id": "<generated>",
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/contract/testdata/golden/xai/chat_completion.golden.json` at line 13,
Replace the concrete UUID value assigned to the JSON "id" field in the golden
fixture with a stable placeholder to make the snapshot deterministic; locate the
"id" key in the chat completion golden fixture and substitute the UUID with a
fixed token such as 00000000-0000-0000-0000-000000000000 or a clear placeholder
like <ID_PLACEHOLDER>, and ensure any tests or snapshot update logic expect this
placeholder instead of a variable UUID.
Summary by CodeRabbit
New Features
Tests
Documentation