
refactor: introduced execution plan and decoupling #148

Merged
SantiagoDePolonia merged 12 commits into main from refactor/architecture-improvements
Mar 16, 2026
Conversation

@SantiagoDePolonia (Contributor) commented Mar 15, 2026

Summary

Describe what changed and why.

PR Title Format (required)

Use Conventional Commits in the PR title:

type(scope): short summary

Examples:

  • feat(cache): add Redis-backed model cache
  • fix(streaming): flush done marker in SSE
  • docs(config): clarify provider auto-discovery

Allowed type values:

  • feat
  • fix
  • perf
  • docs
  • refactor
  • test
  • build
  • ci
  • chore
  • revert

Breaking changes:

  • Add ! before : (example: feat(api)!: remove legacy endpoint)

Release Notes

  • User-facing work should use feat, fix, perf, or docs.
  • Internal-only work (test, ci, build, chore, many refactors) is auto-labeled and excluded from release notes.
  • Use release:skip label to explicitly exclude an item from release notes.

Summary by CodeRabbit

  • New Features

    • Central execution planning with passthrough semantic enrichment; new passthrough endpoint flow.
    • Native batch and native file endpoints with batch-preparation pipeline and guardrails integration.
    • Model resolver, exposed model listing, and streaming-friendly file uploads.
  • Refactor

    • Server handlers and middleware reorganized around execution-plan-driven flows for consistent request processing.
  • Bug Fixes

    • Cache key and skip logic tightened to respect execution plans; audit logs now include plan-derived metadata and request IDs.
  • Documentation

    • Added extensive end-to-end scenarios and a new architecture snapshot; legacy architecture doc removed.

@github-actions bot added the release:internal label ("Internal changes excluded from release notes") on Mar 15, 2026
coderabbitai bot commented Mar 15, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c89d90da-58c7-46ba-ad03-aa8a0e1b9db1

📥 Commits

Reviewing files that changed from the base of the PR and between 9890a51 and c59ffcb.

📒 Files selected for processing (3)
  • internal/admin/dashboard/static/css/dashboard.css
  • internal/admin/dashboard/static/js/modules/contribution-calendar.js
  • internal/admin/dashboard/templates/index.html

📝 Walkthrough

Walkthrough

Replaces per-request RequestModelResolution with a new ExecutionPlan abstraction and migrates middleware, context helpers, audit enrichment, cache keying, handler/service flows, passthrough semantic enrichment, native batch/file services, batch preparers, provider/alias wiring, and many tests to an execution-plan-first model.

Changes

Cohort / File(s) Summary
Core: ExecutionPlan & Operation types
internal/core/execution_plan.go, internal/core/endpoints.go, internal/core/batch_semantic.go, internal/core/semantic.go, internal/core/semantic_canonical.go, internal/core/*_test.go
Add ExecutionPlan, ExecutionMode, CapabilitySet, typed Operation enum, PassthroughRouteInfo, stream hint caching and accessors; migrate semantic codecs/tests to typed Operation.
Context helpers
internal/core/context.go
Replace RequestModelResolution context helpers with WithExecutionPlan / GetExecutionPlan storing *core.ExecutionPlan.
Audit logging & middleware
internal/auditlog/middleware.go, internal/auditlog/auditlog_test.go
Middleware now reads ExecutionPlan; rename enrichment API to EnrichEntryWithExecutionPlan; update tests and add ExecutionPlan-specific cases.
Execution planning & resolution
internal/server/model_validation.go, internal/server/execution_plan_helpers.go, internal/server/request_model_resolution.go, internal/server/execution_plan_helpers_test.go
Introduce ExecutionPlanning/ExecutionPlanningWithResolver, plan derivation/storage, RequestModelResolver interface, and helpers to build/ensure translated/native plans and propagate resolutions into plans.
Server wiring, handlers & services
internal/server/http.go, internal/server/handlers.go, internal/server/translated_inference_service.go, internal/server/native_batch_service.go, internal/server/native_file_service.go, internal/server/passthrough_service.go
Refactor handlers to delegate to new services (translatedInference, nativeBatch, nativeFiles, passthrough); inject ModelResolver, TranslatedRequestPatcher, BatchRequestPreparer, ExposedModelLister; implement plan-driven flows including streaming, usage, batch/file lifecycle and passthrough.
Passthrough semantic enrichment
internal/server/passthrough_semantic_enrichment.go, internal/providers/*/passthrough_semantics.go, internal/providers/factory.go
Add PassthroughSemanticEnricher interface, provider-specific enrichers, factory accessor, middleware that enriches/caches PassthroughRouteInfo.
Aliases & batch preparation
internal/aliases/batch_preparer.go, internal/aliases/provider.go, internal/aliases/service.go, internal/aliases/*_test.go
Add alias-aware BatchPreparer, provider options to disable translated/native batch processing, PrepareBatchRequest API, alias-model resolution helpers and tests.
Batch preparer composition
internal/server/batch_request_preparer.go, internal/server/batch_request_preparer_test.go
Add BatchRequestPreparer interface and composer to chain preparers with cleanup/merge semantics and tests.
Response cache & middleware
internal/responsecache/simple.go, internal/responsecache/middleware_test.go
Change cache key generation to accept *core.ExecutionPlan (include Mode, ProviderType, ResolvedQualifiedModel), add skip-checks for certain plans; update/add tests.
Guardrails integration & executors
internal/guardrails/*
Add RequestPatcher and BatchPreparer abstractions, guarded provider options to opt out of translated processing, and guardrails batch/chat patching implementations.
New server utilities & support
internal/server/*_support.go, internal/server/error_support.go, internal/server/passthrough_support.go, internal/server/stream_support.go
Add centralized error handling, request ID helpers, passthrough header/audit helpers, SSE/stream flush helper, and passthrough execution helpers with tests.
Native batch/file services
internal/server/native_batch_support.go, internal/server/native_batch_service.go, internal/server/native_file_service.go
Add end-to-end native batch and file services: provider resolution, input rewrite lifecycle, usage extraction, store integration, and many helpers.
Providers & factory
internal/providers/*, internal/providers/factory_test.go
Register passthrough enrichers for providers, factory accessor for enrichers, and tests; add ContentReader support to file creation helpers.
I/O & file APIs
internal/core/files.go, internal/providers/file_adapter_openai_compat.go, internal/providers/router_test.go
Add FileCreateRequest.ContentReader support and stream multipart upload construction; update CreateFile test helpers to respect ContentReader.
Tests & refactors
many internal/*_test.go files
Widespread test updates and new unit/integration tests covering ExecutionPlan flows, passthrough enrichment, batch preparers, guardrails, and streaming behaviors.
Docs & lint
2026-03-16_ARCHITECTURE_SNAPSHOT.md, ARCHITECTURE.md, tests/e2e/*, .golangci.yml
Add new architecture snapshot and E2E scenarios; remove old ARCHITECTURE.md; enable extra linters.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant HTTP_Server as "HTTP Server"
    participant ExecPlanner as "Execution Planner"
    participant Cache as "Response Cache"
    participant Service as "Handler/Service"
    participant Provider

    Client->>HTTP_Server: POST /... (request)
    HTTP_Server->>ExecPlanner: ExecutionPlanning -> deriveExecutionPlan
    ExecPlanner-->>HTTP_Server: store *ExecutionPlan in context
    HTTP_Server->>Cache: hashRequest(plan)
    alt cache hit
        Cache-->>HTTP_Server: cached response
        HTTP_Server-->>Client: return cached response
    else cache miss / skip
        HTTP_Server->>Service: handler reads ExecutionPlan
        Service->>Service: prepare/patch/rewrite (batch/translated/passthrough)
        Service->>Provider: submit request (uses plan-resolved model/provider)
        Provider-->>Service: response / stream
        Service->>HTTP_Server: EnrichEntryWithExecutionPlan(plan)
        HTTP_Server-->>Client: return response
    end

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested labels

enhancement

Poem

🐰 I hopped through code from root to tree,

Plans in my pouch, models set free.
I patched the routes and rewrote the test,
Cleaned cache crumbs and did my best,
A tiny hop — big dev jubilee!


@coderabbitai (bot) left a review comment

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
internal/server/handlers.go (1)

571-590: 🧹 Nitpick | 🔵 Trivial

Extract the plan-derived provider/model selection into one helper.

ensureTranslatedRequestPlan already rewrites req.Model / req.Provider (internal/server/execution_plan_helpers.go, lines 36-37 and 77-83), and GetProviderType(c) already unwraps plan.ProviderType (internal/server/model_validation.go, lines 190-193). Keeping these follow-up overrides in three handlers duplicates what should be a single source of truth and makes future ExecutionPlan changes easy to miss in one path.

Also applies to: 1045-1059, 1101-1105

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/handlers.go` around lines 571 - 590, Extract the duplicated
logic that derives providerType and model from the ExecutionPlan into a single
helper (e.g., ResolveProviderAndModelFromPlan) and use it from the handlers that
currently reapply plan overrides (the providerType assignment using
GetProviderType(c) and the usageModel assignment using
plan.Resolution.ResolvedSelector.Model); the helper should accept the context,
Echo context or request, the plan and the ChatRequest (to handle stream-specific
cloning via cloneChatRequestForStreamUsage when req.Stream and
shouldEnforceReturningUsageData() apply) and return the final providerType and
model (and optionally the modified request for streaming). Replace the inline
overrides in the current handler block and the other two occurrences with calls
to this new helper so all paths use
ensureTranslatedRequestPlan/execution_plan_helpers.go as the single source of
truth.
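As a rough illustration of the consolidation this comment asks for, the duplicated overrides could collapse into one pure helper. Everything below is a hypothetical sketch: the type shapes and the helper name come from the review prompt, not from the actual codebase.

```go
package main

import "fmt"

// Minimal stand-ins for the real core types; field names are assumptions.
type ModelSelector struct{ Provider, Model string }
type Resolution struct{ ResolvedSelector ModelSelector }
type ExecutionPlan struct {
	ProviderType string
	Resolution   *Resolution
}

// resolveProviderAndModelFromPlan consolidates the overrides the review flags
// as duplicated across three handlers: prefer plan-derived values, fall back
// to the request's own provider/model when the plan carries no resolution.
func resolveProviderAndModelFromPlan(plan *ExecutionPlan, reqProvider, reqModel string) (string, string) {
	if plan == nil {
		return reqProvider, reqModel
	}
	provider, model := reqProvider, reqModel
	if plan.ProviderType != "" {
		provider = plan.ProviderType
	}
	if plan.Resolution != nil && plan.Resolution.ResolvedSelector.Model != "" {
		model = plan.Resolution.ResolvedSelector.Model
	}
	return provider, model
}

func main() {
	plan := &ExecutionPlan{
		ProviderType: "openai",
		Resolution:   &Resolution{ResolvedSelector: ModelSelector{Model: "gpt-4o"}},
	}
	p, m := resolveProviderAndModelFromPlan(plan, "default", "alias:fast")
	fmt.Println(p, m)
}
```

Because the helper is pure, each handler can call it once and drop its inline override block, so an ExecutionPlan change touches one function instead of three call sites.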
internal/server/request_model_resolution.go (1)

57-68: ⚠️ Potential issue | 🟠 Major

Enrich the live audit entry when a late resolution updates an existing plan.

When this branch runs after ExecutionPlanning, the resolved ProviderType / Resolution only land in the request context. Streaming handlers create the stream log during handler execution (internal/server/handlers.go, lines 84-90), while internal/auditlog/middleware.go (lines 149-152) skips the normal post-handler write for streams, so translated streams that resolve here can be logged without the resolved model/provider.

Suggested fix
 func storeRequestModelResolution(c *echo.Context, resolution *core.RequestModelResolution) {
 	if c == nil || resolution == nil {
 		return
 	}
 
 	ctx := c.Request().Context()
 	if plan := core.GetExecutionPlan(ctx); plan != nil {
 		cloned := *plan
 		cloned.ProviderType = resolution.ProviderType
 		cloned.Resolution = resolution
 		ctx = core.WithExecutionPlan(ctx, &cloned)
+		auditlog.EnrichEntryWithExecutionPlan(c, &cloned)
 	}
 	if env := core.GetWhiteBoxPrompt(ctx); env != nil {
 		env.RouteHints.Model = resolution.ResolvedSelector.Model
 		env.RouteHints.Provider = resolution.ResolvedSelector.Provider
 	}
 	c.SetRequest(c.Request().WithContext(ctx))
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/request_model_resolution.go` around lines 57 - 68, When you
clone and replace the ExecutionPlan in context (core.GetExecutionPlan /
core.WithExecutionPlan) and update whitebox hints (core.GetWhiteBoxPrompt), also
update the in-flight live audit entry so streamed handlers see the resolved
model/provider: after setting cloned.ProviderType and cloned.Resolution, look up
the active live audit entry (use the auditlog middleware's live-entry accessor
used by streaming handlers) and set its Provider and Model from
resolution.ResolvedSelector.Provider and resolution.ResolvedSelector.Model
(no-op if entry missing), ensuring the updated entry is stored/visible to
downstream streaming logging the same request context.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/core/execution_plan.go`:
- Around line 27-65: CapabilitiesForEndpoint currently switches on raw
EndpointDescriptor.Operation strings (function CapabilitiesForEndpoint, type
EndpointDescriptor) which duplicates literals from internal/core/endpoints.go
and risks silent mismatches; fix by centralizing operation identifiers into
typed constants (e.g., define an Operation type and consts in
internal/core/endpoints.go) or by adding a method on EndpointDescriptor (e.g.,
EndpointDescriptor.Capabilities() or EndpointDescriptor.OperationType()
returning a typed enum) and then update CapabilitiesForEndpoint to switch on
that typed constant or simply delegate to the new
EndpointDescriptor.Capabilities() so capability logic is derived from a single
source of truth.
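A sketch of the typed-constant refactor this prompt describes. The Operation values and CapabilitySet fields below are illustrative stand-ins; only the names Operation, EndpointDescriptor, and CapabilitiesForEndpoint come from the review itself.

```go
package main

import "fmt"

// Operation is a typed identifier for endpoint operations, replacing raw
// string literals scattered across switches.
type Operation string

const (
	OperationChatCompletions Operation = "chat.completions"
	OperationResponses       Operation = "responses"
	OperationEmbeddings      Operation = "embeddings"
)

// CapabilitySet mirrors the idea of per-endpoint capabilities.
type CapabilitySet struct{ Streaming, Batch bool }

type EndpointDescriptor struct{ Operation Operation }

// Capabilities is the single source of truth the review asks for: capability
// logic lives on EndpointDescriptor, so no caller re-switches on duplicated
// operation strings.
func (d EndpointDescriptor) Capabilities() CapabilitySet {
	switch d.Operation {
	case OperationChatCompletions, OperationResponses:
		return CapabilitySet{Streaming: true, Batch: true}
	case OperationEmbeddings:
		return CapabilitySet{Batch: true}
	default:
		return CapabilitySet{}
	}
}

// CapabilitiesForEndpoint simply delegates instead of owning its own switch.
func CapabilitiesForEndpoint(d EndpointDescriptor) CapabilitySet {
	return d.Capabilities()
}

func main() {
	fmt.Println(CapabilitiesForEndpoint(EndpointDescriptor{Operation: OperationChatCompletions}))
}
```

With a typed enum, a mistyped operation fails at the call site (untyped string literals no longer satisfy the switch cases silently), which is the "silent mismatch" risk the comment flags.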

In `@internal/responsecache/simple.go`:
- Around line 59-60: The cache middleware computes the cache key using
hashRequest(path, body, core.GetExecutionPlan(c.Request().Context())) before
calling next(c), which lets ExecutionModeTranslated requests with a partial plan
(plan.Resolution == nil) produce incomplete keys and poison cache entries;
update the middleware to detect the execution plan via
core.GetExecutionPlan(...), and if the plan is a translated request with
plan.Resolution == nil (i.e., ensureTranslatedRequestPlan has not run yet), skip
computing the key and skip cache reads/writes—simply call next(c) and return;
apply the same guard to the second occurrence of this logic (the block around
the other hashRequest use).
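The guard itself reduces to one predicate. A minimal sketch under the review's stated condition (translated mode with a nil Resolution means ensureTranslatedRequestPlan has not run yet); the type shapes are stand-ins for internal/core:

```go
package main

import "fmt"

// Stand-ins for core.ExecutionPlan; the real types live in internal/core.
type Resolution struct{ ResolvedQualifiedModel string }
type ExecutionPlan struct {
	Mode       string
	Resolution *Resolution
}

const ExecutionModeTranslated = "translated"

// shouldBypassCache implements the suggested guard: a translated plan whose
// Resolution has not been filled in yet would hash to an incomplete key and
// poison cache entries, so the middleware should skip reads and writes and
// just call next(c).
func shouldBypassCache(plan *ExecutionPlan) bool {
	return plan != nil && plan.Mode == ExecutionModeTranslated && plan.Resolution == nil
}

func main() {
	partial := &ExecutionPlan{Mode: ExecutionModeTranslated}
	complete := &ExecutionPlan{Mode: ExecutionModeTranslated, Resolution: &Resolution{ResolvedQualifiedModel: "openai/gpt-4o"}}
	fmt.Println(shouldBypassCache(partial), shouldBypassCache(complete))
}
```

Keeping the predicate in one function also makes it trivial to apply at both hashRequest call sites, as the prompt requires.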

In `@internal/server/model_validation_test.go`:
- Around line 237-241: The test is instantiating ModelValidation(provider) but
the server registers ExecutionPlanning(provider); update the test to construct
the same middleware used by the server by replacing ModelValidation(provider)
with ExecutionPlanning(provider) so the test exercises the actual entrypoint;
ensure the created middleware variable (currently named middleware) and the
wrapped handler that captures core.GetExecutionPlan(...) remain unchanged so the
test semantics are preserved while using ExecutionPlanning.

---

Outside diff comments:
In `@internal/server/handlers.go`:
- Around line 571-590: Extract the duplicated logic that derives providerType
and model from the ExecutionPlan into a single helper (e.g.,
ResolveProviderAndModelFromPlan) and use it from the handlers that currently
reapply plan overrides (the providerType assignment using GetProviderType(c) and
the usageModel assignment using plan.Resolution.ResolvedSelector.Model); the
helper should accept the context, Echo context or request, the plan and the
ChatRequest (to handle stream-specific cloning via
cloneChatRequestForStreamUsage when req.Stream and
shouldEnforceReturningUsageData() apply) and return the final providerType and
model (and optionally the modified request for streaming). Replace the inline
overrides in the current handler block and the other two occurrences with calls
to this new helper so all paths use
ensureTranslatedRequestPlan/execution_plan_helpers.go as the single source of
truth.

In `@internal/server/request_model_resolution.go`:
- Around line 57-68: When you clone and replace the ExecutionPlan in context
(core.GetExecutionPlan / core.WithExecutionPlan) and update whitebox hints
(core.GetWhiteBoxPrompt), also update the in-flight live audit entry so streamed
handlers see the resolved model/provider: after setting cloned.ProviderType and
cloned.Resolution, look up the active live audit entry (use the auditlog
middleware's live-entry accessor used by streaming handlers) and set its
Provider and Model from resolution.ResolvedSelector.Provider and
resolution.ResolvedSelector.Model (no-op if entry missing), ensuring the updated
entry is stored/visible to downstream streaming logging the same request
context.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: c237187c-4dad-4a68-ba8d-0c21c4b3fc52

📥 Commits

Reviewing files that changed from the base of the PR and between c81dd2a and 45d3b3f.

📒 Files selected for processing (14)
  • internal/auditlog/auditlog_test.go
  • internal/auditlog/middleware.go
  • internal/core/context.go
  • internal/core/execution_plan.go
  • internal/responsecache/middleware_test.go
  • internal/responsecache/simple.go
  • internal/server/execution_plan_helpers.go
  • internal/server/execution_plan_helpers_test.go
  • internal/server/handlers.go
  • internal/server/handlers_test.go
  • internal/server/http.go
  • internal/server/model_validation.go
  • internal/server/model_validation_test.go
  • internal/server/request_model_resolution.go

@coderabbitai (bot) left a review comment

Actionable comments posted: 14

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
internal/core/batch_semantic.go (1)

15-15: 🧹 Nitpick | 🔵 Trivial

Consider typing Operation field for consistency.

The DecodedBatchItemRequest.Operation field remains a string while the rest of the codebase now uses the typed Operation. This requires explicit string() casts at lines 155 and 188. Consider changing the field type to Operation to maintain type consistency across the batch semantic layer.

♻️ Optional: Type the Operation field
 // DecodedBatchItemRequest is the canonical decode result for known JSON batch subrequests.
 type DecodedBatchItemRequest struct {
 	Endpoint  string
 	Method    string
-	Operation string
+	Operation Operation
 	Request   any
 }

This would require updating canonicalOperationCodecFor to accept Operation and adjusting the error message formatting at line 197.

Also applies to: 155-155, 188-188

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/core/batch_semantic.go` at line 15, Change
DecodedBatchItemRequest.Operation from string to the typed Operation and update
any call sites that currently cast it with string() (these are the spots around
where items are decoded and used). Update canonicalOperationCodecFor to accept
an Operation instead of string and adjust its usage and tests accordingly. Also
update the error formatting that builds messages from the operation (previously
using string()) so it formats the Operation value directly. Ensure all places
constructing/reading DecodedBatchItemRequest are updated to use the Operation
type.
internal/server/handlers.go (1)

292-307: ⚠️ Potential issue | 🟡 Minor

Document the streaming variant of /v1/responses.

translatedInferenceService.Responses now emits SSE when stream=true, but this Swagger block still advertises JSON only. Generated clients and API docs will miss text/event-stream and the streaming success description.

Suggested fix
 // `@Accept`       json
 // `@Produce`      json
+// `@Produce`      text/event-stream
 // `@Security`     BearerAuth
 // `@Param`        request  body      core.ResponsesRequest  true  "Responses API request"
-// `@Success`      200      {object}  core.ResponsesResponse
+// `@Success`      200      {object}  core.ResponsesResponse  "JSON response or SSE stream when stream=true"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/handlers.go` around lines 292 - 307, Update the Swagger
comment above the Handler.Responses function to document the streaming variant:
add "@Produce text/event-stream" alongside the existing "@Produce json", and add
a streaming success annotation (e.g., an additional "@Success 200 {string}
string \"SSE stream (text/event-stream) when stream=true\"") or otherwise
clarify the 200 response for streaming. Reference the Handler.Responses handler
and the translatedInferenceService.Responses behavior (stream=true) so API docs
and generated clients include "text/event-stream" and the streaming success
description for /v1/responses.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/aliases/batch_preparer.go`:
- Around line 29-30: The rewrite currently allows an alias inside a native batch
to resolve to a different provider and only fails at submission; in
PrepareBatchRequest (which calls rewriteAliasBatchSource) ensure the resolved
selector/provider for every alias is validated against the enclosing batch's
providerType and reject the batch with a clear error if any resolved provider
does not match providerType. Update rewriteAliasBatchSource (and any sibling
alias-rewrite paths used for single requests) to accept the enclosing
providerType or context and compare it to the resolved selector/provider before
constructing the provider-specific request, returning an error on mismatch so
mixed-provider alias resolutions inside a native batch are rejected early.

In `@internal/core/semantic.go`:
- Around line 265-273: The code retrieves a cached pointer via
CachedPassthroughRouteInfo(), then mutates its Provider/Model and re-caches it
which breaks the immutability contract for PassthroughRouteInfo; instead, make a
shallow copy of the returned PassthroughRouteInfo (e.g., new := *passthrough)
and perform any assignments on that copy before calling
CachePassthroughRouteInfo(env, &new), referencing CachedPassthroughRouteInfo,
PassthroughRouteInfo, selectors.Provider, selectors.Model and
CachePassthroughRouteInfo to locate the spot to change.
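The copy-on-write fix amounts to cloning before mutating. A self-contained sketch (the field set and the map-backed cache are assumptions standing in for CachedPassthroughRouteInfo / CachePassthroughRouteInfo):

```go
package main

import "fmt"

// PassthroughRouteInfo stands in for the core type; fields are assumptions.
type PassthroughRouteInfo struct{ Provider, Model string }

var cache = map[string]*PassthroughRouteInfo{}

// enrichCached shows the fix: never mutate the pointer returned from the
// cache; make a shallow copy, adjust the copy, and cache the copy.
func enrichCached(key, provider, model string) *PassthroughRouteInfo {
	cached := cache[key]
	if cached == nil {
		cached = &PassthroughRouteInfo{}
	}
	clone := *cached // shallow copy preserves the cached value's immutability
	clone.Provider = provider
	clone.Model = model
	cache[key] = &clone
	return &clone
}

func main() {
	original := &PassthroughRouteInfo{Provider: "openai", Model: "gpt-4o-mini"}
	cache["req"] = original
	enrichCached("req", "anthropic", "claude-sonnet")
	fmt.Println(original.Provider) // the original pointer was never mutated
}
```

Any other goroutine or handler still holding the old pointer keeps seeing a consistent value, which is the immutability contract the comment says the current code breaks.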

In `@internal/guardrails/executor.go`:
- Around line 105-115: processGuardedChat currently calls chatToMessages before
checking whether any guardrail pipeline is present, which can cause unnecessary
decoding and false rejections (e.g., via NewRequestPatcher(nil)) when pipeline
is nil or empty; move the nil/empty check for pipeline (the same check used in
processGuardedResponses) to run before chatToMessages so the function returns
the original req unchanged when pipeline == nil or pipeline.Len() == 0, and only
call chatToMessages and subsequent patching when a non-empty pipeline is
present.

In `@internal/guardrails/provider.go`:
- Around line 26-29: PrepareBatchRequest currently always calls
processGuardedBatchRequest which causes inconsistent behavior versus the direct
CreateBatch* paths that respect EnableForBatchProcessing; update
PrepareBatchRequest to check the EnableForBatchProcessing flag and skip calling
processGuardedBatchRequest when the flag is false so it matches the CreateBatch*
behavior. Specifically, in PrepareBatchRequest (and any other entry points that
currently invoke processGuardedBatchRequest), add a guard: if
EnableForBatchProcessing is true then call processGuardedBatchRequest, else
bypass guardrail rewriting and proceed as the CreateBatch* paths do; ensure the
same conditional is applied to all locations that previously unconditionally
invoked processGuardedBatchRequest (including the other call sites roughly
corresponding to the blocks around 353-399 and 583-587) so behavior is
consistent across PrepareBatchRequest, CreateBatchRequest/CreateBatchResponses,
and related methods.

In `@internal/providers/anthropic/passthrough_semantics.go`:
- Around line 28-31: The fallback audit path currently uses info.RawEndpoint
which can produce non-canonical audit paths; instead build enriched.AuditPath
from the normalized upstream endpoint used elsewhere in this enricher (e.g., the
variable holding the canonical/normalized endpoint value), trimming leading
slashes and joining with "/p/anthropic/". Replace the use of info.RawEndpoint
with that normalized endpoint variable so `/p/anthropic/v1/foo` and
`/p/anthropic/foo` collapse to the same audit path.
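The normalization both this comment and the openai one below describe is a small string function. A sketch under stated assumptions: the `/p/<provider>/` prefix comes from the paths quoted in the review, while treating the bare form as a v1 alias is an assumption about what "normalized upstream endpoint" means here.

```go
package main

import (
	"fmt"
	"strings"
)

// passthroughAuditPath builds the audit key from a canonical endpoint so
// "/v1/foo" and the aliased "foo" collapse to the same audit path, instead
// of diverging the way raw info.RawEndpoint values do.
func passthroughAuditPath(provider, endpoint string) string {
	normalized := strings.Trim(endpoint, "/")
	if !strings.HasPrefix(normalized, "v1/") {
		normalized = "v1/" + normalized // assumption: bare form is the v1 alias
	}
	return "/p/" + provider + "/" + normalized
}

func main() {
	fmt.Println(passthroughAuditPath("anthropic", "/v1/foo"))
	fmt.Println(passthroughAuditPath("anthropic", "foo"))
}
```

Routing the fallback branch through one function like this is what guarantees the two spellings cannot drift apart again.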

In `@internal/providers/openai/passthrough_semantics.go`:
- Around line 31-34: The fallback AuditPath construction in the default branch
currently uses info.RawEndpoint which causes divergences; change it to build
enriched.AuditPath from the normalized upstream endpoint (use the normalized
endpoint field on info instead of info.RawEndpoint) so both alias-normalized
routes map to the same audit key—i.e., replace the
strings.Trim...(info.RawEndpoint) usage when setting enriched.AuditPath with the
normalized endpoint value.

In `@internal/server/native_batch_service.go`:
- Around line 79-95: On any failure after calling CreateBatch (including the
upstream==nil path, missing providerBatchID path, and the batchStore.Create
failure path), perform a best-effort rollback: if
s.cleanupPreparedBatchInputFile != nil call s.cleanupPreparedBatchInputFile(ctx,
providerType, batchPreparation.RewrittenInputFileID) to delete the rewritten
input file, clear any hint/state that points to the prepared batch (invoke your
existing hint-clear method or add one and call it for the same
RewrittenInputFileID), and if an upstream ID was already returned (upstream.ID
or upstream.ProviderBatchID != ""), attempt to cancel the upstream batch via the
provider client (add a helper like s.cancelUpstreamBatch(ctx, providerType,
upstream.ID) that calls the provider cancel API and ignores errors). Apply this
rollback logic both around the upstream==nil/providerBatchID checks and where
batchStore.Create fails after receiving the upstream response.

In `@internal/server/native_batch_support.go`:
- Around line 561-572: The current isNativeBatchResultsPending function (used
when handling gateway errors) treats any 404 from providers listed in
batchResultsPending404Providers as "results pending"; change this to avoid false
positives by either checking a provider-specific error code/message on
gatewayErr (e.g., examine gatewayErr.Code() or gatewayErr.Message() for the
non-terminal indicator for that provider) before returning true, or perform a
follow-up GetBatch call for the given batch ID and only return true if that call
reports an active/non-terminal state; update isNativeBatchResultsPending to
reference gatewayErr.Provider and gatewayErr.Code()/Message() (or to call
GetBatch) so real upstream 404/expired cases are not considered pending.

In `@internal/server/native_file_service.go`:
- Around line 137-140: The code uses io.ReadAll(file) in the native file upload
path which reads the entire file into memory; replace this with a streaming
approach so large uploads don't exhaust memory — for example, change the logic
around the variable file in the same handler to stream the file to the
downstream provider or to a temporary file using io.Copy (or pass the io.Reader
directly) and return errors via handleError/core.NewInvalidRequestError as
before; ensure any downstream call that previously accepted content (the content
variable) is updated to accept an io.Reader or the temp file reference so you no
longer allocate the whole file in memory.

In `@internal/server/passthrough_semantic_enrichment.go`:
- Around line 38-40: The code swallows normalization errors from
normalizePassthroughEndpoint and continues to set info.NormalizedEndpoint,
causing invalid passthrough endpoints to be enriched/cached; change the logic in
passthrough_semantic_enrichment.go so that when
normalizePassthroughEndpoint(info.RawEndpoint, allowPassthroughV1Alias) returns
a non-nil error you return early (or at minimum skip all enrichment and caching
for that request) instead of proceeding—use the returned error to decide not to
set info.NormalizedEndpoint and to abort further processing of that info object.

In `@internal/server/passthrough_service.go`:
- Around line 33-40: The code currently prefers the client X-Request-ID header
over the existing context value; instead call the existing helper
requestIDFromContextOrHeader to obtain the authoritative request ID and updated
context, then set that context back onto the incoming request so downstream code
sees the server-assigned ID (use c.SetRequest(c.Request().WithContext(ctx))
after you obtain ctx and requestID). Ensure you still call core.WithRequestID or
otherwise preserve the request ID in context as the helper expects.

In `@internal/server/passthrough_support.go`:
- Around line 224-261: The proxyPassthroughResponse function currently forwards
provider responses verbatim; modify proxyPassthroughResponse to check
resp.StatusCode and, for any non-success status (>=400), do not write the
provider headers/body but instead construct a core.GatewayError (with
OpenAI-compatible shape: error.type, error.message, error.param=null,
error.code=null and a typed client-facing category) and return it via
handleError so clients always receive a normalized GatewayError; keep existing
streaming path (isSSEContentType, audit log wrapping) intact but ensure
streaming errors still map to core.GatewayError on non-success; use the existing
symbols proxyPassthroughResponse, handleError, core.GatewayError,
copyPassthroughResponseHeaders, isSSEContentType and passthroughAuditPath to
locate where to add the status check and normalization.

In `@internal/server/request_model_resolution.go`:
- Line 38: The parameter named resolver is being shadowed by the short-variable
declaration in the if statement; change the inner variable name to avoid
shadowing (for example, assign effectiveRequestModelResolver(provider, resolver)
to a new variable like resolvedModel or res and use that in the if check) so the
function parameter resolver remains unshadowed; update any subsequent uses
inside the block to reference the new variable. Reference: the function
parameter resolver and the call effectiveRequestModelResolver(provider,
resolver).

In `@internal/server/request_support.go`:
- Around line 36-37: Guard against a nil Header on the incoming *http.Request
before calling req.Header.Set("X-Request-ID", requestID) by checking if
req.Header == nil and initializing it (req.Header = make(http.Header)) when nil;
update the request_support.go location around the req.Header.Set call to perform
this defensive initialization so future call sites that omit headers won’t
panic.

---

Outside diff comments:
In `@internal/core/batch_semantic.go`:
- Line 15: Change DecodedBatchItemRequest.Operation from string to the typed
Operation and update any call sites that currently cast it with string() (these
are the spots around where items are decoded and used). Update
canonicalOperationCodecFor to accept an Operation instead of string and adjust
its usage and tests accordingly. Also update the error formatting that builds
messages from the operation (previously using string()) so it formats the
Operation value directly. Ensure all places constructing/reading
DecodedBatchItemRequest are updated to use the Operation type.
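A minimal sketch of the typed-Operation change, under stated assumptions: the `Operation` constants, the codec return value, and the struct fields here are placeholders, since the real definitions in `internal/core` are not shown in this review.

```go
package main

import "fmt"

// Operation is a hypothetical stand-in for the typed operation in
// internal/core.
type Operation string

const OperationChatCompletions Operation = "chat.completions"

// DecodedBatchItemRequest carries the typed Operation, not a raw
// string, per the suggestion.
type DecodedBatchItemRequest struct {
	CustomID  string
	Operation Operation
}

// canonicalOperationCodecFor accepts the typed Operation directly, so
// call sites no longer cast with string(). The %s verb formats the
// Operation value without an explicit conversion.
func canonicalOperationCodecFor(op Operation) (string, error) {
	switch op {
	case OperationChatCompletions:
		return "chat-codec", nil
	default:
		return "", fmt.Errorf("unsupported batch operation: %s", op)
	}
}

func main() {
	item := DecodedBatchItemRequest{CustomID: "a", Operation: OperationChatCompletions}
	codec, _ := canonicalOperationCodecFor(item.Operation)
	fmt.Println(codec)
}
```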

In `@internal/server/handlers.go`:
- Around line 292-307: Update the Swagger comment above the Handler.Responses
function to document the streaming variant: add "@Produce text/event-stream"
alongside the existing "@Produce json", and add a streaming success annotation
(e.g., an additional "@Success 200 {string} string \"SSE stream
(text/event-stream) when stream=true\"") or otherwise clarify the 200 response
for streaming. Reference the Handler.Responses handler and the
translatedInferenceService.Responses behavior (stream=true) so API docs and
generated clients include "text/event-stream" and the streaming success
description for /v1/responses.
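The annotation block could look like the fragment below (swaggo-style doc comments). The success-response type name is an assumption; only the `@Produce text/event-stream` line and the second `@Success 200` annotation come from the comment above.

```go
// @Summary  Create a model response
// @Produce  json
// @Produce  text/event-stream
// @Success  200 {object} core.ResponsesResult "JSON body when stream=false"
// @Success  200 {string} string "SSE stream (text/event-stream) when stream=true"
// @Router   /v1/responses [post]
```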

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 5cde7c9f-3e5d-4735-8f3c-3c369260fdf4

📥 Commits

Reviewing files that changed from the base of the PR and between 45d3b3f and 0fde89c.

📒 Files selected for processing (54)
  • internal/aliases/batch_preparer.go
  • internal/aliases/batch_preparer_test.go
  • internal/aliases/provider.go
  • internal/aliases/provider_test.go
  • internal/aliases/service.go
  • internal/app/app.go
  • internal/core/batch_semantic.go
  • internal/core/endpoints.go
  • internal/core/endpoints_test.go
  • internal/core/execution_plan.go
  • internal/core/passthrough.go
  • internal/core/semantic.go
  • internal/core/semantic_canonical.go
  • internal/core/semantic_canonical_test.go
  • internal/core/semantic_test.go
  • internal/guardrails/executor.go
  • internal/guardrails/provider.go
  • internal/providers/anthropic/anthropic.go
  • internal/providers/anthropic/passthrough_semantics.go
  • internal/providers/anthropic/passthrough_semantics_test.go
  • internal/providers/factory.go
  • internal/providers/factory_test.go
  • internal/providers/openai/openai.go
  • internal/providers/openai/passthrough_semantics.go
  • internal/providers/openai/passthrough_semantics_test.go
  • internal/providers/passthrough.go
  • internal/responsecache/middleware_test.go
  • internal/responsecache/simple.go
  • internal/server/batch_request_preparer.go
  • internal/server/batch_request_preparer_test.go
  • internal/server/error_support.go
  • internal/server/execution_plan_helpers.go
  • internal/server/execution_plan_helpers_test.go
  • internal/server/exposed_model_lister.go
  • internal/server/handlers.go
  • internal/server/handlers_test.go
  • internal/server/http.go
  • internal/server/model_validation.go
  • internal/server/model_validation_test.go
  • internal/server/native_batch_service.go
  • internal/server/native_batch_support.go
  • internal/server/native_file_service.go
  • internal/server/passthrough_execution_helpers.go
  • internal/server/passthrough_execution_helpers_test.go
  • internal/server/passthrough_semantic_enrichment.go
  • internal/server/passthrough_semantic_enrichment_test.go
  • internal/server/passthrough_service.go
  • internal/server/passthrough_support.go
  • internal/server/request_model_resolution.go
  • internal/server/request_snapshot_test.go
  • internal/server/request_support.go
  • internal/server/stream_support.go
  • internal/server/translated_inference_service.go
  • internal/server/translated_request_patcher.go


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/providers/file_adapter_openai_compat.go (1)

81-114: ⚠️ Potential issue | 🔴 Critical

Close the pipe reader when the request returns.

This upload path creates a writer goroutine that only stops once pr is closed or fully drained. The current RawBodyReader client path can return early on failure without draining that reader, so the goroutine can stay blocked in io.Copy/writer.Close and leak.

♻️ Proposed fix
 	pr, pw := io.Pipe()
+	defer func() {
+		_ = pr.Close()
+	}()
 	writer := multipart.NewWriter(pw)
@@
 	}, &fileObj); err != nil {
+		_ = pr.CloseWithError(err)
 		return nil, err
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/providers/file_adapter_openai_compat.go` around lines 81 - 114, The
upload goroutine can block if the HTTP request returns early because the pipe
reader (pr) is never closed; to fix, ensure pr is closed when the request path
exits by adding a defer to close pr after starting the writer goroutine but
before calling client.Do. Specifically, after spawning the anonymous goroutine
that writes to pw (and before invoking client.Do with RawBodyReader: pr), add a
defer that calls pr.Close() (ignore the error) so the writer goroutine is
unblocked on early client.Do returns and cannot leak; keep existing
pw.CloseWithError usage in the goroutine as-is.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/aliases/provider.go`:
- Around line 310-314: PrepareBatchRequest currently always rewrites batches via
rewriteAliasBatchSource, bypassing the DisableNativeBatchPreparation option;
update Provider.PrepareBatchRequest to check the Provider option
(DisableNativeBatchPreparation) and short-circuit (return nil, nil or
appropriate no-op/forward result) when the option is set so behavior matches
CreateBatch/CreateBatchWithHints, or alternatively prevent exposing this method
when that flag is enabled; locate the Provider struct and its
DisableNativeBatchPreparation flag and modify PrepareBatchRequest to honor that
flag before calling rewriteAliasBatchSource.

In `@internal/guardrails/executor.go`:
- Around line 62-102: processGuardedBatchRequest should short-circuit when
pipeline is nil or empty to match processGuardedChat/processGuardedResponses:
before calling core.RewriteBatchSource, check the pipeline (e.g., pipeline ==
nil || len(pipeline.Stages) == 0 or the same emptiness test used by
processGuardedChat/processGuardedResponses) and if empty return the original req
as an unmodified *core.BatchRewriteResult (or the same "no-op" result used
elsewhere) with nil error; otherwise proceed to call core.RewriteBatchSource as
currently implemented. Ensure you reference processGuardedBatchRequest,
processGuardedChat, processGuardedResponses and core.RewriteBatchSource so the
change mirrors the existing fast-path behavior.
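The empty-pipeline fast path can be sketched as follows; the struct shapes and the rewrite body are placeholders, since only the control flow of the suggested short-circuit is being illustrated.

```go
package main

import "fmt"

// Hypothetical shapes standing in for the guardrails types.
type Stage struct{ Name string }
type Pipeline struct{ Stages []Stage }
type BatchRewriteResult struct {
	Rewritten bool
	Body      string
}

func rewriteBatchSource(body string, p *Pipeline) (*BatchRewriteResult, error) {
	return &BatchRewriteResult{Rewritten: true, Body: "guarded:" + body}, nil
}

// processGuardedBatchRequest returns the request unmodified when the
// pipeline is nil or has no stages, matching the fast path of
// processGuardedChat/processGuardedResponses described above.
func processGuardedBatchRequest(body string, p *Pipeline) (*BatchRewriteResult, error) {
	if p == nil || len(p.Stages) == 0 {
		return &BatchRewriteResult{Rewritten: false, Body: body}, nil
	}
	return rewriteBatchSource(body, p)
}

func main() {
	res, _ := processGuardedBatchRequest("req", nil)
	fmt.Println(res.Rewritten, res.Body)
}
```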

In `@internal/server/native_batch_service.go`:
- Around line 177-214: The GetBatch handler uses requestContextWithRequestID to
derive ctx but then calls s.batchStore.Update with c.Request().Context(),
causing inconsistent context propagation; change the Update call to use the
previously derived ctx (from requestContextWithRequestID) so any request-scoped
values/tracing are preserved, i.e., replace c.Request().Context() with ctx when
invoking s.batchStore.Update inside GetBatch (and ensure any other calls in
GetBatch that use c.Request().Context() instead use ctx as well).
- Around line 274-311: The CancelBatch handler extracts ctx via
requestContextWithRequestID but then calls s.batchStore.Update with
c.Request().Context(), causing inconsistent request contexts; change the Update
call (and any other uses of c.Request().Context() in CancelBatch) to use the
previously created ctx so the same request-scoped context
(requestContextWithRequestID) is propagated through
nativeBatchService.CancelBatch, including the call to s.batchStore.Update and
any cleanup calls like cleanupStoredBatchRewrittenInputFile.

In `@internal/server/native_batch_support.go`:
- Around line 134-246: The logBatchUsageFromBatchResults function is doing too
many things; extract smaller helpers to improve readability and testability by
splitting validation/early returns, per-item processing, and totals
accumulation. Create a helper like processResultItem(resultItem, storedBatch,
fallbackRequestID) (or processBatchResultItem) that encapsulates the payload
parsing (asJSONMap), usage extraction (extractTokenTotals),
provider/model/providerID resolution, raw usage building
(buildBatchUsageRawData), pricing lookup (h.pricingResolver.ResolvePricing),
usage entry creation (usage.ExtractFromSSEUsage) and setting entry.ID via
deterministicBatchUsageID, returning the entry plus per-item token/cost values;
create another helper accumulateUsageTotals(aggregate, entry, inputTokens,
outputTokens, usageTotal) to update loggedEntries, inputTotal, outputTotal,
totalTokens and cost totals and hasAnyCost; keep the top-level
logBatchUsageFromBatchResults to handle initial validation, loop over
result.Data calling processResultItem and accumulateUsageTotals, write entries
with h.usageLogger.Write, and finally mutate stored.RequestID,
stored.UsageLoggedAt and stored.Batch.Usage from the accumulated totals.

In `@internal/server/native_file_service.go`:
- Around line 70-76: The audit entry currently clears provider attribution via
auditlog.EnrichEntry(c, "file", "") and then returns on the first successful
candidate without restoring the provider; update the loop so that before
returning on success you set the provider attribution to the chosen candidate
(e.g., call auditlog.EnrichEntry(c, "file", candidate) or whichever key/value is
used to record provider) so GetFile/DeleteFile/GetFileContent are audited with
the resolved provider, and ensure any previous value is restored or only cleared
where intended; modify the success path inside the providers loop around
callFn/respondFn to set the provider attribution before calling respondFn.
- Around line 201-224: The merged listing incorrectly queries each provider with
after="" which hides items when a client supplies an older cursor; update the
providers loop to forward the request's after cursor to nativeRouter.ListFiles
(use the existing after variable instead of the empty string) so each backend
pages from the same cursor, continue fetching limit+1 per provider to detect
extra item, and keep the post-merge applyAfterCursor call to trim/normalize
results and handle deduping; reference nativeRouter.ListFiles, applyAfterCursor,
providers, limit+1 and after when making the change.

---

Outside diff comments:
In `@internal/providers/file_adapter_openai_compat.go`:
- Around line 81-114: The upload goroutine can block if the HTTP request returns
early because the pipe reader (pr) is never closed; to fix, ensure pr is closed
when the request path exits by adding a defer to close pr after starting the
writer goroutine but before calling client.Do. Specifically, after spawning the
anonymous goroutine that writes to pw (and before invoking client.Do with
RawBodyReader: pr), add a defer that calls pr.Close() (ignore the error) so the
writer goroutine is unblocked on early client.Do returns and cannot leak; keep
existing pw.CloseWithError usage in the goroutine as-is.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 252a6dfd-ed37-49c4-bb57-a642b284bd06

📥 Commits

Reviewing files that changed from the base of the PR and between 0fde89c and d8208bc.

📒 Files selected for processing (30)
  • internal/aliases/batch_preparer.go
  • internal/aliases/batch_preparer_test.go
  • internal/aliases/provider.go
  • internal/aliases/provider_test.go
  • internal/core/batch_preparation.go
  • internal/core/batch_semantic.go
  • internal/core/batch_semantic_test.go
  • internal/core/files.go
  • internal/core/semantic.go
  • internal/core/semantic_canonical.go
  • internal/guardrails/executor.go
  • internal/guardrails/provider.go
  • internal/guardrails/provider_test.go
  • internal/providers/anthropic/anthropic_test.go
  • internal/providers/anthropic/passthrough_semantics.go
  • internal/providers/anthropic/passthrough_semantics_test.go
  • internal/providers/file_adapter_openai_compat.go
  • internal/providers/openai/passthrough_semantics.go
  • internal/providers/openai/passthrough_semantics_test.go
  • internal/providers/router_test.go
  • internal/server/handlers.go
  • internal/server/handlers_test.go
  • internal/server/native_batch_service.go
  • internal/server/native_batch_support.go
  • internal/server/native_file_service.go
  • internal/server/passthrough_semantic_enrichment.go
  • internal/server/passthrough_service.go
  • internal/server/passthrough_support.go
  • internal/server/request_model_resolution.go
  • internal/server/request_support.go

Comment on lines +177 to +214
func (s *nativeBatchService) GetBatch(c *echo.Context) error {
ctx, _ := requestContextWithRequestID(c.Request())

id, err := batchIDFromRequest(c)
if err != nil {
return handleError(c, err)
}

stored, err := s.requireStoredBatch(c, id)
if err != nil {
return handleError(c, err)
}

nativeRouter, ok := s.provider.(core.NativeBatchRoutableProvider)
if !ok || stored.Batch.Provider == "" || stored.Batch.ProviderBatchID == "" {
return c.JSON(http.StatusOK, stored.Batch)
}

latest, err := nativeRouter.GetBatch(ctx, stored.Batch.Provider, stored.Batch.ProviderBatchID)
if err != nil {
return handleError(c, err)
}
updated := false
if latest != nil {
mergeStoredBatchFromUpstream(stored, latest)
updated = true
}
if isTerminalBatchStatus(stored.Batch.Status) && s.cleanupStoredBatchRewrittenInputFile != nil && s.cleanupStoredBatchRewrittenInputFile(ctx, stored) {
updated = true
}
if updated && s.batchStore != nil {
if err := s.batchStore.Update(c.Request().Context(), stored); err != nil && !errors.Is(err, batchstore.ErrNotFound) {
return handleError(c, core.NewProviderError("batch_store", http.StatusInternalServerError, "failed to persist refreshed batch", err))
}
}

return c.JSON(http.StatusOK, stored.Batch)
}

🧹 Nitpick | 🔵 Trivial

Inconsistent context usage in GetBatch.

Line 178 extracts ctx from requestContextWithRequestID, but line 208 uses c.Request().Context() instead. This inconsistency could cause issues if requestContextWithRequestID adds important context values (like request ID or tracing).

♻️ Suggested fix for context consistency
 	if updated && s.batchStore != nil {
-		if err := s.batchStore.Update(c.Request().Context(), stored); err != nil && !errors.Is(err, batchstore.ErrNotFound) {
+		if err := s.batchStore.Update(ctx, stored); err != nil && !errors.Is(err, batchstore.ErrNotFound) {
 			return handleError(c, core.NewProviderError("batch_store", http.StatusInternalServerError, "failed to persist refreshed batch", err))
 		}
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (s *nativeBatchService) GetBatch(c *echo.Context) error {
ctx, _ := requestContextWithRequestID(c.Request())
id, err := batchIDFromRequest(c)
if err != nil {
return handleError(c, err)
}
stored, err := s.requireStoredBatch(c, id)
if err != nil {
return handleError(c, err)
}
nativeRouter, ok := s.provider.(core.NativeBatchRoutableProvider)
if !ok || stored.Batch.Provider == "" || stored.Batch.ProviderBatchID == "" {
return c.JSON(http.StatusOK, stored.Batch)
}
latest, err := nativeRouter.GetBatch(ctx, stored.Batch.Provider, stored.Batch.ProviderBatchID)
if err != nil {
return handleError(c, err)
}
updated := false
if latest != nil {
mergeStoredBatchFromUpstream(stored, latest)
updated = true
}
if isTerminalBatchStatus(stored.Batch.Status) && s.cleanupStoredBatchRewrittenInputFile != nil && s.cleanupStoredBatchRewrittenInputFile(ctx, stored) {
updated = true
}
if updated && s.batchStore != nil {
if err := s.batchStore.Update(ctx, stored); err != nil && !errors.Is(err, batchstore.ErrNotFound) {
return handleError(c, core.NewProviderError("batch_store", http.StatusInternalServerError, "failed to persist refreshed batch", err))
}
}
return c.JSON(http.StatusOK, stored.Batch)
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/native_batch_service.go` around lines 177 - 214, The GetBatch
handler uses requestContextWithRequestID to derive ctx but then calls
s.batchStore.Update with c.Request().Context(), causing inconsistent context
propagation; change the Update call to use the previously derived ctx (from
requestContextWithRequestID) so any request-scoped values/tracing are preserved,
i.e., replace c.Request().Context() with ctx when invoking s.batchStore.Update
inside GetBatch (and ensure any other calls in GetBatch that use
c.Request().Context() instead use ctx as well).

Comment on lines +274 to +311
func (s *nativeBatchService) CancelBatch(c *echo.Context) error {
ctx, _ := requestContextWithRequestID(c.Request())

id, err := batchIDFromRequest(c)
if err != nil {
return handleError(c, err)
}

stored, err := s.requireStoredBatch(c, id)
if err != nil {
return handleError(c, err)
}

nativeRouter, ok := s.provider.(core.NativeBatchRoutableProvider)
if !ok || stored.Batch.Provider == "" || stored.Batch.ProviderBatchID == "" {
return handleError(c, core.NewInvalidRequestError("native batch cancellation is not available", nil))
}

latest, err := nativeRouter.CancelBatch(ctx, stored.Batch.Provider, stored.Batch.ProviderBatchID)
if err != nil {
return handleError(c, err)
}
if latest != nil {
mergeStoredBatchFromUpstream(stored, latest)
}
if isTerminalBatchStatus(stored.Batch.Status) && s.cleanupStoredBatchRewrittenInputFile != nil {
s.cleanupStoredBatchRewrittenInputFile(ctx, stored)
}

if err := s.batchStore.Update(c.Request().Context(), stored); err != nil {
if errors.Is(err, batchstore.ErrNotFound) {
return handleError(c, core.NewNotFoundError("batch not found: "+id))
}
return handleError(c, core.NewProviderError("batch_store", http.StatusInternalServerError, "failed to cancel batch", err))
}

return c.JSON(http.StatusOK, stored.Batch)
}

🧹 Nitpick | 🔵 Trivial

Same context inconsistency in CancelBatch.

Line 275 extracts ctx but line 303 uses c.Request().Context().

♻️ Suggested fix
-	if err := s.batchStore.Update(c.Request().Context(), stored); err != nil {
+	if err := s.batchStore.Update(ctx, stored); err != nil {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/native_batch_service.go` around lines 274 - 311, The
CancelBatch handler extracts ctx via requestContextWithRequestID but then calls
s.batchStore.Update with c.Request().Context(), causing inconsistent request
contexts; change the Update call (and any other uses of c.Request().Context() in
CancelBatch) to use the previously created ctx so the same request-scoped
context (requestContextWithRequestID) is propagated through
nativeBatchService.CancelBatch, including the call to s.batchStore.Update and
any cleanup calls like cleanupStoredBatchRewrittenInputFile.

Comment on lines +134 to +246
func (h *Handler) logBatchUsageFromBatchResults(stored *batchstore.StoredBatch, result *core.BatchResultsResponse, fallbackRequestID string) bool {
if h.usageLogger == nil || !h.usageLogger.Config().Enabled || stored == nil || stored.Batch == nil || result == nil || len(result.Data) == 0 {
return false
}
if stored.UsageLoggedAt != nil {
return false
}

requestID := strings.TrimSpace(stored.RequestID)
if requestID == "" {
requestID = strings.TrimSpace(fallbackRequestID)
}
if requestID == "" {
requestID = "batch:" + stored.Batch.ID
}

loggedEntries := 0
inputTotal := 0
outputTotal := 0
totalTokens := 0
var inputCostTotal float64
var outputCostTotal float64
var totalCostTotal float64
hasAnyCost := false

for _, item := range result.Data {
if item.StatusCode < http.StatusOK || item.StatusCode >= http.StatusMultipleChoices {
continue
}

payload, ok := asJSONMap(item.Response)
if !ok {
continue
}
usagePayload, ok := asJSONMap(payload["usage"])
if !ok {
continue
}

inputTokens, outputTokens, usageTotal, hasUsage := extractTokenTotals(usagePayload)
if !hasUsage {
continue
}

provider := firstNonEmpty(item.Provider, stored.Batch.Provider)
model := firstNonEmpty(item.Model, stringFromAny(payload["model"]))
providerID := firstNonEmpty(
stringFromAny(payload["id"]),
item.CustomID,
fmt.Sprintf("%s:%d", firstNonEmpty(stored.Batch.ProviderBatchID, stored.Batch.ID), item.Index),
)
rawUsage := buildBatchUsageRawData(usagePayload, stored.Batch, item)

var pricing *core.ModelPricing
if h.pricingResolver != nil && model != "" {
pricing = h.pricingResolver.ResolvePricing(model, provider)
}

entry := usage.ExtractFromSSEUsage(
providerID,
inputTokens,
outputTokens,
usageTotal,
rawUsage,
requestID,
model,
provider,
"/v1/batches",
pricing,
)
if entry == nil {
continue
}
entry.ID = deterministicBatchUsageID(stored.Batch, item, providerID)

h.usageLogger.Write(entry)
loggedEntries++
inputTotal += inputTokens
outputTotal += outputTokens
totalTokens += usageTotal
if entry.InputCost != nil {
inputCostTotal += *entry.InputCost
hasAnyCost = true
}
if entry.OutputCost != nil {
outputCostTotal += *entry.OutputCost
hasAnyCost = true
}
if entry.TotalCost != nil {
totalCostTotal += *entry.TotalCost
hasAnyCost = true
}
}

if loggedEntries == 0 {
return false
}

now := time.Now().UTC()
stored.RequestID = requestID
stored.UsageLoggedAt = &now

stored.Batch.Usage.InputTokens = inputTotal
stored.Batch.Usage.OutputTokens = outputTotal
stored.Batch.Usage.TotalTokens = totalTokens
if hasAnyCost {
stored.Batch.Usage.InputCost = &inputCostTotal
stored.Batch.Usage.OutputCost = &outputCostTotal
stored.Batch.Usage.TotalCost = &totalCostTotal
}

return true
}

🧹 Nitpick | 🔵 Trivial

Consider extracting sub-functions from logBatchUsageFromBatchResults.

This function has multiple responsibilities: validation, iteration, token extraction, pricing resolution, usage entry creation, and stored batch mutation. While it works correctly, extracting helpers like processResultItem or accumulateUsageTotals would improve readability and testability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/native_batch_support.go` around lines 134 - 246, The
logBatchUsageFromBatchResults function is doing too many things; extract smaller
helpers to improve readability and testability by splitting validation/early
returns, per-item processing, and totals accumulation. Create a helper like
processResultItem(resultItem, storedBatch, fallbackRequestID) (or
processBatchResultItem) that encapsulates the payload parsing (asJSONMap), usage
extraction (extractTokenTotals), provider/model/providerID resolution, raw usage
building (buildBatchUsageRawData), pricing lookup
(h.pricingResolver.ResolvePricing), usage entry creation
(usage.ExtractFromSSEUsage) and setting entry.ID via deterministicBatchUsageID,
returning the entry plus per-item token/cost values; create another helper
accumulateUsageTotals(aggregate, entry, inputTokens, outputTokens, usageTotal)
to update loggedEntries, inputTotal, outputTotal, totalTokens and cost totals
and hasAnyCost; keep the top-level logBatchUsageFromBatchResults to handle
initial validation, loop over result.Data calling processResultItem and
accumulateUsageTotals, write entries with h.usageLogger.Write, and finally
mutate stored.RequestID, stored.UsageLoggedAt and stored.Batch.Usage from the
accumulated totals.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/server/model_validation.go`:
- Around line 73-80: The code stores a passthrough plan while leaving
passthroughRouteInfo().Provider potentially empty because
providerPassthroughType(c) can recover the provider separately; fix by
populating the passthrough object with the resolved provider before assigning it
to plan. Concretely: call passthroughRouteInfo(c) into a local variable, call
providerPassthroughType(c) to get providerType (and the provider identifier),
set passthrough.Provider = <resolved provider> (or otherwise copy the provider
value returned by providerPassthroughType) then set plan.Mode =
core.ExecutionModePassthrough, plan.ProviderType = providerType and finally
plan.Passthrough = passthrough. Apply the same change in the other similar block
that currently repeats this logic.
- Around line 237-245: Update the GoDoc for GetProviderType to reflect that it
reads the provider type from the request's execution plan via
core.GetExecutionPlan and is part of the ExecutionPlanning flow (not
ModelValidation); replace the outdated "set by ModelValidation" phrasing with
wording that says the value is read from the request execution plan
(ExecutionPlanning) so callers understand the source and context of the provider
type.
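The suggested assignment order for the passthrough plan can be sketched like this; the type names and fields are assumptions standing in for the real execution-plan types in `internal/core`, and only the "populate the provider before attaching the route" ordering reflects the comment.

```go
package main

import "fmt"

// Hypothetical shapes for illustration only.
type PassthroughRoute struct {
	Provider string
	Path     string
}

type ExecutionPlan struct {
	Mode         string
	ProviderType string
	Passthrough  *PassthroughRoute
}

// buildPassthroughPlan copies the provider resolved by the (assumed)
// providerPassthroughType lookup onto the route before the route is
// attached to the plan, so plan.Passthrough.Provider is never left
// empty.
func buildPassthroughPlan(route *PassthroughRoute, providerType, provider string) *ExecutionPlan {
	route.Provider = provider
	return &ExecutionPlan{
		Mode:         "passthrough",
		ProviderType: providerType,
		Passthrough:  route,
	}
}

func main() {
	plan := buildPassthroughPlan(&PassthroughRoute{Path: "/v1/audio/speech"}, "openai", "openai")
	fmt.Println(plan.Passthrough.Provider)
}
```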

In `@tests/e2e/release-e2e-scenarios.md`:
- Around line 369-396: The tests are nondeterministic because steps keep
rediscovering resources with list&limit=1; instead, capture and reuse IDs
returned by the create/upload calls (e.g., parse the create responses in S35,
S39, S42 into variables like BATCH_FILE_ID, ASSISTANT_FILE_ID, and whatever ID
S42 returns) and reference those variables in dependent steps (S38, S40, S46,
etc.) rather than re-querying with limit=1; update the create/upload commands to
extract the .id from their JSON responses into shell variables and replace
subsequent curl list/delete calls to use those variables so
deletion/cancellation targets the exact created resource.
- Around line 107-110: The AUDIT_ID extraction using jq -r on the admin audit
endpoint can yield null on a clean gateway causing a later query to use
log_id=null; update the S11 scenario to either perform a guaranteed audited
request before reading the audit log (e.g., send a simple request that creates
an audit entry) or change the jq call that sets AUDIT_ID (the line assigning
AUDIT_ID and the jq invocation) to use jq -er so it fails fast on empty results;
ensure the subsequent AUDIT_ID usage and the curl to
/admin/api/v1/audit/conversation keep the same variable name and handle failure
accordingly.
- Around line 13-27: The happy-path curl calls that currently use patterns like
"curl -sS | jq" or "curl -sS | sed" must detect HTTP errors: update those
positive-probe invocations to either use curl --fail (or -f) so curl returns
non-zero on 4xx/5xx, or capture the HTTP status with curl -w "%{http_code}" -o
<respfile> and explicitly check the status before piping to jq/sed; apply this
change to the probes that reference the BATCH_FILE and UPLOAD_FILE uploads and
any other success-path curl invocations so a 4xx/5xx causes the script to error
rather than produce null/partial JSON output.
- Around line 252-259: Update the S26 "Latest GPT reasoning on chat (negative)"
scenario (and the other listed negative scenarios 405-410, 443-445, 473-475,
515-518) to perform explicit pass/fail assertions instead of dumping raw
responses: invoke the same curl request but assert the HTTP status is the
expected failure code and verify the response body uses the OpenAI-compatible
error shape with fields error.type, error.message, error.param, and error.code
and that the server-side error maps to a core.GatewayError category (so the
client-facing fields reflect the typed category); reference the scenario title
"S26 Latest GPT reasoning on chat (negative)" and the request payload keys
"model","messages","reasoning" to locate where to add the checks and add the
normalized error assertions to each corresponding negative test block.
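The assertion patterns the comments above describe can be sketched offline. The sample status and body below are assumptions standing in for a real gateway response (they are not actual output from S26); `jq -e`/`jq -er` make both the shape check and the field extraction exit non-zero on mismatch, unlike `jq -r`, which prints the literal string `null` and exits 0.

```shell
# Stand-in values for a negative scenario; in the real script these
# would come from: status=$(curl -sS -o resp.json -w "%{http_code}" ...)
status=400
body='{"error":{"type":"invalid_request_error","message":"reasoning is not supported","param":null,"code":null}}'

# 1. Fail fast on an unexpected status instead of dumping the raw body.
[ "$status" -eq 400 ] || { echo "expected 400, got $status" >&2; exit 1; }

# 2. Assert the OpenAI-compatible error shape: jq -e exits non-zero
#    when the expression evaluates to false or null.
printf '%s' "$body" | jq -e '
  .error | has("type") and has("message") and has("param") and has("code")
' >/dev/null || { echo "error shape mismatch" >&2; exit 1; }

# 3. jq -er also fails when a field is missing, so a clean gateway
#    cannot silently continue with a null ID.
err_type=$(printf '%s' "$body" | jq -er '.error.type')
echo "negative assertion passed: $err_type"
```

The same `-er` flag covers the S11 audit-log extraction and the capture-at-create-time pattern: piping a create response through `jq -er '.id'` into a shell variable fails the step immediately when the upload did not return an ID.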

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4f3b5fc8-454c-46fb-8b8c-409e44e96460

📥 Commits

Reviewing files that changed from the base of the PR and between d8208bc and 8dcbd65.

📒 Files selected for processing (2)
  • internal/server/model_validation.go
  • tests/e2e/release-e2e-scenarios.md


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (6)
tests/e2e/release-e2e-scenarios.md (4)

369-370: ⚠️ Potential issue | 🟠 Major

Reuse created resource IDs instead of rediscovering with limit=1.

The current list-and-pick-first flow is nondeterministic and can target stale artifacts. Capture IDs at create time (S35/S39/S42) and reuse them in dependent scenarios.

Suggested direction
-curl -sS "$BASE_URL/v1/files?provider=openai" \
+OPENAI_BATCH_FILE_ID=$(
+  curl -fsS "$BASE_URL/v1/files?provider=openai" \
     -F purpose=batch \
     -F "file=@$BATCH_FILE" \
-    | jq '.'
+    | jq -er '.id'
+)
...
-FILE_ID=$(curl -sS "$BASE_URL/v1/files?provider=openai&purpose=batch&limit=1" | jq -r '.data[0].id')
-curl -sS "$BASE_URL/v1/files/$FILE_ID?provider=openai" | jq '.'
+curl -fsS "$BASE_URL/v1/files/$OPENAI_BATCH_FILE_ID?provider=openai" | jq '.'

Also applies to: 377-378, 395-397, 405-406, 416-417, 435-437, 443-444, 451-453

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/release-e2e-scenarios.md` around lines 369 - 370, Replace the
nondeterministic list-and-pick-first pattern that sets FILE_ID by querying the
list endpoint with a deterministic reuse of the ID captured when the resource
was created (referenced as create steps S35/S39/S42); update the scenarios that
currently run "curl ... /v1/files?provider=openai&purpose=batch&limit=1" and the
subsequent GET to instead reference the variable populated at create time (the
original create response's ID) so dependent scenarios reuse the exact created
resource ID rather than rediscovering with limit=1.

108-110: ⚠️ Potential issue | 🟠 Major

Make S11 fail fast when audit log is empty.

jq -r '.entries[0].id' can yield null and continue with log_id=null. Use jq -er (or seed an audited request first) so this scenario fails deterministically.

Suggested fix
-AUDIT_ID=$(curl -sS "$BASE_URL/admin/api/v1/audit/log?limit=1" | jq -r '.entries[0].id')
+AUDIT_ID=$(curl -fsS "$BASE_URL/admin/api/v1/audit/log?limit=1" | jq -er '.entries[0].id')
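
To see why the `-e` flag matters here, this standalone snippet (using a hypothetical empty audit-log payload, not the real endpoint) contrasts `jq -r` and `jq -er` on a missing entry:

```shell
# Hypothetical stand-in for an empty audit-log response.
payload='{"entries":[]}'

# jq -r prints the literal string "null" and exits 0, so a scenario
# using it would keep running with AUDIT_ID=null.
id=$(printf '%s' "$payload" | jq -r '.entries[0].id')
echo "jq -r gave: $id"

# jq -er exits non-zero when the last output is null or false,
# so the assignment fails fast instead.
if id=$(printf '%s' "$payload" | jq -er '.entries[0].id'); then
  echo "jq -er unexpectedly succeeded"
else
  echo "jq -er failed fast as intended"
fi
```

With `set -e` (or an explicit `|| exit 1`), the failing assignment aborts the scenario at the extraction step rather than at some later request that uses a bogus ID.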
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/release-e2e-scenarios.md` around lines 108 - 110, The AUDIT_ID
extraction can produce "null" and let the scenario continue; update the AUDIT_ID
assignment (the command that sets AUDIT_ID=$(curl ... | jq -r '.entries[0].id'))
to use jq -er (or otherwise assert non-empty) so it exits non-zero when no entry
exists, or proactively create/seed an audit entry before querying; ensure the
subsequent curl that uses log_id=$AUDIT_ID only runs when AUDIT_ID is valid
(i.e., fail fast on missing audit entries).

35-36: ⚠️ Potential issue | 🟠 Major

Enforce fail-fast HTTP checks on positive probes.

Several happy-path scenarios still use curl -sS | jq/sed, so HTTP 4xx/5xx can be misread as valid output. Use -f (or explicit status checks) on success-path probes.

Suggested pattern
-curl -sS "$BASE_URL/v1/models" \
+curl -fsS "$BASE_URL/v1/models" \
   | jq '{count:(.data|length), sample:(.data[:10]|map({id,owned_by}))}'
-curl -sS "$BASE_URL/v1/files?provider=openai" \
+curl -fsS "$BASE_URL/v1/files?provider=openai" \
   -F purpose=batch \
   -F "file=@$BATCH_FILE" \
   | jq '.'

Also applies to: 42-43, 49-51, 351-355, 385-389, 416-421, 537-546

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/release-e2e-scenarios.md` around lines 35 - 36, The positive HTTP
probe calls like curl -sS "$BASE_URL/health" should be fail-fast so 4xx/5xx are
not treated as valid output: update these invocations (e.g., the health check
and the other probes referenced) to use curl -f -sS (or replace with an explicit
HTTP status check using curl -sS -o /dev/null -w "%{http_code}" and assert the
code) and preserve any subsequent jq/sed parsing; ensure all listed occurrences
(the lines around the health probe and the ranges noted in the comment) are
updated consistently.

256-259: ⚠️ Potential issue | 🟠 Major

Negative scenarios still need explicit pass/fail assertions.

These probes currently dump raw responses; they should assert expected HTTP status and normalized error.* fields so failures are machine-verifiable in release gating.

Suggested pattern
-curl -sS -i "$BASE_URL/v1/chat/completions" \
-  -H 'Content-Type: application/json' \
-  -d '{"model":"gpt-5-nano", ... }'
+tmp="$(mktemp)"
+code=$(
+  curl -sS -o "$tmp" -w '%{http_code}' "$BASE_URL/v1/chat/completions" \
+    -H 'Content-Type: application/json' \
+    -d '{"model":"gpt-5-nano", ... }'
+)
+[ "$code" = "400" ] || { cat "$tmp"; exit 1; }
+jq -e '.error.type and .error.message and (.error|has("param")) and (.error|has("code"))' "$tmp" >/dev/null

Also applies to: 443-445, 473-475, 515-518

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/e2e/release-e2e-scenarios.md` around lines 256 - 259, The E2E probe
using the curl POST to "$BASE_URL/v1/chat/completions" currently only prints the
raw response; change it to assert expected HTTP status and normalized error
fields by capturing HTTP status (e.g., via curl -w "%{http_code}" or using jq)
and parsing the JSON body to check error.* fields (error.message, error.type,
error.code) are present/normalized when an error is expected, and that a
successful call yields 200 and the expected response token (e.g., validate body
for "QA_GPT5_REASONING_OK"); update the curl invocation and the following
validation logic so each probe fails non-zero when assertions fail, using the
same curl command and payload from the diff as the target to modify.
internal/server/model_validation.go (2)

73-80: ⚠️ Potential issue | 🟡 Minor

Backfill plan.Passthrough.Provider before storing the plan.

If passthroughRouteInfo(c) comes from cached route info with an empty Provider, providerPassthroughType(c) can still recover the provider from the path, so the stored plan ends up with ProviderType set while plan.Passthrough.Provider stays blank until some later helper mutates it. Any direct ExecutionPlan reader will see inconsistent state.

🔧 Suggested fix
 	case core.OperationProviderPassthrough:
 		passthrough := passthroughRouteInfo(c)
 		providerType, ok := providerPassthroughType(c)
 		if !ok {
 			return nil, nil
 		}
+		if passthrough == nil {
+			passthrough = &core.PassthroughRouteInfo{AuditPath: c.Request().URL.Path}
+		}
+		if strings.TrimSpace(passthrough.Provider) == "" {
+			passthrough.Provider = providerType
+		}
 		plan.Mode = core.ExecutionModePassthrough
 		plan.ProviderType = providerType
 		plan.Passthrough = passthrough
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/model_validation.go` around lines 73 - 80, The stored
ExecutionPlan can end up inconsistent when passthroughRouteInfo(c) returns a
RouteInfo with an empty Provider but providerPassthroughType(c) recovers
provider from the path; fix by populating plan.Passthrough.Provider with the
recovered provider before storing the plan: after calling
providerPassthroughType(c) and setting plan.ProviderType, set
plan.Passthrough.Provider = provider (or the recovered provider value) so
plan.Passthrough and plan.ProviderType remain consistent for any direct
ExecutionPlan readers.

237-238: 🧹 Nitpick | 🔵 Trivial

Refresh the GoDoc to describe the execution-plan source.

GetProviderType now reads from core.GetExecutionPlan, so “set by ModelValidation” is stale.

✏️ Suggested wording
-// GetProviderType returns the provider type set by ModelValidation for this request.
+// GetProviderType returns the provider type stored in the request's execution plan.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/model_validation.go` around lines 237 - 238, Update the GoDoc
for GetProviderType to reflect that the provider type is derived from the
execution plan via core.GetExecutionPlan rather than being "set by
ModelValidation"; locate the GetProviderType function and change its comment to
describe that it returns the provider type read from the execution plan
(obtained from core.GetExecutionPlan) for the current request, including any
relevant context about execution-plan precedence or fallback behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/e2e/auditlog_test.go`:
- Around line 621-652: After posting to serverURL+"/p/unknown/responses" and
asserting StatusBadRequest, decode resp.Body as JSON and assert the
OpenAI-compatible error envelope fields: ensure the top-level has "error" object
and that error.type equals "invalid_request_error" (or matches entry.ErrorType),
error.message is the client-facing message, error.param is null (or the expected
param), and error.code is null (or expected code); update the subtest around the
http.Post call and resp handling (variables: resp, closeBody, serverURL) to
parse the body and add assertions for error.type, error.message, error.param,
and error.code to satisfy the GatewayError/OpenAI shape requirement.

In `@tests/e2e/release-e2e-scenarios.md`:
- Line 31: The markdown headings like "### S01 Health endpoint" violate MD022 by
lacking a blank line before each "###" heading; fix this by inserting a single
blank line immediately above every "###" heading (e.g., above "### S01 Health
endpoint" and the other listed "### ..." scenario headings) so each heading is
separated by one empty line from the preceding content; ensure you update all
occurrences referenced in the comment.

---

Duplicate comments:
In `@internal/server/model_validation.go`:
- Around line 73-80: The stored ExecutionPlan can end up inconsistent when
passthroughRouteInfo(c) returns a RouteInfo with an empty Provider but
providerPassthroughType(c) recovers provider from the path; fix by populating
plan.Passthrough.Provider with the recovered provider before storing the plan:
after calling providerPassthroughType(c) and setting plan.ProviderType, set
plan.Passthrough.Provider = provider (or the recovered provider value) so
plan.Passthrough and plan.ProviderType remain consistent for any direct
ExecutionPlan readers.
- Around line 237-238: Update the GoDoc for GetProviderType to reflect that the
provider type is derived from the execution plan via core.GetExecutionPlan
rather than being "set by ModelValidation"; locate the GetProviderType function
and change its comment to describe that it returns the provider type read from
the execution plan (obtained from core.GetExecutionPlan) for the current
request, including any relevant context about execution-plan precedence or
fallback behavior.

In `@tests/e2e/release-e2e-scenarios.md`:
- Around line 369-370: Replace the nondeterministic list-and-pick-first pattern
that sets FILE_ID by querying the list endpoint with a deterministic reuse of
the ID captured when the resource was created (referenced as create steps
S35/S39/S42); update the scenarios that currently run "curl ...
/v1/files?provider=openai&purpose=batch&limit=1" and the subsequent GET to
instead reference the variable populated at create time (the original create
response's ID) so dependent scenarios reuse the exact created resource ID rather
than rediscovering with limit=1.
- Around line 108-110: The AUDIT_ID extraction can produce "null" and let the
scenario continue; update the AUDIT_ID assignment (the command that sets
AUDIT_ID=$(curl ... | jq -r '.entries[0].id')) to use jq -er (or otherwise
assert non-empty) so it exits non-zero when no entry exists, or proactively
create/seed an audit entry before querying; ensure the subsequent curl that uses
log_id=$AUDIT_ID only runs when AUDIT_ID is valid (i.e., fail fast on missing
audit entries).
- Around line 35-36: The positive HTTP probe calls like curl -sS
"$BASE_URL/health" should be fail-fast so 4xx/5xx are not treated as valid
output: update these invocations (e.g., the health check and the other probes
referenced) to use curl -f -sS (or replace with an explicit HTTP status check
using curl -sS -o /dev/null -w "%{http_code}" and assert the code) and preserve
any subsequent jq/sed parsing; ensure all listed occurrences (the lines around
the health probe and the ranges noted in the comment) are updated consistently.
- Around line 256-259: The E2E probe using the curl POST to
"$BASE_URL/v1/chat/completions" currently only prints the raw response; change
it to assert expected HTTP status and normalized error fields by capturing HTTP
status (e.g., via curl -w "%{http_code}" or using jq) and parsing the JSON body
to check error.* fields (error.message, error.type, error.code) are
present/normalized when an error is expected, and that a successful call yields
200 and the expected response token (e.g., validate body for
"QA_GPT5_REASONING_OK"); update the curl invocation and the following validation
logic so each probe fails non-zero when assertions fail, using the same curl
command and payload from the diff as the target to modify.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3b3159d2-51ad-432c-9187-06b98b4b3e85

📥 Commits

Reviewing files that changed from the base of the PR and between 8dcbd65 and ae93453.

📒 Files selected for processing (7)
  • internal/auditlog/auditlog_test.go
  • internal/auditlog/middleware.go
  • internal/server/handlers_test.go
  • internal/server/model_validation.go
  • internal/server/passthrough_service.go
  • tests/e2e/auditlog_test.go
  • tests/e2e/release-e2e-scenarios.md

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/aliases/provider_test.go`:
- Around line 297-315: The test currently only checks inner.fileCreates, which
won't fail if PrepareBatchRequest still reads the source file; update the mock
"inner" to either (a) have its GetFileContent method call t.Fatalf (or return an
error) when invoked, or (b) record calls (e.g., inner.fileGets) and assert
len(inner.fileGets)==0 after calling provider.PrepareBatchRequest; change the
test around NewProviderWithOptions/PrepareBatchRequest to fail if
inner.GetFileContent is invoked so the opt-out guarantees no native
batch-preparation I/O.

In `@internal/auditlog/middleware.go`:
- Around line 169-193: The audit enrichment misses copying the execution plan's
canonical request ID into the log entry; update enrichEntryWithExecutionPlan to
set entry.RequestID = plan.RequestID (when plan != nil and plan.RequestID is not
empty) so the audit log uses the same canonical request ID derived by
requestIDFromContextOrHeader; keep the existing nil checks and only overwrite
entry.RequestID when plan.RequestID is non-empty to avoid clobbering any
pre-existing value.

In `@internal/server/model_validation.go`:
- Around line 100-115: When the model registry is uninitialized (the
modelCountProvider branch) we currently return core.NewProviderError before
populating the plan, losing provider/model context; fix by assigning
plan.ProviderType = resolution.ProviderType and plan.Resolution = resolution and
calling auditlog.EnrichEntryWithExecutionPlan(c, plan) prior to returning the
error so the audit trail and response retain the resolved provider/model
information (references: plan, resolution, provider, modelCountProvider,
core.NewProviderError, auditlog.EnrichEntryWithExecutionPlan,
ensureRequestModelResolution).

In `@internal/server/native_file_service.go`:
- Around line 103-117: The code currently calls s.providerTypes()
unconditionally in CreateFile which breaks routers that implement
core.NativeFileRoutableProvider but not core.NativeFileProviderTypeLister;
change the logic so s.providerTypes() is only invoked when fileReq.Provider
(providerType) is empty: move the providers, err := s.providerTypes() call and
its error handling inside the providerType == "" branch (the branch that checks
len(providers) == 0/1/>1), leaving the path that has a provided providerType to
proceed directly to nativeRouter.CreateFile without consulting provider
inventory; keep references to providerType, fileReq.Provider, handleError, and
nativeRouter.CreateFile unchanged.

In `@tests/e2e/release-e2e-scenarios.md`:
- Line 542: The hardcoded "sleep 6" is environment-dependent and can cause flaky
E2E runs; replace each occurrence of "sleep 6" (the three spots including the
one shown) with "sleep $FLUSH_WAIT_SECONDS" and document or set a default at
environment setup by exporting FLUSH_WAIT_SECONDS=${FLUSH_WAIT_SECONDS:-6} so CI
can override it; ensure any README or environment setup block adds that export
and update the three lines (the exact token "sleep 6") to use the new variable.
- Around line 469-475: The test writes a fixed temporary file
/tmp/qa-mixed-provider-batch.jsonl which can collide across runs; update the
script that creates the file (the here-doc for
/tmp/qa-mixed-provider-batch.jsonl and subsequent FILE_ID/curl calls) to use a
unique temp filename (e.g., via mktemp or appending $$ or a timestamp) and
ensure the file is removed after use (use an explicit rm or a trap cleanup) so
FILE_ID and the curl -d steps operate on the unique file and no stale artifact
remains.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: ee4c759f-c4cd-4042-a7d3-85c758ad5713

📥 Commits

Reviewing files that changed from the base of the PR and between 8dcbd65 and ee2096f.

📒 Files selected for processing (15)
  • 2026-03-16_ARCHITECTURE_SNAPSHOT.md
  • ARCHITECTURE.md
  • internal/aliases/provider.go
  • internal/aliases/provider_test.go
  • internal/auditlog/auditlog_test.go
  • internal/auditlog/middleware.go
  • internal/guardrails/executor.go
  • internal/guardrails/provider_test.go
  • internal/server/handlers_test.go
  • internal/server/model_validation.go
  • internal/server/model_validation_test.go
  • internal/server/native_file_service.go
  • internal/server/passthrough_service.go
  • tests/e2e/auditlog_test.go
  • tests/e2e/release-e2e-scenarios.md
💤 Files with no reviewable changes (1)
  • ARCHITECTURE.md

Comment on lines +100 to +115
	case core.OperationChatCompletions, core.OperationResponses, core.OperationEmbeddings:
		plan.Mode = core.ExecutionModeTranslated
		resolution, parsed, err := ensureRequestModelResolution(c, provider, resolver)
		if err != nil {
			return nil, err
		}
		if !parsed || resolution == nil {
			return plan, nil
		}
		if counted, ok := provider.(modelCountProvider); ok && counted.ModelCount() == 0 {
			return nil, core.NewProviderError("", 0, "model registry not initialized", nil)
		}
		plan.ProviderType = resolution.ProviderType
		plan.Resolution = resolution
		auditlog.EnrichEntryWithExecutionPlan(c, plan)
		return plan, nil

⚠️ Potential issue | 🟠 Major

Don’t drop provider context on the model-registry failure path.

resolution is already known here, but this branch returns core.NewProviderError("", ...) before the plan is populated or stored. On registry outages the response and audit trail lose the resolved provider/model right when that context is most useful.

Suggested fix
 	case core.OperationChatCompletions, core.OperationResponses, core.OperationEmbeddings:
 		plan.Mode = core.ExecutionModeTranslated
 		resolution, parsed, err := ensureRequestModelResolution(c, provider, resolver)
 		if err != nil {
 			return nil, err
 		}
 		if !parsed || resolution == nil {
 			return plan, nil
 		}
+		plan.ProviderType = resolution.ProviderType
+		plan.Resolution = resolution
+		auditlog.EnrichEntryWithExecutionPlan(c, plan)
 		if counted, ok := provider.(modelCountProvider); ok && counted.ModelCount() == 0 {
-			return nil, core.NewProviderError("", 0, "model registry not initialized", nil)
+			storeExecutionPlan(c, plan)
+			return nil, core.NewProviderError(resolution.ProviderType, 0, "model registry not initialized", nil)
 		}
-		plan.ProviderType = resolution.ProviderType
-		plan.Resolution = resolution
-		auditlog.EnrichEntryWithExecutionPlan(c, plan)
 		return plan, nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/model_validation.go` around lines 100 - 115, When the model
registry is uninitialized (the modelCountProvider branch) we currently return
core.NewProviderError before populating the plan, losing provider/model context;
fix by assigning plan.ProviderType = resolution.ProviderType and plan.Resolution
= resolution and calling auditlog.EnrichEntryWithExecutionPlan(c, plan) prior to
returning the error so the audit trail and response retain the resolved
provider/model information (references: plan, resolution, provider,
modelCountProvider, core.NewProviderError,
auditlog.EnrichEntryWithExecutionPlan, ensureRequestModelResolution).

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/auditlog/middleware.go`:
- Around line 183-187: The audit entry isn't getting the canonical passthrough
path; when plan.Mode == core.ExecutionModePassthrough and plan.Passthrough !=
nil (the block that already sets entry.Model), also set entry.Path =
plan.Passthrough.AuditPath (after trimming/validating if needed) so the logged
entry uses the canonical passthrough audit path instead of the raw gateway
route; update the same conditional that references plan.Mode,
core.ExecutionModePassthrough, and plan.Passthrough to copy
plan.Passthrough.AuditPath into entry.Path.

In `@internal/server/native_file_service.go`:
- Around line 177-189: The ListFiles handler must normalize nil responses and
nil resp.Data to an empty slice so callers always see data: [] instead of data:
null; after calling nativeRouter.ListFiles (in the branch where providerType !=
""), ensure that if resp == nil you instantiate resp =
&core.FileListResponse{Object: "list", Data: make([]*core.FileMeta, 0)} and if
resp.Object == "" set resp.Object = "list", and if resp.Data == nil set
resp.Data = make([]*core.FileMeta, 0); keep existing error handling via
handleError and auditlog.EnrichEntry unchanged.
- Around line 204-223: The GetFile and DeleteFile success handlers currently
pass provider results directly to c.JSON, which yields a 200 null when the
provider returns a typed nil; update the anonymous result handlers passed into
fileByID (used by nativeFileService.GetFile and nativeFileService.DeleteFile) to
detect typed-nil results (e.g., result == nil or
reflect.ValueOf(result).IsValid() && reflect.ValueOf(result).IsNil()) and treat
them as a gateway/provider error instead of returning 200; return an appropriate
error response (same shape you use elsewhere for provider failures, consistent
with GetFileContent’s behavior) so typed-nil results become an error response
rather than 200 null.
- Around line 352-374: The ListFiles response handling can create an infinite
loop when a backend ignores the `after` cursor and returns the same last item
with `HasMore=true`; update the block that processes `resp` (after calling
nativeRouter.ListFiles) to detect a non-advancing cursor by comparing the last
item's ID to the current `state.after` (use `state.after`, `state.items`,
`resp.HasMore` and the last item ID), and if they are equal treat the page as
terminal (set `state.hasMore=false` and `state.exhausted=true`, optionally log a
warning) so `nextMergedFile` won't keep reloading the same page forever.

In `@internal/server/request_snapshot_test.go`:
- Around line 195-197: Rename the test function currently named
TestModelValidation_UsesSemanticEnvelopeWithoutReadingBody to match the
refactored middleware name by renaming it to
TestExecutionPlanning_UsesSemanticEnvelopeWithoutReadingBody; update the test
declaration (the `func
TestModelValidation_UsesSemanticEnvelopeWithoutReadingBody(t *testing.T) { ...
}`) and any references to that identifier so it reflects the middleware under
test (`ExecutionPlanning`) and keeps the test behavior unchanged.
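
The non-advancing-cursor guard suggested for the `ListFiles` pagination above can be sketched as follows; `pageState`, `advance`, and their fields are assumptions standing in for the service's real merge state, not the actual types:

```go
package main

import "fmt"

// pageState is a hypothetical stand-in for the merge-cursor state.
type pageState struct {
	after   string
	hasMore bool
}

// advance applies one page result. When the backend ignores `after` and
// echoes the same last ID with hasMore=true, the page is treated as
// terminal instead of being reloaded forever.
func advance(state *pageState, lastID string, respHasMore bool) bool {
	if respHasMore && lastID == state.after {
		state.hasMore = false
		return false // non-advancing cursor: stop paginating
	}
	state.after = lastID
	state.hasMore = respHasMore
	return respHasMore
}

func main() {
	s := &pageState{after: "file_1"}
	fmt.Println(advance(s, "file_2", true)) // true: cursor advanced
	fmt.Println(advance(s, "file_2", true)) // false: same ID echoed back
}
```

The key design choice is to compare the last returned ID against the cursor that was just sent: equality means the backend is not advancing, so the loop terminates even though the response still claims more data.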

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 82fec648-7781-4544-8323-d54f3cdc9b91

📥 Commits

Reviewing files that changed from the base of the PR and between ee2096f and 9890a51.

📒 Files selected for processing (7)
  • .golangci.yml
  • internal/auditlog/auditlog_test.go
  • internal/auditlog/middleware.go
  • internal/server/handlers_test.go
  • internal/server/model_validation.go
  • internal/server/native_file_service.go
  • internal/server/request_snapshot_test.go

Comment on lines +183 to +187
	if plan.Mode == core.ExecutionModePassthrough && plan.Passthrough != nil {
		if model := strings.TrimSpace(plan.Passthrough.Model); model != "" {
			entry.Model = model
		}
	}

⚠️ Potential issue | 🟡 Minor

Apply Passthrough.AuditPath to the audit entry.

entry.Path is initialized from the raw /p/:provider/* route, and this block only overrides model/provider. Now that the execution plan carries a canonical passthrough audit path, these requests will still be logged under the gateway route unless you copy plan.Passthrough.AuditPath here.

Minimal fix
 	if plan.Mode == core.ExecutionModePassthrough && plan.Passthrough != nil {
+		if auditPath := strings.TrimSpace(plan.Passthrough.AuditPath); auditPath != "" {
+			entry.Path = auditPath
+		}
 		if model := strings.TrimSpace(plan.Passthrough.Model); model != "" {
 			entry.Model = model
 		}
 	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/auditlog/middleware.go` around lines 183 - 187, The audit entry
isn't getting the canonical passthrough path; when plan.Mode ==
core.ExecutionModePassthrough and plan.Passthrough != nil (the block that
already sets entry.Model), also set entry.Path = plan.Passthrough.AuditPath
(after trimming/validating if needed) so the logged entry uses the canonical
passthrough audit path instead of the raw gateway route; update the same
conditional that references plan.Mode, core.ExecutionModePassthrough, and
plan.Passthrough to copy plan.Passthrough.AuditPath into entry.Path.

Comment on lines +177 to +189
	if providerType != "" {
		auditlog.EnrichEntry(c, "file", providerType)
		resp, err := nativeRouter.ListFiles(c.Request().Context(), providerType, purpose, limit, after)
		if err != nil {
			return handleError(c, err)
		}
		if resp == nil {
			resp = &core.FileListResponse{Object: "list"}
		}
		if resp.Object == "" {
			resp.Object = "list"
		}
		return c.JSON(http.StatusOK, resp)

⚠️ Potential issue | 🟠 Major

Normalize empty file lists to data: [].

When ListFiles returns nil or leaves resp.Data nil, this path serializes data: null. That is not a conservative OpenAI-compatible list shape and can break clients that always iterate data.

Minimal fix
 		resp, err := nativeRouter.ListFiles(c.Request().Context(), providerType, purpose, limit, after)
 		if err != nil {
 			return handleError(c, err)
 		}
 		if resp == nil {
-			resp = &core.FileListResponse{Object: "list"}
+			resp = &core.FileListResponse{Object: "list", Data: []core.FileObject{}}
 		}
 		if resp.Object == "" {
 			resp.Object = "list"
 		}
+		if resp.Data == nil {
+			resp.Data = []core.FileObject{}
+		}
 		return c.JSON(http.StatusOK, resp)
 	}

As per coding guidelines, "Accept provider responses liberally and pass them to the user in a conservative OpenAI-compatible shape."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/native_file_service.go` around lines 177 - 189, The ListFiles
handler must normalize nil responses and nil resp.Data to an empty slice so
callers always see data: [] instead of data: null; after calling
nativeRouter.ListFiles (in the branch where providerType != ""), ensure that if
resp == nil you instantiate resp = &core.FileListResponse{Object: "list", Data:
make([]*core.FileMeta, 0)} and if resp.Object == "" set resp.Object = "list",
and if resp.Data == nil set resp.Data = make([]*core.FileMeta, 0); keep existing
error handling via handleError and auditlog.EnrichEntry unchanged.

Comment on lines +204 to +223
func (s *nativeFileService) GetFile(c *echo.Context) error {
	return s.fileByID(c,
		func(r core.NativeFileRoutableProvider, provider, id string) (any, error) {
			return r.GetFile(c.Request().Context(), provider, id)
		},
		func(c *echo.Context, result any) error {
			return c.JSON(http.StatusOK, result)
		},
	)
}

func (s *nativeFileService) DeleteFile(c *echo.Context) error {
	return s.fileByID(c,
		func(r core.NativeFileRoutableProvider, provider, id string) (any, error) {
			return r.DeleteFile(c.Request().Context(), provider, id)
		},
		func(c *echo.Context, result any) error {
			return c.JSON(http.StatusOK, result)
		},
	)

⚠️ Potential issue | 🟠 Major

Reject typed-nil GetFile/DeleteFile results instead of returning 200 null.

GetFileContent already guards the (*T)(nil), nil case, but these two success paths still hand typed-nil provider results straight to c.JSON. In Go that becomes a successful null payload instead of a gateway error.

Minimal fix
 func (s *nativeFileService) GetFile(c *echo.Context) error {
 	return s.fileByID(c,
 		func(r core.NativeFileRoutableProvider, provider, id string) (any, error) {
 			return r.GetFile(c.Request().Context(), provider, id)
 		},
 		func(c *echo.Context, result any) error {
-			return c.JSON(http.StatusOK, result)
+			resp, ok := result.(*core.FileObject)
+			if !ok || resp == nil {
+				return handleError(c, core.NewProviderError("", http.StatusBadGateway, "provider returned empty file response", nil))
+			}
+			return c.JSON(http.StatusOK, resp)
 		},
 	)
 }
 
 func (s *nativeFileService) DeleteFile(c *echo.Context) error {
 	return s.fileByID(c,
 		func(r core.NativeFileRoutableProvider, provider, id string) (any, error) {
 			return r.DeleteFile(c.Request().Context(), provider, id)
 		},
 		func(c *echo.Context, result any) error {
-			return c.JSON(http.StatusOK, result)
+			resp, ok := result.(*core.FileDeleteResponse)
+			if !ok || resp == nil {
+				return handleError(c, core.NewProviderError("", http.StatusBadGateway, "provider returned empty file delete response", nil))
+			}
+			return c.JSON(http.StatusOK, resp)
 		},
 	)
 }

As per coding guidelines, "Accept provider responses liberally and pass them to the user in a conservative OpenAI-compatible shape."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/native_file_service.go` around lines 204 - 223, The GetFile
and DeleteFile success handlers currently pass provider results directly to
c.JSON, which yields a 200 null when the provider returns a typed nil; update
the anonymous result handlers passed into fileByID (used by
nativeFileService.GetFile and nativeFileService.DeleteFile) to detect typed-nil
results (e.g., result == nil or reflect.ValueOf(result).IsValid() &&
reflect.ValueOf(result).IsNil()) and treat them as a gateway/provider error
instead of returning 200; return an appropriate error response (same shape you
use elsewhere for provider failures, consistent with GetFileContent’s behavior)
so typed-nil results become an error response rather than 200 null.
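The typed-nil pitfall behind this finding is worth spelling out: in Go, a nil `*T` stored in an interface (`any`) compares unequal to `nil`, because the interface still carries the concrete type. A minimal sketch (the `FileObject` struct and `fetch` helper are hypothetical stand-ins, not the project's real types):

```go
package main

import "fmt"

type FileObject struct{ ID string }

// fetch simulates a provider client that returns a typed nil on a miss.
func fetch(found bool) (*FileObject, error) {
	if !found {
		var missing *FileObject // typed nil: (*FileObject)(nil)
		return missing, nil
	}
	return &FileObject{ID: "file_1"}, nil
}

func main() {
	resp, _ := fetch(false)
	var result any = resp // the handler receives the value as `any`

	// The interface holds a non-nil type with a nil pointer, so it is NOT == nil.
	fmt.Println(result == nil) // false
	fmt.Println(resp == nil)   // true

	// Without a typed check, c.JSON would serialize this as a 200 `null`.
	// The suggested fix asserts the concrete type and checks the pointer:
	if r, ok := result.(*FileObject); !ok || r == nil {
		fmt.Println("reject as provider error")
	}
}
```

This is why `result == nil` alone cannot guard the handler, and why the suggestion asserts to the concrete response type before checking for nil.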

Comment on lines +352 to +374
resp, err := nativeRouter.ListFiles(ctx, state.provider, purpose, limit, state.after)
if err != nil {
return false, err
}

state.index = 0
state.items = nil
state.hasMore = false
state.exhausted = true
if resp == nil || len(resp.Data) == 0 {
return false, nil
}

state.items = make([]core.FileObject, len(resp.Data))
copy(state.items, resp.Data)
for i := range state.items {
if strings.TrimSpace(state.items[i].Provider) == "" {
state.items[i].Provider = state.provider
}
}
state.hasMore = resp.HasMore
state.after = strings.TrimSpace(state.items[len(state.items)-1].ID)
state.exhausted = false

⚠️ Potential issue | 🟠 Major

Detect non-advancing file cursors before loading the next page.

If a backend ignores after and keeps returning the same last item with HasMore=true, nextMergedFile will keep reloading the same page forever. Requests that are still scanning for the caller's after cursor never terminate in that case.

Minimal fix
 func loadProviderFilePage(
 	ctx context.Context,
 	nativeRouter core.NativeFileRoutableProvider,
 	state *providerFileListState,
 	purpose string,
 	limit int,
 ) (bool, error) {
 	if state == nil {
 		return false, nil
 	}
 
+	previousAfter := state.after
 	resp, err := nativeRouter.ListFiles(ctx, state.provider, purpose, limit, state.after)
 	if err != nil {
 		return false, err
 	}
@@
 	for i := range state.items {
 		if strings.TrimSpace(state.items[i].Provider) == "" {
 			state.items[i].Provider = state.provider
 		}
 	}
 	state.hasMore = resp.HasMore
-	state.after = strings.TrimSpace(state.items[len(state.items)-1].ID)
+	nextAfter := strings.TrimSpace(state.items[len(state.items)-1].ID)
+	if state.hasMore && (nextAfter == "" || nextAfter == previousAfter) {
+		return false, core.NewProviderError(state.provider, http.StatusBadGateway, "provider returned a non-advancing file pagination cursor", nil)
+	}
+	state.after = nextAfter
 	state.exhausted = false
 	return true, nil
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/native_file_service.go` around lines 352 - 374, The ListFiles
response handling can create an infinite loop when a backend ignores the `after`
cursor and returns the same last item with `HasMore=true`; update the block that
processes `resp` (after calling nativeRouter.ListFiles) to detect a
non-advancing cursor by comparing the last item's ID to the current
`state.after` (use `state.after`, `state.items`, `resp.HasMore` and the last
item ID), and if they are equal treat the page as terminal (set
`state.hasMore=false` and `state.exhausted=true`, optionally log a warning) so
`nextMergedFile` won't keep reloading the same page forever.
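The failure mode above can be reproduced in isolation. The sketch below (hypothetical `listFiles`/`drain` names, assuming a backend that ignores the `after` cursor) shows how comparing the new cursor against the previous one turns an infinite loop into a fast failure:

```go
package main

import "fmt"

type page struct {
	IDs     []string
	HasMore bool
}

// listFiles simulates a broken backend that ignores `after`
// and replays the same page with HasMore=true forever.
func listFiles(after string) page {
	return page{IDs: []string{"f1", "f2"}, HasMore: true}
}

// drain pages until exhaustion, failing fast if the cursor stops advancing.
func drain() ([]string, error) {
	var all []string
	after := ""
	for {
		p := listFiles(after)
		if len(p.IDs) == 0 {
			return all, nil
		}
		all = append(all, p.IDs...)
		next := p.IDs[len(p.IDs)-1]
		// Guard: a page that claims more data must advance the cursor.
		if p.HasMore && (next == "" || next == after) {
			return all, fmt.Errorf("non-advancing pagination cursor %q", next)
		}
		after = next
		if !p.HasMore {
			return all, nil
		}
	}
}

func main() {
	items, err := drain()
	fmt.Println(len(items), err)
}
```

Without the guard, `drain` never returns against this backend; with it, the loop stops after the second page. Whether the gateway should surface this as a 502 (as in the minimal fix) or silently treat the page as terminal (as in the AI-agent prompt) is a policy choice, but either way the loop must terminate.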

Comment on lines +195 to 197
handler := ExecutionPlanning(provider)(func(c *echo.Context) error {
return c.String(http.StatusOK, "ok")
})

🧹 Nitpick | 🔵 Trivial

Consider renaming the test function to match the middleware.

The implementation correctly uses ExecutionPlanning, but the test function is still named TestModelValidation_UsesSemanticEnvelopeWithoutReadingBody (line 167). Consider renaming to TestExecutionPlanning_UsesSemanticEnvelopeWithoutReadingBody for consistency with the refactored middleware name.

📝 Suggested rename
-func TestModelValidation_UsesSemanticEnvelopeWithoutReadingBody(t *testing.T) {
+func TestExecutionPlanning_UsesSemanticEnvelopeWithoutReadingBody(t *testing.T) {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/server/request_snapshot_test.go` around lines 195 - 197, Rename the
test function currently named
TestModelValidation_UsesSemanticEnvelopeWithoutReadingBody to match the
refactored middleware name by renaming it to
TestExecutionPlanning_UsesSemanticEnvelopeWithoutReadingBody; update the test
declaration (the `func
TestModelValidation_UsesSemanticEnvelopeWithoutReadingBody(t *testing.T) { ...
}`) and any references to that identifier so it reflects the middleware under
test (`ExecutionPlanning`) and keeps the test behavior unchanged.


Labels

release:internal Internal changes excluded from release notes
