
perf(streaming): fast-path unchanged chat streams#171

Merged
SantiagoDePolonia merged 3 commits into main from perf/responses-streaming
Mar 26, 2026

Conversation

Contributor

@SantiagoDePolonia SantiagoDePolonia commented Mar 23, 2026

Summary

  • Add a conservative fast path for translated chat streams that forwards the original request body through provider passthrough when the upstream stream can be returned unchanged
  • Keep the existing translated stream path when the request selector must be rewritten, usage injection is enabled, or OpenAI o-series body adaptation is required
  • Add server tests covering the passthrough fast path and the fallback cases
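The eligibility gate described above can be sketched as a pure predicate. This is a hypothetical, simplified shape for illustration only; the field names and the `canFastPath` helper are assumptions, not the repository's actual API, which also inspects the parsed request and provider configuration:

```go
package main

import "fmt"

// fastPathInput is an illustrative stand-in for the state the real
// eligibility check inspects before choosing the passthrough path.
type fastPathInput struct {
	providerType      string // e.g. "openai", "azure", "openrouter"
	hasRequestPatcher bool   // o-series body adaptation required
	usageInjection    bool   // usage injection enabled
	selectorRewritten bool   // request selector must be rewritten
}

// canFastPath returns true only when the upstream stream can be
// forwarded unchanged: a supported OpenAI-compatible provider and no
// body or selector rewriting of any kind.
func canFastPath(in fastPathInput) bool {
	switch in.providerType {
	case "openai", "azure", "openrouter":
		// supported OpenAI-compatible providers
	default:
		return false
	}
	return !in.hasRequestPatcher && !in.usageInjection && !in.selectorRewritten
}

func main() {
	fmt.Println(canFastPath(fastPathInput{providerType: "openai"}))
	fmt.Println(canFastPath(fastPathInput{providerType: "openai", usageInjection: true}))
}
```

Any single disqualifying condition falls back to the existing translated stream path, which keeps the optimization conservative.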

Testing

  • go test ./internal/server
  • repo git hooks triggered by git commit

Benchmark

Local mock benchmark against the OpenAI-compatible streaming path after this change:

  • /v1/chat/completions improved from 3532 req/s and 13.76ms p50 TTFB to 3844 req/s and 12.08ms p50 TTFB
  • /v1/responses is unchanged by design in this PR

Summary by CodeRabbit

Release Notes

  • Performance Improvements

    • Introduced fast-path optimization for streaming chat completions with compatible providers, enabling direct passthrough of responses for eligible requests and reducing processing overhead.
  • Tests

    • Enhanced streaming handler tests to validate passthrough routing and stream translation behavior across different configurations.

Contributor

coderabbitai bot commented Mar 23, 2026

Warning

Rate limit exceeded

@SantiagoDePolonia has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 6 minutes and 29 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 500c0057-d0e1-4cbd-9012-38db0bb19067

📥 Commits

Reviewing files that changed from the base of the PR and between c41cf55 and 1a8904d.

📒 Files selected for processing (2)
  • internal/server/handlers_test.go
  • internal/server/translated_inference_service.go
📝 Walkthrough

Walkthrough

The changes introduce a fast-path optimization for streaming chat completions in OpenAI-compatible providers, allowing eligible requests to bypass translation and pass through directly, while adding corresponding test coverage for both passthrough and non-passthrough streaming scenarios.

Changes

  • Streaming Fast-Path Implementation (internal/server/translated_inference_service.go): Added a conditional fast path for streaming ChatCompletion requests that validates eligibility (no request patcher, usage enforcement disabled, provider in {openai, azure, openrouter}, no selector rewrites) and directly proxies compatible requests via tryFastPathStreamingChatPassthrough. Includes helper functions to detect selector-rewrite and streaming-body-rewrite requirements.
  • Streaming Handler Tests (internal/server/handlers_test.go): Added two new tests asserting fast-path behavior: one verifying the passthrough SSE stream is returned verbatim when the provider maps to "openai" with passthrough configured, and another confirming qualified models (openai/...) route through stream translation instead of passthrough. Updated existing streaming tests to configure providerTypes and verify lastPassthroughReq remains nil.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Handler as ChatCompletion Handler
    participant FastPath as Fast-Path Check
    participant Passthrough as Passthrough Service
    participant Translation as Stream Translation Path
    participant Provider as Provider

    Client->>Handler: Stream ChatCompletion Request
    Handler->>FastPath: canFastPathStreamingChatPassthrough?
    
    alt Fast-Path Eligible
        FastPath-->>Handler: true
        Handler->>Passthrough: tryFastPathStreamingChatPassthrough
        Passthrough->>Provider: Passthrough (raw HTTP body)
        Provider-->>Passthrough: SSE Stream Response
        Passthrough->>Handler: proxyPassthroughResponse
        Handler-->>Client: Streamed Response (passthrough)
    else Fast-Path Not Eligible
        FastPath-->>Handler: false
        Handler->>Translation: handleStreamingResponse
        Translation->>Provider: StreamChatCompletion (translated)
        Provider-->>Translation: Translated SSE Stream
        Translation->>Handler: Streamed Response
        Handler-->>Client: Streamed Response (translated)
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 Fast streams now hop through OpenAI's door,
No translation needed anymore!
With eligibility checks held tight,
Passthrough paths shine in the light. 🚀✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): the title accurately describes the main change, adding a fast path for unchanged chat streams, which is the core feature across both modified files.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@SantiagoDePolonia SantiagoDePolonia marked this pull request as ready for review March 23, 2026 18:47
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/server/translated_inference_service.go`:
- Around line 141-147: proxyPassthroughResponse called from ChatCompletion
currently returns raw stream I/O errors (e.g., from flushStream/copy) which can
leak non-core.GatewayError values; change the call site in ChatCompletion to
mirror handleStreamingResponse’s behavior: invoke
passthrough.proxyPassthroughResponse(...), but do not bubble raw errors — if the
passthrough returns a stream I/O error after response commit, log it via
s.logger/s.usageLogger and convert/suppress it by returning nil (or wrap it as a
core.GatewayError if you must surface an error), ensuring only core.GatewayError
instances are returned to clients; reference
passthroughService.proxyPassthroughResponse, handleStreamingResponse, and
core.GatewayError when making the change.
- Around line 125-129: The fast-path is forwarding c.Request().Body after
canonicalJSONRequestFromSemantics called requestBodyBytes(), which in the
snapshot case returns snapshot.CapturedBodyView() without rehydrating
c.Request().Body, so rebuild the request body from the canonical bytes before
calling passthroughProvider.Passthrough (in translated_inference_service.go
where Passthrough is invoked); specifically, when requestBodyBytes or
canonicalJSONRequestFromSemantics returns snapshot bytes, replace or rehydrate
c.Request().Body with a new readable body constructed from those bytes (or have
requestBodyBytes perform the rehydration) so Passthrough receives a valid,
non-consumed Body.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6673c86b-d321-4bf0-a585-78a093bbb48f

📥 Commits

Reviewing files that changed from the base of the PR and between a0a1d06 and c41cf55.

📒 Files selected for processing (2)
  • internal/server/handlers_test.go
  • internal/server/translated_inference_service.go

Comment on lines +125 to +129
resp, err := passthroughProvider.Passthrough(ctx, providerType, &core.PassthroughRequest{
Method: c.Request().Method,
Endpoint: endpoint,
Body: c.Request().Body,
Headers: buildPassthroughHeaders(ctx, c.Request().Header),

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "== canonicalJSONRequestFromSemantics implementation =="
rg -n -C4 'func canonicalJSONRequestFromSemantics|canonicalJSONRequestFromSemantics\[' internal

echo
echo "== request-snapshot helpers and any body rehydration =="
rg -n -C3 'RequestSnapshot|WithRequestSnapshot|Body\s*=.*(io\.NopCloser|bytes\.NewReader)|SetRequest\(' internal

echo
echo "== fast-path call site =="
sed -n '120,130p' internal/server/translated_inference_service.go

Repository: ENTERPILOT/GOModel

Length of output: 50374


🏁 Script executed:

# Get the complete semanticJSONBody and lookupOrCaptureSemantics implementations
head -60 internal/server/semantic_requests.go

Repository: ENTERPILOT/GOModel

Length of output: 1497


🏁 Script executed:

# Get the requestBodyBytes function from semantic_requests.go
rg -n -A 20 'func requestBodyBytes' internal/server/semantic_requests.go

Repository: ENTERPILOT/GOModel

Length of output: 44


🏁 Script executed:

# Show lines 115-135 of translated_inference_service.go to see complete fast-path context
sed -n '115,135p' internal/server/translated_inference_service.go

Repository: ENTERPILOT/GOModel

Length of output: 684


🏁 Script executed:

# Find where requestBodyBytes is defined
rg -n 'func requestBodyBytes' internal/server/

Repository: ENTERPILOT/GOModel

Length of output: 158


🏁 Script executed:

# Get full requestBodyBytes implementation from semantic_requests.go
rg -n -A 25 'func requestBodyBytes' internal/server/semantic_requests.go

Repository: ENTERPILOT/GOModel

Length of output: 44


🏁 Script executed:

# Get the full requestBodyBytes function from request_snapshot.go
sed -n '151,172p' internal/server/request_snapshot.go

Repository: ENTERPILOT/GOModel

Length of output: 558


Body rehydration missing for ingress-snapshot fast-path.

When canonicalJSONRequestFromSemantics is called before the fast-path, it invokes requestBodyBytes() which detects the snapshot case and returns snapshot.CapturedBodyView() bytes without rehydrating c.Request().Body. The fast-path then tries to forward c.Request().Body, which remains the original consumed stream, causing the request to fail.

For ingress-managed endpoints that use snapshots, requestBodyBytes() must rehydrate the request body even in the snapshot case, or the fast-path must explicitly reconstruct the body from the canonical bytes before calling Passthrough.


Comment on lines +141 to +147
passthrough := passthroughService{
provider: s.provider,
logger: s.logger,
usageLogger: s.usageLogger,
pricingResolver: s.pricingResolver,
}
return true, passthrough.proxyPassthroughResponse(c, providerType, endpoint, info, resp)

⚠️ Potential issue | 🟠 Major

Don’t bubble passthrough stream I/O errors out of ChatCompletion.

proxyPassthroughResponse() can return raw flushStream()/copy errors after the response is already committed, while handleStreamingResponse on Lines 269-273 records the failure and returns nil. Reusing it directly here changes /v1/chat/completions behavior on client disconnects and mid-stream transport failures, and leaks non-core.GatewayError handler errors on the hot streaming path. As per coding guidelines, All errors returned to clients must be instances of core.GatewayError.


@SantiagoDePolonia SantiagoDePolonia self-assigned this Mar 25, 2026
@SantiagoDePolonia SantiagoDePolonia merged commit ab9d4c1 into main Mar 26, 2026
15 checks passed
@SantiagoDePolonia SantiagoDePolonia deleted the perf/responses-streaming branch April 4, 2026 11:36
