
feat: support more delimiter for SSE events in GCP gemini #1536

Merged: nacx merged 6 commits into envoyproxy:main from sukumargaonkar:gcp-stream-sse-fix on Nov 20, 2025
Conversation

@sukumargaonkar sukumargaonkar commented Nov 13, 2025

Description
The SSE event spec supports 3 different delimiters (pair of CRLF / LF / CR). This PR updates the stream processing translator for GCP VertexAI to support all 3 delimiters.

[SSE docs](https://html.spec.whatwg.org/multipage/server-sent-events.html#parsing-an-event-stream) about delimiters

Note: The docs define an event as `event = *( comment / field ) end-of-line`.
`end-of-line` is defined as a single CRLF or LF or CR (not a pair).
But the `comment` and `field` definitions also end in `end-of-line`, implying that an event always ends in a pair of CRLF / LF / CR.

…tex AI

Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
@codecov-commenter commented Nov 13, 2025

Codecov Report

❌ Patch coverage is 96.42857% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 84.19%. Comparing base (bbc2aa3) to head (755ec9d).

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| internal/translator/openai_gcpvertexai.go | 96.42% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1536      +/-   ##
==========================================
+ Coverage   84.15%   84.19%   +0.03%     
==========================================
  Files         150      150              
  Lines       12984    13005      +21     
==========================================
+ Hits        10927    10949      +22     
+ Misses       1436     1434       -2     
- Partials      621      622       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


```go
// Previous implementation: buffer the whole body and split on a fixed LF pair.
lines := bytes.Split(bodyBytes, []byte("\n\n"))

// Change under review: stream the buffered body plus the incoming chunk
// through a bufio.Scanner.
bodyReader := io.MultiReader(bytes.NewReader(o.bufferedBody), body)
scanner := bufio.NewScanner(bodyReader)
```
Member:
When implementing the mcp sse parser, I got issues with an initial implementation that used the Scanner; for some servers, I got long lines and the parsing failed with bufio.ErrTooLong (bufio.Scanner: token too long). I only caught this when using some real MCP servers.
I don't know if this can be the case as well here, but just a data point to consider not using the scanner if long lines could be expected.

Contributor Author:
Good point. As of Go 1.25.3, the default MaxScanTokenSize is 64 KiB (64 * 1024 bytes).

Since each streaming chunk carries only a couple of generated tokens, the size of each chunk should not exceed this limit, though I couldn't find docs that guarantee a chunk will stay below a specific size.

Member:
What comes from the LLM is inherently unpredictable, and I don't know if we can make that statement (for example, there are some tests that use chunks that have tool calls, and the length of the arguments could be unpredictable).

I agree it is unlikely to happen, but my main concern is that we are introducing a limitation we didn't have before, and I don't want this to hit us in the form of a bug.

Contributor Author:
Two approaches I can think of:

  1. make MaxScanTokenSize configurable
  2. follow the MCP SSE parser logic and explicitly parse the chunks

I personally prefer option 1, as relying on the scanner to do the parsing is simpler and easier to maintain.

What do you think, @nacx?

Member:
What about a third option: in the previous implementation, we're loading everything in memory and doing a bytes.Split. Why not just keep that, but use the right function to split? I mean, test for the presence of the delimiters to figure out the "split characters", then keep the original code, just splitting on those.

This way we don't change the behaviour, keep it backwards-compatible, and we don't introduce a new config option that users won't know what value to set confidently and how to set it. Also, if we find loading everything in memory to be an issue (it hasn't been till now), then we can address the stream reader approach, but probably it's not needed for this PR.

Contributor Author:
Updated the implementation in 52781e5.

Another round of review, @nacx?

@nacx nacx self-assigned this Nov 13, 2025
@sukumargaonkar sukumargaonkar marked this pull request as ready for review November 13, 2025 15:34
@sukumargaonkar sukumargaonkar requested a review from a team as a code owner November 13, 2025 15:34
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Nov 13, 2025
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>

# Conflicts:
#	internal/translator/openai_gcpvertexai.go
@sukumargaonkar (Contributor Author):

Unsure about the test failures:

github.com/envoyproxy/ai-gateway/tests/e2e-upgrade

timed out waiting for the condition on pods/envoy-default-upgrade-test-bd44e180-cd5d859c6-2rj22
    upgrade_test.go:146: 
        	Error Trace:	/home/runner/work/ai-gateway/ai-gateway/tests/e2e-upgrade/upgrade_test.go:146
        	Error:      	Received unexpected error:
        	            	[before upgrade (requests made: 0, current pods: envoy-default-upgrade-test-bd44e180-6d66b8f5d5-7q68j(Running), envoy-default-upgrade-test-bd44e180-cd5d859c6-2rj22(Running))] unexpected status code 500: body=
        	Test:       	TestUpgrade/rolling_out_pods

github.com/envoyproxy/ai-gateway/tests/e2e-inference-extension

error: error executing jsonpath "'{.items[0].spec.initContainers[*].name} {.items[0].spec.containers[*].name}'": Error executing template: array index out of bounds: index 0, length 0. Printing more information for debugging the template:
	template was:
		'{.items[0].spec.initContainers[*].name} {.items[0].spec.containers[*].name}'
	object given to jsonpath engine was:
		map[string]interface {}{"apiVersion":"v1", "items":[]interface {}{}, "kind":"List", "metadata":map[string]interface {}{"resourceVersion":""}}

@nacx (Member) commented Nov 18, 2025

/retest

Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
@nacx (Member) left a review comment:

Thanks!

@nacx nacx enabled auto-merge (squash) November 20, 2025 18:27
@nacx (Member) commented Nov 20, 2025

/retest

@nacx nacx merged commit 54301da into envoyproxy:main Nov 20, 2025
45 of 49 checks passed
hustxiayang pushed a commit to hustxiayang/ai-gateway that referenced this pull request Nov 21, 2025
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
Co-authored-by: Ignasi Barrera <ignasi@tetrate.io>
Signed-off-by: yxia216 <yxia216@bloomberg.net>
missBerg pushed a commit to missBerg/ai-gateway that referenced this pull request Dec 20, 2025
Signed-off-by: Sukumar Gaonkar <sgaonkar4@bloomberg.net>
Co-authored-by: Ignasi Barrera <ignasi@tetrate.io>
Signed-off-by: Erica Hughberg <erica.sundberg.90@gmail.com>

Labels

size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet


3 participants