
extproc: properly stream chat completions#468

Merged
mathetake merged 5 commits into main from streamingcorrectly
Mar 7, 2025

Conversation

@mathetake
Member

Commit Message

This fixes a bug in the extproc when handling stream=true requests. Previously, mode_override was set during the request body handling phase but not during the response headers phase. As a result, Envoy buffered the entire response body, which is clearly not ideal: from the client's point of view, the whole stream arrives at once instead of line by line. This refactors the mode override handling so it is set properly.

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
@mathetake mathetake marked this pull request as ready for review March 7, 2025 19:49
@mathetake mathetake requested a review from a team as a code owner March 7, 2025 19:49
Member Author


This is where the actual change took place: instead of setting the mode override in ProcessRequestBody, it is now set in ProcessResponseHeaders.
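A minimal sketch of the idea behind the fix. The type and function names below (BodySendMode, ProcessingMode, ProcessingResponse, processResponseHeaders) are hypothetical stand-ins for the Envoy ext_proc v3 proto types, not the actual code from this PR: on seeing a streaming response (text/event-stream), the response headers phase returns a mode override that switches the response body phase to streamed, so Envoy forwards chunks as they arrive instead of buffering the whole body.

```go
package main

import "fmt"

// BodySendMode mimics ext_proc's body send mode enum (hypothetical stand-in).
type BodySendMode int

const (
	BodySendModeBuffered BodySendMode = iota
	BodySendModeStreamed
)

// ProcessingMode and ProcessingResponse are simplified stand-ins for the
// corresponding Envoy ext_proc v3 messages.
type ProcessingMode struct {
	ResponseBodyMode BodySendMode
}

type ProcessingResponse struct {
	// ModeOverride, when non-nil, tells Envoy to change processing modes
	// for the rest of this stream.
	ModeOverride *ProcessingMode
}

// processResponseHeaders sketches the fixed behavior: decide on streaming
// at the response headers phase, where the upstream's content type is known.
func processResponseHeaders(contentType string) *ProcessingResponse {
	resp := &ProcessingResponse{}
	if contentType == "text/event-stream" {
		// stream=true response: override to streamed so chunks are
		// forwarded line by line rather than buffered.
		resp.ModeOverride = &ProcessingMode{ResponseBodyMode: BodySendModeStreamed}
	}
	return resp
}

func main() {
	fmt.Println(processResponseHeaders("text/event-stream").ModeOverride != nil) // true
	fmt.Println(processResponseHeaders("application/json").ModeOverride == nil)  // true
}
```

Setting the override at request body time (the pre-fix behavior) cannot work reliably, because Envoy applies the override going forward from where it is set; deciding at response headers time ties the mode to what the upstream actually returned.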

Contributor

@aabchoo aabchoo left a comment


overall looks good to me

tokenUsage LLMTokenUsage,
err error,
)

Contributor


Why remove this?

Member Author


Cause it's not used :)

@mathetake
Member Author

mathetake commented Mar 7, 2025

i am trying to write a regression test that actually verifies the streaming behavior

@mathetake
Member Author

ok the regression test added (verified it's failing on the current main)

@mathetake mathetake merged commit b7c658a into main Mar 7, 2025
15 checks passed
@mathetake mathetake deleted the streamingcorrectly branch March 7, 2025 21:38
mathetake added a commit that referenced this pull request Mar 8, 2025
**Commit Message**

This removes the low log levels accidentally enabled in #468 

**Related Issues/PRs (if applicable)**

Follow up on #468

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
aabchoo pushed a commit that referenced this pull request Mar 14, 2025
**Commit Message**

This fixes a bug in the extproc when handling stream=true requests.
Previously, mode_override was set during the request body handling phase
but not during the response headers phase. As a result, Envoy buffered
the entire response body, which is clearly not ideal: from the client's
point of view, the whole stream arrives at once instead of line by line.
This refactors the mode override handling so it is set properly.

---------

Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
aabchoo added a commit that referenced this pull request Mar 14, 2025
**Commit Message**

PR to backport `mockChatCompletionMetrics`, chat completion stream fix,
and openai content type.

Including:
- #459 (468 uses mock components introduced here)
- #468 
- #486

---------

Signed-off-by: Huamin Chen <hchen@redhat.com>
Signed-off-by: Ignasi Barrera <ignasi@tetrate.io>
Signed-off-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Signed-off-by: Aaron Choo <achoo30@bloomberg.net>
Co-authored-by: Ignasi Barrera <ignasi@tetrate.io>
Co-authored-by: Takeshi Yoneda <t.y.mathetake@gmail.com>
Co-authored-by: Dan Sun <dsun20@bloomberg.net>