[model-gateway] Add e2e tests of streaming events and tool choice for response api #13880

Merged
slin1237 merged 10 commits into sgl-project:main from
XinyueZhang369:xinyue/response-api-e2e-tests
Dec 1, 2025

Conversation

@XinyueZhang369

@XinyueZhang369 XinyueZhang369 commented Nov 25, 2025

Motivation

This PR adds more e2e integration test cases for the Responses API on the gRPC backend.

Modifications

  • Add streaming event tests for output_index, OutputItemDone, and reasoning content in output array
  • Add tool choice tests
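The output_index check described above can be sketched roughly as follows. This is a minimal illustration, not the test code from this PR: the event type strings assume an OpenAI-style Responses streaming schema, and `check_output_indices` and `sample_events` are hypothetical names.

```python
from typing import Iterable

# Assumed event type names (OpenAI-style Responses streaming schema).
ITEM_ADDED = "response.output_item.added"
ITEM_DONE = "response.output_item.done"


def check_output_indices(events: Iterable[dict]) -> list[int]:
    """Return the output_index values in the order items were added.

    Raises AssertionError if the indices are not 0, 1, 2, ... or if the set
    of items with a done event differs from the set of added items.
    """
    added: list[int] = []
    done: set[int] = set()
    for event in events:
        if event.get("type") == ITEM_ADDED:
            added.append(event["output_index"])
        elif event.get("type") == ITEM_DONE:
            done.add(event["output_index"])
    assert added == list(range(len(added))), f"indices not zero-based: {added}"
    assert done == set(added), f"added/done mismatch: {sorted(done ^ set(added))}"
    return added


# Hypothetical stream: a reasoning item followed by a message item.
sample_events = [
    {"type": ITEM_ADDED, "output_index": 0, "item": {"type": "reasoning"}},
    {"type": ITEM_DONE, "output_index": 0, "item": {"type": "reasoning"}},
    {"type": ITEM_ADDED, "output_index": 1, "item": {"type": "message"}},
    {"type": ITEM_DONE, "output_index": 1, "item": {"type": "message"}},
]
indices = check_output_indices(sample_events)
```

A stream whose first item starts at output_index 1 would fail the first assertion, which is the regression these tests are meant to catch.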

Test result

[Screenshot of test results, 2025-11-24 7:28:41 PM]


@gemini-code-assist
Contributor

Summary of Changes

Hello @XinyueZhang369, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness of the model-gateway's Response API by introducing comprehensive end-to-end integration tests. These new tests validate critical functionalities such as the accurate handling of streaming events, including zero-based indexing and event completion, and the correct behavior of the tool_choice parameter across various configurations and tool types. This ensures the API's reliability and predictability for both streaming responses and advanced tool integration.

Highlights

  • Streaming Event Validation: New end-to-end tests have been added to validate the behavior of streaming events in the Response API, ensuring correct zero-based output_index handling, proper emission of OutputItemDone events, and accurate construction of the output array in completed responses.
  • Tool Choice Parameter Testing: Comprehensive end-to-end tests for the tool_choice parameter have been introduced, covering various scenarios such as auto, required, and specific function choices, as well as verifying its functionality in streaming mode and with mixed tool types (function and MCP tools).
  • Harmony Backend Specific Tests: Dedicated tests for the Harmony backend ensure that reasoning content within streaming events correctly adheres to zero-based output_index and is properly included in the final output array of completed responses.
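
As a rough sketch of the "output array in completed responses" check, the snippet below compares the items seen in done events against the output array of the final completed event. The event shapes assume an OpenAI-style Responses streaming schema, and `output_matches_stream` and `stream` are hypothetical names, not helpers from this PR.

```python
def output_matches_stream(events: list[dict]) -> bool:
    """Check that the completed response lists every streamed item, in index order."""
    # Items announced as done, keyed by their output_index.
    done_items = {
        e["output_index"]: e["item"]
        for e in events
        if e.get("type") == "response.output_item.done"
    }
    completed = [e for e in events if e.get("type") == "response.completed"]
    if len(completed) != 1:
        return False
    final_output = completed[0]["response"]["output"]
    streamed = [done_items[i] for i in sorted(done_items)]
    return [i["type"] for i in final_output] == [i["type"] for i in streamed]


# Hypothetical stream where a reasoning item precedes the message item.
stream = [
    {"type": "response.output_item.done", "output_index": 0,
     "item": {"type": "reasoning"}},
    {"type": "response.output_item.done", "output_index": 1,
     "item": {"type": "message"}},
    {"type": "response.completed",
     "response": {"output": [{"type": "reasoning"}, {"type": "message"}]}},
]
matches = output_matches_stream(stream)
```

The comparison is by item type only, which keeps the sketch short; a real test would likely also compare item ids or content.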

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request adds valuable end-to-end tests for streaming events and the tool_choice parameter in the response API. The tests are well-structured and cover a good range of scenarios, including different backends and edge cases like mixed tool types. My main feedback is focused on improving the maintainability of the new test file test_tool_choice.py by refactoring duplicated and inconsistent tool definitions into shared constants. This will make the tests cleaner and easier to manage in the future.
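
One possible shape for the shared constants is sketched below. The tool names, schemas, model placeholder, and the `make_payload` helper are all hypothetical, and the `tool_choice` shapes assume an OpenAI-style Responses request format rather than anything confirmed by test_tool_choice.py.

```python
import copy

# Hypothetical function tools; names and schemas are illustrative only.
GET_WEATHER_TOOL = {
    "type": "function",
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}
SEARCH_TOOL = {
    "type": "function",
    "name": "search",
    "description": "Search the web for a query.",
    "parameters": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}
ALL_FUNCTION_TOOLS = (GET_WEATHER_TOOL, SEARCH_TOOL)


def make_payload(tool_choice, tools=ALL_FUNCTION_TOOLS, stream=False) -> dict:
    """Build a request body; deep-copies tools so tests cannot mutate the constants."""
    return {
        "model": "placeholder-model",  # assumption: replaced by the real model id
        "input": "What is the weather in Paris?",
        "tools": copy.deepcopy(list(tools)),
        "tool_choice": tool_choice,
        "stream": stream,
    }


auto_payload = make_payload("auto")
required_payload = make_payload("required")
specific_payload = make_payload(
    {"type": "function", "name": GET_WEATHER_TOOL["name"]}, stream=True
)
```

Each test then differs only in the `tool_choice` argument, so an inconsistent tool definition can only come from one place.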

Comment thread sgl-router/py_test/e2e_response_api/features/test_tool_choice.py Outdated
Collaborator

@key4ng key4ng left a comment


overall lgtm. noticed the ci running time increased to around 8 min. currently, does the backend have to restart every time we add a new test class?

Comment thread sgl-router/py_test/e2e_response_api/features/test_streaming_events.py Outdated
Comment thread sgl-router/py_test/e2e_response_api/features/test_streaming_events.py Outdated
Comment thread sgl-router/py_test/e2e_response_api/features/test_streaming_events.py Outdated
Comment thread sgl-router/py_test/e2e_response_api/features/test_tool_choice.py Outdated
@XinyueZhang369
Author

overall lgtm. noticed the ci running time increased to around 8 min. currently, does the backend have to restart every time we add a new test class?

Sadly yes. I merged the MCP, function call, and tool choice tests into one test class to save some time.

@XinyueZhang369
Author

Also, I noticed that some tests, like test_basic_function_call, can be a bit flaky. I'm thinking about adding retries for all Responses e2e tests. What do you think?
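
A retry wrapper along these lines is one way this could look. It is only a sketch: the `retry` decorator, its parameters, and `flaky_check` are hypothetical, and the PR does not say which retry mechanism would be used.

```python
import functools
import time


def retry(times: int = 3, delay: float = 0.0, exceptions=(AssertionError,)):
    """Re-run a test function up to `times` attempts before letting it fail."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == times:
                        raise  # out of attempts: surface the last failure
                    time.sleep(delay)

        return wrapper

    return decorator


calls = {"count": 0}


@retry(times=3)
def flaky_check() -> str:
    """Simulated flaky test: fails on the first two attempts."""
    calls["count"] += 1
    assert calls["count"] >= 3, "simulated flaky failure"
    return "ok"


outcome = flaky_check()
```

Retrying only on `AssertionError` by default keeps setup errors (for example, a backend that never started) from being masked by repeated attempts.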

Comment thread scripts/ci/ci_install_dependency.sh Outdated
@key4ng
Collaborator

key4ng commented Dec 1, 2025

There is a CI workflow change. May need @slin1237's approval.

slin1237 merged commit 1d66a14 into sgl-project:main on Dec 1, 2025
55 checks passed
XinyueZhang369 deleted the xinyue/response-api-e2e-tests branch on December 2, 2025 00:10
harvenstar pushed a commit to harvenstar/sglang that referenced this pull request Dec 4, 2025
… response api (sgl-project#13880)

Co-authored-by: Simo Lin <linsimo.mark@gmail.com>
tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025
… response api (sgl-project#13880)

Co-authored-by: Simo Lin <linsimo.mark@gmail.com>
yuchengz816-bot pushed a commit to yuchengz816-bot/sglang that referenced this pull request Dec 8, 2025
… response api (sgl-project#13880)

Co-authored-by: Simo Lin <linsimo.mark@gmail.com>
3 participants