
Revert "Implement return_hidden_states for the OpenAI API (#6137)"#6440

Merged
zhyncs merged 1 commit into main from zhyncs/revert on May 20, 2025

Conversation

@zhyncs
Collaborator

@zhyncs zhyncs commented May 20, 2025

This reverts commit 4f39bcf.


@zhyncs
Collaborator Author

zhyncs commented May 20, 2025

Hi @kyle-pena-kuzco @Qiaolin-Yu @CatherineSue This PR breaks test/srt/test_openai_server.py.

@zhyncs zhyncs merged commit b146555 into main May 20, 2025
1 of 37 checks passed
@zhyncs zhyncs deleted the zhyncs/revert branch May 20, 2025 01:21
@kyle-pena-kuzco
Contributor

> This PR breaks test/srt/test_openai_server.py.

Hi @zhyncs - thanks for the callout. We love the project and want to make sure our PRs meet the highest standards.

Could you help us understand which test case breaks and where you noticed the failure? That will help us pinpoint what you saw so we can address it. We are running test/srt/test_openai_server.py locally and all tests pass.

The only relevant GitHub Actions result we could find was here, but it looked like this failure may have been intermittent:
https://github.com/sgl-project/sglang/actions/runs/15107245347/job/42512558657#step:4:4522

@zhyncs
Collaborator Author

zhyncs commented May 20, 2025

> Could you help us understand which test case breaks and where you noticed the failure? We are running test/srt/test_openai_server.py locally and all tests pass.

Hi @kyle-pena-kuzco, are you running on an H100 or an H200? Could you try running on an H100?

@kyle-pena-kuzco
Contributor

> Are you running on an H100 or an H200? Could you try running on an H100?

Absolutely, we will try on an H100. We have been running our tests on a 4090.

Would you mind sharing what test failure you saw? That would help us to troubleshoot.

@zhyncs
Collaborator Author

zhyncs commented May 20, 2025

@BBuf can provide more detailed information.

@BBuf
Collaborator

BBuf commented May 20, 2025

> Would you mind sharing what test failure you saw? That would help us to troubleshoot.

In CUDA graph mode, memory usage is too high because each batch size captures its own CUDA graph and returns hidden states. We can set the cuda_graph_max_bs parameter to 8 in test/srt/test_openai_server.py on the H100 to avoid OOM; this does not affect accuracy.
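A hedged sketch of that workaround at the CLI level. The `--cuda-graph-max-bs` flag is assumed to be the command-line spelling of the `cuda_graph_max_bs` server argument, and the model path is a placeholder, not taken from this thread:

```shell
# Cap CUDA graph capture at batch size 8 to bound per-capture memory.
# <model> is a placeholder; substitute the model under test.
python -m sglang.launch_server \
  --model-path <model> \
  --cuda-graph-max-bs 8
```

In the test itself, the same setting would be passed through whatever extra-argument mechanism the test's server launcher exposes.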

@kyle-pena-kuzco
Contributor

kyle-pena-kuzco commented May 20, 2025

> In CUDA graph mode, memory usage is too high because each batch size captures its own CUDA graph and returns hidden states. We can set cuda_graph_max_bs to 8 in test/srt/test_openai_server.py on the H100 to avoid OOM; this does not affect accuracy.

I believe I understand the issue now.

As test_openai_server.py iterates through many test cases, return_hidden_states switches between on and off many times.

When return_hidden_states changes, it triggers a CUDA graph re-capture. This is by design. See:

> Note that each time you change the `return_hidden_states` parameter,

It looks like the old CUDA graph captures are not removed from memory, and as a result, available memory decreases over time, leading to eventual OOM.

Here is a screen capture demonstrating the available memory decreasing after every CUDA graph recapture:

So, I think the core issue is:
(a) Old CUDA graphs are not destroyed
(b) Requesting hidden states triggers a CUDA graph re-capture

If either of those problems is solved, I think that might resolve the issue.

@BBuf is this the problem that you encountered?
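The leak pattern described above can be sketched without a GPU. All names here are hypothetical illustrations, not SGLang internals: each re-capture allocates fresh graph buffers, and if stale captures are never freed, "device" memory grows with every toggle of `return_hidden_states`, while freeing old captures keeps it flat.

```python
# GPU-free sketch of the suspected leak: re-capturing without destroying
# old graphs accumulates memory; evicting stale captures (fix (a)) does not.

GRAPH_MB = 512  # assumed footprint of one capture's static buffers


class GraphRunner:
    def __init__(self, free_stale_captures: bool):
        self.free_stale_captures = free_stale_captures
        self.live_captures = []  # captures still holding "device" memory

    def capture(self, return_hidden_states: bool):
        if self.free_stale_captures:
            self.live_captures.clear()  # destroy old graphs before re-capture
        self.live_captures.append(return_hidden_states)

    def live_mb(self) -> int:
        return len(self.live_captures) * GRAPH_MB


leaky = GraphRunner(free_stale_captures=False)
fixed = GraphRunner(free_stale_captures=True)
for i in range(10):  # test cases toggling return_hidden_states on and off
    flag = bool(i % 2)
    leaky.capture(flag)
    fixed.capture(flag)

print(leaky.live_mb())  # grows with every toggle: 5120
print(fixed.live_mb())  # stays at one capture: 512
```

Under this model, either destroying old graphs on re-capture or avoiding the re-capture entirely keeps memory bounded, matching points (a) and (b) above.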



3 participants