Implement `return_hidden_states` for the OpenAI API #6137

zhyncs merged 16 commits into sgl-project:main

Conversation
@zhyncs - let me know if you have any feedback! We'd like to get this feature merged.
@Qiaolin-Yu could you take a look please?
Sure. Very happy to help. |
thank you @Qiaolin-Yu @zhaochenyang20 @yinfan98 <3 |
will merge it, no need to rebase |
nice work @kyle-pena-kuzco |
@kyle-pena-kuzco May you help resolve the conflicts? Thanks. cc @CatherineSue @ispobock @Qiaolin-Yu |
Yes. Starting on that now. I'll comment when completed.
@zhyncs - I've fixed the merge conflicts. |
Motivation
The native API supports returning hidden states from the model. This PR implements the same feature for the OpenAI-compatible API.
Returning hidden states is important for model verification and diagnostics. Inference providers that use SGLang as a backend route requests in the OpenAI format, so if a provider would like to do internal diagnostics and verification, it is much more straightforward to simply include `return_hidden_states`.

This PR also fixes #5761 and adds appropriate test coverage for that bug.
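As a sketch of what an OpenAI-format request with the new flag might look like (the payload shape here is an assumption based on the PR description, not the exact schema; the model name is a placeholder):

```python
import json

# Hypothetical /v1/completions payload including the new flag; exactly
# where `return_hidden_states` sits in the request schema is an assumption.
payload = {
    "model": "my-model",           # placeholder model name
    "prompt": "Hello, world",
    "max_tokens": 8,
    "return_hidden_states": True,  # flag added by this PR
}

body = json.dumps(payload)
```

A provider could then inspect the `hidden_states` field on each choice in the response.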
Modifications
Changes were made to `protocol.py` to support the `return_hidden_states` flag, as well as to return `hidden_states` on `/v1/completions` and `/v1/chat/completions` for both streaming and non-streaming responses. If no hidden states are requested, the `hidden_states` property is omitted from the response instead of being included as a null field. That way, the responses are completely backwards compatible.
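The omit-when-absent behavior could look roughly like this (the function and field names are illustrative, not the actual adapter code):

```python
from typing import Any, Optional


def build_choice(text: str, hidden_states: Optional[list] = None) -> dict:
    """Illustrative sketch: attach `hidden_states` to a response choice only
    when the caller requested them, so clients that never ask for hidden
    states see an unchanged (backwards-compatible) response shape."""
    choice: dict = {"index": 0, "text": text}
    if hidden_states is not None:
        choice["hidden_states"] = hidden_states
    return choice
```

Serializing the dict only when the key is present avoids emitting a `"hidden_states": null` field for existing clients.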
The adapter was changed to include hidden states in responses when requested.
The dictionary `n_prev_tokens` was not being updated in `v1_chat_completions` for streaming responses, causing the same top logprobs to be repeated in every chunk. This has been fixed as well.

Checklist
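The streaming-logprobs fix mentioned under Modifications can be sketched as follows (names are illustrative; this is not the actual adapter code):

```python
def take_new_logprobs(all_top_logprobs: list, n_prev_tokens: dict, index: int) -> list:
    """Illustrative sketch of the fix: emit only the logprobs for tokens
    that have not been streamed yet, then advance the per-request counter.
    Without updating `n_prev_tokens`, every chunk would re-send the top
    logprobs from the start of the sequence."""
    start = n_prev_tokens.get(index, 0)
    new = all_top_logprobs[start:]
    n_prev_tokens[index] = len(all_top_logprobs)
    return new
```

Each streamed chunk then carries only the logprobs for its newly generated tokens.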