Add return routed experts to the completions and chat/completions endpoints #17434
Conversation
Summary of Changes (Gemini Code Assist): This pull request enhances the API by returning routed expert information directly within the OpenAI-compatible chat and completion endpoints. This is useful for applications like Reinforcement Learning (RL) training, which benefit from capturing expert routing data during model rollouts. By making this data accessible through the standard completion interfaces, the change streamlines the development of agentic harnesses and improves the observability of MoE model behavior.
Code Review
The pull request successfully integrates the return_routed_experts parameter across the completions and chat/completions endpoints. This includes updating protocol classes, adding utility functions for processing routed experts, and extending streaming support. The documentation has been updated with usage examples, and comprehensive tests have been added for all three relevant endpoints. The changes are well-structured and maintain code quality.
/tag-and-rerun-ci

/rerun-failed-ci

Tests are green
Motivation
For RL training, it is highly desirable to capture the routed experts during the rollout. Currently, only the /generate endpoint returns the routed experts data, but most agentic harnesses depend on the completions endpoints, so it makes sense to return this data from them when requested. The response format is identical to that of the /generate endpoint.
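A minimal client-side sketch of how a request might opt into this data. The `return_routed_experts` request field and the `--enable-return-routed-experts` server flag come from this PR; the model name, prompt, and server URL below are illustrative.

```python
import json

def build_completion_request(prompt: str, return_routed_experts: bool = True) -> dict:
    """Build a /v1/completions payload that asks for routed-experts data."""
    return {
        "model": "default",          # illustrative model name
        "prompt": prompt,
        "max_tokens": 16,
        # SGLang-specific extension field added by this PR:
        "return_routed_experts": return_routed_experts,
    }

payload = build_completion_request("The capital of France is")
print(json.dumps(payload, indent=2))

# Against a server launched with --enable-return-routed-experts, this payload
# could then be sent with e.g.:
#   requests.post("http://localhost:30000/v1/completions", json=payload)
```

The same extra field applies to /v1/chat/completions, per the endpoints this PR covers.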
Modifications
Testing
Launched server with `--enable-return-routed-experts` and verified a valid response with:
- regular response mode
- streaming response mode

Launched server without `--enable-return-routed-experts` and ran the verification as above.

Accuracy Tests
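A sketch of the client-side check used in verification like the above: pull the routed-experts data out of a completion response. The response shape below is a mocked assumption based on the statement that the format matches the /generate endpoint; the `meta_info` / `routed_experts` field names are illustrative, so check the actual server response for the exact layout.

```python
# Mocked non-streaming completion response (illustrative only):
mock_response = {
    "choices": [
        {
            "text": " Paris",
            "index": 0,
            # Assumed per-token lists of selected expert IDs for an MoE model:
            "meta_info": {"routed_experts": [[3, 17], [5, 42]]},
        }
    ]
}

def extract_routed_experts(response: dict) -> list:
    """Collect the per-choice routed-experts data, if present."""
    return [
        choice.get("meta_info", {}).get("routed_experts")
        for choice in response.get("choices", [])
    ]

print(extract_routed_experts(mock_response))  # one entry per choice
```

When the server is launched without the flag, a check like this would see `None` entries, which matches the negative test above.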
NA
Benchmarking and Profiling
NA
Checklist
Review Process
/tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci