Fix gpt temporary patch for grpo to happen after compile #536
Summary of Changes
This pull request resolves an issue where a GPT OSS GRPO patch was executing prematurely, which prevented the compiler from applying fused loss optimizations. By implementing a new phasing mechanism, the patch can now be timed to run after compilation, keeping it compatible with compiler optimizations and preserving efficient memory usage during training.
Code Review
This pull request introduces a mechanism to control the timing of the gpt_oss GRPO patch application by introducing a phase parameter. The patch is now set to run only during the post_compile phase, which allows the compiler to perform optimizations like fused loss before the model's forward method is altered. Additionally, an idempotency check has been added to prevent the patch from being applied multiple times, addressing an issue where it was reportedly running three times. The changes are clear, well-implemented, and align with the goal of improving compiler optimization compatibility.
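The idempotency check mentioned in the review could look roughly like the sketch below. This is not the PR's actual code: `TinyModel`, `_PATCH_MARKER`, and `patch_forward` are hypothetical names, and the approach shown (tagging the replacement `forward` with a marker attribute) is just one common way to make a monkey patch safe to call repeatedly.

```python
class TinyModel:
    """Stand-in for a model class; not the real gpt_oss model."""

    def forward(self, x):
        return x + 1


# Hypothetical marker attribute used to detect an already-patched forward.
_PATCH_MARKER = "_grpo_patch_applied"


def patch_forward(model_cls):
    """Replace model_cls.forward once; later calls are no-ops.

    Returns True if the patch was applied, False if it was skipped.
    """
    if getattr(model_cls.forward, _PATCH_MARKER, False):
        return False  # already patched, do not wrap again

    original = model_cls.forward

    def patched_forward(self, x):
        # Placeholder behavior change standing in for the real patch.
        return original(self, x) * 2

    setattr(patched_forward, _PATCH_MARKER, True)
    model_cls.forward = patched_forward
    return True


# Simulate the patch being invoked three times.
results = [patch_forward(TinyModel) for _ in range(3)]
output = TinyModel().forward(1)
```

Without the guard, three invocations would wrap `forward` three times and compound the change; with it, only the first call has any effect.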
The gpt_oss GRPO patch in temporary_patches runs three times. Because it runs before the compiler, the compiler doesn't match the CausalLM pattern, so we lose the connection to the Unsloth fused loss. This PR proposes phases for temporary patches so that each patch can determine when it runs.
As currently set up, it defaults to the existing behavior unless the corresponding Unsloth PR is merged as well.
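A minimal sketch of how phased patches with a backward-compatible default might be organized is shown below. It is an illustration under assumptions, not the PR's implementation: the registry `_PATCHES`, the functions `register_patch` and `run_patches`, and the `"init"` phase name are all hypothetical; only the `post_compile` phase name comes from the review above.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical registry mapping a phase name to the patches for that phase.
_PATCHES: Dict[str, List[Callable[[], None]]] = {"init": [], "post_compile": []}


def register_patch(fn: Callable[[], None], phase: str = "init") -> None:
    """Register a patch function under a phase (assumed default: 'init')."""
    _PATCHES[phase].append(fn)


def run_patches(phase: Optional[str] = None) -> List[str]:
    """Run patches for one phase and return the names of those that ran.

    phase=None models the legacy caller that knows nothing about phases:
    every registered patch runs, matching the old behavior.
    """
    phases = list(_PATCHES) if phase is None else [phase]
    ran = []
    for p in phases:
        for fn in _PATCHES[p]:
            fn()
            ran.append(fn.__name__)
    return ran


def patch_other():
    pass  # placeholder for a patch that is safe to run early


def patch_gpt_oss_grpo():
    pass  # placeholder for the patch that must wait for compilation


register_patch(patch_other)
register_patch(patch_gpt_oss_grpo, phase="post_compile")

legacy = run_patches()                      # old caller: everything runs
before_compile = run_patches("init")        # phase-aware caller, early
after_compile = run_patches("post_compile") # phase-aware caller, late
```

In this arrangement a caller that never passes a phase keeps the old behavior, while a phase-aware caller can defer the GRPO patch until after compilation.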