Add tiny Qwen3.5 Think/NoThink fixture generation scripts #5819
Merged
qgallouedec merged 2 commits on May 22, 2026
Updated from 888e7fa to e415ce2.
Member: Thanks, can you remove `qwen3_5_for_conditional_generation.py` as well?
Updated from e415ce2 to 76e7fb7.
Contributor (Author): Done.
Member: Everywhere in the codebase, you should rename
Contributor (Author): My initial plan was to land the fixtures here and split the test rename into a follow-up PR (PR 3 in the roadmap above), gated on a separate proposal issue. But if you'd prefer to bundle the codebase-wide
qgallouedec approved these changes on May 22, 2026
This was referenced May 22, 2026
What does this PR do?
Adds two new tiny-model fixture generation scripts for Qwen3.5: sibling `-Think` and `-NoThink` variants alongside the existing `tiny-Qwen3_5ForConditionalGeneration` fixture. Each is sourced from the Qwen3.5 release whose bundled tokenizer ships the corresponding default thinking behavior: `Qwen/Qwen3.5-4B` for the thinking-enabled variant and `Qwen/Qwen3.5-0.8B` for the thinking-disabled variant.

The architecture class, tiny-config overrides, fp32 linear-attn cast, and `_common.py` push flow are unchanged from the existing script; the only differences are `MODEL_ID` and the `suffix` argument to `push_to_hub`.

Note: scope and roadmap
Strictly additive: scripts only. The existing `tiny-Qwen3_5ForConditionalGeneration` fixture is untouched, and no tests reference the new fixtures yet. Planned follow-ups:

Note: push needs to be run by a maintainer
I don't have write access to `trl-internal-testing`. Per the agreement in #5471, @qgallouedec will run `push_to_hub` to materialize the fixtures. The Colab dry-runs below show that both scripts build cleanly through `print_config_diff`.

Validation (Colab A100, dry-run)
NoThink (sourced from `Qwen/Qwen3.5-0.8B`):

Think (sourced from `Qwen/Qwen3.5-4B`):

Both runs: the smoke test passes, the dtype pattern matches the reference checkpoint, and the config diff shows only the intended tiny overrides (no key leakage). The extra `num_key_value_heads` line in the Think run reflects a genuine upstream difference between 0.8B and 4B: the 4 → 2 override only shows up as a diff against the 4B source. The runs also confirm that the existing `linear_attn.A_log` / `linear_attn.norm.weight` fp32 cast block applies to both sources without modification.

Changes
- `scripts/generate_tiny_models/for_conditional_generation/qwen3_5_for_conditional_generation_think.py`: new script, `MODEL_ID="Qwen/Qwen3.5-4B"`, pushes as `tiny-Qwen3_5ForConditionalGeneration-Think`.
- `scripts/generate_tiny_models/for_conditional_generation/qwen3_5_for_conditional_generation_nothink.py`: new script, `MODEL_ID="Qwen/Qwen3.5-0.8B"`, pushes as `tiny-Qwen3_5ForConditionalGeneration-NoThink`.
- `qwen3_5_for_conditional_generation.py`: removed, as requested in review.

Part of #5471
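The dry-run checks reported under Validation (config diff restricted to the intended overrides, fp32 cast on the linear-attention parameters, dtype pattern matching the reference) can be approximated in plain Python. This is a hedged sketch, not the logic of `print_config_diff` or `_common.py`: configs are plain dicts, dtypes are strings, `config_diff`, `leaked_keys`, `cast_dtypes`, and `dtype_pattern` are hypothetical helpers, and every config value and parameter name is a toy placeholder except the `num_key_value_heads` 4 → 2 override and the two fp32 suffixes taken from this PR.

```python
import re

# Toy subset of the tiny overrides (assumption: the real scripts override more keys)
TINY_OVERRIDES = {"num_hidden_layers": 2, "num_key_value_heads": 2}
ALLOWED_DIFF_KEYS = set(TINY_OVERRIDES)  # keys expected to differ from the source config

# Parameter-name suffixes kept in fp32, as listed in the PR description
FP32_SUFFIXES = ("linear_attn.A_log", "linear_attn.norm.weight")


def config_diff(source: dict, tiny: dict) -> dict:
    """Map every key whose value differs (or exists on one side only) to (source, tiny)."""
    keys = source.keys() | tiny.keys()
    return {k: (source.get(k), tiny.get(k)) for k in sorted(keys) if source.get(k) != tiny.get(k)}


def leaked_keys(diff: dict) -> set:
    """Keys that changed but are not intended tiny overrides."""
    return set(diff) - ALLOWED_DIFF_KEYS


def cast_dtypes(param_names, default: str = "bfloat16") -> dict:
    """Target dtype per parameter: fp32 for the linear-attention suffixes, `default` otherwise."""
    return {n: "float32" if n.endswith(FP32_SUFFIXES) else default for n in param_names}


def dtype_pattern(param_dtypes: dict) -> dict:
    """Replace layer indices with N so a 2-layer fixture can be compared with a deeper checkpoint."""
    return {re.sub(r"\.\d+\.", ".N.", n): d for n, d in param_dtypes.items()}


# Toy source configs; only the 4B value of num_key_value_heads (4) comes from the PR text
src_08b = {"num_hidden_layers": 8, "num_key_value_heads": 2, "vocab_size": 1000}
src_4b = {"num_hidden_layers": 8, "num_key_value_heads": 4, "vocab_size": 1000}

for label, src in [("NoThink", src_08b), ("Think", src_4b)]:
    diff = config_diff(src, {**src, **TINY_OVERRIDES})
    assert not leaked_keys(diff), leaked_keys(diff)
    print(label, diff)

# Hypothetical parameter names and a toy reference dtype map
names = [
    "model.layers.0.linear_attn.A_log",
    "model.layers.0.linear_attn.norm.weight",
    "model.layers.0.mlp.up_proj.weight",
]
reference = dtype_pattern({
    "model.layers.0.linear_attn.A_log": "float32",
    "model.layers.23.linear_attn.A_log": "float32",
    "model.layers.0.linear_attn.norm.weight": "float32",
    "model.layers.0.mlp.up_proj.weight": "bfloat16",
})
# Every (pattern, dtype) pair of the tiny model must also appear in the reference
assert dtype_pattern(cast_dtypes(names)).items() <= reference.items()
```

With these toy configs the Think diff gains a `num_key_value_heads: (4, 2)` entry that the NoThink diff lacks, which is the same shape of difference the Think dry-run reports.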
cc: @qgallouedec
Note
Low Risk
Low risk: adds/adjusts standalone fixture-generation scripts only, with no runtime/library behavior changes. Main risk is accidental fixture repo naming/sourcing mistakes when pushing to the Hub.
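One way to reduce the naming/sourcing risk noted above is to keep each variant's source checkpoint, suffix, and expected default thinking behavior in a single table and derive the repo id from it. A minimal sketch, assuming the fixtures are pushed under `trl-internal-testing` as `tiny-<Architecture>-<Suffix>` (the names used in the Changes list); `FixtureVariant`, `VARIANTS`, `fixture_repo_id`, and `check_variants` are hypothetical and not part of `_common.py`:

```python
from dataclasses import dataclass

ORG = "trl-internal-testing"  # assumption: org the fixtures are pushed to
ARCH = "Qwen3_5ForConditionalGeneration"


@dataclass(frozen=True)
class FixtureVariant:
    model_id: str  # upstream checkpoint whose bundled tokenizer sets the default thinking behavior
    suffix: str  # appended to the fixture repo name
    thinking_default: bool  # expected default thinking behavior of that tokenizer


# Hypothetical table mirroring the two new scripts
VARIANTS = [
    FixtureVariant("Qwen/Qwen3.5-4B", "Think", thinking_default=True),
    FixtureVariant("Qwen/Qwen3.5-0.8B", "NoThink", thinking_default=False),
]


def fixture_repo_id(variant: FixtureVariant, org: str = ORG) -> str:
    """Build the Hub repo id a variant would be pushed to."""
    return f"{org}/tiny-{ARCH}-{variant.suffix}"


def check_variants(variants: list) -> None:
    """Raise ValueError if a suffix contradicts its declared thinking default or repo ids collide."""
    for v in variants:
        expected = "Think" if v.thinking_default else "NoThink"
        if v.suffix != expected:
            raise ValueError(f"{v.model_id}: suffix {v.suffix!r}, expected {expected!r}")
    repo_ids = [fixture_repo_id(v) for v in variants]
    if len(set(repo_ids)) != len(repo_ids):
        raise ValueError(f"duplicate fixture repo ids: {repo_ids}")


check_variants(VARIANTS)
for v in VARIANTS:
    print(f"{v.model_id} -> {fixture_repo_id(v)}")
```

A suffix that disagrees with the declared thinking behavior, or two variants mapping to the same repo id, raises `ValueError` before anything would be pushed.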
Overview
Adds a new tiny-model generation script `qwen3_5_for_conditional_generation_think.py` that builds the same 2-layer Qwen3.5 conditional-generation fixture, but sourced from `Qwen/Qwen3.5-4B` and pushed with the `-Think` suffix.

Updates the existing `qwen3_5_for_conditional_generation_nothink.py` script to document the `0.8B` source choice and to push the generated fixture with the `-NoThink` suffix, creating explicit sibling fixtures for tokenizer default thinking behavior.

Reviewed by Cursor Bugbot for commit 76e7fb7. Bugbot is set up for automated code reviews on this repo.