Add tiny Qwen3.5 Think/NoThink fixture generation scripts by aazizyan · Pull Request #5819 · huggingface/trl

aazizyan · 2026-05-22T14:28:47Z

What does this PR do?

Adds two new tiny-model fixture generation scripts for Qwen3.5: sibling -Think and -NoThink variants alongside the existing tiny-Qwen3_5ForConditionalGeneration fixture. Each is sourced from a Qwen3.5 release checkpoint whose bundled tokenizer ships the corresponding default thinking behavior: Qwen/Qwen3.5-4B for the thinking-enabled variant, Qwen/Qwen3.5-0.8B for the thinking-disabled variant.

The architecture class, tiny-config overrides, fp32 linear-attn cast, and _common.py push flow are unchanged from the existing script — the only differences are MODEL_ID and the suffix arg to push_to_hub.
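As a rough illustration of how small that delta is, the two scripts can be read as one template parameterized by two values. The sketch below is not the scripts' actual code: `FixtureVariant`, `ORG`, `BASE_NAME`, and `fixture_repo_id` are hypothetical names introduced here, and it assumes the suffix is simply appended to the existing fixture name under the trl-internal-testing organization.

```python
from dataclasses import dataclass

# Hypothetical constants for illustration; the real scripts rely on _common.py.
ORG = "trl-internal-testing"
BASE_NAME = "tiny-Qwen3_5ForConditionalGeneration"


@dataclass(frozen=True)
class FixtureVariant:
    model_id: str  # upstream checkpoint the tokenizer and config come from
    suffix: str    # appended to the fixture name when pushing


VARIANTS = {
    "think": FixtureVariant(model_id="Qwen/Qwen3.5-4B", suffix="Think"),
    "nothink": FixtureVariant(model_id="Qwen/Qwen3.5-0.8B", suffix="NoThink"),
}


def fixture_repo_id(variant: FixtureVariant) -> str:
    """Build the Hub repo id, assuming a plain '-<suffix>' naming scheme."""
    return f"{ORG}/{BASE_NAME}-{variant.suffix}"


for key, variant in VARIANTS.items():
    print(f"{key}: {variant.model_id} -> {fixture_repo_id(variant)}")
```

Keeping everything else identical between the two scripts means any behavioral difference between the fixtures should trace back to the source checkpoint alone.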

Note: scope and roadmap

Strictly additive — scripts only. The existing tiny-Qwen3_5ForConditionalGeneration fixture is untouched, and no tests reference the new fixtures yet. Planned follow-ups:

PR 2: add Qwen3.5 training templates corresponding to the two variants.
PR 3: migrate existing tests off the legacy fixture and retire it gracefully. This is gated on a separate issue: I'll open a proposal for the test refactor against the new fixtures, and work on PR 3 starts only once maintainers approve the refactor plan in that issue.

Note: push needs to be run by a maintainer

I don't have write access to trl-internal-testing. Per the agreement in #5471, @qgallouedec will run push_to_hub to materialize the fixtures. The Colab dry-runs below show both scripts build cleanly through print_config_diff.

Validation (Colab A100, dry-run)

NoThink (sourced from Qwen/Qwen3.5-0.8B):

[smoke_test] Qwen3_5ForConditionalGeneration: OK (output shape (2, 82, 248320))
[dtype_check] Qwen/Qwen3.5-0.8B: all matched tensors have the reference dtype
[config_diff] Qwen/Qwen3.5-0.8B vs tiny (10 differences)
  text_config.full_attention_interval              4                                  → 2
  text_config.hidden_size                          1024                               → 16
  text_config.layer_types                          ['linear_attention', 'linear_atten → ['linear_attention', 'full_attenti
  text_config.num_attention_heads                  8                                  → 4
  text_config.num_hidden_layers                    24                                 → 2
  vision_config.depth                              12                                 → 2
  vision_config.hidden_size                        768                                → 16
  vision_config.intermediate_size                  3072                               → 32
  vision_config.num_heads                          12                                 → 4
  vision_config.out_hidden_size                    1024                               → 16

Think (sourced from Qwen/Qwen3.5-4B):

[smoke_test] Qwen3_5ForConditionalGeneration: OK (output shape (2, 80, 248320))
[dtype_check] Qwen/Qwen3.5-4B: all matched tensors have the reference dtype
[config_diff] Qwen/Qwen3.5-4B vs tiny (11 differences)
  text_config.full_attention_interval              4                                  → 2
  text_config.hidden_size                          2560                               → 16
  text_config.layer_types                          ['linear_attention', 'linear_atten → ['linear_attention', 'full_attenti
  text_config.num_attention_heads                  16                                 → 4
  text_config.num_hidden_layers                    32                                 → 2
  text_config.num_key_value_heads                  4                                  → 2
  vision_config.depth                              24                                 → 2
  vision_config.hidden_size                        1024                               → 16
  vision_config.intermediate_size                  4096                               → 32
  vision_config.num_heads                          16                                 → 4
  vision_config.out_hidden_size                    2560                               → 16

Both runs: the smoke test passes, the dtype pattern matches the reference checkpoint, and the config diff shows only the intended tiny overrides (no key leakage). The extra num_key_value_heads line in the Think run reflects a genuine upstream difference between 0.8B and 4B: the 4 → 2 override only shows up against the 4B source. The runs also confirm that the existing linear_attn.A_log / linear_attn.norm.weight fp32 cast block applies to both sources without modification.
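The one-line gap between the two diffs can be reproduced with a small sketch. It is an approximation, not the real print_config_diff helper: `flatten` and `config_diff` are hypothetical names, only a few text_config keys are included, and the 0.8B value for num_key_value_heads is an assumption inferred from the key's absence in the NoThink diff.

```python
def flatten(config: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys, e.g. 'text_config.hidden_size'."""
    flat = {}
    for key, value in config.items():
        dotted = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{dotted}."))
        else:
            flat[dotted] = value
    return flat


def config_diff(reference: dict, tiny: dict) -> dict:
    """Return {key: (reference_value, tiny_value)} for keys whose values differ.

    A key missing on one side is reported with None for that side.
    """
    ref_flat, tiny_flat = flatten(reference), flatten(tiny)
    keys = sorted(ref_flat.keys() | tiny_flat.keys())
    return {
        key: (ref_flat.get(key), tiny_flat.get(key))
        for key in keys
        if ref_flat.get(key) != tiny_flat.get(key)
    }


# Subset of text_config values taken from the dry-run logs.
TINY = {"text_config": {"hidden_size": 16, "num_attention_heads": 4,
                        "num_hidden_layers": 2, "num_key_value_heads": 2}}
REF_4B = {"text_config": {"hidden_size": 2560, "num_attention_heads": 16,
                          "num_hidden_layers": 32, "num_key_value_heads": 4}}
# num_key_value_heads=2 is assumed: the NoThink diff does not list the key.
REF_0_8B = {"text_config": {"hidden_size": 1024, "num_attention_heads": 8,
                            "num_hidden_layers": 24, "num_key_value_heads": 2}}

for name, ref in [("Qwen/Qwen3.5-0.8B", REF_0_8B), ("Qwen/Qwen3.5-4B", REF_4B)]:
    diff = config_diff(ref, TINY)
    print(f"{name} vs tiny ({len(diff)} differences)")
    for key, (old, new) in diff.items():
        print(f"  {key:<40} {old!s:<8} -> {new}")
```

Because only differing keys are reported, an override that happens to equal the upstream value produces no line, which is why the same override set yields one more entry against the 4B source.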

Changes

scripts/generate_tiny_models/for_conditional_generation/qwen3_5_for_conditional_generation_think.py: new script — MODEL_ID="Qwen/Qwen3.5-4B", pushes as tiny-Qwen3_5ForConditionalGeneration-Think.
scripts/generate_tiny_models/for_conditional_generation/qwen3_5_for_conditional_generation_nothink.py: new script — MODEL_ID="Qwen/Qwen3.5-0.8B", pushes as tiny-Qwen3_5ForConditionalGeneration-NoThink.
Each new script is a near-verbatim copy of the existing qwen3_5_for_conditional_generation.py.

Part of #5471

cc: @qgallouedec

Note

Low Risk
Low risk: adds/adjusts standalone fixture-generation scripts only, with no runtime/library behavior changes. Main risk is accidental fixture repo naming/sourcing mistakes when pushing to the Hub.

Overview
Adds a new tiny-model generation script qwen3_5_for_conditional_generation_think.py that builds the same 2-layer Qwen3.5 conditional-generation fixture but sourced from Qwen/Qwen3.5-4B and pushed with the -Think suffix.

Updates the existing qwen3_5_for_conditional_generation_nothink.py script to document the 0.8B source choice and to push the generated fixture with the -NoThink suffix, creating explicit sibling fixtures for tokenizer default thinking behavior.

Reviewed by Cursor Bugbot for commit 76e7fb7. Bugbot is set up for automated code reviews on this repo.

qgallouedec · 2026-05-22T15:35:01Z

thanks, can you remove qwen3_5_for_conditional_generation.py as well?

aazizyan · 2026-05-22T15:43:37Z

done

qgallouedec · 2026-05-22T15:52:46Z

everywhere in the codebase, you should rename tiny-Qwen3_5ForConditionalGeneration -> tiny-Qwen3_5ForConditionalGeneration-NoThink
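A rename like this has one pitfall: the old name is a prefix of both new names, so a naive find-and-replace would corrupt any -Think or -NoThink reference that already exists. A minimal sketch of a guarded rename, assuming a regex-based approach (`rename_fixture` is a hypothetical helper, not part of TRL):

```python
import re

OLD = "tiny-Qwen3_5ForConditionalGeneration"
NEW = "tiny-Qwen3_5ForConditionalGeneration-NoThink"

# Match the legacy name only when it is not already followed by a variant suffix.
LEGACY_PATTERN = re.compile(re.escape(OLD) + r"(?!-(?:Think|NoThink))")


def rename_fixture(text: str) -> str:
    """Rewrite legacy fixture references, leaving suffixed ones untouched."""
    return LEGACY_PATTERN.sub(NEW, text)


sample = (
    'model_id = "trl-internal-testing/tiny-Qwen3_5ForConditionalGeneration"\n'
    'think_id = "trl-internal-testing/tiny-Qwen3_5ForConditionalGeneration-Think"\n'
)
renamed = rename_fixture(sample)
print(renamed)
```

The negative lookahead also makes the rewrite idempotent, so running it twice over the same files should leave them unchanged.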

aazizyan · 2026-05-22T15:58:18Z

My initial plan was to land the fixtures here and split the test rename into a follow-up PR (PR 3 in the roadmap above), gated on a separate proposal issue. But if you'd prefer to bundle the codebase-wide tiny-Qwen3_5ForConditionalGeneration -> tiny-Qwen3_5ForConditionalGeneration-NoThink rename into this PR, I'm happy to do that.

qgallouedec · 2026-05-22T16:02:16Z

ok sounds good, fyi:

https://huggingface.co/trl-internal-testing/tiny-Qwen3_5ForConditionalGeneration-Think
https://huggingface.co/trl-internal-testing/tiny-Qwen3_5ForConditionalGeneration-NoThink

HuggingFaceDocBuilderDev · 2026-05-22T16:06:12Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

aazizyan force-pushed the qwen3.5-think-nothink-tiny-fixtures branch 2 times, most recently from 888e7fa to e415ce2 on May 22, 2026 14:33

aazizyan added 2 commits May 22, 2026 19:39

Add tiny Qwen3.5 Think/NoThink fixture generation scripts

6cfbeca

Remove qwen3_5_for_conditional_generation.py

76e7fb7

aazizyan force-pushed the qwen3.5-think-nothink-tiny-fixtures branch from e415ce2 to 76e7fb7 on May 22, 2026 15:42

qgallouedec approved these changes May 22, 2026

qgallouedec merged commit 0fcc5e2 into huggingface:main May 22, 2026
12 checks passed

This was referenced May 22, 2026

Migrate tests to Qwen3.5 Think/NoThink fixtures #5821

Merged

Add Qwen3.5 Think/NoThink training chat templates with generation markers #5824

Merged
