Add tiny Qwen3.5 Think/NoThink fixture generation scripts #5819

Merged
qgallouedec merged 2 commits into huggingface:main from aazizyan:qwen3.5-think-nothink-tiny-fixtures
May 22, 2026

@aazizyan commented May 22, 2026

Contributor

What does this PR do?

Adds two new tiny-model fixture generation scripts for Qwen3.5: sibling -Think and -NoThink variants alongside the existing tiny-Qwen3_5ForConditionalGeneration fixture. Each is sourced from the Qwen3.5 checkpoint whose bundled tokenizer ships the corresponding default thinking behavior: Qwen/Qwen3.5-4B for the thinking-enabled variant, Qwen/Qwen3.5-0.8B for the thinking-disabled variant.

The architecture class, tiny-config overrides, fp32 linear-attn cast, and _common.py push flow are unchanged from the existing script — the only differences are MODEL_ID and the suffix arg to push_to_hub.
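As a rough illustration of how little differs between the two scripts, the sketch below models each variant as a source checkpoint plus a name suffix. It does not reproduce the real `_common.py` helpers or the `push_to_hub` signature; `FixtureVariant` and `fixture_name` are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass

# Architecture class shared by both fixtures (taken from the PR description).
ARCHITECTURE = "Qwen3_5ForConditionalGeneration"


@dataclass(frozen=True)
class FixtureVariant:
    """Hypothetical container for the two values that differ per script."""

    model_id: str  # upstream checkpoint that supplies the config and tokenizer
    suffix: str  # appended to the fixture repo name; empty for no suffix


def fixture_name(variant: FixtureVariant, prefix: str = "tiny") -> str:
    """Build the fixture repo name, e.g. tiny-<Architecture>-<suffix>."""
    base = f"{prefix}-{ARCHITECTURE}"
    return f"{base}-{variant.suffix}" if variant.suffix else base


THINK = FixtureVariant(model_id="Qwen/Qwen3.5-4B", suffix="Think")
NOTHINK = FixtureVariant(model_id="Qwen/Qwen3.5-0.8B", suffix="NoThink")

if __name__ == "__main__":
    for variant in (THINK, NOTHINK):
        print(variant.model_id, "->", fixture_name(variant))
```

Everything else (tiny-config overrides, dtype handling, the push flow) would be shared code under this reading, which is why the scripts can stay near-identical.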

Note: scope and roadmap

Strictly additive — scripts only. The existing tiny-Qwen3_5ForConditionalGeneration fixture is untouched, and no tests reference the new fixtures yet. Planned follow-ups:

  • PR 2: add Qwen3.5 training templates corresponding to the two variants.
  • PR 3: migrate existing tests off the legacy fixture and retire it gracefully. This is gated on a separate issue in which I'll propose the test refactor against the new fixtures; work on PR 3 starts only once maintainers approve the refactor plan in that issue.

Note: push needs to be run by a maintainer

I don't have write access to trl-internal-testing. Per the agreement in #5471, @qgallouedec will run push_to_hub to materialize the fixtures. The Colab dry-runs below show both scripts build cleanly through print_config_diff.

Validation (Colab A100, dry-run)

NoThink (sourced from Qwen/Qwen3.5-0.8B):

[smoke_test] Qwen3_5ForConditionalGeneration: OK (output shape (2, 82, 248320))
[dtype_check] Qwen/Qwen3.5-0.8B: all matched tensors have the reference dtype
[config_diff] Qwen/Qwen3.5-0.8B vs tiny (10 differences)
  text_config.full_attention_interval              4                                  → 2
  text_config.hidden_size                          1024                               → 16
  text_config.layer_types                          ['linear_attention', 'linear_atten → ['linear_attention', 'full_attenti
  text_config.num_attention_heads                  8                                  → 4
  text_config.num_hidden_layers                    24                                 → 2
  vision_config.depth                              12                                 → 2
  vision_config.hidden_size                        768                                → 16
  vision_config.intermediate_size                  3072                               → 32
  vision_config.num_heads                          12                                 → 4
  vision_config.out_hidden_size                    1024                               → 16

Think (sourced from Qwen/Qwen3.5-4B):

[smoke_test] Qwen3_5ForConditionalGeneration: OK (output shape (2, 80, 248320))
[dtype_check] Qwen/Qwen3.5-4B: all matched tensors have the reference dtype
[config_diff] Qwen/Qwen3.5-4B vs tiny (11 differences)
  text_config.full_attention_interval              4                                  → 2
  text_config.hidden_size                          2560                               → 16
  text_config.layer_types                          ['linear_attention', 'linear_atten → ['linear_attention', 'full_attenti
  text_config.num_attention_heads                  16                                 → 4
  text_config.num_hidden_layers                    32                                 → 2
  text_config.num_key_value_heads                  4                                  → 2
  vision_config.depth                              24                                 → 2
  vision_config.hidden_size                        1024                               → 16
  vision_config.intermediate_size                  4096                               → 32
  vision_config.num_heads                          16                                 → 4
  vision_config.out_hidden_size                    2560                               → 16

Both runs: the smoke test passes, the dtype pattern matches the reference checkpoint, and the config diff shows only the intended tiny overrides (no key leakage). The extra num_key_value_heads line in the Think run reflects a genuine upstream difference between 0.8B and 4B: the 4 → 2 override only shows up against the 4B source. The runs also confirm that the existing linear_attn.A_log / linear_attn.norm.weight fp32 cast block applies to both sources without modification.
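To make the 10-vs-11 difference count concrete, here is a minimal sketch of a nested config diff. It is not the implementation of print_config_diff; `flatten` and `config_diff` are hypothetical helpers. The 4B and tiny values are copied from the logs above, while the 0.8B value of num_key_value_heads (2) is an inference from its absence in the NoThink diff, not a logged number.

```python
MISSING = "<missing>"  # marker for keys present on only one side


def flatten(config: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into dotted keys, e.g. text_config.hidden_size."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


def config_diff(source: dict, tiny: dict) -> dict:
    """Return {dotted_key: (source_value, tiny_value)} for differing keys."""
    src, tny = flatten(source), flatten(tiny)
    diff = {}
    for key in sorted(src.keys() | tny.keys()):
        before, after = src.get(key, MISSING), tny.get(key, MISSING)
        if before != after:
            diff[key] = (before, after)
    return diff


# Subset of text_config values; the 0.8B num_key_value_heads is inferred.
SOURCE_4B = {"text_config": {"hidden_size": 2560, "num_key_value_heads": 4}}
SOURCE_08B = {"text_config": {"hidden_size": 1024, "num_key_value_heads": 2}}
TINY = {"text_config": {"hidden_size": 16, "num_key_value_heads": 2}}

if __name__ == "__main__":
    for label, source in (("4B", SOURCE_4B), ("0.8B", SOURCE_08B)):
        print(label, config_diff(source, TINY))
```

Because the tiny override for num_key_value_heads is the same in both scripts, the key appears in the diff only when the source value differs from it, which matches the extra line in the Think run.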

Changes

  • scripts/generate_tiny_models/for_conditional_generation/qwen3_5_for_conditional_generation_think.py: new script — MODEL_ID="Qwen/Qwen3.5-4B", pushes as tiny-Qwen3_5ForConditionalGeneration-Think.
  • scripts/generate_tiny_models/for_conditional_generation/qwen3_5_for_conditional_generation_nothink.py: new script — MODEL_ID="Qwen/Qwen3.5-0.8B", pushes as tiny-Qwen3_5ForConditionalGeneration-NoThink.
  • Each new script is a near-verbatim copy of the existing qwen3_5_for_conditional_generation.py.

Part of #5471

cc: @qgallouedec


Note

Low Risk
Low risk: adds/adjusts standalone fixture-generation scripts only, with no runtime/library behavior changes. Main risk is accidental fixture repo naming/sourcing mistakes when pushing to the Hub.

Overview
Adds a new tiny-model generation script qwen3_5_for_conditional_generation_think.py that builds the same 2-layer Qwen3.5 conditional-generation fixture but sourced from Qwen/Qwen3.5-4B and pushed with the -Think suffix.

Updates the existing qwen3_5_for_conditional_generation_nothink.py script to document the 0.8B source choice and to push the generated fixture with the -NoThink suffix, creating explicit sibling fixtures for tokenizer default thinking behavior.

Reviewed by Cursor Bugbot for commit 76e7fb7.

@aazizyan force-pushed the qwen3.5-think-nothink-tiny-fixtures branch 2 times, most recently from 888e7fa to e415ce2, on May 22, 2026 14:33
@qgallouedec

Member

thanks, can you remove qwen3_5_for_conditional_generation.py as well?

@aazizyan force-pushed the qwen3.5-think-nothink-tiny-fixtures branch from e415ce2 to 76e7fb7 on May 22, 2026 15:42
@aazizyan

Contributor Author

done

@qgallouedec

Member

everywhere in the codebase, you should rename tiny-Qwen3_5ForConditionalGeneration -> tiny-Qwen3_5ForConditionalGeneration-NoThink

@aazizyan

Contributor Author

My initial plan was to land the fixtures here and split the test rename into a follow-up PR (PR 3 in the roadmap above), gated on a separate proposal issue. But if you'd prefer to bundle the codebase-wide tiny-Qwen3_5ForConditionalGeneration -> tiny-Qwen3_5ForConditionalGeneration-NoThink rename into this PR, I'm happy to do that.
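For reference, a codebase-wide rename like this could be sketched as below. This is only an illustration, not what was run in this PR: `rename_fixture_refs` is a hypothetical helper, and the lookahead assumes fixture names are delimited by characters other than letters, digits, underscores, or hyphens, so that names already carrying a -Think or -NoThink suffix are left alone.

```python
import re

LEGACY = "tiny-Qwen3_5ForConditionalGeneration"
RENAMED = f"{LEGACY}-NoThink"

# Match the legacy name only when it is not followed by a word character or
# a hyphen, so already-suffixed names such as ...-Think are skipped.
_LEGACY_PATTERN = re.compile(re.escape(LEGACY) + r"(?![\w-])")


def rename_fixture_refs(text: str) -> str:
    """Replace bare references to the legacy fixture with the -NoThink name."""
    return _LEGACY_PATTERN.sub(RENAMED, text)


if __name__ == "__main__":
    sample = 'model_id = "trl-internal-testing/tiny-Qwen3_5ForConditionalGeneration"'
    print(rename_fixture_refs(sample))
```

The substitution is idempotent under that assumption, so running it twice over the same file does not produce a doubled suffix.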

@qgallouedec

Member

@HuggingFaceDocBuilderDev


The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@qgallouedec merged commit 0fcc5e2 into huggingface:main on May 22, 2026
12 checks passed

3 participants