tests: CPU regression detectors for the MoE merge / save path (#5410) by danielhanchen · Pull Request #649 · unslothai/unsloth-zoo

danielhanchen · 2026-05-15T03:52:45Z

Summary

Adds four CPU-only signals so the upstream-drift matrix catches the class of bug that produced unslothai/unsloth#5410 before it ships. Stacked on top of #647: the two _MOE_MERGE_STATE / _detect_moe_lora_layout symbols only exist after #647 lands. Base will retarget to main once #647 merges.

Why CI did not catch the original bug

unsloth-zoo/.github/workflows/consolidated-tests-ci.yml already runs a three-cell drift matrix:

cell	transformers	peft
HF=4.57.6 + TRL<1	`transformers==4.57.6`	`peft>=0.18,<0.20`
HF=latest + TRL=latest	`transformers>=5,<6`	`peft` (latest)
HF=default + TRL=default	from pyproject	from pyproject

This is the right matrix to catch #5410. But the pytest invocation only listed 12 drift-detector files; tests/test_unsloth_zoo_lora_merge.py (the only file that actually covers the merge helpers) was never run inside the matrix. So peft 0.19 could land in the matrix without exercising the helpers that broke.

What this PR adds

1. Re-route the merge battery through the matrix. One-line workflow change at line ~234 of consolidated-tests-ci.yml adds tests/test_unsloth_zoo_lora_merge.py (already 22 cases covering both PEFT layouts, the fallback counter, and the resolver walk, after #647) plus the three new files below to the matrix pytest list.

2. tests/test_peft_paramwrapper_layout_drift.py (45 lines). Builds a 4-line nn.Module with a single fused 3D nn.Parameter, attaches a real peft LoRA via target_parameters=[\"gate_up_proj\"], then asserts the resulting lora_A / lora_B shapes match either the swapped or standard signature. A third layout convention from a future PEFT release fires this test on the day the new pin lands in the matrix, with a message naming both the observed and expected shapes.

3. tests/test_transformers_moe_structure_drift.py (~120 lines). Imports each tracked MoE block class (Qwen3MoeSparseMoeBlock, MixtralSparseMoeBlock) from the installed transformers, instantiates a tiny config, and pins whether the experts container is a fused 3D nn.Parameter (fused_3d) or per-expert nn.ModuleList (module_list). The same kind of refactor that flipped Qwen3MoE from ModuleList to fused 3D between 4.57.x and 5.x will fire this test on the next refactor.

4. tests/test_moe_merge_e2e_cpu.py (~180 lines, 6 parametrized cases). Drives the per-expert merge helpers in the same order the inner shard loop does, for a synthetic 2-layer / 4-expert state dict, in both PEFT layouts. Asserts:

All 3 * E * L tensors receive the analytic delta (sub-1e-4 max error).
_MOE_MERGE_STATE[\"applied\"] matches the expected apply count and \"fallback\" is 0.
Unrecognised layout shapes record a fallback and populate first_error.
The resolver walks past an outer ParamWrapper with .module = None to read num_experts off the inner wrapper (this is the exact regression that produced 935 silent fallbacks before the resolver fix in saving: layout-aware MoE LoRA merge + loud-fail on fallback (#5410) #647).
A module.base_layer is module self-cycle terminates within the 16-hop bound.
_detect_moe_lora_layout returns \"unknown\" for a shard-local num_experts=17 against total_rank=128*r, so the per-expert helper records a fallback rather than producing a silently-wrong slice.

Local results

33 passed in 0.33s

Also confirmed that on pre-#647 main, test_moe_merge_e2e_cpu.py fails to import (ImportError: cannot import name '_MOE_MERGE_STATE'), proving the detector triggers if any future PR weakens or removes the loud-fail guard.

Cost

All four files are pure CPU, no model download, no GPU. Per matrix cell: ~33 extra tests in well under 2 seconds. Three cells, so under 6 seconds added to the total CI wall time.

Test plan

pytest tests/test_unsloth_zoo_lora_merge.py tests/test_peft_paramwrapper_layout_drift.py tests/test_transformers_moe_structure_drift.py tests/test_moe_merge_e2e_cpu.py -> 33 / 33 PASS
Reverted unsloth_zoo/saving_utils.py to pre-saving: layout-aware MoE LoRA merge + loud-fail on fallback (#5410) #647 main locally; confirmed the e2e file fails to import and the detector class as a whole correctly goes red
CI matrix on this PR exercises the four files across the three combos

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 52896d4ac6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-15T03:55:10Z

+        peft_model = get_peft_model(base, cfg)
+    except Exception as e:
+        pytest.skip(f"PEFT failed to wrap fused 3D param on this build: {e}")


Fail when PEFT can no longer wrap fused params

Because this file is now part of the hard upstream-regression matrix, skipping every get_peft_model exception masks exactly the PEFT drift this detector is supposed to catch: once target_parameters is accepted, a future PEFT release that changes or breaks fused-3D ParamWrapper creation will make the test report skipped instead of failing. The earlier LoraConfig(... target_parameters=...) probe already handles unsupported pre-0.18 builds, so wrapping failures in supported matrix cells should be assertions/failures rather than skips.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-05-15T03:55:10Z

+    accepted = {"fused_3d", "module_list"}
+    assert kind in accepted, (


Pin the expected MoE container kind

This accepts both fused_3d and module_list, so a transformers update that flips a tracked architecture between those two layouts remains green even though the test description says that exact flip should be caught. Since the workflow now relies on this as a drift detector, the expected kind needs to be version/architecture-specific (or otherwise asserted) rather than allowing both supported layouts unconditionally.

Useful? React with 👍 / 👎.

gemini-code-assist

Code Review

This pull request introduces three new test suites designed to validate Mixture of Experts (MoE) merging logic and detect structural drifts in external dependencies. The added tests cover end-to-end CPU regression for LoRA delta applications, PEFT ParamWrapper layout changes, and Transformers MoE expert container refactors. Feedback suggests moving the manual seed initialization out of helper functions to increase test data diversity and utilizing pytest fixtures for more reliable global state management.

gemini-code-assist · 2026-05-15T03:55:18Z

+def _build_synthetic_layer(num_experts: int, rank_per: int, hidden: int,
+                           intermediate: int, layout: str, alpha: float, dtype=torch.float32):
+    """Make: fused base weights + per-expert disk shards + LoRA tensors."""
+    torch.manual_seed(SEED)


Calling torch.manual_seed(SEED) inside _build_synthetic_layer resets the random number generator state every time a layer is built. In test_per_layer_merge_round_trip, this results in all layers having identical weights and LoRA matrices, which reduces the diversity of the test data. It is recommended to set the seed once at the start of the test function instead so that subsequent calls to this helper generate different random values.

Suggested change

torch.manual_seed(SEED)

# torch.manual_seed(SEED)

gemini-code-assist · 2026-05-15T03:55:18Z

+    alpha = 8.0
+    dtype = torch.float32
+
+    _reset_moe_merge_state()


The test manually resets the global _MOE_MERGE_STATE at the beginning and end of the function. To ensure the state is always cleaned up (even if an assertion fails) and to follow pytest best practices, consider using a fixture with autouse=True for state management. This improves test isolation and reliability, especially if tests are ever run in parallel within the same process.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ec86efdefb

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-15T05:03:33Z

+    try:
+        mod = __import__(module_path, fromlist=[block_cls_name, cfg_cls_name])
+    except Exception as e:
+        pytest.skip(f"{module_path} not importable: {e}")


Fail when a tracked MoE module stops importing

When a tracked architecture's import path breaks or is renamed while another tracked module still imports, this skips the broken parametrized case and test_at_least_one_tracked_moe_arch_imports still passes because it only requires one successful import. In the matrix you just added this file to, that means a Qwen3/Mixtral upstream drift can disappear silently instead of failing the canary; treat failures for expected tracked entries as assertions or gate skips by known unsupported transformer versions.

Useful? React with 👍 / 👎.

Adds four CPU-only signals so the upstream drift matrix catches the class of bug that produced unslothai/unsloth#5410 before it ships: 1. Add tests/test_unsloth_zoo_lora_merge.py to the drift-matrix pytest invocation (consolidated-tests-ci.yml). The 22 helper-level merge tests already cover both PEFT 0.18 swapped and PEFT 0.19+ standard layouts, the fallback counter, and the resolver walk; they were only ever exercised in the default cell. Now they fire across all three matrix combos (HF 4.57.6 + TRL<1, HF latest + TRL latest, pyproject defaults). 2. tests/test_peft_paramwrapper_layout_drift.py - constructs a 4-line nn.Module with a single fused 3D nn.Parameter, attaches a real peft LoRA via target_parameters, and asserts the lora_A / lora_B shapes match either the swapped or standard signature. A third layout convention from a future PEFT release fires this test on the day the new pin lands in the matrix. 3. tests/test_transformers_moe_structure_drift.py - imports each tracked MoE block class (Qwen3MoeSparseMoeBlock, MixtralSparseMoeBlock) from the installed transformers, instantiates a tiny config, and pins whether the experts container is fused 3D nn.Parameter or per-expert nn.ModuleList. The same kind of refactor that flipped Qwen3MoE from ModuleList to fused 3D between 4.57.x and 5.x will fire this test on the next refactor. 4. tests/test_moe_merge_e2e_cpu.py - drives the per-expert merge helpers in the same order the inner shard loop does, for a synthetic 2-layer / 4-expert state dict, in both PEFT layouts. Asserts all 3 * E * L tensors receive the analytic delta, the _MOE_MERGE_STATE counter records each apply, fallbacks land on unrecognised shapes, the resolver walks the outer ParamWrapper chain, and a self-cycle in `module.base_layer` terminates. All four files are pure CPU, sub-second, no model download. Total matrix overhead: <2s per cell, three cells, 33 tests. Verified locally (33 / 33 PASS) and confirmed that the e2e file fails to import on pre-#5410 main (ImportError: _MOE_MERGE_STATE), proving the detector triggers if any future PR weakens the loud-fail guard.

chatgpt-codex-connector Bot reviewed May 15, 2026

View reviewed changes

gemini-code-assist Bot reviewed May 15, 2026

View reviewed changes

danielhanchen mentioned this pull request May 15, 2026

tests: pinned-symbol canary for unsloth-zoo save_pretrained_merged guards (#5410) unslothai/unsloth#5433

Merged

5 tasks

danielhanchen force-pushed the ci-pr5410-regression-detectors branch from 52896d4 to ec86efd Compare May 15, 2026 05:00

chatgpt-codex-connector Bot reviewed May 15, 2026

View reviewed changes

danielhanchen force-pushed the fix-5410-moe-merge-layout branch from b0112e5 to 6b0f75e Compare May 15, 2026 07:17

danielhanchen requested a review from rolandtannous as a code owner May 15, 2026 07:17

danielhanchen force-pushed the ci-pr5410-regression-detectors branch from ec86efd to c5c7a86 Compare May 15, 2026 07:20

danielhanchen merged commit 0da895c into fix-5410-moe-merge-layout May 15, 2026
7 of 11 checks passed

danielhanchen mentioned this pull request May 16, 2026

tests: CPU regression detectors for the MoE merge / save path (#5410) #655

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

tests: CPU regression detectors for the MoE merge / save path (#5410)#649

tests: CPU regression detectors for the MoE merge / save path (#5410)#649
danielhanchen merged 1 commit into
fix-5410-moe-merge-layoutfrom
ci-pr5410-regression-detectors

danielhanchen commented May 15, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 15, 2026

Uh oh!

chatgpt-codex-connector Bot May 15, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 15, 2026

Uh oh!

gemini-code-assist Bot May 15, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		accepted = {"fused_3d", "module_list"}
		assert kind in accepted, (

Conversation

danielhanchen commented May 15, 2026

Summary

Why CI did not catch the original bug

What this PR adds

Local results

Cost

Test plan

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 15, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant