tests: CPU regression detectors for the MoE merge / save path (#5410)#649
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 52896d4ac6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| peft_model = get_peft_model(base, cfg) | ||
| except Exception as e: | ||
| pytest.skip(f"PEFT failed to wrap fused 3D param on this build: {e}") |
There was a problem hiding this comment.
Fail when PEFT can no longer wrap fused params
Because this file is now part of the hard upstream-regression matrix, skipping every get_peft_model exception masks exactly the PEFT drift this detector is supposed to catch: once target_parameters is accepted, a future PEFT release that changes or breaks fused-3D ParamWrapper creation will make the test report skipped instead of failing. The earlier LoraConfig(... target_parameters=...) probe already handles unsupported pre-0.18 builds, so wrapping failures in supported matrix cells should be assertions/failures rather than skips.
Useful? React with 👍 / 👎.
| accepted = {"fused_3d", "module_list"} | ||
| assert kind in accepted, ( |
There was a problem hiding this comment.
Pin the expected MoE container kind
This accepts both fused_3d and module_list, so a transformers update that flips a tracked architecture between those two layouts remains green even though the test description says that exact flip should be caught. Since the workflow now relies on this as a drift detector, the expected kind needs to be version/architecture-specific (or otherwise asserted) rather than allowing both supported layouts unconditionally.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Code Review
This pull request introduces three new test suites designed to validate Mixture of Experts (MoE) merging logic and detect structural drifts in external dependencies. The added tests cover end-to-end CPU regression for LoRA delta applications, PEFT ParamWrapper layout changes, and Transformers MoE expert container refactors. Feedback suggests moving the manual seed initialization out of helper functions to increase test data diversity and utilizing pytest fixtures for more reliable global state management.
| def _build_synthetic_layer(num_experts: int, rank_per: int, hidden: int, | ||
| intermediate: int, layout: str, alpha: float, dtype=torch.float32): | ||
| """Make: fused base weights + per-expert disk shards + LoRA tensors.""" | ||
| torch.manual_seed(SEED) |
There was a problem hiding this comment.
Calling torch.manual_seed(SEED) inside _build_synthetic_layer resets the random number generator state every time a layer is built. In test_per_layer_merge_round_trip, this results in all layers having identical weights and LoRA matrices, which reduces the diversity of the test data. It is recommended to set the seed once at the start of the test function instead so that subsequent calls to this helper generate different random values.
| torch.manual_seed(SEED) | |
| # torch.manual_seed(SEED) |
| alpha = 8.0 | ||
| dtype = torch.float32 | ||
|
|
||
| _reset_moe_merge_state() |
There was a problem hiding this comment.
The test manually resets the global _MOE_MERGE_STATE at the beginning and end of the function. To ensure the state is always cleaned up (even if an assertion fails) and to follow pytest best practices, consider using a fixture with autouse=True for state management. This improves test isolation and reliability, especially if tests are ever run in parallel within the same process.
52896d4 to
ec86efd
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ec86efdefb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| try: | ||
| mod = __import__(module_path, fromlist=[block_cls_name, cfg_cls_name]) | ||
| except Exception as e: | ||
| pytest.skip(f"{module_path} not importable: {e}") |
There was a problem hiding this comment.
Fail when a tracked MoE module stops importing
When a tracked architecture's import path breaks or is renamed while another tracked module still imports, this skips the broken parametrized case and test_at_least_one_tracked_moe_arch_imports still passes because it only requires one successful import. In the matrix you just added this file to, that means a Qwen3/Mixtral upstream drift can disappear silently instead of failing the canary; treat failures for expected tracked entries as assertions or gate skips by known unsupported transformer versions.
Useful? React with 👍 / 👎.
b0112e5 to
6b0f75e
Compare
Adds four CPU-only signals so the upstream drift matrix catches the class of bug that produced unslothai/unsloth#5410 before it ships: 1. Add tests/test_unsloth_zoo_lora_merge.py to the drift-matrix pytest invocation (consolidated-tests-ci.yml). The 22 helper-level merge tests already cover both PEFT 0.18 swapped and PEFT 0.19+ standard layouts, the fallback counter, and the resolver walk; they were only ever exercised in the default cell. Now they fire across all three matrix combos (HF 4.57.6 + TRL<1, HF latest + TRL latest, pyproject defaults). 2. tests/test_peft_paramwrapper_layout_drift.py - constructs a 4-line nn.Module with a single fused 3D nn.Parameter, attaches a real peft LoRA via target_parameters, and asserts the lora_A / lora_B shapes match either the swapped or standard signature. A third layout convention from a future PEFT release fires this test on the day the new pin lands in the matrix. 3. tests/test_transformers_moe_structure_drift.py - imports each tracked MoE block class (Qwen3MoeSparseMoeBlock, MixtralSparseMoeBlock) from the installed transformers, instantiates a tiny config, and pins whether the experts container is fused 3D nn.Parameter or per-expert nn.ModuleList. The same kind of refactor that flipped Qwen3MoE from ModuleList to fused 3D between 4.57.x and 5.x will fire this test on the next refactor. 4. tests/test_moe_merge_e2e_cpu.py - drives the per-expert merge helpers in the same order the inner shard loop does, for a synthetic 2-layer / 4-expert state dict, in both PEFT layouts. Asserts all 3 * E * L tensors receive the analytic delta, the _MOE_MERGE_STATE counter records each apply, fallbacks land on unrecognised shapes, the resolver walks the outer ParamWrapper chain, and a self-cycle in `module.base_layer` terminates. All four files are pure CPU, sub-second, no model download. Total matrix overhead: <2s per cell, three cells, 33 tests. Verified locally (33 / 33 PASS) and confirmed that the e2e file fails to import on pre-#5410 main (ImportError: _MOE_MERGE_STATE), proving the detector triggers if any future PR weakens the loud-fail guard.
ec86efd to
c5c7a86
Compare
0da895c
into
fix-5410-moe-merge-layout
Summary
Adds four CPU-only signals so the upstream-drift matrix catches the class of bug that produced unslothai/unsloth#5410 before it ships. Stacked on top of #647: the two
_MOE_MERGE_STATE/_detect_moe_lora_layoutsymbols only exist after #647 lands. Base will retarget tomainonce #647 merges.Why CI did not catch the original bug
unsloth-zoo/.github/workflows/consolidated-tests-ci.ymlalready runs a three-cell drift matrix:transformers==4.57.6peft>=0.18,<0.20transformers>=5,<6peft(latest)This is the right matrix to catch #5410. But the pytest invocation only listed 12 drift-detector files;
tests/test_unsloth_zoo_lora_merge.py(the only file that actually covers the merge helpers) was never run inside the matrix. Sopeft 0.19could land in the matrix without exercising the helpers that broke.What this PR adds
1. Re-route the merge battery through the matrix. One-line workflow change at line ~234 of
consolidated-tests-ci.ymladdstests/test_unsloth_zoo_lora_merge.py(already 22 cases covering both PEFT layouts, the fallback counter, and the resolver walk, after #647) plus the three new files below to the matrix pytest list.2.
tests/test_peft_paramwrapper_layout_drift.py(45 lines). Builds a 4-linenn.Modulewith a single fused 3Dnn.Parameter, attaches a realpeftLoRA viatarget_parameters=[\"gate_up_proj\"], then asserts the resultinglora_A/lora_Bshapes match either the swapped or standard signature. A third layout convention from a future PEFT release fires this test on the day the new pin lands in the matrix, with a message naming both the observed and expected shapes.3.
tests/test_transformers_moe_structure_drift.py(~120 lines). Imports each tracked MoE block class (Qwen3MoeSparseMoeBlock,MixtralSparseMoeBlock) from the installedtransformers, instantiates a tiny config, and pins whether theexpertscontainer is a fused 3Dnn.Parameter(fused_3d) or per-expertnn.ModuleList(module_list). The same kind of refactor that flipped Qwen3MoE fromModuleListto fused 3D between 4.57.x and 5.x will fire this test on the next refactor.4.
tests/test_moe_merge_e2e_cpu.py(~180 lines, 6 parametrized cases). Drives the per-expert merge helpers in the same order the inner shard loop does, for a synthetic 2-layer / 4-expert state dict, in both PEFT layouts. Asserts:3 * E * Ltensors receive the analytic delta (sub-1e-4max error)._MOE_MERGE_STATE[\"applied\"]matches the expected apply count and\"fallback\"is 0.first_error.ParamWrapperwith.module = Noneto readnum_expertsoff the inner wrapper (this is the exact regression that produced 935 silent fallbacks before the resolver fix in saving: layout-aware MoE LoRA merge + loud-fail on fallback (#5410) #647).module.base_layer is moduleself-cycle terminates within the 16-hop bound._detect_moe_lora_layoutreturns\"unknown\"for a shard-localnum_experts=17againsttotal_rank=128*r, so the per-expert helper records a fallback rather than producing a silently-wrong slice.Local results
Also confirmed that on pre-#647
main,test_moe_merge_e2e_cpu.pyfails to import (ImportError: cannot import name '_MOE_MERGE_STATE'), proving the detector triggers if any future PR weakens or removes the loud-fail guard.Cost
All four files are pure CPU, no model download, no GPU. Per matrix cell: ~33 extra tests in well under 2 seconds. Three cells, so under 6 seconds added to the total CI wall time.
Test plan
pytest tests/test_unsloth_zoo_lora_merge.py tests/test_peft_paramwrapper_layout_drift.py tests/test_transformers_moe_structure_drift.py tests/test_moe_merge_e2e_cpu.py-> 33 / 33 PASSunsloth_zoo/saving_utils.pyto pre-saving: layout-aware MoE LoRA merge + loud-fail on fallback (#5410) #647mainlocally; confirmed the e2e file fails to import and the detector class as a whole correctly goes red