chore: trim verbose comments across PR #637 landing by danielhanchen · Pull Request #640 · unslothai/unsloth-zoo

danielhanchen · 2026-05-14T10:19:21Z

Follow-up to #637. The squashed merge brought in a lot of new test code and workflow YAML where the comments had run a bit long. This PR trims multi-paragraph rationale paragraphs down to 1 to 3 line summaries while keeping everything that earns its keep: license headers, PR and commit citations, zoo file:line callsites, idempotence/no-op guarantees, the DRIFT DETECTED framing, and HARD GATE markers.

Net change

18 files, plus 1541 / minus 4233 lines (net minus 2692).

What got trimmed where

Tests (9 files):

test_zoo_history_regressions_deep.py: minus 327
test_upstream_import_fixes_drift.py: minus 141
test_zoo_source_upstream_refs.py: minus 90
test_upstream_signatures.py: minus 122
test_upstream_source_patterns.py: minus 305
test_extended_dep_api_pins.py: minus 205
test_compiler_rewriter_exhaustive.py: minus 485
test_compiler_dynamic_exec.py: minus 142
test_temporary_patches_exhaustive.py: minus 366
test_upstream_pinned_symbols_transformers.py: minus 125

Source (3 files):

unsloth_zoo/init.py: minus 9 (apply_import_fixes wiring comments)
unsloth_zoo/patching_utils.py: minus 5 (compiled_autograd disable rename block)
tests/conftest.py: minus 49 (_patch_torch_cuda_for_import and import_fixes hook docstrings)

Workflows (5 files):

consolidated-tests-ci.yml: minus 262 (collapsed the 12-file 626-test enumeration to a one-liner; HARD GATE label kept)
security-audit.yml: minus 59
wheel-smoke.yml: minus 23
lint-ci.yml: minus 47
mlx-ci.yml: minus 26

Local verification

```
pytest tests/test_upstream_pinned_symbols_*.py \
tests/test_zoo_history_regressions_deep.py \
tests/test_upstream_import_fixes_drift.py \
tests/test_zoo_source_upstream_refs.py \
tests/test_upstream_signatures.py \
tests/test_extended_dep_api_pins.py \
tests/test_upstream_source_patterns.py \
tests/test_compiler_rewriter_exhaustive.py \
tests/test_compiler_dynamic_exec.py \
tests/test_temporary_patches_exhaustive.py \
tests/security
```

614 passed, 32 skipped, 0 failed (same as on main pre-trim).

YAML round-trip on all 5 workflow files OK. scripts/lint_workflow_triggers.py clean (6 files scanned, no pull_request_target, no unjustified workflow_run, no PR/publish cache-key collision).

Test plan

Confirm pytest pass-count holds across the 12 drift-detector test files plus tests/security.
Confirm YAML round-trip on the 5 workflow files.
Confirm scripts/lint_workflow_triggers.py passes.
Spot-check that PR/commit citations and zoo file:line references are preserved in trimmed docstrings.

gemini-code-assist

Code Review

This pull request focuses on refactoring and cleaning up docstrings and comments across the tests/ and unsloth_zoo/ directories to improve clarity and conciseness. The code review provides an actionable suggestion to simplify the logic in tests/test_temporary_patches_exhaustive.py by removing a redundant conditional check in the _resolve_upstream_method function.

gemini-code-assist · 2026-05-14T10:26:21Z

    qualname = getattr(live, "__qualname__", "") or ""
    if ".<locals>." in qualname and qualname.split(".", 1)[0].startswith("patch_"):
-        # zoo patch wrapper, but no _original_ stash on the class. This
-        # is rare (force=True + store_original=False) but possible. Fall
-        # through; caller will skip cleanly via _maybe_skip_if_patched.
+        # zoo patch wrapper with no _original_ stash (rare: force=True +
+        # store_original=False); _maybe_skip_if_patched handles cleanly.
        return live
    return live


The if block checking for a zoo patch wrapper by __qualname__ is redundant because both branches return live. Additionally, for robustness when identifying patched functions, identity checks are preferred over string-matching attributes like __qualname__. This logic can be simplified to a direct return.

Suggested change

qualname = getattr(live, "__qualname__", "") or ""

if ".<locals>." in qualname and qualname.split(".", 1)[0].startswith("patch_"):

# zoo patch wrapper, but no _original_ stash on the class. This

# is rare (force=True + store_original=False) but possible. Fall

# through; caller will skip cleanly via _maybe_skip_if_patched.

# zoo patch wrapper with no _original_ stash (rare: force=True +

# store_original=False); _maybe_skip_if_patched handles cleanly.

return live

return live

return live

References

When unpatching a function, perform an identity check to ensure the function being replaced is the one originally patched by your code. This is more robust than relying on the state of other modules.

Verifier scripts/verify_trim_comment_only.py flagged 4 non-comment changes in PR #640 (em-dashes replaced in assertion strings, a defensive 'consumed = 0' init, an unused-variable removal, an added 'return' statement in a helper). Restore each so the PR is strictly comment + docstring + whitespace cleanup, nothing else. - test_upstream_pinned_symbols_transformers.py: restore the em-dash and the parenthetical detail in two assertion messages - test_upstream_source_patterns.py: drop the 'consumed = 0' initialization line and restore the 'needle = ...' literal - test_zoo_source_upstream_refs.py: drop the 'return' that was added to _skip_if_missing (its annotation says -> None)

Compress contract preambles and per-test docstrings while keeping DRIFT-DETECTED framing, PR/commit citations, zoo file:line callsite references, and non-obvious WHY notes. Pytest pass-count unchanged.

Compact docstrings and section headers across 4 drift-detector test files. Preserves license headers, zoo file:line citations, DRIFT DETECTED framing, and PR references; cuts multi-paragraph re-statements, methodology essays, and verbose contract descriptions. Reductions: tests/test_compiler_rewriter_exhaustive.py 2633 -> 2148 tests/test_compiler_dynamic_exec.py 831 -> 689 tests/test_temporary_patches_exhaustive.py 2612 -> 2246 tests/test_upstream_pinned_symbols_transformers.py 562 -> 437 pytest pass-count unchanged: 322 passed, 24 skipped.

Trim PR-#637-era explanatory comments to focus on load-bearing details (file:line citations, PR references, idempotence guarantees, HARD GATE markers). Code behaviour is unchanged. Files touched: - unsloth_zoo/patching_utils.py (compiled_autograd disable rename) - unsloth_zoo/__init__.py (apply_import_fixes wiring) - tests/conftest.py (cuda-stub + import_fixes harness blocks) - .github/workflows/consolidated-tests-ci.yml - .github/workflows/security-audit.yml - .github/workflows/lint-ci.yml - .github/workflows/wheel-smoke.yml - .github/workflows/mlx-ci.yml

Verifier scripts/verify_trim_comment_only.py flagged 4 non-comment changes in PR #640 (em-dashes replaced in assertion strings, a defensive 'consumed = 0' init, an unused-variable removal, an added 'return' statement in a helper). Restore each so the PR is strictly comment + docstring + whitespace cleanup, nothing else. - test_upstream_pinned_symbols_transformers.py: restore the em-dash and the parenthetical detail in two assertion messages - test_upstream_source_patterns.py: drop the 'consumed = 0' initialization line and restore the 'needle = ...' literal - test_zoo_source_upstream_refs.py: drop the 'return' that was added to _skip_if_missing (its annotation says -> None)

…nslothai#641) scripts/verify_comment_only_diff.py compares a list of changed files between two git refs and reports whether each diff is strictly comments or docstrings. * .py: parse both revs into AST, strip module / class / function docstrings, then compare ast.unparse output. Pure Python comments are discarded by ast.parse by construction, so any post-strip diff is real code. * .yml / .yaml: yaml.safe_load both sides and compare the parsed Python object; if scalar values differ, also strip shell comments inside any multi-line scalar (i.e. `run: |` script bodies) before comparing. Exit code is 0 if every file is comment-only, 1 otherwise. The script also prints a tight diff snippet for any FAIL line so a reviewer can spot the real code change at a glance. This is what I used to gate the trim PRs unslothai#637 / unslothai#640 (zoo) and #5416 / #5418 / #5421 (unsloth). Shipping it under scripts/ so any contributor can deterministically prove a comment / docstring refactor is truly comment-only, without manually eyeballing every line of a 4000-line diff. Usage: python scripts/verify_comment_only_diff.py [--base REF] [--head REF] path ... Defaults: --base origin/main, --head HEAD. Paths are repo-relative. Smoke test against the squash-merged PR unslothai#640 (an 18-file pure trim): git diff --name-only 02875d0~1..02875d0 \ | xargs python scripts/verify_comment_only_diff.py --base 02875d0~1 --head 02875d0 reports OK for all 18 files.

Tighten the docstrings and inline comments added by the layout-aware MoE merge work so the diff is closer to the surrounding house style (see chore #640). No behaviour change; 22 / 22 merge tests still pass.

…647) * saving: layout-aware MoE LoRA merge + loud-fail on fallback (#5410) `save_pretrained_merged(save_method="merged_16bit")` silently dropped the entire MoE expert LoRA delta on Qwen3-MoE / Qwen3.5-MoE-style models with peft >= 0.19.1. The per-expert helpers in `saving_utils.py` hardcoded the PEFT 0.18 "swapped" tensor layout (`lora_A: (E*r, 2I)`, `lora_B: (H, E*r)` for gate_up_proj; `lora_A: (E*r, H)`, `lora_B: (I, E*r)` for down_proj), while PEFT 0.19+ swaps in/out features for non-transposed 3D parameters and produces `lora_A: (E*r, H)`, `lora_B: (2I, E*r)` and `lora_A: (E*r, I)`, `lora_B: (H, E*r)`. The layout mismatch hit a bare `except Exception: return W` and the dim-heuristic fallthrough in the fused helpers, so the merge silently wrote unmodified base weights and reported success. The `num_experts` value used by the per-expert loop was also taken from the shard-local key scan, which is a non-divisor of `total_rank` whenever experts are split across multiple safetensor shards (16/17 of 128 on Qwen3-30B-A3B). Finally the merged dir was missing `generation_config.json`, so chat-tuned models reloaded with default eos / sampling and ran past EOS. Changes: - `_detect_moe_lora_layout(lora_A, lora_B, num_experts, out_dim, in_dim)` classifies the layout by shape against the per-expert disk weight, so no version sniffing is required. Works on transformers 4.57.x / 5.x and peft 0.18.x / 0.19.x. - `_merge_moe_gate_or_up_expert` and `_merge_moe_down_proj_expert` branch on the detected layout. The "swapped" path is byte-identical to the previous behaviour. - `_resolve_num_experts_from_lora_stats` walks `module -> base_layer -> ...` to read the authoritative `num_experts` off the wrapped MoE module (`Qwen3MoeExperts` etc). `_merge_and_overwrite_lora` uses it to override `moe_num_experts[prefix]` after the converted-key build, so the per-expert loop never trips on a shard-local count. - `_MOE_MERGE_STATE` tracks `(attempted, applied, fallback, first_error)`. Each helper records a fallback with role / expert / shapes / reason on any unrecognised layout or exception. After the shard loop `merge_and_overwrite_lora` raises `RuntimeError` if any fallback fired, so partially-merged checkpoints can no longer be silently written. On success it prints `applied/attempted`. - The `merged_16bit` branch also calls `model.generation_config.save_pretrained(save_directory)` (best-effort, matching `fix_tokenizer_config_json`). Tests: - Existing 16 per-expert / fused / dense merge tests in `test_unsloth_zoo_lora_merge.py` still pass byte-for-byte (PEFT 0.18 swapped layout is the default branch). - 6 new tests: * standard layout for `_merge_moe_gate_expert`, `_merge_moe_up_expert`, `_merge_moe_down_proj_expert`, * layout classifier for both conventions and the unknown cases, * fallback counter increments and `first_error` populates on unrecognised shapes, * `_resolve_num_experts_from_lora_stats` walks the `base_layer` chain. End-to-end verification on Qwen3-30B-A3B (128 experts x 48 layers, fused 3D in memory, per-expert 2D on disk), full SFT + save + reload + logit compare: | transformers | peft | trl | merged tensors | trained vs merged KL | samples | |--------------|--------|--------|----------------|----------------------|---------| | 5.5.0 | 0.19.1 | 0.25.1 | 18432 / 18432 | 1.6e-5 | 3 / 3 | | 5.5.0 | 0.18.1 | 0.25.1 | 18432 / 18432 | 1.3e-5 | 3 / 3 | | 4.57.6 | 0.19.1 | 0.25.1 | dense path | 5.5e-5 | 3 / 3 | | 5.5.0 | 0.19.1 | 1.4.0 | 18432 / 18432 | 2.1e-4 | 3 / 3 | Before the patch the M1 row was KL=1.86, samples=1/3, and 0/18432 expert LoRA deltas were applied. transformers 4.57.6 has `experts = nn.ModuleList (Qwen3MoeMLP)` (no fused 3D parameter) so the MoE merge helpers do not fire and every per-expert Linear takes the standard dense `_merge_lora` path. The MoE helpers are unreachable on transformers <5; the patch only affects the path that produces the bug. Fixes unslothai/unsloth#5410. Likely also resolves unslothai/unsloth#4832 (same author, same "garbage after save_pretrained_merged reload" symptom on DevStral Small 2). * saving: bound _resolve_num_experts walk + harden against hostile attrs The base_layer walk in _resolve_num_experts_from_lora_stats was an unbounded `while module is not None` loop. PEFT's ParamWrapper does not self-reference in practice, but a self-referential or cyclic `base_layer` chain would hang the merge. Bound the walk to 16 hops, dedupe via an id() set, and swallow exceptions on getattr / getattr-of-attrs so a hostile module that raises on attribute access cannot abort the merge. Confirmed by a synthetic suite (52 cases) across three isolated venvs: peft 0.18.1 + transformers 5.5.0, peft 0.19.1 + transformers 5.5.0, peft 0.19.1 + transformers 4.57.6. All 22 existing merge tests still pass byte-for-byte in each. * saving: trim verbose comments per maintainer style Tighten the docstrings and inline comments added by the layout-aware MoE merge work so the diff is closer to the surrounding house style (see chore #640). No behaviour change; 22 / 22 merge tests still pass. --------- Co-authored-by: Daniel Han-Chen <info@unsloth.ai>

Tighten the docstrings and inline comments added by the layout-aware MoE merge work so the diff is closer to the surrounding house style (see chore unslothai#640). No behaviour change; 22 / 22 merge tests still pass.

gemini-code-assist Bot reviewed May 14, 2026

View reviewed changes

danielhanchen added 5 commits May 14, 2026 11:04

chore: trim verbose comments in test_zoo_history_regressions_deep

4f92ff9

tests: trim verbose comments in PR #637 drift detectors

685c54f

Compress contract preambles and per-test docstrings while keeping DRIFT-DETECTED framing, PR/commit citations, zoo file:line callsite references, and non-obvious WHY notes. Pytest pass-count unchanged.

danielhanchen force-pushed the chore/trim-pr637-comments-final branch from a46e812 to 1400a63 Compare May 14, 2026 11:20

danielhanchen merged commit 02875d0 into main May 14, 2026
15 of 18 checks passed

danielhanchen deleted the chore/trim-pr637-comments-final branch May 14, 2026 11:21

danielhanchen mentioned this pull request May 14, 2026

scripts: ship deterministic comment / docstring-only diff verifier #641

Merged

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: trim verbose comments across PR #637 landing#640

chore: trim verbose comments across PR #637 landing#640
danielhanchen merged 5 commits into
mainfrom
chore/trim-pr637-comments-final

danielhanchen commented May 14, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 14, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

danielhanchen commented May 14, 2026

Net change

What got trimmed where

Local verification

Test plan

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant