Skip to content

chore: trim verbose comments across PR #637 landing#640

Merged
danielhanchen merged 5 commits into
mainfrom
chore/trim-pr637-comments-final
May 14, 2026
Merged

chore: trim verbose comments across PR #637 landing#640
danielhanchen merged 5 commits into
mainfrom
chore/trim-pr637-comments-final

Conversation

@danielhanchen

Copy link
Copy Markdown
Member

Follow-up to #637. The squashed merge brought in a lot of new test code and workflow YAML where the comments had run a bit long. This PR trims multi-paragraph rationale paragraphs down to 1 to 3 line summaries while keeping everything that earns its keep: license headers, PR and commit citations, zoo file:line callsites, idempotence/no-op guarantees, the DRIFT DETECTED framing, and HARD GATE markers.

Net change

18 files, plus 1541 / minus 4233 lines (net minus 2692).

What got trimmed where

Tests (9 files):

  • test_zoo_history_regressions_deep.py: minus 327
  • test_upstream_import_fixes_drift.py: minus 141
  • test_zoo_source_upstream_refs.py: minus 90
  • test_upstream_signatures.py: minus 122
  • test_upstream_source_patterns.py: minus 305
  • test_extended_dep_api_pins.py: minus 205
  • test_compiler_rewriter_exhaustive.py: minus 485
  • test_compiler_dynamic_exec.py: minus 142
  • test_temporary_patches_exhaustive.py: minus 366
  • test_upstream_pinned_symbols_transformers.py: minus 125

Source (3 files):

  • unsloth_zoo/init.py: minus 9 (apply_import_fixes wiring comments)
  • unsloth_zoo/patching_utils.py: minus 5 (compiled_autograd disable rename block)
  • tests/conftest.py: minus 49 (_patch_torch_cuda_for_import and import_fixes hook docstrings)

Workflows (5 files):

  • consolidated-tests-ci.yml: minus 262 (collapsed the 12-file 626-test enumeration to a one-liner; HARD GATE label kept)
  • security-audit.yml: minus 59
  • wheel-smoke.yml: minus 23
  • lint-ci.yml: minus 47
  • mlx-ci.yml: minus 26

Local verification

```
pytest tests/test_upstream_pinned_symbols_*.py \
tests/test_zoo_history_regressions_deep.py \
tests/test_upstream_import_fixes_drift.py \
tests/test_zoo_source_upstream_refs.py \
tests/test_upstream_signatures.py \
tests/test_extended_dep_api_pins.py \
tests/test_upstream_source_patterns.py \
tests/test_compiler_rewriter_exhaustive.py \
tests/test_compiler_dynamic_exec.py \
tests/test_temporary_patches_exhaustive.py \
tests/security
```

614 passed, 32 skipped, 0 failed (same as on main pre-trim).

YAML round-trip on all 5 workflow files OK. scripts/lint_workflow_triggers.py clean (6 files scanned, no pull_request_target, no unjustified workflow_run, no PR/publish cache-key collision).

Test plan

  • Confirm pytest pass-count holds across the 12 drift-detector test files plus tests/security.
  • Confirm YAML round-trip on the 5 workflow files.
  • Confirm scripts/lint_workflow_triggers.py passes.
  • Spot-check that PR/commit citations and zoo file:line references are preserved in trimmed docstrings.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request focuses on refactoring and cleaning up docstrings and comments across the tests/ and unsloth_zoo/ directories to improve clarity and conciseness. The code review provides an actionable suggestion to simplify the logic in tests/test_temporary_patches_exhaustive.py by removing a redundant conditional check in the _resolve_upstream_method function.

Comment on lines 107 to 112
qualname = getattr(live, "__qualname__", "") or ""
if ".<locals>." in qualname and qualname.split(".", 1)[0].startswith("patch_"):
# zoo patch wrapper, but no _original_ stash on the class. This
# is rare (force=True + store_original=False) but possible. Fall
# through; caller will skip cleanly via _maybe_skip_if_patched.
# zoo patch wrapper with no _original_ stash (rare: force=True +
# store_original=False); _maybe_skip_if_patched handles cleanly.
return live
return live

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The if block checking for a zoo patch wrapper by __qualname__ is redundant because both branches return live. Additionally, for robustness when identifying patched functions, identity checks are preferred over string-matching attributes like __qualname__. This logic can be simplified to a direct return.

Suggested change
qualname = getattr(live, "__qualname__", "") or ""
if ".<locals>." in qualname and qualname.split(".", 1)[0].startswith("patch_"):
# zoo patch wrapper, but no _original_ stash on the class. This
# is rare (force=True + store_original=False) but possible. Fall
# through; caller will skip cleanly via _maybe_skip_if_patched.
# zoo patch wrapper with no _original_ stash (rare: force=True +
# store_original=False); _maybe_skip_if_patched handles cleanly.
return live
return live
return live
References
  1. When unpatching a function, perform an identity check to ensure the function being replaced is the one originally patched by your code. This is more robust than relying on the state of other modules.

danielhanchen added a commit that referenced this pull request May 14, 2026
Verifier scripts/verify_trim_comment_only.py flagged 4 non-comment
changes in PR #640 (em-dashes replaced in assertion strings, a
defensive 'consumed = 0' init, an unused-variable removal, an
added 'return' statement in a helper). Restore each so the PR is
strictly comment + docstring + whitespace cleanup, nothing else.

- test_upstream_pinned_symbols_transformers.py: restore the
  em-dash and the parenthetical detail in two assertion messages
- test_upstream_source_patterns.py: drop the 'consumed = 0'
  initialization line and restore the 'needle = ...' literal
- test_zoo_source_upstream_refs.py: drop the 'return' that was
  added to _skip_if_missing (its annotation says -> None)
Compress contract preambles and per-test docstrings while keeping
DRIFT-DETECTED framing, PR/commit citations, zoo file:line callsite
references, and non-obvious WHY notes. Pytest pass-count unchanged.
Compact docstrings and section headers across 4 drift-detector test
files. Preserves license headers, zoo file:line citations, DRIFT
DETECTED framing, and PR references; cuts multi-paragraph
re-statements, methodology essays, and verbose contract descriptions.

Reductions:
  tests/test_compiler_rewriter_exhaustive.py     2633 -> 2148
  tests/test_compiler_dynamic_exec.py             831 ->  689
  tests/test_temporary_patches_exhaustive.py     2612 -> 2246
  tests/test_upstream_pinned_symbols_transformers.py  562 ->  437

pytest pass-count unchanged: 322 passed, 24 skipped.
Trim PR-#637-era explanatory comments to focus on load-bearing details
(file:line citations, PR references, idempotence guarantees, HARD GATE
markers). Code behaviour is unchanged.

Files touched:
- unsloth_zoo/patching_utils.py (compiled_autograd disable rename)
- unsloth_zoo/__init__.py (apply_import_fixes wiring)
- tests/conftest.py (cuda-stub + import_fixes harness blocks)
- .github/workflows/consolidated-tests-ci.yml
- .github/workflows/security-audit.yml
- .github/workflows/lint-ci.yml
- .github/workflows/wheel-smoke.yml
- .github/workflows/mlx-ci.yml
Verifier scripts/verify_trim_comment_only.py flagged 4 non-comment
changes in PR #640 (em-dashes replaced in assertion strings, a
defensive 'consumed = 0' init, an unused-variable removal, an
added 'return' statement in a helper). Restore each so the PR is
strictly comment + docstring + whitespace cleanup, nothing else.

- test_upstream_pinned_symbols_transformers.py: restore the
  em-dash and the parenthetical detail in two assertion messages
- test_upstream_source_patterns.py: drop the 'consumed = 0'
  initialization line and restore the 'needle = ...' literal
- test_zoo_source_upstream_refs.py: drop the 'return' that was
  added to _skip_if_missing (its annotation says -> None)
@danielhanchen danielhanchen force-pushed the chore/trim-pr637-comments-final branch from a46e812 to 1400a63 Compare May 14, 2026 11:20
@danielhanchen danielhanchen merged commit 02875d0 into main May 14, 2026
15 of 18 checks passed
@danielhanchen danielhanchen deleted the chore/trim-pr637-comments-final branch May 14, 2026 11:21
CodeMan62 pushed a commit to CodeMan62/unsloth-zoo that referenced this pull request May 14, 2026
…nslothai#641)

scripts/verify_comment_only_diff.py compares a list of changed files
between two git refs and reports whether each diff is strictly comments
or docstrings.

  * .py: parse both revs into AST, strip module / class / function
    docstrings, then compare ast.unparse output. Pure Python comments
    are discarded by ast.parse by construction, so any post-strip diff
    is real code.
  * .yml / .yaml: yaml.safe_load both sides and compare the parsed
    Python object; if scalar values differ, also strip shell comments
    inside any multi-line scalar (i.e. `run: |` script bodies) before
    comparing.

Exit code is 0 if every file is comment-only, 1 otherwise. The script
also prints a tight diff snippet for any FAIL line so a reviewer can
spot the real code change at a glance.

This is what I used to gate the trim PRs unslothai#637 / unslothai#640 (zoo) and #5416 /
#5418 / #5421 (unsloth). Shipping it under scripts/ so any contributor
can deterministically prove a comment / docstring refactor is truly
comment-only, without manually eyeballing every line of a 4000-line
diff.

Usage:

    python scripts/verify_comment_only_diff.py [--base REF] [--head REF] path ...

Defaults: --base origin/main, --head HEAD. Paths are repo-relative.

Smoke test against the squash-merged PR unslothai#640 (an 18-file pure trim):

    git diff --name-only 02875d0~1..02875d0 \
      | xargs python scripts/verify_comment_only_diff.py --base 02875d0~1 --head 02875d0

reports OK for all 18 files.
danielhanchen pushed a commit that referenced this pull request May 15, 2026
Tighten the docstrings and inline comments added by the layout-aware
MoE merge work so the diff is closer to the surrounding house style
(see chore #640). No behaviour change; 22 / 22 merge tests still pass.
danielhanchen pushed a commit that referenced this pull request May 15, 2026
Tighten the docstrings and inline comments added by the layout-aware
MoE merge work so the diff is closer to the surrounding house style
(see chore #640). No behaviour change; 22 / 22 merge tests still pass.
danielhanchen added a commit that referenced this pull request May 15, 2026
…647)

* saving: layout-aware MoE LoRA merge + loud-fail on fallback (#5410)

`save_pretrained_merged(save_method="merged_16bit")` silently dropped the
entire MoE expert LoRA delta on Qwen3-MoE / Qwen3.5-MoE-style models with
peft >= 0.19.1. The per-expert helpers in `saving_utils.py` hardcoded the
PEFT 0.18 "swapped" tensor layout (`lora_A: (E*r, 2I)`, `lora_B: (H, E*r)`
for gate_up_proj; `lora_A: (E*r, H)`, `lora_B: (I, E*r)` for down_proj),
while PEFT 0.19+ swaps in/out features for non-transposed 3D parameters
and produces `lora_A: (E*r, H)`, `lora_B: (2I, E*r)` and `lora_A: (E*r, I)`,
`lora_B: (H, E*r)`. The layout mismatch hit a bare `except Exception: return W`
and the dim-heuristic fallthrough in the fused helpers, so the merge
silently wrote unmodified base weights and reported success. The
`num_experts` value used by the per-expert loop was also taken from the
shard-local key scan, which is a non-divisor of `total_rank` whenever
experts are split across multiple safetensor shards (16/17 of 128 on
Qwen3-30B-A3B). Finally the merged dir was missing `generation_config.json`,
so chat-tuned models reloaded with default eos / sampling and ran past EOS.

Changes:

- `_detect_moe_lora_layout(lora_A, lora_B, num_experts, out_dim, in_dim)`
  classifies the layout by shape against the per-expert disk weight, so
  no version sniffing is required. Works on transformers 4.57.x / 5.x
  and peft 0.18.x / 0.19.x.
- `_merge_moe_gate_or_up_expert` and `_merge_moe_down_proj_expert`
  branch on the detected layout. The "swapped" path is byte-identical
  to the previous behaviour.
- `_resolve_num_experts_from_lora_stats` walks `module -> base_layer ->
  ...` to read the authoritative `num_experts` off the wrapped MoE
  module (`Qwen3MoeExperts` etc). `_merge_and_overwrite_lora` uses it
  to override `moe_num_experts[prefix]` after the converted-key build,
  so the per-expert loop never trips on a shard-local count.
- `_MOE_MERGE_STATE` tracks `(attempted, applied, fallback, first_error)`.
  Each helper records a fallback with role / expert / shapes / reason
  on any unrecognised layout or exception. After the shard loop
  `merge_and_overwrite_lora` raises `RuntimeError` if any fallback
  fired, so partially-merged checkpoints can no longer be silently
  written. On success it prints `applied/attempted`.
- The `merged_16bit` branch also calls
  `model.generation_config.save_pretrained(save_directory)` (best-effort,
  matching `fix_tokenizer_config_json`).

Tests:

- Existing 16 per-expert / fused / dense merge tests in
  `test_unsloth_zoo_lora_merge.py` still pass byte-for-byte (PEFT 0.18
  swapped layout is the default branch).
- 6 new tests:
  * standard layout for `_merge_moe_gate_expert`, `_merge_moe_up_expert`,
    `_merge_moe_down_proj_expert`,
  * layout classifier for both conventions and the unknown cases,
  * fallback counter increments and `first_error` populates on
    unrecognised shapes,
  * `_resolve_num_experts_from_lora_stats` walks the `base_layer` chain.

End-to-end verification on Qwen3-30B-A3B (128 experts x 48 layers,
fused 3D in memory, per-expert 2D on disk), full SFT + save + reload
+ logit compare:

| transformers | peft   | trl    | merged tensors | trained vs merged KL | samples |
|--------------|--------|--------|----------------|----------------------|---------|
| 5.5.0        | 0.19.1 | 0.25.1 | 18432 / 18432  | 1.6e-5               | 3 / 3   |
| 5.5.0        | 0.18.1 | 0.25.1 | 18432 / 18432  | 1.3e-5               | 3 / 3   |
| 4.57.6       | 0.19.1 | 0.25.1 | dense path     | 5.5e-5               | 3 / 3   |
| 5.5.0        | 0.19.1 | 1.4.0  | 18432 / 18432  | 2.1e-4               | 3 / 3   |

Before the patch the M1 row was KL=1.86, samples=1/3, and 0/18432 expert
LoRA deltas were applied. transformers 4.57.6 has `experts = nn.ModuleList
(Qwen3MoeMLP)` (no fused 3D parameter) so the MoE merge helpers do not
fire and every per-expert Linear takes the standard dense `_merge_lora`
path. The MoE helpers are unreachable on transformers <5; the patch only
affects the path that produces the bug.

Fixes unslothai/unsloth#5410. Likely also resolves unslothai/unsloth#4832
(same author, same "garbage after save_pretrained_merged reload" symptom
on DevStral Small 2).

* saving: bound _resolve_num_experts walk + harden against hostile attrs

The base_layer walk in _resolve_num_experts_from_lora_stats was an
unbounded `while module is not None` loop. PEFT's ParamWrapper does not
self-reference in practice, but a self-referential or cyclic `base_layer`
chain would hang the merge. Bound the walk to 16 hops, dedupe via an
id() set, and swallow exceptions on getattr / getattr-of-attrs so a
hostile module that raises on attribute access cannot abort the merge.

Confirmed by a synthetic suite (52 cases) across three isolated venvs:
peft 0.18.1 + transformers 5.5.0, peft 0.19.1 + transformers 5.5.0,
peft 0.19.1 + transformers 4.57.6. All 22 existing merge tests still
pass byte-for-byte in each.

* saving: trim verbose comments per maintainer style

Tighten the docstrings and inline comments added by the layout-aware
MoE merge work so the diff is closer to the surrounding house style
(see chore #640). No behaviour change; 22 / 22 merge tests still pass.

---------

Co-authored-by: Daniel Han-Chen <info@unsloth.ai>
Datta0 pushed a commit to sensai99/unsloth-zoo that referenced this pull request May 15, 2026
Tighten the docstrings and inline comments added by the layout-aware
MoE merge work so the diff is closer to the surrounding house style
(see chore unslothai#640). No behaviour change; 22 / 22 merge tests still pass.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant