chore: trim verbose comments across PR #637 landing#640
Merged
Conversation
Contributor
There was a problem hiding this comment.
Code Review
This pull request focuses on refactoring and cleaning up docstrings and comments across the tests/ and unsloth_zoo/ directories to improve clarity and conciseness. The code review provides an actionable suggestion to simplify the logic in tests/test_temporary_patches_exhaustive.py by removing a redundant conditional check in the _resolve_upstream_method function.
Comment on lines
107
to
112
| qualname = getattr(live, "__qualname__", "") or "" | ||
| if ".<locals>." in qualname and qualname.split(".", 1)[0].startswith("patch_"): | ||
| # zoo patch wrapper, but no _original_ stash on the class. This | ||
| # is rare (force=True + store_original=False) but possible. Fall | ||
| # through; caller will skip cleanly via _maybe_skip_if_patched. | ||
| # zoo patch wrapper with no _original_ stash (rare: force=True + | ||
| # store_original=False); _maybe_skip_if_patched handles cleanly. | ||
| return live | ||
| return live |
Contributor
There was a problem hiding this comment.
The if block checking for a zoo patch wrapper by __qualname__ is redundant because both branches return live. Additionally, for robustness when identifying patched functions, identity checks are preferred over string-matching attributes like __qualname__. This logic can be simplified to a direct return.
Suggested change
| qualname = getattr(live, "__qualname__", "") or "" | |
| if ".<locals>." in qualname and qualname.split(".", 1)[0].startswith("patch_"): | |
| # zoo patch wrapper, but no _original_ stash on the class. This | |
| # is rare (force=True + store_original=False) but possible. Fall | |
| # through; caller will skip cleanly via _maybe_skip_if_patched. | |
| # zoo patch wrapper with no _original_ stash (rare: force=True + | |
| # store_original=False); _maybe_skip_if_patched handles cleanly. | |
| return live | |
| return live | |
| return live |
References
- When unpatching a function, perform an identity check to ensure the function being replaced is the one originally patched by your code. This is more robust than relying on the state of other modules.
danielhanchen
added a commit
that referenced
this pull request
May 14, 2026
Verifier scripts/verify_trim_comment_only.py flagged 4 non-comment changes in PR #640 (em-dashes replaced in assertion strings, a defensive 'consumed = 0' init, an unused-variable removal, an added 'return' statement in a helper). Restore each so the PR is strictly comment + docstring + whitespace cleanup, nothing else. - test_upstream_pinned_symbols_transformers.py: restore the em-dash and the parenthetical detail in two assertion messages - test_upstream_source_patterns.py: drop the 'consumed = 0' initialization line and restore the 'needle = ...' literal - test_zoo_source_upstream_refs.py: drop the 'return' that was added to _skip_if_missing (its annotation says -> None)
Compress contract preambles and per-test docstrings while keeping DRIFT-DETECTED framing, PR/commit citations, zoo file:line callsite references, and non-obvious WHY notes. Pytest pass-count unchanged.
Compact docstrings and section headers across 4 drift-detector test files. Preserves license headers, zoo file:line citations, DRIFT DETECTED framing, and PR references; cuts multi-paragraph re-statements, methodology essays, and verbose contract descriptions. Reductions: tests/test_compiler_rewriter_exhaustive.py 2633 -> 2148 tests/test_compiler_dynamic_exec.py 831 -> 689 tests/test_temporary_patches_exhaustive.py 2612 -> 2246 tests/test_upstream_pinned_symbols_transformers.py 562 -> 437 pytest pass-count unchanged: 322 passed, 24 skipped.
Trim PR-#637-era explanatory comments to focus on load-bearing details (file:line citations, PR references, idempotence guarantees, HARD GATE markers). Code behaviour is unchanged. Files touched: - unsloth_zoo/patching_utils.py (compiled_autograd disable rename) - unsloth_zoo/__init__.py (apply_import_fixes wiring) - tests/conftest.py (cuda-stub + import_fixes harness blocks) - .github/workflows/consolidated-tests-ci.yml - .github/workflows/security-audit.yml - .github/workflows/lint-ci.yml - .github/workflows/wheel-smoke.yml - .github/workflows/mlx-ci.yml
Verifier scripts/verify_trim_comment_only.py flagged 4 non-comment changes in PR #640 (em-dashes replaced in assertion strings, a defensive 'consumed = 0' init, an unused-variable removal, an added 'return' statement in a helper). Restore each so the PR is strictly comment + docstring + whitespace cleanup, nothing else. - test_upstream_pinned_symbols_transformers.py: restore the em-dash and the parenthetical detail in two assertion messages - test_upstream_source_patterns.py: drop the 'consumed = 0' initialization line and restore the 'needle = ...' literal - test_zoo_source_upstream_refs.py: drop the 'return' that was added to _skip_if_missing (its annotation says -> None)
a46e812 to
1400a63
Compare
3 tasks
CodeMan62
pushed a commit
to CodeMan62/unsloth-zoo
that referenced
this pull request
May 14, 2026
…nslothai#641) scripts/verify_comment_only_diff.py compares a list of changed files between two git refs and reports whether each diff is strictly comments or docstrings. * .py: parse both revs into AST, strip module / class / function docstrings, then compare ast.unparse output. Pure Python comments are discarded by ast.parse by construction, so any post-strip diff is real code. * .yml / .yaml: yaml.safe_load both sides and compare the parsed Python object; if scalar values differ, also strip shell comments inside any multi-line scalar (i.e. `run: |` script bodies) before comparing. Exit code is 0 if every file is comment-only, 1 otherwise. The script also prints a tight diff snippet for any FAIL line so a reviewer can spot the real code change at a glance. This is what I used to gate the trim PRs unslothai#637 / unslothai#640 (zoo) and #5416 / #5418 / #5421 (unsloth). Shipping it under scripts/ so any contributor can deterministically prove a comment / docstring refactor is truly comment-only, without manually eyeballing every line of a 4000-line diff. Usage: python scripts/verify_comment_only_diff.py [--base REF] [--head REF] path ... Defaults: --base origin/main, --head HEAD. Paths are repo-relative. Smoke test against the squash-merged PR unslothai#640 (an 18-file pure trim): git diff --name-only 02875d0~1..02875d0 \ | xargs python scripts/verify_comment_only_diff.py --base 02875d0~1 --head 02875d0 reports OK for all 18 files.
danielhanchen
pushed a commit
that referenced
this pull request
May 15, 2026
Tighten the docstrings and inline comments added by the layout-aware MoE merge work so the diff is closer to the surrounding house style (see chore #640). No behaviour change; 22 / 22 merge tests still pass.
danielhanchen
pushed a commit
that referenced
this pull request
May 15, 2026
Tighten the docstrings and inline comments added by the layout-aware MoE merge work so the diff is closer to the surrounding house style (see chore #640). No behaviour change; 22 / 22 merge tests still pass.
danielhanchen
added a commit
that referenced
this pull request
May 15, 2026
…647) * saving: layout-aware MoE LoRA merge + loud-fail on fallback (#5410) `save_pretrained_merged(save_method="merged_16bit")` silently dropped the entire MoE expert LoRA delta on Qwen3-MoE / Qwen3.5-MoE-style models with peft >= 0.19.1. The per-expert helpers in `saving_utils.py` hardcoded the PEFT 0.18 "swapped" tensor layout (`lora_A: (E*r, 2I)`, `lora_B: (H, E*r)` for gate_up_proj; `lora_A: (E*r, H)`, `lora_B: (I, E*r)` for down_proj), while PEFT 0.19+ swaps in/out features for non-transposed 3D parameters and produces `lora_A: (E*r, H)`, `lora_B: (2I, E*r)` and `lora_A: (E*r, I)`, `lora_B: (H, E*r)`. The layout mismatch hit a bare `except Exception: return W` and the dim-heuristic fallthrough in the fused helpers, so the merge silently wrote unmodified base weights and reported success. The `num_experts` value used by the per-expert loop was also taken from the shard-local key scan, which is a non-divisor of `total_rank` whenever experts are split across multiple safetensor shards (16/17 of 128 on Qwen3-30B-A3B). Finally the merged dir was missing `generation_config.json`, so chat-tuned models reloaded with default eos / sampling and ran past EOS. Changes: - `_detect_moe_lora_layout(lora_A, lora_B, num_experts, out_dim, in_dim)` classifies the layout by shape against the per-expert disk weight, so no version sniffing is required. Works on transformers 4.57.x / 5.x and peft 0.18.x / 0.19.x. - `_merge_moe_gate_or_up_expert` and `_merge_moe_down_proj_expert` branch on the detected layout. The "swapped" path is byte-identical to the previous behaviour. - `_resolve_num_experts_from_lora_stats` walks `module -> base_layer -> ...` to read the authoritative `num_experts` off the wrapped MoE module (`Qwen3MoeExperts` etc). `_merge_and_overwrite_lora` uses it to override `moe_num_experts[prefix]` after the converted-key build, so the per-expert loop never trips on a shard-local count. - `_MOE_MERGE_STATE` tracks `(attempted, applied, fallback, first_error)`. Each helper records a fallback with role / expert / shapes / reason on any unrecognised layout or exception. After the shard loop `merge_and_overwrite_lora` raises `RuntimeError` if any fallback fired, so partially-merged checkpoints can no longer be silently written. On success it prints `applied/attempted`. - The `merged_16bit` branch also calls `model.generation_config.save_pretrained(save_directory)` (best-effort, matching `fix_tokenizer_config_json`). Tests: - Existing 16 per-expert / fused / dense merge tests in `test_unsloth_zoo_lora_merge.py` still pass byte-for-byte (PEFT 0.18 swapped layout is the default branch). - 6 new tests: * standard layout for `_merge_moe_gate_expert`, `_merge_moe_up_expert`, `_merge_moe_down_proj_expert`, * layout classifier for both conventions and the unknown cases, * fallback counter increments and `first_error` populates on unrecognised shapes, * `_resolve_num_experts_from_lora_stats` walks the `base_layer` chain. End-to-end verification on Qwen3-30B-A3B (128 experts x 48 layers, fused 3D in memory, per-expert 2D on disk), full SFT + save + reload + logit compare: | transformers | peft | trl | merged tensors | trained vs merged KL | samples | |--------------|--------|--------|----------------|----------------------|---------| | 5.5.0 | 0.19.1 | 0.25.1 | 18432 / 18432 | 1.6e-5 | 3 / 3 | | 5.5.0 | 0.18.1 | 0.25.1 | 18432 / 18432 | 1.3e-5 | 3 / 3 | | 4.57.6 | 0.19.1 | 0.25.1 | dense path | 5.5e-5 | 3 / 3 | | 5.5.0 | 0.19.1 | 1.4.0 | 18432 / 18432 | 2.1e-4 | 3 / 3 | Before the patch the M1 row was KL=1.86, samples=1/3, and 0/18432 expert LoRA deltas were applied. transformers 4.57.6 has `experts = nn.ModuleList (Qwen3MoeMLP)` (no fused 3D parameter) so the MoE merge helpers do not fire and every per-expert Linear takes the standard dense `_merge_lora` path. The MoE helpers are unreachable on transformers <5; the patch only affects the path that produces the bug. Fixes unslothai/unsloth#5410. Likely also resolves unslothai/unsloth#4832 (same author, same "garbage after save_pretrained_merged reload" symptom on DevStral Small 2). * saving: bound _resolve_num_experts walk + harden against hostile attrs The base_layer walk in _resolve_num_experts_from_lora_stats was an unbounded `while module is not None` loop. PEFT's ParamWrapper does not self-reference in practice, but a self-referential or cyclic `base_layer` chain would hang the merge. Bound the walk to 16 hops, dedupe via an id() set, and swallow exceptions on getattr / getattr-of-attrs so a hostile module that raises on attribute access cannot abort the merge. Confirmed by a synthetic suite (52 cases) across three isolated venvs: peft 0.18.1 + transformers 5.5.0, peft 0.19.1 + transformers 5.5.0, peft 0.19.1 + transformers 4.57.6. All 22 existing merge tests still pass byte-for-byte in each. * saving: trim verbose comments per maintainer style Tighten the docstrings and inline comments added by the layout-aware MoE merge work so the diff is closer to the surrounding house style (see chore #640). No behaviour change; 22 / 22 merge tests still pass. --------- Co-authored-by: Daniel Han-Chen <info@unsloth.ai>
Datta0
pushed a commit
to sensai99/unsloth-zoo
that referenced
this pull request
May 15, 2026
Tighten the docstrings and inline comments added by the layout-aware MoE merge work so the diff is closer to the surrounding house style (see chore unslothai#640). No behaviour change; 22 / 22 merge tests still pass.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Follow-up to #637. The squashed merge brought in a lot of new test code and workflow YAML where the comments had run a bit long. This PR trims multi-paragraph rationale paragraphs down to 1 to 3 line summaries while keeping everything that earns its keep: license headers, PR and commit citations, zoo file:line callsites, idempotence/no-op guarantees, the DRIFT DETECTED framing, and HARD GATE markers.
Net change
18 files, plus 1541 / minus 4233 lines (net minus 2692).
What got trimmed where
Tests (9 files):
Source (3 files):
Workflows (5 files):
Local verification
```
pytest tests/test_upstream_pinned_symbols_*.py \
tests/test_zoo_history_regressions_deep.py \
tests/test_upstream_import_fixes_drift.py \
tests/test_zoo_source_upstream_refs.py \
tests/test_upstream_signatures.py \
tests/test_extended_dep_api_pins.py \
tests/test_upstream_source_patterns.py \
tests/test_compiler_rewriter_exhaustive.py \
tests/test_compiler_dynamic_exec.py \
tests/test_temporary_patches_exhaustive.py \
tests/security
```
614 passed, 32 skipped, 0 failed (same as on main pre-trim).
YAML round-trip on all 5 workflow files OK. scripts/lint_workflow_triggers.py clean (6 files scanned, no pull_request_target, no unjustified workflow_run, no PR/publish cache-key collision).
Test plan