fix embedding matrix size mismatch bug by CodeMan62 · Pull Request #645 · unslothai/unsloth-zoo

CodeMan62 · 2026-05-14T17:51:33Z

This PR is fixing the embedding matrix size mismatch bug
closes 3502
image shows that the patch works

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a0fbebd247

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-14T17:53:11Z


    # Confirm sizes are correct
-    if embedding_matrix.shape[0] != (old_input_length  + len(new_tokens)):
+    if embedding_matrix.shape[0] != new_vocab_length:


Keep validating the number of actually added tokens

When new_tokens contains duplicates such as ['<foo>', '<foo>'] with method='interpolation', tokenizer.add_tokens only adds one row, so new_vocab_length still matches the resized embedding size and this check now passes. The later interpolation loop still enumerates both entries and writes to embedding_matrix[old_length + j], causing an out-of-bounds write on the second duplicate (or initializing rows for tokens that were not added). Please compare new_vocab_length - old_length to the de-duplicated tokens actually added, or filter new_tokens accordingly before the later loop.

Useful? React with 👍 / 👎.

gemini-code-assist

Code Review

This pull request refactors the add_new_tokens function in unsloth_zoo/tokenizer_utils.py by introducing a new_vocab_length variable to simplify embedding resizing and validation logic. Feedback indicates that the variables old_input_length, old_output_length, and old_config_size are now unused and should be removed to maintain code cleanliness.

gemini-code-assist · 2026-05-14T17:54:33Z

    # Add tokens!
    old_length = len(tokenizer)
    tokenizer.add_tokens(new_tokens)
+    new_vocab_length = len(tokenizer)


The variables old_input_length, old_output_length, and old_config_size (defined at lines 111-113) are no longer used after this change. They should be removed to maintain code cleanliness and avoid confusion for future maintainers.

* fix CI fallout from MLX subpackage refactor (#634) Three small fixes triggered by tests on PRs #645 / unsloth#5426: 1) tests/test_upstream_pinned_symbols_accelerator.py read the legacy flat unsloth_zoo/mlx_trainer.py and mlx_loader.py paths directly. #634 moved those to unsloth_zoo/mlx/trainer.py and mlx/loader.py; point the tests at the new paths so they exercise the post-refactor sources. 2) unsloth_zoo/vision_utils.py:68 did an unconditional top-level import torchvision while torchvision is not a declared dependency. On a CPU-only install that lacks torchvision (e.g. zoo's Core CI matrix), every import of unsloth_zoo.vision_utils raises ModuleNotFoundError. Guard the import so the module loads; existing _read_video_torchvision call sites already only fire when video reading is requested. 3) tests/test_zoo_source_upstream_refs.py tries to resolve transformers qwen2_vl / qwen2_5_vl image-processing modules, both of which transitively `import torchvision`. Skip via pytest.importorskip when torchvision is not installed instead of failing the run. Each fix is independent and minimal; no behavior change for installs that have torchvision (the eager import path is identical), and no behavior change for the MLX trainer / loader (the tests still pin the exact same source-level invariants on the new file paths). * tests/conftest: also stub unsloth.device_type on CPU-only runners The post-#634 conftest stubs ``unsloth_zoo.device_type`` so importing the package on a CPU-only CI runner doesn't trip ``get_device_type()``. ``test_unsloth_trainer_exec_marker`` (and any future test that does ``import unsloth.*``) goes one level up: it imports ``unsloth.trainer``, which traverses ``unsloth/__init__.py`` → ``_gpu_init`` → ``unsloth/ device_type.py:get_device_type()``, raising the same NotImplementedError zoo's stub was meant to suppress. Parameterise ``_preload_real_device_type(package, prereqs)`` so the same harness works for both packages, factor the fallback into ``_install_device_type_stub(name)``, and call both. unsloth's ``get_device_type()`` consumes ``unsloth_zoo`` helpers but has no intra-unsloth prereqs, hence ``prereqs=()`` there. * tests/conftest: don't stub unsloth.device_type after all The previous commit (196a78b) stubbed unsloth.device_type so test_unsloth_trainer_exec_marker could complete its ``import unsloth.trainer`` smoke. That makes ``import unsloth`` succeed on CPU-only CI, which then runs ``unsloth/_gpu_init.py:_patch_trl_trainer()`` and rebinds ``trl.trainer.sft_trainer.SFTTrainer`` / ``transformers.models.ministral.MinistralAttention`` to Unsloth's compiled wrappers. ``inspect.getsource(...)`` on those classes follows the wrapper's ``__init__.__code__.co_filename`` and returns the rewritten source -- which doesn't contain the literals the drift detectors anchor on (``self._signature_columns``, ``self._prepare_dataset(``, ``class SFTTrainer``, named ``hidden_states``/``attention_mask``). The HF=4.57.6 job went from -1 pre-existing failure (``test_unsloth_trainer_exec_marker``, also failing on main) to -4 new failures across ``test_MinistralAttention_forward_signature`` and the three ``test_unsloth_rl_trainer_*`` checks. That's a regression. Keep parameterised ``_preload_real_device_type`` and ``_install_device_type_stub`` -- they're harmless infrastructure and useful if a future test needs the unsloth.device_type stub specifically. Just don't fire the second preload at session setup. ``test_unsloth_trainer_exec_marker`` will continue to fail on CPU-only CI as it does on main; that's a separate pre-existing issue with ``unsloth/device_type.py`` raising on accelerator-less hosts and should be fixed upstream in unsloth itself, not papered over in zoo's conftest. * device_type: mirror UNSLOTH_ALLOW_CPU=1 gate from unsloth #5429 unsloth/_gpu_init.py:125 imports DEVICE_TYPE from unsloth_zoo, not local unsloth/device_type.py, so the env-var gate has to live here too for `import unsloth.trainer` to succeed on CPU-only CI. The companion unsloth PR landed as cb15a7a (#5429). Pairs with the conftest change in this branch that sets UNSLOTH_ALLOW_CPU=1 before triggering `import unsloth` from _apply_upstream_import_fixes_for_tests. @functools.cache on get_device_type plus the module-level DEVICE_TYPE = get_device_type() assignment mean the os.environ.get runs exactly once per process. Production hosts with a real accelerator hit the `cuda`/`xpu`/`hip` branches above and never reach the env-var short-circuit. * prod: graceful no-op for two transformers 5.x runtime bugs Two upstream changes broke zoo at runtime (not just CI drift detection): 1. transformers 5.x removed transformers.cache_utils.HybridCache. zoo's utils.py falls back to `HybridCache = typing.Any`, which made `gemma.py:260 isinstance(past_key_values, HybridCache)` raise TypeError ("isinstance() arg 2 must be a type, a tuple of types, or a union") at runtime on the gemma mask path. Fix: expose HAS_HYBRID_CACHE alongside the existing HybridCache fallback in utils.py, and gate the isinstance call in gemma.py:260 on the flag. When HybridCache is genuinely missing, the elif chain falls through to the dynamic-cache branch -- same observable behavior as the pre-5.x path that never matched the type. 2. transformers 5.x rewrote PreTrainedModel.save_pretrained. The source-string anchors zoo's LoRA-merge-on-save optimization patches in `merge_and_dequantize_lora` (state_dict_split assignment, `state_dict[tensor].contiguous()`, `def save_pretrained`, and the filename_to_tensors for-loop on the push_to_hub path) are all gone. Each downstream `raise RuntimeError("Failed to find ...")` would fire, killing `merge_and_dequantize_lora(push_to_hub=True)` on 5.x. Fix: scan all required anchors upfront in saving_utils.py. If any are missing on installed transformers, emit a one-time warnings.warn and fall back to vanilla `model.save_pretrained(...)`. The end-user sees the warning, no crash, and is told to call `model.merge_and_unload()` (or equivalent) before saving if they want merged LoRA weights on disk. The per-anchor RuntimeError raises downstream are kept as defense-in-depth for partial drift (one anchor renamed, others intact) that the upfront check might miss. The two corresponding drift detectors in tests/ are rewritten in the companion commit as positive assertions: 4.57.6 keeps the strict existence check; 5.x asserts the anchor is gone AND zoo's fallback covers it (`HAS_HYBRID_CACHE` flag / `_required_anchors` list). * tests: version-gate transformers-5.x drift detectors Sixteen drift detectors in the Core HF=default / HF=latest matrix slices were pointing at upstream symbols / source strings that transformers 5.x intentionally removed: * 9 `temporary_patches/*_for_causal_lm_forward_named_params` -- cache_position moved from named forward param into **kwargs: Unpack[TransformersKwargs] (DeepseekV3, GptOss, CsmDepth, Csm, Qwen3Moe, Qwen3Next, Qwen3VLMoe, MinistralModel, GraniteMoeHybrid). * 4 compiler/source rewriter probes -- output_attentions branching removed, MoE routing_weights cast refactored, Trainer is_torch_tpu_available/is_torch_xla_available both removed, _update_causal_mask hasattr probe all-False. * 1 routing_weights.to substring probe in compiler_rewriter_exhaustive. * 1 GptOssConfig dedent-compare -- rope_theta/initial_context_length/ rope_scaling collapsed into rope_parameters dict. * 1 GptOssConfig kwarg signature -- same rope_theta rename. For all 16, the zoo runtime patch already gracefully no-ops on 5.x (verified by the Explore audit: try/except + relaxed patch_function + hasattr guards + str.replace's silent no-op-on-miss). Skip the detector on 5.x with a clear "<symbol> removed on transformers {version}" message; keep it strict on 4.57.6 where real drift could still surface. Two more drift detectors -- test_hybrid_cache_class_present and the two saving_utils.py pinned-string tests -- become positive assertions: on 5.x they confirm zoo's prod fix correctly identifies the missing anchor (HAS_HYBRID_CACHE is False / `_required_anchors` covers the missing strings) and the graceful fallback is wired up. Conftest also sets UNSLOTH_ALLOW_CPU=1 before the existing `import unsloth` trigger, which unblocks `test_unsloth_trainer_exec_marker` on CPU-only CI runners (the companion unsloth PR #5429 lets that import succeed without an accelerator). * tests: also gate test_MinistralAttention_forward_signature on 5.x Missed this one in 18c9f62. Same root cause as MinistralModel: transformers 5.x reflowed MinistralAttention.forward, zoo's ``patch_function(..., match_level='relaxed')`` falls back to a ``(self, *args, **kwargs)`` wrapper, the named-param probe sees the wrapper signature and drift-fails. Runtime call still works because the wrapper forwards via kwargs. Skip on 5.x, keep strict on 4.57.6. * tests: probe Ministral stash + skip relaxed-wrapper case Zoo's ministral.py:94-96 wraps the actual implementation in a generic ``def forward(self, *args, **kwargs): return _full_forward(...)`` adapter before calling patch_function -- this lets `check_args_kwargs` accept params transformers 5.x removed (cache_position) by routing them through **kwargs. The side effect: `inspect.signature( MinistralAttention.forward)` after patching shows the wrapper, not the real named params. This test used to pass only because the test runner couldn't fully trigger zoo's TEMPORARY_PATCHES loop (`unsloth/models/_utils.py:580` runs at import time but unsloth import was being silently aborted). With UNSLOTH_ALLOW_CPU=1 now making `import unsloth` succeed, the patch fires correctly and the live signature is the wrapper. Fix: probe the `_original_modeling_ministral_MinistralAttention_forward` stash that `patch_function(store_original=True)` writes, falling back to the live attribute. If the live attr is the relaxed wrapper and there's no stash, the named-param probe isn't meaningful -- the runtime kwargs contract is enforced elsewhere. Skip with a clear message. 4.57.6: stash is present and the named-param probe still runs strict; real drift in the upstream signature still surfaces. * tests: gate qwen2_5_vl image-processor drift on 5.x transformers 5.x dropped the slow image processors entirely (image_processing_qwen2_5_vl.py and image_processing_qwen2_5_vl_fast.py are both gone in v5.5.0). Qwen2.5-VL now reuses Qwen2VLImageProcessor directly. zoo's misc.py:1500-1506 patch site for Qwen2_5_VLImageProcessor is try/except ImportError-wrapped, so it silently no-ops on 5.x and the runtime shim still fires via the Qwen2VLImageProcessor patch at misc.py:1485-1498 (the class Qwen2.5-VL inherits at runtime). The Qwen2.5-VL image-processor path was previously hidden by ``pytest.importorskip("torchvision")`` -- PR #648 added torchvision to the upstream-regression matrix install, which unmasked the drift. Runtime is unaffected. Add the same 5.x skip pattern used by the other 16 gated detectors so the dead-code patch site doesn't block CI. 4.57.6 keeps the strict path-existence check.

danielhanchen · 2026-05-17T11:17:15Z

Thanks for the PR - appreciate it!

fix embedding matrix size mismatch bug

a0fbebd

CodeMan62 requested review from danielhanchen and mmathew23 as code owners May 14, 2026 17:51

chatgpt-codex-connector Bot reviewed May 14, 2026

View reviewed changes

gemini-code-assist Bot reviewed May 14, 2026

View reviewed changes

danielhanchen mentioned this pull request May 14, 2026

fix CI fallout from MLX subpackage refactor (#634) #646

Merged

danielhanchen merged commit fc983dd into unslothai:main May 17, 2026
8 of 11 checks passed

Brishen pushed a commit to Brishen/unsloth-zoo that referenced this pull request May 19, 2026

fix embedding matrix size mismatch bug (unslothai#645)

b19f492

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix embedding matrix size mismatch bug#645

fix embedding matrix size mismatch bug#645
danielhanchen merged 1 commit into
unslothai:mainfrom
CodeMan62:new-token-bug

CodeMan62 commented May 14, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 14, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 14, 2026

Uh oh!

Uh oh!

danielhanchen commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

CodeMan62 commented May 14, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 14, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

danielhanchen commented May 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants