[Bug] Unable to resume training from existing checkpoint [QWEN3.6] #5165

@imezx

Note: Please do not remove the questions. Answer beside them.

  1. Did you update? pip install --upgrade unsloth unsloth_zoo => Yes, uv pip install -U "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo"
  2. Colab or Kaggle or local / cloud => Colab
  3. Number GPUs used, use nvidia-smi => 1x GPU (G4)
  4. Which notebook? Please link! => private
  5. Which Unsloth version, TRL version, transformers version, PyTorch version? => unsloth==2026.4.8, accelerate==1.13.0, peft==0.19.1, trl==0.24.0, transformers==5.7.0.dev0
  6. Which trainer? SFTTrainer, GRPOTrainer etc => UnslothTrainer
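
The Unsloth banner in the log below reports `Transformers: 5.6.2`, while the install output ends with `transformers==5.7.0.dev0`, so it may be worth confirming which versions are actually installed. A minimal sketch using only the standard library; the helper name `report_versions` is hypothetical and not part of any of these packages:

```python
from importlib import metadata


def report_versions(packages):
    """Return {package: installed version or None} without importing the packages."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    names = ["unsloth", "unsloth_zoo", "transformers", "peft", "trl", "accelerate", "torch"]
    for name, version in report_versions(names).items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```

This reads the installed distribution metadata only. If it disagrees with `transformers.__version__` printed from the running kernel, the runtime may still hold modules imported before the reinstall, and restarting it would be a reasonable next step.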

MODEL: 0xSero/Qwen3.6-28B-REAP
It's a REAP version of the Qwen3.6-35B-A3B model.

Installation setup:

!uv pip install -U --force-reinstall "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo"
!uv pip install --no-deps xformers trl peft accelerate bitsandbytes tokenizers
!uv pip uninstall torch torchvision torchaudio torchcodec
!uv pip install -U \
  "torch==2.11.0" \
  "torchvision==0.26.0" \
  "torchaudio==2.11.0" \
  "torchcodec==0.11" \
  --index-url https://download.pytorch.org/whl/cu128
!uv pip install -U git+https://github.com/huggingface/transformers.git

Installation output:

Using Python 3.12.13 environment at: /usr
Resolved 99 packages in 1.66s
Prepared 99 packages in 10.24s
Uninstalled 99 packages in 1.41s
Installed 99 packages in 167ms
 ~ accelerate==1.13.0
 ~ aiohappyeyeballs==2.6.1
 ~ aiohttp==3.13.5
 ~ aiosignal==1.4.0
 ~ annotated-doc==0.0.4
 ~ annotated-types==0.7.0
 ~ anyio==4.13.0
 ~ attrs==26.1.0
 ~ bitsandbytes==0.49.2
 ~ certifi==2026.4.22
 ~ charset-normalizer==3.4.7
 ~ click==8.3.3
 - cuda-bindings==12.9.6
 + cuda-bindings==12.9.4
 ~ cuda-pathfinder==1.5.3
 ~ cut-cross-entropy==25.1.1
 ~ datasets==4.3.0
 ~ dill==0.4.0
 ~ docstring-parser==0.18.0
 ~ filelock==3.29.0
 ~ frozenlist==1.8.0
 - fsspec==2026.3.0
 + fsspec==2025.9.0
 ~ h11==0.16.0
 ~ hf-transfer==0.1.9
 ~ hf-xet==1.4.3
 ~ httpcore==1.0.9
 ~ httpx==0.28.1
 ~ huggingface-hub==1.11.0
 ~ idna==3.13
 ~ jinja2==3.1.6
 ~ joblib==1.5.3
 ~ markdown-it-py==4.0.0
 ~ markupsafe==3.0.3
 ~ mdurl==0.1.2
 ~ mpmath==1.3.0
 ~ msgspec==0.21.1
 ~ multidict==6.7.1
 ~ multiprocess==0.70.16
 ~ nest-asyncio==1.6.0
 ~ networkx==3.6.1
 ~ numpy==2.4.4
 ~ nvidia-cublas-cu12==12.8.4.1
 ~ nvidia-cuda-cupti-cu12==12.8.90
 ~ nvidia-cuda-nvrtc-cu12==12.8.93
 ~ nvidia-cuda-runtime-cu12==12.8.90
 - nvidia-cudnn-cu12==9.19.0.56
 + nvidia-cudnn-cu12==9.10.2.21
 ~ nvidia-cufft-cu12==11.3.3.83
 ~ nvidia-cufile-cu12==1.13.1.3
 ~ nvidia-curand-cu12==10.3.9.90
 ~ nvidia-cusolver-cu12==11.7.3.90
 ~ nvidia-cusparse-cu12==12.5.8.93
 ~ nvidia-cusparselt-cu12==0.7.1
 - nvidia-nccl-cu12==2.28.9
 + nvidia-nccl-cu12==2.27.5
 ~ nvidia-nvjitlink-cu12==12.8.93
 ~ nvidia-nvshmem-cu12==3.4.5
 ~ nvidia-nvtx-cu12==12.8.90
 ~ packaging==26.1
 ~ pandas==3.0.2
 ~ peft==0.19.1
 ~ pillow==12.2.0
 ~ propcache==0.4.1
 ~ protobuf==7.34.1
 ~ psutil==7.2.2
 ~ pyarrow==24.0.0
 ~ pydantic==2.13.3
 ~ pydantic-core==2.46.3
 ~ pygments==2.20.0
 ~ python-dateutil==2.9.0.post0
 ~ pyyaml==6.0.3
 ~ regex==2026.4.4
 ~ requests==2.33.1
 ~ rich==15.0.0
 ~ safetensors==0.7.0
 ~ scikit-learn==1.8.0
 ~ scipy==1.17.1
 ~ sentence-transformers==5.4.1
 ~ sentencepiece==0.2.1
 - setuptools==81.0.0
 + setuptools==82.0.1
 ~ shellingham==1.5.4
 ~ six==1.17.0
 ~ sympy==1.14.0
 ~ threadpoolctl==3.6.0
 ~ tokenizers==0.22.2
 - torch==2.11.0+cu128
 + torch==2.10.0
 ~ torchao==0.17.0
 ~ tqdm==4.67.3
 - transformers==5.7.0.dev0 (from git+https://github.com/huggingface/transformers.git@5cf79514dcc6231f5a53c74def7d6847c5aea78c)
 + transformers==5.5.0
 ~ triton==3.6.0
 ~ trl==0.24.0
 ~ typeguard==4.5.1
 ~ typer==0.24.2
 ~ typing-extensions==4.15.0
 ~ typing-inspection==0.4.2
 ~ tyro==1.0.13
 - unsloth==2026.4.8
 + unsloth==2026.4.8 (from git+https://github.com/unslothai/unsloth.git@5c473fab80e079bb525345b86cb71afd409262c3)
 - unsloth-zoo==2026.4.9
 + unsloth-zoo==2026.4.9 (from git+https://github.com/unslothai/unsloth-zoo@46d587c46e21ef9382449209e44d1f2b535fea73)
 ~ urllib3==2.6.3
 ~ wheel==0.47.0
 ~ xxhash==3.6.0
 ~ yarl==1.23.0
Using Python 3.12.13 environment at: /usr
Checked 6 packages in 50ms
Using Python 3.12.13 environment at: /usr
Uninstalled 4 packages in 139ms
 - torch==2.10.0
 - torchaudio==2.11.0+cu128
 - torchcodec==0.11.0+cu128
 - torchvision==0.26.0+cu128
Using Python 3.12.13 environment at: /usr
Resolved 34 packages in 1.05s
Prepared 8 packages in 0.30ms
Uninstalled 4 packages in 5ms
Installed 8 packages in 129ms
 - fsspec==2025.9.0
 + fsspec==2026.2.0
 - nvidia-cudnn-cu12==9.10.2.21
 + nvidia-cudnn-cu12==9.19.0.56
 - nvidia-nccl-cu12==2.27.5
 + nvidia-nccl-cu12==2.28.9
 - setuptools==82.0.1
 + setuptools==70.2.0
 + torch==2.11.0+cu128
 + torchaudio==2.11.0+cu128
 + torchcodec==0.11.0+cu128
 + torchvision==0.26.0+cu128
Using Python 3.12.13 environment at: /usr
Resolved 27 packages in 1.00s
Prepared 2 packages in 4ms
Uninstalled 2 packages in 35ms
Installed 2 packages in 33ms
 - fsspec==2026.2.0
 + fsspec==2026.3.0
 - transformers==5.5.0
 + transformers==5.7.0.dev0 (from git+https://github.com/huggingface/transformers.git@5cf79514dcc6231f5a53c74def7d6847c5aea78c)

Resuming by loading the checkpoint directly:

import gc
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
os.environ["UNSLOTH_MOE_BACKEND"] = "grouped_mm"
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
os.environ["UNSLOTH_CE_LOSS_N_CHUNKS"] = "8"
os.environ["UNSLOTH_CE_LOSS_TARGET_GB"] = "1.5"

from unsloth import FastLanguageModel, UnslothTrainer, UnslothTrainingArguments
from unsloth.chat_templates import get_chat_template, train_on_responses_only
import torch
from trl import SFTTrainer, SFTConfig
from transformers import TrainingArguments

gc.collect()
max_seq_length = 8192

model, tokenizer = FastLanguageModel.from_pretrained(
    "/content/drive/MyDrive/outputs5/checkpoint-2200",
    max_seq_length = max_seq_length,
    load_in_4bit = False,
    load_in_8bit = False,
    load_in_16bit = True,
    fast_inference = False,
    full_finetuning = False,
)
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2026.4.8: Fast Qwen3_5_MoE patching. Transformers: 5.6.2.
   \\   /|    NVIDIA RTX PRO 6000 Blackwell Server Edition. Num GPUs = 1. Max memory: 94.971 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.11.0+cu128. CUDA: 12.0. CUDA Toolkit: 12.8. Triton: 3.6.0
\        /    Bfloat16 = TRUE. FA [Xformers = 0.0.35. FA2 = True]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Loading weights: 100% 693/693 [00:03<00:00, 170.27it/s]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
[/tmp/ipykernel_12327/3466095752.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in <cell line: 0>()
     20 max_seq_length = 8192
     21 
---> 22 model, tokenizer = FastLanguageModel.from_pretrained(
     23     "/content/drive/MyDrive/outputs5/checkpoint-2200",
     24     max_seq_length = max_seq_length,

6 frames
[/usr/local/lib/python3.12/dist-packages/unsloth/models/loader.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, load_in_8bit, load_in_16bit, full_finetuning, token, device_map, rope_scaling, fix_tokenizer, trust_remote_code, use_gradient_checkpointing, resize_model_vocab, revision, use_exact_model_name, offload_embedding, float32_mixed_precision, fast_inference, gpu_memory_utilization, float8_kv_cache, random_state, max_lora_rank, disable_log_stats, qat_scheme, load_in_fp8, unsloth_tiled_mlp, *args, **kwargs)
    660         #     dispatch_model = FastGraniteModel
    661         else:
--> 662             return FastModel.from_pretrained(
    663                 model_name = old_model_name,
    664                 max_seq_length = max_seq_length,

[/usr/local/lib/python3.12/dist-packages/unsloth/models/loader.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, load_in_8bit, load_in_16bit, full_finetuning, token, device_map, rope_scaling, fix_tokenizer, trust_remote_code, use_gradient_checkpointing, resize_model_vocab, revision, return_logits, fullgraph, use_exact_model_name, auto_model, whisper_language, whisper_task, unsloth_force_compile, offload_embedding, float32_mixed_precision, fast_inference, gpu_memory_utilization, float8_kv_cache, random_state, max_lora_rank, disable_log_stats, qat_scheme, load_in_fp8, unsloth_tiled_mlp, target_parameters, *args, **kwargs)
   1598 
   1599             try:
-> 1600                 model = PeftModel.from_pretrained(
   1601                     model,
   1602                     old_model_name,

[/usr/local/lib/python3.12/dist-packages/peft/peft_model.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in from_pretrained(cls, model, model_id, adapter_name, is_trainable, config, autocast_adapter_dtype, ephemeral_gpu_offload, low_cpu_mem_usage, key_mapping, **kwargs)
    580             )
    581 
--> 582         load_result = model.load_adapter(
    583             model_id,
    584             adapter_name,

[/usr/local/lib/python3.12/dist-packages/peft/peft_model.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in load_adapter(self, model_id, adapter_name, is_trainable, torch_device, autocast_adapter_dtype, ephemeral_gpu_offload, low_cpu_mem_usage, key_mapping, **kwargs)
   1406         # load the weights into the model
   1407         ignore_mismatched_sizes = kwargs.get("ignore_mismatched_sizes", False)
-> 1408         load_result = set_peft_model_state_dict(
   1409             self,
   1410             adapters_weights,

[/usr/local/lib/python3.12/dist-packages/peft/utils/save_and_load.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in set_peft_model_state_dict(model, peft_model_state_dict, adapter_name, ignore_mismatched_sizes, low_cpu_mem_usage)
    642         from peft.utils.transformers_weight_conversion import convert_peft_adapter_state_dict_for_transformers
    643 
--> 644         state_dict = convert_peft_adapter_state_dict_for_transformers(
    645             model=model,
    646             peft_config=config,

[/usr/local/lib/python3.12/dist-packages/peft/utils/transformers_weight_conversion.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in convert_peft_adapter_state_dict_for_transformers(model, peft_config, adapter_state_dict, adapter_name)
    507 
    508     weight_conversions = get_model_conversion_mapping(model)
--> 509     peft_weight_conversions = build_peft_weight_mapping(
    510         weight_conversions, adapter_name=adapter_name, peft_config=peft_config
    511     )

[/usr/local/lib/python3.12/dist-packages/peft/utils/transformers_weight_conversion.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in build_peft_weight_mapping(weight_conversions, adapter_name, peft_config)
    266 
    267                 # Instantiate a new object that correctly post process patterns if needed
--> 268                 new_conversion = orig_conversion.__class__(
    269                     source_patterns=new_source_patterns,
    270                     target_patterns=new_target_patterns,

TypeError: WeightConverter.__init__() got an unexpected keyword argument 'distributed_operation'
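
The traceback shows PEFT rebuilding a converter via `orig_conversion.__class__(...)` and passing a `distributed_operation` keyword that the installed `WeightConverter.__init__` does not accept, which suggests the installed PEFT and transformers disagree about that constructor. The sketch below shows one way to check whether a class accepts a given keyword. `accepts_keyword` and the two toy classes are hypothetical stand-ins for illustration, not the real transformers classes:

```python
import inspect


def accepts_keyword(cls, keyword):
    """Return True if cls.__init__ can be called with `keyword` as a keyword argument."""
    params = inspect.signature(cls.__init__).parameters
    named = keyword in params and params[keyword].kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    # A **kwargs catch-all also accepts arbitrary keywords.
    catch_all = any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())
    return named or catch_all


class ConverterWithField:
    """Toy class (assumption): constructor that knows `distributed_operation`."""

    def __init__(self, source_patterns, target_patterns, distributed_operation=None):
        self.source_patterns = source_patterns
        self.target_patterns = target_patterns
        self.distributed_operation = distributed_operation


class ConverterWithoutField:
    """Toy class (assumption): constructor that does not know `distributed_operation`."""

    def __init__(self, source_patterns, target_patterns):
        self.source_patterns = source_patterns
        self.target_patterns = target_patterns


if __name__ == "__main__":
    for cls in (ConverterWithField, ConverterWithoutField):
        print(cls.__name__, accepts_keyword(cls, "distributed_operation"))

    # Same failure shape as in the traceback: an unknown keyword raises TypeError.
    try:
        ConverterWithoutField(
            source_patterns=["a"], target_patterns=["b"], distributed_operation=None
        )
    except TypeError as exc:
        print("TypeError:", exc)
```

In the affected environment the same check could be pointed at the real class with `accepts_keyword(WeightConverter, "distributed_operation")`. The import location of `WeightConverter` is not shown in the traceback, so that part is left open here.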

I also tried installing the latest non-dev releases of transformers and unsloth, and got the same error.

Resuming through the trainer:

trainer.train(resume_from_checkpoint="drive/MyDrive/outputs5/checkpoint-2200")

got:

[transformers] The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None}.
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
[/tmp/ipykernel_5363/2561444859.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in <cell line: 0>()
      1 # Compilation can take 2-3 minutes of time, so please be patient :)
----> 2 trainer.train(resume_from_checkpoint="drive/MyDrive/outputs5/checkpoint-2200")
      3 # trainer.train()

6 frames
[/content/unsloth_compiled_cache/UnslothSFTTrainer.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in wrapper(self, *args, **kwargs)
     82         if hasattr(self, 'model') and hasattr(self.model, "for_training"):
     83             self.model.for_training(use_gradient_checkpointing=use_gc)
---> 84         output = f(self, *args, **kwargs)
     85         # Restore previous mode when possible
     86         if hasattr(self, 'model') and hasattr(self.model, "for_inference"):

[/usr/local/lib/python3.12/dist-packages/transformers/trainer.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval)
   1413             # Deepspeed/FSDP models are loaded after prepare in _prepare_for_training.
   1414             if not is_sagemaker_mp_enabled() and not self.is_deepspeed_enabled and not self.is_fsdp_enabled:
-> 1415                 self._load_from_checkpoint(resume_from_checkpoint)
   1416             state = TrainerState.load_from_json(os.path.join(resume_from_checkpoint, TRAINER_STATE_NAME))
   1417             if state.train_batch_size is not None and args.auto_find_batch_size:

[/usr/local/lib/python3.12/dist-packages/transformers/trainer.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in _load_from_checkpoint(self, resume_from_checkpoint, model)
   3398                         model.set_adapter(active_adapter)
   3399                     else:
-> 3400                         model.load_adapter(resume_from_checkpoint, active_adapter, is_trainable=True)
   3401                 else:
   3402                     logger.warning(

[/usr/local/lib/python3.12/dist-packages/peft/peft_model.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in load_adapter(self, model_id, adapter_name, is_trainable, torch_device, autocast_adapter_dtype, ephemeral_gpu_offload, low_cpu_mem_usage, key_mapping, **kwargs)
   1406         # load the weights into the model
   1407         ignore_mismatched_sizes = kwargs.get("ignore_mismatched_sizes", False)
-> 1408         load_result = set_peft_model_state_dict(
   1409             self,
   1410             adapters_weights,

[/usr/local/lib/python3.12/dist-packages/peft/utils/save_and_load.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in set_peft_model_state_dict(model, peft_model_state_dict, adapter_name, ignore_mismatched_sizes, low_cpu_mem_usage)
    642         from peft.utils.transformers_weight_conversion import convert_peft_adapter_state_dict_for_transformers
    643 
--> 644         state_dict = convert_peft_adapter_state_dict_for_transformers(
    645             model=model,
    646             peft_config=config,

[/usr/local/lib/python3.12/dist-packages/peft/utils/transformers_weight_conversion.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in convert_peft_adapter_state_dict_for_transformers(model, peft_config, adapter_state_dict, adapter_name)
    507 
    508     weight_conversions = get_model_conversion_mapping(model)
--> 509     peft_weight_conversions = build_peft_weight_mapping(
    510         weight_conversions, adapter_name=adapter_name, peft_config=peft_config
    511     )

[/usr/local/lib/python3.12/dist-packages/peft/utils/transformers_weight_conversion.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in build_peft_weight_mapping(weight_conversions, adapter_name, peft_config)
    266 
    267                 # Instantiate a new object that correctly post process patterns if needed
--> 268                 new_conversion = orig_conversion.__class__(
    269                     source_patterns=new_source_patterns,
    270                     target_patterns=new_target_patterns,

TypeError: WeightConverter.__init__() got an unexpected keyword argument 'distributed_operation'
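
Both tracebacks end in the same `build_peft_weight_mapping` call, so the failure appears to be independent of how the checkpoint is loaded. To rule out a path problem (the first attempt uses an absolute `/content/drive/...` path and the second a relative `drive/...` one), a small check like the following could confirm that the directory resolves and holds the usual files. The file names listed are the conventional PEFT and Trainer checkpoint names and are an assumption about what this particular checkpoint contains; `describe_checkpoint` is a hypothetical helper:

```python
from pathlib import Path

# Conventional file names (assumption): PEFT adapter files plus Trainer state files.
EXPECTED_FILES = (
    "adapter_config.json",
    "adapter_model.safetensors",
    "trainer_state.json",
    "optimizer.pt",
    "scheduler.pt",
)


def describe_checkpoint(checkpoint_dir):
    """Resolve the checkpoint path and report which expected files are present."""
    root = Path(checkpoint_dir).expanduser().resolve()
    return {
        "resolved_path": str(root),
        "exists": root.is_dir(),
        "files": {name: (root / name).is_file() for name in EXPECTED_FILES},
    }


if __name__ == "__main__":
    info = describe_checkpoint("drive/MyDrive/outputs5/checkpoint-2200")
    print(info["resolved_path"], "exists:", info["exists"])
    for name, present in info["files"].items():
        print(f"  {name}: {'found' if present else 'missing'}")
```

If both the absolute and the relative path resolve to the same directory and the adapter files are present, the path can probably be excluded as a cause, leaving the constructor mismatch shown in the traceback.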
