Note: Please do not remove the questions. Answer beside them.
!uv pip install -U --force-reinstall "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo"
!uv pip install --no-deps xformers trl peft accelerate bitsandbytes tokenizers
!uv pip uninstall torch torchvision torchaudio torchcodec
!uv pip install -U \
"torch==2.11.0" \
"torchvision==0.26.0" \
"torchaudio==2.11.0" \
"torchcodec==0.11" \
--index-url https://download.pytorch.org/whl/cu128
!uv pip install -U git+https://github.com/huggingface/transformers.git
Using Python 3.12.13 environment at: /usr
Resolved 99 packages in 1.66s
Prepared 99 packages in 10.24s
Uninstalled 99 packages in 1.41s
Installed 99 packages in 167ms
~ accelerate==1.13.0
~ aiohappyeyeballs==2.6.1
~ aiohttp==3.13.5
~ aiosignal==1.4.0
~ annotated-doc==0.0.4
~ annotated-types==0.7.0
~ anyio==4.13.0
~ attrs==26.1.0
~ bitsandbytes==0.49.2
~ certifi==2026.4.22
~ charset-normalizer==3.4.7
~ click==8.3.3
- cuda-bindings==12.9.6
+ cuda-bindings==12.9.4
~ cuda-pathfinder==1.5.3
~ cut-cross-entropy==25.1.1
~ datasets==4.3.0
~ dill==0.4.0
~ docstring-parser==0.18.0
~ filelock==3.29.0
~ frozenlist==1.8.0
- fsspec==2026.3.0
+ fsspec==2025.9.0
~ h11==0.16.0
~ hf-transfer==0.1.9
~ hf-xet==1.4.3
~ httpcore==1.0.9
~ httpx==0.28.1
~ huggingface-hub==1.11.0
~ idna==3.13
~ jinja2==3.1.6
~ joblib==1.5.3
~ markdown-it-py==4.0.0
~ markupsafe==3.0.3
~ mdurl==0.1.2
~ mpmath==1.3.0
~ msgspec==0.21.1
~ multidict==6.7.1
~ multiprocess==0.70.16
~ nest-asyncio==1.6.0
~ networkx==3.6.1
~ numpy==2.4.4
~ nvidia-cublas-cu12==12.8.4.1
~ nvidia-cuda-cupti-cu12==12.8.90
~ nvidia-cuda-nvrtc-cu12==12.8.93
~ nvidia-cuda-runtime-cu12==12.8.90
- nvidia-cudnn-cu12==9.19.0.56
+ nvidia-cudnn-cu12==9.10.2.21
~ nvidia-cufft-cu12==11.3.3.83
~ nvidia-cufile-cu12==1.13.1.3
~ nvidia-curand-cu12==10.3.9.90
~ nvidia-cusolver-cu12==11.7.3.90
~ nvidia-cusparse-cu12==12.5.8.93
~ nvidia-cusparselt-cu12==0.7.1
- nvidia-nccl-cu12==2.28.9
+ nvidia-nccl-cu12==2.27.5
~ nvidia-nvjitlink-cu12==12.8.93
~ nvidia-nvshmem-cu12==3.4.5
~ nvidia-nvtx-cu12==12.8.90
~ packaging==26.1
~ pandas==3.0.2
~ peft==0.19.1
~ pillow==12.2.0
~ propcache==0.4.1
~ protobuf==7.34.1
~ psutil==7.2.2
~ pyarrow==24.0.0
~ pydantic==2.13.3
~ pydantic-core==2.46.3
~ pygments==2.20.0
~ python-dateutil==2.9.0.post0
~ pyyaml==6.0.3
~ regex==2026.4.4
~ requests==2.33.1
~ rich==15.0.0
~ safetensors==0.7.0
~ scikit-learn==1.8.0
~ scipy==1.17.1
~ sentence-transformers==5.4.1
~ sentencepiece==0.2.1
- setuptools==81.0.0
+ setuptools==82.0.1
~ shellingham==1.5.4
~ six==1.17.0
~ sympy==1.14.0
~ threadpoolctl==3.6.0
~ tokenizers==0.22.2
- torch==2.11.0+cu128
+ torch==2.10.0
~ torchao==0.17.0
~ tqdm==4.67.3
- transformers==5.7.0.dev0 (from git+https://github.com/huggingface/transformers.git@5cf79514dcc6231f5a53c74def7d6847c5aea78c)
+ transformers==5.5.0
~ triton==3.6.0
~ trl==0.24.0
~ typeguard==4.5.1
~ typer==0.24.2
~ typing-extensions==4.15.0
~ typing-inspection==0.4.2
~ tyro==1.0.13
- unsloth==2026.4.8
+ unsloth==2026.4.8 (from git+https://github.com/unslothai/unsloth.git@5c473fab80e079bb525345b86cb71afd409262c3)
- unsloth-zoo==2026.4.9
+ unsloth-zoo==2026.4.9 (from git+https://github.com/unslothai/unsloth-zoo@46d587c46e21ef9382449209e44d1f2b535fea73)
~ urllib3==2.6.3
~ wheel==0.47.0
~ xxhash==3.6.0
~ yarl==1.23.0
Using Python 3.12.13 environment at: /usr
Checked 6 packages in 50ms
Using Python 3.12.13 environment at: /usr
Uninstalled 4 packages in 139ms
- torch==2.10.0
- torchaudio==2.11.0+cu128
- torchcodec==0.11.0+cu128
- torchvision==0.26.0+cu128
Using Python 3.12.13 environment at: /usr
Resolved 34 packages in 1.05s
Prepared 8 packages in 0.30ms
Uninstalled 4 packages in 5ms
Installed 8 packages in 129ms
- fsspec==2025.9.0
+ fsspec==2026.2.0
- nvidia-cudnn-cu12==9.10.2.21
+ nvidia-cudnn-cu12==9.19.0.56
- nvidia-nccl-cu12==2.27.5
+ nvidia-nccl-cu12==2.28.9
- setuptools==82.0.1
+ setuptools==70.2.0
+ torch==2.11.0+cu128
+ torchaudio==2.11.0+cu128
+ torchcodec==0.11.0+cu128
+ torchvision==0.26.0+cu128
Using Python 3.12.13 environment at: /usr
Resolved 27 packages in 1.00s
Prepared 2 packages in 4ms
Uninstalled 2 packages in 35ms
Installed 2 packages in 33ms
- fsspec==2026.2.0
+ fsspec==2026.3.0
- transformers==5.5.0
+ transformers==5.7.0.dev0 (from git+https://github.com/huggingface/transformers.git@5cf79514dcc6231f5a53c74def7d6847c5aea78c)
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))== Unsloth 2026.4.8: Fast Qwen3_5_MoE patching. Transformers: 5.6.2.
\\ /| NVIDIA RTX PRO 6000 Blackwell Server Edition. Num GPUs = 1. Max memory: 94.971 GB. Platform: Linux.
O^O/ \_/ \ Torch: 2.11.0+cu128. CUDA: 12.0. CUDA Toolkit: 12.8. Triton: 3.6.0
\ / Bfloat16 = TRUE. FA [Xformers = 0.0.35. FA2 = True]
"-____-" Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Loading weights: 100% 693/693 [00:03<00:00, 170.27it/s]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
[/tmp/ipykernel_12327/3466095752.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in <cell line: 0>()
20 max_seq_length = 8192
21
---> 22 model, tokenizer = FastLanguageModel.from_pretrained(
23 "/content/drive/MyDrive/outputs5/checkpoint-2200",
24 max_seq_length = max_seq_length,
6 frames[/usr/local/lib/python3.12/dist-packages/unsloth/models/loader.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, load_in_8bit, load_in_16bit, full_finetuning, token, device_map, rope_scaling, fix_tokenizer, trust_remote_code, use_gradient_checkpointing, resize_model_vocab, revision, use_exact_model_name, offload_embedding, float32_mixed_precision, fast_inference, gpu_memory_utilization, float8_kv_cache, random_state, max_lora_rank, disable_log_stats, qat_scheme, load_in_fp8, unsloth_tiled_mlp, *args, **kwargs)
660 # dispatch_model = FastGraniteModel
661 else:
--> 662 return FastModel.from_pretrained(
663 model_name = old_model_name,
664 max_seq_length = max_seq_length,
[/usr/local/lib/python3.12/dist-packages/unsloth/models/loader.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in from_pretrained(model_name, max_seq_length, dtype, load_in_4bit, load_in_8bit, load_in_16bit, full_finetuning, token, device_map, rope_scaling, fix_tokenizer, trust_remote_code, use_gradient_checkpointing, resize_model_vocab, revision, return_logits, fullgraph, use_exact_model_name, auto_model, whisper_language, whisper_task, unsloth_force_compile, offload_embedding, float32_mixed_precision, fast_inference, gpu_memory_utilization, float8_kv_cache, random_state, max_lora_rank, disable_log_stats, qat_scheme, load_in_fp8, unsloth_tiled_mlp, target_parameters, *args, **kwargs)
1598
1599 try:
-> 1600 model = PeftModel.from_pretrained(
1601 model,
1602 old_model_name,
[/usr/local/lib/python3.12/dist-packages/peft/peft_model.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in from_pretrained(cls, model, model_id, adapter_name, is_trainable, config, autocast_adapter_dtype, ephemeral_gpu_offload, low_cpu_mem_usage, key_mapping, **kwargs)
580 )
581
--> 582 load_result = model.load_adapter(
583 model_id,
584 adapter_name,
[/usr/local/lib/python3.12/dist-packages/peft/peft_model.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in load_adapter(self, model_id, adapter_name, is_trainable, torch_device, autocast_adapter_dtype, ephemeral_gpu_offload, low_cpu_mem_usage, key_mapping, **kwargs)
1406 # load the weights into the model
1407 ignore_mismatched_sizes = kwargs.get("ignore_mismatched_sizes", False)
-> 1408 load_result = set_peft_model_state_dict(
1409 self,
1410 adapters_weights,
[/usr/local/lib/python3.12/dist-packages/peft/utils/save_and_load.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in set_peft_model_state_dict(model, peft_model_state_dict, adapter_name, ignore_mismatched_sizes, low_cpu_mem_usage)
642 from peft.utils.transformers_weight_conversion import convert_peft_adapter_state_dict_for_transformers
643
--> 644 state_dict = convert_peft_adapter_state_dict_for_transformers(
645 model=model,
646 peft_config=config,
[/usr/local/lib/python3.12/dist-packages/peft/utils/transformers_weight_conversion.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in convert_peft_adapter_state_dict_for_transformers(model, peft_config, adapter_state_dict, adapter_name)
507
508 weight_conversions = get_model_conversion_mapping(model)
--> 509 peft_weight_conversions = build_peft_weight_mapping(
510 weight_conversions, adapter_name=adapter_name, peft_config=peft_config
511 )
[/usr/local/lib/python3.12/dist-packages/peft/utils/transformers_weight_conversion.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in build_peft_weight_mapping(weight_conversions, adapter_name, peft_config)
266
267 # Instantiate a new object that correctly post process patterns if needed
--> 268 new_conversion = orig_conversion.__class__(
269 source_patterns=new_source_patterns,
270 target_patterns=new_target_patterns,
TypeError: WeightConverter.__init__() got an unexpected keyword argument 'distributed_operation'
I also tried installing the latest non-dev versions of transformers and unsloth, which produced the same error.
[transformers] The tokenizer has new PAD/BOS/EOS tokens that differ from the model config and generation config. The model config and generation config were aligned accordingly, being updated with the tokenizer's values. Updated tokens: {'bos_token_id': None}.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
[/tmp/ipykernel_5363/2561444859.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in <cell line: 0>()
1 # Compilation can take 2-3 minutes of time, so please be patient :)
----> 2 trainer.train(resume_from_checkpoint="drive/MyDrive/outputs5/checkpoint-2200")
3 # trainer.train()
6 frames[/content/unsloth_compiled_cache/UnslothSFTTrainer.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in wrapper(self, *args, **kwargs)
82 if hasattr(self, 'model') and hasattr(self.model, "for_training"):
83 self.model.for_training(use_gradient_checkpointing=use_gc)
---> 84 output = f(self, *args, **kwargs)
85 # Restore previous mode when possible
86 if hasattr(self, 'model') and hasattr(self.model, "for_inference"):
[/usr/local/lib/python3.12/dist-packages/transformers/trainer.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval)
1413 # Deepspeed/FSDP models are loaded after prepare in _prepare_for_training.
1414 if not is_sagemaker_mp_enabled() and not self.is_deepspeed_enabled and not self.is_fsdp_enabled:
-> 1415 self._load_from_checkpoint(resume_from_checkpoint)
1416 state = TrainerState.load_from_json(os.path.join(resume_from_checkpoint, TRAINER_STATE_NAME))
1417 if state.train_batch_size is not None and args.auto_find_batch_size:
[/usr/local/lib/python3.12/dist-packages/transformers/trainer.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in _load_from_checkpoint(self, resume_from_checkpoint, model)
3398 model.set_adapter(active_adapter)
3399 else:
-> 3400 model.load_adapter(resume_from_checkpoint, active_adapter, is_trainable=True)
3401 else:
3402 logger.warning(
[/usr/local/lib/python3.12/dist-packages/peft/peft_model.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in load_adapter(self, model_id, adapter_name, is_trainable, torch_device, autocast_adapter_dtype, ephemeral_gpu_offload, low_cpu_mem_usage, key_mapping, **kwargs)
1406 # load the weights into the model
1407 ignore_mismatched_sizes = kwargs.get("ignore_mismatched_sizes", False)
-> 1408 load_result = set_peft_model_state_dict(
1409 self,
1410 adapters_weights,
[/usr/local/lib/python3.12/dist-packages/peft/utils/save_and_load.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in set_peft_model_state_dict(model, peft_model_state_dict, adapter_name, ignore_mismatched_sizes, low_cpu_mem_usage)
642 from peft.utils.transformers_weight_conversion import convert_peft_adapter_state_dict_for_transformers
643
--> 644 state_dict = convert_peft_adapter_state_dict_for_transformers(
645 model=model,
646 peft_config=config,
[/usr/local/lib/python3.12/dist-packages/peft/utils/transformers_weight_conversion.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in convert_peft_adapter_state_dict_for_transformers(model, peft_config, adapter_state_dict, adapter_name)
507
508 weight_conversions = get_model_conversion_mapping(model)
--> 509 peft_weight_conversions = build_peft_weight_mapping(
510 weight_conversions, adapter_name=adapter_name, peft_config=peft_config
511 )
[/usr/local/lib/python3.12/dist-packages/peft/utils/transformers_weight_conversion.py](https://colab.research.google.com/drive/1XBWafB-2pHhM7cjGUMjlii_UNM9RhRfg#) in build_peft_weight_mapping(weight_conversions, adapter_name, peft_config)
266
267 # Instantiate a new object that correctly post process patterns if needed
--> 268 new_conversion = orig_conversion.__class__(
269 source_patterns=new_source_patterns,
270 target_patterns=new_target_patterns,
TypeError: WeightConverter.__init__() got an unexpected keyword argument 'distributed_operation'
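Both tracebacks end in the same place: `build_peft_weight_mapping` in `peft` passes a `distributed_operation` keyword that the `WeightConverter` class seen at runtime does not accept. This may point to a mismatch between the installed `peft` and `transformers` builds (the Unsloth banner above reports Transformers 5.6.2, while the install log ends on 5.7.0.dev0). A minimal diagnostic sketch, not a fix: it prints the versions the runtime actually sees and checks whether `WeightConverter.__init__` accepts that keyword. The import path `transformers.core_model_loading` is an assumption and may differ between builds; `installed_versions` and `accepts_kwarg` are hypothetical helper names.

```python
import inspect
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def accepts_kwarg(cls, kwarg):
    """True if cls.__init__ takes `kwarg` by name or swallows it via **kwargs."""
    params = inspect.signature(cls.__init__).parameters
    if kwarg in params:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def main():
    print(installed_versions(
        ["unsloth", "unsloth_zoo", "peft", "transformers", "trl", "torch"]
    ))
    try:
        import transformers
        # Assumption: WeightConverter lives at this path in the installed build.
        from transformers.core_model_loading import WeightConverter
    except ImportError as exc:
        print(f"Could not import WeightConverter: {exc}")
        return
    print("transformers imported at runtime:", transformers.__version__)
    print(
        "WeightConverter accepts distributed_operation:",
        accepts_kwarg(WeightConverter, "distributed_operation"),
    )


if __name__ == "__main__":
    main()
```

If the last line prints `False`, the `peft` build is passing an argument that the imported `transformers` does not know about, so the two packages would need to be aligned. Run it in a fresh runtime after the install cell so that stale modules are not picked up.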
1. `pip install --upgrade unsloth unsloth_zoo` => Yes, `uv pip install -U "unsloth[colab-new] @ git+https://github.com/unslothai/unsloth.git" "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo"`
2. `Colab` or `Kaggle` or local / cloud => Colab
3. `nvidia-smi` => 1x GPU (G4)
4. `unsloth==2026.4.8`, `accelerate==1.13.0`, `peft==0.19.1`, `trl==0.24.0`, `transformers==5.7.0.dev0`
5. `SFTTrainer`, `GRPOTrainer` etc => `UnslothTrainer`

MODEL: 0xSero/Qwen3.6-28B-REAP
It's a REAP version of the Qwen3.6-35B-A3B model.
Installation setup:
Installation output:
Resuming by loading the checkpoint directly:
Resuming through the trainer:
got: