TypeError: argument of type 'NoneType' is not iterable when merging weights to 16bit and pushing to hub #666

@premsa

Hey guys, after fine-tuning successfully, I get the following error when trying to merge the weights to 16-bit and push them to the Hub:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/x/.venv/lib/python3.10/site-packages/unsloth/save.py", line 1211, in unsloth_push_to_hub_merged
    unsloth_save_model(**arguments)
  File "/x/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/x/.venv/lib/python3.10/site-packages/unsloth/save.py", line 686, in unsloth_save_model
    internal_model.save_pretrained(**save_pretrained_settings)
  File "/x/.venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2634, in save_pretrained
    model_card = create_and_tag_model_card(
  File "/x/projects/mistral-finetune/.venv/lib/python3.10/site-packages/transformers/utils/hub.py", line 1144, in create_and_tag_model_card
    if model_tag not in model_card.data.tags:
TypeError: argument of type 'NoneType' is not iterable

My script:

from unsloth import FastLanguageModel
import torch
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported


from utils import dataset  # project-local module (not shown) that provides the training dataset

max_seq_length = 1048

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/mistral-7b-instruct-v0.3", # Choose ANY! eg teknium/OpenHermes-2.5-Mistral-7B
    max_seq_length = max_seq_length,
    dtype = None,
    load_in_4bit = True,
    )

model = FastLanguageModel.get_peft_model(
    model,
    r = 16, # Choose any number > 0 ! Suggested 8, 16, 32, 64, 128
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj",],
    lora_alpha = 16,
    lora_dropout = 0, # Supports any, but = 0 is optimized
    bias = "none",    # Supports any, but = "none" is optimized
    # [NEW] "unsloth" uses 30% less VRAM, fits 2x larger batch sizes!
    use_gradient_checkpointing = "unsloth", # True or "unsloth" for very long context
    random_state = 239,
    use_rslora = False,  # We support rank stabilized LoRA
    loftq_config = None, # And LoftQ
    )


trainer = SFTTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = dataset,
    dataset_text_field = "text",
    max_seq_length = max_seq_length,
    dataset_num_proc = 2,
    packing = False, # Can make training 5x faster for short sequences.
    args = TrainingArguments(
        per_device_train_batch_size = 10,
        gradient_accumulation_steps = 4,
        warmup_steps = 5,
        #num_train_epochs = 1, 
        max_steps = 1, # Set num_train_epochs = 1 for full training runs
        learning_rate = 2e-4,
        fp16 = not is_bfloat16_supported(),
        bf16 = is_bfloat16_supported(),
        logging_steps = 1,
        optim = "adamw_8bit",
        weight_decay = 0.01,
        lr_scheduler_type = "linear",
        seed = 239,
        output_dir = "outputs",
    ),
)

trainer_stats = trainer.train()


# `token` is a Hugging Face access token defined elsewhere (not shown here)
model.push_to_hub_merged("user/this-is-my-project", tokenizer, save_method = "merged_16bit", token = token)

The above code creates the config files, but fails before the weights are stored.

When saving the adapter without merging, the script does not fail and stores the adapter weights.

model.push_to_hub("user/this-is-my-project", token = token) 
tokenizer.push_to_hub("user/this-is-my-project", token = token) 

My environment:

accelerate==0.31.0
aiohttp==3.9.5
aiosignal==1.3.1
async-timeout==4.0.3
attrs==23.2.0
bitsandbytes==0.43.1
certifi==2024.6.2
charset-normalizer==3.3.2
datasets==2.20.0
dill==0.3.7
docstring_parser==0.16
einops==0.8.0
filelock==3.15.1
flash-attn==2.5.9.post1
frozenlist==1.4.1
fsspec==2024.5.0
huggingface-hub==0.23.4
idna==3.7
Jinja2==3.1.4
markdown-it-py==3.0.0
MarkupSafe==2.1.5
mdurl==0.1.2
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.15
networkx==3.3
ninja==1.11.1.1
numpy==2.0.0
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.5.40
nvidia-nvtx-cu12==12.1.105
packaging==24.1
pandas==2.2.2
peft==0.11.1
protobuf==3.20.3
psutil==5.9.8
pyarrow==16.1.0
pyarrow-hotfix==0.6
Pygments==2.18.0
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML==6.0.1
regex==2024.5.15
requests==2.32.3
rich==13.7.1
safetensors==0.4.3
sentencepiece==0.2.0
shtab==1.7.1
six==1.16.0
sympy==1.12.1
tokenizers==0.19.1
torch==2.3.0
tqdm==4.66.4
transformers==4.41.2
triton==2.3.0
trl==0.8.6
typing_extensions==4.12.2
tyro==0.8.4
tzdata==2024.1
unsloth @ git+https://github.com/unslothai/unsloth.git@87703089fa0ad60f008b7a7990f5cf3e77ccd26e
urllib3==2.2.2
xformers==0.0.26.post1
xxhash==3.4.1
yarl==1.9.4
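
Since the full list is long, here is a small sketch that prints only the versions of the packages that show up in the traceback or are called directly by the script. The choice of packages in `PACKAGES` is my own guess at what is relevant, and `installed_versions` is a helper name made up for this example; it uses only the standard library.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: these are the packages most likely to matter for this error.
PACKAGES = ["unsloth", "transformers", "huggingface-hub", "peft", "trl", "torch"]


def installed_versions(names):
    """Return {package name: version string, or None if it is not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


versions = installed_versions(PACKAGES)
for name, ver in versions.items():
    print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

Running this in the same virtual environment as the training script should make it easier to compare setups across reports.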

Any ideas what could be going wrong?
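
For reference, the last frame of the traceback can be reproduced in isolation. The sketch below does not use transformers or huggingface_hub; `card` is a hypothetical stand-in for a model card whose metadata has `tags` set to `None`, which is my assumption about the state that triggers the error. `add_tag_unguarded` copies the membership test from the traceback, and `add_tag_guarded` shows the kind of `None` check that would avoid it.

```python
from types import SimpleNamespace

# Hypothetical stand-in for a model card whose metadata carries no tags yet.
card = SimpleNamespace(data=SimpleNamespace(tags=None))


def add_tag_unguarded(card, tag):
    # Same membership test as the failing line in the traceback.
    if tag not in card.data.tags:
        card.data.tags.append(tag)


def add_tag_guarded(card, tag):
    # Treat a missing tag list as empty before testing membership.
    if card.data.tags is None:
        card.data.tags = []
    if tag not in card.data.tags:
        card.data.tags.append(tag)


try:
    add_tag_unguarded(card, "unsloth")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print("unguarded:", error_message)

add_tag_guarded(card, "unsloth")
print("guarded:", card.data.tags)
```

If this is indeed what happens, it would also be consistent with the adapter-only push working, as long as that path does not run the same tagging step on a card without tags. I have not verified this.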
