/usr/local/lib/python3.10/dist-packages/unsloth/chat_templates.py in <listcomp>(.0)
1714 substring = _longest_common_substring([str(x + [0]) for x in all_input_ids])
1715 substring = substring.split(", ")[:-1]
-> 1716 substring = [int(x) for x in substring]
1717
1718 # Also get rest of tokenized string
ValueError: invalid literal for int() with base 10: ''
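A minimal sketch of how the parse on line 1716 can raise this error. It assumes (not confirmed from the Unsloth source) that the common substring happens to begin with the ", " separator. The substring value and the parse_ids helper are hypothetical, for illustration only.

```python
# Hypothetical value: a common substring that starts with the ", " separator.
# The real value returned by _longest_common_substring is not known here.
substring = ", 29901, 13, "

# Same split as line 1715 of the traceback: the last piece is dropped,
# but a leading empty piece survives.
pieces = substring.split(", ")[:-1]  # ['', '29901', '13']


def parse_ids(pieces):
    """Hypothetical helper: convert pieces to ints, skipping empty strings."""
    return [int(p) for p in pieces if p.strip()]


# Same conversion as line 1716: int('') raises ValueError.
try:
    [int(p) for p in pieces]
    failed = False
    message = ""
except ValueError as exc:
    failed = True
    message = str(exc)

ids = parse_ids(pieces)
print(failed, message, ids)
```

Skipping empty pieces avoids the exception in this sketch, but whether the resulting token IDs are still the right ones for the chat template would need to be checked separately.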
trainer_stats = trainer.train()
^^^^^^^^^^^^^^^
File "<string>", line 145, in train
File "<string>", line 320, in _fast_inner_training_loop
File "/home/user/unsloth_env/lib/python3.11/site-packages/accelerate/data_loader.py", line 550, in __iter__
current_batch = next(dataloader_iter)
^^^^^^^^^^^^^^^^^^^^^
File "/home/user/unsloth_env/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
data = self._next_data()
^^^^^^^^^^^^^^^^^
File "/home/user/unsloth_env/lib/python3.11/site-packages/torch/utils/data/dataloader.py", line 673, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/unsloth_env/lib/python3.11/site-packages/torch/utils/data/_utils/fetch.py", line 55, in fetch
return self.collate_fn(data)
^^^^^^^^^^^^^^^^^^^^^
File "/home/user/unsloth_env/lib/python3.11/site-packages/transformers/data/data_collator.py", line 45, in __call__
return self.torch_call(features)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/unsloth_env/lib/python3.11/site-packages/transformers/data/data_collator.py", line 806, in torch_call
batch = pad_without_fast_tokenizer_warning(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/unsloth_env/lib/python3.11/site-packages/transformers/data/data_collator.py", line 66, in pad_without_fast_tokenizer_warning
padded = tokenizer.pad(*pad_args, **pad_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/unsloth_env/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3560, in pad
return BatchEncoding(batch_outputs, tensor_type=return_tensors)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/unsloth_env/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 227, in __init__
self.convert_to_tensors(tensor_type=tensor_type, prepend_batch_axis=prepend_batch_axis)
File "/home/user/unsloth_env/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 778, in convert_to_tensors
raise ValueError(
ValueError: Unable to create tensor, you should probably activate truncation and/or padding with 'padding=True' 'truncation=True' to have batched tensors with the same length. Perhaps your features (`labels` in this case) have excessive nesting (inputs type `list` where type `int` is expected).
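Since the message points at excessive nesting in labels, one way to narrow this down is to check each example before it reaches the collator. This is only a sketch: find_nested_labels is a hypothetical helper, and the sample features are made up, not taken from the notebook.

```python
def find_nested_labels(features):
    """Return indices of examples whose `labels` is not a flat list of ints."""
    bad = []
    for i, example in enumerate(features):
        labels = example.get("labels", [])
        # A flat label list contains only ints; a nested one contains lists.
        if not all(isinstance(tok, int) for tok in labels):
            bad.append(i)
    return bad


# Made-up examples: the second one has an extra level of nesting.
features = [
    {"input_ids": [1, 2, 3], "labels": [-100, 2, 3]},
    {"input_ids": [1, 2, 3], "labels": [[-100, 2, 3]]},
]

bad_indices = find_nested_labels(features)
print(bad_indices)
```

Running a check like this over a few rows of the processed dataset would show whether labels comes out nested after train_on_responses_only is applied.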
The following error occurs while using train_on_responses_only on the unsloth/tinyllama-chat-bnb-4bit model.
Link to the test notebook: https://colab.research.google.com/gist/akhlakm/c7c40b0c29d112f2544168be42d3410b/llama-3-1-8b-conversational-unsloth-2x-faster-finetuning.ipynb
Also, when the chat template defined in the tokenizer_config.json file is used, I get the following error if train_on_responses_only is used.