New Colab: Fine-tune LLMs with Axolotl. End-to-end guide to the state-of-the-art tool for fine-tuning
Hi, I've uploaded a Colab notebook that follows your article.
The merge step is missing because I wasn't sure whether you were interested in it.
Thanks, I completed it and added it to the LLM course: https://colab.research.google.com/drive/1Xu0BrCB7IShwSWKVcfAfhehwjDrDMH5m?usp=sharing
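For context on what the merge step does: assuming the notebook trains a LoRA/QLoRA adapter (the notebook itself is not reproduced here), merging folds the adapter back into each adapted base weight as W' = W + (alpha / r) * B @ A, where r is the adapter rank. The sketch below shows only that arithmetic on plain Python lists. `merge_lora_weight` and `matmul` are hypothetical helpers, not Axolotl's or PEFT's API; real merges operate on tensors.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product of a (n x k) and b (k x m)."""
    k = len(b)
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")
    m = len(b[0])
    return [[sum(row[i] * b[i][j] for i in range(k)) for j in range(m)] for row in a]


def merge_lora_weight(w: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float) -> Matrix:
    """Hypothetical helper: return W + (alpha / r) * B @ A.

    w:      base weight, shape (d_out, d_in)
    lora_a: LoRA A matrix, shape (r, d_in)
    lora_b: LoRA B matrix, shape (d_out, r)
    """
    r = len(lora_a)  # adapter rank
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # low-rank update, shape (d_out, d_in)
    if len(w) != len(delta) or any(len(x) != len(y) for x, y in zip(w, delta)):
        raise ValueError("adapter shape does not match the base weight")
    # Add the scaled update element-wise to the base weight.
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]
```

For example, a 2x2 identity weight with a rank-1 adapter A = [[1.0, 2.0]], B = [[1.0], [0.0]] and alpha = 2.0 merges to [[3.0, 4.0], [0.0, 1.0]]. After merging, the adapter is no longer needed at inference time.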
If this line of code:
!pip install -qqq -e '.[flash-attn,deepspeed]' --progress-bar off
gives you an error, you should downgrade Torch to version 2.1.1:
!pip install torch==2.1.1
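To check whether the downgrade is needed before reinstalling, you can compare the installed Torch version with the 2.1.1 pin. This is a minimal sketch using only the standard library; `installed_version` and `torch_matches_pin` are hypothetical helpers, and the comparison ignores the local build label (such as `+cu121`).

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def torch_matches_pin(pin: str = "2.1.1", installed: Optional[str] = None) -> bool:
    """Hypothetical helper: True if the installed torch equals the pin.

    When `installed` is None, the version is looked up in the current
    environment. The local label after '+' (e.g. 'cu121') is ignored,
    so '2.1.1+cu121' matches the pin '2.1.1'.
    """
    if installed is None:
        installed = installed_version("torch")
    if installed is None:
        return False
    return installed.split("+", 1)[0] == pin


if __name__ == "__main__":
    if not torch_matches_pin():
        print("Consider running: pip install torch==2.1.1")
```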
Thanks @kukedlc87, I added it.
@mlabonne, the Fine_tune_LLMs_with_Axolotl.ipynb notebook does not work. These are the dependency versions:
****************************************
**** Axolotl Dependency Versions *****
accelerate: 0.28.0
peft: 0.10.0
transformers: 4.40.0.dev0
trl: 0.8.5
torch: 2.2.1+cu121
bitsandbytes: 0.43.0
****************************************
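A similar version report can be produced directly in a notebook cell to compare environments. This sketch uses `importlib.metadata` from the standard library; the package list is copied from the banner above, and `dependency_report` and `format_report` are hypothetical helpers, not the code Axolotl uses to print its banner.

```python
from importlib import metadata
from typing import Dict, Iterable

AXOLOTL_DEPS = ("accelerate", "peft", "transformers", "trl", "torch", "bitsandbytes")


def dependency_report(packages: Iterable[str] = AXOLOTL_DEPS) -> Dict[str, str]:
    """Map each package name to its installed version, or 'not installed'."""
    report = {}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


def format_report(report: Dict[str, str]) -> str:
    """Render the report in the same 'name: version' layout as the banner."""
    return "\n".join(f"{name}: {version}" for name, version in report.items())


if __name__ == "__main__":
    print(format_report(dependency_report()))
```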
Training fails on a Colab T4 with RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'. Here is the full stack trace:
Traceback (most recent call last):
File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/content/axolotl/src/axolotl/cli/train.py", line 59, in <module>
fire.Fire(do_cli)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 143, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 477, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/usr/local/lib/python3.10/dist-packages/fire/core.py", line 693, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/content/axolotl/src/axolotl/cli/train.py", line 35, in do_cli
return do_train(parsed_cfg, parsed_cli_args)
File "/content/axolotl/src/axolotl/cli/train.py", line 55, in do_train
return train(cfg=cfg, cli_args=cli_args, dataset_meta=dataset_meta)
File "/content/axolotl/src/axolotl/train.py", line 170, in train
trainer.train(resume_from_checkpoint=resume_from_checkpoint)
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1837, in train
return inner_training_loop(
File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2227, in _inner_training_loop
_grad_norm = self.accelerator.clip_grad_norm_(
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 2145, in clip_grad_norm_
self.unscale_gradients()
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 2095, in unscale_gradients
self.scaler.unscale_(opt)
File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 336, in unscale_
optimizer_state["found_inf_per_device"] = self._unscale_grads_(
File "/usr/local/lib/python3.10/dist-packages/torch/cuda/amp/grad_scaler.py", line 277, in _unscale_grads_
torch._amp_foreach_non_finite_check_and_unscale_(
RuntimeError: "_amp_foreach_non_finite_check_and_unscale_cuda" not implemented for 'BFloat16'
0% 0/20 [00:02<?, ?it/s]
Traceback (most recent call last):
File "/usr/local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/accelerate_cli.py", line 46, in main
args.func(args)
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 1057, in launch_command
simple_launcher(args)
File "/usr/local/lib/python3.10/dist-packages/accelerate/commands/launch.py", line 673, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '-m', 'axolotl.cli.train', 'config.yaml']' returned non-zero exit status 1.
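The trace shows the failure inside the gradient scaler's unscale step, which is not implemented for BFloat16 gradients. The Colab T4 is a Turing GPU (compute capability 7.5), while native bfloat16 support starts with Ampere (compute capability 8.0). The fix applied in this thread was upgrading PyTorch; a commonly used alternative, shown here only as a sketch, is to choose the precision flags from the GPU's compute capability. `precision_flags` is a hypothetical helper, and it assumes the config uses Axolotl's `bf16` and `fp16` keys.

```python
from typing import Dict, Tuple


def precision_flags(compute_capability: Tuple[int, int]) -> Dict[str, bool]:
    """Hypothetical helper: choose bf16/fp16 flags for an Axolotl-style config.

    bfloat16 is used only on GPUs with compute capability >= 8.0 (Ampere or
    newer); older GPUs such as the T4 (7.5) fall back to fp16.
    """
    supports_bf16 = compute_capability >= (8, 0)
    return {"bf16": supports_bf16, "fp16": not supports_bf16}


if __name__ == "__main__":
    try:
        import torch  # only needed to query the real GPU

        if torch.cuda.is_available():
            print(precision_flags(torch.cuda.get_device_capability()))
    except ImportError:
        print("torch is not installed")
```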
Also, you need to remove MLflow reporting from the config; otherwise training fails because mlflow is not installed.
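Removing the MLflow reporting can also be done programmatically before writing `config.yaml`. This is a minimal sketch that assumes the config has already been loaded into a Python dict and that the MLflow settings live under keys prefixed with `mlflow_` (for example `mlflow_experiment_name`); check the actual keys in your own config. `mlflow_installed` and `drop_mlflow_keys` are hypothetical helpers.

```python
import importlib.util
from typing import Any, Dict, Optional


def mlflow_installed() -> bool:
    """True if the mlflow package can be imported in this environment."""
    return importlib.util.find_spec("mlflow") is not None


def drop_mlflow_keys(cfg: Dict[str, Any], installed: Optional[bool] = None) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of cfg without 'mlflow_*' keys.

    Keys are removed only when mlflow is not installed; pass `installed`
    explicitly to skip the environment check. The input dict is not modified.
    """
    if installed is None:
        installed = mlflow_installed()
    if installed:
        return dict(cfg)
    return {k: v for k, v in cfg.items() if not k.startswith("mlflow_")}
```

If the config is read with PyYAML, the dict would come from `yaml.safe_load` and could be written back with `yaml.safe_dump`.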
Thanks, I upgraded the PyTorch version and removed mlflow.
(Reply to bachr's comment of Sun, May 5, 2024 at 11:44 AM: https://github.com/mlabonne/llm-course/issues/40#issuecomment-2094714353)