Support Qwen2 by yangjianxin1 · Pull Request #428 · unslothai/unsloth

yangjianxin1 · 2024-05-05T16:38:32Z

We add support of Qwen2 which is important for open-source community. Our repo Firefly has already supported training Qwen2 with Unsloth, more experiment details can be seen in our model card.

We have evaluated the training gain of Qwen1.5-7B, we use QLoRA and Unsloth to train Qwen1.5-7B for 20 steps on a single V100. The result can be listed as follows. Unsloth can reduce GPU memory by 39.13% and training time by 32.12%, and the training speed can increase by 47.32%.

max_seq_length	per_device_train_batch_size	gradient_accumulation_steps	use_unsloth	rank	GPU	Time
1024	1	16	false	8	13.72GB	448s
1024	1	16	true	8	8.43GB(-38.56%)	308s(-31.25%)
1024	1	16	false	64	16.01GB	452s
1024	1	16	true	64	11.07GB(-30.86%)	311s(-31.19%)
2048	1	16	false	64	18.55GB	840s
2048	1	16	true	64	12.99GB(-29.97%)	596s(-29.05%)
1024	4	4	false	64	24.70GB	357s
1024	4	4	true	64	14.36GB(-41.86%)	253s(-29.13%)
2048	4	4	false	64	32.51GB	741s
2048	4	4	true	64	19.79GB(-39.13%)	503s(-32.12%)

We also evaluate our sft and dpo models with Unsloth on Open LLM Leaderboard, they achieve good performance and outperform the official Qwen1.5-7B-Chat.

Model	Average	ARC	HellaSwag	MMLU	TruthfulQA	Winogrande	GSM8K
firefly-gemma-7b	62.93	62.12	79.77	61.57	49.41	75.45	49.28
firefly-qwen1.5-en-7b-dpo-v0.1-unsloth	62.65	56.14	75.5	60.87	58.09	70.72	54.59
zephyr-7b-beta	61.95	62.03	84.36	61.07	57.45	77.74	29.04
firefly-qwen1.5-en-7b-unsloth	61.81	54.27	76.22	61.55	50.62	70.48	57.7
vicuna-13b-v1.5	55.41	57.08	81.24	56.67	51.51	74.66	11.3
Xwin-LM-13B-V0.1	55.29	62.54	82.8	56.53	45.96	74.27	9.63
Qwen1.5-7B-Chat	55.15	55.89	78.56	61.65	53.54	67.72	13.57
gemma-7b-it	53.56	51.45	71.96	53.52	47.29	67.96	29.19

danielhanchen · 2024-05-05T17:01:40Z

@yangjianxin1 Oh wait does Qwen2 not have that weird alternating sliding window & normal attention thingo?

yangjianxin1 · 2024-05-06T06:53:42Z

Yes, there is not weird alternating sliding window & normal attention in Qwen2, and its use_sliding_window is false in the config.json.
And I have compared the code between Llama and Qwen2 almost line by line, they are very similar.

This reverts commit 026b05f.

danielhanchen · 2024-05-10T17:21:20Z

Thanks for the PR again! I streamlined Qwen2 to call FastMistralModel (since I think it's an exact replica right?)

* Fix prompt * Update chat_templates.py * fix_untrained_tokens * Update llama.py * add tokens * Update _utils.py * Update tokenizer_utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * pad_token * Update chat_templates.py * Update chat_templates.py * tokenizer * Update save.py * Update chat_templates.py * Update chat_templates.py * patch tokenizer padding * Update tokenizer_utils.py * Update save.py * Fix: loading models with resized vocabulary (#377) * new: vocab resize on load * new: gitignore * GGUF fix * Readme (#390) * Update README.md * Update README.md --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Update README.md * Delete .gitignore * Phi-3 * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Fix reserved tokens * Update save.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update chat_templates.py * Update save.py * Update _utils.py * Update chat_templates.py * Adds dependencies and extras for torch 2.3.0 with new xformers versions (#415) * Adds dependencies and extras for torch 2.3.0 with new xformers versions * Add 2.3.0 section to readme * Support Qwen2 (#428) * support Qwen2 * support Qwen2 * Delete README.md * Revert "Delete README.md" This reverts commit 026b05f. * Update README.md * Qwen2 == Mistral * Update llama.py * Update __init__.py * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update save.py * Update _utils.py * Update save.py * Update save.py * Update save.py * test_hf_gguf_equivalence * Update chat_templates.py * Update chat_templates.py * --pad-vocab * Update tokenizer_utils.py --------- Co-authored-by: Igor Kilbas <whitemarsstudios@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Nathan Azrak <42650258+nathan-az@users.noreply.github.com> Co-authored-by: Yang JianXin <995462226@qq.com>

* Fix prompt * Update chat_templates.py * fix_untrained_tokens * Update llama.py * add tokens * Update _utils.py * Update tokenizer_utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * pad_token * Update chat_templates.py * Update chat_templates.py * tokenizer * Update save.py * Update chat_templates.py * Update chat_templates.py * patch tokenizer padding * Update tokenizer_utils.py * Update save.py * Fix: loading models with resized vocabulary (#377) * new: vocab resize on load * new: gitignore * GGUF fix * Readme (#390) * Update README.md * Update README.md --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Update README.md * Delete .gitignore * Phi-3 * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Fix reserved tokens * Update save.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update chat_templates.py * Update save.py * Update _utils.py * Update chat_templates.py * Adds dependencies and extras for torch 2.3.0 with new xformers versions (#415) * Adds dependencies and extras for torch 2.3.0 with new xformers versions * Add 2.3.0 section to readme * Support Qwen2 (#428) * support Qwen2 * support Qwen2 * Delete README.md * Revert "Delete README.md" This reverts commit 026b05f. * Update README.md * Qwen2 == Mistral * Update llama.py * Update __init__.py * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update save.py * Update _utils.py * Update save.py * Update save.py * Update save.py * test_hf_gguf_equivalence * Update chat_templates.py * Update chat_templates.py * --pad-vocab * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Unspecified max_seq_length * possible_pad_token * Update tokenizer_utils.py --------- Co-authored-by: Igor Kilbas <whitemarsstudios@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Nathan Azrak <42650258+nathan-az@users.noreply.github.com> Co-authored-by: Yang JianXin <995462226@qq.com>

* Fix prompt * Update chat_templates.py * fix_untrained_tokens * Update llama.py * add tokens * Update _utils.py * Update tokenizer_utils.py * Update llama.py * Update llama.py * Update llama.py * Update llama.py * pad_token * Update chat_templates.py * Update chat_templates.py * tokenizer * Update save.py * Update chat_templates.py * Update chat_templates.py * patch tokenizer padding * Update tokenizer_utils.py * Update save.py * Fix: loading models with resized vocabulary (#377) * new: vocab resize on load * new: gitignore * GGUF fix * Readme (#390) * Update README.md * Update README.md --------- Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> * Update README.md * Delete .gitignore * Phi-3 * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Fix reserved tokens * Update save.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update chat_templates.py * Update save.py * Update _utils.py * Update chat_templates.py * Adds dependencies and extras for torch 2.3.0 with new xformers versions (#415) * Adds dependencies and extras for torch 2.3.0 with new xformers versions * Add 2.3.0 section to readme * Support Qwen2 (#428) * support Qwen2 * support Qwen2 * Delete README.md * Revert "Delete README.md" This reverts commit 026b05f. * Update README.md * Qwen2 == Mistral * Update llama.py * Update __init__.py * Update README.md --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Update save.py * Update save.py * Update _utils.py * Update save.py * Update save.py * Update save.py * test_hf_gguf_equivalence * Update chat_templates.py * Update chat_templates.py * --pad-vocab * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Update tokenizer_utils.py * Unspecified max_seq_length * possible_pad_token * Update tokenizer_utils.py * past_key_values * Update llama.py * Update llama.py * Update llama.py * Update _utils.py * Update _utils.py * Update _utils.py * Update _utils.py * _wrap_fast_inference * Update llama.py * Update llama.py * flag --------- Co-authored-by: Igor Kilbas <whitemarsstudios@gmail.com> Co-authored-by: Michael Han <107991372+shimmyshimmer@users.noreply.github.com> Co-authored-by: Nathan Azrak <42650258+nathan-az@users.noreply.github.com> Co-authored-by: Yang JianXin <995462226@qq.com>

NeoFii · 2024-05-16T12:18:34Z

Could you please provide a detailed explanation of the specific process of fine-tuning Qwen-1.5B-Chat using Unsloth?I want to fine-tune Qwen1.5-7B myself.

yangjianxin1 added 2 commits May 4, 2024 11:37

support Qwen2

0f1e607

support Qwen2

7b138b0

Delete README.md

026b05f

danielhanchen changed the base branch from main to nightly May 10, 2024 16:57

danielhanchen added 5 commits May 11, 2024 02:58

Revert "Delete README.md"

9fe1e15

This reverts commit 026b05f.

Update README.md

907a2c9

Qwen2 == Mistral

94e4a26

Update llama.py

2a17de2

Update __init__.py

60a2f00

Update README.md

4973f5b

danielhanchen merged commit cf83fe3 into unslothai:nightly May 10, 2024

danielhanchen mentioned this pull request May 10, 2024

May 2024 Prelim #447

Merged

bratao mentioned this pull request May 17, 2024

Unsloth optims for Llama axolotl-ai-cloud/axolotl#1609

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Support Qwen2#428

Support Qwen2#428
danielhanchen merged 9 commits intounslothai:nightlyfrom
yangjianxin1:main

yangjianxin1 commented May 5, 2024

Uh oh!

danielhanchen commented May 5, 2024

Uh oh!

yangjianxin1 commented May 6, 2024

Uh oh!

danielhanchen commented May 10, 2024

Uh oh!

NeoFii commented May 16, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

yangjianxin1 commented May 5, 2024

Uh oh!

danielhanchen commented May 5, 2024

Uh oh!

yangjianxin1 commented May 6, 2024

Uh oh!

danielhanchen commented May 10, 2024

Uh oh!

NeoFii commented May 16, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants