Fix/casting continue pretraining by Erland366 · Pull Request #1200 · unslothai/unsloth

Erland366 · 2024-10-27T12:06:18Z

There's this issue of "Attempting to unscale FP16 gradients".

After investigation, this is caused by the global dtype: when we run on Colab, we end up using torch.float16 instead of torch.bfloat16. The error does not happen if we use torch.bfloat16 (on my own RTX 4090, for example). So we need to use torch.float32 on devices that still fall back to torch.float16.
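The rule described above can be sketched as a small pure-Python helper. This is only an illustration of the decision, not the code from this PR: the function name `choose_dtypes` and the use of dtype names as strings are assumptions made so that the sketch runs without a GPU. In a real script the flag would typically come from `torch.cuda.is_bf16_supported()`.

```python
def choose_dtypes(bf16_supported: bool) -> tuple[str, str]:
    """Return (compute dtype name, dtype name for trainable embed_tokens / lm_head).

    Hypothetical helper: it only mirrors the rule stated in this PR, namely
    that float16 weights which are being trained should be kept in float32.
    """
    # Devices without bfloat16 support (e.g. the Colab GPU mentioned above)
    # fall back to float16 as the global compute dtype.
    compute = "bfloat16" if bf16_supported else "float16"

    # Trainable weights stored in float16 are what triggers the
    # "unscale FP16 gradients" error, so they are upcast to float32.
    # With bfloat16 the error was not observed, so the dtype is left alone.
    trainable = "float32" if compute == "float16" else compute
    return compute, trainable


if __name__ == "__main__":
    print(choose_dtypes(bf16_supported=True))
    print(choose_dtypes(bf16_supported=False))
```

Only the modules that are actually trained in full (here embed_tokens and lm_head) need the upcast; the rest of the model can stay in the compute dtype.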

Here is the result on bfloat16

And here's the result on float32 (Colab)

Erland366 · 2024-10-27T12:14:03Z

Previously I marked this as a draft because somehow I couldn't use batch size 2 as in the example. But now I can use BS 2, so I'm opening this PR for review instead of keeping it as a draft.

gautamabambang · 2024-10-27T22:00:13Z

Thank you so much for bringing this up in a PR, man 🙏🙏

danielhanchen · 2024-10-27T22:06:52Z

Oh I totally missed float16 cannot be used and only bfloat16 can be used for continued pretraining - nice catch!

mombip · 2025-04-25T15:17:46Z

I believe this change leads to an error when saving a model with float32 embeddings (model.config.torch_dtype is then 'float32').

model = FastLanguageModel.get_peft_model(
    model,
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj", "embed_tokens"],
    ...

When saving the model with model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",) I get the error: RuntimeError: Invalid device string: 'float32'. This happens because the string value torch_dtype = 'float32' is not handled; there is no mapping to torch.float32:

unsloth/save.py (551)

    ...
    torch_dtype = internal_model.config.torch_dtype
    if type(torch_dtype) is str:
        if   torch_dtype ==  "float16": torch_dtype = torch.float16
        elif torch_dtype == "bfloat16": torch_dtype = torch.bfloat16
    pass

    # Check modules to save float32 dtype
    state_dict["model.embed_tokens.weight"] = internal_model.model.embed_tokens.weight.data.to(torch_dtype)
    ...

The strings "float16" and "bfloat16" are converted to torch dtypes. "float32" remains a string, and since .to() interprets a string argument as a device, the call internal_model.model.embed_tokens.weight.data.to(torch_dtype) raises the error.
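One possible way to close this gap is to resolve every supported dtype name, including "float32", before calling .to(). The helper below is a minimal sketch under that assumption; the name `resolve_torch_dtype` and the `SUPPORTED_DTYPE_NAMES` tuple are hypothetical and not part of unsloth.

```python
# Hypothetical list of dtype names that may appear in config.torch_dtype.
SUPPORTED_DTYPE_NAMES = ("float16", "bfloat16", "float32")


def resolve_torch_dtype(value):
    """Turn a dtype name such as "float32" into the matching torch dtype.

    Values that are not strings (for example an actual torch.dtype) are
    returned unchanged. Unknown names raise ValueError instead of being
    passed on to .to(), where a string would be parsed as a device.
    """
    if not isinstance(value, str):
        return value
    if value not in SUPPORTED_DTYPE_NAMES:
        raise ValueError(f"Unsupported torch_dtype string: {value!r}")

    # Imported lazily so that validation works even where torch is missing.
    import torch

    return getattr(torch, value)
```

With such a helper, the saving code could call `resolve_torch_dtype(internal_model.config.torch_dtype)` once and pass the result to `.to(...)`.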

* Bring back float32 if float16 instead of bfloat16
* Refactor mixed precision handling for lm_head and embed_tokens to ensure correct dtype usage
* Fix dtype retrieval for embed_tokens and lm_head in mixed precision training
* Fix dtype retrieval for embed_tokens and lm_head to use weight dtype in mixed precision training
* Fix dtype handling for embed_tokens and lm_head to ensure correct float32 usage in mixed precision training
* Fix dtype assignment for lm_head modules to ensure correct weight dtype usage in mixed precision training

Erland366 added 6 commits October 27, 2024 15:31

Bring back float32 if float16 instead of bfloat16

40da53d

Refactor mixed precision handling for lm_head and embed_tokens to ensure correct dtype usage

3cfae1f

Fix dtype retrieval for embed_tokens and lm_head in mixed precision training

1dd5fed

Fix dtype retrieval for embed_tokens and lm_head to use weight dtype in mixed precision training

2c4ad1a

Fix dtype handling for embed_tokens and lm_head to ensure correct float32 usage in mixed precision training

439a158

Fix dtype assignment for lm_head modules to ensure correct weight dtype usage in mixed precision training

0cf6fa6

Erland366 marked this pull request as ready for review October 27, 2024 12:12

danielhanchen merged commit fdf25b7 into unslothai:main Oct 27, 2024

Erland366 deleted the fix/casting-continue-pretraining branch October 27, 2024 22:32

mmathew23 mentioned this pull request Apr 1, 2025

stop casting hidden states to float32 for cut cross entropy unslothai/unsloth-zoo#107

Closed

