feat: add fp16 safe patch option for training on older GPUs to prevent NaNs for Anima model by ihatenumbers · Pull Request #2274 · kohya-ss/sd-scripts

ihatenumbers · 2026-02-21T05:25:39Z

This adds an fp16_safe_patch option that fixes NaN loss on GPUs without bf16 support. Without it, training on these GPUs produced NaN loss and black output images, even with full_fp16 = true and/or mixed_precision = "fp16". To enable it, add fp16_safe_patch = true to config.toml. I confirmed that with the patch, sampling during training no longer produces black images and the loss no longer becomes NaN.

This was inspired by https://huggingface.co/RicemanT/Loras_Collection/blob/main/anina_fp16_patch.py and written with the help of Gemini.

kohya-ss · 2026-02-22T12:45:47Z

Thank you for this PR and for identifying the fp16 NaN issue on older GPUs! The core problem you've identified (residual stream overflow in fp16) is real and important.

After reviewing the approach, we realized there's a much simpler solution. When training with mixed_precision=fp16, PyTorch/accelerate already wraps the model forward pass with torch.autocast(dtype=torch.float16), so all sub-module computations (attention, MLP, adaln modulation) are already running in fp16. The NaN issue comes from the residual additions accumulating in fp16 and overflowing.

The fix can be as simple as adding this to the beginning of Block._forward:

if x_B_T_H_W_D.dtype == torch.float16:
    x_B_T_H_W_D = x_B_T_H_W_D.float()

This promotes the residual stream to fp32, preventing overflow. The sub-modules still run in fp16 thanks to the existing autocast context, and their outputs are automatically upcast when added back to the fp32 residual. No monkey-patching, no global settings, and no extra flag needed — it activates automatically when the input is fp16.
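
To illustrate the overflow and the effect of the promotion, here is a minimal sketch on CPU tensors. It is not the Anima code: residual_stack, branch_out, n_blocks, and the magnitudes are hypothetical stand-ins chosen so that the sum exceeds the fp16 maximum of 65504. It relies only on PyTorch's standard type promotion, where adding an fp16 tensor to an fp32 tensor yields fp32.

```python
import torch


def residual_stack(x, branch_out, n_blocks, promote):
    """Hypothetical stand-in for a stack of blocks: adds branch_out to the
    residual stream n_blocks times."""
    if promote and x.dtype == torch.float16:
        # Same check as the snippet proposed for Block._forward.
        x = x.float()
    for _ in range(n_blocks):
        # fp16 + fp16 stays fp16; fp32 + fp16 is promoted to fp32.
        x = x + branch_out
    return x


# Illustrative values only; both are exactly representable in fp16.
x = torch.full((4,), 30000.0, dtype=torch.float16)
branch = torch.full((4,), 20000.0, dtype=torch.float16)

fp16_result = residual_stack(x, branch, n_blocks=3, promote=False)
fp32_result = residual_stack(x, branch, n_blocks=3, promote=True)

# Once the stream holds inf, a later operation such as subtracting the
# mean computes inf - inf, which is NaN.
centered = fp16_result - fp16_result.mean()

print(fp16_result.dtype, fp16_result)
print(fp32_result.dtype, fp32_result)
print(centered)
```

In the fp16 run the second addition already passes 65504 and the stream becomes inf, which later arithmetic turns into NaN. In the promoted run the stream stays finite in fp32 while the branch outputs remain fp16.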

We'll close this PR and implement this fix on our side, but we really appreciate you bringing this issue to our attention. Your work and the reference to the fp16 patch on HuggingFace were very helpful in understanding the problem.

kohya-ss · 2026-02-23T12:17:32Z

I've merged #2277. If you have time, I'd appreciate it if you could check that it works.

feat: add fp16 safe patch option for older GPUs to prevent NaNs

525ced5

ihatenumbers closed this Feb 22, 2026

kohya-ss mentioned this pull request Feb 23, 2026

feat: Stability with fp16 for anima #2277

Merged

sashasubbbb mentioned this pull request Mar 23, 2026

NaN errors with fp16 training on Anima. #2293

Closed

kohya-ss mentioned this pull request Mar 29, 2026

fix: AdaLN modulation to use float32 for numerical stability in fp16 #2297

Merged

ihatenumbers wants to merge 1 commit into kohya-ss:main from ihatenumbers:anima-fp16-support
