Fix masked-image VAE encode dtype in fine_tune and train_textual_inversion #2320
Merged
Conversation
Fix masked-image VAE encode dtype in fine_tune and train_textual_inversion

Match the regular latents path: encode in vae_dtype, cast output to weight_dtype. Prevents torch.cat dtype mismatch and VAE numerical instability under --no_half_vae (vae_dtype != weight_dtype).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
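For context, a runnable illustration of the failure mode the commit message describes. The tensor names and shapes are illustrative only, not taken from the scripts:

```python
import torch

# Under --no_half_vae the VAE stays fp32 while weight_dtype is fp16/bf16,
# so the three tensors below can end up with mixed dtypes before the
# channel-wise concatenation in the inpainting path.
noisy_latents = torch.randn(1, 4, 64, 64, dtype=torch.float16)
mask = torch.randn(1, 1, 64, 64, dtype=torch.float16)
masked_latents = torch.randn(1, 4, 64, 64, dtype=torch.float32)  # un-cast VAE output

# Depending on the PyTorch version, this either raises a dtype error or
# silently promotes the result to fp32, which then mismatches an
# fp16/bf16 UNet further downstream.
cat = torch.cat([noisy_latents, mask, masked_latents], dim=1)
print(cat.dtype)  # torch.float32 on versions that type-promote
```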
Summary
- In `fine_tune.py` and `train_textual_inversion.py`, the masked-image VAE encode path used `weight_dtype` for the input and skipped the output cast, while the regular `latents` path encodes in `vae_dtype` and casts the sample to `weight_dtype`.
- Under `--no_half_vae` (VAE kept in fp32 while `weight_dtype` is fp16/bf16), this could cause a `torch.cat([noisy_latents, mask, masked_latents], dim=1)` dtype mismatch and VAE numerical instability.
- Aligned the masked-image path with `sdxl_train.py` and the regular `latents` path: `.to(dtype=vae_dtype)` on the input, `.latent_dist.sample().to(weight_dtype)` on the output (see the sketch below).
- `train_db.py` is internally consistent (both paths use `weight_dtype`) and is intentionally left out of this PR; its broader `vae_dtype`/`weight_dtype` handling is a pre-existing concern to be addressed separately.
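A minimal sketch of the fixed pattern. `encode_masked_latents` is a hypothetical helper wrapping the one-liner the PR describes; `vae` is assumed to expose the diffusers-style `encode(...).latent_dist.sample()` API that the PR text quotes. This is not the exact diff:

```python
import torch

def encode_masked_latents(vae, masked_images: torch.Tensor,
                          vae_dtype: torch.dtype,
                          weight_dtype: torch.dtype) -> torch.Tensor:
    # Old (problematic) form, per the PR description:
    #   vae.encode(masked_images.to(dtype=weight_dtype)).latent_dist.sample()
    # i.e. weight_dtype on the input and no cast on the output, so under
    # --no_half_vae the sample comes back fp32.
    #
    # Fixed form, matching the regular latents path and sdxl_train.py:
    # encode in vae_dtype, then cast the sample to weight_dtype.
    return (
        vae.encode(masked_images.to(dtype=vae_dtype))
        .latent_dist.sample()
        .to(weight_dtype)
    )
```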
Test plan

- `--train_inpainting` runs on `fine_tune.py` after the change
- `--train_inpainting` runs on `train_textual_inversion.py` after the change
- `--no_half_vae` + fp16/bf16 `--mixed_precision`

🤖 Generated with Claude Code