add performance and accuracy eval of flux-1.schnell by vkuzo · Pull Request #3502 · pytorch/ao

vkuzo · 2025-12-17T20:38:14Z

Summary:

Adds performance and accuracy eval for the flux-1.schnell model. This is useful as diffusion models are a major use case for torchao, and before this PR we didn't have reproducible benchmarks for them.

Results, measured on a B200 machine:

experiment	lpips_avg	time_s	speedup
bfloat16 (baseline)	-	1.77	-
float8_rowwise	0.1714	1.54	1.15
mxfp8	0.1747	1.47	1.20
nvfp4	0.3081	1.32	1.34

Details:

For performance, we measure e2e time for single image generation, with torch.compile on and num_inference_steps=4. In future PRs we can tighten this up to align with https://pytorch.org/blog/presenting-flux-fast-making-flux-go-brrr-on-h100s/. For now I did not do any performance debugging.
For accuracy, we measure the LPIPS (https://github.com/richzhang/PerceptualSimilarity) score between the image generated by the baseline (bf16) and quantized model, averaged over the DrawBench (https://huggingface.co/datasets/sayakpaul/drawbench) dataset of 200 prompts.
we start with three supported quantization recipes: float8_rowwise, mxfp8, nvfp4 (because I wrote this on a B200). We can expand to other recipes in future PRs as needed.
for selecting layers for applying quantization to a model, I wrote a basic heuristic (don't quantize embeddings, etc) - this was not validated with any accuracy study or sensitivity analysis.

How to run the e2e script:

// takes ~16 mins using 8 GPUs on a B200
benchmarks/quantization/eval_accuracy_and_perf_of_flux.sh
// full log: https://www.internalfb.com/phabricator/paste/view/P2093514733

Note: the script quality is not ideal, we can improve in future PRs if it proves to be worth our time. The current code is good enough to check in and start reporting metrics.

Test Plan:

Reviewers:

Subscribers:

Tasks:

Tags:

[ghstack-poisoned]

vkuzo · 2025-12-17T20:38:15Z

Stack from ghstack (oldest at bottom):

-> add performance and accuracy eval of flux-1.schnell #3502

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 25daf59 ghstack-comment-id: 3667066648 Pull-Request: #3502

pytorch-bot · 2025-12-17T20:38:19Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3502

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 1 Pending

As of commit 27b34e9 with merge base dd41e98 ():
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

[ghstack-poisoned]

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: b1bb3d2 ghstack-comment-id: 3667066648 Pull-Request: #3502

[ghstack-poisoned]

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 70a7b71 ghstack-comment-id: 3667066648 Pull-Request: #3502

[ghstack-poisoned]

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 551cd15 ghstack-comment-id: 3667066648 Pull-Request: #3502

[ghstack-poisoned]

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 58f5c33 ghstack-comment-id: 3667066648 Pull-Request: #3502

sayakpaul · 2025-12-30T15:56:22Z

This is good enough, actually.

For performance, we measure e2e time for single image generation, with torch.compile on and num_inference_steps=4. In future PRs we can tighten this up to align with https://pytorch.org/blog/presenting-flux-fast-making-flux-go-brrr-on-h100s/. For now I did not do any performance debugging.

We shouldn't need basic performance debugging as the mentioned blog post already did that (such as ensuring no graph-breaks, recompilations, CPU<->GPU syncs, etc.). I think we could add the following context before the inference runs to ensure no graph breaks (as it's simple):
https://github.com/huggingface/diffusers/blob/1cdb8723b85f1b427031e390e0bd0bebfe92454e/tests/models/test_modeling_common.py#L2143C9-L2149C37

We can squeeze out more, but that would probably be intrusive. Also, note that we log the performance benchmarks, too: https://huggingface.co/datasets/diffusers/benchmarks. In the future, it could be great for us to pair up and consolidate this like we have done many times in the past :-)

vkuzo · 2026-01-05T12:36:48Z

@sayakpaul ,

We shouldn't need basic performance debugging as the mentioned blog post already did that (such as ensuring no graph-breaks, recompilations, CPU<->GPU syncs, etc.).

if I use the diffusers pipeline as-is I get all of that for free except for qkv fusion, right? that's great

I think we could add the following context before the inference runs to ensure no graph breaks (as it's simple):

from the code link, seems like no-graph-breaks is enforced in diffusers CI? Lmk if I got that right. If so, I'd rather trust diffusers and not check for it again here, to keep things simple.

Also, note that we log the performance benchmarks, too

That's great! I do want something in torchao to help guide local development, and the goal for the benchmark in this PR is more "here is how different torchao quantization recipes compare to each other" and not "push perf + accuracy to SOTA / catch regressions / etc". I'd definitely be happy to collaborate more on this, where can we find you on slack?

sayakpaul · 2026-01-05T15:28:49Z

if I use the diffusers pipeline as-is I get all of that for free except for qkv fusion, right? that's great

QKV fusion is also supported:
https://github.com/huggingface/flux-fast/blob/0a1dcc91658f0df14cd7fce862a5c8842784c6da/utils/pipeline_utils.py#L389C9-L390C44

from the code link, seems like no-graph-breaks is enforced in diffusers CI? Lmk if I got that right. If so, I'd rather trust diffusers and not check for it again here, to keep things simple.

Yup, that's correct.

I'd definitely be happy to collaborate more on this, where can we find you on slack?

I think you can sync with @jerryzh168 / @supriyar on this. We have a fairly active collaboration channel on Slack :-)

[ghstack-poisoned]

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: dd31816 ghstack-comment-id: 3667066648 Pull-Request: #3502

vkuzo · 2026-01-06T12:36:51Z

QKV fusion is also supported:

@sayakpaul , does that flux-fast code path get hit if I use diffusers pipeline to load the flux model family, or does it require the user to use flux-fast directly?

sayakpaul · 2026-01-06T14:47:35Z

does that flux-fast code path get hit if I use diffusers pipeline to load the flux model family, or does it require the user to use flux-fast directly?

It's not flux-fast specific. It's implemented at the diffusers-level. You can use it on the Flux model family.

## Summary - Added new benchmark for new low precision attention API - Can set baseline and test models between different backends: (fa2, fa3, fa3_fp8, fa4, fa4_fp8) - uses flux.1-schnell model, 4 inference steps, DrawBench prompts - has options to control number of prompts, torch.compile usage, warmup_iters, using debug prompts, number of inference steps, rope fusion - Following the guidelines of #3502 ## Example Run python benchmarks/prototype/attention/eval_flux_model.py --baseline fa3 --test fa3_fp8 --compile

Update

bcfba9a

[ghstack-poisoned]

vkuzo added a commit that referenced this pull request Dec 17, 2025

[wip] flux eval

6a3bc39

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 25daf59 ghstack-comment-id: 3667066648 Pull-Request: #3502

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Dec 17, 2025

Update

31ec4fe

[ghstack-poisoned]

vkuzo added a commit that referenced this pull request Dec 19, 2025

[wip] flux eval

5bffd03

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: b1bb3d2 ghstack-comment-id: 3667066648 Pull-Request: #3502

vkuzo added the module: not user facing Use this tag if you don't want this PR to show up in release notes label Dec 19, 2025

Update

d0792a1

[ghstack-poisoned]

vkuzo added a commit that referenced this pull request Dec 19, 2025

[wip] flux eval

28a94e6

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 70a7b71 ghstack-comment-id: 3667066648 Pull-Request: #3502

Update

7db9995

[ghstack-poisoned]

vkuzo added a commit that referenced this pull request Dec 22, 2025

[wip] flux eval

7e330d3

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 551cd15 ghstack-comment-id: 3667066648 Pull-Request: #3502

Update

43343c5

[ghstack-poisoned]

vkuzo added a commit that referenced this pull request Dec 22, 2025

[wip] flux eval

70057e3

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: 58f5c33 ghstack-comment-id: 3667066648 Pull-Request: #3502

vkuzo changed the title ~~[wip] flux eval~~ add performance and accuracy eval of flux-1.schnell Dec 22, 2025

jerryzh168 requested a review from jainapurva December 23, 2025 00:07

This was referenced Dec 30, 2025

fix torchao quantizer for new torchao versions huggingface/diffusers#12901

Merged

make eval script also handle performance measurement #3473

Merged

jainapurva approved these changes Jan 2, 2026

View reviewed changes

Update

27b34e9

[ghstack-poisoned]

vkuzo added a commit that referenced this pull request Jan 6, 2026

[wip] flux eval

f098038

Summary: Test Plan: Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: dd31816 ghstack-comment-id: 3667066648 Pull-Request: #3502

vkuzo merged commit 3955b6c into main Jan 6, 2026
59 checks passed

This was referenced Jan 30, 2026

Added benchmarking on Flux.1-schnell for new fp8 sdpa API #3706

Open

Added new fp8_rope_sdpa_inference API #3730

Open

howardzhang-cv mentioned this pull request Feb 12, 2026

Added benchmarking for new torchao low precision attention api #3865

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add performance and accuracy eval of flux-1.schnell#3502

add performance and accuracy eval of flux-1.schnell#3502
vkuzo merged 6 commits intomainfrom
gh/vkuzo/186/head

vkuzo commented Dec 17, 2025 •

edited

Loading

Uh oh!

vkuzo commented Dec 17, 2025 •

edited

Loading

Uh oh!

pytorch-bot Bot commented Dec 17, 2025 •

edited

Loading

Uh oh!

sayakpaul commented Dec 30, 2025

Uh oh!

vkuzo commented Jan 5, 2026

Uh oh!

sayakpaul commented Jan 5, 2026

Uh oh!

vkuzo commented Jan 6, 2026

Uh oh!

Uh oh!

sayakpaul commented Jan 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

vkuzo commented Dec 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

vkuzo commented Dec 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

pytorch-bot Bot commented Dec 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3502

⏳ No Failures, 1 Pending

Uh oh!

sayakpaul commented Dec 30, 2025

Uh oh!

vkuzo commented Jan 5, 2026

Uh oh!

sayakpaul commented Jan 5, 2026

Uh oh!

vkuzo commented Jan 6, 2026

Uh oh!

Uh oh!

sayakpaul commented Jan 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

vkuzo commented Dec 17, 2025 •

edited

Loading

vkuzo commented Dec 17, 2025 •

edited

Loading

pytorch-bot Bot commented Dec 17, 2025 •

edited

Loading