Makes fallback float8 1x128 by 128x128 gemm output bfloat16 #3265
Summary:
For now, we just care about bf16 output. We can add fp32 and a flag to control it later, if needed.

Test Plan:
```
pytest test/quantization/quantize_/workflows/float8/test_float8_tensor.py -s -k fp8_linear_variants -x
```

Reviewers:
Subscribers:
Tasks:
Tags:

ghstack-source-id: b3c443c
ghstack-comment-id: 3469836810
Pull-Request: #3265
danielvegamyhre left a comment:
LGTM. Btw, in my benchmarks I found the torch._scaled_mm CUTLASS kernel for blockwise gemms to be much faster than the Triton kernels. This was a few months ago; you can run the benchmark scripts in this dir if you want: https://github.com/pytorch/ao/tree/main/benchmarks/prototype/blockwise_fp8_training
Yes, two things for ^: