Add Float8Tensor #2463
Summary:
Splits out the float8 rowwise quantized path (both act and weight) of AQT to Float8RowwiseTensor
Next: could potentially incorporate the per tensor activation path there as well
Next: we can split the per tensor weight path to another Tensor as well, so we can deprecate AQT path for float8
Test Plan:
python test/dtypes/test_affine_quantized_float.py
python test/quantization/quantize_/test_float8_rowwise_tensor.py
Reviewers:
Subscribers:
Tasks:
Tags:
stack-info: PR: #2463, branch: jerryzh168/stack/9
from torchao.quantization.quantize_.common import QuantizeTensorKwargs


def _choose_quant_func_and_quantize_tensor(
nit: colocate with QuantizeTensorKwargs and move out from utils, since this is more of a developer facing function which is central to the design?
Hmm, I thought about moving this to common/, but then common would import from the workflow-specific code, which seems weird, like:
from torchao.quantization.quantize_.workflows import (
Float8Tensor,
QuantizeTensorToFloat8Kwargs,
)
Any ideas to resolve that? Or do you feel it's fine for functions in common to import from workflows?
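One possible way to avoid the import direction described above is a small registry that lives in common and that workflow modules populate themselves. This is only a sketch and is not something proposed in this thread or present in the PR: `register_quantize_fn`, `choose_quant_func_and_quantize_tensor`, `ToyFloat8Kwargs`, and the stand-in `QuantizeTensorKwargs` class below are all hypothetical names used for illustration, and plain Python lists stand in for tensors.

```python
from dataclasses import dataclass
from typing import Callable, Dict


# --- would live in common/ (hypothetical stand-in, not the torchao class) ---
class QuantizeTensorKwargs:
    """Base class for per-workflow quantization keyword arguments."""


# Maps a kwargs class to the function that quantizes a tensor with it.
_QUANTIZE_FNS: Dict[type, Callable] = {}


def register_quantize_fn(kwargs_cls: type) -> Callable:
    """Decorator that workflow modules use to register their quantize function."""

    def decorator(fn: Callable) -> Callable:
        _QUANTIZE_FNS[kwargs_cls] = fn
        return fn

    return decorator


def choose_quant_func_and_quantize_tensor(tensor, kwargs: QuantizeTensorKwargs):
    """Dispatch on the exact kwargs type; common never imports a workflow."""
    fn = _QUANTIZE_FNS.get(type(kwargs))
    if fn is None:
        raise NotImplementedError(f"No quantize function for {type(kwargs).__name__}")
    return fn(tensor, kwargs)


# --- would live in a workflow module (hypothetical) ---
@dataclass
class ToyFloat8Kwargs(QuantizeTensorKwargs):
    granularity: str = "per_row"


@register_quantize_fn(ToyFloat8Kwargs)
def _toy_to_float8(tensor, kwargs: ToyFloat8Kwargs):
    # A real implementation would build a quantized tensor here.
    return ("float8", kwargs.granularity, tensor)
```

With this layout the workflow module imports from common, and common only looks up whatever has been registered, at the cost of requiring the workflow module to be imported before dispatch is used.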
git submodule update --init --recursive
python use_existing_torch.py
pip install -r requirements/build.txt
pip install --no-build-isolation -e .
@jerryzh168 This setup, unfortunately, destroys our H100 fleet because:
- It builds vLLM from scratch without any cache, which can take hours, so it's a very inefficient use of H100 runners
- The build somehow hangs the Docker daemon, possibly for memory-related reasons. At the very least, you want to set MAX_JOBS here to match what vLLM does: https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile#L198-L202
As it's not trivial to build vLLM from source locally, I think we should explore the option of using a prebuilt image from vLLM, which you can pull from public.ecr.aws/q9t5s3a7/vllm-ci-postmerge-repo:<COMMIT_HASH>. Here is an example: https://github.com/pytorch/pytorch-integration-testing/blob/main/.github/workflows/vllm-benchmark.yml#L149
Another way is to run this test on PyTorch CI, where @yangw-dev is working on a CI job to test vLLM main vs. PyTorch (we also need to build vLLM there, but with caching properly set up). I think we could also run pytest test/integration from AO there.
I see, thanks. I will try out the suggestions in #2601
Summary:
* Added Float8Tensor that's using fbgemm kernels and scaled_mm:
* per row activation + per row weight linear calling torch._scaled_mm op (for compatibility with SM 8.9)
* per tensor activation + per tensor weight quant linear calling torch._scaled_mm op (for compatibility with SM 8.9)
* per row activation + per row weight bmm calling torch.ops.fbgemm.f8f8bf16_rowwise_batched kernel (only works for SM 9.0+); this can switch to batched scaled mm from torch once it's supported: pytorch/pytorch#157950
* dynamic quantization kwargs are added to the Float8Tensor directly
* Added QuantizeTensorKwargs and QuantizeTensorToFloat8Kwargs to store keyword args for Float8Tensor.to_float8
* Updated Float8DynamicActivationFloat8WeightConfig and Float8WeightOnlyConfig to use Float8Tensor
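
To make the difference between the per tensor and per row paths above concrete, here is a minimal sketch of how the scales differ. It is plain Python on nested lists, not the torchao or fbgemm implementation: the helper names are hypothetical, the convention `scale = amax / 448` with `quantized = value / scale` is an assumption made for illustration, and no actual cast to a float8 dtype is performed (448.0 is the largest finite value of float8 e4m3fn).

```python
# Illustrative only: plain-Python stand-in for float8 scale computation.
E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3fn
EPS = 1e-12       # guards against division by zero for all-zero inputs


def per_tensor_scale(matrix):
    """One scale shared by every element of the matrix."""
    amax = max(abs(v) for row in matrix for v in row)
    return max(amax, EPS) / E4M3_MAX


def per_row_scales(matrix):
    """One scale per row, so small-magnitude rows keep more resolution."""
    return [max(max(abs(v) for v in row), EPS) / E4M3_MAX for row in matrix]


def scale_and_clamp(matrix, scales):
    """Divide each row by its scale and clamp into the e4m3 range.

    `scales` holds one scale per row; pass the same value for every row
    to emulate per tensor scaling.
    """
    return [
        [max(-E4M3_MAX, min(E4M3_MAX, v / s)) for v in row]
        for row, s in zip(matrix, scales)
    ]


weights = [[1.0, -2.0], [100.0, 448.0]]

# Per tensor: the large second row dictates the scale for the first row too.
t_scale = per_tensor_scale(weights)
per_tensor = scale_and_clamp(weights, [t_scale] * len(weights))

# Per row: the first row is stretched to use the full representable range.
per_row = scale_and_clamp(weights, per_row_scales(weights))
```

In this toy input the first row occupies only a tiny part of the representable range under per tensor scaling, while per row scaling maps its largest magnitude to the edge of the range.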
Test Plan:
python test/dtypes/test_affine_quantized_float.py
python test/quantization/quantize_/workflows/float8/test_float8_tensor.py
Reviewers:
Subscribers:
Tasks:
Tags:
stack-info: PR: #2463, branch: jerryzh168/stack/9
/easycla
Stacked PRs:
Add Float8Tensor