[quant][core][gpu][feature] Implemented quantized cuda gelu #77212
dzdang wants to merge 9 commits into gh/dzdang/110/base
Conversation
Summary: Support for quantized cuda gelu has been provided by using `dequantize -> fp32 cuda gelu kernel -> quantize`. Mathematically, this is not equivalent to doing int8 gelu, so we have opted for this approach for now. It might be possible to write a variant of the int8 gelu that's equivalent to `dequantize -> fp32 cuda gelu kernel -> quantize`, which can be a topic for future work. Test function `test_qgelu` was amended to test gelu for quantized cuda backends. Test Plan: ``` python test/test_quantization.py -k test_qgelu ``` [ghstack-poisoned]
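The `dequantize -> fp32 gelu -> quantize` strategy can be sketched in plain Python for a single per-tensor-quantized int8 value. This is a toy numeric model of the approach, not the PyTorch kernel; the helper names and the `scale`/`zero_point` values are illustrative only.

```python
import math

def dequantize(q, scale, zero_point):
    # int8 value -> fp32
    return (q - zero_point) * scale

def gelu_fp32(x):
    # exact (erf-based) gelu, as an fp32 kernel would compute it
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    # fp32 -> int8 with round-to-nearest and saturation
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))

def quantized_gelu(q, scale, zero_point):
    # the composed path described in the summary:
    # dequantize -> fp32 gelu -> quantize
    return quantize(gelu_fp32(dequantize(q, scale, zero_point)),
                    scale, zero_point)

scale, zp = 0.05, 0
print(quantized_gelu(40, scale, zp))  # gelu(2.0) ~ 1.9545 -> 39
```

The rounding step at the end is why this is not the same as a true int8 gelu: the composition quantizes once after an exact fp32 evaluation, rather than approximating gelu in integer arithmetic.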
❌ 2 New Failures as of commit a8d5169 (more details on the Dr. CI page); the failures do not appear to be due to upstream breakages.
@dzdang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@jerryzh168 wondering if we should move

Yeah, I think maybe we can move it to the cuda folder.
```cpp
auto x_fp32 = at::dequantize(qx);
auto result_fp32 = at::gelu(x_fp32);
return at::quantize_per_tensor(result_fp32, qx.q_scale(), qx.q_zero_point(), qx.scalar_type());
```
If each one of them supports this, we can remove L11-13, right? Should we add a TODO to do this?
They do, but I think I've seen several other functions use this pruning check for early termination.
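One concrete shape the future-work item from the summary could take is a lookup-table int8 gelu. This is a hypothetical sketch, not code from this PR: with per-tensor affine quantization the int8 input can only take 256 distinct values, so a table built once from the `dequantize -> fp32 gelu -> quantize` reference reproduces that path exactly while applying gelu as a pure int8-to-int8 map.

```python
import math

def _gelu(x):
    # exact (erf-based) fp32 gelu
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def build_gelu_lut(scale, zero_point, qmin=-128, qmax=127):
    # Precompute dequantize -> fp32 gelu -> quantize for every possible
    # int8 input value; the kernel would then only index this table.
    lut = {}
    for q in range(qmin, qmax + 1):
        y = _gelu((q - zero_point) * scale)                    # dequantize + gelu
        lut[q] = max(qmin, min(qmax, round(y / scale) + zero_point))  # quantize
    return lut

scale, zp = 0.05, 0
lut = build_gelu_lut(scale, zp)
# Applying the table per element is a pure int8 -> int8 map:
print(lut[40])  # same result as dequantize -> gelu -> quantize for q = 40
```

The trade-off is that the table depends on the tensor's scale and zero point, so it would need to be rebuilt (cheaply, 256 entries) whenever the quantization parameters change.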
@pytorchbot merge this (Initiating merge automatically since Phabricator Diff has merged)
Summary: Pull Request resolved: #77212 Reviewed By: jerryzh168 Differential Revision: D36302475 Pulled By: dzdang fbshipit-source-id: 11342fb290031d62ba5e620cbe572fe2cc8ed701
Reverting this PR internally as it broke bazel builds, see https://hud.pytorch.org/pytorch/pytorch/commit/b892b85b881c7b3b2b6bde529c4d174e348ba9fb |
@pytorchbot revert this

This Pull Request has been reverted by a revert inside Meta. To re-land this change, please open another pull request, assign the same reviewers, fix the CI failures that caused the revert and make sure that the failing CI runs on the PR by applying the proper ciflow label (e.g., ciflow/trunk).
This reverts commit b892b85. Reverted #77212 on behalf of https://github.com/facebook-github-bot
Differential Revision: [D36392774](https://our.internmc.facebook.com/intern/diff/D36392774)

@dzdang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)
Summary: Pull Request resolved: #77212 Reviewed By: cpuhrsch Differential Revision: D36392774 Pulled By: dzdang fbshipit-source-id: 1accdefb042ee4930451ef016c527c5cd3e13168
@pytorchbot merge

Can't merge closed PR #77212

Merge failed due to Command Raised by https://github.com/pytorch/pytorch/actions/runs/2384646215
Stack from ghstack (oldest at bottom):

Summary:
Support for quantized cuda gelu has been provided by using `dequantize -> fp32 cuda gelu kernel -> quantize`. Mathematically, this is not equivalent to doing int8 gelu, so we have opted for this approach for now. It might be possible to write a variant of the int8 gelu that's equivalent to `dequantize -> fp32 cuda gelu kernel -> quantize`, which can be a topic for future work.

Test function `test_qgelu` was amended to test gelu for quantized cuda backends.

Test Plan:
```
python test/test_quantization.py -k test_qgelu
```

Differential Revision: D36392774