[WIP] V1.3.0 quantization cherrypicks #27303
Merged
Force-pushed from 4775d76 to 7bb0521
Force-pushed from 4b8bee8 to 82671eb
Summary: Pull Request resolved: #27002 This was taking a significant amount of time in my benchmarks with larger output sizes (e.g. final output projection in a language classification model) Test Plan: Imported from OSS Differential Revision: D17641765 Pulled By: jamesr66a fbshipit-source-id: b0ef30767eec9774fc503bb51fed039222026bba
Summary: Pull Request resolved: #26516 Integrate per-channel support into conv and linear modules. ghstack-source-id: 90982010 Test Plan: The following tests pass: buck test caffe2/test:quantized -- 'test_linear_api \(test_quantized_nn_mods\.ModuleAPITest\)' --print-passing-details buck test caffe2/test:quantized -- 'test_conv_api \(test_quantized_nn_mods\.ModuleAPITest\)' --print-passing-details buck test caffe2/test:quantized -- 'test_float_quant_compare_per_channel \(test_quantized_models\.ModelNumerics\)' --print-passing-details Differential Revision: D17342622 fbshipit-source-id: f0d618928e3d9348672c589a6b7a47049c372a2e
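Per-channel quantization, as integrated here, keeps a separate scale and zero point for each output channel of a weight tensor rather than one pair for the whole tensor. A minimal pure-Python sketch of that arithmetic (illustrative only; the function names are hypothetical and this is not the fbgemm/PyTorch implementation):

```python
def choose_qparams(vals, qmin=-128, qmax=127):
    """Pick an affine scale/zero_point covering [min, max] of one channel.
    The range is widened to include 0 so that zero is exactly representable."""
    lo, hi = min(min(vals), 0.0), max(max(vals), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero channels
    zero_point = int(round(qmin - lo / scale))
    return scale, max(qmin, min(qmax, zero_point))

def quantize_per_channel(weight, qmin=-128, qmax=127):
    """weight: list of channels, each a list of floats.
    Returns (quantized values, scale, zero_point) per channel."""
    out = []
    for channel in weight:
        scale, zp = choose_qparams(channel, qmin, qmax)
        q = [max(qmin, min(qmax, round(v / scale) + zp)) for v in channel]
        out.append((q, scale, zp))
    return out
```

Because each channel gets its own range, one channel with large weights no longer inflates the quantization error of every other channel, which is the accuracy motivation for this change.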
Summary: Pull Request resolved: #27008 Test Plan: Imported from OSS Differential Revision: D17649174 Pulled By: jamesr66a fbshipit-source-id: e3e6c4bb31e1ad8ed1ebe27f803f90d564ecfe53
Summary: Pull Request resolved: #26623 Per-channel fake quant CPU and CUDA operators, per-channel support in the fake quant module, and tests for per-channel fake quant and serializability of fake quant modules. ghstack-source-id: 91008299 Test Plan: buck test mode/dev caffe2/test:fake_quant -- Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/1970324848875929
✓ caffe2/test:fake_quant - test_backward_per_tensor (test_fake_quant.TestFakeQuantizePerTensor) 0.242 1/10 (passed)
✓ caffe2/test:fake_quant - test_numerical_consistency_per_tensor (test_fake_quant.TestFakeQuantizePerTensor) 0.204 2/10 (passed)
✓ caffe2/test:fake_quant - test_fq_serializable (test_fake_quant.TestFakeQuantizePerTensor) 0.174 3/10 (passed)
✓ caffe2/test:fake_quant - test_numerical_consistency_per_channel (test_fake_quant.TestFakeQuantizePerChannel) 0.279 4/10 (passed)
✓ caffe2/test:fake_quant - test_forward_per_tensor (test_fake_quant.TestFakeQuantizePerTensor) 0.241 5/10 (passed)
✓ caffe2/test:fake_quant - test_forward_per_channel (test_fake_quant.TestFakeQuantizePerChannel) 0.353 6/10 (passed)
✓ caffe2/test:fake_quant - test_fq_module (test_fake_quant.TestFakeQuantizePerTensor) 0.354 7/10 (passed)
✓ caffe2/test:fake_quant - test_backward_per_channel (test_fake_quant.TestFakeQuantizePerChannel) 0.334 8/10 (passed)
✓ caffe2/test:fake_quant - test_fq_serializable (test_fake_quant.TestFakeQuantizePerChannel) 0.168 9/10 (passed)
✓ caffe2/test:fake_quant - test_fq_module (test_fake_quant.TestFakeQuantizePerChannel) 0.429 10/10 (passed)
✓ caffe2/test:fake_quant - main 0.000 (passed)
Differential Revision: D17439406 fbshipit-source-id: 64bfff5e4f40bc2ab8af2b432c7bc33805418077
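A fake-quant operator like the ones added here quantizes and immediately dequantizes, so training sees the rounding error while tensors stay in floating point. A minimal sketch of the forward pass for a single value (illustrative assumption, not the actual CUDA/CPU kernel; the backward pass, not shown, is a straight-through estimator that passes gradients unchanged inside the clamped range):

```python
def fake_quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Quantize-then-dequantize one float value: the forward pass of a
    fake-quant node. The result is a float that lies exactly on the
    quantization grid defined by (scale, zero_point)."""
    q = max(qmin, min(qmax, round(x / scale) + zero_point))
    return (q - zero_point) * scale
```

The per-channel variant applies the same formula with a different (scale, zero_point) pair per output channel.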
Summary: Pull Request resolved: #26624 For QAT we need to be able to control batch norm for all modules from the top. Adding helper functions to enable/disable batch norm freezing during training ghstack-source-id: 91008297 Test Plan: buck test caffe2/test:quantization -- --print-passing-details Differential Revision: D17512199 fbshipit-source-id: f7b981e2b1966ab01c4dbb161030177274a998b6
…st (#26625) Summary: Pull Request resolved: #26625 ghstack-source-id: 91008296 Test Plan: buck test caffe2/test:quantized -- 'test_weight_only_activation_only_fakequant \(test_quantized_models\.ModelNumerics\)' --print-passing-details Differential Revision: D17520342 fbshipit-source-id: 26e148d3299afcfdfb1187aff6ab80687ed8df47
Summary: Pull Request resolved: #26627 ghstack-source-id: 91008337 Test Plan: buck test caffe2/test:quantization -- --print-passing-details Differential Revision: D17518194 fbshipit-source-id: 1eb8a7a85dc811c4ee5228d68563abb157613ceb
Summary: Pull Request resolved: #26612 Add support for an add-relu functional module; this allows fusion of the quantized add and relu operations. ghstack-source-id: 91055976 Test Plan: buck test caffe2/test:quantization -- 'test_functional_module \(test_quantization\.FunctionalModuleTest\)' --print-passing-details Differential Revision: D17518268 fbshipit-source-id: e1e8b4655d6b32405863ab9d1c7da111fb4343cc
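The point of fusing add and relu in the quantized domain is that the ReLU can be applied during the requantization step of the add, so no intermediate tensor is materialized. A pure-Python sketch of the fused operation (hypothetical function, illustrating the technique rather than the actual kernel):

```python
def add_relu(qx, qy, scale_x, zp_x, scale_y, zp_y,
             scale_out, zp_out, qmin=0, qmax=255):
    """Fused quantized add + ReLU over two lists of quantized ints:
    dequantize both inputs, add, clamp at zero (the ReLU), and
    requantize to the output qparams in a single pass."""
    out = []
    for a, b in zip(qx, qy):
        s = (a - zp_x) * scale_x + (b - zp_y) * scale_y  # float sum
        s = max(s, 0.0)                                  # fused ReLU
        out.append(max(qmin, min(qmax, round(s / scale_out) + zp_out)))
    return out
```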
Summary: Pull Request resolved: #27113 Fix bug in fake quant control of observer and fake-quantize operations. Add test to ensure that features work as expected ghstack-source-id: 91071181 Test Plan: buck test mode/dev-nosan caffe2/test:fake_quant -- test_fake_quant_control Differential Revision: D17678875 fbshipit-source-id: 2912ad8b6e674daa1d129f7a7c6f27d8c1b4f93b
Summary: Pull Request resolved: #26457 Enhancement to the fuse module to support Sequentials; the fuse list can now be specified just like the state dict. Also adds support for Conv-ReLU and Linear-ReLU fusion, and supports both in-place and out-of-place fusion of models. ghstack-source-id: 91076386 Test Plan: buck test caffe2/test:quantization -- 'test_fusion_sequential_model_train \(test_quantization\.FusionTest\)' --print-passing-details buck test caffe2/test:quantization -- 'test_fusion_sequential_model_eval \(test_quantization\.FusionTest\)' --print-passing-details Differential Revision: D17466382 fbshipit-source-id: 0a548f8f4c366f3ecc59db693bac725ccd62328e
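One reason module fusion matters for quantization: a BatchNorm following a Conv or Linear can be folded into the preceding layer's weights and bias at inference time, leaving a single layer to quantize. A sketch of the folding math for one output channel (illustrative only; `fold_bn` is a hypothetical helper, not the fuse-module API):

```python
import math

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm parameters (gamma, beta, running mean/var) into
    the weights w (list of floats) and bias b of the preceding layer,
    for a single output channel:
        y = gamma * (w.x + b - mean) / sqrt(var + eps) + beta
          = (w * inv_std).x + ((b - mean) * inv_std + beta)
    """
    inv_std = gamma / math.sqrt(var + eps)
    w_folded = [wi * inv_std for wi in w]
    b_folded = (b - mean) * inv_std + beta
    return w_folded, b_folded
```

ReLU fusion is simpler still: the clamp is absorbed into the fused op's output requantization.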
Summary: Pull Request resolved: #26992 Run the same test for FBGEMM and QNNPACK backends. Checks that QNNPACK or FBGEMM are supported before running it (using supported_qengines) Test Plan: python test/test_quantized.py TestQuantizedLinear python test/test_quantized.py TestQuantizedConv python test/test_quantized_models.py python test/test_quantized_nn_mods.py Imported from OSS Differential Revision: D17689171 fbshipit-source-id: e11c0a5e41f5f4e6836a614a5b61e4db3c5e384b
Summary: Pull Request resolved: #27151 We need to be able to handle observers with no min/max data correctly, as models sometimes have modules that never receive any data. ghstack-source-id: 91113403 Test Plan: buck test caffe2/test:quantization -- test_minmax_observer buck test caffe2/test:quantization -- test_per_channel_minmax_observer buck test caffe2/test:quantization -- test_histogram_observer Reviewed By: csummersea Differential Revision: D17690828 fbshipit-source-id: e95709333ea0f66d79ddb8141b7cba5a83347dbd
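The failure mode being fixed is an observer asked for qparams before it has seen a single tensor. A minimal sketch of a min/max observer with that fallback (hypothetical class, assuming a default scale of 1.0 and zero point 0 for the no-data case, as a stand-in for whatever default the real fix chose):

```python
class MinMaxObserver:
    """Tracks the running min/max of observed values and derives affine
    quantization parameters from them."""
    def __init__(self, qmin=0, qmax=255):
        self.qmin, self.qmax = qmin, qmax
        self.min_val = None  # None means "never observed any data"
        self.max_val = None

    def observe(self, vals):
        lo, hi = min(vals), max(vals)
        self.min_val = lo if self.min_val is None else min(self.min_val, lo)
        self.max_val = hi if self.max_val is None else max(self.max_val, hi)

    def calculate_qparams(self):
        if self.min_val is None:
            return 1.0, 0  # module never received data: fall back to defaults
        lo, hi = min(self.min_val, 0.0), max(self.max_val, 0.0)
        scale = (hi - lo) / (self.qmax - self.qmin) or 1.0
        return scale, int(round(self.qmin - lo / scale))
```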
Summary: Pull Request resolved: #27193 Test Plan: Imported from OSS Differential Revision: D17704958 Pulled By: zafartahirov fbshipit-source-id: d8ab58b724cce2f5130b10ead0f10f5f32e26cfb
Summary: Pull Request resolved: #26692 Adding intra-op parallelism for qconv and qlinear. TODO: performance numbers. ghstack-source-id: 91135613 Test Plan: export OMP_NUM_THREADS=4 python test/test_quantized.py TestQuantizedConv.test_qconv python test/test_quantized.py TestQuantizedLinear.test_qlinear Differential Revision: D17540567 fbshipit-source-id: e9962bdf0c25fd3ac4bd0673eee1edd697924406
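Intra-op parallelism splits the work of a single operator (here, the output rows/channels of qlinear and qconv) across threads, as opposed to running different operators concurrently. A toy sketch of the idea for a matrix-vector product (Python threads stand in for the OpenMP worker pool the real kernels use):

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_matvec_rows(W, x, num_threads=4):
    """Sketch of intra-op parallelism: each output row's dot product is
    independent, so rows are distributed across a thread pool."""
    def row_dot(w_row):
        return sum(wi * xi for wi, xi in zip(w_row, x))
    with ThreadPoolExecutor(max_workers=num_threads) as ex:
        return list(ex.map(row_dot, W))
```

This is why the test plan sets OMP_NUM_THREADS: it controls how many workers the real operators fan out to.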
Summary: Pull Request resolved: #27194 Test Plan: Imported from OSS Differential Revision: D17704957 Pulled By: zafartahirov fbshipit-source-id: 46f02d129aa77c3047b2a6c606bfadd831a6b0fc
Summary: Pull Request resolved: #27181 Test Plan: Imported from OSS Differential Revision: D17717482 Pulled By: jamesr66a fbshipit-source-id: f3930fc87831cbdcf4390cd769c594bb13f5cd81
Summary: Pull Request resolved: #27184 Test Plan: Imported from OSS Differential Revision: D17717481 Pulled By: jamesr66a fbshipit-source-id: 4bd72bcd42191d9b21d03f5bb6698198dbffffda
Summary: Pull Request resolved: #27164 Test Plan: Imported from OSS Differential Revision: D17694475 Pulled By: zafartahirov fbshipit-source-id: df8df5f7d66062ed35da957064a31344e1d3c961
Summary: Pull Request resolved: #27298 PR #26908 toggles NonVariableTypeMode in the ATen dispatcher, which is where USE_STATIC_DISPATCH takes place. This causes an issue with numel(), as it gets called through the dispatch mode and probably does not get inlined. The thread-local state is also expensive to read/write this many times, which kills perf. PR #27274 is another approach to fixing this and has more details. Test Plan: Quantized MobileNetV2 perf before this change: Main run finished. Milliseconds per iter: 28.6782. Iters per second: 34.8696. Perf after this change: Main run finished. Milliseconds per iter: 22.2585. Iters per second: 44.9267. Imported from OSS Differential Revision: D17742565 fbshipit-source-id: 43c6045cc001c46916ba339555c9d809a2537eff
Summary: Pull Request resolved: #27183 Test Plan: Imported from OSS Differential Revision: D17700548 Pulled By: zafartahirov fbshipit-source-id: 18e6ffbda496b14ac1da1783f928ad539cdb1d16
Summary: Pull Request resolved: #27396 Observer that estimates moving averages of the min and max values per batch; better suited for quantization-aware training than min/max observers, which track extremal values across all batches. ghstack-source-id: 91369018 Test Plan: buck test caffe2/test:quantization -- 'test_per_tensor_observers \(test_quantization\.ObserverTest\)' --print-passing-details buck test caffe2/test:quantization -- 'test_per_channel_observers \(test_quantization\.ObserverTest\)' --print-passing-details Differential Revision: D17727213 fbshipit-source-id: 024a890bf3dd0bf269d8bfe61f19871d027326f0
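The difference from the plain min/max observer is that an early outlier batch no longer pins the quantization range forever: the tracked bounds decay toward recent batches. A sketch of the update rule (hypothetical class; the averaging constant is an assumed knob mirroring an exponential moving average):

```python
class MovingAverageMinMaxObserver:
    """Keeps exponential moving averages of per-batch min and max instead
    of global extrema: new_bound = old + c * (batch_bound - old)."""
    def __init__(self, averaging_constant=0.01):
        self.c = averaging_constant
        self.min_val = None
        self.max_val = None

    def observe(self, vals):
        lo, hi = min(vals), max(vals)
        if self.min_val is None:
            self.min_val, self.max_val = lo, hi  # first batch initializes
        else:
            self.min_val += self.c * (lo - self.min_val)
            self.max_val += self.c * (hi - self.max_val)
```

During QAT the activation distribution shifts as weights train, so a range that tracks recent batches yields tighter qparams than one stretched by stale extrema.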
Force-pushed from 82671eb to 328f499
No description provided.