[quant] Implement PTQ for APoT FakeQuant #81040
asl3 wants to merge 16 commits into gh/asl3/40/base
Conversation
✅ No Failures (0 Pending) as of commit 0530d26 (more details on the Dr. CI page).
💚 Looks good so far! There are no failures yet. 💚
This comment was automatically generated by Dr. CI.
### Summary:
This PR implements FX Graph Mode QAT for APoT FakeQuant.

### Test Plan:
Run models with: `python test/quantization/core/experimental/fx_graph_mode_apot.py`

Accuracy Stats:

Model #1: uniform activation, uniform weight (FX Graph Mode quantized)
Size of model (MB): 46.801265
Evaluation accuracy on test dataset: 69.76%, 89.08%

Model #2: uniform activation, APoT weight (FX Graph Mode quantized)
Size of model (MB): 46.820369
Evaluation accuracy on test dataset: 69.00%, 88.66%

Model #3: APoT activation and weight (FX Graph Mode quantized)
Size of model (MB): 46.801431
Evaluation accuracy on test dataset: 69.76%, 89.08%

Eager mode quantized model Resnet18
Size of model (MB): 11.839989
Evaluation accuracy on test dataset: 69.49%, 88.90%
### Summary:
This PR implements FX Graph Mode QAT for APoT FakeQuant.

### Test Plan:
Run models with: `python test/quantization/core/experimental/fx_graph_mode_apot.py`

### Accuracy Stats:
Uniform: int8
APoT: 8-bit (b = 8, k = 2)

**Model #1:** Uniform activation, uniform weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 69.54%, 88.99%

**Model #2:** Uniform activation, APoT weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 69.76%, 89.08%

**Model #3:** APoT activation and weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 69.54%, 89.04%

**Model #4:** Eager mode quantized model Resnet18
Evaluation accuracy on test dataset: 69.49%, 88.90%
this PR is for PTQ and not QAT?
### Summary:
This PR implements PTQ for APoT FakeQuant.

### Test Plan:
Run models with: `python test/quantization/core/experimental/fx_graph_mode_apot.py`

### Accuracy Stats:
8-bit (Uniform int8, APoT b = 8 k = 2)

**Model #1:** Uniform activation, uniform weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 64.43%, 85.62%

**Model #2:** Uniform activation, APoT weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 64.51%, 85.78%

**Model #3:** APoT activation, APoT weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 64.32%, 85.78%

4-bit (Uniform int4, APoT b = 4 k = 2)

**Model #1:** Uniform activation, uniform weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 45.63%, 71.96%

**Model #2:** Uniform activation, APoT weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 64.24%, 85.56%

**Model #3:** APoT activation, APoT weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 45.40%, 76.21%

**Full Precision model (FX Graph Mode quantized)**
Evaluation accuracy on test dataset: 69.76%, 89.08%

**Eager mode quantized model**
Evaluation accuracy on test dataset: 69.49%, 88.90%
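A note on the APoT parameters above: b is the total bit-width and k the base bit-width, so each quantization level is a sum of n = b/k terms, each of which is zero or a power of two. The sketch below is a hypothetical enumeration of the levels, following the construction in Li et al., "Additive Powers-of-Two Quantization" (ICLR 2020); the exponent assignment in PyTorch's experimental implementation may differ.

```python
from itertools import product

def apot_levels(b: int, k: int):
    """Enumerate 2**b unnormalized APoT levels: each level is the sum of
    n = b // k additive terms, where term i is either 0 or a power of two
    drawn from a disjoint set of exponents (those congruent to i mod n)."""
    n = b // k
    choices = []
    for i in range(n):
        exponents = [i + j * n for j in range(2**k - 1)]
        choices.append([0.0] + [2.0 ** -(e + 1) for e in exponents])
    # distinct exponent sets per term make every sum unique
    return sorted({sum(combo) for combo in product(*choices)})

# e.g. apot_levels(4, 2) -> 16 levels, denser near zero than uniform int4
```

The resulting grid is nonuniform and denser near zero, which is the usual motivation for APoT weight quantization at low bit-widths.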
@pytorchbot merge -g
@pytorchbot successfully started a merge job. Check the current status here
Merge failed due to Refusing to merge as mandatory check(s) pull failed for rule superuser
@pytorchbot merge -g
@pytorchbot successfully started a merge job. Check the current status here
Merge failed due to Command … Raised by https://github.com/pytorch/pytorch/actions/runs/2750364309
@pytorchbot merge -g
@pytorchbot successfully started a merge job. Check the current status here
@pytorchbot merge -g
@pytorchbot successfully started a merge job. Check the current status here
Merged. Pull Request resolved: #81040
Approved by: https://github.com/jerryzh168
Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/13ad4739a6e9402e2039a1ce521b9aed595760b3
Reviewed By: osalpekar
Differential Revision: D38252390
Pulled By: asl3
fbshipit-source-id: 86ff2f3928fb1fc2b57867d6abcac998d17306e4
Summary:
This PR implements PTQ for APoT FakeQuant. It runs a pre-trained ResNet-18 model on the ImageNet dataset to compare accuracy metrics across qconfig settings that combine uniform vs. APoT quantized activations and weights.
According to the collected accuracy stats, model #2 (uniform activation, APoT weight) shows a slight accuracy improvement over model #1 (uniform activation, uniform weight) at 8-bit, and a significant improvement at 4-bit (see "Accuracy Stats" section below).
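A minimal sketch of the PTQ flow being compared (FX Graph Mode prepare/calibrate/convert), using the current QConfigMapping API rather than the PR-era qconfig_dict. The APoT qconfig objects passed in as `qconfig` are assumptions based on this stack's torch/ao/quantization/experimental module; the test script's actual helper names may differ.

```python
import copy
import torch
from torch.ao.quantization import QConfigMapping, get_default_qconfig
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

def quantize_ptq(float_model, qconfig, calibration_loader, num_batches=32):
    """Post-training quantization: prepare, calibrate on real data, convert."""
    qconfig_mapping = QConfigMapping().set_global(qconfig)  # one global qconfig
    example_inputs = (torch.randn(1, 3, 224, 224),)         # ImageNet-shaped input
    prepared = prepare_fx(copy.deepcopy(float_model).eval(),
                          qconfig_mapping, example_inputs)
    with torch.no_grad():  # PTQ only observes activations; no training
        for i, (images, _) in enumerate(calibration_loader):
            if i >= num_batches:
                break
            prepared(images)  # run calibration batches through the observers
    return convert_fx(prepared)

# Model #1 uses a stock uniform qconfig; models #2/#3 would swap in the
# (hypothetical) APoT qconfigs, e.g.:
#   m1 = quantize_ptq(resnet18, get_default_qconfig("fbgemm"), loader)
#   m2 = quantize_ptq(resnet18, apot_weights_qconfig_8bit, loader)
```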
Test Plan:
Run models with:
`python test/quantization/core/experimental/fx_graph_mode_apot.py`
Accuracy Stats:
8-bit (Uniform int8, APoT b = 8 k = 2)
Model #1: Uniform activation, uniform weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 64.43% (Top-1), 85.62% (Top-5)
Model #2: Uniform activation, APoT weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 64.51% (Top-1), 85.78% (Top-5)
Model #3: APoT activation, APoT weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 64.32% (Top-1), 85.78% (Top-5)
4-bit (Uniform int4, APoT b = 4 k = 2)
Model #1: Uniform activation, uniform weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 45.63% (Top-1), 71.96% (Top-5)
Model #2: Uniform activation, APoT weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 64.24% (Top-1), 85.56% (Top-5)
Model #3: APoT activation, APoT weight (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 45.40% (Top-1), 76.21% (Top-5)
Full Precision model (FX Graph Mode quantized)
Evaluation accuracy on test dataset: 69.76% (Top-1), 89.08% (Top-5)
Eager mode quantized model
Evaluation accuracy on test dataset: 69.49% (Top-1), 88.90% (Top-5)
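For reference, a sketch of the standard Top-1/Top-5 ImageNet evaluation loop behind the accuracy numbers above (not the exact code in fx_graph_mode_apot.py):

```python
import torch

def evaluate(model, data_loader):
    top1 = top5 = total = 0
    model.eval()
    with torch.no_grad():
        for images, targets in data_loader:
            logits = model(images)
            _, pred = logits.topk(5, dim=1)        # indices of the 5 best classes
            hits = pred.eq(targets.unsqueeze(1))   # [batch, 5] boolean matches
            top1 += hits[:, 0].sum().item()        # best guess is correct
            top5 += hits.any(dim=1).sum().item()   # target among the top 5
            total += targets.size(0)
    return 100.0 * top1 / total, 100.0 * top5 / total
```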