Add benchmark for calculate_qparams #42138

Closed
durumu wants to merge 1 commit into gh/durumu/12/base from gh/durumu/12/head
durumu commented Jul 27, 2020

Adds a benchmark for `HistogramObserver.calculate_qparams` to the quantized op benchmarks. The next PR in this stack (#41041) adds a ~15x speedup for this benchmark.

While in the folder `benchmarks/operator_benchmark`, the benchmark can be run using `python -m benchmark_all_quantized_test --operators HistogramObserverCalculateQparams`.
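For orientation, the call being timed can also be exercised outside the benchmark harness. The following is a minimal sketch, not the benchmark added by this PR. It assumes a recent PyTorch where `HistogramObserver` is importable from `torch.ao.quantization` (older releases expose it as `torch.quantization.HistogramObserver`), assumes a random float input of shape (3, 512, 512) mirroring the C, M, N values in the benchmark name, and uses a hypothetical helper named `time_calculate_qparams`.

```python
import time

import torch
from torch.ao.quantization import HistogramObserver


def time_calculate_qparams(qscheme, runs=3, shape=(3, 512, 512)):
    """Return (mean microseconds per call, scale, zero_point).

    Hypothetical helper for illustration; shape and run count are assumptions.
    """
    observer = HistogramObserver(dtype=torch.quint8, qscheme=qscheme)
    # The observer has to see data before qparams can be computed.
    observer(torch.rand(*shape))

    start = time.perf_counter()
    for _ in range(runs):
        scale, zero_point = observer.calculate_qparams()
    elapsed = time.perf_counter() - start
    return elapsed / runs * 1e6, scale, zero_point


if __name__ == "__main__":
    for qscheme in (torch.per_tensor_affine, torch.per_tensor_symmetric):
        mean_us, scale, zero_point = time_calculate_qparams(qscheme)
        print(f"{qscheme}: {mean_us:.1f} us, "
              f"scale={scale.item():.6f}, zero_point={zero_point.item()}")
```

Timings from this sketch are not comparable to the operator benchmark numbers, since the harness handles warm-up and iteration counts itself.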

Stack from ghstack:

Differential Revision: D22779291

@facebook-github-bot

@durumu merged this pull request in 5ca08b8.

facebook-github-bot deleted the gh/durumu/12/head branch August 10, 2020 14:15
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Adds a benchmark for `HistogramObserver.calculate_qparams` to the quantized op benchmarks. The next diff in this stack adds a ~15x speedup for this benchmark.

Pull Request resolved: pytorch#42138

Test Plan:
While in the folder `benchmarks/operator_benchmark`, the benchmark can be run using `python -m benchmark_all_quantized_test --operators HistogramObserverCalculateQparams`.

Benchmark results before speedup:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: HistogramObserverCalculateQparams
# Mode: Eager
# Name: HistogramObserverCalculateQparams_C3_M512_N512_dtypetorch.quint8_cpu_qschemetorch.per_tensor_affine
# Input: C: 3, M: 512, N: 512, dtype: torch.quint8, device: cpu, qscheme: torch.per_tensor_affine
Forward Execution Time (us) : 185818.566

# Benchmarking PyTorch: HistogramObserverCalculateQparams
# Mode: Eager
# Name: HistogramObserverCalculateQparams_C3_M512_N512_dtypetorch.quint8_cpu_qschemetorch.per_tensor_symmetric
# Input: C: 3, M: 512, N: 512, dtype: torch.quint8, device: cpu, qscheme: torch.per_tensor_symmetric
Forward Execution Time (us) : 165325.916
```

Benchmark results after speedup:
```
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short

# Benchmarking PyTorch: HistogramObserverCalculateQparams
# Mode: Eager
# Name: HistogramObserverCalculateQparams_C3_M512_N512_dtypetorch.quint8_cpu_qschemetorch.per_tensor_affine
# Input: C: 3, M: 512, N: 512, dtype: torch.quint8, device: cpu, qscheme: torch.per_tensor_affine
Forward Execution Time (us) : 12242.241

# Benchmarking PyTorch: HistogramObserverCalculateQparams
# Mode: Eager
# Name: HistogramObserverCalculateQparams_C3_M512_N512_dtypetorch.quint8_cpu_qschemetorch.per_tensor_symmetric
# Input: C: 3, M: 512, N: 512, dtype: torch.quint8, device: cpu, qscheme: torch.per_tensor_symmetric
Forward Execution Time (us) : 12655.354
```
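
The ~15x figure can be checked against the two result listings above. A small sketch that divides the before and after timings, with the values copied from the output and illustrative variable names:

```python
# Forward execution times in microseconds, copied from the benchmark output.
before_us = {"per_tensor_affine": 185818.566, "per_tensor_symmetric": 165325.916}
after_us = {"per_tensor_affine": 12242.241, "per_tensor_symmetric": 12655.354}

# Speedup factor per quantization scheme: time before divided by time after.
speedup = {name: before_us[name] / after_us[name] for name in before_us}

for name, factor in speedup.items():
    print(f"{name}: {factor:.1f}x faster")
```

This works out to roughly 15.2x for the affine scheme and 13.1x for the symmetric one.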

Reviewed By: supriyar

Differential Revision: D22779291

Pulled By: durumu

fbshipit-source-id: 1fe17d20eda5dd99e0e2590480142034c3574d4e