quant bench: update observer configs #42956
Conversation
Summary: In preparation for observer perf improvement, cleans up the micro benchmarks:

* disable CUDA for histogram observers (it's too slow)
* add larger shapes for better representation of real workloads

Test Plan:

```
cd benchmarks/operator_benchmark
python -m pt.qobserver_test
```

[ghstack-poisoned]
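The two cleanups above amount to pruning the benchmark's config cross product. A minimal stdlib-only sketch of the idea (this is hypothetical illustration code, not the actual `operator_benchmark` API — the observer names, shapes, and device list are assumptions for the example):

```python
# Sketch: build benchmark configs as a cross product of observers, shapes,
# and devices, skipping CUDA configs for histogram observers (too slow to
# benchmark) and including a larger, workload-like shape.
from itertools import product

OBSERVERS = ["MinMaxObserver", "HistogramObserver"]
SHAPES = [(3, 4, 5), (32, 3, 256, 256)]  # small shape + larger workload-like shape
DEVICES = ["cpu", "cuda"]

def build_configs():
    configs = []
    for obs, shape, device in product(OBSERVERS, SHAPES, DEVICES):
        if obs == "HistogramObserver" and device == "cuda":
            continue  # disable CUDA for histogram observers
        configs.append({"observer": obs, "shape": shape, "device": device})
    return configs

configs = build_configs()
```

With two observers, two shapes, and two devices, the filter drops the two CUDA histogram configs, leaving six.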
💊 CI failures summary: As of commit d96e549 (more details on the Dr. CI page): 💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI.
Summary: In preparation for observer perf improvement, cleans up the micro benchmarks:

* disable CUDA for histogram observers (it's too slow)
* add larger shapes for better representation of real workloads

Test Plan:

```
cd benchmarks/operator_benchmark
python -m pt.qobserver_test
```

Differential Revision: [D23093996](https://our.internmc.facebook.com/intern/diff/D23093996)

[ghstack-poisoned]
Summary: In preparation for observer perf improvement, cleans up the micro benchmarks:

* disable CUDA for histogram observers (it's too slow)
* add larger shapes for better representation of real workloads

Test Plan:

```
cd benchmarks/operator_benchmark
python -m pt.qobserver_test
```

ghstack-source-id: 6047570
Pull Request resolved: #42956
```diff
 def forward(self):
-    return self.op_func(self.f_input)
+    self.op_func(self.f_input)
```
Previously we had separate forward and qparam benchmarks, which might be more useful in practice: we call forward for multiple iterations and calculate_qparams once at convert. With the separate benchmarks, we can also synthesize the time taken for the combined forward + calculate_qparams call. Is there a reason to prefer this way of profiling?
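The reviewer's point about synthesizing a combined time from separate measurements can be sketched with a stdlib-only toy (this is illustration code with a made-up `ToyObserver`, not the real `pt.qobserver_test` harness):

```python
# Time forward and calculate_qparams separately, then synthesize the
# combined forward + calculate_qparams cost by summing the per-call times.
import random
import timeit

random.seed(0)
data = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

class ToyObserver:
    """Toy min/max observer standing in for a real quantization observer."""

    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def forward(self, x):
        # Update running min/max statistics over the input.
        self.min_val = min(self.min_val, min(x))
        self.max_val = max(self.max_val, max(x))

    def calculate_qparams(self):
        # Derive a scale/zero_point pair from the observed range (8-bit).
        scale = (self.max_val - self.min_val) / 255.0
        zero_point = round(-self.min_val / scale) if scale else 0
        return scale, zero_point

obs = ToyObserver()
n = 100
t_forward = timeit.timeit(lambda: obs.forward(data), number=n) / n
t_qparams = timeit.timeit(obs.calculate_qparams, number=n) / n
t_combined = t_forward + t_qparams  # synthesized forward + calculate_qparams time
```

Measuring the two calls separately preserves the ability to model both usage patterns: many forwards with one calculate_qparams at convert, or forward + calculate_qparams on every pass.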
This is making the benchmark represent what happens inside the observer during QAT. I'm not keeping the old code around because I'm not aware of a need for it in the near future. We have separate benchmarks for histogram observers, and I'm not aware of any requests to optimize observers outside of QAT + histogram observers.
calculate_qparams is called on every pass through the observer during QAT, when observers are enabled.
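That behavior can be illustrated with a simplified toy (hypothetical code, not PyTorch's actual observer implementation): during QAT, fake quantization needs fresh qparams each step, so every forward both updates statistics and recomputes scale/zero_point.

```python
# Toy QAT-style observer: forward() updates running min/max AND calls
# calculate_qparams() on every pass, which is what the benchmark models.
class FakeQuantObserver:
    def __init__(self):
        self.min_val = 0.0
        self.max_val = 0.0
        self.qparams_calls = 0  # track how often qparams are recomputed

    def calculate_qparams(self):
        self.qparams_calls += 1
        scale = max(self.max_val - self.min_val, 1e-8) / 255.0
        zero_point = round(-self.min_val / scale)
        return scale, zero_point

    def forward(self, x):
        self.min_val = min(self.min_val, min(x))
        self.max_val = max(self.max_val, max(x))
        # QAT fake-quantization needs up-to-date qparams on every pass:
        scale, zero_point = self.calculate_qparams()
        return [round(v / scale) * scale for v in x]

obs = FakeQuantObserver()
for _ in range(5):
    obs.forward([-1.0, 0.5, 2.0])
assert obs.qparams_calls == 5  # one calculate_qparams per forward pass
```

This is why benchmarking forward with calculate_qparams folded in represents the QAT hot path, as opposed to post-training flows where qparams are computed once at convert.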
This pull request has been merged in 5aa61af.
Summary:
Pull Request resolved: pytorch#42956

In preparation for observer perf improvement, cleans up the micro benchmarks:

* disable CUDA for histogram observers (it's too slow)
* add larger shapes for better representation of real workloads

Test Plan:

```
cd benchmarks/operator_benchmark
python -m pt.qobserver_test
```

Imported from OSS

Reviewed By: supriyar

Differential Revision: D23093996

fbshipit-source-id: 5dc477c9bd5490d79d85ff8537270cd25aca221a
Stack from ghstack:

Summary:
In preparation for observer perf improvement, cleans up the micro benchmarks:

* disable CUDA for histogram observers (it's too slow)
* add larger shapes for better representation of real workloads

Test Plan:

```
cd benchmarks/operator_benchmark
python -m pt.qobserver_test
```

Differential Revision: D23093996