Replace empty_affine_quantizer with direct dispatch to at::native::empty_affine.. .#36814

Closed
kimishpatel wants to merge 9 commits into gh/kimishpatel/3/base from gh/kimishpatel/3/head

Conversation

@kimishpatel
Contributor

@kimishpatel kimishpatel commented Apr 17, 2020

Stack from ghstack:

From the flamegraph, it seems we spend about 40% of the time going through the dispatch stack. In a quantized model, where compute can take less time, such overheads become noticeable.

Differential Revision: D21093840

@dr-ci

dr-ci Bot commented Apr 17, 2020

💊 Build failures summary and remediations

As of commit 729b708 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@dreiss
Contributor

dreiss commented Apr 17, 2020

Can you update the commit message to explain the motivation for this change?

@kimishpatel
Contributor Author

> Can you update the commit message to explain the motivation for this change?

Updated.

@kimishpatel kimishpatel requested review from jerryzh168 and supriyar and removed request for supriyar April 23, 2020 16:48
  const auto b_scale = qb_contig.q_scale();

- Tensor qy = at::_empty_affine_quantized(
+ Tensor qy = at::new_qtensor_cpu(
Contributor

Could you use at::native::empty_affine_quantized?

Contributor Author

Why do you think that would be better? Does it get around the dispatch overhead?

Contributor

It is better because it has the same API as at::_empty_affine_quantized. And yes, it will get around the dispatch overhead.

Contributor Author

Sure. Let me try.

Contributor Author

This is still slower, but not a whole lot. So I think it can make for a decent compromise.

#include <ATen/native/quantized/cpu/quantized_ops.h>
#include <ATen/native/quantized/cpu/init_qnnpack.h>
#include <ATen/native/quantized/cpu/qnnpack_utils.h>
#include <c10/core/TensorOptions.h>
Contributor

jerryzh168 commented Apr 27, 2020

can these includes be removed?

Comment on lines +10 to +13
#include <ATen/quantized/Quantizer.h>
#include <ATen/native/quantized/cpu/fbgemm_utils.h>
#include <ATen/native/quantized/cpu/qnnpack_utils.h>
#include <c10/core/TensorOptions.h>
Contributor

can these includes be removed?

Contributor

I mean the added ones

Contributor Author

Yes, I was gonna do that.

Contributor

jerryzh168 left a comment

Looks good

@kimishpatel kimishpatel changed the title Replace empty_affine_quantizer with new_qtensor_cpu. Replace empty_affine_quantizer with direct dispatch to at::native::empty_affine.. . Apr 28, 2020
@facebook-github-bot
Contributor

This pull request has been merged in 1510bdd.

@facebook-github-bot facebook-github-bot deleted the gh/kimishpatel/3/head branch May 5, 2020 14:17
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Pull Request resolved: pytorch#36814

ghstack-source-id: 103218412

From the flamegraph, it seems we spend about 40% of the time going through the dispatch stack. In a quantized model, where compute can take less time, such overheads become noticeable.

{F234432545}

Test Plan: Quantized op tests.

Reviewed By: jerryzh168

Differential Revision: D21093840

fbshipit-source-id: 1b98b57eae403353596fc31171069d2f43b13385