[v1.3.0] add AutoNonVariableTypeMode for USE_STATIC_DISPATCH on JIT->ATen path (#27274) by ljk53 · Pull Request #27321 · pytorch/pytorch

ljk53 · 2019-10-03T21:00:40Z

Summary:
Pull Request resolved: #27274

This is yet another fix to address #26764.

PR #26908 toggles NonVariableTypeMode in the ATen dispatcher, which is where
USE_STATIC_DISPATCH takes effect, so it is the most logically sound place for
such tweaks.

However, we observed a nontrivial perf regression due to that fix. It turns out
the numel() tensor method gets called in several for-loops and thus incurs ~7M
thread_local updates in a single forward call:

7173330 numel
    558 size
    416 q_scale
    302 _empty_affine_quantized
    288 contiguous
    257 q_zero_point
    216 qscheme
    173 empty
    110 set_
    105 as_strided
    104 permute
...
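
To make the cost pattern above concrete, here is a minimal sketch of why a guard inside a cheap accessor adds up. It is not the ATen code: `ModeGuard`, `FakeTensor`, `numel_guarded`, `numel_plain` and the `flag_writes` counter are all hypothetical stand-ins, and the only assumption is that the guard is an RAII object that sets a thread_local flag on entry and restores it on exit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for illustration only; none of this is ATen API.
namespace guard_cost_sketch {

thread_local bool non_variable_mode = false;  // the flag the guard toggles
thread_local std::uint64_t flag_writes = 0;   // instrumentation: counts flag updates

// RAII guard: one flag write on entry, one on exit.
struct ModeGuard {
  bool prev;
  ModeGuard() : prev(non_variable_mode) {
    non_variable_mode = true;
    ++flag_writes;
  }
  ~ModeGuard() {
    assert(non_variable_mode);  // guards are strictly nested, so the flag is still set
    non_variable_mode = prev;
    ++flag_writes;
  }
  ModeGuard(const ModeGuard&) = delete;
  ModeGuard& operator=(const ModeGuard&) = delete;
};

struct FakeTensor {
  std::vector<float> data;
  // Guard placed inside the method: every call pays two flag writes.
  std::size_t numel_guarded() const {
    ModeGuard guard;
    return data.size();
  }
  // Same accessor without a guard.
  std::size_t numel_plain() const { return data.size(); }
};

// Sums the elements, calling the guarded accessor in the loop condition.
// Returns the number of flag writes incurred.
std::uint64_t sum_with_inner_guard(const FakeTensor& t, float& out) {
  const std::uint64_t before = flag_writes;
  out = 0.0f;
  for (std::size_t i = 0; i < t.numel_guarded(); ++i) out += t.data[i];
  return flag_writes - before;
}

// Same loop, but with a single guard around the whole operation.
std::uint64_t sum_with_outer_guard(const FakeTensor& t, float& out) {
  const std::uint64_t before = flag_writes;
  {
    ModeGuard guard;
    out = 0.0f;
    for (std::size_t i = 0; i < t.numel_plain(); ++i) out += t.data[i];
  }
  return flag_writes - before;
}

}  // namespace guard_cost_sketch
```

With N elements the loop condition runs N + 1 times, so the inner-guard variant performs 2 * (N + 1) flag writes while the outer-guard variant performs 2, regardless of N.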

As numel() is not called from a single place, a natural workaround is to
update function_wrapper.py so that it only adds the guard in the gen_namespace_function()
case and skips the gen_tensor_method() case. But some tensor methods are actually
called from the JIT side directly (e.g. "aten::eq_" -> "(self).eq_"), so the
only "band aid" left on the table is to insert the guard on the JIT->ATen path, as was
originally done in #26868. This is a simplified version of that change, since it doesn't
hurt to extend the NonVariableTypeMode scope a little to also cover the stack drop/pack calls.
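
As a rough illustration of guarding at the JIT->ATen boundary, the sketch below wraps a whole stack-based operator in one guard, so that the unpack, the kernel call and the pack all run with the flag set. Everything here is hypothetical (`ModeGuard`, `Stack` as a vector of ints, `eq_kernel`, `make_eq_op`, the `kernel_calls_outside_mode` counter); it models the shape of the approach, not the generated register_aten_ops code.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-ins for illustration only; none of this is the JIT or ATen API.
namespace jit_path_sketch {

thread_local bool non_variable_mode = false;
thread_local int kernel_calls_outside_mode = 0;  // instrumentation only

// RAII guard that sets the flag for its lifetime and restores the old value.
struct ModeGuard {
  bool prev;
  ModeGuard() : prev(non_variable_mode) { non_variable_mode = true; }
  ~ModeGuard() { non_variable_mode = prev; }
  ModeGuard(const ModeGuard&) = delete;
  ModeGuard& operator=(const ModeGuard&) = delete;
};

using Stack = std::vector<int>;                 // stand-in for the interpreter stack
using Operation = std::function<void(Stack&)>;  // stand-in for a registered operator

// Stand-in for a tensor method invoked directly by an operator,
// in the spirit of "aten::eq_" -> "(self).eq_". It has no guard of its own.
int eq_kernel(int self, int other) {
  if (!non_variable_mode) ++kernel_calls_outside_mode;
  return self == other ? 1 : 0;
}

// Builds an operator whose entire body, including the stack pops and the
// push of the result, runs under a single guard.
Operation make_eq_op() {
  return [](Stack& stack) {
    ModeGuard guard;  // one guard per operator invocation
    assert(stack.size() >= 2);
    const int other = stack.back();
    stack.pop_back();
    const int self = stack.back();
    stack.pop_back();
    stack.push_back(eq_kernel(self, other));
  };
}

}  // namespace jit_path_sketch
```

Calling the operator through the wrapper never reaches `eq_kernel` without the flag set, whereas calling `eq_kernel` directly does. That second case corresponds to the caveat about tensor methods being called outside the JIT API.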

On Android we only expose the JIT API, so we don't need to worry about TensorMethods being
called directly. On iOS we don't provide a wrapper yet, but we can mention this caveat
in the docs. Hopefully, by the time it's widely used, we can finish the Variable/Tensor
unification and remove all these hacks.

Test Plan:

- Verified it runs quantized/fp32 MobileNetV2 models;
- Verified it fixes the perf regression (revert #26908 separately);

Differential Revision: D17732489

Pulled By: ljk53

fbshipit-source-id: c14ca66aebc6b6f17ad6efac7ca47f9487c98de5


pytorchbot added the oncall: jit label Oct 3, 2019

ljk53 requested a review from soumith October 3, 2019 21:00

ljk53 mentioned this pull request Oct 3, 2019

[v1.3.0] Release Tracker #27011

Closed

soumith force-pushed the v13-p2 branch from 228375b to beb9dce Compare October 4, 2019 05:17

soumith merged commit ccf3a6d into pytorch:v1.3.0 Oct 7, 2019
