[pytorch][mobile] fixed AutoGradMode/AutoNonVariableTypeMode uses for mobile callsites by ljk53 · Pull Request #34958 · pytorch/pytorch

ljk53 · 2020-03-18T16:25:28Z

Stack from ghstack:

* #34958 [pytorch][mobile] fixed AutoGradMode/AutoNonVariableTypeMode uses for mobile callsites

There are three guards related to the mobile build:

* AutoGradMode
* AutoNonVariableTypeMode
* GraphOptimizerEnabledGuard

Today we need to set some of these guards before calling libtorch APIs, because the mobile build is customized to support inference only (for both OSS and most FB use cases) in order to reduce binary size.
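For readers unfamiliar with the guard pattern, here is a minimal sketch of a scoped (RAII) guard over a thread-local mode flag. It only illustrates the general technique of "set a mode before the call, restore it afterwards". The names `guard_sketch`, `grad_enabled()`, and `GradModeGuard` are hypothetical stand-ins, not libtorch code.

```cpp
#include <cassert>

namespace guard_sketch {

// Hypothetical thread-local mode flag (a stand-in, not libtorch's storage).
inline bool& grad_enabled() {
  thread_local bool enabled = true;
  return enabled;
}

// RAII guard: sets the flag on construction and restores the previous
// value on destruction, so the change is limited to the enclosing scope.
class GradModeGuard {
 public:
  explicit GradModeGuard(bool enabled) : prev_(grad_enabled()) {
    grad_enabled() = enabled;
  }
  ~GradModeGuard() { grad_enabled() = prev_; }
  GradModeGuard(const GradModeGuard&) = delete;
  GradModeGuard& operator=(const GradModeGuard&) = delete;

 private:
  bool prev_;
};

// Stand-in for a call such as forward(); reports the mode it observed.
inline bool mode_seen_by_call() { return grad_enabled(); }

// Small demonstration: the mode is changed only inside the guarded scope.
inline void run_example() {
  assert(mode_seen_by_call());     // default: enabled
  {
    GradModeGuard no_grad(false);  // guard set before the call
    assert(!mode_seen_by_call());  // call runs with the mode disabled
  }
  assert(mode_seen_by_call());     // previous value restored on scope exit
}

}  // namespace guard_sketch
```

Calling `guard_sketch::run_example()` from `main()` exercises the sketch. Because the previous value is saved per guard, nested guards unwind in reverse order.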

Several changes have been made since the 1.3 release, so these guards are already used inconsistently across the codebase. I swept all mobile-related model loading and forward() call sites to unify how the guards are used:

Full JIT: still sets all three guards. More specifically:

* OSS: Fixed a bug in Android JNI where the guard was not set correctly at model load time.
* FB: Not covered by this diff (we use the mobile interpreter for most internal builds).

Lite JIT (mobile interpreter): only needs the AutoNonVariableTypeMode guard. AutoGradMode doesn't seem to be relevant (so it was removed from a few places), and GraphOptimizerEnabledGuard is definitely not relevant (only full JIT has a graph optimizer). More specifically:

* OSS: At this point we are not committed to supporting Lite JIT. For Android, it shares the same code as the FB JNI callsites.
* FB:
** JNI callsites: Use the unified LiteJITCallGuard.
** iOS/C++: Manually set AutoNonVariableTypeMode for _load_for_mobile() and forward() callsites.
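The split between the two call paths can be sketched as two composite guard structs: one bundling all three modes (full JIT) and one bundling only the non-variable-type mode (lite JIT). This is an illustrative sketch under assumptions. `Modes`, `FlagGuard`, `FullJitCallGuardSketch`, and `LiteCallGuardSketch` are hypothetical names, and the actual definition of LiteJITCallGuard is not shown in this description.

```cpp
#include <cassert>

namespace call_guard_sketch {

// Hypothetical thread-local flags standing in for the three modes.
struct Modes {
  bool grad_enabled = true;
  bool non_variable_type = false;
  bool graph_optimizer_enabled = true;
};

inline Modes& modes() {
  thread_local Modes m;
  return m;
}

// Generic RAII guard over a single bool flag.
class FlagGuard {
 public:
  FlagGuard(bool& flag, bool value) : flag_(flag), prev_(flag) { flag_ = value; }
  ~FlagGuard() { flag_ = prev_; }
  FlagGuard(const FlagGuard&) = delete;
  FlagGuard& operator=(const FlagGuard&) = delete;

 private:
  bool& flag_;
  bool prev_;
};

// Full JIT path: all three modes are set for the duration of the call.
struct FullJitCallGuardSketch {
  FlagGuard no_grad{modes().grad_enabled, false};
  FlagGuard non_var{modes().non_variable_type, true};
  FlagGuard no_optimizer{modes().graph_optimizer_enabled, false};
};

// Lite JIT path: only the non-variable-type mode is set.
struct LiteCallGuardSketch {
  FlagGuard non_var{modes().non_variable_type, true};
};

inline void run_example() {
  {
    LiteCallGuardSketch guard;  // e.g. around a load or forward() call
    assert(modes().non_variable_type);
    assert(modes().grad_enabled);             // untouched by the lite guard
    assert(modes().graph_optimizer_enabled);  // untouched by the lite guard
  }
  {
    FullJitCallGuardSketch guard;
    assert(!modes().grad_enabled);
    assert(modes().non_variable_type);
    assert(!modes().graph_optimizer_enabled);
  }
  // Every flag is back to its default once the guards go out of scope.
  assert(modes().grad_enabled && !modes().non_variable_type &&
         modes().graph_optimizer_enabled);
}

}  // namespace call_guard_sketch
```

Bundling the guards in one struct means each callsite declares a single local variable, which is one way to keep load and forward() callsites consistent.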

Ideally we should not have to set AutoNonVariableTypeMode for the mobile interpreter. It is currently needed for dynamic dispatch in the inference-only mobile build (where variable kernels are not registered): without the guard, it tries to run variable_fallback_kernel and crashes (PR #34038). The proper fix will take some time, so this workaround is used to unblock the selective BUCK build, which depends on dynamic dispatch.

PS. The current state (having to set AutoNonVariableTypeMode) should not block running an FL model with the mobile interpreter: if all necessary variable kernels are registered, _load_for_mobile()/forward() can be called against the FL model without setting the AutoNonVariableTypeMode guard. It is still inconvenient for Java callsites, because the guard is set unconditionally inside the JNI methods.
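The interaction described in the last two paragraphs can be illustrated with a toy model. This is not the real dispatcher. `ToyDispatcher`, `SkipVariableGuard`, and the `relu` kernel are hypothetical, and the thrown exception merely plays the role of the crash in the fallback kernel. The sketch shows three cases: an inference-only build without the guard fails, the same build with the guard succeeds, and a build with variable kernels registered succeeds without the guard.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

namespace dispatch_sketch {

// Toy thread-local switch: when true, dispatch skips the variable layer.
inline bool& skip_variable() {
  thread_local bool skip = false;
  return skip;
}

// RAII guard that enables the switch for the current scope.
class SkipVariableGuard {
 public:
  SkipVariableGuard() : prev_(skip_variable()) { skip_variable() = true; }
  ~SkipVariableGuard() { skip_variable() = prev_; }
  SkipVariableGuard(const SkipVariableGuard&) = delete;
  SkipVariableGuard& operator=(const SkipVariableGuard&) = delete;

 private:
  bool prev_;
};

using Kernel = std::function<int(int)>;

struct ToyDispatcher {
  std::map<std::string, Kernel> cpu_kernels;
  std::map<std::string, Kernel> variable_kernels;  // empty if inference-only

  int call(const std::string& op, int x) const {
    if (!skip_variable()) {
      auto it = variable_kernels.find(op);
      if (it == variable_kernels.end()) {
        // Plays the role of a fallback that cannot proceed.
        throw std::runtime_error("no variable kernel registered for " + op);
      }
      return it->second(x);
    }
    return cpu_kernels.at(op)(x);
  }
};

// Helper: reports whether a call fails in the toy fallback path.
inline bool call_throws(const ToyDispatcher& d, const std::string& op, int x) {
  try {
    d.call(op, x);
  } catch (const std::runtime_error&) {
    return true;
  }
  return false;
}

inline ToyDispatcher make_inference_only_dispatcher() {
  ToyDispatcher d;
  d.cpu_kernels["relu"] = [](int x) { return x > 0 ? x : 0; };
  return d;
}

inline void run_example() {
  ToyDispatcher d = make_inference_only_dispatcher();
  assert(call_throws(d, "relu", -3));  // no guard: fallback path fails
  {
    SkipVariableGuard guard;           // guard set: variable layer skipped
    assert(d.call("relu", -3) == 0);
    assert(d.call("relu", 5) == 5);
  }
  // With a variable kernel registered, no guard is needed.
  d.variable_kernels["relu"] = d.cpu_kernels.at("relu");
  assert(!call_throws(d, "relu", 5));
}

}  // namespace dispatch_sketch
```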

Differential Revision: D20498017

NOTE FOR REVIEWERS: This PR has internal Facebook specific changes or comments, please review them on Phabricator!

ghstack-source-id: 100385440
Pull Request resolved: #34958

dr-ci · 2020-03-18T17:50:29Z

💊 CircleCI build failures summary and remediations

As of commit 2a52b62 (more details on the Dr. CI page):

✅ None of the build failures appear to be your fault 💚

1/1 broken upstream at merge base d927d58 on Mar 18 from 2:35pm to 7:57pm (10 commits; d927d58 - c747f09)
Please rebase on the viable/strict branch:

If your commit is newer than viable/strict, you can try basing on an older, stable commit:
```
git fetch https://github.com/pytorch/pytorch viable/strict
git rebase --onto FETCH_HEAD $(git merge-base origin/master HEAD)
```
If your commit is older than viable/strict:
```
git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD
```

🚧 1 upstream failure:

These were probably caused by upstream breakages:

pytorch_windows_vs2019_py36_cuda10.1_build on Mar 18 from 2:35pm to 7:57pm (10 commits; d927d58 - c747f09)

This comment was automatically generated by Dr. CI.

ljk53 · 2020-03-19T05:36:17Z

This has been merged in 6e47e7b

ljk53 requested review from dreiss, ezyang and xcheng16 and removed request for dreiss March 18, 2020 16:26

ljk53 closed this Mar 19, 2020

facebook-github-bot deleted the gh/ljk53/125/head branch April 18, 2020 14:17

johanlantz mentioned this pull request May 12, 2020

Status of support for training on mobile #38312

Open

ljk53 wants to merge 1 commit into gh/ljk53/125/base from gh/ljk53/125/head
