
[pytorch][mobile] support for custom mobile build with dynamic dispatch #34055

Closed

ljk53 wants to merge 6 commits into gh/ljk53/110/base from gh/ljk53/110/head

Conversation

Contributor

ljk53 commented Mar 2, 2020

Stack from ghstack:

Summary:
Enable custom mobile build with dynamic dispatch for the OSS build.

It calls a Python util script to calculate the transitive dependencies from the op dependency graph and the list of used root ops, then passes the result as the op registration whitelist to the ATen codegen, so that only the used ops are registered and kept at link time.

For custom build with dynamic dispatch to work correctly, it's critical to have an accurate list of used ops. The current assumption is that only the ops referenced by the TorchScript model are used. This works well as long as client code doesn't call the libtorch API (e.g. tensor methods) directly; otherwise the extra used ops need to be added to the whitelist manually, as shown by the HACK in prepare_model.py.

Also, if the JIT starts calling extra ops independently of the specific model, those extra ops need to be added to the whitelist as well.

Verified the correctness of the whole process with MobileNetV2:
```
TEST_CUSTOM_BUILD_DYNAMIC=1 test/mobile/custom_build/build.sh
```

Differential Revision: [D20193327](https://our.internmc.facebook.com/intern/diff/D20193327)
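
For reference, the dependency resolution here amounts to a transitive closure over the op dependency graph. A minimal sketch of the idea (the graph format, function name, and example ops are illustrative assumptions, not the actual util script added by this PR):

```
# Hypothetical sketch: BFS transitive closure from the model's root ops.
from collections import deque

def resolve_used_ops(dep_graph, root_ops):
    """Return root_ops plus every op transitively reachable from them."""
    used = set(root_ops)
    queue = deque(root_ops)
    while queue:
        for dep in dep_graph.get(queue.popleft(), ()):
            if dep not in used:
                used.add(dep)
                queue.append(dep)
    return sorted(used)

# Example graph: conv2d internally calls convolution, which needs empty/copy_.
dep_graph = {
    'aten::conv2d': ['aten::convolution'],
    'aten::convolution': ['aten::empty', 'aten::copy_'],
}
print(resolve_used_ops(dep_graph, ['aten::conv2d']))
# ['aten::conv2d', 'aten::convolution', 'aten::copy_', 'aten::empty']
```

The closed set is what gets passed to the codegen as the registration whitelist.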

ljk53 added a commit that referenced this pull request Mar 2, 2020
ghstack-source-id: d571825
Pull Request resolved: #34055
ljk53 requested review from bhosmer, dreiss, ezyang and iseeyuan March 2, 2020 09:26

dr-ci Bot commented Mar 2, 2020

💊 CircleCI build failures summary and remediations

As of commit 6cf4d94 (more details on the Dr. CI page):


  • 1/6 failures introduced in this PR

  • 3/6 tentatively recognized as flaky ❄️

  • 2/6 broken upstream at merge base 99e211e since Mar 04

Please rebase on the viable/strict branch:

    If your commit is newer than viable/strict, you can try basing on an older, stable commit:

```
git fetch origin viable/strict
git rebase --onto viable/strict $(git merge-base origin/master HEAD)
```

    If your commit is older than viable/strict:

```
git fetch origin viable/strict
git rebase viable/strict
```

    Check out the recency history of this "viable master" tracking branch.


🕵️ 1 new failure recognized by patterns

The following build failures do not appear to be due to upstream breakages:

See CircleCI build pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-build-x86_32 (1/1)

Step: "pytorch android gradle build only x86_32 (for PR)" (full log | pattern match details)

```
Mar 04 01:51:16 01:51:16.483 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] FAILED: ../../../../build/intermediates/cmake/release/obj/x86/libpytorch_jni.so
Mar 04 01:51:16 01:51:16.482 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] 	at org.gradle.internal.operations.DefaultBuildOperationExecutor.run(DefaultBuildOperationExecutor.java:91)
Mar 04 01:51:16 01:51:16.483 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] 	at org.gradle.internal.operations.DelegatingBuildOperationExecutor.run(DelegatingBuildOperationExecutor.java:31)
Mar 04 01:51:16 01:51:16.483 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] 	at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.executeAction(ExecuteActionsTaskExecuter.java:120)
Mar 04 01:51:16 01:51:16.483 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] 	at org.gradle.api.internal.tasks.execution.ExecuteActionsTaskExecuter.executeActions(ExecuteActionsTaskExecuter.java:99)
Mar 04 01:51:16 01:51:16.483 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] 	... 31 more
Mar 04 01:51:16 01:51:16.483 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] Caused by: Build command failed.
Mar 04 01:51:16 01:51:16.483 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] Error while executing process /usr/local/bin/cmake with arguments {--build /var/lib/jenkins/workspace/android/pytorch_android/.externalNativeBuild/cmake/release/x86 --target pytorch_jni}
Mar 04 01:51:16 01:51:16.483 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] [1/3] Building CXX object CMakeFiles/pytorch_jni.dir/src/main/cpp/pytorch_jni_jit.cpp.o
Mar 04 01:51:16 01:51:16.483 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] [2/3] Building CXX object CMakeFiles/pytorch_jni.dir/src/main/cpp/pytorch_jni_common.cpp.o
Mar 04 01:51:16 01:51:16.483 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] [3/3] Linking CXX shared library ../../../../build/intermediates/cmake/release/obj/x86/libpytorch_jni.so
Mar 04 01:51:16 01:51:16.483 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] FAILED: ../../../../build/intermediates/cmake/release/obj/x86/libpytorch_jni.so
Mar 04 01:51:16 01:51:16.483 [ERROR] [org.gradle.internal.buildevents.BuildExceptionReporter] 01:51:16.484 [LIFECYCLE] [org.gradle.internal.buildevents.BuildResultLogger] 45 actionable tasks: 45 executed
Mar 04 01:51:16 01:51:16.499 [INFO] [org.gradle.api.Task] AAPT2 aapt2-3.3.2-5309881-linux Daemon #0: shutdown
Mar 04 01:51:16 01:51:16.504 [DEBUG] [org.gradle.internal.work.DefaultWorkerLeaseService] Worker lease root.1 completed (0 worker(s) in use)
Mar 04 01:51:16 01:51:16.504 [DEBUG] [org.gradle.internal.resources.AbstractTrackedResourceLock] Daemon worker: released lock on root.1
Mar 04 01:51:16 01:51:16.504 [DEBUG] [org.gradle.internal.operations.DefaultBuildOperationExecutor] Completing Build operation 'Run build'
Mar 04 01:51:16 01:51:16.505 [DEBUG] [org.gradle.internal.operations.DefaultBuildOperationExecutor] Build operation 'Run build' completed
Mar 04 01:51:16 01:51:16.523 [DEBUG] [org.gradle.cache.internal.LockOnDemandCrossProcessCacheAccess] Releasing file lock for file content cache (/var/lib/jenkins/workspace/android/.gradle/4.10.3/fileContent)
Mar 04 01:51:16 01:51:16.523 [DEBUG] [org.gradle.cache.internal.DefaultFileLockManager] Releasing lock on file content cache (/var/lib/jenkins/workspace/android/.gradle/4.10.3/fileContent).
Mar 04 01:51:16 01:51:16.526 [DEBUG] [org.gradle.cache.internal.LockOnDemandCrossProcessCacheAccess] Releasing file lock for task history cache (/var/lib/jenkins/workspace/android/.gradle/4.10.3/taskHistory)
Mar 04 01:51:16 01:51:16.527 [DEBUG] [org.gradle.cache.internal.btree.BTreePersistentIndexedCache] Closing cache taskHistory.bin (/var/lib/jenkins/workspace/android/.gradle/4.10.3/taskHistory/taskHistory.bin)
```

❄️ 3 tentatively flaky failures

3 failures tentatively classified as flaky; reruns have not been launched to confirm:

See CircleCI build pytorch_linux_xenial_cuda10_1_cudnn7_py3_slow_test (1/3)

Step: "Test" (full log | pattern match details) ❄️

```
Mar 04 02:12:25 RuntimeError: Error downloading resource!
Mar 04 02:12:25
Mar 04 02:12:25 During handling of the above exception, another exception occurred:
Mar 04 02:12:25
Mar 04 02:12:25 Traceback (most recent call last):
Mar 04 02:12:25   File "tools/download_mnist.py", line 87, in <module>
Mar 04 02:12:25     main()
Mar 04 02:12:25   File "tools/download_mnist.py", line 80, in main
Mar 04 02:12:25     download(path, url, options.quiet)
Mar 04 02:12:25   File "tools/download_mnist.py", line 41, in download
Mar 04 02:12:25     raise RuntimeError('Error downloading resource!')
Mar 04 02:12:25 RuntimeError: Error downloading resource!
Mar 04 02:12:25 + cleanup
Mar 04 02:12:25 + retcode=1
Mar 04 02:12:25 + set +x
Mar 04 02:12:25 =================== sccache compilation log ===================
Mar 04 02:12:26 ERROR:sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/tmp/torch_extensions/test_compilation_error_formatting/main.cpp: In function \'int main()\':\n/tmp/torch_extensions/test_compilation_error_formatting/main.cpp:2:23: error: expected \';\' before \'}\' token\n int main() { return 0 }\n                       ^\n" }
Mar 04 02:12:26
Mar 04 02:12:26 =========== If your build fails, please take a look at the log above for possible reasons ===========
Mar 04 02:12:26 Compile requests               161
Mar 04 02:12:26 Compile requests executed       57
Mar 04 02:12:26 Cache hits                      46
```

See CircleCI build pytorch_linux_xenial_cuda10_1_cudnn7_py3_nogpu_test (2/3)

Step: "Test" (full log | pattern match details) ❄️

```
Mar 04 02:21:29 RuntimeError: Error downloading resource!
Mar 04 02:21:29
Mar 04 02:21:29 During handling of the above exception, another exception occurred:
Mar 04 02:21:29
Mar 04 02:21:29 Traceback (most recent call last):
Mar 04 02:21:29   File "tools/download_mnist.py", line 87, in <module>
Mar 04 02:21:29     main()
Mar 04 02:21:29   File "tools/download_mnist.py", line 80, in main
Mar 04 02:21:29     download(path, url, options.quiet)
Mar 04 02:21:29   File "tools/download_mnist.py", line 41, in download
Mar 04 02:21:29     raise RuntimeError('Error downloading resource!')
Mar 04 02:21:29 RuntimeError: Error downloading resource!
Mar 04 02:21:29 + cleanup
Mar 04 02:21:29 + retcode=1
Mar 04 02:21:29 + set +x
Mar 04 02:21:29 =================== sccache compilation log ===================
Mar 04 02:21:29 ERROR:sccache::server: Compilation failed: Output { status: ExitStatus(ExitStatus(256)), stdout: "", stderr: "/tmp/torch_extensions/test_compilation_error_formatting/main.cpp: In function \'int main()\':\n/tmp/torch_extensions/test_compilation_error_formatting/main.cpp:2:23: error: expected \';\' before \'}\' token\n int main() { return 0 }\n                       ^\n" }
Mar 04 02:21:29
Mar 04 02:21:29 =========== If your build fails, please take a look at the log above for possible reasons ===========
Mar 04 02:21:29 Compile requests                 68
Mar 04 02:21:29 Compile requests executed        28
Mar 04 02:21:29 Cache hits                       21
```

See CircleCI build pytorch_linux_xenial_cuda10_1_cudnn7_py3_NO_AVX2_test (3/3)

Step: "Test" (full log | pattern match details) ❄️

```
Mar 04 02:57:53 RuntimeError: Error downloading resource!
Mar 04 02:57:53
Mar 04 02:57:53 During handling of the above exception, another exception occurred:
Mar 04 02:57:53
Mar 04 02:57:53 Traceback (most recent call last):
Mar 04 02:57:53   File "tools/download_mnist.py", line 87, in <module>
Mar 04 02:57:53     main()
Mar 04 02:57:53   File "tools/download_mnist.py", line 80, in main
Mar 04 02:57:53     download(path, url, options.quiet)
Mar 04 02:57:53   File "tools/download_mnist.py", line 41, in download
Mar 04 02:57:53     raise RuntimeError('Error downloading resource!')
Mar 04 02:57:53 RuntimeError: Error downloading resource!
Mar 04 02:57:53 + cleanup
Mar 04 02:57:53 + retcode=1
Mar 04 02:57:53 + set +x
Mar 04 02:57:53 =================== sccache compilation log ===================
Mar 04 02:57:53 =========== If your build fails, please take a look at the log above for possible reasons ===========
Mar 04 02:57:53 Compile requests                19
Mar 04 02:57:53 Compile requests executed        0
Mar 04 02:57:53 Cache hits                       0
Mar 04 02:57:53 Cache misses                     0
Mar 04 02:57:53 Cache timeouts                   0
```

🚧 2 upstream failures recognized by patterns:

These were probably caused by upstream breakages:



Contributor

iseeyuan commented Mar 2, 2020

"Also, if JIT starts calling extra ops independent of specific model,
then the extra ops need to be added to the whitelist as well." I just noticed there are some extra ops in the actual bytecode than from the get_opnames. Do you mean this? I could try some fixes on get_opnames.


```
# Dump root ops used by the model (for custom build optimization).
ops = torch.jit.export_opnames(traced_script_module)
ops.append('aten::ones') # HACK because predictor.cpp explicitly calls this!
```
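
For context, here is a minimal self-contained sketch of the model-preparation flow around this snippet (the torchvision model and the output file names are assumptions for illustration, not the actual prepare_model.py):

```
import torch
import torchvision

# Trace the test model to TorchScript (MobileNetV2, as used by the CI test).
model = torchvision.models.mobilenet_v2(pretrained=True).eval()
traced_script_module = torch.jit.trace(model, torch.rand(1, 3, 224, 224))

# Dump root ops used by the model (for custom build optimization).
ops = torch.jit.export_opnames(traced_script_module)
ops.append('aten::ones')  # HACK because predictor.cpp explicitly calls this!

# Write the op list for the build and save the model for the mobile runtime
# (file names are hypothetical).
with open('mobilenetv2_ops.yaml', 'w') as f:
    f.write('\n'.join(sorted(set(ops))))
traced_script_module.save('mobilenetv2.pt')
```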


Just a thought: if our eventual setup still needs to explicitly push roots into ops like this, we'll probably want to institutionalize it, like have a global list called EXTRA_ROOT_OPS or whatever

Contributor Author


Yes, as I replied to iseeyuan, we need to run static analysis against the JIT code to find out the common EXTRA_ROOT_OPS, if any.

However, this case is different - aten::ones is directly called from client code (the dummy predictor.cpp calls it to create an all-ones tensor for testing purposes). Generally, users could write pure C++ client code that directly calls functions or tensor methods, which won't be captured by extracting ops from the model's bytecode.

This is less of a problem for Android, where we expect users to use the limited set of Java APIs. To actually solve it, we probably need to ask users to run the code analyzer against their client code to dump these extra root ops - that's why I feel there will be more support work when switching to dynamic dispatch: in the static dispatch case these extra ops are kept by the linker automatically.

For CI purposes this one-off hack is probably fine? :)



Question, does this mean every custom build is gonna get aten::ones whether they use it or not? Or just binaries that use the dummy predictor.cpp? (I'm guessing the former since this is a general script, but plz correct :)

A single op isn't that big a deal I guess, but I think it's important to describe the situation explicitly - something like your description above, in a comment over a global that declares the op. The text below is just your comment from above, so it may not be entirely correct, but this is the kind of thing I mean:

```
# We need to run static analysis against the JIT code to find out the common
# EXTRA_ROOT_OPS, if any.
#
# However, this case is different - aten::ones is directly called from client
# code (the dummy predictor.cpp calls it to create an all-ones tensor for
# testing purposes). Generally, users could write pure C++ client code that
# directly calls functions or tensor methods, which won't be captured by
# extracting ops from the model's bytecode.
#
# This is less of a problem for Android, where we expect users to use the
# limited set of Java APIs. To actually solve it, we probably need to ask
# users to run the code analyzer against their client code to dump these
# extra root ops - that's why I feel there will be more support work when
# switching to dynamic dispatch: in the static dispatch case these extra ops
# are kept by the linker automatically.
#
# For CI purposes this one-off hack is probably fine? :)
EXTRA_CI_ROOT_OPS = ['aten::ones']

...
ops.extend(EXTRA_CI_ROOT_OPS)
```

Contributor Author


> Question, does this mean every custom build is gonna get aten::ones whether they use it or not? Or just binaries that use the dummy predictor.cpp? (I'm guessing the former since this is a general script, but plz correct :)

It's the latter - the script is under test/mobile/... which is for mobile CI specifically. That's why I don't feel so guilty about adding this hack :)

Contributor Author


Added the comment to the test script to keep a record.

ljk53 added a commit that referenced this pull request Mar 3, 2020
ghstack-source-id: c94a3db
Pull Request resolved: #34055
Contributor Author

ljk53 commented Mar 3, 2020

"Also, if JIT starts calling extra ops independent of specific model,
then the extra ops need to be added to the whitelist as well." I just noticed there are some extra ops in the actual bytecode than from the get_opnames. Do you mean this? I could try some fixes on get_opnames.

The problem you mentioned should be fixed (regardless of static dispatch custom build or dynamic dispatch custom build) :)

What I meant was potential case where model's bytecode doesn't fully cover all ops called from JIT, e.g. what if during model loading time some JIT code decides to call some ops directly? I can imagine some common preprocessing pass might do this.

One way to address this problem is to run static analysis against JIT codebase and find out. It will require a little work.

ljk53 requested a review from bhosmer March 3, 2020 06:12
Contributor

iseeyuan commented Mar 3, 2020

"Also, if JIT starts calling extra ops independent of specific model,
then the extra ops need to be added to the whitelist as well." I just noticed there are some extra ops in the actual bytecode than from the get_opnames. Do you mean this? I could try some fixes on get_opnames.

The problem you mentioned should be fixed (regardless of static dispatch custom build or dynamic dispatch custom build) :)

What I meant was potential case where model's bytecode doesn't fully cover all ops called from JIT, e.g. what if during model loading time some JIT code decides to call some ops directly? I can imagine some common preprocessing pass might do this.

One way to address this problem is to run static analysis against JIT codebase and find out. It will require a little work.

Are load-time operations embedded in __setstate__, which can also be collected in bytecode level? I'm not aware of other run-time op calls.


bhosmer left a comment


One inline thing about commenting the aten::ones dependency, but LGTM :)

ljk53 added a commit that referenced this pull request Mar 4, 2020

ghstack-source-id: 9bf078d
Pull Request resolved: #34055
Contributor

mrshenli commented Mar 4, 2020

Hey @ljk53, looks like either this PR or the one below it has broken master. Do we have any fix coming for it?

https://app.circleci.com/jobs/github/pytorch/pytorch/4683824
https://app.circleci.com/jobs/github/pytorch/pytorch/4683798

Contributor

@ljk53 merged this pull request in 3c042a6.

Contributor Author

ljk53 commented Mar 4, 2020

@mrshenli - looking - this might affect the build but shouldn't break tests.

ljk53 added a commit that referenced this pull request Mar 4, 2020
…c dispatch (#34055)"

This reverts commit 3c042a6.

ljk53 added a commit that referenced this pull request Mar 4, 2020
…c dispatch (#34055)"

This reverts commit 3c042a6.

ghstack-source-id: 6a1b1f7
Pull Request resolved: #34189
Contributor Author

ljk53 commented Mar 4, 2020

@mrshenli created #34188 and #34189 to revert #32814 and this PR, respectively: neither of them fixed the failing CI :(

Contributor

mrshenli commented Mar 4, 2020

@ljk53 Thanks a lot for investigating. Looks like that test is flaky. It also fails on #33860 but succeeded in several subsequent runs. Let me disable that.

ttumiel pushed a commit to ttumiel/pytorch that referenced this pull request Mar 4, 2020
…ch (pytorch#34055)

Summary:
Pull Request resolved: pytorch#34055

Test Plan: Imported from OSS

Reviewed By: bhosmer

Differential Revision: D20193327

Pulled By: ljk53

fbshipit-source-id: 9d369b8864856b098342aea79e0ac8eec04149aa
facebook-github-bot deleted the gh/ljk53/110/head branch March 7, 2020 15:18
ljk53 added a commit to ljk53/pytorch that referenced this pull request Mar 9, 2020
…ch (pytorch#34055)

Contributor Author

ljk53 commented Mar 13, 2020

"Also, if JIT starts calling extra ops independent of specific model,
then the extra ops need to be added to the whitelist as well." I just noticed there are some extra ops in the actual bytecode than from the get_opnames. Do you mean this? I could try some fixes on get_opnames.

The problem you mentioned should be fixed (regardless of static dispatch custom build or dynamic dispatch custom build) :)
What I meant was potential case where model's bytecode doesn't fully cover all ops called from JIT, e.g. what if during model loading time some JIT code decides to call some ops directly? I can imagine some common preprocessing pass might do this.
One way to address this problem is to run static analysis against JIT codebase and find out. It will require a little work.

Are load-time operations embedded in __setstate__, which can also be collected in bytecode level? I'm not aware of other run-time op calls.

@iseeyuan Here is one example of calling ops during model loading time: https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/serialization/unpickler.cpp#L425

```
PickleOpCode Unpickler::readInstruction() {
  ...
  case PickleOpCode::BINPERSID: {
    ...
    // Rebuilds the tensor from its storage at load time:
    tensor = at::empty({0}, options).set_(storage);
```

It calls aten::empty() and aten::set_().

Today it works because these ops are pretty common and are also indirectly called by regular models - but there is no guarantee it will stay this way. For example, #34641 breaks the dynamic dispatch CI in a subtle way. We probably need to run the code analyzer against torch/csrc to find out these common ops used by the JIT.
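
To make that concrete, a crude sketch of such a scan (purely illustrative, not the actual code analyzer: a naive regex over sources misses tensor method calls like .set_() and picks up false positives that a real analyzer would avoid):

```
# Hypothetical sketch: regex-scan C++ sources for direct at::<op>( calls,
# approximating the extra root ops the JIT can invoke independent of a model.
import re
from pathlib import Path

AT_CALL = re.compile(r'\bat::([a-z_][a-zA-Z0-9_]*)\s*\(')

def scan_direct_op_calls(root):
    """Collect aten::* names invoked directly from C++ files under root."""
    ops = set()
    for path in Path(root).rglob('*.cpp'):
        for name in AT_CALL.findall(path.read_text(errors='ignore')):
            ops.add('aten::' + name)
    return sorted(ops)

# e.g. this should surface aten::empty from unpickler.cpp, among others:
print(scan_direct_op_calls('torch/csrc/jit'))
```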

laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
…ch (pytorch#34055)

