[pytorch][CI] end-to-end custom build script #34012

Closed
ljk53 wants to merge 8 commits into gh/ljk53/107/base from gh/ljk53/107/head

Conversation

Contributor

@ljk53 ljk53 commented Feb 29, 2020

Stack from ghstack:

Summary:
Today some mobile simulator tests only run on landed PRs, and reproducing
errors locally requires setting up a special build environment.

The goal of this PR is to run end-to-end mobile custom build & integration
tests with the host toolchain (using the same CMake options as the mobile
build). This way, non-mobile engineers can capture & debug mobile-related
build issues much more easily.

There are three custom build types that this script supports:

  1. `TEST_DEFAULT_BUILD=1 ./build.sh` - similar to the prebuilt libtorch
    libraries released for Android and iOS (same CMake build options + host
    toolchain); it doesn't contain autograd functions or backward ops and is
    thus smaller than the full LibTorch.

  2. `TEST_CUSTOM_BUILD_STATIC=1 ./build.sh` - further optimizes libtorch
    size by only including ops used by a specific model.

  3. `TEST_CUSTOM_BUILD_DYNAMIC=1 ./build.sh` - similar to 2) except that it
    relies on the op dependency graph (instead of static dispatch) to calculate
    and keep all ops transitively depended on by the model.

Type 2) will be deprecated by type 3) in the future.
Type 3) custom build is not fully supported yet, so it's expected to fail.

The existing mobile build CI is replaced to run the Type 1) build & integration test.
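The dynamic-dispatch flow in type 3) boils down to computing a transitive closure over the op dependency graph, starting from the ops the model uses. A minimal standalone sketch of that closure computation (the graph and op names here are made up for illustration; this is not the actual analyzer):

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Hypothetical op dependency graph: op name -> ops it may call.
using Graph = std::map<std::string, std::vector<std::string>>;

// Return all ops reachable from the model's root ops (BFS over the graph).
std::set<std::string> transitive_ops(const Graph& g,
                                     const std::vector<std::string>& roots) {
  std::set<std::string> keep(roots.begin(), roots.end());
  std::queue<std::string> todo;
  for (const auto& r : roots) todo.push(r);
  while (!todo.empty()) {
    std::string op = todo.front();
    todo.pop();
    auto it = g.find(op);
    if (it == g.end()) continue;  // leaf op: no further dependencies
    for (const auto& dep : it->second) {
      // insert().second is true only for newly discovered ops.
      if (keep.insert(dep).second) todo.push(dep);
    }
  }
  return keep;
}
```

The real analyzer derives the graph from LLVM bitcode, but the "keep everything reachable" step is this closure.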

Differential Revision: D20193328

ljk53 added a commit that referenced this pull request Feb 29, 2020
ghstack-source-id: a838dcc
Pull Request resolved: #34012
ljk53 added a commit that referenced this pull request Feb 29, 2020
ghstack-source-id: b04cf66
Pull Request resolved: #34012

dr-ci Bot commented Mar 1, 2020

💊 CircleCI build failures summary and remediations

As of commit e3727c1 (more details on the Dr. CI page):


None of the build failures appear to be your fault 💚



❄️ 2 tentatively flaky failures

2 failures tentatively classified as flaky but have not launched reruns to confirm:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_ge_config_legacy_test (1/2)

Step: "Set Up CI Environment After attach_workspace" (full log | pattern match details) ❄️

E: Failed to fetch https://download.docker.com/linux/ubuntu/dists/xenial/stable/binary-amd64/Packages.bz2  Hash Sum mismatch
E: Some index files failed to download. They have been ignored, or old ones used instead.

See CircleCI build pytorch_python_doc_push (2/2)

Step: "Set Up CI Environment After attach_workspace" (full log | pattern match details) ❄️

E: Failed to fetch https://download.docker.com/linux/ubuntu/dists/xenial/stable/binary-amd64/Packages.bz2  Hash Sum mismatch
E: Some index files failed to download. They have been ignored, or old ones used instead.


ljk53 added a commit that referenced this pull request Mar 1, 2020
ghstack-source-id: 4fdc7e6
Pull Request resolved: #34012
echo "Clang version:"
clang --version
# Install torch & torchvision - used to download & trace test model.
pip install torch torchvision --progress-bar off
Contributor Author

I see the iOS simulator test is doing this - I'm not entirely sure whether it's a good idea to download and install torch & torchvision on every PR.

My goal is to download, trace & run a non-trivial model, so I chose MobileNetV2 - shall I instead create a dummy TorchScript model with a similar set of ops to MobileNetV2 but with small dummy weights, so that I can commit the test model without downloading it from the network?
Are there any similar integration tests for non-mobile builds that I can reuse?

Contributor

It's not great (I commented on this at #30133 (comment)) but it's certainly the most convenient way to dump the model (and we do this in a few other places). An alternative is to save the model in s3 directly and then suck it down directly, but that makes it more difficult to update the model later.

Contributor

As for your other question, let me read the patch and get back to you on it.

Contributor Author

It's not great (I commented on this at #30133 (comment)) but it's certainly the most convenient way to dump the model (and we do this in a few other places). An alternative is to save the model in s3 directly and then suck it down directly, but that makes it more difficult to update the model later.

I saw you commented there - "concerned about relying on nightlies for per PR tests" - it installs stable torch, doesn't it?

@ljk53 ljk53 requested a review from ezyang March 1, 2020 04:08
fi
}

generate_op_dependency_graph() {
Contributor Author

@ljk53 ljk53 Mar 1, 2020

Selective mobile build with dynamic dispatch is not supported in OSS yet, so this won't be called for now.

When I enable it in the future, it will build libtorch twice - first generate the op dependency graph inline (which will build mobile libtorch into LLVM bitcode), then call the regular mobile build (which will build mobile libtorch again).

I could reuse the other code-analysis CI's output to save some work, but it won't reduce the latency of the job. Fortunately it takes much less time to build libtorch with mobile CMake options (as it skips autograd and caffe2), so it's probably not going to be too bad.

Let me know if you have better suggestions.

Contributor

This is the terminal state of the static analyzer, is that right? Because we cannot use LLVM to generate the op dependency graph without compiling everything, and then we need to build the actual build. Do you have some sense of how long these steps take?

It's kind of irritating that you have to build twice, but assuming you need LLVM bitcode I'm not sure how to get around it. I suppose that for OSS end users, we could directly distribute LLVM bitcode for libtorch, so they don't need to do this step.

Contributor Author

It's generally much faster than the regular full pytorch build + test, so hopefully this won't be the bottleneck of our CI. But I'll test before adding the custom build + dynamic dispatch option to the CI (only this mode requires building twice).

For external users, we can simply add the op dependency graph to the release/nightly package. The only potential issue is that some cmake options will affect the op dependency graph, e.g. whether to strip out error messages, because the full error message calls tensor.toString() which calls some other aten ops :)

Contributor

"oops"

model = torchvision.models.mobilenet_v2(pretrained=True)
model.eval()
example = torch.rand(1, 3, 224, 224)
traced_script_module = torch.jit.trace(model, example)
Contributor

Unrelated question: how come we're tracing here (and not using TorchScript, e.g.?)

Contributor Author

It's copied from a very old example, so the latest approach should be `script_model = torch.jit.script(model)`?

Comment thread test/mobile/custom_build/predictor.cpp
torch::autograd::AutoGradMode no_autograd_guard{false};
// Disable graph optimizer to ensure list of unused ops are not changed for
// custom mobile build.
torch::jit::GraphOptimizerEnabledGuard no_optimizer_guard{false};
Contributor

The reasoning here makes sense, but in general, this seems a bit questionable, since I imagine that one might quite reasonably want to optimize graphs for mobile build. Wouldn't it be "more correct" to optimize first, and then compute the list of unused ops? Can this be conveniently supported?

I understand the goal of this PR is to get an e2e CI running, so this wouldn't be blocking, but it's something to think about...

Contributor Author

This was from a discussion with @zdevito @bhosmer @iseeyuan - the decision was that we won't run any optimization pass for mobile in the near future, and it should not affect perf in a meaningful way. Not sure if we should revisit this decision sometime...
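The guards in the snippet above follow the standard RAII flag-guard pattern: save the current value of a global mode flag on construction, set the new value, and restore the old value on destruction. A standalone sketch of that pattern (these are stand-in names, not the actual libtorch guard types):

```cpp
// Hypothetical stand-in for a global mode flag such as "graph optimizer enabled".
static thread_local bool g_optimizer_enabled = true;

class OptimizerEnabledGuard {
 public:
  explicit OptimizerEnabledGuard(bool enabled) : prev_(g_optimizer_enabled) {
    g_optimizer_enabled = enabled;  // override the flag for this scope
  }
  ~OptimizerEnabledGuard() {
    g_optimizer_enabled = prev_;  // restore the previous value on scope exit
  }

 private:
  bool prev_;
};
```

This is why the predictor can stack `AutoGradMode` and `GraphOptimizerEnabledGuard` at the top of a scope: each guard restores its flag automatically, even on early return or exception.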

Comment thread test/mobile/custom_build/predictor.cpp Outdated
auto qengines = at::globalContext().supportedQEngines();
if (std::find(qengines.begin(), qengines.end(), at::QEngine::QNNPACK) !=
qengines.end()) {
at::globalContext().setQEngine(at::QEngine::QNNPACK);
Contributor

Also not really related to this PR, but is there really any reason why you wouldn't want the lack of QNNPACK to be a hard error? This code (1) seems annoying boilerplate that everyone has to write, and (2) seems like it would mask build problems when QNNPACK is not available.

Contributor Author

It's copied from the Android JNI glue code (I guess we wanted to keep it flexible to support custom mobile build without QNNPACK?) I can remove this from e2e CI code, though.
And yes, it's annoying to copy this type of boilerplate code (including MobileCallGuard) to Android/iOS and here - shall we add them to torch/script.h header for mobile builds?

Contributor

Ideally, it wouldn't be necessary to write this boilerplate at all! I guess it would be better to figure out who added this in the first place and ask them about it. @jerryzh168, any comments here?

Contributor Author

Ideally, it wouldn't be necessary to write this boilerplate at all! I guess it would be better to figure out who added this in the first place and ask them about it. @jerryzh168, any comments here?

I think this was introduced when @supriyar unified the fbgemm & qnnpack interface at the dispatcher level. Now they look the same, but the underlying packed weight formats etc. are different. So we introduced the concept called "qengine" to switch between these two quantization implementations. And it's possible to have both qnnpack and fbgemm available in one build, so I guess it needs to be set explicitly. Yet another dispatching mechanism? :)

Contributor

For mobile build though we turn off FBGEMM, right? In that case the default qengine should be set to QNNPACK. We control that here - https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/Context.cpp#L122

Contributor

yeah it is another runtime dispatcher because we unified the API for fbgemm and qnnpack.

Contributor

@supriyar did you add this boilerplate code? Could you talk about why we did this? Do we ever run this code with the fbgemm backend?

Contributor

@jerryzh168 see my comment above. I don't think this boilerplate is necessary because we set a default backend (for mobile it should be qnnpack) in Context.cpp.
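The boilerplate under discussion is just "prefer QNNPACK when it appears in the supported-engines list, otherwise keep the current default." A standalone sketch of that selection logic (the enum and helper here are stand-ins, not the real `at::Context` API):

```cpp
#include <algorithm>
#include <vector>

// Stand-in for at::QEngine.
enum class QEngine { NoQEngine, FBGEMM, QNNPACK };

// Pick QNNPACK if it's among the supported engines; otherwise keep the
// current default (which for mobile builds is already QNNPACK, per the
// Context.cpp logic referenced above).
QEngine select_qengine(const std::vector<QEngine>& supported,
                       QEngine current_default) {
  if (std::find(supported.begin(), supported.end(), QEngine::QNNPACK) !=
      supported.end()) {
    return QEngine::QNNPACK;
  }
  return current_default;
}
```

As the reviewers note, when FBGEMM is compiled out the default already resolves to QNNPACK, which is why this explicit selection may be redundant on mobile.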

Comment thread test/mobile/custom_build/build.sh Outdated
@@ -0,0 +1,145 @@
#!/bin/bash
###############################################################################
# This script shows/tests the flow to build libtorch locally with optimized
Contributor

Mentioning a pet peeve of mine: I generally don't like build scripts intended for general use by users, because they can make it difficult to toggle build parameters on the underlying cmake invocations (we had this problem for the longest time in setup.py until xuhdev wrote a general framework for passing environment variables into cmake flags for the underlying cmake invocation).

Contributor Author

Makes sense. First, I wasn't thinking about exposing this to external users - I meant to say it "shows" how things work to internal developers who need to repeat the build process locally.

Right now the exposed mobile build scripts are under scripts/build_ios.sh, scripts/build_pytorch_android.sh, scripts/build_mobile.sh, which kinda support overriding cmake options from command line.

This script invokes multiple other shell scripts, and some of them are meant to take specialized cmake options.
A large part of this script (building the analyzer tool, building LLVM bitcode, running the analyzer against the bitcode) is also done by the other CI (mobile_code_analysis) - we should pack its output into the release package so external users don't need to do it themselves.

The whole custom build + dynamic dispatch workflow is still evolving so it's not set in stone - I'll take a look at xuhdev's general framework and might migrate to it in the future.

Contributor Author

removing "shows" from the comment to avoid confusing external developers :)

Contributor

Not too sure xuhdev's stuff in setup.py is all that useful, since it's specific for building PyTorch on Windows. But worth taking a look :)

# relies on the op dependency graph (instead of static dispatch) to calculate
# and keep all transitively dependent ops by the model.
# Note that LLVM_DIR environment variable should be set to the location of
# LLVM-dev toolchain.
Contributor

Is this intended to be an example for users for how they set things up? :)

Contributor Author

I guess yes for internal developers, if they need to debug why their change breaks custom build + dynamic dispatch? But probably not for external users?

Comment thread test/mobile/custom_build/build.sh Outdated
Contributor

@ezyang ezyang left a comment

OK, looks reasonable.

Summary:
Today some mobile simulator tests only run on landed PRs and it requires
setting up special build environment to repro errors locally.

The goal of the PR is to do end-to-end mobile custom build & integration
tests with host toolchain (using same CMake options as mobile build). This
way, non-mobile engineers can capture & debug mobile related build issues
much more easily.

There are three custom build types that this script supports:

1. `TEST_DEFAULT_BUILD=1 ./build.sh` - similar to the prebuilt libtorch
libraries released for Android and iOS (same CMake build options + host
toolchain); it doesn't contain autograd functions or backward ops and is thus
smaller than the full LibTorch.

2. `TEST_CUSTOM_BUILD_STATIC=1 ./build.sh` - it further optimizes libtorch
size by including only the ops used by a specific model.

3. `TEST_CUSTOM_BUILD_DYNAMIC=1 ./build.sh` - similar to 2) except that it
relies on the op dependency graph (instead of static dispatch) to compute
and keep all ops transitively depended on by the model.

Type 2) will be deprecated in favor of type 3) in the future.
Type 3) custom build is not fully supported yet, so it's expected to fail.

This also replaces the existing mobile build CI with the Type 1) build & integration test.
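The three flags above act as mutually exclusive selectors. A minimal sketch of how a script like `build.sh` might dispatch on them (the function and the echoed labels are illustrative placeholders, not the actual script):

```shell
# Illustrative dispatch on the three TEST_* flags described above;
# the echoed labels are placeholders, not actual build targets.
select_build_type() {
  if [ "${TEST_DEFAULT_BUILD:-0}" = "1" ]; then
    echo "default"
  elif [ "${TEST_CUSTOM_BUILD_STATIC:-0}" = "1" ]; then
    echo "custom-static"
  elif [ "${TEST_CUSTOM_BUILD_DYNAMIC:-0}" = "1" ]; then
    echo "custom-dynamic"
  else
    echo "error: no build type selected" >&2
    return 1
  fi
}
```

Checking the flags in a fixed order keeps the behavior deterministic even if a caller accidentally sets more than one.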

Differential Revision: [D20193328](https://our.internmc.facebook.com/intern/diff/D20193328)

[ghstack-poisoned]
@ljk53 ljk53 requested a review from ezyang March 3, 2020 02:26
@facebook-github-bot
@ljk53 merged this pull request in 51936c5.

ttumiel pushed a commit to ttumiel/pytorch that referenced this pull request Mar 4, 2020
@facebook-github-bot facebook-github-bot deleted the gh/ljk53/107/head branch March 7, 2020 15:18
ljk53 added a commit to ljk53/pytorch that referenced this pull request Mar 9, 2020
ljk53 added a commit that referenced this pull request Mar 10, 2020
Summary:
According to
#34012 (comment),
this `at::globalContext().setQEngine(at::QEngine::QNNPACK);` call isn't
really necessary for mobile.

In Context.cpp, the last available QEngine is selected when the engine isn't
set explicitly. The OSS mobile prebuild should only include the QNNPACK
engine, so the default behavior is already the desired one.

It makes a difference only when USE_FBGEMM is set - but that should be off
for both the OSS mobile build and the internal mobile build.

[ghstack-poisoned]
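The fallback described in that commit message (when no engine is set explicitly, the last available one wins) can be sketched in shell. This is not PyTorch code, and the engine names used below are assumptions for illustration only:

```shell
# Illustrative "last available engine wins" fallback;
# not PyTorch code, and the engine names are placeholders.
pick_default_qengine() {
  last=""
  for engine in "$@"; do
    last="$engine"
  done
  echo "$last"
}
```

With only QNNPACK compiled in, the available list has a single entry, so QNNPACK would be selected without any explicit `setQEngine` call.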
ljk53 added a commit that referenced this pull request Mar 10, 2020
ghstack-source-id: 139184c
Pull Request resolved: #34556
facebook-github-bot pushed a commit that referenced this pull request Mar 11, 2020
…ors (#34556)

Test Plan: Imported from OSS

Differential Revision: D20374522

Pulled By: ljk53

fbshipit-source-id: d4e437a03c6d4f939edccb5c84f02609633a0698
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026