
Splits CPU and CUDA fusion compilers #10981

Closed

mruberry wants to merge 21 commits into pytorch:master from mruberry:fusion_compiler_split

Conversation

@mruberry
Collaborator

@mruberry mruberry commented Aug 28, 2018

This PR splits the CPU and CUDA fusion compilers, putting them into a new jit/fusers/ directory with jit/fusers/common for common components. In particular:

  • A fusion interface is created that allows "fusion handles" to be requested
  • The CPU and CUDA fusers implement this interface, with dispatch determined by device
  • The fusion compilers, fusion function specializations and resource strings are split
  • CPU-specific classes like TempFile and DynamicLibrary are in the CPU fuser
  • Common classes like TensorDesc and the base fusion function class are in jit/fusers/common
  • There is still some specialization in jit/fusers/common, but these specializations are small(-ish)
  • The build system is updated to remove the dummy interface on Windows and to minimize the use of macros

This structure should allow in-flight PRs to easily rebase while providing a clear interface to the fusers.
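The dispatch idea in the bullets above can be sketched in a few lines of Python. The actual fusers are C++; every name below is an invented stand-in for illustration, not the PR's API:

```python
class FusionHandle:
    """Base interface: a compiled, runnable fused kernel."""
    def run(self, inputs):
        raise NotImplementedError

class CPUFusionHandle(FusionHandle):
    def run(self, inputs):
        return ("cpu", inputs)  # stand-in for running compiled CPU code

class CUDAFusionHandle(FusionHandle):
    def run(self, inputs):
        return ("cuda", inputs)  # stand-in for launching a CUDA kernel

# Registry keyed by device type; the real split keeps the backends in
# separate CPU and CUDA fuser directories behind a common interface.
_FUSERS = {
    "cpu": CPUFusionHandle,
    "cuda": CUDAFusionHandle,
}

def get_fusion_handle(device):
    """Request a fusion handle; dispatch is determined by device."""
    try:
        return _FUSERS[device]()
    except KeyError:
        raise RuntimeError("no fuser registered for device: " + device)
```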

@mruberry
Collaborator Author

Looks like all tests are passing. fyi @zdevito @apaszke

Contributor

@apaszke apaszke left a comment

Looks much better than the initial PR. Some naming requests and this should be ready to land. I'd also like @ezyang to double check the use of macros, as I'm not sure if it fits well with our build philosophy.

ezyang previously requested changes Sep 5, 2018
Contributor

@ezyang ezyang left a comment

Macro treatment is problematic.

Contributor

@apaszke apaszke left a comment

This LGTM. @ezyang can you please take a look at macros again?

@mruberry
Collaborator Author

mruberry commented Sep 7, 2018

Naming and config changes are in. I also took the three relevant PRs that updated the fusion_compiler and ported them to the split.

I am unsure what's going on with the two failures. CircleCI appears to be re-running itself (?), and pytorch-linux-trusty-py2.7.9 has had JIT problems (there's a PR to disable a test for it), but in this case there are three errors, not just test_scalar_fusion.


@apaszke
Contributor

apaszke commented Sep 10, 2018

@mruberry CI failure looks related

@apaszke
Contributor

apaszke commented Sep 10, 2018

Hmm that might be a transient issue. I can't reproduce this locally. Let's see if a retest fixes it.

@apaszke
Contributor

apaszke commented Sep 10, 2018

Nope, this seems to consistently fail 😕

@zou3519
Contributor

zou3519 commented Sep 10, 2018

CPU fuser tests are flaky and possibly broken (even on master); see #11360 for some discussion. I'm not sure why, but I think the pr/pytorch-linux-trusty-py2.7 machine runs out of memory. This needs more investigation.

We are looking into it.

@mruberry
Collaborator Author

As mentioned above, it is strange that three tests were failing consistently (test_scalar_fusion was also failing until it was disabled). I suspect these failures are revealed by this PR rather than caused by it.

@apaszke
Contributor

apaszke commented Sep 11, 2018

I agree, but unfortunately we can't merge this until we resolve the problem. It's likely that we will disable the CPU fuser by default, but we don't want to disable its tests entirely, to avoid undetected breakages of that code path. @zou3519 is looking into that.

@weiyangfb weiyangfb added the oncall: jit label Sep 11, 2018
@ezyang
Contributor

ezyang commented Sep 12, 2018

@pytorchbot retest this please

@zou3519
Contributor

zou3519 commented Sep 12, 2018

I'm still looking into it, sorry for the delay. Some strange things are going on.

@t-vi t-vi mentioned this pull request Sep 12, 2018
@zou3519
Contributor

zou3519 commented Sep 12, 2018

I posted an update in #11360 about what I've found. The summary is that test_jit.py has high peak memory usage that causes the graph fuser to fail when it runs the fork() syscall, despite fork having copy-on-write semantics.

The fastest solution to unblock this PR would be to move the fuser tests to run first in test_jit.py, so they execute before test_jit starts using a lot of memory. A more robust solution (which I am looking into right now) is to figure out why test_jit.py uses so much memory (more than 4 GB) and fix that.
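The two facts behind this failure can be illustrated with a small Python sketch (not PyTorch code): peak RSS is a high-water mark that stays elevated after frees, and fork() must account for the parent's whole address space up front even though copy-on-write defers the actual copying, so a memory-heavy parent can fail to fork:

```python
import resource

def peak_rss():
    # ru_maxrss is the process's high-water memory mark
    # (reported in KiB on Linux, bytes on macOS)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

before = peak_rss()
blob = bytearray(64 * 1024 * 1024)  # allocate and touch ~64 MiB
grown = peak_rss()                  # the peak has risen
del blob                            # freeing does not lower the peak
after_free = peak_rss()             # still at the high-water mark
```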

zou3519 and others added 2 commits September 13, 2018 10:28
Run TestEndToEndHybridFrontendModels last. It has high peak memory usage
that causes unrelated CPU fuser test failures if those tests run after.
A more robust fix for this issue is being tracked in pytorch#11360
@zou3519
Contributor

zou3519 commented Sep 13, 2018

@mruberry @apaszke I pushed a change to make TestEndToEndHybridFrontendModels run last in test_jit.py, which should avoid the CPU fuser test failures and unblock this PR. I am investigating a more robust solution but this PR should be good to go after the CI runs

Contributor

@facebook-github-bot facebook-github-bot left a comment

apaszke has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@apaszke
Contributor

apaszke commented Sep 14, 2018

Hmm, while OSS tests are passing, this has unfortunately triggered a failure in some internal tests... I'll push some code tomorrow morning that disables the fuser by default (except for the tests).

Also, disable grad mode in _check_trace, which greatly decreases
peak memory usage when inputs with requires_grad are used to trace.
@apaszke
Contributor

apaszke commented Sep 14, 2018

I have disabled the CPU fuser by default, except for the few tests that should actually exercise it. The memory usage regression should now be addressed thanks to @zou3519, who noticed that we run _check_trace with grad mode enabled (which forces us to keep two graphs in memory). We still don't have a definite answer for why the peak RSS persists in many environments, but it looks like free is simply very unwilling to return memory to the OS.
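The grad-mode fix is easiest to see as a pattern. Below is a minimal sketch with invented names (not the real torch.no_grad or _check_trace): while grad mode is on, every op records a node, so checking a trace with it enabled keeps a second graph alive; disabling it during the check records nothing:

```python
from contextlib import contextmanager

class GradMode:
    enabled = True  # global flag, mimicking autograd's grad mode

@contextmanager
def no_grad():
    # Temporarily disable recording, restoring the prior state on exit
    prev = GradMode.enabled
    GradMode.enabled = False
    try:
        yield
    finally:
        GradMode.enabled = prev

graph = []  # stand-in for autograd's recorded graph

def op(x):
    if GradMode.enabled:
        graph.append(("add1", x))  # record a node only in grad mode
    return x + 1

traced = op(1)        # grad mode on: records a node
with no_grad():
    checked = op(2)   # grad mode off: same result, nothing recorded
```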

Contributor

@facebook-github-bot facebook-github-bot left a comment

apaszke has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Sep 20, 2018
Summary:
This patch adds fused forward and backward for clamp to the jit.
This is one item of #11118. If it's OK, I'd be happy to also add some more of #11118.

The patch depends on #11150, which I merged into master as a base. I'll rebase it when that or #10981 is merged.

This is my first serious jit patch; thank you, ngimel and the others, for your guidance. All errors are my own.
Pull Request resolved: #11574

Differential Revision: D9943090

Pulled By: apaszke

fbshipit-source-id: c40954b8c28c374baab8d3bd89acc9250580dc67
@mruberry mruberry deleted the fusion_compiler_split branch September 25, 2018 16:42

Labels

oncall: jit, open source


7 participants