Splits CPU and CUDA fusion compilers #10981
| , AnnotatedGraph& agraph | ||
| , bool use_cuda); | ||
|
|
||
| struct CommonFusionFunction { |
| @@ -0,0 +1,87 @@ | |||
| #if USE_CPU_FUSER || USE_CUDA_FUSER | |||
| // - the shapes satisfy graph invariants for our fused code (e.g. that all intermediate shapes | ||
| // are the same - see fusion_compiler.cpp for more details). | ||
| // - their FusionArgSpecs compare equal | ||
| struct CommonFusionHandle : public FusionHandle { |
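The comment fragment above appears to list conditions under which two configurations can share an already compiled kernel, one of them being that their FusionArgSpecs compare equal. A minimal sketch of that reuse-by-equality idea follows. Every type and field here (`ArgSpec`, `KernelCache`, `scalar_type`, `contiguity`) is a hypothetical stand-in invented for illustration, not the PR's actual definitions, and the shape-invariant check is omitted.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an argument spec. The fields are assumptions
// made for this sketch, not the real FusionArgSpec layout.
struct ArgSpec {
  int scalar_type;               // illustrative dtype tag
  std::vector<bool> contiguity;  // illustrative per-dimension flags

  bool operator==(const ArgSpec& other) const {
    return scalar_type == other.scalar_type && contiguity == other.contiguity;
  }
};

// Hypothetical record of a "compiled" kernel and the spec it was built for.
struct Kernel {
  int id;
  ArgSpec spec;
};

// Reuses a kernel when an equal spec was seen before; otherwise pretends
// to compile a new one and remembers it.
class KernelCache {
 public:
  int getOrCompile(const ArgSpec& spec) {
    for (const auto& kernel : kernels_) {
      if (kernel.spec == spec) {
        return kernel.id;  // equal specs share one kernel
      }
    }
    kernels_.push_back({next_id_++, spec});
    return kernels_.back().id;
  }

  std::size_t size() const { return kernels_.size(); }

 private:
  std::vector<Kernel> kernels_;
  int next_id_ = 0;
};
```

A linear scan keeps the sketch short; a real cache would more likely hash the spec, which requires a hash function consistent with `operator==`.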
| @@ -0,0 +1,47 @@ | |||
| #if USE_CPU_FUSER || USE_CUDA_FUSER | |||
| @@ -0,0 +1,34 @@ | |||
| #pragma once | |||
torch/CMakeLists.txt
Outdated
| ) | ||
|
|
||
| if (NOT WIN32) | ||
| add_definitions(-DUSE_CPU_FUSER) |
ezyang
left a comment
Macro treatment is problematic.
|
Naming and config changes are in. I also took the three relevant PRs that updated the fusion_compiler and ported them to the split. I am unsure of what's going on with the |
torch/csrc/jit/fusers/Config.h
Outdated
| @@ -0,0 +1,4 @@ | |||
| #pragma once | |||
|
|
||
| CONFIGURE_FILE( | ||
| ${TORCH_SRC_DIR}/csrc/jit/fusers/Config.h.in | ||
| ${CMAKE_CURRENT_SOURCE_DIR}/csrc/jit/fusers/Config.h) |
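One way to make guards such as `#if USE_CPU_FUSER || USE_CUDA_FUSER` less fragile is for the configured header to always define both macros as 0 or 1, so that a missing include fails loudly instead of silently compiling the fuser out. The sketch below is an assumption about what such a setup could look like, not the contents of the actual Config.h.in. The macro values are hard-coded here and the `enabled_fusers` helper is hypothetical.

```cpp
#include <string>

// Hypothetical stand-in for a generated Config.h. In a real build these two
// values would be filled in by the build system rather than written by hand.
#define USE_CPU_FUSER 1
#define USE_CUDA_FUSER 0

// An undefined macro evaluates to 0 inside #if, so a forgotten include would
// otherwise disable the guarded code without any diagnostic.
#if !defined(USE_CPU_FUSER) || !defined(USE_CUDA_FUSER)
#error "Config.h must define both USE_CPU_FUSER and USE_CUDA_FUSER"
#endif

// Illustrative helper: reports which fusers this translation unit enables.
inline std::string enabled_fusers() {
  std::string out;
#if USE_CPU_FUSER
  out += "cpu";
#endif
#if USE_CUDA_FUSER
  if (!out.empty()) {
    out += ",";
  }
  out += "cuda";
#endif
  return out.empty() ? "none" : out;
}
```

With the values above, `enabled_fusers()` returns `"cpu"`. Defining every flag unconditionally also lets the `defined(...)` check catch a header that was never included.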
|
@mruberry CI failure looks related |
|
Hmm that might be a transient issue. I can't reproduce this locally. Let's see if a retest fixes it. |
|
Nope, this seems to consistently fail 😕 |
|
CPU fuser tests are flaky and possibly broken (even on master). See #11360 for some discussion. I'm not sure why, but I think the pr/pytorch-linux-trusty-py2.7 machine runs out of memory; this needs more investigation. We are looking into it. |
|
It is strange that three tests were failing consistently (test_scalar_fusion was also failing until it was disabled), and this is mentioned above. I suspect these failures are revealed by this PR more than caused by it. |
|
I agree, but unfortunately we can't merge this until we resolve the problem. It's likely that we will disable the CPU fuser by default, but we don't want to disable its tests entirely, since breakages of that code path would then go undetected. @zou3519 is looking into that |
|
@pytorchbot retest this please |
|
I'm still looking into it, sorry for the delay. Some strange things are going on |
|
I posted an update in #11360 about what I've found. The summary is that test_jit.py has high peak memory usage that causes the graph fuser to fail when it runs the The fastest solution to unblock this PR would be to move the fuser tests to run first in test_jit.py so they run before test_jit starts using a lot of memory. A more robust solution (which I am looking into right now) is to figure out why test_jit.py uses so much memory (> 4 GB) and fix that. |
Run TestEndToEndHybridFrontendModels last. It has high peak memory usage that causes unrelated CPU fuser test failures if those tests run after. A more robust fix for this issue is being tracked in pytorch#11360
facebook-github-bot
left a comment
apaszke has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
|
Hmm, while OSS tests are passing, this has unfortunately triggered a failure in some internal tests... I'll push some code that disables the fuser by default (except for the tests) tomorrow morning. |
Also, disable grad mode in _check_trace, which greatly decreases peak memory usage when inputs with requires_grad are used to trace.
|
I have disabled the CPU fuser by default, except for the few tests that should actually exercise it. The memory usage regression should now be addressed thanks to @zou3519, who noticed that we run |
facebook-github-bot
left a comment
apaszke has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: This patch adds fused forward and backward for clamp to the jit. This is one item of #11118. If it's OK, I'd be happy to also add some more of #11118. The patch depends on #11150, which I merged into master as a base. I'll rebase it when that or #10981 is merged. This is my first serious jit patch; thank you, ngimel and the others, for your guidance. All errors are my own. Pull Request resolved: #11574 Differential Revision: D9943090 Pulled By: apaszke fbshipit-source-id: c40954b8c28c374baab8d3bd89acc9250580dc67
This PR splits the CPU and CUDA fusion compilers, putting them into a new jit/fusers/ directory with jit/fusers/common for common components. In particular:
This structure should allow in-flight PRs to easily rebase while providing a clear interface to the fusers.