profile ivalue for nvfuser #47668

Closed
jjsjann123 wants to merge 15 commits into gh/jjsjann123/4/base from gh/jjsjann123/4/head

Conversation

Collaborator

@jjsjann123 jjsjann123 commented Nov 10, 2020

Stack from ghstack:

Differential Revision: D25255571

[ghstack-poisoned]
@jjsjann123 jjsjann123 requested a review from apaszke as a code owner November 10, 2020 12:04
jjsjann123 added a commit that referenced this pull request Nov 10, 2020
ghstack-source-id: 207164a
Pull Request resolved: #47668
@facebook-github-bot facebook-github-bot added the "cla signed" and "oncall: jit" (Add this issue/PR to JIT oncall triage queue) labels Nov 10, 2020

dr-ci Bot commented Nov 10, 2020

💊 CI failures summary and remediations

As of commit afbff83 (more details on the Dr. CI page):


  • 9/9 failures possibly* introduced in this PR
    • 2/9 non-CircleCI failure(s)

🕵️ 7 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_bionic_py3_6_clang9_test (1/7)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jan 13 08:29:40 AssertionError: False is not true :
Jan 13 08:29:40   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 596, in wrapper
Jan 13 08:29:40     fn(*args, **kwargs)
Jan 13 08:29:40   File "test_jit.py", line 15749, in do_test
Jan 13 08:29:40     run_test()
Jan 13 08:29:40   File "test_jit.py", line 15743, in run_test
Jan 13 08:29:40     self.assertAutodiffNode(script_fn.last_graph, should_autodiff_node, autodiff_nodes, fusible_nodes)
Jan 13 08:29:40   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_jit.py", line 220, in assertAutodiffNode
Jan 13 08:29:40     found_all_nonfusible_nodes and found_all_fusible_nodes, err_msg)
Jan 13 08:29:40   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1216, in assertEqual
Jan 13 08:29:40     super().assertTrue(x == y, msg=msg)
Jan 13 08:29:40 AssertionError: False is not true : 
Jan 13 08:29:40 Failure in testing nodes' autodifferentiation, one or more nodes were expected to be autodiffed, but were not found in specified fusible/nonfusible DifferentiableGraph groups. 
Jan 13 08:29:40 Specifically:
Jan 13 08:29:40   aten::softmax was not in one of the DifferentiableGraphs when it was expected to be. Did you intend for this node to be autodiffed? If not, remove it from the list of nonfusible nodes.
Jan 13 08:29:40 
Jan 13 08:29:40 ----------------------------------------------------------------------
Jan 13 08:29:40 Ran 2887 tests in 166.777s
Jan 13 08:29:40 
Jan 13 08:29:40 FAILED (failures=84, errors=1, skipped=76, expected failures=1)
Jan 13 08:29:40 
Jan 13 08:29:40 Generating XML reports...

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (2/7)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jan 13 08:31:49 AssertionError: False is not true :
Jan 13 08:31:49   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 596, in wrapper
Jan 13 08:31:49     fn(*args, **kwargs)
Jan 13 08:31:49   File "test_jit.py", line 15749, in do_test
Jan 13 08:31:49     run_test()
Jan 13 08:31:49   File "test_jit.py", line 15743, in run_test
Jan 13 08:31:49     self.assertAutodiffNode(script_fn.last_graph, should_autodiff_node, autodiff_nodes, fusible_nodes)
Jan 13 08:31:49   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_jit.py", line 220, in assertAutodiffNode
Jan 13 08:31:49     found_all_nonfusible_nodes and found_all_fusible_nodes, err_msg)
Jan 13 08:31:49   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1216, in assertEqual
Jan 13 08:31:49     super().assertTrue(x == y, msg=msg)
Jan 13 08:31:49 AssertionError: False is not true : 
Jan 13 08:31:49 Failure in testing nodes' autodifferentiation, one or more nodes were expected to be autodiffed, but were not found in specified fusible/nonfusible DifferentiableGraph groups. 
Jan 13 08:31:49 Specifically:
Jan 13 08:31:49   aten::softmax was not in one of the DifferentiableGraphs when it was expected to be. Did you intend for this node to be autodiffed? If not, remove it from the list of nonfusible nodes.
Jan 13 08:31:49 
Jan 13 08:31:49 ----------------------------------------------------------------------
Jan 13 08:31:49 Ran 2887 tests in 151.740s
Jan 13 08:31:49 
Jan 13 08:31:49 FAILED (failures=84, errors=1, skipped=76, expected failures=1)
Jan 13 08:31:49 
Jan 13 08:31:49 Generating XML reports...

See CircleCI build pytorch_linux_xenial_py3_clang5_asan_test2 (3/7)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jan 13 08:43:32 AssertionError: False is not true :
Jan 13 08:43:32   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 596, in wrapper
Jan 13 08:43:32     fn(*args, **kwargs)
Jan 13 08:43:32   File "test_jit.py", line 15749, in do_test
Jan 13 08:43:32     run_test()
Jan 13 08:43:32   File "test_jit.py", line 15743, in run_test
Jan 13 08:43:32     self.assertAutodiffNode(script_fn.last_graph, should_autodiff_node, autodiff_nodes, fusible_nodes)
Jan 13 08:43:32   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_jit.py", line 220, in assertAutodiffNode
Jan 13 08:43:32     found_all_nonfusible_nodes and found_all_fusible_nodes, err_msg)
Jan 13 08:43:32   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1216, in assertEqual
Jan 13 08:43:32     super().assertTrue(x == y, msg=msg)
Jan 13 08:43:32 AssertionError: False is not true : 
Jan 13 08:43:32 Failure in testing nodes' autodifferentiation, one or more nodes were expected to be autodiffed, but were not found in specified fusible/nonfusible DifferentiableGraph groups. 
Jan 13 08:43:32 Specifically:
Jan 13 08:43:32   aten::softmax was not in one of the DifferentiableGraphs when it was expected to be. Did you intend for this node to be autodiffed? If not, remove it from the list of nonfusible nodes.
Jan 13 08:43:32 
Jan 13 08:43:32 ----------------------------------------------------------------------
Jan 13 08:43:32 Ran 2856 tests in 739.654s
Jan 13 08:43:32 
Jan 13 08:43:32 FAILED (failures=84, errors=1, skipped=76, expected failures=1)
Jan 13 08:43:32 
Jan 13 08:43:32 Generating XML reports...

See CircleCI build pytorch_macos_10_13_py3_test (4/7)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

Jan 13 07:29:31 AssertionError: False is not true :
Jan 13 07:29:31   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 596, in wrapper
Jan 13 07:29:31     fn(*args, **kwargs)
Jan 13 07:29:31   File "test_jit.py", line 15749, in do_test
Jan 13 07:29:31     run_test()
Jan 13 07:29:31   File "test_jit.py", line 15743, in run_test
Jan 13 07:29:31     self.assertAutodiffNode(script_fn.last_graph, should_autodiff_node, autodiff_nodes, fusible_nodes)
Jan 13 07:29:31   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_jit.py", line 220, in assertAutodiffNode
Jan 13 07:29:31     found_all_nonfusible_nodes and found_all_fusible_nodes, err_msg)
Jan 13 07:29:31   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 1216, in assertEqual
Jan 13 07:29:31     super().assertTrue(x == y, msg=msg)
Jan 13 07:29:31 AssertionError: False is not true : 
Jan 13 07:29:31 Failure in testing nodes' autodifferentiation, one or more nodes were expected to be autodiffed, but were not found in specified fusible/nonfusible DifferentiableGraph groups. 
Jan 13 07:29:31 Specifically:
Jan 13 07:29:31   aten::softmax was not in one of the DifferentiableGraphs when it was expected to be. Did you intend for this node to be autodiffed? If not, remove it from the list of nonfusible nodes.
Jan 13 07:29:31 
Jan 13 07:29:31 ----------------------------------------------------------------------
Jan 13 07:29:31 Ran 2887 tests in 199.141s
Jan 13 07:29:31 
Jan 13 07:29:31 FAILED (failures=84, errors=1, skipped=107, expected failures=1)
Jan 13 07:29:31 
Jan 13 07:29:31 Generating XML reports...

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 (5/7)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jan 13 10:27:03 AssertionError: False is not true :
Jan 13 10:27:03   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 596, in wrapper
Jan 13 10:27:03     fn(*args, **kwargs)
Jan 13 10:27:03   File "test_jit.py", line 15749, in do_test
Jan 13 10:27:03     run_test()
Jan 13 10:27:03   File "test_jit.py", line 15743, in run_test
Jan 13 10:27:03     self.assertAutodiffNode(script_fn.last_graph, should_autodiff_node, autodiff_nodes, fusible_nodes)
Jan 13 10:27:03   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_jit.py", line 220, in assertAutodiffNode
Jan 13 10:27:03     found_all_nonfusible_nodes and found_all_fusible_nodes, err_msg)
Jan 13 10:27:03   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1216, in assertEqual
Jan 13 10:27:03     super().assertTrue(x == y, msg=msg)
Jan 13 10:27:03 AssertionError: False is not true : 
Jan 13 10:27:03 Failure in testing nodes' autodifferentiation, one or more nodes were expected to be autodiffed, but were not found in specified fusible/nonfusible DifferentiableGraph groups. 
Jan 13 10:27:03 Specifically:
Jan 13 10:27:03   aten::softmax was not in one of the DifferentiableGraphs when it was expected to be. Did you intend for this node to be autodiffed? If not, remove it from the list of nonfusible nodes.
Jan 13 10:27:03 
Jan 13 10:27:03 ----------------------------------------------------------------------
Jan 13 10:27:03 Ran 2889 tests in 209.434s
Jan 13 10:27:03 
Jan 13 10:27:03 FAILED (failures=73, errors=1, skipped=43, expected failures=1)
Jan 13 10:27:03 
Jan 13 10:27:03 Generating XML reports...

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test2 (6/7)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Jan 13 08:13:20 AssertionError: False is not true :
Jan 13 08:13:20   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 596, in wrapper
Jan 13 08:13:20     fn(*args, **kwargs)
Jan 13 08:13:20   File "test_jit.py", line 15749, in do_test
Jan 13 08:13:20     run_test()
Jan 13 08:13:20   File "test_jit.py", line 15743, in run_test
Jan 13 08:13:20     self.assertAutodiffNode(script_fn.last_graph, should_autodiff_node, autodiff_nodes, fusible_nodes)
Jan 13 08:13:20   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_jit.py", line 219, in assertAutodiffNode
Jan 13 08:13:20     self.assertEqual(should_autodiff_node,
Jan 13 08:13:20   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_utils.py", line 1216, in assertEqual
Jan 13 08:13:20     super().assertTrue(x == y, msg=msg)
Jan 13 08:13:20 AssertionError: False is not true : 
Jan 13 08:13:20 Failure in testing nodes' autodifferentiation, one or more nodes were expected to be autodiffed, but were not found in specified fusible/nonfusible DifferentiableGraph groups. 
Jan 13 08:13:20 Specifically:
Jan 13 08:13:20   aten::softmax was not in one of the DifferentiableGraphs when it was expected to be. Did you intend for this node to be autodiffed? If not, remove it from the list of nonfusible nodes.
Jan 13 08:13:20 
Jan 13 08:13:21 ----------------------------------------------------------------------
Jan 13 08:13:21 Ran 2887 tests in 361.683s
Jan 13 08:13:21 
Jan 13 08:13:21 FAILED (failures=84, errors=1, skipped=76, expected failures=1)
Jan 13 08:13:21 
Jan 13 08:13:21 Generating XML reports...

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test2 (7/7)

Step: "Test" (full log | diagnosis details | 🔁 rerun)

AssertionError: False is not true :
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 596, in wrapper
    fn(*args, **kwargs)
  File "test_jit.py", line 15749, in do_test
    run_test()
  File "test_jit.py", line 15743, in run_test
    self.assertAutodiffNode(script_fn.last_graph, should_autodiff_node, autodiff_nodes, fusible_nodes)
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_jit.py", line 220, in assertAutodiffNode
    found_all_nonfusible_nodes and found_all_fusible_nodes, err_msg)
  File "C:\Users\circleci\project\build\win_tmp\build\torch\testing\_internal\common_utils.py", line 1216, in assertEqual
    super().assertTrue(x == y, msg=msg)
AssertionError: False is not true : 
Failure in testing nodes' autodifferentiation, one or more nodes were expected to be autodiffed, but were not found in specified fusible/nonfusible DifferentiableGraph groups. 
Specifically:
  aten::softmax was not in one of the DifferentiableGraphs when it was expected to be. Did you intend for this node to be autodiffed? If not, remove it from the list of nonfusible nodes.

----------------------------------------------------------------------
Ran 2889 tests in 177.091s

FAILED (failures=73, errors=1, skipped=82, expected failures=1)

Generating XML reports...

Extra GitHub checks: 1 failed


ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI. Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

jjsjann123 added a commit that referenced this pull request Nov 20, 2020
ghstack-source-id: 9d2d2c2
Pull Request resolved: #47668
Contributor

eellison commented Dec 2, 2020

@jjsjann123 can you get someone from NVIDIA to review and accept before we land this? @Krovatkin

@jjsjann123 jjsjann123 requested a review from csarofeen December 3, 2020 14:10
void profileIntList(ProfilingRecord* pr, Node* node, size_t offset) {
  auto pn = insertProfileIValueOp(node, offset, pr);

  std::function<void(Stack&)> ivalue_profiler = [pr, pn](Stack& stack) {
Contributor

why assign to a std::function here (rather than, say, const auto)?
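
For illustration, the const auto form would look roughly like the sketch below; this is a sketch only, and the setCallback registration and the lambda body are assumptions about how the profile node is wired up, not code from this PR:

// Sketch: bind the lambda to const auto and let it convert to
// std::function<void(Stack&)> at the point where the callback is registered
// (assumed here to be a setCallback setter on the profile node).
const auto ivalue_profiler = [pr, pn](Stack& stack) {
  // hypothetical body: pop the profiled IValue, record it, push it back
  IValue value;
  pop(stack, value);
  // ... record `value` against `pn` ...
  push(stack, value);
};
pn->setCallback(ivalue_profiler);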

void profileBool(ProfilingRecord* pr, Node* node, size_t offset) {
  auto pn = insertProfileIValueOp(node, offset, pr);

  std::function<void(Stack&)> ivalue_profiler = [pr, pn](Stack& stack) {
Contributor

ditto

  return IrParser::canParseNode(node);
}

// TODO: we should incorporate this to our parser as well;
Contributor

It's not clear what "our parser" refers to in this context; could you replace "our parser" with a more explicit reference?

      ->schema();
  if (node->matches(reduction_operator_schema)) {
    switch (offset) {
      case 1:
Contributor

what do 1 and 2 mean here?
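
For context, a hedged reading: the 1 and 2 appear to be argument positions in the matched reduction schema. Assuming a schema along the lines of aten::sum.dim_IntList(Tensor self, int[1] dim, bool keepdim=False, *, ScalarType? dtype=None) -> Tensor (illustrative, not taken from this diff), the switch would read roughly as:

switch (offset) {
  case 1: // argument position 1: the reduction axes (int[] dim)
    profileIntList(pr, node, offset);
    break;
  case 2: // argument position 2: the keepdim flag (bool)
    profileBool(pr, node, offset);
    break;
  default:
    break; // other argument positions are not profiled in this sketch
}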

}

void insertProfileNodesForCUDAFuser_(Block* block, ProfilingRecord* pr) {
  for (auto it = block->nodes().begin(); it != block->nodes().end(); ++it) {
Contributor

for (const auto& n : block->nodes()) ?
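
A sketch of that range-for form, assuming a hypothetical insertProfileIValue helper for the loop body and recursion into sub-blocks (neither is taken from this diff):

void insertProfileNodesForCUDAFuser_(Block* block, ProfilingRecord* pr) {
  for (const auto& n : block->nodes()) {
    // hypothetical body: try to profile each input position of the node
    for (size_t offset = 0; offset < n->inputs().size(); offset++) {
      insertProfileIValue(pr, n, offset);
    }
    // recurse into nested blocks (e.g. if/loop bodies)
    for (auto inner : n->blocks()) {
      insertProfileNodesForCUDAFuser_(inner, pr);
    }
  }
}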


// TODO: failure in buildShapeExpressions should not break fusion execution,
// we can add a try/catch here to bailout from removeOutputsUsedOnlyInSize.
GRAPH_DUMP("before build shape expression: ", graph_);
Contributor

does this unconditionally dump the graph?

Collaborator Author

Nope, this is only enabled via an environment variable:

// `TorchScript` offers a simple logging facility that can be enabled by setting an
// environment variable `PYTORCH_JIT_LOG_LEVEL`.
// Logging is enabled on a per file basis. To enable logging in
// `dead_code_elimination.cpp`, `PYTORCH_JIT_LOG_LEVEL` should be
// set to `dead_code_elimination.cpp` or, simply, to `dead_code_elimination`
// (i.e. `PYTORCH_JIT_LOG_LEVEL=dead_code_elimination`).
// Multiple files can be logged by separating each file name with a colon `:` as
// in the following example,
// `PYTORCH_JIT_LOG_LEVEL=dead_code_elimination:guard_elimination`
// There are 3 logging levels available for your use ordered by the detail level
// from lowest to highest.
// * `GRAPH_DUMP` should be used for printing entire graphs after optimization
// passes
// * `GRAPH_UPDATE` should be used for reporting graph transformations (i.e.
// node deletion, constant folding, etc)
// * `GRAPH_DEBUG` should be used for providing information useful for debugging
// the internals of a particular optimization pass or analysis
// The default logging level is `GRAPH_DUMP` meaning that only `GRAPH_DUMP`
// statements will be enabled when one specifies a file(s) in
// `PYTORCH_JIT_LOG_LEVEL`.
// `GRAPH_UPDATE` can be enabled by prefixing a file name with an `>` as in
// `>alias_analysis`.
// `GRAPH_DEBUG` can be enabled by prefixing a file name with an `>>` as in
// `>>alias_analysis`.
// `>>>` is also valid and **currently** is equivalent to `GRAPH_DEBUG` as there
// is no logging level that is higher than `GRAPH_DEBUG`.
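
For example (an illustrative file name, not specific to this PR):

//   PYTORCH_JIT_LOG_LEVEL=graph_fuser       -> GRAPH_DUMP statements in graph_fuser.cpp
//   PYTORCH_JIT_LOG_LEVEL=">graph_fuser"    -> GRAPH_DUMP + GRAPH_UPDATE
//   PYTORCH_JIT_LOG_LEVEL=">>graph_fuser"   -> GRAPH_DUMP + GRAPH_UPDATE + GRAPH_DEBUG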

jjsjann123 added a commit that referenced this pull request Dec 3, 2020
ghstack-source-id: 4b784cb
Pull Request resolved: #47668
jjsjann123 added a commit that referenced this pull request Dec 8, 2020
ghstack-source-id: c9106c9
Pull Request resolved: #47668
jjsjann123 added a commit that referenced this pull request Dec 8, 2020
ghstack-source-id: 129a345
Pull Request resolved: #47668
jjsjann123 added a commit to csarofeen/pytorch that referenced this pull request Dec 19, 2020
…or nvfuser.

We tried to work around PR pytorch#47667 (refactor profiling optional), since upstream is
still working on it at this time.
jjsjann123 added a commit to csarofeen/pytorch that referenced this pull request Jan 5, 2021
* This is a cherry-pick of upstream PR pytorch#47668 (profile ivalue for nvfuser). We tried to work around PR pytorch#47667 (refactor profiling optional), since upstream is still working on it at this time.

* createConditionalConstant supports profile ivalue including bool, int_list and size
* New guard to check conditional constant at runtime
* size_eq_guard op to facilitate comparison of dynamic sizes
* sum_to_size & _grad_sum_to_size added in integration
jjsjann123 added a commit that referenced this pull request Jan 12, 2021
ghstack-source-id: 29c6d7a
Pull Request resolved: #47668
jjsjann123 added a commit that referenced this pull request Jan 12, 2021
ghstack-source-id: 5a3aa53
Pull Request resolved: #47668
jjsjann123 added a commit that referenced this pull request Jan 13, 2021
ghstack-source-id: b0654c7
Pull Request resolved: #47668
jjsjann123 added a commit that referenced this pull request Jan 16, 2021
ghstack-source-id: 095feb5
Pull Request resolved: #47668
jjsjann123 added a commit that referenced this pull request Jan 19, 2021
ghstack-source-id: f6203c7
Pull Request resolved: #47668
jjsjann123 added a commit that referenced this pull request Jan 20, 2021
ghstack-source-id: 343bbfc
Pull Request resolved: #47668
@jjsjann123 jjsjann123 closed this Jan 29, 2021
@facebook-github-bot facebook-github-bot deleted the gh/jjsjann123/4/head branch March 1, 2021 15:16