
[ONNX] disable size optimizations#35401

Closed
eellison wants to merge 18 commits into pytorch:master from eellison:onnx_disable_size_opt

Conversation

@eellison
Contributor

Seeing which tests fail in the CI.

@eellison eellison requested a review from BowenBao March 25, 2020 19:14
@eellison eellison requested a review from apaszke as a code owner March 25, 2020 19:14
@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Mar 25, 2020
@dr-ci

dr-ci Bot commented Mar 25, 2020

💊 CircleCI build failures summary and remediations

As of commit 96af1f3 (more details on the Dr. CI page):


  • 2/6 failures introduced in this PR

  • 4/6 broken upstream at merge base bf24753 on Mar 26 from 11:29am to 1:36pm PDT (4 commits; bf24753 - 4d39aee)

    Please rebase on the viable/strict branch:

    Since your merge base is older than viable/strict, run these commands:

    git fetch https://github.com/pytorch/pytorch viable/strict
    git rebase FETCH_HEAD
    

    Check out the recency history of this "viable master" tracking branch.


🕵️ 2 new failures recognized by patterns

The following build failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/2)

Step: "Build" <confirmed not flaky by 2 failures>

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/dimensions.py 
Auto-merging .circleci/cimodel/data/dimensions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/caffe2_build_definitions.py 
Auto-merging .circleci/cimodel/data/caffe2_build_definitions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/caffe2_build_data.py 
Auto-merging .circleci/cimodel/data/caffe2_build_data.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/binary_build_data.py 
Auto-merging .circleci/cimodel/data/binary_build_data.py 
CONFLICT (add/add): Merge conflict in .circleci/README.md 
Auto-merging .circleci/README.md 
Automatic merge failed; fix conflicts and then commit the result. 

See CircleCI build pytorch_xla_linux_xenial_py3_6_clang7_test (2/2)

Step: "Test" <confirmed not flaky by 2 failures>

Mar 31 19:44:48 caused by: Connection refused (os error 111)
Mar 31 19:44:48 +++ eval 'extract_trap_cmd ' 
Mar 31 19:44:48 ++++ extract_trap_cmd 
Mar 31 19:44:48 ++++ printf '%s\n' '' 
Mar 31 19:44:48 +++ printf '%s\n' cleanup 
Mar 31 19:44:48 ++ trap -- ' 
Mar 31 19:44:48 cleanup' EXIT 
Mar 31 19:44:48 ++ which sccache 
Mar 31 19:44:48 ++ sccache --stop-server 
Mar 31 19:44:48 Stopping sccache server... 
Mar 31 19:44:48 error: couldn't connect to server 
Mar 31 19:44:48 caused by: Connection refused (os error 111) 
Mar 31 19:44:48 ++ true 
Mar 31 19:44:48 ++ rm /var/lib/jenkins/sccache_error.log 
Mar 31 19:44:48 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 
Mar 31 19:44:48 ++ SCCACHE_IDLE_TIMEOUT=1200 
Mar 31 19:44:48 ++ RUST_LOG=sccache::server=error 
Mar 31 19:44:48 ++ sccache --start-server 
Mar 31 19:44:48 Starting sccache server... 
Mar 31 19:44:48 ++ sccache --zero-stats 
Mar 31 19:44:48 Compile requests                 0 
Mar 31 19:44:48 Compile requests executed        0 

1 job timed out:

  • pytorch_linux_xenial_py3_clang5_asan_test

🚧 3 upstream failures:

These were probably caused by upstream breakages:


This comment was automatically generated by Dr. CI.

This comment has been revised 28 times.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@eellison has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@eellison eellison force-pushed the onnx_disable_size_opt branch from 859ae96 to ce8e670 Compare March 25, 2020 22:02
Contributor

@facebook-github-bot facebook-github-bot left a comment


@eellison has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@eellison eellison closed this Mar 26, 2020
@eellison eellison reopened this Mar 26, 2020
@BowenBao
Collaborator

BowenBao commented Mar 27, 2020

@eellison the failure in the ONNX test test_dim reveals an issue with the JIT rather than with ONNX export. I have created a small repro for you to look at.

import torch

class DimModel(torch.nn.Module):
    def forward(self, input):
        out = input * 2
        out *= out.dim()
        return out

m = torch.jit.script(DimModel())

input_1 = torch.arange(6).view(2, 3)
print(m(input_1))
"""
outputs tensor([[ 0,  4,  8],
        [12, 16, 20]])
"""

input_2 = torch.arange(6).view(1, 2, 3)
print(m(input_2))
"""
outputs tensor([[[ 0,  4,  8],
         [12, 16, 20]]])

but should be tensor([[[ 0,  6, 12],
         [18, 24, 30]]])
The correct result can also be produced if DimModel()(input_1) is commented
"""
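The rank-dependent behavior described above, where running the model on input_1 first changes the result for input_2, looks like a specialized graph being cached and then reused for an input of a different rank. A rough, torch-free analogy of that failure mode (all names below are invented for illustration, not actual JIT internals):

```python
# Toy "compiler" that caches a specialized function per input, but whose cache
# key is too coarse: it ignores the input's rank, so the ndim captured at first
# call is baked in and reused for inputs of a different rank -- the same
# symptom as the DimModel repro above.
_cache = {}

def scripted_forward(shape, values):
    """Toy stand-in for a scripted forward: out = x * 2 * ndim."""
    key = "float32"  # cache key ignores rank, like an over-broadly reused specialized graph
    if key not in _cache:
        ndim = len(shape)  # rank is captured ("constant-folded") at specialization time
        _cache[key] = lambda vals: [v * 2 * ndim for v in vals]
    return _cache[key](values)

print(scripted_forward((2, 3), [1]))     # specializes with ndim=2 -> [4]
print(scripted_forward((1, 2, 3), [1]))  # stale ndim=2 reused -> [4], should be [6]
```

Including the rank in the cache key would fix the toy version; getting that invalidation right is roughly what graph specialization has to guarantee.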

edit: cc @houseroad

@eellison
Contributor Author

@BowenBao when I check the results against eager, I don't see any difference:

import torch

class DimModel(torch.nn.Module):
    def forward(self, input):
        out = input * 2
        out *= out.dim()
        return out

m = torch.jit.script(DimModel())
eager = DimModel()

input_1 = torch.arange(6).view(2, 3)
self.assertEqual(eager(input_1), m(input_1))

input_2 = torch.arange(6).view(1, 2, 3)
self.assertEqual(eager(input_2), m(input_2))

@BowenBao
Collaborator

@eellison what is self in the snippet above? I changed self.assertEqual to print and got the following

tensor([[ 0,  4,  8],
        [12, 16, 20]]) tensor([[ 0,  4,  8],
        [12, 16, 20]])
tensor([[[ 0,  6, 12],
         [18, 24, 30]]]) tensor([[[ 0,  4,  8],
         [12, 16, 20]]])

Could you verify that you are using the PyTorch build from this PR?

@eellison
Contributor Author

Yep, I ran it again on 3a9fc1265dad403479a35f405452853ae0ae6ed8, no failure.

import torch
class DimModel(torch.nn.Module):
    def forward(self, input):
        out = input * 2
        out *= out.dim()
        return out

m = torch.jit.script(DimModel())
eager = DimModel()

input_1 = torch.arange(6).view(2, 3)
print(eager(input_1), m(input_1))
# tensor([[ 0,  4,  8],
#        [12, 16, 20]]) tensor([[ 0,  4,  8],
#        [12, 16, 20]])

input_2 = torch.arange(6).view(1, 2, 3)
print(eager(input_2), m(input_2))
# tensor([[[ 0,  6, 12],
#         [18, 24, 30]]]) tensor([[[ 0,  6, 12],
#        [18, 24, 30]]])

@BowenBao
Collaborator

@eellison that's strange, this is the repro I fetched from the CI failure.

Mar 27 00:10:14     def test_dim(self):
Mar 27 00:10:14         class DimModel(torch.jit.ScriptModule):
Mar 27 00:10:14             @torch.jit.script_method
Mar 27 00:10:14             def forward(self, input):
Mar 27 00:10:14                 out = input * 2
Mar 27 00:10:14                 out *= out.dim()
Mar 27 00:10:14                 return out
Mar 27 00:10:14         empty_input = torch.randn(0, requires_grad=True)
Mar 27 00:10:14         multi_dim_input = torch.randn(1, 2, 3, requires_grad=True)
Mar 27 00:10:14         self.run_test(DimModel(), empty_input)
Mar 27 00:10:14 >       self.run_test(DimModel(), multi_dim_input)
...
Mar 27 00:10:14 >   [np.testing.assert_allclose(out, ort_out, rtol=rtol, atol=atol) for out, ort_out in zip(outputs, ort_outs)]
Mar 27 00:10:14 E   AssertionError: 
Mar 27 00:10:14 E   Not equal to tolerance rtol=0.001, atol=1e-07
Mar 27 00:10:14 E   Mismatch: 100%
Mar 27 00:10:14 E   Max absolute difference: 8.715158
Mar 27 00:10:14 E   Max relative difference: 0.6666667
Mar 27 00:10:14 E    x: array([[[ 3.081992, -0.586858, -4.357579],
Mar 27 00:10:14 E           [ 1.136863, -2.169045, -2.797191]]], dtype=float32)
Mar 27 00:10:14 E    y: array([[[  9.245976,  -1.760573, -13.072737],
Mar 27 00:10:14 E           [  3.410588,  -6.507134,  -8.391573]]], dtype=float32)
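One observation about the numbers in this failure (my own check, not from the thread): the reported max relative difference of 0.6666667 means one output is exactly 3x the other, which is consistent with `out.dim()` having been constant-folded as 1 (the rank of `empty_input`, run first) on one side while the other used the true rank 3. This can be verified directly from the logged arrays:

```python
import numpy as np

# x and y are copied verbatim from the assert_allclose failure above.
x = np.array([[[3.081992, -0.586858, -4.357579],
               [1.136863, -2.169045, -2.797191]]], dtype=np.float32)
y = np.array([[[9.245976, -1.760573, -13.072737],
               [3.410588, -6.507134, -8.391573]]], dtype=np.float32)

# y is elementwise 3 * x, i.e. one side multiplied by dim()=1 and the other by 3.
np.testing.assert_allclose(3 * x, y, rtol=1e-3, atol=1e-7)
```

That ratio would also explain the reported max relative difference: |y - x| / |y| = 2/3 everywhere.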

@eellison eellison closed this Apr 6, 2020
facebook-github-bot pushed a commit that referenced this pull request Apr 10, 2020
Summary:
Reviving this PR #35401 eellison. I believe after the profiled graph executor fix the test failures are handled.
Pull Request resolved: #36243

Differential Revision: D20950623

Pulled By: eellison

fbshipit-source-id: 5fbee426d1a098d84d5938540d45ce00828299be
ashishfarmer pushed a commit to ashishfarmer/pytorch that referenced this pull request Apr 13, 2020
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026

Labels

oncall: jit Add this issue/PR to JIT oncall triage queue


3 participants