
[ONNX] disable size optimizations#35401

Closed
eellison wants to merge 18 commits into pytorch:master from eellison:onnx_disable_size_opt

Conversation

@eellison
Contributor

Seeing which tests fail in the CI.

@eellison eellison requested a review from BowenBao March 25, 2020 19:14
@eellison eellison requested a review from apaszke as a code owner March 25, 2020 19:14
@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Mar 25, 2020
@dr-ci

dr-ci Bot commented Mar 25, 2020

💊 CircleCI build failures summary and remediations

As of commit 96af1f3 (more details on the Dr. CI page):


  • 2/6 failures introduced in this PR

  • 4/6 broken upstream at merge base bf24753 on Mar 26 from 11:29am to 1:36pm PDT (4 commits; bf24753 - 4d39aee)

    Please rebase on the viable/strict branch:

    Since your merge base is older than viable/strict, run these commands:

    git fetch https://github.com/pytorch/pytorch viable/strict
    git rebase FETCH_HEAD
    

    Check out the recency history of this "viable master" tracking branch.


🕵️ 2 new failures recognized by patterns

The following build failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/2)

Step: "Build" <confirmed not flaky by 2 failures>

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/dimensions.py 
Auto-merging .circleci/cimodel/data/dimensions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/caffe2_build_definitions.py 
Auto-merging .circleci/cimodel/data/caffe2_build_definitions.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/caffe2_build_data.py 
Auto-merging .circleci/cimodel/data/caffe2_build_data.py 
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/binary_build_data.py 
Auto-merging .circleci/cimodel/data/binary_build_data.py 
CONFLICT (add/add): Merge conflict in .circleci/README.md 
Auto-merging .circleci/README.md 
Automatic merge failed; fix conflicts and then commit the result. 

See CircleCI build pytorch_xla_linux_xenial_py3_6_clang7_test (2/2)

Step: "Test" <confirmed not flaky by 2 failures>

Mar 31 19:44:48 caused by: Connection refused (os error 111)
Mar 31 19:44:48 +++ eval 'extract_trap_cmd ' 
Mar 31 19:44:48 ++++ extract_trap_cmd 
Mar 31 19:44:48 ++++ printf '%s\n' '' 
Mar 31 19:44:48 +++ printf '%s\n' cleanup 
Mar 31 19:44:48 ++ trap -- ' 
Mar 31 19:44:48 cleanup' EXIT 
Mar 31 19:44:48 ++ which sccache 
Mar 31 19:44:48 ++ sccache --stop-server 
Mar 31 19:44:48 Stopping sccache server... 
Mar 31 19:44:48 error: couldn't connect to server 
Mar 31 19:44:48 caused by: Connection refused (os error 111) 
Mar 31 19:44:48 ++ true 
Mar 31 19:44:48 ++ rm /var/lib/jenkins/sccache_error.log 
Mar 31 19:44:48 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 
Mar 31 19:44:48 ++ SCCACHE_IDLE_TIMEOUT=1200 
Mar 31 19:44:48 ++ RUST_LOG=sccache::server=error 
Mar 31 19:44:48 ++ sccache --start-server 
Mar 31 19:44:48 Starting sccache server... 
Mar 31 19:44:48 ++ sccache --zero-stats 
Mar 31 19:44:48 Compile requests                 0 
Mar 31 19:44:48 Compile requests executed        0 

1 job timed out:

  • pytorch_linux_xenial_py3_clang5_asan_test

🚧 3 upstream failures:

These were probably caused by upstream breakages:


This comment was automatically generated by Dr. CI.

This comment has been revised 28 times.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@eellison has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@eellison eellison force-pushed the onnx_disable_size_opt branch from 859ae96 to ce8e670 Compare March 25, 2020 22:02
Contributor

@facebook-github-bot facebook-github-bot left a comment


@eellison has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@eellison eellison closed this Mar 26, 2020
@eellison eellison reopened this Mar 26, 2020
@BowenBao
Collaborator

BowenBao commented Mar 27, 2020

@eellison the failure in the ONNX test test_dim reveals an issue with the JIT rather than with ONNX export. I have created a small repro for you to look at.

import torch

class DimModel(torch.nn.Module):
    def forward(self, input):
        out = input * 2
        out *= out.dim()
        return out

m = torch.jit.script(DimModel())

input_1 = torch.arange(6).view(2, 3)
print(m(input_1))
"""
outputs tensor([[ 0,  4,  8],
        [12, 16, 20]])
"""

input_2 = torch.arange(6).view(1, 2, 3)
print(m(input_2))
"""
outputs tensor([[[ 0,  4,  8],
         [12, 16, 20]]])

but should be tensor([[[ 0,  6, 12],
         [18, 24, 30]]])
The correct result can also be produced if DimModel()(input_1) is commented
"""
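The rank-dependent behavior described above, where running the model on input_1 first changes the result for input_2, looks like a specialized graph being cached and then reused for an input of a different rank. A rough, torch-free analogy of that failure mode (all names below are invented for illustration, not actual JIT internals):

```python
# Toy "compiler" that caches a specialized function per input, but whose cache
# key is too coarse: it ignores the input's rank, so the ndim captured at first
# call is baked in and reused for inputs of a different rank -- the same
# symptom as the DimModel repro above.
_cache = {}

def scripted_forward(shape, values):
    """Toy stand-in for a scripted forward: out = x * 2 * ndim."""
    key = "float32"  # cache key ignores rank, like an over-broadly reused specialized graph
    if key not in _cache:
        ndim = len(shape)  # rank is captured ("constant-folded") at specialization time
        _cache[key] = lambda vals: [v * 2 * ndim for v in vals]
    return _cache[key](values)

print(scripted_forward((2, 3), [1]))     # specializes with ndim=2 -> [4]
print(scripted_forward((1, 2, 3), [1]))  # stale ndim=2 reused -> [4], should be [6]
```

Including the rank in the cache key would fix the toy version; getting that invalidation right is roughly what graph specialization has to guarantee.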

edit: cc @houseroad

@eellison
Contributor Author

@BowenBao when I check the results against eager, I don't see any difference:

import torch

class DimModel(torch.nn.Module):
    def forward(self, input):
        out = input * 2
        out *= out.dim()
        return out

m = torch.jit.script(DimModel())
eager = DimModel()

input_1 = torch.arange(6).view(2, 3)
self.assertEqual(eager(input_1), m(input_1))

input_2 = torch.arange(6).view(1, 2, 3)
self.assertEqual(eager(input_2), m(input_2))

@BowenBao
Collaborator

@eellison what is self in the snippet above? I changed self.assertEqual to print and got the following

tensor([[ 0,  4,  8],
        [12, 16, 20]]) tensor([[ 0,  4,  8],
        [12, 16, 20]])
tensor([[[ 0,  6, 12],
         [18, 24, 30]]]) tensor([[[ 0,  4,  8],
         [12, 16, 20]]])

Could you verify that you are using the PyTorch build from this PR?

@eellison
Contributor Author

Yep, I ran it again on 3a9fc1265dad403479a35f405452853ae0ae6ed8, no failure.

import torch
class DimModel(torch.nn.Module):
    def forward(self, input):
        out = input * 2
        out *= out.dim()
        return out

m = torch.jit.script(DimModel())
eager = DimModel()

input_1 = torch.arange(6).view(2, 3)
print(eager(input_1), m(input_1))
# tensor([[ 0,  4,  8],
#        [12, 16, 20]]) tensor([[ 0,  4,  8],
#        [12, 16, 20]])

input_2 = torch.arange(6).view(1, 2, 3)
print(eager(input_2), m(input_2))
# tensor([[[ 0,  6, 12],
#         [18, 24, 30]]]) tensor([[[ 0,  6, 12],
#        [18, 24, 30]]])

@BowenBao
Collaborator

@eellison that's strange, this is the repro I fetched from the CI failure.

Mar 27 00:10:14     def test_dim(self):
Mar 27 00:10:14         class DimModel(torch.jit.ScriptModule):
Mar 27 00:10:14             @torch.jit.script_method
Mar 27 00:10:14             def forward(self, input):
Mar 27 00:10:14                 out = input * 2
Mar 27 00:10:14                 out *= out.dim()
Mar 27 00:10:14                 return out
Mar 27 00:10:14         empty_input = torch.randn(0, requires_grad=True)
Mar 27 00:10:14         multi_dim_input = torch.randn(1, 2, 3, requires_grad=True)
Mar 27 00:10:14         self.run_test(DimModel(), empty_input)
Mar 27 00:10:14 >       self.run_test(DimModel(), multi_dim_input)
...
Mar 27 00:10:14 >   [np.testing.assert_allclose(out, ort_out, rtol=rtol, atol=atol) for out, ort_out in zip(outputs, ort_outs)]
Mar 27 00:10:14 E   AssertionError: 
Mar 27 00:10:14 E   Not equal to tolerance rtol=0.001, atol=1e-07
Mar 27 00:10:14 E   Mismatch: 100%
Mar 27 00:10:14 E   Max absolute difference: 8.715158
Mar 27 00:10:14 E   Max relative difference: 0.6666667
Mar 27 00:10:14 E    x: array([[[ 3.081992, -0.586858, -4.357579],
Mar 27 00:10:14 E           [ 1.136863, -2.169045, -2.797191]]], dtype=float32)
Mar 27 00:10:14 E    y: array([[[  9.245976,  -1.760573, -13.072737],
Mar 27 00:10:14 E           [  3.410588,  -6.507134,  -8.391573]]], dtype=float32)
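One observation about the numbers in this failure (my own check, not from the thread): the reported max relative difference of 0.6666667 means one output is exactly 3x the other, which is consistent with `out.dim()` having been constant-folded as 1 (the rank of `empty_input`, run first) on one side while the other used the true rank 3. This can be verified directly from the logged arrays:

```python
import numpy as np

# x and y are copied verbatim from the assert_allclose failure above.
x = np.array([[[3.081992, -0.586858, -4.357579],
               [1.136863, -2.169045, -2.797191]]], dtype=np.float32)
y = np.array([[[9.245976, -1.760573, -13.072737],
               [3.410588, -6.507134, -8.391573]]], dtype=np.float32)

# y is elementwise 3 * x, i.e. one side multiplied by dim()=1 and the other by 3.
np.testing.assert_allclose(3 * x, y, rtol=1e-3, atol=1e-7)
```

That ratio would also explain the reported max relative difference: |y - x| / |y| = 2/3 everywhere.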

@eellison eellison closed this Apr 6, 2020
facebook-github-bot pushed a commit that referenced this pull request Apr 10, 2020
Summary:
Reviving this PR #35401 eellison. I believe after the profiled graph executor fix the test failures are handled.
Pull Request resolved: #36243

Differential Revision: D20950623

Pulled By: eellison

fbshipit-source-id: 5fbee426d1a098d84d5938540d45ce00828299be
ashishfarmer pushed a commit to ashishfarmer/pytorch that referenced this pull request Apr 13, 2020
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026

Labels

oncall: jit Add this issue/PR to JIT oncall triage queue


3 participants