ONNX Update training ops and training amenable export API #32950

Closed
lara-hdr wants to merge 32 commits into pytorch:master from lara-hdr:lahaidar/onnx_training

lara-hdr (Contributor) commented Feb 3, 2020

Comment thread torch/onnx/symbolic_opset12.py
kostmo (Member) commented Feb 3, 2020

💊 CircleCI build failures summary and remediations

As of commit bf610bf (more details on the Dr. CI page):


  • 2/3 failures introduced in this PR

  • 1/3 broken upstream at merge base bfdcc39 from Mar 23 until Mar 24 (47 commits; 77ccb5c - 3f896ef)

    Please rebase on the viable/strict branch.

    Since your merge base is older than viable/strict, run these commands:

    git fetch https://github.com/pytorch/pytorch viable/strict
    git rebase FETCH_HEAD
    


🕵️ 2 new failures recognized by patterns

The following build failures do not appear to be due to upstream breakages (reran 1 job to discount flakiness):

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/2)

Step: "Build" (confirmed not flaky by 2 failures)

Automatic merge failed; fix conflicts and then commit the result.
CONFLICT (add/add): Merge conflict in CMakeLists.txt 
Auto-merging CMakeLists.txt 
CONFLICT (add/add): Merge conflict in .jenkins/pytorch/win-test-helpers/build_pytorch.bat 
Auto-merging .jenkins/pytorch/win-test-helpers/build_pytorch.bat 
CONFLICT (add/add): Merge conflict in .jenkins/pytorch/macos-common.sh 
Auto-merging .jenkins/pytorch/macos-common.sh 
CONFLICT (add/add): Merge conflict in .github/workflows/lint.yml 
Auto-merging .github/workflows/lint.yml 
CONFLICT (add/add): Merge conflict in .circleci/scripts/binary_populate_env.sh 
Auto-merging .circleci/scripts/binary_populate_env.sh 
Automatic merge failed; fix conflicts and then commit the result. 

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test (2/2)

Step: "Test" (confirmed not flaky by 2 failures)

Mar 27 05:26:23 AssertionError: 11 not less than or equal to 1e-05 :
Mar 27 05:26:23 ---------------------------------------------------------------------- 
Mar 27 05:26:23 Traceback (most recent call last): 
Mar 27 05:26:23   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 175, in wrapper 
Mar 27 05:26:23     self._join_processes(fn) 
Mar 27 05:26:23   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 285, in _join_processes 
Mar 27 05:26:23     self._check_return_codes(elapsed_time) 
Mar 27 05:26:23   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 328, in _check_return_codes 
Mar 27 05:26:23     self.assertEqual(first_process.exitcode, 0) 
Mar 27 05:26:23   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 915, in assertEqual 
Mar 27 05:26:23     super(TestCase, self).assertLessEqual(abs(x - y), prec, message) 
Mar 27 05:26:23 AssertionError: 11 not less than or equal to 1e-05 :  
Mar 27 05:26:23  
Mar 27 05:26:23 ---------------------------------------------------------------------- 
Mar 27 05:26:23 Ran 29 tests in 23.055s 
Mar 27 05:26:23  
Mar 27 05:26:23 FAILED (failures=1) 
Mar 27 05:26:23  
Mar 27 05:26:23 Generating XML reports... 
Mar 27 05:26:23 Traceback (most recent call last): 
Mar 27 05:26:23   File "test/run_test.py", line 682, in <module> 
Mar 27 05:26:23     main() 
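
A note on reading the failure above: the traceback shows that assertEqual compares the worker process's exit code with 0 by calling assertLessEqual(abs(x - y), prec, message). The "11" is therefore the absolute difference between the exit code and 0, not a numeric result computed by the test. The sketch below reproduces the same message with the standard unittest module only. PrecisionCase, assert_close and failure_message are hypothetical names used for illustration, and the 1e-05 tolerance is taken from the log rather than from the PyTorch source.

```python
import unittest


class PrecisionCase(unittest.TestCase):
    """Hypothetical stand-in for a test case with a tolerance-based assertEqual."""

    def assert_close(self, x, y, prec=1e-5, message=""):
        # Same shape as the call in the traceback:
        # assertLessEqual(abs(x - y), prec, message)
        self.assertLessEqual(abs(x - y), prec, message)


def failure_message(exitcode):
    """Return the assertion text for a given exit code, or None if it passes."""
    case = PrecisionCase()
    try:
        case.assert_close(exitcode, 0)
    except AssertionError as err:
        return str(err)
    return None


if __name__ == "__main__":
    # A non-zero exit code trips the tolerance check; zero does not.
    print(repr(failure_message(11)))
    print(repr(failure_message(0)))
```

Under this reading the subprocess exited with code 11 or -11; the log excerpt does not say which.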

1 job timed out:

  • pytorch_macos_10_13_py3_test

This comment was automatically generated by Dr. CI.

This comment has been revised 153 times.

Comment thread torch/onnx/utils.py
facebook-github-bot (Contributor) left a comment

@houseroad has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

jerryzh168 added the triaged label Feb 5, 2020

Comment thread torch/onnx/symbolic_helper.py
Comment thread torch/onnx/utils.py Outdated
Comment thread torch/onnx/utils.py
Comment thread torch/onnx/utils.py Outdated
Comment thread torch/onnx/utils.py
Comment thread torch/onnx/symbolic_opset12.py
Comment thread torch/onnx/utils.py Outdated
Comment thread torch/onnx/symbolic_helper.py Outdated
Comment thread torch/onnx/utils.py
Comment thread torch/onnx/utils.py Outdated

houseroad (Member) left a comment

Could you rebase to master and trigger the test again?

houseroad (Member) commented

https://github.com/pytorch/pytorch/pull/32950/checks?check_run_id=512600263 clang-tidy is broken, could you fix it?

lara-hdr (Contributor, Author) commented Mar 18, 2020

@houseroad it seems that after adding the "Training" enum in torch/csrc/onnx/init.cpp, clang-tidy tries to include onnx-ml.pb.h (instead of onnx.pb.h), which is the cause of the failure.

I am not sure why this is happening, since ONNX_ML is not defined.
https://github.com/onnx/onnx/blob/9fdae4c68960a2d44cd1cc871c74a6a9d469fa1f/onnx/onnx_pb.h#L50

-g"-torch/csrc/jit/export.cpp" \
-g"-torch/csrc/jit/import.cpp" \
-g"-torch/csrc/jit/netdef_converter.cpp" \
-g"-torch/csrc/onnx/init.cpp" \

lara-hdr (Contributor, Author) left a comment

@houseroad I added this line to exclude init.cpp (which includes onnx_pb.h) and fix the linter error, as explained in the comment above (lines 191-193).

lara-hdr (Contributor, Author) commented

cc @houseroad clang-tidy issue is fixed

facebook-github-bot (Contributor) commented

@houseroad merged this pull request in 025a0ab.

lara-hdr added a commit to lara-hdr/pytorch that referenced this pull request Mar 27, 2020

Summary:
- Update Dropout and BatchNorm in opset 12: onnx/onnx#2568
- Update API logic for exporting to ONNX training amenable models
Pull Request resolved: pytorch#32950

Reviewed By: hl475

Differential Revision: D19710370

Pulled By: houseroad

fbshipit-source-id: e5e79d38552936966662c41d39ddf33be1ba3e35

laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026

Labels

Merged, open source, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

9 participants