Skip to content

Enabled pytorch-linux-backward-compatibility-check-test in GHA#61973

Closed
MaxMotovilov wants to merge 8 commits intopytorch:masterfrom
MaxMotovilov:enable-ci-action
Closed

Enabled pytorch-linux-backward-compatibility-check-test in GHA#61973
MaxMotovilov wants to merge 8 commits intopytorch:masterfrom
MaxMotovilov:enable-ci-action

Conversation

@MaxMotovilov
Copy link
Copy Markdown
Contributor

Summary:

Test Plan: Execute GHA, verify results

Reviewers: Zhuojie Zhou

Subscribers:

Tasks: T96112383

Tags:

@facebook-github-bot
Copy link
Copy Markdown
Contributor

facebook-github-bot commented Jul 21, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 21013ce (more details on the Dr. CI page):


None of the CI failures appear to be your fault 💚



❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test2 (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun) ❄️

Aug 05 23:00:15 unknown file: Failure
Aug 05 23:00:15 frame #9: build/bin/test_api() [0xbdb39a]
Aug 05 23:00:15 frame #10: build/bin/test_api() [0xbdbbed]
Aug 05 23:00:15 frame #11: testing::internal::UnitTestImpl::RunAllTests() + 0xe27 (0xbe47b7 in build/bin/test_api)
Aug 05 23:00:15 frame #12: testing::UnitTest::Run() + 0x8f (0xbe4b6f in build/bin/test_api)
Aug 05 23:00:15 frame #13: main + 0xc8 (0x586cb8 in build/bin/test_api)
Aug 05 23:00:15 frame #14: __libc_start_main + 0xf0 (0x7f22493d0840 in /lib/x86_64-linux-gnu/libc.so.6)
Aug 05 23:00:15 frame #15: _start + 0x29 (0x5c7b29 in build/bin/test_api)
Aug 05 23:00:15 " thrown in the test body.
Aug 05 23:00:15 [  FAILED  ] IMethodTest.CallMethod (3 ms)
Aug 05 23:00:15 [ RUN      ] IMethodTest.GetArgumentNames
Aug 05 23:00:15 unknown file: Failure
Aug 05 23:00:15 C++ exception with description "open file failed, file path: torch/csrc/deploy/example/generated/simple_jit
Aug 05 23:00:15 Exception raised from RAIIFile at /var/lib/jenkins/workspace/caffe2/serialize/file_adapter.cc:13 (most recent call first):
Aug 05 23:00:15 frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x6b (0x7f2249ae64ab in /var/lib/jenkins/workspace/build/lib/libc10.so)
Aug 05 23:00:15 frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xce (0x7f2249ae1c6e in /var/lib/jenkins/workspace/build/lib/libc10.so)
Aug 05 23:00:15 frame #2: caffe2::serialize::FileAdapter::RAIIFile::RAIIFile(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0xdc (0x7f2265cea12c in /var/lib/jenkins/workspace/build/lib/libtorch_cpu.so)
Aug 05 23:00:15 frame #3: caffe2::serialize::FileAdapter::FileAdapter(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x41 (0x7f2265cea7b1 in /var/lib/jenkins/workspace/build/lib/libtorch_cpu.so)
Aug 05 23:00:15 frame #4: torch::jit::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&) + 0x40 (0x7f2267345460 in /var/lib/jenkins/workspace/build/lib/libtorch_cpu.so)
Aug 05 23:00:15 frame #5: torch::jit::load(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, c10::optional<c10::Device>) + 0x6f (0x7f22673455ef in /var/lib/jenkins/workspace/build/lib/libtorch_cpu.so)
Aug 05 23:00:15 frame #6: IMethodTest_GetArgumentNames_Test::TestBody() + 0x7e (0xb9f5be in build/bin/test_api)
Aug 05 23:00:15 frame #7: void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) + 0x4a (0xbebafa in build/bin/test_api)

This comment was automatically generated by Dr. CI (expand for details).Follow this link to opt-out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

Click here to manually regenerate this comment.

@facebook-github-bot
Copy link
Copy Markdown
Contributor

@MaxMotovilov has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Copy Markdown
Contributor

@zhouzhuojie zhouzhuojie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it still triggers a full build, is it possible that the backward tests can be included in the existing linux CI workflow? For example, just like the build_docs job, it's only enabled for pytorch-linux-xenial-py3.6-gcc5.4, and reuse the build artifacts

docker_image_base=f"{DOCKER_REGISTRY}/pytorch/pytorch-linux-xenial-py3.6-gcc5.4",
test_runner_type=LINUX_CPU_TEST_RUNNER,
on_pull_request=True,
num_test_shards=2,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think test_shards is handled for backward tests, can you help check the logs from both shards - I guess they are testing same things, if yes might want to set it to 1 or remove this field

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think test_shards is handled for backward tests, can you help check the logs from both shards - I guess they are testing same things, if yes might want to set it to 1 or remove this field

Pretty sure they are... I was wondering where the /0 and /1 are coming from.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shards does not really work for backward compat tests, so let's keep them at one

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is an entirely new workflow really needed for this? Seems like this is just an extension of pytorch-linux-xenial-py3.6-gcc5.4 in general

@samestep did a lot of work to enable separate configurations within that workflow

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I could understand, the script for the bw compat test installs a nightly build of PyTorch anyway -

pip_install --pre torch -f https://download.pytorch.org/whl/nightly/cpu/torch_nightly.html
. Wonder if it is possible to run it against some manifestly pre-existing Docker image rather than require a fresh build that will get overwritten anyway?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the main goal of bc check is to build with current commit and compare against latest nightly API FunctionSchemas.

so it is not "overwritten" --> once the FunctionSchema from nightly wheel got extracted it is actually not useful anymore.
That being said, it would be nice to just have some way to host the API schema instead of having to download the entire nightly build.

@MaxMotovilov
Copy link
Copy Markdown
Contributor Author

it still triggers a full build, is it possible that the backward tests can be included in the existing linux CI workflow? For example, just like the build_docs job, it's only enabled for pytorch-linux-xenial-py3.6-gcc5.4, and reuse the build artifacts

Given that the first thing it does for the actual test step is installs a nightly, perhaps it doesn't even need the build dependency? Strange that CircleCI version seems to have it...

@zhouzhuojie
Copy link
Copy Markdown
Contributor

it still triggers a full build, is it possible that the backward tests can be included in the existing linux CI workflow? For example, just like the build_docs job, it's only enabled for pytorch-linux-xenial-py3.6-gcc5.4, and reuse the build artifacts

Given that the first thing it does for the actual test step is installs a nightly, perhaps it doesn't even need the build dependency? Strange that CircleCI version seems to have it...

oh wait, so basically it just test the backward compatibility for nightly builds? if so, then yes, probably can skip the build step and just go for the test

@MaxMotovilov
Copy link
Copy Markdown
Contributor Author

oh wait, so basically it just test the backward compatibility for nightly builds? if so, then yes, probably can skip the build step and just go for the test

But what docker image should I use? Need something that has [a fresh enough] .jenkins/ and probably some dependencies. What it doesn't need is a PyTorch build.

@zhouzhuojie
Copy link
Copy Markdown
Contributor

oh wait, so basically it just test the backward compatibility for nightly builds? if so, then yes, probably can skip the build step and just go for the test

But what docker image should I use? Need something that has [a fresh enough] .jenkins/ and probably some dependencies. What it doesn't need is a PyTorch build.

That's a base docker image we rebuilt not frequently, e.g. some low-level dependencies will be put there to prepare the pytorch build. The image can be found from the calculate-docker-image step

@MaxMotovilov MaxMotovilov requested a review from zhouzhuojie July 28, 2021 21:18
@zhouzhuojie zhouzhuojie mentioned this pull request Aug 4, 2021
Copy link
Copy Markdown
Contributor

@walterddr walterddr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in order to achieve what @zhouzhuojie mentioned I think you also need to change generate_pytorch_test_matrix.py to generate additional test step for backward_compatibility.

@walterddr
Copy link
Copy Markdown
Contributor

it still triggers a full build, is it possible that the backward tests can be included in the existing linux CI workflow? For example, just like the build_docs job, it's only enabled for pytorch-linux-xenial-py3.6-gcc5.4, and reuse the build artifacts

Given that the first thing it does for the actual test step is installs a nightly, perhaps it doesn't even need the build dependency? Strange that CircleCI version seems to have it...

oh wait, so basically it just test the backward compatibility for nightly builds? if so, then yes, probably can skip the build step and just go for the test

this statement is not true, it is comparing against nightly as baseline (it is test for the current commit)

@MaxMotovilov
Copy link
Copy Markdown
Contributor Author

in order to achieve what @zhouzhuojie mentioned I think you also need to change generate_pytorch_test_matrix.py to generate additional test step for backward_compatibility.

Wait - why? Current code in test.sh just runs the bwc check after the regular test step in pytorch-linux-xenial-py3.6-gcc5.4 because the BUILD_ENVIRONMENT passed into the test step (and the test step only!) is set to pytorch-linux-xenial-py3.6-gcc5.4-backward in the template. Or is this too much of a hack?

@walterddr
Copy link
Copy Markdown
Contributor

Yes I am just reflecting on the original comment from test.sh

# Do NOT run this test before any other tests, like test_python_shard1, etc.
# Because this function uninstalls the torch built from branch, and install
# nightly version.

and I am sure there's a reason why it needs to be on its own job instead of being ran as part of other tests. also this will run bc check on both test1 and test2 right?

@MaxMotovilov
Copy link
Copy Markdown
Contributor Author

MaxMotovilov commented Aug 4, 2021

and I am sure there's a reason why it needs to be on its own job instead of being ran as part of other tests

Says "before" - which makes sense given that it overwrites current build with a nightly. Shouldn't have a problem with "after" though?

also this will run bc check on both test1 and test2 right?

Ah! Yes, I meant to fix that and forgot. Will do - just need to pull this all back on a devserver because the one I was using has already lapsed after I finished bootcamp.

@driazati driazati removed their request for review September 29, 2021 18:30
@github-actions
Copy link
Copy Markdown
Contributor

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions Bot added the Stale label Apr 13, 2022
@github-actions github-actions Bot closed this May 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants