OpInfo: nn.functional.conv_transpose2d #62882

Closed
krshrimali wants to merge 18 commits into pytorch:master from krshrimali:opinfo/high_priority/nn/conv_transpose2d

Conversation

@krshrimali krshrimali commented Aug 6, 2021

facebook-github-bot commented Aug 6, 2021

💊 CI failures summary and remediations

As of commit b480838 (more details on the Dr. CI page):


  • 3/3 failures possibly* introduced in this PR
    • 1/3 non-scanned failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_clang7_asan_test1 (1/1)

Step: "Run tests"

Aug 17 08:29:05 test_remote_message_script_de...yUniqueId(created_on=0, local_id=0) to be created.
Aug 17 08:28:25 frame #13: <unknown function> + 0x198a85b0 (0x7f97c0f835b0 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Aug 17 08:28:25 frame #14: c10::ThreadPool::main_loop(unsigned long) + 0x7f1 (0x7f979e1c8b91 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Aug 17 08:28:25 frame #15: <unknown function> + 0xb8c80 (0x7f97ea839c80 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
Aug 17 08:28:25 frame #16: <unknown function> + 0x76ba (0x7f97eaed46ba in /lib/x86_64-linux-gnu/libpthread.so.0)
Aug 17 08:28:25 frame #17: clone + 0x6d (0x7f97eac0a51d in /lib/x86_64-linux-gnu/libc.so.6)
Aug 17 08:28:25 
Aug 17 08:28:25 ok (7.111s)
Aug 17 08:28:36   test_remote_message_dropped_pickle (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (11.108s)
Aug 17 08:28:47   test_remote_message_dropped_pickle_to_self (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (11.010s)
Aug 17 08:28:58   test_remote_message_script_delay_timeout (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (10.109s)
Aug 17 08:29:05   test_remote_message_script_delay_timeout_to_self (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... [E request_callback_no_python.cpp:559] Received error while processing request type 260: falseINTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rref_context.cpp":387, please report a bug to PyTorch. Expected OwnerRRef with id GloballyUniqueId(created_on=0, local_id=0) to be created.
Aug 17 08:29:05 Exception raised from getOwnerRRef at /var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rref_context.cpp:387 (most recent call first):
Aug 17 08:29:05 frame #0: <unknown function> + 0x1a231c (0x7f737dca531c in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Aug 17 08:29:05 frame #1: std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>::operator()() const + 0x6d (0x7f739f6f9c5d in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Aug 17 08:29:05 frame #2: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x160 (0x7f737dca3800 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Aug 17 08:29:05 frame #3: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x18a (0x7f737dc9e66a in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Aug 17 08:29:05 frame #4: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x115 (0x7f737dc9ed75 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Aug 17 08:29:05 frame #5: torch::distributed::rpc::RRefContext::getOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, bool) + 0xd62 (0x7f73a097da22 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Aug 17 08:29:05 frame #6: torch::distributed::rpc::RequestCallbackNoPython::assignOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, torch::distributed::rpc::GloballyUniqueId const&, c10::intrusive_ptr<c10::ivalue::Future, c10::detail::intrusive_target_default_null_type<c10::ivalue::Future> >) const + 0x223 (0x7f73a0942773 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Aug 17 08:29:05 frame #7: torch::distributed::rpc::RequestCallbackImpl::processScriptRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::vector<c10::Stream, std::allocator<c10::Stream> >) const + 0x8e3 (0x7f73c30f3c23 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
Aug 17 08:29:05 frame #8: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector<c10::Stream, std::allocator<c10::Stream> >) const + 0x78d (0x7f73a093f8fd in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)

1 failure not recognized by patterns:

Job: GitHub Actions Lint / flake8-py3
Step: Fail if there were any warnings

1 job timed out:

  • pytorch_linux_xenial_py3_clang7_asan_test1

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

@krshrimali krshrimali marked this pull request as draft August 6, 2021 09:51
@krshrimali krshrimali marked this pull request as ready for review August 9, 2021 10:37
@gchanan gchanan requested a review from mruberry August 9, 2021 14:56
@gchanan gchanan added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Aug 9, 2021
@krshrimali krshrimali requested a review from zou3519 August 10, 2021 03:52
@krshrimali krshrimali added the module: testing Issues related to the torch.testing module (not tests) label Aug 10, 2021

@zou3519 zou3519 left a comment

This looks good! I had some suggestions for more test cases

Comment on lines +6673 to +6674
dtypesIfCPU=floating_types(),
dtypesIfCUDA=floating_types_and(torch.float16, torch.bfloat16),
Contributor:

dtype tests seem to be failing

Contributor Author:

Apologies. This seems to be because conv_transpose2d only supports torch.bfloat16 on CUDA versions > 11.0. This should be fixed in the latest commit.

# Ordered as shapes for: input, weight, bias, stride, padding, output_padding, groups
cases = (((1, 3, 4, 4), (3, 3, 3, 3), (3), (2, 2), 2, (1, 1), 1),
((2, 2, 4, 4), (2, 2, 4, 5), (4), (3, 3), 1, (2, 2), 2),
((1, 1, 4, 5), (1, 1, 4, 3), (1), 2, 1, 1, 1),
Contributor:

(1) doesn't actually make a tuple, you want (1,)
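
To illustrate the point above, here is a minimal sketch in plain Python. The variable names are illustrative and do not come from the PR. Parentheses on their own only group an expression, so (1) is just the integer 1; it is the trailing comma that creates a one-element tuple.

```python
# Parentheses only group an expression; the trailing comma makes the tuple.
not_a_tuple = (1)   # identical to the int 1
one_tuple = (1,)    # a tuple holding a single element

print(type(not_a_tuple).__name__)  # int
print(type(one_tuple).__name__)    # tuple

# A shape written as (1,) can be measured and iterated; a bare int cannot.
print(len(one_tuple))  # 1
try:
    len(not_a_tuple)
except TypeError as exc:
    print("TypeError:", exc)
```

This is why a bias shape written as (3) in a tuple of cases ends up as a plain int next to real tuples such as (1, 3, 4, 4).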

Contributor Author (@krshrimali, Aug 11, 2021):

Thanks for the suggestion, @zou3519. In the latest commit I now use a single number, since it represents a shape and is passed to make_arg, which accepts integers as well as tuples.

Contributor Author:

Update: using an int there raises mypy errors (the combined type becomes builtins.object, which is not iterable). Using a tuple, (1,) instead of 1, seems like the better solution. Fixed in the latest commit. Thanks!

make_arg = partial(make_tensor, device=device, dtype=dtype, requires_grad=requires_grad)

# Ordered as shapes for: input, weight, bias, stride, padding, output_padding, groups
cases = (((1, 3, 4, 4), (3, 3, 3, 3), (3), (2, 2), 2, (1, 1), 1),
Contributor:

Let's add some more test cases:

Contributor Author:

Thanks for the suggestions, @zou3519 - I've made the revisions. Please let me know if they sound good to you. :)
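
When adding sample inputs like the cases quoted in this thread, it can help to check by hand that each combination of shapes and arguments is valid. The sketch below is plain Python and does not call PyTorch; the helper name conv_transpose2d_output_shape is hypothetical and not part of the PR. It assumes the output-size formula given in the PyTorch documentation for ConvTranspose2d and the weight layout (in_channels, out_channels / groups, kH, kW), and it applies them to the three cases quoted above (bias omitted, since it does not affect the output shape).

```python
from typing import Tuple, Union

IntOrPair = Union[int, Tuple[int, int]]


def _pair(value: IntOrPair) -> Tuple[int, int]:
    """Expand a single int to (int, int); pass pairs through unchanged."""
    return (value, value) if isinstance(value, int) else value


def conv_transpose2d_output_shape(
    input_shape: Tuple[int, int, int, int],
    weight_shape: Tuple[int, int, int, int],
    stride: IntOrPair = 1,
    padding: IntOrPair = 0,
    output_padding: IntOrPair = 0,
    groups: int = 1,
    dilation: IntOrPair = 1,
) -> Tuple[int, int, int, int]:
    """Hypothetical helper: expected output shape of a 2D transposed convolution."""
    n, c_in, h_in, w_in = input_shape
    w_c_in, c_out_per_group, k_h, k_w = weight_shape
    if c_in != w_c_in:
        raise ValueError("input channels must equal weight.shape[0]")
    if c_in % groups != 0:
        raise ValueError("input channels must be divisible by groups")

    (s_h, s_w), (p_h, p_w) = _pair(stride), _pair(padding)
    (op_h, op_w), (d_h, d_w) = _pair(output_padding), _pair(dilation)

    def out_size(size: int, k: int, s: int, p: int, op: int, d: int) -> int:
        # out = (in - 1) * stride - 2 * padding + dilation * (kernel - 1)
        #       + output_padding + 1
        return (size - 1) * s - 2 * p + d * (k - 1) + op + 1

    return (
        n,
        c_out_per_group * groups,
        out_size(h_in, k_h, s_h, p_h, op_h, d_h),
        out_size(w_in, k_w, s_w, p_w, op_w, d_w),
    )


# input, weight, stride, padding, output_padding, groups
cases = (
    ((1, 3, 4, 4), (3, 3, 3, 3), (2, 2), 2, (1, 1), 1),
    ((2, 2, 4, 4), (2, 2, 4, 5), (3, 3), 1, (2, 2), 2),
    ((1, 1, 4, 5), (1, 1, 4, 3), 2, 1, 1, 1),
)
for inp, weight, stride, padding, output_padding, groups in cases:
    print(conv_transpose2d_output_shape(
        inp, weight, stride, padding, output_padding, groups))
# (1, 3, 6, 6)
# (2, 4, 13, 14)
# (1, 1, 9, 10)
```

The second dimension of each result (3, 4, 1) matches the bias sizes in the quoted cases, which is a quick consistency check on the groups argument.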

@krshrimali krshrimali requested a review from zou3519 August 12, 2021 04:15
@facebook-github-bot

@zou3519 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot

@zou3519 merged this pull request in baedb55.

ngimel commented Aug 16, 2021

This likely broke a Windows test (it is flaky and complains about a small mismatch): https://github.com/pytorch/pytorch/runs/3327143486. Can you please adjust the tolerance?
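
As background for the tolerance request, here is a minimal sketch in plain Python of the closeness check documented for torch.allclose, |actual - expected| <= atol + rtol * |expected|. The function name is_close and all numeric values are illustrative assumptions; they are not taken from the failing run or from the follow-up fix.

```python
def is_close(actual: float, expected: float, rtol: float, atol: float) -> bool:
    """Elementwise closeness test in the form documented for torch.allclose."""
    return abs(actual - expected) <= atol + rtol * abs(expected)


# Illustrative values only: a mismatch of about 5e-5 on a result near 1.0.
expected = 1.0
actual = 1.00005

# With a tight absolute tolerance the small mismatch is reported as a failure.
print(is_close(actual, expected, rtol=1.3e-6, atol=1e-5))  # False

# Loosening atol lets the same mismatch pass.
print(is_close(actual, expected, rtol=1.3e-6, atol=1e-4))  # True
```

A mismatch that sits just above the allowed bound on one platform and below it on another is one way a test can end up flaky, which is why widening the tolerance for the affected test is a common remedy.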

@krshrimali krshrimali reopened this Aug 17, 2021
@krshrimali krshrimali closed this Aug 17, 2021
alanwaketan pushed a commit that referenced this pull request Aug 17, 2021
Summary:
See pytorch/functorch#78 and #54261.

cc: mruberry zou3519 Chillee

Pull Request resolved: #62882

Reviewed By: bdhirsh

Differential Revision: D30280804

Pulled By: zou3519

fbshipit-source-id: e40cdf43e98c1f11e45df6b8bc13110b4d29c45f
facebook-github-bot pushed a commit that referenced this pull request Aug 17, 2021
Summary:
Addresses comment: #62882 (comment).

cc: mruberry ngimel

Pull Request resolved: #63389

Reviewed By: mruberry

Differential Revision: D30377481

Pulled By: ngimel

fbshipit-source-id: 0fa21acc3503c259c9b27463e8555247c43d9e2e
@zou3519 zou3519 mentioned this pull request Aug 26, 2021
facebook-github-bot pushed a commit that referenced this pull request Sep 16, 2021
Summary:
Reference: #54261

Reference: pytorch/functorch#78

Mostly inspired by #62882

Pull Request resolved: #63517

Reviewed By: heitorschueroff

Differential Revision: D30993855

Pulled By: zou3519

fbshipit-source-id: 7402f99addb4ef8f19c2ce1a09ed9006e737cc7e

Labels

cla signed
Merged
module: testing - Issues related to the torch.testing module (not tests)
open source
triaged - This issue has been looked at a team member, and triaged and prioritized into an appropriate module

7 participants