OpInfo fix: conv_transpose2d #63389

Closed
krshrimali wants to merge 14 commits into pytorch:master from krshrimali:opinfo/fixes/conv_transpose2d

@krshrimali
Contributor

Addresses comment: #62882 (comment).

cc: @mruberry @ngimel

@facebook-github-bot
Contributor

facebook-github-bot commented Aug 17, 2021

💊 CI failures summary and remediations

As of commit c89de66 (more details on the Dr. CI page):



🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_clang7_asan_test1 (1/1)

Step: "Run tests"

Aug 17 08:06:46 test_remote_message_script_de...yUniqueId(created_on=0, local_id=0) to be created.
Aug 17 08:06:07 frame #13: <unknown function> + 0x198a85b0 (0x7f941bc9c5b0 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Aug 17 08:06:07 frame #14: c10::ThreadPool::main_loop(unsigned long) + 0x7f1 (0x7f93f8ee1b91 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Aug 17 08:06:07 frame #15: <unknown function> + 0xb8c80 (0x7f9445550c80 in /usr/lib/x86_64-linux-gnu/libstdc++.so.6)
Aug 17 08:06:07 frame #16: <unknown function> + 0x76ba (0x7f9445beb6ba in /lib/x86_64-linux-gnu/libpthread.so.0)
Aug 17 08:06:07 frame #17: clone + 0x6d (0x7f944592151d in /lib/x86_64-linux-gnu/libc.so.6)
Aug 17 08:06:07 
Aug 17 08:06:07 ok (6.892s)
Aug 17 08:06:18   test_remote_message_dropped_pickle (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (10.794s)
Aug 17 08:06:29   test_remote_message_dropped_pickle_to_self (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (10.802s)
Aug 17 08:06:39   test_remote_message_script_delay_timeout (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... ok (9.794s)
Aug 17 08:06:46   test_remote_message_script_delay_timeout_to_self (__main__.FaultyFaultyAgentRpcTestWithSpawn) ... [E request_callback_no_python.cpp:559] Received error while processing request type 260: falseINTERNAL ASSERT FAILED at "/var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rref_context.cpp":387, please report a bug to PyTorch. Expected OwnerRRef with id GloballyUniqueId(created_on=0, local_id=0) to be created.
Aug 17 08:06:46 Exception raised from getOwnerRRef at /var/lib/jenkins/workspace/torch/csrc/distributed/rpc/rref_context.cpp:387 (most recent call first):
Aug 17 08:06:46 frame #0: <unknown function> + 0x1a231c (0x7f4ea7eaf31c in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Aug 17 08:06:46 frame #1: std::function<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > ()>::operator()() const + 0x6d (0x7f4ec9903c5d in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Aug 17 08:06:46 frame #2: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0x160 (0x7f4ea7ead800 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Aug 17 08:06:46 frame #3: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x18a (0x7f4ea7ea866a in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Aug 17 08:06:46 frame #4: c10::detail::torchInternalAssertFail(char const*, char const*, unsigned int, char const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x115 (0x7f4ea7ea8d75 in /opt/conda/lib/python3.6/site-packages/torch/lib/libc10.so)
Aug 17 08:06:46 frame #5: torch::distributed::rpc::RRefContext::getOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, bool) + 0xd62 (0x7f4ecab87a22 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Aug 17 08:06:46 frame #6: torch::distributed::rpc::RequestCallbackNoPython::assignOwnerRRef(torch::distributed::rpc::GloballyUniqueId const&, torch::distributed::rpc::GloballyUniqueId const&, c10::intrusive_ptr<c10::ivalue::Future, c10::detail::intrusive_target_default_null_type<c10::ivalue::Future> >) const + 0x223 (0x7f4ecab4c773 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)
Aug 17 08:06:46 frame #7: torch::distributed::rpc::RequestCallbackImpl::processScriptRemoteCall(torch::distributed::rpc::RpcCommandBase&, std::vector<c10::Stream, std::allocator<c10::Stream> >) const + 0x8e3 (0x7f4eed2fdc23 in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so)
Aug 17 08:06:46 frame #8: torch::distributed::rpc::RequestCallbackNoPython::processRpc(torch::distributed::rpc::RpcCommandBase&, torch::distributed::rpc::MessageType const&, std::vector<c10::Stream, std::allocator<c10::Stream> >) const + 0x78d (0x7f4ecab498fd in /opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_cpu.so)

1 job timed out:

  • pytorch_linux_xenial_py3_clang7_asan_test1

❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_windows_vs2019_py38_cuda10.1_build (1/1)

Step: "Build" ❄️

requests.exceptions.ChunkedEncodingError: ("Con...ly closed by the remote host', None, 10054, None))
      File "C:\Jenkins\Miniconda3\lib\site-packages\requests\sessions.py", line 555, in get
        return self.request('GET', url, **kwargs)
      File "C:\Jenkins\Miniconda3\lib\site-packages\requests\sessions.py", line 542, in request
        resp = self.send(prep, **send_kwargs)
      File "C:\Jenkins\Miniconda3\lib\site-packages\requests\sessions.py", line 697, in send
        r.content
      File "C:\Jenkins\Miniconda3\lib\site-packages\requests\models.py", line 831, in content
        self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
      File "C:\Jenkins\Miniconda3\lib\site-packages\requests\models.py", line 756, in generate
        raise ChunkedEncodingError(e)
    requests.exceptions.ChunkedEncodingError: ("Connection broken: ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None)", ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

`$ C:\Jenkins\Miniconda3\Scripts\conda-script.py install -y -q -c conda-forge libuv=1.39`

  environment variables:
                 CIO_TEST=<not set>
       CMAKE_INCLUDE_PATH=C:\Users\circleci\project\build\win_tmp\mkl\include
        CONDA_DEFAULT_ENV=base
                CONDA_EXE=C:\Jenkins\Miniconda3\condabin\..\Scripts\conda.exe
               CONDA_EXES="C:\Jenkins\Miniconda3\condabin\..\Scripts\conda.exe"
         CONDA_PARENT_DIR=C:\Jenkins

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.


@krshrimali added the "module: testing" label (Issues related to the torch.testing module (not tests)) on Aug 17, 2021
@krshrimali requested a review from mruberry on August 17, 2021 17:59
@krshrimali
Contributor Author

Hi @mruberry, I couldn't find the relevant job in the list (it was there earlier, but I had to rerun, as it was failing for a different reason). The Windows tests seem to pass now, though, and the remaining failures don't look relevant to this PR.

PTAL, whenever you find the time. Thanks!

@facebook-github-bot
Contributor

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ngimel merged this pull request in a2db5d3.

Labels

cla signed, Merged, module: testing (Issues related to the torch.testing module (not tests)), open source

4 participants