test_caching_pinned_memory_multi_gpu: Test has a race condition leading to flakiness #68299

@arindamroy-eng

Description

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

  1. python3.6 test/test_cuda.py TestCuda.test_caching_pinned_memory_multi_gpu

Expected behavior

The test should pass.

Current output:
Sometimes the test fails with the following error:

======================================================================
FAIL: test_caching_pinned_memory_multi_gpu (__main__.TestCuda)

Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1396, in wrapper
method(*args, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1396, in wrapper
method(*args, **kwargs)
File "test_cuda.py", line 1397, in test_caching_pinned_memory_multi_gpu
self.assertNotEqual(t.data_ptr(), ptr, msg='allocation re-used too soon')
File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1962, in assertNotEqual
self.assertEqual(x, y, msg, atol=atol, rtol=rtol, **kwargs)
AssertionError: AssertionError not raised : allocation re-used too soon

The test assumes that pinned memory which has just been freed will not be re-used again right away. That assumption is not guaranteed: there is no defined interval before a freed block may be handed out again, so the allocator sometimes re-uses the freed memory immediately and the assertion fails. This makes the test race against the allocator and flaky.
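To make the failure mode concrete, here is a minimal, self-contained sketch in plain Python. It is an illustrative model, not PyTorch's actual CachingHostAllocator: a caching allocator keeps freed blocks on a per-size free list, so a freed block can be handed back on the very next allocation of the same size, which is exactly the "re-used too soon" case the assertion trips on.

```python
# Toy model of a caching allocator's free-list reuse (an assumption-laden
# sketch, NOT PyTorch's real pinned-memory allocator).
class TinyCachingAllocator:
    def __init__(self):
        self.free_blocks = {}  # size -> list of cached block ids
        self.next_id = 0

    def malloc(self, size):
        cached = self.free_blocks.get(size)
        if cached:
            # A cached block of the right size is reused immediately.
            return cached.pop()
        self.next_id += 1
        return self.next_id

    def free(self, size, block):
        # Freed blocks go to the cache, not back to the OS.
        self.free_blocks.setdefault(size, []).append(block)


alloc = TinyCachingAllocator()
ptr = alloc.malloc(1024)      # first allocation
alloc.free(1024, ptr)         # free it
new_ptr = alloc.malloc(1024)  # next same-size allocation reuses the block
print(new_ptr == ptr)         # True: reuse can happen arbitrarily soon
```

In this model the pointer comes back on the very next allocation; a test asserting `new_ptr != ptr` "soon after" the free would fail nondeterministically depending on allocator state and timing.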

cc @ezyang @gchanan @zou3519 @bdhirsh @jbschlosser @ngimel

Metadata

Labels

high priority; module: cuda (Related to torch.cuda, and CUDA support in general); module: flaky-tests (Problem is a flaky test in CI); triage review; triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
