Some largeCUDATensorTest tests fail with OOM when running with the entire test suite, but not when running standalone #43677

@zasdfgbnm

As discussed in #43257, while working on #43092 I observed that after I changed some unrelated things, test_reduction_split started to fail with OOM. The failure only happens when this test runs together with other tests; when run standalone, it does not fail. This suggests that the test suite may be holding CUDA memory between tests, which it probably shouldn't do.
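One way to check this hypothesis would be to record CUDA memory before and after each test. The following is only a rough sketch, not part of PyTorch's test infrastructure: `cuda_memory_snapshot` and `CudaMemoryCheckMixin` are hypothetical names made up for illustration. It assumes that `torch.cuda.memory_allocated()` and `torch.cuda.memory_reserved()` are adequate indicators of memory held by live tensors and by the caching allocator, respectively.

```python
import gc
import unittest

try:
    import torch
except ImportError:  # let the sketch import cleanly where PyTorch is absent
    torch = None


def cuda_available():
    """True only if PyTorch is importable and a CUDA device is usable."""
    return torch is not None and torch.cuda.is_available()


def cuda_memory_snapshot():
    """Return (allocated, reserved) bytes on the current device, or None without CUDA."""
    if not cuda_available():
        return None
    torch.cuda.synchronize()
    return torch.cuda.memory_allocated(), torch.cuda.memory_reserved()


class CudaMemoryCheckMixin:
    """Hypothetical mixin that reports CUDA memory still allocated after each test."""

    def setUp(self):
        super().setUp()
        self._mem_before = cuda_memory_snapshot()

    def tearDown(self):
        super().tearDown()
        # Drop unreachable Python objects first, then release cached blocks,
        # so that what remains is memory still referenced by something.
        gc.collect()
        if cuda_available():
            torch.cuda.empty_cache()
        after = cuda_memory_snapshot()
        if after is not None and self._mem_before is not None:
            leaked = after[0] - self._mem_before[0]
            if leaked > 0:
                print(f"{self.id()}: {leaked} bytes still allocated after test")
```

To try it, list the mixin before `unittest.TestCase` in a test class's bases, e.g. `class MyTests(CudaMemoryCheckMixin, unittest.TestCase)`. If the report stays empty while the OOM persists, the cause is more likely fragmentation or memory held outside the current process than tensors kept alive by earlier tests.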

cc @mruberry @VitalyFedyunin

Labels: module: tests, triaged
