
Add meta device to generic device testing framework, skip NotImplementedError#53682

Closed
ezyang wants to merge 10 commits into gh/ezyang/938/base from gh/ezyang/938/head

Conversation

@ezyang
Contributor

@ezyang ezyang commented Mar 10, 2021

Stack from ghstack:

With this, under the meta device, 101 tests passed and 16953 skipped.
It ain't much, but it's a start.

Various bits and bobs:

  • NotImplementedError suppression at test level is implemented
    in the same way as CUDA memory leak check, i.e., by wrapping
    test methods and monkeypatching them back in.
  • I had to reimplement assertRaises/assertRaisesRegex from scratch to
    ignore NotImplementedError when _ignore_not_implemented_error is True.
    The implementation relies on a small amount of private API that hasn't
    changed since 2010.
  • expectedAlertNondeterministic doesn't really work, so I skipped those
    tests; there's probably a better way to do it.

I tested this using `pytest --disable-warnings --tb=native -k meta --sw test/*.py` and a pile of extra patches to make collection actually work (lol).

Signed-off-by: Edward Z. Yang ezyang@fb.com

Differential Revision: D26955539

ezyang added a commit that referenced this pull request Mar 10, 2021
ghstack-source-id: b4a1846
Pull Request resolved: #53682
@facebook-github-bot
Contributor

facebook-github-bot commented Mar 10, 2021

💊 CI failures summary and remediations

As of commit c6b4feb (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-scanned failure(s)

This comment was automatically generated by Dr. CI (expand for details). Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@ezyang ezyang requested review from bdhirsh, bhosmer and mruberry March 10, 2021 04:08
@ezyang
Contributor Author

ezyang commented Mar 10, 2021

Here is what is currently causing tests to skip:

```
defaultdict(<class 'int'>,
            {'aten::_copy_from': 2167,
             'aten::arange.start_out': 24,
             'aten::as_strided': 8,
             'aten::eye.m_out': 45,
             'aten::fill_.Scalar': 138,
             'aten::linspace.out': 20,
             'aten::logspace.out': 18,
             'aten::normal_': 1757,
             'aten::random_': 9,
             'aten::random_.from': 4103,
             'aten::randperm.generator_out': 20,
             'aten::range.out': 2,
             'aten::take': 1,
             'aten::unfold': 1,
             'aten::uniform_': 4236,
             'aten::view': 4,
             'aten::zero_': 62,
             'torch.tensor': 1146})
```

computed by looking at the junit XML with:

```python
# Count skipped-as-not-implemented tests per operator from pytest's
# junit XML report (requires the junitparser package).
from junitparser import JUnitXml
import re
from collections import defaultdict
from pprint import pprint

xml = JUnitXml.fromfile('foo.xml')

result = defaultdict(int)

for suite in xml:
    for case in suite:
        if case.result:
            r = case.result[0]
            if r.type == "pytest.skip" and r.message.startswith("not implemented"):
                # Extract the aten operator name from the skip message.
                m = re.search(r"Could not run '([^']+)'", r.text)
                if m:
                    op = m.group(1)
                elif "torch.tensor" in r.text:
                    op = "torch.tensor"
                else:
                    continue
                result[op] += 1

pprint(result)
```

ezyang added a commit that referenced this pull request Mar 10, 2021

ghstack-source-id: 6f86730
Pull Request resolved: #53682
Comment threads:
- test/test_tensor_creation_ops.py (outdated)
- torch/testing/_internal/common_utils.py
- torch/testing/_internal/common_utils.py
- torch/testing/_internal/common_device_type.py
- test/test_linalg.py (outdated)
- torch/testing/_internal/common_utils.py
@mruberry
Collaborator

mruberry commented Mar 10, 2021

Adding a meta device seems cool. How time consuming are meta tests? Would it make sense to limit them to only certain builds?

Follow-up question: is there a planned announcement regarding meta tests? What should people do if a meta test fails? Do they need to call you in to unblock the PR, or do you recommend skipping the meta device and filing an issue, for example?

I made a few inline comments. Mostly suggestions for additional comments, but a sanity check on subclassing unittest.case._AssertRaisesContext, too.
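For illustration, an assertRaises-style context manager in the spirit described could be built without touching the private `unittest.case._AssertRaisesContext` at all. This is a hedged sketch; the class name, the `ignore_not_implemented` flag, and the behavior details are invented here, not PyTorch's actual implementation.

```python
import re


class RaisesWithMetaEscape:
    """Like assertRaisesRegex, except a NotImplementedError (e.g. an op
    missing on the meta device) is also accepted and swallowed when
    ignore_not_implemented is True."""

    def __init__(self, expected, regex=None, ignore_not_implemented=True):
        self.expected = expected
        self.regex = regex
        self.ignore_not_implemented = ignore_not_implemented

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        if exc_type is None:
            raise AssertionError(f"{self.expected.__name__} not raised")
        # An unimplemented op short-circuits the assertion entirely.
        # Note NotImplementedError subclasses RuntimeError, so check it first.
        if self.ignore_not_implemented and issubclass(exc_type, NotImplementedError):
            return True
        if not issubclass(exc_type, self.expected):
            return False  # let unexpected exceptions propagate
        if self.regex is not None and not re.search(self.regex, str(exc_value)):
            raise AssertionError(f'"{self.regex}" does not match "{exc_value}"')
        return True  # expected exception (and message) seen; swallow it
```

Subclassing the real `_AssertRaisesContext` instead, as the PR does, keeps behavior closer to stock unittest at the cost of depending on private (if long-stable) API.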

@ezyang
Contributor Author

ezyang commented Mar 10, 2021

> How time consuming are meta tests?

On an optimized build of PyTorch, it still takes a few minutes to run them all because Python is slow :( (according to an XML file I have lying around, it took just shy of 7 minutes to run all tests with "meta" in their name). 7 minutes isn't half an hour though, so maybe it's OK? I think it would be pretty OK to only run these on one build, though, because (1) a regular CPU kernel will still exercise all this and (2) the meta kernels aren't very platform sensitive (unlike kernels for other devices).

@mruberry
Collaborator

> How time consuming are meta tests?
>
> On an optimized build of PyTorch, it still takes a few minutes to run them all because Python is slow :( (according to an XML file I have lying around, it took just shy of 7 minutes to run all tests with "meta" in their name). 7 minutes isn't half an hour though, so maybe it's OK? I think it would be pretty OK to only run these on one build, though, because (1) a regular CPU kernel will still exercise all this and (2) the meta kernels aren't very platform sensitive (unlike kernels for other devices).

Limiting them to one build would be preferable if possible. 7 minutes isn't so bad, but 7 minutes on every test build adds up.

I also forgot to mention that some tests use the @onlyOnCPUAndCUDA decorator, but those tests probably want to be @onlyOnCPUAndCUDAAndMeta now. Should we change the decorators so that if CPU is run then Meta is run, too, and require explicitly skipping meta? So @onlyCPU would actually be @onlyCPUAndMeta, for example? And to actually run exclusively on the CPU the test would need to have both the @onlyCPU decorator and the @skipMeta decorator?

@ezyang
Contributor Author

ezyang commented Mar 10, 2021

> Follow-up question: is there a planned announcement regarding meta tests? What should people do if a meta test fails? Do they need to call you in to unblock the PR, or do you recommend skipping the meta device and filing an issue, for example?

If I've done my job right it should be obvious how to fix these problems (and you should be unlikely to break the meta test if you aren't working on anything meta), but yeah, we can do a short post about it.

@ezyang
Contributor Author

ezyang commented Mar 10, 2021

> I also forgot to mention that some tests use the @onlyOnCPUAndCUDA decorator, but those tests probably want to be @onlyOnCPUAndCUDAAndMeta now. Should we change the decorators so that if CPU is run then Meta is run, too, and require explicitly skipping meta? So @onlyCPU would actually be @onlyCPUAndMeta, for example? And to actually run exclusively on the CPU the test would need to have both the @onlyCPU decorator and the @skipMeta decorator?

I think we need to sit down and have a serious design discussion about how to organize these decorators.
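As a thought experiment, the proposal above (meta rides along with CPU unless explicitly skipped) could be modeled with order-independent decorators along these lines. All names here are illustrative, not the real `common_device_type` API.

```python
# Illustrative sketch only; PyTorch's actual decorators live in
# torch/testing/_internal/common_device_type.py and differ in detail.

ALL_DEVICE_TYPES = {"cpu", "cuda", "meta"}


def only_on(*device_types):
    """Restrict a test to the given device types; meta rides along by
    default and must be removed explicitly with skip_meta."""
    def deco(fn):
        base = set(device_types) | {"meta"}
        fn._allowed_devices = base & getattr(fn, "_allowed_devices", ALL_DEVICE_TYPES)
        return fn
    return deco


def skip_meta(fn):
    """Explicitly opt a test out of the meta device."""
    fn._allowed_devices = getattr(fn, "_allowed_devices", ALL_DEVICE_TYPES) - {"meta"}
    return fn


def runs_on(fn, device_type):
    """Would the instantiation machinery generate this test for device_type?"""
    return device_type in getattr(fn, "_allowed_devices", ALL_DEVICE_TYPES)
```

Intersecting with any previously recorded set makes stacking order irrelevant: `@skip_meta` above or below `@only_on("cpu")` yields the same `{"cpu"}` result, which sidesteps one of the footguns such a redesign would need to address.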

ezyang added a commit that referenced this pull request Mar 10, 2021

ghstack-source-id: 27d9770
Pull Request resolved: #53682
@mruberry
Collaborator

Offline discussion with @ezyang is tracked in #53849.

Collaborator

@mruberry mruberry left a comment


LGTM!

@ezyang ezyang requested a review from ailzhang March 12, 2021 15:12
ezyang added 2 commits March 14, 2021 08:30
@facebook-github-bot
Contributor

@ezyang merged this pull request in c2f41b6.

@facebook-github-bot facebook-github-bot deleted the gh/ezyang/938/head branch March 18, 2021 14:16
xsacha pushed a commit to xsacha/pytorch that referenced this pull request Mar 31, 2021
…tedError (pytorch#53682)

Summary:
Pull Request resolved: pytorch#53682


Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D26955539

Pulled By: ezyang

fbshipit-source-id: ac21c8734562497fdcca3b614a28010bc4c03d74
ezyang added a commit that referenced this pull request Mar 21, 2022
After discussion with Can Balioglu, we have concluded that
#53682 , while clever, is more
trouble than it is worth.  The main problem is that whenever someone
adds support for new meta tensors, they then get dozens of new test case
failures, because tests that were previously halted by lack of support
for an operator on meta tensors, now have gotten further and hit some
logic which expects to be able to, e.g., pull out a real value from a
tensor (which clearly doesn't work).  This is very annoying and time
consuming!  Most of these tests aren't written with meta device in
mind, and it's not a good use of time to try to make them more generic.

The plan on record is to switch meta testing to OpInfo, but that patch
will take some time to prepare; for now I want to stem the bleeding.  I
don't think we're at high risk for regressions here because meta tensors
mostly share logic with their regular brethren.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

[ghstack-poisoned]
ezyang added a commit that referenced this pull request Mar 21, 2022
Differential Revision: [D35010278](https://our.internmc.facebook.com/intern/diff/D35010278)
ezyang added a commit that referenced this pull request Mar 21, 2022
ghstack-source-id: d0ea328
Pull Request resolved: #74468
ezyang added a commit that referenced this pull request Mar 22, 2022
ezyang added a commit that referenced this pull request Mar 22, 2022
ezyang added a commit that referenced this pull request Mar 22, 2022
ghstack-source-id: 2442622
Pull Request resolved: #74468
pytorchmergebot pushed a commit that referenced this pull request Mar 22, 2022
Pull Request resolved: #74468

Approved by: https://github.com/mruberry
shahofblah pushed a commit that referenced this pull request Mar 25, 2022
Pull Request resolved: #74468

Approved by: https://github.com/mruberry
facebook-github-bot pushed a commit that referenced this pull request Mar 29, 2022

Pull Request resolved: #74468

Approved by: https://github.com/mruberry

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/65329f4fac8fb22318b7a3eb115e9da207d8d41a

Reviewed By: bigfootjon

Differential Revision: D35193490

Pulled By: atalman

fbshipit-source-id: 6e6bae34c075e60c6a84936f812e178bf88784cd
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
…tedError (pytorch#53682)

laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 25, 2026