Add fallback check to test_core_aten_ops.py by wonjoo-wj · Pull Request #6559 · pytorch/xla

wonjoo-wj · 2024-02-16T22:50:15Z

Noticed that if we run a test for an op that is not supported, the op just falls back to CPU and the unit test succeeds silently. This adds an explicit metric check to ensure that the op is actually lowered in torch_xla.
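A minimal sketch of the kind of check described above, modeled on the `assertNotIn` call visible in the traceback in this PR. The helper name `assert_op_lowered` and the two sample report strings are hypothetical. In the real test the report would come from `met.metrics_report()`; here it is passed in as a plain string so the sketch runs without torch_xla installed.

```python
import unittest

# Made-up, shortened report fragments used only for illustration.
LOWERED_REPORT = "Counter: xla::add\n  Value: 1\n"
FALLBACK_REPORT = "Counter: aten::reflection_pad3d\n  Value: 1\n"


def assert_op_lowered(testcase, aten_function_name, metrics_report):
    """Fail the test if the aten op name appears in the metrics report.

    Hypothetical helper: the report string is an argument so that this
    sketch does not depend on torch_xla being importable.
    """
    # The subTest label mirrors the "[torch_xla_metric]" tag in the failure output.
    with testcase.subTest("torch_xla_metric"):
        testcase.assertNotIn(aten_function_name, metrics_report)
```

Calling `assert_op_lowered(self, "aten::reflection_pad3d", FALLBACK_REPORT)` inside a `unittest.TestCase` method would record a failure, while the same call with `"aten::add"` and `LOWERED_REPORT` would pass.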

By running this, we can see that 6 of the tests were silently passing while falling back to CPU:

test_aten_grid_sampler_2d_0
test_aten_reflection_pad1d_0
test_aten_reflection_pad1d_1
test_aten_reflection_pad3d_0
test_aten_reflection_pad3d_1
test_aten_reflection_pad3d_2

Example output of such a failing test:

======================================================================
FAIL: test_aten_reflection_pad3d_2 (__main__.AtenOpTest) [torch_xla_metric]
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/wonjoo/pytorch/xla/test/test_core_aten_ops.py", line 59, in run_export_and_compare
    testcase.assertNotIn(aten_function_name, met.metrics_report())
AssertionError: 'aten::reflection_pad3d' unexpectedly found in 'Metric: DeviceLockWait\n  TotalSamples: 2\n  Accumulator: 063.560us\n  ValueRate: 10s009ms448.819us / second\n  Rate: 314961 / second\n  Percentiles: 1%=006.920us; 5%=006.920us; 10%=006.920us; 20%=006.920us; 50%=056.640us; 80%=056.640us; 90%=056.640us; 95%=056.640us; 99%=056.640us\nMetric: IrValueTensorToXlaData\n  TotalSamples: 2\n  Accumulator: 509.530us\n  ValueRate: 003ms442.425us / second\n  Rate: 13.5122 / second\n  Percentiles: 1%=201.220us; 5%=201.220us; 10%=201.220us; 20%=201.220us; 50%=308.310us; 80%=308.310us; 90%=308.310us; 95%=308.310us; 99%=308.310us\nMetric: LazyTracing\n  TotalSamples: 7\n  Accumulator: 145ms559.775us\n  ValueRate: 975ms930.568us / second\n  Rate: 47.2089 / second\n  Percentiles: 1%=262.510us; 5%=262.510us; 10%=262.510us; 20%=369.390us; 50%=627.540us; 80%=001ms011.209us; 90%=141ms770.906us; 95%=141ms770.906us; 99%=141ms770.906us\nMetric: TensorToData\n  TotalSamples: 2\n  Accumulator: 415.050us\n  ValueRate: 003ms804.118us / second\n  Rate: 13.5122 / second\n  Percentiles: 1%=154.880us; 5%=154.880us; 10%=154.880us; 20%=154.880us; 50%=260.170us; 80%=260.170us; 90%=260.170us; 95%=260.170us; 99%=260.170us\nMetric: TensorsGraphSize\n  TotalSamples: 1\n  Accumulator: 1.00\n  Percentiles: 1%=1.00; 5%=1.00; 10%=1.00; 20%=1.00; 50%=1.00; 80%=1.00; 90%=1.00; 95%=1.00; 99%=1.00\nMetric: UnwrapXlaData\n  TotalSamples: 2\n  Accumulator: 012.490us\n  ValueRate: 049ms705.350us / second\n  Rate: 7799.1 / second\n  Percentiles: 1%=004.480us; 5%=004.480us; 10%=004.480us; 20%=004.480us; 50%=008.010us; 80%=008.010us; 90%=008.010us; 95%=008.010us; 99%=008.010us\nMetric: WrapXlaData\n  TotalSamples: 1\n  Accumulator: 001.930us\n  Percentiles: 1%=001.930us; 5%=001.930us; 10%=001.930us; 20%=001.930us; 50%=001.930us; 80%=001.930us; 90%=001.930us; 95%=001.930us; 99%=001.930us\nCounter: CreateXlaTensor\n  Value: 2\nCounter: UncachedCompile\n  Value: 1\nCounter: xla::_copy_from\n  Value: 
2\nCounter: xla::_to_copy\n  Value: 2\nCounter: xla::_to_cpu\n  Value: 1\nCounter: xla::empty_symint\n  Value: 2\nMetric: CompileTime\n  TotalSamples: 1\n  Accumulator: 139ms227.076us\n  Percentiles: 1%=139ms227.076us; 5%=139ms227.076us; 10%=139ms227.076us; 20%=139ms227.076us; 50%=139ms227.076us; 80%=139ms227.076us; 90%=139ms227.076us; 95%=139ms227.076us; 99%=139ms227.076us\nMetric: ExecuteTime\n  TotalSamples: 1\n  Accumulator: 177.490us\n  Percentiles: 1%=177.490us; 5%=177.490us; 10%=177.490us; 20%=177.490us; 50%=177.490us; 80%=177.490us; 90%=177.490us; 95%=177.490us; 99%=177.490us\nMetric: InboundData\n  TotalSamples: 1\n  Accumulator: 972.00B\n  Percentiles: 1%=972.00B; 5%=972.00B; 10%=972.00B; 20%=972.00B; 50%=972.00B; 80%=972.00B; 90%=972.00B; 95%=972.00B; 99%=972.00B\nMetric: OutboundData\n  TotalSamples: 2\n  Accumulator: 8.54KB\n  ValueRate: 57.72KB / second\n  Rate: 13.5128 / second\n  Percentiles: 1%=972.00B; 5%=972.00B; 10%=972.00B; 20%=972.00B; 50%=7.59KB; 80%=7.59KB; 90%=7.59KB; 95%=7.59KB; 99%=7.59KB\nMetric: TransferFromDeviceTime\n  TotalSamples: 1\n  Accumulator: 200.740us\n  Percentiles: 1%=200.740us; 5%=200.740us; 10%=200.740us; 20%=200.740us; 50%=200.740us; 80%=200.740us; 90%=200.740us; 95%=200.740us; 99%=200.740us\nMetric: TransferToDeviceTime\n  TotalSamples: 2\n  Accumulator: 183.900us\n  ValueRate: 001ms242.496us / second\n  Rate: 13.5127 / second\n  Percentiles: 1%=075.200us; 5%=075.200us; 10%=075.200us; 20%=075.200us; 50%=108.700us; 80%=108.700us; 90%=108.700us; 95%=108.700us; 99%=108.700us\nCounter: CreateCompileHandles\n  Value: 1\nCounter: CreateDataHandles\n  Value: 3\nCounter: aten::reflection_pad3d\n  Value: 1\n'

The metrics show that aten::reflection_pad3d has fallen back to CPU, which makes sense because it isn't lowered in torch_xla according to https://github.com/pytorch/xla/blob/master/codegen/xla_native_functions.yaml.
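To read a report like the one above without scanning the whole string, the `aten::` counters can be pulled out directly. This is a rough sketch, not part of the PR: the function name `fallback_counters` and the sample string are hypothetical, and the regex assumes the `Counter: <name>` / `Value: <n>` layout seen in the output above.

```python
import re
from typing import Dict

# Matches "Counter: aten::<op>" followed by its indented "Value: <n>" line.
_ATEN_COUNTER_RE = re.compile(r"^Counter: (aten::\S+)\n\s+Value: (\d+)", re.MULTILINE)


def fallback_counters(report: str) -> Dict[str, int]:
    """Return {counter_name: value} for every aten:: counter in the report."""
    return {name: int(value) for name, value in _ATEN_COUNTER_RE.findall(report)}


# Shortened sample built from counters that appear in the report above.
SAMPLE_REPORT = (
    "Counter: CreateXlaTensor\n  Value: 2\n"
    "Counter: xla::_to_cpu\n  Value: 1\n"
    "Counter: aten::reflection_pad3d\n  Value: 1\n"
)

print(fallback_counters(SAMPLE_REPORT))  # {'aten::reflection_pad3d': 1}
```

Unlike a plain substring check, matching whole counter names distinguishes an op from others that share its prefix (for example a `_backward` variant).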

The good news is that only 3 ops (grid_sampler_2d, reflection_pad1d, and reflection_pad3d) were silently passing, so not much more work is required on our end. I'll add these to the core aten op issues.

cc @qihqi

wonjoo-wj added 3 commits February 16, 2024 20:30

Add metric check for fallback ops in core aten op unit tests

06e5d8e

Add fallback check to test_core_aten_ops.py

8bfaa43

Skip grid_sampler, reflection_pad_1d, and reflection_pad_2d core aten op tests

b0776ff

qihqi approved these changes Feb 16, 2024

wonjoo-wj merged commit 423b9a3 into master Feb 17, 2024

This was referenced Feb 21, 2024

[Core ATen Opset] Lower aten_reflection_pad1d and aten_reflection_pad3d #6577

Closed

[Core ATen Opset] Lower grid_sampler_2d #6581

Open

ManfeiBai mentioned this pull request Feb 22, 2024

[Core Aten Ops] Lower reflection_pad1d, reflection_pad1d_backward, reflection_pad3d and reflection_pad3d_backward #6588

Merged

amithrm pushed a commit to amithrm/xla that referenced this pull request Mar 1, 2024

Add fallback check to test_core_aten_ops.py (pytorch#6559)

c6f9170

bhavya01 pushed a commit that referenced this pull request Apr 22, 2024

Add fallback check to test_core_aten_ops.py (#6559)

44d218c

wonjoo-wj merged 3 commits into master from wonjoo/core-aten-ops/metrics

2 participants
