[torchbench] timm_nfnet training failing on non-dynamo. #7084

@ysiraichi

After #7067, timm_nfnet training (non-dynamo) started failing. Running the command below produces the following error:

python xla/benchmarks/experiment_runner.py \
    --suite-name torchbench --accelerator cuda --repeat 8 --iterations-per-run 1 \
    --xla PJRT --dynamo None --test train \
    --filter timm_nfnet
Traceback (most recent call last):
  File "xla/benchmarks/experiment_runner.py", line 960, in <module>
    main()
  File "xla/benchmarks/experiment_runner.py", line 956, in main
    runner.run()
  File "xla/benchmarks/experiment_runner.py", line 61, in run
    self.run_single_config()
  File "xla/benchmarks/experiment_runner.py", line 256, in run_single_config
    metrics, last_output = self.run_once_and_gather_metrics(
  File "xla/benchmarks/experiment_runner.py", line 349, in run_once_and_gather_metrics
    output, _ = loop(iter_fn=self._default_iter_fn)
  File "xla/benchmarks/experiment_runner.py", line 306, in loop
    output, timing, trace = iter_fn(benchmark_experiment, benchmark_model,
  File "xla/benchmarks/experiment_runner.py", line 224, in _default_iter_fn
    self._mark_step(benchmark_experiment, output)
  File "xla/benchmarks/experiment_runner.py", line 428, in _mark_step
    xm.mark_step()
  File "xla/torch_xla/core/xla_model.py", line 1055, in mark_step
    torch_xla._XLAC._xla_step_marker(
RuntimeError: Bad StatusOr access: INTERNAL: during context [Unknown]: Seen floating point types of different precisions in %multiply.3753 = f32[128,3072,6,6]{3,2,1,0} multiply(f16[128,3072,6,6]{3,2,1,0} %multiply.3730, f32[128,3072,6,6]{3,2,1,0} %add.3752), but mixed precision is disallowed.
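
For triage, here is a minimal sketch that pulls the operand element types out of the error message above, to make the f16/f32 mismatch explicit. It is not part of torch_xla or the benchmark runner: `operand_dtypes` and `OPERAND_RE` are hypothetical names, and the regex assumes operands are printed as `dtype[dims]{layout} %name`, as they are in this message.

```python
import re

# Error text copied from the RuntimeError in the traceback.
MESSAGE = (
    "Seen floating point types of different precisions in "
    "%multiply.3753 = f32[128,3072,6,6]{3,2,1,0} "
    "multiply(f16[128,3072,6,6]{3,2,1,0} %multiply.3730, "
    "f32[128,3072,6,6]{3,2,1,0} %add.3752), "
    "but mixed precision is disallowed."
)

# Assumed operand format: "<dtype>[<dims>]{<layout>} %<name>".
OPERAND_RE = re.compile(r"(\w+)\[[\d,]*\]\{[\d,]*\}\s+(%[\w.]+)")


def operand_dtypes(message):
    """Return [(operand_name, dtype), ...] for the instruction's operands."""
    # Restrict the search to the argument list inside the parentheses.
    args = message[message.index("(") + 1:message.rindex(")")]
    return [(name, dtype) for dtype, name in OPERAND_RE.findall(args)]


operands = operand_dtypes(MESSAGE)
# True when the operands do not all share one element type.
mixed = len({dtype for _, dtype in operands}) > 1

for name, dtype in operands:
    print(name, dtype)
print("mixed precision:", mixed)
```

For this message, the sketch reports `%multiply.3730` as f16 and `%add.3752` as f32, so the f16 operand is the one to trace back in the graph.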

Environment

  • Reproducible on XLA backend [CPU/TPU]: CUDA
  • torch_xla version: 62c3ba6

cc @miladm @JackCaoG @vanbasten23 @zpcore
