
Use benchmark_cls for checking precision. #6375

Merged
zpcore merged 1 commit into master from ysiraichi/dont-load-for-precision
Jan 25, 2024

Conversation

@ysiraichi
Collaborator

@ysiraichi ysiraichi commented Jan 24, 2024

This PR makes it so we don't have to call load_benchmark just to check which precision should be used.

cc @miladm @JackCaoG

Comment thread: benchmarks/torchbench_model.py
@zpcore
Member

zpcore commented Jan 24, 2024

Refer to the issue here for context: #6286.
Thanks for making the fix.

The key point, I think, is to avoid leaving behind a dangling object that has, e.g., moved a model to an XLA device. `del benchmark` doesn't resolve the issue because the object has already claimed the PJRT runtime. This triggers the stack-dump error:

```
RuntimeError: Bad StatusOr access: UNKNOWN: TPU initialization failed: open(/dev/vfio/0): Device or resource busy: Device or resource busy; Couldn't open iommu group /dev/vfio/0.
```

@zpcore zpcore requested a review from will-cromar January 24, 2024 21:29
@zpcore
Member

zpcore commented Jan 24, 2024

Since we only need to detect the precision, we can fetch that information directly without invoking:

```python
benchmark_cls(
    test=self.benchmark_experiment.test,
    device=device,
    batch_size=self.benchmark_experiment.batch_size,
)
```

I think we can call the following load_benchmark_precision instead of load_benchmark to get the precision directly:

```python
import importlib


def load_benchmark_precision(self):
  # Import the benchmark module without instantiating (and thus without
  # loading) the model itself.
  try:
    module = importlib.import_module(
        f"torchbenchmark.models.{self.model_name}")
  except ModuleNotFoundError:
    module = importlib.import_module(
        f"torchbenchmark.models.fb.{self.model_name}")
  # Read the precision defaults straight off the class attributes.
  benchmark_train_precision = getattr(
      module.Model, "DEFAULT_TRAIN_CUDA_PRECISION", None)
  benchmark_eval_precision = getattr(
      module.Model, "DEFAULT_EVAL_CUDA_PRECISION", None)
  return benchmark_train_precision, benchmark_eval_precision
```

WDYT?


@ysiraichi
Collaborator Author

Right. Correct me if I'm misunderstanding things, but isn't that exactly what I'm doing here?

@zpcore
Member

zpcore commented Jan 25, 2024

> Right. Correct me if I'm misunderstanding things, but isn't that exactly what I'm doing here?

Hah, you are right. I didn't notice that you called benchmark_cls instead.

Now it LGTM!

@zpcore zpcore self-requested a review January 25, 2024 17:41
@zpcore zpcore merged commit a1e51e4 into master Jan 25, 2024
@lezcano lezcano changed the title from "Use benchmark_cls for checking precision.`" to "Use benchmark_cls for checking precision." Feb 5, 2024
bhavya01 pushed a commit that referenced this pull request Apr 22, 2024


Development

Successfully merging this pull request may close these issues.

benchmarks/torchbench_model: some benchmarks fail to load and kill experiment_runner's main process
