
Move PJRT Python APIs out of torch_xla.experimental.* #5011

Merged
will-cromar merged 25 commits into master from wcromar/stable-pjrt-api on Jun 6, 2023
Conversation

Collaborator
@will-cromar will-cromar commented May 15, 2023

Reorganize the experimental PJRT Python APIs.

  • Add new _internal module for APIs that are well-tested, but likely to change. I moved device-specific logic here, since I expect to rework it in the near future. All of these functions are mainly used for framework development. In general, users shouldn't have to call them directly.
  • Most functionality that interacts directly with the runtime is moved into the new torch_xla.runtime module.
  • Added a deprecation module to register deprecated aliases for all public functions that are moving into other parts of torch_xla.
  • Print a warning once per deprecated function. For example:
>>> from torch_xla.experimental import tpu
>>> tpu.num_available_chips()
WARNING:root:torch_xla.experimental.tpu.num_available_chips is deprecated. Use torch_xla._internal.tpu.num_available_chips instead.
4
>>> tpu.num_available_chips()
4
>>> tpu.version()
WARNING:root:torch_xla.experimental.tpu.version is deprecated. Use torch_xla._internal.tpu.version instead.
4
  • Update references across the repository.

Summary of new modules:

  • torch_xla.runtime
  • torch_xla._internal.tpu
  • torch_xla._internal.gpu
  • torch_xla._internal.pjrt
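The warn-once aliasing described above could be sketched roughly like this. This is a simplified illustration, not the actual torch_xla deprecation module; `register_deprecated` is a hypothetical helper name:

```python
import functools
import logging


def register_deprecated(old_name: str, new_func):
  """Return a wrapper for new_func that logs a deprecation warning.

  The warning is logged only on the first call through the old name,
  matching the once-per-function behavior shown in the PR description.
  """
  warned = False

  @functools.wraps(new_func)
  def wrapper(*args, **kwargs):
    nonlocal warned
    if not warned:
      logging.warning('%s is deprecated. Use %s.%s instead.', old_name,
                      new_func.__module__, new_func.__name__)
      warned = True
    return new_func(*args, **kwargs)

  return wrapper
```

A module like torch_xla.experimental.tpu could then expose `num_available_chips = register_deprecated('torch_xla.experimental.tpu.num_available_chips', _internal_tpu.num_available_chips)`, so existing imports keep working while steering callers to the new location.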

@will-cromar will-cromar changed the title [WIP] Move PJRT Python APIs out of torch_xla.experimental.* Move PJRT Python APIs out of torch_xla.experimental.* May 18, 2023
@will-cromar will-cromar requested a review from JackCaoG May 18, 2023 17:43
@will-cromar will-cromar marked this pull request as ready for review May 18, 2023 17:45
@@ -11,14 +11,15 @@
import torch_xla.core.xla_env_vars as xenv
Collaborator

should we rename these experimental files?

Collaborator Author

Yes, good catch.

Comment thread torch_xla/runtime.py
# TODO(wcromar): Detect GPU device too


def device_type() -> Optional[str]:
Collaborator

I think we have similar functions in torch_xla.core.xla_model; do we want to do some cleanup?

Collaborator

I want to take this chance to do some cleanup. I am always confused about which function to call for local ordinal, global ordinal, world_size, etc., and what they really mean in a pod context. If we can restructure those APIs a bit and maybe have them in this runtime module instead, that would be nice... (random idea, might need more thinking)

Collaborator

xla_model has become a kitchen sink where we put random things. If we can move all runtime-related bits into this module, that is actually nicer.

Collaborator Author

Discussed this offline. We'll start to move APIs that interact directly with the runtime to a new module and leave any modeling-related APIs in xla_model. I moved the PJRT version of rendezvous back to xla_model, and the old rendezvous will be an alias of that implementation when PJRT is enabled.
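The conditional aliasing described here can be sketched as follows. This is purely illustrative; `using_pjrt`, `_pjrt_rendezvous`, and `_xrt_rendezvous` are hypothetical stand-ins, not the actual torch_xla functions:

```python
import os


def using_pjrt() -> bool:
  # Hypothetical runtime check: in this sketch, PJRT counts as enabled
  # when the PJRT_DEVICE environment variable is set.
  return 'PJRT_DEVICE' in os.environ


def _pjrt_rendezvous(tag: str, payload: bytes = b'') -> list:
  # Stand-in for the PJRT-based implementation.
  return [payload]


def _xrt_rendezvous(tag: str, payload: bytes = b'') -> list:
  # Stand-in for the legacy XRT implementation.
  return [payload]


def rendezvous(tag: str, payload: bytes = b'') -> list:
  # Public entry point: dispatch to the backend-specific implementation,
  # so the old name keeps working regardless of which runtime is enabled.
  impl = _pjrt_rendezvous if using_pjrt() else _xrt_rendezvous
  return impl(tag, payload)
```

The point of the pattern is that callers of the old public name never need to change; only the dispatch inside the alias does.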

@will-cromar will-cromar force-pushed the wcromar/stable-pjrt-api branch from 8fa857f to 660a4cf Compare May 24, 2023 21:30
@will-cromar will-cromar force-pushed the wcromar/stable-pjrt-api branch 2 times, most recently from 93def62 to de2f117 Compare June 1, 2023 21:22
@will-cromar will-cromar requested a review from JackCaoG June 1, 2023 21:23
Comment thread torch_xla/runtime.py Outdated
return

logging.warning(
'XRT configuration not detected. Defaulting to preview PJRT '
Collaborator

do we want to change preview to stable here?

Collaborator Author

Good catch. Done.

Collaborator

@JackCaoG JackCaoG left a comment
LGTM, but prefer to merge on Monday

@will-cromar will-cromar force-pushed the wcromar/stable-pjrt-api branch from dc1130e to f11be83 Compare June 6, 2023 17:25
2 participants