Skip to content

[RFC] Add support for device extension autoloading#127074

Closed
shink wants to merge 31 commits intopytorch:mainfrom
shink:feat/autoload
Closed

[RFC] Add support for device extension autoloading#127074
shink wants to merge 31 commits intopytorch:mainfrom
shink:feat/autoload

Conversation

@shink
Copy link
Copy Markdown
Contributor

@shink shink commented May 24, 2024

Fixes #122468

  • Load device extensions at the end of torch/__init__.py
  • Enabled by default, or you can disable it with TORCH_DEVICE_BACKEND_AUTOLOAD=0

run test:

python test/run_test.py -i test_autoload_enable
python test/run_test.py -i test_autoload_disable

doc:

https://docs-preview.pytorch.org/pytorch/pytorch/127074/miscellaneous_environment_variables.html

co-author: @jgong5 @bsochack @bkowalskiINTEL @jczaja @fffrog @hipudding

cc @albanD

@pytorch-bot
Copy link
Copy Markdown

pytorch-bot Bot commented May 24, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/127074

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 45fd7ad with merge base 6b5fbc5 (image):
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@shink shink marked this pull request as draft May 24, 2024 09:35
@shink shink marked this pull request as draft May 24, 2024 09:35
@shink shink marked this pull request as draft May 24, 2024 09:35
@jgong5
Copy link
Copy Markdown
Collaborator

jgong5 commented May 24, 2024

Thanks for the pull request. Could you please add some tests?

@jgong5
Copy link
Copy Markdown
Collaborator

jgong5 commented May 24, 2024

cc @bsochack

@shink
Copy link
Copy Markdown
Contributor Author

shink commented May 24, 2024

Thanks for the pull request. Could you please add some tests?

Sure, I'm willing to do this!

@bsochack
Copy link
Copy Markdown

@shink we pushed a draft PR of a change here: #127228
A few changes (as in review) + unit tests are still required.

@jczaja
Copy link
Copy Markdown

jczaja commented May 27, 2024

@shink , @jgong5 This is a nice work that directly reflect proposal (RFC). We have tried to test it against our device_extension and there was a crash :
image

Problem seems to be that torch._dynamo is having a code checking all symbols exported from torch/init.py .
So to avoid that it would be good to hide changes proposed here inside a function. This issue may be observed with extensions that are directly importing torch._dynamo . Here is other PR implementing mentioned RFC that works for us: #127228 .

@shink
Copy link
Copy Markdown
Contributor Author

shink commented May 28, 2024

@bsochack @jczaja Nice work! Thanks so much!

@shink , @jgong5 This is a nice work that directly reflect proposal (RFC). We have tried to test it against our device_extension and there was a crash : image

Problem seems to be that torch._dynamo is having a code checking all symbols exported from torch/init.py . So to avoid that it would be good to hide changes proposed here inside a function. This issue may be observed with extensions that are directly importing torch._dynamo . Here is other PR implementing mentioned RFC that works for us: #127228 .

Yes, we alse met this crash.

@shink shink marked this pull request as ready for review May 31, 2024 08:44
@shink
Copy link
Copy Markdown
Contributor Author

shink commented May 31, 2024

@jgong5 @bsochack @bkowalskiINTEL @fffrog @hipudding Please have a look at this unit test. See b0e08d3

Copy link
Copy Markdown
Contributor Author

@shink shink May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in fact, this file should be ignored, and the reason it is here is to declare the entry point of the backend package

Comment thread torch/__init__.py Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to show failure information when an exception is encountered.

Comment thread torch/__init__.py Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getenv with default value shoule be better.

Comment thread test/autoload/test_autoload.py Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be better to think of this path operation as context.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fxied, please take a look again

@bsochack
Copy link
Copy Markdown

@jgong5 @bsochack @bkowalskiINTEL @fffrog @hipudding Please have a look at this unit test. See b0e08d3

Please add @bkowalskiINTEL and @jczaja as co-authors. Lets also continue all addressing code review here.

@shink
Copy link
Copy Markdown
Contributor Author

shink commented May 31, 2024

@jgong5 @bsochack @bkowalskiINTEL @fffrog @hipudding Please have a look at this unit test. See b0e08d3

Please add @bkowalskiINTEL and @jczaja as co-authors. Lets also continue all addressing code review here.

Sure, thanks so much, and any suggestions are welcome.

Comment thread test/autoload/test_autoload.py Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides checking the environment variable, are you going to add test to autoload an extension (maybe a mock one)?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review. Please have a look at the code I just updated.

@cpuhrsch cpuhrsch requested a review from albanD June 3, 2024 18:10
@cpuhrsch cpuhrsch added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jun 3, 2024
@cpuhrsch cpuhrsch requested a review from ezyang June 3, 2024 18:10
@cpuhrsch cpuhrsch added the module: python frontend For issues relating to PyTorch's Python frontend label Jun 3, 2024
Comment thread torch/__init__.py Outdated
Comment thread torch/__init__.py Outdated
Comment thread torch/__init__.py Outdated
Comment thread test/autoload/device_backend/README.md Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to reuse the existing modules we already have for testing in test/cpp_extension/setup.py to test this without having to create a brand new extension?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for your review. I'm testing this idea now.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please have a look at this commit 1e7cf76

Comment thread test/autoload/test_autoload.py Outdated
Comment on lines 33 to 36
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what this is testing?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's testing a function defined in the backend extension

@shink shink requested a review from albanD June 5, 2024 10:04
@jgong5
Copy link
Copy Markdown
Collaborator

jgong5 commented Jun 7, 2024

@shink Seems there is still lint errors. You may repro with "lintrunner -a" locally.

Comment on lines 8 to 14
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do things inside this entrypoint instead to make sure it is called?

Comment thread test/test_cpp_extensions_aot.py Outdated
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I understand it correctly but when the cpp extension is explicitly imported from this test file, not autoloaded implicitly. Is that correct?

Copy link
Copy Markdown
Contributor Author

@shink shink Jun 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for putting this test case here is that I want to reuse the existing modules in test/cpp_extension/setup.py.

The extension is installed here:

pytorch/test/run_test.py

Lines 667 to 671 in 7efaeb1

# Build the test cpp extensions modules
shell_env = os.environ.copy()
shell_env["USE_NINJA"] = str(1 if use_ninja else 0)
cmd = [sys.executable, "setup.py", "install", "--root", "./install"]
return_code = shell(cmd, cwd=cpp_extensions_test_dir, env=shell_env)

@shink shink requested a review from a team as a code owner June 7, 2024 12:25
Comment thread test/test_autoload.py Outdated
Comment thread test/test_autoload.py Outdated
Comment thread torch/__init__.py
"""
Whether autoloading out-of-the-tree device extensions is enabled.
The switch depends on the value of the environment variable
`TORCH_DEVICE_BACKEND_AUTOLOAD`.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@drisspg which .rst file should we add this new env variable to for proper documentation?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main env var RST is here:https://github.com/pytorch/pytorch/blob/main/docs/source/torch_environment_variables.rst

and they branch out to others

@shink shink requested a review from albanD June 28, 2024 01:20
@bsochack
Copy link
Copy Markdown

bsochack commented Jul 2, 2024

@albanD any comments to this PR?

Copy link
Copy Markdown
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small nit on moving the doc to another file, but SGTM otherwise!

miscellaneous_environment_variables
logging
torch_nccl_environment_variables
privateuse1_environment_variables
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please move this to the miscellaneous section? I do expect all users to be interested in this env variable, not just privateuse1 devs.

- Under some conditions, autograd threads can hang on shutdown, therefore we do not wait for them to shutdown indefinitely but rely on timeout that is default set to ``10`` seconds. This environment variable can be used to set the timeout in seconds.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@albanD Sure, it has been moved. Please take a look again. Thanks so much for your patient review.

@jgong5
Copy link
Copy Markdown
Collaborator

jgong5 commented Jul 9, 2024

@pytorchbot merge

@pytorch-bot pytorch-bot Bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 9, 2024
@pytorchmergebot
Copy link
Copy Markdown
Collaborator

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team Raised by workflow job

@jgong5 jgong5 added the release notes: python_frontend python frontend release notes category label Jul 9, 2024
@jgong5
Copy link
Copy Markdown
Collaborator

jgong5 commented Jul 9, 2024

@pytorchbot merge

@pytorchmergebot
Copy link
Copy Markdown
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
Fixes pytorch#122468

- Load device extensions at the end of `torch/__init__.py`
- Enabled by default, or you can disable it with `TORCH_DEVICE_BACKEND_AUTOLOAD=0`

run test:

```python
python test/run_test.py -i test_autoload_enable
python test/run_test.py -i test_autoload_disable
```

doc:

https://docs-preview.pytorch.org/pytorch/pytorch/127074/miscellaneous_environment_variables.html

co-author:  @jgong5 @bsochack @bkowalskiINTEL @jczaja @fffrog @hipudding

Co-authored-by: albanD <desmaison.alban@gmail.com>
Co-authored-by: Jiong Gong <jiong.gong@intel.com>
Pull Request resolved: pytorch#127074
Approved by: https://github.com/albanD, https://github.com/jgong5
dvrogozh added a commit to dvrogozh/torchcodec that referenced this pull request Jan 6, 2026
This commit adds support of out-of-tree plugins autoloading using
python package entry point specification:
* https://packaging.python.org/en/latest/specifications/entry-points/#entry-points
* https://packaging.python.org/en/latest/guides/creating-and-discovering-plugins/#using-package-metadata

Out-of-tree plugins must register entry points as `torchcodec.backends`:
```
[project.entry-points.'torchcodec.backends']
device_backend = 'torchcodec_plugin:load_plugin'
```

Torchcodec will automatically load plugins if discovered. Loading can be
explicitly suppressed with `TORCHCODEC_DEVICE_BACKEND_AUTOLOAD=1` environment
variable.

The same approach is being used to load PyTorch device backends. See:
* pytorch/pytorch#127074

Signed-off-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged module: python frontend For issues relating to PyTorch's Python frontend open source release notes: python_frontend python frontend release notes category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[RFC] Autoload Device Extension

10 participants