[TPU] Fix PjRT tests #17408

Merged: carmocca merged 17 commits into master from carmocca/tpu-fixes on Apr 19, 2023



@carmocca (Contributor) commented Apr 18, 2023

What does this PR do?

This PR fixes the PjRT tests.

The PyTorch XRT test still hangs; fixing it is left for a follow-up.

Some changes to other strategies were made for consistency but are not strictly necessary. I can pull them out if reviewers request it.

cc @Borda @carmocca @justusschock @awaelchli

@carmocca carmocca self-assigned this Apr 18, 2023
@carmocca carmocca added this to the 2.1 milestone Apr 18, 2023
@github-actions added the fabric (lightning.fabric.Fabric) label Apr 18, 2023
@github-actions added the pl (Generic label for PyTorch Lightning package) label Apr 18, 2023
Comment thread src/lightning/fabric/plugins/io/xla.py
Comment thread src/lightning/fabric/plugins/io/xla.py
Comment thread src/lightning/pytorch/strategies/xla.py
Comment thread src/lightning/pytorch/trainer/trainer.py
Comment thread src/lightning/pytorch/trainer/connectors/checkpoint_connector.py
Comment thread src/lightning/fabric/strategies/xla.py
Comment thread tests/tests_fabric/strategies/test_xla.py Outdated
Comment thread src/lightning/fabric/strategies/xla.py
Comment thread src/lightning/fabric/strategies/xla.py
@carmocca carmocca marked this pull request as ready for review April 19, 2023 00:02
@github-actions (Contributor) commented Apr 19, 2023

⚡ Required checks status: All passing 🟢

Groups summary

🟢 pytorch_lightning: Tests workflow
Check ID Status
pl-cpu (macOS-11, lightning, 3.8, 1.11) success
pl-cpu (macOS-11, lightning, 3.9, 1.12) success
pl-cpu (macOS-11, lightning, 3.10, 1.13) success
pl-cpu (macOS-11, lightning, 3.10, 2.0) success
pl-cpu (macOS-11, lightning, 3.8, 1.11, oldest) success
pl-cpu (ubuntu-20.04, lightning, 3.8, 1.11) success
pl-cpu (ubuntu-20.04, lightning, 3.9, 1.12) success
pl-cpu (ubuntu-20.04, lightning, 3.10, 1.13) success
pl-cpu (ubuntu-20.04, lightning, 3.10, 2.0) success
pl-cpu (ubuntu-20.04, lightning, 3.8, 1.11, oldest) success
pl-cpu (windows-2022, lightning, 3.8, 1.11) success
pl-cpu (windows-2022, lightning, 3.9, 1.12) success
pl-cpu (windows-2022, lightning, 3.10, 1.13) success
pl-cpu (windows-2022, lightning, 3.10, 2.0) success
pl-cpu (windows-2022, lightning, 3.8, 1.11, oldest) success
pl-cpu (macOS-11, pytorch, 3.8, 1.13) success
pl-cpu (ubuntu-20.04, pytorch, 3.8, 1.13) success
pl-cpu (windows-2022, pytorch, 3.8, 1.13) success

These checks are required after the changes to src/lightning/fabric/accelerators/tpu.py, src/lightning/fabric/connector.py, src/lightning/fabric/plugins/io/xla.py, src/lightning/fabric/strategies/ddp.py, src/lightning/fabric/strategies/fsdp.py, src/lightning/fabric/strategies/xla.py, src/lightning/fabric/utilities/distributed.py, src/lightning/pytorch/core/module.py, src/lightning/pytorch/strategies/ddp.py, src/lightning/pytorch/strategies/fsdp.py, src/lightning/pytorch/strategies/xla.py, src/lightning/pytorch/trainer/connectors/accelerator_connector.py, src/lightning/pytorch/trainer/connectors/checkpoint_connector.py, src/lightning/pytorch/trainer/connectors/logger_connector/result.py, src/lightning/pytorch/trainer/trainer.py, tests/tests_pytorch/checkpointing/test_trainer_checkpoint.py, tests/tests_pytorch/core/test_metric_result_integration.py, tests/tests_pytorch/strategies/test_ddp_strategy.py, tests/tests_pytorch/strategies/test_deepspeed_strategy.py, tests/tests_pytorch/strategies/test_xla.py, tests/tests_pytorch/trainer/connectors/test_accelerator_connector.py, tests/tests_pytorch/utilities/test_deepspeed_collate_checkpoint.py.

🟢 pytorch_lightning: Azure GPU
Check ID Status
pytorch-lightning (GPUs) success

These checks are required after the changes to src/lightning/pytorch/core/module.py, src/lightning/pytorch/strategies/ddp.py, src/lightning/pytorch/strategies/fsdp.py, src/lightning/pytorch/strategies/xla.py, src/lightning/pytorch/trainer/connectors/accelerator_connector.py, src/lightning/pytorch/trainer/connectors/checkpoint_connector.py, src/lightning/pytorch/trainer/connectors/logger_connector/result.py, src/lightning/pytorch/trainer/trainer.py, tests/tests_pytorch/checkpointing/test_trainer_checkpoint.py, tests/tests_pytorch/core/test_metric_result_integration.py, tests/tests_pytorch/strategies/test_ddp_strategy.py, tests/tests_pytorch/strategies/test_deepspeed_strategy.py, tests/tests_pytorch/strategies/test_xla.py, tests/tests_pytorch/trainer/connectors/test_accelerator_connector.py, tests/tests_pytorch/utilities/test_deepspeed_collate_checkpoint.py, src/lightning/fabric/accelerators/tpu.py, src/lightning/fabric/connector.py, src/lightning/fabric/plugins/io/xla.py, src/lightning/fabric/strategies/ddp.py, src/lightning/fabric/strategies/fsdp.py, src/lightning/fabric/strategies/xla.py, src/lightning/fabric/utilities/distributed.py.

🟢 fabric: Docs
Check ID Status
make-doctest (fabric) success
make-html (fabric) success

These checks are required after the changes to src/lightning/fabric/accelerators/tpu.py, src/lightning/fabric/connector.py, src/lightning/fabric/plugins/io/xla.py, src/lightning/fabric/strategies/ddp.py, src/lightning/fabric/strategies/fsdp.py, src/lightning/fabric/strategies/xla.py, src/lightning/fabric/utilities/distributed.py.

🟢 pytorch_lightning: Docs
Check ID Status
make-doctest (pytorch) success
make-html (pytorch) success

These checks are required after the changes to src/lightning/pytorch/core/module.py, src/lightning/pytorch/strategies/ddp.py, src/lightning/pytorch/strategies/fsdp.py, src/lightning/pytorch/strategies/xla.py, src/lightning/pytorch/trainer/connectors/accelerator_connector.py, src/lightning/pytorch/trainer/connectors/checkpoint_connector.py, src/lightning/pytorch/trainer/connectors/logger_connector/result.py, src/lightning/pytorch/trainer/trainer.py.

🟢 lightning_fabric: CPU workflow
Check ID Status
fabric-cpu (macOS-11, lightning, 3.8, 1.11) success
fabric-cpu (macOS-11, lightning, 3.9, 1.12) success
fabric-cpu (macOS-11, lightning, 3.10, 1.13) success
fabric-cpu (macOS-11, lightning, 3.10, 2.0) success
fabric-cpu (macOS-11, lightning, 3.8, 1.11, oldest) success
fabric-cpu (ubuntu-20.04, lightning, 3.8, 1.11) success
fabric-cpu (ubuntu-20.04, lightning, 3.9, 1.12) success
fabric-cpu (ubuntu-20.04, lightning, 3.10, 1.13) success
fabric-cpu (ubuntu-20.04, lightning, 3.10, 2.0) success
fabric-cpu (ubuntu-20.04, lightning, 3.8, 1.11, oldest) success
fabric-cpu (windows-2022, lightning, 3.8, 1.11) success
fabric-cpu (windows-2022, lightning, 3.9, 1.12) success
fabric-cpu (windows-2022, lightning, 3.10, 1.13) success
fabric-cpu (windows-2022, lightning, 3.10, 2.0) success
fabric-cpu (windows-2022, lightning, 3.8, 1.11, oldest) success
fabric-cpu (macOS-11, fabric, 3.8, 1.13) success
fabric-cpu (ubuntu-20.04, fabric, 3.8, 1.13) success
fabric-cpu (windows-2022, fabric, 3.8, 1.13) success

These checks are required after the changes to src/lightning/fabric/accelerators/tpu.py, src/lightning/fabric/connector.py, src/lightning/fabric/plugins/io/xla.py, src/lightning/fabric/strategies/ddp.py, src/lightning/fabric/strategies/fsdp.py, src/lightning/fabric/strategies/xla.py, src/lightning/fabric/utilities/distributed.py, tests/tests_fabric/plugins/environments/test_xla.py, tests/tests_fabric/strategies/test_deepspeed_integration.py, tests/tests_fabric/strategies/test_fsdp_integration.py, tests/tests_fabric/strategies/test_xla.py, tests/tests_fabric/test_connector.py, tests/tests_fabric/test_fabric.py.

🟢 lightning_fabric: Azure GPU
Check ID Status
lightning-fabric (GPUs) success

These checks are required after the changes to src/lightning/fabric/accelerators/tpu.py, src/lightning/fabric/connector.py, src/lightning/fabric/plugins/io/xla.py, src/lightning/fabric/strategies/ddp.py, src/lightning/fabric/strategies/fsdp.py, src/lightning/fabric/strategies/xla.py, src/lightning/fabric/utilities/distributed.py, tests/tests_fabric/plugins/environments/test_xla.py, tests/tests_fabric/strategies/test_deepspeed_integration.py, tests/tests_fabric/strategies/test_fsdp_integration.py, tests/tests_fabric/strategies/test_xla.py, tests/tests_fabric/test_connector.py, tests/tests_fabric/test_fabric.py.

🟢 mypy
Check ID Status
mypy success

These checks are required after the changes to src/lightning/fabric/accelerators/tpu.py, src/lightning/fabric/connector.py, src/lightning/fabric/plugins/io/xla.py, src/lightning/fabric/strategies/ddp.py, src/lightning/fabric/strategies/fsdp.py, src/lightning/fabric/strategies/xla.py, src/lightning/fabric/utilities/distributed.py, src/lightning/pytorch/core/module.py, src/lightning/pytorch/strategies/ddp.py, src/lightning/pytorch/strategies/fsdp.py, src/lightning/pytorch/strategies/xla.py, src/lightning/pytorch/trainer/connectors/accelerator_connector.py, src/lightning/pytorch/trainer/connectors/checkpoint_connector.py, src/lightning/pytorch/trainer/connectors/logger_connector/result.py, src/lightning/pytorch/trainer/trainer.py.

🟢 install
Check ID Status
install-pkg (ubuntu-22.04, app, 3.8) success
install-pkg (ubuntu-22.04, app, 3.10) success
install-pkg (ubuntu-22.04, fabric, 3.8) success
install-pkg (ubuntu-22.04, fabric, 3.10) success
install-pkg (ubuntu-22.04, pytorch, 3.8) success
install-pkg (ubuntu-22.04, pytorch, 3.10) success
install-pkg (ubuntu-22.04, lightning, 3.8) success
install-pkg (ubuntu-22.04, lightning, 3.10) success
install-pkg (ubuntu-22.04, notset, 3.8) success
install-pkg (ubuntu-22.04, notset, 3.10) success
install-pkg (macOS-12, app, 3.8) success
install-pkg (macOS-12, app, 3.10) success
install-pkg (macOS-12, fabric, 3.8) success
install-pkg (macOS-12, fabric, 3.10) success
install-pkg (macOS-12, pytorch, 3.8) success
install-pkg (macOS-12, pytorch, 3.10) success
install-pkg (macOS-12, lightning, 3.8) success
install-pkg (macOS-12, lightning, 3.10) success
install-pkg (macOS-12, notset, 3.8) success
install-pkg (macOS-12, notset, 3.10) success
install-pkg (windows-2022, app, 3.8) success
install-pkg (windows-2022, app, 3.10) success
install-pkg (windows-2022, fabric, 3.8) success
install-pkg (windows-2022, fabric, 3.10) success
install-pkg (windows-2022, pytorch, 3.8) success
install-pkg (windows-2022, pytorch, 3.10) success
install-pkg (windows-2022, lightning, 3.8) success
install-pkg (windows-2022, lightning, 3.10) success
install-pkg (windows-2022, notset, 3.8) success
install-pkg (windows-2022, notset, 3.10) success

These checks are required after the changes to src/lightning/fabric/accelerators/tpu.py, src/lightning/fabric/connector.py, src/lightning/fabric/plugins/io/xla.py, src/lightning/fabric/strategies/ddp.py, src/lightning/fabric/strategies/fsdp.py, src/lightning/fabric/strategies/xla.py, src/lightning/fabric/utilities/distributed.py, src/lightning/pytorch/core/module.py, src/lightning/pytorch/strategies/ddp.py, src/lightning/pytorch/strategies/fsdp.py, src/lightning/pytorch/strategies/xla.py, src/lightning/pytorch/trainer/connectors/accelerator_connector.py, src/lightning/pytorch/trainer/connectors/checkpoint_connector.py, src/lightning/pytorch/trainer/connectors/logger_connector/result.py, src/lightning/pytorch/trainer/trainer.py.


Thank you for your contribution! 💜

Note
This comment is generated automatically and is updated every 180 seconds for 60 minutes. If you have any other questions, contact carmocca for help.

@carmocca carmocca changed the title [TPU] Test fixes [TPU] Fix PjRT tests Apr 19, 2023
Comment thread src/lightning/fabric/strategies/ddp.py
Comment thread src/lightning/fabric/strategies/ddp.py
Comment thread src/lightning/fabric/strategies/xla.py
Comment thread src/lightning/fabric/strategies/xla.py Outdated
Comment thread tests/tests_fabric/test_connector.py Outdated
Comment thread tests/tests_pytorch/strategies/test_xla.py Outdated
Comment thread tests/tests_pytorch/trainer/connectors/test_accelerator_connector.py Outdated
@codecov commented Apr 19, 2023

Codecov Report

Merging #17408 (91657c6) into master (4772639) will decrease coverage by 23%.
The diff coverage is 17%.

Additional details and impacted files
@@            Coverage Diff            @@
##           master   #17408     +/-   ##
=========================================
- Coverage      83%      60%    -23%     
=========================================
  Files         414      409      -5     
  Lines       31579    31545     -34     
=========================================
- Hits        26221    18826   -7395     
- Misses       5358    12719   +7361     
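The percentages in the Codecov table follow directly from the hit and line counts: coverage is hits divided by tracked lines. The sketch below recomputes the master and PR figures from the numbers in the table, assuming there are no partially covered lines (the table lists none, and hits plus misses equals lines in both columns).

```python
# Figures copied from the Codecov diff table above.
master = {"lines": 31579, "hits": 26221, "misses": 5358}
pr = {"lines": 31545, "hits": 18826, "misses": 12719}


def coverage(report):
    """Return line coverage as a percentage: hits / tracked lines."""
    # Assumption: no partial lines, so every tracked line is either hit or missed.
    assert report["hits"] + report["misses"] == report["lines"]
    return 100 * report["hits"] / report["lines"]


before = coverage(master)
after = coverage(pr)
print(f"master: {before:.0f}%, PR: {after:.0f}%, change: {after - before:+.0f}%")
```

Rounded to whole percentages, this reproduces the 83%, 60% and -23% figures reported above.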

@carmocca carmocca enabled auto-merge (squash) April 19, 2023 14:36

Labels

fabric (lightning.fabric.Fabric), pl (Generic label for PyTorch Lightning package), ready to be merged (PRs ready to be merged)


5 participants