Support DDP in PyTorch-Lightning#4384
Conversation
I made this PR ready for review. Note that it is based on #4322 and should be merged after it.
Codecov Report

```
@@            Coverage Diff             @@
##           master    #4384      +/-   ##
==========================================
- Coverage   90.40%   90.16%   -0.24%
==========================================
  Files         172      181       +9
  Lines       13682    14037     +355
==========================================
+ Hits        12369    12657     +288
- Misses       1313     1380      +67
```

... and 16 files with indirect coverage changes
@HideakiImamura This is a follow-up for #4322. Could you join the review, please?

@Alnusjaponica #4322 was merged into

This pull request has not seen any recent activity.
toshihikoyanase
left a comment
Let me share my early comments.
I guess the change uses system attrs to store intermediate values even if users execute only a single process. I'm thinking of keeping them to simplify the logic. What do you think of it?
Co-authored-by: Toshihiko Yanase <toshihiko.yanase@gmail.com>
@toshihikoyanase Thank you for pointing this out. I fixed the comment and will change the code to use system_attrs only in distributed settings.

@Alnusjaponica Thank you for your update. Since we pair-programmed the code, we may be able to simplify it in the following respects:
toshihikoyanase
left a comment
Thank you for your update. Let me add some small comments.
Co-authored-by: Toshihiko Yanase <toshihiko.yanase@gmail.com>
toshihikoyanase
left a comment
I confirmed that the updated callback worked with the PyTorch Lightning DDP example in optuna-examples. I have a small comment, but the change mostly looks good to me.
Diff of pytorch_lightning_ddp.py

```diff
diff --git a/pytorch/pytorch_lightning_ddp.py b/pytorch/pytorch_lightning_ddp.py
index 030834d..6d609cb 100644
--- a/pytorch/pytorch_lightning_ddp.py
+++ b/pytorch/pytorch_lightning_ddp.py
@@ -78,6 +78,9 @@ class LightningNet(pl.LightningModule):
         self.log("val_acc", accuracy, sync_dist=True)
         self.log("hp_metric", accuracy, on_step=False, on_epoch=True, sync_dist=True)
 
+    def validation_epoch_end(self, output) -> None:
+        return
+
     def configure_optimizers(self) -> optim.Optimizer:
         return optim.Adam(self.model.parameters())
@@ -124,19 +127,22 @@ def objective(trial: optuna.trial.Trial) -> float:
     model = LightningNet(dropout, output_dims)
     datamodule = FashionMNISTDataModule(data_dir=DIR, batch_size=BATCHSIZE)
 
+    callback = PyTorchLightningPruningCallback(trial, monitor="val_acc")
     trainer = pl.Trainer(
         logger=True,
         limit_val_batches=PERCENT_VALID_EXAMPLES,
         enable_checkpointing=False,
         max_epochs=EPOCHS,
         gpus=-1 if torch.cuda.is_available() else None,
-        accelerator="ddp_cpu" if not torch.cuda.is_available() else None,
+        accelerator="cpu" if not torch.cuda.is_available() else None,
+        strategy="ddp_spawn",
         num_processes=os.cpu_count() if not torch.cuda.is_available() else None,
-        callbacks=[PyTorchLightningPruningCallback(trial, monitor="val_acc")],
+        callbacks=[callback],
     )
     hyperparameters = dict(n_layers=n_layers, dropout=dropout, output_dims=output_dims)
     trainer.logger.log_hyperparams(hyperparameters)
     trainer.fit(model, datamodule=datamodule)
+    callback.check_pruned()
     return trainer.callback_metrics["val_acc"].item()
```
toshihikoyanase
left a comment
We have some follow-up tasks as described in TODO comments, but I think we can work on them in a new PR.
LGTM. Thank you!
HideakiImamura
left a comment
There was a problem hiding this comment.
Thanks for the PR, and sorry for the late reply. I checked the overall code and it basically looks good to me. Could you check my comments?
```python
self._trial.storage.set_trial_system_attr(self._trial._trial_id, _PRUNED_KEY, True)
self._trial.storage.set_trial_system_attr(self._trial._trial_id, _EPOCH_KEY, epoch)
```

```python
def check_pruned(self) -> None:
```
This function is intended to be called manually by users after Trainer.fit. How about adding concrete instructions to the documentation about where this function should be called?
I added some explanation and example code to the docstrings.
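The calling pattern discussed above (call `check_pruned()` by hand right after `Trainer.fit` returns) can be sketched with minimal stand-in classes. `FakeTrainer`, `FakePruningCallback`, and `TrialPruned` below are illustrative stubs, not the real PyTorch Lightning or Optuna classes; they only model the control flow.

```python
# Sketch (assumption): under DDP spawn, an exception raised inside a worker
# process does not propagate to the parent, so the callback records the
# decision and check_pruned() re-raises it in the parent after fit().


class TrialPruned(Exception):
    """Stand-in for optuna.TrialPruned."""


class FakePruningCallback:
    def __init__(self):
        self._pruned = False

    def on_validation_end(self):
        # In a spawned worker, raising here would be lost; only record.
        self._pruned = True

    def check_pruned(self):
        # Called in the parent process after fit(); re-raise the decision.
        if self._pruned:
            raise TrialPruned()


class FakeTrainer:
    def __init__(self, callbacks):
        self.callbacks = callbacks

    def fit(self):
        for cb in self.callbacks:
            cb.on_validation_end()


def objective():
    callback = FakePruningCallback()
    trainer = FakeTrainer(callbacks=[callback])
    trainer.fit()
    # Must come after fit() and before returning the metric, so that a
    # pruning decision made in a worker actually prunes the trial.
    callback.check_pruned()
    return 0.0
```

The key design point documented in the PR: keeping the callback in a local variable (rather than constructing it inline in `callbacks=[...]`) is what makes the post-fit `check_pruned()` call possible.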
HideakiImamura
left a comment
Thanks for the update. LGTM.
Motivation

Follow-up to #4322.

DDP is temporarily not supported in `PyTorchLightningPruningCallback` because of the problem described in #4322. This PR makes `PyTorchLightningPruningCallback` support DDP again.

Description of the changes

This PR
- requires users to call `callback.check_pruned()` in objective functions when they use DDP
- adds `test_pytorch_lightning_pruning_callback_ddp_monitor` and `test_pytorch_lightning_pruning_callback_ddp_unsupported_storage`