This repository was archived by the owner on Nov 3, 2023. It is now read-only.

Fix ray_ddp_sharded_example #153

Merged
amogkam merged 1 commit into ray-project:main from chongxiaoc:ddp_sharded_example on May 18, 2022

chongxiaoc (Contributor) commented May 18, 2022

Fix `on_train_epoch_end()` hook usage.
According to the PTL callback interface definition, the hook does not take the additional `outputs` argument.

Otherwise, I see the error below on PTL==1.5.9:

ray.exceptions.RayTaskError(TypeError): ray::RayExecutor.execute() (pid=2128, ip=10.86.3.39, repr=<ray_lightning.ray_ddp.RayExecutor object at 0x7fc83d8a7350>)
  File "/usr/lib/python3.7/site-packages/ray_lightning/ray_ddp.py", line 63, in execute
    return fn(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/ray_lightning/ray_ddp_sharded.py", line 34, in execute_remote
    model=self._model, global_rank=global_rank, queue=queue)
  File "/usr/lib/python3.7/site-packages/ray_lightning/ray_ddp.py", line 472, in execute_remote
    results = self.lightning_module.trainer.run_stage()
  File "/usr/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1289, in run_stage
    return self._run_train()
  File "/usr/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1319, in _run_train
    self.fit_loop.run()
  File "/usr/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 145, in run
    self.advance(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/pytorch_lightning/loops/fit_loop.py", line 234, in advance
    self.epoch_loop.run(data_fetcher)
  File "/usr/lib/python3.7/site-packages/pytorch_lightning/loops/base.py", line 151, in run
    output = self.on_run_end()
  File "/usr/lib/python3.7/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 298, in on_run_end
    self.trainer.call_hook("on_train_epoch_end")
  File "/usr/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1495, in call_hook
    callback_fx(*args, **kwargs)
  File "/usr/lib/python3.7/site-packages/pytorch_lightning/trainer/callback_hook.py", line 93, in on_train_epoch_end
    callback.on_train_epoch_end(self, self.lightning_module)
TypeError: on_train_epoch_end() missing 1 required positional argument: 'outputs'
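
The mismatch behind this traceback can be reproduced without PTL or Ray. The sketch below is only an illustration: `OldStyleCallback`, `FixedCallback`, `call_epoch_end_hook`, and `try_hook` are hypothetical stand-ins, not names from ray_lightning or pytorch_lightning. It assumes only what the last traceback frame shows, namely that the hook is invoked with two arguments (the trainer and the Lightning module).

```python
class OldStyleCallback:
    """Hypothetical callback using the signature with the extra `outputs` arg."""

    def on_train_epoch_end(self, trainer, pl_module, outputs):
        return "old"


class FixedCallback:
    """Hypothetical callback whose hook takes only the trainer and the module."""

    def on_train_epoch_end(self, trainer, pl_module):
        return "fixed"


def call_epoch_end_hook(callback, trainer, pl_module):
    # Stand-in for the dispatch in the traceback, which calls
    # callback.on_train_epoch_end(self, self.lightning_module)
    # with exactly two positional arguments.
    return callback.on_train_epoch_end(trainer, pl_module)


def try_hook(callback):
    # Placeholder objects stand in for a real Trainer and LightningModule.
    try:
        return call_epoch_end_hook(callback, trainer=object(), pl_module=object())
    except TypeError as exc:
        return f"TypeError: {exc}"


if __name__ == "__main__":
    print(try_hook(OldStyleCallback()))  # reports the missing 'outputs' argument
    print(try_hook(FixedCallback()))
```

Dropping `outputs` from the callback's hook signature, as this PR does for the example, makes the two-argument call succeed.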

amogkam (Collaborator) left a comment

Thanks @chongxiaoc!

amogkam (Collaborator) commented May 18, 2022

Will merge after tests finish!

amogkam changed the title from "Example: ray_ddp_sharded_example.py" to "Fix ray_ddp_sharded_example" on May 18, 2022
amogkam merged commit a927dfa into ray-project:main on May 18, 2022