[tune](deps): Bump pytorch-lightning from 1.0.3 to 1.3.1 in /python/requirements #13

Closed
dependabot[bot] wants to merge 1 commit into master from dependabot/pip/python/requirements/pytorch-lightning-1.3.1

Conversation

dependabot bot commented on behalf of github May 11, 2021

Bumps pytorch-lightning from 1.0.3 to 1.3.1.

Release notes

Sourced from pytorch-lightning's releases.

Standard weekly patch release

[1.3.1] - 2021-05-11

Fixed

  • Fixed DeepSpeed with IterableDatasets (#7362)
  • Fixed Trainer.current_epoch not getting restored after tuning (#7434)
  • Fixed local rank displayed in console log (#7395)

Contributors

@akihironitta @awaelchli @leezu

If we forgot someone due to not matching commit email with GitHub account, let us know :]

Lightning CLI, PyTorch Profiler, Improved Early Stopping

[1.3.0] - 2021-05-06

Added

  • Added support for the EarlyStopping callback to run at the end of the training epoch (#6944)
  • Added synchronization points before and after setup hooks are run (#7202)
  • Added a teardown hook to ClusterEnvironment (#6942)
  • Added utils for metrics to scalar conversions (#7180)
  • Added utils for NaN/Inf detection for gradients and parameters (#6834)
  • Added more explicit exception message when trying to execute trainer.test() or trainer.validate() with fast_dev_run=True (#6667)
  • Added LightningCLI class to provide simple reproducibility with minimum boilerplate training CLI (#4492, #6862, #7156, #7299)
  • Added gradient_clip_algorithm argument to Trainer for gradient clipping by value (#6123).
  • Added a way to print to terminal without breaking up the progress bar (#5470)
  • Added support to checkpoint after training steps in ModelCheckpoint callback (#6146)
  • Added TrainerStatus.{INITIALIZING,RUNNING,FINISHED,INTERRUPTED} (#7173)
  • Added Trainer.validate() method to perform one evaluation epoch over the validation set (#4948)
  • Added LightningEnvironment for Lightning-specific DDP (#5915)
  • Added teardown() hook to LightningDataModule (#4673)
  • Added auto_insert_metric_name parameter to ModelCheckpoint (#6277)
  • Added arg to self.log that enables users to give custom names when dealing with multiple dataloaders (#6274)
  • Added teardown method to BaseProfiler to enable subclasses defining post-profiling steps outside of __del__ (#6370)
  • Added setup method to BaseProfiler to enable subclasses defining pre-profiling steps for every process (#6633)
  • Added no return warning to predict (#6139)
  • Added Trainer.predict config validation (#6543)
  • Added AbstractProfiler interface (#6621)
  • Added support for including module names for forward in the autograd trace of PyTorchProfiler (#6349)
  • Added support for the PyTorch 1.8.1 autograd profiler (#6618)
  • Added outputs parameter to callback's on_validation_epoch_end & on_test_epoch_end hooks (#6120)
  • Added configure_sharded_model hook (#6679)
  • Added support for precision=64, enabling training with double precision (#6595)
  • Added support for DDP communication hooks (#6736)
  • Added artifact_location argument to MLFlowLogger which will be passed to the MlflowClient.create_experiment call (#6677)
  • Added model parameter to precision plugins' clip_gradients signature (#6764, #7231)
  • Added is_last_batch attribute to Trainer (#6825)

... (truncated)

Changelog

Sourced from pytorch-lightning's changelog.

[1.3.1] - 2021-05-11

Fixed

  • Fixed DeepSpeed with IterableDatasets (#7362)
  • Fixed Trainer.current_epoch not getting restored after tuning (#7434)
  • Fixed local rank displayed in console log (#7395)

[1.3.0] - 2021-05-06

Added

  • Added support for the EarlyStopping callback to run at the end of the training epoch (#6944)
  • Added synchronization points before and after setup hooks are run (#7202)
  • Added a teardown hook to ClusterEnvironment (#6942)
  • Added utils for metrics to scalar conversions (#7180)
  • Added utils for NaN/Inf detection for gradients and parameters (#6834)
  • Added more explicit exception message when trying to execute trainer.test() or trainer.validate() with fast_dev_run=True (#6667)
  • Added LightningCLI class to provide simple reproducibility with minimum boilerplate training CLI (#4492, #6862, #7156, #7299)
  • Added gradient_clip_algorithm argument to Trainer for gradient clipping by value (#6123).
  • Added a way to print to terminal without breaking up the progress bar (#5470)
  • Added support to checkpoint after training steps in ModelCheckpoint callback (#6146)
  • Added TrainerStatus.{INITIALIZING,RUNNING,FINISHED,INTERRUPTED} (#7173)
  • Added Trainer.validate() method to perform one evaluation epoch over the validation set (#4948)
  • Added LightningEnvironment for Lightning-specific DDP (#5915)
  • Added teardown() hook to LightningDataModule (#4673)
  • Added auto_insert_metric_name parameter to ModelCheckpoint (#6277)
  • Added arg to self.log that enables users to give custom names when dealing with multiple dataloaders (#6274)
  • Added teardown method to BaseProfiler to enable subclasses defining post-profiling steps outside of __del__ (#6370)
  • Added setup method to BaseProfiler to enable subclasses defining pre-profiling steps for every process (#6633)
  • Added no return warning to predict (#6139)
  • Added Trainer.predict config validation (#6543)
  • Added AbstractProfiler interface (#6621)
  • Added support for including module names for forward in the autograd trace of PyTorchProfiler (#6349)
  • Added support for the PyTorch 1.8.1 autograd profiler (#6618)
  • Added outputs parameter to callback's on_validation_epoch_end & on_test_epoch_end hooks (#6120)
  • Added configure_sharded_model hook (#6679)
  • Added support for precision=64, enabling training with double precision (#6595)
  • Added support for DDP communication hooks (#6736)
  • Added artifact_location argument to MLFlowLogger which will be passed to the MlflowClient.create_experiment call (#6677)
  • Added model parameter to precision plugins' clip_gradients signature (#6764, #7231)
  • Added is_last_batch attribute to Trainer (#6825)
  • Added LightningModule.lr_schedulers() for manual optimization (#6567)

... (truncated)

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

dependabot bot added the dependencies label (Pull requests that update a dependency file) May 11, 2021
dependabot bot commented on behalf of github May 22, 2021

Superseded by #17.

dependabot bot closed this May 22, 2021
dependabot bot deleted the dependabot/pip/python/requirements/pytorch-lightning-1.3.1 branch May 22, 2021 07:02
architkulkarni pushed a commit that referenced this pull request Jul 27, 2022
We encountered a SIGSEGV when running the Python test `python/ray/tests/test_failure_2.py::test_list_named_actors_timeout`. The stack trace is:

```
#0  0x00007fffed30f393 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&) ()
   from /lib64/libstdc++.so.6
#1  0x00007fffee707649 in ray::RayLog::GetLoggerName() () from /home/admin/dev/Arc/merge/ray/python/ray/_raylet.so
#2  0x00007fffee70aa90 in ray::SpdLogMessage::Flush() () from /home/admin/dev/Arc/merge/ray/python/ray/_raylet.so
#3  0x00007fffee70af28 in ray::RayLog::~RayLog() () from /home/admin/dev/Arc/merge/ray/python/ray/_raylet.so
#4  0x00007fffee2b570d in ray::asio::testing::(anonymous namespace)::DelayManager::Init() [clone .constprop.0] ()
   from /home/admin/dev/Arc/merge/ray/python/ray/_raylet.so
#5  0x00007fffedd0d95a in _GLOBAL__sub_I_asio_chaos.cc () from /home/admin/dev/Arc/merge/ray/python/ray/_raylet.so
#6  0x00007ffff7fe282a in call_init.part () from /lib64/ld-linux-x86-64.so.2
#7  0x00007ffff7fe2931 in _dl_init () from /lib64/ld-linux-x86-64.so.2
#8  0x00007ffff7fe674c in dl_open_worker () from /lib64/ld-linux-x86-64.so.2
#9  0x00007ffff7b82e79 in _dl_catch_exception () from /lib64/libc.so.6
#10 0x00007ffff7fe5ffe in _dl_open () from /lib64/ld-linux-x86-64.so.2
#11 0x00007ffff7d5f39c in dlopen_doit () from /lib64/libdl.so.2
#12 0x00007ffff7b82e79 in _dl_catch_exception () from /lib64/libc.so.6
#13 0x00007ffff7b82f13 in _dl_catch_error () from /lib64/libc.so.6
#14 0x00007ffff7d5fb09 in _dlerror_run () from /lib64/libdl.so.2
#15 0x00007ffff7d5f42a in dlopen@@GLIBC_2.2.5 () from /lib64/libdl.so.2
#16 0x00007fffef04d330 in py_dl_open (self=<optimized out>, args=<optimized out>)
    at /tmp/python-build.20220507135524.257789/Python-3.7.11/Modules/_ctypes/callproc.c:1369
```

The root cause is that when `_raylet.so` is loaded, `static DelayManager _delay_manager` is initialized and `RAY_LOG(ERROR) << "RAY_testing_asio_delay_us is set to " << delay_env;` is executed. However, the static variables defined in `logging.cc` have not been initialized yet at that point (in this case, `std::string RayLog::logger_name_ = "ray_log_sink"`).

It's better not to rely on the initialization order of static variables in different compilation units, because that order is not guaranteed. I propose changing all `RAY_LOG` calls in `DelayManager::Init()` to `std::cerr`.
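
A minimal sketch of the proposed change, not the actual Ray code: the class name `DelayManagerSketch`, the environment variable `SKETCH_TESTING_DELAY_US`, and the parsing logic are all assumptions made for illustration. It shows an object with static storage duration whose initializer reports through `std::cerr` only. Including `<iostream>` guarantees that the standard streams are initialized before static objects defined later in the same translation unit, so the message does not depend on statics owned by another file.

```cpp
#include <cstdlib>
#include <exception>
#include <iostream>
#include <string>

// Hypothetical stand-in for a delay manager configured from an environment
// variable. The real DelayManager is not shown in the commit message.
class DelayManagerSketch {
 public:
  // Runs during static initialization when used for a global object.
  DelayManagerSketch() { Init(std::getenv("SKETCH_TESTING_DELAY_US")); }

  // Parses the delay and reports through std::cerr only, so no logger
  // state from another translation unit is touched.
  void Init(const char *delay_env) {
    delay_us_ = 0;
    if (delay_env == nullptr) {
      return;
    }
    std::cerr << "SKETCH_TESTING_DELAY_US is set to " << delay_env << std::endl;
    try {
      delay_us_ = std::stoll(delay_env);
    } catch (const std::exception &e) {
      // std::stoll throws on non-numeric or out-of-range input.
      std::cerr << "Ignoring invalid delay value: " << e.what() << std::endl;
    }
  }

  long long delay_us() const { return delay_us_; }

 private:
  long long delay_us_ = 0;
};

// Defined after the <iostream> include, so std::cerr is usable in Init().
static DelayManagerSketch g_delay_manager_sketch;
```

The trade-off in this sketch is that the message bypasses whatever formatting and sinks the logging library would normally apply.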

The crash happens in Ant's internal codebase. Not sure why this test case passes in the community version though.

BTW, I've tried different approaches:

1. Using a static local variable in `get_delay_us` and removing the global variable. This doesn't work because `init()` needs to access the variable as well.
2. Defining the global variable as a `std::unique_ptr<DelayManager>` and initializing it in `get_delay_us`. This works, but it requires a lock to be thread-safe.
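
The second approach could look roughly like the sketch below. This is an illustration under stated assumptions, not the code that was tried: `LazyDelayManager`, `init_delay_sketch`, and `get_delay_us_sketch` are hypothetical names, and the default delay of 0 is made up. Both `std::mutex` and an empty `std::unique_ptr` have `constexpr` constructors, so the two globals are constant-initialized and do not take part in dynamic initialization ordering. The cost is the lock taken on every access.

```cpp
#include <cstdint>
#include <memory>
#include <mutex>

// Hypothetical stand-in for the real DelayManager.
class LazyDelayManager {
 public:
  explicit LazyDelayManager(std::int64_t delay_us) : delay_us_(delay_us) {}
  std::int64_t delay_us() const { return delay_us_; }

 private:
  std::int64_t delay_us_;
};

namespace {
// Constant-initialized globals: safe to use from any static initializer.
std::mutex g_manager_mutex;
std::unique_ptr<LazyDelayManager> g_manager;
}  // namespace

// Creates or replaces the manager with the given delay (assumed behaviour).
void init_delay_sketch(std::int64_t delay_us) {
  std::lock_guard<std::mutex> lock(g_manager_mutex);
  g_manager.reset(new LazyDelayManager(delay_us));
}

// Creates the manager on first use, then returns its delay.
std::int64_t get_delay_us_sketch() {
  std::lock_guard<std::mutex> lock(g_manager_mutex);
  if (!g_manager) {
    g_manager.reset(new LazyDelayManager(0));
  }
  return g_manager->delay_us();
}
```

Because the object is only constructed on the first call, nothing runs while the shared library is being loaded.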