
[Fix] Avoid infinite GPU waiting in dist training#6501

Merged
ZwwWayne merged 5 commits into open-mmlab:dev-v2.19.0 from fingertap:master
Nov 24, 2021

Conversation

@fingertap
Contributor

See #6495 for details. This PR adds an assertion before reducing the log_vars to ensure every GPU has the same number of log_vars, so that a mismatch fails fast instead of causing the collective reduce to wait forever.
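The hang arises because `all_reduce` is a collective call: if one rank loops over fewer `log_vars` entries than the others, the remaining ranks block indefinitely waiting for it. The sum-based consistency check can be sketched in plain Python (no real process group; `check_log_vars_consistent` and its inputs are illustrative, not the PR's actual code — in the real fix the sum would come from a `torch.distributed.all_reduce` over each rank's `len(log_vars)`):

```python
def check_log_vars_consistent(lengths_per_rank):
    """Simulate the cross-GPU length check.

    Each rank contributes len(log_vars); all_reduce(SUM) gives every
    rank the total. A rank's log_vars are consistent only if
    total == own_length * world_size, i.e. all ranks reported the
    same length. Returns the per-rank check result.
    """
    world_size = len(lengths_per_rank)
    total = sum(lengths_per_rank)  # what all_reduce with SUM would yield
    return [total == n * world_size for n in lengths_per_rank]


# All ranks log the same number of variables: check passes everywhere.
print(check_log_vars_consistent([4, 4, 4]))   # [True, True, True]

# One rank produced an extra/missing loss term: every rank detects it
# and can raise an assertion instead of hanging in the reduce.
print(check_log_vars_consistent([4, 3, 4]))   # [False, False, False]
```

Because the check itself uses a single collective on a scalar, every rank participates symmetrically, so the assertion can fire on all ranks with a clear error message rather than deadlocking.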


Labels

bug (Something isn't working)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Distributed training will hang if log_vars has different length among GPUs

6 participants