
Disable tests that use DataLoader with multiple workers for Windows#5322

Merged
soumith merged 1 commit into pytorch:master from yf225:num_workers
Feb 21, 2018

Conversation

Contributor

@yf225 yf225 commented Feb 21, 2018

It seems that DataLoader with multiple workers has been causing CUDA out-of-memory errors in the Windows CI tests (such as test_batch_sampler, test_multi_keep, and test_multi_drop). @ssnl is looking into this issue.

Added to #4092.
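A minimal sketch of how such tests can be gated on the platform. This is illustrative, not the actual diff in this PR: the decorator name and the test body are assumptions, and the real tests exercise DataLoader with num_workers > 0.

```python
import sys
import unittest

# Skip multi-worker DataLoader tests on Windows, where they were
# hitting CUDA out-of-memory errors in CI. The condition checks the
# platform at collection time, so the test still runs elsewhere.
skip_if_windows = unittest.skipIf(
    sys.platform == "win32",
    "DataLoader with num_workers > 0 hits CUDA OOM on Windows CI",
)

class TestDataLoaderWorkers(unittest.TestCase):
    @skip_if_windows
    def test_multi_keep(self):
        # Would construct a DataLoader with num_workers=4 and iterate;
        # elided here to keep the sketch self-contained.
        self.assertTrue(True)
```

On Windows the test is reported as skipped rather than failed, which keeps CI green while the underlying OOM issue is investigated.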

Collaborator

@ssnl ssnl left a comment


LGTM, thanks!

@soumith soumith merged commit 0340e46 into pytorch:master Feb 21, 2018
@yf225 yf225 deleted the num_workers branch February 22, 2018 22:27
AlexanderRadionov added a commit to AlexanderRadionov/pytorch that referenced this pull request Mar 6, 2018
ezyang pushed a commit that referenced this pull request Mar 23, 2018
Added an ind_worker_queue parameter to data.DataLoader. It makes preprocessing deterministic.

DataLoader in multiprocessing mode may cause non-deterministic results. Even if random_seed is frozen, each subprocess may receive tasks in an unstable order, because I/O times vary while data loads. If you use augmentation during data loading, this makes results unreproducible. See https://discuss.pytorch.org/t/deterministic-non-deterministic-results-with-pytorch/9087

To fix this issue I have added an individual queue for each worker, so each worker receives its tasks in a stable order. As a result, the subprocesses produce stable results.

To reproduce the issue, change ind_worker_queue to False and run the script several times.
Code to reproduce the issue is in the corresponding PR.

* TestIndividualWorkerQueue added to DataLoader tests

* Review fixes

* "Simplify" code by removing itertools

* Rebase conflicts fix

* Review fixes

* Fixed shutdown behavior

* Removed ind_worker_queue flag.

* Rebase on master

* Disable tests that use DataLoader with multiple workers (#5322)
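The per-worker-queue idea described above can be sketched in a few lines. This is a toy model, not the DataLoader internals: the function name and the use of threads instead of subprocesses are assumptions, and it only demonstrates the stable batch-to-worker mapping, not augmentation.

```python
import queue
import threading

def deterministic_fetch(dataset, batches, num_workers=2):
    """Assign batch i to worker i % num_workers via an individual
    queue per worker, so each worker sees its tasks in a fixed order
    regardless of how long individual loads take."""
    in_queues = [queue.Queue() for _ in range(num_workers)]
    results = {}
    lock = threading.Lock()

    def worker(wid):
        while True:
            task = in_queues[wid].get()
            if task is None:  # sentinel: no more work for this worker
                return
            idx, indices = task
            loaded = [dataset[i] for i in indices]
            with lock:
                results[idx] = loaded

    threads = [threading.Thread(target=worker, args=(w,))
               for w in range(num_workers)]
    for t in threads:
        t.start()
    # Round-robin assignment: the batch -> worker mapping never
    # depends on which worker happens to finish first.
    for i, batch in enumerate(batches):
        in_queues[i % num_workers].put((i, batch))
    for q in in_queues:
        q.put(None)
    for t in threads:
        t.join()
    return [results[i] for i in range(len(batches))]
```

With a single shared queue, a slow load lets a faster worker steal the next task, so the worker that processes a given batch (and hence that worker's RNG state during augmentation) varies between runs; the per-worker queues pin that mapping down.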
yf225 pushed a commit to yf225/pytorch that referenced this pull request Mar 29, 2018
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026

3 participants