🚀 Feature
Use multiple processes when extracting the ImageNet training archive.
Motivation
I recently extracted the ImageNet training archive with the code of torchvision and was surprised how long it took. I realised that after extracting the main archive, we only extract the subarchives one after another:
vision/torchvision/datasets/imagenet.py
Lines 183 to 184 in 3c254fb
    for archive in archives:
        extract_archive(archive, os.path.splitext(archive)[0], remove_finished=True)
Pitch
I think we can speed that up significantly by extracting the subarchives simultaneously in multiple processes. IMO doing this would have no drawbacks.
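A minimal sketch of what this could look like, not the actual torchvision implementation. It assumes the subarchives are plain `.tar` files and uses the standard library's `tarfile` in place of torchvision's `extract_archive`. The names `extract_one`, `extract_all` and `max_workers` are hypothetical and only serve the illustration.

```python
import os
import tarfile
from concurrent.futures import ProcessPoolExecutor


def extract_one(archive):
    """Extract one subarchive into a directory named after it, then delete it.

    Mirrors extract_archive(archive, os.path.splitext(archive)[0],
    remove_finished=True) from the loop above, but with plain tarfile.
    """
    target = os.path.splitext(archive)[0]
    os.makedirs(target, exist_ok=True)
    with tarfile.open(archive) as tar:
        tar.extractall(target)
    os.remove(archive)
    return target


def extract_all(archives, max_workers=None):
    """Extract all subarchives, using one worker process per task slot.

    max_workers is a hypothetical parameter: None lets the executor pick a
    default, and 1 falls back to the current sequential behaviour.
    """
    if max_workers == 1:
        return [extract_one(archive) for archive in archives]
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order and re-raises worker exceptions here.
        return list(pool.map(extract_one, archives))
```

On platforms that start workers by spawning a fresh interpreter, `extract_all` would need to be called from under an `if __name__ == "__main__":` guard. How much this helps in practice likely depends on disk throughput as much as on the number of cores.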
Additional context
If we want this feature, I could take it up, albeit with a low priority.
cc @pmeier