Use multiple processes when extracting ImageNet training archive #2023

@pmeier

🚀 Feature

Use multiple processes when extracting ImageNet training archive.

Motivation

I recently extracted the ImageNet training archive with the torchvision code and was surprised by how long it took. I realised that after extracting the main archive, we extract the subarchives only one after another:

for archive in archives:
    extract_archive(archive, os.path.splitext(archive)[0], remove_finished=True)

Pitch

I think we can speed that up significantly by using multiple processes to do this simultaneously. IMO doing this would have no drawbacks.
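A minimal sketch of what this could look like, not the torchvision implementation. It assumes the subarchives are plain .tar files and uses the standard library's tarfile in place of extract_archive so that it is self-contained. The names extract_one and extract_all are hypothetical.

```python
import os
import tarfile
from concurrent.futures import ProcessPoolExecutor


def extract_one(archive):
    """Extract one .tar subarchive into a sibling directory, then delete it.

    Hypothetical stand-in for
    extract_archive(archive, os.path.splitext(archive)[0], remove_finished=True).
    """
    target = os.path.splitext(archive)[0]
    os.makedirs(target, exist_ok=True)
    with tarfile.open(archive) as tar:
        tar.extractall(target)
    os.remove(archive)
    return target


def extract_all(archives, max_workers=None):
    """Extract the subarchives in parallel worker processes.

    max_workers=None lets ProcessPoolExecutor pick a default based on the
    number of available CPUs.
    """
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order and re-raises any worker exception here.
        return list(pool.map(extract_one, archives))
```

On platforms that start worker processes with spawn, the call to extract_all needs to sit behind an `if __name__ == "__main__":` guard. How much this helps in practice likely depends on whether extraction is limited by CPU or by disk throughput.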

Additional context

If we want this feature, I could take it up, albeit with a low priority.

cc @pmeier
