Potential race conditions between multiple workers trying to download and cache the same file in torch.hub.load_state_dict_from_url and torch.hub.download_url_to_file <- duplicate dataset/model downloads across DDP workers #68320

@vadimkantorov

Description

if not os.path.exists(cached_file):

There should be a recommendation on how to download and cache files and checkpoints correctly between multiple workers, or at least a warning about these problems.

The same might happen if all workers load torchvision pretrained models, which presumably go through this same caching path?
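
One possible way to avoid the check-then-download race is sketched below. It is not what `torch.hub` does; it is a stdlib-only illustration that assumes all workers share one filesystem on which exclusive file creation and `os.replace` behave atomically (this may not hold on some network filesystems). The names `lock_file`, `cached_download` and the `fetch` callback are hypothetical helpers introduced for this example.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def lock_file(lock_path, poll_interval=0.1, timeout=600.0):
    """Hypothetical helper: cross-process lock via exclusive file creation."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the file already exists, so only one
            # worker at a time can create the lock file.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not acquire {lock_path}")
            time.sleep(poll_interval)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)


def cached_download(cached_file, fetch):
    """Return cached_file, calling fetch(tmp_path) only if no worker has cached it yet."""
    if os.path.exists(cached_file):
        return cached_file
    cache_dir = os.path.dirname(os.path.abspath(cached_file))
    os.makedirs(cache_dir, exist_ok=True)
    with lock_file(cached_file + ".lock"):
        # Re-check under the lock: another worker may have finished meanwhile.
        if not os.path.exists(cached_file):
            fd, tmp_path = tempfile.mkstemp(dir=cache_dir)
            os.close(fd)
            try:
                fetch(tmp_path)
                # Publish the complete file in one step, so readers never
                # observe a partially written checkpoint.
                os.replace(tmp_path, cached_file)
            finally:
                if os.path.exists(tmp_path):
                    os.remove(tmp_path)
    return cached_file
```

With PyTorch installed, `fetch` could be something like `lambda tmp: torch.hub.download_url_to_file(url, tmp)`. A worker that crashes while holding the lock leaves a stale `.lock` file behind, so a real implementation would need some cleanup policy. In a DDP job, another common workaround is to let rank 0 perform the download while the other ranks wait on `torch.distributed.barrier()` and then read from the cache.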

cc @ezyang @gchanan @zou3519 @nairbv @NicolasHug @vmoens @jdsgomes

Labels: actionable, high priority, module: hub, triaged
