Fix tests for S3 #984

@ejguan

Description

🐛 Describe the bug

The S3 test is broken because the datasets in the public bucket, which we have no control over, have been updated.
Test case:
https://github.com/pytorch/data/blob/807db8f8c7282b2f48b48b1e07439c119a2ba12f/test/test_remote_io.py#L256-L291

Previously, we simply fixed the test by updating the expected number of files per bucket whenever the dataset changed. That is not a long-term solution for maintaining CI. To fix it properly, we could choose from the following solutions:

  • Only validate that certain known files exist in the output, rather than checking the total file count for each bucket
  • Use a mock to simulate the result
  • Add our own stable bucket for testing

I prefer the first solution for two reasons:

  • We want to test the functionality provided by _torchdata.so. Even though mocking the result of this extension would guarantee the test stays green, it would not actually exercise the extension.
  • The third option might work, but it also means our own bucket would be exposed on GitHub, which is not ideal IMHO.
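A minimal sketch of the first option: instead of asserting an exact file count, assert that a small set of known-stable keys appears in the listing. The helper name and the S3 URLs below are hypothetical, not the actual keys used by the test.

```python
def assert_contains_known_keys(listed_urls, expected_keys):
    """Pass as long as every known-stable key appears in the listing,
    regardless of how many other files the bucket now holds."""
    listed = set(listed_urls)
    missing = [key for key in expected_keys if key not in listed]
    assert not missing, f"Expected keys missing from S3 listing: {missing}"


# Hypothetical listing returned by the S3 DataPipe under test.
urls = [
    "s3://some-public-bucket/data/part-000.csv",
    "s3://some-public-bucket/data/part-001.csv",
    "s3://some-public-bucket/data/newly-added.csv",  # new upload; should not break the test
]

# The test only pins the files we rely on, so dataset additions stay green.
assert_contains_known_keys(urls, ["s3://some-public-bucket/data/part-000.csv"])
```

This keeps the real `_torchdata.so` code path under test while making the assertion robust to files being added to or removed around the pinned set.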

Versions

main
