Skip to content

Fix test_remote_io.py due to mutating public s3 bucket#997

Closed
ShiboXing wants to merge 9 commits intometa-pytorch:mainfrom
ShiboXing:remote-test-fix
Closed

Fix test_remote_io.py due to mutating public s3 bucket#997
ShiboXing wants to merge 9 commits intometa-pytorch:mainfrom
ShiboXing:remote-test-fix

Conversation

@ShiboXing
Copy link

@ShiboXing ShiboXing commented Feb 8, 2023

Please read through our contribution guide prior to
creating your pull request.

  • Note that there is a section on requirements related to adding a new DataPipe.

Fixes #984

Changes

  • Add a private function to TestDataPipeRemoteIO in test_remote_io.py to get s3 objects count label through aws cli
  • add awscli in requirements

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Feb 8, 2023
@ShiboXing ShiboXing mentioned this pull request Feb 8, 2023
Copy link
Contributor

@ejguan ejguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind removing this test?
https://github.com/pytorch/data/blob/e14927f32a6cf99658f72b765425bf032116f724/test/test_remote_io.py#L337-L343

It takes too much time as it would download the whole archives. We have used the test to validated S3FileLoader for an image.

Copy link
Contributor

@ejguan ejguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, you don't need to rely on CI to execute those tests. You can run the test on your local env by
pytest -v test/test_remote_io.py -k test_fsspec_io_iterdatapipe
pytest -v test/test_remote_io.py -k test_s3_io_iterdatapipe

@ShiboXing
Copy link
Author

Hi @ejguan, can you start the checks again? Added the recursive option.
Tests passed locally.

@ShiboXing
Copy link
Author

Hi @NivekT, @ejguan, have you seen a version conflict similar to the one we are getting in 3.11 conda? I wonder if you have any quick fix

@ejguan
Copy link
Contributor

ejguan commented Feb 8, 2023

Hi @NivekT, @ejguan, have you seen a version conflict similar to the one we are getting in 3.11 conda? I wonder if you have any quick fix

You can ignore it. Not related to your changes

facebook-github-bot pushed a commit that referenced this pull request Feb 9, 2023
…reset request in the dispatching process (#994)

Summary:
### Changes

- ~Fix S3 Tests~ in #997
- Ignore AttributeError for ReadingService when `gc` gets involved
- Add a reset_iterator_counter to dispatching process to guarantee that the datapipe is reset when all loops have received the request. Otherwise, datapipe can be reset in the middle of iteration of the other loops.
- Remove reference of the `thread` from `Prefetcher`. This would prevent racing condition when both `finally` in generator and `reset` function are accessing the same `thread`.

Pull Request resolved: #994

Reviewed By: NivekT

Differential Revision: D43094081

Pulled By: ejguan

fbshipit-source-id: 3065bc6a994cc0ff5047d9b71970abc4c8d3e369
@ejguan
Copy link
Contributor

ejguan commented Feb 9, 2023

@ShiboXing Can you please do a rebase onto main branch and force push? After that, I think this PR is ready.

@ShiboXing
Copy link
Author

@ejguan, rebased and force-pushed!

@facebook-github-bot
Copy link
Contributor

@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@ejguan ejguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your effort to make TochData' CI green again

@facebook-github-bot
Copy link
Contributor

@NivekT merged this pull request in c109c3a.

ShiboXing pushed a commit to ShiboXing/data that referenced this pull request Feb 9, 2023
)

Summary:
Please read through our [contribution guide](https://github.com/pytorch/data/blob/main/CONTRIBUTING.md) prior to
creating your pull request.

- Note that there is a section on requirements related to adding a new DataPipe.

Fixes meta-pytorch#984
- Add a private function to TestDataPipeRemoteIO in test_remote_io.py to get s3 objects count label through aws cli
- add awscli in requirements

Pull Request resolved: meta-pytorch#997

Reviewed By: ejguan

Differential Revision: D43157757

Pulled By: NivekT

fbshipit-source-id: 7e9ee8299a28a087f88024c3b3e77be3bfe5adf0
ShiboXing pushed a commit to ShiboXing/data that referenced this pull request Feb 9, 2023
)

Summary:
Please read through our [contribution guide](https://github.com/pytorch/data/blob/main/CONTRIBUTING.md) prior to
creating your pull request.

- Note that there is a section on requirements related to adding a new DataPipe.

Fixes meta-pytorch#984
- Add a private function to TestDataPipeRemoteIO in test_remote_io.py to get s3 objects count label through aws cli
- add awscli in requirements

Pull Request resolved: meta-pytorch#997

Reviewed By: ejguan

Differential Revision: D43157757

Pulled By: NivekT

fbshipit-source-id: 7e9ee8299a28a087f88024c3b3e77be3bfe5adf0
ShiboXing pushed a commit to ShiboXing/data that referenced this pull request Feb 9, 2023
)

Summary:
Please read through our [contribution guide](https://github.com/pytorch/data/blob/main/CONTRIBUTING.md) prior to
creating your pull request.

- Note that there is a section on requirements related to adding a new DataPipe.

Fixes meta-pytorch#984
- Add a private function to TestDataPipeRemoteIO in test_remote_io.py to get s3 objects count label through aws cli
- add awscli in requirements

Pull Request resolved: meta-pytorch#997

Reviewed By: ejguan

Differential Revision: D43157757

Pulled By: NivekT

fbshipit-source-id: 7e9ee8299a28a087f88024c3b3e77be3bfe5adf0
kace added a commit to kace/data that referenced this pull request Feb 9, 2023
Fix test_remote_io.py due to mutating public s3 bucket (meta-pytorch#997)
ShiboXing pushed a commit to ShiboXing/data that referenced this pull request Feb 10, 2023
)

Summary:
Please read through our [contribution guide](https://github.com/pytorch/data/blob/main/CONTRIBUTING.md) prior to
creating your pull request.

- Note that there is a section on requirements related to adding a new DataPipe.

Fixes meta-pytorch#984
- Add a private function to TestDataPipeRemoteIO in test_remote_io.py to get s3 objects count label through aws cli
- add awscli in requirements

Pull Request resolved: meta-pytorch#997

Reviewed By: ejguan

Differential Revision: D43157757

Pulled By: NivekT

fbshipit-source-id: 7e9ee8299a28a087f88024c3b3e77be3bfe5adf0
@ShiboXing ShiboXing deleted the remote-test-fix branch February 21, 2023 04:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. Merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix tests for S3

4 participants