Add a masks option to filter files in s3 datapipe #880

Closed
sebathomas wants to merge 1 commit into meta-pytorch:main from sebathomas:s3io_file_masks

Conversation

@sebathomas
Contributor

Add a new option to the constructor of S3FileListerIterDataPipe that filters the list of files by pattern, using the existing filter function match_masks.
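For readers unfamiliar with mask filtering, here is a rough, standalone sketch of the behavior described above. It is not the code in this PR: `matches_any_mask` and `filter_s3_urls` are hypothetical names, and the sketch assumes the masks are shell-style wildcards handled by Python's standard `fnmatch` module, with an empty mask matching everything.

```python
from fnmatch import fnmatchcase
from typing import Iterable, Iterator, List, Union


def matches_any_mask(name: str, masks: Union[str, List[str]]) -> bool:
    """Hypothetical stand-in for a mask check: True if `name` matches any mask."""
    # Assumption: an empty mask ("" or []) means "no filtering".
    if not masks:
        return True
    if isinstance(masks, str):
        masks = [masks]
    # fnmatchcase is used here so results do not depend on the host OS;
    # note that "*" also matches "/" in fnmatch-style patterns.
    return any(fnmatchcase(name, mask) for mask in masks)


def filter_s3_urls(urls: Iterable[str], masks: Union[str, List[str]] = "") -> Iterator[str]:
    """Hypothetical helper: yield only the URLs that match the given masks."""
    for url in urls:
        if matches_any_mask(url, masks):
            yield url


# Illustrative URLs only; no real bucket is contacted.
urls = [
    "s3://bucket/a.csv",
    "s3://bucket/b.txt",
    "s3://bucket/sub/c.csv",
]
csv_only = list(filter_s3_urls(urls, "*.csv"))
```

In this sketch the filter is applied to the full URL string, so a mask such as `*.csv` also matches files under nested prefixes.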

I added a unit test for the s3 datapipe and I tested it on my machine with a real S3 bucket.

Fixes #737.

@facebook-github-bot added the CLA Signed label Nov 4, 2022
@sebathomas marked this pull request as ready for review November 4, 2022 10:37
Contributor

@NivekT left a comment

The implementation looks fine. Do we want to add an extra argument to this DataPipe?

@ejguan WDYT?

Contributor

@ejguan left a comment

LGTM, thank you

@facebook-github-bot
Contributor

@NivekT has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@NivekT
Contributor

NivekT commented Nov 4, 2022

We will need to update some things to make the test compatible internally. We will keep you posted.

Development

Successfully merging this pull request may close these issues.

Support file mask on list_files_by_s3 like list_files