
[DataPipe] Adding LengthSetterIterDataPipe #747

Closed
NivekT wants to merge 3 commits into gh/NivekT/90/base from gh/NivekT/90/head


NivekT commented Aug 18, 2022

Stack from ghstack:

This DataPipe lets users manually set the length of an `IterDataPipe` with no other side effects. This is useful for DataPipes whose final length cannot be known in advance (e.g. `filter`). If you know the final length with certainty, you can set it manually for use by DataLoader or other DataPipes.

Previously, users could in theory call `.header(length)` to set the length manually. However, the UX is suboptimal in that 1) the name is not obvious, and 2) warnings are raised unless the HeaderDataPipe has been fully traversed once to confirm the length.
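
To illustrate the idea, here is a minimal, dependency-free sketch. `EvenFilterSketch` and `LengthSetterSketch` are hypothetical stand-ins written for this example, not the torchdata classes; they only mimic the relevant behavior: a filtering step cannot report its length, and a thin wrapper can supply one without changing iteration.

```python
from typing import Iterable, Iterator


class EvenFilterSketch:
    """Hypothetical stand-in for a filtering pipe: it defines no __len__,
    because the number of surviving items is unknown before iteration."""

    def __init__(self, source: Iterable[int]) -> None:
        self.source = source

    def __iter__(self) -> Iterator[int]:
        return (x for x in self.source if x % 2 == 0)


class LengthSetterSketch:
    """Hypothetical wrapper: reports a user-supplied length and otherwise
    passes items through unchanged."""

    def __init__(self, source: Iterable[int], length: int) -> None:
        self.source = source
        self.length = length

    def __iter__(self) -> Iterator[int]:
        yield from self.source

    def __len__(self) -> int:
        # The value is trusted as given; nothing checks it against the data.
        return self.length


filtered = EvenFilterSketch(range(10))
sized = LengthSetterSketch(filtered, length=5)  # the caller knows 5 items survive
```

In this sketch, `len(filtered)` raises `TypeError`, while `len(sized)` returns the supplied value and iterating `sized` yields the same items as `filtered`. The wrapper trusts the caller, so a wrong length is reported as-is.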

Differential Revision: D38989359

facebook-github-bot added the CLA Signed label Aug 18, 2022
NivekT changed the title from "[DataPipe] Adding LengthSetter" to "[DataPipe] Adding LengthSetterIterDataPipe" Aug 18, 2022
NivekT added the topic: new feature label Aug 18, 2022

NivekT commented Aug 19, 2022

Per @ejguan's request, I am adding this to core as an alternative to this PR; see pytorch/pytorch#83750.

I will close this one if we decide the other PR is better.

NivekT added a commit that referenced this pull request Aug 24, 2022
ghstack-source-id: 5116443
Pull Request resolved: #747

NivekT commented Aug 24, 2022

We will be landing this PR instead of pytorch/pytorch#83750.


NivekT commented Aug 24, 2022

@NivekT has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

NivekT requested a review from ejguan August 24, 2022 21:11

ejguan (Contributor) left a comment

One nit comment. Otherwise, LGTM. Thank you


    def __init__(self, source_datapipe: IterDataPipe[T_co], length: int) -> None:
        self.source_datapipe: IterDataPipe[T_co] = source_datapipe
        self.length: int = length

Could we add a validation that length >= 0?
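
One way the requested check could look is sketched below. This is an assumption about the shape of the fix, not the code that landed; `LengthCheckedSketch` is a hypothetical, dependency-free stand-in that uses plain iterables instead of `IterDataPipe`, and the error message is invented for illustration.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


class LengthCheckedSketch:
    """Hypothetical stand-in for the reviewed class, with the suggested check."""

    def __init__(self, source_datapipe: Iterable[T], length: int) -> None:
        # Reject negative lengths at construction time.
        if length < 0:
            raise ValueError(f"Expected length >= 0, but got {length}")
        self.source_datapipe = source_datapipe
        self.length = length

    def __iter__(self) -> Iterator[T]:
        yield from self.source_datapipe

    def __len__(self) -> int:
        return self.length
```

Without such a check, Python itself raises `ValueError` when `len()` is called on an object whose `__len__` returns a negative number, so validating in `__init__` mainly surfaces the mistake earlier, at the point where the bad value was passed.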

NivekT added a commit that referenced this pull request Aug 24, 2022
ghstack-source-id: a0bc04f
Pull Request resolved: #747

NivekT commented Aug 24, 2022

@NivekT has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot deleted the gh/NivekT/90/head branch August 29, 2022 14:19
