Conversation
ejguan
left a comment
There was a problem hiding this comment.
Overall, LGTM. Thank you for adding this DataPipe. Could you please fix the lint Error?
For formatting, you can follow the instruction here: https://github.com/pytorch/data/blob/main/CONTRIBUTING.md#code-style. And, run pre-commit run --all-files to execute auto formatting.
For mypy, you can execute mypy --config-file mypy.ini to see the Error.
| if isinstance(old_item, tuple): | ||
| new_item = tuple(x for i, x in enumerate(old_item) if i not in self.indices) | ||
| elif isinstance(old_item, list): | ||
| new_item = [x for i, x in enumerate(old_item) if i not in self.indices] | ||
| elif isinstance(old_item, dict): | ||
| new_item = {k: v for (k, v) in old_item.items() if k not in self.indices} |
There was a problem hiding this comment.
Let's say we have input datapipe that yields tuple of size 3. But, we provide 3 to indices. Should we raise Error in that case?
There was a problem hiding this comment.
We can raise a warning by doing something like this and running a quick check over the indices:
#filter indices all present check
try:
for i in self.indices:
old_item[i]
except (IndexError, KeyError):
warnings.warn(
"At least one index in the filter is not present in the item being returned,"
" please be aware that expected columns/keys may be missing."
)
NivekT
left a comment
There was a problem hiding this comment.
LGTM! Just one small step to add this DataPipe to the documentation.
Add Dropper into docs/source/torchdata.datapipes.iter.rst. I think the category "Selecting" is the most fitting, you can put it right before "Filter".
|
@dbish has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
|
@dbish has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Please read through our contribution guide prior to
creating your pull request.
Fixes #656
Changes
part 1 of the feature requests to manipulate datapipe columns here: #656
this adds a drop functionality to iter datapipes