Update padding of ragged features to enable dataloader change #647
Update padding of ragged features to enable dataloader change #647karlhigley merged 13 commits intoNVIDIA-Merlin:mainfrom
Conversation
Documentation previewhttps://nvidia-merlin.github.io/Transformers4Rec/review/pr-647 |
| conts=None, | ||
| labels=None, | ||
| ): | ||
| schema = schema.select_by_name(conts + cats + labels) |
There was a problem hiding this comment.
Maybe we should remove the =None for cats, conts & labels?
There was a problem hiding this comment.
that's clearer without the default now. it wouldn't have worked as None before this change either.
There was a problem hiding this comment.
On second thought, I've put the None back and added support for the None to the method by setting to an empty list if not provided.
It doesn't appear to be captured by any tests, but this would enable training a model with only continuous or only categorical features. Currently you need to have at least one of each for the current version of this MerlinDataloader to work.
|
|
||
| batch_padded = {} | ||
| for k, values in batch.items(): | ||
| if k.endswith("__values"): |
There was a problem hiding this comment.
Wouldn't it be better to put __values and __offsets as constants somewhere in merlin-core? Or some util-method, like is_value or something.
There was a problem hiding this comment.
Perhaps TensorTable would be useful here since that seems to handle both the values/offsets dictionary and tuple variants for us.
| return pad_fn | ||
|
|
||
| @staticmethod | ||
| def _augment_schema( |
There was a problem hiding this comment.
Moved this from schema_utiils to here so that it's closer to the only place it's called in the codebase.
Goals ⚽
Implementation Details 🚧
Testing Details 🔍