Cleanup shapes in model.input_schema and output_schema #628
Documentation preview: https://nvidia-merlin.github.io/Transformers4Rec/review/pr-628
```python
# A positive max value count marks the column as a list column.
is_list = column.value_count.max > 0
dims = None
if column.value_count.max > 0:
    dims = (None, column.value_count.max)
```
Are the min and max value counts always the same here? If so, it would still be safe to record the second dimension as (column.value_count.min, column.value_count.max), in case they differ at some point (e.g., after we add ragged input support to T4R).
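To make the difference between the two options concrete, here is a minimal sketch. It assumes a hypothetical `ValueCount` stand-in with `min` and `max` fields (the real Merlin schema classes may be structured differently) and the hypothetical helpers `dims_fixed` (the current `(None, max)` approach) and `dims_with_bounds` (the suggested `(None, (min, max))` approach). Scalar columns get `dims = None`, as in the snippet above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

# Hypothetical stand-in for a column's value-count metadata (not the real Merlin class).
@dataclass
class ValueCount:
    min: int = 0
    max: int = 0

Dim = Union[None, int, Tuple[int, int]]

def dims_fixed(value_count: ValueCount) -> Optional[Tuple[Dim, ...]]:
    """Current approach: record only the max as the list dimension."""
    if value_count.max > 0:
        return (None, value_count.max)
    return None  # scalar column

def dims_with_bounds(value_count: ValueCount) -> Optional[Tuple[Dim, ...]]:
    """Suggested approach: keep (min, max) so ragged lists remain describable."""
    if value_count.max > 0:
        return (None, (value_count.min, value_count.max))
    return None  # scalar column

ragged = ValueCount(min=1, max=20)
print(dims_fixed(ragged))        # (None, 20): the minimum length is lost
print(dims_with_bounds(ragged))  # (None, (1, 20))
```

For padded (fixed-size) columns, where min equals max, both forms carry the same information; the bounded form only adds value once ragged inputs appear.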
Makes sense. If we don't use padding in the NVT workflow, value_count.min and value_count.max differ for the transformed dataset coming out of the NVT workflow. But padding is applied in the dataloader (if I'm not mistaken, the model takes dense inputs).
Yeah, as I understand it, the current state of the Merlin-verse is that the T4R input layers expect dense, fixed-size inputs and the dataloader bridges the gap by applying padding. I think that might change in the near future, though, since it seems like we've settled on supporting both fixed-size and ragged inputs everywhere in Merlin.
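As a rough illustration of how padding bridges ragged data and dense model inputs, the sketch below uses a hypothetical `pad_to_dense` helper. It is not the Merlin dataloader API. It assumes right-padding with `0` and truncation that keeps the first `max_len` items; the real dataloader's padding side, pad value, and truncation rule may differ.

```python
from typing import List, Sequence

def pad_to_dense(rows: Sequence[Sequence[int]], max_len: int, pad_value: int = 0) -> List[List[int]]:
    """Right-pad (or truncate) each ragged row to exactly max_len items."""
    dense = []
    for row in rows:
        row = list(row)[:max_len]  # assumption: keep the first max_len items
        dense.append(row + [pad_value] * (max_len - len(row)))
    return dense

ragged = [[3, 7], [5], [1, 2, 9, 4]]
lengths = [len(r) for r in ragged]
print(min(lengths), max(lengths))  # 1 4: value counts differ before padding

dense = pad_to_dense(ragged, max_len=4)
print(dense)  # [[3, 7, 0, 0], [5, 0, 0, 0], [1, 2, 9, 4]]
```

After padding, every row has the same length, so min and max value counts coincide; this is why a fixed `(None, max)` shape holds for the padded inputs the model sees, while the unpadded NVT output would need the `(min, max)` form.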
This PR addresses part of the "Capturing shapes everywhere" work. There are many places in Merlin that should fill in shape information. For now, we have modified only the model's input_schema and output_schema code; once we move to the Schema from core, we can make further updates accordingly.