Skip to content

Cleanup shapes in model.input_schema and output_schema#628

Merged
rnyak merged 11 commits intomainfrom
cleanup_shapes
Mar 10, 2023
Merged

Cleanup shapes in model.input_schema and output_schema#628
rnyak merged 11 commits intomainfrom
cleanup_shapes

Conversation

@rnyak
Copy link
Copy Markdown
Contributor

@rnyak rnyak commented Mar 1, 2023

This work addresses the Capturing shapes everywhere work. There are a bunch of places in Merlin that should fill information into the shape. for now we modified only model input_schema and output_schema code, but once we move to Schema from core, we can do more updates, accordingly.

@rnyak rnyak requested a review from karlhigley March 1, 2023 19:25
@rnyak rnyak added chore Maintenance for the repository enhancement New feature or request labels Mar 1, 2023
@rnyak rnyak changed the title cleanup shapes in model.input_schema and output_schema [DRAFT] cleanup shapes in model.input_schema and output_schema Mar 1, 2023
@rnyak rnyak added this to the Merlin 23.03 milestone Mar 1, 2023
@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 1, 2023

Comment thread transformers4rec/torch/model/base.py Outdated
is_list = column.value_count.max > 0
dims = None
if column.value_count.max > 0:
dims = (None, column.value_count.max)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do the value_counts always have the same size here? If so, it should still be safe to record the second dimension as (column.value_count.min, column.value_count.max) just in case they differ at some point (like after we add ragged input support to T4R.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. so if we dont use padding in NVT workflow, value_counts.min() and value_counts.max() are different for the transformed dataset coming out of NVT workflow. but the padding is applied in the datalader (if I am not mistaken model takes dense inputs..)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, as I understand it, the current state of the Merlin-verse is that the T4R input layers expect dense fixed-size inputs and the dataloader bridges the gap by applying padding. I think that might change in the semi-near-future though, since it seems like we've settled on supporting both fixed size and ragged inputs everywhere in Merlin.

@rnyak rnyak changed the title [DRAFT] cleanup shapes in model.input_schema and output_schema Cleanup shapes in model.input_schema and output_schema Mar 8, 2023
@rnyak rnyak merged commit 5047d17 into main Mar 10, 2023
@rnyak rnyak deleted the cleanup_shapes branch March 10, 2023 19:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

chore Maintenance for the repository enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants