Remove sparse tensor output type for list features #103
Merged
karlhigley merged 4 commits into NVIDIA-Merlin:main on Mar 16, 2023
karlhigley approved these changes on Mar 16, 2023
Remove sparse tensor output type for list features
Motivation
The value count attributes of columns in dataset.schema currently control the output type of list columns. With the addition of shape to the schema (NVIDIA-Merlin/Merlin#813), we're going to start seeing value counts specified more often. This will result in unexpected output types for columns where a value count was not previously specified.
There is also currently a possible sparse tensor output type, which doesn't have a clear use case and appears to be an implementation detail of padding ragged columns to dense.
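To make "padding ragged columns to dense" concrete, here is a minimal sketch in plain Python. It assumes a ragged list column is stored as a flat list of values plus row offsets; that representation and the function name pad_ragged_to_dense are illustrative assumptions, not the dataloader's actual API.

```python
def pad_ragged_to_dense(values, offsets, max_len, pad_value=0):
    """Pad (or truncate) each row of a ragged column to max_len.

    Assumption: row i holds values[offsets[i]:offsets[i + 1]].
    This is a hypothetical helper, not part of the Merlin dataloader.
    """
    rows = []
    for start, end in zip(offsets[:-1], offsets[1:]):
        # Take the row, truncating anything beyond max_len.
        row = list(values[start:end])[:max_len]
        # Fill the remainder of short rows with the padding value.
        row += [pad_value] * (max_len - len(row))
        rows.append(row)
    return rows


# Three rows of lengths 2, 1 and 3, padded to length 3.
dense = pad_ragged_to_dense([1, 2, 3, 4, 5, 6], [0, 2, 3, 6], max_len=3)
```

In this sketch max_len plays the role that value_count.max would play in the schema.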
Current
value_count.max specified and is_ragged=False
value_count.max specified and is_ragged=True
value_count.max not specified and is_ragged=True
After - With this PR
Always returns this for all list columns. Value count does not influence output type.
Planning a follow-up / independent change in #97 to return a dense tensor if the schema specifies that a column is of fixed size (is_ragged=False).
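The rule described for the follow-up could look roughly like the sketch below, where only is_ragged decides the output form and value counts play no part. The flat values plus offsets representation and the name list_column_output are assumptions made for illustration; the PR text does not confirm either.

```python
def list_column_output(values, offsets, is_ragged):
    """Choose the output form of a list column from is_ragged alone.

    Assumption: row i holds values[offsets[i]:offsets[i + 1]].
    This is a hypothetical helper, not the dataloader's implementation.
    """
    if is_ragged:
        # Ragged columns keep the flat (values, offsets) form.
        return values, offsets

    bounds = list(zip(offsets[:-1], offsets[1:]))
    lengths = {end - start for start, end in bounds}
    if len(lengths) > 1:
        # A column declared fixed-size must have rows of equal length.
        raise ValueError("is_ragged=False but row lengths differ")
    # Fixed-size columns become a dense list of equal-length rows.
    return [list(values[start:end]) for start, end in bounds]


dense = list_column_output([1, 2, 3, 4], [0, 2, 4], is_ragged=False)
ragged = list_column_output([1, 2, 3], [0, 1, 3], is_ragged=True)
```

Raising on mismatched row lengths is one possible design choice; a real implementation might validate against the schema earlier instead.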