NVTabular
NVTabular is a feature engineering and preprocessing library for tabular data, designed to quickly and easily manipulate terabyte-scale datasets used to train deep-learning-based recommender systems.
**What is your question?** I've installed NVTabular three times today, and each time I encountered a different bug. It's quite frustrating. The latest error was the closest to success,...
## Goal Reduce the resulting `int_domain.max` property by one on a ColumnSchema after transforming with `Categorify`, so that it matches the data correctly. ## Motivation / Context This PR was motivated by...
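The off-by-one at stake here can be illustrated with a minimal sketch. It assumes a plain zero-based integer encoding written in pure Python; `encode` is a hypothetical helper, not NVTabular's `Categorify`, whose actual id assignment may differ (for example by reserving ids for nulls or unseen values).

```python
def encode(values):
    """Assign contiguous zero-based ids in order of first appearance.

    Hypothetical helper for illustration only; not NVTabular's Categorify.
    """
    mapping = {}
    for value in values:
        # setdefault only assigns a new id the first time a value is seen
        mapping.setdefault(value, len(mapping))
    return [mapping[value] for value in values], mapping


ids, mapping = encode(["a", "b", "a", "c"])
cardinality = len(mapping)  # number of distinct categories
max_id = max(ids)           # largest id that actually occurs in the data

# With zero-based ids, the largest id is one less than the cardinality.
print(cardinality, max_id)
```

Under this assumption, a schema that records the cardinality as the maximum of the integer domain overstates the largest value present in the data by one.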
Graphs of the categorical features, the combination of categorical features, i.e. ('userId', 'movieId'), and the numerical feature, i.e. (rating), were visualized, and the difference can be seen in the uploaded script.
I tried changing the loss function to BCELoss. I got a message saying that many models use a sigmoid layer right before the binary cross-entropy layer. How do I...
We'd like to re-use some of the mechanics of graph execution (both local and distributed) in other parts of Merlin, so this is a step in the direction of disentangling...
**Describe the bug** I would like to jointly encode single and multi-hot categorical columns but I am getting the following error: ``` --------------------------------------------------------------------------- TypeError Traceback (most recent call last) Input...
In this case, the piece of code that should be connecting the column name with the aggregation name incorrectly tries to join every character of the column name with the...
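This kind of bug typically arises when a bare string is iterated where a list of column names is expected. The sketch below is a pure-Python illustration of that pitfall, not the NVTabular code in question; the function names, the `_` separator, and the `price`/`sum` values are all assumptions made for the example.

```python
def name_buggy(column, agg, sep="_"):
    # Bug: `column` is a plain string, so iterating over it yields
    # single characters instead of whole column names.
    return [sep.join([ch, agg]) for ch in column]


def name_fixed(columns, agg, sep="_"):
    # Normalize a bare string to a one-element list before iterating.
    if isinstance(columns, str):
        columns = [columns]
    return [sep.join([col, agg]) for col in columns]


buggy = name_buggy("price", "sum")
fixed = name_fixed("price", "sum")

print(buggy)  # ['p_sum', 'r_sum', 'i_sum', 'c_sum', 'e_sum']
print(fixed)  # ['price_sum']
```

Wrapping a lone string in a list before the loop keeps the same code path working for both a single column name and a list of names.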
**Describe the bug** **Steps/Code to reproduce bug** The following code ``` import cudf import nvtabular as nvt import numpy as np gdf = cudf.DataFrame(data=[['apple', np.nan], ['apple', 'red'], ['apple', 'green'], ['orange',...
This is just a very early WIP, only adding it here to share current status with @bschifferer, still a lot of work to be done
**Describe the bug** This functionality is important because often we might want to group by an identifier column (such as `customer_id` for instance), perform some calculations on the groupings and...