Fix failing ci error related to sparse_names containing features that are not part of the model's schema#541
Conversation
Click to view CI ResultsGitHub pull request #541 of commit 2bcaec66c4b1943fcfebd1e1cb7e8dd0fcbd361c, no merge conflicts.
Running as SYSTEM
Setting status of 2bcaec66c4b1943fcfebd1e1cb7e8dd0fcbd361c to PENDING with url http://merlin-infra1.nvidia.com:8080/job/transformers4rec_tests/301/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on master in workspace /var/jenkins_home/workspace/transformers4rec_tests
using credential nvidia-merlin-bot
Cloning the remote Git repository
Cloning repository https://github.com/NVIDIA-Merlin/Transformers4Rec.git
> git init /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
> git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
> git config --add remote.origin.fetch +refs/heads/*:refs/remotes/origin/* # timeout=10
> git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/pull/541/*:refs/remotes/origin/pr/541/* # timeout=10
> git rev-parse 2bcaec66c4b1943fcfebd1e1cb7e8dd0fcbd361c^{commit} # timeout=10
Checking out Revision 2bcaec66c4b1943fcfebd1e1cb7e8dd0fcbd361c (detached)
> git config core.sparsecheckout # timeout=10
> git checkout -f 2bcaec66c4b1943fcfebd1e1cb7e8dd0fcbd361c # timeout=10
Commit message: "fix failing ci error"
> git rev-list --no-walk 7c84cfb1253540af9abe9474b5d41585dc4e9041 # timeout=10
[transformers4rec_tests] $ /bin/bash /tmp/jenkins1662834209160290281.sh
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Collecting git+https://github.com/NVIDIA-Merlin/NVTabular.git
Cloning https://github.com/NVIDIA-Merlin/NVTabular.git to /tmp/pip-req-build-olqd1lc4
Running command git clone --filter=blob:none --quiet https://github.com/NVIDIA-Merlin/NVTabular.git /tmp/pip-req-build-olqd1lc4
Resolved https://github.com/NVIDIA-Merlin/NVTabular.git to commit 21117cfc4c113b30036afcb97b6daa5f377996db
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: scipy in /usr/local/lib/python3.8/dist-packages (from nvtabular==1.6.0+5.g21117cfc) (1.8.1)
Requirement already satisfied: merlin-core>=0.2.0 in /usr/local/lib/python3.8/dist-packages (from nvtabular==1.6.0+5.g21117cfc) (0.6.0+1.g5926fcf)
Requirement already satisfied: merlin-dataloader>=0.0.2 in /usr/local/lib/python3.8/dist-packages (from nvtabular==1.6.0+5.g21117cfc) (0.0.2)
Requirement already satisfied: pandas<1.4.0dev0,>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (1.3.5)
Requirement already satisfied: betterproto<2.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (1.2.5)
Requirement already satisfied: packaging in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (21.3)
Requirement already satisfied: tqdm>=4.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (4.64.1)
Requirement already satisfied: dask>=2022.3.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (2022.5.1)
Requirement already satisfied: fsspec==2022.5.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (2022.5.0)
Requirement already satisfied: numba>=0.54 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (0.56.2)
Requirement already satisfied: tensorflow-metadata>=1.2.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (1.10.0)
Requirement already satisfied: distributed>=2022.3.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (2022.5.1)
Requirement already satisfied: protobuf>=3.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (3.19.5)
Requirement already satisfied: pyarrow>=5.0.0 in /usr/local/lib/python3.8/dist-packages (from merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (7.0.0)
Requirement already satisfied: numpy<1.25.0,>=1.17.3 in /usr/local/lib/python3.8/dist-packages (from scipy->nvtabular==1.6.0+5.g21117cfc) (1.22.4)
Requirement already satisfied: grpclib in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (0.4.3)
Requirement already satisfied: stringcase in /usr/local/lib/python3.8/dist-packages (from betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (1.2.0)
Requirement already satisfied: partd>=0.3.10 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (1.3.0)
Requirement already satisfied: cloudpickle>=1.1.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (2.2.0)
Requirement already satisfied: toolz>=0.8.2 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (0.12.0)
Requirement already satisfied: pyyaml>=5.3.1 in /usr/local/lib/python3.8/dist-packages (from dask>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (5.4.1)
Requirement already satisfied: tornado>=6.0.3 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (6.2)
Requirement already satisfied: click>=6.6 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (8.1.3)
Requirement already satisfied: locket>=1.0.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (1.0.0)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (3.1.2)
Requirement already satisfied: msgpack>=0.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (1.0.4)
Requirement already satisfied: urllib3 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (1.26.12)
Requirement already satisfied: psutil>=5.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (5.9.2)
Requirement already satisfied: sortedcontainers!=2.0.0,!=2.0.1 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (2.4.0)
Requirement already satisfied: zict>=0.1.3 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (2.2.0)
Requirement already satisfied: tblib>=1.6.0 in /usr/local/lib/python3.8/dist-packages (from distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (1.7.0)
Requirement already satisfied: importlib-metadata in /usr/local/lib/python3.8/dist-packages (from numba>=0.54->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (4.12.0)
Requirement already satisfied: setuptools<60 in /usr/lib/python3/dist-packages (from numba>=0.54->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (45.2.0)
Requirement already satisfied: llvmlite<0.40,>=0.39.0dev0 in /usr/local/lib/python3.8/dist-packages (from numba>=0.54->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (0.39.1)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /usr/local/lib/python3.8/dist-packages (from packaging->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (3.0.9)
Requirement already satisfied: python-dateutil>=2.7.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (2.8.2)
Requirement already satisfied: pytz>=2017.3 in /usr/local/lib/python3.8/dist-packages (from pandas<1.4.0dev0,>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (2022.2.1)
Requirement already satisfied: googleapis-common-protos<2,>=1.52.0 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (1.52.0)
Requirement already satisfied: absl-py<2.0.0,>=0.9 in /usr/local/lib/python3.8/dist-packages (from tensorflow-metadata>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (1.2.0)
Requirement already satisfied: six>=1.5 in /var/jenkins_home/.local/lib/python3.8/site-packages (from python-dateutil>=2.7.3->pandas<1.4.0dev0,>=1.2.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (1.15.0)
Requirement already satisfied: heapdict in /usr/local/lib/python3.8/dist-packages (from zict>=0.1.3->distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (1.0.1)
Requirement already satisfied: multidict in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (6.0.2)
Requirement already satisfied: h2<5,>=3.1.0 in /usr/local/lib/python3.8/dist-packages (from grpclib->betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (4.1.0)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.8/dist-packages (from importlib-metadata->numba>=0.54->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (3.8.1)
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.8/dist-packages (from jinja2->distributed>=2022.3.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (2.1.1)
Requirement already satisfied: hyperframe<7,>=6.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (6.0.1)
Requirement already satisfied: hpack<5,>=4.0 in /usr/local/lib/python3.8/dist-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->merlin-core>=0.2.0->nvtabular==1.6.0+5.g21117cfc) (4.0.0)
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-7.1.3, pluggy-1.0.0
rootdir: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec
plugins: anyio-3.6.1, xdist-3.0.2, cov-4.0.0
collected 1 item
|
Documentation previewhttps://nvidia-merlin.github.io/Transformers4Rec/review/pr-541 |
|
I suspect some version of this error might be why saved TF models from Merlin Models are reporting all features returned by the dataloader as inputs. Seems like the dataloaders broke a lot of stuff this release cycle. 😅 |
dataset.schema. T4Rec was built on a previous convention where the data loader was loading only features specified in the schema passed to theTrainer(that matches the schema used to construct the model).dataset.schemabefore calling the dataloader. However,sparse_namesargument needs also to be filtered out to keep only the features of theschemaused to build the model.Note: I was able to catch another issue related to the types of features in the dataset stored in the drive (and used by ci test). In fact, the data contained a mix of float64 and float32 input features which breaks the input block of T4Rec as it expects all continuous features to have same type (to aggregate them into one vector). So I updated the data by casting all columns to float32 and load it back to the drive.