Change data_loader_engine to 'merlin' in examples #580
edknv merged 1 commit into NVIDIA-Merlin:main
CI Results
GitHub pull request #580 of commit 880aae1385b5604226e984aedcbdc659dce0993d, no merge conflicts.
Running as SYSTEM
Setting status of 880aae1385b5604226e984aedcbdc659dce0993d to PENDING with url http://merlin-infra1.nvidia.com:8080/job/transformers4rec_tests/409/ and message: 'Build started for merge commit.'
Using context: Jenkins Unit Test Run
Building on the built-in node in workspace /var/jenkins_home/jobs/transformers4rec_tests/workspace
using credential nvidia-merlin-bot
> git rev-parse --is-inside-work-tree # timeout=10
Fetching changes from the remote Git repository
> git config remote.origin.url https://github.com/NVIDIA-Merlin/Transformers4Rec.git # timeout=10
Fetching upstream changes from https://github.com/NVIDIA-Merlin/Transformers4Rec.git
> git --version # timeout=10
using GIT_ASKPASS to set credentials This is the bot credentials for our CI/CD
> git fetch --tags --force --progress -- https://github.com/NVIDIA-Merlin/Transformers4Rec.git +refs/pull/580/*:refs/remotes/origin/pr/580/* # timeout=10
> git rev-parse 880aae1385b5604226e984aedcbdc659dce0993d^{commit} # timeout=10
Checking out Revision 880aae1385b5604226e984aedcbdc659dce0993d (detached)
> git config core.sparsecheckout # timeout=10
> git checkout -f 880aae1385b5604226e984aedcbdc659dce0993d # timeout=10
Commit message: "Change data_loader_engine to 'merlin' in examples"
> git rev-list --no-walk 6e64490a3835814f6c465bbcdd1560386451a35f # timeout=10
[workspace] $ /bin/bash /tmp/jenkins7004963983918626290.sh
GLOB sdist-make: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec/setup.py
py38-gpu recreate: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec/.tox/py38-gpu
py38-gpu inst: /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec/.tox/.tmp/package/1/transformers4rec-0.1.14+34.g880aae13.zip
WARNING: Discarding $PYTHONPATH from environment, to override specify PYTHONPATH in 'passenv' in your configuration.
py38-gpu installed: absl-py==1.2.0,aiohttp==3.8.1,aiosignal==1.2.0,alabaster==0.7.12,alembic==1.8.1,anyio==3.6.1,argon2-cffi==21.3.0,argon2-cffi-bindings==21.2.0,astroid==2.5.6,asttokens==2.0.8,astunparse==1.6.3,asv==0.5.1,asvdb==0.4.2,async-timeout==4.0.2,attrs==22.1.0,autopage==0.5.1,awscli==1.27.30,Babel==2.10.3,backcall==0.2.0,beautifulsoup4==4.11.1,betterproto==1.2.5,bleach==5.0.1,boto3==1.24.75,botocore==1.29.30,Brotli==1.0.9,cachetools==5.2.0,certifi==2022.12.7,cffi==1.15.1,charset-normalizer==2.1.1,click==8.1.3,cliff==4.1.0,cloudpickle==2.2.0,cmaes==0.9.0,cmake==3.24.1.1,cmd2==2.4.2,colorama==0.4.4,colorlog==6.7.0,contourpy==1.0.5,coverage==6.5.0,cuda-python==11.7.1,cupy-cuda117==10.6.0,cycler==0.11.0,Cython==0.29.32,dask==2022.1.1,debugpy==1.6.3,decorator==5.1.1,defusedxml==0.7.1,dill==0.3.5.1,distlib==0.3.6,distributed==2022.5.1,distro==1.7.0,docker-pycreds==0.4.0,docutils==0.16,entrypoints==0.4,execnet==1.9.0,executing==1.0.0,faiss==1.7.2,faiss-gpu==1.7.2,fastai==2.7.9,fastapi==0.85.0,fastavro==1.6.1,fastcore==1.5.27,fastdownload==0.0.7,fastjsonschema==2.16.1,fastprogress==1.0.3,fastrlock==0.8,feast==0.19.4,fiddle==0.2.2,filelock==3.8.0,flatbuffers==1.12,fonttools==4.37.3,frozenlist==1.3.1,fsspec==2022.5.0,gast==0.4.0,gevent==21.12.0,geventhttpclient==2.0.2,gitdb==4.0.9,GitPython==3.1.27,google-api-core==2.10.1,google-auth==2.11.1,google-auth-oauthlib==0.4.6,google-pasta==0.2.0,googleapis-common-protos==1.52.0,graphviz==0.20.1,greenlet==1.1.3,grpcio==1.41.0,grpcio-channelz==1.49.0,grpcio-reflection==1.48.1,grpclib==0.4.3,h11==0.13.0,h2==4.1.0,h5py==3.7.0,HeapDict==1.0.1,horovod==0.26.1,hpack==4.0.0,httptools==0.5.0,hugectr2onnx==0.0.0,huggingface-hub==0.9.1,hyperframe==6.0.1,idna==3.4,imagesize==1.4.1,implicit==0.6.1,importlib-metadata==4.12.0,importlib-resources==5.9.0,iniconfig==1.1.1,ipykernel==6.15.3,ipython==8.5.0,ipython-genutils==0.2.0,jedi==0.18.1,Jinja2==3.1.2,jmespath==1.0.1,joblib==1.2.0,json5==0.9.10,jsonschema==4.16.0,jupyter-core==4.11.1,jupyter-server==1.18.1,jupyter_client==7.3.5,jupyterlab==3.4.7,jupyterlab-pygments==0.2.2,jupyterlab_server==2.15.1,keras==2.9.0,Keras-Preprocessing==1.1.2,kiwisolver==1.4.4,lazy-object-proxy==1.8.0,libclang==14.0.6,libcst==0.4.7,lightfm==1.16,lightgbm==3.3.2,llvmlite==0.39.1,locket==1.0.0,lxml==4.9.1,Mako==1.2.4,Markdown==3.4.1,MarkupSafe==2.1.1,matplotlib==3.6.0,matplotlib-inline==0.1.6,merlin-core==0.6.0+1.g5926fcf,merlin-models==0.7.0+11.g280956aa4,merlin-systems==0.5.0+4.g15074ad,mistune==2.0.4,mmh3==3.0.0,mpi4py==3.1.3,msgpack==1.0.4,multidict==6.0.2,mypy-extensions==0.4.3,nbclassic==0.4.3,nbclient==0.6.8,nbconvert==7.0.0,nbformat==5.5.0,nest-asyncio==1.5.5,ninja==1.10.2.3,notebook==6.4.12,notebook-shim==0.1.0,numba==0.56.2,numpy==1.22.4,nvidia-pyindex==1.0.9,nvtabular==1.4.0+8.g95e12d347,nvtx==0.2.5,oauthlib==3.2.1,oldest-supported-numpy==2022.8.16,onnx==1.12.0,onnxruntime==1.11.1,opt-einsum==3.3.0,optuna==3.0.4,packaging==21.3,pandas==1.3.5,pandavro==1.5.2,pandocfilters==1.5.0,parso==0.8.3,partd==1.3.0,pathtools==0.1.2,pbr==5.11.0,pexpect==4.8.0,pickleshare==0.7.5,Pillow==9.2.0,pkgutil_resolve_name==1.3.10,platformdirs==2.5.2,plotly==5.11.0,pluggy==1.0.0,prettytable==3.5.0,prometheus-client==0.14.1,promise==2.3,prompt-toolkit==3.0.31,proto-plus==1.19.6,protobuf==3.19.5,psutil==5.9.2,ptyprocess==0.7.0,pure-eval==0.2.2,py==1.11.0,pyarrow==7.0.0,pyasn1==0.4.8,pyasn1-modules==0.2.8,pybind11==2.10.0,pycparser==2.21,pydantic==1.10.2,pydot==1.4.2,Pygments==2.13.0,pynvml==11.4.1,pyparsing==3.0.9,pyperclip==1.8.2,pyrsistent==0.18.1,pytest==7.1.3,pytest-cov==4.0.0,pytest-xdist==3.1.0,python-dateutil==2.8.2,python-dotenv==0.21.0,python-rapidjson==1.8,pytz==2022.2.1,PyYAML==5.4.1,pyzmq==24.0.0,regex==2022.9.13,requests==2.28.1,requests-oauthlib==1.3.1,rsa==4.7.2,s3transfer==0.6.0,sacremoses==0.0.53,scikit-build==0.15.0,scikit-learn==1.1.2,scipy==1.8.1,Send2Trash==1.8.0,sentry-sdk==1.9.8,setproctitle==1.3.2,setuptools-scm==7.0.5,shortuuid==1.0.9,six==1.16.0,sklearn==0.0,smmap==5.0.0,sniffio==1.3.0,snowballstemmer==2.2.0,sortedcontainers==2.4.0,soupsieve==2.3.2.post1,Sphinx==5.3.0,sphinx-multiversion==0.2.4,sphinxcontrib-applehelp==1.0.2,sphinxcontrib-devhelp==1.0.2,sphinxcontrib-htmlhelp==2.0.0,sphinxcontrib-jsmath==1.0.1,sphinxcontrib-qthelp==1.0.3,sphinxcontrib-serializinghtml==1.1.5,SQLAlchemy==1.4.45,stack-data==0.5.0,starlette==0.20.4,stevedore==4.1.1,stringcase==1.2.0,tabulate==0.8.10,tblib==1.7.0,tenacity==8.0.1,tensorboard==2.9.1,tensorboard-data-server==0.6.1,tensorboard-plugin-wit==1.8.1,tensorflow-estimator==2.9.0,tensorflow-gpu==2.9.2,tensorflow-io-gcs-filesystem==0.27.0,tensorflow-metadata==1.10.0,termcolor==2.0.1,terminado==0.15.0,testbook==0.4.2,threadpoolctl==3.1.0,tinycss2==1.1.1,tokenizers==0.10.3,toml==0.10.2,tomli==2.0.1,toolz==0.12.0,torch==1.12.1+cu113,torchmetrics==0.3.2,tornado==6.2,tox==3.26.0,tqdm==4.64.1,traitlets==5.4.0,transformers==4.12.0,transformers4rec==0.1.14+34.g880aae13,treelite==2.3.0,treelite-runtime==2.3.0,tritonclient==2.25.0,typing-inspect==0.8.0,typing_extensions==4.3.0,urllib3==1.26.12,uvicorn==0.18.3,uvloop==0.17.0,virtualenv==20.16.5,wandb==0.13.3,watchfiles==0.17.0,wcwidth==0.2.5,webencodings==0.5.1,websocket-client==1.4.1,websockets==10.3,Werkzeug==2.2.2,wrapt==1.12.1,xgboost==1.6.2,yarl==1.8.1,zict==2.2.0,zipp==3.8.1,zope.event==4.5.0,zope.interface==5.4.0
py38-gpu run-test-pre: PYTHONHASHSEED='4291785767'
py38-gpu run-test: commands[0] | pip install --upgrade pip
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Requirement already satisfied: pip in ./.tox/py38-gpu/lib/python3.8/site-packages (22.3.1)
py38-gpu run-test: commands[1] | pip install .
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Processing /var/jenkins_home/workspace/transformers4rec_tests/transformers4rec
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'done'
Requirement already satisfied: tqdm>=4.27 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers4rec==0.1.14+34.g880aae13) (4.64.1)
Requirement already satisfied: tensorflow-metadata in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers4rec==0.1.14+34.g880aae13) (1.12.0)
Requirement already satisfied: transformers<4.19 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers4rec==0.1.14+34.g880aae13) (4.18.0)
Requirement already satisfied: betterproto<2.0.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers4rec==0.1.14+34.g880aae13) (1.2.5)
Requirement already satisfied: pyarrow>=1.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers4rec==0.1.14+34.g880aae13) (10.0.1)
Requirement already satisfied: numpy>=1.17.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers4rec==0.1.14+34.g880aae13) (1.23.5)
Requirement already satisfied: stringcase in ./.tox/py38-gpu/lib/python3.8/site-packages (from betterproto<2.0.0->transformers4rec==0.1.14+34.g880aae13) (1.2.0)
Requirement already satisfied: grpclib in ./.tox/py38-gpu/lib/python3.8/site-packages (from betterproto<2.0.0->transformers4rec==0.1.14+34.g880aae13) (0.4.3)
Requirement already satisfied: packaging>=20.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (22.0)
Requirement already satisfied: regex!=2019.12.17 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (2022.10.31)
Requirement already satisfied: huggingface-hub<1.0,>=0.1.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (0.11.1)
Requirement already satisfied: sacremoses in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (0.0.53)
Requirement already satisfied: tokenizers!=0.11.3,<0.13,>=0.11.1 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (0.12.1)
Requirement already satisfied: requests in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (2.28.1)
Requirement already satisfied: filelock in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (3.8.2)
Requirement already satisfied: pyyaml>=5.1 in ./.tox/py38-gpu/lib/python3.8/site-packages (from transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (6.0)
Requirement already satisfied: absl-py<2.0.0,>=0.9 in ./.tox/py38-gpu/lib/python3.8/site-packages (from tensorflow-metadata->transformers4rec==0.1.14+34.g880aae13) (1.3.0)
Requirement already satisfied: protobuf<4,>=3.13 in ./.tox/py38-gpu/lib/python3.8/site-packages (from tensorflow-metadata->transformers4rec==0.1.14+34.g880aae13) (3.20.3)
Requirement already satisfied: googleapis-common-protos<2,>=1.52.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from tensorflow-metadata->transformers4rec==0.1.14+34.g880aae13) (1.57.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in ./.tox/py38-gpu/lib/python3.8/site-packages (from huggingface-hub<1.0,>=0.1.0->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (4.4.0)
Requirement already satisfied: h2<5,>=3.1.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from grpclib->betterproto<2.0.0->transformers4rec==0.1.14+34.g880aae13) (4.1.0)
Requirement already satisfied: multidict in ./.tox/py38-gpu/lib/python3.8/site-packages (from grpclib->betterproto<2.0.0->transformers4rec==0.1.14+34.g880aae13) (6.0.3)
Requirement already satisfied: idna<4,>=2.5 in ./.tox/py38-gpu/lib/python3.8/site-packages (from requests->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (3.4)
Requirement already satisfied: certifi>=2017.4.17 in ./.tox/py38-gpu/lib/python3.8/site-packages (from requests->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (2022.12.7)
Requirement already satisfied: charset-normalizer<3,>=2 in ./.tox/py38-gpu/lib/python3.8/site-packages (from requests->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (2.1.1)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in ./.tox/py38-gpu/lib/python3.8/site-packages (from requests->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (1.26.13)
Requirement already satisfied: click in ./.tox/py38-gpu/lib/python3.8/site-packages (from sacremoses->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (8.1.3)
Requirement already satisfied: joblib in ./.tox/py38-gpu/lib/python3.8/site-packages (from sacremoses->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (1.2.0)
Requirement already satisfied: six in ./.tox/py38-gpu/lib/python3.8/site-packages (from sacremoses->transformers<4.19->transformers4rec==0.1.14+34.g880aae13) (1.16.0)
Requirement already satisfied: hyperframe<7,>=6.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->transformers4rec==0.1.14+34.g880aae13) (6.0.1)
Requirement already satisfied: hpack<5,>=4.0 in ./.tox/py38-gpu/lib/python3.8/site-packages (from h2<5,>=3.1.0->grpclib->betterproto<2.0.0->transformers4rec==0.1.14+34.g880aae13) (4.0.0)
Building wheels for collected packages: transformers4rec
Building wheel for transformers4rec (pyproject.toml): started
Building wheel for transformers4rec (pyproject.toml): finished with status 'done'
Created wheel for transformers4rec: filename=transformers4rec-0.1.14+34.g880aae13-py3-none-any.whl size=481720 sha256=18f9978328d7d05c5abc992765c9085a9c4c14518012f7a169eb9f3718deff1a
Stored in directory: /tmp/pip-ephem-wheel-cache-rb8eubdj/wheels/cb/5d/b4/e081835ae498194a418e957657f998bdff0fa2bd103855a861
Successfully built transformers4rec
Installing collected packages: transformers4rec
Attempting uninstall: transformers4rec
Found existing installation: transformers4rec 0.1.14+34.g880aae13
Uninstalling transformers4rec-0.1.14+34.g880aae13:
Successfully uninstalled transformers4rec-0.1.14+34.g880aae13
Successfully installed transformers4rec-0.1.14+34.g880aae13
___________________________________ summary ____________________________________
py38-gpu: commands succeeded
congratulations :)
Performing Post build task...
Match found for : : True
Logical operation result is TRUE
Running script : #!/bin/bash
cd /var/jenkins_home/
CUDA_VISIBLE_DEVICES=2 python test_res_push.py "https://api.GitHub.com/repos/NVIDIA-Merlin/Transformers4Rec/issues/$ghprbPullId/comments" "/var/jenkins_home/jobs/$JOB_NAME/builds/$BUILD_NUMBER/log"
[workspace] $ /bin/bash /tmp/jenkins5679220209899127283.sh
```bash
DATA_PATH=~/transformers4rec_paper_preproc_datasets/ecom_rees46/
FEATURE_SCHEMA_PATH=datasets_configs/ecom_rees46/rees46_schema.pbtxt
CUDA_VISIBLE_DEVICES=0 python3 -m t4r_paper_repro.transf_exp_main --output_dir ./tmp/ --overwrite_output_dir --do_train --do_eval --validate_every 10 --logging_steps 20 --save_steps 0 --data_path $DATA_PATH --features_schema_path $FEATURE_SCHEMA_PATH --fp16 --data_loader_engine nvtabular --start_time_window_index 1 --final_time_window_index 30 --time_window_folder_pad_digits 4 --model_type xlnet --loss_type cross_entropy --per_device_eval_batch_size 512 --similarity_type concat_mlp --tf_out_activation tanh --inp_merge mlp --learning_rate_warmup_steps 0 --learning_rate_schedule linear_with_warmup --hidden_act gelu --num_train_epochs 10 --dataloader_drop_last --compute_metrics_each_n_steps 1 --session_seq_length_max 20 --eval_on_last_item_seq_only --mf_constrained_embeddings --layer_norm_featurewise --attn_type bi --mlm --input_features_aggregation concat --per_device_train_batch_size 256 --learning_rate 0.00020171456712823088 --dropout 0.0 --input_dropout 0.0 --weight_decay 2.747484129693843e-05 --d_model 448 --item_embedding_dim 448 --n_layer 2 --n_head 8 --label_smoothing 0.5 --stochastic_shared_embeddings_replacement_prob 0.0 --item_id_embeddings_init_std 0.09 --other_embeddings_init_std 0.015 --mlm_probability 0.1 --embedding_dim_from_cardinality_multiplier 3.0 --eval_on_test_set --seed 100 --use_side_information_features
```
Hard to see the diff, but the only change is `--data_loader_engine nvtabular` -> `--data_loader_engine merlin`.
@sararb do we still maintain `tf4rec_paper_experiments`? That is, if we change `data_loader_engine` to `merlin`, is that going to break anything?
We are still maintaining `tf4rec_paper_experiments` because the main script is used in the integration tests (here). Changing `data_loader_engine` won't break anything because the `merlin` and `nvtabular` aliases both refer to the same `MerlinDataLoader` class.
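As a rough illustration of why the two names are interchangeable, here is a minimal sketch of an alias lookup. Only the class name `MerlinDataLoader` and the two engine names come from the discussion above; the registry dict, the `resolve_engine` helper, and the stand-in class body are hypothetical and are not the library's actual code.

```python
from typing import Dict, Type


class MerlinDataLoader:
    """Stand-in class for illustration; not the real Transformers4Rec loader."""

    def __init__(self, data_path: str) -> None:
        self.data_path = data_path


# Hypothetical registry: both engine names point at the same class object,
# so choosing either name yields identical behavior.
_ENGINE_REGISTRY: Dict[str, Type[MerlinDataLoader]] = {
    "merlin": MerlinDataLoader,
    "nvtabular": MerlinDataLoader,  # alias kept for backward compatibility
}


def resolve_engine(name: str) -> Type[MerlinDataLoader]:
    """Return the loader class registered under an engine name (hypothetical helper)."""
    try:
        return _ENGINE_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(_ENGINE_REGISTRY))
        raise ValueError(f"Unknown data loader engine {name!r}; expected one of: {known}") from None
```

Under this assumption, `resolve_engine("nvtabular") is resolve_engine("merlin")` holds, which is the property the reply above relies on.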
Documentation preview: https://nvidia-merlin.github.io/Transformers4Rec/review/pr-580
A follow-up to #547

Goals ⚽
- Change `data_loader_engine` to `merlin` (instead of `nvtabular`).

With the changes in PR #547, `nvtabular` is now simply an alias for `merlin`, i.e., `data_loader_engine=nvtabular` is equivalent to `data_loader_engine=merlin`, and both will use the Merlin dataloader (`nvtabular` was not removed, for backward compatibility).

This PR changes the customer-facing examples to use `data_loader_engine=merlin` in order to promote Merlin Dataloader as the correct engine to use going forward.

Implementation Details 🚧
The CI scripts are also changed to use `--data_loader_engine merlin`.

Testing Details 🔍
For the CI script changes, manually ran `./ci/test_integration.sh` in the `merlin-pytorch:22.11` container (after upgrading/installing `core` and `dataloader`).
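The kind of substitution this PR applies to examples and CI scripts can be sketched as a small text rewrite. This is an illustrative assumption about how such a change could be automated, not a record of how the PR was produced; the function name `use_merlin_engine` and the regular expression are hypothetical.

```python
import re

# Matches the engine setting in either CLI form ("--data_loader_engine nvtabular")
# or keyword form ("data_loader_engine='nvtabular'"), keeping any opening quote.
_ENGINE_PATTERN = re.compile(
    r"(data_loader_engine(?:\s*=\s*|\s+))(['\"]?)nvtabular\b"
)


def use_merlin_engine(text: str) -> str:
    """Replace the nvtabular engine value with merlin, leaving other text untouched."""
    return _ENGINE_PATTERN.sub(r"\g<1>\g<2>merlin", text)


if __name__ == "__main__":
    command = "python3 -m t4r_paper_repro.transf_exp_main --fp16 --data_loader_engine nvtabular --seed 100"
    print(use_merlin_engine(command))
```

Anchoring the pattern on `data_loader_engine` keeps unrelated mentions of `nvtabular` (for example an import statement) unchanged.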