Remove padding from NVTabular getting started example #677

Merged
oliverholworthy merged 8 commits into NVIDIA-Merlin:main from oliverholworthy:serve-session-based-with-ragged-inputs-outputs
May 4, 2023
@oliverholworthy
Contributor

Demonstrates how we can serve Transformers4Rec and NVTabular together, with ragged outputs from the workflow and ragged inputs into the model.
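
For readers unfamiliar with the term, here is a rough sketch of the difference between a padded and a ragged representation of session data. It assumes a values-plus-offsets layout for the ragged form; the function names (`to_ragged`, `to_padded`, `from_ragged`) are hypothetical and are not part of NVTabular or Transformers4Rec.

```python
def to_ragged(sessions):
    """Flatten variable-length sessions into (values, offsets).

    offsets[i]:offsets[i + 1] is the slice of values belonging to session i.
    """
    values = []
    offsets = [0]
    for session in sessions:
        values.extend(session)
        offsets.append(len(values))
    return values, offsets


def to_padded(sessions, max_len, pad_value=0):
    """Truncate or right-pad every session to exactly max_len items."""
    padded = []
    for session in sessions:
        trimmed = list(session[:max_len])
        padded.append(trimmed + [pad_value] * (max_len - len(trimmed)))
    return padded


def from_ragged(values, offsets):
    """Rebuild the original list of sessions from (values, offsets)."""
    return [values[start:end] for start, end in zip(offsets[:-1], offsets[1:])]


# Three sessions of different lengths (illustrative item ids).
sessions = [[3, 7], [5], [2, 9, 4]]
values, offsets = to_ragged(sessions)
padded = to_padded(sessions, max_len=3)
```

The ragged form carries no padding values, so nothing downstream has to mask them out; the cost is that the consumer must understand the offsets.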

@rnyak
Contributor

rnyak commented Apr 21, 2023

@oliverholworthy this looks ready to me, but it is still marked as Draft.

@oliverholworthy oliverholworthy marked this pull request as ready for review April 26, 2023 10:57
@@ -63,18 +63,7 @@
"execution_count": 2,
Contributor

@rnyak rnyak Apr 26, 2023

@oliver this number, 486, does not match the cardinality in workflow.output_schema. Are you using a different saved workflow by any chance? Each time you rerun the ETL you will get a different number of unique items, since we generate the data.


Contributor Author

Let me re-run both notebooks. I only committed the part of the first notebook that had changed.

Contributor Author

I have now re-run all the notebooks, so this should match the schema.
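
The mismatch discussed in this thread comes from regenerating synthetic data: each unseeded run can produce a different number of unique items. A small sketch of that effect, using only the standard library; `generate_item_ids` and `cardinality` are hypothetical helpers, not the notebooks' actual data generation code.

```python
import random


def generate_item_ids(num_rows, max_item_id, seed=None):
    """Draw num_rows item ids uniformly from 1..max_item_id."""
    rng = random.Random(seed)
    return [rng.randint(1, max_item_id) for _ in range(num_rows)]


def cardinality(item_ids):
    """Number of distinct item ids that actually appear in the data."""
    return len(set(item_ids))


# Same seed: the generated data, and so the cardinality, is reproducible.
run_a = cardinality(generate_item_ids(1000, 500, seed=1))
run_a_again = cardinality(generate_item_ids(1000, 500, seed=1))

# No seed: the cardinality can change from one run to the next, so a
# workflow saved from an earlier run may no longer match the new data.
run_b = cardinality(generate_item_ids(1000, 500))
```

Re-running every notebook in order against the same generated data, as done here, keeps the saved workflow and the model inputs consistent.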

@rnyak
Contributor

rnyak commented May 1, 2023

@oliverholworthy I pulled the latest main branches and then your PR, and I am getting an error from the unit test for the first ETL notebook.

E           File /usr/local/lib/python3.8/dist-packages/merlin/dag/executors.py:287, in DaskExecutor.transform(self, dataset, graph, output_dtypes, additional_columns, capture_dtypes, strict)
E               283 nodes = self._executor._output_nodes(graph)
E               285 self._clear_worker_cache()
E           --> 287 ddf = dataset.to_ddf()
E               289 # Check if we are only selecting columns (no transforms).
E               290 # If so, we should perform column selection at the ddf level.
E               291 # Otherwise, Dask will not push the column selection into the
E               292 # IO function.
E               293 if not nodes:
E           
E           File /usr/local/lib/python3.8/dist-packages/dask/dataframe/core.py:4686, in DataFrame.__getattr__(self, key)
E              4684     object.__getattribute__(self, key)
E              4685 else:
E           -> 4686     raise AttributeError("'DataFrame' object has no attribute %r" % key)
E           
E           AttributeError: 'DataFrame' object has no attribute 'to_ddf'
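
The traceback shows `dataset.to_ddf()` being called on an object that is a Dask DataFrame and therefore has no `to_ddf` attribute. Below is a minimal sketch of that failure mode and of a defensive check. It uses stand-in classes only: `FakeDaskFrame`, `FakeDataset`, and `ensure_dataset` are hypothetical names for illustration and are not Merlin or Dask APIs, and the sketch does not claim to identify the root cause of this failure.

```python
class FakeDaskFrame:
    """Stand-in for a dataframe-like object that has no to_ddf method."""

    def __init__(self, rows):
        self.rows = rows


class FakeDataset:
    """Stand-in for a dataset wrapper that exposes to_ddf."""

    def __init__(self, frame):
        self._frame = frame

    def to_ddf(self):
        return self._frame


def ensure_dataset(obj):
    """Return obj unchanged if it exposes to_ddf, otherwise wrap it."""
    if hasattr(obj, "to_ddf"):
        return obj
    return FakeDataset(obj)


frame = FakeDaskFrame(rows=[1, 2, 3])

# Calling to_ddf directly on the bare frame fails, as in the traceback.
try:
    frame.to_ddf()
    failed = False
except AttributeError:
    failed = True

# After wrapping, the same call succeeds and returns the original frame.
dataset = ensure_dataset(frame)
same_frame = dataset.to_ddf() is frame
```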

@oliverholworthy oliverholworthy merged commit e655580 into NVIDIA-Merlin:main May 4, 2023
@oliverholworthy oliverholworthy deleted the serve-session-based-with-ragged-inputs-outputs branch May 4, 2023 12:44
Labels: area/examples, chore (Maintenance for the repository)
