[RMP] Provide PyTorch serving support for T4R models in TorchScript #255

@karlhigley

Description

Problem:

Users should be able to serve PyTorch models produced with Transformers4Rec, or by any other process, using a Systems ensemble. This works toward supporting session-based models and extends Systems' support to a new modeling framework.

Goal:

Systems should be able to serve all PyTorch models that are currently supported by Triton.
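For context, Triton's PyTorch backend expects a TorchScript model in a versioned model-repository layout. A minimal sketch of that layout and its `config.pbtxt` (the model name `t4r_model` and batch size are illustrative, not part of this issue):

```
model_repository/
└── t4r_model/
    ├── config.pbtxt
    └── 1/
        └── model.pt

# config.pbtxt
name: "t4r_model"
platform: "pytorch_libtorch"
max_batch_size: 8
```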

Definition of Done

Have an example that serves a PyTorch session-based model in conjunction with an NVTabular workflow, where the session-based model scores the whole catalog.

Open questions

Constraints:

Not all PyTorch models can be served via Triton's PyTorch backend, so Systems will need to be able to use multiple backends in order to serve all Triton-compatible PyTorch models.
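To illustrate the TorchScript path specifically: a model has to be traced (or scripted) before Triton's PyTorch backend can load it. A minimal sketch with a toy next-item model (`NextItemModel`, the catalog size, and the embedding dimension are illustrative assumptions, not the T4R API):

```python
import torch


class NextItemModel(torch.nn.Module):
    """Toy session-based model: embeds item IDs, mean-pools the session,
    and scores every item in the catalog (hypothetical stand-in for T4R)."""

    def __init__(self, num_items: int = 1000, dim: int = 64):
        super().__init__()
        self.embed = torch.nn.Embedding(num_items, dim)
        self.out = torch.nn.Linear(dim, num_items)

    def forward(self, item_ids: torch.Tensor) -> torch.Tensor:
        # item_ids: (batch, seq_len) -> scores: (batch, num_items)
        return self.out(self.embed(item_ids).mean(dim=1))


model = NextItemModel().eval()
example = torch.zeros(1, 20, dtype=torch.long)  # one padded session of length 20

# Trace to TorchScript; models with data-dependent control flow would need
# torch.jit.script instead, which is one reason multiple backends may be needed.
traced = torch.jit.trace(model, example)
traced.save("model.pt")  # Triton loads this from <model_repo>/<name>/<version>/model.pt
```

Tracing only records the operations seen for the example input, which is why fixed-shape (padded) inputs simplify serving.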

Starting Point:

Transformers4Rec

Systems

Integration Issues

Nice to have: (P1)

Documentation

Examples

Blockers:

  • [INF] Unresolved architectural decisions
  • Support for ragged tensors in T4R
  • Start with fixed padding (pad all sequences to the same length), then investigate whether it is worthwhile to also support padding to the maximum sequence length
  • Padding support in a dataloader that works with Systems, along with cross-framework support
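The fixed-padding starting point above can be sketched as follows (the session data and the fixed length `MAX_LEN` are hypothetical, for illustration only):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three ragged sessions of item IDs (hypothetical data)
sessions = [
    torch.tensor([1, 2, 3]),
    torch.tensor([4, 5]),
    torch.tensor([6]),
]

MAX_LEN = 5  # assumed fixed padding length

# Pad to the longest session in the batch, using 0 as the padding item ID
padded = pad_sequence(sessions, batch_first=True, padding_value=0)

# Extend to the fixed maximum length so every batch has the same shape,
# which keeps input shapes static for TorchScript tracing and Triton
fixed = torch.nn.functional.pad(padded, (0, MAX_LEN - padded.shape[1]), value=0)
```

Padding everything to one fixed length wastes some compute on short sessions, which is the trade-off the blocker proposes to investigate against per-batch (max-in-batch) padding.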
