[RMP] Support pre-trained vector embeddings as input features into a model via the dataloader #211

## Problem:
Customers need a way to feed embeddings that were pre-trained, or trained by a separate model, into their model as input features.
See https://github.com/NVIDIA-Merlin/Merlin/issues/471

## Goal:
Enable the dataloader to load separate embedding tables, so that these embeddings do not have to be added to the interaction data during training. For serving, those embeddings need to be provided in the request to the model. The feature must be usable in a production setting.
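
To make the goal concrete, here is a minimal sketch of batch-time lookup written in plain NumPy. It is not the Merlin dataloader API: the table contents, the `iter_batches` helper, and the convention that row `i` holds the vector for item ID `i` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical pre-trained table kept in CPU memory: row i is the vector for item ID i.
rng = np.random.default_rng(0)
pretrained = rng.standard_normal((6, 4)).astype(np.float32)
pretrained.setflags(write=False)  # non-trainable: the table is read-only


def iter_batches(item_ids, batch_size, table):
    """Yield (ids, embeddings) pairs, looking the vectors up per batch."""
    for start in range(0, len(item_ids), batch_size):
        ids = item_ids[start:start + batch_size]
        # Integer-array indexing copies only the rows needed for this batch.
        yield ids, table[ids]


# The interaction data carries only item IDs, never the embedding vectors.
interactions = np.array([3, 1, 5, 2, 4])
batches = list(iter_batches(interactions, batch_size=2, table=pretrained))
```

Because only the rows for the current batch are copied, the full table can stay on the CPU and the interaction data never needs an embedding column.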

## Constraints:
- [ ] External embedding tables may not fit on the GPU.
- [ ] Embeddings are non-trainable.
- [ ] Embedding tables fit in CPU memory; tables larger than CPU memory are left for potential future work.
- [ ] Embeddings are not generated on the fly (future work).

Supporting pre-trained vector embeddings as features would provide baseline support for multi-modal use cases that rely on outside models to generate image/text embeddings.

## NVTabular
- [x] https://github.com/NVIDIA-Merlin/NVTabular/pull/1692
- [x] Update the C++ versions of categorify serving to match the new functionality
- [x] https://github.com/NVIDIA-Merlin/NVTabular/issues/1748
- [x] #972
- [x] #971
- [ ] [Feed pre-trained embeddings to NVTabular](https://github.com/NVIDIA-Merlin/dataloader/issues/124)
Is this part of this RMP ticket?


## Core
- [x] NVIDIA-Merlin/core#238

## Dataloader
 - [x] NVIDIA-Merlin/dataloader#31
 - [x] NVIDIA-Merlin/dataloader#32
 - [x] NVIDIA-Merlin/dataloader#34
 - [ ] Modify the padding operator to only allow padding values of 0 (in conjunction with the changes to categorify)
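
The padding restriction in the last item could look roughly like the sketch below. It assumes that the categorify changes reserve index 0 for padding, which this ticket does not spell out, and `pad_sequences` is a hypothetical helper rather than the dataloader's actual operator.

```python
import numpy as np


def pad_sequences(sequences, max_len, padding_value=0):
    """Right-pad (or truncate) ragged ID lists to max_len using 0 only."""
    if padding_value != 0:
        # Assumption: index 0 is reserved for padding, so other values are rejected.
        raise ValueError("only a padding value of 0 is supported")
    out = np.zeros((len(sequences), max_len), dtype=np.int64)
    for row, seq in enumerate(sequences):
        trimmed = seq[:max_len]
        out[row, :len(trimmed)] = trimmed
    return out


padded = pad_sequences([[4, 2], [7, 1, 3, 9], []], max_len=3)
```

With a single reserved padding value, a padded position can always be mapped to the same embedding row (for example an all-zero vector) regardless of which column is being padded.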

## Transformers4Rec
These T4R features are not in scope for this RMP ticket; the development will happen in Models.
PR implementing pre-trained support in T4Rec: https://github.com/NVIDIA-Merlin/Transformers4Rec/pull/690
- [x] NVIDIA-Merlin/Transformers4Rec#682
- [x] NVIDIA-Merlin/Transformers4Rec#683
- [x] NVIDIA-Merlin/Transformers4Rec#684
- [x] NVIDIA-Merlin/Transformers4Rec#685
- [x] NVIDIA-Merlin/Transformers4Rec#500
- [x] https://github.com/NVIDIA-Merlin/Transformers4Rec/issues/485

## Models (TF API)
[PR #1083 implementing pre-trained support in MM](https://github.com/NVIDIA-Merlin/models/pull/1083)
- [x] NVIDIA-Merlin/models#1071
- [x] NVIDIA-Merlin/models#1068
- [x] NVIDIA-Merlin/models#1073
- [x] NVIDIA-Merlin/models#1070
- [x] NVIDIA-Merlin/models#1072


## Merlin Systems
- [x] NVIDIA-Merlin/systems#210

## Examples
- [x] NVIDIA-Merlin/Transformers4Rec#501
- [x] NVIDIA-Merlin/models#788
- [x] NVIDIA-Merlin/Merlin#886
- [x] https://github.com/NVIDIA-Merlin/NVTabular/pull/1827 (this ticket closes all NVT work)

## Documentation
- [x] #1000
- [ ] #1001
