[RMP] Support pre-trained vector embeddings as input features into a model via the dataloader #211

@karlhigley

Problem:

Customers need a way to feed embeddings that were pre-trained, or trained by separate models, into a model as input features.
See #471

Goal:

Enable the dataloader to load separate embedding tables, so that these embeddings do not have to be added to the interaction data during training. For serving, those embeddings need to be provided in the request to the model. The feature must be usable in a production setting.

Constraints:

  • External embedding tables may not fit on the GPU.
  • Embeddings are non-trainable.
  • Embedding tables fit in CPU memory; tables larger than CPU memory are left for potential future work.
  • Embeddings are not generated on the fly (future work).
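
To make the goal and constraints concrete, here is a minimal sketch in plain Python of the intended behaviour: a frozen embedding table is kept in CPU memory, separate from the interaction data, and vectors are looked up by ID only when a batch is assembled. This is not the Merlin dataloader API; every name here (`PRETRAINED_ITEM_EMBEDDINGS`, `lookup`, `batches_with_embeddings`, the `item_id` column, the zero-vector fallback for unknown IDs) is a hypothetical choice made for illustration.

```python
from typing import Dict, Iterator, List, Sequence

# Hypothetical pre-trained table: item ID -> fixed vector, held in CPU memory.
# It is never updated here, matching the "non-trainable" constraint.
PRETRAINED_ITEM_EMBEDDINGS: Dict[int, List[float]] = {
    10: [0.1, 0.2, 0.3],
    11: [0.4, 0.5, 0.6],
    12: [0.7, 0.8, 0.9],
}
EMBEDDING_DIM = 3


def lookup(table: Dict[int, List[float]], ids: Sequence[int], dim: int) -> List[List[float]]:
    """Return one vector per ID. Unknown IDs get a zero vector (an assumption)."""
    zero = [0.0] * dim
    return [list(table.get(i, zero)) for i in ids]


def batches_with_embeddings(
    interactions: Sequence[Dict[str, int]],
    table: Dict[int, List[float]],
    batch_size: int,
    dim: int,
    id_column: str = "item_id",
) -> Iterator[Dict[str, list]]:
    """Yield column-wise batches, attaching embeddings at batch time only."""
    for start in range(0, len(interactions), batch_size):
        rows = interactions[start:start + batch_size]
        # Turn the list of rows into a dict of columns.
        batch = {key: [row[key] for row in rows] for key in rows[0]}
        # The interaction data itself never stores the vectors.
        batch["item_embedding"] = lookup(table, batch[id_column], dim)
        yield batch


# Interaction data contains IDs only, no embedding columns.
interactions = [
    {"user_id": 1, "item_id": 10},
    {"user_id": 1, "item_id": 12},
    {"user_id": 2, "item_id": 11},
    {"user_id": 3, "item_id": 99},  # ID missing from the pre-trained table
]

batches = list(
    batches_with_embeddings(interactions, PRETRAINED_ITEM_EMBEDDINGS, 2, EMBEDDING_DIM)
)
```

Because only the current batch's vectors are materialised, the full table can stay in CPU memory and just the looked-up slice would need to move to the GPU. How unknown IDs should actually be handled is left open by this ticket.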

Supporting pre-trained vector embeddings as features would provide baseline support for multi-modal use cases that rely on outside models to generate image/text embeddings.

NVTabular

Core

Dataloader

Transformers4Rec

These T4Rec features are not in scope for this RMP ticket. The development will happen in Models.
PR implementing pre-trained support in T4Rec: NVIDIA-Merlin/Transformers4Rec#690

Models (TF API)

PR implementing pre-trained support in Merlin Models (MM): #1083

Merlin Systems

Examples

Documentation
