[BUG] Model's performance discrepancy between evaluation and inference modes

### Bug description

This bug summarizes the discussion in thread #506. The main issue is a discrepancy between the scores obtained when the model is evaluated with masking, where the last interaction is replaced with the [MASK] embedding, and the scores obtained when the same model is applied without masking. In the latter case, the last interaction of the input sequence is explicitly set to `0` instead of being masked, to mimic the real-life inference scenario.

During training, a subset of positions in the input sequence is replaced by the special [MASK] embedding, and the model is trained to recover the original item ids from the corrupted sequence. During inference, we do not corrupt the data with [MASK]; instead, we apply the model to the original test input sequence. The scores therefore diverge: the model learned to rely on the [MASK] embedding, which never appears in the original test data.
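A toy sketch of the two input constructions, written in plain Python so it runs without PyTorch. The embedding values, the all-zero padding row (as `torch.nn.Embedding` initializes it when `padding_idx=0`), and the helper names `masked_eval_inputs` / `zeroed_inference_inputs` are hypothetical; the sketch only shows why the inputs differ, not how Transformers4Rec implements masking:

```python
# Toy illustration (hypothetical values): item-id embedding table in which
# id 0 is the padding id and its row is all zeros.
PAD_ID = 0
ITEM_EMB = {
    0: [0.0, 0.0],   # padding embedding
    1: [0.1, 0.3],
    2: [-0.4, 0.2],
    3: [0.5, 0.5],
}
# Stand-in for the trainable [MASK] embedding learned during training.
MASK_EMB = [0.7, -0.2]


def embed(item_ids):
    """Look up the embedding of every item id in the sequence."""
    return [list(ITEM_EMB[i]) for i in item_ids]


def masked_eval_inputs(item_ids):
    """Evaluation mode: the last position is replaced by the [MASK] embedding."""
    embs = embed(item_ids)
    embs[-1] = list(MASK_EMB)
    return embs


def zeroed_inference_inputs(item_ids):
    """Inference mode from #506: the last interaction is set to the padding id 0."""
    return embed(item_ids[:-1] + [PAD_ID])


session = [3, 1, 2]  # both modes hide item 2, the item to predict
eval_inputs = masked_eval_inputs(session)
infer_inputs = zeroed_inference_inputs(session)

# The first positions agree; only the vector at the target position differs.
print(eval_inputs[:-1] == infer_inputs[:-1])  # True
print(eval_inputs[-1], infer_inputs[-1])      # [0.7, -0.2] [0.0, 0.0]
```

Because every subsequent layer sees a different vector at the target position, the predictions, and therefore the metrics, can differ even though both modes hide the same item.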


### Steps/Code to reproduce bug

1. I summarized the code shared in thread #506 in this [gist](https://gist.github.com/sararb/7e61b286aa612a2d707a59832bd948ff#file-replicate_t4rec_eval_metrics-py).
2. The code can be added to the notebook `02-End-to-end-session-based-with-Yoochoose-PyT` after cell 10 to reproduce the differing test scores.

### Expected behavior
- The performance scores should match between the training/evaluation mode, where the data is corrupted with the [MASK] embedding, and the test/inference mode, where masking is not applied to the original data.

### Additional context
Possible solutions to fix the discrepancy:
- Extend the test data with a dummy last interaction and apply the model with `ignore_masking=False`, so that the last position is replaced with the [MASK] embedding.
- Replace the trainable [MASK] embedding tensor with the embedding of the padding index `0`.
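The two options above can be sketched with the same kind of toy embedding table, again in plain Python. All values, the all-zero padding row, the dummy id, and the helper `mask_last` are hypothetical, and the `ignore_masking` flag itself is not modeled; the sketch only checks which vectors would reach the transformer layers:

```python
# Toy illustration (hypothetical values): id 0 is the padding id and its
# embedding row is all zeros.
PAD_ID = 0
ITEM_EMB = {
    0: [0.0, 0.0],
    1: [0.1, 0.3],
    2: [-0.4, 0.2],
    3: [0.5, 0.5],
}
TRAINED_MASK_EMB = [0.7, -0.2]  # stand-in for the learned [MASK] embedding


def embed(item_ids):
    """Look up the embedding of every item id in the sequence."""
    return [list(ITEM_EMB[i]) for i in item_ids]


def mask_last(item_ids, mask_emb):
    """Replace the embedding of the last position with `mask_emb`."""
    embs = embed(item_ids)
    embs[-1] = list(mask_emb)
    return embs


observed = [3, 1]  # interactions known at inference time
target = 2         # next item, present only in the evaluation data

# Evaluation mode: the target position is masked with the trained [MASK] embedding.
eval_inputs = mask_last(observed + [target], TRAINED_MASK_EMB)

# Option 1: append a dummy last interaction and mask it the same way.
# A non-padding id is used in case the masking logic skips padded positions;
# its embedding is overwritten by [MASK] anyway.
DUMMY_ID = 1
option1_inputs = mask_last(observed + [DUMMY_ID], TRAINED_MASK_EMB)
print(option1_inputs == eval_inputs)  # True

# Option 2: use the padding embedding as the [MASK] embedding, so setting the
# last interaction to 0 produces exactly the masked input.
option2_eval = mask_last(observed + [target], ITEM_EMB[PAD_ID])
option2_infer = embed(observed + [PAD_ID])
print(option2_eval == option2_infer)  # True
```

Option 1 keeps the trained [MASK] embedding and changes only how the inference inputs are built; option 2 changes the model itself, so it would presumably require retraining.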