
[BUG] Model's performance discrepancy between evaluation and inference modes #525

@sararb


Bug description

This bug summarizes the discussion in #506. The main issue is a discrepancy in scores between two ways of applying the same model: (1) with masking, where the last interaction is replaced by the [MASK] embedding, and (2) without the mask replacement, where the last interaction of the input sequence is instead explicitly set to 0 (to mimic the real-life inference scenario).

During training, a subset of positions in the input sequence is replaced by a special [MASK] embedding, and the model is trained to recover the original item ids from the corrupted sequence. During inference, we don't corrupt the data with [MASK]; instead, we apply the model to the original test input sequence. As a result, the scores differ: the model learned to rely on latent information carried by the [MASK] embedding, and that information is absent from the original test data.
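To make the two input variants concrete, here is a toy sketch in plain Python. It does not use the Transformers4Rec API: the embedding table, `MASK_EMB`, and the helpers `embed_masked_last` and `embed_zeroed_last` are hypothetical stand-ins for the item embedding table and the trainable [MASK] embedding. It only shows that the vector fed to the model at the prediction position differs between the two modes.

```python
from typing import List

Vector = List[float]

# Hypothetical toy item-embedding table; row 0 is the padding index
# (with padding_idx=0 the padding row is commonly kept at zeros).
ITEM_EMB: List[Vector] = [
    [0.0, 0.0],   # item 0 = padding
    [0.1, 0.2],   # item 1
    [0.3, -0.4],  # item 2
    [0.5, 0.6],   # item 3
]

# Hypothetical trainable [MASK] embedding learned during training.
MASK_EMB: Vector = [0.9, -0.7]


def embed(item_ids: List[int]) -> List[Vector]:
    """Look up the embedding of every item id in the sequence."""
    return [list(ITEM_EMB[i]) for i in item_ids]


def embed_masked_last(item_ids: List[int]) -> List[Vector]:
    """Evaluation mode: hide the last interaction behind the [MASK] embedding."""
    vectors = embed(item_ids)
    vectors[-1] = list(MASK_EMB)
    return vectors


def embed_zeroed_last(item_ids: List[int]) -> List[Vector]:
    """Inference-style mode: set the last item id to 0 (padding), no masking."""
    return embed(item_ids[:-1] + [0])


session = [1, 3, 2]  # the last item (2) is the target to predict
masked = embed_masked_last(session)
zeroed = embed_zeroed_last(session)

print(masked[-1])  # the model sees the learned [MASK] vector
print(zeroed[-1])  # the model sees the padding vector instead
print(masked[:-1] == zeroed[:-1])  # only the last position differs
```

Since the transformer layers were trained to attend to the [MASK] vector at the prediction position, swapping in the padding vector changes the hidden state there, which is one plausible explanation for the score gap reported in #506.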

Steps/Code to reproduce bug

  1. I summarized the code shared in the thread "[QST] Unable to replicate evaluation metrics when using ignore_masking" (#506) in this gist.
  2. The code can be added to the notebook 02-End-to-end-session-based-with-Yoochoose-PyT after cell 10 to reproduce the differing test scores.

Expected behavior

  • The performance scores should match between the training/evaluation mode, where the data is corrupted with the [MASK] embedding, and the test/inference mode, where masking is not applied to the original data.

Additional context

Some possible solutions to fix the discrepancy are:

  • Extend the test data with a dummy last interaction and use the model with ignore_masking=False, so that the last position is replaced with the [MASK] embedding.
  • Replace the trainable [MASK] embedding tensor with the embedding of the padding index 0.
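Both proposed fixes aim to make the vector at the prediction position identical between evaluation and inference. The plain-Python sketch below illustrates them on a toy embedding table; `DUMMY_ID`, `PAD_ID`, the mask vectors, and the helper names are hypothetical, and this is not the Transformers4Rec implementation. Fix 1 appends a dummy interaction and masks it, so the model sees [MASK] right after the last real item, as it does during training. Fix 2 ties the [MASK] embedding to the padding row, so masking the last position and setting its id to 0 produce the same input.

```python
from typing import List

Vector = List[float]

# Hypothetical toy item-embedding table; row 0 is the padding index.
ITEM_EMB: List[Vector] = [
    [0.0, 0.0],   # item 0 = padding
    [0.1, 0.2],   # item 1
    [0.3, -0.4],  # item 2
    [0.5, 0.6],   # item 3
]
PAD_ID = 0
DUMMY_ID = 0  # hypothetical placeholder; its embedding is overwritten by [MASK]


def embed(item_ids: List[int]) -> List[Vector]:
    """Look up the embedding of every item id in the sequence."""
    return [list(ITEM_EMB[i]) for i in item_ids]


def mask_last(vectors: List[Vector], mask_emb: Vector) -> List[Vector]:
    """Replace the last position with the [MASK] embedding (masking enabled)."""
    out = [list(v) for v in vectors]
    out[-1] = list(mask_emb)
    return out


# Fix 1: extend the test session with a dummy last interaction, then mask it.
learned_mask: Vector = [0.9, -0.7]  # hypothetical trainable [MASK] embedding
test_session = [1, 3, 2]  # full observed session; we want the next item
fix1_inputs = mask_last(embed(test_session + [DUMMY_ID]), learned_mask)

# Fix 2: tie the [MASK] embedding to the padding row instead of training it.
tied_mask: Vector = list(ITEM_EMB[PAD_ID])
session = [1, 3, 2]
masked = mask_last(embed(session), tied_mask)
zeroed = embed(session[:-1] + [PAD_ID])  # inference style: last id set to 0

print(fix1_inputs[-1])   # [MASK] sits right after the last real item
print(masked == zeroed)  # both modes now feed identical inputs
```

With fix 2 the model no longer has a distinct learned signal at the masked position; whether that affects accuracy would need to be measured, and this toy example does not show it.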

Metadata

Labels

P0, bug (Something isn't working)
