Bug description

This bug is a summary of the thread discussion in #506. The main issue is a discrepancy in scores between applying the model with masking, where the last interaction is replaced by the [MASK] embedding, and applying the same model without the masking replacement, where the last interaction of the input sequence is instead explicitly set to 0 (to mimic the real-life inference scenario).

Given an input sequence, a subset of positions is replaced by the special [MASK] embedding, and the model is trained to recover the original item-ids from the corrupted input sequence. During inference, we do not rely on data corruption (using [MASK]); instead, we apply the model to the original test input sequence. Hence we observe a discrepancy in scores, because the model learned latent information from the [MASK] embedding that is not present in the original test data.
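The mismatch between the two input pipelines can be illustrated with a minimal PyTorch sketch. All sizes, item-ids, and variable names below are illustrative assumptions, not Transformers4Rec internals: during training/evaluation the last position carries a trainable [MASK] embedding, while in the inference scenario from this issue it carries the embedding of item-id 0.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes; Transformers4Rec's actual modules are not used here.
n_items, d_model = 100, 16

item_emb = torch.nn.Embedding(n_items, d_model, padding_idx=0)
mask_emb = torch.nn.Parameter(torch.randn(d_model))  # trainable [MASK] embedding

item_ids = torch.tensor([[5, 12, 7, 42, 3, 9]])      # one session (illustrative)
inputs = item_emb(item_ids)                          # (1, seq_len, d_model)

# Training/evaluation path: the last position is corrupted with the [MASK]
# embedding and the model is trained to recover the original item-id there.
masked_inputs = inputs.clone()
masked_inputs[:, -1, :] = mask_emb

# Inference path as described in the issue: no [MASK] replacement; the last
# interaction is instead explicitly set to item-id 0 (the padding embedding).
infer_ids = item_ids.clone()
infer_ids[:, -1] = 0
infer_inputs = item_emb(infer_ids)

# The two input tensors differ at the last position, so any downstream
# transformer will produce different scores -> the reported discrepancy.
print(torch.allclose(masked_inputs[:, -1], infer_inputs[:, -1]))  # False
```

Everything before the last position is identical in both paths; the discrepancy comes entirely from the vector occupying the slot whose item the model is asked to predict.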
Steps/Code to reproduce bug

The code can be added to the notebook 02-End-to-end-session-based-with-Yoochoose-PyT after cell 10 to get the differing test scores.
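A generic sketch of the comparison performed in the notebook, using a stand-in toy encoder rather than the trained Transformers4Rec model (the encoder, head, and all sizes here are assumptions for illustration only): score the same session once with the [MASK] replacement and once with the last interaction zeroed out, then compare.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for the notebook's trained model: a tiny transformer
# encoder plus an item-prediction head. Names and sizes are illustrative.
d_model, n_items = 16, 100
encoder = torch.nn.TransformerEncoder(
    torch.nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True),
    num_layers=1,
)
head = torch.nn.Linear(d_model, n_items)
item_emb = torch.nn.Embedding(n_items, d_model, padding_idx=0)
mask_emb = torch.nn.Parameter(torch.randn(d_model))
encoder.eval()

ids = torch.tensor([[5, 12, 7, 42, 3, 9]])

def scores(inputs: torch.Tensor) -> torch.Tensor:
    # Score the next item from the hidden state of the last position.
    return head(encoder(inputs))[:, -1]

# Evaluation path: last interaction replaced by the [MASK] embedding.
masked = item_emb(ids).clone()
masked[:, -1] = mask_emb

# Inference path: last interaction set to item-id 0, no masking applied.
zeroed_ids = ids.clone()
zeroed_ids[:, -1] = 0

with torch.no_grad():
    eval_scores = scores(masked)
    test_scores = scores(item_emb(zeroed_ids))

# The scores (and therefore the ranking metrics) disagree between the paths.
print(f"max score difference: {(eval_scores - test_scores).abs().max().item():.3f}")
```

With the real trained model the same structure applies: the only difference between the two evaluations is the vector placed at the last position, yet it is enough to shift the metrics.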
Expected behavior
The performance scores should match between the training/evaluation mode, where the data is corrupted with the [MASK] embedding, and the test/inference mode, where masking is not applied to the original data.
Additional context
Some possible solutions to fix the discrepancy are:
Extend the test data with a dummy last interaction and use the model with ignore_masking=False to replace the last position with the [MASK] embedding.
Replace the trainable [MASK] embedding tensor with the embedding of padding index 0.
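A rough sketch of how the two suggested fixes could look in plain PyTorch. Names, shapes, and the dummy-item value are assumptions for illustration; this is not the Transformers4Rec implementation:

```python
import torch

torch.manual_seed(0)

n_items, d_model = 100, 16
item_emb = torch.nn.Embedding(n_items, d_model, padding_idx=0)

# Fix 2 (sketch): instead of a separately trained [MASK] tensor, reuse the
# embedding of padding index 0, so evaluation and inference corrupt the last
# position with the exact same vector.
mask_emb = item_emb.weight[0].detach()

ids = torch.tensor([[5, 12, 7, 42, 3]])  # a test session (illustrative)

# Fix 1 (sketch): append a dummy interaction to the test sequence and let the
# masking step replace that appended position with the [MASK] embedding, as
# ignore_masking=False would do during evaluation.
dummy = torch.zeros(1, 1, dtype=torch.long)
extended_ids = torch.cat([ids, dummy], dim=1)
inputs = item_emb(extended_ids)
inputs[:, -1] = mask_emb  # with [MASK] == padding embedding, both paths agree

print(inputs[:, -1].abs().sum().item())  # padding embedding is all zeros -> 0.0
```

With either fix, the vector occupying the to-be-predicted position is identical in the evaluation and inference paths, which is what removes the score discrepancy.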