Change keepdim argument to False in loss function #9
Thanks for your attention and the great contribution! It is indeed a bug, and it seems to affect the pre-training procedure: it may add noise to the pre-training objectives. After merging this PR, I expect pre-training could be more efficient and converge faster, so the results may differ slightly from what we reported. I'll continue testing the performance after fixing this bug and will post an update later. Thanks again for the PR!
Thank you for your response. However, this modification will not affect the results: because the loss is averaged with mean at the end, it gives the same result as before. The change only reduces computation and memory consumption.
Thanks! I hope so, but while testing I found that if we compute [[a1], [a2], [a3]] / [b1, b2, b3], the result is the matrix [[a1/b1, a1/b2, a1/b3], [a2/b1, a2/b2, a2/b3], [a3/b1, a3/b2, a3/b3]], and only the elements on its diagonal are what we need. I think the other elements could be noise.
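A small NumPy sketch of the broadcasting described in the comment above, using made-up values for a and b (NumPy applies the same broadcasting rules as PyTorch). Whether the final mean hides the off-diagonal terms depends on what is applied element-wise before the mean, which this thread does not show. For a plain ratio, the full-matrix mean differs from the diagonal mean. For the log of the ratio, the two coincide, because log(a_i / b_j) = log(a_i) - log(b_j) separates into per-row and per-column terms.

```python
import numpy as np

# Made-up positive values standing in for the tensors in the discussion.
a = np.array([[1.0], [2.0], [3.0]])  # shape (3, 1), i.e. [[a1], [a2], [a3]]
b = np.array([2.0, 4.0, 8.0])        # shape (3,),   i.e. [b1, b2, b3]

# Broadcasting (3, 1) / (3,) gives a (3, 3) matrix with entry [i, j] = a_i / b_j.
ratios = a / b
print(ratios.shape)  # (3, 3)

# Only the diagonal holds the per-sample ratios a_i / b_i.
per_sample = a[:, 0] / b
assert np.allclose(np.diag(ratios), per_sample)

# Plain ratio: the full-matrix mean includes cross terms a_i / b_j (i != j)
# and generally differs from the mean over the diagonal.
print(ratios.mean(), per_sample.mean())

# Log of the ratio: log(a_i / b_j) = log(a_i) - log(b_j) is separable,
# so the full-matrix mean equals the diagonal mean.
print(np.log(ratios).mean(), np.log(per_sample).mean())
```

So whether the extra elements change the loss value depends on the exact loss formula, not on the broadcasting alone.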
In this commit, the `keepdim=True` argument has been removed from the loss function because it is unnecessary. In the original code, `neg_logits` has shape `(batch_size,)` and `pos_logits` has shape `(batch_size, 1)`, so dividing one by the other broadcasts to a result of shape `(batch_size, batch_size)`. With `keepdim=True` removed, `pos_logits` also has shape `(batch_size,)` and the division is element-wise.
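A minimal sketch of the shape change, assuming `pos_logits` comes from a reduction over the last axis; the actual computations of `pos_logits` and `neg_logits` are not shown in this PR, so `pos_scores` and the random inputs are hypothetical. NumPy spells the flag `keepdims`, where PyTorch uses `keepdim`.

```python
import numpy as np

batch_size, dim = 4, 8
rng = np.random.default_rng(0)

# Hypothetical inputs: stand-ins for whatever the real loss function computes.
pos_scores = rng.random((batch_size, dim))
neg_logits = rng.random(batch_size) + 1.0  # shape (batch_size,)

# Before the fix: reducing with keepdims=True leaves a trailing axis of size 1.
pos_before = pos_scores.sum(axis=1, keepdims=True)  # shape (batch_size, 1)
print((pos_before / neg_logits).shape)  # (4, 4): every pos_i paired with every neg_j

# After the fix: without keepdims the shapes match and the division is element-wise.
pos_after = pos_scores.sum(axis=1)  # shape (batch_size,)
print((pos_after / neg_logits).shape)  # (4,)
```

The element-wise result equals the diagonal of the broadcast matrix, so the fix keeps exactly the per-sample terms and drops the cross terms.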