Fix TFTransfoXLLMHeadModel outputs #16590
Conversation
The documentation is not available anymore as the PR was closed or merged.
```python
pred_hid = last_hidden[:, -tgt_len:]
...
softmax_output = self.crit(pred_hid, labels, training=training)
prediction_scores = softmax_output if labels is None else ()
```
An empty tuple? Is that really what's expected? Seems weird.
This is done in the PyTorch version. I am not sure of the reason for not outputting `prediction_scores` (returning an empty tuple instead).
`softmax_output` is returned by `ProjectedAdaptiveLogSoftmax`, which is quite complicated. In particular, when `labels` is passed to `ProjectedAdaptiveLogSoftmax`, the logic is very different from when it is not.
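The output selection discussed above can be sketched in isolation. This is a minimal sketch with NumPy standing in for TF tensors; `lm_head_outputs`, the vocabulary size, and the exact loss shape are illustrative assumptions, not the real model code:

```python
import numpy as np

def lm_head_outputs(pred_hid, labels=None, vocab_size=10):
    """Mimic the head's output selection (illustrative sketch).

    Without labels the adaptive softmax yields per-token prediction
    scores; with labels it yields per-token losses instead, so
    prediction_scores becomes an empty tuple, as in the PyTorch model.
    """
    bsz, tgt_len, _ = pred_hid.shape
    if labels is None:
        # stand-in for log-probs of shape (bsz, tgt_len, vocab_size)
        softmax_output = np.zeros((bsz, tgt_len, vocab_size))
    else:
        # stand-in for per-token losses (shifted targets, hence tgt_len - 1;
        # this shape is an assumption for illustration)
        softmax_output = np.zeros((bsz, tgt_len - 1))
    prediction_scores = softmax_output if labels is None else ()
    return prediction_scores, softmax_output

scores, _ = lm_head_outputs(np.zeros((2, 5, 8)))
print(scores.shape)          # (2, 5, 10)
scores, losses = lm_head_outputs(np.zeros((2, 5, 8)), labels=np.zeros((2, 5)))
print(scores, losses.shape)  # () (2, 4)
```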
Oh, there is one reason for doing so in the PyTorch version: `softmax_output.view(bsz, tgt_len, -1)`. This shape only works when `labels` is not used (there is truncation/shift when `labels` is passed).
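A quick shape check illustrates the point above. NumPy is used for illustration, and the `(tgt_len - 1)` losses-with-labels shape is an assumption about the truncation/shift, not taken from the real module:

```python
import numpy as np

bsz, tgt_len, vocab = 2, 5, 10

# Without labels: flat log-probs for every position reshape cleanly.
no_labels = np.zeros((bsz * tgt_len, vocab))
print(no_labels.reshape(bsz, tgt_len, -1).shape)  # (2, 5, 10)

# With labels: one loss per shifted target, i.e. bsz * (tgt_len - 1)
# elements (assumed shape), so the same .view/.reshape cannot work.
with_labels = np.zeros((bsz * (tgt_len - 1),))
try:
    with_labels.reshape(bsz, tgt_len, -1)
except ValueError as e:
    print("reshape fails:", e)
```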
See
sgugger
left a comment
Ok to align the two APIs!
gante
left a comment
LGTM, as it makes the two versions closer to each other
What does this PR do?
Fix the outputs of `TFTransfoXLLMHeadModel` (in the case without `labels`): the current TF version returns `softmax_output` while PT returns `prediction_scores`.
Current PT
transformers/src/transformers/models/transfo_xl/modeling_transfo_xl.py
Line 1119 in 6f9d8dc
and
transformers/src/transformers/models/transfo_xl/modeling_transfo_xl.py
Lines 1138 to 1140 in 6f9d8dc
Current TF
transformers/src/transformers/models/transfo_xl/modeling_tf_transfo_xl.py
Lines 1005 to 1006 in 6f9d8dc
Remarks:
The case with `labels` is much more complicated; to be addressed in the future.