Skip to content

Fix TFTransfoXLLMHeadModel outputs#16590

Merged
ydshieh merged 1 commit intohuggingface:mainfrom
ydshieh:fix_tf_transfoXL_lm_output
Apr 6, 2022
Merged

Fix TFTransfoXLLMHeadModel outputs#16590
ydshieh merged 1 commit intohuggingface:mainfrom
ydshieh:fix_tf_transfoXL_lm_output

Conversation

@ydshieh
Copy link
Collaborator

@ydshieh ydshieh commented Apr 4, 2022

What does this PR do?

Fix the outputs of TFTransfoXLLMHeadModel (in the case without labels) - current TF returns softmax_output while PT returns prediction_scores:

Remarks:

  • The case with labels is much more complicated - to be addressed in the future.
  • The current PT/TF equivalence test has a bit flaw and doesn't detect this issue. A WIP PR Improve PT/TF equivalence test #16557 is on its way (with other enhancements)!

@ydshieh ydshieh changed the title fix Fix TFTransfoXLLMHeadModel outputs Apr 4, 2022
@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Apr 4, 2022

The documentation is not available anymore as the PR was closed or merged.

@ydshieh ydshieh requested review from LysandreJik, gante, patrickvonplaten and sgugger and removed request for LysandreJik April 4, 2022 16:40
pred_hid = last_hidden[:, -tgt_len:]

softmax_output = self.crit(pred_hid, labels, training=training)
prediction_scores = softmax_output if labels is None else ()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An empty tuple? Is that really what's expected? Seems weird.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done in PyTorch version. I am not sure about the reason why not to output prediction_scores (empty tuple).

softmax_output is returned by ProjectedAdaptiveLogSoftmax, which is quite complicated. In particular, when labels is passed to ProjectedAdaptiveLogSoftmax, the logic is very different from not passing labels.

Copy link
Collaborator Author

@ydshieh ydshieh Apr 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, there is one reason doing so in PyTorch version:

 softmax_output.view(bsz, tgt_len, -1)

This shape only works when labels is not used.
(There is truncation/shift when labels is passed)

See

if labels is not None:
# Shift so that tokens < n predict n
hidden = hidden[..., :-1, :].contiguous()

Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok to align the two APIs!

Copy link
Contributor

@gante gante left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, as it makes the two versions closer to each other

@ydshieh ydshieh merged commit 2aef4cf into huggingface:main Apr 6, 2022
@ydshieh ydshieh deleted the fix_tf_transfoXL_lm_output branch April 6, 2022 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants