Fix TFTransfoXLLMHeadModel outputs by ydshieh · Pull Request #16590 · huggingface/transformers

ydshieh · 2022-04-04T16:21:39Z

What does this PR do?

Fix the outputs of TFTransfoXLLMHeadModel (in the case without labels): TF currently returns softmax_output, while PT returns prediction_scores:

Current PT

transformers/src/transformers/models/transfo_xl/modeling_transfo_xl.py

Line 1119 in 6f9d8dc

    prediction_scores = softmax_output.view(bsz, tgt_len, -1) if labels is None else ()

and

transformers/src/transformers/models/transfo_xl/modeling_transfo_xl.py

Lines 1138 to 1140 in 6f9d8dc

    return TransfoXLLMHeadModelOutput(
        loss=loss,
        prediction_scores=prediction_scores,

Current TF

transformers/src/transformers/models/transfo_xl/modeling_tf_transfo_xl.py

Lines 1005 to 1006 in 6f9d8dc

    return TFTransfoXLLMHeadModelOutput(
        prediction_scores=softmax_output,

Remarks:

The case with labels is much more complicated - to be addressed in the future.
The current PT/TF equivalence test has a small flaw and doesn't detect this issue. A WIP PR, "Improve PT/TF equivalence test" (#16557), is on its way (with other enhancements)!
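To make the change concrete, here is a minimal sketch of the output selection before and after this PR. `LMHeadOutput`, `build_output_before` and `build_output_after` are hypothetical names used only for illustration; they are not part of transformers, and the real model classes carry more fields than shown here.

```python
from typing import Any, NamedTuple, Optional


class LMHeadOutput(NamedTuple):
    """Hypothetical stand-in for TFTransfoXLLMHeadModelOutput (illustration only)."""

    prediction_scores: Any


def build_output_before(softmax_output: Any, labels: Optional[Any]) -> LMHeadOutput:
    # TF behaviour before this PR: the criterion's return value is exposed
    # as prediction_scores whether or not labels were passed.
    return LMHeadOutput(prediction_scores=softmax_output)


def build_output_after(softmax_output: Any, labels: Optional[Any]) -> LMHeadOutput:
    # TF behaviour after this PR, mirroring the PT line quoted above:
    # scores are returned only when labels is None, otherwise an empty tuple.
    prediction_scores = softmax_output if labels is None else ()
    return LMHeadOutput(prediction_scores=prediction_scores)
```

With labels set to None both helpers return the same object; they differ only when labels are passed, where the "after" version returns an empty tuple.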

HuggingFaceDocBuilderDev · 2022-04-04T16:38:22Z

The documentation is not available anymore as the PR was closed or merged.

sgugger · 2022-04-04T17:25:46Z

src/transformers/models/transfo_xl/modeling_tf_transfo_xl.py

        pred_hid = last_hidden[:, -tgt_len:]
        softmax_output = self.crit(pred_hid, labels, training=training)
+        prediction_scores = softmax_output if labels is None else ()

An empty tuple? Is that really what's expected? Seems weird.

This is what the PyTorch version does. I am not sure why it doesn't output prediction_scores (and returns an empty tuple instead).

softmax_output is returned by ProjectedAdaptiveLogSoftmax, which is quite complicated. In particular, when labels is passed to ProjectedAdaptiveLogSoftmax, the logic is very different from not passing labels.

Oh, there is one reason for doing so in the PyTorch version:

softmax_output.view(bsz, tgt_len, -1)

This shape only works when labels is not used.
(There is truncation/shift when labels is passed)

See

transformers/src/transformers/models/transfo_xl/modeling_transfo_xl_utilities.py

Lines 98 to 100 in 6f9d8dc

    if labels is not None:
        # Shift so that tokens < n predict n
        hidden = hidden[..., :-1, :].contiguous()
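The shape argument can be illustrated with a toy example in plain Python. This is only a sketch: `toy_criterion` is a hypothetical stand-in that applies an ordinary log-softmax, not the real ProjectedAdaptiveLogSoftmax, and `view_rows` merely imitates the row-count check behind `.view(bsz, tgt_len, -1)`. It assumes, as the snippet above suggests, that the last position is dropped when labels are passed.

```python
import math


def toy_criterion(logits, labels=None):
    """Toy stand-in for the criterion; logits is a [bsz][tgt_len][vocab] nested list."""

    def log_softmax(row):
        peak = max(row)
        lse = peak + math.log(sum(math.exp(x - peak) for x in row))
        return [x - lse for x in row]

    if labels is None:
        # One row of log-probabilities per position: bsz * tgt_len rows.
        return [log_softmax(row) for seq in logits for row in seq]

    # Shift so that tokens < n predict n: drop the last position and the
    # first label, leaving bsz * (tgt_len - 1) per-token losses.
    losses = []
    for seq, seq_labels in zip(logits, labels):
        for row, target in zip(seq[:-1], seq_labels[1:]):
            losses.append(-log_softmax(row)[target])
    return losses


def view_rows(flat, bsz, tgt_len):
    """Imitate `.view(bsz, tgt_len, -1)` on the leading dimension of a flat list."""
    if len(flat) != bsz * tgt_len:
        raise ValueError(f"cannot view {len(flat)} rows as ({bsz}, {tgt_len}, -1)")
    return [flat[b * tgt_len:(b + 1) * tgt_len] for b in range(bsz)]
```

For bsz = 2 and tgt_len = 5, the unlabeled call yields 10 rows, which regroup cleanly into 2 sequences of 5 positions. The labeled call yields only 8 values, so the same regrouping raises an error, which is consistent with the PyTorch code returning an empty tuple in that case.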

sgugger

Ok to align the two APIs!

gante

LGTM, as it makes the two versions closer to each other

fix

34bb86d

ydshieh changed the title ~~fix~~ Fix TFTransfoXLLMHeadModel outputs Apr 4, 2022

ydshieh requested review from LysandreJik, gante, patrickvonplaten and sgugger and removed request for LysandreJik April 4, 2022 16:40

sgugger reviewed Apr 4, 2022

sgugger approved these changes Apr 4, 2022

gante approved these changes Apr 4, 2022

patrickvonplaten approved these changes Apr 6, 2022

ydshieh merged commit 2aef4cf into huggingface:main Apr 6, 2022

ydshieh deleted the fix_tf_transfoXL_lm_output branch April 6, 2022 13:42
