use scale=1.0 in floats_tensor called in speech model testers by ydshieh · Pull Request #17007 · huggingface/transformers

ydshieh · 2022-04-29T10:28:06Z

What does this PR do?

Fix the failure of Speech2TextModelTest.test_pt_tf_model_equivalence. This is caused by

transformers/tests/speech_to_text/test_modeling_speech_to_text.py

Lines 134 to 136 in e6f00a1

    input_features = floats_tensor(
        [self.batch_size, self.seq_length, self.input_feat_per_channel], self.vocab_size
    )

where input_features ends up with a large magnitude, on the order of 1e2 (from self.vocab_size=99).

(probably this happens because we just copied the input_ids = ids_tensor([self.batch_size, self.seq_length], self.vocab_size) from NLP models?)

I changed it to scale=1.0, but need @patrickvonplaten's expertise to make sure there was no particular reason to use self.vocab_size.

Details

Current speech model testers have

def prepare_config_and_inputs(self):
    input_values = floats_tensor([self.batch_size, self.seq_length], self.vocab_size)

The self.vocab_size argument is the scale, so the generated dummy input_values has the magnitude of self.vocab_size.
For Speech2TextModelTester, we have vocab_size=99.
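
To make the effect of the scale argument concrete, here is a minimal sketch. It does not use the real floats_tensor helper: floats_list below is a hypothetical pure-Python stand-in that assumes the helper draws uniform values in [0, 1) and multiplies them by scale. The shape values are illustrative as well.

```python
import random


def floats_list(shape, scale=1.0, rng=None):
    """Hypothetical stand-in for the test helper (an assumption, not the library code).

    Returns a flat list of uniform random values in [0, scale).
    """
    rng = rng or random.Random(0)
    total = 1
    for dim in shape:
        total *= dim
    return [rng.random() * scale for _ in range(total)]


# Illustrative tester values; vocab_size=99 matches Speech2TextModelTester.
batch_size, seq_length, vocab_size = 2, 8, 99

# Before the fix: vocab_size is passed positionally and is taken as the scale.
old_inputs = floats_list([batch_size, seq_length], vocab_size)

# After the fix: the scale is explicit and equal to 1.0.
new_inputs = floats_list([batch_size, seq_length], scale=1.0)

print("max before fix:", max(old_inputs))
print("max after fix: ", max(new_inputs))
```

Because the second positional argument is the scale, passing self.vocab_size there silently stretches the dummy inputs by a factor of 99.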

Furthermore, Speech2TextEncoder has

transformers/src/transformers/models/speech_to_text/modeling_speech_to_text.py

Line 705 in e6f00a1

self.embed_scale = math.sqrt(embed_dim) if config.scale_embedding else 1.0

and from the tester's hidden_size=16, we get embed_scale=4.

The input_features go through the conv layer(s) and are then scaled:

transformers/src/transformers/models/speech_to_text/modeling_speech_to_text.py

Lines 767 to 768 in e6f00a1

    inputs_embeds = self.conv(input_features)
    inputs_embeds = self.embed_scale * inputs_embeds

On CPU, however, the PT and TF conv layers give outputs that differ by about 1e-7 when the input values have magnitude 1. With the two scalings above (inputs of magnitude ~1e2 and embed_scale=4), this difference grows to about 4e-5, and the PT/TF equivalence test fails.
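
The arithmetic behind the 4e-5 figure can be sketched as follows. This is a back-of-the-envelope estimate, assuming the PT/TF difference grows linearly with the input magnitude (roughly true for a conv layer, which is linear apart from its bias); the variable names are illustrative and not taken from the test suite.

```python
import math

# Observed PT/TF conv difference for inputs of magnitude ~1 (from the PR description).
base_diff = 1e-7

# Scaling 1: dummy inputs generated with scale=vocab_size instead of 1.0.
input_scale = 99

# Scaling 2: embed_scale = sqrt(embed_dim), with the tester's hidden_size=16.
hidden_size = 16
embed_scale = math.sqrt(hidden_size)

# Assumption: the difference is amplified linearly by both factors.
amplified_diff = base_diff * input_scale * embed_scale

print(f"embed_scale    = {embed_scale}")
print(f"amplified diff = {amplified_diff:.2e}")

# With scale=1.0 only the embed_scale factor remains.
fixed_diff = base_diff * 1.0 * embed_scale
print(f"diff after fix = {fixed_diff:.2e}")
```

Under these assumptions, switching to scale=1.0 reduces the expected difference by about two orders of magnitude.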

HuggingFaceDocBuilderDev · 2022-04-29T10:44:41Z

The documentation is not available anymore as the PR was closed or merged.

patrickvonplaten · 2022-04-29T11:15:17Z

tests/data2vec/test_modeling_data2vec_audio.py


    def prepare_config_and_inputs(self):
-        input_values = floats_tensor([self.batch_size, self.seq_length], self.vocab_size)
+        input_values = floats_tensor([self.batch_size, self.seq_length], scale=1.0)


wow good catch!

patrickvonplaten

You're 100% right - this was indeed a bad copy paste!

patrickvonplaten · 2022-04-29T11:16:23Z

Thanks for fixing all the tests!

sgugger

Nice fix! Thanks a lot!

…gface#17007) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

use scale=1.0 in floats_tensor called in speech model testers

9a22702

ydshieh requested review from patrickvonplaten and sgugger April 29, 2022 10:28

patrickvonplaten reviewed Apr 29, 2022


patrickvonplaten approved these changes Apr 29, 2022


sgugger approved these changes Apr 29, 2022


ydshieh merged commit e952e04 into huggingface:main Apr 29, 2022

ydshieh deleted the fix_speech_to_text_ci_failure branch April 29, 2022 12:41

stevhliu pushed a commit to stevhliu/transformers that referenced this pull request May 3, 2022

use scale=1.0 in floats_tensor called in speech model testers (huggin…

db85031

…gface#17007) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

elusenji pushed a commit to elusenji/transformers that referenced this pull request Jun 12, 2022

use scale=1.0 in floats_tensor called in speech model testers (huggin…

30034dc

…gface#17007) Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
