Generate tests: modality-agnostic input preparation#33685

Merged
gante merged 16 commits into huggingface:main from gante:get_input_ids_and_config on Oct 3, 2024
Conversation

@gante (Contributor) commented Sep 24, 2024

What does this PR do?

Requirement for #33212
Follow-up to #33663

This PR rewrites the LLM-centric `_get_input_ids_and_config()`, the function that creates random model inputs for tests, into a modality-agnostic `prepare_config_and_inputs_for_generate()`.

The rest of the diff consists of propagating the change. Highlights:

  1. most `_get_input_ids_and_config()` overwrites were deleted as a result of the changes 🔪
  2. most test `generate` calls receive a dictionary of inputs, as opposed to `input_ids`
  3. `_check_outputs` now receives the model's main input, as opposed to `input_ids`
  4. because of the changes above, a few test overwrites that needed to be updated could be deleted instead 🙏

In a follow-up PR: hunt tests that no longer need to be skipped/overwritten as a result of these changes 🎯
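The idea described above can be sketched as follows. This is a hypothetical, simplified illustration (the `DummyTester` class and list-based "tensors" are stand-ins; real testers return torch tensors and the actual helper lives in the transformers test suite), showing the shape of the change: the tester returns the config plus a full inputs dict rather than a `(config, input_ids, attention_mask)` tuple, so the same code path works for any modality.

```python
# Illustrative sketch only, not the real transformers implementation.
def prepare_config_and_inputs_for_generate(tester, batch_size=2):
    # Reuse the tester's common input preparation, then trim to `batch_size`
    # so generation tests stay fast.
    config, inputs_dict = tester.prepare_config_and_inputs_for_common()
    trimmed = {
        k: (v[:batch_size] if isinstance(v, list) else v)
        for k, v in inputs_dict.items()
    }
    return config, trimmed


class DummyTester:
    """Stand-in for a model tester; real testers build torch tensors."""

    def prepare_config_and_inputs_for_common(self):
        config = {"vocab_size": 10}
        inputs = {
            "input_ids": [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
            "attention_mask": [[1, 1, 1], [1, 1, 1], [1, 1, 1]],
        }
        return config, inputs


config, inputs = prepare_config_and_inputs_for_generate(DummyTester())
print(sorted(inputs))            # ['attention_mask', 'input_ids']
print(len(inputs["input_ids"]))  # 2
```

A test can then pass `**inputs` straight to `generate`, with no modality-specific unpacking.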


Comment on lines +2015 to +2017
# in some models we subsample the sequence length in inner layers
if hasattr(self.model_tester, "get_subsampled_output_lengths"):
    seq_length = self.model_tester.get_subsampled_output_lengths(seq_length)
@gante (author):
Some models were overwriting `_check_outputs` to apply subsampling to the sequence length. Since this was the single pattern shared by those overwrites, and overloading couldn't be applied here (it changes the internals), I've decided to move the pattern here.

input_mask = None
if self.use_input_mask:
-    input_mask = torch.tril(torch.ones(self.batch_size, self.seq_length)).to(torch_device)
+    input_mask = torch.tril(torch.ones_like(input_ids).to(torch_device))
@gante commented Sep 25, 2024:

Common pattern: `input_mask`, which was then passed around as `attention_mask`, was a `torch.float32` instead of a `torch.long` 👀
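A minimal sketch of the dtype issue in question: `torch.ones(b, s)` defaults to `float32`, while `torch.ones_like(input_ids)` inherits the integer dtype of the token ids, which is what an attention mask should carry. The variable names below are illustrative, not taken from the diff.

```python
import torch

input_ids = torch.tensor([[1, 2, 3], [4, 5, 6]])  # dtype: torch.int64

old_mask = torch.tril(torch.ones(2, 3))            # float32 mask (the old pattern)
new_mask = torch.tril(torch.ones_like(input_ids))  # long mask (the fixed pattern)

print(old_mask.dtype)  # torch.float32
print(new_mask.dtype)  # torch.int64
```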

pad_token_id=99,
bos_token_id=99,
num_codebooks=4,
audio_channels=1,
@gante commented Sep 25, 2024:

`audio_channels=1` is the default in the config. In other words, this doesn't change the tests, but it allows us to quickly override the config when needed (see the overloaded test below).

lm_heads = model.get_output_embeddings()
self.assertTrue(lm_heads is None or isinstance(lm_heads[0], torch.nn.Linear))

def _get_input_ids_and_config(self, batch_size=2):
@gante (author):

(all this deleted code corresponds to overwritten functions that no longer need to be overwritten)

Member:

love it!

Member:

That's very nice

@gante gante marked this pull request as ready for review September 25, 2024 14:34
@zucchini-nlp (Member) left a comment:

Awesome! So much code cleaned up, thanks! 💓

Overall looks good to me, just a few questions for my general understanding. I see some VLMs are failing the CI. I remember skipping one of the beam search tests for VLMs earlier, so it's probably that. But let me know if you want me to look at it :)

for model_class in self.all_generative_model_classes:
-    config, input_ids, attention_mask, inputs_dict = self._get_input_ids_and_config()
+    config, inputs_dict = self.prepare_config_and_inputs_for_generate()
+    main_input = inputs_dict[self.input_name]
Member:

Wondering what happens if we use `model_cls.main_input_name`? AFAIR from a few months ago, there were some inconsistencies in how a model's main input is defined, and we could do another round of cleanup on that, because `main_input_name` is also used in `generate()`. Maybe we can have a more generalized interface and testing suite?

@gante (author):

I agree we should use the model's main_input_name. I will make the change and see what breaks 🤞

If all tests pass, I'll update it. Otherwise I'll add a TODO for us :)

@gante (author):

Not only did it work, it also allowed us to remove the `input_name` attribute from all testers 💛
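The pattern being adopted can be sketched like this. `main_input_name` is a real class attribute on transformers models, but the tiny classes and input values below are hypothetical stand-ins, used only to show why a class-level attribute lets the test suite fetch the main input generically, without each tester declaring its own `input_name`.

```python
# Hypothetical model classes; real ones inherit from PreTrainedModel.
class TextModel:
    main_input_name = "input_ids"


class SpeechModel:
    main_input_name = "input_features"


def get_main_input(model_class, inputs_dict):
    # One generic lookup replaces per-tester `input_name` attributes.
    return inputs_dict[model_class.main_input_name]


inputs = {"input_ids": [1, 2, 3], "input_features": [0.1, 0.2]}
print(get_main_input(TextModel, inputs))    # [1, 2, 3]
print(get_main_input(SpeechModel, inputs))  # [0.1, 0.2]
```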


config.forced_eos_token_id = None
return config, input_ids, attention_mask, inputs_dict
original_sequence_length = self.model_tester.seq_length
self.model_tester.seq_length = 16
Member:

nice to see a cleaner way

@LysandreJik (Member) left a comment:

Very welcome and clean changes! Nice to see fewer tests being overwritten 👌

@ArthurZucker (Collaborator) left a comment:

Very welcome.

IMO we should have:

  • a dict of common forward input names, with default values that range from 0 to the small model's vocab size
  • a variant for VLMs
  • a variant for image-only models
  • a variant for audio

And then you just take them using inspect to inspect the forward pass.

Now for testing padding and the rest, of course you need something else, but this way people don't have to do this ever again:

def prepare_image():
    img_url = "https://huggingface.co/ybelkada/segment-anything/resolve/main/assets/car.png"
    raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
    return raw_image


def prepare_dog_img():
    img_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/model_doc/dog-sam.png"
    raw_image = Image.open(requests.get(img_url, stream=True).raw).convert("RGB")
    return raw_image
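The inspect-based suggestion above could look something like this. This is a toy sketch, not transformers code: the `COMMON_INPUTS` dict, helper, and forward functions are all hypothetical, illustrating how one shared dict of default inputs could be filtered down to whatever a given model's forward signature accepts.

```python
import inspect

# One shared dict of common default inputs (values are toy placeholders).
COMMON_INPUTS = {
    "input_ids": [[1, 2], [3, 4]],
    "attention_mask": [[1, 1], [1, 1]],
    "pixel_values": [[[0.0]]],
    "input_features": [[0.5]],
}


def select_inputs(forward_fn):
    # Keep only the inputs that the forward signature actually accepts.
    params = inspect.signature(forward_fn).parameters
    return {k: v for k, v in COMMON_INPUTS.items() if k in params}


# Toy stand-ins for model forward passes.
def text_forward(input_ids=None, attention_mask=None): ...
def vision_forward(pixel_values=None): ...


print(sorted(select_inputs(text_forward)))    # ['attention_mask', 'input_ids']
print(sorted(select_inputs(vision_forward)))  # ['pixel_values']
```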

@gante commented Oct 3, 2024:

@ArthurZucker that is a cool idea: model-agnostic inputs, potentially dependent on modality (or perhaps simply looking at the signature of the forward pass?)

I took note of it to explore after the current round of refactors, to avoid adding more parallel threads :D

@gante gante merged commit d29738f into huggingface:main Oct 3, 2024
@gante gante deleted the get_input_ids_and_config branch October 3, 2024 13:01
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024