BLIP: enable generation tests by zucchini-nlp · Pull Request #34174 · huggingface/transformers

zucchini-nlp · 2024-10-15T11:51:02Z

What does this PR do?

Enables generation tests for BLIP models, except BLIP-1 (which turned out to be a bit harder). I changed the generation tests to use modelTest.input_name, as BLIP is the only model that uses pixel values as its main input, so checking generated text lengths against it will always fail.

I tried to get rid of the custom generate for these models, but that opened a Pandora's box, so I think it's better not to waste time on an old model and to just maintain it for a while, until the model gets deprecated. Still, I made some changes so we don't need to add an extra bos at the beginning, and the decoder-based BLIP models now return the full text at output. Encoder-decoder based models return only the generated text, which is consistent with what an LLM should do.

HuggingFaceDocBuilderDev · 2024-10-15T12:42:30Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

gante · 2024-10-16T15:38:19Z

> changed the generation tests to use modelTest.input_name as BLIP is the only model that uses pixel values as main input and thus checking generated text length's will always fail.

I'd like very much to avoid this change -- extra logic for all tests to handle a niche corner case. Let's brainstorm alternatives! main_input in the generative tests is used to check the shapes. Perhaps we always want to look for input_ids or inputs_embeds in the input dictionary? 🤔

zucchini-nlp · 2024-10-16T16:02:01Z

@gante I tried to always force input_ids and I found another corner case with ~~Whisper which expects input_features to be the main input for shape checking~~. I would very much love to make BLIP standard and maybe I'll do so in v5, because it will break a whole lot of things.

OMG, I found an option while writing this reply: Whisper and the other audio model are encoder-decoder, so we can make it work by getting the main input in decoder-only models, just before the check happens, in the same indent block :)
If you agree, I'll make the change hehe
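
The idea above can be sketched roughly as follows. This is a minimal illustration using plain Python lists, not the actual test code in the repository: the helper name `expected_generated_length` and the `decoder_start_len` parameter are hypothetical, and the assumption that an encoder-decoder output begins with a single decoder start token is mine, not stated in the thread.

```python
def expected_generated_length(inputs, max_new_tokens, is_encoder_decoder, decoder_start_len=1):
    """Length a shape check could expect for each generated sequence (hypothetical helper)."""
    if is_encoder_decoder:
        # Encoder-decoder models (e.g. Whisper) generate on the decoder side, so the
        # encoder input (input_features) never counts toward the text length.
        # decoder_start_len=1 is an assumption for this sketch.
        return decoder_start_len + max_new_tokens
    # Decoder-only models echo the text prompt, so the check reads input_ids
    # rather than the model's main input (pixel_values for BLIP).
    prompt_len = len(inputs["input_ids"][0])
    return prompt_len + max_new_tokens


# Decoder-only vision-language style inputs: pixel values plus a 3-token text prompt.
vlm_inputs = {"pixel_values": [[0.0] * 8], "input_ids": [[1, 2, 3]]}
decoder_only_len = expected_generated_length(vlm_inputs, max_new_tokens=5, is_encoder_decoder=False)

# Encoder-decoder audio style inputs: no input_ids are needed for the check.
audio_inputs = {"input_features": [[0.0] * 80]}
encoder_decoder_len = expected_generated_length(audio_inputs, max_new_tokens=5, is_encoder_decoder=True)
```

Because the lookup of `input_ids` happens only in the decoder-only branch, models whose main input is not text are left untouched in the encoder-decoder branch.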

ArthurZucker

Ah, actually we might need/want to force return_dict to True, to avoid all the if/elses

gante · 2024-10-17T13:44:00Z

> OMG, I found an option while writing this reply, whisper and the other audio model are encoder-decoder so we can make it work by getting main input in decoder-only models. Just before the check happens, in the same indent block :)
> If you agree, I'll make the change hehe

if it works, sounds good! (make sure to leave a comment)

zucchini-nlp · 2024-10-21T11:03:32Z

@gante requesting re-review, since the input-name change was merged as a separate PR. I rebased on main and ran the tests again.

ArthurZucker

Thanks for cleaning up

ArthurZucker · 2024-10-29T10:25:21Z

src/transformers/models/blip_2/modeling_blip_2.py

-            logits = outputs.logits if return_dict else outputs[1]
+            loss = outputs.loss
+            logits = outputs.logits
+            outputs = outputs.to_tuple() if not return_dict else outputs


outputs will have loss and logits twice there

(this was probably already a bug)

Yep, in general I don't like that we return it like this and would rather return unwrapped LM outputs. But we probably can't just delete it, for BC reasons.
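
A rough sketch of the pattern under discussion, assuming a simplified wrapper: `ToyLMOutput`, `toy_language_model`, and `wrapper_forward` are hypothetical stand-ins and not the BLIP-2 implementation. It shows the inner model being called with dict-style output so fields are read by name, the tuple conversion happening once at the end, and how loss and logits end up in the result twice when the nested LM outputs are returned alongside the top-level values.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ToyLMOutput:
    """Hypothetical stand-in for a language model's output object."""
    loss: Optional[float]
    logits: Tuple[float, ...]

    def to_tuple(self):
        # Keep only the fields that are set, in declaration order.
        return tuple(v for v in (self.loss, self.logits) if v is not None)


def toy_language_model(return_dict=True):
    out = ToyLMOutput(loss=0.5, logits=(0.1, 0.9))
    return out if return_dict else out.to_tuple()


def wrapper_forward(return_dict=True):
    # Always request dict-style output from the inner model, so fields are read
    # by name instead of `outputs.logits if return_dict else outputs[1]`.
    outputs = toy_language_model(return_dict=True)
    loss, logits = outputs.loss, outputs.logits
    # Convert once, at the end, only if the caller asked for a tuple.
    lm_outputs = outputs if return_dict else outputs.to_tuple()
    # Top-level loss/logits plus the nested LM outputs: both values appear twice.
    return (loss, logits, lm_outputs)


as_tuple = wrapper_forward(return_dict=False)
as_dict = wrapper_forward(return_dict=True)
```

In this toy version the duplication is visible directly in `as_tuple`, which is the behavior the review comment points out and the reply explains is kept for backward compatibility.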

ArthurZucker · 2024-10-29T10:26:48Z

tests/models/instructblip/test_modeling_instructblip.py

+        models_without_standard_cache = (
+            "ctrl",
+            "fsmt",
+            "gptbigcode",
+            "mega",
+            "reformer",
+            "jamba",
+            "mamba",
+            "xlnet",
+            "zamba",
+        )
+        has_standard_cache = not any(
+            model_name in config.__class__.__name__.lower() for model_name in models_without_standard_cache
+        )
+        if has_standard_cache:


when we overwrite we don't need that no?

yeah, will remove unnecessary part
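
For reference, the check in the diff above can be isolated as a small function. This is a sketch under assumptions: the function name `has_standard_cache` is introduced here for illustration, and `InstructBlipConfig` and `MambaConfig` below are empty local stand-ins defined only for the demo, not imports from transformers.

```python
MODELS_WITHOUT_STANDARD_CACHE = (
    "ctrl", "fsmt", "gptbigcode", "mega", "reformer",
    "jamba", "mamba", "xlnet", "zamba",
)


def has_standard_cache(config):
    """True unless the config's class name contains one of the listed model names."""
    class_name = config.__class__.__name__.lower()
    return not any(name in class_name for name in MODELS_WITHOUT_STANDARD_CACHE)


class InstructBlipConfig:  # empty stand-in, for illustration only
    pass


class MambaConfig:  # empty stand-in, for illustration only
    pass


instructblip_ok = has_standard_cache(InstructBlipConfig())
mamba_ok = has_standard_cache(MambaConfig())
```

Inside a test that is overridden for a single model family, the class name is fixed, so the result of this check never varies there, which is the point of the review question.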

* blip2 tests
* instructblips
* copies
* fix slow tests
* fix
* uncomment this
* clean up after rebase
* should be model main input
* fix overwritten tests
* oops len should be multiple of frame number
* style
* fix some tests

zucchini-nlp added 4 commits October 15, 2024 12:52

blip2 tests

64873ab

instructblips

23d2e15

copies

8cf7507

Merge remote-tracking branch 'upstream/main' into blip-tests

b0ab34a

zucchini-nlp mentioned this pull request Oct 15, 2024

Track progress for VLMs refactoring #33374

Closed

16 tasks

fix slow tests

8dcb4fb

fix

8cfabfe

zucchini-nlp requested review from ArthurZucker and gante and removed request for gante October 15, 2024 12:50

uncomment this

f0aff4f

ArthurZucker reviewed Oct 16, 2024

zucchini-nlp added 5 commits October 21, 2024 12:27

merge main

87bfdb6

clean up after rebase

c39c5ed

should be model main input

37d25b1

fix overwritten tests

ce467a0

oops len should be multiple of frame number

95f6b76

zucchini-nlp added 2 commits October 25, 2024 10:35

Merge branch 'main' into blip-tests

adf7f40

style

63756d6

ArthurZucker approved these changes Oct 29, 2024

zucchini-nlp added 2 commits November 1, 2024 07:57

Merge branch 'main' into blip-tests

bfa947d

fix some tests

3fec510

zucchini-nlp merged commit 4cc0813 into huggingface:main Nov 1, 2024

zucchini-nlp merged 16 commits into huggingface:main from zucchini-nlp:blip-tests


4 participants
