Fix non FA2 tests after FA2 installed in CI docker image #40430
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
model = Glm4vForConditionalGeneration.from_pretrained(
    "THUDM/GLM-4.1V-9B-Thinking", dtype=torch.float16, device_map="auto"
)
questions = ["Describe this video."] * 2
Don't use batch 2, otherwise it OOMs. It's fine to simply test batch 1 here; the goal is to check that video works.
Hmm, we have the same for images at
transformers/tests/models/glm4v/test_modeling_glm4v.py
Lines 314 to 370 in 8828b2e
Would it be possible to have something similar for videos? Fine with this as well, though; just not sure whether batching videos could hit different issues than batching images.
I will just leave it as it is. I don't have enough bandwidth, and it never passed anyway.
model_id = "mistralai/Mistral-7B-v0.1"
EXPECTED_COMPLETIONS = [
    "This is a nice place. This is a nice place. This is a nice place. This is",
    "scenery, scenery, scenery, scenery, scenery,",
Due to the 800 --> 682 change below.
if attn_implementation in ["flex_attention", "eager"]:
    input_text = input_text[:1]
eager still OOMs with 682, so just make it batch size 1.
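For reference, the batch-size reduction being discussed can be sketched roughly like this; the prompt strings and the `attn_implementation` value below are illustrative placeholders, not the actual test inputs:

```python
# Rough sketch of the discussed workaround: for attention implementations that
# OOM at the longer sequence length, drop to batch size 1.
input_text = [
    "This is a nice place.",          # first prompt (kept)
    "The scenery here is stunning.",  # second prompt (dropped for eager/flex)
]

attn_implementation = "eager"  # assumed value, for illustration only
if attn_implementation in ["flex_attention", "eager"]:
    # These implementations run out of memory at length 682, so keep only
    # the first prompt (batch size 1).
    input_text = input_text[:1]
```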
@pytest.mark.flash_attn_test
def test_model_600m_long_prompt(self):
    EXPECTED_OUTPUT_TOKEN_IDS = [306, 338]
    EXPECTED_OUTPUT_TOKEN_IDS = [198, 198]
This test never ran before. The sdpa version of this test already uses 198.
Maybe move it into the same test and parametrize instead? Looks like another parity check between sdpa and flash that was hidden.
Look at the two tests: one does more work than the other, and the loading is also different (4-bit or not). I will simply keep them as they are.
I see, makes sense, I didn't look in detail myself ^^
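For reference, the parametrization suggested above could look roughly like this. The expected token ids and the model-loading step are placeholders, and, as noted, the real pair of tests also differs in 4-bit vs. regular loading, which this sketch does not handle:

```python
# Hedged sketch of folding the sdpa and flash-attention variants into one
# parametrized test, as suggested above. Expected ids and loading details
# are illustrative, not taken from the actual tests.
import pytest

# hypothetical map of attention implementation -> expected first tokens
EXPECTED_OUTPUT_TOKEN_IDS = {
    "sdpa": [198, 198],
    "flash_attention_2": [198, 198],
}

@pytest.mark.parametrize("attn_implementation", ["sdpa", "flash_attention_2"])
def test_model_600m_long_prompt(attn_implementation):
    expected = EXPECTED_OUTPUT_TOKEN_IDS[attn_implementation]
    # ... load the model with attn_implementation=..., generate on the long
    # prompt, then compare the leading generated tokens against `expected`.
    # Placeholder assertion so the sketch is self-contained:
    assert len(expected) == 2
```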
vasqu
left a comment
LGTM overall! Just some smaller things, and good to see that we now run these hidden tests.
[For maintainers] Suggested jobs to run (before merge): run-slow: glm4v, mistral, qwen2, qwen3
What does this PR do?