
[Tests] Fix inputs placement #42963

Merged
zucchini-nlp merged 4 commits into huggingface:main from vasqu:fix-params-test on Dec 22, 2025


@vasqu commented on Dec 19, 2025

Contributor

As per the title: we initialize our model on CPU, but the inputs are not guaranteed to be on CPU as well. This PR forces the inputs onto CPU too.

Related failure: https://github.com/huggingface/transformers/actions/runs/20375489555/job/58553450832
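The fix boils down to keeping the test inputs on the same device as the CPU-initialized model. The snippet that follows is a minimal, hypothetical sketch of that idea, not the PR's actual diff: `move_inputs_to_device` is an illustrative helper name, and it assumes the inputs are tensor-like objects exposing a `.to(device)` method (as `torch.Tensor` does), possibly nested inside dicts, lists or tuples.

```python
from typing import Any


def move_inputs_to_device(inputs: Any, device: str) -> Any:
    """Recursively move every tensor-like value in `inputs` to `device`.

    Hypothetical helper: anything exposing a callable `.to(device)` method
    (e.g. a torch.Tensor) is moved; dicts, lists and plain tuples are walked
    recursively; all other values (ints, strings, None, ...) are returned as-is.
    """
    # Tensor-like object: delegate to its own `.to(device)` method.
    if hasattr(inputs, "to") and callable(inputs.to):
        return inputs.to(device)
    # Mapping of input names to values, e.g. {"input_ids": ..., "attention_mask": ...}.
    if isinstance(inputs, dict):
        return {key: move_inputs_to_device(value, device) for key, value in inputs.items()}
    # Sequences keep their container type (list stays list, tuple stays tuple).
    if isinstance(inputs, (list, tuple)):
        return type(inputs)(move_inputs_to_device(value, device) for value in inputs)
    return inputs


# Usage, assuming PyTorch is installed and the model was built on CPU:
# inputs = {"input_ids": torch.ones(1, 4, dtype=torch.long, device="cuda")}
# inputs = move_inputs_to_device(inputs, "cpu")
```

Walking nested containers, rather than only top-level values, is a design choice meant to cover inputs that bundle several tensors together; non-tensor entries such as flags pass through untouched.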

@vasqu requested a review from @Cyrilvallez on December 19, 2025 16:44
@HuggingFaceDocBuilderDev


The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@vasqu mentioned this pull request on Dec 19, 2025 (2 tasks)
vasqu added a commit that referenced this pull request Dec 19, 2025
* more attention cleanup

* llama like text attention

* generates different text but cos and sin tensors are always close - 1e-8

* another round of rope fixups

* yea, gonna check tomorrow cant cheat w freqs for whatever reason

* NOTE: last time where comp with old rope

* rope cleanup

* more rope

* somewhat clean 3d rope with attn - sin / cos has very small diffs to original formula (torch.allclose always True) leading to slightly different generations

* new rope type

* style

* attempt at moe, gonna need a deeper look

* cleanup gate

* more cleaning

* NOTE remove attempt at moe for now

* another round of cleanups

* whoops

* we back boys, reattempting moe start

* moe should be done with this

* cleanup

* more cleanup

* nits

* add conversion and adjust code accordingly

* fix

* make moe copyable as far as we can

* cleanup conversion a bit, next config

* cleanup config part1

* small removal of unused things

* config conversion, rope type doesnt get loaded tho...

* fix rope

* last hardcoded values

* remove unnecessary class

* starting to make copies available for vision, vision rope refactor tomorrow

* vl rope changes

* simplify variable resolution resampler

* nit

* conversion update

* more conversions, standardization, and big dtype fix!

* remove some docs (tmp), focus on code for me

* oops

* nit

* fixup embeddings, add todos

* more cleanup

* more cleanup, next caching changes

* revert fp16, internally discussed weights are supposed to be bf16

* fix rope (a bit), prepare cache logic changes

* more prep for cache

* cache class is used, fixup some flags

* modular refactor

* partially docstrings, docs, etc

* cleaner order

* nit

* fix config

* remove old artefacts/todos

* sync with remote and add some todos for orientation

* remove img process dep on modeling code

* image processor with a few diffs highlighted to copy from maybe

* fast img processor version

* modular image processors

* convert tokenizer to have dedicated video placeholder token

* before i forget

* a modular bug :/

* more processor things, some modular adjustments

* remove dependency on token type ids

* position ids ala qwen vl and modular is bugging

* fixup some inheritances + nits

* token type ids

* moe loss, docs, simplify pos ids

* align some feature getters

* docs

* rename conv -> merge aka our naming convention

* style

* fixup tokenizer class in auto

* no more nn sequential

* fix chat template, fix tokenizer conversion, modular bug

* remove this

* remove old deps (from the remote processor)

* whoops

* argh

* todo, restarting progress tomorrow

* fast image processor changes output, keeping slow for now

* NOTE rm debugging code on processor conversion

* first complete conversion script version, todo on whether to use fast processor

* config docs

* image processor tests, only kept to images as videos need different resolutions

* processor tests

* first ish version for video processor, very much WIP tho

* sync with main and all the changes that happened, fix ernie moe bug in dtype casting

* mini style fix

* vid processor is properly separated now

* make vid processor its own thing

* style

* video processing and cleanups, img processing done, processing needs one TODO, vid processing needs tests

* readd vid patch fn

* make 4D RoPE possible if manually passed

* simplify the msg on packing, allow external prep but not internal one

* nit

* revert general changes video utils, make it specific to ernie, fixup tests

* vid to auto

* left to check: pos ids (rope) + token type ids

* move token type ids to processor, fix processor to ernie logic

TODOs: tests, tests, tests

* processor fixes, conversion todo for fast img processor

TODOs: tests for vid processor and modeling

* fix

* video processor tests, torch compile does not work due to PIL drawing being needed

* fix config consistency

* style

* wip tests

* fix most tests, 2 failing ones remain

* fix last tests

* check

* docs consistency

* fix conversion script, more docs

* optional drawing on frames, style

* add error on compile x draw on frames

* fix

* fix

* change font loading to hub dep with default font

* fix config try 2

* fix diff resolution, tests (not fast processor, a100)

* fix test

* style

* torch 2.9 (fa2 untested, video from 2.6)

* raushan's review (part 1)

* Update docs/source/en/model_doc/ernie4_5_vl.md

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* Pablo's review

* style

* fix device/dtype stuff that is no longer needed

* revert vision property rm, necessary for composite sdpa test

* fixup few smaller things + refactor how we load the font entirely (based on font name with expected associated file at same repo)

* remove bc min max pixels --> less modular on processor parts but way cleaner code

* fix fps and add fixme to the inefficient conversion stuff

* rope

* style

* copies and last rope stuff i fogot

* revert glm4v copies

* fix

* simplify temporal slicing and add more descriptions

* that ":" 😢

* fixup init

* conversion for moe split and merge + general renamings etc -- encountering OOM (automap maybe?)

* wrong order whoops

* style

* copies

* fix init

* fix

* fix

* allow the resolved path to be passed to explicit video processor classes and refactor how we load them for ernie

* simplify

* shoot, I need it there as well

* better err handling

* style

* initial fixes after merge

* working loading version

* cleanup

* change moe order and fix vl version

* reverse op is mapping incorrectly TODO

* reverse loading somewhat works, name conversion has issues it seems 👀

* fix renaming issue, slow tests pass (except the integration ones ~ expected due to fused weights)

* conversion mapping with native features + remove conversion mapping restriction

* add test for new conversion

* style

* update conversion

* fix integration tests, remove fa tests

* fix

* update docs a bit

* style

* fix ernie moe and routing ernie series

* style

* fix rope warning

* i fucked up again pain

* update expectations

* remove EP, broken atm be it sole or in combination with TP

* update docs a bit

* first part of addressing review comments

* fixup

* fix vid processor

* fix font saving

* readd decorators oops

* add mm token type id shortcut

* always compose mm token type ids if needed

* move config to modular

* fix loading by enforcing correct order

* fix

* address first bunch of comments

* smaller comments

* let's make moe layer types, ill fix modular in a second

* modular

* style

* renamed version along a few fixes in conversion and processor tests

* fix

* style + decorator

* fix tokenizer handling of additional special tokens

* style

* fix doc refs

* test fix

* fix

* was this too breaking?

* fix conversion via workaround for now

* post merge fix

* revert a few tok things (additional_special_tokens), updated conversion

* fix video processing loading logic

add exception for auto class (reload config as we have a circular dep on finding which class we have, i.e. we need to load to find the class then load with specific logic)

remove some original ideas

* style

* processor path change

* add small dummy integration tests

* style

* fix rope modeling to follow qwen2 vl instead + change auto loading to specifically load via pretrained (overridable from pretrained for auto classes)

* seems to be skipped in other similar vlms

* small conversion updates and adjust max vram usage during the big integration test

* update test paths

* style

* style attmpt 2

* docs

* trigger ci

* review

* post merge fixes

* fix

* safety

* fix test

* style

* oops

* fix

* ...

* simplify the config init for moe pattern

* gonna be fixed by #42963

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

@zucchini-nlp left a comment

Member


I'm facing the same issue when running tests locally, thanks for the fix!

@zucchini-nlp merged commit 007274d into huggingface:main on Dec 22, 2025
25 checks passed
@vasqu deleted the fix-params-test branch on December 22, 2025 09:27
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
SangbumChoi pushed a commit to SangbumChoi/transformers that referenced this pull request Jan 23, 2026
* fix

* more careful about the items

* oops

* ...

3 participants