add fuyu fast image processors #41817

DeXtAr47-oss · 2025-10-23T14:43:41Z

What does this PR do?

This PR introduces FuyuImageProcessorFast, providing a faster alternative to the original FuyuImageProcessor by leveraging torchvision for image transformations.

Key changes include:

Implementation of FuyuImageProcessorFast inheriting from BaseImageProcessorFast.
Updates to tests/models/fuyu/test_image_processing_fuyu.py: the suite now covers the fast processor and overrides the save/load tests, and the image height and width in test_preprocess_with_tokenizer_info have been changed to values divisible by 30 (180x300). This keeps the test compatible with FuyuImageProcessorFast and avoids ValueError: image_height must be divisible by 30. All Fuyu image processing tests now pass.
Addition of documentation for FuyuImageProcessorFast
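To illustrate why the test dimensions were changed to 180x300, here is a minimal sketch of a patch-divisibility check. The function name `patch_grid` and its parameters are hypothetical and are not taken from the PR; the sketch only assumes a 30x30 patch and the error message quoted above, and treats 180 as the height and 300 as the width.

```python
def patch_grid(image_height: int, image_width: int,
               patch_height: int = 30, patch_width: int = 30) -> tuple[int, int]:
    """Return (rows, cols) of patches, or raise if the image cannot be tiled exactly.

    Hypothetical helper for illustration; not part of transformers.
    """
    if image_height % patch_height != 0:
        raise ValueError(f"image_height must be divisible by {patch_height}")
    if image_width % patch_width != 0:
        raise ValueError(f"image_width must be divisible by {patch_width}")
    return image_height // patch_height, image_width // patch_width


# 180 and 300 are both multiples of 30, so the image tiles into 6 rows and 10 columns.
rows, cols = patch_grid(180, 300)
```

A height such as 170 would raise the ValueError instead, which is the failure the updated test dimensions avoid.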

Fixes #36978

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. [Contributions Welcome] Add Fast Image Processors #36978
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@yonigozlan

molbap

Thanks for taking this on! Left an initial review

molbap · 2025-10-30T17:07:02Z

tests/models/fuyu/test_image_processing_fuyu.py

+        # Convert to torch tensor for fast processor
+        sample_tensor = torch.from_numpy(self.sample_image).permute(2, 0, 1).float()
+        # (h:450, w:210) fitting (160, 320) -> (160, 210*160/450) = (160, 74.67) -> (160, 74)
+        from transformers.image_utils import SizeDict


No imports in the functions
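The arithmetic in the test comment above, (h:450, w:210) fitting (160, 320) -> (160, 74), can be reproduced with a small sketch of an aspect-ratio-preserving fit. `fit_within` is a hypothetical name used only for illustration. It assumes that images already inside the target are left untouched and that the scaled side is rounded down, which matches the comment but is not confirmed to be the processor's exact code path.

```python
def fit_within(height: int, width: int,
               target_height: int, target_width: int) -> tuple[int, int]:
    """Scale (height, width) down to fit the target while keeping the aspect ratio.

    Hypothetical helper for illustration; integer arithmetic avoids float rounding.
    """
    # Assumption: images that already fit are not resized.
    if height <= target_height and width <= target_width:
        return height, width
    # Compare target_height / height with target_width / width by cross-multiplying.
    if target_height * width <= target_width * height:
        # Height is the limiting side: new_width = width * target_height / height, floored.
        return target_height, (width * target_height) // height
    # Width is the limiting side.
    return (height * target_width) // width, target_width


# 210 * 160 / 450 = 74.67, floored to 74, as in the test comment.
new_size = fit_within(450, 210, 160, 320)
```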

molbap · 2025-10-30T17:07:32Z

src/transformers/models/fuyu/image_processing_fuyu_fast.py

@@ -0,0 +1,453 @@
+# coding=utf-8
+# Copyright 2024 The HuggingFace Inc. team. All rights reserved.


Suggested change

-# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
+# Copyright 2025 The HuggingFace Inc. team. All rights reserved.

molbap · 2025-10-30T17:08:26Z

src/transformers/models/fuyu/image_processing_fuyu_fast.py

+    def _further_process_kwargs(
+        self,
+        patch_size: Optional[dict[str, int]] = None,
+        **kwargs,
+    ) -> dict:
+        """
+        Process Fuyu-specific kwargs before validation.
+        """
+        kwargs = super()._further_process_kwargs(**kwargs)
+        if patch_size is not None:
+            patch_size = SizeDict(**get_size_dict(patch_size, param_name="patch_size"))
+        kwargs["patch_size"] = patch_size
+        return kwargs


Can we refactor this and avoid the call to super?

Yes, we could do that; I will work on it.

molbap · 2025-10-30T17:09:12Z

src/transformers/models/fuyu/image_processing_fuyu_fast.py

+class FuyuImagesKwargs(ImagesKwargs, total=False):
+    """Keyword arguments for Fuyu image processing."""
+
+    patch_size: Optional[SizeDict]


Isn't that in the base kwargs? Not sure.

No, patch_size is not present in BaseImageProcessorFast. It is specific to Fuyu's patching mechanism, so I think it makes sense to add it here.
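As a rough illustration of the pattern under discussion, the sketch below extends a base kwargs TypedDict with one model-specific optional key. `BaseImagesKwargs` and `FuyuLikeImagesKwargs` are hypothetical stand-ins defined here so the example is self-contained; they are not the transformers classes, and the base fields shown are assumptions.

```python
from typing import Optional, TypedDict


class BaseImagesKwargs(TypedDict, total=False):
    """Hypothetical stand-in for a shared kwargs definition."""

    do_resize: Optional[bool]
    size: Optional[dict[str, int]]


class FuyuLikeImagesKwargs(BaseImagesKwargs, total=False):
    """Adds one model-specific key on top of the shared ones."""

    patch_size: Optional[dict[str, int]]


def supported_keys(kwargs_cls) -> set[str]:
    # A TypedDict class records inherited and own keys in these two frozensets.
    return set(kwargs_cls.__optional_keys__) | set(kwargs_cls.__required_keys__)


# Every key is optional because both classes use total=False.
call_kwargs: FuyuLikeImagesKwargs = {"patch_size": {"height": 30, "width": 30}}
```

The subclass accepts every base key plus patch_size, while the base definition stays unchanged for other models.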

molbap · 2025-10-30T17:09:34Z

tests/models/fuyu/test_image_processing_fuyu.py

+
+    def test_do_not_resize_if_smaller(self):
+        """Test that images smaller than target size are not resized."""
+        from transformers.image_utils import SizeDict


Please import this at the top of the file.

Suggested change

-        from transformers.image_utils import SizeDict

molbap · 2025-10-30T17:11:18Z

tests/models/fuyu/test_image_processing_fuyu.py

 @require_vision
 @require_torchvision
-class TestFuyuImageProcessor(unittest.TestCase):
+class TestFuyuImageProcessorFast(unittest.TestCase):


If we are updating this test suite, it would be nice to finally inherit from ImageProcessingTestMixin here!
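For readers unfamiliar with the mixin approach suggested here, the following is a minimal sketch of how a shared test mixin can run the same checks against several processor classes. `ProcessorChecksMixin`, `SlowToy`, and `FastToy` are hypothetical stand-ins invented for this example; the real ImageProcessingTestMixin has its own attributes and tests, which are not reproduced here.

```python
import unittest


class ProcessorChecksMixin:
    """Hypothetical shared checks; subclasses list the classes to exercise."""

    processor_classes = ()

    def test_output_is_channels_first(self):
        for cls in self.processor_classes:
            output = cls().preprocess(height=180, width=300)
            self.assertEqual(output["shape"], (3, 180, 300))


class SlowToy:
    """Toy processor that only reports the shape it would produce."""

    def preprocess(self, height, width):
        return {"shape": (3, height, width)}


class FastToy(SlowToy):
    """Toy fast variant sharing the same interface."""


class ToyProcessorTest(ProcessorChecksMixin, unittest.TestCase):
    # The shared test runs once and loops over both toy classes.
    processor_classes = (SlowToy, FastToy)


def run_suite() -> unittest.TestResult:
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToyProcessorTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

The benefit is that a model-specific test class only declares which processors it covers, and common checks are inherited rather than rewritten.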

yonigozlan

Hey @DeXtAr47-oss! Thanks a lot for working on this, and great to see that you added many tests!
I cleaned up the fast processor file a bit so that it uses the optimized group processing methods from BaseImageProcessorFast, and made a few other small changes to get the CI to pass.
LGTM once the CI is all green!
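The general idea behind processing by group can be sketched in plain Python: bucket inputs by shape, handle each bucket in one batched call, then restore the original order. The helper names below are hypothetical and the shapes stand in for real tensors; this is not the code from BaseImageProcessorFast.

```python
from collections import defaultdict
from typing import Callable


def group_by_shape(shapes: list[tuple[int, ...]]) -> dict[tuple[int, ...], list[int]]:
    """Map each distinct shape to the positions where it occurs."""
    groups: dict[tuple[int, ...], list[int]] = defaultdict(list)
    for index, shape in enumerate(shapes):
        groups[shape].append(index)
    return dict(groups)


def process_grouped(shapes: list[tuple[int, ...]],
                    batch_fn: Callable[[tuple[int, ...], int], list]) -> list:
    """Call batch_fn once per distinct shape and return results in input order."""
    results = [None] * len(shapes)
    for shape, indices in group_by_shape(shapes).items():
        outputs = batch_fn(shape, len(indices))  # one call covers the whole group
        for index, output in zip(indices, outputs):
            results[index] = output
    return results


def pixel_count(shape: tuple[int, ...], batch_size: int) -> list[int]:
    # Stand-in for a batched transform: one result per image in the group.
    return [shape[1] * shape[2]] * batch_size


areas = process_grouped([(3, 180, 300), (3, 90, 150), (3, 180, 300)], pixel_count)
```

With three inputs of two distinct shapes, the batched function is called twice rather than three times, which is where the speedup comes from when the per-call overhead is significant.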

HuggingFaceDocBuilderDev · 2025-11-03T20:17:40Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

github-actions · 2025-11-04T15:34:20Z

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, fuyu

* added fast processor for fuyu (huggingface#36978)
* updated docs for fuyu model (huggingface#36978)
* updated test_image_processing and image_processing_fuyu_fast
* updated fuyu.md and image_processing_fuyu_fast (huggingface#36978)
* updated test_image_processing_fuyu (huggingface#36978)
* formatted image_processing_fuyu_fast and test_image_processing_fuyu (huggingface#36978)
* updated tests and fuyu fast image processing (huggingface#36978)
* Merge branch 'fuyu-fast-image-processors' of https://github.com/DeXtAr47-oss/transformers into fuyu-fast-image-processors
* fixed format (huggingface#36978)
* formatted files (huggingface#36978)
* formatted files
* revert unnecessary changes
* clean up and process by group

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>

DeXtAr47-oss added 10 commits October 23, 2025 18:20

added fast processor for fuyu (huggingface#36978)

50d486f

updated docs for fuyu model (huggingface#36978)

85c2898

Merge branch 'main' into fuyu-fast-image-processors

6d871e7

Merge branch 'main' into fuyu-fast-image-processors

d63240b

updated test_image_processing and image_processing_fuyu_fast

88bd49e

Merge branch 'fuyu-fast-image-processors' of https://github.com/DeXtAr47-oss/transformers into fuyu-fast-image-processors

0f66904

updated fuyu.md and image_processing_fuyu_fast (huggingface#36978)

2b1c894

updated test_image_processing_fuyu (huggingface#36978)

483b2ea

formatted image_processing_fuyu_fast and test_image_processing_fuyu (huggingface#36978)

11169cf

Merge branch 'main' into fuyu-fast-image-processors

6f80564

molbap self-requested a review October 30, 2025 14:37

molbap reviewed Oct 30, 2025

DeXtAr47-oss and others added 13 commits November 1, 2025 18:48

updated tests and fuyu fast image processing (huggingface#36978)

0efa5dc

Merge branch 'fuyu-fast-image-processors' of https://github.com/DeXtAr47-oss/transformers into fuyu-fast-image-processors

c8aac5a

Merge branch 'fuyu-fast-image-processors' of https://github.com/DeXtAr47-oss/transformers into fuyu-fast-image-processors

f1658b4

Merge branch 'main' into fuyu-fast-image-processors

37315bd

fixed format (huggingface#36978)

da7477a

Merge branch 'fuyu-fast-image-processors' of https://github.com/DeXtAr47-oss/transformers into fuyu-fast-image-processors

5b5a8fa

formatted files (huggingface#36978)

1ab87a1

Merge branch 'main' into fuyu-fast-image-processors

f94f7ae

formatted files

74c6235

Merge branch 'fuyu-fast-image-processors' of https://github.com/DeXtAr47-oss/transformers into fuyu-fast-image-processors

c6a9837

Merge remote-tracking branch 'upstream/main' into fuyu-fast-image-processors

0702f9f

revert unnecessary changes

224ef97

clean up and process by group

4e90a3a

yonigozlan approved these changes Nov 3, 2025

yonigozlan enabled auto-merge (squash) November 3, 2025 21:24

yonigozlan added 2 commits November 3, 2025 18:31

Merge branch 'main' into fuyu-fast-image-processors

f2b7bf8

Merge branch 'main' into fuyu-fast-image-processors

de29121

yonigozlan merged commit 325810e into huggingface:main Nov 4, 2025
23 checks passed

yonigozlan mentioned this pull request Nov 4, 2025

[Contributions Welcome] Add Fast Image Processors #36978

Closed

81 tasks
