
@DeXtAr47-oss
Contributor

What does this PR do?

This PR introduces FuyuImageProcessorFast, providing a faster alternative to the original FuyuImageProcessor by leveraging torchvision for image transformations.

Key changes include:

  • Implementation of FuyuImageProcessorFast inheriting from BaseImageProcessorFast.
  • Updates to tests/models/fuyu/test_image_processing_fuyu.py to include the fast processor and override the save/load tests. The image height and width in test_preprocess_with_tokenizer_info were changed to values divisible by 30 (180x300), ensuring compatibility with FuyuImageProcessorFast and avoiding ValueError: image_height must be divisible by 30. All Fuyu image processing tests now pass.
  • Addition of documentation for FuyuImageProcessorFast.
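As a minimal illustration of the divisible-by-30 constraint described above, the sketch below checks that an image tiles exactly into square patches and returns the resulting patch grid. It assumes a 30x30 patch size (the value implied by the error message); patch_grid and PATCH_SIZE are hypothetical names, not part of the FuyuImageProcessor API.

```python
# Hypothetical helper: checks Fuyu-style patch divisibility and returns the patch grid.
PATCH_SIZE = 30  # assumed patch height/width in pixels, taken from the error message


def patch_grid(image_height: int, image_width: int, patch_size: int = PATCH_SIZE) -> tuple[int, int]:
    """Return (rows, cols) of non-overlapping patches, or raise if the image does not tile exactly."""
    if image_height % patch_size != 0:
        raise ValueError(f"image_height must be divisible by {patch_size}")
    if image_width % patch_size != 0:
        raise ValueError(f"image_width must be divisible by {patch_size}")
    return image_height // patch_size, image_width // patch_size


rows, cols = patch_grid(180, 300)
print(rows, cols, rows * cols)  # 6 10 60
```

With the updated 180x300 test image, the grid is 6 rows by 10 columns; a height such as 185 would be rejected.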

Fixes #36978

Before submitting

Who can review?

@yonigozlan

@molbap molbap self-requested a review October 30, 2025 14:37
Contributor

@molbap molbap left a comment

Thanks for taking this on! Left an initial review

# Convert to torch tensor for fast processor
sample_tensor = torch.from_numpy(self.sample_image).permute(2, 0, 1).float()
# (h:450, w:210) fitting (160, 320) -> (160, 210*160/450) = (160, 74.67) -> (160, 74)
from transformers.image_utils import SizeDict
Contributor

No imports inside functions.

Contributor Author

okay
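The comment in the test snippet above works through an aspect-preserving fit of a 450x210 (height x width) image into a 160x320 box. A minimal sketch of that arithmetic follows, assuming images that already fit inside the box are left unchanged (as the name test_do_not_resize_if_smaller suggests). fit_within is a hypothetical helper, and it uses integer floor division rather than whatever float computation the library itself performs.

```python
def fit_within(height: int, width: int, target_height: int, target_width: int) -> tuple[int, int]:
    """Aspect-preserving downscale so (height, width) fits inside the target box.

    Images already inside the box are returned unchanged (no upscaling).
    Integer floor division avoids float rounding surprises.
    """
    if height <= target_height and width <= target_width:
        return height, width
    # Compare target_height / height against target_width / width without floats.
    if target_height * width <= target_width * height:
        # Height is the binding constraint.
        return target_height, width * target_height // height
    # Width is the binding constraint.
    return height * target_width // width, target_width


print(fit_within(450, 210, 160, 320))  # (160, 74)
```

Cross-multiplying the two ratios picks the binding side exactly, which keeps the result deterministic for cases like 210 * 160 / 450 = 74.67.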

@@ -0,0 +1,453 @@
# coding=utf-8
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
Contributor

Suggested change
# Copyright 2024 The HuggingFace Inc. team. All rights reserved.
# Copyright 2025 The HuggingFace Inc. team. All rights reserved.

Comment on lines +438 to +450
    def _further_process_kwargs(
        self,
        patch_size: Optional[dict[str, int]] = None,
        **kwargs,
    ) -> dict:
        """
        Process Fuyu-specific kwargs before validation.
        """
        kwargs = super()._further_process_kwargs(**kwargs)
        if patch_size is not None:
            patch_size = SizeDict(**get_size_dict(patch_size, param_name="patch_size"))
        kwargs["patch_size"] = patch_size
        return kwargs
Contributor

can we refactor this and avoid the call to super?

Contributor Author

Umm, yes, we could do that. I'll work on it.
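One possible reading of the refactor request is to normalize patch_size locally instead of routing it through super()._further_process_kwargs. The sketch below assumes that reading; PatchSize is a hypothetical frozen dataclass standing in for SizeDict, and normalize_patch_size and FuyuLikeProcessorSketch are illustrative names only.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class PatchSize:
    """Hypothetical stand-in for a normalized {"height", "width"} size container."""

    height: int
    width: int


def normalize_patch_size(patch_size: Union[int, dict, PatchSize, None]) -> Optional[PatchSize]:
    """Normalize an int, a {"height", "width"} dict, or a PatchSize; None passes through."""
    if patch_size is None or isinstance(patch_size, PatchSize):
        return patch_size
    if isinstance(patch_size, int):
        return PatchSize(patch_size, patch_size)
    if isinstance(patch_size, dict):
        if set(patch_size) != {"height", "width"}:
            raise ValueError(f"patch_size dict needs exactly 'height' and 'width', got {sorted(patch_size)}")
        return PatchSize(int(patch_size["height"]), int(patch_size["width"]))
    raise TypeError(f"Unsupported patch_size type: {type(patch_size).__name__}")


class FuyuLikeProcessorSketch:
    """Hypothetical processor that handles patch_size itself, with no super() call."""

    def _further_process_kwargs(self, patch_size=None, **kwargs) -> dict:
        # Only patch_size is touched; every other kwarg is passed through unchanged.
        kwargs["patch_size"] = normalize_patch_size(patch_size)
        return kwargs
```

Keeping the normalization in a standalone function also makes it easy to unit-test without instantiating a processor.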

class FuyuImagesKwargs(ImagesKwargs, total=False):
    """Keyword arguments for Fuyu image processing."""

    patch_size: Optional[SizeDict]
Contributor

isn't that in the base kwargs? not sure

Contributor Author

@DeXtAr47-oss DeXtAr47-oss Oct 30, 2025

No, this is not present in BaseImageProcessorFast. patch_size is specific to Fuyu's patching mechanism, so I felt it made sense to add it here.
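To illustrate the point in this thread: a TypedDict subclass can add model-specific keys on top of a shared base, and total=False makes every key optional, so callers can pass only the keys they need. The classes below are hypothetical stand-ins, not the real ImagesKwargs or FuyuImagesKwargs definitions.

```python
from typing import Optional, TypedDict


class ImagesKwargsSketch(TypedDict, total=False):
    """Hypothetical stand-in for a shared base of image-processing kwargs."""

    do_resize: Optional[bool]
    do_normalize: Optional[bool]


class FuyuImagesKwargsSketch(ImagesKwargsSketch, total=False):
    """Model-specific extension: adds a key the shared base does not define."""

    patch_size: Optional[dict]


# With total=False every key is optional, so a partial dict is valid.
kwargs: FuyuImagesKwargsSketch = {"patch_size": {"height": 30, "width": 30}}
print(sorted(FuyuImagesKwargsSketch.__optional_keys__))  # ['do_normalize', 'do_resize', 'patch_size']
```

The subclass inherits the base keys, while the base itself stays free of the Fuyu-only patch_size entry.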


    def test_do_not_resize_if_smaller(self):
        """Test that images smaller than target size are not resized."""
        from transformers.image_utils import SizeDict
Contributor

This should be imported at the top of the file.

Suggested change
from transformers.image_utils import SizeDict

@require_vision
@require_torchvision
class TestFuyuImageProcessor(unittest.TestCase):
class TestFuyuImageProcessorFast(unittest.TestCase):
Contributor

Since we're updating this test suite, it would be nice to finally inherit from ImageProcessingTestMixin here!
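The mixin pattern suggested above can be sketched with the standard library alone: shared test methods live in a plain class, and a concrete unittest.TestCase subclass supplies the processors to check. Everything here (the patcher classes and ProcessorTestMixinSketch) is hypothetical and does not reproduce what ImageProcessingTestMixin actually contains.

```python
import io
import unittest


class SlowPatcherSketch:
    """Hypothetical 'slow' processor: counts 30x30 patches per side."""

    patch_size = 30

    def grid(self, height: int, width: int) -> tuple[int, int]:
        return height // self.patch_size, width // self.patch_size


class FastPatcherSketch(SlowPatcherSketch):
    """Hypothetical 'fast' processor: same contract, different implementation."""

    def grid(self, height: int, width: int) -> tuple[int, int]:
        rows, _ = divmod(height, self.patch_size)
        cols, _ = divmod(width, self.patch_size)
        return rows, cols


class ProcessorTestMixinSketch:
    """Shared tests; not a TestCase itself, so it only runs through subclasses."""

    processor_classes: tuple = ()

    def test_grid_shape(self):
        for cls in self.processor_classes:
            with self.subTest(processor=cls.__name__):
                self.assertEqual(cls().grid(180, 300), (6, 10))

    def test_slow_fast_agree(self):
        results = {cls().grid(450, 210) for cls in self.processor_classes}
        self.assertEqual(len(results), 1)


class TestPatchersSketch(ProcessorTestMixinSketch, unittest.TestCase):
    processor_classes = (SlowPatcherSketch, FastPatcherSketch)


# Run the suite programmatically and keep the runner output quiet.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestPatchersSketch)
result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
print(result.testsRun, result.wasSuccessful())  # 2 True
```

Because the mixin does not inherit from unittest.TestCase, the shared tests are collected once per concrete subclass rather than on their own.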

Member

@yonigozlan yonigozlan left a comment

Hey @DeXtAr47-oss! Thanks a lot for working on this, and great to see that you added so many tests!
I cleaned up the fast processor file a bit so that it uses the optimized group-processing methods from BaseImageProcessorFast, and made a few other small changes to get the CI to pass.
LGTM once the CI is all green!
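The "process by group" idea mentioned above can be sketched in plain Python: bucket images by shape, run one batched call per bucket, then restore the input order. This is an illustration under stated assumptions; images are modeled as nested lists, and group_by_shape, reorder, and process_by_group are hypothetical helpers, not the BaseImageProcessorFast API.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]  # hypothetical stand-in for an (H, W) image tensor
Shape = Tuple[int, int]


def shape_of(image: Image) -> Shape:
    return (len(image), len(image[0]) if image else 0)


def group_by_shape(images: List[Image]) -> Tuple[Dict[Shape, List[Image]], List[Tuple[Shape, int]]]:
    """Bucket images by shape; remember (shape, position in bucket) for each input index."""
    groups: Dict[Shape, List[Image]] = defaultdict(list)
    positions: List[Tuple[Shape, int]] = []
    for image in images:
        shape = shape_of(image)
        positions.append((shape, len(groups[shape])))
        groups[shape].append(image)
    return dict(groups), positions


def reorder(groups: Dict[Shape, List[Image]], positions: List[Tuple[Shape, int]]) -> List[Image]:
    """Undo the grouping so outputs line up with the original input order."""
    return [groups[shape][pos] for shape, pos in positions]


def process_by_group(images: List[Image], batch_fn: Callable[[List[Image]], List[Image]]) -> List[Image]:
    """Call batch_fn once per same-shape group instead of once per image."""
    groups, positions = group_by_shape(images)
    processed = {shape: batch_fn(batch) for shape, batch in groups.items()}
    return reorder(processed, positions)


def halve_batch(batch: List[Image]) -> List[Image]:
    # A real fast processor would use one stacked tensor op here; this is a plain loop.
    return [[[value / 2 for value in row] for row in image] for image in batch]


images = [[[1.0, 2.0]], [[3.0], [4.0]], [[5.0, 6.0]]]
print(process_by_group(images, halve_batch))
# [[[0.5, 1.0]], [[1.5], [2.0]], [[2.5, 3.0]]]
```

Here the three inputs form two shape groups, so the batched function runs twice instead of three times, and the outputs come back in input order.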

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@yonigozlan yonigozlan enabled auto-merge (squash) November 3, 2025 21:24
@github-actions
Contributor

github-actions bot commented Nov 4, 2025

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, fuyu

@yonigozlan yonigozlan merged commit 325810e into huggingface:main Nov 4, 2025
23 checks passed
yonigozlan added a commit to yonigozlan/transformers that referenced this pull request Nov 7, 2025
* added fast processor for fuyu (huggingface#36978)

* updated docs for fuyu model (huggingface#36978)

* updated test_image_processing  and image_processing_fuyu_fast

* updated fuyu.md and image_processing_fuyu_fast (huggingface#36978)

* updated test_image_processing_fuyu (huggingface#36978)

* formatted image_processing_fuyu_fast and test_image_processing_fuyu (huggingface#36978)

* updated tests and fuyu fast image processing (huggingface#36978)

* Merge branch 'fuyu-fast-image-processors' of https://github.com/DeXtAr47-oss/transformers into fuyu-fast-image-processors

* fixed format (huggingface#36978)

* formatted files (huggingface#36978)

* formatted files

* revert unnecessary changes

* clean up and process by group

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Abdennacer-Badaoui pushed a commit to Abdennacer-Badaoui/transformers that referenced this pull request Nov 10, 2025
yonigozlan added a commit that referenced this pull request Jan 8, 2026
* remove attributes and add all missing sub processors to their auto classes

* remove all mentions of .attributes

* cleanup

* fix processor tests

* fix modular

* remove last attributes

* fixup

* fixes after merge

* fix wrong tokenizer in auto florence2

* fix missing audio_processor + nits

* Override __init__ in NewProcessor and change hf-internal-testing-repo (temporarily)

* fix auto tokenizer test

* add init to markup_lm

* update CustomProcessor in custom_processing

* remove print

* nit

* fix test modeling owlv2

* fix test_processing_layoutxlm

* Fix owlv2, wav2vec2, markuplm, voxtral issues

* add support for loading and saving multiple tokenizer natively

* remove exclude_attributes from save_pretrained

* Run slow v2 (#41914)

* Super

* Super

* Super

* Super

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix `detectron2` installation in docker files (#41975)

* detectron2 - part 1

* detectron2 - part 2

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix `autoawq[kernels]` installation in quantization docker file (#41978)

fix autoawq[kernels]

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* add support for saving encoder only so any parakeet model can be loaded for inference (#41969)

* add support for saving encoder only so any decoder model can be loaded

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* use convolution_bias

* convert modular

* convolution_bias in conversion script

---------

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>

* Use indices as position_ids in modernebert (#41789)

* Use indices as position_ids in modernebert

* Move position_ids init to the branch

* test tensor parallel: make tests for dense model more robust (#41968)

* make test forward and backward more robust

* refactor compile part of test tensor parallel

* linting

* pass rank around instead of calling it over and over

* Run slow v2 (#41914)

* Super

* Super

* Super

* Super

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix `detectron2` installation in docker files (#41975)

* detectron2 - part 1

* detectron2 - part 2

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Fix `autoawq[kernels]` installation in quantization docker file (#41978)

fix autoawq[kernels]

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* add support for saving encoder only so any parakeet model can be loaded for inference (#41969)

* add support for saving encoder only so any decoder model can be loaded

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>

* use convolution_bias

* convert modular

* convolution_bias in conversion script

---------

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>

---------

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>

* fix: dict[RopeParameters] to dict[str, RopeParameters] (#41963)

* docs: add continuous batching page (#41847)

* docs: add continuous batching page

* docs(cb): add `generate_batch` example

* docs(cb): add `opentelemetry` and `serving` section

* feat: add `TODO` note about opentelemetry dependency

* docs(cb): add supported features

* docs(cb): add unsupported features

* docs(cb): add `ContinuousBatchingManager` example

* docs(cb): x reference CB in optimizing inference

* Fix `torchcodec` version in quantization docker file (#41988)

check

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [kernels] Add Tests & CI for kernels (#41765)

* first commit

* add tests

* add kernel config

* add more tests

* add ci

* small fix

* change branch name

* update tests

* nit

* change test name

* revert jobs

* addressing review

* reenable all jobs

* address second review

* Move the Mi355 to regular docker (#41989)

* Move the Mi355 to regular docker

* Disable gfx950 compilation for FA on AMD

* More data in benchmarking (#41848)

* Reduce scope of cross-generate

* Rm generate_sall configs

* Workflow benchmarks more

* Prevent crash when FA is not installed

* fix (CI): Refactor SSH runners (#41991)

* Change ssh runner type

* Add wait step to SSH runner workflow

* Rename wait step to wait2 in ssh-runner.yml

* Remove wait step from ssh-runner.yml

Removed the wait step from the SSH runner workflow.

* Update runner type for single GPU A10 instance

* Update SSH runner version to 1.90.3

* Add sha256sum to ssh-runner workflow

* Update runner type and remove unused steps

* fix 3 failed test cases for video_llama_3 model on Intel XPU (#41931)

* fix 3 failed test cases for video_llama_3 model on Intel XPU

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* update

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* adjust format

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* update code

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* Integrate colqwen2.5 using colqwen2 modelling code (#40600)

* adding option for 2.5

* minor - arg in conversion script

* getting started on modelling.py

* minor - should've been using modular

* addressing comments + fixing datatype/device _get method

* minor

* committing suggestion

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* docs + first test

* ruff fix

* minor fix

* ruff fix

* model fix

* model fix

* fine-grained check, with a hardcoded score from the original Hf implementation.

* minor ruff

* update tests values with CI hardware

* adding 2.5 to conversion script

* Apply style fixes

---------

Co-authored-by: Sahil Kabir <sahilkabir@Sahils-MacBook-Pro.local>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>
Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Fixed wrong padding value in OWLv2 (#41938)

* Update image_processing_owlv2_fast.py

fixed padding value

* fixed padding value

* Change padding constant value from 0.5 to 0.0

* Fixed missed padding value in modular_owlv2.py

---------

Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* Fix `run slow v2`: empty report when there is only one model (#42002)

fix

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* [kernels] change import time in KernelConfig (#42004)

* change import time

* style

* DOC Fix typo in argument name: pseudoquant (#41994)

The correct argument name is pseudoquantization. Since no error is raised
when passing a wrong argument name (which is arguably an anti-pattern),
this is difficult for users to debug.

* Fix `torch+deepspeed` docker file (#41985)

* fix

* delete

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>

* Correct syntax error in trainer.md (#42001)

A comma is missing between two parameters in the signature of the compute_loss function.

* Reduce the number of benchmark in the CI (#42008)

Changed how benchmark cfgs are chosen

* Fix continuous batching tests (#42012)

* Fix continuous batching tests

* make fixup

* add back `logging_dir` (#42013)

* add back

* Apply style fixes

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Fix issue with from pretrained and kwargs in image processors (#41997)

* accept kwargs in image proc from_pretrained

* only use kwargs that are in cls.valid_kwargs

* remove specific logic for _from_auto

* add image_seq_length to Images_kwargs for backward compatibility

* fix missing image kwargs in pix2struct

* Fix default image_rows and image_cols initialization in Idefics3 and SmolVLM processors (#41871)

* Fix default image_rows and image_cols initialization in Idefics3 and SmolVLM processors

* Fix default initialization of image_rows and image_cols in Idefics3 and SmolVLM processors

* Add GLPNImageProcessorFast  (#41725)

* Add GLPNImageProcessorFast for torch backend

* Address review feedback

- Simplified to_dict() method
- Keep tensors as torch instead of converting to numpy for heterogeneous shapes
- Removed unnecessary shape guards in post_process_depth_estimation
- Improved variable names (tgt -> target_size, d -> resized)
- Removed unnecessary GLPNImageProcessorKwargs class

* Address review feedback

- Simplified to_dict() method
- Keep tensors as torch instead of converting to numpy for heterogeneous shapes
- Removed unnecessary shape guards in post_process_depth_estimation
- Improved variable names (tgt -> target_size, d -> resized)
- Removed unnecessary GLPNImageProcessorKwargs class

* commits after 2nd review

* Address all review feedback and add explicit batched test

- Simplified to_dict() with descriptive variable names (d->output_dict)
- Fixed resize operation: changed from crop to proper resize with interpolation
- Added padding for heterogeneous batch shapes in both slow and fast processors
- Fused rescale and normalize operations for efficiency
- Improved all variable names (tgt->target_size, d->depth_4d->resized)
- Added GLPNImageProcessorKwargs class in slow processor and imported in fast
- Renamed test_equivalence_slow_fast to test_slow_fast_equivalence
- Added explicit test_slow_fast_equivalence_batched test
- All 20 tests passing

* using padding from utils

* simplify glpn image processor fast

* fix docstring

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>
Co-authored-by: Yoni Gozlan <74535834+yonigozlan@users.noreply.github.com>

* add fuyu fast image processors (#41817)

* added fast processor for fuyu (#36978)

* updated docs for fuyu model (#36978)

* updated test_image_processing  and image_processing_fuyu_fast

* updated fuyu.md and image_processing_fuyu_fast (#36978)

* updated test_image_processing_fuyu (#36978)

* formatted image_processing_fuyu_fast and test_image_processing_fuyu (#36978)

* updated tests and fuyu fast image processing (#36978)

* Merge branch 'fuyu-fast-image-processors' of https://github.com/DeXtAr47-oss/transformers into fuyu-fast-image-processors

* fixed format (#36978)

* formatted files (#36978)

* formatted files

* revert unnecessary changes

* clean up and process by group

---------

Co-authored-by: yonigozlan <yoni.gozlan@huggingface.co>

* [kernels] Fix XPU layernorm kernel (#41583)

* fix

* add comment

* better fix

* style

* Update src/transformers/modeling_utils.py

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>

* [v5] Deprecate Text2Text and related pipelines (#41996)

* Deprecate Text2Text and related pipelines

* Try a restructure

* make fixup

* logging -> logger

* [FPQuant] MXFP8 and MXFP4 backwards support (#41897)

* FP-Quant backwards

* fp-quant v0.3.0 docker

* availability version bump

* fp_quant==0.3.1

* fp_quant v0.3.2

* add working auto_docstring for processors

* add auto_docstring to processors first part

* add auto_docstring to processors part 2

* modifs after review

* fully working auto_docstring and check_docstring with placeholder docstrings

* Working check_docstrings for Typed dicts

* Add recurring processor args to auto_docstring and add support for removing redundant docstring and placeholders

* replace placeholders with real docstrings

* fix copies

* fixup

* remove unwanted changes

* fix unprotected imports

* Fix unprotected imports

* fix unprotected imports

* Add __call__ to all docs of processors

* nits docs

---------

Signed-off-by: nithinraok <nithinrao.koluguri@gmail.com>
Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Nithin Rao <nithinrao.koluguri@gmail.com>
Co-authored-by: Eustache Le Bihan <eulebihan@gmail.com>
Co-authored-by: eustlb <94853470+eustlb@users.noreply.github.com>
Co-authored-by: Rémi Ouazan <83456801+remi-or@users.noreply.github.com>
Co-authored-by: Ferdinand Mom <47445085+3outeille@users.noreply.github.com>
Co-authored-by: Ryan Mullins <ryanmullins@google.com>
Co-authored-by: Luc Georges <McPatate@users.noreply.github.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
Co-authored-by: Guillaume LEGENDRE <glegendre01@gmail.com>
Co-authored-by: kaixuanliu <kaixuan.liu@intel.com>
Co-authored-by: Sahil Kabir <66221472+sahil-kabir@users.noreply.github.com>
Co-authored-by: Sahil Kabir <sahilkabir@Sahils-MacBook-Pro.local>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: James <67161633+gjamesgoenawan@users.noreply.github.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Yacklin Wong <139425274+Yacklin@users.noreply.github.com>
Co-authored-by: Matt <Rocketknight1@users.noreply.github.com>
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
Co-authored-by: MilkClouds <claude@maum.ai>
Co-authored-by: ARAVINDHAN T <arvindhant01@gmail.com>
Co-authored-by: Pritam Das <79273068+DeXtAr47-oss@users.noreply.github.com>
Co-authored-by: Andrei Panferov <andrei@panferov.org>
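The GLPN commit notes above mention fusing the rescale and normalize operations. The identity behind that kind of fusion is (x * r - m) / s == (x - m / r) / (s / r), so a single normalize pass with adjusted statistics can replace two passes. The scalar sketch below checks that identity numerically; the function names are hypothetical, and a real fast processor would presumably apply the same idea per channel on tensors.

```python
def rescale_then_normalize(x: float, rescale_factor: float, mean: float, std: float) -> float:
    """Two-step reference: scale the pixel value, then standardize it."""
    return (x * rescale_factor - mean) / std


def fuse_rescale_normalize(rescale_factor: float, mean: float, std: float) -> tuple[float, float]:
    """Fold the rescale into the normalization statistics.

    (x * r - m) / s == (x - m / r) / (s / r), so one normalize pass with
    the adjusted mean and std replaces the separate rescale pass.
    """
    return mean / rescale_factor, std / rescale_factor


def normalize(x: float, mean: float, std: float) -> float:
    return (x - mean) / std


# Typical 8-bit setup: rescale by 1/255, then normalize with mean 0.5 and std 0.5.
fused_mean, fused_std = fuse_rescale_normalize(1 / 255, 0.5, 0.5)
max_diff = max(
    abs(rescale_then_normalize(x, 1 / 255, 0.5, 0.5) - normalize(x, fused_mean, fused_std))
    for x in range(256)
)
print(max_diff < 1e-12)  # True
```

The saving comes from touching every pixel once instead of twice; the adjusted statistics are computed once per call.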
Development

Successfully merging this pull request may close these issues.

[Contributions Welcome] Add Fast Image Processors

4 participants