include ChannelDimension.NONE + function replicate_channels by rafaelpadilla · Pull Request #25767 · huggingface/transformers

rafaelpadilla · 2023-08-25T20:41:47Z

What does this PR do?

The ViT model currently doesn't support grayscale images with a (height, width) format, leading to preprocessing errors.

This PR addresses the issue with a new replicate_channels function. This function converts images in (height, width) format to a 3-channel RGB format (3, height, width), replicating the grayscale channel across all three RGB channels.

While it's possible to integrate format checks and modifications within each processing function (like resize, rescale, normalize, to_channel_dimension_format, etc.), doing so might affect other modules using these functions. To avoid potential complications, I've opted for a direct solution.
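For readers following along, the conversion described above can be sketched in a few lines of NumPy. The signature matches the one shown in the diff for src/transformers/image_utils.py, but the body below is an assumption for illustration and not necessarily the code in this PR.

```python
import numpy as np


def replicate_channels(img: np.ndarray, num_channels: int = 3) -> np.ndarray:
    """Turn a (height, width) image into (num_channels, height, width).

    Illustrative body only: the PR's actual implementation may differ.
    """
    if img.ndim != 2:
        raise ValueError(f"Expected a 2D (height, width) image, got shape {img.shape}")
    # Add a leading channel axis, then copy the grayscale plane along it.
    return np.repeat(img[np.newaxis, ...], num_channels, axis=0)


gray = np.arange(6, dtype=np.uint8).reshape(2, 3)
rgb = replicate_channels(gray)
print(rgb.shape)  # (3, 2, 3)
```

Every output channel holds the same values as the input, so no information is added; the image is only reshaped into a layout that 3-channel code paths accept.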

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

@amyeroberts

HuggingFaceDocBuilderDev · 2023-08-25T21:06:57Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.


amyeroberts · 2023-08-29T19:07:25Z

src/transformers/models/vit/image_processing_vit.py

        # All transformations expect numpy arrays.
        images = [to_numpy_array(image) for image in images]

+        # All transformations expect 3-channel images


All of the transforms also accept images with 1 channel, so we don't necessarily need to replicate 3x. Normalization only expects 3 channels when 3 values are provided for mean and std.
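To make the point about normalization concrete, here is a minimal sketch, assuming a channels-first array. The helper normalize_channels_first is hypothetical and is not the library's normalize; it only shows why the number of mean/std values has to agree with the number of channels.

```python
import numpy as np


def normalize_channels_first(image, mean, std):
    """Per-channel normalization for a (channels, height, width) array.

    Hypothetical helper for illustration, not the transformers API.
    """
    mean = np.asarray(mean, dtype=np.float32).reshape(-1, 1, 1)
    std = np.asarray(std, dtype=np.float32).reshape(-1, 1, 1)
    num_channels = image.shape[0]
    for name, stat in (("mean", mean), ("std", std)):
        # One value is shared by all channels; otherwise counts must match.
        if stat.shape[0] not in (1, num_channels):
            raise ValueError(
                f"{name} has {stat.shape[0]} values but the image has {num_channels} channel(s)"
            )
    return (image.astype(np.float32) - mean) / std


gray = np.full((1, 2, 2), 0.5, dtype=np.float32)

# A single-channel image works with a single mean/std value.
out = normalize_channels_first(gray, mean=[0.5], std=[0.5])

# The same image with three mean/std values is rejected.
try:
    normalize_channels_first(gray, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
except ValueError as err:
    print(err)
```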

I just noticed that resize also requires 3-channel images, as it ends up calling to_channel_dimension_format, which needs 3-channel images for transposing here.

I could keep calling replicate_channels to produce 3-channel images and just change that comment. Or I could modify to_channel_dimension_format so that it supports 1-channel images, but that could affect other models that make use of it.

What do you think?

Hmmm, the difficulty with to_channel_dimension_format is that the desired behaviour, if the input image is grayscale and we do

image = to_channel_dimension_format(image, ChannelDimension.FIRST)

isn't obvious: should we raise an error? Add a channel? Do nothing?

However, the reason for this channel setting in resize is to convert to a PIL image, which should be possible with grayscale images! I.e. it's the logic in resize and to_pil_image which should be updated to be compatible with grayscale images.
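The three candidate behaviours raised in this comment (raise an error, add a channel, do nothing) can be written out side by side. This is only a sketch: to_channels_first and its on_grayscale parameter are hypothetical names invented for illustration, and the function assumes 3D inputs are channels-last.

```python
import numpy as np


def to_channels_first(image: np.ndarray, on_grayscale: str = "raise") -> np.ndarray:
    """Move channels to the front; on_grayscale picks the policy for 2D input.

    Hypothetical helper, not transformers' to_channel_dimension_format.
    """
    if image.ndim == 3:
        # Assumes (height, width, channels) input.
        return image.transpose(2, 0, 1)
    if image.ndim != 2:
        raise ValueError(f"Unsupported image shape {image.shape}")
    if on_grayscale == "raise":
        raise ValueError("Grayscale (height, width) image has no channel axis")
    if on_grayscale == "add_channel":
        return image[np.newaxis, ...]  # (1, height, width)
    if on_grayscale == "keep":
        return image  # leave (height, width) untouched
    raise ValueError(f"Unknown on_grayscale policy: {on_grayscale!r}")


gray = np.zeros((4, 5), dtype=np.uint8)
print(to_channels_first(gray, on_grayscale="add_channel").shape)  # (1, 4, 5)
print(to_channels_first(gray, on_grayscale="keep").shape)         # (4, 5)
```

Each policy is defensible, which is the difficulty being described: whichever one becomes the default changes what every caller of the function sees.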

You have a good point. :)

However, to_pil_image is not limited to ViT, and modifying it may affect other models. I am not sure how other models should behave with grayscale images - or even if they should support them.

Now, if the idea is to make only ViT transformations handle single-channel images on their own, we could change the logic in resize, rescale and normalize. What do you think?

amyeroberts

Just some small comments.

Using replicate_channels is OK for now, I think; we can always adapt the transformations to accept grayscale images in the future.

One thing to consider is the inverse function. Even if this enables us to pass the image through the image processor, we don't necessarily want 3-channel image outputs, so we will need to reduce the channels again and have logic in the image processor to replicate / reduce depending on the output.
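A possible shape for the inverse mentioned here, as a rough sketch only. The name reduce_channels and the channel_axis parameter are assumptions, and the helper is only valid while the channels are still identical copies (for example, before a normalization step that uses different per-channel statistics).

```python
import numpy as np


def reduce_channels(img: np.ndarray, channel_axis: int = 0) -> np.ndarray:
    """Collapse replicated channels back to a single (height, width) plane.

    Hypothetical inverse of replicate_channels, for illustration only.
    """
    if img.ndim != 3:
        raise ValueError(f"Expected a 3D image, got shape {img.shape}")
    first = np.take(img, 0, axis=channel_axis)
    # Refuse to drop data: every channel must equal the first one.
    if not np.all(img == np.expand_dims(first, channel_axis)):
        raise ValueError("Channels differ; the image is not a replicated grayscale image")
    return first


gray = np.arange(6, dtype=np.uint8).reshape(2, 3)
replicated = np.repeat(gray[np.newaxis, ...], 3, axis=0)  # (3, 2, 3)
restored = reduce_channels(replicated)
print(restored.shape)  # (2, 3)
```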

amyeroberts · 2023-09-13T20:19:08Z

src/transformers/image_utils.py

    return to_numpy(img)


+def replicate_channels(img: np.ndarray, num_channels: int = 3) -> np.ndarray:


We should be able to specify the output data format with this function

You mean the type e.g. np.uint8, np.float?

Oh, sorry, I meant have the input argument data_format which will be one of the enum values from ChannelDimension
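One way the requested data_format argument might look, sketched under assumptions. The ChannelDimension class below is a local stand-in so the snippet runs on its own; the real enum lives in transformers.image_utils, and the function body is illustrative rather than the PR's code.

```python
from enum import Enum

import numpy as np


class ChannelDimension(Enum):
    # Local stand-in for the enum of the same name in transformers.image_utils.
    FIRST = "channels_first"
    LAST = "channels_last"


def replicate_channels(
    img: np.ndarray,
    num_channels: int = 3,
    data_format: ChannelDimension = ChannelDimension.FIRST,
) -> np.ndarray:
    """Replicate a (height, width) image along a new channel axis."""
    if img.ndim != 2:
        raise ValueError(f"Expected a 2D (height, width) image, got shape {img.shape}")
    # Channel axis goes first or last depending on the requested format.
    axis = 0 if data_format is ChannelDimension.FIRST else -1
    return np.repeat(np.expand_dims(img, axis), num_channels, axis=axis)


gray = np.zeros((4, 5), dtype=np.uint8)
print(replicate_channels(gray).shape)                                     # (3, 4, 5)
print(replicate_channels(gray, data_format=ChannelDimension.LAST).shape)  # (4, 5, 3)
```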

amyeroberts · 2023-09-13T20:20:47Z

src/transformers/models/vit/image_processing_vit.py

        # All transformations expect numpy arrays.
        images = [to_numpy_array(image) for image in images]

+        # All transformations expect 3-channel images


Hmmm, the difficulty with to_channel_dimension_format is that the desired behaviour, if the input image is grayscale and we do

image = to_channel_dimension_format(image, ChannelDimension.FIRST)

isn't obvious: should we raise an error? Add a channel? Do nothing?

However, the reason for this channel setting in resize is to convert to a PIL image, which should be possible with grayscale images! I.e. it's the logic in resize and to_pil_image which should be updated to be compatible with grayscale images.

include ChannelDimension.NONE + function replicate_channels

63a734d

rafaelpadilla requested a review from amyeroberts August 25, 2023 20:42

rafaelpadilla added 2 commits August 25, 2023 18:02

include ChannelDimension.NONE + function replicate_channels

7427658

Merge branch 'support_no_channel_image' of github.com:rafaelpadilla/t…

79f5392

…ransformers into support_no_channel_image

amyeroberts reviewed Aug 29, 2023


amyeroberts reviewed Sep 13, 2023


rafaelpadilla mentioned this pull request Sep 21, 2023

Implementation of SuperPoint and AutoModelForInterestPointDescription #25786

Closed

5 tasks


amyeroberts added the WIP Label your PR/Issue with WIP for some long outstanding Issues/PRs that are work in progress label Apr 2, 2024


sbucaille mentioned this pull request Mar 12, 2025

Add EfficientLoFTR model #36355

Merged

5 tasks

rafaelpadilla wants to merge 3 commits into huggingface:main from rafaelpadilla:support_no_channel_image
