
include ChannelDimension.NONE + function replicate_channels#25767

Open
rafaelpadilla wants to merge 3 commits into huggingface:main from rafaelpadilla:support_no_channel_image

Contributor

@rafaelpadilla rafaelpadilla commented Aug 25, 2023

What does this PR do?

Fixes #25694

The ViT model currently doesn't support grayscale images with a (height, width) format, leading to preprocessing errors.

This PR addresses the issue with a new replicate_channels function. This function converts images in (height, width) format to a 3-channel RGB format (3, height, width), replicating the grayscale channel across all three RGB channels.
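The conversion described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the PR's actual implementation: the signature mirrors the one visible in the diff, but the function body here is assumed.

```python
import numpy as np


def replicate_channels(img: np.ndarray, num_channels: int = 3) -> np.ndarray:
    """Copy a 2D grayscale image into `num_channels` identical channels.

    Assumed behaviour (illustrative only): the output is channels-first,
    i.e. (num_channels, height, width).
    """
    if img.ndim != 2:
        raise ValueError(f"Expected a (height, width) image, got shape {img.shape}")
    # Add a leading channel axis, then repeat the single plane along it.
    return np.repeat(img[np.newaxis, ...], num_channels, axis=0)


gray = np.arange(6, dtype=np.uint8).reshape(2, 3)  # (height=2, width=3)
rgb = replicate_channels(gray)
print(rgb.shape)  # (3, 2, 3)
```

Every output channel is a copy of the input plane, so no pixel information is added or lost.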

While it's possible to integrate format checks and modifications within each processing function (like resize, rescale, normalize, to_channel_dimension_format, etc.), doing so might affect other modules using these functions. To avoid potential complications, I've opted for a direct solution.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@amyeroberts

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

# All transformations expect numpy arrays.
images = [to_numpy_array(image) for image in images]

# All transformations expect 3-channel images
Contributor

All of the transforms also accept images with 1 channel, so we don't necessarily need to replicate 3x. Only normalization expects 3 channels, and only when 3 values are provided for mean and std.
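To illustrate the normalization point, here is a minimal NumPy sketch. It is not the library's normalize function: `normalize_sketch` and its explicit length check are assumptions made for illustration. A 1-channel image works with one mean/std value, while three values require three channels.

```python
import numpy as np


def normalize_sketch(image: np.ndarray, mean, std) -> np.ndarray:
    """Per-channel normalization of a channels-last (height, width, channels) image."""
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    num_channels = image.shape[-1]
    # Without this check NumPy would broadcast (H, W, 1) against (3,)
    # and silently return a 3-channel result.
    if mean.shape != (num_channels,) or std.shape != (num_channels,):
        raise ValueError(
            f"mean/std must have {num_channels} values, got {mean.size} and {std.size}"
        )
    return (image.astype(np.float32) - mean) / std


one_channel = np.full((2, 2, 1), 0.5, dtype=np.float32)

# One value per channel: works for a single-channel image.
out = normalize_sketch(one_channel, mean=[0.5], std=[0.5])

# Three values with a single-channel image: rejected.
try:
    normalize_sketch(one_channel, mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```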

Contributor Author

@rafaelpadilla rafaelpadilla Aug 30, 2023

Just noticed that resize also requires 3-channel images as it ends up calling to_channel_dimension_format, which needs 3-channel images for transposing here.

I could keep calling replicate_channels to make 3-channel images and just change that comment. Or I could modify to_channel_dimension_format so that it supports 1-channel images, but that could affect other models that use it.

What do you think?

Contributor

Hmm, the difficulty with to_channel_dimension_format is that the desired behaviour when the input image is grayscale and we do

image = to_channel_dimension_format(image, ChannelDimension.FIRST)

isn't obvious: should we raise an error? Add a channel? Do nothing?

However, the reason for this channel setting in resize is to convert to a PIL image, which should be possible with grayscale images! I.e. it's the logic in resize and to_pil_image which should be updated to be compatible with grayscale images.
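The three options raised above (raise an error, add a channel, do nothing) can be made concrete with a small NumPy sketch. The function name `to_channels_first_sketch` and the `on_grayscale` parameter are hypothetical and are not part of the library; the sketch only shows what each choice would mean for a 2D input.

```python
import numpy as np


def to_channels_first_sketch(image: np.ndarray, on_grayscale: str = "error") -> np.ndarray:
    """Move channels to the front; `on_grayscale` picks a policy for 2D inputs."""
    if image.ndim == 2:
        if on_grayscale == "error":
            raise ValueError("Unsupported number of image dimensions: 2")
        if on_grayscale == "add_channel":
            return image[np.newaxis, ...]  # (H, W) -> (1, H, W)
        if on_grayscale == "noop":
            return image  # leave the 2D array untouched
        raise ValueError(f"Unknown policy: {on_grayscale}")
    # Assumes a channels-last (H, W, C) input for the 3D case.
    return image.transpose(2, 0, 1)


gray = np.zeros((4, 5), dtype=np.uint8)
added = to_channels_first_sketch(gray, on_grayscale="add_channel")
untouched = to_channels_first_sketch(gray, on_grayscale="noop")
color = to_channels_first_sketch(np.zeros((4, 5, 3), dtype=np.uint8))
```

Each policy gives callers a different output shape for the same input, which is why picking one silently inside a shared helper is risky.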

Contributor Author

You have a good point. :)

However, to_pil_image is not limited to ViT, and modifying it may affect other models. I am not sure how other models should behave with grayscale images - or even if they should support them.

Now, if the idea is to make only ViT transformations handle single-channel images on their own, we could change the logic in resize, rescale and normalize. What do you think?

Contributor

@amyeroberts amyeroberts left a comment

Just some small comments.

I think using replicate_channels is OK for now - we can always adapt the transformations to accept grayscale images in the future.

One thing to consider is the inverse function. Even if this enables us to pass the image through the image processor, we don't necessarily want 3-channel image outputs, so we will need a way to reduce the channels again, plus logic in the image processor to replicate or reduce depending on the desired output.
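A possible shape for the inverse operation is sketched below. Both helper names (`replicate` and `reduce_channels`) are hypothetical, and the sketch assumes a channels-first layout in which all channels are still identical.

```python
import numpy as np


def replicate(gray: np.ndarray, num_channels: int = 3) -> np.ndarray:
    """(H, W) -> (num_channels, H, W) with identical channels."""
    return np.repeat(gray[np.newaxis, ...], num_channels, axis=0)


def reduce_channels(img: np.ndarray) -> np.ndarray:
    """(C, H, W) -> (H, W), valid only if every channel is identical."""
    if img.ndim != 3:
        raise ValueError(f"Expected a (channels, height, width) image, got {img.shape}")
    if not all(np.array_equal(img[0], channel) for channel in img[1:]):
        raise ValueError("Channels differ; the image cannot be reduced losslessly")
    return img[0]


gray = np.arange(12, dtype=np.float32).reshape(3, 4)
round_trip = reduce_channels(replicate(gray))
```

Note that a per-channel transform applied in between (for example, normalization with a different mean per channel) would make the channels diverge, so a real implementation would need to decide how to reduce in that case.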

return to_numpy(img)


def replicate_channels(img: np.ndarray, num_channels: int = 3) -> np.ndarray:
Contributor

We should be able to specify the output data format with this function.

Contributor Author

You mean the type, e.g. np.uint8 or np.float?

Contributor

Oh, sorry, I meant adding an input argument data_format, which would be one of the enum values from ChannelDimension.
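One way the suggested data_format argument could look is sketched below. The enum is a local stand-in defined only so the example runs on its own, and the function body is an assumption rather than the code in this PR.

```python
from enum import Enum

import numpy as np


class ChannelDimension(Enum):
    # Local stand-in for the library enum, defined here so the
    # example is self-contained (assumption for illustration).
    FIRST = "channels_first"
    LAST = "channels_last"


def replicate_channels(
    img: np.ndarray,
    num_channels: int = 3,
    data_format: ChannelDimension = ChannelDimension.FIRST,
) -> np.ndarray:
    """Replicate a 2D grayscale image into the requested channel layout."""
    if img.ndim != 2:
        raise ValueError(f"Expected a (height, width) image, got shape {img.shape}")
    # Insert the channel axis where the caller asked for it, then repeat along it.
    axis = 0 if data_format == ChannelDimension.FIRST else -1
    return np.repeat(np.expand_dims(img, axis), num_channels, axis=axis)


gray = np.ones((4, 5), dtype=np.uint8)
first = replicate_channels(gray, data_format=ChannelDimension.FIRST)
last = replicate_channels(gray, data_format=ChannelDimension.LAST)
print(first.shape, last.shape)  # (3, 4, 5) (4, 5, 3)
```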

@amyeroberts amyeroberts added the WIP Label your PR/Issue with WIP for some long outstanding Issues/PRs that are work in progress label Apr 2, 2024
@sbucaille sbucaille mentioned this pull request Mar 12, 2025
5 tasks

Labels

WIP Label your PR/Issue with WIP for some long outstanding Issues/PRs that are work in progress

Development

Successfully merging this pull request may close these issues.

ValueError: Unsupported number of image dimensions: 2 - An error during embedding Image data

3 participants