include ChannelDimension.NONE + function replicate_channels#25767
rafaelpadilla wants to merge 3 commits into huggingface:main
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
…ransformers into support_no_channel_image
# All transformations expect numpy arrays.
images = [to_numpy_array(image) for image in images]

# All transformations expect 3-channel images
All of the transforms also accept images with 1 channel, so we don't necessarily need to replicate 3x. It only expects 3 channels in the case of normalization, when 3 values are provided for mean and std.
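The point about normalization can be illustrated with a small NumPy-only sketch. `normalize_channels_last` is a hypothetical helper written for this example, not the library's `normalize`; it assumes a channels-last `(height, width, channels)` layout and simply checks that the number of statistics matches the number of channels.

```python
import numpy as np


def normalize_channels_last(image, mean, std):
    """Hypothetical helper: per-channel normalization of a (H, W, C) array."""
    mean = np.asarray(mean, dtype=np.float32)
    std = np.asarray(std, dtype=np.float32)
    num_channels = image.shape[-1]
    # The statistics must line up one-to-one with the channel axis.
    if mean.shape != (num_channels,) or std.shape != (num_channels,):
        raise ValueError(
            f"expected {num_channels} mean/std values, "
            f"got {mean.size} and {std.size}"
        )
    return (image.astype(np.float32) - mean) / std


rgb = np.ones((4, 4, 3), dtype=np.float32)
gray = np.ones((4, 4, 1), dtype=np.float32)

# Three statistics with a 3-channel image: accepted.
out_rgb = normalize_channels_last(rgb, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5])
# One statistic with a 1-channel image: also accepted, no replication needed.
out_gray = normalize_channels_last(gray, [0.5], [0.5])
```

Under these assumptions, only the combination of a 1-channel image with 3 mean/std values is rejected, which matches the case described in the comment.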
Just noticed that resize also requires 3-channel images, as it ends up calling to_channel_dimension_format, which needs 3-channel images for transposing.
I could keep calling replicate_channels to make 3-channel images and just change that comment. Or I could modify to_channel_dimension_format so that it supports 1-channel images, but that could affect other models that make use of it.
What do you think?
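A plain NumPy sketch of why the transpose step is the sticking point. This assumes the conversion between channels-last and channels-first is a three-axis permutation; it does not call to_channel_dimension_format itself, and the array shapes are made up for illustration.

```python
import numpy as np

gray = np.zeros((32, 48), dtype=np.uint8)         # (height, width), no channel axis
rgb_last = np.zeros((32, 48, 3), dtype=np.uint8)  # (height, width, channels)

# Channels-last to channels-first is a permutation of three axes.
rgb_first = rgb_last.transpose((2, 0, 1))

# The same permutation cannot be applied to a 2-D array.
try:
    gray.transpose((2, 0, 1))
    transpose_error = None
except ValueError as exc:
    transpose_error = exc

# With an explicit channel axis of size 1 the permutation is valid again.
gray_first = gray[..., np.newaxis].transpose((2, 0, 1))
```

In this sketch the failure comes from the missing channel axis rather than from the channel count: a `(height, width, 1)` array transposes without problems.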
Hmmm, the difficulty with to_channel_dimension_format is that the desired behaviour if the input image is grayscale and we do
image = to_channel_dimension_format(image, ChannelDimension.FIRST)
isn't obvious: should we raise an error? Add a channel? Do nothing?
However, the reason for this channel setting in resize is to convert to a PIL image, which should be possible with grayscale images! I.e. it's the logic in resize and to_pil_image which should be updated to be compatible with grayscale images.
You have a good point. :)
However, to_pil_image is not limited to ViT, and modifying it may affect other models. I am not sure how other models should behave with grayscale images - or even if they should support them.
Now, if the idea is to make only ViT transformations handle single-channel images on their own, we could change the logic in resize, rescale and normalize. What do you think?
amyeroberts left a comment
Just some small comments.
I think using replicate_channels is OK for now - we can always adapt the transformations to accept grayscale images in the future.
One thing to consider is the inverse function. Even if this enables us to pass the image through the image processor, we don't necessarily want 3-channel image outputs, so we will need a way to reduce the channels again, plus logic in the image processor to replicate or reduce depending on the desired output.
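One possible shape for such an inverse, as a hedged sketch only: `reduce_channels` is a hypothetical name and is not part of this PR. It assumes the channels are still exact copies of each other and refuses to collapse them otherwise.

```python
import numpy as np


def reduce_channels(image, channel_axis=0):
    """Hypothetical inverse of replication: keep one channel if all are identical."""
    first = np.take(image, 0, axis=channel_axis)
    expanded = np.expand_dims(first, axis=channel_axis)
    # Collapsing is only lossless when every channel equals the first one.
    if not np.array_equal(np.broadcast_to(expanded, image.shape), image):
        raise ValueError("channels differ; not a replicated grayscale image")
    return first


gray = np.arange(12, dtype=np.uint8).reshape(3, 4)
replicated = np.stack([gray] * 3, axis=0)  # (3, height, width)
restored = reduce_channels(replicated)     # back to (height, width)
```

Note that the equality check would fail after a normalization that uses different mean or std values per channel, so a real implementation would have to decide how to reduce in that case.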
    return to_numpy(img)


def replicate_channels(img: np.ndarray, num_channels: int = 3) -> np.ndarray:
We should be able to specify the output data format with this function
You mean the type e.g. np.uint8, np.float?
Oh, sorry, I meant adding an input argument data_format, which would be one of the enum values from ChannelDimension
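A minimal sketch of what that suggestion could look like. The function body is an assumption, since the diff above only shows the signature, and the `ChannelDimension` enum below is a local stand-in defined for this example rather than the one imported from the library.

```python
from enum import Enum

import numpy as np


class ChannelDimension(Enum):
    # Local stand-in for illustration only; not imported from the library.
    FIRST = "channels_first"
    LAST = "channels_last"


def replicate_channels(img, num_channels=3, data_format=ChannelDimension.FIRST):
    """Sketch: copy a (height, width) image along a new channel axis."""
    data_format = ChannelDimension(data_format)  # also accepts the string values
    if img.ndim != 2:
        raise ValueError(f"expected a (height, width) array, got shape {img.shape}")
    # The requested format decides where the new channel axis is placed.
    axis = 0 if data_format is ChannelDimension.FIRST else -1
    return np.stack([img] * num_channels, axis=axis)


gray = np.arange(10, dtype=np.uint8).reshape(2, 5)
channels_first = replicate_channels(gray)
channels_last = replicate_channels(gray, data_format=ChannelDimension.LAST)
```

Defaulting to channels-first here mirrors the `(3, height, width)` output described in the PR description; that default is a choice made for this sketch.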
What does this PR do?
Fixes #25694
The ViT model currently doesn't support grayscale images with a (height, width) format, leading to preprocessing errors.
This PR addresses the issue with a new replicate_channels function. This function converts images in (height, width) format to a 3-channel RGB format (3, height, width), replicating the grayscale channel across all three RGB channels.
While it's possible to integrate format checks and modifications within each processing function (like resize, rescale, normalize, to_channel_dimension_format, etc.), doing so might affect other modules using these functions. To avoid potential complications, I've opted for a direct solution.
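The "direct solution" can be pictured as a single conversion step placed before the existing transforms, so that none of them needs to change. The sketch below is a simplified stand-in, not the ViT image processor: `ensure_three_channels` and `preprocess` are hypothetical names, and the rescale factor, mean and std are illustrative defaults.

```python
import numpy as np


def ensure_three_channels(image):
    """Hypothetical pre-step: (H, W) -> (3, H, W); other shapes pass through."""
    if image.ndim == 2:
        return np.stack([image] * 3, axis=0)
    return image


def preprocess(images, rescale_factor=1 / 255, mean=0.5, std=0.5):
    """Simplified pipeline: convert, replicate if needed, rescale, normalize."""
    images = [np.asarray(image) for image in images]
    # The only grayscale-specific step; everything after it sees 3 channels.
    images = [ensure_three_channels(image) for image in images]
    images = [image.astype(np.float32) * rescale_factor for image in images]
    images = [(image - mean) / std for image in images]
    return images


batch = preprocess([
    np.full((8, 8), 255, dtype=np.uint8),  # grayscale, no channel axis
    np.zeros((3, 8, 8), dtype=np.uint8),   # already 3-channel, channels-first
])
```

The trade-off is the one raised in the review: the shared transforms stay untouched, at the cost of always producing 3-channel outputs for grayscale inputs.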
Who can review?
@amyeroberts