Redundant to_channel_dimension_format() call makes preprocessing fail in case the image has height of 1 pixel by dhansmair · Pull Request #20728 · huggingface/transformers

dhansmair · 2022-12-12T11:29:12Z

In the resize() function in image_transforms.py (line 267), I think the call image = to_channel_dimension_format(image, ChannelDimension.LAST) is redundant, because the same conversion is also applied in the to_pil_image() call that follows.

This redundant call actually makes the CLIP preprocessing fail in special cases. The problem can be reproduced with the following code snippet:

import torch
from transformers.models.clip import CLIPFeatureExtractor
vision_processor = CLIPFeatureExtractor.from_pretrained('openai/clip-vit-large-patch14')
images = [
    torch.rand(size=(3, 2, 10), dtype=torch.float),
    torch.rand(size=(3, 10, 1), dtype=torch.float),
    torch.rand(size=(3, 1, 10), dtype=torch.float)
]
for image in images:
    processed_image = vision_processor(images=image, return_tensors="pt")['pixel_values']
    print(processed_image.shape)
    assert processed_image.shape == torch.Size([1, 3, 224, 224])

The last image has a height of 1 pixel.
The second call to to_channel_dimension_format() transposes the image again, so the height dimension is wrongly treated as the channel dimension afterwards. Because of this, the following normalize() step raises an exception.
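
To make the failure mode easier to follow, here is a minimal sketch that tracks shapes only. The helpers infer_channel_axis() and to_channels_last() are hypothetical stand-ins, not the library's functions; they assume the channel axis is guessed from whether the first or last dimension has size 1 or 3, which is consistent with the behaviour described above but may differ from the actual implementation.

```python
def infer_channel_axis(shape):
    """Hypothetical heuristic: guess the channel axis from a 3-D shape.

    Assumption: a leading dimension of size 1 or 3 is taken as channels-first;
    otherwise a trailing dimension of size 1 or 3 is taken as channels-last.
    """
    if shape[0] in (1, 3):
        return "first"
    if shape[-1] in (1, 3):
        return "last"
    raise ValueError(f"cannot infer channel axis for shape {shape}")


def to_channels_last(shape):
    """Return the shape after converting to channels-last (H, W, C)."""
    if infer_channel_axis(shape) == "first":
        c, h, w = shape
        return (h, w, c)
    return shape  # already treated as channels-last, left unchanged


# The three input shapes (C, H, W) from the reproduction snippet.
shapes = [(3, 2, 10), (3, 10, 1), (3, 1, 10)]

for shape in shapes:
    once = to_channels_last(shape)   # the intended conversion
    twice = to_channels_last(once)   # the redundant second conversion
    print(shape, "->", once, "->", twice)
```

Under these assumptions only the height-1 image changes on the second conversion: its channels-last shape starts with a 1, so it is mistaken for a single-channel, channels-first image and transposed again.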

An image of height 1 pixel honestly doesn't make much sense, but it happened in my training on visual genome region descriptions and took me a while to track down the problem.

What does this PR do?

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline,
Pull Request section?
Was this discussed/approved via a Github issue or the forum? Please add a link
to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the
documentation guidelines, and
here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.


HuggingFaceDocBuilderDev · 2022-12-12T11:43:49Z

The documentation is not available anymore as the PR was closed or merged.

sgugger · 2022-12-12T14:50:20Z

cc @amyeroberts

amyeroberts

Thanks for finding the issue and fix!

dhansmair · 2022-12-13T11:00:25Z

sure thing!


amyeroberts approved these changes Dec 12, 2022


sgugger merged commit 30d8919 into huggingface:main Dec 13, 2022

sgugger merged 1 commit into huggingface:main from dhansmair:clip-preprocess-resize-fix
