Closed
Description
System Info
- transformers version: 4.45.0.dev0
- Platform: Linux-5.15.0-105-generic-x86_64-with-glibc2.35
- Python version: 3.11.0
- Huggingface_hub version: 0.24.7
- Safetensors version: 0.4.5
- Accelerate version: 0.34.2
- Accelerate config: not found
- PyTorch version (GPU?): 2.3.1+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?:
- Using GPU in script?:
- GPU type: NVIDIA A100-SXM4-80GB
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
We found a numerical inaccuracy in unpad_image that shows up when original_aspect_ratio == current_aspect_ratio. This happens in DocVQA on training sample 32673. See the snippet below:
import torch

original_size = torch.tensor([2136, 3212], device="cuda:0", dtype=torch.bfloat16)
original_height, original_width = original_size
current_height, current_width = 108, 162
original_aspect_ratio = original_width / original_height  # tensor(1.5000)
current_aspect_ratio = current_width / current_height  # 1.5
# Equal ratios: the width is rescaled using the height ratio.
scale_factor = current_height / original_height
new_width = int(original_width * scale_factor)  # 163
Testing showed that this inaccuracy does not occur if original_height and original_width are integers.
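The equal aspect ratios in the snippet follow from bfloat16 rounding: with only 8 significant bits, values between 2048 and 4096 are stored as multiples of 16, so 2136 and 3212 become 2144 and 3216. The sketch below emulates that rounding in plain Python so it can be checked without a GPU. `to_bfloat16` is a hypothetical helper written for this illustration, not a PyTorch function, and it ignores NaN and infinity.

```python
import struct


def to_bfloat16(value: float) -> float:
    """Round a float to the nearest bfloat16 value (ties to even).

    Hypothetical helper for illustration only; NaN and infinity are not handled.
    """
    # Reinterpret the float32 encoding of the value as a 32-bit unsigned integer.
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    # bfloat16 keeps the top 16 bits; round the discarded low 16 bits to nearest even.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack(">f", struct.pack(">I", bits))[0]


height_bf16 = to_bfloat16(2136.0)
width_bf16 = to_bfloat16(3212.0)
print(height_bf16, width_bf16)   # 2144.0 3216.0
print(width_bf16 / height_bf16)  # 1.5, equal to 162 / 108
print(3212 / 2136)               # roughly 1.5037 for the unrounded sizes
```

This only explains why the two ratios compare equal. It does not reproduce the final 163, which also depends on how PyTorch evaluates the bfloat16 division and multiplication.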
In the docstring, unpad_image asks for original_size to be a tuple (there is no type annotation, though), but it will always receive a torch.Tensor, as the calling code shows:
"""
Args:
image_features (`List[torch.Tensor]` of length num_images, each of shape `(num_patches, image_length, embed_dim)`)
List of image feature tensor, each contains all the visual feature of all patches.
image_sizes (`torch.Tensor` of shape `(num_images, 2)`)
Actual image size of each images (H, W)."""
.
.
.
image_feature = unpad_image(image_feature, image_sizes[image_idx])
Expected behavior
The new_width value should be 162. To see this, write down the formulas for the two aspect ratios, set them equal, and multiply by current_height: this gives original_width * scale_factor = current_width (= new_width).
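One possible way to get the expected 162 is to convert the sizes to Python integers and stay in integer arithmetic. The following is a minimal sketch, not the library's implementation: `unpadded_size` is a hypothetical name, and the branch that decides which dimension was padded is an assumption about how unpad_image works.

```python
def unpadded_size(original_size, current_size):
    """Hypothetical helper: (height, width) after removing padding, integers only."""
    # int() also accepts 0-dim tensors, so tuple and tensor inputs both work.
    original_height, original_width = (int(v) for v in original_size)
    current_height, current_width = (int(v) for v in current_size)

    # Compare aspect ratios by cross-multiplication instead of float division.
    if original_width * current_height > current_width * original_height:
        # Assumed case: the original is relatively wider, so the height shrinks.
        new_height = original_height * current_width // original_width
        return new_height, current_width

    # Otherwise the width shrinks, or stays the same when the ratios are equal.
    new_width = original_width * current_height // original_height
    return current_height, new_width


# Sizes as bfloat16 stores them (equal aspect ratios): width stays 162.
print(unpadded_size((2144, 3216), (108, 162)))  # (108, 162)
# Exact integer sizes from the report: the ratios differ, so the height is cropped.
print(unpadded_size((2136, 3212), (108, 162)))  # (107, 162)
```

Converting an already rounded bfloat16 tensor still yields the rounded values (2144, 3216), so the sizes would ideally be kept in an integer dtype before they reach this step. Floor division is also not guaranteed to match `int(float)` truncation in every case, so treat this as an illustration rather than a drop-in replacement.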
PS: This is my first issue ever, so please be patient.