Fuyu processing update #27133
amyeroberts merged 46 commits into huggingface:fuyu_follow_up_image_processing from
Conversation
|
@pcuenca Here's the draft PR for updating the image processor. In relation to your PR with the box coordinate transformations, you'll notice that I've removed the |
|
cc @molbap |
|
|
@amyeroberts Nice! I'll update accordingly. |
| patches = patches.reshape(batch_size, -1, channels * patch_height * patch_width) | ||
| return patches | ||
|
|
||
| def preprocess_with_tokenizer_info( |
This was renamed to `preprocess_with_tokenizer_info` to match the naming pattern used by other image processors: `preprocess` for creating the model inputs, `post_process_xxx` for processing the model outputs for a specific downstream task.
Ok, good to know! Thanks for the explanation.
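For readers following the hunk above, the final `reshape` flattens every patch into a single vector of length `channels * patch_height * patch_width`. The following is a minimal NumPy sketch of that idea, not the code from this PR: the function name `patchify`, the channels-first input layout, and the element ordering inside each patch are all assumptions made for illustration.

```python
import numpy as np


def patchify(images: np.ndarray, patch_height: int, patch_width: int) -> np.ndarray:
    """Split (batch, channels, height, width) images into flattened patches.

    Hypothetical helper: layout and in-patch ordering are assumptions.
    """
    batch_size, channels, height, width = images.shape
    # Sense check: the image must tile exactly into patches.
    if height % patch_height or width % patch_width:
        raise ValueError("Image height and width must be divisible by the patch size")
    rows, cols = height // patch_height, width // patch_width
    # (batch, channels, rows, patch_height, cols, patch_width)
    patches = images.reshape(batch_size, channels, rows, patch_height, cols, patch_width)
    # Bring the patch grid forward: (batch, rows, cols, channels, patch_height, patch_width)
    patches = patches.transpose(0, 2, 4, 1, 3, 5)
    # Same final step as in the hunk: one flat vector per patch.
    return patches.reshape(batch_size, -1, channels * patch_height * patch_width)


images = np.arange(2 * 3 * 4 * 6, dtype=np.float32).reshape(2, 3, 4, 6)
patches = patchify(images, patch_height=2, patch_width=3)
print(patches.shape)  # (2, 4, 18)
```

Each 4x6 image yields a 2x2 grid of patches, so the second dimension is 4 and each patch holds 3 * 2 * 3 = 18 values.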
| # Copied from transformers.models.detr.image_processing_detr.max_across_indices | ||
| def max_across_indices(values: Iterable[Any]) -> List[Any]: | ||
| """ | ||
| Return the maximum value across all indices of an iterable of values. | ||
| """ | ||
| return [max(values_i) for values_i in zip(*values)] | ||
|
|
||
|
|
||
| # Copied from transformers.models.detr.image_processing_detr.get_max_height_width | ||
| def get_max_height_width( | ||
| images: List[np.ndarray], input_data_format: Optional[Union[str, ChannelDimension]] = None | ||
| ) -> List[int]: | ||
| """ | ||
| Get the maximum height and width across all images in a batch. | ||
| """ | ||
| if input_data_format is None: | ||
| input_data_format = infer_channel_dimension_format(images[0]) | ||
|
|
||
| if input_data_format == ChannelDimension.FIRST: | ||
| _, max_height, max_width = max_across_indices([img.shape for img in images]) | ||
| elif input_data_format == ChannelDimension.LAST: | ||
| max_height, max_width, _ = max_across_indices([img.shape for img in images]) | ||
| else: | ||
| raise ValueError(f"Invalid channel dimension format: {input_data_format}") | ||
| return (max_height, max_width) | ||
|
|
||
|
|
||
| # Copied from transformers.models.detr.image_processing_detr.make_pixel_mask | ||
| def make_pixel_mask( | ||
| image: np.ndarray, output_size: Tuple[int, int], input_data_format: Optional[Union[str, ChannelDimension]] = None | ||
| ) -> np.ndarray: | ||
| """ | ||
| Make a pixel mask for the image, where 1 indicates a valid pixel and 0 indicates padding. | ||
|
|
||
| Args: | ||
| image (`np.ndarray`): | ||
| Image to make the pixel mask for. | ||
| output_size (`Tuple[int, int]`): | ||
| Output size of the mask. | ||
| """ | ||
| input_height, input_width = get_image_size(image, channel_dim=input_data_format) | ||
| mask = np.zeros(output_size, dtype=np.int64) | ||
| mask[:input_height, :input_width] = 1 | ||
| return mask |
These were removed as they didn't appear to be used anywhere in the processing logic
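For context on what the removed helpers computed, here is a small self-contained sketch that reuses their logic on a toy batch. It is simplified to channels-first arrays only (the originals also handle channels-last via `input_data_format`), and the toy image shapes are made up for illustration.

```python
import numpy as np


def max_across_indices(values):
    # Element-wise maximum over a collection of equally long tuples.
    return [max(values_i) for values_i in zip(*values)]


def get_max_height_width(images):
    # Simplification: assumes every image is (channels, height, width).
    _, max_height, max_width = max_across_indices([img.shape for img in images])
    return (max_height, max_width)


def make_pixel_mask(image, output_size):
    # 1 marks a valid pixel, 0 marks padding.
    input_height, input_width = image.shape[-2:]
    mask = np.zeros(output_size, dtype=np.int64)
    mask[:input_height, :input_width] = 1
    return mask


# Two toy images with different heights and widths.
images = [np.ones((3, 4, 2)), np.ones((3, 2, 5))]
output_size = get_max_height_width(images)
masks = [make_pixel_mask(img, output_size) for img in images]
print(output_size)  # (4, 5)
print(masks[0])
```

The mask for the first image has ones in its top-left 4x2 region and zeros elsewhere, which is the information a model would need to ignore padded pixels.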
| return mask | ||
|
|
||
|
|
||
| class FuyuBatchEncoding(BatchEncoding): |
This was replaced with BatchFeature as the processor contains image_patches which are of float type
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
…rocessing-update-coordinates
Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>
| target_width: int = 1920, | ||
| do_resize: bool = True, | ||
| size: Optional[Dict[str, int]] = None, | ||
| resample: PILImageResampling = PILImageResampling.BILINEAR, # FIXME check default value |
| resample: PILImageResampling = PILImageResampling.BILINEAR, # FIXME check default value | |
| resample: PILImageResampling = PILImageResampling.BILINEAR, |
This is how it was done in the original code: https://huggingface.co/adept-hf-collab/adept-mm/blob/736c6b570b2a9c0367a3266746fd1f53cfff0a2b/mm-inference-for-hf/multimodal/data/image_utils.py#L208
BILINEAR seems correct, as our resizing is always done on PIL images and antialias is True in that case.
| if is_vision_available(): | ||
| from .image_processing_fuyu import FuyuImageProcessor |
| if is_vision_available(): | |
| from .image_processing_fuyu import FuyuImageProcessor | |
| from .image_processing_fuyu import FuyuImageProcessor |
Otherwise I think import FuyuProcessor would fail if torchvision is not installed.
This helped uncover a bug! The image processor was being reset, overwriting the user's input here. If we get rid of that, then we don't need this import at all
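The kind of bug described here can be sketched with toy classes. Everything below is hypothetical and only illustrates the pattern of a constructor discarding the object the user passed in; `ToyImageProcessor`, `BuggyProcessor` and `FixedProcessor` are not names from the library, and only the `target_width` default of 1920 comes from the diff above.

```python
class ToyImageProcessor:
    """Stand-in for an image processor with one configurable setting."""

    def __init__(self, target_width: int = 1920):
        self.target_width = target_width


class BuggyProcessor:
    def __init__(self, image_processor, tokenizer):
        # Bug: a fresh default instance replaces the user's configured one.
        self.image_processor = ToyImageProcessor()
        self.tokenizer = tokenizer


class FixedProcessor:
    def __init__(self, image_processor, tokenizer):
        # Fix: keep the instance that was passed in, so no class import is needed here.
        self.image_processor = image_processor
        self.tokenizer = tokenizer


custom = ToyImageProcessor(target_width=64)
buggy = BuggyProcessor(custom, tokenizer=None)
fixed = FixedProcessor(custom, tokenizer=None)
print(buggy.image_processor.target_width)  # 1920, the user's setting was lost
print(fixed.image_processor.target_width)  # 64
```

Once the constructor stops re-instantiating the image processor, the module no longer needs to reference the image processor class at all, which is why the guarded import becomes unnecessary.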
| # Batch of two images - different sizes | ||
| images = [self.bus_image_pil, self.bus_image_pil.resize((64, 300))] | ||
| processor_outputs = self.processor(text=[self.text_prompt, self.text_prompt], images=images) | ||
| # FIXME - test outputs |
To be completed, this succeeds now.
I've added a test which checks the processing of an individual resized image, and then checks the padding for two differently sized images in a batch
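A padding check of this kind could look roughly like the sketch below. It is not the test added in this PR: `pad_to_common_size` is a hypothetical helper, and padding on the bottom and right with a constant value of 0 is an assumption, since the thread does not state where or with what value the processor pads.

```python
import numpy as np


def pad_to_common_size(images, pad_value=0.0):
    """Pad channels-last images on the bottom/right so they can be stacked.

    Hypothetical helper; pad side and pad value are assumptions.
    """
    max_height = max(img.shape[0] for img in images)
    max_width = max(img.shape[1] for img in images)
    padded = []
    for img in images:
        pad_height = max_height - img.shape[0]
        pad_width = max_width - img.shape[1]
        padded.append(
            np.pad(
                img,
                ((0, pad_height), (0, pad_width), (0, 0)),
                constant_values=pad_value,
            )
        )
    return np.stack(padded)


# Two toy images of different sizes, standing in for the original and resized bus image.
images = [np.ones((4, 6, 3), dtype=np.float32), np.ones((3, 2, 3), dtype=np.float32)]
batch = pad_to_common_size(images)

# The batch takes the largest height and width; padded regions hold the pad value.
assert batch.shape == (2, 4, 6, 3)
assert np.all(batch[1, :3, :2] == 1.0)
assert np.all(batch[1, 3:, :] == 0.0)
assert np.all(batch[1, :, 2:] == 0.0)
```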
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
|
LGTM, I'll add some tests related to the model in my PR! Ok to merge to #27007 when amyeroberts#113 is merged, and I'll add a model tester there |
Fuyu processing: handle coordinates
* Fix Fuyu image scaling bug
  It could produce negative padding and hence inference errors for certain image sizes.
* initial rework commit
* add batching capabilities, refactor image processing
* add functional batching for a list of images and texts
* make args explicit
* Fuyu processing update (#27133)
* Add file headers
* Add file headers
* First pass - preprocess method with standard args
* First pass image processor rework
* Small tweaks
* More args and docstrings
* Tidying iterating over batch
* Tidying up
* Modify to have quick tests (for now)
* Fix up
* BatchFeature
* Passing tests
* Add tests for processor
* Sense check when patchifying
* Add some tests
* FuyuBatchFeature
* Post-process box coordinates
* Update to `size` in processor
* Remove unused and duplicate constants
* Store unpadded dims after resize
* Fix up
* Return FuyuBatchFeature
* Get unpadded sizes after resize
* Update exception
* Fix return
* Convert input `<box>` coordinates to model format.
* Post-process point coords, support multiple boxes/points in a single sequence
* Replace constants
* Update src/transformers/models/fuyu/image_processing_fuyu.py
  Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Preprocess List[List[image]]
* Update src/transformers/models/fuyu/image_processing_fuyu.py
  Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update to Amy's latest state.
* post-processing returns a list of tensors
* Fix error when target_sizes is None
  Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>
* Update src/transformers/models/fuyu/image_processing_fuyu.py
  Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update src/transformers/models/fuyu/image_processing_fuyu.py
  Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update src/transformers/models/fuyu/image_processing_fuyu.py
  Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Update src/transformers/models/fuyu/image_processing_fuyu.py
  Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Review comments
* Update src/transformers/models/fuyu/image_processing_fuyu.py
  Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
* Fix up
* Fix up
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-72-126.ec2.internal>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: Pablo Montalvo <pablo.montalvo.leroux@gmail.com>
* Fix conflicts in fuyu_follow_up_image_processing (#27228)
  fixing conflicts and updating on main
* Revert "Fix conflicts in fuyu_follow_up_image_processing" (#27232)
  Revert "Fix conflicts in fuyu_follow_up_image_processing (#27228)"
  This reverts commit acce10b.
---------
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-72-126.ec2.internal>
What does this PR do?
This PR builds upon #27007, ticking off some elements in the TODO list and bringing the processor and image processor more in line with expected patterns in the library.