Add segmentation + object detection image processors #20160
amyeroberts merged 32 commits into huggingface:main
Conversation
Conditional DETR only supports instance and panoptic, cc @alaradirik
The semantic segmentation map can still be inferred though, should we keep post_process_semantic_segmentation @NielsRogge @amyeroberts ?
I'd be in favour of keeping them if we can get the segmentation maps out, even if it isn't officially a capability of the model.
I can definitely remove both the methods and the references in the documentation. I believe they were added in this PR, and so have been part of the official release since 4.23. I think we'd therefore need to add a deprecation message etc.
cc @sgugger
Indeed, let's not remove them if they were already documented (also, since this PR is big, let's keep it focused on the FE -> ImageProcessor switch; we can revisit this change in a follow-up PR).
alaradirik
left a comment
LGTM! Just left a few comments regarding post-processing methods.
docs/source/en/model_doc/detr.mdx
- post_process_semantic_segmentation
DETR only supports instance + panoptic segmentation.
src/transformers/models/conditional_detr/image_processing_conditional_detr.py
sgugger
left a comment
Thanks for working on this. It looks good to me except for the multiple places where docstrings begin with "Args:" followed by the description, then the actual arguments.
I tried to flag as many of them as possible. If `make style` changes them back, make sure you pull the latest from doc-builder, as a bug was fixed recently.
src/transformers/models/deformable_detr/image_processing_deformable_detr.py
@NielsRogge @sgugger @alaradirik Sorry for the previous issues with the docstrings. They should all be resolved now.
```diff
  self.assertTrue(torch.allclose(encoding["labels"][0]["class_labels"], expected_class_labels))
  # verify masks
- expected_masks_sum = 822338
+ expected_masks_sum = 822873
```
The values for DETR, Conditional DETR, Deformable DETR and YOLOS all changed for the same test here. There are 535 pixels different across the six 800 * 1066 pixel masks, representing a ~0.01% change.
This is because the resizing of the annotation masks is now performed by functionality in the image transforms library (using Pillow), whereas it was previously done by torch. This was done to make the preprocessing framework-agnostic.
Note: torch's "nearest" interpolation mode matches OpenCV's; the equivalent of scipy/Pillow's behaviour is "nearest-exact", cf. the torch.nn.functional.interpolate documentation. When "nearest-exact" is used, the number of differing pixels drops to 2.
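To make the difference concrete, here is a small sketch (illustrative, not the library's code) of the index mapping the two nearest-neighbour modes use when downscaling one row of a mask. The formulas are the commonly documented ones: "nearest" (torch/OpenCV) floors `dst * scale`, while "nearest-exact" (Pillow/scipy) samples at pixel centres.

```python
import math

def nearest_indices(out_size, in_size):
    # torch mode="nearest" (same as OpenCV): src = floor(dst * scale)
    scale = in_size / out_size
    return [min(in_size - 1, math.floor(i * scale)) for i in range(out_size)]

def nearest_exact_indices(out_size, in_size):
    # torch mode="nearest-exact" (matches Pillow/scipy):
    # src = floor((dst + 0.5) * scale), i.e. sample at pixel centres
    scale = in_size / out_size
    return [min(in_size - 1, math.floor((i + 0.5) * scale)) for i in range(out_size)]

mask_row = [0, 1, 2, 3]  # one row of a toy segmentation mask
print([mask_row[j] for j in nearest_indices(2, 4)])        # [0, 2]
print([mask_row[j] for j in nearest_exact_indices(2, 4)])  # [1, 3]
```

Downscaling the same 4-pixel row by 2x picks different source pixels under the two modes, which is exactly the kind of off-by-a-few-pixels drift seen in the mask-sum test values above.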
```python
# Convert from relative [0, 1] to absolute [0, height] coordinates
img_h, img_w = target_sizes.unbind(1)
scale_fct = torch.stack([img_w, img_h, img_w, img_h], dim=1)
target_boxes = target_boxes * scale_fct[:, None, :]
```
@alaradirik @NielsRogge there is a bug in this line that breaks image_guided_detection on CUDA.
Here is the tested fix:

```python
scale_fct = torch.stack([img_w, img_h, img_w, img_h], dim=1).to(target_boxes.device)
```

Squashed commits:
* Add transforms for object detection
* DETR models + Yolos
* Scrappy additions
* Maskformer image processor
* Fix up; MaskFormer tests
* Update owlvit processor
* Add to docs
* OwlViT tests
* Update pad logic
* Remove changes to transforms
* Import fn directly
* Update to include pad transformation
* Remove unintended changes
* Add new owlvit post processing function
* Tidy up
* Fix copies
* Fix some copies
* Include device fix
* Fix scipy imports
* Update _pad_image
* Update padding functionality
* Fix bug
* Properly handle ignore index
* Fix up
* Remove defaults to None in docstrings
* Fix docstrings & docs
* Fix sizes bug
* Resolve conflicts in init
* Cast to float after resizing
* Tidy & add size if missing
* Allow kwargs when processing for owlvit
* Update test values
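For reference, the box rescaling that this line performs can be sketched in NumPy (a hypothetical stand-in for the torch code, with `rescale_boxes` an illustrative name, not the library's API). In the torch version the bug is simply that `scale_fct` is built on the CPU while `target_boxes` lives on the GPU, hence the `.to(target_boxes.device)` fix.

```python
import numpy as np

def rescale_boxes(target_boxes, target_sizes):
    """Convert boxes from relative [0, 1] to absolute pixel coordinates.

    target_boxes: (batch, num_boxes, 4) in (x0, y0, x1, y1) relative coords
    target_sizes: (batch, 2) as (height, width)
    """
    img_h, img_w = target_sizes[:, 0], target_sizes[:, 1]
    # Mirrors torch.stack([img_w, img_h, img_w, img_h], dim=1)
    scale_fct = np.stack([img_w, img_h, img_w, img_h], axis=1)  # (batch, 4)
    return target_boxes * scale_fct[:, None, :]

boxes = np.array([[[0.25, 0.5, 0.75, 1.0]]])  # one relative box
sizes = np.array([[200, 400]])                # (height, width)
print(rescale_boxes(boxes, sizes))            # [[[100. 100. 300. 200.]]]
```

NumPy has no device concept, so the broadcast just works; in torch, both operands of the final multiply must be on the same device.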
What does this PR do?
Adds image processors for DETR, Deformable DETR, Conditional DETR, YOLOS and MaskFormer; many of the image processor methods are copied from DETR.
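Several commits in this PR touch the shared padding logic (`Update pad logic`, `_pad_image`, `Update padding functionality`). A minimal sketch of what that step does, assuming the common DETR-style convention of padding every image to the largest height/width in the batch and returning a pixel mask — function and variable names here are illustrative, not the library's actual API:

```python
import numpy as np

def pad_batch(images):
    """Pad a list of (channels, height, width) arrays to a common size.

    Returns stacked pixel values plus a pixel mask where 1 marks real
    pixels and 0 marks padding.
    """
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    pixel_values, pixel_mask = [], []
    for img in images:
        c, h, w = img.shape
        padded = np.zeros((c, max_h, max_w), dtype=img.dtype)
        padded[:, :h, :w] = img  # top-left aligned, zero padding elsewhere
        mask = np.zeros((max_h, max_w), dtype=np.int64)
        mask[:h, :w] = 1
        pixel_values.append(padded)
        pixel_mask.append(mask)
    return np.stack(pixel_values), np.stack(pixel_mask)

imgs = [np.ones((3, 2, 3)), np.ones((3, 4, 2))]
values, mask = pad_batch(imgs)
print(values.shape, mask.shape)  # (2, 3, 4, 3) (2, 4, 3)
```

The pixel mask lets the model's attention ignore padded regions, which is why the pad step returns it alongside the padded pixel values.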