Skip to content

Add segmentation + object detection image processors#20160

Merged
amyeroberts merged 32 commits intohuggingface:mainfrom
amyeroberts:add-image-processor-detr
Nov 30, 2022
Merged

Add segmentation + object detection image processors#20160
amyeroberts merged 32 commits intohuggingface:mainfrom
amyeroberts:add-image-processor-detr

Conversation

@amyeroberts
Copy link
Contributor

@amyeroberts amyeroberts commented Nov 10, 2022

What does this PR do?

Adds image processors for DETR, Deformable DETR, Conditional DETR, YOLOS and Maskformer, as many of the image processors methods are copied from DETR.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Nov 10, 2022

The documentation is not available anymore as the PR was closed or merged.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@amyeroberts amyeroberts force-pushed the add-image-processor-detr branch 5 times, most recently from 390817d to da942bd Compare November 21, 2022 17:17
@amyeroberts amyeroberts force-pushed the add-image-processor-detr branch from fcd9d98 to c07bcd3 Compare November 23, 2022 20:41
@amyeroberts amyeroberts marked this pull request as ready for review November 23, 2022 21:12
@amyeroberts amyeroberts requested review from NielsRogge, alaradirik and sgugger and removed request for NielsRogge and alaradirik November 23, 2022 21:14
Comment on lines 41 to 42
Copy link
Contributor

@NielsRogge NielsRogge Nov 25, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Conditional DETR only supports instance and panoptic, cc @alaradirik

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantic segmentation map can still be inferred though, should we keep post_process_semantic_segmentation @NielsRogge @amyeroberts ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be in favour of keeping if we can get the segmentation maps out, even if it isn't officially a capability of the model.

I can definitely remove both the methods and references in the documentation. I believe they were added in this PR - and so have been part of official release since 4.23. I think we'd therefore need to add a deprecation message etc.

cc @sgugger

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not remove them if they were already documented indeed (also since this PR is big, let's keep it focused on the switch FE->ImageProcessor, we can revisit this change in a followup PR).

Copy link
Contributor

@alaradirik alaradirik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Just left a few comments regarding post-processing methods.

Comment on lines 41 to 42
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The semantic segmentation map can still be inferred though, should we keep post_process_semantic_segmentation @NielsRogge @amyeroberts ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- post_process_semantic_segmentation

DETR only supports instance + panoptic segmentation.

Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this. It looks good to me except for the multiple places doccstrings begin with "Args:" followed y the description, then the actual arguments.

Tried to flag as many of them as possible. If make style changes them back, make sure you pull the latest from doc-builder as a bug was fixed recently.

@amyeroberts amyeroberts force-pushed the add-image-processor-detr branch 2 times, most recently from 992d7e9 to 430e2f2 Compare November 29, 2022 15:29
@amyeroberts amyeroberts force-pushed the add-image-processor-detr branch from 430e2f2 to 4d84ca6 Compare November 29, 2022 16:43
@amyeroberts
Copy link
Contributor Author

@NielsRogge @sgugger @alaradirik Sorry for the previous issues with the docstrings. They should all be resolved now.

Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

self.assertTrue(torch.allclose(encoding["labels"][0]["class_labels"], expected_class_labels))
# verify masks
expected_masks_sum = 822338
expected_masks_sum = 822873
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The values for DETR, Conditional DETR, Deformable DETR and YOLOS all changed for the same test here. There are 535 pixels different across the 6, 800 * 1066 pixel masks, representing a 0.01% change.

This is due to the resizing of the annotation masks now being performed by functionality in the image transforms library (using Pillow), whereas it was previously done by torch. This was done to make the preprocessing framework agnostic.

Note: the "nearest" mode for torch interpolation is the same function as Open CVs. The equivalent for scipy/Pillow is "nearest-exact" c.f. torch.nn.functional.interpolation documentation. When this is used, the number of pixels different is 2.

@amyeroberts amyeroberts merged commit de6d19e into huggingface:main Nov 30, 2022
@amyeroberts amyeroberts deleted the add-image-processor-detr branch November 30, 2022 10:24
# Convert from relative [0, 1] to absolute [0, height] coordinates
img_h, img_w = target_sizes.unbind(1)
scale_fct = torch.stack([img_w, img_h, img_w, img_h], dim=1)
target_boxes = target_boxes * scale_fct[:, None, :]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alaradirik @NielsRogge there is a bug in this line preventing image_guided_detection in cuda.

Here is the tested fix:

scale_fct = torch.stack([img_w, img_h, img_w, img_h], dim=1).to(target_boxes.device)

mpierrau pushed a commit to mpierrau/transformers that referenced this pull request Dec 15, 2022
* Add transforms for object detection

* DETR models + Yolos

* Scrappy additions

* Maskformer image processor

* Fix up; MaskFormer tests

* Update owlvit processor

* Add to docs

* OwlViT tests

* Update pad logic

* Remove changes to transforms

* Import fn directly

* Update to include pad transformation

* Remove uninstended changes

* Add new owlvit post processing function

* Tidy up

* Fix copies

* Fix some copies

* Include device fix

* Fix scipy imports

* Update _pad_image

* Update padding functionality

* Fix bug

* Properly handle ignore index

* Fix up

* Remove defaults to None in docstrings

* Fix docstrings & docs

* Fix sizes bug

* Resolve conflicts in init

* Cast to float after resizing

* Tidy & add size if missing

* Allow kwards when processing for owlvit

* Update test values
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants