Add segmentation + object detection image processors by amyeroberts · Pull Request #20160 · huggingface/transformers

amyeroberts · 2022-11-10T11:13:54Z

What does this PR do?

Adds image processors for DETR, Deformable DETR, Conditional DETR, YOLOS and MaskFormer. They are added together because many of the image processor methods are copied from DETR.

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

HuggingFaceDocBuilderDev · 2022-11-10T11:26:05Z

The documentation is not available anymore as the PR was closed or merged.

HuggingFaceDocBuilderDev · 2022-11-10T18:14:13Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

NielsRogge · 2022-11-25T10:29:40Z

docs/source/en/model_doc/conditional_detr.mdx

Conditional DETR only supports instance and panoptic segmentation, cc @alaradirik

alaradirik · 2022-11-28T10:00:59Z

The semantic segmentation map can still be inferred though, should we keep post_process_semantic_segmentation @NielsRogge @amyeroberts ?

amyeroberts · 2022-11-28

I'd be in favour of keeping them if we can get the segmentation maps out, even if it isn't officially a capability of the model.

I can definitely remove both the methods and the references in the documentation. I believe they were added in this PR, and so have been part of the official release since 4.23. I think we'd therefore need to add a deprecation message etc.

cc @sgugger

sgugger · 2022-11-28

Let's not remove them if they were already documented indeed (also since this PR is big, let's keep it focused on the switch FE->ImageProcessor, we can revisit this change in a followup PR).

alaradirik

LGTM! Just left a few comments regarding post-processing methods.

docs/source/en/model_doc/maskformer.mdx

alaradirik · 2022-11-28T10:00:59Z

docs/source/en/model_doc/conditional_detr.mdx

The semantic segmentation map can still be inferred though, should we keep post_process_semantic_segmentation @NielsRogge @amyeroberts ?

NielsRogge · 2022-11-28T10:07:52Z

docs/source/en/model_doc/detr.mdx

Suggested change

- post_process_semantic_segmentation

DETR only supports instance + panoptic segmentation.

src/transformers/models/conditional_detr/image_processing_conditional_detr.py

sgugger

Thanks for working on this. It looks good to me except for the multiple places where docstrings begin with "Args:" followed by the description, then the actual arguments.

Tried to flag as many of them as possible. If make style changes them back, make sure you pull the latest from doc-builder as a bug was fixed recently.
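As a hedged illustration of the layout being asked for, the sketch below uses a hypothetical function (rescale_example is not code from this PR): the description comes first, and "Args:" introduces only the argument list.

```python
import inspect


def rescale_example(image, scale):
    """
    Rescale an image by a constant factor.

    Args:
        image (`list[float]`):
            Pixel values to rescale.
        scale (`float`):
            Factor applied to every pixel value.
    """
    # Hypothetical helper used only to carry the docstring layout.
    return [pixel * scale for pixel in image]


# After cleaning, the description is the first line and "Args:" follows it.
doc_lines = inspect.getdoc(rescale_example).splitlines()
print(doc_lines[0])
print(doc_lines.index("Args:"))
```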

src/transformers/models/deformable_detr/image_processing_deformable_detr.py

src/transformers/models/detr/image_processing_detr.py

src/transformers/models/yolos/image_processing_yolos.py

amyeroberts · 2022-11-29T19:53:00Z

@NielsRogge @sgugger @alaradirik Sorry for the previous issues with the docstrings. They should all be resolved now.

sgugger

Thanks!

amyeroberts · 2022-11-29T23:33:03Z

tests/models/detr/test_feature_extraction_detr.py

        self.assertTrue(torch.allclose(encoding["labels"][0]["class_labels"], expected_class_labels))
        # verify masks
-        expected_masks_sum = 822338
+        expected_masks_sum = 822873


The values for DETR, Conditional DETR, Deformable DETR and YOLOS all changed for the same test here. There are 535 pixels different across the six 800 x 1066 pixel masks, representing a 0.01% change.
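The figures can be checked with a few lines of arithmetic. This is only a sketch: it takes the difference between the two expected sums from the diff and treats it, as the comment does, as the number of differing pixels.

```python
# Expected mask sums before and after the change, taken from the diff above.
old_sum = 822338
new_sum = 822873

# Six masks, each 800 x 1066 pixels.
num_masks, height, width = 6, 800, 1066

changed = new_sum - old_sum
total = num_masks * height * width
percent = 100 * changed / total

print(changed, total, round(percent, 2))
```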

This is due to the resizing of the annotation masks now being performed by functionality in the image transforms library (using Pillow), whereas it was previously done by torch. This was done to make the preprocessing framework agnostic.

Note: the "nearest" mode for torch interpolation is the same function as OpenCV's. The equivalent for scipy/Pillow is "nearest-exact", cf. the torch.nn.functional.interpolate documentation. When this is used, the number of pixels different is 2.
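A minimal pure-Python sketch of why the two modes can disagree on a handful of pixels, assuming the usual index conventions: "nearest" maps output index i to floor(i * in / out), while "nearest-exact" samples at the pixel centre, floor((i + 0.5) * in / out). The helper names are hypothetical, and integer arithmetic stands in for the libraries' floating-point code.

```python
def nearest_indices(in_size, out_size):
    # "nearest" convention: source index = floor(i * in_size / out_size).
    return [min((i * in_size) // out_size, in_size - 1) for i in range(out_size)]


def nearest_exact_indices(in_size, out_size):
    # "nearest-exact" convention: source index = floor((i + 0.5) * in_size / out_size),
    # written with integers as floor((2i + 1) * in_size / (2 * out_size)).
    return [
        min(((2 * i + 1) * in_size) // (2 * out_size), in_size - 1)
        for i in range(out_size)
    ]


# Upscale a row of 4 pixels to 6 pixels and compare which source pixel is picked.
legacy = nearest_indices(4, 6)
exact = nearest_exact_indices(4, 6)
differing = [i for i, (a, b) in enumerate(zip(legacy, exact)) if a != b]

print(legacy)
print(exact)
print(differing)
```

Most output positions pick the same source pixel under both conventions; only positions near a source-pixel boundary differ, which is consistent with a small fraction of mask pixels changing.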

fcakyon · 2022-11-30T17:53:48Z

src/transformers/models/owlvit/image_processing_owlvit.py

+        # Convert from relative [0, 1] to absolute [0, height] coordinates
+        img_h, img_w = target_sizes.unbind(1)
+        scale_fct = torch.stack([img_w, img_h, img_w, img_h], dim=1)
+        target_boxes = target_boxes * scale_fct[:, None, :]


@alaradirik @NielsRogge there is a bug in this line that prevents image_guided_detection from running on CUDA.

Here is the tested fix:

scale_fct = torch.stack([img_w, img_h, img_w, img_h], dim=1).to(target_boxes.device)
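A small self-contained sketch of the scaling step with the device fix applied. The function name scale_boxes and the example values are hypothetical; only the three tensor operations come from the diff and the fix above. It runs on CPU, where the .to(...) call is a no-op, but the same code keeps both operands on one device when target_boxes lives on a GPU and target_sizes does not.

```python
import torch


def scale_boxes(target_boxes, target_sizes):
    # target_boxes: (batch, num_boxes, 4) relative (x0, y0, x1, y1) in [0, 1].
    # target_sizes: (batch, 2) holding (height, width) for each image.
    img_h, img_w = target_sizes.unbind(1)
    # Move the scale factors to the boxes' device before multiplying.
    scale_fct = torch.stack([img_w, img_h, img_w, img_h], dim=1).to(target_boxes.device)
    return target_boxes * scale_fct[:, None, :]


boxes = torch.tensor([[[0.1, 0.2, 0.5, 0.6]]])  # one image, one box
sizes = torch.tensor([[480, 640]])  # (height, width)
scaled = scale_boxes(boxes, sizes)
print(scaled)
```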

* Add transforms for object detection
* DETR models + Yolos
* Scrappy additions
* Maskformer image processor
* Fix up; MaskFormer tests
* Update owlvit processor
* Add to docs
* OwlViT tests
* Update pad logic
* Remove changes to transforms
* Import fn directly
* Update to include pad transformation
* Remove uninstended changes
* Add new owlvit post processing function
* Tidy up
* Fix copies
* Fix some copies
* Include device fix
* Fix scipy imports
* Update _pad_image
* Update padding functionality
* Fix bug
* Properly handle ignore index
* Fix up
* Remove defaults to None in docstrings
* Fix docstrings & docs
* Fix sizes bug
* Resolve conflicts in init
* Cast to float after resizing
* Tidy & add size if missing
* Allow kwards when processing for owlvit
* Update test values

amyeroberts mentioned this pull request Nov 10, 2022

Update OnnxConfig.generate_dummy_inputs to check ImageProcessingMixin #20157

Merged

amyeroberts force-pushed the add-image-processor-detr branch 5 times, most recently from 390817d to da942bd, November 21, 2022 17:17

amyeroberts force-pushed the add-image-processor-detr branch from fcd9d98 to c07bcd3, November 23, 2022 20:41

amyeroberts marked this pull request as ready for review November 23, 2022 21:12

amyeroberts requested review from NielsRogge, alaradirik and sgugger and removed request for NielsRogge and alaradirik November 23, 2022 21:14

NielsRogge reviewed Nov 25, 2022

alaradirik approved these changes Nov 28, 2022

NielsRogge reviewed Nov 28, 2022

sgugger reviewed Nov 28, 2022

amyeroberts force-pushed the add-image-processor-detr branch 2 times, most recently from 992d7e9 to 430e2f2, November 29, 2022 15:29

amyeroberts added 6 commits November 29, 2022 15:29

Add transforms for object detection

f1b5173

DETR models + Yolos

e25039b

Scrappy additions

d3eb2c1

Maskformer image processor

97d3dfd

Fix up; MaskFormer tests

46c8620

Update owlvit processor

4e6dc4f

amyeroberts added 12 commits November 29, 2022 16:38

Fix some copies

caf78cc

Include device fix

c8e1089

Fix scipy imports

9fde985

Update _pad_image

31aca4c

Update padding functionality

5909cc3

Fix bug

e673519

Properly handle ignore index

f3064de

Fix up

1217f85

Remove defaults to None in docstrings

59b3a72

Fix docstrings & docs

9bc7d91

Fix sizes bug

06bd412

Resolve conflicts in init

4d84ca6

amyeroberts force-pushed the add-image-processor-detr branch from 430e2f2 to 4d84ca6, November 29, 2022 16:43

amyeroberts added 3 commits November 29, 2022 16:53

Cast to float after resizing

271e82b

Tidy & add size if missing

8b40a89

Allow kwards when processing for owlvit

66cff40

sgugger approved these changes Nov 29, 2022

Update test values

4979808

amyeroberts commented Nov 29, 2022

amyeroberts merged commit de6d19e into huggingface:main Nov 30, 2022

amyeroberts deleted the add-image-processor-detr branch November 30, 2022 10:24

fcakyon reviewed Nov 30, 2022

fcakyon mentioned this pull request Nov 30, 2022

owlvit image guided detection does not work in gpu (cuda) #20513

Closed

4 tasks

This was referenced Dec 1, 2022

Update ZeroShotObjectDetectionPipeline doc example #20528

Merged

Fix ConditionalDetrForSegmentation doc example #20531

Merged

Fix torch device issue #20584

Merged


6 participants
