Skip to content

Adding support for LayoutLMvX variants for object-detection.#20143

Merged
Narsil merged 4 commits intohuggingface:mainfrom
Narsil:layout_object_detection
Nov 10, 2022
Merged

Adding support for LayoutLMvX variants for object-detection.#20143
Narsil merged 4 commits intohuggingface:mainfrom
Narsil:layout_object_detection

Conversation

@Narsil
Copy link
Contributor

@Narsil Narsil commented Nov 9, 2022

What does this PR do?

Adding support for layoutlm to object-detection.

LayoutLMv{2,3} can be used for object-detection.

However the classes are ForTokenClassification which not all classes can support
a vision + OCR type of inference. (the model needs bbox object even if we splitted
out the OCR).

The current implementation changes object-detection to multimodal since
now the models require tokenizer for the layoutlm variants. (This does not
affect existing working pipelines).

Then it uses reflection at runtime to see if the model is using a tokenizer.
This is not a great way to go about it but was the "simpler" change I could think
of.
As long as we don't have support for other model architecture, I'm hesitant to
make "cleaner" modifications, since I don't know if other architectures will
support the same invariants or not.

Fixes # (issue)

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a Github issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Nov 9, 2022

The documentation is not available anymore as the PR was closed or merged.

@Narsil Narsil requested review from alaradirik and sgugger November 9, 2022 15:08
Copy link
Collaborator

@sgugger sgugger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this! Just have a couple of nits.

dict(zip(keys, vals))
for vals in zip(raw_annotation["scores"], raw_annotation["labels"], raw_annotation["boxes"])
]
if self.tokenizer:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if self.tokenizer:
if self.tokenizer is not None:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

for vals in zip(raw_annotation["scores"], raw_annotation["labels"], raw_annotation["boxes"])
]
if self.tokenizer is not None:
w, h = target_size[0].tolist()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Full names here please :-)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Not sure why it didn't go through in my review)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:)

dict(zip(keys, vals))
for vals in zip(raw_annotation["scores"], raw_annotation["labels"], raw_annotation["boxes"])
]
if self.tokenizer is not None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a comment above the if else statement to clarify which model this is for (for future use purposes)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor

@alaradirik alaradirik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for adding this, looks good to me!

@Narsil Narsil merged commit d066c37 into huggingface:main Nov 10, 2022
@Narsil Narsil deleted the layout_object_detection branch November 10, 2022 10:33
mpierrau pushed a commit to mpierrau/transformers that referenced this pull request Dec 15, 2022
…ngface#20143)

* Adding support for LayoutLMvX variants for `object-detection`.

* Revert bogs `layoutlm` feature extractor which does not exist (it was a
V2 model) .

* Updated condition.

* Handling the comments.
@Narsil Narsil mentioned this pull request Dec 16, 2022
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants