Skip to content

[GSoC 2022] Multipage Image Decoder API#22128

Merged
asmorkalov merged 1 commit intoopencv:4.xfrom
ocpalo:multipage_img_decoder
Sep 21, 2022
Merged

[GSoC 2022] Multipage Image Decoder API#22128
asmorkalov merged 1 commit intoopencv:4.xfrom
ocpalo:multipage_img_decoder

Conversation

@ocpalo
Copy link
Copy Markdown
Collaborator

@ocpalo ocpalo commented Jun 19, 2022

I am trying to finalize #17753 pr. To understand the requirements and see what we can do I also take a look at #6574.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@alalek
Copy link
Copy Markdown
Member

alalek commented Jun 19, 2022

OpenCV API is primary C++ based. However we need to think about Python, Java bindings - and support them too.

@ocpalo
Copy link
Copy Markdown
Collaborator Author

ocpalo commented Jun 19, 2022

@alalek I am planning to do Python and Java bindings after we finalize this API interface. Thanks for the reminder!

@vpisarev
Copy link
Copy Markdown
Contributor

@ocpalo, thank you! Please, rename PR so that the title contains [GSoC 2022] in the beginning.

@ocpalo ocpalo changed the title [WIP] Multipage Image Decoder API [GSoC] Multipage Image Decoder API Jun 22, 2022
@ocpalo ocpalo changed the title [GSoC] Multipage Image Decoder API [GSoC 2022] Multipage Image Decoder API Jun 22, 2022
@ocpalo
Copy link
Copy Markdown
Collaborator Author

ocpalo commented Jun 25, 2022

I think I can remove the cv::Exception part from the API and provide error codes. This way, API would be more efficient.

@ocpalo ocpalo force-pushed the multipage_img_decoder branch from d85931b to 7a5d89e Compare July 9, 2022 08:54
@vpisarev
Copy link
Copy Markdown
Contributor

@ocpalo, overall, it looks good now. I put a couple of small comments.

@vpisarev
Copy link
Copy Markdown
Contributor

ok, I realized that init() reassigned m_decoder, so it should be closed automatically.

@vpisarev vpisarev self-requested a review August 26, 2022 09:52
@ocpalo ocpalo force-pushed the multipage_img_decoder branch 4 times, most recently from fbed6c0 to 79856ab Compare September 10, 2022 12:53
Copy link
Copy Markdown
Contributor

@asmorkalov asmorkalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The solution looks very good for me! It's the next step for HEIF support and it's analogues.

}

Mat& ImageCollection::Impl::operator[](int index) {
if(m_pages[index].empty()) {
Copy link
Copy Markdown
Contributor

@asmorkalov asmorkalov Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out-of-range access check is required here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

range is already checked at the external method ImageCollection::at() method

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 2 methods in the ImageCollection to read images. .at() method and operator[] operator. As in the STD containers, .at method does out-of-range access check but operator[] does not. There is no need to check here.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at() calls operator[], but not vise versa. In case if index is -1 you'll get out of bound access for std::vector. For index, let say 1e+6 the next loop will be iterated 1e+6 times: for(int i = 0; i != index && advance(); ++i) {}. I propose to move the check from at() to operator[] or have it in both methods. In std::vector<> is reasonable to get rid of range check for performance reason. Range check is heavier than actual work. But it's not true for image codecs. Also it could be a security issue.

@ocpalo ocpalo force-pushed the multipage_img_decoder branch from 3c22310 to d7a3bef Compare September 14, 2022 17:16
@ocpalo
Copy link
Copy Markdown
Collaborator Author

ocpalo commented Sep 14, 2022

I made the fixes. Please take a look again @vpisarev @asmorkalov.

ALso, the Windows10-x64 fail is not related with my pull request. Please rerun that workflow only if you can. I need to push commit to re-run all workflows.

Copy link
Copy Markdown
Contributor

@asmorkalov asmorkalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll take a look on CI issue. Please ignore it for now.

}

Mat& ImageCollection::Impl::operator[](int index) {
if(m_pages[index].empty()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at() calls operator[], but not vise versa. In case if index is -1 you'll get out of bound access for std::vector. For index, let say 1e+6 the next loop will be iterated 1e+6 times: for(int i = 0; i != index && advance(); ++i) {}. I propose to move the check from at() to operator[] or have it in both methods. In std::vector<> is reasonable to get rid of range check for performance reason. Range check is heavier than actual work. But it's not true for image codecs. Also it could be a security issue.

@ocpalo
Copy link
Copy Markdown
Collaborator Author

ocpalo commented Sep 15, 2022

at() calls operator[], but not vise versa. In case if index is -1 you'll get out of bound access for std::vector. For index, let say 1e+6 the next loop will be iterated 1e+6 times: for(int i = 0; i != index && advance(); ++i) {}. I propose to move the check from at() to operator[] or have it in both methods. In std::vector<> is reasonable to get rid of range check for performance reason. Range check is heavier than actual work. But it's not true for image codecs. Also it could be a security issue.

@asmorkalov Regarding this comment, I have changed the std::vector operator[] with the std::vector .at() method. In this implementation, I pre-allocate vectors size. Now, In the Impl class, it does out of range access check in the both cases. I think that is good enough for range checks.

Somehow i can not reply to your comment. That is why i wrote here as seperate comment. Thanks for the dealing with the CI issue!

Mat& ImageCollection::Impl::operator[](int index) {
    if(m_pages.at(index).empty()) {

@ocpalo
Copy link
Copy Markdown
Collaborator Author

ocpalo commented Sep 16, 2022

The current iterator api does not implements operator->. Should I?

struct CV_EXPORTS iterator {
     Mat* operator->();
     ...
}

Mat* ImageCollection::iterator::operator->() {
    CV_Assert(m_pCollection);
    return &m_pCollection->getImpl()->operator[](m_curr);
}

So one can use this to do operations on the cache Mat objects. And I realized that my at() and operator[] methods returns copy of the Mat object. I think I should return Mat& to get rid of expensive copy costs in case of return value is reference to Mat.

@asmorkalov , Could you take a look at my last commit and tell me what do you think?

I have checked the std::vector::at() implementation. This also returns reference type.

I also pushed the test that Vadim sent me to see if operator++ is works as expected. Since both of you suspected that it is a bug.

Edit: CI fails does not related with my commits again.

@ocpalo ocpalo force-pushed the multipage_img_decoder branch from 88f0436 to cba8d3f Compare September 16, 2022 17:43
@ocpalo ocpalo requested a review from asmorkalov September 16, 2022 18:35
@ocpalo
Copy link
Copy Markdown
Collaborator Author

ocpalo commented Sep 17, 2022

Further explanation about operator++() advance call. There are 2 variables named m_curr and m_current. Maybe I should rename them.

  • m_curr -> Iterator API
  • m_current -> ImageCollection API

When operator++ is called, either way, we need to increment m_curr to understand that iterator points to which page.
On the other hande, m_current can be modified only if m_current is equal to m_curr. Suppose we created a new ImageCollection and Iterator. So both m_curr and m_current is initialized to 0. If we advance with operator++(), both will increase its internal variables because they are equal. Now both variable are equal to 1. Lets say we create another iterator, iterB, . This iterator m_current is initialized to 0. If we call operator++ on iterator iterB, it will not advance the page (m_current) because it is not equal to iterB's m_curr. It should not because iterB is behind of the first iterator.

So what happens if iterB is tries to decode page 1(index 0) without calling operator++(). In this case, ImageCollection's m_decoder will be reinitialized to points to first page. Then it will advance the pages until iterB's m_current is equal to m_curr. Then it decodes the page. In this way, I keep the m_current determined. Even thought, one iterator goes beyond of the scope, if other iterator tries to decode, the ImageCollection will be reinitialized and will advance until that index.

@asmorkalov
Copy link
Copy Markdown
Contributor

Further explanation about operator++() advance call. There are 2 variables named m_curr and m_current. Maybe I should rename them.

m_curr -> Iterator API
m_current -> ImageCollection API

Rename - it's good idea.

@asmorkalov
Copy link
Copy Markdown
Contributor

asmorkalov commented Sep 19, 2022

it makes sense to return const cv::Mat& in at and operator[] to not poison channels, size, etc on user side.

Copy link
Copy Markdown
Contributor

@asmorkalov asmorkalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Well done!

@asmorkalov
Copy link
Copy Markdown
Contributor

@ocpalo Please make operator[] and at() return values constant reference and squash commits. I'll merge the patch then.

@ocpalo
Copy link
Copy Markdown
Collaborator Author

ocpalo commented Sep 19, 2022

Ok. Give me an hour or two.

Thank you for the reviews btw.

@ocpalo ocpalo force-pushed the multipage_img_decoder branch from cba8d3f to 062cee2 Compare September 19, 2022 17:27
@ocpalo
Copy link
Copy Markdown
Collaborator Author

ocpalo commented Sep 19, 2022

@asmorkalov I have made the changes and pushed. Tbh, I do not understand why we force users to use const& if they want to reference it.

If we keep operator[] and at() as before, the user can use it like this

cv::Mat& firstPage = collection[0]; // 1
const cv::Mat& cFirstPage = collection[0]; //2

After the changes, the first one(1) would not compile. I understand the reason that it is protection for the user but we should not do that because since he is taking it by reference, it is his reponsibility to deal with this kind of stuff.

@asmorkalov
Copy link
Copy Markdown
Contributor

asmorkalov commented Sep 20, 2022

It's weak protection, but it covers at least:

  • type, shape, channels change.
  • unexpected cv::Mat realloc, if user tries to put that mat as output argument.
    Mat content is still in danger, but out-of-bound access will not happen, if some piece of code relies on saved width and height.

Copy link
Copy Markdown
Contributor

@asmorkalov asmorkalov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@vpisarev vpisarev self-requested a review September 20, 2022 07:00
@vpisarev
Copy link
Copy Markdown
Contributor

@ocpalo, @asmorkalov, thank you for polishing it! I think, it can be merged now

@asmorkalov asmorkalov merged commit 04ebedb into opencv:4.x Sep 21, 2022
This was referenced Oct 3, 2022
@alalek alalek mentioned this pull request Jan 8, 2023
@asmorkalov asmorkalov added this to the 4.7.0 milestone Jan 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants