Skip to content

trying to fix handling file storages with extremely long lines#16488

Merged
alalek merged 4 commits intoopencv:masterfrom
vpisarev:filestorage_longlines
Feb 11, 2020
Merged

trying to fix handling file storages with extremely long lines#16488
alalek merged 4 commits intoopencv:masterfrom
vpisarev:filestorage_longlines

Conversation

@vpisarev
Copy link
Copy Markdown
Contributor

@vpisarev vpisarev commented Feb 3, 2020

resolves #11061

@vpisarev vpisarev changed the title trying to fix handling of file storages with extremely long lines trying to fix handling file storages with extremely long lines Feb 3, 2020
@vpisarev vpisarev requested a review from alalek February 3, 2020 15:14
Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea looks good, but it is too hard to validate other parts of persistence code.

}
return ptr;
int count = (int)std::min(buffer.size() - ofs - 16, maxCount);
char* ptr = getsFromFile( &buffer[ofs], count+1 );
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

- 16
count + 1

What is this? Why are these offsets different?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fgets() requires "count+1", because it adds '\0' in the end. but sometimes we need to add '\n' to the end of line. so we might need at least count + 2. "+16" has been added just in case, it's definitely sufficient

}
#endif
CV_Error(CV_StsError, "The storage is not opened");
return ofs > 0 ? &buffer[0] : 0;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On .resize() all previously returned pointers from .gets() become invalid.

Are you sure that all previously returned pointers are not used anymore? (after next .gets() call)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. I'm sure. when ".gets()" is called by the XML/YAML/JSON parser, it returns the pointer to the read line. The parsers process the line and then call ".gets()" another time to get the next line and so on. Of course, they do not store pointers to the middle of the previously read line, because they consume all the content. ".resize()" can be safely called inside ".gets()".

@vpisarev
Copy link
Copy Markdown
Contributor Author

vpisarev commented Feb 5, 2020

@alalek, the idea is obvious, and the patch is not that big. persistence functions are tested extensively by the test system itself. I suggest to integrate the patch now. 99% sure that it works correctly. If we find any problem with it, we can always revert the patch, because it's very local, touches just one file (+ adds the test that can be disabled).

* it's now crash-test time! temporarily set the initial parser buffer size to just 40 bytes. let's run all the test and check if the buffer is always correctly resized and handled
}

if( !ptr || !*ptr )
CV_PARSE_ERROR_CPP( "Unexpected End-Of-File" );
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe ptr should be reassigned above instead:

ptr = parseMap(ptr, ...);
...
ptr = parseSeq(ptr, ...);

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe, those 2 lines just left out by mistake after refactoring. If ptr is reassigned, all the tests fail. Simply remove them.

Copy link
Copy Markdown
Member

@alalek alalek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

const size_t buf_size0 = 1 << 20;
size_t buf_size = buf_size0;
const size_t buf_size0 = 40;
buffer.resize(buf_size0);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we left this "as is" for merge?

@vpisarev
Copy link
Copy Markdown
Contributor Author

vpisarev commented Feb 7, 2020

@alalek, please, do not merge it! I think, I found some problems with this implementation. Wait a bit.

@alalek alalek merged commit 3efa783 into opencv:master Feb 11, 2020
@vpisarev vpisarev deleted the filestorage_longlines branch May 28, 2020 18:21
a-sajjad72 pushed a commit to a-sajjad72/opencv that referenced this pull request Mar 30, 2023
trying to fix handling file storages with extremely long lines

* trying to fix handling of file storages with extremely long lines: opencv#11061

* * fixed errorneous pointer access in JSON parser.
* it's now crash-test time! temporarily set the initial parser buffer size to just 40 bytes. let's run all the test and check if the buffer is always correctly resized and handled

* fixed pointer use in JSON parser; added the proper test to catch this case

* fixed the test to make it more challenging. generate test json with
*
**
***
etc. shape
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Large JSON can't be loaded while smaller ones can

2 participants