Add corrupted files in bad_data #48
Conversation
alamb
left a comment
bad_data/README.md
Outdated
* PARQUET-1481.parquet: tests a case where a schema Thrift value has been corrupted
* arrow_issue_41321.parquet: test case of https://github.com/apache/arrow/issues/41321
Maybe try to unify the file naming in this directory? We already have PARQUET-1481.parquet (a JIRA reference) so perhaps something like ARROW-GH-41321.parquet? (related: #57)
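A minimal sketch of how such a naming convention could be checked, assuming the pattern is "upper-case tracker prefix, hyphen, issue number, `.parquet`". The regular expression and the `follows_convention` helper are hypothetical illustrations, not part of this repository:

```python
import re

# Hypothetical pattern (an assumption, not a project rule): one or more
# upper-case words joined by hyphens (e.g. PARQUET or ARROW-GH), then a
# hyphen, an issue number, and the .parquet extension.
NAME_PATTERN = re.compile(r"[A-Z]+(?:-[A-Z]+)*-\d+\.parquet")


def follows_convention(filename: str) -> bool:
    """Return True if the file name matches the assumed tracker-style pattern."""
    return NAME_PATTERN.fullmatch(filename) is not None


if __name__ == "__main__":
    # Names taken from the review comment above.
    for name in (
        "PARQUET-1481.parquet",
        "ARROW-GH-41321.parquet",
        "arrow_issue_41321.parquet",
    ):
        print(name, follows_convention(name))
```

Under this assumed pattern the two tracker-style names are accepted, while `arrow_issue_41321.parquet` is rejected.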
Let me edit it 🤔
done
I will merge this in one day if there are no objections. 7.1k is a bit large here, but not too large, since generating a file like this is also hard.
Merged! Thanks all!
Those 2 files triggered libparquet C++ issues apache/arrow#41317 and apache/arrow#41321. They were generated through a local run of oss-fuzz on synthetic test data from the GDAL regression test suite, and can be licensed under Apache-2