ARROW-5889: [C++][Parquet] Add property to indicate origin from converted type to TimestampLogicalType#4856
ARROW-5889: [C++][Parquet] Add property to indicate origin from converted type to TimestampLogicalType#4856tpboudreau wants to merge 1 commit intoapache:masterfrom tpboudreau:ARROW-5889
Conversation
jorisvandenbossche
left a comment
There was a problem hiding this comment.
This looks good to me. Also tested it locally and can confirm this now properly reads older files in pyarrow.
|
Thanks @jorisvandenbossche for testing it out. When you say it works on older files, did you use the files attached to the Jira issue by the reporter? I've been considering adding a test using those as fixtures. Thanks again. |
|
Yes, I checked with those files (as well as something I wrote myself with an older version). I certainly think we should add such tests reading older files (I just opened https://issues.apache.org/jira/browse/ARROW-5915 for that). |
|
Great. Thanks for testing the reporter's files. Thanks also for opening that issue for compatibility testing -- I generally agree with you -- I'll comment further on that Jira issue when I get the chance. |
|
@tpboudreau this needs to be rebased now, let me know if you need help |
|
Rebased and RFR (I'll monitor the TravisCI build after it kicks off). |
Codecov Report
@@ Coverage Diff @@
## master #4856 +/- ##
===========================================
- Coverage 87.08% 75.68% -11.4%
===========================================
Files 883 58 -825
Lines 116941 3735 -113206
Branches 1418 0 -1418
===========================================
- Hits 101837 2827 -99010
+ Misses 14742 908 -13834
+ Partials 362 0 -362Continue to review full report at Codecov.
|
cpp/src/parquet/types.h
Outdated
There was a problem hiding this comment.
This isn't true anymore -- I'll remove this comment and the is_serialized method
There was a problem hiding this comment.
Sorry, GitHub was hiding files from me so this is indeed what is going on in this patch. Will take a closer look
wesm
left a comment
There was a problem hiding this comment.
+1. I see that the is_serialized override just allows us to round-trip the metadata faithfully. If a file is read to Arrow, then written back, then the logical type will be written (I think)
…rted type to TimestampLogicalType This patch: - adds a new boolean member isFromConvertedType() to the TimestampLogicalType class that is set to "true" if the LogicalType was created from a converted type of TIMESTAMP_MILLIS or TIMESTAMP_MICROS - prevents writing the TimestampLogicalType in the Parquet schema if this new property is true - changes the Arrow reader to ignore the isAdjustedToUTC() property of the TimestampLogicalType if the type annotation came from a converted type Author: TP Boudreau <tpboudreau@gmail.com> Closes #4856 from tpboudreau/ARROW-5889 and squashes the following commits: 458f063 <TP Boudreau> Add property showing converted type origin to TimestampLogicalType
This patch: