GH-43427: [C++][Parquet] Deprecate ColumnChunk::file_offset field and no longer write Metadata at end of Chunk#43428
Conversation
|
|
|
Also cc @emkornfield @etseidl |
|
We should not write arrow/cpp/src/parquet/column_writer.cc Line 670 in 62ee676 |
cpp/src/parquet/column_writer.cc
Outdated
| // Write metadata at end of column chunk | ||
| metadata_->WriteTo(sink_.get()); | ||
|
|
||
| // Not write metadata at end of column chunk since we will |
There was a problem hiding this comment.
I'd rather removing these lines to not confuse readers.
cpp/src/parquet/column_writer.cc
Outdated
|
|
||
| // Write metadata at end of column chunk | ||
| metadata_->WriteTo(in_memory_sink_.get()); | ||
| // Not write metadata at end of column chunk since we will |
cpp/src/parquet/metadata.cc
Outdated
| // https://github.com/apache/parquet-format/pull/440 | ||
| // The `file_offset` field is deprecated and should be set to 0 for writer | ||
| // if the column chunk has not been written outsidethe footer. |
There was a problem hiding this comment.
| // https://github.com/apache/parquet-format/pull/440 | |
| // The `file_offset` field is deprecated and should be set to 0 for writer | |
| // if the column chunk has not been written outsidethe footer. | |
| // The `file_offset` field is deprecated and should be set to 0. | |
| // See https://github.com/apache/parquet-format/pull/440 for detail. |
cpp/src/parquet/metadata.h
Outdated
| // column chunk | ||
| // Byte offset of `ColumnMetaData` in `file_path()`. | ||
| // | ||
| // Note that the meaning of this field has been inconsistent between implementations |
There was a problem hiding this comment.
| // Note that the meaning of this field has been inconsistent between implementations | |
| // Note that the meaning of this field has been inconsistent among implementations |
cpp/src/parquet/metadata.h
Outdated
| // Byte offset of `ColumnMetaData` in `file_path()`. | ||
| // | ||
| // Note that the meaning of this field has been inconsistent between implementations | ||
| // so its use has since been deprecated in the Parquet specification. Moder |
There was a problem hiding this comment.
| // so its use has since been deprecated in the Parquet specification. Moder | |
| // so its use has since been deprecated in the Parquet specification. Modern |
cpp/src/parquet/metadata.cc
Outdated
| } | ||
| // https://github.com/apache/parquet-format/pull/440 | ||
| // The `file_offset` field is deprecated and should be set to 0 for writer | ||
| // if the column chunk has not been written outsidethe footer. |
There was a problem hiding this comment.
| // if the column chunk has not been written outsidethe footer. | |
| // if the column chunk has not been written outside the footer. |
|
Would fix this later: |
|
Failed CI is unrelated. Will merge if no more negative comment in August 6th |
|
Merge because collect 1 approve |
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 9b58454. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 31 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
See github issue
What changes are included in this PR?
Force
ColumnChunk::file_offsettobe 0Are these changes tested?
No
Are there any user-facing changes?
Might affect user using legacy
ColumnChunk::file_offset