GH-35718: [Go][Parquet] Fix for null-only encoding panic#39497
GH-35718: [Go][Parquet] Fix for null-only encoding panic#39497zeroshade merged 2 commits intoapache:mainfrom
Conversation
|
|
There was a problem hiding this comment.
Strictly speaking this is not necessary,I just copy the previous test and want to add some simple tests on nested formats.
There was a problem hiding this comment.
It's good to ensure we're testing nulls for multiple types
There was a problem hiding this comment.
Would you mind testing all these encoders?
There was a problem hiding this comment.
( Maybe go/parquet/internal/encoding/encoding_test.go is also a good place for testing?)
There was a problem hiding this comment.
I don't get it, I have already used these encoders testing nulls encoding.
There was a problem hiding this comment.
There is no panic or error happens in the encoding process, panic happens when writing data into a file (call written function on a nil bitWriter not inited).
There was a problem hiding this comment.
In this patch we check DeltaBinaryPack and DeltaByteArray. DeltaLengthByteArray uses DeltaBinaryPack, and DeltaByteArray uses DeltaLengthByteArray. So I think it's ok to testing them
Besides, there're some other encodings: RleBoolean, Plain ..etc. Should we also testing them?
There was a problem hiding this comment.
Ok, I added more encording tests.
|
(I'm afraid https://github.com/apache/arrow/pull/39347/files does the similiar things but less than your contributions in this patch, but it doesn't get merged, I will wait @zeroshade for a while...) |
|
I've rebased this and fixed the conflict after merging my change. Shrinks this down to a fairly small change plus all the new tests. This all seems good to me assuming that all the CI passes. Thanks much @MagicBoost ! |
// EstimatedDataEncodedSize returns the current number of bytes that have
// been buffered so far
func (enc *PlainBooleanEncoder) EstimatedDataEncodedSize() int64 {
return int64(enc.sink.Len() + int(bitutil.BytesForBits(int64(enc.wr.Pos()))))
}@zeroshade This might suffer from problem when |
|
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit f0879ed. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. |
…e#39497) ### Rationale for this change closes: apache#35718 ### What changes are included in this PR? Fix painc writing with DeltaBinaryPacked or DeltaByteArray when column only has nulls ### Are these changes tested? Yes - add a test writing nulls to columns with DeltaBinaryPacked / DeltaByteArray / DeltaLengthByteArray encodings ### Are there any user-facing changes? No * Closes: apache#35718 Lead-authored-by: yufanmo <yufan.mo@transwarp.io> Co-authored-by: Matt Topol <zotthewizard@gmail.com> Signed-off-by: Matt Topol <zotthewizard@gmail.com>
Rationale for this change
closes: #35718
What changes are included in this PR?
Fix painc writing with DeltaBinaryPacked or DeltaByteArray when column only has nulls
Are these changes tested?
Yes
Are there any user-facing changes?
No