GH-40994: [C++][Parquet] RleBooleanDecoder supports DecodeArrow with nulls#40995
GH-40994: [C++][Parquet] RleBooleanDecoder supports DecodeArrow with nulls#40995pitrou merged 9 commits intoapache:mainfrom
Conversation
|
|
34d3c72 to
8a33bea
Compare
|
benchmark number wouldn't be great, because RleDecoder need to decode to |
|
Well, it seems the benchmarks fail: |
|
Testing on My PC ( Amd 3800X, WSL2, gcc11.4, Release(-O2) ) RLE PLAIN: For all-nulls seems that Rle is faster because it hacks the logic without any checking |
5e31baa to
cb1a842
Compare
|
|
||
| TYPED_TEST(EncodingAdHocTyped, RleArrowDirectPut) { | ||
| // TODO: test with nulls once RleBooleanDecoder::DecodeArrow supports them | ||
| this->null_probability_ = 0; |
There was a problem hiding this comment.
Hmm what does this mean? The case is covered here
|
@github-actions crossbow submit -g cpp |
|
Revision: cb1a842 Submitted crossbow builds: ursacomputing/crossbow @ actions-35133d7bec |
…row into rle-boolean-impl-decode-arrow
cpp/src/parquet/encoding.cc
Outdated
| current_index_in_batch = 0; | ||
| }; | ||
| if (null_count == 0) { | ||
| int sum_decode_count = 0; |
There was a problem hiding this comment.
Why introduce a separate variable? We know that we're going to return the original value of num_values.
There was a problem hiding this comment.
Removed it, it's copy-pasted from origin impl
pitrou
left a comment
There was a problem hiding this comment.
Please simplify the code a bit. There's a bunch of local variables which are mutually redundant.
|
This is a nice improvement, thank you @mapleFU ! |
…oding (#41037) ### Rationale for this change Previous patches ( #40876 , #40995 ) introduce some bugs in boolean decoding. This patch add tests for DecodeArrow and fix the bug ### What changes are included in this PR? 1. Add tests for DecodeArrow with nulls and without nulls 2. Fix 2 bugs ### Are these changes tested? Yes ### Are there any user-facing changes? no * GitHub Issue: #41032 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
… with nulls (apache#40995) ### Rationale for this change Supports DecodeArrow with nulls in RleBooleanDecoder ### What changes are included in this PR? Supports DecodeArrow with nulls in RleBooleanDecoder ### Are these changes tested? Yes ### Are there any user-facing changes? currently not * GitHub Issue: apache#40994 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…ow decoding (apache#41037) ### Rationale for this change Previous patches ( apache#40876 , apache#40995 ) introduce some bugs in boolean decoding. This patch add tests for DecodeArrow and fix the bug ### What changes are included in this PR? 1. Add tests for DecodeArrow with nulls and without nulls 2. Fix 2 bugs ### Are these changes tested? Yes ### Are there any user-facing changes? no * GitHub Issue: apache#41032 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
|
After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit b99b00d. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them. |
Rationale for this change
Supports DecodeArrow with nulls in RleBooleanDecoder
What changes are included in this PR?
Supports DecodeArrow with nulls in RleBooleanDecoder
Are these changes tested?
Yes
Are there any user-facing changes?
currently not