GH-40872: [C++][Parquet] Encoding: Optimize DecodeArrow/Decode(bitmap) for PlainBooleanDecoder#40876
Conversation
|
|
4099c1b to
962dcfa
Compare
962dcfa to
b03728b
Compare
cpp/src/parquet/encoding.cc
Outdated
| } else { | ||
| int64_t previous_offset = 0; | ||
| int64_t previous_value_offset = 0; | ||
| PARQUET_THROW_NOT_OK(::arrow::internal::VisitSetBitRuns( |
There was a problem hiding this comment.
If the data is interleaving "null, non-null, null, non-null, ...". Don't know that would performance be
There was a problem hiding this comment.
I guess the best performance might getting a BitRun scanner here
There was a problem hiding this comment.
We can start with this. Are there benchmarks for this case?
There was a problem hiding this comment.
I've pasted this in the pr, PTAL
There was a problem hiding this comment.
The benchmarks in this PR always have null_probability = 0, right?
There was a problem hiding this comment.
Yes. Aha previously all null_probability is 0...
|
Before: After: |
| @@ -1214,18 +1234,15 @@ inline int PlainBooleanDecoder::DecodeArrow( | |||
|
|
|||
| int PlainBooleanDecoder::Decode(uint8_t* buffer, int max_values) { | |||
f8eaf1f to
47d6a1b
Compare
|
After: Before: |
| } | ||
| }; | ||
|
|
||
| class BM_DecodeArrowBooleanRle : public BenchmarkDecodeArrowBoolean { |
There was a problem hiding this comment.
You don't need this, you just need some member variable with Encoding::PLAIN or Encoding::RLE, right?
|
After merging your PR, Conbench analyzed the 7 benchmarking runs that have been run so far on merge-commit a4acb64. There was 1 benchmark result with an error:
There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 7 possible false positives for unstable benchmarks that are known to sometimes produce them. |
…oding (#41037) ### Rationale for this change Previous patches ( #40876 , #40995 ) introduce some bugs in boolean decoding. This patch add tests for DecodeArrow and fix the bug ### What changes are included in this PR? 1. Add tests for DecodeArrow with nulls and without nulls 2. Fix 2 bugs ### Are these changes tested? Yes ### Are there any user-facing changes? no * GitHub Issue: #41032 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
…bitmap) for PlainBooleanDecoder (apache#40876) ### Rationale for this change This is for enhance boolean decoding. I optimized the `DecodeArrow` for PlainBoolean ### What changes are included in this PR? Optimize DecodeArrow/Decode(bitmap) for PlainBooleanDecoder, and add benchmarks ### Are these changes tested? Yes ### Are there any user-facing changes? Minor optimization. And `Decode` boolean will change the syntax * GitHub Issue: apache#40872 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: Antoine Pitrou <antoine@python.org> Signed-off-by: mwish <maplewish117@gmail.com>
…ow decoding (apache#41037) ### Rationale for this change Previous patches ( apache#40876 , apache#40995 ) introduce some bugs in boolean decoding. This patch add tests for DecodeArrow and fix the bug ### What changes are included in this PR? 1. Add tests for DecodeArrow with nulls and without nulls 2. Fix 2 bugs ### Are these changes tested? Yes ### Are there any user-facing changes? no * GitHub Issue: apache#41032 Lead-authored-by: mwish <maplewish117@gmail.com> Co-authored-by: mwish <anmmscs_maple@qq.com> Signed-off-by: Antoine Pitrou <antoine@python.org>
Rationale for this change
This is for enhance boolean decoding. I optimized the
DecodeArrowfor PlainBooleanWhat changes are included in this PR?
Optimize DecodeArrow/Decode(bitmap) for PlainBooleanDecoder, and add benchmarks
Are these changes tested?
Yes
Are there any user-facing changes?
Minor optimization. And
Decodeboolean will change the syntax