[C++][Parquet] Decoding: allow Boolean RecordReader get raw LSB bitmap #39227
Open
Labels
Component: C++, Component: Parquet, Status: stale-warning (Issues and PRs flagged as stale which are due to be closed if no indication otherwise), Type: enhancement, good-first-issue
Description
Describe the enhancement requested
Plain Boolean decoding is far slower than RLE Boolean decoding. This is because:
1. PlainBooleanDecoder uses BitReader::GetBatch to decode bool values.
2. BitReader::GetBatch is optimized for 32-bit and 64-bit outputs via unpack32/unpack64.
3. However, when the output type is bool, sizeof(T) is 1, so the code falls back to the final else branch below, which unpacks into a temporary uint32_t buffer and then casts the values one by one:
if (sizeof(T) == 4) {
  int num_unpacked =
      internal::unpack32(reinterpret_cast<const uint32_t*>(buffer + byte_offset),
                         reinterpret_cast<uint32_t*>(v + i), batch_size - i, num_bits);
  i += num_unpacked;
  byte_offset += num_unpacked * num_bits / 8;
} else if (sizeof(T) == 8 && num_bits > 32) {
  // Use unpack64 only if num_bits is larger than 32
  // TODO (ARROW-13677): improve the performance of internal::unpack64
  // and remove the restriction of num_bits
  int num_unpacked =
      internal::unpack64(buffer + byte_offset, reinterpret_cast<uint64_t*>(v + i),
                         batch_size - i, num_bits);
  i += num_unpacked;
  byte_offset += num_unpacked * num_bits / 8;
} else {
  // TODO: revisit this limit if necessary
  DCHECK_LE(num_bits, 32);
  const int buffer_size = 1024;
  uint32_t unpack_buffer[buffer_size];
  while (i < batch_size) {
    int unpack_size = std::min(buffer_size, batch_size - i);
    int num_unpacked =
        internal::unpack32(reinterpret_cast<const uint32_t*>(buffer + byte_offset),
                           unpack_buffer, unpack_size, num_bits);
    if (num_unpacked == 0) {
      break;
    }
    for (int k = 0; k < num_unpacked; ++k) {
#ifdef _MSC_VER
#pragma warning(push)
#pragma warning(disable : 4800)
#endif
      v[i + k] = static_cast<T>(unpack_buffer[k]);
#ifdef _MSC_VER
#pragma warning(pop)
#endif
    }
    i += num_unpacked;
    byte_offset += num_unpacked * num_bits / 8;
  }
}
}

Maybe we can specialize the case with sizeof(T) == 1 to optimize this?
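
To make the sizeof(T) == 1 idea concrete, here is a minimal sketch of what such a specialization might do when num_bits is 1: expand an LSB-first packed bitmap straight into a bool array, one source byte (eight values) at a time, without the intermediate uint32_t buffer. The name UnpackBools and its signature are hypothetical and not part of Arrow; the sketch assumes the caller has already checked that the input buffer holds enough bits.

```cpp
#include <cstdint>

// Hypothetical helper (not an Arrow API): expands `num_values` LSB-first
// packed bits, starting at bit `bit_offset` of `bits`, into one bool each.
// Assumes `bits` holds at least bit_offset + num_values bits.
inline void UnpackBools(const uint8_t* bits, int64_t bit_offset, int num_values,
                        bool* out) {
  int i = 0;
  // Leading bits: advance one value at a time until the read position is
  // byte-aligned (or we run out of values).
  while (i < num_values && ((bit_offset + i) & 7) != 0) {
    const int64_t pos = bit_offset + i;
    out[i] = ((bits[pos >> 3] >> (pos & 7)) & 1) != 0;
    ++i;
  }
  // Whole bytes: each source byte yields exactly eight output values.
  const uint8_t* src = bits + ((bit_offset + i) >> 3);
  for (; i + 8 <= num_values; i += 8, ++src) {
    const uint8_t byte = *src;
    for (int k = 0; k < 8; ++k) {
      out[i + k] = ((byte >> k) & 1) != 0;
    }
  }
  // Trailing bits: fewer than eight values remain.
  for (; i < num_values; ++i) {
    const int64_t pos = bit_offset + i;
    out[i] = ((bits[pos >> 3] >> (pos & 7)) & 1) != 0;
  }
}
```

The fixed-trip-count inner loop is written so that a compiler can unroll or vectorize it; whether this actually beats the existing fallback would need to be confirmed with a benchmark.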
Component(s)
C++, Parquet