[C++][Parquet] Decoding: allow Boolean RecordReader get raw LSB bitmap #39227

@mapleFU

Describe the enhancement requested

Plain Boolean decoding is far slower than RLE Boolean decoding. This is because:

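  1. PlainBooleanDecoder uses BitReader::GetBatch to decode bool values.
  2. BitReader::GetBatch is optimized for 32-bit and 64-bit value types (sizeof(T) == 4 or 8), which go through unpack32/unpack64 directly.
  3. However, when T is bool, the code falls back to the final else branch below, which unpacks into a temporary uint32_t buffer and then copies the values out one at a time: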
  if (sizeof(T) == 4) {
    int num_unpacked =
        internal::unpack32(reinterpret_cast<const uint32_t*>(buffer + byte_offset),
                           reinterpret_cast<uint32_t*>(v + i), batch_size - i, num_bits);
    i += num_unpacked;
    byte_offset += num_unpacked * num_bits / 8;
  } else if (sizeof(T) == 8 && num_bits > 32) {
    // Use unpack64 only if num_bits is larger than 32
    // TODO (ARROW-13677): improve the performance of internal::unpack64
    // and remove the restriction of num_bits
    int num_unpacked =
        internal::unpack64(buffer + byte_offset, reinterpret_cast<uint64_t*>(v + i),
                           batch_size - i, num_bits);
    i += num_unpacked;
    byte_offset += num_unpacked * num_bits / 8;
  } else {
    // TODO: revisit this limit if necessary
    DCHECK_LE(num_bits, 32);
    const int buffer_size = 1024;
    uint32_t unpack_buffer[buffer_size];
    while (i < batch_size) {
      int unpack_size = std::min(buffer_size, batch_size - i);
      int num_unpacked =
          internal::unpack32(reinterpret_cast<const uint32_t*>(buffer + byte_offset),
                             unpack_buffer, unpack_size, num_bits);
      if (num_unpacked == 0) {
        break;
      }
      for (int k = 0; k < num_unpacked; ++k) {
#ifdef _MSC_VER
#pragma warning(push)
#pragma warning(disable : 4800)
#endif
        v[i + k] = static_cast<T>(unpack_buffer[k]);
#ifdef _MSC_VER
#pragma warning(pop)
#endif
      }
      i += num_unpacked;
      byte_offset += num_unpacked * num_bits / 8;
    }
  }

Maybe we can specialize the case with sizeof(T) == 1 to optimize this?
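
A rough sketch of what such a specialization could look like, for discussion only. It assumes the input is a PLAIN-encoded boolean bitmap (1 bit per value, LSB first) and that the output is one bool per value. `UnpackBoolsLsb` is a hypothetical helper name, not an existing Arrow function; the idea is just to expand each input byte straight into 8 bools without going through the intermediate uint32_t buffer.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of Arrow): expands `num_values` bits from an
// LSB-first bitmap, starting at absolute bit position `bit_offset`, into one
// bool per value in `out`.
inline void UnpackBoolsLsb(const uint8_t* bitmap, int64_t bit_offset,
                           int64_t num_values, bool* out) {
  assert(bit_offset >= 0 && num_values >= 0);
  int64_t i = 0;
  // Leading bits: consume one bit at a time until the read position is
  // byte-aligned (or we run out of values).
  while (i < num_values && ((bit_offset + i) & 7) != 0) {
    const int64_t pos = bit_offset + i;
    out[i] = ((bitmap[pos >> 3] >> (pos & 7)) & 1) != 0;
    ++i;
  }
  // Whole bytes: 8 values per input byte, written directly to the output.
  const uint8_t* byte_ptr = bitmap + ((bit_offset + i) >> 3);
  for (; i + 8 <= num_values; i += 8, ++byte_ptr) {
    const uint8_t b = *byte_ptr;
    for (int k = 0; k < 8; ++k) {
      out[i + k] = ((b >> k) & 1) != 0;
    }
  }
  // Trailing bits: fewer than 8 values remain.
  for (; i < num_values; ++i) {
    const int64_t pos = bit_offset + i;
    out[i] = ((bitmap[pos >> 3] >> (pos & 7)) & 1) != 0;
  }
}
```

The inner 8-iteration loop has a fixed trip count, so a compiler can unroll or vectorize it; whether this actually beats the current fallback would need to be measured with the existing decoding benchmarks.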

Component(s)

C++, Parquet
