Skip to content

[C++] [Parquet] Use std::count in parquet ColumnReader #39398

@Hattonuri

Description

@Hattonuri

Describe the enhancement requested

I've found that for-loop here

void ReadLevels(int64_t batch_size, int16_t* def_levels, int16_t* rep_levels,
int64_t* num_def_levels, int64_t* values_to_read) {
batch_size =
std::min(batch_size, this->num_buffered_values_ - this->num_decoded_values_);
// If the field is required and non-repeated, there are no definition levels
if (this->max_def_level_ > 0 && def_levels != nullptr) {
*num_def_levels = this->ReadDefinitionLevels(batch_size, def_levels);
// TODO(wesm): this tallying of values-to-decode can be performed with better
// cache-efficiency if fused with the level decoding.
for (int64_t i = 0; i < *num_def_levels; ++i) {
if (def_levels[i] == this->max_def_level_) {
++(*values_to_read);
}
}
} else {
// Required field, read all values
*values_to_read = batch_size;
}

transforms into

0xc0c2f0 <ReadLevels()+96> inc %rdx
0xc0c2f3 <ReadLevels()+99> cmp %rax,%rdx
0xc0c2f6 <ReadLevels()+102> jge 0xc0c30c <ReadLevels()+124>
0xc0c2f8 <ReadLevels()+104> cmp %cx,(%r14,%rdx,2)
0xc0c2fd <ReadLevels()+109> jne 0xc0c2f0 <ReadLevels()+96>
0xc0c2ff <ReadLevels()+111> incq 0x0(%rbp)
0xc0c303 <ReadLevels()+115> mov (%rbx),%rax
0xc0c306 <ReadLevels()+118> jmp 0xc0c2f0 <ReadLevels()+96>

That means that it uses iteration element by element and changes reference with incq
I think that the reason is that values_to_read and num_def_levels are not set as restrict. So the compiler can not optimize this to a more efficient way(for example using simd)

On my flamegraph this part showed ~10% of time spent

Component(s)

C++, Parquet

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions