-
Notifications
You must be signed in to change notification settings - Fork 4.1k
[C++] [Parquet] Use std::count in parquet ColumnReader #39398
Description
Describe the enhancement requested
I've found that for-loop here
arrow/cpp/src/parquet/column_reader.cc
Lines 1055 to 1073 in 7c3480e
| void ReadLevels(int64_t batch_size, int16_t* def_levels, int16_t* rep_levels, | |
| int64_t* num_def_levels, int64_t* values_to_read) { | |
| batch_size = | |
| std::min(batch_size, this->num_buffered_values_ - this->num_decoded_values_); | |
| // If the field is required and non-repeated, there are no definition levels | |
| if (this->max_def_level_ > 0 && def_levels != nullptr) { | |
| *num_def_levels = this->ReadDefinitionLevels(batch_size, def_levels); | |
| // TODO(wesm): this tallying of values-to-decode can be performed with better | |
| // cache-efficiency if fused with the level decoding. | |
| for (int64_t i = 0; i < *num_def_levels; ++i) { | |
| if (def_levels[i] == this->max_def_level_) { | |
| ++(*values_to_read); | |
| } | |
| } | |
| } else { | |
| // Required field, read all values | |
| *values_to_read = batch_size; | |
| } |
transforms into
0xc0c2f0 <ReadLevels()+96> inc %rdx
0xc0c2f3 <ReadLevels()+99> cmp %rax,%rdx
0xc0c2f6 <ReadLevels()+102> jge 0xc0c30c <ReadLevels()+124>
0xc0c2f8 <ReadLevels()+104> cmp %cx,(%r14,%rdx,2)
0xc0c2fd <ReadLevels()+109> jne 0xc0c2f0 <ReadLevels()+96>
0xc0c2ff <ReadLevels()+111> incq 0x0(%rbp)
0xc0c303 <ReadLevels()+115> mov (%rbx),%rax
0xc0c306 <ReadLevels()+118> jmp 0xc0c2f0 <ReadLevels()+96>
That means that it uses iteration element by element and changes reference with incq
I think that the reason is that values_to_read and num_def_levels are not set as restrict. So the compiler can not optimize this to a more efficient way(for example using simd)
On my flamegraph this part showed ~10% of time spent
Component(s)
C++, Parquet