ColumnReader ReadBatch ignores definition levels when they weren't requested to output #39381
Closed
Description
Describe the bug, including details regarding any error messages, version, and platform.
I write optional ints and read them back with parquet::Int64Reader by calling
reader->ReadBatch(kBatchSizeOne, nullptr, nullptr, &tmp, &values_read)
(the same way parquet::StreamReader does).
This call always sets values_read to 1 (and advances the reader's internal value index). It should set values_read to 0 for nulls and 1 for non-null values.
When I replace the def_levels nullptr with a pointer to a real value (and ignore what is written to it), everything works fine.
I assume that the issue is here
arrow/cpp/src/parquet/column_reader.cc
Lines 1061 to 1073 in cf44793
    if (this->max_def_level_ > 0 && def_levels != nullptr) {
      *num_def_levels = this->ReadDefinitionLevels(batch_size, def_levels);
      // TODO(wesm): this tallying of values-to-decode can be performed with better
      // cache-efficiency if fused with the level decoding.
      for (int64_t i = 0; i < *num_def_levels; ++i) {
        if (def_levels[i] == this->max_def_level_) {
          ++(*values_to_read);
        }
      }
    } else {
      // Required field, read all values
      *values_to_read = batch_size;
    }
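This code is used in ReadBatch.
The problem is that def_levels == nullptr does not mean the field is required: for an optional field read without a def_levels buffer, the else branch still sets values_to_read to batch_size, so nulls are counted as values.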
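To make the difference concrete, here is a minimal standalone sketch that models the quoted branch without depending on Arrow. Everything in it is hypothetical and for illustration only: `DecodeLevels` stands in for `ReadDefinitionLevels`, `page_levels` stands in for the encoded definition levels of the column chunk, and `CountWithScratch` shows one possible approach (decoding into a temporary buffer when the caller passes nullptr). It is not the actual Arrow code or the fix that was applied upstream.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of the level-counting step (hypothetical type for this sketch).
struct LevelCount {
  int64_t num_def_levels;
  int64_t values_to_read;
};

// Hypothetical stand-in for ReadDefinitionLevels: copies up to batch_size
// levels from page_levels into out and returns how many were copied.
// Assumes batch_size >= 0.
int64_t DecodeLevels(const std::vector<int16_t>& page_levels, int64_t batch_size,
                     int16_t* out) {
  const int64_t n =
      std::min<int64_t>(batch_size, static_cast<int64_t>(page_levels.size()));
  std::copy_n(page_levels.begin(), n, out);
  return n;
}

// Mirrors the branch quoted in the issue: a null def_levels pointer is
// treated the same as a required field.
LevelCount CountAsReported(const std::vector<int16_t>& page_levels,
                           int16_t max_def_level, int64_t batch_size,
                           int16_t* def_levels) {
  LevelCount out{0, 0};
  if (max_def_level > 0 && def_levels != nullptr) {
    out.num_def_levels = DecodeLevels(page_levels, batch_size, def_levels);
    for (int64_t i = 0; i < out.num_def_levels; ++i) {
      if (def_levels[i] == max_def_level) ++out.values_to_read;
    }
  } else {
    // Taken for optional fields too when def_levels == nullptr.
    out.values_to_read = batch_size;
  }
  return out;
}

// One possible alternative: branch only on max_def_level and decode into a
// scratch buffer when the caller did not ask for the levels.
LevelCount CountWithScratch(const std::vector<int16_t>& page_levels,
                            int16_t max_def_level, int64_t batch_size,
                            int16_t* def_levels) {
  LevelCount out{0, 0};
  if (max_def_level > 0) {
    std::vector<int16_t> scratch;
    if (def_levels == nullptr) {
      scratch.resize(static_cast<std::size_t>(batch_size));
      def_levels = scratch.data();
    }
    out.num_def_levels = DecodeLevels(page_levels, batch_size, def_levels);
    for (int64_t i = 0; i < out.num_def_levels; ++i) {
      if (def_levels[i] == max_def_level) ++out.values_to_read;
    }
  } else {
    // Truly required field: every slot holds a value.
    out.values_to_read = batch_size;
  }
  return out;
}
```

With a single null in an optional column (`page_levels = {0}`, `max_def_level = 1`, `batch_size = 1`, `def_levels = nullptr`), `CountAsReported` yields `values_to_read == 1`, matching the behaviour described above, while `CountWithScratch` yields 0.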
Component(s)
C++, Parquet