Skip to content

[C++] Compute: Why some ScalarAggregator handles scalar ExecSpan like single value, other like groups value? #43768

@mapleFU

Description

@mapleFU

Describe the enhancement requested

struct CountImpl : public ScalarAggregator {
  Status Consume(KernelContext*, const ExecSpan& batch) override {
    if (options.mode == CountOptions::ALL) {
      this->non_nulls += batch.length;
    } else if (batch[0].is_array()) {
      const ArraySpan& input = batch[0].array;
      const int64_t nulls = input.GetNullCount();
      this->nulls += nulls;
      this->non_nulls += input.length - nulls;
    } else {
      const Scalar& input = *batch[0].scalar;
      this->nulls += !input.is_valid * batch.length;
      this->non_nulls += input.is_valid * batch.length;
    }
    return Status::OK();
  }
};

CountImpl regards scalar as a batch.length groups of value, however, other kernels did different:

struct BooleanAnyImpl : public ScalarAggregator {
  explicit BooleanAnyImpl(ScalarAggregateOptions options) : options(std::move(options)) {}

  Status Consume(KernelContext*, const ExecSpan& batch) override {
    // short-circuit if seen a True already
    if (this->any == true && this->count >= options.min_count) {
      return Status::OK();
    }
    if (batch[0].is_scalar()) {
      const Scalar& scalar = *batch[0].scalar;
      this->has_nulls = !scalar.is_valid;
      this->any = scalar.is_valid && checked_cast<const BooleanScalar&>(scalar).value;
      this->count += scalar.is_valid;
      return Status::OK();
    }
    const ArraySpan& data = batch[0].array;
    this->has_nulls = data.GetNullCount() > 0;
    this->count += data.length - data.GetNullCount();
    arrow::internal::OptionalBinaryBitBlockCounter counter(
        data.buffers[0].data, data.offset, data.buffers[1].data, data.offset,
        data.length);
    int64_t position = 0;
    while (position < data.length) {
      const auto block = counter.NextAndBlock();
      if (block.popcount > 0) {
        this->any = true;
        break;
      }
      position += block.length;
    }
    return Status::OK();
  }

This takes it as a single value.

Is this designed? Did I miss something?

Component(s)

C++

Metadata

Metadata

Assignees

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions