[C++] Avoid slicing array inputs in ExecBatchIterator that would result in one slice #31921

@asfimport

Description

For scalar functions, ExecBatchIterator is used to iterate over batches in smaller units. It is implemented by calling `Array::Slice()`. For small batches this is unnecessary, since only one slice is created. The slice operation still causes some overhead by copying the shared_ptrs of the ArrayData object, including the type pointer, which can lead to contention (ARROW-16161).

This patch first checks whether the batch size is smaller than the slice size and, in that case, uses std::move instead of slicing.
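The idea can be illustrated with a minimal, self-contained sketch. It does not use the real Arrow classes: `ToyType`, `ToyArray`, and `SplitIntoChunks` are hypothetical stand-ins invented for illustration, and the exact fast-path condition (`<=` here) is an assumption rather than the condition used in the patch.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Arrow's DataType / ArrayData (not the real API).
struct ToyType {};

struct ToyArray {
  std::shared_ptr<ToyType> type;                 // shared, reference-counted
  std::shared_ptr<std::vector<int64_t>> values;  // shared value buffer
  int64_t offset = 0;
  int64_t length = 0;

  // Zero-copy slice: the data is not copied, but both shared_ptrs are,
  // which costs one atomic reference-count increment each.
  ToyArray Slice(int64_t off, int64_t len) const {
    return ToyArray{type, values, offset + off, len};
  }
};

// Splits `input` into chunks of at most `max_chunksize` elements.
std::vector<ToyArray> SplitIntoChunks(ToyArray input, int64_t max_chunksize) {
  assert(max_chunksize > 0);
  std::vector<ToyArray> chunks;
  if (input.length <= max_chunksize) {
    // Fast path (assumed condition): the input already fits in one chunk,
    // so move it through instead of creating a single full-length slice.
    // Moving a shared_ptr does not touch the reference count.
    chunks.push_back(std::move(input));
    return chunks;
  }
  // General path: one slice per chunk, each copying the shared_ptrs.
  for (int64_t pos = 0; pos < input.length; pos += max_chunksize) {
    const int64_t len = std::min(max_chunksize, input.length - pos);
    chunks.push_back(input.Slice(pos, len));
  }
  return chunks;
}
```

In this sketch, passing a small array with `std::move` leaves the `use_count()` of its type pointer unchanged, whereas the slicing path increments it once per chunk. Those shared-counter updates are the kind of overhead the description attributes to slicing.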

I have attached a comparison of the ExecuteScalarExpressionOverhead benchmark here: avoid-slicing-performance.txt

(Created with --benchmark_min_time=20. The default, shorter runtime tends to be noisy for this benchmark, but also shows a positive tendency.)

Reporter: Tobias Zagorni / @zagto
Assignee: Tobias Zagorni / @zagto

Related issues:

Original Issue Attachments:

PRs and other links:

Note: This issue was originally created as ARROW-16562. Please see the migration documentation for further details.
