
[C++] Take kernel can't handle ChunkedArrays that don't fit in an Array #25822

@asfimport

Description


Take() currently concatenates ChunkedArrays first. However, this breaks down when calling Take() on a ChunkedArray or Table where concatenating the chunks would produce an array that is too large (for example, a string array whose combined data would overflow its 32-bit offsets). While inconvenient to implement, it would be useful if this case were handled.

This could be done as a higher-level wrapper around Take(), perhaps.

Example in Python:

>>> import pyarrow as pa
>>> pa.__version__
'1.0.0'
>>> rb1 = pa.RecordBatch.from_arrays([["a" * 2**30]], names=["a"])
>>> rb2 = pa.RecordBatch.from_arrays([["b" * 2**30]], names=["a"])
>>> table = pa.Table.from_batches([rb1, rb2], schema=rb1.schema)
>>> table.take([1, 0])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/table.pxi", line 1145, in pyarrow.lib.Table.take
  File "/home/lidavidm/Code/twosigma/arrow/venv/lib/python3.8/site-packages/pyarrow/compute.py", line 268, in take
    return call_function('take', [data, indices], options)
  File "pyarrow/_compute.pyx", line 298, in pyarrow._compute.call_function
  File "pyarrow/_compute.pyx", line 192, in pyarrow._compute.Function.call
  File "pyarrow/error.pxi", line 122, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 84, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: offset overflow while concatenating arrays

In this example, it would be useful if Take() or a higher-level wrapper could generate multiple record batches as output.
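As a rough illustration of what such a wrapper might look like, below is a minimal Python sketch (take_to_batches is a hypothetical helper, not part of the pyarrow API) that groups the requested indices by source batch and calls take() on each batch separately, emitting multiple record batches instead of one concatenated result. The trade-off is that output rows come back grouped by source batch rather than in index order.

import pyarrow as pa
import pyarrow.compute as pc


def take_to_batches(table, indices):
    """Hypothetical wrapper around take(): yields one RecordBatch per source
    batch instead of concatenating chunks, so no single output array has to
    hold all of the selected data.

    Assumes all indices are valid, non-negative row positions. Note that rows
    are grouped by the batch they come from, so the output order differs from
    a plain take() when indices cross batch boundaries.
    """
    batches = table.to_batches()

    # Starting global row offset of each batch.
    offsets = []
    start = 0
    for batch in batches:
        offsets.append(start)
        start += batch.num_rows

    # Group the requested global indices by the batch that contains them,
    # translating each to a batch-local index.
    per_batch = [[] for _ in batches]
    for i in indices:
        for b in reversed(range(len(batches))):
            if i >= offsets[b]:
                per_batch[b].append(i - offsets[b])
                break

    # Take from each batch independently; no cross-batch concatenation occurs,
    # so the 32-bit offset limit only applies within a single batch.
    for batch, local in zip(batches, per_batch):
        if local:
            yield pc.take(batch, pa.array(local, type=pa.int64()))

With the table from the example above, list(take_to_batches(table, [1, 0])) would yield two one-row record batches (one per source batch) without ever concatenating the two 1 GiB strings, and pa.Table.from_batches() could reassemble them into a chunked result.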

Reporter: Will Jones / @wjones127
Assignee: Will Jones / @wjones127


Note: This issue was originally created as ARROW-9773. Please see the migration documentation for further details.
