[Python] Pretty printing very large ChunkedArray objects can use unbounded memory #20692
Closed as not planned
Labels
Component: Python, Status: stale-warning, Type: enhancement
Description
In working on ARROW-2970, I have the following dataset:

```python
import numpy as np
import pyarrow as pa

values = [b'x'] + [b'x' * (1 << 20)] * 2 * (1 << 10)
arr = np.array(values)
arrow_arr = pa.array(arr)
```

The object `arrow_arr` is a ChunkedArray with 129 chunks, and each element is 1MB of binary data. The repr for this object is over 600MB:
```python
In [10]: rep = repr(arrow_arr)

In [11]: len(rep)
Out[11]: 637536258
```

There are probably a number of failsafes we can implement to avoid badness in these pathological cases. They may not happen often, but given the kinds of bug reports we are seeing, people do have datasets that look like this.
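One possible failsafe along these lines is to bound the output by previewing only a fixed number of elements and truncating each element's bytes, so the string size is independent of the data size. The sketch below is illustrative only, not the fix adopted for this issue; the helper name `bounded_repr` and its limits are hypothetical, though it uses only public pyarrow APIs (`slice`, `num_chunks`, iteration yielding scalars with `as_py`).

```python
import pyarrow as pa

def bounded_repr(chunked, max_elements=10, max_bytes_per_element=32):
    # Hypothetical failsafe: output size is bounded by roughly
    # max_elements * max_bytes_per_element, no matter how large
    # the underlying ChunkedArray is.
    n = len(chunked)
    lines = [f"<ChunkedArray: {n} elements in {chunked.num_chunks} chunks>"]
    # Only materialize the first few elements; slice() is zero-copy.
    for scalar in chunked.slice(0, min(n, max_elements)):
        value = scalar.as_py()
        shown = value[:max_bytes_per_element]
        extra = len(value) - len(shown)
        suffix = f" ... (+{extra} bytes)" if extra > 0 else ""
        lines.append(f"  {shown!r}{suffix}")
    if n > max_elements:
        lines.append(f"  ... ({n - max_elements} more elements)")
    return "\n".join(lines)
```

With the dataset above, `len(bounded_repr(arrow_arr))` stays in the hundreds of bytes rather than hundreds of megabytes, since each 1MB element contributes at most `max_bytes_per_element` bytes to the preview.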
Reporter: Wes McKinney / @wesm
Related issues:
- PrettyPrint Improvements (is a child of)
Note: This issue was originally created as ARROW-4099. Please see the migration documentation for further details.