
[Python] Ensure _exec_plan.execplan preserves order of inputs #31880

@asfimport

Description


At the moment, execplan doesn't guarantee any output order. Batches are consumed in a nondeterministic order, so when use_threads=True the rows in the output can come back unordered.

For example, filtering a table whose column b holds [a, a, a, a, b, b, b, b] (stored as two chunks) on a == 1 will sometimes give back b=[a, b] and sometimes b=[b, a].
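The effect can be modeled without Arrow at all. The sketch below is a hypothetical pure-Python stand-in, not execplan's implementation. Two batches are filtered on a thread pool and collected with concurrent.futures.as_completed, so the output order follows completion time rather than input order. The artificial per-batch delay is an assumption, used only to make the race easy to observe.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Two input batches mirroring the table in this issue: (a, b) rows.
BATCHES = [
    [(1, "a"), (2, "a"), (3, "a"), (4, "a")],
    [(1, "b"), (2, "b"), (3, "b"), (4, "b")],
]

def filter_batch(rows, delay):
    # Simulate uneven per-batch work, then keep rows where a == 1.
    time.sleep(delay)
    return [row for row in rows if row[0] == 1]

def filter_unordered(batches, delays):
    """Collect filtered rows in completion order, not input order."""
    out = []
    with ThreadPoolExecutor(max_workers=len(batches)) as pool:
        futures = [pool.submit(filter_batch, b, d) for b, d in zip(batches, delays)]
        for fut in as_completed(futures):
            out.extend(fut.result())
    return out

# The first batch is made slower, so its row typically arrives second.
print([b for _, b in filter_unordered(BATCHES, delays=[0.2, 0.0])])
```

The same rows always come back. Only their order depends on scheduling, which matches the symptom described above.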

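See the following IPython session (pa, pc and ep stand for pyarrow, pyarrow.compute and pyarrow._exec_plan):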

In [18]: table1 = pa.table({'a': [1, 2, 3, 4], 'b': ['a'] * 4})

In [19]: table2 = pa.table({'a': [1, 2, 3, 4], 'b': ['b'] * 4})

In [20]: table = pa.concat_tables([table1, table2])

In [21]: ep._filter_table(table, pc.field('a') == 1)
Out[21]: 
pyarrow.Table
a: int64
b: string
----
a: [[1],[1]]
b: [["b"],["a"]]

In [22]: ep._filter_table(table, pc.field('a') == 1)
Out[22]: 
pyarrow.Table
a: int64
b: string
----
a: [[1],[1]]
b: [["a"],["b"]] 
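An engine can keep running batches on threads and still preserve input order. One common approach is to tag each batch with its sequence number and release a result only after every earlier batch has been emitted. Below is a minimal pure-Python sketch of that idea. It assumes batches are plain lists and the filter is an arbitrary function. ordered_parallel_map and its parameters are hypothetical names and are not part of pyarrow.

```python
import heapq
from concurrent.futures import Future, ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")

def ordered_parallel_map(
    func: Callable[[T], R], batches: Sequence[T], max_workers: int = 4
) -> Iterator[R]:
    """Run func on each batch concurrently, but yield results in input order.

    Finished results are parked in a min-heap keyed by sequence number and
    released as soon as the next expected index is available.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the position of its batch in the input.
        futures: Dict[Future, int] = {
            pool.submit(func, b): i for i, b in enumerate(batches)
        }
        pending: List[Tuple[int, R]] = []  # heap of (sequence number, result)
        next_index = 0
        for fut in as_completed(futures):
            heapq.heappush(pending, (futures[fut], fut.result()))
            # Flush every buffered result whose predecessors were all emitted.
            while pending and pending[0][0] == next_index:
                yield heapq.heappop(pending)[1]
                next_index += 1

# Usage: filter two chunks on a == 1 and concatenate them in input order.
batches = [[(1, "a"), (2, "a")], [(1, "b"), (2, "b")]]
result = [
    row
    for chunk in ordered_parallel_map(lambda rows: [r for r in rows if r[0] == 1], batches)
    for row in chunk
]
# result == [(1, "a"), (1, "b")]
```

Sequence numbers are unique, so heap tuples never fall back to comparing the results themselves. This sketch submits every batch up front and can buffer arbitrarily many out-of-order results. A streaming engine would likely also want to bound that buffer.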

Reporter: Alessandro Molina / @amol-
Assignee: Alessandro Molina / @amol-

Note: This issue was originally created as ARROW-16518. Please see the migration documentation for further details.
