Skip to content

GH-33976: [Python] Refactor the internal usage of ExecPlan to use new Acero Declaration/ExecNodeOptions bindings#34401

Merged
jorisvandenbossche merged 6 commits intoapache:mainfrom
jorisvandenbossche:gh-33976-refactor
Mar 28, 2023
Merged

GH-33976: [Python] Refactor the internal usage of ExecPlan to use new Acero Declaration/ExecNodeOptions bindings#34401
jorisvandenbossche merged 6 commits intoapache:mainfrom
jorisvandenbossche:gh-33976-refactor

Conversation

@jorisvandenbossche
Copy link
Copy Markdown
Member

@jorisvandenbossche jorisvandenbossche commented Mar 1, 2023

This PR refactors our current custom cython implementation of the Table/Dataset.filter/join/group_by/sort_by methods to use the new bindings for Declaration/ExecNodeOptions (#34102).

@github-actions
Copy link
Copy Markdown

github-actions bot commented Mar 1, 2023

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Mar 3, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Mar 3, 2023
@jorisvandenbossche jorisvandenbossche changed the title GH-33976: [Python] Refactor the internal usage of ExecPlan to use new Declaration/ExecNodeOptions bindings GH-33976: [Python] Refactor the internal usage of ExecPlan to use new Acero Declaration/ExecNodeOptions bindings Mar 22, 2023
@jorisvandenbossche jorisvandenbossche marked this pull request as ready for review March 22, 2023 14:00
@jorisvandenbossche
Copy link
Copy Markdown
Member Author

jorisvandenbossche commented Mar 22, 2023

With the last commit, I renamed _exec_plan.pyx to _exec_plan.py, because it no longer contains any cython. That makes github see it as a deleted and new file, though. If you want to see the diff within the file itself, select all but the last commit: https://github.com/apache/arrow/pull/34401/files/b2e85efc7318658bafd8b30baf28ab897a177718

with pytest.raises(pa.ArrowTypeError):
ep._filter_table(
t, pc.divide(pc.field("a"), pc.scalar(2)),
output_type=pa.Table
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we only ever used output_type=pa.Table option for this keyword in _filter_table, I removed the keyword and simplified the tests

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Mar 28, 2023
Copy link
Copy Markdown
Member

@AlenkaF AlenkaF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see nothing obvious missing. The code is hugely optimised and much clearer now so the maintenance of it will benefit big time 👍 Thanks!

@jorisvandenbossche jorisvandenbossche merged commit 0ead719 into apache:main Mar 28, 2023
@jorisvandenbossche jorisvandenbossche deleted the gh-33976-refactor branch March 28, 2023 11:42
@ursabot
Copy link
Copy Markdown

ursabot commented Mar 28, 2023

Benchmark runs are scheduled for baseline = c74540f and contender = 0ead719. 0ead719 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.12% ⬆️0.06%] test-mac-arm
[Finished ⬇️0.26% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.25% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Finished] 0ead719b ec2-t3-xlarge-us-east-2
[Finished] 0ead719b test-mac-arm
[Finished] 0ead719b ursa-i9-9960x
[Finished] 0ead719b ursa-thinkcentre-m75q
[Finished] c74540f2 ec2-t3-xlarge-us-east-2
[Finished] c74540f2 test-mac-arm
[Finished] c74540f2 ursa-i9-9960x
[Finished] c74540f2 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

ArgusLi pushed a commit to Bit-Quill/arrow that referenced this pull request May 15, 2023
…se new Acero Declaration/ExecNodeOptions bindings (apache#34401)

This PR refactors our current custom cython implementation of the Table/Dataset.filter/join/group_by/sort_by methods to use the new bindings for Declaration/ExecNodeOptions (apache#34102).

* Issue: apache#33976

Authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Signed-off-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants