[SPARK-44486][PYTHON][CONNECT] Implement PyArrow `self_destruct` feature for `toPandas` by xinrong-meng · Pull Request #42079 · apache/spark

xinrong-meng · 2023-07-20T00:10:04Z

What changes were proposed in this pull request?

Implement Arrow `self_destruct` of `toPandas` for memory savings.

Now the Spark configuration `spark.sql.execution.arrow.pyspark.selfDestruct.enabled` can be used to enable PyArrow’s `self_destruct` feature in Spark Connect, which can save memory when creating a Pandas DataFrame via `toPandas` by freeing Arrow-allocated memory while building the Pandas DataFrame.

Why are the changes needed?

Reach parity with vanilla PySpark. The PR is a mirror of #29818 for Spark Connect.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

xinrong-meng · 2023-07-21T17:31:13Z

cc @BryanCutler

xinrong-meng · 2023-07-25T00:08:17Z

The failed check, Run / Run Spark on Kubernetes Integration test, is unrelated to this PR.

HyukjinKwon · 2023-07-25T00:43:42Z

Merged to master and branch-3.5.

…ure for `toPandas`

### What changes were proposed in this pull request?

Implement Arrow `self_destruct` of `toPandas` for memory savings.

Now the Spark configuration `spark.sql.execution.arrow.pyspark.selfDestruct.enabled` can be used to enable PyArrow’s `self_destruct` feature in Spark Connect, which can save memory when creating a Pandas DataFrame via `toPandas` by freeing Arrow-allocated memory while building the Pandas DataFrame.

### Why are the changes needed?

Reach parity with vanilla PySpark. The PR is a mirror of #29818 for Spark Connect.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Unit test.

Closes #42079 from xinrong-meng/self_destruct.

Authored-by: Xinrong Meng <xinrong@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 78b3345)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

self_destruct

c29a552

github-actions bot added SQL PYTHON CONNECT labels Jul 20, 2023

xinrong-meng added 3 commits July 20, 2023 10:45

lint

17de200

fix + test

14c28e1

lint

bd2873e

xinrong-meng changed the title ~~[WIP][SPARK-44486][PYTHON][CONNECT] Implement PyArrow self_destruct feature for toPandas~~ [SPARK-44486][PYTHON][CONNECT] Implement PyArrow self_destruct feature for toPandas Jul 20, 2023

del batches

dd1656a

xinrong-meng marked this pull request as ready for review July 21, 2023 00:18

HyukjinKwon approved these changes Jul 21, 2023

lint, comment

8524359

xinrong-meng added 2 commits July 21, 2023 15:44

lint

193659e

lint

9dde114

HyukjinKwon closed this in 78b3345 Jul 25, 2023

xinrong-meng wants to merge 8 commits into apache:master from xinrong-meng:self_destruct

2 participants
