[GLUTEN-6067][VL] [Part 3-1] Refactor: Rename VeloxColumnarWriteFilesExec to ColumnarWriteFilesExec#6403
Merged
baibaichen merged 3 commits into apache:main on Jul 19, 2024
JkSelf
reviewed
Jul 19, 2024
 * plan, and support Spark file commit protocol.
 */
-class VeloxColumnarWriteFilesRDD(
+class GlutenColumnarWriteFilesRDD(
Contributor
After moving VeloxColumnarWriteFilesExec from backend-velox to gluten-core, can we update the class names by renaming GlutenColumnarWriteFilesExec to ColumnarWriteFilesExec and GlutenColumnarWriteFilesRDD to ColumnarWriteFilesRDD?
…nd move it to gluten-core
1. Return GlutenColumnarWriteFilesExec at SparkPlanExecApi
2. Move SparkWriteFilesCommitProtocol to gluten-core
3. SparkWriteFilesCommitProtocol supports getFilename from the internal committer
4. Remove supportTransformWriteFiles from BackendSettingsApi
5. injectWriteFilesTempPath with fileName
…tenColumnarWriteFilesRDD to ColumnarWriteFilesRDD
zzcclp
approved these changes
Jul 19, 2024
What changes were proposed in this pull request?
(Fixes: #6067)
This PR refactors the Velox-side code: it renames VeloxColumnarWriteFilesExec to GlutenColumnarWriteFilesExec and moves it to gluten-core, so that the ClickHouse backend can use the same SparkPlan in a follow-up PR.
With Spark 3.4 support, Velox provides a whole-stage native write pipeline, which is better than the old implementation; the ClickHouse backend adopts the same implementation.
Major change 1
The only major difference between Velox and ClickHouse is how native metrics are parsed. To handle it, I introduce a new trait called BackendWrite, which has only one member for now. Once the native write pipeline is completed, we obtain the BackendWrite via BackendsApiManager.getSparkPlanExecApiInstance.createBackendWrite. Please see VeloxBackendWrite for details.
Minor change 2
The other, minor difference is that the ClickHouse backend doesn't generate file names itself. To compute the file name per task, it uses HadoopMapReduceCommitProtocol::getFilename and then injects the result into the backend. This is OK because Velox doesn't support maxRecordsPerFile (see #4329) and the ClickHouse backend follows suit, which means each task produces only one file, so no further injections are needed.
Improve
I also pass the file format to the backend.
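To make the BackendWrite idea under "Major change 1" more concrete, here is a minimal, hypothetical sketch of a one-member trait that hides backend-specific metric parsing behind a common result type. Only the names BackendWrite and createBackendWrite come from this description; WriteSummary, collectNativeWriteMetrics, the two example implementations, and both metric formats are assumptions made up for illustration and are not Gluten's actual signatures.

```scala
// Hypothetical sketch: everything except the names BackendWrite and
// createBackendWrite is invented for illustration.
object BackendWriteSketch {

  // Simplified, assumed stand-in for what a finished write task reports.
  final case class WriteSummary(numFiles: Int, numRows: Long, numBytes: Long)

  // One member only: turn backend-specific native metrics into a
  // backend-neutral summary.
  trait BackendWrite {
    def collectNativeWriteMetrics(raw: Seq[String]): Option[WriteSummary]
  }

  // Shared helper: fold per-file (rows, bytes) pairs into one summary.
  private def summarize(perFile: Seq[(Long, Long)]): Option[WriteSummary] =
    if (perFile.isEmpty) None
    else Some(WriteSummary(perFile.size, perFile.map(_._1).sum, perFile.map(_._2).sum))

  // Assumed format A: one "path,rows,bytes" line per written file.
  object CsvStyleBackendWrite extends BackendWrite {
    def collectNativeWriteMetrics(raw: Seq[String]): Option[WriteSummary] =
      summarize(raw.map(_.split(",")).collect {
        case Array(_, rows, bytes) => (rows.trim.toLong, bytes.trim.toLong)
      })
  }

  // Assumed format B: one "rows=N;bytes=M" line per written file.
  object KeyValueBackendWrite extends BackendWrite {
    def collectNativeWriteMetrics(raw: Seq[String]): Option[WriteSummary] = {
      val maps = raw.map { line =>
        line.split(";").map(_.split("=", 2)).collect {
          case Array(k, v) => k.trim -> v.trim
        }.toMap
      }
      summarize(maps.collect {
        case m if m.contains("rows") && m.contains("bytes") =>
          (m("rows").toLong, m("bytes").toLong)
      })
    }
  }

  // Stand-in for the factory; the string key is an assumption, chosen only
  // to keep the sketch free of Spark and Gluten dependencies.
  def createBackendWrite(backend: String): BackendWrite = backend match {
    case "csv-style" => CsvStyleBackendWrite
    case "key-value" => KeyValueBackendWrite
    case other       => throw new IllegalArgumentException(s"unknown backend: $other")
  }
}
```

With this shape, the shared write operator only ever calls the trait member, so supporting another backend would mean adding one implementation rather than touching the operator.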
How was this patch tested?
Using existing UTs.