
[GLUTEN-6067][CH] [Part 2] Support CH backend with Spark3.5 - Prepare for supporting sink transform #6197

Merged: zzcclp merged 6 commits into apache:main from baibaichen:feature/spark35parquet on Jun 24, 2024.


@baibaichen
Contributor

What changes were proposed in this pull request?

We refactor the code for the following purposes:

  1. Call SparkShim::getExtendedColumnarPostRules, so that writes can fall back to vanilla Spark on Spark 3.5.
  2. Add NativeWriteChecker, which makes the related UTs fail on Spark 3.5.
  3. Refactor LocalExecutor, moving pipeline building into SerializedPlanParser, because the sink transform cannot be added in LocalExecutor.
  4. Reduce duplicate code, namely in CHIteratorApi and SubstraitPlanPrinterUtil.
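Item 1 relies on a per-Spark-version shim hook that can contribute extra columnar post rules. The Java sketch below illustrates only that idea under heavy simplification: a plan is modeled as a list of operator names, and SparkShim, Spark34Shim, Spark35Shim, applyPostRules and the NativeWrite/VanillaWrite node names are hypothetical stand-ins, not Gluten's actual classes, types or signatures. The default hook contributes no rules, while the Spark 3.5 override adds one rule that falls back to the vanilla write.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

public class ShimPostRulesSketch {

    // Hypothetical stand-in for a per-Spark-version shim (not Gluten's real SparkShims).
    interface SparkShim {
        // Default: contribute no extra post rules, so other Spark versions are unaffected.
        default List<UnaryOperator<List<String>>> getExtendedColumnarPostRules() {
            return List.of();
        }
    }

    // Hypothetical shim for a Spark version that needs no extra rules.
    static final class Spark34Shim implements SparkShim {}

    // Hypothetical Spark 3.5 shim contributing one rule that falls back to the vanilla write.
    static final class Spark35Shim implements SparkShim {
        @Override
        public List<UnaryOperator<List<String>>> getExtendedColumnarPostRules() {
            UnaryOperator<List<String>> fallbackWrite = plan -> {
                List<String> out = new ArrayList<>(plan.size());
                for (String op : plan) {
                    // Replace the hypothetical native write node with the vanilla one.
                    out.add(op.equals("NativeWrite") ? "VanillaWrite" : op);
                }
                return out;
            };
            return List.of(fallbackWrite);
        }
    }

    // Applies the rules contributed by the shim, in order, to a plan.
    static List<String> applyPostRules(List<String> plan, SparkShim shim) {
        List<String> result = plan;
        for (UnaryOperator<List<String>> rule : shim.getExtendedColumnarPostRules()) {
            result = rule.apply(result);
        }
        return result;
    }
}
```

Calling applyPostRules(List.of("Scan", "NativeWrite"), new Spark35Shim()) rewrites the write node, while the same call with Spark34Shim leaves the plan unchanged. Keeping the hook in the shim means version-specific fallback logic stays out of the shared backend code.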

(Fixes: #6067)

How was this patch tested?

Existing UTs.

@github-actions

#6067

@github-actions

Run Gluten Clickhouse CI

@baibaichen changed the title from "[GLUTEN-6067][CH] [Part 1] Support Native Writer for Spark 3.5" to "[GLUTEN-6067][CH] [Part 2] Support CH backend with Spark3.5 - Prepare for supporting sink transform" on Jun 24, 2024

@zzcclp left a comment


LGTM

@zzcclp merged commit 1fbdbc4 into apache:main on Jun 24, 2024
@GlutenPerfBot
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_6197_time.csv | log/native_master_06_24_2024_f07e348f4_time.csv | difference | percentage |
| --- | --- | --- | --- | --- |
| q1 | 34.29 | 35.39 | 1.108 | 103.23% |
| q2 | 26.88 | 23.65 | -3.237 | 87.96% |
| q3 | 38.57 | 40.35 | 1.776 | 104.60% |
| q4 | 35.48 | 32.68 | -2.803 | 92.10% |
| q5 | 71.43 | 70.64 | -0.786 | 98.90% |
| q6 | 6.31 | 9.08 | 2.765 | 143.83% |
| q7 | 85.33 | 80.58 | -4.748 | 94.44% |
| q8 | 89.87 | 87.90 | -1.967 | 97.81% |
| q9 | 119.46 | 125.50 | 6.045 | 105.06% |
| q10 | 45.54 | 48.85 | 3.312 | 107.27% |
| q11 | 23.56 | 20.45 | -3.109 | 86.80% |
| q12 | 23.44 | 26.51 | 3.068 | 113.09% |
| q13 | 38.96 | 38.63 | -0.335 | 99.14% |
| q14 | 19.73 | 22.31 | 2.579 | 113.07% |
| q15 | 31.23 | 31.83 | 0.608 | 101.95% |
| q16 | 14.67 | 14.13 | -0.546 | 96.28% |
| q17 | 106.57 | 103.74 | -2.832 | 97.34% |
| q18 | 147.02 | 144.47 | -2.550 | 98.27% |
| q19 | 13.81 | 13.92 | 0.115 | 100.83% |
| q20 | 27.30 | 29.16 | 1.864 | 106.83% |
| q21 | 264.07 | 264.29 | 0.221 | 100.08% |
| q22 | 16.54 | 12.24 | -4.306 | 73.97% |
| total | 1280.06 | 1276.30 | -3.759 | 99.71% |
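The report does not define its two derived columns. The numbers are consistent with difference = master time minus PR time and percentage = master time / PR time × 100, so a positive difference or a percentage above 100% would mean the PR run was faster. That reading is inferred from the table, not stated by the report; the Java sketch below simply encodes it, and PerfReportColumns with its methods is a hypothetical helper.

```java
// Hypothetical helper that recomputes the report's derived columns from the two timing columns.
public class PerfReportColumns {

    // "difference" column (inferred): master time minus PR time; positive means the PR run was faster.
    static double difference(double prTime, double masterTime) {
        return masterTime - prTime;
    }

    // "percentage" column (inferred): master time as a percentage of PR time; above 100% means the PR run was faster.
    static double percentage(double prTime, double masterTime) {
        return masterTime / prTime * 100.0;
    }
}
```

Recomputing from the two-decimal values shown (for example q1: difference(34.29, 35.39) gives about 1.10 against the reported 1.108) matches only to within rounding, which suggests the report was computed from unrounded timings.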

@baibaichen deleted the feature/spark35parquet branch on June 25, 2024 01:20
deepashreeraghu pushed a commit to deepashreeraghu/incubator-gluten that referenced this pull request Jun 26, 2024
… for supporting sink transform (apache#6197)

[CH] [Part 2] Support CH backend with Spark3.5 - Prepare for supporting sink transform

* [Refactor] remove duplicate codes

* Add NativeWriteChecker

* [Prepare to commit] getExtendedColumnarPostRules from Spark shim


Development

Successfully merging this pull request may close these issues.

[CH] Support CH backend with Spark 3.5.x

3 participants