
[VL] parquet file metadata columns support in velox#3870

Merged
zhli1142015 merged 13 commits into apache:main from gaoyangxiaozhu:gayangya/metadatacolumns on Mar 14, 2024

gaoyangxiaozhu (Contributor) commented Nov 28, 2023

What changes were proposed in this pull request?

Adds native Velox access to file metadata columns. This is required for native Delta support, as requested by the Microsoft Delta team.

Fixes: #2618
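Context for the change: in Spark 3.3 and later, file-based scans can expose a hidden `_metadata` struct column whose fields include `file_path`, `file_name`, `file_size` and `file_modification_time`. Per the linked issue's title, Gluten previously fell back instead of returning these from the native scan. Every row read from one file carries the same values, so they can be computed once per file and repeated for each row. The Python sketch that follows only illustrates that idea. The helpers `file_metadata` and `attach_metadata` are hypothetical, the `file:` URI form of `file_path` is an assumption, and none of this reflects how Gluten or Velox actually implement it.

```python
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def file_metadata(path):
    """Collect per-file values for the fields of Spark's `_metadata` struct.

    Hypothetical helper: field names follow Spark 3.3's `_metadata`, but the
    URI form of file_path is an assumption of this sketch.
    """
    p = Path(path).resolve()
    st = os.stat(p)
    return {
        "file_path": p.as_uri(),
        "file_name": p.name,
        "file_size": st.st_size,
        "file_modification_time": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc),
    }


def attach_metadata(rows, meta):
    """Attach the same metadata struct to every row read from one file.

    Hypothetical helper: the values are constant for the whole file, so they
    are computed once and repeated, much like a partition column.
    """
    return [{**row, "_metadata": meta} for row in rows]


# Usage: a throwaway file stands in for a Parquet data file.
with tempfile.NamedTemporaryFile(suffix=".parquet", delete=False) as f:
    f.write(b"hello")
meta = file_metadata(f.name)
rows = attach_metadata([{"id": 1}, {"id": 2}], meta)
os.remove(f.name)
```

Computing the struct once per file rather than once per row keeps the per-row cost to a reference copy, which is the property that makes constant-per-split columns cheap to serve from a scan.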

How was this patch tested?

Via unit tests and manual testing.


The dependency PR in Velox, facebookincubator/velox#8800, has been merged.


github-actions bot commented

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename the commit message and pull request title to the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}


github-actions bot commented

Run Gluten Clickhouse CI

FelixYBW changed the title from "file metadata columns support in velox" to "[VL] file metadata columns support in velox" Nov 28, 2023

gaoyangxiaozhu force-pushed the gayangya/metadatacolumns branch from d19534a to 43fc8bc December 5, 2023 07:54

gaoyangxiaozhu force-pushed the gayangya/metadatacolumns branch from 5753e50 to 79ba0da December 5, 2023 13:01

gaoyangxiaozhu force-pushed the gayangya/metadatacolumns branch from 273bee9 to 913d780 December 6, 2023 08:40

gaoyangxiaozhu (Contributor Author) commented

@zhouyuan and @FelixYBW, could you help check why the centos7-test job fails and give some input? Meanwhile, could you do a draft review to see whether the current implementation of native metadata column support looks good to you? If the implementation is OK with you, I will sync with the Meta Velox maintainers to get the Velox part of the PR reviewed.

yma11 (Contributor) commented Dec 19, 2023

@gaoyangxiaozhu Thanks for providing this support. The implementation seems okay. Please rebase and go ahead with the corresponding support in Velox first.

github-actions bot commented

This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days.

github-actions bot added the stale label Feb 13, 2024
github-actions bot commented

This PR was auto-closed because it has been stalled for 10 days with no activity. Please feel free to reopen if it is still valid. Thanks.

github-actions bot closed this Feb 24, 2024

gaoyangxiaozhu (Contributor Author) commented

There are various build issues; can you help re-trigger the jobs? @yma11 / @zhouyuan / @zhli1142015

gaoyangxiaozhu (Contributor Author) commented Mar 13, 2024

Can we re-trigger the failed jobs again? They all failed for the same reason. @zhouyuan / @yma11 / @zhli1142015

gaoyangxiaozhu (Contributor Author) commented

Can we merge the PR? @zhli1142015 / @yma11 / @zhouyuan

zhouyuan (Member) commented

@gaoyangxiaozhu This looks good to me. I will try it with our internal Delta Lake Jenkins job.
CC @zzcclp, as this will also change the API for the ClickHouse (CK) backend.

yma11 (Contributor) commented Mar 14, 2024

@gaoyangxiaozhu It seems there are some conflicts, so you need to rebase. Thanks.

gaoyangxiaozhu (Contributor Author) commented

"@gaoyangxiaozhu seems there is some conflicts and you need a rebase. Thanks."

Done.

zhouyuan (Member) left a comment

👍

zhli1142015 merged commit 1fbd9e6 into apache:main Mar 14, 2024
GlutenPerfBot (Contributor) commented

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_3870_time.csv | log/native_master_03_13_2024_d7ed0844e_time.csv | difference | percentage |
|---|---|---|---|---|
| q1 | 35.41 | 38.81 | 3.399 | 109.60% |
| q2 | 25.71 | 24.06 | -1.644 | 93.60% |
| q3 | 37.01 | 38.18 | 1.169 | 103.16% |
| q4 | 40.14 | 38.49 | -1.657 | 95.87% |
| q5 | 68.03 | 69.71 | 1.671 | 102.46% |
| q6 | 7.43 | 7.45 | 0.026 | 100.35% |
| q7 | 83.15 | 82.46 | -0.695 | 99.16% |
| q8 | 85.00 | 83.21 | -1.785 | 97.90% |
| q9 | 122.38 | 121.83 | -0.549 | 99.55% |
| q10 | 44.07 | 44.36 | 0.294 | 100.67% |
| q11 | 19.80 | 20.90 | 1.101 | 105.56% |
| q12 | 26.65 | 28.06 | 1.413 | 105.30% |
| q13 | 48.44 | 46.88 | -1.561 | 96.78% |
| q14 | 22.24 | 21.98 | -0.259 | 98.83% |
| q15 | 32.04 | 33.08 | 1.046 | 103.26% |
| q16 | 14.72 | 13.84 | -0.883 | 94.00% |
| q17 | 100.17 | 101.80 | 1.629 | 101.63% |
| q18 | 142.74 | 141.05 | -1.683 | 98.82% |
| q19 | 13.54 | 15.07 | 1.529 | 111.30% |
| q20 | 29.49 | 27.03 | -2.464 | 91.64% |
| q21 | 227.01 | 229.52 | 2.510 | 101.11% |
| q22 | 15.28 | 13.86 | -1.422 | 90.69% |
| total | 1240.45 | 1241.63 | 1.182 | 100.10% |
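How to read the last two columns: judging from the rows, "difference" appears to be master time minus PR time, and "percentage" appears to be master time divided by PR time. On that reading, values above 100% mean the PR run was faster for that query (for q1, 38.81 - 35.41 = 3.40 and 38.81 / 35.41 is about 109.60%). This interpretation is inferred from the numbers, not stated by the report. The Python check that follows recomputes a few rows under that assumption; `compare` is a hypothetical helper.

```python
def compare(pr_time, master_time):
    """Return (difference, percentage) as the report's columns appear to define them.

    Assumed from the data: difference = master - PR, percentage = master / PR * 100.
    """
    difference = master_time - pr_time
    percentage = master_time / pr_time * 100
    return round(difference, 3), round(percentage, 2)


# A few rows copied from the report: (query, PR time, master time).
rows = [("q1", 35.41, 38.81), ("q20", 29.49, 27.03), ("total", 1240.45, 1241.63)]
results = {q: compare(pr, master) for q, pr, master in rows}
```

Recomputed values can differ from the report in the last digits (q20 gives a difference of -2.46 here versus -2.464 in the report), presumably because the report works from unrounded timings.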

taiyang-li pushed a commit to bigo-sg/gluten that referenced this pull request Mar 25, 2024
[VL] parquet file metadata columns support in velox.

Co-authored-by: Zhen Li <zhli@microsoft.com>
taiyang-li pushed a commit to bigo-sg/gluten that referenced this pull request Oct 8, 2024
[VL] parquet file metadata columns support in velox.

Co-authored-by: Zhen Li <zhli@microsoft.com>
taiyang-li pushed a commit to bigo-sg/gluten that referenced this pull request Oct 9, 2024
[VL] parquet file metadata columns support in velox.

Co-authored-by: Zhen Li <zhli@microsoft.com>


Development

Successfully merging this pull request may close these issues.

[VL][Spark 3.3+] support return metadataColumns from native scan insteads of fallback

5 participants