
[SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.13.0 #40555

Closed
wangyum wants to merge 2 commits into apache:master from wangyum:SPARK-42926


@wangyum
Member

@wangyum wangyum commented Mar 27, 2023

What changes were proposed in this pull request?

This PR upgrades Apache Parquet to 1.13.0. See the Apache Parquet 1.13.0 release notes.

Why are the changes needed?

  1. This release includes PARQUET-2160, so we no longer need SPARK-41952.
  2. This release includes Java Vector API support.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing unit tests and benchmark tests.

TPC-DS benchmark result:

| Query | Parquet 1.13.0 (first time) | Parquet 1.12.3 (first time) | Parquet 1.13.0 (second time) | Parquet 1.12.3 (second time) | Parquet 1.13.0 (third time) | Parquet 1.12.3 (third time) |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| q1.sql | 37.819 | 37.786 | 36.322 | 37.59 | 37.772 | 36.776 |
| q2.sql | 42.132 | 41.513 | 43.189 | 42.274 | 42.859 | 42.605 |
| q3.sql | 5.933 | 6.1 | 6.082 | 6.071 | 6.128 | 6.094 |
| q4.sql | 335.051 | 319.173 | 322.396 | 320.977 | 324.464 | 326.822 |
| q5.sql | 78.41 | 76.631 | 76.841 | 76.37 | 78.257 | 76.502 |
| q6.sql | 9.006 | 9.11 | 8.737 | 8.577 | 8.729 | 9.05 |
| q7.sql | 12.881 | 12.731 | 12.685 | 12.662 | 12.606 | 12.675 |
| q8.sql | 10.122 | 10.092 | 10.035 | 10.853 | 10.277 | 10.841 |
| q9.sql | 72.562 | 71.942 | 73.649 | 73.04 | 72.899 | 72.01 |
| q10.sql | 14.127 | 13.075 | 14.276 | 13.913 | 13.281 | 13.229 |
| q11.sql | 111.334 | 111.612 | 110.952 | 110.776 | 111.686 | 112.27 |
| q12.sql | 3.138 | 3.854 | 3.187 | 3.613 | 3.437 | 3.306 |
| q13.sql | 13.131 | 12.676 | 12.516 | 12.417 | 12.739 | 12.987 |
| q14a.sql | 217.664 | 213.632 | 214.655 | 213.333 | 217.601 | 213.341 |
| q14b.sql | 191.553 | 182.775 | 184.35 | 187.004 | 188.313 | 189.876 |
| q15.sql | 10.308 | 10.46 | 10.304 | 9.901 | 10.175 | 10.307 |
| q16.sql | 81.97 | 82.059 | 82.41 | 81.263 | 83.179 | 82.042 |
| q17.sql | 28.876 | 28.905 | 30.41 | 29.573 | 29.555 | 28.837 |
| q18.sql | 14.183 | 13.929 | 14.11 | 14.466 | 13.969 | 14.022 |
| q19.sql | 6.611 | 7.593 | 6.652 | 6.659 | 6.446 | 6.533 |
| q20.sql | 3.263 | 3.701 | 3.56 | 3.503 | 3.53 | 3.627 |
| q21.sql | 2.252 | 2.188 | 2.249 | 2.128 | 2.161 | 2.252 |
| q22.sql | 14.809 | 14.715 | 14.324 | 14.266 | 14.567 | 14.123 |
| q23a.sql | 554.385 | 544.75 | 546.213 | 542.194 | 553.784 | 547.388 |
| q23b.sql | 781.236 | 768.367 | 770.584 | 776.065 | 776.502 | 776.006 |
| q24a.sql | 196.806 | 193.989 | 197.608 | 194.416 | 194.71 | 192.817 |
| q24b.sql | 176.56 | 183.084 | 177.486 | 177.936 | 177.776 | 177.389 |
| q25.sql | 22.323 | 22.089 | 22.665 | 22.049 | 22.248 | 22.317 |
| q26.sql | 8.574 | 8.356 | 8.174 | 8.753 | 8.186 | 8.302 |
| q27.sql | 9.056 | 8.252 | 8.37 | 8.319 | 8.516 | 8.38 |
| q28.sql | 102.185 | 102.382 | 102.344 | 103.058 | 102.024 | 102.786 |
| q29.sql | 75.655 | 75.604 | 75.217 | 75.532 | 75.835 | 76.024 |
| q30.sql | 12.476 | 12.966 | 13.039 | 14.108 | 12.19 | 13.143 |
| q31.sql | 26.343 | 27.632 | 26.337 | 26.791 | 26.74 | 26.098 |
| q32.sql | 3.251 | 3.41 | 3.378 | 3.333 | 3.371 | 3.516 |
| q33.sql | 7.143 | 6.125 | 6.85 | 6.718 | 7.067 | 6.615 |
| q34.sql | 8.53 | 8.656 | 8.536 | 8.866 | 8.358 | 8.589 |
| q35.sql | 35.212 | 35.571 | 35.659 | 37.631 | 36.292 | 35.603 |
| q36.sql | 9.264 | 9.166 | 9.748 | 9.488 | 9.45 | 9.469 |
| q37.sql | 36.368 | 35.881 | 37.023 | 36.578 | 35.823 | 36.7 |
| q38.sql | 74.58 | 73.472 | 72.926 | 73.823 | 71.097 | 73.329 |
| q39a.sql | 8.596 | 7.637 | 8.036 | 7.984 | 7.849 | 7.88 |
| q39b.sql | 7.233 | 6.641 | 6.278 | 7.06 | 6.595 | 6.691 |
| q40.sql | 17.34 | 16.558 | 16.448 | 16.864 | 16.432 | 16.413 |
| q41.sql | 1.223 | 1.105 | 1.103 | 1.182 | 1.232 | 1.304 |
| q42.sql | 2.464 | 2.441 | 2.554 | 2.544 | 2.314 | 2.393 |
| q43.sql | 7.477 | 7.396 | 7.394 | 7.764 | 7.381 | 7.534 |
| q44.sql | 30.228 | 30.516 | 30.859 | 31.057 | 30.372 | 29.008 |
| q45.sql | 9.93 | 10.089 | 9.874 | 10.075 | 9.802 | 9.838 |
| q46.sql | 9.544 | 9.949 | 9.503 | 9.755 | 9.395 | 9.25 |
| q47.sql | 27.322 | 26.952 | 26.974 | 26.83 | 27.087 | 26.991 |
| q48.sql | 14.266 | 14.39 | 14.517 | 14.684 | 14.471 | 14.61 |
| q49.sql | 21.279 | 21.733 | 20.286 | 20.945 | 22.388 | 21.52 |
| q50.sql | 191.416 | 194.256 | 196.701 | 194.113 | 193.354 | 191.004 |
| q51.sql | 37.552 | 37.767 | 38.317 | 37.731 | 37.369 | 38.187 |
| q52.sql | 2.206 | 2.406 | 2.235 | 2.362 | 2.337 | 2.278 |
| q53.sql | 5.282 | 5.131 | 5.465 | 5.137 | 5.142 | 5.069 |
| q54.sql | 13.039 | 12.655 | 13.047 | 12.382 | 12.992 | 12.988 |
| q55.sql | 2.534 | 2.39 | 2.375 | 2.867 | 2.623 | 2.546 |
| q56.sql | 7.365 | 7.087 | 6.902 | 7.406 | 7.586 | 7.081 |
| q57.sql | 18.064 | 17.945 | 18.699 | 17.664 | 18.362 | 18.222 |
| q58.sql | 6.198 | 6.702 | 6.109 | 6.211 | 5.9 | 6.101 |
| q59.sql | 28.266 | 28.195 | 27.876 | 28.748 | 29.027 | 28.543 |
| q60.sql | 6.847 | 7.143 | 7.322 | 7.1 | 7.207 | 7.215 |
| q61.sql | 7.258 | 7.62 | 7.317 | 7.781 | 7.616 | 7.669 |
| q62.sql | 10.334 | 11.523 | 10.389 | 10.378 | 10.072 | 10.583 |
| q63.sql | 4.631 | 4.944 | 4.947 | 5.124 | 4.61 | 4.865 |
| q64.sql | 249.694 | 252.117 | 254.359 | 254.813 | 253.236 | 250.401 |
| q65.sql | 78.742 | 79.184 | 78.559 | 78.305 | 78.985 | 78.515 |
| q66.sql | 14.98 | 14.854 | 14.794 | 14.767 | 14.781 | 14.696 |
| q67.sql | 1019.744 | 1048.439 | 987.894 | 972.062 | 927.566 | 1002.206 |
| q68.sql | 8.903 | 8.915 | 8.277 | 8.709 | 9.349 | 9.178 |
| q69.sql | 13.097 | 13.01 | 14.352 | 12.036 | 12.302 | 12.843 |
| q70.sql | 21.175 | 21.085 | 21.102 | 20.471 | 20.129 | 19.678 |
| q71.sql | 15.13 | 15.526 | 14.929 | 15.231 | 15.406 | 15.487 |
| q72.sql | 76.463 | 75.851 | 72.002 | 72.356 | 72.676 | 74.798 |
| q73.sql | 5.894 | 6.09 | 5.877 | 6.051 | 6.365 | 6.634 |
| q74.sql | 99.106 | 99.356 | 100.291 | 99.51 | 96.766 | 97.292 |
| q75.sql | 126.625 | 128.094 | 127.364 | 128.575 | 127.418 | 125.806 |
| q76.sql | 35.172 | 33.601 | 34.752 | 34.764 | 34.228 | 35.748 |
| q77.sql | 8.394 | 8.01 | 7.951 | 8.061 | 7.839 | 8.348 |
| q78.sql | 289.061 | 287.508 | 283.615 | 288.768 | 288.448 | 288.661 |
| q79.sql | 10.048 | 9.251 | 9.396 | 9.81 | 8.607 | 8.341 |
| q80.sql | 59.68 | 59.458 | 60.234 | 60.415 | 61.325 | 60.744 |
| q81.sql | 17.822 | 18.815 | 18.488 | 18.95 | 17.911 | 18.113 |
| q82.sql | 64.781 | 63.957 | 63.621 | 64.38 | 63.637 | 64.488 |
| q83.sql | 4.686 | 4.922 | 4.635 | 4.827 | 4.678 | 5.071 |
| q84.sql | 10.987 | 10.629 | 10.841 | 11.151 | 10.646 | 10.6 |
| q85.sql | 12.689 | 13.304 | 13.362 | 13.19 | 13.779 | 12.657 |
| q86.sql | 6.48 | 6.491 | 6.722 | 6.667 | 6.833 | 6.52 |
| q87.sql | 77.589 | 77.377 | 77.177 | 77.011 | 78.339 | 78.399 |
| q88.sql | 83.876 | 83.676 | 84.044 | 83.761 | 84.201 | 84.089 |
| q89.sql | 6.741 | 6.564 | 6.755 | 6.708 | 6.704 | 6.794 |
| q90.sql | 7.79 | 7.812 | 7.882 | 7.88 | 7.875 | 7.854 |
| q91.sql | 4.072 | 3.728 | 3.883 | 3.976 | 4.151 | 4.035 |
| q92.sql | 3.05 | 3.155 | 3.336 | 3.067 | 2.942 | 3.099 |
| q93.sql | 356.412 | 360.731 | 358.14 | 356 | 356.108 | 358.011 |
| q94.sql | 43.202 | 43.561 | 44.63 | 44.486 | 43.993 | 42.693 |
| q95.sql | 197.185 | 199.657 | 193.975 | 195.843 | 201.801 | 196.113 |
| q96.sql | 12.765 | 12.481 | 12.682 | 12.799 | 12.528 | 12.505 |
| q97.sql | 82.895 | 82.067 | 81.754 | 82.799 | 81.788 | 81.572 |
| q98.sql | 7.338 | 7.066 | 7.133 | 7.005 | 7.254 | 7.047 |
| q99.sql | 18.431 | 17.874 | 17.826 | 17.861 | 17.705 | 17.878 |
| total | 7105.675 | 7091.391 | 7030.209 | 7021.7 | 6992.413 | 7047.295 |
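
One rough way to read the `total` row above is as a per-run relative difference between the two Parquet versions. The sketch below is only an illustration and not part of the patch: the variable and function names are made up here, and the only inputs are the six totals copied from the table.

```python
# Totals copied from the last row of the table above, one entry per run.
# Names are illustrative only; nothing here comes from Spark or Parquet.
totals_1_13_0 = [7105.675, 7030.209, 6992.413]
totals_1_12_3 = [7091.391, 7021.7, 7047.295]


def relative_diff_percent(new: float, old: float) -> float:
    """Return how much `new` differs from `old`, as a percentage of `old`."""
    return (new - old) / old * 100.0


# Positive values mean 1.13.0 took longer in that run, negative means shorter.
per_run = [
    relative_diff_percent(new, old)
    for new, old in zip(totals_1_13_0, totals_1_12_3)
]

for run, pct in enumerate(per_run, start=1):
    print(f"run {run}: 1.13.0 vs 1.12.3 total = {pct:+.2f}%")
```

On these totals each run differs by less than 1%, and the sign is not the same across runs. Per-query rows vary more than the totals, so this does not replace looking at individual queries.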

@wangyum wangyum marked this pull request as draft March 27, 2023 03:10
@wangyum wangyum changed the title from "[WIP][SPARK-42926][BUIILD][SQL] Upgrade Parquet to 1.12.4" to "[SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.12.4" Mar 27, 2023
Member

@dongjoon-hyun dongjoon-hyun left a comment


Member

@viirya viirya left a comment


Looks okay and CI also passed. Seems we just need to wait for it to be published.

@dongjoon-hyun
Member

Shall we close this PR, since the Apache Parquet 1.12.4 vote seems to have failed due to a technical issue?

For the record, the release manager created it from the wrong branch.

- <version>1.13.0-SNAPSHOT</version>
+ <version>1.12.4</version>

@wangyum wangyum closed this Apr 1, 2023
@wangyum wangyum reopened this Apr 2, 2023
@wangyum wangyum changed the title from "[SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.12.4" to "[SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.13.0" Apr 2, 2023
Member

@dongjoon-hyun dongjoon-hyun left a comment


Thank you for updating. Since v1.13.0 is different from the maintenance release, v1.12.4, I believe we need to focus on the new features and on testing to make sure there are no side effects.

@wangyum wangyum marked this pull request as ready for review April 6, 2023 08:32
@dongjoon-hyun
Member

Thank you for updating.

@wangyum
Member Author

wangyum commented Apr 10, 2023

TPC-DS benchmark result: same table as the one in the PR description above.

@dongjoon-hyun
Member

Thank you for sharing!

@dongjoon-hyun
Member

dongjoon-hyun commented Apr 10, 2023

At first glance, there is no significant perf difference (in either direction: speedup or regression). What is your opinion, @wangyum?

@wangyum
Member Author

wangyum commented Apr 10, 2023

@dongjoon-hyun Yes. There is no significant perf difference.

@dongjoon-hyun
Member

Thank you for the confirmation.

Member

@dongjoon-hyun dongjoon-hyun left a comment


+1, LGTM for Apache Spark 3.5.0 because we have enough time to test and prepare.

@dongjoon-hyun
Member

BTW, if you don't mind, please revise the PR description.

  1. Remove "Maybe it can improve read performance." from the PR description.
  2. Copy [SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.13.0 #40555 (comment) to the PR description.

@wangyum
Member Author

wangyum commented Apr 10, 2023

> BTW, if you don't mind, please revise the PR description.
>
>   1. Remove "Maybe it can improve read performance." from the PR description.
>   2. Copy [SPARK-42926][BUILD][SQL] Upgrade Parquet to 1.13.0 #40555 (comment) to the PR description.

OK.

@wangyum wangyum closed this in 59ba09a Apr 15, 2023
@wangyum
Member Author

wangyum commented Apr 15, 2023

Merged to master.

@wangyum wangyum deleted the SPARK-42926 branch April 15, 2023 01:16