Skip to content

Conversation

@kaka11chen
Copy link
Contributor

@kaka11chen kaka11chen commented Nov 28, 2023

Proposed changes

Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 2. Opt gzip decompression by libdeflate after adding libdeflate lib in #27542.

Regression Test: Test for parquet_gzip_all_types in test_hive_basic_type already exists.

Test result:

  • env: 1 node(16 cores, 64G).
  • parquet column: 100 million rows of char(255) column.
  • result: 9.09 s -> 6.04 s.

Further comments

If this is a relatively large or complex change, kick off the discussion at dev@doris.apache.org by explaining why you chose the solution you did and what alternatives you considered, etc...

…4 platforms: 2. Opt gzip decompression by libdeflate lib.
@kaka11chen
Copy link
Contributor Author

run buildall

@github-actions
Copy link
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.22 seconds
stream load tsv: 563 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 27 seconds loaded 2358488459 Bytes, about 83 MB/s
stream load orc: 72 seconds loaded 1101869774 Bytes, about 14 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17098163749 Bytes

@doris-robot
Copy link

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit b3e779667ca0e43019a865208133cfc64ac5be5f, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4934	4673	4688	4673
q2	371	181	158	158
q3	2058	1886	1984	1886
q4	1432	1270	1282	1270
q5	3986	3983	4037	3983
q6	264	139	135	135
q7	1401	853	859	853
q8	2844	2859	2814	2814
q9	9859	9917	9985	9917
q10	3465	3528	3558	3528
q11	389	258	234	234
q12	442	290	301	290
q13	4550	3812	3802	3802
q14	330	294	296	294
q15	596	539	523	523
q16	508	451	452	451
q17	1154	1014	947	947
q18	7953	7414	7496	7414
q19	1685	1664	1703	1664
q20	578	308	301	301
q21	4530	4120	4102	4102
q22	473	377	384	377
Total cold run time: 53802 ms
Total hot run time: 49616 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4613	4594	4577	4577
q2	353	219	250	219
q3	4028	4028	4023	4023
q4	2780	2724	2727	2724
q5	9645	9623	9750	9623
q6	252	129	125	125
q7	2977	2464	2459	2459
q8	4474	4481	4456	4456
q9	13234	13222	13171	13171
q10	4058	4126	4168	4126
q11	770	661	634	634
q12	982	817	809	809
q13	4293	3550	3594	3550
q14	377	343	362	343
q15	578	518	512	512
q16	587	571	571	571
q17	3909	3820	3843	3820
q18	9580	9074	9184	9074
q19	1832	1807	1786	1786
q20	2395	2086	2050	2050
q21	8857	8841	8754	8754
q22	897	821	814	814
Total cold run time: 81471 ms
Total hot run time: 78220 ms

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@morningman morningman merged commit f54db85 into apache:master Nov 28, 2023
@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 28, 2023
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

kaka11chen added a commit to kaka11chen/doris that referenced this pull request Nov 30, 2023
…4 platforms: 2. Opt gzip decompression by libdeflate lib. (apache#27669)

Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 2. Opt gzip decompression by libdeflate after adding libdeflate lib in apache#27542.
kaka11chen added a commit to kaka11chen/doris that referenced this pull request Nov 30, 2023
…4 platforms: 2. Opt gzip decompression by libdeflate lib. (apache#27669)

Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 2. Opt gzip decompression by libdeflate after adding libdeflate lib in apache#27542.
kaka11chen added a commit to kaka11chen/doris that referenced this pull request Nov 30, 2023
…4 platforms: 2. Opt gzip decompression by libdeflate lib. (apache#27669)

Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 2. Opt gzip decompression by libdeflate after adding libdeflate lib in apache#27542.
morningman pushed a commit that referenced this pull request Nov 30, 2023
…4 platforms: 2. Opt gzip decompression by libdeflate lib. (#27669) (#27801)

Backport from #27669.
eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Dec 3, 2023
…4 platforms: 2. Opt gzip decompression by libdeflate lib. (apache#27669) (apache#27801)

Backport from apache#27669.
eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Dec 3, 2023
…4 platforms: 2. Opt gzip decompression by libdeflate lib. (apache#27669) (apache#27801)

Backport from apache#27669.
eldenmoon added a commit that referenced this pull request Dec 4, 2023
* [chore](case) Use correct insert stmt for cold heat separation case #27546 (#27585)

Co-authored-by: AlexYue <yj976240184@gmail.com>

* [enhance](S3) Print the error detail for every s3 operation (#27572) (#27615)

* [nereids] fix stats error when using dateTime type filter #27571 (#27577)

* [fix](planner)sort node should materialized required slots for itself #27605 (#27620)

* [fix](Nereids) non-deterministic expression should not be constant (#27606) (#27631)

* [enhancement](stats) Add process for aggstate type #27640 (#27642)

* [Fix](statistics)Fix bug and improve auto analyze. (#27626) (#27657)

1. Implement needReAnalyzeTable for ExternalTable. For now, external table will not be reanalyzed in 10 days.
2. For HiveMetastoreCache.loadPartitions, handle the empty iterator case to avoid Index out of boundary exception.
3. Wrap handle show analyze loop with try catch, so that when one table failed (for example, catalog dropped so the table couldn't be found anymore), we can still show the other tables.
4. For now, only OlapTable and Hive HMSExternalTable support sample analyze, throw exception for other types of table.
5. In StatisticsCollector, call constructJob after createTableLevelTaskForExternalTable to avoid NPE.

* [profile](bugfix) should not cache profile content because the profile may not be a full profile (#27635)

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>

* [Enhance](fe) Support setting initial root password when FE firstly launch (#27438) (#27603)

* [opt](plan) only lock olap table when query plan #27639 (#27656)

bp #27639

* select coordinator node from user's tag when exec streaming load (#27106) (#27677)

* [fix](statistics)Need to recalculate health value when table row count become 0  #27673 (#27674)

backport #27673

* [fix](statistics)Fix sample min max npe bug  #27702 (#27707)

backport #27702

* [Bug](join) try fix wrong _has_null_in_build_side setted (#27684) (#27710)

* [Fix](show-load)Show load npe(userinfo is null) (#27698) (#27719)

* [pick](nereids)temporary partition is always pruned #27636 (#27722)

* [enhancement](stats) limit bq cap size for analyze task #27685 (#27687)

* [improvement](statistics) Add config for the threshold of column count for auto analyze #27713 (#27723)

* [doc](fix) k8s operator docs fix to 2.0 (#27476)

* [Improvement](planner)support select tablets with nereids optimize #23164 #23365 (#27740)

#23164
#23365

* [FIX](complextype)fix complex type hash equals (#27743)

* [fix](statistics) Fix show auto analyze missing jobs bug (#27761)

* [bugfix](topn) fix coredump in copy_column_data_to_block when nullable mismatch

return RuntimeError if copy_column_data_to_block nullable mismatch to avoid coredump in input_col_ptr->filter_by_selector(sel_rowid_idx, select_size, raw_res_ptr) .

The problem is reported by a doris user but I can not reproduce it, so there is no testcase added currently.

* [opt](stats) Use escape rather than base64 for min/max value #27746 (#27748)

* [refactor](http) disable snapshot and get_log_file api (#27724) (#27770)

* [branch-2.0](pick 27738) Warning log to trace send fragment #27738 (#27760)

* [branch-2.0](pick #27771) Add more detail msg for waitRPC exception (#27773)

* [Bug](pipeline) prevent PipelineFragmentContext destruct early (#27790)

* [deps](compression) Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 1. Add libdeflate lib.  (#27542) (#27711)

Backport from #27542.

* [FIX](case)fix case truncate table first #27792

* [doc](stats) add auto_analyze_table_width_threshold description. (#27818) (#27832)

* [fix](bdbje) Fix bdbje logging level not work (#27597) (#27788)

* `EnvironmentConfig.FILE_LOGGING_LEVEL` only set FileHandlerLevel, we should
   set logger level firstly, otherwise it will not take effect.

* [Opt](compression) Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 2. Opt gzip decompression by libdeflate lib. (#27669) (#27801)

Backport from #27669.

* [branch-2.0](fix) Fix broken exception message #27836

* [Bug](func) coredump in equal for null in function (#27843)

* [minor](stats) Update olap table row count after analyze (#27858)

pick from master #27814

* [fix](stats)min and max return NaN when table is empty (#27863)

fix analyze empty table and min/max null value bug:
1. Skip empty analyze task for sample analyze task. (Full analyze task already skipped).
2. Check sample rows is not 0 before calculate the scale factor.
3. Remove ' in sql template after remove base64 encoding for min/max value.

backport #27862

* [minor](stats) Throw error when sync analyze failed (#27846)

pick from master #27845

* [fix](stats) Don't save colToPartitions anymore to save mem (#27880)

pick from master #27879

* [fix](nereids) set operation's result type is wrong if decimal overflows (#27872)

pick from master #27870

* [Config] Modify the default value of tablet_schema_cache_recycle_interval (#27877)

* [fix](like_func) incorrect result of like with 'NO_BACKSLASH_ESCAPES' mode(#27842) (#27851)

* [fix](fe) Fix show frontends npt in some situations (#27295) (#27789)

```
java.lang.NullPointerException: null
    at com.sleepycat.je.rep.util.ReplicationGroupAdmin.getMasterSocket(ReplicationGroupAdmin.java:191)
    at com.sleepycat.je.rep.util.ReplicationGroupAdmin.doMessageExchange(ReplicationGroupAdmin.java:607)
    at com.sleepycat.je.rep.util.ReplicationGroupAdmin.getGroup(ReplicationGroupAdmin.java:406)
    at org.apache.doris.ha.BDBHA.getElectableNodes(BDBHA.java:132)
    at org.apache.doris.common.proc.FrontendsProcNode.getFrontendsInfo(FrontendsProcNode.java:84)
    at org.apache.doris.qe.ShowExecutor.handleShowFrontends(ShowExecutor.java:1923)
    at org.apache.doris.qe.ShowExecutor.execute(ShowExecutor.java:355)
    at org.apache.doris.qe.StmtExecutor.handleShow(StmtExecutor.java:2113)
    ...
```

* [branch-2.0](fix) Fix extremely high CPU usage caused by rf merge #27894 (#27895)

* [fix](stacktrace) ignore stacktrace for error code INVALID_ARGUMENT INVERTED_INDEX_NOT_IMPLEMENTED (#27898)

* ignore stacktrace for error INVALID_ARGUMENT INVERTED_INDEX_NOT_IMPLEMENTED

* AndBlockColumnPredicate::evaluate

* [opt](nereids) Branch-2.0: remove partition & histogram from col stats to reduce memory usage #27885 (#27896)

* [pick](Nereids) temporary partition is selected only if user manually specified: Branch-2.0 #27893 (#27905)

* [fix](multi-catalog)support the max compute partition prune (#27154) (#27902)

backport #27154

* [fix](Nereids) should not push down project to the nullable side of outer join #27912 (#27913)

* fix compile

---------

Co-authored-by: Dongyang Li <hello_stephen@qq.com>
Co-authored-by: AlexYue <yj976240184@gmail.com>
Co-authored-by: xzj7019 <131111794+xzj7019@users.noreply.github.com>
Co-authored-by: starocean999 <40539150+starocean999@users.noreply.github.com>
Co-authored-by: morrySnow <101034200+morrySnow@users.noreply.github.com>
Co-authored-by: AKIRA <33112463+Kikyou1997@users.noreply.github.com>
Co-authored-by: Jibing-Li <64681310+Jibing-Li@users.noreply.github.com>
Co-authored-by: yiguolei <676222867@qq.com>
Co-authored-by: yiguolei <yiguolei@gmail.com>
Co-authored-by: DuRipeng <453243496@qq.com>
Co-authored-by: Mingyu Chen <morningman@163.com>
Co-authored-by: wangbo <wangbo@apache.org>
Co-authored-by: Pxl <pxl290@qq.com>
Co-authored-by: Calvin Kirs <acm_master@163.com>
Co-authored-by: minghong <englefly@gmail.com>
Co-authored-by: catpineapple <42031973+catpineapple@users.noreply.github.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: Kang <kxiao.tiger@gmail.com>
Co-authored-by: zhiqiang <seuhezhiqiang@163.com>
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
Co-authored-by: Lei Zhang <27994433+SWJTU-ZhangLei@users.noreply.github.com>
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: Lightman <31928846+Lchangliang@users.noreply.github.com>
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com>
gnehil pushed a commit to gnehil/doris that referenced this pull request Dec 4, 2023
…4 platforms: 2. Opt gzip decompression by libdeflate lib. (apache#27669) (apache#27801)

Backport from apache#27669.
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
…4 platforms: 2. Opt gzip decompression by libdeflate lib. (apache#27669)

Opt gzip decompress by libdeflate on X86 and X86_64 platforms: 2. Opt gzip decompression by libdeflate after adding libdeflate lib in apache#27542.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/2.0.3-merged reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants