@xinyiZzz
Contributor

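Proposed changes

Fix a heap-use-after-free in `MemTrackerLimiter::make_top_consumption_snapshots`, reported by AddressSanitizer: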

```
==3671003==ERROR: AddressSanitizer: heap-use-after-free on address 0x60f000d71e68 at pc 0x557a3c21c907 bp 0x7fcbc5ddfe30 sp 0x7fcbc5ddfe28
READ of size 4 at 0x60f000d71e68 thread T39 (memory_gc_threa)
    #0 0x557a3c21c906 in doris::MemTrackerLimiter::make_snapshot() const doris_branch-2.0/doris/be/src/runtime/memory/mem_tracker_limiter.cpp:114:33
    #1 0x557a3c21f0d5 in doris::MemTrackerLimiter::make_top_consumption_snapshots(std::vector<doris::MemTracker::Snapshot, std::allocator<doris::MemTracker::Snapshot>>*, int) doris_branch-2.0/doris/be/src/runtime/memory/mem_tracker_limiter.cpp:202:44
    #2 0x557a3c22258b in doris::MemTrackerLimiter::log_process_usage_str[abi:cxx11]() doris_branch-2.0/doris/be/src/runtime/memory/mem_tracker_limiter.cpp:261:5
    #3 0x557a3c223035 in doris::MemTrackerLimiter::print_log_process_usage() doris_branch-2.0/doris/be/src/runtime/memory/mem_tracker_limiter.cpp:284:25
    #4 0x557a39400209 in doris::Daemon::memory_gc_thread() doris_branch-2.0/doris/be/src/common/daemon.cpp:264:13
    #5 0x557a394081fb in doris::Daemon::start()::$_2::operator()() const doris_branch-2.0/doris/be/src/common/daemon.cpp:459:60
    #6 0x557a394081a6 in void std::__invoke_impl<void, doris::Daemon::start()::$_2&>(std::__invoke_other, doris::Daemon::start()::$_2&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61:14
    #7 0x557a39408128 in std::enable_if<is_invocable_r_v<void, doris::Daemon::start()::$_2&>, void>::type std::__invoke_r<void, doris::Daemon::start()::$_2&>(doris::Daemon::start()::$_2&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:111:2
    #8 0x557a39407f5e in std::_Function_handler<void (), doris::Daemon::start()::$_2>::_M_invoke(std::_Any_data const&) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291:9
    #9 0x557a394f89d6 in std::function<void ()>::operator()() const /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:560:9
    #10 0x557a3cbe998f in doris::Thread::supervise_thread(void*) doris_branch-2.0/doris/be/src/util/thread.cpp:498:5
    #11 0x7fcc8739b608 in start_thread /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:477:8
    #12 0x7fcc87648132 in __clone /build/glibc-SzIz7B/glibc-2.31/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95

0x60f000d71e68 is located 120 bytes inside of 168-byte region [0x60f000d71df0,0x60f000d71e98)
freed by thread T1213 (TaskWorkerPool.) here:
    #0 0x557a3930fd9d in operator delete(void*) (/mnt/hdd01/STRESS_ENV/be/lib/doris_be+0x191f0d9d) (BuildId: cd0297ed13795481)
    #1 0x557a397fc76e in __gnu_cxx::new_allocator<std::_Sp_counted_ptr_inplace<doris::MemTrackerLimiter, std::allocator<doris::MemTrackerLimiter>, (__gnu_cxx::_Lock_policy)2>>::deallocate(std::_Sp_counted_ptr_inplace<doris::MemTrackerLimiter, std::allocator<doris::MemTrackerLimiter>, (__gnu_cxx::_Lock_policy)2>*, unsigned long) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/ext/new_allocator.h:139:2
    #2 0x557a397fc730 in std::allocator<std::_Sp_counted_ptr_inplace<doris::MemTrackerLimiter, std::allocator<doris::MemTrackerLimiter>, (__gnu_cxx::_Lock_policy)2>>::deallocate(std::_Sp_counted_ptr_inplace<doris::MemTrackerLimiter, std::allocator<doris::MemTrackerLimiter>, (__gnu_cxx::_Lock_policy)2>*, unsigned long) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/allocator.h:187:27
    #3 0x557a397fc730 in std::allocator_traits<std::allocator<std::_Sp_counted_ptr_inplace<doris::MemTrackerLimiter, std::allocator<doris::MemTrackerLimiter>, (__gnu_cxx::_Lock_policy)2>>>::deallocate(std::allocator<std::_Sp_counted_ptr_inplace<doris::MemTrackerLimiter, std::allocator<doris::MemTrackerLimiter>, (__gnu_cxx::_Lock_policy)2>>&, std::_Sp_counted_ptr_inplace<doris::MemTrackerLimiter, std::allocator<doris::MemTrackerLimiter>, (__gnu_cxx::_Lock_policy)2>*, unsigned long) /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/alloc_traits.h:492:13
    #4 0x557a397fbe27 in std::__allocated_ptr<std::allocator<std::_Sp_counted_ptr_inplace<doris::MemTrackerLimiter, std::allocator<doris::MemTrackerLimiter>, (__gnu_cxx::_Lock_policy)2>>>::~__allocated_ptr() /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/allocated_ptr.h:73:4
    #5 0x557a397fc34e in std::_Sp_counted_ptr_inplace<doris::MemTrackerLimiter, std::allocator<doris::MemTrackerLimiter>, (__gnu_cxx::_Lock_policy)2>::_M_destroy() /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:538:7
    #6 0x557a3932d6ac in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:184:10
    #7 0x557a3932d149 in std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count() /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:702:11
    #8 0x557a3941727a in std::__shared_ptr<doris::MemTrackerLimiter, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr() /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:1149:31
    #9 0x557a39416dd4 in std::shared_ptr<doris::MemTrackerLimiter>::~shared_ptr() /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr.h:122:11
    #10 0x557a3b8808ba in doris::EngineAlterTabletTask::~EngineAlterTabletTask() doris_branch-2.0/doris/be/src/olap/task/engine_alter_tablet_task.h:37:47
    #11 0x557a3b85da83 in doris::AlterTableTaskPool::_alter_tablet(doris::TAgentTaskRequest const&, long, doris::TTaskType::type, doris::TFinishTaskRequest*) doris_branch-2.0/doris/be/src/agent/task_worker_pool.cpp:1768:5
```
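
Reading the trace: thread T39 (`memory_gc_thread`) calls `make_top_consumption_snapshots()`, which calls `make_snapshot()` on a `MemTrackerLimiter` whose 168-byte allocation (created via `make_shared`, per the `_Sp_counted_ptr_inplace` frames) was already freed by thread T1213 when `~EngineAlterTabletTask()` dropped the last `shared_ptr`. The following is a minimal sketch of one common way to close this kind of race, assuming a registry of non-owning raw pointers that a background thread walks. It is not the patch merged in this PR, and `Tracker`, `Registry`, `registry()`, and `top_snapshots()` are hypothetical names. The destructor unregisters under the same mutex that the snapshot traversal holds, so the traversal never reads a tracker after it has been freed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical, simplified snapshot of one tracker.
struct Snapshot {
    std::string label;
    int64_t consumption;
};

class Tracker;

// Hypothetical process-wide registry that a background (GC/logging) thread walks.
struct Registry {
    std::mutex mu;
    std::unordered_set<Tracker*> trackers;  // non-owning pointers
};

inline Registry& registry() {
    static Registry r;
    return r;
}

// Trackers are owned elsewhere (e.g. by a std::shared_ptr inside a task object);
// the registry only stores raw pointers to them.
class Tracker final {
public:
    Tracker(std::string label, int64_t consumption)
            : label_(std::move(label)), consumption_(consumption) {
        std::lock_guard<std::mutex> guard(registry().mu);
        registry().trackers.insert(this);
    }

    // Unregister under the registry lock as the first step of destruction.
    // A reader holding the lock either finishes before this runs or no longer
    // finds the pointer, so it never touches freed memory.
    ~Tracker() {
        std::lock_guard<std::mutex> guard(registry().mu);
        registry().trackers.erase(this);
    }

    Tracker(const Tracker&) = delete;
    Tracker& operator=(const Tracker&) = delete;

    Snapshot make_snapshot() const { return {label_, consumption_}; }

private:
    std::string label_;
    int64_t consumption_;
};

// Return up to `n` snapshots with the largest consumption. The lock is held for
// the whole traversal, including every make_snapshot() call; sorting happens
// afterwards on the copied values.
std::vector<Snapshot> top_snapshots(std::size_t n) {
    std::vector<Snapshot> out;
    {
        std::lock_guard<std::mutex> guard(registry().mu);
        for (const Tracker* t : registry().trackers) {
            out.push_back(t->make_snapshot());
        }
    }
    std::sort(out.begin(), out.end(), [](const Snapshot& a, const Snapshot& b) {
        return a.consumption > b.consumption;
    });
    if (out.size() > n) out.resize(n);
    return out;
}
```

For example, with trackers `query-a` (300) and `alter-tablet` (500) alive, `top_snapshots(2)` returns both, largest first; after the `shared_ptr` owning `alter-tablet` is reset, only `query-a` remains. The trade-off is that a destructor may briefly block while a traversal is in progress, so the work done under the lock should stay small.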

@xinyiZzz
Contributor Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.55% (8449/23115)
Line Coverage: 28.85% (68672/237990)
Region Coverage: 27.82% (35523/127673)
Branch Coverage: 24.56% (18118/73770)
Coverage Report: http://coverage.selectdb-in.cc/coverage/37b4ab4301a72ba20c82c8f41141a0896e8dfb92_37b4ab4301a72ba20c82c8f41141a0896e8dfb92/report/index.html

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 37b4ab4301a72ba20c82c8f41141a0896e8dfb92, data reload: false

run tpch-sf100 query with default conf and session variables
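
| Query | Cold run (ms) | Hot run 1 (ms) | Hot run 2 (ms) | Best hot run (ms) |
|---|---|---|---|---|
| q1 | 4925 | 4669 | 4633 | 4633 |
| q2 | 363 | 152 | 158 | 152 |
| q3 | 2047 | 1944 | 1875 | 1875 |
| q4 | 1377 | 1238 | 1237 | 1237 |
| q5 | 4007 | 3972 | 4055 | 3972 |
| q6 | 251 | 132 | 136 | 132 |
| q7 | 1409 | 874 | 911 | 874 |
| q8 | 2773 | 2778 | 2770 | 2770 |
| q9 | 9623 | 9681 | 9679 | 9679 |
| q10 | 3448 | 3530 | 3505 | 3505 |
| q11 | 383 | 249 | 245 | 245 |
| q12 | 435 | 287 | 303 | 287 |
| q13 | 4569 | 3813 | 3806 | 3806 |
| q14 | 308 | 297 | 287 | 287 |
| q15 | 589 | 534 | 531 | 531 |
| q16 | 668 | 582 | 592 | 582 |
| q17 | 1141 | 951 | 932 | 932 |
| q18 | 7934 | 7552 | 7488 | 7488 |
| q19 | 1680 | 1686 | 1682 | 1682 |
| q20 | 582 | 326 | 316 | 316 |
| q21 | 4434 | 4010 | 3999 | 3999 |
| q22 | 471 | 376 | 378 | 376 |
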
Total cold run time: 53417 ms
Total hot run time: 49360 ms
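
The two summary lines follow from the rows. Reading the four numeric columns as a cold run $c_q$, two hot runs $h_{q,1}$ and $h_{q,2}$, and a best hot run (which equals $\min(h_{q,1}, h_{q,2})$ in every row of this run), the totals are consistent with:

$$
T_{\text{cold}} = \sum_{q=1}^{22} c_q = 53417\ \text{ms},
\qquad
T_{\text{hot}} = \sum_{q=1}^{22} \min(h_{q,1},\, h_{q,2}) = 49360\ \text{ms}
$$

The same relation holds for the `runtime_filter_mode=off` run (81373 ms cold, 77785 ms hot). This column interpretation is inferred from the arithmetic; the bot output does not label its columns.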

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
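
| Query | Cold run (ms) | Hot run 1 (ms) | Hot run 2 (ms) | Best hot run (ms) |
|---|---|---|---|---|
| q1 | 4590 | 4558 | 4560 | 4558 |
| q2 | 324 | 236 | 252 | 236 |
| q3 | 4031 | 3995 | 4002 | 3995 |
| q4 | 2708 | 2706 | 2734 | 2706 |
| q5 | 9772 | 9720 | 9831 | 9720 |
| q6 | 246 | 125 | 126 | 125 |
| q7 | 3036 | 2515 | 2480 | 2480 |
| q8 | 4413 | 4429 | 4439 | 4429 |
| q9 | 12985 | 12881 | 12834 | 12834 |
| q10 | 4080 | 4160 | 4172 | 4160 |
| q11 | 787 | 669 | 710 | 669 |
| q12 | 978 | 804 | 811 | 804 |
| q13 | 4280 | 3589 | 3561 | 3561 |
| q14 | 375 | 340 | 356 | 340 |
| q15 | 585 | 522 | 525 | 522 |
| q16 | 743 | 689 | 675 | 675 |
| q17 | 3831 | 3908 | 3846 | 3846 |
| q18 | 9589 | 9188 | 8944 | 8944 |
| q19 | 1822 | 1786 | 1804 | 1786 |
| q20 | 2396 | 2085 | 2043 | 2043 |
| q21 | 8942 | 8696 | 8572 | 8572 |
| q22 | 860 | 812 | 780 | 780 |
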
Total cold run time: 81373 ms
Total hot run time: 77785 ms

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
sum of best hot run times: 43.46 seconds
stream load tsv: 566 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17098698998 Bytes

Contributor

@yiguolei yiguolei left a comment

LGTM

@github-actions github-actions bot added the approved label on Nov 23, 2023 (approved: indicates a PR has been approved by one committer)
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@xinyiZzz
Contributor Author

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.56% (8449/23107)
Line Coverage: 28.86% (68667/237949)
Region Coverage: 27.83% (35521/127635)
Branch Coverage: 24.56% (18118/73766)
Coverage Report: http://coverage.selectdb-in.cc/coverage/9701e240367e91b1a194a6c4d7df6ae38b8d5cf6_9701e240367e91b1a194a6c4d7df6ae38b8d5cf6/report/index.html

Contributor

@yiguolei yiguolei left a comment

LGTM

@doris-robot

(From new machine) TeamCity pipeline, ClickBench performance test result:
sum of best hot run times: 43.59 seconds
stream load tsv: 565 seconds loaded 74807831229 Bytes, about 126 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 64 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.6 seconds inserted 10000000 Rows, about 349K ops/s
storage size: 17098895418 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 9701e240367e91b1a194a6c4d7df6ae38b8d5cf6, data reload: false

run tpch-sf100 query with default conf and session variables
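
| Query | Cold run (ms) | Hot run 1 (ms) | Hot run 2 (ms) | Best hot run (ms) |
|---|---|---|---|---|
| q1 | 4928 | 4618 | 4600 | 4600 |
| q2 | 360 | 143 | 136 | 136 |
| q3 | 2015 | 1868 | 1920 | 1868 |
| q4 | 1375 | 1236 | 1227 | 1227 |
| q5 | 3968 | 3917 | 4029 | 3917 |
| q6 | 252 | 129 | 133 | 129 |
| q7 | 1422 | 884 | 891 | 884 |
| q8 | 2788 | 2803 | 2787 | 2787 |
| q9 | 9859 | 9392 | 9521 | 9392 |
| q10 | 3458 | 3520 | 3534 | 3520 |
| q11 | 386 | 258 | 256 | 256 |
| q12 | 441 | 288 | 293 | 288 |
| q13 | 4558 | 3840 | 3810 | 3810 |
| q14 | 316 | 275 | 291 | 275 |
| q15 | 580 | 529 | 525 | 525 |
| q16 | 658 | 586 | 578 | 578 |
| q17 | 1159 | 965 | 944 | 944 |
| q18 | 7829 | 7425 | 7413 | 7413 |
| q19 | 1677 | 1699 | 1699 | 1699 |
| q20 | 566 | 320 | 301 | 301 |
| q21 | 4469 | 3988 | 4024 | 3988 |
| q22 | 477 | 373 | 373 | 373 |
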
Total cold run time: 53541 ms
Total hot run time: 48910 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
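
| Query | Cold run (ms) | Hot run 1 (ms) | Hot run 2 (ms) | Best hot run (ms) |
|---|---|---|---|---|
| q1 | 4583 | 4609 | 4602 | 4602 |
| q2 | 349 | 228 | 252 | 228 |
| q3 | 4034 | 4014 | 4009 | 4009 |
| q4 | 2704 | 2700 | 2716 | 2700 |
| q5 | 9637 | 9549 | 9602 | 9549 |
| q6 | 250 | 124 | 125 | 124 |
| q7 | 3026 | 2499 | 2494 | 2494 |
| q8 | 4451 | 4451 | 4446 | 4446 |
| q9 | 12961 | 12822 | 12924 | 12822 |
| q10 | 4089 | 4175 | 4160 | 4160 |
| q11 | 772 | 638 | 707 | 638 |
| q12 | 972 | 812 | 818 | 812 |
| q13 | 4280 | 3562 | 3604 | 3562 |
| q14 | 386 | 355 | 347 | 347 |
| q15 | 577 | 518 | 523 | 518 |
| q16 | 753 | 668 | 676 | 668 |
| q17 | 3817 | 3858 | 3841 | 3841 |
| q18 | 9677 | 9203 | 9068 | 9068 |
| q19 | 1832 | 1784 | 1805 | 1784 |
| q20 | 2394 | 2067 | 2055 | 2055 |
| q21 | 8987 | 8527 | 8450 | 8450 |
| q22 | 931 | 801 | 784 | 784 |
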
Total cold run time: 81462 ms
Total hot run time: 77661 ms

@yiguolei yiguolei merged commit 4fc638c into apache:master Nov 23, 2023
xinyiZzz added a commit to xinyiZzz/incubator-doris that referenced this pull request Nov 23, 2023
eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Nov 27, 2023
eldenmoon added a commit that referenced this pull request Nov 27, 2023
* [fix](stats) Fix update rows for unique table didn't get updated properly #26968 (#27337)

* [FIX](jsonb) fix jsonb in predict column #27325 (#27424)

* [fix](fe) slots in having clause should be set to need materialized(#27412) (#27429)

* [Bug](insert)fix insert wrong data on mv when stmt have multiple values (#27297) (#27382)

Fix wrong data being inserted into the materialized view when the statement has multiple values.

* [fix](fe ut) Fix OlapQueryCacheTest failed (#27305) (#27406)

1.
```
java.lang.NullPointerException: null
        at org.apache.doris.catalog.Env.getCurrentSystemInfo(Env.java:793) ~[classes/:?]
        at org.apache.doris.qe.SimpleScheduler$UpdateBlacklistThread.run(SimpleScheduler.java:206) ~[classes/:?]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_382]

java.lang.NullPointerException
        at org.apache.doris.qe.OlapQueryCacheTest.setUp(OlapQueryCacheTest.java:226)
```

2.
```
[ERROR] testSqlCacheKeyWithNestedViewForNereids  Time elapsed: 1.962 s  <<< FAILURE!
java.lang.AssertionError: SELECT command denied to user 'testCluster:testUser'@'192.168.1.1' for table 'internal: testCluster:testDb: appevent'
	at org.apache.doris.qe.OlapQueryCacheTest.parseSqlByNereids(OlapQueryCacheTest.java:579)
	at org.apache.doris.qe.OlapQueryCacheTest.testSqlCacheKeyWithNestedViewForNereids(OlapQueryCacheTest.java:1338)
```

3.
```
[ERROR] Tests run: 28, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 113.63 s <<< FAILURE! - in org.apache.doris.qe.OlapQueryCacheTest
[ERROR] testCacheModeTable  Time elapsed: 1.657 s  <<< ERROR!
java.lang.IllegalArgumentException: Value of type org.apache.doris.qe.QueryState incompatible with return type org.apache.doris.system.SystemInfoService of org.apache.doris.catalog.Env#getCurrentSystemInfo()
        at org.apache.doris.qe.OlapQueryCacheTest.setUp(OlapQueryCacheTest.java:156)
```

* [regression test](schema change) add some schema change regression cases (#27112) (#27418)

* [fix](Nereids) result type of add precision is 1 more than expected (#27136) (#27426)

* [fix](Nereids): fill miss slot in having subquery (#27177) (#27394)

* [fix](memory) Fix make_top_consumption_snapshots heap-use-after-free #27434 (#27465)

* [fix](function) make TIMESTAMP function DEPEND_ON_ARGUMENT (#27343) (#27458)

* [fix](test) order by clause in test_map(#27390) (#27391)

pick #27390

* [performance](Planner): optimize getStringValue() in DateLiteral (#27363) (#27470)

- Reduce the cost of `getStringValue()`.
- The original code didn't consider the `microsecond` part in `getStringValue()`.

(cherry picked from commit 044a295)

* [Chore](pick) do not push down agg on aggregate column (#27356) (#27498)

* [fix](stats) table not exists error msg not print objects name #27074 (#27463)

* [improve](nereids) support agg function of count(const value) pushdown #26677 (#27499)

Support SQL such as `select count(1)-count(not null) from table`: the `count` aggregate can now be pushed down.

* [test](fe-ut) fix unstable MysqlServerTest (#27459)

MysqlServerTest needs to find an unbound port.

* [opt](MergedIO) no need to merge large columns (#27315) (#27497)

1. Fix a profile bug in `MergeRangeFileReader`, and add a profile item `ApplyBytes` that shows the total bytes of the ranges.
2. There's no need to merge large columns, because merging them in `MergeRangeFileReader` only increases the copy time.
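
Item 2 can be illustrated with a small sketch of range coalescing. It assumes sorted, non-overlapping byte ranges and two hypothetical thresholds, `large_range` and `max_gap`; the `Range` type and `merge_ranges()` are illustrative names, not `MergeRangeFileReader`'s API. Small ranges separated by small gaps are merged into one read, while a range larger than `large_range` is always read on its own, since merging it would only add buffer copying.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical half-open byte range [offset, offset + size).
struct Range {
    int64_t offset;
    int64_t size;
};

// Coalesce sorted, non-overlapping ranges into fewer reads.
// - A range larger than `large_range` is never merged; it becomes its own read.
// - Otherwise it extends the current merged read if the gap to it is at most
//   `max_gap` bytes, or starts a new merged read.
std::vector<Range> merge_ranges(const std::vector<Range>& ranges, int64_t large_range,
                                int64_t max_gap) {
    std::vector<Range> out;
    bool can_extend = false;  // true if out.back() is a small/merged read that may grow
    for (const Range& r : ranges) {
        if (r.size > large_range) {
            out.push_back(r);
            can_extend = false;
            continue;
        }
        if (can_extend) {
            Range& cur = out.back();
            int64_t end = cur.offset + cur.size;
            if (r.offset - end <= max_gap) {
                cur.size = r.offset + r.size - cur.offset;  // absorb gap and range
                continue;
            }
        }
        out.push_back(r);
        can_extend = true;
    }
    return out;
}
```

With `large_range = 4096` and `max_gap = 64`, the ranges `{0,100}`, `{120,50}`, `{200,10000}`, `{10300,40}`, `{10350,60}` become three reads: `{0,170}`, `{200,10000}`, and `{10300,110}`.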

* [improvement](drop tablet)  impr gc shutdown tablet lock (#26151) (#27478)

* [doc](stats) SQL manual for stats (#27461)

* [chore](merge-on-write) disable rowid conversion check for mow table by default (#27482) (#27508)

* [fix](regression)Fix hive p2 case (#27466) (#27511)

* [fix](statistics)Fix auto analyze remove finished job bug #27486 (#27510)

* [Bug](bitmap) Fix heap-use-after-free in the bitmap functions #27411 (#27521)

* [Pick](nereids) Pick: partition prune fails in case of NOT expression (#27047) (#27507)

* [fix](clone) Fix engine_clone file exist (#27361) (#27536)

* [chore](case) adjust timeout of broker load case #27540

* Fix auto analyze doesn't filter unsupported type bug. (#27547)

Fix a bug where auto analyze didn't filter out unsupported types.
Catch `Throwable` in the auto analyze thread for each database; otherwise, when one database fails to create jobs, the thread quits and none of the other databases get analyzed.
Rename the FE config item `full_auto_analyze_simultaneously_running_task_num` to `auto_analyze_simultaneously_running_task_num`.
backport #27559
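
The per-database error handling described above can be sketched as follows. The FE code is Java and catches `Throwable`; this C++ version only illustrates the same loop structure, and `analyze_all()` and its `create_jobs` callback are hypothetical names. Each iteration catches its own failure, records the database, and moves on, so one bad database no longer stops the whole thread.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Run `create_jobs` for every database. A failure in one database is caught and
// recorded, so the loop continues with the remaining databases instead of
// letting the exception escape and end the background thread.
std::vector<std::string> analyze_all(const std::vector<std::string>& databases,
                                     const std::function<void(const std::string&)>& create_jobs) {
    std::vector<std::string> failed;
    for (const std::string& db : databases) {
        try {
            create_jobs(db);
        } catch (...) {  // roughly analogous to catching Throwable in Java
            failed.push_back(db);
        }
    }
    return failed;
}
```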

* [chore](fe plugin) Upgrade dependency to doris 2.0-SNAPSHOT #27522 (#27558)

* [Bug](materialized-view) add limitation for duplicate expr on materialized view (#27523) (#27562)

* [fix](planner)join node should output required slot from parent node #27526 (#27551)

* [branch-2.0](hive) enable hive view by default (#27550)

* [pick](nereids) adjust bc join and shuffle join #27113 (#27566)

* [Fix](hive-transactional-table) Fix NPE when query empty hive transactional table. (#27567)

---------

Co-authored-by: AKIRA <33112463+Kikyou1997@users.noreply.github.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
Co-authored-by: Pxl <pxl290@qq.com>
Co-authored-by: Xinyi Zou <zouxinyi02@gmail.com>
Co-authored-by: Luwei <814383175@qq.com>
Co-authored-by: morrySnow <101034200+morrySnow@users.noreply.github.com>
Co-authored-by: 谢健 <jianxie0@gmail.com>
Co-authored-by: Mryange <59914473+Mryange@users.noreply.github.com>
Co-authored-by: jakevin <jakevingoo@gmail.com>
Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
Co-authored-by: Mingyu Chen <morningman@163.com>
Co-authored-by: Ashin Gau <AshinGau@users.noreply.github.com>
Co-authored-by: yujun <yu.jun.reach@gmail.com>
Co-authored-by: Xin Liao <liaoxinbit@126.com>
Co-authored-by: Jibing-Li <64681310+Jibing-Li@users.noreply.github.com>
Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>
Co-authored-by: minghong <englefly@gmail.com>
Co-authored-by: Jack Drogon <jack.xsuperman@gmail.com>
Co-authored-by: Dongyang Li <hello_stephen@qq.com>
Co-authored-by: zhiqiang <seuhezhiqiang@163.com>
Co-authored-by: starocean999 <40539150+starocean999@users.noreply.github.com>
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>

Labels

approved, dev/2.0.3-merged, p0_c, reviewed
