@xy720
Member

@xy720 xy720 commented Nov 22, 2023

Proposed changes

See #27409 for reproduction steps and more details.

==87116==ERROR: AddressSanitizer: heap-use-after-free on address 0x60200160a118 at pc 0x5560d96fe0f3 bp 0x7f47783b7f90 sp 0x7f47783b7f80
READ of size 2 at 0x60200160a118 thread T391 (FragmentMgrThre)
    #0 0x5560d96fe0f2 in inline_memcpy /data/doris-1.x/be/src/glibc-compatibility/memcpy/memcpy_x86_64.cpp:132
    #1 0x5560d96feea5 in memcpy /data/doris-1.x/be/src/glibc-compatibility/memcpy/memcpy_x86_64.cpp:219
    #2 0x5560ed490b67 in ra_overwrite (/usr/local/service/doris/lib/be/doris_be+0x25b72b67)
    #3 0x5560d9d077f5 in roaring::Roaring::Roaring(roaring::Roaring const&) /var/local/thirdparty/installed/include/roaring/roaring.hh:68

    ... (frames omitted, too much output)

    #21 0x5560d9d08b26 in phmap::btree_map<unsigned int, roaring::Roaring, phmap::Less<unsigned int>, std::allocator<std::pair<unsigned int const, roaring::Roaring> > >::operator=(phmap::btree_map<unsigned int, roaring::Roaring, phmap::Less<unsigned int>, std::allocator<std::pair<unsigned int const, roaring::Roaring> > > const&) /var/local/thirdparty/installed/include/parallel_hashmap/btree.h:3963
    #22 0x5560d9d08b50 in doris::detail::Roaring64Map::operator=(doris::detail::Roaring64Map const&) /data/doris-1.x/be/src/util/bitmap_value.h:140
    #23 0x5560d9d12665 in doris::BitmapValue::_prepare_bitmap_for_write() /data/doris-1.x/be/src/util/bitmap_value.h:1954
    #24 0x5560d9d0f258 in doris::BitmapValue::operator^=(doris::BitmapValue const&) /data/doris-1.x/be/src/util/bitmap_value.h:1466
    #25 0x5560e31dac6c in doris::vectorized::BitmapXor::vector_vector(COW<doris::vectorized::IColumn>::immutable_ptr<doris::vectorized::IColumn>*, unsigned long, unsigned long, std::vector<doris::BitmapValue, std::allocator<doris::BitmapValue> >&, doris::vectorized::IColumn*) /data/doris-1.x/be/src/vec/functions/function_bitmap_variadic.cpp:130
    #26 0x5560e31ed0a7 in doris::vectorized::FunctionBitMapVariadic<doris::vectorized::BitmapXor>::execute_impl_internal(doris_udf::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long) /data/doris-1.x/be/src/vec/functions/function_bitmap_variadic.cpp:230
    #27 0x5560e31e8a1c in doris::vectorized::FunctionBitMapVariadic<doris::vectorized::BitmapXor>::execute_impl(doris_udf::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long) /data/doris-1.x/be/src/vec/functions/function_bitmap_variadic.cpp:196
    #28 0x5560e25fdaf5 in doris::vectorized::DefaultExecutable::execute_impl(doris_udf::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long) /data/doris-1.x/be/src/vec/functions/function.h:484
    #29 0x5560e3f01c98 in doris::vectorized::PreparedFunctionImpl::execute_without_low_cardinality_columns(doris_udf::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long, bool) /data/doris-1.x/be/src/vec/functions/function.cpp:244
    #30 0x5560e3f01325 in doris::vectorized::PreparedFunctionImpl::default_implementation_for_nulls(doris_udf::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long, bool, bool*) /data/doris-1.x/be/src/vec/functions/function.cpp:214
    #31 0x5560e3f01a26 in doris::vectorized::PreparedFunctionImpl::execute_without_low_cardinality_columns(doris_udf::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long, bool) /data/doris-1.x/be/src/vec/functions/function.cpp:235
    #32 0x5560e3f01d98 in doris::vectorized::PreparedFunctionImpl::execute(doris_udf::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long, bool) /data/doris-1.x/be/src/vec/functions/function.cpp:266
    #33 0x5560e25fa730 in doris::vectorized::IFunctionBase::execute(doris_udf::FunctionContext*, doris::vectorized::Block&, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, unsigned long, bool) /data/doris-1.x/be/src/vec/functions/function.h:155
    #34 0x5560e2513fef in doris::vectorized::VectorizedFnCall::execute(doris::vectorized::VExprContext*, doris::vectorized::Block*, int*) /data/doris-1.x/be/src/vec/exprs/vectorized_fn_call.cpp:109
    #35 0x5560e25233be in doris::vectorized::VExprContext::execute(doris::vectorized::Block*, int*) /data/doris-1.x/be/src/vec/exprs/vexpr_context.cpp:46
    #36 0x5560df029a77 in doris::vectorized::VUnionNode::materialize_block(doris::vectorized::Block*, doris::vectorized::Block*) /data/doris-1.x/be/src/vec/exec/vunion_node.cpp:285
    #37 0x5560df025478 in doris::vectorized::VUnionNode::get_next_materialized(doris::RuntimeState*, doris::vectorized::Block*) /data/doris-1.x/be/src/vec/exec/vunion_node.cpp:157

...

0x60200160a118 is located 8 bytes inside of 11-byte region [0x60200160a110,0x60200160a11b)
freed by thread T373 (FragmentMgrThre) here:
    #0 0x5560d96b8a6f in free (/usr/local/service/doris/lib/be/doris_be+0x11d9aa6f)
    #1 0x5560ed490883 in ra_shrink_to_fit (/usr/local/service/doris/lib/be/doris_be+0x25b72883)

previously allocated by thread T373 (FragmentMgrThre) here:
    #0 0x5560d96b8dc7 in __interceptor_malloc (/usr/local/service/doris/lib/be/doris_be+0x11d9adc7)
    #1 0x5560ed490822 in ra_shrink_to_fit (/usr/local/service/doris/lib/be/doris_be+0x25b72822)

...
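
The report shows thread T391 copying a `roaring::Roaring` from `_prepare_bitmap_for_write` while thread T373 had already freed the same buffer in `ra_shrink_to_fit`. That pattern suggests storage shared between two bitmap objects was modified in place. Below is a minimal, single-threaded sketch of the copy-on-write rule that avoids this class of bug. It assumes copies share storage until the first write. `CowBitmap` and every member name are hypothetical, `std::set` stands in for the Roaring container, and this is not the actual change made in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <set>

// Hypothetical stand-in for a bitmap whose copies share storage
// (copy-on-write). Not Doris's BitmapValue; std::set replaces Roaring.
class CowBitmap {
public:
    CowBitmap() : data_(std::make_shared<std::set<std::uint32_t>>()) {}

    // Copying is cheap: both objects point at the same storage.
    CowBitmap(const CowBitmap&) = default;
    CowBitmap& operator=(const CowBitmap&) = default;

    void add(std::uint32_t v) {
        prepare_for_write();
        data_->insert(v);
    }

    CowBitmap& operator^=(const CowBitmap& rhs) {
        // Pin rhs's storage first so it stays valid even if rhs is *this.
        std::shared_ptr<const std::set<std::uint32_t>> other = rhs.data_;
        prepare_for_write();
        for (std::uint32_t v : *other) {
            // Symmetric difference: drop v if present, otherwise add it.
            if (data_->erase(v) == 0) data_->insert(v);
        }
        return *this;
    }

    bool contains(std::uint32_t v) const { return data_->count(v) != 0; }
    std::size_t size() const { return data_->size(); }
    bool shares_storage_with(const CowBitmap& o) const { return data_ == o.data_; }

private:
    // Detach from shared storage before any in-place change, including
    // "harmless" ones such as compaction or shrink-to-fit.
    void prepare_for_write() {
        if (data_.use_count() > 1) {
            data_ = std::make_shared<std::set<std::uint32_t>>(*data_);
        }
    }

    std::shared_ptr<std::set<std::uint32_t>> data_;
};
```

In this sketch every mutating path goes through `prepare_for_write()`, so a reader that still holds the old storage never sees it freed or resized. Note that `use_count()` is only an approximate hint when several threads touch the same object, so a real multi-threaded design would need additional synchronization or strictly immutable shared storage.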

yiguolei previously approved these changes Nov 22, 2023
Contributor

@yiguolei yiguolei left a comment

LGTM

@yiguolei yiguolei added the usercase (Important user case type label), p0_c, and dev/2.0.3 labels Nov 22, 2023
@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Nov 22, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

Contributor

@dataroaring dataroaring left a comment

Please add a regression test.

@github-actions github-actions bot removed the approved label (Indicates a PR has been approved by one committer.) Nov 22, 2023
@xy720 xy720 changed the title from "[Bug] (bitmap) Fix heap-use-after-free in the bitmap functions" to "[Bug](bitmap) Fix heap-use-after-free in the bitmap functions" Nov 22, 2023
@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@xy720
Member Author

xy720 commented Nov 23, 2023

run buildall

@xy720
Member Author

xy720 commented Nov 23, 2023

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

1 similar comment

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.56% (8449/23107)
Line Coverage: 28.86% (68672/237948)
Region Coverage: 27.83% (35526/127632)
Branch Coverage: 24.57% (18124/73764)
Coverage Report: http://coverage.selectdb-in.cc/coverage/ef3ddb8157b1c07dcddbe36bec1fb71be0be0e35_ef3ddb8157b1c07dcddbe36bec1fb71be0be0e35/report/index.html

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit ef3ddb8157b1c07dcddbe36bec1fb71be0be0e35, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4882	4632	4677	4632
q2	359	165	162	162
q3	2030	1953	1848	1848
q4	1390	1248	1268	1248
q5	3968	3954	3994	3954
q6	252	123	129	123
q7	1454	883	875	875
q8	2796	2805	2780	2780
q9	9923	9517	9584	9517
q10	3510	3539	3506	3506
q11	379	255	249	249
q12	436	291	288	288
q13	4576	3789	3791	3789
q14	320	292	300	292
q15	576	538	522	522
q16	665	588	590	588
q17	1149	967	952	952
q18	7884	7345	7347	7345
q19	1681	1690	1689	1689
q20	586	315	287	287
q21	4408	3979	4006	3979
q22	476	393	361	361
Total cold run time: 53700 ms
Total hot run time: 48986 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4622	4575	4579	4575
q2	340	233	253	233
q3	4026	3995	3985	3985
q4	2704	2694	2707	2694
q5	9666	9633	9625	9625
q6	249	122	121	121
q7	3015	2507	2482	2482
q8	4457	4493	4490	4490
q9	12975	12889	12971	12889
q10	4082	4184	4182	4182
q11	812	656	650	650
q12	981	812	807	807
q13	4286	3556	3598	3556
q14	374	335	349	335
q15	583	524	523	523
q16	741	676	688	676
q17	3844	3868	3954	3868
q18	9599	8997	9034	8997
q19	1846	1785	1819	1785
q20	2411	2063	2015	2015
q21	8790	8661	8668	8661
q22	917	811	789	789
Total cold run time: 81320 ms
Total hot run time: 77938 ms

@doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.18 seconds
stream load tsv: 573 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17098149543 Bytes

@xy720
Member Author

xy720 commented Nov 23, 2023

run buildall

@github-actions
Contributor

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.58% (8449/23097)
Line Coverage: 28.87% (68672/237903)
Region Coverage: 27.82% (35504/127600)
Branch Coverage: 24.56% (18114/73756)
Coverage Report: http://coverage.selectdb-in.cc/coverage/465981625b2904e8029ce0e370ad33672beec026_465981625b2904e8029ce0e370ad33672beec026/report/index.html

@doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.31 seconds
stream load tsv: 571 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17098774551 Bytes

@xy720
Member Author

xy720 commented Nov 23, 2023

run p0

@lide-reed lide-reed self-requested a review November 23, 2023 10:52
Contributor

@lide-reed lide-reed left a comment

LGTM

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 465981625b2904e8029ce0e370ad33672beec026, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4949	4706	4696	4696
q2	358	158	158	158
q3	2071	1913	1933	1913
q4	1396	1260	1256	1256
q5	3995	3938	4053	3938
q6	252	136	135	135
q7	1423	874	892	874
q8	2796	2830	2774	2774
q9	9694	9518	9436	9436
q10	3459	3539	3519	3519
q11	377	253	254	253
q12	437	289	291	289
q13	4567	3825	3802	3802
q14	326	290	292	290
q15	578	538	518	518
q16	662	581	587	581
q17	1147	959	924	924
q18	7929	7412	7504	7412
q19	1689	1692	1689	1689
q20	546	322	302	302
q21	4427	4023	4000	4000
q22	475	369	381	369
Total cold run time: 53553 ms
Total hot run time: 49128 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot (ms)
q1	4580	4585	4586	4585
q2	334	240	259	240
q3	4032	4010	3981	3981
q4	2710	2695	2716	2695
q5	9709	9583	9614	9583
q6	246	122	124	122
q7	3013	2496	2494	2494
q8	4489	4498	4464	4464
q9	12931	12858	12840	12840
q10	4077	4181	4181	4181
q11	752	664	637	637
q12	977	809	808	808
q13	4291	3588	3606	3588
q14	387	366	342	342
q15	578	527	527	527
q16	741	675	658	658
q17	3867	3919	3820	3820
q18	9629	9004	9098	9004
q19	1843	1783	1786	1783
q20	2387	2042	2026	2026
q21	8852	8599	8580	8580
q22	909	831	756	756
Total cold run time: 81334 ms
Total hot run time: 77714 ms

Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the approved label (Indicates a PR has been approved by one committer.) Nov 23, 2023
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@xy720
Member Author

xy720 commented Nov 23, 2023

run p0

5 similar comments

@xiaokang xiaokang merged commit 75c9f00 into apache:master Nov 24, 2023
eldenmoon pushed a commit to eldenmoon/incubator-doris that referenced this pull request Nov 27, 2023
eldenmoon added a commit that referenced this pull request Nov 27, 2023
* [fix](stats) Fix update rows for unique table didn't get updated properly #26968 (#27337)

* [FIX](jsonb) fix jsonb in predict column #27325 (#27424)

* [fix](fe) slots in having clause should be set to need materialized(#27412) (#27429)

* [Bug](insert)fix insert wrong data on mv when stmt have multiple values (#27297) (#27382)

fix insert wrong data on mv when stmt have multiple values

* [fix](fe ut) Fix OlapQueryCacheTest failed (#27305) (#27406)

1.
```
java.lang.NullPointerException: null
        at org.apache.doris.catalog.Env.getCurrentSystemInfo(Env.java:793) ~[classes/:?]
        at org.apache.doris.qe.SimpleScheduler$UpdateBlacklistThread.run(SimpleScheduler.java:206) ~[classes/:?]
        at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_382]

java.lang.NullPointerException
        at org.apache.doris.qe.OlapQueryCacheTest.setUp(OlapQueryCacheTest.java:226)
```

2.
```
[ERROR] testSqlCacheKeyWithNestedViewForNereids  Time elapsed: 1.962 s  <<< FAILURE!
java.lang.AssertionError: SELECT command denied to user 'testCluster:testUser'@'192.168.1.1' for table 'internal: testCluster:testDb: appevent'
	at org.apache.doris.qe.OlapQueryCacheTest.parseSqlByNereids(OlapQueryCacheTest.java:579)
	at org.apache.doris.qe.OlapQueryCacheTest.testSqlCacheKeyWithNestedViewForNereids(OlapQueryCacheTest.java:1338)
```

3.
```
[ERROR] Tests run: 28, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 113.63 s <<< FAILURE! - in org.apache.doris.qe.OlapQueryCacheTest
[ERROR] testCacheModeTable  Time elapsed: 1.657 s  <<< ERROR!
java.lang.IllegalArgumentException: Value of type org.apache.doris.qe.QueryState incompatible with return type org.apache.doris.system.SystemInfoService of org.apache.doris.catalog.Env#getCurrentSystemInfo()
        at org.apache.doris.qe.OlapQueryCacheTest.setUp(OlapQueryCacheTest.java:156)
```

* [regression test](schema change) add some schema change regression cases (#27112) (#27418)

* [fix](Nereids) result type of add precision is 1 more than expected (#27136) (#27426)

* [fix](Nereids): fill miss slot in having subquery (#27177) (#27394)

* [fix](memory) Fix make_top_consumption_snapshots heap-use-after-free #27434 (#27465)

* [fix](function) make TIMESTAMP function DEPEND_ON_ARGUMENT (#27343) (#27458)

* [fix](test) order by clause in test_map(#27390) (#27391)

pick #27390

* [performance](Planner): optimize getStringValue() in DateLiteral (#27363) (#27470)

- reduce cost of `getStringValue()`
- the original code doesn't consider the `microsecond` part in `getStringValue()`

(cherry picked from commit 044a295)

* [Chore](pick) do not push down agg on aggregate column (#27356) (#27498)

* [fix](stats) table not exists error msg not print objects name #27074 (#27463)

* [improve](nereids) support agg function of count(const value) pushdown #26677 (#27499)

Support SQL such as `select count(1)-count(not null) from table`, so that the count aggregate can be pushed down.

* [test](fe-ut) fix unstable MysqlServerTest (#27459)

Need to find an unbound port for MysqlServerTest.

* [opt](MergedIO) no need to merge large columns (#27315) (#27497)

1. Fix a profile bug of `MergeRangeFileReader`, and add a profile `ApplyBytes` to show the total bytes of ranges.
2. There's no need to merge large columns, because `MergeRangeFileReader` will increase the copy time.

* [improvement](drop tablet)  impr gc shutdown tablet lock (#26151) (#27478)

* [doc](stats) SQL manual for stats (#27461)

* [chore](merge-on-write) disable rowid conversion check for mow table by default (#27482) (#27508)

* [fix](regression)Fix hive p2 case (#27466) (#27511)

* [fix](statistics)Fix auto analyze remove finished job bug #27486 (#27510)

* [Bug](bitmap) Fix heap-use-after-free in the bitmap functions #27411 (#27521)

* [Pick](nereids) Pick: partition prune fails in case of NOT expression (#27047) (#27507)

* [fix](clone) Fix engine_clone file exist (#27361) (#27536)

* [chore](case) adjust timeout of broker load case #27540

* Fix auto analyze doesn't filter unsupported type bug. (#27547)

Fix auto analyze doesn't filter unsupported type bug.
Catch Throwable in the auto analyze thread for each database; otherwise the thread quits when one database fails to create jobs, and none of the other databases get analyzed.
Change the FE config item full_auto_analyze_simultaneously_running_task_num to auto_analyze_simultaneously_running_task_num.
backport #27559

* [chore](fe plugin) Upgrade dependency to doris 2.0-SNAPSHOT #27522 (#27558)

* [Bug](materialized-view) add limitation for duplicate expr on materialized view (#27523) (#27562)

* [fix](planner)join node should output required slot from parent node #27526 (#27551)

* [branch-2.0](hive) enable hive view by default (#27550)

* [pick](nereids) adjust bc join and shuffle join #27113 (#27566)

* [Fix](hive-transactional-table) Fix NPE when query empty hive transactional table. (#27567)

---------

Co-authored-by: AKIRA <33112463+Kikyou1997@users.noreply.github.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
Co-authored-by: Pxl <pxl290@qq.com>
Co-authored-by: Xinyi Zou <zouxinyi02@gmail.com>
Co-authored-by: Luwei <814383175@qq.com>
Co-authored-by: morrySnow <101034200+morrySnow@users.noreply.github.com>
Co-authored-by: 谢健 <jianxie0@gmail.com>
Co-authored-by: Mryange <59914473+Mryange@users.noreply.github.com>
Co-authored-by: jakevin <jakevingoo@gmail.com>
Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com>
Co-authored-by: Mingyu Chen <morningman@163.com>
Co-authored-by: Ashin Gau <AshinGau@users.noreply.github.com>
Co-authored-by: yujun <yu.jun.reach@gmail.com>
Co-authored-by: Xin Liao <liaoxinbit@126.com>
Co-authored-by: Jibing-Li <64681310+Jibing-Li@users.noreply.github.com>
Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>
Co-authored-by: minghong <englefly@gmail.com>
Co-authored-by: Jack Drogon <jack.xsuperman@gmail.com>
Co-authored-by: Dongyang Li <hello_stephen@qq.com>
Co-authored-by: zhiqiang <seuhezhiqiang@163.com>
Co-authored-by: starocean999 <40539150+starocean999@users.noreply.github.com>
Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
seawinde pushed a commit to seawinde/doris that referenced this pull request Nov 28, 2023
@xiaokang xiaokang mentioned this pull request Dec 4, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023

Labels

approved (Indicates a PR has been approved by one committer.), dev/1.2.8-merged, dev/2.0.3-merged, p0_c, reviewed, usercase (Important user case type label)

7 participants