Skip to content

Conversation

@mrhhsg
Copy link
Member

@mrhhsg mrhhsg commented May 17, 2025

What problem does this PR solve?

Introduced by #50720

The build block maybe be clear_column_mem_not_keep in build phase when the operator is closed.

Status HashJoinBuildSinkLocalState::close(RuntimeState* state, Status exec_status) {
    if (_closed) {
        return Status::OK();
    }
    auto& p = _parent->cast<HashJoinBuildSinkOperatorX>();
    Defer defer {[&]() {
        if (!_should_build_hash_table) {
            return;
        }
        // The build side hash key column maybe no need output, but we need to keep the column in block
        // because it is used to compare with probe side hash key column

        if (p._should_keep_hash_key_column && _build_col_ids.size() == 1) {
            p._should_keep_column_flags[_build_col_ids[0]] = true;
        }

        if (_shared_state->build_block) {
            // release the memory of unused column in probe stage
            _shared_state->build_block->clear_column_mem_not_keep(p._should_keep_column_flags,
                                                                  p._use_shared_hash_table);
        }

        if (p._use_shared_hash_table) {
            std::unique_lock lock(p._mutex);
            p._signaled = true;
            for (auto& dep : _shared_state->sink_deps) {
                dep->set_ready();
            }
            for (auto& dep : p._finish_dependencies) {
                dep->set_ready();
            }
        }
    }};
*** Aborted at 1747343165 (unix time) try "date -d @1747343165" if you are using GNU date ***
*** Current BE git commitID: e7a3e78b97 ***
*** SIGSEGV address not mapped to object (@0x1) received by PID 7474 (TID 9641 OR 0x7f3f8c0e5640) from PID 1; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris/be/src/common/signal_handler.h:421
 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 3# 0x00007F4368F76520 in /lib/x86_64-linux-gnu/libc.so.6
 4# doris::Status doris::pipeline::ProcessHashTableProbe<7>::finish_probing > > >(doris::vectorized::MethodKeysFixed > >&, doris::vectorized::MutableBlock&, doris::vectorized::Block*, bool*, bool) at /root/doris/be/src/pipeline/exec/join/process_hash_table_probe_impl.h:738
 5# std::__detail::__variant::__gen_vtable_impl (*)(doris::pipeline::HashJoinProbeOperatorX::pull(doris::RuntimeState*, doris::vectorized::Block*, bool*) const::$_1&&, std::variant > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodStringNoCache > > >&, std::variant, doris::pipeline::ProcessHashTableProbe<2>, doris::pipeline::ProcessHashTableProbe<8>, doris::pipeline::ProcessHashTableProbe<1>, doris::pipeline::ProcessHashTableProbe<4>, doris::pipeline::ProcessHashTableProbe<3>, doris::pipeline::ProcessHashTableProbe<7>, doris::pipeline::ProcessHashTableProbe<9>, doris::pipeline::ProcessHashTableProbe<10>, doris::pipeline::ProcessHashTableProbe<11> >&)>, std::integer_sequence >::__visit_invoke(doris::pipeline::HashJoinProbeOperatorX::pull(doris::RuntimeState*, doris::vectorized::Block*, bool*) const::$_1&&, std::variant > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodStringNoCache > > >&, std::variant, doris::pipeline::ProcessHashTableProbe<2>, doris::pipeline::ProcessHashTableProbe<8>, doris::pipeline::ProcessHashTableProbe<1>, doris::pipeline::ProcessHashTableProbe<4>, doris::pipeline::ProcessHashTableProbe<3>, doris::pipeline::ProcessHashTableProbe<7>, doris::pipeline::ProcessHashTableProbe<9>, doris::pipeline::ProcessHashTableProbe<10>, doris::pipeline::ProcessHashTableProbe<11> >&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/variant:1013
 6# doris::pipeline::HashJoinProbeOperatorX::pull(doris::RuntimeState*, doris::vectorized::Block*, bool*) const at /root/doris/be/src/pipeline/exec/hashjoin_probe_operator.cpp:281
 7# doris::pipeline::StatefulOperatorX::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/doris/be/src/pipeline/exec/operator.cpp:670
 8# doris::pipeline::OperatorXBase::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/doris/be/src/pipeline/exec/operator.cpp:381
 9# doris::pipeline::PipelineTask::execute(bool*) in /mnt/hdd01/ci/doris-deploy-master-local/be/lib/doris_be
10# doris::pipeline::TaskScheduler::_do_work(int) at /root/doris/be/src/pipeline/task_scheduler.cpp:144
11# doris::ThreadPool::dispatch_thread() at /root/doris/be/src/util/threadpool.cpp:622
12# doris::Thread::supervise_thread(void*) at /root/doris/be/src/util/thread.cpp:469
13# start_thread at ./nptl/pthread_create.c:442
14# 0x00007F436905A850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Copy link
Contributor

Thearas commented May 17, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@mrhhsg
Copy link
Member Author

mrhhsg commented May 17, 2025

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 34003 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 64accfc94a58c2931a04941c1ab4266c1e3b9391, data reload: false

------ Round 1 ----------------------------------
q1	26431	5250	5076	5076
q2	2085	303	189	189
q3	10357	1239	700	700
q4	10232	1006	521	521
q5	7591	2423	2364	2364
q6	183	166	135	135
q7	932	768	632	632
q8	9326	1307	1102	1102
q9	6804	5088	5104	5088
q10	6823	2324	1895	1895
q11	498	291	276	276
q12	348	353	210	210
q13	17783	3701	3155	3155
q14	239	238	210	210
q15	523	478	493	478
q16	434	443	376	376
q17	584	858	359	359
q18	7743	7285	7151	7151
q19	1357	945	538	538
q20	347	345	229	229
q21	4114	2658	2338	2338
q22	1065	1032	981	981
Total cold run time: 115799 ms
Total hot run time: 34003 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5179	5086	5116	5086
q2	247	325	233	233
q3	2159	2676	2265	2265
q4	1345	1759	1353	1353
q5	4531	4504	4421	4421
q6	217	172	131	131
q7	2008	1937	1759	1759
q8	2627	2646	2575	2575
q9	7266	7098	7152	7098
q10	3013	3171	2776	2776
q11	583	507	511	507
q12	716	771	601	601
q13	3459	3889	3284	3284
q14	287	303	285	285
q15	525	479	471	471
q16	447	490	448	448
q17	1157	1516	1389	1389
q18	7801	7510	7551	7510
q19	850	854	883	854
q20	1956	2050	1885	1885
q21	4795	4353	4355	4353
q22	1041	1037	1002	1002
Total cold run time: 52209 ms
Total hot run time: 50286 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 186945 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 64accfc94a58c2931a04941c1ab4266c1e3b9391, data reload: false

query1	1027	481	495	481
query2	6566	1828	1819	1819
query3	6761	228	231	228
query4	26671	23400	23391	23391
query5	4371	625	487	487
query6	304	209	191	191
query7	4623	490	301	301
query8	289	244	228	228
query9	8618	2716	2722	2716
query10	468	337	292	292
query11	15452	15097	14850	14850
query12	165	112	111	111
query13	1671	562	446	446
query14	8804	6285	6259	6259
query15	206	194	170	170
query16	7130	638	472	472
query17	961	714	605	605
query18	1985	409	311	311
query19	194	201	168	168
query20	117	116	133	116
query21	215	131	107	107
query22	4093	4266	4013	4013
query23	33920	33024	33176	33024
query24	8470	2453	2362	2362
query25	548	460	405	405
query26	1241	269	154	154
query27	2768	514	351	351
query28	4382	2188	2164	2164
query29	758	615	438	438
query30	282	213	191	191
query31	938	853	771	771
query32	73	67	63	63
query33	569	367	320	320
query34	797	852	532	532
query35	785	820	726	726
query36	956	990	879	879
query37	109	100	79	79
query38	4072	4196	4130	4130
query39	1501	1427	1409	1409
query40	215	123	108	108
query41	56	55	54	54
query42	127	115	111	111
query43	521	508	478	478
query44	1372	867	867	867
query45	181	172	167	167
query46	851	1030	658	658
query47	1781	1798	1703	1703
query48	391	424	321	321
query49	773	502	428	428
query50	639	683	422	422
query51	4144	4152	4105	4105
query52	118	107	108	107
query53	240	256	187	187
query54	610	575	538	538
query55	90	89	91	89
query56	346	328	290	290
query57	1127	1155	1079	1079
query58	279	262	264	262
query59	2574	2666	2525	2525
query60	327	329	327	327
query61	128	126	138	126
query62	776	740	684	684
query63	226	188	190	188
query64	4392	1031	679	679
query65	4298	4271	4272	4271
query66	1144	418	355	355
query67	15952	15509	15613	15509
query68	8089	905	535	535
query69	477	314	282	282
query70	1230	1154	1130	1130
query71	468	345	311	311
query72	5613	4696	5029	4696
query73	740	614	365	365
query74	8954	9199	8994	8994
query75	3863	3212	2682	2682
query76	3715	1199	732	732
query77	796	395	294	294
query78	10113	10209	9273	9273
query79	2336	828	587	587
query80	655	525	449	449
query81	465	257	220	220
query82	426	133	95	95
query83	287	250	233	233
query84	297	111	89	89
query85	793	357	309	309
query86	336	320	277	277
query87	4352	4464	4345	4345
query88	3496	2413	2411	2411
query89	396	318	289	289
query90	1938	217	211	211
query91	146	150	113	113
query92	66	62	56	56
query93	1428	960	584	584
query94	674	422	307	307
query95	377	294	282	282
query96	510	582	295	295
query97	2714	2767	2645	2645
query98	244	204	207	204
query99	1445	1389	1252	1252
Total cold run time: 273977 ms
Total hot run time: 186945 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 29.7 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 64accfc94a58c2931a04941c1ab4266c1e3b9391, data reload: false

query1	0.04	0.03	0.04
query2	0.12	0.10	0.12
query3	0.25	0.20	0.19
query4	1.59	0.20	0.19
query5	0.47	0.42	0.44
query6	1.19	0.67	0.66
query7	0.02	0.02	0.02
query8	0.05	0.03	0.04
query9	0.60	0.51	0.51
query10	0.57	0.59	0.56
query11	0.16	0.11	0.11
query12	0.15	0.12	0.11
query13	0.61	0.60	0.59
query14	0.78	0.81	0.80
query15	0.88	0.83	0.87
query16	0.39	0.37	0.38
query17	1.04	1.04	1.07
query18	0.22	0.21	0.21
query19	1.92	1.82	1.84
query20	0.01	0.01	0.01
query21	15.41	0.90	0.54
query22	0.74	1.26	0.73
query23	14.80	1.38	0.61
query24	6.74	1.70	1.28
query25	0.47	0.16	0.12
query26	0.56	0.16	0.14
query27	0.06	0.06	0.05
query28	10.14	0.87	0.45
query29	12.53	4.10	3.43
query30	0.25	0.09	0.06
query31	2.84	0.61	0.38
query32	3.23	0.56	0.46
query33	3.04	3.07	3.11
query34	15.75	5.15	4.48
query35	4.49	4.53	4.50
query36	0.69	0.49	0.48
query37	0.08	0.06	0.06
query38	0.06	0.04	0.04
query39	0.03	0.03	0.02
query40	0.18	0.13	0.13
query41	0.08	0.03	0.02
query42	0.05	0.03	0.02
query43	0.03	0.03	0.03
Total cold run time: 103.31 s
Total hot run time: 29.7 s

@doris-robot
Copy link

BE UT Coverage Report

Increment line coverage 100.00% (3/3) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 55.87% (14883/26641)
Line Coverage 44.64% (131872/295388)
Region Coverage 43.75% (66394/151760)
Branch Coverage 38.34% (34015/88714)

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (3/3) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 60.27% (15808/26228)
Line Coverage 49.74% (146942/295399)
Region Coverage 47.03% (83855/178316)
Branch Coverage 40.60% (41144/101346)

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (3/3) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.43% (20834/26228)
Line Coverage 72.60% (214460/295399)
Region Coverage 70.76% (126182/178316)
Branch Coverage 64.48% (65351/101346)

@BiteTheDDDDt
Copy link
Contributor

can we add some test case for this coredump stack?

@mrhhsg mrhhsg changed the title [fix](join) The clear_column_data function may unintentionally clear columns shared by others [fix](join) Should not use the build block's size to resize mark_join_flags May 19, 2025
@mrhhsg
Copy link
Member Author

mrhhsg commented May 19, 2025

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 34106 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a85027f88e65f88ab9527a03883c8764889fbe9b, data reload: false

------ Round 1 ----------------------------------
q1	26507	5104	5024	5024
q2	2086	289	194	194
q3	10551	1268	708	708
q4	10268	1003	558	558
q5	8019	2359	2357	2357
q6	185	165	132	132
q7	927	757	611	611
q8	9314	1295	1124	1124
q9	6771	5069	5152	5069
q10	6812	2329	1906	1906
q11	485	293	278	278
q12	358	350	216	216
q13	17773	3682	3108	3108
q14	232	231	222	222
q15	549	494	477	477
q16	441	429	372	372
q17	640	877	372	372
q18	7557	7377	7251	7251
q19	1218	959	557	557
q20	351	344	229	229
q21	3999	3257	2371	2371
q22	1021	1016	970	970
Total cold run time: 116064 ms
Total hot run time: 34106 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5081	5033	5058	5033
q2	243	337	232	232
q3	2234	2647	2323	2323
q4	1409	1795	1427	1427
q5	4529	4448	4343	4343
q6	209	172	124	124
q7	1981	1879	1697	1697
q8	2544	2477	2536	2477
q9	7142	7274	7147	7147
q10	3046	3169	2759	2759
q11	583	497	486	486
q12	707	787	625	625
q13	3495	3874	3361	3361
q14	279	286	263	263
q15	517	483	462	462
q16	444	483	453	453
q17	1170	1598	1346	1346
q18	7728	7507	7413	7413
q19	825	839	830	830
q20	2025	2022	1832	1832
q21	4729	4259	4299	4259
q22	1129	1038	966	966
Total cold run time: 52049 ms
Total hot run time: 49858 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 185641 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a85027f88e65f88ab9527a03883c8764889fbe9b, data reload: false

query1	1015	489	506	489
query2	6565	1778	1847	1778
query3	6760	243	225	225
query4	26080	23611	23067	23067
query5	4312	593	457	457
query6	297	199	196	196
query7	4615	500	286	286
query8	278	239	234	234
query9	8589	2635	2641	2635
query10	471	327	286	286
query11	15794	15065	14795	14795
query12	161	119	107	107
query13	1658	542	433	433
query14	9261	6125	6230	6125
query15	214	198	167	167
query16	7216	655	476	476
query17	1172	700	562	562
query18	1965	397	293	293
query19	194	183	162	162
query20	120	118	118	118
query21	215	121	102	102
query22	4253	4087	4084	4084
query23	33922	33011	33108	33011
query24	8460	2421	2382	2382
query25	556	453	405	405
query26	1230	275	163	163
query27	2735	518	345	345
query28	4341	2141	2116	2116
query29	805	572	462	462
query30	283	222	194	194
query31	949	858	766	766
query32	80	67	69	67
query33	595	359	322	322
query34	834	871	525	525
query35	769	815	753	753
query36	953	984	894	894
query37	119	99	79	79
query38	4133	4193	4188	4188
query39	1492	1402	1419	1402
query40	223	133	113	113
query41	62	58	61	58
query42	131	113	113	113
query43	499	507	519	507
query44	1317	851	839	839
query45	174	176	167	167
query46	838	1024	639	639
query47	1750	1836	1720	1720
query48	398	459	312	312
query49	773	503	428	428
query50	638	675	419	419
query51	4098	4083	4042	4042
query52	115	109	99	99
query53	231	251	186	186
query54	586	564	512	512
query55	87	88	84	84
query56	322	338	314	314
query57	1143	1156	1097	1097
query58	266	254	254	254
query59	2518	2594	2560	2560
query60	321	325	322	322
query61	125	125	122	122
query62	786	733	672	672
query63	220	191	191	191
query64	4409	996	660	660
query65	4286	4225	4266	4225
query66	1125	408	310	310
query67	15843	15533	15324	15324
query68	8378	896	542	542
query69	468	314	263	263
query70	1190	1115	1136	1115
query71	465	336	302	302
query72	5587	4734	4744	4734
query73	723	640	363	363
query74	9050	9085	8651	8651
query75	3874	3192	2677	2677
query76	3676	1193	753	753
query77	786	395	287	287
query78	10142	10299	9343	9343
query79	2102	801	576	576
query80	570	518	472	472
query81	500	251	222	222
query82	460	125	98	98
query83	256	296	231	231
query84	240	106	89	89
query85	790	361	306	306
query86	394	281	310	281
query87	4507	4407	4270	4270
query88	3614	2393	2358	2358
query89	389	315	279	279
query90	1869	214	203	203
query91	144	209	114	114
query92	69	65	59	59
query93	1606	950	577	577
query94	675	415	311	311
query95	380	291	275	275
query96	518	574	297	297
query97	2710	2786	2646	2646
query98	244	209	207	207
query99	1417	1399	1300	1300
Total cold run time: 274538 ms
Total hot run time: 185641 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 29.43 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a85027f88e65f88ab9527a03883c8764889fbe9b, data reload: false

query1	0.05	0.03	0.03
query2	0.13	0.11	0.10
query3	0.25	0.20	0.19
query4	1.59	0.19	0.11
query5	0.43	0.41	0.43
query6	1.17	0.66	0.66
query7	0.02	0.02	0.02
query8	0.04	0.04	0.04
query9	0.59	0.53	0.52
query10	0.58	0.59	0.56
query11	0.16	0.11	0.10
query12	0.14	0.11	0.12
query13	0.62	0.61	0.60
query14	0.78	0.80	0.81
query15	0.89	0.85	0.86
query16	0.39	0.42	0.39
query17	1.05	1.01	1.08
query18	0.22	0.21	0.21
query19	1.91	1.81	1.83
query20	0.02	0.01	0.02
query21	15.41	0.90	0.55
query22	0.75	1.07	0.64
query23	15.08	1.38	0.63
query24	7.69	1.12	1.24
query25	0.52	0.19	0.19
query26	0.66	0.17	0.13
query27	0.06	0.05	0.04
query28	10.31	0.90	0.46
query29	12.52	3.98	3.36
query30	0.25	0.09	0.08
query31	2.82	0.60	0.39
query32	3.23	0.55	0.47
query33	3.07	3.06	3.05
query34	15.63	5.07	4.50
query35	4.49	4.51	4.49
query36	0.66	0.50	0.48
query37	0.08	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.03	0.02
query40	0.17	0.14	0.13
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.03	0.03	0.04
Total cold run time: 104.65 s
Total hot run time: 29.43 s

@hello-stephen
Copy link
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (9/9) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 55.95% (14925/26675)
Line Coverage 44.77% (132401/295752)
Region Coverage 43.87% (66617/151867)
Branch Coverage 38.45% (34128/88768)

@hello-stephen
Copy link
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (9/9) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 79.33% (20831/26259)
Line Coverage 72.58% (214662/295749)
Region Coverage 70.78% (126289/178418)
Branch Coverage 64.52% (65420/101400)

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label May 20, 2025
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@mrhhsg mrhhsg merged commit 81d77fb into apache:master May 20, 2025
27 of 29 checks passed
@mrhhsg mrhhsg deleted the fix_join_core branch May 20, 2025 09:35
mrhhsg added a commit to mrhhsg/doris that referenced this pull request May 20, 2025
…_flags (apache#50993)

Introduced by apache#51050

The build block maybe be `clear_column_mem_not_keep` in build phase when
the operator is closed.

```cpp
Status HashJoinBuildSinkLocalState::close(RuntimeState* state, Status exec_status) {
    if (_closed) {
        return Status::OK();
    }
    auto& p = _parent->cast<HashJoinBuildSinkOperatorX>();
    Defer defer {[&]() {
        if (!_should_build_hash_table) {
            return;
        }
        // The build side hash key column maybe no need output, but we need to keep the column in block
        // because it is used to compare with probe side hash key column

        if (p._should_keep_hash_key_column && _build_col_ids.size() == 1) {
            p._should_keep_column_flags[_build_col_ids[0]] = true;
        }

        if (_shared_state->build_block) {
            // release the memory of unused column in probe stage
            _shared_state->build_block->clear_column_mem_not_keep(p._should_keep_column_flags,
                                                                  p._use_shared_hash_table);
        }

        if (p._use_shared_hash_table) {
            std::unique_lock lock(p._mutex);
            p._signaled = true;
            for (auto& dep : _shared_state->sink_deps) {
                dep->set_ready();
            }
            for (auto& dep : p._finish_dependencies) {
                dep->set_ready();
            }
        }
    }};
```

```
*** Aborted at 1747343165 (unix time) try "date -d @1747343165" if you are using GNU date ***
*** Current BE git commitID: e7a3e78 ***
*** SIGSEGV address not mapped to object (@0x1) received by PID 7474 (TID 9641 OR 0x7f3f8c0e5640) from PID 1; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris/be/src/common/signal_handler.h:421
 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 3# 0x00007F4368F76520 in /lib/x86_64-linux-gnu/libc.so.6
 4# doris::Status doris::pipeline::ProcessHashTableProbe<7>::finish_probing > > >(doris::vectorized::MethodKeysFixed > >&, doris::vectorized::MutableBlock&, doris::vectorized::Block*, bool*, bool) at /root/doris/be/src/pipeline/exec/join/process_hash_table_probe_impl.h:738
 5# std::__detail::__variant::__gen_vtable_impl (*)(doris::pipeline::HashJoinProbeOperatorX::pull(doris::RuntimeState*, doris::vectorized::Block*, bool*) const::$_1&&, std::variant > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodStringNoCache > > >&, std::variant, doris::pipeline::ProcessHashTableProbe<2>, doris::pipeline::ProcessHashTableProbe<8>, doris::pipeline::ProcessHashTableProbe<1>, doris::pipeline::ProcessHashTableProbe<4>, doris::pipeline::ProcessHashTableProbe<3>, doris::pipeline::ProcessHashTableProbe<7>, doris::pipeline::ProcessHashTableProbe<9>, doris::pipeline::ProcessHashTableProbe<10>, doris::pipeline::ProcessHashTableProbe<11> >&)>, std::integer_sequence >::__visit_invoke(doris::pipeline::HashJoinProbeOperatorX::pull(doris::RuntimeState*, doris::vectorized::Block*, bool*) const::$_1&&, std::variant > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodStringNoCache > > >&, std::variant, doris::pipeline::ProcessHashTableProbe<2>, doris::pipeline::ProcessHashTableProbe<8>, doris::pipeline::ProcessHashTableProbe<1>, doris::pipeline::ProcessHashTableProbe<4>, doris::pipeline::ProcessHashTableProbe<3>, doris::pipeline::ProcessHashTableProbe<7>, doris::pipeline::ProcessHashTableProbe<9>, doris::pipeline::ProcessHashTableProbe<10>, doris::pipeline::ProcessHashTableProbe<11> >&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/variant:1013
 6# doris::pipeline::HashJoinProbeOperatorX::pull(doris::RuntimeState*, doris::vectorized::Block*, bool*) const at /root/doris/be/src/pipeline/exec/hashjoin_probe_operator.cpp:281
 7# doris::pipeline::StatefulOperatorX::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/doris/be/src/pipeline/exec/operator.cpp:670
 8# doris::pipeline::OperatorXBase::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/doris/be/src/pipeline/exec/operator.cpp:381
 9# doris::pipeline::PipelineTask::execute(bool*) in /mnt/hdd01/ci/doris-deploy-master-local/be/lib/doris_be
10# doris::pipeline::TaskScheduler::_do_work(int) at /root/doris/be/src/pipeline/task_scheduler.cpp:144
11# doris::ThreadPool::dispatch_thread() at /root/doris/be/src/util/threadpool.cpp:622
12# doris::Thread::supervise_thread(void*) at /root/doris/be/src/util/thread.cpp:469
13# start_thread at ./nptl/pthread_create.c:442
14# 0x00007F436905A850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
```

Related PR: #xxx

Problem Summary:

None

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
yiguolei pushed a commit that referenced this pull request May 21, 2025
…_flags (#50993) (#51089)

Pick #50993

Introduced by #51050

The build block maybe be `clear_column_mem_not_keep` in build phase when
the operator is closed.

```cpp
Status HashJoinBuildSinkLocalState::close(RuntimeState* state, Status exec_status) {
    if (_closed) {
        return Status::OK();
    }
    auto& p = _parent->cast<HashJoinBuildSinkOperatorX>();
    Defer defer {[&]() {
        if (!_should_build_hash_table) {
            return;
        }
        // The build side hash key column maybe no need output, but we need to keep the column in block
        // because it is used to compare with probe side hash key column

        if (p._should_keep_hash_key_column && _build_col_ids.size() == 1) {
            p._should_keep_column_flags[_build_col_ids[0]] = true;
        }

        if (_shared_state->build_block) {
            // release the memory of unused column in probe stage
            _shared_state->build_block->clear_column_mem_not_keep(p._should_keep_column_flags,
                                                                  p._use_shared_hash_table);
        }

        if (p._use_shared_hash_table) {
            std::unique_lock lock(p._mutex);
            p._signaled = true;
            for (auto& dep : _shared_state->sink_deps) {
                dep->set_ready();
            }
            for (auto& dep : p._finish_dependencies) {
                dep->set_ready();
            }
        }
    }};
```

```
*** Aborted at 1747343165 (unix time) try "date -d @1747343165" if you are using GNU date ***
*** Current BE git commitID: e7a3e78 ***
*** SIGSEGV address not mapped to object (@0x1) received by PID 7474 (TID 9641 OR 0x7f3f8c0e5640) from PID 1; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris/be/src/common/signal_handler.h:421
 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 3# 0x00007F4368F76520 in /lib/x86_64-linux-gnu/libc.so.6
 4# doris::Status doris::pipeline::ProcessHashTableProbe<7>::finish_probing > > >(doris::vectorized::MethodKeysFixed > >&, doris::vectorized::MutableBlock&, doris::vectorized::Block*, bool*, bool) at /root/doris/be/src/pipeline/exec/join/process_hash_table_probe_impl.h:738
 5# std::__detail::__variant::__gen_vtable_impl (*)(doris::pipeline::HashJoinProbeOperatorX::pull(doris::RuntimeState*, doris::vectorized::Block*, bool*) const::$_1&&, std::variant > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodStringNoCache > > >&, std::variant, doris::pipeline::ProcessHashTableProbe<2>, doris::pipeline::ProcessHashTableProbe<8>, doris::pipeline::ProcessHashTableProbe<1>, doris::pipeline::ProcessHashTableProbe<4>, doris::pipeline::ProcessHashTableProbe<3>, doris::pipeline::ProcessHashTableProbe<7>, doris::pipeline::ProcessHashTableProbe<9>, doris::pipeline::ProcessHashTableProbe<10>, doris::pipeline::ProcessHashTableProbe<11> >&)>, std::integer_sequence >::__visit_invoke(doris::pipeline::HashJoinProbeOperatorX::pull(doris::RuntimeState*, doris::vectorized::Block*, bool*) const::$_1&&, std::variant > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodStringNoCache > > >&, std::variant, doris::pipeline::ProcessHashTableProbe<2>, doris::pipeline::ProcessHashTableProbe<8>, doris::pipeline::ProcessHashTableProbe<1>, doris::pipeline::ProcessHashTableProbe<4>, doris::pipeline::ProcessHashTableProbe<3>, doris::pipeline::ProcessHashTableProbe<7>, doris::pipeline::ProcessHashTableProbe<9>, doris::pipeline::ProcessHashTableProbe<10>, doris::pipeline::ProcessHashTableProbe<11> >&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/variant:1013
 6# doris::pipeline::HashJoinProbeOperatorX::pull(doris::RuntimeState*, doris::vectorized::Block*, bool*) const at /root/doris/be/src/pipeline/exec/hashjoin_probe_operator.cpp:281
 7# doris::pipeline::StatefulOperatorX::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/doris/be/src/pipeline/exec/operator.cpp:670
 8# doris::pipeline::OperatorXBase::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/doris/be/src/pipeline/exec/operator.cpp:381
 9# doris::pipeline::PipelineTask::execute(bool*) in /mnt/hdd01/ci/doris-deploy-master-local/be/lib/doris_be
10# doris::pipeline::TaskScheduler::_do_work(int) at /root/doris/be/src/pipeline/task_scheduler.cpp:144
11# doris::ThreadPool::dispatch_thread() at /root/doris/be/src/util/threadpool.cpp:622
12# doris::Thread::supervise_thread(void*) at /root/doris/be/src/util/thread.cpp:469
13# start_thread at ./nptl/pthread_create.c:442
14# 0x00007F436905A850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
```

Related PR: #xxx

Problem Summary:

None

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [ ] Previous test can cover this change. - [ ] No code files have been
changed. - [ ] Other reason <!-- Add your reason? -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
mrhhsg added a commit to mrhhsg/doris that referenced this pull request May 22, 2025
…_flags (apache#50993) (apache#51089)

Pick apache#50993

Introduced by apache#51050

The build block maybe be `clear_column_mem_not_keep` in build phase when
the operator is closed.

```cpp
Status HashJoinBuildSinkLocalState::close(RuntimeState* state, Status exec_status) {
    if (_closed) {
        return Status::OK();
    }
    auto& p = _parent->cast<HashJoinBuildSinkOperatorX>();
    Defer defer {[&]() {
        if (!_should_build_hash_table) {
            return;
        }
        // The build side hash key column maybe no need output, but we need to keep the column in block
        // because it is used to compare with probe side hash key column

        if (p._should_keep_hash_key_column && _build_col_ids.size() == 1) {
            p._should_keep_column_flags[_build_col_ids[0]] = true;
        }

        if (_shared_state->build_block) {
            // release the memory of unused column in probe stage
            _shared_state->build_block->clear_column_mem_not_keep(p._should_keep_column_flags,
                                                                  p._use_shared_hash_table);
        }

        if (p._use_shared_hash_table) {
            std::unique_lock lock(p._mutex);
            p._signaled = true;
            for (auto& dep : _shared_state->sink_deps) {
                dep->set_ready();
            }
            for (auto& dep : p._finish_dependencies) {
                dep->set_ready();
            }
        }
    }};
```

```
*** Aborted at 1747343165 (unix time) try "date -d @1747343165" if you are using GNU date ***
*** Current BE git commitID: e7a3e78 ***
*** SIGSEGV address not mapped to object (@0x1) received by PID 7474 (TID 9641 OR 0x7f3f8c0e5640) from PID 1; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris/be/src/common/signal_handler.h:421
 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 3# 0x00007F4368F76520 in /lib/x86_64-linux-gnu/libc.so.6
 4# doris::Status doris::pipeline::ProcessHashTableProbe<7>::finish_probing > > >(doris::vectorized::MethodKeysFixed > >&, doris::vectorized::MutableBlock&, doris::vectorized::Block*, bool*, bool) at /root/doris/be/src/pipeline/exec/join/process_hash_table_probe_impl.h:738
 5# std::__detail::__variant::__gen_vtable_impl (*)(doris::pipeline::HashJoinProbeOperatorX::pull(doris::RuntimeState*, doris::vectorized::Block*, bool*) const::$_1&&, std::variant > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodStringNoCache > > >&, std::variant, doris::pipeline::ProcessHashTableProbe<2>, doris::pipeline::ProcessHashTableProbe<8>, doris::pipeline::ProcessHashTableProbe<1>, doris::pipeline::ProcessHashTableProbe<4>, doris::pipeline::ProcessHashTableProbe<3>, doris::pipeline::ProcessHashTableProbe<7>, doris::pipeline::ProcessHashTableProbe<9>, doris::pipeline::ProcessHashTableProbe<10>, doris::pipeline::ProcessHashTableProbe<11> >&)>, std::integer_sequence >::__visit_invoke(doris::pipeline::HashJoinProbeOperatorX::pull(doris::RuntimeState*, doris::vectorized::Block*, bool*) const::$_1&&, std::variant > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodStringNoCache > > >&, std::variant, doris::pipeline::ProcessHashTableProbe<2>, doris::pipeline::ProcessHashTableProbe<8>, doris::pipeline::ProcessHashTableProbe<1>, doris::pipeline::ProcessHashTableProbe<4>, doris::pipeline::ProcessHashTableProbe<3>, doris::pipeline::ProcessHashTableProbe<7>, doris::pipeline::ProcessHashTableProbe<9>, doris::pipeline::ProcessHashTableProbe<10>, doris::pipeline::ProcessHashTableProbe<11> >&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/variant:1013
 6# doris::pipeline::HashJoinProbeOperatorX::pull(doris::RuntimeState*, doris::vectorized::Block*, bool*) const at /root/doris/be/src/pipeline/exec/hashjoin_probe_operator.cpp:281
 7# doris::pipeline::StatefulOperatorX::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/doris/be/src/pipeline/exec/operator.cpp:670
 8# doris::pipeline::OperatorXBase::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/doris/be/src/pipeline/exec/operator.cpp:381
 9# doris::pipeline::PipelineTask::execute(bool*) in /mnt/hdd01/ci/doris-deploy-master-local/be/lib/doris_be
10# doris::pipeline::TaskScheduler::_do_work(int) at /root/doris/be/src/pipeline/task_scheduler.cpp:144
11# doris::ThreadPool::dispatch_thread() at /root/doris/be/src/util/threadpool.cpp:622
12# doris::Thread::supervise_thread(void*) at /root/doris/be/src/util/thread.cpp:469
13# start_thread at ./nptl/pthread_create.c:442
14# 0x00007F436905A850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
```

Related PR: #xxx

Problem Summary:

None

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [ ] Previous test can cover this change. - [ ] No code files have been
changed. - [ ] Other reason <!-- Add your reason? -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…_flags (apache#50993)

### What problem does this PR solve?

Introduced by apache#51050

The build block maybe be `clear_column_mem_not_keep` in build phase when
the operator is closed.

```cpp
Status HashJoinBuildSinkLocalState::close(RuntimeState* state, Status exec_status) {
    if (_closed) {
        return Status::OK();
    }
    auto& p = _parent->cast<HashJoinBuildSinkOperatorX>();
    Defer defer {[&]() {
        if (!_should_build_hash_table) {
            return;
        }
        // The build side hash key column maybe no need output, but we need to keep the column in block
        // because it is used to compare with probe side hash key column

        if (p._should_keep_hash_key_column && _build_col_ids.size() == 1) {
            p._should_keep_column_flags[_build_col_ids[0]] = true;
        }

        if (_shared_state->build_block) {
            // release the memory of unused column in probe stage
            _shared_state->build_block->clear_column_mem_not_keep(p._should_keep_column_flags,
                                                                  p._use_shared_hash_table);
        }

        if (p._use_shared_hash_table) {
            std::unique_lock lock(p._mutex);
            p._signaled = true;
            for (auto& dep : _shared_state->sink_deps) {
                dep->set_ready();
            }
            for (auto& dep : p._finish_dependencies) {
                dep->set_ready();
            }
        }
    }};
```

```
*** Aborted at 1747343165 (unix time) try "date -d @1747343165" if you are using GNU date ***
*** Current BE git commitID: e7a3e78 ***
*** SIGSEGV address not mapped to object (@0x1) received by PID 7474 (TID 9641 OR 0x7f3f8c0e5640) from PID 1; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris/be/src/common/signal_handler.h:421
 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 3# 0x00007F4368F76520 in /lib/x86_64-linux-gnu/libc.so.6
 4# doris::Status doris::pipeline::ProcessHashTableProbe<7>::finish_probing > > >(doris::vectorized::MethodKeysFixed > >&, doris::vectorized::MutableBlock&, doris::vectorized::Block*, bool*, bool) at /root/doris/be/src/pipeline/exec/join/process_hash_table_probe_impl.h:738
 5# std::__detail::__variant::__gen_vtable_impl (*)(doris::pipeline::HashJoinProbeOperatorX::pull(doris::RuntimeState*, doris::vectorized::Block*, bool*) const::$_1&&, std::variant > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodStringNoCache > > >&, std::variant, doris::pipeline::ProcessHashTableProbe<2>, doris::pipeline::ProcessHashTableProbe<8>, doris::pipeline::ProcessHashTableProbe<1>, doris::pipeline::ProcessHashTableProbe<4>, doris::pipeline::ProcessHashTableProbe<3>, doris::pipeline::ProcessHashTableProbe<7>, doris::pipeline::ProcessHashTableProbe<9>, doris::pipeline::ProcessHashTableProbe<10>, doris::pipeline::ProcessHashTableProbe<11> >&)>, std::integer_sequence >::__visit_invoke(doris::pipeline::HashJoinProbeOperatorX::pull(doris::RuntimeState*, doris::vectorized::Block*, bool*) const::$_1&&, std::variant > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodOneNumber, doris::JoinHashTable, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodKeysFixed > >, doris::vectorized::MethodKeysFixed, HashCRC32 > > >, doris::vectorized::MethodStringNoCache > > >&, std::variant, doris::pipeline::ProcessHashTableProbe<2>, doris::pipeline::ProcessHashTableProbe<8>, doris::pipeline::ProcessHashTableProbe<1>, doris::pipeline::ProcessHashTableProbe<4>, doris::pipeline::ProcessHashTableProbe<3>, doris::pipeline::ProcessHashTableProbe<7>, doris::pipeline::ProcessHashTableProbe<9>, doris::pipeline::ProcessHashTableProbe<10>, doris::pipeline::ProcessHashTableProbe<11> >&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/variant:1013
 6# doris::pipeline::HashJoinProbeOperatorX::pull(doris::RuntimeState*, doris::vectorized::Block*, bool*) const at /root/doris/be/src/pipeline/exec/hashjoin_probe_operator.cpp:281
 7# doris::pipeline::StatefulOperatorX::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/doris/be/src/pipeline/exec/operator.cpp:670
 8# doris::pipeline::OperatorXBase::get_block_after_projects(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/doris/be/src/pipeline/exec/operator.cpp:381
 9# doris::pipeline::PipelineTask::execute(bool*) in /mnt/hdd01/ci/doris-deploy-master-local/be/lib/doris_be
10# doris::pipeline::TaskScheduler::_do_work(int) at /root/doris/be/src/pipeline/task_scheduler.cpp:144
11# doris::ThreadPool::dispatch_thread() at /root/doris/be/src/util/threadpool.cpp:622
12# doris::Thread::supervise_thread(void*) at /root/doris/be/src/util/thread.cpp:469
13# start_thread at ./nptl/pthread_create.c:442
14# 0x00007F436905A850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
```

Related PR: #xxx

Problem Summary:

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
dataroaring pushed a commit that referenced this pull request Jun 11, 2025
…51156)

### What problem does this PR solve?

Pick #50720 #50993 #51124

Co-authored-by: zhangdong <zhangdong@selectdb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants