Skip to content

Conversation

@seawinde
Copy link
Contributor

What problem does this PR solve?

if query is as following which contains external table, if some partition in external table is empty, when mv partiton is invalid, this would cause compensate many union all empty relation

SELECT * FROM ${catalogName}.`test_paimon_spark`.test_tb_mix_format a 
left join ${tableName} b on a.id=b.user_id;

Issue Number: close #xxx

Related PR: #49514

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Copy link
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@seawinde
Copy link
Contributor Author

run buildall

return context.getRewritePlan();
}, queryPlanWithUnionFilter, queryPlan), true);
}, queryPlanWithUnionFilter, queryPlan);
// need to collect table partition again
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

write why need re-collect here in comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

have added comment

// this is needed before init hook
collectTableUsedPartitions(showPlanProcess);
// this is needed before init hook, because
MaterializedViewUtils.collectTableUsedPartitions(cascadesContext.getRewritePlan(), cascadesContext);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not do it in hook?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move it in hook

}, queryPlanWithUnionFilter, queryPlan), true);
}, queryPlanWithUnionFilter, queryPlan);
// need to collect table partition again
MaterializedViewUtils.collectTableUsedPartitions(filterAddedPlan, parentCascadesContext);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if u need collect partition info multi times, u should sum the time used by it in summary profile

}, queryPlanWithUnionFilter, queryPlan), true);
}, queryPlanWithUnionFilter, queryPlan);
// need to collect table partition again
MaterializedViewUtils.collectTableUsedPartitions(filterAddedPlan, parentCascadesContext);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only collect for sub plan is enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should collect table partition for all rewritten plan by mv, have fixed it

@doris-robot
Copy link

TPC-H: Total hot run time: 35905 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4a2a8c6edb79a886313aa0e6bd8ba0919b95bd81, data reload: false

------ Round 1 ----------------------------------
q1	17596	5154	5069	5069
q2	1994	304	189	189
q3	10314	1312	801	801
q4	10204	1058	583	583
q5	7517	2374	2436	2374
q6	203	169	140	140
q7	955	773	646	646
q8	9346	1378	1202	1202
q9	6933	5125	5195	5125
q10	7098	2383	1982	1982
q11	540	336	328	328
q12	387	399	254	254
q13	17869	3775	3220	3220
q14	247	243	238	238
q15	586	495	503	495
q16	468	470	421	421
q17	719	897	451	451
q18	7805	7114	7407	7114
q19	1442	991	604	604
q20	411	382	262	262
q21	4516	3988	3399	3399
q22	1141	1072	1008	1008
Total cold run time: 108291 ms
Total hot run time: 35905 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5087	5021	5076	5021
q2	263	328	241	241
q3	2215	2754	2332	2332
q4	1490	1884	1447	1447
q5	4394	4457	4427	4427
q6	244	177	131	131
q7	2028	2039	1845	1845
q8	2659	2662	2647	2647
q9	7314	7362	7072	7072
q10	3118	3193	2867	2867
q11	630	537	522	522
q12	722	820	658	658
q13	3718	3991	3422	3422
q14	308	308	310	308
q15	518	507	494	494
q16	494	560	476	476
q17	1244	1567	1415	1415
q18	7826	7650	7637	7637
q19	976	968	1153	968
q20	2072	2100	2012	2012
q21	5081	4835	4715	4715
q22	1099	1104	1059	1059
Total cold run time: 53500 ms
Total hot run time: 51716 ms

@hello-stephen
Copy link
Contributor

FE UT Coverage Report

Increment line coverage 73.33% (11/15) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Contributor

run external 10

@doris-robot
Copy link

TPC-H: Total hot run time: 36238 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4a2a8c6edb79a886313aa0e6bd8ba0919b95bd81, data reload: false

------ Round 1 ----------------------------------
q1	17664	5489	4990	4990
q2	2014	313	189	189
q3	10238	1305	831	831
q4	10220	1048	572	572
q5	7527	2428	2405	2405
q6	190	168	139	139
q7	958	786	657	657
q8	9384	1373	1325	1325
q9	6900	5147	5179	5147
q10	6874	2363	1960	1960
q11	552	335	322	322
q12	391	385	243	243
q13	17793	3804	3289	3289
q14	258	257	227	227
q15	570	497	484	484
q16	464	479	412	412
q17	732	904	446	446
q18	8167	7320	7234	7234
q19	1239	991	632	632
q20	410	392	273	273
q21	4682	4064	3429	3429
q22	1120	1065	1032	1032
Total cold run time: 108347 ms
Total hot run time: 36238 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5103	5039	5050	5039
q2	261	327	237	237
q3	2203	2752	2331	2331
q4	1516	1871	1431	1431
q5	4502	4416	4486	4416
q6	225	172	135	135
q7	2111	1980	1818	1818
q8	2682	2711	2628	2628
q9	7367	7335	7122	7122
q10	3103	3283	2811	2811
q11	634	534	521	521
q12	734	821	661	661
q13	3735	4018	3463	3463
q14	305	325	301	301
q15	528	492	492	492
q16	480	533	487	487
q17	1228	1550	1395	1395
q18	7997	7744	7678	7678
q19	942	1068	1046	1046
q20	2108	2114	1947	1947
q21	5236	4776	4697	4697
q22	1161	1088	1086	1086
Total cold run time: 54161 ms
Total hot run time: 51742 ms

@hello-stephen
Copy link
Contributor

run external 5

@seawinde seawinde force-pushed the fix_empty_relation_union_wrong branch from 4a2a8c6 to b95c035 Compare June 16, 2025 03:54
@seawinde
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 34052 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b95c035a9960a976732c93b7cd96043e85c46e94, data reload: false

------ Round 1 ----------------------------------
q1	17580	5145	5082	5082
q2	1925	268	172	172
q3	10349	1222	778	778
q4	10256	998	537	537
q5	7954	2398	2419	2398
q6	182	164	137	137
q7	930	750	633	633
q8	9295	1304	1057	1057
q9	6707	5074	5096	5074
q10	6862	2314	1922	1922
q11	486	286	277	277
q12	342	357	219	219
q13	17758	3669	3067	3067
q14	235	223	212	212
q15	569	483	484	483
q16	411	449	372	372
q17	580	847	367	367
q18	7499	7060	7162	7060
q19	1366	935	569	569
q20	341	341	232	232
q21	3794	3214	2430	2430
q22	1073	1031	974	974
Total cold run time: 106494 ms
Total hot run time: 34052 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5065	5052	5027	5027
q2	232	315	219	219
q3	2193	2668	2268	2268
q4	1416	1852	1372	1372
q5	4178	4101	4279	4101
q6	213	169	130	130
q7	2207	1915	1773	1773
q8	2621	2742	2618	2618
q9	7155	7105	7246	7105
q10	3044	3127	2715	2715
q11	568	517	492	492
q12	724	774	636	636
q13	3516	3865	3282	3282
q14	276	306	276	276
q15	529	488	471	471
q16	438	496	449	449
q17	1160	1508	1407	1407
q18	7714	7551	7529	7529
q19	801	809	853	809
q20	1953	2105	1870	1870
q21	5016	4377	4484	4377
q22	1108	1066	1032	1032
Total cold run time: 52127 ms
Total hot run time: 49958 ms

@seawinde
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 33763 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b3d79b218c8e81fc3726aa4876bf78cde9edf2c2, data reload: false

------ Round 1 ----------------------------------
q1	17572	5164	4952	4952
q2	1940	271	169	169
q3	10311	1235	771	771
q4	10209	1029	527	527
q5	7530	2388	2293	2293
q6	186	162	130	130
q7	919	742	616	616
q8	9312	1333	1072	1072
q9	6803	5059	5147	5059
q10	6839	2320	1915	1915
q11	480	285	273	273
q12	351	360	224	224
q13	17787	3658	3089	3089
q14	239	232	212	212
q15	570	485	475	475
q16	426	440	378	378
q17	606	867	368	368
q18	7534	7096	7058	7058
q19	2065	979	576	576
q20	334	335	216	216
q21	3922	3374	2433	2433
q22	1010	1001	957	957
Total cold run time: 106945 ms
Total hot run time: 33763 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5133	5003	5019	5003
q2	255	317	219	219
q3	2150	2684	2291	2291
q4	1315	1758	1380	1380
q5	4197	4159	4340	4159
q6	219	168	129	129
q7	1986	1958	1757	1757
q8	2605	2677	2516	2516
q9	7317	7222	7278	7222
q10	3023	3148	2711	2711
q11	584	523	510	510
q12	682	756	640	640
q13	3586	3907	3246	3246
q14	279	310	281	281
q15	527	486	494	486
q16	441	515	453	453
q17	1169	1500	1341	1341
q18	7886	7651	7414	7414
q19	820	812	874	812
q20	1984	2067	1960	1960
q21	5014	4426	4539	4426
q22	1106	1013	1024	1013
Total cold run time: 52278 ms
Total hot run time: 49969 ms

@seawinde
Copy link
Contributor Author

run buildall

@hello-stephen
Copy link
Contributor

FE UT Coverage Report

Increment line coverage 77.50% (31/40) 🎉
Increment coverage report
Complete coverage report

@seawinde
Copy link
Contributor Author

run external 10

@doris-robot
Copy link

TPC-H: Total hot run time: 34190 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ab3292f64f79bdf2846ff64409108b2a565c6510, data reload: false

------ Round 1 ----------------------------------
q1	17622	5158	5036	5036
q2	1917	277	193	193
q3	10302	1394	750	750
q4	10223	1022	540	540
q5	7575	2415	2334	2334
q6	177	164	135	135
q7	914	743	613	613
q8	9325	1310	1176	1176
q9	6755	5038	5171	5038
q10	6814	2393	1923	1923
q11	474	295	279	279
q12	345	355	227	227
q13	17774	3637	3150	3150
q14	224	230	219	219
q15	550	480	477	477
q16	417	446	368	368
q17	595	869	361	361
q18	7592	7245	7079	7079
q19	1215	948	541	541
q20	345	346	226	226
q21	4222	3205	2575	2575
q22	1034	956	950	950
Total cold run time: 106411 ms
Total hot run time: 34190 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5072	5110	5044	5044
q2	245	317	219	219
q3	2209	2658	2247	2247
q4	1388	1757	1344	1344
q5	4242	4088	4622	4088
q6	214	171	128	128
q7	2072	1922	1832	1832
q8	2631	2618	2520	2520
q9	7216	7128	7111	7111
q10	2987	3220	2763	2763
q11	593	507	494	494
q12	666	767	634	634
q13	3459	3825	3225	3225
q14	293	310	262	262
q15	511	484	463	463
q16	458	483	437	437
q17	1209	1511	1365	1365
q18	7670	7525	7290	7290
q19	809	791	892	791
q20	2003	2031	1898	1898
q21	4946	4565	4433	4433
q22	1069	1031	973	973
Total cold run time: 51962 ms
Total hot run time: 49561 ms

@doris-robot
Copy link

TPC-DS: Total hot run time: 184911 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ab3292f64f79bdf2846ff64409108b2a565c6510, data reload: false

query1	990	392	383	383
query2	6561	1807	1757	1757
query3	6739	228	224	224
query4	26386	23691	23357	23357
query5	4363	621	444	444
query6	312	205	191	191
query7	4616	499	275	275
query8	263	226	219	219
query9	8614	2582	2583	2582
query10	484	341	268	268
query11	15626	15315	14759	14759
query12	159	109	101	101
query13	1652	519	416	416
query14	9314	5991	6013	5991
query15	206	191	171	171
query16	7233	625	466	466
query17	1201	732	574	574
query18	1968	400	303	303
query19	197	196	179	179
query20	120	119	119	119
query21	218	134	112	112
query22	4445	4259	4160	4160
query23	34020	33078	33097	33078
query24	8420	2377	2412	2377
query25	530	446	384	384
query26	1239	278	147	147
query27	2755	497	337	337
query28	4369	2115	2098	2098
query29	760	564	424	424
query30	290	219	185	185
query31	923	829	793	793
query32	70	67	61	61
query33	569	357	324	324
query34	790	832	539	539
query35	778	793	721	721
query36	949	998	882	882
query37	110	101	73	73
query38	4084	4086	3985	3985
query39	1478	1396	1405	1396
query40	210	120	107	107
query41	61	61	56	56
query42	124	104	104	104
query43	483	493	443	443
query44	1285	808	815	808
query45	173	172	161	161
query46	831	1009	612	612
query47	1759	1824	1679	1679
query48	384	409	313	313
query49	747	485	385	385
query50	634	672	396	396
query51	4111	4193	4046	4046
query52	113	106	98	98
query53	220	252	181	181
query54	592	568	494	494
query55	83	84	87	84
query56	298	287	285	285
query57	1201	1192	1123	1123
query58	260	257	257	257
query59	2547	2594	2528	2528
query60	346	324	306	306
query61	126	126	138	126
query62	794	712	663	663
query63	224	183	185	183
query64	4383	985	666	666
query65	4242	4110	4172	4110
query66	1146	401	310	310
query67	15744	15525	15394	15394
query68	8021	880	524	524
query69	467	302	265	265
query70	1242	1067	1087	1067
query71	484	321	300	300
query72	5659	4790	4602	4602
query73	689	576	355	355
query74	9004	9215	8842	8842
query75	3873	3167	2676	2676
query76	3653	1189	741	741
query77	784	359	288	288
query78	10066	10228	9400	9400
query79	1969	880	588	588
query80	577	509	501	501
query81	464	253	215	215
query82	419	124	96	96
query83	252	244	226	226
query84	244	108	89	89
query85	782	352	312	312
query86	334	298	281	281
query87	4320	4437	4285	4285
query88	3447	2275	2254	2254
query89	367	317	284	284
query90	1947	209	203	203
query91	149	136	116	116
query92	72	60	63	60
query93	1176	939	580	580
query94	675	387	296	296
query95	370	291	281	281
query96	499	555	288	288
query97	2674	2751	2677	2677
query98	234	207	202	202
query99	1420	1417	1244	1244
Total cold run time: 273255 ms
Total hot run time: 184911 ms

@doris-robot
Copy link

ClickBench: Total hot run time: 28.89 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ab3292f64f79bdf2846ff64409108b2a565c6510, data reload: false

query1	0.04	0.03	0.03
query2	0.06	0.04	0.04
query3	0.23	0.08	0.06
query4	1.60	0.11	0.10
query5	0.43	0.41	0.41
query6	1.17	0.66	0.66
query7	0.02	0.02	0.01
query8	0.04	0.03	0.03
query9	0.59	0.51	0.51
query10	0.58	0.58	0.56
query11	0.15	0.11	0.10
query12	0.14	0.11	0.11
query13	0.61	0.60	0.60
query14	0.78	0.81	0.80
query15	0.86	0.86	0.85
query16	0.37	0.38	0.39
query17	1.06	1.01	1.02
query18	0.22	0.21	0.21
query19	1.94	1.83	1.77
query20	0.02	0.01	0.01
query21	15.41	0.87	0.53
query22	0.75	1.04	0.75
query23	14.99	1.40	0.63
query24	7.24	0.90	0.93
query25	0.50	0.24	0.08
query26	0.58	0.16	0.14
query27	0.05	0.05	0.05
query28	9.41	0.93	0.46
query29	12.59	4.06	3.33
query30	0.26	0.09	0.07
query31	2.82	0.60	0.41
query32	3.23	0.55	0.48
query33	3.02	3.07	3.12
query34	15.92	5.09	4.47
query35	4.55	4.52	4.47
query36	0.67	0.51	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.04	0.03	0.03
query40	0.18	0.13	0.12
query41	0.08	0.02	0.03
query42	0.04	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 103.42 s
Total hot run time: 28.89 s

@seawinde
Copy link
Contributor Author

run buildall

@seawinde seawinde force-pushed the fix_empty_relation_union_wrong branch from c9108e7 to fe19d56 Compare June 17, 2025 09:37
@seawinde
Copy link
Contributor Author

run buildall

@doris-robot
Copy link

TPC-H: Total hot run time: 33682 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit fe19d56f1f9609cb180b891f4d2a0257c87abed2, data reload: false

------ Round 1 ----------------------------------
q1	17624	5116	5005	5005
q2	1942	283	173	173
q3	10313	1221	746	746
q4	10211	983	518	518
q5	7542	2278	2334	2278
q6	180	158	131	131
q7	897	754	611	611
q8	9308	1292	1051	1051
q9	6786	5055	5131	5055
q10	6891	2394	1963	1963
q11	487	286	273	273
q12	338	352	208	208
q13	17764	3695	3087	3087
q14	228	223	219	219
q15	569	479	477	477
q16	428	434	385	385
q17	578	854	347	347
q18	7822	7154	7220	7154
q19	1261	954	540	540
q20	334	327	220	220
q21	3723	2555	2278	2278
q22	1029	1007	963	963
Total cold run time: 106255 ms
Total hot run time: 33682 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5019	4968	5035	4968
q2	247	327	224	224
q3	2145	2640	2313	2313
q4	1327	1768	1352	1352
q5	4197	4116	4266	4116
q6	208	169	128	128
q7	1986	1918	1752	1752
q8	2589	2617	2534	2534
q9	7252	7170	7145	7145
q10	3040	3321	2804	2804
q11	561	508	485	485
q12	643	777	638	638
q13	3522	3891	3273	3273
q14	272	290	270	270
q15	518	481	472	472
q16	464	493	447	447
q17	1145	1571	1375	1375
q18	7681	7460	7449	7449
q19	789	806	950	806
q20	1983	2028	1989	1989
q21	5053	4444	4325	4325
q22	1088	1072	1037	1037
Total cold run time: 51729 ms
Total hot run time: 49902 ms

@morningman morningman merged commit 85b7231 into apache:master Jun 18, 2025
26 of 31 checks passed
seawinde added a commit to seawinde/doris that referenced this pull request Jun 18, 2025
…elation (apache#51700)

if query is as following which contains external table, if some
partition in external table is empty, when mv partiton is invalid, this
would cause compensate many union all empty relation
```sql
SELECT * FROM ${catalogName}.`test_paimon_spark`.test_tb_mix_format a
left join ${tableName} b on a.id=b.user_id;
```
seawinde added a commit to seawinde/doris that referenced this pull request Jun 18, 2025
…elation (apache#51700)

if query is as following which contains external table, if some
partition in external table is empty, when mv partiton is invalid, this
would cause compensate many union all empty relation
```sql
SELECT * FROM ${catalogName}.`test_paimon_spark`.test_tb_mix_format a
left join ${tableName} b on a.id=b.user_id;
```
yiguolei pushed a commit that referenced this pull request Jun 20, 2025
seawinde added a commit to seawinde/doris that referenced this pull request Jun 26, 2025
…elation (apache#51700)

if query is as following which contains external table, if some
partition in external table is empty, when mv partiton is invalid, this
would cause compensate many union all empty relation
```sql
SELECT * FROM ${catalogName}.`test_paimon_spark`.test_tb_mix_format a
left join ${tableName} b on a.id=b.user_id;
```
morrySnow pushed a commit that referenced this pull request Jun 27, 2025
morrySnow pushed a commit that referenced this pull request Jun 27, 2025
koarz pushed a commit to koarz/doris that referenced this pull request Jul 3, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants