@sollhui
Contributor

@sollhui sollhui commented Feb 23, 2025

What problem does this PR solve?

related #48511

Add more metrics for observing routine load jobs:

| Metrics | Module | Description |
| ------- | ------ | ----------- |
| routine_load_get_msg_latency | BE | Time to pull a Kafka message |
| routine_load_get_msg_count | BE | Number of times Kafka messages were pulled |
| routine_load_consume_bytes | BE | Total data volume consumed from Kafka |
| routine_load_consume_rows | BE | Total number of rows consumed from Kafka |
| routine_load_task_execute_time | FE | Task execution time |
| routine_load_task_execute_count | FE | Task execution count |
| routine_load_get_meta_latency | FE | Latency of fetching Kafka metadata |
| routine_load_get_meta_count | FE | Number of times Kafka metadata was fetched |
| routine_load_get_meta_fail_count | FE | Number of failures when fetching Kafka metadata |
| routine_load_received_bytes | FE | Total data volume consumed |
| routine_load_received_rows | FE | Total number of rows consumed |
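
Most of these metrics are cumulative, so per-interval figures such as the average pull latency have to be derived from two snapshots. The sketch below is a minimal illustration under stated assumptions, not the PR's implementation: it assumes `routine_load_get_msg_latency` is a running sum of pull time in milliseconds and that the other three BE metrics are monotonically increasing counters. The `derive_rates` helper and its dict-of-values input are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class RoutineLoadRates:
    avg_get_msg_latency_ms: float  # mean time per Kafka pull in the interval
    consume_bytes_per_s: float     # Kafka bytes consumed per second
    consume_rows_per_s: float      # Kafka rows consumed per second


def derive_rates(prev: Mapping[str, float], curr: Mapping[str, float],
                 interval_s: float) -> RoutineLoadRates:
    """Derive interval rates from two snapshots of cumulative BE metrics.

    Assumption (not confirmed by the PR): every metric is a monotonically
    increasing counter and the latency metric is a running sum in ms.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")

    def delta(name: str) -> float:
        return curr[name] - prev[name]

    pulls = delta("routine_load_get_msg_count")
    latency = delta("routine_load_get_msg_latency")
    # Avoid dividing by zero when no messages were pulled in the interval.
    avg_latency = latency / pulls if pulls > 0 else 0.0
    return RoutineLoadRates(
        avg_get_msg_latency_ms=avg_latency,
        consume_bytes_per_s=delta("routine_load_consume_bytes") / interval_s,
        consume_rows_per_s=delta("routine_load_consume_rows") / interval_s,
    )
```

For example, if `routine_load_get_msg_count` grows from 100 to 300 and `routine_load_get_msg_latency` from 500 to 1500 over a 60 s scrape interval, the average pull latency is (1500 - 500) / (300 - 100) = 5 ms.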

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@sollhui
Contributor Author

sollhui commented Feb 23, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 31544 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 94576c5adb59922a84426945b5d2749913636807, data reload: false

------ Round 1 ----------------------------------
q1	17655	5376	5120	5120
q2	2040	309	173	173
q3	10401	1311	711	711
q4	10215	999	539	539
q5	7555	2517	2277	2277
q6	192	168	133	133
q7	904	761	634	634
q8	9289	1387	1177	1177
q9	4945	4568	4654	4568
q10	6827	2319	1896	1896
q11	481	279	262	262
q12	353	355	213	213
q13	17753	3668	3082	3082
q14	222	220	222	220
q15	511	460	457	457
q16	600	594	588	588
q17	578	867	320	320
q18	6797	6240	6243	6240
q19	1212	953	521	521
q20	313	327	195	195
q21	2768	2179	1911	1911
q22	363	322	307	307
Total cold run time: 101974 ms
Total hot run time: 31544 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5126	5140	5158	5140
q2	234	344	232	232
q3	2183	2712	2223	2223
q4	1416	1810	1324	1324
q5	4196	4159	4163	4159
q6	212	163	126	126
q7	1856	1860	1655	1655
q8	2675	2614	2538	2538
q9	7354	7172	7181	7172
q10	3054	3213	2719	2719
q11	571	501	498	498
q12	686	788	640	640
q13	3411	3864	3266	3266
q14	296	285	272	272
q15	512	471	472	471
q16	623	696	646	646
q17	1132	1610	1329	1329
q18	7522	7212	7361	7212
q19	799	770	853	770
q20	1959	2079	1898	1898
q21	5368	4965	4857	4857
q22	642	582	536	536
Total cold run time: 51827 ms
Total hot run time: 49683 ms

@doris-robot

TPC-DS: Total hot run time: 184681 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 94576c5adb59922a84426945b5d2749913636807, data reload: false

query1	982	379	376	376
query2	6523	1887	1905	1887
query3	6796	215	206	206
query4	26216	23774	23504	23504
query5	4338	700	493	493
query6	312	234	196	196
query7	4609	520	309	309
query8	283	233	219	219
query9	8590	2742	2744	2742
query10	435	315	273	273
query11	15453	15129	15016	15016
query12	152	103	104	103
query13	1647	549	426	426
query14	9094	6331	6177	6177
query15	201	189	181	181
query16	7124	639	470	470
query17	1034	732	576	576
query18	1950	399	297	297
query19	191	211	174	174
query20	126	114	121	114
query21	211	120	103	103
query22	4247	4415	4307	4307
query23	34321	33406	32986	32986
query24	7833	2392	2381	2381
query25	559	479	423	423
query26	1247	278	153	153
query27	2480	523	355	355
query28	4176	2502	2485	2485
query29	747	590	455	455
query30	234	186	203	186
query31	938	855	791	791
query32	70	61	64	61
query33	545	371	297	297
query34	776	848	528	528
query35	793	842	744	744
query36	997	1009	880	880
query37	115	99	76	76
query38	4107	4225	4071	4071
query39	1493	1384	1374	1374
query40	206	115	105	105
query41	53	52	48	48
query42	123	112	107	107
query43	489	505	493	493
query44	1343	835	819	819
query45	176	165	160	160
query46	855	1036	667	667
query47	1734	1794	1717	1717
query48	393	450	327	327
query49	778	515	402	402
query50	682	752	418	418
query51	4217	4238	4134	4134
query52	112	108	96	96
query53	229	250	182	182
query54	491	491	407	407
query55	90	79	82	79
query56	285	281	264	264
query57	1133	1111	1052	1052
query58	258	237	239	237
query59	2753	2800	2513	2513
query60	289	285	271	271
query61	115	111	123	111
query62	798	746	641	641
query63	233	191	191	191
query64	4287	997	664	664
query65	3208	3199	3165	3165
query66	1155	410	298	298
query67	15668	15492	15288	15288
query68	7762	759	537	537
query69	482	299	271	271
query70	1171	1128	1111	1111
query71	433	301	276	276
query72	5872	3580	3721	3580
query73	705	750	360	360
query74	9059	9283	9026	9026
query75	3335	3158	2688	2688
query76	3259	1169	737	737
query77	603	367	282	282
query78	10045	10078	9286	9286
query79	1949	903	634	634
query80	699	552	471	471
query81	516	283	234	234
query82	357	131	95	95
query83	169	169	153	153
query84	269	95	77	77
query85	805	352	299	299
query86	371	285	312	285
query87	4542	4548	4391	4391
query88	3240	2374	2371	2371
query89	397	325	289	289
query90	1974	203	191	191
query91	137	157	114	114
query92	70	58	56	56
query93	1715	1008	572	572
query94	648	397	298	298
query95	354	270	256	256
query96	527	549	305	305
query97	2684	2842	2709	2709
query98	222	209	194	194
query99	1344	1386	1282	1282
Total cold run time: 269813 ms
Total hot run time: 184681 ms

@doris-robot

ClickBench: Total hot run time: 30.93 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 94576c5adb59922a84426945b5d2749913636807, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.03	0.04
query3	0.24	0.07	0.06
query4	1.61	0.09	0.10
query5	0.41	0.41	0.43
query6	1.19	0.68	0.65
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.62	0.50	0.52
query10	0.58	0.57	0.58
query11	0.16	0.11	0.10
query12	0.14	0.11	0.12
query13	0.63	0.60	0.61
query14	2.65	2.69	2.71
query15	0.92	0.84	0.84
query16	0.36	0.38	0.38
query17	1.03	1.04	1.01
query18	0.21	0.19	0.19
query19	1.88	1.78	2.04
query20	0.02	0.01	0.02
query21	15.35	0.88	0.54
query22	0.74	1.23	0.67
query23	14.92	1.38	0.62
query24	6.47	1.39	1.58
query25	0.56	0.30	0.06
query26	0.57	0.17	0.14
query27	0.05	0.05	0.05
query28	10.28	0.88	0.43
query29	12.59	3.90	3.20
query30	0.24	0.10	0.06
query31	2.81	0.59	0.38
query32	3.22	0.56	0.47
query33	2.97	3.02	3.05
query34	15.75	5.15	4.48
query35	4.54	4.53	4.52
query36	0.67	0.49	0.50
query37	0.08	0.06	0.06
query38	0.05	0.03	0.04
query39	0.04	0.03	0.02
query40	0.17	0.14	0.13
query41	0.07	0.02	0.02
query42	0.04	0.02	0.03
query43	0.04	0.03	0.03
Total cold run time: 105.03 s
Total hot run time: 30.93 s

@sollhui
Contributor Author

sollhui commented Feb 24, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 31272 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 927f61a5092ef0867dcb6c780e4b7618d8bdb690, data reload: false

------ Round 1 ----------------------------------
q1	17625	5262	5026	5026
q2	2048	315	178	178
q3	10383	1230	754	754
q4	10209	1028	539	539
q5	7506	2438	2234	2234
q6	188	169	143	143
q7	913	755	612	612
q8	9308	1309	1171	1171
q9	4858	4600	4482	4482
q10	6823	2300	1887	1887
q11	473	298	261	261
q12	359	360	224	224
q13	17770	3618	3050	3050
q14	225	220	208	208
q15	502	458	460	458
q16	626	606	579	579
q17	582	849	328	328
q18	6629	6190	6201	6190
q19	1209	952	546	546
q20	317	325	196	196
q21	2744	2138	1904	1904
q22	361	334	302	302
Total cold run time: 101658 ms
Total hot run time: 31272 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5101	5062	5125	5062
q2	233	358	229	229
q3	2127	2630	2282	2282
q4	1395	1797	1325	1325
q5	4193	4084	4105	4084
q6	205	169	125	125
q7	1871	1810	1643	1643
q8	2530	2549	2489	2489
q9	7204	7301	7327	7301
q10	3020	3226	2834	2834
q11	567	532	505	505
q12	714	810	649	649
q13	3648	3772	3278	3278
q14	281	297	277	277
q15	523	474	451	451
q16	631	672	621	621
q17	1102	1587	1329	1329
q18	7605	7425	7344	7344
q19	785	809	859	809
q20	1929	2023	1869	1869
q21	5353	5009	4864	4864
q22	615	576	549	549
Total cold run time: 51632 ms
Total hot run time: 49919 ms

@doris-robot

TPC-DS: Total hot run time: 191523 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 927f61a5092ef0867dcb6c780e4b7618d8bdb690, data reload: false

query1	1326	983	935	935
query2	6233	1885	1904	1885
query3	10941	4524	4414	4414
query4	57282	25031	23754	23754
query5	5148	582	493	493
query6	366	199	192	192
query7	5067	510	300	300
query8	326	261	230	230
query9	5937	2755	2771	2755
query10	425	312	271	271
query11	15322	15130	14851	14851
query12	168	106	109	106
query13	1109	533	435	435
query14	11151	6477	6624	6477
query15	207	201	178	178
query16	7104	635	500	500
query17	1113	748	606	606
query18	1524	428	324	324
query19	220	219	170	170
query20	133	126	125	125
query21	218	122	110	110
query22	4538	4525	4674	4525
query23	34034	33278	33325	33278
query24	5701	2432	2433	2432
query25	472	469	425	425
query26	687	272	165	165
query27	1686	541	358	358
query28	2948	2584	2547	2547
query29	562	584	427	427
query30	217	187	149	149
query31	911	850	837	837
query32	68	66	61	61
query33	460	358	316	316
query34	747	886	537	537
query35	805	844	752	752
query36	948	1001	900	900
query37	117	102	81	81
query38	4247	4295	4196	4196
query39	1479	1414	1419	1414
query40	210	116	105	105
query41	51	48	53	48
query42	124	110	108	108
query43	529	520	505	505
query44	1355	840	857	840
query45	179	170	162	162
query46	911	1068	659	659
query47	1871	1926	1848	1848
query48	427	446	334	334
query49	691	507	446	446
query50	717	741	424	424
query51	4282	4313	4227	4227
query52	122	105	98	98
query53	234	259	198	198
query54	490	494	434	434
query55	91	90	84	84
query56	291	260	268	260
query57	1159	1199	1131	1131
query58	241	258	234	234
query59	2802	2932	2829	2829
query60	294	289	268	268
query61	117	117	112	112
query62	742	773	675	675
query63	256	194	199	194
query64	1461	996	683	683
query65	3268	3137	3097	3097
query66	686	389	290	290
query67	15689	15502	15368	15368
query68	5337	798	542	542
query69	520	312	271	271
query70	1263	1174	1126	1126
query71	442	303	269	269
query72	6311	3538	3721	3538
query73	1072	736	370	370
query74	9167	9165	9023	9023
query75	3174	3169	2657	2657
query76	3762	1161	722	722
query77	520	382	271	271
query78	9995	10086	9281	9281
query79	2605	837	621	621
query80	646	528	442	442
query81	499	270	238	238
query82	405	130	91	91
query83	182	181	153	153
query84	289	95	75	75
query85	756	346	292	292
query86	378	312	311	311
query87	4411	4440	4475	4440
query88	3894	2375	2331	2331
query89	431	324	295	295
query90	1782	191	191	191
query91	133	138	108	108
query92	73	60	57	57
query93	2255	1003	574	574
query94	673	403	295	295
query95	343	273	257	257
query96	532	551	301	301
query97	2809	2855	2709	2709
query98	219	218	202	202
query99	1480	1399	1298	1298
Total cold run time: 297595 ms
Total hot run time: 191523 ms

@doris-robot

ClickBench: Total hot run time: 29.93 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 927f61a5092ef0867dcb6c780e4b7618d8bdb690, data reload: false

query1	0.04	0.04	0.03
query2	0.06	0.04	0.03
query3	0.23	0.07	0.06
query4	1.62	0.10	0.10
query5	0.42	0.41	0.40
query6	1.16	0.66	0.65
query7	0.02	0.01	0.02
query8	0.04	0.03	0.03
query9	0.60	0.52	0.51
query10	0.58	0.56	0.56
query11	0.15	0.10	0.11
query12	0.14	0.12	0.11
query13	0.62	0.59	0.60
query14	2.68	2.70	2.76
query15	0.91	0.86	0.82
query16	0.39	0.38	0.37
query17	1.03	1.06	1.02
query18	0.21	0.20	0.19
query19	1.84	1.73	1.97
query20	0.01	0.01	0.01
query21	15.37	0.89	0.56
query22	0.77	1.23	0.64
query23	14.94	1.39	0.60
query24	6.60	3.04	0.45
query25	0.53	0.17	0.21
query26	0.61	0.16	0.14
query27	0.05	0.04	0.04
query28	9.50	0.84	0.41
query29	12.55	3.91	3.24
query30	0.24	0.08	0.06
query31	2.83	0.57	0.38
query32	3.22	0.54	0.46
query33	2.99	3.03	3.05
query34	15.69	5.29	4.45
query35	4.51	4.49	4.54
query36	0.68	0.50	0.48
query37	0.09	0.07	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.02
query40	0.17	0.13	0.12
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.04	0.04	0.03
Total cold run time: 104.33 s
Total hot run time: 29.93 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 44.51% (11753/26407)
Line Coverage: 34.00% (98244/288974)
Region Coverage: 33.08% (50328/152125)
Branch Coverage: 28.73% (25297/88056)
Coverage Report: http://coverage.selectdb-in.cc/coverage/927f61a5092ef0867dcb6c780e4b7618d8bdb690_927f61a5092ef0867dcb6c780e4b7618d8bdb690/report/index.html

@sollhui
Contributor Author

sollhui commented Feb 24, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 31319 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit b9113bae31cb3ce05f3867368909f9217a6abf27, data reload: false

------ Round 1 ----------------------------------
q1	17588	5348	5087	5087
q2	2053	290	167	167
q3	10484	1264	696	696
q4	10219	1020	537	537
q5	7538	2419	2271	2271
q6	188	171	139	139
q7	893	748	590	590
q8	9294	1305	1099	1099
q9	4833	4704	4624	4624
q10	6824	2299	1878	1878
q11	493	278	266	266
q12	361	371	218	218
q13	17759	3692	3102	3102
q14	222	237	205	205
q15	506	479	445	445
q16	616	614	583	583
q17	578	849	337	337
q18	7414	6190	6184	6184
q19	1216	932	512	512
q20	311	326	199	199
q21	2748	2141	1884	1884
q22	366	331	296	296
Total cold run time: 102504 ms
Total hot run time: 31319 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5160	5139	5049	5049
q2	239	324	240	240
q3	2146	2689	2301	2301
q4	1414	1821	1379	1379
q5	4235	4093	4116	4093
q6	205	162	124	124
q7	1847	1830	1641	1641
q8	2619	2586	2513	2513
q9	7345	7088	7185	7088
q10	3038	3161	2710	2710
q11	578	509	489	489
q12	670	749	595	595
q13	3526	3843	3288	3288
q14	270	295	265	265
q15	520	457	460	457
q16	654	673	638	638
q17	1114	1555	1380	1380
q18	7604	7315	7304	7304
q19	773	826	938	826
q20	1926	2059	1835	1835
q21	5252	4945	4701	4701
q22	632	577	547	547
Total cold run time: 51767 ms
Total hot run time: 49463 ms

@doris-robot

TPC-DS: Total hot run time: 183975 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit b9113bae31cb3ce05f3867368909f9217a6abf27, data reload: false

query1	972	371	363	363
query2	6527	1814	1750	1750
query3	6782	216	215	215
query4	26079	23704	23333	23333
query5	4307	653	491	491
query6	291	200	200	200
query7	4619	508	294	294
query8	288	226	229	226
query9	8595	2545	2552	2545
query10	455	306	275	275
query11	15340	15085	15082	15082
query12	152	109	103	103
query13	1653	512	366	366
query14	8819	6594	6503	6503
query15	206	196	177	177
query16	7143	621	473	473
query17	936	708	560	560
query18	1954	395	313	313
query19	195	192	155	155
query20	119	116	113	113
query21	208	130	108	108
query22	4366	4251	4522	4251
query23	34542	33292	32980	32980
query24	7788	2388	2345	2345
query25	546	480	419	419
query26	1241	268	156	156
query27	2291	486	330	330
query28	4036	2395	2385	2385
query29	761	553	469	469
query30	234	178	150	150
query31	934	847	795	795
query32	71	65	63	63
query33	584	340	298	298
query34	761	859	499	499
query35	780	804	722	722
query36	977	940	899	899
query37	118	99	83	83
query38	4233	4109	4043	4043
query39	1433	1398	1399	1398
query40	215	112	99	99
query41	52	52	104	52
query42	112	105	100	100
query43	487	495	453	453
query44	1260	774	762	762
query45	174	172	156	156
query46	860	1013	634	634
query47	1736	1796	1708	1708
query48	403	428	312	312
query49	772	495	409	409
query50	677	727	419	419
query51	4193	4142	4126	4126
query52	104	101	93	93
query53	226	250	188	188
query54	485	513	408	408
query55	83	81	80	80
query56	249	246	247	246
query57	1137	1102	1056	1056
query58	259	238	241	238
query59	2406	2580	2461	2461
query60	297	285	261	261
query61	134	119	117	117
query62	771	717	669	669
query63	233	183	185	183
query64	4310	997	654	654
query65	3207	3137	3200	3137
query66	1179	405	301	301
query67	15877	15522	15378	15378
query68	7964	750	494	494
query69	472	298	254	254
query70	1191	1122	1108	1108
query71	395	308	259	259
query72	5713	3642	3636	3636
query73	716	739	343	343
query74	9127	9083	8934	8934
query75	3325	3146	2690	2690
query76	3367	1160	735	735
query77	528	360	291	291
query78	9925	10072	9448	9448
query79	2114	852	599	599
query80	607	547	448	448
query81	513	275	248	248
query82	484	131	99	99
query83	174	169	154	154
query84	283	97	75	75
query85	849	349	298	298
query86	378	319	299	299
query87	4333	4478	4528	4478
query88	4091	2210	2245	2210
query89	393	320	275	275
query90	1935	192	191	191
query91	132	135	111	111
query92	68	60	57	57
query93	1833	975	568	568
query94	658	393	297	297
query95	349	267	253	253
query96	499	557	278	278
query97	2763	2814	2738	2738
query98	234	209	195	195
query99	1311	1432	1282	1282
Total cold run time: 269958 ms
Total hot run time: 183975 ms

@doris-robot

ClickBench: Total hot run time: 30.21 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit b9113bae31cb3ce05f3867368909f9217a6abf27, data reload: false

query1	0.03	0.03	0.03
query2	0.08	0.03	0.03
query3	0.24	0.06	0.07
query4	1.61	0.10	0.10
query5	0.40	0.42	0.40
query6	1.16	0.67	0.65
query7	0.02	0.02	0.01
query8	0.04	0.03	0.03
query9	0.59	0.51	0.54
query10	0.57	0.57	0.56
query11	0.15	0.11	0.11
query12	0.15	0.11	0.12
query13	0.63	0.60	0.60
query14	2.68	2.70	2.70
query15	0.94	0.85	0.86
query16	0.38	0.38	0.38
query17	1.00	1.00	1.03
query18	0.21	0.19	0.19
query19	1.87	1.81	1.98
query20	0.01	0.01	0.01
query21	15.36	0.90	0.53
query22	0.76	1.26	0.62
query23	14.93	1.36	0.62
query24	7.94	0.63	0.94
query25	0.49	0.21	0.09
query26	0.60	0.17	0.13
query27	0.05	0.04	0.04
query28	9.43	0.87	0.43
query29	12.57	3.94	3.29
query30	0.25	0.10	0.07
query31	2.81	0.59	0.39
query32	3.22	0.54	0.46
query33	2.97	2.97	3.00
query34	15.60	5.11	4.52
query35	4.53	4.48	4.52
query36	0.67	0.49	0.48
query37	0.09	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.02
query40	0.16	0.13	0.13
query41	0.08	0.02	0.02
query42	0.04	0.03	0.02
query43	0.03	0.04	0.03
Total cold run time: 105.42 s
Total hot run time: 30.21 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 44.51% (11752/26406)
Line Coverage: 34.00% (98256/288960)
Region Coverage: 33.10% (50344/152105)
Branch Coverage: 28.74% (25301/88040)
Coverage Report: http://coverage.selectdb-in.cc/coverage/b9113bae31cb3ce05f3867368909f9217a6abf27_b9113bae31cb3ce05f3867368909f9217a6abf27/report/index.html

@sollhui sollhui marked this pull request as ready for review February 24, 2025 09:30
@sollhui
Contributor Author

sollhui commented Feb 24, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 31400 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f0f61b83a870e82cee9ab7901338ffc0c261ab69, data reload: false

------ Round 1 ----------------------------------
q1	17580	5317	5229	5229
q2	2053	307	163	163
q3	10410	1324	703	703
q4	10217	1033	521	521
q5	7513	2540	2311	2311
q6	195	169	136	136
q7	911	760	621	621
q8	9325	1348	1128	1128
q9	4810	4668	4528	4528
q10	6804	2310	1870	1870
q11	492	286	256	256
q12	353	351	215	215
q13	17769	3685	3048	3048
q14	240	223	211	211
q15	521	470	461	461
q16	617	610	580	580
q17	584	860	339	339
q18	6572	6189	6121	6121
q19	1208	945	551	551
q20	313	311	184	184
q21	2764	2130	1926	1926
q22	356	333	298	298
Total cold run time: 101607 ms
Total hot run time: 31400 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5127	5120	5150	5120
q2	238	330	224	224
q3	2119	2666	2324	2324
q4	1439	1846	1383	1383
q5	4241	4080	4132	4080
q6	215	164	125	125
q7	1861	1798	1652	1652
q8	2556	2529	2505	2505
q9	7441	7160	7147	7147
q10	2962	3190	2867	2867
q11	570	514	485	485
q12	662	746	627	627
q13	3539	3858	3243	3243
q14	268	285	281	281
q15	510	459	456	456
q16	638	683	629	629
q17	1113	1608	1321	1321
q18	7348	7430	7301	7301
q19	790	792	954	792
q20	1979	1998	1847	1847
q21	5241	4838	4866	4838
q22	665	588	534	534
Total cold run time: 51522 ms
Total hot run time: 49781 ms

@doris-robot

TPC-DS: Total hot run time: 183959 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f0f61b83a870e82cee9ab7901338ffc0c261ab69, data reload: false

query1	992	387	368	368
query2	6545	1857	1781	1781
query3	6798	211	209	209
query4	26136	23828	23479	23479
query5	4749	665	493	493
query6	309	213	197	197
query7	4611	516	322	322
query8	301	251	234	234
query9	8640	2765	2740	2740
query10	477	318	262	262
query11	15301	15259	14902	14902
query12	158	119	110	110
query13	1696	532	447	447
query14	9474	6129	6155	6129
query15	206	193	170	170
query16	7671	628	457	457
query17	1160	728	559	559
query18	2004	408	292	292
query19	195	214	163	163
query20	122	126	113	113
query21	215	122	103	103
query22	4226	4315	4228	4228
query23	33858	32850	32895	32850
query24	8289	2366	2399	2366
query25	531	452	394	394
query26	1235	268	153	153
query27	2694	493	364	364
query28	4277	2490	2475	2475
query29	707	564	422	422
query30	230	183	155	155
query31	944	831	755	755
query32	75	62	63	62
query33	574	370	338	338
query34	762	876	541	541
query35	787	819	735	735
query36	974	954	871	871
query37	121	98	76	76
query38	4094	4097	4162	4097
query39	1459	1382	1413	1382
query40	214	116	105	105
query41	53	50	49	49
query42	132	111	104	104
query43	491	506	463	463
query44	1305	828	818	818
query45	172	167	160	160
query46	871	1035	641	641
query47	1744	1798	1716	1716
query48	393	442	332	332
query49	786	509	467	467
query50	686	721	418	418
query51	4159	4110	4160	4110
query52	107	109	98	98
query53	231	251	190	190
query54	494	484	422	422
query55	84	82	83	82
query56	268	276	244	244
query57	1117	1136	1042	1042
query58	259	233	252	233
query59	2676	2750	2523	2523
query60	287	281	269	269
query61	121	116	121	116
query62	794	726	643	643
query63	242	186	188	186
query64	4291	1000	661	661
query65	3162	3159	3140	3140
query66	1059	411	333	333
query67	15728	15706	15310	15310
query68	6803	784	544	544
query69	498	297	314	297
query70	1204	1111	1099	1099
query71	416	296	272	272
query72	5657	3541	3663	3541
query73	714	750	367	367
query74	9262	9089	8899	8899
query75	3148	3161	2723	2723
query76	3295	1170	733	733
query77	469	366	288	288
query78	9914	10230	9283	9283
query79	1892	877	615	615
query80	690	545	453	453
query81	495	271	235	235
query82	195	126	96	96
query83	189	182	152	152
query84	247	99	77	77
query85	789	363	296	296
query86	369	327	296	296
query87	4368	4596	4379	4379
query88	3126	2389	2373	2373
query89	403	329	284	284
query90	1859	195	201	195
query91	137	141	108	108
query92	68	61	58	58
query93	1241	1010	571	571
query94	634	420	305	305
query95	350	264	256	256
query96	522	549	296	296
query97	2774	2853	2734	2734
query98	229	204	207	204
query99	1316	1383	1261	1261
Total cold run time: 268796 ms
Total hot run time: 183959 ms

@doris-robot

ClickBench: Total hot run time: 30.81 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f0f61b83a870e82cee9ab7901338ffc0c261ab69, data reload: false

query1	0.04	0.04	0.04
query2	0.06	0.04	0.03
query3	0.24	0.07	0.06
query4	1.62	0.10	0.10
query5	0.42	0.42	0.39
query6	1.17	0.65	0.65
query7	0.02	0.02	0.01
query8	0.04	0.03	0.03
query9	0.59	0.52	0.53
query10	0.57	0.58	0.56
query11	0.16	0.11	0.11
query12	0.14	0.11	0.12
query13	0.63	0.60	0.61
query14	2.70	2.69	2.70
query15	0.91	0.85	0.85
query16	0.37	0.38	0.37
query17	1.01	1.03	1.02
query18	0.21	0.20	0.19
query19	1.93	1.94	1.80
query20	0.02	0.01	0.01
query21	15.38	0.90	0.54
query22	0.76	1.18	0.62
query23	14.96	1.39	0.63
query24	6.74	1.24	1.48
query25	0.52	0.26	0.09
query26	0.64	0.15	0.14
query27	0.05	0.06	0.05
query28	9.96	0.84	0.41
query29	12.54	3.97	3.29
query30	0.25	0.09	0.06
query31	2.81	0.58	0.38
query32	3.22	0.54	0.46
query33	2.99	3.00	2.98
query34	15.73	5.07	4.51
query35	4.49	4.49	4.52
query36	0.66	0.49	0.48
query37	0.09	0.07	0.06
query38	0.05	0.04	0.03
query39	0.03	0.02	0.02
query40	0.17	0.14	0.12
query41	0.09	0.02	0.03
query42	0.04	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 105.06 s
Total hot run time: 30.81 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 44.51% (11753/26406)
Line Coverage: 34.01% (98281/288960)
Region Coverage: 33.09% (50326/152105)
Branch Coverage: 28.73% (25298/88040)
Coverage Report: http://coverage.selectdb-in.cc/coverage/f0f61b83a870e82cee9ab7901338ffc0c261ab69_f0f61b83a870e82cee9ab7901338ffc0c261ab69/report/index.html

@sollhui sollhui force-pushed the enhance_rl_metrics branch from f0f61b8 to 3e118f0 Compare March 1, 2025 03:05
@sollhui
Contributor Author

sollhui commented Mar 1, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 31815 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3e118f005f84af00b420095e792eee2fcf98fa42, data reload: false

------ Round 1 ----------------------------------
q1	17582	5247	5068	5068
q2	2051	303	168	168
q3	10463	1264	749	749
q4	10212	1028	530	530
q5	7514	2425	2356	2356
q6	196	170	138	138
q7	919	743	594	594
q8	9322	1334	1129	1129
q9	4904	4821	4846	4821
q10	6831	2306	1903	1903
q11	489	301	249	249
q12	345	358	230	230
q13	17774	3693	3102	3102
q14	233	219	221	219
q15	523	464	463	463
q16	624	614	575	575
q17	605	878	337	337
q18	6696	6288	6134	6134
q19	1648	992	579	579
q20	325	328	200	200
q21	2914	2160	1968	1968
q22	362	336	303	303
Total cold run time: 102532 ms
Total hot run time: 31815 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5188	5110	5144	5110
q2	236	330	235	235
q3	2207	2720	2333	2333
q4	1432	1826	1386	1386
q5	4296	4127	4162	4127
q6	206	160	122	122
q7	1885	1823	1757	1757
q8	2609	2578	2522	2522
q9	7288	7194	7147	7147
q10	3038	3215	2817	2817
q11	591	506	478	478
q12	710	781	657	657
q13	3449	3989	3306	3306
q14	274	307	285	285
q15	522	484	450	450
q16	666	695	653	653
q17	1134	1597	1349	1349
q18	7515	7407	7314	7314
q19	803	856	866	856
q20	2003	2039	1838	1838
q21	5475	5059	4667	4667
q22	641	561	558	558
Total cold run time: 52168 ms
Total hot run time: 49967 ms

Contributor

@liaoxin01 liaoxin01 left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Mar 5, 2025
@github-actions
Contributor

github-actions bot commented Mar 5, 2025

PR approved by at least one committer and no changes requested.

@doris-robot

BE UT Coverage Report

Increment line coverage 33.33% (7/21) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 45.98% (12296/26742)
Line Coverage 35.42% (103800/293024)
Region Coverage 34.59% (53169/153713)
Branch Coverage 30.26% (26913/88930)

Contributor

@dataroaring dataroaring left a comment

LGTM

@dataroaring dataroaring merged commit b83fb48 into apache:master Mar 6, 2025
24 of 26 checks passed
github-actions bot pushed a commit that referenced this pull request Mar 6, 2025
…job (#48209)

### What problem does this PR solve?

related #48511

Add more metrics to observe the routine load job:

| Metrics | Module | Description |
| ---------------------------------- | ------ | ------------------------------- |
| routine_load_get_msg_latency | BE | Time to pull a Kafka message |
| routine_load_get_msg_count | BE | Number of times pulling Kafka messages |
| routine_load_consume_bytes | BE | Total data volume consumed from Kafka |
| routine_load_consume_rows | BE | Total number of rows consumed from Kafka |
| routine_load_task_execute_time | FE | Task execution time |
| routine_load_task_execute_count | FE | Task execution count |
| routine_load_get_meta_latency | FE | Delay in obtaining Kafka metadata |
| routine_load_get_meta_count | FE | Number of times obtaining Kafka metadata |
| routine_load_get_meta_fail_count | FE | Number of failures in obtaining metadata |
| routine_load_received_bytes | FE | Total data volume consumed |
| routine_load_received_rows | FE | Total number of rows consumed |
dataroaring pushed a commit that referenced this pull request Mar 10, 2025
…outine load job #48209 (#48764)

Cherry-picked from #48209

Co-authored-by: hui lai <laihui@selectdb.com>
dataroaring pushed a commit that referenced this pull request Mar 17, 2025
…#48963)

### What problem does this PR solve?

Part IV of #48511

doc apache/doris-website#2196

**Introduce a routine load job statistics system table:**
```
mysql> show create table information_schema.routine_load_job\G
*************************** 1. row ***************************
       Table: routine_load_job
Create Table: CREATE TABLE `routine_load_job` (
  `JOB_ID` text NULL,
  `JOB_NAME` text NULL,
  `CREATE_TIME` text NULL,
  `PAUSE_TIME` text NULL,
  `END_TIME` text NULL,
  `DB_NAME` text NULL,
  `TABLE_NAME` text NULL,
  `STATE` text NULL,
  `CURRENT_TASK_NUM` text NULL,
  `JOB_PROPERTIES` text NULL,
  `DATA_SOURCE_PROPERTIES` text NULL,
  `CUSTOM_PROPERTIES` text NULL,
  `STATISTIC` text NULL,
  `PROGRESS` text NULL,
  `LAG` text NULL,
  `REASON_OF_STATE_CHANGED` text NULL,
  `ERROR_LOG_URLS` text NULL,
  `USER_NAME` text NULL,
  `CURRENT_ABORT_TASK_NUM` int NULL,
  `IS_ABNORMAL_PAUSE` boolean NULL
) ENGINE=SCHEMA;
1 row in set (0.00 sec)
```
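Because the table is served over the MySQL protocol, any Python DB-API 2.0 driver can read it. The sketch below is hypothetical: `fetch_jobs` and `task_num` are illustrative helpers, not part of Doris, and the test uses an in-memory SQLite database with an attached schema named `information_schema` only as a stand-in. Since most columns, including `CURRENT_TASK_NUM`, are declared as `text`, the sketch parses that column on the client side.

```python
def fetch_jobs(cursor, where="", params=()):
    """Query the routine load system table and return rows as dicts.

    `cursor` is any DB-API 2.0 cursor (e.g. from a MySQL driver connected
    to the FE). `where` is an optional SQL condition written by the caller;
    pass user-supplied values through `params`, never by string formatting.
    """
    sql = "SELECT * FROM information_schema.routine_load_job"
    if where:
        sql += " WHERE " + where
    if params:
        cursor.execute(sql, params)
    else:
        cursor.execute(sql)
    # cursor.description holds one 7-item sequence per column; item 0 is the name.
    columns = [desc[0] for desc in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def task_num(job):
    """CURRENT_TASK_NUM is declared as text; return it as an int when possible."""
    value = job.get("CURRENT_TASK_NUM")
    try:
        return int(value)
    except (TypeError, ValueError):
        return None
```

Returning dicts keyed by the column names shown in `SHOW CREATE TABLE` keeps callers independent of column order.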

**Exposing job statistics through SQL brings several benefits:**

- Combined with the metrics added in
#48209, it helps roughly locate abnormal
jobs when a Grafana alert fires, for example with the following SQL:

```
SELECT JOB_NAME
FROM information_schema.routine_load_job
WHERE CURRENT_ABORT_TASK_NUM > 0
   OR IS_ABNORMAL_PAUSE = TRUE;
```

- Users can run `select * from information_schema.routine_load_job`
instead of `show routine load`. `show routine load` can only look up
jobs by name, whereas SQL offers much more flexible ways to locate
jobs.
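Both checks can be scripted, for example from an alerting job that runs after a Grafana alert. A minimal sketch, assuming a DB-API 2.0 cursor connected to the FE over the MySQL protocol; the function names are hypothetical, and the test substitutes an in-memory SQLite database for Doris.

```python
# Same predicate as the abnormal-job query above: aborted tasks or an abnormal pause.
ABNORMAL_JOBS_SQL = (
    "SELECT JOB_NAME FROM information_schema.routine_load_job "
    "WHERE CURRENT_ABORT_TASK_NUM > 0 OR IS_ABNORMAL_PAUSE = TRUE"
)


def abnormal_job_names(cursor):
    """Return the names of routine load jobs that look unhealthy."""
    cursor.execute(ABNORMAL_JOBS_SQL)
    return [row[0] for row in cursor.fetchall()]


def jobs_in_state(cursor, db_name, state, placeholder="%s"):
    """Filter jobs by database and state, which name-only lookups cannot express.

    `placeholder` is the driver's parameter marker: "%s" for typical MySQL
    drivers, "?" for sqlite3. Values are still passed as bound parameters.
    """
    sql = (
        "SELECT JOB_NAME, TABLE_NAME, REASON_OF_STATE_CHANGED "
        "FROM information_schema.routine_load_job "
        f"WHERE DB_NAME = {placeholder} AND STATE = {placeholder} "
        "ORDER BY JOB_NAME"
    )
    cursor.execute(sql, (db_name, state))
    return cursor.fetchall()
```

Returning `REASON_OF_STATE_CHANGED` next to the job name lets the alert message say why a job paused without a second query.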
sollhui added a commit to sollhui/doris that referenced this pull request Mar 20, 2025
…apache#48963)

sollhui added a commit to sollhui/doris that referenced this pull request Mar 20, 2025
…apache#48963)

dataroaring pushed a commit that referenced this pull request Mar 25, 2025
…#48963) (#49284)

pick #48963


### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
yiguolei pushed a commit that referenced this pull request Mar 29, 2025
…job (#48209)

yiguolei pushed a commit that referenced this pull request May 5, 2025
…outine load job #48209 (#48765)

Cherry-picked from #48209

Co-authored-by: hui lai <laihui@selectdb.com>
yiguolei pushed a commit that referenced this pull request May 21, 2025
…#48963) (#49286)

koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…job (apache#48209)

koarz pushed a commit to koarz/doris that referenced this pull request Jun 4, 2025
…apache#48963)

cambyzju added a commit that referenced this pull request Jan 16, 2026
### What problem does this PR solve?

Comes from: #48209

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [ ] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
github-actions bot pushed a commit that referenced this pull request Jan 16, 2026

Labels

approved Indicates a PR has been approved by one committer. dev/2.1.10-merged dev/3.0.5-merged reviewed


6 participants