@github-actions
Contributor

Cherry-picked from #48171

…etrics (#48171)

### What problem does this PR solve?

related #48511

Introduce some metrics so that abnormal routine load jobs can be
monitored.

**metrics:**

1. Building on the existing job state metric, add two states: `USER_PAUSED` and
`ABNORMAL_PAUSED`
```
{
        "tags":
        {
                "metric":"doris_fe_job",
                "job":"load",
                "type":"ROUTINE_LOAD",
                "state":"ABNORMAL_PAUSED"
        },
        "unit":"nounit",
        "value":1
},

{
        "tags":
        {
                "metric":"doris_fe_job",
                "job":"load",
                "type":"ROUTINE_LOAD",
                "state":"USER_PAUSED"
        },
        "unit":"nounit",
        "value":1
},
```
2. Sum of all progress of the routine load job
```
doris_fe_routine_load_progress
```
3. Sum of all lags for the routine load job
```
doris_fe_routine_load_lag
```
4. Total number of aborted tasks for the routine load job
```
doris_fe_routine_load_abort_task_num
```
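
A minimal sketch of how a monitoring script might consume the job-state entries shown above, assuming the metrics are available as a JSON array of objects with the same `tags`/`unit`/`value` shape. The sample payload, its values, the tag shape of the `doris_fe_routine_load_lag` entry, and the helper names `routine_load_paused_counts` and `should_alert` are illustrative assumptions, not part of this PR.

```python
import json

# Hypothetical sample shaped like the JSON entries in the PR description.
# The values are invented for illustration only.
SAMPLE = """
[
  {"tags": {"metric": "doris_fe_job", "job": "load",
            "type": "ROUTINE_LOAD", "state": "ABNORMAL_PAUSED"},
   "unit": "nounit", "value": 1},
  {"tags": {"metric": "doris_fe_job", "job": "load",
            "type": "ROUTINE_LOAD", "state": "USER_PAUSED"},
   "unit": "nounit", "value": 2},
  {"tags": {"metric": "doris_fe_routine_load_lag"},
   "unit": "nounit", "value": 120}
]
"""


def routine_load_paused_counts(entries):
    """Return {state: count} for the two paused states of ROUTINE_LOAD jobs."""
    counts = {"USER_PAUSED": 0, "ABNORMAL_PAUSED": 0}
    for entry in entries:
        tags = entry.get("tags", {})
        if (tags.get("metric") == "doris_fe_job"
                and tags.get("type") == "ROUTINE_LOAD"
                and tags.get("state") in counts):
            counts[tags["state"]] += entry.get("value", 0)
    return counts


def should_alert(entries, threshold=0):
    """Alert when more than `threshold` jobs are abnormally paused.

    Jobs paused by a user are deliberately ignored here.
    """
    return routine_load_paused_counts(entries)["ABNORMAL_PAUSED"] > threshold


if __name__ == "__main__":
    entries = json.loads(SAMPLE)
    print(routine_load_paused_counts(entries))
    print("alert:", should_alert(entries))
```

Separating `USER_PAUSED` from `ABNORMAL_PAUSED` is what makes such a check useful: an alert can fire only on jobs that paused on their own, without noise from jobs an operator paused on purpose.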
@github-actions github-actions bot requested a review from dataroaring as a code owner March 13, 2025 12:01
@Thearas
Contributor

Thearas commented Mar 13, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Mar 13, 2025
@Thearas
Contributor

Thearas commented Mar 13, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 40410 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c76cf514404af527d4c3c7b4e882a17d89ce4c42, data reload: false

------ Round 1 ----------------------------------
q1	17562	6772	6658	6658
q2	2044	172	188	172
q3	10645	1088	1168	1088
q4	10557	718	736	718
q5	7749	2825	2828	2825
q6	223	137	136	136
q7	986	624	622	622
q8	9346	1904	2041	1904
q9	6605	6418	6410	6410
q10	7056	2277	2326	2277
q11	464	258	268	258
q12	404	222	219	219
q13	18177	3139	3137	3137
q14	255	227	223	223
q15	503	471	461	461
q16	680	583	588	583
q17	987	610	586	586
q18	7352	6707	6723	6707
q19	1394	1038	1055	1038
q20	480	208	201	201
q21	4078	3307	3208	3208
q22	1062	1010	979	979
Total cold run time: 108609 ms
Total hot run time: 40410 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6667	6634	6766	6634
q2	358	236	233	233
q3	2880	2785	2916	2785
q4	2005	1763	1800	1763
q5	5796	5749	5713	5713
q6	207	135	136	135
q7	2235	1830	1855	1830
q8	3382	3513	3488	3488
q9	8868	8824	8928	8824
q10	3531	3481	3510	3481
q11	592	495	502	495
q12	835	568	576	568
q13	8874	3163	3202	3163
q14	309	271	287	271
q15	503	476	460	460
q16	708	673	647	647
q17	1837	1612	1625	1612
q18	8317	7797	7762	7762
q19	1703	1584	1555	1555
q20	2072	1807	1801	1801
q21	5482	5344	5237	5237
q22	1123	1016	1058	1016
Total cold run time: 68284 ms
Total hot run time: 59473 ms

@doris-robot

TPC-DS: Total hot run time: 197033 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c76cf514404af527d4c3c7b4e882a17d89ce4c42, data reload: false

query1	1306	892	884	884
query2	6273	2068	2031	2031
query3	10834	4429	4454	4429
query4	61604	28679	23387	23387
query5	5242	454	434	434
query6	397	178	172	172
query7	5486	314	309	309
query8	299	212	213	212
query9	8426	2642	2657	2642
query10	458	287	257	257
query11	17669	15130	15634	15130
query12	161	103	103	103
query13	1482	456	439	439
query14	10719	7296	6680	6680
query15	212	185	180	180
query16	7176	452	507	452
query17	1146	562	582	562
query18	1868	308	302	302
query19	225	148	156	148
query20	112	105	107	105
query21	211	102	100	100
query22	4674	4459	4525	4459
query23	34693	33910	33830	33830
query24	6118	2916	2908	2908
query25	522	408	431	408
query26	660	174	172	172
query27	1794	364	376	364
query28	4206	2520	2481	2481
query29	698	486	465	465
query30	243	168	166	166
query31	1003	816	842	816
query32	69	70	57	57
query33	484	287	303	287
query34	914	514	514	514
query35	865	751	730	730
query36	1072	962	953	953
query37	126	73	71	71
query38	4056	4009	4051	4009
query39	1539	1468	1456	1456
query40	207	110	107	107
query41	53	52	50	50
query42	122	105	106	105
query43	540	493	501	493
query44	1244	859	866	859
query45	197	179	169	169
query46	1158	740	733	733
query47	2036	1938	1958	1938
query48	485	385	407	385
query49	738	419	407	407
query50	848	439	431	431
query51	7375	7214	7455	7214
query52	110	91	95	91
query53	288	191	188	188
query54	573	471	477	471
query55	81	78	76	76
query56	268	238	247	238
query57	1315	1157	1146	1146
query58	218	201	207	201
query59	3451	3143	2849	2849
query60	285	255	245	245
query61	109	109	106	106
query62	761	651	663	651
query63	224	189	188	188
query64	1430	692	641	641
query65	3246	3212	3199	3199
query66	720	295	300	295
query67	15758	15561	15583	15561
query68	4307	595	577	577
query69	435	273	264	264
query70	1191	1116	1128	1116
query71	345	275	267	267
query72	6380	4056	4063	4056
query73	752	349	354	349
query74	10335	9241	9279	9241
query75	3366	2669	2657	2657
query76	1920	989	1104	989
query77	529	274	273	273
query78	10514	9594	9595	9594
query79	2025	597	600	597
query80	1428	426	423	423
query81	530	236	236	236
query82	1243	89	87	87
query83	268	140	142	140
query84	289	76	79	76
query85	1029	321	300	300
query86	388	299	290	290
query87	4479	4257	4292	4257
query88	3686	2423	2403	2403
query89	428	300	292	292
query90	1987	185	188	185
query91	187	147	145	145
query92	59	50	52	50
query93	3143	561	560	560
query94	789	292	299	292
query95	368	284	260	260
query96	633	281	284	281
query97	3329	3150	3175	3150
query98	215	216	201	201
query99	1566	1316	1284	1284
Total cold run time: 317025 ms
Total hot run time: 197033 ms

@doris-robot

ClickBench: Total hot run time: 32.35 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit c76cf514404af527d4c3c7b4e882a17d89ce4c42, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.03	0.03
query3	0.23	0.07	0.07
query4	1.62	0.11	0.11
query5	0.52	0.52	0.51
query6	1.14	0.73	0.73
query7	0.02	0.01	0.01
query8	0.04	0.04	0.03
query9	0.57	0.49	0.50
query10	0.55	0.54	0.56
query11	0.15	0.10	0.11
query12	0.15	0.11	0.11
query13	0.61	0.60	0.58
query14	2.74	2.74	2.79
query15	0.88	0.83	0.82
query16	0.38	0.39	0.39
query17	1.06	1.04	1.02
query18	0.23	0.22	0.20
query19	1.97	1.86	1.99
query20	0.01	0.01	0.01
query21	15.36	0.60	0.57
query22	2.35	1.82	1.82
query23	17.34	0.91	0.87
query24	3.23	1.25	1.04
query25	0.20	0.28	0.10
query26	0.38	0.14	0.13
query27	0.04	0.04	0.04
query28	10.11	0.53	0.54
query29	12.56	3.17	3.20
query30	0.24	0.06	0.06
query31	2.87	0.40	0.37
query32	3.26	0.46	0.46
query33	2.94	3.09	3.04
query34	17.07	4.52	4.48
query35	4.48	4.51	4.56
query36	0.67	0.50	0.48
query37	0.10	0.06	0.06
query38	0.05	0.03	0.04
query39	0.04	0.02	0.02
query40	0.17	0.13	0.12
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.59 s
Total hot run time: 32.35 s

Contributor

@dataroaring dataroaring left a comment

LGTM

@dataroaring dataroaring merged commit 6344bda into branch-3.0 Mar 14, 2025
23 of 24 checks passed
@github-actions github-actions bot deleted the auto-pick-48171-branch-3.0 branch March 14, 2025 01:54

5 participants