@github-actions
Contributor

@github-actions github-actions bot commented Jul 5, 2025

Cherry-picked from #52818

### What problem does this PR solve?

### Problem

After Java support is enabled, the BE cannot start correctly; it hangs
with the following stack:
```
(gdb) bt
#0  0x00007f5fb1e97ce6 in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f5fb1e9a798 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007f5fb2c98bf3 in os::PlatformEvent::park() () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#3  0x00007f5fb2c693a5 in ObjectMonitor::wait(long, bool, JavaThread*) () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#4  0x00007f5fb2e8b316 in ObjectSynchronizer::wait(Handle, long, JavaThread*) () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#5  0x00007f5fb2934a97 in JVM_MonitorWait () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#6  0x00007f5f9de245ba in ?? ()
#7  0x00007f5f49446158 in ?? ()
#8  0x00007f5faeb9fa00 in ?? ()
#9  0x00007ffc37f91178 in ?? ()
#10 0x00007f5f9de304bd in ?? ()
#11 0x00007ffc37f910a0 in ?? ()
#12 0x0000000000000000 in ?? ()
```

jstack output of the BE:
```
"main" #1 prio=5 os_prio=0 cpu=931.38ms elapsed=66.08s tid=0x00007fab8e12c400 nid=0x3e68aa in Object.wait()  [0x00007ffdb3d4c000]  
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(java.base@17.0.15-ga/Native Method)
    - waiting on <0x00000000bd183b68> (a java.lang.ProcessImpl)
    at java.lang.Object.wait(java.base@17.0.15-ga/Object.java:338)
    at java.lang.ProcessImpl.waitFor(java.base@17.0.15-ga/ProcessImpl.java:434)
    - locked <0x00000000bd183b68> (a java.lang.ProcessImpl)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:1061)
    at org.apache.hadoop.util.Shell.run(Shell.java:957)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1282)
    at org.apache.hadoop.util.Shell.isSetsidSupported(Shell.java:853)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:838)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
    at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1713)
    at org.apache.hadoop.security.SecurityUtil.setConfigurationInternal(SecurityUtil.java:103)
    at org.apache.hadoop.security.SecurityUtil.<clinit>(SecurityUtil.java:92)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:312)
    - locked <0x00000000bd5fd708> (a java.lang.Class for org.apache.hadoop.security.UserGroupInformation)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
    - locked <0x00000000bd5fd708> (a java.lang.Class for org.apache.hadoop.security.UserGroupInformation)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:575)
    at org.apache.hadoop.fs.viewfs.ViewFileSystem.<init>(ViewFileSystem.java:279)
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(java.base@17.0.15-ga/Native Method)
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(java.base@17.0.15-ga/NativeConstructorAccessorImpl.java:77)
    at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(java.base@17.0.15-ga/DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstanceWithCaller(java.base@17.0.15-ga/Constructor.java:500)
    at java.lang.reflect.Constructor.newInstance(java.base@17.0.15-ga/Constructor.java:481)
    at java.util.ServiceLoader$ProviderImpl.newInstance(java.base@17.0.15-ga/ServiceLoader.java:789)
    at java.util.ServiceLoader$ProviderImpl.get(java.base@17.0.15-ga/ServiceLoader.java:729)
    at java.util.ServiceLoader$3.next(java.base@17.0.15-ga/ServiceLoader.java:1403)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:3534)
    - locked <0x00000000826529f0> (a java.lang.Class for org.apache.hadoop.fs.FileSystem)
```
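
The jstack output above shows the main thread parked in `ProcessImpl.waitFor()` while the static initializer of Hadoop's `Shell` class runs an external command. As a rough diagnostic sketch (not part of this PR), the same spawn-and-wait pattern can be exercised with a bounded wait, so that a child process that never gets reaped is reported instead of blocking forever. The class name `SpawnProbe`, its `Result` enum, and the command used in `main` are hypothetical.

```java
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical diagnostic helper; not taken from Doris or Hadoop.
final class SpawnProbe {
    enum Result { EXITED, TIMED_OUT, FAILED_TO_START, INTERRUPTED }

    // Starts the command and waits at most timeoutSeconds for it to exit.
    static Result probe(List<String> command, long timeoutSeconds) {
        Process process;
        try {
            process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
        } catch (IOException e) {
            // The command could not be launched at all.
            return Result.FAILED_TO_START;
        }
        try {
            // Bounded variant of the waitFor() call seen in the stack trace.
            if (process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
                return Result.EXITED;
            }
            process.destroyForcibly();
            return Result.TIMED_OUT;
        } catch (InterruptedException e) {
            // Restore the interrupt flag and clean up the child.
            Thread.currentThread().interrupt();
            process.destroyForcibly();
            return Result.INTERRUPTED;
        }
    }

    public static void main(String[] args) {
        // Assumption: a "true" executable is available on PATH.
        System.out.println(probe(List.of("true"), 5));
    }
}
```

Running such a probe inside the same JVM configuration as the BE would be expected to return `TIMED_OUT` when the hang described here is present, although that has not been verified against this bug.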

This problem originates from: #45287

After this fix, Java support can be enabled:
#52412

### Alternative Fix
Adding the JAVA_OPTS flag `-Djdk.lang.processReaperUseDefaultStackSize=true`
to be.conf also fixes this problem:

https://bugs.openjdk.org/browse/JDK-8153057
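
To check whether the option actually reached the JVM embedded in the BE, one could read the system property from Java. This is a minimal sketch; the class name `ReaperOption` is hypothetical, and the property name is the one quoted above. `Boolean.getBoolean` returns true only when the named system property is set to "true" (case-insensitive).

```java
// Hypothetical helper; not part of Doris.
final class ReaperOption {
    static final String PROPERTY = "jdk.lang.processReaperUseDefaultStackSize";

    // True only if the system property is currently set to "true".
    static boolean isEnabled() {
        return Boolean.getBoolean(PROPERTY);
    }

    // The flag as it would be written in JAVA_OPTS.
    static String asJvmFlag() {
        return "-D" + PROPERTY + "=true";
    }

    public static void main(String[] args) {
        System.out.println(asJvmFlag() + " active: " + isEnabled());
    }
}
```

This only reports the property value; it does not verify the stack size the process reaper thread actually uses, and the flag presumably needs to be present at JVM startup to take effect.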

From Gemini:

![Clipboard_Screenshot_1751683706](https://github.com/user-attachments/assets/a54ffb3a-cb57-4f8a-bc85-5366081c2d9b)
@github-actions github-actions bot requested a review from dataroaring as a code owner July 5, 2025 08:03
@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring closed this Jul 5, 2025
@dataroaring dataroaring reopened this Jul 5, 2025
@hello-stephen
Contributor

run buildall

@doris-robot

TPC-H: Total hot run time: 40007 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 3099cb05386b3214d3cb0bc41cb94b9ae7e64b95, data reload: false

------ Round 1 ----------------------------------
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17590	6848	6682	6682
q2	2083	185	154	154
q3	10587	1133	1135	1133
q4	10304	762	785	762
q5	7766	2764	2794	2764
q6	215	134	133	133
q7	961	600	613	600
q8	9356	1957	2013	1957
q9	6611	6371	6389	6371
q10	7030	2271	2272	2271
q11	456	260	267	260
q12	405	212	208	208
q13	17813	2968	2985	2968
q14	242	204	207	204
q15	506	461	479	461
q16	496	401	386	386
q17	975	632	614	614
q18	7235	6692	6742	6692
q19	1398	1052	1056	1052
q20	478	195	205	195
q21	3866	3198	3195	3195
q22	1068	978	945	945
Total cold run time: 107441 ms
Total hot run time: 40007 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	6715	6631	6597	6597
q2	324	233	227	227
q3	2920	2926	2990	2926
q4	2017	1770	1819	1770
q5	5725	5717	5783	5717
q6	214	128	130	128
q7	2234	1788	1839	1788
q8	3379	3532	3562	3532
q9	8755	8902	8930	8902
q10	3592	3558	3569	3558
q11	598	512	489	489
q12	824	604	619	604
q13	10325	3157	3139	3139
q14	316	269	284	269
q15	529	474	490	474
q16	495	436	444	436
q17	1825	1637	1585	1585
q18	8406	7816	7751	7751
q19	1694	1604	1598	1598
q20	2051	1828	1809	1809
q21	5160	5019	5115	5019
q22	1109	1047	995	995
Total cold run time: 69207 ms
Total hot run time: 59313 ms

@doris-robot

TPC-DS: Total hot run time: 196039 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 3099cb05386b3214d3cb0bc41cb94b9ae7e64b95, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query1	1276	893	932	893
query2	6312	1960	1883	1883
query3	10764	4445	4230	4230
query4	32773	24038	23372	23372
query5	4284	451	444	444
query6	275	175	183	175
query7	4021	330	317	317
query8	285	233	216	216
query9	9394	2557	2549	2549
query10	471	264	249	249
query11	18354	15448	15156	15156
query12	151	103	98	98
query13	1554	428	426	426
query14	9295	7155	7063	7063
query15	250	176	200	176
query16	8052	444	493	444
query17	1603	595	591	591
query18	2223	321	335	321
query19	306	179	180	179
query20	130	111	111	111
query21	205	105	108	105
query22	4897	4453	4460	4453
query23	34666	34451	34213	34213
query24	11394	2866	2943	2866
query25	651	399	386	386
query26	1380	168	168	168
query27	2304	347	347	347
query28	7258	2142	2159	2142
query29	862	443	427	427
query30	251	165	162	162
query31	1050	811	853	811
query32	92	56	57	56
query33	765	300	291	291
query34	944	500	512	500
query35	886	732	732	732
query36	1114	957	922	922
query37	123	72	66	66
query38	4044	3968	3938	3938
query39	1513	1468	1458	1458
query40	197	97	100	97
query41	50	48	52	48
query42	117	100	97	97
query43	516	482	479	479
query44	1245	826	801	801
query45	181	170	164	164
query46	1150	748	717	717
query47	2018	1896	1933	1896
query48	464	379	377	377
query49	989	403	398	398
query50	845	426	414	414
query51	7343	7171	7250	7171
query52	101	94	90	90
query53	275	185	181	181
query54	1126	479	447	447
query55	82	73	78	73
query56	265	244	251	244
query57	1325	1235	1203	1203
query58	225	207	227	207
query59	3141	3029	3064	3029
query60	287	259	254	254
query61	112	107	141	107
query62	854	652	696	652
query63	221	192	189	189
query64	4091	682	638	638
query65	3364	3296	3308	3296
query66	855	307	298	298
query67	16137	15418	15343	15343
query68	4487	568	565	565
query69	429	260	266	260
query70	1129	1087	1069	1069
query71	335	267	245	245
query72	6313	3994	4099	3994
query73	747	349	345	345
query74	10258	8897	8969	8897
query75	3418	2634	2675	2634
query76	2679	1014	1063	1014
query77	390	290	268	268
query78	10457	9502	9530	9502
query79	1677	614	605	605
query80	1160	458	442	442
query81	545	230	222	222
query82	932	90	91	90
query83	242	143	157	143
query84	243	96	82	82
query85	1397	297	288	288
query86	377	309	294	294
query87	4321	4232	4236	4232
query88	3668	2382	2354	2354
query89	407	305	283	283
query90	1915	181	187	181
query91	188	150	144	144
query92	60	47	50	47
query93	1741	553	548	548
query94	860	269	294	269
query95	361	255	256	255
query96	607	289	283	283
query97	3293	3153	3148	3148
query98	215	200	200	200
query99	1498	1292	1290	1290
Total cold run time: 300097 ms
Total hot run time: 196039 ms

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 41.24% (10945/26538)
Line Coverage 32.01% (93656/292607)
Region Coverage 31.12% (48317/155272)
Branch Coverage 27.56% (24742/89764)

@doris-robot

ClickBench: Total hot run time: 30.25 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 3099cb05386b3214d3cb0bc41cb94b9ae7e64b95, data reload: false

query	cold run (s)	hot run 1 (s)	hot run 2 (s)
query1	0.04	0.03	0.02
query2	0.07	0.04	0.03
query3	0.23	0.07	0.07
query4	1.63	0.11	0.11
query5	0.54	0.52	0.54
query6	1.13	0.73	0.72
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.56	0.49	0.51
query10	0.55	0.56	0.56
query11	0.14	0.10	0.10
query12	0.14	0.11	0.11
query13	0.61	0.59	0.60
query14	0.75	0.79	0.78
query15	0.84	0.83	0.82
query16	0.40	0.38	0.36
query17	1.04	1.05	1.04
query18	0.23	0.21	0.21
query19	1.87	1.86	1.80
query20	0.02	0.01	0.01
query21	15.43	0.59	0.58
query22	2.04	1.59	1.74
query23	16.95	0.86	0.76
query24	3.16	1.38	1.40
query25	0.21	0.07	0.21
query26	0.53	0.13	0.14
query27	0.06	0.05	0.04
query28	9.80	0.52	0.45
query29	12.66	3.22	3.21
query30	0.25	0.07	0.06
query31	2.85	0.38	0.37
query32	3.25	0.46	0.46
query33	2.97	3.01	3.02
query34	17.00	4.50	4.46
query35	4.58	4.52	4.52
query36	0.69	0.48	0.47
query37	0.09	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.02	0.02
query40	0.16	0.12	0.12
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 103.76 s
Total hot run time: 30.25 s

Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 7, 2025
@github-actions
Contributor Author

github-actions bot commented Jul 7, 2025

PR approved by at least one committer and no changes requested.

@github-actions
Contributor Author

github-actions bot commented Jul 7, 2025

PR approved by anyone and no changes requested.

@dataroaring dataroaring merged commit 774808f into branch-3.0 Jul 7, 2025
22 of 25 checks passed
@github-actions github-actions bot deleted the auto-pick-52818-branch-3.0 branch July 7, 2025 06:43
Labels

approved Indicates a PR has been approved by one committer. reviewed
