@github-actions github-actions bot commented Jul 5, 2025

Cherry-picked from #52818

### What problem does this PR solve?

### Problem

After Java support is enabled, the BE cannot start correctly. It hangs with the
following stack:
```
(gdb) bt
#0  0x00007f5fb1e97ce6 in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f5fb1e9a798 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007f5fb2c98bf3 in os::PlatformEvent::park() () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#3  0x00007f5fb2c693a5 in ObjectMonitor::wait(long, bool, JavaThread*) () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#4  0x00007f5fb2e8b316 in ObjectSynchronizer::wait(Handle, long, JavaThread*) () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#5  0x00007f5fb2934a97 in JVM_MonitorWait () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#6  0x00007f5f9de245ba in ?? ()
#7  0x00007f5f49446158 in ?? ()
#8  0x00007f5faeb9fa00 in ?? ()
#9  0x00007ffc37f91178 in ?? ()
#10 0x00007f5f9de304bd in ?? ()
#11 0x00007ffc37f910a0 in ?? ()
#12 0x0000000000000000 in ?? ()
```

jstack output of the BE:
```
"main" #1 prio=5 os_prio=0 cpu=931.38ms elapsed=66.08s tid=0x00007fab8e12c400 nid=0x3e68aa in Object.wait()  [0x00007ffdb3d4c000]  
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(java.base@17.0.15-ga/Native Method)
    - waiting on <0x00000000bd183b68> (a java.lang.ProcessImpl)
    at java.lang.Object.wait(java.base@17.0.15-ga/Object.java:338)
    at java.lang.ProcessImpl.waitFor(java.base@17.0.15-ga/ProcessImpl.java:434)
    - locked <0x00000000bd183b68> (a java.lang.ProcessImpl)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:1061)
    at org.apache.hadoop.util.Shell.run(Shell.java:957)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1282)
    at org.apache.hadoop.util.Shell.isSetsidSupported(Shell.java:853)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:838)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
    at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1713)
    at org.apache.hadoop.security.SecurityUtil.setConfigurationInternal(SecurityUtil.java:103)
    at org.apache.hadoop.security.SecurityUtil.<clinit>(SecurityUtil.java:92)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:312)
    - locked <0x00000000bd5fd708> (a java.lang.Class for org.apache.hadoop.security.UserGroupInformation)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
    - locked <0x00000000bd5fd708> (a java.lang.Class for org.apache.hadoop.security.UserGroupInformation)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:575)
    at org.apache.hadoop.fs.viewfs.ViewFileSystem.<init>(ViewFileSystem.java:279)
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(java.base@17.0.15-ga/Native Method)
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(java.base@17.0.15-ga/NativeConstructorAccessorImpl.java:77)
    at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(java.base@17.0.15-ga/DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstanceWithCaller(java.base@17.0.15-ga/Constructor.java:500)
    at java.lang.reflect.Constructor.newInstance(java.base@17.0.15-ga/Constructor.java:481)
    at java.util.ServiceLoader$ProviderImpl.newInstance(java.base@17.0.15-ga/ServiceLoader.java:789)
    at java.util.ServiceLoader$ProviderImpl.get(java.base@17.0.15-ga/ServiceLoader.java:729)
    at java.util.ServiceLoader$3.next(java.base@17.0.15-ga/ServiceLoader.java:1403)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:3534)
    - locked <0x00000000826529f0> (a java.lang.Class for org.apache.hadoop.fs.FileSystem)
```

This problem was introduced by: #45287

With this fix in place, Java support can be enabled:
#52412

### Alternative Fix
Adding `-Djdk.lang.processReaperUseDefaultStackSize=true` to JAVA_OPTS in
be.conf also fixes this problem:

https://bugs.openjdk.org/browse/JDK-8153057
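
A minimal sketch of that be.conf change, for illustration only. It assumes the BE picks up JVM flags from the `JAVA_OPTS` entry named above. Any options already present in your file should be kept alongside the new flag, and if your deployment uses a JDK-version-specific options variable (an assumption, not confirmed in this PR), the flag would need to go there instead.

```
# be.conf (sketch): append the flag to whatever JAVA_OPTS already contains.
# The property makes the JDK process reaper thread use the default thread
# stack size (see JDK-8153057).
JAVA_OPTS="-Djdk.lang.processReaperUseDefaultStackSize=true"
```

The BE must be restarted for the new JVM option to take effect.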

From Gemini:

![Clipboard_Screenshot_1751683706](https://github.com/user-attachments/assets/a54ffb3a-cb57-4f8a-bc85-5366081c2d9b)
@github-actions github-actions bot requested a review from morrySnow as a code owner July 5, 2025 08:04

Thearas commented Jul 5, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring closed this Jul 5, 2025
@dataroaring dataroaring reopened this Jul 5, 2025

Thearas commented Jul 5, 2025

run buildall

@hello-stephen

BE UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 45.13% (12407/27492) |
| Line Coverage | 36.06% (109727/304259) |
| Region Coverage | 35.22% (56986/161818) |
| Branch Coverage | 32.31% (30893/95604) |

@morrySnow

run buildall

@doris-robot

TPC-H: Total hot run time: 39562 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 95c1942151a9d744894fdb1acbd226e35f6ca830, data reload: false

------ Round 1 ----------------------------------
q1	17581	6755	6574	6574
q2	2048	195	182	182
q3	11594	1166	1253	1166
q4	11064	744	710	710
q5	7751	2843	2825	2825
q6	216	134	135	134
q7	993	624	621	621
q8	9371	1987	2019	1987
q9	6682	6342	6351	6342
q10	6997	2256	2267	2256
q11	449	262	248	248
q12	390	217	216	216
q13	17811	2989	2990	2989
q14	243	218	210	210
q15	502	470	469	469
q16	437	371	378	371
q17	971	602	556	556
q18	7270	6593	6794	6593
q19	1342	1018	902	902
q20	497	199	197	197
q21	4112	3038	3043	3038
q22	1101	985	976	976
Total cold run time: 109422 ms
Total hot run time: 39562 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6573	6593	6575	6575
q2	327	236	228	228
q3	3001	2941	3090	2941
q4	2047	1754	1778	1754
q5	5656	5718	5720	5718
q6	209	127	126	126
q7	2227	1780	1830	1780
q8	3287	3480	3554	3480
q9	8877	8886	8882	8882
q10	3555	3554	3527	3527
q11	596	489	499	489
q12	824	582	606	582
q13	11824	3161	3160	3160
q14	294	266	265	265
q15	513	460	461	460
q16	476	435	458	435
q17	1841	1618	1610	1610
q18	8422	7748	7612	7612
q19	1689	1633	1487	1487
q20	2096	1898	1842	1842
q21	5170	4864	4804	4804
q22	1084	976	975	975
Total cold run time: 70588 ms
Total hot run time: 58732 ms

@doris-robot

TPC-DS: Total hot run time: 189795 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 95c1942151a9d744894fdb1acbd226e35f6ca830, data reload: false

query1	978	383	380	380
query2	6518	1918	1870	1870
query3	6708	212	217	212
query4	33919	23359	23394	23359
query5	4325	458	459	458
query6	280	180	183	180
query7	4629	320	327	320
query8	288	238	232	232
query9	9790	2614	2618	2614
query10	493	270	263	263
query11	18021	15167	15607	15167
query12	158	107	102	102
query13	1632	417	424	417
query14	8819	6674	6679	6674
query15	211	182	178	178
query16	7761	464	445	445
query17	1186	575	580	575
query18	1990	303	314	303
query19	216	162	161	161
query20	119	109	109	109
query21	210	105	108	105
query22	4257	4319	4098	4098
query23	34299	33838	33281	33281
query24	12919	2828	2899	2828
query25	721	430	418	418
query26	1743	179	175	175
query27	2991	348	340	340
query28	7705	2140	2131	2131
query29	1111	450	447	447
query30	324	161	171	161
query31	1026	763	816	763
query32	107	61	62	61
query33	809	326	313	313
query34	945	501	516	501
query35	855	739	715	715
query36	1119	956	929	929
query37	141	72	71	71
query38	3966	3842	3882	3842
query39	1478	1459	1436	1436
query40	291	106	108	106
query41	56	54	55	54
query42	117	105	105	105
query43	516	462	477	462
query44	1304	803	796	796
query45	190	172	175	172
query46	1144	741	715	715
query47	1903	1808	1829	1808
query48	429	341	351	341
query49	1269	427	428	427
query50	830	431	424	424
query51	7423	7050	7124	7050
query52	106	97	104	97
query53	270	192	193	192
query54	1227	485	495	485
query55	89	79	80	79
query56	276	265	255	255
query57	1302	1145	1164	1145
query58	253	232	220	220
query59	3143	2829	2844	2829
query60	300	278	276	276
query61	183	136	144	136
query62	899	689	696	689
query63	229	197	196	196
query64	5316	690	642	642
query65	3276	3205	3230	3205
query66	1442	316	316	316
query67	15764	15480	15576	15480
query68	4488	594	592	592
query69	416	286	271	271
query70	1154	1098	1097	1097
query71	341	276	264	264
query72	6403	4131	4019	4019
query73	769	357	368	357
query74	10199	9267	8953	8953
query75	3403	2618	2655	2618
query76	2735	1045	1216	1045
query77	417	283	277	277
query78	10410	9741	9541	9541
query79	1690	609	602	602
query80	1163	449	441	441
query81	532	222	235	222
query82	965	93	92	92
query83	224	147	146	146
query84	246	82	89	82
query85	1286	312	312	312
query86	368	300	297	297
query87	4422	4236	4233	4233
query88	3463	2403	2378	2378
query89	403	309	294	294
query90	1977	193	193	193
query91	148	114	109	109
query92	63	53	54	53
query93	1188	577	557	557
query94	967	294	300	294
query95	366	261	265	261
query96	610	280	287	280
query97	3332	3109	3144	3109
query98	217	219	193	193
query99	1533	1277	1307	1277
Total cold run time: 300212 ms
Total hot run time: 189795 ms

@doris-robot

ClickBench: Total hot run time: 29.97 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 95c1942151a9d744894fdb1acbd226e35f6ca830, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.03	0.04
query3	0.24	0.06	0.06
query4	1.63	0.10	0.10
query5	0.52	0.52	0.54
query6	1.13	0.73	0.72
query7	0.02	0.02	0.02
query8	0.03	0.03	0.03
query9	0.57	0.49	0.49
query10	0.55	0.55	0.56
query11	0.14	0.11	0.11
query12	0.14	0.11	0.11
query13	0.61	0.60	0.59
query14	0.78	0.80	0.80
query15	0.83	0.84	0.82
query16	0.37	0.37	0.37
query17	1.06	1.07	1.05
query18	0.22	0.21	0.21
query19	1.88	1.72	1.74
query20	0.01	0.02	0.01
query21	15.43	0.59	0.56
query22	2.13	2.65	1.71
query23	16.91	0.91	0.96
query24	3.04	0.91	1.26
query25	0.28	0.07	0.11
query26	0.46	0.13	0.14
query27	0.04	0.04	0.05
query28	10.38	0.47	0.45
query29	12.61	3.20	3.18
query30	0.25	0.06	0.06
query31	2.86	0.39	0.39
query32	3.23	0.47	0.45
query33	2.99	3.00	3.03
query34	16.87	4.56	4.51
query35	4.57	4.48	4.48
query36	0.66	0.48	0.47
query37	0.09	0.06	0.06
query38	0.05	0.03	0.04
query39	0.04	0.02	0.03
query40	0.16	0.12	0.13
query41	0.08	0.03	0.02
query42	0.04	0.02	0.02
query43	0.03	0.04	0.04
Total cold run time: 104.05 s
Total hot run time: 29.97 s

@doris-robot

BE UT Coverage Report

Increment line coverage 0.00% (0/1) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
|---|---|
| Function Coverage | 45.23% (12462/27555) |
| Line Coverage | 36.18% (110426/305223) |
| Region Coverage | 35.33% (57295/162151) |
| Branch Coverage | 32.41% (31062/95830) |

@morrySnow morrySnow merged commit 300105b into branch-3.1 Jul 8, 2025
20 of 22 checks passed
@github-actions github-actions bot deleted the auto-pick-52818-branch-3.1 branch July 8, 2025 06:18