[fix](doris compose) be disable java support #52412
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
run buildall
deardeng
left a comment
LGTM
PR approved by anyone and no changes requested.
run p0
3 similar comments
Force-pushed from d06d502 to 58d6c38.
run buildall
run p0
PR approved by at least one committer and no changes requested.
### What problem does this PR solve?

BE sometimes hangs in libjvm at startup, and we don't yet know why, so this PR disables BE Java support. We will re-enable Java support only after the libjvm hang is fixed.

```
Thread 1 (LWP 867 "doris_be"):
#0  0x00007fc6b09087b2 in __pthread_cond_signal_2_0 (cond=0x0) at old_pthread_cond_signal.c:40
#1  0x00007fffab25a2f0 in ?? ()
#2  0x00000000ab25a2f0 in ?? ()
#3  0x00007fc6b1a9436e in WeakHandle::WeakHandle(OopStorage*, oopDesc*) () from /usr/local/openjdk-17/lib/server/libjvm.so
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
[Inferior 1 (process 867) detached]
```
…apache#52524) Cherry-picked from apache#52412 Co-authored-by: yujun <yujun@selectdb.com>
### What problem does this PR solve?

### Problem

After Java support is enabled, BE cannot start correctly; it hangs with the following stack:

```
(gdb) bt
#0  0x00007f5fb1e97ce6 in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f5fb1e9a798 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007f5fb2c98bf3 in os::PlatformEvent::park() () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#3  0x00007f5fb2c693a5 in ObjectMonitor::wait(long, bool, JavaThread*) () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#4  0x00007f5fb2e8b316 in ObjectSynchronizer::wait(Handle, long, JavaThread*) () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#5  0x00007f5fb2934a97 in JVM_MonitorWait () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#6  0x00007f5f9de245ba in ?? ()
#7  0x00007f5f49446158 in ?? ()
#8  0x00007f5faeb9fa00 in ?? ()
#9  0x00007ffc37f91178 in ?? ()
#10 0x00007f5f9de304bd in ?? ()
#11 0x00007ffc37f910a0 in ?? ()
#12 0x0000000000000000 in ?? ()
```

jstack of BE:

```
"main" #1 prio=5 os_prio=0 cpu=931.38ms elapsed=66.08s tid=0x00007fab8e12c400 nid=0x3e68aa in Object.wait() [0x00007ffdb3d4c000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(java.base@17.0.15-ga/Native Method)
    - waiting on <0x00000000bd183b68> (a java.lang.ProcessImpl)
    at java.lang.Object.wait(java.base@17.0.15-ga/Object.java:338)
    at java.lang.ProcessImpl.waitFor(java.base@17.0.15-ga/ProcessImpl.java:434)
    - locked <0x00000000bd183b68> (a java.lang.ProcessImpl)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:1061)
    at org.apache.hadoop.util.Shell.run(Shell.java:957)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1282)
    at org.apache.hadoop.util.Shell.isSetsidSupported(Shell.java:853)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:838)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
    at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1713)
    at org.apache.hadoop.security.SecurityUtil.setConfigurationInternal(SecurityUtil.java:103)
    at org.apache.hadoop.security.SecurityUtil.<clinit>(SecurityUtil.java:92)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:312)
    - locked <0x00000000bd5fd708> (a java.lang.Class for org.apache.hadoop.security.UserGroupInformation)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
    - locked <0x00000000bd5fd708> (a java.lang.Class for org.apache.hadoop.security.UserGroupInformation)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:575)
    at org.apache.hadoop.fs.viewfs.ViewFileSystem.<init>(ViewFileSystem.java:279)
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(java.base@17.0.15-ga/Native Method)
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(java.base@17.0.15-ga/NativeConstructorAccessorImpl.java:77)
    at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(java.base@17.0.15-ga/DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstanceWithCaller(java.base@17.0.15-ga/Constructor.java:500)
    at java.lang.reflect.Constructor.newInstance(java.base@17.0.15-ga/Constructor.java:481)
    at java.util.ServiceLoader$ProviderImpl.newInstance(java.base@17.0.15-ga/ServiceLoader.java:789)
    at java.util.ServiceLoader$ProviderImpl.get(java.base@17.0.15-ga/ServiceLoader.java:729)
    at java.util.ServiceLoader$3.next(java.base@17.0.15-ga/ServiceLoader.java:1403)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:3534)
    - locked <0x00000000826529f0> (a java.lang.Class for org.apache.hadoop.fs.FileSystem)
```

This problem comes from #45287. With this fix, Java support can be enabled again (see #52412).

### Another Fix Method

Adding `-Djdk.lang.processReaperUseDefaultStackSize=true` to JAVA_OPTS in be.conf also fixes this problem: https://bugs.openjdk.org/browse/JDK-8153057

From gemini:
What problem does this PR solve?
BE sometimes hangs in libjvm at startup, and we don't yet know why.
So this PR disables BE Java support. We will re-enable it only after the libjvm hang is fixed.
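As a rough illustration of the two options discussed on this page, the sketch below shows how a compose-style tool might produce the BE settings. It is not the code changed in this PR. The option name `enable_java_support` is an assumption (the description only says Java support is disabled and does not name the setting), and the helper names `be_conf_overrides`, `render_conf`, and `append_reaper_flag` are hypothetical. The JVM flag itself is the one quoted in the follow-up commit message above.

```python
from typing import Dict, List

# Assumption: the BE option that toggles JVM loading is named
# `enable_java_support`; the PR text does not name the setting.
JAVA_SUPPORT_KEY = "enable_java_support"

# Flag quoted in the follow-up commit message (see JDK-8153057).
REAPER_FLAG = "-Djdk.lang.processReaperUseDefaultStackSize=true"


def be_conf_overrides(disable_java: bool = True) -> Dict[str, str]:
    """Hypothetical helper: BE config overrides for a compose-managed node."""
    return {JAVA_SUPPORT_KEY: "false" if disable_java else "true"}


def render_conf(overrides: Dict[str, str]) -> List[str]:
    """Render overrides as `key = value` lines to append to be.conf."""
    return [f"{key} = {value}" for key, value in overrides.items()]


def append_reaper_flag(java_opts: str) -> str:
    """Add the process-reaper flag to an existing JAVA_OPTS value, once."""
    opts = java_opts.split()
    if REAPER_FLAG not in opts:
        opts.append(REAPER_FLAG)
    return " ".join(opts)


if __name__ == "__main__":
    # Workaround from this PR: keep Java support off in compose clusters.
    print("\n".join(render_conf(be_conf_overrides())))
    # Alternative from the follow-up commit: keep Java on, add the JVM flag.
    print(append_reaper_flag("-Xmx1024m"))
```

Appending the flag rather than replacing the whole JAVA_OPTS value keeps any options already configured for the BE, and the membership check makes the helper safe to call more than once.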
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)