Skip to content

Conversation

@yujun777
Copy link
Contributor

What problem does this PR solve?

be sometimes hang at libjvm when it start, we don't known why.

so disable be java support. only after we fix the be hang at libjvm, we enable java support again.

Thread 1 (LWP 867 "doris_be"):
#0  0x00007fc6b09087b2 in __pthread_cond_signal_2_0 (cond=0x0) at old_pthread_cond_signal.c:40
#1  0x00007fffab25a2f0 in ?? ()
#2  0x00000000ab25a2f0 in ?? ()
#3  0x00007fc6b1a9436e in WeakHandle::WeakHandle(OopStorage*, oopDesc*) () from /usr/local/openjdk-17/lib/server/libjvm.so
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
[Inferior 1 (process 867) detached]

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Copy link
Contributor

Thearas commented Jun 27, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@yujun777
Copy link
Contributor Author

run buildall

Copy link
Contributor

@deardeng deardeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions
Copy link
Contributor

PR approved by anyone and no changes requested.

@yujun777 yujun777 changed the title [fix](doris compose) disable java support [fix](doris compose) be disable java support Jun 27, 2025
@yujun777
Copy link
Contributor Author

run p0

3 similar comments
@yujun777
Copy link
Contributor Author

run p0

@yujun777
Copy link
Contributor Author

run p0

@yujun777
Copy link
Contributor Author

run p0

@yujun777
Copy link
Contributor Author

run buildall

@yujun777
Copy link
Contributor Author

run p0

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jun 30, 2025
@github-actions
Copy link
Contributor

PR approved by at least one committer and no changes requested.

@englefly englefly merged commit 49b8301 into apache:master Jun 30, 2025
26 of 27 checks passed
github-actions bot pushed a commit that referenced this pull request Jun 30, 2025
### What problem does this PR solve?

be sometimes hang at libjvm when it start,  we don't known why.

so disable be java support. only after we fix the be hang at libjvm, we
enable java support again.

```
Thread 1 (LWP 867 "doris_be"):
#0  0x00007fc6b09087b2 in __pthread_cond_signal_2_0 (cond=0x0) at old_pthread_cond_signal.c:40
#1  0x00007fffab25a2f0 in ?? ()
#2  0x00000000ab25a2f0 in ?? ()
#3  0x00007fc6b1a9436e in WeakHandle::WeakHandle(OopStorage*, oopDesc*) () from /usr/local/openjdk-17/lib/server/libjvm.so
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
[Inferior 1 (process 867) detached]
```
github-actions bot pushed a commit that referenced this pull request Jun 30, 2025
### What problem does this PR solve?

be sometimes hang at libjvm when it start,  we don't known why.

so disable be java support. only after we fix the be hang at libjvm, we
enable java support again.

```
Thread 1 (LWP 867 "doris_be"):
#0  0x00007fc6b09087b2 in __pthread_cond_signal_2_0 (cond=0x0) at old_pthread_cond_signal.c:40
#1  0x00007fffab25a2f0 in ?? ()
#2  0x00000000ab25a2f0 in ?? ()
#3  0x00007fc6b1a9436e in WeakHandle::WeakHandle(OopStorage*, oopDesc*) () from /usr/local/openjdk-17/lib/server/libjvm.so
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
[Inferior 1 (process 867) detached]
```
morrySnow pushed a commit that referenced this pull request Jul 1, 2025
Cherry-picked from #52412

Co-authored-by: yujun <yujun@selectdb.com>
dataroaring pushed a commit that referenced this pull request Jul 2, 2025
Cherry-picked from #52412

Co-authored-by: yujun <yujun@selectdb.com>
koarz pushed a commit to koarz/doris that referenced this pull request Jul 3, 2025
koarz pushed a commit to koarz/doris that referenced this pull request Jul 4, 2025
### What problem does this PR solve?

be sometimes hang at libjvm when it start,  we don't known why.

so disable be java support. only after we fix the be hang at libjvm, we
enable java support again.

```
Thread 1 (LWP 867 "doris_be"):
#0  0x00007fc6b09087b2 in __pthread_cond_signal_2_0 (cond=0x0) at old_pthread_cond_signal.c:40
apache#1  0x00007fffab25a2f0 in ?? ()
apache#2  0x00000000ab25a2f0 in ?? ()
apache#3  0x00007fc6b1a9436e in WeakHandle::WeakHandle(OopStorage*, oopDesc*) () from /usr/local/openjdk-17/lib/server/libjvm.so
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
[Inferior 1 (process 867) detached]
```
koarz pushed a commit to koarz/doris that referenced this pull request Jul 4, 2025
### What problem does this PR solve?

be sometimes hang at libjvm when it start,  we don't known why.

so disable be java support. only after we fix the be hang at libjvm, we
enable java support again.

```
Thread 1 (LWP 867 "doris_be"):
#0  0x00007fc6b09087b2 in __pthread_cond_signal_2_0 (cond=0x0) at old_pthread_cond_signal.c:40
apache#1  0x00007fffab25a2f0 in ?? ()
apache#2  0x00000000ab25a2f0 in ?? ()
apache#3  0x00007fc6b1a9436e in WeakHandle::WeakHandle(OopStorage*, oopDesc*) () from /usr/local/openjdk-17/lib/server/libjvm.so
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
[Inferior 1 (process 867) detached]
```
koarz pushed a commit to koarz/doris that referenced this pull request Jul 4, 2025
### What problem does this PR solve?

be sometimes hang at libjvm when it start,  we don't known why.

so disable be java support. only after we fix the be hang at libjvm, we
enable java support again.

```
Thread 1 (LWP 867 "doris_be"):
#0  0x00007fc6b09087b2 in __pthread_cond_signal_2_0 (cond=0x0) at old_pthread_cond_signal.c:40
apache#1  0x00007fffab25a2f0 in ?? ()
apache#2  0x00000000ab25a2f0 in ?? ()
apache#3  0x00007fc6b1a9436e in WeakHandle::WeakHandle(OopStorage*, oopDesc*) () from /usr/local/openjdk-17/lib/server/libjvm.so
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
[Inferior 1 (process 867) detached]
```
yiguolei pushed a commit that referenced this pull request Jul 5, 2025
### What problem does this PR solve?

### Problem

after enable java support, be can not start correctly, it will hang on
stack:
```
(gdb) bt
#0  0x00007f5fb1e97ce6 in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f5fb1e9a798 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007f5fb2c98bf3 in os::PlatformEvent::park() () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#3  0x00007f5fb2c693a5 in ObjectMonitor::wait(long, bool, JavaThread*) () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#4  0x00007f5fb2e8b316 in ObjectSynchronizer::wait(Handle, long, JavaThread*) () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#5  0x00007f5fb2934a97 in JVM_MonitorWait () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#6  0x00007f5f9de245ba in ?? ()
#7  0x00007f5f49446158 in ?? ()
#8  0x00007f5faeb9fa00 in ?? ()
#9  0x00007ffc37f91178 in ?? ()
#10 0x00007f5f9de304bd in ?? ()
#11 0x00007ffc37f910a0 in ?? ()
#12 0x0000000000000000 in ?? ()
```

jstack of be:
```
"main" #1 prio=5 os_prio=0 cpu=931.38ms elapsed=66.08s tid=0x00007fab8e12c400 nid=0x3e68aa in Object.wait()  [0x00007ffdb3d4c000]  
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(java.base@17.0.15-ga/Native Method)
    - waiting on <0x00000000bd183b68> (a java.lang.ProcessImpl)
    at java.lang.Object.wait(java.base@17.0.15-ga/Object.java:338)
    at java.lang.ProcessImpl.waitFor(java.base@17.0.15-ga/ProcessImpl.java:434)
    - locked <0x00000000bd183b68> (a java.lang.ProcessImpl)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:1061)
    at org.apache.hadoop.util.Shell.run(Shell.java:957)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1282)
    at org.apache.hadoop.util.Shell.isSetsidSupported(Shell.java:853)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:838)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
    at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1713)
    at org.apache.hadoop.security.SecurityUtil.setConfigurationInternal(SecurityUtil.java:103)
    at org.apache.hadoop.security.SecurityUtil.<clinit>(SecurityUtil.java:92)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:312)
    - locked <0x00000000bd5fd708> (a java.lang.Class for org.apache.hadoop.security.UserGroupInformation)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
    - locked <0x00000000bd5fd708> (a java.lang.Class for org.apache.hadoop.security.UserGroupInformation)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:575)
    at org.apache.hadoop.fs.viewfs.ViewFileSystem.<init>(ViewFileSystem.java:279)
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(java.base@17.0.15-ga/Native Method)
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(java.base@17.0.15-ga/NativeConstructorAccessorImpl.java:77)
    at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(java.base@17.0.15-ga/DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstanceWithCaller(java.base@17.0.15-ga/Constructor.java:500)
    at java.lang.reflect.Constructor.newInstance(java.base@17.0.15-ga/Constructor.java:481)
    at java.util.ServiceLoader$ProviderImpl.newInstance(java.base@17.0.15-ga/ServiceLoader.java:789)
    at java.util.ServiceLoader$ProviderImpl.get(java.base@17.0.15-ga/ServiceLoader.java:729)
    at java.util.ServiceLoader$3.next(java.base@17.0.15-ga/ServiceLoader.java:1403)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:3534)
    - locked <0x00000000826529f0> (a java.lang.Class for org.apache.hadoop.fs.FileSystem)
```

This problem comes from: #45287

after this fix, we could enable java support:
#52412

### Another Fix Method
Add JAVA_OPTS `-Djdk.lang.processReaperUseDefaultStackSize=true` inside
be.conf, also can fix this problem:

https://bugs.openjdk.org/browse/JDK-8153057

From gemini:

![Clipboard_Screenshot_1751683706](https://github.com/user-attachments/assets/a54ffb3a-cb57-4f8a-bc85-5366081c2d9b)
github-actions bot pushed a commit that referenced this pull request Jul 5, 2025
### What problem does this PR solve?

### Problem

after enable java support, be can not start correctly, it will hang on
stack:
```
(gdb) bt
#0  0x00007f5fb1e97ce6 in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f5fb1e9a798 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007f5fb2c98bf3 in os::PlatformEvent::park() () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#3  0x00007f5fb2c693a5 in ObjectMonitor::wait(long, bool, JavaThread*) () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#4  0x00007f5fb2e8b316 in ObjectSynchronizer::wait(Handle, long, JavaThread*) () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#5  0x00007f5fb2934a97 in JVM_MonitorWait () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#6  0x00007f5f9de245ba in ?? ()
#7  0x00007f5f49446158 in ?? ()
#8  0x00007f5faeb9fa00 in ?? ()
#9  0x00007ffc37f91178 in ?? ()
#10 0x00007f5f9de304bd in ?? ()
#11 0x00007ffc37f910a0 in ?? ()
#12 0x0000000000000000 in ?? ()
```

jstack of be:
```
"main" #1 prio=5 os_prio=0 cpu=931.38ms elapsed=66.08s tid=0x00007fab8e12c400 nid=0x3e68aa in Object.wait()  [0x00007ffdb3d4c000]  
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(java.base@17.0.15-ga/Native Method)
    - waiting on <0x00000000bd183b68> (a java.lang.ProcessImpl)
    at java.lang.Object.wait(java.base@17.0.15-ga/Object.java:338)
    at java.lang.ProcessImpl.waitFor(java.base@17.0.15-ga/ProcessImpl.java:434)
    - locked <0x00000000bd183b68> (a java.lang.ProcessImpl)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:1061)
    at org.apache.hadoop.util.Shell.run(Shell.java:957)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1282)
    at org.apache.hadoop.util.Shell.isSetsidSupported(Shell.java:853)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:838)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
    at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1713)
    at org.apache.hadoop.security.SecurityUtil.setConfigurationInternal(SecurityUtil.java:103)
    at org.apache.hadoop.security.SecurityUtil.<clinit>(SecurityUtil.java:92)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:312)
    - locked <0x00000000bd5fd708> (a java.lang.Class for org.apache.hadoop.security.UserGroupInformation)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
    - locked <0x00000000bd5fd708> (a java.lang.Class for org.apache.hadoop.security.UserGroupInformation)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:575)
    at org.apache.hadoop.fs.viewfs.ViewFileSystem.<init>(ViewFileSystem.java:279)
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(java.base@17.0.15-ga/Native Method)
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(java.base@17.0.15-ga/NativeConstructorAccessorImpl.java:77)
    at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(java.base@17.0.15-ga/DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstanceWithCaller(java.base@17.0.15-ga/Constructor.java:500)
    at java.lang.reflect.Constructor.newInstance(java.base@17.0.15-ga/Constructor.java:481)
    at java.util.ServiceLoader$ProviderImpl.newInstance(java.base@17.0.15-ga/ServiceLoader.java:789)
    at java.util.ServiceLoader$ProviderImpl.get(java.base@17.0.15-ga/ServiceLoader.java:729)
    at java.util.ServiceLoader$3.next(java.base@17.0.15-ga/ServiceLoader.java:1403)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:3534)
    - locked <0x00000000826529f0> (a java.lang.Class for org.apache.hadoop.fs.FileSystem)
```

This problem comes from: #45287

after this fix, we could enable java support:
#52412

### Another Fix Method
Add JAVA_OPTS `-Djdk.lang.processReaperUseDefaultStackSize=true` inside
be.conf, also can fix this problem:

https://bugs.openjdk.org/browse/JDK-8153057

From gemini:

![Clipboard_Screenshot_1751683706](https://github.com/user-attachments/assets/a54ffb3a-cb57-4f8a-bc85-5366081c2d9b)
github-actions bot pushed a commit that referenced this pull request Jul 5, 2025
### What problem does this PR solve?

### Problem

after enable java support, be can not start correctly, it will hang on
stack:
```
(gdb) bt
#0  0x00007f5fb1e97ce6 in __futex_abstimed_wait_common () from /lib64/libc.so.6
#1  0x00007f5fb1e9a798 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libc.so.6
#2  0x00007f5fb2c98bf3 in os::PlatformEvent::park() () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#3  0x00007f5fb2c693a5 in ObjectMonitor::wait(long, bool, JavaThread*) () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#4  0x00007f5fb2e8b316 in ObjectSynchronizer::wait(Handle, long, JavaThread*) () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#5  0x00007f5fb2934a97 in JVM_MonitorWait () from /usr/lib/jvm/java-17//lib/server/libjvm.so
#6  0x00007f5f9de245ba in ?? ()
#7  0x00007f5f49446158 in ?? ()
#8  0x00007f5faeb9fa00 in ?? ()
#9  0x00007ffc37f91178 in ?? ()
#10 0x00007f5f9de304bd in ?? ()
#11 0x00007ffc37f910a0 in ?? ()
#12 0x0000000000000000 in ?? ()
```

jstack of be:
```
"main" #1 prio=5 os_prio=0 cpu=931.38ms elapsed=66.08s tid=0x00007fab8e12c400 nid=0x3e68aa in Object.wait()  [0x00007ffdb3d4c000]  
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(java.base@17.0.15-ga/Native Method)
    - waiting on <0x00000000bd183b68> (a java.lang.ProcessImpl)
    at java.lang.Object.wait(java.base@17.0.15-ga/Object.java:338)
    at java.lang.ProcessImpl.waitFor(java.base@17.0.15-ga/ProcessImpl.java:434)
    - locked <0x00000000bd183b68> (a java.lang.ProcessImpl)
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:1061)
    at org.apache.hadoop.util.Shell.run(Shell.java:957)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1282)
    at org.apache.hadoop.util.Shell.isSetsidSupported(Shell.java:853)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:838)
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
    at org.apache.hadoop.conf.Configuration.getBoolean(Configuration.java:1713)
    at org.apache.hadoop.security.SecurityUtil.setConfigurationInternal(SecurityUtil.java:103)
    at org.apache.hadoop.security.SecurityUtil.<clinit>(SecurityUtil.java:92)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:312)
    - locked <0x00000000bd5fd708> (a java.lang.Class for org.apache.hadoop.security.UserGroupInformation)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
    - locked <0x00000000bd5fd708> (a java.lang.Class for org.apache.hadoop.security.UserGroupInformation)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:575)
    at org.apache.hadoop.fs.viewfs.ViewFileSystem.<init>(ViewFileSystem.java:279)
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(java.base@17.0.15-ga/Native Method)
    at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(java.base@17.0.15-ga/NativeConstructorAccessorImpl.java:77)
    at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(java.base@17.0.15-ga/DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstanceWithCaller(java.base@17.0.15-ga/Constructor.java:500)
    at java.lang.reflect.Constructor.newInstance(java.base@17.0.15-ga/Constructor.java:481)
    at java.util.ServiceLoader$ProviderImpl.newInstance(java.base@17.0.15-ga/ServiceLoader.java:789)
    at java.util.ServiceLoader$ProviderImpl.get(java.base@17.0.15-ga/ServiceLoader.java:729)
    at java.util.ServiceLoader$3.next(java.base@17.0.15-ga/ServiceLoader.java:1403)
    at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:3534)
    - locked <0x00000000826529f0> (a java.lang.Class for org.apache.hadoop.fs.FileSystem)
```

This problem comes from: #45287

after this fix, we could enable java support:
#52412

### Another Fix Method
Add JAVA_OPTS `-Djdk.lang.processReaperUseDefaultStackSize=true` inside
be.conf, also can fix this problem:

https://bugs.openjdk.org/browse/JDK-8153057

From gemini:

![Clipboard_Screenshot_1751683706](https://github.com/user-attachments/assets/a54ffb3a-cb57-4f8a-bc85-5366081c2d9b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/3.0.7-merged dev/3.1.0-merged reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants