@github-actions

Cherry-picked from #57382

Currently, tablet report logic uses a ForkJoinPool to process tablet
information, but often encounters unexplained hangs in the ForkJoinPool.
The printed stack trace doesn't reveal where the hang occurs, making it
difficult to troubleshoot the issue.

With the ForkJoinPool, the stuck report thread shows a stack such as:
```
"report-thread" #187 daemon prio=5 os_prio=0 cpu=97864.95ms elapsed=2469428.50s tid=0x00007ff1cb5c8530 nid=0xef2 waiting on condition  [0x00007fef462e1000]
   java.lang.Thread.State: WAITING (parking)
        at jdk.internal.misc.Unsafe.park(java.base@17.0.15/Native Method)
        - parking to wait for  <0x00000006dc3fae00> (a java.util.concurrent.ForkJoinTask$AdaptedRunnableAction)
        at java.util.concurrent.locks.LockSupport.park(java.base@17.0.15/LockSupport.java:341)
        at java.util.concurrent.ForkJoinTask.awaitDone(java.base@17.0.15/ForkJoinTask.java:468)
        at java.util.concurrent.ForkJoinTask.join(java.base@17.0.15/ForkJoinTask.java:670)
        at org.apache.doris.catalog.TabletInvertedIndex.tabletReport(TabletInvertedIndex.java:370)
        at org.apache.doris.master.ReportHandler.tabletReport(ReportHandler.java:509)
        at org.apache.doris.master.ReportHandler$ReportTask.exec(ReportHandler.java:339)
        at org.apache.doris.master.ReportHandler.runOneCycle(ReportHandler.java:1466)
        at org.apache.doris.common.util.Daemon.run(Daemon.java:119)

   Locked ownable synchronizers:
        - None
```
This stack shows only that the report thread is parked in `ForkJoinTask.join`; it does not show where the underlying problem is.

While the tablet report is stuck, it keeps holding the TabletInvertedIndex
read lock, which leads to a deadlock.
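
The effect of a read lock that is never released can be sketched as below. This is a minimal illustration, not Doris code: it assumes the lock behaves like a `java.util.concurrent.locks.ReentrantReadWriteLock`, and the class name, thread name, and latches are hypothetical stand-ins for the stuck report thread.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: a reader that never finishes keeps writers out.
class ReadLockStallSketch {
    static boolean writerCanProceedWhileReaderStuck() throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch readerHoldsLock = new CountDownLatch(1);
        CountDownLatch releaseReader = new CountDownLatch(1);

        Thread reader = new Thread(() -> {
            lock.readLock().lock();
            try {
                readerHoldsLock.countDown();
                // Stands in for a join() that does not return.
                releaseReader.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                lock.readLock().unlock();
            }
        }, "report-thread-sketch");
        reader.setDaemon(true);
        reader.start();
        readerHoldsLock.await();

        // A writer cannot acquire the write lock while the read lock is held.
        boolean acquired = lock.writeLock().tryLock(200, TimeUnit.MILLISECONDS);
        if (acquired) {
            lock.writeLock().unlock();
        }

        // Let the reader finish so the sketch terminates cleanly.
        releaseReader.countDown();
        reader.join();
        return acquired;
    }
}
```

In this sketch the writer's `tryLock` times out for as long as the reader is parked; a writer using a plain `lock()` would wait indefinitely.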

This PR replaces the ForkJoinPool with a regular thread pool.
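
The general shape of such a replacement might look like the sketch below. It is not the code from this PR: the class, method, and parameter names are hypothetical, the per-tablet work is reduced to a counter, and the bounded `Future.get` wait is an added assumption, used here so that a hang would surface as a `TimeoutException` instead of an indefinite park.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: process partitions of tablet ids on a fixed-size pool.
class TabletReportSketch {
    static int processPartitions(List<List<Long>> partitions, int threads, long timeoutSeconds)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger processed = new AtomicInteger();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (List<Long> partition : partitions) {
                futures.add(pool.submit(() -> {
                    for (Long tabletId : partition) {
                        // Placeholder for the real per-tablet work.
                        processed.incrementAndGet();
                    }
                }));
            }
            // Bounded wait: throws TimeoutException if a task does not finish in time.
            for (Future<?> future : futures) {
                future.get(timeoutSeconds, TimeUnit.SECONDS);
            }
            return processed.get();
        } finally {
            // Stop accepting work and interrupt any task that is still running.
            pool.shutdownNow();
        }
    }
}
```

For example, `TabletReportSketch.processPartitions(List.of(List.of(1L, 2L), List.of(3L)), 2, 5)` processes three tablet ids across two worker threads.
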
@github-actions github-actions bot requested a review from morrySnow as a code owner November 11, 2025 19:53
@Thearas

Thearas commented Nov 11, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@dataroaring dataroaring reopened this Nov 11, 2025
@Thearas

Thearas commented Nov 11, 2025

run buildall

@morrySnow morrySnow merged commit 463044a into branch-3.1 Nov 12, 2025
22 of 23 checks passed
@morrySnow morrySnow deleted the auto-pick-57382-branch-3.1 branch November 12, 2025 06:07
@morrySnow morrySnow mentioned this pull request Nov 13, 2025