
[Data] Disable hanging issue detection #61895

Merged
aslonnie merged 2 commits into releases/2.54.1 from disable-hanging-detector
Mar 20, 2026

@bveeramani
Member

The hanging issue detector makes blocking calls to the Ray State API. This can cause the scheduling loop to block and severely degrade pipeline performance.

Since we haven't fixed the blocking calls yet, I'm disabling the detector for this patch release.
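
To illustrate the failure mode described above, here is a toy sketch (not Ray Data's actual scheduling code). `blocking_state_query` and `run_scheduling_loop` are hypothetical names, and the 50 ms delay is an assumed stand-in for a slow synchronous State API call. The point is only that a synchronous call made inside a single-threaded loop stalls every iteration for the full duration of the call.

```python
import time


def blocking_state_query(delay_s: float = 0.05) -> dict:
    """Hypothetical stand-in for a slow, synchronous remote call."""
    time.sleep(delay_s)
    return {"stuck_tasks": 0}


def run_scheduling_loop(num_steps: int, detect_hangs: bool) -> float:
    """Run a toy loop and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    for _ in range(num_steps):
        # Real work (dispatching tasks, processing outputs) is omitted here.
        if detect_hangs:
            # The whole loop waits here until the call returns.
            blocking_state_query()
    return time.perf_counter() - start


with_detector = run_scheduling_loop(num_steps=5, detect_hangs=True)
without_detector = run_scheduling_loop(num_steps=5, detect_hangs=False)
print(f"with detector: {with_detector:.3f}s, without: {without_detector:.6f}s")
```

With the detector enabled, the loop's latency is bounded below by the sum of the call delays, regardless of how little scheduling work there is to do.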

Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
bveeramani requested a review from a team as a code owner on March 19, 2026, 22:47
Contributor

gemini-code-assist (bot) left a comment


Code Review

This pull request disables the HangingExecutionIssueDetector to address a performance issue where blocking calls to the Ray State API were degrading pipeline performance. The change is a temporary measure, implemented by commenting out the detector from the default configuration. While this change is correct for its purpose, it will cause existing tests that rely on this detector to fail. I've added a critical comment highlighting the need to update the test suite. I also provided a suggestion to improve the FIXME comment with more context for future developers.
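
For readers unfamiliar with the pattern, "commenting out the detector from the default configuration" might look roughly like the sketch below. Only the name `HangingExecutionIssueDetector` comes from this review; `IssueDetector`, `OtherIssueDetector`, `DetectorsConfig`, and `_default_detectors` are hypothetical stand-ins, not the actual Ray Data classes or module layout.

```python
from dataclasses import dataclass, field
from typing import List, Type


class IssueDetector:
    """Hypothetical base class for this sketch."""


class HangingExecutionIssueDetector(IssueDetector):
    """Stand-in for the detector named in the review."""


class OtherIssueDetector(IssueDetector):
    """Placeholder for whichever detectors remain enabled."""


def _default_detectors() -> List[Type[IssueDetector]]:
    return [
        # FIXME: The hanging detector is disabled because it can block the
        # scheduling loop.
        # HangingExecutionIssueDetector,
        OtherIssueDetector,
    ]


@dataclass
class DetectorsConfig:
    # Detector classes instantiated by default (assumed shape of the config).
    detectors: List[Type[IssueDetector]] = field(default_factory=_default_detectors)


config = DetectorsConfig()
print([d.__name__ for d in config.detectors])
```

In a layout like this, the class stays importable, so a test that constructs the detector directly would still run, while a test that expects it in the default list would need updating.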

Comment on lines +27 to +28
# FIXME: The hanging detector is disabled because it can block the
# scheduling loop.
Contributor


medium

For better future context, it would be helpful to enhance the FIXME comment with more details from the PR description. Mentioning that the blocking calls are to the Ray State API and that it should be re-enabled after refactoring to use non-blocking APIs would provide valuable information for whoever addresses this in the future. Linking to a GitHub issue for tracking would also be ideal.

Suggested change
- # FIXME: The hanging detector is disabled because it can block the
- # scheduling loop.
+ # FIXME: The hanging detector is disabled because it makes blocking calls to the
+ # Ray State API, which can block the scheduling loop. Re-enable after
+ # refactoring to use non-blocking APIs.
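
One possible shape for the "non-blocking" refactor mentioned in the suggestion is sketched below. This is an assumption, not the planned fix: `BackgroundHangCheck` and `blocking_state_query` are hypothetical names, and the 100 ms sleep stands in for a slow State API call. The idea is to run the slow query on a worker thread and have the loop only poll for a finished result.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Optional


def blocking_state_query() -> dict:
    """Hypothetical stand-in for a slow, synchronous remote call."""
    time.sleep(0.1)
    return {"stuck_tasks": 0}


class BackgroundHangCheck:
    """Runs the slow query off-thread; poll() never waits on it."""

    def __init__(self) -> None:
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending: Optional[Future] = None
        self.latest: Optional[dict] = None

    def poll(self) -> None:
        if self._pending is None:
            # Start a new query in the background and return immediately.
            self._pending = self._pool.submit(blocking_state_query)
        elif self._pending.done():
            # The future is already finished, so result() does not block.
            self.latest = self._pending.result()
            self._pending = None

    def close(self) -> None:
        self._pool.shutdown(wait=True)


check = BackgroundHangCheck()
start = time.perf_counter()
for _ in range(5):
    check.poll()  # cheap: either submits a query or checks a flag
loop_elapsed = time.perf_counter() - start

time.sleep(0.3)  # give the background query time to finish
check.poll()     # picks up the completed result
check.close()
print(f"loop took {loop_elapsed:.4f}s, latest result: {check.latest}")
```

The trade-off is that the loop sees slightly stale results, which is usually acceptable for hang detection since hangs are, by definition, slow to develop.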

ray-gardener (bot) added the "data" label (Ray Data-related issues) on Mar 20, 2026
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
aslonnie added the "go" label (add ONLY when ready to merge, run all tests) on Mar 20, 2026
aslonnie merged commit 75a0903 into releases/2.54.1 on Mar 20, 2026
6 of 7 checks passed
aslonnie deleted the disable-hanging-detector branch on March 20, 2026, 18:38


3 participants