[Data] Disable hanging issue detection #61895
Signed-off-by: Balaji Veeramani <bveeramani@berkeley.edu>
Code Review
This pull request disables the HangingExecutionIssueDetector to address a performance issue where blocking calls to the Ray State API were degrading pipeline performance. The change is a temporary measure, implemented by commenting out the detector from the default configuration. While this change is correct for its purpose, it will cause existing tests that rely on this detector to fail. I've added a critical comment highlighting the need to update the test suite. I also provided a suggestion to improve the FIXME comment with more context for future developers.
python/ray/data/_internal/issue_detection/issue_detector_configuration.py
# FIXME: The hanging detector is disabled because it can block the
# scheduling loop.
For better future context, it would be helpful to enhance the FIXME comment with more details from the PR description. Mentioning that the blocking calls are to the Ray State API and that it should be re-enabled after refactoring to use non-blocking APIs would provide valuable information for whoever addresses this in the future. Linking to a GitHub issue for tracking would also be ideal.
- # FIXME: The hanging detector is disabled because it can block the
- # scheduling loop.
+ # FIXME: The hanging detector is disabled because it makes blocking calls to the
+ # Ray State API, which can block the scheduling loop. Re-enable after
+ # refactoring to use non-blocking APIs.
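One common way to "refactor to use non-blocking APIs" is to move the slow query onto a background thread and have the detector only read the latest cached result. The sketch below is a hypothetical illustration of that pattern, not Ray's implementation: `query_task_states` and `NonBlockingHangingDetector` are invented names, and the `time.sleep` stands in for the latency of a Ray State API call.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Dict, Optional


def query_task_states() -> Dict[str, str]:
    """Hypothetical stand-in for a slow, blocking Ray State API query."""
    time.sleep(0.5)  # simulate network / GCS latency
    return {"task-1": "RUNNING"}


class NonBlockingHangingDetector:
    """Hypothetical detector whose detect() never waits on the query."""

    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=1)
        self._pending: Optional[Future] = None
        self._latest: Optional[Dict[str, str]] = None

    def detect(self) -> Optional[Dict[str, str]]:
        # Harvest a finished query, if there is one, without blocking.
        if self._pending is not None and self._pending.done():
            self._latest = self._pending.result()
            self._pending = None
        # Start a new background query if none is in flight.
        if self._pending is None:
            self._pending = self._executor.submit(query_task_states)
        # Return the most recent snapshot; it may be stale or None.
        return self._latest

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)


detector = NonBlockingHangingDetector()
start = time.perf_counter()
first = detector.detect()  # returns immediately; no result yet
elapsed = time.perf_counter() - start
time.sleep(0.6)  # let the background query finish
second = detector.detect()  # picks up the completed result
detector.shutdown()
```

The trade-off is staleness: the scheduling loop sees state that may be one polling interval old, which is usually acceptable for detecting tasks that have been stuck for much longer than that.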
The hanging issue detector makes blocking calls to the Ray State API. This can cause the scheduling loop to block and severely degrade pipeline performance.
Since we haven't fixed the blocking calls yet, I'm disabling the detector for this patch release.
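To illustrate why a blocking call inside the scheduling loop degrades the whole pipeline, here is a toy model. It is a hypothetical sketch, not Ray Data's executor: `blocking_state_query` and `run_scheduling_loop` are invented names, and the 50 ms sleep is an assumed latency. Because every check runs inline on each iteration, its latency is added to every scheduling step.

```python
import time
from typing import Callable, List


def blocking_state_query() -> None:
    """Hypothetical stand-in for a blocking Ray State API call (~50 ms)."""
    time.sleep(0.05)


def run_scheduling_loop(num_steps: int, checks: List[Callable[[], None]]) -> float:
    """Run a toy scheduling loop and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    for _ in range(num_steps):
        # Real work (dispatching tasks, collecting outputs) is omitted here.
        for check in checks:
            check()  # runs inline, so a slow check stalls the whole step
    return time.perf_counter() - start


without_detector = run_scheduling_loop(20, checks=[])
with_detector = run_scheduling_loop(20, checks=[blocking_state_query])
print(f"without detector: {without_detector:.3f}s, with detector: {with_detector:.3f}s")
```

With 20 steps, the blocking check alone adds roughly 20 × 50 ms = 1 s of stall time, during which no new tasks are scheduled.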