Skip to content

handle task cancellation error#53680

Merged
edoakes merged 3 commits intomasterfrom
53639-abrar-cancel
Jun 10, 2025
Merged

handle task cancellation error#53680
edoakes merged 3 commits intomasterfrom
53639-abrar-cancel

Conversation

@abrarsheikh
Copy link
Copy Markdown
Contributor

@abrarsheikh abrarsheikh commented Jun 9, 2025

don't know for sure, but seems like a race condition between when the cancel happens and attempt to access the ray task result causes RayTaskCancelled exception.

Used the repro script in the ticket to confirm that the issue is resolved #53639.

Signed-off-by: abrar <abrar@anyscale.com>
@abrarsheikh abrarsheikh added the go add ONLY when ready to merge, run all tests label Jun 9, 2025
Signed-off-by: abrar <abrar@anyscale.com>
@abrarsheikh abrarsheikh marked this pull request as ready for review June 10, 2025 01:41
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses the handling of task cancellation errors by simulating a TaskCancelledError in tests and converting it to an asyncio.CancelledError in production code. Key changes include:

  • Adding a special case in the test helper to raise TaskCancelledError.
  • Introducing a new test to verify the conversion of TaskCancelledError to asyncio.CancelledError.
  • Updating the replica wrapper to catch TaskCancelledError and re-raise it as asyncio.CancelledError.

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
python/ray/serve/tests/test_actor_replica_wrapper.py Simulates TaskCancelledError in test flow and adds a corresponding unit test.
python/ray/serve/_private/request_router/replica_wrapper.py Adds exception handling for TaskCancelledError to convert it into asyncio.CancelledError.

Comment on lines +119 to +121
except TaskCancelledError:
logger.info("Request already cancelled.")
raise asyncio.CancelledError()
Copy link

Copilot AI Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider capturing the caught TaskCancelledError (e.g., 'except TaskCancelledError as err:') and re-raising the asyncio.CancelledError with exception chaining (using 'from err') to preserve the original error context for easier debugging.

Suggested change
except TaskCancelledError:
logger.info("Request already cancelled.")
raise asyncio.CancelledError()
except TaskCancelledError as err:
logger.info("Request already cancelled.")
raise asyncio.CancelledError() from err

Copilot uses AI. Check for mistakes.
Signed-off-by: abrar <abrar@anyscale.com>
@abrarsheikh abrarsheikh requested a review from edoakes June 10, 2025 01:47
@edoakes edoakes merged commit 6c0eb02 into master Jun 10, 2025
5 checks passed
@edoakes edoakes deleted the 53639-abrar-cancel branch June 10, 2025 15:42
elliot-barn pushed a commit that referenced this pull request Jun 18, 2025
don't know for sure, but seems like a race condition between when the
cancel happens and attempt to access the ray task result causes
`RayTaskCancelled` exception.

Used the repro script in the ticket to confirm that the issue is
resolved #53639.

---------

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
elliot-barn pushed a commit that referenced this pull request Jul 2, 2025
don't know for sure, but seems like a race condition between when the
cancel happens and attempt to access the ray task result causes
`RayTaskCancelled` exception.

Used the repro script in the ticket to confirm that the issue is
resolved #53639.

---------

Signed-off-by: abrar <abrar@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

go add ONLY when ready to merge, run all tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Serve] quick request cancellation with model composition leads to unhandled TaskCancelledErrors

3 participants