Labels
P1: Issue that should be fixed within a few weeks
bug: Something that is supposed to be working; but isn't
serve: Ray Serve Related Issue
triage: Needs triage (eg: priority, bug/not-bug, and owning component)
Description
What happened + What you expected to happen
I'm seeing unhandled TaskCancelledError exceptions in a model composition setup (Parent calls Child) when requests are cancelled shortly after being sent.
repro
app.py
import asyncio
from ray import serve
from ray.serve import handle

DEPLOYMENT_KWARGS = {"max_ongoing_requests": 2_000, "autoscaling_config": {"max_replicas": 1}}


@serve.deployment(**DEPLOYMENT_KWARGS)
class Child:
    async def __call__(self) -> None:
        await asyncio.sleep(0.1)


@serve.deployment(**DEPLOYMENT_KWARGS)
class Parent:
    def __init__(self, child: serve.Application) -> None:
        self.child: handle.DeploymentHandle = child

    async def __call__(self) -> None:
        await self.child.remote()


serve.run(Parent.bind(Child.bind()), blocking=True)

With the app running, start the load:
ab -n 2000 -c 400 http://localhost:8000/
then immediately press Ctrl+C to cancel it.
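If the ab-plus-Ctrl+C timing is hard to hit, a stdlib-only client can approximate it: open many concurrent HTTP requests, then drop every connection that is still waiting for a reply after a short delay. This is a sketch under assumptions: `fire_and_abort` and `_one_request` are hypothetical helpers, and it assumes (as with an interrupted ab run) that a client disconnecting mid-request is what makes Serve cancel the in-flight request.

```python
import asyncio


async def _one_request(host: str, port: int) -> None:
    """Send one HTTP GET for "/" and wait for the full reply."""
    reader, writer = await asyncio.open_connection(host, port)
    try:
        # "Connection: close" asks the server to close the socket after replying,
        # so read() returns once a request actually completes.
        request = f"GET / HTTP/1.1\r\nHost: {host}:{port}\r\nConnection: close\r\n\r\n"
        writer.write(request.encode("ascii"))
        await writer.drain()
        await reader.read()  # cancellation interrupts this wait
    finally:
        # Closing the socket before the reply arrives is the client-side disconnect.
        writer.close()


async def fire_and_abort(
    host: str = "127.0.0.1",
    port: int = 8000,
    n: int = 400,
    abort_after: float = 0.05,
) -> int:
    """Start n concurrent requests and cancel those still pending after abort_after seconds.

    Returns the number of requests that were cancelled while waiting for a response.
    """
    tasks = [asyncio.create_task(_one_request(host, port)) for _ in range(n)]
    await asyncio.sleep(abort_after)
    in_flight = [t for t in tasks if not t.done()]
    for task in in_flight:
        task.cancel()
    # return_exceptions=True collects CancelledError and connection errors
    # instead of raising the first one.
    await asyncio.gather(*tasks, return_exceptions=True)
    return len(in_flight)
```

With the app running, `asyncio.run(fire_and_abort(n=400))` roughly corresponds to `ab -c 400` followed by an immediate Ctrl+C. Lowering `abort_after` cancels the requests earlier.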
logs
ray.exceptions.TaskCancelledError: Task: TaskID(a27cda441f7a9f8fa0538192947e43a2bf00e7b301000000) was cancelled.
Future exception was never retrieved
future: <Future finished exception=RayTaskError(TaskCancelledError)(TaskCancelledError(TaskID(ab0a1b231b57aee2a0538192947e43a2bf00e7b301000000)))>
Traceback (most recent call last):
  File ".venv/lib/python3.12/site-packages/ray/serve/_private/replica.py", line 1640, in call_user_method
    result, sync_gen_consumed = await self._call_func_or_gen(
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/ray/serve/_private/replica.py", line 1347, in _call_func_or_gen
    result = await result
             ^^^^^^^^^^^^
  File "../repro.py", line 22, in __call__
    await self.child.remote()
  File ".venv/lib/python3.12/site-packages/ray/serve/handle.py", line 411, in __await__
    replica_result = yield from self._fetch_future_result_async().__await__()
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/ray/serve/handle.py", line 283, in _fetch_future_result_async
    self._replica_result = await asyncio.wrap_future(
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/ray/serve/_private/router.py", line 617, in assign_request
    replica_result, replica_id = await self.schedule_and_send_request(
                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/ray/serve/_private/router.py", line 544, in schedule_and_send_request
    result, queue_info = await r.send_request(pr, with_rejection=True)
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/ray/serve/_private/replica_scheduler/replica_wrapper.py", line 195, in send_request
    result, queue_len_info = await wrapper.send_request_python(pr, with_rejection)
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".venv/lib/python3.12/site-packages/ray/serve/_private/replica_scheduler/replica_wrapper.py", line 108, in send_request_python
    queue_len_info: ReplicaQueueLengthInfo = pickle.loads(await first_ref)
                                                          ^^^^^^^^^^^^^^^
ray.exceptions.RayTaskError(TaskCancelledError): ray::ServeReplica:default:Child.handle_request_with_rejection() (pid=16897, ip=127.0.0.1, actor_id=a0538192947e43a2bf00e7b301000000, repr=<ray.serve._private.replica.ServeReplica:default:Child object at 0x102a63d70>)
    raise CancelledError()
concurrent.futures._base.CancelledError
...
Versions / Dependencies
ray==2.46.0
python==3.12
Reproduction script
see above!
Issue Severity
Low: It annoys or frustrates me.
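The "Future exception was never retrieved" line in the logs is asyncio's generic report for a Future that had an exception set and was garbage-collected before anything read that exception. The pure-asyncio sketch below shows that mechanism only. It involves no Ray code, `FakeTaskCancelledError` and `unretrieved_future_messages` are hypothetical names, and it makes no claim about where the unretrieved future originates inside Ray Serve.

```python
import asyncio
import gc


class FakeTaskCancelledError(Exception):
    """Hypothetical stand-in for ray.exceptions.TaskCancelledError."""


def unretrieved_future_messages() -> list[str]:
    """Return the messages asyncio reports for futures dropped with unread exceptions."""
    messages: list[str] = []
    loop = asyncio.new_event_loop()
    # Capture the reports instead of letting the default handler log them.
    loop.set_exception_handler(lambda _loop, ctx: messages.append(ctx["message"]))
    try:
        # Exception is set, but nobody awaits the future or calls result()/exception().
        ignored = loop.create_future()
        ignored.set_exception(FakeTaskCancelledError("simulated"))
        del ignored
        gc.collect()  # CPython usually finalizes on `del`; collect() makes it explicit

        # Same failure, but the exception is read before the future is dropped.
        observed = loop.create_future()
        observed.set_exception(FakeTaskCancelledError("simulated"))
        observed.exception()  # marks the exception as retrieved
        del observed
        gc.collect()
    finally:
        loop.close()
    return messages


print(unretrieved_future_messages())  # prints ['Future exception was never retrieved']
```

Only the ignored future is reported. Reading the exception (by awaiting the future, or by calling `result()` or `exception()`, for example from a done callback) is what prevents the report.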