[Serve] Fix ServeHandle serialization#13695
Conversation
|
Tests pass on my laptop and on Mac CI, but on Linux CI |
| my_batch_sizes = await asyncio.gather(*[( | ||
| await router.assign_request.remote(query_param)) for _ in range(3)]) | ||
| assert my_batch_sizes == [2, 2, 1] | ||
| assert sorted(my_batch_sizes) == [1, 2, 2] |
There was a problem hiding this comment.
let's make sure to revert this and merge master again
|
Hey @edoakes @architkulkarni this failed on Windows https://github.com/ray-project/ray/runs/1779615703 So I'm reverting it now. Can you open a re-revert either fixing (seems unlikely, access violation could be port conflict but this seems hard to debug) or skip? |
|
So weird, thanks for catching this, and my fault for not checking the Windows CI. It looks like this PR definitely caused this issue. I'm a little confused though, why does the run say that all 6/6 test_handle.py tests pass? And why does it say "FAIL: //python/ray/serve:test_router" but there's no such line for test_handle? Googling "Windows fatal exception: access violation" turns up a handful of pytest-related results, but also several that are not pytest-related. The only idea I have for how to try to fix this is to use |
|
In the log https://github.com/ray-project/ray/runs/1779615703 I see a line above the handle tests saying "TIMEOUT: //python/ray/serve:test_handle", but I can't figure out which test is timing out since all 6/6 tests pass in each of the three runs. The "Windows fatal exception" seems to be printed 1-2 times after each of the six tests, based on the endpoint names that are being printed. |
)" (ray-project#13753)" This reverts commit 405c8d7.
This reverts commit b522b1a.

Why are these changes needed?
Adds
__reduce__methods to ServeHandle andThreadProxiedRouterto fix serialization of ServeHandles. Upon deserialization, the constructor is simply called again, and a new router is started.Related issue number
Closes #13180
Checks
scripts/format.shto lint the changes in this PR.