Add wait_server_ready method in GrpcStorageProxy #6010
HideakiImamura merged 28 commits into optuna:master
@HideakiImamura @c-bata

I confirmed that the failed test cases pass in my local environment (Apple M2 chip, macOS 14.6.1). I used the same Python version as the CI (Python 3.12). The CI uses an Apple M1 chip and macOS 14 according to https://github.com/actions/runner-images/tree/main. I would like to investigate the reason for the CI failure.
I reproduced the issue on my M1 MacBook. When running tests, the gRPC storage proxy is repeatedly started on the same port, which leads to a port-unavailability error. This seems to be due to the server not shutting down gracefully. I verified that implementing a graceful shutdown (by waiting for the shutdown event) resolves the problem on my machine. Please review the changes and let me know your thoughts.

```diff
diff --git a/optuna/testing/storages.py b/optuna/testing/storages.py
index 1893994f2..819339e61 100644
--- a/optuna/testing/storages.py
+++ b/optuna/testing/storages.py
@@ -121,7 +121,8 @@ class StorageSupplier:
         if self.server:
             assert self.thread is not None
-            self.server.stop(None)
+            shutdown_event = self.server.stop(0.1)
+            shutdown_event.wait()
             self.thread.join()
             self.server = None
             self.thread = None
```
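The graceful-shutdown pattern in the diff above (stop the server, wait for the shutdown event, then join the serving thread so the port is actually released) can be sketched with a plain stdlib TCP server. This is an analogy rather than the Optuna code; `grpc.Server.stop(grace)` similarly returns an event that can be waited on before joining:

```python
import socket
import socketserver
import threading


class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self) -> None:
        # Echo a single message back to the client.
        data = self.request.recv(1024)
        self.request.sendall(data)


# Start the server in a background thread, as the test helper does with grpc.Server.
server = socketserver.TCPServer(("localhost", 0), EchoHandler)
port = server.server_address[1]
thread = threading.Thread(target=server.serve_forever)
thread.start()

# Use the server once.
with socket.create_connection(("localhost", port)) as sock:
    sock.sendall(b"ping")
    assert sock.recv(1024) == b"ping"

# Graceful shutdown: shutdown() blocks until the serve loop exits
# (mirroring server.stop(grace).wait() in gRPC), then the socket is
# closed and the thread joined, so the next test can reuse the port.
server.shutdown()
server.server_close()
thread.join()
print("server stopped cleanly")
```

Without the blocking wait, the next test may race the old server for the same port, which matches the flaky behavior observed in the mac CI jobs.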
This reverts commit c11f49f.
I reconsidered my approach and examined the gRPC clients rather than the gRPC servers. The main change is shown in the diff below:

```diff
diff --git a/optuna/testing/storages.py b/optuna/testing/storages.py
index 21ad2edf6..8985b64e9 100644
--- a/optuna/testing/storages.py
+++ b/optuna/testing/storages.py
@@ -49,6 +49,7 @@ class StorageSupplier:
         self.tempfile: IO[Any] | None = None
         self.server: grpc.Server | None = None
         self.thread: threading.Thread | None = None
+        self.proxy: GrpcStorageProxy | None = None

     def __enter__(
         self,
@@ -100,13 +101,13 @@ class StorageSupplier:
             self.thread = threading.Thread(target=self.server.start)
             self.thread.start()

-            proxy = GrpcStorageProxy(host="localhost", port=port)
+            self.proxy = GrpcStorageProxy(host="localhost", port=port)

             # Wait until the server is ready.
             while True:
                 try:
-                    proxy.get_all_studies()
-                    return proxy
+                    self.proxy.get_all_studies()
+                    return self.proxy
                 except grpc.RpcError:
                     time.sleep(1)
                     continue
@@ -119,18 +120,24 @@ class StorageSupplier:
         if self.tempfile:
             self.tempfile.close()

+        if self.proxy:
+            del self.proxy._cache
+            del self.proxy._stub
+            # del self.proxy # In my environment, no need to delete this instance.
+            self.proxy = None
+
         if self.server:
             assert self.thread is not None
-            self.server.stop(5).wait()
+            self.server.stop(None)
             self.thread.join()
             self.server = None
             self.thread = None
```
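The retry loop around `proxy.get_all_studies()` in the diff above is a generic wait-until-ready pattern. A minimal stdlib sketch of the same idea, polling a TCP port instead of issuing a gRPC call (`wait_port_ready` is a hypothetical helper, not part of Optuna):

```python
import socket
import time


def wait_port_ready(host: str, port: int, timeout: float = 10.0) -> None:
    # Poll until the server accepts TCP connections, mirroring the loop
    # that retries proxy.get_all_studies() until no grpc.RpcError is raised.
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return
        except OSError:
            if time.monotonic() > deadline:
                raise ConnectionError(f"{host}:{port} did not become ready")
            time.sleep(0.1)


# Usage: bring up a throwaway listener, then wait for it.
srv = socket.socket()
srv.bind(("localhost", 0))
srv.listen(1)
wait_port_ready("localhost", srv.getsockname()[1])
srv.close()
print("ready")
```

Bounding the loop with a deadline (as `GrpcStorageProxy`'s `timeout` parameter does) avoids hanging forever when the server never comes up.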
It seems a bit strange that only the test failed, so I reviewed the main code and noticed that the channel might not be closing properly. In the changes of this PR, `insecure_channel` is used with a `with` statement, so it is closed when the scope is exited:

```python
class GrpcStorageProxy(BaseStorage):
    def __init__(self, *, host: str = "localhost", port: int = 13000, timeout: int = 900) -> None:
        def wait_server(host: str, port: int, timeout: int) -> None:
            try:
                with grpc.insecure_channel(
                    f"{host}:{port}", options=[("grpc.max_receive_message_length", -1)]
                ) as channel:
                    grpc.channel_ready_future(channel).result(timeout=timeout)
            except grpc.FutureTimeoutError as e:
                raise ConnectionError("GRPC connection timeout") from e
```

However, the channel passed to `_stub` below does not seem to be closed:

```python
self._stub = api_pb2_grpc.StorageServiceStub(
    grpc.insecure_channel(
        f"{host}:{port}",
        options=[("grpc.max_receive_message_length", -1)],
    )
)  # type: ignore
```

Typically, there might not be issues since we don't reconnect many times to the same port in a short period, and it might be acceptable to leave it to garbage collection. However, it might be problematic depending on the implementation, for example, in the case of the mac-test CI jobs.
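One way to avoid relying on garbage collection is to keep a reference to the underlying connection so it can be closed explicitly. A minimal stdlib analogy of that design (a hypothetical `Proxy` class holding a raw socket in place of a gRPC channel; not the Optuna implementation):

```python
import socket
import socketserver
import threading


class Proxy:
    """Hypothetical stand-in for GrpcStorageProxy: the connection is kept
    as an attribute so close() can release it deterministically."""

    def __init__(self, host: str, port: int) -> None:
        # In the real class this would be the grpc.insecure_channel passed
        # to the stub; keeping a reference makes explicit cleanup possible.
        self._conn = socket.create_connection((host, port))

    def close(self) -> None:
        self._conn.close()


# Usage against a throwaway local server.
server = socketserver.TCPServer(("localhost", 0), socketserver.BaseRequestHandler)
thread = threading.Thread(target=server.serve_forever)
thread.start()

proxy = Proxy("localhost", server.server_address[1])
proxy.close()  # the connection is released promptly, not at GC time

server.shutdown()
server.server_close()
thread.join()
```

Deterministic cleanup matters most exactly in the scenario described above: tests that reconnect to the same port many times in quick succession.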
Thank you. |
HideakiImamura
left a comment
Thanks for the PR and great efforts. I have a minor comment. PTAL.
HideakiImamura
left a comment
Thanks for your swift actions! LGTM!
trying to re-use stubs and channels
c-bata
left a comment
Thank you for your pull request. I left some early feedback comments.
c-bata
left a comment
Changes almost look great to me 💯 I left some minor suggestions though.
Co-authored-by: c-bata <contact@c-bata.link>
wait_server_ready method in GrpcStorageProxy
HideakiImamura
left a comment
Thanks for the update. LGTM!
Motivation
The GrpcStorageProxy class introduced in v4.2.0 has no mechanism to wait for the gRPC server to start, so users need to write that waiting logic themselves. In this PR, we change the GrpcStorageProxy class to wait for the server to become ready during initialization.
Description of the changes
Use the grpc.channel_ready_future function to wait for the server to become ready.