[EventEngine] RunAfter migration: grpc_chttp2_transport#32240
[EventEngine] RunAfter migration: grpc_chttp2_transport#32240yijiem merged 13 commits intogrpc:masterfrom
Conversation
drfloob
left a comment
There was a problem hiding this comment.
Looks pretty good. All of the extra UNREF_TRANSPORT calls are unfortunate, adds a bit of complexity to the ref counting if we need to debug problems, but I don't see a great way to avoid those at the moment.
| grpc_schedule_on_exec_ctx); | ||
| grpc_timer_init(&t->ping_state.delayed_ping_timer, next_allowed_ping, | ||
| &t->retry_initiate_ping_locked); | ||
| t->ping_state.delayed_ping_timer_handle = |
There was a problem hiding this comment.
Could there be a race here between assignment of the handle and its usage in close_transport_locked? It might only be a problem in theory if so. I think next_allowed_ping - now would have to be near zero, and it would have to coincide with an immediately closed transport. So I'm guessing the likelihood is infinitesimally small, and that it would be very difficult to test for.
There was a problem hiding this comment.
Hmmm, I didn't see the race... maybe_initiate_ping only runs in the transport's combiner lock, same for close_transport_locked I would think... So I don't think they will run concurrently or in parallel (correct me if I'm wrong though).
There was a problem hiding this comment.
You're right. I didn't realize maybe_initiate_ping was only ever called with the combiner lock held.
| grpc_schedule_on_exec_ctx); | ||
| grpc_timer_init(&t->ping_state.delayed_ping_timer, next_allowed_ping, | ||
| &t->retry_initiate_ping_locked); | ||
| t->ping_state.delayed_ping_timer_handle = |
There was a problem hiding this comment.
Hmmm, I didn't see the race... maybe_initiate_ping only runs in the transport's combiner lock, same for close_transport_locked I would think... So I don't think they will run concurrently or in parallel (correct me if I'm wrong though).
| grpc_core::ApplicationCallbackExecCtx callback_exec_ctx; | ||
| grpc_core::ExecCtx exec_ctx; | ||
| absl::MutexLock lock(&*mu); | ||
| init_keepalive_ping(t); |
There was a problem hiding this comment.
Why not just re-enter the combiner here, eschewing the mutex?
There was a problem hiding this comment.
yep, that's the other approach I was considering... init_keepalive_ping does re-enter the combiner, we need to run init_keepalive_pings_if_enabled in the combiner, so that init_keepalive_ping_locked is guaranteed to run after init_keepalive_pings_if_enabled had returned.
I decide to make the change since your prompt has changed my mind. :) PTAL. And also I don't like introducing mutex to chttp2_transport and adding performance overhead potentially.
This approach might slightly delay the run of the keepalive_ping_timer though.
There was a problem hiding this comment.
I think that delay is probably ok, but I like that we keep a consistent locking story in this code (we want to change that story, and having uniformity ahead of time is a huge boon)
…)" This reverts commit 1b3eade.
) Reverts #32240 Breaks `grpc/core/master/macos/grpc_objc_bazel_test`, sample run: https://source.cloud.google.com/results/invocations/04e5a95b-5f06-4b3e-95fe-5e1895068475/targets Seems like the `testKeepaliveWithV2API` test case failed: ``` Test Case '-[InteropTestsLocalCleartext testKeepaliveWithV2API]' started. <unknown>:0: error: -[InteropTestsLocalCleartext testKeepaliveWithV2API] : Asynchronous wait failed: Exceeded timeout of 5 seconds, with unfulfilled expectations: "Keepalive". Test Case '-[InteropTestsLocalCleartext testKeepaliveWithV2API]' failed (5.547 seconds). ...... Test Case '-[InteropTestsLocalSSL testKeepaliveWithV2API]' started. <unknown>:0: error: -[InteropTestsLocalSSL testKeepaliveWithV2API] : Asynchronous wait failed: Exceeded timeout of 5 seconds, with unfulfilled expectations: "Keepalive". Test Case '-[InteropTestsLocalSSL testKeepaliveWithV2API]' failed (5.011 seconds). ```
* initial commit, bdp timer * migrate keepalive_ping_timer and keepalive_watchdog_timer * still has some issue with refcount * fix refcount with the help from refcount_solver.py :^) * migrate delayed_ping_timer * add some GPR_ASSERTs * comment * review * try mutex/lock * fix build deps * use GRPC_UNUSED * review
…c#32339) Reverts grpc#32240 Breaks `grpc/core/master/macos/grpc_objc_bazel_test`, sample run: https://source.cloud.google.com/results/invocations/04e5a95b-5f06-4b3e-95fe-5e1895068475/targets Seems like the `testKeepaliveWithV2API` test case failed: ``` Test Case '-[InteropTestsLocalCleartext testKeepaliveWithV2API]' started. <unknown>:0: error: -[InteropTestsLocalCleartext testKeepaliveWithV2API] : Asynchronous wait failed: Exceeded timeout of 5 seconds, with unfulfilled expectations: "Keepalive". Test Case '-[InteropTestsLocalCleartext testKeepaliveWithV2API]' failed (5.547 seconds). ...... Test Case '-[InteropTestsLocalSSL testKeepaliveWithV2API]' started. <unknown>:0: error: -[InteropTestsLocalSSL testKeepaliveWithV2API] : Asynchronous wait failed: Exceeded timeout of 5 seconds, with unfulfilled expectations: "Keepalive". Test Case '-[InteropTestsLocalSSL testKeepaliveWithV2API]' failed (5.011 seconds). ```
* initial commit, bdp timer * migrate keepalive_ping_timer and keepalive_watchdog_timer * still has some issue with refcount * fix refcount with the help from refcount_solver.py :^) * migrate delayed_ping_timer * add some GPR_ASSERTs * comment * review * try mutex/lock * fix build deps * use GRPC_UNUSED * review
) Reverts #32240 Breaks `grpc/core/master/macos/grpc_objc_bazel_test`, sample run: https://source.cloud.google.com/results/invocations/04e5a95b-5f06-4b3e-95fe-5e1895068475/targets Seems like the `testKeepaliveWithV2API` test case failed: ``` Test Case '-[InteropTestsLocalCleartext testKeepaliveWithV2API]' started. <unknown>:0: error: -[InteropTestsLocalCleartext testKeepaliveWithV2API] : Asynchronous wait failed: Exceeded timeout of 5 seconds, with unfulfilled expectations: "Keepalive". Test Case '-[InteropTestsLocalCleartext testKeepaliveWithV2API]' failed (5.547 seconds). ...... Test Case '-[InteropTestsLocalSSL testKeepaliveWithV2API]' started. <unknown>:0: error: -[InteropTestsLocalSSL testKeepaliveWithV2API] : Asynchronous wait failed: Exceeded timeout of 5 seconds, with unfulfilled expectations: "Keepalive". Test Case '-[InteropTestsLocalSSL testKeepaliveWithV2API]' failed (5.011 seconds). ```
No description provided.