Skip to content

[EventEngine] RunAfter migration: grpc_chttp2_transport#32240

Merged
yijiem merged 13 commits intogrpc:masterfrom
yijiem:ee_runafter_migration_chttp2_transport
Feb 8, 2023
Merged

[EventEngine] RunAfter migration: grpc_chttp2_transport#32240
yijiem merged 13 commits intogrpc:masterfrom
yijiem:ee_runafter_migration_chttp2_transport

Conversation

@yijiem
Copy link
Copy Markdown
Member

@yijiem yijiem commented Jan 30, 2023

No description provided.

@yijiem yijiem requested review from drfloob and yashykt January 31, 2023 18:35
@yijiem yijiem marked this pull request as ready for review January 31, 2023 18:35
Copy link
Copy Markdown
Member

@drfloob drfloob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good. All of the extra UNREF_TRANSPORT calls are unfortunate, adds a bit of complexity to the ref counting if we need to debug problems, but I don't see a great way to avoid those at the moment.

grpc_schedule_on_exec_ctx);
grpc_timer_init(&t->ping_state.delayed_ping_timer, next_allowed_ping,
&t->retry_initiate_ping_locked);
t->ping_state.delayed_ping_timer_handle =
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could there be a race here between assignment of the handle and its usage in close_transport_locked? It might only be a problem in theory if so. I think next_allowed_ping - now would have to be near zero, and it would have to coincide with an immediately closed transport. So I'm guessing the likelihood is infinitesimally small, and that it would be very difficult to test for.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I didn't see the race... maybe_initiate_ping only runs in the transport's combiner lock, same for close_transport_locked I would think... So I don't think they will run concurrently or in parallel (correct me if I'm wrong though).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. I didn't realize maybe_initiate_ping was only ever called with the combiner lock held.

Copy link
Copy Markdown
Member Author

@yijiem yijiem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks AJ!

grpc_schedule_on_exec_ctx);
grpc_timer_init(&t->ping_state.delayed_ping_timer, next_allowed_ping,
&t->retry_initiate_ping_locked);
t->ping_state.delayed_ping_timer_handle =
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I didn't see the race... maybe_initiate_ping only runs in the transport's combiner lock, same for close_transport_locked I would think... So I don't think they will run concurrently or in parallel (correct me if I'm wrong though).

grpc_core::ApplicationCallbackExecCtx callback_exec_ctx;
grpc_core::ExecCtx exec_ctx;
absl::MutexLock lock(&*mu);
init_keepalive_ping(t);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just re-enter the combiner here, eschewing the mutex?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, that's the other approach I was considering... init_keepalive_ping does re-enter the combiner, we need to run init_keepalive_pings_if_enabled in the combiner, so that init_keepalive_ping_locked is guaranteed to run after init_keepalive_pings_if_enabled had returned.

I decide to make the change since your prompt has changed my mind. :) PTAL. And also I don't like introducing mutex to chttp2_transport and adding performance overhead potentially.

This approach might slightly delay the run of the keepalive_ping_timer though.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that delay is probably ok, but I like that we keep a consistent locking story in this code (we want to change that story, and having uniformity ahead of time is a huge boon)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha, thanks Craig!

@yijiem yijiem merged commit 1b3eade into grpc:master Feb 8, 2023
@copybara-service copybara-service bot added the imported Specifies if the PR has been imported to the internal repository label Feb 8, 2023
yijiem added a commit that referenced this pull request Feb 9, 2023
ctiller pushed a commit that referenced this pull request Feb 10, 2023
)

Reverts #32240

Breaks `grpc/core/master/macos/grpc_objc_bazel_test`, sample run:
https://source.cloud.google.com/results/invocations/04e5a95b-5f06-4b3e-95fe-5e1895068475/targets

Seems like the `testKeepaliveWithV2API` test case failed:
```
Test Case '-[InteropTestsLocalCleartext testKeepaliveWithV2API]' started.
<unknown>:0: error: -[InteropTestsLocalCleartext testKeepaliveWithV2API] : Asynchronous wait failed: Exceeded timeout of 5 seconds, with unfulfilled expectations: "Keepalive".
Test Case '-[InteropTestsLocalCleartext testKeepaliveWithV2API]' failed (5.547 seconds).
......
Test Case '-[InteropTestsLocalSSL testKeepaliveWithV2API]' started.
<unknown>:0: error: -[InteropTestsLocalSSL testKeepaliveWithV2API] : Asynchronous wait failed: Exceeded timeout of 5 seconds, with unfulfilled expectations: "Keepalive".
Test Case '-[InteropTestsLocalSSL testKeepaliveWithV2API]' failed (5.011 seconds).
```
XuanWang-Amos pushed a commit to XuanWang-Amos/grpc that referenced this pull request May 1, 2023
* initial commit, bdp timer

* migrate keepalive_ping_timer and keepalive_watchdog_timer

* still has some issue with refcount

* fix refcount

with the help from refcount_solver.py :^)

* migrate delayed_ping_timer

* add some GPR_ASSERTs

* comment

* review

* try mutex/lock

* fix build deps

* use GRPC_UNUSED

* review
XuanWang-Amos pushed a commit to XuanWang-Amos/grpc that referenced this pull request May 1, 2023
…c#32339)

Reverts grpc#32240

Breaks `grpc/core/master/macos/grpc_objc_bazel_test`, sample run:
https://source.cloud.google.com/results/invocations/04e5a95b-5f06-4b3e-95fe-5e1895068475/targets

Seems like the `testKeepaliveWithV2API` test case failed:
```
Test Case '-[InteropTestsLocalCleartext testKeepaliveWithV2API]' started.
<unknown>:0: error: -[InteropTestsLocalCleartext testKeepaliveWithV2API] : Asynchronous wait failed: Exceeded timeout of 5 seconds, with unfulfilled expectations: "Keepalive".
Test Case '-[InteropTestsLocalCleartext testKeepaliveWithV2API]' failed (5.547 seconds).
......
Test Case '-[InteropTestsLocalSSL testKeepaliveWithV2API]' started.
<unknown>:0: error: -[InteropTestsLocalSSL testKeepaliveWithV2API] : Asynchronous wait failed: Exceeded timeout of 5 seconds, with unfulfilled expectations: "Keepalive".
Test Case '-[InteropTestsLocalSSL testKeepaliveWithV2API]' failed (5.011 seconds).
```
wanlin31 pushed a commit that referenced this pull request May 18, 2023
* initial commit, bdp timer

* migrate keepalive_ping_timer and keepalive_watchdog_timer

* still has some issue with refcount

* fix refcount

with the help from refcount_solver.py :^)

* migrate delayed_ping_timer

* add some GPR_ASSERTs

* comment

* review

* try mutex/lock

* fix build deps

* use GRPC_UNUSED

* review
wanlin31 pushed a commit that referenced this pull request May 18, 2023
)

Reverts #32240

Breaks `grpc/core/master/macos/grpc_objc_bazel_test`, sample run:
https://source.cloud.google.com/results/invocations/04e5a95b-5f06-4b3e-95fe-5e1895068475/targets

Seems like the `testKeepaliveWithV2API` test case failed:
```
Test Case '-[InteropTestsLocalCleartext testKeepaliveWithV2API]' started.
<unknown>:0: error: -[InteropTestsLocalCleartext testKeepaliveWithV2API] : Asynchronous wait failed: Exceeded timeout of 5 seconds, with unfulfilled expectations: "Keepalive".
Test Case '-[InteropTestsLocalCleartext testKeepaliveWithV2API]' failed (5.547 seconds).
......
Test Case '-[InteropTestsLocalSSL testKeepaliveWithV2API]' started.
<unknown>:0: error: -[InteropTestsLocalSSL testKeepaliveWithV2API] : Asynchronous wait failed: Exceeded timeout of 5 seconds, with unfulfilled expectations: "Keepalive".
Test Case '-[InteropTestsLocalSSL testKeepaliveWithV2API]' failed (5.011 seconds).
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bloat/low imported Specifies if the PR has been imported to the internal repository lang/core per-call-memory/neutral per-channel-memory/neutral release notes: no Indicates if PR should not be in release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants