Add grpc_insecure_channel_create_internal, for channels created from within core#20572
Add grpc_insecure_channel_create_internal, for channels created from within core#20572apolcyn wants to merge 1 commit intogrpc:masterfrom
Conversation
markdroth
left a comment
There was a problem hiding this comment.
Stepping back a bit, I'm not really convinced that this is the right fix for this problem. The whole point of calling grpc_init() at channel creation and grpc_shutdown() at channel destruction was to ensure that all channels are destroyed before shutting down, and this would eliminate that safety check.
I think a better approach might be to change the ALTS handshaker to call grpc_insecure_channel_create() from a callback that runs on the ExecCtx rather than doing it inline.
| /* The same as grpc_insecure_channel_create expect for: | ||
| * - doesn't create a new ExecCtx | ||
| * - doesn't call grpc_init or grpc_shutdown internally */ | ||
| grpc_channel* grpc_insecure_channel_create_internal( |
There was a problem hiding this comment.
Instead of duplicating this code, how about just having the ALTS handshaker set the channel arg before it calls grpc_insecure_channel_create()?
I don't actually care about the ExecCtx stacking. The code can currently handle that, and we're trying to make ExecCtx go away completely in the long run anyway.
There was a problem hiding this comment.
I don't actually care about the ExecCtx stacking. The code can currently handle that, and we're trying to make ExecCtx go away completely in the long run anyway.
A little late response but not sure I understand here BTW. Speaking generally, my view is that ExecCtx stacking is error prone due to re-entrant locking risk. #20596 actually tries hard to avoid it.
There was a problem hiding this comment.
@vjpai or @yashykt can comment in more detail, but it's my understanding that the ExecCtx code automatically detects when this stacking is happening and basically shunts all of the queued closures to the "root" ExecCtx instance, so there should not be any risk of deadlock. At this point, I'd rather be putting effort into making ExecCtx go away instead of jumping through hoops trying to avoid stacking.
There was a problem hiding this comment.
Currently, ExecCtx can get stacked on top of one another. Closures are always queued to the most recent/top ExecCtx.
Also, it doesn't look like ExecCtx is gonna go away :)
There was a problem hiding this comment.
I think we need to put effort into either eliminating the stacking functionality or removing ExecCtx altogether. We'll have to do some experiments to see which way we'll go.
| // grpc_shutdown() when the channel is actually destroyed, thus | ||
| // ensuring that shutdown is deferred until that point. | ||
| grpc_init(); | ||
| if (!grpc_channel_args_find_bool( |
There was a problem hiding this comment.
Instead of calling grpc_channel_args_find_bool() multiple times, please call it once and save the result in a local bool, which can be checked multiple times.
Working on trying out the other approach. Note that one main motivation for this though is related to but not the same as the issue in the TSAN warning: I can't see what theoretically prevents accidentally running WDYT? |
|
I think that we should never wind up creating a new channel from inside of |
|
Makes sense, thanks. In that case, since the planned changes are now in the ALTS handshaker code, I will probably lump those changes to create the ALTS channel from an ExecCtx callback into the other changes involving the mentioned ALTS test. |
While working on a test exercising the ALTS handshaker code and running under TSAN, I got this TSAN warning during the test's shutdown:
It seems TSAN complains because the shutdown thread acquires:
g_init_mu(M202) -> a subchannel mutex (M563367029106029600)A different thread acquired these locks while creating a TCP connection and subsequently doing an ALTS handshake:
M563367029106029600 -> M562803941713156712 (connector mutex) -> M202 (while ALTS handshaker created it's inner grpc channel)
I'm not 100% sure about the real danger of TSAN warning, since I believe M563367029106029600 should always be unlocked before locking M562803941713156712. However, one thing pointed out by this warning is that it seems risky for the ALTS handshake code to be using
grpc_insecure_channel_create:grpc_initagain from within theshutdown_internal_lockedthread.ExecCtxstackingSo this PR creates an internal variant of
grpc_insecure_channel_create, mainly for use by the ALTS handshaker (fixes the TSAN warning).