Skip to content

Fix race in bm_chttp2_transport#20091

Merged
hcaseyal merged 1 commit intogrpc:masterfrom
hcaseyal:bm_chttp2_transport
Aug 28, 2019
Merged

Fix race in bm_chttp2_transport#20091
hcaseyal merged 1 commit intogrpc:masterfrom
hcaseyal:bm_chttp2_transport

Conversation

@hcaseyal
Copy link
Copy Markdown
Contributor

@hcaseyal hcaseyal commented Aug 27, 2019

This PR fixes a race in bm_chttp2_transport whereby the transport op was going out of scope while perform_stream_op_locked was trying to access it. The issue is that the benchmark was not waiting for the cancel stream op to complete before exiting. There's also another issue of trying to destroy the stream before the stream has been cancelled.

Fixes #19877

Note to self: I suspect the benchmarks that I didn't modify may suffer from this race as well, but they're of a different structure from the benchmark that's actually failing in our tests, so I have to experiment more with how to fix them. For now, I fixed 3/5 of the benchmarks, and it's important to get this fix in soon since the bug causes near-continuous failures on master.

@hcaseyal hcaseyal added lang/c++ release notes: no Indicates if PR should not be in release notes labels Aug 27, 2019
@hcaseyal hcaseyal requested review from arjunroy and markdroth August 27, 2019 22:11
@linux-foundation-easycla
Copy link
Copy Markdown

linux-foundation-easycla bot commented Aug 27, 2019

CLA Check
The committers are authorized under a signed CLA.

Copy link
Copy Markdown
Contributor

@arjunroy arjunroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relatively minor nits.

gpr_event* stream_cancel_done = new gpr_event;
gpr_event_init(stream_cancel_done);
std::unique_ptr<Closure> stream_cancel_closure = MakeClosure(
[&](grpc_error* error) { gpr_event_set(stream_cancel_done, (void*)1); });
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Should use static_cast instead of a c-style cast.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also: grpc_error* error is being captured, but not used - is it needed?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where is the c-style cast? Note that I am following the convention of the rest of this file.

I added an assert for the grpc_error. Thanks for catching that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, you mean the (void*)1? I can change that if you feel strongly, but note that it is currently consistent with the rest of this file.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, my preference is to go with what the style guide says (eschewing the use of C-casts).

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering static_cast is failing to compile and the c-style casts are scattered throughout our tests, I'm filing a bug to do this cleanup later and submitting this PR for now.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out there was already a bug filed for this. Re-opening: #14392

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right - since it's a integer it would need to be a reinterpret_cast.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think I'll do that in a followup.

@arjunroy arjunroy self-requested a review August 27, 2019 22:37
@hcaseyal
Copy link
Copy Markdown
Contributor Author

@arjunroy I added some changes to solve another bug in the benchmark; namely, waiting for the stream to be cancelled before attempting to destroy it. PTAL.

Copy link
Copy Markdown
Contributor

@arjunroy arjunroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hcaseyal hcaseyal force-pushed the bm_chttp2_transport branch from 1dae715 to f171523 Compare August 28, 2019 16:30
@hcaseyal hcaseyal force-pushed the bm_chttp2_transport branch from f171523 to 6dfe27a Compare August 28, 2019 17:08
@hcaseyal hcaseyal merged commit 07ba4de into grpc:master Aug 28, 2019
@hcaseyal hcaseyal deleted the bm_chttp2_transport branch August 28, 2019 23:13
@lock lock bot locked as resolved and limited conversation to collaborators Nov 26, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

lang/c++ priority/P1 release notes: no Indicates if PR should not be in release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bm_chttp2_transport consistently failing with segfault

3 participants