Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

grpc: searcher: fix race condition with server.send message#54500

Merged
ggilmore merged 2 commits into
mainfrom
fix-content-type
Jun 30, 2023
Merged

grpc: searcher: fix race condition with server.send message#54500
ggilmore merged 2 commits into
mainfrom
fix-content-type

Conversation

@ggilmore

@ggilmore ggilmore commented Jun 30, 2023

Copy link
Copy Markdown
Contributor

original issue

Slack link: https://sourcegraph.slack.com/archives/C04HCK4K3DL/p1687381299999559

I'm getting this error running some searches using the internal search client

insightsSearchClient searcher/search.go:120 searchFilesInRepo failed {"error": "rpc error: code = Unknown desc = transport: received unexpected content-type \"application/octet-stream\"", "repo": "github.com/sourcegraph/sourcegraph"}

background

The documentation for grpc.ServerStream.SendMsg() states:

// SendMsg sends a message. On error, SendMsg aborts the stream and the
// error is returned directly.
// ...
// It is safe to have a goroutine calling SendMsg and another goroutine
// calling RecvMsg on the same stream at the same time, but it is not safe
// to call SendMsg on the same stream in different goroutines
.
// ...

The gRPC searcher.Search provides a callback to the search implementation that tells it to invoke grpc.SendMsg after it finds a given match:

https://github.com/sourcegraph/sourcegraph/blob/57932ebfd7056abf5a8158979cecd42b51267414/cmd/searcher/internal/search/search_grpc.go#L17-L27

The search impementation, delegates to regexSearch:

https://github.com/sourcegraph/sourcegraph/blob/0018a1223d531f3448d5119f80b7d2cfef29f02f/cmd/searcher/internal/search/search.go#L286-L288

And regexSearch ends up invoking the callback concurrently across a worker pool of goroutines:

https://github.com/sourcegraph/sourcegraph/blob/f10b37d2e2fc49e8a5eba3f40a88808420a6968d/cmd/searcher/internal/search/search_regex.go#L324-L362

Since SendMsg is now called across multiple goroutines, the Write() method in the HTTP2 gRPC stack now unexpectedly has two threadings running it:

https://github.com/grpc/grpc-go/blob/2997e84fd8d18ddb000ac6736129b48b3c9773ec/internal/transport/handler_server.go#L304-L315

This makes it possible for one thread to say that it's going to write the headers and park before it actually does so. The other thread continues executing and ends up using the default application/octet-stream content type since the other thread hasn't set the correct content type yet:

https://github.com/golang/net/blob/daac0cec0cf964a628a29bb4b82940c225b921ed/http2/server.go#L2580-L2582

this pr

This PR fixes the issue by wrapping a mutex around regexSearch's use of the callback - now only one thread can call grpc.SendMsg at a time.

This PR also introduces additional monitoring to flag the original transport message as an "internal error" (see https://github.com/sourcegraph/sourcegraph/pull/51749). While such race conditions won't always manifest in this way, seeing this error is a certain sign that something is wrong.

Test plan

CI

@cla-bot cla-bot Bot added the cla-signed label Jun 30, 2023
@ggilmore ggilmore requested review from coury-clark and mucles June 30, 2023 22:50
@ggilmore ggilmore marked this pull request as ready for review June 30, 2023 22:50
@sourcegraph-bot

Copy link
Copy Markdown
Contributor

Codenotify: Notifying subscribers in CODENOTIFY files for diff 57932eb...615d842.

Notify File(s)
@keegancsmith cmd/searcher/internal/search/search_grpc.go

@ggilmore ggilmore enabled auto-merge (squash) June 30, 2023 22:53

@mucles mucles left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This Looks good to me! thank you for all the background information made it very easy to follow what was happening 👍

@ggilmore ggilmore merged commit ca5304f into main Jun 30, 2023
@ggilmore ggilmore deleted the fix-content-type branch June 30, 2023 23:09
github-actions Bot pushed a commit that referenced this pull request Jun 30, 2023
BolajiOlajide pushed a commit that referenced this pull request Jul 3, 2023
…ssage (#54506)

## original issue 

Slack link: https://sourcegraph.slack.com/archives/C04HCK4K3DL/p1687381299999559

>I'm getting this error running some searches using the internal search client
>
> 
> ```
> insightsSearchClient searcher/search.go:120 searchFilesInRepo failed {"error": "rpc error: code = Unknown desc = transport: received unexpected content-type \"application/octet-stream\"", "repo": "github.com/sourcegraph/sourcegraph"}
> ```
> 

## background 


The documentation for `grpc.ServerStream.SendMsg()` [states](https://github.com/grpc/grpc-go/blob/5b67e5ea449ef0686a0c0b6de48cd4cb63e3db2a/stream.go#L1509C1-L1515):

> 
>    // SendMsg sends a message. On error, SendMsg aborts the stream and the
>    // error is returned directly.
>    // ...
>    // It is safe to have a goroutine calling SendMsg and another goroutine
>    // calling RecvMsg on the same stream at the same time, **but it is not safe
>    // to call SendMsg on the same stream in different goroutines**.
>    // ...


The gRPC searcher.Search provides a callback to the [search implementation](https://github.com/sourcegraph/sourcegraph/blob/0018a1223d531f3448d5119f80b7d2cfef29f02f/cmd/searcher/internal/search/search.go#L141) that tells it to invoke grpc.SendMsg after it finds a given match:

https://github.com/sourcegraph/sourcegraph/blob/57932ebfd7056abf5a8158979cecd42b51267414/cmd/searcher/internal/search/search_grpc.go#L17-L27

The search impementation, delegates to [regexSearch](https://github.com/sourcegraph/sourcegraph/blob/f10b37d2e2fc49e8a5eba3f40a88808420a6968d/cmd/searcher/internal/search/search_regex.go#L263):

https://github.com/sourcegraph/sourcegraph/blob/0018a1223d531f3448d5119f80b7d2cfef29f02f/cmd/searcher/internal/search/search.go#L286-L288


And [regexSearch](https://github.com/sourcegraph/sourcegraph/blob/f10b37d2e2fc49e8a5eba3f40a88808420a6968d/cmd/searcher/internal/search/search_regex.go#L263) ends up invoking the callback _concurrently_ across a worker pool of goroutines:

https://github.com/sourcegraph/sourcegraph/blob/f10b37d2e2fc49e8a5eba3f40a88808420a6968d/cmd/searcher/internal/search/search_regex.go#L324-L362


Since SendMsg is now called across multiple goroutines, the Write() method in the HTTP2 gRPC stack now unexpectedly has two threadings running it:

https://github.com/grpc/grpc-go/blob/2997e84fd8d18ddb000ac6736129b48b3c9773ec/internal/transport/handler_server.go#L304-L315

This makes it possible for one thread to say that it's going to write the headers and park before it actually does so. The other thread continues executing and ends up using the default application/octet-stream content type since the other thread hasn't set the correct content type yet:

https://github.com/golang/net/blob/daac0cec0cf964a628a29bb4b82940c225b921ed/http2/server.go#L2580-L2582


## this pr

This PR fixes the issue by wrapping a mutex around `regexSearch`'s use of the callback - now only one thread can call grpc.SendMsg at a time.

This PR also introduces additional monitoring to flag the original transport message as an "internal error" (see https://github.com/sourcegraph/sourcegraph/pull/51749). While such race conditions won't always manifest in this way, seeing this error is a certain sign that something is wrong. 



## Test plan

CI
 <br> Backport ca5304f from #54500

@camdencheek camdencheek left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, good catch!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants