[Backport 5.1] grpc: symbols: spread LocalCodeIntel response symbols across multiple messages by github-actions[bot] · Pull Request #55292 · sourcegraph/sourcegraph-public-snapshot

github-actions · 2023-07-25T21:25:45Z

This PR changes the LocalCodeIntel symbols gRPC method to return a stream of chunks of symbols, rather than the entire result set in a single message.

gRPC has default limits on how large individual messages can be. This is because gRPC is a message-based framework, which means that messages have to be entirely loaded into memory for both sending and receiving. As such, large allocations can negatively impact server performance and stability. The go-grpc default limit is 4MB.

The original LocalCodeIntel RPC implementation (both gRPC and REST) returned the entire set of symbols in one message. For certain repositories / files, this message can contain thousands of symbols, which can add up to hundreds of MB or even gigabytes of memory.

This PR adjusts the server-side implementation of LocalCodeIntel to chunk up its full result set into smaller chunks (so that each individual gRPC message is smaller). It does this in a few steps.

1. Adopting a chunk utility I found in the Gitaly project that can intelligently send a group of protobuf messages in smaller chunks (all ~1MB): https://github.com/sourcegraph/sourcegraph/pull/55242/commits/59ba6d0aba0c98a82d46b7a2f24127bbf97181cc
   - I made some minor tweaks to the original code here:
     - using generics for type-safety
     - supporting variadic arguments for convenience / ergonomics
     - removing some unneeded gitaly packages from the test setup
2. Using the above chunk utility in the Symbols service to divide up the list of returned symbols into ~1MB batches and send them across the result stream: https://github.com/sourcegraph/sourcegraph/pull/55242/commits/77fc77ee88e638c31c3ba8cba2b9f50f225503c5
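To make the chunking idea concrete, here is a minimal, stdlib-only sketch of a size-based chunker. It is not the code from this PR or from Gitaly: the `sized` interface, the `symbol` type, and the `chunkNames` helper are hypothetical stand-ins (a real implementation would measure protobuf messages, not string lengths), and the 8-byte limit in `main` is only there to keep the example small.

```go
package main

import "fmt"

// sized is a hypothetical stand-in for a protobuf message: anything that can
// report its encoded size in bytes.
type sized interface {
	Size() int
}

// symbol is a toy message type used only for this illustration.
type symbol struct {
	Name string
}

func (s symbol) Size() int { return len(s.Name) }

// chunker buffers messages and hands them to send whenever adding the next
// message would push the buffered size past maxBytes.
type chunker[T sized] struct {
	maxBytes int
	send     func([]T) error
	buf      []T
	bufBytes int
}

func newChunker[T sized](maxBytes int, send func([]T) error) *chunker[T] {
	return &chunker[T]{maxBytes: maxBytes, send: send}
}

// Send accepts any number of messages (variadic for convenience).
func (c *chunker[T]) Send(msgs ...T) error {
	for _, m := range msgs {
		n := m.Size()
		if len(c.buf) > 0 && c.bufBytes+n > c.maxBytes {
			if err := c.Flush(); err != nil {
				return err
			}
		}
		c.buf = append(c.buf, m)
		c.bufBytes += n
	}
	return nil
}

// Flush sends whatever is still buffered; call it once after the last Send.
func (c *chunker[T]) Flush() error {
	if len(c.buf) == 0 {
		return nil
	}
	err := c.send(c.buf)
	c.buf, c.bufBytes = nil, 0
	return err
}

// chunkNames is a hypothetical helper that records each batch instead of
// writing it to a gRPC stream.
func chunkNames(maxBytes int, names ...string) ([][]string, error) {
	var batches [][]string
	c := newChunker[symbol](maxBytes, func(batch []symbol) error {
		out := make([]string, len(batch))
		for i, s := range batch {
			out[i] = s.Name
		}
		batches = append(batches, out)
		return nil
	})
	for _, n := range names {
		if err := c.Send(symbol{Name: n}); err != nil {
			return nil, err
		}
	}
	if err := c.Flush(); err != nil {
		return nil, err
	}
	return batches, nil
}

func main() {
	batches, err := chunkNames(8, "foo", "bar", "bazqux", "x")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(batches), batches) // prints "2 [[foo bar] [bazqux x]]"
}
```

In this sketch a single message larger than `maxBytes` is still sent, alone in its own batch, so the limit is a target rather than a hard cap.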

Note that this PR doesn't "fix" the real issues with the underlying application itself. Notably:

- The server is still calculating the entire result set in one shot (requiring it to hold all the symbols in a contiguous chunk of memory), as opposed to sending out incremental progress.
- Is it necessary for the server to return thousands upon thousands of symbols, or should it support some sort of "limit" parameter?
- The custom symbols client API still returns all these symbols in one giant slice (by joining all the received messages), so it still allocates as much memory as before, as opposed to some sort of API where the result can be consumed incrementally.
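The last point can be illustrated with a short sketch contrasting the two client shapes. Everything here is hypothetical: `chunkStream` stands in for a generated gRPC client stream, `sliceStream` is an in-memory fake, and `collectAll` / `forEachChunk` are illustrative names, not the actual symbols client API.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// chunkStream is a hypothetical stand-in for a gRPC response stream whose
// Recv returns one chunk of symbol names, or io.EOF when the stream ends.
type chunkStream interface {
	Recv() ([]string, error)
}

// sliceStream is an in-memory fake used to exercise the two consumers.
type sliceStream struct {
	chunks [][]string
}

func (s *sliceStream) Recv() ([]string, error) {
	if len(s.chunks) == 0 {
		return nil, io.EOF
	}
	next := s.chunks[0]
	s.chunks = s.chunks[1:]
	return next, nil
}

// collectAll joins every chunk into one slice, so peak memory is still
// proportional to the full result set.
func collectAll(s chunkStream) ([]string, error) {
	var all []string
	for {
		chunk, err := s.Recv()
		if errors.Is(err, io.EOF) {
			return all, nil
		}
		if err != nil {
			return nil, err
		}
		all = append(all, chunk...)
	}
}

// forEachChunk hands each chunk to fn as it arrives, so the caller only
// needs to hold one chunk at a time.
func forEachChunk(s chunkStream, fn func([]string) error) error {
	for {
		chunk, err := s.Recv()
		if errors.Is(err, io.EOF) {
			return nil
		}
		if err != nil {
			return err
		}
		if err := fn(chunk); err != nil {
			return err
		}
	}
}

func main() {
	all, err := collectAll(&sliceStream{chunks: [][]string{{"a", "b"}, {"c"}}})
	if err != nil {
		panic(err)
	}
	fmt.Println("joined:", all)

	count := 0
	err = forEachChunk(&sliceStream{chunks: [][]string{{"a", "b"}, {"c"}}}, func(chunk []string) error {
		count += len(chunk)
		return nil
	})
	if err != nil {
		panic(err)
	}
	fmt.Println("seen incrementally:", count)
}
```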

However, we still aren't any worse off than we were before (the REST implementation still has the same memory allocation problems and sends the results in one giant JSON message). This PR does "work around" the gRPC-specific message size complaints while providing a building block to improve this in the future.

cc @sourcegraph/code-intel ^

Test plan

CI
Manual testing:

Using the repository and file mentioned in this log message, run a local sourcegraph instance with the following diff applied (to simulate the absence of the chunker by setting the message size to 1 TB)

diff --git a/internal/grpc/chunk/chunker.go b/internal/grpc/chunk/chunker.go
index 947b721368..c954854fc6 100644
--- a/internal/grpc/chunk/chunker.go
+++ b/internal/grpc/chunk/chunker.go
@@ -42,7 +42,7 @@ type Chunker[T Message] struct {
}

// maxMessageSize is the maximum size per protobuf message
-const maxMessageSize = 1 * 1024 * 1024
+const maxMessageSize = 1 * 1024 * 1024 * 1024 * 1024

Navigate to the above-mentioned repository and file, and hover over any token - see the expected log message:

 [       frontend] ERROR symbolsConnectionCache.gRPC.internal.error.reporter.streamingMethod.postMessageReceive internalerrs/logging.go:239 grpc: received message larger than max (708588568 vs. 94371840) {"grpcService": "symbols.v1.SymbolsService", "grpcMethod": "LocalCodeIntel", "grpcCode": "ResourceExhausted", "initialRequestJSON": "[omitted]"}

Revert the above diff (which should restore the 1MB chunking behavior), hard-refresh the blob page, and hover over the same token. On my machine, the request eventually completes (after 30s) and a hover appears - no error occurs.
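For a sense of scale, the two byte counts in the error message above can be converted with a few lines of Go. This is only back-of-the-envelope arithmetic on the numbers in the log line and the `maxMessageSize` constant from the diff; `ceilDiv` is a hypothetical helper, and since the chunker targets roughly 1MB per message rather than exactly 1 MiB, the real number of messages may differ.

```go
package main

import "fmt"

const (
	receivedBytes = 708588568       // size of the rejected message, from the log line
	limitBytes    = 94371840        // receive limit reported in the same log line
	chunkBytes    = 1 * 1024 * 1024 // maxMessageSize from the diff
)

// ceilDiv returns the smallest integer >= a/b for positive a and b.
func ceilDiv(a, b int64) int64 {
	return (a + b - 1) / b
}

func main() {
	fmt.Printf("limit: %d MiB\n", limitBytes/chunkBytes)
	fmt.Printf("message: %.1f MiB\n", float64(receivedBytes)/float64(chunkBytes))
	fmt.Printf("chunks at exactly 1 MiB each: %d\n", ceilDiv(receivedBytes, chunkBytes))
}
```

The limit works out to 90 MiB, and a message of this size would need 676 chunks if each were exactly 1 MiB.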
Backport 2a6c569 from grpc: symbols: spread LocalCodeIntel response symbols across multiple messages #55242


sourcegraph-bot · 2023-07-25T21:36:22Z

📖 Storybook live preview

grpc: symbols: spread LocalCodeIntel response symbols across multiple…

c1ccd66

… messages (#55242) Co-authored-by: Camden Cheek <camden@ccheek.com> (cherry picked from commit 2a6c569)

github-actions Bot requested a review from ggilmore July 25, 2023 21:25

github-actions Bot added cla-signed backports backported-to-5.1 labels Jul 25, 2023

ggilmore referenced this pull request Jul 25, 2023

grpc: move localcodeintel changelog entry from unreleased to 5.1.5 se…

501e0da

…ction See https://github.com/sourcegraph/sourcegraph/pull/55292

ggilmore mentioned this pull request Jul 25, 2023

grpc: move localcodeintel changelog entry from unreleased to 5.1.5 section #55293

Merged

ggilmore approved these changes Jul 25, 2023

BolajiOlajide approved these changes Jul 26, 2023

BolajiOlajide merged commit 276b100 into 5.1 Jul 26, 2023

BolajiOlajide deleted the backport-55242-to-5.1 branch July 26, 2023 14:04
