This repository was archived by the owner on Sep 30, 2024. It is now read-only.
[Backport 5.1] grpc: tweak internal error logic to log initial request for all error types (if enabled)#55232
Merged
Merged
Conversation
ggilmore
approved these changes
Jul 24, 2023
Contributor
coury-clark
approved these changes
Jul 24, 2023
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
overview
This PR tweaks the internal error interceptor implementation to be able to print the initial request message for all internal error types, not just utf-8 errors.
This feature is still off by default for security / privacy reasons. It can be enabled via the
SRC_GRPC_INTERNAL_ERROR_LOGGING_LOG_PROTOBUF_MESSAGES_ENABLEDenvironment variable.notes
size limits
If the protobuf message that we're trying to log is quite large (hundreds of megabytes or gigabyte+), rendering the JSON representation of the message could be quite expensive. (The JSON form of the message is larger than the original protobuf message, so attempting to log a large message could result in excessive memory usage. Ex: a 5 GB proto message could result in a 6 gigabyte JSON string, resulting in 11 GB of total allocations).
To solve this, I introduced an environment variable
SRC_GRPC_INTERNAL_ERROR_LOGGING_LOG_PROTOBUF_MESSAGES_HANDLING_MAX_MESSAGE_SIZE_BYTES. If the protobuf message is larger than this (calculated usingproto.Size), then it won't be turned into JSON at all. (An error message will be returned instead).The default value is 100 MB.
environment variable name changes
Since we log in more cases then just for non-uft8 string errors now, I tweak the names of the environment variables to reflect that:
(The
SRC_GRPC_INTERNAL_ERROR_LOGGING_LOG_PROTOBUF_MESSAGES_HANDLING_MAX_MESSAGE_SIZE_BYTESenvironment variable didn't exist before, since the concept of skipping logging a message in the first place because it was too large didn't exist until now).Since I changed the environment variables name, this will mean that that any instances that had the message logging feature enabled will have it disabled. Luckily, this only affects sourcegraph.com. After this PR is merged and lands, I'll adjust sourcegraph.com to change the appropriate environment variable names (in the same way as I did it in this PR https://github.com/sourcegraph/deploy-sourcegraph-cloud/pull/17936)
Test plan
Local testing.
I applied the following diff to force a non-utf8 error, ran a search for
context:global test type:diff, and saw the same logs as I expected before:In order to test the
SRC_GRPC_INTERNAL_ERROR_LOGGING_LOG_PROTOBUF_MESSAGES_HANDLING_MAX_MESSAGE_SIZE_BYTESfeature, I manually set it to 100 bytes and re-ran the same search:I tested that JSON truncation still worked by setting
SRC_GRPC_INTERNAL_ERROR_LOGGING_LOG_PROTOBUF_MESSAGES_JSON_TRUNCATION_SIZE_BYTESto 100 bytes and re-running the same search:I tested that disabling the feature still worked by setting
SRC_GRPC_INTERNAL_ERROR_LOGGING_LOG_PROTOBUF_MESSAGES_ENABLEDtofalseand re-running the same test:Lastly, I checked to make sure that the new logic did actually print proto messages for other kinds of internal errors. I applied the following diff to induce a "message size too large" error:
After re-running the same search, I saw the following expected log messages:
Backport c6194a2 from #55130