gitserver: relax GetCommitResponse protobuf fields to allow for arbitrary byte sequences#61940
Conversation
7e070d5 to
8b50956
Compare
…ary byte sequences commit-id:1af7fa34
8b50956 to
f24b57c
Compare
| GitSignature author = 2; | ||
| GitSignature committer = 3; | ||
| string message = 4; | ||
| bytes message = 4; |
There was a problem hiding this comment.
How does versioning work with protobuf? This is a breaking change , but its not a public api correct? So we are the only consumers?
There was a problem hiding this comment.
This actually isn't a breaking change, as utf-8 encoded bytes (string) are a subset of arbitrary byte sequences.
https://protobuf.dev/programming-guides/proto3/#updating
stringandbytesare compatible as long as the bytes are valid UTF-8.
(For old clients that still have the string type any non-utf8 encoded strings would be rejected anyway)
There was a problem hiding this comment.
Yep this is breaking to my understanding, I think until we finalize this API we can keep doing that for simplicity. If we should ever start independently versioning gitserver and sourcegraph or start hitting this API elsewhere, we should be stricter.
There was a problem hiding this comment.
| Name: p.GetAuthor().GetName(), | ||
| Email: p.GetAuthor().GetEmail(), | ||
| // TODO@ggilmore: It's entirely possible that the "name" could include non-utf8 characters, as there is no such enforcement in the git cli. We should consider using []byte here. | ||
| Name: string(p.GetAuthor().GetName()), |
There was a problem hiding this comment.
What will happen if name has no utf8 at this point, will it blow up? Does string constructor require utf8?
There was a problem hiding this comment.
No. string is not a constructor, but it is a type conversion (https://go.dev/tour/basics/13). There is nothing inherent about the string type in go enforces that the contents are utf-8 encoded. However, many of the operations that you do (e.x. using the strings package) assume this.
It’s important to state right up front that a string holds arbitrary bytes. It is not required to hold Unicode text, UTF-8 text, or any other predefined format. As far as the content of a string is concerned, it is exactly equivalent to a slice of bytes.
There was a problem hiding this comment.
Is the suggestion to use byte here to signify better what it can contain, or some other reason?
There was a problem hiding this comment.
to signify better what it can contain
Yes, that was my thinking. However, a quick find references showed me that this was a larger change than I wanted to make in this PR.
There was a problem hiding this comment.
At some point though someone needs to use this as a string right? Feels like a hot potato flowing along without anyone knowing what to do with it :)
| Name: p.GetAuthor().GetName(), | ||
| Email: p.GetAuthor().GetEmail(), | ||
| // TODO@ggilmore: It's entirely possible that the "name" could include non-utf8 characters, as there is no such enforcement in the git cli. We should consider using []byte here. | ||
| Name: string(p.GetAuthor().GetName()), |
There was a problem hiding this comment.
Is the suggestion to use byte here to signify better what it can contain, or some other reason?
| GitSignature author = 2; | ||
| GitSignature committer = 3; | ||
| string message = 4; | ||
| bytes message = 4; |
There was a problem hiding this comment.
Yep this is breaking to my understanding, I think until we finalize this API we can keep doing that for simplicity. If we should ever start independently versioning gitserver and sourcegraph or start hitting this API elsewhere, we should be stricter.
|
The backport to To backport this PR manually, you can either: Via the sg toolUse the sg backport -r 5.3.9104 -p 61940Via your terminalTo backport manually, run these commands in your terminal: # Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-5.3.9104 5.3.9104
# Navigate to the new working tree
cd .worktrees/backport-5.3.9104
# Create a new branch
git switch --create backport-61940-to-5.3.9104
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 22d9370786515e4c25eb20487d0a8323ecd46fdc
# Push it to GitHub
git push --set-upstream origin backport-61940-to-5.3.9104
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-5.3.9104If you encouter conflict, first resolve the conflict and stage all files, then run the commands below: git cherry-pick --continue
# Push it to GitHub
git push --set-upstream origin backport-61940-to-5.3.9104
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-5.3.9104
Once the pull request has been created, please ensure the following:
|
On customer instances and on sourcegraph.com, we get the following errors like the following from the gRPC internal error reporter:
https://cloudlogging.app.goo.gl/qWAYRLbRrNRQuZGh9
{ "insertId": "iyzf7vr820mc1o9q", "jsonPayload": { "message": "grpc: error while marshaling: string field contains invalid UTF-8", "timestampNanos": 1713275374946696000, "Resource": { "service.version": "269208_2024-04-15_5.3-294d4b47f870", "service.name": "worker", "service.instance.id": "worker-7fdf796bb5-4dktd" }, "InstrumentationScope": "gitserver.client.gRPC.internal.error.reporter.unaryMethod", "Attributes": { "messageJSON": "{\"repo_name\":\"github.com/weiss/bcfg2-accounts\",\"commit\":\"HEAD\"}", "grpcCode": "Internal", "grpcService": "gitserver.v1.GitserverService", "nonUTF8StringFields": [], "initialRequestJSON": "{\"repo_name\":\"github.com/weiss/bcfg2-accounts\",\"commit\":\"HEAD\"}", "grpcMethod": "GetCommit" }, "Function": "github.com/sourcegraph/sourcegraph/internal/grpc/internalerrs.doLog", "Caller": "internalerrs/logging.go:240" }, "resource": { "type": "k8s_container", "labels": { "cluster_name": "cloud", "container_name": "worker", "location": "us-central1-f", "pod_name": "worker-7fdf796bb5-4dktd", "project_id": "sourcegraph-dev", "namespace_name": "prod" } }, "timestamp": "2024-04-16T13:49:34.946909490Z", "severity": "ERROR", "labels": { "compute.googleapis.com/resource_name": "gke-cloud-containerd-pool-3ff685fe-6vw3", "k8s-pod/app": "worker", "k8s-pod/deploy": "sourcegraph", "k8s-pod/pod-template-hash": "7fdf796bb5" }, "logName": "projects/sourcegraph-dev/logs/stderr", "receiveTimestamp": "2024-04-16T13:49:38.542517006Z" }This error comes from the gitserver gRPC server when it tries to serialize a response to GetCommit:
https://github.com/sourcegraph/sourcegraph/blob/04509ed107e53264eea63c484a750b5a4a290711/internal/gitserver/v1/gitserver.proto#L277-L287
In protobuf, all
stringfields must be utf-8 encoded. However, the git cli allows for arbitrary encodings for things like the commit message, author, and email. So, in order to pass this information along correctly - we need to change the type for those fields fromstringtobytes- which allows for arbitrary byte sequences.Test plan