Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

gitserver: relax GetCommitResponse protobuf fields to allow for arbitrary byte sequences#61940

Merged
ggilmore merged 1 commit into
mainfrom
spr/main/1af7fa34
Apr 16, 2024
Merged

gitserver: relax GetCommitResponse protobuf fields to allow for arbitrary byte sequences#61940
ggilmore merged 1 commit into
mainfrom
spr/main/1af7fa34

Conversation

@ggilmore

@ggilmore ggilmore commented Apr 16, 2024

Copy link
Copy Markdown
Contributor

On customer instances and on sourcegraph.com, we get the following errors like the following from the gRPC internal error reporter:

https://cloudlogging.app.goo.gl/qWAYRLbRrNRQuZGh9

{
  "insertId": "iyzf7vr820mc1o9q",
  "jsonPayload": {
    "message": "grpc: error while marshaling: string field contains invalid UTF-8",
    "timestampNanos": 1713275374946696000,
    "Resource": {
      "service.version": "269208_2024-04-15_5.3-294d4b47f870",
      "service.name": "worker",
      "service.instance.id": "worker-7fdf796bb5-4dktd"
    },
    "InstrumentationScope": "gitserver.client.gRPC.internal.error.reporter.unaryMethod",
    "Attributes": {
      "messageJSON": "{\"repo_name\":\"github.com/weiss/bcfg2-accounts\",\"commit\":\"HEAD\"}",
      "grpcCode": "Internal",
      "grpcService": "gitserver.v1.GitserverService",
      "nonUTF8StringFields": [],
      "initialRequestJSON": "{\"repo_name\":\"github.com/weiss/bcfg2-accounts\",\"commit\":\"HEAD\"}",
      "grpcMethod": "GetCommit"
    },
    "Function": "github.com/sourcegraph/sourcegraph/internal/grpc/internalerrs.doLog",
    "Caller": "internalerrs/logging.go:240"
  },
  "resource": {
    "type": "k8s_container",
    "labels": {
      "cluster_name": "cloud",
      "container_name": "worker",
      "location": "us-central1-f",
      "pod_name": "worker-7fdf796bb5-4dktd",
      "project_id": "sourcegraph-dev",
      "namespace_name": "prod"
    }
  },
  "timestamp": "2024-04-16T13:49:34.946909490Z",
  "severity": "ERROR",
  "labels": {
    "compute.googleapis.com/resource_name": "gke-cloud-containerd-pool-3ff685fe-6vw3",
    "k8s-pod/app": "worker",
    "k8s-pod/deploy": "sourcegraph",
    "k8s-pod/pod-template-hash": "7fdf796bb5"
  },
  "logName": "projects/sourcegraph-dev/logs/stderr",
  "receiveTimestamp": "2024-04-16T13:49:38.542517006Z"
}

This error comes from the gitserver gRPC server when it tries to serialize a response to GetCommit:

https://github.com/sourcegraph/sourcegraph/blob/04509ed107e53264eea63c484a750b5a4a290711/internal/gitserver/v1/gitserver.proto#L277-L287

In protobuf, all string fields must be utf-8 encoded. However, the git cli allows for arbitrary encodings for things like the commit message, author, and email. So, in order to pass this information along correctly - we need to change the type for those fields from string to bytes - which allows for arbitrary byte sequences.

Test plan

@cla-bot cla-bot Bot added the cla-signed label Apr 16, 2024
@github-actions github-actions Bot added team/product-platform team/source Tickets under the purview of Source - the one Source to graph it all labels Apr 16, 2024
@ggilmore ggilmore requested a review from a team April 16, 2024 19:31
@ggilmore ggilmore force-pushed the spr/main/1af7fa34 branch from 7e070d5 to 8b50956 Compare April 16, 2024 19:33
@ggilmore ggilmore force-pushed the spr/main/1af7fa34 branch from 8b50956 to f24b57c Compare April 16, 2024 19:35
@ggilmore ggilmore changed the title gitserver: relax GetCommitMessage protobuf fields to allow for arbitrary byte sequences gitserver: relax GetCommitResponse protobuf fields to allow for arbitrary byte sequences Apr 16, 2024
GitSignature author = 2;
GitSignature committer = 3;
string message = 4;
bytes message = 4;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does versioning work with protobuf? This is a breaking change , but its not a public api correct? So we are the only consumers?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This actually isn't a breaking change, as utf-8 encoded bytes (string) are a subset of arbitrary byte sequences.

https://protobuf.dev/programming-guides/proto3/#updating

string and bytes are compatible as long as the bytes are valid UTF-8.

(For old clients that still have the string type any non-utf8 encoded strings would be rejected anyway)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this is breaking to my understanding, I think until we finalize this API we can keep doing that for simplicity. If we should ever start independently versioning gitserver and sourcegraph or start hitting this API elsewhere, we should be stricter.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL! Thanks 😊

Name: p.GetAuthor().GetName(),
Email: p.GetAuthor().GetEmail(),
// TODO@ggilmore: It's entirely possible that the "name" could include non-utf8 characters, as there is no such enforcement in the git cli. We should consider using []byte here.
Name: string(p.GetAuthor().GetName()),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What will happen if name has no utf8 at this point, will it blow up? Does string constructor require utf8?

@ggilmore ggilmore Apr 16, 2024

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. string is not a constructor, but it is a type conversion (https://go.dev/tour/basics/13). There is nothing inherent about the string type in go enforces that the contents are utf-8 encoded. However, many of the operations that you do (e.x. using the strings package) assume this.

https://go.dev/blog/strings

It’s important to state right up front that a string holds arbitrary bytes. It is not required to hold Unicode text, UTF-8 text, or any other predefined format. As far as the content of a string is concerned, it is exactly equivalent to a slice of bytes.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the suggestion to use byte here to signify better what it can contain, or some other reason?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to signify better what it can contain

Yes, that was my thinking. However, a quick find references showed me that this was a larger change than I wanted to make in this PR.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At some point though someone needs to use this as a string right? Feels like a hot potato flowing along without anyone knowing what to do with it :)

@eseliger eseliger left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TY for quick fix 🙂

Name: p.GetAuthor().GetName(),
Email: p.GetAuthor().GetEmail(),
// TODO@ggilmore: It's entirely possible that the "name" could include non-utf8 characters, as there is no such enforcement in the git cli. We should consider using []byte here.
Name: string(p.GetAuthor().GetName()),

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the suggestion to use byte here to signify better what it can contain, or some other reason?

GitSignature author = 2;
GitSignature committer = 3;
string message = 4;
bytes message = 4;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this is breaking to my understanding, I think until we finalize this API we can keep doing that for simplicity. If we should ever start independently versioning gitserver and sourcegraph or start hitting this API elsewhere, we should be stricter.

@ggilmore ggilmore merged commit 22d9370 into main Apr 16, 2024
@ggilmore ggilmore deleted the spr/main/1af7fa34 branch April 16, 2024 19:58
@sourcegraph-release-bot

sourcegraph-release-bot commented Apr 16, 2024

Copy link
Copy Markdown
Collaborator

The backport to 5.3.9104 failed at https://github.com/sourcegraph/sourcegraph/actions/runs/8711777536:

The process '/usr/bin/git' failed with exit code 1

To backport this PR manually, you can either:

Via the sg tool

Use the sg backport command to backport your commit to the release branch.

sg backport -r 5.3.9104 -p 61940
Via your terminal

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-5.3.9104 5.3.9104
# Navigate to the new working tree
cd .worktrees/backport-5.3.9104
# Create a new branch
git switch --create backport-61940-to-5.3.9104
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 22d9370786515e4c25eb20487d0a8323ecd46fdc
# Push it to GitHub
git push --set-upstream origin backport-61940-to-5.3.9104
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-5.3.9104

If you encouter conflict, first resolve the conflict and stage all files, then run the commands below:

git cherry-pick --continue
# Push it to GitHub
git push --set-upstream origin backport-61940-to-5.3.9104
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-5.3.9104
  • Follow above instructions to backport the commit.
  • Create a pull request where the base branch is 5.3.9104 and the compare/head branch is backport-61940-to-5.3.9104., click here to create the pull request.

Once the pull request has been created, please ensure the following:

  • Make sure to tag @sourcegraph/release in the pull request description.

  • kindly remove the release-blocker from this pull request.

@sourcegraph-release-bot sourcegraph-release-bot added release-blocker Prevents us from releasing: https://about.sourcegraph.com/handbook/engineering/releases failed-backport-to-5.3.9104 labels Apr 16, 2024
@ggilmore ggilmore removed the release-blocker Prevents us from releasing: https://about.sourcegraph.com/handbook/engineering/releases label Apr 17, 2024
ggilmore added a commit that referenced this pull request Apr 17, 2024
…rary byte sequences (#61940)

commit-id:1af7fa34
(cherry picked from commit 22d9370)
jdpleiness pushed a commit that referenced this pull request Apr 17, 2024
…w for arbitrary byte sequences (#61980)

gitserver: relax GetCommitResponse protobuf fields to allow for arbitrary byte sequences (#61940)

commit-id:1af7fa34
(cherry picked from commit 22d9370)
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants