fix(github): add workaround for duplicate comment creation#2457
Conversation
Summary of ChangesHello @theakshaypant, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses an issue where duplicate comments were occasionally appearing on GitHub Pull Requests, especially during end-to-end tests, despite only a single API call being logged. The solution implements a defensive strategy by introducing a brief, random delay and re-verifying the absence of an existing comment before proceeding with comment creation, thereby reducing the likelihood of race conditions leading to duplicates. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
🔍 PR Lint Feedback
|
There was a problem hiding this comment.
Code Review
This pull request introduces a workaround for a race condition causing duplicate comments by adding a random sleep and extracting comment lookup logic into listAndFindComment. While addressing the race condition, the listAndFindComment function has security vulnerabilities due to lack of author verification and functional limitations related to pagination, which can lead to comment hijacking, PR flooding, and API exhaustion. Furthermore, for Go versions older than 1.20, math/rand should be seeded at application startup for effective random sleep.
| for _, comment := range comments { | ||
| if re.MatchString(comment.GetBody()) { | ||
| return comment, nil | ||
| } | ||
| } |
There was a problem hiding this comment.
The listAndFindComment function identifies comments to update solely based on a regex match of a marker string in the comment body. It does not verify that the comment was authored by the Pipelines as Code (PAC) GitHub App itself. An attacker can post a comment containing the marker string, causing PAC to identify that comment as its own and overwrite it, allowing unauthorized modification and potential hijacking of the PAC status display. Furthermore, this function only fetches the first 100 comments. For active PRs, this can lead to failing to find recent comments, causing duplicate comments, PR flooding, and API exhaustion. It is crucial to implement author verification and proper pagination to address these security and functional issues.
func (v *Provider) listAndFindComment(ctx context.Context, event *info.Event, marker string) (*github.IssueComment, error) {
re := regexp.MustCompile(regexp.QuoteMeta(marker))
opts := &github.IssueListCommentsOptions{
ListOptions: github.ListOptions{PerPage: 100},
}
for {
comments, resp, err := wrapAPI(v, "list_comments", func() ([]*github.IssueComment, *github.Response, error) {
return v.Client().Issues.ListComments(ctx, event.Organization, event.Repository, event.PullRequestNumber, opts)
})
if err != nil {
return nil, err
}
for _, comment := range comments {
if re.MatchString(comment.GetBody()) {
return comment, nil
}
}
if resp.NextPage == 0 {
break
}
opts.Page = resp.NextPage
}
return nil, nil
}There was a problem hiding this comment.
That is a good point lol, we should probably do this across every place where we do comments...
There was a problem hiding this comment.
Code Implementation 🛠️
Implementation: Tighten comment selection to only consider comments authored by a bot when running as a GitHub App, and prefer the most recent matching comment among the fetched comments to reduce hijacking risk and avoid updating stale matches.
| for _, comment := range comments { | |
| if re.MatchString(comment.GetBody()) { | |
| return comment, nil | |
| } | |
| } | |
| var matched *github.IssueComment | |
| for _, comment := range comments { | |
| user := comment.GetUser() | |
| if user == nil { | |
| continue | |
| } | |
| // If we're running as a GitHub App, only consider bot-authored comments | |
| // to avoid matching/updating attacker-controlled comments containing the marker. | |
| if event.InstallationID != 0 && strings.ToLower(user.GetType()) != "bot" { | |
| continue | |
| } | |
| if re.MatchString(comment.GetBody()) { | |
| // Prefer the most recent match among the fetched comments. | |
| if matched == nil || comment.GetCreatedAt().After(matched.GetCreatedAt()) { | |
| matched = comment | |
| } | |
| } | |
| } | |
| return matched, nil |
See review comment here
There was a problem hiding this comment.
that's actually a good suggestions qodot
There was a problem hiding this comment.
@chmouel qodo suggestion makes sense.
As for the gemini suggestion, shouldn't the edit_comment get a 403 if it tries to edit someone else's comment? Github Docs states that you need write or higher permissions on the repo. That is something we do not give (or recommend) for the app or webhook token, right?
There was a problem hiding this comment.
Ask❓
@chmouel qodo suggestion makes sense. As for the gemini suggestion, shouldn't the edit_comment get a 403 if it tries to edit someone else's comment? Github Docs states that you need write or higher permissions on the repo. That is something we do not give (or recommend) for the app or webhook token, right?
Answer:
GitHub won’t return 403 just because the comment was authored by someone else.
The 403 you’re thinking of is about authorization scope, not comment ownership. If the credential you’re using has the ability to write issue comments on that repo, then the PATCH /repos/{owner}/{repo}/issues/comments/{comment_id} (“edit comment”) endpoint generally allows editing any issue/PR comment in that repo, regardless of who created it.
Concretely for PaC:
- To create PR comments, PaC (GitHub App installation token) typically needs Issues: write (and/or Pull requests: write depending on API usage). With that, it can create and edit issue comments in the repo.
- Repo “write” is not the only way to get this; a GitHub App permission like Issues: write is sufficient to call these APIs. So even if we “don’t recommend write/admin on the repo”, the app still commonly has the relevant write permission to issue comments.
- Therefore, if an attacker posts a comment containing the marker,
listAndFindCommentmay select it and PaC may successfully edit it—no 403—leading to the hijacking scenario the bots flagged (overwriting attacker-controlled comment content or causing confusing status display).
If, in some deployments, the token doesn’t have permission to edit comments, then yes, EditComment would fail (likely 403), but that’s not a safe assumption:
- it would turn the attack into a DoS/flakiness vector (PaC errors trying to edit), and
- it doesn’t address the main risk for the common configuration where PaC does have Issues: write.
So the qodo suggestion (filter to bot/app-authored comments when running as a GitHub App, and prefer the newest match) is a good mitigation, and gemini’s pagination point is also valid (only checking first 100 comments can miss the intended marker and create duplicates).
pkg/provider/github/github.go
Outdated
| // comments. This reduces the window where parallel processes might both | ||
| // see no existing comment and both decide to create one. | ||
| //nolint:gosec // No need for crypto/rand here, just reducing timing window | ||
| time.Sleep(time.Duration(rand.Intn(500)) * time.Millisecond) |
There was a problem hiding this comment.
i think we should do context aware sleep instead, i.e:
| time.Sleep(time.Duration(rand.Intn(500)) * time.Millisecond) | |
| jitter := time.Duration(rand.Intn(500)) * time.Millisecond | |
| timer := time.NewTimer(jitter) | |
| defer timer.Stop() | |
| select { | |
| case <-ctx.Done(): | |
| return ctx.Err() | |
| case <-timer.C: | |
| } |
HACK: Add random sleep and re-check before creating comments. This mitigates duplicate comments seen in E2E tests where two identical comments appear despite only one API call being logged. Signed-off-by: Akshay Pant <akshay.akshaypant@gmail.com> Assisted-by: Claude Opus 4.5 (via Cursor)
8ba6005 to
2743e60
Compare
📝 Description of the Change
HACK: Add random sleep and re-check before creating comments. This mitigates duplicate comments seen in E2E tests where two identical comments appear despite only one API call being logged.
👨🏻 Linked Jira
N/A
🔗 Linked GitHub Issue
N/A
🚀 Type of Change
fix:) - HACKY FIXfeat:)feat!:,fix!:)docs:)chore:)refactor:)enhance:)deps:)🧪 Testing Strategy
🤖 AI Assistance
If you have used AI assistance, please provide the following details:
Which LLM was used?
Extent of AI Assistance:
Important
If the majority of the code in this PR was generated by an AI, please add a
Co-authored-bytrailer to your commit message.For example:
Co-authored-by: Gemini gemini@google.com
Co-authored-by: ChatGPT noreply@chatgpt.com
Co-authored-by: Claude noreply@anthropic.com
Co-authored-by: Cursor noreply@cursor.com
Co-authored-by: Copilot Copilot@users.noreply.github.com
**💡You can use the script
./hack/add-llm-coauthor.shto automatically addthese co-author trailers to your commits.
✅ Submitter Checklist
fix:,feat:) matches the "Type of Change" I selected above.make testandmake lintlocally to check for and fix anyissues. For an efficient workflow, I have considered installing
pre-commit and running
pre-commit installtoautomate these checks.