fix(gitlab): workaround the gitlab diff API limit when necessary#2482
Conversation
Summary of ChangesHello @aThorp96, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request addresses a limitation in the GitLab API where large merge requests can exceed the Diff API's file count limit, leading to incomplete change detection. The solution intelligently switches to an alternative API endpoint (the Compare API) only when necessary, ensuring all file changes are accurately captured without negatively impacting performance for typical merge requests. This improves the robustness of GitLab integration by handling edge cases gracefully. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request aims to work around a GitLab API limitation by detecting truncated diffs and switching to the Compare API to fetch all changed files. However, a critical bug exists where the targetProjectID field is uninitialized, which will cause GitLab Merge Request processing to fail. Furthermore, the use of the unpaged Compare API and pre-allocation of slices based on untrusted file counts introduces a risk of memory exhaustion (OOM) attacks. There are also opportunities to improve test clarity.
7859071 to
4e3659a
Compare
| return changedfiles.ChangedFiles{}, err | ||
| } | ||
| options := []gitlab.RequestOptionFunc{} | ||
| case triggertype.Push: |
There was a problem hiding this comment.
I saw that the attached doc only mentions Merge Requests but isn't a similar diff truncation applicable to commit diff or Max Push Size would help us in such a scenario?
There was a problem hiding this comment.
That's a good point. The Commit Diff endpoint is subject to the same limitation. It's not as straightforward to detect this is the case, but I believe I've found a way:
Using the Get Commit endpoint, the stats field seems to reflect the full number of files changes:
$ curl 'https://gitlab.com/api/v4/projects/athorp-rh%2Fkonflux-tests/repository/commits/83efdb8bfb2ce652c980be1b6cbd2cb6993e5b89' 2>/dev/null | jq .stats
{
"additions": 10000,
"deletions": 0,
"total": 10000
}
Once we know the total number of changes, when we make the first request to the Commit Diff API we can compare the total number against the Diff API's response header x-total. If x-total is lower than status.total, we know that the diff is truncated and can fall back on the Compare endpoint.
That being said, the Compare endpoint does require we specify a base and head to compare. We already have the commit's parent from the Get Commit endpoint, but for Merge Commits there are technically two parents, and we'd have to ensure we're comparing against the right one. It's unclear if the commits are ordered, but at least in this case the first commit is the "base" and the second commit is the merge request's "head":
$ curl 'https://gitlab.com/api/v4/projects/athorp-rh%2Fkonflux-tests/repository/commits/95e8435ccb4844496c89f19f71c22c54bd38ceba' | jq .parent_ids
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 845 100 845 0 0 3588 0 --:--:-- --:--:-- --:--:-- 3595
[
"83efdb8bfb2ce652c980be1b6cbd2cb6993e5b89",
"a618a6a727a2252cc1f4d70397823ba7dc3f48b2"
]
There was a problem hiding this comment.
Thanks for the detailed explanation!
Should we not include this truncated commit diff handling or you reckon that is outside the current scope of changes?
There was a problem hiding this comment.
The Commit Diff endpoint is subject to the same limitation. It's not as straightforward to detect this is the case
@aThorp96 if you see the response of Commit Diff API you have a field collapsed which indicates that the files changes are excluded.
There was a problem hiding this comment.
@zakisk the collapsed field is per file and is not related to the diff API returning a truncated file list. In the commit I used in the example above, none of the diffs returned from the diff API have collapsed: true even though the diff API only pages through 3k of the 10k file changes
There was a problem hiding this comment.
if its this API endpoint GET /projects/:id/repository/commits/:sha/diff
then on calling this GitLab return header X-total to indicate the total changes:
curl "https://gitlab.com/api/v4/projects/77045052/repository/commits/83efdb8bfb2ce652c980be1b6cbd2cb6993e5b89/diff" \
--dump-header - \
--output /dev/null
HTTP/2 200
date: Thu, 26 Feb 2026 09:11:03 GMT
content-type: application/json
content-length: 5125
server: cloudflare
...
x-gitlab-meta: {"correlation_id":"9d3e5009f38e47c8-ATL","version":"1"}
x-next-page: 2
x-page: 1
x-per-page: 20
x-prev-page:
x-request-id: 9d3e5009f38e47c8-ATL
x-runtime: 0.851352
x-total: 3000
x-total-pages: 150
set-cookie: __cf_bm=5QUhD5w3K1rEiNyrAKKNwVuF3ACS7fMOthDmCVFF2JA-1772097063-1.0.1.1-WgKIgyf3.vmlqVFS6cSkGDN5AmtlZGBudG38fH.tHky_5VM48iZXmgXyC6lx4MgCK.gu1R2TQl5Vs24ZZKKe0TG4d2CuQtWfDflVQjCBzPc; path=/; expires=Thu, 26-Feb-26 09:41:03 GMT; domain=.gitlab.com; HttpOnly; Secure; SameSite=None
set-cookie: _cfuvid=dx_NxS0lh_Qy5Ggjdp.C3AmjOkM1HmdDypBT7bfc6q8-1772097063160-0.0.1.1-604800000; path=/; domain=.gitlab.com; HttpOnly; Secure; SameSite=None
There was a problem hiding this comment.
The X-total header refers to the number of items which will be paged over, so if the commit exceeds the diff limits then the X-total number will be less than the actual number of changes. That header can be used to detect when the diff is too large, but only in conjunction with a separate API call that returns the real diff size. (details in my earlier comment).
I was considering implementing the diff-limit workaround for push events separately, but if you'd prefer I can include it in this pull request. WDYT? @zakisk @chmouel
There was a problem hiding this comment.
why do you need to call another can't you directly call the diff API and see if there are more changes using X-total header?
There was a problem hiding this comment.
I see that currently we use that diff API already for changes
There was a problem hiding this comment.
why do you need to call another can't you directly call the diff API and see if there are more changes using
X-totalheader?
Because of this:
The
X-totalheader refers to the number of items which will be paged over, so if the commit exceeds the diff limits then theX-totalnumber will be less than the actual number of changes.
In other words: if the diff API is limiting the total number of diffs which will be returned, the X-total header does not show the full number of changes, it shows the number of changes which will be returned.
For example, returning to the commit I discuss in previous comments:
- I have a commit with 10,000 changes
- the diffs API will return a page of 20 diffs and its
X-totalheader will say 3,000 - When I page through the first 3000 diffs using the diff API, the diff API will not indicate that there are 7,000 diffs which were not returned
- the only way to see that the commit includes more than 3,000 diffs is to use the Get Commit API endpoint which shows the full number of changes but not their content
|
@aThorp96 any chance you can address akhsay or zaki comments or set this PR as draft if you don't have time to take care of this ? |
🔍 PR Lint Feedback
|
|
Per discussion w/ @chmouel, I'll create a ticket for implementing support for this on Push events. |
6ca8fc6 to
77776aa
Compare
| From: &runevent.BaseBranch, | ||
| To: &runevent.HeadBranch, | ||
| } | ||
| comparison, _, err := v.Client().Repositories.Compare(v.targetProjectID, compareOpts, options...) |
There was a problem hiding this comment.
My memory is fuzzy but would that work when coming from fork ? it should check SourceProjectID there, can you double check please?
There was a problem hiding this comment.
I've confirmed this surprisingly does work with a fork 🙂
There was a problem hiding this comment.
this doesn't make sense at all, oh well 🙃
There was a problem hiding this comment.
Yeah it was unexpected to me as well. I was expecting to have to compare against refs/merge-requests/<id>/head, since a fork's commit gives a 404 in the target repo in the UI. But at least with the Compare endpoint it can be referenced. Maybe just when there is an open merge request, IDK
There was a problem hiding this comment.
I do see a from_project_id parameter in the docs, surprising it works without it.
There was a problem hiding this comment.
Let me double check to make sure I'm not conflating things
There was a problem hiding this comment.
I was mixing things up while working on something very similar at the same time, and was for some reason thinking the HeadBranch was the Gitlab internal ref refs/merge-requests/<id>/head. HeadBranch is the fork's branch name though which I've confirmed does not work. Thanks for flagging.
/hold
There was a problem hiding this comment.
Thanks again for catching @chmouel. I've confirmed that if a merge request is open the Compare API supports comparing between the base branch and the head commit SHA and I've updated the code and tests accordingly
| From: &runevent.BaseBranch, | ||
| To: &runevent.HeadBranch, | ||
| } | ||
| comparison, _, err := v.Client().Repositories.Compare(v.targetProjectID, compareOpts, options...) |
There was a problem hiding this comment.
this doesn't make sense at all, oh well 🙃
theakshaypant
left a comment
There was a problem hiding this comment.
Linting error at gitlab.go:L739.
77776aa to
6b4e7b7
Compare
When the Gitlab Merge Request exceeds the Diff API limit[1] the merge request diff API will not return the full list of changed files. To circumvent this limitation, we can use the Get Merge Request API to detect that the diff exceeds the limit and then switch to the Compare API to get the full diff. We do not use the Compare API in all instances, since it does not support paging and would increase the memory used by PaC during merge request event processing; by only using the Compare API when we know the limit is exceeded, PaC's memory usage will be more consistent and only spike when such a merge request is made. [1] - https://docs.gitlab.com/administration/instance_limits/#diff-limits
6b4e7b7 to
ca5be6b
Compare
|
/lgtm |
|
/retest boussole |
|
/test boussole |
|
/help |
🤖 Available Commands
Automated by the PAC Boussole 🧭 |
|
LOL commenting |
|
/approve |
|
/merge |
✅ PR Successfully Merged
Approvals Summary:
Thank you @aThorp96 for your valuable contribution! 🎉 Automated by the PAC Boussole 🧭 |
📝 Description of the Change
When the Gitlab Merge Request exceeds the Diff API limit[1] the merge request diff API will not return the full list of changed files. To circumvent this limitation, we can use the Get Merge Request API to detect that the diff exceeds the limit and then switch to the Compare API to get the full diff. We do not use the Compare API in all instances, since it does not support paging and would increase the memory used by PaC during merge request event processing; by only using the Compare API when we know the limit is exceeded, PaC's memory usage will be more consistent and only spike when such a merge request is made.
[1] - https://docs.gitlab.com/administration/instance_limits/#diff-limits
👨🏻 Linked Jira
https://issues.redhat.com/browse/SRVKP-10677
🔗 Linked GitHub Issue
Fixes #
🚀 Type of Change
fix:)feat:)feat!:,fix!:)docs:)chore:)refactor:)enhance:)deps:)🧪 Testing Strategy
🤖 AI Assistance
If you have used AI assistance, please provide the following details:
Which LLM was used?
Extent of AI Assistance:
Important
If the majority of the code in this PR was generated by an AI, please add a
Co-authored-bytrailer to your commit message.For example:
Co-authored-by: Gemini gemini@google.com
Co-authored-by: ChatGPT noreply@chatgpt.com
Co-authored-by: Claude noreply@anthropic.com
Co-authored-by: Cursor noreply@cursor.com
Co-authored-by: Copilot Copilot@users.noreply.github.com
**💡You can use the script
./hack/add-llm-coauthor.shto automatically addthese co-author trailers to your commits.
✅ Submitter Checklist
fix:,feat:) matches the "Type of Change" I selected above.make testandmake lintlocally to check for and fix anyissues. For an efficient workflow, I have considered installing
pre-commit and running
pre-commit installtoautomate these checks.