Skip to content

fix(gitlab): workaround the gitlab diff API limit when necessary#2482

Merged
pipelines-as-code[bot] merged 1 commit intotektoncd:mainfrom
aThorp96:gitlab-use-repo-diff-instead-of-mr-diff
Mar 4, 2026
Merged

fix(gitlab): workaround the gitlab diff API limit when necessary#2482
pipelines-as-code[bot] merged 1 commit intotektoncd:mainfrom
aThorp96:gitlab-use-repo-diff-instead-of-mr-diff

Conversation

@aThorp96
Copy link
Copy Markdown
Member

📝 Description of the Change

When the Gitlab Merge Request exceeds the Diff API limit[1] the merge request diff API will not return the full list of changed files. To circumvent this limitation, we can use the Get Merge Request API to detect that the diff exceeds the limit and then switch to the Compare API to get the full diff. We do not use the Compare API in all instances, since it does not support paging and would increase the memory used by PaC during merge request event processing; by only using the Compare API when we know the limit is exceeded, PaC's memory usage will be more consistent and only spike when such a merge request is made.

[1] - https://docs.gitlab.com/administration/instance_limits/#diff-limits

👨🏻‍ Linked Jira

https://issues.redhat.com/browse/SRVKP-10677

🔗 Linked GitHub Issue

Fixes #

🚀 Type of Change

  • 🐛 Bug fix (fix:)
  • ✨ New feature (feat:)
  • 💥 Breaking change (feat!:, fix!:)
  • 📚 Documentation update (docs:)
  • ⚙️ Chore (chore:)
  • 💅 Refactor (refactor:)
  • 🔧 Enhancement (enhance:)
  • 📦 Dependency update (deps:)

🧪 Testing Strategy

  • Unit tests
  • Integration tests
  • End-to-end tests
  • Manual testing
  • Not Applicable

🤖 AI Assistance

  • I have not used any AI assistance for this PR.
  • I have used AI assistance for this PR.

If you have used AI assistance, please provide the following details:

Which LLM was used?

  • GitHub Copilot
  • ChatGPT (OpenAI)
  • Claude (Anthropic)
  • Cursor
  • Gemini (Google)
  • Other: ____________

Extent of AI Assistance:

  • Documentation and research only
  • Unit tests or E2E tests only
  • Code generation (parts of the code)
  • Full code generation (most of the PR)
  • PR description and comments
  • Commit message(s)

Important

If the majority of the code in this PR was generated by an AI, please add a Co-authored-by trailer to your commit message.
For example:

Co-authored-by: Gemini gemini@google.com
Co-authored-by: ChatGPT noreply@chatgpt.com
Co-authored-by: Claude noreply@anthropic.com
Co-authored-by: Cursor noreply@cursor.com
Co-authored-by: Copilot Copilot@users.noreply.github.com

**💡You can use the script ./hack/add-llm-coauthor.sh to automatically add
these co-author trailers to your commits.

✅ Submitter Checklist

  • 📝 My commit messages are clear, informative, and follow the project's How to write a git commit message guide. The Gitlint linter ensures in CI it's properly validated
  • ✨ I have ensured my commit message prefix (e.g., fix:, feat:) matches the "Type of Change" I selected above.
  • ♽ I have run make test and make lint locally to check for and fix any
    issues. For an efficient workflow, I have considered installing
    pre-commit and running pre-commit install to
    automate these checks.
  • 📖 I have added or updated documentation for any user-facing changes.
  • 🧪 I have added sufficient unit tests for my code changes.
  • 🎁 I have added end-to-end tests where feasible. See README for more details.
  • 🔎 I have addressed any CI test flakiness or provided a clear reason to bypass it.
  • If adding a provider feature, I have filled in the following and updated the provider documentation:
    • GitHub App
    • GitHub Webhook
    • Gitea/Forgejo
    • GitLab
    • Bitbucket Cloud
    • Bitbucket Data Center

@aThorp96 aThorp96 requested a review from chmouel February 16, 2026 20:10
@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello @aThorp96, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a limitation in the GitLab API where large merge requests can exceed the Diff API's file count limit, leading to incomplete change detection. The solution intelligently switches to an alternative API endpoint (the Compare API) only when necessary, ensuring all file changes are accurately captured without negatively impacting performance for typical merge requests. This improves the robustness of GitLab integration by handling edge cases gracefully.

Highlights

  • GitLab Diff API Limit Workaround: Implemented a workaround for the GitLab Merge Request Diff API limit. When a merge request exceeds this limit, the standard Diff API may not return the full list of changed files. The system now detects this condition and switches to using the GitLab Compare API to retrieve all changes.
  • Optimized API Usage: The more memory-intensive GitLab Compare API is only utilized when the Diff API limit is detected as exceeded. This approach prevents unnecessary memory spikes and maintains consistent performance for most merge request event processing.
  • New isMergeRequestDiffTruncated Function: Introduced a new helper function, isMergeRequestDiffTruncated, to determine if a merge request's diff has been truncated by checking the ChangesCount field for a '+' suffix, as indicated by GitLab's API documentation.
  • Enhanced Test Coverage: Added new unit tests and updated existing ones to cover scenarios where the GitLab Diff API limit is exceeded, ensuring the workaround functions as expected and API call metrics are correctly tracked.
Changelog
  • pkg/provider/gitlab/gitlab.go
    • Imported the strconv package for string to integer conversions.
    • Refactored fetchChangedFiles to delegate merge request file retrieval to a new function.
    • Introduced mergeRequestFilesChanged to conditionally use the Diff API or Compare API based on truncation status.
    • Added isMergeRequestDiffTruncated function to detect if a merge request's diff has exceeded GitLab's API limits.
  • pkg/provider/gitlab/gitlab_test.go
    • Imported the strconv package for string to integer conversions in tests.
    • Extended test case structure with diffAPILimitExceeded and apiCallCount fields.
    • Added a new test case to specifically validate the behavior when the GitLab Diff API limit is exceeded.
    • Updated mock HTTP handlers to simulate truncated Diff API responses and full Compare API responses.
    • Adjusted API call count assertions in tests to reflect the additional API calls made for limit checking.
Activity
  • The pull request was created to fix a bug related to GitLab Diff API limits.
  • Unit tests have been added to validate the new logic.
  • Manual testing has been performed.
  • The author confirmed no AI assistance was used for this PR.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aims to work around a GitLab API limitation by detecting truncated diffs and switching to the Compare API to fetch all changed files. However, a critical bug exists where the targetProjectID field is uninitialized, which will cause GitLab Merge Request processing to fail. Furthermore, the use of the unpaged Compare API and pre-allocation of slices based on untrusted file counts introduces a risk of memory exhaustion (OOM) attacks. There are also opportunities to improve test clarity.

@aThorp96 aThorp96 force-pushed the gitlab-use-repo-diff-instead-of-mr-diff branch from 7859071 to 4e3659a Compare February 16, 2026 20:41
return changedfiles.ChangedFiles{}, err
}
options := []gitlab.RequestOptionFunc{}
case triggertype.Push:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw that the attached doc only mentions Merge Requests but isn't a similar diff truncation applicable to commit diff or Max Push Size would help us in such a scenario?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. The Commit Diff endpoint is subject to the same limitation. It's not as straightforward to detect this is the case, but I believe I've found a way:

Using the Get Commit endpoint, the stats field seems to reflect the full number of files changes:

 $ curl  'https://gitlab.com/api/v4/projects/athorp-rh%2Fkonflux-tests/repository/commits/83efdb8bfb2ce652c980be1b6cbd2cb6993e5b89' 2>/dev/null  | jq .stats
{
  "additions": 10000,
  "deletions": 0,
  "total": 10000
}

Once we know the total number of changes, when we make the first request to the Commit Diff API we can compare the total number against the Diff API's response header x-total. If x-total is lower than status.total, we know that the diff is truncated and can fall back on the Compare endpoint.

That being said, the Compare endpoint does require we specify a base and head to compare. We already have the commit's parent from the Get Commit endpoint, but for Merge Commits there are technically two parents, and we'd have to ensure we're comparing against the right one. It's unclear if the commits are ordered, but at least in this case the first commit is the "base" and the second commit is the merge request's "head":

$ curl  'https://gitlab.com/api/v4/projects/athorp-rh%2Fkonflux-tests/repository/commits/95e8435ccb4844496c89f19f71c22c54bd38ceba'  | jq .parent_ids
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   845  100   845    0     0   3588      0 --:--:-- --:--:-- --:--:--  3595
[
  "83efdb8bfb2ce652c980be1b6cbd2cb6993e5b89",
  "a618a6a727a2252cc1f4d70397823ba7dc3f48b2"
]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed explanation!
Should we not include this truncated commit diff handling or you reckon that is outside the current scope of changes?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Commit Diff endpoint is subject to the same limitation. It's not as straightforward to detect this is the case

@aThorp96 if you see the response of Commit Diff API you have a field collapsed which indicates that the files changes are excluded.

Image

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zakisk the collapsed field is per file and is not related to the diff API returning a truncated file list. In the commit I used in the example above, none of the diffs returned from the diff API have collapsed: true even though the diff API only pages through 3k of the 10k file changes

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if its this API endpoint GET /projects/:id/repository/commits/:sha/diff
then on calling this GitLab return header X-total to indicate the total changes:

curl "https://gitlab.com/api/v4/projects/77045052/repository/commits/83efdb8bfb2ce652c980be1b6cbd2cb6993e5b89/diff" \
     --dump-header - \
     --output /dev/null
HTTP/2 200 
date: Thu, 26 Feb 2026 09:11:03 GMT
content-type: application/json
content-length: 5125
server: cloudflare
...
x-gitlab-meta: {"correlation_id":"9d3e5009f38e47c8-ATL","version":"1"}
x-next-page: 2
x-page: 1
x-per-page: 20
x-prev-page: 
x-request-id: 9d3e5009f38e47c8-ATL
x-runtime: 0.851352
x-total: 3000
x-total-pages: 150
set-cookie: __cf_bm=5QUhD5w3K1rEiNyrAKKNwVuF3ACS7fMOthDmCVFF2JA-1772097063-1.0.1.1-WgKIgyf3.vmlqVFS6cSkGDN5AmtlZGBudG38fH.tHky_5VM48iZXmgXyC6lx4MgCK.gu1R2TQl5Vs24ZZKKe0TG4d2CuQtWfDflVQjCBzPc; path=/; expires=Thu, 26-Feb-26 09:41:03 GMT; domain=.gitlab.com; HttpOnly; Secure; SameSite=None
set-cookie: _cfuvid=dx_NxS0lh_Qy5Ggjdp.C3AmjOkM1HmdDypBT7bfc6q8-1772097063160-0.0.1.1-604800000; path=/; domain=.gitlab.com; HttpOnly; Secure; SameSite=None

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The X-total header refers to the number of items which will be paged over, so if the commit exceeds the diff limits then the X-total number will be less than the actual number of changes. That header can be used to detect when the diff is too large, but only in conjunction with a separate API call that returns the real diff size. (details in my earlier comment).

I was considering implementing the diff-limit workaround for push events separately, but if you'd prefer I can include it in this pull request. WDYT? @zakisk @chmouel

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need to call another can't you directly call the diff API and see if there are more changes using X-total header?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that currently we use that diff API already for changes

Copy link
Copy Markdown
Member Author

@aThorp96 aThorp96 Mar 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need to call another can't you directly call the diff API and see if there are more changes using X-total header?

Because of this:

The X-total header refers to the number of items which will be paged over, so if the commit exceeds the diff limits then the X-total number will be less than the actual number of changes.

In other words: if the diff API is limiting the total number of diffs which will be returned, the X-total header does not show the full number of changes, it shows the number of changes which will be returned.

For example, returning to the commit I discuss in previous comments:

  • I have a commit with 10,000 changes
  • the diffs API will return a page of 20 diffs and its X-total header will say 3,000
  • When I page through the first 3000 diffs using the diff API, the diff API will not indicate that there are 7,000 diffs which were not returned
  • the only way to see that the commit includes more than 3,000 diffs is to use the Get Commit API endpoint which shows the full number of changes but not their content

@chmouel
Copy link
Copy Markdown
Member

chmouel commented Feb 25, 2026

@aThorp96 any chance you can address akhsay or zaki comments or set this PR as draft if you don't have time to take care of this ?

@pipelines-as-code
Copy link
Copy Markdown

pipelines-as-code bot commented Mar 3, 2026

🔍 PR Lint Feedback

Note: This automated check helps ensure your PR follows our contribution guidelines.

⚠️ Items that need attention:

🤖 AI attribution

The following commits lack an explicit AI attribution footer:

  • ca5be6b fix(gitlab): workaround the gitlab diff API limit when necessary

If no AI assistance was used for a commit, you can ignore this warning.
Otherwise add an Assisted-by: or Co-authored-by: footer referencing the AI used.


ℹ️ Next Steps

  • Review and address the items above
  • Push new commits to update this PR
  • This comment will be automatically updated when issues are resolved
🔧 Admin Tools (click to expand)

Automated Issue/Ticket Creation:

  • /issue-create - Generate a GitHub issue from this PR content using AI
  • /jira-create - Create a SRVKP Jira ticket from this PR content using AI

⚠️ Important: Always review and edit generated content before finalizing tickets/issues.
The AI-generated content should be used as a starting point and may need adjustments.

These commands are available to maintainers and will post the generated content as PR comments for review.

🤖 This feedback was generated automatically by the PR CI system

@aThorp96
Copy link
Copy Markdown
Member Author

aThorp96 commented Mar 3, 2026

Per discussion w/ @chmouel, I'll create a ticket for implementing support for this on Push events.

@aThorp96 aThorp96 force-pushed the gitlab-use-repo-diff-instead-of-mr-diff branch from 6ca8fc6 to 77776aa Compare March 3, 2026 15:56
From: &runevent.BaseBranch,
To: &runevent.HeadBranch,
}
comparison, _, err := v.Client().Repositories.Compare(v.targetProjectID, compareOpts, options...)
Copy link
Copy Markdown
Member

@chmouel chmouel Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My memory is fuzzy but would that work when coming from fork ? it should check SourceProjectID there, can you double check please?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've confirmed this surprisingly does work with a fork 🙂

Copy link
Copy Markdown
Member

@chmouel chmouel Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't make sense at all, oh well 🙃

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it was unexpected to me as well. I was expecting to have to compare against refs/merge-requests/<id>/head, since a fork's commit gives a 404 in the target repo in the UI. But at least with the Compare endpoint it can be referenced. Maybe just when there is an open merge request, IDK

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do see a from_project_id parameter in the docs, surprising it works without it.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me double check to make sure I'm not conflating things

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mixing things up while working on something very similar at the same time, and was for some reason thinking the HeadBranch was the Gitlab internal ref refs/merge-requests/<id>/head. HeadBranch is the fork's branch name though which I've confirmed does not work. Thanks for flagging.

/hold

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for catching @chmouel. I've confirmed that if a merge request is open the Compare API supports comparing between the base branch and the head commit SHA and I've updated the code and tests accordingly

Copy link
Copy Markdown
Member

@chmouel chmouel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine adding the support for Push later, this won't break it atm, we just need to capture it in our downstream backlog as well (already on gh issue #2527 )

From: &runevent.BaseBranch,
To: &runevent.HeadBranch,
}
comparison, _, err := v.Client().Repositories.Compare(v.targetProjectID, compareOpts, options...)
Copy link
Copy Markdown
Member

@chmouel chmouel Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't make sense at all, oh well 🙃

Copy link
Copy Markdown
Member

@theakshaypant theakshaypant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Linting error at gitlab.go:L739.

@aThorp96 aThorp96 force-pushed the gitlab-use-repo-diff-instead-of-mr-diff branch from 77776aa to 6b4e7b7 Compare March 4, 2026 14:58
When the Gitlab Merge Request exceeds the Diff API limit[1] the merge
request diff API will not return the full list of changed files. To
circumvent this limitation, we can use the Get Merge Request API to
detect that the diff exceeds the limit and then switch to the Compare
API to get the full diff. We do not use the Compare API in all instances,
since it does not support paging and would increase the memory used by
PaC during merge request event processing; by only using the Compare API
when we know the limit is exceeded, PaC's memory usage will be more
consistent and only spike when such a merge request is made.

[1] - https://docs.gitlab.com/administration/instance_limits/#diff-limits
@aThorp96 aThorp96 force-pushed the gitlab-use-repo-diff-instead-of-mr-diff branch from 6b4e7b7 to ca5be6b Compare March 4, 2026 18:24
@chmouel
Copy link
Copy Markdown
Member

chmouel commented Mar 4, 2026

/lgtm

@aThorp96
Copy link
Copy Markdown
Member Author

aThorp96 commented Mar 4, 2026

/retest boussole

@aThorp96
Copy link
Copy Markdown
Member Author

aThorp96 commented Mar 4, 2026

/test boussole

@aThorp96
Copy link
Copy Markdown
Member Author

aThorp96 commented Mar 4, 2026

/help

@pipelines-as-code
Copy link
Copy Markdown

🤖 Available Commands

Command Description
/assign user1 user2 Assigns users for review to the PR
/unassign user1 user2 Removes assigned users
/label bug feature Adds labels to the PR
/unlabel bug feature Removes labels from the PR
/lgtm Approves the PR if at least 1 org members have commented /lgtm
/merge [method] Merges the PR if approvals are sufficient. Admin/write users can merge directly with threshold=1
/cherry-pick target-branch Cherry-picks the PR changes to the target branch
/rebase Rebases the PR branch on the base branch
/help Shows this help message

Automated by the PAC Boussole 🧭

@aThorp96
Copy link
Copy Markdown
Member Author

aThorp96 commented Mar 4, 2026

LOL commenting /retest boussole let boussolle run again, but it caused it to fail since it doesn't handle that slash command 🫠

@chmouel
Copy link
Copy Markdown
Member

chmouel commented Mar 4, 2026

/approve

@chmouel
Copy link
Copy Markdown
Member

chmouel commented Mar 4, 2026

/merge

@pipelines-as-code pipelines-as-code bot merged commit 05529c9 into tektoncd:main Mar 4, 2026
14 checks passed
@pipelines-as-code
Copy link
Copy Markdown

✅ PR Successfully Merged

  • Merge method: rebase
  • Merged by: @chmouel
  • Total approvals: 1/1

Approvals Summary:

Reviewer Permission Status
@chmouel admin

Thank you @aThorp96 for your valuable contribution! 🎉

Automated by the PAC Boussole 🧭

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants