Workaround Gitlab API's diff size limit for large pull requests and commits #2527

@aThorp96

Overview

GitLab instances have a Diff API limit that restricts the size, count, and content of results returned from /diff API endpoints. This affects endpoints we use, such as "Get Merge Request Diff" and "Get Commit Diff", when getting the list of file changes in a merge request or pushed commit. When the limit is exceeded, the diff API does not return the full list of diffs, and the response's x-total header does not reflect the full number of changed files; paging through the diffs will not yield the full set. As a result, the CEL variable files and the function <string>.pathChanged() are incomplete when the merge request or commit exceeds the configured diff size limit.

Fix

It appears the only way to circumvent this limitation is to use the Repository Compare API to get the full diff between two commits. This returns the full diff in one response, with no paging. Since there is no paging, defaulting to this endpoint could significantly increase the memory requirements of PaC, so ideally we use the Compare endpoint only when we detect that the limit is exceeded.
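As a rough illustration of consuming the Compare response, the sketch below pulls the changed paths out of a response body. It assumes the body carries a `diffs` array whose entries have `old_path` and `new_path` fields; `changedPaths` and the struct names are hypothetical, not existing PaC code.

```go
package main

import "encoding/json"

// compareDiff holds only the fields this sketch needs from one entry of
// the "diffs" array (assumed shape of the Compare response).
type compareDiff struct {
	OldPath string `json:"old_path"`
	NewPath string `json:"new_path"`
}

// compareResponse is a trimmed-down view of the Compare response body.
type compareResponse struct {
	Diffs []compareDiff `json:"diffs"`
}

// changedPaths (hypothetical helper) returns every distinct old and new
// path mentioned in the response, in order of first appearance.
func changedPaths(body []byte) ([]string, error) {
	var resp compareResponse
	if err := json.Unmarshal(body, &resp); err != nil {
		return nil, err
	}
	seen := make(map[string]struct{})
	paths := []string{}
	add := func(p string) {
		if p == "" {
			return
		}
		if _, ok := seen[p]; ok {
			return
		}
		seen[p] = struct{}{}
		paths = append(paths, p)
	}
	for _, d := range resp.Diffs {
		add(d.OldPath)
		add(d.NewPath)
	}
	return paths, nil
}
```

Because the whole body is decoded at once, this sketch has the memory cost described above, which is one reason to reach for it only after the limit has been detected.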

Implementation

Merge Requests

To detect the limit for a Merge Request, we can use the Get Merge Request endpoint: if the ChangesCount field (changes_count in the API response) ends with a +, we know the diff exceeds the limit.
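The merge request check can be sketched as a simple suffix test. This assumes the changes count arrives as a string such as "1000+"; `diffLimitExceeded` is a hypothetical helper name used only for illustration.

```go
package main

import "strings"

// diffLimitExceeded (hypothetical helper) reports whether a merge request's
// changes count string ends with "+", which the issue treats as the signal
// that the diff exceeds the limit.
func diffLimitExceeded(changesCount string) bool {
	return strings.HasSuffix(strings.TrimSpace(changesCount), "+")
}
```

For example, `diffLimitExceeded("1000+")` is true, while `diffLimitExceeded("42")` is false.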

Push

To detect the limit for a Push event, we can use the Get Commit endpoint to get the actual number of diffs from the commit stats. Then, when we make the first request to the Get Commit Diff endpoint, we can compare that number with the Diff API's x-total header. If the header is lower than the actual number of changes, the diff limit is exceeded.
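A minimal sketch of that comparison follows. It assumes the caller has already obtained the expected number of changes from the Get Commit response and the raw x-total header value from the first Get Commit Diff page; `pushDiffTruncated` is a hypothetical name.

```go
package main

import (
	"errors"
	"strconv"
)

// pushDiffTruncated (hypothetical helper) reports whether the x-total value
// from the first Get Commit Diff page is lower than the number of changes
// the caller expects for the commit.
func pushDiffTruncated(xTotal string, expectedChanges int) (bool, error) {
	if xTotal == "" {
		// Without the header there is nothing to compare against.
		return false, errors.New("x-total header missing")
	}
	reported, err := strconv.Atoi(xTotal)
	if err != nil {
		return false, err
	}
	return reported < expectedChanges, nil
}
```

Returning an error when the header is absent leaves the fallback decision to the caller, rather than silently treating the diff as complete.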

For push events, the only hang-up is what to use as the base commit in the Compare request. Using the parent commit would be in line with Get Commit Diff; however, if there are multiple parents (e.g. a merge commit) this may not be correct (according to the documentation, parent commit order is undefined). Using the before commit (the head commit before the push event) handles single-commit pushes and merge commits correctly, but for a push event that pushed more than one commit it changes the semantics of files and <string>.pathChanged() to include any file changed by any commit that was pushed. After discussing the tradeoffs with @chmouel, we decided it makes the most sense to compare the before commit with the pushed SHA.
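To make the chosen comparison concrete, here is a sketch that builds the Compare request URL with the before commit as `from` and the pushed SHA as `to`. The `/api/v4/projects/:id/repository/compare` path and the `from`/`to` query parameters follow GitLab's REST API; `compareURL`, its parameters, and the example host are hypothetical.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// compareURL (hypothetical helper) builds a Repository Compare request URL
// that diffs the head commit before the push against the pushed SHA.
func compareURL(baseURL, projectID, beforeSHA, pushedSHA string) string {
	q := url.Values{}
	q.Set("from", beforeSHA)
	q.Set("to", pushedSHA)
	return fmt.Sprintf("%s/api/v4/projects/%s/repository/compare?%s",
		strings.TrimRight(baseURL, "/"),
		url.PathEscape(projectID), // "group/repo" becomes "group%2Frepo"
		q.Encode(),
	)
}
```

Calling `compareURL("https://gitlab.example.com", "group/repo", "aaa", "bbb")` yields `https://gitlab.example.com/api/v4/projects/group%2Frepo/repository/compare?from=aaa&to=bbb`.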
