
Fix check_whitespace_only.py to handle PRs with >300 files #18804

Merged
harupy merged 6 commits into master from copilot/fetch-diff-for-large-prs
Nov 12, 2025


Copilot AI commented Nov 12, 2025

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

The GitHub API diff endpoint returns HTTP 406 (Not Acceptable) for PRs that modify more than 300 files, causing dev/check_whitespace_only.py to fail:

curl -H "Accept: application/vnd.github.v3.diff" https://api.github.com/repos/mlflow/mlflow/pulls/18795
# {"message": "Sorry, the diff exceeded the maximum number of files (300)..."}

Changed get_diff_from_github_api() to fetch from https://github.com/{owner}/{repo}/pull/{number}.diff instead of the API endpoint. This URL redirects, so the request must follow redirects; it is not subject to the 300-file limit and returns the same unified diff format.
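A minimal sketch of the new fetch path, assuming only the Python standard library and an unauthenticated request to a public repository. The helper names `build_pr_diff_url` and `fetch_pr_diff` are hypothetical; they are not the actual signature of get_diff_from_github_api(), and the real script may use a different HTTP client.

```python
import urllib.request


def build_pr_diff_url(owner: str, repo: str, pr_number: int) -> str:
    """Return the github.com URL that serves the raw diff of a pull request (hypothetical helper)."""
    return f"https://github.com/{owner}/{repo}/pull/{pr_number}.diff"


def fetch_pr_diff(owner: str, repo: str, pr_number: int, timeout: float = 30.0) -> str:
    """Download a PR's full diff without going through the API diff endpoint (hypothetical helper).

    The .diff URL responds with a redirect; urllib's default opener follows
    HTTP redirects automatically, so no extra handling is needed here.
    """
    url = build_pr_diff_url(owner, repo, pr_number)
    request = urllib.request.Request(url, headers={"User-Agent": "check-whitespace-only-sketch"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Diffs can contain bytes that are not valid UTF-8; replace them instead of raising.
        return response.read().decode("utf-8", errors="replace")
```

Usage would be along the lines of `fetch_pr_diff("mlflow", "mlflow", 18795)`. Keeping URL construction in its own function lets it be checked without network access.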

How is this PR tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests

Tested with PR #18795 (>300 files) and PR #14800 (a small PR) to verify that large PRs are now handled and that small PRs still work as before.
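One way to confirm that a downloaded diff really spans more than 300 files (as with PR #18795) is to count its per-file `diff --git` headers. The helper below, `count_files_in_diff`, is a hypothetical illustration and is not part of the merged change.

```python
def count_files_in_diff(diff_text: str) -> int:
    """Count the files in a unified git diff by counting its per-file headers.

    Every file section starts with a line beginning with "diff --git ".
    Changed or context lines are prefixed with "+", "-", or " ", so file
    content can never be mistaken for a header.
    """
    return sum(1 for line in diff_text.splitlines() if line.startswith("diff --git "))


if __name__ == "__main__":
    sample = "\n".join([
        "diff --git a/README.md b/README.md",
        "index 1111111..2222222 100644",
        "--- a/README.md",
        "+++ b/README.md",
        "@@ -1 +1 @@",
        "-old line",
        "+new line",
        "diff --git a/setup.py b/setup.py",
        "index 3333333..4444444 100644",
        "--- a/setup.py",
        "+++ b/setup.py",
        "@@ -1 +1 @@",
        "-x = 1",
        "+x = 2",
    ])
    print(count_files_in_diff(sample))  # prints 2
```

For a large PR, a check such as `count_files_in_diff(diff_text) > 300` on the fetched text would indicate that the full diff was retrieved rather than a truncated or error response.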

Does this PR require documentation update?

  • No. You can skip the rest of this section.
  • Yes. I've updated:
    • Examples
    • API references
    • Instructions

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/tracking: Tracking Service, tracking client APIs, autologging
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/evaluation: MLflow model evaluation features, evaluation metrics, and evaluation workflows
  • area/gateway: MLflow AI Gateway client APIs, server, and third-party integrations
  • area/prompts: MLflow prompt engineering features, prompt templates, and prompt management
  • area/tracing: MLflow Tracing features, tracing APIs, and LLM tracing functionality
  • area/projects: MLproject format, project running backends
  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

Should this PR be included in the next patch release?

  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)
Original prompt

dev/check_whitespace_only.py fails when a PR modifies more than 300 files. To work around this, append .diff to the PR URL (e.g., https://github.com/mlflow/mlflow/pull/18795.diff) and fetch the diff from there, following the redirect, instead of using the GitHub API.

curl https://api.github.com/repos/mlflow/mlflow/pulls/18795 -H "Accept: application/vnd.github.v3.diff"

{
  "message": "Sorry, the diff exceeded the maximum number of files (300). Consider using 'List pull requests files' API or locally cloning the repository instead.",
  "errors": [
    {
      "resource": "PullRequest",
      "field": "diff",
      "code": "too_large"
    }
  ],
  "documentation_url": "https://docs.github.com/rest/pulls/pulls#list-pull-requests-files",
  "status": "406"
}
curl -L https://github.com/mlflow/mlflow/pull/18795.diff | head
  ...
index 9c6648137e840..c661205d6cf0e 100644
--- a/mlflow/server/js/CLAUDE.md
+++ b/mlflow/server/js/CLAUDE.md
@@ -9,6 +9,7 @@ This file provides guidance to Claude Code when working with the MLflow frontend
 **IMPORTANT**: Always be consistent with the rest of the repository. This is extremely important!

 Before implementing any feature:
+
 1. Read through similar files to understand their structure and patterns
  1 2426k    1 32768    0     0  40225      0  0:01:01 --:--:--  0:01:01 1022k
curl: (56) Failure writing output to destination, passed 1361 returned 0
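For reference, the 406 payload quoted above can be recognized programmatically by its `field` and `code` values. This is a hypothetical sketch (`is_diff_too_large_error` is not in the codebase); the merged fix sidesteps the API endpoint entirely rather than detecting this error.

```python
import json


def is_diff_too_large_error(status: int, body: str) -> bool:
    """Return True if a GitHub API response is the 406 'diff too large' error."""
    if status != 406:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Not a JSON error body, so it cannot be the too_large error shown above.
        return False
    errors = payload.get("errors", []) if isinstance(payload, dict) else []
    # The error entry identifies the diff field with code "too_large".
    return any(
        isinstance(err, dict) and err.get("field") == "diff" and err.get("code") == "too_large"
        for err in errors
    )
```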


Copilot AI self-assigned this Nov 12, 2025
harupy marked this pull request as ready for review November 12, 2025 07:18
harupy added the rn/none List under Small Changes in Changelogs. label Nov 12, 2025
@github-actions

@Copilot Thank you for the contribution! Could you fix the following issue(s)?

⚠ Invalid PR template

This PR does not appear to have been filed using the MLflow PR template. Please copy the PR template from here and fill it out.

Copilot AI and others added 2 commits November 12, 2025 07:24
Use direct .diff URL instead of GitHub API to avoid 300 file limit

Co-authored-by: harupy <17039389+harupy@users.noreply.github.com>
Include concrete example of the API failure for >300 files

Co-authored-by: harupy <17039389+harupy@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Update handling of large pull requests" to "Fix check_whitespace_only.py to handle PRs with >300 files" Nov 12, 2025
Copilot AI requested a review from harupy November 12, 2025 07:30
github-actions bot added the area/build Build and test infrastructure for MLflow label Nov 12, 2025
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
harupy requested a review from daniellok-db November 12, 2025 07:40
harupy changed the title Fix check_whitespace_only.py to handle PRs with >300 files Fix check_whitespace_only.py to handle PRs with >300 files Nov 12, 2025
harupy enabled auto-merge November 12, 2025 08:19
harupy added this pull request to the merge queue Nov 12, 2025
Merged via the queue into master with commit 2920a03 Nov 12, 2025
46 of 48 checks passed
harupy deleted the copilot/fetch-diff-for-large-prs branch November 12, 2025 08:25
BenWilson2 pushed a commit to BenWilson2/mlflow that referenced this pull request Nov 14, 2025
…18804)

Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: harupy <17039389+harupy@users.noreply.github.com>

Labels

area/build Build and test infrastructure for MLflow rn/none List under Small Changes in Changelogs.
