Add CI check to detect unused media files in docs#19099

Merged
harupy merged 5 commits into mlflow:master from harupy:remove-unused-doc-images
Dec 1, 2025

Conversation

Member

@harupy harupy commented Nov 28, 2025

🛠 DevTools 🛠

Open in GitHub Codespaces

Install mlflow from this PR

# mlflow
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/19099/merge
# mlflow-skinny
pip install git+https://github.com/mlflow/mlflow.git@refs/pull/19099/merge#subdirectory=libs/skinny

For Databricks, use the following command:

%sh curl -LsSf https://raw.githubusercontent.com/mlflow/mlflow/HEAD/dev/install-skinny.sh | sh -s pull/19099/merge

What changes are proposed in this pull request?

This PR adds a CI check to detect unused media files (images and videos) in the documentation directories and removes existing unused files.

  • Add dev/find-unused-media.sh script that detects unused images (png, jpg, gif, webp, ico, avif) and videos (mp4) in docs/
  • Add setup-ripgrep composite action with configurable version
  • Add CI check in lint.yml workflow to run the unused media detection
  • Remove 122 unused media files, reducing repository size by ~57 MB
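The detection idea in the first bullet can be sketched roughly as follows. This is a minimal illustration, not the contents of dev/find-unused-media.sh: the function name `find_unused_media` is hypothetical, it matches on file base names only, and it uses plain `grep` rather than ripgrep so that it runs without extra tooling. The extension list is the one given in the description above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the real dev/find-unused-media.sh):
# print every media file under a directory whose base name is never
# mentioned in any text file under that same directory.
find_unused_media() {
  local root="$1" file name
  # Extensions taken from the PR description.
  find "$root" -type f \( -name '*.png' -o -name '*.jpg' -o -name '*.gif' \
    -o -name '*.webp' -o -name '*.ico' -o -name '*.avif' -o -name '*.mp4' \) |
    sort |
    while IFS= read -r file; do
      name="$(basename "$file")"
      # -r: recurse, -q: quiet, -I: skip binary files, -F: fixed-string match
      if ! grep -rqIF -- "$name" "$root"; then
        printf '%s\n' "$file"
      fi
    done
}
```

A CI step could then fail when the output is non-empty, for example by capturing `find_unused_media docs` into a variable and exiting with a non-zero status if it is not empty. Matching on base names is a simplification: it treats a file as used whenever any file with the same name is referenced, and it cannot see paths that are assembled at build time.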

How is this PR tested?

  • Manual tests

Verified that the removed files are not referenced in the documentation.

Does this PR require documentation update?

  • No. You can skip the rest of this section.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.

How should the PR be classified in the release notes? Choose one:

  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section

Should this PR be included in the next patch release?

  • Yes (this PR will be cherry-picked and included in the next patch release)
  • No (this PR will be included in the next minor release)

🤖 Generated with Claude Code

@github-actions github-actions bot added the rn/none label ("List under Small Changes in Changelogs.") on Nov 28, 2025
@harupy harupy added the team-review label ("Trigger a team review request") on Nov 28, 2025
@harupy harupy force-pushed the remove-unused-doc-images branch 2 times, most recently from 0408ee3 to 1ffb978 on November 28, 2025 02:41
Contributor

github-actions bot commented Nov 28, 2025

Documentation preview for 9b795e1 is available at:

More info
  • Ignore this comment if this PR does not change the documentation.
  • The preview is updated when a new commit is pushed to this PR.
  • This comment was created by this workflow run.
  • The documentation was built by this workflow run.

@harupy harupy force-pushed the remove-unused-doc-images branch 2 times, most recently from 4d1403e to 817f6ea on November 28, 2025 08:18
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces tooling to automatically detect and remove unused documentation images, successfully cleaning up approximately 100 unused image files totaling 30.60 MB. The solution includes a bash script for detecting unused images and integrates it into the CI workflow to prevent future accumulation of unused assets.

  • Added a bash script (dev/remove-unused-images.sh) to identify and optionally remove unused documentation images
  • Created a GitHub Actions composite action to install ripgrep as a dependency
  • Integrated the unused image check into the docs CI workflow to run on every PR

Reviewed changes

Copilot reviewed 3 out of 106 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| dev/remove-unused-images.sh | New bash script that scans for image files in docs and identifies those not referenced in the codebase, with support for both check-only and removal modes |
| .github/workflows/docs.yml | Adds CI check to detect unused images during PR validation, preventing future accumulation of unreferenced documentation assets |
| .github/actions/setup-ripgrep/action.yml | Reusable composite action to install ripgrep (text search tool) used by the unused images detection script |

@harupy harupy force-pushed the remove-unused-doc-images branch 4 times, most recently from f4b5401 to a5a04a2 on December 1, 2025 02:26
@harupy harupy force-pushed the remove-unused-doc-images branch 5 times, most recently from 7edc159 to e24544c on December 1, 2025 09:16
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
@harupy harupy force-pushed the remove-unused-doc-images branch from e24544c to c0f64ea on December 1, 2025 09:16
…d mp4 files

- Add mp4 support to the unused media detection script
- Rename script from find-unused-images.sh to find-unused-media.sh
- Update pre-commit hook configuration
- Remove 18 unused mp4 files from docs

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
@harupy harupy changed the title from "Remove unused documentation images" to "Add pre-commit hook to detect unused media files in docs" on Dec 1, 2025
Member Author

% git ls-tree -r -l HEAD | awk '{sum += $4} END {printf "%.2f MB\n", sum/1024/1024}'
255.33 MB

This PR reduces the repo size by about 20%.

@harupy harupy changed the title from "Add pre-commit hook to detect unused media files in docs" to "Add CI check to detect unused media files in docs" on Dec 1, 2025
- Remove unused-media pre-commit hook from .pre-commit-config.yaml
- Add setup-ripgrep composite action with configurable version
- Add unused media check step to lint.yml workflow
- Update find-unused-media.sh to use system rg with helpful error message

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
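
The last bullet of the commit message above (use the system rg and print a helpful error when it is missing) could look something like the sketch below. The helper name `require_tool` and the wording of the message are assumptions for illustration; the actual check in find-unused-media.sh may differ.

```shell
# Hypothetical helper: fail early with an actionable message when a
# required command-line tool is not available on PATH.
require_tool() {
  local tool="$1" hint="$2"
  if ! command -v "$tool" >/dev/null 2>&1; then
    printf 'error: %s is required but was not found on PATH. %s\n' \
      "$tool" "$hint" >&2
    return 1
  fi
}
```

A script would typically call it near the top, for example `require_tool rg "See https://github.com/BurntSushi/ripgrep#installation" || exit 1`, so that a missing dependency is reported before any scanning starts.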
@harupy harupy force-pushed the remove-unused-doc-images branch from f3d8052 to 5716185 on December 1, 2025 14:01
Signed-off-by: harupy <17039389+harupy@users.noreply.github.com>
Collaborator

@B-Step62 B-Step62 left a comment

LGTM!

Co-authored-by: Yuki Watanabe <31463517+B-Step62@users.noreply.github.com>
Signed-off-by: Harutaka Kawamura <hkawamura0130@gmail.com>
@harupy harupy merged commit a201053 into mlflow:master Dec 1, 2025
19 of 46 checks passed
@harupy harupy deleted the remove-unused-doc-images branch December 1, 2025 14:45

Labels

  • rn/none: List under Small Changes in Changelogs.
  • team-review: Trigger a team review request

3 participants