
Purge git history for irrelevant files from all repos #5594

@schlessera


WP-CLI started out as a single git repository and was split up into multiple packages many years later.

When splitting up the packages, we tried to make sure that no historical knowledge was lost, so each subsplit package started out as a copy of the original main package, from which the files that were not needed were then removed.

However, it seems that the purging we did afterward to remove unused history was not thorough enough.

Right now, each package still contains the history of all WP-CLI files across all commands up to the point of the split. This makes most statistics and contribution data useless and drastically increases the size of the individual VCS repos: when cloning a full environment, each command repo carries about 10MB of historical data, most of which is useless.
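To check the size claim for a given command repo before and after the purge, a quick measurement can be taken from a local clone. The sketch below uses hypothetical helper names (`history_size`, `historical_path_count` are illustrative, not part of WP-CLI) and relies only on `git count-objects -v`, which reports object storage in KiB, and `git log --name-only`, which lists the paths touched by each commit.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; run them from inside a git work tree.

# Total object storage (packed + loose) in KiB, as reported by
# `git count-objects -v` ("size-pack:" and "size:" lines).
history_size() {
  git count-objects -v |
    awk '/^size-pack:/ {pack = $2} /^size:/ {loose = $2} END {print pack + loose}'
}

# Number of distinct paths touched anywhere in the history of all refs.
# Merge commits are not diffed by `git log` by default, so this is approximate.
historical_path_count() {
  git log --all --pretty=format: --name-only | sed '/^$/d' | sort -u | wc -l | tr -d ' '
}
```

Comparing `historical_path_count` with `git ls-files | wc -l` in a command repo gives a rough idea of how many foreign paths its history still carries, and running `history_size` before and after the purge shows how much the rewrite actually saved.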

To solve this, a more thorough purge should be done:

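```shell
git checkout master
# Keep every file that exists on master today...
git ls-files > keep-these.txt
# ...plus every earlier name of those files, so history from before a rename survives.
git ls-files | while read -r line; do (git log --follow --raw --diff-filter=R --pretty=format:%H "$line" | while true; do if ! read -r hash; then break; fi; IFS=$'\t' read -r mode_etc oldname newname; read -r blankline; printf '%s\n' "$oldname"; done); done >> keep-these.txt
# Rewrite every commit on all refs so its tree only contains the listed paths; empty commits are dropped, tags are rewritten too.
git filter-branch --force --index-filter "git rm --ignore-unmatch --cached -qr . ; tr '\n' '\0' < '$PWD/keep-these.txt' | xargs -0 git reset -q \$GIT_COMMIT --" --prune-empty --tag-name-filter cat -- --all
# Drop the backup refs and reflogs, then repack so the purged objects are really deleted.
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --aggressive --prune=now
```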
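Before force-pushing, it may be worth confirming that the rewritten history really contains nothing but the kept paths. This is a minimal sketch of such a check; the `unexpected_paths` function is hypothetical, and it assumes bash (for process substitution) and that `keep-these.txt` is still in the repository root, where the steps above create it.

```shell
#!/usr/bin/env bash
# Hypothetical check: print every path that still appears in the history of
# any ref but is NOT listed in the keep file. Empty output = clean purge.
unexpected_paths() {
  local keep_file="${1:-keep-these.txt}"
  # comm -23 prints lines only present in the first (history) list.
  # Both inputs are sorted with the same C collation, as comm requires.
  comm -23 \
    <(git log --all --pretty=format: --name-only | sed '/^$/d' | LC_ALL=C sort -u) \
    <(LC_ALL=C sort -u "$keep_file")
}
```

An empty result from `unexpected_paths` inside the purged repo means every path still present in some commit is either a current file or a former name of one. Extra (e.g. blank or duplicate) lines in the keep file are harmless here, since only paths missing from it are reported.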

Then, the master branch needs to be force-pushed to overwrite the current master branch history.
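Because the `filter-branch` call also rewrites tags (`--tag-name-filter cat`), old tags left on the remote would keep the purged objects reachable there. The sketch below assumes the tags should be replaced as well; the helper name and the `origin`/`master` defaults are assumptions to adjust per repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper: publish the rewritten history.
push_rewritten_history() {
  local remote="${1:-origin}" branch="${2:-master}"
  # Overwrite the remote branch with the filtered history
  # (stop here, returning push's exit status, if it fails).
  git push --force "$remote" "$branch" || return
  # Replace the remote tags too; otherwise they still point at old commits.
  git push --force --tags "$remote"
}
```

Pushing the branch first means a rejected or failed branch push leaves the remote tags untouched.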

⚠️ However, this will cause all open PRs to become invalid and immediately be closed! ⚠️

Therefore, each repo should be handled separately, after first making sure that no PRs remain open against its master branch.
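The open-PR check can be scripted with the GitHub CLI. This is a sketch assuming `gh` is installed and authenticated; the function names and the `wp-cli/example` slug below are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Print the number of open PRs against a base branch of a GitHub repo.
# gh's default --limit (30) is plenty to tell zero from non-zero.
open_pr_count() {
  local repo="$1" base="${2:-master}"
  gh pr list --repo "$repo" --base "$base" --state open --json number --jq 'length'
}

# Fail (exit status 1) while any PR is still open against the base branch.
ensure_no_open_prs() {
  local repo="$1" base="${2:-master}" count
  count="$(open_pr_count "$repo" "$base")" || return
  if [ "$count" -ne 0 ]; then
    echo "$repo: $count open PR(s) against $base; not rewriting history" >&2
    return 1
  fi
}
```

For example, `ensure_no_open_prs wp-cli/example master && git push --force origin master` only pushes when nothing is open against `master`.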

TODO:
