ACE+TAO Debian packaging moved to git
We recently converted the Debian ACE+TAO package repository from Subversion to
git.
This was a long and interesting process; I learned a lot about git along the way.
I had been using git for a while for other packages: BOUML, dwarves and
GNU Smalltalk. But I did not really get it.
A preliminary study led by Pau[1] compared three conversion tools and found
that the one used below[2] gave the best-looking results.
The conversion
svn-all-fast-export requires local access to the repository, so the
Alioth SVN repo was copied to svn-pkg-ace/ on my machine
before running the tool:
svn-all-fast-export --identity-map authors.txt --rules pkg-ace.rules svn-pkg-ace
Here's the content of the pkg-ace.rules configuration file that was
used:
    create repository pkg-ace
    end repository

    match /trunk/
      repository pkg-ace
      branch master
    end match

    match /(branches|tags)/([^/]+)/
      repository pkg-ace
      branch \2
    end match

The author mapping file authors.txt contains:

    markos = Konstantinos Margaritis <email-hidden>
    mbrudka-guest = Marek Brudka <email-hidden>
    pgquiles-guest = Pau Garcia i Quiles <email-hidden>
    tgg = Thomas Girard <email-hidden>
    tgg-guest = Thomas Girard <email-hidden>
The tool's sample configuration file, merged-branches-tags.rules,
recommends post-processing tags, which are just branches in SVN. That's why
the configuration file above treats tags as branches.
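To make the effect of the two match rules in pkg-ace.rules concrete, here is a
small Ruby sketch that mimics them. It is only an illustration, not how
svn-all-fast-export is implemented: the helper name branch_for is made up for
this post, and I am assuming patterns are matched from the start of the SVN
path.

```ruby
# Illustration only: mimic the two "match" rules from pkg-ace.rules.
# branch_for is a hypothetical helper, not part of svn-all-fast-export.
# Assumption: a rule applies when its pattern matches at the start of the path.
RULES = [
  [%r{\A/trunk/}, ->(_m) { "master" }],
  [%r{\A/(branches|tags)/([^/]+)/}, ->(m) { m[2] }],
].freeze

# Return the git branch an SVN path would land on, or nil when no rule matches.
def branch_for(svn_path)
  RULES.each do |pattern, to_branch|
    m = pattern.match(svn_path)
    return to_branch.call(m) if m
  end
  nil
end

puts branch_for("/trunk/debian/changelog")         # master
puts branch_for("/tags/5.4.7-5/debian/changelog")  # 5.4.7-5
```

Both /branches/X/ and /tags/X/ end up on a git branch named X, which is why
the SVN tags show up as branches after the conversion.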
The conversion was indeed fast: less than 1 minute.
Post-conversion observations
Invoking gitk --all in the converted repo revealed several
kinds of issues:
- svn tags as branches:
  Branches are marked with green rectangles, and tags with yellow arrows.
  What we have here (expected given our configuration of the tool) are
  branches (e.g. 5.4.7-5) corresponding to tags, and tags matching the SVN
  tagging commit (e.g. backups/5.4.7-5@224). We'll review and fix this.
- merged code that did not appear as such:
  Branches that were not merged using svn merge look like they were not
  merged at all.
- commits with wrong author:
  Before being in SVN, the repository was stored in CVS. When it was imported
  into SVN, no special attention was given to the commit author. Hence I
  got credited for changes I did not write.
- obsolete branches:
  The tool leaves all branches, including removed ones (with a tag at their
  end), so that you can decide what to do with them.
- missing merges:
  The branch 5.4.7-12 was never merged into the trunk!
Learning git
Based on the observations above, I realized my limited knowledge would not be
enough to complete the conversion and clean up the repository. There is plenty
of documentation on git out there, and you can find a lot of
links from the git documentation
page. Here are the ones I used:
- Pro Git, written by Scott Chacon
- The Git Community Book
The Git Object Model
It's described with pictures
here.
You really need to understand this if you haven't already.
Once you do, you understand that git is built bottom-up: the plumbing
then the porcelain. If you can't find the tool you need, it's easy to write
it.
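As a taste of how simple the plumbing is, here is a minimal Ruby sketch that
computes the id git gives to a blob. It assumes the default SHA-1 object
format; blob_id is a name I made up for the illustration, and the result
should match what git hash-object prints for the same content.

```ruby
require "digest/sha1"

# Compute the object id git assigns to a blob: the SHA-1 of a small header
# ("blob <size in bytes>" followed by a NUL byte) and then the raw content.
# blob_id is a hypothetical helper name used only for this illustration.
def blob_id(content)
  Digest::SHA1.hexdigest("blob #{content.bytesize}\0#{content}")
end

puts blob_id("")         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
puts blob_id("hello\n")  # ce013625030ba8dba906f756967f9e9ca394464a
```

Trees, commits and tags are hashed the same way, with a different type name in
the header.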
git fast-import
The Migrating to Git chapter
explains how you can use the git fast-import tool to manually
import anything into git.
I've used it to create tags with dates in the past, slightly changing the
Custom Importer example in the book:
    #!/usr/bin/env ruby
    #
    # retag.rb
    #
    # Small script to create an annotated tag, specifying committer as well as
    # date, and tag comment.
    #
    # Based on Scott Chacon "Custom Importer" example.
    #
    # Arguments:
    #  $1 -- tag name
    #  $2 -- sha-1 revision to tag
    #  $3 -- committer in the form First Last <email>
    #  $4 -- date to use in the form YYYY/MM/DD_HH:MM:SS
    #  $5 -- tag comment

    def help
      puts "Usage: retag <tag> <sha1sum> <committer> <date> <comment>"
      puts "Creates an annotated tag with name <tag> for commit <sha1sum>, using "
      puts "given <committer>, <date> and <comment>"
      puts "The output should be piped to git fast-import"
    end

    # Convert YYYY/MM/DD_HH:MM:SS (read as local time) to seconds since the epoch.
    def to_date(datetime)
      (date, time) = datetime.split('_')
      (year, month, day) = date.split('/')
      (hour, minute, second) = time.split(':')
      return Time.local(year, month, day, hour, minute, second).to_i
    end

    def generate_tag(tag, sha1hash, committer, date, message)
      puts "tag #{tag}"
      puts "from #{sha1hash}"
      puts "tagger #{committer} #{date} +0000"
      # fast-import expects the length in bytes, not in characters
      print "data #{message.bytesize}\n#{message}"
    end

    if ARGV.length != 5
      help
      exit 1
    else
      (tag, sha1sum, committer, date, message) = ARGV
      generate_tag(tag, sha1sum, committer, to_date(date), message)
    end
graft points
(graft means greffe in French)
Because svn:mergeinfo is missing, some changes appear unmerged.
To fix this there are
graft points: they
override git's idea of the parents of a commit.
To create a graft point, assuming 6a6d48814d0746fa4c9f6869bd8d5c3bc3af8242
is the commit you want to change, currently with a single parent
898ad49b61d4d8d5dc4072351037e2c8ade1ab68, but containing changes from
commit 11cf74d4aa996ffed7c07157fe0780ec2224c73e:
    me@mymachine$ echo 6a6d48814d0746fa4c9f6869bd8d5c3bc3af8242 11cf74d4aa996ffed7c07157fe0780ec2224c73e 898ad49b61d4d8d5dc4072351037e2c8ade1ab68 >> .git/info/grafts
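Each line of .git/info/grafts is a commit id followed by the complete list of
parents it should appear to have, separated by spaces. Since a typo in a
40-character id is easy to make, here is a small Ruby sketch that builds such
a line and rejects anything that is not a full SHA-1. The helper graft_line is
hypothetical, written only for this post.

```ruby
# Build one line for .git/info/grafts: the commit id followed by all the
# parents it should appear to have, separated by spaces.
# graft_line is a hypothetical helper, not a git command.
SHA1_RE = /\A[0-9a-f]{40}\z/

def graft_line(commit, *parents)
  ids = [commit, *parents]
  bad = ids.reject { |id| id.match?(SHA1_RE) }
  raise ArgumentError, "not a full SHA-1: #{bad.join(', ')}" unless bad.empty?
  ids.join(" ")
end

# Same ids, in the same order, as in the echo command above.
puts graft_line(
  "6a6d48814d0746fa4c9f6869bd8d5c3bc3af8242",
  "11cf74d4aa996ffed7c07157fe0780ec2224c73e",
  "898ad49b61d4d8d5dc4072351037e2c8ade1ab68"
)
```

The output can be appended to .git/info/grafts exactly as with echo.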
git filter-branch
git filter-branch allows you to completely rewrite the history of a git branch,
changing or dropping commits while traversing the branch.
As an additional benefit, this tool uses graft points and makes them permanent.
In other words: after running git filter-branch you can remove
the .git/info/grafts file.
I've used it to rewrite the author of a given set of commits, using a hack on
top of Chris Johnsen's script:
    #!/bin/sh

    br="HEAD"
    TARG_NAME="Raphael Bossek"
    TARG_EMAIL="hidden"
    export TARG_NAME TARG_EMAIL

    filt='
        if test "$GIT_COMMIT" = 546db1966133737930350a098057c4d563b1acdf -o \
                "$GIT_COMMIT" = 23419dde50662852cfbd2edde9468beb29a9ddcc; then
            if test -n "$TARG_EMAIL"; then
                GIT_AUTHOR_EMAIL="$TARG_EMAIL"
                export GIT_AUTHOR_EMAIL
            else
                unset GIT_AUTHOR_EMAIL
            fi
            if test -n "$TARG_NAME"; then
                GIT_AUTHOR_NAME="$TARG_NAME"
                export GIT_AUTHOR_NAME
            else
                unset GIT_AUTHOR_NAME
            fi
        fi
    '

    git filter-branch $force --tag-name-filter cat --env-filter "$filt" -- $br
(Script edited here; there were many more commits written by Raphael.)
Important
It's important to realize that the whole selected branch history is
rewritten, so all object ids will change. You should not do this if
you have already published your repository.
The --tag-name-filter cat argument ensures our tags are copied during the
traversal; otherwise they would be left untouched, still pointing at the old
commits, and hence not available in the new history.
Hint
Once git filter-branch completes you get a new history, as well as backup
refs under refs/original/ to ease comparison. It is highly recommended to
check the result of the rewrite before removing the originals. To shrink the
repo after this, git clone the rewritten repo using the file:// syntax;
the git-filter-branch manual page explains the details.
Cleaning up the repo
To recap, here's how the ACE+TAO git repo was changed after conversion:
- Add graft points where needed.
- Clean tags and branches.
  Using git tag -d, git branch -d and the Ruby script above
  it was possible to recreate tags. During this I was also able to add missing
  tags, and to fix some mistakes I had made in SVN, such as committing to a
  branch created under tags/.
- Remove obsolete branches.
- Merge missing pieces.
  There were just two missing debian/changelog entries.
  I did this before git filter-branch because I did not find a way to
  use the tool correctly with multiple heads.
- Fix commit author where needed.
  Using the shell script above, Raphael is now correctly credited for his work.
That's it.
The ACE+TAO git repository for Debian packages is alive at http://git.debian.org/?p=pkg-ace/pkg-ace.git;a=summary.
[1] http://lists.alioth.debian.org/pipermail/pkg-ace-devel/2011-March/002421.html
[2] available in Debian as svn-all-fast-export