ACE+TAO Debian packaging moved to git




We recently converted the Debian ACE+TAO package repository from Subversion
to git.


This was a long and interesting process, and I learned a lot about git along
the way. I had been using git for a while for other packages (BOUML, dwarves
and GNU Smalltalk), but I did not really get it.


A preliminary study led by Pau[1] compared three conversion tools; the last
one[2] gave the best-looking results.



The conversion


svn-all-fast-export requires physical access to the repo, so the
Alioth SVN repo was copied to my machine as svn-pkg-ace/
before running the tool:


svn-all-fast-export --identity-map authors.txt --rules pkg-ace.rules svn-pkg-ace

Here's the content of the pkg-ace.rules configuration file that was
used:


create repository pkg-ace
end repository

match /trunk/
  repository pkg-ace
  branch master
end match

match /(branches|tags)/([^/]+)/
  repository pkg-ace
  branch \2
end match

The author mapping file, authors.txt, was:


markos = Konstantinos Margaritis <email-hidden>
mbrudka-guest = Marek Brudka <email-hidden>
pgquiles-guest = Pau Garcia i Quiles <email-hidden>
tgg = Thomas Girard <email-hidden>
tgg-guest = Thomas Girard <email-hidden>
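
Before running the conversion it can be worth checking that every SVN login
has an entry in this file. Here is a minimal Ruby sketch of such a check.
The helper names parse_identity_map and unmapped_authors are made up for
this example and are not part of svn-all-fast-export; the list of logins is
assumed to have been collected separately, for instance from the SVN log.

```ruby
# Hypothetical helpers, not part of svn-all-fast-export.

# Parses identity-map lines of the form: login = Full Name <email>
# and returns a Hash mapping each login to its "Full Name <email>" string.
def parse_identity_map(text)
  text.each_line.with_object({}) do |line, map|
    line = line.strip
    next if line.empty?
    login, identity = line.split("=", 2).map(&:strip)
    raise ArgumentError, "malformed line: #{line}" if identity.nil? || identity.empty?
    map[login] = identity
  end
end

# Returns the SVN logins that have no entry in the map.
def unmapped_authors(svn_logins, map)
  svn_logins.uniq.reject { |login| map.key?(login) }
end
```

An empty result from unmapped_authors means every login in the list has a
name and email to be credited with.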

The tool's sample configuration file merged-branches-tags.rules
recommends post-processing tags, which are just branches in SVN. That's why
the configuration file above treats tags as branches.
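
To make the effect of the two match rules concrete, here is a minimal Ruby
sketch of the mapping they describe. It is only an illustration: RULES and
branch_for are names made up for this example, and it assumes the patterns
are matched from the start of the SVN path. It is not how
svn-all-fast-export is implemented.

```ruby
# Hypothetical illustration of the pkg-ace.rules mapping; not the tool's code.
# Each rule pairs a pattern with a way to derive the git branch name.
RULES = [
  [%r{\A/trunk/}, ->(_match) { "master" }],
  # Branches and tags are handled alike: the directory name becomes the branch.
  [%r{\A/(branches|tags)/([^/]+)/}, ->(match) { match[2] }],
].freeze

# Returns the git branch an SVN path would land on, or nil if no rule matches.
def branch_for(svn_path)
  RULES.each do |pattern, namer|
    match = pattern.match(svn_path)
    return namer.call(match) if match
  end
  nil
end
```

With these rules /tags/5.4.7-5/debian/rules ends up on a git branch named
5.4.7-5, which is exactly why tags show up as branches after the conversion.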


The conversion was indeed fast: less than 1 minute.




Post-conversion observations


Invoking gitk --all in the converted repo revealed several
kinds of issues:



  • svn tags as branches:


    http://thomas.g.girard.free.fr/ACE/tags-as-branches.png

    Branches are marked with green rectangles, and tags with yellow arrows.
    What we have here (expected given our configuration of the tool) are
    branches (e.g. 5.4.7-5) corresponding to tags, and tags matching the SVN
    tagging commit (e.g. backups/5.4.7-5@224). We'll review and fix this.



  • merged code that did not appear as such:


    http://thomas.g.girard.free.fr/ACE/missing-merge-metadata.png

    Branches that were not merged using svn merge look like they were not
    merged at all.



  • commits with wrong author:


    http://thomas.g.girard.free.fr/ACE/wrong-author.png

    Before being in SVN, the repository was stored in CVS. When it was imported
    into SVN, no special attention was given to the commit author. Hence I
    got credited for changes I did not write.



  • obsolete branches:


    http://thomas.g.girard.free.fr/ACE/obsolete-branches.png

    The tool keeps all branches, including removed ones (with a tag at their
    end), so that you can decide what to do with them.



  • missing merges:


    http://thomas.g.girard.free.fr/ACE/missing-merge.png

    The branch 5.4.7-12 was never merged into the trunk!






Learning git


Based on the observations above, I realized my limited knowledge would not be
enough to complete the conversion and clean the repository. There is a ton of
documentation on git out there, and you can find a lot of
links from the git documentation
page. Here is what I used:




The Git Object Model


It's described with pictures
here.
You really need to understand this if you haven't already.


Once you do, you understand that git is built bottom-up: the plumbing
then the porcelain. If you can't find the tool you need, it's easy to write
it.




git fast-import


The Migrating to Git chapter
explains how you can use the git fast-import tool to manually
import anything into git.


I've used it to create tags with dates in the past, slightly changing the
Custom Importer example in the book:


#!/usr/bin/env ruby
#
# retag.rb
#
# Small script to create an annotated tag, specifying the committer as well
# as the date and the tag comment.
#
# Based on Scott Chacon's "Custom Importer" example.
#
# Arguments:
#  $1 -- tag name
#  $2 -- sha-1 revision to tag
#  $3 -- committer in the form First Last <email>
#  $4 -- date to use in the form YYYY/MM/DD_HH:MM:SS
#  $5 -- tag comment

def help
  puts "Usage: retag <tag> <sha1sum> <committer> <date> <comment>"
  puts "Creates an annotated tag with name <tag> for commit <sha1sum>, using "
  puts "given <committer>, <date> and <comment>"
  puts "The output should be piped to git fast-import"
end

def to_date(datetime)
  (date, time) = datetime.split('_')
  (year, month, day) = date.split('/')
  (hour, minute, second) = time.split(':')
  return Time.local(year, month, day, hour, minute, second).to_i
end

def generate_tag(tag, sha1hash, committer, date, message)
  puts "tag #{tag}"
  puts "from #{sha1hash}"
  puts "tagger #{committer} #{date} +0000"
  # fast-import expects the length of the data in bytes, not in characters
  print "data #{message.bytesize}\n#{message}"
end

if ARGV.length != 5
  help
  exit 1
else
  (tag, sha1sum, committer, date, message) = ARGV
  generate_tag(tag, sha1sum, committer, to_date(date), message)
end
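
One subtlety in the script above: to_date reads the given date in the
machine's local time zone, while the tagger line always labels it +0000. The
tag still points at the right instant, but it is displayed in UTC. If you
would rather have the date on the command line read as UTC, a variant along
these lines could be used. This is a sketch under that assumption;
to_utc_epoch and tagger_line are names made up for this example.

```ruby
# Variant of to_date that reads the timestamp as UTC. This is an assumption
# about intent: the original script uses the machine's local time zone.
def to_utc_epoch(datetime)
  date, time = datetime.split("_")
  # Base 10 is given explicitly so that "08" and "09" are not read as octal.
  year, month, day = date.split("/").map { |s| Integer(s, 10) }
  hour, minute, second = time.split(":").map { |s| Integer(s, 10) }
  Time.utc(year, month, day, hour, minute, second).to_i
end

# Builds the tagger line of the fast-import stream, with a matching offset.
def tagger_line(committer, datetime)
  "tagger #{committer} #{to_utc_epoch(datetime)} +0000"
end
```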



graft points


(graft means greffe in French)


Because of missing svn:mergeinfo, some changes appear unmerged.
To fix this there are
graft points: they
override git's idea of the parents of a commit.


To create a graft point, assuming 6a6d48814d0746fa4c9f6869bd8d5c3bc3af8242
is the commit you want to change, currently with a single parent
898ad49b61d4d8d5dc4072351037e2c8ade1ab68, but containing changes from
commit 11cf74d4aa996ffed7c07157fe0780ec2224c73e:


me@mymachine$ echo 6a6d48814d0746fa4c9f6869bd8d5c3bc3af8242 11cf74d4aa996ffed7c07157fe0780ec2224c73e 898ad49b61d4d8d5dc4072351037e2c8ade1ab68 >> .git/info/grafts
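
With many graft points to add, typing these lines by hand is error-prone. A
small helper can at least check that each ID is a full SHA-1 before writing
the line. Here is a minimal Ruby sketch; graft_line and add_graft are names
made up for this example, and the parents are written in the order they are
given.

```ruby
FULL_SHA1 = /\A[0-9a-f]{40}\z/

# Builds one line for .git/info/grafts: the commit followed by its parents.
def graft_line(commit, parents)
  ids = [commit, *parents]
  bad = ids.reject { |id| FULL_SHA1.match?(id) }
  raise ArgumentError, "not a full SHA-1: #{bad.join(', ')}" unless bad.empty?
  ids.join(" ")
end

# Appends the graft line to the given grafts file.
def add_graft(grafts_path, commit, parents)
  File.open(grafts_path, "a") { |f| f.puts(graft_line(commit, parents)) }
end
```

For the example above, add_graft would be called with ".git/info/grafts",
the commit to change, and an array holding the two parents.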



git filter-branch


git filter-branch allows you to completely rewrite history of a git branch,
changing or dropping commits while traversing the branch.


As an additional benefit, this tool honors graft points and makes them
permanent. In other words: after running git filter-branch you can remove
the .git/info/grafts file.


I've used it to rewrite the author of a given set of commits, using a hack on
top of Chris Johnsen's script:


#!/bin/sh

br="HEAD"

TARG_NAME="Raphael Bossek"
TARG_EMAIL="hidden"
export TARG_NAME TARG_EMAIL

filt='

    if test "$GIT_COMMIT" = 546db1966133737930350a098057c4d563b1acdf -o \
            "$GIT_COMMIT" = 23419dde50662852cfbd2edde9468beb29a9ddcc; then
        if test -n "$TARG_EMAIL"; then
            GIT_AUTHOR_EMAIL="$TARG_EMAIL"
            export GIT_AUTHOR_EMAIL
        else
            unset GIT_AUTHOR_EMAIL
        fi
        if test -n "$TARG_NAME"; then
            GIT_AUTHOR_NAME="$TARG_NAME"
            export GIT_AUTHOR_NAME
        else
            unset GIT_AUTHOR_NAME
        fi
    fi

'

git filter-branch --tag-name-filter cat --env-filter "$filt" -- "$br"

(Script edited here; there were many more commits written by Raphael.)
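
When the list of commits is long, the filter body can be generated instead
of written by hand. Here is a minimal Ruby sketch that produces an equivalent
shell snippet using a case statement. author_env_filter is a name made up
for this example, and the commit IDs are assumed to be plain hexadecimal
SHA-1s; the output would be passed to --env-filter in place of $filt.

```ruby
require "shellwords"

# Hypothetical generator for the --env-filter snippet.
# commits: array of commit IDs whose author must be replaced.
def author_env_filter(commits, name, email)
  raise ArgumentError, "no commits given" if commits.empty?
  <<~SH
    case "$GIT_COMMIT" in
    #{commits.join("|")})
        GIT_AUTHOR_NAME=#{Shellwords.escape(name)}
        GIT_AUTHOR_EMAIL=#{Shellwords.escape(email)}
        export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL
        ;;
    esac
  SH
end
```

Shellwords.escape takes care of the space in the author name, so the
generated assignment stays a single shell word.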



Important


It's important to realize that the whole history of the selected branch is
rewritten, so the IDs of all rewritten commits will change. You should not do
this if you have already published your repository.



The --tag-name-filter cat argument makes git filter-branch rewrite our tags
so that they point to the rewritten commits; otherwise they would be left
untouched, still pointing to the old history.



Hint


Once git filter-branch completes you get a new history, while the original
refs are saved under refs/original/ to ease comparison. It is highly
recommended to check the result of the rewrite before removing the originals.
To shrink the repo after this, git clone the rewritten repo with the file://
syntax -- the git-filter-branch manual page says it all.






Cleaning up the repo


To recap, here's how the ACE+TAO git repo was changed after conversion:



  1. Add graft points where needed.



  2. Clean tags and branches.


    Using git tag -d, git branch -d and the Ruby script above,
    it was possible to recreate tags.


    During this I was also able to add missing tags, and to fix some SVN
    mistakes I had made -- like committing to a branch created under tags/.



  3. Remove obsolete branches.



  4. Merge missing pieces.


    There were just two missing debian/changelog entries.


    I did this before git filter-branch because I did not find a way to
    use the tool correctly with multiple heads.



  5. Fix commit author where needed.


    Using the shell script above, Raphael is now correctly credited for his work.




That's it.


The ACE+TAO git repository for Debian packages is alive at http://git.debian.org/?p=pkg-ace/pkg-ace.git;a=summary.




[1]http://lists.alioth.debian.org/pipermail/pkg-ace-devel/2011-March/002421.html

[2]available in Debian as svn-all-fast-export