A history of rsync on Mac OS X

I think Apple has included the rsync binary with Mac OS X since at least version 10.2 Jaguar, but it couldn’t be used to back up most Mac data because it lacked support for resource forks, which were, and still are, prevalent on the Mac.

Rsync first got attention on Mac OS X when Kevin Boyd of the University of Michigan added resource fork support to rsync in a port he called RsyncX. RsyncX also included a simple GUI for setting up backups; the GUI generated shell scripts and scheduled them in cron. The GUI was buggy and generated scripts with minor but serious flaws, and the scripts lacked features any real backup solution needs. But with no good backup solution available (just say no to Retrospect), some admins, myself included, took the time to customize and extend the scripts that RsyncX created, or just wrote new ones from scratch. I added features to my scripts like checking for free disk space, rotating incremental backups with hard links, checking that the backup volume was mounted before running, failure notifications, etc.
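For anyone rolling their own scripts the same way, here is a minimal sketch of two of those features, written in Python rather than shell: the pre-flight checks (is the backup volume mounted, is there enough free space) and hard-link rotation of incremental snapshots. It is an illustration, not one of my actual scripts. The snapshot layout (`backup.0` newest through `backup.N` oldest), the function names, and any free-space threshold you pass in are assumptions, and the rsync call is a plain `rsync -a --delete`; add whatever resource-fork option your rsync build needs.

```python
import os
import shutil
import subprocess


def preflight(volume: str, min_free_bytes: int) -> None:
    """Refuse to run unless the backup volume is mounted and has room."""
    # If the volume isn't mounted, its path is just a folder on the boot disk.
    if not os.path.ismount(volume):
        raise RuntimeError(f"{volume} is not mounted")
    free = shutil.disk_usage(volume).free
    if free < min_free_bytes:
        raise RuntimeError(f"only {free} bytes free on {volume}")


def rotate_snapshots(dest: str, keep: int) -> None:
    """Rotate dest/backup.0 .. dest/backup.(keep-1), newest first.

    The oldest snapshot is deleted, the others shift up by one, and backup.1
    becomes a hard-linked copy of backup.0. Unchanged files then share the
    same disk blocks across snapshots, so each extra snapshot only costs the
    space of the files that actually changed.
    """
    if keep < 2:
        raise ValueError("keep must be at least 2")

    def snap(i: int) -> str:
        return os.path.join(dest, f"backup.{i}")

    if os.path.isdir(snap(keep - 1)):
        shutil.rmtree(snap(keep - 1))
    for i in range(keep - 2, 0, -1):  # shift from the top down
        if os.path.isdir(snap(i)):
            os.rename(snap(i), snap(i + 1))
    if os.path.isdir(snap(0)):
        # copy_function=os.link creates hard links instead of copying data.
        shutil.copytree(snap(0), snap(1), symlinks=True, copy_function=os.link)


def sync(source: str, dest: str) -> None:
    """Update backup.0. By default rsync writes a changed file to a temp file
    and renames it into place (unless --inplace is used), which breaks the
    hard link, so older snapshots keep the previous version."""
    os.makedirs(os.path.join(dest, "backup.0"), exist_ok=True)
    subprocess.run(
        ["rsync", "-a", "--delete", source.rstrip("/") + "/",
         os.path.join(dest, "backup.0")],
        check=True,
    )
```

A nightly run would call `preflight`, then `rotate_snapshots`, then `sync`, in that order, so a missing or full volume stops the job before anything is deleted.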

RsyncX had other flaws, though. It was based on older rsync code that contained bugs and at least one serious security vulnerability. One of those bugs sends rsync into an endless loop if files change while it is running. If you have logging enabled, RsyncX will quickly fill your drive while it is stuck in this loop. Really bad news. My workaround was to spawn rsync off and check its log every few seconds for telltale signs that it was stuck in a loop. I would then kill rsync and restart it. Ugly, but it actually works. Oh yeah, it also doesn’t know how to handle locked files (uchg), so I have to unlock all files on the destination before the sync. Ugh. Oh, and it throws lchown errors on symlinks. Just grep -v them out.
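The watchdog idea looks something like the following Python sketch. The exact telltale signs vary, so the `looks_stuck` heuristic here (the last 50 log lines are all identical) is a hypothetical stand-in; swap in whatever pattern your stuck rsync actually produces. The poll interval, restart limit, and function names are assumptions too. The log is truncated on each attempt so a looping rsync can’t fill the disk across restarts, at the cost of losing the previous attempt’s log. The `grep -v` filtering and the uchg unlock (`chflags -R nouchg`) are included as small helpers.

```python
import os
import subprocess
import time


def tail_lines(path: str, max_bytes: int = 64 * 1024) -> list:
    """Return the lines found in the last max_bytes of a log file."""
    with open(path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        fh.seek(max(0, fh.tell() - max_bytes))
        data = fh.read()
    return data.decode("utf-8", errors="replace").splitlines()


def looks_stuck(lines: list, repeat: int = 50) -> bool:
    """HYPOTHETICAL heuristic: the last `repeat` log lines are all identical."""
    tail = lines[-repeat:]
    return len(tail) == repeat and len(set(tail)) == 1


def run_with_watchdog(cmd, log_path, poll_seconds=5.0, max_restarts=3,
                      stuck=looks_stuck):
    """Run cmd with its output in log_path; kill and restart it if the log
    looks stuck. Returns the exit code of the first run that finishes."""
    for _attempt in range(max_restarts + 1):
        # "w" truncates the log on every attempt so a loop can't fill the disk.
        with open(log_path, "w") as log:
            proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
            while proc.poll() is None:
                time.sleep(poll_seconds)
                if proc.poll() is None and stuck(tail_lines(log_path)):
                    proc.kill()
                    proc.wait()
                    break  # go round again with a fresh log
            else:
                return proc.returncode  # finished on its own
    raise RuntimeError(f"still looking stuck after {max_restarts} restarts")


def without_lchown_noise(lines: list) -> list:
    """Same idea as piping the output through `grep -v lchown`."""
    return [ln for ln in lines if "lchown" not in ln]


def unlock_destination(path: str) -> None:
    """Clear the user-immutable (uchg) flag on everything under path."""
    subprocess.run(["chflags", "-R", "nouchg", path], check=True)
```

Restarting is cheap in practice because the new rsync run should skip most of the files the killed run already copied.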

When Apple began talking about the features of Mac OS X 10.4 Tiger, one of the big features for command line unix geeks was support for resource forks in all command line tools, including rsync. By this time those of us using RsyncX were getting pretty tired of all of the workarounds and bugs, so an Apple supported rsync that handled resource forks was awesome news.

When rsync shipped with 10.4.0, it had serious bugs. It was basically unusable: it would crash unexpectedly with large data sets, and it would set modification dates incorrectly, which then caused every future sync to re-copy every file. I stayed with RsyncX for production syncs, with the additional slap in the face of deprecation warnings filling the system log. With each new release, I would test Apple’s included rsync. I filed bugs. I used a developer support incident. Finally, from around 10.4.9 on, rsync did actually seem to work, at least well enough. In real-world tests, though, Apple’s rsync was, and still is, dismally slow on large data sets. The problem is that resource forks do not have their own modification date to compare when syncing. Without that key piece of data, there is no easy way to know whether a file’s resource fork has changed. Apple’s solution to this problem was to ALWAYS COPY the resource fork. That’s right: if your data has resource forks, you copy the resource data every time rsync runs. Yes, resource forks are typically small, but they add up, and with terabytes of small files the extra I/O turns an rsync of unchanged data that should take about 30 minutes into one that takes 4 hours.
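To make that reasoning concrete, here is a toy Python model of rsync’s default “quick check”, which skips a file when its size and modification time both match. The `Fork` type and the function names are made up for illustration, and this is not how Apple’s rsync is structured internally; it only shows why a fork with no timestamp of its own can never pass the check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fork:
    size: int
    mtime: Optional[float]  # None: this fork has no modification date of its own


def quick_check_unchanged(src: Fork, dst: Fork) -> bool:
    """rsync's default rule: same size and same mtime means 'skip it'."""
    if src.mtime is None or dst.mtime is None:
        # Without a timestamp there is nothing to compare, so the only safe
        # answer is "it might have changed", i.e. copy it.
        return False
    return src.size == dst.size and src.mtime == dst.mtime


def bytes_to_copy(pairs) -> int:
    """Total bytes a sync would transfer for a list of (src, dst) fork pairs."""
    return sum(src.size for src, dst in pairs if not quick_check_unchanged(src, dst))


# An unchanged data fork is skipped; an identical resource fork is not.
data = (Fork(1_000_000, 1200000000.0), Fork(1_000_000, 1200000000.0))
rsrc = (Fork(2_048, None), Fork(2_048, None))
print(bytes_to_copy([data, rsrc]))  # prints 2048
```

Multiply that per-file resource fork cost by millions of files and the difference between a 30-minute and a 4-hour run stops being surprising.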

So, again, I’m sticking with RsyncX.

Leopard’s rsync appears, at least to me, to be virtually unchanged from Tiger’s. It still copies the resource fork every time.

Mike Bombich apparently likes rsync too. He uses it as the sync engine in Carbon Copy Cloner 3 and includes an updated, patched rsync binary inside the application bundle at: Carbon\ Copy\ Cloner/Carbon\ Copy\ Cloner.app/Contents/Resources/ccc_helper.app/Contents/Resources/rsync

One nice thing about this binary is that it has a patch to optionally checksum resource forks (--ea-checksum) so they aren’t copied unnecessarily. Cool, but checksumming adds time, so I’ll need to do some real-world tests.
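Here is a Python sketch of the checksum idea, assuming Mac OS X’s `..namedfork/rsrc` path suffix for reaching a file’s resource fork. It is not the Carbon Copy Cloner patch, only an illustration of the trade-off: both forks have to be read in full to hash them, which is where the extra time goes, but an identical fork is never rewritten. MD5 is an arbitrary choice of digest here.

```python
import hashlib
import os
from typing import Optional


def rsrc_path(path: str) -> str:
    """Mac OS X exposes a file's resource fork at <file>/..namedfork/rsrc."""
    return os.path.join(path, "..namedfork", "rsrc")


def digest(path: str, chunk: int = 1 << 20) -> Optional[str]:
    """MD5 of a file's contents, read in chunks; None if it does not exist."""
    if not os.path.exists(path):
        return None
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk), b""):
            h.update(block)
    return h.hexdigest()


def fork_needs_copy(src_fork: str, dst_fork: str) -> bool:
    """Copy only when the fork contents differ (or one side is missing)."""
    return digest(src_fork) != digest(dst_fork)
```

On a Mac you would call `fork_needs_copy(rsrc_path(src), rsrc_path(dst))` for each file pair that the data-fork quick check has already judged unchanged.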

Rsync version 3.0 is in prerelease and includes built-in ACL and extended attribute support. Hopefully this includes some way of handling unchanged resource forks.

Until Apple adds some sort of date stamp to resource forks, checksums may be the only safe way to handle them. I suspect that addressing the issue properly will have to wait for a new filesystem.

Attempting to Boot from ZFS…

I spent some time trying to get Leopard booting from a ZFS volume, using Apple’s Boot!=Root method. Boot!=Root basically lets you boot from ‘exotic’ filesystems by using a helper partition that is not ‘exotic’ (HFS+) and contains enough information to mount your exotic volume and root off of it: a kernel, kernel extension caches, and a plist specifying the UUID of the volume you want to root off of. The machine actually boots from the helper partition, loads the kernel and kernel extensions from the cache, waits for the volume with the specified UUID to show up, and then switches over to root off of that volume. Apple has been doing this for some time with the helper partitions that allow booting from Apple RAID volumes.

So I created a RAID volume, made it bootable, mounted its Apple Boot helper partition, and used asr to restore that helper partition onto another partition as a base for my ZFS helper partition. I edited its Boot.plist to point to my ZFS volume’s UUID, blessed the helper partition, and rebooted in verbose mode. I got an error from the AppleFileSystem kext, followed by the dreaded “Waiting for root device…” message. Damn.
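The Boot.plist edit is easy to script with Python’s `plistlib`. This minimal sketch validates the UUID, sets one key, and writes the plist back, leaving every other key alone. The key name `Root UUID` is an assumption, not something confirmed here; open the Boot.plist on your own helper partition and use whatever key actually holds the root volume’s UUID. Finding the plist on the mounted helper partition is left to the caller.

```python
import plistlib
import uuid

# ASSUMPTION: the key that names the root volume in the helper partition's
# Boot.plist. Verify it against your own plist before relying on this.
ROOT_UUID_KEY = "Root UUID"


def point_helper_at(plist_path: str, volume_uuid: str, key: str = ROOT_UUID_KEY) -> None:
    """Rewrite the helper partition's Boot.plist to root off volume_uuid."""
    # uuid.UUID raises ValueError on a malformed UUID; diskutil shows volume
    # UUIDs in upper case, so normalize to that form.
    canonical = str(uuid.UUID(volume_uuid)).upper()
    with open(plist_path, "rb") as fh:
        boot = plistlib.load(fh)
    boot[key] = canonical  # every other key is left untouched
    with open(plist_path, "wb") as fh:
        plistlib.dump(boot, fh)
```

Keep a copy of the original Boot.plist before running it; the function overwrites the file in place.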

Apparently the problem is that Apple only allows booting from certain filesystem types. The list seems to live in the Info.plist of the AppleFileSystem kernel extension. I was going to try adding ZFS to that plist, but when I saw that the kernel extension appeared to be signed, I gave up, assuming that modifying the plist would invalidate the kext. I downloaded the kext from the Darwin site, but it had all sorts of dependencies.

Why would I think it would be that easy?

That is where I left it. A few hours lost. Not a lot gained. Maybe I’ll try again…
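A safer first step than editing the kext’s Info.plist is just reading it. This small, read-only Python sketch walks a plist and reports every key or string value that mentions a given word, which is one way to find where a kext lists filesystem types: search for “hfs” and see whether “zfs” shows up anywhere. Nothing is written back. The exact kext bundle path and where in its plist the list lives are assumptions you would need to confirm.

```python
import plistlib
from typing import Any, Iterator, List, Tuple


def find_strings(node: Any, needle: str, path: Tuple = ()) -> Iterator[Tuple[Tuple, str]]:
    """Yield (key path, text) for every key or string value containing needle,
    compared case-insensitively."""
    n = needle.lower()
    if isinstance(node, dict):
        for key, value in node.items():
            if n in key.lower():
                yield path + (key,), key
            yield from find_strings(value, needle, path + (key,))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from find_strings(value, needle, path + (i,))
    elif isinstance(node, str) and n in node.lower():
        yield path, node


def search_plist(plist_path: str, needle: str) -> List[Tuple[Tuple, str]]:
    """Read-only: load a plist and list where needle appears."""
    with open(plist_path, "rb") as fh:
        return list(find_strings(plistlib.load(fh), needle))
```

For example, compare `search_plist(info_plist, "hfs")` with `search_plist(info_plist, "zfs")`, where `info_plist` is the path to the kext’s `Contents/Info.plist`.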

Leopard’s bless command references ZFS?

The bless command in Leopard contains references to ZFS.

kserver:~ pbuffr$ strings /usr/sbin/bless | grep -i zfs
No ZFS container partitions found
ZFS container partition found: %s

Since bless is the command that sets the boot variables in nvram to select your startup volume, it seems likely that Apple is at least working on ZFS boot support, if it isn’t already there but hidden away. Interesting.
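For anyone curious how that check works, `strings` just pulls runs of printable characters out of a binary. Here is a rough Python equivalent of `strings -a /usr/sbin/bless | grep -i zfs`, assuming the usual minimum run length of 4. It is a simplification: the real tool has more options and, depending on platform and flags, may scan only some sections of a file.

```python
import re
from typing import List


def strings(data: bytes, min_len: int = 4) -> List[str]:
    """Runs of at least min_len printable ASCII characters (space..~ or tab)."""
    pattern = re.compile(rb"[\x20-\x7e\t]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def grep_i(lines: List[str], needle: str) -> List[str]:
    """Case-insensitive substring filter, like `grep -i`."""
    return [line for line in lines if needle.lower() in line.lower()]
```

On a Leopard machine, reading `/usr/sbin/bless` in binary mode and passing its bytes through `grep_i(strings(data), "zfs")` should list the same two messages as the transcript above.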

Daemons and Agents Tech Note Updated for Leopard

There are TONS of changes to the way daemons and agents are handled in Leopard. This new Apple technical note explains a lot. If you are having trouble running a GUI app from a script or at startup in Leopard, it is required reading: http://developer.apple.com/technotes/tn2005/tn2083.html Thanks, Quinn “The Eskimo!”
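As a starting point for the GUI-app case, here is a Python sketch that builds a per-user launch agent plist with `plistlib`. `Label`, `ProgramArguments`, `RunAtLoad`, and `LimitLoadToSessionType` are standard launchd keys; the last one, set to `Aqua`, asks launchd to load the agent only in a GUI login session. The label and program path below are hypothetical placeholders. Read the tech note before relying on this, since where GUI code may and may not run is exactly what it covers.

```python
import plistlib


def launch_agent_plist(label: str, program: list, gui_only: bool = True) -> bytes:
    """Build an XML launchd agent plist and return it as bytes."""
    job = {
        "Label": label,               # unique job name, reverse-DNS style
        "ProgramArguments": program,  # argv: program path, then arguments
        "RunAtLoad": True,            # start as soon as launchd loads the job
    }
    if gui_only:
        # Load only in a GUI (Aqua) login session, not over ssh, for example.
        job["LimitLoadToSessionType"] = "Aqua"
    return plistlib.dumps(job)  # XML format by default


if __name__ == "__main__":
    # Hypothetical label and program path, for illustration only.
    xml = launch_agent_plist("com.example.backup-notifier",
                             ["/Applications/Example.app/Contents/MacOS/Example"])
    print(xml.decode("utf-8"))
```

Saved as `~/Library/LaunchAgents/com.example.backup-notifier.plist`, it can be loaded with `launchctl load` on that file, or it will be picked up at the next login.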