Backing up FSFS repositories, Subversion 1.5 style

Subversion 1.5 isn’t out yet — and it won’t be out for a while — but there’s a neat little new feature that has the potential to make repository administration for FSFS repositories a little bit easier, especially when it comes to backups.

The new feature is that… svnadmin recover now does something for FSFS repositories.

If you haven’t come across it, svnadmin recover is one of those little oddities that was invented for Berkeley DB repositories¹. It literally is nothing more than a wrapper around BDB’s db_recover functionality, which performs ‘normal’ recovery of the database after an unclean shutdown: effectively nothing more than a journal replay. In the past it was necessary when BDB databases got ‘wedged’, and it’s still needed if you want to change some of the options in the db/DB_CONFIG file.

Subversion 1.1 introduced FSFS, the filesystem-based Subversion filesystem, and Subversion 1.2 made it the default option, so any repositories created with Subversion 1.2 or later will be in FSFS format unless you asked for something else. FSFS has some interesting properties, one of which is that the locking model is very simple: the filesystem is always in a consistent state for readers, so writers only block other writers. This property means that “recovery” in the BDB sense isn’t necessary for FSFS.

But recovery also has another use: it can be used to fix up some types of missing data. In fact, the svnadmin hotcopy mechanism uses Berkeley DB’s catastrophic recovery to create a set of empty log files after copying the database to a new location. And it’s in that context that we’ve implemented recovery for FSFS.

So, another interesting property of FSFS is that all revision files are immutable. This makes operations people very happy, because they can just back up the whole repository using a simple incremental backup strategy… almost.

The sticking point is a little file called db/current, which stores a very small amount of information about the filesystem — the largest revision number and the next unique node and copy ids.
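To make the contents of that file concrete, here is a minimal sketch of reading it. It assumes the file is a single line of three whitespace-separated fields (revision number, next node id, next copy id); that layout is my assumption for illustration, and the ids are deliberately kept as opaque strings rather than interpreted. The names Current and parse_current are hypothetical.

```python
from typing import NamedTuple


class Current(NamedTuple):
    youngest: int      # largest revision number in the filesystem
    next_node_id: str  # next unique node id, kept as an opaque token
    next_copy_id: str  # next unique copy id, kept as an opaque token


def parse_current(text: str) -> Current:
    """Parse the contents of db/current.

    Assumption: three whitespace-separated fields on one line.
    """
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"expected 3 fields in db/current, got {len(fields)}")
    return Current(int(fields[0]), fields[1], fields[2])
```

For example, parse_current("42 1a 7\n") yields a record whose youngest field is 42.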

Due to the way that FSFS makes sure that readers always see a consistent state, the current file is the last thing to be updated. This means that the backup may not be consistent if you follow a naïve strategy of just backing up the files in any old order: by the time current is backed up, it might be pointing to a revision that doesn’t exist in your backup copy, or worse, one that has only been partly written.
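The first of those failure modes is easy to check for in a backup copy. The sketch below lists every revision that current claims exists but that has no file in the backup. It assumes revision files are named by their number directly under db/revs (an unsharded layout) and that the first field of current is the revision number; missing_revisions is a hypothetical helper, and it cannot detect a partly written revision file.

```python
from pathlib import Path
from typing import List


def missing_revisions(db_dir: Path) -> List[int]:
    """Return revisions named by db/current that have no file in db/revs.

    Assumptions: unsharded layout (db/revs/<N>), and the first field of
    db/current is the youngest revision number.
    """
    youngest = int((db_dir / "current").read_text().split()[0])
    revs_dir = db_dir / "revs"
    return [rev for rev in range(youngest + 1)
            if not (revs_dir / str(rev)).is_file()]
```

An empty list means the copy is at least complete up to the revision current points at; it says nothing about whether each file was copied in full.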

The easy solution to the ordering problem is to make sure to back up the current file first — that way, you’ll always get a consistent copy of the filesystem up to whichever revision was current at that point. However, this can be quite awkward in some cases, especially if you have a lot of repositories or an inflexible backup solution.
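As a rough sketch of that ordering, the function below snapshots current before touching any revision file, then copies only the revisions up to that snapshot. It makes the same layout assumptions as before (unsharded db/revs, revision number as the first field of current), it ignores everything else in the repository (revision properties, configuration and so on), and backup_current_first is a hypothetical name rather than anything Subversion provides.

```python
import shutil
from pathlib import Path


def backup_current_first(repo: Path, dest: Path) -> int:
    """Copy db/current and db/revs so that the copy is self-consistent.

    Returns the youngest revision included in the backup.
    """
    src_db, dst_db = repo / "db", dest / "db"
    (dst_db / "revs").mkdir(parents=True, exist_ok=True)

    # Step 1: read current before looking at any revision file.
    snapshot = (src_db / "current").read_bytes()
    youngest = int(snapshot.split()[0])

    # Step 2: copy revisions 0..youngest. Revision files are immutable,
    # so anything already present in the backup can be skipped.
    for rev in range(youngest + 1):
        target = dst_db / "revs" / str(rev)
        if not target.exists():
            shutil.copy2(src_db / "revs" / str(rev), target)

    # Step 3: store the snapshot taken in step 1, not whatever the live
    # file says by now.
    (dst_db / "current").write_bytes(snapshot)
    return youngest
```

Writing the snapshot into the backup last mirrors what FSFS itself does, so an interrupted backup run never leaves a current file that points past the copied revisions.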

There’s another problem if you’re trying to implement a disaster recovery solution using a warm standby. Typically, you’ll do this by copying the revision files as they appear on the disk, either by using a post-commit trigger, or direct support from the storage device.

If you have a very high commit throughput, or very large files (or both), you might find that some revisions take a lot longer to copy than others, so you may decide that it’d be worthwhile to run the copies in parallel. Again, this might be something that your storage device supports natively.

But now you’ve got a really big problem: there’s no synchronisation point at which it’s safe to copy current. The best you can do is copy it just before you kick off the copy for each revision, but now you have all your parallel copying jobs serialising against the same file, which is hardly ideal.

Anyway, I think you can see where this is going. As of Subversion 1.5.0, svnadmin recover will recreate the db/current file in an FSFS repository from the existing revision files, so if you fit into one of the scenarios above, you won’t have to worry about your backups quite so much — just run recovery after you restore.
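To give a feel for the easy half of that job, here is a sketch of deriving the largest revision number from the revision files alone. This is not how svnadmin is implemented; it is an illustration that assumes an unsharded db/revs directory, and youngest_from_revs is a hypothetical name. The harder half, working out the next node and copy ids, needs the contents of the revision files and isn't attempted here.

```python
import re
from pathlib import Path


def youngest_from_revs(revs_dir: Path) -> int:
    """Return the highest revision number that has a file in revs_dir.

    Assumption: unsharded layout, one file per revision, named by number.
    """
    numbers = [int(p.name) for p in revs_dir.iterdir()
               if p.is_file() and re.fullmatch(r"[0-9]+", p.name)]
    if not numbers:
        raise ValueError(f"no revision files found in {revs_dir}")
    return max(numbers)
```

In practice you wouldn't do any of this by hand: after restoring, run svnadmin recover against the repository path and let it rebuild the file.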

Of course, if you can back up db/current in the correct way, I’d recommend you continue to do so. By definition, recovery is not a fast process: it has to read through all the revision files in the repository to find out what the next unique node and copy ids are, and that can take quite a while.

Update: I had a chance to test the speed on the ASF’s repository — using my under-powered fileserver, recovery completed in just over an hour, for just over half a million revisions. I guess I over-estimated the amount of time it’d take.

¹ Incidentally, svnadmin recover is much less used nowadays: ever since support was added for BDB 4.4’s “auto-recovery” feature in Subversion 1.4, situations that require recovery are much rarer.