farblog.subversion

mod_dav_svn improvements: SVNActivitiesDB

Here’s a more sensible post, as promised :-).

Submerged, the CollabNet Subversion blog, just posted a nice article describing the new merge-tracking features that will be included in Subversion 1.5, so I thought I’d post some more about one of the ‘new in 1.5’ features (and there’s a lot to choose from!).

Again, Subversion 1.5 isn’t out yet, and won’t be out for quite a while.

I’ve already mentioned some improvements we’ve made to FSFS (and a little bit about svnadmin), but those are both behind-the-scenes changes, so this time I thought I’d cover one of the new Apache configuration directives for mod_dav_svn — one that allows administrators to control the location of the activities database.

mod_dav_svn provides a bridge between the WebDAV world — which works in terms of ‘activities’ and ‘version resources’ — and the Subversion world — which works with ‘transactions’ and ‘nodes’. One of the many differences between these worlds is the way an activity (or transaction) is named: in WebDAV, the client names the activity, while in Subversion, the server names the transaction.

Subversion transaction naming is actually delegated to the filesystem: in FSFS, for example, transactions are named “base rev-unique”.

So one of the things mod_dav_svn needs to track is the relationship between WebDAV activity names and the corresponding Subversion transaction. Before 1.5, the repository contained a file called dav/activities which stored this mapping, implemented using APR-util’s simple database format. One problem with this approach was that the precise on-disk format was whichever of the many supported dbm formats had been chosen as the default when APR-util was compiled. This made it hard to reason about the safety of storing a Subversion repository on NFS, for example.

For 1.5, we switched to a simpler scheme where we create one file for each activity (storing it under the directory dav/activities.d/); each file contains the name of the corresponding Subversion transaction. This is much simpler, and shouldn’t cause any scaling problems — there are typically only a small number of transactions open at any one time.
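
To make the one-file-per-activity idea concrete, here is a minimal Python sketch. It is not mod_dav_svn’s code (which is C). The function names are made up, and naming each file after a hash of the activity name is an assumption for this example. The only detail taken from the description above is that each file holds the name of the transaction.

```python
import hashlib
import os
import tempfile


def _activity_file(activities_dir, activity_id):
    # Assumption: hash the client-chosen activity name so that it is
    # always usable as a file name, whatever characters it contains.
    digest = hashlib.sha1(activity_id.encode("utf-8")).hexdigest()
    return os.path.join(activities_dir, digest)


def store_activity(activities_dir, activity_id, txn_name):
    """Record an activity -> transaction mapping as one small file."""
    os.makedirs(activities_dir, exist_ok=True)
    # Write to a temporary file and rename it into place, so that a
    # reader never sees a half-written mapping.
    fd, tmp_path = tempfile.mkstemp(dir=activities_dir)
    with os.fdopen(fd, "w") as f:
        f.write(txn_name + "\n")
    os.replace(tmp_path, _activity_file(activities_dir, activity_id))


def lookup_activity(activities_dir, activity_id):
    """Return the transaction name for an activity, or None if unknown."""
    try:
        with open(_activity_file(activities_dir, activity_id)) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None


def delete_activity(activities_dir, activity_id):
    """Forget an activity once its transaction is committed or aborted."""
    os.remove(_activity_file(activities_dir, activity_id))
```

Because every mapping is an ordinary file written with a rename, there is no dbm format to worry about, which is the point of the change.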

So, to finally get to the point, one of the new configuration directives is SVNActivitiesDB. It allows the administrator to override the location of this activities ‘database’.

This directive behaves differently depending on whether repositories are being served with an SVNPath or SVNParentPath directive. If the relevant section of the configuration file looks like this:

SVNPath /home/svn/myrepo
SVNActivitiesDB /tmp/activities

then /tmp/activities/ will be the directory used to store the files mapping activities to transactions for the repository at /home/svn/myrepo/. On the other hand, if the configuration file looks like the following (where the repository name is inferred from the client’s URL):

SVNParentPath /home/svn
SVNActivitiesDB /tmp/activities

then the path supplied to SVNActivitiesDB will be used to store the activities databases for all repositories available through the parent path, and so the database for the myrepo repository will be at /tmp/activities/myrepo/.
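
The rule for the two cases is small enough to write down as a function. This is only a Python sketch of the behaviour described above, with a hypothetical function name, not the module’s own logic:

```python
import posixpath


def activities_dir(activities_db, repo_name=None):
    """Work out where the activity files for a repository live.

    activities_db is the path given to SVNActivitiesDB. repo_name is
    None when the repository is served with SVNPath; with SVNParentPath
    it is the repository name taken from the client's URL.
    """
    if repo_name is None:
        # SVNPath: the directory is used as-is for the single repository.
        return activities_db
    # SVNParentPath: each repository gets its own subdirectory.
    return posixpath.join(activities_db, repo_name)
```

So `activities_dir("/tmp/activities")` gives `/tmp/activities`, and `activities_dir("/tmp/activities", "myrepo")` gives `/tmp/activities/myrepo`.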

Why would you want to move the activity database away from the main repository? The main use is when the repository is stored on a network filesystem, where you’d ideally like to move all the transient and non-shared file operations onto local storage. While the activities database isn’t updated that much compared to the transaction data itself, it’s a start (and we’d like to be able to move the transaction data to local storage as well; we just haven’t done it yet :-)).

Of course, if you have more than one Apache server active at the same time, you’ll also need to ensure some form of session or host affinity so that the client returns to the same server that originally established the activity ↦ transaction mapping.

There are more new features in mod_dav_svn than just this, but they’ll have to wait until later.

Posted at 15:29:35 BST on 14th May 2007.

Tree-structured FSFS repositories

If you don’t want to read about geeky filesystem stuff, you can stop reading now :-).

In all current versions of Subversion, an FSFS filesystem has a directory db/revs/ that contains a single file for each revision, from db/revs/0 for revision zero, up to db/revs/N for the youngest revision in your repository. The same structure is used to store the (potentially mutable) revision properties in the directory db/revprops/.

Incidentally, the reason that we use a single file for each revision is to make atomic commits easier, though we have considered adopting a different structure that’s more optimised for read performance.

This can lead to a large number of files stored in a single directory. Since some filesystems are based on linear directory lookups, people have asked whether this means that Subversion will slow down with larger repositories (in fact, this was one of the first things I wondered about when I started looking at Subversion’s implementation, nearly two years ago!). Up till now, our stock answer has been “if you have a very large repository, make sure you have a decent filesystem”.

In Subversion 1.5, the structure has changed slightly. Newly-created FSFS repositories will instead use a two-level tree with up to (by default) 1000 files per directory. This means that revisions 0–999 will be stored in a directory db/revs/0/, revisions 1000–1999 will be stored in db/revs/1/, and so on. These repositories are only compatible with other Subversion 1.5 clients, of course, so existing repositories continue to use the older scheme.
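
In other words, the shard is just the revision number divided by the shard size. A small Python sketch of the mapping follows. The function name is hypothetical, and it assumes that a revision file keeps its full revision number as its name inside the shard directory:

```python
def rev_file_path(rev, shard_size=1000, sharded=True):
    """Return the path of a revision file relative to the repository root."""
    if sharded:
        # Integer division picks the shard: 0-999 -> 0, 1000-1999 -> 1, ...
        return "db/revs/%d/%d" % (rev // shard_size, rev)
    # Older, linear layout: every revision sits directly in db/revs/.
    return "db/revs/%d" % rev
```

For example, `rev_file_path(999)` is `db/revs/0/999` and `rev_file_path(1000)` is `db/revs/1/1000`, while `rev_file_path(1000, sharded=False)` is the old `db/revs/1000`.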

Why did we make this change? Surprisingly, it wasn’t for performance — well, not really. I ran some micro-benchmarks that showed that most filesystems are more sensitive to the depth of the directory tree (the number of path components) than the number of files in the directory. (In any case, we’re talking about differences of microseconds in the time taken to open a file by name, so it’s not worth worrying about).

I wouldn’t say that these results are in any way comparable to something like Bonnie++; for one thing, I wasn’t concerned with the time taken to create a large number of revision files, since that’s not an interesting scenario for Subversion.
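
If you want to try this sort of thing on your own filesystem, a micro-benchmark along these lines is easy to put together. This Python sketch is not the code I used. The helper names are invented, and it only times opening existing files by name, which is the case described above.

```python
import os
import time


def make_flat(root, n):
    """Create n empty files directly in root; return their paths."""
    paths = []
    for i in range(n):
        path = os.path.join(root, str(i))
        open(path, "wb").close()
        paths.append(path)
    return paths


def make_sharded(root, n, shard_size=1000):
    """Create n empty files in root/<i // shard_size>/<i>; return their paths."""
    paths = []
    for i in range(n):
        shard = os.path.join(root, str(i // shard_size))
        os.makedirs(shard, exist_ok=True)
        path = os.path.join(shard, str(i))
        open(path, "wb").close()
        paths.append(path)
    return paths


def time_opens(paths, repeats=3):
    """Return the best average time (in seconds) to open one file by name."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for path in paths:
            with open(path, "rb"):
                pass
        best = min(best, (time.perf_counter() - start) / len(paths))
    return best
```

Call `time_opens(make_flat(some_dir, n))` and `time_opens(make_sharded(other_dir, n))` with two empty scratch directories on the filesystem under test and compare the numbers. Bear in mind that the operating system’s directory cache will flatter both results, so treat them as relative at best.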

Additionally, macro-benchmarks using a clone of the ASF repository (about 500k revisions) showed that the new scheme might be slightly (<1%) slower than the old scheme for reads (I didn’t test writes, which might well be faster, but revision files are read many more times than they’re written, so I’ve focussed on read performance so far).

So, if performance doesn’t seem to be a concern, why change? There are three reasons, really. Firstly, when I said “most filesystems” above, I really meant “most UNIX filesystems”. While I wasn’t able to test NTFS, I did test VFAT. VFAT exhibits roughly O(N) behaviour, and slows down quite a bit above 2000 or so revisions.

The second reason is that some filesystems have limits: VFAT has a hard limit of 64k files in a directory, and some NAS servers ship with default directory size limits for performance reasons (and while you could get the NAS administrator to change the limits, it’s probably better for Subversion to work by default).

The final reason is maintainability: it’s a lot easier to deal with a million files if they’re in a thousand different directories than in just one (Windows Explorer, for example, is not happy with large directories).


So, how might you convert a repository to use this new structure? svnadmin dump | svnadmin load works, though if you have a particularly large repository you may not have the time (or space) available to make a complete copy of your repository. So we’re also developing an off-line conversion tool (fsfs-reshard.py) that does an in-place reorganisation between the old, linear structure and the new ‘sharded’ tree structure.
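
To give a flavour of what an in-place conversion involves, here is a rough Python sketch of the linear-to-sharded direction for a single directory. It is not fsfs-reshard.py, and the function name is made up. It ignores db/revprops/ and anything else the real tool has to update, so only try it on a scratch copy.

```python
import os


def shard_linear_revs(revs_dir, shard_size=1000):
    """Move revs_dir/<rev> files into revs_dir/<rev // shard_size>/<rev>."""
    revs = sorted(
        int(name)
        for name in os.listdir(revs_dir)
        if name.isdigit() and os.path.isfile(os.path.join(revs_dir, name))
    )

    # Stage 1: move each file into a "<shard>.shard" staging directory.
    # We cannot create directory "0" yet, because file "0" (revision
    # zero) still occupies that name.
    for rev in revs:
        staging = os.path.join(revs_dir, "%d.shard" % (rev // shard_size))
        os.makedirs(staging, exist_ok=True)
        os.rename(os.path.join(revs_dir, str(rev)),
                  os.path.join(staging, str(rev)))

    # Stage 2: every revision file has moved, so the plain numeric names
    # are free and the staging directories can take them.
    for name in os.listdir(revs_dir):
        if name.endswith(".shard"):
            os.rename(os.path.join(revs_dir, name),
                      os.path.join(revs_dir, name[:-len(".shard")]))
```

The two stages are there only to dodge the name clash between revision file 0 and shard directory 0. Nothing is copied, so the extra space needed is negligible.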

I converted my copy of the ASF repository to the new format in about 10 minutes. However, when I converted it back to a linear format, it took nearly 10 hours! I can only assume that ext2 has a really big problem with inserting new files into large directories.


Update: Someone asked me to clarify what I meant when I said that VFAT seemed to exhibit “roughly O(N) lookup behaviour” above (since O(N) is already a rough measure). What I saw was a response time that was O(log N) up to 1,024 files, and what I thought was O(N) from 2,048 onwards; on closer inspection, the latter is actually much closer to O(N²).

Posted at 12:01:50 BST on 30th April 2007.

fitz and sussman’s “Poisonous People” talk

I see that fitz and sussman had their five minutes of fame up on Slashdot yesterday for their talk, “How Open Source Projects Survive Poisonous People (And You Can Too)”.

If you haven’t seen it before, go take a look. It’s well worth the time. And I’ll share a secret with you: it’s not just useful for Open Source communities.

Posted at 09:14:34 GMT on 13th March 2007.

Backing up FSFS repositories, Subversion 1.5 style

Subversion 1.5 isn’t out yet — and it won’t be out for a while — but there’s a neat little new feature that has the potential to make repository administration for FSFS repositories a little bit easier, especially when it comes to backups.

The new feature is that… svnadmin recover now does something for FSFS repositories.

Incidentally, svnadmin recover is much less used nowadays: ever since support was added for BDB 4.4’s “auto-recovery” feature in Subversion 1.4, situations that require recovery are much rarer.

If you haven’t come across it, svnadmin recover is one of those little oddities that was invented for Berkeley DB repositories. It literally is nothing more than a wrapper around BDB’s db_recover functionality, which performs ‘normal’ recovery of the database after an unclean shutdown: effectively nothing more than a journal replay. In the past it was necessary when BDB databases got ‘wedged’, and it’s still needed if you want to change some of the options in the db/DB_CONFIG file.

Subversion 1.1 introduced FSFS, the filesystem-based Subversion filesystem, and Subversion 1.2 made it the default option, so any repositories created with Subversion 1.2 or later will be in FSFS format. FSFS has some interesting properties, one of which is that the locking model is very simple: the filesystem is always in a consistent state for readers, so writers only block other writers. This property means that “recovery” in the BDB sense isn’t necessary for FSFS.

But recovery also has another use: it can be used to fix up some types of missing data. In fact, the svnadmin hotcopy mechanism uses Berkeley DB’s catastrophic recovery to create a set of empty log files after copying the database to a new location. And it’s in that context that we’ve implemented recovery for FSFS.

So, another interesting property of FSFS is that all revision files are immutable. This makes operations people very happy, because they can just back up the whole repository using a simple incremental backup strategy… almost.

The sticking point is a little file called db/current, which stores a very small amount of information about the filesystem — the largest revision number and the next unique node and copy ids.

Due to the way that FSFS makes sure that readers always see a consistent state, the current file is the last thing to be updated. This means that the backup may not be consistent if you follow a naïve strategy of just backing up the files in any old order — by the time current is backed up, it might be pointing to a revision that doesn’t exist in your backup copy, or worse, one that has only been partly-written.

The easy solution to the ordering problem is to make sure to back up the current file first — that way, you’ll always get a consistent copy of the filesystem up to whichever revision was current at that point. However, this can be quite awkward in some cases, especially if you have a lot of repositories or an inflexible backup solution.
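
As a sketch of that ordering in Python: the function name is hypothetical, the repository is assumed to use the linear layout, and the first whitespace-separated field of current is assumed to be the youngest revision number. A real backup would need the rest of the repository as well; this only shows the order of operations.

```python
import os
import shutil


def backup_fsfs(repo_db, backup_db):
    """Copy db/current first, then only the revisions it refers to."""
    for sub in ("revs", "revprops"):
        os.makedirs(os.path.join(backup_db, sub), exist_ok=True)

    # Step 1: snapshot 'current' before touching anything else.
    saved_current = os.path.join(backup_db, "current")
    shutil.copy2(os.path.join(repo_db, "current"), saved_current)

    # Assumption: the first field of 'current' is the youngest revision.
    with open(saved_current) as f:
        youngest = int(f.read().split()[0])

    # Step 2: copy revisions 0..youngest. Anything newer was committed
    # (or is still being written) after the snapshot, so it is skipped.
    for rev in range(youngest + 1):
        for sub in ("revs", "revprops"):
            shutil.copy2(os.path.join(repo_db, sub, str(rev)),
                         os.path.join(backup_db, sub, str(rev)))
    return youngest
```

Since revision files are immutable, everything up to the saved revision is safe to copy at leisure once current has been captured.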

There’s another problem if you’re trying to implement a disaster recovery solution using a warm standby. Typically, you’ll do this by copying the revision files as they appear on the disk, either by using a post-commit trigger, or direct support from the storage device.

If you have a very high commit throughput, or very large files (or both), you might find that some revisions take a lot longer to copy than others, so you may decide that it’d be worthwhile to run the copies in parallel. Again, this might be something that your storage device supports natively.

But now you’ve got a really big problem: there’s no synchronisation point at which it’s safe to copy current. The best you can do is copy it just before you kick off the copy for each revision, but now you have all your parallel copying jobs serialising against the same file, which is hardly ideal.

Anyway, I think you can see where this is going. As of Subversion 1.5.0, svnadmin recover will recreate the db/current file in an FSFS repository from the existing revision files, so if you fit into one of the scenarios above, you won’t have to worry about your backups quite so much — just run recovery after you restore.

Of course, if you can back up db/current in the correct way, I’d recommend you continue to do so. By definition, recovery is not a fast process: it has to read through all the revision files in the repository to find out what the next unique node and copy ids are, and that can take quite a while.
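
The cheap half of that job, working out the youngest revision, can be sketched from file names alone. The Python below is only an illustration with a made-up function name, assuming the linear layout. It deliberately stops at the first gap, which is my assumption about what you would want from a partially copied backup. It does not attempt the expensive half (finding the next node and copy ids), which needs the contents of the revision files.

```python
import os


def youngest_from_rev_files(revs_dir):
    """Return the highest N such that revision files 0..N all exist."""
    revs = {int(name) for name in os.listdir(revs_dir) if name.isdigit()}
    youngest = -1
    # Walk upwards from revision 0 and stop at the first missing file.
    while youngest + 1 in revs:
        youngest += 1
    if youngest < 0:
        raise ValueError("no revision 0 found in %s" % revs_dir)
    return youngest
```

With files 0, 1, 2 and 4 present, this reports 2, since revision 3 never made it into the copy.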

Update: I had a chance to test the speed on the ASF’s repository — using my under-powered fileserver, recovery completed in just over an hour, for just over half a million revisions. I guess I over-estimated the amount of time it’d take.

Posted at 10:47:05 GMT on 7th March 2007.