mod_dav_svn improvements: SVNActivitiesDB
Here’s a more sensible post, as promised :-).
Submerged, the CollabNet Subversion blog, just posted a nice article describing the new merge-tracking features that will be included in Subversion 1.5, so I thought I’d post some more about one of the ‘new in 1.5’ features (and there’s a lot to choose from!).
Again, Subversion 1.5 isn’t out yet, and won’t be out for quite a while.
I’ve already mentioned some improvements we’ve made to FSFS (and
a little bit about svnadmin), but those are both
behind-the-scenes changes, so this time I thought I’d cover one of the new
Apache configuration directives for mod_dav_svn — one that allows
administrators to control the location of the activities database.
mod_dav_svn provides a bridge between the WebDAV world — which works in
terms of ‘activities’ and ‘version resources’ — and the Subversion world —
which works with ‘transactions’ and ‘nodes’. One of the many differences
between these worlds is the way an activity (or transaction) is named: in
WebDAV, the client names the activity, while in Subversion, the server names
the transaction.
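Subversion transaction naming is actually delegated to the filesystem: in FSFS, for example, transactions are named “base rev-unique”, that is, the number of the revision the transaction is based on, followed by a hyphen and a unique suffix.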
So one of the things mod_dav_svn needs to track is the relationship
between WebDAV activity names and the corresponding Subversion transaction.
Before 1.5, the repository contained a file called dav/activities which
stored this mapping, implemented using APR-util’s simple database format.
One problem with this approach was that the precise format would be whatever
one of many supported dbm formats was chosen as the default when APR-util
was compiled. This made it hard to reason about the safety of storing a
Subversion repository on NFS, for example.
For 1.5, we switched to a simpler scheme where we create one file for each
activity (storing it under the directory dav/activities.d/); each file
contains the name of the corresponding Subversion transaction. This is much
simpler, and shouldn’t cause any scaling problems — there are typically
only a small number of transactions open at any one time.
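To make the one-file-per-activity scheme concrete, here is a minimal Python sketch of such a store. It is not mod_dav_svn's code (which is C). Two details are my own assumptions: the MD5-of-the-activity-name file naming, chosen because activity names are picked by the client and so can't safely be used as file names directly, and the write-then-rename step. The function names are hypothetical.

```python
import hashlib
import os
from typing import Optional


def _activity_file(activities_dir: str, activity_id: str) -> str:
    # Hypothetical naming scheme: hash the client-chosen activity name so
    # that characters such as '/' in it cannot escape the directory.
    digest = hashlib.md5(activity_id.encode("utf-8")).hexdigest()
    return os.path.join(activities_dir, digest)


def store_activity(activities_dir: str, activity_id: str, txn_name: str) -> None:
    """Record that activity_id maps to the Subversion transaction txn_name."""
    os.makedirs(activities_dir, exist_ok=True)
    path = _activity_file(activities_dir, activity_id)
    tmp = path + ".tmp"
    # Write a temporary file and rename it into place, so a concurrent
    # reader never sees a half-written entry.
    with open(tmp, "w", encoding="utf-8") as f:
        f.write(txn_name + "\n")
    os.replace(tmp, path)


def lookup_activity(activities_dir: str, activity_id: str) -> Optional[str]:
    """Return the transaction name for activity_id, or None if unknown."""
    try:
        with open(_activity_file(activities_dir, activity_id), encoding="utf-8") as f:
            return f.readline().rstrip("\n")
    except FileNotFoundError:
        return None


def delete_activity(activities_dir: str, activity_id: str) -> None:
    """Forget an activity once its transaction is committed or aborted."""
    try:
        os.remove(_activity_file(activities_dir, activity_id))
    except FileNotFoundError:
        pass
```

Because each activity lives in its own small file, there's no shared database format to worry about, and a lookup is just a single file open.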
So, to finally get to the point, one of the new configuration directives is
SVNActivitiesDB. It allows the administrator to override the location
of this activities ‘database’.
This directive behaves differently depending on whether repositories are being
served with an SVNPath or SVNParentPath directive. If the relevant section
of the configuration file looks like this:
SVNPath /home/svn/myrepo
SVNActivitiesDB /tmp/activities
then /tmp/activities/ will be the directory used to store the files mapping
activities to transactions for the repository at /home/svn/myrepo/. On the
other hand, if the configuration file looks like the following (where the
repository name is inferred from the client’s URL):
SVNParentPath /home/svn
SVNActivitiesDB /tmp/activities
then the path supplied to SVNActivitiesDB will be used to store the
activities databases for all repositories available through the parent path,
and so the database for the myrepo repository will be at
/tmp/activities/myrepo/.
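The resolution rule just described can be summarised in a few lines. This Python sketch is only an illustration of the behaviour, not mod_dav_svn's implementation. The function and parameter names are hypothetical, and I've assumed that the repository name used under SVNParentPath is the last component of the repository path.

```python
import os
from typing import Optional


def activities_dir(repos_path: str,
                   activities_db: Optional[str] = None,
                   under_parent_path: bool = False) -> str:
    """Return the directory holding activity files for one repository.

    repos_path: filesystem path of the repository, e.g. /home/svn/myrepo.
    activities_db: value of SVNActivitiesDB, or None if it isn't set.
    under_parent_path: True if the repository is served via SVNParentPath.
    """
    if activities_db is None:
        # No override: use the 1.5 default inside the repository itself.
        return os.path.join(repos_path, "dav", "activities.d")
    if under_parent_path:
        # SVNParentPath: one subdirectory per repository, named after it.
        return os.path.join(activities_db, os.path.basename(repos_path.rstrip("/")))
    # SVNPath: the configured directory is used as-is.
    return activities_db
```

For example, `activities_dir("/home/svn/myrepo", "/tmp/activities", under_parent_path=True)` gives `/tmp/activities/myrepo`, matching the second configuration above.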
Why would you want to move the activity database away from the main
repository? The main reason is for cases where the repository is stored on
a network filesystem, where you ideally want to move all the transient,
non-shared file operations onto local storage. While the
activities database isn’t updated that much compared to the transaction
data itself, it’s a start (and we’d like to be able to move the transaction
data to local storage as well; we just haven’t done it yet :-)).
Of course, if you have more than one Apache server active at the same time, you’ll also need to ensure some form of session or host affinity so that the client returns to the same server that originally established the activity ↦ transaction mapping.
There are more new features in mod_dav_svn than just this, but they’ll have
to wait until later.
Posted at 15:29:35 BST on 14th May 2007.
Tree-structured FSFS repositories
If you don’t want to read about geeky filesystem stuff, you can stop reading now :-).
In all current versions of Subversion, an FSFS filesystem has a directory
db/revs/ that contains a single file for each revision, from db/revs/0
for revision zero, up to db/revs/N for the youngest revision in your
repository. The same structure is used to store the (potentially mutable)
revision properties in the directory db/revprops/.
Incidentally, the reason that we use a single file for each revision is to make atomic commits easier, though we have considered adopting a different structure that’s more optimised for read performance.
This can lead to a large number of files stored in a single directory. Since some filesystems are based on linear directory lookups, people have asked whether this means that Subversion will slow down with larger repositories (in fact, this was one of the first things I wondered about when I started looking at Subversion’s implementation, nearly two years ago!). Up till now, our stock answer has been “if you have a very large repository, make sure you have a decent filesystem”.
In Subversion 1.5, the structure has changed slightly. Newly-created FSFS
repositories will instead use a two-level tree with up to (by default) 1000
files per directory. This means that revisions 0–999 will be stored in a
directory db/revs/0/, revisions 1000–1999 will be stored in db/revs/1/,
and so on. These repositories can only be accessed by Subversion 1.5 or
later, of course, so existing repositories continue to use the older
scheme.
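The mapping from revision number to file is simple integer division. Here is a minimal Python sketch of it, assuming the default shard size of 1000. The function name is hypothetical, and passing `None` stands in for the old linear layout.

```python
import os
from typing import Optional


def rev_path(db_dir: str, rev: int, shard_size: Optional[int] = 1000) -> str:
    """Return the path of revision `rev` under db_dir/revs.

    shard_size=None models the old linear layout (db/revs/N); otherwise
    revisions are grouped into db/revs/<rev // shard_size>/N.
    """
    if rev < 0:
        raise ValueError("revision numbers are non-negative")
    if shard_size is None:
        return os.path.join(db_dir, "revs", str(rev))
    return os.path.join(db_dir, "revs", str(rev // shard_size), str(rev))
```

So revision 1234 lives at `db/revs/1/1234` in a sharded repository, and at `db/revs/1234` in a linear one. The same rule applies to `db/revprops/`.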
Why did we make this change? Surprisingly, it wasn’t for performance — well, not really. I ran some micro-benchmarks that showed that most filesystems are more sensitive to the depth of the directory tree (the number of path components) than to the number of files in the directory. (In any case, we’re talking about differences of microseconds in the time taken to open a file by name, so it’s not worth worrying about.)
I wouldn’t say that these results are in any way comparable to something like Bonnie++; for one thing, I wasn’t concerned with the time taken to create a large number of revision files, since that’s not an interesting scenario for Subversion.
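For anyone who wants to try something similar, here is a rough Python sketch of this kind of micro-benchmark. It is not the benchmark I actually ran. It times opening a randomly chosen file by name in a directory of a given size. Because the files have just been created, it mostly measures warm-cache lookups, and the results will depend heavily on the OS and filesystem.

```python
import os
import random
import tempfile
import time


def mean_open_time(num_files: int, lookups: int = 1000) -> float:
    """Create num_files empty files in a single directory, then return the
    mean wall-clock time (seconds) to open and close one chosen at random."""
    if num_files < 1:
        raise ValueError("need at least one file")
    with tempfile.TemporaryDirectory() as d:
        for i in range(num_files):
            open(os.path.join(d, str(i)), "w").close()
        # Pick the names up front so that only the lookups are timed.
        names = [os.path.join(d, str(random.randrange(num_files)))
                 for _ in range(lookups)]
        start = time.perf_counter()
        for name in names:
            fd = os.open(name, os.O_RDONLY)
            os.close(fd)
        return (time.perf_counter() - start) / lookups
```

Running it over a doubling series such as `for n in (1024, 2048, 4096, 8192): print(n, mean_open_time(n))` makes it easy to see whether lookup time grows with directory size. Point `tempfile` at the filesystem you actually care about (for example, via its `dir` argument).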
Additionally, macro-benchmarks using a clone of the ASF repository (about 500k revisions) showed that the new scheme might be slightly (<1%) slower than the old scheme for reads (I didn’t test writes, which might well be faster, but revision files are read many more times than they’re written, so I’ve focussed on read performance so far).
So, if performance doesn’t seem to be a concern, why change? There are three reasons, really. Firstly, when I said “most filesystems” above, I really meant “most UNIX filesystems”. While I wasn’t able to test NTFS, I did test VFAT. VFAT seems to exhibit roughly O(N) lookup behaviour, and slows down quite a bit above 2000 or so revisions.
The second reason is that some filesystems have limits: VFAT has a hard limit of 64k files in a directory, and some NAS servers ship with default directory size limits for performance reasons (and while you could get the NAS administrator to change the limits, it’s probably better for Subversion to work by default).
The final reason is maintainability: it’s a lot easier to deal with a million files if they’re in a thousand different directories than in just one (Windows Explorer? Not happy with large directories, for example).
So, how might you convert a repository to use this new structure? svnadmin
dump | svnadmin load works, though if you have a particularly large
repository you may not have the time (or space) available to make a complete
copy of your repository. So we’re also developing an off-line conversion
tool (fsfs-reshard.py) that does an in-place reorganisation between the
old, linear structure and the new ‘sharded’ tree structure.
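The core of a linear-to-sharded conversion is moving files, with one wrinkle: the shard directory `0` has the same name as the revision file `0`, so the files can't be moved straight into their final directories. The Python sketch below is not fsfs-reshard.py. It is a minimal illustration that assumes an offline repository and stages files in temporarily named directories (the `.shard` suffix is my own invention). It also doesn't update `db/format`, which a real conversion tool must do as well.

```python
import os


def reshard_linear_to_sharded(db_dir: str, shard_size: int = 1000) -> None:
    """Move db/revs/N and db/revprops/N to db/<sub>/<N // shard_size>/N.

    Assumes a linear layout and an offline repository; not crash-safe,
    and does NOT update db/format.
    """
    for subdir in ("revs", "revprops"):
        base = os.path.join(db_dir, subdir)
        revs = sorted(int(n) for n in os.listdir(base)
                      if n.isdigit() and os.path.isfile(os.path.join(base, n)))
        # Pass 1: move each file into a temporarily named shard directory,
        # because shard directory "0" would clash with revision file "0".
        for rev in revs:
            staging = os.path.join(base, f"{rev // shard_size}.shard")
            os.makedirs(staging, exist_ok=True)
            os.rename(os.path.join(base, str(rev)), os.path.join(staging, str(rev)))
        # Pass 2: no linear files remain, so the final names are now free.
        for shard in {rev // shard_size for rev in revs}:
            os.rename(os.path.join(base, f"{shard}.shard"),
                      os.path.join(base, str(shard)))
```

Every file is renamed within the same filesystem, so no revision data is copied. That is what makes an in-place conversion feasible for repositories too large for a full dump and load.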
I converted my copy of the ASF repository to the new format in about 10 minutes. However, when I converted it back to a linear format, it took nearly 10 hours! I can only assume that ext2 has a really big problem with inserting new files into large directories.
Update: Someone asked me to clarify what I meant when I said that VFAT seemed to exhibit “roughly O(N) lookup behaviour” above (since O(N) is already a rough measure). What I saw was a response time that was O(log N) up to 1,024 files, and what I thought was O(N) from 2,048 onwards; on closer inspection, the latter is actually much closer to O(N²).
Posted at 12:01:50 BST on 30th April 2007.
fitz and sussman’s “Poisonous People” talk
I see that fitz and sussman had their five minutes of fame up on Slashdot yesterday for their talk, “How Open Source Projects Survive Poisonous People (And You Can Too)”.
If you haven’t seen it before, go take a look. It’s well worth the time. And I’ll share a secret with you: it’s not just useful for Open Source communities.
Posted at 09:14:34 GMT on 13th March 2007.
Backing up FSFS repositories, Subversion 1.5 style
Subversion 1.5 isn’t out yet — and it won’t be out for a while — but there’s a neat little new feature that has the potential to make repository administration for FSFS repositories a little bit easier, especially when it comes to backups.
The new feature is that… svnadmin recover now does something for
FSFS repositories.
Incidentally, svnadmin recover is much less used nowadays:
ever since support was added for BDB 4.4’s “auto-recovery” feature in Subversion 1.4,
situations that require recovery are much rarer.
If you haven’t come across it, svnadmin recover is one of those little
oddities that were invented for Berkeley DB repositories. It literally is
nothing more than a wrapper around BDB’s db_recover functionality,
which performs ‘normal’ recovery of the database after an unclean
shutdown: effectively just a journal replay. In the past
it was necessary when BDB databases got ‘wedged’, and it’s still needed if
you want to change some of the options in the db/DB_CONFIG file.
Subversion 1.1 introduced FSFS, the filesystem-based Subversion filesystem, and Subversion 1.2 made it the default option, so any repositories created with Subversion 1.2 or later will be in FSFS format. FSFS has some interesting properties, one of which is that the locking model is very simple: the filesystem is always in a consistent state for readers, so writers only block other writers. This property means that “recovery” in the BDB sense isn’t necessary for FSFS.
But recovery also has another use: it can be used to fix up some types
of missing data. In fact, the svnadmin hotcopy mechanism uses Berkeley
DB’s catastrophic recovery to create a set of empty log files after
copying the database to a new location. And it’s in that context that
we’ve implemented recovery for FSFS.
So, another interesting property of FSFS is that all revision files are immutable. This makes operations people very happy, because they can just back up the whole repository using a simple incremental backup strategy… almost.
The sticking point is a little file called db/current, which stores
a very small amount of information about the filesystem — the largest
revision number and the next unique node and copy ids.
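As a rough illustration, here is a Python sketch that reads that file. It assumes the single-line, three-field layout described above (youngest revision, next node id, next copy id, separated by whitespace) and rejects anything else. The ids are kept as opaque strings, and the type and function names are hypothetical.

```python
from typing import NamedTuple


class Current(NamedTuple):
    youngest_rev: int
    next_node_id: str
    next_copy_id: str


def parse_current(text: str) -> Current:
    """Parse db/current, assuming a single 'YOUNGEST NODE-ID COPY-ID' line."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError(f"unexpected db/current contents: {text!r}")
    return Current(int(fields[0]), fields[1], fields[2])
```

The youngest revision number is the field that matters for the backup discussion below. It tells readers which revision files they are allowed to trust.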
Due to the way that FSFS makes sure that readers always see a consistent
state, the current file is the last thing to be updated. This means
that the backup may not be consistent if you follow a naïve strategy
of just backing up the files in any old order — by the time current
is backed up, it might be pointing to a revision that doesn’t exist in
your backup copy, or worse, one that has only been partly written.
The easy solution to the ordering problem is to make sure to back up the
current file first — that way, you’ll always get a consistent copy
of the filesystem up to whichever revision was current at that point.
However, this can be quite awkward in some cases, especially if you have
a lot of repositories or an inflexible backup solution.
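Here is a minimal Python sketch of the current-first strategy, for a simple file-copy backup. It snapshots `db/current` before copying anything, copies the whole tree, and then puts the snapshot back over the copied `db/current`. Any revisions committed during the copy may still be present as extra files, but the restored `db/current` doesn't reference them. The function name is hypothetical, and the sketch assumes that `dest` doesn't exist yet.

```python
import os
import shutil


def backup_fsfs(repos: str, dest: str) -> int:
    """Copy an FSFS repository to dest, snapshotting db/current first.

    Returns the youngest revision recorded in the snapshot.
    """
    # 1. Snapshot db/current before anything else is copied.
    with open(os.path.join(repos, "db", "current"), "rb") as f:
        current = f.read()
    # 2. Copy everything; the order no longer matters.
    shutil.copytree(repos, dest)
    # 3. Overwrite whatever db/current was copied with the earlier snapshot.
    with open(os.path.join(dest, "db", "current"), "wb") as f:
        f.write(current)
    return int(current.split()[0])
```

This only works when you control the copy, which is exactly what the next two scenarios don't allow.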
There’s another problem if you’re trying to implement a disaster recovery solution using a warm standby. Typically, you’ll do this by copying the revision files as they appear on the disk, either by using a post-commit hook, or direct support from the storage device.
If you have a very high commit throughput, or very large files (or both), you might find that some revisions take a lot longer to copy than others, so you may decide that it’d be worthwhile to run the copies in parallel. Again, this might be something that your storage device supports natively.
But now you’ve got a really big problem: there’s no synchronisation
point at which it’s safe to copy current. The best you can do is
copy it just before you kick off the copy for each revision, but now you
have all your parallel copying jobs serialising against the same file,
which is hardly ideal.
Anyway, I think you can see where this is going. As of Subversion
1.5.0, svnadmin recover will recreate the db/current file in an FSFS
repository from the existing revision files, so if you fit into one of
the scenarios above, you won’t have to worry about your backups quite
so much — just run recovery after you restore.
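To give a feel for part of what recovery has to work out, here is a Python sketch that finds the youngest revision from the revision files alone. It returns the highest N for which every revision file from 0 to N exists, in either the linear or the sharded layout. This is not Subversion's recovery code. Stopping at the first gap is my own design choice, intended to fit restores from parallel copies where some copies didn't finish. The sketch doesn't detect partly written files, and it doesn't attempt the harder part, which is reading the revision files to find the next node and copy ids.

```python
import os


def youngest_from_rev_files(db_dir: str) -> int:
    """Return the highest N such that revision files 0..N all exist.

    Walks db/revs, so both revs/N and revs/<shard>/N layouts are handled.
    """
    present = set()
    for _dirpath, _dirnames, filenames in os.walk(os.path.join(db_dir, "revs")):
        present.update(int(n) for n in filenames if n.isdigit())
    if 0 not in present:
        raise ValueError("revision 0 is missing")
    youngest = 0
    # Stop at the first gap: later revisions can't be trusted without it.
    while youngest + 1 in present:
        youngest += 1
    return youngest
```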
Of course, if you can back up db/current in the correct way, I’d
recommend you continue to do so. By definition, recovery is not a
fast process: it has to read through all the revision files in the
repository to find out what the next unique node and copy ids are,
and that can take quite a while.
Update: I had a chance to test the speed on the ASF’s repository — using my under-powered fileserver, recovery completed in just over an hour, for just over half a million revisions. I guess I over-estimated the amount of time it’d take.
Posted at 10:47:05 GMT on 7th March 2007.