Tree-structured FSFS repositories

If you don’t want to read about geeky filesystem stuff, you can stop reading now :-).

In all current versions of Subversion, an FSFS filesystem has a directory db/revs/ that contains a single file for each revision¹, from db/revs/0 for revision zero, up to db/revs/N for the youngest revision in your repository. The same structure is used to store the (potentially mutable) revision properties in the directory db/revprops/.

This can lead to a large number of files stored in a single directory. Since some filesystems are based on linear directory lookups, people have asked whether this means that Subversion will slow down with larger repositories (in fact, this was one of the first things I wondered about when I started looking at Subversion’s implementation, nearly two years ago!). Up till now, our stock answer has been “if you have a very large repository, make sure you have a decent filesystem”.

In Subversion 1.5, the structure has changed slightly. Newly-created FSFS repositories will instead use a two-level tree with up to (by default) 1000 files per directory. This means that revisions 0–999 will be stored in a directory db/revs/0/, revisions 1000–1999 will be stored in db/revs/1/, and so on. These repositories are only compatible with other Subversion 1.5 clients, of course, so existing repositories continue to use the older scheme.

Why did we make this change? Surprisingly, it wasn’t for performance — well, not really. I ran some micro-benchmarks that showed that most filesystems are more sensitive to the depth of the directory tree (the number of path components) than the number of files in the directory. (In any case, we’re talking about differences of microseconds in the time taken to open a file by name, so it’s not worth worrying about).

Additionally, macro-benchmarks² using a clone of the ASF repository (about 500k revisions) showed that the new scheme might be slightly (<1%) slower than the old scheme for reads (I didn’t test writes, which might well be faster, but revision files are read many more times than they’re written, so I’ve focussed on read performance so far).

So, if performance doesn’t seem to be a concern, why change? There are two reasons, really. Firstly, when I said “most filesystems” above, I really meant “most UNIX filesystems”. While I wasn’t able to test NTFS, I did test VFAT. VFAT exhibits roughly O(N) behaviour, and slows down quite a bit above 2000 or so revisions.

The second reason is that some filesystems have limits: VFAT has a hard limit of 64k files in a directory, and some NAS servers ship with default directory size limits for performance reasons (and while you could get the NAS administrator to change the limits, it’s probably better for Subversion to work by default).

The final reason is maintainability: it’s a lot easier to deal with a million files if they’re in a thousand different directories than in just one (Windows Explorer? Not happy with large directories, for example).

So, how might you convert a repository to use this new structure? svnadmin dump | svnadmin load works, though if you have a particularly large repository you may not have the time (or space) available to make a complete copy of your repository. So we’re also developing an off-line conversion tool (fsfs-reshard.py) that does an in-place reorganisation between the old, linear structure and the new ‘sharded’ tree structure.

I converted my copy of the ASF repository to the new format in about 10 minutes. However, when I converted it back to a linear format, it took nearly 10 hours! I can only assume that ext2 has a really big problem with inserting new files into large directories.

Update: Someone asked me to clarify what I meant when I said that VFAT seemed to exhibit “roughly O(N) lookup behaviour” above (since O(N) is already a rough measure). What I saw was a response time that was O(log N) up to 1,024 files, and what I thought was O(N) from 2,048 onwards; on closer inspection, the latter is actually much closer to O(N²).

Incidentally, the reason that we use a single file for each revision is to make atomic commits easier, though we have considered adopting a different structure that’s more optimised for read performance. ↩
I wouldn’t say that these results are in any way comparable to something like Bonnie++; for one thing, I wasn’t concerned with the time taken to create a large number of revision files, since that’s not an interesting scenario for Subversion. ↩