by Malcolm Rowe

Deadlocks in Java class initialisation

I recently ran across the fact that it’s possible to make the Java runtime deadlock while initialising a class — and that this behaviour is even mandated by the Java Language Specification.

Here’s a Java 7 program that demonstrates the problem:

public class Program {
  public static void main(String args[]) {
    new Thread(new Runnable() {
      @Override public void run() {
        A.initMe();
      }
    }).start();

    B.initMe();
  }

  private static class A {
    private static final B b = new B();
    static void initMe() {}
  }

  private static class B {
    private static final A a = new A();
    static void initMe() {}
  }
}
In addition to demonstrating that lambdas are a good idea (all that boilerplate to start a thread!), this also shows how cycles during class initialisation can lead to a deadlock. Here’s what happens when you run it1:

$ javac Program.java
$ java Program

That is, it hangs.

In Java, classes are loaded at some arbitrary point before use, but are only initialised — running the static {} blocks and static field initialisers — at defined points2.

One of these points is just before a static method is invoked, and so the two calls to A.initMe() and B.initMe() above will both trigger initialisation for the respective classes.

In this case, each class contains a static field that creates an instance of the other class. Instantiating the other class requires that class to be initialised, and so each class’s initialisation ends up blocked waiting for the initialisation of the other to complete.

If you trigger a thread dump at this point — by sending a SIGQUIT or hitting Ctrl-\ (or Ctrl-Break on Windows) — then you’ll see something like this:

Full thread dump OpenJDK 64-Bit Server VM (24.79-b02 mixed mode):

"Thread-0" prio=10 tid=0x00007efd50105000 nid=0x51db in Object.wait() [0x00007efd3f168000]
   java.lang.Thread.State: RUNNABLE
        at Program$A.<clinit>(Program.java:13)
        at Program$1.run(Program.java:5)
        at java.lang.Thread.run(Thread.java:745)

"main" prio=10 tid=0x00007efd5000a000 nid=0x51ca in Object.wait() [0x00007efd59d45000]
   java.lang.Thread.State: RUNNABLE
        at Program$B.<clinit>(Program.java:18)
        at Program.main(Program.java:9)


Interestingly, you can see that while both threads are executing an implicit Object.wait(), they’re listed as RUNNABLE rather than WAITING, and there’s no output from the deadlock detector. I suspect that the reason for both of these is that the details of class initialisation changed in Java 7:

In Java 6, the runtime would attempt to lock the monitor owned by each Class instance for the duration of the initialisation, while in Java 7, attempting to initialise a class that’s already being initialised by another thread just requires that the caller be blocked in some undefined fashion until that initialisation completes.

There are other ways to trigger the same problem, too. Here’s another problematic snippet:

public class Foo {
  public static final Foo EMPTY = new EmptyFoo();
}

public class EmptyFoo extends Foo {}

Here we have Foo, and EmptyFoo, a special — presumably empty, in some fashion — version of Foo. EmptyFoo is usable directly, but it’s also available as Foo.EMPTY.

The problem here is that initialising EmptyFoo requires us to initialise the superclass, and initialising Foo requires initialisation of EmptyFoo for the static field. This would be fine in one thread, but if two threads attempt to initialise the two classes separately, deadlock results.
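To make the race concrete, here’s a hypothetical driver in the style of the first example (the Main class and the two-thread structure are my invention; all the deadlock needs is one thread starting to initialise Foo while another starts on EmptyFoo):

public class Main {
  public static void main(String[] args) {
    // Thread 1: reading Foo.EMPTY triggers Foo's initialisation, which
    // instantiates an EmptyFoo and so waits for EmptyFoo's initialisation.
    new Thread(new Runnable() {
      @Override public void run() {
        Foo foo = Foo.EMPTY;
      }
    }).start();

    // Thread 2 (main): instantiating EmptyFoo triggers its initialisation,
    // which first waits for its superclass Foo to be initialised.
    new EmptyFoo();
  }
}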

Cyclic dependencies between classes have always been problematic in both Java and C#, as references to non-constant static fields in classes that are already being initialised see uninitialised (Java) or default (C#) values. However, normally the initialisation does complete; here, it doesn’t, and here the dependencies are simply between the classes, not between their data members.

Unfortunately, I don’t know of any convenient way to detect these cycles in Java: OpenJDK provides -XX:+TraceClassInitialization, which I suspect might be useful, but it’s only available in debug builds of the OpenJDK JRE3, and I haven’t been able to confirm exactly what it shows.

And for what it’s worth, I’m not aware of a better solution for detecting cycles in C# either. For Noda Time, we used a custom cycle detector for a while; it spotted some bugs resulting from reading default values, but it was too brittle and invasive (it required modifying each class), and so we removed it before 1.0.

I suppose that if we assume that class initialisation occurs atomically and on multiple threads, then this kind of problem is bound to come up4. Perhaps what’s surprising is that these languages do allow the use of partially-initialised classes in the single-threaded case?

  1. Or at least, what happens when I run it, on a multiprocessor Debian machine running OpenJDK 7u79. I don’t think the versions are particularly important — this behaviour seems to be present in all Java versions — though I am a little surprised that I didn’t need to add any additional synchronisation or delays. 

  2. A similar situation exists in C# for classes with static constructors (for classes without, the runtime is allowed much more latitude as to when the type is initialised). 

  3. You can trace class loading with -XX:+TraceClassLoadingPreorder and -XX:+TraceClassLoading, but this doesn’t tell you when class initialisation happens. 

  4. He says, with a sample size of one. I haven’t managed to confirm what C# does, for example, and C++ avoids this problem by replacing it with a much larger one, the “static initialisation order fiasco”. 

pip install --isolated fails on Python 3.0–3.3

(This is a quick post for search-engine fodder, since I didn’t manage to find anything relevant myself.)

If you’re using pip install --isolated to install Python packages and find that it fails with an error like the following:

Complete output from command python setup.py egg_info:
usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: -c --help [cmd1 cmd2 ...]
   or: -c --help-commands
   or: -c cmd --help

error: option --no-user-cfg not recognized

… then you might have run into an incompatibility between pip and Python versions 3.0–3.3.

pip version 6.0 added an isolated mode (activated by the --isolated flag) that avoids looking at per-user configuration (pip.conf and environment variables).

Running pip in isolated mode also passes the --no-user-cfg flag to Python’s distutils to disable reading the per-user ~/.pydistutils.cfg. But that flag isn’t available in Python versions 3.0–3.3, causing the error above.

I ran into this because I recently migrated the Python code that generates this site to run under Python 3.x. I’m using a virtualenv setup, so once I had everything working under both Python versions, I was reasonably confident that I could switch ‘production’ (i.e. the Compute Engine instance that serves this site) to Python 3 and discard the 2.x-compatibility code.

Good thing I tested it out first, since it didn’t even install.

It turns out that:

  1. virtualenv ships an embedded copy of pip and setuptools, but setuptools will use the system version of distutils1, and
  2. --no-user-cfg was added in Python 2.7, but wasn’t ported to 3.x until 3.42, and
  3. the distribution I’m using on my real server (Debian 7) ships with Python 3.2.3, rather than the 3.4.x I’m using elsewhere.

I worked around this by just omitting the --isolated flag for Python versions [3.0, 3.4) — though since I don’t actually have any system config files in practice, I probably could have set PIP_CONFIG_FILE=/dev/null instead (which has the effect of ignoring all config files).
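Expressed as a shell fragment, the workaround looks something like this (a sketch of the approach rather than the exact code I used; PKG is a placeholder):

# Skip --isolated on Python 3.0–3.3, where distutils lacks --no-user-cfg.
if python -c 'import sys; sys.exit(0 if (3, 0) <= sys.version_info < (3, 4) else 1)'; then
  pip install "$PKG"
else
  pip install --isolated "$PKG"
fi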

I’m not the first person to have noticed that virtualenv isn’t actually hermetic. Some of that rant is out of date now (Python wheel files provide prebuilt binaries), and some isn’t relevant to the way I’m using virtualenv/pip, but it’s definitely true that the dependency on the system Python libraries is the main reason I’d look to something more like Docker or Vagrant for deployment were I doing this professionally.

So did I finally manage to switch to Python 3.x after that? Not even close: Python 3.x didn’t gain the ability to (redundantly) use the u'foo' syntax for Unicode strings until 3.3, and some of my dependencies use that syntax. So I’m waiting until I can switch to Debian 8 on Compute Engine3, at which point I can cleanly assume Python 3.4 or later.

  1. This is a rant for another day, but it looks like virtualenv monkeypatches pip, which monkeypatches setuptools, which either monkeypatches or builds upon distutils. Debugging through this edifice of patched abstractions is… not easy. 

  2. It’s a bit more complex than that: Python 3.0 and 3.1 were released first, then the feature was implemented in both 2.7 and 3.2, but then distutils as a whole was rolled back to its 3.1 state before 3.2 was released. That rollback was reverted for Python 3.4. 

  3. I can apt-get dist-upgrade from the Debian 7 image just fine, but it’s a bit slow and hacky, so I’d rather wait for official images. (I also need to fix some custom mail-related configuration that appears to have broken under Debian 8.) 

Noda Time 1.3.1

It’s been a while since the last Noda Time release, and while we’re still working towards 2.0, we’ve been collecting a few bug fixes that can’t really wait. So last Friday1, we released Noda Time 1.3.1.

Noda Time 1.3.1 updates the built-in version of TZDB from 2014e to 2015a, and fixes a few minor bugs, two of which were triggered by recent data changes.

Since it’s been a while since the previous release, it may be worth pointing out that new Noda Time releases are not the only way to get new time zone data: applications can choose to load an external version of the time zone database rather than use the embedded version, and so use up-to-date time zone data with any version of the Noda Time assemblies.
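For example, loading an external copy of TZDB looks something like this (a sketch using the NodaTime.TimeZones API; the file name is a placeholder, and the compiled .nzd file would come from the TZDB compiler or the Noda Time site):

using NodaTime;
using NodaTime.TimeZones;
using System.IO;

// Build a provider from an external compiled TZDB file instead of
// the copy embedded in the NodaTime assembly.
using (var stream = File.OpenRead("tzdb-2015a.nzd"))
{
    IDateTimeZoneProvider provider =
        new DateTimeZoneCache(TzdbDateTimeZoneSource.FromStream(stream));
    DateTimeZone dhaka = provider["Asia/Dhaka"];
}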

If you’re in a hurry, you can get Noda Time 1.3.1 from the NuGet repository (core, testing, JSON support packages), or from the links on the Noda Time home page. The rest of this post talks about the changes in 1.3.1 in a bit more detail.

End of year transitions (Bangladesh)

In the middle of 2009, Bangladesh started observing permanent daylight saving time, as an energy-saving measure. This was abandoned at the end of that year, and the country went back to permanent standard time.

Until recently, that transition back to standard time was actually recorded as happening a minute too early, at 23:59 on December 31st. TZDB 2014g fixed this by changing the transition time to “24:00” — that is, midnight at the end of the last day of the year.

Noda Time could already handle transitions at the end of the day, but would incorrectly ignore this particular transition because it occurred ‘after’ 2009. That’s now fixed, and Noda Time 1.3.1 returns the correct offset for Asia/Dhaka when using data from TZDB 2014g or later.

BCL provider: historical changes to the base offset (Russia)

In October 2014, most of Russia switched from permanent daylight saving time to permanent standard time, effectively moving local time back one hour. These changes were included in TZDB 2014f.

For people using the BCL provider instead of the TZDB provider (and using Windows), Microsoft delivered a hotfix in September 2014. However, our BCL provider depends upon the .NET framework’s TimeZoneInfo class, and the .NET framework — unlike TZDB — is unable to represent historical changes to the ‘base’ offset of a time zone (as happened here).

The result is that Noda Time (along with any other application using TimeZoneInfo in .NET 4.5.3 and earlier) incorrectly computes the offset for dates before October 26th, 2014.

A future update of the .NET framework should correct this limitation, but without a corresponding change in Noda Time, the extra information wouldn’t be used; Noda Time 1.3.1 prepares for this change, and will use the correct offset for historical dates when TimeZoneInfo does.

BCL provider: time zone equality

The time zones returned by the BCL provider have long had a limitation in the way time zone equality was implemented: a BCL time zone was considered equal to itself, and unequal to a time zone returned by a different provider, but attempting to compare two different BCL time zone instances for equality always threw a NotImplementedException. This was particularly annoying for ZonedDateTime, as its equality is defined in terms of the contained DateTimeZone.

This was documented, but we always considered it a bug, as it wasn’t possible to predict whether testing for equality would throw an exception. Noda Time 1.3.1 fixes this by implementing equality in terms of the underlying TimeZoneInfo: BCL time zones are considered equal if they wrap the same underlying TimeZoneInfo instance.

Note that innate time zone equality is not really well defined in general, and is something we’re planning to reconsider for Noda Time 2.0. Rather than rely on DateTimeZone.Equals(), we’d recommend that applications that want to compare time zones for equality use ZoneEqualityComparer to specify how two time zones should be compared.
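For instance, comparing two zones by their behaviour over a particular interval might look like this (a sketch, to be placed in some class of your own; the interval bounds are arbitrary):

using NodaTime;
using NodaTime.TimeZones;

static bool SameWallClock(DateTimeZone zone1, DateTimeZone zone2)
{
    // Equal if the two zones produce the same offsets over 2000–2020,
    // regardless of which provider they came from.
    var interval = new Interval(Instant.FromUtc(2000, 1, 1, 0, 0),
                                Instant.FromUtc(2020, 1, 1, 0, 0));
    return ZoneEqualityComparer.ForInterval(interval).Equals(zone1, zone2);
}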

And finally…

There are a handful of other smaller fixes in 1.3.1: the NodaTime assembly correctly declares a dependency on System.Xml, so you won’t have to; the NuGet packages now work with ASP.NET’s kpm tool, and declare support for Xamarin’s Xamarin.iOS (for building iOS applications using C#) in addition to Xamarin.Android, which was already listed; and we’ve fixed a few reported documentation issues along the way.

As usual, see the User Guide and 1.3.1 release notes for more information about all of the above.

Work is still continuing on 2.0 along the lines described in our 1.3.0 release post, and we’re also planning a 1.4 release to act as a bridge between 1.x and 2.0. This will deprecate members that we plan to remove in 2.0 and introduce the replacements where feasible.

  1. Release late on Friday afternoon? What could go wrong? Apart from running out of time to write a blog post, I mean. 

Just how big is Noda Time anyway?

Some years back, I posted a graph showing the growth of Subversion’s codebase over time, and I thought it might be fun to do the same with Noda Time. The Subversion graph shows the typical pattern of linear growth over time, so I was expecting to see the same thing with Noda Time. I didn’t1.

Noda Time’s repository is a lot simpler than Subversion’s (it’s also at least an order-of-magnitude smaller), so it wasn’t that difficult to come up with a measure of code size: I just counted the lines in the .cs files under src/NodaTime/ (for the production code) and src/NodaTime.Test/ (for the test code).

I decided to exclude comments and blank lines this time round, because I wanted to know about the functional code, not whether we’d expanded our documentation. As it turns out, the proportion of comments has stayed about the same over time, but that ratio is very different for the production code and test code: comments and blank lines make up approximately 50% of the production code, but only about 20–25% of the test code.
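For what it’s worth, a tool like cloc will produce similar numbers, since it reports blank, comment, and code lines separately (this is just one way to do the counting, not the script I used):

$ cloc --include-ext=cs src/NodaTime
$ cloc --include-ext=cs src/NodaTime.Test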

Here’s the graph. It’s not exactly up-and-to-the-right, more… wibbly-wobbly-timey-wimey.

[Graph: Noda Time’s code size in KLOC over time. Production and test code sizes are both approximately flat, with dips before some of the major releases.]

There are some things that aren’t surprising: during the pre-1.0 betas (the first two unlabelled points) we actively pruned code that we didn’t want to commit to for 1.x2, so the codebase shrinks until we release 1.0. After that, we added a bunch of functionality that we’d been deferring, along with a new compiled TZDB file format for the PCL implementation. So the codebase grows again for 1.1.

But then with 1.2, it shrinks. From what I can see, this is mostly due to an internal rewrite that removed the concept of calendar ‘fields’ (which had come along with the original mechanical port from Joda Time). This seems to counterbalance the fact that at the same time we added support for serialization3 and did a bunch of work on parsing and formatting.

1.3 sees an increase brought on by more features (new calendars and APIs), but then 2.0 (at least so far) sees an initial drop, a steady increase due to new features, and (just last month) another significant drop.

The first decrease for 2.0 came about immediately, as we removed code that was deprecated in 1.x (particularly, the handling for 1.0’s non-PCL-compatible compiled TZDB format). Somewhat surprisingly, this doesn’t come with a corresponding decrease in our test code size, which has otherwise been (roughly speaking) proportional in size to the production code (itself no real surprise, as most of our tests are unit tests). It turns out that the majority of this code was only covered by an integration test, so there wasn’t much test code to remove.

The second drop is more interesting: it’s all down to new features in C# 6.

For example, in Noda Time 1.3, Instant has Equals() and GetHashCode() methods that are written as follows:

public override bool Equals(object obj)
{
    if (obj is Instant)
        return Equals((Instant)obj);
    return false;
}

public override int GetHashCode()
{
    return Ticks.GetHashCode();
}

In Noda Time 2.0, the same methods are written using expression-bodied members, in two lines (I’ve wrapped the first line here):

public override bool Equals(object obj) =>
    obj is Instant && Equals((Instant)obj);
public override int GetHashCode() => duration.GetHashCode();

That’s the same functionality, just written in a terser syntax. I think it’s also clearer: the former reads more like a procedural recipe to me; the latter, a definition.

Likewise, ZoneRecurrence.ToString() uses expression-bodied members and string interpolation to turn this:

public override string ToString()
{
    var builder = new StringBuilder();
    builder.Append(Name);
    builder.Append(" ").Append(Savings);
    builder.Append(" ").Append(YearOffset);
    builder.Append(" [").Append(fromYear).Append("-").Append(toYear).Append("]");
    return builder.ToString();
}

into this:

public override string ToString() =>
    $"{Name} {Savings} {YearOffset} [{FromYear}-{ToYear}]";

There’s no real decrease in test code size though: most of the C# 6 features are really only useful for production code.

All in all, Noda Time’s current production code is within 200 lines of where it was back in 1.0.0-beta1, which isn’t something I would have been able to predict. Also, while we don’t quite have more test code than production code yet, it’s interesting to note that we’re only about a hundred lines short.

Does any of this actually matter? Well, no, not really. Mostly, it was a fun little exercise in plotting some graphs.

It did remind me that we have certainly simplified the codebase along the way — removing undesirable APIs before 1.0 and removing concepts (like fields) that were an unnecessary abstraction — and those are definitely good things for the codebase.

And it’s also interesting to see how effective the syntactic sugar in C# 6 is at reducing line counts. But the removal of unnecessary text also improves readability, and it’s the readability, rather than the resulting line count, that’s the key part here.

But mostly I just like the graphs.

  1. Or, if you prefer BuzzFeed-style headlines, “You won’t believe what happened to this codebase!”. 

  2. To get to 1.0, we removed at least: a verbose parsing API that tried to squish the Noda Time and BCL parsing models together, an in-code type-dependency graph checker, and a very confusingly-broken CultureInfo replacement. 

  3. I’m not counting the size of the NodaTime.Serialization.JsonNet package here at all (nor the NodaTime.Testing support package), so this serialization support just refers to the built-in XML and binary serialization.