by Malcolm Rowe

This is Just to Say

I have invalidated
the assumptions
that your code
depended upon

Forgive me
they were so well hidden
and so fragile
Reid McKenzie, Twitter

Deadlocks in Java class initialisation

I recently ran across the fact that it’s possible to make the Java runtime deadlock while initialising a class — and that this behaviour is even mandated by the Java Language Specification.

Here’s a Java 7 program that demonstrates the problem:

public class Program {
  public static void main(String args[]) {
    new Thread(new Runnable() {
      @Override public void run() {
        A.initMe();
      }
    }).start();

    B.initMe();
  }

  private static class A {
    private static final B b = new B();
    static void initMe() {}
  }

  private static class B {
    private static final A a = new A();
    static void initMe() {}
  }
}

In addition to demonstrating that lambdas are a good idea (all that boilerplate to start a thread!), this also shows how cycles during class initialisation can lead to a deadlock. Here’s what happens when you run it [1]:

$ javac Program.java
$ java Program

That is, it hangs.

In Java, classes are loaded at some arbitrary point before use, but are only initialised (running the static {} blocks and static field initialisers) at defined points [2].

One of these points is just before a static method is invoked, and so the two calls to A.initMe() and B.initMe() above will both trigger initialisation for the respective classes.

In this case, each class contains a static field that instantiates an instance of the other class. Instantiating the other class requires that that class is initialised, and so what we end up with is that each class’s initialisation is blocked waiting for the initialisation of the other class to complete.
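The program above relies on the two threads happening to overlap. As a minimal sketch of the same cycle made deterministic, the version below adds a short sleep to each static initialiser and uses daemon threads with a timeout so that the demo itself terminates. The class name InitDeadlockDemo, the pause() helper and the timeout values are illustrative assumptions, not part of the original program.

```java
// Hypothetical demo class: same A/B cycle as above, plus sleeps and a timeout.
class InitDeadlockDemo {
  static class A {
    static { pause(); }            // keep A's initialisation in progress for a moment
    static final B b = new B();    // needs B to be initialised first
    static void initMe() {}
  }

  static class B {
    static { pause(); }            // keep B's initialisation in progress for a moment
    static final A a = new A();    // needs A to be initialised first
    static void initMe() {}
  }

  static void pause() {
    try {
      Thread.sleep(200);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Starts one thread per class; returns true if both are still stuck after the timeout. */
  static boolean deadlocks(long timeoutMillis) {
    Thread t1 = new Thread(() -> A.initMe(), "init-A");
    Thread t2 = new Thread(() -> B.initMe(), "init-B");
    t1.setDaemon(true);            // daemon threads do not keep the JVM alive
    t2.setDaemon(true);
    t1.start();
    t2.start();
    try {
      t1.join(timeoutMillis);
      t2.join(timeoutMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return t1.isAlive() && t2.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("deadlocked: " + deadlocks(1000));
  }
}
```

Because the stuck threads are daemons, the JVM can still exit once main returns, which makes this form easier to run in a test than the hanging original.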

If you trigger a thread dump at this point — by sending a SIGQUIT or hitting Ctrl-\ (or Ctrl-Break on Windows) — then you’ll see something like this:

Full thread dump OpenJDK 64-Bit Server VM (24.79-b02 mixed mode):

"Thread-0" prio=10 tid=0x00007efd50105000 nid=0x51db in Object.wait() [0x00007efd3f168000]
   java.lang.Thread.State: RUNNABLE
        at Program$A.<clinit>(Program.java:13)
        at Program$1.run(Program.java:5)
        at java.lang.Thread.run(Thread.java:745)

"main" prio=10 tid=0x00007efd5000a000 nid=0x51ca in Object.wait() [0x00007efd59d45000]
   java.lang.Thread.State: RUNNABLE
        at Program$B.<clinit>(Program.java:18)
        at Program.main(Program.java:9)


Interestingly, you can see that while both threads are executing an implicit Object.wait(), they’re listed as RUNNABLE rather than WAITING, and there’s no output from the deadlock detector. I suspect that the reason for both of these is that the details of class initialisation changed in Java 7:

In Java 6, the runtime would attempt to lock the monitor owned by each Class instance for the duration of the initialisation, while in Java 7, attempting to initialise a class that’s already being initialised by another thread just requires that the caller be blocked in some undefined fashion until that initialisation completes.
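The same two observations (the reported thread state and the silence of the deadlock detector) can also be checked from code rather than from a thread dump. The sketch below is one hedged way to do that: it recreates the A/B cycle with short sleeps (an addition for determinism), then prints Thread.getState() for each stuck thread and whatever ThreadMXBean.findDeadlockedThreads() returns. DetectorDemo and its helpers are hypothetical names, and no output is shown here because it may differ between JVM versions.

```java
import java.lang.management.ManagementFactory;
import java.util.Arrays;

// Hypothetical demo class: inspects threads that are stuck in class initialisation.
class DetectorDemo {
  static class A { static { sleep(200); } static final B b = new B(); static void initMe() {} }
  static class B { static { sleep(200); } static final A a = new A(); static void initMe() {} }

  static void sleep(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Starts the two initialising threads and waits long enough for both to get stuck. */
  static Thread[] startStuckThreads() {
    Thread[] threads = {
      new Thread(() -> A.initMe(), "init-A"),
      new Thread(() -> B.initMe(), "init-B"),
    };
    for (Thread t : threads) {
      t.setDaemon(true);           // let the JVM exit even though these never finish
      t.start();
    }
    sleep(1000);
    return threads;
  }

  public static void main(String[] args) {
    for (Thread t : startStuckThreads()) {
      System.out.println(t.getName() + ": " + t.getState());
    }
    // findDeadlockedThreads() returns null when the detector finds nothing.
    long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
    System.out.println("findDeadlockedThreads: " + Arrays.toString(ids));
  }
}
```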

There are other ways to trigger the same problem, too. Here’s another problematic snippet:

public class Foo {
  public static final Foo EMPTY = new EmptyFoo();
}

public class EmptyFoo extends Foo {}

Here we have Foo, and EmptyFoo, a special — presumably empty, in some fashion — version of Foo. EmptyFoo is usable directly, but it’s also available as Foo.EMPTY.

The problem here is that initialising EmptyFoo requires us to initialise the superclass, and initialising Foo requires initialisation of EmptyFoo for the static field. This would be fine in one thread, but if two threads attempt to initialise the two classes separately, deadlock results.
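One common way to break this kind of cycle (not something the snippet above does, and only a sketch) is to move the static instance into a nested holder class, so that initialising Foo no longer requires EmptyFoo. The names HolderFixDemo, EmptyHolder and empty() are assumptions made for illustration, and the public API changes from a field, Foo.EMPTY, to a method, Foo.empty().

```java
// Hypothetical demo class: Foo no longer depends on EmptyFoo during initialisation.
class HolderFixDemo {
  static class Foo {
    // The instance lives in a nested holder, initialised on first call to empty().
    static Foo empty() {
      return EmptyHolder.EMPTY;
    }

    private static class EmptyHolder {
      static final Foo EMPTY = new EmptyFoo();
    }
  }

  static class EmptyFoo extends Foo {}

  /** Initialises Foo and EmptyFoo from separate threads; returns true if both finish in time. */
  static boolean bothFinish(long timeoutMillis) {
    Thread t1 = new Thread(() -> Foo.empty(), "init-Foo");
    Thread t2 = new Thread(() -> new EmptyFoo(), "init-EmptyFoo");
    t1.setDaemon(true);
    t2.setDaemon(true);
    t1.start();
    t2.start();
    try {
      t1.join(timeoutMillis);
      t2.join(timeoutMillis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return !t1.isAlive() && !t2.isAlive();
  }

  public static void main(String[] args) {
    System.out.println("both finished: " + bothFinish(2000));
    System.out.println("same instance: " + (Foo.empty() == Foo.empty()));
  }
}
```

The dependencies are now acyclic: EmptyHolder needs EmptyFoo, EmptyFoo needs Foo, and Foo needs nothing, so no thread can end up waiting on a class that is waiting on it.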

Cyclic dependencies between classes have always been problematic in both Java and C#, as references to non-constant static fields in classes that are already being initialised see uninitialised (Java) or default (C#) values. However, normally the initialisation does complete; here, it doesn’t, and here the dependencies are simply between the classes, not between their data members.
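To make the single-threaded case concrete, here is a small sketch of two classes whose static fields depend on each other. All names are hypothetical. The fields are initialised through method calls so that they are not compile-time constants, which is what allows the half-initialised value to be observed.

```java
// Hypothetical demo class: a single-threaded initialisation cycle that completes,
// but where one class sees the other's field before it has been assigned.
class CycleValuesDemo {
  static class First {
    // Not a compile-time constant (it calls a method), so it is assigned during initialisation.
    static final int VALUE = Second.get() + 1;
    static int get() { return VALUE; }
  }

  static class Second {
    // If First is already being initialised by this thread, First.get() returns the default 0.
    static final int VALUE = First.get() + 1;
    static int get() { return VALUE; }
  }

  /** Touches First before Second and returns {First.VALUE, Second.VALUE}. */
  static int[] readBoth() {
    int first = First.get();
    int second = Second.get();
    return new int[] {first, second};
  }

  public static void main(String[] args) {
    int[] values = readBoth();
    // prints "First.VALUE = 2, Second.VALUE = 1"
    System.out.println("First.VALUE = " + values[0] + ", Second.VALUE = " + values[1]);
  }
}
```

Touching Second first would swap the two results, which is exactly why this kind of dependency is fragile even when nothing hangs.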

Unfortunately, I don’t know of any convenient way to detect these cycles in Java: OpenJDK provides -XX:+TraceClassInitialization, which I suspect might be useful, but it’s only available in debug builds of the OpenJDK JRE [3], and I haven’t been able to confirm exactly what it shows.

And for what it’s worth, I’m not aware of a better solution for detecting cycles in C# either. For Noda Time, we used a custom cycle detector for a while; it spotted some bugs resulting from reading default values, but it was too brittle and invasive (it required modifying each class), and so we removed it before 1.0.

I suppose that if we assume that class initialisation occurs atomically and on multiple threads, then this kind of problem is bound to come up [4]. Perhaps what’s surprising is that these languages do allow the use of partially-initialised classes in the single-threaded case?

If videos are your thing, the folks at Webucator have turned this post into a video as part of their free (registration required) Java Solutions from the Web course. They also offer a series of paid Java Fundamentals classes covering a variety of topics.

  1. Or at least, what happens when I run it, on a multiprocessor Debian machine running OpenJDK 7u79. I don’t think the versions are particularly important — this behaviour seems to be present in all Java versions — though I am a little surprised that I didn’t need to add any additional synchronisation or delays. 

  2. A similar situation exists in C# for classes with static constructors (for classes without, the runtime is allowed much more latitude as to when the type is initialised). 

  3. You can trace class loading with -XX:+TraceClassLoadingPreorder and -XX:+TraceClassLoading, but this doesn’t tell you when class initialisation happens. 

  4. He says, with a sample size of one. I haven’t managed to confirm what C# does, for example, and C++ avoids this problem by replacing it with a much larger one, the “static initialisation order fiasco”. 

pip install --isolated fails on Python 3.0–3.3

(This is a quick post for search-engine fodder, since I didn’t manage to find anything relevant myself.)

If you’re using pip install --isolated to install Python packages and find that it fails with an error like the following:

Complete output from command python setup.py egg_info:
usage: -c [global_opts] cmd1 [cmd1_opts] [cmd2 [cmd2_opts] ...]
   or: -c --help [cmd1 cmd2 ...]
   or: -c --help-commands
   or: -c cmd --help

error: option --no-user-cfg not recognized

… then you might have run into an incompatibility between pip and Python versions 3.0–3.3.

pip version 6.0 added an isolated mode (activated by the --isolated flag) that avoids looking at per-user configuration (pip.conf and environment variables).

Running pip in isolated mode also passes the --no-user-cfg flag to Python’s distutils to disable reading the per-user ~/.pydistutils.cfg. But that flag isn’t available in Python versions 3.0–3.3, causing the error above.

I ran into this because I recently migrated the Python code that generates this site to run under Python 3.x. I’m using a virtualenv setup, so once I had everything working under both Python versions, I was reasonably confident that I could switch ‘production’ (i.e. the Compute Engine instance that serves this site) to Python 3 and discard the 2.x-compatibility code.

Good thing I tested it out first, since it didn’t even install.

It turns out that:

  1. virtualenv ships an embedded copy of pip and setuptools, but setuptools will use the system version of distutils [1], and
  2. --no-user-cfg was added in Python 2.7, but wasn’t ported to 3.x until 3.4 [2], and
  3. the distribution I’m using on my real server (Debian 7) ships with Python 3.2.3, rather than the 3.4.x I’m using elsewhere.

I worked around this by just omitting the --isolated flag for Python versions [3.0, 3.4), though since I don’t actually have any system config files in practice, I probably could have set PIP_CONFIG_FILE=/dev/null instead (which has the effect of ignoring all config files).

I’m not the first person to have noticed that virtualenv isn’t actually hermetic. Though some of that rant is out of date now (Python wheel files provide prebuilt binaries), and some isn’t relevant to the way I’m using virtualenv/pip, it’s definitely true that the dependency on the system Python libraries is the main reason I’d look to something more like Docker or Vagrant for deployment were I doing this professionally.

So did I finally manage to switch to Python 3.x after that? Not even close: Python 3.x didn’t gain the ability to (redundantly) use the u'foo' syntax for Unicode strings until 3.3, and some of my dependencies use that syntax. So I’m waiting until I can switch to Debian 8 on Compute Engine [3], at which point I can cleanly assume Python 3.4 or later.

  1. This is a rant for another day, but it looks like virtualenv monkeypatches pip, which monkeypatches setuptools, which either monkeypatches or builds upon distutils. Debugging through this edifice of patched abstractions is… not easy. 

  2. It’s a bit more complex than that: Python 3.0 and 3.1 were released first, then the feature was implemented in both 2.7 and 3.2, but then distutils as a whole was rolled back to its 3.1 state before 3.2 was released. That rollback was reverted for Python 3.4. 

  3. I can apt-get dist-upgrade from the Debian 7 image just fine, but it’s a bit slow and hacky, so I’d rather wait for official images. (I also need to fix some custom mail-related configuration that appears to have broken under Debian 8.) 

Noda Time 1.3.1

It’s been a while since the last Noda Time release, and while we’re still working towards 2.0, we’ve been collecting a few bug fixes that can’t really wait. So last Friday [1], we released Noda Time 1.3.1.

Noda Time 1.3.1 updates the built-in version of TZDB from 2014e to 2015a, and fixes a few minor bugs, two of which were triggered by recent data changes.

Since it’s been a while since the previous release, it may be worth pointing out that new Noda Time releases are not the only way to get new time zone data: applications can choose to load an external version of the time zone database rather than use the embedded version, and so use up-to-date time zone data with any version of the Noda Time assemblies.

If you’re in a hurry, you can get Noda Time 1.3.1 from the NuGet repository (core, testing, JSON support packages), or from the links on the Noda Time home page. The rest of this post talks about the changes in 1.3.1 in a bit more detail.

End of year transitions (Bangladesh)

In the middle of 2009, Bangladesh started observing permanent daylight saving time, as an energy-saving measure. This was abandoned at the end of that year, and the country went back to permanent standard time.

Until recently, that transition back to standard time was actually recorded as happening a minute too early, at 23:59 on December 31st. TZDB 2014g fixed this by changing the transition time to “24:00” — that is, midnight at the end of the last day of the year.

Noda Time could already handle transitions at the end of the day, but would incorrectly ignore this particular transition because it occurred ‘after’ 2009. That’s now fixed, and Noda Time 1.3.1 returns the correct offset for Asia/Dhaka when using data from TZDB 2014g or later.

BCL provider: historical changes to the base offset (Russia)

In October 2014, most of Russia switched from permanent daylight saving time to permanent standard time, effectively moving local time back one hour. These changes were included in TZDB 2014f.

For people using the BCL provider instead of the TZDB provider (and using Windows), Microsoft delivered a hotfix in September 2014. However, our BCL provider depends upon the .NET framework’s TimeZoneInfo class, and the .NET framework — unlike TZDB — is unable to represent historical changes to the ‘base’ offset of a time zone (as happened here).

The result is that Noda Time (like other applications using TimeZoneInfo in .NET 4.5.3 and earlier) incorrectly computes the offset for dates before October 26th, 2014.

A future update of the .NET framework should correct this limitation, but without a corresponding change in Noda Time, the extra information wouldn’t be used; Noda Time 1.3.1 prepares for this change, and will use the correct offset for historical dates when TimeZoneInfo does.

BCL provider: time zone equality

The time zones returned by the BCL provider have long had a limitation in the way time zone equality was implemented: a BCL time zone was considered equal to itself, and unequal to a time zone returned by a different provider, but attempting to compare two different BCL time zone instances for equality always threw a NotImplementedException. This was particularly annoying for ZonedDateTime, as its equality is defined in terms of the contained DateTimeZone.

This was documented, but we always considered it a bug, as it wasn’t possible to predict whether testing for equality would throw an exception. Noda Time 1.3.1 fixes this by implementing equality in terms of the underlying TimeZoneInfo: BCL time zones are considered equal if they wrap the same underlying TimeZoneInfo instance.

Note that innate time zone equality is not really well defined in general, and is something we’re planning to reconsider for Noda Time 2.0. Rather than rely on DateTimeZone.Equals(), we’d recommend that applications that want to compare time zones for equality use ZoneEqualityComparer to specify how two time zones should be compared.

And finally…

There are a handful of other smaller fixes in 1.3.1: the NodaTime assembly correctly declares a dependency on System.Xml, so you won’t have to; the NuGet packages now work with ASP.NET’s kpm tool, and declare support for Xamarin’s Xamarin.iOS (for building iOS applications using C#) in addition to Xamarin.Android, which was already listed; and we’ve fixed a few reported documentation issues along the way.

As usual, see the User Guide and 1.3.1 release notes for more information about all of the above.

Work is still continuing on 2.0 along the lines described in our 1.3.0 release post, and we’re also planning a 1.4 release to act as a bridge between 1.x and 2.0. This will deprecate members that we plan to remove in 2.0 and introduce the replacements where feasible.

  1. Release late on Friday afternoon? What could go wrong? Apart from running out of time to write a blog post, I mean.