Learning from bad code
I read a lot of code, and I think it’s fair to say that most of it — mine included — can probably be improved in some way. If nothing else, there’s usually scope for some cleanup or refactoring.
What I’ve been thinking about recently is “bad” code, though, not merely imperfect code. That is, the kind of code that isn’t just buggy [1], but a candidate for The Daily WTF’s Code Snippet of the Day.
In general, I think we should be able to find something to learn from this kind of code, no matter how bad: does it imply that an API is badly documented — or badly designed? Or even if it’s simply ‘wrong’, does it fail in an interesting way that teaches us something new?
It’s the last of these that I’d like to look at today. I recently ran across some code that surprised me, twice: first, that it worked at all, and second, in the way that it occasionally didn’t.
In the middle of a large Java batch job, I found some code to speed up an operation by executing some subtasks asynchronously. Pared down to the bare essentials, it looked something like this:
/** Run a callable asynchronously, returning a Future. */
private <T> Future<T> runAsync(Callable<T> callable) {
    // [Omitted: monitoring code.]
    return Executors.newSingleThreadExecutor().submit(callable);
}
If you’ve worked with concurrency in Java, that’s clearly not a very sensible way to run a number of tasks asynchronously, as the executor returned by newSingleThreadExecutor() — and so the thread that it manages — can’t be reused. Instead, we’re effectively creating a new thread for each subtask (of which, in my case, there were a lot).
However, when I looked at how the code was running in production, I saw two things that surprised me. Firstly, the threads that were running these subtasks were actually disappearing once the work had been done. This was not what I’d expected.
Since threads are GC roots in Java, I figured that the number of threads would grow without bound, as each thread would execute one piece of work, then sit around waiting for more work from its (no longer accessible) executor. Fairly soon, we’d expect to run out of memory (each thread’s stack being somewhere in the region of 512KB). But this clearly wasn’t happening.
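The behaviour I expected is easy to reproduce when you do keep a reference to the executor. Here’s a small sketch of my own (not from the original job): after its one task completes, the worker thread parks waiting for more work, and because the pool’s threads are non-daemon, the program never exits.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class IdleWorker {
    public static void main(String[] args) throws Exception {
        // Hold a strong reference, so finalization can't intervene.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.submit(new Runnable() {
            public void run() {
                System.out.println("task done");
            }
        });
        TimeUnit.SECONDS.sleep(2);
        // The worker is still parked waiting for more work; without an
        // explicit shutdown(), it keeps the JVM alive indefinitely.
        System.out.println("live threads: " + Thread.activeCount());
    }
}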
The second surprise was that in a very few cases — maybe 0.01% — we received an exception while submitting the job, roughly as follows:
RejectedExecutionException:
    Task FutureTask rejected from ThreadPoolExecutor
    [Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
        at ThreadPoolExecutor$AbortPolicy.rejectedExecution
        at ThreadPoolExecutor.reject
        at ThreadPoolExecutor.execute
        at AbstractExecutorService.submit
        at Executors$DelegatedExecutorService.submit
We fixed the code to use a thread pool instead, and everything was much more sensible, but I was curious as to why that exception had occurred in the first place. Was the executor being shut down while the submit() method was executing? If so, how did the code calling shutdown() obtain a reference to the executor?
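For reference, the fixed version was shaped something like this (a sketch rather than the production code; the pool size and class name are illustrative):

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SubtaskRunner {
    // One shared pool, created once and reused for every subtask.
    private final ExecutorService executor = Executors.newFixedThreadPool(8);

    /** Run a callable asynchronously, returning a Future. */
    <T> Future<T> runAsync(Callable<T> callable) {
        // [Omitted: monitoring code.]
        return executor.submit(callable);
    }
}

Because the job holds a reference to the pool for its whole lifetime, the worker threads are reused, and the executor can’t be collected (and so shut down) behind our backs.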
I managed to construct a simple test case:
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;

class Prog {
    public static void main(String[] args) {
        Callable<Long> callable = new Callable<Long>() {
            public Long call() throws Exception {
                // Allocate, to create some memory pressure.
                byte[][] bytes = new byte[1024][];
                for (int i = 0; i < 1024; i++) {
                    bytes[i] = new byte[1024];
                }
                return 42L;
            }
        };
        for (;;) {
            Executors.newSingleThreadExecutor().submit(callable);
        }
    }
}
When run with java -server -Xmx128m Prog, this has ThreadPoolExecutor throw a RejectedExecutionException within a few seconds, just like I’d seen.
I looked through the JDK source for an explanation as to why the task was rejected. Could it be that thread creation had failed? We were creating a lot of threads, after all, and there seemed to be some handling for that. The exception also said that the executor was shut down, but it wasn’t initially clear to me whether that was necessarily true at the time the decision was made to throw the exception.
One clue was that ThreadPoolExecutor has a finalize() method that shuts down the executor. Given that we weren’t keeping a reference to the underlying executor, it seemed like a strong possibility that finalization was causing the shutdown.
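In the JDK sources of that era, the method is essentially a one-liner (abridged here; later versions add security-manager plumbing around the same call):

// java.util.concurrent.ThreadPoolExecutor (abridged)
protected void finalize() {
    shutdown();
}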
That would also explain why the threads weren’t sticking around (they were being asked to exit by the executor) and perhaps why the task submission was rejected: if the runtime decided to GC the executor while the task was being submitted, it would call shutdown() from another thread (the finalizer thread), resulting in the exception above.
But why would the instance be eligible for GC while submit() was executing in the first place? I ploughed through the relevant section of the JLS, but it didn’t get much clearer.
I thought I’d remembered that an object becomes eligible for GC once there are provably no further references to the instance’s fields (perhaps ignoring references that the JIT has loaded into a register? That may actually be C# rather than Java, though). Either way, I didn’t see any execution paths in which the this reference would no longer be needed (and now I come to check the source again, it’s trivially needed by the code that throws the exception, as shown above).
In the end, I mailed our Java platform folks to ask for an explanation.
They confirmed that I’d been right about objects being collectable once they were inaccessible [2], but that I’d been wrong about which object was being collected.
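This is straightforward to demonstrate in isolation. The sketch below is mine, not the JDK’s: once work() stops using this, the instance is unreachable even though the method is still on the stack, so a GC may finalize it before the method returns. Whether that actually happens depends on the JIT (a purely interpreted frame may keep this alive), so treat it as illustrative:

class EarlyCollection {
    @Override
    protected void finalize() {
        System.out.println("finalized before work() returned");
    }

    void work() {
        System.out.println("work() started");
        // 'this' is not used beyond this point, so the instance may be
        // collected even though work() is still executing.
        System.gc();
        System.runFinalization();
        System.out.println("work() finishing");
    }

    public static void main(String[] args) {
        new EarlyCollection().work();
    }
}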
You can see from the stack trace above that we have three classes to consider: ThreadPoolExecutor, where most of the logic is, the superclass AbstractExecutorService, and another inner class referred to above as Executors$DelegatedExecutorService. That inner class turns out to be FinalizableDelegatedExecutorService, and the clue may well be in the name.
It turns out that FinalizableDelegatedExecutorService is returned by newSingleThreadExecutor(). It does nothing other than aggregate another executor and shut it down when finalized (even though, in this case, the wrapped ThreadPoolExecutor itself will do the same thing). That wrapper instance is eligible for GC as soon as it delegates the submit() call, and its finalization is what calls shutdown() on the wrapped instance, not ThreadPoolExecutor’s own finalize() method.
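The relevant JDK source (abridged, from a JDK of roughly that era; later versions may differ) makes the chain of events clear:

// java.util.concurrent.Executors (abridged)
public static ExecutorService newSingleThreadExecutor() {
    return new FinalizableDelegatedExecutorService
        (new ThreadPoolExecutor(1, 1,
                                0L, TimeUnit.MILLISECONDS,
                                new LinkedBlockingQueue<Runnable>()));
}

static class FinalizableDelegatedExecutorService
        extends DelegatedExecutorService {
    FinalizableDelegatedExecutorService(ExecutorService executor) {
        super(executor);
    }
    protected void finalize() {
        super.shutdown();
    }
}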
So next time you see some ‘bad’ code, maybe think about what you can learn from it. In this case, I learned that the code wasn’t actually as bad as it first seemed (after all, it worked), and I also learned a little more about how Java’s GC works.
[1] Bugs are easy, in a sense: we spot a problem in the way the code executes, fix it — which may take a lot of work, or might not — and then we’re done. At a stretch, we might try to do some extra work to avoid making the same mistake again, perhaps with something like error-prone (for Java).
[2] They pointed me towards Hans Boehm’s presentation about finalizers, and specifically to “myth #2” (page 8).