Learning from bad code
I read a lot of code, and I think it’s fair to say that most of it — mine included — can probably be improved in some way. If nothing else, there’s usually scope for some cleanup or refactoring.
What I’ve been thinking about recently is “bad” code, though, not merely imperfect code. That is, the kind of code that isn’t just buggy [1], but a candidate for The Daily WTF’s Code Snippet of the Day.
In general, I think we should be able to find something to learn from this kind of code, no matter how bad: does it imply that an API is badly documented — or badly designed? Or even if it’s simply ‘wrong’, does it fail in an interesting way that teaches us something new?
It’s the last of these that I’d like to look at today. I recently ran across some code that surprised me, twice: first, that it worked at all, and second, in the way that it occasionally didn’t.
In the middle of a large Java batch job, I found some code to speed up an operation by executing some subtasks asynchronously. Pared down to the bare essentials, it looked something like this:
/** Run a callable asynchronously, returning a Future. */
private <T> Future<T> runAsync(Callable<T> callable) {
    // [Omitted: monitoring code.]
    return Executors.newSingleThreadExecutor().submit(callable);
}
If you’ve worked with concurrency in Java, that’s clearly not a very sensible way to run a number of tasks asynchronously, as the executor returned by newSingleThreadExecutor() — and so the thread that it manages — can’t be reused. Instead, we’re effectively creating a new thread for each subtask (of which, in my case, there were a lot).
However, when I looked at how the code was running in production, I saw two things that surprised me. Firstly, the threads that were running these subtasks were actually disappearing once the work had been done. This was not what I’d expected.
Since threads are GC roots in Java, I figured that the number of threads would grow without bound, as each thread would execute one piece of work, then sit around waiting for more work from its (no longer accessible) executor. Fairly soon, we’d expect to run out of memory (each thread’s stack being somewhere in the region of 512KB). But this clearly wasn’t happening.
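The behaviour I expected is easy to reproduce when you do keep a reference to the executor. Here’s a small sketch of my own (not from the original job): after its one task completes, the worker thread parks waiting for more work, and because the pool’s threads are non-daemon, the program never exits.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class IdleWorker {
    public static void main(String[] args) throws Exception {
        // Hold a strong reference, so finalization can't intervene.
        ExecutorService executor = Executors.newSingleThreadExecutor();
        executor.submit(new Runnable() {
            public void run() {
                System.out.println("task done");
            }
        });
        TimeUnit.SECONDS.sleep(2);
        // The worker is still parked waiting for more work; without an
        // explicit shutdown(), it keeps the JVM alive indefinitely.
        System.out.println("live threads: " + Thread.activeCount());
    }
}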
The second surprise was that in a very few cases — maybe 0.01% — we received an exception while submitting the job, roughly as follows:
RejectedExecutionException:
    Task FutureTask rejected from ThreadPoolExecutor
    [Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 0]
        at ThreadPoolExecutor$AbortPolicy.rejectedExecution
        at ThreadPoolExecutor.reject
        at ThreadPoolExecutor.execute
        at AbstractExecutorService.submit
        at Executors$DelegatedExecutorService.submit
We fixed the code to use a thread pool instead, and everything was much more sensible, but I was curious as to why that exception had occurred in the first place. Was the executor being shut down while the submit() method was executing? If so, how did the code calling shutdown() obtain a reference to the executor?
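For reference, the fixed version was shaped something like this (a sketch rather than the production code; the pool size and class name are illustrative):

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SubtaskRunner {
    // One shared pool, created once and reused for every subtask.
    private final ExecutorService executor = Executors.newFixedThreadPool(8);

    /** Run a callable asynchronously, returning a Future. */
    <T> Future<T> runAsync(Callable<T> callable) {
        // [Omitted: monitoring code.]
        return executor.submit(callable);
    }
}

Because the job holds a reference to the pool for its whole lifetime, the worker threads are reused, and the executor can’t be collected (and so shut down) behind our backs.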
I managed to construct a simple test case:
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;

class Prog {
    public static void main(String[] args) {
        Callable<Long> callable = new Callable<Long>() {
            public Long call() throws Exception {
                // Allocate, to create some memory pressure.
                byte[][] bytes = new byte[1024][];
                for (int i = 0; i < 1024; i++) {
                    bytes[i] = new byte[1024];
                }
                return 42L;
            }
        };
        for (;;) {
            Executors.newSingleThreadExecutor().submit(callable);
        }
    }
}
When run with java -server -Xmx128m Prog, this has ThreadPoolExecutor throw a RejectedExecutionException within a few seconds, just like I’d seen.
I looked through the JDK source for an explanation as to why the task was rejected. Could it be that thread creation had failed? We were creating a lot of threads, after all, and there seemed to be some handling for that. The exception also said that the executor was shut down, but it wasn’t initially clear to me whether that was necessarily true at the time the decision was made to throw the exception.
One clue was that ThreadPoolExecutor has a finalize() method that shuts down the executor. Given that we weren’t keeping a reference to the underlying executor, it seemed like a strong possibility that finalization was causing the shutdown.
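In the JDK sources of that era, the method is essentially a one-liner (abridged here; later versions add security-manager plumbing around the same call):

// java.util.concurrent.ThreadPoolExecutor (abridged)
protected void finalize() {
    shutdown();
}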
That would also explain why the threads weren’t sticking around (they were being asked to exit by the executor) and perhaps why the task submission was rejected: if the runtime decided to GC the executor while the task was being submitted, it would call shutdown() from another thread (the finalizer thread), resulting in the exception above.
But why would the instance be eligible for GC while submit() was executing in the first place? I ploughed through the relevant section of the JLS, but it didn’t get much clearer.
I thought I’d remembered that an object becomes eligible for GC once there are provably no further references to the instance’s fields (perhaps ignoring references that the JIT has loaded into a register? That may actually be C# rather than Java, though). Either way, I didn’t see any execution paths in which the this reference would no longer be needed (and now I come to check the source again, it’s trivially needed by the code that throws the exception, as shown above).
In the end, I mailed our Java platform folks to ask for an explanation.
They confirmed that I’d been right about objects being collectable once they were inaccessible [2], but that I’d been wrong about which object was being collected.
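This is straightforward to demonstrate in isolation. The sketch below is mine, not the JDK’s: once work() stops using this, the instance is unreachable even though the method is still on the stack, so a GC may finalize it before the method returns. Whether that actually happens depends on the JIT (a purely interpreted frame may keep this alive), so treat it as illustrative:

class EarlyCollection {
    @Override
    protected void finalize() {
        System.out.println("finalized before work() returned");
    }

    void work() {
        System.out.println("work() started");
        // 'this' is not used beyond this point, so the instance may be
        // collected even though work() is still executing.
        System.gc();
        System.runFinalization();
        System.out.println("work() finishing");
    }

    public static void main(String[] args) {
        new EarlyCollection().work();
    }
}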
You can see from the stack trace above that we have three classes to consider: ThreadPoolExecutor, where most of the logic is, the superclass AbstractExecutorService, and another inner class referred to above as Executors$DelegatedExecutorService. That inner class turns out to be FinalizableDelegatedExecutorService, and the clue may well be in the name.
It turns out that FinalizableDelegatedExecutorService is returned by newSingleThreadExecutor(). It does nothing other than aggregate another executor and shut it down when finalized (even though, in this case, the wrapped ThreadPoolExecutor itself will do the same thing). That wrapper instance is eligible for GC as soon as it delegates the submit() call, and its finalization is what calls shutdown() on the wrapped instance, not ThreadPoolExecutor’s own finalize() method.
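The relevant JDK source (abridged, from a JDK of roughly that era; later versions may differ) makes the chain of events clear:

// java.util.concurrent.Executors (abridged)
public static ExecutorService newSingleThreadExecutor() {
    return new FinalizableDelegatedExecutorService
        (new ThreadPoolExecutor(1, 1,
                                0L, TimeUnit.MILLISECONDS,
                                new LinkedBlockingQueue<Runnable>()));
}

static class FinalizableDelegatedExecutorService
        extends DelegatedExecutorService {
    FinalizableDelegatedExecutorService(ExecutorService executor) {
        super(executor);
    }
    protected void finalize() {
        super.shutdown();
    }
}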
So next time you see some ‘bad’ code, maybe think about what you can learn from it. In this case, I learned that the code wasn’t actually as bad as it first seemed (after all, it worked), and I also learned a little more about how Java’s GC works.
[1] Bugs are easy, in a sense: we spot a problem in the way the code executes, fix it — which may take a lot of work, or might not — and then we’re done. At a stretch, we might try to do some extra work to avoid making the same mistake again, perhaps with something like error-prone (for Java).
[2] They pointed me towards Hans Boehm’s presentation about finalizers, and specifically to “myth #2” (page 8).