Understanding Python scope

Here’s a complete Python function and unittest, similar in structure to something I wrote recently. What’s wrong with it?

import unittest

def method_under_test(callback, value):
    """Call callback with value."""
    callback(value)

class MyTestCase(unittest.TestCase):
    def test_function_calls_callback(self):
        callback_called = False
        def callback(actual):
            callback_called = True
            if actual != 42:
                raise AssertionError('wrong value!')

        method_under_test(callback, 42)
        self.assertTrue(callback_called)

if __name__ == '__main__':
    unittest.main()

Let’s run that test:

$ python test.py
F
======================================================================
FAIL: test_function_calls_callback (__main__.MyTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "test.py", line 16, in test_function_calls_callback
    self.assertTrue(callback_called)
AssertionError: False is not true

----------------------------------------------------------------------
Ran 1 test in 0.000s

FAILED (failures=1)
$

Well, it certainly looks like the callback isn’t being called. Except that it is: if you change the value we pass to method_under_test() to something other than 42, you’ll see that the assertion in our test callback fires.

So what is going on here? Well, it turns out that the mental model I had for how Python handles scope was wrong.

Let’s start with some basics. Like most other languages, Python is statically scoped. This simply means that every mention of a variable name¹ in a program can be resolved ahead of time in order to determine the object to which it refers, by inspection only of the program’s text.

Contrast this with dynamic scope, where the identity (and therefore the type, in typed languages) of the object to which a variable refers cannot be determined until runtime, because name resolution depends upon the execution context set up by the caller (typically implemented using a stack of active bindings that are pushed and popped on function call and exit). Dynamic scope is not common in current programming languages², partly because it defeats information hiding and local analysis: reasoning about a function’s behaviour is much harder under dynamic scope. In any case, all ALGOL³-derived languages (Pascal, C, C#, Java, etc.) use static scope.
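As a minimal sketch of the difference (the names x, show and caller are made up for illustration), the following shows that Python resolves a name according to where the function is written, not according to who calls it:

```python
x = "module"

def show():
    # Static scope: this `x` is the module-level name, decided by the
    # position of show() in the program text.
    return x

def caller():
    # Under dynamic scope, show() would see this binding because caller()
    # is the active caller. Under static scope it is invisible to show().
    x = "caller-local"
    return show()

result = caller()  # "module"
```

A dynamically scoped language would give "caller-local" here instead.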

Static scope is effectively synonymous with lexical scope⁴, though in some cases the latter is used to differentiate the subset of statically-scoped languages that allow arbitrary nested scopes (from ALGOL, again), where name resolution is permitted to access bindings defined in a (typically closest) parent scope.

Incidentally, none of these definitions have anything particularly to do with closures: a closure is simply an instance of a function that has references to non-local variables. That is, a closure is the combination of a lambda expression (a function with free variables) along with a description of the bindings of those free variables to specific objects.
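To make that definition concrete, here is a small sketch (make_adder and add are hypothetical names) that uses the introspection attributes CPython exposes on function objects to show both halves of a closure: the free variable names, and the cells that bind them to objects.

```python
def make_adder(n):
    def add(x):
        # `n` is a free variable of add: it is used here but bound in
        # the enclosing function.
        return x + n
    return add

add_five = make_adder(5)

# The names of the free variables, taken from the compiled code object.
free_names = add_five.__code__.co_freevars           # ('n',)

# The binding captured for `n` by this particular closure instance.
bound_value = add_five.__closure__[0].cell_contents  # 5
```

Note that add takes an argument and has a name, which illustrates that neither property has any bearing on whether something is a closure.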

While we’re here, “closure” also doesn’t mean “anonymous function”, and it doesn’t require the function to take no arguments. There are, however, differences between languages as to whether non-local variables can be re-bound inside a closure, and if so, whether the effects of doing so are visible to other closures (or even to repeated calls to the same closure), and as to whether control flow statements (like return) can affect control flow in the caller, or merely in the closure.

The only real link between scope and closures is that supporting both static scope and first-class nested functions effectively requires support for closures; supporting one without the other is much simpler.

In fact, I got this wrong myself: I originally titled this post “Understanding Python closures”. But actually, there aren’t any closures in the code above.

So what is going on in our original example? Quite simply, Python is both statically scoped and also lacks an explicit variable declaration statement (e.g. a var-like keyword). This means that it needs a rule to determine, given a statement that assigns to a variable and an existing non-local binding with the same name, whether or not that statement creates a new local binding that shadows the existing one.

The rule that Python chooses is: any assignment within a block establishes a new local binding, unless a global statement for the name appears in the block, in which case the name always refers to a binding in the module-global environment instead. That’s it (for Python 2.x, anyway).
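The rule can be seen in a short sketch (all names here are hypothetical). Note in particular that an assignment anywhere in a function body makes the name local for the whole body, so reading it before the assignment fails rather than falling back to the global:

```python
counter = 0

def read_only():
    # No assignment to `counter` in this block: the name resolves to
    # the module-global binding.
    return counter

def shadow():
    # Assignment creates a new local binding; the global is untouched.
    counter = 1
    return counter

def broken():
    # `counter` is assigned below, so it is local throughout this
    # function, and this read happens before the local is bound.
    value = counter
    counter = value + 1
    return counter

def rebind_global():
    # The global statement makes assignments target the module binding.
    global counter
    counter += 1
    return counter

try:
    broken()
except UnboundLocalError as exc:
    error_name = type(exc).__name__  # "UnboundLocalError"
```

Calling shadow() leaves the module-level counter at 0, while each call to rebind_global() increments it.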

So in our example, when I wrote:

def test_function_calls_callback(self):
    callback_called = False
    def callback(actual):
        callback_called = True

That second assignment actually creates a new local binding for the variable callback_called that shadows the non-local one, and so the original variable bound at the start of the test method is never updated (and pylint will show it as unused, for example).
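One way to check this, sketched here with a stripped-down version of the test (outer is a hypothetical stand-in for the test method), is to inspect the inner function’s code object. The name shows up among the callback’s own local variables, and the callback captures nothing from the enclosing function:

```python
def outer():
    callback_called = False
    def callback(actual):
        # This assignment makes callback_called local to callback.
        callback_called = True
    callback(42)
    # The outer binding was never touched by the call above.
    return callback_called, callback

flag, callback = outer()

local_names = callback.__code__.co_varnames  # ('actual', 'callback_called')
free_names = callback.__code__.co_freevars   # ()
cells = callback.__closure__                 # None
```

Since callback has no free variables and no closure cells, it is an ordinary nested function rather than a closure.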

It’s important to note that Python could have chosen to do this differently: for example, Ruby 1.8 interprets assignments within blocks as creating a new local binding only if doing so would not shadow an existing non-local binding; if it would, it is interpreted as a rebinding of that existing name instead.

And Python 3 solves this a different way: it adds a nonlocal keyword which, like global, can be used to force an assignment to be interpreted as rebinding an existing name in an enclosing scope, rather than as establishing a new local binding.
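For comparison, here is a sketch of the same check written with nonlocal. It requires Python 3, and run_check is a hypothetical plain-function stand-in for the test method so that the snippet stays self-contained:

```python
def method_under_test(callback, value):
    """Call callback with value."""
    callback(value)

def run_check():
    callback_called = False

    def callback(actual):
        # Rebind the variable in the enclosing function instead of
        # creating a new local one.
        nonlocal callback_called
        callback_called = True
        if actual != 42:
            raise AssertionError('wrong value!')

    method_under_test(callback, 42)
    return callback_called
```

With the nonlocal declaration in place, run_check() returns True.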

Python 3’s nonlocal wasn’t an option for me, since I’m still using Python 2.x. Fortunately the solution is just to avoid rebinding the variable name itself, and use any mutable structure instead⁵:

def test_function_calls_callback(self):
    callback_called = [False]
    def callback(actual):
        callback_called[0] = True
        if actual != 42:
            raise AssertionError('wrong value!')

    method_under_test(callback, 42)
    self.assertTrue(callback_called[0])
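A variant of the same idea, offered as a sketch rather than as what I actually used: since list.append mutates the list without rebinding any name, the bound method can serve as the callback directly, and the list then records the arguments too. The names run_check and calls are hypothetical.

```python
def method_under_test(callback, value):
    """Call callback with value."""
    callback(value)

def run_check():
    calls = []
    # calls.append mutates the list in place, so no name is rebound
    # and no scoping rule gets in the way.
    method_under_test(calls.append, 42)
    return calls
```

Inside a test method, the result could then be checked with a single self.assertEqual(calls, [42]), which covers both "was it called" and "with what value".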

¹ Or more precisely, any identifier, but I’ll restrict the following to variables for simplicity.
² In fact, of the commonly-used languages, only Perl and Emacs Lisp (and bash, arguably) use dynamic scope, and Perl allows variables to be declared with static scope using the my keyword.
³ ALGOL 60 marked the first time that a programming language’s design and specification were rigorously considered separately from any implementation; the “ALGOL 60 Report” defined the formal specification for the language (inventing Backus-Naur Form along the way). It’s no wonder that ALGOL ended up influencing so many later languages⁶.
⁴ It’s not clear which term came first; Google Books has “lexical scope” appearing in a conference paper from 1967, while a 1969 book by Alan Kay contrasts ALGOL’s static scope with the dynamic scope available (optionally) in LISP 1.5 (which itself was implemented in 1962). The ALGOL 60 Report (1963) refers to “scope”, but doesn’t use any of the qualifiers.
⁵ This is the “holder” pattern familiar from Java, where the language prevents rebinding names from an outer scope entirely.
⁶ And authors. Bonus quote, from a paper by C. A. R. Hoare derived from a keynote he gave on language design, 1973: “The more I ponder the principles of language design, and the techniques which put them into practice, the more is my amazement and admiration of ALGOL 60. Here is a language so far ahead of its time, that it was not only an improvement on its predecessors, but also on nearly all its successors.”