Understanding Python scope
Here’s a complete Python function and unittest, similar in structure to something I wrote recently. What’s wrong with it?
import unittest

def method_under_test(callback, value):
    """Call callback with value."""
    callback(value)

class MyTestCase(unittest.TestCase):
    def test_function_calls_callback(self):
        callback_called = False
        def callback(actual):
            callback_called = True
            if actual != 42:
                raise AssertionError('wrong value!')

        method_under_test(callback, 42)
        self.assertTrue(callback_called)

if __name__ == '__main__':
    unittest.main()
Let’s run that test:
$ python test.py
F
======================================================================
FAIL: test_function_calls_callback (__main__.MyTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "x.py", line 16, in test_function_calls_callback
    self.assertTrue(callback_called)
AssertionError: False is not true
----------------------------------------------------------------------
Ran 1 test in 0.000s
FAILED (failures=1)
$
Well, it certainly looks like the callback isn’t being called. Except that
it is: if you change the value we pass to method_under_test() to something
other than 42, you’ll see that the assertion in our test callback fires.
So what is going on here? Well, it turns out that the mental model I had for how Python handles scope was wrong.
Let’s start with some basics. Like most other languages, Python is statically scoped. This simply means that every mention of a variable name[1] in a program can be resolved ahead of time in order to determine the object to which it refers, by inspection only of the program’s text.
Contrast this with dynamic scope, where the identity (and therefore the type, in typed languages) of the object to which a variable refers cannot be determined until runtime, because name resolution depends upon the execution context set up by the caller (typically implemented using a stack of active bindings that are pushed and popped on function call and exit). Dynamic scope is not common in current programming languages[2], partly because it defeats information hiding and local analysis: reasoning about a function’s behaviour is much harder under dynamic scope. In any case, all ALGOL[3]-derived languages (Pascal, C, C#, Java, etc.) use static scope.
Static scope is effectively synonymous with lexical scope[4], though in some cases the latter is used to differentiate the subset of statically scoped languages that allow arbitrary nested scopes (from ALGOL, again), where name resolution is permitted to access bindings defined in a (typically closest) parent scope.
Incidentally, none of these definitions have anything particularly to do with closures: a closure is simply an instance of a function that has references to non-local variables. That is, a closure is the combination of a lambda expression (a function with free variables) along with a description of the bindings of those free variables to specific objects.
While we’re here, “closure” also doesn’t mean “anonymous function”, and it
doesn’t require the function to take no arguments. There are, however,
differences between languages as to whether non-local variables can be
re-bound inside a closure, and if so, whether the effects of doing so are
visible to other closures (or even to repeated calls to the same closure),
and as to whether control flow statements (like return) can affect control
flow in the caller, or merely in the closure.
The only real link between scope and closures is that supporting both static scope and first-class nested functions effectively requires support for closures; supporting one without the other is much simpler.
In fact, I got this wrong myself: I originally titled this post “Understanding Python closures”. But actually, there aren’t any closures in the code above.
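For contrast, here is a minimal sketch of a function that really is a closure. The names make_adder, add and add_five are made up for illustration and are not part of the original test; the point is only that add refers to n, which is bound in the enclosing function, and that Python records that binding on the function object.

```python
def make_adder(n):
    def add(x):
        # n is not local to add: it is a free variable, resolved in the
        # enclosing make_adder() scope.
        return x + n
    return add


add_five = make_adder(5)
result = add_five(37)

# The free variable names are recorded on the code object...
free_names = add_five.__code__.co_freevars
# ...and the bindings themselves are kept in "cells" on the function object.
captured = add_five.__closure__[0].cell_contents

print(result)      # 42
print(free_names)  # ('n',)
print(captured)    # 5
```

The callback in the test at the top of the post has no such free variables, which is why it doesn’t count as a closure.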
So what is going on in our original example? Quite simply, Python is
statically scoped but has no explicit variable declaration statement
(e.g. a var-like keyword). So when a statement assigns to a name for which
a non-local binding already exists, Python needs a rule to decide whether
that statement creates a new local binding that shadows the existing one,
or rebinds the existing one.
The rule that Python chooses is: any assignment within a block establishes
a new local binding, unless a global statement for the name appears in
the block, in which case the name always refers to a binding in the
module-global environment instead. That’s it (for Python 2.x, anyway).
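A small sketch of that rule, using made-up names (counter, shadow and rebind are purely illustrative). It should behave the same way under Python 2.x and Python 3:

```python
counter = 0  # module-global binding


def shadow():
    # Plain assignment, no global statement: this creates a new local
    # variable named counter and leaves the module-level one untouched.
    counter = 1
    return counter


def rebind():
    # The global statement makes counter refer to the module-level binding
    # throughout this function, so the assignment rebinds it.
    global counter
    counter = 2
    return counter


shadow_result = shadow()
after_shadow = counter   # still 0
rebind_result = rebind()
after_rebind = counter   # now 2

print(shadow_result, after_shadow)  # 1 0
print(rebind_result, after_rebind)  # 2 2
```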
So in our example, when I wrote:
def test_function_calls_callback(self):
    callback_called = False
    def callback(actual):
        callback_called = True
That second assignment actually creates a new local binding for the variable
callback_called that shadows the non-local one, and so the original
variable bound at the start of the test method is never updated (and
pylint will show it as unused, for example).
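One way to check this without a linter is to ask the compiled function which names it treats as local. The sketch below wraps the same pattern in a hypothetical helper, run_callback (not part of the original test), and inspects the code object’s co_varnames and co_freevars attributes:

```python
def run_callback():
    callback_called = False

    def callback(actual):
        callback_called = True  # binds a new local; the outer name is untouched

    callback(42)
    return callback, callback_called


callback, outer_value = run_callback()

# Names the compiler decided are local to callback (arguments first).
local_names = callback.__code__.co_varnames
# Names callback would look up in an enclosing function scope.
free_names = callback.__code__.co_freevars

print(outer_value)  # False
print(local_names)  # ('actual', 'callback_called')
print(free_names)   # ()
```

Since callback_called shows up among the locals and the list of free variables is empty, the inner function never touches the outer binding at all.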
It’s important to note that Python could have chosen to do this differently: for example, Ruby 1.8 interprets assignments within blocks as creating a new local binding only if doing so would not shadow an existing non-local binding; if it would, it is interpreted as a rebinding of that existing name instead.
And Python 3 solves this a different way: it adds a nonlocal keyword
which, like global, can be used to force an assignment to be interpreted
as rebinding an existing name in an enclosing (non-global) scope, rather
than establishing a new local binding.
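As a rough sketch of what that looks like (Python 3 only; run_callback is a hypothetical stand-in for the test method, not the original code):

```python
def run_callback():
    callback_called = False

    def callback(actual):
        # Without this line the assignment below would create a new local.
        nonlocal callback_called
        callback_called = True
        if actual != 42:
            raise AssertionError('wrong value!')

    callback(42)
    return callback_called


result = run_callback()
print(result)  # True
```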
Python 3’s nonlocal wasn’t an option for me, since I’m still using Python
2.x. Fortunately the solution is just to avoid rebinding the variable name
itself, and use any mutable structure instead[5]:
def test_function_calls_callback(self):
    callback_called = [False]
    def callback(actual):
        callback_called[0] = True
        if actual != 42:
            raise AssertionError('wrong value!')

    method_under_test(callback, 42)
    self.assertTrue(callback_called[0])
[1] Or more precisely, any identifier, but I’ll restrict the following to variables for simplicity.
[2] In fact, of the commonly used languages, only Perl and Emacs Lisp (and bash, arguably) use dynamic scope, and Perl allows variables to be declared with static scope using the my keyword.
[3] ALGOL 60 marked the first time that a programming language’s design and specification were rigorously considered separately from any implementation; the “ALGOL 60 Report” defined the formal specification for the language (inventing Backus-Naur Form along the way). It’s no wonder that ALGOL ended up influencing so many later languages[6].
[4] It’s not clear which term came first; Google Books has “lexical scope” appearing in a conference paper from 1967, while a 1969 book by Alan Kay contrasts ALGOL’s static scope with the dynamic scope available (optionally) in LISP 1.5 (which itself was implemented in 1962). The ALGOL 60 Report (1963) refers to “scope”, but doesn’t use any of the qualifiers.
[5] This is the “holder” pattern familiar from Java, where the language prevents rebinding names from an outer scope entirely.