Enlightenment in Python
October 4, 2006
Through a rather confusing bug, I have just been enlightened as to the reasoning behind an idiom I often see in scripts written in the Python programming language.
So, in Python, good style says that you should avoid code that executes unconditionally as soon as the file is loaded. You define all the functions, classes and constants for your file, and then only execute the actual program if the file was invoked directly. This is so that another script can import yours, and have access to all of your functions, classes and constants, without some weird funky code executing automatically when the script is imported.
This “execute only if invoked directly, not through an import from elsewhere” rule is achieved with the following statement:
# First our module code...
def my_function_foo():
    return "foo"

def my_function_bar():
    return "bar"

# And then the main statement:
if __name__ == "__main__":
    print(my_function_foo() + my_function_bar())  # output "foobar"
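One way to watch the guard at work is to load the same file twice, once as an import and once as a script. The sketch below is written for Python 3 and sticks to the standard library. The module name foobar_demo, the temporary file and the output_of() helper are all made up for the illustration.

```python
import contextlib
import importlib.util
import io
import os
import runpy
import tempfile

# Hypothetical module source, mirroring the example above.
SOURCE = '''
def my_function_foo():
    return "foo"

def my_function_bar():
    return "bar"

if __name__ == "__main__":
    print(my_function_foo() + my_function_bar())
'''

def output_of(action):
    """Run action() and return whatever it printed to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        action()
    return buffer.getvalue()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "foobar_demo.py")
    with open(path, "w") as handle:
        handle.write(SOURCE)

    def import_it():
        # Imported: __name__ is "foobar_demo", so the guard is false.
        spec = importlib.util.spec_from_file_location("foobar_demo", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)

    def run_it():
        # Run as a script: __name__ is "__main__", so the guard is true.
        runpy.run_path(path, run_name="__main__")

    imported_output = output_of(import_it)
    script_output = output_of(run_it)

print(repr(imported_output))  # ''
print(repr(script_output))    # 'foobar\n'
```

Importing the file defines the two functions and prints nothing. Running it as a script prints "foobar".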
However, more than once, I’ve seen this:
# Some module code here

# The main *function*
def main():
    print(my_function_foo() + my_function_bar())

# The main statement
if __name__ == "__main__":
    main()
That seemed like a needless repetition to me, as we gain nothing from the indirection of calling main(). However, there is a subtle difference: scope.
Say we have the following module code:
# Module function
def foo():
    print(bar)

# The main statement
if __name__ == "__main__":
    bar = 2
    foo()
This module contains an error: foo() uses a ‘bar’ variable that it never defines, probably an omission (well, deliberate in this case, but you get the idea). But instead of complaining about an undefined name, the Python interpreter will happily print ‘2’. What the hell?
Python’s scoping rules are: first try to locate a name in the local (function) scope, then move back out one level at a time through any enclosing functions, until you reach the “global” context, which is stuff defined at the top level of the file, outside of any function or class. (After that come the built-in names. Class bodies are skipped along the way: a method does not see its class’s variables as bare names.) So, in this case, the interpreter needs to find a bar to print, and so walks back out of the foo() function. And lo and behold, in the global context, we have bar = 2!
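The lookup order is easy to check with a small experiment. This is only a sketch for Python 3, and every name in it (outer, Holder and so on) is invented for the demonstration. Note that the levels searched on the way out are enclosing functions; a class body is not one of them.

```python
bar = "global"          # module-level ("global") name

def outer():
    bar = "enclosing"   # local to outer(), enclosing scope for the inner functions

    def inner_with_local():
        bar = "local"   # found first: the function's own scope
        return bar

    def inner_without_local():
        return bar      # not local, so the enclosing function is searched

    return inner_with_local(), inner_without_local()

def no_enclosing():
    return bar          # neither local nor enclosing: falls back to the module

class Holder:
    bar = "class"       # class attribute, reachable as Holder.bar or self.bar

    def method(self):
        return bar      # bare name: the class body is skipped, the module is used

def builtin_lookup():
    return len          # defined nowhere in this file: found among the built-ins

results = outer() + (no_enclosing(), Holder().method(), builtin_lookup().__name__)
print(results)  # ('local', 'enclosing', 'global', 'global', 'len')
```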
The thing is, if blocks in Python do not constitute a separate scope from the thing containing them. So, the if __name__ == "__main__" block actually operates in the global context, defining variables there as it goes.
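Here is a minimal sketch of that leak, for Python 3. The "if True" test is just a stand-in for the __name__ check so that the snippet behaves the same however it is run, and found_before and found_after are names I made up to record what happens.

```python
def foo():
    return bar  # no local bar, so the lookup happens outside foo() at call time

try:
    foo()
    found_before = True
except NameError:
    found_before = False  # nothing has assigned bar yet

if True:       # stands in for the __name__ == "__main__" test
    bar = 2    # an if block has no scope of its own

found_after = foo()  # the very same function now finds bar

print(found_before, found_after)  # False 2
```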
And so we come to the justification for having the “main” code inside a main() function, rather than just nested inside the if block: by putting the main code inside a function, we keep the variables it defines inside the main() function’s scope, rather than polluting the global scope. That way, the only thing the if block does in the global scope is call main(), which defines nothing there. With a main() function, the previous incorrect Python module does indeed trigger an exception:
NameError: global name 'bar' is not defined
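For completeness, a sketch of the main() version of the broken module, for Python 3. I call main() directly inside a try block so the failure can be inspected; in a real script the call would sit under the __name__ guard. The outcome and message variables exist only for the demonstration. Python 3 words the error slightly differently from the output above (it drops the word “global”).

```python
def foo():
    return bar  # the bug: foo() has no bar of its own

def main():
    bar = 2     # local to main(), so invisible to foo()
    return foo()

try:
    main()
    outcome = "no error"
except NameError as error:
    outcome = "NameError"
    message = str(error)

print(outcome)  # NameError
```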
I always knew the Python folks didn’t just make the distinction for the hell of it, to make the language more verbose than it needs to be. And now I know why, and I worked it out all by myself too :-).