Regardless of which programming language you’re coding in, you’ve probably encountered a good chunk of weird and seemingly unexplainable issues that ended up being really stupid mistakes or quirks of that specific language. Python aims to be a clean and simple language, yet it also has its share of gotchas and quirks that can surprise both beginner and experienced software developers. So, to avoid unnecessary rage and frustration over some weird issue in your favourite programming language, here follows a list of common Python pitfalls that you should try to avoid at all costs.
Mutable Default Arguments Are a Bad Idea
Setting default arguments for a function is very common and useful for defining optional arguments or arguments that can usually use the same, predefined value. Setting a default argument to a mutable value such as a dict can, however, cause unexpected behavior:
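A minimal sketch of the problem and the usual fix. The function names register and register_safe are hypothetical, chosen only for illustration:

```python
def register(key, value, registry={}):
    # The default dict is created once, when the function is defined.
    registry[key] = value
    return registry


first = register("a", 1)
second = register("b", 2)  # reuses the dict from the first call
print(second)              # {'a': 1, 'b': 2}
print(first is second)     # True


def register_safe(key, value, registry=None):
    # None is immutable, so a fresh dict is built on every call.
    if registry is None:
        registry = {}
    registry[key] = value
    return registry


safe_first = register_safe("a", 1)
safe_second = register_safe("b", 2)
print(safe_first)   # {'a': 1}
print(safe_second)  # {'b': 2}
```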
The problem with using a mutable value as a default argument is that the default is not initialized every time the function is called. Instead, it is evaluated once, when the function is defined, and the same object is passed in on every call that omits the argument, so changes made in one call carry over to the next. To avoid this problem, you should always use None or another immutable value as the default instead, and perform a check against the argument as shown above.
Even though this might seem like a nuisance and a problem, it’s intended behavior and it can also be exploited to make caching functions which use the persistent mutable default argument as a cache:
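One way such a cache might look, as a rough sketch. The names slow_square, _cache and calls are made up for this example, and n * n stands in for an expensive computation:

```python
calls = []  # records every time the "expensive" branch actually runs


def slow_square(n, _cache={}):
    # _cache is the same dict on every call, so results persist between calls.
    if n not in _cache:
        calls.append(n)
        _cache[n] = n * n  # stand-in for an expensive computation
    return _cache[n]


print(slow_square(4))  # 16, computed
print(slow_square(4))  # 16, served from the cache
print(len(calls))      # 1
```

For real code, functools.lru_cache from the standard library is usually the more readable way to get the same effect.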
Behavior similar to that of the default arguments above can also be seen with dict.setdefault(key, value). In the code below we can see some surprising results:
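A small sketch of the effect, using the names data and val from the discussion. The key "scores" and the second dictionary fresh are assumptions added for illustration:

```python
data = {}
val = []

data.setdefault("scores", val)  # key is missing, so val itself is stored
val.append(10)                  # we only touch val here...

print(data)                   # {'scores': [10]}
print(data["scores"] is val)  # True: same object, not a copy

# Passing a brand-new value each time avoids the shared reference.
fresh = {}
fresh.setdefault("scores", [])
val.append(20)
print(fresh)  # {'scores': []}
```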
Even though we didn’t touch the data dictionary above, it was modified by appending to the default value val. That's because the default value passed to setdefault is assigned directly into the dictionary when the key is missing, instead of being copied from the original. To avoid this issue, make sure you never reuse values when using setdefault.
NaN (Non-) Reflexivity
Working with floats and non-integer numbers can often be difficult and annoying, but it gets especially weird when you get into Not-a-Number and Infinity territory. So, let’s demonstrate this by making a few comparisons with these values:
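A few such comparisons, sketched with plain floats (the variable names are illustrative):

```python
nan = float("nan")
inf = float("inf")

print(nan == nan)  # False: NaN is not equal to anything, including itself
print(nan != nan)  # True
print(nan > 1.0)   # False
print(nan < 1.0)   # False
print(inf == inf)  # True: infinity does compare equal to itself

difference = inf - inf
print(difference)  # nan, produced silently without any exception
```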
The above code shows the non-reflexivity of NaN: NaN in Python will never compare as equal, even when compared with itself. So, in case you need to test for NaN or inf, you should use math.isnan() and math.isinf(). Also be careful with any other arithmetic operation when working with code that might produce NaN, as it will propagate through all operations without raising an exception.
Python is usually clever and won’t generally return NaN from math functions, e.g. math.exp(1000.0) will raise OverflowError: math range error and math.log(0.0) will raise ValueError: math domain error, but you might encounter it with NumPy or Pandas, and if you do, remember not to compare NaNs for equality.
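The checks and exceptions described above can be sketched as follows. The variable names are illustrative, and only the exception types are recorded, since the exact messages may vary between Python versions:

```python
import math

value = float("inf") - float("inf")  # quietly produces NaN
propagated = (value + 1.0) * 2.0     # NaN survives further arithmetic

print(math.isnan(value))          # True
print(math.isnan(propagated))     # True
print(math.isinf(float("-inf")))  # True

raised = []
for func, arg in ((math.exp, 1000.0), (math.log, 0.0)):
    try:
        func(arg)
    except (OverflowError, ValueError) as exc:
        # math functions raise instead of returning NaN or inf
        raised.append(type(exc).__name__)

print(raised)  # ['OverflowError', 'ValueError']
```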
Late Binding Closures
There are quite a few gotchas, pitfalls and surprises surrounding scopes and closures in Python. The most common one, I’d say, is late binding in closures. Let’s start with an example:
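A minimal sketch of the situation. The function body i * n is an assumption made for illustration; any expression that reads i would behave the same way:

```python
funcs = []
for i in range(3):
    def some_func(n):
        return i * n  # i is looked up when the function is called
    funcs.append(some_func)

results = [f(2) for f in funcs]
print(results)  # [4, 4, 4] rather than [0, 2, 4]
```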
The code above shows the definition of a function inside a loop, which is then added to a list. With each iteration the i variable increments and so does the i variable in the defined function, right? Wrong.
Late binding causes all the functions to assume the value of 2 (from the last iteration). This happens because of the scope in which all the functions are defined, the global one: the value of i is looked up when a function is called, not when it is defined. Because of this, all of them refer to the same i, which gets reassigned in the loop.
There’s more than one way to fix this, but the cleanest one in my opinion is to use functools.partial. With partial we can create a new callable object with a predefined i, forcing the variable to be bound immediately, which fixes the issue. We can then supply the remaining original parameter n when we want to actually call the functions.
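A sketch of the partial-based fix, assuming the same illustrative i * n body as before, with i moved into the parameter list:

```python
from functools import partial


def some_func(i, n):
    return i * n


# partial freezes the current value of i at the moment each callable is created.
funcs = [partial(some_func, i) for i in range(3)]

results = [f(2) for f in funcs]  # n is supplied at call time
print(results)  # [0, 2, 4]
```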
Reassigning Global Variables
Using a lot of global variables is generally discouraged and viewed as bad practice. There are, however, valid reasons to use some global variables, for example to define various flags, which can be used to set the log level of a function.
But what if you decide to flip (reassign) this flag? Well, it can cause a massive headache:
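A minimal sketch of the headache, using the names flag and some_func from the discussion:

```python
flag = False


def some_func():
    flag = True  # creates a new local variable named flag
    return flag


some_func()
print(flag)  # False: the global flag was never changed
```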
Looking at the code above one might expect the value of the global flag variable to change to True after execution of some_func(), but that's not the case. The some_func function declares a new local variable flag, sets it to True, and it then disappears at the end of the function body. The global variable is never touched.
There’s a simple fix to this, though. We need to first declare in the function body that we want to refer to the global variable instead of using a local one. We do that with global <var_name>, in this case global flag.
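With the declaration added, the sketch from before behaves as one would expect:

```python
flag = False


def some_func():
    global flag  # refer to the module-level flag instead of a new local
    flag = True


some_func()
print(flag)  # True
```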
Another “fun” issue with variables that you might run into, which is luckily much easier to debug and fix, is modification of an out-of-scope variable. Similarly to the previous gotcha, it’s caused by manipulating a variable that was defined in an outer scope:
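A small sketch of this failure. The function name increment and the variable error_name are made up for illustration, and the error is caught only so that the example runs to completion:

```python
var = 1


def increment():
    # The assignment makes var local to this function, so the read
    # that += performs first fails before anything is assigned.
    var += 1


try:
    increment()
except UnboundLocalError as exc:
    error_name = type(exc).__name__

print(error_name)  # UnboundLocalError
print(var)         # 1: the global is unchanged
```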
Here we try to increment the variable var inside function scope, assuming that it will modify the global one. But again, that's wrong. When you assign to a variable inside a function, it becomes local to that scope, but you can’t increment a variable that hasn’t been assigned yet (in the current scope), so UnboundLocalError is raised.
This again can be fixed using global <var_name> in the case of global variables. This so-called scoping bug can also occur inside nested functions, where you would use nonlocal <var_name> instead, to refer to the variable in the nearest enclosing scope:
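Both fixes, sketched side by side. The names increment, make_counter, bump and count are hypothetical:

```python
var = 1


def increment():
    global var  # operate on the module-level variable
    var += 1


increment()
print(var)  # 2


def make_counter():
    count = 0

    def bump():
        nonlocal count  # operate on count from the enclosing function
        count += 1
        return count

    return bump


counter = make_counter()
first_call = counter()
second_call = counter()
print(first_call, second_call)  # 1 2
```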
Proper Way to Define Tuples
One misconception that pretty much every Python developer has ingrained in their mind is that tuples are defined by the surrounding parentheses. Unlike iterables like list, which is defined by its square brackets, a tuple is defined by the commas separating its elements.
Mistakes originating from this misconception usually arise when we try to define a tuple with just a single element:
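A short sketch of the cases in question. All variable names, and the helper min_max, are made up for illustration:

```python
not_a_tuple = (1)  # just the int 1 in parentheses
single = (1,)      # the trailing comma makes it a tuple
no_parens = 1,     # parentheses are optional

print(type(not_a_tuple).__name__)  # int
print(type(single).__name__)       # tuple
print(single == no_parens)         # True


def min_max(values):
    return min(values), max(values)  # returns a tuple without parentheses


print(min_max([3, 1, 2]))  # (1, 3)

missing_comma = ("a" "b")  # no comma: adjacent string literals are joined
print(type(missing_comma).__name__, missing_comma)  # str ab
```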
In the snippet above, we can see that it’s necessary to add , after the single element to make Python recognize it as a tuple. We can also completely omit the parentheses, which is pretty common practice with return statements that return multiple values.
The last example above shows one more similar pitfall. If you forget to separate elements with a comma, Python will use implicit concatenation, making it a single value of type string. This kind of implicit concatenation can happen anywhere in the code, not just when defining a tuple, so always double check your strings and iterables if something fishy is happening with your program.
Indexing Byte Values Instead of Byte Strings
When working with files and the data in them, we mostly just use ASCII or UTF-8 strings. From time to time, however, you might have to read and write some binary data, and you might be surprised by the results of indexing into it and iterating over it:
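A minimal sketch with a literal byte string. The name text follows the later discussion and the contents are arbitrary:

```python
text = b"abcde"

first = text[0]
print(first)        # 97, an int rather than b'a'

values = list(text)  # iterating also yields ints
print(values)        # [97, 98, 99, 100, 101]

print(text[0:1])     # b'a': slicing, unlike indexing, returns bytes
```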
When indexing into a binary string, instead of receiving a byte string, we get an integer byte value, or in other words, the ordinal value of the indexed character. To avoid this, especially when reading a binary file, it’s best to always use text.decode('utf-8') to get a proper string. If you however want to keep the original data as a binary string, then you can instead use chr(c) to convert individual characters to their string representation.
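Both options, sketched on the same illustrative byte string. Note that chr(c) treats the byte value as a code point, so this sketch assumes ASCII content:

```python
text = b"abcde"

decoded = text.decode("utf-8")  # a regular str
print(decoded[0])               # a

chars = [chr(c) for c in text]  # text itself stays a bytes object
print(chars)                    # ['a', 'b', 'c', 'd', 'e']
```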
Indexing with Negated Variable
Slicing and dicing is one of the most handy features of Python including the ability to specify negative indexes, but if you are not careful with those, you might get unexpected results:
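A sketch of how this can go wrong. The helpers last_n and safe_last_n are hypothetical, and safe_last_n assumes 0 <= n <= len(seq):

```python
values = [0, 1, 2, 3, 4]


def last_n(seq, n):
    return seq[-n:]  # fine for n > 0, surprising for n == 0


print(last_n(values, 2))  # [3, 4]
print(last_n(values, 0))  # [0, 1, 2, 3, 4], not []


def safe_last_n(seq, n):
    # Computing the start index explicitly avoids the -0 case.
    return seq[len(seq) - n:]


print(safe_last_n(values, 2))  # [3, 4]
print(safe_last_n(values, 0))  # []
```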
If we slice a sequence with any negative value (variable) other than -0, we will get the expected values, but if we happen to accidentally slice using [-0:], we will receive a copy of the whole sequence, as it is equivalent to [0:].
Why Is It Returning None!?
I left my “favourite” gotcha for last. It’s easy to forget whether a function returns a new value or modifies the original in-place. Especially when there are generally 2 types of methods: list methods, which modify the list in-place and return None, and string methods, which leave the original untouched and return a new value.
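A few representative methods, sketched with illustrative variable names:

```python
numbers = [3, 1, 2]
result = numbers.sort()  # sorts in-place and returns None
print(result)            # None
print(numbers)           # [1, 2, 3]

name = "python"
upper = name.upper()     # returns a new string
print(upper)             # PYTHON
print(name)              # python: the original is unchanged

sorted_copy = sorted([3, 1, 2])  # the built-in returns a new list instead
print(sorted_copy)               # [1, 2, 3]
```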
I’m guilty of making this mistake way too many times. It’s easy to forget the behavior of one of the many string or list methods, and it can lead to hours of debugging. So, if you receive None where there should be a whole string or list, then double check to make sure you're using all of the methods shown above correctly.
It’s inevitable that you will run into these or other similar gotchas and pitfalls that will cause a lot of rage and frustration. More often than not, the best way to solve any of these issues is to just step back for a moment. Go for a walk. Go make a cup of coffee. Or at least take a deep breath. Most of the time all it takes to solve such an issue is to leave it for a bit and come back later.
If that doesn’t help, maybe it’s time for some rubber duck debugging or to bring in another pair of eyes (the colleague sitting next to you). Oftentimes, when you start explaining the problem to somebody else, you will immediately realise where the problem really is.
When you eventually find the bug and manage to solve it, take a moment to think about what you could have done to find it faster. Next time you run into a similar issue you might be able to resolve it a bit more quickly.