Python Pitfalls - Expecting The Unexpected

By Martin Heinz

Regardless of which programming language you’re coding in, you’ve probably encountered a good chunk of weird and seemingly unexplainable issues that ended up being really stupid mistakes or quirks of that specific language. Python aims to be a clean and simple language, yet it also has its share of gotchas and quirks that can surprise both beginner and experienced software developers. So, to avoid unnecessary rage and frustration over some weird issue in your favourite programming language, here is a list of common Python pitfalls that you should try to avoid at all costs.

Photo by Meor Mohamad on Unsplash

Mutable Default Arguments Are a Bad Idea

The problem with using a mutable value as a default argument is that the default is evaluated only once, when the function is defined, not every time the function is called. Every call that omits the argument therefore receives the same object, including any changes made to it by earlier calls, which is a problem for mutable types such as lists or dictionaries. To avoid this, use None (or another immutable sentinel) as the default instead, and check for it inside the function body to create a fresh object on each call.
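
A minimal sketch of both the problem and the None-sentinel fix described above; the function names append_item_bad and append_item are hypothetical, chosen only for illustration:

```python
def append_item_bad(item, items=[]):  # hypothetical example function
    # The default list is created once, when the def statement runs,
    # and that same object is shared by every call that omits `items`.
    items.append(item)
    return items


def append_item(item, items=None):  # hypothetical example function
    # None is an immutable sentinel; build a fresh list on each call instead.
    if items is None:
        items = []
    items.append(item)
    return items


print(append_item_bad(1))  # [1]
print(append_item_bad(2))  # [1, 2]  <- the list from the first call is reused
print(append_item(1))      # [1]
print(append_item(2))      # [2]
```

Passing an explicit list still works as expected with the fixed version, e.g. append_item(4, [0]) returns [0, 4].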

Even though this might seem like a nuisance, it’s intended behavior, and it can even be exploited to write caching functions that use the persistent mutable default argument as a cache.
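
As a rough sketch of that caching trick, here is a memoized Fibonacci function; the name fib and the _cache parameter are illustrative assumptions, not taken from the original:

```python
def fib(n, _cache={}):  # hypothetical example; _cache is intentionally mutable
    # Deliberate use of a mutable default: the dict survives between calls
    # and serves as a memo table. Callers are not expected to pass _cache.
    if n in _cache:
        return _cache[n]
    result = n if n < 2 else fib(n - 1) + fib(n - 2)
    _cache[n] = result
    return result


print(fib(30))                   # 832040
print(len(fib.__defaults__[0]))  # 31 cached values (fib(0) through fib(30))
```

In everyday code, functools.lru_cache is usually the clearer way to get the same effect, but the example shows why the "shared default" behavior is sometimes useful.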

Similar behavior can also be seen with dict.setdefault(key, value), which can produce some surprising results.
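
A small sketch of that surprise, using a dictionary named data and a default list named val (the keys "a" and "b" are arbitrary choices for illustration):

```python
data = {}
val = []  # one list object, intended only as a default

data.setdefault("a", val)  # "a" is missing, so `val` itself is stored
data.setdefault("b", val)  # the very same object is stored again under "b"

val.append(1)  # we only modify `val`, never `data` directly...
print(data)    # {'a': [1], 'b': [1]}  ...yet both entries changed
```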

If we store one list val as the default for missing keys and later append to val, the data dictionary changes too, even though we never touched data directly. That’s because the default value passed to setdefault is stored in the dictionary as-is (the very same object, not a copy) when the key is missing. To avoid this issue, never reuse a mutable object as the default for setdefault; pass a fresh one on every call instead.
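
One way to follow that advice is sketched below: pass a new list literal on every call. The second option, collections.defaultdict, is a swapped-in alternative that the text above does not mention; the sample pairs are made up for illustration:

```python
from collections import defaultdict

pairs = [("fruit", "apple"), ("veg", "leek"), ("fruit", "pear")]  # sample data

# Option 1: `[]` is evaluated on every call, so each key gets its own list.
grouped = {}
for key, value in pairs:
    grouped.setdefault(key, []).append(value)

# Option 2: defaultdict calls list() to create a fresh list per missing key.
grouped_dd = defaultdict(list)
for key, value in pairs:
    grouped_dd[key].append(value)

print(grouped)           # {'fruit': ['apple', 'pear'], 'veg': ['leek']}
print(dict(grouped_dd))  # {'fruit': ['apple', 'pear'], 'veg': ['leek']}
```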

NaN (Non-) Reflexivity

NaN is not reflexive: in Python, NaN never compares equal to anything, not even to itself. So, if you need to test for NaN or inf, use math.isnan() and math.isinf(). Also be careful with arithmetic in code that might produce NaN, as NaN propagates through all subsequent operations without raising an exception.
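
A short sketch of the non-reflexivity and of the math.isnan() / math.isinf() checks mentioned above:

```python
import math

nan = float("nan")

print(nan == nan)                # False: NaN is not equal even to itself
print(nan != nan)                # True
print(math.isnan(nan))           # True: the reliable way to test for NaN
print(math.isinf(float("inf")))  # True

# NaN silently propagates through arithmetic instead of raising.
result = (nan + 1) * 0
print(math.isnan(result))        # True
```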

Python is usually clever and generally won’t return NaN from math functions: math.exp(1000.0) raises OverflowError: math range error, and both math.sqrt(-1.0) and math.log(0.0) raise ValueError: math domain error. You might still encounter NaN with NumPy or pandas, and if you do, remember not to compare NaNs for equality.
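
The sketch below checks which exception each of those math calls raises, and contrasts that with plain float arithmetic, which can still yield NaN silently. The exact error message text may differ between Python versions, so only the exception types are recorded:

```python
import math

errors = {}  # maps function name -> name of the exception it raised
for func, arg in [(math.exp, 1000.0), (math.sqrt, -1.0), (math.log, 0.0)]:
    try:
        func(arg)
    except (OverflowError, ValueError) as exc:
        errors[func.__name__] = type(exc).__name__
        print(f"{func.__name__}({arg}) -> {type(exc).__name__}: {exc}")

# Plain float arithmetic, however, can produce NaN without any error.
inf = float("inf")
print(inf - inf)  # nan
```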

Late Binding Closures

Consider defining a function inside a loop and adding it to a list. With each iteration the i variable increments, and so does the i in each defined function, right? Wrong.
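
A minimal reproduction of that situation; the inner function name multiply and the loop over range(3) are assumptions chosen to match the value 2 from the last iteration:

```python
functions = []
for i in range(3):
    def multiply(n):  # hypothetical example function
        # `i` is looked up when multiply() is *called*, not when it is defined.
        return i * n
    functions.append(multiply)

print([f(10) for f in functions])  # [20, 20, 20], not [0, 10, 20]
```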

Late binding causes all the functions to use the value 2 (from the last iteration). This happens because a closure looks up i when the function is called, not when it is defined, and all the functions share the same enclosing scope (here, the global one). Because of this, all of them refer to the same i, which gets mutated by the loop.

There’s more than one way to fix this, but the cleanest one, in my opinion, is to use functools.partial.
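
A sketch of the functools.partial fix; multiply is again a hypothetical two-argument function, with i as the first parameter and n as the remaining one:

```python
from functools import partial


def multiply(i, n):  # hypothetical example function
    return i * n


functions = []
for i in range(3):
    # partial binds the *current* value of i right now, producing a new
    # callable that only needs the remaining argument n.
    functions.append(partial(multiply, i))

print([f(10) for f in functions])  # [0, 10, 20]
```

Another common fix is the default-argument trick, e.g. lambda n, i=i: i * n, which also captures i at definition time.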

With partial we can create a new callable object with i already bound to its current value, which forces the value to be captured immediately and fixes the issue. We then supply the remaining original parameter n when we actually call the functions.

Reassigning Global Variables

Suppose a module-level flag is read by several functions. But what if you decide to flip (reassign) this flag inside a function? Well, it can cause a massive headache.
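
A minimal sketch of that headache; flag and some_func follow the names used in the text, while the initial value False is an assumption:

```python
flag = False  # module-level flag (initial value assumed for illustration)


def some_func():
    # Assigning to `flag` here creates a new *local* variable;
    # the module-level flag is left untouched.
    flag = True


some_func()
print(flag)  # False
```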

One might expect the global flag variable to change to True after some_func() runs, but that’s not the case. Because some_func assigns to flag, Python treats flag as a new local variable; it is set to True and then disappears when the function returns. The global variable is never touched.

There’s a simple fix to this, though. We first need to declare in the function body that we want to refer to the global variable instead of a local one. We do that with global <var_name>, in this case global flag.
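
The same sketch with the global declaration added (the initial value False is still an assumption):

```python
flag = False  # module-level flag (initial value assumed for illustration)


def some_func():
    global flag  # refer to the module-level name instead of creating a local
    flag = True


some_func()
print(flag)  # True
```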

Another “fun” issue with variables that you might run into (luckily, one that is much easier to debug and fix) is modifying an out-of-scope variable. Similarly to the previous gotcha, it’s caused by manipulating a variable that was defined in an outer scope.
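
A small sketch of this gotcha using a global named var; the function name increment is hypothetical. The error message wording differs between Python versions, so only the exception type is checked:

```python
var = 1


def increment():  # hypothetical example function
    # `var += 1` is an assignment, so Python treats `var` as local to the
    # whole function; reading it before it is bound raises UnboundLocalError.
    var += 1
    return var


caught = None
try:
    increment()
except UnboundLocalError as exc:
    caught = exc
    print(type(exc).__name__)  # UnboundLocalError

print(var)  # 1: the global was never modified
```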

Suppose we try to increment a global variable var inside a function, assuming that it will modify the global one. But again, that’s wrong.

When you assign to a variable anywhere in a function body (and var += 1 is an assignment), that variable becomes local to the whole function. Incrementing it means reading the local variable before it has been given a value, so UnboundLocalError is raised.

This again can be fixed with global <var_name> in the case of global variables. This so-called scoping bug can also occur inside nested functions, where you would use nonlocal <var_name> instead to refer to the variable in the nearest enclosing function scope.
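
A sketch of the nonlocal fix inside a nested function; make_counter and increment are hypothetical names used only for illustration:

```python
def make_counter():  # hypothetical example function
    count = 0

    def increment():
        nonlocal count  # rebind `count` in make_counter's scope, not a new local
        count += 1
        return count

    return increment


counter = make_counter()
counter()
counter()
print(counter())  # 3
```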

Proper Way to Define Tuples

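A common misconception is that parentheses are what make a tuple, when it is actually the comma. Mistakes originating from this misconception usually arise when we try to define a tuple with just a single element.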
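
A quick sketch of single-element tuples and of returning multiple values without parentheses; min_max is a hypothetical helper:

```python
not_a_tuple = ("single")   # parentheses alone just group an expression
a_tuple = ("single",)      # the trailing comma is what makes the tuple
also_a_tuple = "single",   # the parentheses are optional

print(type(not_a_tuple))   # <class 'str'>
print(type(a_tuple))       # <class 'tuple'>
print(type(also_a_tuple))  # <class 'tuple'>


def min_max(values):  # hypothetical helper
    # Returning "a, b" builds a tuple without any parentheses.
    return min(values), max(values)


print(min_max([3, 1, 2]))  # (1, 3)
```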

It’s necessary to add a trailing , after the single element to make Python recognize it as a tuple. We can also omit the parentheses completely, which is a common practice with return statements that return multiple values.

One more similar pitfall: if you forget to separate string elements with a comma, Python implicitly concatenates the adjacent string literals into a single str value, so a would-be tuple of two strings becomes one string. This implicit concatenation applies to adjacent string literals anywhere in the code, not just in tuple definitions, so always double-check your strings and iterables if something fishy is happening with your program.
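
A sketch of the missing-comma pitfall in a tuple, in a list, and in a would-be two-element tuple; the variable names and values are arbitrary:

```python
# Missing comma between "b" and "c": adjacent string literals are merged.
letters = ("a", "b" "c")
print(letters)       # ('a', 'bc')
print(len(letters))  # 2, not 3

# The same pitfall in a list, far away from any tuple definition.
names = [
    "alice",
    "bob"   # <- missing comma
    "carol",
]
print(names)  # ['alice', 'bobcarol']

# With no comma at all, the would-be "tuple" is just a str.
pair = ("x" "y")
print(type(pair))  # <class 'str'>
```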

Indexing Byte Values Instead of Byte Strings

When indexing into a binary string (a bytes object), instead of receiving a byte string, we get an integer byte value, in other words the ordinal value of the indexed character. To avoid this, especially when reading binary files, it’s best to use text.decode('utf-8') to get a proper string (assuming the data really is UTF-8 encoded). If you instead want to keep the original data as a binary string, slice it rather than index it (text[0:1] returns a one-byte bytes object); if you only need a string for an individual value, you can use chr(c) to convert it.
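
A sketch comparing indexing, slicing, chr() and decoding; the sample values are made up. Note that chr() converts one byte value at a time, so it does not decode multi-byte UTF-8 characters:

```python
data = b"hello"

print(data[0])                  # 104: indexing bytes gives an int
print(data[0:1])                # b'h': slicing keeps the bytes type
print(chr(data[0]))             # h (a one-character str)
print(data.decode("utf-8")[0])  # h: decode first, then index the str

utf8 = "é".encode("utf-8")      # b'\xc3\xa9', two bytes for one character
print(utf8.decode("utf-8"))     # é
print(chr(utf8[0]))             # Ã: chr() maps a single byte, not UTF-8 aware
```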

Indexing with Negated Variable

If we slice a sequence with a negated variable, such as seq[-n:] to take the last n elements, we get the expected values for any n greater than zero. But if n happens to be 0, then -n is just 0 (integers have no negative zero), so seq[-0:] is equivalent to seq[0:], or simply [:], and returns a copy of the whole sequence instead of an empty one.
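
A sketch of the problem and of one possible workaround that computes a non-negative start index instead of negating n; both helper names are hypothetical:

```python
def last_n_bad(seq, n):  # hypothetical helper
    return seq[-n:]  # breaks when n == 0, because -0 == 0


def last_n(seq, n):  # hypothetical helper
    # Compute a non-negative start index instead of negating n;
    # max() also handles n larger than len(seq).
    return seq[max(len(seq) - n, 0):]


items = [1, 2, 3, 4, 5]
print(last_n_bad(items, 2))  # [4, 5]
print(last_n_bad(items, 0))  # [1, 2, 3, 4, 5]  <- whole list, not []
print(last_n(items, 2))      # [4, 5]
print(last_n(items, 0))      # []
```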

Why Is It Returning None!?

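I’m guilty of making this mistake way too many times. It’s easy to forget how one of the many string or list methods behaves, and that can lead to hours of debugging. List methods such as list.sort(), list.append() and list.reverse() modify the list in place and return None, while string methods such as str.upper() or str.replace() leave the original string unchanged (strings are immutable) and return a new one. So, if you receive None where there should be a whole list, or a string that seems unchanged, double-check that you're using these methods correctly.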
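
A short sketch contrasting in-place list methods that return None with functions and string methods that return new objects:

```python
numbers = [3, 1, 2]
result = numbers.sort()   # sorts in place...
print(result)             # None  ...and returns None
print(numbers)            # [1, 2, 3]

numbers = [3, 1, 2]
result = sorted(numbers)  # returns a new sorted list
print(result)             # [1, 2, 3]

text = "hello"
text.upper()              # strings are immutable: this result is discarded
print(text)               # hello
text = text.upper()       # assign the returned string instead
print(text)               # HELLO
```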

Conclusion

If none of this helps, maybe it’s time for some rubber duck debugging or to bring in another pair of eyes (a colleague sitting next to you). Oftentimes, when you start explaining the problem to somebody else, you will immediately realise where the problem really is.

When you eventually find the bug and manage to solve it, take a moment to think about what you could have done to find it faster. Next time you run into a similar issue, you might be able to resolve it a bit more quickly.