8 Reasons Python Sucks - The Hacker Factor Blog

Occasionally I go out to lunch with some of my techie friends and we have a great time geeking together. We talk about projects, current events, and various tech-related issues. Inevitably, the discussion will turn to programming languages. One might lament "I have to modify some Java code. I hate Java. (Oh, sorry, Kyle.)" (It probably doesn't help that we gave Kyle the nickname "Java-boy" over a decade ago.) Another will gripe about some old monolithic shell code that nobody wants to rewrite. And me, well... I just blurted it out: I hate Python. I hate it with a passion. If I have the choice between using some pre-existing Python code or rewriting it in C, I'd rather rewrite it in C.

When I finished shouting, Bill humorously added, "But what do you really think about Python, Neal?" So I'm dedicating this blog entry to Bill.

Here's my list of "8 reasons Python sucks".

If you install a default Linux operating system, there's a really good chance that it will install multiple versions of Python. It will probably have Python2 and Python3, and maybe even some fractional versions like 3.5 or 3.7. There's a reason for this: Python3 is not fully compatible with Python2. Even some of the fractional versions are distinct enough to lack backwards compatibility.

I'm all for adding new functionality to languages. I don't even mind if some old version becomes obsolete. However, Python installs in separate installations. My code for Python 3.5 won't work with the Python 3.7 installation unless I intentionally port it to 3.7. Enough Linux developers have decided that porting isn't worth the effort, so Ubuntu installs with both Python2 and Python3 -- because they are needed by different core functions.

This lack of backwards compatibility and split versions is usually a death knell. Commodore created one of the first home computers (long before the IBM PC or Apple). But the Commodore PET wasn't compatible with the subsequent Commodore CBM computer. And the CBM wasn't compatible with the VIC-20, Commodore-64, Amiga, etc. So either you spent a lot of time porting code from one platform to another, or you abandoned the platform. (Where's Commodore today? It died out as users abandoned the platform.)

Similarly, Perl used to be very popular. But when Perl3 came out, it wasn't fully backwards compatible with a lot of Perl2 code. The community griped, good code was ported, and the rest was abandoned. Then came Perl4 and the same thing happened. When Perl5 came out, a lot of people just switched to a different programming language that was more stable. Today, there's only a small community of people actively using Perl to maintain existing Perl projects. I haven't seen any major new projects based on Perl.

By the same means, Python has distinct silos of code for each version. And the community keeps dragging along the old versions. So you end up with a lot of old, dead Python code that keeps getting dragged along because nobody wants to spend the time porting it to the latest version. As far as I can tell, nobody creates new code for Python2, but we drag it along because nobody's ported the needed code to Python3.x. At the official Python web site, their documentation is actively maintained and available for Python 2.7, 3.5, 3.6, and 3.7 -- because they can't decide to give up on the old code. Python is like the zombie of programming languages -- the dead just keep walking on.

With most software packages, you can easily run apt, yum, rpm, or some other install base and get the most recent code. That isn't the case with Python. If you install using 'apt-get install python', you don't know what version you're actually installing, and it may not be compatible with all of the code you need.

So instead, you install the version of Python you need. For one of the projects I was on, we used Python. But we had to use Python3.5 (the latest at that time). My computer ended up with Python2, Python2.6, Python3, and Python3.5 installed. Two were from the operating system, one was for the project, and one came in because of some unrelated software I installed for some other reason. Even though they are all "Python", they are not all the same.

If you want to install packages for Python, you're supposed to use "pip". (Pip stands for "Pip Installs Packages", because someone thinks recursive acronyms are still funny.) But since there's a bunch of versions of Python on the system, you have to remember to use the correct version of pip. Otherwise, 'pip' might run 'pip2' and not the 'pip3.7' that you need. (And you need to specify the actual path for pip3.7 if the name doesn't exist.)
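One way to take the guesswork out of "which pip is this?" is to ask a specific interpreter to run pip as a module. The sketch below is only an illustration: the helper names `interpreter_summary` and `pip_command` and the package name `some-package` are made up for this example, and it only builds and prints the command rather than installing anything.

```python
import shlex
import sys


def interpreter_summary():
    """Describe the interpreter that is actually running this script."""
    major, minor = sys.version_info[:2]
    return f"Python {major}.{minor} at {sys.executable}"


def pip_command(*args):
    """Build a pip command bound to this interpreter (hypothetical helper)."""
    # "python -m pip" runs the pip that belongs to that exact interpreter,
    # instead of whichever "pip" happens to come first on the PATH.
    # The "python3" fallback is an assumption for the rare case where
    # sys.executable is empty.
    return [sys.executable or "python3", "-m", "pip", *args]


if __name__ == "__main__":
    print(interpreter_summary())
    # "some-package" is a placeholder, not a real recommendation.
    print(shlex.join(pip_command("install", "--user", "some-package")))
```

Running the same script under each installed interpreter (python2 aside, since the f-string needs Python 3.6 or later) shows at a glance which installation a given command would touch.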

I was advised by one teammate that I needed to configure my environment so that everything uses the Python 3.5 base. This worked great until I started on a second project that needed Python 3.6. Two concurrent projects with two different versions of Python -- no, that wasn't confusing. (What's the emoticon for sarcasm?)

The pip installer places files in the user's local directory. You don't use pip to install system-wide libraries. And Gawd forbid you make the mistake of running 'sudo pip', because that can screw up your entire computer! Running sudo might make some packages install at the system level, some install for the wrong version of Python, and some files in your home directory might end up being owned by root, so future non-sudo pip installs may fail due to permissions. Just don't do it.

By the way, who maintains these pip modules? The community. That is, no clear owner and no enforced chain of provenance or accountability. Earlier this year, a package on PyPI was found to have a backdoor that stole SSH credentials. This doesn't surprise me at all. (I don't use Node.js and npm for the same reason; I don't trust their community repositories.)

I'm a strong believer in readable code. And at first glance, Python seems very readable. That is, until you start making large code bases. Most programming languages use some kind of notation to identify scope -- where a function begins and ends, actions contained in a conditional statement, range of a variable's definition, etc. With C, Java, JavaScript, Perl, and PHP, braces {...} define the scope. Lisp uses parentheses (...). And Python? It uses spaces. If you need to define a scope for complex code, then you indent the next few lines. The scope ends when the indent ends.
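As a minimal sketch of what that looks like (the function `count_even_odd` is made up for illustration), every block below is delimited only by how far it is indented:

```python
def count_even_odd(numbers):
    """Count even and odd values; indentation alone marks each block."""
    evens = 0
    odds = 0
    for n in numbers:
        if n % 2 == 0:
            evens += 1      # inside the "if" block
        else:
            odds += 1       # inside the "else" block
        # Back at the loop level: anything here runs for every n.
    # Back at the function level: this runs once, after the loop ends.
    return evens, odds


if __name__ == "__main__":
    print(count_even_odd([1, 2, 3, 4, 5]))
```

There is no closing brace to look for. Moving the `return` line four spaces to the right would put it inside the loop and change what the function does.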

The Python manual says that you can use any number of spaces or tabs for defining the scope. However, ALWAYS USE FOUR SPACES PER INDENT! If you want to indent twice for nesting, use eight spaces! The Python community has standardized on this convention, even though it isn't in the Python manual. Forget the fact that the examples in the documentation use tabs, tabs + 1 space, and other indents. The community is rabid about using four spaces. So unless you plan to never show your code to anyone else, always use four spaces for each indent.
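To see how the interpreter itself treats indent widths, here is a small sketch that feeds two source strings to the built-in `compile()`. The helper name `compiles` and both sample snippets are invented for this example.

```python
def compiles(source):
    """Return True if the source text compiles, False on an indentation error."""
    try:
        compile(source, "<example>", "exec")
        return True
    except IndentationError:
        return False


# Two spaces per level: unusual, but consistent, so Python accepts it.
two_spaces = "def f():\n  if True:\n    return 1\n"

# Four spaces per level, except the last line dedents to three spaces,
# which does not match any enclosing block.
off_by_one = "def g():\n    if True:\n        x = 1\n   return x\n"

if __name__ == "__main__":
    print(compiles(two_spaces))
    print(compiles(off_by_one))
```

The first snippet is accepted and the second is rejected, so the language only demands consistency. The four-space rule comes from the community.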

When I first saw Python code, I thought that using indents to define the scope seemed like a good idea. However, there's a huge downside. Deep nesting is permitted, but lines can get so wide that they wrap in the text editor. Long functions and long conditional actions may make it hard to match the start to the end. And I pity anyone who miscounts spaces and accidentally puts in three spaces instead of four somewhere -- this can take hours to debug and track down.

For other languages, I've picked up the habit of putting debug code without any indents. This way, I can quickly browse the code and easily identify and remove debugging code when I'm done. But with Python? Anything not indented properly generates an indentation error. This means debugging code must blend in with the active code.

Most programming languages have some way to include other chunks of code. For C, it's "#include". For PHP, there's 'include', 'include_once', 'require', and 'require_once'. And for Python, there's "import". Python's import permits including an entire module, part of a module, or a specific function from a module. Finding a list of what can be imported is non-intuitive. With C, you can just look in /usr/include/*.h. But with Python? It's best to use 'python -v' to list all of the places it looks, and then search every file in every directory and subdirectory from that list. I have friends who love Python and I've seen them grep through standard modules as they look for the thing they want to import. Seriously.

The import statement also allows users to rename the imported code. They basically define a custom namespace. At first glance, this might seem nice, but it ends up impacting readability and long-term support. Renaming modules is great for small scripts, but really bad for long programs. People who use 1-2 letter namespaces, like "import numpy as n" should be shot (or forced to convert all of their code to Perl5). But that's not the worst part.

With most languages, including code just includes the code. A few languages, like object-oriented C++, may execute code if there's a global object with a constructor. Similarly, some PHP code may define global variables, so an import could run code -- but that's typically considered a bad practice. In contrast, many Python modules include initialization functions that run during the import. You don't know what's running, you don't know what it does, and you might not notice. Unless there's a namespace conflict, in which case you get to spend many fun hours tracking down the cause.

In every other language, arrays are called 'arrays'. In Python, they are called 'lists'. And an associative array is sometimes called a 'hash' (Perl), but Python calls it a 'dictionary'. Python seems to go out of its way to not use the common terms found throughout the computer and information science field.

And then there are the names of libraries. PyPy, PyPI, NumPy, SciPy, SymPy, PyGtk, Pyglet, PyGame... (Yes, those first two are pronounced the same way, but they do very different things.) I understand that the 'py' is for Python. But couldn't they be consistent about whether it comes first or second?

Some common libraries just gave up on the pun-like "Py" naming convention. This includes matplotlib, nose, Pillow, and SQLAlchemy. And while some of the names may give you a hint to the purpose (e.g., "SQLAlchemy" contains SQL, so it's probably an SQL interface), others are just random words. If you didn't know what "BeautifulSoup" did, could you tell from the name that it's an HTML/XML parser? (As an aside, BeautifulSoup is well documented and easy to use. If every Python module was like this, I wouldn't be complaining so much. Unfortunately, this is the exception and not the norm. Most Python libraries seriously suck at documentation.)

Overall, I view Python as a collection of libraries with horrible and inconsistent naming conventions. I have a standing gripe that open source projects typically have horrible names. Unless you know the project, you'll never figure out what it does by the name. And unless you know what to look for, you'll probably only find it by accidentally stumbling across someone who mentions it in passing. Most of Python's libraries reinforce this negative criticism.

Every language has its quirks. With C, there's the weird notation of using & and * for accessing address space and values. C also has that increment/decrement shortcut using ++ and --. With Bash, there's the whole "when to use a backslash" question when quoting special characters like parentheses and periods for regular expressions. And JavaScript has issues around compatibility (not every browser supports every useful function). However, Python has more quirks than any other language I've ever seen. Consider strings:
  • In C, double quotes enclose strings. Single quotes enclose characters.

  • In PHP and Bash, both types of quotes can enclose strings. However, a double quote can have variables embedded in the string. In contrast, single quoted strings are literals; any embedded variable-like names are not expanded.

  • In JavaScript, there's really no difference between single quotes and double quotes.

  • In Python, there's no difference between single quotes and double quotes. However, if you want your string to span lines, then you need to use triple quotes """string""" or '''string'''. And if you want to use binary, then you need to prefix the string with b (b'binary'); raw strings, which leave backslashes alone, get an r instead (r'raw'). And sometimes you need to cast your values to strings using str(value), or convert a string to UTF-8 bytes using string.encode('utf-8').
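The Python bullet packs in a lot, so here is a short sketch of each string form it mentions. The variable names and sample values are invented for illustration.

```python
single = 'spam'
double = "spam"
same = single == double            # quote style is not part of the value

multi_line = """first line
second line"""                     # triple quotes may span lines

raw = r"C:\temp\new"               # raw: backslashes are kept as typed
cooked = "C:\temp\new"             # not raw: \t and \n become tab and newline

text = "caf\u00e9"                 # a str holding four characters
data = text.encode("utf-8")        # str -> bytes (five bytes here)
round_trip = data.decode("utf-8")  # bytes -> str

binary = b"\x00\x01"               # bytes literal, distinct from str
as_text = str(3.5)                 # non-strings need an explicit str()

if __name__ == "__main__":
    print(same, len(raw), len(cooked), len(text), len(data))
```

The raw and "cooked" literals look almost identical in the source but hold different characters, which is exactly the kind of quirk that is easy to miss in review.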
If you thought that =, ==, and === were initially a little weird in PHP and JavaScript, wait until you play with quotes in Python.

Most programming languages pass function parameters by value. If the function alters the value, the results are not passed back to the calling code. But as I've already explained, Python goes out of its way to be different. Python defaults to passing function parameters by object reference. This means that changing the parameter inside the function may end up changing the caller's value.
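A minimal sketch of that behavior follows. The function and variable names are invented for this example, and it assumes nothing beyond built-in lists and integers.

```python
def append_item(items):
    """Mutate the list object that the caller passed in."""
    items.append("added")


def rebind(items):
    """Rebind the local name only; the caller's list is untouched."""
    items = ["replaced"]
    return items


def increment(n):
    """Integers are immutable, so this cannot affect the caller's variable."""
    n += 1
    return n


shopping = ["eggs"]
append_item(shopping)          # the caller's list gains an element
replaced = rebind(shopping)    # the caller's list is left as it was

count = 1
bumped = increment(count)      # count itself does not change

if __name__ == "__main__":
    print(shopping, replaced, count, bumped)
```

Whether the caller sees a change depends on whether the function mutates the object or merely rebinds its own local name, and nothing in the function signature says which one will happen.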

This is one of the big differences between procedural, functional, and object-oriented programming languages. If every variable is passed by object reference, and any change to the variable changes the reference everywhere, then you might as well use globals for everything. Calling the same object by different names doesn't change the object, so it is effectively global. And as C programmers learned long ago, global variables are evil and should not be used.
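Here is a small sketch of what "the same object under different names" means in practice, and of how much work an independent copy takes. The dictionary contents are invented for illustration; `dict.copy()` and `copy.deepcopy()` are the standard calls.

```python
import copy

b = {"name": "config", "ports": [80, 443]}

a = b                        # alias: a second name for the same dict
shallow = b.copy()           # new dict, but it shares the nested list
deep = copy.deepcopy(b)      # new dict and a new nested list

b["name"] = "changed"        # visible through the alias only
b["ports"].append(8080)      # visible through the alias AND the shallow copy

if __name__ == "__main__":
    print(a is b)
    print(a["name"], shallow["name"], deep["name"])
    print(a["ports"], shallow["ports"], deep["ports"])
```

Only the deep copy is fully isolated from later changes to `b`. The shallow copy protects the top-level keys but still shares the nested list.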

In Python, you have to work to pass variables by value. Saying "a=b" just assigns another name to the same object; this doesn't copy the value of b into a. If you actually meant to copy the value, then you need to use a copy function. Usually this is "a=b.copy()". However, notice that I said "usually". Not all data types have a 'copy' method. Or maybe the copy function is incomplete. In those cases, there is a separate library called 'copy' that you can use: "a=copy.deepcopy(b)".

It's a common programming technique to name the program after the library or function being used. For example, if I'm testing a screen capture program with a C library called "libscreencapture.so", I would call my program "screencapture.c" and compile it into "screencapture.exe":

gcc -o screencapture.exe screencapture.c -lscreencapture

With C, Java, JavaScript, Perl, PHP, etc., this works fine because the language can easily distinguish resource libraries from the local program; they have different paths. But with Python? Don't do this. Never do this. Why? Python assumes you want to import the local code first. If I have a program called "screencapture.py" that uses "import screencapture", then it will import itself rather than the system library. At minimum, you should call your local program "myscreencapture.py" instead.

Python is a very popular language and has a huge following. I even have a handful of friends who really like Python -- it's their preferred programming language. Over the years, I've discussed these issues with them, and each time they nod their heads and agree. They don't disagree that these are problems with Python; they just think it's not bad enough for them to stop loving the language.

My friends often cite all of the really cool Python libraries that exist. And I agree that some of the libraries are really useful. For example, BeautifulSoup is one of the best HTML parsers I've ever used, NumPy makes multidimensional arrays and complex mathematics easier to implement, and TensorFlow is very useful for machine learning. However, I'm not going to make a monolithic program in Python just because I like TensorFlow or SciPy. I'm not going to give up readability and maintainability for a free pony; it's not worth the effort.

Usually when I write negative criticisms about a topic, I also try to write something positive. I followed my blog entry on "Open Source Sucks" with "Open Source Rocks". And when I wrote about limitations with FFmpeg, I explicitly mentioned how it's the best video processing library out there. But I can't make a list of good things about Python because I really think that Python sucks.

(Hey Bill, does this answer your question?)