One of the more prevalent topics in the Python ecosystem of 2019 was packaging and distribution. As the year comes to an end, I wanted to put together a summary of the many paths we currently have available to distribute apps built with Python, though some of these apply to any language.
Whether delivering an executable, a virtual environment, your packaged code, or a full application, the following list includes both standard systems and some up-and-comers to keep in mind as we enter 2020.
To distribute an application, you need more than just a library to pip-install from PyPI. You need a fool-proof way of installing it and its dependencies on all supported operating systems. These dependencies can include non-Python packages as well as external resources like binaries or images.
Below are some options to help install and distribute code across platforms.
Docker uses base functionality in an operating system to isolate a process in such a way that it’s unaware of the rest of the system. It can run the user space of a different Linux distribution on top of your host’s kernel, much like virtualization, but without using virtual hardware.
Its file system is layered, giving it a minimal footprint that only incorporates the files needed to run, instead of the typical virtual disk that also includes the free space in a virtual machine along with it.
With a minimally packaged root file system, it’s not uncommon to see the image for an entire OS only occupy tens of MB, instead of the GB needed for a virtual machine.
You can distribute these containers in a public registry like DockerHub or a private one inside your org. Users install the Docker daemon on their own compute, then use it to pull your image and run it locally.
Since Docker images are nothing more than a root file system, it’s also possible to distribute them as a file, using Docker to import it.
To build an image, you can start from a rootfs, or add on top of an existing registry image. Most operating system vendors maintain official stripped-down images in DockerHub, which are usually tiny.
Other organizations also make official images for new builds of their applications, like the Python image built on top of Debian. Postgres, MySQL, Redis, Nginx, and many other standard services do the same.
Docker now runs on Linux, OSX, and Windows. This cross-platform support means that any image you build has a wide distribution with minimal complexity. You control not only the application but also the environment it runs on, making compatibility much less of an issue.
However, complexity can exist when configuring the network or persistent storage. The typical application doesn’t have to deal with anything more than port forwarding, but it’s sometimes hard to visualize how the abstraction layers work. It helps to budget for better documentation around it.
With Docker, you can control the Python distribution, the supporting OS packages - like C libraries needed for individual modules - and your virtual environment.
Since it takes milliseconds for a container to start, it’s perfectly acceptable, even encouraged, to run a fresh container every time you wish to execute your app.
Some people even use Docker as a virtual environment replacement. Since it starts up quickly and can offer an interactive shell, it’s not a bad idea to make a new container whenever you need to work on a particular project.
Once you have a running container configured with the basics, you can save the image for reuse or even export it to a file.
For example, the following command starts a new Python container, mounts the current working directory as /work, and drops you into a bash shell. Changes made inside the container’s /work directory are reflected in the host directory.
docker run -it --name python-work -v `pwd`:/work python:3-slim bash
Once inside the container, you can install whatever Apt or PyPI packages are needed.
Exiting the shell returns you to the host. At that point, running this next command saves any changes as a new Docker image that you can reuse later or push to DockerHub for distribution:
docker commit python-work my-python-image
It’s possible to share the new image in a private Docker Registry internal to your organization, or run the following to export the container as a file that anyone can download and import into their local Docker environment:
docker export python-work -o my-python-work.dock
However, the real workflow for distribution is to create a Dockerfile that anyone can use to create the image themselves. In other words, you give out a small text file with instructions for the Docker daemon, instead of a copy of the entire filesystem.
Anyone could clone your repo with that file and run this command to create a local version of the image:
docker build -t my-python-image .
A typical Dockerfile looks like the one below. Please look through the Docker documentation for more info:
FROM python:3-slim
COPY path/to/your/app /work
WORKDIR /work
RUN pip install -r requirements.txt
Virtual Machines and Vagrant
The next step up from containers is to distribute a full virtual machine. This type of system has been around ever since virtualization became widespread and supported in hardware.
Delivering a “virtual appliance” is attractive because you have close to total control of the environment in which your app runs. Everything is configurable: the operating system, its packages, disks, and networking, even the amount of free space.
The drawback of using VMs is the size of your distribution, typically in the gigabytes. Plus, you’ll have to work out a mechanism for getting the image to your customers. Object stores like Amazon S3 or Digital Ocean Spaces are a great place to start.
In the beginning, only server hardware supported running virtual machines, but these days every processor has the capability, and all major operating systems support it. There are also free applications to help you manage and configure VMs like Oracle’s VirtualBox.
Vagrant is another system for configuring and running VMs on top of managers like VirtualBox. It functions a lot like Docker in that you specify everything the VM needs in a file, and it takes care of building and running it for you.
Similar to the Dockerfile in the previous section, HashiCorp’s Vagrant uses a Vagrantfile with instructions on how to start and configure a virtual machine.
Just like a Docker image provides the base file system to run a container, a Vagrant box provides the basis for a virtual machine.
The example Vagrantfile below does something similar to the Dockerfile in the previous section:
Vagrant.configure("2") do |config|
  config.vm.box = "bento/debian-10.2"
  config.vm.synced_folder "./", "/work"
  config.vm.provision "shell", inline: "pip install -r /work/requirements.txt", keep_color: true
end
Cloning your repository with this file and running these commands starts the VM and gets you into its shell:
vagrant up
vagrant ssh
Vagrant does help with the distribution problem by providing a similar experience to DockerHub with the Vagrant Box catalog.
With it, you can get your base images or upload new ones to share with others. You can even point a Vagrantfile to internal URLs to download a box.
We’ve written about this module before. PyInstaller takes care of bundling all resources required to run your application, including the Python distribution. Crawling your code, it figures out which Python dependencies to package, while still allowing you to specify additional assets to include with the bundle.
The result is an installable application for Windows, Linux, or OSX. During execution, it unpacks itself into a folder, along with the bundled interpreter, and runs your entrypoint script.
It’s flexible enough to give you control over the Python distribution and the execution environment. I’ve even successfully used it to bundle browsers with my Python packages.
But using it does bring a complication. When extracting itself, it changes the base directory from which your application runs. This means that any code dependent on __file__ to determine the current execution path now needs to use the internal environment that PyInstaller configures.
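As a minimal sketch of that adjustment: PyInstaller’s bootloader sets the attribute sys._MEIPASS to the folder holding the unpacked bundle, so a small helper can prefer it when present. The names resource_path and dev_root are hypothetical, chosen only for this illustration, and the fallback to the working directory is an assumption about how you run the code during development.

```python
import sys
from pathlib import Path
from typing import Optional


def resource_path(relative: str, dev_root: Optional[Path] = None) -> Path:
    """Locate a data file both inside and outside a PyInstaller bundle."""
    # PyInstaller's bootloader sets sys._MEIPASS to the unpacked bundle folder.
    bundle_root = getattr(sys, "_MEIPASS", None)
    if bundle_root is not None:
        return Path(bundle_root) / relative
    # Not frozen: use a caller-supplied directory (hypothetical parameter),
    # or fall back to the current working directory.
    return (dev_root or Path.cwd()) / relative


if __name__ == "__main__":
    # "logo.png" is a placeholder file name, not something from the article.
    print(resource_path("logo.png"))
```

Routing every file lookup through one helper like this keeps the bundle-specific logic in a single place instead of scattering __file__ arithmetic across modules.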
Distributing your bundle is entirely up to you. Most people choose object stores and CDNs for this type of setup.
A point to keep in mind when distributing this way is to check whether your code needs to validate the OS environment it’s running on.
In other words, if you need a specific apt package installed, unlike using Docker or Vagrant, there’s no guarantee that the package is already there at the time of execution.
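One hedged way to perform that kind of validation is to check at startup that the external executables you rely on are actually on the PATH. The sketch below uses the standard library’s shutil.which; the function name missing_tools and the tool names in the usage section are placeholders, and checking for executables is only one possible form of environment validation (shared libraries would need a different check).

```python
import shutil
from typing import Iterable, List


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the executables from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # "convert" and "ffmpeg" are placeholder names, not requirements of any
    # application described in this article.
    missing = missing_tools(["convert", "ffmpeg"])
    if missing:
        print("Missing required system tools: " + ", ".join(missing))
    else:
        print("All required system tools were found.")
```

Failing early with a clear message tends to be friendlier than letting the application crash later when it first shells out to the missing tool.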
Briefcase is an up-and-comer in this category. It’s part of the BeeWare project, which aims to enable the packaging of Python applications for distribution to all operating systems and devices, including Android and iOS.
It’s in a similar arena as PyInstaller, meaning it can bundle your module along with its dependencies into an installable application. But it also adds support for mobile devices and TVs with AppleTV or tvOS.
Unfortunately, documentation is still a little lax and mostly in the form of examples. However, the project shows promise and is still under active development. The popular editor Mu uses it for packaging.
You can submit applications built with Briefcase to the Android and Apple App Stores for distribution.
Sometimes, it’s possible to assume that your users have a standard operating system. Maybe they’re all running from a stock image built by the IT department inside your company. Maybe your app is just simple enough to not worry about OS or interpreter complexities.
If all you need to think about is your Python code and its dependent libraries, this category is for you.
Built by the folks at Twitter, Pex is a way of distributing an entire virtual environment along with your Python application. Designed to use a pre-installed Python interpreter, it leverages the standard for Python Zip Applications outlined in PEP 441.
Since Python 2.6, the interpreter has been able to execute directories or zip-format archives as scripts.
Pex builds on top of that, simplifying distribution to the act of copying a single file. These files can work across platforms (OSX, Linux, Windows), as well as different interpreters, though there are some limitations when using modules with C bindings.
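The underlying zip-application mechanism can be tried without Pex at all. The sketch below uses the standard library’s zipapp module to pack a throwaway one-module application into a single archive and then runs it with the current interpreter. The hello module and its main function are hypothetical, and this only illustrates the PEP 441 mechanism, not how Pex itself assembles its files.

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path


def build_and_run_zipapp() -> str:
    """Build a tiny zip application and return what it prints when executed."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "src"
        src.mkdir()
        # Hypothetical one-module application used only for this demonstration.
        (src / "hello.py").write_text(
            "def main():\n    print('hello from a zip archive')\n",
            encoding="utf-8",
        )
        target = Path(tmp) / "hello.pyz"
        # main= makes zipapp generate the __main__.py that calls hello.main().
        zipapp.create_archive(src, target, main="hello:main")
        # The interpreter executes the archive as if it were a script.
        result = subprocess.run(
            [sys.executable, str(target)],
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout.strip()


if __name__ == "__main__":
    print(build_and_run_zipapp())
```

A plain zip application like this carries only your own code; bundling third-party dependencies alongside it is the part that tools such as Pex take care of.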
After installing Pex in your base system, you can use it to produce a single file containing your entire Python environment.
Pass that file to your coworker’s computer, and you’ll be able to execute it there without installing anything other than the base Python interpreter.
You can even run a file in interpreter mode, such that it opens a Python REPL with all the necessary modules in your environment available for import.
Freezing your virtual environment into a .pex file is easy enough:
pex -r requirements.txt -o virtualenv.pex
Then you can execute that file to open a REPL with your environment:
./virtualenv.pex
You can also specify entrypoints when creating the file so that it executes a specific function in your module, behaving as if you run the python command directly.
Here’s a great 15 min video with an example that bundles a simple Flask app.
There’s no system to distribute Pex files for you, so just like the previous items, you’re stuck with public object stores and CDNs.
Similar to Pex, the folks at LinkedIn built a different module called Shiv. Their main reason for building something different was to try and tackle a problem in the start time of Pex executables. Given the complexity of their repositories and dependencies, they wanted to handle the environment setup differently.
Instead of bundling wheels along with the application, Shiv includes an entire site-packages directory as installed by pip. It makes everything work right out of the box and almost twice as fast as Pex.
Here’s an example of how to produce a .pyz file with Shiv that does something similar to the Pex section:
shiv --compressed -o virtualenv.pyz -r requirements.txt
Which you can then execute directly with:
./virtualenv.pyz
It’s important to note that packaging libraries with OS dependencies is not cross-compatible between platforms. As mentioned in the Pex section, it’s mainly an issue for modules that depend on lower-level C libraries. You’ll have to produce different files for each platform.
Again, there’s no system built for you to distribute these files, so you’ll have to rely on AWS, DO, CDNs, or other artifact stores like JFrog’s Artifactory or Sonatype’s Nexus.
While not a method to build applications or distribute them, Pipx offers a different way to install them. It works with your OS to isolate each application’s virtual environment and its dependencies, closer to what a system like Homebrew does for OSX.
Pipx provides an easy way to install packages into isolated environments and expose their command line entrypoints globally. It also provides a mechanism to list, upgrade, and uninstall those packages without getting into the virtual environment details.
A good example is the use of a linter. Let’s say you work on multiple Python applications, each of which has a separate virtualenv, and you wish to perform the same linting operations across all of them using flake8.
Instead of installing the flake8 module into every virtualenv, you can use Pipx to install a system-wide flake8 command that’s available inside each of those environments but runs in its own entirely separate and isolated environment.
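To make the isolation idea concrete, here is a rough sketch that uses only the standard library’s venv module to create a dedicated environment and confirm that its interpreter reports its own prefix. It illustrates the concept only and is not how Pipx is implemented; the function names and the flake8-env directory name are hypothetical, and installing the actual tool into the environment is deliberately left out so the example runs offline.

```python
import os
import subprocess
import tempfile
import venv
from pathlib import Path


def create_isolated_env(env_dir: Path) -> Path:
    """Create a bare virtual environment and return the path to its interpreter."""
    # with_pip=False keeps this sketch fast and offline; a real tool would go
    # on to install the application (for example flake8) into the environment.
    venv.create(env_dir, with_pip=False)
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def reported_prefix(python: Path) -> Path:
    """Ask an interpreter which environment it believes it is running in."""
    result = subprocess.run(
        [str(python), "-c", "import sys; print(sys.prefix)"],
        capture_output=True,
        text=True,
        check=True,
    )
    return Path(result.stdout.strip())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env_python = create_isolated_env(Path(tmp) / "flake8-env")
        print("isolated prefix:", reported_prefix(env_python))
```

With Pipx, the equivalent steps plus the installation and the global command shim are handled by a single command such as pipx install flake8.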
Sometimes you want to give your customers an executable that doesn’t require anything pre-installed to run, unlike Docker, which needs its daemon, or Pex, which needs a Python interpreter.
The mechanisms described here help you accomplish that. And just like the previous category, they all require some form of object or artifact store to help with distribution.
While we already discussed PyInstaller, it’s worth another mention in this category, because this is one of its primary functions.
It can produce a single-file executable of your entire application with all its dependencies bundled. You can make one for each operating system, and it runs just like any other native application.
One of the newest additions to the packaging and distribution arena, PyOxidizer, is very promising. It takes advantage of the packaging tools created for the Rust programming language.
Much like PyInstaller, you have complete control of everything you want to bundle into it, but it also allows you to execute code much like Pex or Shiv. Meaning you can create your package so that it runs as a REPL with all dependencies pre-installed for you.
Distributing an entire Python environment that includes the REPL makes for some exciting applications, especially with research teams or scientific computing that require several packages to do data exploration.
One advantage over PyInstaller is that instead of extracting out to the file system, it extracts itself into memory, making the start-up time of your actual Python application considerably faster.
This feature comes with a similar drawback to PyInstaller. You have to adjust any internal references to __file__ or similar operations, so they rely on the environment configured by PyOxidizer at runtime.
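One common way to avoid __file__ altogether is to ask the package’s loader for its data through the standard library’s importlib.resources module. The sketch below builds a throwaway package on disk and reads a data file from it that way. The package name demo_assets_pkg and the helper names are hypothetical, files() requires Python 3.9 or later, and which importlib.resources functions a given PyOxidizer version supports is an assumption you should verify against its documentation.

```python
import importlib
import importlib.resources
import sys
import tempfile
from pathlib import Path


def read_packaged_text(package: str, name: str) -> str:
    """Read a data file through the package's loader instead of __file__."""
    resource = importlib.resources.files(package).joinpath(name)
    return resource.read_text(encoding="utf-8")


def demo() -> str:
    """Build a throwaway package on disk and read its bundled data file."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "demo_assets_pkg"  # hypothetical package name
        pkg.mkdir()
        (pkg / "__init__.py").write_text("", encoding="utf-8")
        (pkg / "greeting.txt").write_text("hello", encoding="utf-8")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            return read_packaged_text("demo_assets_pkg", "greeting.txt")
        finally:
            # Undo the import-system changes made for the demonstration.
            sys.path.remove(tmp)
            sys.modules.pop("demo_assets_pkg", None)


if __name__ == "__main__":
    print(demo())
```

Because the lookup goes through the import system, the calling code does not need to know whether the package lives in a directory, a zip file, or memory.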
Beyond executing your Python code with a bundled interpreter, you also have the option of compiling your code down to C. This comes with several advantages, including the possibility of faster execution given that the C compiler can perform optimizations that the interpreter cannot.
Nuitka is a system built for compiling Python code to C. While the concept is similar to the more widely known Cython, it’s not a separate language. And it’s capable of doing things that Cython can’t, like crawling your dependencies and compiling everything down to one binary.
The resulting executable runs as is, in native code, much faster and without ever needing extraction.
Compilation can get complicated, especially when considering platform complexities. But if you budget the time for it, you’ll be able to reap the benefits. I’ve done it successfully several times before.
App Store Experiences
There’s another kind of distribution system that we use very successfully almost every day: app stores. This software exists to install and maintain other applications.
Just like the Apple App Store or the Google Play Store, there are similar mechanisms available in Linux that enable simple integrations.
Snapcraft provides the standard app store experience. You can publish your application to their store, where users can discover and install it.
The installation is isolated to avoid conflicts with other applications, and it works across Linux distributions, including library dependencies.
Once installed, the store automatically keeps your application at the latest stable version and provides a mechanism to revert to previous states while preserving data.
Canonical, the company behind Ubuntu, manages the store, so after bundling your application (or snap, as they call it), you’ll have to publish the snap to the store with a registered Ubuntu One account.
Another very similar concept to Snapcraft is Flatpak.
It also presents a store-like experience with FlatHub.org, using container technologies to isolate your application. You can also host your own private hub or distribute bundles as a single file.
Flatpak bundles can also make use of some desktop integration capabilities. These provide information like locality detection, the ability to access resources external to the app (much like your phone asks for permissions to open files or URLs), notifications, window decorations, and others.
We have a full, feature-rich ecosystem of application packaging and distribution mechanisms. Most of these are not specific to the Python language but easily integrate with it.
While that may seem like a lot of options, the categorization applied here should help you pick what best suits your needs, given the pieces you can control.