Dependencies with Python: a bit of history

Last month we started to work on support for Python projects in gemnasium.com. The goal is simple: to track the dependencies of any Python project, just as we do for Ruby and Node.js projects. This is just yet another packaging system, right? Well, not quite. There are some gotchas. And there’s a story to tell.

Early days: Distutils

Distutils is the first well-established distribution system for Python. I don’t know how old it is, but it seems that it has been around forever.

The whole purpose of Distutils is to distribute Python modules. These can be “pure modules” written in Python or “extension modules” written in C/C++.

Among many things, Distutils is able to create a source distribution for a Python project. The result is a self-contained zip or gzipped tar archive, much like a Ruby gem. Here is how to build the distribution:

$ python setup.py sdist

Distutils requires that you create a setup.py script for your project. According to the documentation, the bare minimum is:

from distutils.core import setup

setup(
  name='foobar',
  version='1.0',
  package_dir={'': 'src'},
  packages=[''],
)

The documentation states that it’s possible to define relationships between distributions and packages. Dependencies on other Python packages are specified by “supplying the requires keyword argument to setup()”. Each dependency can come with a version restriction. So this should work:

from distutils.core import setup

setup(
  name='foobar',
  version='1.0',
  requires=[
    'pyramid >=1.0, <1.3',
    'SQLAlchemy >= 0.8.1',
  ],
)

But, as far as I can tell, it does nothing: at best the requires list ends up in the package metadata, and nothing acts on it at install time. And Distutils is not able to download and auto-install a package anyway.

Middle ages: Setuptools

Then came Setuptools. Among other things, Setuptools is able to:

  • find, download and install the dependencies using EasyInstall
  • create Python Eggs, a single-file importable distribution format

So Setuptools filled the gap and brought real dependency management to Python. Great!

Whenever possible, Setuptools tries to be a drop-in replacement for Distutils. This means that Setuptools also relies on a setup.py script, but it extends setup() with new keywords, several of which are dedicated to dependency management:

  • install_requires
  • setup_requires
  • tests_require
  • extras_require

So we can now declare the packages we need at install time, at build time, for the test suite, and for optional features.

The syntax is the same as for the Distutils requires keyword, but requires itself is not used anymore. By the way, Setuptools comes with some documentation about versioning and declaring the dependencies.
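To make this more concrete, here is a rough sketch of a setup.py using these keywords. The package names and version ranges below are purely illustrative:

from setuptools import setup

setup(
  name='foobar',
  version='1.0',
  # needed at runtime, installed along with the package
  install_requires=[
    'pyramid >=1.0, <1.3',
    'SQLAlchemy >= 0.8.1',
  ],
  # needed to run setup.py itself
  setup_requires=['setuptools_git'],
  # needed to run the test suite
  tests_require=['nose'],
  # optional features, requested as "foobar[docs]"
  extras_require={
    'docs': ['Sphinx'],
  },
)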

Let’s consider the shootout sample application from the Pylons project. Its setup.py takes care of building the egg:

$ git clone https://github.com/Pylons/shootout.git
$ cd shootout

$ python setup.py bdist_egg
...
creating 'dist/shootout-0.2.3-py2.7.egg'

EasyInstall will automatically install the egg and its dependencies:

$ sudo easy_install dist/shootout-0.2.3-py2.7.egg
Processing shootout-0.2.3-py2.7.egg
...
Processing dependencies for shootout==0.2.3
Searching for WebTest==2.0.9
Reading http://pypi.python.org/simple/WebTest/
Best match: WebTest 2.0.9
Downloading https://pypi.python.org/packages/source/W/WebTest/WebTest-2.0.9.zip#md5=bf0a04fcf8b2cdcaa13b04324cefb53d
Processing WebTest-2.0.9.zip

The dependencies are declared in setup.py:

from setuptools import setup

setup(
  name='shootout',
  version='0.2.3',
  install_requires=[
    'setuptools',
    'pyramid',
    'SQLAlchemy',
    'WebTest == 2.0.9',
    # more dependencies
  ],
  # more setup
)

EasyInstall can also install a package and its dependencies straight from PyPI, the Python Package Index:

$ sudo easy_install "pyramid >= 1.0"
Searching for pyramid>=1.0
Best match: pyramid 1.3b3
Processing pyramid-1.3b3-py2.7.egg
...

So far so good. But I’m missing a tool to explore the dependencies of a package. Fortunately, the information is in the egg, in the requires.txt metadata file.

Here is how I got requires.txt for shootout:

$ python setup.py egg_info
running egg_info
writing requirements to shootout.egg-info/requires.txt
...

$ cat shootout.egg-info/requires.txt
setuptools
pyramid
WebTest == 2.0.9
...

By the way, this is how EasyInstall explores the dependencies of a package.
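If you prefer to stay in Python, the pkg_resources module that ships with Setuptools reads this metadata for you. A minimal sketch, assuming shootout is installed in the current environment:

import pkg_resources

# locate the installed distribution and read its dependency metadata
dist = pkg_resources.get_distribution('shootout')

for requirement in dist.requires():
    # each entry is a parsed requirement, e.g. WebTest==2.0.9
    print(requirement)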

What about Distribute?

You may have heard about Distribute, a fork of the Setuptools project. Don’t worry about it: Setuptools and Distribute have merged, which means that the latest version of Distribute is now a wrapper around Setuptools. Have a look at the Merge FAQ if you are wondering about the consequences.

Modern ages: pip

And then came pip. It extends EasyInstall in many ways. Here are some features:

  • packages are downloaded before installation
  • support for version control systems, like Git
  • a simple way to define fixed sets of requirements and reliably reproduce a set of packages

At first sight pip install looks like easy_install with more options. But pip goes a step further with this killer feature:

$ pip install -r requirements.txt

It installs everything according to the requirements described in a text file. The requirements file looks like the requires.txt metadata file from the Python Egg format:

MyApp
Framework==0.9.4
Library>=0.2

But the requirements are now stored in a separate text file that one can easily review, edit and keep under version control. It’s not generated by some Python script anymore.

OK, but how do we get this requirements.txt file? The workflow is simple:

  1. install the dependencies for your project
  2. freeze the requirements to a file using pip freeze

But this means we need a tool like virtualenv to create isolated Python environments. We don’t want our projects’ dependencies to interfere with each other, do we?
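Assuming virtualenv and pip are already installed, a typical session could look like this (the package name is only an example):

$ virtualenv venv
$ source venv/bin/activate
(venv)$ pip install "pyramid>=1.4"
(venv)$ pip freeze > requirements.txt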

By the way, the approach is somewhat similar to what Bundler does in Ruby land:

  1. Bundler creates an isolated environment for your project
  2. it knows the exact versions of the packages you have installed
  3. so it is able to reinstall the exact same version of each package

If you are curious about Ruby, I suggest you have a look at Yehuda Katz’s blog post that gives the best practices for managing dependencies using Bundler and its Gemfile.

The PaaS Era

Pip has been around for a while, but it has become irreplaceable: if you deploy using some PaaS, chances are that you need a pip requirements file to control the environment your web application runs in.

For instance, Heroku has Python support and it leverages pip and its requirements.txt to provision the servers with the dependencies of the web application you deploy.

The Flask getting-started guide proceeds in a few steps:

  1. create an empty directory for your project
  2. set up a new Python environment with virtualenv venv
  3. activate the environment (see the virtualenv documentation)
  4. install whatever you need with pip install
  5. freeze your environment

Finally, it comes down to:

$ pip freeze > requirements.txt
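The resulting file pins every package installed in the environment. For a small Flask application, it could look something like this (the exact packages and versions will of course vary):

Flask==0.10.1
Jinja2==2.7
MarkupSafe==0.18
Werkzeug==0.9.1
itsdangerous==0.23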

Wrapping up

So pip is the new thing, and we don’t have to declare the dependencies in setup.py anymore, right? Well, it depends on what you are looking for.

The pip workflow is best if your project is some kind of application, like a Django application. Then it makes sense to freeze your dependencies at the exact versions that are known to be safe. In fact, having a requirements.txt file is the only way to deploy to PaaS providers like Heroku. And you probably don’t care about building Python eggs.

But your project may be a reusable library, something other projects may depend on. In this case, you probably want to distribute your package as a Python Egg via PyPI. The dependencies of your package should be installed automatically, and the only way to get this is to declare them with the install_requires keyword in setup.py.

Some projects, like CMS engines, are both reusable packages and standalone web applications. In this case, it makes sense to declare the requirements both in setup.py and in requirements.txt. The requirements in setup.py are generally broader, in order to avoid conflicts, while it’s common practice to keep “locked” versions in requirements.txt.
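As an illustration (the version numbers here are made up), the same dependency could be declared as a broad range in setup.py:

install_requires=['pyramid >=1.4, <1.5']

and pinned to an exact version in requirements.txt:

pyramid==1.4.5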

Gemnasium is about to support Python projects. Stay tuned!