I don’t release stuff as often as I should, because it’s a pain to configure new applications and get them installed where you want them. I want a tool that:
- Will automatically install dependencies
- Handles version incrementing without being tied to source control (i.e. the current version is stored in a file instead of inferred from SCM tags) (why do I want this? I’m not sure but it just seems cleaner to me)
- Runs scripts post (and pre) install (and remove)
- Lets me do new server setups with a single command
- Allows easy auditing of the versions of software that are installed
- Not a necessity, but using an “open standard” would be advantageous; I’ll elaborate on this below
I use Ubuntu on all my servers, so it makes sense to use the native DPKG system for application deployment. .debs are easy to build using many different applications (part of the “open standard” above). They support dependencies, so applications can automatically install other applications that they require to run. You can include post-install scripts. Using the dpkg -l command you can see a list of installed packages (and their versions), and by carefully crafting a manifest package you can run a single command and have all your applications up and running on a new server.
Build Tools
There are a few different ways of building .deb files. For getting started quickly, I can recommend FPM: it allows building of .debs (and other types of packages) with a single command. There are some limitations, the biggest for me being that all files in the package must be owned by the same user, and that you can’t define symlink creation manually. If those aren’t limitations for you, then I highly recommend it. But building the package is just one piece of the puzzle. Scripting, incrementing versions, and pushing out the built .deb are all separate problems that need to be catered for. The need for slightly more configurability than FPM offers led me to look at JDEB, a .deb building library written in Java. This in turn led me further outside the Python world, to looking at existing build tools that would work with JDEB, namely Apache Maven and Gradle.
Maven is the 800 pound gorilla of the build tools world, and given enough configuration I’m sure it would be great for building my apps. Unfortunately, as a Maven novice trying to stray somewhat outside of its normal use (building Java stuff), I couldn’t quite get it working how I wanted. Maybe that will be in The Ultimate Python Deployment System II (or III). So Gradle it is.
Gradle
Unlike Maven, Gradle doesn’t tie you to specific build phases and actions; instead, you define a task that does whatever you like, and then run it (along with other tasks) by supplying the task name as a command line argument to Gradle. It means more custom coding, but also more freedom in how and when you do things.
For example, say you wanted to build, package, then upload your project to your repository. You could call each task separately by doing gradle build package distribute, or create one task in your script that takes care of calling these, then just run gradle release.
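A minimal sketch of that aggregate task (assuming your build defines tasks named build, package and distribute, as above):

    // Aggregate task: running 'gradle release' runs all three.
    task release {
        dependsOn 'build', 'package', 'distribute'
    }

    // dependsOn alone doesn't imply an order, so declare one explicitly.
    tasks['package'].mustRunAfter tasks['build']
    tasks['distribute'].mustRunAfter tasks['package']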
Information about projects is stored in a gradle.properties file, which I currently have storing just four values: architecture, majorVersion, minorVersion, and name. I have special tasks incrementMinorVersion and incrementMajorVersion that handle version changing (removing the tie between version and SCM).
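Roughly, such a task just rewrites the version value in the properties file; a minimal sketch (note that java.util.Properties doesn’t preserve comments or ordering when writing the file back):

    task incrementMinorVersion << {
        def propsFile = file('gradle.properties')
        def props = new Properties()
        propsFile.withInputStream { props.load(it) }
        // Bump the minor version and write the file back out.
        def bumped = (props.getProperty('minorVersion') as int) + 1
        props.setProperty('minorVersion', bumped as String)
        propsFile.withOutputStream { props.store(it, null) }
    }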
The other configuration values are used during package building for things like generating the install path and setting the architecture of the package. To make more sense of these, it’s probably time to talk about a real project.
Package Separation
I generally split Python applications into two separate packages for deployment: the application code package, and the virtual environment package. This is because the virtual environment is usually bigger but changes less often, whereas the application code is often only a few kilobytes to a couple of megabytes, so it can be rolled out quickly at any time.
The Virtualenv Package
Virtualenvs should be familiar to most Python developers: install a set of packages at the user level, separate from the system packages, with the list of packages defined in a requirements.txt file. The problem with deploying virtualenvs using just a requirements file is that the virtualenv needs to be built at deploy time on the application server. This has a number of downsides: delays while the virtualenv is built (waiting for downloads and compilation); needing to have compilers and dev libraries installed on the application server; the build has to be repeated on every application server; and if there are any issues downloading any of the required packages, you’re SOL. So you can see why creating an artifact that can be deployed atomically is attractive.
There are two tasks to execute in the creation of the virtualenv package: first build the virtualenv, then package the virtualenv directory into a .deb. This is how the virtualenv is built:
    def createVenvAndInstallReqs() {
        exec {
            executable "rm"
            args '-rf', project.venvBuildDir
        }
        exec {
            executable "virtualenv"
            args project.venvBuildDir
        }
        exec {
            executable project.venvBuildDir + "/bin/pip"
            args "install", "-r", "src/requirements.txt"
        }
        exec {
            executable "virtualenv"
            args "--relocatable", project.venvBuildDir
        }
    }
The task is fairly simple: it just executes four commands. First remove the old virtualenv build directory, then build the virtualenv, then install the requirements with pip, and finally make the virtualenv relocatable (note that for the virtualenv to be truly relocatable, some paths in the activate script would need to be updated, but since that script will never be executed, it’s not necessary to do this). Once that’s done we can take the directory and turn it into a .deb package. The task that does that looks like this:
    task buildVenvDeb(type: Deb) {
        arch = project.architecture
        requires('some-requirement')
        from(project.venvBuildDir) {
            into '/var/virtualenvs/' + project.name
            user 'www-data'
        }
        link('/var/virtualenvs/' + project.name + '/lib/python2.7/encodings',
             '/usr/lib/python2.7/encodings')
        link('/var/virtualenvs/' + project.name + '/lib/python2.7/lib-dynload',
             '/usr/lib/python2.7/lib-dynload')
    }
I’m using the Nebula os-package plugin, an interface to JDEB for Gradle, to build the .debs. The configuration options in use are simple: the files from the virtualenv build directory (created by the previous task) are installed into a directory under /var/virtualenvs, owned by www-data, and symlinks are created pointing from the virtualenv directory to the system Python lib directories as required.
It’s important to note that, the way this is set up, the architecture and OS layout of the build machine must match those of the deployment machine (i.e. you probably want to build on the same OS you deploy to). And because a virtualenv may include compiled libraries, the package is processor specific. If you only had platform-independent Python libraries, the architecture could be all.
Currently the package requirements are defined in the build script itself; the next improvement I would like to make is to read them from the gradle.properties file as well.
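One possible shape for that, with debRequires as a hypothetical property name (nothing in my current setup uses it):

    // gradle.properties would gain a line like:
    //   debRequires=libxml2,libssl1.0.0
    task buildVenvDeb(type: Deb) {
        arch = project.architecture
        // Read the .deb dependencies from gradle.properties instead of
        // hard-coding requires(...) calls in the build script.
        project.debRequires.split(',').each { requires(it.trim()) }
        from(project.venvBuildDir) {
            into '/var/virtualenvs/' + project.name
            user 'www-data'
        }
        // (symlink creation as before, omitted here)
    }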
The Application Package
Building the Python application package doesn’t introduce any new concepts on top of building the virtualenv package. The task to build the application .deb package looks like this:
    task buildAppDeb(type: Deb) {
        arch = project.architecture
        requires('nginx')
        requires('uwsgi')
        requires('project-virtualenv')
        from('src/' + project.name) {
            user 'www-data'
            into '/srv/python/' + project.name
        }
    }
The application code is taken from the src/$applicationName directory, and will be deployed in /srv/python/$applicationName, owned by www-data. The requirements are where it gets slightly more interesting: the application needs uwsgi and nginx to run, so these are included, along with project-virtualenv, which matches the name of the virtualenv package created previously. That way, only the application package needs to be explicitly installed; the virtualenv will be installed as part of the dependency tree.
Adding to the Repository
Once the .deb packages are built, getting them to a repository for installation is simple. I won’t go into too much detail on setting up an apt repository as there are lots of tutorials out there already. Personally, I use reprepro to manage the repository (as it has an easy interface and will automatically sign packages), nginx to serve the .debs (cos I’m hip), and a directory watcher script that automatically adds any new .deb files placed in an incoming directory to the repository.
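That watcher is just a small script; a minimal sketch of the idea (the repository path and the trusty distribution name are illustrative):

    import java.nio.file.*

    def repoBase = '/srv/www/repos/apt/ubuntu'      // illustrative path
    def incoming = Paths.get(repoBase, 'incoming')
    def watcher = incoming.fileSystem.newWatchService()
    incoming.register(watcher, StandardWatchEventKinds.ENTRY_CREATE)

    while (true) {
        def key = watcher.take()                    // block until a file appears
        key.pollEvents().each { event ->
            def name = event.context().toString()
            if (name.endsWith('.deb')) {
                // Add the new package to the repository's 'trusty' distribution.
                ['reprepro', '-b', repoBase, 'includedeb', 'trusty',
                 incoming.resolve(name).toString()].execute().waitFor()
            }
        }
        key.reset()
    }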
Of course, there is a corresponding Gradle task to upload to this incoming directory (it uploads to a temporary name and only then moves the file into place, so the watcher never sees a half-uploaded package):
    task sendDeb << {
        def String debName = generateDebName()
        def String localPath = "build/distributions/" + debName
        def String remoteFinalPath = "/srv/www/repos/apt/ubuntu/incoming/" + debName
        def String remoteTempPath = remoteFinalPath + ".tmp"
        def moveCommand = "mv " + remoteTempPath + " " + remoteFinalPath
        ssh.run {
            session(remotes.role('apt')) {
                put from: localPath, into: remoteTempPath
                execute moveCommand
            }
        }
    }
Configuration
While I won’t cover it in detail here, I also use .deb packages to deploy configuration. Reusing the above concepts, it’s easy to build a package that puts configuration files in the correct place: I just include the nginx and uwsgi conf files, and have the package create symlinks to activate them. Depending on how many environments you have set up (e.g. beta and production servers), you may want one configuration package per environment, in which case your application package can’t depend on a specific configuration package, and the configuration won’t be installed automatically. If you have only one environment that you’re deploying to, then you can have your application package depend on the configuration package and it will be installed automatically.
You can also add a post-install script to restart nginx/uwsgi (or whatever) so that your daemons are automatically running on the new configuration after deployment.
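Putting those pieces together, a configuration package might look like this sketch (all names and paths are illustrative; postInstall becomes the package’s postinst script):

    task buildConfDeb(type: Deb) {
        arch = 'all'                    // config files are architecture-independent
        from('conf/' + project.name) {
            into '/etc/' + project.name + '-conf'
            user 'root'
        }
        // Activate the nginx site by symlinking it into sites-enabled.
        link('/etc/nginx/sites-enabled/' + project.name,
             '/etc/' + project.name + '-conf/nginx.conf')
        // Restart the daemons so they pick up the new configuration.
        postInstall('service nginx restart && service uwsgi restart')
    }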
Installation and Updates
Getting applications up and running on a new server now takes only a few commands. You need to add your repository address to your apt sources in the /etc/apt/sources.list.d directory, and then install the public key for the repository (which you will have generated when setting up reprepro). Then, install your config package and the application package with apt-get (or, if your application package depends on your config package, just install the application package and the rest happens automatically).
To further automate this, you could build a manifest package that has all the applications that need to run on the server as dependencies, and have multiple applications set up at once by installing this single package.
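A manifest package is about as simple as a package gets; a sketch (with illustrative package names):

    // The manifest installs no files of its own; it exists purely to
    // drag in everything this class of server needs.
    task buildManifestDeb(type: Deb) {
        packageName = 'webserver-manifest'  // illustrative name
        arch = 'all'
        requires('project')                 // application package (pulls in its virtualenv)
        requires('project-conf')            // configuration package
    }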
Updating your applications can be just as easy. After you’ve made changes to your application or config, push the new package to your repository, then run apt-get update && apt-get dist-upgrade on each of your servers. This can be easily automated with something like Salt, or, if you want the push to reprepro to be the “trigger” for updates, you can have cron call apt-get periodically so servers always update to the latest packages as soon as they become available.
Conclusion
I really like using .debs for deployment, and it’s been interesting to work with non-Python-specific deployment tools. I’m sure the same could be achieved on RPM-based systems. The great thing about these open standards is that it doesn’t matter what a package is built with, so in future (when I crack the Maven nut) I will be able to switch to a nicer system. For now, my Gradle setup is a little clunkier than I’d like (which is why I’m not sharing too much code). As I get it cleaned up and the system working better as a whole, I will publish The Ultimate Python Deployment System II.