Python Is Slow, Make It Faster With C

This is a companion blog post to my talk at Kiwi Pycon 2014 (surprisingly, the title of my talk is also Python is slow, make it faster with C). The point of the talk was not so much “Here is a Python project and these are the changes I made to turn it into C”, but more along the lines of “Given that I have some code that runs faster in C than in Python, here’s how to integrate it with Python”. It covers both including pre-built shared objects with CTypes, and building a Python C Module, as well as comparing these with the speedups you can get with PyPy. If you want to follow along at home, you can find the code on GitHub.

The Algorithm

I wanted to choose something easy to understand, easy to implement, and that would run faster in C than in Python, so I chose the classic “Is a number prime?”, using the most naïve algorithm possible, its pseudo-code looks like this:

def naive_prime_checker(number):
    if number == 1:
        return False

    if number == 2:
        return True

    if number % 2 == 0:
        return False

    for divisor in xrange(square_root(number)):
        if number % divisor == 0:
            return False

    return True

So really it’s an iterate over a bunch of numbers and do math with each one algorithm. But the algorithm doesn’t matter, it runs a lot faster in C than in Python, so it’s a perfect candidate for these examples.

Pure Python

First, there’s some Python wrapper code that is common to all the implementations that use Python in some way. It takes care of parsing the command line arguments, and takes, as an argument, the function to use to check primality. The command line arguments will either be a single integer, to check for primality, or -b <n>, for a benchmark function that will compute the primality of every number from 1 to n. The source code for this can be found in the wrappers.py file in the repository.

The first implementation of the prime checking algorithm is written purely in Python. It’s found in con_math.py and looks suspiciously like the pseudo-code:

def is_prime_naive(number):
    if number == 1:
        return False

    if number == 2:
        return True

    if number % 2 == 0:
        return False

    test_max = prime_test_max(number)

    for divisor in xrange(3, test_max + 1, 2):
        if number % divisor == 0:
            return False

    return True

And the python wrapper file (the actual Python file that gets called) looks like this:

from pythonlib.con_math import is_prime_naive
from pythonlib.wrappers import main


if __name__ == '__main__':
    main(is_prime_naive)

Execute it like this:

$ python purepython.py 5
5 is prime.

$ python purepython.py 9
9 is not prime.

Pure C

The pure C implementation (from con_math.c looks a lot like the Python:

bool isPrimeNaive(const unsigned int number) {
    // uses naive iterative method to test if a number is prime
    if (number == 1 || number % 2 == 0)
        return false;

    if (number == 2)
        return true;

    unsigned int testMax = primeTestMax(number), divisor;

    for(divisor = 3; divisor <= testMax; divisor += 2)
        if (number % divisor == 0)
            return false;

    return true;
}

There’s a makefile to build the Pure C version, which will also create a .so (shared object) which will be used later. After building, the Pure C version can be executed in the same method as the Pure Python:

$ ./purec 5
5 is prime.

$ ./purec 9
9 is not prime.

Python C Module

The first way of integrating C with Python is by building a Python C module. The documentation for Python 2.7 (Python 3.4 docs are here) is a good introduction to how it works. A Python module in C basically has three parts:

  1. The actual function(s) that are callable from Python
  2. An array that lets Python know what these functions are, what they’re called, and their helptext.
  3. A function that will initialise the Python module and link up the definition.

The C module functions will call the pure C functions, with some wrapper code to convert Python variables to C variables, and vice-versa. From con_math_py.c:

#include <Python.h>
#include <con_math.h>

static PyObject *con_math_py_is_prime_naive(PyObject *self, PyObject *args) {
    unsigned int numberToCheck;

    if (!PyArg_ParseTuple(args, "I", &numberToCheck))
        return NULL;

    bool result = isPrimeNaive(numberToCheck);

    return Py_BuildValue("i", result);
}

The interesting parts of this code are PyArg_ParseTuple, a function that will convert Python variables to their native C types, and Py_BuildValue, which will convert back the other way.

Also in con_math_py.c is the array that defines the functions available in the module:

static PyMethodDef ConMathMethods[] = {
    {"is_prime_naive",  con_math_py_is_prime_naive, METH_VARARGS,
     "Test if a number is prime."},
    {NULL, NULL, 0, NULL}
};

The parts of this are:

  • "is_prime_naive": the name of the Python function
  • con_math_py_is_prime_naive: the function that "is_prime_naive" refers to
  • METH_VARARGS: the arguments are a list (rathen that a kwargs dictionary)
  • "Test if a number is prime.": the helptext for the method
  • {NULL, NULL, 0, NULL}: sentinel indicating the end of the args

Pretty simple so far; the final piece of the puzzle is the function that sets everything up - the name is based on the name of the module (initmodule-name):

PyMODINIT_FUNC initcon_math(void) {
    (void) Py_InitModule("con_math", ConMathMethods);
}

Execute the setup.py file to build the module, and them import it just like a normal Python module. Check out the python_c_extension.py file for how to use the newly built module.

CTypes

The newer method of integrating Python with C is with the ctypes library (Python 3.4 docs here). CTypes let you import a shared object (or DLL) and use it just like a standard Python module. The advantage this has over writing a Python module in C is that you don’t have to write any special wrapper code in C to link with Python - just import your existing C library. As simple as this (from python_ctypes.py):

from ctypes import CDLL
from pythonlib.wrappers import main


if __name__ == '__main__':
    lib = CDLL('../lib/libcon_math.so')
    main(lib.isPrimeNaive)

The lib variable is a reference to the libcon_math.so shared object built for the Pure C implementation, so no extra C code needs to be written or compiled to use CTypes.

PyPy

To quote pypy.org:

PyPy is a fast, compliant alternative implementation of the Python language (2.7.6 and 3.2.5).

Basically, provided all your libraries are supported by PyPy, you can drop-in-replace your Python binary with PyPy and get more speed from your Python Code - without a C file in sight.

You can download and build the source (or download a binary for a bunch of different OSs) and replace calls to the Python binary with calls to PyPy, like so:

$ ~/pypy/bin/pypy purepython.py 5
5 is prime.

$ ~/pypy/bin/pypy purepython.py 9
9 is not prime.

Benchmarks

So what are the differences in speed between the different implementations? To benchmark, I used time to time how long each implementation took to test each number between 1 and 1,000,000 for primality. Each ran 10 times and I took the mean of user + sys time as the execution time. The numbers are:

CPython Pure C C Module C Types PyPy
Execution Time (s) 4.648 0.417 0.570 0.815 1.452
Relative Speed (n times faster) 1 11.41 8.16 5.70 3.20

Graphing them looks like this…

Execution for C vs Python implementations times. Execution times for C vs Python Implementations.
Relative Speeds for C vs Python implementations times. Relative speeds for C vs Python Implementations.

The C Module and CTypes are around 8 times and 6 times faster (respectively) than standard CPython. This is not quite the 11 times speedup from using pure C, but it is still impressive for the little/no amount of C code that has to be written.

Conclusions

Pure C is much, much faster than Python (in this example anyway). You can reach similar speeds with both CTypes and with a full C Module - which you choose depends on how much extra performance you need and how much C you want to write. Of course, the more of your code you can implement in C the faster it will be. In this example, I could have also done the benchmark iteration in C for an extra speed up. The easiest way to get some speedups (if your libraries are supported) is to use PyPy. You’ll get some good speed boosts — and you don’t have to write any C at all.

Resources

Download slides.

View code on GitHub.

Previous entry

Next entry