Tightening Up Python

I’ve heard complaints from certain developers that Python is too “scruffy”. It’s true that Python’s low barrier to entry allows new coders to quickly write code that runs, but because nothing checks the types up front, the results at run time might be unexpected, and errors are revealed too late.

While Python folks might find this praise of Java weird, at my day job, it’s quite reassuring that my IDE can tell me straight away if parameters or return types are incorrect, and that my code won’t even compile if something’s wrong.

Luckily, Python 3 has introduced a number of features which can help developers be more explicit about what they’re expecting the code to do. In particular I’m going to mention the type system, enums, and named tuples (OK, these aren’t new to Python 3, but they fit in with this category and the type system in 3.5 adds some extra niceness).

Type Hints

PEP 484 introduced Type Hints, and they were added in Python 3.5 with some improvements in 3.6. I’m not sure of the ins and outs of the changes between 3.5 and 3.6, so I can only guarantee that the code examples I’m giving are 3.6 compatible. They will probably work with 3.5, and if not, the changes should be minor.

Typing allows you to specify the types of variables and methods, so that a static type checker can analyse whether the arguments passed around your module are correct. In my examples, I’m using mypy to validate the types.

Let’s take the classic “say hello” example, first with no type annotations.

def say_hello(person):
    greeting = "Hello, {0}".format(person.name)

    print(greeting)

If you read the source code of the method, because it’s so short, it’s obvious that the person argument must be some kind of object with a “name” attribute. If it were longer, or we were only looking at the name of the method, or there were more branches that we needed to check, this might not be so obvious. I’m using simple examples here, people! Anyway, let’s say a naïve programmer uses it like this:

say_hello("Ben")

Well, there’s no syntax error, it should run fine:

$ python scruffy_examples.py
Traceback (most recent call last):
  File "scruffy_examples.py", line 7, in <module>
    say_hello("Ben")
  File "scruffy_examples.py", line 2, in say_hello
    greeting = "Hello, {0}".format(person.name)
AttributeError: 'str' object has no attribute 'name'

As (un)expected, it fails at run time because the wrong type of variable is being passed in. Now let’s add some type hints. The syntax of these hints is similar to Swift or TypeScript, and is fairly simple to understand.

def say_hello(person: Person):
    greeting = "Hello, {0}".format(person.name)

    print(greeting)

The extra code is : Person, and it tells the type system that the argument must be of type Person. Now when we run the script:

$ python scruffy_examples.py
Traceback (most recent call last):
  File "scruffy_examples.py", line 12, in <module>
    say_hello("Ben")
  File "scruffy_examples.py", line 7, in say_hello
    greeting = "Hello, {0}".format(person.name)
AttributeError: 'str' object has no attribute 'name'

What?! Didn’t we fix it?

Kind of. The Python type hinting system is just that: hints. They aren’t enforced by the Python interpreter. This is where mypy comes in: it can analyse your whole Python project and show you any type issues before runtime. Running it on the script gives:

$ mypy scruffy_examples.py
scruffy_examples.py:12: error: Argument 1 to "say_hello" has incompatible type "str"; expected "Person"

We can fix this up:

say_hello(Person("Ben"))

And to our amazement it works fine now:

$ python scruffy_examples.py
Hello, Ben

And mypy will run with no errors, and a successful exit code.

$ mypy scruffy_examples.py
$ echo $?
0
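
The snippets above never show Person itself, so here is a minimal sketch of what the whole script might look like. The Person class is an assumption on my part; all the example needs is some class with a name attribute.

```python
class Person:
    """Hypothetical class: the example only relies on a ``name`` attribute."""

    def __init__(self, name: str) -> None:
        self.name = name


def say_hello(person: Person) -> None:
    greeting = "Hello, {0}".format(person.name)

    print(greeting)


# Passing a Person (rather than a bare str) satisfies both mypy and the interpreter.
say_hello(Person("Ben"))
```

Swapping the last line back to say_hello("Ben") should bring back both the mypy error and the run time AttributeError.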

This is the briefest of brief introductions to typing. I haven’t even mentioned defining function return types (def parse_int(s: str) -> int:) or variable types (my_var: int = 5), but the Python typing documentation is excellent as usual. Once you understand the basics and start type checking your code, you can easily go on from there to more advanced options like type unions and generics.
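
To give a flavour of those features, here is a small sketch combining a return type, a variable annotation, a union (Optional[int] is shorthand for "int or None") and a generic (List[str]). Note that this parse_int is my own variation that returns None on bad input, and parse_all is a made-up helper purely for illustration.

```python
from typing import List, Optional


def parse_int(s: str) -> Optional[int]:
    """Return the integer value of ``s``, or None if it is not a valid integer."""
    try:
        return int(s)
    except ValueError:
        return None


def parse_all(values: List[str]) -> List[int]:
    """Parse every string, dropping the ones that are not integers."""
    parsed: List[int] = []  # variable annotation (needs Python 3.6+)
    for value in values:
        number = parse_int(value)
        if number is not None:  # a checker narrows Optional[int] to int here
            parsed.append(number)
    return parsed


print(parse_all(["1", "two", "3"]))  # prints [1, 3]
```

Without the "is not None" check, mypy should complain that an Optional[int] is being appended to a List[int].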

Enums

The enum module was added in Python 3.4. Enums are a construct that is available in various forms in many other languages, and they allow you to statically define a number of related constants that are reusable throughout your code. Your IDE will also understand them and provide code completion, leading to less typing and fewer typos.

Let’s take a look at an example not using enums; this function takes a day of the week and tells us if it is a work day or not (us programmers like our weekends).

>>> day_is_workday("sunday")
True

OK, that’s not right. What about this?

>>> day_is_workday("sun")
True

hmm

Let’s check the implementation:

def day_is_workday(day: str) -> bool:
    if day == "Saturday" or day == "Sunday":
        return False

    return True

I see…

>>> day_is_workday("Sunday")
False

Finally. There’s a number of ways to make this better, but I’m going to fix it with an enum.

from enum import Enum, auto


class DayOfWeek(Enum):
    SUNDAY = auto()
    MONDAY = auto()
    TUESDAY = auto()
    WEDNESDAY = auto()
    THURSDAY = auto()
    FRIDAY = auto()
    SATURDAY = auto()


def day_is_workday(day: DayOfWeek) -> bool:
    if day == DayOfWeek.SATURDAY or day == DayOfWeek.SUNDAY:
        return False

    return True

Now it’s obvious what the arguments should be, since it has to be a member of the DayOfWeek enum, and what’s more, as soon as I start typing DayOfWeek., my IDE starts offering me suggestions of all the defined days, meaning not only can I type less, but I won’t make silly typo mistakes (snuday) as when using strings.

>>> day_is_workday(DayOfWeek.SUNDAY)
False

Once again the Python documentation on enums is excellent.
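
Of course, values often still arrive as strings (from user input, say), so they have to be converted at some point. One possible approach, sketched below, is to look the member up by name at the boundary and fail loudly on anything unknown. The parse_day helper and the WEEKEND set are names I made up for this illustration.

```python
from enum import Enum, auto


class DayOfWeek(Enum):
    SUNDAY = auto()
    MONDAY = auto()
    TUESDAY = auto()
    WEDNESDAY = auto()
    THURSDAY = auto()
    FRIDAY = auto()
    SATURDAY = auto()


# Enum members are hashable, so they can live in a set.
WEEKEND = {DayOfWeek.SATURDAY, DayOfWeek.SUNDAY}


def parse_day(text: str) -> DayOfWeek:
    """Hypothetical helper: map input such as "sunday" to a DayOfWeek member."""
    try:
        # Enum classes support lookup by member name via indexing.
        return DayOfWeek[text.strip().upper()]
    except KeyError:
        raise ValueError("Unknown day of the week: {0!r}".format(text))


def day_is_workday(day: DayOfWeek) -> bool:
    return day not in WEEKEND


print(day_is_workday(parse_day("sunday")))  # prints False
```

With this in place a typo like parse_day("snuday") raises a ValueError straight away, instead of quietly being treated as a work day.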

Named Tuples

Named tuples aren’t new, but they’re often overlooked due to Python offering other simple options for achieving similar results. They can also be used with type hinting (by importing from the typing module) to give even less scruffiness. There’s one main use case that I’ve found myself implementing recently, and that’s functions with multiple return values. For example, let’s say a function needs to return both a success flag (true or false) and some data, maybe from an HTTP request.

def do_the_request(should_work: bool) -> tuple:
    # of course this should actually do something instead of just switching on an argument
    if should_work:
        success = True
        message = "Everything went fine"
    else:
        success = False
        message = "Everything went very badly"

    return success, message

if __name__ == '__main__':
    outer_success, outer_message = do_the_request(True)
    print("The request good: {0} Message: {1}".format(outer_success, outer_message))

$ python scruffy_examples.py
The request good: True Message: Everything went fine

So far so good, but what if I want to return a status code as well? If I just refactor the return value, and anyone relying on my code has not yet updated, then something like this might happen:

def do_the_request(should_work: bool) -> tuple:
    if should_work:
        success = True
        code = 200
        message = "Everything went fine"
    else:
        success = False
        code = 500
        message = "Everything went very badly"

    return success, code, message

if __name__ == '__main__':
    outer_success, outer_message = do_the_request(True)
    print("The request good: {0} Message: {1}".format(outer_success, outer_message))

$ python scruffy_examples.py
Traceback (most recent call last):
  File "scruffy_examples.py", line 15, in <module>
    outer_success, outer_message = do_the_request(True)
ValueError: too many values to unpack (expected 2)

Breaking changes aren’t good. Let’s refactor the original example to use a NamedTuple.

from typing import NamedTuple


class SimpleResponse(NamedTuple):
    success: bool
    message: str


def do_the_request(should_work: bool) -> SimpleResponse:
    if should_work:
        success = True
        message = "Everything went fine"
    else:
        success = False
        message = "Everything went very badly"

    return SimpleResponse(success, message)

if __name__ == '__main__':
    resp = do_the_request(True)
    print("The request good: {0} Message: {1}".format(resp.success, resp.message))

$ python scruffy_examples.py
The request good: True Message: Everything went fine

Nice. Now, I want to add a status code to the return as well.

from typing import NamedTuple


class SimpleResponse(NamedTuple):
    success: bool
    code: int
    message: str


def do_the_request(should_work: bool) -> SimpleResponse:
    if should_work:
        success = True
        code = 200
        message = "Everything went fine"
    else:
        success = False
        code = 500
        message = "Everything went very badly"

    return SimpleResponse(success, code, message)

if __name__ == '__main__':
    resp = do_the_request(True)
    print("The request good: {0} Message: {1}".format(resp.success, resp.message))

$ python scruffy_examples.py
The request good: True Message: Everything went fine

So without refactoring the code calling the request function, everything is still working OK! And, once again using typing, we are much more explicit on what types of values are being passed around and the IDE can infer a lot more information. We wouldn’t get these type and autocomplete benefits if we had just refactored the request method to return a dictionary.
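
One caveat worth keeping in mind: a NamedTuple is still a tuple underneath. The short sketch below (reusing the three-field SimpleResponse) suggests that callers are only shielded from new fields if they use attribute access; positional unpacking would still break in the same way as before.

```python
from typing import NamedTuple


class SimpleResponse(NamedTuple):
    success: bool
    code: int
    message: str


resp = SimpleResponse(True, 200, "Everything went fine")

# Attribute access does not care how many fields exist or in what order.
print(resp.success, resp.message)

# Positional unpacking still has to match every field.
try:
    success, message = resp
except ValueError as exc:
    print("Unpacking failed:", exc)

# _asdict() gives a mapping when one is needed, e.g. for logging.
print(resp._asdict()["code"])
```

So the stability comes from the calling code referring to fields by name, which the NamedTuple makes possible and the IDE makes easy.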

Another use for named tuples, which I won’t give code examples for here, is when passing a lot of arguments to a function that all need to be passed to other functions it calls. For example, recursively parsing a tree that has the same types of nodes.

Node 1, of type A, has children 2 and 3 of type B, and child 2 has child 4 of type A.

Therefore the parsing of this structure would involve function calls:

parse_node_a(node1, param1, param2, param3) -> parse_node_b(node2, param1, param2, param3) -> parse_node_a(node4, param1, param2, param3)

If we want to add param4, used only when parsing nodes of type A, it needs to be added to every intermediary function (parse_node_b) just to be passed around.

A better solution is to use a NamedTuple to store a context of how the parser behaves:

class ParserContext(NamedTuple):
    param1: bool
    param2: bool
    param3: bool
    param4: bool

Then extra parameters can be added to the context simply, without having to refactor all the method calls as they don’t change:

parse_node_a(node1, ctx) -> parse_node_b(node2, ctx) -> parse_node_a(node4, ctx)
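
For the curious, here is a rough sketch of how that might look in practice. Everything other than ParserContext is invented for the illustration: the Node class, the parse_node dispatcher, and the idea that param4 means "upper-case the names of type A nodes".

```python
from typing import List, NamedTuple, Optional


class ParserContext(NamedTuple):
    param1: bool
    param2: bool
    param3: bool
    param4: bool


class Node:
    """Hypothetical tree node with a kind ("A" or "B"), a name and children."""

    def __init__(self, kind: str, name: str,
                 children: Optional[List["Node"]] = None) -> None:
        self.kind = kind
        self.name = name
        self.children = children or []


def parse_node(node: Node, ctx: ParserContext) -> List[str]:
    """Dispatch to the right parser for the node's kind."""
    if node.kind == "A":
        return parse_node_a(node, ctx)
    return parse_node_b(node, ctx)


def parse_node_a(node: Node, ctx: ParserContext) -> List[str]:
    # Only type A nodes look at param4 (assumed meaning: upper-case the name).
    visited = [node.name.upper() if ctx.param4 else node.name]
    for child in node.children:
        visited.extend(parse_node(child, ctx))
    return visited


def parse_node_b(node: Node, ctx: ParserContext) -> List[str]:
    # Type B nodes never read param4; they just hand ctx on to their children.
    visited = [node.name]
    for child in node.children:
        visited.extend(parse_node(child, ctx))
    return visited


# The tree described above: node1 (A) -> node2 (B), node3 (B); node2 -> node4 (A).
node4 = Node("A", "node4")
node2 = Node("B", "node2", [node4])
node3 = Node("B", "node3")
node1 = Node("A", "node1", [node2, node3])

ctx = ParserContext(param1=True, param2=False, param3=False, param4=True)
print(parse_node_a(node1, ctx))  # prints ['NODE1', 'node2', 'NODE4', 'node3']
```

Adding a param5 later would mean touching ParserContext and whichever function reads it, while the signature of parse_node_b stays exactly as it is.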

Conclusion

Python is terrible, make it better by writing it more like Java.

Just kidding. I just thought I’d write a short post about how enforcing some stricter rules in your Python coding can make your code easier to write and understand.
