How Python computes 2 + 5 under the hood (part 2)

(see also Part 1 of this post)

Representing Python objects at C level

In CPython, Python objects are represented as C structs. While struct members can vary depending on the object type, all PyObject instances contain at least the following two members, i.e. the so-called PyObject_HEAD:

  • ob_refcnt – the number of references to the object. Used for garbage
    collection purposes, since the objects that are not referred by anything anymore
    should be cleaned up to avoid memory leaks.
  • ob_type – a pointer to a type object, which is a special object describing
    the referencing object’s type.

The segment of the interpreter code for the BINARY_ADD instruction that was omitted for brevity in Part 1 is the following:

if (PyUnicode_CheckExact(left) &&
         PyUnicode_CheckExact(right)) {
    sum = unicode_concatenate(left, right, f, next_instr);
    /* unicode_concatenate consumed the ref to left */
else {
    sum = PyNumber_Add(left, right);

Here Python checks if the left and right operands are both Unicode instances, i.e. strings. It does that by inspecting their type objects. If both operands are indeed strings, it performs string concatenation on them, but for anything else the PyNumber_Add() function gets called. Since the operands 2 and 5 in our case are integers, this is exactly what happens. There is also some reference count management (the Py_DECREF() macro), but we will not dive into that.

PyNumberAdd() first tries to perform the add operation on the given operands v and w (two pointers to PyObject) by invoking binary_op1(v, w, NB_SLOT(nb_add)). If the result of that call is Py_NotImplemented, it further tries to concatenate the operands as sequences. This is not the case with integers, however, so let’s have a look at the binary_op1() function located in Objects/abstract.c file:

static PyObject *
binary_op1(PyObject *v, PyObject *w, const int op_slot)
    PyObject *x;
    binaryfunc slotv = NULL;
    binaryfunc slotw = NULL;

    if (v->ob_type->tp_as_number != NULL)
        slotv = NB_BINOP(v->ob_type->tp_as_number, op_slot);
    if (w->ob_type != v->ob_type &&
        w->ob_type->tp_as_number != NULL) {
        slotw = NB_BINOP(w->ob_type->tp_as_number, op_slot);
        if (slotw == slotv)
            slotw = NULL;
    if (slotv) {
        if (slotw && PyType_IsSubtype(w->ob_type, v->ob_type)) {
            x = slotw(v, w);
            if (x != Py_NotImplemented)
                return x;
            Py_DECREF(x); /* can't do it */
            slotw = NULL;
        x = slotv(v, w);
        if (x != Py_NotImplemented)
            return x;
        Py_DECREF(x); /* can't do it */
    if (slotw) {
        x = slotw(v, w);
        if (x != Py_NotImplemented)
            return x;
        Py_DECREF(x); /* can't do it */
Delegating the work to the right function

The binary_op1() function expects references to two Python objects and the binary operation that should be performed on them. The actual function that will perform this operation is obtained with the following:

NB_BINOP(v->ob_type->tp_as_number, op_slot)

Remember how each PyObject contains a reference to another object describing the former’s type, i.e. the ob_type struct member? For integers this is the PyLong_Type located in Objects/longobject.c.

PyLong_Type has the tp_as_number member, a reference to a structure holding pointers to all “number” methods available on Python int objects (integers in Python 3 are what is known as the long type in Python 2):

static PyNumberMethods long_as_number = {
    (binaryfunc)long_add,       /*nb_add*/
    (binaryfunc)long_sub,       /*nb_subtract*/
    (binaryfunc)long_mul,       /*nb_multiply*/
    long_mod,                   /*nb_remainder*/

Finally there is the NB_BINOP(nb_methods, slot) macro that picks a particular method from this list. Since in our case binary_op1() is invoked with NB_SLOT(nb_add) as the third argument, the function for adding two integers is returned.

Now, with two operands in the expression left + right, a decision needs to be made which operand should be used to pick the addition function from to compute the result. As explained in a helpful comment above the binary_op1() function, the order is as follows:

  • If right is a strict subclass of left, right.add(left, right) is tried.
  • left.add(left, right) is tried.
  • right.add(left, right) is tried (unless it hast already been tried in the first step).

Python tries to do its best to obtain a meaningful result, i.e. something other than NotImplemented, and if one of the operands does not support the operation, the other one is tried, too.

Nailing it

So which function is the one that actually computes the sum of 2 and 5 in the end?

It’s the long_add() function implemented in Objects/longobject.c. It is perhaps a bit more complex than expected, because it needs to support the addition of integers of arbitrary length, and still performing fast for integers small enough to fit into a CPU register.

Whoa! After all the digging down the rabbit hole we finally found the right function. Quite a lot of extra work for such a simple operation the addition is, but that’s the price we have to pay in order to get the Python’s dynamic nature in exchange. Remember that the same add(x, y) function we wrote in Part 1 of this post works out of the box with different operand types, and I hope the mechanisms behind the scenes that allow for this are now more clear.

>>> add(2, 5)
>>> add('2', '5')
>>> add([2], [5])
[2, 5]

As always, comments, suggestions, praise, and (constructive) criticism are all welcome. Thanks for reading!

How Python computes 2 + 5 under the hood (part 1)

Suppose we have a very simple Python function that accepts two arguments and returns their sum, and let’s name this function with an (un)imaginative name add:

def add(x, y):
    return x + y

>>> add (2, 5)

As a bonus, since Python is a dynamic language, the function also works with (some) other argument types out of the box. If given, say, two sequences, it returns their concatenation:

>>> add([1, 2], [3, 4])
[1, 2, 3, 4]
>>> add('foo', 'bar')

How does this work, you might ask? What happens behind the scenes when we invoke add()? We will see this in a minute.

Python 3.7 will be used in the examples (currently in beta 1 at the time of writing).

The dis module

For a start, we will inspect the add() function by using a handy built-in module dis.

>>> import dis
>>> bytecode = dis.Bytecode(add)
>>> print(
Name:              add
Filename:          <stdin>
Argument count:    2
Kw-only arguments: 0
Number of locals:  2
Stack size:        2
   0: None
Variable names:
   0: x
   1: y

Besides peeking into function’s metadata, we can also disassemble it:

>>> dis.dis(add)
  2           0 LOAD_FAST                0 (x)
              2 LOAD_FAST                1 (y)
              4 BINARY_ADD
              6 RETURN_VALUE

Disassembling shows that the function is comprised of four primitive bytecode instructions, which are understood and interpreted by the Python virtual machine.

Python is a stack-based machine

In CPython implementation, the interpreter is a stack-based machine, meaning that it does not have registers, but instead uses a stack to perform the computations.

The first bytecode instruction, LOAD_FAST, pushes a reference to a particular local variable onto the stack, and the single argument to the instruction specifies which variable that is. LOAD_FAST 0 thus picks a reference to x, because x is the first local variable, i.e. at index 0, which can be also seen from the function’s metadata presented just above.

Similarly, LOAD_FAST 1 pushes a reference to y onto the stack, resulting in the following state after the first two bytecode instructions have been executed:

    +---+  <-- TOS (Top Of Stack)
    | y |
    | x |

The next instruction, BINARY_ADD, takes no arguments. It simply takes the top two elements from the stack, performs an addition on them, and pushes the result of the operation back onto the stack.

At the end, RETURN_VALUE takes whatever the remaining element on the stack is, and returns that element to the caller of the add() function.

Going even deeper (enter the C level)

The bytecode instructions themselves are also just an abstraction, and something needs to make sense of them. That “something” is the Python interpreter. In CPython, its reference implementation, this is a program written in C that loops through the bytecode given to it, and interprets the instructions in it one by one.

The heart of this machinery is implemented in Python/ceval.c file. It runs an infinite loop that contains a giant switch statement with each case (target) handling on of the possible bytecode operations.

This is how the code for the instruction BINARY_ADD looks like:

    PyObject *right = POP();
    PyObject *left = TOP();
    PyObject *sum;

    /* the computation of the "sum" omitted */

    if (sum == NULL)
        goto error;

POP(), TOP(), and SET_TOP() are convenience C macros that perform primitive interpreter stack operations such as popping the top value from the stack, or replacing the current TOS (Top Of Stack) value with a different one.

The code above is actually pretty straightforward. It pops the right-hand operand from the top of the stack, which is a reference to y in our case, and stores it under the name right. It then also stores a reference to the left-hand side operand, i.e. x, that became the new TOS.

After performing the calculation, it sets the sum, i.e. a reference to the result, as the new TOS, performs a quick error check, and dispatches the control to the next bytecode instruction in the line.

In Part 2 the representation of Python objects at C level is explained, and how adding two such objects is done.

Comparing objects of different types in Python 2

On the project I currently work on, we were recently (briefly) bitten by an interesting bug that initially slipped through the tests and code review. We had a custom database-mapped type that had two attributes defined, let’s call them value_low and value_high. As their names suggest, the value of the former should be lower than the value of the latter. The model type also defined a validator that would enforce this rule, and assure internal consistency of the model instances.

A simplified version of the type is shown below:

from sqlalchemy import Column, Integer
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import validates

Base = declarative_base()

class MyModel(Base):
  __tablename__ = 'my_models'

  id = Column(Integer, primary_key=True)
  value_low = Column(Integer)
  value_high = Column(Integer)

  @validates('value_low', 'value_high')
  def validate_thresholds(self, key, value):
    """Assure that relative ordering of both values remains consistent."""
    if key == 'value_low' and value >= self.value_high:
      raise ValueError('value_low must be strictly lower than value_high')
    elif key == 'value_high' and value <= self.value_low:
       raise ValueError('value_high must be strictly higher than value_low')

    return value

The model definition seems pretty sensible, and it works, too:

>>> instance = MyModel(value_low=2, value_high=10)
>>> instance.value_low, instance.value_high
(2, 10)

Except when you run it on another machine with the same Python version:

>>> instance = MyModel(value_low=2, value_high=10)
Traceback (most recent call last):
ValueError: value_low must be strictly lower than value_high

Uhm, what?

After a bit of research, the following behavior can be noticed, even on the machine where the code initially seemed to work just fine:

>>> instance = MyModel()
>>> instance.value_high = 10
>>> instance.value_low = 2
>>> instance.value_low, instance.value_high
(2, 10)

# now let's try to reverse the order of assignments
>>> instance = MyModel()
>>> instance.value_low = 2
# ValueError ...

What’s going on here? Why is the order of attribute value assignments important?

This, by the way, explains why the first example worked on one machine, but failed on another. The values passed to MyModel() as keyword arguments were applied in a different order1.

Debugging showed that if value_low is set first when value_high is still None, the following expression evaluates to True (key equals to value_low), resulting in a failed validation:

# MIND: value == 2
if key == 'value_low' and value >= self.value_high:
  raise ValueError(...)

On the other hand, if we first set value_high when value_low is None, the corresponding if statement condition evaluates to False and the error does not get raised. Stripping it down to the bare essentials gives us the following2:

>>> 2 >= None
>>> 10 <= None

If we further explore this, it gets even better:

>>> None < -1
>>> None < -float('inf')
>>> None < 'foobar'
>>> 'foobar' > -1
>>> 'foobar' > 42
>>> tuple >= (lambda x: x)
>>> type > Ellipsis

Oh… of course. It’s Python2 that we use on the project (doh).

You see, Python2 is perfectly happy to compare objects of different types, even when that does not make much sense. Peeking into the CPython source code reveals that None is smaller than anything:

/* None is smaller than anything */
if (v == Py_None)
    return -1;
if (w == Py_None)
    return 1;

Different types are ordered by name, with number types being smaller than other types3.

Now that we know this, the bug from the beginning makes perfect sense. It was just a coincidence that high_value could be set on an instance, because the validator’s check for an error (value <= self.value_low) would never return True when self.value_low is None, because the latter is smaller than everything.

In the end the issue was, fortunately, quickly discovered, and fixing it was straightforward. We just needed to add an extra check:

if (
    key == 'value_low' and
    self.value_high is not None and  # <-- THIS
    value >= self.value_high):
# and the same for the other if...

And the application worked correctly and happily ever after…

  1. That’s probably because when creating a new instance of the model, SQLAlchemy loops over the kwargs dict somewhere down the rabbit hole, but the order of the dictionary keys is arbitrary. 
  2. Python2 only. In Python3 we would get TypeError: unorderable types
  3. Provided that types do not define their own custom rich comparison methods, of course.