How Python computes 2 + 5 under the hood (part 2)

(see also Part 1 of this post)

Representing Python objects at C level

In CPython, Python objects are represented as C structs. While the remaining struct members vary with the object type, every object struct starts with at least the following two members, the so-called PyObject_HEAD:

ob_refcnt – the number of references to the object. It is used for garbage collection: objects that are no longer referred to by anything should be cleaned up to avoid memory leaks.
ob_type – a pointer to a type object, a special object that describes the type of the object pointing to it.
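Both members can be observed from Python itself. The following is a minimal sketch, assuming CPython: sys.getrefcount() reports the reference count (the absolute number is an implementation detail and includes temporary references, so only the difference is meaningful here), and type() returns the type object that ob_type points to. The helper name refcount_delta is made up for this illustration.

```python
import sys

def refcount_delta():
    """Return how much the reference count grows when one more reference is stored."""
    obj = object()                   # a fresh object; small ints may be immortal on newer CPython
    before = sys.getrefcount(obj)
    holder = [obj]                   # the list now holds one additional reference to obj
    after = sys.getrefcount(obj)
    return after - before

print(refcount_delta())              # 1 on CPython
print(type(2) is int)                # True
print(type(int) is type)             # True: type objects are objects with a type, too
```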

The segment of the interpreter code for the BINARY_ADD instruction that was omitted for brevity in Part 1 is the following:

if (PyUnicode_CheckExact(left) &&
         PyUnicode_CheckExact(right)) {
    sum = unicode_concatenate(left, right, f, next_instr);
    /* unicode_concatenate consumed the ref to left */
}
else {
    sum = PyNumber_Add(left, right);
    Py_DECREF(left);
}
Py_DECREF(right);

Here Python checks if the left and right operands are both Unicode instances, i.e. strings. It does that by inspecting their type objects. If both operands are indeed strings, it performs string concatenation on them, but for anything else the PyNumber_Add() function gets called. Since the operands 2 and 5 in our case are integers, this is exactly what happens. There is also some reference count management (the Py_DECREF() macro), but we will not dive into that.
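The branch above can be modelled roughly in Python. This is only a sketch of the decision, not the interpreter's code: the function name binary_add is hypothetical, str.__add__ stands in for unicode_concatenate(), operator.add stands in for PyNumber_Add(), and reference counting is left out entirely.

```python
import operator

def binary_add(left, right):
    """Rough Python model of the BINARY_ADD branch shown above."""
    # Exact type check, like PyUnicode_CheckExact(): str subclasses do not qualify.
    if type(left) is str and type(right) is str:
        return str.__add__(left, right)    # stands in for unicode_concatenate()
    return operator.add(left, right)       # stands in for PyNumber_Add()

print(binary_add(2, 5))        # 7
print(binary_add("2", "5"))    # 25
```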

PyNumber_Add() first tries to perform the add operation on the given operands v and w (two pointers to PyObject) by invoking binary_op1(v, w, NB_SLOT(nb_add)). If the result of that call is Py_NotImplemented, it further tries to concatenate the operands as sequences. This is not the case with integers, however, so let’s have a look at the binary_op1() function located in the Objects/abstract.c file:

static PyObject *
binary_op1(PyObject *v, PyObject *w, const int op_slot)
{
    PyObject *x;
    binaryfunc slotv = NULL;
    binaryfunc slotw = NULL;

    if (v->ob_type->tp_as_number != NULL)
        slotv = NB_BINOP(v->ob_type->tp_as_number, op_slot);
    if (w->ob_type != v->ob_type &&
        w->ob_type->tp_as_number != NULL) {
        slotw = NB_BINOP(w->ob_type->tp_as_number, op_slot);
        if (slotw == slotv)
            slotw = NULL;
    }
    if (slotv) {
        if (slotw && PyType_IsSubtype(w->ob_type, v->ob_type)) {
            x = slotw(v, w);
            if (x != Py_NotImplemented)
                return x;
            Py_DECREF(x); /* can't do it */
            slotw = NULL;
        }
        x = slotv(v, w);
        if (x != Py_NotImplemented)
            return x;
        Py_DECREF(x); /* can't do it */
    }
    if (slotw) {
        x = slotw(v, w);
        if (x != Py_NotImplemented)
            return x;
        Py_DECREF(x); /* can't do it */
    }
    Py_RETURN_NOTIMPLEMENTED;
}
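The Py_NotImplemented value that binary_op1() returns, and that PyNumber_Add() checks for, is visible from Python as the NotImplemented singleton. A small sketch, calling int's __add__ method directly to see what the integer addition slot answers for different operands:

```python
import operator

print((2).__add__(5))            # 7
print((2).__add__("5"))          # NotImplemented: int cannot add a str
print(operator.add([2], [5]))    # [2, 5]
```

The last line goes through the generic add operation, which succeeds for lists even though they are not numbers; this matches the sequence concatenation fallback described above.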

Delegating the work to the right function

The binary_op1() function expects references to two Python objects and the binary operation that should be performed on them. The actual function that will perform this operation is obtained with the following:

NB_BINOP(v->ob_type->tp_as_number, op_slot)

Remember how each PyObject contains a reference to another object describing the former’s type, i.e. the ob_type struct member? For integers this is the PyLong_Type located in Objects/longobject.c.

PyLong_Type has the tp_as_number member, a reference to a structure holding pointers to all “number” methods available on Python int objects (integers in Python 3 are what is known as the long type in Python 2):

static PyNumberMethods long_as_number = {
    (binaryfunc)long_add,       /*nb_add*/
    (binaryfunc)long_sub,       /*nb_subtract*/
    (binaryfunc)long_mul,       /*nb_multiply*/
    long_mod,                   /*nb_remainder*/
    ...
};

Finally there is the NB_BINOP(nb_methods, slot) macro that picks a particular method from this list. Since in our case binary_op1() is invoked with NB_SLOT(nb_add) as the third argument, the function for adding two integers is returned.
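To make the lookup more tangible, here is a toy model in Python. Everything in it is hypothetical: NumberMethods stands in for the PyNumberMethods struct, the lambdas stand in for long_add() and friends, and an attribute name plays the role of the slot. In the C source, NB_SLOT() yields a byte offset into the struct and NB_BINOP() reads the function pointer stored at that offset.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class NumberMethods:
    """Hypothetical stand-in for the PyNumberMethods struct."""
    nb_add: Optional[Callable] = None
    nb_subtract: Optional[Callable] = None
    nb_multiply: Optional[Callable] = None

# Toy counterpart of long_as_number; the lambdas stand in for the real C functions.
int_methods = NumberMethods(
    nb_add=lambda v, w: v + w,
    nb_subtract=lambda v, w: v - w,
    nb_multiply=lambda v, w: v * w,
)

def nb_binop(methods, slot):
    """Pick one function out of the table by slot name."""
    return getattr(methods, slot)

add_func = nb_binop(int_methods, "nb_add")
print(add_func(2, 5))    # 7
```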

Now, with two operands in the expression left + right, a decision needs to be made about which operand's type should supply the addition function that computes the result. As explained in a helpful comment above the binary_op1() function, the order is as follows:

1. If the type of right is a strict subclass of the type of left, right.add(left, right) is tried.
2. left.add(left, right) is tried.
3. right.add(left, right) is tried (unless it has already been tried in the first step).

Python tries to do its best to obtain a meaningful result, i.e. something other than NotImplemented, and if one of the operands does not support the operation, the other one is tried, too.
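The same ordering can be observed at the Python level, where the counterparts of the slot functions are the __add__ and __radd__ methods. The classes below are made up for this illustration; note that for classes written in Python the details are handled by a generic slot wrapper, so this shows the observable behaviour rather than the exact path through binary_op1().

```python
class Base:
    def __add__(self, other):
        return "Base.__add__"

    def __radd__(self, other):
        return "Base.__radd__"

class Derived(Base):
    def __radd__(self, other):           # overrides the reflected method
        return "Derived.__radd__"

class Meters:
    def __init__(self, value):
        self.value = value

    def __radd__(self, other):           # only the right-hand side knows how to add
        return Meters(other + self.value)

print(Base() + Base())          # Base.__add__
print(Base() + Derived())       # Derived.__radd__: the subclass on the right goes first
print((2 + Meters(5)).value)    # 7: int gives up with NotImplemented, Meters takes over
```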

Nailing it

So which function is the one that actually computes the sum of 2 and 5 in the end?

It’s the long_add() function implemented in Objects/longobject.c. It is perhaps a bit more complex than expected, because it needs to support the addition of integers of arbitrary length while still being fast for integers small enough to fit into a CPU register.
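To get a feeling for what adding integers of arbitrary length involves, here is a schoolbook addition over lists of digits. It is a sketch under stated assumptions, not the code of long_add(): it handles non-negative numbers only, the helper names are made up, and the digit size of 30 bits is an assumption (sys.int_info.bits_per_digit reports the value your interpreter actually uses).

```python
SHIFT = 30                  # assumed bits per digit
MASK = (1 << SHIFT) - 1

def to_digits(n):
    """Split a non-negative int into base-2**SHIFT digits, least significant first."""
    digits = []
    while n:
        digits.append(n & MASK)
        n >>= SHIFT
    return digits or [0]

def from_digits(digits):
    """Inverse of to_digits()."""
    n = 0
    for d in reversed(digits):
        n = (n << SHIFT) | d
    return n

def add_digits(a, b):
    """Add two digit lists, propagating the carry from digit to digit."""
    if len(a) < len(b):
        a, b = b, a                      # make a the longer operand
    result, carry = [], 0
    for i, da in enumerate(a):
        carry += da + (b[i] if i < len(b) else 0)
        result.append(carry & MASK)      # the low bits become the result digit
        carry >>= SHIFT                  # the overflow moves on to the next digit
    if carry:
        result.append(carry)
    return result

print(from_digits(add_digits(to_digits(2), to_digits(5))))    # 7
```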

Whoa! After all the digging down the rabbit hole we finally found the right function. That is quite a lot of extra work for an operation as simple as addition, but it is the price we pay for Python’s dynamic nature. Remember that the same add(x, y) function we wrote in Part 1 of this post works out of the box with different operand types, and I hope the mechanisms behind the scenes that allow for this are now clearer.

>>> add(2, 5)
7
>>> add('2', '5')
'25'
>>> add([2], [5])
[2, 5]

As always, comments, suggestions, praise, and (constructive) criticism are all welcome. Thanks for reading!
