Python attribute lookup explained in detail

A few months ago I gave a talk at the local Python meetup on how attribute lookup on an object works in Python. It is not as straightforward as it looks on the surface, and I thought it might be an interesting topic to present.

I received highly positive feedback from the listeners, confirming that they learned something new and potentially valuable. I used a Jupyter Notebook for the presentation, and if you prefer running the code examples to reading, you can jump straight into it – just download the notebook and play around with it. It contains quite a few comments, thus the examples should hopefully be self-explanatory.

Storing attributes on an object

Say we have the following class and an instance of it:

class Foo(object):  # a new-style class
    x = 'x of Foo'

foo_inst = Foo()

We can inspect its attributes by peeking into the instance’s __dict__. It is currently empty, because the x from above belongs to the instance’s class:

>>> foo_inst.__dict__
{}

Nevertheless, an attempt to retrieve x from the instance succeeds, because Python finds it in the instance’s class. The lookup is dynamic and a change to a class attribute is also reflected on the instance:

>>> foo_inst.x
'x of Foo'
>>> Foo.x = 'new x of Foo'
>>> foo_inst.x
'new x of Foo'

But what happens when both the instance and its class contain an attribute with the same name? Which one takes precedence? Let’s inject x into the instance and observe the result:

>>> foo_inst.__dict__['x'] = 'x of foo_inst'
>>> foo_inst.__dict__
{'x': 'x of foo_inst'}
>>> foo_inst.x
'x of foo_inst'

No surprises here: x is looked up on the instance first and found there. If we now remove it, x will be picked up from the class again:

>>> del foo_inst.__dict__['x']
>>> foo_inst.__dict__
{}
>>> foo_inst.x
'new x of Foo'

As demonstrated, instance attributes take precedence over class attributes, with a caveat. Contrary to what quite a lot of people think, this is not always the case, and sometimes class attributes shadow the instance attributes. Enter descriptors.

Descriptors

Descriptors are special objects that can alter the interaction with attributes. For an object to be a descriptor, it needs to define at least one of the following special methods: __get__(), __set__(), or __delete__().

class DescriptorX(object):

    def __get__(self, obj, obj_type=None):
        if obj is None:
            print('__get__(): Accessing x from the class', obj_type)
            return self

        print('__get__(): Accessing x from the object', obj)
        return 'X from the descriptor'

    def __set__(self, obj, value):
        print('__set__(): Setting x on the object', obj)
        obj.__dict__['x'] = '{0}|{0}'.format(value)

The class DescriptorX conforms to this definition, so we can assign an instance of it to Foo.x to turn the attribute x into a descriptor:

>>> Foo.x = DescriptorX()
>>> Foo.__dict__['x']
<__main__.DescriptorX at 0x7fa0b2ff3790>

Accessing a descriptor does not simply return it, as is the case with non-descriptor attributes. Instead, Python invokes its __get__() method and returns the result.

>>> Foo.x
# prints: __get__(): Accessing x from the class <class '__main__.Foo'>
<__main__.DescriptorX at 0x7fa0b2ff3790>

Even though the result is actually the descriptor itself, the extra line printed to output tells us that its __get__() method was indeed invoked, returning the descriptor.

Besides self, the __get__() method receives two arguments: the instance on which the attribute was looked up (None when the attribute is accessed on the class), and the “owner” class, i.e. the class containing the descriptor instance.

Let’s see what happens if we access a descriptor on an instance of a class, and that instance also contains an attribute with the same name:

>>> foo_inst.__dict__['x'] = 'x of foo_inst is back'
>>> foo_inst.__dict__
{'x': 'x of foo_inst is back'}
>>> foo_inst.x
# prints: __get__(): Accessing x from the object <__main__.Foo object at 0x7fe2bc613350>
'X from the descriptor'

The result might surprise you – the descriptor (defined on the class) took precedence over the instance attribute!
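The examples above never exercise the __set__() method of DescriptorX. The following self-contained sketch re-creates the descriptor, with the print() calls dropped for brevity. It shows that assignment is routed through __set__(). Reading the attribute still goes through __get__(), because a descriptor defining __set__() wins over the instance __dict__:

```python
class DescriptorX(object):
    def __get__(self, obj, obj_type=None):
        if obj is None:
            return self  # accessed on the class: return the descriptor itself
        return 'X from the descriptor'

    def __set__(self, obj, value):
        # Store a transformed value directly in the instance __dict__
        obj.__dict__['x'] = '{0}|{0}'.format(value)


class Foo(object):
    x = DescriptorX()


foo_inst = Foo()
foo_inst.x = 'abc'        # the assignment is routed through DescriptorX.__set__()
print(foo_inst.__dict__)  # {'x': 'abc|abc'}
print(foo_inst.x)         # X from the descriptor
```

The value stored by __set__() does end up in the instance __dict__. However, normal attribute access never returns it while the descriptor sits on the class.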

Overriding and non-overriding descriptors

The story does not end here, however – sometimes a descriptor does not take precedence:

>>> del DescriptorX.__set__
>>> foo_inst.x
'x of foo_inst is back'

It turns out there are actually two kinds of descriptors:

  • Data descriptors (overriding) – they define the __set__() and/or the __delete__() method (normally __get__() as well) and take precedence over instance attributes.
  • Non-data descriptors (non-overriding) – they define only the __get__() method and are shadowed by an instance attribute of the same name.
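The sketch below makes the distinction concrete with a small classification helper. The helper name kind_of() and the two descriptor classes are made up for illustration. The standard library’s inspect.isdatadescriptor() serves only as a cross-check. Note that the special methods are looked up on the attribute’s type, not on the attribute itself:

```python
import inspect


def kind_of(attr):
    """Classify a class attribute roughly the way the lookup algorithm does."""
    attr_type = type(attr)  # special methods are looked up on the type
    if hasattr(attr_type, '__set__') or hasattr(attr_type, '__delete__'):
        return 'data descriptor'
    if hasattr(attr_type, '__get__'):
        return 'non-data descriptor'
    return 'not a descriptor'


class GetOnly(object):
    def __get__(self, obj, obj_type=None):
        return 'GetOnly value'


class GetAndSet(GetOnly):
    def __set__(self, obj, value):
        raise AttributeError('read-only attribute')


for attr in (GetOnly(), GetAndSet(), 'a plain string'):
    print(type(attr).__name__, '->', kind_of(attr), inspect.isdatadescriptor(attr))
```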

If descriptor behavior seems similar to a property to you, it is because properties are actually implemented as descriptors behind the scenes. The same goes for class methods, ORM attributes on data models, and several other constructs.

from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base  # SQLAlchemy 1.4+ (formerly sqlalchemy.ext.declarative)

Base = declarative_base()

class SomeClass(Base):
    __tablename__ = 'some_table'
    id = Column(Integer, primary_key=True)  # replaced by a descriptor once the class is mapped
    name = Column(String(50))  # ditto

    @property  # creates a descriptor under the name "foo"
    def foo(self):  
        return 'foo'

    @classmethod  # creates a descriptor, too
    def make_instance(cls, **kwargs):
        return cls(**kwargs)
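The same precedence rules apply to built-in descriptors, with no ORM required. In this sketch the class Widget is hypothetical. A property is a data descriptor, so it beats an instance attribute of the same name. An ordinary method is a non-data descriptor, so it gets shadowed:

```python
class Widget(object):
    @property
    def size(self):  # a property is a data descriptor
        return 'size from the property'

    def render(self):  # a plain function is a non-data descriptor
        return 'render from the method'


w = Widget()
# Write straight into the instance __dict__, bypassing the descriptors
w.__dict__['size'] = 'size from the instance'
w.__dict__['render'] = 'render from the instance'

print(w.size)    # size from the property
print(w.render)  # render from the instance
```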

Traversing the inheritance hierarchy

Sometimes an attribute does not exist on a class/instance, but Python does not give up just yet. It continues searching the parent classes, as the attribute might be found there.

Consider the following hierarchy:

class A(object): pass

class B(object):
    x = 'x from B'

class C(A, B): pass

class D(B):
    x = 'x from D'

class E(C, D): pass

Or in a picture, because a picture is worth a thousand words:

[Figure: Class hierarchy]

The attribute x is defined on both class D and class B. If we access it on an instance of E, which does not have it, the lookup still succeeds:

>>> e_inst = E()
>>> e_inst.x
'x from D'

The thing to observe here is that the lookup algorithm is apparently not depth-first search, otherwise x would be found on B first. Nor is it breadth-first search, otherwise x would still be picked from D in the following example. Instead, it is retrieved from A:

>>> A.x = 'x from A'
>>> e_inst.x
'x from A'

The actual lookup order can be seen by inspecting the method resolution order (MRO) of a class:

>>> E.__mro__
(__main__.E, __main__.C, __main__.A, __main__.D, __main__.B, object)

Python uses the C3 linearization algorithm to construct it and to decide how to traverse the class hierarchy.
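For the curious, here is a compact pure-Python sketch of the C3 merge. This is my own illustration. CPython implements the real algorithm in C, with more careful error reporting. The sketch rebuilds the hierarchy from above and checks the result against E.__mro__:

```python
def c3_linearize(cls):
    """Return the MRO of cls as a list, computed with the C3 algorithm."""
    if cls is object:
        return [object]
    # Merge the linearizations of all bases plus the list of bases itself
    seqs = [c3_linearize(base) for base in cls.__bases__] + [list(cls.__bases__)]
    result = [cls]
    while True:
        seqs = [seq for seq in seqs if seq]  # drop exhausted sequences
        if not seqs:
            return result
        for seq in seqs:
            head = seq[0]
            # A valid head must not appear in the tail of any sequence
            if not any(head in other[1:] for other in seqs):
                break
        else:
            raise TypeError('Cannot create a consistent method resolution order')
        result.append(head)
        # Remove the chosen head from the front of every sequence
        seqs = [seq[1:] if seq[0] is head else seq for seq in seqs]


class A(object): pass

class B(object):
    x = 'x from B'

class C(A, B): pass

class D(B):
    x = 'x from D'

class E(C, D): pass


print([klass.__name__ for klass in c3_linearize(E)])  # ['E', 'C', 'A', 'D', 'B', 'object']
print(c3_linearize(E) == list(E.__mro__))             # True
```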

What about metaclasses?


Interlude – what is a metaclass?
Putting it simply, a metaclass is a “thing” that can create a new class in the same way a class can be used to create new objects, i.e. instances of itself:

  • metaclass() -> a new Class (an instance of metaclass)
  • Class() -> a new object (an instance of Class)

Or by example:

# "AgeMetaclass" inherits from the metaclass "type",
# thus "AgeMetaclass" is a metaclass, too
class AgeMetaclass(type):
    age = 18

# create an instance of a metaclass to produce a class
Person = AgeMetaclass('Person', (object,), {'age': 5})  # name, base classes, class attributes

# the above is the same as using the standard class definition syntax
# (Python 2 only; Python 3 treats __metaclass__ as an ordinary class attribute):
class Person(object):
    __metaclass__ = AgeMetaclass
    age = 5

# NOTE: in Python 3 the metaclass is specified as a keyword argument instead:
class Person(metaclass=AgeMetaclass):
    age = 5

If an attribute is found on a class, its metaclass does not interfere, nor does it interfere when looking up an attribute on an instance:

>>> Person.age
5
>>> john_doe = Person()
>>> john_doe.age
5

On the other hand, if an attribute is not found on the class, it is looked up on its metaclass:

>>> del Person.age
>>> Person.age
18

There is a caveat, however – a metaclass is not considered when accessing an attribute on a class instance:

>>> john_doe.age
# AttributeError: 'Person' object has no attribute 'age'

The lookup only goes one level up: it inspects the class of an instance, or the metaclass of a class, but not the “indirect metaclass”[1] of a class instance.
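Here is a self-contained Python 3 version of the example above, trimmed to the essentials. It shows all three cases. The last line reaches the “indirect metaclass” explicitly via type(type(...)), which the normal lookup never does:

```python
class AgeMetaclass(type):
    age = 18


class Person(metaclass=AgeMetaclass):
    pass  # no "age" on the class itself this time


john_doe = Person()

print(Person.age)                # 18, found on the metaclass
print(hasattr(john_doe, 'age'))  # False, instance lookup never reaches the metaclass
print(type(type(john_doe)).age)  # 18, the "indirect metaclass" consulted explicitly
```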

What happens if an attribute is not found?

Python does not give up just yet. If implemented, it uses the __getattr__() hook on the class as a fallback.

class Product(object):
    def __init__(self, label):
        self.label = label

    def __getattr__(self, name):
        print('attribute "{}" not found, but giving you a foobar tuple!'.format(name))
        return ('foo', 'bar')

Let’s access an attribute that exists, and then an attribute that does not:

>>> chair = Product('dining chair DC-745')
>>> chair.label
'dining chair DC-745'
>>> chair.manufacturer
# prints: attribute "manufacturer" not found, but giving you a foobar tuple!
('foo', 'bar')

Because of the fallback, the AttributeError was not raised. Just keep in mind that defining __getattr__() on an instance instead of on a class will not work:

>>> del Product.__getattr__
>>> chair.__getattr__ = lambda self, name: 'instance __getattr__'
>>> chair.unknown_attr
# AttributeError: 'Product' object has no attribute 'unknown_attr'

NOTE: __getattr__() or __getattribute__()?

__getattr__() should not be confused with __getattribute__(). The former is a fallback for missing attributes, as demonstrated above. The latter is the method invoked on every attribute access, i.e. whenever the “dot” operator is used. It implements the lookup algorithm explained in this post, but can be overridden and customized. The default implementation lives in the C function _PyObject_GenericGetAttrWithDict().

Most of the time, however, it is probably the __getattr__() method that you want to override.

Summary

Accessing an attribute on a (new-style) class instance invokes the __getattribute__() method that performs the following:

  • Check the class hierarchy using MRO (but do not examine metaclasses):
    • If a data (overriding) descriptor is found in the class hierarchy, call its __get__() method;
  • Otherwise check the instance __dict__ (assuming no __slots__ for the sake of example). If the attribute is there, return it;
  • If the attribute is not in instance.__dict__ but was found in the class hierarchy:
    • If it is a (non-data) descriptor, call its __get__() method;
    • If it is not a descriptor, return the attribute itself;
  • If still not found, invoke __getattr__(), if implemented on the class (strictly speaking, Python calls it when __getattribute__() raises AttributeError);
  • Finally give up and raise AttributeError.
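The steps above can be condensed into a simplified pure-Python model. This is only a sketch for illustration, and the lookup() function is hypothetical. It deliberately ignores __slots__, user-defined __getattribute__() overrides and other corner cases that the real C implementation handles. The demo at the end checks the model against Python’s own getattr():

```python
def lookup(obj, name):
    """Simplified model of attribute lookup on an instance (sketch only)."""
    obj_type = type(obj)

    # Search the class hierarchy in MRO order (metaclasses are not examined)
    class_attr, found_in_class = None, False
    for klass in obj_type.__mro__:
        if name in vars(klass):
            class_attr, found_in_class = vars(klass)[name], True
            break
    attr_type = type(class_attr)  # special methods are looked up on the type

    # A data (overriding) descriptor wins over the instance __dict__
    if found_in_class and hasattr(attr_type, '__get__') and (
            hasattr(attr_type, '__set__') or hasattr(attr_type, '__delete__')):
        return attr_type.__get__(class_attr, obj, obj_type)

    # Otherwise check the instance __dict__
    instance_dict = obj.__dict__ if hasattr(obj, '__dict__') else {}
    if name in instance_dict:
        return instance_dict[name]

    if found_in_class:
        if hasattr(attr_type, '__get__'):  # non-data descriptor
            return attr_type.__get__(class_attr, obj, obj_type)
        return class_attr  # not a descriptor: return it as is

    # Fall back to __getattr__(), looked up on the class, not on the instance
    for klass in obj_type.__mro__:
        if '__getattr__' in vars(klass):
            return vars(klass)['__getattr__'](obj, name)

    raise AttributeError('{!r} object has no attribute {!r}'.format(
        obj_type.__name__, name))


class Demo(object):
    shared = 'plain class attribute'

    @property
    def prop(self):
        return 'from the property'

    def meth(self):
        return 'from the method'

    def __getattr__(self, name):
        return 'fallback for ' + name


demo = Demo()
# Write straight into the instance __dict__, bypassing any descriptors
demo.__dict__.update(prop='instance prop', meth='instance meth', own='instance attr')

for attr_name in ('prop', 'meth', 'shared', 'own', 'missing'):
    print(attr_name, '->', lookup(demo, attr_name))
    assert lookup(demo, attr_name) == getattr(demo, attr_name)
```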

  1. I totally made this term up; do not use it in conversation when trying to sound smart. :) 

Applying a decorator to a class method results in an error

You might have seen this one before. You wrote a decorator in Python and tried to apply it to a class method (or to a static method, at least before Python 3.10 made static method objects callable), only to see an error.

from functools import wraps

def logged(func):
    """A decorator printing a message before invoking the wrapped function."""
    @wraps(func)
    def wrapped_func(*args, **kwargs):
        print('Invoking', func)
        return func(*args, **kwargs)
    return wrapped_func


class Foo(object):
    @logged
    @classmethod
    def get_name(cls):
        return cls.__name__

As the docstring explains, the logged decorator simply prints a message before invoking the decorated function. Here it is applied to the get_name() class method of the class Foo. The @wraps decorator makes sure the original function’s metadata is copied to the wrapper function returned by the decorator (see the functools.wraps documentation).

But although this is essentially a textbook example of a decorator in Python, invoking the get_name() method results in an error (using Python 3 below):

>>> Foo.get_name()
Invoking <classmethod object at 0x7f8e7473e0f0>
Traceback (most recent call last):
    ...
TypeError: 'classmethod' object is not callable

If you just want to quickly fix this issue because it annoys you, here’s the TL;DR fix: swap the order of the decorators, making sure that the @classmethod decorator is applied last (i.e. it is listed first, at the top):

class Foo(object):
    @classmethod
    @logged
    def get_name(cls):
        return cls.__name__

>>> Foo.get_name()
Invoking <function Foo.get_name at 0x7fce90356c80>
'Foo'

On the other hand, if you are curious what is actually happening behind the scenes, please keep reading.

The first thing to note is the line printed in each example immediately after calling Foo.get_name(). Our decorator prints the object it is about to invoke, and in the non-working example that object is actually not a function!

Invoking <classmethod object at 0x7f8e7473e0f0>

Instead, the thing that our decorator tries to invoke is a “classmethod” object, but the latter is not callable, causing the Python interpreter to complain.
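It helps to spell out what the stacked decorators mean. Decorators are applied bottom-up, so the two class definitions above roughly desugar to the following. This is a standalone sketch outside the class body; the names broken, working and Holder are made up for illustration:

```python
from functools import wraps


def logged(func):
    """The same logged decorator as above."""
    @wraps(func)
    def wrapped_func(*args, **kwargs):
        print('Invoking', func)
        return func(*args, **kwargs)
    return wrapped_func


def get_name(cls):
    return cls.__name__


# Stacked decorators are applied bottom-up, so the broken order is equivalent to:
broken = logged(classmethod(get_name))   # logged() wraps a classmethod object
# ...while the working order is equivalent to:
working = classmethod(logged(get_name))  # logged() wraps a plain function

print(callable(classmethod(get_name)))   # False
```

Calling broken() ends up calling the classmethod object directly, which raises the TypeError shown earlier. Placing working in a class body behaves like the fixed version.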

Meet descriptors

Let’s take a closer look at a stripped-down version of the Foo class:

class Foo(object):
    @classmethod
    def get_name(cls):
        return cls.__name__

>>> thing = Foo.__dict__['get_name']
>>> thing
<classmethod object at 0x7f295ffc6d30>
>>> hasattr(thing, '__get__')
True
>>> callable(thing)
False

As it turns out, the object stored under the name get_name in the class __dict__ is not callable, i.e. we cannot call it directly and expect it to work. The presence of the __get__ attribute also tells us that it is a descriptor.

Descriptors are objects that behave differently from “normal” attributes. When a descriptor is accessed, its __get__() method gets called behind the scenes, returning the actual value. The following two expressions are thus equivalent:

>>> Foo.get_name
<bound method Foo.get_name of <class '__main__.Foo'>>
>>> Foo.__dict__['get_name'].__get__(None, Foo)
<bound method Foo.get_name of <class '__main__.Foo'>>

__get__() gets called with two arguments: the object instance the attribute belongs to, and the owner class, i.e. the one the descriptor is defined on (Foo in this case)[1]. The instance is None here, because we access the attribute through the class.

The classmethod descriptor binds the original get_name() function to its class (Foo) and returns a bound method object. When the latter is called, it invokes get_name(), passing class Foo as the first argument (cls) along with any other arguments the bound method was called with.
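Here is a rough pure-Python approximation of what classmethod does, to make this less abstract. The name my_classmethod is hypothetical. The real built-in is implemented in C and handles more cases, but the core idea is the same: __get__() uses types.MethodType to return a method bound to the owner class.

```python
import types


class my_classmethod(object):
    """Rough pure-Python approximation of the built-in classmethod."""

    def __init__(self, func):
        self.__func__ = func  # the original, undecorated function

    def __get__(self, obj, obj_type=None):
        if obj_type is None:
            obj_type = type(obj)
        # Bind the function to the owner class, not to the instance
        return types.MethodType(self.__func__, obj_type)


class Foo(object):
    @my_classmethod
    def get_name(cls):
        return cls.__name__


print(Foo.get_name())    # Foo
print(Foo().get_name())  # Foo, even when accessed through an instance
print(callable(Foo.__dict__['get_name']))  # False, just like the built-in
```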

Armed with this knowledge, it is now clear why our logged decorator from the beginning does not always work. It assumes that the object passed to it is directly callable, and does not take the descriptor protocol into account.
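In the narrow case of classmethod and staticmethod, one possible fix is for the decorator to unwrap the descriptor. It can decorate the underlying function, available as __func__, and then wrap the result again. This hypothetical sketch merely automates the “swap the order” fix. It does not handle other descriptors and is not a general solution:

```python
from functools import wraps


def logged(func):
    """Hypothetical variant of logged that also accepts classmethod and
    staticmethod objects (a narrow sketch, not a general solution)."""
    if isinstance(func, (classmethod, staticmethod)):
        # Decorate the underlying function, then re-wrap it in the same descriptor type
        return type(func)(logged(func.__func__))

    @wraps(func)
    def wrapped_func(*args, **kwargs):
        print('Invoking', func)
        return func(*args, **kwargs)
    return wrapped_func


class Foo(object):
    @logged
    @classmethod
    def get_name(cls):
        return cls.__name__

    @logged
    @staticmethod
    def greet(name):
        return 'Hello, ' + name


print(Foo.get_name())      # prints an "Invoking ..." line first, then Foo
print(Foo.greet('world'))  # prints an "Invoking ..." line first, then Hello, world
```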

Making it right

Describing how to adjust the logged decorator to work correctly is quite a lengthy topic, and out of the scope of this post. If you are interested, you should definitely read the blog series by Graham Dumpleton, as it addresses many more aspects than just working well with classmethods. Or just use his wrapt library for writing decorators:

import wrapt

@wrapt.decorator
def logged(wrapped, instance, args, kwargs):
    print('Invoking', wrapped)
    return wrapped(*args, **kwargs)

class Foo(object):
    @logged
    @classmethod
    def get_name(cls):
        return cls.__name__

>>> Foo.get_name()
Invoking <bound method Foo.get_name of <class 'main2.Foo'>>
'Foo'

Yup, it works.


  1. On the other hand, if a descriptor object is retrieved directly from the class’s __dict__, the descriptor’s __get__() method is bypassed. That’s why we used Foo.__dict__['get_name'] in a few places in the examples. 