Python attribute lookup explained in detail

A few months ago I gave a talk at the local Python meetup on how attribute lookup on an object works in Python. It is not as straightforward as it looks on the surface, and I thought it might be an interesting topic to present.

I received highly positive feedback from the listeners, confirming that they had learned something new and potentially valuable. I used a Jupyter Notebook for the presentation, and if you prefer running the code examples to reading, you can jump straight into it – just download the notebook and play around with it. It contains quite a few comments, so the examples should hopefully be self-explanatory.

Storing attributes on an object

Say we have the following instance:

class Foo(object):  # a new-style class
    x = 'x of Foo'

foo_inst = Foo()

We can inspect its attributes by peeking into the instance's __dict__. It is currently empty, because the x from above belongs to the instance's class:

>>> foo_inst.__dict__
{}

Nevertheless, an attempt to retrieve x from the instance succeeds, because Python finds it in the instance’s class. The lookup is dynamic and a change to a class attribute is also reflected on the instance:

>>> foo_inst.x
'x of Foo'
>>> Foo.x = 'new x of Foo'
>>> foo_inst.x
'new x of Foo'

But what happens when both the instance and its class contain an attribute with the same name? Which one takes precedence? Let’s inject x into the instance and observe the result:

>>> foo_inst.__dict__['x'] = 'x of foo_inst'
>>> foo_inst.__dict__
{'x': 'x of foo_inst'}
>>> foo_inst.x
'x of foo_inst'

No surprises here: x is looked up on the instance first and found there. If we now remove it, x will be picked up from the class again:

>>> del foo_inst.__dict__['x']
>>> foo_inst.__dict__
{}
>>> foo_inst.x
'new x of Foo'

As demonstrated, instance attributes take precedence over class attributes – with a caveat. Contrary to what quite a lot of people think, this is not always the case, and sometimes class attributes shadow the instance attributes. Enter descriptors.

Descriptors

Descriptors are special objects that can alter the interaction with attributes. For an object to be a descriptor, it needs to define at least one of the following special methods: __get__(), __set__(), or __delete__().

class DescriptorX(object):

    def __get__(self, obj, obj_type=None):
        if obj is None:
            print('__get__(): Accessing x from the class', obj_type)
            return self

        print('__get__(): Accessing x from the object', obj)
        return 'X from the descriptor'

    def __set__(self, obj, value):
        print('__set__(): Setting x on the object', obj)
        obj.__dict__['x'] = '{0}|{0}'.format(value)

The class DescriptorX conforms to this definition, and we can assign an instance of it to Foo.x to turn the attribute x into a descriptor:

>>> Foo.x = DescriptorX()
>>> Foo.__dict__['x']
<__main__.DescriptorX at 0x7fa0b2ff3790>

Accessing a descriptor does not simply return it, as is the case with non-descriptor attributes. Instead, Python invokes the descriptor's __get__() method and returns the result.

>>> Foo.x
# prints: __get__(): Accessing x from the class <class '__main__.Foo'>
<__main__.DescriptorX at 0x7fa0b2ff3790>

Even though the result is actually the descriptor itself, the extra line printed to output tells us that its __get__() method was indeed invoked, returning the descriptor.

Besides self, the __get__() method receives two arguments – the instance on which the attribute was looked up (None when the attribute is accessed on a class), and the "owner" class, i.e. the class through which the attribute was accessed (the class containing the descriptor instance, or one of its subclasses).
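To see exactly what __get__() receives, we can write a tiny descriptor that simply returns its arguments. The names Recorder and Bar are made up for this sketch. The last lines show that the dot operator does roughly the same thing as calling __get__() by hand on the object stored in the class __dict__:

```python
class Recorder(object):
    """A non-data descriptor that returns the arguments __get__() received."""

    def __get__(self, obj, obj_type=None):
        return (obj, obj_type)


class Bar(object):
    attr = Recorder()


bar = Bar()

# accessed on the class: obj is None, obj_type is the class
print(Bar.attr == (None, Bar))   # True
# accessed on an instance: obj is the instance, obj_type is its class
print(bar.attr == (bar, Bar))    # True

# roughly what the dot operator does behind the scenes
descriptor = Bar.__dict__['attr']
print(descriptor.__get__(bar, Bar) == bar.attr)   # True
print(descriptor.__get__(None, Bar) == Bar.attr)  # True
```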

Let’s see what happens if we access a descriptor on an instance of a class, and that instance also contains an attribute with the same name:

>>> foo_inst.__dict__['x'] = 'x of foo_inst is back'
>>> foo_inst.__dict__
{'x': 'x of foo_inst is back'}
>>> foo_inst.x
# prints: __get__(): Accessing x from the object <__main__.Foo object at 0x7fe2bc613350>
'X from the descriptor'

The result might surprise you – the descriptor (defined on the class) took precedence over the instance attribute!
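Because a data descriptor wins over the instance __dict__, it can even keep its per-instance values in that very __dict__, under its own name, much like DescriptorX.__set__() does above. The following is a minimal sketch of that pattern. The Upper descriptor, the Book class, and passing the storage name explicitly are all assumptions made for illustration:

```python
class Upper(object):
    """Data descriptor that keeps the raw value in the instance __dict__."""

    def __init__(self, name):
        self.name = name  # key under which values are stored in obj.__dict__

    def __get__(self, obj, obj_type=None):
        if obj is None:  # accessed on the class: return the descriptor itself
            return self
        # read the raw value from the instance __dict__ and transform it
        return obj.__dict__[self.name].upper()

    def __set__(self, obj, value):
        # store the raw value under the same name as the descriptor
        obj.__dict__[self.name] = value


class Book(object):
    title = Upper('title')


book = Book()
book.title = 'dune'   # handled by Upper.__set__()
print(book.__dict__)  # {'title': 'dune'}
print(book.title)     # DUNE: Upper.__get__() ran despite the __dict__ entry
```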

Overriding and non-overriding descriptors

The story does not end here, however – sometimes a descriptor does not take precedence:

>>> del DescriptorX.__set__
>>> foo_inst.x
'x of foo_inst is back'

It turns out there are actually two kinds of descriptors:

  • Data descriptors (overriding) – they define the __set__() and/or the __delete__() method (and normally __get__() as well) and take precedence over instance attributes.
  • Non-data descriptors (non-overriding) – they define only the __get__() method and are shadowed by an instance attribute of the same name.
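The distinction can also be checked programmatically. Like the interpreter, the check below inspects the descriptor's type rather than the descriptor instance, because special methods are looked up on the type. The helper descriptor_kind() and the classes GetOnly, GetAndSet and Baz are hypothetical, made up for this sketch:

```python
def descriptor_kind(attr):
    """Classify a value found in a class __dict__ (hypothetical helper)."""
    attr_type = type(attr)  # special methods are looked up on the type
    if hasattr(attr_type, '__set__') or hasattr(attr_type, '__delete__'):
        return 'data descriptor'
    if hasattr(attr_type, '__get__'):
        return 'non-data descriptor'
    return 'not a descriptor'


class GetOnly(object):
    def __get__(self, obj, obj_type=None):
        return 'from GetOnly'


class GetAndSet(GetOnly):  # inherits __get__() and adds __set__()
    def __set__(self, obj, value):
        raise AttributeError('read-only')


class Baz(object):
    a = GetOnly()
    b = GetAndSet()


baz = Baz()
baz.__dict__.update(a='instance a', b='instance b')

print(descriptor_kind(GetOnly()))    # non-data descriptor
print(descriptor_kind(GetAndSet()))  # data descriptor
print(descriptor_kind('plain'))      # not a descriptor
print(baz.a)  # instance a (the non-data descriptor is shadowed)
print(baz.b)  # from GetOnly (the data descriptor wins)
```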

If descriptor behavior reminds you of properties, that is because properties are actually implemented as descriptors behind the scenes. The same goes for class methods, ORM attributes on data models, and several other constructs, as in this SQLAlchemy model:

from sqlalchemy import Column, Integer, String
# SQLAlchemy 1.4+; older versions import it from sqlalchemy.ext.declarative
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class SomeClass(Base):
    __tablename__ = 'some_table'
    id = Column(Integer, primary_key=True)  # replaced by a descriptor when mapped
    name = Column(String(50))  # replaced by a descriptor, too

    @property  # creates a descriptor under the name "foo"
    def foo(self):
        return 'foo'

    @classmethod  # creates a descriptor, too
    def make_instance(cls, **kwargs):
        return cls(**kwargs)
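This can be verified with built-in types alone, without SQLAlchemy. The property type always defines __set__() and __delete__() (raising AttributeError when no setter or deleter was given), so every property is a data descriptor. By contrast, classmethod, staticmethod and plain functions define only __get__(), so they are non-data descriptors. The Widget class below is made up for this sketch:

```python
import types


class Widget(object):
    @property
    def foo(self):
        return 'foo from the property'

    @classmethod
    def make(cls):
        return cls()

    def method(self):
        return 'method'


# property defines __set__(), even without a setter: a data descriptor
print(hasattr(property, '__set__'))            # True
# classmethod, staticmethod and functions only define __get__(): non-data
print(hasattr(classmethod, '__set__'))         # False
print(hasattr(staticmethod, '__set__'))        # False
print(hasattr(types.FunctionType, '__get__'))  # True
print(hasattr(types.FunctionType, '__set__'))  # False

w = Widget()
w.__dict__['foo'] = 'foo from the instance'
w.__dict__['method'] = 'method from the instance'
print(w.foo)     # foo from the property (data descriptor wins)
print(w.method)  # method from the instance (the function is shadowed)
```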

Traversing the inheritance hierarchy

Sometimes an attribute exists neither on an instance nor on its class, but Python does not give up just yet. It continues searching the parent classes, as the attribute might be found there.

Consider the following hierarchy:

class A(object): pass

class B(object):
    x = 'x from B'

class C(A, B): pass

class D(B):
    x = 'x from D'

class E(C, D): pass

Or in a picture, because a picture is worth a thousand words (object is omitted for brevity):

A     B
 \   / \
  \ /   \
   C     D
    \   /
     \ /
      E

The attribute x is defined on both class D and class B. If we access it on an instance of E that does not have it itself, the lookup still succeeds:

>>> e_inst = E()
>>> e_inst.x
'x from D'

The thing to observe here is that the lookup algorithm is evidently not a depth-first search, otherwise x would be found on B first. Nor is it a breadth-first search, otherwise x would still be picked from D in the following example, but it is instead retrieved from A:

>>> A.x = 'x from A'
>>> e_inst.x
'x from A'
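To make the comparison concrete, here is a sketch that computes a naive left-to-right depth-first order and a breadth-first order over __bases__, and compares them with the order Python actually uses (E.__mro__). The helper functions are made up for illustration, and object is left out of the naive orders for brevity:

```python
from collections import deque


class A(object): pass
class B(object): x = 'x from B'
class C(A, B): pass
class D(B): x = 'x from D'
class E(C, D): pass


def dfs_order(cls):
    """Left-to-right depth-first order, skipping duplicates (hypothetical)."""
    order = []

    def visit(c):
        if c is object or c in order:
            return
        order.append(c)
        for base in c.__bases__:
            visit(base)

    visit(cls)
    return order


def bfs_order(cls):
    """Breadth-first order, skipping duplicates (hypothetical)."""
    order, queue = [], deque([cls])
    while queue:
        c = queue.popleft()
        if c is object or c in order:
            continue
        order.append(c)
        queue.extend(c.__bases__)
    return order


def names(classes):
    return [c.__name__ for c in classes]


print(names(dfs_order(E)))  # ['E', 'C', 'A', 'B', 'D']: B comes before D
print(names(bfs_order(E)))  # ['E', 'C', 'D', 'A', 'B']: D comes before A
print(names(c for c in E.__mro__ if c is not object))  # ['E', 'C', 'A', 'D', 'B']
```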

The actual lookup order can be seen by inspecting the method resolution order (MRO) of a class:

>>> E.__mro__
(__main__.E, __main__.C, __main__.A, __main__.D, __main__.B, object)

Python uses the C3 linearization algorithm to construct it and to decide how to traverse the class hierarchy.
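C3 fits in a few lines of Python. The following is a simplified, illustrative re-implementation, not CPython's actual code, and the function name c3_linearize is made up. The linearization of a class is the class itself followed by a merge of its bases' linearizations and the list of its bases. The merge step repeatedly takes the first head that does not appear in the tail of any other list:

```python
def c3_linearize(cls):
    """Return the C3 linearization of cls as a list (illustrative sketch)."""
    bases = list(cls.__bases__)
    # lists to merge: the linearization of every base, plus the bases themselves
    sequences = [c3_linearize(base) for base in bases] + [bases]
    result = [cls]
    while True:
        sequences = [seq for seq in sequences if seq]  # drop exhausted lists
        if not sequences:
            return result
        for seq in sequences:
            head = seq[0]
            # a valid head must not appear in the tail of any list
            if not any(head in other[1:] for other in sequences):
                break
        else:
            raise TypeError('inconsistent hierarchy, no C3 linearization exists')
        result.append(head)
        # remove the chosen head from the front of every list
        sequences = [seq[1:] if seq[0] is head else seq for seq in sequences]


class A(object): pass
class B(object): pass
class C(A, B): pass
class D(B): pass
class E(C, D): pass

print([c.__name__ for c in c3_linearize(E)])  # ['E', 'C', 'A', 'D', 'B', 'object']
print(c3_linearize(E) == list(E.__mro__))     # True
```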

What about metaclasses?


Interlude – what is a metaclass?
Simply put, a metaclass is a "thing" that can create a new class in the same way a class can be used to create new objects, i.e. instances of itself:

  • metaclass() —> a new Class (an instance of metaclass)
  • Class() —> a new object (an instance of Class)

Or by example:

# "AgeMetaclass" inherits from the metaclass "type",
# thus "AgeMetaclass" is a metaclass, too
class AgeMetaclass(type):
    age = 18

# create an instance of a metaclass to produce a class
Person = AgeMetaclass('Person', (object,), {'age': 5})  # name, base classes, class attributes

# the above is the same as using the standard class definition syntax
# (Python 2 only, Python 3 ignores the __metaclass__ attribute):
class Person(object):
    __metaclass__ = AgeMetaclass
    age = 5

# NOTE: in Python 3 the metaclass is specified as a keyword argument instead:
class Person(metaclass=AgeMetaclass):
    age = 5

If an attribute is found on a class, its metaclass does not interfere (unless the metaclass defines a data descriptor with the same name). Nor does it interfere when looking up an attribute on an instance:

>>> Person.age
5
>>> john_doe = Person()
>>> john_doe.age
5

On the other hand, if an attribute is not found on the class, it is looked up on its metaclass:

>>> del Person.age
>>> Person.age
18

There is a caveat, however – a metaclass is not considered when accessing an attribute on a class instance:

>>> john_doe.age
# AttributeError: 'Person' object has no attribute 'age'

The lookup only goes one layer up: it inspects the class of an instance, or the metaclass of a class, but not the "indirect metaclass"[1] of a class instance.
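One way to see why: instance lookup searches only type(instance).__mro__. Class lookup additionally searches type(cls).__mro__, i.e. the metaclass hierarchy, and the metaclass never appears in the class's own MRO. The following Python 3 sketch reproduces the examples above. It also shows the one case where a metaclass does interfere with an attribute found on the class, mirroring the descriptor rules from earlier: a data descriptor (here a property) defined on the metaclass takes precedence. LoudMeta and Dog are made-up names:

```python
class AgeMetaclass(type):
    age = 18


class Person(metaclass=AgeMetaclass):
    pass  # no "age" of its own this time


john_doe = Person()

print(type(Person) is AgeMetaclass)    # True: the class is an instance of the metaclass
print(AgeMetaclass in Person.__mro__)  # False: the metaclass is not a base class
print(Person.age)                      # 18, found via type(Person).__mro__

try:
    john_doe.age  # only type(john_doe).__mro__, i.e. Person's MRO, is searched
except AttributeError as exc:
    print('AttributeError:', exc)


class LoudMeta(type):
    @property  # a data descriptor on the metaclass
    def age(cls):
        return 'age from the metaclass property'


class Dog(metaclass=LoudMeta):
    age = 3


print(Dog.age)    # age from the metaclass property (wins over the class attribute)
print(Dog().age)  # 3 (instances never look at the metaclass)
```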

What happens if an attribute is not found?

Python does not give up just yet. If implemented, it uses the __getattr__() hook on the class as a fallback.

class Product(object):
    def __init__(self, label):
        self.label = label

    def __getattr__(self, name):
        print('attribute "{}" not found, but giving you a foobar tuple!'.format(name))
        return ('foo', 'bar')

Let’s access an attribute that exists, and then an attribute that does not:

>>> chair = Product('dining chair DC-745')
>>> chair.label
'dining chair DC-745'
>>> chair.manufacturer
# prints: attribute "manufacturer" not found, but giving you a foobar tuple!
('foo', 'bar')

Because of the fallback, no AttributeError was raised. Just keep in mind that defining __getattr__() on an instance instead of on a class will not work:

>>> del Product.__getattr__
>>> chair.__getattr__ = lambda name: 'instance __getattr__'
>>> chair.unknown_attr
# AttributeError: 'Product' object has no attribute 'unknown_attr'

NOTE: __getattr__() or __getattribute__()?

__getattr__() should not be confused with __getattribute__(). The former is a fallback for missing attributes, as demonstrated above, while the latter is the method that gets invoked on every attribute access, i.e. whenever the "dot" operator is used. It implements the lookup algorithm explained in this post, but can be overridden and customized. In CPython, the default implementation lives in the C function _PyObject_GenericGetAttrWithDict().

Most of the time, however, it is probably the __getattr__() method that you want to override.

Summary

Accessing an attribute on a (new-style) class instance invokes the __getattribute__() method, which performs the following steps:

  • Check the class hierarchy using the MRO (but do not examine metaclasses):
    • If a data (overriding) descriptor is found in the class hierarchy, call its __get__() method;
  • Otherwise, check the instance __dict__ (assuming no __slots__ for the sake of example). If the attribute is there, return it;
  • If the attribute is not in instance.__dict__ but was found in the class hierarchy:
    • If it is a (non-data) descriptor, call its __get__() method;
    • If it is not a descriptor, return the attribute itself;
  • If still not found, __getattribute__() raises AttributeError, and Python falls back to __getattr__(), if implemented on the class;
  • Finally, if there is no __getattr__(), give up and let the AttributeError propagate.
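The steps from the summary can be condensed into a simplified pure-Python emulation. This is only a sketch under the same assumptions as the list above (new-style classes, no __slots__), and the function names are made up. CPython's real implementation in C handles more edge cases:

```python
_MISSING = object()  # sentinel: None could be a legitimate attribute value


def find_in_mro(cls, name):
    """Return the first match for name in the class hierarchy, or _MISSING."""
    for klass in cls.__mro__:  # metaclasses are deliberately not examined
        if name in klass.__dict__:
            return klass.__dict__[name]
    return _MISSING


def lookup(obj, name):
    """Simplified emulation of attribute lookup on an instance."""
    cls = type(obj)
    cls_attr = find_in_mro(cls, name)
    attr_type = type(cls_attr)
    is_data = hasattr(attr_type, '__set__') or hasattr(attr_type, '__delete__')
    # 1. a data descriptor in the class hierarchy wins
    if cls_attr is not _MISSING and is_data and hasattr(attr_type, '__get__'):
        return attr_type.__get__(cls_attr, obj, cls)
    # 2. then the instance __dict__ (assumes no __slots__)
    if name in obj.__dict__:
        return obj.__dict__[name]
    # 3. then a non-data descriptor or a plain class attribute
    if cls_attr is not _MISSING:
        if hasattr(attr_type, '__get__'):
            return attr_type.__get__(cls_attr, obj, cls)
        return cls_attr
    # 4. fall back to __getattr__() defined on the class, if any
    getattr_hook = find_in_mro(cls, '__getattr__')
    if getattr_hook is not _MISSING:
        return getattr_hook(obj, name)
    # 5. give up
    raise AttributeError('{!r} object has no attribute {!r}'.format(cls.__name__, name))


class Data(object):
    def __get__(self, obj, obj_type=None):
        return 'from data descriptor'

    def __set__(self, obj, value):
        raise AttributeError('read-only')


class NonData(object):
    def __get__(self, obj, obj_type=None):
        return 'from non-data descriptor'


class Demo(object):
    d = Data()
    n = NonData()
    plain = 'plain class attribute'

    def __getattr__(self, name):
        return 'fallback for ' + name


demo = Demo()
demo.__dict__.update(d='instance d', n='instance n')
for name in ('d', 'n', 'plain', 'missing'):
    # the emulation agrees with the real lookup for each case
    print(name, '->', lookup(demo, name), lookup(demo, name) == getattr(demo, name))
```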

  1. I totally made this term up, do not use it in a conversation when trying to sound smart. :)