Handling application settings hierarchy more easily with Python’s ChainMap

Say you have an application that, as almost every more complex application, exposes some settings that affect how that application behaves. These settings can be configured in several ways, such as specifying them in a configuration file, or by using the system’s environment variables. They can also be provided as command line arguments when starting the application.

If the same setting is provided at more than a single configuration layer, the value from the layer with the highest precedence is taken. This could mean that, for example, configuration file settings override those from environment variables, but command line arguments have precedence over both. If a particular setting is not specified in any of these three layers, its default value is used.

Ordered by priority (highest first), our hierarchy of settings layers looks as follows:

  • command line arguments
  • configuration file
  • environment variables
  • application defaults

A natural way of representing settings is by using a dictionary, one for each layer. An application might thus contain the following:

defaults = {
    'items_per_page': 100,
    'log_level': 'WARNING',
    'max_retries': 5,
}

env_vars = {
    'max_retries': 10
}

config_file = {
    'log_level': 'INFO'
}

cmd_args = {
    'max_retries': 2
}

Putting them all together while taking the precedence levels into account, here’s how the application settings would look like:

>>> settings  # all settings put together
{
    'items_per_page': 100,  # a default value
    'log_level': 'INFO',  # from the config file
    'max_retries': 2,  # from a command line argument
}

There is a point in the code where a decision needs to be made on what setting value to consider. One way of of determining that is by examining the dictionaries one by one until the setting is found:1

setting_name = 'log_level'

for settings in (cmd_args, config_file, env_vars, defaults):
    if setting_name in settings:
        value = settings[setting_name]
        break
else:
    raise ValueError('Setting not found: {}'.format(setting_name))

# do something with value...

A somewhat verbose approach, but at least better than a series of if-elif statements.

An alternative approach is to merge all settings into a single dictionary before using any of their values:

settings = defaults.copy()
for d in (env_vars, config_file, cmd_args):
    settings.update(d)

Mind that here the order of applying the settings layers must be reversed, i.e. the highest priority layers get applied last, so that no lower-level layers can override them. This works quite well, with a possibly only downside that if any of the underlying settings dictionaries get updated, the “main” settings dictionary must be rebuilt, because the changes are not reflected in it automatically:

>>> config_file['items_per_page'] = 25
>>> settings['items_per_page']
100  # change not propagated

Now, I probably wouldn’t be writing about all this, if there didn’t already exist an elegant solution in the standard library. Python 3.32 brought us collections.ChainMap, a handy class that can transparently group multiple dicts together. We just need to pass it all our settings layers (higher-priority ones first), and ChainMap takes care of the rest:

>>> from collections import ChainMap
>>> settings = ChainMap(cmd_args, config_file, env_vars, defaults)
>>> settings['items_per_page']
100
>>> env_vars['items_per_page'] = 25
>>> settings['items_per_page']
25  # yup, automatically propagated

Pretty handy, isn’t it?


  1. In case you spot a mis-indented else block – the indentation is actually correct. It’s a for-else statement, and the else block only gets executed if the loop finishes normally, i.e. without break-ing out of it. 
  2. There is also a polyfill for (some) older Python versions. 

Keyword-only arguments in Python

Python allows you to pass arguments in two ways – either by their position or by their name (keyword):1

def foo(x, y):
    print(x, y)

# the following all print 5 10
foo(5, 10)
foo(5, y=10)
foo(x=5, y=10)  

It works the same if a function accepts keyword arguments:2

def bar(x=5, y=10):
    print(x, y)

# the following all print 4 8
bar(4, 8)
bar(4, y=8)
bar(x=4, y=8)

If a particular keyword argument is not provided, its default value is used:

>>> bar(2)  # y is omitted
2 10

It is quite common to use keyword arguments to define “extra” options that can be passed to a callable, and if omitted, their default value is used. That can improve readability, because it is enough to only pass the options whose value differs from the default:

def cat(x, y, to_upper=False, strip=True):
    """Concatenate given strings."""
    if strip:
       x, y = x.strip(), y.strip()

    result = x + y

    if to_upper:
        result = result.upper()

    return result


# returns 'foobar'
cat('  foo ', 'bar ')

# the following both return '  foo bar '
cat('  foo ', 'bar ', False, False)
cat('  foo ', 'bar ', strip=False)

You will probably agree that the second form is indeed more clean.

Positional arguments pitfalls

The ability to pass keyword arguments by position as demonstrated in the introduction can, if not careful, bite you, especially if you do not have a thorough test coverage in place as a safety net. Let’s say that there is a piece of code which invokes the cat() function using positional arguments only:

cat('  foo ', 'bar ', True)  # returns 'FOOBAR'

Let’s also say that suddenly one of the team members gets an inspiration and decides that it would be great to sort all keyword parameters alphabetically. You know, for readability. Before you can express your doubt, he eagerly refactors the function, swapping the two keyword parameters:

def cat(x, y, strip=True, to_upper=False):
    ...

If you have proper tests in place, good for you, but if you don’t, you might not realize, that this change just introduced a bug:

cat('  foo ', 'bar ', True)  # now returns 'foobar'

The poor 'FOOBAR' return value just got demoted to its lowercase version. This would not have happened if the option would be passed as a keyword argument, i.e. to_upper=True.

Another source of potential errors is accidentally passing an option value through a positional argument. Let’s imagine another contrived scenario where a new team member uses intuition to deduce how the cat() function works. Of course – it’s just a version of sum() adjusted to work with strings!

>>> cat('  foo ', 'bar ', 'baz')  # the original cat
'FOOBAR'

Erm…

The option to_upper was assigned the value 'baz' which is truthy, but it is probably not what the caller intended to achieve.

It can be argued that this behavior is a bit unintuitive, and that it would be nice if we could somehow force the callers to explicitly pass keyword arguments by their name (keyword), and not their position.

Making the arguments keyword-only (Python 3)

The trick is to swallow any redundant positional arguments, preventing them from filling the keyword arguments:

def demo(x, y, *args, separator='___'):
    print(x, y, args, sep=separator)


>>> demo(10, 20, 30, 40)
10___20___(30, 40)

>>> demo(10, 20, 30, 40, separator='---')
10---20---(30, 40)

Any positional arguments beyond the first two (30 and 40) get swallowed by the args tuple, and the only way to specify a different separator is through an explicit keyword argument. To complete the picture, we just need to prevent the callers to pass in too many positional arguments, and we can do this with a simple check if args is not empty:

if args:
    raise TypeError('Too many positional arguments given.')

What’s more, if we omit the variable arguments tuple’s name altogether, we get the above check for free!
Plus a useful error message on top of it, demo:

def demo2(x, y, *, separator='___'):
    print(x, y, sep=separator)                                                                                                                                                                                                                                                                                           


>>> demo2(1, 2, 3, separator=';')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: demo2() takes 2 positional arguments but 3 positional arguments (and 1 keyword-only argument) were given
Keyword-only arguments in Python 2

Unfortunately, if we try the same approach in Python 2, it will complain and raise a syntax error. We cannot specify individual keyword arguments after *args, but we can specify that a function accepts a variable number of keyword arguments, and then manually unpack it:

def foo(x, y, *args, **kwargs):
    option_a = kwargs.pop('option_a', 'default_A')
    option_b = kwargs.pop('option_b', 'default_B')

    if args or kwargs:
        raise TypeError('Too many positional and/or keyword arguments given.')

We also have to manually check if the caller has passed any unexpected positional and/or keyword arguments by testing the args tuple and the kwargs dict (after popping all expected items from it) – both should be empty.

Cumbersome indeed, not to mention that the function signature is not as pretty as it could be. But that’s what we have in Python 2.


  1. The following few examples assume either Python 3, or that print_function is imported from the __future__ module if using Python 2. 
  2. The fourth invocation option, i.e. saying foo(x=5, 10), is not listed, because it is a syntax error – positional arguments must precede any keyword arguments. 
%d bloggers like this: