Whoa, it’s been quite a while since my last blog post. This is, of course, not my fault (it never is 😛); let’s instead blame the architect of this world, who gave us only 24 hours in a single day. That’s not nearly enough to do all the things one wants! Anyhow, now that I have optimistically deluded myself into believing that the excuse I just gave is convincing, I would like to share a tip for dealing with maybe-missing dictionary data.
This situation is quite common when receiving JSON data from external APIs, which is normally handed to you, the programmer, already parsed into a dictionary. Reading values from such dictionaries requires some caution to avoid unnecessary errors:
```python
data = dict(foo=1, bar=2)
baz_value = data['baz']  # KeyError
```
The obvious, but somewhat naive, way of dealing with this is to check whether `baz` exists before reading it:
```python
baz_value = data['baz'] if 'baz' in data else None
```
It works, but it’s verbose, and there is already a `dict.get(key[, default])` method that does exactly that: it returns the value stored under `key` if such a key exists, and otherwise returns the `default` value (which defaults to `None`). It is thus common practice to just use `get()` instead:
```python
baz_value = data.get('baz')
```
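To make the behavior of `get()` concrete, here is a small sketch using a made-up dictionary in which `bar` is explicitly set to `None` (the data is an assumption chosen for illustration):

```python
# Hypothetical data: "bar" exists but holds None, "baz" does not exist at all.
data = {"foo": 1, "bar": None}

present = data.get("foo")             # key exists: the stored value, 1
missing = data.get("baz")             # key missing, no default given: None
missing_default = data.get("baz", 0)  # key missing, explicit default: 0
explicit_none = data.get("bar")       # key exists but holds None: also None
```

Note that `missing` and `explicit_none` end up equal, even though only one of the two keys actually exists in `data`.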
However, there are cases when one actually needs to distinguish between missing values and values explicitly set to `None`. If `get()` returns `None`, it is not clear why it did so without an additional check. It is also not advisable to use a different default value that can “never” possibly represent an actual value in the dictionary, as that assumption can change and break our code:
```python
baz_value = data.get('baz', 'NO VALUE')
if baz_value == 'NO VALUE':
    pass  # handle the "missing" case
else:
    pass  # handle the "normal" case
```
If `baz` somehow happens to end up with the value `'NO VALUE'` in `data`, the code above will not work as intended. One solution is to again use an explicit membership test, and only read the value if that test succeeds:
```python
if 'baz' not in data:
    pass  # handle the "missing" case
else:
    baz_value = data['baz']
    # handle the "normal" case
```
The downside is that we need two dictionary lookups: one for the membership test, and another to actually retrieve the `baz` value.
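To make the lookup count visible, the sketch below uses a hypothetical `CountingDict`, a `dict` subclass written only for this illustration, which increments a counter whenever `in`, indexing, or `get()` is used on it. It counts method calls as a stand-in for lookups, not actual hash-table probes:

```python
class CountingDict(dict):
    """Hypothetical dict subclass that counts lookups (illustration only)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.lookups = 0

    def __contains__(self, key):
        self.lookups += 1  # the `key in d` membership test
        return super().__contains__(key)

    def __getitem__(self, key):
        self.lookups += 1  # the `d[key]` indexing
        return super().__getitem__(key)

    def get(self, key, default=None):
        self.lookups += 1  # a single get() call
        return super().get(key, default)


data = CountingDict(foo=1, baz=3)

# Membership test followed by indexing: two lookups.
if "baz" in data:
    value = data["baz"]
two_step = data.lookups

data.lookups = 0

# A single get() call: one lookup.
value = data.get("baz")
one_step = data.lookups
```

Overriding `get()` is needed here because the built-in `dict.get()` does not go through a subclass's `__getitem__`, so it would otherwise not be counted at all.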
Fortunately, there is an elegant fix for this: sentinels. A value picked as a sentinel must always be distinguishable from the data. One option is to generate a unique string (with, say, `str(uuid.uuid4())`), which is “unique enough” for practical purposes; another is to instantiate a new object and test for identity:
```python
MISSING = object()  # a sentinel

baz_value = data.get('baz', MISSING)
if baz_value is MISSING:
    pass  # handle the "missing" case
elif baz_value is None:
    pass  # handle the "normal" case when None
else:
    pass  # handle all other "normal" cases
```
The explicit check for `None` is optional, and you might not even need it, but it is included in the snippet above to demonstrate all three possible outcomes and how they can be handled with just a single dictionary lookup.
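Putting it together, here is a sketch that wraps the sentinel check in a small helper and runs it against data parsed from JSON, tying back to the API scenario from the beginning. The helper name `describe` and its return labels are made up for this example. JSON `null` is parsed into Python's `None`, and a stored `'NO VALUE'` string no longer confuses the check:

```python
import json

# A sentinel: a fresh object that no parsed JSON value can ever be identical to.
MISSING = object()


def describe(data, key):
    """Classify `key` in `data` with a single dictionary lookup.

    Hypothetical helper; returns "missing", "none" or "value"
    (labels made up for this example).
    """
    value = data.get(key, MISSING)  # the only lookup
    if value is MISSING:
        return "missing"
    elif value is None:
        return "none"
    else:
        return "value"


# JSON null becomes None, so "bar" is present but explicitly None.
payload = json.loads('{"foo": 1, "bar": null, "qux": "NO VALUE"}')

results = {key: describe(payload, key) for key in ("foo", "bar", "baz", "qux")}
# results == {"foo": "value", "bar": "none", "baz": "missing", "qux": "value"}
```

Because `MISSING` is a brand-new object, the `is` test can only succeed when `get()` falls back to the default, so no value coming out of `json.loads()` can be mistaken for a missing key.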