Python Tip of the Day - Double If's in Comprehensions

Posted on Wed 11 April 2018 in Posts

So I was reviewing a coworkers pull request today and saw something I hadn't seen in Python before. As it turns out you can have multiple if clauses on a list comprehension. For example:

>>> [v for v in range(50) if v % 2 == 0 if v > 10]  # all even numbers between 10 & 50
[12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48]

I wondered if this was mentioned in the standard docs , and sure enough:

A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses.

(emphasis added) That is, you can have as many if clauses as appropriate. It is roughly the same as:

>>> result = []
>>> for v in range(50):
...    if v % 2 == 0:
...        if v > 10:
...            result.append(v)
...
>>> result
[12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48]

Astute readers will note that the multiple if clauses is equivalent to and-ing multiple boolean expressions together. Ie that example is the same as:

>>> [v for v in range(50) if v % 2 == 0 and v > 10]
[12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48]

For me, this raised the question about order of evaluation, and if multiple if clauses do boolean short-circuiting like an and statement does. So let's test that out:

>>> def foo():
...    raise ValueError()
...
>>> False and foo()
False
>>> [x for x in range(10) if False and foo()]
[]
>>> [x for x in range(10) if False if foo()]
[]

And we can see because we don't see an exception raised, that foo() was never called, so yes it is only evaluating the second if clause if the first evaluates to True.

If you want to really convince yourself they're equivalent, you can use the dis module to see the equivalent bytecode for the two snippets:

>>> import dis
>>> dis.dis("[x for x in range(10) if False and foo()]")
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x10c4aeed0, file "<dis>", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_NAME                0 (range)
              8 LOAD_CONST               2 (10)
             10 CALL_FUNCTION            1
             12 GET_ITER
             14 CALL_FUNCTION            1
             16 RETURN_VALUE
>>> dis.dis("[x for x in range(10) if False if foo()]")
  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x10c4aeed0, file "<dis>", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_NAME                0 (range)
              8 LOAD_CONST               2 (10)
             10 CALL_FUNCTION            1
             12 GET_ITER
             14 CALL_FUNCTION            1
             16 RETURN_VALUE

From this we can see that the corresponding bytecode is identical.

Probably unsurprisingly these apply to not only list comprehensions but set and dict comprehensions as well:

>>> names = ["adam", "bob", "andrew", "adam", "fred"]
>>> [name for name in names if len(name) == 4 if name.startswith("a")]
['adam', 'adam']
>>> {name for name in names if len(name) == 4 if name.startswith("a")}
{'adam'}
>>> {name: name.title() for name in names if len(name) == 4 if name.startswith("a")}
{'adam': 'Adam'}

The first snippet gets all names that are four characters long & start with the letter "a". Because it's a list, duplicates are matched. The second does the same, but as a set comprehension and since it's a set, duplicates are filtered out, so we only see "adam" once. The last one creates a dict mapping the name as it's in the original list to the titlecased version of the name.