List containment checks PREMIUM

Series: Data Structures

Trey Hunner

3 min. read • 2 min. video • Python 3.10—3.14 • Dec. 9, 2022

Watch as video

01:43

Show captions

Autoplay

Auto-expand

Let's talk about containment checks on lists in Python.

Using the `in` operator on a list

Have you ever used the in operator to check whether a list contains a certain value?

This works:

>>> colors = ["purple", "green", "yellow", "blue"]
>>> "red" in colors
False

But what if you're doing this in a loop with a lot of items?

Containment checks on lists are relatively slow

Here we have a program called anadromes.py where we're looking for anadromes in a large list of words:

with open("words.txt") as word_file:
    words = word_file.read().split()

anadromes = []
for word in words:
    reverse = word[::-1]
    if reverse in words:
        print(word)
        anadromes.append(word)

print(f"Found {len(anadromes)} anadromes.")

In this program, we're looping over every word, and then we're checking to see whether the reverse of that word is also a valid word:

for word in words:
    reverse = word[::-1]
    if reverse in words:
        print(word)

This program takes a while to run (about 15 seconds in total):

$ python anadromes.py
abba
abut
aga

...

yay
yeh
yo
Found 369 anadromes.

It takes a while because when you use the in operator on a list, Python will loop over all of the items in that list until it finds a match. If it doesn't find a match, it'll end up looping over the whole list.

So the in operator on lists in Python is pretty much the same thing as using a for loop to check for equality with each item in a list:

for word in words:
    reverse = word[::-1]
    if any(other_word == reverse for other_word in words):
        print(word)

So the in operator is slow on lists. But using the in operator seems unavoidable here. Is there a way that we could make this code any faster?

Set containment checks are faster

We could make our code faster by using a set instead of a list.

If we turn our list into a set and then use the in operator on that new set, our code will run much faster:

with open("words.txt") as word_file:
    words = word_file.read().split()

word_set = set(words)
anadromes = []
for word in words:
    reverse = word[::-1]
    if reverse in word_set:
        print(word)
        anadromes.append(word)

print(f"Found {len(anadromes)} anadromes.")

Before it took over 10 seconds to run, but now it takes less than a second!

$ python anadromes.py
abba
abut
aga

...

yay
yeh
yo
Found 369 anadromes.

Sets and dictionaries in Python rely on hash values to look up their items so that they don't need to loop over all of the items to find a match:

>>> with open("words.txt") as word_file:
...     words = word_file.read().split()
...
>>> words[6071]
'desserts'
>>> hash(words[6071])
425545820893779226
>>> hash("desserts")
425545820893779226

If the hash value matches, then the item matches.

In fact, checking whether an item is in a set usually takes pretty much the same amount of time regardless of whether there's one item in that set or a million items.

In time complexity (Big O) terms, using in on a list is a linear operation but using in on a set is a constant time operation.