Sign in to your Python Morsels account to save your screencast settings.
Don't have an account yet? Sign up here.
Let's talk about containment checks on lists in Python.
in operator on a listHave you ever used the in operator to check whether a list contains a certain value?
This works:
>>> colors = ["purple", "green", "yellow", "blue"]
>>> "red" in colors
False
But what if you're doing this in a loop with a lot of items?
Here we have a program called anadromes.py where we're looking for anadromes in a large list of words:
with open("words.txt") as word_file:
words = word_file.read().split()
anadromes = []
for word in words:
reverse = word[::-1]
if reverse in words:
print(word)
anadromes.append(word)
print(f"Found {len(anadromes)} anadromes.")
In this program, we're looping over every word, and then we're checking to see whether the reverse of that word is also a valid word:
for word in words:
reverse = word[::-1]
if reverse in words:
print(word)
This program takes a while to run (about 15 seconds in total):
$ python anadromes.py
abba
abut
aga
...
yay
yeh
yo
Found 369 anadromes.
It takes a while because when you use the in operator on a list, Python will loop over all of the items in that list until it finds a match.
If it doesn't find a match, it'll end up looping over the whole list.
So the in operator on lists in Python is pretty much the same thing as using a for loop to check for equality with each item in a list:
for word in words:
reverse = word[::-1]
if any(other_word == reverse for other_word in words):
print(word)
So the in operator is slow on lists.
But using the in operator seems unavoidable here.
Is there a way that we could make this code any faster?
We could make our code faster by using a set instead of a list.
If we turn our list into a set and then use the in operator on that new set, our code will run much faster:
with open("words.txt") as word_file:
words = word_file.read().split()
word_set = set(words)
anadromes = []
for word in words:
reverse = word[::-1]
if reverse in word_set:
print(word)
anadromes.append(word)
print(f"Found {len(anadromes)} anadromes.")
Before it took over 10 seconds to run, but now it takes less than a second!
$ python anadromes.py
abba
abut
aga
...
yay
yeh
yo
Found 369 anadromes.
Sets and dictionaries in Python rely on hash values to look up their items so that they don't need to loop over all of the items to find a match:
>>> with open("words.txt") as word_file:
... words = word_file.read().split()
...
>>> words[6071]
'desserts'
>>> hash(words[6071])
425545820893779226
>>> hash("desserts")
425545820893779226
If the hash value matches, then the item matches.
In fact, checking whether an item is in a set usually takes pretty much the same amount of time regardless of whether there's one item in that set or a million items.
In time complexity (Big O) terms, using in on a list is a linear operation but using in on a set is a constant time operation.
If you're finding yourself using the in operator frequently on large lists, you might consider using a set or a dictionary instead.
We don't learn by reading or watching. We learn by doing. That means writing Python code.
Practice this topic by working on these related Python exercises.
Sign in to your Python Morsels account to track your progress.
Don't have an account yet? Sign up here.