I have two lists, and I need to extract the items from the first list whose first element is present in the second list. The code I have pasted below works correctly, but since I am operating on several million records, it is painfully slow. Does anyone have an idea how it can be optimized?
a = [[1,0],[2,0],[3,0],[4,0]]
b = [2,4,7,8]
same_nums = list(set([x[0] for x in a]).intersection(set(b)))
result = []
for i in a:
    if i[0] in same_nums:
        result.append(i)
print(result)
Solution:
You are overcomplicating things. Just turn b into a set to speed up the membership check. Then a single pass over a in a list comprehension is enough:
set_b = set(b)  # makes the `in` check below O(1) on average
result = [x for x in a if x[0] in set_b]
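Wrapped in a function and run on the sample data from the question, the set-based filter looks like this. The function name `filter_by_first` is only an illustrative choice, not part of the original answer; the logic is the same two lines as above.

```python
def filter_by_first(a, b):
    """Return the items of a whose first element appears in b, keeping a's order."""
    set_b = set(b)  # one O(len(b)) pass; each membership test is then O(1) on average
    return [x for x in a if x[0] in set_b]


a = [[1, 0], [2, 0], [3, 0], [4, 0]]
b = [2, 4, 7, 8]
print(filter_by_first(a, b))  # [[2, 0], [4, 0]]
```

Because the comprehension iterates over a, the result keeps the original order of a, exactly like the loop in the question.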
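In particular, turning same_nums back into a list is a real performance killer: every `in` check on a list is a linear scan, which makes the whole thing O(m*n) again, where m and n are the lengths of a and b. With a single set built from b, it is O(m+n). But same_nums is unnecessary to begin with: intersecting b with the first elements of a adds nothing, because every i[0] you test already comes from a as you iterate over it.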
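To see the difference between list and set membership, here is a rough timing sketch that runs the question's approach and the set-based one side by side. The function names and the synthetic data (a few thousand items, kept small so the list version finishes quickly) are assumptions for illustration; absolute timings depend on the machine, so none are shown, but the list version should slow down much faster as the inputs grow.

```python
import timeit


def with_list(a, b):
    # Approach from the question: same_nums is a list, so each `in` is a linear scan
    same_nums = list(set([x[0] for x in a]).intersection(set(b)))
    return [x for x in a if x[0] in same_nums]


def with_set(a, b):
    # Approach from the answer: each `in` is an average O(1) hash lookup
    set_b = set(b)
    return [x for x in a if x[0] in set_b]


# Hypothetical synthetic data: first elements 0..1999, b holds the even numbers 0..3998
a = [[i, 0] for i in range(2_000)]
b = list(range(0, 4_000, 2))

# Both versions return the same items in the same order (the order of a)
assert with_list(a, b) == with_set(a, b)

t_list = timeit.timeit(lambda: with_list(a, b), number=3)
t_set = timeit.timeit(lambda: with_set(a, b), number=3)
print(f"list membership: {t_list:.4f}s   set membership: {t_set:.4f}s")
```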