Efficient way to count the elements in a dictionary in Python using a loop

Question

I have a list of values. I want to count, inside a loop, the number of elements in each class (i.e. 1, 2, 3, 4, 5).

mylist = [1,1,1,1,1,1,2,3,2,2,2,2,3,3,4,5,5,5,5]
mydict = dict()
for index in mylist:
    mydict[index] = +1
mydict
Out[344]: {1: 1, 2: 1, 3: 1, 4: 1, 5: 1}

I wish to get this result

Out[344]: {1: 6, 2: 5, 3: 3, 4: 1, 5: 4}

Use collections.Counter. In your code you need mydict[index] += 1.
– Ashwini Chaudhary, Commented Aug 20, 2013 at 19:28
collections.Counter(mylist) and you're done. (Well, aside from importing collections, and you still need to do whatever you were going to do with the counts, but collections.Counter(mylist) is the entire "counting things" phase.)
– user2357112, Commented Aug 20, 2013 at 19:29
@GrijeshChauhan Yes, you're right. I just wanted to point out that the OP was using the operator incorrectly. (Though += would raise a KeyError on a plain dict.)
– Ashwini Chaudhary, Commented Aug 20, 2013 at 19:41

score 15 · Accepted Answer · 2013-08-22 21:20:15Z

For a small example like yours, with only a few distinct values, you can use a set and a dict comprehension:

>>> mylist = [1,1,1,1,1,1,2,3,2,2,2,2,3,3,4,5,5,5,5]
>>> {k:mylist.count(k) for k in set(mylist)}
{1: 6, 2: 5, 3: 3, 4: 1, 5: 4}

To break it down, set(mylist) removes the duplicates, leaving only the distinct values:

>>> set(mylist)
set([1, 2, 3, 4, 5])

Then the dictionary comprehension steps through the distinct values and gets the count of each one with mylist.count.
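As a sketch of what the comprehension does, here is the same logic written as an explicit loop. The name counts is only for illustration; the point is that each distinct value triggers one full scan of the list.

```python
mylist = [1,1,1,1,1,1,2,3,2,2,2,2,3,3,4,5,5,5,5]

counts = {}
for k in set(mylist):            # one iteration per distinct value
    counts[k] = mylist.count(k)  # each count() call scans the whole list

print(counts)
```

With only five distinct values that is five scans, which is cheap for a short list.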

On a small list like this one, it is also significantly faster than using Counter and faster than using setdefault:

from __future__ import print_function
from collections import Counter
from collections import defaultdict
import random

mylist=[1,1,1,1,1,1,2,3,2,2,2,2,3,3,4,5,5,5,5]*10

def s1(mylist):
    return {k:mylist.count(k) for k in set(mylist)}

def s2(mylist):
    return Counter(mylist)

def s3(mylist):
    mydict=dict()
    for index in mylist:
        mydict[index] = mydict.setdefault(index, 0) + 1
    return mydict   

def s4(mylist):
    mydict={}.fromkeys(mylist,0)
    for k in mydict:
        mydict[k]=mylist.count(k)    
    return mydict    

def s5(mylist):
    mydict={}
    for k in mylist:
        mydict[k]=mydict.get(k,0)+1
    return mydict     

def s6(mylist):
    mydict=defaultdict(int)
    for i in mylist:
        mydict[i] += 1
    return mydict       

def s7(mylist):
    mydict={}.fromkeys(mylist,0)
    for e in mylist:
        mydict[e]+=1    
    return mydict    

if __name__ == '__main__':   
    import timeit 
    n=1000000
    print(timeit.timeit("s1(mylist)", setup="from __main__ import s1, mylist",number=n))
    print(timeit.timeit("s2(mylist)", setup="from __main__ import s2, mylist, Counter",number=n))
    print(timeit.timeit("s3(mylist)", setup="from __main__ import s3, mylist",number=n))
    print(timeit.timeit("s4(mylist)", setup="from __main__ import s4, mylist",number=n))
    print(timeit.timeit("s5(mylist)", setup="from __main__ import s5, mylist",number=n))
    print(timeit.timeit("s6(mylist)", setup="from __main__ import s6, mylist, defaultdict",number=n))
    print(timeit.timeit("s7(mylist)", setup="from __main__ import s7, mylist",number=n))

On my machine that prints (Python 3):

18.123854104997008          # set and dict comprehension 
78.54796334600542           # Counter 
33.98185228800867           # setdefault 
19.0563529439969            # fromkeys / count 
34.54294775899325           # dict.get 
21.134678319009254          # defaultdict 
22.760544238000875          # fromkeys / loop

For larger lists, such as 10 million integers with many more distinct values (random ints from 0 to 1500), use defaultdict or fromkeys in a loop:

from __future__ import print_function
from collections import Counter
from collections import defaultdict
import random

mylist = [random.randint(0,1500) for _ in range(10000000)]

def s1(mylist):
    return {k:mylist.count(k) for k in set(mylist)}

def s2(mylist):
    return Counter(mylist)

def s3(mylist):
    mydict=dict()
    for index in mylist:
        mydict[index] = mydict.setdefault(index, 0) + 1
    return mydict   

def s4(mylist):
    mydict={}.fromkeys(mylist,0)
    for k in mydict:
        mydict[k]=mylist.count(k)    
    return mydict    

def s5(mylist):
    mydict={}
    for k in mylist:
        mydict[k]=mydict.get(k,0)+1
    return mydict     

def s6(mylist):
    mydict=defaultdict(int)
    for i in mylist:
        mydict[i] += 1
    return mydict       

def s7(mylist):
    mydict={}.fromkeys(mylist,0)
    for e in mylist:
        mydict[e]+=1    
    return mydict    

if __name__ == '__main__':   
    import timeit 
    n=1
    print(timeit.timeit("s1(mylist)", setup="from __main__ import s1, mylist",number=n))
    print(timeit.timeit("s2(mylist)", setup="from __main__ import s2, mylist, Counter",number=n))
    print(timeit.timeit("s3(mylist)", setup="from __main__ import s3, mylist",number=n))
    print(timeit.timeit("s4(mylist)", setup="from __main__ import s4, mylist",number=n))
    print(timeit.timeit("s5(mylist)", setup="from __main__ import s5, mylist",number=n))
    print(timeit.timeit("s6(mylist)", setup="from __main__ import s6, mylist, defaultdict",number=n))
    print(timeit.timeit("s7(mylist)", setup="from __main__ import s7, mylist",number=n))

Prints:

2825.2697427899984              # set and dict comprehension 
42.607481333994656              # Counter 
22.77713537499949               # setdefault 
2853.11187016801                # fromkeys / count 
23.241977066005347              # dict.get 
15.023175164998975              # defaultdict 
18.28165417900891               # fromkeys / loop

You can see that the solutions that rely on count, which make a full pass over the large list for each distinct value, suffer catastrophically compared to the other solutions.
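A back-of-the-envelope sketch of why this happens, assuming the only cost that matters is how many list elements each approach visits. The function names are made up for this illustration, and the sizes are taken from the two benchmarks above (190 items with 5 distinct values, and 10 million items with up to 1,501 distinct values).

```python
def count_based_visits(n_items, n_distinct):
    """Rough number of element visits for {k: lst.count(k) for k in set(lst)}."""
    # One pass to build the set, then one full pass per distinct value.
    return n_items + n_distinct * n_items


def single_pass_visits(n_items):
    """Rough number of element visits for a defaultdict or dict.get loop."""
    return n_items


small = count_based_visits(190, 5) / single_pass_visits(190)
large = count_based_visits(10_000_000, 1501) / single_pass_visits(10_000_000)
print(small, large)
```

The measured gap on the large list (about 2825 s against 15 s) is smaller than this visit ratio, presumably because each count call runs in compiled code while the single-pass loops execute Python bytecode per element. The same effect would explain why the count versions win on the small list.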

Does anyone know why the Counter class is much slower than the defaultdict class?

Benjamin Peterson · Accepted Answer · 2013-08-20 19:30:24Z

6

Try collections.Counter:

   >>> from collections import Counter
   >>> Counter([1,1,1,1,1,1,2,3,2,2,2,2,3,3,4,5,5,5,5])
   Counter({1: 6, 2: 5, 5: 4, 3: 3, 4: 1})

In your code you can basically replace mydict with a Counter and write

mydict[index] += 1

instead of

mydict[index] = +1
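Put together, a minimal version of the question's loop with mydict as a Counter might look like this. It relies on the fact that a Counter returns 0 for missing keys, so += works on the first occurrence of each value.

```python
from collections import Counter

mylist = [1,1,1,1,1,1,2,3,2,2,2,2,3,3,4,5,5,5,5]

mydict = Counter()
for index in mylist:
    mydict[index] += 1  # a missing key counts as 0, so no KeyError here

print(dict(mydict))
```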


2 Comments

Gianni Spear Over a year ago

Thanks, but I'm sorry, I need a way to do this inside the loop. The list was just an example; I have 32 GB of data to run through in a loop.

user2357112 Over a year ago

If you have an iterator over your 32 GB of items, collections.Counter works the same with that as with a list.
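A sketch of what that could look like when the data arrives in pieces. read_chunks is a hypothetical stand-in for whatever yields the real data; Counter.update accepts any iterable and adds to the existing counts, so only one chunk needs to be in memory at a time.

```python
from collections import Counter


def read_chunks():
    """Hypothetical data source that yields one manageable chunk at a time."""
    yield [1, 1, 1, 1, 1, 1, 2]
    yield [3, 2, 2, 2, 2, 3]
    yield [3, 4, 5, 5, 5, 5]


counts = Counter()
for chunk in read_chunks():
    counts.update(chunk)  # adds this chunk's counts to the running totals

# Equivalent: hand Counter a generator that flattens the chunks lazily.
counts_from_iter = Counter(item for chunk in read_chunks() for item in chunk)

print(dict(counts))
```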

Grijesh Chauhan · Accepted Answer · 2013-08-20 19:30:34Z

4

To fix your code:

mydict[index] = +1

should be:

mydict[index] = mydict.setdefault(index, 0) + 1


5 Comments

Gianni Spear Over a year ago

Thanks. With your example I got {1: 7, 2: 6, 3: 4, 4: 2, 5: 5}. The best way is mydict[index] = mydict.get(index, 0) + 1

Grijesh Chauhan Over a year ago

@Gianni Oh, I just corrected it. I am a new Python learner, so I posted a simple answer.

rlms Over a year ago

I'd say the best way is a Counter myself, due to simplicity, but this way works.

Grijesh Chauhan Over a year ago

@user2387370 If this works I will be happy, as I have just started with Python :)

Grijesh Chauhan Over a year ago

Hey @Gianni, did you notice that your expression mydict[index] = +1 is just mydict[index] = 1, so you always get 1 as the value :). You may have meant mydict[index] += 1, but I suspect that would raise a KeyError, since initially there is no value at mydict[index]. Read Ashwini's second comment to me under your question.
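Both points from this comment can be checked directly. In the sketch below (variable names are illustrative), = +1 is parsed as assigning the value +1, and += on a plain dict fails for a key that has not been set yet.

```python
counts = {}
counts["a"] = +1   # parsed as counts["a"] = (+1): a plain assignment
counts["a"] = +1   # assigning again still leaves the value at 1
first = dict(counts)

empty = {}
try:
    empty["a"] += 1  # has to read empty["a"] first, which does not exist
    raised = False
except KeyError:
    raised = True

print(first, raised)
```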

Community · Accepted Answer · 2017-05-23 12:00:08Z

A variation on the setdefault approach is collections.defaultdict, which is a bit faster.

from collections import defaultdict

def foo(mylist):
    d=defaultdict(int)   # missing keys start at int(), i.e. 0
    for i in mylist:
        d[i] += 1
    return d

itertools.groupby provides another option. Its speed is about the same as Counter (at least on 2.7).

{x[0]:len(list(x[1])) for x in itertools.groupby(sorted(mylist))}
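A runnable version of that one-liner, with the import it needs. The sorted call matters: groupby only groups consecutive equal items, so on unsorted input the same key can show up in several separate runs, as the second expression illustrates.

```python
from itertools import groupby

mylist = [1,1,1,1,1,1,2,3,2,2,2,2,3,3,4,5,5,5,5]

# Sorting first puts equal values next to each other, one group per value.
counts = {key: sum(1 for _ in group) for key, group in groupby(sorted(mylist))}

# Without sorting, groupby reports runs of consecutive equal items instead.
runs = [(key, len(list(group))) for key, group in groupby([1, 1, 2, 1])]

print(counts)
print(runs)
```

Sorting costs O(n log n) and builds a second list, which is worth keeping in mind for large inputs.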

However, timings on this small test list might not carry over to the 32 GB of data that the OP mentions in a comment.

I ran several of these options on the word-count case in "python top N word count, why multiprocess slower then single process".

There the OP used Counter and was trying to speed things up with multiprocessing. With a 1.2 MB text file, the counter using defaultdict was fast, taking 0.2 sec. Sorting the output to get the top 40 words took as long as the counting itself.
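If only the top N entries are needed, a full sort can be avoided. The sketch below is not the code from that question; it uses a made-up sentence and heapq.nlargest, which selects the N largest items without sorting everything.

```python
import heapq
from collections import defaultdict

words = "the cat and the dog and the bird".split()

counts = defaultdict(int)
for word in words:
    counts[word] += 1

# Pick the 2 most frequent words, comparing (word, count) pairs by count.
top2 = heapq.nlargest(2, counts.items(), key=lambda pair: pair[1])

print(top2)
```

collections.Counter offers the same thing as most_common(n).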

Counter was a bit slower on 3.2, and much slower on 2.7. That's because 3.2 uses a compiled version (.so file).

But the counter using mylist.count ground to a standstill when processing a large list, taking almost 200 sec. It has to scan that large list many times: once to collect the keys, and then once for each key when it counts.

Animal Spirits · Accepted Answer · 2013-08-20 20:49:45Z

1

Your code is assigning 1 as the value for each key. Replace mydict[index] = +1 with mydict[index] = mylist.count(index).

This should work:

mylist = [1,1,1,1,1,1,2,3,2,2,2,2,3,3,4,5,5,5,5]
mydict = dict()
for index in mylist:
    mydict[index] = mylist.count(index)
mydict
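Note that the loop above calls mylist.count once per element, so repeated values are recounted every time they appear. A small variation, which is not part of the original answer, skips values that already have a count:

```python
mylist = [1,1,1,1,1,1,2,3,2,2,2,2,3,3,4,5,5,5,5]

mydict = {}
for index in mylist:
    if index not in mydict:                  # count each distinct value once
        mydict[index] = mylist.count(index)  # full scan of the list

print(mydict)
```

This still scans the list once per distinct value, so it has the same scaling problem on large, diverse inputs as the other count-based approaches.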

