8

I have a list of values. I want to count, inside a loop, the number of elements in each class (i.e. 1, 2, 3, 4, 5).

mylist = [1,1,1,1,1,1,2,3,2,2,2,2,3,3,4,5,5,5,5]
mydict = dict()
for index in mylist:
    mydict[index] = +1
mydict
Out[344]: {1: 1, 2: 1, 3: 1, 4: 1, 5: 1}

I wish to get this result

Out[344]: {1: 6, 2: 5, 3: 3, 4: 1, 5: 4}
9
  • collections.Counter, in your code you need: mydict[index] += 1 Commented Aug 20, 2013 at 19:28
  • Could I ask for an example, please? Thanks in advance. Commented Aug 20, 2013 at 19:29
  • collections.Counter(mylist) and you're done. (Well, aside from importing collections, and you still need to do whatever you were going to do with the counts, but collections.Counter(mylist) is the entire "counting things" phase.) Commented Aug 20, 2013 at 19:29
  • Perhaps this will help Commented Aug 20, 2013 at 19:31
  • @GrijeshChauhan Yes, you're right, I just wanted to point out that the OP was using the operator incorrectly. (Though it'd raise KeyError.) Commented Aug 20, 2013 at 19:41

5 Answers

15

For your smaller example, with a limited diversity of elements, you can use a set and a dict comprehension:

>>> mylist = [1,1,1,1,1,1,2,3,2,2,2,2,3,3,4,5,5,5,5]
>>> {k:mylist.count(k) for k in set(mylist)}
{1: 6, 2: 5, 3: 3, 4: 1, 5: 4}

To break it down, set(mylist) uniquifies the list and makes it more compact:

>>> set(mylist)
set([1, 2, 3, 4, 5])

Then the dictionary comprehension steps through the unique values and sets the count from the list.

For a small list like this one, it is also significantly faster than using Counter and faster than using setdefault:

from __future__ import print_function
from collections import Counter
from collections import defaultdict
import random

mylist=[1,1,1,1,1,1,2,3,2,2,2,2,3,3,4,5,5,5,5]*10

def s1(mylist):
    return {k:mylist.count(k) for k in set(mylist)}

def s2(mylist):
    return Counter(mylist)

def s3(mylist):
    mydict=dict()
    for index in mylist:
        mydict[index] = mydict.setdefault(index, 0) + 1
    return mydict   

def s4(mylist):
    mydict={}.fromkeys(mylist,0)
    for k in mydict:
        mydict[k]=mylist.count(k)    
    return mydict    

def s5(mylist):
    mydict={}
    for k in mylist:
        mydict[k]=mydict.get(k,0)+1
    return mydict     

def s6(mylist):
    mydict=defaultdict(int)
    for i in mylist:
        mydict[i] += 1
    return mydict       

def s7(mylist):
    mydict={}.fromkeys(mylist,0)
    for e in mylist:
        mydict[e]+=1    
    return mydict    

if __name__ == '__main__':   
    import timeit 
    n=1000000
    print(timeit.timeit("s1(mylist)", setup="from __main__ import s1, mylist",number=n))
    print(timeit.timeit("s2(mylist)", setup="from __main__ import s2, mylist, Counter",number=n))
    print(timeit.timeit("s3(mylist)", setup="from __main__ import s3, mylist",number=n))
    print(timeit.timeit("s4(mylist)", setup="from __main__ import s4, mylist",number=n))
    print(timeit.timeit("s5(mylist)", setup="from __main__ import s5, mylist",number=n))
    print(timeit.timeit("s6(mylist)", setup="from __main__ import s6, mylist, defaultdict",number=n))
    print(timeit.timeit("s7(mylist)", setup="from __main__ import s7, mylist",number=n))

On my machine (Python 3) that prints, in seconds:

18.123854104997008          # set and dict comprehension 
78.54796334600542           # Counter 
33.98185228800867           # setdefault 
19.0563529439969            # fromkeys / count 
34.54294775899325           # dict.get 
21.134678319009254          # defaultdict 
22.760544238000875          # fromkeys / loop

For larger lists, such as 10 million integers with more diverse elements (about 1,500 distinct random ints), use defaultdict or fromkeys in a loop:

from __future__ import print_function
from collections import Counter
from collections import defaultdict
import random

mylist = [random.randint(0,1500) for _ in range(10000000)]

def s1(mylist):
    return {k:mylist.count(k) for k in set(mylist)}

def s2(mylist):
    return Counter(mylist)

def s3(mylist):
    mydict=dict()
    for index in mylist:
        mydict[index] = mydict.setdefault(index, 0) + 1
    return mydict   

def s4(mylist):
    mydict={}.fromkeys(mylist,0)
    for k in mydict:
        mydict[k]=mylist.count(k)    
    return mydict    

def s5(mylist):
    mydict={}
    for k in mylist:
        mydict[k]=mydict.get(k,0)+1
    return mydict     

def s6(mylist):
    mydict=defaultdict(int)
    for i in mylist:
        mydict[i] += 1
    return mydict       

def s7(mylist):
    mydict={}.fromkeys(mylist,0)
    for e in mylist:
        mydict[e]+=1    
    return mydict    

if __name__ == '__main__':   
    import timeit 
    n=1
    print(timeit.timeit("s1(mylist)", setup="from __main__ import s1, mylist",number=n))
    print(timeit.timeit("s2(mylist)", setup="from __main__ import s2, mylist, Counter",number=n))
    print(timeit.timeit("s3(mylist)", setup="from __main__ import s3, mylist",number=n))
    print(timeit.timeit("s4(mylist)", setup="from __main__ import s4, mylist",number=n))
    print(timeit.timeit("s5(mylist)", setup="from __main__ import s5, mylist",number=n))
    print(timeit.timeit("s6(mylist)", setup="from __main__ import s6, mylist, defaultdict",number=n))
    print(timeit.timeit("s7(mylist)", setup="from __main__ import s7, mylist",number=n))

Prints:

2825.2697427899984              # set and dict comprehension 
42.607481333994656              # Counter 
22.77713537499949               # setdefault 
2853.11187016801                # fromkeys / count 
23.241977066005347              # dict.get 
15.023175164998975              # defaultdict 
18.28165417900891               # fromkeys / loop

You can see that the solutions that rely on count, and therefore make a pass through the large list for every distinct key, suffer badly, even catastrophically, compared with the other solutions.
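A rough way to see where the gap comes from: each call to list.count walks the whole list, so the count-based versions do about (distinct keys) × (list length) element visits, while the single-pass versions do about one visit per element. The sketch below tallies those visits by hand; the helper names are made up for illustration and the visit counts are an approximation, not a profiler measurement.

```python
from collections import defaultdict

def count_with_list_count(values):
    """Count by calling values.count() once per distinct key."""
    keys = set(values)
    counts = {k: values.count(k) for k in keys}
    # Approximation: every .count() call scans the full list once.
    visits = len(keys) * len(values)
    return counts, visits

def count_single_pass(values):
    """Count by walking the list exactly once."""
    counts = defaultdict(int)
    for v in values:
        counts[v] += 1
    visits = len(values)
    return dict(counts), visits

small = [1,1,1,1,1,1,2,3,2,2,2,2,3,3,4,5,5,5,5]
slow_counts, slow_visits = count_with_list_count(small)
fast_counts, fast_visits = count_single_pass(small)
```

With 19 elements and 5 distinct keys the difference is small (95 visits versus 19), and list.count runs in C, which is why it wins on the short list. With 10 million elements and roughly 1,500 keys the same product is on the order of 15 billion visits.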


1 Comment

Does anyone know why the Counter class is much slower than the defaultdict class?
6

Try collections.Counter:

   >>> from collections import Counter
   >>> Counter([1,1,1,1,1,1,2,3,2,2,2,2,3,3,4,5,5,5,5])
   Counter({1: 6, 2: 5, 5: 4, 3: 3, 4: 1})

In your code you can basically replace mydict with a Counter and write

mydict[index] += 1

instead of

mydict[index] = +1
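Put together, the corrected loop might look like the minimal sketch below. It assumes the same mylist as in the question; the only changes are the Counter and the += operator.

```python
from collections import Counter

mylist = [1,1,1,1,1,1,2,3,2,2,2,2,3,3,4,5,5,5,5]

mydict = Counter()
for index in mylist:
    # A Counter returns 0 for a missing key, so += 1 works on first sight of a key.
    mydict[index] += 1

result = dict(mydict)
```

Because the counting happens one element at a time, any other per-element work can go in the same loop body.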

2 Comments

Thanks, but I'm sorry: I need a way to do this inside the loop. The list was only an example; I have 32 GB of data to run through in a loop.
If you have an iterator over your 32 GB of items, collections.Counter works the same with that as with a list.
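As a sketch of that idea: a generator can feed values to a Counter lazily, so the full data set never has to sit in memory as a list. The stream_values function and the fake_file lines below are hypothetical stand-ins for however the real data is read.

```python
from collections import Counter

def stream_values(lines):
    """Hypothetical reader: lazily yields one int per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield int(line)

# Stand-in for a real file object; iterating a file also yields lines one at a time.
fake_file = ["1\n", "1\n", "2\n", "\n", "3\n", "2\n"]

# Option 1: hand the iterator straight to Counter.
counts = Counter(stream_values(fake_file))

# Option 2: update a running Counter inside an explicit loop.
running = Counter()
for value in stream_values(fake_file):
    running[value] += 1
```

Both options keep only the counts in memory, one entry per distinct value.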
4

To fix the code:

mydict[index] = +1

should be:

mydict[index] = mydict.setdefault(index, 0) + 1
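For context, the short sketch below shows why the original line always stores 1, why a plain += 1 on an empty dict fails, and how the setdefault form behaves. The variable names are only for illustration.

```python
# "= +1" is an assignment of the value +1 (unary plus), not an increment.
d = {}
d["a"] = +1
d["a"] = +1
stored = d["a"]  # still 1 after two assignments

# "+= 1" on a plain dict needs the key to exist already.
e = {}
try:
    e["a"] += 1
    raised = False
except KeyError:
    raised = True

# setdefault inserts 0 for a new key and returns the current value otherwise.
f = {}
for key in ["a", "a", "b"]:
    f[key] = f.setdefault(key, 0) + 1
```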

5 Comments

Thanks. With your example I got {1: 7, 2: 6, 3: 4, 4: 2, 5: 5}. The best way is mydict[index] = mydict.get(index, 0) + 1
@Gianni Oh, I just corrected it. I am a new Python learner, so I posted a simple answer.
I'd say the best way is a Counter myself, due to simplicity, but this way works.
@user2387370 If this works I will be happy, as I have just started with Python :)
Hey @Gianni, did you notice that your expression mydict[index] = +1 is just mydict[index] = 1, so you always get 1 as the value? :) You may have meant mydict[index] += 1, but I suspect that would raise a KeyError, as initially there is no value at mydict[index]. Read Ashwini's second comment to me on your question.
4

A variation on the setdefault approach is the collections.defaultdict. This is a bit faster.

from collections import defaultdict

def foo(mylist):
    d = defaultdict(int)  # missing keys start at 0
    for i in mylist:
        d[i] += 1
    return d

itertools.groupby provides another option. Its speed is about the same as Counter (at least on 2.7).

{x[0]:len(list(x[1])) for x in itertools.groupby(sorted(mylist))}
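To unpack that one-liner: groupby only merges adjacent equal items, which is why the list is sorted first. The sketch below shows the same expression with named loop variables, plus what happens on unsorted input; the variable names are illustrative.

```python
import itertools

mylist = [1,1,1,1,1,1,2,3,2,2,2,2,3,3,4,5,5,5,5]

# Sorting puts equal values next to each other, so each key forms one group.
grouped = {key: len(list(group))
           for key, group in itertools.groupby(sorted(mylist))}

# Without sorting, groupby yields one group per run of equal neighbours.
runs = [(key, len(list(group)))
        for key, group in itertools.groupby([1, 1, 2, 1])]
```

Note that sorted builds a full sorted copy of the input, so this approach needs the whole list in memory.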

However, timings on this small test list might not carry over to the 32 GB of data that the OP mentions in a comment.


I ran several of these options on the word count case from the question "python top N word count, why multiprocess slower then single process".

There the OP used Counter, and was trying to speed things up by using multiprocessing. With a 1.2 MB text file, the counter using defaultdict was fast, taking about 0.2 sec. Sorting the output to get the top 40 words took as long as the counting itself.
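If only the top few entries are needed, a full sort of the counts can be avoided. The sketch below uses heapq.nlargest on the (word, count) pairs; the sample text and the choice of 2 are made up for illustration and this was not part of the timings above.

```python
import heapq
from collections import defaultdict

text = "the cat and the dog and the bird"  # illustrative sample input

counts = defaultdict(int)
for word in text.split():
    counts[word] += 1

# Pick the 2 pairs with the largest counts without sorting every pair.
top2 = heapq.nlargest(2, counts.items(), key=lambda pair: pair[1])
```

collections.Counter offers the same thing through its most_common(n) method.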

Counter was a bit slower on 3.2, and much slower on 2.7. That's because 3.2 has a compiled version (.so file).

But the counter using mylist.count ground nearly to a standstill when processing a large list, taking almost 200 sec. It has to search that large list many times: once to collect the keys, and then once for each key when it counts.


1

Your code is assigning 1 as the value for each key. Replace mydict[index] = +1 with mydict[index] = mylist.count(index).

This should work:

mylist = [1,1,1,1,1,1,2,3,2,2,2,2,3,3,4,5,5,5,5]
mydict = dict()
for index in mylist:
    mydict[index] = mylist.count(index)
mydict

