This is an interesting and often tackled task in programming, and especially prevalent in NLP and Data Science. Often one has a list of “things” with many repeats and wants to know which ones are the most popular.
Data Examples
1 |
my_list = [1,1,1,1,5,5,5,3,2,2,2,3,4,5,5,5,5,5,5,6,6,6,6] |
or:
1 |
my_sentence = "The cat sat on the mat and read a newspaper" |
Which is the most popular number? Which is the most popular letter?
The Code
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
from collections import defaultdict from operator import itemgetter def freq_table(l): """Takes a list 'l' and returns a list of lists where each item is ['item', frequency], sorted descending.""" all_freqs = defaultdict(int) #to temporarily hold values for item in l: all_freqs[item] += 1 #add to dictionary #convert the dictionary to a list of lists all_freqs = [[k,v] for k,v in all_freqs.iteritems()] #return a sorted version return sorted(all_freqs, key=itemgetter(1), reverse=True) |
For our data, we now get:
1 2 |
>>> freq_table(my_list) [[5, 9], [1, 4], [6, 4], [2, 3], [3, 2], [4, 1]] |
So 5 is the most popular item, with 9 appearances.
1 2 |
>>> freq_table(my_sentence) [[' ', 9], ['a', 7], ['e', 5], ['t', 4], ['n', 3], ['d', 2], ['h', 2], ['p', 2], ['s', 2], ['r', 2], ['c', 1], ['m', 1], ['o', 1], ['T', 1], ['w', 1]] |
So the space is the most popular, followed by a, e and t.