Python has a built-in dictionary datatype. This is basically a look-up table:
    d = {}
    d['hello'] = 'bonjour'           # a string
    d['world'] = ['monde', u'世界']  # a list of strings
Let’s try and create a frequency table for words though:
    def make_frequency_table(list_of_things):
        table = {}
        for item in list_of_things:
            try:
                table[item] += 1
            except KeyError:
                table[item] = 1
        return table
This clunky code returns a dictionary with entries like:
thing: how_many_times_it_occurred
However, we need the try-except statement in case the key doesn't exist yet. A defaultdict gets rid of this step: when a key is missing, it creates a blank entry (0, empty string or empty list, depending on the type you give it) and returns that instead, which is really awesome!
    from collections import defaultdict

    def make_frequency_table(list_of_things):
        table = defaultdict(int)
        for item in list_of_things:
            table[item] += 1
        return table
If the key didn't already exist in our look-up table, the defaultdict calls int() to get a starting value of 0, stores it under that key, and the += 1 then updates it! The int in defaultdict(int) could be replaced with list, str, or any other callable that takes no arguments.
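Here is a minimal sketch of what those blank entries look like for a few built-in types. The variable names and the word list are made up for illustration:

```python
from collections import defaultdict

counts = defaultdict(int)    # int() returns 0
labels = defaultdict(str)    # str() returns ''
groups = defaultdict(list)   # list() returns []

# Looking up a missing key calls the factory and stores the result.
print(counts['missing'])     # 0
print(repr(labels['missing']))  # ''
print(groups['missing'])     # []

# Note that the lookup itself inserted the key.
print('missing' in counts)   # True

# Hypothetical use of defaultdict(list): group words by first letter.
by_letter = defaultdict(list)
for word in ['apple', 'avocado', 'banana']:
    by_letter[word[0]].append(word)
print(dict(by_letter))       # {'a': ['apple', 'avocado'], 'b': ['banana']}
```

One thing to watch for: because a plain lookup inserts the key, checking `table[key]` just to see whether something is there will quietly grow the dictionary. Use `key in table` for that instead.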
And now to the crux of the post! Instead of a type, we can pass a lambda as the default factory, like this:
    table = defaultdict(lambda: {'frequency': 0, 'last_seen_at': 0})
Now, when a key doesn't exist, the defaultdict calls the lambda and stores a brand-new inner dictionary under that key! So we can bring another metric into our analysis:
    from collections import defaultdict

    def make_frequency_table(list_of_things):
        table = defaultdict(lambda: {'frequency': 0, 'last_seen_at': 0})
        for position, item in enumerate(list_of_things):
            table[item]['frequency'] += 1
            table[item]['last_seen_at'] = position
        return table
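One detail worth checking for yourself: the lambda runs again for every missing key, so each key gets its own inner dictionary rather than a shared one. A small sketch, using the made-up keys 'a' and 'b':

```python
from collections import defaultdict

table = defaultdict(lambda: {'frequency': 0, 'last_seen_at': 0})

table['a']['frequency'] += 1   # creates the entry for 'a', then updates it
table['b']                     # a plain lookup also creates an entry

print(table['a'])                # {'frequency': 1, 'last_seen_at': 0}
print(table['b'])                # {'frequency': 0, 'last_seen_at': 0}
print(table['a'] is table['b'])  # False: separate dictionaries
```

This is why updating the counts for one item never leaks into another item's entry.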
Now our function will return a dictionary that not only lets you know how many times something occurred, but also the position at which it last occurred! Try it out with the following data:
    dataset = [1,1,1,1,2,2,2,2,2,5,5,5,1,4,5,6,8,9]
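If you want to check your result, here is a sketch that runs the function on that data. The function is repeated so the snippet runs on its own, and the sorted printing loop is just one way to display the table:

```python
from collections import defaultdict

def make_frequency_table(list_of_things):
    table = defaultdict(lambda: {'frequency': 0, 'last_seen_at': 0})
    for position, item in enumerate(list_of_things):
        table[item]['frequency'] += 1
        table[item]['last_seen_at'] = position
    return table

dataset = [1,1,1,1,2,2,2,2,2,5,5,5,1,4,5,6,8,9]
table = make_frequency_table(dataset)

# Print the entries in key order.
for item in sorted(table):
    print(item, table[item])

# 1 {'frequency': 5, 'last_seen_at': 12}
# 2 {'frequency': 5, 'last_seen_at': 8}
# 4 {'frequency': 1, 'last_seen_at': 13}
# 5 {'frequency': 4, 'last_seen_at': 14}
# 6 {'frequency': 1, 'last_seen_at': 15}
# 8 {'frequency': 1, 'last_seen_at': 16}
# 9 {'frequency': 1, 'last_seen_at': 17}
```

Keep in mind that enumerate counts from zero, so a last_seen_at of 12 means the thirteenth element of the list.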