One great function in Python is literal_eval, from the ast (Abstract Syntax Tree) module. It lets you read in the string representation of a Python datatype:
    >>> from ast import literal_eval
    >>> # A sample dictionary with some trivial information
    >>> myDict = "{'someKey': 1, 'otherKey': 2}"
    >>> #Let's parse it using the function
    >>> testEval = literal_eval(myDict)
    >>> print testEval
    {'someKey': 1, 'otherKey': 2}
    >>> type(testEval)
    <type 'dict'>
Importing a dictionary such as this is similar to parsing JSON using Python’s json.loads decoder. But it shares JSON’s main shortcoming: only a restricted set of datatypes is supported, as we can see here when the dictionary contains, for example, a datetime object:
    >>> myDict = "{'someKey': 1, 'rightNow': datetime.datetime(2013, 8, 10, 21, 46, 52, 638649)}"
    >>>
    >>> literal_eval(myDict)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/root/anaconda/lib/python2.7/ast.py", line 80, in literal_eval
        return _convert(node_or_string)
      File "/root/anaconda/lib/python2.7/ast.py", line 63, in _convert
        in zip(node.keys, node.values))
      File "/root/anaconda/lib/python2.7/ast.py", line 62, in <genexpr>
        return dict((_convert(k), _convert(v)) for k, v
      File "/root/anaconda/lib/python2.7/ast.py", line 79, in _convert
        raise ValueError('malformed string')
    ValueError: malformed string
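To make the comparison with JSON concrete, here is a small sketch (written for Python 3, so print is a function; the sample strings and variable names are made up for illustration). It shows that literal_eval accepts some Python literals that JSON has no syntax for, while a call expression such as datetime.datetime(...) is still rejected with a ValueError:

```python
import json
from ast import literal_eval

# literal_eval understands Python literals such as tuples, None and
# single-quoted strings.
from_literal = literal_eval("{'point': (1, 2), 'missing': None}")
print(from_literal)

# json.loads needs double-quoted strings and JSON's own spelling of null;
# the closest thing to a tuple is an array, which comes back as a list.
from_json = json.loads('{"point": [1, 2], "missing": null}')
print(from_json)

# A call expression is not a literal, so literal_eval refuses it.
try:
    literal_eval("{'rightNow': datetime.datetime(2013, 8, 10)}")
    rejected = False
except ValueError:
    rejected = True
print("datetime call rejected:", rejected)
```

The exact wording of the error message differs between Python versions, which is why the sketch only checks the exception type.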
So you might try to write some code to parse the dictionary yourself. This gets very tricky, but eventually you could probably accommodate all the common datatypes:
    def read_dict(d):
        """Accepts a string containing a dictionary. Tries to parse this and
        returns the dictionary."""
        parsed_dict = {} #The result
        #Remove the {} and split into chunks
        d = d[1:-1].split(', ')
        #iterate through the chunks and try to interpret them
        for kv_pair in d:
            #split up by the central colon
            kv_pair = kv_pair.split(": ")
            #interpret the key and value
            k = whatAmI(kv_pair[0])
            v = whatAmI(kv_pair[1])
            #add to the final parsed dictionary
            parsed_dict[k] = v
        return parsed_dict

    def whatAmI(thing):
        """Simple attempt at interpreting a string of a datatype.
        Can deal with Strings and Ints"""
        #remove any inverted commas
        if thing.startswith("'") or thing.startswith('"'):
            thing = thing[1:-1]
        #Now check for data-types (there are way more than this though)
        if thing.isdigit():
            return int(thing) #return the digit
        else:
            return thing #anything else is returned as a string
But this still doesn’t truly fix our datetime object problem:
    >>> read_dict(myDict)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 13, in read_dict
    IndexError: list index out of range
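The IndexError comes from the naive split on ', ', which also cuts through the arguments inside datetime.datetime(...). A quick sketch (Python 3 syntax; chunks is just an illustrative name) that prints what read_dict actually iterates over:

```python
myDict = "{'someKey': 1, 'rightNow': datetime.datetime(2013, 8, 10, 21, 46, 52, 638649)}"

# The same split that read_dict performs on the dictionary body.
chunks = myDict[1:-1].split(', ')

for chunk in chunks:
    # A chunk without ": " splits into a one-element list, so asking for
    # index 1 (the value) is what raises the IndexError.
    print(repr(chunk), '->', chunk.split(': '))
```

The third chunk is just '8', which has no key at all, so kv_pair[1] does not exist.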
This is where we get to the crux of this post. I thought at first that I could deal with the datetime by spotting the brackets, extracting the arguments of datetime.datetime(2013, 8, 10, 21, 46, 52, 638649) as a tuple, and then feeding that tuple back into datetime like so:
    >>> from datetime import datetime
    >>> x = (2013, 8, 10, 21, 46, 52, 638649)
    >>> parsedDate = datetime(x)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: an integer is required
But apparently not: datetime expects each value as a separate argument, not a single tuple. The tuple must be unpacked, not with a lambda or a list comprehension, but with asterisk notation:
    >>> parsedDate = datetime(*x)
    >>> print parsedDate
    2013-08-10 21:46:52.638649
Asterisk ( * ) unpacks an iterable such as x into positional arguments for the function. Simple!
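As a small illustration of what the asterisk is doing, the sketch below (Python 3 syntax; the names spelled_out and parts are only for illustration) compares the unpacked call with one that writes every argument out by hand, and shows the double-asterisk counterpart for keyword arguments:

```python
from datetime import datetime

x = (2013, 8, 10, 21, 46, 52, 638649)

# datetime(*x) is equivalent to passing each element as its own argument.
unpacked = datetime(*x)
spelled_out = datetime(x[0], x[1], x[2], x[3], x[4], x[5], x[6])
print(unpacked == spelled_out)  # True

# A double asterisk does the same for a mapping of keyword arguments.
parts = {"year": 2013, "month": 8, "day": 10}
midnight = datetime(**parts)
print(midnight)  # 2013-08-10 00:00:00
```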
With that, whatAmI can be extended to recognise datetime strings:

    from datetime import datetime

    def whatAmI(thing):
        """Simple attempt at interpreting a string of a datatype.
        Can deal with str, int, datetime"""
        #remove any inverted commas
        if thing.startswith("'") or thing.startswith('"'):
            thing = thing[1:-1]
        #Now check for some data-types
        if thing.isdigit():
            return int(thing) #return the digit
        elif thing.startswith('datetime'):
            #get the numbers between the brackets
            thing = thing.split('(')[1].split(')')[0]
            #split on the commas, parse the digits, unpack into datetime and return
            return datetime(*[int(x) for x in thing.split(',')])
        else:
            return thing #anything else is returned as a string
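One caveat: read_dict splits on every ', ', including the ones inside datetime.datetime(...), so the whole datetime string never reaches whatAmI in one piece. The following is a rough sketch of one way around that, not a definitive parser. It is written for Python 3, and the names split_top_level, interpret and read_dict_nested are hypothetical helpers invented for this example. It assumes there are no escaped quotes in the input and that keys do not contain a colon:

```python
from datetime import datetime


def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators inside brackets or quotes."""
    chunks, current, depth, quote = [], [], 0, None
    for ch in text:
        if quote:                       # inside a quoted string
            if ch == quote:
                quote = None
        elif ch in "'\"":               # a quoted string starts
            quote = ch
        elif ch in "([{":
            depth += 1
        elif ch in ")]}":
            depth -= 1
        elif ch == sep and depth == 0:  # a genuine top-level separator
            chunks.append("".join(current).strip())
            current = []
            continue
        current.append(ch)
    if current:
        chunks.append("".join(current).strip())
    return chunks


def interpret(thing):
    """Interpret one chunk as a quoted string, an int or a datetime."""
    if thing[:1] in ("'", '"'):
        return thing[1:-1]
    if thing.isdigit():
        return int(thing)
    if thing.startswith("datetime"):
        # Take the text between the outermost brackets and unpack it.
        inner = thing[thing.index("(") + 1 : thing.rindex(")")]
        return datetime(*[int(n) for n in inner.split(",")])
    return thing


def read_dict_nested(d):
    """Parse a flat dictionary string whose values may be datetime calls."""
    parsed = {}
    for pair in split_top_level(d.strip()[1:-1]):
        key, value = pair.split(":", 1)
        parsed[interpret(key.strip())] = interpret(value.strip())
    return parsed


myDict = "{'someKey': 1, 'rightNow': datetime.datetime(2013, 8, 10, 21, 46, 52, 638649)}"
result = read_dict_nested(myDict)
print(result["rightNow"])
```

Tracking the bracket depth keeps the commas inside datetime.datetime(...) together, so the asterisk unpacking gets the full argument list.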