
Shared Variables in Python Multiprocessing to pre-map/reduce

I’ve been using the multiprocessing library in Python quite a bit recently and have started using its shared variable functionality. It can change something like this, from my previous post:
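Roughly, the earlier pattern was a map followed by an explicit reduce. A minimal sketch of that shape, assuming a toy squaring worker (the function and numbers are illustrative, not the original code):

```python
from multiprocessing import Pool

def work(x):
    # Each worker returns its partial result.
    return x * x

if __name__ == "__main__":
    with Pool(4) as pool:
        partial_results = pool.map(work, range(10))
    # Separate reduce stage: combine the per-process results afterwards.
    total = sum(partial_results)
    print(total)  # 285
```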

Into a much nicer:
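A minimal sketch of the shared-variable version, assuming the workers accumulate directly into a multiprocessing Value passed in via the pool initializer:

```python
from multiprocessing import Pool, Value

counter = None  # set in each worker process by the initializer

def init(shared_counter):
    global counter
    counter = shared_counter

def work(x):
    # Update the shared value directly; its lock guards against races.
    with counter.get_lock():
        counter.value += x * x

if __name__ == "__main__":
    shared_counter = Value("i", 0)  # "i" is the typecode for a C int, not a name
    with Pool(4, initializer=init, initargs=(shared_counter,)) as pool:
        pool.map(work, range(10))
    print(shared_counter.value)  # 285, with no separate reduce stage
```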

Thus eliminating the reduce stage. This is especially useful if you have a shared dictionary which you’re updating from multiple processes. There’s another shared datatype called Array, which, as the name suggests, is a shared array. Note: one pitfall (that I fell for) is thinking that the "i" in Value("i", 0) is the name of the variable. Actually, it’s a typecode which stands for “integer”.
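For reference, a minimal sketch of Value and Array side by side, with the typecodes spelled out (the fill function is just illustrative):

```python
from multiprocessing import Process, Value, Array

def fill(shared_array, shared_total):
    # Write into the shared array and shared value from a child process.
    for i in range(len(shared_array)):
        shared_array[i] = i * 2
        with shared_total.get_lock():
            shared_total.value += shared_array[i]

if __name__ == "__main__":
    numbers = Array("i", 5)  # "i" = C int typecode, 5 = length (zero-initialised)
    total = Value("i", 0)    # again, "i" is the typecode, not the variable's name
    p = Process(target=fill, args=(numbers, total))
    p.start()
    p.join()
    print(list(numbers), total.value)  # [0, 2, 4, 6, 8] 20
```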

There are other ways to do this, however, each of which has its own trade-offs:

| # | Solution | Advantages | Disadvantages |
|---|----------|------------|---------------|
| 1 | Shared file | Easy to implement and to access afterwards | Very slow |
| 2 | Shared MongoDB document | Easy to implement | Slow to query it constantly |
| 3 | Multiprocessing Value/Array (this example) | Very fast, easy to implement | Works on one PC only; can’t be accessed after the process is killed |
| 4 | Memcached shared value | Networked aspect is useful for big distributed databases; shared.set() function is already available (see the sketch after this table) | TCP could slow you down a bit |
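For option 4, a minimal sketch using the pymemcache client; the package, the server address, and the key name are all assumptions, and a memcached server must already be running:

```python
from multiprocessing import Pool
from pymemcache.client.base import Client  # assumed third-party client library

def work(x):
    # Each process opens its own connection and bumps the shared counter.
    client = Client(("localhost", 11211))  # assumed memcached address
    client.incr("shared_counter", x)

if __name__ == "__main__":
    Client(("localhost", 11211)).set("shared_counter", 0)
    with Pool(4) as pool:
        pool.map(work, range(1, 11))
    print(Client(("localhost", 11211)).get("shared_counter"))  # b'55'
```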

Really simple multi-processing in Python – and then linking to MongoDB

Multiprocessing in Python is really easy! You can spawn processes by using the Pool()  class.
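A minimal sketch of that pattern, assuming a toy worker that just squares its input:

```python
from multiprocessing import Pool

def work(number):
    # Toy worker: square the number it receives.
    return number * number

if __name__ == "__main__":
    with Pool(4) as pool:                   # spawn 4 worker processes
        results = pool.map(work, range(4))  # send one number to each
    print(results)  # [0, 1, 4, 9]
```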

Here, we spawn 4 processes, and use the map() function to send a number to each of them.

This is a trivial example, but it gets much more powerful when each process does something like making a remote connection:
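A minimal sketch of that idea, assuming the pymongo driver, a MongoDB server on localhost:27017, and a throwaway database and collection name (all assumptions):

```python
from multiprocessing import Pool
from pymongo import MongoClient  # assumed third-party driver

def work(number):
    # Each worker opens its own connection (a MongoClient shouldn't be shared
    # across forked processes) and writes one document.
    client = MongoClient("localhost", 27017)
    client.demo_db.results.insert_one({"number": number, "square": number * number})
    client.close()
    return number

if __name__ == "__main__":
    with Pool(4) as pool:
        pool.map(work, range(4))
```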