Sharing Numpy Arrays In Python Multiprocessing Pool

I'm working on some code that does some fairly heavy numerical work on a large (tens to hundreds of thousands of numerical integrations) set of problems. Fortunately, these integr

Solution 1:

I had a similar problem. If you just want to read my solution skip some lines :) I had to:

  • share a numpy.array between worker processes operating on different parts of it and...
  • pass Pool.map a function with more than one argument.

I noticed that:

  • the data of the numpy.array was correctly read but...
  • changes to the numpy.array were not made permanent
  • Pool.map had problems handling lambda functions, or so it appeared to me (if this point is not clear to you, just ignore it)
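
Both observations are consistent with how Pool.map hands work to its workers: the callable and every argument are pickled and sent to another process. Here is a minimal sketch that reproduces the two effects without starting a pool. The names scale and can_pickle are hypothetical helpers invented for this illustration, not part of the original code.

```python
import pickle
from functools import partial

import numpy as np


def scale(x, factor):
    """Module-level function: pickle stores it by reference to its name."""
    return x * factor


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, which Pool.map relies on."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A worker receives an unpickled copy of the array, so an in-place
# write made there never touches the parent's original.
X = np.zeros(3)
worker_copy = pickle.loads(pickle.dumps(X))
worker_copy[0] = 99.0

# A lambda has no importable name, so pickling it fails; a partial built
# from a module-level function is a picklable way to bind extra arguments.
lambda_ok = can_pickle(lambda x: scale(x, 2.0))
partial_ok = can_pickle(partial(scale, factor=2.0))
```

This is why returning the result from the worker, rather than writing into the array, is the simpler route.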

My solution was to:

  • make the target function's only argument a list
  • make the target function return the modified data instead of trying to write to the numpy.array directly

I understand that your do_work function already returns the computed data, so you would just have to modify do_work to accept a list (containing X, param_1, param_2 and arg) as its argument, and pack the input to the target function in this format before passing it to Pool.map.

Here is a sample implementation:

def do_work2(args):
    # unpack the single list argument that Pool.map passes in
    X, param_1, param_2, arg = args
    return heavy_computation(X, param_1, param_2, arg)

Now you have to pack the input to the do_work2 function before calling it. Your main becomes:

from multiprocessing import Pool
from numpy import linspace, savez

if __name__ == '__main__':
    filename = input("Filename> ")
    param_1 = float(input("Parameter 1: "))
    param_2 = float(input("Parameter 2: "))
    X = parse_numpy_array(filename)
    # pack the input arguments, one list per task
    arglist = [[X, param_1, param_2, n] for n in linspace(0.0, 1.0, 100)]
    # building arglist does not make 100 copies of X: every entry references the same array
    # (Pool.map still pickles each entry when it sends the task to a worker)
    pool = Pool()
    results = pool.map(do_work2, arglist)
    pool.close()
    pool.join()
    # save X and the results in Results.npz for analysis
    savez("Results", X=X, results=results)
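
On Python 3 the unpacking wrapper can be skipped, because Pool.starmap unpacks each argument tuple itself. The sketch below is self-contained and assumes a toy heavy_computation as a stand-in for the real integration routine. The build_tasks helper and the small sample array are likewise hypothetical, chosen only so the example runs.

```python
from multiprocessing import Pool

import numpy as np


def heavy_computation(X, param_1, param_2, arg):
    """Hypothetical stand-in for the real integration routine."""
    return float(np.sum(X) * param_1 + param_2 * arg)


def build_tasks(X, param_1, param_2, n_points=100):
    """One argument tuple per task; every tuple references the same X."""
    return [(X, param_1, param_2, arg)
            for arg in np.linspace(0.0, 1.0, n_points)]


if __name__ == "__main__":
    X = np.arange(5, dtype=float)
    tasks = build_tasks(X, 2.0, 3.0, n_points=4)
    with Pool(processes=2) as pool:
        # starmap calls heavy_computation(*task) for each task
        results = pool.starmap(heavy_computation, tasks)
    print(results)
```

Each task still carries its own pickled copy of X to the worker, so for a very large X this pattern trades memory traffic for simplicity.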

Solution 2:

To make your last idea work, I think you can simply make X, param_1, and param_2 global variables by using the global keyword before modifying them inside the if statement. So add the following:

global X
global param_1
global param_2

directly after the if __name__ == '__main__': line.
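
Names assigned at module level inside the if block are already module globals. Whether the workers see them depends on the process start method: forked workers inherit them, while spawned workers re-import the module without running the guarded block. A sketch of a variant that does not depend on the start method is to hand the values to each worker once through the Pool initializer. This is a different technique from the one described above. The init_worker name, the toy do_work body and the sample data are assumptions made for illustration.

```python
from multiprocessing import Pool

import numpy as np

# Per-process globals, filled in by init_worker inside each worker.
X = None
param_1 = None
param_2 = None


def init_worker(shared_X, p1, p2):
    """Runs once in every worker process and stores the read-only inputs."""
    global X, param_1, param_2
    X, param_1, param_2 = shared_X, p1, p2


def do_work(arg):
    """Hypothetical stand-in computation that reads the globals."""
    return float(np.sum(X) * param_1 + param_2 * arg)


if __name__ == "__main__":
    data = np.arange(5, dtype=float)
    with Pool(processes=2, initializer=init_worker,
              initargs=(data, 2.0, 3.0)) as pool:
        results = pool.map(do_work, np.linspace(0.0, 1.0, 4))
    print(results)
```

Here X is sent once per worker instead of once per task. Each worker still holds its own copy, so writes to X inside do_work would not reach the parent.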
