
Why Is Multiprocessing.pool.map Slower Than Builtin Map?

import multiprocessing
import time
from subprocess import call, STDOUT
from glob import glob
import sys

def do_calculation(data):
    x = time.time()
    with open(data + '.classes.report', 'w') as f:
        call(["external script", data], stdout=f.fileno(), stderr=STDOUT)
    return 'apk: {data!s} time {tim!s}'.format(data=data, tim=time.time() - x)

Solution 1:

When you use multiprocessing, it behooves you to give the worker processes enough computation to last for at least a few seconds each. If a task finishes too quickly, the overhead of setting up the pool, spawning the worker processes, and (potentially) switching between them dominates, and too little time is spent on the intended computation to justify using multiprocessing.
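A minimal, self-contained sketch of that effect (tiny_task and the input size are illustrative, not from the question): with work this cheap, pool.map reliably loses to the builtin map because every call still pays for pickling and inter-process communication.

import multiprocessing
import time

def tiny_task(n):
    # Near-instant work: the pool's setup and IPC costs dwarf it.
    return n * n

if __name__ == '__main__':
    data = range(100000)

    start = time.time()
    list(map(tiny_task, data))   # builtin map: no pickling, no IPC
    print('builtin map: %.3fs' % (time.time() - start))

    start = time.time()
    pool = multiprocessing.Pool()
    pool.map(tiny_task, data)    # pays for worker startup, pickling, scheduling
    pool.close()
    pool.join()
    print('pool.map:    %.3fs' % (time.time() - start))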

Also, if you have a CPU-bound computation, then initializing a pool with more processes than cores (multiprocessing.cpu_count()) is counter-productive. It will make the OS switch between processes while not allowing the computation to proceed any faster.
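For example (note that processes=multiprocessing.cpu_count() is in fact Pool's default, so spelling it out is mainly documentation):

import multiprocessing

# For CPU-bound work, cap the pool at the core count; extra workers
# only add context switching without speeding up the computation.
pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())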

Solution 2:

import time
from subprocess import call, STDOUT

def do_calculation(data):
    x = time.time()
    # Write the external tool's stdout and stderr to a per-file report.
    with open(data + '.classes.report', 'w') as f:
        call(["external script", data], stdout=f.fileno(), stderr=STDOUT)
    return 'apk: {data!s} time {tim!s}'.format(data=data, tim=time.time() - x)

You are measuring the time each individual task takes. Running tasks in parallel doesn't make any single task shorter; it lets them all run at the same time. In other words, you are measuring the wrong thing: you should time the total wall-clock duration of all the tasks together, not each task individually.
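A sketch of measuring it that way, assuming do_calculation as defined above and a glob of input files matching the question's imports (the '*.apk' pattern is an assumption):

import time
import multiprocessing
from glob import glob

if __name__ == '__main__':
    files = glob('*.apk')  # assumed input pattern, as hinted by the question
    start = time.time()
    pool = multiprocessing.Pool()
    results = pool.map(do_calculation, files)  # do_calculation from above
    pool.close()
    pool.join()
    # One number for the whole batch, not one per task.
    print('total wall-clock time: {0:.2f}s'.format(time.time() - start))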

The slowness per task is probably because the tasks running at the same time contend for shared resources (CPU time and, here, disk I/O for the report files), so no single task runs at full speed.
