
Why Is numpy.array() Sometimes Very Slow?

I'm using the numpy.array() function to create numpy.float64 ndarrays from lists. I've noticed that it is very slow when the list contains None or when a list of lists is provided.
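To reproduce the slowdown outside IPython (where `%timeit` is not available), a minimal script along these lines could be used. It is a sketch: the function name `time_cases`, the list size and the repeat counts are arbitrary choices, and the absolute numbers will depend heavily on the NumPy version installed. `dtype=np.float64` is passed explicitly so that `None` becomes `nan` instead of producing an object array.

```python
import timeit

import numpy as np


def time_cases(n=100_000, number=5):
    """Return the best per-call time (seconds) of np.array for each input shape."""
    cases = {
        "floats": [0.0] * n,    # flat list of Python floats (baseline)
        "nones": [None] * n,    # None entries, converted to nan by the float64 dtype
        "nested": [[0.0]] * n,  # list of one-element lists -> shape (n, 1)
    }
    results = {}
    for name, data in cases.items():
        # Bind `data` as a default argument so each lambda keeps its own list.
        timer = timeit.Timer(lambda data=data: np.array(data, dtype=np.float64))
        # Take the best of three runs, divided by calls per run.
        results[name] = min(timer.repeat(repeat=3, number=number)) / number
    return results


if __name__ == "__main__":
    for name, seconds in time_cases().items():
        print(f"{name:>7}: {seconds * 1e3:.2f} ms per call")
```

Comparing the `nones` and `nested` rows against the `floats` baseline shows whether a given NumPy build still has the slow paths described in the question.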

Solution 1:

I've reported this as a numpy issue. The report and patch files are here:

https://github.com/numpy/numpy/issues/3392

After patching:

# was 240 ms, best alternate version was 3.29
In [5]: %timeit numpy.array([None]*100000)
100 loops, best of 3: 7.49 ms per loop

# was 353 ms, best alternate version was 9.65
In [6]: %timeit numpy.array([[0.0]]*100000)
10 loops, best of 3: 23.7 ms per loop
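On a NumPy build without this fix, one workaround you could try for the list-of-lists case (an assumption on my part, not something from the issue report) is to flatten the rows yourself and let `np.fromiter` fill a 1-D buffer before reshaping, so NumPy never has to inspect the nested structure. The helper name `rows_to_array` is hypothetical, and it assumes every row has exactly `ncols` numeric entries.

```python
import itertools

import numpy as np


def rows_to_array(rows, ncols):
    """Build a (len(rows), ncols) float64 array from equal-length rows.

    Hypothetical helper: assumes every row has exactly `ncols` numeric
    entries. Short input makes np.fromiter raise ValueError; rows longer
    than `ncols` are NOT detected and would shift the data.
    """
    flat = itertools.chain.from_iterable(rows)  # lazily yields every element in row order
    arr = np.fromiter(flat, dtype=np.float64, count=len(rows) * ncols)
    return arr.reshape(len(rows), ncols)
```

For example, `rows_to_array([[0.0]] * 100000, 1)` builds the same array as `numpy.array([[0.0]] * 100000)`; whether it is actually faster on your NumPy version is something to check with `%timeit` before adopting it.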

Solution 2:

My guess would be that the code for converting lists just calls float on everything. If the argument defines __float__, it calls that; otherwise it treats the argument like a string (which throws an exception on None; that exception is caught and np.nan is put in instead). Exception handling is relatively slow, so paying for one exception per element would add up quickly.
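The hypothesis above can be written as a small pure-Python model of the per-element conversion. This is only an illustration of the guess, not NumPy's actual C implementation; the name `model_convert` is hypothetical. It shows why `None` would be expensive under this model: it has no `__float__`, so it falls through to string parsing, and `float("None")` raises.

```python
import math


def model_convert(obj):
    """Hypothetical per-element conversion following the guess above.

    Not NumPy's real code: call __float__ when the type defines it,
    otherwise parse str(obj) and map a parse failure to nan.
    """
    to_float = getattr(type(obj), "__float__", None)
    if to_float is not None:
        return to_float(obj)        # fast path: float, int, np.float64, ...
    try:
        return float(str(obj))      # "string" path; str(None) == "None"
    except ValueError:
        return math.nan             # every None ends up here, paying for an exception
```

Under this model, `model_convert(3)` returns `3.0` via `int.__float__`, while `model_convert(None)` raises and catches a `ValueError` before returning `nan`.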

Timings seem to support this hypothesis:

import numpy as np
%timeit [None] * 100000
> 1000 loops, best of 3: 1.04 ms per loop

%timeit np.array([0.0] * 100000)
> 10 loops, best of 3: 21.3 ms per loop
%timeit [i.__float__() for i in [0.0] * 100000]
> 10 loops, best of 3: 32 ms per loop


def flt(d):
    try:
        return float(d)
    except (TypeError, ValueError):  # float(None) raises TypeError
        return np.nan

%timeit np.array([None] * 100000, dtype=np.float64)
> 1 loops, best of 3: 477 ms per loop    
%timeit [flt(d) for d in [None] * 100000]
> 1 loops, best of 3: 328 ms per loop

Adding another case just to make it obvious where I'm going with this: with an explicit check for None, the conversion would not be nearly as slow as above:

def flt2(d):
    if d is None:  # cheap identity check, no exception raised
        return np.nan
    try:
        return float(d)
    except (TypeError, ValueError):
        return np.nan

%timeit [flt2(d) for d in [None] * 100000]
> 10 loops, best of 3: 45 ms per loop
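The same idea can be applied on the caller's side while building the array: map `None` to `nan` with an identity check first and feed the result to `np.fromiter`, which takes a generator, a dtype and an optional element count. This is a sketch; the helper name `to_float_array` is hypothetical, and unlike `flt2` it lets other unconvertible values (e.g. non-numeric strings) raise instead of silently turning them into `nan`.

```python
import numpy as np


def to_float_array(values):
    """Convert a list that may contain None into a 1-D float64 array.

    Hypothetical helper: None is mapped to nan by an identity check, so
    the common None case never reaches float() or raises an exception
    (the idea behind flt2 above). Other bad values still raise.
    """
    gen = (np.nan if v is None else float(v) for v in values)
    # count lets fromiter allocate the output buffer once up front.
    return np.fromiter(gen, dtype=np.float64, count=len(values))
```

Usage would be `to_float_array([None] * 100000)`; comparing it with `%timeit` against `np.array([None] * 100000, dtype=np.float64)` on your own NumPy version is the way to tell whether it helps.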
