Why Is numpy.array() Sometimes Very Slow?
I'm using the numpy.array() function to create numpy.float64 ndarrays from lists. I noticed that this is very slow when the list contains None or when a list of lists is provided.
Solution 1:
I've reported this as a numpy issue. The report and patch files are here:
https://github.com/numpy/numpy/issues/3392
After patching:
# was 240 ms, best alternate version was 3.29
In [5]: %timeit numpy.array([None]*100000)
100 loops, best of 3: 7.49 ms per loop
# was 353 ms, best alternate version was 9.65
In [6]: %timeit numpy.array([[0.0]]*100000)
10 loops, best of 3: 23.7 ms per loop
Solution 2:
My guess would be that the code for converting lists just calls float on everything. If the argument defines __float__, we call that; otherwise we treat it like a string. On None that throws an exception, which we catch and replace with np.nan. The exception handling should be relatively slow.
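The observable behaviour behind this guess can be checked directly. The sketch below only shows that float(None) raises and that numpy.array() still produces NaN for None when a float dtype is requested. It does not confirm how NumPy implements the conversion internally.

```python
import math

import numpy as np

# float() has no conversion for None, so it raises TypeError.
try:
    float(None)
    raised = False
except TypeError:
    raised = True

# NumPy nevertheless accepts None when a float dtype is requested
# and stores NaN in its place.
arr = np.array([None, 1.5], dtype=np.float64)

print(raised)              # True
print(math.isnan(arr[0]))  # True
print(arr[1])              # 1.5
```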
Timing seems to verify this hypothesis:
import numpy as np
%timeit [None] * 100000
> 1000 loops, best of 3: 1.04 ms per loop
%timeit np.array([0.0] * 100000)
> 10 loops, best of 3: 21.3 ms per loop
%timeit [i.__float__() for i in [0.0] * 100000]
> 10 loops, best of 3: 32 ms per loop
def flt(d):
    # Convert to float, falling back to NaN when conversion fails.
    try:
        return float(d)
    except (TypeError, ValueError):
        return np.nan
%timeit np.array([None] * 100000, dtype=np.float64)
> 1 loops, best of 3: 477 ms per loop
%timeit [flt(d) for d in [None] * 100000]
> 1 loops, best of 3: 328 ms per loop
One more case to make the point explicit. With an explicit check for None, the conversion is not nearly as slow as above:
def flt2(d):
    # Handle None up front so no exception is raised for it.
    if d is None:
        return np.nan
    try:
        return float(d)
    except (TypeError, ValueError):
        return np.nan
%timeit [flt2(d) for d in [None] * 100000]
> 10 loops, best of 3: 45 ms per loop
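The same idea can be applied before calling numpy.array() at all. This is a minimal sketch, not part of the original measurements: to_float_array is a hypothetical helper name, the input data is made up, and the standard-library timeit module stands in for IPython's %timeit. Whether the pre-cleaned route is actually faster depends on the NumPy version. On a build that already handles None efficiently, the direct call may win.

```python
import timeit

import numpy as np


def to_float_array(values):
    """Hypothetical helper: map None to NaN up front, then build the array."""
    # The identity check is cheap and means NumPy never sees a None.
    cleaned = [np.nan if v is None else v for v in values]
    return np.array(cleaned, dtype=np.float64)


data = [None, 1.0, 2.5] * 1000

# Both routes should yield the same values, with NaN in the same positions.
direct = np.array(data, dtype=np.float64)
cleaned = to_float_array(data)
assert np.allclose(direct, cleaned, equal_nan=True)

# Time each route; absolute numbers depend on machine and NumPy version.
t_direct = timeit.timeit(lambda: np.array(data, dtype=np.float64), number=20)
t_clean = timeit.timeit(lambda: to_float_array(data), number=20)
print(f"direct: {t_direct:.4f} s, pre-cleaned: {t_clean:.4f} s")
```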