Matrix Multiplication Using Hdf5
I'm trying to multiplicate 2 big matrices with memory limit using hdf5 (pytables) but function numpy.dot seems to give me error: Valueerror: array is too big I need to do matrix
Solution 1:
I don't know of a np.dot that work without loading into memory. I think blocking would work pretty well. Create a an output array (called "c" below) as pytables CArray and fill in blocks. You should choose the chunkshape when you create it to match your blocking scheme. Something like
atom = tables.Float32Atom() # you have UInt8Atom() above. do you mean that?
shape = (a.shape[0], b.shape[1])
# you can vary block_size and chunkshape independently, but I would
# aim to have block_size an integer multiple of chunkshape
# your mileage may vary and depends on the array size and how you'll
# access it in the future.
Nchunk = 128 # ?
chunkshape = (Nchunk, Nchunk)
chunk_multiple = 1
block_size = chunk_multiple * Nchunk
c = h5f.create_carray(h5.root, 'c', atom, shape, chunkshape=chunkshape)
for i_start in range(0, a.shape[0], block_size):
for j_start in range(0, b.shape[1], block_size):
for k_start in range(0, a.shape[1], block_size):
c[i_start:i_start+block_size, j_start:j_start + block_size] += \
np.dot(a[i_start:i_start + block_size, k_start:k_start + block_size],
b[k_start:k_start + block_size, j_start:j_start + block_size]
Post a Comment for "Matrix Multiplication Using Hdf5"