Use Information Of Two Arrays To Create A Third One

I have two NumPy arrays and want to create a third one from the information in these two. Here is a simple example: have = np.array([[1, 2, 3, 4], [5, 6, 7, 8]]) use = np.array([[

Solution 1:

If the arrays are small and performance is not an issue, a simple list comprehension is enough:

np.array([[a[0]] * b[0] + list(a[b[0]:]) for a, b in zip(have, use)])
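To make the one-liner concrete, here is a small worked run. It is only a sketch: it assumes use = np.array([[2], [3]]), the value that appears in Solution 3's listing, and the name result is mine.

```python
import numpy as np

have = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
use = np.array([[2], [3]])  # assumed: taken from the listing in Solution 3

# For each row: repeat the first value b[0] times, then keep the rest of the row.
result = np.array([[a[0]] * b[0] + list(a[b[0]:]) for a, b in zip(have, use)])
print(result)
# [[1 1 3 4]
#  [5 5 5 8]]
```

Each row is rebuilt as a Python list, which is why this approach gets slow for large arrays.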

Solution 2:

Simply iterate over the rows of have and replace the values based on use.

Use:

for i in range(use.shape[0]):
    have[i, :use[i, 0]] = np.repeat(have[i, 0], use[i, 0])

Using only numpy operations:

First create a boolean mask of the same shape as have, where mask[i, j] is True if j < use[i, 0] and False otherwise. The mask is therefore True exactly at the indices that should be replaced by the first-column value of their row. Then use np.where to do the replacement.

n, m = have.shape
mask = np.repeat(np.arange(m)[None, :], n, axis=0) < use
have = np.where(mask, have[:, 0:1], have)

Output:

>>> have
array([[1, 1, 3, 4],
       [5, 5, 5, 8]])
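The mask described above can also be built by broadcasting alone, without np.repeat: comparing a length-m index vector with the (n, 1) array use already yields an (n, m) result. A minimal sketch, assuming the example arrays from Solution 3 (the names mask and out are mine):

```python
import numpy as np

have = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
use = np.array([[2], [3]])  # assumed: taken from the listing in Solution 3

# (m,) compared with (n, 1) broadcasts to (n, m): True where column index < use[i, 0]
mask = np.arange(have.shape[1]) < use

# have[:, :1] has shape (n, 1), so it broadcasts along each row
out = np.where(mask, have[:, :1], have)
print(out)
# [[1 1 3 4]
#  [5 5 5 8]]
```

This gives the same output as the np.repeat version. It writes to a new array rather than overwriting have.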

Solution 3:

If performance matters, you can use np.apply_along_axis().

import numpy as np

have = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
use = np.array([[2], [3]])


def rep1st(arr):
    rep = arr[0]
    res = np.repeat(arr[1], rep)
    res = np.concatenate([res, arr[rep+1:]])
    return res


solution = np.apply_along_axis(rep1st, 1, np.concatenate([use, have], axis=1))

update:

As @hpaulj pointed out, the apply_along_axis method above is actually not as efficient as I expected; I had misunderstood how it works. Reference: "numpy np.apply_along_axis function speed up?".

However, I ran some timing tests on the current methods:

import numpy as np
from timeit import timeit


def rep1st(arr):
    rep = arr[0]
    res = np.repeat(arr[1], rep)
    res = np.concatenate([res, arr[rep + 1:]])
    return res


def test(row, col, run):
    have = np.random.randint(0, 100, size=(row, col))
    use = np.random.randint(0, col, size=(row, 1))
    d = locals()
    d.update(globals())
    # method by me
    t1 = timeit("np.apply_along_axis(rep1st, 1, np.concatenate([use, have], axis=1))", number=run, globals=d)
    # method by @quantummind
    t2 = timeit("np.array([[a[0]] * b[0] + list(a[b[0]:]) for a, b in zip(have, use)])", number=run, globals=d)
    # method by @Amit Vikram Singh
    t3 = timeit(
        "np.where(np.repeat(np.arange(have.shape[1])[None, :], have.shape[0], axis=0) < use, have[:, 0:1], have)",
        number=run, globals=d
    )
    print(f"{t1:8.6f}, {t2:8.6f}, {t3:8.6f}")


test(1000, 10, 10)
test(100, 100, 10)
test(10, 1000, 10)

test(1000000, 10, 1)
test(100000, 100, 1)
test(10000, 1000, 1)
test(1000, 10000, 1)
test(100, 100000, 1)
test(10, 1000000, 1)

results (total seconds per test; columns: apply_along_axis, list comprehension, np.where):

0.062488, 0.028484, 0.000408
0.010787, 0.013811, 0.000270
0.001057, 0.009146, 0.000216

6.146863, 3.210017, 0.044232
0.585289, 1.186013, 0.034110
0.091086, 0.961570, 0.026294
0.039448, 0.917052, 0.022553
0.028719, 0.919377, 0.022751
0.035121, 1.027036, 0.025216

It shows that the np.where method proposed by @Amit Vikram Singh (his second one) is the fastest in every test, even when the arrays are huge.
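The timings only mean something if the three expressions compute the same thing. Here is a minimal sanity check, not part of the original answers: the sizes and the names by_apply, by_list and by_where are arbitrary choices of mine, and the np.where line uses plain broadcasting in place of np.repeat.

```python
import numpy as np


def rep1st(arr):
    # arr[0] is the repeat count; arr[1:] is the original row
    rep = arr[0]
    res = np.repeat(arr[1], rep)
    return np.concatenate([res, arr[rep + 1:]])


rng = np.random.default_rng(0)
have = rng.integers(0, 100, size=(50, 8))
use = rng.integers(0, 8, size=(50, 1))

by_apply = np.apply_along_axis(rep1st, 1, np.concatenate([use, have], axis=1))
by_list = np.array([[a[0]] * b[0] + list(a[b[0]:]) for a, b in zip(have, use)])
by_where = np.where(np.arange(have.shape[1]) < use, have[:, :1], have)

print(np.array_equal(by_apply, by_list) and np.array_equal(by_list, by_where))
```

If this prints True, the three methods agree on the random input, including rows where use is 0 and nothing is replaced.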
