
Generate A Numpy 1d Array With A Pre-specified Correlation With An Existing 1d Array?

I have a non-generated 1D NumPy array. For now, we will use a generated one.

import numpy as np
arr1 = np.random.uniform(0, 100, 1_000)

I need an array that will be correlated 0.

Solution 1:

I've adapted this answer by whuber on stats.SE to NumPy. The idea is to generate a second array, noise, at random and then compute the residuals of a least-squares linear regression of noise on arr1. Because the regression includes an intercept, the residuals have zero mean and zero sample covariance with arr1, so their correlation with arr1 is 0; arr1, of course, has a correlation of 1 with itself. An appropriately weighted linear combination a*arr1 + b*residuals can therefore be given any desired correlation p in [-1, 1].
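The choice of weights can be checked directly. Writing x for arr1, r for residuals, σ for standard deviations, and using the weights a = p·σ_r and b = √(1 − p²)·σ_x that the function below computes, the zero covariance between x and r gives:

```latex
\begin{aligned}
y &= a x + b r, \qquad \operatorname{Cov}(x, r) = 0, \\
\operatorname{Cov}(y, x) &= a\,\sigma_x^2, \\
\operatorname{Var}(y) &= a^2\sigma_x^2 + b^2\sigma_r^2
  = p^2\sigma_r^2\sigma_x^2 + (1 - p^2)\,\sigma_x^2\sigma_r^2
  = \sigma_x^2\sigma_r^2, \\
\rho(y, x) &= \frac{a\,\sigma_x^2}{\sigma_x \cdot \sigma_x\sigma_r}
  = \frac{a}{\sigma_r} = p.
\end{aligned}
```

Because np.std is used consistently for every σ, the identity holds for the sample statistics as well, so the output correlation equals p up to floating-point rounding. It also shows why p must lie in [-1, 1]: otherwise b is not real.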

import numpy as np

def generate_with_corrcoef(arr1, p):
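    arr1 = np.asarray(arr1, dtype=float)
    # b below needs 1 - p**2 >= 0, so p must be a valid correlation
    if not -1 <= p <= 1:
        raise ValueError("p must be between -1 and 1")
    n = len(arr1)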

    # generate noise
    noise = np.random.uniform(0, 1, n)

    # least squares linear regression for noise = m*arr1 + c
    m, c = np.linalg.lstsq(np.vstack([arr1, np.ones(n)]).T, noise, rcond=None)[0]

    # residuals have 0 correlation with arr1
    residuals = noise - (m*arr1 + c)

    # the right linear combination a*arr1 + b*residuals
    a = p * np.std(residuals)
    b = (1 - p**2)**0.5 * np.std(arr1)

    arr2 = a*arr1 + b*residuals

    # return a scaled/shifted result to have the same mean/sd as arr1
    # this doesn't change the correlation coefficient
    return np.mean(arr1) + (arr2 - np.mean(arr2)) * np.std(arr1) / np.std(arr2)

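The last line shifts and scales the result so that its mean and standard deviation match arr1's. Since this is an affine transformation with a positive scale factor, it leaves the correlation unchanged. However, arr1 and arr2 will not in general be identically distributed: only the mean and standard deviation are matched.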
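A quick numerical check of the claim that the final rescaling preserves the correlation while matching the mean and standard deviation. This is a sketch on hypothetical stand-in data: x, y and the seed are illustrative assumptions, not part of the original answer, and it uses NumPy's Generator API (np.random.default_rng) rather than the legacy np.random functions.

```python
import numpy as np

# Hypothetical stand-in data: x plays the role of arr1, y the unscaled arr2.
rng = np.random.default_rng(0)
x = rng.uniform(0, 100, 1_000)
y = 0.5 * x + rng.normal(0, 20, 1_000)

# Same rescaling as the function's final line: shift/scale y so that its
# mean and (population, ddof=0) standard deviation match those of x.
y_scaled = np.mean(x) + (y - np.mean(y)) * np.std(x) / np.std(y)

corr_before = np.corrcoef(x, y)[0, 1]
corr_after = np.corrcoef(x, y_scaled)[0, 1]

print(np.isclose(corr_before, corr_after))        # correlation unchanged
print(np.isclose(np.mean(y_scaled), np.mean(x)))  # means match
print(np.isclose(np.std(y_scaled), np.std(x)))    # standard deviations match
```

The scale factor np.std(x) / np.std(y) is always positive, which is why the sign of the correlation is preserved; a negative factor would flip it.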

Usage:

>>> arr1 = np.random.uniform(0, 100, 1000)
>>> arr2 = generate_with_corrcoef(arr1, 0.3)
>>> np.corrcoef(arr1, arr2)
array([[1. , 0.3],
       [0.3, 1. ]])
