
Sort Rows Of Array To Match Order Of Another Array Using An Identifier Column

I have two arrays like this: A = [[111, ...], B = [[222, ...], [222, ...], [111, ...], [333, ...], [333, ...], [555, ...]]

Solution 1:

Here's a vectorized approach using np.searchsorted -

import numpy as np

# Store the indices that sort A by its identifier column (col-0)
sidx = A[:,0].argsort()

# Find the indices of col-0 of B in col-0 of sorted A
l_idx = np.searchsorted(A[:,0],B[:,0],sorter = sidx)

# Create a mask over those indices that marks which entries of B's col-0
# have a match in A's col-0 (the left and right insertion points differ on a match)
valid_mask = l_idx != np.searchsorted(A[:,0],B[:,0],sorter = sidx,side='right')

# Initialize output array with NaNs.
# Use l_idx to set rows from A into output array. Use valid_mask to select
# indices from l_idx and output rows that are to be set.
out = np.full((B.shape[0],A.shape[1]),np.nan)
out[valid_mask] = A[sidx[l_idx[valid_mask]]]

Note that valid_mask could also be created with np.in1d, as np.in1d(B[:,0],A[:,0]), which reads more intuitively (newer NumPy releases provide np.isin for the same purpose). We use np.searchsorted here because it performs better, as discussed in greater detail in this other solution.
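As a quick sanity check, the two ways of building the mask can be compared directly. This is only a sketch: the arrays are made-up sample data, the names mask_searchsorted and mask_isin are introduced here for illustration, and it assumes the identifiers in column 0 of A are unique.

```python
import numpy as np

# Hypothetical sample data; column 0 holds the identifier.
A = np.array([[45, 11, 86], [18, 74, 59], [30, 68, 13], [55, 47, 78]])
B = np.array([[45, 11, 88], [55, 83, 46], [95, 87, 77], [30, 9, 37]])

sidx = A[:, 0].argsort()
left = np.searchsorted(A[:, 0], B[:, 0], sorter=sidx)
right = np.searchsorted(A[:, 0], B[:, 0], sorter=sidx, side='right')

# An identifier is present in A exactly when the left and right
# insertion points differ.
mask_searchsorted = left != right

# Plain membership test; easier to read.
mask_isin = np.isin(B[:, 0], A[:, 0])

print(np.array_equal(mask_searchsorted, mask_isin))
```

Both masks flag the same rows of B, so which one to use comes down to readability versus reusing the searchsorted result that is needed anyway.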

Sample run -

In [184]: A
Out[184]: 
array([[45, 11, 86],
       [18, 74, 59],
       [30, 68, 13],
       [55, 47, 78]])

In [185]: B
Out[185]: 
array([[45, 11, 88],
       [55, 83, 46],
       [95, 87, 77],
       [30,  9, 37],
       [14, 97, 98],
       [18, 48, 53]])

In [186]: out
Out[186]: 
array([[ 45.,  11.,  86.],
       [ 55.,  47.,  78.],
       [ nan,  nan,  nan],
       [ 30.,  68.,  13.],
       [ nan,  nan,  nan],
       [ 18.,  74.,  59.]])
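To reproduce the sample run end to end, the steps can be wrapped in a small function. The name reorder_like is made up for this sketch and is not part of NumPy; the data is the A and B shown in the sample run.

```python
import numpy as np

def reorder_like(A, B):
    """Rows of A arranged to follow B's column 0; NaN rows where B's id is absent from A."""
    sidx = A[:, 0].argsort()
    left = np.searchsorted(A[:, 0], B[:, 0], sorter=sidx)
    right = np.searchsorted(A[:, 0], B[:, 0], sorter=sidx, side='right')
    valid = left != right

    # NaN is a float, so the output array is float even if A holds integers.
    out = np.full((B.shape[0], A.shape[1]), np.nan)
    out[valid] = A[sidx[left[valid]]]
    return out

A = np.array([[45, 11, 86], [18, 74, 59], [30, 68, 13], [55, 47, 78]])
B = np.array([[45, 11, 88], [55, 83, 46], [95, 87, 77],
              [30, 9, 37], [14, 97, 98], [18, 48, 53]])

out = reorder_like(A, B)
print(out)
```

One thing to keep in mind with this design: because NaN marks the missing rows, integer input comes back as floats, as the sample output shows.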

Solution 2:

The simple approach is to build a dict from A and then use it to map identifiers found in B to the new array.

Constructing dict:

>>> A = [[1,"a"], [2,"b"], [3,"c"]]
>>> A_dict = {x[0]: x for x in A}
>>> A_dict
{1: [1, 'a'], 2: [2, 'b'], 3: [3, 'c']}

Mapping:

>>> B = [[3,"..."], [2,"..."], [1,"..."]]
>>> result = (A_dict[x[0]] for x in B)
>>> list(result)
[[3, 'c'], [2, 'b'], [1, 'a']]
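The mapping above raises KeyError if B contains an identifier that A lacks. A minimal variation, assuming None is an acceptable placeholder for such rows (the data below is hypothetical, with 4 missing from A):

```python
# Hypothetical data; identifier 4 appears in B but not in A.
A = [[1, "a"], [2, "b"], [3, "c"]]
B = [[3, "..."], [4, "..."], [1, "..."]]

A_dict = {row[0]: row for row in A}

# dict.get returns the fallback (None by default) instead of raising KeyError.
result = [A_dict.get(row[0]) for row in B]
print(result)  # [[3, 'c'], None, [1, 'a']]
```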

Solution 3:

It's not clear if you wish to concatenate the values in B onto A. Let's assume not; then the simplest way is probably to just build a dictionary mapping each identifier to its row and then reorder A:

import numpy

def match_order(A, B):
    # identifier -> row
    by_id = {A[i, 0]: A[i] for i in range(len(A))}

    # make up a fill row and rearrange according to B
    fill_row = [-1] * A.shape[1]
    return numpy.array([by_id.get(k, fill_row) for k in B[:, 0]])

As an example, if we have:

A = numpy.array([[111, 1], [222, 2], [333, 3], [555, 5]])
B = numpy.array([[222, 2], [111, 1], [333, 3], [444, 4], [555, 5]])

Then

>>> match_order(A, B)
array([[222,   2],
       [111,   1],
       [333,   3],
       [ -1,  -1],
       [555,   5]])

If you wish to concatenate B, then you can do so simply as:

>>> numpy.hstack( (match_order(A, B), B[:, 1:]) )
array([[222,   2,   2],
       [111,   1,   1],
       [333,   3,   3],
       [ -1,  -1,   4],
       [555,   5,   5]])
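The question's B repeats identifiers (222 and 333 each appear twice), and the dictionary lookup handles that without changes. A short sketch, with made-up values in the second column since the question elides them:

```python
import numpy

def match_order(A, B):
    # identifier -> row
    by_id = {A[i, 0]: A[i] for i in range(len(A))}
    fill_row = [-1] * A.shape[1]
    return numpy.array([by_id.get(k, fill_row) for k in B[:, 0]])

# Hypothetical payload values; only the identifiers follow the question.
A = numpy.array([[111, 1], [222, 2], [333, 3]])
B = numpy.array([[222, 0], [222, 0], [111, 0], [333, 0], [333, 0], [555, 0]])

out = match_order(A, B)
print(out)
```

Each repeated identifier in B simply pulls the same row of A again, and 555, which A lacks, gets the fill row.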

Solution 4:

>>> A = [[3,'d', 'e', 'f'], [1,'a','b','c'], [2,'n','n','n']]
>>> B = [[1,'a','b','c'], [3,'d','e','f']]
>>> A_dict = {x[0]: x[1:] for x in A}
>>> A_dict
{1: ['a', 'b', 'c'], 2: ['n', 'n', 'n'], 3: ['d', 'e', 'f']}
>>> B_dict = {x[0]: x[1:] for x in B}
>>> B_dict
{1: ['a', 'b', 'c'], 3: ['d', 'e', 'f']}
>>> result = [[x] + A_dict[x] for x in A_dict if x in B_dict and A_dict[x] == B_dict[x]]
>>> result
[[1, 'a', 'b', 'c'], [3, 'd', 'e', 'f']]

Here A[0] and B[1] are identical, as are A[1] and B[0]. Converting both lists to dicts makes the problem easier to handle.

Step 1: Create dict objects for each 2D list.

Step 2: Iterate over each key in A_dict and check (a) whether the key exists in B_dict and (b) if so, whether both dicts hold the same value for that key.

Step 3: Append the key and value to form a 2-D list.
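The steps above can be sketched as a function. One assumption differs from the one-liner: this version iterates over B rather than over A_dict, so the result follows B's row order, which is what the question asks for. (On Python 3.7+, dicts keep insertion order, so iterating over A_dict would follow A's order instead.) The name common_rows_in_b_order is made up for this example.

```python
A = [[3, 'd', 'e', 'f'], [1, 'a', 'b', 'c'], [2, 'n', 'n', 'n']]
B = [[1, 'a', 'b', 'c'], [3, 'd', 'e', 'f']]

def common_rows_in_b_order(A, B):
    # Step 1: identifier -> remaining values, built from A.
    A_dict = {row[0]: row[1:] for row in A}

    result = []
    for row in B:
        key, value = row[0], row[1:]
        # Step 2: keep the row only if A has the key with the same values.
        if key in A_dict and A_dict[key] == value:
            # Step 3: rebuild the full row from key and values.
            result.append([key] + A_dict[key])
    return result

result = common_rows_in_b_order(A, B)
print(result)  # [[1, 'a', 'b', 'c'], [3, 'd', 'e', 'f']]
```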

Cheers!
