Distance Matrix For Custom Distance

From what I understand, the scipy function scipy.spatial.distance_matrix returns the Minkowski distance for any pair of vectors from the provided matrices of vectors. Is there a wa

Solution 1:

It is quite straightforward to implement this yourself.

Performance will also very likely be better than that of the distance functions already implemented in scipy.

Most distance functions apply one function to every pair of coordinates and sum the results, e.g. (A_ik-B_jk)**n for the Minkowski distance with even n (in general abs(A_ik-B_jk)**n), and then apply a second function to the accumulated sum, e.g. acc**(1/n).
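
To make the inner/outer split concrete before any compilation is involved, here is a minimal plain-Python sketch of the same idea. The name pairwise_distance is hypothetical and only for illustration; it is a slow reference, not the numba template from this answer.

```python
import math

def pairwise_distance(A, B, kernel_inner, kernel_outer):
    """Sum kernel_inner over the coordinates of each pair, then apply kernel_outer."""
    out = []
    for a in A:
        row = []
        for b in B:
            acc = 0.0
            for a_k, b_k in zip(a, b):
                acc += kernel_inner(a_k, b_k)   # per-coordinate term
            row.append(kernel_outer(acc))       # final transform of the sum
        out.append(row)
    return out

A = [[0.0, 0.0], [1.0, 1.0]]
B = [[3.0, 4.0]]

# Euclidean: squared differences inside, square root outside
euclidean = pairwise_distance(A, B, lambda a, b: (a - b) ** 2, math.sqrt)
# Manhattan: absolute differences inside, identity outside
manhattan = pairwise_distance(A, B, lambda a, b: abs(a - b), lambda acc: acc)

print(euclidean[0][0])  # 5.0 (the 3-4-5 triangle)
print(manhattan)        # [[7.0], [5.0]]
```

Only the two small kernels change between distances; the triple loop stays the same, which is what the template function exploits.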

Template function

You don't have to change anything here to implement various distance functions.

import numpy as np
import numba as nb

def gen_cust_dist_func(kernel_inner,kernel_outer,parallel=True):

    # Compile both kernels so they can be inlined into the loops below
    kernel_inner_nb=nb.njit(kernel_inner,fastmath=True,inline='always')
    kernel_outer_nb=nb.njit(kernel_outer,fastmath=True,inline='always')

    def cust_dot_T(A,B):
        assert B.shape[1]==A.shape[1]

        out=np.empty((A.shape[0],B.shape[0]),dtype=A.dtype)
        # prange parallelizes over the rows of A when parallel=True
        for i in nb.prange(A.shape[0]):
            for j in range(B.shape[0]):
                acc=0
                for k in range(A.shape[1]):
                    acc+=kernel_inner_nb(A[i,k],B[j,k])
                out[i,j]=kernel_outer_nb(acc)
        return out

    if parallel==True:
        return nb.njit(cust_dot_T,fastmath=True,parallel=True)
    else:
        return nb.njit(cust_dot_T,fastmath=True,parallel=False)

Examples and Timings

#Implement, for example, a Minkowski distance and the Euclidean distance
#Minkowski distance p=20
inner=lambda A,B:(A-B)**20
outer=lambda acc:acc**(1./20)
my_minkowski_dist=gen_cust_dist_func(inner,outer,parallel=True)

#Euclidean distance
inner=lambda A,B:(A-B)**2
outer=lambda acc:np.sqrt(acc)
my_euclidian_dist=gen_cust_dist_func(inner,outer,parallel=True)

from scipy.spatial.distance import cdist

A=np.random.rand(1000,50)
B=np.random.rand(1000,50)

#Minkowski p=20
%timeit res_1=cdist(A,B,'minkowski',p=20)
#1.44 s ± 8.18 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit res_2=my_minkowski_dist(A,B)
#10.8 ms ± 105 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
res_1=cdist(A,B,'minkowski',p=20)
res_2=my_minkowski_dist(A,B)
print(np.allclose(res_1,res_2))
#True

#Euclidean
%timeit res_1=cdist(A,B,'euclidean')
#39.3 ms ± 307 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit res_2=my_euclidian_dist(A,B)
#3.61 ms ± 22.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
res_1=cdist(A,B,'euclidean')
res_2=my_euclidian_dist(A,B)
print(np.allclose(res_1,res_2))
#True
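
Other distances that are a sum of per-coordinate terms followed by a final transform fit the same inner/outer pattern. As a rough sketch, the NumPy-only helper below (broadcast_dist is a hypothetical name, not part of scipy or of the template above) evaluates such kernels by broadcasting. It allocates an (n, m, d) temporary array, so it is meant as a small-input cross-check rather than a fast implementation.

```python
import numpy as np

def broadcast_dist(A, B, kernel_inner, kernel_outer):
    # A[:, None, :] has shape (n, 1, d); B[None, :, :] has shape (1, m, d).
    # Broadcasting produces all (n, m, d) per-coordinate terms at once.
    terms = kernel_inner(A[:, None, :], B[None, :, :])
    # Sum over the coordinate axis, then apply the final transform.
    return kernel_outer(terms.sum(axis=-1))

rng = np.random.default_rng(0)
A = rng.random((5, 3))
B = rng.random((4, 3))

# Manhattan distance: absolute differences, no final transform
manhattan = broadcast_dist(A, B, lambda a, b: np.abs(a - b), lambda acc: acc)
# Squared Euclidean distance: squared differences, no final transform
sq_euclid = broadcast_dist(A, B, lambda a, b: (a - b) ** 2, lambda acc: acc)

print(manhattan.shape)  # (5, 4)
```

The same pairs of lambdas would presumably also be accepted by gen_cust_dist_func, but that has not been timed or verified here.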