
Indexed Lookup On Pandas Dataframe. Why So Slow? How To Speed Up?

Suppose I have a pandas Series that I'd like to function as a multimap (multiple values for each index key):

# intval -> data1
a = pd.Series(data=-np.arange(100000),

Solution 1:

Repeated index values slow down label-based lookups. You can amend your inputs to compare a non-unique index against a unique one and see this for yourself:

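# Non-unique index: 100000 labels drawn from only 50000 possible values
# (`common` is the set of lookup keys from the question)
a = pd.Series(data=-np.arange(100000), index=np.random.randint(0, 50000, 100000))
%timeit a.loc[common]  # 34.1 ms

# Unique index: every label appears exactly once
a = pd.Series(data=-np.arange(100000), index=np.arange(100000))
%timeit a.loc[common]  # 6.86 ms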

As mentioned in this related question:

When the index is unique, pandas uses a hash table to map keys to values, which is O(1). When the index is non-unique and sorted, pandas uses binary search, which is O(log N). When the index is non-unique and in random order, pandas has to check every key in the index, which is O(N).
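
If the duplicates are part of your data model (as in a multimap), one thing you could try is sorting the index once up front so that lookups fall into the sorted case described above. The snippet below is only a sketch under a few assumptions: `keys` is a hypothetical stand-in for the lookup keys in the question, and the seeded generator is there purely to make the example repeatable. How much faster it is will depend on your data and pandas version, so time it yourself.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only for repeatability (assumption)
a = pd.Series(data=-np.arange(100000), index=rng.integers(0, 50000, 100000))

# Check which lookup path the index qualifies for.
print(a.index.is_unique)                # False: 100000 labels, 50000 possible values
print(a.index.is_monotonic_increasing)  # expected to be False for random labels

# Sort once; the index stays non-unique but is now monotonic.
a_sorted = a.sort_index()
print(a_sorted.index.is_monotonic_increasing)

# `keys` is a hypothetical stand-in for the lookup keys in the question.
keys = a.index.unique()[:1000].tolist()

# Both lookups return every value stored under each key.
# The order of values within a key may differ after sorting.
unsorted_result = a.loc[keys]
sorted_result = a_sorted.loc[keys]

In IPython you can then compare `%timeit a.loc[keys]` against `%timeit a_sorted.loc[keys]` to see whether sorting helps for your access pattern.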
