Indexed Lookup On Pandas Dataframe. Why So Slow? How To Speed Up?

July 16, 2023

Suppose I have a pandas Series that I'd like to function as a multimap (multiple values for each index key):

# intval -> data1
a = pd.Series(data=-np.arange(100000),

Solution 1:

Repeated index values slow down label-based lookups such as .loc on a Series or DataFrame. You can change your inputs to see this for yourself:

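import numpy as np
import pandas as pd

# Non-unique index (labels repeat); `common` holds the lookup keys from the question.
a = pd.Series(data=-np.arange(100000), index=np.random.randint(0, 50000, 100000))
%timeit a.loc[common]  # 34.1 ms

# Unique index: every label appears exactly once.
a = pd.Series(data=-np.arange(100000), index=np.arange(100000))
%timeit a.loc[common]  # 6.86 ms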

As mentioned in this related question:

When the index is unique, pandas uses a hash table to map keys to values, O(1). When the index is non-unique and sorted, pandas uses binary search, O(log N). When the index is in random order, pandas needs to check all the keys in the index, O(N).
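If the quoted explanation holds, one thing to try is sorting the index so that lookups on a non-unique index can use the sorted path. The following is a minimal sketch, not a benchmark: the seed, the `keys` array, and the variable names are assumptions made for illustration, and the timings will vary with machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability

# Non-unique index in random order, same shape as the example above.
a = pd.Series(data=-np.arange(100_000),
              index=rng.integers(0, 50_000, 100_000))

# Hypothetical lookup keys: 1,000 labels known to exist in the index.
keys = a.index.unique()[:1000]

# Sorting keeps the duplicate labels but makes the index monotonic.
a_sorted = a.sort_index()

# Inspect the two properties the quoted explanation depends on.
print("unsorted:", a.index.is_unique, a.index.is_monotonic_increasing)
print("sorted:  ", a_sorted.index.is_unique, a_sorted.index.is_monotonic_increasing)

# Both lookups return the same values (order within a key may differ).
res_unsorted = a.loc[keys]
res_sorted = a_sorted.loc[keys]
assert np.array_equal(np.sort(res_unsorted.to_numpy()),
                      np.sort(res_sorted.to_numpy()))

# Rough timing comparison; treat the numbers as indicative only.
t_unsorted = timeit.timeit(lambda: a.loc[keys], number=5)
t_sorted = timeit.timeit(lambda: a_sorted.loc[keys], number=5)
print(f"unsorted: {t_unsorted:.4f}s  sorted: {t_sorted:.4f}s")
```

Checking `is_unique` and `is_monotonic_increasing` before timing anything tells you which of the three cases your index falls into. Sorting is a one-time cost, so it tends to pay off only when the same Series is queried many times.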
