Remove Outliers (+/- 3 Std) and Replace with np.nan in Python/pandas
I have seen several solutions that come close to solving my problem (link1, link2), but they have not helped me succeed so far. I believe that the following solution is what I need,
Solution 1:
If I have understood you correctly, there is no need to iterate over the columns. This solution replaces every value that deviates from its group mean by more than three group standard deviations with NaN.
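import numpy as np

def replace(group, stds):
    # Return a masked copy instead of mutating the data pandas passes in:
    # values more than `stds` group standard deviations from the group mean become NaN
    return group.mask(np.abs(group - group.mean()) > stds * group.std())

# df is your DataFrame; group_column is the name of the column to group by
df.loc[:, df.columns != group_column] = df.groupby(group_column).transform(lambda g: replace(g, 3))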
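For anyone who wants to try the idea on toy data, here is a minimal, self-contained sketch of the same per-group rule. The function name mask_outliers, the column names "g" and "x", and the sample values are all made up for illustration, and the sketch assumes every column other than the grouping column is numeric. It computes the group mean and standard deviation with transform, so no lambda is needed, and it returns a new DataFrame rather than modifying the input.

```python
import numpy as np
import pandas as pd


def mask_outliers(df, group_column, stds=3.0):
    """Return a copy of df where per-group outliers are replaced by NaN.

    Hypothetical helper: assumes all columns except group_column are numeric.
    """
    value_columns = [c for c in df.columns if c != group_column]
    grouped = df.groupby(group_column)[value_columns]

    # Broadcast each group's mean and sample standard deviation back to the rows
    mean = grouped.transform("mean")
    std = grouped.transform("std")

    values = df[value_columns]
    out = df.copy()
    # mask() replaces entries with NaN wherever the condition is True
    out[value_columns] = values.mask((values - mean).abs() > stds * std)
    return out


# Made-up data: group "a" has eleven 1.0 values and one extreme value
df = pd.DataFrame({
    "g": ["a"] * 12 + ["b"] * 3,
    "x": [1.0] * 11 + [100.0] + [1.0, 2.0, 3.0],
})
cleaned = mask_outliers(df, "g")
print(cleaned)
```

Keep in mind that with the sample standard deviation, a single extreme value can only exceed three standard deviations when its group has at least 11 rows, so very small groups will never have anything flagged. A group with a single row has an undefined standard deviation and is left unchanged.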