
Python Pandas Dataframe Fill Nan With Other Series

I want to fill the NaN values in a column (var4) of a DataFrame (df) from a control table (fillna_mean), taking the fill value from its mean column and using var1 as the index. In the DataFrame I want them to match on var

Solution 1:

You can use boolean indexing together with the .map() method (the control table is called fillna in the code below):

In [178]: fillna.set_index('var1', inplace=True)

In [179]: df.loc[df.var4.isnull(), 'var4'] = df.loc[df.var4.isnull(), 'var1'].map(fillna['mean'])

In [180]: df
Out[180]:
   var1  var2  var3  var4
0     a     0    40   1.0
1     a     1    97   2.0
2     a     2    34   1.0
3     b     3     6   3.0
4     b     4    19   2.0
5     c     5    47   6.5
6     c     6    65   1.0
7     c     7    29  34.0
8     c     8    48   6.5
9     d     9    88  10.0
10    d    10    40  12.0
11    d    11    23  12.0

Explanation:

In [184]: df.loc[df.var4.isnull()]
Out[184]:
  var1  var2  var3  var4
2    a     2    75   NaN
5    c     5    75   NaN
8    c     8    44   NaN
9    d     9    34   NaN

In [185]: df.loc[df.var4.isnull(), 'var1']
Out[185]:
2    a
5    c
8    c
9    d
Name: var1, dtype: object

In [186]: df.loc[df.var4.isnull(), 'var1'].map(fillna['mean'])
Out[186]:
2     1.0
5     6.5
8     6.5
9    10.0
Name: var1, dtype: float64
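The steps above can be reproduced end to end with a minimal sketch like the one below. The original input data is not shown on this page, so the sample values and the control table are assumptions reconstructed from the output in this answer (the value for group b in particular is made up), and the name means is hypothetical.

```python
import numpy as np
import pandas as pd

# Sample data (assumed): var4 is missing in rows 2, 5, 8 and 9.
df = pd.DataFrame({
    "var1": list("aaabbccccddd"),
    "var2": range(12),
    "var3": [40, 97, 34, 6, 19, 47, 65, 29, 48, 88, 40, 23],
    "var4": [1.0, 2.0, np.nan, 3.0, 2.0, np.nan,
             1.0, 34.0, np.nan, np.nan, 12.0, 12.0],
})

# Control table (assumed values): one fill value per var1 group.
fillna = pd.DataFrame({"var1": ["a", "b", "c", "d"],
                       "mean": [1.0, 2.5, 6.5, 10.0]})

# Index the control table by var1 so that its 'mean' column
# can be used as a lookup Series keyed by group label.
means = fillna.set_index("var1")["mean"]

# Select only the rows where var4 is missing, look up each row's
# var1 label in the control table, and write the values back.
mask = df["var4"].isnull()
df.loc[mask, "var4"] = df.loc[mask, "var1"].map(means)
```

Computing the mask once and reusing it on both sides of the assignment keeps the row selection identical for the lookup and the write-back.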

UPDATE: the .ix indexer has been deprecated since pandas 0.20 (and was removed in pandas 1.0) in favor of the stricter .loc and .iloc indexers, which is why this answer uses .loc.

Solution 2:

You can get the same result with combine_first, which may be faster and saves you from filtering out the non-null rows yourself:

fillna.set_index('var1', inplace=True)

df.var4 = df.var4.combine_first(df.var1.map(fillna['mean']))
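As a rough, self-contained illustration of how combine_first behaves here: it keeps every non-null value of var4 and takes the mapped value only where var4 is NaN. The small data set and the names means, candidates, filled and same are assumptions made up for this sketch. The last comparison uses Series.fillna with a Series argument, which is not part of the answer above but should give the same result.

```python
import numpy as np
import pandas as pd

# Tiny made-up example: var4 is missing in rows 1, 2 and 4.
df = pd.DataFrame({
    "var1": ["a", "a", "c", "c", "d"],
    "var4": [1.0, np.nan, np.nan, 34.0, np.nan],
})

# Lookup Series keyed by var1 (hypothetical fill values).
means = pd.Series({"a": 1.0, "c": 6.5, "d": 10.0}, name="mean")

# One candidate fill value per row, aligned on df's index.
candidates = df["var1"].map(means)

# Existing var4 values win; candidates only fill the gaps.
filled = df["var4"].combine_first(candidates)

# Assumed-equivalent alternative: fillna also accepts an aligned Series.
same = df["var4"].fillna(candidates)

df["var4"] = filled
```

Because the mapped Series is computed for every row, no boolean mask is needed; the cost is a lookup for rows that already have a value.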
