
Pandas: Turn Multiple Variables Into A Single Set Of Dummy Variables

I have a column with categories (A, B, C, D) that I want to turn into dummy variables. The problem is that this column can contain multiple categories per row, like this: DF = pd.DataFrame({'Co

Solution 1:

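The simplest way is Series.str.get_dummies, which splits each string on the separator and creates one 0/1 column per category: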

DF.Col.str.get_dummies(', ')

   A  B  C  D
0  1  0  0  0
1  1  1  0  0
2  1  0  1  0
3  0  1  1  1
4  0  0  0  1
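The question's DataFrame is cut off, so here is a self-contained version of the one-liner to try. It assumes sample data reconstructed from the output table above, with ", " as the separator. Joining the indicators back onto the original frame is an optional extra step, not part of the answer itself.

```python
import pandas as pd

# Assumed sample data: reconstructed from the output table, using ", " as separator
DF = pd.DataFrame({'Col': ['A', 'A, B', 'A, C', 'B, C, D', 'D']})

# Split each string on ", " and build one 0/1 indicator column per category
dummies = DF.Col.str.get_dummies(sep=', ')

# Optional: attach the indicators to the original frame (rows align on the index)
result = DF.join(dummies)
print(result)
```

The separator passed to get_dummies must match the data exactly; if the values look like "A,B" instead of "A, B", use sep=','.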

A slightly more complicated alternative uses scikit-learn's MultiLabelBinarizer:

import numpy as np
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
s = DF.Col.values.astype(str)
# np.char.split splits every string in the array on ', ' into a list of labels
d = mlb.fit_transform(np.char.split(s, ', '))

pd.DataFrame(d, columns=mlb.classes_)

   A  B  C  D
0  1  0  0  0
1  1  1  0  0
2  1  0  1  0
3  0  1  1  1
4  0  0  0  1
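To see what MultiLabelBinarizer is doing here, the following is a rough pure NumPy/pandas sketch of the indicator matrix it builds: the columns are the sorted set of all labels (like mlb.classes_), and cell (i, j) is 1 when row i contains label j. The helper binarize_labels is hypothetical and not part of scikit-learn, and the sample data is again an assumption reconstructed from the output above.

```python
import numpy as np
import pandas as pd

def binarize_labels(label_lists):
    # Hypothetical helper: approximates MultiLabelBinarizer.fit_transform.
    # Columns are the sorted set of every label seen, like mlb.classes_.
    classes = sorted({label for labels in label_lists for label in labels})
    position = {label: j for j, label in enumerate(classes)}
    matrix = np.zeros((len(label_lists), len(classes)), dtype=int)
    for i, labels in enumerate(label_lists):
        for label in labels:
            matrix[i, position[label]] = 1  # row i contains this label
    return matrix, classes

# Assumed sample data, reconstructed from the output table
DF = pd.DataFrame({'Col': ['A', 'A, B', 'A, C', 'B, C, D', 'D']})
matrix, classes = binarize_labels(DF.Col.str.split(', ').tolist())
print(pd.DataFrame(matrix, columns=classes, index=DF.index))
```

The real MultiLabelBinarizer adds conveniences on top of this, such as a fixed classes list and inverse_transform, so it is still the better choice inside a scikit-learn pipeline.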

Solution 2:

Using pd.crosstab:

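import pandas as pd
df = pd.DataFrame({'Col': ['A', 'A,B', 'A,C', 'B,C,D', 'D']})
# Split each string into a list of labels
df['Col'] = df.Col.str.split(',')
# Expand the lists into columns, then stack into one long Series keyed by (row, position)
df1 = df.Col.apply(pd.Series).stack()
# Cross-tabulate the original row number (index level 0) against each label
pd.crosstab(df1.index.get_level_values(0), df1)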

Out[893]: 
col_0  A  B  C  D
row_0            
0      1  0  0  0
1      1  1  0  0
2      1  0  1  0
3      0  1  1  1
4      0  0  0  1
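On pandas 0.25 or newer, Series.explode can replace the apply(pd.Series).stack() step. This is a variant of the crosstab approach, not the answer's original code: explode puts each label on its own row and repeats the source row's index label, so that index can be cross-tabulated against the labels directly. Passing plain NumPy arrays keeps crosstab from trying to align on the duplicated index.

```python
import pandas as pd

df = pd.DataFrame({'Col': ['A', 'A,B', 'A,C', 'B,C,D', 'D']})

# One row per label; the index still records which original row it came from
labels = df.Col.str.split(',').explode()

# Count (row, label) pairs: 1 where the row contains the label, 0 otherwise
table = pd.crosstab(labels.index.to_numpy(), labels.to_numpy())
print(table)
```

Because crosstab counts occurrences, a row such as 'A,A' would produce a 2; table.clip(upper=1) turns the counts back into strict 0/1 dummies if that can happen in your data.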
