How Can I Summarize Several Pandas Dataframe Columns Into A Parent Column Name?
I've a dataframe which looks like this some feature another feature label sample 0 ... ... ... and I'd like to get a dataframe with multiind
Solution 1:
From the API it's not clear to me how to use from_arrays(), from_product(), from_tuples() or from_frame() correctly.
These constructors are mainly useful when you want to build a MultiIndex that is independent of the original column names.
So they are the right tool if you need a completely new MultiIndex, e.g. from lists or arrays:
a = ['a','a','b']
b = ['x','y','z']
df.columns = pd.MultiIndex.from_arrays([a,b])
print(df)
        a     b
        x  y  z
sample
0       2  3  5
1       4  5  7
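To make the difference between the four constructors from the question concrete, here is a minimal sketch that builds the same two-level columns in several ways. The sample frame is an assumption (its values mirror the output above, and the column names 'feature a', 'feature b', 'label' are borrowed from the EDIT1 example); the level names "top" and "sub" are hypothetical. from_frame() needs pandas 0.24 or later.

```python
import pandas as pd

# Sample frame; values mirror the printed output, column names are assumed.
df = pd.DataFrame(
    {"feature a": [2, 4], "feature b": [3, 5], "label": [5, 7]},
    index=pd.Index([0, 1], name="sample"),
)

# from_arrays: one list per level, each as long as the number of columns.
by_arrays = pd.MultiIndex.from_arrays([["a", "a", "b"], ["x", "y", "z"]])

# from_tuples: one (level0, level1) tuple per column.
by_tuples = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y"), ("b", "z")])

# from_frame: one row per column, one DataFrame column per level.
# "top" and "sub" are hypothetical level names.
by_frame = pd.MultiIndex.from_frame(
    pd.DataFrame({"top": ["a", "a", "b"], "sub": ["x", "y", "z"]})
)

# from_product: the full Cartesian product (2 x 2 = 4 entries here),
# so it only fits a frame whose columns form a complete grid.
by_product = pd.MultiIndex.from_product([["a", "b"], ["x", "y"]])

df.columns = by_arrays
print(df)
```

The first three calls produce the same labels; from_product() is the odd one out because it always generates every combination, which is why it cannot describe the uneven a/a/b layout used here.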
EDIT1: If you want to move all columns into a MultiIndex, with every column except the last one under the same parent:
a = ['parent'] * (len(df.columns) - 1) + ['label']
b = df.columns[:-1].tolist() + ['val']
df.columns = pd.MultiIndex.from_arrays([a,b])
print(df)
          parent           label
       feature a feature b   val
sample
0              2         3     5
1              4         5     7
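As a self-contained version of the EDIT1 approach, the sketch below also shows what the parent level buys you: selecting by the top-level name returns all child columns at once. The sample frame and the variable names top, bottom and features are assumptions made for illustration.

```python
import pandas as pd

# Assumed sample data matching the printed output above.
df = pd.DataFrame(
    {"feature a": [2, 4], "feature b": [3, 5], "label": [5, 7]},
    index=pd.Index([0, 1], name="sample"),
)

# Top level: "parent" for every column except the last, which gets "label".
top = ["parent"] * (len(df.columns) - 1) + ["label"]
# Bottom level: the original names, with "val" standing in for the last column.
bottom = df.columns[:-1].tolist() + ["val"]
df.columns = pd.MultiIndex.from_arrays([top, bottom])

# Selecting the parent returns a plain frame with only the child columns.
features = df["parent"]
print(features.columns.tolist())      # ['feature a', 'feature b']
print(df[("label", "val")].tolist())  # [5, 7]
```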
It is also possible with str.split, but any column without the separator gets NaN in the second level, because MultiIndex and non-MultiIndex columns cannot be combined (actually they can, but then the MultiIndex columns become tuples):
print(df)
        feature_a  feature_b  label
sample
0               2          3      5
1               4          5      7
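# split on the underscore; the default separator is whitespace, which would not split these names
df.columns = df.columns.str.split('_', expand=True)
print(df)
       feature    label
             a  b   NaN
sample
0            2  3     5
1            4  5     7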
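The NaN in the second level can be checked directly on the column index. A minimal sketch, assuming the sample frame shown above and assuming the separator is the underscore that appears in the column names:

```python
import pandas as pd

# Assumed sample data matching the printed frame above.
df = pd.DataFrame(
    {"feature_a": [2, 4], "feature_b": [3, 5], "label": [5, 7]},
    index=pd.Index([0, 1], name="sample"),
)

# expand=True turns the split pieces into MultiIndex levels.
split_cols = df.columns.str.split("_", expand=True)

# First level: the text before the separator.
print(split_cols.get_level_values(0).tolist())

# Second level: the text after it; "label" has no separator, so it is missing.
second = split_cols.get_level_values(1)
print(second.isna().tolist())
```

The last line reports which columns ended up without a second-level label, which is a quick way to spot the columns that need to be moved to the index before splitting.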
So it is better to first move every column without the separator into the Index/MultiIndex with DataFrame.set_index:
df = df.set_index('label')
# split on the underscore; the default separator is whitespace
df.columns = df.columns.str.split('_', expand=True)
print(df)
      feature
            a  b
label
5           2  3
7           4  5
To keep the original index as well, pass the append=True parameter:
df = df.set_index('label', append=True)
# split on the underscore; the default separator is whitespace
df.columns = df.columns.str.split('_', expand=True)
print(df)
             feature
                   a  b
sample label
0      5           2  3
1      7           4  5
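Put together, the set_index plus split approach can be run end to end as follows. This is a sketch under the same assumptions as before: the sample frame is reconstructed from the printed output, the separator is taken to be the underscore, and out is a hypothetical variable name.

```python
import pandas as pd

# Assumed sample data matching the printed frame above.
df = pd.DataFrame(
    {"feature_a": [2, 4], "feature_b": [3, 5], "label": [5, 7]},
    index=pd.Index([0, 1], name="sample"),
)

# Move the separator-free column into the index, keeping "sample" as well.
out = df.set_index("label", append=True)

# Every remaining column contains the separator, so no NaN level appears.
out.columns = out.columns.str.split("_", expand=True)

print(list(out.index.names))   # ['sample', 'label']
print(out.columns.tolist())    # [('feature', 'a'), ('feature', 'b')]
print(out[("feature", "a")].tolist())  # [2, 4]
```

Working on a copy (out) instead of reassigning df keeps the original frame available, so the snippet can be rerun without hitting a KeyError for a 'label' column that has already been moved to the index.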