
Read Dataframe Split By Nan Rows And Reshape Them Into Multiple Dataframes In Python

I have an example Excel file, data1.xlsx, from here, which has a Sheet1 as follows: Now I want to read it with openpyxl or pandas, then convert it into two new DataFrames, df1 and df2, I will fina

Solution 1:

Use:

import pandas as pd

# header=None keeps every sheet row as data and assigns default integer column names
df = pd.read_excel('./data1.xlsx', sheet_name='Sheet1', header=None)

# use the second row (year, 2018, 2019, 2020, sum) as the column names
df.columns = df.iloc[1].rename(None)

# create a new column `city`: keep first-column values only on rows where the
# second column is missing (the city rows), then forward fill them
df.insert(0, 'city', df.iloc[:, 0].mask(df.iloc[:, 1].notna()).ffill())
# convert float column labels (2018.0, ...) to integers
df.columns = [int(x) if isinstance(x, float) else x for x in df.columns]
# move the `year` column into the index
df = df.set_index('year')

print(df)
         city    2018    2019    2020  sum
year
bj         bj     NaN     NaN     NaN  NaN
year       bj  2018.0  2019.0  2020.0  sum
price      bj    12.0     4.0     5.0   21
quantity   bj     5.0     5.0     3.0   13
NaN        bj     NaN     NaN     NaN  NaN
sh         sh     NaN     NaN     NaN  NaN
year       sh  2018.0  2019.0  2020.0  sum
price      sh     5.0     6.0     7.0   18
quantity   sh     7.0     5.0     4.0   16
NaN        sh     NaN     NaN     NaN  NaN
NaN        sh     NaN     NaN     NaN  NaN
gz         gz     NaN     NaN     NaN  NaN
year       gz  2018.0  2019.0  2020.0  sum
price      gz     2.0     3.0     1.0    6
quantity   gz     6.0     9.0     3.0   18
NaN        gz     NaN     NaN     NaN  NaN
NaN        gz     NaN     NaN     NaN  NaN
sz         sz     NaN     NaN     NaN  NaN
year       sz  2018.0  2019.0  2020.0  sum
price      sz     8.0     2.0     3.0   13
quantity   sz     5.0     4.0     3.0   12
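
The mask-then-forward-fill step that builds the `city` column may be easier to follow on a small in-memory frame. The sketch below is only an illustration: the `raw` frame is a hypothetical stand-in for Sheet1 with just two city blocks, and the names `first`, `second`, `labels_only` and `city` are made up for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for Sheet1: two city blocks separated by a blank row.
raw = pd.DataFrame([
    ["bj", np.nan, np.nan, np.nan, np.nan],
    ["year", 2018, 2019, 2020, "sum"],
    ["price", 12, 4, 5, 21],
    ["quantity", 5, 5, 3, 13],
    [np.nan, np.nan, np.nan, np.nan, np.nan],
    ["sh", np.nan, np.nan, np.nan, np.nan],
    ["year", 2018, 2019, 2020, "sum"],
    ["price", 5, 6, 7, 18],
    ["quantity", 7, 5, 4, 16],
])

first, second = raw.iloc[:, 0], raw.iloc[:, 1]

# Blank out first-column values on rows that carry data in the second column,
# so only the city names (and NaN on blank rows) remain.
labels_only = first.mask(second.notna())

# Forward fill so every row inherits the most recent city name.
city = labels_only.ffill()

print(city.tolist())
```

Because the city rows are the only ones with text in the first column and nothing in the second, masking on `second.notna()` isolates them without hard-coding any city names.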

df1 = df.loc['price'].reset_index(drop=True)
print(df1)
  city  2018  2019  2020 sum
0   bj  12.0   4.0   5.0  21
1   sh   5.0   6.0   7.0  18
2   gz   2.0   3.0   1.0   6
3   sz   8.0   2.0   3.0  13

df2 = df.loc['quantity'].reset_index(drop=True)
print(df2)
  city  2018  2019  2020 sum
0   bj   5.0   5.0   3.0  13
1   sh   7.0   5.0   4.0  16
2   gz   6.0   9.0   3.0  18
3   sz   5.0   4.0   3.0  12
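
If the sheet had more row labels than `price` and `quantity`, the same selection could be done in a loop instead of one variable per label. This is a minimal sketch under assumptions: the small `df` below is a hypothetical frame in the same shape as the one produced by `set_index('year')`, and `wanted` and `frames` are names invented for the example.

```python
import pandas as pd

# Hypothetical frame shaped like `df` after set_index('year'):
# row labels in the index, a `city` column, then the value columns.
df = pd.DataFrame(
    {
        "city": ["bj", "bj", "sh", "sh"],
        2018: [12.0, 5.0, 5.0, 7.0],
        2019: [4.0, 5.0, 6.0, 5.0],
        2020: [5.0, 3.0, 7.0, 4.0],
        "sum": [21, 13, 18, 16],
    },
    index=pd.Index(["price", "quantity", "price", "quantity"], name="year"),
)

wanted = ["price", "quantity"]

# One DataFrame per row label, keyed by that label.
# df.loc[[label]] (list indexer) returns a DataFrame even if the label occurs once.
frames = {label: df.loc[[label]].reset_index(drop=True) for label in wanted}

print(frames["price"])
```

Here `frames["price"]` and `frames["quantity"]` play the roles of `df1` and `df2`.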

Finally, the DataFrames can be written to the existing file by passing mode='a' (append) to pd.ExcelWriter, which requires the openpyxl engine:

with pd.ExcelWriter('data1.xlsx', mode='a') as writer:  
    df1.to_excel(writer, sheet_name='price')
    df2.to_excel(writer, sheet_name='quantity')
