Read Dataframe Split By Nan Rows And Reshape Them Into Multiple Dataframes In Python
I have an example Excel file, data1.xlsx, from here, which has a Sheet1 as follows. Now I want to read it with openpyxl or pandas, then convert it into new DataFrames df1 and df2, I will fina
Solution 1:
Use:
import pandas as pd

# header=None keeps the default integer column names
df = pd.read_excel('./data1.xlsx', sheet_name='Sheet1', header=None)
# use the second row as the column names
df.columns = df.iloc[1].rename(None)
# create a new `city` column: keep first-column values only where the second column is missing, then forward fill
df.insert(0, 'city', df.iloc[:, 0].mask(df.iloc[:, 1].notna()).ffill())
# convert float column names to integers
df.columns = [int(x) if isinstance(x, float) else x for x in df.columns]
# set the `year` column as the index
df = df.set_index('year')
print(df)
         city    2018    2019    2020  sum
year
bj         bj     NaN     NaN     NaN  NaN
year       bj  2018.0  2019.0  2020.0  sum
price      bj    12.0     4.0     5.0   21
quantity   bj     5.0     5.0     3.0   13
NaN        bj     NaN     NaN     NaN  NaN
sh         sh     NaN     NaN     NaN  NaN
year       sh  2018.0  2019.0  2020.0  sum
price      sh     5.0     6.0     7.0   18
quantity   sh     7.0     5.0     4.0   16
NaN        sh     NaN     NaN     NaN  NaN
NaN        sh     NaN     NaN     NaN  NaN
gz         gz     NaN     NaN     NaN  NaN
year       gz  2018.0  2019.0  2020.0  sum
price      gz     2.0     3.0     1.0    6
quantity   gz     6.0     9.0     3.0   18
NaN        gz     NaN     NaN     NaN  NaN
NaN        gz     NaN     NaN     NaN  NaN
sz         sz     NaN     NaN     NaN  NaN
year       sz  2018.0  2019.0  2020.0  sum
price      sz     8.0     2.0     3.0   13
quantity   sz     5.0     4.0     3.0   12
df1 = df.loc['price'].reset_index(drop=True)
print(df1)
  city  2018  2019  2020 sum
0   bj  12.0   4.0   5.0  21
1   sh   5.0   6.0   7.0  18
2   gz   2.0   3.0   1.0   6
3   sz   8.0   2.0   3.0  13
df2 = df.loc['quantity'].reset_index(drop=True)
print(df2)
  city  2018  2019  2020 sum
0   bj   5.0   5.0   3.0  13
1   sh   7.0   5.0   4.0  16
2   gz   6.0   9.0   3.0  18
3   sz   5.0   4.0   3.0  12
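To try the steps above without the Excel file, here is a minimal sketch that runs them on a small in-memory frame. The names `raw` and `split_by_label` are hypothetical, and the two-city frame is only a cut-down stand-in for what `pd.read_excel(..., header=None)` would return for this sheet. It selects rows with a boolean mask instead of `df.loc['price']`, which is a small swap so that a label matching a single row still yields a DataFrame.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel(..., header=None): two city blocks
# separated by a blank (all-NaN) row, with default integer column labels.
raw = pd.DataFrame(
    [
        ["bj", np.nan, np.nan, np.nan, np.nan],
        ["year", 2018, 2019, 2020, "sum"],
        ["price", 12, 4, 5, 21],
        ["quantity", 5, 5, 3, 13],
        [np.nan, np.nan, np.nan, np.nan, np.nan],
        ["sh", np.nan, np.nan, np.nan, np.nan],
        ["year", 2018, 2019, 2020, "sum"],
        ["price", 5, 6, 7, 18],
        ["quantity", 7, 5, 4, 16],
    ]
)


def split_by_label(raw, labels=("price", "quantity")):
    """Return a dict mapping each row label to its own DataFrame."""
    df = raw.copy()
    # Use the second row (the first "year" row) as the column names.
    df.columns = df.iloc[1].rename(None)
    # A city name sits in the first column of rows whose second column is empty;
    # blank out everything else, then forward fill the city downwards.
    city = df.iloc[:, 0].mask(df.iloc[:, 1].notna()).ffill()
    df.insert(0, "city", city)
    # Year labels arrive as floats (2018.0); turn them into plain integers.
    df.columns = [int(c) if isinstance(c, float) else c for c in df.columns]
    df = df.set_index("year")
    # A boolean mask always returns a DataFrame, even for a single matching row.
    return {
        label: df[df.index == label].reset_index(drop=True) for label in labels
    }


frames = split_by_label(raw)
df1, df2 = frames["price"], frames["quantity"]
print(df1)
print(df2)
```

Returning a dict keyed by label keeps the sketch easy to extend if the sheet has more row types than `price` and `quantity`.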
Finally, the DataFrames can be written to the existing file with the mode='a' parameter:
# append mode is supported by the openpyxl engine, so it is set explicitly here
with pd.ExcelWriter('data1.xlsx', engine='openpyxl', mode='a') as writer:
    df1.to_excel(writer, sheet_name='price')
    df2.to_excel(writer, sheet_name='quantity')
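One thing to watch with append mode is that running it a second time fails by default, because the sheets already exist. The sketch below is one possible way around that, assuming pandas 1.3 or later (for `if_sheet_exists`) and an installed openpyxl. The helper `append_sheets`, the throwaway `demo.xlsx` workbook, and the small `df1`/`df2` frames are hypothetical stand-ins, not part of the original answer.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-ins for df1 and df2 from the steps above.
df1 = pd.DataFrame({"city": ["bj", "sh"], "sum": [21, 18]})
df2 = pd.DataFrame({"city": ["bj", "sh"], "sum": [13, 16]})

# Work on a throwaway workbook so the real data1.xlsx is not touched.
path = os.path.join(tempfile.mkdtemp(), "demo.xlsx")
pd.DataFrame({"a": [1]}).to_excel(
    path, sheet_name="Sheet1", index=False, engine="openpyxl"
)


def append_sheets(path, sheets):
    """Append each DataFrame in `sheets` (name -> frame) to an existing workbook."""
    # if_sheet_exists="replace" overwrites a sheet of the same name
    # instead of raising an error on a repeated run.
    with pd.ExcelWriter(
        path, engine="openpyxl", mode="a", if_sheet_exists="replace"
    ) as writer:
        for name, frame in sheets.items():
            # index=False skips the 0..n row index in the output sheet.
            frame.to_excel(writer, sheet_name=name, index=False)


append_sheets(path, {"price": df1, "quantity": df2})
append_sheets(path, {"price": df1, "quantity": df2})  # second run does not raise

with pd.ExcelFile(path, engine="openpyxl") as xls:
    sheet_names = xls.sheet_names
print(sheet_names)
```

Whether to replace an existing sheet or fail loudly is a judgment call; `if_sheet_exists` also accepts `"error"` and `"new"` if overwriting is not wanted.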