Wrangling A Data Frame In Pandas (python)
I have the following data in a csv file:
from StringIO import StringIO
import pandas as pd
the_data = '''
ABC,2016-6-9 0:00,95,{'//Purple': [115L], '//Yellow': [403L], '//Blue': [
Solution 1:
Consider converting the dictionary column values to Python dictionaries with ast.literal_eval(),
then casting each one to its own dataframe and merging the combined result with the original dataframe:
from io import StringIO
import pandas as pd
import ast
...
df = pd.read_csv(StringIO(the_data), header=None,
                 names=['Company', 'Date', 'Value', 'Dicts'])

dfList = []
for i in df['Dicts'].tolist():
    # strip the Python 2 long suffix so literal_eval can parse the values
    result = ast.literal_eval(i.replace('L]', ']'))
    # drop the leading '//' from each key
    result = {k.replace('//', ''): v for k, v in result.items()}
    temp = pd.DataFrame(result)
    dfList.append(temp)

# stack the one-row dataframes, then join them to the original by row position
dictdf = pd.concat(dfList).reset_index(drop=True)
df = pd.merge(df, dictdf, left_index=True, right_index=True).drop(['Dicts'], axis=1)
print(df)
#   Company            Date  Value  Black    Blue  NPO-Green  Pink  Purple  White-XYZ  Yellow
# 0     ABC   2016-6-9 0:00     95    NaN    16.0        NaN   NaN     115        0.0   403.0
# 1     ABC  2016-6-10 0:00      0    NaN    90.0        NaN   NaN     219        0.0   381.0
# 2     ABC  2016-6-11 0:00      0    NaN    31.0        NaN   NaN     817        0.0    21.0
# 3     ABC  2016-6-12 0:00      0    NaN  8888.0        NaN   NaN      80        0.0  2011.0
# 4     ABC  2016-6-13 0:00      0    NaN     4.0        NaN   NaN      32        0.0    15.0
# 5     DEF  2016-6-16 0:00      0   15.0     NaN        3.0   4.0      32        NaN     NaN
# 6     DEF  2016-6-17 0:00      0   15.0     NaN        0.0   4.0      32        NaN     NaN
# 7     DEF  2016-6-18 0:00      0   15.0     NaN        7.0   4.0      32        NaN     NaN
# 8     DEF  2016-6-19 0:00      0   15.0     NaN       14.0   4.0      32        NaN     NaN
# 9     DEF  2016-6-20 0:00      0   15.0     NaN       21.0   4.0      32        NaN     NaN
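To make the per-cell step in Solution 1 easier to follow, here is a minimal sketch that parses one dictionary string on its own. The `cell` value is a hypothetical sample, shortened to the two entries visible in the question; the real rows carry more keys. It assumes Python 3, where the `115L` long-integer literal is a syntax error, which is why the suffix is stripped before calling ast.literal_eval().

```python
import ast

import pandas as pd

# Hypothetical single cell, shortened to the two entries visible in the question.
cell = "{'//Purple': [115L], '//Yellow': [403L]}"

# Python 3 cannot parse the Python 2 long suffix, so remove it first.
cleaned = cell.replace('L]', ']')
parsed = ast.literal_eval(cleaned)

# Drop the leading '//' from each key.
parsed = {k.replace('//', ''): v for k, v in parsed.items()}

# Every value is a one-element list, so this builds a one-row dataframe.
row = pd.DataFrame(parsed)
print(row)
```

Because each row becomes its own one-row dataframe, keys missing from a given row simply show up as NaN once the pieces are concatenated.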
Solution 2:
I really don't think pandas can do much for you here. Your data is very obtuse and seems to me to be best dealt with using regular expressions. Here's my solution:
import re
import pandas as pd

static_cols = []
dynamic_cols = []
for line in the_data.splitlines():
    if line == '':
        continue
    # deal with static columns
    x = line.split(',')
    company, date, other = x[0:3]
    keys = ['Company', 'Date', 'Other']
    values = [company, date, other]
    d = {i: j for i, j in zip(keys, values)}
    static_cols.append(d)
    # deal with dynamic columns
    keys = re.findall(r'(?<=//)[^\']*', line)
    values = re.findall(r'\d+(?=L)', line)
    d = {i: j for i, j in zip(keys, values)}
    dynamic_cols.append(d)

df1 = pd.DataFrame(static_cols)
df2 = pd.DataFrame(dynamic_cols)
df = pd.concat([df1, df2], axis=1)
And the output:
Also, your data had an extra column after the date that I wasn't sure how to deal with, so I just called it 'Other'. It wasn't included in your output, so you can easily remove it if you want.
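To see what the two regular expressions in Solution 2 pick out, here is a minimal sketch run on a single line. The `line` value is a hypothetical sample, shortened to the entries visible in the question. Note that the matched values come back as strings, so the `int()` conversion at the end is an addition of this sketch and not part of the answer above.

```python
import re

# Hypothetical line, shortened to the entries visible in the question.
line = "ABC,2016-6-9 0:00,95,{'//Purple': [115L], '//Yellow': [403L]}"

# Lookbehind: the text that follows '//' up to, but not including, the closing quote.
keys = re.findall(r"(?<=//)[^']*", line)

# Lookahead: runs of digits immediately followed by the 'L' suffix.
values = re.findall(r"\d+(?=L)", line)

# Keys and values are paired purely by position.
pairs = dict(zip(keys, values))
print(pairs)  # {'Purple': '115', 'Yellow': '403'}

# Optional: convert the matched strings to integers.
numeric = {k: int(v) for k, v in pairs.items()}
print(numeric)  # {'Purple': 115, 'Yellow': 403}
```

The pairing relies on every key having exactly one number with an `L` suffix, in the same order; a row that breaks that pattern would silently misalign keys and values.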