Pandas Read_csv - How To Handle A Comma Inside Double Quotes That Are Themselves Inside Double Quotes
This is not the same question as "double quoted elements in csv can't read with pandas". The difference is that in that question, 'ABC,DEF' was breaking the code. Here, it is 'ABC 'DE' ,F' that breaks it.
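To see why the second row breaks a standard CSV parser, here is a sketch that uses Python's built-in csv module instead of pandas. This is an assumption made so the example has no dependencies. The quote before jklm closes the quoted field early, so the comma in n,op is then read as a delimiter.

```python
import csv

# The two data rows from the question, as single lines of text.
good = '2001-01-01,123456,"abc def",V4'
bad = '2001-01-02,789012,"ghi "jklm" n,op",V4'

# csv.reader accepts any iterable of lines; the default dialect treats
# '"' as the quote character and ',' as the delimiter.
good_row = next(csv.reader([good]))
bad_row = next(csv.reader([bad]))

# The well-formed row splits into four fields.
print(len(good_row), good_row)

# In the second row, the quote before jklm ends the quoted field, so the
# comma inside "n,op" acts as a delimiter and the row has five fields.
print(len(bad_row), bad_row)
```

A header with four names and a data row with five fields is the kind of mismatch that typically makes read_csv raise a tokenizing error.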
Solution 1:
Based on the two rows you have provided, here is an option where the text file is read into a Series object and a regex is then applied with Series.str.extract() to get the information you want into a DataFrame:
import pandas as pd

with open('so.txt') as f:
    contents = f.readlines()
s = pd.Series(contents)
s now looks like the following:
0 header1, header2, header3,header4\n
1 \n
2 2001-01-01,123456,"abc def",V4\n
3 \n
4 2001-01-02,789012,"ghi "jklm" n,op",V4
Now you can use a regex extract to get what you want into a DataFrame:
df = s.str.extract(r'^([0-9]{4}-[0-9]{2}-[0-9]{2}),([0-9]+),(.+),(\w{2})$')
# remove empty rows (blank lines and the header line do not match, so they are all NaN)
df = df.dropna(how='all')
df now looks like the following:
0 1 2 3
2 2001-01-01 123456 "abc def" V4
4 2001-01-02 789012 "ghi "jklm" n,op" V4
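The regex does the work that the CSV quoting rules cannot: the third group (.+) is greedy, so it keeps every comma except the one that precedes the final two-character field. A small sketch with Python's re module on single lines shows this. It assumes, as the pattern itself does, that the last column is always exactly two word characters, like V4.

```python
import re

# The same pattern as above, written as a raw string so "\w" is not
# treated as a (deprecated) string escape.
PATTERN = re.compile(r'^([0-9]{4}-[0-9]{2}-[0-9]{2}),([0-9]+),(.+),(\w{2})$')

line = '2001-01-02,789012,"ghi "jklm" n,op",V4'
m = PATTERN.match(line)

# (.+) is greedy: it first runs to the end of the line, then backtracks
# only as far as needed for ",(\w{2})$" to match, so the comma inside
# "n,op" stays in group 3 and only the final ",V4" is split off.
print(m.groups())
# ('2001-01-02', '789012', '"ghi "jklm" n,op"', 'V4')

# "$" also matches just before a trailing newline, so lines returned by
# readlines() still match.
print(PATTERN.match('2001-01-01,123456,"abc def",V4\n').group(3))
# "abc def"

# The header line does not start with a date, so it does not match
# (str.extract turns such rows into all-NaN rows).
print(PATTERN.match('header1, header2, header3,header4'))
# None
```

If the last column can be longer than two characters, \w{2} would need loosening (for example to \w+); the greedy (.+) still leaves only the text after the final comma to that group.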
You can then set the column names with df.columns = ['header1', 'header2', 'header3', 'header4'].
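Putting the steps together, here is a self-contained sketch that swaps so.txt for an in-memory string. The sample text is reconstructed from the two rows in the question, not taken from the asker's real file. Two additions go beyond the answer above and are assumptions, marked as such in the comments: the column names are read from the header line instead of being typed out (this assumes no header name contains a comma), and the outer double quotes are stripped from header3.

```python
import io

import pandas as pd

# Assumed sample data standing in for so.txt (reconstructed from the question).
raw = (
    "header1, header2, header3,header4\n"
    "\n"
    '2001-01-01,123456,"abc def",V4\n'
    "\n"
    '2001-01-02,789012,"ghi "jklm" n,op",V4'
)

# io.StringIO behaves like the file object returned by open().
with io.StringIO(raw) as f:
    lines = f.readlines()
s = pd.Series(lines)

df = s.str.extract(r'^([0-9]{4}-[0-9]{2}-[0-9]{2}),([0-9]+),(.+),(\w{2})$')
# Blank lines and the header line do not match, so they come back all-NaN.
df = df.dropna(how='all').reset_index(drop=True)

# Hypothetical variation: take the column names from the file's first line
# instead of hard-coding them, stripping the spaces after the commas and
# the trailing newline.
df.columns = [name.strip() for name in lines[0].split(',')]

# Optional: remove only the outermost double quotes; the inner "jklm" quotes
# are kept because str.strip only removes characters at the ends.
df['header3'] = df['header3'].str.strip('"')
print(df)
```

The extracted values are all strings; convert header1 with pd.to_datetime or header2 with pd.to_numeric if you need typed columns.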