Writing Multiple Header Lines In pandas.DataFrame.to_csv
I am putting my data into NASA's ICARTT format for archival. This is a comma-separated file with multiple header lines, and the header lines themselves contain commas. Something like: 46,
Solution 1:
You can, indeed, just write the header lines before the data. pandas.DataFrame.to_csv takes a path_or_buf as its first argument, not just a pathname, so you can hand it a file object that already has the header lines written to it:
pandas.DataFrame.to_csv(path_or_buf, *args, **kwargs)
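To see that to_csv accepts any writable text buffer, here is a minimal sketch that uses an in-memory io.StringIO in place of a real file. The header text and column names are made up for illustration.

```python
import io

import pandas as pd

# A tiny illustrative frame; the column names are hypothetical.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

buf = io.StringIO()
# Anything written to the buffer first stays in front of the CSV data.
buf.write("1001\nsome header line, with a comma\n")
# Passing the buffer (not a path) makes pandas write at the current position.
df.to_csv(buf, index=False)

text = buf.getvalue()
print(text)
```

The same call works with a file object returned by open(), which is what the example below relies on.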
Here's an example:
#!/usr/bin/env python3
import sys

import numpy as np
import pandas as pd

# Make an example data frame.
df = pd.DataFrame(np.random.randint(100, size=(5, 5)),
                  columns=['a', 'b', 'c', 'd', 'e'])

# The header lines, one string per line.
header_lines = [
    '1001',
    'Daedalus, Stephen',
    'Dublin, Ireland',
    'Keys',
    'MINOS',
    '1,1',
    '1904,06,16,1922,02,02',
    'time_since_8am',  # Ends up being the header name for the index.
]
header = '\n'.join(header_lines)

# Open the file as UTF-8; pandas recommends newline='' for file
# objects passed to to_csv.
with open(sys.argv[1], 'w', encoding='utf-8', newline='') as ict:
    # Write the header lines. The last one has no trailing newline, so the
    # index name ends up in front of the ",a,b,c,d,e" column row that pandas
    # writes for the unnamed index (see above).
    ict.write(header)
    # Write the data frame to the file object instead of to a filename.
    # Pandas keeps writing where the header left off.
    df.to_csv(ict)
The result is just what you wanted: the header lines are written first, and then .to_csv() writes the data after them:
$ python example.py test && cat test
1001
Daedalus, Stephen
Dublin, Ireland
Keys
MINOS
1,1
1904,06,16,1922,02,02
time_since_8am,a,b,c,d,e
0,67,85,66,18,32
1,47,4,41,82,84
2,24,50,39,53,13
3,49,24,17,12,61
4,91,5,69,2,18
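The script above relies on the last header line having no trailing newline, so that time_since_8am gets joined to the column row, which pandas starts with a comma for the unnamed index. If that feels fragile, one alternative (my own variation, not what the answer above does) is to end every header line with a newline and let pandas write the index name itself through to_csv's index_label parameter. The output file name example.ict is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(100, size=(5, 5)),
                  columns=["a", "b", "c", "d", "e"])

header_lines = [
    "1001",
    "Daedalus, Stephen",
    "Dublin, Ireland",
    "Keys",
    "MINOS",
    "1,1",
    "1904,06,16,1922,02,02",
]

path = "example.ict"  # hypothetical output file name
with open(path, "w", encoding="utf-8", newline="") as ict:
    # Every header line gets its own newline here.
    for line in header_lines:
        ict.write(line + "\n")
    # index_label names the index column in the column row,
    # producing "time_since_8am,a,b,c,d,e".
    df.to_csv(ict, index_label="time_since_8am")

with open(path, encoding="utf-8") as f:
    written = f.read().splitlines()
print("\n".join(written))
```

This keeps the index name in one obvious place instead of depending on how the header string happens to end.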
Sorry if this is too late to be useful. I work in archiving these files (and use Python), so feel free to drop me a line if you have future questions.
Solution 2:
Even though it's been some years and ndt's answer is quite nice, another possibility is to write the header first and then call to_csv() with mode='a' (append):
# write the header (end it with '\n' so the data starts on a new line)
header = '46, 1001\nlastname, firstname\n,...\n'
with open('test.csv', 'w') as fp:
    fp.write(header)

# write the rest; df is the DataFrame you want to save
df.to_csv('test.csv', header=True, mode='a')
It may be slightly less efficient, though, because the file is opened and written twice.
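To check that the append approach produces the same file as writing through a single open file object, here is a small sketch that writes both versions and compares them. The helper functions, file names, and header text are hypothetical; both helpers open the file with newline='' so the line endings match.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
header = "46, 1001\nlastname, firstname\n"  # illustrative header text


def write_one_handle(path, header, df):
    """Write header and data through a single open file object."""
    with open(path, "w", encoding="utf-8", newline="") as fp:
        fp.write(header)
        df.to_csv(fp)


def write_then_append(path, header, df):
    """Write the header, close the file, then let pandas append the data."""
    with open(path, "w", encoding="utf-8", newline="") as fp:
        fp.write(header)
    df.to_csv(path, header=True, mode="a")


write_one_handle("one_handle.csv", header, df)
write_then_append("appended.csv", header, df)

with open("one_handle.csv", encoding="utf-8") as f1, \
        open("appended.csv", encoding="utf-8") as f2:
    same = f1.read() == f2.read()
print(same)
```

The only extra cost of the append version is one more open and close of the file.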