
Pandas Read_csv Alters The Columns When It Starts With 0

I have a script that reads some zipcodes from a CSV file. The zipcodes are formatted like this: zipcode 75180 90672 01037 20253 09117 31029 07745 90453 12105 18140 36108 104

Solution 1:

You need to pass the dtype as str:

reader = pd.read_csv(file, sep=';', encoding='utf-8-sig', dtype=str)

to read those values as str:

In [152]:
import pandas as pd
import io
t="""zipcode
75180
90672
01037
20253
09117
31029
07745
90453
12105
18140
36108
10403
76470
06628
93105
88069
31094
84095
63069"""
df = pd.read_csv(io.StringIO(t), dtype=str)
df

Out[152]:
   zipcode
0    75180
1    90672
2    01037
3    20253
4    09117
5    31029
6    07745
7    90453
8    12105
9    18140
10   36108
11   10403
12   76470
13   06628
14   93105
15   88069
16   31094
17   84095
18   63069

By default pandas infers the dtype of each column. Here it decides the values are numeric, so the leading zeroes are lost.
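To make the difference concrete, here is a minimal sketch that reads the same text twice, once with default inference and once with an explicit dtype. The `population` column and the sample rows are made up for illustration; only `zipcode` and the `;` separator come from the question. It also uses the dict form of `dtype`, which may be handy if other columns should stay numeric:

```python
import io
import pandas as pd

# Hypothetical two-column sample; only "zipcode" appears in the question.
csv_text = "zipcode;population\n75180;5400\n01037;1200\n09117;800\n"

# Default inference: the column is parsed as integers, so "01037" becomes 1037.
inferred = pd.read_csv(io.StringIO(csv_text), sep=";")

# dtype given as a dict: "zipcode" is kept as text, "population" is still inferred.
typed = pd.read_csv(io.StringIO(csv_text), sep=";", dtype={"zipcode": str})

print(inferred["zipcode"].tolist())  # [75180, 1037, 9117]
print(typed["zipcode"].tolist())     # ['75180', '01037', '09117']
```

Passing `dtype=str` without a dict, as in the answer, applies the same treatment to every column.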

You can also do this as a post-processing step by casting to str and then using the vectorised str.zfill:

In [154]:
df['zipcode'] = df['zipcode'].astype(str).str.zfill(5)
df

Out[154]:
   zipcode
0    75180
1    90672
2    01037
3    20253
4    09117
5    31029
6    07745
7    90453
8    12105
9    18140
10   36108
11   10403
12   76470
13   06628
14   93105
15   88069
16   31094
17   84095
18   63069
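The post-processing route matters most when the column has already been read as integers. A small sketch of that case, using a shortened sample of the zipcodes above and assuming every code is meant to be exactly five digits wide:

```python
import io
import pandas as pd

csv_text = "zipcode\n75180\n01037\n09117\n"

# Read with default inference, so the zeroes are already gone at this point.
df = pd.read_csv(io.StringIO(csv_text))

# Cast to str, then left-pad with zeroes up to the assumed width of 5.
df["zipcode"] = df["zipcode"].astype(str).str.zfill(5)

print(df["zipcode"].tolist())  # ['75180', '01037', '09117']
```

This only works because the intended width is known in advance. If the codes could have varying lengths, reading them with dtype=str is the safer choice.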
