
Pandas Read_table With Duplicate Names

When reading a table while specifying duplicate column names (say, two distinct names repeated), pandas 0.16.1 copies the last two columns of the data over and over again.

Solution 1:

Using duplicate values in indexes is inherently problematic because they lead to ambiguity. Code that appears to work fine can suddenly fail on DataFrames with non-unique indexes. argmax, for instance, can hit a similar pitfall when a DataFrame has duplicates in its index.

It's best to avoid putting duplicate values in (row or column) indexes if you can. If you need a non-unique index, use it with care and double-check the effect duplicate values have on the behavior of your code.
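As a rough illustration of the ambiguity described above, here is a minimal sketch. The small frame and its values are made up for the example and are not from the original question. Selecting a unique label returns a Series, while selecting a duplicated label returns a DataFrame, so code written for one case can break on the other.

```python
import pandas as pd

# Hypothetical sample frame with a duplicated column label.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['one', 'two', 'one'])

unique_pick = df['two']  # unique label: a single column, returned as a Series
dup_pick = df['one']     # duplicated label: both matching columns, returned as a DataFrame

print(type(unique_pick).__name__)
print(type(dup_pick).__name__)

# Index.is_unique is a quick way to check for duplicate labels up front.
print(df.columns.is_unique)
```

Checking df.columns.is_unique (or df.index.is_unique for rows) before label-based operations is one simple way to catch this early.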

In this case, you could use

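import pandas as pd
df = pd.read_csv('data', header=None)  # read first; pandas assigns default integer labels
df.columns = ['one', 'two', 'one', 'two', 'one']  # then set the duplicate names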

instead.
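To try this without a file on disk, the sketch below feeds read_csv from an in-memory buffer. The two rows of sample data are an assumption standing in for the 'data' file, whose contents are not shown in the question. It also shows positional selection with iloc, which stays unambiguous when labels repeat.

```python
import io

import pandas as pd

# Hypothetical sample contents standing in for the 'data' file.
raw = io.StringIO("1,2,3,4,5\n6,7,8,9,10\n")

# Read with no header row, then assign the duplicate names afterwards.
df = pd.read_csv(raw, header=None)
df.columns = ['one', 'two', 'one', 'two', 'one']

# Select the third column by position rather than by its repeated label.
third = df.iloc[:, 2]
print(third.tolist())
```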

