
Drop Columns Containing Certain Strings While Reading Data: Python

I'm reading .txt files in a directory and want to drop columns whose names contain certain strings.

for file in glob.iglob(files + '.txt', recursive=True):
    cols = list(

Solution 1:

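Instead of reading the header separately, you can pass a callable to usecols. pandas calls it with each column name and keeps the column only when it returns True, so the callable should return True when neither 'EASY' nor 'TRIVIAL' appears in the name.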

import pandas as pd

exclu = ['EASY', 'TRIVIAL']  # Any substring in this list excludes a column
usecols = lambda x: not any(substr in x for substr in exclu)

df = pd.read_csv('test.csv', usecols=usecols)

0     2       4
1     6       8
2     1       1

Sample Data: test.csv
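As a self-contained sketch of the callable approach, the following reads an in-memory CSV instead of a file on disk. The column names (A_EASY, B, C_TRIVIAL, D) and the values are assumptions made up for illustration; they are not the original test.csv.

```python
import io

import pandas as pd

# Hypothetical CSV contents; names and values are assumed for illustration only.
csv_text = "A_EASY,B,C_TRIVIAL,D\n1,2,3,4\n5,6,7,8\n9,1,0,1\n"

exclu = ['EASY', 'TRIVIAL']  # Any substring in this list excludes a column


def keep_column(name):
    """Return True when the column name contains none of the excluded substrings."""
    return not any(substr in name for substr in exclu)


# pandas evaluates keep_column against every header name and loads only
# the columns for which it returns True.
df = pd.read_csv(io.StringIO(csv_text), usecols=keep_column)
print(df.columns.tolist())  # ['B', 'D']
```

A named function is used here rather than a lambda only for readability; both behave the same when passed to usecols.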


Solution 2:

There are a few issues in your code. First, you are calling str.contains on the whole DataFrame rather than on its columns. Second, str.contains cannot be applied to a list.

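Using regex, read a single row to get the column names, then keep only the names that do not match the pattern:

import re
import pandas as pd

cols = pd.read_csv(file, nrows=1)

cols_to_use = [i for i in cols.columns if not re.search('TRIVIAL|EASY', i)]

df = pd.read_csv(file, header=0, skiprows=0, skipfooter=0, usecols=cols_to_use)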
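To tie this back to the directory loop in the question, here is a minimal sketch that applies the same filter to every .txt file found under a folder. The function name read_without_columns, the comma separator, and the recursive '**/*.txt' search pattern are assumptions for illustration; adjust them to the actual file layout.

```python
import glob
import os
import re

import pandas as pd


def read_without_columns(directory, pattern='TRIVIAL|EASY', sep=','):
    """Read each .txt file under `directory`, skipping columns whose
    name matches `pattern`. Returns a dict mapping file path to DataFrame.

    Hypothetical helper: the name, the default separator and the search
    pattern are assumptions, not part of the original question.
    """
    regex = re.compile(pattern)
    frames = {}
    search = os.path.join(directory, '**', '*.txt')
    for path in glob.iglob(search, recursive=True):
        # The callable is evaluated per column name; True means "keep".
        frames[path] = pd.read_csv(
            path,
            sep=sep,
            usecols=lambda name: not regex.search(name),
        )
    return frames
```

Passing the callable directly to usecols avoids opening each file twice, which is the main practical difference from reading the header first with nrows=1.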
