
Reshape Pandas.Df To Use In GridSearch

I am trying to use multiple feature columns in GridSearch with Pipeline. So I pass two columns for which I want to do a TfidfVectorizer, but I get into trouble when running the Gri

Solution 1:

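TfidfVectorizer expects its input to be an iterable of strings, such as a list or a single pandas column. That explains "AttributeError: 'numpy.ndarray' object has no attribute 'lower'": you passed a 2-D array, so each item the vectorizer iterates over is itself an array (one row) rather than a string, and the call fails when it tries to lowercase that array.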
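To make the difference concrete, here is a minimal sketch with made-up toy strings (they are not from the question). It fits the vectorizer once on a 1-D list and once on a 2-D array, and captures the error from the second call.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# 1-D input: each element is a string, which is what the vectorizer expects.
docs_1d = ["red apple", "green apple", "red car"]
X = TfidfVectorizer().fit_transform(docs_1d)

# 2-D input: iterating over it yields rows (arrays), not strings.
docs_2d = np.array([
    ["red apple", "fresh"],
    ["green apple", "sour"],
    ["red car", "fast"],
])

try:
    TfidfVectorizer().fit_transform(docs_2d)
    error_message = None
except AttributeError as exc:
    # The vectorizer tried to call .lower() on a whole row.
    error_message = str(exc)

print(X.shape)
print(error_message)
```

Selecting two columns from a DataFrame (for example with a list of column names) gives the same kind of 2-D input, which is why the pipeline in the question runs into this error.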

So you have two choices: either concatenate the two columns into one beforehand (in pandas), or, if you want to keep them separate, use a FeatureUnion in the pipeline (http://scikit-learn.org/stable/modules/pipeline.html#feature-union) so that each column gets its own vectorizer.
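Both options could look roughly like the sketch below. The data, the column names ("title", "body", "label"), the helper pick_column, the classifier and the parameter grid are all assumptions made up for illustration, since the original code is not shown here.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical toy data with two text columns and a binary label.
df = pd.DataFrame({
    "title": ["cheap flights sale", "meeting agenda monday", "win cash prize",
              "project status update", "free prize offer", "lunch plans friday",
              "cheap loans offer", "quarterly report draft"],
    "body": ["book cheap flights now", "agenda for the monday meeting",
             "claim your cash prize now", "status update on the project",
             "free offer claim your prize", "plans for lunch on friday",
             "cheap loans offer apply now", "draft of the quarterly report"],
    "label": [1, 0, 1, 0, 1, 0, 1, 0],
})

# Option 1: merge the two columns into one 1-D column of strings in pandas.
combined = TfidfVectorizer().fit_transform(df["title"] + " " + df["body"])


# Option 2: keep the columns separate and vectorize each one on its own.
def pick_column(frame, name):
    """Return a single column as a 1-D Series (hypothetical helper)."""
    return frame[name]


def column_tfidf(name):
    """Pipeline that selects one column and runs TF-IDF on it."""
    return Pipeline([
        ("pick", FunctionTransformer(pick_column, kw_args={"name": name})),
        ("tfidf", TfidfVectorizer()),
    ])


model = Pipeline([
    ("features", FeatureUnion([
        ("title_tfidf", column_tfidf("title")),
        ("body_tfidf", column_tfidf("body")),
    ])),
    ("clf", LogisticRegression()),
])

# Nested step names are joined with double underscores.
param_grid = {
    "features__title_tfidf__tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0],
}

search = GridSearchCV(model, param_grid, cv=2)
search.fit(df[["title", "body"]], df["label"])
print(search.best_params_)
```

In the second option the whole DataFrame goes into the pipeline, and each branch of the union hands its vectorizer a 1-D column. Newer scikit-learn versions also provide ColumnTransformer, which can route columns by name and may be a simpler fit here.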

As for the first exception, my guess is that it comes from the way pandas and sklearn interact. However, you cannot tell for sure until the error above has been fixed.

