
TypeError: A Float Is Required In sklearn.feature_extraction.FeatureHasher

I'm using sklearn version 0.16.1. It seems that FeatureHasher doesn't support strings (as DictVectorizer does). For example: values = [ {'city': 'Dubai', 'temperature':

Solution 1:

Your best bet for non-numeric features is to transform the keys yourself, similar to what DictVectorizer does: fold each string value into its key and set the value to 1.

values = [
      {'city_Dubai':1., 'temperature': 33.},
      {'city_London':1., 'temperature': 12.},
      {'city_San Francisco':1., 'temperature': 18.}
]

You could do this with a Python function:

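def transform_features(orig_dict):
    transformed_dict = dict()
    for name, value in orig_dict.items():
        # Encode a string value as a one-hot style key such as "city_Dubai".
        if isinstance(value, str):
            name = "%s_%s" % (name, value)
            value = 1.
        transformed_dict[name] = value
    return transformed_dict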

Example usage:

transform_features({'city': 'Dubai', 'temperature': 33.})
# Returns {'city_Dubai': 1.0, 'temperature': 33.0}

Solution 2:

This is now supported, as sklearn dev team addressed this issue in

FeatureHasher should properly handle string dictionary values as of version 0.18.
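To check this on your own install, here is a minimal sketch that passes string values straight to FeatureHasher. It assumes scikit-learn 0.18 or later; the sample data and n_features=16 are illustrative choices, not from the original question.

```python
from sklearn.feature_extraction import FeatureHasher

# Illustrative data: string and numeric values mixed in the same dicts.
values = [
    {'city': 'Dubai', 'temperature': 33.},
    {'city': 'London', 'temperature': 12.},
    {'city': 'San Francisco', 'temperature': 18.},
]

# input_type='dict' is the default; n_features is kept small for readability.
hasher = FeatureHasher(n_features=16, input_type='dict')

# On versions before 0.18 this call is where the TypeError would be raised.
X = hasher.transform(values)

print(X.shape)  # one row per input dict, n_features columns
```

The result is a SciPy sparse matrix; call X.toarray() if you want to inspect the hashed columns.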

Keep in mind there are still differences between FeatureHasher and DictVectorizer. Namely, DictVectorizer still handles None values (although I'm curious how), while FeatureHasher explicitly complains about them with the same error the OP experienced.

If you're still experiencing the "TypeError: a float is required" with sklearn version >= 0.18, it is probably due to this issue, and you have a None value.

There's no easy way to debug this, and I ended up modifying sklearn's code to catch the TypeError exception and print the last item provided. I did that by editing the _iteritems() function at the top of sklearn/feature_extraction/
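If you'd rather not patch the library, one lighter option is to scan the input for None values before hashing. This is only a sketch: find_none_values is a hypothetical helper, not part of sklearn, and it assumes your samples are plain dicts held in a list.

```python
def find_none_values(samples):
    """Yield (sample_index, key) for every None value in a list of feature dicts."""
    for index, sample in enumerate(samples):
        for key, value in sample.items():
            if value is None:
                yield index, key


# Illustrative data with one offending entry.
values = [
    {'city': 'Dubai', 'temperature': 33.},
    {'city': None, 'temperature': 12.},
]

print(list(find_none_values(values)))  # [(1, 'city')]
```

Once you know which entries are None, you can drop those keys or substitute a placeholder before calling FeatureHasher.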

Solution 3:

It is a known sklearn issue: FeatureHasher does not currently support string values for its dict input format
