TypeError: a float is required in sklearn.feature_extraction.FeatureHasher
Solution 1:
Your best bet for non-numeric features is to transform the keys yourself, similar to what DictVectorizer does: fold each string value into its key and set the value to 1, so the input ends up looking like this.
values = [
    {'city_Dubai': 1., 'temperature': 33.},
    {'city_London': 1., 'temperature': 12.},
    {'city_San Francisco': 1., 'temperature': 18.}
]
You could do this with a Python function.
def transform_features(orig_dict):
    transformed_dict = dict()
    # On Python 2, use orig_dict.iteritems() instead of items().
    for name, value in orig_dict.items():
        if isinstance(value, str):
            # Fold the string value into the key and use 1.0 as the value.
            name = "%s_%s" % (name, value)
            value = 1.
        transformed_dict[name] = value
    return transformed_dict
Example usage:
transform_features({'city': 'Dubai', 'temperature': 33.})
# Returns {'city_Dubai': 1.0, 'temperature': 33.0}
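To show how the transformed dicts would then be fed to the hasher, here is a minimal sketch. The raw records and the choice of n_features=16 are assumptions made for illustration only; the function is repeated so the snippet runs on its own.

```python
from sklearn.feature_extraction import FeatureHasher


def transform_features(orig_dict):
    """Fold string values into the key, e.g. city='Dubai' -> city_Dubai=1.0."""
    transformed = {}
    for name, value in orig_dict.items():
        if isinstance(value, str):
            name = "%s_%s" % (name, value)
            value = 1.0
        transformed[name] = value
    return transformed


# Hypothetical raw records with one string-valued feature.
raw = [
    {"city": "Dubai", "temperature": 33.0},
    {"city": "London", "temperature": 12.0},
    {"city": "San Francisco", "temperature": 18.0},
]

samples = [transform_features(d) for d in raw]

# n_features=16 is an arbitrary small size chosen for this example.
hasher = FeatureHasher(n_features=16, input_type="dict")
X = hasher.transform(samples)  # scipy sparse matrix, one row per record
print(X.shape)
```

After the transform every value is numeric, so the hasher never sees a string value, regardless of the sklearn version.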
Solution 2:
This is now supported: the sklearn dev team addressed this issue in https://github.com/scikit-learn/scikit-learn/pull/6173
FeatureHasher should properly handle string dictionary values as of version 0.18.
Keep in mind that FeatureHasher and DictVectorizer still differ. Namely, DictVectorizer still handles None values (although I'm curious how), while FeatureHasher explicitly rejects them with the same error the OP experienced.
If you're still getting "TypeError: a float is required" with sklearn version >= 0.18, the likely cause is a None value somewhere in your input.
There's no easy way to debug this from the error message alone, so I ended up modifying sklearn's code to catch the TypeError exception and print the last item provided.
I did that by editing the _iteritems() function at the top of sklearn/feature_extraction/hashing.py
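As an alternative to patching the library, you could scan the input before hashing it. This is only a sketch: find_suspect_values is a hypothetical helper, not part of sklearn, and it assumes that numbers and strings are the only values the hasher accepts.

```python
import numbers


def find_suspect_values(samples):
    """Return (sample_index, key, value) for values that are neither numbers nor strings."""
    problems = []
    for i, sample in enumerate(samples):
        for key, value in sample.items():
            # Assumption: anything other than a number or a string (e.g. None) is suspect.
            if not isinstance(value, (numbers.Number, str)):
                problems.append((i, key, value))
    return problems


# Hypothetical input with a None value in the second record.
samples = [
    {"city": "Dubai", "temperature": 33.0},
    {"city": "London", "temperature": None},
]
print(find_suspect_values(samples))
```

Running it on your real input before calling FeatureHasher should point at the offending record and key without touching sklearn's source.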
Solution 3:
It was a known sklearn issue: before version 0.18, FeatureHasher did not support string values for its dict input format.