TypeError: a float is required in sklearn.feature_extraction.FeatureHasher
Solution 1:
For non-numeric features, your best bet is to transform the keys yourself, similar to what DictVectorizer does: fold each string value into the key and set the value to 1.
values = [
    {'city_Dubai': 1., 'temperature': 33.},
    {'city_London': 1., 'temperature': 12.},
    {'city_San Francisco': 1., 'temperature': 18.}
]
You could do this with a Python function:
def transform_features(orig_dict):
    # Fold string values into the key, e.g. {'city': 'Dubai'} -> {'city_Dubai': 1.}
    transformed_dict = dict()
    for name, value in orig_dict.items():  # use iteritems() on Python 2
        if isinstance(value, str):
            name = "%s_%s" % (name, value)
            value = 1.
        transformed_dict[name] = value
    return transformed_dict
Example usage:
transform_features({'city': 'Dubai', 'temperature': 33.})
# Returns {'city_Dubai': 1.0, 'temperature': 33.0}
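As a rough sketch of how the transformed dictionaries might then be hashed, the snippet below applies the function to a few records and passes the result to FeatureHasher. The raw records and the choice of n_features=16 are illustrative assumptions, not part of the original question.

```python
from sklearn.feature_extraction import FeatureHasher


def transform_features(orig_dict):
    """Fold string values into the key so that every value is numeric."""
    transformed_dict = {}
    for name, value in orig_dict.items():
        if isinstance(value, str):
            name = "%s_%s" % (name, value)
            value = 1.
        transformed_dict[name] = value
    return transformed_dict


# Hypothetical raw records with a string-valued feature.
raw_records = [
    {'city': 'Dubai', 'temperature': 33.},
    {'city': 'London', 'temperature': 12.},
    {'city': 'San Francisco', 'temperature': 18.},
]

# Every value is now a float, so FeatureHasher has nothing to reject.
records = [transform_features(record) for record in raw_records]

# n_features=16 is an arbitrary small size chosen for the example.
hasher = FeatureHasher(n_features=16, input_type='dict')
X = hasher.transform(records)  # scipy sparse matrix, one row per record
```

Because the values are numeric before they reach the hasher, this approach should not depend on string support inside FeatureHasher itself.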
Solution 2:
This is now supported, as the sklearn dev team addressed this issue in https://github.com/scikit-learn/scikit-learn/pull/6173
FeatureHasher should properly handle string dictionary values as of version 0.18.
Keep in mind there are still differences between FeatureHasher and DictVectorizer. Namely, DictVectorizer still handles None values (although I'm curious how), while FeatureHasher explicitly complains about it with the same error the OP experienced.
If you're still experiencing the "TypeError: a float is required" with sklearn version >= 0.18, it is probably due to this issue, and you have a None value.
There's no easy way to debug this, and I ended up modifying sklearn's code to catch the TypeError exception and print the last item provided.
I did that by editing the _iteritems() function at the top of sklearn/feature_extraction/hashing.py.
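A possibly less invasive alternative to patching the library is to scan the input yourself before hashing. The sketch below is only one way to do that: find_bad_values is a hypothetical helper name, and treating numbers and strings as the only acceptable value types is an assumption based on the behaviour described above.

```python
from numbers import Number


def find_bad_values(records):
    """Yield (record_index, key, value) for values that are neither numbers nor strings."""
    for index, record in enumerate(records):
        for key, value in record.items():
            # Assumption: numbers and strings are fine, anything else (e.g. None) is suspect.
            if not isinstance(value, (Number, str)):
                yield index, key, value


# Hypothetical input containing one None value.
records = [
    {'city': 'Dubai', 'temperature': 33.0},
    {'city': 'London', 'temperature': None},
]

bad = list(find_bad_values(records))
for index, key, value in bad:
    print("record %d: key %r has unsupported value %r" % (index, key, value))
```

Running a check like this over the data first should point at the offending record without touching sklearn's source.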
Solution 3:
It is a known sklearn issue: FeatureHasher does not currently support string values for its dict input format.