Handmade Estimator Modifies Parameters In __init__?
Solution 1:
Check out the scikit-learn developer guide, in particular the section here and the paragraph that follows it. I would try to adhere to it as closely as possible to make sure such messages are avoided (even if you never intend to contribute your estimator).
They prescribe that estimators should have no logic in the __init__ function! This most likely causes your error.
I put any validation or transformation of the __init__ parameters (as the guide also prescribes) at the beginning of the fit() method, since fit() has to be called before the estimator is used anyway.
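A minimal sketch of that pattern, assuming a hypothetical transformer `ClipTransformer` with a single `threshold` parameter (it is not from the question): `__init__` only stores the value, and `fit()` validates it. Because nothing is checked at construction time, an invalid value only surfaces when `fit()` is called.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin


class ClipTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical example: clips values above a threshold."""

    def __init__(self, threshold=1.0):
        # No logic here: store the parameter exactly as given
        self.threshold = threshold

    def fit(self, X, y=None):
        # Validation happens at the start of fit(), not in __init__
        if not isinstance(self.threshold, (int, float)) or self.threshold <= 0:
            raise ValueError(f"threshold must be a positive number, got {self.threshold!r}")
        # Derived value gets a trailing underscore (scikit-learn's naming convention for fitted attributes)
        self.threshold_ = float(self.threshold)
        return self

    def transform(self, X):
        return np.minimum(np.asarray(X, dtype=float), self.threshold_)


bad = ClipTransformer(threshold=-1)  # constructing does not raise
try:
    bad.fit([[0.5, 2.0]])
except ValueError as exc:
    print("fit rejected it:", exc)

print(ClipTransformer(threshold=1.0).fit_transform([[0.5, 2.0]]))
```

Deferring the check to `fit()` also means that `set_params(threshold=...)` followed by a refit is validated the same way as the constructor value.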
Also, note this utility, which you can use to test whether your estimator conforms to the scikit-learn API.
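The linked utility is presumably `check_estimator` from `sklearn.utils.estimator_checks`, which runs the full battery of API checks. For the specific problem here, a lighter self-test may be enough: `sklearn.base.clone` rebuilds the estimator from `get_params()` and verifies that each parameter comes back as the very same object. The helper `survives_clone` and both toy classes are hypothetical. Recent scikit-learn releases raise `RuntimeError` when the check fails; older releases, such as the one producing the warning in the question, warned instead.

```python
from sklearn.base import BaseEstimator, clone


def survives_clone(estimator):
    """Hypothetical helper: True if clone() accepts the estimator's __init__."""
    try:
        clone(estimator)
    except RuntimeError:
        # Recent scikit-learn raises when a parameter is not stored unchanged
        return False
    return True


class StoresParams(BaseEstimator):
    def __init__(self, stopwords=("der", "die", "das")):
        self.stopwords = stopwords  # stored as-is


class ModifiesParams(BaseEstimator):
    def __init__(self, stopwords=("der", "die", "das")):
        self.stopwords = set(stopwords)  # builds a new object: violates the convention


print(survives_clone(StoresParams()))    # True
print(survives_clone(ModifiesParams()))  # False on recent scikit-learn releases
```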
Edit (in response to your comment, but with code formatting):
Well, not logic. To quote from the links: "To summarize, an __init__ should look like:

    def __init__(self, param1=1, param2=2):
        self.param1 = param1
        self.param2 = param2

There should be no logic, not even input validation, and the parameters should not be changed."
So I guess, as @uberwach detailed, the set construction and the creation of the SnowballStemmer instance in __init__ probably violate the "should not be changed" part.
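One concrete reason parameters must be stored unchanged: `BaseEstimator.set_params` (used by `GridSearchCV`, for example) simply assigns the new value to the attribute and never re-runs `__init__`, so anything derived in `__init__` goes stale. A minimal sketch with a hypothetical `EagerPreprocessor` that, like the code discussed here, builds its stopword set in `__init__`:

```python
from sklearn.base import BaseEstimator


class EagerPreprocessor(BaseEstimator):
    """Hypothetical: derives a set in __init__ (the pattern to avoid)."""

    def __init__(self, stopwords=("der", "die")):
        self.stopwords = stopwords
        self.stopword_set = set(stopwords)  # derived in __init__


est = EagerPreprocessor()
est.set_params(stopwords=("und",))  # assigns est.stopwords; __init__ is not called again

print(est.stopwords)     # ('und',)
print(est.stopword_set)  # still the old set built from ('der', 'die')
```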
Edit 2:
In addition to the comment below, this would be one general way of doing it (another, more specific one, as mentioned by @uberwach, is to do it later in your tokenize method):
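    import string

    from nltk.stem.snowball import SnowballStemmer
    from sklearn.base import BaseEstimator, TransformerMixin

    import stopwords  # hypothetical: your own module that defines STOPWORDS_DE

    class NLTKPreprocessor(BaseEstimator, TransformerMixin):
        def __init__(self, stopwords=stopwords.STOPWORDS_DE,
                     punct=string.punctuation,
                     lower=True, strip=True, lang='german'):
            # Store parameters unchanged: no validation, no derived objects
            self.lower = lower
            self.strip = strip
            self.stopwords = stopwords
            self.punct = punct
            self.lang = lang

        def fit(self, X, y=None):
            # Derived objects are built here, so clone() and set_params() keep working
            self.stopword_set = set(self.stopwords)
            self.punct_set = set(self.punct)
            self.stemmer = SnowballStemmer(self.lang)
            return self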
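To check that the fit()-based version behaves, here is a reduced, NLTK-free variant of the class above. The stemmer is left out so it runs without NLTK, and the whitespace-tokenizing `transform` is a hypothetical addition, since the answer does not show one. It clones cleanly, and a changed parameter takes effect on the next `fit()` because the derived sets are rebuilt there.

```python
import string

from sklearn.base import BaseEstimator, TransformerMixin, clone


class SimplePreprocessor(BaseEstimator, TransformerMixin):
    """Hypothetical reduced version of NLTKPreprocessor without the stemmer."""

    def __init__(self, stopwords=("der", "die", "das"), punct=string.punctuation, lower=True):
        # Parameters are stored unchanged
        self.stopwords = stopwords
        self.punct = punct
        self.lower = lower

    def fit(self, X, y=None):
        # Derived lookup structures are (re)built on every fit
        self.stopword_set = set(self.stopwords)
        self.punct_set = set(self.punct)
        return self

    def transform(self, X):
        # Assumed tokenization: split on whitespace, optionally lowercase,
        # strip punctuation characters, then drop stopwords and empty tokens
        docs = []
        for doc in X:
            tokens = []
            for token in doc.split():
                if self.lower:
                    token = token.lower()
                token = "".join(ch for ch in token if ch not in self.punct_set)
                if token and token not in self.stopword_set:
                    tokens.append(token)
            docs.append(tokens)
        return docs


prep = SimplePreprocessor()
print(prep.fit_transform(["Der Hund, die Katze!"]))  # [['hund', 'katze']]

cloned = clone(prep)  # no RuntimeError: parameters survive the round trip
cloned.set_params(stopwords=("hund",))
print(cloned.fit_transform(["Der Hund, die Katze!"]))  # [['der', 'die', 'katze']]
```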
Solution 2:
I read the code in https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py and could reproduce the warning message. It went away after two changes:

1. Using frozenset rather than set. A set is considered mutable and as such will turn out to be different after a copy.
2. Initializing self.stemmer in the tokenize method rather than in __init__.
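Why frozenset helps can be checked directly. In CPython, calling `frozenset()` on an object that is already a frozenset returns that same object, while `set()` always builds a new one. `clone()` first compares each parameter by identity (`is`), so an `__init__` that does `frozenset(stopwords)` passes once the value it receives is itself a frozenset. This relies on a CPython implementation detail, and the `FrozenStopwords` class is hypothetical.

```python
from sklearn.base import BaseEstimator, clone

words = frozenset({"der", "die"})
print(frozenset(words) is words)  # True in CPython: no new object
mutable = {"der", "die"}
print(set(mutable) is mutable)    # False: set() always copies


class FrozenStopwords(BaseEstimator):
    """Hypothetical: converts the parameter in __init__, but to a frozenset."""

    def __init__(self, stopwords=("der", "die")):
        self.stopwords = frozenset(stopwords)


est = FrozenStopwords(stopwords=["der", "die", "das"])
cloned = clone(est)  # passes: frozenset(x) is x when x is already a frozenset
print(sorted(cloned.stopwords))  # ['das', 'der', 'die']
```

This is a workaround that happens to satisfy the identity check; moving the conversion into fit(), as in Solution 1, avoids relying on it.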