
Handmade Estimator Modifies Parameters In __init__?

I am preparing a custom preprocessing step that is supposed to become part of a sklearn.pipeline.Pipeline. Here's the code of the preprocessor: import string from nltk import wo

Solution 1:

Check out the sklearn developer guide, here, and the paragraph that follows it. I would try to adhere to it as closely as possible to make sure such messages are avoided (even if you never intend to contribute your estimator).

They prescribe that estimators should have no logic in the __init__ function! This most likely causes your error.

I put any validation or transformation of the init parameters (as the guide also prescribes) at the beginning of the fit() method, which has to be called in any case.
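As a minimal sketch of this pattern, assuming a hypothetical transformer (the class name TextCleaner and its parameters are illustrative and not taken from the question): __init__ only stores its arguments, while fit() validates them and builds derived state under a new attribute name.

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone


class TextCleaner(TransformerMixin, BaseEstimator):
    """Hypothetical transformer; all names are illustrative only."""

    def __init__(self, lower=True, strip_chars=" "):
        # Store the arguments exactly as given: no validation, no conversion.
        self.lower = lower
        self.strip_chars = strip_chars

    def fit(self, X, y=None):
        # Validate here instead of in __init__.
        if not isinstance(self.strip_chars, str):
            raise TypeError("strip_chars must be a string")
        # Derived state goes into a separate attribute, so the original
        # parameter stays untouched.
        self.strip_set_ = frozenset(self.strip_chars)
        return self

    def transform(self, X):
        chars = "".join(self.strip_set_)
        cleaned = [doc.strip(chars) for doc in X]
        return [doc.lower() for doc in cleaned] if self.lower else cleaned


cleaner = TextCleaner(strip_chars=" !")
copy_of_cleaner = clone(cleaner)  # succeeds because __init__ changed nothing
result = cleaner.fit(["x"]).transform(["  Hello! "])
```

Because the parameters are stored untouched, get_params() returns exactly what was passed in, which is what cloning inside a Pipeline or a grid search relies on.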

Also, note this utility, which you can use to test whether your estimator conforms to the scikit-learn API.

Edit (as response to your comment, but with code formatting):

Well, not logic. To quote from the links: "To summarize, an __init__ should look like:

def __init__(self, param1=1, param2=2):
    self.param1 = param1
    self.param2 = param2

There should be no logic, not even input validation, and the parameters should not be changed."

So I guess, as @uberwach detailed, that the set construction and the creation of the SnowballStemmer instance probably violate the "should not be changed" part.
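To see why changing a parameter matters, here is a small sketch. It assumes a recent scikit-learn version, in which sklearn.base.clone raises a RuntimeError when the value stored by __init__ is not the very object that was passed in (older versions, as far as I know, issued a deprecation warning instead). Both classes and the helper can_clone are hypothetical.

```python
from sklearn.base import BaseEstimator, clone


class ConvertsInInit(BaseEstimator):
    """Hypothetical estimator that changes its parameter in __init__."""

    def __init__(self, stopwords=("der", "die", "das")):
        # set(...) builds a new object, so the stored value is no longer
        # the object that was passed in.
        self.stopwords = set(stopwords)


class StoresAsIs(BaseEstimator):
    """Hypothetical estimator that follows the developer guide."""

    def __init__(self, stopwords=("der", "die", "das")):
        self.stopwords = stopwords


def can_clone(estimator):
    """Return True if sklearn.base.clone accepts the estimator."""
    try:
        clone(estimator)
    except RuntimeError:
        return False
    return True


clone_ok_converting = can_clone(ConvertsInInit())
clone_ok_plain = can_clone(StoresAsIs())
```

Under the stated assumption, only the estimator that stores its argument as-is survives cloning.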

Edit 2:

In addition to the comment below: this would be one general way of doing it (another, more specific way, as mentioned by @uberwach, is to do it later in your tokenize method):

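import string

from nltk.stem import SnowballStemmer
from sklearn.base import BaseEstimator, TransformerMixin

# `stopwords.STOPWORDS_DE` comes from the question's own code (not shown here).

class NLTKPreprocessor(BaseEstimator, TransformerMixin):
    def __init__(self, stopwords=stopwords.STOPWORDS_DE,
                 punct=string.punctuation,
                 lower=True, strip=True, lang='german'):
        # Only store the arguments; no conversion, no object creation.
        self.lower = lower
        self.strip = strip
        self.stopwords = stopwords
        self.punct = punct
        self.lang = lang

    def fit(self, X, y=None):
        # Build the derived objects here, under new attribute names.
        self.stopword_set = set(self.stopwords)
        self.punct_set = set(self.punct)
        self.stemmer = SnowballStemmer(self.lang)
        return self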

Solution 2:

I read the code under https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/base.py

I could reproduce the warning messages. They went away after two changes:

  1. Using frozenset rather than set. A set is considered mutable and, as such, turns out to be different after a copy.

  2. Initializing self.stemmer in the tokenize method rather than in __init__.
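
A possible explanation for the first point, sketched in plain Python and assuming CPython's behaviour: frozenset() applied to an existing frozenset returns that same object, whereas set() always builds a new one, so an identity comparison between the passed and the stored parameter can only succeed for the frozenset. This is an interpretation, not something the answer spells out.

```python
passed = frozenset({"der", "die", "das"})

# What an __init__ doing `self.stopwords = frozenset(stopwords)` would store:
stored_frozen = frozenset(passed)
# What an __init__ doing `self.stopwords = set(stopwords)` would store:
stored_set = set(passed)

same_object_frozen = stored_frozen is passed  # True in CPython
same_object_set = stored_set is passed        # False: always a new object
```

Even so, storing the argument unchanged and doing the conversion in fit() avoids relying on this detail.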

