
Using .loc Inside Custom Transformer Produces Copy With Slice Error

EDIT: The question remains the same, but the code has changed. I am working on the Home Credit dataset on Kaggle, specifically on instalment_payment.csv. Following are my custom

Solution 1:

As I said in the comment, I first extract the features I need to learn from (in .fit) using:

from sklearn.base import BaseEstimator, TransformerMixin

# BaseEstimator supplies get_params/set_params, which scikit-learn expects of pipeline steps
class FeatureExtractor(TransformerMixin, BaseEstimator):
    def __init__(self, cols):
        self.cols = cols
        print(self.cols)

    def fit(self, X, y=None):
        # stateless transformer: nothing to learn
        return self

    def transform(self, X):
        # assumes X is a pandas DataFrame
        X_cols = X.loc[:, self.cols]
        return X_cols
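Since the title is about .loc and copies, here is a pandas-only sketch of the selection that FeatureExtractor.transform performs, run on a hypothetical toy frame. Chaining .copy() onto the .loc result is an addition, not part of the answer's code: it makes the selection an independent DataFrame, which is the usual way to keep pandas from emitting SettingWithCopyWarning when a later step writes to it.

```python
import pandas as pd

# Hypothetical toy data standing in for the real DataFrame
df = pd.DataFrame({"Synopsis": ["a b c", "d e"], "Price": [10.0, 20.0]})

# The same .loc selection as FeatureExtractor.transform, plus an explicit copy (assumption)
selected = df.loc[:, ["Synopsis"]].copy()

# Writing to the copy is safe: it is a new DataFrame, not a view of df
selected["n_words"] = selected["Synopsis"].str.split().str.len()

print(selected.columns.tolist())  # ['Synopsis', 'n_words']
print(df.columns.tolist())        # ['Synopsis', 'Price']
```

The original frame keeps its two columns, so nothing written downstream leaks back into the input.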

Then I use this class to learn from one of the columns of the data:

from sklearn.base import BaseEstimator, TransformerMixin

class SynopsisNumWords(TransformerMixin, BaseEstimator):
    def __init__(self):
        # no parameters to store
        # self.text_array = text_array
        pass

    def fit(self, X, y=None, **fit_params):
        return self

    def transform(self, X, y=None, **fit_params):
        X = X.copy()
        # rename the series so it does not have the same column name as the input
        return X.loc[:, 'Synopsis'].apply(lambda x: len(str(x).split())).rename('Synopsis_num_words').to_frame()

Then I combine all the features into a single DataFrame using this:

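from functools import reduce

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class DFFeatureUnion(TransformerMixin, BaseEstimator):
    # FeatureUnion, but for pandas DataFrames
    def __init__(self, transformer_list):
        self.transformer_list = transformer_list

    def fit(self, X, y=None):
        # fit every (name, transformer) pair on the same input
        for (name, t) in self.transformer_list:
            t.fit(X)
        return self

    def transform(self, X):
        # X must be a DataFrame
        Xts = [t.transform(X) for _, t in self.transformer_list]
        # join all outputs side by side on their shared index
        Xunion = reduce(lambda X1, X2: pd.merge(X1, X2, left_index=True, right_index=True), Xts)
        return Xunion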
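The reduce/pd.merge line joins the transformer outputs pairwise on their index. A pandas-only sketch with hypothetical toy frames shows two consequences: a column name shared by two outputs gets the default _x/_y suffixes (which is why SynopsisNumWords renames its output), and because pd.merge defaults to an inner join, rows missing from any output are dropped.

```python
from functools import reduce

import pandas as pd

# Hypothetical outputs of two branches, aligned on the same index
text = pd.DataFrame({"Synopsis": ["a b", "c"]}, index=[0, 1])
counts = pd.DataFrame({"Synopsis_num_words": [2, 1]}, index=[0, 1])

# Same pattern as DFFeatureUnion.transform: fold the list with index-on-index merges
union = reduce(lambda X1, X2: pd.merge(X1, X2, left_index=True, right_index=True), [text, counts])
print(union.columns.tolist())  # ['Synopsis', 'Synopsis_num_words']

# Without the rename, both branches would emit 'Synopsis' and merge adds suffixes
clash = pd.merge(text, text, left_index=True, right_index=True)
print(clash.columns.tolist())  # ['Synopsis_x', 'Synopsis_y']

# Inner join by default: only index labels present in every output survive
partial = pd.DataFrame({"Price": [5.0]}, index=[1])
kept = pd.merge(text, partial, left_index=True, right_index=True)
print(kept.index.tolist())  # [1]
```

For aligned indexes, pd.concat(Xts, axis=1) is a common alternative; it performs an outer join on the index by default, so unmatched rows would be kept with NaN rather than dropped.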

Finally, I put everything together in a pipeline. It takes a DataFrame with 9 columns, learns from one of them, generates a new column from it, then combines everything and returns a DataFrame with 10 columns:

from sklearn.pipeline import Pipeline

synopsis_feat_gen_pipeline = Pipeline(steps=[
    ('engineer_data', DFFeatureUnion([
        # branch 1: pass all nine original columns through unchanged
        ('extract_all_columns', Pipeline(steps=[
            ('extract_all_features', FeatureExtractor(['Synopsis', 'Title', 'Author', 'Edition',
                                                       'Reviews', 'Ratings', 'Genre', 'BookCategory', 'Price'])),
        ], verbose=True)),
        # branch 2: select 'Synopsis' and turn it into a word-count column
        ('generate_num_words_column', Pipeline(steps=[
            ('extract_Synopsis_feature', FeatureExtractor(['Synopsis'])),
            ('generate_num_words', SynopsisNumWords()),
        ], verbose=True)),
    ])),
], verbose=True)
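An end-to-end sketch of the same pipeline shape, run on a hypothetical three-column toy frame rather than the real nine-column data. Several choices here are assumptions, not the answer's code: the classes inherit from BaseEstimator as well as TransformerMixin (mixin listed first, as the scikit-learn developer guide recommends), verbose logging is left off, and the two stateless transformers define __sklearn_is_fitted__, the scikit-learn hook for declaring an estimator fitted, because recent releases may warn when transform is called on a pipeline whose last step does not look fitted.

```python
from functools import reduce

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class FeatureExtractor(TransformerMixin, BaseEstimator):
    """Select a fixed list of DataFrame columns (stateless)."""

    def __init__(self, cols):
        self.cols = cols

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # explicit copy so downstream steps never write into a slice of X
        return X.loc[:, self.cols].copy()

    def __sklearn_is_fitted__(self):
        # stateless: tell scikit-learn there is nothing to fit
        return True


class SynopsisNumWords(TransformerMixin, BaseEstimator):
    """Word count of the 'Synopsis' column as a one-column DataFrame."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        counts = X.loc[:, "Synopsis"].apply(lambda x: len(str(x).split()))
        return counts.rename("Synopsis_num_words").to_frame()

    def __sklearn_is_fitted__(self):
        return True


class DFFeatureUnion(TransformerMixin, BaseEstimator):
    """FeatureUnion-like step that joins DataFrame outputs on their index."""

    def __init__(self, transformer_list):
        self.transformer_list = transformer_list

    def fit(self, X, y=None):
        for _, t in self.transformer_list:
            t.fit(X, y)
        return self

    def transform(self, X):
        parts = [t.transform(X) for _, t in self.transformer_list]
        return reduce(
            lambda left, right: pd.merge(left, right, left_index=True, right_index=True),
            parts,
        )


# Hypothetical toy data with three of the original nine columns
books = pd.DataFrame({
    "Synopsis": ["A tale of two cities", "Short"],
    "Title": ["Book A", "Book B"],
    "Price": [220.0, 150.0],
})

pipeline = Pipeline(steps=[
    ("engineer_data", DFFeatureUnion([
        ("extract_all_columns", Pipeline(steps=[
            ("extract_all_features", FeatureExtractor(["Synopsis", "Title", "Price"])),
        ])),
        ("generate_num_words_column", Pipeline(steps=[
            ("extract_Synopsis_feature", FeatureExtractor(["Synopsis"])),
            ("generate_num_words", SynopsisNumWords()),
        ])),
    ])),
])

result = pipeline.fit_transform(books)
print(result.columns.tolist())  # ['Synopsis', 'Title', 'Price', 'Synopsis_num_words']
print(result["Synopsis_num_words"].tolist())  # [5, 1]
```

As in the answer, the output has one more column than the input: the selected originals plus the generated word count.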
