Using .loc Inside Custom Transformer Produces Copy With Slice Error
EDIT: The question remains the same, but the code has changed. I am working on the Home Credit dataset on Kaggle, specifically on instalment_payment.csv. Following are my custom
Solution 1:
As I said in the comments, I first extract the features I need to learn from (in .fit) using:
from sklearn.base import TransformerMixin

class FeatureExtractor(TransformerMixin):
    def __init__(self, cols):
        self.cols = cols
        print(self.cols)

    def fit(self, X, y=None):
        # stateless transformer
        return self

    def transform(self, X):
        # assumes X is a pandas DataFrame
        X_cols = X.loc[:, self.cols]
        return X_cols
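As a rough illustration of what transform does here, the sketch below selects columns with .loc and then takes an explicit copy before modifying the result, which is one common way to avoid the copy-with-slice warning from the title. The toy DataFrame and its values are made up, and the .copy() call is an assumption about a possible fix, not part of the FeatureExtractor above.

```python
import pandas as pd

# Hypothetical toy frame; only the column names are borrowed from this answer.
df = pd.DataFrame({
    "Synopsis": ["A short tale", "Another much longer story here"],
    "Title": ["T1", "T2"],
    "Price": [10.0, 12.5],
})

cols = ["Synopsis", "Price"]

# Selecting with a list of labels yields a DataFrame holding only those columns.
# The explicit .copy() (an assumption, not in the original class) makes the
# result independent of df, so later writes are unambiguous.
selected = df.loc[:, cols].copy()

# Writing to the copy leaves the original frame unchanged.
selected.loc[:, "Price"] = selected["Price"] * 2

print(selected.columns.tolist())
print(df["Price"].tolist())
```

Whether the copy is worth its memory cost depends on the size of the data; for a stateless extractor that only reads columns, it mainly matters when a later step assigns into the returned frame.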
Then use the following class to derive a new feature from one of the columns in the data:
class SynopsisNumWords(TransformerMixin):
    def __init__(self):
        return None
        # self.text_array = text_array

    def fit(self, X, y=None, **fit_params):
        return self

    def transform(self, X, y=None, **fit_params):
        X = X.copy()
        # rename the series to not have the same column name as input
        return X.loc[:, 'Synopsis'].apply(lambda x: len(str(x).split())).rename('Synopsis_num_words').to_frame()
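To see what the word-count expression produces, here is a minimal sketch run on made-up data. Only the column names 'Synopsis' and 'Synopsis_num_words' come from the answer; the sample strings and the missing value are assumptions for illustration.

```python
import pandas as pd

# Made-up sample data, including one missing value.
df = pd.DataFrame(
    {"Synopsis": ["A short tale", "Another much longer story here", None]}
)

num_words = (
    df.loc[:, "Synopsis"]
    # str() converts a missing value to text, so it is counted as one word
    .apply(lambda x: len(str(x).split()))
    # rename so the new column does not clash with the input column
    .rename("Synopsis_num_words")
    .to_frame()
)

print(num_words)
```

Note that a missing synopsis is counted as one word here because of the str() call; if zero is preferred, the missing values would need to be handled before counting.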
Then combine all the features into a single DataFrame using this:
from functools import reduce

import pandas as pd

class DFFeatureUnion(TransformerMixin):
    # FeatureUnion but for pandas DataFrames
    def __init__(self, transformer_list):
        self.transformer_list = transformer_list

    def fit(self, X, y=None):
        for (name, t) in self.transformer_list:
            t.fit(X)
        return self

    def transform(self, X):
        # X must be a DataFrame
        Xts = [t.transform(X) for _, t in self.transformer_list]
        # merge the per-transformer outputs on their shared index
        Xunion = reduce(lambda X1, X2: pd.merge(X1, X2, left_index=True, right_index=True), Xts)
        return Xunion
Then combine everything into a pipeline like the one below. This pipeline takes a DataFrame of 9 columns, derives a new column from one of them, then merges everything and returns a DataFrame with 10 columns.
from sklearn.pipeline import Pipeline

synopsis_feat_gen_pipeline = Pipeline(
    steps=[
        ('engineer_data',
         DFFeatureUnion([
             ('extract_all_columns',
              Pipeline(steps=[
                  ('extract_all_features',
                   FeatureExtractor(['Synopsis', 'Title', 'Author', 'Edition',
                                     'Reviews', 'Ratings', 'Genre', 'BookCategory', 'Price'])
                  )
              ], verbose=True)
             ),
             ('generate_num_words_column',
              Pipeline(steps=[
                  ('extract_Synopsis_feature', FeatureExtractor(['Synopsis'])),
                  ('generate_num_words', SynopsisNumWords())
              ], verbose=True)
             ),
         ]))
    ],
    verbose=True)