
What Is _passthrough_scorer And How Can I Change Scorers In GridSearchCV (sklearn)?

http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html (for reference)

x = [[2], [1], [3], [1] ... ] # about 1000 data
grid = GridSearchCV(KernelDe

Solution 1:

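The default metric for KernelDensity is 'euclidean', which is the same as minkowski with p=2. If you do not assign a scoring method, GridSearchCV falls back to the estimator's own score method, and _passthrough_scorer is the small wrapper that makes this call. For KernelDensity, score returns the total log-likelihood of the data under the fitted density, and that density is computed with the chosen metric.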
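To make the default behaviour concrete, here is a minimal sketch (assuming a recent scikit-learn with sklearn.model_selection; the toy data, the fixed bandwidth and the KFold splitter are illustrative choices, not taken from the question). It runs a grid search with no scoring argument and then reproduces the reported score by calling KernelDensity.score on each held-out fold by hand.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KernelDensity

rng = np.random.RandomState(0)
X = rng.normal(size=(60, 1))  # toy, unlabeled data (an assumption for this sketch)

cv = KFold(n_splits=3)        # deterministic folds so both computations see the same splits
bandwidth = 0.5

# Grid search with no `scoring` argument: the estimator's own score() is used.
grid = GridSearchCV(KernelDensity(), {'bandwidth': [bandwidth]}, cv=cv)
grid.fit(X)

# Reproduce the same number by scoring each held-out fold manually.
manual_scores = []
for train_idx, test_idx in cv.split(X):
    kde = KernelDensity(bandwidth=bandwidth).fit(X[train_idx])
    manual_scores.append(kde.score(X[test_idx]))  # total log-likelihood of the fold

matches = np.isclose(grid.best_score_, np.mean(manual_scores))
print(matches)
```

If the two numbers agree, the score reported by the grid search is simply the mean held-out log-likelihood, which is also why the scores are negative.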

The formula for mean squared error is: sum((y_true - y_estimated)^2)/n. You got the error because this score cannot be computed without a y_true, and a density estimate fitted on X alone has none.
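As a small illustration of that formula, the sketch below computes it with NumPy. The function name and the sample values are made up for this example; the point is only that both y_true and y_estimated are required inputs.

```python
import numpy as np

def mean_squared_error(y_true, y_estimated):
    """sum((y_true - y_estimated)^2) / n, as in the formula above."""
    y_true = np.asarray(y_true, dtype=float)
    y_estimated = np.asarray(y_estimated, dtype=float)
    return np.sum((y_true - y_estimated) ** 2) / y_true.size

# Squared errors are 0, 0 and 4, so the result is 4 / 3.
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(mse)
```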

Here is a made-up example of applying GridSearchCV to KernelDensity:

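# Note: sklearn.grid_search and grid_scores_ were removed in scikit-learn 0.20;
# newer versions use sklearn.model_selection and cv_results_ instead.
from sklearn.neighbors import KernelDensity
from sklearn.grid_search import GridSearchCV
import numpy as np

# 100 integer samples in one feature column: 50 from [0, 10) and 50 from [5, 10)
X = np.concatenate((np.random.randint(0, 10, 50),
                    np.random.randint(5, 10, 50)))[:, np.newaxis]

# Try 10 bandwidths, log-spaced between 0.1 and 10
params = {'bandwidth': np.logspace(-1.0, 1.0, 10)}
grid = GridSearchCV(KernelDensity(), params)
grid.fit(X)  # no y is passed: KernelDensity is unsupervised
print(grid.grid_scores_)
print('Best parameter: ', grid.best_params_)
print('Best score: ', grid.best_score_)
print('Best estimator: ', grid.best_estimator_)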

and the output is:

[mean: -96.94890, std: 100.60046, params: {'bandwidth': 0.10000000000000001},
 mean: -70.44643, std: 40.44537, params: {'bandwidth': 0.16681005372000587},
 mean: -71.75293, std: 18.97729, params: {'bandwidth': 0.27825594022071243},
 mean: -77.83446, std: 11.24102, params: {'bandwidth': 0.46415888336127786},
 mean: -78.65182, std: 8.72507, params: {'bandwidth': 0.774263682681127},
 mean: -79.78828, std: 6.98582, params: {'bandwidth': 1.2915496650148841},
 mean: -81.65532, std: 4.77806, params: {'bandwidth': 2.1544346900318834},
 mean: -86.27481, std: 2.71635, params: {'bandwidth': 3.5938136638046259},
 mean: -95.86093, std: 1.84887, params: {'bandwidth': 5.9948425031894086},
 mean: -109.52306, std: 1.71232, params: {'bandwidth': 10.0}]
Best parameter:  {'bandwidth': 0.16681005372000587}
Best score:  -70.4464315885
Best estimator:  KernelDensity(algorithm='auto', atol=0, bandwidth=0.16681005372000587,
       breadth_first=True, kernel='gaussian', leaf_size=40, metric='euclidean',
       metric_params=None, rtol=0)
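The example above is written against the old sklearn.grid_search module and the grid_scores_ attribute, both of which were removed in scikit-learn 0.20. A rough equivalent for newer releases might look like the sketch below. The fixed random seed and the explicit cv=3 are additions for repeatability, so the numbers will not match the output shown above.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

rng = np.random.RandomState(0)  # fixed seed (an addition) so runs are repeatable
X = np.concatenate((rng.randint(0, 10, 50),
                    rng.randint(5, 10, 50)))[:, np.newaxis]

params = {'bandwidth': np.logspace(-1.0, 1.0, 10)}
grid = GridSearchCV(KernelDensity(), params, cv=3)
grid.fit(X)

# cv_results_ replaces grid_scores_: one entry per parameter setting.
results = grid.cv_results_
for mean, std, p in zip(results['mean_test_score'],
                        results['std_test_score'],
                        results['params']):
    print('mean: %.5f, std: %.5f, params: %r' % (mean, std, p))

print('Best parameter: ', grid.best_params_)
print('Best score: ', grid.best_score_)
print('Best estimator: ', grid.best_estimator_)
```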

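The built-in scoring methods for GridSearchCV usually need y_true. In your case, you can leave scoring unset so that grid search uses KernelDensity's own score (the log-likelihood of the held-out data). You may also want to change the metric of sklearn.neighbors.KernelDensity to another distance metric supported by its tree algorithm: the metric is not a scorer in itself, but it changes the fitted density and therefore the score that grid search compares.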
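One way around the y_true requirement is to pass your own callable as scoring. The sketch below assumes a recent scikit-learn, where scoring may be any callable with the signature scorer(estimator, X, y); the function mean_log_likelihood and the toy data are hypothetical and exist only for this illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

def mean_log_likelihood(estimator, X, y=None):
    """Hypothetical scorer: average log-density of the held-out samples.

    y is accepted for signature compatibility but never used.
    """
    return float(np.mean(estimator.score_samples(X)))

rng = np.random.RandomState(0)
X = rng.normal(size=(90, 1))  # toy, unlabeled data (an assumption for this sketch)

params = {'bandwidth': np.logspace(-1.0, 1.0, 10)}
grid = GridSearchCV(KernelDensity(), params, scoring=mean_log_likelihood, cv=3)
grid.fit(X)  # still no y: the scorer only looks at X

print('Best parameter: ', grid.best_params_)
print('Best score: ', grid.best_score_)
```

Unlike the default score, which sums log-densities over a fold, this scorer averages them, so the values stay comparable when folds differ in size. Higher is still better, which is what GridSearchCV expects from a scorer.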
