Cross_val_score Is Not Working With Roc_auc And Multiclass

What I want to do: I wish to compute a cross_val_score using roc_auc on a multiclass problem. What I tried to do: Here is a reproducible example made with the iris data set. from sklear

Solution 1:

An unnecessary annoyance with the cross-validation functionality of scikit-learn is that, by default, the data are not shuffled. It would arguably be a good idea to make shuffling the default choice. Of course, this would presuppose that cross_val_score had a shuffling argument in the first place, but unfortunately it does not (docs).

So, here is what is happening: the 150 samples of the iris dataset are sorted by class, in three contiguous blocks of 50:

iris.target[0:50]  # result:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0])

iris.target[50:100]  # result:
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1])

iris.target[100:150]  # result:
array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2])

Now, a 3-fold CV procedure with 150 samples ordered as shown above and an error message saying:

ValueError: Only one class present in y_true

should hopefully start making sense: in each of your 3 validation folds only one label is present, so no ROC calculation is possible (not to mention that each validation fold contains a label the model never saw in the corresponding training folds).
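To see this for yourself, here is a minimal sketch that lists the classes present in each fold. It assumes the splitter in play is a plain, unshuffled KFold, which is what scikit-learn falls back to when the target is not a simple binary or multiclass vector (for example a label-binarized target); the question's code is cut off, so that part is an assumption. The name fold_labels is only for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold

iris = load_iris()
y = iris.target

# Assumption: an unshuffled KFold, which cuts the data into contiguous blocks.
kf = KFold(n_splits=3)

fold_labels = []  # illustrative name: (train classes, validation classes) per fold
for train_idx, test_idx in kf.split(iris.data):
    train_classes = np.unique(y[train_idx]).tolist()
    test_classes = np.unique(y[test_idx]).tolist()
    fold_labels.append((train_classes, test_classes))
    print("train classes:", train_classes, "| validation classes:", test_classes)
```

Each validation fold ends up holding exactly one class, and that class is missing from the matching training folds.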

So, just shuffle your data beforehand:

from sklearn.utils import shuffle

# shuffle permutes X and y together, so rows and labels stay aligned
X_s, y_s = shuffle(X, y)
cross_val_score(model, X_s, y_s, cv=3, scoring="roc_auc")

and you should be fine.
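Another option, sketched here under some assumptions, is to leave the data alone and pass a shuffling splitter as the cv argument. The question's model is not shown, so LogisticRegression and the "roc_auc_ovr" scorer (a one-vs-rest AUC that accepts a plain integer target) are stand-ins and not what the question used; random_state=0 is only there for repeatability.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

# The splitter does the shuffling, so X and y can stay in their original order.
cv = KFold(n_splits=3, shuffle=True, random_state=0)

# How many distinct classes land in each validation fold.
classes_per_fold = [len(np.unique(y[test_idx])) for _, test_idx in cv.split(X)]
print("classes per validation fold:", classes_per_fold)

# Stand-in model and scorer (assumptions, see above).
model = LogisticRegression(max_iter=1000)
scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc_ovr")
print("AUC per fold:", scores)
```

This keeps the shuffling inside the cross-validation call, which helps if the same X and y are reused elsewhere in their original order.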
