Role Of Random_state In Train_test_split And Classifiers

Based on this answer: Random state (Pseudo-random number)in Scikit learn, if I use the same integer (say 42) as random_state, then each time it does train-test split, it should giv

Solution 1:

1: Because you are changing the test size, a fixed random state will not give you the same rows across different test sizes, and that would not necessarily be desirable anyway, since you are trying to get scores for various sample sizes. What it does give you is a way to compare models on input data split by the same random state. For a given test size, the test set is exactly the same from one run of the loop to the next, so you can properly compare model performance on the same samples.

2: Many models, decision tree classifiers included, make random choices while they are fitted. A decision tree, for example, randomly permutes the candidate features at each split. The random state ensures that those choices are exactly the same from one run to the next, which makes the behavior reproducible.

3: If the test size differs and you multiply it by 100 to get the random state, each test size gets its own random state. From one full run to the next, the behavior is still reproducible. You could just as easily set a static value there.
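
Points 1 and 3 can be checked with a rough sketch like the one below. The toy arrays, the two test sizes, and the helper name full_run are made up for illustration and do not come from the question's code. It repeats the whole loop twice with a static seed and compares the test sets it gets for each test size.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(40).reshape(20, 2)  # 20 toy rows, 2 features (made-up data)
y = np.arange(20)                 # toy labels, one per row

def full_run(seed=42):
    """One pass of the loop: the test labels obtained for each test size."""
    test_sets = {}
    for test_size in (0.25, 0.5):
        _, _, _, y_test = train_test_split(
            X, y, test_size=test_size, random_state=seed
        )
        test_sets[test_size] = y_test
    return test_sets

first_run = full_run()
second_run = full_run()

# For each test size, check that both runs selected exactly the same test rows.
same = {size: bool(np.array_equal(first_run[size], second_run[size]))
        for size in first_run}
print(same)
```

Replacing the static seed with a value derived from the test size should give the same kind of result, as long as the derivation itself is deterministic.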

Not all models use the random state in the same way, because each one sets different things at random. A RandomForest uses it to select random features, a neural network uses it to initialize random weights, and so on.
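
As a minimal sketch of that idea, the snippet below fits two fresh copies of three different model types with the same random_state and checks whether the copies agree. The synthetic dataset, the hyperparameters, and the helper name fits_identically are assumptions chosen only to keep the example small.

```python
import warnings

import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.exceptions import ConvergenceWarning
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data with its own fixed seed, so the input never changes.
X, y = make_classification(n_samples=150, n_features=6, random_state=0)

models = {
    # max_features=2: each split considers a random subset of the features.
    "decision tree": DecisionTreeClassifier(max_features=2, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=10, random_state=0),
    "neural network": MLPClassifier(hidden_layer_sizes=(5,), max_iter=50,
                                    random_state=0),
}

def fits_identically(model):
    """Fit two unfitted copies of `model` and compare predicted probabilities."""
    with warnings.catch_warnings():
        # max_iter is kept small for speed; ignore the convergence warning.
        warnings.simplefilter("ignore", ConvergenceWarning)
        first = clone(model).fit(X, y)
        second = clone(model).fit(X, y)
    return bool(np.allclose(first.predict_proba(X), second.predict_proba(X)))

results = {name: fits_identically(model) for name, model in models.items()}
print(results)
```

clone copies the constructor parameters, random_state included, so both copies start from the same seed even though each model spends its random numbers on different things.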

Solution 2:

You can check this with the code:

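import pandas as pd
from sklearn.model_selection import train_test_split

test_series = pd.Series(range(100))
# Same random_state, different test_size; each call returns [train, test].
size30split = train_test_split(test_series, random_state=42, test_size=.3)
size25split = train_test_split(test_series, random_state=42, test_size=.25)
# Compare by value: `in` on a pandas Series checks the index, not the values.
train30_values = set(size30split[0])
common = [element for element in size25split[0] if element in train30_values]
print(len(common))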

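This gives an output of 70: every one of the 70 training rows from the 30% split is also in the 75-row training set of the 25% split. Reducing the test size just moved elements from the test set to the training set.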

train_test_split creates a random permutation of the rows, and selects based on the first n rows of that permutation, where n is based on the test size.
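
The mechanism can be illustrated with a simplified pure-Python model. This is not scikit-learn's actual code. The function toy_split and its n_test argument (a row count rather than a fraction) are hypothetical, and its shuffle will not reproduce the same rows that train_test_split picks.

```python
import random

def toy_split(rows, n_test, seed):
    """Simplified model of a seeded split: shuffle once, cut off the first n rows."""
    permutation = list(rows)
    random.Random(seed).shuffle(permutation)  # same seed -> same permutation
    # The first n_test rows of the permutation form the test set.
    return permutation[n_test:], permutation[:n_test]  # (train, test)

rows = range(100)
train30, test30 = toy_split(rows, 30, seed=42)  # like test_size=.3 on 100 rows
train25, test25 = toy_split(rows, 25, seed=42)  # like test_size=.25 on 100 rows

# The smaller test set is the start of the larger one, so shrinking the
# test set only moves rows from test to train.
print(test30[:25] == test25)
print(len(set(train25) & set(train30)))
```

Because both calls shuffle with the same seed, they cut the same permutation at different points, which is why the training sets overlap completely.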

What does random_state do here?

When the DecisionTreeClassifier object named clf is created, it is initialized with its random_state attribute set to 0. If you type print(clf.random_state), the value 0 is printed. When you call methods of clf, such as clf.fit, those methods may read the random_state attribute to seed the random choices they make.
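
A short sketch of this, assuming clf was created with random_state=0 as in the question. The second classifier, default_clf, is only there for contrast and is not part of the original code.

```python
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(random_state=0)

# The constructor only stores the value; fit() is what later uses it.
print(clf.random_state)                  # the stored constructor argument
print(clf.get_params()["random_state"])  # the same value via get_params()

default_clf = DecisionTreeClassifier()
print(default_clf.random_state)          # None when no seed is passed
```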
