Splitting Data Into Training, Testing And Validation When Making A Keras Model
Solution 1:
Generally, at training time (model.fit), you work with two sets: one is the training set and the other is the validation (tuning/development) set. With the training set, you train the model, and with the validation set, you find the best set of hyperparameters. When you're done, you can then test your model on an unseen data set: a set that was completely hidden from the model, unlike the training and validation sets.
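One common way to get all three sets is to call train_test_split twice: once to hold out the test set, and once more to carve a validation set out of the remainder. A minimal sketch with scikit-learn, using made-up data and split ratios:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, 4 features (placeholder values)
features = np.random.rand(100, 4)
results = np.random.randint(0, 2, size=100)

# First, hold out the test set (hidden from all training decisions)
X_trainval, X_test, y_trainval, y_test = train_test_split(
    features, results, test_size=0.2, random_state=42)

# Then split the remainder into training and validation sets
# (0.25 of the remaining 80% gives a 60/20/20 overall split)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```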
Now, when you used
X_train, X_test, y_train, y_test = train_test_split(features, results, test_size=0.33)
By this, you split the features and results into 33% of the data for testing and 67% for training. Now, you can do two things:

1. use X_test and y_test as the validation set in model.fit(...), or
2. use them for the final prediction in model.predict(...)
So, if you choose these test sets as a validation set (number 1), you would do as follows:
model.fit(x=X_train, y=y_train,
          validation_data=(X_test, y_test), ...)
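A minimal runnable sketch of this pattern; the model architecture and data here are made up for illustration only:

```python
import numpy as np
from tensorflow import keras

# Toy regression data (placeholder values and shapes)
X_train = np.random.rand(80, 4)
y_train = np.random.rand(80)
X_test = np.random.rand(20, 4)
y_test = np.random.rand(20)

model = keras.Sequential([
    keras.Input(shape=(4,)),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# The test set doubles as the validation set here (number 1)
history = model.fit(X_train, y_train,
                    validation_data=(X_test, y_test),
                    epochs=2, batch_size=16, verbose=0)

# Validation metrics are logged per epoch alongside the training loss
print(sorted(history.history.keys()))  # ['loss', 'val_loss']
```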
In the training log, you will get the validation results along with the training score. The validation results should be the same if you later compute model.evaluate(X_test, y_test).
Now, if you choose the test set as a final prediction or final evaluation set (number 2), then you need to create a new validation set, or use the validation_split argument as follows:
model.fit(x=X_train, y=y_train,
          validation_split=0.2, ...)
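Note that, per the Keras documentation, validation_split holds out the samples from the end of the provided arrays, before any shuffling. A numpy sketch of the split arithmetic, with a made-up array size:

```python
import numpy as np

X_train = np.arange(100).reshape(100, 1)  # 100 toy samples

validation_split = 0.2
n_val = int(len(X_train) * validation_split)

# Keras holds out the *last* fraction of the data, before shuffling
X_fit, X_val = X_train[:-n_val], X_train[-n_val:]

print(len(X_fit), len(X_val))  # 80 20
```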
The Keras API will take 20% of the training data (X_train and y_train) and use it for validation. And lastly, for the final evaluation of your model, you can do as follows:
y_pred = model.predict(X_test, batch_size=50)
Now, you can compare y_test with y_pred using some relevant metrics.
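For example, with scikit-learn's metrics (the labels and predictions below are placeholders for a binary classification task):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

# Placeholder true labels and model predictions
y_test = np.array([0, 1, 1, 0, 1, 0, 1, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 1, 1])

print(accuracy_score(y_test, y_pred))  # 0.75
print(f1_score(y_test, y_pred))        # 0.8
```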
Solution 2:
Generally, you'd want to use the X_train and y_train data that you have already split as arguments to the fit method. It would look something like:
history = model.fit(X_train, y_train, batch_size=50)
Not splitting your data beforehand and instead passing the validation_split argument works as well; just be careful to refer to the Keras documentation on the validation_data and validation_split arguments to make sure the data is split up as you expect.
There is a related question here: https://datascience.stackexchange.com/questions/38955/how-does-the-validation-split-parameter-of-keras-fit-function-work
Keras documentation: https://keras.rstudio.com/reference/fit.html
Solution 3:
I have read on the internet that fitting the data into the model should look like this:
That means you need to fit features and labels. You already split them into x_train and y_train, so your fit should look like this:
history = model.fit(x_train, y_train, validation_split = 0.2, epochs = 10, batch_size=50)
So confusion starts when I need to evaluate the model:
score = model.evaluate(x_test, y_test, batch_size=50) --> Is this correct?
That's correct: you evaluate the model using the testing features and the corresponding labels. Furthermore, if you want to get only the predicted labels, for example, you can use:
y_hat = model.predict(x_test)
Then you can compare y_hat with y_test, e.g. compute a confusion matrix.
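For instance, with scikit-learn (the labels below are placeholders for a binary task):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Placeholder true labels and predicted labels
y_test = np.array([0, 1, 1, 0, 1, 0])
y_hat = np.array([0, 1, 0, 0, 1, 1])

cm = confusion_matrix(y_test, y_hat)
print(cm)
# [[2 1]
#  [1 2]]
```

Rows are the true classes, columns the predicted classes, so the diagonal counts the correct predictions.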