Finding the training and test error for a glm model

I have a data set with approx. 200 000 observations, 10 predictors, and a continuous target. I have divided the data into a training set and a test set (70%/30%). I want to compare a glm model against a random forest model by comparing their performance on the test set, so I would like to calculate both the training error and the test error for the glm model. So far I've tried 10-fold cross-validation on the training set:

library(caret) train.control 

The summary then reports RMSE = 0.2915827. Is this the training error? How do I get the test error from this?

asked Jun 11, 2020 at 8:19 by AnnieFrannie

1 Answer


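The RMSE reported for the model being trained is computed entirely from the training set, so it is a training-side error and not the test error. Think about it in terms of what that model object has seen: it has only been trained on the training set. One caveat: with 10-fold cross-validation the reported value is typically the RMSE averaged over the held-out folds, so it is a cross-validated estimate from the training data and not the pure in-sample error.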
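To make the difference between an in-sample RMSE and a cross-validated RMSE concrete, here is a minimal base R sketch. It runs on simulated data, and the names `train`, `x`, `y`, `rmse`, and `fold` are illustrative assumptions, not taken from the question. The manual fold loop only approximates what a resampling setup does.

```r
# Simulated stand-in for the training set (hypothetical columns x and y)
set.seed(42)
n <- 500
train <- data.frame(x = rnorm(n))
train$y <- 0.5 + 1.5 * train$x + rnorm(n, sd = 0.3)

# Helper (not from any package): root mean squared error
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))

# In-sample RMSE: fit and evaluate on the same rows
fit_all <- glm(y ~ x, data = train, family = gaussian())
inSampleRMSE <- rmse(fitted(fit_all), train$y)

# 10-fold CV RMSE: each fold is predicted by a model that never saw it
k <- 10
fold <- sample(rep(seq_len(k), length.out = n))
foldRMSE <- sapply(seq_len(k), function(i) {
  fit <- glm(y ~ x, data = train[fold != i, ], family = gaussian())
  rmse(predict(fit, newdata = train[fold == i, ]), train$y[fold == i])
})
cvRMSE <- mean(foldRMSE)

c(inSample = inSampleRMSE, cv = cvRMSE)
```

Both numbers use only the training data. For a simple model on a large data set they tend to be close, but neither one involves the held-out test set.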

To find a test error comparable to the training RMSE, use the predict function and basic math expressions:

# predict() takes the new observations through the newdata argument
Predictions = predict(model, newdata = test)
testRMSE = sqrt(mean((Predictions - test$y)^2))
testRMSE

Here test is your test set of observations and y is the column you are predicting.
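For a complete picture, the sketch below computes both the training and the test RMSE for a Gaussian glm with the same formula. It assumes simulated data and a plain `glm()` fit; the column names `x1`, `x2`, `y` and the helper `rmse` are hypothetical and stand in for your own data and model.

```r
# Simulated stand-in for the real data (hypothetical columns x1, x2, y)
set.seed(1)
n <- 1000
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- 1 + 2 * dat$x1 - dat$x2 + rnorm(n, sd = 0.3)

# 70%/30% split into training and test sets
idx <- sample(seq_len(n), size = round(0.7 * n))
train <- dat[idx, ]
test <- dat[-idx, ]

# Gaussian glm fitted on the training set only
model <- glm(y ~ x1 + x2, data = train, family = gaussian())

# Helper (not from any package): root mean squared error
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))

# Same formula, different rows: training rows vs. unseen test rows
trainRMSE <- rmse(predict(model, newdata = train), train$y)
testRMSE <- rmse(predict(model, newdata = test), test$y)

c(train = trainRMSE, test = testRMSE)
```

Computing the same `testRMSE` for the random forest on the same `test` rows gives a like-for-like comparison of the two models.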