I have a dataset with approximately 200,000 observations, 10 predictors, and a continuous target. I have split the data into a training set (70%) and a test set (30%). I want to compare a GLM model against a random forest model by comparing their performance on the test set, so I would like to calculate the training and test errors for the GLM. So far I've tried 10-fold cross-validation on the training set:
```r
library(caret)

# 10-fold cross-validation on the training set
train.control <- trainControl(method = "cv", number = 10)
model <- train(y ~ ., data = train, method = "glm", trControl = train.control)
```
Then in the summary it says RMSE = 0.2915827. Is this the training error? How do I get the test error from this?
The RMSE reported in the summary of the trained model is the training error. Think about it in terms of what that model object has seen: it has only ever been fit on the training set.
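If the model was fit with caret's `train()`, you can also read the cross-validated training RMSE directly off the fitted object rather than scanning the printed summary. A minimal sketch, assuming your fitted object is named `model`:

```r
# Resampling metrics stored on a caret::train object.
# 'model' is assumed to be the object returned by train() above.
model$results              # data frame with RMSE, Rsquared, MAE per tuning setting
mean(model$resample$RMSE)  # average RMSE across the 10 CV folds
```

Note this is the *cross-validated* estimate (averaged over held-out folds), which is usually a more honest training-side figure than the error of the final model on the data it was fit to.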
To get a test error comparable to the training RMSE, use the `predict()` function and compute the RMSE by hand:
```r
# Note: predict() for a caret model takes 'newdata', not 'data'
predictions <- predict(model, newdata = test)
testRMSE <- sqrt(mean((predictions - test$y)^2))
testRMSE
```
where `test` is your test-set data frame and `y` is the column you are predicting.
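Since the goal is to compare a GLM against a random forest, the same pattern extends to both models. A sketch, assuming data frames named `train` and `test` with target column `y` (the `rf` method also requires the `randomForest` package to be installed):

```r
library(caret)

set.seed(42)  # for reproducible CV folds
train.control <- trainControl(method = "cv", number = 10)

glm.fit <- train(y ~ ., data = train, method = "glm", trControl = train.control)
rf.fit  <- train(y ~ ., data = train, method = "rf",  trControl = train.control)

# Helper: RMSE of a fitted model on new data
rmse <- function(fit, newdata) {
  p <- predict(fit, newdata = newdata)
  sqrt(mean((p - newdata$y)^2))
}

c(glm = rmse(glm.fit, test), rf = rmse(rf.fit, test))
```

Comparing the two test RMSEs on the same held-out 30% gives a fair head-to-head, since neither model saw that data during fitting or cross-validation.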