So, what is model selection all about? Model selection in the context of machine learning can have different meanings, corresponding to different levels of abstraction.
For one thing, we might be interested in selecting the best hyperparameters for a given machine learning method. Hyperparameters are the parameters of the learning method itself, which we have to specify a priori, i.e., before model fitting. In contrast, model parameters are parameters that arise as a result of the fit [1]. In a logistic regression model, for example, the regularization strength and the regularization type (if any) are hyperparameters that have to be specified prior to fitting, while the coefficients of the fitted model are model parameters. Finding the right hyperparameters for a model can be crucial for the model's performance on the given data.
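To make the distinction concrete, here is a minimal sketch using scikit-learn. The library choice, the synthetic dataset, and the candidate values for `C` are assumptions made for illustration only. In scikit-learn's `LogisticRegression`, `C` is the inverse of the regularization strength, and the default regularization type is L2.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic toy data (an assumption; any labeled dataset would do).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Hyperparameter: chosen by us BEFORE fitting.
# C is the inverse regularization strength (smaller C = stronger regularization).
clf = LogisticRegression(C=0.5, max_iter=1000)

# Model parameters: produced BY the fit.
clf.fit(X, y)
coefficients = clf.coef_      # one row of learned weights, one weight per feature
intercept = clf.intercept_    # learned bias term

# Hyperparameter selection: try several illustrative values of C and keep
# the one with the best 5-fold cross-validated accuracy.
candidate_C = [0.01, 0.1, 1.0, 10.0]
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": candidate_C},
    cv=5,
)
search.fit(X, y)
best_C = search.best_params_["C"]
```

Here `C` is fixed when the estimator is constructed, whereas `coef_` and `intercept_` only exist after `fit` has been called. The grid search is just one simple way to pick a hyperparameter value; the grid itself is arbitrary.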
For another thing, we might want to select the best learning method (and its corresponding “optimal” hyperparameters) from a set of eligible machine learning methods. In the following, we will refer to this as algorithm selection. With a classification problem at hand, we might wonder, for instance, whether a logistic regression model or a random forest classifier yields the better classification performance on the given task.
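Algorithm selection in this sense can be sketched as follows, again assuming scikit-learn and a synthetic dataset. For brevity, both candidates are compared with fixed, mostly default hyperparameters and 5-fold cross-validated accuracy; these are simplifying assumptions, and a fuller comparison would also tune the hyperparameters of each candidate.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic toy classification task (an assumption for illustration).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# The set of eligible learning methods, with fixed hyperparameters.
candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Estimate each method's performance by mean 5-fold cross-validated accuracy.
mean_scores = {
    name: cross_val_score(estimator, X, y, cv=5).mean()
    for name, estimator in candidates.items()
}

# Pick the method with the highest estimated accuracy.
best_name = max(mean_scores, key=mean_scores.get)
print(mean_scores, "->", best_name)
```

Which candidate wins depends entirely on the data and on the hyperparameters chosen for each method, so the result on this toy dataset says nothing general about either algorithm.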