Why (and how) to create a baseline model before training the final model
So you’ve collected data. You’ve outlined your business case, decided on a candidate model (e.g. Random Forest), set up your development environment, and got your hands on the keyboard. You are ready to build and train your time series model.
Wait. Don’t start yet. Before training and testing your Random Forest model, you need to take one step first: train a baseline model.
A baseline model is a simple model used to create a benchmark, or reference point, against which you build a more complex final machine learning model.
Data scientists create baseline models for the following reasons:
- A baseline model can give you a good idea of how well a more complex model is likely to perform.
- If your baseline model is performing poorly, this may be a sign of data quality issues that need to be addressed.
- If the baseline model performs better than the final model, it may indicate problems with the algorithm, features, hyperparameters, or data preprocessing.
- If the performance of the baseline model and the complex model is more or less the same, it may mean that the complex model needs more fine-tuning of features, architecture, or hyperparameters. It may also show that a more complex model is not needed and that a simpler model is sufficient.
Typically, a baseline model is either a statistical model, such as a moving average model, or a simpler version of the target model. For example, if you are training a Random Forest model, you can first train a decision tree model and compare the Random Forest against it.
For time series data, there are a few popular options for baseline models that I would like to share with you. They work well because they respect the temporal ordering of the data and make predictions based on patterns in past values.
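As a rough illustration of a statistical baseline, here is a minimal moving average sketch in plain Python. The function names, the window size of 3, and the sample values are all made up for this example, and mean absolute error is just one possible metric to compare against later.

```python
from statistics import mean


def moving_average_forecast(series, window=3):
    """Predict each value as the mean of the `window` values before it."""
    if window < 1 or window >= len(series):
        raise ValueError("window must be at least 1 and shorter than the series")
    return [mean(series[i - window:i]) for i in range(window, len(series))]


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


# Hypothetical data, for illustration only.
series = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]
window = 3

ma_preds = moving_average_forecast(series, window)
# The first `window` points have no prediction, so score only the rest.
ma_mae = mean_absolute_error(series[window:], ma_preds)
print(ma_preds, ma_mae)
```

The error this produces becomes the number your more complex model has to beat on the same data.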
Naive prediction
The naive prediction is the simplest baseline: it assumes that the next value will be the same as the previous value.
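A minimal sketch of the naive prediction in plain Python might look like the following. The function name and the sample values are hypothetical, and mean absolute error is assumed here as the evaluation metric.

```python
def naive_forecast(series):
    """Predict each value as the value immediately before it."""
    # The prediction for position t is the observation at t - 1,
    # so the first observation has no prediction.
    return list(series[:-1])


# Hypothetical data, for illustration only.
observations = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]

naive_preds = naive_forecast(observations)
actual = observations[1:]  # values that have a prediction

# Mean absolute error of the naive baseline.
naive_mae = sum(abs(a - p) for a, p in zip(actual, naive_preds)) / len(actual)
print(naive_preds, naive_mae)
```

If your final model cannot beat this error on the same data, that is a strong hint to revisit the features or the data before tuning further.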