Introducing backtesting for time series using the Skforecast library
Time series forecasting consists of making predictions based on historical data to drive future strategic decision-making in a wide range of applications.
When evaluating a model, we split our data into a training set and a test set: the training set is used to fit the model and determine the optimal hyperparameters, while the test set is used to evaluate it. For a more robust evaluation of model performance, it is common to use cross-validation, a resampling method that trains and tests a model on different subsets of the data over several iterations.
However, straightforward cross-validation cannot be applied to time series data, because it ignores the temporal dependence between observations. This article therefore presents the methods designed to evaluate time series models, known as backtesting.
Backtesting is a term used in modeling that refers to assessing a model using existing historical data. It involves selecting several training and test sets, stepping forward in time. The main idea behind backtesting is similar to that behind cross-validation, except that backtesting respects the temporal order of the data. This method enables us to (1) assess and visualize how the model error evolves over time and (2) estimate the variance of the model error.
In production, it is common practice to first determine the optimal parameters using a backtesting method and then retrain the model with the available data. However, this retraining does not necessarily need to use all the available data, nor happen every time new data arrives. Depending on our strategy, we can select a different backtesting method.
1. Backtesting with refit and increasing training size
The model is trained on a sequentially increasing training set, always keeping a fixed origin and using all the data available. In this method, the origin is fixed and the size of the training set grows at each iteration, as displayed in Figure 1.
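To make the splitting logic concrete, here is a minimal pure-Python sketch of this strategy. The helper `expanding_window_splits` is hypothetical, written for illustration only; it is not part of skforecast:

```python
def expanding_window_splits(n_samples, initial_train_size, steps):
    """Yield (train_indices, test_indices) pairs with a fixed origin:
    the training window grows by `steps` observations at each iteration."""
    train_end = initial_train_size
    while train_end < n_samples:
        test_end = min(train_end + steps, n_samples)
        yield list(range(train_end)), list(range(train_end, test_end))
        train_end = test_end

# With 10 observations, an initial training size of 4, and 2-step forecasts,
# the first split trains on indices 0-3 and tests on indices 4-5; each later
# split folds the previous test window into the training set.
splits = list(expanding_window_splits(10, 4, 2))
```

Note that the last test window is truncated when the series length is not a multiple of the forecasting horizon.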
2. Backtesting with refit and fixed training size
This method is similar to the previous one except that the origin of the forecast rolls forward. The size of the training set therefore remains constant, as displayed in Figure 2. This method can be considered a time series analogue of cross-validation techniques.
Compared to the previous method, this one is less expensive, as the size of the training set stays the same for each iteration. It also allows for a distinct error distribution per lead time and desensitizes the error measures to special events at any single origin [2]. This approach is interesting, for example, when the historical data contains events or “abnormal” periods, such as COVID.
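A minimal pure-Python sketch of this rolling-window logic follows. The helper name and signature are hypothetical, not skforecast's API; the point is only that the training window keeps a constant length while its origin moves forward:

```python
def rolling_window_splits(n_samples, train_size, steps):
    """Yield (train_indices, test_indices) pairs with a rolling origin:
    the training window keeps a constant length and slides forward by
    `steps` observations at each iteration."""
    train_end = train_size
    while train_end < n_samples:
        test_end = min(train_end + steps, n_samples)
        yield (list(range(train_end - train_size, train_end)),
               list(range(train_end, test_end)))
        train_end = test_end

# With 10 observations, a training window of 4, and 2-step forecasts,
# every training window has length 4: the oldest observations are
# dropped as new ones are folded in.
splits = list(rolling_window_splits(10, 4, 2))
```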
3. Backtesting without refit
The last backtesting approach consists of training the model with an initial training set and assessing it sequentially without updating it. This strategy has the advantage of being much faster since the model is trained only once. However, the model does not incorporate the latest data available, so it may lose predictive capacity over time.
This approach is interesting if it is necessary to make predictions with a high frequency on new data coming into the system.
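A minimal pure-Python sketch of this fixed-training-set strategy is shown below (a hypothetical helper, not skforecast code). Only the split indices are illustrated; when actually forecasting without refit, the once-fitted model would typically still receive the most recent observed values as lag inputs:

```python
def no_refit_splits(n_samples, initial_train_size, steps):
    """Yield (train_indices, test_indices) pairs where the model is fitted
    once on the initial window: the training indices never change, while
    the test window advances by `steps` observations at each iteration."""
    train = list(range(initial_train_size))
    test_start = initial_train_size
    while test_start < n_samples:
        test_end = min(test_start + steps, n_samples)
        yield train, list(range(test_start, test_end))
        test_start = test_end

# With 10 observations, an initial training size of 4, and 2-step
# forecasts, every split reuses the same training indices 0-3.
splits = list(no_refit_splits(10, 4, 2))
```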
Here is the implementation of backtesting using the Skforecast library. Skforecast is a Python library that eases using scikit-learn regressors as multi-step forecasters. It also works with any regressor compatible with the scikit-learn API (pipelines, CatBoost, LightGBM, XGBoost, Ranger…).
For testing purposes, we have used the publicly available h2o dataset (released on GitHub under the MIT license), which contains monthly data from 1991-07-01 to 2008-06-01.
Below are the three described backtesting methods, using a random forest regressor as the autoregressive model.
Looking at the implementation, the difference between the backtesting methods lies in the following parameters:
- initial_train_size: Number of samples in the initial train split.
- fixed_train_size: If True, train size doesn’t increase but moves by ‘steps’ in each iteration.
- refit: Whether to re-fit the forecaster in each iteration.
- steps: Number of steps to predict.
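To make the interplay of these parameters concrete, the sketch below reproduces the window arithmetic of the three strategies in plain Python. Only the parameter names mirror skforecast's; the function itself is a hypothetical illustration, not the library's implementation:

```python
def backtesting_splits(n_samples, initial_train_size, steps,
                       fixed_train_size=False, refit=True):
    """Yield (train_indices, test_indices) pairs for the three backtesting
    strategies. A sketch of the window arithmetic only."""
    train_end = initial_train_size
    while train_end < n_samples:
        if not refit:
            # Backtesting without refit: fitted once on the initial window.
            train = range(initial_train_size)
        elif fixed_train_size:
            # Refit with fixed training size: rolling origin.
            train = range(train_end - initial_train_size, train_end)
        else:
            # Refit with increasing training size: fixed origin.
            train = range(train_end)
        test_end = min(train_end + steps, n_samples)
        yield list(train), list(range(train_end, test_end))
        train_end = test_end
```

The test windows are identical across the three strategies; only the training windows differ.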
1. Backtesting with refit and increasing training size
The model is first trained on the data up to 2002-01-01; then, at each iteration, ten new observations are added to the training set. This process is repeated until the entire series has been covered.
To configure this method, the fixed_train_size and refit parameters are set to False and True, respectively.
As observed, the training set grows over time while the size of the test set remains constant.
2. Backtesting with refit and fixed training size
As in backtesting with refit and increasing training size, the model is first trained on the data up to 2002-01-01, and ten new observations are incorporated at each iteration. In this method, however, the training size stays constant over time: the oldest observations are dropped as new ones are added, so both the training and test windows keep a fixed size.
To configure this method, both the fixed_train_size and refit parameters are set to True.
3. Backtesting without refit
As in backtesting with refit and increasing training size, the model is first trained on the data up to 2002-01-01. However, the training set does not change over time, whereas the test window moves forward ten steps at each iteration.
To configure this method, both the fixed_train_size and refit parameters are set to False.