Machine Learning News Hubb

Model Evaluation in Time Series Forecasting | by Javier Fernandez | Mar, 2023

March 7, 2023


Introducing backtesting for time series using the Skforecast library

Photo by Lukas on Pexels

Time series forecasting consists of making predictions from historical, time-indexed data to inform strategic decisions in a wide range of applications.

When evaluating a model, we split our data into a training set and a test set. The training set is used to train the model and determine the optimal hyperparameters, while the test set is used to evaluate it. To obtain a more robust estimate of model performance, it is common to use cross-validation, a resampling method that trains and tests the model on different subsets of the data over several iterations.

However, standard cross-validation cannot be applied directly to time series data because it ignores the temporal dependence between observations: a randomly drawn fold could train the model on data that comes after the data it is tested on. This article therefore presents the methods used to evaluate time series models, known as backtesting.

Backtesting is a term used in modeling that refers to assessing a model on existing historical data. It involves selecting several training and test sets while moving step by step forward in time. The main idea behind backtesting is similar to the one behind cross-validation, except that backtesting respects the temporal order of the data. This method enables us to (1) assess and visualize how the model error develops over time and (2) estimate the variance of the model error.
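As a rough illustration of the two outputs mentioned above, the following minimal sketch walks forward through a toy series and records one error per fold. The function name `backtest_naive`, the toy data, and the last-value forecast are assumptions made for illustration only; they are not part of Skforecast.

```python
from statistics import mean, pvariance


def backtest_naive(series, initial_train_size, steps):
    """Return one mean absolute error per fold for a last-value forecast."""
    fold_errors = []
    for start in range(initial_train_size, len(series), steps):
        history = series[:start]               # everything observed before this fold
        actual = series[start:start + steps]   # the next `steps` observations
        forecast = history[-1]                 # naive forecast: repeat the last value
        fold_errors.append(mean(abs(a - forecast) for a in actual))
    return fold_errors


# Toy data, invented for this example.
series = [10, 12, 11, 13, 12, 14, 13, 15, 14, 16, 15, 17, 16, 18, 17, 19]
errors = backtest_naive(series, initial_train_size=10, steps=3)

print("error per fold:", errors)             # (1) how the error develops over time
print("mean error:", mean(errors))
print("error variance:", pvariance(errors))  # (2) spread of the error across folds
```

The list of per-fold errors can be plotted against time, and its variance gives a sense of how stable the model is from one fold to the next.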

In production, it is common practice to first determine the optimal parameters using a backtesting method and then retrain the model with the available data. However, this retraining does not necessarily need to use all the available data, nor does it need to happen every time new data arrives. Depending on our retraining strategy, we can select a different backtesting method.

1. Backtesting with refit and increasing training size

The model is retrained at each iteration on a training set that grows sequentially. The origin of the training set is fixed and all the data available up to that iteration is used, so the training set gets larger with each iteration, as displayed in Figure 1.

Fig. 1. Time series backtesting diagram with an initial training size of ten observations, a prediction horizon of 3 steps, and retraining at each iteration. Ref: Skforecast [1].
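The splits described by Figure 1 can be sketched in a few lines of plain Python. The helper `expanding_window_folds` and the series length of 19 are assumptions chosen for illustration; only the initial training size of ten and the horizon of 3 come from the figure caption.

```python
def expanding_window_folds(n_obs, initial_train_size, steps):
    """Yield (train_indices, test_indices) with the training origin fixed at 0."""
    for test_start in range(initial_train_size, n_obs, steps):
        test_end = min(test_start + steps, n_obs)  # the last fold may be shorter
        yield range(0, test_start), range(test_start, test_end)


folds = list(expanding_window_folds(n_obs=19, initial_train_size=10, steps=3))
for train, test in folds:
    print(f"train on [0, {train.stop}) -> predict {list(test)}")
```

Each fold trains on every observation before the test window, so the training sets here contain 10, 13, and 16 observations.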

2. Backtesting with refit and fixed training size

This method is similar to the previous one, except that the origin of the training set rolls forward with each iteration. The size of the training set therefore remains constant, as displayed in Figure 2. This method can be considered the time series analogue of cross-validation techniques.

Compared to the previous method, this one is computationally cheaper because the size of the training set stays the same at each iteration. It also allows for distinct error distributions by lead time and desensitizes the error measures to special events at any single origin [2]. This approach is useful, for example, when the historical data contains unusual events or “abnormal” periods, such as COVID.

Fig. 2. Time series backtesting diagram with an initial training size of ten observations, a prediction horizon of 3 steps, and a training set of constant size. Ref: Skforecast [1].
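A minimal sketch of the rolling window in Figure 2 follows. The helper `rolling_window_folds` and the series length of 19 are assumptions for illustration; the training size of ten and the horizon of 3 come from the figure caption.

```python
def rolling_window_folds(n_obs, train_size, steps):
    """Yield (train_indices, test_indices) with a training window of constant size."""
    for test_start in range(train_size, n_obs, steps):
        test_end = min(test_start + steps, n_obs)  # the last fold may be shorter
        train = range(test_start - train_size, test_start)  # origin moves with the fold
        yield train, range(test_start, test_end)


folds = list(rolling_window_folds(n_obs=19, train_size=10, steps=3))
for train, test in folds:
    print(f"train on [{train.start}, {train.stop}) -> predict {list(test)}")

total_training_rows = sum(len(train) for train, _ in folds)
print("observations fitted in total:", total_training_rows)
```

With these settings every fit sees ten observations, 30 in total. With a fixed origin, the same three folds would be fitted on 10, 13, and 16 observations, which is where the cost saving comes from.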

3. Backtesting without refit

The last backtesting approach consists of training the model with an initial training set and assessing it sequentially without updating it. This strategy has the advantage of being much faster since the model is trained only once. However, the model does not incorporate the latest data available, so it may lose predictive capacity over time.

This approach is useful when predictions must be made at high frequency on new data arriving in the system.

Fig. 3. Time series backtesting diagram with an initial training size of ten observations, a prediction horizon of 3 steps, and no retraining at each iteration. Ref: Skforecast [1].
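The trade-off described above can be seen in a small sketch. `MeanForecaster` is a toy stand-in invented for this example (it simply predicts the training mean), and `backtest_without_refit` is a hypothetical helper, not a Skforecast function. A real autoregressive forecaster would typically still use the most recent observations as lag inputs at prediction time, which this toy model does not do.

```python
class MeanForecaster:
    """Toy model: predicts the mean of its training data for every step."""

    def __init__(self):
        self.n_fits = 0
        self.level = None

    def fit(self, y):
        self.n_fits += 1
        self.level = sum(y) / len(y)

    def predict(self, steps):
        return [self.level] * steps


def backtest_without_refit(model, series, initial_train_size, steps):
    """Fit once on the initial window, then score every later fold."""
    model.fit(series[:initial_train_size])  # the only training call
    fold_mae = []
    for start in range(initial_train_size, len(series), steps):
        actual = series[start:start + steps]
        predicted = model.predict(len(actual))
        errors = [abs(a - p) for a, p in zip(actual, predicted)]
        fold_mae.append(sum(errors) / len(errors))
    return fold_mae


model = MeanForecaster()
series = list(range(1, 20))  # toy upward trend: 1, 2, ..., 19
fold_mae = backtest_without_refit(model, series, initial_train_size=10, steps=3)
print("fits:", model.n_fits)
print("MAE per fold:", fold_mae)
```

On this trending toy series the model is fitted a single time, and the error grows from one fold to the next because the model never sees the newer data.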

Here is the implementation of backtesting using the Skforecast library. Skforecast is a Python library that makes it easier to use scikit-learn regressors as multi-step forecasters. It also works with any regressor compatible with the scikit-learn API (pipelines, CatBoost, LightGBM, XGBoost, Ranger…).

For testing purposes, we have used the publicly available h2o dataset under the GitHub MIT license, which contains monthly data from 1991-07-01 to 2008-06-01.

Fig. 4. Visualization of the dataset, where blue data is used for training and orange for testing. Ref: Image by author.

Below are the three backtesting methods described above, each using a random forest regressor as an autoregressive forecaster.

In the implementation, the difference between the backtesting methods lies in the following parameters:

  • initial_train_size: Number of samples in the initial train split.
  • fixed_train_size: If True, train size doesn’t increase but moves by ‘steps’ in each iteration.
  • refit: Whether to re-fit the forecaster in each iteration.
  • steps: Number of steps to predict.
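To make the interaction between these parameters concrete, here is a minimal sketch that maps them to a strategy name and to the training window used on a given fold. The functions `describe_strategy` and `train_window` are hypothetical helpers written for this illustration; they reuse the parameter names from the list above but are not part of Skforecast. The sketch also assumes that `fixed_train_size` has no effect when `refit` is False, since the model is then trained only once.

```python
def describe_strategy(refit, fixed_train_size):
    """Name the backtesting method selected by the two boolean parameters."""
    if not refit:
        return "without refit"  # fixed_train_size is irrelevant: one fit only
    if fixed_train_size:
        return "refit with fixed training size"
    return "refit with increasing training size"


def train_window(fold, initial_train_size, steps, refit, fixed_train_size):
    """Half-open (start, end) index range used for training on a 0-based fold."""
    if not refit:
        return (0, initial_train_size)  # always the initial training set
    end = initial_train_size + fold * steps
    start = fold * steps if fixed_train_size else 0
    return (start, end)


for refit, fixed in [(True, False), (True, True), (False, False)]:
    window = train_window(2, initial_train_size=10, steps=3,
                          refit=refit, fixed_train_size=fixed)
    print(f"{describe_strategy(refit, fixed)}: third fold trains on {window}")
```

For the third fold with an initial training size of 10 and 3 steps, the three methods train on indices [0, 16), [6, 16), and [0, 10) respectively.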

1. Backtesting with refit and increasing training size

The model is first trained on the series up to 2002-01-01. At each iteration, ten new observations are then added to the training set. This process is repeated until the entire series has been covered.

To configure this method, the fixed_train_size and refit parameters are set to False and True, respectively.

As a result, the training set grows over time while the size of the test set remains constant.
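The fold arithmetic for this setup can be worked out from the dates alone. The sketch below assumes that both endpoints of the monthly series are included, that 2002-01-01 itself belongs to the initial training set, and that a final, shorter fold is kept; none of these details is stated explicitly in the text, so treat the numbers as indicative.

```python
from datetime import date
from math import ceil


def months_inclusive(first, last):
    """Number of monthly observations from `first` to `last`, both included."""
    return (last.year - first.year) * 12 + (last.month - first.month) + 1


series_start, series_end = date(1991, 7, 1), date(2008, 6, 1)
end_train = date(2002, 1, 1)
steps = 10

n_obs = months_inclusive(series_start, series_end)
initial_train_size = months_inclusive(series_start, end_train)
n_test = n_obs - initial_train_size
n_folds = ceil(n_test / steps)
last_fold_size = n_test - (n_folds - 1) * steps

# Training size at each fold when the training set keeps growing.
train_sizes = [initial_train_size + k * steps for k in range(n_folds)]

print(n_obs, initial_train_size, n_test, n_folds, last_fold_size)
print(train_sizes)
```

Under these assumptions the series has 204 observations, 127 of which form the initial training set, leaving 77 observations that are covered by eight folds (seven of ten steps and a last one of seven).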

2. Backtesting with refit and fixed training size

As in backtesting with refit and increasing training size, the model is first trained on the series up to 2002-01-01. At each iteration, the training window then moves forward by ten observations, taking in the ten newest and dropping the ten oldest. The training set therefore keeps a constant size over time, as does the test set.

To configure this method, both the fixed_train_size and refit parameters are set to True.

3. Backtesting without refit

As in backtesting with refit and increasing training size, the model is first trained on the series up to 2002-01-01. However, the training set does not change over time, whereas the test set moves forward ten steps at each iteration.

To configure this method, both the fixed_train_size and refit parameters are set to False.


