Deterministic trends vs stochastic trends, and how to deal with them
Detecting and dealing with the trend is a key step in the modeling of time series.
In this article, we’ll:
- Describe what the trend of a time series is, and its different characteristics;
- Explore how to detect it;
- Discuss ways of dealing with it.
Trend as a building block of time series
At any given time, a time series can be decomposed into three parts: trend, seasonality, and the remainder.
The trend represents the long-term change in the level of a time series. This change can be either upward (increase in level) or downward (decrease in level). If the change is systematic in one direction, then the trend is monotonic.
Trend as a cause of non-stationarity
A time series is stationary if its statistical properties do not change over time. This includes the level of the time series, which is constant under stationary conditions.
So, when a time series exhibits a trend, the stationarity assumption is not met. Modeling non-stationary time series is challenging. If untreated, statistical tests and forecasts can be misleading. This is why it’s important to detect and deal with the trend before modeling time series.
A proper characterization of the trend affects modeling decisions. This, further down the line, impacts forecasting performance.
A trend can be either deterministic or stochastic.
Deterministic trends can be modeled with a well-defined mathematical function. This means that the long-term behavior of the time series is predictable. Any deviation from the trend line is only temporary.
In most cases, deterministic trends are linear and can be written as:

y_t = a + b·t + ε_t

where a is the intercept, b is the slope of the trend, and ε_t is a stationary error term. But trends can also follow an exponential or polynomial form.
In economics, there are several examples of time series that increase exponentially, such as GDP.
A time series with a deterministic trend is called trend-stationary. This means the series becomes stationary after removing the trend component.
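One way to see this in code is to fit a straight line with np.polyfit and subtract it; for a trend-stationary series, what remains is stationary noise (a sketch on a synthetic series with made-up coefficients):

```python
import numpy as np

# synthetic trend-stationary series: linear trend + stationary noise
rng = np.random.default_rng(1)
t = np.arange(200)
y = 2.0 + 0.3 * t + rng.normal(scale=1.0, size=200)

# estimate the trend line and remove it
slope, intercept = np.polyfit(t, y, deg=1)
detrended = y - (intercept + slope * t)
```

The detrended residual fluctuates around zero with roughly constant variance, which is exactly what trend-stationarity means.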
Linear trends can also be modeled by including time as an explanatory variable. Here’s an example of how you could do this:
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# load the annual US GDP series
series = pd.read_csv('data/gdp-countries.csv')['United States']
series.index = pd.date_range(start='12/31/1959', periods=len(series), freq='Y')

# log-transform to linearize the exponential growth
log_gdp = np.log(series)

# time index used as an explanatory variable for the linear trend
linear_trend = np.arange(1, len(log_gdp) + 1)

model = ARIMA(endog=log_gdp, order=(1, 0, 0), exog=linear_trend)
result = model.fit()
A stochastic trend can change randomly, which makes its behavior difficult to predict.
A random walk is an example of a time series with a stochastic trend:
rw = np.cumsum(np.random.choice([-1, 1], size=1000))  # cumulative sum of i.i.d. ±1 steps
Stochastic trends are related to unit roots, integration, and differencing.
Time series with stochastic trends are referred to as difference-stationary. This means that the time series can be made stationary by differencing operations. Differencing means taking the difference between consecutive values.
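For instance, differencing a random walk recovers the independent steps that generated it, which are stationary (a small sketch):

```python
import numpy as np

rng = np.random.default_rng(2)
steps = rng.choice([-1, 1], size=1000)  # i.i.d. steps: stationary
rw = np.cumsum(steps)                   # random walk: stochastic trend

# the first difference undoes the cumulative sum
diffed = np.diff(rw)
```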
Difference-stationary time series are also called integrated. For example, ARIMA (Auto-Regressive Integrated Moving Average) models contain a specific term (I) for integrated time series. This term involves applying differencing steps until the series becomes stationary.
Finally, difference-stationary or integrated time series are characterized by unit roots. Without going into mathematical details, a unit root is a characteristic of non-stationary time series.
Deterministic and stochastic trends have different implications for forecasting.
Deterministic trend models assume a constant variance around the trend line over time. In the case of a linear trend, this implies that the slope will not change. But real-world time series show complex dynamics, with the trend changing over long periods. So, long-term forecasting with deterministic trend models can lead to poor performance. The assumption of constant variance leads to narrow forecasting intervals that underestimate uncertainty.
Stochastic trends are assumed to change over time. As a result, the variance of the time series increases with the forecasting horizon. This makes stochastic trend models better suited for long-term forecasting because they provide more reasonable uncertainty estimates.
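This variance growth can be checked by simulation. For a pure random walk with unit steps, the variance at horizon h grows in proportion to h (a sketch with arbitrary simulation sizes):

```python
import numpy as np

# 5000 independent random walks, 100 steps each
rng = np.random.default_rng(3)
walks = np.cumsum(rng.choice([-1, 1], size=(5000, 100)), axis=1)

# cross-sectional variance at each horizon: grows roughly linearly
var_by_horizon = walks.var(axis=0)
```

At horizon 100 the variance is close to 100, so prediction intervals widen with the square root of the horizon.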
Stochastic trends can be detected using unit root tests, such as the augmented Dickey-Fuller (ADF) test or the KPSS test.
Augmented Dickey-Fuller (ADF) test
The ADF test checks whether an auto-regressive model contains a unit root. The hypotheses of the test are:
- Null hypothesis: There is a unit root (the time series is not stationary);
- Alternative hypothesis: There is no unit root (the time series is stationary).
This test is available in statsmodels:
from statsmodels.tsa.stattools import adfuller
# adfuller returns a tuple; the p-value is its second element
pvalue_adf = adfuller(x=log_gdp, regression='ct')[1]
The parameter regression='ct' is used to include a constant term and a linear deterministic trend in the test. As you can check in the documentation, this parameter takes one of four possible values:
- c: including a constant term (default value);
- ct: a constant term plus linear trend;
- ctt: constant term plus a linear and quadratic trend;
- n: no constant or trend.
Choosing which terms to include is important. Wrongly including or excluding a term can substantially reduce the power of the test. In our case, we used the ct option because the log GDP series shows a linear deterministic trend behavior.
The KPSS test can also be used to detect stochastic trends. Its hypotheses are the opposite of the ADF test's:
- Null hypothesis: The time series is trend-stationary;
- Alternative hypothesis: There is a unit root.
from statsmodels.tsa.stattools import kpss
# kpss also returns a tuple; the p-value is its second element
pvalue_kpss = kpss(x=log_gdp, regression='ct')[1]
The KPSS rejects the null hypothesis, while ADF doesn’t. So, both tests signal the presence of a unit root. Note that a time series can have a trend with both deterministic and stochastic components.
So, how can you deal with unit roots?
We’ve explored how to use time as an explanatory variable to account for a linear trend.
Another way to deal with trends is by differencing. Instead of working with the absolute values, you model how the time series changes in consecutive periods.
A single differencing operation is usually enough to achieve stationarity, but sometimes the process has to be repeated more than once. You can use the ADF or KPSS test to estimate the required number of differencing steps. The pmdarima library wraps this process in the function ndiffs:
from pmdarima.arima import ndiffs
# how many differencing steps are needed for stationarity?
ndiffs(log_gdp, test='adf')
In this case, the log GDP series needs 2 differencing steps to become stationary:
diff_log_gdp = log_gdp.diff().diff()