
To exemplify the underlying issue, I created an artificial, two-dimensional and linear time-series:

Right now, you might — rightfully — argue that the underperformance of the differencing model was due to pure chance. Indeed, we would need much broader experiments to verify our initial claim empirically.

It is, however, possible to actually prove why differencing can be bad for multivariate time-series analysis. To do so, let us take a step back to univariate time-series models and why difference transformations work here.

We will only look at **AR(1)** and **VAR(1)** time-series for simplicity. All results can be shown to hold for higher-order AR/VAR, too.

Mathematically, an AR(1) time-series looks as follows:

In order for differencing to make sense, we need the time-series to have a unit root. This is the case when the solution of the characteristic polynomial

lies on the unit-circle, i.e.

The only choice for the AR-parameter is therefore

and thus

In order to make this equation stationary, we subtract the lagged variable from both sides:

Clearly, the differenced time-series is just white noise, so the best possible forecast is its mean of zero. Keep in mind that we could equally well fit a model on the untransformed variable. However, the differenced time-series directly uncovers the lack of any truly autoregressive component.
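The univariate argument can be written out in a few lines. The symbols below are notation assumed for this sketch: $y_t$ for the time-series, $\phi$ for the AR parameter, $\epsilon_t$ for white noise and $z$ for the argument of the characteristic polynomial.

$$
\begin{aligned}
y_t &= \phi\, y_{t-1} + \epsilon_t \\
1 - \phi z &= 0 \;\Rightarrow\; z = 1/\phi, \qquad \text{unit root at } z = 1 \;\Rightarrow\; \phi = 1 \\
y_t &= y_{t-1} + \epsilon_t \\
\Delta y_t &:= y_t - y_{t-1} = \epsilon_t
\end{aligned}
$$

Strictly speaking, $|z| = 1$ would also admit $\phi = -1$. The root at $z = 1$ is the one that first differencing removes, which is why it is the case of interest here.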

Differencing is clearly a good choice for univariate time-series with unit roots. Things are not as simple for multivariate time-series, though.

Consider now a VAR(1) time-series, where we replace the scalars in the AR(1) model with vectors (bold, lower case) and matrices (upper case):

A unit root in a VAR(1) time-series implies, similarly to the AR(1) case, that

In the trivial case, the autoregression parameter is the identity matrix. Each component of our VAR(1) time-series is then a separate unit-root process that does not depend on the past of the other components. If we exclude this case and proceed as for AR(1), we get

The last line is also called a Vector Error Correction representation of a VAR time-series. If you scroll back to our simulation, this is the exact formula that was used to generate the time-series.
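Written out, the multivariate steps mirror the AR(1) case. The notation is again an assumption of this sketch: $\mathbf{y}_t$ is the vector-valued time-series, $A$ the autoregression matrix, $\mathbf{u}_t$ vector white noise, and $\tilde{A} := A - I$ the matrix that appears after differencing.

$$
\begin{aligned}
\mathbf{y}_t &= A\,\mathbf{y}_{t-1} + \mathbf{u}_t \\
\det(I - A z) &= 0 \ \text{ at } z = 1 \;\Leftrightarrow\; \det(I - A) = 0 \\
\mathbf{y}_t - \mathbf{y}_{t-1} &= (A - I)\,\mathbf{y}_{t-1} + \mathbf{u}_t \\
\Delta \mathbf{y}_t &= \tilde{A}\,\mathbf{y}_{t-1} + \mathbf{u}_t
\end{aligned}
$$

The unit-root condition $\det(I - A) = 0$ says that $\tilde{A}$ is singular. In the trivial case $A = I$ we get $\tilde{A} = 0$, and the differenced time-series is plain white noise, just as for AR(1).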

By making Ã rank-deficient (but non-zero), the time-series becomes cointegrated, as explained by Lütkepohl [3]. There exists another, broader definition of cointegration, but we won’t cover that today.

Clearly, a cointegrated VAR(1) time-series differs from the univariate AR(1) case. Even after differencing, the transformed values depend on the past of the original time-series. We would therefore lose important information if we don’t account for the original time-series anymore.
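To make this dependence tangible, here is a minimal sketch that simulates a two-dimensional series from the error-correction form and then regresses the differences on the lagged levels. The matrix `a_tilde`, the noise scale and the helper `simulate_vecm` are hypothetical choices for illustration, not the values behind the simulation above.

```python
import numpy as np


def simulate_vecm(a_tilde, n_steps, seed=0):
    """Simulate delta_y_t = a_tilde @ y_{t-1} + u_t with y_0 = 0."""
    rng = np.random.default_rng(seed)
    dim = a_tilde.shape[0]
    y = np.zeros((n_steps, dim))
    for t in range(1, n_steps):
        noise = rng.normal(size=dim)
        y[t] = y[t - 1] + a_tilde @ y[t - 1] + noise
    return y


# Hypothetical rank-1 matrix alpha @ beta.T: the spread y1 - y2 is
# stationary while both components share one random-walk trend.
alpha = np.array([[-0.2], [0.2]])
beta = np.array([[1.0], [-1.0]])
a_tilde = alpha @ beta.T

y = simulate_vecm(a_tilde, n_steps=2000)

# Regress the differenced series on the lagged levels (least squares).
dy = np.diff(y, axis=0)   # dy[i] = y[i + 1] - y[i]
lagged = y[:-1]           # the level one step before each difference
coef, *_ = np.linalg.lstsq(lagged, dy, rcond=None)
a_hat = coef.T            # estimate of a_tilde

print("rank of a_tilde:", np.linalg.matrix_rank(a_tilde))
print("estimated a_tilde:\n", a_hat.round(3))
```

If differencing had removed all structure, the estimated matrix would be close to zero. Here it should land near `a_tilde` instead, which is exactly the information a model fitted on differences alone cannot see.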

If you are working with multivariate data, you should therefore not just blindly apply differencing.

The above result raises the question of how we should handle cointegration. Typically, time-series analysis is concerned either with forecasting or inference. Therefore, two different approaches come to mind:

**Cross-validation and backtesting**: the pragmatic, ‘data sciency’ approach. If our goal is primarily to build the most accurate forecast, we don’t necessarily need to detect cointegration at all. As long as the resulting model performs well and is reliable, nearly anything goes.

As usual, the ‘best’ model can be selected based on cross-validation and out-of-sample performance tests. The primary implication of cointegration is then to apply differencing with some care.

On the other hand, the above result also suggests that adding the original time-series as a feature might be a good idea in general.
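A rough sketch of such a backtest is shown below. It compares a linear model that only sees lagged differences with one that also receives the lagged levels as features, using an expanding-window split. Everything here is an assumption for illustration: the simulated data, the helper names `build_features` and `backtest_mse`, the fold count and the plain least-squares fit.

```python
import numpy as np


def simulate_cointegrated(n_steps, seed=0):
    """Hypothetical cointegrated VAR(1) in error-correction form."""
    rng = np.random.default_rng(seed)
    a_tilde = np.array([[-0.2, 0.2], [0.2, -0.2]])
    y = np.zeros((n_steps, 2))
    for t in range(1, n_steps):
        y[t] = y[t - 1] + a_tilde @ y[t - 1] + rng.normal(size=2)
    return y


def build_features(y, use_levels):
    """Target: delta_y_t. Features: delta_y_{t-1}, optionally y_{t-1}."""
    dy = np.diff(y, axis=0)
    target = dy[1:]
    features = [dy[:-1]]
    if use_levels:
        features.append(y[1:-1])
    return np.hstack(features), target


def backtest_mse(features, target, n_folds=5, min_train=200):
    """Expanding-window backtest: fit on the past, score the next fold."""
    edges = np.linspace(min_train, len(target), n_folds + 1, dtype=int)
    fold_errors = []
    for start, stop in zip(edges[:-1], edges[1:]):
        coef, *_ = np.linalg.lstsq(
            features[:start], target[:start], rcond=None
        )
        pred = features[start:stop] @ coef
        fold_errors.append(np.mean((pred - target[start:stop]) ** 2))
    return float(np.mean(fold_errors))


y = simulate_cointegrated(n_steps=2000)
mse_diff = backtest_mse(*build_features(y, use_levels=False))
mse_levels = backtest_mse(*build_features(y, use_levels=True))

print(f"differences only:   {mse_diff:.4f}")
print(f"with lagged levels: {mse_levels:.4f}")
```

On data like this, the variant with lagged levels should come out ahead, because the levels carry the error-correction term. On real data the ranking is an empirical question, which is the whole point of backtesting.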

**Statistical tests**: the classical statistics way. Cointegration is nothing new to econometricians and statisticians. If you are interested in learning about the generating process itself, this approach is likely more expedient.

Luckily, the work of James MacKinnon provides extensive insights into tests for cointegration. Other popular cointegration tests have been developed by Engle and Granger and Søren Johansen.

In Python, you can find the MacKinnon test in the statsmodels library. For the above time-series, the test yields a p-value of almost zero.

Hopefully, this article has convinced you not to difference every time-series by default. You should be aware by now that cointegration is a peculiarity of multivariate time-series that needs to be treated with care.

Keep in mind that standard cointegration is concerned with linear time-series only. Once non-linear dynamics are present, things could become even messier and differencing might be even less suitable.

Indeed, there exists some recent research on non-linear cointegration. You might want to take a look at it for further details.

**[1]** Engle, Robert F.; Granger, Clive W. J. *Co-integration and error correction: representation, estimation, and testing*. Econometrica, 1987, p. 251–276.

**[2]** Hamilton, James Douglas. *Time series analysis*. Princeton University Press, 2020.

**[3]** Lütkepohl, Helmut. *New introduction to multiple time series analysis*. Springer Science & Business Media, 2005.
