## The Continuous Ranked Probability Score is a statistical metric that compares distributional predictions to ground-truth values

An important part of the machine learning workflow is model evaluation. The process itself is common knowledge: split the data into train and test sets, fit the model on the train set, and evaluate its performance on the test set using a score function.

The score function (or metric) maps the ground truth values and their predictions into a single, comparable value [1]. For example, for continuous predictions one could use score functions such as the RMSE, MAE, MAPE or R-squared. But what if the prediction is not a point-wise estimate, but a distribution?

In Bayesian machine learning, predictions are often not point-wise estimates but distributions over values. For example, the prediction could be the estimated parameters of a distribution or, in the non-parametric case, an array of samples obtained from an MCMC method.

In these cases, traditional score functions do not fit the statistical design; one could collapse the predicted distributions into their mean or median values, but that would result in a great loss of information regarding the dispersion and shape of the predicted distribution.

## The Continuous Ranked Probability Score

The CRPS — Continuous Ranked Probability Score — is a score function that compares a single ground truth value to a Cumulative Distribution Function (CDF):
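$$\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \big( F(x) - \mathbb{1}\{x \geq y\} \big)^2 \, dx$$

Here $F$ is the predicted CDF, $y$ is the ground truth value, and $\mathbb{1}\{x \geq y\}$ is the indicator function, i.e. the CDF of a point mass at $y$. Lower values are better, and a perfect (degenerate) prediction at $y$ scores 0.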

First introduced in the 1970s [4] and primarily used in weather forecasting, it is now gaining renewed attention in the literature and in industry [1] [6]. It can be used as a metric to evaluate a model’s performance when the target variable is continuous and the model predicts the target’s distribution; examples include Bayesian regression and Bayesian time series models [5].

The fact that the theoretical definition includes the CDF makes the CRPS useful for both parametric and non-parametric predictions: for many distributions there is an analytic expression for the CRPS [3], and for non-parametric predictions, one could use the CRPS with the Empirical Cumulative Distribution Function (eCDF).

After computing the CRPS for each observation in our test set, we are left to aggregate the results into a single value. Similarly to the RMSE and MAE, we’ll aggregate them using a (possibly weighted) average:
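For a test set of $N$ observations with predicted CDFs $F_i$ and ground truth values $y_i$, the unweighted version is simply:

$$\overline{\mathrm{CRPS}} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{CRPS}(F_i, y_i)$$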

## Intuition

The main challenge of comparing a single value to a distribution is how to translate the single value into the domain of distributions. The CRPS deals with that by translating the ground truth value into a degenerate distribution with the indicator function. For example, if our ground truth value is 7, we can translate it with:
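$$F_{y=7}(x) = \mathbb{1}\{x \geq 7\}$$

a step function that jumps from 0 to 1 exactly at the ground truth value.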

The indicator function is a valid CDF answering all the requirements of a CDF. Now we are left with comparing the predicted distribution to the degenerate distribution of the ground truth value. Clearly, we want the predicted distribution to be as close as possible to the ground truth; this is expressed mathematically by measuring the (squared) area trapped between these two CDFs:
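$$\mathrm{CRPS}(F, y) = \int_{-\infty}^{\infty} \big( F(x) - \mathbb{1}\{x \geq y\} \big)^2 \, dx$$

As a sanity check of this area interpretation, here is a minimal sketch (plain Python, standard library only; the grid bounds and resolution are arbitrary choices of mine) that approximates the integral numerically for a standard Normal prediction:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a Normal(mu, sigma) distribution."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def crps_by_integration(cdf, y, lo=-10.0, hi=10.0, n=100_000):
    """Approximate the squared area between the predicted CDF and the
    ground truth's step CDF with the trapezoidal rule on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        x = lo + i * h
        step = 1.0 if x >= y else 0.0            # degenerate CDF of the truth
        weight = 0.5 if i in (0, n) else 1.0     # trapezoidal endpoint weights
        total += weight * (cdf(x) - step) ** 2
    return total * h

# Standard Normal prediction vs. ground truth y = 0.5:
print(round(crps_by_integration(normal_cdf, 0.5), 4))  # ≈ 0.3314
```

The numeric result agrees with the closed-form Normal expression discussed below, which is a useful check that the area picture and the formula are the same thing.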

## Relation to the MAE

The CRPS is closely related to the well-known MAE (Mean Absolute Error). If we take a point-wise prediction, treat it as a degenerate CDF and inject it into the CRPS equation, we get:
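$$\mathrm{CRPS}\big(\mathbb{1}\{x \geq \hat{y}\},\, y\big) = \int_{-\infty}^{\infty} \big( \mathbb{1}\{x \geq \hat{y}\} - \mathbb{1}\{x \geq y\} \big)^2 \, dx = |\hat{y} - y|$$

The two step functions differ only on the interval between $\hat{y}$ and $y$, where the squared difference is exactly 1, so the integral equals the length of that interval.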

So, if the predicted distribution is a degenerate distribution (e.g. a point-wise estimate), the CRPS reduces to the MAE. This helps to get another intuition for the CRPS: it can be viewed as a **generalization of the MAE into distributional predictions**: **The MAE is a special case of the CRPS** when the predicted distribution is degenerate.

## Empirical Evaluation

When the model’s prediction is a parametric distribution (e.g. the model predicts the distribution’s parameters), the CRPS has an analytic expression for some common distributions [3]. For example, if the model predicts the parameters *μ* and *σ* of the Normal distribution, the CRPS can be calculated with:
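$$\mathrm{CRPS}\big(\mathcal{N}(\mu, \sigma^2),\, y\big) = \sigma \left[ z\big(2\Phi(z) - 1\big) + 2\varphi(z) - \frac{1}{\sqrt{\pi}} \right], \qquad z = \frac{y - \mu}{\sigma}$$

where $\Phi$ and $\varphi$ are the CDF and PDF of the standard Normal distribution [1].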

Analytic solutions are known for distributions such as Beta, Gamma, Logistic, Log-Normal and others [3].
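As a concrete sketch, the Normal closed form can be implemented with the standard library alone (the function name is mine; in practice one would typically reach for an existing implementation such as the properscoring package listed in the references):

```python
import math

def crps_normal(mu, sigma, y):
    """Closed-form CRPS for a Normal(mu, sigma) prediction:
    sigma * (z * (2*Phi(z) - 1) + 2*phi(z) - 1/sqrt(pi)), z = (y - mu) / sigma."""
    z = (y - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard Normal PDF
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard Normal CDF
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))

print(round(crps_normal(0.0, 1.0, 0.5), 4))  # ≈ 0.3314
print(round(crps_normal(0.0, 2.0, 0.5), 4))  # a needlessly wide prediction scores worse
```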

When the prediction is non-parametric, or more specifically, an array of simulations, calculating the integral over the eCDF directly is a hefty task. However, the CRPS can also be expressed analytically as:
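$$\mathrm{CRPS}(F, y) = \mathbb{E}_F\,|X - y| - \frac{1}{2}\,\mathbb{E}_F\,|X - X'|$$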

where *X, X’* are independent and identically distributed according to *F*. These expressions, while still somewhat computationally intensive, are simpler to estimate:
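Replacing the expectations with empirical means over the predicted samples gives a simple plug-in estimator. A minimal sketch in plain Python (note the pairwise term is O(m²); sorting-based formulations that avoid this exist [2]):

```python
def crps_from_samples(samples, y):
    """Estimate CRPS(F, y) = E|X - y| - 0.5 * E|X - X'| from i.i.d. samples
    of the predicted distribution, using empirical means for the expectations."""
    m = len(samples)
    term1 = sum(abs(x - y) for x in samples) / m
    term2 = sum(abs(xi - xj) for xi in samples for xj in samples) / (m * m)
    return term1 - 0.5 * term2

# A degenerate "distribution" (all samples equal) recovers the absolute error:
print(crps_from_samples([3.0] * 100, 7.0))  # → 4.0
```

The last line illustrates the MAE connection from above: when every sample is the same point estimate, the pairwise term vanishes and the score collapses to the absolute error.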

You can check out an example on a Bayesian Ridge Regression in a Jupyter notebook here, where I demonstrate the usage of both the parametric and non-parametric CRPS.

## Summary

The Continuous Ranked Probability Score (CRPS) is a scoring function that compares a single ground-truth value to its predicted distribution. This property makes it relevant to Bayesian machine learning, where models usually output distributional predictions rather than point-wise estimates. It can be viewed as a generalization of the well-known MAE to distributional predictions.

It has analytic expressions for parametric predictions and can be computed straightforwardly for non-parametric predictions. Altogether, the CRPS emerges as the new standard way to evaluate the performance of Bayesian machine learning models with a continuous target.

## References

1. *Strictly Proper Scoring Rules, Prediction, and Estimation*, Gneiting & Raftery (2007)
2. *Estimation of the Continuous Ranked Probability Score with Limited Information and Applications to Ensemble Weather Forecasts*, Zamo & Naveau (2017)
3. *Calibrated Ensemble Forecasts Using Quantile Regression Forests and Ensemble Model Output Statistics*, Taillardat, Zamo & Naveau (2016)
4. *Scoring Rules for Continuous Probability Distributions*, Matheson & Winkler (1976)
5. *Distributional Regression and its Evaluation with the CRPS: Bounds and Convergence of the Minimax Risk*, Pic, Dombry, Naveau & Taillardat (2022)
6. CRPS Implementation in Pyro-PPL, Uber Technologies, Inc.
7. CRPS Implementation in properscoring, The Climate Corporation