## A computationally effective way of carrying out Bayesian statistics

In **Bayesian statistics**, a **conjugate prior** is a prior that, when combined with the likelihood, yields a posterior belonging to the same family of distributions. This property allows for much simpler calculation of the posterior, making **Bayesian inference** a lot easier.

In this article, we will gain an in-depth view of the conjugate prior. We will show the need for it, derive an example from first principles, and finally apply it to a real-world problem.

## Bayes’ Theorem

Let’s have a quick recap of **Bayes’ theorem**:

*P(H|D) = P(D|H) P(H) / P(D)*

- *P(H)*: the **prior**. The probability of the hypothesis, *H*.
- *P(D|H)*: the **likelihood**. The probability of the data, *D*, given our current hypothesis, *H*.
- *P(H|D)*: the **posterior**. The probability of the current hypothesis, *H*, given the data, *D*.
- *P(D)*: the **normalising constant**. This is computed through the **law of total probability**: *P(D) = Σᵢ P(D|Hᵢ) P(Hᵢ)*.

If you want a more in-depth derivation and understanding of Bayes’ theorem, check out my previous article here:

## Bayesian Updating

We use Bayes’ theorem to update our belief about a certain event when we receive more data about it.

In general, we carry out the update as follows:

*posterior ∝ likelihood × prior*

Then when new data arrives, the posterior becomes the new prior. This process is constantly repeated with new data, hence it is called Bayesian updating. This is in essence what Bayesian inference is.

You can read more about Bayesian updating here:

However, if we want to obtain valid probabilities we need to compute *P(D)*. As displayed above, this is the sum of the products of the likelihoods and priors. Another way of describing the summation is through an integral:

*P(D) = ∫ P(D|H) P(H) dH*

This integral is often **intractable**: it is either very computationally expensive or it doesn’t have a **closed-form solution**. I have linked **here** a StackExchange thread that explains why it is intractable.

## Background

Conjugate priors are one way of getting around the intractable-integral issue in Bayesian inference. A conjugate prior is one where both the prior and posterior belong to the same family of distributions, which lets us simplify the expression for the posterior. In the next section we will show this phenomenon mathematically.

## Binomial and Beta Contingency

One of the simplest and most common conjugate pairs is the **Beta** (prior) and **Binomial** (likelihood).

**Beta Distribution**:

- Referred to as the **distribution of probabilities** because its domain is bounded between 0 and 1.
- Conveys the most probable probabilities for the success of an event.

Its **probability density function (PDF)** is written as:

*f(x; α, β) = x^(α−1) (1−x)^(β−1) / B(α, β)*

Here *x* is bounded as *0 ≤ x ≤ 1*, so it can easily be interpreted as a probability, and *B(α, β)* is the **Beta function**.
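As a quick sanity check on the formula (a sketch of my own, not code from the article), the PDF can be computed by hand via Gamma functions and compared with SciPy:

```python
import math
from scipy.stats import beta as beta_dist

a, b, x = 2.0, 5.0, 0.3

# Beta function via Gamma functions: B(α, β) = Γ(α)Γ(β) / Γ(α+β)
B = math.gamma(a) * math.gamma(b) / math.gamma(a + b)

# PDF from the formula: x^(α−1) (1−x)^(β−1) / B(α, β)
manual = x**(a - 1) * (1 - x)**(b - 1) / B

print(manual, beta_dist.pdf(x, a, b))  # both ≈ 2.1609
```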

If you want a full run down on the Beta distribution, you should skim over my previous article on it:

**Binomial Distribution:**

- Conveys the probability of a certain number of successes, *k*, from *n* trials, where the probability of success is *x*.

Its **probability mass function (PMF)** is:

*P(k; n, x) = C(n, k) x^k (1−x)^(n−k)*
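The formula can likewise be checked against SciPy (again, a small sketch of my own):

```python
from math import comb
from scipy.stats import binom

n, k, x = 10, 3, 0.25

# PMF from the formula: C(n, k) x^k (1−x)^(n−k)
manual = comb(n, k) * x**k * (1 - x)**(n - k)

print(manual, binom.pmf(k, n, x))  # both ≈ 0.2503
```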

**Crucial Point**

- The key difference between the two is that for the Beta distribution the **probability**, *x*, is a random variable, whereas for the Binomial distribution the **probability**, *x*, is a fixed parameter.

## Relation To Bayes

Now let’s go through some fun maths!

We can rewrite Bayes’ theorem using the probability of success, *x*, for an event and the data, *k*, which is the number of successes we observe:

*P(x|k) = P(k|x) P(x) / P(k)*

Our posterior is basically the probability distribution over all the possible probabilities of the success rate. In other words, the posterior is a Beta distribution.

We can express the above equation using the Binomial distribution as our likelihood and the Beta distribution as our prior:

*P(x|k) = [ C(n, k) x^k (1−x)^(n−k) · x^(α−1) (1−x)^(β−1) / B(α, β) ] / ∫₀¹ C(n, k) x^k (1−x)^(n−k) · x^(α−1) (1−x)^(β−1) / B(α, β) dx*

Yeah, doesn’t look that nice. Nevertheless, we are now going to simplify it. The binomial coefficient and *B(α, β)* do not depend on *x*, so they cancel between the numerator and denominator:

*P(x|k) = x^(k+α−1) (1−x)^(n−k+β−1) / ∫₀¹ x^(k+α−1) (1−x)^(n−k+β−1) dx*

Some of you may notice something special about that integral. It is the definition of the **Beta function**!

*∫₀¹ x^(k+α−1) (1−x)^(n−k+β−1) dx = B(k+α, n−k+β)*

Therefore, the final form of our posterior is:

*P(x|k) = x^(k+α−1) (1−x)^(n−k+β−1) / B(k+α, n−k+β)*

A Beta distribution, with parameters *k+α* and *n−k+β*!

Voilà, we have just gone from a Beta prior to a Beta posterior, hence we have a conjugate prior!
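This conjugacy can also be verified numerically: a brute-force grid posterior matches the Beta(*α+k*, *β+n−k*) density. A sketch of my own, with arbitrary example values:

```python
import numpy as np
from scipy.stats import beta as beta_dist, binom

a, b, n, k = 2, 3, 10, 4
x = np.linspace(0.001, 0.999, 999)

# Brute force: likelihood × prior, normalised over the grid
unnorm = binom.pmf(k, n, x) * beta_dist.pdf(x, a, b)
grid_posterior = unnorm / unnorm.sum()

# Conjugate shortcut: Beta(α + k, β + n − k), normalised the same way
conjugate = beta_dist.pdf(x, a + k, b + n - k)
conjugate = conjugate / conjugate.sum()

print(np.max(np.abs(grid_posterior - conjugate)))  # effectively zero
```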

If you want to learn more about the Beta-Binomial conjugate prior, there is a great online book that describes the relationship in depth here.

## Why Is It Useful?

You may be scratching your head wondering why I have taken you through this awful derivation just to get another version of a Beta distribution?

What this beautiful result shows us is that to do a Bayesian update we no longer need to compute the product of the likelihood and prior and then normalise it, which is computationally expensive and sometimes not feasible, as I discussed earlier. We can now just use simple addition to the parameters!
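As a sketch of how lightweight the update becomes (a toy helper of my own, not from the article):

```python
def update_beta(alpha, beta, successes, failures):
    """Beta-Binomial conjugate update: just add the observed counts."""
    return alpha + successes, beta + failures

a, b = 2, 2                     # prior parameters
a, b = update_beta(a, b, 5, 3)  # observe 5 successes, 3 failures
a, b = update_beta(a, b, 1, 0)  # a new success arrives later
print(a, b)  # → 8 5
```

Each new batch of data folds straight into the parameters; no integral is ever evaluated.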

## Problem Background

In Major League Baseball (MLB), the number of hits a batter gets divided by their number of at-bats is known as the batting average. The league-wide batting average in the MLB in 2021 was 0.244 (24.4%).

A player starts the season very well and hits his first **3** balls. What would his batting average be? A **frequentist** would say it is **100%**; however, we Bayesians would come to a different conclusion.

## Prior

We know that the league batting average is 0.244, but what about the possible range of values? A good average is considered to be around 0.3, which is the upper range, and anything below 0.2 is considered to be quite bad.

Using these values we can construct a suitable Beta prior distribution:

```python
from scipy.stats import beta as beta_dist
import matplotlib.pyplot as plt
import numpy as np

alpha = 49
beta = 151

probability = np.arange(0, 1, 0.001)
prior = beta_dist.pdf(probability, alpha, beta)

plt.figure(figsize=(12, 6))
plt.plot(probability, prior, linewidth=3)
plt.xlabel('Batting Average', fontsize=20)
plt.ylabel('PDF', fontsize=20)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.axvline(0.244, linestyle='dashed', color='black', label='Average')
plt.legend(fontsize=18)
plt.show()
```

This looks reasonable, as our range is pretty confined between 0.2 and 0.3. There was no particular reason why I chose the values of *α=49* and *β=151*; they just satisfy what we know about the prior distribution.

However, this is often the argument made against Bayesian statistics: as the prior is subjective, so is the posterior. This means probability is no longer objective, but rather a personal belief.

## Likelihood and Posterior

The likelihood of the data is that the new player has hit **3** from **3**; therefore they have an extra **3 successes** and **0 failures**.

Using our knowledge of the conjugate prior, we can simply add an **extra 3** to the value of *α* and **0** to *β*:

```python
alpha = 49
beta = 151

new_alpha = alpha + 3
new_beta = beta + 0

probability = np.arange(0, 1, 0.001)
prior = beta_dist.pdf(probability, alpha, beta)
posterior = beta_dist.pdf(probability, new_alpha, new_beta)

plt.figure(figsize=(12, 6))
plt.plot(probability, prior, linewidth=3, label='Prior')
plt.plot(probability, posterior, linewidth=3, label='Posterior')
plt.xlabel('Batting Average', fontsize=20)
plt.ylabel('PDF', fontsize=20)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.axvline(0.244, linestyle='dashed', color='black', label='Average')
plt.legend(fontsize=18)
plt.show()
```

It makes sense why the average has barely shifted as three balls is not that many. What if we now said the player hit **40** out of **50** balls, what would the posterior now look like?

```python
alpha = 49
beta = 151

new_alpha = alpha + 40
new_beta = beta + 10

probability = np.arange(0, 1, 0.001)
prior = beta_dist.pdf(probability, alpha, beta)
posterior = beta_dist.pdf(probability, new_alpha, new_beta)

plt.figure(figsize=(12, 6))
plt.plot(probability, prior, linewidth=3, label='Prior')
plt.plot(probability, posterior, linewidth=3, label='Posterior')
plt.xlabel('Batting Average', fontsize=20)
plt.ylabel('PDF', fontsize=20)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.axvline(0.244, linestyle='dashed', color='black', label='Average')
plt.legend(fontsize=18)
plt.show()
```

We see a greater change as we now have more data.

## Without Conjugate Priors

Without conjugate priors, we would have to compute the posterior using the products of the likelihoods and priors. Let’s go through this process for the sake of completeness.

We will use the example where the player hit 40 out of 50 balls. Our likelihood in this case is:

*P(k=40 | x) = C(50, 40) x^40 (1−x)^10 = 10272278170 · x^40 (1−x)^10*

where we have used the Binomial **probability mass function (PMF)**.

Performing the Bayesian update and plotting the posterior:

```python
alpha = 49
beta = 151

probability = np.arange(0, 1, 0.001)
prior = beta_dist.pdf(probability, alpha, beta)

# Binomial likelihood: C(50, 40) x^40 (1 - x)^10
likelihood = 10272278170 * probability**40 * (1 - probability)**10

posterior = prior * likelihood
posterior = posterior / sum(posterior)  # normalise over the grid

plt.figure(figsize=(12, 6))
plt.plot(probability, posterior, linewidth=3, label='Posterior')
plt.xlabel('Batting Average', fontsize=20)
plt.ylabel('Probability', fontsize=20)
plt.xticks(fontsize=18)
plt.yticks(fontsize=18)
plt.axvline(0.244, linestyle='dashed', color='black', label='Average')
plt.legend(fontsize=18)
plt.show()
```

We arrive at the same distribution as before!

The keen-eyed among you may notice one difference: the y-scale. SciPy returns the Beta PDF, whereas here we normalised by summing over the grid, so the posterior is a set of discrete probabilities (like a PMF) rather than a density.

The Beta-Binomial isn’t the only conjugate pair out there: there are also the Gamma-Poisson, Normal-Normal and Dirichlet-Multinomial, just to name a few.

The sad part is that not all problems, in fact very few, can be solved using conjugate priors.

However, there are more general alternatives such as **Markov Chain Monte Carlo (MCMC)**, which provide another solution to the intractable integral.
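As a taster, here is a minimal random-walk Metropolis sketch (my own illustration, not from this article) that samples the batting-average posterior without ever evaluating the normalising integral:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_unnorm_posterior(x, k=40, n=50, a=49, b=151):
    """log(likelihood × prior) up to a constant; the normaliser is never needed."""
    if not 0 < x < 1:
        return -np.inf
    return (k + a - 1) * np.log(x) + (n - k + b - 1) * np.log(1 - x)

x = 0.5
samples = []
for _ in range(20000):
    proposal = x + rng.normal(0, 0.05)  # random-walk proposal
    # Accept with probability min(1, p(proposal) / p(x))
    if np.log(rng.uniform()) < log_unnorm_posterior(proposal) - log_unnorm_posterior(x):
        x = proposal
    samples.append(x)

# Compare with the exact conjugate answer: Beta(89, 161) has mean 89/250 = 0.356
print(np.mean(samples[5000:]))
```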

In this article, we described how conjugate priors allow us to easily compute the posterior with simple addition. This is very useful, as it removes the need to calculate the product of the likelihoods and priors, which can lead to intractable integrals.

The full code used in this article can be found on my GitHub:
