
Multicollinearity is a well-known challenge in multiple regression. The term refers to high correlation between two or more explanatory variables, i.e. the predictors.

Two basic kinds of multicollinearity:

- **Structural multicollinearity**: This type occurs when we create a model term from other terms. In other words, it is a byproduct of the model we specify rather than being present in the data itself. For example, if you square a term X to model curvature, there is clearly a correlation between X and X².
- **Data multicollinearity**: This type is present in the data itself rather than being an artifact of our model. Observational studies are more likely to exhibit this kind of multicollinearity.

**According to Graham’s study, multicollinearity in multiple regression leads to:**

- Inaccurate parameter estimates
- Decreased power
- Exclusion of significant predictors

When the X variables are themselves related to each other, this problem is called multicollinearity.

**Intuition:**

Consider the following regression:

**Lawyer Salary = β0 + β1(Years of Experience) + β2(Age) + εi**

- β1 → the marginal effect on salary of one additional year of experience, holding other variables constant.

- β2 → the marginal effect on salary of one additional year of age, holding other variables constant.

Why do we care about multicollinearity?

**Ex: Lawyer Salary = β0 + β1(Years of Experience) + β2(Age) + εi**
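A minimal simulation makes the problem concrete (the numbers and variable names here are illustrative, not from the article): when age is nearly a linear function of experience, the individual coefficient estimates swing wildly from sample to sample, even though their sum stays close to the true combined effect.

```python
import numpy as np

rng = np.random.default_rng(4)

def fit_betas():
    # Hypothetical lawyer data: salary truly depends only on experience,
    # but age ≈ experience + 25, so the two predictors are collinear.
    exp_ = rng.uniform(0, 30, size=100)
    age = exp_ + 25 + rng.normal(0, 0.5, size=100)
    salary = 50_000 + 3_000 * exp_ + rng.normal(0, 5_000, size=100)
    X = np.column_stack([np.ones(100), exp_, age])
    beta, *_ = np.linalg.lstsq(X, salary, rcond=None)
    return beta[1], beta[2]  # estimates of β1 (experience) and β2 (age)

# Refit on fresh samples: β1 and β2 individually are unstable,
# while β1 + β2 stays near the true combined effect of 3000.
for _ in range(3):
    b1, b2 = fit_betas()
    print(f"b1 = {b1:9.1f}   b2 = {b2:9.1f}   sum = {b1 + b2:7.1f}")
```

This is exactly the "inaccurate parameter estimates" problem from Graham's list: the data cannot tell the two correlated predictors apart.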

Correlations

Check the correlations between all pairs of X variables.

How much correlation is too much?

A common rule of thumb: if a pairwise correlation exceeds 0.9 in absolute value, treat it as a problem.
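The check can be sketched in a few lines of NumPy (the lawyer data below is made up for illustration):

```python
import numpy as np

# Hypothetical data: age is experience plus schooling plus noise,
# so the two predictors are strongly correlated by construction.
rng = np.random.default_rng(0)
experience = rng.uniform(0, 30, size=200)
age = experience + 25 + rng.normal(0, 2, size=200)

# Pearson correlation between the two predictors.
r = np.corrcoef(experience, age)[0, 1]
print(f"corr(experience, age) = {r:.3f}")

# Rule of thumb from the text: flag |r| > 0.9 as a potential problem.
if abs(r) > 0.9:
    print("potential multicollinearity between experience and age")
```

With more predictors, `np.corrcoef` (or a pandas `DataFrame.corr()`) gives the full pairwise matrix to scan for large entries.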

Variance Inflation Factors

**Y = β0 + β1X1 + β2X2 + β3X3 + β4X4 + εi**

Fit an auxiliary regression of each X variable on the others:

**X1 = β′0 + β′1(X2) + β′2(X3) + β′3(X4) + ε′**

Take the R² from each auxiliary regression and compute the VIF:

**VIF = 1 / (1 − R²)**

How high is too high?

Again, by rule of thumb: VIFs above 10 are problematic.
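The whole procedure — one auxiliary regression per predictor, then VIF = 1/(1 − R²) — can be sketched with plain NumPy (the data below is synthetic and the function name is my own):

```python
import numpy as np

def vif(X):
    """VIF for each column of X: regress column j on the other columns
    (plus an intercept), take that auxiliary R², return 1 / (1 - R²)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        vifs.append(1.0 / (1.0 - r2))
    return vifs

rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)  # nearly a copy of x1
x3 = rng.normal(size=500)                  # unrelated predictor
v = vif(np.column_stack([x1, x2, x3]))
print([round(val, 1) for val in v])        # x1 and x2 blow up, x3 stays near 1
```

In practice a library routine such as statsmodels' `variance_inflation_factor` does the same computation.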

What can we do about multicollinearity?

**Option 1**: Do nothing. This is reasonable:

- *if the model is used for prediction only;*
- *if the correlated variables are not of particular interest to the study question;*
- *if the correlation is not extreme.*

**Option 2**: Remove one of the correlated variables.

This works when the variables are providing essentially the same information.

**NOTE**: Be aware of omitted variable bias!

This bias can arise under Option 2, but the risk is small when the dropped variable really duplicates the one we keep.

**Option 3**: Combine the correlated variables.

Ex: include a "Seniority" score combining "experience" and "age".
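One simple way to build such a combined score — averaging the z-scores of the two variables — is sketched below (the averaging choice and the data are mine, for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
experience = rng.uniform(0, 30, size=200)
age = experience + 25 + rng.normal(0, 2, size=200)  # correlated with experience

def zscore(v):
    # Put both variables on the same scale before combining them.
    return (v - v.mean()) / v.std()

# One "seniority" predictor replaces the collinear pair.
seniority = (zscore(experience) + zscore(age)) / 2
```

The regression then uses `seniority` as a single predictor in place of the two correlated ones, trading the ability to separate their effects for stable estimates.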

**Option 4**: Use partial least squares or principal component analysis.
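The PCA route can be sketched with plain NumPy (principal component regression in miniature; the data is synthetic): center the predictors, rotate them onto their principal components, and keep only the dominant component as the regressor.

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.normal(size=300)
x2 = x1 + rng.normal(scale=0.1, size=300)  # nearly collinear pair
X = np.column_stack([x1, x2])

# Center, then rotate onto principal components via SVD.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt.T                 # component scores are uncorrelated

# Share of variance per component: the first dominates here, so we can
# regress y on scores[:, 0] alone instead of on the collinear x1 and x2.
explained = s**2 / (s**2).sum()
print(explained.round(4))
```

The components are mutually uncorrelated by construction, so the VIF problem disappears; the cost is that the new regressors are linear blends of the originals and harder to interpret.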
