Regression means finding and analyzing the relationship between one dependent variable and one or more independent variables. In simple linear regression, we have one dependent variable (the target variable) and one independent variable. Independent variables are the feature variables from which the target variable is predicted. For example, suppose we are trying to predict the price of a house. Various factors can increase or decrease the price of a house: the number of bedrooms, the size of the house, the surrounding area, and so on. These are the features, or independent variables, and price is the dependent variable, or target variable, because it depends on these features. If we predict the price from a single feature, i.e. the size of the house, this is called Linear Regression (more precisely, simple linear regression), because only one feature variable is used to predict the price. If more than one feature is used to predict the target variable, this type of regression is called Multiple Linear Regression.
The hypothesis used in linear regression is h(θ)(x)=θ0+ θ1X1, which is simply the equation of a line. In this equation, θ0 is the y-intercept and θ1 is the weight, or slope, of the line. X1 is the feature value and h(θ)(x) is the predicted value of the target variable, i.e. the dependent variable. Now the question is: how can we find the values of the parameters θ0 and θ1? We have to find the values of θ0 and θ1 that give the minimum error.
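To make the hypothesis concrete, here is a minimal sketch in Python. The function name hypothesis and the parameter values are made up for illustration; they are not fitted to any real housing data.

```python
def hypothesis(theta0, theta1, x1):
    """Predicted target value for a single feature value x1."""
    return theta0 + theta1 * x1


# Hypothetical parameters: intercept 50 and slope 0.5 per unit of size.
predicted_price = hypothesis(50.0, 0.5, 1000.0)
print(predicted_price)  # 550.0
```

Changing θ0 shifts the line up or down, while changing θ1 changes how strongly the prediction reacts to the feature.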
The cost function is also called the Squared Error function or Mean Squared Error function. It measures the accuracy of our hypothesis: it averages the squared differences between the predicted values and the actual values over all training examples, so a smaller cost means a better fit. The cost function equation with the details of each parameter is given in the image below:
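Written out, and assuming the convention used in the Coursera course referenced at the end of this article (a factor of 1/(2m) in front of the sum), the squared-error cost for m training examples takes the following form. Here x^(i) and y^(i) denote the feature value and the actual target value of the i-th training example.

```latex
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right)^2,
\qquad h_\theta(x) = \theta_0 + \theta_1 x
```

The extra factor of 1/2 does not change where the minimum lies; it only makes the derivative simpler.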
Now, the optimization objective for our learning algorithm is to choose the values of θ0 and θ1 that minimize J(θ0,θ1). This is our objective function for linear regression. Gradient descent is used to find the values of θ0 and θ1 that minimize the cost function.
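As a rough illustration of what minimizing J means, the sketch below evaluates the cost for two candidate lines on a tiny made-up dataset. It assumes the squared-error cost with the 1/(2m) factor; the function name compute_cost and the data are hypothetical.

```python
def compute_cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) with the 1/(2m) convention."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        error = (theta0 + theta1 * x) - y  # predicted value minus actual value
        total += error ** 2
    return total / (2 * m)


# Made-up data that lies exactly on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

perfect_fit_cost = compute_cost(0.0, 1.0, xs, ys)  # the line y = x
poor_fit_cost = compute_cost(0.0, 0.0, xs, ys)     # the flat line y = 0
print(perfect_fit_cost, poor_fit_cost)
```

The line that passes through every point has zero cost, and the flat line has a larger cost, so the learning algorithm should prefer the first pair of parameters.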
Gradient descent is used to minimize the cost function ‘J’. It is used not only in linear regression but throughout Machine Learning. The gradient descent equation for linear regression is given below:
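Written out for the hypothesis h(θ)(x)=θ0+ θ1X1, and assuming the squared-error cost with the 1/(2m) factor, the update rule takes the following form, where α is the learning rate and m is the number of training examples. Both updates are computed from the old values of θ0 and θ1 and then applied together.

```latex
\begin{aligned}
\theta_0 &:= \theta_0 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) \\
\theta_1 &:= \theta_1 - \alpha \, \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta\left(x^{(i)}\right) - y^{(i)} \right) x^{(i)}
\end{aligned}
```

The two sums are the partial derivatives of J with respect to θ0 and θ1, so each step moves the parameters in the direction that decreases the cost.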
Initially, you have to choose starting values for theta0 and theta1 (any values will do, for example random ones or zero) and a learning rate alpha (a small positive number such as 0.01 or 0.001). Update theta0 and theta1 simultaneously until they converge to the minimum. In practice, this means you stop when the values of theta0 and theta1 no longer change from one iteration to the next. At that point the gradient, i.e. the slope of the cost function, is zero or very close to it. This is the point where the cost is minimal, and it gives the best values of theta0 and theta1. Now, I will explain this concept with the help of an example; the complete mathematical process is given in the image below:
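The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the worked example from the image: the data is made up (it lies on the line y = 1 + 2x), the starting values are zero, and the names gradient_descent, tolerance, and max_iters are hypothetical.

```python
def gradient_descent(xs, ys, alpha=0.1, tolerance=1e-9, max_iters=100000):
    """Fit theta0 and theta1 by batch gradient descent on the squared-error cost."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0  # starting values (assumption: start at zero)
    for _ in range(max_iters):
        # Prediction errors computed with the current (old) parameters.
        errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Simultaneous update: both new values use the old parameters.
        new_theta0 = theta0 - alpha * grad0
        new_theta1 = theta1 - alpha * grad1
        converged = (abs(new_theta0 - theta0) < tolerance
                     and abs(new_theta1 - theta1) < tolerance)
        theta0, theta1 = new_theta0, new_theta1
        if converged:  # the values have effectively stopped changing
            break
    return theta0, theta1


# Made-up data generated from y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1)  # values close to 1 and 2
```

If alpha is too large for the data, the updates can overshoot and diverge instead of converging, which is why small values such as 0.01 are a common starting point.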
We now have the best values of theta0 and theta1, because they are no longer changing between iterations. The cost at these values is given below:
These theta0 and theta1 values are then used in the hypothesis h(θ)(x)=θ0+ θ1X1, which lets us predict the target value for new feature data.
Thank you for reading this article.
Reference: Coursera, Machine Learning