Machine Learning Fundamentals 1: Cost Functions | Fitting Best Line to Data
Andrew Ng, a Stanford professor and one of the best-known figures in machine learning, said:
To get started with ML, most students spend too much time deciding which language they should learn for it. In fact, you can learn any language sooner or later; the most important thing in ML is to grasp its concepts. After that, it’s your choice which tool you want to use to practice and implement those concepts.
I personally recommend Andrew Ng’s course on Coursera to start with ML. To dig deeper, you can get help from the YouTube channels and websites mentioned at the end.
While taking Andrew Ng’s course, I looked up some of the terms and concepts on the internet, because the course explains them only briefly. So in this article I will explain in detail one of the most fundamental concepts in ML, the cost function, for anyone who finds it difficult to understand or who is an absolute beginner in ML.
What is a Cost Function?
Whenever we are given a problem such as linear regression, we have to find the line that best fits our dataset. This line gives us predictions based on our actual data. A prediction model can have multiple parameters, but let’s start with a basic example.
In this example, let’s say the x-axis is mouse weight and the y-axis is mouse size. We want to predict mouse size from mouse weight. We are given some sample data, shown as red dots on the graph. We have 9 actual data points, but what if we want to find the mouse size for any given mouse weight? For this, we have to draw the line that fits best, meaning it lies as close as possible to all the actual data points. With that line, we can predict a size for any weight. But the question is how to find the line that best fits the data points, since we don’t know in advance which of the candidate lines is the best one. This is where the cost function comes in.
The cost function is simply a function that helps us find this best line. It is the sum of the squares of the differences between the actual data and our hypothesis line. Each difference is called a residual, and its square is a squared residual.
In our cost function, m is the number of actual data points, h(x) is the value of our hypothesis line at point i, and y is the corresponding actual data point. Each difference is squared because some data points lie above the hypothesis line, which makes their differences negative. Negative and positive differences would cancel each other out when summed, so we square them to make every value positive.
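For reference, here is one way to write the cost function described above as a formula. This is a sketch that follows the wording in this article (a plain sum of squared differences); the symbol J and the indices are my notation. Some courses, including Andrew Ng’s, also divide the sum by 2m, which rescales the cost but does not change which line comes out best.

```latex
J = \sum_{i=1}^{m} \left( h(x_i) - y_i \right)^2
```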
To make the equation easier to understand, let’s expand it and dig a little deeper. Suppose our hypothesis line is ‘b’, that is, h(x) = b in the cost function.
Then we compute the residual between this line and each data point, and add up the squares of those residuals.
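To see the residuals for a flat line in numbers, here is a minimal sketch. The nine (weight, size) pairs are invented for illustration and are not the values from the graph, and the function name is just a placeholder. The flat line is placed at the mean size, which also shows why we square: the raw residuals add up to roughly zero, while the squared ones do not.

```python
# Hypothetical data: 9 (weight, size) pairs invented for illustration only.
weights = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
sizes = [1.5, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1, 8.9]


def flat_line_residuals(b, ys):
    """Residuals (prediction minus actual) for the flat line h(x) = b."""
    return [b - y for y in ys]


# Assumption: try a flat line that sits at the mean of the sizes.
b = sum(sizes) / len(sizes)

residuals = flat_line_residuals(b, sizes)
squared = [r ** 2 for r in residuals]

print("residuals:", [round(r, 2) for r in residuals])
print("sum of residuals:", round(sum(residuals), 6))
print("sum of squared residuals:", round(sum(squared), 2))
```

Points above the line give negative residuals and points below give positive ones, so only the squared version tells us how far the line is from the data overall.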
And this is for just one hypothesis line. What if we rotate the line bit by bit and check which rotation is best? The best line is the one with the minimum sum of squared residuals. For the particular example above, the sum of squared residuals is 24.6. If we rotate the line a little, it drops to around 14, which is the best fit of the three. But if we rotate it a lot more, it rises to 31, which is the worst fit of the three.
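The idea of rotating the line and comparing sums of squared residuals can be sketched like this. The data is the same made-up set of nine points (so the numbers will not match 24.6, 14, or 31), the list of slopes to try is arbitrary, and rotating about the centre of the data is my assumption about how the line is turned.

```python
# Hypothetical data: 9 (weight, size) pairs invented for illustration only.
weights = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
sizes = [1.5, 1.9, 3.2, 3.8, 5.1, 6.2, 6.8, 8.1, 8.9]


def sum_squared_residuals(slope, intercept, xs, ys):
    """Sum of squared differences between the line and the actual data."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))


x_mean = sum(weights) / len(weights)
y_mean = sum(sizes) / len(sizes)

# "Rotate" the line by trying several slopes; each line is pinned so that it
# still passes through the centre of the data (x_mean, y_mean).
results = {}
for slope in [0.0, 0.5, 1.0, 1.5, 2.0]:
    intercept = y_mean - slope * x_mean
    results[slope] = sum_squared_residuals(slope, intercept, weights, sizes)
    print(f"slope {slope}: sum of squared residuals = {results[slope]:.2f}")

# The best candidate is the one with the smallest sum of squared residuals.
best_slope = min(results, key=results.get)
print("best slope among the candidates:", best_slope)
```

Trying a handful of slopes by hand only finds the best of those candidates, which is why a more systematic method is needed.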
We know that this ‘b’, or h(x), or hypothesis line, is really the equation of a line: y = ax + b, or θ0 + θ1x, where the thetas are the parameters. We can add as many parameters as we want. Suppose we want to predict house price from house size, number of floors, and house age: then we have three input features, each with its own parameter, in addition to θ0. For now, though, we are working with only one input feature.
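Here is a small sketch of what the hypothesis looks like for the house example. Counting the intercept θ0 alongside one theta per feature, it has four thetas in total. The function name and every number below are made up purely for illustration.

```python
def predict_price(thetas, size, floors, age):
    """Hypothesis with three features: theta0 + theta1*size + theta2*floors + theta3*age."""
    theta0, theta1, theta2, theta3 = thetas
    return theta0 + theta1 * size + theta2 * floors + theta3 * age


# Made-up parameter values: intercept, price per unit of size,
# price per floor, and price change per year of age.
thetas = (50000.0, 120.0, 8000.0, -500.0)

price = predict_price(thetas, size=1500.0, floors=2, age=10)
print("predicted price:", price)
```

With a single feature, the same idea shrinks back to θ0 + θ1x.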
So if we substitute the equation of the line for h(x), the more general form is the sum, over all m data points, of ((θ0 + θ1x) − y)².
The work doesn’t end here. To actually find the best line, we have to compute the sum of squared residuals (the cost function) for each hypothesis as we rotate the line little by little. This is done using derivatives, which give the best line at the point where the slope of the cost curve is zero. That involves gradient descent, which is the next topic. As a brief overview, gradient descent uses the derivative of the cost function, and we have now discussed what the cost function is.
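As a small preview of that next topic, here is a rough sketch of gradient descent for a line with one feature. It is not a full treatment: the data, the learning rate, and the number of steps are all made up, and I assume the cost is the squared-residual sum divided by 2m so that the step size does not depend on how many points we have.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=5000):
    """Fit h(x) = theta0 + theta1 * x by repeatedly stepping against the derivative."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0  # start from a flat line at zero
    for _ in range(steps):
        # Difference between the current line and each actual data point.
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        # Derivatives of (1 / (2m)) * sum(errors squared) w.r.t. theta0 and theta1.
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Move a small step downhill on the cost curve.
        theta0 -= learning_rate * grad0
        theta1 -= learning_rate * grad1
    return theta0, theta1


# Made-up data that lies exactly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

theta0, theta1 = gradient_descent(xs, ys)
print("theta0:", round(theta0, 4), "theta1:", round(theta1, 4))
```

The updates shrink as the derivatives approach zero, which is the "slope of the cost curve is zero" condition mentioned above.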
I hope you found this helpful and now grasp the concept of the cost function. If you still have questions, leave a comment. For further help, you can visit the StatQuest channel, which is very helpful for understanding statistics and machine learning.