Most of us are familiar with the basic Cartesian coordinate system, where we plot points based on their corresponding *x*- and *y*-values. Later, early in our algebra careers, we learned how to find the slope of the line through two points and describe that line using the slope-intercept form, *y = mx + b*, where *m* represents the slope and *b* represents the *y*-intercept. Although these concepts seem fairly basic, they make up many of the fundamentals used in linear regression.
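As a quick refresher, here is a minimal sketch of that algebra in Python. The function names and the two example points are my own, chosen only for illustration.

```python
def slope_between(p1, p2):
    """Slope m of the line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    if x2 == x1:
        raise ValueError("vertical line: slope is undefined")
    return (y2 - y1) / (x2 - x1)


def intercept_from(point, m):
    """y-intercept b, found by rearranging y = mx + b into b = y - mx."""
    x, y = point
    return y - m * x


# Two example points, picked arbitrarily for illustration
p1, p2 = (1, 3), (3, 7)

m = slope_between(p1, p2)   # (7 - 3) / (3 - 1) = 2.0
b = intercept_from(p1, m)   # 3 - 2.0 * 1 = 1.0

print(f"y = {m}x + {b}")    # y = 2.0x + 1.0
```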

In basic linear regression, we are often comparing two variables against one another, again our *x*- and *y*-values. More specifically, these values are called the *independent variable* (*x*) and the *dependent variable* (*y*).

The independent and dependent variables are used when comparing two values, where *y* might be dependent on *x*. A good example would be comparing height and weight across a large sample of people. Height would act as our independent variable (*x*), and weight would act as our dependent variable (*y*), as seen in *Image 1.1*.

But notice how “might” was used when describing how *y* depends on *x*. This is where a hypothesis comes into play. Like our slope-intercept form, we can rewrite our formula as:
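A common way to write the hypothesis, assuming the conventional machine learning symbols (my choice of notation here), is shown below. In it, θ₀ plays the role of the *y*-intercept *b*, and θ₁ plays the role of the slope *m*:

$$
h(x) = \theta_0 + \theta_1 x
$$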

The reason we use a hypothesis is that our linear regression first needs to test whether there’s any relationship between the two variables. It could be the case that our variables have no relationship. This would result in our hypothesis being constant (a flat line with a slope of zero), or having “no relationship”. Another thing to note: the further our slope is from zero, whether positive or negative, the more *y* changes for each unit change in *x*. Keep in mind that the size of the slope depends on the units of our variables, so how tightly the data points cluster around the line also matters when judging how strong the relationship is.
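To make the "constant hypothesis" idea concrete, here is a small sketch. The `hypothesis` function and every number in it are made up for illustration; this is not a real height-to-weight model.

```python
def hypothesis(x, intercept, slope):
    """Predicted y for a given x, using a straight-line hypothesis."""
    return intercept + slope * x


heights_cm = [150, 160, 170]  # hypothetical x-values

# Slope of zero: the prediction ignores x entirely ("no relationship")
flat = [hypothesis(x, intercept=65.0, slope=0.0) for x in heights_cm]

# Non-zero slope: the prediction changes as x changes
sloped = [hypothesis(x, intercept=-100.0, slope=1.0) for x in heights_cm]

print(flat)    # [65.0, 65.0, 65.0]
print(sloped)  # [50.0, 60.0, 70.0]
```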

So, how does machine learning play a role in all of this? The name of the game when plotting our line, or *fitted values*, is to achieve the **smallest** summation of squared residuals... what?! Okay, first, what’s a residual? A residual is simply the vertical distance between a data point and our line, or fitted value, at the same *x*, as seen below in *Image 1.3*.
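Here is a minimal sketch of computing residuals for one candidate line. The data points and the candidate line *y = 2x* are invented for illustration, and the `residuals` helper is my own naming.

```python
def residuals(xs, ys, slope, intercept):
    """Observed y minus the fitted value on the line, for each data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up data points and a candidate line y = 2x + 0
xs = [1, 2, 3, 4]
ys = [2, 5, 5, 8]

res = residuals(xs, ys, slope=2.0, intercept=0.0)

print(res)       # [0.0, 1.0, -1.0, 0.0]
print(sum(res))  # 0.0
```

Notice that the positive and negative residuals cancel out when simply added together, even though two of the points do not sit on the line.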

Because a residual can be either negative or positive, we square each residual so that it remains positive when calculating our summation. The formal name for this quantity is the *residual sum of squares*. Again, the model draws our fitted line where the residual sum of squares is smallest. Then, based on the slope of our fitted line and how closely the data points sit around it, we can judge whether our variables have a strong, weak or no relationship with one another.
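The sketch below puts these pieces together. It assumes the standard closed-form least-squares solution for a single *x* variable, which is one way (not the only way) a model can find the line with the smallest residual sum of squares. The data is made up, and the helper names `rss` and `fit_line` are my own.

```python
def rss(xs, ys, slope, intercept):
    """Residual sum of squares for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def fit_line(xs, ys):
    """Slope and intercept that minimise RSS (closed-form least squares)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


# Made-up data for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_line(xs, ys)
best = rss(xs, ys, slope, intercept)
guess = rss(xs, ys, slope=1.0, intercept=1.0)  # an arbitrary alternative line

print(f"fitted line: y = {slope:.2f}x + {intercept:.2f}")
print(f"RSS of fitted line: {best:.2f}, RSS of guess: {guess:.2f}")
```

For this data the fitted line comes out as roughly *y = 0.60x + 2.20*, and its residual sum of squares (about 2.40) is smaller than that of the arbitrary guess (4.00). Any other line you try should also score higher than the fitted one.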

Hopefully this article helped others see how mathematics, data science and machine learning all coincide. I hope I can continue to write more of these during my downtime. Thanks for reading!