Linear regression is the foundational algorithm of the machine learning journey. If you want to build a career in AI, then you must know about linear regression.
Linear regression is a supervised machine learning algorithm that models the relationship between a dependent variable, denoted as Y, and one or more independent variables, denoted as X, using a linear equation. The linear equation typically has the form:
Y = β0 + β1X1 + β2X2 + … + βnXn + ϵ
Where:
- Y is the dependent variable (the variable we want to predict).
- X1,X2,…,Xn are the independent variables (features).
- β0,β1,β2,…,βn are the coefficients or weights that represent the strength and direction of the relationship between Y and each Xi.
- ϵ represents the error term, which accounts for the variability in Y that is not explained by the linear relationship with X.
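To make the equation concrete, here is a toy example with made-up numbers (the values are purely illustrative and ignore the error term):

# Toy illustration of Y = β0 + β1*X1 + β2*X2 with made-up values
b0, b1, b2 = 2.0, 3.0, -1.5   # intercept and two coefficients
x1, x2 = 4.0, 2.0             # feature values for one sample
y = b0 + b1 * x1 + b2 * x2    # 2 + 12 - 3 = 11
print(y)                      # 11.0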
That covers the basics of linear regression. First we will write a linear regression program with sklearn, and after that we will create our own custom linear regression program in Python.
Let’s start
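One note before the code: the snippets below assume that X_train, X_test, y_train and y_test already exist. If you want to follow along, here is one possible way to create a similar two-feature dataset with sklearn (this is my own setup, not the original data, so your exact numbers will differ from the outputs shown below):

import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# Synthetic regression data with two features, split into train and test sets
X, y = make_regression(n_samples=1000, n_features=2, noise=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)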
Define sklearn's LinearRegression, predict y, and check its coefficients and intercept values.
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

lr = LinearRegression()
lr.fit(X_train, y_train)
pred1 = lr.predict(X_test)
print("R2 score - ", r2_score(y_test,pred1))
print("Coefficients - ", lr.coef_)
print("Intercepts - ", lr.intercept_)
Output:
R2 score - 0.9866697681563952
Coefficients - array([83.45521756, 7.7242072 ])
Intercepts - -0.28248314677215247
So this is the output that we get from sklearn's linear regression algorithm.
Now let's create our own linear regression program in Python.
First of all, we create a CustomLinearRegression class and define a fit method in it.
class CustomLinearRegression():
    def fit(self, X_train, y_train):
        pass
We need initial values for the coefficients and the intercept, so we set them all to 1.
Here we create a NumPy array that contains both the intercept and the coefficients: index 0 is the intercept, and indices 1 to N are the coefficients.
If this is unclear, it helps to know the maths behind linear regression.
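For reference, the maths we will implement below is the closed-form solution of linear regression, also called the normal equation (written in the same notation as above):

β = (XᵀX)⁻¹ XᵀY

where X is the training matrix with an extra column of ones for the intercept, and β is the vector [β0, β1, …, βn].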
class CustomLinearRegression():
    def fit(self, X_train, y_train):
        self.intercept_coef = np.ones(X_train.shape[1] + 1)
The resulting NumPy array looks like this: array([1., 1., 1.])
Next, we add an extra column of ones to X_train for the matrix multiplication; this column pairs with the intercept (a tiny illustration of np.insert follows the output below).
class CustomLinearRegression():
    def fit(self, X_train, y_train):
        self.intercept_coef = np.ones(X_train.shape[1] + 1)
        self.new_X_train = np.insert(
            X_train, 0, np.ones(X_train.shape[0]), axis=1
        )
Output:
array([[ 1. , -0.17762849, 0.79531447],
[ 1. , -0.56554341, 1.76573537],
[ 1. , 1.32675363, -2.20703739],
...,
[ 1. , -0.61634935, 1.35707294],
[ 1. , 0.15436551, -0.81172461],
[ 1. , -0.0790978 , -1.39129491]])
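If np.insert looks unfamiliar, here is what it does on a tiny made-up matrix (purely for illustration):

import numpy as np

# Prepend a column of ones to a small 2x2 matrix
A = np.array([[10., 20.],
              [30., 40.]])
print(np.insert(A, 0, np.ones(A.shape[0]), axis=1))
# [[ 1. 10. 20.]
#  [ 1. 30. 40.]]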
Next, we take the dot product of new_X_train and intercept_coef to get the predicted target values under the initial weights.
class CustomLinearRegression():
    def fit(self, X_train, y_train):
        self.intercept_coef = np.ones(X_train.shape[1] + 1)
        self.new_X_train = np.insert(X_train, 0, np.ones(X_train.shape[0]), axis=1)
        self.y_train = np.dot(self.new_X_train, self.intercept_coef)
Now we update our intercept and coefficient values using the normal equation shown earlier, so we can make predictions.
class CustomLinearRegression():
    def fit(self, X_train, y_train):
        self.intercept_coef = np.ones(X_train.shape[1] + 1)
        self.new_X_train = np.insert(X_train, 0, np.ones(X_train.shape[0]), axis=1)
        self.y_train = np.dot(self.new_X_train, self.intercept_coef)
        self.intercept_coef = np.dot(
            np.linalg.inv(np.dot(self.new_X_train.T, self.new_X_train)),
            np.dot(self.new_X_train.T, y_train)
        )
        return self.intercept_coef[0], self.intercept_coef[1:]
Output: (-0.2824831467721534, array([83.45521756, 7.7242072 ]))
Here the first value is the intercept, and the second and third are the coefficients. They are the same as sklearn's results.
Great!!!!!
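One design note that is not part of the original code: inverting XᵀX with np.linalg.inv can become numerically unstable (or fail outright) when the features are highly correlated. A common alternative is to solve the least-squares problem directly, for example:

# Alternative to the explicit inverse; assumes new_X_train is the training
# matrix with the extra column of ones, as built inside fit() above
intercept_coef, residuals, rank, sv = np.linalg.lstsq(new_X_train, y_train, rcond=None)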
* This is the last step *
Now we write the predict method and use it to predict on the test data.
import numpy as np

class CustomLinearRegression():
    def fit(self, X_train, y_train):
        # index 0 holds the intercept, indices 1..N hold the coefficients
        self.intercept_coef = np.ones(X_train.shape[1] + 1)
        # add a column of ones so the intercept is handled by the matrix multiplication
        self.new_X_train = np.insert(X_train, 0, np.ones(X_train.shape[0]), axis=1)
        self.y_train = np.dot(self.new_X_train, self.intercept_coef)
        # normal equation: beta = (X^T X)^-1 X^T y
        self.intercept_coef = np.dot(np.linalg.inv(np.dot(self.new_X_train.T, self.new_X_train)), np.dot(self.new_X_train.T, y_train))
        return self.intercept_coef[0], self.intercept_coef[1:]

    def predict(self, X_test):
        # add the same column of ones to the test data, then apply the learned weights
        self.new_X_test = np.insert(X_test, 0, np.ones(X_test.shape[0]), axis=1)
        self.predictions = np.dot(self.new_X_test, self.intercept_coef)
        return self.predictions
This is the same dot product we applied earlier to the training data.
Finally, we run our custom linear regression, which gives the same results as sklearn's algorithm.
clr = CustomLinearRegression()
clr.fit(X_train, y_train)
Output: (-0.2824831467721534, array([83.45521756, 7.7242072 ]))
pred2 = clr.predict(X_test)
r2_score(y_test, pred2)
Output: 0.9866697681563952
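As a quick sanity check (not in the original post), you can also confirm that the two models produce essentially identical predictions:

# sklearn's predictions and the custom predictions should agree up to floating-point error
print(np.allclose(pred1, pred2))   # expected output: True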
This is my first blog on Medium. If you want to motivate me, please support me.
UPI ID — vayakakshay08–2@okicici