Machine Learning News Hubb

Implementing Linear Regression — Using Admission Prediction data set | by Nitish kumar singh | Sep, 2022

by admin
September 9, 2022
in Machine Learning


When we talk about the various algorithms of Machine Learning, Linear Regression is the first and most basic one. It is a supervised machine learning model: we provide a data set with known features and labels, and the algorithm learns the pattern in that data so it can predict the outcome for new inputs.

Examples of linear regression —

  1. Predicting the price of a house given house features
  2. Predicting the sale of ice cream based on the season (temperature outside)
  3. Predicting the impact of SAT/GRE scores on college admissions

In the above examples, we can observe a linear relationship between the input and the output. For example, ice cream sales increase as the temperature rises.

In this article, we will work on the admission prediction dataset, where we will predict the chance of getting admission into the desired college/university based on the given factors.

Let’s investigate the data set first—

[Image: columns of the admission prediction dataset]

It is clear from the image that we must predict the chance of admission based on the GRE Score, TOEFL Score, University Rating, SOP, LOR, CGPA, and Research columns.

Importing all the required libraries —

Each library serves a purpose: pandas for loading and handling the data, scikit-learn for preprocessing and modelling, and pickle for saving the trained model.

After importing the libraries, we will import the data set using the pandas library, and then we will generate a detailed report of the data set using pandas profiling.
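The original code for this step was shown as a screenshot; a minimal sketch of the imports and loading is below. The file name and the tiny inline DataFrame are stand-ins for illustration, not the actual data set.

```python
import numpy as np
import pandas as pd

# In practice, load the admission prediction CSV (hypothetical file name):
# df = pd.read_csv("Admission_Predict.csv")

# A tiny synthetic stand-in with the same columns, for illustration:
df = pd.DataFrame({
    "Serial No.": [1, 2, 3, 4],
    "GRE Score": [337, 324, 316, np.nan],
    "TOEFL Score": [118, 107, 104, 110],
    "University Rating": [4, 4, 3, 3],
    "SOP": [4.5, 4.0, 3.0, 3.5],
    "LOR": [4.5, 4.5, 3.5, 2.5],
    "CGPA": [9.65, 8.87, 8.00, 8.67],
    "Research": [1, 1, 1, 1],
    "Chance of Admit": [0.92, 0.76, 0.72, 0.80],
})

# Detailed report via pandas profiling (requires the ydata-profiling
# package, formerly pandas-profiling):
# from ydata_profiling import ProfileReport
# ProfileReport(df).to_file("report.html")

print(df.shape)
```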

Key observations —

[Image: profiling report]
[Image: Spearman correlation]
  • The Spearman correlation shows various levels of collinearity. CGPA, GRE Score, and TOEFL Score are highly correlated with the chance of admit; apart from Serial No., every column is correlated with it.

After exploring the data, we will now do feature engineering —

Firstly, we will fill in the missing values.

We can fill in the missing data using mean, median, and mode values depending upon a range of factors, which we will discuss later.
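The imputation code was originally a screenshot; here is a minimal sketch using the column mean, on a toy frame with illustrative values.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value per column (illustrative numbers).
df = pd.DataFrame({
    "GRE Score": [337.0, np.nan, 316.0],
    "TOEFL Score": [118.0, 107.0, np.nan],
})

# Fill each numeric column's gaps with that column's mean.
df = df.fillna(df.mean(numeric_only=True))

print(df.isnull().sum().sum())  # no missing values remain
```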

[Image: checking null values]

All the null values have been filled; we can also verify this with df.describe().

The Serial No. column adds no value to our analysis, so it can be dropped.

Code:

df.drop(columns=['Serial No.'], inplace=True)

Splitting our data set —

After cleaning and filling in the data, let’s split our data for model training. We will now separate the label (y-axis or the prediction) and feature (x-axis or the input value) from the data set.

[Image: splitting the dataset]
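The label/feature separation can be sketched as follows; the toy DataFrame stands in for the cleaned data set.

```python
import pandas as pd

# Toy stand-in for the cleaned admission data.
df = pd.DataFrame({
    "GRE Score": [337, 324, 316],
    "CGPA": [9.65, 8.87, 8.00],
    "Chance of Admit": [0.92, 0.76, 0.72],
})

# Label (what we predict) vs. features (what we predict from).
y = df["Chance of Admit"]
x = df.drop(columns=["Chance of Admit"])

print(x.shape, y.shape)
```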

Among the features, GRE Score spans a much larger range of values than the others. Features on different scales can distort the model and make its predictions unstable, so we apply standard scaling.

[Image: scaling the data]
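A minimal sketch of standard scaling with scikit-learn's StandardScaler, on two illustrative columns with very different ranges:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales (GRE-like vs. CGPA-like).
x = np.array([[337.0, 9.65],
              [324.0, 8.87],
              [316.0, 8.00]])

scaler = StandardScaler()
arr = scaler.fit_transform(x)

# After scaling, each column has mean ~0 and unit variance.
print(arr.mean(axis=0).round(6), arr.std(axis=0).round(6))
```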

We will further check the multicollinearity of our data set, i.e., whether the features are correlated with one another, which can hurt the model's accuracy.

There are several ways to check multicollinearity; here we will use the VIF (variance inflation factor) to measure it.

[Image: VIF implementation]
  1. VIF starts at 1 and has no upper limit
  2. VIF = 1, no correlation between the independent variable and the other variables
  3. VIF exceeding 5 or 10 indicates high multicollinearity between this independent variable and the others
[Image: VIF outcome]

All the values are below 5; hence we can conclude that our features do not suffer from high multicollinearity.
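The VIF computation can be sketched from its definition, VIF_i = 1 / (1 − R²_i), where R²_i comes from regressing feature i on the remaining features. The helper below uses plain scikit-learn rather than statsmodels' ready-made function; the random data is illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def vif(x):
    """VIF_i = 1 / (1 - R^2_i), regressing column i on the rest."""
    x = np.asarray(x, dtype=float)
    out = []
    for i in range(x.shape[1]):
        others = np.delete(x, i, axis=1)
        r2 = LinearRegression().fit(others, x[:, i]).score(others, x[:, i])
        out.append(1.0 / (1.0 - r2))
    return out

# Two nearly independent random columns -> VIFs close to 1.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
print([round(v, 2) for v in vif(x)])
```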

Finally, we are ready to train our model; let’s now split our data set into two parts.

Code:

x_train, x_test, y_train, y_test = train_test_split(arr, y, test_size=0.25, random_state=10)

After splitting the data, we pass it to the model for training purposes.

[Image: training the model]
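The training step can be sketched as below; the synthetic array stands in for the scaled features and the label, since the original code was a screenshot.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the scaled feature array and the label.
rng = np.random.default_rng(10)
arr = rng.normal(size=(200, 3))
y = arr @ np.array([0.5, 0.3, 0.2]) + rng.normal(scale=0.05, size=200)

x_train, x_test, y_train, y_test = train_test_split(
    arr, y, test_size=0.25, random_state=10)

# Fit ordinary least squares on the training split only.
model = LinearRegression()
model.fit(x_train, y_train)

# R^2 on the held-out split estimates generalisation.
print(round(model.score(x_test, y_test), 2))
```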

At this point, we can save the model for later use; to make the whole thing portable and handy, we use the pickle library.

[Image: dumping the model]
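A minimal sketch of dumping and reloading a fitted model with pickle; the file name and the tiny training set are illustrative.

```python
import pickle

import numpy as np
from sklearn.linear_model import LinearRegression

# A trivially small fitted model (y = 2x) for illustration.
x = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
model = LinearRegression().fit(x, y)

# Serialise the fitted model to disk...
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

# ...and load it back later to make predictions.
with open("model.pkl", "rb") as f:
    loaded = pickle.load(f)

print(loaded.predict([[4.0]]).round(2))
```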

As we have completed our training phase, let’s check the model’s score and predict the outcome.

Our model has generated a decent score of approximately 81 per cent. This can be increased in several ways that we will discuss later.

Here we discussed the most fundamental form of linear regression. More advanced, regularised variants also exist, such as lasso, ridge, and elastic net.
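The regularised variants share the same fit/score interface in scikit-learn; a brief sketch on synthetic data, with alpha controlling the regularisation strength:

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# Synthetic data: three features, the last one irrelevant.
rng = np.random.default_rng(0)
x = rng.normal(size=(100, 3))
y = x @ np.array([1.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=100)

# Ridge penalises the L2 norm of the coefficients, Lasso the L1 norm
# (driving some to exactly zero), and ElasticNet mixes both.
for Model in (Ridge, Lasso, ElasticNet):
    m = Model(alpha=0.1).fit(x, y)
    print(Model.__name__, round(m.score(x, y), 2))
```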


