Credit Risk Analysis with Machine Learning | by Evelyn | Sep, 2023

September 10, 2023
Using machine learning to estimate how likely a loan applicant is to default, so that creditors can reduce their risk

Credit risk can be defined as the risk of financial loss resulting from a borrower's failure to repay a loan. Creditors need to minimize this risk to avoid cash flow interruptions and the additional cost of collecting on a defaulted loan. This is why conducting a credit risk analysis based on historical data is crucial before granting any loan. Given the huge amount of historical data available, machine learning comes in handy to make the process easier and faster.

Since this is a binary classification problem (default or not), logistic regression is probably the most common go-to method. Logistic regression differs from linear regression in that it uses the sigmoid function to estimate a probability, which can only lie between 0 and 1, making it well suited to predictive modelling. If the probability of default for a particular loan applicant is high, they can automatically be classified as high risk, and the creditor can then refuse to grant the loan, raise the interest rate, or demand assets as collateral.
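To make the idea concrete (this snippet is illustrative and not part of the original article), a logistic model passes a linear combination of the inputs through the sigmoid function and classifies an applicant by thresholding the resulting probability. The coefficients and inputs below are made-up values:

import numpy as np

def sigmoid(z):
    # squashes any real number into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical standardized inputs and coefficients, for illustration only
amount, rate = 0.4, 1.2      # standardized loan amount and interest rate
b0, b1, b2 = -0.1, 0.3, 1.5  # intercept and coefficients

p_default = sigmoid(b0 + b1 * amount + b2 * rate)
label = "high risk" if p_default >= 0.5 else "low risk"
print(f"Estimated default probability: {p_default:.2f} -> {label}")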

The dataset is obtained from Kaggle and contains essential information about loan applicants, such as their age, income, home ownership status, loan amount, and loan interest rate. The data is imbalanced, with significantly more non-default records than default records. Undersampling is used to balance this out: records are randomly selected from the majority class, keeping only as many samples as there are in the minority class.

import pandas as pd

# Undersample the majority (non-default) class to match the minority class size
data_def = data[data["Default"] == "Y"]
data_non_def = data[data["Default"] == "N"]
data_non_def = data_non_def.sample(n=len(data_def), random_state=123)
data = pd.concat([data_def, data_non_def])

For this analysis, we'll focus on how the loan amount and interest rate affect the risk of default. The data is split into train and test sets with an 80:20 ratio.

from sklearn.model_selection import train_test_split

X = data[["Amount", "Rate"]]
y = data["Default"]
y = y.apply(lambda x: 1 if x == "Y" else 0)

X_train, X_test, y_train, y_test = train_test_split(X,
                                                    y,
                                                    test_size=0.2,
                                                    stratify=y,
                                                    random_state=123)

Before running the logistic regression, the input variables need to be standardized, as they have significantly different scales: the loan amounts are in the thousands while the interest rates are percentages. Standardization subtracts the mean from the data and divides it by its standard deviation.

from sklearn.preprocessing import StandardScaler

def scaler_transform(data, scaler=StandardScaler()):
    # fit the scaler and return the standardized data as a DataFrame
    scaler.fit(data)
    data_scaled = scaler.transform(data)
    data_scaled = pd.DataFrame(data_scaled)
    data_scaled.columns = data.columns
    data_scaled.index = data.index
    return data_scaled

X_train_scaled = scaler_transform(data=X_train)
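One detail worth noting, not shown in the snippet above: the test data should be standardized with the statistics fitted on the training data rather than re-fitted on the test set. A minimal sketch, assuming the same variable names and imports as above:

# Fit the scaler on the training data only, then reuse it for the test data
scaler = StandardScaler()
scaler.fit(X_train)

X_train_scaled = pd.DataFrame(scaler.transform(X_train),
                              columns=X_train.columns, index=X_train.index)
X_test_scaled = pd.DataFrame(scaler.transform(X_test),
                             columns=X_test.columns, index=X_test.index)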

Next, the best parameters for logistic regression can be determined using Grid Search:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
import numpy as np

logreg = LogisticRegression(random_state=123)
parameters = {'solver': ['liblinear', 'saga'],
              'C': np.logspace(0, 10, 10)}
logreg_cv = GridSearchCV(estimator=logreg,
                         param_grid=parameters,
                         cv=5)
logreg_cv.fit(X=X_train_scaled, y=y_train)

Use the best parameters to build the model, fit it to our data, and check its performance on the training data:

from sklearn.metrics import confusion_matrix, classification_report

logreg = LogisticRegression(C=logreg_cv.best_params_['C'],
                            solver=logreg_cv.best_params_['solver'],
                            random_state=123)

logreg.fit(X_train_scaled, y_train)
y_pred_train = logreg.predict(X_train_scaled)

print(confusion_matrix(y_true=y_train, y_pred=y_pred_train))
print(classification_report(y_true=y_train,
                            y_pred=y_pred_train,
                            target_names=["Not default", "default"]))

Result:

It shows that our model has above 80% precision for both the default and not-default classes. This looks like a good result, but it still needs to be validated on the test data:
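The original post shows the test-set results as an image. A minimal sketch of that evaluation, assuming the test features were scaled with the scaler fitted on the training data (X_test_scaled, as sketched earlier), might look like this:

# Evaluate the fitted model on the held-out test data
y_pred_test = logreg.predict(X_test_scaled)

print(confusion_matrix(y_true=y_test, y_pred=y_pred_test))
print(classification_report(y_true=y_test,
                            y_pred=y_pred_test,
                            target_names=["Not default", "default"]))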

The model seems to work well on the test data too, performing even slightly better for the not-default class at 91% precision. Since the model's performance looks good enough, let's visualize it on our data:
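The visualization itself appears as an image in the original post. A rough sketch of how such a plot could be produced with matplotlib, plotting the standardized training points and the line where the predicted default probability equals 0.5, is shown below (variable names follow the earlier snippets):

import matplotlib.pyplot as plt
import numpy as np

# Scatter the standardized training data, colored by class (0 = not default, 1 = default)
plt.scatter(X_train_scaled["Amount"], X_train_scaled["Rate"],
            c=y_train, cmap="coolwarm", alpha=0.5)

# Decision boundary: b0 + b1*Amount + b2*Rate = 0  =>  Rate = -(b0 + b1*Amount) / b2
b0 = logreg.intercept_[0]
b1, b2 = logreg.coef_[0]
amounts = np.linspace(X_train_scaled["Amount"].min(),
                      X_train_scaled["Amount"].max(), 100)
plt.plot(amounts, -(b0 + b1 * amounts) / b2, "k--", label="Decision boundary")

plt.xlabel("Standardized loan amount")
plt.ylabel("Standardized interest rate")
plt.legend()
plt.show()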

The black dashed line is the decision boundary generated by the logistic regression. It indicates that when the standardized interest rate is above 0, i.e., above average, the loan is more likely to default.

The code used is available on GitHub.

Using logistic regression, it was found that the interest rate has a more pronounced effect on default risk than the loan amount. As the interest rate rises above average, the loan will most probably default. In contrast, a larger loan amount at the same interest rate doesn't really increase the default risk; in fact, those with higher loan amounts can handle a slightly higher interest rate before risking default.
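One simple way to check this quantitatively, not shown in the original post, is to compare the fitted coefficients, since both inputs were standardized to the same scale:

# Compare standardized coefficients: a larger absolute value means a stronger
# effect on the log-odds of default (names assume the model fitted above)
for name, coef in zip(X_train_scaled.columns, logreg.coef_[0]):
    print(f"{name}: {coef:.3f}")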

However, this doesn't necessarily mean that lowering the interest rate will lower the risk of default, because creditors often assign higher interest rates to individuals with an already poor credit history, such as those with late payments in the past. Therefore, more information on the past credit behavior of these individuals is needed for a better understanding.
