The objective of this exercise is to develop a deep learning model (a multi-layer perceptron) that predicts the probability that a customer defaults on a pre-approved loan. A pre-approved loan is one for which the lender has already evaluated the financial standing and credit history of the applicant. Hence the processing time for the loan is short and the disbursal is quick.
There are 3 stages in model explainability:
· Feature selection (Section 2)
· Model selection (Section 3)
· Explainable output (Section 4)
There are 68,294 observations and 10 columns (please refer to Appendix A): 4 categorical variables, 5 numerical variables, and 1 dependent variable.
The model is developed on 68,294 observations. There are 61,770 (90.4%) non-defaults and 6,524 (9.6%) defaults.
Categories with similar default rates are combined. The chart below shows the default rate for the combined categories.

Decision trees are used to bin the numerical variables (the maximum depth of the tree is 3 and the minimum number of samples per leaf is 1,360). Bins with similar default rates are then combined. The chart below shows the default rate for the combined bins.
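The binning step can be sketched as follows. This is a minimal illustration, assuming scikit-learn's DecisionTreeClassifier is fitted on one numerical variable at a time. The helper name tree_bin_edges and the synthetic score data are hypothetical and are not taken from the original analysis.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def tree_bin_edges(values, target, max_depth=3, min_samples_leaf=1360):
    """Return the sorted split thresholds learned by a shallow decision tree."""
    tree = DecisionTreeClassifier(
        max_depth=max_depth, min_samples_leaf=min_samples_leaf, random_state=0
    )
    tree.fit(np.asarray(values, dtype=float).reshape(-1, 1), target)
    # Internal nodes carry a feature index >= 0; leaves are marked with -2.
    thresholds = tree.tree_.threshold[tree.tree_.feature >= 0]
    return np.sort(thresholds)


# Synthetic data (assumption): the default probability falls as the score rises.
rng = np.random.default_rng(0)
score = rng.uniform(300, 850, size=20000)
prob_default = 1.0 / (1.0 + np.exp((score - 500.0) / 60.0))
default = (rng.uniform(size=20000) < prob_default).astype(int)

edges = tree_bin_edges(score, default)
# right=True mirrors the tree's "value <= threshold goes left" rule.
bins = np.digitize(score, edges, right=True)

for b in np.unique(bins):
    print(b, round(default[bins == b].mean(), 3))
```

A tree of depth 3 produces at most 8 bins per variable; neighbouring bins with similar default rates can then be merged by hand.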

Amount and Duration are not used in the model because their default rates are not monotonic across the bins.
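A monotonicity check of this kind can be sketched as below. The function name is_monotonic and the example default rates are hypothetical; they only illustrate the test and are not the rates observed in the data.

```python
import numpy as np


def is_monotonic(rates):
    """True if the default rates never change direction across ordered bins."""
    diffs = np.diff(np.asarray(rates, dtype=float))
    return bool(np.all(diffs >= 0) or np.all(diffs <= 0))


# Made-up default rates per ordered bin, for illustration only.
falling_rates = [0.30, 0.18, 0.09, 0.04]  # steadily decreasing
mixed_rates = [0.08, 0.12, 0.07]          # rises, then falls

print(is_monotonic(falling_rates))
print(is_monotonic(mixed_rates))
```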
A correlation matrix is used to check whether the independent variables are correlated with each other. It is observed that all pairwise correlations between the independent variables lie between -0.7 and 0.7. Hence multicollinearity is not a concern.
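The correlation check can be sketched as follows, assuming pandas and the Pearson correlation (the default of DataFrame.corr; the original text does not state which coefficient was used). The helper high_correlation_pairs, the synthetic data and the column Score_copy are hypothetical.

```python
import numpy as np
import pandas as pd


def high_correlation_pairs(df, threshold=0.7):
    """List column pairs whose absolute correlation reaches the threshold."""
    corr = df.corr()  # Pearson by default (assumption)
    cols = list(corr.columns)
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = float(corr.iloc[i, j])
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], r))
    return pairs


# Synthetic data: Score and Month are independent; Score_copy is a
# deliberately redundant column added to show what a flagged pair looks like.
rng = np.random.default_rng(0)
score = rng.normal(600, 50, size=1000)
df = pd.DataFrame(
    {
        "Score": score,
        "Month": rng.integers(1, 13, size=1000),
        "Score_copy": score + rng.normal(0, 5, size=1000),
    }
)

pairs = high_correlation_pairs(df)
print(pairs)
```

An empty list corresponds to the situation described above, where no pair of variables reaches the 0.7 threshold.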

Deep learning aims to let computers learn patterns from data, loosely in the way humans learn from examples. Deep learning is a part of machine learning, which includes supervised, unsupervised and semi-supervised algorithms. A deep learning model consists of a stack of layers, each made up of neurons and an activation function.
· Supervised algorithm: The algorithm is taught using inputs and outputs. Here the output is the label identifying default and non-default.
· Feature extraction: Extracting the most valuable features.

GridSearchCV is used for tuning the hyper-parameters of an estimator. It exhaustively considers all parameter combinations. The GridSearchCV instance implements the usual estimator API: when “fitting” it on a dataset, all the possible combinations of parameter values are evaluated and the best combination is retained.
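A grid search over a multi-layer perceptron can be sketched as follows. The parameter grid, the scoring metric, the number of folds and the synthetic data are assumptions made for illustration; the grid that was actually searched is not listed here.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# Short training runs may stop before convergence; silence that warning.
warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Synthetic, imbalanced data standing in for the loan data (assumption).
X, y = make_classification(
    n_samples=400, n_features=4, n_informative=3, n_redundant=0,
    weights=[0.9, 0.1], random_state=0,
)

# Hypothetical grid: 2 x 2 x 2 = 8 combinations are evaluated.
param_grid = {
    "hidden_layer_sizes": [(4,), (4, 4, 4)],
    "activation": ["logistic", "relu"],
    "solver": ["lbfgs", "adam"],
}

search = GridSearchCV(
    MLPClassifier(max_iter=300, random_state=0),
    param_grid,
    scoring="roc_auc",  # assumption: any scikit-learn scorer can be used
    cv=3,
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

After fitting, search.best_estimator_ holds the model refitted on all the data with the retained combination.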
The optimal hyper-parameters are determined using an iterative process (please refer to Appendix B). The best hyper-parameters are:
· hidden_layer_sizes: 4,4,4 (The ith element represents the number of neurons in the ith hidden layer). There are 3 hidden layers in the model and each layer has 4 neurons.
· activation: logistic (Activation function for the hidden layer. The logistic sigmoid function, returns f(x) = 1 / (1 + exp(-x)))
· solver: lbfgs (The solver for weight optimization. The optimizer lbfgs is from the family of quasi-Newton methods)
· The cut-off is set at 0.55. The cut-off is chosen based on accuracy.
· The model has accuracy of 0.933 (Accuracy = (True positives + True negatives) / All observations)
· The model has F1 score of 0.518 (F1 = 2 * (precision * recall) / (precision + recall))
· The model has AUROC of 0.803 (measures how well the model is able to distinguish between good and bad)
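The cut-off selection and the three metrics above can be sketched as follows. The helper best_cutoff_by_accuracy, the candidate cut-off grid and the synthetic scores are hypothetical, so the printed values will not reproduce the figures reported above.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score


def best_cutoff_by_accuracy(y_true, pd_scores, cutoffs=None):
    """Return the cut-off with the highest accuracy and that accuracy."""
    if cutoffs is None:
        cutoffs = np.round(np.arange(0.05, 1.00, 0.05), 2)
    # A case is labelled as a default (1) when its PD exceeds the cut-off.
    accuracies = [
        accuracy_score(y_true, (pd_scores > c).astype(int)) for c in cutoffs
    ]
    best = int(np.argmax(accuracies))
    return float(cutoffs[best]), float(accuracies[best])


# Synthetic labels and predicted PDs (assumption), roughly 10% defaults.
rng = np.random.default_rng(0)
y_true = (rng.uniform(size=5000) < 0.096).astype(int)
pd_scores = np.clip(0.08 + 0.5 * y_true + rng.normal(0, 0.15, 5000), 0, 1)

cutoff, accuracy = best_cutoff_by_accuracy(y_true, pd_scores)
y_pred = (pd_scores > cutoff).astype(int)

print("cut-off:", cutoff)
print("accuracy:", round(accuracy, 3))
print("F1:", round(f1_score(y_true, y_pred), 3))
print("AUROC:", round(roc_auc_score(y_true, pd_scores), 3))
```

AUROC is computed from the predicted probabilities rather than the labels, so it does not depend on the chosen cut-off.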
For each of the binned variables, the imputed value and the direction are shown below. It is observed that there is a monotonic trend for all the independent variables.

There are 56 combinations (7 x 2 x 2 x 2 = 56): Score (7 bins), Acc type (2 bins), Payment (2 bins) and Month (2 bins). For each combination, the predicted PD is calculated (please refer to Appendix C).
Since only categorical variables and binned numerical variables are used in the model, it is possible to enumerate all the possible combinations of the input data. For each combination of the input data, the predicted PD is calculated.
· Inputs: Score (TU, numerical bin), Account type (category), Payment type (category) and Month (numerical bin) are selected from the drop-down menu
· Output: The probability of default (PD) and the decision. If PD > 0.55 then BAD, else GOOD.
· Interpretability: Each observation gets its own predicted values. This helps to explain why a case receives its prediction and the contributions (direction) of the predictors.
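The enumeration of input combinations and the decision rule can be sketched as follows. The bin labels and the function toy_pd are placeholders invented for illustration; in practice the PD for each combination would come from the fitted model.

```python
from itertools import product

# Placeholder bin labels (assumption); only the bin counts come from the text.
score_bins = [f"score_bin_{i}" for i in range(1, 8)]  # 7 bins
acc_type_bins = ["acc_type_group_1", "acc_type_group_2"]
payment_bins = ["payment_group_1", "payment_group_2"]
month_bins = ["month_bin_1", "month_bin_2"]

# Cartesian product of all bins: 7 x 2 x 2 x 2 = 56 combinations.
grid = list(product(score_bins, acc_type_bins, payment_bins, month_bins))

CUTOFF = 0.55


def decision(pd_value, cutoff=CUTOFF):
    """Apply the rule: PD above the cut-off is BAD, otherwise GOOD."""
    return "BAD" if pd_value > cutoff else "GOOD"


def toy_pd(combo):
    """Placeholder PD that only depends on the score bin (not the real model)."""
    return 0.9 - 0.1 * score_bins.index(combo[0])


# Lookup table: one predicted PD and one decision per combination.
lookup = {combo: (toy_pd(combo), decision(toy_pd(combo))) for combo in grid}

print(len(lookup))
print(lookup[grid[0]])
```

Because every possible input maps to one row of this table, the drop-down selection only needs to read the stored PD and decision.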

Categorical Variables:
· id: the account id of the applicant (not used in model development)
· Payment: the manner of payment code (1, 2, 3, 4, 5, 7, 8, 9, 8A, 8P, 9B, 9P and UR)
· Acc type: the account type code (A, B, C, D, F, G, H, I, L, M, N, P, R, S, T and U)
· Pay type: the payment type code (B and U)
Numerical Variables:
· Year: the year when the loan was taken (not used in model development)
· Month: the month when the loan was taken
· Score: the TransUnion (TU) score (external score)
· Amount: the loan amount in USD
· Duration: the duration of the loan in months
Hyper-parameters control how strongly the model over-fits or under-fits. The model is evaluated for each proposed hyper-parameter setting, and the setting that gives the best model is selected.

There are 56 combinations. For each combination, the predicted PD is calculated.
