
Deep Dive into PFI for Model Interpretability | by Tiago Toledo Jr. | Jul, 2023

Another interpretability tool for your toolbox


Knowing how to assess your model is essential for your work as a data scientist. No one will sign off on your solution if you’re not able to fully understand and communicate it to your stakeholders. This is why knowing interpretability methods is so important.

The lack of interpretability can kill a very good model. I haven’t developed a model where my stakeholders weren’t interested in understanding how the predictions were made. Therefore, knowing how to interpret a model and communicate it to the business is an essential ability for a data scientist.

In this post, we're going to explore Permutation Feature Importance (PFI), a model-agnostic methodology that can help us identify the most important features of our model and, therefore, communicate better what the model is considering when making its predictions.

The PFI method estimates how important a feature is for model results based on what happens to the model's performance when we break that feature's connection to the target variable.

To do that, for each feature whose importance we want to analyze, we randomly shuffle it while keeping all the other features and the target unchanged.

This makes the feature useless for predicting the target, since we broke the relationship between them by changing their joint distribution.
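
To make this concrete, here is a minimal toy sketch (the data and the generator seed are illustrative, not part of the article's experiment): shuffling a column preserves its marginal distribution but destroys its relationship with the target.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2 * x + rng.normal(scale=0.1, size=1000)  # y depends strongly on x

print(np.corrcoef(x, y)[0, 1])           # close to 1: strong relationship
x_shuffled = rng.permutation(x)
print(np.corrcoef(x_shuffled, y)[0, 1])  # close to 0: relationship destroyed

# The shuffled column still contains exactly the same values,
# so its marginal distribution is untouched
print(np.array_equal(np.sort(x), np.sort(x_shuffled)))  # True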

Then, we can use our model to predict on the shuffled dataset. The amount of performance reduction will indicate how important that feature is.

The algorithm then looks something like this:

  • We train a model on a training dataset and then assess its performance on both the training and the test datasets
  • For each feature, we create a new dataset where that feature is shuffled
  • We then use the trained model to predict the output for the new dataset
  • The ratio of the new error metric to the original one gives us the importance of the feature

Notice that if a feature is not important, the performance of the model should not vary much. If it is important, the performance should suffer considerably.
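
For example, if the model's original error is 0.05 and the error rises to 0.20 after shuffling a feature, that feature's importance ratio is 0.20 / 0.05 = 4. A useless feature would score close to 1, since shuffling it barely changes the error.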

Now that we know how to calculate the PFI, how do we interpret it?

It depends on which fold we are applying the PFI to. We usually have two options: applying it to the training or the test dataset.

During training, our model learns the patterns of the data and tries to represent it. Of course, during training, we have no idea of how well our model generalizes to unseen data.

Therefore, by applying the PFI to the training dataset, we are going to see which features were the most relevant for the model to learn its representation of the data.

In business terms, this indicates which features were the most important for the model construction.

Now, if we apply the method to the test dataset, we are going to see the feature impact on the generalization of the model.

Let’s think about it. If we see the performance of the model go down in the test set after we shuffled a feature, it means that that feature was important for the performance on that set. Since the test set is what we use to test generalization (if you’re doing everything right), then we can say that it is important for generalization.

The PFI analyzes the effect of a feature on your model's performance; therefore, it does not state anything about the raw data. If your model's performance is poor, any relation you find with PFI will be meaningless.

This is true for both sets: if your model is underfitting (low predictive power on the training set) or overfitting (low predictive power on the test set), then you cannot take insights from this method.

Also, when two features are highly correlated, the PFI can mislead your interpretation. If you shuffle one feature but the required information is encoded in another one, the performance may not suffer at all, which would make you think the feature is useless even though it may not be.
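
To see this caveat in action, here is a small illustrative sketch (the duplicated column, the 50-tree forest, and evaluating on the training data are assumptions made for the demonstration, not part of the article's experiment): we append an exact copy of one informative Iris feature, so shuffling a single copy tends to hurt performance much less than the feature's real relevance would suggest.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_dup = np.column_stack([X, X[:, 2]])  # column 4 is an exact copy of column 2

rf_dup = RandomForestClassifier(n_estimators=50, random_state=32).fit(X_dup, y)

X_perm = X_dup.copy()
X_perm[:, 2] = np.random.permutation(X_perm[:, 2])  # shuffle only one copy

print(accuracy_score(y, rf_dup.predict(X_dup)))   # baseline accuracy
print(accuracy_score(y, rf_dup.predict(X_perm)))  # tends to drop far less than
                                                  # it should: the intact twin
                                                  # still carries the signal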

To implement the PFI in Python, we must first import the required libraries. We are going to use mainly numpy, pandas, tqdm, and sklearn, plus matplotlib for plotting:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from tqdm import tqdm
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes, load_iris
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.metrics import accuracy_score, r2_score

Now, we must load our dataset, which is going to be the Iris dataset. Then, we’re going to fit a Random Forest to the data.

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=12, shuffle=True
)

rf = RandomForestClassifier(
    n_estimators=3, random_state=32
).fit(X_train, y_train)

With our model fitted, let’s analyze its performance to see if we can safely apply the PFI to see how the features impact our model:

print(accuracy_score(y_train, rf.predict(X_train)))
print(accuracy_score(y_test, rf.predict(X_test)))

We can see we achieved a 99% accuracy on the training set and a 95.5% accuracy on the test set. Looks good for now. Let’s get the original error scores for a later comparison:

original_error_train = 1 - accuracy_score(y_train, rf.predict(X_train))
original_error_test = 1 - accuracy_score(y_test, rf.predict(X_test))

Now let's calculate the permutation scores. It is usual to repeat the shuffle for each feature several times, so we get a distribution of scores per feature and avoid conclusions based on a single lucky (or unlucky) shuffle. In our case, let's do 10 repetitions per feature:

n_steps = 10

feature_values = {}
for feature in tqdm(range(X.shape[1])):
    # We will save each new performance point for each feature
    errors_permuted_train = []
    errors_permuted_test = []

    for step in range(n_steps):
        # We grab the data again because the np.random.shuffle function shuffles in place
        X, y = load_iris(return_X_y=True)
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, random_state=12, shuffle=True
        )
        np.random.shuffle(X_train[:, feature])
        np.random.shuffle(X_test[:, feature])

        # Apply our previously fitted model on the new data to get the performance
        errors_permuted_train.append(1 - accuracy_score(y_train, rf.predict(X_train)))
        errors_permuted_test.append(1 - accuracy_score(y_test, rf.predict(X_test)))

    feature_values[f'{feature}_train'] = errors_permuted_train
    feature_values[f'{feature}_test'] = errors_permuted_test

We now have a dictionary with the error for each shuffle. Next, let's build a table that has, for each feature in each fold, the mean and standard deviation of the error relative to the original error of our model:

rows = []
for feature in feature_values:
    if 'train' in feature:
        # Lists don't support element-wise division, so convert to an array first
        aux = np.array(feature_values[feature]) / original_error_train
        fold = 'train'
    elif 'test' in feature:
        aux = np.array(feature_values[feature]) / original_error_test
        fold = 'test'

    rows.append({
        'feature': feature.replace(f'_{fold}', ''),
        'fold': fold,
        'mean': np.mean(aux),
        'std': np.std(aux),
    })

# DataFrame.append was removed in pandas 2.0, so we build the frame from a list of rows
PFI = pd.DataFrame(rows)

PFI = PFI.pivot(
    index='feature', columns='fold', values=['mean', 'std']
).reset_index().sort_values(('mean', 'test'), ascending=False)

We will end up with a table that has one row per feature and, for each fold, the mean and standard deviation of the error ratio.

In it, we can see that feature 2 seems to be the most important feature in our dataset for both folds, followed by feature 3. Since we are not fixing the random seed for numpy's shuffle function, these numbers can vary between runs; call np.random.seed before the loop if you need reproducible results.

We can then plot the importances to visualize them better:
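
The original chart is not reproduced here, but one way to draw it is a grouped bar chart with the standard deviations as error bars, reusing the PFI frame and the matplotlib import from above (a minimal sketch, not the article's original figure):

x_pos = np.arange(len(PFI))
width = 0.35

fig, ax = plt.subplots()
ax.bar(x_pos - width / 2, PFI[('mean', 'train')], width,
       yerr=PFI[('std', 'train')], label='train')
ax.bar(x_pos + width / 2, PFI[('mean', 'test')], width,
       yerr=PFI[('std', 'test')], label='test')
ax.set_xticks(x_pos)
ax.set_xticklabels(PFI[('feature', '')])  # the pivoted frame keeps 'feature' as a MultiIndex column
ax.set_xlabel('feature')
ax.set_ylabel('error ratio (permuted / original)')
ax.legend()
plt.show()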

The PFI is a simple methodology that can help you quickly identify the most important features. Go ahead and try applying it to a model you're developing to see how it behaves.
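
If you'd rather not maintain the loop by hand, scikit-learn ships its own implementation in sklearn.inspection.permutation_importance. Note that it reports the mean decrease in score per feature rather than the error ratio we computed above, so the numbers differ in scale even though the ranking is usually similar (n_repeats and random_state below are illustrative choices):

from sklearn.inspection import permutation_importance

# Reuses rf, X_test and y_test from the code above
result = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=42)

for i in result.importances_mean.argsort()[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f} "
          f"+/- {result.importances_std[i]:.3f}")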

But also be aware of the method's limitations. Not knowing where a method falls short can lead you to incorrect interpretations.

Also, notice that the PFI shows the importance of a feature but does not state in which direction it influences the model output.

So, tell me, how are you going to use this in your next models?

Stay tuned for more posts about interpretability methods that can improve your overall understanding of a model.


