
## SHAP, Lime, Explainable Boosting Machine, Saliency maps, TCAV, Distillation, Counterfactual, and interpretML

ML/AI models are getting more complex and challenging to interpret and explain. A simple, easy-to-explain regression or decision tree model can no longer fully satisfy technical and business needs. More and more people use ensemble methods and deep neural networks to get better predictions and accuracy. However, those more complex models are hard to explain, debug, and understand. Thus, many people call these models black-box models.

When we train an ML/AI model, we often focus on technical details like step size, layers, early stopping, dropout, dilation, and so on. We don’t really know why our model behaves a certain way. For example, consider a credit risk model. Why does our model assign a certain score to an individual? What features does our model rely on? Does our model heavily rely on a feature that is incorrect? Even if our model does not take race and gender as input features, does our model infer those attributes from other features and introduce biases toward certain groups? Can stakeholders understand and trust the model behavior? Can the model provide people guidelines on how to improve their credit scores? Model interpretation and explanation can offer insights into these questions, help us debug the model, mitigate bias, and establish transparency and trust.

There has been an increasing interest in machine learning model interpretability and explainability. Researchers and ML practitioners have designed many explanation techniques. In this article, we will provide a high-level overview of eight popular model explanation techniques and tools including **SHAP, Lime, Explainable Boosting Machine, Saliency maps, TCAV, Distillation, Counterfactual, and interpretML**.

## SHAP

“SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions.”

Figure 1 shows the gist of how SHAP works. Assume the base rate, i.e., the expected model output E[f(x)] over the background data, is 0.1. Now we have a new observation with the features age=65, sex=F, BP=180, and BMI=40, and the model predicts 0.4 for it. How do we explain the difference between the output value of 0.4 and the base rate of 0.1? This is where Shapley values come into play:

- Let’s start with the base rate of 0.1.
- Adding the BMI Shapley value of 0.1, we get 0.2.
- Adding the BP Shapley value of 0.1, we get 0.3.
- Adding the Sex Shapley value of -0.3, we get 0.
- Adding the Age Shapley value of 0.4, we get 0.4.

Thanks to the additive nature of Shapley values, these per-feature contributions sum exactly to the model prediction of 0.4 for this new observation. The SHAP values quantify the importance of each feature and explain how the model arrives at its prediction.
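The additive walkthrough above can be sketched in code. Below is a minimal, from-scratch illustration of exact Shapley values for a toy three-feature model; the model, feature values, background values, and coefficients are all made up for illustration (the `shap` library handles real models far more efficiently):

```python
# Toy exact Shapley values, computed by averaging a feature's marginal
# contribution over all coalitions of the remaining features.
from itertools import combinations
from math import factorial

FEATURES = ["age", "bp", "bmi"]


def model(present, x, background):
    # Value function: model output when only the `present` features take
    # their actual values; absent features fall back to the background.
    vals = {f: (x[f] if f in present else background[f]) for f in FEATURES}
    # Illustrative "model": a simple weighted sum of the features.
    return 0.01 * vals["age"] + 0.002 * vals["bp"] + 0.005 * vals["bmi"]


def shapley(feature, x, background):
    others = [f for f in FEATURES if f != feature]
    n = len(FEATURES)
    total = 0.0
    for size in range(len(others) + 1):
        for subset in combinations(others, size):
            # Coalition weight: |S|! * (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            gain = (model(set(subset) | {feature}, x, background)
                    - model(set(subset), x, background))
            total += weight * gain
    return total


x = {"age": 65, "bp": 180, "bmi": 40}
background = {"age": 50, "bp": 120, "bmi": 25}

phis = {f: shapley(f, x, background) for f in FEATURES}
base = model(set(), x, background)          # base rate E[f(x)]
full = model(set(FEATURES), x, background)  # prediction for the observation

# Additivity: base rate + sum of Shapley values equals the full prediction.
print(abs(base + sum(phis.values()) - full) < 1e-9)  # True
```

Because the toy model is linear, each feature's Shapley value is simply its coefficient times the gap between its actual and background value; the additivity property, however, holds for any value function.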

How do we calculate Shapley values? Figure 2 shows a partial dependence plot for a linear model, where the x-axis represents the feature value and the y-axis represents the expected model output given that feature value. The Shapley value for a feature is the difference between the expected model output and the partial dependence plot at the feature's given value, i.e., the length of the red line in the figure.

Shapley values can be expensive to compute exactly, since the calculation averages over all possible coalitions of features. The `shap` library uses sampling and optimization techniques to handle the computational complexity and returns straightforward results for tabular data, text data, and even image data (see Figure 3). Install SHAP via `conda install -c conda-forge shap` and give it a try.

## Lime

Models can be complex globally. Instead of trying to capture the overall complex model behavior, Lime (Local Interpretable Model-agnostic Explanations) focuses on a local region and fits a linear approximation that reflects the model's behavior around the instance being predicted.

Figure 4 illustrates how Lime works. The blue/pink background represents the decision function of the original complex model. The red cross (we call it X) is the instance/new observation that we’d like to predict and explain.

- Sample points around X
- Use the original model to predict each sampled point
- Weight samples according to their proximity to X (points with larger weights correspond to a larger size in the figure)
- Fit a linear model (dashed line) on weighted samples
- Use this linear model to explain locally around X

With Lime, we can explain model behavior locally for tabular data, text data, and image data. Here is an example using Lime to explain a text classifier. We can see that this classifier predicts the instance correctly, but for the wrong reasons.
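The five steps above can be sketched from scratch in one dimension: sample around the instance, query the black box, weight samples by proximity, and fit a weighted linear model. The black-box function, kernel width, and sample count below are all illustrative, not Lime's actual defaults:

```python
# A minimal from-scratch sketch of the Lime idea in one dimension:
# approximate a nonlinear black box locally with a weighted linear fit.
import math
import random


def black_box(x):
    return math.sin(x)  # stands in for a complex model


def lime_1d(x0, n_samples=500, kernel_width=0.5, seed=0):
    rng = random.Random(seed)
    xs = [x0 + rng.gauss(0, 1.0) for _ in range(n_samples)]   # 1. sample around x0
    ys = [black_box(x) for x in xs]                           # 2. query the model
    ws = [math.exp(-((x - x0) ** 2) / kernel_width ** 2)      # 3. proximity weights
          for x in xs]
    # 4. weighted least-squares fit of y = a + b * x
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    b = (sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
         / sum(w * (x - mx) ** 2 for w, x in zip(ws, xs)))
    a = my - b * mx
    return a, b  # 5. the local linear explanation


a, b = lime_1d(0.0)
# Near x0 = 0, sin(x) is approximately x, so the local slope is close to 1.
print(b)
```

The fitted slope `b` is the local explanation: it tells us how the black box responds to the feature near this particular instance, even though globally `sin` is anything but linear.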

To learn more about Lime, check out the GitHub page and install it via `pip install lime`.

## Explainable Boosting Machine

“Explainable Boosting Machine (EBM) is a tree-based, cyclic gradient boosting Generalized Additive Model with automatic interaction detection. EBMs are often as accurate as state-of-the-art blackbox models while remaining completely interpretable.”

EBM works as follows:

- In each iteration, we train a bagged, gradient-boosted tree on one feature at a time in a round-robin fashion: train on the first feature, update the residuals, train on the second feature on those residuals, and continue until every feature has been trained on.
- Then we repeat this process for many iterations.
- Because EBM cycles through the features one at a time, it can show the contribution of each feature to the final prediction.

EBM is implemented in interpretML, which we will cover later in this article.
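The cyclic training loop described above can be sketched from scratch. This toy version replaces bagged gradient-boosted trees with a single one-split regression stump per feature per round, and the data, learning rate, and round count are all illustrative:

```python
# A toy sketch of EBM-style cyclic, one-feature-at-a-time boosting.
import random


def stump_fit(xs, residuals):
    # Fit a one-split regression stump minimizing squared error.
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv = sum(left) / len(left) if left else 0.0
        rv = sum(right) / len(right) if right else 0.0
        err = sum((r - (lv if x <= t else rv)) ** 2
                  for x, r in zip(xs, residuals))
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def ebm_fit(X, y, n_rounds=30, lr=0.3):
    shapes = [[] for _ in X[0]]            # one shape function per feature
    pred = [0.0] * len(y)
    for _ in range(n_rounds):
        for j in range(len(X[0])):         # round-robin over features
            residuals = [yi - pi for yi, pi in zip(y, pred)]
            f = stump_fit([row[j] for row in X], residuals)
            shapes[j].append(f)
            pred = [p + lr * f(row[j]) for p, row in zip(pred, X)]
    return shapes, lr


def ebm_predict(shapes, lr, row):
    # Each feature's contribution is directly readable: the interpretability.
    contribs = [lr * sum(f(x) for f in fs) for fs, x in zip(shapes, row)]
    return sum(contribs), contribs


# Toy additive data: y = 2 * 1{x0 > 0.5} + 1 * 1{x1 > 0.5}
random.seed(0)
X = [[random.random(), random.random()] for _ in range(100)]
y = [2.0 * (a > 0.5) + 1.0 * (b > 0.5) for a, b in X]

shapes, lr = ebm_fit(X, y)
yhat, contribs = ebm_predict(shapes, lr, [0.7, 0.9])
print(yhat)  # close to the true value of 3.0
```

Because the final model is a sum of per-feature shape functions, the `contribs` list decomposes any prediction into one readable number per feature, which is exactly what makes EBMs glass-box models.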

## Saliency maps

The saliency map method is widely used to interpret neural network image classifiers. It measures the importance of each pixel and highlights which pixels matter most for the prediction. At a high level, the saliency map takes the gradient (derivative) of a class score with respect to each input pixel and visualizes those gradients (see Figure 6).
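As a minimal illustration of the vanilla-gradient idea, the sketch below computes pixel "saliency" for a toy linear scorer on a 2×2 "image" via finite differences. The model, weights, and image are made up; real saliency methods backpropagate through a neural network instead:

```python
# Toy saliency map: the saliency of a pixel is the magnitude of the class
# score's gradient with respect to that pixel (estimated here by finite
# differences on an illustrative linear "network").
def class_score(image):
    # Toy "network": the score depends strongly on pixel (0, 0).
    w = [[3.0, 0.1], [0.1, 0.1]]
    return sum(w[i][j] * image[i][j] for i in range(2) for j in range(2))


def saliency_map(image, eps=1e-5):
    sal = [[0.0, 0.0], [0.0, 0.0]]
    base = class_score(image)
    for i in range(2):
        for j in range(2):
            bumped = [row[:] for row in image]
            bumped[i][j] += eps
            # |d score / d pixel|, estimated by a forward difference
            sal[i][j] = abs((class_score(bumped) - base) / eps)
    return sal


img = [[0.5, 0.2], [0.8, 0.1]]
sal = saliency_map(img)
# Pixel (0, 0) dominates the saliency map.
print(max((sal[i][j], (i, j)) for i in range(2) for j in range(2))[1])  # (0, 0)
```

Visualizing `sal` as a heatmap over the image is exactly the picture shown in saliency-map figures: bright pixels are the ones the class score is most sensitive to.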

The PAIR Saliency project provides the “framework-agnostic implementation for state-of-the-art saliency methods” including Guided Integrated Gradients, XRAI, SmoothGrad, Vanilla Gradients, Guided Backpropagation, Integrated Gradients, Occlusion, Grad-CAM, and Blur IG.

To learn more about saliency methods, check out the GitHub page and install via `pip install saliency`.

## TCAV

TCAV stands for quantitative Testing with Concept Activation Vectors (CAVs). TCAV “quantifies the degree to which a user-defined concept is important to a classification result–for example, how sensitive a prediction of zebra is to the presence of stripes” (Kim, 2018).

TCAV performs the following steps to determine if a concept is important:

- Define concept activation vectors (Figure 7, steps a–d)

TCAV takes examples of concept images (e.g., images with striped objects) and random images as inputs and retrieves the layer activations for both sets. It then trains a linear classifier to separate the two sets of activations and takes the vector (the CAV) orthogonal to the classifier's decision boundary. This CAV represents the stripe-ness direction in activation space.

- Compute TCAV scores (Figure 7, step e)

TCAV scores are calculated by taking the directional derivative of the class prediction along the CAV. The score represents the model's sensitivity to a specific concept like stripes.

To test whether a concept is statistically meaningful, the same process can be repeated with random vs. random images. We then compare the concept-vs-random TCAV score distribution against the random-vs-random TCAV score distribution, and a two-sided t-test can be performed to test for a difference between the two distributions.
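The two steps above can be sketched with toy numbers. This sketch uses 2-D "activations", replaces the trained linear classifier with the difference of class means as the CAV direction (a simplification), and uses a fixed illustrative gradient; none of it is the real TCAV implementation:

```python
# Toy TCAV score: the fraction of inputs whose class prediction increases
# along the concept direction (all data here is illustrative).
import random

random.seed(0)
# Fake layer activations: concept examples cluster along the first axis.
concept = [(random.gauss(2, 0.5), random.gauss(0, 0.5)) for _ in range(100)]
rand = [(random.gauss(0, 0.5), random.gauss(0, 0.5)) for _ in range(100)]

# CAV: a direction separating concept activations from random ones
# (here: difference of class means instead of a trained linear classifier).
cav = (sum(a for a, _ in concept) / 100 - sum(a for a, _ in rand) / 100,
       sum(b for _, b in concept) / 100 - sum(b for _, b in rand) / 100)


def class_grad(activation):
    # Gradient of the class logit w.r.t. this layer's activations.
    # (A fixed toy gradient: the class is sensitive to the first axis.)
    return (1.0, 0.05)


# TCAV score: fraction of inputs whose directional derivative along the CAV
# is positive, i.e. whose class prediction increases with more of the concept.
inputs = [(random.gauss(1, 1), random.gauss(0, 1)) for _ in range(200)]
score = sum(1 for x in inputs
            if class_grad(x)[0] * cav[0] + class_grad(x)[1] * cav[1] > 0) / 200
print(score)  # 1.0: this class is highly sensitive to the concept
```

A score near 1 means almost every input's prediction would increase if the concept were more present, which is how TCAV quantifies, say, a zebra classifier's sensitivity to stripes.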

To learn more about TCAV, check out the GitHub page and install via `pip install tcav`.

## Distillation

“In machine learning, knowledge distillation is the process of transferring knowledge from a large model to a smaller one.”

In the model explanation context, the large model is the black-box model, i.e., the teacher; the smaller model is the explainer, i.e., the student. The student model tries to mimic the behavior of the teacher model while remaining interpretable.

For example, one can construct a decision tree to approximate the original complex model (Bastani, 2019). The Bastani paper “proposes a model extraction algorithm for learning decision trees — to avoid overfitting, the algorithm generates new training data by actively sampling new inputs and labeling them using the complex model.”
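The extraction idea can be sketched in a few lines: sample new inputs, label them with the black-box teacher, and fit a small interpretable student on those labels. The teacher function below and the depth-1 "tree" student are illustrative stand-ins, not the Bastani et al. algorithm:

```python
# Toy distillation: fit a one-split decision stump (an interpretable
# student) to labels produced by a black-box "teacher".
import math
import random


def teacher(x):
    # Stands in for a complex black-box classifier (smooth score, thresholded).
    return 1 if 1.0 / (1.0 + math.exp(-10.0 * (x - 0.37))) > 0.5 else 0


random.seed(0)
# Actively sample new inputs and label them using the teacher.
samples = [random.random() for _ in range(500)]
labels = [teacher(x) for x in samples]

# Student: the best single-threshold rule, i.e. a depth-1 decision tree.
best_t, best_acc = None, -1.0
for t in sorted(samples):
    acc = sum((1 if x > t else 0) == yi
              for x, yi in zip(samples, labels)) / len(samples)
    if acc > best_acc:
        best_t, best_acc = t, acc

# The student is fully interpretable ("predict 1 when x > best_t") and
# agrees with the teacher almost everywhere.
agreement = sum((1 if x > best_t else 0) == teacher(x)
                for x in [i / 1000 for i in range(1000)]) / 1000
print(agreement > 0.97)  # True
```

The student's single threshold is a human-readable surrogate for the teacher's decision boundary; real extraction methods grow deeper trees the same way, resampling inputs wherever the surrogate is uncertain.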

## Counterfactual

A counterfactual describes the smallest change to the input features needed to flip the model prediction. Counterfactual methods ask a lot of what-if questions: what if we increase this feature, or decrease that one? For example, suppose that according to a black-box model, John has a high risk of heart disease. What if John exercised 5 days a week? What if John were a vegetarian? What if John did not smoke? Would any of those changes flip the model prediction? Such counterfactuals provide easy-to-understand explanations.

There is much research on, and there are many methods for, generating counterfactuals. For example, DiCE (Diverse Counterfactual Explanations) generates a set of diverse feature-perturbed alternatives for the same person: someone whose loan was rejected would have been approved if their income increased by $10,000, or if their income increased by $5,000 and they had one more year of credit history. DiCE optimizes for both diversity and proximity to the original input, and supports user-specified constraints.
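A bare-bones version of counterfactual search can be sketched as follows. The credit "model", feature names, thresholds, and step sizes are all made up, and the greedy one-feature-at-a-time search is a simplification, not DiCE's actual optimization:

```python
# Toy counterfactual search: find a small feature change that flips a
# black-box decision (all numbers here are illustrative).
def approve(person):
    # Stands in for a black-box credit model.
    return person["income"] + 2000 * person["credit_years"] >= 60000


applicant = {"income": 48000, "credit_years": 4}  # rejected: 56000 < 60000


def counterfactual(person, steps):
    # Greedy search: increase one feature at a time by a small step
    # until the decision flips; keep the change needing the fewest steps.
    best = None
    for feature, step in steps.items():
        changed = dict(person)
        n = 0
        while not approve(changed) and n < 100:
            changed[feature] += step
            n += 1
        if approve(changed) and (best is None or n < best[0]):
            best = (n, feature, changed)
    return best


n, feature, changed = counterfactual(
    applicant, {"income": 1000, "credit_years": 1})
print(feature, changed[feature])  # credit_years 6
```

The result reads as a what-if statement: this applicant would have been approved with two more years of credit history, which is cheaper (in steps) than the $4,000 income increase the other path requires.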

To learn more about DiCE, check out the documentation and install via `conda install -c conda-forge dice-ml`.

## interpretML

“InterpretML is an open-source package that incorporates state-of-the-art machine learning interpretability techniques under one roof.”

interpretML provides explanations for glass-box models, including:

- explainable boosting machine,
- linear model,
- decision tree,
- and decision rule.

It also provides explanations for black-box models using

- Shapley additive explanations,
- local interpretable model-agnostic explanations,
- partial dependence plot,
- and Morris sensitivity analysis.

The results of interpretML can be displayed in a Plotly dashboard with a nice interactive interface.

To learn more about interpretML, check out the documentation and install via `conda install -c interpretml interpret`.

Overall, we have walked through a high-level overview of some of the popular model explainability techniques and tools including SHAP, Lime, Explainable Boosting Machine, Saliency maps, TCAV, Distillation, Counterfactual, and interpretML. Each technique comes with its own variations. We will get into each technique in detail in the future.

## References

- https://shap.readthedocs.io/
- “How do I fool you?”: Manipulating User Trust via Misleading Black Box Explanations. Himabindu Lakkaraju, Osbert Bastani. AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES), 2020.
- Faithful and Customizable Explanations of Black Box Models. Himabindu Lakkaraju, Ece Kamar, Rich Caruana, Jure Leskovec. AAAI/ACM Conference on Artificial Intelligence, Ethics, and Society (AIES), 2019.
- https://explainml-tutorial.github.io/neurips20
- https://christophm.github.io/interpretable-ml-book
- https://homes.cs.washington.edu/~marcotcr/blog/lime/
- “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.
- Interpreting Blackbox Models via Model Extraction. Osbert Bastani, Carolyn Kim, Hamsa Bastani. 2019.
- Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV). Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres. 2018.
- https://www.youtube.com/watch?v=Ff-Dx79QEEY
- https://pair-code.github.io/saliency/
- https://interpret.ml/
- Explaining Machine Learning Classifiers through Diverse Counterfactual Explanations. Ramaravind K. Mothilal, Amit Sharma, Chenhao Tan. 2019.

. . .

By Sophia Yang on August 23, 2022.

Sophia Yang is a Senior Data Scientist at Anaconda. Connect with me on LinkedIn, Twitter, and YouTube and join the DS/ML Book Club ❤️
