Explainable artificial intelligence (XAI) is a set of processes and methods that allows human users to comprehend and trust the results and output created by machine learning algorithms. Researchers have been developing tools and techniques to promote explainable AI, and here we look at some of the ones that demystify it.

This article was originally published by Neptune.

Imagine that you have to present your newly built facial recognition feature to the technical heads of a SaaS product. The presentation goes relatively well until the CTO asks you “so what exactly goes on inside?” and all you can say is “nobody knows, it’s a black box”.

Pretty soon, other stakeholders would start to worry. “How can we trust something, if we don’t know what it does?”.

It’s a valid concern. For a long time, ML models were universally viewed as black boxes because we couldn’t explain what happened to the data between the input and the output. But now, we have explainability.

In this article, we’re going to explain explainability, explore why it’s necessary, and talk about techniques and tools that simplify explainability.

Explainability Black Box
ML Black Box | Source: Author

What is Explainability in ML, and What is Explainable AI (XAI)?

Explainability in machine learning means that you can explain what happens in your model from input to output. It makes models transparent and solves the black box problem.

Explainable AI (XAI) is the more formal way to describe this and applies to all artificial intelligence. XAI means methods that help human experts understand solutions developed by AI.

‘Explainability’ and ‘interpretability’ are often used interchangeably. They share the same goal (understanding the model), but they are not quite the same thing.

In his book, “Interpretable Machine Learning”, Christoph Molnar defines interpretability as the degree to which a human can understand the cause of a decision or the degree to which a human can consistently predict ML model results.

Take an example: you’re building a model that predicts pricing trends in the fashion industry. The model might be interpretable — you can see what you’re doing. But it’s not explainable yet. It will be explainable once you dig into the data and features behind the generated results. Understanding what features contribute to the model’s prediction and why they do is what explainability is all about.

A car needs fuel to move, i.e. it is the fuel that causes the engine to run – interpretability. Understanding how and why the engine consumes and uses the fuel – explainability.

Most tools and techniques mentioned in this article can be used for both explainability and interpretability because, as mentioned earlier, both concepts offer a perspective on understanding what the model does.

Explainable AI is about understanding ML models better: how they make decisions, and why. The three most important aspects of model explainability are:

  1. Transparency
  2. Ability to question
  3. Ease of understanding

Approaches to Explainability

You can approach explainability in two ways:

  1. Globally – This is the overall explanation of model behavior. It shows us a big picture view of the model, and how features in the data collectively affect the result.
  2. Locally – This tells us about each instance and feature in the data individually (kind of like explaining observations seen at certain points in the model), and how features individually affect the result.

Why is Explainability Important?

Machine Learning gets a bad reputation when it negatively impacts business profits. This often happens because of the disconnect between the data science team and the business team.

XAI connects the data science team and non-technical execs, improving knowledge exchange, and giving all stakeholders a better understanding of product requirements and limitations. All of this promotes better governance.

But there are at least five more reasons why ML explainability is important:

1. Accountability: When a model makes a wrong or rogue decision, knowing the factors that caused that decision, or who is responsible for that failure, is necessary to avoid similar problems in the future. With XAI, data science teams can give organizations more control over their AI tools.

2. Trust: In high-risk domains (like healthcare or finance), trust is critical. Before ML solutions can be used and trusted, all stakeholders must fully understand what the model does. If you claim that your model makes better decisions and notices patterns that humans don’t see, you need to be able to back it up with evidence. Domain experts will be naturally skeptical towards any technology that claims to see more than them.

3. Compliance: Model explainability is critical for data scientists, auditors, and business decision-makers alike to ensure compliance with company policies, industry standards, and government regulations. Under Article 14 of the EU’s General Data Protection Regulation (GDPR), when a company uses automated decision-making tools it must provide meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject. Similar regulations are being put in place across the world.

4. Performance: Explainability can also improve performance. If you understand why and how your model works, you know exactly what to fine-tune and optimize.

5. Enhanced control: Understanding the decision-making process of your models shows you unknown vulnerabilities and flaws. With these insights, control is easy. The ability to rapidly identify and correct mistakes in low-risk situations adds up, especially when applied across all models in production.

Explainable Models

A few models in ML are inherently explainable, i.e. they offer transparency, ease of understanding, and the ability to question. Let’s take a look at a few of them.

1. Linear models: Linear models such as linear regression and SVMs with a linear kernel model the output as a weighted sum of the input features, e.g. y = mx + c.

So a change in one of the features changes the output by an amount proportional to that feature’s coefficient. This is easy to understand and explain.
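To make this concrete, here is a minimal sketch of reading a linear model's coefficients as its explanation. The synthetic data and the feature names x0 and x1 are assumptions made for illustration, not part of the original article.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): y = 3*x0 - 2*x1 + 5, no noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0

model = LinearRegression().fit(X, y)

# Each coefficient is the change in the prediction per unit change in that feature.
for name, coef in zip(["x0", "x1"], model.coef_):
    print(f"{name}: {coef:+.2f}")
print(f"intercept: {model.intercept_:+.2f}")
```

Because the fitted coefficients recover the weights used to generate the data, the explanation of any prediction is simply the list of coefficients times the feature values.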

Explainability linear models

2. Decision Tree Algorithms: Models that use decision trees are trained by learning simple decision rules derived from prior data. Since they follow a specific set of rules, understanding the outcome simply depends on learning and understanding the rules that led to it. With the plot_tree function in scikit-learn, you can visualize how the algorithm arrived at its output. Using the iris dataset, and assuming clf is a decision tree classifier already fitted on it:

from matplotlib import pyplot as plt
from sklearn import tree

fig = plt.figure(figsize=(25,20))
_ = tree.plot_tree(clf)

We get:

Explainability decision tree
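The same rules can also be read as text rather than as a plot. The sketch below is one possible way to do that with scikit-learn's export_text; the max_depth=2 limit and random_state=0 are arbitrary choices made here to keep the output short, not settings from the original article.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
# A shallow tree keeps the printed rule set short; max_depth=2 is an arbitrary choice.
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# export_text renders the learned splits as nested if/else rules.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

Each printed branch is a threshold test on one feature, so following a sample down the branches gives the full reason for its predicted class.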


3. Generalized Additive Models (GAM): GAMs are models where the usual linear relationship between the predictor variables and the dependent variable (response) is replaced by a sum of linear and nonlinear smooth functions that capture the non-linearity in the data. GAMs are generalized linear models with smoothing functions. Owing to their additive nature, each variable contributes to the output separately. Hence, we can explain the output of a GAM by looking at the effect of each predictor variable in turn.
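The additive structure is easiest to see in the standard textbook form of a GAM. The notation is an assumption introduced here: g is the link function, \beta_0 the intercept, and each f_j a smooth function of a single predictor x_j.

```latex
g\big(\mathbb{E}[y \mid x_1, \dots, x_p]\big) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_p(x_p)
```

Since each f_j depends on one predictor only, plotting f_j against x_j shows that predictor's whole contribution to the output.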

The catch with most explainable models is that they often fail to capture the complexity of real-world problems and can be inadequate. Also, a model being simple or linear doesn’t guarantee explainability.

Neural networks, ensemble models, and the like are complex models.

So, for complex models, we use techniques and tools to make them explainable. There are two main approaches:

  1. Model-Agnostic Approach
  2. Model-Specific Approach


Model-agnostic techniques/tools can be used on any machine learning model, no matter how complicated. These agnostic methods usually work by analyzing feature input and output pairs. A good example is LIME.


Model-specific techniques/tools are specific to a single type of model or a group of models.  They depend on the nature and functions of the specific model, for example, tree interpreters.

Techniques for Explainability in ML

Let’s do a broad overview of some interesting explainability techniques, starting with PDP.

Partial Dependence Plots (PDP)

A PDP gives a global visual representation of how one or two features influence the predicted outcome of the model, averaging over the values of the other features. It tells you whether the relationship between the target and the chosen feature is linear or more complex. PDP is model-agnostic.

The scikit-learn inspection module provides PartialDependenceDisplay.from_estimator for partial dependence plots (it replaces the plot_partial_dependence function, which was removed in scikit-learn 1.2). It creates one-way and two-way partial dependence plots:

from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import PartialDependenceDisplay

X, y = make_hastie_10_2(random_state=0)
clf = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
    max_depth=1, random_state=0).fit(X, y)
features = [0, 1, (0, 1)]  # two one-way plots and one two-way plot
PartialDependenceDisplay.from_estimator(clf, X, features)
Explainability Partial Dependence Plots
The figure shows two one-way and one two-way partial dependence plots for the California housing dataset | Source: scikit-learn.org

Individual Condition Expectations Plots (ICE)

This gives you a local visual representation of the effect of a feature on the model’s prediction. Unlike PDP, ICE shows the dependence on the feature separately for each sample, with one line per sample. It’s also model-agnostic. You can create an ICE plot with the PyCEbox package in Python or the ICEbox package in R. With scikit-learn (1.0 or later), you can draw ICE plots with PartialDependenceDisplay.from_estimator by setting kind="individual"; only one-way features are supported:

from sklearn.inspection import PartialDependenceDisplay
PartialDependenceDisplay.from_estimator(clf, X, [0, 1], kind="individual")

Note: Check out the scikit-learn documentation for more details.
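To show what these plots compute, here is a hand-rolled sketch of ICE curves and of the partial dependence curve as their average. It is an illustration under stated assumptions, not scikit-learn's implementation: the helper ice_curves, the 20-point grid between the 5th and 95th percentiles, and the use of predict_proba as the response are all choices made here.

```python
import numpy as np
from sklearn.datasets import make_hastie_10_2
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_hastie_10_2(n_samples=2000, random_state=0)
clf = GradientBoostingClassifier(n_estimators=50, max_depth=1, random_state=0).fit(X, y)

def ice_curves(model, X, feature, grid):
    """Return an array of shape (n_samples, len(grid)) with one ICE curve per row."""
    curves = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature] = value  # force the feature to this grid value for every sample
        curves[:, j] = model.predict_proba(X_mod)[:, 1]
    return curves

feature = 0
grid = np.linspace(np.percentile(X[:, feature], 5), np.percentile(X[:, feature], 95), 20)
ice = ice_curves(clf, X[:100], feature, grid)  # local view: one curve per sample
pdp = ice.mean(axis=0)                          # global view: average of the ICE curves
```

Plotting each row of ice against grid gives the ICE plot, and plotting pdp against grid gives the one-way partial dependence curve for the same samples.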

Leave One Column Out (LOCO)

This is a very simple approach. It leaves one column out, retrains the model, and then compares the prediction score of each LOCO model with that of the original model. If the score changes a lot, the variable that was left out must be important. Depending on model width (the number of features), this approach can be time-consuming.
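The procedure can be sketched in a few lines. The synthetic data (five features, of which only the first two drive the target), the linear model, and the helper fit_and_score are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption): features 0 and 1 carry signal, features 2-4 are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 4.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=500)
X_train, X_test, y_train, y_test = X[:375], X[375:], y[:375], y[375:]

def fit_and_score(cols):
    """Train on the given columns and return R^2 on the held-out rows."""
    model = LinearRegression().fit(X_train[:, cols], y_train)
    return model.score(X_test[:, cols], y_test)

all_cols = list(range(X.shape[1]))
baseline = fit_and_score(all_cols)

# Drop one column at a time, retrain, and record how much the score falls.
loco = {}
for col in all_cols:
    kept = [c for c in all_cols if c != col]
    loco[col] = baseline - fit_and_score(kept)

for col, drop in sorted(loco.items(), key=lambda kv: -kv[1]):
    print(f"feature {col}: score drop {drop:.3f}")
```

Note that the loop retrains the model once per feature, which is where the cost mentioned above comes from.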

There are some drawbacks that PDP, ICE, and LOCO share:

  • They don’t directly capture feature interactions,
  • They can be too approximate, which is potentially problematic for categorical data and one-hot encoding that’s frequently used in natural language processing.

Accumulated Local Effects (ALE)

ALE plots were originally proposed by D. Apley et al. in the paper “Visualizing the Effects of Predictor Variables in Black Box Supervised Learning Models”. ALE differs from PDP in that it looks at a small window of each feature’s values and accumulates differences between predictions within that window instead of averaging predictions. Because of this, ALE is less biased than PDP when features are correlated, and it is faster to compute. The Python version of ALE can be installed via:

 pip install PyALE

Given an already processed dataset with some features, an ALE plot would be implemented like this:

from PyALE import ale
ale_eff = ale(
    X=X[features], model=model, feature=["carat"], grid_size=50, include_CI=False
)

Explainability ALE Plots

Note: Check here to learn more about ALE plots.
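For intuition about what an ALE plot accumulates, here is a simplified first-order ALE written from scratch. It is a sketch under stated assumptions rather than PyALE's implementation: the function ale_1d, the quantile binning, and the midpoint-based centring are choices made here, and the toy predict function stands in for a real model.

```python
import numpy as np

def ale_1d(predict, X, feature, n_bins=10):
    """First-order ALE for one numeric feature.

    Returns the bin edges and the centred accumulated effect at each edge.
    """
    x = X[:, feature]
    # Quantile-based edges so that each bin holds roughly the same number of rows.
    edges = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)))
    # Assign each row to a bin index in [0, len(edges) - 2].
    bins = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, len(edges) - 2)

    local_effects = np.zeros(len(edges) - 1)
    counts = np.zeros(len(edges) - 1)
    for k in range(len(edges) - 1):
        rows = X[bins == k]
        if len(rows) == 0:
            continue
        lower, upper = rows.copy(), rows.copy()
        lower[:, feature] = edges[k]       # move the rows to the left edge of their bin
        upper[:, feature] = edges[k + 1]   # and to the right edge
        # Average prediction difference across the bin: the local effect.
        local_effects[k] = np.mean(predict(upper) - predict(lower))
        counts[k] = len(rows)

    # Accumulate the local effects, starting from zero at the first edge.
    ale = np.concatenate([[0.0], np.cumsum(local_effects)])
    # Centre the curve (approximately) so the effect averages to zero over the data.
    bin_means = (ale[:-1] + ale[1:]) / 2
    ale -= np.sum(bin_means * counts) / np.sum(counts)
    return edges, ale

# Toy check (an assumption): two strongly correlated features and a known linear model.
rng = np.random.default_rng(0)
x0 = rng.normal(size=1000)
X = np.column_stack([x0, x0 + 0.1 * rng.normal(size=1000)])
predict = lambda X: 2.0 * X[:, 0] + 3.0 * X[:, 1]
edges, ale = ale_1d(predict, X, feature=0)
```

Only rows that actually fall inside a bin are moved, and only to that bin's edges, so the model is never evaluated on unrealistic combinations of correlated features. In this toy case the ALE curve for feature 0 rises with slope 2, the feature's own coefficient.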

Local Interpretable Model-Agnostic Explanations (LIME)

LIME was developed by University of Washington researchers to see what happens inside an algorithm by capturing feature interactions. LIME performs various multi-feature perturbations around a particular prediction and measures the results. It also handles irregular input.

When the number of dimensions is high, maintaining local fidelity for such models becomes increasingly hard. LIME solves a much more feasible task — finding a model that approximates the original model locally.

LIME tries to replicate the output of a model through a series of experiments. The creators also introduced SP-LIME, a method for selecting representative and non-redundant predictions, providing a global view of the model to users.

Note: You can learn more about LIME here.
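The core idea of approximating the model locally can be sketched without the lime package. The code below is a simplified illustration, not the library's algorithm: it skips LIME's interpretable representations and feature selection, and the black_box function, the Gaussian perturbations, and the kernel width are assumptions chosen for the example.

```python
import numpy as np

def black_box(X):
    """Stand-in for an opaque model (an assumption): non-linear in both features."""
    return np.sin(X[:, 0]) + X[:, 1] ** 2

def lime_style_explanation(predict, x, n_samples=5000, scale=0.1, kernel_width=0.25, seed=0):
    """Fit a locally weighted linear surrogate around the instance x."""
    rng = np.random.default_rng(seed)
    # 1. Perturb the instance with Gaussian noise.
    Z = x + rng.normal(scale=scale, size=(n_samples, x.size))
    # 2. Query the black box on the perturbed points.
    preds = predict(Z)
    # 3. Weight each perturbed point by its proximity to x.
    dist = np.linalg.norm(Z - x, axis=1)
    weights = np.exp(-(dist ** 2) / kernel_width ** 2)
    # 4. Weighted least squares: scale rows by sqrt(weight) and solve.
    A = np.hstack([np.ones((n_samples, 1)), Z - x])
    sw = np.sqrt(weights)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, preds * sw[:, 0], rcond=None)
    return coef[0], coef[1:]  # local intercept and per-feature slopes

x = np.array([0.0, 1.0])
intercept, slopes = lime_style_explanation(black_box, x)
print(slopes)
```

The slopes of the surrogate are the explanation: they describe how the prediction responds to each feature in the neighbourhood of x, even though the black box itself is non-linear.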


Anchors

This method was built by the same creators as LIME. The anchor method explains individual predictions of a model by using easily understandable IF-THEN rules, called “anchors”, that support (anchor) the predictions well enough.

To find anchors, the authors use reinforcement learning techniques in combination with a graph search algorithm to explore the sets of perturbations around the data and their effect on the predictions. This is another model-agnostic method.

In the original paper, the authors compared LIME with Anchors and visualized how they process a complex binary classifier model (+ or -) to arrive at a result. As shown below, the LIME explanation works by learning a linear decision boundary that best approximates the model, with some local weighting, while Anchors adapts its coverage to the model behavior and makes its boundaries clear.

Explainability Anchors
LIME vs. Anchors — A Toy Visualization. Figure from Ribeiro, Singh, and Guestrin (2018) | Source

Anchors were also tested on a variety of machine learning tasks such as classification, text generation, structured predictions.

SHapley Additive exPlanations (SHAP)

SHAP uses the game theory concept of Shapley values to optimally assign feature importances.

The Shapley value is the average marginal contribution of a feature value over all possible coalitions.

Coalitions are combinations of features used to estimate the Shapley value of a specific feature. SHAP is a unified approach to explaining the output of machine learning models such as linear and logistic regression, NLP models, boosted tree models, and additive models. It can be installed via PyPI or conda-forge:

pip install shap


conda install -c conda-forge shap
Explainability SHAP
This shows how each feature is contributing to the model’s output. | Source
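For a model with only a few features, the Shapley value can be computed exactly by enumerating every coalition. The sketch below does this in plain Python. It does not use the shap package and is not how shap computes its values: the toy model, the single all-zero baseline used for features outside a coalition, and the function shapley_values are assumptions made for illustration.

```python
from itertools import combinations
from math import factorial

def model(x):
    """Toy model (an assumption): an additive part plus one interaction term."""
    return 2.0 * x[0] + 1.0 * x[1] + 3.0 * x[1] * x[2]

def shapley_values(f, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                # Weight of this coalition in the Shapley average.
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                # Marginal contribution of feature i when it joins the coalition.
                phi[i] += weight * (value(set(subset) | {i}) - value(set(subset)))
    return phi

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi, sum(phi), model(x) - model(baseline))
```

The values add up to the difference between the prediction for x and the prediction for the baseline, and the interaction term is split evenly between the two features involved. The enumeration grows exponentially with the number of features, which is why practical tools rely on approximations.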

Deep SHAP, a variant of SHAP for deep learning, is a high-speed approximation algorithm that uses background samples instead of single reference values and uses the Shapley equations to linearize operations such as softmax, max, products, etc. Deep SHAP supports TensorFlow, Keras, and PyTorch models.

Deep Learning Important Features (DeepLIFT)

DeepLIFT is a deep-learning explainability method that uses backpropagation to compare the activation of each neuron to a ‘reference activation’, and then records and assigns that contribution score according to neuron differences.

Essentially, DeepLIFT just digs back into the feature selection of the neural network and finds neurons and weights that had major effects on the output formation. DeepLIFT gives separate consideration to positive and negative contributions. It can also reveal dependencies that are missed by other approaches. Scores can be computed efficiently in a single backward pass.

DeepLIFT is on PyPI, so it can be installed using pip:

pip install deeplift

Layer-Wise Relevance Propagation (LRP)

Layer-wise relevance propagation is similar to DeepLIFT: it propagates backward from the output using a set of purposely designed propagation rules, identifying the most relevant neurons within the neural network until it reaches the input. So, you get all the neurons (e.g. pixels) that really contribute to the output. LRP works well on CNNs, and it can be used to explain LSTMs.

Check out this interactive demo to see how LRP works.

Explainability LRP
Visual representation of how LRP does backpropagation from output node through the hidden layer neurons to input, identifying the neurons that had an impact on the model’s output. | Source

Contrastive Explanations Method (CEM)

Contrastive explanations are facts about an event that, if found true, would constitute an actual case of a specific event. CEM explains why the model arrived at a particular decision or outcome instead of another one.

CEM is based on the paper “Explanations based on the Missing: Towards Contrastive Explanations with Pertinent Negatives”. The open-source code implementation is here. For classification models, CEM generates instance-based explanations in terms of Pertinent Positives (PP) and Pertinent Negatives (PN).

PP looks for features that are minimally but sufficiently present to produce the originally predicted outcome (for example, important pixels in an image), while PN identifies features that must be minimally and necessarily absent for that prediction to hold. PN thus provides a minimal set that differentiates the instance from the closest different class. CEM can be implemented in TensorFlow. To learn more about CEM, check here.


ProfWeight

In 2018, Amit Dhurandhar et al. published the paper “Improving Simple Models with Confidence Profiles”, which proposed the ProfWeight method for model explainability. ProfWeight transfers information from a pre-trained deep neural network with high test accuracy to a simpler model or shallow network with lower test accuracy, in order to improve the simpler model.

Like a teacher transferring knowledge to a student, ProfWeight uses probes attached to the network’s intermediate layers to weight the training samples according to how difficult the network finds them, and uses those weights to transfer knowledge.

ProfWeight can be summarized in four main steps:

  1. Attach and train probes on intermediate representations of a high-performing neural network,
  2. Train a simple model on the original dataset,
  3. Learn weights for examples in the dataset as a function of the simple model and the probes,
  4. Retrain the simple model on the final weighted dataset.

Permutation Feature Importance

Permutation feature importance shows the decrease in a model’s score (accuracy, F1, R2) when a single feature is randomly shuffled. It shows how important a feature is for a particular model. It is a model inspection technique that shows the relationship between the feature and the target, and it is useful for non-linear and opaque estimators.

It is implemented in the scikit-learn library. Check here to see how it’s done.
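As a minimal sketch, scikit-learn's permutation_importance function can be called on a fitted model and a held-out set. The synthetic data (only feature 0 drives the target), the random forest, and the train/test split are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

# Synthetic data (an assumption): only feature 0 influences the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = 5.0 * X[:, 0] + rng.normal(scale=0.1, size=400)

model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X[:300], y[:300])

# Shuffle each feature n_repeats times on held-out data and measure the drop in R^2.
result = permutation_importance(model, X[300:], y[300:], n_repeats=10, random_state=0)

for i in np.argsort(result.importances_mean)[::-1]:
    print(f"feature {i}: {result.importances_mean[i]:.3f} +/- {result.importances_std[i]:.3f}")
```

Unlike LOCO, the model is trained only once; the cost is one round of scoring per feature and repeat.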

Tools for Explainability in Machine Learning

We know some of the methods used for ML explainability, so what are the tools that we can use to make our work easier?

AI Explainability 360 (AIX360)

The AI Explainability 360 toolkit is an open-source library from IBM to support the interpretability and explainability of datasets and machine learning models. The AIX360 includes a collection of algorithms that cover different dimensions of explanations along with proxy explainability metrics. It also has tutorials on explainability in different use-cases, like credit approval.


Skater

Skater is an open-source, model-agnostic unified Python framework for model explainability and interpretability. Data scientists can build interpretability into a machine learning system for real-world use cases.

Skater approaches explainability both globally (inference based on a complete dataset) and locally (inference on individual predictions). It supports deep neural networks, tree algorithms, and scalable Bayes.

Note: Learn more about Skater here.

Explain Like I’m Five (ELI5)

ELI5 is a Python package used to understand and explain the predictions of models such as scikit-learn regressors and classifiers, XGBoost, CatBoost, LightGBM, and Keras. It offers visualization and debugging for these algorithms through a unified API. ELI5 understands text processing and can highlight text data. It can also implement techniques such as LIME and permutation importance.

ELI5 works in Python 2.7 and 3.4+ and requires scikit-learn 0.18+. You can install it using pip:

pip install eli5

or conda:

conda install -c conda-forge eli5

Note: Learn more about it here.


InterpretML

InterpretML is an open-source toolkit developed by Microsoft, aimed at improving model explainability for data scientists, auditors, and business leaders. InterpretML is flexible and customizable. At the time of writing, InterpretML supports LIME, SHAP, linear models, and decision trees. It offers both global and local explanations of models. Key features:

  • Understand how model performance changes for different subsets of data and compare multiple models,
  • Explore model errors,
  • Analyze dataset statistics and distributions,
  • Explore global and local explanations,
  • Filter data to observe global and local feature importance,
  • Run what-if analysis to see how model explanations change if you edit a data point’s features.

Activation Atlases

Activation Atlases visualize how neurons in a neural network interact with each other and how the concepts they represent mature with the depth of the layers. Google came up with Activation Atlases in collaboration with OpenAI.

This approach was developed for looking at the inner workings of convolutional vision networks and getting a human-interpretable overview of concepts within the hidden network layers. It started with feature visualization on individual neurons but has since moved to visualize neurons jointly.

Activation Atlases - explainability tools
An activation atlas of the InceptionV1 vision classification network reveals many fully realized features, such as electronics, buildings, food, animal ears, plants, and watery backgrounds. | Source: openai.com

Alibi Explain

Alibi is an open-source Python library for model inspection and interpretation. It provides code needed to produce explanations for black-box algorithms.

Alibi explain helps with:

  • Defining restful APIs for interpretable ML models,
  • Model monitoring,
  • High-quality reference implementations of black-box ML model explanation algorithms,
  • Multiple-use cases (tabular, text and image data classification, regression),
  • Implementing the latest model explanation,
  • Concept drift and algorithmic bias detection,
  • Model confidence scores on model decisions.

Note: Learn more about Alibi here.

What-if Tool (WIT)

WIT, developed by the TensorFlow team, is an interactive, visual, no-code interface for visualizing datasets and models in TensorFlow for a better understanding of model outcomes. In addition to TensorFlow models, you can also use the What-If Tool for XGBoost and Scikit-Learn models.

Once a model has been deployed, its performance can be viewed on a dataset in the What-If tool.

Additionally, you can slice the dataset by features and compare performance across those slices. Then you can identify subsets of data where the model performs best or worst. This can be very helpful for ML fairness investigations.

The tool can be accessed via TensorBoard or a Colab notebook. Check out the WIT website to learn more.

Microsoft Azure

We all know Azure, no need to explain what it is. Azure has interpretability classes in its SDK packages.

The azureml.interpret package contains functionality such as the SHAP Tree Explainer, SHAP Deep Explainer, SHAP Linear Explainer, and more.

Use ‘pip install azureml-interpret’ for general use.

Rulex Explainable AI

Rulex is a company that creates predictive models in the form of first-order conditional logic rules that can be immediately understood and used by everybody.

Rulex’s core machine learning algorithm, the Logic Learning Machine (LLM), works in an entirely different way from conventional AI. The product is designed to produce conditional logic rules that predict the best decision choice so that it’s immediately clear to process professionals. Rulex rules make every prediction fully self-explanatory.

Unlike decision trees and other algorithms that produce rules, Rulex rules are stateless and overlapping.

Model Agnostic Language for Exploration and Explanation (DALEX)

Dalex is a set of tools that examines any given model, simple or complex, and explains its behavior. Dalex creates a level of abstraction around each model that makes it easier to explore and explain. It wraps the model using the Explainer class (Python) or the DALEX::explain function (R). Once the model is wrapped, all functionality is available through the resulting explainer object.

Dalex can be used with xgboost, TensorFlow, and h2o. It is available for both Python and R. In Python:

pip install dalex -U

import dalex as dx
exp = dx.Explainer(model, X, y)

Note: Learn more about Dalex here.

DALEX - explainability tools


Conclusion

For a safe, reliable inclusion of AI, a seamless blend of human and artificial intelligence is needed. Human intervention should also be built into techniques that let practitioners easily evaluate the quality of the decision rules in use and reduce false positives.

Make XAI into a core competency and part of your approach to AI design and QA. It will pay dividends, and lots of them, in the future.

Understanding your models isn’t just a scientific question. It’s not about curiosity. It’s about knowing where your models fall flat, how to fix them, and how to explain them to key project stakeholders so that everyone knows exactly how your model generates value.

About the author 

Radiostud.io Staff

Showcasing and curating a knowledge base of tech use cases from across the web.
