AI is arguably one of the hottest topics of the moment, and the surge in research and successful applications has led to growing interest in a connected area: Interpretable AI. Healthcare, finance, law and transportation are examples of industries where AI algorithms can bring significant value, but due to regulations and the critical implications if something goes wrong, the requirements on these algorithms are high. As AI and deep learning models are often black-box solutions, the lack of interpretability can make it impossible to deploy them in practice.
Among other recent media attention, the topic was picked up by Stockholm AI in August 2019, when they invited industry practitioners and researchers to a seminar to discuss interpretability, its usefulness and its challenges. One side argued that interpretability is overrated: since we don't even fully understand the mystery of the human mind, why should we hold machines to higher standards? The other side argued that interaction with either human or machine is unsatisfactory without trust, and that to achieve trust we need understanding.
Interpretability is the extent to which you can predict a model's output behavior from its input parameters. This is usually a desired property, but interpretability is often weighed against cost and the problem of misalignment: we want models that are accurate and interpretable at the same time. A model's performance is easily quantified and compared through its performance metric. Interpretability, however, rests on transparency, safety, non-discrimination and trust, which are hard to fully quantify, and thus it is hard to train a model toward these goals.
Let’s take a look at some common interpretability methods. They can be divided into global and local methods. Global interpretation methods try to explain which variables or features are, on average, the most influential in a machine learning model:
- Global Surrogate: A global surrogate model is an interpretable model that is trained to approximate the predictions of a black box model. An example is tree-based models where the importance of the features can be extracted.
- Feature Importance: The feature importance of a specific feature in the model can be determined by permuting the feature’s values and studying the increase in prediction error. A high increase in prediction error means that the chosen feature is important.
- Partial Dependence Plot (PDP): A PDP shows the marginal effect a feature has on the predicted outcome of a model. E.g., how do predicted house prices change as the size feature varies?
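The permutation feature importance idea above can be sketched in a few lines of plain Python. The toy "model", feature names and data here are invented for illustration; real libraries offer tuned implementations of the same idea.

```python
import random

# Toy hand-written "model" over three features: size, age, rooms.
# All weights and data below are invented for illustration.
def predict(row):
    size, age, rooms = row
    return 3.0 * size - 0.1 * age + 1.0 * rooms

def mse(model, X, y):
    return sum((model(row) - t) ** 2 for row, t in zip(X, y)) / len(X)

def permutation_importance(model, X, y, feature_idx, n_repeats=10, seed=0):
    """Average increase in MSE after shuffling one feature's column."""
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    increases = []
    for _ in range(n_repeats):
        column = [row[feature_idx] for row in X]
        rng.shuffle(column)
        X_perm = [list(row) for row in X]
        for row, v in zip(X_perm, column):
            row[feature_idx] = v
        increases.append(mse(model, X_perm, y) - baseline)
    return sum(increases) / n_repeats

# Synthetic data whose targets come straight from the model
X = [[s, a, r] for s in (1, 2, 3) for a in (10, 20) for r in (2, 4)]
y = [predict(row) for row in X]

importances = [permutation_importance(predict, X, y, i) for i in range(3)]
# Shuffling "size" hurts the error most, so it is the most important feature
```

Since the model weights size most heavily, permuting that column degrades the predictions the most, which is exactly the signal permutation importance measures.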
Tree-based models are examples of interpretable models. After training the model we can extract the feature importance to see which features have the biggest impact on the outcome. Is age, exercise or number of eaten burgers the most important feature to predict a person’s health?
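As a minimal sketch of that question, here is a tree-based model trained on synthetic "health" data (assuming scikit-learn is available; the features, labels and the rule generating them are invented stand-ins):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
age = rng.uniform(20, 80, n)
exercise = rng.uniform(0, 10, n)   # hours per week (synthetic)
burgers = rng.integers(0, 15, n)   # burgers per week (synthetic)

# Invented rule: burger count drives the label most strongly, age not at all
healthy = (exercise * 2 - burgers * 3 + rng.normal(0, 2, n)) > -10

X = np.column_stack([age, exercise, burgers])
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, healthy)

# Built-in impurity-based importances, one value per feature
for name, imp in zip(["age", "exercise", "burgers"], model.feature_importances_):
    print(f"{name}: {imp:.2f}")
```

Because the synthetic label depends mostly on burgers and not at all on age, the extracted importances should reflect that ordering.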
Local interpretation models try to explain the most important features for an individual prediction of a machine learning model:
- Local Surrogate: A local surrogate model is an interpretable model that is trained to approximate and explain an individual prediction of a black-box machine learning model.
- Individual Conditional Expectation (ICE): An ICE is like a PDP for every specific instance in the dataset. It shows one line per instance and how the prediction of the instance changes as the feature changes. E.g. how does the predicted house price for a specific house change over the feature size?
- Shapley Values: Shapley values come from cooperative game theory, where the aim is to fairly distribute the payout among the players in a game. In a machine learning model, the features can be seen as the players and the individual prediction as the payout.
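For a handful of features, Shapley values can even be computed exactly by enumerating feature coalitions. The toy house-price model, instance and baseline below are invented; practical tools approximate this computation for larger feature sets.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley value per feature: the weighted average marginal
    contribution of switching feature i from its baseline value to its
    actual value, over all coalitions of the other features."""
    n = len(x)
    values = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi += weight * (predict(with_i) - predict(without_i))
        values.append(phi)
    return values

# Invented house-price model over (size, rooms, age)
def predict(row):
    size, rooms, age = row
    return 2.0 * size + 5.0 * rooms - 0.3 * age

x = [120, 4, 30]        # the instance to explain
baseline = [80, 3, 40]  # an "average" house as the reference point

phi = shapley_values(predict, x, baseline)
# Efficiency property: the values sum to predict(x) - predict(baseline)
```

The efficiency property makes the attribution "fair" in the game-theoretic sense: the per-feature contributions account exactly for the gap between this prediction and the baseline prediction.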
There is a range of open-source tools on the market implementing these methods. Here are some examples of common interpretability tools for Python:
- LIME (Local Interpretable Model-Agnostic Explanations): LIME is a local surrogate model and works with tabular, image and text data.
- WIT (What-if tool): WIT is a visualization tool released by Google with minimal coding required. It works with tabular data.
- SHAP (SHapley Additive exPlanations): SHAP is one of the most popular tools implementing Shapley values. It works with tabular and image data.
- tf-explain: tf-explain offers interpretability methods for Tensorflow 2.0 to help the understanding of neural networks. It works with image data.
Visualization with tf-explain for TensorFlow 2.0. Left: a gradient-weighted class activation mapping showing how parts of an image affect a neural network’s output by looking into the activation maps. Right: a visualization of stabilized gradients on the inputs towards the decision. Source: tf-explain (https://github.com/sicara/tf-explain) under the MIT license.
Interpretability at HiQ
For us at HiQ it is important to treat every customer case differently, as interpretability usually comes at a cost, either in extra time spent on development or in loss of model performance. In some industries, however, it is necessary to be able to explain why the model does what it does. One example is when we got the opportunity to help Sveriges Kommuner och Landsting with an AI project for their open data platform Vården i Siffror. The task was to build an anomaly detection tool for extracting interesting insights from a database of Swedish medical quality measurements. Without being able to understand where these insights came from, the results would have been useless. Imagine being presented with a list of anomalies among Swedish hospitals but no explanation of why they were on the list. To make the results interpretable we used a global surrogate method, where the anomaly detection model was interpreted with a random forest classifier, which has a built-in feature importance estimation.
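The global-surrogate idea described above can be sketched roughly like this (assuming scikit-learn; the synthetic data and model choices are stand-ins, not the actual Vården i Siffror pipeline):

```python
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier

rng = np.random.default_rng(0)
# Synthetic "quality measurements", with anomalies injected in feature 0
X = rng.normal(0, 1, size=(300, 4))
X[:15, 0] += 6.0

# Black-box step: an anomaly detector flags anomalies (-1) vs normal (1)
detector = IsolationForest(random_state=0).fit(X)
labels = detector.predict(X)

# Surrogate step: a random forest is trained to mimic the detector's
# labels, and its built-in feature importances explain which
# measurements drove the anomaly decisions
surrogate = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
importances = surrogate.feature_importances_
```

Since the injected anomalies only differ in the first feature, the surrogate's importances should point there, which is exactly the kind of "why is this hospital on the list" explanation the project needed.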
At HiQ we identify opportunities and challenges where AI can contribute to improving our customers' business. Are you interested in knowing more, or do you have a question on this topic? Please don't hesitate to contact us!