XAI Problems Part 1 of 3: Feature Importances Are Not Good Enough

“If you can’t explain it simply, you don’t understand it well enough.” (attributed to Albert Einstein)

Explainable AI (XAI) algorithms attempt to solve an important data science problem: explaining why a model made a particular prediction. The two most commonly used XAI algorithms, LIME and SHAP, have fundamental problems, and unfortunately many users who deploy them are unaware of their conceptual and technical limitations. In this 3-part blog series we explore common but widely ignored problems with these algorithms, and we kick off part 1 with the conceptual limitations of feature importances.

At Elula we hold the view that feature importance measures are not good explanations.

Philosophy and the social sciences help us understand what makes a good explanation. The counterfactual is a key concept. Explaining why event X happened really means answering the question “Why did X happen rather than Y?”. The answer is the set of facts which differentiate X from Y and guarantee that X happened instead of Y. Here, Y is the counterfactual and X is the event we are trying to explain, and a good explanation is a minimal set of facts that make X happen instead of Y. But a feature importance is not a fact! It is a score attached to a feature, not a statement about the situation that separates X from Y.

We’ve previously written an introduction to how common explainable AI algorithms work where we consider the question: “Why did Rose survive the Titanic?” A bad answer might be: “Gender, age, location”. A good answer might be “She was a young woman with a cabin close to a lifeboat”. The second kind of explanation is precisely what we wanted: a minimal set of facts which together can cause the outcome and rule out the counterfactual of Rose not surviving. If this were the explanation that came out of a prediction model, then we might say that the former kind of explanation is what the model looked at (a list of feature importances), and the latter is what the model saw.
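To make the distinction concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `toy_model` rule, the feature names and their `DOMAINS`, and the brute-force search are hypothetical stand-ins, not a real Titanic model and not Elula's explainer. It looks for the smallest set of facts about a passenger which fixes the toy model's prediction no matter what values the remaining features take.

```python
from itertools import combinations, product

# Hypothetical feature domains for a toy Titanic-style problem (illustrative only).
DOMAINS = {
    "sex": ["female", "male"],
    "age_group": ["child", "young", "old"],
    "cabin": ["near_lifeboat", "far_from_lifeboat"],
    "embarked": ["Southampton", "Cherbourg"],
}


def toy_model(p):
    """Hypothetical stand-in for a trained classifier: True means 'survived'."""
    if p["age_group"] == "child":
        return True
    return (
        p["sex"] == "female"
        and p["age_group"] == "young"
        and p["cabin"] == "near_lifeboat"
    )


def is_sufficient(facts, outcome):
    """True if every passenger consistent with `facts` gets `outcome`."""
    free = [f for f in DOMAINS if f not in facts]
    for values in product(*(DOMAINS[f] for f in free)):
        passenger = {**facts, **dict(zip(free, values))}
        if toy_model(passenger) != outcome:
            return False
    return True


def minimal_sufficient_facts(passenger):
    """Smallest subset of the passenger's own feature values that fixes the prediction."""
    outcome = toy_model(passenger)
    features = list(DOMAINS)
    # Try subsets in order of size, so the first hit is a minimum-size set.
    # The full set of facts is always sufficient, so the loop always returns.
    for size in range(len(features) + 1):
        for subset in combinations(features, size):
            facts = {f: passenger[f] for f in subset}
            if is_sufficient(facts, outcome):
                return facts, outcome


rose = {
    "sex": "female",
    "age_group": "young",
    "cabin": "near_lifeboat",
    "embarked": "Southampton",
}
facts, outcome = minimal_sufficient_facts(rose)
print(facts, outcome)
```

For this toy rule, the search returns Rose's sex, age group and cabin location and leaves out her port of embarkation, because the toy model never looks at it. That is the "what the model saw" style of answer. Exhaustive enumeration like this is only feasible for a handful of small categorical features.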

LIME and SHAP, the most common XAI algorithms, produce feature importances. But they do not tell us what the model saw: they do not give us an explanation in terms of a minimal set of facts which guarantee the outcome.
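As a rough illustration of what a feature importance output looks like, the sketch below computes exact Shapley values, the quantity SHAP is built on, for a hypothetical toy model. The `toy_model` rule, the feature `DOMAINS`, and the uniform background over every feature combination are all assumptions made for illustration; this is not the `shap` library and not how a production model would be explained.

```python
from itertools import combinations, product
from math import factorial

# Hypothetical feature domains for a toy Titanic-style problem (illustrative only).
DOMAINS = {
    "sex": ["female", "male"],
    "age_group": ["child", "young", "old"],
    "cabin": ["near_lifeboat", "far_from_lifeboat"],
    "embarked": ["Southampton", "Cherbourg"],
}


def toy_model(p):
    """Hypothetical stand-in for a trained classifier: 1.0 means 'survived'."""
    if p["age_group"] == "child":
        return 1.0
    survived = (
        p["sex"] == "female"
        and p["age_group"] == "young"
        and p["cabin"] == "near_lifeboat"
    )
    return 1.0 if survived else 0.0


def expected_output(known):
    """Average model output with `known` fixed and all other features uniform."""
    free = [f for f in DOMAINS if f not in known]
    outputs = [
        toy_model({**known, **dict(zip(free, values))})
        for values in product(*(DOMAINS[f] for f in free))
    ]
    return sum(outputs) / len(outputs)


def shapley_values(passenger):
    """Exact Shapley value of each feature for one passenger's prediction."""
    features = list(DOMAINS)
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without = {g: passenger[g] for g in subset}
                with_f = {**without, f: passenger[f]}
                total += weight * (expected_output(with_f) - expected_output(without))
        phi[f] = total
    return phi


rose = {
    "sex": "female",
    "age_group": "young",
    "cabin": "near_lifeboat",
    "embarked": "Southampton",
}
phi = shapley_values(rose)
for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.3f}")
```

The result is one number per feature name. The numbers sum to the gap between Rose's prediction and the average prediction, and the feature the toy model ignores gets zero, but nothing in the output says which combination of facts guarantees the outcome.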

This issue seems to arise because explainer technology has largely been tested and evaluated on image recognition models. All an image explainer has to do is highlight a section of an image, and a human can easily see why the highlighted section is or is not relevant. So an image explainer can get away with telling the human what the model “looked at” rather than what it “saw”.

The same is not true in other data domains. Because much of XAI research (and AI research in general) is focused on the image recognition problem, some experts have not noticed this issue with explainers. However, the majority of machine learning problems in business are not image recognition problems. Most business-oriented machine learning models work on tabular datasets rather than images, so what works in the image recognition domain does not carry over to the business context. This is a good example of a common problem: data scientists losing sight of the business problem being solved.

Needs vary from domain to domain, but many organisations have ignored these differences and deployed LIME and SHAP anyway. In part 2 of this series, we will discuss the different needs of the business and research contexts, and show how LIME and SHAP fail to address them.

A recent paper out of New York University supports this criticism of feature importance explanations: it uses both argument and experimental evidence to make the case that feature importances are not truly explanations. Feature importance explanations end up being less satisfying and less useful than counterfactual explanations.

How do we fix this?

At Elula, we have recognised these problems and deployed solutions to them. Our proprietary explainer algorithms go much further than feature importances: they go beyond LIME and SHAP and produce a counterfactual explanation. That is, they produce a minimal set of facts about a situation which cause a particular outcome. They deliver a reason why a prediction was made. This kind of explanation is far more intuitive and useful. It delivers exactly what we want in everyday business: a short, sharp, accurate explanation that enables us to get on with business and deliver the outcomes we want.

In our retention product, Sticky, this means we produce explanations which are a minimal set of facts about a home loan customer which indicate that this customer is likely to churn. Furthermore, our explainer algorithms also produce a recommended conversation for this customer, to assist front-line staff in retention. This is nirvana for a bank: being able to have personalised conversations with customers at scale.

Get in touch with us to find out more about how our XAI technology is deployed in our customer retention solution to deliver personalised and actionable reasons for customer churn.

Majella
