XAI Problems Part 2 of 3: Research and Business – They’re Very Different

“The answers are all out there, we just need to ask the right questions.” Oscar Wilde

Explainers are currently a very hot area in data science research, with dozens of papers published on the topic every month. However, much of the research is just that: research. It does not focus on how XAI is actually used, especially in business. Explainers in research are arcane, complex mathematical algorithms. Ironically, these explainers are:

not human-readable;
not easy to understand;
not actionable.

They aren’t asking the right questions.

The first of the modern wave of XAI algorithms was LIME. LIME outputs a list of features and gives each feature a number which represents the importance of that feature for a single prediction. LIME calculates this importance by fitting a simple linear model that approximates the original model in the neighbourhood of that prediction; the importance of each feature is its coefficient in that linear model.
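To make "the importance is the coefficient" concrete, here is a minimal sketch of the idea in plain Python. It is not the LIME library's implementation: the `black_box` model, the sampling scale, the kernel width and all function names are assumptions chosen for illustration. It perturbs one instance, weights each perturbed sample by how close it is to the original, and fits a weighted linear model to the black box's outputs.

```python
import math
import random

def black_box(x):
    # Hypothetical opaque model: a nonlinear score over two features.
    return 1.0 / (1.0 + math.exp(-(3.0 * x[0] - 2.0 * x[1] ** 2)))

def solve(a, b):
    # Gaussian elimination with partial pivoting for a small system a @ w = b.
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (m[i][n] - sum(m[i][j] * w[j] for j in range(i + 1, n))) / m[i][i]
    return w

def local_linear_explanation(model, instance, n_samples=2000,
                             scale=0.3, width=0.5, seed=0):
    # Fit a proximity-weighted least-squares line around `instance`.
    rng = random.Random(seed)
    d = len(instance)
    xtx = [[0.0] * (d + 1) for _ in range(d + 1)]
    xty = [0.0] * (d + 1)
    for _ in range(n_samples):
        # Perturb the instance and score the perturbed point with the black box.
        z = [v + rng.gauss(0.0, scale) for v in instance]
        y = model(z)
        # Samples closer to the instance get a larger weight.
        dist2 = sum((zi - vi) ** 2 for zi, vi in zip(z, instance))
        weight = math.exp(-dist2 / width ** 2)
        # Intercept term plus features centred on the instance.
        row = [1.0] + [zi - vi for zi, vi in zip(z, instance)]
        for i in range(d + 1):
            xty[i] += weight * row[i] * y
            for j in range(d + 1):
                xtx[i][j] += weight * row[i] * row[j]
    coefficients = solve(xtx, xty)
    return coefficients[1:]  # drop the intercept; these are the "importances"

weights = local_linear_explanation(black_box, [0.5, 1.0])
print(weights)
```

The result is one signed number per feature. Nothing in that output says what the customer did, or what anyone should do about it.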

Consider the three criteria mentioned above: human-readable, easy to understand, actionable. Start with the first: is the LIME output human-readable? It is not; very few people outside data science know what “coefficient of a linear model” means. In the research sphere, data scientists are the primary consumers of XAI, so this is fine in academia. But in business, the primary consumers of XAI are non-technical business users – product managers, C-level executives, customer service specialists, etc. – not data scientists.

Similarly, are LIME outputs easy to understand? Even once we have read them, and even if we know what “coefficients of a linear model” are, do we know what they mean for the prediction? All they can tell us is which features are most important for a prediction. But features are often not easy to understand! In reality, where a data scientist has engaged in significant (and sometimes automated) feature engineering, the name of the feature may not reflect its actual meaning. And with complex models such as deep learning models or gradient boosted trees, individual feature importances do not tell us much. Complex models rely heavily on interactions between separate features, so it is unwise to consider the importance of only one feature at a time.

And finally, are LIME outputs actionable? Again, not really. They fall well short of the desired state: how can you make a decision based on a LIME explanation? In our previous blog post, Feature Importances Are Not Good Enough, we introduced the concept of the counterfactual explanation. This is a minimal set of facts which guarantees an outcome. If your objective is to change that outcome, then you have to know which facts to change. But LIME’s feature importance measures don’t give you any facts, only features. Using LIME, you can’t know which facts need changing!
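As a rough illustration of what "knowing which facts to change" could look like, the sketch below brute-forces the smallest set of fact changes that flips a prediction. Everything in it is hypothetical: the rule-based `predicts_churn` stands in for a trained model, the fact names are invented, and this is not how Sticky's explainers work.

```python
from itertools import combinations

def predicts_churn(customer):
    # Hypothetical rule-based stand-in for a trained churn model.
    score = 0
    if customer["rate_above_market"]:
        score += 2
    if customer["fixed_term_ending"]:
        score += 2
    if customer["recent_complaint"]:
        score += 1
    return score >= 3

def smallest_counterfactual(model, customer, alternatives):
    # Try ever-larger sets of fact changes until the prediction flips.
    original = model(customer)
    names = sorted(alternatives)
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            candidate = dict(customer)
            for name in subset:
                candidate[name] = alternatives[name]
            if model(candidate) != original:
                return {name: alternatives[name] for name in subset}
    return None  # no combination of the allowed changes flips the outcome

customer = {
    "rate_above_market": True,
    "fixed_term_ending": True,
    "recent_complaint": True,
}
# Hypothetical values each fact could be changed to.
alternatives = {
    "rate_above_market": False,
    "fixed_term_ending": False,
    "recent_complaint": False,
}

changes = smallest_counterfactual(predicts_churn, customer, alternatives)
print(changes)
```

For this toy customer, no single change is enough; the search returns a pair of facts to change. That is a statement about facts, which a list of per-feature importance numbers does not provide.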

Does SHAP fare any better? Its main advantage is that its feature importance metrics are more human-readable. Instead of “coefficients of a linear model”, SHAP produces one number per feature which represents how much that feature contributed to the prediction in this case, and these numbers can be added together. So if feature A has importance 0.2, and feature B has importance 0.3, then together they have importance 0.5. SHAP does seem to be the more widely used explainer in business because of these easier-to-understand outputs, so you could argue that SHAP is an improvement on LIME in terms of human readability. But does that really help?
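The additive property can be checked on a toy case. The sketch below computes exact Shapley values by enumerating every coalition of features, which is the quantity SHAP approximates. It makes simplifying assumptions: a made-up two-feature linear `model`, and a single baseline point standing in for "feature absent" rather than an average over background data. None of the names come from the SHAP library.

```python
from itertools import combinations
from math import factorial

BASELINE = [0.0, 0.0]   # hypothetical "feature absent" values
INSTANCE = [1.0, 1.0]   # the case being explained

def model(x):
    # Hypothetical model, chosen so A contributes 0.2 and B contributes 0.3.
    return 0.1 + 0.2 * x[0] + 0.3 * x[1]

def coalition_value(subset):
    # Features in `subset` take the instance value; the rest stay at baseline.
    x = [INSTANCE[i] if i in subset else BASELINE[i]
         for i in range(len(INSTANCE))]
    return model(x)

def shapley_values(value, n):
    # Exact Shapley values: weighted average of each feature's marginal
    # contribution over all coalitions of the other features.
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for subset in combinations(others, k):
                s = frozenset(subset)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

phi = shapley_values(coalition_value, 2)
base = coalition_value(frozenset())
prediction = model(INSTANCE)

print(phi)                         # approximately [0.2, 0.3]
print(base + sum(phi), prediction) # the two agree up to rounding
```

The attributions add up to the gap between the baseline output and the prediction. That makes the numbers easier to read, but they are still numbers attached to features.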

In terms of being easy to understand, SHAP does provide a sense of the interaction effects; however, this tends to be obscure and unhelpful. We often find that, in reality, two features are useless on their own, but together are very strong. Or similarly, two features can each be very strong on their own, but if they contain the same information as each other, they add nothing when used together. SHAP hides these facts behind simple additive feature importances. And we still have the problem that the features themselves are not easy to understand. So SHAP fails here just as LIME does.
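A two-feature toy case shows how additive importances can hide these situations. In the sketch below (hypothetical models, with baseline (0, 0) and instance (1, 1) as assumptions), one model needs both features to fire and the other is satisfied by either feature alone. The exact Shapley values come out identical.

```python
def shapley_two_features(model):
    # Exact Shapley values for two binary features,
    # explaining the instance (1, 1) against the baseline (0, 0).
    v00, v10, v01, v11 = model(0, 0), model(1, 0), model(0, 1), model(1, 1)
    phi_a = 0.5 * (v10 - v00) + 0.5 * (v11 - v01)
    phi_b = 0.5 * (v01 - v00) + 0.5 * (v11 - v10)
    return phi_a, phi_b

def needs_both(a, b):
    # Each feature is useless alone; together they are decisive.
    return float(a and b)

def either_suffices(a, b):
    # Each feature is decisive alone; the second adds no new information.
    return float(a or b)

print(shapley_two_features(needs_both))       # (0.5, 0.5)
print(shapley_two_features(either_suffices))  # (0.5, 0.5)
```

Both models give each feature an importance of 0.5, even though the advice would be opposite: in the first case both facts must change to alter the outcome, while in the second changing only one achieves nothing.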

How about actionable? SHAP has no advantage over LIME here: SHAP explanations are not at all actionable. They tell us nothing about how we ought to intervene in a case to cause an outcome that we desire. We get no direction on actions to take, or on which interventions would be effective.

If we were to use the LIME and SHAP algorithms in Sticky, our AI customer retention product, they would not be useful. The purpose of an explainer in Sticky is to help the customer service specialist know what kind of conversation to have with a customer. Without actionable explanations, and without knowing precisely which characteristics of a customer are leading the model to believe they will refinance, the customer service specialist is more or less going in blind.

In Sticky, our proprietary explainers solve all of these problems. Explanations are given in plain English and everyday business context, and they state precisely which facts about a customer lead us to believe they will churn. They are easily understandable by a non-technical customer service specialist. They provide clear, personalised recommendations on the best conversation to have to retain that customer.

Elula’s explainers build upon the publicly available research, including SHAP and LIME, and go beyond them to produce readable, understandable, actionable insights. We turn data science into a practical reality – and enable our customers to activate the benefits of XAI and deliver very real outcomes.

Majella
