Publication date: 9 April 2025
University: Erasmus Universiteit Rotterdam
ISBN: 978-94-6510-505-5

Opening the Black Box of Explainability

Summary

Artificial intelligence (AI) has the potential to improve patient care and to help address the challenge of growing healthcare expenditures, but the adoption of prediction models in clinical practice is still limited. A lack of transparency is, at least in the current state of AI maturity, often seen as one of the main problems. In recent years, explainable AI (XAI) has gained a lot of attention. This thesis investigated whether explainable modeling (i.e., intrinsically interpretable models) and post-hoc explanation methods (i.e., explanations accompanying the model) can be used to make clinical prediction models more understandable, while also examining the limitations associated with different types of explanations.

In Chapter 2, we reviewed recent literature on explainable AI, addressing the following key questions: What does explainability mean? Why and when can explainability be useful? Which explainable AI methods are available? How can explainability be evaluated? How can one choose among different explainable AI methods? We argue that the reason for seeking explainability should guide design choices, as it determines the relative importance of the two properties of explainability: interpretability and fidelity. Based on this idea, we proposed a framework to choose between explainable AI approaches (explainable modeling versus post-hoc explanation) and types of explanations (model-based, attribution-based, or example-based explanations). We also found that the benefits of explainability still need to be proven in practice and suggested additional measures to create trustworthy AI. Finally, we recommended using explainable modeling when explainability is considered very important or when the interpretability-performance trade-off is weak. This work helped to formalize the field by providing practical definitions and guidance to researchers and practitioners on the design of explainable AI systems.
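
As an illustration only, this recommendation could be encoded as a small decision helper; the function name and the numeric cutoff for a "weak" trade-off are hypothetical assumptions, not part of the thesis.

```python
# Hypothetical sketch of the Chapter 2 heuristic: prefer explainable modeling
# when explainability is very important or when an interpretable model loses
# little performance relative to a black-box model. The 0.02 AUROC cutoff for
# a "weak" trade-off is an assumed value, chosen only for illustration.

def choose_xai_approach(explainability_importance: str, auroc_loss: float) -> str:
    """Return a suggested XAI approach.

    explainability_importance: "low", "medium", or "high"
    auroc_loss: AUROC drop of the best interpretable model versus the best
        black-box model; a small drop means a weak trade-off.
    """
    weak_tradeoff = auroc_loss < 0.02  # assumed cutoff
    if explainability_importance == "high" or weak_tradeoff:
        return "explainable modeling (intrinsically interpretable model)"
    return "post-hoc explanation (model-, attribution-, or example-based)"

print(choose_xai_approach("high", 0.05))
```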

We then examined the first approach of explainable AI, known as explainable modeling, in Chapters 3 to 5. This involves developing prediction models that are small and simple enough for users to understand. In Chapter 3, we developed and validated the COVID-19 Estimated Risk (COVER) scores, which leverage influenza data to predict the severity of COVID-19 (hospital admission with pneumonia, hospitalization with pneumonia requiring intensive services, and fatality). For this, we used a two-step approach: first we developed a data-driven model, and then we performed a manual model-reduction step that aggregated features based on clinical expertise. By leveraging standardized data (the OMOP CDM) and standardized analytics (OHDSI tools), we were able to rapidly develop and externally validate these models early in the pandemic, across 14 databases of patients with influenza or flu-like symptoms and 5 databases of patients with a confirmed or suspected COVID-19 diagnosis.
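
A minimal sketch of this two-step pattern, using scikit-learn with synthetic binary features and illustrative variable groupings in place of the actual COVER data and specification:

```python
# Step 1: data-driven model; step 2: manual reduction that aggregates related
# features into clinician-defined groups. All feature names, groupings, and
# hyperparameters below are assumptions for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(1000, 6)).astype(float)  # binary EHR-style features
y = rng.integers(0, 2, size=1000)                     # outcome, e.g. hospitalization

# Step 1: data-driven model with an L1 penalty to select candidate predictors.
data_driven = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
data_driven.fit(X, y)
print("features kept by step 1:", np.flatnonzero(data_driven.coef_[0]))

# Step 2: manual reduction -- aggregate features into clinician-defined groups
# (any code within a disease group counts once) and refit a small model.
groups = {"heart_disease": [0, 1], "lung_disease": [2, 3], "diabetes": [4, 5]}
X_reduced = np.column_stack([X[:, cols].max(axis=1) for cols in groups.values()])
reduced = LogisticRegression().fit(X_reduced, y)
print(dict(zip(groups, reduced.coef_[0].round(2))))
```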

In Chapter 4, we compared the EXPLORE algorithm, an exhaustive search algorithm that generates simple decision rules, with 7 state-of-the-art modeling algorithms across 5 prediction tasks using data from the Dutch Integrated Primary Care Information (IPCI) database. These experiments showed that more complex models such as LASSO, random forest, and XGBoost generally outperform simpler ones, confirming the expected trade-off between model performance and interpretability. However, the observed trade-off varied across prediction tasks. EXPLORE's rules, with at most 5 predictors, achieved AUROC scores between 0.61 and 0.71 across prediction tasks. Additionally, we demonstrated EXPLORE's potential to find more clinically optimal decision rules by incorporating domain knowledge or by exploring the space of near-optimal models (the Rashomon set).
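
The shape of such an interpretability-performance comparison can be sketched as follows; here a shallow decision tree stands in for EXPLORE's short decision rules (the real algorithm performs an exhaustive rule search and is not used here), and synthetic data replaces IPCI:

```python
# Illustrative comparison of a very small rule-like model against more complex
# learners on AUROC. Model choices and data are assumptions, not the thesis setup.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "short rule (<=5 splits, EXPLORE stand-in)":
        DecisionTreeClassifier(max_leaf_nodes=6, random_state=0),
    "LASSO": LogisticRegression(penalty="l1", solver="liblinear"),
    "random forest": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name}: AUROC = {auc:.2f}")
```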

Finally, in Chapter 5, we empirically investigated the (in)stability of clinical prediction models developed using LASSO logistic regression. We proposed three intuitive steps to evaluate this, focusing on 1) the number of variables selected, 2) which specific variables are selected, and 3) the direction of effect of the variables across models. Our results revealed large variability in the selected variables, as well as in the sign of their coefficients, across databases. This underscores the need for caution when using LASSO regression to identify 'risk factors' in predictive models, and highlights the risk of model overinterpretation.
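
A minimal sketch of these three checks, assuming bootstrap resamples of one synthetic dataset in place of the thesis's multiple databases:

```python
# Stability checks across repeated LASSO fits: 1) how many variables each fit
# selects, 2) which variables, and 3) whether coefficient signs are consistent.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=15, random_state=0)
rng = np.random.default_rng(0)

selected_sets, signs = [], []
for _ in range(20):
    idx = rng.integers(0, len(y), len(y))            # bootstrap resample
    model = LogisticRegression(penalty="l1", solver="liblinear", C=0.05)
    model.fit(X[idx], y[idx])
    coef = model.coef_[0]
    selected_sets.append(set(np.flatnonzero(coef)))  # step 2: which variables
    signs.append(np.sign(coef))                      # step 3: direction of effect

print("variables selected per fit:", [len(s) for s in selected_sets])  # step 1
print("always selected:", sorted(set.intersection(*selected_sets)))
sign_flips = (np.ptp(np.array(signs), axis=0) == 2).sum()  # +1 in one fit, -1 in another
print("variables with sign flips:", int(sign_flips))
```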

In Chapters 6 to 8, we then explored the second approach of explainable AI: post-hoc explanation methods. These methods aim to develop an explanation that can accompany a prediction model to give users insight. In Chapters 6 and 7, we investigated challenges with feature importance methods, which are widely used to explain models. In Chapter 6, we examined 3 key challenges that arise in different phases of the explanation process, even for a simple prediction model developed using electronic health records (EHRs): 1) certain types of explanations may be computationally infeasible, 2) explanation methods may produce differing or conflicting explanations for the same prediction model, and 3) the presented explanations lack nuance and can be misinterpreted if they do not align with user expectations. In Chapter 7, we analyzed the size of the disagreement between feature importance methods in 2 prediction tasks using the IPCI database. Additionally, we introduced a novel evaluation framework to investigate how different elements of data complexity (e.g., the number of features, the number of outcomes, and feature correlation) contribute to feature importance disagreement, and applied it to 2 open-source datasets. We found that explanation disagreements were larger in real-world datasets (IPCI versus open-source) and for more complex models (neural network versus logistic regression), precisely the settings that would benefit most from additional explanations to improve transparency. Our results showed only minor changes in disagreement when modifying elements of data complexity, with the number of features having the largest impact on the level of disagreement.
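
The basic mechanics of quantifying such disagreement can be sketched by comparing two importance methods on the same model with a rank correlation; the choice of methods and metric here (permutation importance, absolute coefficients, Spearman's rho) is an illustrative assumption, not the thesis's full framework:

```python
# Measure disagreement between two feature importance methods applied to the
# same fitted model, via Spearman rank correlation of their rankings.
from scipy.stats import spearmanr
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

perm = permutation_importance(model, X, y, n_repeats=10, random_state=0)
coef_importance = abs(model.coef_[0])  # second method: coefficient magnitude

rho, _ = spearmanr(perm.importances_mean, coef_importance)
print(f"rank agreement between methods: rho = {rho:.2f}")  # 1.0 = full agreement
```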

Finally, in Chapter 8, we evaluated 9 counterfactual explanation methods on their ability to depict real-world relations using 2 newly proposed metrics. We provided 6 (semi-)synthetic datasets generated with known structural causal models (SCMs) to benchmark the semantic meaningfulness of new and existing counterfactual explanation methods. We concluded that none of the existing methods were able to consistently create counterfactuals that were causally consistent.
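
A toy sketch of the benchmarking idea: when data are generated from a known SCM, one can test whether a proposed counterfactual respects the causal structure. The SCM, the consistency check, and its tolerance below are illustrative assumptions, not the thesis's metrics:

```python
# Known SCM: x2 := 2*x1 + noise. A counterfactual that intervenes on x1 but
# leaves x2 unchanged violates this structure; propagating the intervention
# through the SCM restores consistency.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = 2 * x1 + rng.normal(scale=0.1, size=500)  # data from the known SCM

def causally_consistent(cf_x1, cf_x2, tol=0.5):
    """Check that the counterfactual's x2 matches what the SCM implies."""
    return abs(cf_x2 - 2 * cf_x1) < tol

factual = (x1[0], x2[0])
naive_cf = (factual[0] + 1.0, factual[1])        # raises x1, ignores x2
print(causally_consistent(*naive_cf))            # False: x2 not propagated

propagated_cf = (factual[0] + 1.0, 2 * (factual[0] + 1.0))
print(causally_consistent(*propagated_cf))       # True
```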

This thesis has demonstrated that i) hybrid approaches combining data-driven and knowledge-based learning can help produce more interpretable models, ii) post-hoc explanation methods currently suffer from several limitations that impede the understandability of their explanations, and iii) explainable AI design choices need to be made on a case-by-case basis, as the trade-offs and explanation needs differ per task. In conclusion, we argue that explainable AI can be instrumental in developing responsible AI (i.e., actionable, trustworthy, fair, and sustainable prediction models that have a positive impact on clinical practice), but its current limitations may hinder true understandability.
