BACKGROUND
The field of artificial intelligence (AI) has expanded rapidly in recent years. AI is generally viewed as a “black box,” since understanding how it arrived at a given solution is nearly impossible, which fosters mistrust among end-users. This is particularly problematic when AI is to be deployed in high-stakes decision-making environments, such as the health care system. In addition to this general mistrust, legal regulations govern the implementation of AI systems within health care. Together, the mistrust and the legal regulations create a strong barrier to the widespread implementation of AI methods across the health care sector. To improve trust in AI systems and to fulfill the legal requirements, transparent, interpretable, and explainable AI systems are needed. Rather than developing new AI models, however, many researchers are working on post-hoc explainable artificial intelligence (XAI) methods, which could at least provide the legally required degree of transparency. Nevertheless, to ensure their usability, these systems must be explainable to the end-user.
OBJECTIVE
The goal of this systematic review was to determine how many XAI systems in health care have been evaluated for usability, user satisfaction, user experience, and trust. We also aimed to identify the most commonly used methods for usability and user experience evaluations.
METHODS
Following the PRISMA 2020 guidelines, we retrieved 6,008 references from four databases. After concluding our screening steps, 134 publications remained eligible for the systematic review. The publications were assigned to 26 medical, 102 XAI-method, and 15 evaluation categories.
RESULTS
Twelve of the 15 evaluation categories were user-based, yet only 35 of the 134 papers fell into a user-based evaluation category. A large portion of these 35 publications relied on self-designed questionnaires, and only 3 of the 35 reported a user-centered design process. This confirmed our hypothesis that XAI is rarely evaluated, let alone developed, in relation to the needs of the end-user.
CONCLUSIONS
We conclude that there is still a strong need for greater end-user involvement during the development, or at least during the evaluation, of XAI models. Additionally, we recommend the development of a standardized framework to improve the generalizability of XAI methods. If XAI is not developed closer to the needs of end-users, evaluated by end-users, or, ideally, developed together with them, we expect that the implementation of explainable artificial intelligence in the health care environment will become increasingly difficult.