Photo: Marjon Blondeel, an AI R&D developer, looks at a robot equipped with artificial intelligence at the AI Xperience Center at the Vrije Universiteit Brussel in Brussels, Belgium, on February 19, 2020.
Much of artificial intelligence, and particularly deep learning, is plagued by the “black box problem.” While we may know the inputs and outputs of a model, in many cases we do not know what happens in between. AI developers make choices about how to design the model and the learning environment, but they typically do not determine the value of specific parameters and how an answer is reached. The lack of understanding about how an AI system works, in some cases even by the people who have developed it, is one of the reasons AI poses novel safety, ethical, and legal considerations, and why oversight and governance are especially important. Black box deep learning models are vulnerable to adversarial attacks and prone to racial, gender, and other demographic biases. Opacity is especially problematic in high-stakes settings such as health care, lending, and criminal justice, where significant harms have already been reported.
Explainable AI (XAI) is often offered as the answer to the black box problem and is broadly defined as “machine learning techniques that make it possible for human users to understand, appropriately trust, and effectively manage AI.” Around the world, explainability has been referenced as a guiding principle for AI development, including in Europe’s General Data Protection Regulation. Explainable AI has also been a major research focus of the Defense Advanced Research Projects Agency (DARPA) since 2016. However, after years of research and application, the XAI field has generally struggled to realize the goals of understandable, trustworthy, and controllable AI in practice.
This gap stems largely from divergent conceptions of what explainability is expected to achieve and unequal prioritization of various stakeholder objectives. Studies of XAI in practice reveal that engineering priorities are generally placed ahead of other considerations, with explainability largely failing to meet the needs of users, external stakeholders, and impacted communities. By improving clarity about the diversity of XAI objectives, AI organizations and standards bodies can make explicit choices about what they are optimizing and why. AI developers can be held accountable for providing meaningful explanations and mitigating risks—to the organization, to users, and to society at large.
The end goal of explainability depends on the stakeholder and the domain. Explainability enables interactions between people and AI systems by providing information about how decisions and events come about, but developers, domain experts, users, and regulators all have different needs from the explanations of AI models. These differences are not only related to degrees of technical expertise and understanding, but also include domain-specific norms and decision-making mechanisms. Achieving explainability goals in one domain will often not satisfy the goals of another.
Consider, for example, the different needs of developers and users in making an AI system explainable. A developer might use Google’s What-If Tool to review complex dashboards that provide visualizations of a model’s performance in different hypothetical situations, analyze the importance of different data features, and test different conceptions of fairness. Users, on the other hand, may prefer something more targeted. In a credit scoring system, it might be as simple as informing a user which factors, such as a late payment, led to a deduction of points. Different users and scenarios will call for different outputs.
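To make the contrast concrete, the sketch below illustrates the kind of targeted, user-facing explanation a credit scoring system might surface: a short list of the factors, such as a late payment, that cost an applicant points. The base score, factors, and point values are illustrative assumptions, not any real lender’s model.

```python
# Minimal sketch of a user-facing explanation for a hypothetical credit score.
# All factors, weights, and point values here are illustrative assumptions.

BASE_SCORE = 700
POINT_DEDUCTIONS = {
    "late_payment_last_12_months": 40,    # assumed penalty per late payment
    "credit_utilization_over_50_percent": 25,  # assumed penalty for high utilization
    "recent_hard_inquiry": 10,            # assumed penalty per recent inquiry
}

def score_with_explanation(applicant):
    """Return a score and a plain-language list of the factors that lowered it."""
    score = BASE_SCORE
    reasons = []
    for factor, penalty in POINT_DEDUCTIONS.items():
        count = applicant.get(factor, 0)
        if count:
            deduction = penalty * count
            score -= deduction
            reasons.append(f"{factor.replace('_', ' ')}: -{deduction} points")
    return score, reasons

if __name__ == "__main__":
    applicant = {"late_payment_last_12_months": 1, "recent_hard_inquiry": 2}
    score, reasons = score_with_explanation(applicant)
    print(f"Score: {score}")
    for reason in reasons:
        print(f"  {reason}")
```

The point is the shape of the output: a handful of plain-language reasons the user can act on, rather than the dashboards and counterfactual visualizations a developer might consult.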
For now, users and other external stakeholders are typically afforded little if any insight into the behind-the-scenes workings of the AI systems that impact their lives and opportunities. This asymmetry of knowledge about how an AI system works, and the power to do anything about it, is one of the key dilemmas at the heart of explainability. Accessible and meaningful explanations can help reduce this asymmetry, but explanations are often incomplete and can be used (intentionally or not) to increase the power differentials between those creating AI systems and those impacted by them.
To understand how practitioners in different domains differ in what they hope to achieve by building explainable AI systems, it is helpful to compare their goals explicitly. Below, I consider how three different domains—engineering, deployment, and governance—articulate the goals of explainable AI.
Engineering. In 2018, the Institute of Electrical and Electronics Engineers (IEEE) published a survey on explainable AI that illustrates how the technical and engineering domain conceptualizes the goals of XAI: ensuring a system’s efficacy, improving control, improving performance, and discovering information.
Deployment. As AI applications are rolled out, the technology will increasingly interact with human beings, and the deployment domain seeks to understand how explainability impacts the human relationship with an AI system, including in military and other high-stakes contexts. An overview of DARPA’s XAI program illustrates the deployment domain’s goals for XAI: enabling a system to explain its rationale, characterize its strengths and weaknesses, inform future expectations, and promote human-machine cooperation.
Governance. A policy briefing on explainable AI by the Royal Society provides an example of the goals the policy and governance domain imagines XAI will achieve: promoting trust, protecting against bias, following regulations and policies, and enabling human agency.
These differences are highlighted in simplified form in the table below.
Engineering | Deployment | Governance
Ensure efficacy | Explain its rationale | Promote trust
Improve control | Characterize strengths and weaknesses | Protect against bias
Improve performance | Inform future expectations | Follow regulations and policies
Discover information | Promote human-machine cooperation | Enable human agency
All three domains agree that explainability should provide assurance about a system’s effectiveness and appropriateness for its intended task, but they also differ in key ways. The engineering domain highlights the importance of control, which is either assumed in the other domains or not prioritized. And while the governance domain stresses the value of human agency, this is not a necessary outcome of the goals of the other domains. The engineering domain treats AI systems as constantly in flux and capable of regular improvement, while the other domains apparently expect greater consistency to enable informed expectations and adherence to policies. All three domains also imagine different feedback loops. In the engineering domain, it is engineers’ input that is incorporated; in the deployment domain, it is users’ input; only in the governance domain are the impact on broader communities and the technology’s relation to the broader world taken into consideration.
The reality of organizations’ use of explainability methods diverges sharply from the aspirations outlined above, according to a 2020 study of explainable AI deployments. In this study of 20 organizations using explainable AI, the majority of deployments were used internally to support engineering efforts rather than to reinforce transparency or trust with users or other external stakeholders. The study included interviews with roughly 30 people from both for-profit and non-profit groups employing elements of XAI in their operations. Study participants were asked about the types of explanations they had used, how they decided when and where to use them, and the audience and context of their explanations.
The results revealed that local explainability techniques, which aim to explain a model’s behavior for one specific input (for example, feature importance), were the most commonly used. The primary use of the explanations was to serve as “sanity checks” for the organization’s engineers and research scientists and to identify spurious correlations. Participants looking for a more holistic understanding were interested in deploying global explainability techniques, which aim to capture the high-level concepts and reasoning used by a model, but these were described as much harder to implement. Study participants said it was difficult to provide explanations to end users because of privacy risks and the challenges of providing real-time information of sufficiently high quality. But most importantly, organizations struggled to implement explainability because they lacked clarity about its objectives.
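To ground the distinction between local and global techniques, here is a minimal sketch of one simple local feature-importance method of the kind such “sanity checks” rely on: for a single input, each feature is replaced in turn with a baseline value and the change in the model’s prediction is recorded. The synthetic data, model, and feature names are assumptions for illustration; the study does not describe participants’ actual tooling at this level of detail.

```python
# Illustrative local feature-importance check on a toy model.
# Data, model, and feature names are assumptions, not the study's tooling.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feature_names = ["income", "debt_ratio", "late_payments"]  # hypothetical

# Synthetic training data: the label depends mostly on the last two features.
X = rng.normal(size=(500, 3))
y = (0.2 * X[:, 0] - 1.0 * X[:, 1] - 1.5 * X[:, 2]
     + rng.normal(scale=0.5, size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)

def local_importance(model, x, baseline):
    """Change in predicted probability when each feature is set to a baseline value."""
    p_full = model.predict_proba(x.reshape(1, -1))[0, 1]
    contributions = {}
    for i, name in enumerate(feature_names):
        x_masked = x.copy()
        x_masked[i] = baseline[i]                # replace one feature with its mean
        p_masked = model.predict_proba(x_masked.reshape(1, -1))[0, 1]
        contributions[name] = p_full - p_masked  # positive: feature pushed prediction up
    return p_full, contributions

x_example = X[0]
prob, contribs = local_importance(model, x_example, baseline=X.mean(axis=0))
print(f"Predicted probability: {prob:.2f}")
for name, delta in sorted(contribs.items(), key=lambda kv: -abs(kv[1])):
    print(f"  {name}: {delta:+.2f}")
```

A check like this explains only one prediction at a time; a global technique would instead try to characterize the concepts and decision rules the model uses across all inputs, which is why participants found it much harder to implement.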
This study highlights the current primacy of engineering goals for explainability and how the needs of users and other stakeholders are more difficult to meet. It shows that engineers often use explainability techniques to identify where their models are going wrong and that they may not have sufficient incentives to share this information, which is perceived as sensitive and complex, more broadly. While users and regulators want to see the vulnerabilities of AI systems, they may also want to see plans to fix uncovered problems or mitigate any negative impacts. The findings of this study are consistent with other examples of XAI in practice. For example, one machine learning engineer’s account of explainability case studies documents how they were used (successfully) for internal debugging and sanity checks, but not for user engagement.
Another 2020 study documents insights derived from interviews with 20 UX and design practitioners at IBM working on explainability for AI models and further explains the challenges practitioners face in meeting users’ needs. The study identifies a range of motivations for explainability that emerged from the participants’ focus on user needs, including to gain further insights or evidence about the AI system, to appropriately evaluate its capability, to adapt usage or interaction behaviors to better utilize the system, to improve performance, and to satisfy ethical responsibilities. The study participants said that realizing these motivations was difficult due to the inadequacy of current XAI techniques, which largely failed to live up to user expectations. Participants also described the challenge of needing to balance multiple organizational goals that can be at odds with explainability, including protecting proprietary data and providing users with seamless integration.
These studies highlight that while there are numerous different explainability methods currently in operation, they primarily map onto a small subset of the objectives outlined above. Two of the engineering objectives—ensuring efficacy and improving performance—appear to be the best represented. Other objectives, including supporting user understanding and insight about broader societal impacts, are currently neglected.
The five recommendations below are intended primarily for organizations developing XAI standards and practices. They offer an initial roadmap, highlighting relevant research and priorities that can help address the limitations and risks of explainability.
Explainability is seen as a central pillar of trustworthy AI because, in an ideal world, it provides understanding about how a model behaves and where its use is appropriate. The prevalence of bias and vulnerabilities in AI models means that trust is unwarranted without sufficient understanding of how a system works. Currently, there is a significant discrepancy between the vision of explainability as a principle that reaches across domains and works for diverse stakeholders, and how it is being incorporated in practice. Bridging that gap requires greater transparency about the goals being optimized, and further work to ensure those goals align with the needs of users and the benefit of society at large.
Without clear articulation of the objectives of explainability from different communities, AI is more likely to serve the interests of the powerful. AI companies should clarify how they are using XAI techniques, to what end, and why, and make full explanations as transparent as possible. The entities currently developing XAI standards and regulations, including the National Institute of Standards and Technology, should take note of current limitations of XAI in practice and seek out diverse expertise about how to better align incentives and governance with a full picture of XAI objectives. It is only with the active involvement of many stakeholders, from the social sciences, computer science, civil society, and industry, that we may realize the goals of understandable, trustworthy, and controllable AI in practice.
Jessica Newman is a research fellow at the UC Berkeley Center for Long-Term Cybersecurity.
IBM provides financial support to the Brookings Institution, a nonprofit organization devoted to rigorous, independent, in-depth public policy research.