Interpretability in Machine Learning from a Responsible AI Perspective (1): The Context

This post opens a series on interpretable machine learning, covering its context, nomenclature, and definitions. The series continues with 2. State of the art and 3. Limitations and best practices.

Contents

  • Definitions and Examples
  • Machine Learning Interpretability: The Motivations
  • Machine Learning Interpretability: The Taxonomy
  • Integrating Interpretability into a Workflow

Understanding and trusting models and their results is a hallmark of good science. Analysts, engineers, physicians, researchers, scientists, and humans in general have the need to understand and trust models and modeling results that affect our work and our lives. For decades, choosing a model that was transparent to human practitioners or consumers often meant choosing straightforward data sources and simpler model forms such as linear models, single decision trees, or business rule systems. Although these simpler approaches were often the correct choice, and still are today, they can fail in real-world scenarios when the underlying modeled phenomena are nonlinear, rare or faint, or highly specific to certain individuals. Today, the trade-off between the accuracy and interpretability of predictive models has been broken (and maybe it never really existed1). The tools now exist to build accurate and sophisticated modeling systems based on heterogeneous data and machine learning algorithms and to enable human understanding and trust in these complex systems. In short, you can now have your accuracy and interpretability cake…and eat it too.

To help practitioners make the most of recent and disruptive breakthroughs in debugging, explainability, fairness, and interpretability techniques for machine learning, this report defines key terms, introduces the human and commercial motivations for the techniques, and discusses predictive modeling and machine learning from an applied perspective, focusing on the common challenges of business adoption, internal model documentation, governance, validation requirements, and external regulatory mandates. We’ll also discuss an applied taxonomy for debugging, explainability, fairness, and interpretability techniques and outline the broad set of available software tools for using these methods. Some general limitations and testing approaches for the outlined techniques are addressed, and finally, a set of open source code examples is presented.

1 Cynthia Rudin, “Please Stop Explaining Black Box Models for High-Stakes Decisions,”
arXiv:1811.10154, 2018, https://arxiv.org/pdf/1811.10154.pdf.


Definitions and Examples

To facilitate detailed discussion and to avoid ambiguity, we present here definitions and examples for the following terms: interpretable, explanation, explainable machine learning or artificial intelligence, interpretable or white-box models, model debugging, and fairness.

Interpretable and explanation

In the context of machine learning, we can define interpretable as “the ability to explain or to present in understandable terms to a human,” from “Towards a Rigorous Science of Interpretable Machine Learning” by Doshi-Velez and Kim.2 (In the recent past, and according to the Doshi-Velez and Kim definition, interpretable was often used as a broader umbrella term. That is how we use the term in this report. Today, more leading researchers use interpretable to refer to directly transparent modeling mechanisms, as discussed below.) For our working definition of a good explanation we can use “when you can no longer keep asking why,” from “Explaining Explanations: An Approach to Evaluating Interpretability of Machine Learning” by Gilpin et al.3 These two thoughtful characterizations of interpretable and explanation link explanation to some machine learning process being interpretable, and they also provide a feasible, abstract objective for any machine learning explanation task.

2 Finale Doshi-Velez and Been Kim, “Towards a Rigorous Science of Interpretable
Machine Learning,” arXiv:1702.08608, 2017, https://arxiv.org/pdf/1702.08608.pdf.
3 Leilani H. Gilpin et al., “Explaining Explanations: An Approach to Evaluating Inter‐
pretability of Machine Learning,” arXiv:1806.00069, 2018, https://arxiv.org/pdf/
1806.00069.pdf.

Explainable machine learning

Getting even more specific, explainable machine learning, or explainable artificial intelligence (XAI), typically refers to post hoc analysis and techniques used to understand a previously trained model or its predictions. Examples of common techniques include:

  • Reason code generating techniques, in particular local interpretable model-agnostic explanations (LIME) and Shapley values.4,5
  • Local and global visualizations of model predictions: accumulated local effect (ALE) plots, one- and two-dimensional partial dependence plots, individual conditional expectation (ICE) plots, and decision tree surrogate models.6,7,8,9 (A minimal code sketch follows this list.)
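
As a rough illustration of these post hoc techniques, the sketch below trains an unconstrained gradient boosting model and then inspects it with Shapley values and partial dependence/ICE curves. It is a minimal example on synthetic stand-in data, assuming the shap and scikit-learn packages are available; it is not specific to any tool discussed later in this series.

```python
import matplotlib.pyplot as plt
import pandas as pd
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import PartialDependenceDisplay

# Toy stand-in data; in practice X and y come from your own pipeline.
X_raw, y = make_classification(n_samples=1000, n_features=5, random_state=0)
X = pd.DataFrame(X_raw, columns=[f"x{i}" for i in range(5)])

# An ordinary, unconstrained gradient boosting model (the "black box" here).
model = GradientBoostingClassifier(n_estimators=200, max_depth=3).fit(X, y)

# Local explanations: Shapley-value contributions for every prediction,
# usable downstream as raw material for reason codes.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)

# Global and local visualization: partial dependence plus ICE curves for x0.
PartialDependenceDisplay.from_estimator(model, X, features=["x0"], kind="both")
plt.show()
```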

XAI is also associated with a group of DARPA researchers who seem primarily interested in increasing explainability in sophisticated pattern recognition models needed for military and security applications.

Interpretable or white-box models

Over the past few years, more researchers have been designing new machine learning algorithms that are nonlinear and highly accurate but also directly interpretable, and interpretable as a term has become more closely associated with these new models.

4 Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin, “‘Why Should I Trust You?’:
Explaining the Predictions of Any Classifier,” in Proceedings of the 22nd ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, ACM (2016): 1135–
1144. https://oreil.ly/2OQyGXx.
5 Scott M. Lundberg and Su-In Lee, “A Unified Approach to Interpreting Model Predic‐
tions,” in I. Guyon et al., eds., Advances in Neural Information Processing Systems 30
(Red Hook, NY: Curran Associates, Inc., 2017): 4765–4774. https://oreil.ly/2OWsZYf.
6 Daniel W. Apley, “Visualizing the Effects of Predictor Variables in Black Box Supervised
Learning Models,” arXiv:1612.08468, 2016, https://arxiv.org/pdf/1612.08468.pdf.
7 Trevor Hastie, Robert Tibshirani, and Jerome Friedman, The Elements of Statistical
Learning, Second Edition (New York: Springer, 2009). https://oreil.ly/31FBpoe.
8 Alex Goldstein et al., “Peeking Inside the Black Box: Visualizing Statistical Learning
with Plots of Individual Conditional Expectation,” Journal of Computational and Graph‐
ical Statistics 24, no. 1 (2015), https://arxiv.org/pdf/1309.6392.pdf.
9 Osbert Bastani, Carolyn Kim, and Hamsa Bastani, “Interpreting Blackbox Models via
Model Extraction,” arXiv:1705.08504, 2017, https://arxiv.org/pdf/1705.08504.pdf.

Examples of these newer Bayesian or constrained variants of traditional black-box machine learning models include explainable neural networks (XNNs),10 explainable boosting machines (EBMs), monotonically constrained gradient boosting machines, scalable Bayesian rule lists,11 and super-sparse linear integer models (SLIMs).12,13 In this report, interpretable or white-box models will also include traditional linear models, decision trees, and business rule systems. Because interpretable is now often associated with a model itself, traditional black-box machine learning models, such as multilayer perceptron (MLP) neural networks and gradient boosting machines (GBMs), are said to be uninterpretable in this report. As explanation is currently most associated with post hoc processes, unconstrained, black-box machine learning models are usually also said to be at least partially explainable by applying explanation techniques after model training. Although difficult to quantify, credible research efforts into scientific measures of model interpretability are also underway.14 The ability to measure degrees implies interpretability is not a binary, on-off quantity. So, there are shades of interpretability between the most transparent white-box model and the most opaque black-box model. Use more interpretable models for high-stakes applications or applications that affect humans.
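
A minimal sketch of one such white-box approach, assuming the Microsoft interpret package cited above (footnote 13) and synthetic stand-in data: the explainable boosting machine below is an additive model whose per-feature shape functions can be inspected directly.

```python
from interpret import show
from interpret.glassbox import ExplainableBoostingClassifier
from sklearn.datasets import make_classification

# Toy stand-in data; substitute your own feature matrix and target.
X, y = make_classification(n_samples=1000, n_features=5, random_state=0)

# An explainable boosting machine: additive, with one shape function per input.
ebm = ExplainableBoostingClassifier().fit(X, y)

show(ebm.explain_global())              # global view: the learned shape functions
show(ebm.explain_local(X[:5], y[:5]))   # local view: contributions for five rows
```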

Model debugging

Refers to testing machine learning models to increase trust in model mechanisms and predictions.15 Examples of model debugging techniques include variants of sensitivity (i.e., “what if?”) analysis, residual analysis, prediction assertions, and unit tests to verify the accuracy or security of machine learning models. Model debugging should also include remediating any discovered errors or vulnerabilities.
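
As a hedged example of a sensitivity (“what if?”) test combined with a prediction assertion, the sketch below perturbs a single input and asserts that predictions stay stable. The names model, X_valid, the income column, and the 0.15 tolerance are placeholders for illustration, not a prescription.

```python
import numpy as np

def sensitivity_check(model, X, column, delta, tol=0.15):
    """Simple "what if?" test: perturb one input and assert predictions stay stable."""
    X_perturbed = X.copy()
    X_perturbed[column] = X_perturbed[column] + delta
    shift = np.abs(model.predict_proba(X_perturbed)[:, 1] - model.predict_proba(X)[:, 1])
    assert shift.max() < tol, f"Unstable predictions: max shift {shift.max():.3f} for {column}"
    return shift.max()

# Example prediction assertion / unit test: a small change in a single input
# (a hypothetical "income" column) should not swing predicted probabilities much.
sensitivity_check(model, X_valid, column="income", delta=100.0)
```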

10 Joel Vaughan et al., “Explainable Neural Networks Based on Additive Index Models,”
arXiv:1806.01933, 2018, https://arxiv.org/pdf/1806.01933.pdf.
11 Hongyu Yang, Cynthia Rudin, and Margo Seltzer, “Scalable Bayesian Rule Lists,” in Pro‐
ceedings of the 34th International Conference on Machine Learning (ICML), 2017, https://
arxiv.org/pdf/1602.08610.pdf.
12 Berk Ustun and Cynthia Rudin, “Supersparse Linear Integer Models for Optimized
Medical Scoring Systems,” Machine Learning 102, no. 3 (2016): 349–391, https://oreil.ly/
31CyzjV.
13 Microsoft Interpret GitHub Repository: https://oreil.ly/2z275YJ.
14 Christoph Molnar, Giuseppe Casalicchio, and Bernd Bischl, “Quantifying Interpretabil‐
ity of Arbitrary Machine Learning Models Through Functional Decomposition,” arXiv:
1904.03867, 2019, https://arxiv.org/pdf/1904.03867.pdf.
15 Debugging Machine Learning Models: https://debug-ml-iclr2019.github.io.


Fairness

Fairness is an extremely complex subject, and this report will focus mostly on the more straightforward concept of disparate impact (i.e., when a model’s predictions are observed to be different across demographic groups, beyond some reasonable threshold, often 20%). Here, fairness techniques refer to disparate impact analysis, model selection by minimization of disparate impact, and remediation techniques such as disparate impact removal preprocessing, equalized odds postprocessing, and several additional techniques discussed in this report.16,17 The group Fairness, Accountability, and Transparency in Machine Learning (FATML) is often associated with fairness techniques and research for machine learning, computer science, law, various social sciences, and government. Their site hosts useful resources for practitioners, such as full lists of relevant scholarship and best practices.
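
A minimal sketch of a disparate impact check along these lines, using the common four-fifths rule of thumb; y_pred, group, and the group labels are placeholders for your own predictions and demographic marker.

```python
import pandas as pd

def disparate_impact_ratio(y_pred, group, protected, reference):
    """Ratio of favorable-outcome rates: protected group vs. reference group."""
    rates = pd.Series(y_pred).groupby(pd.Series(group)).mean()
    return rates[protected] / rates[reference]

# y_pred holds binary favorable-outcome predictions; group holds demographic labels.
di = disparate_impact_ratio(y_pred, group, protected="female", reference="male")
if di < 0.8 or di > 1.25:   # roughly the "more than 20% different" rule of thumb
    print(f"Potential disparate impact: ratio = {di:.2f}")
```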

Machine Learning Interpretability: The Motivations

The now-contemplated field of data science amounts to a superset of the fields of statistics and machine learning, which adds some technology for “scaling up” to “big data.” This chosen superset is motivated by commercial rather than intellectual developments. Choosing in this way is likely to miss out on the really important intellectual event of the next 50 years. —David Donoho18

16 Michael Feldman et al., “Certifying and Removing Disparate Impact,” in Proceedings of
the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Min‐
ing, ACM (2015): 259–268. https://arxiv.org/pdf/1412.3756.pdf.
17 Moritz Hardt et al., “Equality of Opportunity in Supervised Learning,” in Advances in
Neural Information Processing Systems (2016): 3315–3323. https://oreil.ly/2KyRdnd.
18 David Donoho, “50 Years of Data Science,” Tukey Centennial Workshop, 2015, http://
bit.ly/2GQOh1J.

Among many other applications, machine learning is used today to make life-altering decisions about employment, bail, parole, and lending. Furthermore, usage of AI and machine learning models is likely to become more commonplace as larger swaths of the economy embrace automation and data-driven decision making. Because artificial intelligence, and its most viable subdiscipline to date, machine learning, has such broad and disruptive applications, let’s heed the warning from Professor Donoho and focus first on the intellectual and social motivations for more interpretability in machine learning.

Intellectual and Social Motivations

Intellectual and social motivations boil down to trust and understanding of an exciting, revolutionary, but also potentially dangerous technology. Trust and understanding are overlapping, but also different, concepts and goals. Many of the techniques discussed in this report are helpful for both, but better suited to one or the other. Trust is mostly related to the accuracy, fairness, and security of machine learning systems as implemented through model debugging and disparate impact analysis and remediation techniques. Understanding is mostly related to the transparency of machine learning systems, such as directly interpretable models and explanations for each decision a system generates.

Human trust of machine learning models. As consumers of machine learning, we need to know that any automated system generating a decision that affects us is secure and accurate and exhibits minimal disparate impact. An illustrative example of problems and solutions for trust in machine learning is the Gender Shades project and related follow-up work. As part of the Gender Shades project, an accuracy and disparate impact problem was discovered and then debugged in several commercial facial recognition systems. These facial recognition systems exhibited highly disparate levels of accuracy across men and women and across skin tones. Not only were these cutting-edge models wrong in many cases, they were consistently wrong more often for women and people with darker skin tones. Once Gender Shades researchers pointed out these problems, the organizations they targeted took remediation steps, including creating more diverse training datasets and devising ethical standards for machine learning projects. In most cases, the result was more accurate models with less disparate impact, leading to much more trustworthy machine learning systems. Unfortunately, the vendor of at least one well-known facial recognition system disputed the concerns highlighted by Gender Shades, likely damaging their trustworthiness with machine learning consumers.

Hacking and adversarial attacks on machine learning systems are another wide-ranging and serious trust problem. In 2017, researchers discovered that slight changes, such as applying stickers, can prevent machine learning systems from recognizing street signs.19 These physical adversarial attacks, which require almost no software engineering expertise, can obviously have severe societal consequences. For a hacker with more technical expertise, many more types of attacks against machine learning are possible.20 Models and even training data can be manipulated or stolen through public APIs or other model endpoints. So, another key to establishing trust in machine learning is ensuring systems are secure and behaving as expected in real time. Without interpretable models, debugging, explanation, and fairness techniques, it can be very difficult to determine whether a machine learning system’s training data has been compromised, whether its outputs have been altered, or whether the system’s inputs can be changed to create unwanted or unpredictable decisions. Security is as important for trust as accuracy or fairness, and the three are inextricably related. All the testing you can do to prove a model is accurate and fair doesn’t really matter if the data or model can be altered later without your knowledge.

Human understanding of machine learning models. Consumers of machine learning also need to know exactly how any automated decision that affects us is made. There are two intellectual drivers of this need: one, to facilitate human learning from machine learning, and two, to appeal wrong machine learning decisions. Exact explanation of machine-learned decisions is one of the most fundamental applications of machine learning interpretability technologies. Explanation enables humans to learn how machine learning systems make decisions, which can satisfy basic curiosity or lead to new types of data-driven insights. Perhaps more importantly, explanation provides a basis for the appeal of automated decisions made by machine learning models. Consider being negatively impacted by an erroneous black-box model decision, say, for instance, being wrongly denied a loan or parole. How would you argue your case for appeal without knowing how model decisions were made? According to the New York Times, a man named Glenn Rodríguez found himself in this unfortunate position in a penitentiary in upstate New York in 2016.21 Without information about exactly why a proprietary black-box model was mistakenly recommending he remain in prison, he was unable to build a direct case to appeal that decision. Like the problems exposed by the Gender Shades study, the inability to appeal automated decisions is not some far-off danger on the horizon; it’s a present danger. Fortunately, the technologies exist today to explain even very complex model decisions, and once understanding and trust can be assured, broader possibilities for the use of machine learning come into view.

19 Kevin Eykholt et al., “Robust Physical-World Attacks on Deep Learning Visual Classifi‐
cation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recogni‐
tion (2018): 1625-1634. https://oreil.ly/2yX8W11.
20 Patrick Hall, “Proposals for Model Vulnerability and Security,” O’Reilly.com (Ideas),
March 20, 2019. https://oreil.ly/308qKm0.


Guaranteeing the promise of machine learning. One of the greatest hopes for data science and machine learning is simply increased convenience, automation, and organization in our day-to-day lives. Even today, we are beginning to see fully automated baggage scanners at airports, and our phones are constantly recommending new music (that we might actually like). As these types of automation and conveniences grow more common, consumers will likely want to understand them more deeply and machine learning engineers will need more and better tools to debug these ever-more-present decision-making systems. Machine learning also promises quick, accurate, and unbiased decision making in life-changing scenarios. Computers can theoretically use machine learning to make objective, data-driven decisions in critical situations like criminal convictions, medical diagnoses, and college admissions, but interpretability, among other technological advances, is needed to guarantee the promises of correctness and objectivity. Without extremely high levels of trust and understanding in machine learning decisions, there is no certainty that a machine learning system is not simply relearning and reapplying long-held, regrettable, and erroneous human biases. Nor are there any assurances that human operators, or hackers, have not forced a machine learning system to make intentionally prejudicial or harmful decisions.

21 Rebecca Wexler, “When a Computer Program Keeps You in Jail,” New York Times, June
13, 2017, https://oreil.ly/2TyHIr5.


Commercial Motivations

Companies and organizations use machine learning and predictive models for a very wide variety of revenue- or value-generating applications. Just a few examples include facial recognition, lending decisions, hospital release decisions, parole release decisions, or generating customized recommendations for new products or services. Many principles of applied machine learning are shared across industries, but the practice of machine learning at banks, insurance companies, healthcare providers, and in other regulated industries is often quite different from machine learning as conceptualized in popular blogs, the news and technology media, and academia. It’s also somewhat different from the practice of machine learning in the technologically advanced and less regulated digital, ecommerce, FinTech, and internet verticals.

In commercial practice, concerns regarding machine learning algorithms are often overshadowed by talent acquisition, data engineering, data security, hardened deployment of machine learning apps and systems, managing and monitoring an ever-increasing number of predictive models, modeling process documentation, and regulatory compliance.22 Successful entities in both traditional enterprises and modern digital, ecommerce, FinTech, and internet verticals have learned to balance these competing business interests. Many digital, ecommerce, FinTech, and internet companies, operating outside of most regulatory oversight, and often with direct access to web-scale, and sometimes unethically sourced, data stores, have often made web data and machine learning products central to their business. Larger, more established companies tend to practice statistics, analytics, and data mining at the margins of their business to optimize revenue or allocation of other valuable assets. For all these reasons, commercial motivations for interpretability vary across industry verticals, but center around improved margins for previously existing analytics projects, business partner and customer adoption of new machine learning products or services, regulatory compliance, and lessened model and reputational risk.

22 Patrick Hall, Wen Phan, and Katie Whitson, The Evolution of Analytics (Sebastopol, CA:
O’Reilly Media, 2016). https://oreil.ly/2Z3eBxk.


Enhancing established analytical processes. For traditional and often more-regulated commercial applications, machine learning can enhance established analytical practices, typically by increasing prediction accuracy over conventional but highly interpretable linear models. Machine learning can also enable the incorporation of unstructured data into analytical pursuits, again leading to more accurate model outcomes in many cases. Because linear models have long been the preferred tools for predictive modeling, many practitioners and decision-makers are simply suspicious of machine learning. If nonlinear models, generated by training machine learning algorithms, make more accurate predictions on previously unseen data, this typically translates into improved financial margins…but only if the model is accepted by internal validation teams, business partners, and customers. Interpretable machine learning models and debugging, explanation, and fairness techniques can increase understanding and trust in newer or more robust machine learning approaches, allowing more sophisticated and potentially more accurate models to be used in place of previously existing linear models.

Regulatory compliance. Interpretable, fair, and transparent models are simply a legal mandate in certain parts of the banking, insurance, and healthcare industries.23 Because of increased regulatory scrutiny, these more traditional companies typically must use techniques, algorithms, and models that are simple and transparent enough to allow for detailed documentation of internal system mechanisms and in-depth analysis by government regulators. Some major regulatory statutes currently governing these industries include the Civil Rights Acts of 1964 and 1991, the Americans with Disabilities Act, the Genetic Information Nondiscrimination Act, the Health Insurance Portability and Accountability Act, the Equal Credit Opportunity Act (ECOA), the Fair Credit Reporting Act (FCRA), the Fair Housing Act, Federal Reserve SR 11-7, and the European Union (EU) General Data Protection Regulation (GDPR), Article 22.24 These regulatory regimes, key drivers of what constitutes interpretability in applied machine learning, change over time or with political winds.

23 Fast Forward Labs, Interpretability, https://oreil.ly/301LuM4.


Tools like those discussed in this report are already used to document, understand, and validate different types of models in the financial services industry (and probably others). Many organizations are now also experimenting with machine learning and the reason codes or adverse action notices that are mandated under ECOA and FCRA for credit lending, employment, and insurance decisions in the United States. If newer machine learning approaches are used for such decisions, those decisions must be explained in terms of adverse action notices. Equifax’s NeuroDecision is a great example of constraining a machine learning technique (an MLP) to be interpretable, using it to make measurably more accurate predictions than a linear model, and doing so in a regulated space. To make automated credit-lending decisions, NeuroDecision uses modified MLPs, which are somewhat more accurate than conventional regression models and also produce the regulator-mandated adverse action notices that explain the logic behind a credit-lending decision. NeuroDecision’s increased accuracy could lead to credit lending in a broader portion of the market than previously possible, such as new-to-credit consumers, increasing the margins associated with the preexisting linear model techniques.25,26 Shapley values, and similar local variable importance approaches we will discuss later, also provide a convenient methodology to rank the contribution of input variables to machine learning model decisions and potentially generate customer-specific adverse action notices.
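
As a hedged sketch of that last idea, the snippet below ranks Shapley-value contributions for one declined applicant as candidate adverse action reasons. The model and X_declined objects are placeholders for illustration, and nothing here describes NeuroDecision’s actual implementation.

```python
import pandas as pd
import shap

# X_declined: a one-row DataFrame for the declined applicant; model: any
# tree-based probability-of-default model where class 1 means "default".
explainer = shap.TreeExplainer(model)
contribs = explainer.shap_values(X_declined)
# Note: for some learners shap_values returns one array per class;
# in that case take the class-1 array before ranking.

# Rank inputs by how strongly they pushed this prediction toward "default";
# the top contributors are candidates for adverse action reason codes.
ranked = pd.Series(contribs[0], index=X_declined.columns).sort_values(ascending=False)
print("Candidate adverse action reasons:", ranked.head(4).index.tolist())
```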

Adoption and acceptance. For digital, ecommerce, FinTech, and internet companies today, interpretability is often an important but secondary concern. Less-traditional and typically less-regulated companies currently face a greatly reduced burden when it comes to creating transparent and trustworthy machine learning products or services. Even though transparency into complex data and machine learning products might be necessary for internal debugging, validation, or business adoption purposes, many newer firms are not compelled by regulation to prove their models are accurate, transparent, or nondiscriminatory. However, as the apps and systems that such companies create (often based on machine learning) continue to change from occasional conveniences or novelties into day-to-day necessities, consumer and government demand for accuracy, fairness, and transparency in these products will likely increase.

24 Andrew Burt, “How Will the GDPR Impact Machine Learning?” O’Reilly.com (Ideas),
May 16, 2018, https://oreil.ly/304nxDI.
25 Hall et al., The Evolution of Analytics.
26 Bob Crutchfield, “Approve More Business Customers,” Equifax Insights Blog, March
16, 2017, https://oreil.ly/2NcWnar.


Reducing risk. No matter what space you are operating in as a business, hacking of prediction APIs or other model endpoints and discriminatory model decisions can be costly, both to your reputation and to your bottom line. Interpretable models, model debugging, explanation, and fairness tools can mitigate both of these risks. While direct hacks of machine learning models still appear rare, there are numerous documented hacking methods in the machine learning security literature, and several simpler insider attacks that can change your model outcomes to benefit a malicious actor or deny service to legitimate customers.27,28,29 You can use explanation and debugging tools in white-hat hacking exercises to assess your vulnerability to adversarial example, membership inference, and model stealing attacks. You can use fair (e.g., learning fair representations, LFR) or private (e.g., private aggregation of teacher ensembles, PATE) models as an active measure to prevent many attacks.30,31 Also, real-time disparate impact monitoring can alert you to data poisoning attempts to change your model behavior to benefit or harm certain groups of people. Moreover, basic checks for disparate impact should always be conducted if your model will affect humans. Even if your company can’t be sued for noncompliance under FCRA or ECOA, it can be called out in the media for deploying a discriminatory machine learning model or violating customer privacy. As public awareness of security vulnerabilities and algorithmic discrimination grows, don’t be surprised if a reputational hit in the media results in customers taking business elsewhere, causing real financial losses.

27 Marco Barreno et al., “The Security of Machine Learning,” Machine Learning 81, no. 2
(2010): 121–148. https://oreil.ly/31JwoLL.
28 Reza Shokri et al., “Membership Inference Attacks Against Machine Learning Models,”
IEEE Symposium on Security and Privacy (SP), 2017, https://oreil.ly/2Z22LHI.
29 Nicholas Papernot, “A Marauder’s Map of Security and Privacy in Machine Learning:
An Overview of Current and Future Research Directions for Making Machine Learn‐
ing Secure and Private,” in Proceedings of the 11th ACM Workshop on Artificial Intelligence
and Security, ACM, 2018, https://arxiv.org/pdf/1811.01134.pdf.
30 Rich Zemel et al., “Learning Fair Representations,” in International Conference on
Machine Learning (2013): 325-333. https://oreil.ly/305wjBE.
31 Nicolas Papernot et al., “Scalable Private Learning with PATE,” arXiv:1802.08908, 2018,
https://arxiv.org/pdf/1802.08908.pdf.


Machine Learning Interpretability: The Taxonomy

A heuristic, practical, and previously defined taxonomy is presented in this section.32 This taxonomy will be used to characterize the interpretability of various popular machine learning and statistics techniques used in commercial data mining, analytics, data science, and machine learning applications. This taxonomy describes approaches in terms of:

  • Their ability to promote understanding and trust
  • Their complexity
  • The global or local scope of information they generate
  • The families of algorithms to which they can be applied
Technical challenges as well as the needs and perspectives of different user communities make characterizing machine learning interpretability techniques a subjective and complicated task. Many other authors have grappled with organizing and categorizing a variety of general concepts related to interpretability and explanations. Some of these efforts include “A Survey of Methods for Explaining Black Box Models” by Riccardo Guidotti et al.,33 “The Mythos of Model Interpretability” by Zachary Lipton,34 “Interpretable Machine Learning” by Christoph Molnar,35 “Interpretable Machine Learning: Definitions, Methods, and Applications” by W. James Murdoch et al.,36 and “Challenges for Transparency” by Adrian Weller.37 Interested readers are encouraged to dive into these more technical, detailed, and nuanced analyses too!

32 Patrick Hall, Wen Phan, and SriSatish Ambati, “Ideas on Interpreting Machine Learn‐
ing,” O’Reilly.com (Ideas), March 15, 2017, https://oreil.ly/2H4aIC8.
33 Riccardo Guidotti et al., “A Survey of Methods for Explaining Black Box Models,” ACM
Computing Surveys (CSUR) 51, no. 5 (2018): 93. https://arxiv.org/pdf/1802.01933.pdf.
34 Zachary C. Lipton, “The Mythos of Model Interpretability,” arXiv:1606.03490, 2016,
https://arxiv.org/pdf/1606.03490.pdf.


Understanding and Trust

Some interpretability techniques are more geared toward fostering understanding, some help engender trust, and some enhance both. Trust and understanding are different, but not orthogonal, phenomena. Both are also important goals for any machine learning project. Understanding through transparency is necessary for human learning from machine learning, for appeal of automated decisions, and for regulatory compliance. The discussed techniques enhance understanding by either providing transparency and specific insights into the mechanisms of the algorithms and the functions they create or by providing detailed information for the answers they provide. Trust grows from the tangible accuracy, fairness, and security of machine learning systems. The techniques that follow enhance trust by enabling users to observe or ensure the fairness, stability, and dependability of machine learning algorithms, the functions they create, and the answers they generate.

A Scale for Interpretability

The complexity of a machine learning model is often related to its interpretability. Generally, the more complex and unconstrained the model, the more difficult it is to interpret and explain. The number of weights or rules in a model or its Vapnik–Chervonenkis dimension, a more formal measure, are good ways to quantify a model’s complexity. However, analyzing the functional form of a model is particularly useful for commercial applications such as credit scoring. The following list describes the functional forms of models and discusses their degree of interpretability in various use cases.

35 Christoph Molnar, Interpretable Machine Learning (christophm.github.io: 2019),
https://oreil.ly/2YI5ruC.
36 W. James Murdoch et al., “Interpretable Machine Learning: Definitions, Methods, and
Applications,” arXiv:1901.04592, 2019, https://arxiv.org/pdf/1901.04592.pdf.
37 Adrian Weller, “Challenges for Transparency,” arXiv:1708.01870, 2017, https://
arxiv.org/pdf/1708.01870.pdf.

High interpretability: linear, monotonic functions

Functions created by traditional regression algorithms are probably the most interpretable class of models. We refer to these models here as “linear and monotonic,” meaning that for a change in any given input variable (or sometimes combination or function of an input variable), the output of the response function changes at a defined rate, in only one direction, and at a magnitude represented by a readily available coefficient. Monotonicity also enables intuitive and even automatic reasoning about predictions. For instance, if a credit lender rejects your credit card application, it can easily tell you why because its probability-of-default model often assumes your credit score, your account balances, and the length of your credit history are monotonically related to your ability to pay your credit card bill. When these explanations are created automatically, they are typically called adverse action notices or reason codes. Linear, monotonic functions play another important role in machine learning interpretability. Besides being highly interpretable themselves, linear and monotonic functions are also used in explanatory techniques, including the popular LIME approach.
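
A minimal sketch of such automatic reason codes from a linear, monotonic model, assuming a pandas DataFrame X and a binary default indicator y as placeholders: each input’s contribution to the log-odds is just its coefficient times its value, so adverse factors can be reported directly.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# X: a DataFrame of (ideally standardized) inputs; y = 1 means "default".
clf = LogisticRegression().fit(X, y)

# For a linear, monotonic model, each input's contribution to the log-odds is
# simply coefficient * value, so the largest adverse factors fall out directly.
applicant = X.iloc[[0]]
contributions = pd.Series(clf.coef_[0] * applicant.values[0], index=X.columns)
print("Top factors raising predicted default risk:")
print(contributions.sort_values(ascending=False).head(3))
```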

Medium interpretability: nonlinear, monotonic functions

Although most machine-learned response functions are nonlinear, some can be constrained to be monotonic with respect to any given independent variable. Although there is no single coefficient that represents the change in the response function output induced by a change in a single input variable, nonlinear and monotonic functions do always change in one direction as a single input variable changes. They usually allow for the generation of plots that describe their behavior and both reason codes and variable importance measures. Nonlinear, monotonic response functions are therefore fairly interpretable and potentially suitable for use in regulated applications. (Of course, there are linear, nonmonotonic machine-learned response functions that can, for instance, be created by the multivariate adaptive regression splines (MARS) approach. These functions could be of interest for your machine learning project and they likely share the medium interpretability characteristics of nonlinear, monotonic functions.)
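
A minimal sketch of imposing such monotonic constraints, using XGBoost’s monotone_constraints parameter; X and y are placeholders, and the constraint string assumes exactly three input columns in a known order.

```python
import xgboost as xgb

# One constraint per input column, in column order: +1 forces the prediction to
# be nondecreasing in that input, -1 nonincreasing, 0 leaves it unconstrained.
model = xgb.XGBClassifier(
    n_estimators=300,
    max_depth=3,
    monotone_constraints="(1,-1,0)",   # assumes exactly three input columns
).fit(X, y)
```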

Low interpretability: nonlinear, nonmonotonic functions

Most machine learning algorithms create nonlinear, nonmonotonic response functions. This class of functions is the most difficult to interpret, as they can change in a positive and negative direction and at a varying rate for any change in an input variable. Typically, the only standard interpretability measures these functions provide are relative variable importance measures. You should use a combination of several techniques, presented in the sections that follow, to interpret, explain, debug, and test these extremely complex models. You should also consider the accuracy, fairness, and security problems associated with black-box machine learning before deploying a nonlinear, nonmonotonic model for any application with high stakes or that affects humans.

Global and Local Interpretability

It’s often important to understand and test your trained model on a global scale, and also to zoom into local regions of your data or your predictions and derive local information. Global measures help us understand the inputs and their entire modeled relationship with the prediction target, but global interpretations can be highly approximate in some cases. Local information helps us understand our model or predictions for a single row of data or a group of similar rows. Because small parts of a machine-learned response function are more likely to be linear, monotonic, or otherwise well-behaved, local information can be more accurate than global information. It’s also very likely that the best analysis of a machine learning model will come from combining the results of global and local interpretation techniques. In subsequent sections we will use the following descriptors to classify the scope of an interpretable machine learning approach:

Global interpretability

Some machine learning interpretability techniques facilitate global measurement of machine learning algorithms, their results, or the machine-learned relationships between the prediction target(s) and the input variables across entire partitions of data.

Local interpretability

Local interpretations promote understanding of small regions of the machine-learned relationship between the prediction target(s) and the input variables, such as clusters of input records and their corresponding predictions, or deciles of predictions and their corresponding input rows, or even single rows of data.

Model-Agnostic and Model-Specific Interpretability

Another important way to classify model interpretability techniques is to determine whether they are model agnostic, meaning they can be applied to different types of machine learning algorithms, or model specific, meaning they are applicable only to a single type or class of algorithm. For instance, the LIME technique is model agnostic and can be used to interpret nearly any set of machine learning inputs and machine learning predictions. On the other hand, the technique known as Tree SHAP is model specific and can be applied only to decision tree models. Although model-agnostic interpretability techniques are convenient, and in some ways ideal, they often rely on surrogate models or other approximations that can degrade the accuracy of the information they provide. Model-specific interpretation techniques tend to use the model to be interpreted directly, leading to potentially more accurate measurements.
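
As a hedged illustration of a model-agnostic technique, the sketch below applies LIME to a single prediction; X_train, feature_names, and model are placeholders for your own data and fitted classifier.

```python
from lime.lime_tabular import LimeTabularExplainer

# X_train: a numpy array of training inputs; feature_names: its column names;
# model: any fitted classifier exposing predict_proba. All are placeholders.
explainer = LimeTabularExplainer(
    X_train,
    feature_names=feature_names,
    class_names=["no default", "default"],
    mode="classification",
)

# LIME fits a weighted local linear surrogate around one row and reports that
# surrogate's coefficients as the explanation for this single prediction.
exp = explainer.explain_instance(X_train[0], model.predict_proba, num_features=5)
print(exp.as_list())
```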

Integrating Interpretability into a Workflow

Now that you’ve read about these machine learning interpretability techniques, you may be wondering how to fit them into your professional workflow. The following figure shows one way to augment the standard data mining workflow with steps for increasing accuracy, interpretability, privacy, security, transparency, and trustworthiness, using the classes of techniques presented in this series.

A proposed holistic training and deployment workflow for human-centered or other high-stakes machine learning applications (source)

We suggest using the introduced techniques in these workflow steps. For instance, you may visualize and explore your data using projections and network graphs, preprocess your data using fairness reweighing, and train a monotonic GBM model. You might then explain the monotonic GBM with a combination of techniques such as decision tree surrogates, partial dependence and ICE plots, and Shapley explanations. Then you could conduct disparate impact testing to ensure fairness in your predictions and debug your model with sensitivity analysis. Such a combination represents a current best guess for a viable machine learning workflow for human-centered or other high-stakes applications.

Next in this sequence: 2. State of the art