Papers on Explainable Artificial Intelligence
This is an ongoing attempt to consolidate interesting efforts in understanding, interpreting, explaining, and visualizing pre-trained ML models.
GUI tools
- DeepVis: Deep Visualization Toolbox. Yosinski et al. 2015 code | pdf
- SWAP: Generate adversarial poses of objects in a 3D space. Alcorn et al. 2018 code | pdf
Libraries
- https://github.com/utkuozbulak/pytorch-cnn-visualizations (activation maximization)
- https://github.com/albermax/innvestigate (heatmaps)
- https://github.com/tensorflow/lucid (activation maximization, heatmaps)
Surveys
- Methods for Interpreting and Understanding Deep Neural Networks. Montavon et al. 2017 pdf
- The Mythos of Model Interpretability. Lipton 2016 pdf
- Towards A Rigorous Science of Interpretable Machine Learning. Doshi-Velez & Kim 2017 pdf
- Visualizations of Deep Neural Networks in Computer Vision: A Survey. Seifert et al. 2017 pdf
- How convolutional neural network see the world - A survey of convolutional neural network visualization methods. Qin et al. 2018 pdf
- A brief survey of visualization methods for deep learning models from the perspective of Explainable AI. Chalkiadakis 2018 pdf
- A Survey Of Methods For Explaining Black Box Models. Guidotti et al. 2018 pdf
A. Explaining inner workings
A1. Visualizing Preferred Stimuli
Synthesizing images / Activation Maximization
- AM: Visualizing higher-layer features of a deep network. Erhan et al. 2009 pdf
- DeepVis: Understanding Neural Networks through Deep Visualization. Yosinski et al. 2015 pdf | url
- MFV: Multifaceted Feature Visualization: Uncovering the different types of features learned by each neuron in deep neural networks. Nguyen et al. 2016 pdf | code
- DGN-AM: Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. Nguyen et al. 2016 pdf | code
- PPGN: Plug and Play Generative Networks. Nguyen et al. 2017 pdf | code
- Feature Visualization. Olah et al. 2017 url
- Diverse feature visualizations reveal invariances in early layers of deep neural networks. Cadena et al. 2018 pdf
Real images / Segmentation Masks
- Visualizing and Understanding Recurrent Networks. Karpathy et al. 2015 pdf
- Object Detectors Emerge in Deep Scene CNNs. Zhou et al. 2015 pdf
- Understanding Deep Architectures by Interpretable Visual Summaries pdf
A2. Inverting Neural Networks
- Understanding Deep Image Representations by Inverting Them pdf
- Inverting Visual Representations with Convolutional Networks pdf
- Neural network inversion beyond gradient descent pdf
A3. Distilling DNNs into more interpretable models
- Interpreting CNNs via Decision Trees pdf
- Distilling a Neural Network Into a Soft Decision Tree pdf
- Distill-and-Compare: Auditing Black-Box Models Using Transparent Model Distillation. Tan et al. 2018 pdf
- Improving the Interpretability of Deep Neural Networks with Knowledge Distillation. Liu et al. 2018 pdf
A4. Quantitatively characterizing hidden features
- TCAV: Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors. Kim et al. 2018 pdf | code
- Automating Interpretability: Discovering and Testing Visual Concepts Learned by Neural Networks. Ghorbani et al. 2019 pdf
- SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability. Raghu et al. 2017 pdf | code
- A Peek Into the Hidden Layers of a Convolutional Neural Network Through a Factorization Lens. Saini et al. 2018 pdf
- Network Dissection: Quantifying Interpretability of Deep Visual Representations. Bau et al. 2017 url | pdf
A5. Network surgery
- How Important Is a Neuron? Dhamdhere et al. 2018 pdf
A6. Sensitivity analysis
- NLIZE: A Perturbation-Driven Visual Interrogation Tool for Analyzing and Interpreting Natural Language Inference Models. Liu et al. 2018 pdf
B. Explaining decisions
B1. Heatmaps / Attribution
White-box
- A Theoretical Explanation for Perplexing Behaviors of Backpropagation-based Visualizations. Nie et al. 2018 pdf
- A Taxonomy and Library for Visualizing Learned Features in Convolutional Neural Networks pdf
- CAM: Learning Deep Features for Discriminative Localization. Zhou et al. 2016 code | web
- Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization. Selvaraju et al. 2017 pdf
- Grad-CAM++: Improved Visual Explanations for Deep Convolutional Networks. Chattopadhyay et al. 2017 pdf | code
- LRP: Beyond saliency: understanding convolutional neural networks from saliency prediction on layer-wise relevance propagation pdf
- DTD: Explaining NonLinear Classification Decisions With Deep Taylor Decomposition pdf
- Regional Multi-scale Approach for Visually Pleasing Explanations of Deep Neural Networks. Seo et al. 2018 pdf
- Integrated Gradients: Axiomatic Attribution for Deep Networks. Sundararajan et al. 2017 pdf | code
- The (Un)reliability of saliency methods. Kindermans et al. 2018 pdf
- Sanity Checks for Saliency Maps. Adebayo et al. 2018 pdf
Black-box
- RISE: Randomized Input Sampling for Explanation of Black-box Models. Petsiuk et al. 2018 pdf
- LIME: "Why Should I Trust You?": Explaining the Predictions of Any Classifier. Ribeiro et al. 2016 pdf | blog
B2. Learning to explain
- Learning how to explain neural networks: PatternNet and PatternAttribution pdf
- Deep Learning for Case-Based Reasoning through Prototypes pdf
- Unsupervised Learning of Neural Networks to Explain Neural Networks pdf
- Automated Rationale Generation: A Technique for Explainable AI and its Effects on Human Perceptions pdf
- Rationalization: A Neural Machine Translation Approach to Generating Natural Language Explanations pdf
- Towards robust interpretability with self-explaining neural networks. Alvarez-Melis and Jaakkola 2018 pdf
C. Unclassified
- Explainable Artificial Intelligence via Bayesian Teaching. Yang & Shafto. NIPS 2017 pdf
- Explainable AI for Designers: A Human-Centered Perspective on Mixed-Initiative Co-Creation pdf
- ICADx: Interpretable computer aided diagnosis of breast masses. Kim et al. 2018 pdf
- Neural Network Interpretation via Fine Grained Textual Summarization. Guo et al. 2018 pdf
- LS-Tree: Model Interpretation When the Data Are Linguistic. Chen et al. 2019 pdf