What you see is what you classify: black box attributions

The lack of transparency of black-box models is a fundamental problem in modern Artificial Intelligence and Machine Learning. This work focuses on how to unbox deep learning models for image classification problems.
By Steven Stalder, Nathanaël Perraudin, Radhakrishna Achanta, Fernando Perez-Cruz & Michele Volpi
September 23, 2022

In this work, Swiss Data Science Center researchers Steven Stalder, Nathanaël Perraudin, Radhakrishna Achanta, Fernando Perez-Cruz, and Michele Volpi tackle a fundamental problem in modern Artificial Intelligence (AI) and Machine Learning (ML): the lack of transparency of black-box models. Specifically, they focus on how to unbox deep learning models for image classification problems.

Attribution in image recognition systems

One can train AI models to predict which object class – a dog, a cat, a bike, a person, etc. – is present in a given image. Although these models are highly accurate at this task, they do not provide any additional information on which portions of the image they relied on to arrive at their prediction. While humans can easily explain which part of an image contains a class of interest, it is challenging to understand the behavior of deep learning models due to their black-box nature. In computer vision, identifying the regions of an image that are responsible for a given prediction is called attribution.

Figure 1: Attributions for five selected VOC [1] classes provided by Grad-CAM (GCam) [2], Extremal Perturbations (EP) [3], and our Explainer. Areas with a high attribution score for the class given in the column are highlighted in red.

Our contribution

We present an attribution method built around a second model, which we call the Explainer. The Explainer is a deep learning model trained to explain the output of a target model, an independently pre-trained image classifier. The innovation lies in the Explainer’s ability to directly provide explanations for all classes without accessing the trained classifier’s internals and without retraining any parameters for new images. This makes our method a flexible tool that can be used with a wide range of classifiers and datasets.
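To make the black-box setting concrete, here is a minimal sketch in plain Python. It is not our implementation: `toy_classifier`, `toy_explainer`, and `check_masks` are hypothetical names, the classifier is a hand-written rule, and the "Explainer" returns hard-coded masks where a real one would be a trained network. The sketch only illustrates the contract: the classifier is queried purely as a function from image to class scores, and one Explainer call returns one mask per class. The keep/remove comparison is a common sanity check for attribution masks and is an assumption here, not a statement of our training objective.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]  # H x W grayscale image, values in [0, 1]
Mask = List[List[float]]   # H x W attribution scores in [0, 1]


def toy_classifier(image: Image) -> Dict[str, float]:
    """Hypothetical black box: 'cat' is the mean brightness of the left half,
    'dog' the mean brightness of the right half (image width must be even)."""
    half = len(image[0]) // 2
    n = len(image) * half
    cat = sum(sum(row[:half]) for row in image) / n
    dog = sum(sum(row[half:]) for row in image) / n
    return {"cat": cat, "dog": dog}


def toy_explainer(image: Image) -> Dict[str, Mask]:
    """Hypothetical stand-in for a trained Explainer: a single call returns
    one mask per class. The masks are hard-coded for illustration."""
    h, w = len(image), len(image[0])
    half = w // 2
    return {
        "cat": [[1.0 if x < half else 0.0 for x in range(w)] for _ in range(h)],
        "dog": [[1.0 if x >= half else 0.0 for x in range(w)] for _ in range(h)],
    }


def apply_mask(image: Image, mask: Mask) -> Image:
    """Scale every pixel by its attribution score."""
    return [[p * m for p, m in zip(img_row, mask_row)]
            for img_row, mask_row in zip(image, mask)]


def invert(mask: Mask) -> Mask:
    """Complement of a mask: keeps what the mask removes."""
    return [[1.0 - m for m in row] for row in mask]


def check_masks(
    image: Image,
    classify: Callable[[Image], Dict[str, float]],
    explain: Callable[[Image], Dict[str, Mask]],
) -> Dict[str, Tuple[float, float, float]]:
    """Per class: score on the original image, on the masked image,
    and on the image with the mask inverted. Uses classifier outputs only."""
    original = classify(image)
    report = {}
    for label, mask in explain(image).items():  # one explainer call, all classes
        kept = classify(apply_mask(image, mask))[label]
        removed = classify(apply_mask(image, invert(mask)))[label]
        report[label] = (original[label], kept, removed)
    return report


image = [[1.0, 0.5, 0.25, 0.0] for _ in range(4)]
report = check_masks(image, toy_classifier, toy_explainer)
for label, (orig_score, kept, removed) in report.items():
    print(f"{label}: original={orig_score}, kept={kept}, removed={removed}")
```

In this toy setup, a useful mask leaves the class score unchanged when only the masked region is kept and drives it to zero when that region is removed. Nothing in `check_masks` inspects the classifier’s weights or gradients.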

Our method shows significant improvements over other attribution methods on two common computer vision benchmarks. Most importantly, we demonstrate that the Explainer localizes salient image portions more accurately. The Explainer is also significantly more computationally efficient than most competing methods: a single forward pass directly provides accurate explanations for all possible object classes from the dataset, as shown in Figs. 1 and 2, in less than a second.
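For readers wondering how localization accuracy can be quantified, one common choice is the intersection over union (IoU) between a thresholded attribution mask and a ground-truth object mask. The sketch below is illustrative only: the 0.5 threshold, the helper names `binarize` and `iou`, and the tiny 3 x 3 example are assumptions, and the exact metrics used in the paper may differ.

```python
from typing import List


def binarize(mask: List[List[float]], threshold: float = 0.5) -> List[List[bool]]:
    """Mark pixels whose attribution score reaches the (assumed) threshold."""
    return [[score >= threshold for score in row] for row in mask]


def iou(pred: List[List[bool]], truth: List[List[bool]]) -> float:
    """Intersection over union of two binary masks of identical shape."""
    pairs = [(p, t) for p_row, t_row in zip(pred, truth)
             for p, t in zip(p_row, t_row)]
    intersection = sum(p and t for p, t in pairs)
    union = sum(p or t for p, t in pairs)
    return intersection / union if union else 1.0  # two empty masks agree


# Hypothetical attribution scores for one class and its ground-truth mask.
attribution = [
    [0.9, 0.7, 0.1],
    [0.8, 0.4, 0.0],
    [0.2, 0.1, 0.0],
]
ground_truth = [
    [True, True, False],
    [True, True, False],
    [False, False, False],
]

score = iou(binarize(attribution), ground_truth)
print(score)  # 3 overlapping pixels out of 4 in the union -> 0.75
```

A higher IoU means the highlighted region overlaps more closely with the annotated object.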

Figure 2: Qualitative comparison of class-aggregated attributions generated by Grad-CAM (GCam) [2], RISE [4], Extremal Perturbations (EP) [3], iGOS++ (iGOS) [5], Real Time Image Saliency for Black Box Classifiers (RTIS) [6], and our Explainer.

The relevance of attribution and explainability of AI

Attribution techniques are an essential step toward the widespread adoption of AI in fields where providing a correct prediction alone is insufficient. In many application domains, users need to trust the models and the inferences provided by the AI. An obvious approach toward increasing trust in AI systems is to understand why such models make a given decision. Why is the model predicting the presence of a tumor in this scan? Why does the model confidently recognize this object class over these other ones? Beyond answering these critical questions, our proposed model also provides easy insights into the mistakes a model makes. In this way, the Explainer highlights biases and errors in the datasets used to train image recognition systems, and helps AI developers build fair (unbiased) AI systems.

Understanding datasets and the complications they come with, together with having tools that show why complex models reach their decisions, is only a first step toward inherently interpretable models. In the future, developers of ML models should make an effort not only to achieve the best possible prediction accuracy but also to provide ways to make their models’ decisions directly analyzable by non-expert human users. Once again, this approach will be especially critical in domains like medicine, the legal system, or other areas where trust in a model’s decisions is indispensable.

To go further

  • Our paper was accepted at the Thirty-Sixth Conference on Neural Information Processing Systems (NeurIPS 2022). You can already have a look at it on arXiv: https://arxiv.org/abs/2205.11266.
  • Stalder, S., Perraudin, N., Achanta, R., Perez-Cruz, F., & Volpi, M. (2022). What You See is What You Classify: Black Box Attributions. arXiv preprint arXiv:2205.11266.
  • The code for the project is available at https://github.com/stevenstalder/NN-Explainer.

References

  1. M. Everingham, S. M. A. Eslami, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman. The Pascal visual object classes challenge: A retrospective. International Journal of Computer Vision, 111(1):98–136, Jan. 2015.
  2. R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In IEEE/CVF International Conference on Computer Vision, pages 618–626, 2017.
  3. R. C. Fong, M. Patrick, and A. Vedaldi. Understanding deep networks via extremal perturbations and smooth masks. In IEEE/CVF International Conference on Computer Vision, pages 2950–2958, 2019.
  4. V. Petsiuk, A. Das, and K. Saenko. RISE: Randomized input sampling for explanation of black-box models. In British Machine Vision Conference, 2018.
  5. S. Khorram, T. Lawson, and L. Fuxin. iGOS++: Integrated gradient optimized saliency by bilateral perturbations. In Proceedings of the Conference on Health, Inference, and Learning (CHIL ’21), pages 174–182, 2021. doi: 10.1145/3450439.3451865.
  6. P. Dabkowski and Y. Gal. Real time image saliency for black box classifiers. In Advances in Neural Information Processing Systems, pages 6967–6976, 2017.
