Disentangling Visual Transformers: Patch-level Interpretability for Image Classification

¹ISIR - Sorbonne University  ²Normandy University, ENSICAEN, UNICAEN, CNRS, GREYC
*Work done at the GREYC Laboratory, now at the ISIR Laboratory

Abstract

Visual transformers have achieved remarkable performance in image classification tasks, but this performance gain has come at the cost of interpretability. One of the main obstacles to interpreting transformers is the self-attention mechanism, which mixes visual information across the whole image in a complex way. In this paper, we propose the Hindered Transformer (HiT), a novel interpretable-by-design architecture inspired by visual transformers. Our proposed architecture rethinks the design of transformers to better disentangle patch influences at the classification stage. Ultimately, HiT can be interpreted as a linear combination of patch-level information. We show that the explainability advantages of our approach come with a reasonable trade-off in performance, making it an attractive alternative for applications where interpretability is paramount.
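To make the "linear combination of patch-level information" reading concrete, below is a minimal, hypothetical sketch (not the released HiT code): if the classifier is a shared linear head summed over patch tokens, each patch's contribution to every class logit is directly readable. All names, shapes, and the `PatchLinearHead` module are illustrative assumptions.

```python
# Hypothetical sketch, not the authors' HiT implementation: class logits as a
# sum of per-patch contributions from a shared linear head, so each patch's
# influence on each class can be read off directly.
import torch
import torch.nn as nn


class PatchLinearHead(nn.Module):
    """Toy head: logits are a linear combination of patch embeddings."""

    def __init__(self, embed_dim: int, num_classes: int):
        super().__init__()
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_tokens: torch.Tensor):
        # patch_tokens: (batch, num_patches, embed_dim), e.g. from a patch encoder
        per_patch = self.head(patch_tokens)  # (B, N, C): contribution of each patch to each class
        logits = per_patch.sum(dim=1)        # (B, C): image-level prediction
        return logits, per_patch


# Usage: per_patch[:, :, c] is an attribution map over the patches for class c.
tokens = torch.randn(2, 196, 768)            # illustrative ViT-B/16-like token shape
logits, per_patch = PatchLinearHead(768, 1000)(tokens)
```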

Quantitative Evaluation

Figure: HiT vs. post-hoc methods.

Figure: HiT vs. other interpretable-by-design transformers.

Qualitative Results

Figure: We visually compare post-hoc methods vs. HiT.

Citation

@InProceedings{jeanneret2025disentangling,
    author = {Jeanneret, Guillaume and Simon, Lo{\"\i}c and Jurie, Fr{\'e}d{\'e}ric},
    title = {Disentangling Visual Transformers: Patch-level Interpretability for Image Classification},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month = {June},
    year = {2025}
}