
[Summary] Add internals-based feature attribution methods #108

@gsarti

Description


🚀 Feature Request

The following is a non-exhaustive list of internals-based (mostly attention-based) feature attribution methods that could be added to the library:

| Method name | Source | Code implementation | Status |
| --- | --- | --- | --- |
| Last-Layer Attention | Jain and Wallace '19 | successar/AttentionExplanation | |
| Aggregated Attention | Jain and Wallace '19 | successar/AttentionExplanation | |
| Attention Flow | Abnar and Zuidema '20 | samiraabnar/attention_flow | |
| Attention Rollout | Abnar and Zuidema '20 | samiraabnar/attention_flow | |
| Attention with Values Norm (Attn-N) | Kobayashi et al. '20 | gorokoba560/norm-analysis-of-transformer | |
| Attention with Residual Norm (AttnRes-N) | Kobayashi et al. '20 | gorokoba560/norm-analysis-of-transformer | |
| Attention with Attention Block Norm (AttnResLn-N or LnAttnRes-N) | Kobayashi et al. '21 | gorokoba560/norm-analysis-of-transformer | |
| Attention-driven Relevance Propagation | Chefer et al. '21 | hila-chefer/Transformer-MM-Explainability | |
| ALTI+ | Ferrando et al. '22 | mt-upc/transformer-contributions-nmt | |
| GlobEnc | Modarressi et al. '22 | mohsenfayyaz/globenc | |
| Attention with Attention Block + FFN Norm (AttnResLnFF-N or LnAttnResFF-N) | Kobayashi et al. '23 | - | |
| Attention x Transformer Block Norm | Kobayashi et al. '23 | - | |
| Logit | Ferrando et al. '23 | mt-upc/logit-explanations | |
| ALTI-Logit | Ferrando et al. '23 | mt-upc/logit-explanations | |
| DecompX | Modarressi et al. '23 | mohsenfayyaz/DecompX | |
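
The Attention Rollout method listed above can be sketched as follows; this is a minimal illustration assuming per-layer attention matrices already averaged over heads, following the residual-aware recursion described by Abnar and Zuidema '20 (the function name and interface are illustrative, not the library's API):

```python
import numpy as np

def attention_rollout(attentions):
    """Sketch of Attention Rollout (Abnar and Zuidema '20).

    attentions: list of (seq_len, seq_len) row-stochastic attention
    matrices, one per layer, averaged over heads.
    Returns a (seq_len, seq_len) rolled-out attribution matrix.
    """
    rollout = np.eye(attentions[0].shape[0])
    for attn in attentions:
        # Fold in the residual connection: 0.5 * A + 0.5 * I,
        # then re-normalize rows to sum to 1.
        attn_res = 0.5 * attn + 0.5 * np.eye(attn.shape[0])
        attn_res = attn_res / attn_res.sum(axis=-1, keepdims=True)
        # Propagate attributions from the previous layers.
        rollout = attn_res @ rollout
    return rollout
```

Since each per-layer matrix stays row-stochastic after the residual correction, the rolled-out matrix is also row-stochastic, so each row can be read as a distribution over input tokens.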

Notes:

  1. Add the possibility to scale attention weights by the norm of the corresponding value vectors, shown to be effective for alignment and for encoder models (Ferrando and Costa-jussà '21, Treviso et al. '21).
  2. The ALTI+ technique extends the ALTI method by Ferrando et al. '22 (paper, code) to encoder-decoder architectures. It was recently used by the Facebook team to detect hallucinated toxicity by highlighting toxic keywords that pay attention to the source (NLLB paper, Figure 31).
  3. Attention Flow is very expensive to compute, but it has proven Shapley value guarantees for same-layer attribution, which is not the case for Rollout or other methods. Flow and Rollout should be implemented as propagation methods rather than stand-alone approaches, since they are used to propagate most attention-based attribution scores across layers.
  4. GlobEnc corresponds roughly to Attention x Transformer Block Norm but ignores the FFN part, which in the latter is incorporated via a localized application of Integrated Gradients with a 0-valued baseline (the authors' default).
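
Note 1 above can be sketched as follows; a minimal, hypothetical illustration assuming per-head attention weights and value vectors have already been extracted from the model (the function name and final normalization are assumptions, not part of any cited implementation):

```python
import numpy as np

def value_norm_scaled_attention(attn, values):
    """Scale attention weights by value-vector norms, i.e. use
    alpha_ij * ||v_j|| instead of alpha_ij alone (in the spirit of
    Kobayashi et al. '20).

    attn: (heads, seq_len, seq_len) attention weights.
    values: (heads, seq_len, head_dim) value vectors.
    Returns (heads, seq_len, seq_len) norm-scaled scores.
    """
    # ||v_j|| for each head and source position j.
    v_norms = np.linalg.norm(values, axis=-1)   # (heads, seq_len)
    # Broadcast over target positions: alpha_ij * ||v_j||.
    scores = attn * v_norms[:, None, :]
    # Re-normalize rows so scores remain comparable to attention weights.
    return scores / scores.sum(axis=-1, keepdims=True)
```

The effect is that source tokens with near-zero value vectors receive low scores even when their raw attention weight is high, which is the failure mode of plain attention that the norm-based analyses highlight.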


Labels: enhancement (New feature or request), help wanted (Extra attention is needed), summary (Summarizes multiple sub-tasks)
