[Summary] Add gradient-based attribution methods #122

@gsarti

🚀 Feature Request

The following is a non-exhaustive list of gradient-based feature attribution methods that could be added to the library:

| Method name | Source | In Captum | Code implementation | Status |
| --- | --- | --- | --- | --- |
| DeepLiftSHAP | - | ✅ | pytorch/captum | |
| GradientSHAP ¹ | Lundberg and Lee '17 | ✅ | pytorch/captum | |
| Guided Backprop | Springenberg et al. '15 | ✅ | pytorch/captum | |
| LRP ² | Bach et al. '15 | ✅ | pytorch/captum | |
| Guided Integrated Gradients | Kapishnikov et al. '21 | | PAIR-code/saliency | |
| Projected Gradient Descent (PGD) ³ | Madry et al. '18, Yin et al. '22 | | uclanlp/NLP-Interpretation-Faithfulness | |
| Sequential Integrated Gradients | Enguehard '23 | | josephenguehard/time_interpret | ✅ |
| Greedy PIG ⁴ | Axiotis et al. '23 | | | |
| AttnLRP | Achtibat et al. '24 | | rachtibat/LRP-for-Transformers | ✅ |
| Randomized Path-Integration (RPI) ⁵ | Barkan et al. '24 | | rpiconf/rpi | |

Notes:

  1. The Deconvolution method could also be added, but it appears to perform the same procedure as Guided Backprop, so it was left out to avoid duplication.

Footnotes

  1. The method was already present in inseq but was removed because its results were unstable between the single-example and batched settings. This problem needs to be fixed before the method can be reintroduced.
  2. Custom rules for the supported architectures need to be defined to adapt the LRP attribution method to our use case. An existing implementation of LRP rules for Transformer models in TensorFlow is available here: [lena-voita/the-story-of-heads](https://github.com/lena-voita/the-story-of-heads).
  3. The method leverages gradient information to perform adversarial replacement, so its placement in the gradient-based family should be reviewed.
  4. Similar to Sequential Integrated Gradients, but instead of focusing on one word at a time, each iteration fixes the top features identified by attribution (i.e. their baseline is set to identity) and attributes the remaining ones again in the next round.
  5. Integrates gradients over perturbed attention distributions.
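
The iterative procedure described in footnote 4 can be sketched roughly as follows. This is only an illustration of the "attribute, fix the top features, attribute the rest again" loop, not the algorithm from Axiotis et al. and not an inseq or Captum API. The toy model, the helper names (`integrated_gradients`, `greedy_reattribution`, `toy_grad`) and the reading of "baseline is set to identity" as "the baseline takes the input value at the fixed positions" are all assumptions made for the example.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Midpoint Riemann approximation of integrated gradients."""
    alphas = (np.arange(steps) + 0.5) / steps
    # Points on the straight path from baseline to input, shape (steps, n).
    path = baseline + alphas[:, None] * (x - baseline)
    grads = np.stack([grad_fn(p) for p in path])
    return (x - baseline) * grads.mean(axis=0)

def greedy_reattribution(grad_fn, x, baseline, top_k=1):
    """Hypothetical helper: fix the top-k features each round, then re-attribute."""
    baseline = baseline.astype(float).copy()
    free = np.ones(x.shape[0], dtype=bool)  # features not fixed yet
    history = []
    while free.any():
        attr = integrated_gradients(grad_fn, x, baseline)
        # Only features that are still free can be selected.
        scores = np.where(free, np.abs(attr), -np.inf)
        picked = np.argsort(-scores)[: min(top_k, int(free.sum()))]
        # Assumed meaning of "fixed": the baseline now equals the input there,
        # so these features no longer vary along the integration path.
        baseline[picked] = x[picked]
        free[picked] = False
        history.append((picked.tolist(), attr))
    return history

# Toy differentiable model (assumption): f(x) = w.x + 2 * x[0] * x[2]
w = np.array([1.0, 3.0, 0.5, 2.5])

def toy_grad(p):
    g = w.copy()
    g[0] += 2.0 * p[2]
    g[2] += 2.0 * p[0]
    return g

x = np.ones(4)
history = greedy_reattribution(toy_grad, x, np.zeros(4))
order = [picked for picked, _ in history]
print(order)
```

In this toy setup, feature 2 receives a larger attribution in the last round than in the first, because feature 0, which it interacts with, has been fixed to its input value by then. That change between rounds is the reason to attribute again instead of ranking all features once.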

Labels: enhancement, help wanted, summary
