
[Summary] Add internals-based feature attribution methods #108

@gsarti

Description


🚀 Feature Request

The following is a non-exhaustive list of internals-based (mostly attention-based) feature attribution methods that could be added to the library:

| Method name | Source | Code implementation | Status |
| --- | --- | --- | --- |
| Last-Layer Attention | Jain and Wallace '19 | successar/AttentionExplanation | |
| Aggregated Attention | Jain and Wallace '19 | successar/AttentionExplanation | |
| Attention Flow | Abnar and Zuidema '20 | samiraabnar/attention_flow | |
| Attention Rollout | Abnar and Zuidema '20 | samiraabnar/attention_flow | |
| Attention with Values Norm (Attn-N) | Kobayashi et al. '20 | gorokoba560/norm-analysis-of-transformer | |
| Attention with Residual Norm (AttnRes-N) | Kobayashi et al. '20 | gorokoba560/norm-analysis-of-transformer | |
| Attention with Attention Block Norm (AttnResLn-N or LnAttnRes-N) | Kobayashi et al. '21 | gorokoba560/norm-analysis-of-transformer | |
| Attention-driven Relevance Propagation | Chefer et al. '21 | hila-chefer/Transformer-MM-Explainability | |
| ALTI+ | Ferrando et al. '22 | mt-upc/transformer-contributions-nmt | |
| GlobEnc | Modarressi et al. '22 | mohsenfayyaz/globenc | |
| Attention with Attention Block + FFN Norm (AttnResLnFF-N or LnAttnResFF-N) | Kobayashi et al. '23 | - | |
| Attention x Transformer Block Norm | Kobayashi et al. '23 | - | |
| Logit | Ferrando et al. '23 | mt-upc/logit-explanations | |
| ALTI-Logit | Ferrando et al. '23 | mt-upc/logit-explanations | |
| DecompX | Modarressi et al. '23 | mohsenfayyaz/DecompX | |
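
The Attention Rollout method listed above can be sketched as follows; this is a minimal illustration assuming per-layer attention matrices already averaged over heads, following the residual-aware recursion described by Abnar and Zuidema '20 (the function name and interface are illustrative, not the library's API):

```python
import numpy as np

def attention_rollout(attentions):
    """Sketch of Attention Rollout (Abnar and Zuidema '20).

    attentions: list of (seq_len, seq_len) row-stochastic attention
    matrices, one per layer, averaged over heads.
    Returns a (seq_len, seq_len) rolled-out attribution matrix.
    """
    rollout = np.eye(attentions[0].shape[0])
    for attn in attentions:
        # Fold in the residual connection: 0.5 * A + 0.5 * I,
        # then re-normalize rows to sum to 1.
        attn_res = 0.5 * attn + 0.5 * np.eye(attn.shape[0])
        attn_res = attn_res / attn_res.sum(axis=-1, keepdims=True)
        # Propagate attributions from the previous layers.
        rollout = attn_res @ rollout
    return rollout
```

Since each per-layer matrix stays row-stochastic after the residual correction, the rolled-out matrix is also row-stochastic, so each row can be read as a distribution over input tokens.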

Notes:

  1. Add the possibility to scale attention weights by the norm of the corresponding value vectors, shown to be effective for alignment and for encoder models (Ferrando and Costa-jussà '21, Treviso et al. '21).
  2. The ALTI+ technique extends the ALTI method by Ferrando et al. '22 (paper, code) to encoder-decoder architectures. It was recently used by the Facebook team to detect hallucinated toxicity by highlighting toxic keywords that pay attention to the source (NLLB paper, Figure 31).
  3. Attention Flow is very expensive to compute, but it has proven Shapley value guarantees for same-layer attribution, which is not the case for Rollout or other methods. Flow and Rollout should be implemented as propagation methods rather than stand-alone approaches, since they are used to propagate most attention-based attribution scores across layers.
  4. GlobEnc corresponds roughly to Attention x Transformer Block Norm but ignores the FFN part, which in the latter is incorporated via a localized application of Integrated Gradients with a 0-valued baseline (the authors' default).
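
Note 1 above can be sketched as follows; a minimal, hypothetical illustration assuming per-head attention weights and value vectors have already been extracted from the model (the function name and final normalization are assumptions, not part of any cited implementation):

```python
import numpy as np

def value_norm_scaled_attention(attn, values):
    """Scale attention weights by value-vector norms, i.e. use
    alpha_ij * ||v_j|| instead of alpha_ij alone (in the spirit of
    Kobayashi et al. '20).

    attn: (heads, seq_len, seq_len) attention weights.
    values: (heads, seq_len, head_dim) value vectors.
    Returns (heads, seq_len, seq_len) norm-scaled scores.
    """
    # ||v_j|| for each head and source position j.
    v_norms = np.linalg.norm(values, axis=-1)   # (heads, seq_len)
    # Broadcast over target positions: alpha_ij * ||v_j||.
    scores = attn * v_norms[:, None, :]
    # Re-normalize rows so scores remain comparable to attention weights.
    return scores / scores.sum(axis=-1, keepdims=True)
```

The effect is that source tokens with near-zero value vectors receive low scores even when their raw attention weight is high, which is the failure mode of plain attention that the norm-based analyses highlight.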


Labels: enhancement (New feature or request), help wanted (Extra attention is needed), summary (Summarizes multiple sub-tasks)
