
🚀 A suggestion to rename the variable "attention_mask" to "adder" #4141

@ianliuy


🚀 Feature request

Motivation

I noticed that some users are pretty confused when reading the source code about the variable attention_mask, for example:
What is the meaning of Attention Mask #205
Clarifying attention mask #542
I went back to the original BERT repository, google-research/bert, and compared to it, I find that in this repo the concepts of attention_mask and adder are sometimes mixed.

Referring to the original BERT: ./modeling.py#L707

attention_mask = tf.expand_dims(attention_mask, axis=[1])
adder = (1.0 - tf.cast(attention_mask, tf.float32)) * -10000.0
attention_scores += adder
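
For context, here is a minimal, self-contained sketch of what this adder does (toy values of my own, not taken from either repo): after the softmax, positions where the mask is 0 receive essentially zero attention probability.

import tensorflow as tf

# Toy example: batch of 1, sequence length 3, last position is padding.
attention_mask = tf.constant([[1, 1, 0]])  # 1 = real token, 0 = padding

# The same transformation as in the BERT snippet above.
adder = (1.0 - tf.cast(attention_mask, tf.float32)) * -10000.0
# adder is now [[0.0, 0.0, -10000.0]]

# Raw attention scores for a single query over the three key positions.
attention_scores = tf.constant([[2.0, 1.0, 3.0]])
attention_scores += adder  # the padded position drops to about -9997

# After softmax, the padded position gets ~0 probability.
print(tf.nn.softmax(attention_scores).numpy())
# -> approximately [[0.73, 0.27, 0.00]]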

But in this repo, take src/transformers/modeling_tf_openai.py#L282 as an example:

attention_mask = attention_mask[:, tf.newaxis, tf.newaxis, :]
attention_mask = tf.cast(attention_mask, tf.float32)
attention_mask = (1.0 - attention_mask) * -10000.0

After these three lines, the tensor still named attention_mask actually holds the additive adder values (0.0 for visible tokens, -10000.0 for masked ones). Then, inside the method TFAttention._attn() at src/transformers/modeling_tf_openai.py#L112:

if attention_mask is not None:
  # Apply the attention mask
  w = w + attention_mask

Your contribution

Maybe changing its name would be better? For example:

attention_mask = attention_mask[:, tf.newaxis, tf.newaxis, :]
attention_mask = tf.cast(attention_mask, tf.float32)
adder = (1.0 - attention_mask) * -10000.0

and then:

if adder is not None:
  # Apply the additive attention mask (the adder)
  w = w + adder
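
Put together, a minimal sketch of how the renamed method could look (simplified by me; the real _attn also handles scaling, head masking, and dropout):

import tensorflow as tf

class TFAttention(tf.keras.layers.Layer):
    # Simplified sketch: only the masking-relevant part of _attn is shown.
    def _attn(self, q, k, v, adder=None):
        # Raw attention scores, one per (query, key) pair.
        w = tf.matmul(q, k, transpose_b=True)
        if adder is not None:
            # Apply the additive attention mask: 0.0 for visible
            # positions, -10000.0 for masked ones.
            w = w + adder
        w = tf.nn.softmax(w, axis=-1)
        return tf.matmul(w, v)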
