The k bias is always zero in the code. Is there a reason for this? It differs from the usual implementation.
```python
qkv_bias = torch.cat((self.q_bias, torch.zeros_like(self.v_bias, requires_grad=False), self.v_bias))
```
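One likely reason the key bias can be dropped: in softmax attention, a bias added to every key contributes the term q_i · b_k to each logit, which is constant across keys for a fixed query and therefore cancels in the softmax. A minimal sketch of this invariance (the tensor shapes and the name `b_k` are illustrative, not from the repo):

```python
import torch

torch.manual_seed(0)
q = torch.randn(4, 8)   # 4 queries, head dim 8
k = torch.randn(6, 8)   # 6 keys
b_k = torch.randn(8)    # hypothetical key bias

# Attention logits with and without the key bias.
logits_plain = q @ k.t()
logits_bias = q @ (k + b_k).t()

# The bias adds q_i . b_k to row i, a per-query constant,
# so the softmax over keys is unchanged.
attn_plain = logits_plain.softmax(dim=-1)
attn_bias = logits_bias.softmax(dim=-1)

print(torch.allclose(attn_plain, attn_bias, atol=1e-6))  # True
```

The same argument holds with the usual 1/sqrt(d) scaling, since scaling all logits in a row by the same factor preserves the per-row constant shift.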
unilm/beit/modeling_finetune.py, line 124 at commit 421cffe

In my tests, the k bias has little effect on fine-tuning performance, but I have not tested pretraining.