## 🚀 Feature

Request to lower `count_nonzero`, https://pytorch.org/docs/stable/generated/torch.count_nonzero.html

## Motivation

This op can be used to fix autoregressive decoding on TPUs for models like GPT-2, which use positional embeddings.

E.g.:
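The original example appears to have been truncated; a minimal sketch of the intended use, assuming HuggingFace-style inputs where the next position id is derived from the attention mask during decoding:

```python
import torch

# Batch of one sequence with 3 real tokens and 2 padding tokens
# (hypothetical input for illustration).
attention_mask = torch.tensor([[1, 1, 1, 0, 0]])

# The count of non-padding tokens per row gives the position id
# for the next generated token in autoregressive decoding.
next_position = torch.count_nonzero(attention_mask, dim=1)
print(next_position)  # tensor([3])
```

On XLA devices, `torch.count_nonzero` would need a lowering for this pattern to stay on the TPU instead of falling back to the CPU.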