Currently, the stateless function torch.multinomial is not exposed for CUDA tensors. See: https://github.com/pytorch/examples/blob/master/word_language_model/generate.py#L60
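
One possible workaround, sketched here under assumptions, is to copy the weights to the CPU and sample there. The helper name sample_index and its temperature parameter are illustrative, not part of PyTorch, and the snippet falls back to the CPU when no GPU is present.

```python
import torch


def sample_index(logits, temperature=1.0):
    """Sample one index from unnormalized logits, sampling on the CPU.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    # Scale by temperature and exponentiate to get non-negative weights.
    weights = logits.squeeze().div(temperature).exp()
    # Copy to the CPU first, so torch.multinomial never sees a CUDA tensor.
    weights = weights.cpu()
    # torch.multinomial accepts unnormalized weights; draw a single sample.
    return torch.multinomial(weights, 1)[0].item()


# Use the GPU if one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
logits = torch.randn(1, 10, device=device)
idx = sample_index(logits)
print(idx)
```

The copy costs a device-to-host transfer per sample, which is usually acceptable for step-by-step text generation but may matter in a tight sampling loop.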