Add RWKV2 (fast)

### Model description

I would like to implement a new model architecture.

## Short description

RWKV v2 is an "RNN with transformer-level performance, without using attention. Similar to Apple's Attention Free Transformer. All trained models open-source. Inference is very fast (even on CPUs) and might work on cell phones. There's also a GPT-type implementation." -- ([Hochreiter's description](https://twitter.com/HochreiterSepp/status/1524270961314484227))  

RWKV v2 is parallelizable because the time-decay of each channel is data-independent (and trainable). For example, in a usual RNN you can adjust the time-decay of a channel from, say, 0.8 to 0.5 (these are called "gates"), while in RWKV v2 you simply move the information from a W-0.8-channel to a W-0.5-channel to achieve the same effect. RWKV can leverage GPUs, but doesn't need to.
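To illustrate why a data-independent decay makes the recurrence parallelizable, here is a minimal sketch in plain Python. It is a deliberately simplified single-channel decayed sum, not the actual RWKV v2 formula, and the function names `recurrent_decay` and `parallel_decay` are made up for this example. With a fixed decay `w`, the step-by-step (RNN-style) state can also be written in closed form, so every time step can be computed independently.

```python
# Simplified illustration only: one channel with a fixed, data-independent
# decay w. This is NOT the full RWKV v2 computation.

def recurrent_decay(xs, w):
    """RNN-style evaluation: s_t = w * s_{t-1} + x_t, one step at a time."""
    state = 0.0
    out = []
    for x in xs:
        state = w * state + x
        out.append(state)
    return out


def parallel_decay(xs, w):
    """Closed form: s_t = sum over i <= t of w**(t - i) * x_i.

    Each s_t depends only on the inputs and the constant w, so all time
    steps could be computed at once (e.g. on a GPU).
    """
    return [
        sum(w ** (t - i) * xs[i] for i in range(t + 1))
        for t in range(len(xs))
    ]


xs = [1.0, 2.0, 3.0, 4.0]
for w in (0.8, 0.5):  # a "W-0.8-channel" and a "W-0.5-channel"
    rec = recurrent_decay(xs, w)
    par = parallel_decay(xs, w)
    # Both evaluation orders agree up to floating-point error.
    assert all(abs(a - b) < 1e-9 for a, b in zip(rec, par))
```

If the decay depended on the input at each step (as with gates in a usual RNN), the closed form above would no longer have constant powers of `w`, which is what makes the sequential loop hard to avoid there.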

### Open source status

- [X] The model implementation is available
- [X] The model weights are available

### Provide useful links for the implementation

## Implementation and weights

There's an implementation at [BlinkDL/RWKV-LM](https://github.com/BlinkDL/RWKV-LM), which also gives a detailed description of the model internals and some performance benchmarks. Model weights are currently being trained on a few datasets, including the Pile (see e.g. [BlinkDL/RWKV-v2-RNN-Pile](https://github.com/BlinkDL/RWKV-v2-RNN-Pile/)) and, by me, [Danish Gigaword](https://gigaword.dk). Both will be openly available; some checkpoints for the Pile already are, although training is still ongoing.

## Status

The model seems quite exciting and I'm able to replicate preliminary results. I'm already talking with @BlinkDL about the implementation. I'm happy to implement/port the model architecture (for both RNN and GPT variants), tokenizer, and tests myself (and have already started) and would appreciate help and advice. 

Add RWKV2 (fast) #17230
