RWKV4neo  #20737

@ArEnSc

Model description

RWKV - Receptance Weighted Key Value

RWKV is a sequence-to-sequence model that combines the best features of Generative Pre-Training (GPT) and Recurrent Neural Networks (RNN) to perform language modelling (LM). It generates text in an autoregressive (AR) manner.

This is a hybrid model.

It reaches Transformer-level performance without the quadratic attention mechanism. It borrows ideas from the Attention Free Transformer, so its attention is linear in sequence length. This allows for infinite context, in principle, carried through the hidden state in RWKV_RNN.
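
As a rough illustration of why such an attention can run in linear time with a fixed-size hidden state, consider a simplified, single-channel decayed weighted average. The symbols here are assumptions of this sketch (keys k_i, values v_i, a positive decay w), and it leaves out details of the actual RWKV formulation and any numerical stabilization; see the linked repository for the real definition.

```latex
% Simplified sketch: output at step t as a decayed, key-weighted average of past values
o_t = \frac{\sum_{i=1}^{t} e^{-(t-i)w + k_i}\, v_i}{\sum_{i=1}^{t} e^{-(t-i)w + k_i}}

% The same quantity as a recurrence over a two-number state (a_t, b_t), with a_0 = b_0 = 0
a_t = e^{-w} a_{t-1} + e^{k_t} v_t, \qquad
b_t = e^{-w} b_{t-1} + e^{k_t}, \qquad
o_t = \frac{a_t}{b_t}
```

Each step only needs the previous state and the current token, so the cost per token is constant and the context is summarized in (a_t, b_t) rather than in a growing attention matrix.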

RWKV has two models, referred to as modes:

RWKV_RNN: This mode is designed for running inference quickly.
RWKV_GPT: This mode is for training or fine-tuning your model quickly.

In the first pass we will implement RWKV_RNN, although we can share weights so that RWKV_GPT generates the initial context for RWKV_RNN.
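
The handoff between the two modes could look roughly like the sketch below. It is a toy under stated assumptions, not the RWKV implementation: `ToyRWKV`, `gpt_mode`, `rnn_mode`, and the single scalar `decay` weight are hypothetical names, and the mixing rule is a simplified decayed weighted average over one channel. The point it shows is that a whole-context pass and a token-by-token pass using the same weight reach the same state, so one can hand over to the other.

```python
import math


class ToyRWKV:
    """Hypothetical toy model: one scalar channel and one shared weight."""

    def __init__(self, decay):
        self.decay = decay  # the single "weight" used by both modes

    def gpt_mode(self, keys, values):
        """Process a whole context at once; return outputs and the final state."""
        outputs = []
        num, den = 0.0, 0.0
        for t in range(len(keys)):
            # Weight of token i as seen from position t: older tokens decay more.
            weights = [math.exp(-(t - i) * self.decay + keys[i]) for i in range(t + 1)]
            num = sum(w * v for w, v in zip(weights, values))
            den = sum(weights)
            outputs.append(num / den)
        # The numerator/denominator at the last position is the recurrent state.
        return outputs, (num, den)

    def rnn_mode(self, state, key, value):
        """Process one token from a fixed-size state; return output and new state."""
        num, den = state
        num = math.exp(-self.decay) * num + math.exp(key) * value
        den = math.exp(-self.decay) * den + math.exp(key)
        return num / den, (num, den)


model = ToyRWKV(decay=0.5)
keys = [0.1, -0.3, 0.7, 0.2]
values = [1.0, 2.0, -1.0, 0.5]

# Reference: run every token through the recurrent mode.
state = (0.0, 0.0)
rnn_outputs = []
for k, v in zip(keys, values):
    out, state = model.rnn_mode(state, k, v)
    rnn_outputs.append(out)

# Handoff: GPT mode consumes the first three tokens, RNN mode continues from its state.
prefix_outputs, handoff_state = model.gpt_mode(keys[:3], values[:3])
next_output, _ = model.rnn_mode(handoff_state, keys[3], values[3])

assert all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(prefix_outputs, rnn_outputs[:3]))
assert math.isclose(next_output, rnn_outputs[3], abs_tol=1e-12)
```

In this toy the whole-context pass is written as a naive loop for readability; it says nothing about how RWKV_GPT is actually parallelized.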

Open source status

  • The model implementation is available
  • The model weights are available
  • Scaffolding
  • API Discussion

Provide useful links for the implementation

More from the Research and Development Repository: https://github.com/BlinkDL/RWKV-LM
