System Info
Is there a reason RwkvForCausalLM does not support gradient checkpointing, given that RWKV-LM supports it?
@ArthurZucker and @younesbelkada
Who can help?
No response
Information
Tasks
Reproduction
model.gradient_checkpointing_enable()
ValueError(f"{self.__class__.__name__} does not support gradient checkpointing.")
ValueError: RwkvForCausalLM does not support gradient checkpointing.
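As far as I can tell, the message comes from a guard in `gradient_checkpointing_enable()` that checks a class-level `supports_gradient_checkpointing` flag, which the RWKV model class apparently leaves unset. Below is a minimal stand-alone sketch of that behavior, assuming the guard works this way. The classes `PreTrainedModelSketch` and `RwkvForCausalLMWithCheckpointing` are hypothetical stand-ins, and the local `RwkvForCausalLM` only borrows the real name so the message matches; none of this is the transformers source.

```python
# Stand-in classes that mimic the guard; these are NOT the transformers classes.
class PreTrainedModelSketch:
    # Assumed class-level flag; a model class opts in by overriding it.
    supports_gradient_checkpointing = False

    def __init__(self):
        self.gradient_checkpointing = False

    def gradient_checkpointing_enable(self):
        # Refuse to enable checkpointing unless the class has opted in.
        if not self.supports_gradient_checkpointing:
            raise ValueError(
                f"{self.__class__.__name__} does not support gradient checkpointing."
            )
        self.gradient_checkpointing = True


class RwkvForCausalLM(PreTrainedModelSketch):
    """Stand-in that leaves the flag at False, reproducing the error."""


class RwkvForCausalLMWithCheckpointing(PreTrainedModelSketch):
    """Hypothetical variant that opts in by setting the flag."""
    supports_gradient_checkpointing = True


def try_enable(model):
    """Return "ok" if checkpointing was enabled, else the error message."""
    try:
        model.gradient_checkpointing_enable()
        return "ok"
    except ValueError as err:
        return str(err)


print(try_enable(RwkvForCausalLM()))
print(try_enable(RwkvForCausalLMWithCheckpointing()))
```

If this reading is right, setting the flag alone would only silence the error. The RWKV blocks would presumably also need to be wrapped in a checkpointing call in the forward pass for memory to actually be saved.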
Expected behavior
No error, since RWKV-LM supports gradient checkpointing.