[Feature] Support Cohere Command-A (Cohere2ForCausalLM arch) #4570

@aikitoria

Description

Motivation

It would be great to support this new model: https://huggingface.co/CohereForAI/c4ai-command-a-03-2025

What's special about this model is its unusual architecture, in which some layers use sliding-window attention and some don't. From the model card:

The model features three layers with sliding window attention (window size 4096) and RoPE for efficient local context modeling and relative positional encoding. A fourth layer uses global attention without positional embeddings, enabling unrestricted token interactions across the entire sequence.
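Reading that description, the pattern appears to be a repeating group of four layers: three sliding-window layers followed by one global-attention layer. A minimal sketch of that interleaving, assuming a group size of 4 and window size 4096 (the function name and structure here are hypothetical, not the actual SGLang or transformers code):

```python
def attention_window(layer_idx: int, query_pos: int,
                     sliding_window: int = 4096,
                     pattern: int = 4) -> range:
    """Return the key positions a query at `query_pos` may attend to
    in layer `layer_idx`, per the model-card description: within each
    group of `pattern` layers, the first `pattern - 1` layers use
    sliding-window attention and the last uses global attention.
    Hypothetical illustration only."""
    if (layer_idx + 1) % pattern == 0:
        # every fourth layer: global causal attention, no window
        return range(0, query_pos + 1)
    # sliding-window layer: only the most recent `sliding_window` keys
    start = max(0, query_pos - sliding_window + 1)
    return range(start, query_pos + 1)
```

The practical consequence is that the KV cache for three out of every four layers can be bounded at 4096 entries, while the fourth layer's cache must retain the full sequence, which is exactly what makes a one-size-fits-all cyclic KV cache a poor fit.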

I've found a Cohere2ForCausalLM in this project already but it appears to be a stub that is not implemented yet:

class Cohere2ForCausalLM(CohereForCausalLM):

I previously attempted to implement this model in TensorRT-LLM ( NVIDIA/TensorRT-LLM#2912 ) but ultimately failed: it does not support sliding-window layers without forcing a cyclic KV cache, which breaks prefix caching, and the code that would need changing to fix this is not available. Extremely frustrating. Will this library have better luck?

Related resources

Transformers impl here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/cohere2/modular_cohere2.py

vLLM impl here (note for some reason they merged the models and added the sliding window support for CohereForCausalLM): https://github.com/vllm-project/vllm/blob/61f412187d972a006aef1653bfe348aeaefb6a0b/vllm/model_executor/models/commandr.py#L336
