Checklist
Motivation
It would be great to support this new model: https://huggingface.co/CohereForAI/c4ai-command-a-03-2025
What's special about this model is its unusual architecture, where some layers require sliding windows and some don't:
The model features three layers with sliding window attention (window size 4096) and RoPE for efficient local context modeling and relative positional encoding. A fourth layer uses global attention without positional embeddings, enabling unrestricted token interactions across the entire sequence.
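The interleaved pattern above can be sketched as follows. This is a hypothetical illustration, not code from any of the libraries mentioned; the constant names mirror the Hugging Face Cohere2 config fields (`sliding_window`, `sliding_window_pattern`), where a pattern of 4 is assumed to mean every fourth layer uses global attention.

```python
# Hypothetical sketch of the interleaved attention pattern: three sliding-window
# layers followed by one global-attention layer, repeating.
SLIDING_WINDOW = 4096        # window size for the local-attention layers
SLIDING_WINDOW_PATTERN = 4   # 3 local layers, then 1 global layer (assumption)

def layer_uses_sliding_window(layer_idx: int) -> bool:
    # Layers 0, 1, 2 are local (sliding window + RoPE); layer 3 is global
    # (no positional embeddings); then the pattern repeats.
    return (layer_idx + 1) % SLIDING_WINDOW_PATTERN != 0

pattern = ["local" if layer_uses_sliding_window(i) else "global" for i in range(8)]
print(pattern)
# ['local', 'local', 'local', 'global', 'local', 'local', 'local', 'global']
```

Any KV-cache design for this model therefore has to handle two kinds of layers with different memory behavior in the same forward pass.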
I've found a Cohere2ForCausalLM in this project already (sglang/python/sglang/srt/models/commandr.py, line 413 at commit 90532b7), but it appears to be a stub that is not implemented yet:

class Cohere2ForCausalLM(CohereForCausalLM):
I previously attempted to implement this model in TensorRT-LLM (NVIDIA/TensorRT-LLM#2912) but ultimately failed: they do not support sliding-window layers without forcing a cyclic KV cache, which breaks prefix caching, and the code that would need changing to fix it is missing. Extremely frustrating. Will there be better luck in this library?
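To illustrate why a cyclic KV cache conflicts with prefix caching, here is a toy sketch (not TensorRT-LLM code, purely an illustration with a made-up tiny window): once the ring buffer wraps, the earliest prefix entries are overwritten in place, so a previously cached prefix can no longer be found and reused byte-for-byte.

```python
# Toy ring-buffer KV cache: each position's KV entry is stored at
# position % WINDOW, overwriting older entries once the buffer wraps.
WINDOW = 4  # tiny sliding window purely for demonstration

def fill_cyclic_cache(tokens):
    cache = [None] * WINDOW
    for pos, tok in enumerate(tokens):
        cache[pos % WINDOW] = tok  # wrap-around overwrite
    return cache

prefix = [1, 2, 3]          # a prefix we would like to cache and reuse
full = prefix + [4, 5, 6]   # the same prefix followed by more tokens

print(fill_cyclic_cache(prefix))  # [1, 2, 3, None]  -- prefix intact
print(fill_cyclic_cache(full))    # [5, 6, 3, 4]     -- tokens 1 and 2 overwritten
```

After wrapping, the cache for the longer request no longer contains the prefix's entries in recoverable form, so a prefix-cache lookup against it cannot succeed; that is the structural conflict described above.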
Related resources
Transformers impl here: https://github.com/huggingface/transformers/blob/main/src/transformers/models/cohere2/modular_cohere2.py
vLLM impl here (note for some reason they merged the models and added the sliding window support for CohereForCausalLM): https://github.com/vllm-project/vllm/blob/61f412187d972a006aef1653bfe348aeaefb6a0b/vllm/model_executor/models/commandr.py#L336