Prerequisites
Feature Description
Add support for DeepSeek Sparse Attention, used by models such as DeepSeek-V3.2 and GLM-5.
Creating an issue here for tracking since #19460 was merged and #16331 became stale.
I understand that this architecture is rather unwieldy to develop against, since the smallest model that currently uses it has a hefty 671B parameters 😅.
Motivation
This is a new architecture that has now been used in two major open-weight releases, so it would be cool to have support in llama.cpp:
deepseek-ai/DeepSeek-V3.2 and its variants
zai-org/GLM-5
Possible Implementation
vLLM implementation:
https://github.com/vllm-project/vllm/blob/88ef733ff5125b1b29e7aaae13063b2a109d5e8f/vllm/model_executor/models/deepseek_v2.py#L768
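For anyone picking this up, here is a rough sketch of the idea as I understand it from DeepSeek's public description: a small "indexer" scores every cached token for the current query, only the top-k tokens are kept, and regular softmax attention runs over that subset. This is a toy single-token, pure-Python illustration, not the vLLM code linked above and not llama.cpp API. All function names are hypothetical, and it assumes plain dot-product attention (the real models use MLA, RoPE, and low-precision indexer keys, all omitted here).

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def index_scores(q_heads, head_weights, idx_keys):
    """Indexer score of each cached token for one query token.

    Assumed form: score[s] = sum_j w_j * ReLU(q_j . k_s), where j runs over
    the indexer heads and s over the cached tokens (hypothetical naming).
    """
    return [
        sum(w * max(dot(q, k), 0.0) for q, w in zip(q_heads, head_weights))
        for k in idx_keys
    ]


def select_top_k(scores, k):
    """Positions of the k highest-scoring cached tokens, in ascending order."""
    k = min(k, len(scores))
    top = sorted(range(len(scores)), key=lambda s: scores[s], reverse=True)[:k]
    return sorted(top)


def sparse_attention(query, keys, values, selected):
    """Softmax attention restricted to the selected token positions."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [dot(query, keys[s]) * scale for s in selected]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    dim = len(values[0])
    return [
        sum(e / total * values[s][d] for e, s in zip(exps, selected))
        for d in range(dim)
    ]
```

Causality is implicit in this sketch: the caller passes only the tokens already in the cache. When k is at least the cache length, every token is selected and the result matches dense attention, which might make a handy correctness check for a first implementation.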