Feature Request: DSA lightning indexer support #20363

@DocShotgun

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Add support for DeepSeek Sparse Attention (DSA) and its lightning indexer, used by models such as DeepSeek-V3.2 and GLM-5.

Creating an issue here for tracking since #19460 was merged and #16331 became stale.

I understand that this architecture is rather unwieldy to develop on, as the smallest model that currently uses this architecture is a hefty 671B parameters 😅.

Motivation

This is a new architecture that has now been used in two major open-weights releases, so it would be cool to have support in llama.cpp.

  • deepseek-ai/DeepSeek-V3.2 and its variants
  • zai-org/GLM-5

Possible Implementation

vLLM implementation:

https://github.com/vllm-project/vllm/blob/88ef733ff5125b1b29e7aaae13063b2a109d5e8f/vllm/model_executor/models/deepseek_v2.py#L768
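
For orientation, here is a rough sketch of what the lightning indexer computes for a single query token, assuming the formulation in the DeepSeek-V3.2 technical report: each cached token gets a score equal to the sum over indexer heads of a per-head weight times ReLU(q · k), and only the top-k scoring tokens are passed on to the main attention. The function and argument names are hypothetical and are not taken from vLLM or llama.cpp; a real implementation would also need batching, causal masking, and the low-precision kernels that the linked code uses.

```python
def lightning_indexer_topk(q_idx, w_idx, k_idx, top_k):
    """Select cached token positions for one query token (illustrative only).

    q_idx: H indexer query vectors for the current token, each of length d
    w_idx: H per-head scalar weights for the current token
    k_idx: S indexer key vectors, one per cached token, each of length d
    Returns (selected positions in ascending order, all S scores).
    """
    scores = []
    for k in k_idx:
        score = 0.0
        for q, w in zip(q_idx, w_idx):
            dot = sum(a * b for a, b in zip(q, k))
            score += w * max(dot, 0.0)  # ReLU on the per-head dot product
        scores.append(score)

    # Keep at most top_k positions, highest score first, then restore order.
    n = min(top_k, len(scores))
    best = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:n]
    return sorted(best), scores


# Toy example: 2 indexer heads, 4 cached tokens, keep the best 2.
q = [[1.0, 0.0], [0.0, 1.0]]
w = [1.0, 0.5]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0], [2.0, 2.0]]
selected, scores = lightning_indexer_topk(q, w, keys, top_k=2)
print(selected, scores)
```

The main attention would then run only over the `selected` positions, which is where the savings at long context come from.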
