
docs(llama.cpp): note tensor split now works with quantized KV cache #10135

Merged
mudler merged 1 commit into master from worktree-docs-tensor-split-quant-kv
Jun 2, 2026

Conversation

mudler commented Jun 2, 2026

Owner

The split_mode: tensor description claimed tensor parallelism requires KV-cache quantization to be disabled. ggml-org/llama.cpp#23792 lifts that restriction by extending the meta backend to preserve shape information through KV-cache flatten/reshape, so cache_type_k/cache_type_v quantization can be combined with -sm tensor on builds that include it.

Documentation only: no backend code, grpc-server.cpp comment, or llama.cpp pin changes.
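
As a rough illustration of the combination described above, the following Python sketch assembles a llama.cpp command line that pairs `-sm tensor` with a quantized KV cache. It is not taken from this PR: the `llama-server` binary name, the `model.gguf` path, the `q8_0` cache types, and the helper name `tensor_split_args` are assumptions chosen for the example. The combination is only expected to work on builds that include ggml-org/llama.cpp#23792.

```python
import shlex


def tensor_split_args(model_path, cache_type_k="q8_0", cache_type_v="q8_0"):
    """Build an argument list that combines tensor split with a quantized KV cache.

    Hypothetical helper for illustration only; the binary name and the
    default cache types are assumptions, not values taken from the PR.
    """
    return [
        "llama-server",                   # assumed binary name
        "-m", model_path,                 # hypothetical model path
        "-sm", "tensor",                  # split mode named in the PR description
        "--cache-type-k", cache_type_k,   # K-cache quantization type
        "--cache-type-v", cache_type_v,   # V-cache quantization type
    ]


args = tensor_split_args("model.gguf")
print(shlex.join(args))
```

The helper only builds the argument list; it does not launch anything, so it can be inspected or logged before being handed to a process runner.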

Assisted-by: Claude Code:claude-opus-4-8

Description

This PR fixes #

Notes for Reviewers

Signed commits

  • Yes, I signed my commits.

The split_mode: tensor description claimed tensor parallelism requires
KV-cache quantization to be disabled. ggml-org/llama.cpp#23792 lifts that
restriction by extending the meta backend to preserve shape information
through KV-cache flatten/reshape, so cache_type_k/cache_type_v
quantization can be combined with -sm tensor on builds that include it.

Documentation only: no backend code, grpc-server.cpp comment, or
llama.cpp pin changes.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: Claude Code:claude-opus-4-8
mudler merged commit 595e448 into master Jun 2, 2026
52 checks passed
mudler deleted the worktree-docs-tensor-split-quant-kv branch June 2, 2026 13:52
localai-bot added the kind/documentation (Improvements or additions to documentation) label Jun 10, 2026

Labels

kind/documentation Improvements or additions to documentation


2 participants