sampling: Integrate Top-nσ into main sampling chain (and add it to the server) by oobabooga · Pull Request #13264 · ggml-org/llama.cpp

oobabooga · 2025-05-02T14:31:48Z

Top-nσ support was added in #11223, where it was implemented as a special case that ignored samplers other than top_k and temperature when top_n_sigma was present.

Following #11896 (comment), this PR integrates this sampler into the main sampling chain. This removes the special case handling and makes it possible to combine top_n_sigma with other sampling methods like min_p.
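To illustrate why chaining matters, here is a minimal sketch in plain Python, not the llama.cpp implementation. It assumes the commonly described Top-nσ rule (keep tokens whose logit lies within n standard deviations of the maximum logit) and a standard min_p rule. The function names, the toy logits, and the choice to treat a negative n as "disabled" are all assumptions made for illustration.

```python
import math

def top_n_sigma(logits, n):
    """Keep tokens whose logit is >= max_logit - n * std(logits).

    Assumption: a negative n disables the filter (loosely mirroring the
    "Exit early in the sampler if nsigma < 0" commit in this PR).
    """
    if n < 0 or len(logits) <= 1:
        return dict(logits)
    values = list(logits.values())
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    threshold = max(values) - n * std
    return {tok: v for tok, v in logits.items() if v >= threshold}

def softmax(logits):
    """Numerically stable softmax over a {token: logit} mapping."""
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def min_p(logits, p):
    """Keep tokens whose probability is >= p * (highest probability)."""
    probs = softmax(logits)
    cutoff = p * max(probs.values())
    return {tok: logits[tok] for tok, pr in probs.items() if pr >= cutoff}

# Toy logits (hypothetical values, not taken from a real model).
logits = {"John": 5.0, "Sarah": 4.5, "the": 2.0, "xyz": -1.0, "qq": -3.0}

after_sigma = top_n_sigma(logits, 1.0)   # first stage of the chain
after_chain = min_p(after_sigma, 0.1)    # second stage, applied to the survivors

print(sorted(after_sigma))
print(sorted(after_chain))
```

In this toy example Top-nσ alone keeps three candidates, and min_p then removes one more. That kind of composition was not possible while top_n_sigma was handled as a special case.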

I have used #11896 as a starting point, so this PR also makes top_n_sigma available in llama-server.

Verification

I have tested it with llama-server and it seems to work. Below are the top probabilities after the prompt "My name is" with top_n_sigma=1 (left) and top_n_sigma=5 (right).

[Images: top token probabilities for top_n_sigma=1 (left) and top_n_sigma=5 (right)]
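A check of this kind could be scripted roughly as follows. This is a sketch under assumptions: a llama-server instance listening on `http://127.0.0.1:8080`, the `/completion` endpoint, and the request fields `prompt`, `n_predict`, `n_probs`, and `top_n_sigma`. The helper names `build_request` and `send` are hypothetical.

```python
import json
import urllib.request

# Assumed address of a locally running llama-server; adjust as needed.
SERVER_URL = "http://127.0.0.1:8080"

def build_request(prompt, top_n_sigma, n_probs=10):
    """Build a POST request asking for one token plus its top candidates."""
    payload = {
        "prompt": prompt,
        "n_predict": 1,              # a single token is enough to compare candidates
        "n_probs": n_probs,          # ask the server to report top token probabilities
        "top_n_sigma": top_n_sigma,  # the parameter this PR exposes in llama-server
    }
    return urllib.request.Request(
        SERVER_URL + "/completion",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send(request):
    """Send the request and return the decoded JSON response (needs a live server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))

# Two requests that differ only in top_n_sigma, as in the comparison above.
narrow = build_request("My name is", 1.0)
wide = build_request("My name is", 5.0)
```

With a server running, `send(narrow)` and `send(wide)` would return the responses to compare. The exact layout of the probability data in the response is not described in this thread, so it is not parsed here.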

* origin/master: (27 commits)
  llama : fix build_ffn without gate (ggml-org#13336)
  CUDA: fix bad asserts for partial offload (ggml-org#13337)
  convert : qwen2/3moe : set yarn metadata if present (ggml-org#13331)
  CUDA: fix --split-mode row for MMQ (ggml-org#13323)
  gguf-py : avoid requiring pyside6 for other scripts (ggml-org#13036)
  CUDA: fix logic for clearing padding with -ngl 0 (ggml-org#13320)
  sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (ggml-org#13264)
  server : Webui - change setText command from parent window to also send the message. (ggml-org#13309)
  mtmd : rename llava directory to mtmd (ggml-org#13311)
  clip : fix confused naming ffn_up and ffn_down (ggml-org#13290)
  convert : bailingmoe : set yarn metadata if present (ggml-org#13312)
  SYCL: Disable mul_mat kernels for noncontiguous tensor b (ggml-org#13308)
  mtmd : add C public API (ggml-org#13184)
  rpc : use backend registry, support dl backends (ggml-org#13304)
  ggml : activate s390x simd for Q3_K (ggml-org#13301)
  llava/mtmd : fixes to fully support dl backends (ggml-org#13303)
  llama : build windows releases with dl backends (ggml-org#13220)
  CUDA: fix race condition in MMQ stream-k fixup (ggml-org#13299)
  CUDA: fix race condition in MMQ ids_dst (ggml-org#13294)
  vulkan: Additional type support for unary, binary, and copy (ggml-org#13266)
  ...

betweenus · 2025-05-07T10:24:53Z

Please update documentation (server README).

…he server) (ggml-org#13264)

* sampling: add Top-nσ sampler to `llama-server` and sampler ordering
* revert: sampler ordering
* revert: VS' crappy auto-formatting
* revert: VS' crappy auto-formatting pt.2
* revert: my crappy eye sight...
* sampling: add XTC to Top-nσ sampler chain
* sampling: add Dyna. Temp. to Top-nσ sampler chain
* sampling: actually remove Top-nσ from sampler(oops)
* Integrate top_n_sigma into main sampler chain
* Define COMMON_SAMPLER_TYPE_TOP_N_SIGMA
* Formatting
* Lint
* Exit early in the sampler if nsigma < 0

---------

Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com>

CasualAutopsy and others added 11 commits February 15, 2025 16:52

sampling: add Top-nσ sampler to llama-server and sampler ordering

1bdb603

revert: sampler ordering

ff8b612

revert: VS' crappy auto-formatting

1dc5d84

revert: VS' crappy auto-formatting pt.2

9068068

revert: my crappy eye sight...

c05e9e0

sampling: add XTC to Top-nσ sampler chain

a9e7af0

sampling: add Dyna. Temp. to Top-nσ sampler chain

cc1a170

sampling: actually remove Top-nσ from sampler(oops)

a558d3a

Integrate top_n_sigma into main sampler chain

ca992ad

Define COMMON_SAMPLER_TYPE_TOP_N_SIGMA

5ccb05b

Formatting

fa11bd8

oobabooga requested a review from ngxson as a code owner May 2, 2025 14:31

github-actions bot added examples server labels May 2, 2025

Lint

4c4cea2

oobabooga changed the title ~~sampling: Integrate Top-nσ into main sampling chain~~ sampling: Integrate Top-nσ into main sampling chain (and add it to the server) May 2, 2025

Merge remote-tracking branch 'ggerganov/master' into nsigma

456ddb3

CISC reviewed May 5, 2025

tools/server/server.cpp

oobabooga added 2 commits May 5, 2025 12:20

Exit early in the sampler if nsigma < 0

9152659

Merge remote-tracking branch 'ggerganov/master' into nsigma

b7420c0

CISC approved these changes May 5, 2025

CISC mentioned this pull request May 5, 2025

sampling: add Top-nσ sampler to llama-server #11896

Closed

CISC merged commit 233461f into ggml-org:master May 5, 2025
46 checks passed

Beinsezii mentioned this pull request May 6, 2025

[FEATURE_REQUEST] llama.cpp Text Completion top_n_sigma SillyTavern/SillyTavern#3960

Closed

DocShotgun mentioned this pull request May 6, 2025

Updates/fixes for llama.cpp textgen settings SillyTavern/SillyTavern#3961

Merged

1 task

Ph0rk0z mentioned this pull request May 20, 2025

Feature Request: Top n-sigma sampler ikawrakow/ik_llama.cpp#440

Closed

4 tasks

CISC merged 15 commits into ggml-org:master from oobabooga:nsigma


4 participants
