sampling: Integrate Top-nσ into main sampling chain (and add it to the server) #13264

Merged
CISC merged 15 commits into ggml-org:master from oobabooga:nsigma
May 5, 2025

Contributor

@oobabooga oobabooga commented May 2, 2025

Top-nσ support was added in #11223, where it was implemented as a special case that ignored samplers other than top_k and temperature when top_n_sigma was present.

Following #11896 (comment), this PR integrates this sampler into the main sampling chain. This removes the special-case handling and makes it possible to combine top_n_sigma with other samplers such as min_p.
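As a rough illustration of what "part of the main chain" means, here is a minimal, self-contained sketch in which Top-nσ and min_p are both ordinary stages applied one after another to the same candidate list. It is not the llama.cpp code: `Logits`, `Sampler`, `make_top_n_sigma`, `make_min_p`, and `run_chain` are hypothetical names, and it assumes that Top-nσ keeps candidates whose logit is within n standard deviations of the maximum logit and that a non-positive parameter disables a stage.

```cpp
#include <algorithm>
#include <cmath>
#include <functional>
#include <vector>

using Logits  = std::vector<float>;
using Sampler = std::function<void(Logits &)>;  // each stage prunes candidates in place

// Hypothetical Top-nσ stage: drop candidates below max_logit - n * stddev.
Sampler make_top_n_sigma(float n) {
    return [n](Logits & c) {
        if (n <= 0.0f || c.size() <= 1) return;  // assumption: n <= 0 disables the stage
        const float max_logit = *std::max_element(c.begin(), c.end());
        double mean = 0.0;
        for (float l : c) mean += l;
        mean /= c.size();
        double var = 0.0;
        for (float l : c) var += (l - mean) * (l - mean);
        const float thr = max_logit - n * static_cast<float>(std::sqrt(var / c.size()));
        c.erase(std::remove_if(c.begin(), c.end(),
                               [thr](float l) { return l < thr; }),
                c.end());
    };
}

// Hypothetical min_p stage: drop candidates whose probability is below
// p * (probability of the top candidate). In logit space this is
// logit < max_logit + log(p).
Sampler make_min_p(float p) {
    return [p](Logits & c) {
        if (p <= 0.0f || c.empty()) return;
        const float thr = *std::max_element(c.begin(), c.end()) + std::log(p);
        c.erase(std::remove_if(c.begin(), c.end(),
                               [thr](float l) { return l < thr; }),
                c.end());
    };
}

// Apply every stage in order; no stage is special-cased.
Logits run_chain(Logits candidates, const std::vector<Sampler> & chain) {
    for (const auto & stage : chain) stage(candidates);
    return candidates;
}
```

For example, `run_chain(logits, {make_top_n_sigma(1.0f), make_min_p(0.5f)})` applies both filters in sequence, which the earlier special case could not do.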

I have used #11896 as a starting point, so this PR also makes top_n_sigma available in llama-server.

Verification

I have tested it with llama-server and it seems to work. Below are the top probabilities after the prompt "My name is" with top_n_sigma=1 (left) and top_n_sigma=5 (right).

[Screenshot: top token probabilities for top_n_sigma=1 (left) and top_n_sigma=5 (right)]
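To see why a larger top_n_sigma leaves more candidates, the sketch below masks logits that fall more than n standard deviations below the maximum. It is an illustration under assumptions, not the sampler from this PR: `top_n_sigma_filter` is a hypothetical helper, the logits are made up, and treating n <= 0 as "disabled" is an assumption.

```cpp
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

// Illustrative only: set every logit below (max - n * stddev) to -infinity,
// so that a later softmax assigns it zero probability.
std::vector<float> top_n_sigma_filter(std::vector<float> logits, float n) {
    // Assumption: a non-positive n disables the filter.
    if (n <= 0.0f || logits.size() <= 1) {
        return logits;
    }

    const float max_logit = *std::max_element(logits.begin(), logits.end());

    // Population mean and standard deviation of the logits.
    double mean = 0.0;
    for (float l : logits) mean += l;
    mean /= logits.size();

    double var = 0.0;
    for (float l : logits) var += (l - mean) * (l - mean);
    const double sigma = std::sqrt(var / logits.size());

    const float threshold = max_logit - n * static_cast<float>(sigma);
    for (float & l : logits) {
        if (l < threshold) l = -std::numeric_limits<float>::infinity();
    }
    return logits;
}
```

With the made-up logits {10, 9, 2, 1, 0}, the standard deviation is about 4.22, so n = 1 puts the threshold near 5.78 and only the first two candidates survive, while n = 5 puts it near -11.1 and all five survive.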

@oobabooga oobabooga changed the title from "sampling: Integrate Top-nσ into main sampling chain" to "sampling: Integrate Top-nσ into main sampling chain (and add it to the server)" May 2, 2025
@CISC CISC merged commit 233461f into ggml-org:master May 5, 2025
46 checks passed
gabe-l-hart added a commit to gabe-l-hart/llama.cpp that referenced this pull request May 6, 2025
* origin/master: (27 commits)
llama : fix build_ffn without gate (ggml-org#13336)
CUDA: fix bad asserts for partial offload (ggml-org#13337)
convert : qwen2/3moe : set yarn metadata if present (ggml-org#13331)
CUDA: fix --split-mode row for MMQ (ggml-org#13323)
gguf-py : avoid requiring pyside6 for other scripts (ggml-org#13036)
CUDA: fix logic for clearing padding with -ngl 0 (ggml-org#13320)
sampling : Integrate Top-nσ into main sampling chain (and add it to the server) (ggml-org#13264)
server : Webui - change setText command from parent window to also send the message. (ggml-org#13309)
mtmd : rename llava directory to mtmd (ggml-org#13311)
clip : fix confused naming ffn_up and ffn_down (ggml-org#13290)
convert : bailingmoe : set yarn metadata if present (ggml-org#13312)
SYCL: Disable mul_mat kernels for noncontiguous tensor b (ggml-org#13308)
mtmd : add C public API (ggml-org#13184)
rpc : use backend registry, support dl backends (ggml-org#13304)
ggml : activate s390x simd for Q3_K (ggml-org#13301)
llava/mtmd : fixes to fully support dl backends (ggml-org#13303)
llama : build windows releases with dl backends (ggml-org#13220)
CUDA: fix race condition in MMQ stream-k fixup (ggml-org#13299)
CUDA: fix race condition in MMQ ids_dst (ggml-org#13294)
vulkan: Additional type support for unary, binary, and copy (ggml-org#13266)
...
@betweenus

Please update documentation (server README).

timwu pushed a commit to timwu/llama.cpp that referenced this pull request Dec 20, 2025
…he server) (ggml-org#13264)

* sampling: add Top-nσ sampler to `llama-server` and sampler ordering

* revert: sampler ordering

* revert: VS' crappy auto-formatting

* revert: VS' crappy auto-formatting pt.2

* revert: my crappy eye sight...

* sampling: add XTC to Top-nσ sampler chain

* sampling: add Dyna. Temp. to Top-nσ sampler chain

* sampling: actually remove Top-nσ from sampler(oops)

* Integrate top_n_sigma into main sampler chain

* Define COMMON_SAMPLER_TYPE_TOP_N_SIGMA

* Formatting

* Lint

* Exit early in the sampler if nsigma < 0

---------

Co-authored-by: CasualAutopsy <casual_autopsy@outlook.com>