llama : add option for greedy sampling with probs #3813
Force-pushed from e274fe3 to 4aa1fb0
```diff
-    if (temp <= 0) {
+    if (temp < 0.0) {
         // greedy sampling
```
The result is the same in either case, right? I'm not entirely sure it's worth special-casing this instead of just changing greedy sampling to do:

```cpp
llama_sample_softmax(ctx_main, &cur_p);
id = cur_p.data[0].id;
```

But if you did go that way, you'd probably also want to change the common args parsing stuff to clamp the user-specified temperature to 0.0, so if they pass a negative value it's the same.
It's only internal stuff that would care about probs generated vs no probs unless I'm misunderstanding.
It's the same result, yes. The probs are not used only internally - we are using them in `speculative`. Before this PR, we had to do the hack with `temp = 0.01f;` to get probs. Now we get them with `temp = 0.0f;`.

The user-specified input should not be affected by this change. Technically, the user would normally want to pass `temp = -1.0f` to save the extra softmax compute, but it's probably not something that would affect performance in a measurable way.
Sorry, "internal" was a poor choice of words. I meant it's not something someone calling the application and passing `--temp` on the command line would care about. So if they do `--temp -1` for an example that doesn't care about probs, then it's kind of weird/unnecessary to turn on generating probs in that case.

So what I'm proposing is that the argument handling stuff would do something like:

```cpp
params.sparams.temp = std::max(0.0f, atof(blah));
```

when parsing the command-line arguments, so even if the user does `--temp -1` it's still just 0.0. Then something like `speculative`, which cares about probs in the greedy sampling case, can do:

```cpp
if (params.sparams.temp == 0.0f) {
    params.sparams.temp = -1.0f;
}
```

edit: Actually, you'd need to reverse the logic for the softmax case a bit also: so `0.0` = greedy sampling, no softmax; `< 0.0` = greedy sampling with softmax.
Got it. Should be good now.
* llama : add option for greedy sampling with probs
* llama : add comment about llama_sample_token_greedy() missing probs
* sampling : temp == 0.0 -> no probs, temp < 0.0 -> probs
On `master`, when using `temp <= 0.0` we get greedy sampling, but we don't have the probs of the tokens.

This PR adds an option: when using `temp == 0.0`, do greedy sampling but also apply softmax so we get the probs.