Releases: ggml-org/llama.cpp

b7751

16 Jan 00:33
785a710

b7750

16 Jan 00:23
6e7fc8a

b7749

15 Jan 23:22
be8e3d9

b7748

15 Jan 23:16
13f1e4a

llama : add adaptive-p sampler (#17927)

  • initial commit for branch

  • simplify constants

  • add params to struct common_params_sampling, add reference to PR

  • explicitly clamp min_target and max_target to [0.0, 1.0]

  • add args, rename queue_size -> window_size

  • improved comments

  • minor

  • remove old unused code from algorithm

  • minor

  • add power law case to common_sampler_init, add sampler name mappings

  • clarify behaviour when window_size = 0

  • add missing enums

  • remove target_range param, make target == 1 no-op, cleanup code

  • oops, straggler

  • add missing parameters in server-task.cpp

  • copy from author

ref: https://gist.github.com/MrJackSpade/9be99c7efbba7b95a41377e123b7b069

  • remove old debug log, style nit

  • fix compiler warning, add commented-out logging per token

  • re-write + change parameters + simplify

  • oops forgot args.cpp

  • fix leftover window_size

  • add missing values to common_params_sampling::print()

  • with logging

  • does this fix it?

  • no, but does this?

  • update default decay

  • optimize

  • fix bad merge

my git skills are lacking

  • silence 'missing initializer for member' warning

  • update default decay to 0.9

  • fix logging

  • format (double)

  • add power law to the new samplers vector

  • log sampler init values

  • improve logging messages in llama_sampler_power_law

  • remove extraneous logging

  • simplify target computation

last commit with debug logging!

  • remove debug logging, explicitly clamp params at init

  • add use_power_law flag + logic, minor cleanup

  • update power-law -> adaptive-p

  • fix cold start EMA

  • ctx->weighted_sum is now initialized and reset to target / (1.0f - clamped_decay)
  • ctx->total_weight is now initialized and reset to 1.0f / (1.0f - clamped_decay)

this fixes a "cold start" problem with the moving average (see the sketch after this list)

  • update SHARPNESS constant to 10.0f

  • minor style fixes

no functional changes

  • minor style fixes cont.

  • update llama_sampler_adaptive_p_i for backend sampling (ref: #17004)

  • separate into apply + accept functions

  • pending_token_idx: switch from llama_token to int32

functionally identical (llama.h has typedef int32_t llama_token;), but it's more correct now

  • don't transform logits <= -1e9f

  • fix masking in backend top-p, min-p

  • address review comments

  • typo in comments RND -> RNG

  • add docs

  • add recommended values in completion docs

  • address PR feedback

  • remove trailing whitespace (for CI editorconfig)

  • add adaptive-p to common_sampler_types_from_chars
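The commits above describe the sampler's feedback loop only in fragments: an EMA of accepted-token probability, a steady-state cold-start seed, a SHARPNESS constant of 10.0f, and separate apply/accept steps. Below is a rough standalone sketch of that loop; the names and the exact exponent rule are illustrative stand-ins, not the actual llama.cpp implementation.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Illustrative sketch only: an EMA of accepted-token probability seeded at
// its steady state (the "cold start" fix above), plus a power-law transform
// whose exponent nudges that average toward `target`. The exponent rule
// here is a stand-in, not the actual PR formula.
struct adaptive_p {
    float target;       // desired long-run accepted-token probability, clamped to [0, 1]
    float decay;        // EMA decay; the notes above set the default to 0.9
    float weighted_sum; // EMA numerator
    float total_weight; // EMA denominator

    adaptive_p(float target_, float decay_) : target(target_), decay(decay_) {
        // cold-start seed from the notes: weighted_sum = target / (1 - decay),
        // total_weight = 1 / (1 - decay), so the EMA starts exactly at target
        weighted_sum = target / (1.0f - decay);
        total_weight = 1.0f  / (1.0f - decay);
    }

    float ema() const { return weighted_sum / total_weight; }

    // "apply": sharpen (k > 1) or flatten (k < 1) the distribution depending
    // on whether the running average is below or above the target
    void apply(std::vector<float> & probs) const {
        const float SHARPNESS = 10.0f; // constant named in the notes above
        const float k = std::exp(SHARPNESS * (target - ema()));
        float sum = 0.0f;
        for (float & p : probs) { p = std::pow(p, k); sum += p; }
        for (float & p : probs) { p /= sum; }
    }

    // "accept": fold the sampled token's probability into the moving average
    void accept(float p_accepted) {
        weighted_sum = decay * weighted_sum + p_accepted;
        total_weight = decay * total_weight + 1.0f;
    }
};

int main() {
    adaptive_p s(0.5f, 0.9f);
    std::vector<float> probs = { 0.6f, 0.3f, 0.1f };
    s.apply(probs);     // EMA == target at init, so the first apply is a no-op
    s.accept(probs[0]); // pretend the first token was the one sampled
    printf("ema after one token: %.3f\n", s.ema());
}
```

Seeding both accumulators at their steady-state values is what makes the very first apply well-behaved; starting them at zero would bias the average toward the first few sampled tokens.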

Builds: macOS/iOS, Linux, Windows, openEuler

b7747

15 Jan 23:14
a04c2b0

server: improve slot scheduling for n_cmpl (#18789)

  • server : make sure child tasks are scheduled to launch with their parent

  • fix

  • add comment pointing to this PR

  • fix

  • clean up

  • more debug messages

  • add pop_deferred_task with specific ID version

  • improve the logic

  • simple approach

  • no double move

  • correct return type of launch_slots_with_parent_task
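One piece of the list above is a pop_deferred_task variant that takes a specific task ID, so a parent's child tasks can be pulled out of the deferred queue and launched together with it. Here is a minimal sketch of that queue pattern; the types are simplified stand-ins, not the server's actual structures.

```cpp
#include <deque>
#include <optional>

// Simplified stand-ins; the real server_task carries far more state.
struct server_task {
    int id;
    int id_parent; // -1 if the task has no parent
};

struct task_queue {
    std::deque<server_task> deferred;

    // existing behaviour: pop whichever deferred task is first in line
    std::optional<server_task> pop_deferred_task() {
        if (deferred.empty()) {
            return std::nullopt;
        }
        server_task t = deferred.front();
        deferred.pop_front();
        return t;
    }

    // the "specific ID version" from the notes: pull out one particular
    // task so children can be launched together with their parent
    std::optional<server_task> pop_deferred_task(int id) {
        for (auto it = deferred.begin(); it != deferred.end(); ++it) {
            if (it->id == id) {
                server_task t = *it;
                deferred.erase(it);
                return t;
            }
        }
        return std::nullopt;
    }
};

int main() {
    task_queue q;
    q.deferred = { { 1, -1 }, { 2, 1 }, { 3, 1 } };
    auto child = q.pop_deferred_task(2); // fetch a specific child by ID
    return child ? 0 : 1;
}
```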

Builds: macOS/iOS, Linux, Windows, openEuler

b7746

15 Jan 21:34
39173bc

context : reserve new scheduler when graph topology changes (#18547)

  • context : reserve new scheduler when graph topology changes

  • cont : fix

  • cont : fix reserve

  • cont : reserve only when changes occur + timing

  • context : add comments

  • llama : reserve on sampler changes

  • common : allow null common_sampler

  • server : task declares needs (embd, logits, sampling)

  • server : do not init sampler if not needed

  • llama : fix need_reserve when unsetting a sampler

  • server : consolidate slot reset/clear logic
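The gist of this change is to re-reserve the backend scheduler only when the compute graph's topology actually changes (including when a sampler is set or unset), rather than on every batch. A hedged sketch of that reserve-on-change idea, using a hypothetical fingerprint type rather than llama_context's real state:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical fingerprint of a compute graph's topology; the real check
// in llama_context compares actual graph properties, not this struct.
struct graph_key {
    int32_t n_nodes;
    int32_t n_outputs;
    bool    has_sampler; // sampler changes also trigger a reserve, per the notes

    bool operator==(const graph_key & other) const {
        return n_nodes     == other.n_nodes   &&
               n_outputs   == other.n_outputs &&
               has_sampler == other.has_sampler;
    }
};

struct context_sketch {
    graph_key last_key     = {};
    bool      need_reserve = true; // always reserve on first use

    void process(const graph_key & key) {
        if (!(key == last_key)) {
            need_reserve = true; // topology changed since the last reserve
            last_key     = key;
        }
        if (need_reserve) {
            reserve();
            need_reserve = false;
        }
        // ... build and run the graph as usual ...
    }

    void reserve() {
        // in llama.cpp this is where ggml_backend_sched_reserve would
        // preallocate compute buffers for the new worst-case graph
        printf("reserving scheduler\n");
    }
};

int main() {
    context_sketch ctx;
    ctx.process({ 640, 1, false }); // first graph: reserve
    ctx.process({ 640, 1, false }); // same topology: skip the reserve
    ctx.process({ 768, 1, true  }); // topology/sampler changed: reserve again
}
```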

Builds: macOS/iOS, Linux, Windows, openEuler

b7745

15 Jan 21:05
5c662d2

b7744

15 Jan 13:38
8cc0ba9

b7743

15 Jan 12:40
a7e6ddb

lora: make sure the model keeps track of associated adapters (#18490)

  • lora: make sure the model keeps track of associated adapters

  • deprecate llama_adapter_lora_free

  • minor : std::unordered_set over std::set


Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
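With this change the model owns the set of LoRA adapters attached to it, which is why a separate llama_adapter_lora_free is deprecated: freeing the model frees its adapters. A minimal sketch of that ownership pattern, with simplified stand-in types rather than the real ones from llama.cpp:

```cpp
#include <unordered_set>

// Simplified stand-ins for the real types in llama.cpp.
struct lora_adapter_sketch { /* tensors, scaling, ... */ };

struct model_sketch {
    // the model keeps track of its associated adapters; unordered_set is
    // preferred over std::set here since no ordering is needed
    std::unordered_set<lora_adapter_sketch *> adapters;

    ~model_sketch() {
        // freeing the model frees every adapter still attached to it,
        // which is what makes a separate per-adapter free call unnecessary
        for (lora_adapter_sketch * a : adapters) {
            delete a;
        }
    }
};

int main() {
    model_sketch * model = new model_sketch();
    model->adapters.insert(new lora_adapter_sketch());
    delete model; // the attached adapter is freed along with the model
}
```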

Builds: macOS/iOS, Linux, Windows, openEuler

b7742

15 Jan 12:40
2a13180
