Releases · ggml-org/llama.cpp

OpenCL: add SOLVE_TRI op support (#18846)

cuda : print fewer debug logs when disabling cuda graphs (#18868)

context : do not reserve scheduler for warmups (#18867)

llama : add adaptive-p sampler (#17927)

initial commit for branch
simplify constants
add params to struct common_params_sampling, add reference to PR
explicitly clamp min_target and max_target to [0.0, 1.0]
add args, rename queue_size -> window_size
improved comments
minor
remove old unused code from algorithm
minor
add power law case to common_sampler_init, add sampler name mappings
clarify behaviour when window_size = 0
add missing enums
remove target_range param, make target == 1 no-op, cleanup code
oops, straggler
add missing parameters in server-task.cpp
copy from author

ref:
https://gist.github.com/MrJackSpade/9be99c7efbba7b95a41377e123b7b069

remove old debug log, style nit
fix compiler warning, add commented-out logging per token
re-write + change parameters + simplify
oops forgot args.cpp
fix leftover window_size
add missing values to common_params_sampling::print()
with logging
does this fix it?
no, but does this?
update default decay
optimize
fix bad merge

my git skills are lacking

silence missing initializer for member
update default decay to 0.9
fix logging
format (double)
add power law to the new samplers vector
log sampler init values
improve logging messages in llama_sampler_power_law
remove extraneous logging
simplify target computation

last commit with debug logging!

remove debug logging, explicitly clamp params at init
add use_power_law flag + logic, minor cleanup
update power-law -> adaptive-p
fix cold start EMA

ctx->weighted_sum is now initialized and reset to target / (1.0f - clamped_decay)
ctx->total_weight is now initialized and reset to 1.0f / (1.0f - clamped_decay)

this fixes a "cold start" problem with the moving average

update SHARPNESS constant to 10.0f
minor style fixes

no functional changes

minor style fixes cont.
update llama_sampler_adaptive_p_i for backend sampling (ref: #17004)
separate into apply + accept functions
pending_token_idx: switch from llama_token to int32

functionally identical (llama.h has typedef int32_t llama_token;),
but it's more correct now

don't transform logits <= -1e9f
fix masking in backend top-p, min-p
address review comments
typo in comments RND -> RNG
add docs
add recommended values in completion docs
address PR feedback
remove trailing whitespace (for CI editorconfig)
add adaptive-p to common_sampler_types_from_chars
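
The "fix cold start EMA" note above can be illustrated with a small Python sketch. This is not the llama.cpp code: the function names, the update rule (decay the running sums, then add the new observation with weight 1), and the clamp bounds are assumptions made for illustration. Only the two seed formulas come from the commit notes.

```python
def ema_state(target: float, decay: float) -> tuple[float, float]:
    """Seed (weighted_sum, total_weight) as described in the commit notes."""
    # Assumed clamp range; it only needs to keep 1 - decay strictly positive.
    clamped_decay = min(max(decay, 0.0), 0.99)
    weighted_sum = target / (1.0 - clamped_decay)
    total_weight = 1.0 / (1.0 - clamped_decay)
    return weighted_sum, total_weight


def ema_update(weighted_sum: float, total_weight: float,
               value: float, decay: float) -> tuple[float, float]:
    """Assumed update rule: decay both running sums, then add one observation."""
    return decay * weighted_sum + value, decay * total_weight + 1.0


if __name__ == "__main__":
    decay, target = 0.9, 0.5

    # Seeded state: the average starts at the target.
    ws, tw = ema_state(target, decay)
    print("seeded average:", ws / tw)

    # One observation of 1.0 nudges the seeded average only slightly.
    ws, tw = ema_update(ws, tw, 1.0, decay)
    print("seeded average after one step:", ws / tw)

    # Zero-initialised state: the first observation fully determines the average.
    cs, ct = ema_update(0.0, 0.0, 1.0, decay)
    print("zero-init average after one step:", cs / ct)
```

Under these assumptions, 1 / (1 - decay) is the value the total weight converges to after many updates, so seeding with it makes the first observations count no more than later ones. With zero-initialised sums the first observation alone sets the average, which is one plausible reading of the "cold start" problem the commit describes.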

server: improve slots scheduling for n_cmpl (#18789)

server : make sure children tasks are scheduled to launch with parent
fix
add comment pointing to this PR
fix
clean up
more debug messages
add pop_deferred_task with specific ID version
improve the logic
simple approach
no double move
correct return type of launch_slots_with_parent_task

context : reserve new scheduler when graph topology changes (#18547)

context : reserve new scheduler when graph topology changes
cont : fix
cont : fix reserve
cont : reserve only when changes occur + timing
context : add comments
llama : reserve on sampler changes
common : allow null common_sampler
server : task declares needs (embd, logits, sampling)
server : do not init sampler if not needed
llama : fix need_reserve when unsetting a sampler
server : consolidate slot reset/clear logic

CUDA: fix alignment on register spill for FA (#18815)

ggml-cpu: optimize ggml_vec_dot_bf16 for Power9 (#18837)

lora: make sure model keeps track of associated adapters (#18490)

lora: make sure model keeps track of associated adapters
deprecate llama_adapter_lora_free
minor : std::unordered_set over std::set

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

model-loader : support bool array sliding window pattern (#18850)

b7751

b7750

b7749

b7748

b7747

b7746

b7745

b7744

b7743

b7742
