Releases: ggml-org/llama.cpp
b7751
OpenCL: add SOLVE_TRI op support (#18846)
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7750
cuda : print less debug logs when disabling cuda graphs (#18868)
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7749
context : do not reserve scheduler for warmups (#18867)
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7748
llama : add adaptive-p sampler (#17927)
- initial commit for branch
- simplify constants
- add params to struct common_params_sampling, add reference to PR
- explicitly clamp min_target and max_target to [0.0, 1.0]
- add args, rename queue_size -> window_size
- improved comments
- minor
- remove old unused code from algorithm
- minor
- add power law case to common_sampler_init, add sampler name mappings
- clarify behaviour when window_size = 0
- add missing enums
- remove target_range param, make target == 1 no-op, cleanup code
- oops, straggler
- add missing parameters in server-task.cpp
- copy from author
  ref: https://gist.github.com/MrJackSpade/9be99c7efbba7b95a41377e123b7b069
- remove old debug log, style nit
- fix compiler warning, add commented-out logging per token
- re-write + change parameters + simplify
- oops forgot args.cpp
- fix leftover window_size
- add missing values to common_params_sampling::print()
- with logging
- does this fix it?
- no, but does this?
- update default decay
- optimize
- fix bad merge
  my git skills are lacking
- silence missing initializer for member
- update default decay to 0.9
- fix logging
- format (double)
- add power law to the new samplers vector
- log sampler init values
- improve logging messages in llama_sampler_power_law
- remove extraneous logging
- simplify target computation
  last commit with debug logging!
- remove debug logging, explicitly clamp params at init
- add use_power_law flag + logic, minor cleanup
- update power-law -> adaptive-p
- fix cold start EMA
  ctx->weighted_sum is now initialized and reset to target / (1.0f - clamped_decay)
  ctx->total_weight is now initialized and reset to 1.0f / (1.0f - clamped_decay)
  this fixes a "cold start" problem with the moving average
- update SHARPNESS constant to 10.0f
- minor style fixes
  no functional changes
- minor style fixes cont.
- update llama_sampler_adaptive_p_i for backend sampling (ref: #17004)
- separate into apply + accept functions
- pending_token_idx: switch from llama_token to int32
  functionally identical (llama.h has typedef int32_t llama_token;),
  but it's more correct now
- don't transform logits <= -1e9f
- fix masking in backend top-p, min-p
- address review comments
- typo in comments RND -> RNG
- add docs
- add recommended values in completion docs
- address PR feedback
- remove trailing whitespace (for CI editorconfig)
- add adaptive-p to common_sampler_types_from_chars
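
The "fix cold start EMA" entry above can be illustrated with a minimal sketch. Only the two initial values, target / (1.0f - clamped_decay) and 1.0f / (1.0f - clamped_decay), come from the commit message. The struct, the function names, the clamp range, and the update rule (a standard exponentially decayed weighted average) are assumptions made for illustration and may differ from the actual sampler code.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical stand-in for the sampler context; only the two fields named
// in the commit message are modelled.
struct ema_state {
    float weighted_sum;
    float total_weight;
};

// Cold-start initialization as described in the commit message. The clamp
// range [0, 0.99] is an assumption; it only keeps (1 - decay) away from zero.
static ema_state ema_init(float target, float decay) {
    const float clamped_decay = std::min(std::max(decay, 0.0f), 0.99f);
    ema_state s;
    s.weighted_sum = target / (1.0f - clamped_decay);
    s.total_weight = 1.0f   / (1.0f - clamped_decay);
    return s;
}

// Assumed update rule: each new value gets weight 1, older history is
// scaled down by the decay factor.
static void ema_update(ema_state & s, float value, float decay) {
    s.weighted_sum = value + decay * s.weighted_sum;
    s.total_weight = 1.0f  + decay * s.total_weight;
}

// Current moving average.
static float ema_mean(const ema_state & s) {
    assert(s.total_weight > 0.0f);
    return s.weighted_sum / s.total_weight;
}
```

Under these assumptions, 1 / (1 - decay) is the steady-state value of total_weight, so the state starts as if a long history of values equal to target had already been observed. With target 0.5 and decay 0.9, the initial mean is 0.5 and a single observation of 1.0 moves it to about 0.55. Starting from zeros instead would make the mean jump straight to the first observed value.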
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7747
server: improve slots scheduling for n_cmpl (#18789)
- server : make sure children tasks are scheduled to launch with parent
- fix
- add comment pointing to this PR
- fix
- clean up
- more debug messages
- add pop_deferred_task with specific ID version
- improve the logic
- simple approach
- no double move
- correct return type of launch_slots_with_parent_task
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7746
context : reserve new scheduler when graph topology changes (#18547)
- context : reserve new scheduler when graph topology changes
- cont : fix
- cont : fix reserve
- cont : reserve only when changes occur + timing
- context : add comments
- llama : reserve on sampler changes
- common : allow null common_sampler
- server : task declares needs (embd, logits, sampling)
- server : do not init sampler if not needed
- llama : fix need_reserve when unsetting a sampler
- server : consolidate slot reset/clear logic
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7745
CUDA: fix alignment on register spill for FA (#18815)
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7744
ggml-cpu: optimize ggml_vec_dot_bf16 for Power9 (#18837)
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7743
lora: make sure model keeps track of associated adapters (#18490)
- lora: make sure model keeps track of associated adapters
- deprecate llama_adapter_lora_free
- minor : std::unordered_set over std::set

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler:
b7742
model-loader : support bool array sliding window pattern (#18850)
macOS/iOS:
Linux:
Windows:
- Windows x64 (CPU)
- Windows arm64 (CPU)
- Windows x64 (CUDA 12) - CUDA 12.4 DLLs
- Windows x64 (CUDA 13) - CUDA 13.1 DLLs
- Windows x64 (Vulkan)
- Windows x64 (SYCL)
- Windows x64 (HIP)
openEuler: