save-load-state : refactor tests and improve readability by ggerganov · Pull Request #23196 · ggml-org/llama.cpp

ggerganov · 2026-05-17T09:37:05Z

Overview

Refactor the save-load-state example for cleaner structure and readability.
Add model_only option to common_init_from_params to skip context creation and context-dependent setup.

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES. llama.cpp + pi

* save-load-state : refactor into separate phase functions

  - Split monolithic main() into 4 self-contained phase functions, each managing its own context/sampler/batch lifecycle
  - Each function tokenizes internally using its local ctx instance
  - main() is now a clean orchestrator: init -> run phases -> assert results
  - Proper resource cleanup on every exit path (return {} on error)

  Assisted-by: llama.cpp:local pi

* save-load-state : use params.out_file instead of separate state_file

  - Remove state_file parameter from all phase functions
  - Each function accesses params.out_file directly
  - Initialize params.out_file in main alongside params.prompt

  Assisted-by: llama.cpp:local pi

* save-load-state : use smart pointers for ctx and smpl

  - Replace raw llama_context* with llama_context_ptr
  - Replace raw llama_sampler* with llama_sampler_ptr
  - Remove all manual llama_free() and llama_sampler_free() calls
  - Keep llama_batch as raw (managed manually with llama_batch_free)

  Assisted-by: llama.cpp:local pi

* save-load-state : add local llama_batch_ptr RAII wrapper

  - Add llama_batch_ptr struct holding llama_batch by value
  - Calls llama_batch_free() in destructor
  - Eliminates all manual llama_batch_free() calls

  Assisted-by: llama.cpp:local pi

* save-load-state : replace printf/fprintf with logging macros

  - Add log.h include
  - Replace fprintf(stderr, ...) errors with LOG_ERR
  - Replace fprintf(stderr, ...) info with LOG_TRC
  - Replace printf output with LOG

  Assisted-by: llama.cpp:local pi

* save-load-state : refactor tests to check results inline

  Each follow-up phase now accepts an expected result and performs the comparison internally instead of collecting results in main().

  Assisted-by: llama.cpp:local pi

* save-load-state : improve test output readability

  Add phase labels, remove redundant run prefixes, and show PASS after each test.

  Assisted-by: llama.cpp:local pi

* pi : add rule about git signing

* save-load-state : simplify llama_batch_ptr

  Change get() to return a reference and remove operator*(). Use batch.get() throughout for consistency.

  Assisted-by: llama.cpp:local pi

* save-load-state : extract generate_tokens helper

  Factor out the repeated token generation loop into a shared helper function used by all phases.

  Assisted-by: llama.cpp:local pi

* save-load-state : update comments to use test terminology

  Replace "Phase" with "Test" and list each test's steps as bullet points.

  Assisted-by: llama.cpp:local pi

* save-load-state : rename test functions

  Rename to test_baseline, test_state_load, test_seq_cp_host, test_seq_cp_device. Update comments and logs accordingly.

  Assisted-by: llama.cpp:local pi

* pi : add rule to never git push without confirmation

  Assisted-by: llama.cpp:local pi

* common : add model_only option to common_init_from_params

  Add bool model_only parameter to skip context creation, sampler init, and context-dependent setup. Use in save-load-state to initialize only the model, with each test creating its own context.

  Assisted-by: llama.cpp:local pi

---------

Co-authored-by: ggerganov <ggerganov@users.noreply.github.com>
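The smart-pointer commit in the message above swaps raw handles for owning pointer types so that no exit path needs a manual free. The sketch below shows the general pattern with a `std::unique_ptr` and a custom deleter. All names in it (`fake_ctx`, `fake_ctx_new`, `fake_ctx_free`, `fake_ctx_ptr`) are stand-ins invented for illustration, not the llama.cpp API; the assumption is only that the real pointer types behave like a unique pointer whose deleter calls the matching free function.

```cpp
#include <cassert>
#include <memory>

// Stand-in for an opaque C handle (hypothetical, not a llama.cpp type).
struct fake_ctx { int n_ctx; };

static int g_fake_ctx_frees = 0; // counts frees so the behaviour can be checked

static fake_ctx * fake_ctx_new(int n_ctx) { return new fake_ctx{n_ctx}; }

static void fake_ctx_free(fake_ctx * ctx) {
    delete ctx;
    ++g_fake_ctx_frees;
}

// Deleter that forwards to the C-style free function.
struct fake_ctx_deleter {
    void operator()(fake_ctx * ctx) const { fake_ctx_free(ctx); }
};

using fake_ctx_ptr = std::unique_ptr<fake_ctx, fake_ctx_deleter>;

// Both the early return and the normal return release the handle automatically.
static int fake_use_ctx(bool fail_early) {
    fake_ctx_ptr ctx(fake_ctx_new(512));
    if (fail_early) {
        return -1; // ctx is freed here without an explicit call
    }
    return ctx->n_ctx;
}
```

With this shape, an error branch can simply return and the handle is still released exactly once.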

ggerganov and others added 14 commits May 17, 2026 11:24

save-load-state : use params.out_file instead of separate state_file

e18c970

- Remove state_file parameter from all phase functions
- Each function accesses params.out_file directly
- Initialize params.out_file in main alongside params.prompt

Assisted-by: llama.cpp:local pi

save-load-state : add local llama_batch_ptr RAII wrapper

88a1437

- Add llama_batch_ptr struct holding llama_batch by value
- Calls llama_batch_free() in destructor
- Eliminates all manual llama_batch_free() calls

Assisted-by: llama.cpp:local pi
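A minimal sketch of the RAII idea described in this commit, written against stand-in types so it compiles on its own. `demo_batch`, `demo_batch_init`, `demo_batch_free`, and `demo_batch_ptr` are hypothetical names; the real wrapper in the example holds a `llama_batch` and calls `llama_batch_free()`, and its exact members may differ. The `get()` accessor returning a reference follows the later "simplify llama_batch_ptr" commit.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Stand-in for a C struct that owns a heap buffer (hypothetical).
struct demo_batch {
    int32_t   n_tokens;
    int32_t * token;
};

static int g_demo_batch_frees = 0; // counts frees so the behaviour can be checked

static demo_batch demo_batch_init(int32_t n_tokens_max) {
    demo_batch b;
    b.n_tokens = 0;
    b.token    = static_cast<int32_t *>(std::malloc(sizeof(int32_t) * n_tokens_max));
    return b;
}

// Takes the struct by value, like a typical C-style free function.
static void demo_batch_free(demo_batch b) {
    std::free(b.token);
    ++g_demo_batch_frees;
}

// RAII wrapper: holds the batch by value and frees it in the destructor.
struct demo_batch_ptr {
    demo_batch batch;

    explicit demo_batch_ptr(int32_t n_tokens_max) : batch(demo_batch_init(n_tokens_max)) {}
    ~demo_batch_ptr() { demo_batch_free(batch); }

    // Copying would lead to a double free, so it is disabled.
    demo_batch_ptr(const demo_batch_ptr &) = delete;
    demo_batch_ptr & operator=(const demo_batch_ptr &) = delete;

    // Accessor returning a reference to the wrapped struct.
    demo_batch & get() { return batch; }
};
```

Holding the struct by value avoids an extra allocation, while the deleted copy operations keep the single-owner guarantee.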

save-load-state : replace printf/fprintf with logging macros

f962428

- Add log.h include
- Replace fprintf(stderr, ...) errors with LOG_ERR
- Replace fprintf(stderr, ...) info with LOG_TRC
- Replace printf output with LOG

Assisted-by: llama.cpp:local pi

save-load-state : refactor tests to check results inline

4c376fd

Each follow-up phase now accepts an expected result and performs the comparison internally instead of collecting results in main().

Assisted-by: llama.cpp:local pi
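The inline-check structure can be sketched roughly as follows: a baseline run returns its output, and each follow-up run receives that output as the expected value and reports pass or fail itself. The names `demo_generator`, `demo_test_baseline`, and `demo_test_followup` are hypothetical, and the generator callback stands in for the real decoding work, which is not reproduced here.

```cpp
#include <cassert>
#include <cstdio>
#include <functional>
#include <string>

// Hypothetical stand-in for "run the model and return the generated text".
using demo_generator = std::function<std::string()>;

// The baseline run only produces the reference output.
static std::string demo_test_baseline(const demo_generator & gen) {
    return gen();
}

// A follow-up run compares against the expected output internally,
// so the caller only has to look at the boolean result.
static bool demo_test_followup(const char * name,
                               const demo_generator & gen,
                               const std::string & expected) {
    const std::string result = gen();
    if (result != expected) {
        std::fprintf(stderr, "%s : FAIL (got '%s', expected '%s')\n",
                     name, result.c_str(), expected.c_str());
        return false;
    }
    std::printf("%s : PASS\n", name);
    return true;
}
```

Keeping the comparison inside each test means main() does not need to collect and cross-check intermediate results.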

save-load-state : improve test output readability

3d96bb6

Add phase labels, remove redundant run prefixes, and show PASS after each test.

Assisted-by: llama.cpp:local pi

pi : add rule about git signing

b01e8fb

save-load-state : simplify llama_batch_ptr

0452d43

Change get() to return a reference and remove operator*(). Use batch.get() throughout for consistency.

Assisted-by: llama.cpp:local pi

save-load-state : extract generate_tokens helper

eacbd49

Factor out the repeated token generation loop into a shared helper function used by all phases.

Assisted-by: llama.cpp:local pi
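As an illustration of what such a shared helper might look like, the sketch below factors a sample / append / decode loop into one function. It is not the PR's `generate_tokens` implementation: `demo_backend` and its three callbacks are hypothetical stand-ins for sampling, detokenizing, and decoding, and returning an empty string on failure mirrors the "return {} on error" convention mentioned in the first commit.

```cpp
#include <cassert>
#include <functional>
#include <string>

using demo_token = int;

// Hypothetical callbacks standing in for the sampler, vocabulary, and decoder.
struct demo_backend {
    std::function<demo_token()>            sample;   // pick the next token
    std::function<std::string(demo_token)> to_piece; // token -> text
    std::function<bool(demo_token)>        decode;   // feed the token back; false on error
};

// Shared generation loop: returns the generated text,
// or an empty string if decoding fails.
static std::string demo_generate_tokens(demo_backend & be, int n_predict) {
    std::string result;
    for (int i = 0; i < n_predict; ++i) {
        const demo_token tok = be.sample();
        result += be.to_piece(tok);
        if (!be.decode(tok)) {
            return {};
        }
    }
    return result;
}
```

Each test can then call the same loop with its own context-specific backend instead of repeating the loop body.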

save-load-state : update comments to use test terminology

7411587

Replace "Phase" with "Test" and list each test's steps as bullet points.

Assisted-by: llama.cpp:local pi

save-load-state : rename test functions

65bdbb5

Rename to test_baseline, test_state_load, test_seq_cp_host, test_seq_cp_device. Update comments and logs accordingly.

Assisted-by: llama.cpp:local pi

pi : add rule to never git push without confirmation

1f5ac03

Assisted-by: llama.cpp:local pi

common : add model_only option to common_init_from_params

7370634

Add bool model_only parameter to skip context creation, sampler init, and context-dependent setup. Use in save-load-state to initialize only the model, with each test creating its own context.

Assisted-by: llama.cpp:local pi
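The effect of a model-only initialization can be illustrated with a small self-contained sketch. Everything here is hypothetical (`demo_model`, `demo_context`, `demo_init_result`, `demo_init_from_params`, `demo_new_context`); the actual signature and return type of `common_init_from_params` are not shown in this PR page and are not reproduced. The sketch only assumes that the flag leaves the context unset so callers can create their own.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for a loaded model and a context bound to it.
struct demo_model   { int id; };
struct demo_context { const demo_model * model; };

struct demo_init_result {
    std::unique_ptr<demo_model>   model;
    std::unique_ptr<demo_context> context; // stays null when model_only is set
};

// Loads the model and, unless model_only is set, also creates a context.
static demo_init_result demo_init_from_params(bool model_only = false) {
    demo_init_result res;
    res.model = std::make_unique<demo_model>(demo_model{1});
    if (model_only) {
        return res; // skip context creation and context-dependent setup
    }
    res.context = std::make_unique<demo_context>(demo_context{res.model.get()});
    return res;
}

// Each test can build a fresh context on top of the shared model.
static std::unique_ptr<demo_context> demo_new_context(const demo_model & model) {
    return std::make_unique<demo_context>(demo_context{&model});
}
```

Sharing one model while giving every test a fresh context keeps the tests independent without reloading the weights.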

github-actions Bot added the examples label May 17, 2026

ggerganov marked this pull request as ready for review May 18, 2026 14:32

ggerganov requested a review from a team as a code owner May 18, 2026 14:32

ggerganov merged commit cd963fe into master May 19, 2026
45 of 49 checks passed

ggerganov deleted the gg/tests-save-load-refactor branch May 19, 2026 06:46
