server: log prompts to directory #22031
This is a very useful feature. I used mitmproxy for debugging and it was a pain in a...
999701f to eff56d7
I was also able to run it on Windows:
then connect Opencode and try to reproduce the unnecessary prompt reprocessing:
then we can use
In prompt 16,
These logs are useful for detecting cases like this.
eff56d7 to 10c4950
Hello. The log file name should follow a more robust format. Proper naming prevents files from being overwritten after a restart and avoids conflicts during concurrent writes. The simplest solution is to combine a Unix timestamp with a UUID or request ID.
Agreed with @0x62ash: within the specified directory, each session should create its own timestamped subdirectory, and the dumps of the individual prompts should go inside it.
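The layout suggested in the two comments above could look roughly like the sketch below. It is only an illustration: `prompt_log_path`, its parameters, and the six-digit request ID are assumptions and do not come from the PR.

```cpp
#include <cstdint>
#include <cstdio>
#include <filesystem>
#include <string>

// Hypothetical helper (not part of the PR): builds
//   <base_dir>/<session start, Unix seconds>/<zero-padded request id>.txt
// so that a restart gets a fresh subdirectory and each request its own file.
std::filesystem::path prompt_log_path(const std::filesystem::path & base_dir,
                                      int64_t session_unix_s,
                                      uint64_t request_id) {
    char name[32];
    std::snprintf(name, sizeof(name), "%06llu.txt",
                  static_cast<unsigned long long>(request_id));
    return base_dir / std::to_string(session_unix_s) / name;
}
```

For example, `prompt_log_path("prompt-logs", 1700000000, 42)` yields `prompt-logs/1700000000/000042.txt` in generic path notation. The sketch only builds the path; creating the session subdirectory would be a separate step.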
I switched from OpenCode to Pi Coding Agent, but I’m seeing a very similar issue. When comparing two prompt logs produced by this PR, I can see that Pi removes the thoughts from the next prompt. As a result, the prompt content changes between requests and llama.cpp has to reprocess the changed suffix of the context, which can take minutes.
10c4950 to cf0414e
This is expected behavior for Gemma 4. The reasoning traces are only kept between tool calls and tool responses. A new user message will remove the traces from prior messages.
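To see how much of a previous prompt can still be reused after such a change, two logged prompts can be compared for their longest common prefix. The sketch below is a character-level approximation with a hypothetical helper name; it assumes the server reuses only the matching prefix and compares tokens rather than characters, which is not shown in this thread.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: length of the longest common prefix of two prompts.
// Everything after this position differs and would have to be reprocessed.
std::size_t common_prefix_len(const std::string & a, const std::string & b) {
    const std::size_t n = a.size() < b.size() ? a.size() : b.size();
    std::size_t i = 0;
    while (i < n && a[i] == b[i]) {
        ++i;
    }
    return i;
}
```

With prompts such as `"sys|user1|think|"` and `"sys|user1|user2"`, the prefix ends right after `"sys|user1|"`, so removing a reasoning trace early in the context invalidates everything that follows it.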
static std::atomic<int> prompt_counter(0);
const int file_name = ++prompt_counter;
Maybe it is better to use a timestamp for the file name, e.g. ggml_time_ms()?
Assuming that this feature is mostly used for debugging, the chance of a collision should be negligible.
With ggml_time_ms:
Directory: C:\Users\jacek\git\llama.cpp\build\prompt-logs

Mode      LastWriteTime       Length  Name
----      -------------       ------  ----
-a----    05.06.2026 18:47        58  000000006048.txt
-a----    05.06.2026 18:48       173  000000009115.txt
-a----    05.06.2026 18:48       356  000000010614.txt
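The file names in the listing look like a millisecond value zero-padded to 12 digits. A sketch that reproduces this format is shown below; the helper name is hypothetical and the PR's actual formatting code is not shown in this thread, so the padding width is inferred from the listing only.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a millisecond timestamp (for example the
// value returned by ggml_time_ms()) as a 12-digit, zero-padded file name.
std::string prompt_file_name_from_ms(int64_t t_ms) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%012lld.txt", static_cast<long long>(t_ms));
    return std::string(buf);
}
```

Zero padding keeps the files in chronological order when they are sorted by name, which makes it easier to diff consecutive prompts.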
pwilkin left a comment
Let's prioritize this; it would be a great help with the many non-specific parser issue reports we're getting.
That was the idea. I will rebase onto master and change the file name as requested.
add_opt(common_arg(
    {"--log-prompts-dir"}, "PATH",
    "Log prompts to directory",
    [](common_params & params, const std::string & value) {
        params.path_prompts_log_dir = value;
    }
));
- add_opt(common_arg(
-     {"--log-prompts-dir"}, "PATH",
-     "Log prompts to directory",
-     [](common_params & params, const std::string & value) {
-         params.path_prompts_log_dir = value;
-     }
- ));
+ add_opt(common_arg(
+     {"--log-prompts-dir"}, "PATH",
+     "Log prompts to directory (only used for debugging, default: disabled)",
+     [](common_params & params, const std::string & value) {
+         params.path_prompts_log_dir = value;
+     }
+ ).set_examples({LLAMA_EXAMPLE_SERVER, LLAMA_EXAMPLE_CLI}));
Add `--log-prompts-dir` to write each prompt to a separate text file in the specified directory.
cf0414e to d69585e
* server: log prompts to directory
  Add `--log-prompts-dir` to write each prompt to a separate text file in the specified directory.
* Apply suggestion from @ngxson
---------
Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
Overview
Log each prompt into a separate text file.
Additional information
There is a recurring class of issues related to prompt-processing cache behavior, for example:
#19394
Server logs are useful for observing cache-related metrics, but they are not enough when the goal is to compare the exact prompt contents between requests.
This small helper writes each prompt to a separate plain-text file, making it easy to inspect and compare prompts with tools like diff.
It helped me quickly identify that opencode reorders system-reminder, which changes the middle of the prompt and confuses llama.cpp prompt caching.
The change is intentionally minimal and does not even create a new directory. I initially considered logging more information, but this turned out to be enough to understand what the client is doing.
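A minimal sketch of the behavior described above, writing one prompt to one plain-text file inside an existing directory, might look as follows. `write_prompt_file` and its parameters are hypothetical; this is not the code from the PR, only an illustration of the "no directory creation" design under that assumption.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper (not the PR's code): writes one prompt to <dir>/<name>.
// The directory is deliberately not created here, so a missing directory
// makes the open fail and the function returns false.
bool write_prompt_file(const std::string & dir,
                       const std::string & name,
                       const std::string & prompt) {
    std::ofstream out(dir + "/" + name, std::ios::binary);
    if (!out) {
        return false;
    }
    out << prompt;
    return static_cast<bool>(out);
}
```

Because each prompt ends up in its own file, two consecutive requests can be compared directly with a tool like diff.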
Requirements