server: log prompts to directory #22031

Merged
pwilkin merged 2 commits into ggml-org:master from jacekpoplawski:prompt-logging
Jun 9, 2026

@jacekpoplawski

Contributor

Overview

Log each prompt into a separate text file.

Additional information

There is a recurring class of issues related to prompt-processing cache behavior, for example:
#19394

Server logs are useful for observing cache-related metrics, but they are not enough when the goal is to compare the exact prompt contents between requests.

This small helper writes each prompt to a separate plain-text file, making it easy to inspect and compare prompts with tools like diff.

It helped me quickly identify that OpenCode reorders the system-reminder block, which changes the middle of the prompt and defeats llama.cpp prompt caching.

The change is intentionally minimal and does not even create a new directory. I initially considered logging more information, but this turned out to be enough to understand what the client is doing.
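
As an illustration of the idea, here is a minimal sketch of such a helper. It is not the code from this PR: the function name `log_prompt_to_dir`, the four-digit file names, and the return convention are assumptions made for the example. Like the change described above, it expects the directory to exist already and does not create it.

```cpp
#include <atomic>
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper (not the PR's actual code): writes one prompt to
// <dir>/<NNNN>.txt and returns the path, or an empty string on failure.
// The directory must already exist; it is not created here.
static std::string log_prompt_to_dir(const std::string & dir, const std::string & prompt) {
    // Process-wide counter so concurrent requests get distinct file names.
    static std::atomic<int> prompt_counter(0);
    const int id = ++prompt_counter;

    char name[32];
    std::snprintf(name, sizeof(name), "%04d.txt", id);

    const std::string path = dir + "/" + name;
    std::ofstream out(path, std::ios::binary);
    if (!out) {
        return ""; // e.g. the directory does not exist
    }
    out << prompt;
    return out.good() ? path : "";
}
```

Each call produces one plain-text file, so two consecutive prompts can be compared directly with `diff` or `fc`.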

Requirements

@jacekpoplawski jacekpoplawski requested review from a team as code owners April 17, 2026 07:06
@jacekpoplawski jacekpoplawski marked this pull request as draft April 17, 2026 07:06
@0x62ash

0x62ash commented Apr 17, 2026


This is a very useful feature. I used mitmproxy for debugging and it was a pain in a...

@jacekpoplawski

jacekpoplawski commented Apr 18, 2026

Contributor Author

I was also able to run it on Windows:

.\bin\Release\llama-server.exe -c 50000 -m J:\llm\models\Qwen3.6-35B-A3B-UD-Q4_K_M.gguf --log-prompts-dir logs

Then I connected OpenCode and tried to reproduce the unnecessary prompt reprocessing:

    Directory: C:\Users\jacek\git\llama.cpp\build_2026.04.18\logs


Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----        18.04.2026     17:37           3869 0001.txt
-a----        18.04.2026     17:37          55640 0002.txt
-a----        18.04.2026     17:38          57557 0003.txt
-a----        18.04.2026     17:38          56958 0004.txt
-a----        18.04.2026     17:38          62223 0005.txt
-a----        18.04.2026     17:39          64588 0006.txt
-a----        18.04.2026     17:39          65036 0007.txt
-a----        18.04.2026     17:39          65908 0008.txt
-a----        18.04.2026     17:39          66766 0009.txt
-a----        18.04.2026     17:40          66711 0010.txt
-a----        18.04.2026     17:40          69417 0011.txt
-a----        18.04.2026     17:40          70280 0012.txt
-a----        18.04.2026     17:40          75751 0013.txt
-a----        18.04.2026     17:40          76858 0014.txt
-a----        18.04.2026     17:40          77048 0015.txt
-a----        18.04.2026     17:40          77943 0016.txt
-a----        18.04.2026     17:41          77874 0017.txt

Then we can use fc to compare two prompts:

PS C:\Users\jacek\git\llama.cpp\build_2026.04.18\logs> fc.exe /N .\0016.txt .\0017.txt
Comparing files .\0016.txt and .\0017.TXT
***** .\0016.txt
 1184:  <|im_start|>user
 1185:  add one comment to it
 1186:  <system-reminder>
 1187:  Your operational mode has changed from plan to build.
 1188:  You are no longer in read-only mode.
 1189:  You are permitted to make file changes, run shell commands, and utilize your arsenal of tools as needed.
 1190:  </system-reminder><|im_end|>
 1191:  <|im_start|>assistant
 1192:  <think>
 1193:  The user wants me to add a comment to the `build_norm_gated` function. I should add a descriptive comment before the function.
 1194:  </think>
 1195:
 1196:  <tool_call>
***** .\0017.TXT
 1184:  <|im_start|>user
 1185:  add one comment to it<|im_end|>
 1186:  <|im_start|>assistant
 1187:  <tool_call>
*****

***** .\0016.txt
 1222:  <|im_start|>assistant
 1223:  <think>
***** .\0017.TXT
 1213:  <|im_start|>assistant
 1214:  Added.<|im_end|>
 1215:  <|im_start|>user
 1216:  thank you
 1217:  <system-reminder>
 1218:  Your operational mode has changed from plan to build.
 1219:  You are no longer in read-only mode.
 1220:  You are permitted to make file changes, run shell commands, and utilize your arsenal of tools as needed.
 1221:  </system-reminder><|im_end|>
 1222:  <|im_start|>assistant
 1223:  <think>
*****

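In prompt 16, the system-reminder block appears after "add one comment to it", while in prompt 17 it has moved to after "thank you". The two prompts therefore diverge in the middle, and everything after the first difference cannot be reused from the cache, so a large number of tokens had to be processed again.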

These logs are useful for detecting cases like this.
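
To make the effect concrete, here is a small sketch that measures how much of two logged prompts is shared as a common prefix. It compares bytes of the logged text, whereas the server's cache operates on tokens, so the result is only a rough indicator. `common_prefix_len` is an illustrative name, not a llama.cpp function.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Length in bytes of the longest common prefix of two prompts.
// Text after this point differs, so it cannot be served from a prefix cache.
static size_t common_prefix_len(const std::string & a, const std::string & b) {
    const size_t n = std::min(a.size(), b.size());
    size_t i = 0;
    while (i < n && a[i] == b[i]) {
        ++i;
    }
    return i;
}
```

Applied to the contents of two consecutive log files, a prefix that is much shorter than the older prompt suggests the client rewrote earlier context rather than only appending to it.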

@0x62ash

0x62ash commented Apr 20, 2026


Hello,

The log file name should follow a more robust format. Proper naming helps prevent overwriting after restarts and avoids conflicts during concurrent writes.

The simplest solution is to use a Unix timestamp combined with a UUID or request ID.

@pwilkin

pwilkin commented Apr 21, 2026

Member

Agreed with @0x62ash: within the specified directory, each session should create its own timestamped subdirectory, and the dumps of the individual prompts should go inside it.
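
A rough sketch of that layout, assuming C++17 `std::filesystem` and a Unix timestamp in seconds as the subdirectory name (the later `logs/1777767070/0005.txt` paths in this thread look consistent with that, but the exact scheme is an assumption). `make_session_dir` is a hypothetical name.

```cpp
#include <chrono>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: creates <base>/<unix-seconds>/ and returns its path.
// Intended to be called once per server start, so that a restart never
// overwrites the prompt files of an earlier session.
static fs::path make_session_dir(const fs::path & base) {
    const auto now  = std::chrono::system_clock::now();
    const auto secs = std::chrono::duration_cast<std::chrono::seconds>(
        now.time_since_epoch()).count();

    const fs::path dir = base / std::to_string(secs);
    fs::create_directories(dir); // throws fs::filesystem_error on failure
    return dir;
}
```

Within one session, a simple per-request counter is then enough to keep file names unique.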

@jacekpoplawski

jacekpoplawski commented May 3, 2026

Contributor Author

I switched from OpenCode to Pi Coding Agent, but I’m seeing a very similar issue.

When comparing two prompt logs produced by this PR, I can see that Pi removes the thoughts from the next prompt. As a result, the prompt content changes between requests and llama.cpp has to reprocess the changed suffix of the context, which can take minutes.

--- /home/jacek/logs/1777767070/0005.txt        2026-05-03 02:11:59.883259335 +0200
+++ /home/jacek/logs/1777767070/0006.txt        2026-05-03 02:12:43.399730238 +0200
@@ -127,16 +127,7 @@
 <|turn>user
 read the docs<turn|>
 <|turn>model
-<|channel>thought
-The user wants me to "read the docs". Looking at the project context, there are several documentation locations:
(...)
+<|turn>user
+run the tests<turn|>
+<|turn>model

@aldehir

aldehir commented May 3, 2026

Contributor

> When comparing two prompt logs produced by this PR, I can see that Pi removes the thoughts from the next prompt. As a result, the prompt content changes between requests and llama.cpp has to reprocess the changed suffix of the context, which can take minutes.

This is expected behavior for Gemma 4. The reasoning traces are only kept between tool calls and tool responses. A new user message will remove the traces from prior messages.

@ggerganov ggerganov left a comment

Member

This is useful to have as available functionality. cc @ngxson

Comment thread tools/server/server-context.cpp Outdated
Comment on lines +3208 to +3209
static std::atomic<int> prompt_counter(0);
const int file_name = ++prompt_counter;

Collaborator

Maybe better to use a timestamp for the file name, e.g. ggml_time_ms()?

Assuming that this feature is mostly used for debugging, the chance of a collision should be negligible.

Contributor Author

With ggml_time_ms:


    Directory: C:\Users\jacek\git\llama.cpp\build\prompt-logs


Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----        05.06.2026     18:47             58 000000006048.txt
-a----        05.06.2026     18:48            173 000000009115.txt
-a----        05.06.2026     18:48            356 000000010614.txt
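
The listing suggests the millisecond value is zero-padded to 12 digits. The actual format string used in the PR is not shown here, so the following is only a sketch of that naming scheme; `prompt_file_name` is a hypothetical helper, and the caller would pass in a value such as the one returned by `ggml_time_ms()`.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a millisecond timestamp as a fixed-width file name,
// e.g. 6048 -> "000000006048.txt" (width inferred from the listing above).
static std::string prompt_file_name(int64_t t_ms) {
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%012lld.txt", static_cast<long long>(t_ms));
    return buf;
}
```

Fixed-width zero padding keeps lexicographic order equal to chronological order, so a plain directory listing shows the prompts in the order they arrived.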

@pwilkin pwilkin left a comment

Member

Let's prioritize this, this would be of great help with the ton of various non-specific parser issue reports we're getting.

@jacekpoplawski

Contributor Author

> this would be of great help with the ton of various non-specific parser issue reports we're getting.

That was the idea. I will rebase onto master and change the file name as requested.

Comment thread common/arg.cpp Outdated
Comment on lines +3252 to +3258
add_opt(common_arg(
{"--log-prompts-dir"}, "PATH",
"Log prompts to directory",
[](common_params &params, const std::string & value) {
params.path_prompts_log_dir = value;
}
));

Collaborator

Suggested change
-add_opt(common_arg(
-    {"--log-prompts-dir"}, "PATH",
-    "Log prompts to directory",
-    [](common_params & params, const std::string & value) {
-        params.path_prompts_log_dir = value;
-    }
-));
+add_opt(common_arg(
+    {"--log-prompts-dir"}, "PATH",
+    "Log prompts to directory (only used for debugging, default: disabled)",
+    [](common_params & params, const std::string & value) {
+        params.path_prompts_log_dir = value;
+    }
+).set_examples({LLAMA_EXAMPLE_SERVER, LLAMA_EXAMPLE_CLI}));

Add `--log-prompts-dir` to write each prompt to a separate text file in
the specified directory.

@pwilkin pwilkin left a comment

Member

@ngxson if it's fine by you let's merge this.

Comment thread common/arg.cpp Outdated
@ngxson ngxson added the merge ready label Jun 9, 2026
@pwilkin pwilkin merged commit 1e91256 into ggml-org:master Jun 9, 2026
1 check passed
Jcfunk added a commit to Jcfunk/llama.cpp that referenced this pull request Jun 11, 2026
* upstream/HEAD: (329 commits)
  vendor : update LibreSSL to 4.3.2 (ggml-org#24397)
  Remove padding and multiple D2D copies for MTP (ggml-org#24086)
  chat: fix LFM2/LFM2.5 ignoring json_schema (ggml-org#24377)
  CUDA: Fix ssm_scan_f32 data-races (ggml-org#24360)
  ci : bump komac version (ggml-org#24396)
  speculative : fix "ngram-map-k4v" name in logging (ggml-org#24253)
  webui: implement pinned conversations support (ggml-org#21387)
  graph: Fix granite speech model inference by applying embedding scale when deepstack is not used (ggml-org#24357)
  ci : fix windows release (ggml-org#24369)
  ui: add opt-in run_javascript frontend tool (ggml-org#24244)
  mtmd: build_vit batching (ggml-org#24352)
  vulkan: reduce iq1 shared memory usage for mul_mm (ggml-org#24287)
  vulkan: add `v_dot2_f32_f16` support in matrix-matrix multiplication and Flash Attention (ggml-org#24123)
  ui: Fix excessive style recalculation on hover (ggml-org#24243)
  mtmd: refactor video subproc handling (ggml-org#24316)
  server: log prompts to directory (ggml-org#22031)
  ui: fix mobile chat form overflow and bust stale bundle cache (ggml-org#24158)
  ggml : add GGML_OP_COL2IM_1D (ggml-org#24206)
  server : do not clear slots without unified KV cache (ggml-org#24190)
  models : fix plamo2 attention_key/value_length regression (ggml-org#24317)
  ...
turbo-tan pushed a commit to turbo-tan/llama.cpp-tq3 that referenced this pull request Jun 11, 2026
* server: log prompts to directory

Add `--log-prompts-dir` to write each prompt to a separate text file in
the specified directory.

* Apply suggestion from @ngxson

---------

Co-authored-by: Xuan-Son Nguyen <thichthat@gmail.com>
(cherry picked from commit 1e91256)
Labels

examples, merge ready, server

6 participants