Name and Version
./llama.cpp/build/bin/llama-cli --version
ggml_cuda_init: found 1 CUDA devices (Total VRAM: 24161 MiB):
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes, VRAM: 24161 MiB
version: 8665 (b863507)
built with GNU 15.2.0 for Linux x86_64
Operating systems
Linux
GGML backends
CUDA
Hardware
Sandy Bridge E (AVX)
Models
bartowski/google_gemma-4-31B-it-GGUF:IQ4_NL
https://huggingface.co/bartowski/google_gemma-4-31B-it-GGUF
Problem description & steps to reproduce
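Build b8665 does not resolve the input data corruption: newlines in multiline input are dropped by the command-line interfaces (llama-cli and llama-completion), so the model receives the typed lines concatenated into a single string.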
llama-cli invocation (Linux x86-64 CUDA):
LLAMA_CACHE=./model1/llama.cpp ./llama.cpp/build/bin/llama-cli --no-mmproj --hf-repo 'bartowski/google_gemma-4-31B-it-GGUF:IQ4_NL' --jinja --multiline-input --special
conversation:
build : b8665-b8635075f
model : bartowski/google_gemma-4-31B-it-GGUF:IQ4_NL
modalities : text
available commands:
/exit or Ctrl+C stop or exit
/regen regenerate the last response
/clear clear the chat history
/read <file> add a text file
/glob <pattern> add text files using globbing pattern
> please output this input verbatim:
1
2
3
4
5
6
[Start thinking]
* Input: "please output this input verbatim:123456"
* Instruction: Output the input verbatim.
* Target content: "123456" (though the prompt says "this input", it usually refers to the specific string provided after the colon or the entire prompt. However, based on the phrasing "output this input verbatim:123456", the user is asking for the specific sequence "123456").
* The user wants the string "123456".
* 123456
[End thinking]
123456<turn|>
llama-completion invocation:
LLAMA_CACHE=./model1/llama.cpp ./llama.cpp/build/bin/llama-completion --hf-repo 'bartowski/google_gemma-4-31B-it-GGUF:IQ4_NL' --jinja --multiline-input --special
conversation:
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- To return control to the AI, end your input with '\'.
- To return control without starting a new line, end your input with '/'.
- Not using system message. To change it, set a different value via -sys PROMPT
> please output the next prompt verbatim. conclude with the set of integers input. if understood, reply with OK
<|channel>thought
* Instruction: "please output the next prompt verbatim."
* Constraint: "conclude with the set of integers input."
* Trigger: "if understood, reply with OK"
* The user is asking me to commit to a specific behavior for the *next* turn.
* The behavior is: repeat the prompt exactly, then list the integers provided in that prompt.
* The user explicitly asked me to reply with "OK" if I understand.
* Response: "OK"<channel|>OK<turn|>
> 123
456
789
<|channel>thought
* Previous instruction: "please output the next prompt verbatim. conclude with the set of integers input."
* Current prompt: "123456789"
* Action required:
1. Output the prompt verbatim: "123456789"
2. Conclude with the set of integers input: {1, 2, 3, 4, 5, 6, 7, 8, 9} (or simply the digits/number since it's the set of integers provided).
* Verbatim output: 123456789
* Set of integers: {1, 2, 3, 4, 5, 6, 7, 8, 9} (or the number itself if interpreted as one integer). Usually, "set of integers" implies the individual digits or the number. Given the input "123456789", it's one integer. However, often users mean the digits. I'll treat it as the integer provided.
Let's look at the phrasing: "conclude with the set of integers input".
Input: 123456789.
Set: {123456789}.
Wait, if the input was "1, 2, 3", the set would be {1, 2, 3}.
With "123456789", it is a single integer.
Let's double check the "verbatim" part.
Prompt: "123456789"
Verbatim output: 123456789.
Final response construction:
123456789
{123456789} (or just 123456789)
Actually, looking at the prompt "123456789", it's just a string of digits. I will output it and then the number.
Let's try to be as literal as possible.
Verbatim: 123456789
Set of integers: {123456789}<channel|>123456789
{123456789}<turn|>
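Both transcripts are consistent with the typed lines being joined without any separator before they reach the model. The sketch below is only an illustration of that symptom, using hypothetical helper names (`join_expected`, `join_observed`); it is not llama.cpp's input-handling code and makes no claim about where the newlines are actually lost.

```python
# Hypothetical illustration only; this is NOT llama.cpp's input-handling code.

def join_expected(lines):
    """Join multiline input as typed, preserving the newline between lines."""
    return "\n".join(lines)


def join_observed(lines):
    """Join lines with no separator, matching what the model quotes back."""
    return "".join(lines)


typed = ["123", "456", "789"]  # the three lines entered in the second transcript

print(repr(join_expected(typed)))  # what the prompt should contain
print(repr(join_observed(typed)))  # what the model's reasoning quotes
```

With the three lines from the llama-completion transcript, the second form yields the string the model quotes as "Current prompt", while the first form is what a verbatim echo would require.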
First Bad Commit
I first noticed concatenated prompt quotations in Gemma 4's chain of thought (CoT) on the day Gemma 4 was released.
Relevant log output
n/a