Add support for Reka Edge 2603#21616
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
5d1aa8f to
77018b5
Compare
77018b5 to
ae7730a
Compare
5e48fb3 to
30395b5
Compare
|
I'm puzzled - the autoparser seems to detect the template just fine, why would we need a specialized one for this model? |
@pwilkin when I tested with the autoparser I found that it failed for parallel tool calls. This is because of a difference in the definition of parallel tool calls. The autoparser defines parallel tool calls = multiple JSON objects comma-separated inside a single section. But for the Reka chat template, each tool call gets its own |
|
@kwajiehao ah, okay, in that case, don't make a separate parser, instead, add a workaround at the top of chat-diff-analyzer.cpp with: analysis.tools.format.section_start = "";
analysis.tools.format.section_end = "";
analysis.tools.format.per_call_start = "<tool_call>";
analysis.tools.format.per_call_end = "</tool_call>";That should be enough. |
|
@pwilkin thanks for the suggestion! I tried this and found that the autoparser has a few gaps for Reka's chat template format:
Let me know if I'm missing anything in my analysis. Happy to either keep the custom parser for Reka, or to extend the behavior of the autoparser if needed |
| GGML_ASSERT(chat_templates->template_default != nullptr); | ||
| return chat_templates->template_default->caps.to_map(); | ||
| } | ||
|
|
There was a problem hiding this comment.
Revert chat.cpp edits, this removes the trailing newline.
There was a problem hiding this comment.
Fixed by checking out common/chat.cpp from upstream/master
b332da4 to
04743ad
Compare
9e93867 to
04743ad
Compare
|
I'm re-runing one of the server CI, will merge when it passes |
|
Hmm seems like server's CI didn't pass because it misses a fix from upstream, but it's unrelated to the current PR. Should be ok for merge. |
|
Hi @sebagallo, first of all thank you for your interest in Reka Edge 2603. Unfortunately this model does not support reasoning so you will need to run Beyond that, I found an error with the gguf conversion script (due to a reliance on an older version of Can you give it a shot with |
reasoning off seems to fix the issues! my impression was that thinking was supported as I saw in the chat template Tried tool calling and the mmproj again and everything works!. Thanks for this new model! |
* feat: (vocab) fix stray text appended in llama_decode_text Remove accidental concatenation of the full `text` string when formatting UNK_BYTE hex escapes. Only the closing "]" should be appended. * feat(mtmd): add Yasa2 vision encoder support Add a Yasa2 (ConvNeXtV2-based) vision encoder for reka-edge: - Register PROJECTOR_TYPE_YASA2 and tensor name definitions - Add yasa2_block/yasa2_stage model structs - Implement graph builder with ConvNeXt stages, GRN, adaptive pooling - Wire into clip.cpp switch statements and mtmd.cpp init_vision - Use mtmd_image_preprocessor_fixed_size for image preprocessing * feat(chat): add reka-edge template handler (tools, thinking) - Add chat-reka.cpp/h implementing PEG-based parser for reka-edge format - Add Reka-Edge.jinja chat template - Detect reka-edge template in try_specialized_template() - Add LLAMA_EXAMPLE_MTMD to chat-template-file arg * feat: add reka vlm to gguf conversion script Converts Reka Yasa2 hf checkpoints to GGUF format: - Text decoder: Llama-arch with tiktoken/BPE vocab - Mmproj (--mmproj): ConvNeXt vision backbone + language_projection - Generates 2D sincos positional embeddings for vision encoder * test: add Reka Edge chat template and parser tests - test-chat-template: oracle tests comparing Jinja engine output vs common_chat_templates_apply for text, tools, thinking, images, video - test-chat: PEG parser tests for Reka Edge format, round-trip tests for image/video content parts, common path integration tests * scripts: add Reka Edge mixed quantization helper Q4_0 base quantization with Q8_0 override for the last 8 transformer blocks (layers 24-31) via --tensor-type regex. * fix: adapt chat-reka and tests to upstream API - Use autoparser::generation_params (not templates_params) - Add p.prefix(generation_prompt) to PEG parser - Simplify reasoning parser to match LFM2 pattern - Remove image/video oracle tests (unsupported by oaicompat parser; no other multimodal models test this path) * fix: avoid duplicate tensor loading in yasa2 vision encoder TN_YASA_PATCH_W and TN_PATCH_EMBD both resolve to "v.patch_embd.weight", causing the same tensor to be loaded twice into ctx_data and overflowing the memory pool. Reuse the tensors already loaded by the common section. * chore: update image pre-processing settings The reka-edge model depends on the following settings in an older fork of llama.cpp: 1. Fixed square resize 2. BICUBIC 3. add_padding=false In current llama.cpp, this means setting: - image_resize_algo = RESIZE_ALGO_BICUBIC - image_resize_pad = false * chore: remove reka gguf conversion script * chore: remove reka quantization script * chore: remove unnecessary changes from PR scope This commit removes a couple of unnecessary changes for the PR scope: 1. BPE decoder bug fix - this affects reka edge because there's a bug in our tokenization that doesn't represent <think> tokens as special tokens. However this isn't meant to be a thinking model so when run with --reasoning off the edge case does not affect us 2. --chat-template-file support from llama-mtmd-cli - the focus is on llama-server and the reka edge gguf contains the necessary metadata to detect the chat template 3. reka edge oracle test cases - no other model has similar test cases, so I removed it for standardization * chore: remove unnecessary ggml_cast This commit removes unnecessary ggml_cast after updating the reka vlm -> gguf conversion script on hugging face. * chore: remove redundant code * chore: remove unnecessary ggml_cont calls This commit removes all ggml_cont calls except the four that precede ggml_reshape_3d/ggml_reshape_4d. Those are necessary because ggml_reshape recomputes strides assuming contiguous layout and asserts ggml_is_contiguous. Other operations (ggml_mean, ggml_add, ggml_mul etc.) use stride-based indexing and handle non-contiguous inputs correctly and so we are ok to remove ggml_cont for those. * chore: remove unnecessary ggml_repeat calls This commit removes unnecessary ggml_repeat calls because the underlying ops already broadcast automatically. Every ggml_repeat in yasa2.cpp was expanding a smaller tensor to match a larger one's shape before passing both to an elementwise op (ggml_add, ggml_sub, ggml_mul, or ggml_div). This is unnecessary because all four of these ops already support broadcasting internally. * chore: restore ggml_cont needed for cpu operations * refactor: locate reka chat template handler in chat.cpp * chore: remove unnecessary warmup tokens * chore: add code comments on image_resize_pad * chore: remove custom reka parsing code * chore: revert common/chat.cpp * Uncomment debug logging for PEG input parsing --------- Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
* feat: (vocab) fix stray text appended in llama_decode_text Remove accidental concatenation of the full `text` string when formatting UNK_BYTE hex escapes. Only the closing "]" should be appended. * feat(mtmd): add Yasa2 vision encoder support Add a Yasa2 (ConvNeXtV2-based) vision encoder for reka-edge: - Register PROJECTOR_TYPE_YASA2 and tensor name definitions - Add yasa2_block/yasa2_stage model structs - Implement graph builder with ConvNeXt stages, GRN, adaptive pooling - Wire into clip.cpp switch statements and mtmd.cpp init_vision - Use mtmd_image_preprocessor_fixed_size for image preprocessing * feat(chat): add reka-edge template handler (tools, thinking) - Add chat-reka.cpp/h implementing PEG-based parser for reka-edge format - Add Reka-Edge.jinja chat template - Detect reka-edge template in try_specialized_template() - Add LLAMA_EXAMPLE_MTMD to chat-template-file arg * feat: add reka vlm to gguf conversion script Converts Reka Yasa2 hf checkpoints to GGUF format: - Text decoder: Llama-arch with tiktoken/BPE vocab - Mmproj (--mmproj): ConvNeXt vision backbone + language_projection - Generates 2D sincos positional embeddings for vision encoder * test: add Reka Edge chat template and parser tests - test-chat-template: oracle tests comparing Jinja engine output vs common_chat_templates_apply for text, tools, thinking, images, video - test-chat: PEG parser tests for Reka Edge format, round-trip tests for image/video content parts, common path integration tests * scripts: add Reka Edge mixed quantization helper Q4_0 base quantization with Q8_0 override for the last 8 transformer blocks (layers 24-31) via --tensor-type regex. * fix: adapt chat-reka and tests to upstream API - Use autoparser::generation_params (not templates_params) - Add p.prefix(generation_prompt) to PEG parser - Simplify reasoning parser to match LFM2 pattern - Remove image/video oracle tests (unsupported by oaicompat parser; no other multimodal models test this path) * fix: avoid duplicate tensor loading in yasa2 vision encoder TN_YASA_PATCH_W and TN_PATCH_EMBD both resolve to "v.patch_embd.weight", causing the same tensor to be loaded twice into ctx_data and overflowing the memory pool. Reuse the tensors already loaded by the common section. * chore: update image pre-processing settings The reka-edge model depends on the following settings in an older fork of llama.cpp: 1. Fixed square resize 2. BICUBIC 3. add_padding=false In current llama.cpp, this means setting: - image_resize_algo = RESIZE_ALGO_BICUBIC - image_resize_pad = false * chore: remove reka gguf conversion script * chore: remove reka quantization script * chore: remove unnecessary changes from PR scope This commit removes a couple of unnecessary changes for the PR scope: 1. BPE decoder bug fix - this affects reka edge because there's a bug in our tokenization that doesn't represent <think> tokens as special tokens. However this isn't meant to be a thinking model so when run with --reasoning off the edge case does not affect us 2. --chat-template-file support from llama-mtmd-cli - the focus is on llama-server and the reka edge gguf contains the necessary metadata to detect the chat template 3. reka edge oracle test cases - no other model has similar test cases, so I removed it for standardization * chore: remove unnecessary ggml_cast This commit removes unnecessary ggml_cast after updating the reka vlm -> gguf conversion script on hugging face. * chore: remove redundant code * chore: remove unnecessary ggml_cont calls This commit removes all ggml_cont calls except the four that precede ggml_reshape_3d/ggml_reshape_4d. Those are necessary because ggml_reshape recomputes strides assuming contiguous layout and asserts ggml_is_contiguous. Other operations (ggml_mean, ggml_add, ggml_mul etc.) use stride-based indexing and handle non-contiguous inputs correctly and so we are ok to remove ggml_cont for those. * chore: remove unnecessary ggml_repeat calls This commit removes unnecessary ggml_repeat calls because the underlying ops already broadcast automatically. Every ggml_repeat in yasa2.cpp was expanding a smaller tensor to match a larger one's shape before passing both to an elementwise op (ggml_add, ggml_sub, ggml_mul, or ggml_div). This is unnecessary because all four of these ops already support broadcasting internally. * chore: restore ggml_cont needed for cpu operations * refactor: locate reka chat template handler in chat.cpp * chore: remove unnecessary warmup tokens * chore: add code comments on image_resize_pad * chore: remove custom reka parsing code * chore: revert common/chat.cpp * Uncomment debug logging for PEG input parsing --------- Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
* feat: (vocab) fix stray text appended in llama_decode_text Remove accidental concatenation of the full `text` string when formatting UNK_BYTE hex escapes. Only the closing "]" should be appended. * feat(mtmd): add Yasa2 vision encoder support Add a Yasa2 (ConvNeXtV2-based) vision encoder for reka-edge: - Register PROJECTOR_TYPE_YASA2 and tensor name definitions - Add yasa2_block/yasa2_stage model structs - Implement graph builder with ConvNeXt stages, GRN, adaptive pooling - Wire into clip.cpp switch statements and mtmd.cpp init_vision - Use mtmd_image_preprocessor_fixed_size for image preprocessing * feat(chat): add reka-edge template handler (tools, thinking) - Add chat-reka.cpp/h implementing PEG-based parser for reka-edge format - Add Reka-Edge.jinja chat template - Detect reka-edge template in try_specialized_template() - Add LLAMA_EXAMPLE_MTMD to chat-template-file arg * feat: add reka vlm to gguf conversion script Converts Reka Yasa2 hf checkpoints to GGUF format: - Text decoder: Llama-arch with tiktoken/BPE vocab - Mmproj (--mmproj): ConvNeXt vision backbone + language_projection - Generates 2D sincos positional embeddings for vision encoder * test: add Reka Edge chat template and parser tests - test-chat-template: oracle tests comparing Jinja engine output vs common_chat_templates_apply for text, tools, thinking, images, video - test-chat: PEG parser tests for Reka Edge format, round-trip tests for image/video content parts, common path integration tests * scripts: add Reka Edge mixed quantization helper Q4_0 base quantization with Q8_0 override for the last 8 transformer blocks (layers 24-31) via --tensor-type regex. * fix: adapt chat-reka and tests to upstream API - Use autoparser::generation_params (not templates_params) - Add p.prefix(generation_prompt) to PEG parser - Simplify reasoning parser to match LFM2 pattern - Remove image/video oracle tests (unsupported by oaicompat parser; no other multimodal models test this path) * fix: avoid duplicate tensor loading in yasa2 vision encoder TN_YASA_PATCH_W and TN_PATCH_EMBD both resolve to "v.patch_embd.weight", causing the same tensor to be loaded twice into ctx_data and overflowing the memory pool. Reuse the tensors already loaded by the common section. * chore: update image pre-processing settings The reka-edge model depends on the following settings in an older fork of llama.cpp: 1. Fixed square resize 2. BICUBIC 3. add_padding=false In current llama.cpp, this means setting: - image_resize_algo = RESIZE_ALGO_BICUBIC - image_resize_pad = false * chore: remove reka gguf conversion script * chore: remove reka quantization script * chore: remove unnecessary changes from PR scope This commit removes a couple of unnecessary changes for the PR scope: 1. BPE decoder bug fix - this affects reka edge because there's a bug in our tokenization that doesn't represent <think> tokens as special tokens. However this isn't meant to be a thinking model so when run with --reasoning off the edge case does not affect us 2. --chat-template-file support from llama-mtmd-cli - the focus is on llama-server and the reka edge gguf contains the necessary metadata to detect the chat template 3. reka edge oracle test cases - no other model has similar test cases, so I removed it for standardization * chore: remove unnecessary ggml_cast This commit removes unnecessary ggml_cast after updating the reka vlm -> gguf conversion script on hugging face. * chore: remove redundant code * chore: remove unnecessary ggml_cont calls This commit removes all ggml_cont calls except the four that precede ggml_reshape_3d/ggml_reshape_4d. Those are necessary because ggml_reshape recomputes strides assuming contiguous layout and asserts ggml_is_contiguous. Other operations (ggml_mean, ggml_add, ggml_mul etc.) use stride-based indexing and handle non-contiguous inputs correctly and so we are ok to remove ggml_cont for those. * chore: remove unnecessary ggml_repeat calls This commit removes unnecessary ggml_repeat calls because the underlying ops already broadcast automatically. Every ggml_repeat in yasa2.cpp was expanding a smaller tensor to match a larger one's shape before passing both to an elementwise op (ggml_add, ggml_sub, ggml_mul, or ggml_div). This is unnecessary because all four of these ops already support broadcasting internally. * chore: restore ggml_cont needed for cpu operations * refactor: locate reka chat template handler in chat.cpp * chore: remove unnecessary warmup tokens * chore: add code comments on image_resize_pad * chore: remove custom reka parsing code * chore: revert common/chat.cpp * Uncomment debug logging for PEG input parsing --------- Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
* feat: (vocab) fix stray text appended in llama_decode_text Remove accidental concatenation of the full `text` string when formatting UNK_BYTE hex escapes. Only the closing "]" should be appended. * feat(mtmd): add Yasa2 vision encoder support Add a Yasa2 (ConvNeXtV2-based) vision encoder for reka-edge: - Register PROJECTOR_TYPE_YASA2 and tensor name definitions - Add yasa2_block/yasa2_stage model structs - Implement graph builder with ConvNeXt stages, GRN, adaptive pooling - Wire into clip.cpp switch statements and mtmd.cpp init_vision - Use mtmd_image_preprocessor_fixed_size for image preprocessing * feat(chat): add reka-edge template handler (tools, thinking) - Add chat-reka.cpp/h implementing PEG-based parser for reka-edge format - Add Reka-Edge.jinja chat template - Detect reka-edge template in try_specialized_template() - Add LLAMA_EXAMPLE_MTMD to chat-template-file arg * feat: add reka vlm to gguf conversion script Converts Reka Yasa2 hf checkpoints to GGUF format: - Text decoder: Llama-arch with tiktoken/BPE vocab - Mmproj (--mmproj): ConvNeXt vision backbone + language_projection - Generates 2D sincos positional embeddings for vision encoder * test: add Reka Edge chat template and parser tests - test-chat-template: oracle tests comparing Jinja engine output vs common_chat_templates_apply for text, tools, thinking, images, video - test-chat: PEG parser tests for Reka Edge format, round-trip tests for image/video content parts, common path integration tests * scripts: add Reka Edge mixed quantization helper Q4_0 base quantization with Q8_0 override for the last 8 transformer blocks (layers 24-31) via --tensor-type regex. * fix: adapt chat-reka and tests to upstream API - Use autoparser::generation_params (not templates_params) - Add p.prefix(generation_prompt) to PEG parser - Simplify reasoning parser to match LFM2 pattern - Remove image/video oracle tests (unsupported by oaicompat parser; no other multimodal models test this path) * fix: avoid duplicate tensor loading in yasa2 vision encoder TN_YASA_PATCH_W and TN_PATCH_EMBD both resolve to "v.patch_embd.weight", causing the same tensor to be loaded twice into ctx_data and overflowing the memory pool. Reuse the tensors already loaded by the common section. * chore: update image pre-processing settings The reka-edge model depends on the following settings in an older fork of llama.cpp: 1. Fixed square resize 2. BICUBIC 3. add_padding=false In current llama.cpp, this means setting: - image_resize_algo = RESIZE_ALGO_BICUBIC - image_resize_pad = false * chore: remove reka gguf conversion script * chore: remove reka quantization script * chore: remove unnecessary changes from PR scope This commit removes a couple of unnecessary changes for the PR scope: 1. BPE decoder bug fix - this affects reka edge because there's a bug in our tokenization that doesn't represent <think> tokens as special tokens. However this isn't meant to be a thinking model so when run with --reasoning off the edge case does not affect us 2. --chat-template-file support from llama-mtmd-cli - the focus is on llama-server and the reka edge gguf contains the necessary metadata to detect the chat template 3. reka edge oracle test cases - no other model has similar test cases, so I removed it for standardization * chore: remove unnecessary ggml_cast This commit removes unnecessary ggml_cast after updating the reka vlm -> gguf conversion script on hugging face. * chore: remove redundant code * chore: remove unnecessary ggml_cont calls This commit removes all ggml_cont calls except the four that precede ggml_reshape_3d/ggml_reshape_4d. Those are necessary because ggml_reshape recomputes strides assuming contiguous layout and asserts ggml_is_contiguous. Other operations (ggml_mean, ggml_add, ggml_mul etc.) use stride-based indexing and handle non-contiguous inputs correctly and so we are ok to remove ggml_cont for those. * chore: remove unnecessary ggml_repeat calls This commit removes unnecessary ggml_repeat calls because the underlying ops already broadcast automatically. Every ggml_repeat in yasa2.cpp was expanding a smaller tensor to match a larger one's shape before passing both to an elementwise op (ggml_add, ggml_sub, ggml_mul, or ggml_div). This is unnecessary because all four of these ops already support broadcasting internally. * chore: restore ggml_cont needed for cpu operations * refactor: locate reka chat template handler in chat.cpp * chore: remove unnecessary warmup tokens * chore: add code comments on image_resize_pad * chore: remove custom reka parsing code * chore: revert common/chat.cpp * Uncomment debug logging for PEG input parsing --------- Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
* feat: (vocab) fix stray text appended in llama_decode_text Remove accidental concatenation of the full `text` string when formatting UNK_BYTE hex escapes. Only the closing "]" should be appended. * feat(mtmd): add Yasa2 vision encoder support Add a Yasa2 (ConvNeXtV2-based) vision encoder for reka-edge: - Register PROJECTOR_TYPE_YASA2 and tensor name definitions - Add yasa2_block/yasa2_stage model structs - Implement graph builder with ConvNeXt stages, GRN, adaptive pooling - Wire into clip.cpp switch statements and mtmd.cpp init_vision - Use mtmd_image_preprocessor_fixed_size for image preprocessing * feat(chat): add reka-edge template handler (tools, thinking) - Add chat-reka.cpp/h implementing PEG-based parser for reka-edge format - Add Reka-Edge.jinja chat template - Detect reka-edge template in try_specialized_template() - Add LLAMA_EXAMPLE_MTMD to chat-template-file arg * feat: add reka vlm to gguf conversion script Converts Reka Yasa2 hf checkpoints to GGUF format: - Text decoder: Llama-arch with tiktoken/BPE vocab - Mmproj (--mmproj): ConvNeXt vision backbone + language_projection - Generates 2D sincos positional embeddings for vision encoder * test: add Reka Edge chat template and parser tests - test-chat-template: oracle tests comparing Jinja engine output vs common_chat_templates_apply for text, tools, thinking, images, video - test-chat: PEG parser tests for Reka Edge format, round-trip tests for image/video content parts, common path integration tests * scripts: add Reka Edge mixed quantization helper Q4_0 base quantization with Q8_0 override for the last 8 transformer blocks (layers 24-31) via --tensor-type regex. * fix: adapt chat-reka and tests to upstream API - Use autoparser::generation_params (not templates_params) - Add p.prefix(generation_prompt) to PEG parser - Simplify reasoning parser to match LFM2 pattern - Remove image/video oracle tests (unsupported by oaicompat parser; no other multimodal models test this path) * fix: avoid duplicate tensor loading in yasa2 vision encoder TN_YASA_PATCH_W and TN_PATCH_EMBD both resolve to "v.patch_embd.weight", causing the same tensor to be loaded twice into ctx_data and overflowing the memory pool. Reuse the tensors already loaded by the common section. * chore: update image pre-processing settings The reka-edge model depends on the following settings in an older fork of llama.cpp: 1. Fixed square resize 2. BICUBIC 3. add_padding=false In current llama.cpp, this means setting: - image_resize_algo = RESIZE_ALGO_BICUBIC - image_resize_pad = false * chore: remove reka gguf conversion script * chore: remove reka quantization script * chore: remove unnecessary changes from PR scope This commit removes a couple of unnecessary changes for the PR scope: 1. BPE decoder bug fix - this affects reka edge because there's a bug in our tokenization that doesn't represent <think> tokens as special tokens. However this isn't meant to be a thinking model so when run with --reasoning off the edge case does not affect us 2. --chat-template-file support from llama-mtmd-cli - the focus is on llama-server and the reka edge gguf contains the necessary metadata to detect the chat template 3. reka edge oracle test cases - no other model has similar test cases, so I removed it for standardization * chore: remove unnecessary ggml_cast This commit removes unnecessary ggml_cast after updating the reka vlm -> gguf conversion script on hugging face. * chore: remove redundant code * chore: remove unnecessary ggml_cont calls This commit removes all ggml_cont calls except the four that precede ggml_reshape_3d/ggml_reshape_4d. Those are necessary because ggml_reshape recomputes strides assuming contiguous layout and asserts ggml_is_contiguous. Other operations (ggml_mean, ggml_add, ggml_mul etc.) use stride-based indexing and handle non-contiguous inputs correctly and so we are ok to remove ggml_cont for those. * chore: remove unnecessary ggml_repeat calls This commit removes unnecessary ggml_repeat calls because the underlying ops already broadcast automatically. Every ggml_repeat in yasa2.cpp was expanding a smaller tensor to match a larger one's shape before passing both to an elementwise op (ggml_add, ggml_sub, ggml_mul, or ggml_div). This is unnecessary because all four of these ops already support broadcasting internally. * chore: restore ggml_cont needed for cpu operations * refactor: locate reka chat template handler in chat.cpp * chore: remove unnecessary warmup tokens * chore: add code comments on image_resize_pad * chore: remove custom reka parsing code * chore: revert common/chat.cpp * Uncomment debug logging for PEG input parsing --------- Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
* feat: (vocab) fix stray text appended in llama_decode_text Remove accidental concatenation of the full `text` string when formatting UNK_BYTE hex escapes. Only the closing "]" should be appended. * feat(mtmd): add Yasa2 vision encoder support Add a Yasa2 (ConvNeXtV2-based) vision encoder for reka-edge: - Register PROJECTOR_TYPE_YASA2 and tensor name definitions - Add yasa2_block/yasa2_stage model structs - Implement graph builder with ConvNeXt stages, GRN, adaptive pooling - Wire into clip.cpp switch statements and mtmd.cpp init_vision - Use mtmd_image_preprocessor_fixed_size for image preprocessing * feat(chat): add reka-edge template handler (tools, thinking) - Add chat-reka.cpp/h implementing PEG-based parser for reka-edge format - Add Reka-Edge.jinja chat template - Detect reka-edge template in try_specialized_template() - Add LLAMA_EXAMPLE_MTMD to chat-template-file arg * feat: add reka vlm to gguf conversion script Converts Reka Yasa2 hf checkpoints to GGUF format: - Text decoder: Llama-arch with tiktoken/BPE vocab - Mmproj (--mmproj): ConvNeXt vision backbone + language_projection - Generates 2D sincos positional embeddings for vision encoder * test: add Reka Edge chat template and parser tests - test-chat-template: oracle tests comparing Jinja engine output vs common_chat_templates_apply for text, tools, thinking, images, video - test-chat: PEG parser tests for Reka Edge format, round-trip tests for image/video content parts, common path integration tests * scripts: add Reka Edge mixed quantization helper Q4_0 base quantization with Q8_0 override for the last 8 transformer blocks (layers 24-31) via --tensor-type regex. * fix: adapt chat-reka and tests to upstream API - Use autoparser::generation_params (not templates_params) - Add p.prefix(generation_prompt) to PEG parser - Simplify reasoning parser to match LFM2 pattern - Remove image/video oracle tests (unsupported by oaicompat parser; no other multimodal models test this path) * fix: avoid duplicate tensor loading in yasa2 vision encoder TN_YASA_PATCH_W and TN_PATCH_EMBD both resolve to "v.patch_embd.weight", causing the same tensor to be loaded twice into ctx_data and overflowing the memory pool. Reuse the tensors already loaded by the common section. * chore: update image pre-processing settings The reka-edge model depends on the following settings in an older fork of llama.cpp: 1. Fixed square resize 2. BICUBIC 3. add_padding=false In current llama.cpp, this means setting: - image_resize_algo = RESIZE_ALGO_BICUBIC - image_resize_pad = false * chore: remove reka gguf conversion script * chore: remove reka quantization script * chore: remove unnecessary changes from PR scope This commit removes a couple of unnecessary changes for the PR scope: 1. BPE decoder bug fix - this affects reka edge because there's a bug in our tokenization that doesn't represent <think> tokens as special tokens. However this isn't meant to be a thinking model so when run with --reasoning off the edge case does not affect us 2. --chat-template-file support from llama-mtmd-cli - the focus is on llama-server and the reka edge gguf contains the necessary metadata to detect the chat template 3. reka edge oracle test cases - no other model has similar test cases, so I removed it for standardization * chore: remove unnecessary ggml_cast This commit removes unnecessary ggml_cast after updating the reka vlm -> gguf conversion script on hugging face. * chore: remove redundant code * chore: remove unnecessary ggml_cont calls This commit removes all ggml_cont calls except the four that precede ggml_reshape_3d/ggml_reshape_4d. Those are necessary because ggml_reshape recomputes strides assuming contiguous layout and asserts ggml_is_contiguous. Other operations (ggml_mean, ggml_add, ggml_mul etc.) use stride-based indexing and handle non-contiguous inputs correctly and so we are ok to remove ggml_cont for those. * chore: remove unnecessary ggml_repeat calls This commit removes unnecessary ggml_repeat calls because the underlying ops already broadcast automatically. Every ggml_repeat in yasa2.cpp was expanding a smaller tensor to match a larger one's shape before passing both to an elementwise op (ggml_add, ggml_sub, ggml_mul, or ggml_div). This is unnecessary because all four of these ops already support broadcasting internally. * chore: restore ggml_cont needed for cpu operations * refactor: locate reka chat template handler in chat.cpp * chore: remove unnecessary warmup tokens * chore: add code comments on image_resize_pad * chore: remove custom reka parsing code * chore: revert common/chat.cpp * Uncomment debug logging for PEG input parsing --------- Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
* feat: (vocab) fix stray text appended in llama_decode_text Remove accidental concatenation of the full `text` string when formatting UNK_BYTE hex escapes. Only the closing "]" should be appended. * feat(mtmd): add Yasa2 vision encoder support Add a Yasa2 (ConvNeXtV2-based) vision encoder for reka-edge: - Register PROJECTOR_TYPE_YASA2 and tensor name definitions - Add yasa2_block/yasa2_stage model structs - Implement graph builder with ConvNeXt stages, GRN, adaptive pooling - Wire into clip.cpp switch statements and mtmd.cpp init_vision - Use mtmd_image_preprocessor_fixed_size for image preprocessing * feat(chat): add reka-edge template handler (tools, thinking) - Add chat-reka.cpp/h implementing PEG-based parser for reka-edge format - Add Reka-Edge.jinja chat template - Detect reka-edge template in try_specialized_template() - Add LLAMA_EXAMPLE_MTMD to chat-template-file arg * feat: add reka vlm to gguf conversion script Converts Reka Yasa2 hf checkpoints to GGUF format: - Text decoder: Llama-arch with tiktoken/BPE vocab - Mmproj (--mmproj): ConvNeXt vision backbone + language_projection - Generates 2D sincos positional embeddings for vision encoder * test: add Reka Edge chat template and parser tests - test-chat-template: oracle tests comparing Jinja engine output vs common_chat_templates_apply for text, tools, thinking, images, video - test-chat: PEG parser tests for Reka Edge format, round-trip tests for image/video content parts, common path integration tests * scripts: add Reka Edge mixed quantization helper Q4_0 base quantization with Q8_0 override for the last 8 transformer blocks (layers 24-31) via --tensor-type regex. * fix: adapt chat-reka and tests to upstream API - Use autoparser::generation_params (not templates_params) - Add p.prefix(generation_prompt) to PEG parser - Simplify reasoning parser to match LFM2 pattern - Remove image/video oracle tests (unsupported by oaicompat parser; no other multimodal models test this path) * fix: avoid duplicate tensor loading in yasa2 vision encoder TN_YASA_PATCH_W and TN_PATCH_EMBD both resolve to "v.patch_embd.weight", causing the same tensor to be loaded twice into ctx_data and overflowing the memory pool. Reuse the tensors already loaded by the common section. * chore: update image pre-processing settings The reka-edge model depends on the following settings in an older fork of llama.cpp: 1. Fixed square resize 2. BICUBIC 3. add_padding=false In current llama.cpp, this means setting: - image_resize_algo = RESIZE_ALGO_BICUBIC - image_resize_pad = false * chore: remove reka gguf conversion script * chore: remove reka quantization script * chore: remove unnecessary changes from PR scope This commit removes a couple of unnecessary changes for the PR scope: 1. BPE decoder bug fix - this affects reka edge because there's a bug in our tokenization that doesn't represent <think> tokens as special tokens. However this isn't meant to be a thinking model so when run with --reasoning off the edge case does not affect us 2. --chat-template-file support from llama-mtmd-cli - the focus is on llama-server and the reka edge gguf contains the necessary metadata to detect the chat template 3. reka edge oracle test cases - no other model has similar test cases, so I removed it for standardization * chore: remove unnecessary ggml_cast This commit removes unnecessary ggml_cast after updating the reka vlm -> gguf conversion script on hugging face. * chore: remove redundant code * chore: remove unnecessary ggml_cont calls This commit removes all ggml_cont calls except the four that precede ggml_reshape_3d/ggml_reshape_4d. Those are necessary because ggml_reshape recomputes strides assuming contiguous layout and asserts ggml_is_contiguous. Other operations (ggml_mean, ggml_add, ggml_mul etc.) use stride-based indexing and handle non-contiguous inputs correctly and so we are ok to remove ggml_cont for those. * chore: remove unnecessary ggml_repeat calls This commit removes unnecessary ggml_repeat calls because the underlying ops already broadcast automatically. Every ggml_repeat in yasa2.cpp was expanding a smaller tensor to match a larger one's shape before passing both to an elementwise op (ggml_add, ggml_sub, ggml_mul, or ggml_div). This is unnecessary because all four of these ops already support broadcasting internally. * chore: restore ggml_cont needed for cpu operations * refactor: locate reka chat template handler in chat.cpp * chore: remove unnecessary warmup tokens * chore: add code comments on image_resize_pad * chore: remove custom reka parsing code * chore: revert common/chat.cpp * Uncomment debug logging for PEG input parsing --------- Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
* feat: (vocab) fix stray text appended in llama_decode_text Remove accidental concatenation of the full `text` string when formatting UNK_BYTE hex escapes. Only the closing "]" should be appended. * feat(mtmd): add Yasa2 vision encoder support Add a Yasa2 (ConvNeXtV2-based) vision encoder for reka-edge: - Register PROJECTOR_TYPE_YASA2 and tensor name definitions - Add yasa2_block/yasa2_stage model structs - Implement graph builder with ConvNeXt stages, GRN, adaptive pooling - Wire into clip.cpp switch statements and mtmd.cpp init_vision - Use mtmd_image_preprocessor_fixed_size for image preprocessing * feat(chat): add reka-edge template handler (tools, thinking) - Add chat-reka.cpp/h implementing PEG-based parser for reka-edge format - Add Reka-Edge.jinja chat template - Detect reka-edge template in try_specialized_template() - Add LLAMA_EXAMPLE_MTMD to chat-template-file arg * feat: add reka vlm to gguf conversion script Converts Reka Yasa2 hf checkpoints to GGUF format: - Text decoder: Llama-arch with tiktoken/BPE vocab - Mmproj (--mmproj): ConvNeXt vision backbone + language_projection - Generates 2D sincos positional embeddings for vision encoder * test: add Reka Edge chat template and parser tests - test-chat-template: oracle tests comparing Jinja engine output vs common_chat_templates_apply for text, tools, thinking, images, video - test-chat: PEG parser tests for Reka Edge format, round-trip tests for image/video content parts, common path integration tests * scripts: add Reka Edge mixed quantization helper Q4_0 base quantization with Q8_0 override for the last 8 transformer blocks (layers 24-31) via --tensor-type regex. * fix: adapt chat-reka and tests to upstream API - Use autoparser::generation_params (not templates_params) - Add p.prefix(generation_prompt) to PEG parser - Simplify reasoning parser to match LFM2 pattern - Remove image/video oracle tests (unsupported by oaicompat parser; no other multimodal models test this path) * fix: avoid duplicate tensor loading in yasa2 vision encoder TN_YASA_PATCH_W and TN_PATCH_EMBD both resolve to "v.patch_embd.weight", causing the same tensor to be loaded twice into ctx_data and overflowing the memory pool. Reuse the tensors already loaded by the common section. * chore: update image pre-processing settings The reka-edge model depends on the following settings in an older fork of llama.cpp: 1. Fixed square resize 2. BICUBIC 3. add_padding=false In current llama.cpp, this means setting: - image_resize_algo = RESIZE_ALGO_BICUBIC - image_resize_pad = false * chore: remove reka gguf conversion script * chore: remove reka quantization script * chore: remove unnecessary changes from PR scope This commit removes a couple of unnecessary changes for the PR scope: 1. BPE decoder bug fix - this affects reka edge because there's a bug in our tokenization that doesn't represent <think> tokens as special tokens. However this isn't meant to be a thinking model so when run with --reasoning off the edge case does not affect us 2. --chat-template-file support from llama-mtmd-cli - the focus is on llama-server and the reka edge gguf contains the necessary metadata to detect the chat template 3. reka edge oracle test cases - no other model has similar test cases, so I removed it for standardization * chore: remove unnecessary ggml_cast This commit removes unnecessary ggml_cast after updating the reka vlm -> gguf conversion script on hugging face. * chore: remove redundant code * chore: remove unnecessary ggml_cont calls This commit removes all ggml_cont calls except the four that precede ggml_reshape_3d/ggml_reshape_4d. Those are necessary because ggml_reshape recomputes strides assuming contiguous layout and asserts ggml_is_contiguous. Other operations (ggml_mean, ggml_add, ggml_mul etc.) use stride-based indexing and handle non-contiguous inputs correctly and so we are ok to remove ggml_cont for those. * chore: remove unnecessary ggml_repeat calls This commit removes unnecessary ggml_repeat calls because the underlying ops already broadcast automatically. Every ggml_repeat in yasa2.cpp was expanding a smaller tensor to match a larger one's shape before passing both to an elementwise op (ggml_add, ggml_sub, ggml_mul, or ggml_div). This is unnecessary because all four of these ops already support broadcasting internally. * chore: restore ggml_cont needed for cpu operations * refactor: locate reka chat template handler in chat.cpp * chore: remove unnecessary warmup tokens * chore: add code comments on image_resize_pad * chore: remove custom reka parsing code * chore: revert common/chat.cpp * Uncomment debug logging for PEG input parsing --------- Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
* feat: (vocab) fix stray text appended in llama_decode_text Remove accidental concatenation of the full `text` string when formatting UNK_BYTE hex escapes. Only the closing "]" should be appended. * feat(mtmd): add Yasa2 vision encoder support Add a Yasa2 (ConvNeXtV2-based) vision encoder for reka-edge: - Register PROJECTOR_TYPE_YASA2 and tensor name definitions - Add yasa2_block/yasa2_stage model structs - Implement graph builder with ConvNeXt stages, GRN, adaptive pooling - Wire into clip.cpp switch statements and mtmd.cpp init_vision - Use mtmd_image_preprocessor_fixed_size for image preprocessing * feat(chat): add reka-edge template handler (tools, thinking) - Add chat-reka.cpp/h implementing PEG-based parser for reka-edge format - Add Reka-Edge.jinja chat template - Detect reka-edge template in try_specialized_template() - Add LLAMA_EXAMPLE_MTMD to chat-template-file arg * feat: add reka vlm to gguf conversion script Converts Reka Yasa2 hf checkpoints to GGUF format: - Text decoder: Llama-arch with tiktoken/BPE vocab - Mmproj (--mmproj): ConvNeXt vision backbone + language_projection - Generates 2D sincos positional embeddings for vision encoder * test: add Reka Edge chat template and parser tests - test-chat-template: oracle tests comparing Jinja engine output vs common_chat_templates_apply for text, tools, thinking, images, video - test-chat: PEG parser tests for Reka Edge format, round-trip tests for image/video content parts, common path integration tests * scripts: add Reka Edge mixed quantization helper Q4_0 base quantization with Q8_0 override for the last 8 transformer blocks (layers 24-31) via --tensor-type regex. * fix: adapt chat-reka and tests to upstream API - Use autoparser::generation_params (not templates_params) - Add p.prefix(generation_prompt) to PEG parser - Simplify reasoning parser to match LFM2 pattern - Remove image/video oracle tests (unsupported by oaicompat parser; no other multimodal models test this path) * fix: avoid duplicate tensor loading in yasa2 vision encoder TN_YASA_PATCH_W and TN_PATCH_EMBD both resolve to "v.patch_embd.weight", causing the same tensor to be loaded twice into ctx_data and overflowing the memory pool. Reuse the tensors already loaded by the common section. * chore: update image pre-processing settings The reka-edge model depends on the following settings in an older fork of llama.cpp: 1. Fixed square resize 2. BICUBIC 3. add_padding=false In current llama.cpp, this means setting: - image_resize_algo = RESIZE_ALGO_BICUBIC - image_resize_pad = false * chore: remove reka gguf conversion script * chore: remove reka quantization script * chore: remove unnecessary changes from PR scope This commit removes a couple of unnecessary changes for the PR scope: 1. BPE decoder bug fix - this affects reka edge because there's a bug in our tokenization that doesn't represent <think> tokens as special tokens. However this isn't meant to be a thinking model so when run with --reasoning off the edge case does not affect us 2. --chat-template-file support from llama-mtmd-cli - the focus is on llama-server and the reka edge gguf contains the necessary metadata to detect the chat template 3. reka edge oracle test cases - no other model has similar test cases, so I removed it for standardization * chore: remove unnecessary ggml_cast This commit removes unnecessary ggml_cast after updating the reka vlm -> gguf conversion script on hugging face. * chore: remove redundant code * chore: remove unnecessary ggml_cont calls This commit removes all ggml_cont calls except the four that precede ggml_reshape_3d/ggml_reshape_4d. Those are necessary because ggml_reshape recomputes strides assuming contiguous layout and asserts ggml_is_contiguous. Other operations (ggml_mean, ggml_add, ggml_mul etc.) use stride-based indexing and handle non-contiguous inputs correctly and so we are ok to remove ggml_cont for those. * chore: remove unnecessary ggml_repeat calls This commit removes unnecessary ggml_repeat calls because the underlying ops already broadcast automatically. Every ggml_repeat in yasa2.cpp was expanding a smaller tensor to match a larger one's shape before passing both to an elementwise op (ggml_add, ggml_sub, ggml_mul, or ggml_div). This is unnecessary because all four of these ops already support broadcasting internally. * chore: restore ggml_cont needed for cpu operations * refactor: locate reka chat template handler in chat.cpp * chore: remove unnecessary warmup tokens * chore: add code comments on image_resize_pad * chore: remove custom reka parsing code * chore: revert common/chat.cpp * Uncomment debug logging for PEG input parsing --------- Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>

Overview
This PR adds support for Reka Edge 2603.
Model overview:
Note that while the template contains reasoning chunks, the Reka Edge 2603 model does not support reasoning.
The code changes in this PR are purely additive.
Useful links:
Other changes
I made a few additions to the Jinja engine to get Reka Edge working with llama.cpp: string repetition, ensure_ascii in tojson, and int() as a no-op on integers. This is needed for Reka Edge because of how our chat template is constructed.
I'm happy to split this PR up so that they can be reviewed separately. I just wanted to include them up front first because if I made a PR to add Jinja engine features that wouldn't make sense without the context of Reka Edge support.
Tests
Beyond basic image query vibe checks, I ran VQA v2, RefCoco A & B, mobile-actions benchmarks against Reka Edge running on the llama-server.
Contributing Guidelines
I have read and agree with the contributing guidelines. I've run the CPU-only build of the full CI on my local machine.
AI usage disclosure: Claude Code and Codex were used to explore the codebase and generate boilerplate and drafts for the functions and tests introduced. These changes were then substantively edited, fleshed out, and cleaned up by me, and tested extensively against our chosen benchmarks to ensure correctness.