Add support for Reka Edge 2603 by kwajiehao · Pull Request #21616 · ggml-org/llama.cpp

kwajiehao · 2026-04-08T09:54:56Z

Overview

This PR adds support for Reka Edge 2603.

Model overview:

- LLM: Llama-style, 32 layers, GQA with 32/8 heads, RoPE, SiLU
- Vision encoder: ConvNeXt V2-based (4 stages, depths [3, 3, 27, 3], with GRN)
- Projector: 2-layer MLP with GELU (2816 → 4096 → 4096), adaptive 8×8 pooling → 64 visual tokens per tile
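Here is the "64 visual tokens per tile" arithmetic made concrete. Adaptive pooling to an 8×8 grid always leaves 8 × 8 = 64 spatial positions, whatever the size of the encoder's final feature map. The projector (2816 → 4096 → 4096) then maps each of those positions to a token. This is a minimal, hypothetical sketch on a single channel plane. It assumes *average* pooling with PyTorch-style bin boundaries, because the PR only says "adaptive 8×8 pooling". It is not the yasa2.cpp graph code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: adaptive average pooling of one channel plane (h x w,
// row-major) down to out_h x out_w, using PyTorch-style bin boundaries:
//   start = floor(i * in / out), end = ceil((i + 1) * in / out)
std::vector<float> adaptive_avg_pool2d(const std::vector<float>& in,
                                       std::size_t h, std::size_t w,
                                       std::size_t out_h, std::size_t out_w) {
    assert(in.size() == h * w && out_h > 0 && out_w > 0);
    std::vector<float> out(out_h * out_w, 0.0f);
    for (std::size_t oy = 0; oy < out_h; ++oy) {
        const std::size_t y0 = (oy * h) / out_h;
        const std::size_t y1 = ((oy + 1) * h + out_h - 1) / out_h;  // ceil division
        for (std::size_t ox = 0; ox < out_w; ++ox) {
            const std::size_t x0 = (ox * w) / out_w;
            const std::size_t x1 = ((ox + 1) * w + out_w - 1) / out_w;
            float sum = 0.0f;
            for (std::size_t y = y0; y < y1; ++y)
                for (std::size_t x = x0; x < x1; ++x)
                    sum += in[y * w + x];
            // Each bin is non-empty, so the divisor is never zero.
            out[oy * out_w + ox] = sum / static_cast<float>((y1 - y0) * (x1 - x0));
        }
    }
    return out;
}
```

For example, a 16×16 map is pooled in non-overlapping 2×2 bins. When the input size is not a multiple of 8, neighbouring bins may overlap by a row or column. The output is still 64 values per channel.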

Note that although the chat template contains reasoning blocks, Reka Edge 2603 does not support reasoning.

The code changes in this PR are purely additive.

Useful links:

Other changes

I made a few additions to the Jinja engine to get Reka Edge working with llama.cpp: string repetition, ensure_ascii in tojson, and int() as a no-op on integers. These are needed because of how our chat template is constructed.
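Of the three additions, ensure_ascii is the least self-explanatory. It mirrors the `ensure_ascii` parameter of Python's `json.dumps`. Every non-ASCII code point is written as a `\uXXXX` escape, and code points above U+FFFF become a UTF-16 surrogate pair. The sketch below is a hypothetical illustration of just that escaping step. It assumes valid UTF-8 input and leaves quote, backslash and control-character escaping to the surrounding serializer. It is not the code added to common/jinja.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Append one UTF-16 code unit as a lowercase \uXXXX escape (as Python does).
static void append_u16_escape(std::string& out, std::uint32_t unit) {
    char buf[7];  // "\\u" + 4 hex digits + NUL
    std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(unit));
    out += buf;
}

// Hypothetical helper: escape every non-ASCII code point of a valid UTF-8 string.
std::string escape_non_ascii(const std::string& s) {
    std::string out;
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) {  // plain ASCII byte: copy through unchanged
            out += static_cast<char>(c);
            ++i;
            continue;
        }
        // Decode one multi-byte UTF-8 sequence (lead byte gives the length).
        const std::size_t len = (c >= 0xF0) ? 4 : (c >= 0xE0) ? 3 : 2;
        assert(i + len <= s.size());
        std::uint32_t cp = c & (0xFFu >> (len + 1));
        for (std::size_t k = 1; k < len; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3Fu);
        i += len;
        if (cp < 0x10000) {
            append_u16_escape(out, cp);
        } else {  // outside the BMP: emit a surrogate pair
            cp -= 0x10000;
            append_u16_escape(out, 0xD800 + (cp >> 10));
            append_u16_escape(out, 0xDC00 + (cp & 0x3FF));
        }
    }
    return out;
}
```

With this helper, `café` becomes `caf\u00e9` and U+1F600 becomes `\ud83d\ude00`, matching what `json.dumps` produces with `ensure_ascii=True`.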

I'm happy to split this PR up so that these changes can be reviewed separately. I included them up front because a standalone PR adding Jinja engine features wouldn't make sense without the context of Reka Edge support.

Tests

Beyond basic image-query vibe checks, I ran the VQA v2, RefCOCO A & B, and mobile-actions benchmarks against Reka Edge running on llama-server.

Contributing Guidelines

I have read and agree with the contributing guidelines. I've run the CPU-only build of the full CI on my local machine.

AI usage disclosure: Claude Code and Codex were used to explore the codebase and generate boilerplate and drafts for the functions and tests introduced. These changes were then substantively edited, fleshed out, and cleaned up by me, and tested extensively against our chosen benchmarks to ensure correctness.

ngxson

The mtmd part looks OK, but the chat template part requires additional review from @pwilkin @aldehir.

pwilkin · 2026-04-10T16:33:52Z

I'm puzzled - the autoparser seems to detect the template just fine, why would we need a specialized one for this model?

```
--- Reasoning & Content Structure ---
reasoning_mode: TAG_BASED
reasoning_start: '<think>
'
reasoning_end: '</think>'
content_mode: PLAIN
content_start: ''
content_end: ''

--- Tool Call Structure ---
tool_mode: JSON_NATIVE
supports_tools: true
supports_parallel_calls: true
tool_section_start: '<tool_call>
'
tool_section_end: '</tool_call>'
per_call_start: ''
per_call_end: ''
func_name_prefix: ''
func_name_suffix: ''
func_close: ''
call_id_prefix: ''
call_id_suffix: ''
call_id_pos: 'NONE'
args_start: ''
args_end: ''
arg_name_prefix: ''
arg_name_suffix: ''
arg_value_prefix: ''
arg_value_suffix: ''
name_field: 'name'
args_field: 'arguments'
id_field: ''
gen_id_field: ''
parameter_order: 'name, arguments'
```

kwajiehao · 2026-04-10T18:13:25Z

@pwilkin when I tested with the autoparser, I found that it failed for parallel tool calls. The cause is that the two sides define parallel tool calls differently. The autoparser treats parallel tool calls as multiple comma-separated JSON objects inside a single section, whereas the Reka chat template gives each tool call its own <tool_call>...</tool_call> wrapper. Specifically, standard_json_tools (chat-peg-parser.cpp:855-866) puts multiple comma-separated JSON objects inside one <tool_call> block, but the Reka template (Reka-Edge.jinja:119-136) loops and emits a separate <tool_call>...</tool_call> for each call.
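This minimal sketch shows the two layouts side by side. The function names are hypothetical, and each call is assumed to be an already-serialized JSON object such as `{"name": "get_weather", "arguments": {...}}`. The exact whitespace is also an assumption: the autoparser output above only shows that the section starts with `<tool_call>` followed by a newline. Reka-Edge.jinja is the authority on what the template really emits.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Layout assumed by the autoparser's JSON-native path: a single <tool_call>
// section holding every call, comma-separated (exact whitespace is assumed).
std::string render_single_section(const std::vector<std::string>& calls) {
    std::string out = "<tool_call>\n";
    for (std::size_t i = 0; i < calls.size(); ++i) {
        assert(!calls[i].empty());
        if (i > 0) out += ", ";
        out += calls[i];
    }
    out += "</tool_call>";
    return out;
}

// Layout emitted by the Reka Edge template: a separate wrapper for each call.
std::string render_per_call(const std::vector<std::string>& calls) {
    std::string out;
    for (const std::string& call : calls) {
        assert(!call.empty());
        out += "<tool_call>\n" + call + "</tool_call>";
    }
    return out;
}
```

With two calls, the first function yields one `<tool_call>` block and the second yields two. That is the shape mismatch described above.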

pwilkin · 2026-04-10T18:38:48Z

@kwajiehao ah, okay. In that case, don't make a separate parser; instead, add a workaround at the top of chat-diff-analyzer.cpp with:

```cpp
analysis.tools.format.section_start = "";
analysis.tools.format.section_end = "";
analysis.tools.format.per_call_start = "<tool_call>";
analysis.tools.format.per_call_end = "</tool_call>";
```

That should be enough.

kwajiehao · 2026-04-13T05:16:14Z

@pwilkin thanks for the suggestion! I tried this and found that the autoparser has a few gaps for Reka's chat template format:

- The Reka chat template uses JSON for tool calls, so it goes through build_tool_parser_json_native. However, build_tool_parser_json_native ignores per_call_start/end and calls standard_json_tools with only section_start/end, i.e. JSON-native tool calls combined with per-call wrapping aren't supported.
- In lazy mode (common/peg-parser.cpp:1796-1810), the grammar is built from the single tool-call trigger rule even if the PEG parser understands the repetition, so it only permits one call even when parallel calls are requested.
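For reference, the per-call format calls for a loop over repeated wrappers rather than a single section. The sketch below is a hypothetical, non-streaming illustration in plain C++, not the PEG parser or grammar API in common/. It collects the payload of every `<tool_call>...</tool_call>` block, and JSON validation is out of scope.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: return the raw payload of every <tool_call>...</tool_call>
// block in `text`, with surrounding whitespace trimmed. An unterminated final
// block is ignored here; a streaming parser would instead wait for more input.
std::vector<std::string> extract_tool_calls(const std::string& text) {
    const std::string open = "<tool_call>";
    const std::string close = "</tool_call>";
    std::vector<std::string> calls;
    std::size_t pos = 0;
    while (true) {
        std::size_t start = text.find(open, pos);
        if (start == std::string::npos) break;
        start += open.size();
        const std::size_t end = text.find(close, start);
        if (end == std::string::npos) break;
        const std::string body = text.substr(start, end - start);
        const std::size_t b = body.find_first_not_of(" \t\r\n");
        const std::size_t e = body.find_last_not_of(" \t\r\n");
        calls.push_back(b == std::string::npos ? std::string() : body.substr(b, e - b + 1));
        pos = end + close.size();  // continue after this block: repetition allowed
    }
    return calls;
}
```

The key design point is that the loop keeps matching after the first closing tag. A grammar derived from a single trigger rule does not do this, which is the lazy-mode limitation described above.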

Let me know if I'm missing anything in my analysis. I'm happy to either keep the custom parser for Reka or extend the behavior of the autoparser if needed.

kwajiehao · 2026-04-16T02:46:25Z

Thanks @pwilkin! I've rebased on master, removed the custom parser code in 04743ad, and verified that everything looks good.

kwajiehao · 2026-04-20T16:19:41Z

@ngxson @pwilkin thanks for the detailed comments and review. Do you have any more feedback for this PR, or are we good to merge this in?

pwilkin · 2026-04-20T16:48:20Z

```diff
     GGML_ASSERT(chat_templates->template_default != nullptr);
     return chat_templates->template_default->caps.to_map();
 }
-
```


Revert the chat.cpp edits; they remove the trailing newline.

Fixed by checking out common/chat.cpp from upstream/master.

ngxson

Can merge after @pwilkin's approval.

ngxson · 2026-04-21T13:20:20Z

I'm re-running one of the server CI jobs; will merge when it passes.

ngxson · 2026-04-21T18:01:48Z

Hmm, it seems the server CI didn't pass because it's missing a fix from upstream, but that's unrelated to the current PR. Should be OK to merge.

sebagallo · 2026-04-21T18:29:19Z

I made my own quants, but it looks like something is broken in the parsing:

The mmproj works, but I noticed that all responses were wrapped inside <answer> or <result> tags, and tool calls don't seem to work.

For quantizing, I used the script at the top to produce BF16 quants, then ran llama-quantize following the suggestion to keep the last 8 blocks at Q8_0. The resulting Q8_0 produces these results.
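The LLM has 32 layers (blocks 0 to 31), so "the last 8 blocks" means blocks 24 to 31. The hypothetical helper below assumes the usual GGUF tensor naming (`blk.<N>.<name>.weight`) and shows a regex that selects exactly those tensors. It is an illustration with `std::regex`, not the exact pattern syntax that llama-quantize's `--tensor-type` option expects.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: true if a GGUF tensor name belongs to blocks 24..31,
// i.e. the tensors a mixed Q4_0 / Q8_0 recipe would keep at Q8_0.
bool in_last_8_blocks(const std::string& tensor_name) {
    // "2[4-9]" covers 24..29, "3[01]" covers 30..31; the trailing "\." stops
    // "blk.3." from matching.
    static const std::regex last8(R"(^blk\.(2[4-9]|3[01])\.)");
    return std::regex_search(tensor_name, last8);
}
```

Non-block tensors such as `token_embd.weight` never match, so they keep whatever base type the recipe assigns.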

kwajiehao · 2026-04-22T03:51:32Z

Hi @sebagallo, first of all, thank you for your interest in Reka Edge 2603. Unfortunately, this model does not support reasoning, so you will need to run llama-server with --reasoning off.

Beyond that, I found an error in the GGUF conversion script (due to a reliance on an older version of convert_hf_to_gguf) and have replaced it. With --reasoning off, my tool call tests pass and I can't reproduce any tool call errors.

Can you give it a shot with --reasoning off?

sebagallo · 2026-04-22T04:41:48Z

Reasoning off seems to fix the issues! My impression was that thinking was supported, since I saw enable_thinking and the <think> tag in the chat template.

I tried tool calling and the mmproj again, and everything works!
Regarding the conversion, at first I had problems with the tokenizer, but temporarily downgrading to transformers 4.57.3 allowed me to produce the GGUFs. I didn't try your latest fix, but the files I produced with that transformers version work.

Thanks for this new model!

* feat: (vocab) fix stray text appended in llama_decode_text
  Remove accidental concatenation of the full `text` string when formatting UNK_BYTE hex escapes. Only the closing "]" should be appended.
* feat(mtmd): add Yasa2 vision encoder support
  Add a Yasa2 (ConvNeXtV2-based) vision encoder for reka-edge:
  - Register PROJECTOR_TYPE_YASA2 and tensor name definitions
  - Add yasa2_block/yasa2_stage model structs
  - Implement graph builder with ConvNeXt stages, GRN, adaptive pooling
  - Wire into clip.cpp switch statements and mtmd.cpp init_vision
  - Use mtmd_image_preprocessor_fixed_size for image preprocessing
* feat(chat): add reka-edge template handler (tools, thinking)
  - Add chat-reka.cpp/h implementing PEG-based parser for reka-edge format
  - Add Reka-Edge.jinja chat template
  - Detect reka-edge template in try_specialized_template()
  - Add LLAMA_EXAMPLE_MTMD to chat-template-file arg
* feat: add reka vlm to gguf conversion script
  Converts Reka Yasa2 hf checkpoints to GGUF format:
  - Text decoder: Llama-arch with tiktoken/BPE vocab
  - Mmproj (--mmproj): ConvNeXt vision backbone + language_projection
  - Generates 2D sincos positional embeddings for vision encoder
* test: add Reka Edge chat template and parser tests
  - test-chat-template: oracle tests comparing Jinja engine output vs common_chat_templates_apply for text, tools, thinking, images, video
  - test-chat: PEG parser tests for Reka Edge format, round-trip tests for image/video content parts, common path integration tests
* scripts: add Reka Edge mixed quantization helper
  Q4_0 base quantization with Q8_0 override for the last 8 transformer blocks (layers 24-31) via --tensor-type regex.
* fix: adapt chat-reka and tests to upstream API
  - Use autoparser::generation_params (not templates_params)
  - Add p.prefix(generation_prompt) to PEG parser
  - Simplify reasoning parser to match LFM2 pattern
  - Remove image/video oracle tests (unsupported by oaicompat parser; no other multimodal models test this path)
* fix: avoid duplicate tensor loading in yasa2 vision encoder
  TN_YASA_PATCH_W and TN_PATCH_EMBD both resolve to "v.patch_embd.weight", causing the same tensor to be loaded twice into ctx_data and overflowing the memory pool. Reuse the tensors already loaded by the common section.
* chore: update image pre-processing settings
  The reka-edge model depends on the following settings in an older fork of llama.cpp:
  1. Fixed square resize
  2. BICUBIC
  3. add_padding=false
  In current llama.cpp, this means setting:
  - image_resize_algo = RESIZE_ALGO_BICUBIC
  - image_resize_pad = false
* chore: remove reka gguf conversion script
* chore: remove reka quantization script
* chore: remove unnecessary changes from PR scope
  This commit removes a couple of unnecessary changes for the PR scope:
  1. BPE decoder bug fix - this affects reka edge because there's a bug in our tokenization that doesn't represent <think> tokens as special tokens. However, this isn't meant to be a thinking model, so when run with --reasoning off the edge case does not affect us
  2. --chat-template-file support from llama-mtmd-cli - the focus is on llama-server, and the reka edge gguf contains the necessary metadata to detect the chat template
  3. reka edge oracle test cases - no other model has similar test cases, so I removed them for standardization
* chore: remove unnecessary ggml_cast
  This commit removes unnecessary ggml_cast after updating the reka vlm -> gguf conversion script on hugging face.
* chore: remove redundant code
* chore: remove unnecessary ggml_cont calls
  This commit removes all ggml_cont calls except the four that precede ggml_reshape_3d/ggml_reshape_4d. Those are necessary because ggml_reshape recomputes strides assuming contiguous layout and asserts ggml_is_contiguous. Other operations (ggml_mean, ggml_add, ggml_mul etc.) use stride-based indexing and handle non-contiguous inputs correctly, so we are OK to remove ggml_cont for those.
* chore: remove unnecessary ggml_repeat calls
  This commit removes unnecessary ggml_repeat calls because the underlying ops already broadcast automatically. Every ggml_repeat in yasa2.cpp was expanding a smaller tensor to match a larger one's shape before passing both to an elementwise op (ggml_add, ggml_sub, ggml_mul, or ggml_div). This is unnecessary because all four of these ops already support broadcasting internally.
* chore: restore ggml_cont needed for cpu operations
* refactor: locate reka chat template handler in chat.cpp
* chore: remove unnecessary warmup tokens
* chore: add code comments on image_resize_pad
* chore: remove custom reka parsing code
* chore: revert common/chat.cpp
* Uncomment debug logging for PEG input parsing

---------

Co-authored-by: Piotr Wilkin (ilintar) <piotr.wilkin@syndatis.com>
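The ggml_repeat removal relies on broadcasting. GRN in the ConvNeXt V2 blocks is a natural case: the per-channel response scale and the gamma/beta parameters hold one value per channel, yet they multiply or add into every spatial position. The sketch below illustrates GRN in plain C++ as formulated for ConvNeXt V2, with eps = 1e-6 as in the reference implementation. It assumes a channel-major layout. The function name is hypothetical, and this is not the ggml graph in yasa2.cpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Global Response Normalization on a feature map stored channel-major as
// x[c * hw + p] (`channels` channels, `hw` spatial positions):
//   Gx[c] = L2 norm of channel c over all spatial positions
//   Nx[c] = Gx[c] / (mean over channels of Gx + eps)
//   y     = gamma[c] * (x * Nx[c]) + beta[c] + x
// Nx, gamma and beta are broadcast over the spatial axis by indexing; they
// are never expanded to full size (the analogue of skipping ggml_repeat).
std::vector<float> grn(const std::vector<float>& x, std::size_t channels, std::size_t hw,
                       const std::vector<float>& gamma, const std::vector<float>& beta,
                       float eps = 1e-6f) {
    assert(x.size() == channels * hw);
    assert(gamma.size() == channels && beta.size() == channels);

    std::vector<float> gx(channels, 0.0f);
    for (std::size_t c = 0; c < channels; ++c) {
        float sum_sq = 0.0f;
        for (std::size_t p = 0; p < hw; ++p) sum_sq += x[c * hw + p] * x[c * hw + p];
        gx[c] = std::sqrt(sum_sq);
    }
    float mean = 0.0f;
    for (float g : gx) mean += g;
    mean /= static_cast<float>(channels);

    std::vector<float> y(x.size());
    for (std::size_t c = 0; c < channels; ++c) {
        const float nx = gx[c] / (mean + eps);  // one scale per channel
        for (std::size_t p = 0; p < hw; ++p) {
            const float v = x[c * hw + p];
            y[c * hw + p] = gamma[c] * (v * nx) + beta[c] + v;
        }
    }
    return y;
}
```

With gamma and beta both zero, the block reduces to the identity (y == x), which is how ConvNeXt V2 initializes GRN.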

kwajiehao changed the title ~~Feat/reka edge~~ Add support for Reka Edge 2603 Apr 8, 2026

kwajiehao marked this pull request as ready for review April 8, 2026 11:04

kwajiehao requested review from a team, CISC, ggerganov and pwilkin as code owners April 8, 2026 11:04

kwajiehao force-pushed the feat/reka-edge branch 4 times, most recently from 5d1aa8f to 77018b5 April 8, 2026 11:41

ngxson reviewed Apr 8, 2026

Comment thread common/jinja/runtime.cpp

kwajiehao closed this Apr 8, 2026

kwajiehao reopened this Apr 8, 2026

kwajiehao force-pushed the feat/reka-edge branch from 77018b5 to ae7730a April 8, 2026 12:48

kwajiehao mentioned this pull request Apr 8, 2026 in feat: jinja engine improvements for reka-edge #21623 (Merged)

kwajiehao force-pushed the feat/reka-edge branch 5 times, most recently from 5e48fb3 to 30395b5 April 9, 2026 09:31

github-actions bot added the testing (Everything test related) and examples labels Apr 9, 2026

ngxson approved these changes Apr 10, 2026

Comment thread common/chat-reka.cpp Outdated

chore: remove custom reka parsing code

04743ad

pwilkin approved these changes Apr 20, 2026

kwajiehao force-pushed the feat/reka-edge branch 2 times, most recently from b332da4 to 04743ad April 20, 2026 16:59

ngxson approved these changes Apr 20, 2026

kwajiehao force-pushed the feat/reka-edge branch from 9e93867 to 04743ad April 20, 2026 17:05

chore: revert common/chat.cpp

0d13919

kwajiehao requested a review from pwilkin April 21, 2026 04:35

pwilkin approved these changes Apr 21, 2026

Uncomment debug logging for PEG input parsing

4b8e10e

pwilkin approved these changes Apr 21, 2026

ngxson approved these changes Apr 21, 2026

ngxson merged commit 98d2d28 into ggml-org:master Apr 21, 2026
47 of 54 checks passed

4 participants