
common : fix --n-cpu-moe, --cpu-moe for models with fused gate + up #20416

Merged
CISC merged 1 commit into ggml-org:master from ddh0:fix-fused-cpu-moe on Mar 11, 2026

common : fix --n-cpu-moe, --cpu-moe for models with fused gate + up#20416
CISC merged 1 commit intoggml-org:masterfrom
ddh0:fix-fused-cpu-moe

Conversation

@ddh0 (Contributor) commented on Mar 11, 2026:

Changed the regex that matches conditional experts from:

const char * const LLM_FFN_EXPS_REGEX = "\\.ffn_(up|down|gate)_(ch|)exps";

to:

const char * const LLM_FFN_EXPS_REGEX = "\\.ffn_(up|down|gate|gate_up)_(ch|)exps";

I think this should fix #20414.

@ddh0 ddh0 requested a review from ggerganov as a code owner March 11, 2026 17:38
@Nekotekina (Contributor):

Fixes #20431

@CISC CISC merged commit 4a748b8 into ggml-org:master Mar 11, 2026
65 of 75 checks passed
@ddh0 ddh0 deleted the fix-fused-cpu-moe branch March 11, 2026 23:31
ProgenyAlpha pushed a commit to ProgenyAlpha/llama.cpp that referenced this pull request Mar 12, 2026
tekintian added a commit to tekintian/llama.cpp that referenced this pull request Mar 12, 2026
* 'master' of github.com:ggml-org/llama.cpp: (33 commits)
  convert : better mtp check and fix return [no ci] (ggml-org#20419)
  vulkan: fix SSM_CONV PP scaling with large ubatch sizes (ggml-org#20379)
  New conversations now auto-select the first loaded model (ggml-org#20403)
  ggml-virtgpu: Fix some build commands (ggml-org#20341)
  metal : avoid divisions in bin kernel (ggml-org#20426)
  ci: Setup self-hosted CI for Intel Linux Vulkan backend (ggml-org#20154)
  vulkan: fix l2_norm epsilon handling (ggml-org#20350)
  vulkan: fix OOB check in flash_attn_mask_opt (ggml-org#20296)
  vulkan: Fix ErrorOutOfHostMemory on Intel GPU when loading large models with --no-mmap (ggml-org#20059)
  opencl: use larger workgroup size for get_rows (ggml-org#20316)
  opencl: add cumsum op (ggml-org#18981)
  hip: compile debug builds with -O2 on hip to avoid a compiler bug (ggml-org#20392)
  common/parser: add GigaChatV3/3.1 models support (ggml-org#19931)
  model : add support for Phi4ForCausalLMV (ggml-org#20168)
  graph : add optional scale parameter to build_lora_mm [no ci] (ggml-org#20427)
  common : fix --n-cpu-moe, --cpu-moe for models with fused gate + up (ggml-org#20416)
  ggml-webgpu: Add supports for `GGML_OP_REPEAT` (ggml-org#20230)
  llama : enable chunked fused GDN path (ggml-org#20340)
  llama : whitespace cleanup (ggml-org#20422)
  ggml : add NVFP4 quantization type support (ggml-org#19769)
  ...


Development

Successfully merging this pull request may close these issues.

Misc. bug: --n-cpu-moe offloading breaks with fused gate + up tensors

3 participants