
metal : avoid divisions in bin kernel#20426

Merged
ggerganov merged 2 commits into master from gg/metal-bin-mod
Mar 12, 2026
Conversation

@ggerganov
Member

Before:

[screenshots]

After:

[screenshots]

@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Apple Metal labels Mar 11, 2026
@ggerganov ggerganov merged commit e4cff09 into master Mar 12, 2026
67 of 75 checks passed
tekintian added a commit to tekintian/llama.cpp that referenced this pull request Mar 12, 2026
* 'master' of github.com:ggml-org/llama.cpp: (33 commits)
  convert : better mtp check and fix return [no ci] (ggml-org#20419)
  vulkan: fix SSM_CONV PP scaling with large ubatch sizes (ggml-org#20379)
  New conversations now auto-select the first loaded model (ggml-org#20403)
  ggml-virtgpu: Fix some build commands (ggml-org#20341)
  metal : avoid divisions in bin kernel (ggml-org#20426)
  ci: Setup self-hosted CI for Intel Linux Vulkan backend (ggml-org#20154)
  vulkan: fix l2_norm epsilon handling (ggml-org#20350)
  vulkan: fix OOB check in flash_attn_mask_opt (ggml-org#20296)
  vulkan: Fix ErrorOutOfHostMemory on Intel GPU when loading large models with --no-mmap (ggml-org#20059)
  opencl: use larger workgroup size for get_rows (ggml-org#20316)
  opencl: add cumsum op (ggml-org#18981)
  hip: compile debug builds with -O2 on hip to avoid a compiler bug (ggml-org#20392)
  common/parser: add GigaChatV3/3.1 models support (ggml-org#19931)
  model : add support for Phi4ForCausalLMV (ggml-org#20168)
  graph : add optional scale parameter to build_lora_mm [no ci] (ggml-org#20427)
  common : fix --n-cpu-moe, --cpu-moe for models with fused gate + up (ggml-org#20416)
  ggml-webgpu: Add supports for `GGML_OP_REPEAT` (ggml-org#20230)
  llama : enable chunked fused GDN path (ggml-org#20340)
  llama : whitespace cleanup (ggml-org#20422)
  ggml : add NVFP4 quantization type support (ggml-org#19769)
  ...
am17an pushed a commit to am17an/llama.cpp that referenced this pull request Mar 12, 2026
* metal : avoid modulus in bin kernel when not broadcasting

* metal : fix capture_started flag
