ggml-cpu : fix rms_norm_back wrong output under in-place aliasing #24305

Merged: ggerganov merged 2 commits into ggml-org:master from devYRPauli:fix-softmax-back-aliasing on Jun 9, 2026

Conversation

@devYRPauli devYRPauli commented Jun 8, 2026

Contributor

Overview

ggml_compute_forward_rms_norm_back_f32 could produce wrong results when the destination aliases an input. GGML_OP_RMS_NORM_BACK is listed in ggml_op_can_inplace, so the scheduler may reuse the buffer of src0 (dz) or src1 (x) for dx. The old multi-step cpy/scale/acc/scale sequence overwrote the shared buffer in the dx := x step and then re-read it in the += dz step. This PR replaces that sequence with a single fused loop that reads each element's inputs before writing the result, so it is safe under either aliasing.
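The hazard described above can be reproduced outside ggml with a minimal single-row sketch. The names below (RowStats, row_stats, rms_norm_back_row_multipass, rms_norm_back_row_fused) are hypothetical and written for illustration; they are not the functions in ops.cpp, and the arithmetic is assumed to follow the usual RMS norm backward formula dx = (dz - x * sum_xdz / sum_eps) * rrms. The multi-pass variant mimics the cpy/scale/acc/scale order, while the fused variant loads x[i] and dz[i] before storing dx[i].

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-row statistics. Both sums are taken before anything is written to dx.
struct RowStats {
    float scale_x; // -sum_xdz / sum_eps
    float rrms;    // 1 / sqrt(mean(x^2) + eps)
};

static RowStats row_stats(const float * x, const float * dz, std::size_t n, float eps) {
    assert(n > 0);
    double sum_xx  = 0.0;
    double sum_xdz = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        sum_xx  += (double) x[i] * x[i];
        sum_xdz += (double) x[i] * dz[i];
    }
    const float nf       = static_cast<float>(n);
    const float mean_eps = (float) sum_xx / nf + eps;
    const float sum_eps  = (float) sum_xx + eps * nf;
    return { (float) (-sum_xdz) / sum_eps, 1.0f / std::sqrt(mean_eps) };
}

// Multi-pass order (copy, scale, accumulate, scale).
// Gives wrong results when dx and dz point at the same buffer.
static void rms_norm_back_row_multipass(float * dx, const float * x, const float * dz,
                                        std::size_t n, float eps) {
    const RowStats s = row_stats(x, dz, n, eps);
    for (std::size_t i = 0; i < n; ++i) dx[i]  = x[i];      // dx := x, clobbers dz if dx == dz
    for (std::size_t i = 0; i < n; ++i) dx[i] *= s.scale_x; // dx := scale(x)
    for (std::size_t i = 0; i < n; ++i) dx[i] += dz[i];     // re-reads dz after it was overwritten
    for (std::size_t i = 0; i < n; ++i) dx[i] *= s.rrms;    // dx := scale(dx, rrms)
}

// Fused order: every input element is loaded before the output element is stored,
// so dx may alias either x or dz.
static void rms_norm_back_row_fused(float * dx, const float * x, const float * dz,
                                    std::size_t n, float eps) {
    const RowStats s = row_stats(x, dz, n, eps);
    for (std::size_t i = 0; i < n; ++i) {
        const float xi  = x[i];
        const float dzi = dz[i];
        dx[i] = (dzi + xi * s.scale_x) * s.rrms;
    }
}
```

Calling the fused variant with dx pointing at a copy of dz, or at a copy of x, should give the same row as calling it with a separate output buffer, whereas the multi-pass variant diverges once dx shares dz's buffer.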

Additional information

Requested by @ggerganov in ggml-org/ggml#1519, where I originally reported and fixed this (#1491). Submitting the single ops.cpp change here as asked; no regression test per that thread. Built ggml-cpu locally on macOS to confirm it compiles.

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES — assistive port of my own prior fix.

@devYRPauli devYRPauli requested a review from ggerganov as a code owner June 8, 2026 13:37
@github-actions github-actions Bot added the ggml label (changes relating to the ggml tensor library for machine learning) Jun 8, 2026
@devYRPauli devYRPauli force-pushed the fix-softmax-back-aliasing branch from 08856ac to 3bddfef Compare June 8, 2026 13:42

ggml-gh-bot Bot commented Jun 8, 2026

Hi @devYRPauli, thanks for your contribution!

Per our contribution guidelines, the automated PR checker found the following issue(s) that need your attention:

  • Multiple open PRs from a new contributor: We limit new contributors (those without a previously merged PR) to 1 open PR at a time. You currently have 2 open PRs.

  • AI-generated content: This project does not accept PRs, descriptions or commit messages that are fully or predominantly AI-generated. If you have used AI to assist you in writing code, please make sure to disclose that explicitly.


Please note that maintainers reserve the right to make final decisions on PRs. If you believe there is a mistake, please comment below.

@devYRPauli devYRPauli changed the title from "ggml-cpu : fix soft_max_back wrong output when dst aliases src1 (y)" to "ggml-cpu : fix rms_norm_back wrong output under in-place aliasing" Jun 8, 2026
@ggerganov ggerganov added the merge ready label (a maintainer can use this label to indicate that they consider the changes final and ready to merge) Jun 8, 2026
Comment thread ggml/src/ggml-cpu/ops.cpp Outdated
@ggerganov ggerganov merged commit fd3271e into ggml-org:master Jun 9, 2026
27 checks passed
anaisbetts pushed a commit to anaisbetts/llama.cpp that referenced this pull request Jun 16, 2026
…ml-org#24305)

* ggml-cpu : fix rms_norm_back wrong output under in-place aliasing

* cont : clean-up comment

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
(cherry picked from commit fd3271e)