fix(ggml): TurboQ rotate-block OOB — unbreaks Windows x64 CI#104

Merged
marksverdhei merged 1 commit into ht from fix/turboq-rotate-block-oob on Jun 12, 2026

@marksverdhei

Problem

The windows (x64-cpu-static) CI job has been red on every PR: test-quantize-fns fails with SEGFAULT and test-quantize-perf exits with 0xc0000374 (STATUS_HEAP_CORRUPTION). The Linux jobs passed without reporting anything.

Root cause

turboq_rotate_block_forward/_inverse loop to QK_K (256) but all four callers (tbq3_0/tbq4_0 quantize + dequantize) pass TBQ_BLK_SIZE (128) float buffers. Every call read and wrote 128 floats (512 bytes) past both scratch buffers. The Windows CRT heap checker trips on it; glibc doesn't, which is why Linux CI stayed green.
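The size mismatch can be checked with plain index arithmetic. The following is a minimal sketch, not the code in ggml-turboq.c: the three constants are the values quoted above, while `count_oob_floats` is a hypothetical helper that only counts the offsets a tiled loop would touch beyond a buffer, without performing any access.

```c
#include <assert.h>
#include <stddef.h>

/* Values quoted in the PR description. */
#define QK_K          256  /* old loop bound, in floats */
#define TBQ_BLK_SIZE  128  /* floats in each caller-provided buffer */
#define TURBOQ_KV_DIM 128  /* width of one rotation tile */

/* Hypothetical helper: count how many float offsets a tiled loop
 * visits beyond a buffer of buf_len floats. Nothing is dereferenced,
 * so this is safe to run with the pre-fix bound. */
static size_t count_oob_floats(size_t loop_bound, size_t tile, size_t buf_len) {
    assert(tile > 0);  /* a zero tile width would never advance */
    size_t oob = 0;
    for (size_t base = 0; base < loop_bound; base += tile) {
        for (size_t i = 0; i < tile; i++) {
            if (base + i >= buf_len) {
                oob++;  /* this offset lies past the end of the buffer */
            }
        }
    }
    return oob;
}
```

Calling `count_oob_floats(QK_K, TURBOQ_KV_DIM, TBQ_BLK_SIZE)` models the pre-fix loop, and `count_oob_floats(TBQ_BLK_SIZE, TURBOQ_KV_DIM, TBQ_BLK_SIZE)` models the corrected bound.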

ASAN confirms pre-fix:

ERROR: AddressSanitizer: heap-buffer-overflow ... READ of size 4
  #0 matvec_row ggml-turboq.c:260
  #1 turboq_rotate_block_forward ggml-turboq.c:379
  #2 quantize_row_tbq3_0_ref ggml-turboq.c:476
0x... is located 0 bytes after 512-byte region (turboq_get_scratch)

Fix

The loop bound is now TBQ_BLK_SIZE. There is zero behavior change for the valid region: the rotation tiles in TURBOQ_KV_DIM=128 chunks, so the first iteration fully covers the block, and the second iteration only produced out-of-bounds garbage. This bug may also be implicated in the broken TBQ KV-cache on CPU (ggml-org#125); heap corruption next to every dequant call is a plausible mechanism, but this is not yet validated.
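The "no change for the valid region" argument can be illustrated with a stand-in rotation. This is a sketch under stated assumptions: `rotate_tile_standin` is a hypothetical placeholder (a simple reversal, not the real TurboQ rotation), and `rotate_block_sketch` only mirrors the tiled loop shape described above. Because each tile reads and writes only its own 128 floats, a second tile cannot influence the first.

```c
#include <assert.h>
#include <stddef.h>

/* Values quoted in the PR description. */
#define TBQ_BLK_SIZE  128
#define TURBOQ_KV_DIM 128

/* The block must be a whole number of tiles for the loop below to stay in bounds. */
_Static_assert(TBQ_BLK_SIZE % TURBOQ_KV_DIM == 0,
               "block size must be a multiple of the tile width");

/* Hypothetical stand-in for the per-tile rotation: reverses one tile.
 * The real transform differs; only the tile-local access pattern matters here. */
static void rotate_tile_standin(const float *src, float *dst) {
    for (size_t i = 0; i < TURBOQ_KV_DIM; i++) {
        dst[i] = src[TURBOQ_KV_DIM - 1 - i];
    }
}

/* Tiled loop over n floats; src and dst must each hold at least n floats. */
static void rotate_block_sketch(const float *src, float *dst, size_t n) {
    assert(n % TURBOQ_KV_DIM == 0);
    for (size_t base = 0; base < n; base += TURBOQ_KV_DIM) {
        rotate_tile_standin(src + base, dst + base);
    }
}
```

Running the sketch once with `n = TBQ_BLK_SIZE` and once with `n = 256` on deliberately oversized 256-float buffers gives identical values in the first 128 outputs, which is the property the fix relies on.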

Validation

  • test-quantize-fns + test-quantize-perf: pass under ASAN+UBSAN (RelWithDebInfo, sm-agnostic CPU path) — pre-fix, fns aborts with the trace above.
  • RMSE checks in test-quantize-fns unchanged (tbq3_0/tbq4_0/q1_0 all pass).
  • This PR's own CI run is the Windows proof.
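Since glibc did not flag the overrun, an allocator-independent check may be useful alongside ASAN. The sketch below is an assumption-laden illustration, not part of this PR or of the ggml test suite: `guarded_alloc`, `guarded_intact`, and `guarded_free` are hypothetical helpers that surround a float buffer with canary bytes so an out-of-bounds write of up to 128 floats lands in memory the test owns and can inspect. It detects writes only, not reads.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_FLOATS 128   /* sized to absorb the 128-float overrun described above */
#define GUARD_BYTE   0xA5  /* arbitrary canary pattern */

/* Allocate n floats with a canary region before and after.
 * Returns a pointer to the usable region, or NULL on failure. */
static float *guarded_alloc(size_t n) {
    float *raw = malloc((n + 2 * GUARD_FLOATS) * sizeof(float));
    if (raw == NULL) {
        return NULL;
    }
    memset(raw, GUARD_BYTE, GUARD_FLOATS * sizeof(float));
    memset(raw + GUARD_FLOATS + n, GUARD_BYTE, GUARD_FLOATS * sizeof(float));
    return raw + GUARD_FLOATS;
}

/* Return 1 if both canary regions are untouched, 0 otherwise. */
static int guarded_intact(const float *data, size_t n) {
    const unsigned char *lo = (const unsigned char *)(data - GUARD_FLOATS);
    const unsigned char *hi = (const unsigned char *)(data + n);
    for (size_t i = 0; i < GUARD_FLOATS * sizeof(float); i++) {
        if (lo[i] != GUARD_BYTE || hi[i] != GUARD_BYTE) {
            return 0;
        }
    }
    return 1;
}

/* Release a buffer obtained from guarded_alloc. */
static void guarded_free(float *data) {
    if (data != NULL) {
        free(data - GUARD_FLOATS);
    }
}
```

A test would pass a `guarded_alloc(128)` buffer to the function under test and assert `guarded_intact(buf, 128)` afterwards; the check behaves the same on Windows and Linux because it does not depend on the heap implementation.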

turboq_rotate_block_forward/inverse looped to QK_K (256) while every
caller (tbq3_0/tbq4_0 quantize + dequantize) passes TBQ_BLK_SIZE (128)
float buffers, a 128-float OOB read+write per block. Confirmed with
ASAN (heap-buffer-overflow in matvec_row via quantize_row_tbq3_0_ref);
this is the cause of the Windows x64 CI failures: test-quantize-fns
SEGFAULT / test-quantize-perf 0xc0000374 (STATUS_HEAP_CORRUPTION).

Loop bound fixed to TBQ_BLK_SIZE. No behavior change for the valid
region: the extra iteration only produced the out-of-bounds garbage.
test-quantize-fns + test-quantize-perf now pass under ASAN+UBSAN.
@marksverdhei marksverdhei merged commit d351576 into ht Jun 12, 2026
6 of 7 checks passed
@marksverdhei marksverdhei deleted the fix/turboq-rotate-block-oob branch June 12, 2026 19:42