common : inhibit lazy grammar sampler while reasoning is active (#20970) by Ooooze · Pull Request #1 · AtomicBot-ai/atomic-llama-cpp-turboquant

Ooooze · 2026-03-30T19:53:22Z

common : inhibit grammar while reasoning budget is active
cont : update force_pos in accept
cont : fix tests
cont : tweak should apply logic
cont : return early not using grammar sampler
Add tests
cont : prevent backend sampling when reasoning budget enabled
cont : fix typo
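
The commit titles above describe gating a grammar sampler on the state of the reasoning phase. A minimal Python sketch of that idea follows. It is not the llama.cpp code: every name in it (`GatedGrammarSampler`, `reasoning_active`, the `</think>` end marker, and the token-set stand-in for a grammar) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class GatedGrammarSampler:
    """Toy sampler: grammar constraints are skipped while reasoning is active.

    Hypothetical illustration only; not the llama.cpp implementation.
    """
    allowed_tokens: set                  # stand-in for a grammar: tokens it permits
    end_of_reasoning: str = "</think>"   # assumed marker that closes the reasoning phase
    reasoning_active: bool = True

    def apply(self, candidates):
        """Filter candidate tokens, unless the reasoning phase is still running."""
        if self.reasoning_active:
            # Return early without using the grammar at all.
            return list(candidates)
        return [t for t in candidates if t in self.allowed_tokens]

    def accept(self, token):
        """Update state after a token has been chosen."""
        if token == self.end_of_reasoning:
            self.reasoning_active = False


sampler = GatedGrammarSampler(allowed_tokens={"{", "}", '"ok"'})
during = sampler.apply(["hmm", "{"])   # reasoning active: nothing is filtered
sampler.accept("</think>")             # reasoning ends
after = sampler.apply(["hmm", "{"])    # grammar now constrains the candidates
print(during, after)
```

In this sketch the grammar only starts filtering once `accept` has seen the end-of-reasoning marker, which is the behavior the PR title suggests.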

Overview

Additional information

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure:

Automatically creates a prerelease with the macOS ARM64 binary on every push to feature/turboquant-kv-cache. Made-with: Cursor

Without target_commitish, softprops/action-gh-release creates tags on the default branch (master) instead of the triggering branch. Made-with: Cursor

Without -DLLAMA_BUILD_BORINGSSL=ON, cmake picks up Homebrew OpenSSL and links dynamically → Team ID mismatch on codesigned macOS apps.

Changes:
- Add -DLLAMA_BUILD_BORINGSSL=ON (static SSL, no dynamic dependency)
- Add -DCMAKE_BUILD_WITH_INSTALL_RPATH=ON (apply rpath at build time)
- Switch to -DCMAKE_INSTALL_RPATH='@loader_path' (consistent with release.yml)
- Add -DLLAMA_BUILD_TOOLS=ON
- Add verification step: otool -L check fails CI if dynamic SSL found

Made-with: Cursor

LLAMA_BUILD_BORINGSSL doesn't exist in this fork's CMakeLists.txt; the flag was silently ignored and the binary still linked Homebrew OpenSSL.

Correct approach: disable curl and OpenSSL entirely, build all libs statically. Produces a single self-contained binary with only system dylibs (libSystem, libc++, Metal frameworks).

- BUILD_SHARED_LIBS=OFF: links libllama, libggml etc. statically
- LLAMA_CURL=OFF: no curl dependency, no HF model download
- LLAMA_OPENSSL=OFF: no OpenSSL/crypto dependency
- hw.ncpu instead of hw.logicalcpu (correct macOS sysctl key)
- Verification step: fail CI if any non-system dylib found

Made-with: Cursor
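
The verification step mentioned above could look roughly like the following Python sketch, which scans `otool -L` output for dependencies outside the usual macOS system locations. The function name, the `/usr/lib/` and `/System/Library/` prefix list, the binary name `llama-server`, and the sample output (including its version numbers) are all assumptions made for illustration; the actual CI step in this PR is not shown in the source.

```python
SYSTEM_PREFIXES = ("/usr/lib/", "/System/Library/")  # assumed "system" locations


def non_system_dylibs(otool_output: str) -> list:
    """Return dependency paths from `otool -L` output that are not system libraries."""
    deps = []
    # The first line of `otool -L` output names the inspected binary itself.
    for line in otool_output.splitlines()[1:]:
        line = line.strip()
        if not line:
            continue
        # Each entry looks like: "<path> (compatibility version X, current version Y)"
        path = line.split(" (compatibility")[0].strip()
        if not path.startswith(SYSTEM_PREFIXES):
            deps.append(path)
    return deps


# Hand-written sample for illustration; not captured from a real build.
sample = """llama-server:
\t/usr/lib/libc++.1.dylib (compatibility version 1.0.0, current version 1.0.0)
\t/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1.0.0)
\t/System/Library/Frameworks/Metal.framework/Versions/A/Metal (compatibility version 1.0.0, current version 1.0.0)
\t/opt/homebrew/opt/openssl@3/lib/libssl.3.dylib (compatibility version 3.0.0, current version 3.0.0)
"""
bad = non_system_dylibs(sample)
print(bad)
```

A CI job built on this idea would exit with a non-zero status whenever the returned list is non-empty. Note that this sketch also flags `@rpath/...` entries, which may or may not be desired.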

…-org#20970)

* common : inhibit grammar while reasoning budget is active
* cont : update force_pos in accept
* cont : fix tests
* cont : tweak should apply logic
* cont : return early not using grammar sampler
* Add tests
* cont : prevent backend sampling when reasoning budget enabled
* cont : fix typo

---------

Co-authored-by: Piotr Wilkin <piotr.wilkin@syndatis.com>

Codex post-commit review found:
1. TURBO_D was QK_TURBO3 (now 32), which broke turbo4 C array sizes
2. SET_ROWS kernel turbo3-specific but instantiated for turbo4
3. Tail block drop for non-128 head dims

Fixed #3 (TURBO_D). #1 and #2 don't affect turbo3+dk128 path.

Co-Authored-By: tturney@psyguard.ai
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Complete experiment log:
#1 4-mag LUT: 15.1 at 8K (BEST, +38%)
#2 Batched extract: 13.7 (+25%)
#3 Inline FA block: 13.5 (I-cache pressure)
#4 Deferred norm: 12.9 (loses ILP)
#5 2-pair half2: 12.0 (ternary overhead)
#6 Select chain: 11.9 (branches kill)
#7 Bit-arithmetic: 11.6 (ALU too heavy)
#8 FMA branchless: 11.4 (ALU still too heavy)
#9 Named-reg ternary: 10.3 (branches worst)
#10 Main (8-LUT): 10.95 (baseline)
#11 Non-vec FA: 10.2 (wrong kernel)
Ceiling: 24.5 (no dequant)

Apple8 hardware truth:
- 1 divergent constant read < 7 ALU ops (even with fma)
- Branches cost MORE than divergent constant reads
- Array indexing ALWAYS spills on Metal
- 4 constant addresses is the sweet spot

The 4-mag LUT is the dequant-level ceiling on Apple Silicon.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: tturney@psyguard.ai
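
The percentages quoted in the log can be checked against the stated baseline with a few lines of Python. The figures below are copied from the log; the log does not state their unit, so none is assumed here, and the helper name `gain_over_baseline` is made up for this sketch.

```python
BASELINE = 10.95   # "#10 Main (8-LUT)" in the log
CEILING = 24.5     # "Ceiling (no dequant)" in the log

results = {
    "4-mag LUT": 15.1,
    "Batched extract": 13.7,
}


def gain_over_baseline(value, baseline=BASELINE):
    """Relative change versus the baseline, in percent."""
    return (value / baseline - 1.0) * 100.0


for name, value in results.items():
    print(f"{name}: {gain_over_baseline(value):+.0f}% vs baseline, "
          f"{value / CEILING:.0%} of ceiling")
```

The rounded gains agree with the +38% and +25% figures given in the log. The share-of-ceiling number is an extra derived quantity, not something the log reports.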

Vect0rM and others added 5 commits March 30, 2026 12:34

ci: add GitHub Release with downloadable tar.gz archive

a60b5bd


fix: set target_commitish in release action to correct branch

c6b5330


github-actions Bot added testing examples server labels Mar 30, 2026

Vect0rM force-pushed the feature/turboquant-kv-cache branch from 6879672 to 7b8820c Compare March 31, 2026 08:21

Vect0rM merged commit d785414 into feature/turboquant-kv-cache Mar 31, 2026
8 of 44 checks passed

WillowOneVision mentioned this pull request May 21, 2026

Phase C.2 dispatch behavior: MTP+mmproj coexistence behind --allow-mtp-with-mmproj (5th first-in-world) #19

Open

9 tasks


Vect0rM merged 5 commits into feature/turboquant-kv-cache from fix/qwen


3 participants
