fix(ci): F-203 SIMD timing flake — main CI andon by noahgift · Pull Request #875 · paiml/aprender

noahgift · 2026-04-18T04:34:26Z

ANDON: main has been red since `96d7349` (PR #869 merge)

Failing: quantize::tests::tests_25::test_f203_simd_faster_than_scalar_q4_0

Root cause

Single-shot timing. The test measured exactly one 100-iteration run per path. On shared CI runners, one measurement is dominated by cache state, frequency scaling, and neighbor-process preemption — SIMD sometimes timed slower than scalar purely from environmental noise.

CI log snippet:

F203: Q4_0 Performance Falsification
  Scalar: 122.13127ms
  SIMD:   135.973634ms
  Speedup: 0.90x

Not a SIMD regression. The scalar and SIMD paths are unchanged.

Fix

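Run one untimed warmup round, then time five rounds per path and take the minimum of each (best-of-5). Scheduler preemption, cold caches, and frequency scaling almost always add time rather than remove it, so the minimum is a lower-jitter estimator of the underlying hardware cost than any single run.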

The falsification assertion is unchanged (speedup > 1.0). If SIMD's best-of-5 is still slower than scalar's best-of-5, that's a real regression — the Popperian intent of F-203 is preserved, not weakened.
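The warmup-plus-minimum scheme described above can be sketched roughly as follows. This is an illustrative sketch, not the code from the aprender test: the names `best_of`, `speedup`, and `measure_speedup` are hypothetical, and the round and iteration counts (5 and 100) are taken from the description in this PR.

```rust
use std::time::{Duration, Instant};

/// Runs one untimed warmup round, then `rounds` timed rounds of `iters`
/// calls each, and returns the duration of the fastest timed round.
/// (Hypothetical helper; not the function used in the aprender test.)
fn best_of<F: FnMut()>(rounds: usize, iters: usize, mut f: F) -> Duration {
    // Warmup: populate caches and let the CPU clock ramp up before timing.
    for _ in 0..iters {
        f();
    }
    let mut best = Duration::MAX;
    for _ in 0..rounds {
        let start = Instant::now();
        for _ in 0..iters {
            f();
        }
        // Keep the minimum: environmental noise mostly inflates a round.
        best = best.min(start.elapsed());
    }
    best
}

/// Speedup of the SIMD path over the scalar path (values above 1.0 mean
/// SIMD was faster). Assumes `simd` is non-zero.
fn speedup(scalar: Duration, simd: Duration) -> f64 {
    scalar.as_secs_f64() / simd.as_secs_f64()
}

/// Measures both paths with the same warmup + best-of-5 scheme.
/// `scalar` and `simd` are placeholders for the two kernels under test.
fn measure_speedup(scalar: impl FnMut(), simd: impl FnMut()) -> f64 {
    let scalar_min = best_of(5, 100, scalar);
    let simd_min = best_of(5, 100, simd);
    speedup(scalar_min, simd_min)
}
```

A test built on this would keep the existing assertion, `assert!(measure_speedup(..) > 1.0)`, so a SIMD path whose best round is still slower than the scalar best round continues to fail. In real use the closures should pass their results through `std::hint::black_box` so the optimizer cannot delete the measured work.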

Verification (4090 Yoga runner, debug build)

F203: Q4_0 Performance Falsification (best-of-5)
  Scalar (min): 47.93ms
  SIMD   (min): 46.58ms
  Speedup: 1.03x

Incidental change

cargo fmt -p aprender-serve corrected a pre-existing trailing-blank-line in crates/aprender-serve/src/contract_gate.rs. Included rather than reverted — the tree is now fmt-clean.

Test plan

cargo test -p aprender-serve --lib test_f203_simd_faster_than_scalar_q4_0 -- --nocapture passes stably
CI workspace-test green
CI ci / gate green

🤖 Generated with Claude Code

…t flake

**Andon**: main has been red since 96d7349 (PR #869 merge). The failing test is `quantize::tests::tests_25::test_f203_simd_faster_than_scalar_q4_0`: single-shot timing on a 256×256 Q4_0 matvec got Scalar=122ms, SIMD=136ms (speedup 0.90×). This is not a regression, just OS/CPU jitter.

**Root cause**: the test measured exactly one 100-iteration run of each path. On shared CI runners, a single run is dominated by cache state, frequency scaling, and neighbor-process preemption. SIMD timing was sometimes slower than scalar purely from environmental noise.

**Fix**: warmup round + best-of-5 rounds, take the minimum of each. The minimum is a lower-jitter estimator of the underlying hardware cost. If SIMD's best measurement is still slower than scalar's best, that's a real regression worth failing CI, so the Popperian falsification property of F-203 is preserved, not weakened.

**Verification** (4090 Yoga runner, debug build):

F203: Q4_0 Performance Falsification (best-of-5)
  Scalar (min): 47.93ms
  SIMD   (min): 46.58ms
  Speedup: 1.03x

Threshold `speedup > 1.0` unchanged. Test is now deterministic within measurement precision.

Also picks up a pre-existing trailing-blank-line fmt drift in `crates/aprender-serve/src/contract_gate.rs` that `cargo fmt -p aprender-serve` corrected as a collateral effect.

🤖 Generated with [Claude Code](https://claude.com/claude-code)


noahgift enabled auto-merge (squash) April 18, 2026 04:34

Merge branch 'main' into fix/ci-f203-flaky-simd-timing

029dab8

noahgift merged commit 83bbf49 into main Apr 18, 2026
10 checks passed

noahgift deleted the fix/ci-f203-flaky-simd-timing branch April 18, 2026 05:12

noahgift mentioned this pull request Apr 19, 2026

release: aprender v0.31.0 — consolidated CHANGELOG (MCP M1–M3 + parity epic + SHIP-TWO-001 teacher) #899

Merged

5 tasks

noahgift merged 2 commits into main from fix/ci-f203-flaky-simd-timing
