feat(crux-e-07): latency P50/P95/P99 percentiles classifier + CLI (5 of 6 FALSIFY FULL, 1 of 6 PARTIAL; blocked on BLOCKER-FIXTURE-ABSENT)#986

noahgift merged 3 commits into main from feat/crux-e-07-latency-percentiles on Apr 21, 2026


@noahgift
Contributor

Summary

First CRUX feature shipped under the CRUX-SHIP-001 discipline (see PR #985). Adds latency P50/P95/P99 percentile reporting to apr bench --json, with a pure-Rust classifier, CLI wiring, and end-to-end falsification tests in the same PR.

Surface: apr bench model.apr --percentiles 50,95,99 --json emits the keys latency_p50_ms, latency_p95_ms, and latency_p99_ms, derived from nearest-rank percentiles of the iteration_times captured during warmed-up benchmarks. This matches the vllm benchmark_serving.py convention.
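As a rough illustration of the nearest-rank rule named above, here is a minimal Rust sketch. The function name `nearest_rank` and the accepted range (0, 100] are assumptions made for this example, not the aprender-core API, and the sketch assumes the samples have already been checked to be finite.

```rust
/// Nearest-rank percentile: the smallest sample such that at least `p`
/// percent of all samples are less than or equal to it.
/// Hypothetical helper for illustration; not the aprender-core signature.
fn nearest_rank(samples: &[f64], p: f64) -> Option<f64> {
    // Assumption: valid percentiles lie in (0, 100]; NaN fails this check too.
    if samples.is_empty() || !(p > 0.0 && p <= 100.0) {
        return None;
    }
    let mut sorted = samples.to_vec();
    // total_cmp gives a total order on f64, so sorting cannot panic.
    sorted.sort_by(|a, b| a.total_cmp(b));
    let n = sorted.len();
    // 1-based rank = ceil(p * n / 100); multiply first to limit rounding error.
    let rank = (p * n as f64 / 100.0).ceil() as usize;
    // Clamp defensively, then convert to a 0-based index.
    Some(sorted[rank.clamp(1, n) - 1])
}
```

For the four samples 10, 20, 30, and 40 ms, this sketch returns 20 for p50 and 40 for p100, so p100 equals the maximum sample, as FALSIFY-CRUX-E-07-004 requires.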

CRUX-SHIP-001 Merge Gates

| Gate | Status | Evidence |
| --- | --- | --- |
| g1_classifier_green | | 21 unit tests pass (aprender-core metrics::percentile) |
| g2_cli_reachable | | apr bench --help advertises --percentiles with default 50,95,99 |
| g3_e2e_runs | | 8 tests pass in crates/apr-cli/tests/falsification_crux_e_07.rs |
| g4_contract_discharged | ✅ (partial allowed) | 5 of 6 FALSIFY-* at FULL; FALSIFY-006 at PARTIAL_ALGORITHM_LEVEL under declared BLOCKER-FIXTURE-ABSENT (real GGUF fixture not in-tree) |

Contract Discharge

contracts/crux-E-07-v1.yaml v1.1.0, status partial, pv validate 0 errors / 0 warnings:

  • FALSIFY-CRUX-E-07-001 FULL — --percentiles flag reachable from CLI
  • FALSIFY-CRUX-E-07-002 FULL — percentile monotonicity (p99 ≥ p95 ≥ p50)
  • FALSIFY-CRUX-E-07-003 FULL — no-silent-pass on empty/NaN/Inf/negative/out-of-range
  • FALSIFY-CRUX-E-07-004 FULL — nearest-rank p100 == max(samples)
  • FALSIFY-CRUX-E-07-005 FULL — all reported percentiles strictly positive
  • FALSIFY-CRUX-E-07-006 PARTIAL_ALGORITHM_LEVEL — full discharge needs real GGUF fixture (BLOCKER-FIXTURE-ABSENT)
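The no-silent-pass and monotonicity properties listed above could be checked along the following lines. This is a sketch under stated assumptions: the error enum, the function names `validate` and `check_monotonic`, and the (0, 100] percentile range are hypothetical and are not taken from the aprender classifier.

```rust
/// Hypothetical error type; the real classifier's variants are not shown in this PR.
#[derive(Debug, PartialEq)]
enum PercentileError {
    Empty,
    NonFinite,
    Negative,
    OutOfRange,
    NonMonotonic,
}

/// Reject bad input explicitly instead of silently returning a number.
fn validate(samples: &[f64], percentiles: &[f64]) -> Result<(), PercentileError> {
    if samples.is_empty() {
        return Err(PercentileError::Empty);
    }
    for &s in samples {
        if !s.is_finite() {
            return Err(PercentileError::NonFinite); // NaN or +/- infinity
        }
        if s < 0.0 {
            return Err(PercentileError::Negative); // latencies cannot be negative
        }
    }
    for &p in percentiles {
        // Assumption: valid percentiles lie in (0, 100]; NaN is rejected here too.
        if !(p > 0.0 && p <= 100.0) {
            return Err(PercentileError::OutOfRange);
        }
    }
    Ok(())
}

/// Given (percentile, value) pairs ordered by percentile, values must not decrease.
fn check_monotonic(reported: &[(f64, f64)]) -> Result<(), PercentileError> {
    if reported.windows(2).any(|w| w[1].1 < w[0].1) {
        Err(PercentileError::NonMonotonic)
    } else {
        Ok(())
    }
}
```

Returning a typed error for each failure mode keeps a caller from mistaking a malformed run for a valid measurement, which is the intent behind FALSIFY-CRUX-E-07-003.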

Research Grounding

  • arXiv:2505.02502 — deployment-framework latency capability
  • vllm#4145, vllm#9722 — user demand for P50/P95/P99 under concurrency
  • vllm benchmarks/benchmark_serving.py — nearest-rank percentile convention (Aprender mirrors it)

Test Plan

  • cargo test -p aprender-core --lib metrics::percentile → 21/21 pass
  • cargo test -p apr-cli --test falsification_crux_e_07 → 8/8 pass
  • cargo run -p aprender-contracts-cli -- validate contracts/crux-E-07-v1.yaml → 0 errors, 0 warnings
  • apr bench --help | grep -F -- '--percentiles' → reachable
  • apr bench --help | grep -F '50,95,99' → default mentioned
  • Full discharge of FALSIFY-006 once the real GGUF fixture ticket lands (follow-up; blocker declared)

🤖 Generated with Claude Code

…of 6 FALSIFY FULL, 1 of 6 PARTIAL; blocked on BLOCKER-FIXTURE-ABSENT)

CRUX-SHIP-001 merge gates:
- g1_classifier_green: 21 unit tests pass (aprender-core metrics::percentile)
- g2_cli_reachable: apr bench --help advertises --percentiles
- g3_e2e_runs: 8 falsification tests pass (falsification_crux_e_07)
- g4_contract_discharged: 5 of 6 FALSIFY-* FULL; FALSIFY-006 PARTIAL_ALGORITHM_LEVEL
  under BLOCKER-FIXTURE-ABSENT (real GGUF fixture not in-tree)

Contract: contracts/crux-E-07-v1.yaml v1.1.0 status=partial
Classifier: aprender::metrics::percentile (nearest-rank, monotonicity guard,
no-silent-pass on empty/NaN/Inf/negative/out-of-range)
CLI: apr bench --percentiles 50,95,99 --json emits latency_p<N>_ms keys

Competitor parity: vllm benchmark_serving.py nearest-rank convention
Research: arXiv:2505.02502 (deployment-framework latency capability)
User demand: vllm#4145, vllm#9722

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift force-pushed the feat/crux-e-07-latency-percentiles branch from 4294ea5 to 1d21832 Compare April 21, 2026 10:43
… sites

The E-07 classifier wiring added a 9th parameter `percentiles: &[f64]` to
`commands::bench::run`. The CLI dispatcher in dispatch_analysis.rs was
updated, but 23 test call sites in three internal lib-test modules
(bench_config.rs, bench_brick_name.rs, bench_calculate_stats.rs) were
missed — they only compile when `cargo test --lib` or `cargo test
--workspace --lib` is run, which is exactly what the `workspace-test`
CI job does (but not `-p apr-cli` plain builds).

Fix: add `&[]` (use-defaults sentinel) as the 9th argument to every
affected call site. No behavioral change — these tests only assert
`result.is_err()` on bogus model paths, so the default percentiles
vector has no effect.

Verified with `cargo test -p apr-cli --lib` → 5157 passed.
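
One way the `&[]` use-defaults sentinel described above might be resolved is sketched below. The helper `effective_percentiles` and the static `DEFAULT_PERCENTILES` are hypothetical names for illustration; the actual handling inside `commands::bench::run` is not shown in this PR.

```rust
/// Defaults advertised by `apr bench --help` according to the PR description.
static DEFAULT_PERCENTILES: [f64; 3] = [50.0, 95.0, 99.0];

/// Hypothetical helper: an empty slice means "use the defaults",
/// otherwise the caller's percentiles are passed through unchanged.
fn effective_percentiles(requested: &[f64]) -> &[f64] {
    if requested.is_empty() {
        &DEFAULT_PERCENTILES[..]
    } else {
        requested
    }
}
```

With a sentinel like this, existing call sites that have no opinion about percentiles can pass `&[]` and keep their previous behavior.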
@noahgift noahgift merged commit 425e352 into main Apr 21, 2026
10 checks passed
@noahgift noahgift deleted the feat/crux-e-07-latency-percentiles branch April 21, 2026 12:28