feat(crux-e-07): latency P50/P95/P99 percentiles classifier + CLI (5 of 6 FALSIFY FULL, 1 of 6 PARTIAL; blocked on BLOCKER-FIXTURE-ABSENT) #986
Merged
6 tasks
…of 6 FALSIFY FULL, 1 of 6 PARTIAL; blocked on BLOCKER-FIXTURE-ABSENT)

CRUX-SHIP-001 merge gates:
- g1_classifier_green: 21 unit tests pass (aprender-core metrics::percentile)
- g2_cli_reachable: apr bench --help advertises --percentiles
- g3_e2e_runs: 8 falsification tests pass (falsification_crux_e_07)
- g4_contract_discharged: 5 of 6 FALSIFY-* FULL; FALSIFY-006 PARTIAL_ALGORITHM_LEVEL under BLOCKER-FIXTURE-ABSENT (real GGUF fixture not in-tree)

Contract: contracts/crux-E-07-v1.yaml v1.1.0 status=partial
Classifier: aprender::metrics::percentile (nearest-rank, monotonicity guard, no-silent-pass on empty/NaN/Inf/negative/out-of-range)
CLI: apr bench --percentiles 50,95,99 --json emits latency_p<N>_ms keys
Competitor parity: vllm benchmark_serving.py nearest-rank convention
Research: arXiv:2505.02502 (deployment-framework latency capability)
User demand: vllm#4145, vllm#9722

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
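For readers unfamiliar with the nearest-rank convention named in the commit message, here is a minimal sketch of how such a classifier and its `latency_p<N>_ms` keys could look. It is not the code in `aprender::metrics::percentile`: the names `PercentileError`, `nearest_rank`, and `latency_report` are hypothetical, the accepted percentile range `(0, 100]` is an assumption, and the monotonicity guard is not modelled.

```rust
/// Hypothetical error type: every rejected input is reported, never silently passed.
#[derive(Debug, PartialEq)]
pub enum PercentileError {
    EmptySamples,
    InvalidSample,        // NaN, infinite, or negative latency
    PercentileOutOfRange, // assumed valid range: 0 < p <= 100
}

/// Nearest-rank percentile: the sample at 1-based rank ceil(p * n / 100)
/// in the ascending-sorted data.
pub fn nearest_rank(samples_ms: &[f64], p: f64) -> Result<f64, PercentileError> {
    if samples_ms.is_empty() {
        return Err(PercentileError::EmptySamples);
    }
    if !p.is_finite() || p <= 0.0 || p > 100.0 {
        return Err(PercentileError::PercentileOutOfRange);
    }
    if samples_ms.iter().any(|s| !s.is_finite() || *s < 0.0) {
        return Err(PercentileError::InvalidSample);
    }

    // Sort a copy so the caller's slice is left untouched.
    let mut sorted = samples_ms.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));

    let n = sorted.len();
    let rank = ((p * n as f64) / 100.0).ceil() as usize;
    // Clamp defensively so the index is always within 1..=n.
    Ok(sorted[rank.clamp(1, n) - 1])
}

/// Builds (key, value) pairs using the `latency_p<N>_ms` naming from the commit message.
pub fn latency_report(
    samples_ms: &[f64],
    percentiles: &[f64],
) -> Result<Vec<(String, f64)>, PercentileError> {
    percentiles
        .iter()
        .map(|&p| nearest_rank(samples_ms, p).map(|v| (format!("latency_p{}_ms", p), v)))
        .collect()
}
```

With ten samples of 1 ms through 10 ms, P50 has rank ceil(5.0) = 5 and resolves to 5 ms, while P95 has rank ceil(9.5) = 10 and resolves to 10 ms. Nearest-rank always returns an observed sample and never interpolates between two.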
4294ea5 to
1d21832
… sites

The E-07 classifier wiring added a 9th parameter `percentiles: &[f64]` to `commands::bench::run`. The CLI dispatcher in dispatch_analysis.rs was updated, but 23 test call sites in three internal lib-test modules (bench_config.rs, bench_brick_name.rs, bench_calculate_stats.rs) were missed. They only compile when `cargo test --lib` or `cargo test --workspace --lib` is run, which is exactly what the `workspace-test` CI job does (but not `-p apr-cli` plain builds).

Fix: add `&[]` (use-defaults sentinel) as the 9th argument to every affected call site. No behavioral change: these tests only assert `result.is_err()` on bogus model paths, so the default percentiles vector has no effect.

Verified with `cargo test -p apr-cli --lib` → 5157 passed.
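The use-defaults sentinel described above can be sketched as follows. This only illustrates the pattern and is not the code in `commands::bench::run`: `DEFAULT_PERCENTILES` and `effective_percentiles` are hypothetical names, and the default values are taken from the `50,95,99` default mentioned in the PR description.

```rust
/// Hypothetical default set, matching the `50,95,99` default in the PR description.
const DEFAULT_PERCENTILES: [f64; 3] = [50.0, 95.0, 99.0];

/// Treats an empty slice as "use the defaults"; any non-empty request is passed through.
fn effective_percentiles(requested: &[f64]) -> &[f64] {
    if requested.is_empty() {
        &DEFAULT_PERCENTILES
    } else {
        requested
    }
}
```

Under this reading, a test that passes `&[]` behaves exactly like a caller that never mentions percentiles, which is why adding the argument to the existing call sites does not change what those tests assert.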
Summary
First CRUX feature shipped under the CRUX-SHIP-001 discipline (see PR #985). Adds latency P50/P95/P99 percentile reporting to `apr bench --json`, with a pure-Rust classifier, CLI wiring, and end-to-end falsification tests in the same PR.

Surface: `apr bench model.apr --percentiles 50,95,99 --json` emits `latency_p50_ms`, `latency_p95_ms`, `latency_p99_ms` keys derived from nearest-rank percentiles of the `iteration_times` captured during warmed-up benchmarks. Matches the vllm `benchmark_serving.py` convention.

CRUX-SHIP-001 Merge Gates

- g1_classifier_green: 21 unit tests pass (`aprender-core metrics::percentile`)
- g2_cli_reachable: `apr bench --help` advertises `--percentiles` with default `50,95,99`
- g3_e2e_runs: 8 falsification tests pass (`crates/apr-cli/tests/falsification_crux_e_07.rs`)
- g4_contract_discharged: 5 of 6 FALSIFY-* FULL; FALSIFY-006 PARTIAL_ALGORITHM_LEVEL under `BLOCKER-FIXTURE-ABSENT` (real GGUF fixture not in-tree)

Contract Discharge

`contracts/crux-E-07-v1.yaml` v1.1.0, status `partial`, `pv validate` 0 errors / 0 warnings:

- … `--percentiles` flag reachable from CLI
- … `BLOCKER-FIXTURE-ABSENT`)

Research Grounding

Test Plan

- `cargo test -p aprender-core --lib metrics::percentile` → 21/21 pass
- `cargo test -p apr-cli --test falsification_crux_e_07` → 8/8 pass
- `cargo run -p aprender-contracts-cli -- validate contracts/crux-E-07-v1.yaml` → 0 errors, 0 warnings
- `apr bench --help | grep -F -- '--percentiles'` → reachable
- `apr bench --help | grep -F '50,95,99'` → default mentioned

🤖 Generated with Claude Code