feat(crux-e-02): perplexity classifier + apr ppl CLI (5 of 6 FALSIFY FULL, 1 of 6 PARTIAL; blocked on BLOCKER-UPSTREAM-MISSING) #987
Merged
noahgift added a commit that referenced this pull request on Apr 23, 2026:
…bling (#1007)

Flake surfaced in PR #987 workspace-test run 24782269410: f32 SIMD rounding produced diff=0.01074 at max_val=0.854, exceeding the 1e-2 small-value tolerance. The sibling test_vecmat_associativity already uses 2e-2 uniformly (proptest_properties.rs:252). This aligns the matvec branch to match.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
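The tolerance arithmetic behind this fix can be checked directly. This is a minimal sketch assuming the small-value branch is a plain absolute-difference comparison; `within_abs_tol` and the constant names are hypothetical and are not the code in `proptest_properties.rs`.

```rust
/// Hypothetical helper (not the code in proptest_properties.rs): true when
/// `actual` lies within the absolute tolerance `tol` of `expected`.
/// Assumption: the small-value branch compares absolute differences.
pub fn within_abs_tol(expected: f32, actual: f32, tol: f32) -> bool {
    (expected - actual).abs() <= tol
}

/// Difference observed in workspace-test run 24782269410 (max_val = 0.854).
pub const OBSERVED_DIFF: f32 = 0.01074;
/// Previous small-value tolerance on the matvec branch.
pub const OLD_TOL: f32 = 1e-2;
/// Uniform tolerance already used by test_vecmat_associativity.
pub const NEW_TOL: f32 = 2e-2;
```

With these values, `within_abs_tol(0.0, OBSERVED_DIFF, OLD_TOL)` is false and `within_abs_tol(0.0, OBSERVED_DIFF, NEW_TOL)` is true; the observed difference sits about 7% above the old bound, and the 2e-2 bound leaves a little under 2x headroom.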
…FULL, 1 of 6 PARTIAL; blocked on BLOCKER-UPSTREAM-MISSING)

CRUX-SHIP-001 merge gates:
- g1_classifier_green: 13 unit tests pass (aprender-core metrics::perplexity)
- g2_cli_reachable: apr ppl --help advertises --log-probs-file
- g3_e2e_runs: 9 falsification tests pass (falsification_crux_e_02)
- g4_contract_discharged: 5 of 6 FALSIFY-* FULL; FALSIFY-006 PARTIAL_ALGORITHM_LEVEL under BLOCKER-UPSTREAM-MISSING (no stable per-token log-probs extraction path for arbitrary GGUF/APR models in-tree yet)

Contract: contracts/crux-E-02-v1.yaml v1.1.0 status=partial
Classifier: aprender::metrics::perplexity (pure PPL = exp(-mean(log p)); no-silent-pass on empty/NaN/Inf/positive log-prob)
CLI: apr ppl --log-probs-file FILE.json --json emits ppl, mean_nll, num_tokens, log_probs_path keys
Competitor parity: llama.cpp examples/perplexity nearest analogue
Research: arXiv:2402.16775 (held-out PPL for pretraining evaluation)
User demand: llama.cpp#7111

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
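The classifier contract described in this commit (PPL = exp(-mean(log p)), no silent pass on empty, NaN, Inf, or positive log-probs) can be sketched in plain Rust. This is a minimal sketch, not the `aprender::metrics::perplexity` API: the `perplexity` function and the `PplReport` and `PplError` types are hypothetical, log-probs are assumed to be natural logs, and the overflow guard is an added assumption that keeps the "finite" invariant.

```rust
/// Why a log-prob sequence was rejected. Hypothetical type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum PplError {
    /// No tokens: the mean is undefined, so refuse instead of reporting PPL = 1.0.
    Empty,
    /// NaN or +/-Inf at this token index.
    NonFinite(usize),
    /// log p > 0 at this index would mean p > 1.
    Positive(usize),
    /// exp(mean NLL) overflowed f64 (mean NLL above roughly 709.78).
    Overflow,
}

/// Result of one perplexity computation (hypothetical; fields mirror the
/// `ppl`, `mean_nll`, `num_tokens` JSON keys).
#[derive(Debug, Clone, PartialEq)]
pub struct PplReport {
    pub ppl: f64,
    pub mean_nll: f64,
    pub num_tokens: usize,
}

/// PPL = exp(-mean(log p)) over natural-log per-token probabilities.
/// Every invalid input is an error; nothing silently passes.
pub fn perplexity(log_probs: &[f64]) -> Result<PplReport, PplError> {
    if log_probs.is_empty() {
        return Err(PplError::Empty);
    }
    for (i, &lp) in log_probs.iter().enumerate() {
        if !lp.is_finite() {
            return Err(PplError::NonFinite(i));
        }
        if lp > 0.0 {
            return Err(PplError::Positive(i));
        }
    }
    // Every log p <= 0, so the mean NLL is >= 0 and therefore ppl >= 1.0.
    let mean_nll = -log_probs.iter().sum::<f64>() / log_probs.len() as f64;
    // exp is strictly increasing, so ppl is monotone in mean_nll.
    let ppl = mean_nll.exp();
    if !ppl.is_finite() {
        return Err(PplError::Overflow);
    }
    Ok(PplReport {
        ppl,
        mean_nll,
        num_tokens: log_probs.len(),
    })
}
```

Fed `[-0.693, -0.693, -0.693]`, this sketch returns a mean NLL of 0.693 and a PPL of about 1.9997, since 0.693 is close to ln 2. Rejecting an empty input as an error, rather than returning exp(0) = 1.0, is what keeps a missing log-probs file from reading as a perfect score.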
Summary
Second CRUX feature shipped under CRUX-SHIP-001 (after #986). Adds a pure-math perplexity classifier in `aprender::metrics::perplexity` plus `apr ppl --log-probs-file FILE.json --json`, mirroring the `llama-perplexity` convention without requiring live model compute. The live-inference half (PPL over a held-out corpus using a real GGUF/APR model) stays PARTIAL under a declared `BLOCKER-UPSTREAM-MISSING` until a stable per-token log-probs extraction path lands.

Surface:
- `apr ppl --log-probs-file nll.json --json` emits `ppl`, `mean_nll`, `num_tokens`, `log_probs_path` keys.
- `PPL = exp(-mean(log p))` with invariants: `ppl ≥ 1.0`, finite, monotone in mean NLL.

CRUX-SHIP-001 Merge Gates
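| Gate | Evidence |
|---|---|
| g1_classifier_green | 13 unit tests pass (`aprender-core metrics::perplexity`) |
| g2_cli_reachable | `apr ppl --help` advertises `--log-probs-file` |
| g3_e2e_runs | 9 falsification tests pass (`crates/apr-cli/tests/falsification_crux_e_02.rs`) |
| g4_contract_discharged | 5 of 6 FALSIFY-* FULL; FALSIFY-006 PARTIAL under `BLOCKER-UPSTREAM-MISSING` |

Contract Discharge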
`contracts/crux-E-02-v1.yaml` v1.1.0, status `partial`, `pv validate` 0 errors / 0 warnings:
- `--log-probs-file` flag reachable from CLI
- `ppl` key with correct value
- `PPL ≥ 1.0` and finite
- FALSIFY-006: PARTIAL (`BLOCKER-UPSTREAM-MISSING`)

Research Grounding
- llama.cpp `examples/perplexity`: canonical PPL CLI we mirror

Test Plan
- `cargo test -p aprender-core --lib metrics::perplexity` → 13/13 pass
- `cargo test -p apr-cli --test falsification_crux_e_02` → 9/9 pass
- `cargo run -p aprender-contracts-cli -- validate contracts/crux-E-02-v1.yaml` → 0 errors, 0 warnings
- `apr ppl --help | grep -F -- '--log-probs-file'` → reachable
- `echo '[-0.693, -0.693, -0.693]' > /tmp/lp.json && apr --json ppl --log-probs-file /tmp/lp.json` → `ppl ≈ 2.0` (each entry ≈ ln 0.5)
- `apr eval --task perplexity --corpus <path>` wiring lands (follow-up; blocker declared)

🤖 Generated with Claude Code
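To sanity-check the `--json` surface without building `apr`, the file-in/JSON-out plumbing can be sketched with the standard library alone. This is a hypothetical sketch, not the CLI's code: it assumes the input file is a flat JSON array of natural-log probabilities (the shape the test plan writes to `/tmp/lp.json`), and `parse_log_probs` and `render_json` are illustrative names. The `ppl` and `mean_nll` values would come from the classifier in `aprender::metrics::perplexity`.

```rust
/// Hypothetical minimal reader for the `--log-probs-file` input, assuming the
/// file holds a flat JSON array of numbers such as `[-0.693, -0.693, -0.693]`.
/// It does not handle nested or otherwise general JSON.
pub fn parse_log_probs(text: &str) -> Result<Vec<f64>, String> {
    let inner = text
        .trim()
        .strip_prefix('[')
        .and_then(|s| s.strip_suffix(']'))
        .ok_or_else(|| "expected a JSON array of numbers".to_string())?;
    if inner.trim().is_empty() {
        // Return the empty list; the classifier, not the reader, rejects it.
        return Ok(Vec::new());
    }
    inner
        .split(',')
        .map(|tok| {
            // Rust's f64 parser also accepts `NaN` and `inf`, which JSON
            // forbids; a non-finite check in the classifier must catch those.
            tok.trim()
                .parse::<f64>()
                .map_err(|e| format!("invalid number {:?}: {}", tok.trim(), e))
        })
        .collect()
}

/// Render the four keys the PR says `--json` emits. Key order, number
/// formatting, and Debug-based string quoting are assumptions; Debug quoting
/// matches JSON only for plain ASCII paths without control characters.
pub fn render_json(ppl: f64, mean_nll: f64, num_tokens: usize, log_probs_path: &str) -> String {
    format!(
        "{{\"ppl\":{},\"mean_nll\":{},\"num_tokens\":{},\"log_probs_path\":{:?}}}",
        ppl, mean_nll, num_tokens, log_probs_path
    )
}
```

With illustrative values, `render_json(2.0, 0.693, 3, "/tmp/lp.json")` yields `{"ppl":2,"mean_nll":0.693,"num_tokens":3,"log_probs_path":"/tmp/lp.json"}`. A production implementation would use a real JSON library instead of hand-rolled parsing and quoting.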