feat(falsify-ship-003): MODEL-1 apr convert q4_k_m per-layer cos ≥ 0.999 PARTIAL discharge (8/10) #1028
Merged
…999 PARTIAL discharge (7/10)
SHIP-TWO-001 spec v2.27.0 → v2.28.0: 7th compute-free MODEL-1 PARTIAL lever,
binding AC-SHIP1-003 (per-layer cosine similarity after `apr convert --quantize
q4_k_m`) to pure verdict functions at `discharge_status: PARTIAL_ALGORITHM_LEVEL`.
New: `crates/aprender-core/src/format/ship_003.rs`
- const AC_SHIP1_003_MIN_COSINE_SIMILARITY: f32 = 0.999
- enum Ship003Verdict { Pass, Fail }
- fn verdict_from_cosine_similarity(sim: f32, threshold: f32) -> Ship003Verdict
(f32-threshold with range guard + non-finite rejection)
- fn verdict_from_per_layer_cosines(sims: &[f32], threshold: f32) -> Ship003Verdict
(aggregate-AND over per-layer vector; empty → Fail; short-circuit on first Fail)
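A minimal sketch of what these three items could look like. It is reconstructed only from the descriptions above, and the function bodies are assumptions, not the contents of `ship_003.rs`. In particular, it assumes that the `[-1.0, 1.0]` range guard and non-finite rejection apply to both `sim` and `threshold`, and `in_cosine_range` is a hypothetical helper:

```rust
/// AC-SHIP1-003 floor on per-layer cosine similarity (value from the PR text).
pub const AC_SHIP1_003_MIN_COSINE_SIMILARITY: f32 = 0.999;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship003Verdict {
    Pass,
    Fail,
}

/// Hypothetical guard (assumed): a cosine value, or a threshold on one,
/// must be finite and lie inside [-1, 1].
fn in_cosine_range(x: f32) -> bool {
    x.is_finite() && (-1.0..=1.0).contains(&x)
}

/// Single-number verdict: Pass iff both inputs are valid and `sim >= threshold`.
pub fn verdict_from_cosine_similarity(sim: f32, threshold: f32) -> Ship003Verdict {
    if in_cosine_range(sim) && in_cosine_range(threshold) && sim >= threshold {
        Ship003Verdict::Pass
    } else {
        Ship003Verdict::Fail
    }
}

/// Aggregate-AND over per-layer similarities. An empty slice is a Fail
/// (conservative), and `Iterator::all` stops at the first failing layer.
pub fn verdict_from_per_layer_cosines(sims: &[f32], threshold: f32) -> Ship003Verdict {
    let all_pass = !sims.is_empty()
        && sims
            .iter()
            .all(|&s| verdict_from_cosine_similarity(s, threshold) == Ship003Verdict::Pass);
    if all_pass {
        Ship003Verdict::Pass
    } else {
        Ship003Verdict::Fail
    }
}
```

With these definitions, a 196-entry vector of `1.0` passes. Setting any single entry (for example, index 100) to `0.998` turns the aggregate into a Fail.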
Twin mutation surveys:
1. falsify_ship_003_cosine_similarity_threshold_logic — 8 sections:
exact boundary, ULP-below (`f32::from_bits(0x3F7FBE77 - 1)`), safe-above
{0.9999, 1.0}, safe-below {0.998, 0.5, 0.0, -1.0}, monotonic sweep [0.990..1.0]
step 1e-4, non-finite (NaN/+∞/-∞) on both sim+threshold, out-of-range guards
({-1.5, 1.5, -2.0, 2.0}), provenance pin assert_eq const == 0.999_f32.
2. falsify_ship_003_per_layer_aggregate_and — 7 sections:
all-Pass 196 (28 layers × 7 projections), single-Fail at index 100, all-Fail,
empty-Fail (conservative), single-element both directions, first-layer NaN/OOR
short-circuit, last-layer Fail not short-circuited.
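The boundary and sweep sections of survey 1 rest on two `f32` facts that can be checked in isolation:

- `0x3F7FBE77` is the bit pattern of `0.999_f32`. Subtracting one from it therefore yields the largest representable value below the floor.
- Driving the sweep from an integer index avoids the drift that comes from repeatedly adding `1e-4`.

The `passes`, `ulp_below_floor`, `sweep_verdicts` and `is_monotonic` functions below are hypothetical stand-ins for illustration, not the crate's test code:

```rust
/// Floor mirroring AC_SHIP1_003_MIN_COSINE_SIMILARITY.
const FLOOR: f32 = 0.999;

/// Hypothetical stand-in for the single-number verdict (true = Pass).
fn passes(sim: f32) -> bool {
    sim.is_finite() && (-1.0..=1.0).contains(&sim) && sim >= FLOOR
}

/// Largest f32 strictly below the floor (one ULP down; valid for positive finite floats).
fn ulp_below_floor() -> f32 {
    f32::from_bits(FLOOR.to_bits() - 1)
}

/// Verdicts for 0.990, 0.9901, ..., 1.0 (101 samples). Each sample is computed
/// from an integer index in f64 and rounded once to f32, instead of accumulating
/// 1e-4 steps.
fn sweep_verdicts() -> Vec<bool> {
    (0..=100u32)
        .map(|i| passes((0.990_f64 + f64::from(i) * 1e-4) as f32))
        .collect()
}

/// Monotonic: as `sim` increases, no Pass may be followed by a Fail.
fn is_monotonic(verdicts: &[bool]) -> bool {
    verdicts.windows(2).all(|w| !(w[0] && !w[1]))
}
```

In this sweep, sample 90 (0.999) is the first Pass and every earlier sample fails, so the verdict sequence is monotonic.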
Contract: contracts/qwen2-e2e-verification-v1.yaml v1.2.0 → v1.3.0 ACTIVE.
FALSIFY-QW2E-SHIP-003 now annotated with `discharge_status:
PARTIAL_ALGORITHM_LEVEL`, 3 `evidence_discharged_by` test pins,
`full_discharge_blocks_on` (real 7B .apr + 28×7=196 projection matrix harness on
RTX 4090), and 7 counter_example_classes (regressed_quantizer, drifted_floor,
relaxed_rule, empty_vector_pass, range_guard_bypass, nan_promoted,
sign_flipped_quantizer).
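Two of these counter-example classes map onto easy-to-miss Rust semantics:

- `Iterator::all` returns `true` on an empty iterator.
- `!(s < floor)` is `true` when `s` is NaN, whereas `s >= floor` is `false`.

The sketch below reflects an assumed reading of the class names `empty_vector_pass` and `nan_promoted`, not the contract's definitions. All three functions are hypothetical:

```rust
/// Reference aggregate: non-empty, and every layer finite and at or above the floor.
fn aggregate_ok(sims: &[f32], floor: f32) -> bool {
    !sims.is_empty() && sims.iter().all(|&s| s.is_finite() && s >= floor)
}

/// Mutant (assumed `empty_vector_pass`): drops the emptiness check, so `all`
/// is vacuously true on an empty slice.
fn mutant_empty_vector_pass(sims: &[f32], floor: f32) -> bool {
    sims.iter().all(|&s| s.is_finite() && s >= floor)
}

/// Mutant (assumed `nan_promoted`): `!(s < floor)` looks equivalent to
/// `s >= floor`, but every comparison with NaN is false, so NaN layers slip through.
fn mutant_nan_promoted(sims: &[f32], floor: f32) -> bool {
    !sims.is_empty() && sims.iter().all(|&s| !(s < floor))
}
```

A mutation survey kills each mutant by finding an input on which it disagrees with the reference. Here, `[]` kills the first mutant and any vector containing a NaN kills the second.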
Spec: docs/specifications/aprender-train/ship-two-models-spec.md v2.27.0 →
v2.28.0. AC-SHIP1-003 row annotated `FALSIFY-SHIP-003 **(PARTIAL_ALGORITHM_LEVEL
v2.28.0)**`. The changelog documents the first MODEL-1 PARTIAL to combine a
single-number threshold shape (mirrors SHIP-007/SHIP-020) with an aggregate-AND
combinator (mirrors SHIP-016) in one discharge. Coverage: MODEL-1 6/10 → 7/10; 13 PARTIAL +
3 DISCHARGED across both models.
Verification:
- `cargo test -p aprender-core --lib format::ship_003` → 2 passed / 0 failed
- `cargo fmt -p aprender-core --check` → clean
- `pv validate contracts/qwen2-e2e-verification-v1.yaml` → 0 errors, 0 warnings
Full discharge blocks on: MODEL-2 lambda-labs 7B .apr + real 196-projection
cosine-parity harness runner (separate task #126 compute-dispatch).
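The runner itself is out of scope for this change. As a rough sketch of the per-projection measurement it implies, the code below does two things:

- It enumerates the 28 × 7 = 196 projection weights.
- It computes the cosine similarity between a reference tensor and its dequantized counterpart.

Several parts are assumptions. The tensor names follow Hugging Face Qwen2 naming and may not match `.apr` tensor names. `projection_tensor_names` and `cosine_similarity` are hypothetical helpers, not `apr diff` APIs. Accumulating in f64 is a design choice intended to keep rounding error small relative to the 1e-3 margin.

```rust
const NUM_LAYERS: usize = 28;

/// Seven linear projections per Qwen2 decoder layer (HF naming assumed).
static PROJECTIONS: [&str; 7] = [
    "self_attn.q_proj", "self_attn.k_proj", "self_attn.v_proj", "self_attn.o_proj",
    "mlp.gate_proj", "mlp.up_proj", "mlp.down_proj",
];

/// Layer-major list of the 196 projection weights the harness would compare.
fn projection_tensor_names() -> Vec<String> {
    (0..NUM_LAYERS)
        .flat_map(|layer| {
            PROJECTIONS
                .iter()
                .map(move |p| format!("model.layers.{layer}.{p}.weight"))
        })
        .collect()
}

/// Cosine similarity of two flattened weight tensors, accumulated in f64.
/// Returns NaN for mismatched lengths, empty input, or a zero-norm tensor,
/// so a verdict that rejects non-finite values reports Fail.
fn cosine_similarity(reference: &[f32], dequantized: &[f32]) -> f32 {
    if reference.len() != dequantized.len() || reference.is_empty() {
        return f32::NAN;
    }
    let (mut dot, mut norm_r, mut norm_d) = (0.0_f64, 0.0_f64, 0.0_f64);
    for (&r, &d) in reference.iter().zip(dequantized) {
        let (r, d) = (f64::from(r), f64::from(d));
        dot += r * d;
        norm_r += r * r;
        norm_d += d * d;
    }
    if norm_r == 0.0 || norm_d == 0.0 {
        return f32::NAN;
    }
    // Clamp rounding excursions just outside [-1, 1] before narrowing to f32.
    (dot / (norm_r.sqrt() * norm_d.sqrt())).clamp(-1.0, 1.0) as f32
}
```

Under this assumed layer-major ordering, index 100 corresponds to layer 14's `v_proj` weight.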
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This was referenced Apr 23, 2026
noahgift added a commit that referenced this pull request on Apr 23, 2026:
Completes the MODEL-1 compute-free PARTIAL coverage at 10/10 touched. Stacked follow-up to SHIP-003 (#1028) + SHIP-004 (#1029).
Spec changes:
- **Version:** 2.31.0 → 2.32.0
- Date line appended with v2.32.0 entry describing the three pure verdict fns in `crates/aprender-core/src/format/ship_001.rs`, the three bound constants (AC_SHIP1_001_SAFETENSORS_HEADER_PREFIX_LEN = 8, AC_SHIP1_001_SAFETENSORS_JSON_OPEN_BYTE = 0x7B, Result-boundary), the triple mutation survey (Result × header-size × open-byte), `cargo test -p aprender-core --lib format::ship_001` green (3/3), full-discharge blocker (live `realizar::Model::load_safetensors` on RTX 4090 with `--features cuda`), MODEL-1 9/10 → 10/10 coverage, and the 16 PARTIAL + 3 DISCHARGED aggregate count across both models.
- §4.2 AC-SHIP1-001 row annotated **(PARTIAL_ALGORITHM_LEVEL v2.32.0)**.
Completes MODEL-1 to 10/10 touched; only AC-SHIP1-009 (license / provenance metadata) remains pending in the MODEL-1 table.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
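Some context on the two byte-level constants: a safetensors file begins with an 8-byte little-endian `u64` giving the length of a JSON header, and that header opens with `{` (`0x7B`). A minimal sketch of three checks matching the Result × header-size × open-byte survey might look like the following. The function names, the `pass_if` helper and the exact rules are hypothetical, not the contents of `ship_001.rs`:

```rust
const AC_SHIP1_001_SAFETENSORS_HEADER_PREFIX_LEN: usize = 8;
const AC_SHIP1_001_SAFETENSORS_JSON_OPEN_BYTE: u8 = 0x7B; // b'{'

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Ship001Verdict {
    Pass,
    Fail,
}

/// Hypothetical helper: map a boolean condition onto a verdict.
fn pass_if(cond: bool) -> Ship001Verdict {
    if cond { Ship001Verdict::Pass } else { Ship001Verdict::Fail }
}

/// Result boundary: a load that returned Ok passes; any Err fails.
fn verdict_from_load_result<T, E>(result: &Result<T, E>) -> Ship001Verdict {
    pass_if(result.is_ok())
}

/// Header size: the 8-byte LE prefix must be present and must declare a
/// non-empty header that fits inside the remaining bytes.
fn verdict_from_header_len(bytes: &[u8]) -> Ship001Verdict {
    let n = AC_SHIP1_001_SAFETENSORS_HEADER_PREFIX_LEN;
    let Some(prefix) = bytes.get(..n) else {
        return Ship001Verdict::Fail;
    };
    let mut buf = [0u8; AC_SHIP1_001_SAFETENSORS_HEADER_PREFIX_LEN];
    buf.copy_from_slice(prefix);
    let declared = u64::from_le_bytes(buf);
    pass_if(declared > 0 && declared <= (bytes.len() - n) as u64)
}

/// Open byte: the first header byte after the prefix must be '{'.
fn verdict_from_json_open_byte(bytes: &[u8]) -> Ship001Verdict {
    let first = bytes.get(AC_SHIP1_001_SAFETENSORS_HEADER_PREFIX_LEN);
    pass_if(first == Some(&AC_SHIP1_001_SAFETENSORS_JSON_OPEN_BYTE))
}
```

Keeping the three checks separate lets a mutation survey flip each one independently.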
noahgift added a commit that referenced this pull request on Apr 23, 2026:
…TIAL discharge (10/10) (#1030)
* WIP: FALSIFY-SHIP-001 PARTIAL — MODEL-1 reproducible build verdict fns
  Stacked atop SHIP-003 (f9c2d47) + SHIP-004 (5f1db6a). Pushed as safety net before /tmp clears — NOT PR-ready yet.
  Contents:
  - crates/aprender-core/src/format/ship_001.rs (NEW): 3 pure verdict fns + 3/3 tests green locally.
  - crates/aprender-core/src/format/mod.rs: adds `pub mod ship_001`.
  - contracts/qwen2-e2e-verification-v1.yaml: speculative v1.4.0→v1.5.0 bump.
  Known follow-up before opening PR in next session:
  - Rebase onto main (now at 651e07b / post-SHIP-010) — main already carries publish-manifest-v1 v1.4.0 at SHIP-010, so the qwen2-e2e YAML bump here must be renumbered based on the landing order against current main.
  - Stack-push sequence per memory `project_ship_two_001_session_wrap_20260423.md`: SHIP-003 (task #162) → SHIP-004 (#164) → SHIP-001 (#165).
  - Full discharge of SHIP-001 blocks on live 3-run reproducible-build harness with sha256 manifest diff on RTX 4090 host.
  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* spec(falsify-ship-001): bump v2.31→v2.32 for SHIP-001 PARTIAL (10/10)
  Completes the MODEL-1 compute-free PARTIAL coverage at 10/10 touched. Stacked follow-up to SHIP-003 (#1028) + SHIP-004 (#1029).
  Spec changes:
  - **Version:** 2.31.0 → 2.32.0
  - Date line appended with v2.32.0 entry describing the three pure verdict fns in `crates/aprender-core/src/format/ship_001.rs`, the three bound constants (AC_SHIP1_001_SAFETENSORS_HEADER_PREFIX_LEN = 8, AC_SHIP1_001_SAFETENSORS_JSON_OPEN_BYTE = 0x7B, Result-boundary), the triple mutation survey (Result × header-size × open-byte), `cargo test -p aprender-core --lib format::ship_001` green (3/3), full-discharge blocker (live `realizar::Model::load_safetensors` on RTX 4090 with `--features cuda`), MODEL-1 9/10 → 10/10 coverage, and the 16 PARTIAL + 3 DISCHARGED aggregate count across both models.
  - §4.2 AC-SHIP1-001 row annotated **(PARTIAL_ALGORITHM_LEVEL v2.32.0)**.
  Completes MODEL-1 to 10/10 touched; only AC-SHIP1-009 (license / provenance metadata) remains pending in the MODEL-1 table.
  Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
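The WIP note names a full-discharge blocker: a live 3-run reproducible-build harness with a sha256 manifest diff. That suggests a pure comparison along the lines below. This is an assumed shape only. The manifest type (artifact path → hex digest), the minimum run count of 3, and both functions are hypothetical:

```rust
use std::collections::BTreeMap;

/// Hypothetical manifest: artifact path → lowercase hex sha256 digest.
type Manifest = BTreeMap<String, String>;

/// Number of runs the harness would require (from "3-run" in the note).
const MIN_RUNS: usize = 3;

/// True (Pass) iff there are at least MIN_RUNS manifests, the first is
/// non-empty, and every later manifest is identical to it.
fn verdict_from_build_manifests(runs: &[Manifest]) -> bool {
    match runs.split_first() {
        Some((first, rest)) if runs.len() >= MIN_RUNS && !first.is_empty() => {
            rest.iter().all(|m| m == first)
        }
        _ => false,
    }
}

/// Paths whose digests differ between two runs, or that exist in only one:
/// the "manifest diff", sorted and de-duplicated.
fn manifest_diff<'a>(a: &'a Manifest, b: &'a Manifest) -> Vec<&'a str> {
    let mut paths: Vec<&str> = a
        .keys()
        .chain(b.keys())
        .map(String::as_str)
        .filter(|p| a.get(*p) != b.get(*p))
        .collect();
    paths.sort_unstable();
    paths.dedup();
    paths
}
```

Using a `BTreeMap` keeps both the equality check and the diff independent of the order in which each run emitted its artifacts.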
Summary
Binds `AC-SHIP1-003` (per-layer cosine similarity after `apr convert --quantize q4_k_m`) to pure verdict functions at `discharge_status: PARTIAL_ALGORITHM_LEVEL`.
- `contracts/qwen2-e2e-verification-v1.yaml` v1.3.0 → v1.4.0 ACTIVE, with `FALSIFY-QW2E-SHIP-003` annotated with 3 `evidence_discharged_by` pins, `full_discharge_blocks_on` (real 7B .apr + 28×7=196 projection matrix harness on RTX 4090), and 7 counter-example classes.
- `crates/aprender-core/src/format/ship_003.rs`:
  - `const AC_SHIP1_003_MIN_COSINE_SIMILARITY: f32 = 0.999`
  - `verdict_from_cosine_similarity(sim, threshold)`: f32 threshold with `[-1.0, 1.0]` range guard + non-finite rejection
  - `verdict_from_per_layer_cosines(sims, threshold)`: aggregate-AND; empty → Fail; short-circuit on first Fail
  - `falsify_ship_003_cosine_similarity_threshold_logic`: 8 sections (exact boundary, ULP-below, safe-above/below bands, monotonic sweep, non-finite, out-of-range, provenance pin).
  - `falsify_ship_003_per_layer_aggregate_and`: 7 sections (all-Pass 196, single-Fail, all-Fail, empty-Fail, single-element, first-layer NaN/OOR short-circuit, last-layer Fail).
Test plan
- `cargo test -p aprender-core --lib format::ship_003`: 2/2 passed
- `pv validate contracts/qwen2-e2e-verification-v1.yaml`: 0 errors, 0 warnings
- `apr convert --quantize q4_k_m paiml/qwen2.5-coder-7b-apache-q4k-v1.safetensors` on RTX 4090 + `apr diff` per-layer cosine harness (separate compute-dispatch task)
🤖 Generated with Claude Code