feat(falsify-ship-003): MODEL-1 apr convert q4_k_m per-layer cos ≥ 0.999 PARTIAL discharge (8/10) #1028

Merged: noahgift merged 1 commit into main from feat/falsify-ship-003-partial-discharge on Apr 23, 2026
Conversation

@noahgift

Summary

  • SHIP-TWO-001 spec v2.29.0 → v2.30.0: 8th compute-free MODEL-1 PARTIAL lever binding AC-SHIP1-003 (per-layer cosine similarity after apr convert --quantize q4_k_m) to pure verdict functions at discharge_status: PARTIAL_ALGORITHM_LEVEL.
  • Contract bump: contracts/qwen2-e2e-verification-v1.yaml v1.3.0 → v1.4.0 ACTIVE with FALSIFY-QW2E-SHIP-003 annotated with 3 evidence_discharged_by pins, full_discharge_blocks_on (real 7B .apr + 28×7=196 projection matrix harness on RTX 4090), and 7 counter-example classes.
  • New Rust binding: crates/aprender-core/src/format/ship_003.rs
    • const AC_SHIP1_003_MIN_COSINE_SIMILARITY: f32 = 0.999
    • verdict_from_cosine_similarity(sim, threshold) — f32-threshold with [-1.0, 1.0] range guard + non-finite rejection
    • verdict_from_per_layer_cosines(sims, threshold) — aggregate-AND; empty → Fail; short-circuit on first Fail
  • Twin mutation surveys (15 sections total):
    1. falsify_ship_003_cosine_similarity_threshold_logic — 8 sections: exact boundary, ULP-below, safe-above/below bands, monotonic sweep, non-finite, out-of-range, provenance pin.
    2. falsify_ship_003_per_layer_aggregate_and — 7 sections: all-Pass 196, single-Fail, all-Fail, empty-Fail, single-element, first-layer NaN/OOR short-circuit, last-layer Fail.
  • Coverage: MODEL-1 7/10 → 8/10; 14 PARTIAL + 3 DISCHARGED across both models.
  • First MODEL-1 PARTIAL to combine a single-number threshold (SHIP-007/SHIP-020 shape) with an aggregate-AND combinator (SHIP-016 shape) in one discharge.

Test plan

  • cargo test -p aprender-core --lib format::ship_003 — 2/2 passed
  • pv validate contracts/qwen2-e2e-verification-v1.yaml — 0 errors, 0 warnings
  • Cherry-pick conflict resolution layered SHIP-003 (v2.30.0) on top of SHIP-007 (v2.29.0) and SHIP-010 (v2.28.0) without dropping history
  • Full discharge blocks on live apr convert --quantize q4_k_m paiml/qwen2.5-coder-7b-apache-q4k-v1.safetensors on RTX 4090 + apr diff per-layer cosine harness (separate compute-dispatch task)

🤖 Generated with Claude Code

…999 PARTIAL discharge (7/10)

SHIP-TWO-001 spec v2.27.0 → v2.28.0: 7th compute-free MODEL-1 PARTIAL lever,
binding AC-SHIP1-003 (per-layer cosine similarity after `apr convert --quantize
q4_k_m`) to pure verdict functions at `discharge_status: PARTIAL_ALGORITHM_LEVEL`.

New: `crates/aprender-core/src/format/ship_003.rs`
- const AC_SHIP1_003_MIN_COSINE_SIMILARITY: f32 = 0.999
- enum Ship003Verdict { Pass, Fail }
- fn verdict_from_cosine_similarity(sim: f32, threshold: f32) -> Ship003Verdict
  (f32-threshold with range guard + non-finite rejection)
- fn verdict_from_per_layer_cosines(sims: &[f32], threshold: f32) -> Ship003Verdict
  (aggregate-AND over per-layer vector; empty → Fail; short-circuit on first Fail)
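
A minimal sketch of how the two verdict functions listed above might look, assuming only the signatures and behaviours named in this commit message. The bodies are illustrative and are not copied from `ship_003.rs`; in particular, applying the range guard to `sim` only (and not to `threshold`) is an assumption.

```rust
/// Minimum per-layer cosine similarity, as stated in the commit message.
pub const AC_SHIP1_003_MIN_COSINE_SIMILARITY: f32 = 0.999;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship003Verdict {
    Pass,
    Fail,
}

/// Single-number verdict: Pass only when `sim` is finite, lies in
/// [-1.0, 1.0], and is at least `threshold`.
pub fn verdict_from_cosine_similarity(sim: f32, threshold: f32) -> Ship003Verdict {
    // Non-finite rejection on both inputs (NaN, +inf, -inf).
    if !sim.is_finite() || !threshold.is_finite() {
        return Ship003Verdict::Fail;
    }
    // Range guard: a cosine similarity cannot fall outside [-1, 1].
    // ASSUMPTION: only `sim` is range-checked in this sketch.
    if !(-1.0..=1.0).contains(&sim) {
        return Ship003Verdict::Fail;
    }
    if sim >= threshold {
        Ship003Verdict::Pass
    } else {
        Ship003Verdict::Fail
    }
}

/// Aggregate-AND over a per-layer vector: an empty slice is a Fail, and
/// the loop returns on the first failing element.
pub fn verdict_from_per_layer_cosines(sims: &[f32], threshold: f32) -> Ship003Verdict {
    if sims.is_empty() {
        return Ship003Verdict::Fail;
    }
    for &sim in sims {
        if verdict_from_cosine_similarity(sim, threshold) == Ship003Verdict::Fail {
            return Ship003Verdict::Fail;
        }
    }
    Ship003Verdict::Pass
}
```

Treating the empty slice as Fail keeps a harness that produced no measurements from being read as a passing run.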

Twin mutation surveys:
1. falsify_ship_003_cosine_similarity_threshold_logic — 8 sections:
   exact boundary, ULP-below (`f32::from_bits(0x3F7FBE77 - 1)`), safe-above
   {0.9999, 1.0}, safe-below {0.998, 0.5, 0.0, -1.0}, monotonic sweep [0.990..1.0]
   step 1e-4, non-finite (NaN/+∞/-∞) on both sim+threshold, out-of-range guards
   ({-1.5, 1.5, -2.0, 2.0}), provenance pin assert_eq const == 0.999_f32.
2. falsify_ship_003_per_layer_aggregate_and — 7 sections:
   all-Pass 196 (28 layers × 7 projections), single-Fail at index 100, all-Fail,
   empty-Fail (conservative), single-element both directions, first-layer NaN/OOR
   short-circuit, last-layer Fail not short-circuited.
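
The ULP-below and monotonic-sweep sections can be illustrated with a short sketch. The helpers `ulp_below`, `passes`, and `sweep_is_monotonic` are hypothetical names introduced here and are not taken from the test file; `passes` is a simplified stand-in for the real verdict function. The bit pattern `0x3F7FBE77` is the `f32` encoding of `0.999` quoted in the survey description.

```rust
/// Hypothetical helper: the largest f32 strictly below a positive,
/// finite, non-zero `x`, obtained by decrementing its bit pattern.
pub fn ulp_below(x: f32) -> f32 {
    debug_assert!(x.is_finite() && x > 0.0);
    f32::from_bits(x.to_bits() - 1)
}

/// Hypothetical stand-in for the threshold check under test.
pub fn passes(sim: f32, threshold: f32) -> bool {
    sim.is_finite() && threshold.is_finite() && sim >= threshold
}

/// Sweep sim over roughly [0.990, 1.0] in steps of 1e-4 and confirm the
/// verdict never flips from pass back to fail as sim increases.
pub fn sweep_is_monotonic(threshold: f32) -> bool {
    let mut seen_pass = false;
    for i in 0..=100u32 {
        let sim = 0.990_f32 + (i as f32) * 1e-4_f32;
        let pass = passes(sim, threshold);
        if seen_pass && !pass {
            return false; // a fail after a pass breaks monotonicity
        }
        seen_pass |= pass;
    }
    // The sweep must reach at least one passing value to be meaningful.
    seen_pass
}
```

A boundary test built on these helpers would assert that `0.999_f32` passes while `ulp_below(0.999_f32)` fails, which is the kind of check that catches a `>=` mutated to `>` or a threshold drifted by one representable value.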

Contract: contracts/qwen2-e2e-verification-v1.yaml v1.2.0 → v1.3.0 ACTIVE.
FALSIFY-QW2E-SHIP-003 now annotated with `discharge_status:
PARTIAL_ALGORITHM_LEVEL`, 3 `evidence_discharged_by` test pins,
`full_discharge_blocks_on` (real 7B .apr + 28×7=196 projection matrix harness on
RTX 4090), and 7 counter_example_classes (regressed_quantizer, drifted_floor,
relaxed_rule, empty_vector_pass, range_guard_bypass, nan_promoted,
sign_flipped_quantizer).

Spec: docs/specifications/aprender-train/ship-two-models-spec.md v2.27.0 →
v2.28.0. AC-SHIP1-003 row annotated `FALSIFY-SHIP-003 **(PARTIAL_ALGORITHM_LEVEL
v2.28.0)**`. Changelog documents first MODEL-1 PARTIAL combining single-number
threshold shape (mirrors SHIP-007/SHIP-020) with aggregate-AND combinator
(mirrors SHIP-016) in one discharge. Coverage: MODEL-1 6/10 → 7/10; 13 PARTIAL +
3 DISCHARGED across both models.

Verification:
- `cargo test -p aprender-core --lib format::ship_003` → 2 passed / 0 failed
- `cargo fmt -p aprender-core --check` → clean
- `pv validate contracts/qwen2-e2e-verification-v1.yaml` → 0 errors, 0 warnings

Full discharge blocks on: MODEL-2 lambda-labs 7B .apr + real 196-projection
cosine-parity harness runner (separate task #126 compute-dispatch).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift enabled auto-merge (squash) April 23, 2026 17:00
noahgift merged commit ad9ff4d into main Apr 23, 2026 (11 checks passed)
noahgift deleted the feat/falsify-ship-003-partial-discharge branch April 23, 2026 17:27
noahgift added a commit that referenced this pull request Apr 23, 2026
Completes the MODEL-1 compute-free PARTIAL coverage at 10/10 touched.
Stacked follow-up to SHIP-003 (#1028) + SHIP-004 (#1029).

Spec changes:
- **Version:** 2.31.0 → 2.32.0
- Date line appended with v2.32.0 entry describing the three pure verdict
  fns in `crates/aprender-core/src/format/ship_001.rs`, the three bound
  constants (AC_SHIP1_001_SAFETENSORS_HEADER_PREFIX_LEN = 8,
  AC_SHIP1_001_SAFETENSORS_JSON_OPEN_BYTE = 0x7B, Result-boundary), the
  triple mutation survey (Result × header-size × open-byte), `cargo test
  -p aprender-core --lib format::ship_001` green (3/3), full-discharge
  blocker (live `realizar::Model::load_safetensors` on RTX 4090 with
  `--features cuda`), MODEL-1 9/10 → 10/10 coverage, and the 16 PARTIAL
  + 3 DISCHARGED aggregate count across both models.
- §4.2 AC-SHIP1-001 row annotated **(PARTIAL_ALGORITHM_LEVEL v2.32.0)**.
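
For orientation, a hedged sketch of what three pure verdict functions over the Result, header-size, and open-byte axes could look like. Only the two constant names and values come from the spec entry above; `Ship001Verdict` and all three function names are hypothetical, and the bodies assume the public safetensors layout (an 8-byte little-endian header length followed by a JSON header that opens with `{`).

```rust
pub const AC_SHIP1_001_SAFETENSORS_HEADER_PREFIX_LEN: usize = 8;
pub const AC_SHIP1_001_SAFETENSORS_JSON_OPEN_BYTE: u8 = 0x7B; // ASCII '{'

// Hypothetical verdict type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship001Verdict {
    Pass,
    Fail,
}

/// Result boundary: Pass only when the loader returned Ok.
pub fn verdict_from_load_result<T, E>(result: &Result<T, E>) -> Ship001Verdict {
    if result.is_ok() {
        Ship001Verdict::Pass
    } else {
        Ship001Verdict::Fail
    }
}

/// Header size: Pass only when the 8-byte prefix is present and the
/// declared header length is non-zero and fits in the remaining bytes.
pub fn verdict_from_header_size(bytes: &[u8]) -> Ship001Verdict {
    if bytes.len() < AC_SHIP1_001_SAFETENSORS_HEADER_PREFIX_LEN {
        return Ship001Verdict::Fail;
    }
    let mut prefix = [0u8; AC_SHIP1_001_SAFETENSORS_HEADER_PREFIX_LEN];
    prefix.copy_from_slice(&bytes[..AC_SHIP1_001_SAFETENSORS_HEADER_PREFIX_LEN]);
    let declared = u64::from_le_bytes(prefix);
    let available = (bytes.len() - AC_SHIP1_001_SAFETENSORS_HEADER_PREFIX_LEN) as u64;
    if declared >= 1 && declared <= available {
        Ship001Verdict::Pass
    } else {
        Ship001Verdict::Fail
    }
}

/// Open byte: Pass only when the first byte after the prefix is '{'.
pub fn verdict_from_json_open_byte(bytes: &[u8]) -> Ship001Verdict {
    match bytes.get(AC_SHIP1_001_SAFETENSORS_HEADER_PREFIX_LEN) {
        Some(&b) if b == AC_SHIP1_001_SAFETENSORS_JSON_OPEN_BYTE => Ship001Verdict::Pass,
        _ => Ship001Verdict::Fail,
    }
}
```

Keeping each axis in its own function lets a mutation survey vary one of them at a time.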

Completes MODEL-1 to 10/10 touched; only AC-SHIP1-009 (license /
provenance metadata) remains pending in the MODEL-1 table.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request Apr 23, 2026
…TIAL discharge (10/10) (#1030)

* WIP: FALSIFY-SHIP-001 PARTIAL — MODEL-1 reproducible build verdict fns

Stacked atop SHIP-003 (f9c2d47) + SHIP-004 (5f1db6a). Pushed as safety
net before /tmp clears — NOT PR-ready yet.

Contents:
- crates/aprender-core/src/format/ship_001.rs (NEW): 3 pure verdict fns
  + 3/3 tests green locally.
- crates/aprender-core/src/format/mod.rs: adds `pub mod ship_001`.
- contracts/qwen2-e2e-verification-v1.yaml: speculative v1.4.0→v1.5.0 bump.

Known follow-up before opening PR in next session:
- Rebase onto main (now at 651e07b / post-SHIP-010) — main already carries
  publish-manifest-v1 v1.4.0 at SHIP-010, so the qwen2-e2e YAML bump here
  must be renumbered based on the landing order against current main.
- Stack-push sequence per memory `project_ship_two_001_session_wrap_20260423.md`:
  SHIP-003 (task #162) → SHIP-004 (#164) → SHIP-001 (#165).
- Full discharge of SHIP-001 blocks on live 3-run reproducible-build harness
  with sha256 manifest diff on RTX 4090 host.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* spec(falsify-ship-001): bump v2.31→v2.32 for SHIP-001 PARTIAL (10/10)

Completes the MODEL-1 compute-free PARTIAL coverage at 10/10 touched.
Stacked follow-up to SHIP-003 (#1028) + SHIP-004 (#1029).

Spec changes:
- **Version:** 2.31.0 → 2.32.0
- Date line appended with v2.32.0 entry describing the three pure verdict
  fns in `crates/aprender-core/src/format/ship_001.rs`, the three bound
  constants (AC_SHIP1_001_SAFETENSORS_HEADER_PREFIX_LEN = 8,
  AC_SHIP1_001_SAFETENSORS_JSON_OPEN_BYTE = 0x7B, Result-boundary), the
  triple mutation survey (Result × header-size × open-byte), `cargo test
  -p aprender-core --lib format::ship_001` green (3/3), full-discharge
  blocker (live `realizar::Model::load_safetensors` on RTX 4090 with
  `--features cuda`), MODEL-1 9/10 → 10/10 coverage, and the 16 PARTIAL
  + 3 DISCHARGED aggregate count across both models.
- §4.2 AC-SHIP1-001 row annotated **(PARTIAL_ALGORITHM_LEVEL v2.32.0)**.

Completes MODEL-1 to 10/10 touched; only AC-SHIP1-009 (license /
provenance metadata) remains pending in the MODEL-1 table.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>