docs(spec)+evidence: SHIP-TWO-001 §72 — 5-AC LIVE cascade SHIP-001/003/004/009/010 (MODEL-1 95% → 99%) #1646
Merged
…IP-001/003/004/009/010 PARTIAL→LIVE-DISCHARGED (PMAT-CODE-SHIP-TWO-SECTION-72)
Closes 5 of the 6 algorithm-level PARTIALs left after §71 closed SHIP-005.
Only SHIP-007 (multi-PR CUDA cascade per §63) remains as a PARTIAL.
The cascade is EVIDENCE-ONLY — no code changes. Five ACs already had
falsifier tests at PARTIAL_ALGORITHM_LEVEL (`#[test]`s merged); they
just lacked LIVE-evidence runs on the canonical 7B Qwen2.5-Coder-
Instruct teacher.
Evidence captured (lambda-vector, RTX 4090, post-§71 main binary):
  SHIP-001  apr run <safetensors> --prompt 'Hello' --max-tokens 4
            → exit 0, 62.55s load via realizar
  SHIP-003  apr diff <safetensors> <q4k.apr> --values --filter weight
            --limit 20 --transpose-aware
            → 20 tensors at cos_sim=1.000000 (floor 0.999)
  SHIP-004  llama-cli -m <q4k.gguf> -p 'Hello' -n 8 -ngl 99 -st
            → exit 0, "Hello! How can I help you today",
              133.1 gen tok/s, model 5580 MiB on RTX 4090
  SHIP-009  apr inspect <q4k.apr>
            → license: Apache-2.0,
              data_source: huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct
  SHIP-010  curl HF tree API + sha256sum on gx10 canonical teacher
            → 0a854098… == HF lfs.oid 0a854098…, 8035635524 bytes
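The SHIP-010 check above compares a local sha256 and byte count against the values the Hugging Face tree API reports. A minimal Python sketch of that comparison follows, assuming the expected oid and size have already been fetched; the helper names `sha256_of_file` and `matches_lfs_oid` are hypothetical and this is not the command sequence used for the recorded evidence (that was `curl` plus `sha256sum`).

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so multi-GB weights never sit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lfs_oid(path, expected_oid, expected_size):
    """Return True only when both the byte count and the sha256 agree."""
    if os.path.getsize(path) != expected_size:
        return False
    return sha256_of_file(path) == expected_oid.lower()


# Self-check on a throwaway 5-byte file (not the real teacher artifact).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name
demo_oid = hashlib.sha256(b"hello").hexdigest()
demo_ok = matches_lfs_oid(demo_path, demo_oid, 5)   # size and hash agree
demo_bad = matches_lfs_oid(demo_path, demo_oid, 6)  # size mismatch
os.remove(demo_path)
```

Checking the size first is a cheap early exit: a truncated download fails before any hashing work is done.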
§17.5 + AC-SHIP1 chain post-§72:
SHIP-001 LIVE-DISCHARGED ← §72
SHIP-002 LIVE-DISCHARGED (#1609 §61)
SHIP-003 LIVE-DISCHARGED ← §72
SHIP-004 LIVE-DISCHARGED ← §72
SHIP-005 LIVE-DISCHARGED (§71)
SHIP-006 LIVE-DISCHARGED (#1615 §61.8)
SHIP-007 PARTIAL — multi-PR CUDA cascade (§63)
SHIP-008 LIVE-DISCHARGED (#1614 §61)
SHIP-009 LIVE-DISCHARGED ← §72
SHIP-010 LIVE-DISCHARGED ← §72
9 of 10 AC-SHIP1-* LIVE-discharged.
Ship-% movement:
MODEL-1 ship %: 95% → 99% (5 algorithm-level PARTIALs → LIVE)
Path to 100% = SHIP-007 multi-PR CUDA cascade per §63:
Layer 1: cuBLASLt FP8 JIT warmup ILLEGAL_ADDRESS root fix
Layer 2: CUDA-vs-CPU parity (cosine -0.005 on Qwen 7B dims)
Layer 3: throughput 5.6 → 30 tok/s
Host: RTX 4090 / lambda-vector (gx10 is wrong arch)
MODEL-2 ship %: unchanged at 57%
Methodology lesson #19 NEW: algorithm-level falsifiers + small evidence
runs collapse PARTIAL→LIVE in batches. When ACs are PARTIAL because of
missing live evidence (not missing algorithm), batch-discharge in one
cascade rather than treating each as separate ship-row work. The 95→99%
jump is the highest-ROI move because the algorithms are already merged.
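The triage rule in lesson #19 can be sketched in Python as a simple partition of the ship rows. This is an illustrative model only: the `AcRow` type and the `blocker` labels ("evidence", "code") are assumptions introduced here, not fields of the spec or of any contract file. The row states mirror the pre-§72 chain described in this message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcRow:
    ac_id: str
    status: str   # "LIVE" or "PARTIAL"
    blocker: str  # hypothetical label: "evidence", "code", or "" when LIVE


def split_partials(rows):
    """Partition PARTIAL rows into a batchable set and a work-needed set."""
    batch = [r.ac_id for r in rows
             if r.status == "PARTIAL" and r.blocker == "evidence"]
    separate = [r.ac_id for r in rows
                if r.status == "PARTIAL" and r.blocker != "evidence"]
    return batch, separate


# Pre-§72 state as described above; blocker labels are illustrative.
rows = [
    AcRow("SHIP-001", "PARTIAL", "evidence"),
    AcRow("SHIP-002", "LIVE", ""),
    AcRow("SHIP-003", "PARTIAL", "evidence"),
    AcRow("SHIP-004", "PARTIAL", "evidence"),
    AcRow("SHIP-005", "LIVE", ""),
    AcRow("SHIP-006", "LIVE", ""),
    AcRow("SHIP-007", "PARTIAL", "code"),
    AcRow("SHIP-008", "LIVE", ""),
    AcRow("SHIP-009", "PARTIAL", "evidence"),
    AcRow("SHIP-010", "PARTIAL", "evidence"),
]

batch, separate = split_partials(rows)
live_after = sum(r.status == "LIVE" for r in rows) + len(batch)
```

Here `batch` holds the five ACs discharged in one cascade, `separate` holds SHIP-007, and `live_after` reproduces the 9-of-10 count.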
Spec v3.17.0 → v3.18.0.
Evidence:
- evidence/section-72-ship-live-cascade-2026-05-12/findings.json
- ship-001-apr-run-safetensors.txt (exit 0 + 62.55s load)
- ship-003-apr-diff-q4k-roundtrip.txt (20 tensors at cos_sim=1.000000)
- ship-004-llama-cli-stdout.txt (llama.cpp first-response on canonical GGUF)
- ship-009-apr-inspect.txt (license + provenance fields)
- ship-010-sha256-match.json + ship-010-hf-tree.json (sha256 match)
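The SHIP-003 evidence reports per-tensor cosine similarity against a floor of 0.999. The sketch below shows what such a floor check amounts to, in plain Python on flat lists. It is an assumption-level illustration, not the `apr diff` implementation; the tensor names and values in the demo are made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def tensors_below_floor(pairs, floor=0.999):
    """Return the names of tensors whose similarity falls under the floor."""
    return [name for name, (ref, cand) in pairs.items()
            if cosine_similarity(ref, cand) < floor]


# Hypothetical demo data: one near-identical pair, one orthogonal pair.
pairs = {
    "a.weight": ([1.0, 2.0, 3.0], [1.001, 1.999, 3.0]),
    "b.weight": ([1.0, 0.0], [0.0, 1.0]),
}
failures = tensors_below_floor(pairs)
```

An empty result would correspond to the recorded outcome, where all 20 compared tensors cleared the floor.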
Refs:
- AC-SHIP1-001 through AC-SHIP1-010 (spec §5)
- §71 (SHIP-005 LIVE-DISCHARGED, predecessor)
- §63 (SHIP-007 multi-PR cascade scope)
- contracts/eval-harness-humaneval-v1.yaml + contracts/apr-publish-hf-large-file-v1.yaml + contracts/apr-provenance-v1.yaml (PARTIAL_ALGORITHM_LEVEL → LIVE-DISCHARGED)
Closes tasks #59-63.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request on May 13, 2026
…T-CODE-V0-33-0-RELEASE-PREP)
🎉 v0.33.0 marks **MODEL-1 SHIP % = 100%** for SHIP-TWO-001. All 10 AC-SHIP1-* falsifiers are LIVE-discharged on the canonical 7B Qwen2.5-Coder-Instruct Q4_K_M teacher (lambda-vector RTX 4090, --features cuda).
This release prep PR ships:
1. CHANGELOG.md [0.33.0] entry with §69-§75 highlights:
   - 🎉 MODEL-1 SHIP % = 100% (all 10 AC-SHIP1-* LIVE)
   - Fixed: SHIP-007 F32 GEMV PTX layout (PR #1651, §75) — 124.6 tok/s
   - Fixed: SHIP-005 HumanEval RC3 (PR #1635, §70/§71) — pass@1 86.59%
   - Added: APR_EVAL_DEBUG=1 diagnostic surface (PR #1634)
   - Added: APR_GPU_STAGE_DUMP=<dir> diagnostic surface (PR #1649)
   - Added: MBPP harness H4 fix (PR #1645)
   - Added: 2 new falsifiable contracts (apr-eval-humaneval-harness-invariant v1.1.0, apr-ship-007-gpu-stage-bisection v1.0.0)
   - Methodology lessons #16-22 captured in MEMORY.md
   - Spec: v3.13.0 → v3.21.0 across §67-§75
2. Workspace version bump:
   - [workspace.package].version: 0.32.0 → 0.33.0
   - Root [package].version (aprender facade crate): 0.32.0 → 0.33.0
   - 28 sub-crate version literals: 0.32.0 → 0.33.0
3. `cargo check -p aprender` → clean (workspace builds at 0.33.0).
Out of scope for this PR (separate steps after #1651/1652 land + this PR lands):
- Tag release `v0.33.0` on main
- Cascade publish to crates.io (per memory project_ship_two_001_v0_32_0_release.md — 15 user-facing crates + 7 internal-tier in topological dependency order; uses `make publish CRATE=<name>`)
- Post-publish QA per `feedback_post_publish_qa_required.md` — `cargo install aprender --force` + `/dogfood` GO verdict required before declaring release done (v0.31.1 was yanked for skipping this)
- GitHub Release with §75 narrative
- HF artifact verification (paiml/qwen2.5-coder-7b-apache-q4k-v1 sha256 already verified by §72 SHIP-010 LIVE evidence; double-check before release announcement)
This PR ships ONLY the version-bump + CHANGELOG. Publishing is the next step after merge.
Refs:
- §75 MODEL-1 100% (PR #1652)
- §74 SHIP-007 bug localized (PR #1650)
- §73 SHIP-007 cascade reduction (PR #1647)
- §72 5-AC LIVE cascade (PR #1646)
- §71 SHIP-005 LIVE-DISCHARGED (PR #1642)
- §70 RC3 fix (PR #1636)
- §69 Q4K hypothesis falsified (PR #1633)
- PR #1635 RC3 prepend
- PR #1634 diagnostic surface + contract
- PR #1648 SHIP-007 contract scaffold
- PR #1649 SHIP-007 PR-B stage dump
- PR #1651 SHIP-007 PR-E F32 GEMV layout fix
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request on May 13, 2026
…T-CODE-V0-33-0-RELEASE-PREP) (#1653)
Summary
Five ACs (SHIP-001/003/004/009/010) had merged falsifier tests at PARTIAL_ALGORITHM_LEVEL but no LIVE-evidence on canonical 7B Qwen2.5-Coder-Instruct Q4K teacher. §72 captures all 5 in a single ~30-min evidence-only cascade. No code changes.
After §72: 9 of 10 AC-SHIP1-* LIVE-discharged. Only SHIP-007 (multi-PR CUDA cascade per §63) remains.
Evidence captured
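SHIP-001: apr run <safetensors> → exit code 0
SHIP-003: apr diff <safetensors> <q4k.apr> --values → 20 tensors at cos_sim=1.000000 (floor 0.999)
SHIP-004: llama-cli -m <q4k.gguf> → exit 0, 133.1 gen tok/s
SHIP-009: apr inspect <q4k.apr> | grep license → license: Apache-2.0, data_source: huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct
SHIP-010: sha256 0a854098… == HF lfs.oid 0a854098…
Ship-% movement
MODEL-1 ship %: 95% → 99%
MODEL-2 ship %: unchanged at 57%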
Methodology lesson #19 NEW
Algorithm-level falsifiers + small evidence runs collapse PARTIAL→LIVE in batches. When ACs are PARTIAL because of missing live evidence (not missing algorithm), batch-discharge them in one cascade — highest-ROI move because the algorithms are already merged.
Test plan
evidence/section-72-ship-live-cascade-2026-05-12/
Refs
🤖 Generated with Claude Code