docs(spec)+evidence: SHIP-TWO-001 §72 — 5-AC LIVE cascade SHIP-001/003/004/009/010 (MODEL-1 95% → 99%) by noahgift · Pull Request #1646 · paiml/aprender

noahgift · 2026-05-12T17:26:18Z

Summary

Five ACs (SHIP-001/003/004/009/010) had merged falsifier tests at PARTIAL_ALGORITHM_LEVEL but no LIVE evidence on the canonical 7B Qwen2.5-Coder-Instruct Q4K teacher. §72 captures LIVE evidence for all five in a single ~30-minute, evidence-only cascade. No code changes.

After §72, 9 of 10 AC-SHIP1-* criteria are LIVE-discharged. Only SHIP-007 (multi-PR CUDA cascade per §63) remains.

Evidence captured

| AC | LIVE method | Result |
|---|---|---|
| SHIP-001 | `apr run <safetensors>` exit code | exit 0, 62.55s load |
| SHIP-003 | `apr diff <safetensors> <q4k.apr> --values` | 20 tensors at cos_sim=1.000000 (floor 0.999) |
| SHIP-004 | `llama-cli -m <q4k.gguf>` | exit 0, "Hello! How can I help you today", 133.1 gen tok/s |
| SHIP-009 | `apr inspect <q4k.apr> \| grep license` | `license: Apache-2.0`, `data_source: huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct` |
| SHIP-010 | curl HF tree + sha256sum on gx10 | `0a854098…` == HF lfs.oid `0a854098…` |
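The SHIP-003 criterion is a per-tensor cosine-similarity floor: every compared weight tensor in the Q4K APR must score at least 0.999 against the safetensors original. A minimal Python sketch of that pass/fail rule follows, assuming both tensors have already been loaded (and, for the quantized side, dequantized) into flat float lists. The function names are hypothetical, and the sketch does not reproduce how `apr diff` itself loads, dequantizes, or applies `--transpose-aware`.

```python
import math
from typing import Iterable, List, Sequence, Tuple

# Floor quoted for SHIP-003; per-tensor scores below it fail the AC.
COS_SIM_FLOOR = 0.999


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length, flattened tensors."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: two all-zero tensors count as identical.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


def tensors_below_floor(
    pairs: Iterable[Tuple[str, Sequence[float], Sequence[float]]],
    floor: float = COS_SIM_FLOOR,
) -> List[Tuple[str, float]]:
    """Return (name, score) for every reference/round-trip pair under the floor."""
    failures = []
    for name, reference, roundtrip in pairs:
        score = cosine_similarity(reference, roundtrip)
        if score < floor:
            failures.append((name, score))
    return failures
```

An empty result from `tensors_below_floor` is the sketch's equivalent of the recorded outcome (all 20 tensors at or above the floor).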

Ship-% movement

MODEL-1: 95% → 99% (9/10 AC-SHIP1-* LIVE-discharged)
Path to 100%: SHIP-007 multi-PR CUDA cascade per §63, which needs the RTX 4090 host (lambda-vector)
MODEL-2: unchanged at 57%

Methodology lesson #19 NEW

Algorithm-level falsifiers plus small evidence runs collapse PARTIAL→LIVE in batches. When ACs are PARTIAL because of missing live evidence (not a missing algorithm), batch-discharge them in one cascade. This is the highest-ROI move because the algorithms are already merged.
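Lesson #19 amounts to one pass over the AC status table: every AC that is PARTIAL only for lack of LIVE evidence, and now has that evidence, is promoted in the same step. A small Python sketch, seeded with the AC-SHIP1-* chain as listed in the §72 commit message; the status strings and the `batch_discharge` name are illustrative assumptions, not the spec's tooling, and the sketch only counts ACs (it does not model the ship-% figure).

```python
from typing import Dict, Set

PARTIAL = "PARTIAL_ALGORITHM_LEVEL"
LIVE = "LIVE-DISCHARGED"


def batch_discharge(statuses: Dict[str, str], live_evidence: Set[str]) -> Dict[str, str]:
    """Promote every PARTIAL AC that now has LIVE evidence, without mutating the input."""
    promoted = dict(statuses)
    for ac, status in statuses.items():
        if status == PARTIAL and ac in live_evidence:
            promoted[ac] = LIVE
    return promoted


# AC-SHIP1-* chain before §72 (SHIP-005 was discharged by §71).
before = {
    "SHIP-001": PARTIAL, "SHIP-002": LIVE, "SHIP-003": PARTIAL,
    "SHIP-004": PARTIAL, "SHIP-005": LIVE, "SHIP-006": LIVE,
    "SHIP-007": PARTIAL, "SHIP-008": LIVE, "SHIP-009": PARTIAL,
    "SHIP-010": PARTIAL,
}
section_72_evidence = {"SHIP-001", "SHIP-003", "SHIP-004", "SHIP-009", "SHIP-010"}

after = batch_discharge(before, section_72_evidence)
live = sorted(ac for ac, s in after.items() if s == LIVE)
remaining = sorted(ac for ac, s in after.items() if s != LIVE)
print(f"{len(live)} of {len(after)} LIVE-discharged; remaining: {remaining}")
# prints: 9 of 10 LIVE-discharged; remaining: ['SHIP-007']
```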

Test plan

LIVE evidence for all 5 ACs captured on lambda-vector / gx10
Each evidence file archived under evidence/section-72-ship-live-cascade-2026-05-12/
Spec v3.17.0 → v3.18.0 with §72 narrative
No code changes; CI is docs/evidence-only

Refs

§71 (SHIP-005 LIVE-DISCHARGED predecessor)
§63 (SHIP-007 multi-PR cascade scope — remaining 1pp)
AC-SHIP1-001..010 (spec §5)

🤖 Generated with Claude Code

…IP-001/003/004/009/010 PARTIAL→LIVE-DISCHARGED (PMAT-CODE-SHIP-TWO-SECTION-72)

Closes 5 of the 6 algorithm-level PARTIALs left after §71 closed SHIP-005. Only SHIP-007 (multi-PR CUDA cascade per §63) remains PARTIAL.

The cascade is EVIDENCE-ONLY: no code changes. Five ACs already had falsifier tests at PARTIAL_ALGORITHM_LEVEL (`#[test]`s merged); they only lacked LIVE-evidence runs on the canonical 7B Qwen2.5-Coder-Instruct teacher.

Evidence captured (lambda-vector, RTX 4090, post-§71 main binary):

- SHIP-001: `apr run <safetensors> --prompt 'Hello' --max-tokens 4` → exit 0, 62.55s load via realizar
- SHIP-003: `apr diff <safetensors> <q4k.apr> --values --filter weight --limit 20 --transpose-aware` → 20 tensors at cos_sim=1.000000 (floor 0.999)
- SHIP-004: `llama-cli -m <q4k.gguf> -p 'Hello' -n 8 -ngl 99 -st` → exit 0, "Hello! How can I help you today", 133.1 gen tok/s, model 5580 MiB on RTX 4090
- SHIP-009: `apr inspect <q4k.apr>` → license: Apache-2.0, data_source: huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct
- SHIP-010: curl HF tree API + sha256sum on gx10 canonical teacher → 0a854098… == HF lfs.oid 0a854098…, 8035635524 bytes

§17.5 + AC-SHIP1 chain post-§72:

- SHIP-001 LIVE-DISCHARGED ← §72
- SHIP-002 LIVE-DISCHARGED (#1609 §61)
- SHIP-003 LIVE-DISCHARGED ← §72
- SHIP-004 LIVE-DISCHARGED ← §72
- SHIP-005 LIVE-DISCHARGED (§71)
- SHIP-006 LIVE-DISCHARGED (#1615 §61.8)
- SHIP-007 PARTIAL: multi-PR CUDA cascade (§63)
- SHIP-008 LIVE-DISCHARGED (#1614 §61)
- SHIP-009 LIVE-DISCHARGED ← §72
- SHIP-010 LIVE-DISCHARGED ← §72

9 of 10 AC-SHIP1-* LIVE-discharged.
Ship-% movement:

- MODEL-1 ship %: 95% → 99% (5 algorithm-level PARTIALs → LIVE)
- Path to 100% = SHIP-007 multi-PR CUDA cascade per §63:
  - Layer 1: cuBLASLt FP8 JIT warmup ILLEGAL_ADDRESS root fix
  - Layer 2: CUDA-vs-CPU parity (cosine -0.005 on Qwen 7B dims)
  - Layer 3: throughput 5.6 → 30 tok/s
  - Host: RTX 4090 / lambda-vector (gx10 is the wrong arch)
- MODEL-2 ship %: unchanged at 57%

Methodology lesson #19 NEW: algorithm-level falsifiers plus small evidence runs collapse PARTIAL→LIVE in batches. When ACs are PARTIAL because of missing live evidence (not a missing algorithm), batch-discharge them in one cascade rather than treating each as separate ship-row work. The 95→99% jump is the highest-ROI move because the algorithms are already merged.

Spec v3.17.0 → v3.18.0.

Evidence:
- evidence/section-72-ship-live-cascade-2026-05-12/findings.json
- ship-001-apr-run-safetensors.txt (exit 0 + 62.55s load)
- ship-003-apr-diff-q4k-roundtrip.txt (20 tensors at cos_sim=1.000000)
- ship-004-llama-cli-stdout.txt (llama.cpp first response on canonical GGUF)
- ship-009-apr-inspect.txt (license + provenance fields)
- ship-010-sha256-match.json + ship-010-hf-tree.json (sha256 match)

Refs:
- AC-SHIP1-001 through AC-SHIP1-010 (spec §5)
- §71 (SHIP-005 LIVE-DISCHARGED, predecessor)
- §63 (SHIP-007 multi-PR cascade scope)
- contracts/eval-harness-humaneval-v1.yaml + contracts/apr-publish-hf-large-file-v1.yaml + contracts/apr-provenance-v1.yaml (PARTIAL_ALGORITHM_LEVEL → LIVE-DISCHARGED)

Closes tasks #59-63.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
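The SHIP-010 check compares the SHA-256 of the local teacher file on gx10 with the `lfs.oid` reported by the Hugging Face tree API, and the evidence also records the byte size (8035635524 bytes). A minimal Python sketch of the local half, assuming the expected oid and size have already been read from the tree API response; the function names are hypothetical, and the recorded evidence was captured with `curl` + `sha256sum`, not with this code.

```python
import hashlib
from pathlib import Path
from typing import BinaryIO


def sha256_hex(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Stream a binary file through SHA-256 without loading it all into memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def file_matches_lfs_oid(path: Path, expected_oid: str, expected_size: int) -> bool:
    """True if the local file has the expected byte size and SHA-256 (LFS oid)."""
    # Size check first: it is cheap and rules out truncated downloads
    # before hashing a multi-GB file.
    if path.stat().st_size != expected_size:
        return False
    with path.open("rb") as f:
        return sha256_hex(f) == expected_oid.strip().lower()
```

For the canonical teacher, `expected_size` would be 8035635524 and `expected_oid` the full digest from the tree API; the digest is elided (`0a854098…`) in the evidence above, so it is not reproduced here.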

…T-CODE-V0-33-0-RELEASE-PREP) (#1653)

🎉 v0.33.0 marks **MODEL-1 SHIP % = 100%** for SHIP-TWO-001. All 10 AC-SHIP1-* falsifiers are LIVE-discharged on the canonical 7B Qwen2.5-Coder-Instruct Q4_K_M teacher (lambda-vector RTX 4090, --features cuda).

This release prep PR ships:

1. CHANGELOG.md [0.33.0] entry with §69-§75 highlights:
   - 🎉 MODEL-1 SHIP % = 100% (all 10 AC-SHIP1-* LIVE)
   - Fixed: SHIP-007 F32 GEMV PTX layout (PR #1651, §75): 124.6 tok/s
   - Fixed: SHIP-005 HumanEval RC3 (PR #1635, §70/§71): pass@1 86.59%
   - Added: APR_EVAL_DEBUG=1 diagnostic surface (PR #1634)
   - Added: APR_GPU_STAGE_DUMP=<dir> diagnostic surface (PR #1649)
   - Added: MBPP harness H4 fix (PR #1645)
   - Added: 2 new falsifiable contracts (apr-eval-humaneval-harness-invariant v1.1.0, apr-ship-007-gpu-stage-bisection v1.0.0)
   - Methodology lessons #16-22 captured in MEMORY.md
   - Spec: v3.13.0 → v3.21.0 across §67-§75
2. Workspace version bump:
   - [workspace.package].version: 0.32.0 → 0.33.0
   - Root [package].version (aprender facade crate): 0.32.0 → 0.33.0
   - 28 sub-crate version literals: 0.32.0 → 0.33.0
3. `cargo check -p aprender` → clean (workspace builds at 0.33.0).

Out of scope for this PR (separate steps after #1651/1652 land and this PR lands):

- Tag release `v0.33.0` on main
- Cascade publish to crates.io (per memory project_ship_two_001_v0_32_0_release.md: 15 user-facing crates + 7 internal-tier crates in topological dependency order, using `make publish CRATE=<name>`)
- Post-publish QA per `feedback_post_publish_qa_required.md`: `cargo install aprender --force` plus a `/dogfood` GO verdict are required before declaring the release done (v0.31.1 was yanked for skipping this)
- GitHub Release with §75 narrative
- HF artifact verification (paiml/qwen2.5-coder-7b-apache-q4k-v1 sha256 already verified by §72 SHIP-010 LIVE evidence; double-check before the release announcement)

This PR ships ONLY the version bump + CHANGELOG. Publishing is the next step after merge.
Refs:
- §75 MODEL-1 100% (PR #1652)
- §74 SHIP-007 bug localized (PR #1650)
- §73 SHIP-007 cascade reduction (PR #1647)
- §72 5-AC LIVE cascade (PR #1646)
- §71 SHIP-005 LIVE-DISCHARGED (PR #1642)
- §70 RC3 fix (PR #1636)
- §69 Q4K hypothesis falsified (PR #1633)
- PR #1635 RC3 prepend
- PR #1634 diagnostic surface + contract
- PR #1648 SHIP-007 contract scaffold
- PR #1649 SHIP-007 PR-B stage dump
- PR #1651 SHIP-007 PR-E F32 GEMV layout fix

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
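The release steps publish crates to crates.io in topological dependency order via `make publish CRATE=<name>`. A minimal Python sketch of deriving such an order with the standard-library `graphlib` (Python 3.9+), assuming the workspace dependency edges are already known; crate names other than the `aprender` facade are hypothetical, and this is not the project's actual publish tooling.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def publish_order(deps: Dict[str, Set[str]]) -> List[str]:
    """Order crates so every crate is published after its workspace dependencies.

    `deps` maps crate name -> names of workspace crates it depends on.
    """
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # graphlib stores the detected cycle as the second element of err.args.
        raise ValueError(f"dependency cycle, cannot publish: {err.args[1]}") from err


def publish_commands(order: List[str]) -> List[str]:
    """Render one `make publish` invocation per crate, in publish order."""
    return [f"make publish CRATE={crate}" for crate in order]


# Hypothetical three-crate workspace; real crate names and edges will differ.
example_deps = {
    "aprender": {"example-core", "example-io"},
    "example-io": {"example-core"},
    "example-core": set(),
}
for cmd in publish_commands(publish_order(example_deps)):
    print(cmd)
# prints:
# make publish CRATE=example-core
# make publish CRATE=example-io
# make publish CRATE=aprender
```

Ordering by dependencies, rather than alphabetically, matters because a crate cannot be published until the exact versions of its workspace dependencies already exist on the registry.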

noahgift enabled auto-merge (squash) May 12, 2026 17:26

noahgift force-pushed the feat/ship-001-003-004-009-010-live-evidence branch from 8ed8f0e to 71dcef6 May 12, 2026 21:38

Merge branch 'main' into feat/ship-001-003-004-009-010-live-evidence (974883e)

noahgift merged commit 91cd0a1 into main May 12, 2026
10 checks passed

noahgift deleted the feat/ship-001-003-004-009-010-live-evidence branch May 12, 2026 23:06

noahgift mentioned this pull request May 13, 2026, in:

fix(task-148): Toyota Way 500-line refactor + FALSIFY-CORPUS-004 + QLoRA + GPU training backend #1003 (Closed, 6 tasks)

noahgift merged 2 commits into main from feat/ship-001-003-004-009-010-live-evidence
