docs(spec)+evidence: SHIP-TWO-001 §72 — 5-AC LIVE cascade SHIP-001/003/004/009/010 (MODEL-1 95% → 99%)#1646

Merged: noahgift merged 2 commits into main from feat/ship-001-003-004-009-010-live-evidence on May 12, 2026

@noahgift

Summary

Five ACs (SHIP-001/003/004/009/010) had merged falsifier tests at PARTIAL_ALGORITHM_LEVEL but no LIVE evidence on the canonical 7B Qwen2.5-Coder-Instruct Q4K teacher. §72 captures all 5 in a single ~30-min evidence-only cascade. No code changes.

After §72: 9 of 10 AC-SHIP1-* LIVE-discharged. Only SHIP-007 (multi-PR CUDA cascade per §63) remains.

Evidence captured

| AC | LIVE method | Result |
|---|---|---|
| SHIP-001 | `apr run <safetensors>` | exit code 0, 62.55s load |
| SHIP-003 | `apr diff <safetensors> <q4k.apr> --values` | 20 tensors at cos_sim=1.000000 (floor 0.999) |
| SHIP-004 | `llama-cli -m <q4k.gguf>` | exit 0, "Hello! How can I help you today", 133.1 gen tok/s |
| SHIP-009 | `apr inspect <q4k.apr> \| grep license` | license: Apache-2.0, data_source: huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct |
| SHIP-010 | `curl` HF tree + `sha256sum` on gx10 | 0a854098… == HF lfs.oid 0a854098… |
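
The SHIP-003 row reports a per-tensor cosine similarity checked against a floor of 0.999. As a rough illustration of what such a floor check computes, here is a minimal Python sketch. It is not the `apr diff` implementation, whose internals this PR does not show; the function names and the toy tensors are hypothetical.

```python
import math

COS_SIM_FLOOR = 0.999  # floor quoted in the SHIP-003 evidence row


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length flat tensors."""
    if len(a) != len(b):
        raise ValueError("tensors must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero tensor")
    return dot / (norm_a * norm_b)


def tensors_below_floor(pairs, floor=COS_SIM_FLOOR):
    """Return the names of tensor pairs whose similarity is below the floor."""
    return [
        name
        for name, (reference, candidate) in pairs.items()
        if cosine_similarity(reference, candidate) < floor
    ]


# Toy data standing in for (reference, quantized) weight pairs (hypothetical).
pairs = {
    "layer0.weight": ([1.0, 2.0, 3.0], [1.0, 2.0, 3.01]),
    "layer1.weight": ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
}
failing = tensors_below_floor(pairs)
print(failing)
```

An empty list would correspond to the "all sampled tensors at or above the floor" outcome recorded for SHIP-003.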

Ship-% movement

  • MODEL-1: 95% → 99% (9/10 AC-SHIP1-* LIVE-discharged)
  • Path to 100%: SHIP-007 multi-PR CUDA cascade per §63 — needs RTX 4090 / lambda-vector
  • MODEL-2: unchanged at 57%
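
The "9/10 LIVE-discharged" count can be reproduced from the per-AC statuses listed in the commit message for this PR. The sketch below only tallies those statuses; it deliberately does not derive the ship percentage, because the weighting behind 95% → 99% is not stated here. The dictionary layout and the `summarize` helper are assumptions made for illustration.

```python
from collections import Counter

# Status per AC, transcribed from the post-§72 chain in the commit message.
AC_STATUS = {
    "SHIP-001": "LIVE-DISCHARGED",
    "SHIP-002": "LIVE-DISCHARGED",
    "SHIP-003": "LIVE-DISCHARGED",
    "SHIP-004": "LIVE-DISCHARGED",
    "SHIP-005": "LIVE-DISCHARGED",
    "SHIP-006": "LIVE-DISCHARGED",
    "SHIP-007": "PARTIAL",
    "SHIP-008": "LIVE-DISCHARGED",
    "SHIP-009": "LIVE-DISCHARGED",
    "SHIP-010": "LIVE-DISCHARGED",
}


def summarize(status_by_ac):
    """Return (live count, total count, sorted list of ACs not yet LIVE)."""
    counts = Counter(status_by_ac.values())
    remaining = sorted(
        ac for ac, status in status_by_ac.items() if status != "LIVE-DISCHARGED"
    )
    return counts["LIVE-DISCHARGED"], len(status_by_ac), remaining


live, total, remaining = summarize(AC_STATUS)
print(f"{live} of {total} AC-SHIP1-* LIVE-discharged; remaining: {remaining}")
```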

Methodology lesson #19 NEW

Algorithm-level falsifiers + small evidence runs collapse PARTIAL→LIVE in batches. When ACs are PARTIAL because of missing live evidence (not missing algorithm), batch-discharge them in one cascade — highest-ROI move because the algorithms are already merged.

Test plan

  • All 5 LIVE evidence runs captured on lambda-vector / gx10
  • Each evidence file archived under evidence/section-72-ship-live-cascade-2026-05-12/
  • Spec v3.17.0 → v3.18.0 with §72 narrative
  • No code changes; CI is docs/evidence-only
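
The SHIP-010 evidence compares a local `sha256sum` digest with the `lfs.oid` reported by the Hugging Face tree listing. A minimal Python sketch of that comparison follows. It assumes each tree entry carries an `lfs` object with `oid` and `size` fields; the helper names and the demo file are hypothetical, and the real evidence was captured with `curl` and `sha256sum` rather than this script.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so multi-GB artifacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_lfs_entry(path, tree_entry):
    """True if the local file's size and SHA-256 match the entry's LFS metadata.

    Assumes tree_entry looks like {"lfs": {"oid": <hex digest>, "size": <bytes>}}.
    """
    lfs = tree_entry["lfs"]
    if os.path.getsize(path) != lfs["size"]:
        return False  # cheap size check before hashing the whole file
    return sha256_of_file(path) == lfs["oid"].lower()


# Demo on a throwaway file; a real run would point at the downloaded artifact.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name

good_entry = {"lfs": {"oid": hashlib.sha256(b"hello").hexdigest(), "size": 5}}
bad_entry = {"lfs": {"oid": "0" * 64, "size": 5}}
ok = matches_lfs_entry(demo_path, good_entry)
bad = matches_lfs_entry(demo_path, bad_entry)
os.remove(demo_path)
print(ok, bad)
```

Checking the size first is a design choice: a size mismatch rejects a truncated download without hashing several gigabytes.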

Refs

  • §71 (SHIP-005 LIVE-DISCHARGED predecessor)
  • §63 (SHIP-007 multi-PR cascade scope — remaining 1pp)
  • AC-SHIP1-001..010 (spec §5)

🤖 Generated with Claude Code

@noahgift noahgift enabled auto-merge (squash) May 12, 2026 17:26
…IP-001/003/004/009/010 PARTIAL→LIVE-DISCHARGED (PMAT-CODE-SHIP-TWO-SECTION-72)

Closes 5 of the 6 algorithm-level PARTIALs left after §71 closed SHIP-005.
Only SHIP-007 (multi-PR CUDA cascade per §63) remains as a PARTIAL.

The cascade is EVIDENCE-ONLY — no code changes. Five ACs already had
falsifier tests at PARTIAL_ALGORITHM_LEVEL (`#[test]`s merged); they
just lacked LIVE-evidence runs on the canonical 7B Qwen2.5-Coder-
Instruct teacher.

Evidence captured (lambda-vector, RTX 4090, post-§71 main binary):

  SHIP-001  apr run <safetensors> --prompt 'Hello' --max-tokens 4
            → exit 0, 62.55s load via realizar
  SHIP-003  apr diff <safetensors> <q4k.apr> --values --filter weight
            --limit 20 --transpose-aware
            → 20 tensors at cos_sim=1.000000 (floor 0.999)
  SHIP-004  llama-cli -m <q4k.gguf> -p 'Hello' -n 8 -ngl 99 -st
            → exit 0, "Hello! How can I help you today",
              133.1 gen tok/s, model 5580 MiB on RTX 4090
  SHIP-009  apr inspect <q4k.apr>
            → license: Apache-2.0,
              data_source: huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct
  SHIP-010  curl HF tree API + sha256sum on gx10 canonical teacher
            → 0a854098… == HF lfs.oid 0a854098…, 8035635524 bytes

§17.5 + AC-SHIP1 chain post-§72:

  SHIP-001  LIVE-DISCHARGED ← §72
  SHIP-002  LIVE-DISCHARGED (#1609 §61)
  SHIP-003  LIVE-DISCHARGED ← §72
  SHIP-004  LIVE-DISCHARGED ← §72
  SHIP-005  LIVE-DISCHARGED (§71)
  SHIP-006  LIVE-DISCHARGED (#1615 §61.8)
  SHIP-007  PARTIAL — multi-PR CUDA cascade (§63)
  SHIP-008  LIVE-DISCHARGED (#1614 §61)
  SHIP-009  LIVE-DISCHARGED ← §72
  SHIP-010  LIVE-DISCHARGED ← §72

9 of 10 AC-SHIP1-* LIVE-discharged.

Ship-% movement:
  MODEL-1 ship %: 95% → 99% (5 algorithm-level PARTIALs → LIVE)
  Path to 100% = SHIP-007 multi-PR CUDA cascade per §63:
    Layer 1: cuBLASLt FP8 JIT warmup ILLEGAL_ADDRESS root fix
    Layer 2: CUDA-vs-CPU parity (cosine -0.005 on Qwen 7B dims)
    Layer 3: throughput 5.6 → 30 tok/s
    Host: RTX 4090 / lambda-vector (gx10 is wrong arch)
  MODEL-2 ship %: unchanged at 57%

Methodology lesson #19 NEW: algorithm-level falsifiers + small evidence
runs collapse PARTIAL→LIVE in batches. When ACs are PARTIAL because of
missing live evidence (not missing algorithm), batch-discharge in one
cascade rather than treating each as separate ship-row work. The 95→99%
jump is the highest-ROI move because the algorithms are already merged.

Spec v3.17.0 → v3.18.0.

Evidence:
- evidence/section-72-ship-live-cascade-2026-05-12/findings.json
- ship-001-apr-run-safetensors.txt (exit 0 + 62.55s load)
- ship-003-apr-diff-q4k-roundtrip.txt (20 tensors at cos_sim=1.000000)
- ship-004-llama-cli-stdout.txt (llama.cpp first-response on canonical GGUF)
- ship-009-apr-inspect.txt (license + provenance fields)
- ship-010-sha256-match.json + ship-010-hf-tree.json (sha256 match)

Refs:
- AC-SHIP1-001 through AC-SHIP1-010 (spec §5)
- §71 (SHIP-005 LIVE-DISCHARGED, predecessor)
- §63 (SHIP-007 multi-PR cascade scope)
- contracts/eval-harness-humaneval-v1.yaml + contracts/apr-publish-hf-large-file-v1.yaml + contracts/apr-provenance-v1.yaml (PARTIAL_ALGORITHM_LEVEL → LIVE-DISCHARGED)

Closes tasks #59-63.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift force-pushed the feat/ship-001-003-004-009-010-live-evidence branch from 8ed8f0e to 71dcef6 Compare May 12, 2026 21:38
@noahgift noahgift merged commit 91cd0a1 into main May 12, 2026
10 checks passed
@noahgift noahgift deleted the feat/ship-001-003-004-009-010-live-evidence branch May 12, 2026 23:06
noahgift added a commit that referenced this pull request May 13, 2026
…T-CODE-V0-33-0-RELEASE-PREP)

🎉 v0.33.0 marks **MODEL-1 SHIP % = 100%** for SHIP-TWO-001.

All 10 AC-SHIP1-* falsifiers are LIVE-discharged on the canonical
7B Qwen2.5-Coder-Instruct Q4_K_M teacher (lambda-vector RTX 4090,
--features cuda).

This release prep PR ships:
1. CHANGELOG.md [0.33.0] entry with §69-§75 highlights:
   - 🎉 MODEL-1 SHIP % = 100% (all 10 AC-SHIP1-* LIVE)
   - Fixed: SHIP-007 F32 GEMV PTX layout (PR #1651, §75) — 124.6 tok/s
   - Fixed: SHIP-005 HumanEval RC3 (PR #1635, §70/§71) — pass@1 86.59%
   - Added: APR_EVAL_DEBUG=1 diagnostic surface (PR #1634)
   - Added: APR_GPU_STAGE_DUMP=<dir> diagnostic surface (PR #1649)
   - Added: MBPP harness H4 fix (PR #1645)
   - Added: 2 new falsifiable contracts (apr-eval-humaneval-harness-
     invariant v1.1.0, apr-ship-007-gpu-stage-bisection v1.0.0)
   - Methodology lessons #16-22 captured in MEMORY.md
   - Spec: v3.13.0 → v3.21.0 across §67-§75

2. Workspace version bump:
   - [workspace.package].version: 0.32.0 → 0.33.0
   - Root [package].version (aprender facade crate): 0.32.0 → 0.33.0
   - 28 sub-crate version literals: 0.32.0 → 0.33.0

3. `cargo check -p aprender` → clean (workspace builds at 0.33.0).

Out of scope for this PR (separate steps after #1651/1652 land + this
PR lands):
- Tag release `v0.33.0` on main
- Cascade publish to crates.io (per memory project_ship_two_001_v0_32_0_release.md
  — 15 user-facing crates + 7 internal-tier in topological dependency
  order; uses `make publish CRATE=<name>`)
- Post-publish QA per `feedback_post_publish_qa_required.md` —
  `cargo install aprender --force` + `/dogfood` GO verdict required
  before declaring release done (v0.31.1 was yanked for skipping this)
- GitHub Release with §75 narrative
- HF artifact verification (paiml/qwen2.5-coder-7b-apache-q4k-v1 sha256
  already verified by §72 SHIP-010 LIVE evidence; double-check before
  release announcement)

This PR ships ONLY the version-bump + CHANGELOG. Publishing is the
next step after merge.

Refs:
- §75 MODEL-1 100% (PR #1652)
- §74 SHIP-007 bug localized (PR #1650)
- §73 SHIP-007 cascade reduction (PR #1647)
- §72 5-AC LIVE cascade (PR #1646)
- §71 SHIP-005 LIVE-DISCHARGED (PR #1642)
- §70 RC3 fix (PR #1636)
- §69 Q4K hypothesis falsified (PR #1633)
- PR #1635 RC3 prepend
- PR #1634 diagnostic surface + contract
- PR #1648 SHIP-007 contract scaffold
- PR #1649 SHIP-007 PR-B stage dump
- PR #1651 SHIP-007 PR-E F32 GEMV layout fix

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 13, 2026
…T-CODE-V0-33-0-RELEASE-PREP) (#1653)
