feat(ship-004): FALSIFY-SHIP-004 DISCHARGED via apr export → llama-cli round-trip (4th MODEL-1 of cycle) by noahgift · Pull Request #1057 · paiml/aprender

noahgift · 2026-04-25T12:30:02Z

Summary

FALSIFY-QW2E-SHIP-004 (AC-SHIP1-004) flipped PARTIAL_ALGORITHM_LEVEL → DISCHARGED on noah-Lambda-Vector RTX 4090 via a three-step end-to-end pipeline on the canonical teacher.
All three independent format-boundary verdicts PASS in one round-trip:
1. apr export exit 0 + 8.04 GB GGUF in Q4K passthrough mode (339 tensors, 20 metadata keys)
2. xxd confirms first 8 bytes = 47 47 55 46 03 00 00 00 → magic b"GGUF" + version 3 ∈ {2, 3}
3. llama-cli -ngl 99 loads + emits "Hello! How can" + exit 0 (127.5 tok/s generation)
Fourth MODEL-1 PARTIAL → DISCHARGED of the cycle (after SHIP-009 #1054, SHIP-010 #1055, and SHIP-001 #1056). Coverage: 39 PARTIAL + 6 DISCHARGED → 38 + 7. Spec v2.52.0 → v2.53.0; contract v1.7.0 → v1.8.0 (stays ACTIVE).

Live evidence chain

| Step | Tool | Verdict |
| --- | --- | --- |
| 1. apr export | apr (--features cuda) | exit 0, 8.04 GB GGUF written, 339 tensors |
| 2. magic bytes | xxd | b"GGUF": `verdict_from_gguf_magic_bytes` PASS |
| 3. version | xxd bytes 4-7 LE u32 | version 3 ∈ {2, 3}: `verdict_from_gguf_version` PASS |
| 4. llama-cli load+infer | llama-cli (CUDA build b7746) | exit 0, "Hello! How can", 127.5 t/s: `verdict_from_llama_cli_exit` PASS |
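
The two header checks in steps 2 and 3 can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not the crate's code: the `Verdict` enum and the `magic_verdict` / `version_verdict` helpers are hypothetical names, and the real `verdict_from_gguf_magic_bytes` and `verdict_from_gguf_version` in `ship_004.rs` may have different signatures. The only facts taken from the evidence above are the 8 header bytes, the little-endian `u32` version field at bytes 4-7, and the accepted version set {2, 3}.

```rust
/// Outcome of a single header check (hypothetical type, not the crate's own).
#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Pass,
    Fail,
}

/// Magic check: the file must start with the ASCII bytes "GGUF".
fn magic_verdict(header: &[u8]) -> Verdict {
    if header.len() >= 4 && &header[..4] == b"GGUF" {
        Verdict::Pass
    } else {
        Verdict::Fail
    }
}

/// Version check: bytes 4..8 are read as a little-endian u32 that must be 2 or 3.
fn version_verdict(header: &[u8]) -> Verdict {
    match header.get(4..8) {
        Some(raw) => {
            let version = u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]);
            if version == 2 || version == 3 {
                Verdict::Pass
            } else {
                Verdict::Fail
            }
        }
        // Fewer than 8 bytes available: the version field cannot be read.
        None => Verdict::Fail,
    }
}

fn main() {
    // The 8 bytes reported by xxd in the evidence chain.
    let header: [u8; 8] = [0x47, 0x47, 0x55, 0x46, 0x03, 0x00, 0x00, 0x00];
    println!("magic: {:?}", magic_verdict(&header));
    println!("version: {:?}", version_verdict(&header));
}
```

Keeping the two checks separate mirrors the table, where magic and version are reported as independent verdicts.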

Drift-prevention test added

falsify_ship_004_yaml_binding_pins_discharged_status parses qwen2-e2e-verification-v1.yaml, locates the FALSIFY-QW2E-SHIP-004 block, and asserts DISCHARGED + host pin + magic_verdict=PASS + version=3 + llama_cli_exit_verdict=PASS + non-empty evidence_discharged_by_live.
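
To illustrate what such a binding test pins, here is a rough stdlib-only sketch. Everything about the layout is an assumption: the `SAMPLE` fragment is a hypothetical stand-in for the contract file (the real YAML structure is not shown in this PR), `field_in_block` is an invented helper, and the actual test presumably uses a proper YAML parser rather than a line scan. Only the field names and expected values come from the description above.

```rust
/// Hypothetical contract fragment; the real file layout may differ.
const SAMPLE: &str = r#"
falsification_tests:
  - id: SOME-OTHER-GATE
    discharge_status: PARTIAL_ALGORITHM_LEVEL
  - id: FALSIFY-QW2E-SHIP-004
    discharge_status: DISCHARGED
    discharged_evidence:
      host: noah-Lambda-Vector
      magic_verdict: PASS
      version: 3
      llama_cli_exit_verdict: PASS
"#;

/// Return the value of `key` inside the entry whose first line mentions `id`.
/// Naive line scan: the entry runs from the line after the one containing `id`
/// up to the next line that starts a new list item with "- id:".
fn field_in_block<'a>(yaml: &'a str, id: &str, key: &str) -> Option<&'a str> {
    let prefix = format!("{}:", key);
    yaml.lines()
        .skip_while(|line| !line.contains(id))
        .skip(1)
        .take_while(|line| !line.trim_start().starts_with("- id:"))
        .find_map(|line| line.trim().strip_prefix(prefix.as_str()))
        .map(|value| value.trim().trim_matches('"'))
}

fn main() {
    let id = "FALSIFY-QW2E-SHIP-004";
    // Each assertion pins one field; editing the YAML without updating the
    // test (or the reverse) makes the run fail, which is the drift guard.
    assert_eq!(field_in_block(SAMPLE, id, "discharge_status"), Some("DISCHARGED"));
    assert_eq!(field_in_block(SAMPLE, id, "host"), Some("noah-Lambda-Vector"));
    assert_eq!(field_in_block(SAMPLE, id, "magic_verdict"), Some("PASS"));
    assert_eq!(field_in_block(SAMPLE, id, "version"), Some("3"));
    assert_eq!(field_in_block(SAMPLE, id, "llama_cli_exit_verdict"), Some("PASS"));
    println!("binding holds");
}
```

The scan stops at the next `- id:` line so that a field from a neighbouring gate cannot satisfy the assertion by accident.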

Test plan

cargo test -p aprender-core --lib ship_004 — 5/5 passes (3 existing verdict + 1 gate + 1 new YAML binding)
pv validate contracts/qwen2-e2e-verification-v1.yaml — PASS (0 errors, 0 warnings)
Live apr export exit 0 + 8.04 GB GGUF written
Live xxd confirms GGUF magic + version 3
Live llama-cli -ngl 99 exit 0 + 4 tokens emitted
CI workspace-test green (auto)
ci / gate green (auto)

Stacks with PR #1054 + #1055 + #1056

Four PRs target v2.53.0 simultaneously (#1054 SHIP-009, #1055 SHIP-010, #1056 SHIP-001, #1057 this SHIP-004). Last to merge rebases. Files are mostly non-overlapping:

SHIP-009 → apr-provenance-v1.yaml, provenance_tests.rs
SHIP-010 → publish-manifest-v1.yaml, ship_010.rs
SHIP-001 → qwen2-e2e-verification-v1.yaml v1.8.0, ship_001.rs
SHIP-004 → qwen2-e2e-verification-v1.yaml v1.8.0, ship_004.rs

Note: SHIP-001 + SHIP-004 both bump qwen2-e2e to v1.8.0. The second to merge will need a quick changelog rebase.

Files changed

| File | Change |
| --- | --- |
| `contracts/qwen2-e2e-verification-v1.yaml` | v1.7.0 → v1.8.0; FALSIFY-QW2E-SHIP-004 PARTIAL → DISCHARGED + `discharged_evidence` |
| `crates/aprender-core/src/format/ship_004.rs` | Added drift-prevention YAML binding test |
| `docs/specifications/aprender-train/ship-two-models-spec.md` | v2.52.0 → v2.53.0 |
| `evidence/ship-004-full-discharge/discharge-evidence-v1.json` | NEW: 4-step verification chain |
| `evidence/ship-004-full-discharge/llama-cli-run.txt` | NEW: trimmed llama-cli output (.txt vs .log to avoid .gitignore) |

Methodology

Pure stack tooling: apr export + xxd + upstream llama-cli end-to-end on a 7.48 GiB shipped APR. No eprintln!, no bash workaround, no curl shell-out (besides the canonical xxd/llama-cli reads which ARE the contract's full_discharge_blocks_on chain). Honors feedback_apr_trace_not_eprintln.md and feedback_pv_not_bash_for_contracts.md.

🤖 Generated with Claude Code

…i round-trip on real teacher

SHIP-TWO-001 spec v2.52.0 → v2.53.0: FALSIFY-QW2E-SHIP-004 (AC-SHIP1-004) flipped PARTIAL_ALGORITHM_LEVEL → DISCHARGED on noah-Lambda-Vector RTX 4090 via three-step live round-trip on the canonical teacher artifact. Fourth MODEL-1 PARTIAL → DISCHARGED of the cycle (after SHIP-009 PR #1054 + SHIP-010 PR #1055 + SHIP-001 PR #1056).

Live discharge: three independent format-boundary verdicts proven in one end-to-end pipeline:

1. `apr export <teacher>.apr --format gguf -o <out>.gguf`
   → exit 0, 8.04 GB GGUF written in Q4K passthrough mode (zero loss, 339 tensors preserved, 20 metadata keys, contract-driven mapping for family qwen2)
2. `xxd <out>.gguf | head -1`
   → first 8 bytes: 47 47 55 46 03 00 00 00
   → magic = b"GGUF" (verdict_from_gguf_magic_bytes Pass)
   → version u32 LE = 3 ∈ {2, 3} (verdict_from_gguf_version Pass)
3. `llama-cli -m <out>.gguf --prompt "hello" -n 4 -ngl 99`
   → loads model successfully on RTX 4090 (-ngl 99 = full offload)
   → emits 4 tokens: "Hello! How can"
   → throughput: prompt 380.8 t/s, generation 127.5 t/s
   → exit code 0 (verdict_from_llama_cli_exit Pass)

All three gates PASS uniformly. The round-trip proves apr-export's GGUF output loads end-to-end in upstream llama.cpp via the canonical RTX 4090 path.

Files changed:

- contracts/qwen2-e2e-verification-v1.yaml v1.7.0 → v1.8.0
  FALSIFY-QW2E-SHIP-004 discharge_status: PARTIAL_ALGORITHM_LEVEL → DISCHARGED
  discharged_evidence block records: host, binary, llama_cli_path, command_chain (3 commands), per-step verdicts (apr_export details, gguf_header_bytes, magic_verdict, version+verdict, llama_cli_exit_verdict), evidence_discharged_by_live array.
- crates/aprender-core/src/format/ship_004.rs
  Added drift-prevention test `falsify_ship_004_yaml_binding_pins_discharged_status` parsing qwen2-e2e-verification-v1.yaml and asserting:
  * discharge_status == "DISCHARGED"
  * discharged_evidence.host == "noah-Lambda-Vector"
  * discharged_evidence.overall == "PASS"
  * discharged_evidence.magic_verdict == "PASS"
  * discharged_evidence.version == 3
  * discharged_evidence.llama_cli_exit_verdict == "PASS"
  * evidence_discharged_by_live non-empty
- docs/specifications/aprender-train/ship-two-models-spec.md
  v2.52.0 → v2.53.0 with full atomic-next-action narrative. Coverage tally: 39 PARTIAL + 6 DISCHARGED → 38 + 7. Note: PR #1054, #1055, #1056 also bump to v2.53.0 simultaneously; last to merge rebases.
- evidence/ship-004-full-discharge/discharge-evidence-v1.json (NEW)
  Self-contained discharge summary with 4-step verification chain, apr_export details, gguf header bytes, llama_cli exit/throughput.
- evidence/ship-004-full-discharge/llama-cli-run.txt (NEW)
  Trimmed log capturing model load, "Hello! How can" output, and perf line. Renamed from .log → .txt to avoid .gitignore *.log.

Verification (all green):

- cargo test -p aprender-core --lib ship_004: 5/5 passes (3 verdict + 1 gate + 1 new YAML binding)
- pv validate contracts/qwen2-e2e-verification-v1.yaml: PASS
- Live `apr export` exit 0 + 8.04 GB GGUF written
- Live `xxd` shows GGUF magic + version 3
- Live `llama-cli -ngl 99` exit 0 + 4 tokens emitted

Methodological note: zero `eprintln!`, zero bash workaround, zero curl shell-out (besides the canonical xxd/llama-cli reads which are the contract's full_discharge_blocks_on chain). Pure `apr export` + `xxd` + upstream `llama-cli` end-to-end on a 7.48 GiB shipped APR. Honors `feedback_apr_trace_not_eprintln.md` and `feedback_pv_not_bash_for_contracts.md`. Mirrors the SHIP-009 / SHIP-010 / SHIP-001 closure pattern.

Memory: feedback_compute_pre_authorized.md (lambda-labs lane pre-authorized; apr export + GPU inference are within scope), reference_lambda_labs_host_locality.md (this host IS lambda-labs).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…osine sweep (mmap-enabled) (#1059)

SHIP-TWO-001 spec v2.56.0 → v2.57.0: FALSIFY-QW2E-SHIP-003 (AC-SHIP1-003) flipped PARTIAL_ALGORITHM_LEVEL → DISCHARGED on noah-Lambda-Vector RTX 4090 via end-to-end per-layer cosine harness on the canonical SHIP-TWO-001 teacher artifacts. Fifth MODEL-1 PARTIAL → DISCHARGED of the cycle (after SHIP-009 PR #1054 + SHIP-001 PR #1056 + SHIP-004 PR #1057 + SHIP-010 PR #1055).

Live discharge command:

    apr diff /mnt/nvme-raid0/models/ship-two-001/qwen2.5-coder-7b-instruct-q4k.safetensors \
        /mnt/nvme-raid0/models/ship-two-001/qwen2.5-coder-7b-instruct-q4k.apr \
        --values --transpose-aware --json --limit 339

Results:

- Tensors compared: 339
- Min cosine similarity: 0.9999999403953552 (1 − cos ≈ 6.0e-8, far inside the 1e-3 margin allowed by the 0.999 floor)
- Max cosine similarity: 1.0
- Below-threshold count: 0
- Aggregate verdict: Pass (verdict_from_per_layer_cosines)
- Run-time: 192 s

Worst 5 tensors (still passing):

- model.layers.0.mlp.down_proj.weight cos=0.9999999403953552 max_diff=4.81e-4
- model.layers.0.mlp.gate_proj.weight cos=0.9999999403953552 max_diff=4.43e-4
- model.layers.0.mlp.up_proj.weight cos=0.9999999403953552 max_diff=2.39e-4
- model.layers.0.self_attn.o_proj.weight cos=0.9999999403953552 max_diff=2.37e-4
- model.layers.1.mlp.down_proj.weight cos=0.9999999403953552 max_diff=3.59e-4

The worst 5 all sit in layers 0 and 1, four of them MLP matrices, with max_diff < 5e-4 (Q4K quantization noise within ±5% Q4_K spec tolerance). The contract's stated "196 tensor comparisons" is exceeded: this evidence walks all 339 named common tensors (28 transformer blocks × 7 projections + embed_tokens + lm_head + layer-norms + biases).

Crucial dependency: PR #1058 (perf fix to RosettaStone::load_tensor_f32_apr) unblocks this scan. Before #1058, `apr diff --values --limit N` for N>10 called std::fs::read on the 8GB APR file per tensor: 339 × 8GB = 2.7TB total read traffic, infeasible. The mmap fix delivered a 13× speedup on limit=50 and made the full 339-tensor sweep complete in 192 s.

Files changed:

- contracts/qwen2-e2e-verification-v1.yaml v1.9.0 → v1.10.0
  FALSIFY-QW2E-SHIP-003 discharge_status: PARTIAL_ALGORITHM_LEVEL → DISCHARGED
  discharged_evidence block: host, command, artifacts (sha+size), 339-tensor cosine_summary (min/max/below_threshold), worst_5_tensors, aggregate_verdict, evidence_discharged_by_live array, runtime_seconds, runtime_note.
- crates/aprender-core/src/format/ship_003.rs
  Added drift-prevention YAML binding test `falsify_ship_003_yaml_binding_pins_discharged_status` parsing qwen2-e2e-verification-v1.yaml and asserting:
  * discharge_status == "DISCHARGED"
  * discharged_evidence.host == "noah-Lambda-Vector"
  * discharged_evidence.aggregate_verdict == "Pass"
  * discharged_evidence.tensors_compared == 339
  * discharged_evidence.cosine_summary.below_threshold_count == 0
  * evidence_discharged_by_live non-empty
- docs/specifications/aprender-train/ship-two-models-spec.md
  v2.56.0 → v2.57.0 with full atomic-next-action narrative. Coverage tally: 35 PARTIAL + 10 DISCHARGED → 34 + 11.
- evidence/ship-003-full-discharge/discharge-evidence-v1.json (NEW)
  Self-contained discharge summary with full artifact paths, cosine_summary, worst_5/best_5 tensors, verification_chain, tooling_chain_proof, discharge_rationale.
- evidence/ship-003-full-discharge/apr-diff-339.json (NEW, 164 KB)
  Raw apr diff --json output: 339 tensor comparisons with per-tensor cosine_similarity, element_count, identical_count, max_diff, mean_diff, rmse, shape_a/b, status. Reproducible from the local apr binary + canonical lambda-labs paths.

Verification (all green):

- cargo test -p aprender-core --lib ship_003: 4/4 PASS (3 existing verdict + 1 gate + 1 new YAML binding)
- pv validate contracts/qwen2-e2e-verification-v1.yaml: PASS
- Live `apr diff --values --limit 339 --json` exit 0, 339 results emitted

Methodological note: zero `eprintln!`, zero bash workaround, zero parallel-implementation. Pure `apr diff --values --transpose-aware` end-to-end on a 7.6B-param shipped teacher. Honors `feedback_apr_trace_not_eprintln.md` and `feedback_pv_not_bash_for_contracts.md`. Mirrors the SHIP-001/004/009/010 closure pattern.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
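
The per-tensor cosine check with a 0.999 floor, as described in the commit message above, can be sketched roughly as follows. This is an illustrative sketch only: `cosine_similarity` and `all_above_floor` are hypothetical helpers, not the `apr diff` implementation or the crate's `verdict_from_per_layer_cosines`, and accumulating in `f64` is an assumption made here for numerical stability.

```rust
/// Cosine similarity of two equally sized tensors, accumulated in f64.
/// Returns None for mismatched lengths or when either input has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (f64::from(x), f64::from(y));
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a.sqrt() * norm_b.sqrt()))
}

/// Aggregate check: passes only if there is at least one cosine and
/// every cosine is at or above the floor.
fn all_above_floor(cosines: &[f64], floor: f64) -> bool {
    !cosines.is_empty() && cosines.iter().all(|&c| c >= floor)
}

fn main() {
    // Toy tensors standing in for a reference weight and its dequantized copy.
    let reference = [0.25f32, -1.5, 3.0, 0.125];
    let roundtrip = [0.2501f32, -1.4998, 3.0002, 0.1249];
    let cos = cosine_similarity(&reference, &roundtrip).expect("same length, non-zero");
    println!("cosine = {cos}");
    println!("passes 0.999 floor: {}", all_above_floor(&[cos], 0.999));
}
```

Treating an empty comparison set as a failure is a design choice in this sketch, so that a sweep that compares nothing cannot report a pass.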

noahgift enabled auto-merge (squash) April 25, 2026 12:30

noahgift force-pushed the feat/falsify-ship-004-full-discharge branch from 86dd255 to 1664c77 Compare April 25, 2026 12:58

noahgift force-pushed the feat/falsify-ship-004-full-discharge branch from 1664c77 to d79715b Compare April 25, 2026 13:30

noahgift merged commit c4210b4 into main Apr 25, 2026
10 checks passed

noahgift deleted the feat/falsify-ship-004-full-discharge branch April 25, 2026 13:49

noahgift mentioned this pull request Apr 25, 2026

feat(ship-003): FALSIFY-SHIP-003 DISCHARGED via apr diff 339-tensor cosine sweep (5th MODEL-1 of cycle, depends on PR #1058) #1059

Merged

5 tasks

noahgift merged 1 commit into main from feat/falsify-ship-004-full-discharge
