
feat(ship-004): FALSIFY-SHIP-004 DISCHARGED via apr export → llama-cli round-trip (4th MODEL-1 of cycle)#1057

Merged
noahgift merged 1 commit into main from feat/falsify-ship-004-full-discharge on Apr 25, 2026

Conversation

@noahgift


Summary

Live evidence chain

| Step | Tool | Verdict |
|---|---|---|
| 1. apr export | apr (--features cuda) | exit 0, 8.04 GB GGUF written, 339 tensors |
| 2. magic bytes | xxd | b"GGUF"; verdict_from_gguf_magic_bytes PASS |
| 3. version | xxd bytes 4-7 LE u32 | version 3 ∈ {2, 3}; verdict_from_gguf_version PASS |
| 4. llama-cli load+infer | llama-cli (CUDA build b7746) | exit 0, "Hello! How can", 127.5 t/s; verdict_from_llama_cli_exit PASS |
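
The two header checks in steps 2 and 3 can be sketched as follows. This is a minimal, std-only illustration, not the code in `ship_004.rs`: the `Verdict` enum and the function names `magic_verdict`, `gguf_version`, and `version_verdict` are hypothetical stand-ins for the repo's `verdict_from_gguf_magic_bytes` and `verdict_from_gguf_version`, whose real signatures are not shown in this PR.

```rust
/// Outcome of a header check. Hypothetical stand-in for the repo's verdict type.
#[derive(Debug, PartialEq)]
enum Verdict {
    Pass,
    Fail,
}

/// Pass iff the header starts with the four ASCII bytes "GGUF".
fn magic_verdict(header: &[u8]) -> Verdict {
    if header.starts_with(b"GGUF") {
        Verdict::Pass
    } else {
        Verdict::Fail
    }
}

/// Decode bytes 4..8 as a little-endian u32, if the header is long enough.
fn gguf_version(header: &[u8]) -> Option<u32> {
    let b = header.get(4..8)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// Pass iff the decoded version is in the accepted set {2, 3}.
fn version_verdict(header: &[u8]) -> Verdict {
    match gguf_version(header) {
        Some(2) | Some(3) => Verdict::Pass,
        _ => Verdict::Fail,
    }
}

fn main() {
    // The eight bytes reported by `xxd <out>.gguf | head -1` in this PR.
    let header: [u8; 8] = [0x47, 0x47, 0x55, 0x46, 0x03, 0x00, 0x00, 0x00];
    println!("magic:   {:?}", magic_verdict(&header));
    println!("version: {:?} -> {:?}", gguf_version(&header), version_verdict(&header));
}
```

Keeping the two checks separate mirrors the table above, where magic and version each get their own verdict even though both come from the same `xxd` read.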

Drift-prevention test added

falsify_ship_004_yaml_binding_pins_discharged_status parses qwen2-e2e-verification-v1.yaml, locates the FALSIFY-QW2E-SHIP-004 block, and asserts DISCHARGED + host pin + magic_verdict=PASS + version=3 + llama_cli_exit_verdict=PASS + non-empty evidence_discharged_by_live.
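
A dependency-free sketch of what such a binding test checks is shown below. It is an illustration under stated assumptions: the real test may well use a proper YAML parser, and the YAML layout here (list items opened by a `- id:` line, with fields nested below) is a guess at the contract's shape. The helper names `block_for` and `field` are hypothetical; only the field names and expected values come from this PR.

```rust
/// Collect the lines of the list item whose `- id:` line names `id`.
/// ASSUMPTION: every falsification block starts with a `- id: <ID>` line.
fn block_for<'a>(yaml: &'a str, id: &str) -> Vec<&'a str> {
    let mut out = Vec::new();
    let mut inside = false;
    for line in yaml.lines() {
        if let Some(rest) = line.trim_start().strip_prefix("- id:") {
            // A new block begins; we are inside only if it is the one we want.
            inside = rest.trim() == id;
            continue;
        }
        if inside {
            out.push(line);
        }
    }
    out
}

/// First `key: value` in the block, ignoring indentation and double quotes.
fn field<'a>(block: &[&'a str], key: &str) -> Option<&'a str> {
    block.iter().copied().find_map(|line| {
        let rest = line.trim_start().strip_prefix(key)?.strip_prefix(':')?;
        Some(rest.trim().trim_matches('"'))
    })
}

fn main() {
    // Hypothetical excerpt; the real contract file's layout may differ.
    let yaml = r#"
falsification:
  - id: FALSIFY-QW2E-SHIP-003
    discharge_status: PARTIAL_ALGORITHM_LEVEL
  - id: FALSIFY-QW2E-SHIP-004
    discharge_status: DISCHARGED
    discharged_evidence:
      host: "noah-Lambda-Vector"
      magic_verdict: PASS
      version: 3
      llama_cli_exit_verdict: PASS
"#;
    let block = block_for(yaml, "FALSIFY-QW2E-SHIP-004");
    assert_eq!(field(&block, "discharge_status"), Some("DISCHARGED"));
    assert_eq!(field(&block, "host"), Some("noah-Lambda-Vector"));
    assert_eq!(field(&block, "magic_verdict"), Some("PASS"));
    assert_eq!(field(&block, "version"), Some("3"));
    assert_eq!(field(&block, "llama_cli_exit_verdict"), Some("PASS"));
    println!("binding holds");
}
```

The point of pinning these values in a test is that any later edit that silently reverts the contract to PARTIAL, or changes the recorded host or verdicts, fails `cargo test` instead of drifting unnoticed.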

Test plan

  • cargo test -p aprender-core --lib ship_004 — 5/5 passes (3 existing verdict + 1 gate + 1 new YAML binding)
  • pv validate contracts/qwen2-e2e-verification-v1.yaml — PASS (0 errors, 0 warnings)
  • Live apr export exit 0 + 8.04 GB GGUF written
  • Live xxd confirms GGUF magic + version 3
  • Live llama-cli -ngl 99 exit 0 + 4 tokens emitted
  • CI workspace-test green (auto)
  • ci / gate green (auto)

Stacks with PR #1054 + #1055 + #1056

Four PRs target v2.53.0 simultaneously (#1054 SHIP-009, #1055 SHIP-010, #1056 SHIP-001, #1057 this SHIP-004). Last to merge rebases. Files are mostly non-overlapping:

  • SHIP-009 → apr-provenance-v1.yaml, provenance_tests.rs
  • SHIP-010 → publish-manifest-v1.yaml, ship_010.rs
  • SHIP-001 → qwen2-e2e-verification-v1.yaml v1.8.0, ship_001.rs
  • SHIP-004 → qwen2-e2e-verification-v1.yaml v1.8.0, ship_004.rs

Note: SHIP-001 + SHIP-004 both bump qwen2-e2e to v1.8.0. The second to merge will need a quick changelog rebase.

Files changed

| File | Change |
|---|---|
| contracts/qwen2-e2e-verification-v1.yaml | v1.7.0 → v1.8.0; FALSIFY-QW2E-SHIP-004 PARTIAL → DISCHARGED + discharged_evidence |
| crates/aprender-core/src/format/ship_004.rs | Added drift-prevention YAML binding test |
| docs/specifications/aprender-train/ship-two-models-spec.md | v2.52.0 → v2.53.0 |
| evidence/ship-004-full-discharge/discharge-evidence-v1.json | NEW: 4-step verification chain |
| evidence/ship-004-full-discharge/llama-cli-run.txt | NEW: trimmed llama-cli output (.txt vs .log to avoid .gitignore) |

Methodology

Pure stack tooling: apr export + xxd + upstream llama-cli end-to-end on a 7.48 GiB shipped APR. No eprintln!, no bash workaround, no curl shell-out (besides the canonical xxd/llama-cli reads which ARE the contract's full_discharge_blocks_on chain). Honors feedback_apr_trace_not_eprintln.md and feedback_pv_not_bash_for_contracts.md.

🤖 Generated with Claude Code

@noahgift noahgift enabled auto-merge (squash) April 25, 2026 12:30
@noahgift noahgift force-pushed the feat/falsify-ship-004-full-discharge branch from 86dd255 to 1664c77 Compare April 25, 2026 12:58
…i round-trip on real teacher

SHIP-TWO-001 spec v2.52.0 → v2.53.0: FALSIFY-QW2E-SHIP-004 (AC-SHIP1-004)
flipped PARTIAL_ALGORITHM_LEVEL → DISCHARGED on noah-Lambda-Vector RTX 4090
via three-step live round-trip on the canonical teacher artifact. Fourth
MODEL-1 PARTIAL → DISCHARGED of the cycle (after SHIP-009 PR #1054 +
SHIP-010 PR #1055 + SHIP-001 PR #1056).

Live discharge — three independent format-boundary verdicts proven in one
end-to-end pipeline:

1. `apr export <teacher>.apr --format gguf -o <out>.gguf`
   → exit 0, 8.04 GB GGUF written in Q4K passthrough mode
   (zero loss, 339 tensors preserved, 20 metadata keys,
   contract-driven mapping for family qwen2)

2. `xxd <out>.gguf | head -1`
   → first 8 bytes: 47 47 55 46 03 00 00 00
   → magic = b"GGUF" (verdict_from_gguf_magic_bytes Pass)
   → version u32 LE = 3 ∈ {2, 3} (verdict_from_gguf_version Pass)

3. `llama-cli -m <out>.gguf --prompt "hello" -n 4 -ngl 99`
   → loads model successfully on RTX 4090 (-ngl 99 = full offload)
   → emits 4 tokens: "Hello! How can"
   → throughput: prompt 380.8 t/s, generation 127.5 t/s
   → exit code 0 (verdict_from_llama_cli_exit Pass)

All three gates PASS uniformly — round-trip proves apr-export's GGUF output
loads end-to-end in upstream llama.cpp via the canonical RTX 4090 path.
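
Step 3 can be driven from Rust roughly as below. This is a sketch under assumptions, not the harness used for the discharge: it assumes `llama-cli` is on `PATH`, and `llama_cli_args`, `exit_verdict`, and `run_llama_cli` are hypothetical names. The flags are the ones quoted in step 3.

```rust
use std::io;
use std::process::Command;

/// Outcome of the load-and-infer gate. Hypothetical stand-in type.
#[derive(Debug, PartialEq)]
enum Verdict {
    Pass,
    Fail,
}

/// Arguments matching step 3: four tokens, full GPU offload.
fn llama_cli_args(model: &str) -> Vec<String> {
    ["-m", model, "--prompt", "hello", "-n", "4", "-ngl", "99"]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

/// Pass only on exit code 0; a missing code (e.g. killed by a signal) fails.
fn exit_verdict(code: Option<i32>) -> Verdict {
    if code == Some(0) {
        Verdict::Pass
    } else {
        Verdict::Fail
    }
}

/// Spawn `llama-cli` (ASSUMPTION: found on PATH) and map its exit status.
fn run_llama_cli(model: &str) -> io::Result<Verdict> {
    let status = Command::new("llama-cli")
        .args(llama_cli_args(model))
        .status()?;
    Ok(exit_verdict(status.code()))
}

fn main() {
    match std::env::args().nth(1) {
        // Only spawn when a model path is supplied on the command line.
        Some(model) => println!("{:?}", run_llama_cli(&model)),
        None => println!("would run: llama-cli {:?}", llama_cli_args("<out>.gguf")),
    }
}
```

Treating a signal-terminated process as a failure keeps the gate strict: only a clean exit 0 counts as a successful load and inference.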

Files changed:
- contracts/qwen2-e2e-verification-v1.yaml v1.7.0 → v1.8.0
  FALSIFY-QW2E-SHIP-004 discharge_status: PARTIAL_ALGORITHM_LEVEL → DISCHARGED
  discharged_evidence block records: host, binary, llama_cli_path,
  command_chain (3 commands), per-step verdicts (apr_export details,
  gguf_header_bytes, magic_verdict, version+verdict, llama_cli_exit_verdict),
  evidence_discharged_by_live array.

- crates/aprender-core/src/format/ship_004.rs
  Added drift-prevention test
  `falsify_ship_004_yaml_binding_pins_discharged_status` parsing
  qwen2-e2e-verification-v1.yaml and asserting:
    * discharge_status == "DISCHARGED"
    * discharged_evidence.host == "noah-Lambda-Vector"
    * discharged_evidence.overall == "PASS"
    * discharged_evidence.magic_verdict == "PASS"
    * discharged_evidence.version == 3
    * discharged_evidence.llama_cli_exit_verdict == "PASS"
    * evidence_discharged_by_live non-empty

- docs/specifications/aprender-train/ship-two-models-spec.md
  v2.52.0 → v2.53.0 with full atomic-next-action narrative.
  Coverage tally: 39 PARTIAL + 6 DISCHARGED → 38 + 7.
  Note: PR #1054, #1055, #1056 also bump to v2.53.0 simultaneously;
  last to merge rebases.

- evidence/ship-004-full-discharge/discharge-evidence-v1.json (NEW)
  Self-contained discharge summary with 4-step verification chain,
  apr_export details, gguf header bytes, llama_cli exit/throughput.

- evidence/ship-004-full-discharge/llama-cli-run.txt (NEW)
  Trimmed log capturing model load, "Hello! How can" output, and
  perf line. Renamed from .log → .txt to avoid .gitignore *.log.

Verification (all green):
  - cargo test -p aprender-core --lib ship_004 — 5/5 passes (3 verdict +
    1 gate + 1 new YAML binding)
  - pv validate contracts/qwen2-e2e-verification-v1.yaml — PASS
  - Live `apr export` exit 0 + 8.04 GB GGUF written
  - Live `xxd` shows GGUF magic + version 3
  - Live `llama-cli -ngl 99` exit 0 + 4 tokens emitted

Methodological note: zero `eprintln!`, zero bash workaround, zero curl
shell-out (besides the canonical xxd/llama-cli reads which are the
contract's full_discharge_blocks_on chain). Pure `apr export` + `xxd`
+ upstream `llama-cli` end-to-end on a 7.48 GiB shipped APR. Honors
`feedback_apr_trace_not_eprintln.md` and
`feedback_pv_not_bash_for_contracts.md`. Mirrors the SHIP-009 / SHIP-010
/ SHIP-001 closure pattern.

Memory: feedback_compute_pre_authorized.md (lambda-labs lane
pre-authorized — apr export + GPU inference are within scope),
reference_lambda_labs_host_locality.md (this host IS lambda-labs).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift force-pushed the feat/falsify-ship-004-full-discharge branch from 1664c77 to d79715b Compare April 25, 2026 13:30
@noahgift noahgift merged commit c4210b4 into main Apr 25, 2026
10 checks passed
@noahgift noahgift deleted the feat/falsify-ship-004-full-discharge branch April 25, 2026 13:49
noahgift added a commit that referenced this pull request Apr 25, 2026
…osine sweep (mmap-enabled) (#1059)

SHIP-TWO-001 spec v2.56.0 → v2.57.0: FALSIFY-QW2E-SHIP-003 (AC-SHIP1-003)
flipped PARTIAL_ALGORITHM_LEVEL → DISCHARGED on noah-Lambda-Vector RTX 4090
via end-to-end per-layer cosine harness on the canonical SHIP-TWO-001
teacher artifacts. Fifth MODEL-1 PARTIAL → DISCHARGED of the cycle (after
SHIP-009 PR #1054 + SHIP-001 PR #1056 + SHIP-004 PR #1057 + SHIP-010 PR #1055).

Live discharge command:
  apr diff /mnt/nvme-raid0/models/ship-two-001/qwen2.5-coder-7b-instruct-q4k.safetensors \
           /mnt/nvme-raid0/models/ship-two-001/qwen2.5-coder-7b-instruct-q4k.apr \
           --values --transpose-aware --json --limit 339

Results:
  - Tensors compared:        339
  - Min cosine similarity:   0.9999999403953552 (shortfall from 1.0 of
                              about 6e-8, roughly four orders of magnitude
                              inside the 1e-3 slack the 0.999 floor allows)
  - Max cosine similarity:   1.0
  - Below-threshold count:   0
  - Aggregate verdict:       Pass (verdict_from_per_layer_cosines)
  - Run-time:                192 s
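
The summary above reduces to a per-tensor cosine plus a floor check. A minimal sketch follows, assuming the aggregate passes exactly when no tensor falls below the 0.999 floor. `cosine_similarity` and `summarize` are hypothetical names and do not reflect the actual `verdict_from_per_layer_cosines` signature.

```rust
/// Cosine similarity of two equal-length vectors, accumulated in f64.
/// Returns None for mismatched lengths or a zero-norm (including empty) input.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f64> {
    if a.len() != b.len() {
        return None;
    }
    let (mut dot, mut na, mut nb) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (f64::from(x), f64::from(y));
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    if na == 0.0 || nb == 0.0 {
        return None;
    }
    Some(dot / (na.sqrt() * nb.sqrt()))
}

/// Minimum cosine and the number of tensors below `floor`.
fn summarize(cosines: &[f64], floor: f64) -> (f64, usize) {
    let min = cosines.iter().copied().fold(f64::INFINITY, f64::min);
    let below = cosines.iter().filter(|&&c| c < floor).count();
    (min, below)
}

fn main() {
    // Min and max values taken from the results above; the floor is 0.999.
    let cosines = [0.9999999403953552, 1.0];
    let (min, below) = summarize(&cosines, 0.999);
    let pass = below == 0;
    println!("min={min} below_threshold={below} pass={pass}");
}
```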

Worst 5 tensors (still passing):
  - model.layers.0.mlp.down_proj.weight  cos=0.9999999403953552 max_diff=4.81e-4
  - model.layers.0.mlp.gate_proj.weight  cos=0.9999999403953552 max_diff=4.43e-4
  - model.layers.0.mlp.up_proj.weight    cos=0.9999999403953552 max_diff=2.39e-4
  - model.layers.0.self_attn.o_proj.weight cos=0.9999999403953552 max_diff=2.37e-4
  - model.layers.1.mlp.down_proj.weight  cos=0.9999999403953552 max_diff=3.59e-4

The worst five sit in layers 0 and 1 (four MLP projections plus one
attention o_proj), all with max_diff < 5e-4 (Q4K quantization noise within
±5% Q4_K spec tolerance). The contract's stated "196 tensor comparisons"
is exceeded: this evidence walks all 339 named common tensors (28
transformer blocks × 7 projections + embed_tokens + lm_head + layer-norms
+ biases).
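
The 196 and 339 figures can be reconciled with a short calculation. The block count and the seven projections come from the text above; the per-block extras (three q/k/v biases, two norm weights) and the three global tensors (embed_tokens, lm_head, one final norm) are assumptions about a Qwen2-style layout, chosen because they reproduce the reported total. `tensor_count` is a hypothetical helper.

```rust
/// Named tensors in a decoder stack: per-block tensors times blocks, plus globals.
fn tensor_count(blocks: u32, per_block: u32, globals: u32) -> u32 {
    blocks * per_block + globals
}

fn main() {
    let blocks = 28; // from the commit message
    let projections = 7; // from the commit message
    // ASSUMPTION: q/k/v biases (3) and two norm weights (2) per block.
    let per_block = projections + 3 + 2;
    // ASSUMPTION: embed_tokens, lm_head, and one final norm.
    let globals = 3;
    println!("projection weights: {}", blocks * projections);
    println!("all named tensors:  {}", tensor_count(blocks, per_block, globals));
}
```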

Crucial dependency: PR #1058 (perf fix to RosettaStone::load_tensor_f32_apr)
unblocks this scan. Before #1058, `apr diff --values --limit N` for N>10
called std::fs::read on the 8GB APR file per tensor — 339 × 8GB = 2.7TB
total read traffic, infeasible. Mmap fix delivered 13× speedup on
limit=50 and made the full 339-tensor sweep complete in 192 s.
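
The read-traffic figure, and the reason touching only the needed bytes helps, can be sketched as below. Note the swap: PR #1058 uses mmap, whereas this example uses a plain seek-and-read over any `Read + Seek` source as a simpler std-only stand-in with the same property of not re-reading the whole file. `naive_read_bytes` and `read_range` are hypothetical names.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Bytes read when the whole file is re-read once per tensor.
fn naive_read_bytes(tensors: u64, file_bytes: u64) -> u64 {
    tensors * file_bytes
}

/// Read only `len` bytes starting at `offset`, leaving the rest untouched.
fn read_range<R: Read + Seek>(src: &mut R, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    src.seek(SeekFrom::Start(offset))?;
    let mut buf = vec![0u8; len];
    src.read_exact(&mut buf)?;
    Ok(buf)
}

fn main() -> io::Result<()> {
    // 339 tensors x 8 GB per full-file read, as in the paragraph above.
    let total = naive_read_bytes(339, 8_000_000_000);
    println!("naive traffic: {:.1} TB", total as f64 / 1e12);

    // An in-memory stand-in for the APR file; a real run would open a File.
    let mut fake_file = Cursor::new((0u8..16).collect::<Vec<u8>>());
    let slice = read_range(&mut fake_file, 4, 3)?;
    println!("ranged read: {:?}", slice);
    Ok(())
}
```

With ranged or mapped access, total traffic for a full sweep is bounded by roughly one pass over the file rather than one pass per tensor.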

Files changed:
- contracts/qwen2-e2e-verification-v1.yaml v1.9.0 → v1.10.0
  FALSIFY-QW2E-SHIP-003 discharge_status: PARTIAL_ALGORITHM_LEVEL → DISCHARGED
  discharged_evidence block: host, command, artifacts (sha+size), 339-tensor
  cosine_summary (min/max/below_threshold), worst_5_tensors, aggregate_verdict,
  evidence_discharged_by_live array, runtime_seconds, runtime_note.

- crates/aprender-core/src/format/ship_003.rs
  Added drift-prevention YAML binding test
  `falsify_ship_003_yaml_binding_pins_discharged_status` parsing
  qwen2-e2e-verification-v1.yaml and asserting:
    * discharge_status == "DISCHARGED"
    * discharged_evidence.host == "noah-Lambda-Vector"
    * discharged_evidence.aggregate_verdict == "Pass"
    * discharged_evidence.tensors_compared == 339
    * discharged_evidence.cosine_summary.below_threshold_count == 0
    * evidence_discharged_by_live non-empty

- docs/specifications/aprender-train/ship-two-models-spec.md
  v2.56.0 → v2.57.0 with full atomic-next-action narrative.
  Coverage tally: 35 PARTIAL + 10 DISCHARGED → 34 + 11.

- evidence/ship-003-full-discharge/discharge-evidence-v1.json (NEW)
  Self-contained discharge summary with full artifact paths,
  cosine_summary, worst_5/best_5 tensors, verification_chain,
  tooling_chain_proof, discharge_rationale.

- evidence/ship-003-full-discharge/apr-diff-339.json (NEW, 164 KB)
  Raw apr diff --json output: 339 tensor comparisons with per-tensor
  cosine_similarity, element_count, identical_count, max_diff, mean_diff,
  rmse, shape_a/b, status. Reproducible from the local apr binary +
  canonical lambda-labs paths.

Verification (all green):
  - cargo test -p aprender-core --lib ship_003 — 4/4 PASS
    (3 existing verdict + 1 gate + 1 new YAML binding)
  - pv validate contracts/qwen2-e2e-verification-v1.yaml — PASS
  - Live `apr diff --values --limit 339 --json` exit 0, 339 results emitted

Methodological note: zero `eprintln!`, zero bash workaround, zero
parallel-implementation. Pure `apr diff --values --transpose-aware`
end-to-end on a 7.6B-param shipped teacher. Honors
`feedback_apr_trace_not_eprintln.md` and
`feedback_pv_not_bash_for_contracts.md`. Mirrors the
SHIP-001/004/009/010 closure pattern.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>