feat(ship-010): FALSIFY-SHIP-010 DISCHARGED via apr validate-manifest --live (3 paiml manifests, 31 GB streamed, 18 gates PASS)#1055

Merged
noahgift merged 1 commit into main from feat/falsify-ship-010-full-discharge on Apr 25, 2026

@noahgift (Contributor)

Summary

  • FALSIFY-SHIP-010 (AC-SHIP1-010) PARTIAL_ALGORITHM_LEVEL → DISCHARGED on noah-Lambda-Vector RTX 4090 via live apr validate-manifest --live --json against all 3 paiml/qwen2.5-coder-7b-apache-q4k-v1 publish manifests.
  • 31 GB streamed from HF Hub CDN with incremental sha256: APR 0a854098...c73666 over 8.04 GB; GGUF e6cac5d6...e7981 over 8.04 GB; safetensors c1058ce7...d8954 over 15.23 GB. All 3 manifests overall: PASS.
  • 18 gate verdicts asserted (6 active gates × 3 manifests, all PASS): PM-001 (required fields), PM-003 (HEAD 200 + content-length), PM-002-live (full-download sha256 byte-identical), PM-004 (SPDX), PM-005 (recipe_sha256 match), PM-006 (parent chain).
  • Second MODEL-1 PARTIAL → DISCHARGED of the cycle, after SHIP-009 (PR #1054, "feat(ship-009): FALSIFY-SHIP-009 DISCHARGED via apr stamp local fixture-swap (MODEL-1 PARTIAL → DISCHARGED)"). Coverage tally: 39 PARTIAL + 6 DISCHARGED → 38 + 7. Spec v2.52.0 → v2.53.0; contract v1.4.0 → v1.5.0 (stays DRAFT).
  • No follow-ups required. Unlike SHIP-009 (3 deferred irreversible-shipped follow-ups), SHIP-010 is end-to-end discharged. HF Hub artifacts are bytes-stable and serve the manifest-pinned sha256 — no upload, no fixture-swap, no manifest mutation needed.

Most-exhaustive live discharge to date

| Metric | Value |
| --- | --- |
| Manifests verified | 3 |
| Total bytes streamed | 31,304,703,336 B (~31.3 GB, ~29.2 GiB) |
| Live sha256s computed | 3 |
| Active gate verdicts asserted | 18 (6 × 3) |
| Live verdicts PASS | 18/18 |
| Format-specific gates DEFER (cleanly) | 12 (4 × 3, gated on local --artifact) |
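
The gate tally in the table can be reproduced with a small sketch. The gate IDs PM-001 through PM-006 come from the summary above; the `Verdict` enum, the `overall` rule (deferred gates are skipped, any failure fails the manifest) and the four `FMT-*` placeholder names are assumptions made for illustration, not the actual `apr validate-manifest` implementation.

```rust
/// Verdict of a single manifest gate (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Verdict {
    Pass,
    Fail,
    Defer,
}

/// Assumed aggregation rule: any Fail fails the manifest, deferred gates
/// are skipped, and at least one gate must actually have passed.
fn overall(gates: &[(&str, Verdict)]) -> Verdict {
    if gates.iter().any(|(_, v)| *v == Verdict::Fail) {
        return Verdict::Fail;
    }
    if gates.iter().any(|(_, v)| *v == Verdict::Pass) {
        Verdict::Pass
    } else {
        Verdict::Defer
    }
}

/// Count the gates that carry a given verdict.
fn count(gates: &[(&str, Verdict)], want: Verdict) -> usize {
    gates.iter().filter(|(_, v)| *v == want).count()
}

/// Gate list for one manifest: six active gates plus four deferred
/// format-specific gates (the FMT-* names are placeholders).
fn one_manifest() -> Vec<(&'static str, Verdict)> {
    let mut gates = vec![
        ("PM-001", Verdict::Pass),
        ("PM-003", Verdict::Pass),
        ("PM-002-live", Verdict::Pass),
        ("PM-004", Verdict::Pass),
        ("PM-005", Verdict::Pass),
        ("PM-006", Verdict::Pass),
    ];
    for name in ["FMT-A", "FMT-B", "FMT-C", "FMT-D"] {
        gates.push((name, Verdict::Defer));
    }
    gates
}

fn main() {
    let manifests = [one_manifest(), one_manifest(), one_manifest()];
    let pass: usize = manifests.iter().map(|m| count(m, Verdict::Pass)).sum();
    let defer: usize = manifests.iter().map(|m| count(m, Verdict::Defer)).sum();
    let all_pass = manifests.iter().all(|m| overall(m) == Verdict::Pass);
    println!("pass={pass} defer={defer} all_pass={all_pass}");
}
```

Under these assumptions, three manifests give 18 PASS verdicts and 12 DEFER verdicts, matching the table.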

Drift-prevention test added

falsify_ship_010_yaml_binding_pins_discharged_status parses publish-manifest-v1.yaml, locates the FALSIFY-SHIP-010 block, and asserts:

  • binds_to == "AC-SHIP1-010"
  • discharge_status == "DISCHARGED"
  • discharged_evidence.host == "noah-Lambda-Vector"
  • discharged_evidence.manifests has length 3
  • Every manifest's overall == "PASS"

Falsifier: any future regression of the contract back to PARTIAL fails this test before any network I/O.
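
A minimal sketch of the drift-prevention idea, assuming a simplified YAML layout. The field names (`binds_to`, `discharge_status`, `discharged_evidence.host`) come from the list above; the sample document, the `FALSIFY-EXAMPLE-000` entry and the line-scanning helper `field_in_block` are hypothetical. The real test presumably uses a proper YAML parser against `publish-manifest-v1.yaml`.

```rust
/// Hypothetical, heavily simplified stand-in for the contract file.
const SAMPLE: &str = r#"
falsification_tests:
  - id: FALSIFY-EXAMPLE-000
    discharge_status: PARTIAL_ALGORITHM_LEVEL
  - id: FALSIFY-SHIP-010
    binds_to: AC-SHIP1-010
    discharge_status: DISCHARGED
    discharged_evidence:
      host: noah-Lambda-Vector
"#;

/// Return the value of the first `key: value` line inside the list item
/// whose `- id:` equals `block_id`. This is a naive line scan, not a YAML parser.
fn field_in_block<'a>(yaml: &'a str, block_id: &str, key: &str) -> Option<&'a str> {
    let mut in_block = false;
    for line in yaml.lines() {
        let trimmed = line.trim();
        if trimmed.starts_with("- id:") {
            // A new list item starts; track whether it is the target block.
            let id = trimmed.trim_start_matches("- id:").trim().trim_matches('"');
            in_block = id == block_id;
            continue;
        }
        if in_block {
            if let Some(rest) = trimmed.strip_prefix(key) {
                if let Some(value) = rest.strip_prefix(':') {
                    return Some(value.trim().trim_matches('"'));
                }
            }
        }
    }
    None
}

fn main() {
    // The pinned assertions run on local text only, so no network I/O is needed.
    let id = "FALSIFY-SHIP-010";
    assert_eq!(field_in_block(SAMPLE, id, "binds_to"), Some("AC-SHIP1-010"));
    assert_eq!(field_in_block(SAMPLE, id, "discharge_status"), Some("DISCHARGED"));
    assert_eq!(field_in_block(SAMPLE, id, "host"), Some("noah-Lambda-Vector"));
    println!("binding pinned for {id}");
}
```

If the status line were edited back to `PARTIAL_ALGORITHM_LEVEL`, the second assertion would fail immediately, which is the regression the falsifier is meant to catch.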

Test plan

  • cargo test -p aprender-core --lib ship_010 — 5/5 passes (4 existing + new YAML binding test)
  • pv validate contracts/publish-manifest-v1.yaml — PASS (0 errors, 0 warnings)
  • Live apr validate-manifest paiml-...-apr.yaml --live --json overall=PASS
  • Live apr validate-manifest paiml-...-gguf.yaml --live --json overall=PASS
  • Live apr validate-manifest paiml-...-safetensors.yaml --live --json overall=PASS
  • CI workspace-test green (auto)
  • ci / gate green (auto)

Stacks with PR #1054

PR #1054 (SHIP-009 DISCHARGED) bumps the spec to the same v2.53.0 simultaneously. Whichever PR merges second will rebase and bump to v2.54.0. The two are otherwise independent — SHIP-009 touches contracts/apr-provenance-v1.yaml + provenance_tests.rs; SHIP-010 touches contracts/publish-manifest-v1.yaml + ship_010.rs. No overlapping files except the spec banner.

Files changed

| File | Change |
| --- | --- |
| contracts/publish-manifest-v1.yaml | v1.4.0 → v1.5.0; FALSIFY-SHIP-010 PARTIAL → DISCHARGED + discharged_evidence block |
| crates/aprender-core/src/format/ship_010.rs | Added drift-prevention test pinning DISCHARGED + per-manifest PASS |
| docs/specifications/aprender-train/ship-two-models-spec.md | v2.52.0 → v2.53.0 with full atomic-next-action narrative |
| evidence/ship-010-full-discharge/discharge-evidence-v1.json | NEW: self-contained discharge summary |
| evidence/ship-010-full-discharge/validate-manifest-{apr,gguf,safetensors}.json | NEW: raw apr validate-manifest --live JSON outputs |

Methodology

Pure stack tooling: apr validate-manifest --live --json end-to-end. No eprintln!, no bash workaround, no curl shell-out. Honors feedback_apr_trace_not_eprintln.md and feedback_pv_not_bash_for_contracts.md.

🤖 Generated with Claude Code

@noahgift noahgift enabled auto-merge (squash) April 25, 2026 11:42
@noahgift noahgift force-pushed the feat/falsify-ship-010-full-discharge branch from 4ad8e79 to a74ca29 Compare April 25, 2026 12:56
noahgift added a commit that referenced this pull request Apr 25, 2026
…eacher safetensors + YAML backfill

SHIP-TWO-001 spec v2.52.0 → v2.53.0: FALSIFY-QW2E-SHIP-001 / FALSIFY-SHIP-001
(AC-SHIP1-001) flipped PARTIAL_ALGORITHM_LEVEL → DISCHARGED on
noah-Lambda-Vector RTX 4090. Third MODEL-1 PARTIAL → DISCHARGED of
the cycle (after SHIP-009 PR #1054 + SHIP-010 PR #1055).

Two-in-one PR:

1. **Backfill missing FALSIFY-QW2E-SHIP-001 YAML block.** PR #1030
   added the Rust verdict fns at `crates/aprender-core/src/format/ship_001.rs`
   + claimed v1.6.0 wired the YAML entry, but the actual
   `falsification_tests` block was never written to disk. This PR
   closes that gap by adding the block at
   `qwen2-e2e-verification-v1.yaml` v1.7.0 → v1.8.0.

2. **Promote directly to DISCHARGED with live evidence.** Skip the
   PARTIAL state because both algorithm proof (the three triple-
   verdict fns from v1.6.0: verdict_from_load_result,
   verdict_from_safetensors_header_size,
   verdict_from_safetensors_json_open_byte + 2 byte-literal constants
   AC_SHIP1_001_SAFETENSORS_HEADER_PREFIX_LEN=8 +
   AC_SHIP1_001_SAFETENSORS_JSON_OPEN_BYTE=0x7B) AND live evidence
   (apr inspect on the canonical teacher safetensors) exist
   concurrently.

Live discharge evidence (noah-Lambda-Vector RTX 4090):
  $ apr inspect /mnt/nvme-raid0/models/ship-two-001/qwen2.5-coder-7b-instruct-q4k.safetensors --json
  {
    "architecture": "qwen2",
    "format": "SafeTensors",
    "file_size": 15231938404,
    "tensor_count": 339,
    "total_params": 7615616512,
    ...
  }

apr inspect exit 0 + format=SafeTensors + tensor_count=339 +
total_params=7,615,616,512 (Qwen2.5-Coder-7B canonical counts)
proves Model::load_safetensors returned Ok(_) end-to-end on the
15.23 GB shipped artifact. Err(_) would have surfaced as non-zero
exit + error JSON.
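
The two byte-literal constants above correspond to the safetensors layout: an 8-byte little-endian header length followed by a JSON header that opens with `{` (0x7B). A minimal sketch of that prefix check follows. The function `check_safetensors_prefix`, the `Verdict` enum and the sample bytes are illustrative assumptions and not the signatures of the `verdict_from_*` functions in `ship_001.rs`.

```rust
/// Length of the little-endian u64 size prefix at the start of the file.
const HEADER_PREFIX_LEN: usize = 8;
/// First byte of the JSON header that follows the prefix: b'{'.
const JSON_OPEN_BYTE: u8 = 0x7B;

#[derive(Debug, PartialEq)]
enum Verdict {
    Pass,
    Fail,
}

/// Check the size prefix and the JSON-open byte of a safetensors buffer.
fn check_safetensors_prefix(bytes: &[u8]) -> Verdict {
    if bytes.len() <= HEADER_PREFIX_LEN {
        return Verdict::Fail; // too short to hold prefix plus any header
    }
    let mut prefix = [0u8; HEADER_PREFIX_LEN];
    prefix.copy_from_slice(&bytes[..HEADER_PREFIX_LEN]);
    let header_len = u64::from_le_bytes(prefix);
    let remaining = (bytes.len() - HEADER_PREFIX_LEN) as u64;
    if header_len == 0 || header_len > remaining {
        return Verdict::Fail; // declared header does not fit in the buffer
    }
    if bytes[HEADER_PREFIX_LEN] == JSON_OPEN_BYTE {
        Verdict::Pass
    } else {
        Verdict::Fail
    }
}

/// Build a tiny in-memory buffer: size prefix plus a minimal JSON header.
fn sample_file() -> Vec<u8> {
    let header = br#"{"__metadata__":{}}"#;
    let mut file = (header.len() as u64).to_le_bytes().to_vec();
    file.extend_from_slice(header);
    file
}

fn main() {
    let file = sample_file();
    println!("{:?}", check_safetensors_prefix(&file));
}
```

This only inspects the first nine bytes plus the declared length; a successful `apr inspect` on the real artifact exercises far more of the loader than this check does.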

Files changed:
- contracts/qwen2-e2e-verification-v1.yaml v1.7.0 → v1.8.0
  Added FALSIFY-QW2E-SHIP-001 falsification_tests block at
  discharge_status: DISCHARGED with full discharged_evidence
  block (host=noah-Lambda-Vector, command, file_size_bytes=15231938404,
  tensor_count=339, total_params=7615616512, architecture=qwen2,
  overall=PASS, evidence_discharged_by_live array).

- crates/aprender-core/src/format/ship_001.rs
  Added drift-prevention test
  `falsify_ship_001_yaml_binding_pins_discharged_status` that
  parses qwen2-e2e-verification-v1.yaml, locates the
  FALSIFY-QW2E-SHIP-001 block, and asserts:
    * Block exists (catches the YAML backfill regression)
    * discharge_status == "DISCHARGED"
    * ship_blocking == true
    * discharged_evidence.host == "noah-Lambda-Vector"
    * discharged_evidence.overall == "PASS"
    * discharged_evidence.tensor_count == 339
    * discharged_evidence.total_params == 7,615,616,512
    * discharged_evidence.evidence_discharged_by_live non-empty

- docs/specifications/aprender-train/ship-two-models-spec.md
  v2.52.0 → v2.53.0 with full atomic-next-action narrative.
  Coverage tally: 39 PARTIAL + 6 DISCHARGED → 38 + 7.
  Note: PR #1054 (SHIP-009) + PR #1055 (SHIP-010) bump to the same
  v2.53.0 simultaneously; last to merge rebases.

- evidence/ship-001-full-discharge/discharge-evidence-v1.json (NEW)
  Self-contained discharge summary with all metadata, command,
  verification chain, tooling-chain proof, discharge rationale.

- evidence/ship-001-full-discharge/inspect-safetensors.json (NEW)
  Raw `apr inspect --json` output from the canonical teacher
  safetensors path on noah-Lambda-Vector.

Verification (all green):
  - cargo test -p aprender-core --lib ship_001 — 5/5 passes
    (3 algorithm + 1 gate + 1 new YAML binding)
  - pv validate contracts/qwen2-e2e-verification-v1.yaml — PASS
  - Live `apr inspect <teacher>.safetensors --json` exit 0 + all
    expected fields present

Methodological note: zero `eprintln!`, zero bash workaround. Pure
`apr inspect` end-to-end on a 15.23 GB shipped artifact. Honors
`feedback_apr_trace_not_eprintln.md` and
`feedback_pv_not_bash_for_contracts.md`. Mirrors the SHIP-009 / SHIP-010
pattern.

Memory: feedback_compute_pre_authorized.md (lambda-labs lane
pre-authorized), reference_lambda_labs_host_locality.md (this host
IS lambda-labs).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request Apr 25, 2026
…i round-trip on real teacher

SHIP-TWO-001 spec v2.52.0 → v2.53.0: FALSIFY-QW2E-SHIP-004 (AC-SHIP1-004)
flipped PARTIAL_ALGORITHM_LEVEL → DISCHARGED on noah-Lambda-Vector RTX 4090
via three-step live round-trip on the canonical teacher artifact. Fourth
MODEL-1 PARTIAL → DISCHARGED of the cycle (after SHIP-009 PR #1054 +
SHIP-010 PR #1055 + SHIP-001 PR #1056).

Live discharge — three independent format-boundary verdicts proven in one
end-to-end pipeline:

1. `apr export <teacher>.apr --format gguf -o <out>.gguf`
   → exit 0, 8.04 GB GGUF written in Q4K passthrough mode
   (zero loss, 339 tensors preserved, 20 metadata keys,
   contract-driven mapping for family qwen2)

2. `xxd <out>.gguf | head -1`
   → first 8 bytes: 47 47 55 46 03 00 00 00
   → magic = b"GGUF" (verdict_from_gguf_magic_bytes Pass)
   → version u32 LE = 3 ∈ {2, 3} (verdict_from_gguf_version Pass)

3. `llama-cli -m <out>.gguf --prompt "hello" -n 4 -ngl 99`
   → loads model successfully on RTX 4090 (-ngl 99 = full offload)
   → emits 4 tokens: "Hello! How can"
   → throughput: prompt 380.8 t/s, generation 127.5 t/s
   → exit code 0 (verdict_from_llama_cli_exit Pass)

All three gates PASS uniformly — round-trip proves apr-export's GGUF output
loads end-to-end in upstream llama.cpp via the canonical RTX 4090 path.
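
Step 2 can be sketched in a few lines: the first four bytes must spell `GGUF` and the next four are the version as a little-endian u32. The helper names `parse_gguf_version` and `version_supported` are hypothetical and are not the `verdict_from_gguf_*` functions from `ship_004.rs`; the accepted set {2, 3} is taken from the text above.

```rust
/// Return the GGUF version if the buffer starts with the b"GGUF" magic
/// and is long enough to hold the 4-byte version field.
fn parse_gguf_version(bytes: &[u8]) -> Option<u32> {
    if bytes.len() < 8 || !bytes.starts_with(b"GGUF") {
        return None;
    }
    Some(u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]))
}

/// Versions accepted by the gate, per the commit message: {2, 3}.
fn version_supported(version: u32) -> bool {
    matches!(version, 2 | 3)
}

fn main() {
    // The eight header bytes reported by `xxd` in step 2.
    let header = [0x47, 0x47, 0x55, 0x46, 0x03, 0x00, 0x00, 0x00];
    match parse_gguf_version(&header) {
        Some(v) => println!("magic ok, version {v}, supported: {}", version_supported(v)),
        None => println!("not a GGUF header"),
    }
}
```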

Files changed:
- contracts/qwen2-e2e-verification-v1.yaml v1.7.0 → v1.8.0
  FALSIFY-QW2E-SHIP-004 discharge_status: PARTIAL_ALGORITHM_LEVEL → DISCHARGED
  discharged_evidence block records: host, binary, llama_cli_path,
  command_chain (3 commands), per-step verdicts (apr_export details,
  gguf_header_bytes, magic_verdict, version+verdict, llama_cli_exit_verdict),
  evidence_discharged_by_live array.

- crates/aprender-core/src/format/ship_004.rs
  Added drift-prevention test
  `falsify_ship_004_yaml_binding_pins_discharged_status` parsing
  qwen2-e2e-verification-v1.yaml and asserting:
    * discharge_status == "DISCHARGED"
    * discharged_evidence.host == "noah-Lambda-Vector"
    * discharged_evidence.overall == "PASS"
    * discharged_evidence.magic_verdict == "PASS"
    * discharged_evidence.version == 3
    * discharged_evidence.llama_cli_exit_verdict == "PASS"
    * evidence_discharged_by_live non-empty

- docs/specifications/aprender-train/ship-two-models-spec.md
  v2.52.0 → v2.53.0 with full atomic-next-action narrative.
  Coverage tally: 39 PARTIAL + 6 DISCHARGED → 38 + 7.
  Note: PR #1054, #1055, #1056 also bump to v2.53.0 simultaneously;
  last to merge rebases.

- evidence/ship-004-full-discharge/discharge-evidence-v1.json (NEW)
  Self-contained discharge summary with 4-step verification chain,
  apr_export details, gguf header bytes, llama_cli exit/throughput.

- evidence/ship-004-full-discharge/llama-cli-run.txt (NEW)
  Trimmed log capturing model load, "Hello! How can" output, and
  perf line. Renamed from .log → .txt to avoid .gitignore *.log.

Verification (all green):
  - cargo test -p aprender-core --lib ship_004 — 5/5 passes (3 verdict +
    1 gate + 1 new YAML binding)
  - pv validate contracts/qwen2-e2e-verification-v1.yaml — PASS
  - Live `apr export` exit 0 + 8.04 GB GGUF written
  - Live `xxd` shows GGUF magic + version 3
  - Live `llama-cli -ngl 99` exit 0 + 4 tokens emitted

Methodological note: zero `eprintln!`, zero bash workaround, zero curl
shell-out (besides the canonical xxd/llama-cli reads which are the
contract's full_discharge_blocks_on chain). Pure `apr export` + `xxd`
+ upstream `llama-cli` end-to-end on a 7.48 GiB shipped APR. Honors
`feedback_apr_trace_not_eprintln.md` and
`feedback_pv_not_bash_for_contracts.md`. Mirrors the SHIP-009 / SHIP-010
/ SHIP-001 closure pattern.

Memory: feedback_compute_pre_authorized.md (lambda-labs lane
pre-authorized — apr export + GPU inference are within scope),
reference_lambda_labs_host_locality.md (this host IS lambda-labs).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request Apr 25, 2026
…eacher safetensors + YAML backfill (#1056)

@noahgift noahgift force-pushed the feat/falsify-ship-010-full-discharge branch from a74ca29 to 666ec87 Compare April 25, 2026 13:31
noahgift added a commit that referenced this pull request Apr 25, 2026
…i round-trip on real teacher (#1057)

… --live (3 paiml manifests, 31 GB streamed)

SHIP-TWO-001 spec v2.52.0 → v2.53.0: FALSIFY-SHIP-010 (AC-SHIP1-010)
flipped PARTIAL_ALGORITHM_LEVEL → DISCHARGED on noah-Lambda-Vector
RTX 4090 via live `apr validate-manifest --live --json` against all
3 paiml/qwen2.5-coder-7b-apache-q4k-v1 publish manifests. Second
MODEL-1 PARTIAL → DISCHARGED of the cycle (after SHIP-009 PR #1054).

Live discharge mechanism:
1. Ran `apr validate-manifest <m>.yaml --live --json` on each of 3
   manifests sequentially. Each invocation streamed full bytes from
   HF Hub CDN, computed incremental sha256, and compared to the
   manifest-declared digest:
   - APR:         8,035,635,524 B → sha256 0a854098...c73666 PASS
   - GGUF:        8,037,129,408 B → sha256 e6cac5d6...e7981  PASS
   - Safetensors: 15,231,938,404 B → sha256 c1058ce7...d8954 PASS
2. Each manifest's overall verdict is PASS. 6 active gates per
   manifest (PM-001 required-fields, PM-003 HEAD+content-length,
   PM-002-live full-download sha256, PM-004 SPDX identifiers,
   PM-005 recipe_sha256 match, PM-006 parent-chain terminates) all
   uniformly PASS across the 3 formats.
3. ~31 GB total streamed from CDN, 3 sha256s computed, 18 gate
   verdicts asserted — most-exhaustive live discharge to date.
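
The "~31 GB" figure follows directly from the three byte counts in step 1. The sketch below sums them and converts to decimal GB and binary GiB; the helper names are illustrative.

```rust
/// Byte counts per artifact, copied from the list in step 1.
const SIZES: [(&str, u64); 3] = [
    ("apr", 8_035_635_524),
    ("gguf", 8_037_129_408),
    ("safetensors", 15_231_938_404),
];

/// Total number of bytes streamed across all artifacts.
fn total_bytes(sizes: &[(&str, u64)]) -> u64 {
    sizes.iter().map(|(_, n)| n).sum()
}

/// Decimal gigabytes (10^9 bytes).
fn as_gb(bytes: u64) -> f64 {
    bytes as f64 / 1e9
}

/// Binary gibibytes (2^30 bytes).
fn as_gib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 30) as f64
}

fn main() {
    let total = total_bytes(&SIZES);
    println!("{total} B = {:.2} GB = {:.2} GiB", as_gb(total), as_gib(total));
}
```

The total is 31,304,703,336 B, which is about 31.3 GB in decimal units and about 29.2 GiB in binary units.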

Files changed:
- contracts/publish-manifest-v1.yaml v1.4.0 → v1.5.0
  FALSIFY-SHIP-010 discharge_status: PARTIAL_ALGORITHM_LEVEL → DISCHARGED
  discharged_evidence block pins host, command, per-manifest live
  verdicts, full sha256s, byte counts, gates_pass/deferred lists,
  tooling chain. expected/fails_if updated to cover live regression.

- crates/aprender-core/src/format/ship_010.rs
  Added drift-prevention test
  `falsify_ship_010_yaml_binding_pins_discharged_status` that
  parses publish-manifest-v1.yaml, locates the FALSIFY-SHIP-010
  block, and asserts:
    * binds_to == "AC-SHIP1-010"
    * discharge_status == "DISCHARGED"
    * discharged_evidence.host == "noah-Lambda-Vector"
    * discharged_evidence.manifests has length 3
    * Every manifest in discharged_evidence.manifests overall == "PASS"
  Falsifier: any future regression of the contract back to PARTIAL
  fails this test before any network I/O.

- docs/specifications/aprender-train/ship-two-models-spec.md
  v2.52.0 → v2.53.0 with full atomic-next-action narrative.
  Coverage tally: 39 PARTIAL + 6 DISCHARGED → 38 + 7.
  Note: PR #1054 (SHIP-009) bumps to the same v2.53.0 simultaneously;
  whichever PR merges second rebases to v2.54.0.

- evidence/ship-010-full-discharge/discharge-evidence-v1.json (NEW)
  Self-contained discharge summary with all sha256s, byte counts,
  HF URLs, gate verdicts, host pin, binary path.

- evidence/ship-010-full-discharge/validate-manifest-{apr,gguf,safetensors}.json (NEW)
  Raw `apr validate-manifest --live --json` outputs from each
  manifest invocation. Captures the full 6-of-10 PASS gate list
  per manifest plus the live-sha256-verdict detail strings.

Verification (all green):
  - cargo test -p aprender-core --lib ship_010 — 5/5 passes
  - pv validate contracts/publish-manifest-v1.yaml — PASS
  - 3 live `apr validate-manifest --live --json` invocations:
    overall=PASS each

Methodological note: this PR uses `apr validate-manifest --live`
exclusively (no eprintln, no bash workaround, no curl shell-out).
The dogfooded toolchain proved end-to-end across all 3 shipped
formats. Honors `feedback_apr_trace_not_eprintln.md` and
`feedback_pv_not_bash_for_contracts.md`.

Memory: feedback_compute_pre_authorized.md (lambda-labs network
download is in-scope for pre-authorized lanes),
reference_lambda_labs_host_locality.md (this host IS lambda-labs;
no SSH wrapper needed).

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift force-pushed the feat/falsify-ship-010-full-discharge branch from 666ec87 to 9fc3b38 Compare April 25, 2026 13:54
@noahgift noahgift merged commit 3cb93d6 into main Apr 25, 2026
10 checks passed
@noahgift noahgift deleted the feat/falsify-ship-010-full-discharge branch April 25, 2026 14:11
noahgift added a commit that referenced this pull request Apr 25, 2026
…osine sweep (mmap-enabled) (#1059)

SHIP-TWO-001 spec v2.56.0 → v2.57.0: FALSIFY-QW2E-SHIP-003 (AC-SHIP1-003)
flipped PARTIAL_ALGORITHM_LEVEL → DISCHARGED on noah-Lambda-Vector RTX 4090
via end-to-end per-layer cosine harness on the canonical SHIP-TWO-001
teacher artifacts. Fifth MODEL-1 PARTIAL → DISCHARGED of the cycle (after
SHIP-009 PR #1054 + SHIP-001 PR #1056 + SHIP-004 PR #1057 + SHIP-010 PR #1055).

Live discharge command:
  apr diff /mnt/nvme-raid0/models/ship-two-001/qwen2.5-coder-7b-instruct-q4k.safetensors \
           /mnt/nvme-raid0/models/ship-two-001/qwen2.5-coder-7b-instruct-q4k.apr \
           --values --transpose-aware --json --limit 339

Results:
  - Tensors compared:        339
  - Min cosine similarity:   0.9999999403953552 (≈6e-8 below 1.0, about
                              four orders of magnitude inside the 1e-3
                              margin allowed by the 0.999 floor)
  - Max cosine similarity:   1.0
  - Below-threshold count:   0
  - Aggregate verdict:       Pass (verdict_from_per_layer_cosines)
  - Run-time:                192 s

Worst 5 tensors (still passing):
  - model.layers.0.mlp.down_proj.weight  cos=0.9999999403953552 max_diff=4.81e-4
  - model.layers.0.mlp.gate_proj.weight  cos=0.9999999403953552 max_diff=4.43e-4
  - model.layers.0.mlp.up_proj.weight    cos=0.9999999403953552 max_diff=2.39e-4
  - model.layers.0.self_attn.o_proj.weight cos=0.9999999403953552 max_diff=2.37e-4
  - model.layers.1.mlp.down_proj.weight  cos=0.9999999403953552 max_diff=3.59e-4

The worst 5 cluster in layers 0 and 1 (four MLP projections plus one
attention o_proj), all with max_diff < 5e-4 (Q4K quantization noise within
±5% Q4_K spec tolerance). The contract's stated "196 tensor comparisons" is
exceeded: this evidence walks all 339 named common tensors (28 transformer
blocks × 7 projections + embed_tokens + lm_head + layer-norms + biases).
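
A minimal sketch of the per-tensor check, assuming plain cosine similarity over flattened tensors and the 0.999 floor quoted above. The functions `cosine` and `below_threshold` are hypothetical and do not reproduce `apr diff` or `verdict_from_per_layer_cosines`; in particular, the accumulation in f64 is a choice made for this example.

```rust
/// Floor quoted in the results above.
const COSINE_FLOOR: f64 = 0.999;

/// Cosine similarity of two equal-length tensors, accumulated in f64.
/// Returns None for mismatched lengths, empty input, or a zero vector.
fn cosine(a: &[f32], b: &[f32]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let (mut dot, mut norm_a, mut norm_b) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (x as f64, y as f64);
        dot += x * y;
        norm_a += x * x;
        norm_b += y * y;
    }
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a.sqrt() * norm_b.sqrt()))
}

/// Number of tensors whose cosine falls below the floor.
fn below_threshold(cosines: &[f64]) -> usize {
    cosines.iter().filter(|c| **c < COSINE_FLOOR).count()
}

fn main() {
    let reference = [0.50f32, -1.25, 2.00, 0.75];
    // Small perturbation standing in for quantization noise.
    let quantized = [0.5004f32, -1.2497, 2.0003, 0.7498];
    let c = cosine(&reference, &quantized).expect("comparable tensors");
    println!("cosine = {c}, below floor: {}", below_threshold(&[c]));
}
```

An aggregate Pass then corresponds to `below_threshold` returning 0 over all compared tensors, which is what the results report for the 339-tensor sweep.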

Crucial dependency: PR #1058 (perf fix to RosettaStone::load_tensor_f32_apr)
unblocks this scan. Before #1058, `apr diff --values --limit N` for N>10
called std::fs::read on the 8GB APR file per tensor — 339 × 8GB = 2.7TB
total read traffic, infeasible. Mmap fix delivered 13× speedup on
limit=50 and made the full 339-tensor sweep complete in 192 s.
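
The 2.7 TB estimate is a simple product. The sketch below reproduces it, assuming the local APR file has the same size as the published APR artifact (8,035,635,524 B, from the SHIP-010 evidence) and that the pre-fix path read the whole file once per tensor, as described above. The function name is illustrative.

```rust
/// Bytes read when the whole file is re-read for every tensor.
fn naive_read_traffic(tensor_count: u64, file_size_bytes: u64) -> u64 {
    tensor_count * file_size_bytes
}

fn main() {
    let tensors = 339u64;
    // Assumed APR file size; taken from the published artifact's byte count.
    let apr_size = 8_035_635_524u64;
    let naive = naive_read_traffic(tensors, apr_size);
    println!(
        "per-tensor full reads: {:.2} TB; single mapping: {:.2} GB",
        naive as f64 / 1e12,
        apr_size as f64 / 1e9
    );
}
```

With a single memory mapping, the file is mapped once and only the pages backing each tensor are touched, so the traffic is bounded by roughly one file size rather than 339 of them.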

Files changed:
- contracts/qwen2-e2e-verification-v1.yaml v1.9.0 → v1.10.0
  FALSIFY-QW2E-SHIP-003 discharge_status: PARTIAL_ALGORITHM_LEVEL → DISCHARGED
  discharged_evidence block: host, command, artifacts (sha+size), 339-tensor
  cosine_summary (min/max/below_threshold), worst_5_tensors, aggregate_verdict,
  evidence_discharged_by_live array, runtime_seconds, runtime_note.

- crates/aprender-core/src/format/ship_003.rs
  Added drift-prevention YAML binding test
  `falsify_ship_003_yaml_binding_pins_discharged_status` parsing
  qwen2-e2e-verification-v1.yaml and asserting:
    * discharge_status == "DISCHARGED"
    * discharged_evidence.host == "noah-Lambda-Vector"
    * discharged_evidence.aggregate_verdict == "Pass"
    * discharged_evidence.tensors_compared == 339
    * discharged_evidence.cosine_summary.below_threshold_count == 0
    * evidence_discharged_by_live non-empty

- docs/specifications/aprender-train/ship-two-models-spec.md
  v2.56.0 → v2.57.0 with full atomic-next-action narrative.
  Coverage tally: 35 PARTIAL + 10 DISCHARGED → 34 + 11.

- evidence/ship-003-full-discharge/discharge-evidence-v1.json (NEW)
  Self-contained discharge summary with full artifact paths,
  cosine_summary, worst_5/best_5 tensors, verification_chain,
  tooling_chain_proof, discharge_rationale.

- evidence/ship-003-full-discharge/apr-diff-339.json (NEW, 164 KB)
  Raw apr diff --json output: 339 tensor comparisons with per-tensor
  cosine_similarity, element_count, identical_count, max_diff, mean_diff,
  rmse, shape_a/b, status. Reproducible from the local apr binary +
  canonical lambda-labs paths.

Verification (all green):
  - cargo test -p aprender-core --lib ship_003 — 4/4 PASS
    (3 existing verdict + 1 gate + 1 new YAML binding)
  - pv validate contracts/qwen2-e2e-verification-v1.yaml — PASS
  - Live `apr diff --values --limit 339 --json` exit 0, 339 results emitted

Methodological note: zero `eprintln!`, zero bash workaround, zero
parallel-implementation. Pure `apr diff --values --transpose-aware`
end-to-end on a 7.6B-param shipped teacher. Honors
`feedback_apr_trace_not_eprintln.md` and
`feedback_pv_not_bash_for_contracts.md`. Mirrors the
SHIP-001/004/009/010 closure pattern.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>