feat(falsify-ship-008): MODEL-1 chat template PARTIAL discharge by noahgift · Pull Request #1012 · paiml/aprender

noahgift · 2026-04-22T16:26:29Z

Summary

Discharges FALSIFY-SHIP-008 / AC-SHIP1-008 at PARTIAL_ALGORITHM_LEVEL: MODEL-1 teacher (Qwen2.5-Coder-7B-Instruct, ChatML family) render of the canonical (system, user) messages is bound byte-exact to a golden string via a pure verdict_from_chat_template_render(rendered, golden) -> Pass|Fail const fn.
Bumps contracts/chat-template-v1.yaml v1.0.0 → v1.1.0 adding GATE-CHAT-SHIP-008 with ship_blocking: true, discharge_status: PARTIAL_ALGORITHM_LEVEL, evidence_discharged_by pointing at the new harness, full_discharge_blocks_on live apr run paiml/qwen2.5-coder-7b-apache-q4k-v1 completion diff.
Amends docs/specifications/aprender-train/ship-two-models-spec.md v2.23.0 → v2.24.0: MODEL-1 coverage 1/10 → 2/10 touched — first MODEL-1 non-provenance PARTIAL; mirrors the MODEL-2 pattern set by SHIP-016/017/018/020.

What changed

New file

crates/aprender-core/src/text/chat_template/ship_008.rs (~260 lines)
- AC_SHIP1_008_CANONICAL_SYSTEM/_USER/_GOLDEN constants
- Ship008Verdict { Pass, Fail } binary enum
- verdict_from_chat_template_render const fn (byte-equality via while-loop, UTF-8-safe because inputs are ASCII+multibyte-safe)
- falsify_ship_008_chat_template_render_bind test — 5-section mutation survey: engine-binding → Pass; empty → Fail; missing gen-prompt → Fail; wrong delim (<|user|> drift) → Fail; swapped role order → Fail; single-byte trailing flip → Fail; empty==empty symmetry; provenance-pin substring assertions (<|im_start|> × 3, <|im_end|> × 2, ends-with <|im_start|>assistant\n).
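
The byte-equality verdict described above can be sketched as follows. This is a minimal illustration, assuming only the names given in this PR (`Ship008Verdict`, `verdict_from_chat_template_render`); the body is a guess at the shape, not the contents of ship_008.rs. A `while` loop is used because `for` loops and `==` on `&str` are not available in a `const fn` on stable Rust.

```rust
/// Binary verdict, mirroring the `Ship008Verdict { Pass, Fail }` shape named in the PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship008Verdict {
    Pass,
    Fail,
}

/// Sketch of a byte-exact comparison usable in const context.
/// Assumption: the real function may differ in detail; only the
/// signature shape (rendered, golden) -> Pass|Fail comes from the PR.
pub const fn verdict_from_chat_template_render(rendered: &str, golden: &str) -> Ship008Verdict {
    let a = rendered.as_bytes();
    let b = golden.as_bytes();

    // A length mismatch can never be byte-equal.
    if a.len() != b.len() {
        return Ship008Verdict::Fail;
    }

    // Index-based loop: iterators are not usable in a const fn.
    let mut i = 0;
    while i < a.len() {
        if a[i] != b[i] {
            return Ship008Verdict::Fail;
        }
        i += 1;
    }
    Ship008Verdict::Pass
}

// The verdict can be evaluated at compile time.
pub const EMPTY_SYMMETRY: Ship008Verdict = verdict_from_chat_template_render("", "");
```

Comparing raw bytes sidesteps any question of character boundaries: two `&str` values are equal exactly when their UTF-8 byte sequences are equal, so multibyte input needs no special handling.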

Modified

crates/aprender-core/src/text/chat_template/mod.rs — include!("ship_008.rs") after existing template.rs, raw_template.rs.
contracts/chat-template-v1.yaml — v1.1.0 + GATE-CHAT-SHIP-008.
docs/specifications/aprender-train/ship-two-models-spec.md — v2.24.0 amendment + annotated AC-SHIP1-008 / FALSIFY-SHIP-008 rows.

Design

Mirrors the exact pattern set by FALSIFY-SHIP-016/017/018/020 on MODEL-2: pure binary verdict enum + const verdict fn + exhaustive counter-example survey + provenance pin + contract bump with full_discharge_blocks_on. The decision rule (byte-equality over &str) is fully provable offline; the compute-heavy tier (live teacher render + downstream completion diff) is intentionally out of scope for this PR and tracked by full_discharge_blocks_on. Any edit to either side of the bind — template engine, special tokens, golden string constant — flips the verdict to Fail before teacher inference is launched.
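
The provenance pin and a few of the mutations can be sketched like this. Everything here is hypothetical: `render_chatml` is a stand-in for the engine's renderer (it is not `ChatMLTemplate::format_conversation`), `provenance_pin_holds` is an illustrative helper, and any message text passed in is a placeholder rather than the canonical constants from ship_008.rs. The only facts taken from this PR are the pinned counts (`<|im_start|>` × 3, `<|im_end|>` × 2) and the required `<|im_start|>assistant\n` suffix.

```rust
/// Hypothetical stand-in for a ChatML renderer with a generation prompt.
/// Assumption: standard ChatML layout; this is not aprender's engine.
pub fn render_chatml(system: &str, user: &str) -> String {
    format!(
        "<|im_start|>system\n{system}<|im_end|>\n\
         <|im_start|>user\n{user}<|im_end|>\n\
         <|im_start|>assistant\n"
    )
}

/// Checks the substring pins listed in the PR description:
/// three start delimiters, two end delimiters, and a trailing
/// assistant generation prompt.
pub fn provenance_pin_holds(rendered: &str) -> bool {
    rendered.matches("<|im_start|>").count() == 3
        && rendered.matches("<|im_end|>").count() == 2
        && rendered.ends_with("<|im_start|>assistant\n")
}
```

Under this sketch, dropping the generation prompt, drifting a delimiter to `<|user|>`, or altering the final byte each breaks at least one pin, which is the kind of counter-example the survey enumerates.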

Test plan

cargo test -p aprender-core --lib falsify_ship_008_chat_template_render_bind → 1 passed; 0 failed.
pv validate contracts/chat-template-v1.yaml → Contract is valid (0 errors, 0 warnings).
cargo clippy -p aprender-core --lib --no-deps -- -D warnings → clean.
CI ci / test + ci / lint + workspace-test all green on this PR.
Full discharge (deferred, tracked by full_discharge_blocks_on): live apr run paiml/qwen2.5-coder-7b-apache-q4k-v1 --prompt <canonical> + byte-diff completion against spec-defined golden.

Refs: SHIP-TWO-001, task #155

🤖 Generated with Claude Code

Discharge FALSIFY-SHIP-008 / AC-SHIP1-008 at PARTIAL_ALGORITHM_LEVEL.

- contracts/chat-template-v1.yaml v1.0.0 -> v1.1.0: adds GATE-CHAT-SHIP-008 binding ChatMLTemplate::format_conversation to the canonical Qwen2.5-Coder-7B (system, user) golden via a pure verdict_from_chat_template_render const fn. ship_blocking: true, discharge_status: PARTIAL_ALGORITHM_LEVEL; full discharge blocks on live `apr run paiml/qwen2.5-coder-7b-apache-q4k-v1` completion diff against golden.
- crates/aprender-core/src/text/chat_template/ship_008.rs (new): AC_SHIP1_008_CANONICAL_{SYSTEM,USER,GOLDEN} constants + Ship008Verdict enum + verdict_from_chat_template_render const fn (byte-equality, UTF-8-safe) + 5-section mutation survey (engine-binding, empty Fail, missing-gen-prompt Fail, wrong-delim Fail, swapped-roles Fail, single-byte flip Fail) + symmetry + provenance pin.
- crates/aprender-core/src/text/chat_template/mod.rs: include! ship_008.rs alongside existing template.rs, raw_template.rs.
- docs/specifications/aprender-train/ship-two-models-spec.md v2.23.0 -> v2.24.0: AC-SHIP1-008 row + FALSIFY-SHIP-008 row annotated PARTIAL_ALGORITHM_LEVEL; v2.24.0 amendment entry records MODEL-1 coverage 1/10 -> 2/10 (first MODEL-1 non-provenance PARTIAL; mirrors SHIP-016/017/018/020 pattern).

Test: cargo test -p aprender-core --lib falsify_ship_008_chat_template_render_bind -> 1 passed
Contract: pv validate contracts/chat-template-v1.yaml -> Contract is valid

Refs: SHIP-TWO-001, task #155

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…arge (#1013)

Wires AC-SHIP1-006 "apr qa <model>: all 8 gates PASS" at PARTIAL_ALGORITHM_LEVEL: a pure aggregate-AND verdict fn bound to the 8-gate ship criterion from `docs/specifications/components/qa.md` §3 (golden / throughput / ollama parity / gpu speedup / tensor contracts / format parity / ptx parity / metadata).

Files:
- `crates/aprender-core/src/qa/ship_006.rs` (NEW, 217 lines): `verdict_from_qa_gates(&[bool]) -> Ship006Verdict` const fn with 7-section mutation survey: all-Pass→Pass, all-Fail→Fail, single-gate-flip × 8, exhaustive 2^8=256 bitmask proof, Pass→Fail monotonicity, length-drift counter-examples (0 / 7 / 9 / 16), provenance pin (AC_SHIP1_006_REQUIRED_QA_GATE_COUNT = 8).
- `crates/aprender-core/src/qa/mod.rs`: register `pub mod ship_006;`.
- `contracts/apr-model-qa-v1.yaml` v1.1.0 → v1.2.0: adds `FALSIFY-QA-SHIP-006` with `ship_blocking: true`, `discharge_status: PARTIAL_ALGORITHM_LEVEL`, `evidence_discharged_by` pointing at ship_006.rs + the harness test, and `full_discharge_blocks_on` live `apr qa paiml/qwen2.5-coder-7b-apache-q4k-v1 --json` on an RTX 4090 host (8× `"pass": true` entries in the JSON body).
- `docs/specifications/aprender-train/ship-two-models-spec.md` v2.24.0 → v2.25.0: annotates AC-SHIP1-006 + FALSIFY-SHIP-006 rows with PARTIAL_ALGORITHM_LEVEL markers and adds v2.25.0 amendment entry.

Design: mirrors the aggregate-AND shape set by MODEL-2 SHIP-016 (task #152 on `feat/falsify-ship-016-partial-discharge`, not yet on main). Authored self-contained because SHIP-016 hasn't landed; once both ship, the two `verdict_from_qa_gates_*` fns should be deduplicated into a single parameterized helper. Required gate count differs by model (both 8 today; the spec's "All must Pass" is model-independent).

MODEL-1 AC-SHIP1 coverage: 2/10 touched (SHIP-008 + SHIP-009) → **3/10** touched (+ SHIP-006). First MODEL-1 aggregate-AND PARTIAL. Full discharge blocks on a live `apr qa` run against the teacher weights on RTX 4090; the compute-heavy portion is intentionally out of scope here.

Test: `cargo test -p aprender-core --lib falsify_ship_006_apr_qa_eight_gates_aggregate` → 1 passed.
Contract: `cargo run --quiet -p aprender-contracts-cli --bin pv -- validate contracts/apr-model-qa-v1.yaml` → 0 errors.

Stacked on #1012 (feat/falsify-ship-008-partial-discharge). Spec v2.25.0 builds on v2.24.0.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
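
The aggregate-AND verdict described in this commit message can be sketched as below. It assumes the signature quoted above (`verdict_from_qa_gates(&[bool]) -> Ship006Verdict`) and the pinned gate count of 8. Treating any slice whose length is not 8 as Fail is an assumption inferred from the "length-drift counter-examples"; the constant name is shortened here for illustration and the actual ship_006.rs may differ.

```rust
/// Illustrative short name; the commit pins AC_SHIP1_006_REQUIRED_QA_GATE_COUNT = 8.
pub const REQUIRED_QA_GATE_COUNT: usize = 8;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship006Verdict {
    Pass,
    Fail,
}

/// Sketch: Pass only when exactly 8 gates are supplied and all are true.
pub const fn verdict_from_qa_gates(gates: &[bool]) -> Ship006Verdict {
    // Assumed behaviour: a missing or extra gate is a failure, not a partial pass.
    if gates.len() != REQUIRED_QA_GATE_COUNT {
        return Ship006Verdict::Fail;
    }
    let mut i = 0;
    while i < gates.len() {
        if !gates[i] {
            return Ship006Verdict::Fail;
        }
        i += 1;
    }
    Ship006Verdict::Pass
}
```

Because the input space is only 2^8 bitmasks, the rule can be checked exhaustively: under this sketch exactly one mask (all eight gates true) yields Pass.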

noahgift enabled auto-merge (squash) April 22, 2026 16:35

noahgift mentioned this pull request Apr 22, 2026

feat(falsify-ship-006): MODEL-1 apr qa 8-gate aggregate PARTIAL discharge #1013

Merged

5 tasks

noahgift mentioned this pull request Apr 22, 2026

feat(falsify-ship-007): MODEL-1 apr bench decode ≥30 tok/s PARTIAL discharge #1014

Closed

4 tasks

noahgift merged commit 1263178 into main Apr 22, 2026
10 checks passed

noahgift deleted the feat/falsify-ship-008-partial-discharge branch April 22, 2026 16:55

noahgift mentioned this pull request Apr 22, 2026

feat(falsify-ship-002): MODEL-1 apr run emits valid Python (zero syntax errors) PARTIAL discharge #1016

Closed

3 tasks

noahgift merged 2 commits into main from feat/falsify-ship-008-partial-discharge
