feat(falsify-ship-008): MODEL-1 chat template PARTIAL discharge #1012
Merged
Discharge FALSIFY-SHIP-008 / AC-SHIP1-008 at PARTIAL_ALGORITHM_LEVEL.
- contracts/chat-template-v1.yaml v1.0.0 -> v1.1.0: adds
GATE-CHAT-SHIP-008 binding ChatMLTemplate::format_conversation to
the canonical Qwen2.5-Coder-7B (system, user) golden via a pure
verdict_from_chat_template_render const fn. ship_blocking: true,
discharge_status: PARTIAL_ALGORITHM_LEVEL; full discharge blocks
on live `apr run paiml/qwen2.5-coder-7b-apache-q4k-v1` completion
diff against golden.
- crates/aprender-core/src/text/chat_template/ship_008.rs (new):
AC_SHIP1_008_CANONICAL_{SYSTEM,USER,GOLDEN} constants +
Ship008Verdict enum + verdict_from_chat_template_render const fn
(byte-equality, UTF-8-safe) + 5-section mutation survey
(engine-binding, empty Fail, missing-gen-prompt Fail, wrong-delim
Fail, swapped-roles Fail, single-byte flip Fail) + symmetry +
provenance pin.
- crates/aprender-core/src/text/chat_template/mod.rs: include!
ship_008.rs alongside existing template.rs, raw_template.rs.
- docs/specifications/aprender-train/ship-two-models-spec.md
v2.23.0 -> v2.24.0: AC-SHIP1-008 row + FALSIFY-SHIP-008 row
annotated PARTIAL_ALGORITHM_LEVEL; v2.24.0 amendment entry
records MODEL-1 coverage 1/10 -> 2/10 (first MODEL-1
non-provenance PARTIAL; mirrors SHIP-016/017/018/020 pattern).
Test: cargo test -p aprender-core --lib
falsify_ship_008_chat_template_render_bind -> 1 passed
Contract: pv validate contracts/chat-template-v1.yaml -> Contract is valid
Refs: SHIP-TWO-001, task #155
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
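The provenance-pin counts follow from the standard ChatML layout for one (system, user) exchange followed by a generation prompt. Each message is wrapped as `<|im_start|>{role}\n…<|im_end|>\n`, and the trailing `<|im_start|>assistant\n` opens the assistant turn without closing it, giving three `<|im_start|>` and two `<|im_end|>`. The sketch assumes a hypothetical `render_chatml` helper standing in for `ChatMLTemplate::format_conversation`, and uses placeholder messages in place of the real `AC_SHIP1_008_CANONICAL_*` constants, which are not reproduced in this PR.

```rust
/// Hypothetical stand-in for the template engine: standard ChatML layout for a
/// single (system, user) exchange plus the assistant generation prompt.
fn render_chatml(system: &str, user: &str) -> String {
    format!(
        "<|im_start|>system\n{system}<|im_end|>\n\
         <|im_start|>user\n{user}<|im_end|>\n\
         <|im_start|>assistant\n"
    )
}

/// Substring checks mirroring the provenance pin listed for ship_008.rs.
fn provenance_pin_holds(golden: &str) -> bool {
    golden.matches("<|im_start|>").count() == 3
        && golden.matches("<|im_end|>").count() == 2
        && golden.ends_with("<|im_start|>assistant\n")
}

fn main() {
    // Placeholder messages; the real canonical pair is not reproduced here.
    let golden = render_chatml("You are a coding assistant.", "Write hello world in Rust.");
    assert!(provenance_pin_holds(&golden));

    // Dropping the generation prompt removes one <|im_start|> and the required suffix.
    let no_gen_prompt = golden.trim_end_matches("<|im_start|>assistant\n");
    assert!(!provenance_pin_holds(no_gen_prompt));
    print!("{golden}");
}
```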
…arge (#1013)

Wires AC-SHIP1-006 "apr qa <model> — all 8 gates PASS" at PARTIAL_ALGORITHM_LEVEL: a pure aggregate-AND verdict fn bound to the 8-gate ship criterion from `docs/specifications/components/qa.md` §3 (golden / throughput / ollama parity / gpu speedup / tensor contracts / format parity / ptx parity / metadata).

Files:
- `crates/aprender-core/src/qa/ship_006.rs` (NEW, 217 lines): `verdict_from_qa_gates(&[bool]) -> Ship006Verdict` const fn with a 7-section mutation survey: all-Pass → Pass, all-Fail → Fail, single-gate-flip × 8, exhaustive 2^8 = 256 bitmask proof, Pass → Fail monotonicity, length-drift counter-examples (0 / 7 / 9 / 16), provenance pin (`AC_SHIP1_006_REQUIRED_QA_GATE_COUNT = 8`).
- `crates/aprender-core/src/qa/mod.rs`: registers `pub mod ship_006;`.
- `contracts/apr-model-qa-v1.yaml` v1.1.0 → v1.2.0: adds `FALSIFY-QA-SHIP-006` with `ship_blocking: true`, `discharge_status: PARTIAL_ALGORITHM_LEVEL`, `evidence_discharged_by` pointing at ship_006.rs + the harness test, and `full_discharge_blocks_on` a live `apr qa paiml/qwen2.5-coder-7b-apache-q4k-v1 --json` run on an RTX 4090 host (8× `"pass": true` entries in the JSON body).
- `docs/specifications/aprender-train/ship-two-models-spec.md` v2.24.0 → v2.25.0: annotates the AC-SHIP1-006 + FALSIFY-SHIP-006 rows with PARTIAL_ALGORITHM_LEVEL markers and adds the v2.25.0 amendment entry.

Design: mirrors the aggregate-AND shape set by MODEL-2 SHIP-016 (task #152 on `feat/falsify-ship-016-partial-discharge`, not yet on main). Authored self-contained because SHIP-016 hasn't landed; once both ship, the two `verdict_from_qa_gates_*` fns should be deduplicated into a single helper parameterized by the required gate count. That count is per-model, but it is 8 for both models today, since the spec's "All must Pass" is model-independent.

MODEL-1 AC-SHIP1 coverage: 2/10 touched (SHIP-008 + SHIP-009) → **3/10** touched (+ SHIP-006). First MODEL-1 aggregate-AND PARTIAL.

Full discharge blocks on a live `apr qa` run against the teacher weights on an RTX 4090; the compute-heavy portion is intentionally out of scope here.

Test: `cargo test -p aprender-core --lib falsify_ship_006_apr_qa_eight_gates_aggregate` → 1 passed.
Contract: `cargo run --quiet -p aprender-contracts-cli --bin pv -- validate contracts/apr-model-qa-v1.yaml` → 0 errors.

Stacked on #1012 (feat/falsify-ship-008-partial-discharge). Spec v2.25.0 builds on v2.24.0.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
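A minimal sketch of the aggregate-AND rule, written in the parameterized form the Design note anticipates. `Verdict`, `verdict_from_gates`, and `REQUIRED_QA_GATE_COUNT` are hypothetical names (the count mirrors `AC_SHIP1_006_REQUIRED_QA_GATE_COUNT = 8`). A wrong gate count is treated as `Fail`, so length drift can never pass silently. `main` reproduces two of the survey sections: the exhaustive 2^8 = 256 bitmask check and the length-drift counter-examples.

```rust
/// Hypothetical shared verdict enum for the aggregate-AND gates.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    Pass,
    Fail,
}

/// Hypothetical parameterized helper: Pass iff exactly `required` gates
/// were reported and every one of them passed.
pub const fn verdict_from_gates(gates: &[bool], required: usize) -> Verdict {
    // Length drift (0, 7, 9, 16 gates, ...) is a Fail, never a silent Pass.
    if gates.len() != required {
        return Verdict::Fail;
    }
    // `for` over an iterator is not allowed in a const fn, so walk by index.
    let mut i = 0;
    while i < gates.len() {
        if !gates[i] {
            return Verdict::Fail;
        }
        i += 1;
    }
    Verdict::Pass
}

/// Assumed per-model count, mirroring AC_SHIP1_006_REQUIRED_QA_GATE_COUNT = 8.
pub const REQUIRED_QA_GATE_COUNT: usize = 8;

pub const fn verdict_from_qa_gates(gates: &[bool]) -> Verdict {
    verdict_from_gates(gates, REQUIRED_QA_GATE_COUNT)
}

fn main() {
    // Exhaustive 2^8 = 256 bitmask check: only the all-ones mask passes.
    for mask in 0u32..(1 << REQUIRED_QA_GATE_COUNT) {
        let mut gates = [false; REQUIRED_QA_GATE_COUNT];
        for (bit, gate) in gates.iter_mut().enumerate() {
            *gate = mask & (1 << bit) != 0;
        }
        let expected = if mask == 0xFF { Verdict::Pass } else { Verdict::Fail };
        assert_eq!(verdict_from_qa_gates(&gates), expected);
    }
    // Length-drift counter-examples: all-true slices of the wrong length still fail.
    for len in [0usize, 7, 9, 16] {
        let gates = vec![true; len];
        assert_eq!(verdict_from_qa_gates(&gates), Verdict::Fail);
    }
    println!("aggregate-AND checks passed");
}
```

Passing the count as an argument keeps one decision rule for both models while still letting each model pin its own required gate count.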
Summary
- PARTIAL_ALGORITHM_LEVEL: the MODEL-1 teacher (Qwen2.5-Coder-7B-Instruct, ChatML family) render of the canonical `(system, user)` messages is bound byte-exact to a golden string via a pure `verdict_from_chat_template_render(rendered, golden) -> Pass|Fail` const fn.
- `contracts/chat-template-v1.yaml` v1.0.0 → v1.1.0, adding `GATE-CHAT-SHIP-008` with `ship_blocking: true`, `discharge_status: PARTIAL_ALGORITHM_LEVEL`, `evidence_discharged_by` pointing at the new harness, and `full_discharge_blocks_on` a live `apr run paiml/qwen2.5-coder-7b-apache-q4k-v1` completion diff.
- `docs/specifications/aprender-train/ship-two-models-spec.md` v2.23.0 → v2.24.0: MODEL-1 coverage 1/10 → 2/10 touched, the first MODEL-1 non-provenance PARTIAL; mirrors the MODEL-2 pattern set by SHIP-016/017/018/020.

What changed
New file
- `crates/aprender-core/src/text/chat_template/ship_008.rs` (~260 lines)
  - `AC_SHIP1_008_CANONICAL_SYSTEM` / `_USER` / `_GOLDEN` constants
  - `Ship008Verdict { Pass, Fail }` binary enum
  - `verdict_from_chat_template_render` const fn: byte equality via a `while` loop; UTF-8-safe because two `&str` values are equal exactly when their UTF-8 bytes are equal, so multibyte characters need no special handling
  - `falsify_ship_008_chat_template_render_bind` test, a 5-section mutation survey: engine-binding → `Pass`; empty → `Fail`; missing gen-prompt → `Fail`; wrong delim (`<|user|>` drift) → `Fail`; swapped role order → `Fail`; single-byte trailing flip → `Fail`; `empty == empty` symmetry; provenance-pin substring assertions (`<|im_start|>` × 3, `<|im_end|>` × 2, ends with `<|im_start|>assistant\n`).

Modified
- `crates/aprender-core/src/text/chat_template/mod.rs`: `include!("ship_008.rs")` after the existing `template.rs`, `raw_template.rs`.
- `contracts/chat-template-v1.yaml`: v1.1.0 + `GATE-CHAT-SHIP-008`.
- `docs/specifications/aprender-train/ship-two-models-spec.md`: v2.24.0 amendment + annotated AC-SHIP1-008 / FALSIFY-SHIP-008 rows.

Design
Mirrors the exact pattern set by FALSIFY-SHIP-016/017/018/020 on MODEL-2: pure binary verdict enum + const verdict fn + exhaustive counter-example survey + provenance pin + contract bump with `full_discharge_blocks_on`. The decision rule (byte equality over `&str`) is fully provable offline; the compute-heavy tier (live teacher render + downstream completion diff) is intentionally out of scope for this PR and tracked by `full_discharge_blocks_on`. Any edit to either side of the bind (template engine, special tokens, golden string constant) flips the verdict to `Fail` before teacher inference is launched.

Test plan
- `cargo test -p aprender-core --lib falsify_ship_008_chat_template_render_bind` → 1 passed; 0 failed.
- `pv validate contracts/chat-template-v1.yaml` → Contract is valid (0 errors, 0 warnings).
- `cargo clippy -p aprender-core --lib --no-deps -- -D warnings` → clean.
- `ci / test` + `ci / lint` + `workspace-test` all green on this PR.
- Out of scope (tracked by `full_discharge_blocks_on`): live `apr run paiml/qwen2.5-coder-7b-apache-q4k-v1 --prompt <canonical>` + byte-diff of the completion against the spec-defined golden.

Refs: SHIP-TWO-001, task #155
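The claim that any edit to either side of the bind flips the verdict to `Fail` can be exercised with a small mutation survey. This is a sketch, not the harness in `ship_008.rs`: it assumes a hypothetical `GOLDEN` built from placeholder messages `S` / `U`, and a non-const shorthand `verdict` in place of the PR's const fn. The mutation list mirrors the categories named for the test (empty, missing gen-prompt, `<|user|>` delimiter drift, swapped role order, single trailing-byte flip), plus the argument-order symmetry check.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Verdict {
    Pass,
    Fail,
}

// Non-const shorthand for the byte-equality rule (the PR's version is a const fn).
fn verdict(rendered: &str, golden: &str) -> Verdict {
    if rendered.as_bytes() == golden.as_bytes() { Verdict::Pass } else { Verdict::Fail }
}

// Hypothetical golden with placeholder messages, not the spec's canonical constants.
const GOLDEN: &str =
    "<|im_start|>system\nS<|im_end|>\n<|im_start|>user\nU<|im_end|>\n<|im_start|>assistant\n";

fn mutations() -> Vec<(&'static str, String)> {
    // Single trailing-byte flip: replace the final '\n' with '\r'.
    let mut flipped = GOLDEN.to_string();
    flipped.pop();
    flipped.push('\r');
    vec![
        ("empty", String::new()),
        ("missing gen-prompt", GOLDEN.trim_end_matches("<|im_start|>assistant\n").to_string()),
        ("wrong delimiter", GOLDEN.replace("<|im_start|>user\n", "<|user|>\n")),
        (
            "swapped role order",
            "<|im_start|>user\nU<|im_end|>\n<|im_start|>system\nS<|im_end|>\n<|im_start|>assistant\n"
                .to_string(),
        ),
        ("trailing byte flip", flipped),
    ]
}

fn main() {
    assert_eq!(verdict(GOLDEN, GOLDEN), Verdict::Pass);
    for (name, mutant) in mutations() {
        assert_eq!(verdict(&mutant, GOLDEN), Verdict::Fail, "mutation `{name}` must Fail");
        // Symmetry: swapping the arguments never changes the verdict.
        assert_eq!(verdict(&mutant, GOLDEN), verdict(GOLDEN, &mutant));
    }
    assert_eq!(verdict("", ""), Verdict::Pass);
    println!("all mutations rejected");
}
```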
🤖 Generated with Claude Code