feat(falsify-ship-008): MODEL-1 chat template PARTIAL discharge #1012

Merged: noahgift merged 2 commits into main from feat/falsify-ship-008-partial-discharge on Apr 22, 2026

@noahgift

Contributor

Summary

  • Discharges FALSIFY-SHIP-008 / AC-SHIP1-008 at PARTIAL_ALGORITHM_LEVEL: MODEL-1 teacher (Qwen2.5-Coder-7B-Instruct, ChatML family) render of the canonical (system, user) messages is bound byte-exact to a golden string via a pure verdict_from_chat_template_render(rendered, golden) -> Pass|Fail const fn.
  • Bumps contracts/chat-template-v1.yaml v1.0.0 → v1.1.0, adding GATE-CHAT-SHIP-008 with ship_blocking: true, discharge_status: PARTIAL_ALGORITHM_LEVEL, evidence_discharged_by pointing at the new harness, and full_discharge_blocks_on set to a live apr run paiml/qwen2.5-coder-7b-apache-q4k-v1 completion diff.
  • Amends docs/specifications/aprender-train/ship-two-models-spec.md v2.23.0 → v2.24.0: MODEL-1 coverage 1/10 → 2/10 touched — first MODEL-1 non-provenance PARTIAL; mirrors the MODEL-2 pattern set by SHIP-016/017/018/020.

What changed

New file

  • crates/aprender-core/src/text/chat_template/ship_008.rs (~260 lines)
    • AC_SHIP1_008_CANONICAL_SYSTEM/_USER/_GOLDEN constants
    • Ship008Verdict { Pass, Fail } binary enum
    • verdict_from_chat_template_render const fn (byte-equality via while-loop; UTF-8-safe because two valid UTF-8 strings are equal exactly when their byte sequences are equal, so ASCII and multibyte inputs are handled alike)
    • falsify_ship_008_chat_template_render_bind test — 5-section mutation survey: engine-binding → Pass; empty → Fail; missing gen-prompt → Fail; wrong delim (<|user|> drift) → Fail; swapped role order → Fail; single-byte trailing flip → Fail; empty==empty symmetry; provenance-pin substring assertions (<|im_start|> × 3, <|im_end|> × 2, ends-with <|im_start|>assistant\n).
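The byte-equality verdict described above can be sketched as follows. This is a minimal illustration, not the contents of ship_008.rs: only the enum name, the function name, and the `(rendered, golden) -> Pass|Fail` shape come from this PR, and the body is an assumption about how a while-loop comparison in a const fn might look.

```rust
/// Binary verdict, following the `Ship008Verdict { Pass, Fail }` shape named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship008Verdict {
    Pass,
    Fail,
}

/// Byte-exact comparison that is usable in const context.
/// Assumed body: a length check followed by a while-loop over the bytes,
/// because `for` loops and `==` on `&str` are not available in a const fn.
pub const fn verdict_from_chat_template_render(rendered: &str, golden: &str) -> Ship008Verdict {
    let a = rendered.as_bytes();
    let b = golden.as_bytes();
    if a.len() != b.len() {
        return Ship008Verdict::Fail;
    }
    let mut i = 0;
    while i < a.len() {
        if a[i] != b[i] {
            return Ship008Verdict::Fail;
        }
        i += 1;
    }
    Ship008Verdict::Pass
}

/// Because the fn is const, a verdict can also be computed at compile time.
pub const COMPILE_TIME_CHECK: Ship008Verdict =
    verdict_from_chat_template_render("<|im_start|>assistant\n", "<|im_start|>assistant\n");
```

Comparing raw bytes keeps the rule independent of any template engine, so the same function can judge both the in-process render and, later, output captured from a live run.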

Modified

  • crates/aprender-core/src/text/chat_template/mod.rs: include!("ship_008.rs") after existing template.rs, raw_template.rs.
  • contracts/chat-template-v1.yaml — v1.1.0 + GATE-CHAT-SHIP-008.
  • docs/specifications/aprender-train/ship-two-models-spec.md — v2.24.0 amendment + annotated AC-SHIP1-008 / FALSIFY-SHIP-008 rows.

Design

Mirrors the exact pattern set by FALSIFY-SHIP-016/017/018/020 on MODEL-2: pure binary verdict enum + const verdict fn + exhaustive counter-example survey + provenance pin + contract bump with full_discharge_blocks_on. The decision rule (byte-equality over &str) is fully provable offline; the compute-heavy tier (live teacher render + downstream completion diff) is intentionally out of scope for this PR and tracked by full_discharge_blocks_on. Any edit to either side of the bind — template engine, special tokens, golden string constant — flips the verdict to Fail before teacher inference is launched.
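To illustrate how an edit to either side of the bind flips the verdict, here is a small sketch of a ChatML-style render compared against a golden string. The `Message` type, `render_chatml`, `sample_messages`, and `SAMPLE_GOLDEN` are hypothetical names invented for this example; the message contents are placeholders, not the AC_SHIP1_008 canonical constants, and the real engine in this PR is `ChatMLTemplate::format_conversation`, whose signature is not shown here.

```rust
/// Hypothetical message type for illustration only.
pub struct Message<'a> {
    pub role: &'a str,
    pub content: &'a str,
}

/// Render messages in ChatML form and append the assistant generation prompt.
pub fn render_chatml(messages: &[Message<'_>]) -> String {
    let mut out = String::new();
    for m in messages {
        out.push_str("<|im_start|>");
        out.push_str(m.role);
        out.push('\n');
        out.push_str(m.content);
        out.push_str("<|im_end|>\n");
    }
    out.push_str("<|im_start|>assistant\n");
    out
}

/// Placeholder (system, user) pair; not the canonical messages from the spec.
pub fn sample_messages() -> [Message<'static>; 2] {
    [
        Message { role: "system", content: "You are a helpful assistant." },
        Message { role: "user", content: "Hello" },
    ]
}

/// Placeholder golden string matching the placeholder messages above.
pub const SAMPLE_GOLDEN: &str = "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n";
```

With this layout, the provenance-pin properties listed in the PR (three `<|im_start|>`, two `<|im_end|>`, trailing `<|im_start|>assistant\n`) hold for any (system, user) pair, and a delimiter drift such as `<|user|>` no longer matches the golden string.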

Test plan

  • cargo test -p aprender-core --lib falsify_ship_008_chat_template_render_bind → 1 passed; 0 failed.
  • pv validate contracts/chat-template-v1.yaml → Contract is valid (0 errors, 0 warnings).
  • cargo clippy -p aprender-core --lib --no-deps -- -D warnings → clean.
  • CI ci / test + ci / lint + workspace-test all green on this PR.
  • Full discharge (deferred, tracked by full_discharge_blocks_on): live apr run paiml/qwen2.5-coder-7b-apache-q4k-v1 --prompt <canonical> + byte-diff completion against spec-defined golden.

Refs: SHIP-TWO-001, task #155

🤖 Generated with Claude Code

Discharge FALSIFY-SHIP-008 / AC-SHIP1-008 at PARTIAL_ALGORITHM_LEVEL.

- contracts/chat-template-v1.yaml v1.0.0 -> v1.1.0: adds
  GATE-CHAT-SHIP-008 binding ChatMLTemplate::format_conversation to
  the canonical Qwen2.5-Coder-7B (system, user) golden via a pure
  verdict_from_chat_template_render const fn. ship_blocking: true,
  discharge_status: PARTIAL_ALGORITHM_LEVEL; full discharge blocks
  on live `apr run paiml/qwen2.5-coder-7b-apache-q4k-v1` completion
  diff against golden.
- crates/aprender-core/src/text/chat_template/ship_008.rs (new):
  AC_SHIP1_008_CANONICAL_{SYSTEM,USER,GOLDEN} constants +
  Ship008Verdict enum + verdict_from_chat_template_render const fn
  (byte-equality, UTF-8-safe) + 5-section mutation survey
  (engine-binding, empty Fail, missing-gen-prompt Fail, wrong-delim
  Fail, swapped-roles Fail, single-byte flip Fail) + symmetry +
  provenance pin.
- crates/aprender-core/src/text/chat_template/mod.rs: include!
  ship_008.rs alongside existing template.rs, raw_template.rs.
- docs/specifications/aprender-train/ship-two-models-spec.md
  v2.23.0 -> v2.24.0: AC-SHIP1-008 row + FALSIFY-SHIP-008 row
  annotated PARTIAL_ALGORITHM_LEVEL; v2.24.0 amendment entry
  records MODEL-1 coverage 1/10 -> 2/10 (first MODEL-1
  non-provenance PARTIAL; mirrors SHIP-016/017/018/020 pattern).

Test: cargo test -p aprender-core --lib
  falsify_ship_008_chat_template_render_bind -> 1 passed
Contract: pv validate contracts/chat-template-v1.yaml -> Contract is valid

Refs: SHIP-TWO-001, task #155

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…arge (#1013)

Wires AC-SHIP1-006 "apr qa <model> — all 8 gates PASS" at
PARTIAL_ALGORITHM_LEVEL: a pure aggregate-AND verdict fn bound
to the 8-gate ship criterion from `docs/specifications/components/qa.md`
§3 (golden / throughput / ollama parity / gpu speedup / tensor contracts
/ format parity / ptx parity / metadata).

Files:
- `crates/aprender-core/src/qa/ship_006.rs` (NEW, 217 lines) —
  `verdict_from_qa_gates(&[bool]) -> Ship006Verdict` const fn with
  7-section mutation survey: all-Pass→Pass, all-Fail→Fail,
  single-gate-flip × 8, exhaustive 2^8=256 bitmask proof, Pass→Fail
  monotonicity, length-drift counter-examples (0 / 7 / 9 / 16),
  provenance pin (AC_SHIP1_006_REQUIRED_QA_GATE_COUNT = 8).

- `crates/aprender-core/src/qa/mod.rs` — register `pub mod ship_006;`.

- `contracts/apr-model-qa-v1.yaml` v1.1.0 → v1.2.0 — adds
  `FALSIFY-QA-SHIP-006` with `ship_blocking: true`,
  `discharge_status: PARTIAL_ALGORITHM_LEVEL`, `evidence_discharged_by`
  pointing at ship_006.rs + the harness test, and
  `full_discharge_blocks_on` live `apr qa paiml/qwen2.5-coder-7b-apache-q4k-v1
  --json` on an RTX 4090 host (8× `"pass": true` entries in the JSON body).

- `docs/specifications/aprender-train/ship-two-models-spec.md`
  v2.24.0 → v2.25.0 — annotates AC-SHIP1-006 + FALSIFY-SHIP-006 rows
  with PARTIAL_ALGORITHM_LEVEL markers and adds v2.25.0 amendment entry.
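The aggregate-AND rule and its exhaustive bitmask check can be sketched as below. Only `verdict_from_qa_gates`, `Ship006Verdict`, and `AC_SHIP1_006_REQUIRED_QA_GATE_COUNT = 8` are named in this commit; the function body and the `count_passing_bitmasks` helper are assumptions written for illustration, not the contents of ship_006.rs.

```rust
/// Required number of QA gates, as pinned in this commit.
pub const AC_SHIP1_006_REQUIRED_QA_GATE_COUNT: usize = 8;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ship006Verdict {
    Pass,
    Fail,
}

/// Aggregate-AND verdict: Pass only when exactly the required number of
/// gates is supplied and every one of them is true (assumed body).
pub const fn verdict_from_qa_gates(gates: &[bool]) -> Ship006Verdict {
    if gates.len() != AC_SHIP1_006_REQUIRED_QA_GATE_COUNT {
        return Ship006Verdict::Fail;
    }
    let mut i = 0;
    while i < gates.len() {
        if !gates[i] {
            return Ship006Verdict::Fail;
        }
        i += 1;
    }
    Ship006Verdict::Pass
}

/// Hypothetical helper: walk all 2^8 gate combinations and count how many pass.
pub fn count_passing_bitmasks() -> usize {
    let mut passing = 0;
    for mask in 0u16..256 {
        let mut gates = [false; 8];
        for (bit, gate) in gates.iter_mut().enumerate() {
            *gate = (mask >> bit) & 1 == 1;
        }
        if verdict_from_qa_gates(&gates) == Ship006Verdict::Pass {
            passing += 1;
        }
    }
    passing
}
```

Under this sketch, exactly one of the 256 bitmasks (all eight gates true) yields Pass, and any slice whose length drifts from 8 fails before the individual gates are inspected.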

Design: mirrors the aggregate-AND shape set by MODEL-2 SHIP-016
(task #152 on `feat/falsify-ship-016-partial-discharge`, not yet on
main). Authored self-contained because SHIP-016 hasn't landed;
once both ship, the two `verdict_from_qa_gates_*` fns should be
deduplicated into a single parameterized helper. The required gate
count may differ by model (both are 8 today; the spec's "All must Pass"
rule is model-independent).

MODEL-1 AC-SHIP1 coverage: 2/10 touched (SHIP-008 + SHIP-009) →
**3/10** touched (+ SHIP-006). First MODEL-1 aggregate-AND PARTIAL.

Full discharge blocks on a live `apr qa` run against the teacher
weights on RTX 4090; the compute-heavy portion is intentionally
out of scope here.

Test: `cargo test -p aprender-core --lib falsify_ship_006_apr_qa_eight_gates_aggregate` → 1 passed.
Contract: `cargo run --quiet -p aprender-contracts-cli --bin pv -- validate contracts/apr-model-qa-v1.yaml` → 0 errors.

Stacked on #1012 (feat/falsify-ship-008-partial-discharge). Spec
v2.25.0 builds on v2.24.0.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift merged commit 1263178 into main Apr 22, 2026
10 checks passed
@noahgift noahgift deleted the feat/falsify-ship-008-partial-discharge branch April 22, 2026 16:55