feat(contract): clean-chat-output-v1 — codify M287 cascade sanitization invariants#1859

Merged: noahgift merged 3 commits into `main` from `feat/clean-chat-output-v1-contract` on May 21, 2026

Summary

Authors the provable contract behind `clean_chat_output` so that the six invariants established by the M287 → #1852 / #1853 cascade are backed by falsifiers rather than only by unit tests.

Why

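The cascade fixed three things in concert:

- #1852: EOS stop-token detection (`<|im_end|>` / `<|endoftext|>`)
- #1853: leading "Human:" / "User:" / "Assistant:" prefix strip
- M287 surface: the 'Human: I need to...' runaway pattern that follows a missed EOS

The implementation (`crates/aprender-serve/src/api/realize_handlers.rs::clean_chat_output`) already exists; this contract retroactively codifies its guarantees so that future stop-sequence changes require a contract bump alongside the code change. It also hooks `pv lint` and contract-coverage audits onto a sanitization layer that previously had no contract.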

Six falsifiers

| ID | Guarantee |
| --- | --- |
| V1_001 | Leading "Human:" / "User:" / "Assistant:" stripped |
| V1_002 | Stop sequence inside body truncates at first occurrence |
| V1_003 | Earliest stop sequence wins when multiple are present |
| V1_004 | Clean text passes through (trim-only) |
| V1_005 | Empty / whitespace / stop-only collapses to "" |
| V1_006 | STOP_SEQUENCES code constant ↔ contract YAML stay synced |
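
To illustrate how guarantees V1_001 through V1_005 fit together, here is a minimal Rust sketch. It is not the code in `realize_handlers.rs`, which this PR does not show: the function name `clean_chat_output_sketch`, the `ROLE_PREFIXES` list, and the contents of `STOP_SEQUENCES` are assumptions chosen to match the falsifier descriptions.

```rust
/// Hypothetical stop list. The real `STOP_SEQUENCES` constant is not shown
/// in this PR; these entries are assumed from the cascade description.
const STOP_SEQUENCES: &[&str] = &["<|im_end|>", "<|endoftext|>", "\nHuman:", "\nUser:"];

/// Hypothetical list of role prefixes stripped from the start of the output.
const ROLE_PREFIXES: &[&str] = &["Human:", "User:", "Assistant:"];

/// Sketch of the sanitization steps described by V1_001..V1_005.
fn clean_chat_output_sketch(raw: &str) -> String {
    let mut text = raw.trim_start();

    // V1_001: strip at most one leading role prefix.
    for prefix in ROLE_PREFIXES {
        if let Some(rest) = text.strip_prefix(*prefix) {
            text = rest;
            break;
        }
    }

    // V1_002 / V1_003: cut at the earliest occurrence of any stop sequence.
    let cut = STOP_SEQUENCES
        .iter()
        .filter_map(|stop| text.find(*stop))
        .min()
        .unwrap_or(text.len());

    // V1_004 / V1_005: clean text is only trimmed; empty, whitespace-only,
    // or stop-only input collapses to the empty string.
    text[..cut].trim().to_string()
}
```

Under these assumptions, `clean_chat_output_sketch("Assistant: ok\nHuman: I need to")` returns `"ok"`: the leading role prefix is stripped first, then the text is cut at the first stop sequence, then trimmed.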

All six are already covered by existing unit tests in `crates/aprender-serve/src/api/realize_handlers_clean_chat.rs` and `crates/aprender-serve/src/api/tests/format_chat_02.rs`.

Validation

```bash
$ cargo run -p aprender-contracts-cli --bin pv -- validate contracts/clean-chat-output-v1.yaml
0 error(s), 0 warning(s)
Contract is valid.
```

🤖 Generated with Claude Code

…on invariants

Author the provable contract behind `clean_chat_output` so the six
invariants established by the M287 → #1852 / #1853 cascade are
backed by falsifiers rather than only by unit tests.

## Why

The cascade fixed three things in concert:
- #1852: EOS stop-token detection (`<|im_end|>` / `<|endoftext|>`)
- #1853: leading "Human:"/"User:"/"Assistant:" prefix strip
- M287 surface: 'Human: I need to...' runaway pattern after a missed EOS

The implementation (`crates/aprender-serve/src/api/realize_handlers.rs::clean_chat_output`)
already exists; this contract retroactively codifies its guarantees so
that future stop-sequence changes require a contract bump alongside the
code change. It also hooks `pv lint` and contract-coverage audits onto
a sanitization layer that previously had no contract.

## Six falsifiers

- V1_001: leading "Human:" / "User:" / "Assistant:" stripped
- V1_002: stop sequence inside body truncates at first occurrence
- V1_003: earliest stop sequence wins when multiple are present
- V1_004: clean text passes through (trim-only)
- V1_005: empty / whitespace / stop-only collapses to ""
- V1_006: STOP_SEQUENCES code constant ↔ contract YAML stay synced
  (manual audit for now; could be pv-lint check later)
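
V1_006 is a manual audit for now. One way an automated check could look is sketched below, assuming the stop sequences have already been read from the code constant and from the contract YAML into two string slices. The helper name `stop_sequence_drift` is hypothetical, and nothing here reflects an existing `pv` feature or the contract's actual YAML schema.

```rust
use std::collections::BTreeSet;

/// Hypothetical drift check for V1_006.
/// Returns (entries in code but not in the contract,
///          entries in the contract but not in code).
fn stop_sequence_drift<'a>(
    code: &[&'a str],
    contract: &[&'a str],
) -> (Vec<&'a str>, Vec<&'a str>) {
    let code_set: BTreeSet<&str> = code.iter().copied().collect();
    let contract_set: BTreeSet<&str> = contract.iter().copied().collect();

    // Set differences in both directions; order of the inputs does not matter.
    let missing_from_contract = code_set.difference(&contract_set).copied().collect();
    let missing_from_code = contract_set.difference(&code_set).copied().collect();
    (missing_from_contract, missing_from_code)
}
```

Both returned vectors being empty would mean the two lists agree as sets, regardless of ordering.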

## Evidence

All six are already covered by existing unit tests in
`crates/aprender-serve/src/api/realize_handlers_clean_chat.rs` and
`crates/aprender-serve/src/api/tests/format_chat_02.rs`.

## Validation

`pv validate contracts/clean-chat-output-v1.yaml` → "Contract is valid."

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift enabled auto-merge (squash) May 21, 2026 16:22
noahgift merged commit fa67c9e into main May 21, 2026
10 checks passed
noahgift deleted the feat/clean-chat-output-v1-contract branch May 21, 2026 17:35