feat(aprender-core): apr-cli-qa-v1 QA-001 PARTIAL_ALGORITHM_LEVEL#1172

Merged
noahgift merged 1 commit into main from feat/qa-001-partial-discharge
Apr 30, 2026
Conversation

@noahgift

Summary

  • Algorithm-level PARTIAL discharge for FALSIFY-QA-001 (every registered apr command exits 0 on --help) per contracts/apr-cli-qa-v1.yaml.
  • New module crates/aprender-core/src/format/qa_001.rs exporting verdict_from_help_subcommand_smoke(commands_tested, commands_passing_help_smoke) -> Qa001Verdict.
  • Pinned constant: AC_QA_001_EXPECTED_COMMAND_COUNT = 58 (canonical apr CLI surface per spec §26.8 / CLAUDE.md).
  • Pass iff: tested == 58 AND passing == tested. The passing <= tested check is kept as an explicit counter-sanity guard.
  • 13 unit tests across 7 mutation-survey sections.

Why two-counter shape

A single passing == 58 check would silently pass the case "60 commands registered, 58 happen to pass --help, 2 broken", which is a registry-size drift. Tracking commands_tested separately catches three distinct regression classes: phantom subcommands (tested == 59), dropped subcommands (tested == 57), and broken dispatch (passing < tested).

Why pin 58

Per spec §26.8 / CLAUDE.md: 58 subcommands (57 original + mcp, added in PR #864 on 2026-04-17). This adds a fourth enforcement layer to the existing three-surface drift trio (registered_commands test, apr-cli-commands-v1.yaml, CLAUDE.md).

Five-Whys (commit-message body has full chain)

  1. Bind now → phantom/broken-dispatch ships invisibly.
  2. (u64, u64) pair → algorithm-level pin.
  3. Pin 58 → catches add/drop drift.
  4. Two counters → distinguishes registry-size from dispatch-broken.
  5. 13 tests → 7 sections incl. composite property test.

Test plan

  • cargo test -p aprender-core --lib qa_001 → 13 passed
  • PMAT pre-commit gates pass

🤖 Generated with Claude Code

Algorithm-level PARTIAL discharge for FALSIFY-QA-001 (every
registered apr command exits 0 on --help) per
`contracts/apr-cli-qa-v1.yaml`.

## What this binds

`verdict_from_help_subcommand_smoke(commands_tested, commands_passing_help_smoke)`
returns Pass iff:

1. `commands_tested == 58` (canonical apr CLI surface)
2. `commands_passing_help_smoke == commands_tested` (every command's
   --help exited 0)
3. `commands_passing_help_smoke <= commands_tested` (counter sanity)

Pinned constant: `AC_QA_001_EXPECTED_COMMAND_COUNT = 58`.
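
The three conditions above can be sketched as a pure function. This is a minimal illustration, not the merged code: the constant, function, and type names come from this description, while the `Pass`/`Fail` variant names, the `u64` parameter types (suggested by the "(u64, u64) pair" wording), and the derive list are assumptions.

```rust
/// Canonical `apr` subcommand count pinned by the contract.
pub const AC_QA_001_EXPECTED_COMMAND_COUNT: u64 = 58;

/// Verdict for FALSIFY-QA-001. Variant names are assumed for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Qa001Verdict {
    Pass,
    Fail,
}

pub fn verdict_from_help_subcommand_smoke(
    commands_tested: u64,
    commands_passing_help_smoke: u64,
) -> Qa001Verdict {
    // Counter sanity: more passes than tests means the harness miscounted.
    if commands_passing_help_smoke > commands_tested {
        return Qa001Verdict::Fail;
    }
    // Registry-size pin: exactly 58 commands must have been exercised.
    if commands_tested != AC_QA_001_EXPECTED_COMMAND_COUNT {
        return Qa001Verdict::Fail;
    }
    // Dispatch check: every tested command's `--help` must have exited 0.
    if commands_passing_help_smoke != commands_tested {
        return Qa001Verdict::Fail;
    }
    Qa001Verdict::Pass
}
```

The sanity guard is logically implied by the equality check, but keeping it as a separate early return documents the invariant and keeps the verdict robust if the later checks are ever relaxed.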

## Why pin 58 commands explicitly

Per spec §26.8 / CLAUDE.md: the canonical apr CLI surface is 58
subcommands (57 original + `mcp` added in PR #864 on 2026-04-17).
A regression that adds or drops a subcommand without bumping the
contract, or that breaks dispatch, trips the gate in one of three ways:
- `tested == 59`: phantom subcommand added (caught by
  `fail_phantom_subcommand_added_57_to_59`)
- `tested == 57`: subcommand dropped (caught by
  `fail_subcommand_dropped`)
- `passing < tested`: dispatch broken (caught by
  `fail_one_command_broken`)
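
To make the three failure shapes concrete, the sketch below classifies a counter pair by the reason it fails. The `Qa001Outcome` enum and `classify` function are hypothetical diagnostic helpers that do not appear in the PR, which only describes a Pass/Fail verdict; the constant name and value are taken from the text above.

```rust
/// Pinned command count from the contract.
const AC_QA_001_EXPECTED_COMMAND_COUNT: u64 = 58;

/// Hypothetical diagnostic outcome (not part of the PR's API).
#[derive(Debug, PartialEq, Eq)]
enum Qa001Outcome {
    Pass,
    /// More commands were tested than the contract pins (e.g. 59).
    PhantomSubcommand,
    /// Fewer commands were tested than the contract pins (e.g. 57).
    DroppedSubcommand,
    /// The registry size is right, but some `--help` call failed.
    BrokenDispatch,
    /// More passes than tests: the counters themselves are inconsistent.
    CounterViolation,
}

fn classify(commands_tested: u64, commands_passing_help_smoke: u64) -> Qa001Outcome {
    if commands_passing_help_smoke > commands_tested {
        Qa001Outcome::CounterViolation
    } else if commands_tested > AC_QA_001_EXPECTED_COMMAND_COUNT {
        Qa001Outcome::PhantomSubcommand
    } else if commands_tested < AC_QA_001_EXPECTED_COMMAND_COUNT {
        Qa001Outcome::DroppedSubcommand
    } else if commands_passing_help_smoke < commands_tested {
        Qa001Outcome::BrokenDispatch
    } else {
        Qa001Outcome::Pass
    }
}
```

Every non-`Pass` outcome here would collapse to Fail under the binary verdict; the finer split only illustrates why each named test targets a different input region.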

## Why two-counter shape (tested + passing)

A single `passing == 58` check cannot tell these two cases
apart:
- "Registry has 58 commands and all dispatch" (true Pass)
- "Registry has 60 commands, 58 happen to pass --help, but 2 are
   broken" (silent Pass under single-counter)

Tracking `commands_tested` separately catches the second case.
Section 4's `fail_phantom_subcommand_added_57_to_59` test exercises
the count-drift side of this: 59 commands all pass --help, but
the count drift is the actual defect.
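
The difference between the two shapes can be shown side by side. Both helper functions below are hypothetical and exist only for this comparison; they reduce the verdict to a `bool` for brevity.

```rust
const AC_QA_001_EXPECTED_COMMAND_COUNT: u64 = 58;

/// Naive single-counter check: looks only at how many commands passed `--help`.
fn single_counter_passes(commands_passing_help_smoke: u64) -> bool {
    commands_passing_help_smoke == AC_QA_001_EXPECTED_COMMAND_COUNT
}

/// Two-counter check: pins the registry size and requires every tested
/// command to pass.
fn two_counter_passes(commands_tested: u64, commands_passing_help_smoke: u64) -> bool {
    commands_tested == AC_QA_001_EXPECTED_COMMAND_COUNT
        && commands_passing_help_smoke == commands_tested
}
```

With 60 registered commands of which 58 pass, `single_counter_passes(58)` returns `true` because it never sees the registry size, whereas `two_counter_passes(60, 58)` returns `false`.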

## Five-Whys

1. Why bind QA-001 now? — `apr --help` smoke is the most
   user-facing CLI gate; without a verdict pin, a phantom
   subcommand or broken dispatch ships invisibly until users
   hit "command not found".
2. Why a (u64, u64) pair? — Algorithm-level pin; the actual
   per-subcommand `--help` invocation harness is FULL_DISCHARGE.
3. Why pin 58? — Catches add/drop drift; matches CLAUDE.md
   canonical claim.
4. Why two counters (not one)? — Distinguishes "registry size
   drift" from "dispatch broken" — both must Fail but for
   different reasons.
5. Why 13 tests across 7 sections? — Provenance pin (×1),
   pass band (×1: canonical 58/58), broken-dispatch fail (×3),
   count-drift fail (×3: phantom, dropped, zero), partition
   violations (×2), passing-sweep at 58 (×8), tested-sweep
   when perfect (×7), composite property test (×7 cells).
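
A sweep or composite property test of the kind listed in item 5 can be sketched as a grid check: over any set of `(tested, passing)` cells, only `(58, 58)` may pass. The helper names and the grid bounds used in the usage note are assumptions for illustration and are not the cells used by the PR's own tests.

```rust
const AC_QA_001_EXPECTED_COMMAND_COUNT: u64 = 58;

/// Boolean form of the verdict, following the three conditions in the text.
fn is_pass(commands_tested: u64, commands_passing_help_smoke: u64) -> bool {
    commands_passing_help_smoke <= commands_tested
        && commands_tested == AC_QA_001_EXPECTED_COMMAND_COUNT
        && commands_passing_help_smoke == commands_tested
}

/// Returns every (tested, passing) cell of the grid whose verdict is Pass.
fn passing_cells(tested_values: &[u64], passing_values: &[u64]) -> Vec<(u64, u64)> {
    let mut cells = Vec::new();
    for &tested in tested_values {
        for &passing in passing_values {
            if is_pass(tested, passing) {
                cells.push((tested, passing));
            }
        }
    }
    cells
}
```

For example, sweeping both counters over `56..=60` should yield the single cell `(58, 58)`, and any grid whose tested values exclude 58 should yield no passing cells at all.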

## Cross-reference

The `tested == 58` pin must stay in sync with:
- `crates/apr-cli/tests/cli_commands.rs::registered_commands()`
- `contracts/apr-cli-commands-v1.yaml`
- CLAUDE.md "58 subcommands" claim

Per `feedback_cli_subcommand_three_surface_drift` memory, all
three surfaces must update together when adding a new command.
QA-001's verdict pin adds a fourth enforcement layer.

## Scope

PARTIAL_ALGORITHM_LEVEL only. Wiring this into the actual
`registered_commands` test harness that produces the
`(commands_tested, commands_passing_help_smoke)` pair is
FULL_DISCHARGE work for the qa-pipeline implementation PR.
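
As a rough idea of what that follow-up wiring might look like, the sketch below counts tested and passing subcommands by running each one with `--help`. Everything here is hypothetical: the function names, the injected `help_ok` callback, and the way the subcommand list is supplied are assumptions, and the actual harness is left to the qa-pipeline PR. Only `std::process::Command` from the Rust standard library is used.

```rust
use std::process::{Command, Stdio};

/// Runs `<binary> <subcommand> --help` and reports whether it exited 0.
/// A binary that cannot be spawned counts as a failure.
fn help_exits_zero(binary: &str, subcommand: &str) -> bool {
    Command::new(binary)
        .arg(subcommand)
        .arg("--help")
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .status()
        .map(|status| status.success())
        .unwrap_or(false)
}

/// Produces the (commands_tested, commands_passing_help_smoke) pair.
/// `help_ok` is injected so the counting logic can be tested without a binary.
fn smoke_counts<F>(subcommands: &[&str], help_ok: F) -> (u64, u64)
where
    F: Fn(&str) -> bool,
{
    let mut passing: u64 = 0;
    for &name in subcommands {
        if help_ok(name) {
            passing += 1;
        }
    }
    (subcommands.len() as u64, passing)
}
```

A real harness would presumably pass something like `|name| help_exits_zero("apr", name)` as the callback and feed the resulting pair into `verdict_from_help_subcommand_smoke`.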

## Tests

13 unit tests, all green.
@noahgift noahgift enabled auto-merge (squash) April 30, 2026 14:44
@noahgift noahgift merged commit 9694484 into main Apr 30, 2026
11 checks passed
@noahgift noahgift deleted the feat/qa-001-partial-discharge branch April 30, 2026 15:10