Skip to content

feat(falsify-ship-007): MODEL-1 apr bench decode ≥30 tok/s PARTIAL discharge#1014

Closed
noahgift wants to merge 4 commits into
mainfrom
feat/falsify-ship-007-partial-discharge
Closed

feat(falsify-ship-007): MODEL-1 apr bench decode ≥30 tok/s PARTIAL discharge#1014
noahgift wants to merge 4 commits into
mainfrom
feat/falsify-ship-007-partial-discharge

Conversation

@noahgift

Copy link
Copy Markdown
Contributor

Summary

Wires AC-SHIP1-007 (`apr bench` decode ≥30 tok/s on RTX 4090 for 7B Q4_K)
at `PARTIAL_ALGORITHM_LEVEL`: a pure f32 threshold verdict fn bound to the
MODEL-1 teacher ship floor.

MODEL-1 AC-SHIP1 coverage: 3/10 → 4/10 touched (after SHIP-008 + SHIP-009 + SHIP-006).

What changed

File Change
`crates/aprender-core/src/bench/ship_007.rs` NEW — `verdict_from_decode_tps(f32) -> Ship007Verdict` + `AC_SHIP1_007_MIN_DECODE_TPS_RTX4090_7B = 30.0` + 7-section mutation survey
`crates/aprender-core/src/bench/mod.rs` register `pub mod ship_007;`
`contracts/qwen2-e2e-verification-v1.yaml` v1.0.0 → v1.1.0, adds `FALSIFY-QW2E-SHIP-007` (PARTIAL_ALGORITHM_LEVEL)
`docs/specifications/aprender-train/ship-two-models-spec.md` v2.25.0 → v2.26.0, annotates AC/FALSIFY rows + amendment entry

Design

  • f32 threshold: `Pass` iff measured tok/s is finite AND ≥ `AC_SHIP1_007_MIN_DECODE_TPS_RTX4090_7B` (30.0).
  • 7-section mutation survey in `falsify_ship_007_decode_tps_threshold_logic`:
    1. Boundary at exactly 30.0 → Pass (contract is ≥, not >).
    2. One-ULP-below 30.0 → Fail (sharpest off-by-one counter-example).
    3. Clear Pass band (45 / 100 tok/s).
    4. Clear Fail band (0 / 10 / 29.999_999).
    5. Monotonicity above floor + below floor (sampled discretely).
    6. Non-finite conservative Fail for NaN / +∞ / -∞ (telemetry-bug guard).
    7. Provenance pin on the 30.0 constant lockstepping with spec §4.2 AC-SHIP1-007.
  • Mirrors MODEL-2 SHIP-020 single-threshold shape (task apr fails to find config.json #150 on `feat/falsify-ship-020-partial-discharge`, PR feat(ship-two-001): FALSIFY-SHIP-020 algorithm-level PARTIAL discharge (5th PARTIAL) #1005). SHIP-020 isn't on main yet, so SHIP-007 is authored self-contained; once both ship the two verdict fns should be deduped into a single parameterized helper.
  • Full discharge blocks on a live `apr bench --iterations 5 --max-tokens 128 paiml/qwen2.5-coder-7b-apache-q4k-v1` run on RTX 4090 with `--features cuda`; median of 5 iterations must be ≥ 30.0.

Test plan

  • `cargo test -p aprender-core --lib falsify_ship_007_decode_tps_threshold_logic` → 1 passed.
  • `cargo run --quiet -p aprender-contracts-cli --bin pv -- validate contracts/qwen2-e2e-verification-v1.yaml` → "Contract is valid" (0 errors, 0 warnings).
  • CI `ci / test` + `workspace-test` green on this stacked branch.
  • Full discharge: live `apr bench` median ≥ 30.0 tok/s on RTX 4090.

Stacked on #1012 (which absorbed #1013)

Base = `feat/falsify-ship-008-partial-discharge` (PR #1012 — already contains SHIP-008 + SHIP-006 after #1013 merged into its branch). When #1012 merges to main, GitHub will automatically retarget this PR to `main`.

🤖 Generated with Claude Code

noahgift and others added 3 commits April 22, 2026 18:25
Discharge FALSIFY-SHIP-008 / AC-SHIP1-008 at PARTIAL_ALGORITHM_LEVEL.

- contracts/chat-template-v1.yaml v1.0.0 -> v1.1.0: adds
  GATE-CHAT-SHIP-008 binding ChatMLTemplate::format_conversation to
  the canonical Qwen2.5-Coder-7B (system, user) golden via a pure
  verdict_from_chat_template_render const fn. ship_blocking: true,
  discharge_status: PARTIAL_ALGORITHM_LEVEL; full discharge blocks
  on live `apr run paiml/qwen2.5-coder-7b-apache-q4k-v1` completion
  diff against golden.
- crates/aprender-core/src/text/chat_template/ship_008.rs (new):
  AC_SHIP1_008_CANONICAL_{SYSTEM,USER,GOLDEN} constants +
  Ship008Verdict enum + verdict_from_chat_template_render const fn
  (byte-equality, UTF-8-safe) + 5-section mutation survey
  (engine-binding, empty Fail, missing-gen-prompt Fail, wrong-delim
  Fail, swapped-roles Fail, single-byte flip Fail) + symmetry +
  provenance pin.
- crates/aprender-core/src/text/chat_template/mod.rs: include!
  ship_008.rs alongside existing template.rs, raw_template.rs.
- docs/specifications/aprender-train/ship-two-models-spec.md
  v2.23.0 -> v2.24.0: AC-SHIP1-008 row + FALSIFY-SHIP-008 row
  annotated PARTIAL_ALGORITHM_LEVEL; v2.24.0 amendment entry
  records MODEL-1 coverage 1/10 -> 2/10 (first MODEL-1
  non-provenance PARTIAL; mirrors SHIP-016/017/018/020 pattern).

Test: cargo test -p aprender-core --lib
  falsify_ship_008_chat_template_render_bind -> 1 passed
Contract: pv validate contracts/chat-template-v1.yaml -> Contract is valid

Refs: SHIP-TWO-001, task #155

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…arge

Wires AC-SHIP1-006 "apr qa <model> — all 8 gates PASS" at
PARTIAL_ALGORITHM_LEVEL: a pure aggregate-AND verdict fn bound
to the 8-gate ship criterion from `docs/specifications/components/qa.md`
§3 (golden / throughput / ollama parity / gpu speedup / tensor contracts
/ format parity / ptx parity / metadata).

Files:
- `crates/aprender-core/src/qa/ship_006.rs` (NEW, 217 lines) —
  `verdict_from_qa_gates(&[bool]) -> Ship006Verdict` const fn with
  7-section mutation survey: all-Pass→Pass, all-Fail→Fail,
  single-gate-flip × 8, exhaustive 2^8=256 bitmask proof, Pass→Fail
  monotonicity, length-drift counter-examples (0 / 7 / 9 / 16),
  provenance pin (AC_SHIP1_006_REQUIRED_QA_GATE_COUNT = 8).

- `crates/aprender-core/src/qa/mod.rs` — register `pub mod ship_006;`.

- `contracts/apr-model-qa-v1.yaml` v1.1.0 → v1.2.0 — adds
  `FALSIFY-QA-SHIP-006` with `ship_blocking: true`,
  `discharge_status: PARTIAL_ALGORITHM_LEVEL`, `evidence_discharged_by`
  pointing at ship_006.rs + the harness test, and
  `full_discharge_blocks_on` live `apr qa paiml/qwen2.5-coder-7b-apache-q4k-v1
  --json` on an RTX 4090 host (8× `"pass": true` entries in the JSON body).

- `docs/specifications/aprender-train/ship-two-models-spec.md`
  v2.24.0 → v2.25.0 — annotates AC-SHIP1-006 + FALSIFY-SHIP-006 rows
  with PARTIAL_ALGORITHM_LEVEL markers and adds v2.25.0 amendment entry.

Design: mirrors the aggregate-AND shape set by MODEL-2 SHIP-016
(task #152 on `feat/falsify-ship-016-partial-discharge`, not yet on
main). Authored self-contained because SHIP-016 hasn't landed;
once both ship, the two `verdict_from_qa_gates_*` fns should be
deduplicated into a single parameterized helper. Required gate
count differs by model (both 8 today — the spec's "All must Pass"
is model-independent).

MODEL-1 AC-SHIP1 coverage: 2/10 touched (SHIP-008 + SHIP-009) →
**3/10** touched (+ SHIP-006). First MODEL-1 aggregate-AND PARTIAL.

Full discharge blocks on a live `apr qa` run against the teacher
weights on RTX 4090; the compute-heavy portion is intentionally
out of scope here.

Test: `cargo test -p aprender-core --lib falsify_ship_006_apr_qa_eight_gates_aggregate` → 1 passed.
Contract: `cargo run --quiet -p aprender-contracts-cli --bin pv -- validate contracts/apr-model-qa-v1.yaml` → 0 errors.

Stacked on #1012 (feat/falsify-ship-008-partial-discharge). Spec
v2.25.0 builds on v2.24.0.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…scharge

Wires AC-SHIP1-007 "apr bench decode throughput ≥30 tok/s on RTX 4090
(7B Q4_K target)" at PARTIAL_ALGORITHM_LEVEL: a pure f32 threshold
verdict fn bound to the MODEL-1 teacher ship floor. The decision rule
is proven today; the compute-heavy half (live `apr bench` on RTX 4090)
is deferred to hardware evidence collection.

Files:
- `crates/aprender-core/src/bench/ship_007.rs` (NEW, 158 lines) —
  `AC_SHIP1_007_MIN_DECODE_TPS_RTX4090_7B = 30.0`,
  `Ship007Verdict { Pass, Fail }`,
  `verdict_from_decode_tps(f32) -> Ship007Verdict`,
  `falsify_ship_007_decode_tps_threshold_logic` 7-section survey:
    1. boundary (30.0 exactly → Pass; the contract is ≥, not >)
    2. one-ULP-below → Fail (sharpest off-by-one counter-example)
    3. clear Pass band (45 / 100 tok/s)
    4. clear Fail band (0 / 10 / 29.999999)
    5. monotonicity above floor + below floor
    6. non-finite → Fail conservatively (NaN, +∞, -∞)
    7. provenance pin binding the 30.0 constant to spec §4.2.

- `crates/aprender-core/src/bench/mod.rs` — register `pub mod ship_007;`.

- `contracts/qwen2-e2e-verification-v1.yaml` v1.0.0 → v1.1.0 — adds
  `FALSIFY-QW2E-SHIP-007` with `ship_blocking: true`,
  `discharge_status: PARTIAL_ALGORITHM_LEVEL`, `evidence_discharged_by`
  pointing at ship_007.rs + the harness test, and
  `full_discharge_blocks_on` live `apr bench --iterations 5
  --max-tokens 128 paiml/qwen2.5-coder-7b-apache-q4k-v1` on RTX 4090
  with --features cuda; median of 5 iterations must be ≥ 30.0. Also
  4 `counter_example_classes` (regressed_kernel / drifted_constant /
  relaxed_rule / nan_promoted).

- `docs/specifications/aprender-train/ship-two-models-spec.md`
  v2.25.0 → v2.26.0 — annotates AC-SHIP1-007 + FALSIFY-SHIP-007 rows
  with PARTIAL_ALGORITHM_LEVEL markers and adds v2.26.0 amendment entry.

Design: mirrors the MODEL-2 SHIP-020 single-f32-threshold shape (task
#150 on `feat/falsify-ship-020-partial-discharge`, PR #1005 not yet on
main). Authored self-contained because SHIP-020 hasn't landed; once
both ship, the two `verdict_from_decode_tps_*` fns should be
deduplicated into a single parameterized helper
`verdict_from_decode_tps(measured, floor) -> ThresholdVerdict` with
the model-specific floor pinned as a module-level const. MODEL-1 floor
is 30.0 (7B Q4_K, bandwidth-bound at ~3.5× the size of 370M); MODEL-2
floor is 100.0 (370M sovereign, compute-bound at RTX 4090 bandwidth).

MODEL-1 AC-SHIP1 coverage: 3/10 touched (SHIP-008 + SHIP-009 +
SHIP-006) → **4/10** touched (+ SHIP-007).

Test: `cargo test -p aprender-core --lib falsify_ship_007_decode_tps_threshold_logic` → 1 passed.
Contract: `cargo run --quiet -p aprender-contracts-cli --bin pv -- validate contracts/qwen2-e2e-verification-v1.yaml` → 0 errors.

Stacked on #1013 (feat/falsify-ship-006-partial-discharge), which is
itself stacked on #1012 (feat/falsify-ship-008-partial-discharge).
Spec v2.26.0 builds on v2.25.0 which builds on v2.24.0.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Base automatically changed from feat/falsify-ship-008-partial-discharge to main April 22, 2026 16:55
…nEval pass@1 ≥86.00% (1.2 pp noise → effective 84.80%) (#1015)

Wires MODEL-1 `apr eval --benchmark humaneval` ship floor (AC-SHIP1-005)
to a pure two-number threshold verdict fn. 5th compute-free MODEL-1
lever (SHIP-008 + SHIP-009 + SHIP-006 + SHIP-007 + SHIP-005) brings
MODEL-1 AC-SHIP1 coverage to 5/10 touched. Mirrors MODEL-2 SHIP-018
pattern (pass@1 threshold) but uniquely carries a 1.2 pp noise
allowance called out by spec §4.2 AC-SHIP1-005.

contracts/qwen2-e2e-verification-v1.yaml v1.1.0 → v1.2.0:
  - Adds FALSIFY-QW2E-SHIP-005 binding
      AC_SHIP1_005_NOMINAL_HUMANEVAL_PASS_AT_1_PCT = 86.00
      AC_SHIP1_005_NOISE_ALLOWANCE_PP              = 1.20
      AC_SHIP1_005_EFFECTIVE_HUMANEVAL_PASS_AT_1_PCT ≈ 84.80
    to `verdict_from_pass_at_1(correct, total, threshold_pct) ->
    Ship005Verdict` in `crates/aprender-core/src/metrics/ship_005.rs`.
  - 8-section mutation survey:
      1. Safe-margin Pass above effective floor (85/100 = 85.0%)
      2. Above nominal floor (87/100 = 87.0%) Pass
      3. Noise-window Fail at nominal (85/100 Fails nominal)
      4. Below-effective Fail incl. HumanEval-canonical 139/164 = 84.756%
      5. Monotonicity sweep correct=0..=164 at effective
      6. Div-safety (total=0) + sanity (correct>total) → Fail
      7. Non-finite threshold (NaN, ±∞) → Fail conservatively
      8. Tolerance-bounded provenance pin on all three constants
         (86.0 − 1.2 in f32 yields ~84.79999924, not exact 84.80).
  - `ship_blocking: true`, `discharge_status: PARTIAL_ALGORITHM_LEVEL`,
    `full_discharge_blocks_on: live apr eval --benchmark humaneval ...`
    on RTX 4090; 6 named counter_example_classes.

crates/aprender-core/src/metrics/ship_005.rs (NEW, 310 lines):
  - Three-constant design unique to MODEL-1 (SHIP-007/018 had one).
  - `#[must_use] pub fn verdict_from_pass_at_1(...)` returns
    `Ship005Verdict::Fail` conservatively on: total=0 (div guard),
    correct>total (sanity), !threshold.is_finite() (NaN/±∞).
  - `falsify_ship_005_humaneval_pass_at_1_threshold_logic` — 1 passing.

Spec `docs/specifications/aprender-train/ship-two-models-spec.md`
  v2.26.0 → v2.27.0 annotates AC-SHIP1-005 + FALSIFY-SHIP-005 rows
  `**(PARTIAL_ALGORITHM_LEVEL v2.27.0)**` and appends amendment entry
  noting 11 PARTIAL + 3 DISCHARGED across both models, MODEL-1 5/10.

Authored self-contained because MODEL-2 SHIP-018 sibling PR has not
yet landed on main. Once it does, the two `verdict_from_pass_at_1_*`
fns should be dedup'd into a single parameterized helper.

Full discharge blocks on: live `apr eval --benchmark humaneval
paiml/qwen2.5-coder-7b-apache-q4k-v1 --json` on RTX 4090 with
--features cuda; median pass@1 across 3 seed=0 runs ≥ 86.00 (or
≥ 84.80 under the 1.2 pp noise allowance).

Tests:
  cargo test -p aprender-core --lib \
    falsify_ship_005_humaneval_pass_at_1_threshold_logic
Contract:
  cargo run --quiet -p aprender-contracts-cli --bin pv -- validate \
    contracts/qwen2-e2e-verification-v1.yaml

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift

Copy link
Copy Markdown
Contributor Author

Superseded by PR #1019 — clean-branch rebuild on post-SHIP-002 main (contract v1.1.0 → v1.2.0). The original branch was stacked on feat/falsify-ship-008/006-partial-discharge which had not merged to main, producing CONFLICTING state. Closing stale.

@noahgift noahgift closed this Apr 22, 2026
noahgift added a commit that referenced this pull request Apr 23, 2026
…lean branch)

Clean-branch rebuild of SHIP-007 PARTIAL_ALGORITHM_LEVEL discharge on
main (superseding stale PR #1014 which was stacked on
feat/falsify-ship-008/006-partial-discharge branches that had not yet
merged to main). Algorithm commit carries the same 7-section mutation
survey as the original be6d129, re-based onto post-SHIP-002 main
(commit f615148, contract v1.1.0).

Wires AC-SHIP1-007 "apr bench decode throughput ≥30 tok/s on RTX 4090
(7B Q4_K target)" at PARTIAL_ALGORITHM_LEVEL: a pure f32 threshold
verdict fn bound to the MODEL-1 teacher ship floor. Decision rule is
proven today; compute-heavy half (live `apr bench` on RTX 4090) is
deferred to hardware evidence collection.

Files:
- `crates/aprender-core/src/bench/ship_007.rs` (NEW) —
  `AC_SHIP1_007_MIN_DECODE_TPS_RTX4090_7B = 30.0`,
  `Ship007Verdict { Pass, Fail }`,
  `verdict_from_decode_tps(f32) -> Ship007Verdict`,
  `falsify_ship_007_decode_tps_threshold_logic` 7-section survey:
    1. boundary (30.0 exactly → Pass; contract is ≥, not >)
    2. one-ULP-below → Fail (sharpest off-by-one counter-example)
    3. clear Pass band (45 / 100 tok/s)
    4. clear Fail band (0 / 10 / 29.999999)
    5. monotonicity above floor + below floor
    6. non-finite → Fail conservatively (NaN, +∞, -∞)
    7. provenance pin binding 30.0 to spec §4.2.

- `crates/aprender-core/src/bench/mod.rs` — register `pub mod ship_007;`.

- `contracts/qwen2-e2e-verification-v1.yaml` v1.1.0 → v1.2.0 — adds
  `FALSIFY-QW2E-SHIP-007` with `ship_blocking: true`,
  `discharge_status: PARTIAL_ALGORITHM_LEVEL`, `evidence_discharged_by`
  pointing at ship_007.rs + the harness test, and
  `full_discharge_blocks_on` live `apr bench --iterations 5
  --max-tokens 128 paiml/qwen2.5-coder-7b-apache-q4k-v1` on RTX 4090
  with --features cuda; median of 5 iterations must be ≥ 30.0.

- `docs/specifications/aprender-train/ship-two-models-spec.md` v2.26.0
  → v2.27.0 — annotates AC-SHIP1-007 row with PARTIAL_ALGORITHM_LEVEL
  v2.27.0 marker and adds v2.27.0 amendment entry.

Design: mirrors MODEL-2 SHIP-020 single-f32-threshold shape (PR #1005
not yet on main). Once both ship, the two `verdict_from_decode_tps_*`
fns should be deduplicated into a single parameterized helper
`verdict_from_decode_tps(measured, floor) -> ThresholdVerdict` with
model-specific floors pinned as module-level consts. MODEL-1 floor is
30.0 (7B Q4_K, bandwidth-bound at ~3.5× the 370M size); MODEL-2 floor
is 100.0 (370M sovereign, compute-bound at RTX 4090 bandwidth).

MODEL-1 AC-SHIP1 coverage: 4/10 touched (SHIP-009 + SHIP-008 + SHIP-006
+ SHIP-002) → **5/10** touched (+ SHIP-007).

Test: `cargo test -p aprender-core --lib falsify_ship_007_decode_tps_threshold_logic` → 1 passed.
Contract: `pv validate contracts/qwen2-e2e-verification-v1.yaml` → 0 errors.
Clippy: `cargo clippy -p aprender-core --lib -- -D warnings` → clean.
Fmt: `cargo fmt --check -p aprender-core` → clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request Apr 23, 2026
…lean branch) (#1019)

* feat(falsify-ship-007): MODEL-1 apr bench decode ≥30 tok/s PARTIAL (clean branch)

Clean-branch rebuild of SHIP-007 PARTIAL_ALGORITHM_LEVEL discharge on
main (superseding stale PR #1014 which was stacked on
feat/falsify-ship-008/006-partial-discharge branches that had not yet
merged to main). Algorithm commit carries the same 7-section mutation
survey as the original be6d129, re-based onto post-SHIP-002 main
(commit f615148, contract v1.1.0).

Wires AC-SHIP1-007 "apr bench decode throughput ≥30 tok/s on RTX 4090
(7B Q4_K target)" at PARTIAL_ALGORITHM_LEVEL: a pure f32 threshold
verdict fn bound to the MODEL-1 teacher ship floor. Decision rule is
proven today; compute-heavy half (live `apr bench` on RTX 4090) is
deferred to hardware evidence collection.

Files:
- `crates/aprender-core/src/bench/ship_007.rs` (NEW) —
  `AC_SHIP1_007_MIN_DECODE_TPS_RTX4090_7B = 30.0`,
  `Ship007Verdict { Pass, Fail }`,
  `verdict_from_decode_tps(f32) -> Ship007Verdict`,
  `falsify_ship_007_decode_tps_threshold_logic` 7-section survey:
    1. boundary (30.0 exactly → Pass; contract is ≥, not >)
    2. one-ULP-below → Fail (sharpest off-by-one counter-example)
    3. clear Pass band (45 / 100 tok/s)
    4. clear Fail band (0 / 10 / 29.999999)
    5. monotonicity above floor + below floor
    6. non-finite → Fail conservatively (NaN, +∞, -∞)
    7. provenance pin binding 30.0 to spec §4.2.

- `crates/aprender-core/src/bench/mod.rs` — register `pub mod ship_007;`.

- `contracts/qwen2-e2e-verification-v1.yaml` v1.1.0 → v1.2.0 — adds
  `FALSIFY-QW2E-SHIP-007` with `ship_blocking: true`,
  `discharge_status: PARTIAL_ALGORITHM_LEVEL`, `evidence_discharged_by`
  pointing at ship_007.rs + the harness test, and
  `full_discharge_blocks_on` live `apr bench --iterations 5
  --max-tokens 128 paiml/qwen2.5-coder-7b-apache-q4k-v1` on RTX 4090
  with --features cuda; median of 5 iterations must be ≥ 30.0.

- `docs/specifications/aprender-train/ship-two-models-spec.md` v2.26.0
  → v2.27.0 — annotates AC-SHIP1-007 row with PARTIAL_ALGORITHM_LEVEL
  v2.27.0 marker and adds v2.27.0 amendment entry.

Design: mirrors MODEL-2 SHIP-020 single-f32-threshold shape (PR #1005
not yet on main). Once both ship, the two `verdict_from_decode_tps_*`
fns should be deduplicated into a single parameterized helper
`verdict_from_decode_tps(measured, floor) -> ThresholdVerdict` with
model-specific floors pinned as module-level consts. MODEL-1 floor is
30.0 (7B Q4_K, bandwidth-bound at ~3.5× the 370M size); MODEL-2 floor
is 100.0 (370M sovereign, compute-bound at RTX 4090 bandwidth).

MODEL-1 AC-SHIP1 coverage: 4/10 touched (SHIP-009 + SHIP-008 + SHIP-006
+ SHIP-002) → **5/10** touched (+ SHIP-007).

Test: `cargo test -p aprender-core --lib falsify_ship_007_decode_tps_threshold_logic` → 1 passed.
Contract: `pv validate contracts/qwen2-e2e-verification-v1.yaml` → 0 errors.
Clippy: `cargo clippy -p aprender-core --lib -- -D warnings` → clean.
Fmt: `cargo fmt --check -p aprender-core` → clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci: retrigger after 3 disk-guard race failures

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant