
docs(ship-two-001): §26 + §26.8 — three-priority plan + apr-is-canonical stack-tool-extension rule — spec v2.69.0 → v2.71.0#1079

Merged: noahgift merged 1 commit into main from feat/spec-26-next-session-execution-plan on Apr 27, 2026

noahgift commented Apr 27, 2026

Summary

Two-section spec amendment: §26 codifies the three-priority next-session execution plan (P1/P2/P3 with falsifiable binding criteria), and §26.8 codifies a binding methodology rule discovered mid-session:

When a stack CLI lacks a feature, extend the tool via contract→code, NEVER route around to a non-stack shim like huggingface-cli.

§26.8 triggering incident (2026-04-27)

P1 sub-agent recommended huggingface-cli download --include 'data/train-000[0-7][0-9]-of-00880.parquet' because batuta hf pull lacks --include. This is muda: huggingface-cli is non-stack, whereas batuta hf pull is stack-canonical. Reaching for a non-stack CLI to bypass a missing flag violates feedback_fix_root_cause_never_route_around.md and feedback_pv_not_bash_for_contracts.md.
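
To make concrete what the quoted `--include` pattern would select, here is a minimal sketch using Python's standard `fnmatch` module. It assumes the pattern follows fnmatch-style glob semantics and that shards are numbered 00000 through 00879, as the `-of-00880` suffix suggests; neither assumption is confirmed by the spec.

```python
from fnmatch import fnmatchcase

# Pattern quoted in the triggering incident above.
PATTERN = "data/train-000[0-7][0-9]-of-00880.parquet"

# Assumption: shard files are named train-00000 ... train-00879.
shards = [f"data/train-{i:05d}-of-00880.parquet" for i in range(880)]

# Keep only the shards the glob matches (case-sensitive match).
selected = [s for s in shards if fnmatchcase(s, PATTERN)]

print(len(selected))   # 80
print(selected[0])     # data/train-00000-of-00880.parquet
print(selected[-1])    # data/train-00079-of-00880.parquet
```

Under these assumptions the pattern picks the first 80 of 880 shards, which is the subset selection an `--include` flag on the stack tool would need to reproduce.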

§26.8.1 Binding rule

When a stack CLI (batuta, apr, pv, …) lacks a feature:

  1. Author contracts/<tool>-cli-<flag>-v1.yaml provable contract
  2. Extend the tool via in-tree implementation
  3. Use the extended stack tool

Acceptable narrow exceptions: one-off uv run --with data prep, one-off xxd forensics. Recurring workflows MUST extend the stack tool.

§26 priority matrix

| Priority | Wall-time | Binding | Discharges |
|:---------|:----------|:--------|:-----------|
| P1 | ~6-8 hr | manifest.total_tokens > 1e9 AND vocab_size == 50257 | enables P2 |
| P2 | 7.3 hr | best_val_loss < 9.75 | up to 9 MODEL-2 PARTIALs |
| P3 | ~4 hr | APR vs GGUF layer-3 ffn_swigl ratio ≥10× or <2× | up to 5 MODEL-1 PARTIALs |

P1 + P3 parallel; P2 gated on P1.
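
The binding criteria in the matrix are simple predicates, so they can be sketched as code. The following is an illustrative sketch only: the function names, the dict layout of the manifest, and the returned labels are hypothetical. The spec text here does not say what a ratio below 2× implies, so that branch is only labelled, not interpreted.

```python
def p1_binds(manifest: dict) -> bool:
    """P1: more than 1e9 tokens and the expected vocabulary size.

    Assumption: both fields are top-level keys of manifest.json.
    """
    return manifest["total_tokens"] > 1e9 and manifest["vocab_size"] == 50257


def p2_binds(best_val_loss: float) -> bool:
    """P2: best validation loss strictly below 9.75."""
    return best_val_loss < 9.75


def p3_verdict(ratio: float) -> str:
    """P3: classify the APR/GGUF layer-3 ffn_swigl ratio.

    The criterion binds at >= 10x or < 2x; anything in between
    leaves it unbound.
    """
    if ratio >= 10.0:
        return "ge_10x"
    if ratio < 2.0:
        return "lt_2x"
    return "unbound"


# Hypothetical example values, not measurements.
print(p1_binds({"total_tokens": 12_000_000_000, "vocab_size": 50257}))  # True
print(p2_binds(9.80))                                                   # False
print(p3_verdict(5.0))                                                  # unbound
```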

§26.9 Revised P1 chain (per §26.8 rule)

P1 now has a prerequisite:

P1.0 Author contracts/batuta-cli-pull-pattern-include-v1.yaml
P1.1 Implement batuta hf pull --include <glob> per contract
P1.2 Drift-prevention test
P1.3 THEN: batuta hf pull dataset codeparrot/github-code-clean --include '...'
P1.4 manifest.json.total_tokens > 1e9 AND vocab_size == 50257

Adds ~3-6 hours code-authoring before download, but produces a durable batuta improvement.

Coverage tally evolution (unchanged)

| State | PARTIAL | DISCHARGED |
|:------|--------:|-----------:|
| Now | 33 | 12 |
| Both criteria met | 19 | 26 (58% DISCHARGED) |
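
The arithmetic behind the tally can be checked in a few lines. This sketch assumes the maximum flip is the sum of the two "up to" figures from the priority matrix (9 MODEL-2 PARTIALs from P2 plus 5 MODEL-1 PARTIALs from P3); the variable names are illustrative.

```python
partial_now, discharged_now = 33, 12
total = partial_now + discharged_now           # 45 items tracked overall

max_flip = 9 + 5                               # P2 (MODEL-2) + P3 (MODEL-1)

partial_after = partial_now - max_flip         # 19
discharged_after = discharged_now + max_flip   # 26
share = discharged_after / total               # 26 / 45

print(partial_after, discharged_after, f"{share:.0%}")  # 19 26 58%
```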

Test plan

  • CI workspace-test passes
  • CI gate passes
  • Spec banner v2.71.0 reflects §26 + §26.8
  • §26.8 binding rule cross-referenced to existing feedback memories

🤖 Generated with Claude Code

noahgift enabled auto-merge (squash) April 27, 2026 06:07
noahgift force-pushed the feat/spec-26-next-session-execution-plan branch from 99d3ad2 to e1c7ec9 April 27, 2026 06:28
noahgift changed the title from "docs(ship-two-001): §26 — three-priority execution plan + binding criteria — spec v2.69.0 → v2.70.0" to "docs(ship-two-001): §26 + §26.8 — three-priority plan + stack-tool-extension methodology rule — spec v2.69.0 → v2.71.0" Apr 27, 2026
…cal stack-tool-extension rule — spec v2.69.0 → v2.71.0

§26: Three-priority execution plan with falsifiable binding
criteria. P1 (Stack v2 corpus), P2 (convergence run, depends on P1),
P3 (GGUF forward_traced for SHIP-007 pin). Maximum theoretical
flip: 14 PARTIAL→DISCHARGED.

§26.8 (added 2026-04-27 after triggering incident):
**BINDING METHODOLOGY RULE — `apr` is the canonical stack CLI
post-monorepo. When `apr` lacks a feature, extend `apr` via
contract→code, NEVER route around to non-stack shims like
`huggingface-cli` or to deprecated namespaces like `batuta hf
pull`.**

Triggering incident: P1 sub-agent recommended downloading
codeparrot/github-code-clean via `huggingface-cli download
--include '...'` because `apr pull` is model-only today (no
dataset asset-type, no --include, no --license-allowlist).
This violates three rules:
- feedback_fix_root_cause_never_route_around: missing surface
  is a feature gap, fix at root
- feedback_pv_not_bash_for_contracts: re-implementing what a
  stack tool should do via non-stack CLI is muda
- feedback_monorepo_single_source_of_truth: `apr` is canonical
  post-APR-MONO; `batuta hf pull` is deprecated namespace

Binding rule §26.8.1:
1. Author contracts/apr-cli-<subcommand>-v1.yaml
2. Extend `apr` via in-tree implementation
3. Use the extended `apr`

Acceptable exceptions: one-off `uv run --with` data-prep where
no stack tool covers the niche; one-off xxd forensics. Recurring
workflows (every dataset pull) MUST extend `apr`.

§26.9: P1 prerequisite chain:
P1.0 Author contracts/apr-cli-pull-dataset-v1.yaml
P1.1 Implement `apr pull dataset` with --include, --license-allowlist
P1.2 Drift-prevention test
P1.3 Update apr-cli-commands-v1.yaml registry per
     feedback_cli_subcommand_three_surface_drift
P1.4 THEN: `apr pull dataset codeparrot/github-code-clean
     --include '...' --license-allowlist '...' --output ...`
P1   manifest.json.total_tokens > 1e9 AND vocab_size == 50257

§26.2 corpus target updated: codeparrot/github-code-clean
(directly downloadable; ~12-16B Python tokens after filters)
replaces bigcode/the-stack-v2-dedup (uses Software Heritage IDs,
too complex for session window). Sub-agent corpus survey
ratified.

Adds ~3-6 hours code-authoring before download, but produces a
durable `apr pull dataset` extension benefiting every future
dataset pull, not a one-off shim.

Spec v2.69.0 → v2.70.0 (§26) → v2.71.0 (§26.8). Coverage
unchanged at amendment — §26 is the plan, §26.8 the methodology
clarification.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift force-pushed the feat/spec-26-next-session-execution-plan branch from e1c7ec9 to 7eae6d7 April 27, 2026 06:36
noahgift changed the title from "docs(ship-two-001): §26 + §26.8 — three-priority plan + stack-tool-extension methodology rule — spec v2.69.0 → v2.71.0" to "docs(ship-two-001): §26 + §26.8 — three-priority plan + apr-is-canonical stack-tool-extension rule — spec v2.69.0 → v2.71.0" Apr 27, 2026
noahgift merged commit 1556983 into main Apr 27, 2026
10 checks passed
noahgift deleted the feat/spec-26-next-session-execution-plan branch April 27, 2026 07:11
noahgift added a commit that referenced this pull request Apr 27, 2026
…irmed APR-side at inference.rs:160-164 — spec v2.71.0 → v2.72.0 (#1084)

Live evidence on noah-Lambda-Vector RTX 4090 2026-04-27.
Built apr from PR #1083 branch (commits 77c016b + c657968
+ f249464 from PR A+B+C cascade). Ran `apr trace --payload`
on canonical 7B teacher in BOTH formats with identical prompt
+ tokenizer.

Result:
| Layer | APR ffn_swigl std | GGUF ffn_swigl std | Ratio |
|------:|------------------:|-------------------:|------:|
| 3     | 1.2216            | 0.0670             | 18.23x |

§26.4 binding criterion threshold: ≥10x → APR-side bug.
**Observed 18.23x, roughly 1.8 times the 10x threshold (8.23 above it): a decisive verdict.**
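
As a quick sanity check, the reported ratio can be reproduced from the two standard deviations in the table. This is only arithmetic on the published figures; the variable names are illustrative.

```python
apr_std = 1.2216    # APR ffn_swigl std at layer 3 (from the table above)
gguf_std = 0.0670   # GGUF ffn_swigl std at layer 3 (from the table above)
threshold = 10.0    # §26.4 binding threshold for "APR-side bug"

ratio = apr_std / gguf_std

print(f"{ratio:.2f}x")                  # 18.23x
print(ratio >= threshold)               # True
print(f"{ratio / threshold:.2f}")       # 1.82 (multiple of the threshold)
```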

The investigation chain that started in §15.4 (GPU GQA
elimination) has reached its conclusion at §27:

§15.4 → §16 → §17 → §23 → §27 (this)
"Whole forward path" → "GPU eliminated" → "(layer=3, FFN sub-block)"
→ "(layer=3, ffn_swigl)" → "**APR-side at inference.rs:160-164**"

Cascade-damping signature confirmed:
- Layers 0-2: ratio ~1.1x (normal)
- Layer 3: 18.23x (anomaly)
- Layers 4-5: 3.3-4.5x (cascade)
- Layer 6+: ~1x (recovered)

This is consistent with a localized perturbation (off-by-one,
buffer aliasing, or an F32-vs-Q4K dequant defect specifically at
layer 3) rather than persistent residual-stream corruption.
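
The cascade-damping signature can be expressed as a small classification over per-layer ratios. In this sketch only layer 3's 18.23 is a reported value; the other numbers are placeholders picked from the quoted ranges (which of layers 4 and 5 has which value is not stated), and reusing the 10x and 2x cut-offs as bucket boundaries is an assumption.

```python
# Placeholder per-layer ratios taken from the quoted ranges.
# NOT measured values, except layer 3 (18.23).
ratios = [1.1, 1.1, 1.1, 18.23, 3.3, 4.5, 1.0]


def label(ratio: float) -> str:
    """Bucket a ratio using assumed 10x / 2x cut-offs."""
    if ratio >= 10.0:
        return "anomaly"
    if ratio >= 2.0:
        return "cascade"
    return "normal"


labels = [label(r) for r in ratios]

# Layer with the largest ratio, and the first later layer back to normal.
peak = max(range(len(ratios)), key=ratios.__getitem__)
recovered_at = next(i for i in range(peak + 1, len(labels)) if labels[i] == "normal")

print(labels)
print(peak, recovered_at)  # 3 6
```

A single spike followed by decaying ratios and a return to normal is the shape described above; persistent corruption would instead show ratios that stay elevated in every layer after the spike.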

Per §17.5, SHIP-007 fix discharges 5 MODEL-1 PARTIALs at once
(SHIP-002/005/006/007/008). §26.5 expected coverage flip: 33+12
→ 28+17 when fix lands.

§27 does NOT discharge by itself — it locates the bug for fixing.
Next investigation reads `inference.rs:160-164` and tests 4 hypotheses:
1. Off-by-one slice indexing
2. Buffer aliasing (scratch reuse pattern)
3. F32-vs-Q4K dequant defect at layer-3 input range
4. Activation overflow (SiLU saturation amplifies multiply)

Methodology held throughout: zero eprintln!, zero route-arounds,
apr is canonical (§26.8), all instrumentation via `apr trace
--payload`. Lambda-labs lane pre-authorized.

Evidence persisted to evidence/ship-007-apr-vs-gguf-2026-04-27/:
- apr-trace.txt (13.5 KB)
- gguf-trace.txt (13.7 KB)
- binding-criterion-summary.json

Note: §27 reproduction requires PR #1081 + #1082 + #1083
cascade to merge first (the apr trace --payload <gguf> wiring
is in PR C). Evidence was generated with a local build of PR
#1083 branch.

Spec v2.71.0 → v2.72.0. Coverage flip pending fix.

Spec: SPEC-SHIP-TWO-001 §26.4 P3 verdict
References:
- §15.4 (PR #1062) — GPU GQA eliminated
- §16 (PR #1063) — APR CPU isolated
- §17 (PR #1064) — layer-3 FFN sub-block
- §23 (PR #1075) — layer-3 ffn_swigl named
- §26.8 (PR #1079) — apr-is-canonical methodology rule
- PR #1081 (P3 PR A scaffold)
- PR #1082 (P3 PR B sub-FFN populate)
- PR #1083 (P3 PR C CLI wiring)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>