docs(ship-two-001): §26 + §26.8: three-priority plan + apr-is-canonical stack-tool-extension rule (spec v2.69.0 → v2.71.0) #1079
Merged
§26: Three-priority execution plan with falsifiable binding
criteria. P1 (Stack v2 corpus), P2 (convergence run, depends on P1),
P3 (GGUF forward_traced for SHIP-007 pin). Maximum theoretical
flip: 14 PARTIAL→DISCHARGED.
§26.8 (added 2026-04-27 after triggering incident):
**BINDING METHODOLOGY RULE — `apr` is the canonical stack CLI
post-monorepo. When `apr` lacks a feature, extend `apr` via
contract→code, NEVER route around to non-stack shims like
`huggingface-cli` or to deprecated namespaces like `batuta hf
pull`.**
Triggering incident: P1 sub-agent recommended downloading
codeparrot/github-code-clean via `huggingface-cli download
--include '...'` because `apr pull` is model-only today (no
dataset asset-type, no --include, no --license-allowlist).
This violates three rules:
- feedback_fix_root_cause_never_route_around: missing surface
is a feature gap, fix at root
- feedback_pv_not_bash_for_contracts: re-implementing what a
stack tool should do via non-stack CLI is muda
- feedback_monorepo_single_source_of_truth: `apr` is canonical
post-APR-MONO; `batuta hf pull` is deprecated namespace
Binding rule §26.8.1:
1. Author contracts/apr-cli-<subcommand>-v1.yaml
2. Extend `apr` via in-tree implementation
3. Use the extended `apr`
Acceptable exceptions: one-off `uv run --with` data-prep where
no stack tool covers the niche; one-off xxd forensics. Recurring
workflows (every dataset pull) MUST extend `apr`.
§26.9: P1 prerequisite chain:
  P1.0 Author contracts/apr-cli-pull-dataset-v1.yaml
  P1.1 Implement `apr pull dataset` with --include, --license-allowlist
  P1.2 Drift-prevention test
  P1.3 Update apr-cli-commands-v1.yaml registry per
       feedback_cli_subcommand_three_surface_drift
  P1.4 THEN: `apr pull dataset codeparrot/github-code-clean
       --include '...' --license-allowlist '...' --output ...`
  P1 binding criterion: manifest.json.total_tokens > 1e9 AND vocab_size == 50257
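The closing line of the P1 chain reads as an acceptance check on the downloaded corpus manifest. A minimal sketch of that check follows, assuming `manifest.json` is a flat JSON object with top-level `total_tokens` and `vocab_size` keys; the helper names and the example values are hypothetical and not part of `apr`.

```python
import json
from pathlib import Path

# Thresholds quoted from the P1 line in the commit message.
MIN_TOTAL_TOKENS = 1e9
EXPECTED_VOCAB_SIZE = 50257


def p1_criterion_met(manifest: dict) -> bool:
    """Return True when total_tokens > 1e9 AND vocab_size == 50257."""
    return (
        manifest.get("total_tokens", 0) > MIN_TOTAL_TOKENS
        and manifest.get("vocab_size") == EXPECTED_VOCAB_SIZE
    )


def load_manifest(path: Path) -> dict:
    """Read a manifest file; the layout (flat JSON object) is an assumption."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    # Hypothetical values for illustration only, not measured results.
    example = {"total_tokens": 12_000_000_000, "vocab_size": 50257}
    print(p1_criterion_met(example))
```

Because the comparison is strict, a manifest with exactly 1e9 tokens fails the check, which keeps the criterion falsifiable at the boundary.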
§26.2 corpus target updated: codeparrot/github-code-clean
(directly downloadable; ~12-16B Python tokens after filters)
replaces bigcode/the-stack-v2-dedup (uses Software Heritage IDs,
too complex for session window). Sub-agent corpus survey
ratified.
Adds ~3-6 hours code-authoring before download, but produces a
durable `apr pull dataset` extension benefiting every future
dataset pull, not a one-off shim.
Spec v2.69.0 → v2.70.0 (§26) → v2.71.0 (§26.8). Coverage
unchanged at amendment — §26 is the plan, §26.8 the methodology
clarification.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request on Apr 27, 2026
…irmed APR-side at inference.rs:160-164; spec v2.71.0 → v2.72.0 (#1084)

Live evidence on noah-Lambda-Vector RTX 4090, 2026-04-27. Built apr from the PR #1083 branch (commits 77c016b + c657968 + f249464 from the PR A+B+C cascade). Ran `apr trace --payload` on the canonical 7B teacher in BOTH formats with identical prompt + tokenizer.

Result:

| Layer | APR ffn_swigl std | GGUF ffn_swigl std | Ratio |
|------:|------------------:|-------------------:|------:|
| 3 | 1.2216 | 0.0670 | 18.23x |

§26.4 binding criterion threshold: ≥10x → APR-side bug. **Observed 18.23x, which is 8.23 above the 10x threshold (about 1.8x the threshold): decisive verdict.**

The investigation chain that started in §15.4 (GPU GQA elimination) has reached its conclusion at §27:

§15.4 → §16 → §17 → §23 → §27 (this)
"Whole forward path" → "GPU eliminated" → "(layer=3, FFN sub-block)" → "(layer=3, ffn_swigl)" → "**APR-side at inference.rs:160-164**"

Cascade-damping signature confirmed:
- Layers 0-2: ratio ~1.1x (normal)
- Layer 3: 18.23x (anomaly)
- Layers 4-5: 3.3-4.5x (cascade)
- Layer 6+: ~1x (recovered)

This is consistent with a localized perturbation (off-by-one, buffer aliasing, or F32-vs-Q4K dequant defect at layer 3 specifically) rather than persistent residual-stream corruption.

Per §17.5, the SHIP-007 fix discharges 5 MODEL-1 PARTIALs at once (SHIP-002/005/006/007/008). §26.5 expected coverage flip: 33+12 → 28+17 when the fix lands.

§27 does NOT discharge by itself; it locates the bug for fixing. The next investigation reads `inference.rs:160-164` and tests 4 hypotheses:
1. Off-by-one slice indexing
2. Buffer aliasing (scratch reuse pattern)
3. F32-vs-Q4K dequant defect at layer-3 input range
4. Activation overflow (SiLU saturation amplifies multiply)

Methodology held throughout: zero eprintln!, zero route-arounds, apr is canonical (§26.8), all instrumentation via `apr trace --payload`. Lambda-labs lane pre-authorized.
Evidence persisted to evidence/ship-007-apr-vs-gguf-2026-04-27/:
- apr-trace.txt (13.5 KB)
- gguf-trace.txt (13.7 KB)
- binding-criterion-summary.json

Note: §27 reproduction requires the PR #1081 + #1082 + #1083 cascade to merge first (the `apr trace --payload <gguf>` wiring is in PR C). Evidence was generated with a local build of the PR #1083 branch.

Spec v2.71.0 → v2.72.0. Coverage flip pending fix.

Spec: SPEC-SHIP-TWO-001 §26.4 P3 verdict

References:
- §15.4 (PR #1062): GPU GQA eliminated
- §16 (PR #1063): APR CPU isolated
- §17 (PR #1064): layer-3 FFN sub-block
- §23 (PR #1075): layer-3 ffn_swigl named
- §26.8 (PR #1079): apr-is-canonical methodology rule
- PR #1081 (P3 PR A scaffold)
- PR #1082 (P3 PR B sub-FFN populate)
- PR #1083 (P3 PR C CLI wiring)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
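The §26.4 verdict in the commit message above reduces to a ratio of two standard deviations compared against a fixed threshold. The sketch below reproduces that arithmetic for the layer-3 row; the function names are hypothetical, and the real check presumably lives in the `apr trace` tooling rather than in a script like this.

```python
# §26.4 threshold as quoted in the commit message: ratio >= 10x means APR-side bug.
APR_SIDE_THRESHOLD = 10.0


def std_ratio(apr_std: float, gguf_std: float) -> float:
    """Ratio of APR to GGUF activation std for one layer's ffn_swigl output."""
    if gguf_std <= 0:
        raise ValueError("GGUF std must be positive")
    return apr_std / gguf_std


def is_apr_side_bug(apr_std: float, gguf_std: float,
                    threshold: float = APR_SIDE_THRESHOLD) -> bool:
    """Apply the binding criterion: ratio at or above the threshold."""
    return std_ratio(apr_std, gguf_std) >= threshold


if __name__ == "__main__":
    # Layer-3 values taken from the result table in the commit message.
    ratio = std_ratio(1.2216, 0.0670)
    print(f"{ratio:.2f}x", is_apr_side_bug(1.2216, 0.0670))  # prints: 18.23x True
```

A layer in the ~1.1x "normal" range falls well below the threshold, so the same function separates the layer-3 anomaly from the layers on either side of it.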
Summary
Two-section spec amendment: §26 codifies the three-priority next-session execution plan (P1/P2/P3 with falsifiable binding criteria), and §26.8 codifies a binding methodology rule discovered mid-session:
§26.8 triggering incident (2026-04-27)
P1 sub-agent recommended `huggingface-cli download --include 'data/train-000[0-7][0-9]-of-00880.parquet'` because `batuta hf pull` lacks `--include`. This is muda: `huggingface-cli` is non-stack, `batuta hf pull` is stack-canonical. Reaching for a non-stack CLI to bypass a missing flag violates `feedback_fix_root_cause_never_route_around.md` and `feedback_pv_not_bash_for_contracts.md`.
§26.8.1 Binding rule
When a stack CLI (batuta, apr, pv, …) lacks a feature:
`contracts/<tool>-cli-<flag>-v1.yaml` provable contract
Acceptable narrow exceptions: one-off `uv run --with` data prep, one-off `xxd` forensics. Recurring workflows MUST extend the stack tool.
§26 priority matrix
P1 + P3 parallel; P2 gated on P1.
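The scheduling constraint stated above (P1 and P3 in parallel, P2 gated on P1) can be written down as a small dependency graph. This is an illustrative sketch using Python's standard `graphlib`; the `DEPENDS_ON` mapping and the `execution_waves` helper are hypothetical names, and the spec does not say the plan is encoded this way.

```python
from graphlib import TopologicalSorter

# Each priority maps to the set of priorities it depends on.
DEPENDS_ON = {"P1": set(), "P2": {"P1"}, "P3": set()}


def execution_waves(depends_on: dict) -> list:
    """Group work items into waves; items within a wave can run in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything unblocked right now
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unblock dependents
    return waves


if __name__ == "__main__":
    print(execution_waves(DEPENDS_ON))  # prints: [['P1', 'P3'], ['P2']]
```

The first wave contains P1 and P3 together, and P2 only appears once P1 is marked done, matching the gating described in the summary.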
§26.9 Revised P1 chain (per §26.8 rule)
P1 now has a prerequisite:
Adds ~3-6 hours code-authoring before download, but produces a durable batuta improvement.
Coverage tally evolution (unchanged)
Test plan
🤖 Generated with Claude Code