feat(pretrain): add wall_ms to StepMetrics — Residual B per spec §19.4 by noahgift · Pull Request #1069 · paiml/aprender

noahgift · 2026-04-26T08:49:47Z

Summary

Adds wall_ms: f32 to StepMetrics, the 7th required field per contracts/training-loop-pretrain-v1.yaml v1.4.0 → v1.5.0 (additive minor bump).
Discharges the code prerequisite for §19.4 Residual B — promoting GATE-GPUTRAIN-004 (per-step latency budget < 500ms on RTX 4090 / 370M) to ACTIVE_WITH_LIVE_EVIDENCE.
Live evidence dispatch (50-step cuda:0 run with wall_ms persisted) is the operator follow-up.

What this PR delivers

Contract bump: training-loop-pretrain-v1.yaml v1.4.0 → v1.5.0 with wall_ms in per_step_metrics.required + consistency invariant note (tokens_per_sec * (wall_ms / 1000.0) ≈ batch_tokens).
Struct field: StepMetrics.wall_ms: f32 with #[serde(default)] for backward compat on older JSONL.
Producer wired: PretrainLoop::train_step populates wall_ms from the same t0.elapsed() span as tokens_per_sec — single-source derivation prevents independent drift.
Validation: validate_finite() rejects non-finite or negative wall_ms.
3 new tests covering negative/NaN rejection + consistency invariant.
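
The validation rule and the consistency invariant listed above can be sketched as follows. This is a minimal illustration, not the code in `pretrain.rs`: `StepMetricsSketch`, `MetricsError`, `is_consistent`, and the relative tolerance are hypothetical names and choices, and the struct carries only two of the seven contract fields.

```rust
/// Hypothetical, reduced stand-in for `StepMetrics`; only the two
/// timing-related fields are modelled here.
#[derive(Debug, Clone, PartialEq)]
pub struct StepMetricsSketch {
    pub tokens_per_sec: f32,
    pub wall_ms: f32,
}

/// Hypothetical error type; the real crate's error type is not shown in the PR.
#[derive(Debug, PartialEq)]
pub enum MetricsError {
    NonFinite(&'static str),
    Negative(&'static str),
}

impl StepMetricsSketch {
    /// Reject NaN, infinite, or negative `wall_ms`, mirroring the rule
    /// described for `validate_finite()`.
    pub fn validate_finite(&self) -> Result<(), MetricsError> {
        if !self.wall_ms.is_finite() {
            return Err(MetricsError::NonFinite("wall_ms"));
        }
        if self.wall_ms < 0.0 {
            return Err(MetricsError::Negative("wall_ms"));
        }
        if !self.tokens_per_sec.is_finite() {
            return Err(MetricsError::NonFinite("tokens_per_sec"));
        }
        Ok(())
    }
}

/// Check tokens_per_sec * (wall_ms / 1000.0) ≈ batch_tokens within a
/// relative tolerance (the tolerance is an assumption of this sketch).
pub fn is_consistent(m: &StepMetricsSketch, batch_tokens: f32, rel_tol: f32) -> bool {
    let implied_tokens = m.tokens_per_sec * (m.wall_ms / 1000.0);
    (implied_tokens - batch_tokens).abs() <= rel_tol * batch_tokens.abs()
}
```

For example, a step reporting `tokens_per_sec = 8192.0` and `wall_ms = 250.0` implies 2048 tokens, so it is consistent with a 2048-token batch and inconsistent with a 4096-token one.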

Test plan

pv validate contracts/training-loop-pretrain-v1.yaml: 0 errors, 0 warnings
cargo test -p aprender-train --release --lib pretrain::tests::: 25 passed (was 22, +3 new)
cargo build --workspace --release: succeeds — no downstream consumer broke
PMAT pre-commit gates pass

What this does NOT do

Does NOT capture live evidence on a cuda:0 dispatch — that's the operator-action step in Residual B.
Does NOT promote GATE-GPUTRAIN-004 to ACTIVE_WITH_LIVE_EVIDENCE — needs the live dispatch + persisted evidence file at evidence/task-132/gputrain-004-live-2026-04-XX.{json,csv}.

References

§19.4 of docs/specifications/aprender-train/ship-two-models-spec.md
Plan-agent investigation 2026-04-26 (sub-agent narrowed task #132 residuals)
memory/project_task_132_cuda_training_backend_gap.md (updated description)

Closes task #156.

🤖 Generated with Claude Code

Per `contracts/training-loop-pretrain-v1.yaml` v1.4.0 → v1.5.0 (additive minor): adds the 7th required field `wall_ms: f32` to per-step JSONL emission. Discharges §19.4 Residual B prerequisite for promoting GATE-GPUTRAIN-004 (per-step latency budget < 500ms on RTX 4090 / 370M) to ACTIVE_WITH_LIVE_EVIDENCE.

## What changed

- `contracts/training-loop-pretrain-v1.yaml` v1.4.0 → v1.5.0:
  - per_step_metrics.required adds `wall_ms` (f32)
  - Includes consistency invariant note: tokens_per_sec * (wall_ms / 1000.0) ≈ batch_tokens
- `StepMetrics` (pretrain.rs:104-118) gains `wall_ms: f32` field
  - `#[serde(default)]` to keep older JSONL parseable on read
  - Doc-comment cites the contract version
- `PretrainLoop::train_step` (pretrain.rs:565-586) populates wall_ms from the same `t0.elapsed()` span as tokens_per_sec; single-source derivation prevents independent drift
- `validate_finite()` rejects non-finite or negative wall_ms
- 3 new unit tests:
  - `step_metrics_rejects_negative_wall_ms`
  - `step_metrics_rejects_nan_wall_ms`
  - `step_metrics_wall_ms_consistent_with_tokens_per_sec`

## Backward compat

`#[serde(default)]` on the new field means JSONL emitted by older binaries (without wall_ms) still deserializes: wall_ms defaults to 0.0 when absent. Newly emitted JSONL always has the field set.

## Test plan

- [x] `pv validate` on contract: 0 errors, 0 warnings
- [x] `cargo test -p aprender-train --release --lib pretrain::tests::`: 25 passed (was 22)
- [x] `cargo build --workspace --release` succeeds
- [x] No downstream consumer of StepMetrics broke

## What this does NOT do

- Does NOT capture live evidence on a cuda:0 dispatch (Residual B step 2: operator action, scoped to a follow-up).
- Does NOT promote GATE-GPUTRAIN-004 to ACTIVE_WITH_LIVE_EVIDENCE (that requires the live dispatch + persisted evidence file).

This PR closes the *code* prerequisite for GATE-GPUTRAIN-004 discharge. The next operator dispatch can persist `wall_ms` per-step and use it as the GATE-004 verdict input.

Closes task #156.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
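
The single-source derivation described in the commit message can be sketched like this: both values are computed from one `Duration`, so they cannot drift apart. This is only an illustration under assumptions; `derive_timing` and `timed_step` are hypothetical helpers, and the zero-elapsed fallback is a choice made for this sketch rather than something the PR states about `PretrainLoop::train_step`.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: derive `(wall_ms, tokens_per_sec)` from a single
/// elapsed span so the two metrics always describe the same interval.
pub fn derive_timing(elapsed: Duration, batch_tokens: usize) -> (f32, f32) {
    let secs = elapsed.as_secs_f32();
    let wall_ms = secs * 1000.0;
    // Guard against a zero-length span to avoid dividing by zero
    // (an assumption of this sketch).
    let tokens_per_sec = if secs > 0.0 {
        batch_tokens as f32 / secs
    } else {
        0.0
    };
    (wall_ms, tokens_per_sec)
}

/// Hypothetical wrapper: time one step with a single `Instant` and derive
/// both metrics from the same `t0.elapsed()` reading.
pub fn timed_step<F: FnOnce()>(batch_tokens: usize, step: F) -> (f32, f32) {
    let t0 = Instant::now();
    step();
    derive_timing(t0.elapsed(), batch_tokens)
}
```

With a 250 ms span and a 2048-token batch, `derive_timing` yields `wall_ms = 250.0` and `tokens_per_sec = 8192.0`, which satisfies tokens_per_sec * (wall_ms / 1000.0) ≈ batch_tokens by construction.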

…2.64.0 → v2.65.0

§19 verified `apr pretrain --device cuda` is wired but the canonical apr binary lacked `--features cuda`. §20 records the next step: **rebuild + live dispatch + evidence capture** on RTX 4090.

## What §20 contains (9 subsections)

1. §20.1: Rebuild (40s incremental, `--features cuda` enabled apr-cli)
2. §20.2: Live dispatch command + 100-step JSONL output
3. §20.3: wall_ms statistics: median=264.74ms (47% headroom under GATE-GPUTRAIN-004's 500ms budget)
4. §20.4: nvidia-smi PID 1658504 / 6636 MiB GPU memory captured mid-run
5. §20.5: Gate-by-gate impact table (GATE-GPUTRAIN-002/003/004/005)
6. §20.6: Evidence files at evidence/task-132-residual-b/
7. §20.7: Long-path status: §19.5 step (a) DONE
8. §20.8: What §20 is NOT (contract bump is follow-up PR)
9. §20.9: Methodological alignment (live-evidence pattern, not chain-of-thought)

## Live evidence captured

- 100 real CUDA training steps on noah-Lambda-Vector RTX 4090
- Real corpus: /mnt/nvme-raid0/data/csn-python-shards
- Real tokenizer: /mnt/nvme-raid0/models/model-2-tokenizer-v1 (vocab=50,257)
- wall_ms median: 264.74 ms (range 257.86–467.66 with step 0 = 467.66 kernel-warmup outlier)
- train_loss step 0=11.02 → step 99=10.50 (Δ=−0.52, decreasing)
- val_loss=10.31 triggered GATE-TRAIN-005 ship-blocker abort at epoch boundary (correct behavior for fresh-init 370M before convergence)
- nvidia-smi PID 1658504 / 6636 MiB stable mid-run

## Spec progression

v2.64.0 → v2.65.0. Coverage tally update is **pending** the contract bump for `gpu-training-backend-v1.yaml` GATE-GPUTRAIN-004 PARTIAL_ALGORITHM_LEVEL → ACTIVE_WITH_LIVE_EVIDENCE (separate follow-up PR; §20 records the data, the contract amendment captures the durable verdict).

## Stacks under

- #1068 (§19: task #132 correction)
- #1067 (§18: training status snapshot)
- Concrete progress on §19.4 Residual B (live evidence half)
- Pairs with PR #1069 (wall_ms code half, which provided the JSONL field used for the GATE-GPUTRAIN-004 timing data)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
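
The headroom figure quoted above follows from simple arithmetic on the persisted `wall_ms` values: (500 − 264.74) / 500 ≈ 0.47. A minimal sketch of that calculation is below; `median` and `headroom_fraction` are hypothetical helpers, not functions from the aprender codebase, and the sketch does not assume which statistic the gate itself evaluates.

```rust
/// Hypothetical helper: median of a slice of per-step `wall_ms` values.
/// Returns `None` for an empty slice.
pub fn median(values: &[f32]) -> Option<f32> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    // `total_cmp` gives a total order over f32, so sorting cannot panic on NaN.
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    })
}

/// Hypothetical helper: fraction of the latency budget left unused.
pub fn headroom_fraction(observed_ms: f32, budget_ms: f32) -> f32 {
    (budget_ms - observed_ms) / budget_ms
}
```

Calling `headroom_fraction(264.74, 500.0)` gives roughly 0.47, matching the 47% headroom reported in §20.3.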


noahgift enabled auto-merge (squash) April 26, 2026 08:49

This was referenced Apr 26, 2026

docs(ship-two-001): §20 live CUDA training dispatch evidence — spec v2.65.0 #1070

Merged

contract(gpu-training-backend-v1): GATE-GPUTRAIN-004 verdict pending → pass (v1.4 → v1.5) #1071

Merged

noahgift merged commit 9d9a390 into main Apr 26, 2026
11 checks passed

noahgift deleted the feat/wall-ms-per-step-residual-b branch April 26, 2026 09:47

noahgift merged 1 commit into main from feat/wall-ms-per-step-residual-b
