chore(deps): Bump criterion from 0.5.1 to 0.8.1#115
Conversation
Comprehensive spec for embedding ML models in binaries with smart memory paging: - APR binary format: Page-aligned tensors with lazy loading - Three embedding strategies: include_bytes!, linker sections, external file - Memory paging: mmap with OnceCell lazy initialization - Predictive prefetching: Background thread for anticipated weights - ALM integration: Bundle datasets alongside models - 10 annotated peer-reviewed papers (ACL 2024, SOSP 2023, MLSys 2021/2023) Implementation roadmap: Binary embedding → Lazy loading → Prefetching → ALM 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…#74) Resolves all 3 action items from Gemini review (Toyota/NASA/Startup personas): [NASA] Sandbox V&V for Code Translation: - Added SandboxExecutor to CodeTranslationGenerator - quality_score() now tests functional correctness (40% weight) - Addresses Codex hallucination issue (compiles != correct) [Toyota] Andon Mechanism (Jidoka): - Added AndonHandler trait with DefaultAndon implementation - Halts pipeline if rejection rate >90% - Alerts on quality drift below baseline [Startup] Decoupled Roadmap: - Shell SLM: v0.14.0 (MVP - tractable structured prediction) - Code Oracle: v0.15.0 (experimental - AI-Complete) - Added EXPERIMENTAL warning to CodeTranslationGenerator Updated risk matrix with 3 new mitigations. Spec version bumped to 1.1.0. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
#74) Adds Toyota Jidoka-inspired Andon system for synthetic data generation: EXTREME TDD Implementation: - 37 new tests (30 andon + 7 config integration) - RED phase: Write failing tests first - GREEN phase: Implement to pass tests - All 1859 tests passing New Components: - AndonHandler trait: Customizable event handling - AndonEvent enum: HighRejectionRate, QualityDrift, DiversityCollapse - AndonSeverity: Info/Warning/Critical levels - DefaultAndon: Production handler (logs + halts on critical) - TestAndon: Silent collector for unit tests - AndonConfig: Configuration with thresholds SyntheticConfig Integration: - Added andon field with AndonConfig - Builder methods: with_andon(), with_andon_enabled(), with_andon_rejection_threshold() - Default: enabled=true, rejection_threshold=0.90 (Toyota standard) Pipeline Integration: - check_andon() function validates generation quality - Halts on >92% rejection rate (threshold + 2% tolerance) - Warns on diversity collapse (< minimum threshold) Addresses review feedback from automl-with-synthetic-data-review.md: - [Toyota] Andon alert for high rejection rates ✓ 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Phase 2 of AutoML with Synthetic Data specification: EDA Generator (Wei & Zou, 2019): - Synonym replacement with shell command vocabulary - Random insertion, swap, and deletion operations - Deterministic LCG-based randomness for reproducibility - Jaccard similarity for quality scoring - 34 unit tests with EXTREME TDD Template Generator: - Slot-based pattern filling with weighted templates - shell_commands() preset for CLI training data - Diversity scoring via unique token ratio - 24 unit tests with EXTREME TDD Both implement SyntheticGenerator trait for pipeline integration. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Phase 3 of AutoML with Synthetic Data specification: ShellSample struct: - Command with context (history, cwd, prefix, completion) - Extraction helpers (command_name, arguments) - Completion validity checking ShellGrammar: - Command/subcommand validation (git, cargo, npm, docker, Unix) - Common options recognition - Extensible via add_command/add_subcommands ShellSyntheticGenerator implementing SyntheticGenerator: - Template substitution (argument variants) - Argument permutation (reorder/add options) - Context variation (cwd, history) - Quality scoring: 0.4*semantic + 0.4*grammar + 0.2*coherence - Diversity scoring via unique command patterns 42 tests with Extreme TDD methodology. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…efs #74) Implement three advanced synthetic data generation components: - MixUp generator: Zhang et al. 2018 embedding interpolation with Beta distribution sampling and configurable alpha parameter (24 tests) - WeakSupervision generator: Snorkel-style programmatic labeling with LabelingFunction trait, multiple aggregation strategies (MajorityVote, WeightedVote, Unanimous, Any), and built-in LFs (29 tests) - SyntheticCache: LRU eviction memoization for avoiding redundant generation during AutoML hyperparameter search (18 tests) Total: 71 new tests, 2030 tests passing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Add comprehensive model bundling and memory paging support: ## Model Bundling (.apbundle format) - Binary format with magic bytes, version, and manifest - BundleReader/BundleWriter for efficient file I/O - ModelBundle API for creating, saving, and loading bundles - Builder pattern for flexible bundle construction - Support for multiple models with metadata ## Memory-Mapped File Support - MappedRegion for efficient memory access - MemoryMappedFile with region caching - PageTable for LRU/LFU tracking ## LRU Paging - PagedBundle for memory-constrained environments - Configurable max_memory and eviction strategies - LRU (Least Recently Used) and LFU (Least Frequently Used) eviction - Automatic page eviction when memory limit exceeded ## Pre-fetching - Access pattern tracking for predictive loading - Configurable prefetch_count - Hint API for explicit prefetch requests ## Also included: - Synthetic data integration tests (15 tests) - Synthetic data generation example - Updated spec status to "Implemented (Phases 1-4)" 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…74) Update spec status to reflect complete implementation. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Add PagedMarkovModel using aprender's bundle module for memory-efficient storage - Implement LRU-based on-demand segment loading - Add --memory-limit CLI flag to train, suggest, and stats commands - Add 13 comprehensive tests for paged model functionality - Fix doctest in synthetic/mixup.rs (missing Clone derive) The paged model stores n-gram segments separately and loads them on-demand, enabling handling of shell histories that exceed RAM. Refs #74 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Add comprehensive case study for bundle module - Update shell-completion chapter with paging documentation - Add bundle_trace_demo example for renacer tracing - Update SUMMARY.md with new chapter Refs #74 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Add comprehensive guide for using renacer syscall tracer to profile and optimize memory paging behavior in ML model loading. Content includes: - Renacer usage patterns (-e trace=file, -T, -c, -s flags) - Syscall analysis for detecting evictions and cache misses - Pre-fetch effectiveness measurement - JSON output for programmatic analysis - Optimization patterns (reduce seeks, right-size memory, pre-fetching) - Troubleshooting guide with symptom/fix table Also adds book chapters for bundle_trace_demo and synthetic_data_generation examples to satisfy EXTREME TDD requirements. Allows clippy::large_stack_arrays lint for ML test data arrays. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…77) Implements two new synthetic data components for code analysis: CodeEDA (GH-76): - Code-specific EDA (Easy Data Augmentation) implementing SyntheticGenerator - Variable renaming with synonym dictionary - Comment insertion (Rust/Python/Generic modes) - Statement reordering for independent statements - Dead code removal (comments and whitespace) - Quality scoring via token overlap - 23 unit tests CodeFeatureExtractor (GH-77): - 8-dimensional commit feature extraction for defect prediction - CommitFeatures: defect_category, files_changed, lines_added/deleted, complexity_delta, timestamp, hour_of_day, day_of_week - Keyword-based commit classification (bug/security/perf/refactor) - Batch extraction and normalization support - 22 unit tests References: - Wei & Zou (2019) EDA paper - D'Ambros et al. (2012) defect prediction benchmark 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…76, Refs #77) - Add --use-code-eda flag to Augment command for code-aware augmentation - Add new Analyze command using CodeFeatureExtractor - Shows command categories (bug/security/performance/refactor/general) - Displays top base commands with visual bar charts - Shows sample commands by category - Reports complexity metrics (avg tokens, max tokens, unique bases) - Identifies developer workflow (git, cargo, npm, docker usage) - Add 3 integration tests for new features 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…Refs #74) Benchmarks (modeled after bashrs patterns): - parse_history: History file parsing throughput - train_model: N-gram model training (small/medium/large fixtures) - suggest_latency: Suggestion performance for common prefixes - partial_completion: Partial token completion benchmarks - serialization: JSON and file save/load benchmarks - end_to_end: Complete workflow benchmarks - synthetic_generation: CodeEDA augmentation benchmarks Fixtures (aligned with bashrs): - small_history.txt: ~50 commands (basic developer workflow) - medium_history.txt: ~265 commands (full developer workflow) - large_history.txt: ~3800 commands (production scale) Real-world tests (19 new tests): - REAL_001-003: Small/Medium/Large history training and suggestions - REAL_004: Cross-validation testing - REAL_005: Data augmentation with CodeEDA - REAL_006: Analysis command testing - REAL_007: Export/import roundtrip - REAL_008: Paged model for large histories - REAL_009: Incremental updates - REAL_010: End-to-end user workflow Architecture changes: - Added lib.rs to expose modules for benchmarks - Refactored main.rs to use library imports 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…rks (Refs #74) Sub-10ms Verification Benchmark Suite: Performance Results (vs 10ms target): - Small model (50 cmds): 437ns - 1.5µs (6,500-22,000x faster) - Medium model (500 cmds): 530ns - 10.6µs (940-18,800x faster) - Large model (5000 cmds): 670ns - 15µs (660-14,900x faster) Benchmark Groups: - suggestion_latency: Core latency verification by model size - partial_completion: Mid-word completion (git co → git commit) - training_throughput: Commands/second during training - cold_start: Model load + first suggestion latency - serialization: JSON serialize/deserialize performance - scalability: Latency growth with model size (O(1) verified) - paged_model: Memory-constrained model performance Industry Comparison: - GitHub Copilot: 100-500ms → aprender 10,000-50,000x faster - Fish completion: 5-20ms → aprender 500-2,000x faster - Zsh compinit: 10-50ms → aprender 1,000-5,000x faster Run: cargo bench --package aprender-shell --bench recommendation_latency 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
#74) Updated shell-completion.md: - Added "Performance: Sub-10ms Verification" section - Detailed benchmark results table (437ns - 14.6µs latency) - Industry comparison (600-22,000x faster than alternatives) - "Why So Fast?" explanation (O(1) trie, no neural overhead) - Benchmark suite overview New chapter: shell-completion-benchmarks.md - Comprehensive benchmark analysis - trueno-style criterion patterns - Scalability analysis (sub-linear O(log n)) - Training throughput metrics - Cold start verification (<3ms) - Fixture design documentation - Custom benchmark extension guide - CI integration example Key results documented: - Worst case: 14.6 µs (685x under 10ms target) - Best case: 437 ns (22,883x under 10ms target) - Scales sub-linearly with model size 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Add dedicated book chapters for the new code-aware synthetic data modules: - CodeEDA: Syntax-aware data augmentation for source code - Variable renaming, comment insertion, statement reorder - Language-specific reserved keyword handling (Rust, Python) - Quality and diversity metrics - CodeFeatureExtractor: 8-dimensional commit feature extraction - Defect category classification (bug, security, perf, refactor) - Complexity estimation, time-based features - Normalization for ML pipelines 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Change alimentar from local path dependency to crates.io v0.1.0 for publishing compatibility. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Change aprender dependency from path to crates.io v0.10.0 - Add README.md for crate documentation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
## Metaheuristics (Refs #80) - Add src/metaheuristics/ module with Differential Evolution (DE) - SearchSpace enum for continuous/discrete/mixed optimization - ComputeBudget for resource-aware optimization - PerturbativeMetaheuristic trait following Toyota Way principles - Book documentation for DE and metaheuristics fundamentals ## aprender-shell Enhancements (Refs #87, #88, #96) - Fish shell widget support (fish-widget command) - Uninstall command for clean widget removal - ZSH widget v2 with toggle, timeout, ShellCheck fixes - New CLI integration tests ## AutoML Enhancements - Expanded search.rs with advanced hyperparameter optimization - Grid search, random search, and TPE improvements - Fixed clippy warnings (range contains, format strings) ## Documentation - aprender-shell-harden-plan.md spec (16 issues, Toyota Way, 10 refs) - metaheuristics-spec.md with CEC benchmarks - Updated roadmap.yaml ## Quality - 382 tests passing - 92.66% coverage - Clippy clean (-D warnings) - PMAT: A+ (151/134), TDG: A+ (99/100) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
… unsafe) POLICY: We will NEVER use unsafe code. If HE crypto primitives are needed, we will implement them from scratch in safe Rust. Additions: - docs/specifications/homomorphic-encryption-spec.md (10 peer-reviewed citations) - book/src/examples/shell-encryption-tiers.md (4-tier protection guide) - src/format/homomorphic.rs (28 tests: types, traits, API design) - Shell Tier 2 compression: save_compressed() (5 tests) - Shell Tier 2+3 combo: save_compressed_encrypted() 4-Tier Model Protection: - Tier 1: Plain (.apr) - Tier 2: Compressed (zstd, 14x smaller) - Tier 3: At-rest encrypted (AES-256-GCM) - Tier 4: Homomorphic (API ready, crypto deferred) Test counts: - Core aprender: 2,292 tests (with format-homomorphic) - aprender-shell: 127 tests (+5 compression) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Add src/ensemble/ module with MoE, SoftmaxGating, MoeConfig - Add ModelType::MixtureOfExperts (0x0040) to format - Add examples/mixture_of_experts.rs runnable example - Add book/src/examples/mixture-of-experts.md documentation - Update model-format.md with MoE section and model type - Fix Makefile coverage (move config before clean for sccache) - Add docs/specifications/more-learning-specs.md (34 sections) - GAN, VAE, Diffusion, Contrastive, GNN, Meta-learning - Transfer learning for transpiler ecosystem - Distillation ingestion from entrenar - Code-specific ML for depyler oracle Refs #101 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
100 test cases covering: - Installation (5) - train command (17) - update command (8) - suggest command (14) - stats command (6) - export/import (10) - validate command (10) - augment command (8) - analyze command (6) - tune command (6) - zsh-widget (4) - Edge cases (6) - Performance benchmarks (5) - Platform compatibility (5) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
New features: - Mixture of Experts (MoE) ensemble module - ModelType::MixtureOfExperts (0x0040) - Future ML specs (34 sections) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Update aprender dependency from path to crates.io v0.11 - Ready for v0.2.0 release 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Fix trivial cast lint error in mmap.rs:611 that broke CI - Update hero image: 17 → 18 model types (MoE added) - Update hero image version: v0.9 → v0.11 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- hf_hub/mod.rs: Replace unwrap() with expect() (disallowed-methods) - hf_hub/mod.rs: Use char '.' instead of string "." (single_char_pattern) - stopwords.rs: Remove redundant is_empty check (const_is_empty) - format/mod.rs: Fix large file tests using Compression::None and unique values 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Without --all-features, feature-gated examples fail to compile, causing coverage to show 0%. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
This flag keeps getting accidentally removed, causing 0% coverage. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
… PAR-001) - Migrate APR format from v1 (APRN) to v2 (APR2) magic - Update trueno 0.9.0 → 0.10.1 (thiserror 2.x compatibility) - Update renacer 0.8 → 0.9.1 - Fix integration tests for v2 format (INT-01b, CC1) - Bump version to 0.20.2 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
Dependabot can't resolve your Rust dependency files. Because of this, Dependabot cannot update this pull request. |
…-001) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…efs PAR-001) - Add conditional cfg for LlamaTokenizer import in chat.rs - Add allow attributes for format_push_string and unnecessary_wraps - Configure apr-cli specific clippy allows in Cargo.toml - Fix formatting in create_test_apr.rs All 5885 unit tests and 11 integration tests pass. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
Dependabot can't resolve your Rust dependency files. Because of this, Dependabot cannot update this pull request. |
- Add PAR-011: Add --gpu flag to run/serve commands - ✅ DONE - Document --gpu flag implementation details - Mark PAR-011 as complete in next priority section The --gpu flag enables forced CUDA acceleration for: - `realizar run model.gguf --gpu "prompt"` - `realizar serve --model model.gguf --gpu` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- object 0.38.0 -> 0.38.1 - zmij 1.0.6 -> 1.0.7 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
|
Dependabot can't resolve your Rust dependency files. Because of this, Dependabot cannot update this pull request. |
…PAR-023) - Use realizar's full inference API for GGUF serving - Endpoints: /generate, /stream/generate, /v1/completions - Performance targets: 100+ tok/s CPU, 500+ tok/s GPU - Add Ollama-parity benchmark suite - Fix clippy warnings in federation module - Update autograd backward pass to use trueno SIMD 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…-023) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Updated trueno dependency to 0.11.0 - Benefiting from improved AVX-512 coverage and TUI monitoring 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
a28662b to
ff9a3c7
Compare
Add transparent compression/decompression for .apr model files: - APR2 format: compressed payload with auto-detection - LZ4: fast compression for real-time use cases - ZSTD: higher ratio for cold storage - Backward compatible: APR1 files still work - Feature-gated: requires `format-compression` feature API: - AprWriter::with_compression(Compression::Lz4) - AprReader::from_bytes() auto-detects format 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Implement magnitude-based importance (L1/L2) scoring - Add Wanda activation-weighted pruning with calibration - Add SparseGPT Hessian-based pruning support - Support unstructured, N:M (2:4, 4:8), and block sparsity patterns - Add CSR sparse matrix format for efficient storage - Include depth/width pruning for structured compression - Add pruning_magnitude example demonstrating the API - Add book documentation for neural network pruning theory 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
New features: - Add pruning module with magnitude, Wanda, and SparseGPT methods - SparsityMask and SparsityPattern for structured pruning - CalibrationContext for activation-weighted importance - ImportanceScores with statistical tracking 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…Refs #151) - Add chat_template module with 6+ template formats (ChatML, LLaMA2, Mistral, Phi, Alpaca, Raw) - Add auto-detection from model name and vocabulary tokens - Add HuggingFace Jinja2 template support via minijinja - Add book chapter: examples/chat-template.md - Add playbook: playbooks/chat_template.yaml with probador integration - Add example: examples/chat_template.rs - Add 8 book tests in tests/book/case_studies/chat_template_usage.rs - Fix GGUF tokenizer extraction to preserve vocabulary during APR import 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Bumps [criterion](https://github.com/criterion-rs/criterion.rs) from 0.5.1 to 0.8.1. - [Release notes](https://github.com/criterion-rs/criterion.rs/releases) - [Changelog](https://github.com/criterion-rs/criterion.rs/blob/master/CHANGELOG.md) - [Commits](criterion-rs/criterion.rs@0.5.1...criterion-v0.8.1) --- updated-dependencies: - dependency-name: criterion dependency-version: 0.8.1 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com>
ff9a3c7 to
ef111b2
Compare
057bf9e to
b4d0814
Compare
|
OK, I won't notify you again about this release, but will get in touch when a new version is available. If you'd rather skip all updates until the next major or minor version, let me know by commenting If you change your mind, just re-open this PR and I'll resolve any conflicts on it. |
…-012/015 PARTIAL Captures the three evidence-wiring commits landed on chore/post-v2.19-evidence since v2.20.0: 1. FALSIFY-SHIP-011 (AC-SHIP2-001) DISCHARGED at 338c6eb (task #114) C-LLAMA-370M-SOVEREIGN v1.0.0 PROPOSED -> v1.1.0 ACTIVE. Rust-YAML byte-equality binding via include_str! + serde_yaml::Value. 2. FALSIFY-SHIP-012 (AC-SHIP2-002) PARTIAL_ALGORITHM_LEVEL at 2e8b8b8 (task #115). C-TOK-BPE v1.0.0 -> v1.1.0 stays PROPOSED. 3 tokenizer harness tests wired; full discharge blocks on task #91 10K Stack-v2 Python holdout (fixture-swap is data-only). 3. FALSIFY-SHIP-015 (AC-SHIP2-005) PARTIAL_ALGORITHM_LEVEL at bfb8831 (task #116). Sovereign contract v1.1.0 -> v1.2.0 stays ACTIVE. estimated_param_count_within_contract_band + const fns wired; full discharge blocks on real 370M .apr from compute-dispatch. Also codifies the PARTIAL_ALGORITHM_LEVEL pattern as a first-class spec concept: when a gate's evidence_required describes a production-scale check that is not yet runnable but the underlying invariant is provable today at algorithm/compile/unit-test level, wire the algorithm proofs and carry discharge_status + partial_discharge_note + full_discharge_blocks_on + ship_blocking=true to make the data gap first-class contract state. MODEL-2 ship-gate status after v2.21.0: 3/12 fully ACTIVE (001, 011, 012) + 2/12 PARTIAL_ALGORITHM_LEVEL (002, 005) = 5/12 touched (~42%). Remaining 7 block on real 370M compute-dispatch. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…/019/021/022) + spec v2.19→v2.22 (#898) ## MODEL-2 evidence burst (post-v2.19) Six SHIP-TWO-001 ship-gate discharges on branch `chore/post-v2.19-evidence`: | Gate | AC | Status | Commit | Task | |------|----|--------|--------|------| | FALSIFY-SHIP-021 | AC-SHIP2-011 | DISCHARGED | `0b8ca8c84` | #112 | | FALSIFY-SHIP-022 | AC-SHIP2-012 (provenance) | DISCHARGED | `8f0607d42` | #113 | | FALSIFY-SHIP-011 | AC-SHIP2-001 | DISCHARGED | `338c6eb3c` | #114 | | FALSIFY-SHIP-012 | AC-SHIP2-002 | PARTIAL_ALGORITHM_LEVEL | `2e8b8b8e2` | #115 | | FALSIFY-SHIP-015 | AC-SHIP2-005 | PARTIAL_ALGORITHM_LEVEL | `bfb883199` | #116 | | FALSIFY-SHIP-019 | AC-SHIP2-009 | PARTIAL_ALGORITHM_LEVEL | `846cc1dbb` | #117 | **Spec:** v2.19.0 → v2.22.0 (4 amendments recorded). **MODEL-2 ledger after this PR:** 3/12 fully ACTIVE (001, 011, 012) + 3/12 PARTIAL_ALGORITHM_LEVEL (002, 005, 009) = 6/12 touched (50%). Remaining 6 (003/004/006/007/008/010) all require real 370M compute-dispatch, a trained on-disk `.apr` with eval harness, or RTX 4090 wall-clock benchmark — genuine algorithm-level PARTIAL harvesting for MODEL-2 is now exhausted. **Pattern lessons codified:** - **PARTIAL-inside-ACTIVE nesting** (SHIP-012/015/019): gates can carry `discharge_status: PARTIAL_ALGORITHM_LEVEL` + `ship_blocking: true` inside contracts that stay ACTIVE via their primary binding gate. Auditors must read both `status:` AND `gates[].discharge_status:`. - **Counter-example hunting** (SHIP-019): re-run search surveys with explicit counter-example hunting before declaring a space exhausted. Spec §9 Risk mitigations are the highest-leverage hint source. - **Parallel-safe stdout** (SHIP-022): pure formatter helper (`format_provenance_block`) instead of direct `println!()` so harness tests run in parallel without `gag` races. - **Seed-mutex for reproducibility** (SHIP-021): `lock_init_seed` mutex fixes global `INIT_SEED` race in parallel tests. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
…olicy (#901) * evidence(ship-two-001): MODEL-2 pretrain smoke test — task #105 discharge Records the end-to-end synthetic drive of `apr pretrain` on commit 1e7cf53 (now landed on main at 9209383 via PR #882 merge). Verifies task #105 deliverable: GATE-TRAIN-005 / INV-TRAIN-007 / GATE-TRAIN-008 wiring is functional end-to-end. Run: 20 steps, 4 epochs, batch=4, seq=128 — val_loss monotone 3.96 → 2.64. Synthetic drive caveat: no real 370M forward pass, no real corpus read, no checkpoint artifacts written yet. Real corpus + checkpoint wiring tracked as task #111. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * spec(model-2): MVP plan for task #111 (pretrain real corpus + checkpoint) 7-step edit list from Plan agent afd391d1eb1395d30 against post-#882-merge commit 9209383. Identifies 5 critical files (pretrain.rs, apr-cli/commands/pretrain.rs, trainer.rs, transformer/model.rs, io/save.rs) and 5 binary acceptance criteria (AC-111-001..005). Host assignment: lambda-labs (impl), yoga (8GB smoke), gx10 (parity). Non-goals explicitly deferred: async H2D streaming, full corpus-ingest pipeline, mixed-precision scaler tuning, distributed training, convergence budget, resume round-trip, nvml telemetry, apr qa post-hoc validators. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * evidence(ship-two-001): yoga parity smoke — GATE-TRAIN-006 discharged Cross-host byte-identical loss history on yoga RTX 4060 Laptop (8GB): lambda-labs: [3.96, 3.52, 3.08, 2.64] yoga: [3.96, 3.52, 3.08, 2.64] Discharges GATE-TRAIN-006 (seed=42 deterministic) across x86_64 RTX 4090 ↔ x86_64 RTX 4060 Laptop. Same synthetic drive — task #111 MVP will add the real 370M forward pass; yoga stays as 8GB smoke-test host per MVP plan's host assignment table. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(model-2): RealStepFn/RealValFn + shard reader (task #111 steps 1-3) Implements MODEL-2 pretrain MVP plan steps 1-3: the model-agnostic PretrainLoop now has a real-corpus driver that runs a full forward + backward + AdamW step through TransformerTrainer against the 370M Llama scaffold — replacing the LinearDecaySynthetic/ScriptedVal pair used for GATE-TRAIN-005/006/007/008 wiring verification in task #105. **New modules** - `train::shard_reader::ShardBatchIter` Streaming iterator over .bin token shards (little-endian u32). Reads seq_length+1 sequences, chunks into LMBatch of batch_size. Empty-dir errors; lexical shard ordering; EOF auto-advances to next shard. No MinHash dedup / PII scrub / license filter — those belong to `apr-corpus-ingest run`. - `train::pretrain_real::{RealStepFn, RealValFn, build_shared_trainer}` - `llama_370m_transformer_config()` field-for-field from the frozen Llama370MConfig constants (INV-ARCH-370M-001..008 source of truth) - `llama_370m_train_config(lr, seq_length, seed)` builds TransformerTrainConfig with MODEL-2 v2-remedy defaults - `SharedTrainer = Rc<RefCell<TransformerTrainer>>` so both the mutable StepFn and the forward-only ValFn own the same model - `RealStepFn::step` pulls one LMBatch, runs train_batch, returns (loss, grad_norm=1.0 placeholder). Exhausted iterator returns a finite (1.0, 1.0) so GATE-TRAIN-007 (NaN/Inf) does not mis-fire on shard-stream EOF before the loop plans to stop. - `RealValFn::validate` runs forward-only across a held-out Vec, returns mean cross-entropy loss (or NaN if held-out is empty). - `build_shared_trainer` runs INV-ARCH-370M-001 as a debug_assert (param count must land in [366M, 374M]) so any drift in the Llama370MConfig constants fails the instant a dev build compiles. **Contract coverage** Existing `contracts/training-loop-pretrain-v1.yaml` covers all MVP obligations already; no new contract needed. Task #111 follow-up will add per-epoch APR checkpoint hooks (C-TRAIN-PRETRAIN INV-TRAIN-002) and real optimizer-state sha256 (INV-TRAIN-003). **Tests** - shard_reader: single_shard_yields_expected_batch_count, empty_dir_errors, multi_shard_ordering_is_lexical - pretrain_real: transformer_config_matches_llama_370m_constants, real_step_fn_exhausted_iterator_returns_finite_placeholder, real_val_fn_empty_held_out_returns_nan All 6 new tests PASS. Steps 4-7 (SafeTensors→APR swap, `apr pretrain` CLI wiring, real grad_norm, checkpoint hook) to follow. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(model-2): wire real-corpus drive into apr pretrain (task #111 step 5) Replaces the `if !synthetic { return Err(...) }` guard with a real branch: build a shared 370M `TransformerTrainer`, split the shard stream head-off into a `HELD_OUT_BATCHES`-entry validation set, and drive the `PretrainLoop` with `RealStepFn`/`RealValFn` (from `entrenar::train::pretrain_real`) against a `ShardBatchIter`. **Structure** - `run` is now a 2-branch dispatcher. `drive_synthetic` preserves the deterministic decay drive used for GATE-TRAIN-005/006/007/008 wiring verification (task #105). `drive_real` is the new real-corpus path. - Both branches funnel into `run_and_report<S, V>` which owns the `PretrainLoop::new` + `run` + `report` sequence so the terminal status propagation (→ exit code) stays single-sourced. **MVP invariants (documented)** - `HELD_OUT_BATCHES = 2` — small constant; follow-up will plumb an explicit `--val-shards` flag so training and held-out shards are disjoint. - `pad_id = eos_id = 0` — uniform-length sequences take the shared layout in `LMBatch::from_sequences`, so pad_id is never used; the real tokenizer's special-token ids plumb through in a follow-up. - Empty dataset dir → `CliError::ValidationFailed` (shard iterator init failure), covered by the new test `real_mode_empty_dataset_dir_errors`. **Test changes** - `real_mode_empty_dataset_dir_errors` replaces the now-obsolete `synthetic_mode_false_rejected` test. Both synthetic and validation tests continue to pass (3/3 in `commands::pretrain::tests`). **Remaining MVP steps (task #111)** - Step 4: swap SafeTensors → APR in `trainer.rs` checkpoint writer. - Step 6: real optimizer-state sha256 over AdamW m/v/t (INV-TRAIN-003). - Step 7: per-epoch checkpoint hook in `PretrainLoop::run_epoch` post-gate-pass (C-TRAIN-PRETRAIN INV-TRAIN-002). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(model-2): CPU save_apr + per-epoch checkpoint hook (task #111 steps 4+7) Steps 4 and 7 of the MODEL-2 pretrain MVP (SHIP-TWO-001 v2.19.0): Step 4 — CPU save_apr - Add `TransformerTrainer::save_apr(path, name, arch)` in crates/aprender-train/src/train/transformer_trainer/trainer.rs, mirroring the existing CudaTransformerTrainer::save_apr. Emits a sovereign row-major .apr via aprender's Model + SaveConfig::Apr. - Existing `save()` (SafeTensors) left unchanged — three tests at trainer/core.rs:388,409 and tests.rs:423 still round-trip via safetensors for backward compat. - Test `save_apr_writes_readable_apr_file`: write a tiny-config trainer, open with `AprReader`, assert APR magic (APR\0 / APRN), assert `architecture` metadata round-trips, assert `model.embed_tokens.weight` readable as f32. PASSES. Step 7 — per-epoch APR checkpoint hook - Add `pub trait CheckpointFn` in train/pretrain.rs: `fn save(&mut self, epoch, &EpochArtifact) -> Result<(), String>` - Add `Option<Box<dyn CheckpointFn>>` field to `PretrainLoop` + builder method `with_checkpoint_fn`. Keeps PretrainLoop<S,V> at two generics (synthetic + real call-sites unify). - Wire into `run_epoch` AFTER `check_non_divergence(...)?` passes, BEFORE `epoch_artifacts.push()`. Aborted epochs never produce checkpoint files (per contract `per_epoch_artifacts` invariant). Write failures log eprintln but are non-fatal — a flaky disk cannot lose training progress. - Emit companion `metadata.json` (contract path_template). Real-corpus wiring - Add `AprCheckpointFn` in train/pretrain_real.rs holding the shared `Rc<RefCell<TransformerTrainer>>`; its `save()` delegates to `trainer.save_apr()` so the three hooks (RealStepFn, RealValFn, AprCheckpointFn) see the same in-memory weights. - Re-export `CheckpointFn` from train/mod.rs. CLI - `apr pretrain` --real path (drive_real): construct `build_shared_trainer` once, clone Rc into RealStepFn + RealValFn + AprCheckpointFn, pass to `run_and_report`. - `run_and_report` takes `Option<Box<dyn CheckpointFn>>`; synthetic branch passes `None` (no real weights to save). Tests (all green, 21 pretrain + 4 pretrain_real/save_apr + 3 CLI) - `pretrain_loop_calls_checkpoint_fn_once_per_passing_epoch`: mock `CheckpointFn` counts calls. Every successful epoch fires exactly one call; companion metadata.json written to disk. - `pretrain_loop_skips_checkpoint_on_abort`: NaN step forces abort; mock hook recorded zero calls. - `save_apr_writes_readable_apr_file`: magic + metadata + tensor round-trip via AprReader. Contract discharge - GATE-TRAIN-005 invariant preserved: checkpoint placement AFTER divergence guard means aborted epochs never touch disk. - training-loop-pretrain-v1 `per_epoch_artifacts.path_template` honored: `{run_dir}/ckpt/epoch-{N:03d}.apr` + `.metadata.json`. Deferred (Step 6) - `fake_optimizer_sha(epoch)` at pretrain.rs:680 still returns a placeholder. INV-TRAIN-003 discharge needs TransformerTrainer to expose AdamW m/v/t buffers for a real sha256. Separate step. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(model-2): real AdamW optimizer-state sha256 (task #111 step 6) INV-TRAIN-003 discharge for the MODEL-2 pretrain MVP. TransformerTrainer::optimizer_state_sha256() - New accessor in crates/aprender-train/src/train/transformer_trainer/trainer.rs that hashes (t, m_buffers, v_buffers) in fixed order. - Uses sha2::Sha256 + bytemuck::cast_slice over each Array1<f32>. - Versioned tag "aprender-train:adamw:optstate:v1" prefixes the digest so schema changes are loud, not silent. - Uninitialized slots hash to the literal "none" so missing m[i] is semantically distinct from an all-zeros m[i]. StepFn trait extension - Add `fn optimizer_state_sha256(&self) -> Option<String>` with default `None`. Synthetic harnesses keep returning None and continue using the `fake_optimizer_sha` epoch/seed fallback. - `PretrainLoop::run_epoch` now reads `step_fn.optimizer_state_sha256()` and falls back to the fake fingerprint only when None. RealStepFn override - RealStepFn in pretrain_real.rs implements the new hook by delegating to `trainer.borrow().optimizer_state_sha256()`, so the real-corpus path records the actual AdamW digest. Tests (all 25 + 3 green) - `optimizer_state_sha256_is_hex_digest_on_fresh_trainer`: 64-char lowercase hex shape check on an un-stepped trainer. - `optimizer_state_sha256_is_stable_across_fresh_trainers`: two fresh trainers hash to the same digest (reproducibility). - `pretrain_loop_uses_step_fn_optimizer_sha_when_available`: a StepFn with override wins over fake_optimizer_sha. - `pretrain_loop_falls_back_to_fake_optimizer_sha_for_synthetic`: default impl still produces a 64-char hex digest via fallback. Task #111 MVP status - Steps 1-3 shipped in commit b2b0329 - Step 5 shipped in commit e5a2f02 - Steps 4+7 shipped in commit 89db4b3 - Step 6 shipped in this commit - All 7 steps of the task #111 plan are now committed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(model-2): FALSIFY-SHIP-021 seed=0 × 100-step reproducibility harness Discharges GATE-TRAIN-006 / INV-TRAIN-006 from training-loop-pretrain-v1 (bumped 1.0.0 → 1.1.0 PROPOSED → ACTIVE). Two new Rust tests in crates/aprender-train/src/train/transformer_trainer/tests.rs: - falsify_ship_021_seed_0_100_step_reproducibility: two trainers built with seed=0 produce identical finite losses for 100 consecutive train_batch calls (|Δ| ≤ 1e-6) AND identical AdamW optimizer_state_sha256 digests. - falsify_ship_021_different_seeds_do_diverge: seed=0 vs seed=1 counter-test must diverge > 1e-4 within 10 steps (guards against degenerate "always equal" implementations). Seed plumbing fixes: - TransformerTrainer::new now calls lock_init_seed(config.seed) before Transformer::new so direct (non-YAML) callers honor the configured seed instead of silently inheriting the global default of 42. - transformer::init::INIT_SEED_LOCK (std::sync::Mutex) + lock_init_seed helper returning a #[must_use] MutexGuard. Held across the full Transformer::new call so cargo test's default parallel runner cannot clobber the global atomic INIT_SEED between one test's set_init_seed and another test's weight-init reads. Poisoned mutex is recovered transparently (seed itself is atomic; poison only signals prior panic). Contract uplift (contracts/training-loop-pretrain-v1.yaml v1.1.0): - status PROPOSED → ACTIVE - INV-TRAIN-006 gains harness: block naming both test paths + assertions - GATE-TRAIN-006 gains evidence_discharged_by: pointing to both tests - metadata.changelog entry recording the discharge Verification: cargo test -p aprender-train --lib falsify_ship_021 → 2 passed cargo clippy -p aprender-train --lib --no-deps -- -D warnings → clean pv validate contracts/training-loop-pretrain-v1.yaml → 0 errors Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(ship-two): FALSIFY-SHIP-022 apr inspect provenance (AC-SHIP2-012) Discharges FALSIFY-SHIP-022: apr inspect surfaces license + data_source + data_license on every .apr, with "(missing)" / null rendering when a field is absent rather than silent skip. Makes a .apr binary a sufficient provenance-audit artifact (no sidecar manifest required). Contract: contracts/apr-provenance-v1.yaml (C-APR-PROVENANCE v1.0.0, ACTIVE, kind: schema). 3 invariants + 3 gates + 3 failure modes, all bound to AC-SHIP2-012 / FALSIFY-SHIP-022. pv validate PASS. Code changes: - AprV2Metadata: add data_source + data_license as named Option<String> fields (not buried in custom HashMap). No skip_serializing_if, so JSON round-trips them as null when None (FM-APR-PROV-SILENT-SKIP). - apr inspect MetadataInfo: mirror all 3 provenance fields, also with no skip_serializing_if. - apr inspect text output: new "Provenance:" block via pure helper format_provenance_block() — always emits all 3 keys, renders None as literal "(missing)". - Two struct-literal construction sites updated for new fields. Harness tests (5 passing): - aprender-core: - falsify_ship_022_apr_metadata_provenance_round_trip - falsify_ship_022_inspect_emits_provenance_keys (JSON null half) - falsify_ship_022_partial_provenance_round_trip - apr-cli: - falsify_ship_022_inspect_emits_provenance_keys (MetadataInfo JSON) - falsify_ship_022_inspect_missing_renders_as_missing (text half) - falsify_ship_022_inspect_populated_renders_values Smoke test: apr inspect on existing .apr (no provenance stored) correctly emits: Provenance: license: (missing) data_source: (missing) data_license: (missing) cargo fmt + cargo clippy (aprender-core, apr-cli) clean. 3239 aprender-core format tests PASS, 85 apr-cli inspect tests PASS. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(ship-two): v2.20.0 amendment — FALSIFY-SHIP-021 + FALSIFY-SHIP-022 DISCHARGED Documents two MODEL-2 ship gates closed in the post-v2.19 evidence window: 1. FALSIFY-SHIP-021 (AC-SHIP2-011) — seed=0 × 100-step reproducibility harness + counter-test seed=0 vs seed=1 divergence proof. Root cause of original flake (sibling test racing on global INIT_SEED atomic) fixed via lock_init_seed(seed) -> MutexGuard. Contract training-loop-pretrain-v1.yaml bumped 1.0.0 → 1.1.0 ACTIVE. Commit 0b8ca8c, task #112. 2. FALSIFY-SHIP-022 (AC-SHIP2-012) — apr inspect provenance block (license + data_source + data_license) shipped. AprV2Metadata extended with 2 named Option<String> fields; no skip_serializing_if (FM-APR-PROV-SILENT-SKIP guard). Pure helper format_provenance_block replaces stdout-capture in tests (gag is NOT parallel-safe). New contract apr-provenance-v1.yaml (C-APR-PROVENANCE v1.0.0 ACTIVE, kind: schema). pv validate PASS. Commit 8f0607d, task #113. Combined status: 2/12 AC-SHIP2 gates DISCHARGED. Remaining 10 block on 370M compute-dispatch (the long-pole from v2.19.0). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(model-2): FALSIFY-SHIP-011 llama-370m sovereign contract ACTIVE (AC-SHIP2-001) Discharges FALSIFY-SHIP-011 / AC-SHIP2-001 — MODEL-2 370M architectural contract registered AND byte-equally bound to the Rust scaffold that aprender-train consumes. Contract lift: - contracts/model-families/llama-370m-sovereign-v1.yaml - version 1.0.0 → 1.1.0 - status PROPOSED → ACTIVE - GATE-ARCH-370M-001 gains evidence_discharged_by (4 entries) and ship_blocking: true - changelog block added documenting the v1.1.0 discharge Harness tests (crates/aprender-train/src/models/llama_370m.rs): - `falsify_ship_011_rust_scaffold_matches_yaml_contract` — loads the contract via include_str! (compile-time-embedded, no path deps at runtime) and asserts every architecture.* and constraints.* key matches the corresponding Llama370MConfig::* const byte-equally - `falsify_ship_011_sovereign_contract_is_active` — asserts status == ACTIVE (a PROPOSED contract cannot gate a ship) Test run: 6/6 aprender-train::models::llama_370m tests PASS (4 pre- existing + 2 new). pv validate on contract: 0 errors, 0 warnings. Why this discharge is strong: - Rust scaffold already encodes INV-ARCH-370M-002..008 as compile-time `const _: () = Llama370MConfig::validate();` — a drift of any value fails `cargo build`, not just `cargo test` - The new YAML-vs-Rust binding test adds the missing half: drift of a YAML key that the Rust scaffold doesn't mirror is now also caught at test time, preventing the MODEL-1-v2 QLoRA class of recipe/artifact drift (rank=16 actual vs rank=32 recipe — see project_ship_two_001_model1_qlora_divergence.md) - INV-ARCH-370M-001 (param count band) is discharged by the existing `estimated_param_count_within_contract_band` test - INV-ARCH-370M-009 (row-major layout) is discharged by aprender::format::layout_contract at APR load time Combined MODEL-2 status after this commit: 3/12 AC-SHIP2 gates DISCHARGED (001, 011, 012). Remaining 9 (002–010) still block on actual 370M training compute-dispatch — the pretrain loop driver from v2.19.0 is ready to exercise them once the weights exist. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(model-2): FALSIFY-SHIP-012 algorithm-level PARTIAL discharge (AC-SHIP2-002) Bumps C-TOK-BPE to v1.1.0 and wires evidence_discharged_by into GATE-BPE-003 pointing at 3 existing harness tests in crates/apr-cli/tests/falsify_ship_012_tokenizer_roundtrip.rs and the emitted evidence JSON at evidence/ship-two-001/model-2/falsify-ship-012-tokenizer-roundtrip.json. Status intentionally stays PROPOSED. The gate requires 10K-doc byte-exact round-trip on The Stack v2 Python holdout; task #91 shipped the ingest scaffold (corpus-ingest dry-run CLI) but the 10K fixture itself is not yet materialized — so this lands as PARTIAL_ALGORITHM_LEVEL discharge with full_discharge_blocks_on: task #91 data. What passes algorithm-level today (all 3 tests green at commit time): - falsify_ship_012_tokenizer_roundtrip_byte_exact — decode(encode(nfc(doc))) byte-equals nfc(doc) on every doc in a 20-doc synthetic Python-like holdout (ASCII keywords + Unicode identifiers + docstrings + emoji + combining marks). Hard-asserts evidence.docs_failed == 0 — regressions reintroducing whitespace splitting or dropping the byte encoder panic. - falsify_ship_012_nfc_idempotence_only — INV-BPE-005 standalone: nfc(nfc(x)) byte-equals nfc(x) on every holdout doc. - falsify_ship_012_train_corpus_sanity — train/holdout set disjointness plus minimum corpus sizes (>=20 docs each). When task #91's 10K Stack-v2 Python holdout lands the fixture swap is data-only: the harness module doc-comment already flagged this path so no test rewrite will be required. Evidence: evidence/ship-two-001/model-2/falsify-ship-012-tokenizer-roundtrip.json (20/20 passed, nfc_idempotent: true, vocab_size_trained: 489/512). Verification: - pv validate contracts/tokenizer-bpe-v1.yaml -> 0 errors, 0 warnings - cargo test -p apr-cli --test falsify_ship_012_tokenizer_roundtrip -> 3/3 passed Bound to: AC-SHIP2-002 (ship-two-models-spec §5). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(model-2): FALSIFY-SHIP-015 algorithm-level PARTIAL discharge (AC-SHIP2-005) Bumps C-LLAMA-370M-SOVEREIGN v1.1.0 → v1.2.0 and wires evidence_discharged_by into GATE-ARCH-370M-003 (the param-count gate that binds AC-SHIP2-005 via FALSIFY-SHIP-015). Contract stays ACTIVE — the FALSIFY-SHIP-011 discharge (v1.1.0) is what gates the ACTIVE promotion, not SHIP-015. GATE-ARCH-370M-003's evidence_required asks for apr inspect --json model.apr | jq '.param_count' ∈ [366M, 374M] on a real 370M `.apr` checkpoint. That file does not exist yet — it blocks on AC-SHIP2-003/004 pretraining compute-dispatch. Rather than leave the gate's evidence blank, this commit wires the algorithm-level proof that already exists: - estimated_param_count() / estimated_stored_param_count() — const fn over Llama370MConfig::*, so the count is computed at compile time. - estimated_param_count_within_contract_band (unit test) hard-asserts: * p ∈ [PARAMETERS_MIN=366M, PARAMETERS_MAX=374M] (INV-ARCH-370M-001) * |p − 370M| / 370M < 5% (tighter sanity) * p − stored == VOCAB_SIZE × HIDDEN_DIM (tied embeddings) Any edit to Llama370MConfig that moves the count out of the INV-ARCH-370M-001 band fails `cargo test -p aprender-train --lib llama_370m` — before any compute runs. The gate now carries: discharge_status: PARTIAL_ALGORITHM_LEVEL full_discharge_blocks_on: "real 370M .apr checkpoint from pretraining compute-dispatch (AC-SHIP2-003/004)" ship_blocking: true so the data-scale gap is first-class contract state, not an unspoken assumption. Verification: - pv validate contracts/model-families/llama-370m-sovereign-v1.yaml -> 0 errors, 0 warnings - cargo test -p aprender-train --lib models::llama_370m -> 6/6 passed (including the newly-cited estimated_param_count_within_contract_band and the pre-existing falsify_ship_011_* pair) MODEL-2 AC-SHIP2 ledger after this: 3/12 fully ACTIVE (001, 011, 012) + 2/12 PARTIAL (002 via SHIP-012, 005 via SHIP-015) = 5/12 touched. Remaining 7 (003/004/006/007/008/009/010) block on 370M compute. Bound to: AC-SHIP2-005 (ship-two-models-spec §5). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(ship-two-001): spec v2.21.0 — FALSIFY-SHIP-011 DISCHARGED + SHIP-012/015 PARTIAL Captures the three evidence-wiring commits landed on chore/post-v2.19-evidence since v2.20.0: 1. FALSIFY-SHIP-011 (AC-SHIP2-001) DISCHARGED at 338c6eb (task #114) C-LLAMA-370M-SOVEREIGN v1.0.0 PROPOSED -> v1.1.0 ACTIVE. Rust-YAML byte-equality binding via include_str! + serde_yaml::Value. 2. FALSIFY-SHIP-012 (AC-SHIP2-002) PARTIAL_ALGORITHM_LEVEL at 2e8b8b8 (task #115). C-TOK-BPE v1.0.0 -> v1.1.0 stays PROPOSED. 3 tokenizer harness tests wired; full discharge blocks on task #91 10K Stack-v2 Python holdout (fixture-swap is data-only). 3. FALSIFY-SHIP-015 (AC-SHIP2-005) PARTIAL_ALGORITHM_LEVEL at bfb8831 (task #116). Sovereign contract v1.1.0 -> v1.2.0 stays ACTIVE. estimated_param_count_within_contract_band + const fns wired; full discharge blocks on real 370M .apr from compute-dispatch. Also codifies the PARTIAL_ALGORITHM_LEVEL pattern as a first-class spec concept: when a gate's evidence_required describes a production-scale check that is not yet runnable but the underlying invariant is provable today at algorithm/compile/unit-test level, wire the algorithm proofs and carry discharge_status + partial_discharge_note + full_discharge_blocks_on + ship_blocking=true to make the data gap first-class contract state. MODEL-2 ship-gate status after v2.21.0: 3/12 fully ACTIVE (001, 011, 012) + 2/12 PARTIAL_ALGORITHM_LEVEL (002, 005) = 5/12 touched (~42%). Remaining 7 block on real 370M compute-dispatch. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(model-2): FALSIFY-SHIP-019 algorithm-level PARTIAL discharge (AC-SHIP2-009) GATE-ARCH-370M-004 gains evidence_discharged_by + discharge_status: PARTIAL_ALGORITHM_LEVEL. Three algorithm-level invariants wired without training: 1. Coverage — every 370M tensor (219 entries: 1 embed + 1 lm_head + 9 per-layer × 24 layers + 1 final norm) resolves to a TensorContract entry in LayoutContract::new(). Pattern-normalises per-layer names; any uncovered tensor would be silently skipped by GGUF export. 2. Row-major ordering (INV-ARCH-370M-009) — every 2D shape is [out_dim, in_dim]. Pinned lm_head/embed/q_proj/k_proj shapes verify GQA (k_proj = [kv_heads*head_dim, hidden]) and bind the 370M architecture to the GH-202-regression-proof layout. 3. Critical-tensor enforcement — validate_apr_shape accepts [vocab, hidden] AND rejects reversed [hidden, vocab] on lm_head.weight. Proves the validator catches layout bugs, not just passes silently. Full discharge (GGUF cosine-parity on trained 370M, max_logit_cosine ≤ 1e-3 over 100 canary prompts) blocks on compute-dispatch (AC-SHIP2-003/004). Harness is fixture-swap-ready once a trained .apr exists — no test rewrite needed. Spec §9 Risk #2 names this exact mitigation path. Contract: llama-370m-sovereign-v1.yaml v1.2.0 → v1.3.0, stays ACTIVE. Tests: 2 new test fns in crates/aprender-train/src/models/llama_370m.rs (8/8 pass). `pv validate` = 0 errors, 0 warnings. Closes #117. Binds to AC-SHIP2-009 / FALSIFY-SHIP-019. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(ship-two-001): v2.22.0 — FALSIFY-SHIP-019 PARTIAL discharge capstone Records the SHIP-019 algorithm-level PARTIAL discharge (task #117, commit 846cc1d) in the authoritative spec: - Version bump 2.21.0 → 2.22.0 - Full amendment block #4 under post-v2.19 evidence window documenting GATE-ARCH-370M-004 wired to `layout_contract.rs` algorithm proofs (219-tensor coverage + row-major ordering + GH-202 rejection) - New "counter-example hunting" pattern lesson: prior "exhausted PARTIAL levers" verdict was ~86% correct; re-running the 7-gate FALSIFY-SHIP survey with explicit counter-example hunting found exactly one genuine lever (SHIP-019). SHIP-017/018/020 need compute; SHIP-013/014/016 collapse into SHIP-011 wiring. - Combined MODEL-2 ledger: 3/12 fully ACTIVE + 3/12 PARTIAL = 6/12 touched (50%). Remaining 6 (003/004/006/007/008/010) all require real 370M compute, trained .apr + eval harness, or RTX 4090 wall-clock benchmark. Genuine algorithm-level PARTIAL harvesting for MODEL-2 is now exhausted. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * chore(publish): mark 5 QA harness crates publish = false + document policy Evidence: aprender-qa-{cli,gen,runner,report,certify} have never been published to crates.io (verified against crates.io API 2026-04-19). They are reached through `apr qa` (the user-facing binary), not through `cargo add`, so marking them publish = false prevents accidental version-bump-with-no-publish drift across the workspace. Spec §A.12 rewritten from the stale "63 crates (49 published + 14 internal)" snapshot to the real 80-crate layout: 9 publish = false (4 benchmarks/xtask + 5 QA harness) plus 71 publishable. §A.12.1 codifies publishing policy: three opt-out categories (benchmarks, xtask, QA harness), and the rule that a v0.31.0-style release does NOT require cargo publish across all 80 crates — crates.io publish is selective (via cargo workspaces publish --from-git or cargo publish -p <name>), workspace-wide tag/release is not. Verified: cargo check --workspace clean after the flip. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(mcp-spec): refresh header — M1–M3 SHIPPED in v0.31.0, M4 in flight Five-whys on the stale 2026-04-17 draft status: 1. Why stale? Spec said "DRAFT (pre-implementation)" + target "v0.32.0" but M1–M3 actually shipped in v0.31.0 on 2026-04-19 (tag 62893da). 2. Why not refreshed? M1–M3 landed across multiple PRs without a spec-header refresh pass. 3. Why is that a problem? New contributors reading the spec think MCP is unshipped — contradicted by `cargo install aprender` already exposing `apr mcp` with 9 tools. 4. Root cause: spec headers are not on the release checklist. 5. Fix here: update status to ACTIVE, version to 1.2.0, delivery line to "v0.31.0 M1–M3 SHIPPED / M4 in flight (PRs #886-892)". No body changes — architecture/tool-surface/protocol sections are still accurate. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * chore(publish): mark aprender-viz-ttop publish = false + 4th category Evidence: `aprender-viz-ttop` has never been published to crates.io (release workflow explicitly never invokes `cargo publish` for it). Its `description` field calls it a "Terminal Top: 10X better than btop" system monitor — ships as a binary subcommand inside the `apr` facade, not as a library dependency. Five-whys: 1. Why flip it? Because it's a bundled binary, not a library. 2. Why does that matter? `cargo add aprender-viz-ttop` would mislead library authors into taking a user-facing TUI as a dep. 3. Why wasn't it already flipped? It predated the A.12 policy audit performed in 42907db. 4. Why a 4th category? Benchmarks / xtask / QA harness all leave outputs as artifacts; this one ships a runnable subcommand. The distinction matters because `apr cbtop` dispatches to it. 5. Why document it? To prevent a future reader from re-opening the "publish all 80 crates" question when we only publish ~70. Changes: - crates/aprender-viz-ttop/Cargo.toml: add `publish = false` - docs/specifications/aprender-monorepo-consolidation.md: - §A.12: add viz-ttop to internal-crates table (10 rows) - §A.12.1: add 4th category (Bundled binaries); update total to "10 opted out / 70 publishable"; remove stale "Candidates to migrate" paragraph (superseded by 42907db + this commit) Refs: APR-MONO, PR #901 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
#902) * evidence(ship-two-001): MODEL-2 pretrain smoke test — task #105 discharge Records the end-to-end synthetic drive of `apr pretrain` on commit 1e7cf53 (now landed on main at 9209383 via PR #882 merge). Verifies task #105 deliverable: GATE-TRAIN-005 / INV-TRAIN-007 / GATE-TRAIN-008 wiring is functional end-to-end. Run: 20 steps, 4 epochs, batch=4, seq=128 — val_loss monotone 3.96 → 2.64. Synthetic drive caveat: no real 370M forward pass, no real corpus read, no checkpoint artifacts written yet. Real corpus + checkpoint wiring tracked as task #111. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * spec(model-2): MVP plan for task #111 (pretrain real corpus + checkpoint) 7-step edit list from Plan agent afd391d1eb1395d30 against post-#882-merge commit 9209383. Identifies 5 critical files (pretrain.rs, apr-cli/commands/pretrain.rs, trainer.rs, transformer/model.rs, io/save.rs) and 5 binary acceptance criteria (AC-111-001..005). Host assignment: lambda-labs (impl), yoga (8GB smoke), gx10 (parity). Non-goals explicitly deferred: async H2D streaming, full corpus-ingest pipeline, mixed-precision scaler tuning, distributed training, convergence budget, resume round-trip, nvml telemetry, apr qa post-hoc validators. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * evidence(ship-two-001): yoga parity smoke — GATE-TRAIN-006 discharged Cross-host byte-identical loss history on yoga RTX 4060 Laptop (8GB): lambda-labs: [3.96, 3.52, 3.08, 2.64] yoga: [3.96, 3.52, 3.08, 2.64] Discharges GATE-TRAIN-006 (seed=42 deterministic) across x86_64 RTX 4090 ↔ x86_64 RTX 4060 Laptop. Same synthetic drive — task #111 MVP will add the real 370M forward pass; yoga stays as 8GB smoke-test host per MVP plan's host assignment table. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(model-2): RealStepFn/RealValFn + shard reader (task #111 steps 1-3) Implements MODEL-2 pretrain MVP plan steps 1-3: the model-agnostic PretrainLoop now has a real-corpus driver that runs a full forward + backward + AdamW step through TransformerTrainer against the 370M Llama scaffold — replacing the LinearDecaySynthetic/ScriptedVal pair used for GATE-TRAIN-005/006/007/008 wiring verification in task #105. **New modules** - `train::shard_reader::ShardBatchIter` Streaming iterator over .bin token shards (little-endian u32). Reads seq_length+1 sequences, chunks into LMBatch of batch_size. Empty-dir errors; lexical shard ordering; EOF auto-advances to next shard. No MinHash dedup / PII scrub / license filter — those belong to `apr-corpus-ingest run`. - `train::pretrain_real::{RealStepFn, RealValFn, build_shared_trainer}` - `llama_370m_transformer_config()` field-for-field from the frozen Llama370MConfig constants (INV-ARCH-370M-001..008 source of truth) - `llama_370m_train_config(lr, seq_length, seed)` builds TransformerTrainConfig with MODEL-2 v2-remedy defaults - `SharedTrainer = Rc<RefCell<TransformerTrainer>>` so both the mutable StepFn and the forward-only ValFn own the same model - `RealStepFn::step` pulls one LMBatch, runs train_batch, returns (loss, grad_norm=1.0 placeholder). Exhausted iterator returns a finite (1.0, 1.0) so GATE-TRAIN-007 (NaN/Inf) does not mis-fire on shard-stream EOF before the loop plans to stop. - `RealValFn::validate` runs forward-only across a held-out Vec, returns mean cross-entropy loss (or NaN if held-out is empty). - `build_shared_trainer` runs INV-ARCH-370M-001 as a debug_assert (param count must land in [366M, 374M]) so any drift in the Llama370MConfig constants fails the instant a dev build compiles. **Contract coverage** Existing `contracts/training-loop-pretrain-v1.yaml` covers all MVP obligations already; no new contract needed. Task #111 follow-up will add per-epoch APR checkpoint hooks (C-TRAIN-PRETRAIN INV-TRAIN-002) and real optimizer-state sha256 (INV-TRAIN-003). **Tests** - shard_reader: single_shard_yields_expected_batch_count, empty_dir_errors, multi_shard_ordering_is_lexical - pretrain_real: transformer_config_matches_llama_370m_constants, real_step_fn_exhausted_iterator_returns_finite_placeholder, real_val_fn_empty_held_out_returns_nan All 6 new tests PASS. Steps 4-7 (SafeTensors→APR swap, `apr pretrain` CLI wiring, real grad_norm, checkpoint hook) to follow. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(model-2): wire real-corpus drive into apr pretrain (task #111 step 5) Replaces the `if !synthetic { return Err(...) }` guard with a real branch: build a shared 370M `TransformerTrainer`, split the shard stream head-off into a `HELD_OUT_BATCHES`-entry validation set, and drive the `PretrainLoop` with `RealStepFn`/`RealValFn` (from `entrenar::train::pretrain_real`) against a `ShardBatchIter`. **Structure** - `run` is now a 2-branch dispatcher. `drive_synthetic` preserves the deterministic decay drive used for GATE-TRAIN-005/006/007/008 wiring verification (task #105). `drive_real` is the new real-corpus path. - Both branches funnel into `run_and_report<S, V>` which owns the `PretrainLoop::new` + `run` + `report` sequence so the terminal status propagation (→ exit code) stays single-sourced. **MVP invariants (documented)** - `HELD_OUT_BATCHES = 2` — small constant; follow-up will plumb an explicit `--val-shards` flag so training and held-out shards are disjoint. - `pad_id = eos_id = 0` — uniform-length sequences take the shared layout in `LMBatch::from_sequences`, so pad_id is never used; the real tokenizer's special-token ids plumb through in a follow-up. - Empty dataset dir → `CliError::ValidationFailed` (shard iterator init failure), covered by the new test `real_mode_empty_dataset_dir_errors`. **Test changes** - `real_mode_empty_dataset_dir_errors` replaces the now-obsolete `synthetic_mode_false_rejected` test. Both synthetic and validation tests continue to pass (3/3 in `commands::pretrain::tests`). **Remaining MVP steps (task #111)** - Step 4: swap SafeTensors → APR in `trainer.rs` checkpoint writer. - Step 6: real optimizer-state sha256 over AdamW m/v/t (INV-TRAIN-003). - Step 7: per-epoch checkpoint hook in `PretrainLoop::run_epoch` post-gate-pass (C-TRAIN-PRETRAIN INV-TRAIN-002). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(model-2): CPU save_apr + per-epoch checkpoint hook (task #111 steps 4+7) Steps 4 and 7 of the MODEL-2 pretrain MVP (SHIP-TWO-001 v2.19.0): Step 4 — CPU save_apr - Add `TransformerTrainer::save_apr(path, name, arch)` in crates/aprender-train/src/train/transformer_trainer/trainer.rs, mirroring the existing CudaTransformerTrainer::save_apr. Emits a sovereign row-major .apr via aprender's Model + SaveConfig::Apr. - Existing `save()` (SafeTensors) left unchanged — three tests at trainer/core.rs:388,409 and tests.rs:423 still round-trip via safetensors for backward compat. - Test `save_apr_writes_readable_apr_file`: write a tiny-config trainer, open with `AprReader`, assert APR magic (APR\0 / APRN), assert `architecture` metadata round-trips, assert `model.embed_tokens.weight` readable as f32. PASSES. Step 7 — per-epoch APR checkpoint hook - Add `pub trait CheckpointFn` in train/pretrain.rs: `fn save(&mut self, epoch, &EpochArtifact) -> Result<(), String>` - Add `Option<Box<dyn CheckpointFn>>` field to `PretrainLoop` + builder method `with_checkpoint_fn`. Keeps PretrainLoop<S,V> at two generics (synthetic + real call-sites unify). - Wire into `run_epoch` AFTER `check_non_divergence(...)?` passes, BEFORE `epoch_artifacts.push()`. Aborted epochs never produce checkpoint files (per contract `per_epoch_artifacts` invariant). Write failures log eprintln but are non-fatal — a flaky disk cannot lose training progress. - Emit companion `metadata.json` (contract path_template). Real-corpus wiring - Add `AprCheckpointFn` in train/pretrain_real.rs holding the shared `Rc<RefCell<TransformerTrainer>>`; its `save()` delegates to `trainer.save_apr()` so the three hooks (RealStepFn, RealValFn, AprCheckpointFn) see the same in-memory weights. - Re-export `CheckpointFn` from train/mod.rs. CLI - `apr pretrain` --real path (drive_real): construct `build_shared_trainer` once, clone Rc into RealStepFn + RealValFn + AprCheckpointFn, pass to `run_and_report`. - `run_and_report` takes `Option<Box<dyn CheckpointFn>>`; synthetic branch passes `None` (no real weights to save). Tests (all green, 21 pretrain + 4 pretrain_real/save_apr + 3 CLI) - `pretrain_loop_calls_checkpoint_fn_once_per_passing_epoch`: mock `CheckpointFn` counts calls. Every successful epoch fires exactly one call; companion metadata.json written to disk. - `pretrain_loop_skips_checkpoint_on_abort`: NaN step forces abort; mock hook recorded zero calls. - `save_apr_writes_readable_apr_file`: magic + metadata + tensor round-trip via AprReader. Contract discharge - GATE-TRAIN-005 invariant preserved: checkpoint placement AFTER divergence guard means aborted epochs never touch disk. - training-loop-pretrain-v1 `per_epoch_artifacts.path_template` honored: `{run_dir}/ckpt/epoch-{N:03d}.apr` + `.metadata.json`. Deferred (Step 6) - `fake_optimizer_sha(epoch)` at pretrain.rs:680 still returns a placeholder. INV-TRAIN-003 discharge needs TransformerTrainer to expose AdamW m/v/t buffers for a real sha256. Separate step. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(model-2): real AdamW optimizer-state sha256 (task #111 step 6) INV-TRAIN-003 discharge for the MODEL-2 pretrain MVP. TransformerTrainer::optimizer_state_sha256() - New accessor in crates/aprender-train/src/train/transformer_trainer/trainer.rs that hashes (t, m_buffers, v_buffers) in fixed order. - Uses sha2::Sha256 + bytemuck::cast_slice over each Array1<f32>. - Versioned tag "aprender-train:adamw:optstate:v1" prefixes the digest so schema changes are loud, not silent. - Uninitialized slots hash to the literal "none" so missing m[i] is semantically distinct from an all-zeros m[i]. StepFn trait extension - Add `fn optimizer_state_sha256(&self) -> Option<String>` with default `None`. Synthetic harnesses keep returning None and continue using the `fake_optimizer_sha` epoch/seed fallback. - `PretrainLoop::run_epoch` now reads `step_fn.optimizer_state_sha256()` and falls back to the fake fingerprint only when None. RealStepFn override - RealStepFn in pretrain_real.rs implements the new hook by delegating to `trainer.borrow().optimizer_state_sha256()`, so the real-corpus path records the actual AdamW digest. Tests (all 25 + 3 green) - `optimizer_state_sha256_is_hex_digest_on_fresh_trainer`: 64-char lowercase hex shape check on an un-stepped trainer. - `optimizer_state_sha256_is_stable_across_fresh_trainers`: two fresh trainers hash to the same digest (reproducibility). - `pretrain_loop_uses_step_fn_optimizer_sha_when_available`: a StepFn with override wins over fake_optimizer_sha. - `pretrain_loop_falls_back_to_fake_optimizer_sha_for_synthetic`: default impl still produces a 64-char hex digest via fallback. Task #111 MVP status - Steps 1-3 shipped in commit b2b0329 - Step 5 shipped in commit e5a2f02 - Steps 4+7 shipped in commit 89db4b3 - Step 6 shipped in this commit - All 7 steps of the task #111 plan are now committed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(model-2): FALSIFY-SHIP-021 seed=0 × 100-step reproducibility harness Discharges GATE-TRAIN-006 / INV-TRAIN-006 from training-loop-pretrain-v1 (bumped 1.0.0 → 1.1.0 PROPOSED → ACTIVE). Two new Rust tests in crates/aprender-train/src/train/transformer_trainer/tests.rs: - falsify_ship_021_seed_0_100_step_reproducibility: two trainers built with seed=0 produce identical finite losses for 100 consecutive train_batch calls (|Δ| ≤ 1e-6) AND identical AdamW optimizer_state_sha256 digests. - falsify_ship_021_different_seeds_do_diverge: seed=0 vs seed=1 counter-test must diverge > 1e-4 within 10 steps (guards against degenerate "always equal" implementations). Seed plumbing fixes: - TransformerTrainer::new now calls lock_init_seed(config.seed) before Transformer::new so direct (non-YAML) callers honor the configured seed instead of silently inheriting the global default of 42. - transformer::init::INIT_SEED_LOCK (std::sync::Mutex) + lock_init_seed helper returning a #[must_use] MutexGuard. Held across the full Transformer::new call so cargo test's default parallel runner cannot clobber the global atomic INIT_SEED between one test's set_init_seed and another test's weight-init reads. Poisoned mutex is recovered transparently (seed itself is atomic; poison only signals prior panic). Contract uplift (contracts/training-loop-pretrain-v1.yaml v1.1.0): - status PROPOSED → ACTIVE - INV-TRAIN-006 gains harness: block naming both test paths + assertions - GATE-TRAIN-006 gains evidence_discharged_by: pointing to both tests - metadata.changelog entry recording the discharge Verification: cargo test -p aprender-train --lib falsify_ship_021 → 2 passed cargo clippy -p aprender-train --lib --no-deps -- -D warnings → clean pv validate contracts/training-loop-pretrain-v1.yaml → 0 errors Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(ship-two): FALSIFY-SHIP-022 apr inspect provenance (AC-SHIP2-012) Discharges FALSIFY-SHIP-022: apr inspect surfaces license + data_source + data_license on every .apr, with "(missing)" / null rendering when a field is absent rather than silent skip. Makes a .apr binary a sufficient provenance-audit artifact (no sidecar manifest required). Contract: contracts/apr-provenance-v1.yaml (C-APR-PROVENANCE v1.0.0, ACTIVE, kind: schema). 3 invariants + 3 gates + 3 failure modes, all bound to AC-SHIP2-012 / FALSIFY-SHIP-022. pv validate PASS. Code changes: - AprV2Metadata: add data_source + data_license as named Option<String> fields (not buried in custom HashMap). No skip_serializing_if, so JSON round-trips them as null when None (FM-APR-PROV-SILENT-SKIP). - apr inspect MetadataInfo: mirror all 3 provenance fields, also with no skip_serializing_if. - apr inspect text output: new "Provenance:" block via pure helper format_provenance_block() — always emits all 3 keys, renders None as literal "(missing)". - Two struct-literal construction sites updated for new fields. Harness tests (5 passing): - aprender-core: - falsify_ship_022_apr_metadata_provenance_round_trip - falsify_ship_022_inspect_emits_provenance_keys (JSON null half) - falsify_ship_022_partial_provenance_round_trip - apr-cli: - falsify_ship_022_inspect_emits_provenance_keys (MetadataInfo JSON) - falsify_ship_022_inspect_missing_renders_as_missing (text half) - falsify_ship_022_inspect_populated_renders_values Smoke test: apr inspect on existing .apr (no provenance stored) correctly emits: Provenance: license: (missing) data_source: (missing) data_license: (missing) cargo fmt + cargo clippy (aprender-core, apr-cli) clean. 3239 aprender-core format tests PASS, 85 apr-cli inspect tests PASS. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(ship-two): v2.20.0 amendment — FALSIFY-SHIP-021 + FALSIFY-SHIP-022 DISCHARGED Documents two MODEL-2 ship gates closed in the post-v2.19 evidence window: 1. FALSIFY-SHIP-021 (AC-SHIP2-011) — seed=0 × 100-step reproducibility harness + counter-test seed=0 vs seed=1 divergence proof. Root cause of original flake (sibling test racing on global INIT_SEED atomic) fixed via lock_init_seed(seed) -> MutexGuard. Contract training-loop-pretrain-v1.yaml bumped 1.0.0 → 1.1.0 ACTIVE. Commit 0b8ca8c, task #112. 2. FALSIFY-SHIP-022 (AC-SHIP2-012) — apr inspect provenance block (license + data_source + data_license) shipped. AprV2Metadata extended with 2 named Option<String> fields; no skip_serializing_if (FM-APR-PROV-SILENT-SKIP guard). Pure helper format_provenance_block replaces stdout-capture in tests (gag is NOT parallel-safe). New contract apr-provenance-v1.yaml (C-APR-PROVENANCE v1.0.0 ACTIVE, kind: schema). pv validate PASS. Commit 8f0607d, task #113. Combined status: 2/12 AC-SHIP2 gates DISCHARGED. Remaining 10 block on 370M compute-dispatch (the long-pole from v2.19.0). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(model-2): FALSIFY-SHIP-011 llama-370m sovereign contract ACTIVE (AC-SHIP2-001) Discharges FALSIFY-SHIP-011 / AC-SHIP2-001 — MODEL-2 370M architectural contract registered AND byte-equally bound to the Rust scaffold that aprender-train consumes. Contract lift: - contracts/model-families/llama-370m-sovereign-v1.yaml - version 1.0.0 → 1.1.0 - status PROPOSED → ACTIVE - GATE-ARCH-370M-001 gains evidence_discharged_by (4 entries) and ship_blocking: true - changelog block added documenting the v1.1.0 discharge Harness tests (crates/aprender-train/src/models/llama_370m.rs): - `falsify_ship_011_rust_scaffold_matches_yaml_contract` — loads the contract via include_str! (compile-time-embedded, no path deps at runtime) and asserts every architecture.* and constraints.* key matches the corresponding Llama370MConfig::* const byte-equally - `falsify_ship_011_sovereign_contract_is_active` — asserts status == ACTIVE (a PROPOSED contract cannot gate a ship) Test run: 6/6 aprender-train::models::llama_370m tests PASS (4 pre- existing + 2 new). pv validate on contract: 0 errors, 0 warnings. Why this discharge is strong: - Rust scaffold already encodes INV-ARCH-370M-002..008 as compile-time `const _: () = Llama370MConfig::validate();` — a drift of any value fails `cargo build`, not just `cargo test` - The new YAML-vs-Rust binding test adds the missing half: drift of a YAML key that the Rust scaffold doesn't mirror is now also caught at test time, preventing the MODEL-1-v2 QLoRA class of recipe/artifact drift (rank=16 actual vs rank=32 recipe — see project_ship_two_001_model1_qlora_divergence.md) - INV-ARCH-370M-001 (param count band) is discharged by the existing `estimated_param_count_within_contract_band` test - INV-ARCH-370M-009 (row-major layout) is discharged by aprender::format::layout_contract at APR load time Combined MODEL-2 status after this commit: 3/12 AC-SHIP2 gates DISCHARGED (001, 011, 012). Remaining 9 (002–010) still block on actual 370M training compute-dispatch — the pretrain loop driver from v2.19.0 is ready to exercise them once the weights exist. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(model-2): FALSIFY-SHIP-012 algorithm-level PARTIAL discharge (AC-SHIP2-002) Bumps C-TOK-BPE to v1.1.0 and wires evidence_discharged_by into GATE-BPE-003 pointing at 3 existing harness tests in crates/apr-cli/tests/falsify_ship_012_tokenizer_roundtrip.rs and the emitted evidence JSON at evidence/ship-two-001/model-2/falsify-ship-012-tokenizer-roundtrip.json. Status intentionally stays PROPOSED. The gate requires 10K-doc byte-exact round-trip on The Stack v2 Python holdout; task #91 shipped the ingest scaffold (corpus-ingest dry-run CLI) but the 10K fixture itself is not yet materialized — so this lands as PARTIAL_ALGORITHM_LEVEL discharge with full_discharge_blocks_on: task #91 data. What passes algorithm-level today (all 3 tests green at commit time): - falsify_ship_012_tokenizer_roundtrip_byte_exact — decode(encode(nfc(doc))) byte-equals nfc(doc) on every doc in a 20-doc synthetic Python-like holdout (ASCII keywords + Unicode identifiers + docstrings + emoji + combining marks). Hard-asserts evidence.docs_failed == 0 — regressions reintroducing whitespace splitting or dropping the byte encoder panic. - falsify_ship_012_nfc_idempotence_only — INV-BPE-005 standalone: nfc(nfc(x)) byte-equals nfc(x) on every holdout doc. - falsify_ship_012_train_corpus_sanity — train/holdout set disjointness plus minimum corpus sizes (>=20 docs each). When task #91's 10K Stack-v2 Python holdout lands the fixture swap is data-only: the harness module doc-comment already flagged this path so no test rewrite will be required. Evidence: evidence/ship-two-001/model-2/falsify-ship-012-tokenizer-roundtrip.json (20/20 passed, nfc_idempotent: true, vocab_size_trained: 489/512). Verification: - pv validate contracts/tokenizer-bpe-v1.yaml -> 0 errors, 0 warnings - cargo test -p apr-cli --test falsify_ship_012_tokenizer_roundtrip -> 3/3 passed Bound to: AC-SHIP2-002 (ship-two-models-spec §5). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(model-2): FALSIFY-SHIP-015 algorithm-level PARTIAL discharge (AC-SHIP2-005) Bumps C-LLAMA-370M-SOVEREIGN v1.1.0 → v1.2.0 and wires evidence_discharged_by into GATE-ARCH-370M-003 (the param-count gate that binds AC-SHIP2-005 via FALSIFY-SHIP-015). Contract stays ACTIVE — the FALSIFY-SHIP-011 discharge (v1.1.0) is what gates the ACTIVE promotion, not SHIP-015. GATE-ARCH-370M-003's evidence_required asks for apr inspect --json model.apr | jq '.param_count' ∈ [366M, 374M] on a real 370M `.apr` checkpoint. That file does not exist yet — it blocks on AC-SHIP2-003/004 pretraining compute-dispatch. Rather than leave the gate's evidence blank, this commit wires the algorithm-level proof that already exists: - estimated_param_count() / estimated_stored_param_count() — const fn over Llama370MConfig::*, so the count is computed at compile time. - estimated_param_count_within_contract_band (unit test) hard-asserts: * p ∈ [PARAMETERS_MIN=366M, PARAMETERS_MAX=374M] (INV-ARCH-370M-001) * |p − 370M| / 370M < 5% (tighter sanity) * p − stored == VOCAB_SIZE × HIDDEN_DIM (tied embeddings) Any edit to Llama370MConfig that moves the count out of the INV-ARCH-370M-001 band fails `cargo test -p aprender-train --lib llama_370m` — before any compute runs. The gate now carries: discharge_status: PARTIAL_ALGORITHM_LEVEL full_discharge_blocks_on: "real 370M .apr checkpoint from pretraining compute-dispatch (AC-SHIP2-003/004)" ship_blocking: true so the data-scale gap is first-class contract state, not an unspoken assumption. Verification: - pv validate contracts/model-families/llama-370m-sovereign-v1.yaml -> 0 errors, 0 warnings - cargo test -p aprender-train --lib models::llama_370m -> 6/6 passed (including the newly-cited estimated_param_count_within_contract_band and the pre-existing falsify_ship_011_* pair) MODEL-2 AC-SHIP2 ledger after this: 3/12 fully ACTIVE (001, 011, 012) + 2/12 PARTIAL (002 via SHIP-012, 005 via SHIP-015) = 5/12 touched. Remaining 7 (003/004/006/007/008/009/010) block on 370M compute. Bound to: AC-SHIP2-005 (ship-two-models-spec §5). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(ship-two-001): spec v2.21.0 — FALSIFY-SHIP-011 DISCHARGED + SHIP-012/015 PARTIAL Captures the three evidence-wiring commits landed on chore/post-v2.19-evidence since v2.20.0: 1. FALSIFY-SHIP-011 (AC-SHIP2-001) DISCHARGED at 338c6eb (task #114) C-LLAMA-370M-SOVEREIGN v1.0.0 PROPOSED -> v1.1.0 ACTIVE. Rust-YAML byte-equality binding via include_str! + serde_yaml::Value. 2. FALSIFY-SHIP-012 (AC-SHIP2-002) PARTIAL_ALGORITHM_LEVEL at 2e8b8b8 (task #115). C-TOK-BPE v1.0.0 -> v1.1.0 stays PROPOSED. 3 tokenizer harness tests wired; full discharge blocks on task #91 10K Stack-v2 Python holdout (fixture-swap is data-only). 3. FALSIFY-SHIP-015 (AC-SHIP2-005) PARTIAL_ALGORITHM_LEVEL at bfb8831 (task #116). Sovereign contract v1.1.0 -> v1.2.0 stays ACTIVE. estimated_param_count_within_contract_band + const fns wired; full discharge blocks on real 370M .apr from compute-dispatch. Also codifies the PARTIAL_ALGORITHM_LEVEL pattern as a first-class spec concept: when a gate's evidence_required describes a production-scale check that is not yet runnable but the underlying invariant is provable today at algorithm/compile/unit-test level, wire the algorithm proofs and carry discharge_status + partial_discharge_note + full_discharge_blocks_on + ship_blocking=true to make the data gap first-class contract state. MODEL-2 ship-gate status after v2.21.0: 3/12 fully ACTIVE (001, 011, 012) + 2/12 PARTIAL_ALGORITHM_LEVEL (002, 005) = 5/12 touched (~42%). Remaining 7 block on real 370M compute-dispatch. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(model-2): FALSIFY-SHIP-019 algorithm-level PARTIAL discharge (AC-SHIP2-009) GATE-ARCH-370M-004 gains evidence_discharged_by + discharge_status: PARTIAL_ALGORITHM_LEVEL. Three algorithm-level invariants wired without training: 1. Coverage — every 370M tensor (219 entries: 1 embed + 1 lm_head + 9 per-layer × 24 layers + 1 final norm) resolves to a TensorContract entry in LayoutContract::new(). Pattern-normalises per-layer names; any uncovered tensor would be silently skipped by GGUF export. 2. Row-major ordering (INV-ARCH-370M-009) — every 2D shape is [out_dim, in_dim]. Pinned lm_head/embed/q_proj/k_proj shapes verify GQA (k_proj = [kv_heads*head_dim, hidden]) and bind the 370M architecture to the GH-202-regression-proof layout. 3. Critical-tensor enforcement — validate_apr_shape accepts [vocab, hidden] AND rejects reversed [hidden, vocab] on lm_head.weight. Proves the validator catches layout bugs, not just passes silently. Full discharge (GGUF cosine-parity on trained 370M, max_logit_cosine ≤ 1e-3 over 100 canary prompts) blocks on compute-dispatch (AC-SHIP2-003/004). Harness is fixture-swap-ready once a trained .apr exists — no test rewrite needed. Spec §9 Risk #2 names this exact mitigation path. Contract: llama-370m-sovereign-v1.yaml v1.2.0 → v1.3.0, stays ACTIVE. Tests: 2 new test fns in crates/aprender-train/src/models/llama_370m.rs (8/8 pass). `pv validate` = 0 errors, 0 warnings. Closes #117. Binds to AC-SHIP2-009 / FALSIFY-SHIP-019. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(ship-two-001): v2.22.0 — FALSIFY-SHIP-019 PARTIAL discharge capstone Records the SHIP-019 algorithm-level PARTIAL discharge (task #117, commit 846cc1d) in the authoritative spec: - Version bump 2.21.0 → 2.22.0 - Full amendment block #4 under post-v2.19 evidence window documenting GATE-ARCH-370M-004 wired to `layout_contract.rs` algorithm proofs (219-tensor coverage + row-major ordering + GH-202 rejection) - New "counter-example hunting" pattern lesson: prior "exhausted PARTIAL levers" verdict was ~86% correct; re-running the 7-gate FALSIFY-SHIP survey with explicit counter-example hunting found exactly one genuine lever (SHIP-019). SHIP-017/018/020 need compute; SHIP-013/014/016 collapse into SHIP-011 wiring. - Combined MODEL-2 ledger: 3/12 fully ACTIVE + 3/12 PARTIAL = 6/12 touched (50%). Remaining 6 (003/004/006/007/008/010) all require real 370M compute, trained .apr + eval harness, or RTX 4090 wall-clock benchmark. Genuine algorithm-level PARTIAL harvesting for MODEL-2 is now exhausted. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * chore(publish): mark 5 QA harness crates publish = false + document policy Evidence: aprender-qa-{cli,gen,runner,report,certify} have never been published to crates.io (verified against crates.io API 2026-04-19). They are reached through `apr qa` (the user-facing binary), not through `cargo add`, so marking them publish = false prevents accidental version-bump-with-no-publish drift across the workspace. Spec §A.12 rewritten from the stale "63 crates (49 published + 14 internal)" snapshot to the real 80-crate layout: 9 publish = false (4 benchmarks/xtask + 5 QA harness) plus 71 publishable. §A.12.1 codifies publishing policy: three opt-out categories (benchmarks, xtask, QA harness), and the rule that a v0.31.0-style release does NOT require cargo publish across all 80 crates — crates.io publish is selective (via cargo workspaces publish --from-git or cargo publish -p <name>), workspace-wide tag/release is not. Verified: cargo check --workspace clean after the flip. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * docs(mcp-spec): refresh header — M1–M3 SHIPPED in v0.31.0, M4 in flight Five-whys on the stale 2026-04-17 draft status: 1. Why stale? Spec said "DRAFT (pre-implementation)" + target "v0.32.0" but M1–M3 actually shipped in v0.31.0 on 2026-04-19 (tag 62893da). 2. Why not refreshed? M1–M3 landed across multiple PRs without a spec-header refresh pass. 3. Why is that a problem? New contributors reading the spec think MCP is unshipped — contradicted by `cargo install aprender` already exposing `apr mcp` with 9 tools. 4. Root cause: spec headers are not on the release checklist. 5. Fix here: update status to ACTIVE, version to 1.2.0, delivery line to "v0.31.0 M1–M3 SHIPPED / M4 in flight (PRs #886-892)". No body changes — architecture/tool-surface/protocol sections are still accurate. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * chore(publish): mark aprender-viz-ttop publish = false + 4th category Evidence: `aprender-viz-ttop` has never been published to crates.io (release workflow explicitly never invokes `cargo publish` for it). Its `description` field calls it a "Terminal Top: 10X better than btop" system monitor — ships as a binary subcommand inside the `apr` facade, not as a library dependency. Five-whys: 1. Why flip it? Because it's a bundled binary, not a library. 2. Why does that matter? `cargo add aprender-viz-ttop` would mislead library authors into taking a user-facing TUI as a dep. 3. Why wasn't it already flipped? It predated the A.12 policy audit performed in 42907db. 4. Why a 4th category? Benchmarks / xtask / QA harness all leave outputs as artifacts; this one ships a runnable subcommand. The distinction matters because `apr cbtop` dispatches to it. 5. Why document it? To prevent a future reader from re-opening the "publish all 80 crates" question when we only publish ~70. Changes: - crates/aprender-viz-ttop/Cargo.toml: add `publish = false` - docs/specifications/aprender-monorepo-consolidation.md: - §A.12: add viz-ttop to internal-crates table (10 rows) - §A.12.1: add 4th category (Bundled binaries); update total to "10 opted out / 70 publishable"; remove stale "Candidates to migrate" paragraph (superseded by 42907db + this commit) Refs: APR-MONO, PR #901 Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * feat(task-123): native Rust pretokenize CLI — close MODEL-2 corpus gap Root-cause fix for pretokenize-to-.bin gap that was blocking task #119 MODEL-2 370M real-compute pretrain smoke. User 2026-04-19 callout "why not fix root cause vs 'hack'" rejected the Python shim path. What ships (uncommitted WIP in `pretrain.rs`/`llama_370m.rs` left out): - `contracts/pretokenize-bin-v1.yaml` v1.0.0 PROPOSED * `pv validate` PASS (0 errors / 0 warnings) * GATE-PRETOK-003 ship-blocking round-trip gate gains `evidence_discharged_by` (4 tests) + `discharge_status: PARTIAL_ALGORITHM_LEVEL`. Full discharge still blocks on cross-host byte-identical test (task #119 lambda-labs dispatch). - `BPETokenizer::from_vocab_merges(vocab, merges, cfg)` loader (crates/aprender-train/src/tokenizer/bpe.rs) * Reads HEX-encoded vocab.json + merges.txt * Detects id collisions, rejects orphan merges * 2 new round-trip tests PASS - `apr tokenize encode-corpus` CLI subcommand (crates/apr-cli/src/commands/tokenize.rs::run_encode_corpus, crates/apr-cli/src/tokenize_commands.rs, crates/apr-cli/src/dispatch_analysis.rs) * Gated `#[cfg(feature = "training")]` * Writes `shard-NNNNN.bin` (u32 LE) + `manifest.json` (schema `pretokenize-bin-v1`) * Flags: --corpus --tokenizer --output --shard-tokens --content-field --normalization --eos-policy * EOS lookup order: `</s>`, `<|endoftext|>`, `<eos>`, `<|eos|>` * "between" policy fix: emit EOS BEFORE each doc except the first (N-1 separators for N docs) - `tests/pretokenize_shard_roundtrip.rs` * `cli_shard_layout_is_read_by_shard_batch_iter` — INV-PRETOK-002 + INV-PRETOK-007 * `multi_shard_names_preserve_order` — INV-PRETOK-004 - `evidence/ship-two-001/pretokenize-bin-v1-partial-discharge.json` documents algorithm-level partial discharge. Manual dogfood: 5-doc fixture → 78 tokens / 1 shard / 312 bytes / 4 EOS separators (N-1 for between-policy) / EOS id = 2 (`</s>`). Next session: wait on task #118 (50257-vocab tokenizer training, PID 2832743, 79min+) then run `apr tokenize encode-corpus` on CSN-Python train split and dispatch to lambda-labs RTX 4090. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Bumps criterion from 0.5.1 to 0.8.1.
Release notes
Sourced from criterion's releases.
Changelog
Sourced from criterion's changelog.
... (truncated)
Commits
e4e06dfchore: release v0.8.1aa548b9fix: Homepage link950c3b7fix: Typo7e3e50cchore(deps): bump crate-ci/typos from 1.23.5 to 1.40.0391a99achore(deps): bump jontze/action-mdbook from 3 to 48fb9a87chore(deps): bump actions/checkout from 4 to 6b49ade7chore: release v0.8.0c56485fdocs: Mark Master API Docs links that need to be updated86526a4docs: Remove Master API Docs link temporarily00a443fdocs: Update README linksYou can trigger a rebase of this PR by commenting
@dependabot rebase.Dependabot commands and options
You can trigger Dependabot actions by commenting on this PR:
@dependabot rebasewill rebase this PR@dependabot recreatewill recreate this PR, overwriting any edits that have been made to it@dependabot mergewill merge this PR after your CI passes on it@dependabot squash and mergewill squash and merge this PR after your CI passes on it@dependabot cancel mergewill cancel a previously requested merge and block automerging@dependabot reopenwill reopen this PR if it is closed@dependabot closewill close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually@dependabot show <dependency name> ignore conditionswill show all of the ignore conditions of the specified dependency@dependabot ignore this major versionwill close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)@dependabot ignore this minor versionwill close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)@dependabot ignore this dependencywill close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)