
Add ML Fundamentals Section to EXTREME TDD Book - Theory Through Verification #7

@noahgift


Problem Statement

The current EXTREME TDD book (https://paiml.github.io/aprender/) has useful testing-methodology content (65% of the total) but very little machine learning theory (only 10%). Although the book is titled "Machine Learning", it lacks ML fundamentals.

Current Content Breakdown (3,842 total lines):

  • Testing Methodology: 2,498 lines (65%)
  • ML Content: 384 lines (10%)
  • Placeholders: 960 lines (25%)

Proposed Solution

Add a comprehensive "Machine Learning Fundamentals" section that follows a Theory Through Verification approach:

  1. 15-20 new ML theory chapters (3,000-4,000 lines)
  2. TDD Harness Enforcement (ruchy-book pattern) - ALL examples must compile and pass tests
  3. Property Tests - Every mathematical equation must be verified in code
  4. One-Piece Flow - Write theory + case study simultaneously (Toyota Way)

Target Content Balance:

  • Testing Methodology: 40%
  • ML Theory: 40%
  • Examples: 20%

Technical Approach

TDD Harness (Critical - Poka-Yoke)

All book examples will be validated through the tests/book/ directory structure:

```rust
// tests/book/ml_fundamentals/linear_regression_theory.rs
#[test]
fn test_linear_regression_closed_form() {
    // Property test verifying OLS closed form solution
    // This test is referenced in book chapter
}
```
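A minimal sketch of what the body of such a test might check, written with std only so it assumes nothing about aprender's API. `fit_simple_ols` is a hypothetical helper implementing the textbook one-feature closed form, slope = Σ(xᵢ - x̄)(yᵢ - ȳ) / Σ(xᵢ - x̄)² and intercept = ȳ - slope·x̄, whose residuals must satisfy the normal equations Σrᵢ = 0 and Σxᵢrᵢ = 0. The small deterministic generator is an assumed stand-in for a property-testing crate such as proptest.

```rust
/// Hypothetical helper: closed-form OLS for y = b0 + b1 * x (one feature).
/// Returns (intercept, slope).
fn fit_simple_ols(x: &[f64], y: &[f64]) -> (f64, f64) {
    let n = x.len() as f64;
    let x_mean = x.iter().sum::<f64>() / n;
    let y_mean = y.iter().sum::<f64>() / n;
    // slope = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)²
    let mut sxy = 0.0;
    let mut sxx = 0.0;
    for (xi, yi) in x.iter().zip(y) {
        sxy += (xi - x_mean) * (yi - y_mean);
        sxx += (xi - x_mean) * (xi - x_mean);
    }
    let slope = sxy / sxx;
    (y_mean - slope * x_mean, slope)
}

/// Tiny deterministic LCG so the property runs over many inputs
/// without external crates (stand-in for proptest).
struct Lcg(u64);

impl Lcg {
    /// Next pseudo-random value in [-10, 10).
    fn next_f64(&mut self) -> f64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        // Top 53 bits -> [0, 1), then scale and shift to [-10, 10)
        ((self.0 >> 11) as f64 / (1u64 << 53) as f64) * 20.0 - 10.0
    }
}

/// Property: fitted residuals satisfy the normal equations,
/// Σ r_i = 0 and Σ x_i * r_i = 0, on every generated dataset.
fn normal_equations_hold(seed: u64, cases: usize) -> bool {
    let mut rng = Lcg(seed);
    (0..cases).all(|_| {
        let x: Vec<f64> = (0..20).map(|_| rng.next_f64()).collect();
        let y: Vec<f64> = (0..20).map(|_| rng.next_f64()).collect();
        let (b0, b1) = fit_simple_ols(&x, &y);
        let r: Vec<f64> = x.iter().zip(&y).map(|(xi, yi)| yi - (b0 + b1 * xi)).collect();
        let sum_r: f64 = r.iter().sum();
        let sum_xr: f64 = x.iter().zip(&r).map(|(xi, ri)| xi * ri).sum();
        sum_r.abs() < 1e-9 && sum_xr.abs() < 1e-9
    })
}

#[test]
fn test_linear_regression_closed_form() {
    // Exact recovery on noise-free data generated from y = 3 - 2x
    let x = [0.0, 1.0, 2.0, 3.0, 4.0];
    let y: Vec<f64> = x.iter().map(|xi| 3.0 - 2.0 * xi).collect();
    let (b0, b1) = fit_simple_ols(&x, &y);
    assert!((b0 - 3.0).abs() < 1e-12 && (b1 + 2.0).abs() < 1e-12);
    // Normal equations over 100 pseudo-random datasets
    assert!(normal_equations_hold(42, 100));
}
```

Checking the normal equations rather than hardcoded coefficients lets one assertion run on arbitrary inputs, which is what turns a single example into a property.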

Each chapter will include a doc status block:

```markdown

Chapter Status: ✅ 100% Working (3/3 examples)

| Status | Count | Examples |
|--------|-------|----------|
| ✅ Working | 3 | Linear regression tests passing |
| ⚠️ Not Implemented | 0 | - |
| ❌ Broken | 0 | - |

Last tested: 2025-11-19

Aprender version: 0.3.0

Test file: tests/book/ml_fundamentals/linear_regression_theory.rs

```
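The summary line can be derived mechanically from the counts in the table, so it cannot drift from the test results. A sketch, assuming a hypothetical `status_line` helper (not an existing aprender or ruchy-book function); the choice of ❌ when anything is broken and ⚠️ for other partial coverage is also an assumption.

```rust
/// Hypothetical renderer for the "Chapter Status" summary line.
/// The three arguments are the counts from the status table.
fn status_line(working: u32, not_implemented: u32, broken: u32) -> String {
    let total = working + not_implemented + broken;
    // Integer percentage, rounded down; an empty chapter reports 0%.
    let pct = if total == 0 { 0 } else { working * 100 / total };
    // Assumed policy: ✅ only if every example works, ❌ if any is broken.
    let icon = if total > 0 && working == total {
        "✅"
    } else if broken > 0 {
        "❌"
    } else {
        "⚠️"
    };
    format!("Chapter Status: {icon} {pct}% Working ({working}/{total} examples)")
}
```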

CI will fail if any book example does not compile or its test fails (Jidoka - built-in quality).

Phase 1: Foundation (BOOK-001)

  • Create book/src/ml-fundamentals/ directory structure
  • Implement TDD harness (tests/book/ structure) ← CRITICAL
  • Update SUMMARY.md with new section
  • Set up CI validation for book examples
  • Create chapter template with verification focus
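The tests/book/ harness can rely on a Cargo convention: a directory under tests/ that contains a main.rs is compiled as one integration-test binary named after the directory, and the other files beneath it are ordinary modules. A sketch of that layout, assuming the file names in the comment (only linear_regression_theory.rs appears in this plan). The module tree is written inline so the block compiles on its own, and `mean` is a hypothetical stand-in for the aprender calls a chapter would test.

```rust
// Assumed layout for the harness:
//
//   tests/book/main.rs                                    <- test crate root
//   tests/book/ml_fundamentals/mod.rs
//   tests/book/ml_fundamentals/linear_regression_theory.rs
//
// Cargo builds tests/book/main.rs as a single integration-test binary
// named `book`, so `cargo test --test book` runs every chapter's tests.
// In the real layout each inline `mod name { ... }` below would be
// `mod name;` in main.rs and `pub mod name;` in ml_fundamentals/mod.rs.

mod ml_fundamentals {
    pub mod linear_regression_theory {
        /// Stand-in for library code under test (hypothetical).
        pub fn mean(xs: &[f64]) -> f64 {
            xs.iter().sum::<f64>() / xs.len() as f64
        }

        #[test]
        fn mean_of_constant_slice_is_that_constant() {
            assert_eq!(mean(&[2.0, 2.0, 2.0]), 2.0);
        }
    }
}
```

Keeping every chapter in one test binary means a single CI step (`cargo test --test book`) covers the whole book.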

Phase 2: Core Theory + Case Studies (One-Piece Flow)

Priority 1 Pairs (write theory + case study together):

  1. Linear Regression Theory + Case Study
  2. Regularization Theory + Case Study: Regularized Regression
  3. Regression Metrics Theory + Case Study: Boston Housing
  4. Logistic Regression Theory + Case Study: Logistic Regression
  5. Classification Metrics Theory + Case Study: Decision Tree Iris
  6. Cross-Validation Theory + Case Study: Cross-Validation
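As an illustration of pair 3, a regression-metric equation can be pinned down by its boundary cases. A minimal sketch, assuming a hypothetical `r_squared` helper rather than aprender's actual metrics API: from R² = 1 - SS_res / SS_tot, perfect predictions give SS_res = 0 and hence R² = 1, while always predicting the mean gives SS_res = SS_tot and hence R² = 0.

```rust
/// Coefficient of determination: R² = 1 - SS_res / SS_tot.
/// Hypothetical helper; assumes y_true is not constant (SS_tot > 0).
fn r_squared(y_true: &[f64], y_pred: &[f64]) -> f64 {
    let n = y_true.len() as f64;
    let mean = y_true.iter().sum::<f64>() / n;
    // Residual sum of squares
    let ss_res: f64 = y_true.iter().zip(y_pred).map(|(t, p)| (t - p).powi(2)).sum();
    // Total sum of squares around the mean
    let ss_tot: f64 = y_true.iter().map(|t| (t - mean).powi(2)).sum();
    1.0 - ss_res / ss_tot
}

#[test]
fn r_squared_boundary_cases() {
    let y = [3.0, -0.5, 2.0, 7.0];
    // Perfect predictions: SS_res = 0, so R² = 1.
    assert_eq!(r_squared(&y, &y), 1.0);
    // Predicting the mean everywhere: SS_res = SS_tot, so R² = 0.
    let mean = y.iter().sum::<f64>() / y.len() as f64;
    assert!(r_squared(&y, &[mean; 4]).abs() < 1e-12);
}
```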

Per Pair Process:

  1. Write theory chapter with verification focus
  2. Write case study simultaneously
  3. Extract code examples from BOTH
  4. Create test files for both
  5. Add Property Tests that prove the math
  6. Validate all examples work
  7. Add doc status blocks
  8. CI validation
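For pair 6, step 5 could start from the structural property behind k-fold cross-validation (Kohavi, 1995): every sample is held out exactly once, and fold sizes differ by at most one. A sketch, assuming a hypothetical unshuffled splitter `kfold_ranges` (not aprender's API); a shuffling splitter would satisfy the same property on its permuted indices.

```rust
use std::ops::Range;

/// Hypothetical k-fold splitter (no shuffling): returns, for each fold,
/// the contiguous range of sample indices held out for validation.
/// The first `n % k` folds receive one extra sample.
fn kfold_ranges(n: usize, k: usize) -> Vec<Range<usize>> {
    assert!(k >= 2 && k <= n, "need 2 <= k <= n");
    let (base, extra) = (n / k, n % k);
    let mut start = 0;
    (0..k)
        .map(|i| {
            let len = base + usize::from(i < extra);
            let fold = start..start + len;
            start += len;
            fold
        })
        .collect()
}

/// Property: validation folds cover 0..n with each index exactly once,
/// and the largest and smallest folds differ by at most one sample.
fn is_balanced_partition(n: usize, k: usize) -> bool {
    let folds = kfold_ranges(n, k);
    let mut seen = vec![0u32; n];
    for fold in &folds {
        for i in fold.clone() {
            seen[i] += 1;
        }
    }
    let sizes: Vec<usize> = folds.iter().map(|f| f.len()).collect();
    let min = sizes.iter().min().unwrap();
    let max = sizes.iter().max().unwrap();
    seen.iter().all(|&c| c == 1) && max - min <= 1
}

#[test]
fn kfold_partitions_every_index_once() {
    for n in 2..60 {
        for k in 2..=n.min(10) {
            assert!(is_balanced_partition(n, k), "n={n}, k={k}");
        }
    }
}
```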

Phase 3: Integration and Quality Review

  • Full book test pass (100% examples working)
  • Update overall doc status dashboard
  • Verify all 10 peer-reviewed citations are integrated
  • Book deployment validation
  • Final quality review

Success Criteria

  • ✅ TDD harness keeps hallucinated code out of the book (CI fails on broken examples)
  • ✅ Every theoretical equation has a Property Test proving it
  • ✅ 40% ML theory, 40% Testing, 20% Examples
  • ✅ 100% of book examples compile and pass tests
  • ✅ All 10 peer-reviewed citations integrated
  • ✅ Book live at https://paiml.github.io/aprender/

References

  • Specification: docs/specifications/initial-book-spec.md (17,000+ words)
  • TDD Harness Pattern: /home/noah/src/ruchy-book (reference implementation)
  • Current Book: https://paiml.github.io/aprender/
  • 10 Peer-Reviewed Citations: Parnas (2011), Sculley (2015), Tibshirani (1996), Zou & Hastie (2005), Cox (1958), Breiman (2001), Arthur & Vassilvitskii (2007), Kingma & Ba (2014), Kohavi (1995), Powers (2011)

Toyota Way Principles Applied

  • Jidoka (Built-in Quality): TDD harness prevents defects from propagating
  • Poka-Yoke (Error Proofing): CI fails if examples don't compile
  • One-Piece Flow: Theory + case study written together (no batch waste)
  • Kaizen (Continuous Improvement): Property Tests verify mathematical correctness
