feat(optim): Add Damped Newton optimizer with finite-difference Hessian by noahgift · Pull Request #59 · paiml/aprender

noahgift · 2025-11-23T15:46:13Z

Implement Newton's method with a finite-difference Hessian approximation, Cholesky decomposition for solving the linear system, and automatic fallback to steepest descent when the Hessian is not positive definite.

- DampedNewton optimizer with Hessian approximation via finite differences
- Cholesky decomposition to solve H * d = -g linear system
- Automatic fallback to steepest descent on non-PD Hessian
- Descent direction checking (grad^T d < 0)
- Backtracking line search for global convergence
- Configurable finite difference epsilon (default: 1e-5)
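
The direction and step-size logic listed above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the function names (`cholesky`, `cholesky_solve`, `search_direction`, `backtracking`), the dense `Vec<Vec<f64>>` matrix layout, and the line-search constants (`1e-4`, `0.5`, 50 trials) are all assumptions made for the example.

```rust
// Illustrative sketch only: names and signatures are hypothetical, not aprender's API.

/// Cholesky factorization H = L * L^T for a dense symmetric matrix.
/// Returns None when a pivot is not strictly positive (H is not positive definite).
fn cholesky(h: &[Vec<f64>]) -> Option<Vec<Vec<f64>>> {
    let n = h.len();
    let mut l = vec![vec![0.0; n]; n];
    for i in 0..n {
        for j in 0..=i {
            let mut sum = h[i][j];
            for k in 0..j {
                sum -= l[i][k] * l[j][k];
            }
            if i == j {
                if sum <= 0.0 || !sum.is_finite() {
                    return None;
                }
                l[i][j] = sum.sqrt();
            } else {
                l[i][j] = sum / l[j][j];
            }
        }
    }
    Some(l)
}

/// Solve L * L^T * x = b by forward, then backward substitution.
fn cholesky_solve(l: &[Vec<f64>], b: &[f64]) -> Vec<f64> {
    let n = b.len();
    let mut y = vec![0.0; n];
    for i in 0..n {
        let mut s = b[i];
        for k in 0..i {
            s -= l[i][k] * y[k];
        }
        y[i] = s / l[i][i];
    }
    let mut x = vec![0.0; n];
    for i in (0..n).rev() {
        let mut s = y[i];
        for k in (i + 1)..n {
            s -= l[k][i] * x[k];
        }
        x[i] = s / l[i][i];
    }
    x
}

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Newton direction d solving H d = -g, or -g (steepest descent) when
/// H is not positive definite or d fails the descent test g^T d < 0.
fn search_direction(h: &[Vec<f64>], g: &[f64]) -> Vec<f64> {
    let neg_g: Vec<f64> = g.iter().map(|v| -v).collect();
    if let Some(l) = cholesky(h) {
        let d = cholesky_solve(&l, &neg_g);
        if dot(g, &d) < 0.0 {
            return d;
        }
    }
    neg_g
}

/// Backtracking (Armijo) line search: shrink the step until
/// f(x + a d) <= f(x) + c * a * g^T d. The constants are assumed values.
fn backtracking<F: Fn(&[f64]) -> f64>(f: F, x: &[f64], g: &[f64], d: &[f64]) -> f64 {
    let c = 1e-4_f64;
    let shrink = 0.5_f64;
    let mut alpha = 1.0_f64;
    let (fx, slope) = (f(x), dot(g, d));
    for _ in 0..50 {
        let trial: Vec<f64> = x.iter().zip(d).map(|(xi, di)| xi + alpha * di).collect();
        if f(&trial) <= fx + c * alpha * slope {
            break;
        }
        alpha *= shrink;
    }
    alpha
}
```

In this sketch a failed Cholesky factorization doubles as the positive-definiteness test, so no separate eigenvalue check is needed before falling back to the negative gradient.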

Key Features:

- Fast convergence on convex quadratic problems: with an exact Hessian, a full Newton step minimizes a strictly convex quadratic in one iteration
- Graceful degradation: falls back to gradient descent when needed
- Hessian symmetrization for numerical stability
- 14 comprehensive tests covering all edge cases
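
For illustration, the Hessian approximation and symmetrization mentioned above might look like the sketch below. It assumes forward differences of an analytic gradient; the PR may instead difference function values or use a central scheme. The name `fd_hessian` and its signature are hypothetical. Only the `eps` default of 1e-5 comes from the description.

```rust
// Illustrative sketch only: `fd_hessian` and its signature are hypothetical.

/// Approximate the Hessian by forward differences of the gradient:
/// column j is (grad(x + eps * e_j) - grad(x)) / eps.
/// The result is then symmetrized as (H + H^T) / 2.
fn fd_hessian<G: Fn(&[f64]) -> Vec<f64>>(grad: G, x: &[f64], eps: f64) -> Vec<Vec<f64>> {
    let n = x.len();
    let g0 = grad(x);
    let mut h = vec![vec![0.0; n]; n];
    let mut xp = x.to_vec();
    for j in 0..n {
        // Perturb one coordinate, evaluate the gradient, then restore it.
        xp[j] = x[j] + eps;
        let gj = grad(&xp);
        xp[j] = x[j];
        for i in 0..n {
            h[i][j] = (gj[i] - g0[i]) / eps;
        }
    }
    // Average each off-diagonal pair so the matrix is exactly symmetric.
    for i in 0..n {
        for j in (i + 1)..n {
            let avg = 0.5 * (h[i][j] + h[j][i]);
            h[i][j] = avg;
            h[j][i] = avg;
        }
    }
    h
}
```

Symmetrizing matters because finite-difference rounding makes `h[i][j]` and `h[j][i]` differ slightly, while a Cholesky factorization that reads only one triangle silently assumes they are equal.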

All 1097 tests pass (+14 new). Phase 1 batch optimizers complete: L-BFGS (12 tests), Conjugate Gradient (18 tests), Damped Newton (14 tests).

- apr train sweep: grid/random hyperparameter sweep config generation (#59)
- apr train archive: checkpoint release bundle with BLAKE3 manifest (#85)
- apr eval --task correlation: PPL-benchmark Pearson/Spearman analysis (#66)
- apr eval --task human: human evaluation pipeline (generate + analyze) (#68)
- apr encrypt/decrypt: BLAKE3-based model weight encryption at rest (#89)
- apr train plan: comprehensive resource estimation (RAM, disk, time) (#95)

All features pure Rust, sovereign stack compliant.

Tested on:
- sweep: 5 random configs from 350M base config
- archive: 50M checkpoint → 238 MB bundle with MANIFEST.json
- encrypt/decrypt: 238 MB roundtrip verified (MAC authenticated)
- correlation: 236 data points from multi-checkpoint loss histories
- human eval: generate 10-prompt sheet + analyze 5-rating test set
- resource est: extended VRAM/RAM/disk/tokens/step-time/throughput

Refs #118

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…IP-001/003/004/009/010 PARTIAL→LIVE-DISCHARGED (PMAT-CODE-SHIP-TWO-SECTION-72) (#1646) Closes 5 of the 6 algorithm-level PARTIALs left after §71 closed SHIP-005. Only SHIP-007 (multi-PR CUDA cascade per §63) remains as a PARTIAL. The cascade is EVIDENCE-ONLY — no code changes. Five ACs already had falsifier tests at PARTIAL_ALGORITHM_LEVEL (`#[test]`s merged); they just lacked LIVE-evidence runs on the canonical 7B Qwen2.5-Coder- Instruct teacher. Evidence captured (lambda-vector, RTX 4090, post-§71 main binary): SHIP-001 apr run <safetensors> --prompt 'Hello' --max-tokens 4 → exit 0, 62.55s load via realizar SHIP-003 apr diff <safetensors> <q4k.apr> --values --filter weight --limit 20 --transpose-aware → 20 tensors at cos_sim=1.000000 (floor 0.999) SHIP-004 llama-cli -m <q4k.gguf> -p 'Hello' -n 8 -ngl 99 -st → exit 0, "Hello! How can I help you today", 133.1 gen tok/s, model 5580 MiB on RTX 4090 SHIP-009 apr inspect <q4k.apr> → license: Apache-2.0, data_source: huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct SHIP-010 curl HF tree API + sha256sum on gx10 canonical teacher → 0a854098… == HF lfs.oid 0a854098…, 8035635524 bytes §17.5 + AC-SHIP1 chain post-§72: SHIP-001 LIVE-DISCHARGED ← §72 SHIP-002 LIVE-DISCHARGED (#1609 §61) SHIP-003 LIVE-DISCHARGED ← §72 SHIP-004 LIVE-DISCHARGED ← §72 SHIP-005 LIVE-DISCHARGED (§71) SHIP-006 LIVE-DISCHARGED (#1615 §61.8) SHIP-007 PARTIAL — multi-PR CUDA cascade (§63) SHIP-008 LIVE-DISCHARGED (#1614 §61) SHIP-009 LIVE-DISCHARGED ← §72 SHIP-010 LIVE-DISCHARGED ← §72 9 of 10 AC-SHIP1-* LIVE-discharged. 
Ship-% movement: MODEL-1 ship %: 95% → 99% (5 algorithm-level PARTIALs → LIVE) Path to 100% = SHIP-007 multi-PR CUDA cascade per §63: Layer 1: cuBLASLt FP8 JIT warmup ILLEGAL_ADDRESS root fix Layer 2: CUDA-vs-CPU parity (cosine -0.005 on Qwen 7B dims) Layer 3: throughput 5.6 → 30 tok/s Host: RTX 4090 / lambda-vector (gx10 is wrong arch) MODEL-2 ship %: unchanged at 57% Methodology lesson #19 NEW: algorithm-level falsifiers + small evidence runs collapse PARTIAL→LIVE in batches. When ACs are PARTIAL because of missing live evidence (not missing algorithm), batch-discharge in one cascade rather than treating each as separate ship-row work. The 95→99% jump is the highest-ROI move because the algorithms are already merged. Spec v3.17.0 → v3.18.0. Evidence: - evidence/section-72-ship-live-cascade-2026-05-12/findings.json - ship-001-apr-run-safetensors.txt (exit 0 + 62.55s load) - ship-003-apr-diff-q4k-roundtrip.txt (20 tensors at cos_sim=1.000000) - ship-004-llama-cli-stdout.txt (llama.cpp first-response on canonical GGUF) - ship-009-apr-inspect.txt (license + provenance fields) - ship-010-sha256-match.json + ship-010-hf-tree.json (sha256 match) Refs: - AC-SHIP1-001 through AC-SHIP1-010 (spec §5) - §71 (SHIP-005 LIVE-DISCHARGED, predecessor) - §63 (SHIP-007 multi-PR cascade scope) - contracts/eval-harness-humaneval-v1.yaml + contracts/apr-publish-hf-large-file-v1.yaml + contracts/apr-provenance-v1.yaml (PARTIAL_ALGORITHM_LEVEL → LIVE-DISCHARGED) Closes tasks #59-63. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>

noahgift merged commit 5913413 into main Nov 23, 2025
5 of 11 checks passed

noahgift deleted the claude/research-optimization-techniques-01LWS5ZwqVEHQ13NbShwH7Ls branch November 23, 2025 15:46

noahgift mentioned this pull request Apr 18, 2026

feat(ship-two-001): SPEC v2.19.0 — teacher shipped + MODEL-2 scaffold + pre-upload gates #882

Merged

6 tasks


