feat(optim): Add Damped Newton optimizer with finite-difference Hessian #59
Merged
noahgift merged 1 commit on Nov 23, 2025
Implement Newton's method with a finite-difference Hessian approximation, Cholesky decomposition to solve the Newton linear system, and automatic fallback to steepest descent when the Hessian is not positive definite.

- DampedNewton optimizer with Hessian approximation via finite differences
- Cholesky decomposition to solve the linear system H * d = -g
- Automatic fallback to steepest descent on a non-PD Hessian
- Descent-direction check (grad^T d < 0)
- Backtracking line search for global convergence
- Configurable finite-difference epsilon (default: 1e-5)

Key Features:
- Quadratic convergence on convex quadratic problems
- Graceful degradation: falls back to gradient descent when needed
- Hessian symmetrization for numerical stability
- 14 tests covering these edge cases

All 1097 tests pass (+14 new). Phase 1 batch optimizers complete: L-BFGS (12 tests), Conjugate Gradient (18 tests), Damped Newton (14 tests).
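The steps listed above can be sketched in plain, std-only Rust. This is an illustration under stated assumptions, not the PR's DampedNewton implementation: the function names (`fd_hessian`, `cholesky_solve`, `damped_newton`), the use of central differences of a user-supplied gradient, the Armijo constant 1e-4 and the step-halving backtracking factor are all choices made for this example.

```rust
// Hypothetical sketch of a damped Newton optimizer; names are illustrative, not the PR's API.

/// Dot product of two equal-length vectors.
fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// Central-difference Hessian built from a gradient oracle (assumed scheme),
/// then symmetrized as H <- (H + H^T) / 2 to remove finite-difference asymmetry.
fn fd_hessian<G: Fn(&[f64]) -> Vec<f64>>(grad: &G, x: &[f64], eps: f64) -> Vec<Vec<f64>> {
    let n = x.len();
    let mut h = vec![vec![0.0; n]; n];
    let mut xp = x.to_vec();
    for j in 0..n {
        xp[j] = x[j] + eps;
        let gp = grad(&xp);
        xp[j] = x[j] - eps;
        let gm = grad(&xp);
        xp[j] = x[j];
        for i in 0..n {
            h[i][j] = (gp[i] - gm[i]) / (2.0 * eps);
        }
    }
    for i in 0..n {
        for j in (i + 1)..n {
            let avg = 0.5 * (h[i][j] + h[j][i]);
            h[i][j] = avg;
            h[j][i] = avg;
        }
    }
    h
}

/// Solve H d = b via Cholesky (H = L L^T). Returns None if H is not positive definite.
fn cholesky_solve(h: &[Vec<f64>], b: &[f64]) -> Option<Vec<f64>> {
    let n = b.len();
    let mut l = vec![vec![0.0; n]; n];
    for i in 0..n {
        for j in 0..=i {
            let mut sum = h[i][j];
            for k in 0..j {
                sum -= l[i][k] * l[j][k];
            }
            if i == j {
                if sum.is_nan() || sum <= 0.0 {
                    return None; // non-positive pivot: H is not PD
                }
                l[i][i] = sum.sqrt();
            } else {
                l[i][j] = sum / l[j][j];
            }
        }
    }
    let mut y = vec![0.0; n]; // forward substitution: L y = b
    for i in 0..n {
        let s: f64 = b[i] - (0..i).map(|k| l[i][k] * y[k]).sum::<f64>();
        y[i] = s / l[i][i];
    }
    let mut d = vec![0.0; n]; // back substitution: L^T d = y
    for i in (0..n).rev() {
        let s: f64 = y[i] - ((i + 1)..n).map(|k| l[k][i] * d[k]).sum::<f64>();
        d[i] = s / l[i][i];
    }
    Some(d)
}

/// Newton direction when H is PD and gives descent, else steepest descent;
/// the step length comes from Armijo backtracking.
fn damped_newton<F, G>(f: &F, grad: &G, x0: &[f64], eps: f64, tol: f64, max_iter: usize) -> Vec<f64>
where
    F: Fn(&[f64]) -> f64,
    G: Fn(&[f64]) -> Vec<f64>,
{
    const C1: f64 = 1e-4; // Armijo sufficient-decrease constant (assumed value)
    let mut x = x0.to_vec();
    for _ in 0..max_iter {
        let g = grad(&x);
        if dot(&g, &g).sqrt() < tol {
            break;
        }
        let neg_g: Vec<f64> = g.iter().map(|v| -v).collect();
        let h = fd_hessian(grad, &x, eps);
        let d = match cholesky_solve(&h, &neg_g) {
            Some(d) if dot(&g, &d) < 0.0 => d, // Newton direction, descent verified
            _ => neg_g,                          // fallback: steepest descent
        };
        let fx = f(&x);
        let slope = dot(&g, &d); // negative for either branch when g != 0
        let step = |a: f64| -> Vec<f64> { x.iter().zip(&d).map(|(xi, di)| xi + a * di).collect() };
        let mut alpha = 1.0;
        let mut x_new = step(alpha);
        while f(&x_new) > fx + C1 * alpha * slope && alpha > 1e-12 {
            alpha *= 0.5; // halve the step until sufficient decrease holds
            x_new = step(alpha);
        }
        x = x_new;
    }
    x
}
```

For an exactly positive definite Hessian the Newton direction is always a descent direction, so the `grad^T d < 0` check mainly guards against rounding in a nearly singular factorization. Starting a double-well function such as x^4 - x^2 + y^2 at x = 0.1 exercises the fallback: the Hessian is indefinite there, so the first steps are steepest descent until the iterate reaches a positive-curvature region.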
noahgift added a commit that referenced this pull request on Mar 3, 2026
- apr train sweep: grid/random hyperparameter sweep config generation (#59)
- apr train archive: checkpoint release bundle with BLAKE3 manifest (#85)
- apr eval --task correlation: PPL-benchmark Pearson/Spearman analysis (#66)
- apr eval --task human: human evaluation pipeline (generate + analyze) (#68)
- apr encrypt/decrypt: BLAKE3-based model weight encryption at rest (#89)
- apr train plan: comprehensive resource estimation (RAM, disk, time) (#95)

All features pure Rust, sovereign stack compliant.

Tested on:
- sweep: 5 random configs from 350M base config
- archive: 50M checkpoint → 238 MB bundle with MANIFEST.json
- encrypt/decrypt: 238 MB roundtrip verified (MAC authenticated)
- correlation: 236 data points from multi-checkpoint loss histories
- human eval: generate 10-prompt sheet + analyze 5-rating test set
- resource est: extended VRAM/RAM/disk/tokens/step-time/throughput

Refs #118

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
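The correlation analysis listed for apr eval --task correlation rests on two standard coefficients. The std-only Rust below sketches them; the function names are hypothetical and this is not apr's code. Spearman is computed as Pearson on ranks, with tied values assigned their average rank.

```rust
/// Pearson correlation of two equal-length samples; None if undefined.
fn pearson(x: &[f64], y: &[f64]) -> Option<f64> {
    let n = x.len();
    if n != y.len() || n < 2 {
        return None;
    }
    let mx = x.iter().sum::<f64>() / n as f64;
    let my = y.iter().sum::<f64>() / n as f64;
    let (mut sxy, mut sxx, mut syy) = (0.0, 0.0, 0.0);
    for (a, b) in x.iter().zip(y) {
        let (dx, dy) = (a - mx, b - my);
        sxy += dx * dy;
        sxx += dx * dx;
        syy += dy * dy;
    }
    if sxx == 0.0 || syy == 0.0 {
        return None; // a constant sample has no defined correlation
    }
    Some(sxy / (sxx * syy).sqrt())
}

/// 1-based ranks; tied values share the average of the ranks they span.
fn ranks(v: &[f64]) -> Vec<f64> {
    let mut idx: Vec<usize> = (0..v.len()).collect();
    idx.sort_by(|&a, &b| v[a].partial_cmp(&v[b]).expect("NaN in input"));
    let mut r = vec![0.0; v.len()];
    let mut i = 0;
    while i < idx.len() {
        let mut j = i;
        while j + 1 < idx.len() && v[idx[j + 1]] == v[idx[i]] {
            j += 1; // extend the run of equal values
        }
        let avg = (i + j) as f64 / 2.0 + 1.0;
        for &k in &idx[i..=j] {
            r[k] = avg;
        }
        i = j + 1;
    }
    r
}

/// Spearman rank correlation: Pearson applied to average ranks.
fn spearman(x: &[f64], y: &[f64]) -> Option<f64> {
    if x.len() != y.len() {
        return None;
    }
    pearson(&ranks(x), &ranks(y))
}
```

Spearman only assumes a monotonic relationship, so for data such as x = [1, 2, 3, 4] against y = x^2 it reports a perfect 1.0 while Pearson stays below 1. That makes it a useful companion to Pearson when the two quantities being compared (for example perplexity and a benchmark score) need not be linearly related.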
Merged
noahgift added a commit that referenced this pull request on May 12, 2026
…IP-001/003/004/009/010 PARTIAL→LIVE-DISCHARGED (PMAT-CODE-SHIP-TWO-SECTION-72)
Closes 5 of the 6 algorithm-level PARTIALs left after §71 closed SHIP-005.
Only SHIP-007 (multi-PR CUDA cascade per §63) remains as a PARTIAL.
The cascade is EVIDENCE-ONLY: no code changes. Five ACs already had
falsifier tests at PARTIAL_ALGORITHM_LEVEL (`#[test]`s merged); they
just lacked LIVE-evidence runs on the canonical 7B Qwen2.5-Coder-Instruct
teacher.
Evidence captured (lambda-vector, RTX 4090, post-§71 main binary):
SHIP-001 apr run <safetensors> --prompt 'Hello' --max-tokens 4
→ exit 0, 62.55s load via realizar
SHIP-003 apr diff <safetensors> <q4k.apr> --values --filter weight
--limit 20 --transpose-aware
→ 20 tensors at cos_sim=1.000000 (floor 0.999)
SHIP-004 llama-cli -m <q4k.gguf> -p 'Hello' -n 8 -ngl 99 -st
→ exit 0, "Hello! How can I help you today",
133.1 gen tok/s, model 5580 MiB on RTX 4090
SHIP-009 apr inspect <q4k.apr>
→ license: Apache-2.0,
data_source: huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct
SHIP-010 curl HF tree API + sha256sum on gx10 canonical teacher
→ 0a854098… == HF lfs.oid 0a854098…, 8035635524 bytes
§17.5 + AC-SHIP1 chain post-§72:
SHIP-001 LIVE-DISCHARGED ← §72
SHIP-002 LIVE-DISCHARGED (#1609 §61)
SHIP-003 LIVE-DISCHARGED ← §72
SHIP-004 LIVE-DISCHARGED ← §72
SHIP-005 LIVE-DISCHARGED (§71)
SHIP-006 LIVE-DISCHARGED (#1615 §61.8)
SHIP-007 PARTIAL — multi-PR CUDA cascade (§63)
SHIP-008 LIVE-DISCHARGED (#1614 §61)
SHIP-009 LIVE-DISCHARGED ← §72
SHIP-010 LIVE-DISCHARGED ← §72
9 of 10 AC-SHIP1-* LIVE-discharged.
Ship-% movement:
MODEL-1 ship %: 95% → 99% (5 algorithm-level PARTIALs → LIVE)
Path to 100% = SHIP-007 multi-PR CUDA cascade per §63:
Layer 1: cuBLASLt FP8 JIT warmup ILLEGAL_ADDRESS root fix
Layer 2: CUDA-vs-CPU parity (cosine -0.005 on Qwen 7B dims)
Layer 3: throughput 5.6 → 30 tok/s
Host: RTX 4090 / lambda-vector (gx10 is wrong arch)
MODEL-2 ship %: unchanged at 57%
Methodology lesson #19 NEW: algorithm-level falsifiers + small evidence
runs collapse PARTIAL→LIVE in batches. When ACs are PARTIAL because of
missing live evidence (not missing algorithm), batch-discharge in one
cascade rather than treating each as separate ship-row work. The 95→99%
jump is the highest-ROI move because the algorithms are already merged.
Spec v3.17.0 → v3.18.0.
Evidence:
- evidence/section-72-ship-live-cascade-2026-05-12/findings.json
- ship-001-apr-run-safetensors.txt (exit 0 + 62.55s load)
- ship-003-apr-diff-q4k-roundtrip.txt (20 tensors at cos_sim=1.000000)
- ship-004-llama-cli-stdout.txt (llama.cpp first-response on canonical GGUF)
- ship-009-apr-inspect.txt (license + provenance fields)
- ship-010-sha256-match.json + ship-010-hf-tree.json (sha256 match)
Refs:
- AC-SHIP1-001 through AC-SHIP1-010 (spec §5)
- §71 (SHIP-005 LIVE-DISCHARGED, predecessor)
- §63 (SHIP-007 multi-PR cascade scope)
- contracts/eval-harness-humaneval-v1.yaml + contracts/apr-publish-hf-large-file-v1.yaml + contracts/apr-provenance-v1.yaml (PARTIAL_ALGORITHM_LEVEL → LIVE-DISCHARGED)
Closes tasks #59-63.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
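SHIP-003 gates on cosine similarity (20 tensors at cos_sim=1.000000 against a 0.999 floor), and the Layer 2 CUDA-vs-CPU parity gap is also reported as a cosine (-0.005, i.e. near-orthogonal outputs). A minimal sketch of that metric and a floor check follows; `cosine_similarity` and `all_pass` are hypothetical names, and how `apr diff` flattens or transposes tensors (`--transpose-aware`) is not modeled here.

```rust
/// Cosine similarity of two tensors flattened to f32 slices; None if undefined.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f64> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    // Accumulate in f64 to limit rounding error over large tensors.
    let (mut dot, mut na, mut nb) = (0.0f64, 0.0f64, 0.0f64);
    for (&x, &y) in a.iter().zip(b) {
        let (x, y) = (x as f64, y as f64);
        dot += x * y;
        na += x * x;
        nb += y * y;
    }
    if na == 0.0 || nb == 0.0 {
        return None; // an all-zero tensor has no direction
    }
    Some(dot / (na.sqrt() * nb.sqrt()))
}

/// Hypothetical gate: every tensor pair must reach the floor (0.999 in the SHIP-003 run).
fn all_pass(pairs: &[(&[f32], &[f32])], floor: f64) -> bool {
    pairs
        .iter()
        .all(|(a, b)| cosine_similarity(a, b).is_some_and(|c| c >= floor))
}
```

Treating an undefined cosine (mismatched lengths or an all-zero tensor) as a failure keeps the gate conservative: a tensor that cannot be compared never counts toward a LIVE discharge.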
noahgift added a commit that referenced this pull request on May 12, 2026
…IP-001/003/004/009/010 PARTIAL→LIVE-DISCHARGED (PMAT-CODE-SHIP-TWO-SECTION-72) (#1646)