feat(optim): Add Damped Newton optimizer with finite-difference Hessian #59

Merged
noahgift merged 1 commit into main from claude/research-optimization-techniques-01LWS5ZwqVEHQ13NbShwH7Ls on Nov 23, 2025

@noahgift

Contributor

Implement Newton's method with a finite-difference Hessian approximation, Cholesky decomposition for solving the linear system, and automatic fallback to steepest descent when the Hessian is not positive definite.

  • DampedNewton optimizer with Hessian approximation via finite differences
  • Cholesky decomposition to solve H * d = -g linear system
  • Automatic fallback to steepest descent on non-PD Hessian
  • Descent direction checking (grad^T d < 0)
  • Backtracking line search for global convergence
  • Configurable finite difference epsilon (default: 1e-5)
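The direction computation described in the list above can be sketched as follows. This is a minimal illustration, not the code merged in this PR: the function names (`fd_hessian`, `cholesky`, `cholesky_solve`, `newton_direction`) are hypothetical, and the choice of forward differences on the gradient is an assumption, since the description only says "finite differences".

```rust
/// Approximate the Hessian by forward differences of the gradient, then
/// symmetrize it. (Assumed scheme; the PR only states "finite differences".)
fn fd_hessian<G: Fn(&[f64]) -> Vec<f64>>(grad: &G, x: &[f64], eps: f64) -> Vec<Vec<f64>> {
    let n = x.len();
    let g0 = grad(x);
    let mut h = vec![vec![0.0; n]; n];
    let mut xp = x.to_vec();
    for j in 0..n {
        xp[j] = x[j] + eps;
        let gj = grad(&xp);
        xp[j] = x[j];
        for i in 0..n {
            h[i][j] = (gj[i] - g0[i]) / eps; // column j = d(grad)/d(x_j)
        }
    }
    for i in 0..n {
        for j in (i + 1)..n {
            let avg = 0.5 * (h[i][j] + h[j][i]); // H <- (H + H^T) / 2
            h[i][j] = avg;
            h[j][i] = avg;
        }
    }
    h
}

/// Lower-triangular factor L with H = L L^T, or None if H is not positive definite.
fn cholesky(h: &[Vec<f64>]) -> Option<Vec<Vec<f64>>> {
    let n = h.len();
    let mut l = vec![vec![0.0; n]; n];
    for i in 0..n {
        for j in 0..=i {
            let mut s = h[i][j];
            for k in 0..j {
                s -= l[i][k] * l[j][k];
            }
            if i == j {
                if s <= 0.0 {
                    return None; // non-positive pivot: not PD
                }
                l[i][j] = s.sqrt();
            } else {
                l[i][j] = s / l[j][j];
            }
        }
    }
    Some(l)
}

/// Solve L L^T d = b by forward then backward substitution.
fn cholesky_solve(l: &[Vec<f64>], b: &[f64]) -> Vec<f64> {
    let n = b.len();
    let mut y = vec![0.0; n];
    for i in 0..n {
        let mut s = b[i];
        for k in 0..i {
            s -= l[i][k] * y[k];
        }
        y[i] = s / l[i][i];
    }
    let mut d = vec![0.0; n];
    for i in (0..n).rev() {
        let mut s = y[i];
        for k in (i + 1)..n {
            s -= l[k][i] * d[k];
        }
        d[i] = s / l[i][i];
    }
    d
}

/// Returns (direction, used_newton). Falls back to -g when the Hessian is
/// not PD or when the Newton direction fails the descent check g^T d < 0.
fn newton_direction<G: Fn(&[f64]) -> Vec<f64>>(grad: &G, x: &[f64], eps: f64) -> (Vec<f64>, bool) {
    let g = grad(x);
    let neg_g: Vec<f64> = g.iter().map(|v| -v).collect();
    if let Some(l) = cholesky(&fd_hessian(grad, x, eps)) {
        let d = cholesky_solve(&l, &neg_g);
        let slope: f64 = g.iter().zip(&d).map(|(a, b)| a * b).sum();
        if slope < 0.0 {
            return (d, true);
        }
    }
    (neg_g, false)
}
```

Calling `newton_direction` with the gradient of a convex function returns the Newton direction, while a saddle such as x0^2 - x1^2 fails the Cholesky factorization and yields the negative gradient instead. The line search that scales the step is omitted here.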

Key Features:

  • Quadratic local convergence near a strict minimum; on a convex quadratic, one full Newton step reaches the minimizer (up to finite-difference error)
  • Graceful degradation: falls back to gradient descent when needed
  • Hessian symmetrization for numerical stability
  • 14 comprehensive tests covering all edge cases
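The convergence claim for convex quadratics can be checked with a short worked derivation. The symbols A, b, and x* are introduced here for illustration and do not come from the PR; the Hessian is taken as exact, which a finite-difference approximation only matches up to rounding.

```latex
\begin{aligned}
f(x) &= \tfrac{1}{2}\, x^\top A x - b^\top x, \qquad A = A^\top \succ 0 \\
g &= \nabla f(x) = A x - b, \qquad H = \nabla^2 f(x) = A \\
H d = -g \;&\Longrightarrow\; d = -A^{-1}(A x - b) = A^{-1} b - x \\
x + d &= A^{-1} b = x^\ast
\end{aligned}
```

So from any starting point, a full step (step size 1) lands on the minimizer, which is why the line search is expected to accept the unit step on such problems.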

All 1097 tests pass (+14 new). Phase 1 batch optimizers complete: L-BFGS (12 tests), Conjugate Gradient (18 tests), Damped Newton (14 tests).
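The backtracking line search named in the feature list can be sketched separately from the direction computation. This is an assumption-laden illustration rather than the merged code: the function name `backtrack`, the Armijo constant 1e-4, the shrink factor 0.5, and the cap of 50 halvings are all hypothetical choices.

```rust
/// Armijo backtracking: start at t = 1 and halve until
/// f(x + t d) <= f(x) + c * t * (g^T d). `slope` is g^T d and must be negative.
fn backtrack<F: Fn(&[f64]) -> f64>(f: &F, x: &[f64], d: &[f64], slope: f64) -> f64 {
    let (c, rho, f0) = (1e-4, 0.5, f(x)); // hypothetical constants
    let mut t = 1.0;
    for _ in 0..50 {
        // Trial point x + t d.
        let xt: Vec<f64> = x.iter().zip(d).map(|(xi, di)| xi + t * di).collect();
        if f(&xt) <= f0 + c * t * slope {
            break; // sufficient decrease reached
        }
        t *= rho;
    }
    t
}
```

Trying the unit step first matters for Newton directions: near a minimum the full step is usually accepted, so the damping only kicks in far from the solution or after a fallback to steepest descent.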

noahgift merged commit 5913413 into main on Nov 23, 2025
5 of 11 checks passed
noahgift deleted the claude/research-optimization-techniques-01LWS5ZwqVEHQ13NbShwH7Ls branch on November 23, 2025 15:46
noahgift added a commit that referenced this pull request Mar 3, 2026
- apr train sweep: grid/random hyperparameter sweep config generation (#59)
- apr train archive: checkpoint release bundle with BLAKE3 manifest (#85)
- apr eval --task correlation: PPL-benchmark Pearson/Spearman analysis (#66)
- apr eval --task human: human evaluation pipeline (generate + analyze) (#68)
- apr encrypt/decrypt: BLAKE3-based model weight encryption at rest (#89)
- apr train plan: comprehensive resource estimation (RAM, disk, time) (#95)

All features pure Rust, sovereign stack compliant. Tested on:
- sweep: 5 random configs from 350M base config
- archive: 50M checkpoint → 238 MB bundle with MANIFEST.json
- encrypt/decrypt: 238 MB roundtrip verified (MAC authenticated)
- correlation: 236 data points from multi-checkpoint loss histories
- human eval: generate 10-prompt sheet + analyze 5-rating test set
- resource est: extended VRAM/RAM/disk/tokens/step-time/throughput

Refs #118

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 12, 2026
…IP-001/003/004/009/010 PARTIAL→LIVE-DISCHARGED (PMAT-CODE-SHIP-TWO-SECTION-72)

Closes 5 of the 6 algorithm-level PARTIALs left after §71 closed SHIP-005.
Only SHIP-007 (multi-PR CUDA cascade per §63) remains as a PARTIAL.

The cascade is EVIDENCE-ONLY — no code changes. Five ACs already had
falsifier tests at PARTIAL_ALGORITHM_LEVEL (`#[test]`s merged); they
just lacked LIVE-evidence runs on the canonical 7B Qwen2.5-Coder-
Instruct teacher.

Evidence captured (lambda-vector, RTX 4090, post-§71 main binary):

  SHIP-001  apr run <safetensors> --prompt 'Hello' --max-tokens 4
            → exit 0, 62.55s load via realizar
  SHIP-003  apr diff <safetensors> <q4k.apr> --values --filter weight
            --limit 20 --transpose-aware
            → 20 tensors at cos_sim=1.000000 (floor 0.999)
  SHIP-004  llama-cli -m <q4k.gguf> -p 'Hello' -n 8 -ngl 99 -st
            → exit 0, "Hello! How can I help you today",
              133.1 gen tok/s, model 5580 MiB on RTX 4090
  SHIP-009  apr inspect <q4k.apr>
            → license: Apache-2.0,
              data_source: huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct
  SHIP-010  curl HF tree API + sha256sum on gx10 canonical teacher
            → 0a854098… == HF lfs.oid 0a854098…, 8035635524 bytes

§17.5 + AC-SHIP1 chain post-§72:

  SHIP-001  LIVE-DISCHARGED ← §72
  SHIP-002  LIVE-DISCHARGED (#1609 §61)
  SHIP-003  LIVE-DISCHARGED ← §72
  SHIP-004  LIVE-DISCHARGED ← §72
  SHIP-005  LIVE-DISCHARGED (§71)
  SHIP-006  LIVE-DISCHARGED (#1615 §61.8)
  SHIP-007  PARTIAL — multi-PR CUDA cascade (§63)
  SHIP-008  LIVE-DISCHARGED (#1614 §61)
  SHIP-009  LIVE-DISCHARGED ← §72
  SHIP-010  LIVE-DISCHARGED ← §72

9 of 10 AC-SHIP1-* LIVE-discharged.

Ship-% movement:
  MODEL-1 ship %: 95% → 99% (5 algorithm-level PARTIALs → LIVE)
  Path to 100% = SHIP-007 multi-PR CUDA cascade per §63:
    Layer 1: cuBLASLt FP8 JIT warmup ILLEGAL_ADDRESS root fix
    Layer 2: CUDA-vs-CPU parity (cosine -0.005 on Qwen 7B dims)
    Layer 3: throughput 5.6 → 30 tok/s
    Host: RTX 4090 / lambda-vector (gx10 is wrong arch)
  MODEL-2 ship %: unchanged at 57%

Methodology lesson #19 NEW: algorithm-level falsifiers + small evidence
runs collapse PARTIAL→LIVE in batches. When ACs are PARTIAL because of
missing live evidence (not missing algorithm), batch-discharge in one
cascade rather than treating each as separate ship-row work. The 95→99%
jump is the highest-ROI move because the algorithms are already merged.

Spec v3.17.0 → v3.18.0.

Evidence:
- evidence/section-72-ship-live-cascade-2026-05-12/findings.json
- ship-001-apr-run-safetensors.txt (exit 0 + 62.55s load)
- ship-003-apr-diff-q4k-roundtrip.txt (20 tensors at cos_sim=1.000000)
- ship-004-llama-cli-stdout.txt (llama.cpp first-response on canonical GGUF)
- ship-009-apr-inspect.txt (license + provenance fields)
- ship-010-sha256-match.json + ship-010-hf-tree.json (sha256 match)

Refs:
- AC-SHIP1-001 through AC-SHIP1-010 (spec §5)
- §71 (SHIP-005 LIVE-DISCHARGED, predecessor)
- §63 (SHIP-007 multi-PR cascade scope)
- contracts/eval-harness-humaneval-v1.yaml + contracts/apr-publish-hf-large-file-v1.yaml + contracts/apr-provenance-v1.yaml (PARTIAL_ALGORITHM_LEVEL → LIVE-DISCHARGED)

Closes tasks #59-63.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request May 12, 2026
…IP-001/003/004/009/010 PARTIAL→LIVE-DISCHARGED (PMAT-CODE-SHIP-TWO-SECTION-72) (#1646)

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>