feat(optim): Implement ADMM for distributed and constrained optimization #66

Merged
noahgift merged 1 commit into main from claude/research-optimization-techniques-01LWS5ZwqVEHQ13NbShwH7Ls on Nov 23, 2025

@noahgift

Add Alternating Direction Method of Multipliers (ADMM) optimizer for distributed ML and constrained optimization problems.

Implementation (~700 lines total):

  • Core ADMM algorithm with consensus form (305 lines)
  • Adaptive penalty parameter (ρ) adjustment
  • Primal and dual residual tracking
  • 9 comprehensive unit tests (380 lines)
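
The residual tracking and adaptive ρ items above can be sketched as follows. This is a minimal illustration for the consensus constraint x − z = 0, not the code in this PR: the function names and the constants `MU = 10` and `TAU = 2` are assumptions (they are the typical values suggested in Boyd et al. 2011 for residual balancing).

```rust
/// Euclidean norm of a vector.
fn norm(v: &[f64]) -> f64 {
    v.iter().map(|a| a * a).sum::<f64>().sqrt()
}

/// Primal and dual residual norms for the consensus constraint x - z = 0.
/// Primal residual: r = x - z. Dual residual: s = rho * (z - z_old)
/// (the sign does not matter because only the norm is returned).
fn residuals(x: &[f64], z: &[f64], z_old: &[f64], rho: f64) -> (f64, f64) {
    let r: Vec<f64> = x.iter().zip(z).map(|(xi, zi)| xi - zi).collect();
    let s: Vec<f64> = z.iter().zip(z_old).map(|(zi, zo)| rho * (zi - zo)).collect();
    (norm(&r), norm(&s))
}

/// Residual balancing: grow rho when the primal residual dominates,
/// shrink it when the dual residual dominates, otherwise keep it.
/// `u` is the scaled dual variable (u = y / rho), so it is rescaled
/// in place whenever rho changes.
fn adapt_rho(rho: f64, r_norm: f64, s_norm: f64, u: &mut [f64]) -> f64 {
    const MU: f64 = 10.0; // assumed imbalance threshold
    const TAU: f64 = 2.0; // assumed growth/shrink factor
    let new_rho = if r_norm > MU * s_norm {
        rho * TAU
    } else if s_norm > MU * r_norm {
        rho / TAU
    } else {
        rho
    };
    let scale = rho / new_rho;
    for ui in u.iter_mut() {
        *ui *= scale;
    }
    new_rho
}
```

Rescaling `u` matters only in the scaled form of ADMM; an implementation that stores the unscaled dual variable y would leave it untouched when ρ changes.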

Features:

  • General ADMM form: minimize f(x) + g(z) s.t. Ax + Bz = c (consensus is the special case x − z = 0)
  • Adaptive ρ adjustment (Boyd et al. 2011)
  • O(1/k) convergence for convex problems
  • Flexible x-minimizer and z-minimizer closures
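
To make the closure-based design concrete, here is a rough sketch of a scaled-form ADMM loop for the consensus case (x − z = 0). It is an assumption-laden illustration rather than the API added in this PR: `AdmmResult`, `admm_consensus`, and the closure signatures are hypothetical names chosen for the example.

```rust
/// Outcome of a run (hypothetical type for this sketch).
struct AdmmResult {
    z: Vec<f64>,
    iterations: usize,
    converged: bool,
}

/// Scaled-form ADMM for: minimize f(x) + g(z) subject to x - z = 0.
/// `x_min(z, u, rho)` must return argmin_x f(x) + (rho/2)||x - z + u||^2.
/// `z_min(x, u, rho)` must return argmin_z g(z) + (rho/2)||x - z + u||^2.
fn admm_consensus<FX, FZ>(
    x_min: FX,
    z_min: FZ,
    n: usize,
    rho: f64,
    tol: f64,
    max_iter: usize,
) -> AdmmResult
where
    FX: Fn(&[f64], &[f64], f64) -> Vec<f64>,
    FZ: Fn(&[f64], &[f64], f64) -> Vec<f64>,
{
    let mut z = vec![0.0; n];
    let mut u = vec![0.0; n]; // scaled dual variable
    for k in 0..max_iter {
        let x = x_min(&z, &u, rho);
        let z_new = z_min(&x, &u, rho);
        let (mut r2, mut s2) = (0.0, 0.0);
        for i in 0..n {
            let r = x[i] - z_new[i]; // primal residual component
            let s = rho * (z_new[i] - z[i]); // dual residual component
            u[i] += r; // dual update
            r2 += r * r;
            s2 += s * s;
        }
        z = z_new;
        if r2.sqrt() < tol && s2.sqrt() < tol {
            return AdmmResult { z, iterations: k + 1, converged: true };
        }
    }
    AdmmResult { z, iterations: max_iter, converged: false }
}
```

For a simple quadratic such as f(x) = ½(x − 1)² and g(z) = ½(z − 3)², both closures have closed forms, x = (1 + ρ(z − u))/(1 + ρ) and z = (3 + ρ(x + u))/(1 + ρ), and the iterates approach the consensus minimizer 2.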

Applications:

  • Distributed Lasso regression
  • Federated learning (consensus optimization)
  • Equality-constrained problems via consensus
  • Model parallelism across devices
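
For the Lasso application, the z-update has a well-known closed form: element-wise soft-thresholding of x + u with threshold λ/ρ. The sketch below shows that step in isolation; the function names are illustrative and not taken from this PR.

```rust
/// Soft-thresholding operator: shrinks `v` toward zero by `kappa`
/// and returns exactly zero inside [-kappa, kappa].
fn soft_threshold(v: f64, kappa: f64) -> f64 {
    if v > kappa {
        v - kappa
    } else if v < -kappa {
        v + kappa
    } else {
        0.0
    }
}

/// z-update for Lasso in scaled-form ADMM:
/// argmin_z lambda*||z||_1 + (rho/2)||x - z + u||^2,
/// which is soft-thresholding of (x + u) with kappa = lambda / rho.
fn lasso_z_update(x: &[f64], u: &[f64], lambda: f64, rho: f64) -> Vec<f64> {
    let kappa = lambda / rho;
    x.iter()
        .zip(u)
        .map(|(xi, ui)| soft_threshold(xi + ui, kappa))
        .collect()
}
```

Entries whose magnitude falls below λ/ρ are set exactly to zero, which is how the L1 term produces sparse solutions.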

Tests (9 passing, 1 ignored):

  • Consensus form with simple quadratic
  • Lasso regression with L1 regularization
  • Box constraints via consensus (ignored - needs refinement)
  • Convergence tracking and residuals
  • Adaptive ρ parameter adjustment
  • Max iterations handling

Total test count: 1,165 tests (up from 1,156)

Related: Phase 2 convex optimization (spec v0.9.0)
Reference: Boyd et al. (2011), "Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers", Foundations and Trends in Machine Learning, 3(1), 1-122

noahgift merged commit 4b34859 into main on Nov 23, 2025
4 of 11 checks passed
noahgift deleted the claude/research-optimization-techniques-01LWS5ZwqVEHQ13NbShwH7Ls branch on November 23, 2025 20:49
noahgift added a commit that referenced this pull request Mar 3, 2026
- apr train sweep: grid/random hyperparameter sweep config generation (#59)
- apr train archive: checkpoint release bundle with BLAKE3 manifest (#85)
- apr eval --task correlation: PPL-benchmark Pearson/Spearman analysis (#66)
- apr eval --task human: human evaluation pipeline (generate + analyze) (#68)
- apr encrypt/decrypt: BLAKE3-based model weight encryption at rest (#89)
- apr train plan: comprehensive resource estimation (RAM, disk, time) (#95)

All features pure Rust, sovereign stack compliant. Tested on:
- sweep: 5 random configs from 350M base config
- archive: 50M checkpoint → 238 MB bundle with MANIFEST.json
- encrypt/decrypt: 238 MB roundtrip verified (MAC authenticated)
- correlation: 236 data points from multi-checkpoint loss histories
- human eval: generate 10-prompt sheet + analyze 5-rating test set
- resource est: extended VRAM/RAM/disk/tokens/step-time/throughput

Refs #118

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request Apr 19, 2026
v0.31.1 was yanked from crates.io because `cargo install aprender@0.31.1`
panicked at aprender-mcp build.rs — `CARGO_MANIFEST_DIR/../../contracts/
apr-mcp-tool-schemas-v1.yaml` lives in the monorepo tree but NOT in the
published tarball (outside the package root), so every external install
hit a hard panic at build time.

Three-layer defense so this class never recurs:

1. **In-crate contract copy** — contracts/apr-mcp-tool-schemas-v1.yaml
   copied into crates/aprender-mcp/contracts/ and Cargo.toml `include`
   lists `contracts/*.yaml`. build.rs reads `CARGO_MANIFEST_DIR/
   contracts/…` (no `..` escape). Drift-guarded by new test
   `contract_copy_matches_workspace_root` which asserts byte-identity
   with the workspace-root copy in-tree, and skips when the workspace
   copy is absent (published-tarball mode).

2. **Static Poka-Yoke** — scripts/check_build_rs_paths.sh walks every
   git-tracked build.rs, flags any that join `".."` onto
   CARGO_MANIFEST_DIR AND `panic!`/`unwrap_or_else(…panic)` AND lack a
   `.exists()` guard or `ALLOW_ESCAPE` annotation. Reverting this fix
   locally makes the gate exit 1 with a remediation message.

3. **Wired into both gates** — `make tier3` runs the check pre-push,
   and `.github/workflows/ci.yml` `workspace-test` job runs it on every
   PR so a future build.rs path escape can't sneak back in.
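
The first layer (in-crate copy plus drift guard) could look roughly like the sketch below. It is not the code from this commit: `contract_path` and `copies_match` are hypothetical helpers, and the real test `contract_copy_matches_workspace_root` may be structured differently.

```rust
use std::path::{Path, PathBuf};

/// Resolve a contract file strictly inside the package root,
/// i.e. `<manifest_dir>/contracts/<file_name>` with no `..` component.
fn contract_path(manifest_dir: &Path, file_name: &str) -> PathBuf {
    manifest_dir.join("contracts").join(file_name)
}

/// Drift guard between the in-crate copy and the workspace-root copy.
/// Returns Ok(None) when the workspace copy is absent (published-tarball
/// mode, so the check is skipped), otherwise Ok(Some(byte_identical)).
fn copies_match(in_crate: &Path, workspace: &Path) -> std::io::Result<Option<bool>> {
    if !workspace.exists() {
        return Ok(None);
    }
    let a = std::fs::read(in_crate)?;
    let b = std::fs::read(workspace)?;
    Ok(Some(a == b))
}
```

In a build script, `manifest_dir` would typically come from the `CARGO_MANIFEST_DIR` environment variable that Cargo sets; a test would treat `None` as a skip and `Some(false)` as a failure.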

Yank command (already executed):
  cargo yank --version 0.31.1 <each of 68 crates>  # all confirmed yanked

Next step after this lands: cut v0.31.2, re-publish all 68 crates at
40s/crate pacing (crates.io 30-per-10min window), verify `cargo install
aprender --version 0.31.2` actually installs this time.
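
As a quick sanity check on the pacing figures above, the arithmetic can be written out. The helper names are made up for this example, and the rate limit is taken as stated in the commit message rather than verified against crates.io.

```rust
/// Smallest spacing (in seconds) that stays within a limit of
/// `max_per_window` publishes per `window_secs`.
fn min_interval_secs(window_secs: u64, max_per_window: u64) -> u64 {
    window_secs / max_per_window
}

/// Wall-clock time to publish `n` crates with a fixed gap between
/// consecutive publishes (the first one starts immediately).
fn total_secs(n: u64, interval_secs: u64) -> u64 {
    n.saturating_sub(1) * interval_secs
}
```

With a 30-per-10-minute window the minimum spacing is 20 s, so 40 s per crate leaves a 2x margin, and 68 crates take 67 × 40 s = 2680 s, a little under 45 minutes, excluding the time each publish itself takes.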

Refs: task #64 (yank), #65 (fix), #66 (prevention)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
noahgift added a commit that referenced this pull request Apr 19, 2026
…rd (#910)
noahgift added a commit that referenced this pull request May 12, 2026
* ci: re-enable sccache — unblocks 240-PR cold-compile bottleneck

The Phase 3 sccache pilot was disabled on 2026-04-19 because the
`sovereign-ci:stable` container image was missing the `rustc-sccache`
wrapper script. That was fixed upstream in paiml/infra commit f4fccf9
("use exec script not symlink", PR #66) the same day, but aprender's
ci.yml was never flipped back.

Verified on intel runner:

    $ docker run --rm localhost:5000/sovereign-ci:stable rustc-sccache --version
    sccache 0.14.0
    $ docker run --rm localhost:5000/sovereign-ci:stable which rustc-sccache
    /usr/local/bin/rustc-sccache

Sccache cache directory is warm: `/home/noah/data/sccache` is ~11GB
across 290 sub-dirs, shared across all 16 intel-clean-room runners and
all PRs via the existing `/home/noah/data/sccache:/sccache` bind-mount
in `paiml/.github/.github/workflows/sovereign-ci.yml`.

Why this matters:

- Per-PR target dir scheme (`/mnt/nvme-raid0/targets/aprender-ci/<PR>`)
  from #1043 cold-compiles each new PR's 879 deps from scratch.
- Job timing (PR #1619 latest run): 34min build + 4min tests = 40min
  timeout. Tests never finish.
- 249-PR queue × 34min cold compile = backlog cannot drain.
- With sccache hit-rate ≥80% expected on a warm cache, cold builds
  drop from 34min → ~3-5min, and the timeout becomes a non-issue.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* ci(workspace-test): wire sccache into the inline workspace-test job

The first commit on this branch flipped enable_sccache=true on the
reusable ci/{test,lint,coverage,...} jobs. That doesn't reach the
inline `workspace-test` job (the slowest one, where the 40min timeout
actually fires), so this commit wires sccache into it directly:

- Bind-mount /home/noah/data/sccache:/sccache (shared across all 16
  intel-clean-room runners + all PRs; sccache handles concurrency
  via per-entry atomic rename + LRU eviction).
- Set RUSTC_WRAPPER=rustc-sccache (image-baked exec shim) and
  SCCACHE_DIR=/sccache env vars.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>