feat(rosetta): OLMo + StableLM + GPTBigCode families (closes #1591, #1592, #1594) by noahgift · Pull Request #1662 · paiml/aprender

noahgift · 2026-05-13T16:15:12Z

Summary

Closes #1591 (OLMo), #1592 (StableLM), #1594 (GPTBigCode) in a single PR. Three Llama- / GPT-2-derivative families share an existing `Architecture` variant with their parent — none need a new variant or a custom tensor mapper.

Engine change (single function)

`tensor_expectation.rs::from_model_type`:

```rust
// StarCoder + GPTBigCode reuse GPT-2 tensor naming
"starcoder" | "starcoder2" | "bigcode"
| "gpt_bigcode" | "gpt-bigcode"
    => Some(Self::Gpt2),
```
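For context, here is a minimal, self-contained sketch of how such a lookup might be structured. The `Architecture` enum is reduced to two variants, the `"llama"` and `"gpt2"` strings are assumptions added for illustration, and the function body is assembled from the model-type strings named in this PR. The real `from_model_type` in `tensor_expectation.rs` covers more families and may be shaped differently.

```rust
/// Reduced stand-in for the engine's `Architecture` enum.
/// Hypothetical: the real enum has more variants than these two.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Architecture {
    Llama,
    Gpt2,
}

impl Architecture {
    /// Map a `model_type` string to the tensor-naming family it reuses.
    /// Returns `None` for model types this sketch does not know about.
    pub fn from_model_type(model_type: &str) -> Option<Self> {
        match model_type {
            // Llama-named families. "llama" itself is an assumed entry;
            // the OLMo and StableLM strings are the ones this PR adds.
            "llama" | "olmo" | "olmo2" | "stablelm" | "stablelm_epoch" | "stablelm_alpha" => {
                Some(Self::Llama)
            }
            // GPT-2-named families. "gpt2" itself is an assumed entry;
            // StarCoder + GPTBigCode reuse GPT-2 tensor naming.
            "gpt2" | "starcoder" | "starcoder2" | "bigcode" | "gpt_bigcode" | "gpt-bigcode" => {
                Some(Self::Gpt2)
            }
            // Matching on `&str` needs a fallback arm; unknown types map to `None`.
            _ => None,
        }
    }
}
```

Under these assumptions, `Architecture::from_model_type("gpt_bigcode")` resolves to the same variant as `"starcoder"`, which is why no new mapper is needed.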

YAMLs

`contracts/model-families/olmo.yaml` — OLMo 1B / 7B + OLMo-2 7B / 13B
`contracts/model-families/stablelm.yaml` — 1.6B / 3B / Zephyr-3B
`contracts/model-families/gpt_bigcode.yaml` — tiny_starcoder_py (164M) / SantaCoder (1.1B) / StarCoder1 (15.5B)

Rationale

OLMo / OLMo-2 reuse `LlamaForCausalLM` tensor naming. Norm-type variation (non-parametric LN in early OLMo `_hf` vs RMSNorm in OLMo-2) is runtime-only.
StableLM uses `LlamaForCausalLM` names. Partial-RoPE and head-bias toggles are per-checkpoint runtime settings, not name changes.
GPTBigCode uses GPT-2 Conv1D layout with MQA (single K/V head). MQA affects cache shape + dispatch, not tensor-name resolution — the Gpt2 mapper handles names.
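
To illustrate why MQA is a runtime concern rather than a naming one, the sketch below computes a per-layer K/V cache shape for a hypothetical attention configuration. The struct, its field names, and the `[num_kv_heads, seq_len, head_dim]` layout are illustrative assumptions, not values or types taken from the YAMLs or the engine.

```rust
/// Hypothetical attention config; field names are illustrative only.
#[derive(Debug, Clone, Copy)]
pub struct AttnConfig {
    /// Number of query heads.
    pub num_heads: usize,
    /// Number of K/V heads: 1 for MQA, equal to `num_heads` for standard MHA.
    pub num_kv_heads: usize,
    /// Dimension of each head.
    pub head_dim: usize,
}

/// Shape of one layer's K (or V) cache, assuming a
/// [num_kv_heads, seq_len, head_dim] layout.
pub fn kv_cache_shape(cfg: &AttnConfig, seq_len: usize) -> [usize; 3] {
    [cfg.num_kv_heads, seq_len, cfg.head_dim]
}

/// Number of elements held by K and V together for one layer.
pub fn kv_cache_elems(cfg: &AttnConfig, seq_len: usize) -> usize {
    let [heads, seq, dim] = kv_cache_shape(cfg, seq_len);
    2 * heads * seq * dim
}
```

With a single shared K/V head the cache shrinks by a factor of `num_heads` relative to standard multi-head attention, while the tensor names resolved by the mapper stay the same.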

Test plan

`pv validate` clean on all three YAMLs
FALSIFY-PARITY-002 (`test_every_model_family_yaml_has_architecture`) passes
CI: workspace-test

🤖 Generated with Claude Code

…w-major (was [K,N]); MODEL-1 → 100% (PMAT-CODE-SHIP-007-F32-GEMV-LAYOUT-FIX)

§74 localized the SHIP-007 PARITY-GATE bug to f32_gemv_into via PR-B's stage-bisection scaffold (CPU vs GPU per-stage statistics analysis). The F32 GEMV PTX kernel was reading weights with TRANSPOSED layout interpretation:

Bug: kernel assumed A is K-rows × N-cols row-major (A[i,j] at i*N+j), but actual ML weights are stored [output_dim=N, input_dim=K] row-major (A[i,j] at i*K+j per PyTorch/SafeTensors/GGUF convention and PMAT-333 F32 dequantization output).

Symptom: GPU read transposed weights → computed y = A^T @ x instead of y = A @ x → systematically anti-correlated logits (cos=-0.005190 vs CPU, top-10 divergences all sign-flipped, CPU mean=-2.42 vs GPU mean=0.013).

Fix: rewrite the inner loop to iterate along the K dimension within row block_id:

    row_base = a_ptr + block_id * K * 4
    thread reads A[block_id, t], A[block_id, t+32], ...

instead of:

    col_base = a_ptr + block_id * 4
    thread reads A[t, block_id], A[t+32, block_id], ...

Empirical discharge (canonical 7B teacher, lambda-vector RTX 4090, default graphed path):

  PARITY-GATE: PASS (no error from forward_gpu_resident)
  Throughput @ 128-tok 5-iter decode: 124.6 tok/s
  AC-SHIP1-007 floor: 30 tok/s
  Headroom: 4.15× over floor
  TTFT: 8.39 ms
  p50 latency: 1016 ms

Before PR-E:

  PARITY-GATE FAILED cos=-0.005190
  Throughput (with SKIP_PARITY_GATE=1 + SKIP_FP8_WARMUP=1): 5.6 tok/s (§63) / 54.5 tok/s (§73)
  GPU CANNOT serve this model

After PR-E:

  PARITY-GATE PASS, default path, NO workarounds
  124.6 tok/s, 4.15× over floor

Ship-% impact:

  MODEL-1 ship %: **99% → 100%**
  10 of 10 AC-SHIP1-* LIVE-DISCHARGED: SHIP-001 (§72) SHIP-002 (§61) SHIP-003 (§72) SHIP-004 (§72) SHIP-005 (§71) SHIP-006 (§61.8) SHIP-007 (this PR) SHIP-008 (§61) SHIP-009 (§72) SHIP-010 (§72)
  MODEL-2 ship %: unchanged at 57% (independent track).

Cascade arc closeout: §63 → §73 → PR-A (#1648) → PR-B (#1649) → §74 (#1650) → PR-E (this). One PR shipped in 1 day after §73's '3-5 PR / 3-5 day' estimate.

Auxiliary change: logits.rs adds APR_LM_HEAD_FORCE_QTYPE env-var probe kept as a diagnostic tool (zero behavior change when unset).

Test plan:
- [x] cargo build --release -p apr-cli --bin apr --features cuda → clean
- [x] apr bench (default path, 128-tok 5-iter) → 124.6 tok/s, passed: true
- [x] apr parity → PARITY-GATE PASS
- [ ] CI tests (workspace-test on per-PR runner)

Refs:
- §74 SHIP-007 bug localized (PR #1650)
- §73 SHIP-007 cascade reduction (PR #1647)
- contracts/apr-ship-007-gpu-stage-bisection-v1.yaml (PR-A #1648 contract)
- PR #1649 (PR-B GPU stage dump scaffold)
- AC-SHIP1-007 (spec §5)
- evidence/section-75-ship-007-discharged-2026-05-13/

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
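
The layout mismatch described in this commit message can be reproduced on the CPU with a small sketch. This is a plain-Rust analogue written for illustration, not the PTX kernel from `crates/aprender-gpu`; the function names are hypothetical. Both functions read the same flat buffer and differ only in the index formula.

```rust
/// Correct read: `a` holds an [N, K] row-major matrix (N = output_dim,
/// K = input_dim), so A[i, j] lives at offset i*K + j. Computes y = A @ x.
pub fn gemv_nk_row_major(a: &[f32], x: &[f32], n: usize, k: usize) -> Vec<f32> {
    assert_eq!(a.len(), n * k);
    assert_eq!(x.len(), k);
    (0..n)
        .map(|i| (0..k).map(|j| a[i * k + j] * x[j]).sum::<f32>())
        .collect()
}

/// Misread: the same buffer interpreted as [K, N] row-major, so the element
/// paired with x[t] for output i is fetched from offset t*N + i.
/// This mirrors the "A[t, block_id]" access pattern the commit describes as the bug.
pub fn gemv_kn_misread(a: &[f32], x: &[f32], n: usize, k: usize) -> Vec<f32> {
    assert_eq!(a.len(), n * k);
    assert_eq!(x.len(), k);
    (0..n)
        .map(|i| (0..k).map(|t| a[t * n + i] * x[t]).sum::<f32>())
        .collect()
}
```

For any non-square weight matrix the two functions generally disagree, which is consistent with the parity gate catching the mismatch at the logits stage.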

…07 contract violation (PMAT-CODE-SHIP-007-PR-E-FALSIFY-007-CLEAN)

The env-var bisection probe added in PR-E (this branch) introduced a `_ =>` catch-all inside a `match` expression that referenced `WeightQuantType` in its arm values. The `falsify_007_no_catch_all_in_dispatch_sites` contract test's 30-line walk-back heuristic flagged this as a violation, even though the match was on `&str` (env var value), not on `WeightQuantType`.

The probe was a bisection tool used to identify the bug location during §74. Now that §75 has shipped the actual fix and the probe is no longer needed, removing it cleans up the contract violation.

The remaining PR-E change is solely the F32 GEMV PTX kernel layout fix in `crates/aprender-gpu/src/kernels/gemv/mod.rs`, which is the actual bug fix.

Test verified:

    cargo test -p aprender-serve --lib \
      quantize::contract_tests::tests::falsify_007_no_catch_all_in_dispatch_sites
    → 1 passed

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…loses #1591, #1592, #1594)

Three Llama-derivative / GPT-2-derivative families share an `Architecture` variant with their parent; none need a new variant or a custom tensor mapper. Engine change is a single match arm extension in `from_model_type`:

- OLMo / OLMo-2 (allenai/OLMo*) → `Architecture::Llama`
- StableLM (stabilityai/stablelm*) → `Architecture::Llama`
- GPTBigCode (StarCoder1 / SantaCoder / tiny_starcoder_py) → `Architecture::Gpt2`

OLMo and OLMo-2 share `LlamaForCausalLM` tensor naming. StableLM likewise: partial-RoPE and per-checkpoint norm variation are runtime concerns, not tensor-name concerns. GPTBigCode uses GPT-2 Conv1D layout with Multi-Query Attention (single shared K/V head); MQA semantics affect cache shape and inference dispatch but not tensor-name resolution, so the Gpt2 mapper handles names.

Three YAMLs added:

- `contracts/model-families/olmo.yaml` (1B / 7B / OLMo-2 7B / OLMo-2 13B)
- `contracts/model-families/stablelm.yaml` (1.6B / 3B / Zephyr-3B)
- `contracts/model-families/gpt_bigcode.yaml` (tiny / SantaCoder / StarCoder1 15.5B)

`from_model_type` extended:

- `"olmo" | "olmo2" | "stablelm" | "stablelm_epoch" | "stablelm_alpha"` → `Self::Llama` (joins existing smollm / granite / nemotron list)
- `"gpt_bigcode" | "gpt-bigcode"` → `Self::Gpt2` (joins existing starcoder / starcoder2 / bigcode list)

Verified:

- `pv validate` clean on all three YAMLs
- FALSIFY-PARITY-002 (`test_every_model_family_yaml_has_architecture`) passes

…tives

noahgift and others added 11 commits May 13, 2026 09:33

Merge branch 'main' into fix/ship-007-pr-e-f32-gemv-layout

413971c

Merge branch 'main' into fix/ship-007-pr-e-f32-gemv-layout

cbaf08a

Merge branch 'main' into fix/ship-007-pr-e-f32-gemv-layout

ce8be77

Merge branch 'main' into fix/ship-007-pr-e-f32-gemv-layout

5a8cd3c

Merge branch 'main' into fix/ship-007-pr-e-f32-gemv-layout

8a0f0f6

ci: trigger fresh workflow run for flake-class test re-execution

476eeaf

Merge branch 'main' into fix/ship-007-pr-e-f32-gemv-layout

5e56a10

Merge branch 'main' into fix/ship-007-pr-e-f32-gemv-layout

0a5fe58

noahgift enabled auto-merge (squash) May 13, 2026 16:15

noahgift added 3 commits May 13, 2026 18:16

Merge branch 'main' into fix/1591-1592-1594-rosetta-llama-gpt2-deriva…

7200df4

…tives

Merge branch 'main' into fix/1591-1592-1594-rosetta-llama-gpt2-deriva…

5d0f30e

…tives

Merge branch 'main' into fix/1591-1592-1594-rosetta-llama-gpt2-deriva…

23e0a3a

…tives

noahgift merged commit a1d8abd into main May 13, 2026
10 checks passed

noahgift deleted the fix/1591-1592-1594-rosetta-llama-gpt2-derivatives branch May 13, 2026 21:02

This was referenced May 14, 2026

feat: add StableLM (StableLmForCausalLM) loader to aprender::rosetta #1592

Closed

feat: add GPTBigCode (GPTBigCodeForCausalLM) loader to aprender::rosetta — covers tiny_starcoder_py #1594

Closed

noahgift merged 14 commits into main from fix/1591-1592-1594-rosetta-llama-gpt2-derivatives
