feat: 3-knob HTTP wire-up — operator-actionable sampling/penalty via apr code env vars by noahgift · Pull Request #1846 · paiml/aprender

noahgift · 2026-05-20T11:16:50Z

Summary

Closes the gap between the 3-knob implementation (sampling #1842 + rep-penalty #1844) and the HTTP chat-completions interface. Without this PR, the new `QuantizedGenerateConfig` fields (top_k, top_p, repeat_penalty, repeat_last_n, seed) were silently hardcoded to defaults in `try_qwen3_moe_backend` — the impl was on main but unreachable from the HTTP path.

Changes

`ChatCompletionRequest` extension (`mod_create_demo.rs`)

4 new optional fields (aprender extensions to the OpenAI schema; all `#[serde(default)]`, so existing clients are unaffected):

`top_k: Option<usize>`
`repeat_penalty: Option<f32>`
`repeat_last_n: Option<usize>`
`seed: Option<u64>`
(`top_p` already existed)

`try_qwen3_moe_backend` (`cuda_chat_backend.rs`)

Threads all 5 sampling fields (the 4 new ones plus `top_p`) from the HTTP request into `QuantizedGenerateConfig`. Any field left unset falls back to its `QuantizedGenerateConfig::default()` value (greedy decoding).
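A minimal sketch of the per-field fallback described above. `SketchRequest` and `SketchConfig` are hypothetical stand-ins for `ChatCompletionRequest` and `QuantizedGenerateConfig`; the field subset, the config field types, and the default values are assumptions for illustration, not the real definitions.

```rust
/// Hypothetical stand-in for the sampling part of `ChatCompletionRequest`.
#[derive(Debug, Default, Clone, PartialEq)]
struct SketchRequest {
    top_k: Option<usize>,
    top_p: Option<f32>,
    repeat_penalty: Option<f32>,
    repeat_last_n: Option<usize>,
    seed: Option<u64>,
}

/// Hypothetical stand-in for `QuantizedGenerateConfig`.
#[derive(Debug, Clone, PartialEq)]
struct SketchConfig {
    top_k: usize,
    top_p: f32,
    repeat_penalty: f32,
    repeat_last_n: usize,
    seed: u64,
}

impl Default for SketchConfig {
    // Assumed greedy-style defaults; the real values may differ.
    fn default() -> Self {
        SketchConfig {
            top_k: 1,
            top_p: 1.0,
            repeat_penalty: 1.0,
            repeat_last_n: 64,
            seed: 0,
        }
    }
}

/// Copy every field the client set; keep the default for the rest.
fn config_from_request(req: &SketchRequest) -> SketchConfig {
    let d = SketchConfig::default();
    SketchConfig {
        top_k: req.top_k.unwrap_or(d.top_k),
        top_p: req.top_p.unwrap_or(d.top_p),
        repeat_penalty: req.repeat_penalty.unwrap_or(d.repeat_penalty),
        repeat_last_n: req.repeat_last_n.unwrap_or(d.repeat_last_n),
        seed: req.seed.unwrap_or(d.seed),
    }
}
```

With this shape, a request that sets none of the knobs produces exactly the default config, which is why existing clients see no behavior change.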

`AprServeDriver::build_openai_body` (`apr_serve.rs`)

Reads 6 env vars and includes each one in the HTTP request body when set:

`APR_AGENT_TEMPERATURE` (overrides `CompletionRequest.temperature`)
`APR_AGENT_TOP_K`, `APR_AGENT_TOP_P`
`APR_AGENT_REPEAT_PENALTY`, `APR_AGENT_REPEAT_LAST_N`
`APR_AGENT_SEED`
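The env-var handling can be sketched roughly as follows. This is not the code in `apr_serve.rs`: the `Knob` enum, the `sampling_overrides` function, the lookup-closure parameter, and the choice to silently skip values that fail to parse are all assumptions made to keep the example self-contained and testable. Only the variable names and the body field names come from this PR.

```rust
use std::collections::BTreeMap;

/// Hypothetical value type for one override in the request body.
#[derive(Debug, Clone, PartialEq)]
enum Knob {
    Float(f64),
    Int(u64),
}

/// Collect the overrides that are set and parse cleanly.
/// `lookup` abstracts the environment so the function can be tested
/// without touching the real process environment.
fn sampling_overrides(lookup: impl Fn(&str) -> Option<String>) -> BTreeMap<&'static str, Knob> {
    const FLOAT_KNOBS: [(&str, &str); 3] = [
        ("APR_AGENT_TEMPERATURE", "temperature"),
        ("APR_AGENT_TOP_P", "top_p"),
        ("APR_AGENT_REPEAT_PENALTY", "repeat_penalty"),
    ];
    const INT_KNOBS: [(&str, &str); 3] = [
        ("APR_AGENT_TOP_K", "top_k"),
        ("APR_AGENT_REPEAT_LAST_N", "repeat_last_n"),
        ("APR_AGENT_SEED", "seed"),
    ];

    let mut body = BTreeMap::new();
    for &(var, field) in FLOAT_KNOBS.iter() {
        // Assumption: an unset or unparsable variable is simply omitted.
        if let Some(v) = lookup(var).and_then(|s| s.trim().parse::<f64>().ok()) {
            body.insert(field, Knob::Float(v));
        }
    }
    for &(var, field) in INT_KNOBS.iter() {
        if let Some(v) = lookup(var).and_then(|s| s.trim().parse::<u64>().ok()) {
            body.insert(field, Knob::Int(v));
        }
    }
    body
}
```

Against the real environment the call would be `sampling_overrides(|name| std::env::var(name).ok())`; the returned map then holds only the knobs the operator actually exported.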

Operator dispatch (post-merge)

APR_AGENT_TEMPERATURE=0.3 APR_AGENT_TOP_K=50 APR_AGENT_TOP_P=0.95 \
APR_AGENT_REPEAT_PENALTY=1.2 APR_AGENT_REPEAT_LAST_N=64 \
bash scripts/phase-6-bench.sh

Per paiml/claude-code-parity-apr M288 `v1004-3knob-dispatch-recipe-2026-05-20.md`.

What this is NOT

NOT new contract gates — V1_001..V1_004 are already discharged. PURE PLUMBING.
NOT companion-side bench changes — companion bench just inherits env from operator's shell.

Test plan

`cargo check -p aprender-serve --lib --features cuda` — clean
`cargo check -p aprender-orchestrate` — clean
CI

🤖 Generated with Claude Code

…apr code

Closes the gap between the 3-knob implementation (sampling #1842 + rep-penalty #1844) and the HTTP chat-completions interface. Without this PR, the new QuantizedGenerateConfig fields (top_k, top_p, repeat_penalty, repeat_last_n, seed) were silently hardcoded to defaults in `try_qwen3_moe_backend`: the impl was on main but unreachable from the HTTP path.

## Changes

### `crates/aprender-serve/src/api/mod_create_demo.rs`

`ChatCompletionRequest` gains 4 optional fields (aprender extensions to the OpenAI schema):

- `top_k: Option<usize>` (qwen3-moe-sampling-v1 V1_001 knob)
- `repeat_penalty: Option<f32>` (qwen3-moe-repetition-penalty-v1)
- `repeat_last_n: Option<usize>` (penalty window)
- `seed: Option<u64>` (qwen3-moe-sampling-v1 V1_002 reproducibility)

All `#[serde(default)]` so existing clients are unaffected.

### `crates/aprender-serve/src/api/cuda_chat_backend.rs`

`try_qwen3_moe_backend` threads all 5 sampling fields (the 4 new ones plus `top_p`) from the HTTP request into `QuantizedGenerateConfig`. When a field is unset, it falls back to the `QuantizedGenerateConfig::default()` value (greedy decoding).

### `crates/aprender-orchestrate/src/agent/driver/apr_serve.rs`

`AprServeDriver::build_openai_body` reads 6 env vars and includes them in the HTTP request body when set:

- `APR_AGENT_TEMPERATURE` (overrides CompletionRequest.temperature)
- `APR_AGENT_TOP_K`
- `APR_AGENT_TOP_P`
- `APR_AGENT_REPEAT_PENALTY`
- `APR_AGENT_REPEAT_LAST_N`
- `APR_AGENT_SEED`

Operator can now dispatch the CCPA Phase 6 bench with sampling/penalty:

```bash
APR_AGENT_TEMPERATURE=0.3 \
APR_AGENT_TOP_K=50 \
APR_AGENT_TOP_P=0.95 \
APR_AGENT_REPEAT_PENALTY=1.2 \
APR_AGENT_REPEAT_LAST_N=64 \
bash scripts/phase-6-bench.sh
```

(Per paiml/claude-code-parity-apr M288 v1004-3knob-dispatch-recipe.md.)

## What this is NOT

- NOT new contract gates: V1_001..V1_004 of qwen3-moe-sampling-v1 + qwen3-moe-repetition-penalty-v1 are already discharged. This is PURE PLUMBING.
- NOT companion-side env-var plumbing: apr code in this PR already reads env vars; the companion bench script just needs to set them (mechanical, no aprender change).

## Cross-references

- aprender#1832 (M32d, MERGED)
- aprender#1842 (sampling impl, MERGED via squash that also absorbed #1844)
- aprender#1844 (rep-penalty impl, MERGED)
- aprender#1835 (streaming SSE contract, OPEN)
- paiml/claude-code-parity-apr M288 (3-knob dispatch recipe)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…t sites

PR #1846 added top_k/repeat_penalty/repeat_last_n/seed to ChatCompletionRequest but missed updating the existing test construction sites. CI failed with 11 E0063 errors ("missing fields ... in initializer of api::ChatCompletionRequest").

Surgical fix: insert the 4 new fields (all set to None) at every test-site struct literal. Behavior is preserved: the fields are Option<T>, and None matches the previous behavior.

Sites patched:

- src/api/tests/ (10 files, 11 sites), built by `cargo test --lib`
- tests/api_coverage.rs, tests/api_deep_coverage.rs, tests/property_api.rs (3 files, 15 sites), built by the integration test path

Closes the workspace-test failure on b2333b8.
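For context on the E0063 failure: a Rust struct literal must name every field, so adding fields breaks each literal that spells the struct out in full. The sketch below contrasts the explicit-`None` fix used in this commit with struct update syntax, which would tolerate future fields. `SketchChatRequest` is a hypothetical stand-in with an assumed field set, and whether the real `ChatCompletionRequest` derives `Default` is not stated here, so the second form is only an illustration.

```rust
/// Hypothetical stand-in; the real struct has more fields.
#[derive(Debug, Default, Clone, PartialEq)]
struct SketchChatRequest {
    model: String,
    temperature: Option<f32>,
    top_k: Option<usize>,
    repeat_penalty: Option<f32>,
    repeat_last_n: Option<usize>,
    seed: Option<u64>,
}

/// Style used by the fix: every new field is listed and set to None.
fn explicit_literal() -> SketchChatRequest {
    SketchChatRequest {
        model: "test-model".to_string(),
        temperature: Some(0.3),
        top_k: None,
        repeat_penalty: None,
        repeat_last_n: None,
        seed: None,
    }
}

/// Alternative (assumes a `Default` impl): unspecified fields are filled in,
/// so adding another optional field later does not break this literal.
fn update_syntax_literal() -> SketchChatRequest {
    SketchChatRequest {
        model: "test-model".to_string(),
        temperature: Some(0.3),
        ..Default::default()
    }
}
```

Both functions build the same value; the difference is only in how much each test site has to change when the struct grows.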

paiml/aprender#1846 closes the env-var plumbing gap noted in M288. M288's "NOT YET shipped" caveat is now resolved end-to-end.

## End-to-end flow now wired

```
operator shell ENV
  → bench script (inherits)
  → ccpa-arena-bench (inherits)
  → apr code (inherits)
  → AprServeDriver::build_openai_body (READS env vars)
  → HTTP POST /v1/chat/completions {temperature, top_k, ...}
  → try_qwen3_moe_backend (PARSES request)
  → QuantizedGenerateConfig {...}
  → run_qwen3_moe_generate
  → sample_from_logits (APPLIES sampling + penalty)
```

Every link is wired. The operator's `APR_AGENT_TEMPERATURE=0.3` (etc.) now flows through to actual logit sampling and is no longer a no-op.

## Status reconciliation

8/10 aprender M32d-arc PRs MERGED. 2 OPEN:

- #1835 (streaming SSE contract; workspace-test pending)
- #1846 (this M289's prerequisite; just opened)

## Companion-side state

CCPA M281-M288 + M289 = 9 docs tracking the full upstream arc + dispatch recipe + plumbing confirmation.

## What's NOT done

- V1_004 sub-bench not yet dispatched (operator-coordinated; needs #1846 merge + apr rebuild + ~10-15hr wall)
- Currently-running greedy baseline bench should finish first; don't start the 3-knob bench until the baseline scores.json lands (the COMPARISON is the value)

Mechanical doc. M-counter NOT bumped.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
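To make the last link in that flow concrete, here is a generic sketch of what the knobs conventionally do to a logit vector. It is a textbook formulation (divide-or-multiply repetition penalty over a recent-token window, then top-k, then nucleus top-p), not aprender's `sample_from_logits`; the function names, signatures, and filter order are assumptions. The final seeded random draw among the surviving tokens is left out.

```rust
use std::cmp::Ordering;
use std::collections::HashSet;

/// Penalise each distinct token found in the last `last_n` entries of `history`.
/// Positive logits are divided by `penalty`, non-positive ones multiplied.
fn apply_repeat_penalty(logits: &mut [f32], history: &[u32], penalty: f32, last_n: usize) {
    let start = history.len().saturating_sub(last_n);
    let mut seen = HashSet::new();
    for &tok in &history[start..] {
        if !seen.insert(tok) {
            continue; // already penalised once
        }
        if let Some(logit) = logits.get_mut(tok as usize) {
            if *logit > 0.0 {
                *logit /= penalty;
            } else {
                *logit *= penalty;
            }
        }
    }
}

/// Token indices that survive top-k and then top-p, most likely first.
fn top_k_top_p(logits: &[f32], top_k: usize, top_p: f32) -> Vec<usize> {
    let mut order: Vec<usize> = (0..logits.len()).collect();
    order.sort_by(|&a, &b| logits[b].partial_cmp(&logits[a]).unwrap_or(Ordering::Equal));
    order.truncate(top_k.max(1));
    if order.is_empty() {
        return order;
    }
    // Softmax over the top-k survivors, shifted by the max for stability.
    let max = logits[order[0]];
    let weights: Vec<f32> = order.iter().map(|&i| (logits[i] - max).exp()).collect();
    let total: f32 = weights.iter().sum();
    // Keep the shortest prefix whose cumulative probability reaches top_p.
    let mut cumulative = 0.0_f32;
    let mut keep = 0;
    for w in &weights {
        cumulative += w / total;
        keep += 1;
        if cumulative >= top_p {
            break;
        }
    }
    order.truncate(keep);
    order
}
```

With `top_k = 1` the candidate set collapses to the single best token, which is the greedy behavior the defaults are described as giving.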

…4 follow-up) (#1849)

Adds 3 concrete few-shot <tool_call> examples to CODE_SYSTEM_PROMPT (the 7B+ branch used for Qwen3-Coder-30B-A3B).

Empirical context: paiml/claude-code-parity-apr M287 evidence showed the 30B model emits Markdown ```rust``` code blocks (in turn-1 text) instead of <tool_call> JSON. The parser at realizar.rs:144-149 accepts <tool_call> + ```json``` but NOT ```rust```, so the model's turns are silently treated as text-only and the bench hits the per-turn timeout after 4 turns of rambling.

The 3-knob toolkit (sampling/penalty/streaming) tunes probability distributions but can't change format adherence. THIS PR addresses format adherence directly by:

1. Showing the model 3 concrete <tool_call> examples in-context (file_read, file_edit, shell)
2. Adding an explicit "ALWAYS gets a tool-call response" rule
3. Adding a "Be concise — DO NOT narrate" guideline
4. Adding a "DO NOT use Markdown ```rust``` code blocks" anti-rule

## Why few-shot examples work

Large language models are pattern-matchers. Showing them the exact format they should emit (rather than just describing it) drastically improves format adherence on coder-finetuned models. The 30B-Coder has strong "Markdown code block" priors from training; explicit counter-examples + the negative rule pull it toward the <tool_call> format.

## Empirical context

M287 (Phase 6 bench, fixtures 1-10 + greedy decoding): uniform driver_error / turns_before_error=4 pattern. Every turn was text with Rust code in Markdown, with no tool calls extracted. The operator playbook calls for a sampling/penalty sub-bench (#1842 + #1844 + #1846 shipped). This PR is COMPLEMENTARY: prompt fix + sampling together have the best chance of breaking the rambling pattern.

## Companion-side dispatch (post-merge)

After this PR + rebuild, the operator can run a NEW sub-bench (call it Sub-bench E in M288 nomenclature) that combines:

- 3-knob sampling (temperature=0.3, top_k=50, top_p=0.95)
- Repetition penalty (repeat_penalty=1.2, repeat_last_n=64)
- THIS PR's few-shot prompt (active by default; no env var needed)

If Sub-bench E shows ANY fixture pass, V1_004 discharges.

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
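The format problem described above can be illustrated with a tiny tag extractor. This is a hypothetical sketch, not the parser in realizar.rs: the `extract_tool_calls` name, the `</tool_call>` closing tag, and the JSON payload shape in the usage note are assumptions. It shows why a reply that contains only prose and a Rust code block yields zero tool calls, however good the code is.

```rust
/// Return the trimmed payload of every complete <tool_call>...</tool_call> pair.
/// Assumption: calls are delimited by these two literal tags and are not nested.
fn extract_tool_calls(text: &str) -> Vec<&str> {
    const OPEN: &str = "<tool_call>";
    const CLOSE: &str = "</tool_call>";

    let mut calls = Vec::new();
    let mut rest = text;
    while let Some(start) = rest.find(OPEN) {
        let after_open = &rest[start + OPEN.len()..];
        match after_open.find(CLOSE) {
            Some(end) => {
                calls.push(after_open[..end].trim());
                rest = &after_open[end + CLOSE.len()..];
            }
            // An unterminated tag is ignored rather than guessed at.
            None => break,
        }
    }
    calls
}
```

For a reply such as `"<tool_call>{\"name\": \"file_read\"}</tool_call>"` the function returns the JSON payload, while a reply made only of narration and Rust source returns an empty list, so the driver sees a text-only turn.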

noahgift enabled auto-merge (squash) May 20, 2026 11:23

noahgift added 2 commits May 20, 2026 13:55

Merge branch 'main' into feat/3knob-http-wireup

25d6560

noahgift mentioned this pull request May 20, 2026

docs(M289): 3-knob plumbing SHIPPED — M288 prerequisites resolved paiml/claude-code-parity-apr#257

Merged

2 tasks

noahgift added 2 commits May 20, 2026 14:56

Merge branch 'main' into feat/3knob-http-wireup

78c3a3d

Merge branch 'main' into feat/3knob-http-wireup

6113b85

noahgift merged commit 1910e9e into main May 20, 2026
10 checks passed

noahgift deleted the feat/3knob-http-wireup branch May 20, 2026 13:45

This was referenced May 20, 2026

feat(code-prompt): few-shot <tool_call> examples + anti-rambling guideline (V1_004 follow-up) #1849

Merged

fix(try_qwen3_moe_backend): populate stop_tokens with EOS — fixes M287 runaway 'Human:' generation #1852

Merged

This was referenced May 21, 2026

fix(tests): repair stale include_str! paths after monorepo consolidation #1857

Merged

Qwen2.5-7B Q4_K GPU inference produces gibberish — 'ampiezza' (wgpu) / '<|im_start|>' (cuBLAS) — regression vs #374 / #559 #1864

Closed

noahgift merged 5 commits into main from feat/3knob-http-wireup
