
chore(distill): Phase 5 HumanEval dispatch wrapper #1886

Closed
noahgift wants to merge 5 commits into main from chore/dispatch-phase5-humaneval



@noahgift

Contributor

Phase 5 wrapper for HumanEval discharge on Stage D output. With PMAT-702 (#1874) merged, broken models no longer produce pass@1=1.0 false positives. Target: pass@1 ≥ 25% per AC-DISTILL-004. QA: bashrs 0 errors, bash -n ok. 🤖 Generated with Claude Code

scripts/dispatch-distill-phase-5-humaneval.sh runs apr eval --task
humaneval on a Stage D output checkpoint. With PMAT-702 (PR #1874) in
main, the eval no longer falls back to structural validation, which
reported a false-positive pass@1=1.0 on broken models. Inference
failure now returns exit code 8 with mode=inference_failed.

Target per SPEC-DISTILL-001 AC-DISTILL-004: pass@1 >= 25%. This is a
loose ship threshold: competitive 0.5B models reach 30-40%, and the
upstream 7B teacher reaches 91%.

Env vars: CHECKPOINT (required), SAMPLES, DEVICE, TEMPERATURE, TOP_P,
HUMANEVAL_JSONL, DRY_RUN. Estimated wall time on GB10: 5-8 h.
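
A minimal sketch of how a wrapper like this might validate its environment before dispatching. This is not the script from the PR: the function names `show_if_set` and `resolve_config` are hypothetical, the optional variables are echoed only when set because their real defaults are not stated here, and treating an unset DRY_RUN as 0 is an assumption.

```shell
# Hypothetical sketch: validate and print the wrapper's environment.
# Only the variable names come from the commit message above.

# Print NAME=VALUE only when VALUE is non-empty.
show_if_set() {
  if [ -n "$2" ]; then
    echo "$1=$2"
  fi
}

# Fail fast when the one required variable is missing, then list
# whatever optional settings the caller supplied.
resolve_config() {
  if [ -z "${CHECKPOINT:-}" ]; then
    echo "error: CHECKPOINT is required" >&2
    return 2
  fi
  echo "CHECKPOINT=${CHECKPOINT}"
  show_if_set SAMPLES "${SAMPLES:-}"
  show_if_set DEVICE "${DEVICE:-}"
  show_if_set TEMPERATURE "${TEMPERATURE:-}"
  show_if_set TOP_P "${TOP_P:-}"
  show_if_set HUMANEVAL_JSONL "${HUMANEVAL_JSONL:-}"
  # Assumption: an unset DRY_RUN means "really run".
  echo "DRY_RUN=${DRY_RUN:-0}"
}
```

Calling `resolve_config` before the multi-hour eval means a missing CHECKPOINT is reported immediately rather than after the run has been queued.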

Falsifier: mode=inference_failed or pass@1=0.0 with non-zero exit
means re-train. With #1874 in effect this signal is accurate; before
the fix, the structural fallback masked it as pass@1=1.0.
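
The falsifier rule can be sketched as a small classifier over the eval's exit code, mode, and pass@1. This is an illustration under assumptions, not code from the wrapper: the function name `classify_result` and the verdict labels are hypothetical, it reads the rule as "non-zero exit together with either mode=inference_failed or pass@1=0.0", and it treats any other non-zero exit as a generic error. How the mode and pass@1 values are extracted from the eval output is not shown in the PR, so they are passed in as arguments.

```shell
# Hypothetical sketch: classify_result EXIT_CODE MODE PASS_AT_1
# Prints one of: retrain, error, meets_target, below_target.
classify_result() {
  exit_code="$1"
  mode="$2"
  pass_at_1="$3"
  threshold="0.25"  # AC-DISTILL-004 ship threshold (pass@1 >= 25%)

  if [ "$exit_code" -ne 0 ]; then
    # Falsifier: inference failure or a zero score on a failed run.
    if [ "$mode" = "inference_failed" ] ||
       awk -v p="$pass_at_1" 'BEGIN { exit !(p + 0 == 0) }'; then
      echo "retrain"
    else
      # Assumption: other non-zero exits are unrelated failures.
      echo "error"
    fi
    return 0
  fi

  # Successful run: compare the score against the threshold.
  if awk -v p="$pass_at_1" -v t="$threshold" 'BEGIN { exit !(p + 0 >= t + 0) }'; then
    echo "meets_target"
  else
    echo "below_target"
  fi
}
```

The floating-point comparison goes through awk because POSIX `[` only compares integers.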

QA: bashrs lint 0 errors.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@noahgift noahgift enabled auto-merge (squash) May 22, 2026 13:54
@noahgift

Contributor Author

Subsumed by #1898 (mega-bundle hiatus close-out). Squash-merge preserves the per-PR commit message — see #1898 commit log.

@noahgift noahgift closed this May 23, 2026
auto-merge was automatically disabled May 23, 2026 07:09

Pull request was closed

noahgift added a commit that referenced this pull request May 23, 2026
, #1896, #1897) (#1898)

* docs(spec): SPEC-DISTILL-001 §87 — PMAT-704 post-mortem on Bug B wrong turn (#1880)

* chore(distill): Stage D dispatch wrapper with PMAT-701 lessons baked in (#1883)

* chore(distill): Phase 5 HumanEval dispatch wrapper (#1886)

* chore: bundle PMAT-702..705 distill cascade + clippy fix (#1897)

* fix(cli): point 7B qwen models to single-file GGUF artifacts and align caches (#1891)

* fix(chat): preserve original path in FileNotFound for filesystem paths

PR #1891 routed every path_arg through HF alias resolution. For inputs
that look like filesystem paths (absolute, or starting with ./ or ../)
and don't exist, the alias resolver was rewriting them as hf:// URIs
and returning a mangled path in the FileNotFound error.

Fix: short-circuit with the original path_arg in the error BEFORE alias
resolution kicks in. Preserves the contract that test_run_file_not_found
and test_run_nonexistent_path_without_trace assert.

Closes the workspace-test failure on bundle PR #1898.
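
The short-circuit described in this commit can be illustrated in shell. The actual fix lives in the CLI's own code and is not shown in this PR, so everything below is a hypothetical stand-in: the function names, the error text, and the `hf://` prefixing that imitates alias resolution are all assumptions.

```shell
# Hypothetical sketch of "check the filesystem path before alias resolution".

# True for absolute paths and paths starting with ./ or ../
looks_like_fs_path() {
  case "$1" in
    /*|./*|../*) return 0 ;;
    *) return 1 ;;
  esac
}

resolve_model_arg() {
  if looks_like_fs_path "$1"; then
    if [ ! -e "$1" ]; then
      # Report the caller's original argument, untouched.
      echo "file not found: $1" >&2
      return 1
    fi
    echo "$1"
    return 0
  fi
  # Stand-in for alias resolution (assumption, not the real resolver).
  echo "hf://$1"
}
```

Because the existence check runs first, a missing `./model.gguf` is reported under the name the user typed instead of a rewritten URI.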