fix(skills): add nutrigx_advisor compat symlink for bench harness#215
Merged
manuelcorpas merged 1 commit into main on May 3, 2026
Conversation
Restores nutrigx end-to-end testability after the AgentSkills naming rename in e4ed975 (skills/nutrigx_advisor to skills/nutrigx-advisor). The clawbio_bench v0.1.5 nutrigx_harness hardcodes the legacy underscore path at nutrigx_harness.py:511, so all 10 nutrigx test cases were failing on the live repo with exit_code 2 ("script not found"). Bench reads from a git checkout, so an untracked symlink is invisible to it; this needs to be a tracked symlink committed to the tree.

Effect on benchmark
- nutrigx-advisor: 0/10 (0.0%) -> 10/10 (100.0%)
- Aggregate (clawbio-bench v0.1.5 smoke): 139/162 (85.8%) -> 149/162 (92.0%), excluding fine-mapping infrastructure errors

The symlink is a temporary compatibility shim. The proper fix is a PR to biostochastics/clawbio_bench to either resolve skill folders dynamically (try hyphen, fall back to underscore) or read the path from a per-skill manifest. Once that lands, this symlink can be removed.

Verified locally
- python clawbio.py run nutrigx --demo (exit 0, full repro bundle)
- clawbio-bench --smoke --harness nutrigx (10/10 PASS)
- clawbio-bench --smoke (149/162, 92.0%)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
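The shim described above amounts to a relative symlink at the legacy underscore path, committed so a fresh checkout reproduces it. A minimal sketch (the directory layout here is illustrative, not the exact repo tree):

```shell
# Sketch of the compat shim: a relative symlink at the legacy underscore
# path pointing at the renamed hyphenated directory.
set -e
repo="$(mktemp -d)"
mkdir -p "$repo/skills/nutrigx-advisor"

# relative target, so the link resolves inside any checkout of the repo
ln -s nutrigx-advisor "$repo/skills/nutrigx_advisor"

# git stores the link itself (tree entry mode 120000), not the target's
# contents -- an untracked link would never reach the bench's checkout
readlink "$repo/skills/nutrigx_advisor"
```

Committing the link with a plain `git add skills/nutrigx_advisor` is enough; no content is duplicated.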
This was referenced May 3, 2026
smoe pushed a commit to smoe/ClawBio that referenced this pull request on May 3, 2026
Updates the public benchmark leaderboard and the homepage to reflect the 2026-05-03 re-run of clawbio_bench v0.1.5, after the nutrigx compat-symlink fix in PR ClawBio#215.

Headline numbers
- Audit baseline (2026-04-05, bench v0.1.0): 80 / 140 (57.1%)
- Latest run (2026-05-03, bench v0.1.5): 149 / 162 (92.0%)
- Skills audited: 7 to 10 (3 new harnesses in v0.1.5)

Per-skill changes
- equity-scorer: 20.0% to 100.0% (P0 findings resolved)
- nutrigx-advisor: 80.0% to 100.0% (today's symlink fix)
- pharmgx-reporter: 42.4% to 97.7%
- bio-orchestrator: 75.9% to 98.1%
- claw-metagenomics: 85.7% to 100.0%
- clinical-variant-reporter: 80.0% to 80.0% (unchanged)
- fine-mapping: now reports 21 harness infrastructure errors, excluded from rate (under investigation)
- New harnesses: cvr-acmg-correctness 69.2%, gwas-prs 62.5%, cvr-variant-identity 50.0%

Site changes
- benchmarks.html
  - Hero adds "We fix them" arc
  - Audit metadata card "Auditor: Sergey Kornilov" replaced with "Bench Author: Biostochastics LLC", focusing the credit on the open-source clawbio_bench tool rather than an individual
  - Bench version v0.1.0 to v0.1.5
  - Audit commit 1481fb4 to 925b89a
  - Summary bar: 80/140 to 149/162
  - Scorecard table: full rebuild against new run, 10 rows, status pills updated (most P0/P1 now Clear)
  - Footer: original audit baseline retained for transparency
  - JSON-LD dateModified bumped
- index.html
  - Top banner: "80/140 across 7 skills" to "149/162 (92.0%) | Up from 80/140 (57.1%) at original audit"
  - Hero CTA pill: "Leaderboard 80/140" to "Leaderboard 149/162"
  - Benchmark Validated feature card: numbers refreshed
  - Skills section header: "7 publicly benchmarked" to "149/162 publicly benchmarked passing"
  - Recent Updates: leaderboard card now references PR ClawBio#215 too, cites clawbio_bench, includes the gain numbers

The auditor-name swap follows the principle that the bench is a public tool with org credit; a single individual's name is not the right framing for a permanent front-page asset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
smoe pushed a commit to smoe/ClawBio that referenced this pull request on May 3, 2026
Resolves the 21 harness errors observed in clawbio_bench v0.1.5
finemapping driver. The driver does:
sys.path.insert(0, skill_dir)
from core.abf import compute_abf
from core.susie import run_susie
from core.credible_sets import ...
from core.susie_inf import run_susie_inf # optional
The skill restructured its internal package from `core` to
`fine_mapping_core` at some point but the bench harness still
expects `core`. Adds a tracked directory symlink so the legacy
import path resolves until the bench is updated.
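The mechanism can be sketched in isolation; the package and function names below are illustrative stand-ins for the skill's real modules, not its actual code:

```python
# Sketch: a directory symlink makes the legacy `core` import path resolve
# to the renamed package. Names here are illustrative.
import os
import sys
import tempfile

root = tempfile.mkdtemp()
pkg = os.path.join(root, "fine_mapping_core")
os.makedirs(pkg)
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "abf.py"), "w") as f:
    f.write("def compute_abf():\n    return 'abf-ok'\n")

# the compat shim: `core` becomes a second name for the same directory
os.symlink("fine_mapping_core", os.path.join(root, "core"))

sys.path.insert(0, root)          # what the bench driver does with skill_dir
from core.abf import compute_abf  # the legacy import now resolves
print(compute_abf())              # prints "abf-ok"
```

Python's import machinery follows the filesystem symlink transparently, which is why no code change in the skill itself is needed.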
Effect on benchmark
- clawbio-finemapping: 0/0 with 21 harness errors -> 19/20 (95.0%)
with 1 real algorithm failure (susie_inf_est_tausq_ignored)
- Aggregate: 149/162 (92.0%) excluding finemapping -> 168/182
(92.3%) including finemapping
Same pattern as the nutrigx_advisor symlink in PR ClawBio#215. The proper
fix is in biostochastics/clawbio_bench: either resolve skill-package
names dynamically or read import paths from a per-skill manifest.
Once that lands, both symlinks can be removed.
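The dynamic-resolution option proposed for the bench could look roughly like this; a sketch only, with the helper name, `skills/` layout, and fallback order all assumptions rather than clawbio_bench's actual API:

```python
from pathlib import Path

def resolve_skill_dir(repo_root: Path, name: str) -> Path:
    """Hypothetical helper: try the hyphenated skill folder first,
    then fall back to the legacy underscore spelling, so either
    repo layout works without compat symlinks."""
    for candidate in (name.replace("_", "-"), name.replace("-", "_")):
        path = repo_root / "skills" / candidate
        if path.is_dir():
            return path
    raise FileNotFoundError(
        f"no skill folder for {name!r} in {repo_root / 'skills'}"
    )
```

With a helper like this in the bench, both tracked symlinks could be deleted from the skill repo.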
Verified
- clawbio-bench --smoke --harness finemapping: 19/20 pass
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Restores nutrigx-advisor end-to-end testability against clawbio-bench v0.1.5. After the AgentSkills naming rename in e4ed975 (skills/nutrigx_advisor to skills/nutrigx-advisor), the bench harness was still pointing at the legacy underscore path and all 10 nutrigx test cases were failing with exit_code: 2 ("script not found"). This adds a tracked directory symlink so the legacy path resolves until the bench is updated.
Impact on the public benchmark scorecard
Compared to the original 2026-04-05 audit baseline of 80 / 140 (57.1%), the live scorecard is now +35 percentage points across 7 of 10 harnesses (the other 3 are new harnesses added in bench v0.1.5).
Why a symlink and not a code fix
The bench harness invokes the skill script directly (nutrigx_harness.py:511), bypassing the ClawBio CLI, so updating clawbio.py's SKILLS dict does not help. The bench reads from a git checkout (untracked files are invisible), which is why the symlink has to be tracked.

Why this is temporary
The proper fix is in
biostochastics/clawbio_bench: either resolve skill folders dynamically (try hyphen, fall back to underscore) or read the script path from a per-skill manifest. I will open a follow-up issue / PR there. Once that lands, this symlink can be removed.

Verified locally
- python clawbio.py run nutrigx --demo returns exit 0 and writes the full reproducibility bundle (commands.sh, environment.yml, checksums.txt, provenance.json, nutrigx_report.md, result.json).
- clawbio-bench --smoke --harness nutrigx reports 10 / 10 passing with score_correct, snp_valid, threshold_consistent, repro_functional.
- clawbio-bench --smoke aggregate run reports 149 / 162 (92.0%).

Test plan
- git clone https://github.com/ClawBio/ClawBio.git && cd ClawBio && python clawbio.py run nutrigx --demo returns exit 0
- clawbio-bench --smoke --harness nutrigx --repo . returns 10 / 10 passing

Follow-up (separate PR)
- clawbio.py for other broken SKILLS_DIR / "name" references (one other was found: llm-biobank-bench has no folder; out of scope here)
- benchmarks.html to reflect 149 / 162 (92.0%) once this PR is merged

🤖 Generated with Claude Code