
fix(skills): add nutrigx_advisor compat symlink for bench harness #215

Merged
manuelcorpas merged 1 commit into main from fix/nutrigx-folder-path
May 3, 2026

@manuelcorpas

Summary

Restores nutrigx-advisor end-to-end testability against clawbio-bench v0.1.5. After the AgentSkills naming rename in e4ed975 (skills/nutrigx_advisor to skills/nutrigx-advisor), the bench harness was still pointing at the legacy underscore path and all 10 nutrigx test cases were failing with exit_code: 2 ("script not found").

This adds a tracked directory symlink so the legacy path resolves until the bench is updated.

Impact on the public benchmark scorecard

| Harness | Before | After |
| --- | --- | --- |
| nutrigx-advisor | 0 / 10 (0.0%) | 10 / 10 (100.0%) |
| Aggregate | 139 / 162 (85.8%) | 149 / 162 (92.0%) |

Compared to the original 2026-04-05 audit baseline of 80 / 140 (57.1%), the live scorecard is up about 35 percentage points. The baseline covered 7 of the current 10 harnesses; the other 3 are new harnesses added in bench v0.1.5.
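The percentages above can be rechecked with a few lines of Python. This is only an arithmetic sketch; `pass_rate` is a hypothetical helper, not part of clawbio-bench.

```python
def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage, rounded to one decimal place."""
    return round(100 * passed / total, 1)


baseline = pass_rate(80, 140)   # 2026-04-05 audit baseline
before = pass_rate(139, 162)    # aggregate before this PR
after = pass_rate(149, 162)     # aggregate after this PR
gain = round(after - baseline, 1)

print(baseline, before, after)  # 57.1 85.8 92.0
print(gain)                     # 34.9
```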

Why a symlink and not a code fix

The bench harness invokes the skill script directly (nutrigx_harness.py:511), bypassing the ClawBio CLI:

    tool_path = repo_path / "skills" / "nutrigx_advisor" / "nutrigx_advisor.py"

So updating clawbio.py's SKILLS dict does not help. The bench reads from a git checkout (untracked files are invisible), which is why the symlink has to be tracked.
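The effect of the shim can be reproduced in a throwaway directory. The sketch below assumes the symlink is a relative link from `skills/nutrigx_advisor` to `nutrigx-advisor` (the exact target used in the commit is not shown here) and rebuilds the harness path from the line quoted above. On Windows, creating symlinks may require extra privileges.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    repo_path = Path(tmp)

    # Post-rename layout: hyphenated folder, script name unchanged.
    new_dir = repo_path / "skills" / "nutrigx-advisor"
    new_dir.mkdir(parents=True)
    (new_dir / "nutrigx_advisor.py").write_text("print('demo')\n")

    # The legacy path the harness hardcodes.
    tool_path = repo_path / "skills" / "nutrigx_advisor" / "nutrigx_advisor.py"
    found_before = tool_path.exists()

    # Assumed shim: a relative directory symlink next to the real folder.
    os.symlink(
        "nutrigx-advisor",
        repo_path / "skills" / "nutrigx_advisor",
        target_is_directory=True,
    )
    found_after = tool_path.exists()

print(found_before, found_after)  # False True
```

In a real checkout the link also has to be staged with `git add` and committed, since the bench only sees tracked content.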

Why this is temporary

The proper fix is in biostochastics/clawbio_bench: either resolve skill folders dynamically (try hyphen, fall back to underscore) or read the script path from a per-skill manifest. I will open a follow-up issue / PR there. Once that lands, this symlink can be removed.
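One possible shape for the dynamic resolution option is sketched below. `resolve_skill_script` is a hypothetical helper written for illustration; it is not taken from clawbio_bench, and the real fix may look different.

```python
import tempfile
from pathlib import Path


def resolve_skill_script(repo_path: Path, skill: str, script: str) -> Path:
    """Hypothetical helper: try the hyphenated folder, then the underscore one."""
    # dict.fromkeys removes duplicates while keeping the preference order.
    candidates = list(dict.fromkeys(
        [skill.replace("_", "-"), skill.replace("-", "_")]
    ))
    for folder in candidates:
        path = repo_path / "skills" / folder / script
        if path.is_file():
            return path
    raise FileNotFoundError(
        f"{script} not found under skills/ (tried: {', '.join(candidates)})"
    )


# Demo against a temporary checkout that only has the hyphenated folder.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    skill_dir = repo / "skills" / "nutrigx-advisor"
    skill_dir.mkdir(parents=True)
    (skill_dir / "nutrigx_advisor.py").write_text("print('demo')\n")

    resolved = resolve_skill_script(repo, "nutrigx_advisor", "nutrigx_advisor.py")
    resolved_folder = resolved.parent.name

print(resolved_folder)  # nutrigx-advisor
```

With a resolver like this in the harness, the legacy name keeps working for older commits and the compat symlink becomes unnecessary.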

Verified locally

  • python clawbio.py run nutrigx --demo returns exit 0 and writes the full reproducibility bundle (commands.sh, environment.yml, checksums.txt, provenance.json, nutrigx_report.md, result.json).
  • clawbio-bench --smoke --harness nutrigx reports 10 / 10 passing with score_correct, snp_valid, threshold_consistent, repro_functional.
  • clawbio-bench --smoke aggregate run reports 149 / 162 (92.0%).

Test plan

  • CI green
  • Manual: git clone https://github.com/ClawBio/ClawBio.git && cd ClawBio && python clawbio.py run nutrigx --demo returns exit 0
  • Manual: clawbio-bench --smoke --harness nutrigx --repo . returns 10 / 10 passing

Follow-up (separate PR)

  • Open issue at biostochastics/clawbio_bench requesting dynamic skill folder resolution
  • Audit clawbio.py for other broken SKILLS_DIR / "name" references (one other was found: llm-biobank-bench has no folder; out of scope here)
  • Update benchmarks.html to reflect 149 / 162 (92.0%) once this PR is merged

🤖 Generated with Claude Code

Restores nutrigx end-to-end testability after the AgentSkills naming
rename in e4ed975 (skills/nutrigx_advisor to skills/nutrigx-advisor).
The clawbio_bench v0.1.5 nutrigx_harness hardcodes the legacy
underscore path at nutrigx_harness.py:511, so all 10 nutrigx test
cases were failing on the live repo with exit_code 2 ("script not
found"). Bench reads from a git checkout, so an untracked symlink is
invisible to it; this needs to be a tracked symlink committed to the
tree.

Effect on benchmark
- nutrigx-advisor: 0/10 (0.0%) -> 10/10 (100.0%)
- Aggregate (clawbio-bench v0.1.5 smoke): 139/162 (85.8%) -> 149/162
  (92.0%) excluding fine-mapping infrastructure errors

The symlink is a temporary compatibility shim. The proper fix is a
PR to biostochastics/clawbio_bench to either resolve skill folders
dynamically (try hyphen, fall back to underscore) or read the path
from a per-skill manifest. Once that lands, this symlink can be
removed.

Verified locally
- python clawbio.py run nutrigx --demo (exit 0, full repro bundle)
- clawbio-bench --smoke --harness nutrigx (10/10 PASS)
- clawbio-bench --smoke (149/162, 92.0%)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
manuelcorpas merged commit 925b89a into main on May 3, 2026
6 checks passed
manuelcorpas deleted the fix/nutrigx-folder-path branch on May 3, 2026 08:40
smoe pushed a commit to smoe/ClawBio that referenced this pull request May 3, 2026
Updates the public benchmark leaderboard and the homepage to reflect
the 2026-05-03 re-run of clawbio_bench v0.1.5, after the nutrigx
compat-symlink fix in PR ClawBio#215.

Headline numbers
- Audit baseline (2026-04-05, bench v0.1.0): 80 / 140 (57.1%)
- Latest run (2026-05-03, bench v0.1.5): 149 / 162 (92.0%)
- Skills audited: 7 to 10 (3 new harnesses in v0.1.5)

Per-skill changes
- equity-scorer: 20.0% to 100.0% (P0 findings resolved)
- nutrigx-advisor: 80.0% to 100.0% (today's symlink fix)
- pharmgx-reporter: 42.4% to 97.7%
- bio-orchestrator: 75.9% to 98.1%
- claw-metagenomics: 85.7% to 100.0%
- clinical-variant-reporter: 80.0% to 80.0% (unchanged)
- fine-mapping: now reports 21 harness infrastructure errors,
  excluded from rate (under investigation)
- New harnesses: cvr-acmg-correctness 69.2%, gwas-prs 62.5%,
  cvr-variant-identity 50.0%

Site changes
- benchmarks.html
  - Hero adds "We fix them" arc
  - Audit metadata card "Auditor: Sergey Kornilov" replaced with
    "Bench Author: Biostochastics LLC", focusing the credit on the
    open-source clawbio_bench tool rather than an individual
  - Bench version v0.1.0 to v0.1.5
  - Audit commit 1481fb4 to 925b89a
  - Summary bar: 80/140 to 149/162
  - Scorecard table: full rebuild against new run, 10 rows, status
    pills updated (most P0/P1 now Clear)
  - Footer: original audit baseline retained for transparency
  - JSON-LD dateModified bumped
- index.html
  - Top banner: "80/140 across 7 skills" to "149/162 (92.0%) | Up
    from 80/140 (57.1%) at original audit"
  - Hero CTA pill: "Leaderboard 80/140" to "Leaderboard 149/162"
  - Benchmark Validated feature card: numbers refreshed
  - Skills section header: "7 publicly benchmarked" to "149/162
    publicly benchmarked passing"
  - Recent Updates: leaderboard card now references PR ClawBio#215 too,
    cites clawbio_bench, includes the gain numbers

The auditor-name swap follows the principle that the bench is a
public tool with org credit; a single individual's name is not the
right framing for a permanent front-page asset.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
smoe pushed a commit to smoe/ClawBio that referenced this pull request May 3, 2026
Resolves the 21 harness errors observed in clawbio_bench v0.1.5
finemapping driver. The driver does:

    sys.path.insert(0, skill_dir)
    from core.abf import compute_abf
    from core.susie import run_susie
    from core.credible_sets import ...
    from core.susie_inf import run_susie_inf  # optional

The skill restructured its internal package from `core` to
`fine_mapping_core` at some point but the bench harness still
expects `core`. Adds a tracked directory symlink so the legacy
import path resolves until the bench is updated.

Effect on benchmark
- clawbio-finemapping: 0/0 with 21 harness errors -> 19/20 (95.0%)
  with 1 real algorithm failure (susie_inf_est_tausq_ignored)
- Aggregate: 149/162 (92.0%) excluding finemapping -> 168/182
  (92.3%) including finemapping

Same pattern as the nutrigx_advisor symlink in PR ClawBio#215. The proper
fix is in biostochastics/clawbio_bench: either resolve skill-package
names dynamically or read import paths from a per-skill manifest.
Once that lands, both symlinks can be removed.
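A sketch of what resolving the package name dynamically could look like on the bench side. `import_first` is a hypothetical helper, not existing clawbio_bench code, and the demo uses a standard-library module so it runs anywhere.

```python
import importlib
from types import ModuleType


def import_first(candidates: list[str]) -> ModuleType:
    """Hypothetical helper: return the first candidate module that imports."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError as exc:
            missing = exc.name or ""
            # Skip only when the candidate itself (or a parent package) is
            # missing; re-raise if one of its own dependencies is missing.
            if name == missing or name.startswith(missing + "."):
                continue
            raise
    raise ModuleNotFoundError(f"none of {candidates} could be imported")


# Demo: the first name does not exist, so the second one is returned.
module = import_first(["clawbio_missing_pkg_for_demo.abf", "json.decoder"])
print(module.__name__)  # json.decoder
```

In the driver this would be called as something like `import_first(["fine_mapping_core.abf", "core.abf"]).compute_abf`, assuming the module layout under `fine_mapping_core` mirrors the old `core` package.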

Verified
- clawbio-bench --smoke --harness finemapping: 19/20 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>