fix: doctor resolves skills/ from install path when run outside repo by TheAndersMadsen · Pull Request #128 · garrytan/gbrain

TheAndersMadsen · 2026-04-15T06:21:48Z

Summary

gbrain doctor reports resolver_health: warn — "Could not find skills directory" for every user who installed via bun install github:garrytan/gbrain and runs gbrain from ~ or any other directory that has no skills/RESOLVER.md up the tree. The bundled skills/ directory lives under node_modules/gbrain/skills/ but the cwd-only walk in findRepoRoot() never looks there.

Impact: every hosted-CLI install takes a permanent -5 on the health score, plus the scary-sounding warning noise, even though everything is actually wired correctly.

Fix

Three-tier resolution in findRepoRoot() (first hit wins):

1. GBRAIN_SKILLS_DIR env var — explicit override for Docker mounts, CI, monorepo subdirs, or anyone who wants a deterministic path.
2. Walk up from cwd — original behavior, preferred when you're inside a brain repo.
3. Walk up from this module's install path — via fileURLToPath(import.meta.url), catches node_modules/gbrain/ automatically without user config.

Also refactors the walk into a walkUpFor(start, relPath) helper so cwd and install-path both share a single, tested implementation.
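
The three-tier lookup and the shared walk helper described above could look roughly like the following. This is a minimal sketch, not the code from the PR: `findSkillsDir`, `ResolveOptions`, and the injectable `exists` probe are hypothetical names added for illustration, and it assumes that `GBRAIN_SKILLS_DIR` points at the skills directory itself and is ignored when it holds no `RESOLVER.md`.

```typescript
import { existsSync } from "node:fs";
import { dirname, join, resolve } from "node:path";
import { fileURLToPath } from "node:url";

// Marker file that identifies a skills root (path taken from the PR description).
const MARKER = join("skills", "RESOLVER.md");

// Hypothetical: an injectable filesystem probe so the walk can be tested without disk I/O.
type Exists = (path: string) => boolean;

/**
 * Walk up from `start` until a directory containing `relPath` is found.
 * Returns that directory, or null once the filesystem root is reached.
 */
function walkUpFor(start: string, relPath: string, exists: Exists = existsSync): string | null {
  let dir = resolve(start);
  for (;;) {
    if (exists(join(dir, relPath))) return dir;
    const parent = dirname(dir);
    if (parent === dir) return null; // dirname of the root is the root itself
    dir = parent;
  }
}

// Hypothetical options bag; the real findRepoRoot() signature is not shown in the PR text.
interface ResolveOptions {
  env: Record<string, string | undefined>;
  cwd: string;
  moduleUrl: string; // callers would pass import.meta.url
  exists?: Exists;
}

/** Three-tier lookup, first hit wins: env override, cwd walk, install-path walk. */
function findSkillsDir(opts: ResolveOptions): string | null {
  const exists = opts.exists ?? existsSync;

  // Tier 1: explicit override (assumption: it names the skills directory itself).
  const override = opts.env.GBRAIN_SKILLS_DIR;
  if (override && exists(join(override, "RESOLVER.md"))) return resolve(override);

  // Tier 2: walk up from cwd. Tier 3: walk up from the module's own location.
  const root =
    walkUpFor(opts.cwd, MARKER, exists) ??
    walkUpFor(dirname(fileURLToPath(opts.moduleUrl)), MARKER, exists);
  return root === null ? null : join(root, "skills");
}
```

A caller would invoke it as `findSkillsDir({ env: process.env, cwd: process.cwd(), moduleUrl: import.meta.url })`. Passing the module URL in as a parameter, rather than reading `import.meta.url` inside the helper, is a choice made here so that all three tiers can be exercised from a test.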

Test plan

bun test test/doctor.test.ts — 7 pass, 0 fail (2 new tests added)
bun test (full suite) — 823 pass, 126 skip, 0 fail
Verified from a hosted install on Linux aarch64: cd ~ && gbrain doctor --fast --json no longer shows "Could not find skills directory" — resolver_health now reports the actual resolver state (found 10 upstream DRY_VIOLATION warnings in bundled skill content, which is a separate issue).
Verified env override: GBRAIN_SKILLS_DIR=/tmp/my-skills gbrain doctor --fast uses the override path.
Verified cwd walk unchanged when running from inside a brain repo.

New tests

resolver_health finds bundled skills when run outside repo — spawns gbrain doctor --fast --json from a fresh tmpdir and asserts the message does not contain "Could not find".
resolver_health honors GBRAIN_SKILLS_DIR env override — creates a tmpdir skills layout, points the env var at it, asserts the resolver finds it.

🤖 Generated with Claude Code

Before this patch, `gbrain doctor` required the user to be inside (or below) a directory containing `skills/RESOLVER.md`. That matches the "clone and hack on the brain" flow, but breaks the hosted/CLI-only flow where the user installs via `bun install github:garrytan/gbrain` and runs `gbrain` from `~` — the bundled skills live under `node_modules/gbrain/skills/` and are invisible to a cwd-only walk. The resolver_health check then reports "Could not find skills directory" and drops the health score by 5 for every user of the hosted path.

Resolution now falls back through three sources (first hit wins):

1. `GBRAIN_SKILLS_DIR` env var — explicit override for non-standard layouts (Docker mounts, CI, monorepo subdirs).
2. Walk up from cwd — original behavior, still preferred when you are inside a brain repo.
3. Walk up from this module's install path via `import.meta.url` — catches the `node_modules/gbrain/` case without requiring any user configuration.

Two new tests cover the hosted-CLI case (run from tmpdir, expect no "Could not find" warning) and the env override path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

EagleEyez1 · 2026-04-19T14:37:18Z

+1 — hit this exact issue running gbrain from ~ with the bundled skills under node_modules/gbrain/skills/. Every doctor run was taking the -5 health penalty even though everything was wired correctly.

Your three-tier resolution (env var → cwd walk → install walk) is cleaner than the two-tier fallback I was carrying locally, and the GBRAIN_SKILLS_DIR override is nice for Docker / monorepo cases I hadn't thought through. Would love to see this merged.

@FUSED-ID

…788 + #536 + #376 + #128 adapted) (#804)

* fix: merge resolver entries from all files (RESOLVER.md + AGENTS.md)

OpenClaw deployments typically have AGENTS.md at the workspace root as the real skill dispatcher (200+ entries), while gbrain skillpacks install a thin skills/RESOLVER.md (~40 entries). The previous first-match-wins policy meant check-resolvable only saw the thin RESOLVER.md, reporting 187 skills as 'unreachable' when they were fully routed in AGENTS.md.

Now: check-resolvable collects entries from ALL resolver files across both the skills directory and its parent. Entries are deduped by skillPath (first occurrence wins). The combined content is also passed to the routing-eval (Check 5) so routing fixtures see the full trigger index.

New function findAllResolverFiles() in resolver-filenames.ts returns all matching files instead of just the first. findResolverFile() is unchanged (backward-compatible for callers that need a single path).

Before: 37/224 reachable (our deployment)
After: 200/224 reachable (remaining 24 are genuine gaps)

Tests: 8 new (findAllResolverFiles + checkResolvable merge behavior)

* fix: graph_coverage skipped when brain has 0 entity pages

Closes #530.

`graph_coverage` measures `link_coverage` (fraction of entity pages with inbound links) and `timeline_coverage` (fraction with timeline entries). Both formulas divide by entity-page count. For markdown-only brains (journals, wikis, notes — Karpathy's original LLM Wiki use case) the entity count is 0, so coverage is structurally undefined.

The check still reported 'warn: 0%' under that condition, which:

1. Brain owners cannot satisfy without indexing code/entities
2. Doctor's hint references stale commands (`link-extract` / `timeline-extract` were renamed to `extract` in v0.22)
3. Adds noise to compliance/health automation gating on doctor exit

Fix: detect entity-page count via SQL. If 0, mark check 'ok' with explanation.
Otherwise keep existing logic but update hint to current `gbrain extract all`.

Tested on Nous AGaaS production wiki: 2533 markdown pages, 100% embedded, 6086 wikilinks, 1964 timeline entries — 0 entity pages — graph_coverage correctly clears.

* fix(doctor): deprecate stale link-extract / timeline-extract verb names

The graph_coverage hint and the link-extraction.ts header comment still referenced `gbrain link-extract` / `gbrain timeline-extract`, which were consolidated into `gbrain extract <links|timeline|all>` in v0.16. Following the consolidation in #536's resolution (which fixed the doctor hint to `gbrain extract all`), this commit removes the last stale reference in `src/core/link-extraction.ts`'s header comment.

Originally PR #376 by @FUSED-ID. The doctor.ts portion of #376 is absorbed by #536's richer warn message; this commit lands #376's `link-extraction.ts` portion only.

Co-Authored-By: Leon-Gerard Vandenberg <FUSED-ID@users.noreply.github.com>

* test(doctor): pin canonical `gbrain extract all` hint, ban stale verbs

IRON-RULE regression guard for PR #376 + #536's graph_coverage hint fix (locked in v0.31.7 eng-review). The removed verbs `gbrain link-extract` and `gbrain timeline-extract` were consolidated into `gbrain extract <links|timeline|all>` in v0.16 but the hint kept suggesting them for ~30 releases. Pin the user-facing copy at the source-string level so a future edit can't silently re-regress.

Structural assertion in the existing `doctor command` describe block, matching the file's existing `frontmatter_integrity` / `rls_event_trigger` pattern. No DB-fixture infrastructure needed.

* fix: sync RESOLVER.md triggers with v0.25.1 skill frontmatter

`gbrain doctor` reported 36 routing-miss/ambiguous warnings against the v0.25.1 wave skills (book-mirror, article-enrichment, strategic-reading, concept-synthesis, perplexity-research, archive-crawler, academic-verify, brain-pdf, voice-note-ingest).
Each skill's frontmatter declared 4-5 triggers, but only the first ever made it into RESOLVER.md's hand-curated rows. The structural matcher couldn't find any specific phrase for realistic user intents, so requests fell through to broader parents (`ingest`, `enrich`, `data-research`).

Pulled the missing triggers from each skill's `triggers:` frontmatter into the matching RESOLVER.md row. Converted media-ingest's prose row to quoted triggers so the matcher actually sees them. Added `"summarize this book"` to media-ingest (covers a book-mirror disambiguation fixture). Marked article-enrichment + perplexity-research fixtures with `ambiguous_with` for the parent skills they intentionally chain with — RESOLVER.md's preamble explicitly documents that skills are designed to chain, so this is acknowledging the truth, not papering over a bug.

Result: 36 routing warnings → 0. resolver-test/check-resolvable/routing-eval suite: 140/0.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(doctor): find skills/ on every deployment shape (read-path-only)

Adapts the install-path resolution from PR #128 (TheAndersMadsen) into the existing 5-tier autoDetectSkillsDir architecture. Two new code paths, read-path-only by design:

1. Tier-0 $GBRAIN_SKILLS_DIR explicit operator override on the SHARED autoDetectSkillsDir. Safe for both read and write paths because the operator explicitly set the var — opt-in retargeting is fine.
2. New autoDetectSkillsDirReadOnly() function for READ-ONLY callers (gbrain doctor, check-resolvable, routing-eval). Wraps the shared detect; on null, walks up from fileURLToPath(import.meta.url) gated by isGbrainRepoRoot() so unrelated repos along the install path can't false-positive.
The split is the architectural fix for a write-path regression risk codex outside-voice review surfaced (eng-review D5): adding the install-path fallback to the SHARED resolver would let `gbrain skillpack install` from `~` silently target the bundled gbrain repo's skills/ instead of the user's actual workspace. Three write-path call sites stay on the original autoDetectSkillsDir; three read-path call sites switch to the new readOnly variant.

Closes the install-path footgun for hosted-CLI installs: `bun install -g github:garrytan/gbrain && cd ~ && gbrain doctor` now finds the bundled skills/ instead of warning "Could not find skills directory."

Test surface: 8 new cases in test/repo-root.test.ts covering tier-0 valid/invalid/precedence, install-path walk, isGbrainRepoRoot gate (via primary-success-no-drift assertion), AUTO_DETECT_HINT updates, and the D5 regression guard that pins the read-path/write-path split.

Co-Authored-By: Anders Madsen <TheAndersMadsen@users.noreply.github.com>

* docs(changelog): expand v0.31.7 entry for full 5-PR doctor wave

Promotes headline from "doctor stops crying wolf about unreachable skills on OpenClaw" to the assembled wave's narrative: every doctor false-positive class on disk today, plus the install-path footgun that bit every hosted-CLI user.

Numbers-that-matter table expanded to 6 rows covering all 5 PRs. Itemized-changes section grouped by sub-wave: resolver merge, RESOLVER.md trigger sync, graph_coverage zero-entity, stale verb hint fix, install-path resolver. Contributors named explicitly: @mayazbay, @psperera, @FUSED-ID, @TheAndersMadsen.

"For contributors" section flags the new SkillsDirSource variants and the read-path / write-path split as the canonical pattern for future fallback additions.
* chore(v0.31.7): bump version + regenerate llms + fix CLI regression-gate

Wraps up the v0.31.7 doctor-fix wave:

- VERSION + package.json: 0.31.1.1-fixwave -> 0.31.7
- llms-full.txt: regenerated against the expanded v0.31.7 CHANGELOG entry (committed bundle drift caught by test/build-llms.test.ts)
- test/check-resolvable-cli.test.ts: update the REGRESSION-GATE for empty-cwd no_skills_dir error to reflect v0.31.7's intentional behavior change. The install-path fallback in autoDetectSkillsDirReadOnly now finds the bundled skills/ from any cwd inside the gbrain repo, so the test asserts source: 'install_path' instead of error: 'no_skills_dir'. This is the wave's headline capability ("doctor finds itself on every deployment shape") rather than a regression.

Pre-existing flake unrelated to this wave: BrainRegistry — lazy init > empty/null/undefined id routes to host fails on machines that have ~/.gbrain/config.json present (the test assumes test env has none). Reproduces on master before this wave landed; not a v0.31.7 regression. Filed for follow-up in next maintainer hygiene sweep.

* fix(doctor): close write-path leak in --fix + sync routing-eval merge

Codex adversarial review of v0.31.7 caught a HIGH that the eng review missed (D6 lock during /ship): the read-path-only architecture for the install-path fallback is leaky because TWO of the three "read-only" callers (doctor, check-resolvable) actually have write modes via --fix that call autoFixDryViolations() and writeFileSync to SKILL.md files. A user running `cd ~ && gbrain doctor --fix` with no skills/RESOLVER.md up the cwd tree would resolve via the install-path fallback to the bundled gbrain repo and silently rewrite the install-tree skills — exactly the regression D5's split was supposed to prevent.

Fix: when --fix is requested and the resolved skills dir came from the install-path source, refuse with a clear error pointing at GBRAIN_SKILLS_DIR / OPENCLAW_WORKSPACE / --skills-dir as explicit overrides.
The read parts of doctor and check-resolvable continue to benefit from the install-path fallback (the v0.31.7 capability headline); only --fix is gated.

Plus a MEDIUM consistency fix codex flagged: routing-eval was still single-file-only while check-resolvable does multi-file merge across skills/RESOLVER.md + ../AGENTS.md. On OpenClaw layouts this caused routing-eval and check-resolvable to disagree on what's routable. routing-eval now uses the same findAllResolverFiles + content-merge pattern as check-resolvable, so all three commands see the same trigger index.

Test coverage: D6 regression guard in test/check-resolvable-cli.test.ts spawning a real subprocess from an empty tempdir (no env, no cwd fallback) and asserting --fix refuses with the correct stderr message.

Co-Authored-By: Codex (outside-voice review) <noreply@openai.com>

* docs(changelog): note D6 --fix gate + routing-eval merge in v0.31.7 entry

* docs: post-ship sync for v0.31.7

CLAUDE.md updates only. CHANGELOG.md was already authored by /ship and was left untouched.

- src/core/repo-root.ts annotation: read-path/write-path split, tier-0 GBRAIN_SKILLS_DIR override, autoDetectSkillsDirReadOnly install-path fallback, D6 --fix safety gate.
- src/commands/check-resolvable.ts annotation: multi-file resolver merge across skills dir + parent (37/224 -> 200/224 reachable on the reference OpenClaw layout), install-path read-only fallback, D6 --fix gate.
- src/commands/routing-eval.ts annotation: same multi-file merge as check-resolvable; v0.25.1 RESOLVER.md trigger sync.
- src/commands/doctor.ts annotation: switched to autoDetectSkillsDirReadOnly so 'cd ~ && gbrain doctor' finds bundled skills via install-path fallback; --fix D6 install-path refuse-write gate; graph_coverage zero-entity short-circuit + canonical 'gbrain extract all' hint with regression-test pin.
- Test inventory: replaced bare regression-v0_16_4 line with explicit test/repo-root.test.ts entry (20 cases - 12 existing + 8 new D3/D5) and new test/resolver-merge.test.ts entry (8 cases).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs(llms): regenerate after CLAUDE.md sync for v0.31.7

* ci(test): quarantine *.serial.test.ts files from test-shard

CI's test-shard.sh was including *.serial.test.ts files in the parallel shard runs, which broke voyage-multimodal.test.ts: 18 of its 22 tests failed in CI shard 2 because eval-takes-quality-runner.serial.test.ts ran before it in the same bun-test process and leaked its mock.module() substitution of src/core/ai/gateway.ts. The leaked mock omitted embedMultimodal and resetGateway, so voyage-multimodal saw `undefined is not a function` everywhere it touched the gateway.

Locally `bun run test` (run-unit-parallel.sh → run-unit-shard.sh) already excludes *.serial.test.ts and runs them via `bun run test:serial` in their own pass with --max-concurrency=1. Master ran green there; only CI's matrix shards exposed the leak. The runner.serial test file's own header comment explicitly calls out this exact cross-file mock leak — the quarantine was the design, CI just wasn't honoring it.

Three changes:

1. scripts/test-shard.sh — exclude *.serial.test.ts and *.slow.test.ts from the find expression, mirroring scripts/run-unit-shard.sh.
2. .github/workflows/test.yml — add a `test-serial` sibling job that runs `bun run test:serial`. Keeps serial tests gating CI without merging them back into the parallel shards.
3. test/scripts/test-shard.test.ts — regression test pinning the three exclusion clauses (serial, slow, e2e) so a future refactor that drops one of them fails loud rather than silently re-introducing the cross-file mock leak.
Verified locally:

- shard 2 reproduction: 18 voyage-multimodal failures → 0 (1 unrelated env-dependent perf flake remains, won't fail on CI)
- bun run test:serial: 189/190 pass (1 unrelated env-dependent BrainRegistry flake from ~/.gbrain/config.json presence)
- typecheck + check:test-isolation clean

* ci(test): rephrase mock-module comment to satisfy R2 lint

The verify gate's check:test-isolation flagged test/scripts/test-shard.test.ts because the JSDoc comment contained the literal string 'mock.module()' which matches R2's grep regex 'mock\.module[[:space:]]*\('. The file itself doesn't use mock.module — it just describes why the linter rule exists in human-readable prose.

Rephrased to avoid the trailing parens. The regex requires the open paren, so 'bun's module-mocking primitive' instead of 'mock.module()' is invisible to the linter while preserving meaning for the next maintainer who reads the test.

* docs(claude): tighten version-consistency rules + add merge recovery procedure

After several merges from master where VERSION + package.json + CHANGELOG.md drifted out of sync (each merge hit conflicts on those three files; auto-merge sometimes resolved silently in the wrong direction), CLAUDE.md gets an explicit drift-recovery checklist + a 3-line paste-ready audit command anyone can run.

Three additions to the existing "Version locations" section:

1. **Mandatory audit command** — three echo lines that print VERSION, package.json version, and the top CHANGELOG header. All three MUST match the wave's `MAJOR.MINOR.PATCH.MICRO`. Designed for paste-after-every-merge use.
2. **Merge-conflict recovery procedure** — exact sed/echo patterns for resolving VERSION + package.json + CHANGELOG conflicts, in the order to apply them. Names the anti-pattern (mixing `git checkout --ours` on the trio) that's bitten us before.
3. **Pre-push gate** — re-run the audit before `git push` of any merge commit.
/ship Step 12 catches drift but only if you actually run /ship; manual pushes skip the check.

Confirmed consistent at d361482, 7e8f696, 65a5994 (every merge commit on this branch). The doc gap was the rules being too loose, not the rules being wrong — this beefs up the procedural side so the next merge can't silently desync.

* docs(llms): regenerate after CLAUDE.md edit + tighten the rule

CI failed on the build-llms generator test because CLAUDE.md edited in fe050ae (version-consistency procedure) shipped without a matching `bun run build:llms` regen. The committed llms-full.txt was 77 lines short of fresh generator output, and test/build-llms.test.ts caught the drift in CI shard 1.

Two changes:

1. llms.txt + llms-full.txt — regenerated to match current CLAUDE.md.
2. CLAUDE.md — strengthened the "Auto-derived" entry for llms.txt / llms-full.txt with explicit "every CLAUDE.md edit chases with `bun run build:llms` in the same commit" wording. Notes that `verify` doesn't run the build-llms test, only the full unit suite does, so a clean typecheck is NOT enough to know you can push after touching CLAUDE.md.

This is now the third time this has bitten the wave. The previous "Auto-derived" entry said the right thing but was buried in a list; elevating it to imperative voice with a count of past regressions should make the next CLAUDE.md edit hard to land without the chaser.

---------

Co-authored-by: garrytan-agents <garrytan-agents@users.noreply.github.com>
Co-authored-by: Madi Ayazbay <madia@Mac.localdomain>
Co-authored-by: Leon-Gerard Vandenberg <FUSED-ID@users.noreply.github.com>
Co-authored-by: psperera <pperera@mac.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: Anders Madsen <TheAndersMadsen@users.noreply.github.com>
Co-authored-by: Codex (outside-voice review) <noreply@openai.com>
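
The read-path / write-path split and the `--fix` gate from the commit message above can be illustrated with a small sketch. It is not the code in `src/core/repo-root.ts`: the `Detector` type, `detectReadOnly`, `assertWritable`, the `"env"` and `"cwd"` source labels, and the error wording are all hypothetical. Only the `install_path` source value and the override names (`GBRAIN_SKILLS_DIR`, `OPENCLAW_WORKSPACE`, `--skills-dir`) come from the text.

```typescript
// Where a resolved skills directory came from. "install_path" appears in the
// commit message; the other two labels are hypothetical placeholders.
type SkillsDirSource = "env" | "cwd" | "install_path";

interface ResolvedSkillsDir {
  dir: string;
  source: SkillsDirSource;
}

// Hypothetical detector shape: returns a resolved directory or null.
type Detector = () => ResolvedSkillsDir | null;

/**
 * Read-only resolution: try the shared (write-safe) detector first and use
 * the install-path detector only when the shared one finds nothing.
 */
function detectReadOnly(shared: Detector, installPath: Detector): ResolvedSkillsDir | null {
  return shared() ?? installPath();
}

/**
 * Gate for write modes such as --fix: refuse when the directory was found via
 * the install-path fallback, because writing there would modify the bundled
 * install tree rather than the user's workspace.
 */
function assertWritable(resolved: ResolvedSkillsDir, fix: boolean): void {
  if (fix && resolved.source === "install_path") {
    throw new Error(
      `Refusing --fix: ${resolved.dir} was resolved from the install path. ` +
        "Set GBRAIN_SKILLS_DIR, OPENCLAW_WORKSPACE, or --skills-dir explicitly.",
    );
  }
}
```

Carrying the source alongside the path is what makes the gate possible: read-only callers can accept any source, while a write mode checks the source before touching files.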

alexputici mentioned this pull request Apr 20, 2026: bug: doctor resolver_health false positive when invoked outside gbrain repo #283 (Closed)

garrytan mentioned this pull request May 10, 2026: v0.31.7 fix-wave: doctor stops crying wolf — 5 community PRs (#798 + #788 + #536 + #376 + #128 adapted) #804 (Merged)

garrytan mentioned this pull request May 24, 2026: fix: merge resolver entries from all files (RESOLVER.md + AGENTS.md) #798 (Closed)

TheAndersMadsen wants to merge 1 commit into garrytan:master from TheAndersMadsen:fix-resolver-fallback-paths
