v0.41.4.0 wave: local providers + cross-platform stdin + gateway-routed dream judge (6 community PRs) #1377
Merged
…eads
`readFileSync('/dev/stdin', 'utf-8')` works on Unix but fails on Windows
(Git Bash, PowerShell, cmd) with `ENOENT: no such file or directory,
open '/dev/stdin'`. Windows doesn't expose `/dev/stdin` as a filesystem
path.
Reading file descriptor 0 directly (`readFileSync(0, 'utf-8')`) is the
documented Node.js idiom and works on every platform. No behavior change
on Unix — same syscall path, same semantics.
Repro on Windows before the fix:
echo "test" | gbrain put my-page
ENOENT: no such file or directory, open '/dev/stdin'
After: round-trip put/search/delete works on Windows Git Bash.
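A minimal sketch of the fd-0 read path described above. `decodeCapped`, `readStdinText`, and `MAX_STDIN_BYTES` are hypothetical names. The 5MB cap and the `isTTY` gate mirror what the R4 regression pin checks for in `parseOpArgs`, but the code is illustrative, not gbrain's `src/cli.ts`.

```typescript
import { readFileSync } from 'node:fs';

// Illustrative cap; the R4 regression pin mentions a 5MB limit on piped input.
export const MAX_STDIN_BYTES = 5 * 1024 * 1024;

// Enforce the size cap, then decode as UTF-8.
export function decodeCapped(buf: Buffer, maxBytes: number = MAX_STDIN_BYTES): string {
  if (buf.length > maxBytes) {
    throw new Error(`stdin is ${buf.length} bytes; the limit is ${maxBytes}`);
  }
  return buf.toString('utf-8');
}

// Read stdin through file descriptor 0, which works on Linux, macOS, and
// Windows, unlike the Unix-only '/dev/stdin' path. Returns null for an
// interactive terminal so the command never blocks waiting for typed input.
export function readStdinText(): string | null {
  if (process.stdin.isTTY) return null;
  return decodeCapped(readFileSync(0));
}
```

This sketch reads all of stdin before checking the cap. A streaming reader could reject oversized input earlier, at the cost of more code.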
…via llama.cpp
Adds local reranker support so users can point gbrain's reranker call at their own llama.cpp server instead of ZeroEntropy's hosted API. The change set: one new recipe (`llama-server-reranker`); a `path?: string` + `default_timeout_ms?: number` extension on `RerankerTouchpoint`; env passthrough wiring; a budget-tracker `FREE_LOCAL_RERANK_PROVIDERS` set so `--max-cost` callers don't TX2 hard-fail on local rerank; and a doctor-probe divergence fix (probe and live search now read the same `search.reranker.model` path via `loadSearchModeConfig` + `resolveSearchMode`).
ZE-hosted users are unchanged. Voyage / Cohere / vLLM rerankers stay out of scope: their wire shapes differ, so they need adapter hooks designed against those actual shapes in a follow-up plan.
Verification:
- `bun run verify` (typecheck + 13 pre-checks): clean
- `bun run check:all` (15 historical checks): clean
- 107/107 expect() calls pass across 5 affected test files
- /codex review against the full diff: GATE PASS (caught one [P2] /v1 path-doubling bug pre-merge; fixed by changing the recipe path to the leaf `/rerank`)
- Claude adversarial subagent: 7 net-new findings filed as v0.40.7+ TODOs (none currently exploitable; hardening against future contributor traps)
Test surface (107 cases, 5 files):
- test/ai/rerank.test.ts: path override (exact URL match), default_timeout_ms honored, empty models[] accepts any id, ZE regression
- test/ai/recipe-llama-server-reranker.test.ts: recipe shape regression guard + base_url + path concat assertion (codex-caught /v1/v1/ regression)
- test/search-mode.test.ts: timeout precedence chain (per-call > config > recipe > bundle), ZE no-recipe-default regression, unknown provider fallthrough
- test/models-doctor-reranker.test.ts: divergence-fix helper across DB-plane read, mode default, disabled, override, DB-error graceful fallback
- test/core/budget/budget-tracker.test.ts: free-local rerank pricing + arbitrary model id + chat-kind TX2 hard-fail preserved
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
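Plain string joining is enough to reproduce the `/v1/v1/` regression that codex caught. In this sketch, `joinEndpoint` is a hypothetical helper, not gbrain's code, and `http://localhost:8080/v1` is only an example base URL. The point is that when `base_url` already ends in `/v1`, the recipe path has to be the leaf `/rerank`.

```typescript
// Hypothetical helper: join a recipe's base_url with its endpoint path.
export function joinEndpoint(baseUrl: string, path: string): string {
  const base = baseUrl.replace(/\/+$/, '');               // drop trailing slashes
  const leaf = path.startsWith('/') ? path : `/${path}`;  // ensure exactly one leading slash
  return `${base}${leaf}`;
}

// Leaf path: correct URL.
joinEndpoint('http://localhost:8080/v1', '/rerank');    // "http://localhost:8080/v1/rerank"
// Versioned path on a versioned base: the doubling bug.
joinEndpoint('http://localhost:8080/v1', '/v1/rerank'); // "http://localhost:8080/v1/v1/rerank"
```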
…r-reranker) The hand-curated llms-config.ts doc map never included docs/ai-providers/, so both zeroentropy.md (since v0.35.0.0) and the new llama-server-reranker.md were invisible to the AI-facing llms.txt / llms-full.txt index. Adds an "AI providers" section with both. Marked includeInFull: false (setup walkthroughs belong in the index but would push the single-fetch bundle past FULL_SIZE_BUDGET) — same treatment CHANGELOG.md gets. Caught by the /ship document-release subagent. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
doctor --remediation-plan and autopilot both judged the embedding provider with a hosted-only key check, so a brain on ollama: or llama-server: was reported "blocked" on a missing API key it never needed, contradicting doctor --json's 100%-coverage health. Extract a shared embeddingProviderConfigured() helper into brain-score-recommendations.ts: empty auth_env.required (local providers) is configured with no key; hosted providers check their OWN required key. Both producers (doctor, autopilot) call it, killing the DRY violation that caused the bug. Hosted brains with a missing key still block.
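The shared check described above can be sketched as follows. The sketch assumes a reduced recipe shape with only an `auth_env.required` list. `embeddingProviderConfiguredSketch`, `EmbeddingRecipeLike`, and the `configKeys` parameter are hypothetical and do not reflect gbrain's real signatures. `configKeys` stands for config.json values the gateway can actually consume, keyed by env var name.

```typescript
// Hypothetical, reduced recipe shape: only the field this check reads.
export interface EmbeddingRecipeLike {
  id: string;
  auth_env: { required: string[] };
}

// True when the provider can embed without further setup.
export function embeddingProviderConfiguredSketch(
  recipe: EmbeddingRecipeLike,
  env: Record<string, string | undefined>,
  configKeys: Record<string, string | undefined> = {},
): boolean {
  const required = recipe.auth_env.required;
  // Local providers (ollama, llama-server) declare no required key.
  if (required.length === 0) return true;
  // Hosted providers: check THIS provider's own keys, not some other provider's.
  return required.every((name) => Boolean(env[name] || configKeys[name]));
}
```

Because doctor and autopilot would both call one function, they cannot drift apart again. Given the Voyage/Google false positive fixed in this same commit, only keys the gateway actually threads through should ever appear in `configKeys`.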
A --max-cost-bounded embed/reindex job configured for ollama: or llama-server: TX2 hard-failed with no_pricing because lookupEmbeddingPrice has no entry for local models. Add FREE_LOCAL_EMBED_PROVIDERS (sibling to FREE_LOCAL_RERANK_PROVIDERS) so a pricing miss on a local-inference provider returns $0 instead of null. lmstudio/litellm intentionally excluded.
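The pricing change is easiest to see as a lookup with a free-local fallback. `embedPricePerMTok` and its price table are hypothetical. The single table entry is a made-up placeholder, not a real provider price, and `null` stands for the "no pricing" result that makes a `--max-cost` caller hard-fail.

```typescript
// Hypothetical price table (USD per 1M tokens). Placeholder entry only.
const EMBED_PRICE_PER_MTOK: Record<string, number> = {
  'hosted:example-embed-model': 1.0,
};

// Local-inference providers cost nothing per token.
// lmstudio / litellm are intentionally absent, matching the commit above.
export const FREE_LOCAL_EMBED_PROVIDERS: ReadonlySet<string> = new Set(['ollama', 'llama-server']);

// Returns a price, 0 for a local-provider miss, or null when the cost is unknown.
export function embedPricePerMTok(model: string): number | null {
  const known = EMBED_PRICE_PER_MTOK[model];
  if (known !== undefined) return known;
  const colon = model.indexOf(':');
  const provider = colon === -1 ? '' : model.slice(0, colon);
  if (FREE_LOCAL_EMBED_PROVIDERS.has(provider)) return 0;
  return null;
}
```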
A down/misconfigured local embed server was invisible until first embed. Add probeEmbeddingReachability() (mirrors the reranker probe): a 1-input embed with a 5s abort timeout, classified via classifyError, under a new 'embedding_reachability' touchpoint, gated on the zero-network config probe returning ok first.
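The reachability probe can be sketched as a 1-input embed raced against an abort timer. `probeEmbedding`, `EmbedFn`, and `ProbeResult` are hypothetical names. The sketch also simplifies error classification to the error message, whereas the real probe uses `classifyError` and runs only after the config probe passes. The abort signal is passed through so a real embed call could cancel its HTTP request.

```typescript
export type EmbedFn = (inputs: string[], signal: AbortSignal) => Promise<number[][]>;
export type ProbeResult = { ok: true; ms: number } | { ok: false; reason: string };

export async function probeEmbedding(embed: EmbedFn, timeoutMs = 5000): Promise<ProbeResult> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  // Rejects when the timer fires, even if `embed` ignores the abort signal.
  const timedOut = new Promise<never>((_, reject) => {
    controller.signal.addEventListener('abort', () => reject(new Error(`timeout after ${timeoutMs}ms`)));
  });
  const started = Date.now();
  try {
    const vectors = await Promise.race([embed(['probe'], controller.signal), timedOut]);
    if (vectors.length !== 1) return { ok: false, reason: `expected 1 vector, got ${vectors.length}` };
    return { ok: true, ms: Date.now() - started };
  } catch (err) {
    return { ok: false, reason: err instanceof Error ? err.message : String(err) };
  } finally {
    clearTimeout(timer); // never leave the timer running after the probe settles
  }
}
```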
codex review caught a false positive: HOSTED_EMBED_KEY_CONFIG mapped VOYAGE_API_KEY/GOOGLE_GENERATIVE_AI_API_KEY to config fields, but buildGatewayConfig only threads openai/anthropic/zeroentropy config keys into the gateway env. A Voyage/Google brain with the key only in config.json would be judged "configured" and dispatch an embed.stale job that then fails auth at the gateway. Drop those two from the map so the producer closures resolve them by env var only, matching what the gateway can actually use. Pinned by a regression test. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…provider support Replaces the hardcoded `new Anthropic()` client in the dream-cycle synthesize phase with a gateway-routed JudgeClient adapter. Mirrors the v0.35.5.0 pattern that closed #952 for runThink: construction-time provider/key probe returns null on a clear miss (cheap pre-flight); the verdict loop wraps the chat call in try/catch for AIConfigError mid-run. Any provider with a registered gateway recipe (Anthropic, DeepSeek, OpenRouter, Voyage, Ollama, llama-server, etc.) is now reachable via: gbrain config set models.dream.synthesize_verdict <provider>:<model> The canonical config key `models.dream.synthesize_verdict` (per PER_TASK_KEYS in src/core/model-config.ts) is used unchanged. The exported JudgeClient interface signature is preserved for test-seam stability. The original community PR (#1349) shipped a custom fetch adapter that bypassed the gateway entirely. This reworked landing routes through the canonical seam so future provider additions automatically benefit, and a CI guard (T7) will land in this wave to prevent the bug class from re-opening (the same one that bit src/core/think/index.ts before v0.35.5.0). Co-Authored-By: justemu <206393437+justemu@users.noreply.github.com>
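The adapter's two-stage design can be sketched with simplified stand-in types: a cheap construction-time pre-flight that returns null on a clear miss, followed by call-time routing through a chat function. Every name here (`makeJudgeClientSketch`, `ChatFn`, `MessageLike`, the provider set) is hypothetical. The real adapter returns an `Anthropic.Message`-shaped object and delegates to `gateway.chat`.

```typescript
interface ChatResult { text: string }
type ChatFn = (model: string, prompt: string) => Promise<ChatResult>;
interface MessageLike { content: Array<{ type: 'text'; text: string }> }
interface JudgeClientLike { create(prompt: string): Promise<MessageLike> }

// Illustrative set of providers with a registered chat recipe.
const CHAT_PROVIDERS = new Set(['anthropic', 'deepseek', 'openrouter', 'ollama', 'llama-server']);

export function makeJudgeClientSketch(
  model: string, // "<provider>:<model>"
  chat: ChatFn,
  hasAnthropicKey: () => boolean,
): JudgeClientLike | null {
  const colon = model.indexOf(':');
  const provider = colon === -1 ? '' : model.slice(0, colon);
  if (!CHAT_PROVIDERS.has(provider)) return null;                  // unknown prefix
  if (provider === 'anthropic' && !hasAnthropicKey()) return null; // cheap pre-flight
  return {
    // Other providers are not env-probed here; the gateway decides at call time.
    async create(prompt: string): Promise<MessageLike> {
      const result = await chat(model, prompt); // errors propagate to the caller
      return { content: [{ type: 'text', text: result.text }] };
    },
  };
}
```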
…t parity
11 cases pin the gateway-routed JudgeClient adapter from T5:
- A1: makeJudgeClient returns null on missing Anthropic key (legacy short-circuit preserved)
- A2: returns a JudgeClient when chat provider is reachable
- A3: JudgeClient.create routes through gateway.chat (via __setChatTransportForTests)
- A4: ChatResult.text → Anthropic.Message.content[0].text mapping
- A5: empty text from gateway → graceful empty-text Anthropic.Message
- A6: non-AIConfigError from gateway propagates to caller (no swallow)
- A7: AIConfigError from gateway propagates as AIConfigError (caught per-transcript in production loop)
- A8: makeJudgeClient returns null on unknown provider prefix
- A9: returns a JudgeClient for non-anthropic providers without env-probing (delegates to gateway at call time)
- R3: parsed-verdict SEMANTIC parity — gateway-routed and legacy SDK-shape JudgeClients produce same {worth_processing, reasons} given identical canned LLM text
- R3 corollary: unparseable LLM output → both paths fall through to cheap-fallback verdict
Codex flagged byte-identical-Anthropic.Message as a meaningless gate; R3 is
parsed-verdict semantic parity instead. Mirror pattern of
test/think-gateway-adapter.test.ts for cross-site consistency with the
v0.35.5.0 runThink migration.
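R3 compares parsed verdicts rather than message bytes. A sketch of that idea, assuming the judge emits a JSON object with `worth_processing` and `reasons`, would run both clients' text through one parser and compare the results. `parseVerdict` and `CHEAP_FALLBACK` are hypothetical stand-ins for the real parser and cheap-fallback verdict.

```typescript
export interface Verdict { worth_processing: boolean; reasons: string[] }

// Hypothetical cheap-fallback verdict for unparseable model output.
export const CHEAP_FALLBACK: Verdict = { worth_processing: false, reasons: ['unparseable judge output'] };

// Extract the first '{' ... last '}' span and validate its shape.
export function parseVerdict(text: string): Verdict {
  const match = text.match(/\{[\s\S]*\}/);
  if (!match) return CHEAP_FALLBACK;
  try {
    const raw = JSON.parse(match[0]) as Partial<Verdict>;
    if (typeof raw.worth_processing !== 'boolean') return CHEAP_FALLBACK;
    const reasons = Array.isArray(raw.reasons) ? raw.reasons.map(String) : [];
    return { worth_processing: raw.worth_processing, reasons };
  } catch {
    return CHEAP_FALLBACK;
  }
}
```

Parity then means `parseVerdict(gatewayText)` deep-equals `parseVerdict(legacyText)` for the same canned LLM text, and unparseable text yields the fallback on both paths.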
… files New scripts/check-gateway-routed-no-direct-anthropic.sh greps two guarded files (src/core/cycle/synthesize.ts and src/core/think/index.ts) for `new Anthropic()` constructor calls and runtime imports of @anthropic-ai/sdk. Type-only imports (`import type Anthropic from '@anthropic-ai/sdk'`) stay allowed because both files use Anthropic.Message / .MessageCreateParamsNonStreaming as adapter types. Comment lines (starting with `//` or ` *`) are excluded so historical references in JSDoc don't false-fire. Negative test in this commit's verification confirms: injecting `new Anthropic()` into synthesize.ts makes the guard exit 1 with a clear error pointing at the gateway adapter pattern; reverting restores the OK state. Wired into both `bun run verify` and `bun run check:all`. Closes the bug class that bit synthesize.ts in PR #1349 (which would have shipped a parallel fetch stack instead of routing through the canonical gateway). The same class previously bit think/index.ts and was fixed structurally in v0.35.5.0; this guard prevents either file from regressing. Extend GUARDED_FILES in the script when migrating another file off direct SDK construction.
…-file
Extends the put_page op description (surfaced by `gbrain put --help`) with a one-line pointer to `gbrain capture --file PATH --slug SLUG` for the file-as-input use case. Capture (v0.39.3.0) is the canonical Windows-pipe-buffer escape route: it reads files as a Buffer first, scans the first 8KB for NUL bytes to refuse binary content, decodes to UTF-8 only after the safety check, and adds provenance write-through. Lands the user-facing value the closed PR #1365 was reaching for, without duplicating the CLI surface. Credits the original contributor.
Co-Authored-By: ecat2010 <90021101+ecat2010@users.noreply.github.com>
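Capture's bytes-first safety check, as described above, is a NUL-byte sniff before decoding. `looksBinary` and `readTextFileSafely` are hypothetical names, not capture's implementation. A NUL byte in the first 8KB is a cheap heuristic: it catches most binary formats, but it does not prove the rest of the file is text.

```typescript
import { readFileSync } from 'node:fs';

const SNIFF_BYTES = 8 * 1024;

// True if a NUL byte appears in the first 8KB.
export function looksBinary(buf: Buffer): boolean {
  return buf.subarray(0, SNIFF_BYTES).includes(0);
}

// Read raw bytes first; decode to UTF-8 only after the safety check passes.
export function readTextFileSafely(path: string): string {
  const buf = readFileSync(path);
  if (looksBinary(buf)) {
    throw new Error(`${path} looks binary (NUL byte in first ${SNIFF_BYTES} bytes)`);
  }
  return buf.toString('utf-8');
}
```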
…ding
Per the wave's eng-review plan (IRON RULE — mandatory):
R1 — get_page handler accepts calls without `content` param. Pre-wave
PR #1365 landed its `!p.content → throw` check in the WRONG handler
(get_page instead of put_page), which would have broken every read
in the system. Pin: get_page MUST NOT require content + the schema
carries no `content` or `file` param.
R2 — put_page schema content stays `required: true`. PR #1365 also
flipped `content` from required→optional in the schema. Pin: the
contract stays at `required: true` + the closed PR's `file` param
is NOT in the schema.
R4 — Cross-platform stdin via fd 0 (PR #1325 regression pin). Source-grep
asserts src/cli.ts uses `readFileSync(0, ...)` and NOT the legacy
`readFileSync('/dev/stdin', ...)`. Belt-and-suspenders pattern
assertions confirm the parseOpArgs branch shape (cliHints.stdin
check, 5MB cap, isTTY gate) hasn't drifted.
R3 (gateway-adapter parsed-verdict parity) lives in the sibling file
test/cycle/synthesize-gateway-adapter.test.ts.
…icity After T5's gateway-adapter rework, the "no API key" verdict text changed from 'no ANTHROPIC_API_KEY for significance judge' to 'no configured provider for verdict model: <model>' (broader + names the actual model so the user sees WHICH provider failed). Update both assertions that check the old text. Hermeticity bug fix in the same commit: `withoutAnthropicKey` previously only cleared the env var. After the rework, `makeJudgeClient` ALSO checks `loadConfig().anthropic_api_key` (same hasAnthropicKey() pattern think/index.ts uses since v0.35.5.0). If the developer running the test has the key set in ~/.gbrain/config.json, the test would behave non-deterministically. Fix: override GBRAIN_HOME to a fresh tmpdir for the duration of the body, restore on return (even on throw).
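The hermeticity fix follows the standard save / override / restore-in-`finally` pattern. `withIsolatedGbrainHome` is a hypothetical helper name. This synchronous version only covers synchronous bodies; an async body would need an `async` wrapper that awaits it inside the `try`.

```typescript
import { mkdtempSync, rmSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Run `body` with GBRAIN_HOME pointed at a fresh temp dir, then restore the
// previous value (or unset it) even if `body` throws.
export function withIsolatedGbrainHome<T>(body: (home: string) => T): T {
  const previous = process.env.GBRAIN_HOME;
  const home = mkdtempSync(join(tmpdir(), 'gbrain-test-'));
  process.env.GBRAIN_HOME = home;
  try {
    return body(home);
  } finally {
    if (previous === undefined) delete process.env.GBRAIN_HOME;
    else process.env.GBRAIN_HOME = previous;
    rmSync(home, { recursive: true, force: true });
  }
}
```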
…-end
Drives runPhaseSynthesize against a real PGLite engine with the gateway
chat transport stubbed to throw AIConfigError on every call (simulates a
revoked/misconfigured provider surfacing mid-run). Asserts:
- Phase does NOT crash; converts the throw to a per-transcript verdict
with worth=false and reasons[0] matching "gateway error: ...".
- status='ok' so subsequent transcripts in the loop would continue
being judged (not visible in 1-transcript test, but the loop shape is
proven not to abort).
Pre-rework (T5), this code path didn't exist — judgeSignificance threw
directly to runPhaseSynthesize and crashed the whole phase. Pin so a
future regression that removes the try/catch fires loudly.
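The loop shape this test pins can be sketched as a per-item `try/catch`. `judgeAll` and `AIConfigErrorSketch` are hypothetical stand-ins for the synthesize loop and gbrain's `AIConfigError`. As an assumption, this sketch converts only config errors into verdicts and lets other errors propagate, matching the A6/A7 contract above.

```typescript
class AIConfigErrorSketch extends Error {}

interface JudgeVerdict { worth_processing: boolean; reasons: string[] }
type Judge = (transcript: string) => Promise<JudgeVerdict>;

// One failing transcript yields a "not worth processing" verdict
// instead of aborting the whole phase.
export async function judgeAll(transcripts: string[], judge: Judge): Promise<JudgeVerdict[]> {
  const verdicts: JudgeVerdict[] = [];
  for (const t of transcripts) {
    try {
      verdicts.push(await judge(t));
    } catch (err) {
      if (!(err instanceof AIConfigErrorSketch)) throw err; // unexpected errors still surface
      verdicts.push({ worth_processing: false, reasons: [`gateway error: ${err.message}`] });
    }
  }
  return verdicts;
}
```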
Two additions to the Key files section:
- src/core/cycle/synthesize.ts: appends a v0.41+ paragraph documenting the gateway-adapter rework (makeJudgeClient + AIConfigError catch loop + canonical config key + JudgeClient interface preserved + CI guard reference + test file references).
- scripts/check-gateway-routed-no-direct-anthropic.sh: new entry documenting the CI guard's contract, scope, and how to extend GUARDED_FILES when migrating another file off direct SDK construction.
CLAUDE.md drives /sync-gbrain and llms.txt generation; both need the wave's annotations to land BEFORE the llms regeneration step (T10).
Refreshes the auto-generated llms.txt bundles to pick up the CLAUDE.md annotations landed earlier in this wave (gateway-adapter synthesize.ts + check-gateway-routed-no-direct-anthropic.sh + the cherry-picked llama-server-reranker recipe). Pinned by test/build-llms.test.ts.
…anker
v0.40.6.1 introduced `llama-server-reranker` (21 chars), which overflowed
formatRecipeTable's static 14-char PROVIDER column. When the id is longer
than the column, padEnd is a no-op — the row starts with the tier name
directly, no space delimiter. test/providers.test.ts 'each recipe appears
at most once' iterates every recipe and asserts at least one row starts
with `${id} ` or `${id} `; with no space after `llama-server-reranker`,
the assertion fails and the recipe appears effectively missing from the
human-readable list.
Fix: compute column width dynamically as `max(14, max(id.length) + 1)` so
every id is followed by at least one space, regardless of length. Also
widens the separator rule to match. 14 stays as the floor so the existing
short-id rows (openai 6, ollama 6, anthropic 9, ...) keep their familiar
layout when llama-server-reranker isn't in the active recipe set.
10/10 cases in test/providers.test.ts pass after the fix.
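The dynamic-width rule can be sketched as follows. `formatRecipeRows` and the `RecipeRow` shape are hypothetical simplifications of `formatRecipeTable`, and the tier labels in the test are placeholders. `'llama-server-reranker'` is 21 characters, so the column grows to 22, while short-only recipe sets keep the 14-character floor.

```typescript
interface RecipeRow { id: string; tier: string }

const MIN_PROVIDER_WIDTH = 14;

// Size the PROVIDER column from the data: max(14, longest id + 1), so every id
// is followed by at least one space and padEnd never becomes a no-op.
export function formatRecipeRows(rows: RecipeRow[]): string[] {
  const width = Math.max(MIN_PROVIDER_WIDTH, ...rows.map((r) => r.id.length + 1));
  const header = 'PROVIDER'.padEnd(width) + 'TIER';
  const rule = '-'.repeat(header.length); // separator widens with the column
  return [header, rule, ...rows.map((r) => r.id.padEnd(width) + r.tier)];
}
```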
…mbed timeout TODO
Two pre-landing review absorptions:
- `src/commands/models.ts:154`: the help-text tip said `gbrain models doctor` "spends ~1 token per model", but the wave added an `embed(['probe'])` call AND a reranker probe. Generalize to "spends a minimal request per configured chat/embed/rerank surface" so the cost expectation matches reality.
- `TODOS.md`: file a follow-up to widen `default_timeout_ms` from RerankerTouchpoint to EmbeddingTouchpoint so `probeEmbeddingReachability` doesn't hardcode 5000ms while the sibling reranker probe reads the recipe's configured timeout. Local CPU embedding endpoints (llama-server) hit the same cold-start curve as Qwen3-Reranker-4B; the workaround today is "re-run the probe", per the existing JSDoc.
Other informational findings from pre-landing review either match established patterns (no behavioral test for `probeEmbeddingReachability`, matching `probeRerankerReachability`), are intentional choices documented in JSDoc (the `as unknown as Anthropic.Message` cast), or are micro-perf in non-hot paths (autopilot's 4 sequential `getConfig` awaits per 5-minute tick). All non-blocking.
…t JSDoc
Adversarial review caught two soft spots in the wave's new contracts:
1. `scripts/check-gateway-routed-no-direct-anthropic.sh` only matched the
default-import shape `import Anthropic from '@anthropic-ai/sdk'`. A future
contributor (or, more realistically, a future refactor) could bypass with:
- `import { Anthropic } from '@anthropic-ai/sdk'`
- `import { Anthropic as A } from '@anthropic-ai/sdk'`
- `import * as Anthropic from '@anthropic-ai/sdk'`
- `const x = await import('@anthropic-ai/sdk')`
Tightened the regex to match ANY value-shaped import from the SDK module
(excluding only the explicit `import type ... from '@anthropic-ai/sdk'`
form which the adapter's Anthropic.Message return type needs). Added a
second grep for dynamic imports. Verified all four bypass shapes now
trigger the guard against synthesize.ts; type-only import still passes.
2. `synthesize.ts:makeJudgeClient` JSDoc claimed the adapter "tolerates the
array-of-blocks shape for future flexibility" — but the mapping flattens
ONLY text blocks; `tool_use`, `tool_result`, image blocks silently
become empty strings. Today only `judgeSignificance` calls this and it
only sends string content, so no behavior bug. But the comment was
marketing future flexibility the code doesn't deliver. Narrowed to call
out the silent-drop and say to extend the mapping if a future caller
wires non-text content through.
Both wave-scope: the CI guard was added by the wave, the JSDoc was added
by the wave's T5 rework. Adversarial review caught them before merge.
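The silent-drop behavior that the narrowed JSDoc now calls out is easy to see in a simplified flattener. `flattenContent` and the `Block` union are hypothetical and much smaller than the SDK's real content types. Joining with an empty string is an assumption about how the adapter concatenates text.

```typescript
// Simplified content-block union; real SDK types have more variants and fields.
type Block =
  | { type: 'text'; text: string }
  | { type: 'tool_use'; name: string; input: unknown }
  | { type: 'image'; source: unknown };

// Only text blocks contribute; every other block type becomes ''.
// Extend this mapping if a caller ever sends non-text content.
export function flattenContent(content: string | Block[]): string {
  if (typeof content === 'string') return content;
  return content.map((b) => (b.type === 'text' ? b.text : '')).join('');
}
```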
…ence chain Codex Pass-9 adversarial review caught a probe-vs-production divergence: production `hybridSearch` resolves reranker timeout via the full chain (per-call > config > recipe > bundle) by going through `loadSearchModeConfig + resolveSearchMode`, but `probeRerankerReachability` was reading ONLY the recipe's `default_timeout_ms` — so an operator who set `search.reranker.timeout_ms=1000` would see doctor wait 30s and report "reachable" while production search timed out at 1s and fail-opened. A higher configured timeout produces the opposite false failure (probe gives up at 5s when production would have waited longer). Fix: extract `resolveLiveRerankerTimeoutMs(engine)` parallel to the existing `resolveLiveRerankerModel(engine)` — same precedence chain, same DB-plane consistency posture. The probe now reads the SAME timeout live search reads, on the same lookup path. The codex P1 finding about `FREE_LOCAL_*_PROVIDERS` zero-pricing being bypassable via redirected `LLAMA_SERVER_BASE_URL` is filed as a TODO under community-pr-wave follow-ups — couples with the existing FREE_LOCAL_PROVIDERS unification TODO so both close in one v0.41+ PR.
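The fix comes down to a single resolver that both callers share. `resolveRerankTimeoutMs` and `RerankTimeoutLayers` are hypothetical names for the precedence chain behind `resolveLiveRerankerTimeoutMs`, and the bundle default is a parameter because its real value is not stated here.

```typescript
// Optional layers, most specific first. `undefined` means "not set here".
export interface RerankTimeoutLayers {
  perCallMs?: number; // explicit per-call override
  configMs?: number;  // operator config, e.g. search.reranker.timeout_ms
  recipeMs?: number;  // the provider recipe's default_timeout_ms
}

// Shared by live search and the doctor probe so both wait the same amount:
// per-call > config > recipe > bundle.
export function resolveRerankTimeoutMs(layers: RerankTimeoutLayers, bundleDefaultMs: number): number {
  return layers.perCallMs ?? layers.configMs ?? layers.recipeMs ?? bundleDefaultMs;
}
```

With `search.reranker.timeout_ms=1000`, both the probe and live search resolve to 1000ms, regardless of a longer recipe default.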
Codex structured review [P3] caught a bypass in the freshly-tightened
gateway-routed guard:
import { type Message, Anthropic } from '@anthropic-ai/sdk';
new Anthropic();
The previous regex `^\s*import\s+[^t][^y]*from ...` was meant to exclude
`import type ...` but stops at the `y` in `type` inside the brace list,
silently allowing the value-import `Anthropic` through. Two fixes:
1. Replace the brittle regex-based type-exclusion with a clause-level
parse: extract the brace-list specifiers, allow the import iff EVERY
non-empty specifier is `type`-prefixed. Catches mixed-import bypasses
(`{ type Foo, Bar }`) while keeping all-type braces (`{ type Foo, type Bar }`)
passing. Default + namespace imports remain always-value-shaped.
2. Replace `\s` with POSIX `[[:space:]]` in the sed extract — macOS BSD sed
doesn't honor `\s` in extended-regex mode (it silently no-ops the pattern
so `specifiers` comes back empty and the script falls through to the
default/namespace branch's wrong error message).
Hermetic 7-shape regression matrix now verifies every TypeScript import
shape against the expected ALLOW/BLOCK verdict; all 7 pass:
- ALLOW: `import type Anthropic from '...'`
- ALLOW: `import type { Foo } from '...'`
- ALLOW: `import { type Message, type Foo } from '...'`
- BLOCK: `import { type Message, Anthropic } from '...'`
- BLOCK: `import { Anthropic } from '...'`
- BLOCK: `import Anthropic from '...'`
- BLOCK: `import * as A from '...'`
Subshell-trap fix in the same commit: the previous "exit 1 inside while-pipe"
pattern doesn't propagate to the outer `$?` because the pipe spawns a
subshell. Switched to a tmpfile-flagged sentinel so the verdict survives
the subshell boundary cleanly.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
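The guard's clause-level rule is simple to express outside of sed. This sketch ports it to TypeScript for illustration only; the actual guard remains a bash script built on grep and sed. `classifySdkImport` is a hypothetical name, and the test replays the 7-shape matrix above.

```typescript
export type ImportVerdict = 'ALLOW' | 'BLOCK';

// Classify one `import ... from '@anthropic-ai/sdk'` line.
// ALLOW only when nothing in it can bind a runtime value.
export function classifySdkImport(line: string): ImportVerdict {
  const s = line.trim();
  // `import type X ...` and `import type { X } ...` are erased at compile time.
  if (/^import\s+type\s/.test(s)) return 'ALLOW';
  // Pure brace list: inspect each specifier instead of pattern-matching the line.
  const braces = s.match(/^import\s*\{([^}]*)\}\s*from\s/);
  if (braces !== null) {
    const specifiers = braces[1]
      .split(',')
      .map((p) => p.trim())
      .filter((p) => p.length > 0);
    const allTypeOnly = specifiers.length > 0 && specifiers.every((p) => /^type\s/.test(p));
    return allTypeOnly ? 'ALLOW' : 'BLOCK';
  }
  // Default, namespace, and mixed default+brace imports always bind a value.
  return 'BLOCK';
}
```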
This was referenced May 24, 2026
garrytan pushed a commit that referenced this pull request on May 25, 2026
Sibling fix-wave PR #1377 (garrytan/community-pr-wave) claimed v0.41.4.0 between my queue check (.3.0 was available) and PR creation. Re-bump to the next available slot per workspace-aware allocator. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…pr-wave
# Conflicts:
# CHANGELOG.md
# VERSION
# package.json
CI failure surfaced a time-dependent test flake in `test/audit/audit-writer.test.ts` "returns events from current week, filtered by ts cutoff" (added in v0.40.4.0 PR #1300). The test pinned synthetic `now = 2026-05-22T12:00:00Z` (ISO week 21), logged 3 events with synthetic ts values, then called `readRecent(7, now)` expecting to find 2 events in window.
Root cause: `log()` ignored the caller-supplied `ts` for filename routing and ALWAYS wrote to the file matching real-time-now's ISO week. When real CI time crossed into 2026-W22 (this Monday), the events went to W22's file but `readRecent` walked W21 + W20 → 0 hits.
Fix:
- `log()` parses `event.ts` (when provided) and routes to the file matching that ts's ISO week. Falls back to real-now when ts is missing or unparseable.
- No behavior change for production callers: none of the 5 audit consumers pass `ts` explicitly (rerank-audit, audit-slug-fallback, content-sanity-audit, graph-signals, supervisor-audit). The writer stamps real-now → both ts and filename use real-now → same file as before.
- Sibling test "honors caller-supplied ts override" also pinned a fixed ts and would have broken from the opposite angle (the test read from `computeFilename()` default = real-now). Updated to read from `computeFilename(new Date(fixedTs))` so it asserts the per-row file routing the wave now provides.
22/22 audit-writer cases pass. Production callers (5 sites) unchanged. Pre-existing on master since v0.40.4.0; surfaced when real time crossed into a different ISO week than the test's synthetic now. NOT introduced by this PR (#1377 community-PR-wave): audit-writer files aren't touched by the wave.
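The routing fix depends on computing the ISO week from the event's own timestamp. `isoWeekKey` and `weekForEvent` are hypothetical helpers, and the real writer's filename format is not shown here. ISO weeks start on Monday, and the year a week belongs to is the year of its Thursday. That rule is why `2026-05-22` falls in W21 while Monday `2026-05-25` starts W22.

```typescript
const DAY_MS = 86_400_000;

// ISO-8601 week key such as "2026-W21", computed in UTC.
export function isoWeekKey(d: Date): string {
  const t = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  const dow = t.getUTCDay() || 7;         // Monday=1 ... Sunday=7
  t.setUTCDate(t.getUTCDate() + 4 - dow); // jump to this week's Thursday
  const year = t.getUTCFullYear();        // the Thursday's year is the ISO year
  const dayOfYear = (t.getTime() - Date.UTC(year, 0, 1)) / DAY_MS + 1;
  const week = Math.ceil(dayOfYear / 7);
  return `${year}-W${String(week).padStart(2, '0')}`;
}

// Week for an event: its own ts when parseable, else the real current time.
export function weekForEvent(ts: string | undefined, now: Date = new Date()): string {
  const parsed = ts === undefined ? Number.NaN : Date.parse(ts);
  return isoWeekKey(Number.isNaN(parsed) ? now : new Date(parsed));
}
```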
…pr-wave
# Conflicts:
# CHANGELOG.md
# TODOS.md
# VERSION
# package.json
# src/core/audit/audit-writer.ts
# test/audit/audit-writer.test.ts
garrytan added a commit that referenced this pull request on May 25, 2026
…1374)
* fix(recipes/openai): add max_batch_tokens to embedding touchpoint
OpenAI is the only recipe in the codebase without a max_batch_tokens cap. Every other provider declares one (voyage=120K, azure-openai=8K, dashscope=8K, zhipu=8K, minimax=4K). Without it, gbrain's recursive-halving safety net never engages: batches dispatched purely on the char/4 estimator window will trip OpenAI's 1M-token TPM ceiling on token-dense pages (Discord exports, JSON dumps, code-heavy markdown), then retry-storm and block the queue head.
Setting the cap to 100_000:
- gbrain's batcher estimates tokens as chars/4
- Token-dense markdown+JSON tokenizes at ~chars/2.7
- 100K estimated = ~150K real worst-case, safely under OpenAI's 300K per-request hard cap and the 1M/min TPM ceiling
- Leaves headroom for recursive-halving on outlier chunks
(cherry picked from commit 40536aa)
* fix(ai/embed): recognize OpenAI 'maximum request size' error in isTokenLimitError
OpenAI's /v1/embeddings endpoint hard-caps a single request at 300k tokens total across all input items. When the cap is exceeded it returns:
Invalid 'input': maximum request size is 300000 tokens per request.
None of the three existing regexes in isTokenLimitError matched this phrasing, so the recursive-halving safety net in embedSubBatch never engaged for OpenAI. The same fat page (a token-dense markdown export, e.g. a Discord transcript) would re-fail every pass, blocking forward progress on the whole batch indefinitely.
Locally reproduced on a 31,129-chunk Postgres brain: 2,125 chunks stuck at 'remaining' across 30+ embed --stale passes with retry loops + sleep delays. Adding the two new patterns lets halving fire; the same backlog cleared in one pass after the regex change (the companion max_batch_tokens recipe fix from PR #924 caps fresh batches, but existing oversize pages still need halving to recover).
Adds:
- /maximum request size.*tokens/i: OpenAI verbatim
- /max.*tokens.*per.*request/i: defensive against minor rewording
Tests:
- Regression test for the exact OpenAI error string
- Coverage for the generic 'max tokens per request' variant
- All 25 tests in adaptive-embed-batch.test.ts pass
No behavior change for providers whose errors already matched.
(cherry picked from commit b834e84)
* fix(connection-manager): strip .<project-ref> suffix from username when deriving direct URL
`deriveDirectUrl()` correctly rewrites the host (`aws-0-us-east-1.pooler.supabase.com` → `db.abcxyz.supabase.co`) but preserves the full pooler-form username (`postgres.abcxyz`). Supabase direct connections expect a bare `postgres` username: Supavisor uses the `.<ref>` suffix for tenant routing, but it's not a real database user. The auto-derived URL therefore fails to authenticate even with the correct password:
password authentication failed for user "postgres.abcxyz"
Strip the suffix to `postgres` whenever the project-ref was successfully extracted (the same condition that triggers the host rewrite). The non-pooler username branch is unaffected; it is preserved as-is to keep the port-only fallback case working.
Hit while exercising v0.30.1's dual-pool routing on a real Supabase brain; the kill switch (`GBRAIN_DISABLE_DIRECT_POOL=1`) papered over it locally, but every Supabase user with a stock pooler URL would silently fall through to single-pool until they supplied a `GBRAIN_DIRECT_DATABASE_URL` override. With this fix, dual-pool works out of the box for the canonical Supabase shape.
Test additions:
- 1 case asserting bare `postgres:secret@` in the derived URL when project-ref is parseable from the pooler URL (the new behavior)
- extends the existing "falls back to port-only" case with an assertion that non-pooler usernames are preserved (unchanged behavior)
`bun run typecheck` clean. `deriveDirectUrl` test block passes 5/5.
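The username rule in the connection-manager fix reduces to a suffix strip that is gated on a successful project-ref extraction. `directUsername` is a hypothetical helper that covers only the username half of `deriveDirectUrl()`; the host rewrite is left out.

```typescript
// Supavisor pooler usernames look like "postgres.<project-ref>"; the suffix is
// pooler tenant routing, not part of a real database user name.
// Strip ".<ref>" only when the ref was extracted; otherwise pass through unchanged.
export function directUsername(poolerUser: string, projectRef: string | null): string {
  const suffix = projectRef === null ? null : `.${projectRef}`;
  if (suffix !== null && poolerUser.endsWith(suffix)) {
    return poolerUser.slice(0, -suffix.length);
  }
  return poolerUser;
}
```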
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit ddf2c6a)
* fix(init): --help should not mutate config or scan filesystem
`gbrain init --help` (and `-h`) currently fall through to the smart-detection branch in runInit(), which scans cwd for .md files and, in a directory with 1000+ files, prints "Found ~1500 .md files. For a brain this size, Supabase gives faster search..." and then defaults to PGLite, calling saveConfig() and overwriting any existing Postgres config with `engine: 'pglite' + database_path: ~/.gbrain/brain.pglite`.
Confirmed in the wild: ran `gbrain init --help` from $HOME on a machine where ~/.gbrain/config.json pointed at a Supabase Postgres brain with 10K+ pages. The config was silently flipped to PGLite. The Supabase data was intact, but gbrain stopped pointing at it until the config was manually restored.
Root cause: cli.ts:62-69 only routes --help → printOpHelp() for shared-op commands; CLI_ONLY commands (init, embed, etc.) fall through to their handler with --help still in argv. None of them check for it.
Fix: add a --help/-h guard at the top of runInit() that prints help text and returns. Help should never mutate state. Help text covers all flags (engine selection, AI provider options, thin-client mode) so users running `--help` get the canonical list rather than having to read the source.
A wider architectural fix (adding --help routing for all CLI_ONLY commands in cli.ts) is a plausible follow-up, but each CLI_ONLY command would still need its own help text. This per-command pattern matches how shared ops handle it via printOpHelp(). Init is the highest-stakes case because it's the only CLI_ONLY command that calls saveConfig().
Smoke test: from a directory with 1500 .md files, with GBRAIN_HOME pointed at a fresh tempdir:
- Before fix: ~/.gbrain/config.json materialized with engine: 'pglite'
- After fix: help text printed, no config dir created
`bun run typecheck` clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit ed11fdd)
* test(frontmatter-install-hook): isolate hooksPath assertion from developer global config
The "installHook writes ... and sets core.hooksPath" test asserted `git config --get core.hooksPath` returns `.githooks`, which falls back to the global scope when local is unset. Developers who set `core.hooksPath` globally (common with dotfiles managers pointing at ~/.config/git/hooks) saw a deterministic FAIL because installHook intentionally respects an existing global value and skips writing the local one, which is exactly the documented contract.
Fix: read via `git config --local --get core.hooksPath` (scope-locked) and branch the assertion on whether a global is already set. Both clean-CI (local should be '.githooks') and developer-with-global (local should be empty; installHook correctly didn't clobber) now pass deterministically.
No API change. installHook behavior is unchanged. Verified locally with the affected test passing under `GIT_CONFIG_GLOBAL=~/.gitconfig` carrying `core.hooksPath=...`.
(cherry picked from commit 0e4da2c)
* fix: guard against missing 'intent' field in routing-eval fixtures
Two defensive fixes:
1. normalizeText(): return an empty string on null/undefined input instead of crashing with 'undefined is not an object (evaluating s.toLowerCase)'
2. loadRoutingFixtures(): validate that the parsed fixture has 'intent' as a string before adding it to the fixtures array. Fixtures with wrong field names (e.g. 'input' instead of 'intent') are now reported as malformed with a helpful error message listing the actual keys found.
Root cause: a skill's routing-eval.jsonl used {"input": ...} instead of {"intent": ...}. The JSON parsed fine but the cast to RoutingFixture was unchecked, so fixture.intent was undefined. normalizeText(undefined) then crashed. This made 'gbrain doctor' completely unusable.
(cherry picked from commit b142bbd)
* fix(test): isolate HOME in run-e2e.sh to stop config corruption
Replaces #517 (re-ported fresh against current scripts/run-e2e.sh after v0.23.1 rewrote the script; the original cherry-pick would not apply).
E2E tests call setupDB, which writes $HOME/.gbrain/config.json pointing at the docker test container. When the container tears down, the user's real autopilot daemon wedges trying to connect to a vanished postgres. Three operators hit this within 16 days before the original PR was filed.
Fix: the wrapper exports HOME + GBRAIN_HOME to a mktemp tmpdir BEFORE bun starts so config writes land in the tmpdir, with a post-run breach detector that compares the md5 of the user's real config against its pre-run value. Both env vars are required: loadConfig/saveConfig resolve via HOME while configPath honors GBRAIN_HOME. HOME is set before bun starts because os.homedir() caches at first call.
Test seam: test/gbrain-home-isolation.test.ts updated to assert homedir() === configDir() when GBRAIN_HOME is unset (correct under the safety wrapper itself) instead of the prior "not /tmp/" sentinel.
Revert path: git revert <this-sha> if test:e2e regresses on master.
Co-Authored-By: orendi84 <orendi84@users.noreply.github.com>
* test(dream-cycle): add schema-suggest to EXPECTED_PHASES
v0.40.7.0 Schema Cathedral v3 added the 'schema-suggest' phase between 'orphans' and 'purge' in ALL_PHASES, but the E2E phase-order test was not updated to match. ALL_PHASES vs EXPECTED_PHASES diverged and the shape-pin test failed every run on master.
Surfaced during fix-wave: warm-narwhal E2E gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(autopilot-fanout): use relative timestamp inside freshness window
The 'end-to-end: updateSourceConfig persists timestamp visible to next listAllSources' test pinned last_full_cycle_at to a hardcoded '2026-05-22T15:00:00.000Z'. The 60-minute freshness window expired about an hour after that timestamp, so every later run classified the source as stale and dispatched it, breaking the test's .skippedFresh expectation.
Switch to a Date.now() - 30min relative timestamp (mirrors the prior 'source with last_full_cycle_at < 60min ago is skipped by gate' test).
Surfaced during fix-wave: warm-narwhal E2E gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(fresh-install-pglite): unset other provider keys in beforeEach
init.ts:455 fails loud when multiple embedding providers are env-ready in non-TTY mode. The test sets ZEROENTROPY_API_KEY then runs init, but developer machines commonly have OPENAI_API_KEY + VOYAGE_API_KEY + ZEROENTROPY_API_KEY all set, so init sees 3 providers and exits 1.
Save+unset OPENAI_API_KEY + VOYAGE_API_KEY in beforeEach, restore in afterEach. Now only ZE is env-ready, init picks it, and the schema is sized to zembed-1's 1280d as the test expects.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* test(voyage-multimodal): switch fixture from AVIF to PNG
Voyage's /multimodalembeddings endpoint rejects AVIF as of 2026-05 with 'Please provide a valid base64-encoded image'. The prior comment ('AVIF is fine for an embed call') held at v0.27.x and regressed silently on the provider side.
Add test/fixtures/images/tiny.png (16x16 RGB PNG, 1307 bytes generated via sips from the macOS default wallpaper). PNG is universally accepted by Voyage and other multimodal providers.
Surfaced during fix-wave: warm-narwhal E2E gate.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cycle/synthesize): prefix bare anthropic model ids before queue.add

queue.add's subagent capability validator (classifyCapabilities → resolveRecipe) requires provider:model format and rejects bare ids with 'unknown provider'. resolveModel returns the bare id from TIER_DEFAULTS / DEFAULT_ALIASES (e.g. 'claude-sonnet-4-6'), which the validator then rejects, dropping the synthesize phase to status:fail with SYNTH_PHASE_FAIL.
Narrow fix at the call site: if config.model has no colon AND starts with 'claude-', prefix it with 'anthropic:'. Other providers must already declare a colon. This avoids changing the TIER_DEFAULTS / DEFAULT_ALIASES constant shapes, which would ripple across every resolveModel caller.
Surfaced by the dream-synthesize-chunking E2E during fix-wave: warm-narwhal. Affected tests: 'single-chunk transcript uses legacy idempotency key' and 'multi-chunk transcript spawns N children with chunk-suffixed idempotency keys'. Both relied on result.details.children_submitted, which only the ok() path sets; the failed() path returns details: {}.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(mechanical): pin doctor init embedding model + clean non-default sources

Two fixes in the E2E Doctor Command describe block, both surfaced by cross-file state pollution under the full sequential E2E run:
1. Pass --embedding-model openai:text-embedding-3-large to the init subprocess. Without the explicit flag, doctor inherits whatever the resolver picks from env keys (ZE if ZEROENTROPY_API_KEY is set, defaulting to zembed-1 at 1280d). The test's setupDB initialized the schema at 1536d, so the dimension mismatch fires an embedding_width_consistency WARN and doctor exits 1.
2. DELETE FROM sources WHERE id != 'default' in beforeAll. Prior E2E files leave non-default source rows (e.g. 'delta' from autopilot / sources tests).
sync_freshness + cycle_freshness then FAIL on those orphans because they were never synced/cycled, so doctor exits 1. setupDB TRUNCATEs sources but schema.sql re-seeds 'default' via initSchema; this leaves only the canonical single-source brain the test expects.
Surfaced during fix-wave: warm-narwhal E2E gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(run-e2e): per-file connection flush + 180s outer timeout

Two cross-file isolation hardenings for the sequential E2E runner:
1. Terminate stale Postgres connections before each file. Without this, idle connections from the prior bun process's pool race with the next file's setupDB() TRUNCATE CASCADE, producing 'fixture pages disappear mid-test' failures. The terminate call is idempotent and takes ~50ms; on the first iteration it is a no-op.
2. Hard outer timeout (180s per file) via gtimeout / timeout. bun's --timeout=60000 is per-test; if a PGLite WASM call hangs in beforeAll/afterAll (e.g. ingestion-roundtrip.test.ts wedging for 30+ minutes on macOS), --timeout never fires and the entire suite wedges. The outer SIGKILL lets the suite advance, and the file is recorded as failed for triage. Falls through to bare bun if neither gtimeout nor timeout is on PATH.
Surfaced during fix-wave: warm-narwhal — 3 of 5 cross-file flakes caught by the connection flush; the ingestion-roundtrip 30-min wedge caught by the outer timeout.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.41.3.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: annotate synthesize.ts narrow prefix fix (v0.41.3.0)

CLAUDE.md gains the v0.41.3.0 note on src/core/cycle/synthesize.ts (narrow anthropic: prefix at the queue.add boundary so resolveModel's bare ids satisfy the subagent validator). llms-full.txt regenerated to match.
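This sketch shows the narrow prefix rule from the synthesize.ts fix. The helper name `toQueueModelId` is hypothetical; according to the commit, the real fix is applied inline at the queue.add call site. Bare `claude-` ids gain an `anthropic:` prefix and everything else passes through unchanged, so a bare non-Claude id would still be rejected by the validator, as intended.

```typescript
// Hypothetical helper: normalize a model id before handing it to queue.add.
// Only bare ids that look like Claude models are rewritten.
function toQueueModelId(model: string): string {
  if (!model.includes(":") && model.startsWith("claude-")) {
    return `anthropic:${model}`;
  }
  return model; // already provider:model, or a non-Claude bare id (left for the validator)
}

console.log(toQueueModelId("claude-sonnet-4-6"));           // anthropic:claude-sonnet-4-6
console.log(toQueueModelId("anthropic:claude-sonnet-4-6")); // anthropic:claude-sonnet-4-6
console.log(toQueueModelId("openai:gpt-4o"));               // openai:gpt-4o
```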
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: rebump v0.41.3.0 → v0.41.5.0 (queue drift; PR #1377 claimed .4.0)

Sibling fix-wave PR #1377 (garrytan/community-pr-wave) claimed v0.41.4.0 between my queue check (.3.0 was available) and PR creation. Re-bump to the next available slot per the workspace-aware allocator.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(cycle/synthesize): refuse empty brainDir + resolve relative paths

Pre-fix, runPhaseSynthesize accepted any brainDir string and passed it to writeReversePages, which does join(brainDir, '<slug>.md'). When brainDir is '' or relative ('.', './brain', etc.), join() produces a relative path that writeFileSync resolves against cwd. Result: every synthesize reverse-write spills into <cwd>/companies/<slug>.md, <cwd>/people/<slug>.md, etc. instead of the intended brainDir tempdir.
Surfaced by the warm-narwhal wave when E2E test cleanup found orphan synthesize pages (companies/novamind.md, people/sarah-chen.md, meetings/2025-04-01-novamind-board-update.md) at the gbrain repo root, left by a runCycle({brainDir: '.'}) chain that ran during morning E2E execution.
Fix at the function entry (single location, all callers protected):
1. Empty/whitespace brainDir → return failed(BRAINDIR_EMPTY) loudly instead of silently resolving against cwd.
2. Relative brainDir → resolve(opts.brainDir) before any read/write can use it. opts.brainDir is mutated so writeReversePages, writeSummaryPage, and every downstream join() see the absolute path.
Regression test pins all 4 contracts:
- empty string → fail(BRAINDIR_EMPTY)
- whitespace-only → fail(BRAINDIR_EMPTY)
- '.' → mutated to absolute on entry
- already-absolute → unchanged

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(dream): resolve brainDir to absolute at CLI surface

Defense-in-depth for the synthesize-braindir spillage bug class.
The core fix lives in runPhaseSynthesize (commit 98222a0); this resolves brainDir one layer earlier so the entire 9-phase runCycle gets the absolute path, not just synthesize.
Two paths in resolveBrainDir get path.resolve():
- explicit --dir argument (e.g., `gbrain dream --dir .`)
- sync.repo_path config (in case it was ever stored relative)
resolveBrainDir already checked existsSync; resolve() just makes the path absolute before returning it. No behavior change for paths that are already absolute.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Matt Gunnin <mgunnin@esports.one>
Co-authored-by: Brandon Lipman <brandon@offdeck.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Jeremy Knows <jeremy@veefriends.com>
Co-authored-by: root <root@localhost>
Co-authored-by: orendi84 <orendigergo@gmail.com>
Co-authored-by: orendi84 <orendi84@users.noreply.github.com>
Co-authored-by: Garry Tan <garry@ycombinator.com>
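The two brainDir entry rules from the synthesize fix can be sketched as a pure function. `normalizeBrainDir` and its result type are hypothetical. Per the commit, the real runPhaseSynthesize returns `failed(BRAINDIR_EMPTY)` and mutates `opts.brainDir` rather than returning a value.

```typescript
import { isAbsolute, resolve } from "node:path";

type BrainDirResult =
  | { ok: true; brainDir: string }
  | { ok: false; code: "BRAINDIR_EMPTY" };

// Hypothetical helper mirroring the two entry-point rules:
// 1. empty or whitespace-only input fails loudly instead of silently meaning "cwd";
// 2. anything else is resolved to an absolute path before any join()/write uses it.
function normalizeBrainDir(input: string): BrainDirResult {
  if (input.trim() === "") return { ok: false, code: "BRAINDIR_EMPTY" };
  return { ok: true, brainDir: resolve(input) };
}

const dotResult = normalizeBrainDir(".");
console.log(dotResult.ok && isAbsolute(dotResult.brainDir)); // true
```

`path.resolve` is idempotent on an already-absolute, normalized path, which is why the "already-absolute → unchanged" contract needs no special case.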
garrytan added a commit that referenced this pull request on May 25, 2026
Master advanced past v0.41.3.0:
- v0.41.4.0: local providers + cross-platform stdin + gateway-routed dream judge (#1377)
- v0.41.5.0: warm-narwhal fix-wave — 6 community PRs + E2E reliability (#1374)
Resolved VERSION + package.json + CHANGELOG + TODOS conflicts. v0.41.9.0 still wins the version slot; CHANGELOG now interleaves with master's v0.41.4 and v0.41.5 entries below ours; TODOS keeps both sections.
3-line audit: VERSION + package.json + CHANGELOG all agree on 0.41.9.0. Typecheck clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mgunnin added a commit to mgunnin/gbrain that referenced this pull request on May 28, 2026
* upstream/master: (22 commits)
v0.41.4.0 wave: local providers + cross-platform stdin + gateway-routed dream judge (6 community PRs) (garrytan#1377)
v0.41.3.0 fix(security/mcp): OAuth CORS lockdown + pre-register without DCR + validator surface (garrytan#1403)
v0.41.2.0 feat: lens packs + epistemology unification — atoms + concepts as first-class units, calibration profile widening, gstack-learnings bridge (garrytan#1364)
v0.41.1.0 feat: eval-loop wave — gbrain bench publish + gbrain eval gate close the LOOP (garrytan#1352)
v0.41.0.0 feat(minions): fleet you supervise (4 field bugs + cathedral) (garrytan#1367)
v0.40.10.0 feat: content sanity defense — junk-pattern throw + oversize-skip-embed (garrytan#1351)
v0.40.9.0 feat(chunker): .sql indexing via tree-sitter + code-def on SQL DDL (garrytan#1173) (garrytan#1350)
v0.40.8.1 docs: README rewrite + personal-brain + company-brain tutorials (garrytan#1345)
v0.40.8.0 test: e2e + unit gap coverage + master flake root-cause fixes (garrytan#1313)
v0.40.6.1 docs(todos): file v0.41 wave commitments + 7 verified-missing items (garrytan#1333)
v0.40.7.0 Schema Cathedral v3 — agent-on-ramp + production rebuild of PR garrytan#1321 (garrytan#1327)
v0.40.6.0 feat(sync): parallel sync --all + per-source lock invariant + sources status dashboard (productionized from PR garrytan#1314) (garrytan#1324)
v0.40.5.0 Federated Sync v2 — parallel source sync + push triggers + per-source health (garrytan#1322)
v0.40.4.0 feat(search): selective graph signals + per-stage attribution + audit-writer unification (garrytan#1300)
v0.40.3.0 feat: contextual retrieval + cache invalidation gate + 4 deferred-item closures (garrytan#1323)
v0.40.2.0 feat: trajectory routing for temporal + knowledge_update (gbrain think + LongMemEval) (garrytan#1296)
v0.40.1.0 Track D — eval infrastructure (catch retrieval regressions, prove answer-quality wins) (garrytan#1298)
v0.40.0.0 feat: agent-voice (Mars + Venus) + copy-into-host-repo skillpack paradigm (garrytan#1128)
v0.39.3.0: productionize the v0.38 ingestion cathedral (smoke-test fix wave from PR garrytan#1299) (garrytan#1308)
v0.39.2.0 feat(autopilot): per-source fan-out + cycle lock primitive + phase taxonomy (garrytan#1295)
...
garrytan-agents pushed a commit to garrytan-agents/gbrain that referenced this pull request on Jun 13, 2026
…ed dream judge (6 community PRs) (garrytan#1377)

* fix(cli): use fd 0 instead of '/dev/stdin' for cross-platform stdin reads

`readFileSync('/dev/stdin', 'utf-8')` works on Unix but fails on Windows (Git Bash, PowerShell, cmd) with `ENOENT: no such file or directory, open '/dev/stdin'`. Windows doesn't expose `/dev/stdin` as a filesystem path.
Reading file descriptor 0 directly (`readFileSync(0, 'utf-8')`) is the documented Node.js idiom and works on every platform. No behavior change on Unix — same syscall path, same semantics.
Repro on Windows before the fix:
  echo "test" | gbrain put my-page
  ENOENT: no such file or directory, open '/dev/stdin'
After: round-trip put/search/delete works on Windows Git Bash.

* v0.40.6.1 feat: llama-server reranker — local Qwen3 / self-hosted ZE via llama.cpp

Adds local reranker support so users can point gbrain's reranker call at their own llama.cpp server instead of ZeroEntropy's hosted API.
One new recipe (`llama-server-reranker`), a `path?: string` + `default_timeout_ms?: number` extension on `RerankerTouchpoint`, env passthrough wiring, budget-tracker `FREE_LOCAL_RERANK_PROVIDERS` set so `--max-cost` callers don't TX2 hard-fail on local rerank, and a doctor-probe divergence fix (probe and live search now read the same `search.reranker.model` path via `loadSearchModeConfig` + `resolveSearchMode`).
ZE-hosted users are unchanged. Voyage / Cohere / vLLM rerankers stay out of scope — different wire shapes need adapter hooks designed against their actual shapes in a follow-up plan.
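With the new `path?: string` field on `RerankerTouchpoint`, the request URL becomes a base URL plus a leaf path. The doubled-`/v1` bug caught in review is what happens when a base that already ends in `/v1` is joined to a path that repeats it. Below is a small sketch of such a join. The helper `joinEndpoint` and the example base URL are hypothetical and do not come from gbrain's source.

```typescript
// Hypothetical helper: join a recipe base URL and a leaf endpoint path so that
// exactly one "/" separates them. Joining is purely textual.
function joinEndpoint(baseUrl: string, path: string): string {
  const base = baseUrl.replace(/\/+$/, ""); // drop trailing slashes from the base
  const leaf = path.startsWith("/") ? path : `/${path}`; // ensure one leading slash
  return `${base}${leaf}`;
}

// Example base URL for a local llama.cpp server (an assumption, not a default).
const llamaBase = "http://127.0.0.1:8080/v1";

console.log(joinEndpoint(llamaBase, "/rerank"));       // http://127.0.0.1:8080/v1/rerank
console.log(joinEndpoint(`${llamaBase}/`, "rerank"));  // http://127.0.0.1:8080/v1/rerank
// A path that repeats the base's /v1 reproduces the doubled-prefix bug shape:
console.log(joinEndpoint(llamaBase, "/v1/rerank"));    // http://127.0.0.1:8080/v1/v1/rerank
```

Because the join does not deduplicate path segments, keeping the recipe path a leaf (`/rerank`) is what keeps the URL correct when the base already carries the version prefix.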
Verification:
- `bun run verify` (typecheck + 13 pre-checks): clean
- `bun run check:all` (15 historical checks): clean
- 107/107 expect() calls pass across 5 affected test files
- /codex review against the full diff: GATE PASS (caught one [P2] /v1 path doubling bug pre-merge; fixed by changing recipe path to leaf `/rerank`)
- Claude adversarial subagent: 7 net-new findings filed as v0.40.7+ TODOs (none currently exploitable; hardening for future contributor traps)
Test surface (107 cases, 5 files):
- test/ai/rerank.test.ts: path override (exact URL match), default_timeout_ms honored, empty models[] accepts any id, ZE regression
- test/ai/recipe-llama-server-reranker.test.ts: recipe shape regression guard + base_url + path concat assertion (codex-caught /v1/v1/ regression)
- test/search-mode.test.ts: timeout precedence chain (per-call > config > recipe > bundle), ZE no-recipe-default regression, unknown provider fallthrough
- test/models-doctor-reranker.test.ts: divergence-fix helper across DB-plane read, mode default, disabled, override, DB-error graceful fallback
- test/core/budget/budget-tracker.test.ts: free-local rerank pricing + arbitrary model id + chat-kind TX2 hard-fail preserved

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: post-ship documentation sync

* docs: index docs/ai-providers/ in llms.txt (zeroentropy + llama-server-reranker)

The hand-curated llms-config.ts doc map never included docs/ai-providers/, so both zeroentropy.md (since v0.35.0.0) and the new llama-server-reranker.md were invisible to the AI-facing llms.txt / llms-full.txt index.
Adds an "AI providers" section with both. Marked includeInFull: false (setup walkthroughs belong in the index but would push the single-fetch bundle past FULL_SIZE_BUDGET) — same treatment CHANGELOG.md gets.
Caught by the /ship document-release subagent.
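The timeout precedence chain pinned in test/search-mode.test.ts (per-call > config > recipe > bundle) reduces to "first defined value wins". This sketch uses hypothetical field names. According to the commit, gbrain resolves the chain through `loadSearchModeConfig` + `resolveSearchMode`, and their exact shapes are not shown here.

```typescript
// Hypothetical shape: each layer may or may not supply a timeout; the
// search-mode bundle default is always present.
interface RerankTimeoutLayers {
  perCallMs?: number; // explicit option on a single search call
  configMs?: number;  // e.g. search.reranker.timeout_ms
  recipeMs?: number;  // the recipe's default_timeout_ms
  bundleMs: number;   // search-mode bundle default
}

// First defined value wins: per-call > config > recipe > bundle.
function resolveRerankTimeoutMs(l: RerankTimeoutLayers): number {
  return l.perCallMs ?? l.configMs ?? l.recipeMs ?? l.bundleMs;
}

console.log(resolveRerankTimeoutMs({ configMs: 1000, recipeMs: 30000, bundleMs: 5000 })); // 1000
console.log(resolveRerankTimeoutMs({ recipeMs: 30000, bundleMs: 5000 }));                 // 30000
console.log(resolveRerankTimeoutMs({ bundleMs: 5000 }));                                  // 5000
```

`??` skips only `undefined`/`null`, so an explicit `0` at a higher layer would win. Whether gbrain validates such values is not covered by the commit.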
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: recipe-aware embedding-provider check for local providers

doctor --remediation-plan and autopilot both judged the embedding provider with a hosted-only key check, so a brain on ollama: or llama-server: was reported "blocked" on a missing API key it never needed, contradicting doctor --json's 100%-coverage health.
Extract a shared embeddingProviderConfigured() helper into brain-score-recommendations.ts: a provider with an empty auth_env.required (local providers) counts as configured without a key; hosted providers check their OWN required key. Both producers (doctor, autopilot) call it, removing the DRY violation that caused the bug. Hosted brains with a missing key still block.

* fix(budget): price local embed providers at $0

A --max-cost-bounded embed/reindex job configured for ollama: or llama-server: TX2 hard-failed with no_pricing because lookupEmbeddingPrice has no entry for local models. Add FREE_LOCAL_EMBED_PROVIDERS (sibling to FREE_LOCAL_RERANK_PROVIDERS) so a pricing miss on a local-inference provider returns $0 instead of null. lmstudio/litellm are intentionally excluded.

* feat(models): embedding reachability probe in gbrain models doctor

A down/misconfigured local embed server was invisible until the first embed. Add probeEmbeddingReachability() (mirrors the reranker probe): a 1-input embed with a 5s abort timeout, classified via classifyError, under a new 'embedding_reachability' touchpoint, gated on the zero-network config probe returning ok first.

* fix: don't count config-plane voyage/google keys as configured

codex review caught a false positive: HOSTED_EMBED_KEY_CONFIG mapped VOYAGE_API_KEY/GOOGLE_GENERATIVE_AI_API_KEY to config fields, but buildGatewayConfig only threads openai/anthropic/zeroentropy config keys into the gateway env. A Voyage/Google brain with the key only in config.json would be judged "configured" and dispatch an embed.stale job that then fails auth at the gateway.
Drop those two from the map so the producer closures resolve them by env var only, matching what the gateway can actually use. Pinned by a regression test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(dream): route significance judge through gateway.chat for multi-provider support

Replaces the hardcoded `new Anthropic()` client in the dream-cycle synthesize phase with a gateway-routed JudgeClient adapter. Mirrors the v0.35.5.0 pattern that closed garrytan#952 for runThink: construction-time provider/key probe returns null on a clear miss (cheap pre-flight); the verdict loop wraps the chat call in try/catch for AIConfigError mid-run.
Any provider with a registered gateway recipe (Anthropic, DeepSeek, OpenRouter, Voyage, Ollama, llama-server, etc.) is now reachable via:
  gbrain config set models.dream.synthesize_verdict <provider>:<model>
The canonical config key `models.dream.synthesize_verdict` (per PER_TASK_KEYS in src/core/model-config.ts) is used unchanged. The exported JudgeClient interface signature is preserved for test-seam stability.
The original community PR (garrytan#1349) shipped a custom fetch adapter that bypassed the gateway entirely. This reworked landing routes through the canonical seam so future provider additions automatically benefit, and a CI guard (T7) will land in this wave to prevent the bug class from re-opening (the same one that bit src/core/think/index.ts before v0.35.5.0).
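The judge rework pairs a cheap construction-time pre-flight with a per-transcript catch. The sketch below is a simplified model of that control flow, not gbrain's code. `makeJudge`, `judgeAll`, the `ChatFn` type, and the local `AIConfigError` class are hypothetical stand-ins for `makeJudgeClient`, the verdict loop, `gateway.chat`, and gbrain's own error type.

```typescript
// Local stand-in for gbrain's AIConfigError (hypothetical).
class AIConfigError extends Error {}

interface JudgeVerdict {
  worth_processing: boolean;
  reasons: string[];
}

type ChatFn = (model: string, prompt: string) => Promise<string>;
type Judge = (prompt: string) => Promise<string>;

// Construction-time pre-flight: return null on a clear miss (no provider
// prefix, or a provider with no registered recipe) so the caller can take a
// cheap fallback path instead of failing mid-run.
function makeJudge(model: string, knownProviders: Set<string>, chat: ChatFn): Judge | null {
  const idx = model.indexOf(":");
  if (idx <= 0 || !knownProviders.has(model.slice(0, idx))) return null;
  return (prompt) => chat(model, prompt);
}

// Verdict loop: an AIConfigError for one transcript becomes a "not worth
// processing" verdict and the loop continues; any other error propagates.
async function judgeAll(
  transcripts: string[],
  judge: Judge,
  parse: (text: string) => JudgeVerdict,
): Promise<JudgeVerdict[]> {
  const verdicts: JudgeVerdict[] = [];
  for (const t of transcripts) {
    try {
      verdicts.push(parse(await judge(t)));
    } catch (err) {
      if (!(err instanceof AIConfigError)) throw err;
      verdicts.push({ worth_processing: false, reasons: [`gateway error: ${err.message}`] });
    }
  }
  return verdicts;
}
```

Keeping the catch inside the loop, rather than around it, is what lets one revoked key degrade a single verdict instead of crashing the phase.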
Co-Authored-By: justemu <206393437+justemu@users.noreply.github.com>

* test(dream): synthesize-gateway-adapter unit tests + R3 parsed-verdict parity

11 cases pin the gateway-routed JudgeClient adapter from T5:
- A1: makeJudgeClient returns null on missing Anthropic key (legacy short-circuit preserved)
- A2: returns a JudgeClient when chat provider is reachable
- A3: JudgeClient.create routes through gateway.chat (via __setChatTransportForTests)
- A4: ChatResult.text → Anthropic.Message.content[0].text mapping
- A5: empty text from gateway → graceful empty-text Anthropic.Message
- A6: non-AIConfigError from gateway propagates to caller (no swallow)
- A7: AIConfigError from gateway propagates as AIConfigError (caught per-transcript in production loop)
- A8: makeJudgeClient returns null on unknown provider prefix
- A9: returns a JudgeClient for non-anthropic providers without env-probing (delegates to gateway at call time)
- R3: parsed-verdict SEMANTIC parity — gateway-routed and legacy SDK-shape JudgeClients produce same {worth_processing, reasons} given identical canned LLM text
- R3 corollary: unparseable LLM output → both paths fall through to cheap-fallback verdict
Codex flagged byte-identical-Anthropic.Message as a meaningless gate; R3 is parsed-verdict semantic parity instead. Mirrors the pattern of test/think-gateway-adapter.test.ts for cross-site consistency with the v0.35.5.0 runThink migration.

* ci: guard against direct Anthropic SDK construction in gateway-routed files

New scripts/check-gateway-routed-no-direct-anthropic.sh greps two guarded files (src/core/cycle/synthesize.ts and src/core/think/index.ts) for `new Anthropic()` constructor calls and runtime imports of @anthropic-ai/sdk. Type-only imports (`import type Anthropic from '@anthropic-ai/sdk'`) stay allowed because both files use Anthropic.Message / .MessageCreateParamsNonStreaming as adapter types.
Comment lines (starting with `//` or ` *`) are excluded so historical references in JSDoc don't false-fire.
Negative test in this commit's verification confirms: injecting `new Anthropic()` into synthesize.ts makes the guard exit 1 with a clear error pointing at the gateway adapter pattern; reverting restores the OK state.
Wired into both `bun run verify` and `bun run check:all`.
Closes the bug class that bit synthesize.ts in PR garrytan#1349 (which would have shipped a parallel fetch stack instead of routing through the canonical gateway). The same class previously bit think/index.ts and was fixed structurally in v0.35.5.0; this guard prevents either file from regressing. Extend GUARDED_FILES in the script when migrating another file off direct SDK construction.

* docs(put_page): point Windows / pipe-buffer users at gbrain capture --file

Extends the put_page op description (surfaced by `gbrain put --help`) with a one-line pointer to `gbrain capture --file PATH --slug SLUG` for the file-as-input use case.
Capture (v0.39.3.0) is the canonical Windows-pipe-buffer escape route: it reads files as a Buffer first, scans the first 8KB for NUL bytes to refuse binary content, decodes to UTF-8 only after the safety check, and adds provenance write-through.
Lands the user-facing value the closed PR garrytan#1365 was reaching for, without duplicating the CLI surface. Credits the original contributor.

Co-Authored-By: ecat2010 <90021101+ecat2010@users.noreply.github.com>

* test: R1+R2+R4 critical regression pins for the community-PR-wave landing

Per the wave's eng-review plan (IRON RULE — mandatory):
R1 — get_page handler accepts calls without `content` param. Pre-wave PR garrytan#1365 landed its `!p.content → throw` check in the WRONG handler (get_page instead of put_page), which would have broken every read in the system. Pin: get_page MUST NOT require content + the schema carries no `content` or `file` param.
R2 — put_page schema content stays `required: true`.
PR garrytan#1365 also flipped `content` from required→optional in the schema. Pin: the contract stays at `required: true` + the closed PR's `file` param is NOT in the schema.
R4 — Cross-platform stdin via fd 0 (PR garrytan#1325 regression pin). Source-grep asserts src/cli.ts uses `readFileSync(0, ...)` and NOT the legacy `readFileSync('/dev/stdin', ...)`. Belt-and-suspenders pattern assertions confirm the parseOpArgs branch shape (cliHints.stdin check, 5MB cap, isTTY gate) hasn't drifted.
R3 (gateway-adapter parsed-verdict parity) lives in the sibling file test/cycle/synthesize-gateway-adapter.test.ts.

* test(e2e): update dream-synthesize no-key reason text + harden hermeticity

After T5's gateway-adapter rework, the "no API key" verdict text changed from 'no ANTHROPIC_API_KEY for significance judge' to 'no configured provider for verdict model: <model>' (broader, and names the actual model so the user sees WHICH provider failed). Update both assertions that check the old text.
Hermeticity bug fix in the same commit: `withoutAnthropicKey` previously only cleared the env var. After the rework, `makeJudgeClient` ALSO checks `loadConfig().anthropic_api_key` (the same hasAnthropicKey() pattern think/index.ts has used since v0.35.5.0). If the developer running the test has the key set in ~/.gbrain/config.json, the test would behave non-deterministically. Fix: override GBRAIN_HOME to a fresh tmpdir for the duration of the body, and restore it on return (even on throw).

* test(e2e): pin verdict-loop AIConfigError catch from T5 rework end-to-end

Drives runPhaseSynthesize against a real PGLite engine with the gateway chat transport stubbed to throw AIConfigError on every call (simulates a revoked/misconfigured provider surfacing mid-run). Asserts:
- Phase does NOT crash; converts the throw to a per-transcript verdict with worth=false and reasons[0] matching "gateway error: ...".
- status='ok' so subsequent transcripts in the loop would continue being judged (not visible in a 1-transcript test, but the loop shape is proven not to abort).
Pre-rework (T5), this code path didn't exist: judgeSignificance threw directly to runPhaseSynthesize and crashed the whole phase. Pin so a future regression that removes the try/catch fires loudly.

* docs(claude.md): annotate v0.41+ community-PR-wave changes

Two additions to the Key files section:
- src/core/cycle/synthesize.ts — appends a v0.41+ paragraph documenting the gateway-adapter rework (makeJudgeClient + AIConfigError catch loop + canonical config key + JudgeClient interface preserved + CI guard reference + test file references).
- scripts/check-gateway-routed-no-direct-anthropic.sh — new entry documenting the CI guard's contract, scope, and how to extend GUARDED_FILES when migrating another file off direct SDK construction.
CLAUDE.md drives /sync-gbrain and llms.txt generation; both need the wave's annotations to land BEFORE the llms regeneration step (T10).

* docs(llms): regenerate llms.txt + llms-full.txt for v0.41+ wave

Refreshes the auto-generated llms.txt bundles to pick up the CLAUDE.md annotations landed earlier in this wave (gateway-adapter synthesize.ts + check-gateway-routed-no-direct-anthropic.sh + the cherry-picked llama-server-reranker recipe). Pinned by test/build-llms.test.ts.

* fix(providers): dynamic-width id column accommodates llama-server-reranker

v0.40.6.1 introduced `llama-server-reranker` (21 chars), which overflowed formatRecipeTable's static 14-char PROVIDER column. When the id is longer than the column, padEnd is a no-op, so the tier name follows the id directly with no space delimiter.
test/providers.test.ts 'each recipe appears at most once' iterates every recipe and asserts at least one row starts with `${id} ` or `${id} `; with no space after `llama-server-reranker`, the assertion fails and the recipe appears to be missing from the human-readable list.
Fix: compute column width dynamically as `max(14, max(id.length) + 1)` so every id is followed by at least one space, regardless of length. Also widens the separator rule to match. 14 stays as the floor so the existing short-id rows (openai 6, ollama 6, anthropic 9, ...) keep their familiar layout when llama-server-reranker isn't in the active recipe set.
10/10 cases in test/providers.test.ts pass after the fix.

* chore: pre-landing review polish — refresh models doctor tip + file embed timeout TODO

Two pre-landing review absorptions:
- `src/commands/models.ts:154` — the help-text tip said `gbrain models doctor` "spends ~1 token per model" but the wave added an `embed(['probe'])` call AND a reranker probe. Generalize to "spends a minimal request per configured chat/embed/rerank surface" so the cost expectation matches reality.
- `TODOS.md` — file a follow-up to widen `default_timeout_ms` from RerankerTouchpoint to EmbeddingTouchpoint so `probeEmbeddingReachability` doesn't hardcode 5000ms while the sibling reranker probe reads the recipe's configured timeout. Local CPU embedding endpoints (llama-server) hit the same cold-start curve as Qwen3-Reranker-4B; workaround today is "re-run the probe" per the existing JSDoc.
Other informational findings from pre-landing review either match established patterns (no behavioral test for `probeEmbeddingReachability`, matching `probeRerankerReachability`), are intentional choices documented in JSDoc (the `as unknown as Anthropic.Message` cast), or are micro-perf in non-hot paths (autopilot's 4 sequential `getConfig` awaits per 5-minute tick). All non-blocking.

* ci: tighten gateway-routed guard against import bypass shapes + honest JSDoc

Adversarial review caught two soft spots in the wave's new contracts:
1. `scripts/check-gateway-routed-no-direct-anthropic.sh` only matched the default-import shape `import Anthropic from '@anthropic-ai/sdk'`.
A future contributor (or, more realistically, a future refactor) could bypass it with:
- `import { Anthropic } from '@anthropic-ai/sdk'`
- `import { Anthropic as A } from '@anthropic-ai/sdk'`
- `import * as Anthropic from '@anthropic-ai/sdk'`
- `const x = await import('@anthropic-ai/sdk')`
Tightened the regex to match ANY value-shaped import from the SDK module (excluding only the explicit `import type ... from '@anthropic-ai/sdk'` form, which the adapter's Anthropic.Message return type needs). Added a second grep for dynamic imports. Verified that all four bypass shapes now trigger the guard against synthesize.ts; the type-only import still passes.
2. `synthesize.ts:makeJudgeClient` JSDoc claimed the adapter "tolerates the array-of-blocks shape for future flexibility", but the mapping flattens ONLY text blocks; `tool_use`, `tool_result`, and image blocks silently become empty strings. Today only `judgeSignificance` calls this and it only sends string content, so there is no behavior bug. But the comment was marketing future flexibility the code doesn't deliver. Narrowed it to call out the silent drop and to say the mapping should be extended if a future caller wires non-text content through.
Both are wave-scope: the CI guard was added by the wave, and the JSDoc was added by the wave's T5 rework. Adversarial review caught them before merge.

* fix(models doctor): reranker probe timeout matches live search precedence chain

Codex Pass-9 adversarial review caught a probe-vs-production divergence: production `hybridSearch` resolves the reranker timeout via the full chain (per-call > config > recipe > bundle) by going through `loadSearchModeConfig + resolveSearchMode`, but `probeRerankerReachability` was reading ONLY the recipe's `default_timeout_ms`. So an operator who set `search.reranker.timeout_ms=1000` would see doctor wait 30s and report "reachable" while production search timed out at 1s and failed open.
A higher configured timeout produces the opposite false failure (the probe gives up at 5s when production would have waited longer).
Fix: extract `resolveLiveRerankerTimeoutMs(engine)` parallel to the existing `resolveLiveRerankerModel(engine)`, with the same precedence chain and the same DB-plane consistency posture. The probe now reads the SAME timeout live search reads, on the same lookup path.
The codex P1 finding about `FREE_LOCAL_*_PROVIDERS` zero-pricing being bypassable via a redirected `LLAMA_SERVER_BASE_URL` is filed as a TODO under community-pr-wave follow-ups. It couples with the existing FREE_LOCAL_PROVIDERS unification TODO so both close in one v0.41+ PR.

* ci(guard): handle mixed type+value imports + macOS BSD sed POSIX classes

Codex structured review [P3] caught a bypass in the freshly-tightened gateway-routed guard:
  import { type Message, Anthropic } from '@anthropic-ai/sdk';
  new Anthropic();
The previous regex `^\s*import\s+[^t][^y]*from ...` was meant to exclude `import type ...` but stops at the `y` in `type` inside the brace list, silently allowing the value-import `Anthropic` through.
Two fixes:
1. Replace the brittle regex-based type-exclusion with a clause-level parse: extract the brace-list specifiers, and allow the import iff EVERY non-empty specifier is `type`-prefixed. Catches mixed-import bypasses (`{ type Foo, Bar }`) while keeping all-type braces (`{ type Foo, type Bar }`) passing. Default + namespace imports remain always-value-shaped.
2. Replace `\s` with POSIX `[[:space:]]` in the sed extract — macOS BSD sed doesn't honor `\s` in extended-regex mode (it silently no-ops the pattern, so `specifiers` comes back empty and the script falls through to the default/namespace branch's wrong error message).
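The clause-level rule from fix 1 can be modeled in TypeScript to check the 7-shape matrix. This is a hypothetical re-implementation for illustration only. The real guard is a shell script built on grep and sed, and the regexes here are assumptions rather than its actual patterns. Dynamic `import('@anthropic-ai/sdk')` calls need the separate check added in the earlier tightening commit, which is sketched here as one extra regex.

```typescript
type ImportVerdict = "ALLOW" | "BLOCK";

const SDK_MODULE = "@anthropic-ai/sdk";

// Dynamic imports are always value-shaped; they get their own check.
const DYNAMIC_SDK_IMPORT = /\bimport\s*\(\s*['"]@anthropic-ai\/sdk['"]\s*\)/;

// Classify one static import line. Returns null when the line is not a static
// import of the guarded module.
function classifySdkImport(line: string): ImportVerdict | null {
  const m = /^\s*import\s+(.+?)\s+from\s+['"]([^'"]+)['"]\s*;?\s*$/.exec(line);
  if (!m || m[2] !== SDK_MODULE) return null;
  const clause = m[1].trim();
  if (/^type\s/.test(clause)) return "ALLOW"; // `import type X` or `import type { X }`
  const braces = /^\{([^}]*)\}$/.exec(clause);
  if (braces) {
    // Allow iff EVERY non-empty specifier is `type`-prefixed.
    const specs = braces[1]
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s.length > 0);
    return specs.every((s) => /^type\s/.test(s)) ? "ALLOW" : "BLOCK";
  }
  // Default (`import A`), namespace (`import * as A`), or default+brace mixes.
  return "BLOCK";
}

console.log(classifySdkImport("import { type Message, Anthropic } from '@anthropic-ai/sdk';")); // BLOCK
console.log(classifySdkImport("import { type Message, type Foo } from '@anthropic-ai/sdk';"));  // ALLOW
```

Parsing the specifier list, rather than using a single negative regex, avoids the `[^t][^y]*` failure mode described above, where the pattern stops partway through the word `type`.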
Hermetic 7-shape regression matrix now verifies each of these TypeScript import shapes against the expected ALLOW/BLOCK verdict; all 7 pass:
- ALLOW: `import type Anthropic from '...'`
- ALLOW: `import type { Foo } from '...'`
- ALLOW: `import { type Message, type Foo } from '...'`
- BLOCK: `import { type Message, Anthropic } from '...'`
- BLOCK: `import { Anthropic } from '...'`
- BLOCK: `import Anthropic from '...'`
- BLOCK: `import * as A from '...'`
Subshell-trap fix in the same commit: the previous "exit 1 inside while-pipe" pattern doesn't propagate to the outer `$?` because the pipe spawns a subshell. Switched to a tmpfile-flagged sentinel so the verdict survives the subshell boundary cleanly.

* chore: bump version and changelog (v0.41.4.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(audit-writer): route log() to file matching event ts, not real-now

CI failure surfaced a time-dependent test flake in `test/audit/audit-writer.test.ts` "returns events from current week, filtered by ts cutoff" (added in v0.40.4.0 PR garrytan#1300). The test pinned a synthetic `now = 2026-05-22T12:00:00Z` (ISO week 21), logged 3 events with synthetic ts values, then called `readRecent(7, now)` expecting to find 2 events in the window.
Root cause: `log()` ignored the caller-supplied `ts` for filename routing and ALWAYS wrote to the file matching real-time-now's ISO week. When real CI time crossed into 2026-W22 (this Monday), the events went to W22's file but `readRecent` walked W21 + W20 → 0 hits.
Fix:
- `log()` parses `event.ts` (when provided) and routes to the file matching that ts's ISO week. Falls back to real-now when ts is missing or unparseable.
- No behavior change for production callers — none of the 5 audit consumers pass `ts` explicitly (rerank-audit, audit-slug-fallback, content-sanity-audit, graph-signals, supervisor-audit). The writer stamps real-now → both ts and filename use real-now → same file as before.
- Sibling test "honors caller-supplied ts override" also pinned a fixed ts and would have broken from the opposite angle (test read from `computeFilename()` default = real-now). Updated to read from `computeFilename(new Date(fixedTs))` so it asserts the per-row file routing the wave now provides.
22/22 audit-writer cases pass. Production callers (5 sites) unchanged.
Pre-existing on master since v0.40.4.0; surfaced when real time crossed into a different ISO week than the test's synthetic now. NOT introduced by this PR (garrytan#1377 community-PR-wave) — audit-writer files aren't touched by the wave.

---------

Co-authored-by: Tobias <34135750+tobbecokta@users.noreply.github.com>
Co-authored-by: kohai-ut <chris@tincreek.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: justemu <noreply@github.com>
Co-authored-by: justemu <206393437+justemu@users.noreply.github.com>
Co-authored-by: ecat2010 <90021101+ecat2010@users.noreply.github.com>
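This sketch of the audit-writer routing rule derives the ISO-8601 week from the event's own `ts` when it parses, and from the current time otherwise. The `audit-<week>.jsonl` filename pattern and both helper names are hypothetical. The week computation is the standard Thursday-anchored ISO rule.

```typescript
// ISO-8601 week key (e.g. "2026-W21") for an instant, computed in UTC.
function isoWeekKey(d: Date): string {
  // Move to the Thursday of the same ISO week; that Thursday's calendar year
  // is the ISO week-year, and counting from Jan 1 gives the week number.
  const t = new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
  const dayNum = t.getUTCDay() || 7; // Monday = 1 ... Sunday = 7
  t.setUTCDate(t.getUTCDate() + 4 - dayNum);
  const yearStart = Date.UTC(t.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((t.getTime() - yearStart) / 86400000 + 1) / 7);
  return `${t.getUTCFullYear()}-W${String(week).padStart(2, "0")}`;
}

// Route by the event's own ts when it parses; otherwise fall back to "now".
function auditFileFor(eventTs: string | undefined, now: Date = new Date()): string {
  const parsed = eventTs !== undefined ? new Date(eventTs) : undefined;
  const when = parsed && !Number.isNaN(parsed.getTime()) ? parsed : now;
  return `audit-${isoWeekKey(when)}.jsonl`; // hypothetical filename pattern
}

const ciNow = new Date("2026-05-26T00:00:00Z"); // a day in ISO week 22
console.log(auditFileFor("2026-05-22T12:00:00Z", ciNow)); // audit-2026-W21.jsonl
console.log(auditFileFor(undefined, ciNow));              // audit-2026-W22.jsonl
```

With routing keyed on the event's `ts`, the synthetic W21 events land in the W21 file that `readRecent` walks, no matter which week the CI clock is in.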
garrytan-agents pushed a commit to garrytan-agents/gbrain that referenced this pull request on Jun 13, 2026
…arrytan#1374)

* fix(recipes/openai): add max_batch_tokens to embedding touchpoint

OpenAI is the only recipe in the codebase without a max_batch_tokens cap. Every other provider declares one (voyage=120K, azure-openai=8K, dashscope=8K, zhipu=8K, minimax=4K). Without it, gbrain's recursive-halving safety net never engages. Batches dispatched purely on the chars/4 estimator window will trip OpenAI's 1M-token TPM ceiling on token-dense pages (Discord exports, JSON dumps, code-heavy markdown), then retry-storm and block the queue head.

Setting the cap to 100_000:
- gbrain's batcher estimates tokens as chars/4
- Token-dense markdown+JSON tokenizes at ~chars/2.7
- 100K estimated = ~150K real worst-case (400K chars / 2.7), safely under OpenAI's 300K per-request hard cap and the 1M/min TPM ceiling
- Leaves headroom for recursive-halving on outlier chunks

(cherry picked from commit 40536aa)

* fix(ai/embed): recognize OpenAI 'maximum request size' error in isTokenLimitError

OpenAI's /v1/embeddings endpoint hard-caps a single request at 300k tokens total across all input items. When the cap is exceeded it returns:

`Invalid 'input': maximum request size is 300000 tokens per request.`

None of the three existing regexes in isTokenLimitError matched this phrasing, so the recursive-halving safety net in embedSubBatch never engaged for OpenAI. The same fat page (a token-dense markdown export, e.g. a Discord transcript) would re-fail every pass, blocking forward progress on the whole batch indefinitely.

Locally reproduced on a 31,129-chunk Postgres brain: 2,125 chunks stuck at 'remaining' across 30+ `embed --stale` passes with retry loops + sleep delays. Adding the two new patterns lets halving fire; the same backlog cleared in one pass after the regex change. The companion max_batch_tokens recipe fix from PR garrytan#924 caps fresh batches, but existing oversize pages still need halving to recover.
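The interaction described above can be sketched as follows: a provider error must first be recognized as a token-limit error before recursive halving can engage. The two regexes are the patterns this commit adds. `embedWithHalving`, the `EmbedFn` callback type, and the split-at-midpoint policy are hypothetical simplifications, not gbrain's `embedSubBatch`.

```typescript
// Hypothetical sketch: detect token-limit errors, then recursively halve the batch.
// Only the two regexes come from the commit; the rest is illustrative.

const TOKEN_LIMIT_PATTERNS: RegExp[] = [
  /maximum request size.*tokens/i, // OpenAI verbatim phrasing
  /max.*tokens.*per.*request/i, // defensive against minor rewording
];

function isTokenLimitError(err: unknown): boolean {
  const message = err instanceof Error ? err.message : String(err);
  return TOKEN_LIMIT_PATTERNS.some((re) => re.test(message));
}

type EmbedFn = (inputs: string[]) => Promise<number[][]>;

// Embed a batch; on a token-limit error, split it in half and embed each half.
// A single input that still exceeds the limit cannot be split, so the error is rethrown.
async function embedWithHalving(inputs: string[], embedBatch: EmbedFn): Promise<number[][]> {
  try {
    return await embedBatch(inputs);
  } catch (err) {
    if (!isTokenLimitError(err) || inputs.length <= 1) throw err;
    const mid = Math.ceil(inputs.length / 2);
    const left = await embedWithHalving(inputs.slice(0, mid), embedBatch);
    const right = await embedWithHalving(inputs.slice(mid), embedBatch);
    return [...left, ...right]; // preserves input order
  }
}
```

If `isTokenLimitError` misses the provider's phrasing, the `catch` rethrows on the first attempt. That reproduces the retry storm the commit describes.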
Adds:
- `/maximum request size.*tokens/i`: OpenAI verbatim
- `/max.*tokens.*per.*request/i`: defensive against minor rewording

Tests:
- Regression test for the exact OpenAI error string
- Coverage for the generic 'max tokens per request' variant
- All 25 tests in adaptive-embed-batch.test.ts pass

No behavior change for providers whose errors already matched.

(cherry picked from commit b834e84)

* fix(connection-manager): strip `.<project-ref>` suffix from username when deriving direct URL

`deriveDirectUrl()` correctly rewrites the host (`aws-0-us-east-1.pooler.supabase.com` → `db.abcxyz.supabase.co`) but preserves the full pooler-form username (`postgres.abcxyz`). Supabase direct connections expect a bare `postgres` username. Supavisor uses the `.<ref>` suffix for tenant routing, but it's not a real database user. The auto-derived URL therefore fails to authenticate even with the correct password:

`password authentication failed for user "postgres.abcxyz"`

Strip the suffix (leaving bare `postgres`) whenever the project-ref was successfully extracted, which is the same condition that triggers the host rewrite. The non-pooler username branch is unaffected and is preserved as-is, keeping the port-only fallback case working.

Hit while exercising v0.30.1's dual-pool routing on a real Supabase brain. The kill switch (`GBRAIN_DISABLE_DIRECT_POOL=1`) papered over it locally, but every Supabase user with a stock pooler URL would silently fall through to single-pool until they supplied a `GBRAIN_DIRECT_DATABASE_URL` override. With this fix, dual-pool works out of the box for the canonical Supabase shape.

Test additions:
- 1 case asserting bare `postgres:secret@` in the derived URL when project-ref is parseable from the pooler URL (the new behavior)
- extends the existing "falls back to port-only" case with an assertion that non-pooler usernames are preserved (unchanged behavior)

`bun run typecheck` clean. `deriveDirectUrl` test block passes 5/5.
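The host and username rewrite can be sketched with the WHATWG `URL` class, whose setters work on `postgresql:` URLs. `deriveDirectUrlSketch` is a hypothetical helper, not gbrain's `deriveDirectUrl()`. Several details are assumptions: the `.pooler.supabase.com` host check, reading the project ref from a `postgres.<ref>` username, and forcing port 5432 in both branches.

```typescript
// Hypothetical sketch of deriving a Supabase direct URL from a Supavisor pooler URL.
// Host check, username pattern, and port 5432 are illustrative assumptions.

const POOLER_USER = /^postgres\.([a-z0-9]+)$/i; // "postgres.<project-ref>"

function deriveDirectUrlSketch(poolerUrl: string): string {
  const url = new URL(poolerUrl);
  const match = POOLER_USER.exec(decodeURIComponent(url.username));
  const isPoolerHost = url.hostname.endsWith(".pooler.supabase.com");

  if (isPoolerHost && match) {
    const ref = match[1];
    url.hostname = `db.${ref}.supabase.co`; // direct host for the project
    url.username = "postgres"; // drop the .<ref> tenant-routing suffix
  }
  // Port-only fallback: any other username and host are preserved as-is.
  url.port = "5432";
  return url.toString();
}
```

The username is only stripped when the ref was extracted. That is the same gate as the host rewrite, so the two can never disagree.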
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit ddf2c6a)

* fix(init): --help should not mutate config or scan filesystem

`gbrain init --help` (and `-h`) currently fall through to the smart-detection branch in runInit(). That branch scans cwd for .md files and, in a directory with 1000+ files, prints "Found ~1500 .md files. For a brain this size, Supabase gives faster search..." It then defaults to PGLite, calling saveConfig() and overwriting any existing Postgres config with `engine: 'pglite'` + `database_path: ~/.gbrain/brain.pglite`.

Confirmed in the wild: ran `gbrain init --help` from `$HOME` on a machine where ~/.gbrain/config.json pointed at a Supabase Postgres brain with 10K+ pages. The config was silently flipped to PGLite. The Supabase data was intact, but gbrain stopped pointing at it until the config was manually restored.

Root cause: cli.ts:62-69 only routes --help → printOpHelp() for shared-op commands. CLI_ONLY commands (init, embed, etc.) fall through to their handler with --help still in argv, and none of them check for it.

Fix: add a --help/-h guard at the top of runInit() that prints help text and returns. Help should never mutate state; that is a baseline robustness expectation for CLI tools. Help text covers all flags (engine selection, AI provider options, thin-client mode) so users running `--help` get the canonical list rather than having to read the source.

A wider architectural fix (adding --help routing for all CLI_ONLY commands in cli.ts) is a plausible follow-up, but each CLI_ONLY command would still need its own help text. This per-command pattern matches how shared ops handle it via printOpHelp(). Init is the highest-stakes case because it's the only CLI_ONLY command that calls saveConfig().
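The shape of the fix is a guard that runs before any side effect. In this minimal sketch, the side effects are injected so the ordering can be checked. `InitDeps`, `runInitSketch`, and the simplified detection path are hypothetical stand-ins, not gbrain's `runInit()` signature.

```typescript
// Hypothetical sketch: a --help/-h guard as the first statement of an init handler.
// InitDeps members are illustrative stand-ins for the real side effects.

interface InitDeps {
  printHelp: () => void;
  scanMarkdown: (dir: string) => number; // returns a count of .md files
  saveConfig: (config: { engine: string }) => void;
}

function runInitSketch(argv: string[], cwd: string, deps: InitDeps): void {
  // Help must be side-effect free: return before any scan or config write.
  if (argv.includes("--help") || argv.includes("-h")) {
    deps.printHelp();
    return;
  }
  // Simplified smart-detection path: scan, then persist the PGLite default.
  deps.scanMarkdown(cwd);
  deps.saveConfig({ engine: "pglite" });
}
```

Injecting the effects makes "help never writes config" a one-line assertion: with `--help`, only `printHelp` is called.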
Smoke test: from a directory with 1500 .md files, with GBRAIN_HOME pointed at a fresh tempdir:
- Before fix: ~/.gbrain/config.json materialized with `engine: 'pglite'`
- After fix: help text printed, no config dir created

`bun run typecheck` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
(cherry picked from commit ed11fdd)

* test(frontmatter-install-hook): isolate hooksPath assertion from developer global config

The "installHook writes ... and sets core.hooksPath" test asserted that `git config --get core.hooksPath` returns `.githooks`, which falls back to the global scope when local is unset. Developers who set `core.hooksPath` globally (common with dotfiles managers pointing at ~/.config/git/hooks) saw a deterministic FAIL. installHook intentionally respects an existing global value and skips writing the local one, which is exactly the documented contract.

Fix: read via `git config --local --get core.hooksPath` (scope-locked) and branch the assertion on whether a global is already set. Both clean-CI (local should be '.githooks') and developer-with-global (local should be empty; installHook correctly didn't clobber) now pass deterministically.

No API change. installHook behavior is unchanged. Verified locally with the affected test passing under `GIT_CONFIG_GLOBAL=~/.gitconfig` carrying `core.hooksPath=...`.

(cherry picked from commit 0e4da2c)

* fix: guard against missing 'intent' field in routing-eval fixtures

Two defensive fixes:
1. normalizeText(): return empty string on null/undefined input instead of crashing with 'undefined is not an object (evaluating s.toLowerCase)'
2. loadRoutingFixtures(): validate that the parsed fixture has 'intent' as a string before adding it to the fixtures array. Fixtures with wrong field names (e.g. 'input' instead of 'intent') are now reported as malformed with a helpful error message listing the actual keys found.

Root cause: a skill's routing-eval.jsonl used `{"input": ...}` instead of `{"intent": ...}`.
The JSON parsed fine but the cast to RoutingFixture was unchecked, so fixture.intent was undefined. normalizeText(undefined) then crashed. This made `gbrain doctor` completely unusable.

(cherry picked from commit b142bbd)

* fix(test): isolate HOME in run-e2e.sh to stop config corruption

Replaces garrytan#517. It was re-ported fresh against the current scripts/run-e2e.sh after v0.23.1 rewrote the script, because the original cherry-pick would not apply.

E2E tests call setupDB, which writes `$HOME/.gbrain/config.json` pointing at the docker test container. When the container tears down, the user's real autopilot daemon wedges trying to connect to a vanished postgres. Three operators hit this within 16 days before the original PR was filed.

Fix: the wrapper exports HOME + GBRAIN_HOME to a mktemp tmpdir BEFORE bun starts, so config writes land in the tmpdir. A post-run breach detector compares the md5 of the user's real config against its pre-run value. Both env vars are required: loadConfig/saveConfig resolve via HOME while configPath honors GBRAIN_HOME. HOME is set before bun starts because os.homedir() caches at first call.

Test seam: test/gbrain-home-isolation.test.ts now asserts homedir() === configDir() when GBRAIN_HOME is unset (correct under the safety wrapper itself), replacing the prior "not /tmp/" sentinel.

Revert path: `git revert <this-sha>` if test:e2e regresses on master.

Co-Authored-By: orendi84 <orendi84@users.noreply.github.com>

* test(dream-cycle): add schema-suggest to EXPECTED_PHASES

v0.40.7.0 Schema Cathedral v3 added the 'schema-suggest' phase between 'orphans' and 'purge' in ALL_PHASES, but the E2E phase-order test was not updated to match. ALL_PHASES and EXPECTED_PHASES diverged, and the shape-pin test failed every run on master.

Surfaced during fix-wave: warm-narwhal E2E gate.
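The two guards from the routing-eval commit (`fix: guard against missing 'intent' field`) can be sketched as follows. One is a null-tolerant `normalizeText()`. The other is a loader that validates each parsed JSONL line instead of casting it. The `RoutingFixture` shape is assumed here to contain only `intent`, and the `{ fixtures, malformed }` return value is also an assumption. Neither is gbrain's actual type.

```typescript
// Hypothetical sketch of defensive routing-eval fixture loading.
// Only `intent` is modeled; the real fixture type may carry more fields.

interface RoutingFixture {
  intent: string;
}

// Return "" for null/undefined instead of throwing on .toLowerCase().
function normalizeText(s: string | null | undefined): string {
  if (s == null) return "";
  return s.toLowerCase();
}

function loadRoutingFixtures(jsonl: string): { fixtures: RoutingFixture[]; malformed: string[] } {
  const fixtures: RoutingFixture[] = [];
  const malformed: string[] = [];
  jsonl.split("\n").forEach((line, i) => {
    if (line.trim() === "") return; // skip blank lines
    let parsed: unknown;
    try {
      parsed = JSON.parse(line);
    } catch {
      malformed.push(`line ${i + 1}: invalid JSON`);
      return;
    }
    // Validate the shape rather than trusting an unchecked cast.
    if (typeof parsed === "object" && parsed !== null && typeof (parsed as { intent?: unknown }).intent === "string") {
      fixtures.push({ intent: (parsed as { intent: string }).intent });
    } else {
      const keys = typeof parsed === "object" && parsed !== null ? Object.keys(parsed).join(", ") : typeof parsed;
      malformed.push(`line ${i + 1}: missing string 'intent' (found keys: ${keys})`);
    }
  });
  return { fixtures, malformed };
}
```

Reporting the keys that were actually found makes an `input`-instead-of-`intent` typo obvious, instead of surfacing later as a crash deep inside `gbrain doctor`.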
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(autopilot-fanout): use relative timestamp inside freshness window

The 'end-to-end: updateSourceConfig persists timestamp visible to next listAllSources' test pinned last_full_cycle_at to a hardcoded '2026-05-22T15:00:00.000Z'. The 60-minute freshness window expired one hour after that timestamp. Every run after the deadline classified the source as stale and dispatched it, breaking the test's .skippedFresh expectation.

Switch to a Date.now() - 30min relative timestamp (mirrors the prior 'source with last_full_cycle_at < 60min ago is skipped by gate' test).

Surfaced during fix-wave: warm-narwhal E2E gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(fresh-install-pglite): unset other provider keys in beforeEach

init.ts:455 fails loudly when multiple embedding providers are env-ready in non-TTY mode. The test sets ZEROENTROPY_API_KEY then runs init, but developer machines commonly have OPENAI_API_KEY + VOYAGE_API_KEY + ZEROENTROPY_API_KEY all set, so init sees 3 providers and exits 1.

Save+unset OPENAI_API_KEY + VOYAGE_API_KEY in beforeEach, restore in afterEach. Now only ZE is env-ready, init picks it, and the schema is sized to zembed-1's 1280d as the test expects.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(voyage-multimodal): switch fixture from AVIF to PNG

Voyage's /multimodalembeddings endpoint rejects AVIF as of 2026-05 with 'Please provide a valid base64-encoded image'. The prior comment ('AVIF is fine for an embed call') held at v0.27.x and regressed silently on the provider side.

Add test/fixtures/images/tiny.png (16x16 RGB PNG, 1307 bytes, generated via sips from the macOS default wallpaper). PNG is universally accepted by Voyage and other multimodal providers.

Surfaced during fix-wave: warm-narwhal E2E gate.
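The autopilot-fanout flake comes from comparing a hardcoded timestamp against the real clock. The sketch below is a freshness gate with an injectable `now`, which makes the difference between the brittle and the fixed test shapes explicit. `isFresh` and its signature are hypothetical; only the 60-minute window comes from the commit above.

```typescript
// Hypothetical freshness gate: a source is "fresh" (skipped) when its last full
// cycle happened less than `windowMs` before `now`.

const FRESHNESS_WINDOW_MS = 60 * 60 * 1000; // 60 minutes, per the commit message

function isFresh(
  lastFullCycleAt: string | null,
  now: number = Date.now(),
  windowMs: number = FRESHNESS_WINDOW_MS,
): boolean {
  if (lastFullCycleAt === null) return false; // never cycled: stale
  const last = Date.parse(lastFullCycleAt);
  if (Number.isNaN(last)) return false; // unparseable timestamp: treat as stale
  return now - last < windowMs;
}
```

Called with the hardcoded `'2026-05-22T15:00:00.000Z'` and the default `now`, the result flips from fresh to stale once the real clock passes 16:00Z that day. A timestamp built as `new Date(Date.now() - 30 * 60 * 1000).toISOString()` stays 30 minutes old on every run, so the gate always skips it.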
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(cycle/synthesize): prefix bare anthropic model ids before queue.add

queue.add's subagent capability validator (classifyCapabilities → resolveRecipe) requires provider:model format and rejects bare ids with 'unknown provider'. resolveModel returns the bare id from TIER_DEFAULTS / DEFAULT_ALIASES (e.g. 'claude-sonnet-4-6'), which the validator then rejects. The synthesize phase drops to status:fail with SYNTH_PHASE_FAIL.

Narrow fix at the call site: if config.model has no colon AND starts with 'claude-', prefix 'anthropic:'. Other providers must already declare a colon. This avoids changing TIER_DEFAULTS / DEFAULT_ALIASES constant shapes, which would ripple across every resolveModel caller.

Surfaced by dream-synthesize-chunking E2E during fix-wave: warm-narwhal. Affected tests: 'single-chunk transcript uses legacy idempotency key' and 'multi-chunk transcript spawns N children with chunk-suffixed idempotency keys'. Both relied on result.details.children_submitted, which only the ok() path sets; the failed() path returns details: {}.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(mechanical): pin doctor init embedding model + clean non-default sources

Two fixes in the E2E Doctor Command describe block, both surfaced by cross-file state pollution under the full sequential E2E run:

1. Pass --embedding-model openai:text-embedding-3-large to the init subprocess. Without the explicit flag, doctor inherits whatever the resolver picks from env keys (ZE if ZEROENTROPY_API_KEY is set, defaulting to zembed-1 at 1280d). The test's setupDB initialized the schema at 1536d, so the dim mismatch fires an embedding_width_consistency WARN and doctor exits 1.

2. DELETE FROM sources WHERE id != 'default' in beforeAll. Prior E2E files leave non-default source rows (e.g. 'delta' from autopilot / sources tests).
sync_freshness + cycle_freshness then FAIL on those orphans because they were never synced/cycled, and doctor exits 1. setupDB TRUNCATEs sources but schema.sql re-seeds 'default' via initSchema, which leaves only the canonical single-source brain the test expects.

Surfaced during fix-wave: warm-narwhal E2E gate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test(run-e2e): per-file connection flush + 180s outer timeout

Two cross-file isolation hardenings for the sequential E2E runner:

1. Terminate stale Postgres connections before each file. Without this, idle connections from the prior bun process's pool race with the next file's setupDB() TRUNCATE CASCADE, producing 'fixture pages disappear mid-test' failures. The terminate call is idempotent and takes ~50ms; the first iteration is a no-op.

2. Hard outer timeout (180s per file) via gtimeout / timeout. bun's --timeout=60000 is per-test. If a PGLite WASM call hangs in beforeAll/afterAll (e.g. ingestion-roundtrip.test.ts wedging 30+ minutes on macOS), --timeout never fires and the entire suite wedges. The outer SIGKILL lets the suite advance, and the file is recorded as failed for triage. Falls through to bare bun if neither gtimeout nor timeout is on PATH.

Surfaced during fix-wave: warm-narwhal. The connection flush caught 3 of 5 cross-file flakes; the outer timeout caught the 30-min ingestion-roundtrip wedge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.41.3.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: annotate synthesize.ts narrow prefix fix (v0.41.3.0)

CLAUDE.md gains the v0.41.3.0 note on src/core/cycle/synthesize.ts (narrow anthropic: prefix at the queue.add boundary so resolveModel's bare ids satisfy the subagent validator). llms-full.txt regenerated to match.
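The narrow synthesize rule is short enough to state as code: a bare id with no colon that starts with `claude-` gets `anthropic:` prepended. `qualifyModelId` is a hypothetical name; the commit applies the rule at the `queue.add` call site rather than in a shared helper.

```typescript
// Hypothetical helper mirroring the narrow call-site rule described above:
// only bare Claude ids gain a provider prefix; everything else passes through.
function qualifyModelId(model: string): string {
  if (!model.includes(":") && model.startsWith("claude-")) {
    return `anthropic:${model}`;
  }
  // Already provider:model, or a non-Claude bare id that the validator will reject.
  return model;
}
```

Other bare ids still reach the validator unchanged. This keeps the fix from silently guessing a provider for non-Anthropic models.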
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* chore: rebump v0.41.3.0 → v0.41.5.0 (queue drift; PR garrytan#1377 claimed .4.0)

Sibling fix-wave PR garrytan#1377 (garrytan/community-pr-wave) claimed v0.41.4.0 between my queue check (.3.0 was available) and PR creation. Re-bump to the next available slot per the workspace-aware allocator.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(cycle/synthesize): refuse empty brainDir + resolve relative paths

Pre-fix, runPhaseSynthesize accepted any brainDir string and passed it to writeReversePages, which does `join(brainDir, '<slug>.md')`. When brainDir is '' or relative ('.', './brain', etc.), join() produces a relative path that writeFileSync resolves against cwd. Result: every synthesize reverse-write spills into `<cwd>/companies/<slug>.md`, `<cwd>/people/<slug>.md`, etc. instead of the intended brainDir tempdir.

Surfaced by the warm-narwhal wave when E2E test cleanup found orphan synthesize pages (companies/novamind.md, people/sarah-chen.md, meetings/2025-04-01-novamind-board-update.md) at the gbrain repo root. They were left by a `runCycle({brainDir: '.'})` chain that ran during morning E2E execution.

Fix at the function entry, single location, all callers protected:
1. Empty/whitespace brainDir → return failed(BRAINDIR_EMPTY) loudly instead of silently resolving against cwd
2. Relative brainDir → resolve(opts.brainDir) before any read/write can use it. opts.brainDir is mutated so writeReversePages, writeSummaryPage, and every join() downstream see the absolute path

Regression test pins all 4 contracts:
- empty string → fail(BRAINDIR_EMPTY)
- whitespace-only → fail(BRAINDIR_EMPTY)
- '.' → mutated to absolute on entry
- already-absolute → unchanged

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(dream): resolve brainDir to absolute at CLI surface

Defense-in-depth for the synthesize-braindir spillage bug class.
The core fix lives in runPhaseSynthesize (commit 98222a0); this commit resolves brainDir one layer earlier so the entire 9-phase runCycle gets the absolute path, not just synthesize.

Two paths in resolveBrainDir get path.resolve():
- explicit --dir argument (e.g., `gbrain dream --dir .`)
- sync.repo_path config (in case it was ever stored relative)

resolveBrainDir already checked existsSync; resolve() only makes the path absolute and normalizes it before return. No behavior change for paths already absolute.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Matt Gunnin <mgunnin@esports.one>
Co-authored-by: Brandon Lipman <brandon@offdeck.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Jeremy Knows <jeremy@veefriends.com>
Co-authored-by: root <root@localhost>
Co-authored-by: orendi84 <orendigergo@gmail.com>
Co-authored-by: orendi84 <orendi84@users.noreply.github.com>
Co-authored-by: Garry Tan <garry@ycombinator.com>
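Both brainDir layers reduce to the same check. An empty or whitespace-only path is refused. Anything relative is made absolute once, before any `join()` can resolve it against the process cwd. The sketch below uses Node's `path` module. `guardBrainDir` and its result type are hypothetical and not gbrain's `runPhaseSynthesize` or `resolveBrainDir`; only the `BRAINDIR_EMPTY` code is taken from the commit.

```typescript
import { isAbsolute, resolve } from "node:path";

// Hypothetical result type; gbrain's phases use ok()/failed() helpers instead.
type BrainDirResult = { ok: true; brainDir: string } | { ok: false; code: "BRAINDIR_EMPTY" };

function guardBrainDir(brainDir: string | undefined): BrainDirResult {
  // '' or whitespace would make join(brainDir, 'x.md') cwd-relative: fail loudly.
  if (brainDir === undefined || brainDir.trim() === "") {
    return { ok: false, code: "BRAINDIR_EMPTY" };
  }
  // Resolve relative paths ('.', './brain') once, up front, so every downstream
  // join() sees the same absolute directory. Absolute paths are returned unchanged.
  return { ok: true, brainDir: isAbsolute(brainDir) ? brainDir : resolve(brainDir) };
}
```

The `isAbsolute` check keeps already-absolute inputs byte-for-byte unchanged, matching the "already-absolute → unchanged" contract. Calling `resolve()` on them instead would also normalize away details such as trailing slashes.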
Summary
Community PR fix wave. Six contributor PRs land together (3 cherry-picked, 1 reworked, 2 closed-as-superseded/redundant with the value preserved). Builds on the v0.41 + v0.40.7.1 local-provider thread and finally closes the dream-cycle's hardcoded-Anthropic gap.
Local AI gets first-class treatment everywhere gbrain checks your setup:
- `--reranking` and Qwen3-Reranker-4B; gbrain routes through it instead of paying per-rerank to ZeroEntropy.
- `doctor --remediation-plan`, stop hitting TX2 hard-fails under `--max-cost`, and finally get a reachability probe in `gbrain models doctor`.

Dream cycle stops being Anthropic-locked:
- `gateway.chat()` adapter so any registered provider (DeepSeek, OpenRouter, Voyage, Ollama, llama-server, ...) reaches the dream judge via one config line: `gbrain config set models.dream.synthesize_verdict deepseek:deepseek-chat`.

Cross-platform ergonomics:
- `readFileSync(0, ...)` cross-platform stdin. One-line fix; supersedes the larger try/catch fallback in "fix: add Windows stdin fallback for `gbrain put`" #1366.
- "fix: add Windows stdin fallback for `gbrain put`" #1366: closed as superseded by "fix(cli): use fd 0 instead of '/dev/stdin' for cross-platform stdin reads" #1325.

Closed without merging (value preserved):
- `put_page --file` was redundant with the already-shipped `gbrain capture --file PATH --slug SLUG` (v0.39.3.0; reads files as a Buffer with binary-NUL guard and provenance write-through). The PR also had a ship-blocking bug (the handler block landed in `get_page` instead of `put_page`). Closed with a doc improvement in this wave that points users at the canonical command from `gbrain put --help`.

Lock the bug class shut:
- `scripts/check-gateway-routed-no-direct-anthropic.sh` prevents `synthesize.ts` and `think/index.ts` from regressing back to `new Anthropic()`. Handles every TypeScript import shape (default, named, namespace, mixed-type-value, dynamic). A 7-shape regression matrix verifies every bypass attempt is caught.

Test Coverage
Coverage audit (subagent): 85% (gate PASS, above 80% target). 0 gaps below threshold. 263/263 wave-affected tests pass across 15 files.
R1+R2+R3+R4 IRON-RULE regressions all pinned:
- `get_page` accepts calls without `content` (closes the bug class from PR #1365, "feat(put_page): add --file param to bypass Windows pipe buffer limitation")
- `put_page` schema `content` stays `required: true`

New test files:
- `test/cycle/synthesize-gateway-adapter.test.ts` (11 cases: A1-A9 unit + R3 parity + R3 corollary)
- `test/cycle/regression-pr-wave-r1-r2-r4.test.ts` (R1+R2+R4 pins)

Extended test files:
- `test/e2e/dream-synthesize-pglite.test.ts`: new "mid-run AIConfigError catch" E2E case + hermeticity hardening (`withoutAnthropicKey` now overrides `GBRAIN_HOME`)

Pre-Landing Review
9 informational findings, 0 critical. PR Quality Score: 7.0.
3 absorbed in-wave:
- `gbrain models doctor` tip text ("spends ~1 token" → "spends a minimal request per configured chat/embed/rerank surface").
- `{ Anthropic }`), namespace imports (`* as A`), mixed-type-value (`{ type Msg, Anthropic }`), and dynamic imports. All 7 import shapes now have hermetic regression coverage.
- `resolveLiveRerankerTimeoutMs(engine)` helper. Before, the probe could be wrong in either direction: false-ok when production would have timed out at the config-set 1s, false-fail when production had a higher configured timeout.

Filed as TODO (deferred):
- `probeEmbeddingReachability` should honor the embedding recipe's `default_timeout_ms` (sibling to the reranker probe fix; requires widening `EmbeddingTouchpoint` to carry the field).
- `FREE_LOCAL_*_PROVIDERS` zero-pricing is bypassable via redirected `LLAMA_SERVER_*_BASE_URL` env vars. Pre-existing posture; couples with the unification TODO.

Documentation honesty fix:
- `synthesize.ts:makeJudgeClient` JSDoc narrowed. It claimed "tolerates the array-of-blocks shape for future flexibility", but the mapping silently drops non-text blocks. It now explicitly states "TEXT-flattens only", so future callers wiring tool-use through this client extend the mapping instead of relying on silent drops.

Adversarial Review Synthesis
- `import { type Message, Anthropic }` slipped through `[^t][^y]*`). Fixed in-wave with clause-level parsing + macOS BSD sed POSIX class fix.

Codex gate: PASS (no P1 findings).
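Clause-level classification can be illustrated in TypeScript, although the shipped guard is the shell script `scripts/check-gateway-routed-no-direct-anthropic.sh`. The hypothetical `importVerdict` below looks at a single one-line static import. It returns ALLOW only when the whole statement is `import type`, or when every braced specifier carries its own `type` modifier. Any other form binds a runtime value and gets BLOCK. Dynamic `import()` detection, which the PR also covers, is out of scope for this sketch.

```typescript
// Hypothetical illustration of clause-level import classification.
// Handles single-line static imports only; not the shipped shell guard.

type Verdict = "ALLOW" | "BLOCK";

function importVerdict(statement: string): Verdict {
  // Capture the import clause between `import` and `from '<module>'`.
  const m = /^\s*import\s+(.+?)\s+from\s+['"][^'"]+['"]\s*;?\s*$/.exec(statement);
  if (!m) throw new Error(`not a single-line static import: ${statement}`);
  const clause = m[1].trim();

  // `import type X` / `import type { X }`: erased entirely at compile time.
  if (/^type\s/.test(clause)) return "ALLOW";

  // Default (`Anthropic`) or namespace (`* as A`) bindings are runtime values.
  if (!clause.startsWith("{")) return "BLOCK";

  // Named specifiers: ALLOW only if every one has its own `type` modifier,
  // so `{ type Message, Anthropic }` is BLOCK because of the bare `Anthropic`.
  const specifiers = clause
    .replace(/^\{|\}$/g, "")
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  return specifiers.every((s) => /^type\s/.test(s)) ? "ALLOW" : "BLOCK";
}
```

Applied to the seven statements in the 7-shape regression matrix, it returns ALLOW for the three type-only forms and BLOCK for the other four. Checking each specifier separately is what catches the mixed `{ type Message, Anthropic }` case that a single character-class regex let through.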
Plan Completion
13/13 plan items: 10 DONE, 0 CHANGED, 2 DEFERRED by design (T11 close-PRs + T13 ship — both happen post-PR-creation), 1 UNVERIFIABLE-but-being-executed (T12 verification chain = this /ship).
Full plan + decision audit:
`~/.claude/plans/system-instruction-you-are-working-cozy-pancake.md`.

Verification Results
gbrain is a CLI tool, not a web app.
`/qa-only` URL-probing doesn't apply. Verification is handled via `bun run verify` (typecheck + 14 prechecks) + wave-affected test suites (263/263 pass).

E2E note: 6 E2E failures observed (`cycle.test.ts`, `dream.test.ts`) are PRE-EXISTING on master. This was verified by checking out origin/master and reproducing them identically. PR #1367 (v0.41.0.0 minions cathedral, just landed) introduced them; they are not in-branch failures. `mechanical.test.ts` (78/78), wave-affected E2E (12/12 dream-synthesize-pglite), and the rest of the E2E suite (840 cases) pass.

TODOS
- `probeEmbeddingReachability` recipe-timeout follow-up, `FREE_LOCAL_*_PROVIDERS` bypass concern (couples with unification TODO).

Documentation
CLAUDE.md, README, `docs/ai-providers/llama-server-reranker.md`, `docs/integrations/embedding-providers.md`, and `llms.txt`/`llms-full.txt` all updated in-wave. `/document-release` ran clean; no follow-up commit needed.

Test plan
- `bun run verify`: typecheck + 14 prechecks clean
- `dream-synthesize-pglite.test.ts`: 12/12 pass including new AIConfigError catch case
- `mechanical.test.ts`: 78/78 pass
- `gbrain config set models.dream.synthesize_verdict deepseek:deepseek-chat` + `DEEPSEEK_API_KEY=... gbrain dream --phase synthesize --dry-run`
- `echo "content" | gbrain put windows-test-slug` on Windows (CI matrix doesn't cover; manual confirmation appreciated)

Co-Authored-By: tobbecokta <34135750+tobbecokta@users.noreply.github.com>
Co-Authored-By: kohai-ut <chris@tincreek.com>
Co-Authored-By: justemu <206393437+justemu@users.noreply.github.com>
Co-Authored-By: ecat2010 <90021101+ecat2010@users.noreply.github.com>
🤖 Generated with Claude Code