feat(dream): make significance judge provider-agnostic with OpenAI-compatible fallback #1349

Closed
justemu wants to merge 1 commit into garrytan:master from justemu:feat/provider-agnostic-dream-judge

Conversation

justemu commented May 24, 2026


Summary

The dream synthesize phase's significance judge was hardcoded to use Anthropic's Haiku via ANTHROPIC_API_KEY, blocking users who rely on other high-quality, cost-effective LLM providers (DeepSeek, OpenRouter, local endpoints, etc.). This PR makes the judge provider-agnostic with a clean fallback chain.

Fixes #1348

Changes

  • Rename makeHaikuClient() → makeJudgeClient() with fallback chain:
    1. ANTHROPIC_API_KEY → Anthropic SDK (original behavior, unchanged)
    2. DEEPSEEK_API_KEY / OPENAI_API_KEY / OPENROUTER_API_KEY → OpenAI-compatible fetch adapter
  • Add makeOpenAIClient() adapter that translates Anthropic Message API params to OpenAI Chat Completions format and back, so judgeSignificance works without modification
  • Support custom endpoints via DEEPSEEK_BASE_URL / OPENAI_BASE_URL environment variables
  • Add 15s fetch timeout to prevent hangs on unresponsive endpoints
  • Update error message from 'no ANTHROPIC_API_KEY' to 'no configured API key' to reflect multi-provider reality
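
The translation step in the makeOpenAIClient() bullet can be pictured roughly as follows. This is a minimal sketch, not the PR's code: the type names and the two helper functions are hypothetical stand-ins, and only the fields a text-only judge call would plausibly need are modeled.

```typescript
// Stand-in types: only the fields this sketch touches, not the full SDK shapes.
interface AnthropicLikeParams {
  model: string;
  max_tokens: number;
  system?: string;
  messages: { role: "user" | "assistant"; content: string }[];
}

interface OpenAIChatRequest {
  model: string;
  max_tokens: number;
  messages: { role: "system" | "user" | "assistant"; content: string }[];
}

interface OpenAIChatResponse {
  choices: { message: { content: string | null } }[];
}

interface AnthropicLikeMessage {
  role: "assistant";
  content: { type: "text"; text: string }[];
}

// Anthropic keeps the system prompt in a top-level field; Chat Completions
// expects it as the first message with role "system".
function toOpenAIRequest(params: AnthropicLikeParams): OpenAIChatRequest {
  const messages: OpenAIChatRequest["messages"] = [];
  if (params.system) messages.push({ role: "system", content: params.system });
  for (const m of params.messages) messages.push({ role: m.role, content: m.content });
  return { model: params.model, max_tokens: params.max_tokens, messages };
}

// Wrap the first choice's text in a single text block so a caller that reads
// content[0].text keeps working. A missing choice becomes an empty string.
function toAnthropicMessage(res: OpenAIChatResponse): AnthropicLikeMessage {
  const text = res.choices[0]?.message.content ?? "";
  return { role: "assistant", content: [{ type: "text", text }] };
}
```

Keeping both directions as pure functions means the network call (and its timeout) can be tested separately from the shape mapping.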

Design decisions

  1. Anthropic SDK path is completely unchanged — zero risk for existing users. The new Anthropic() client and its message creation path are preserved exactly.
  2. Minimal surface area — only 1 file changed (src/core/cycle/synthesize.ts), +82/-9 lines. No new dependencies.
  3. No new runtime dependencies: the adapter uses the built-in fetch, which Bun and other modern runtimes provide.
  4. Model-identifier prefix stripping: model identifiers like deepseek:deepseek-v4-flash have their provider prefix (deepseek:, openai:, openrouter:) stripped before being passed to the OpenAI-compatible API.
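
The prefix stripping in item 4 could look something like the sketch below. The function name and the constant are hypothetical; only the three prefixes come from the description above.

```typescript
// Prefixes named in the PR description; treat the list as illustrative.
const PROVIDER_PREFIXES = ["deepseek:", "openai:", "openrouter:"] as const;

// Remove a leading provider prefix, if any, and leave other ids untouched.
function stripProviderPrefix(modelId: string): string {
  for (const prefix of PROVIDER_PREFIXES) {
    if (modelId.startsWith(prefix)) return modelId.slice(prefix.length);
  }
  return modelId;
}
```

With this sketch, `stripProviderPrefix("deepseek:deepseek-v4-flash")` yields `"deepseek-v4-flash"`, while an id without a known prefix passes through unchanged.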

Testing

Tested with DeepSeek v4 Flash on real-world conversation transcripts:

  • 20260515_071247: ✅ worth=True (architecture design decisions)
  • 20260516_182013: ✅ worth=True (cross-agent collaboration reflection)
  • 20260518_173613: ✅ worth=True (governance process definition)
  • 20260518_173735: ✅ worth=False (routine coordination — correctly filtered)
  • 20260521_223620: ✅ worth=False (routine config — correctly filtered)

Before the patch (no ANTHROPIC_API_KEY): all 5 → worth=False.
After the patch (with DEEPSEEK_API_KEY): 3/5 worth=True, 2/5 worth=False.

Future work

  • The propose_takes phase's extractor subagent has a similar Anthropic-only code path. A similar generalization would benefit that phase too, but is out of scope for this PR.
  • Fully configurable model routing via the existing models.* config keys could make the judge use the configured models.dream.synthesize_verdict model directly instead of relying on env-var detection.

garrytan added a commit that referenced this pull request May 25, 2026
…ed dream judge (6 community PRs) (#1377)

* fix(cli): use fd 0 instead of '/dev/stdin' for cross-platform stdin reads

`readFileSync('/dev/stdin', 'utf-8')` works on Unix but fails on Windows
(Git Bash, PowerShell, cmd) with `ENOENT: no such file or directory,
open '/dev/stdin'`. Windows doesn't expose `/dev/stdin` as a filesystem
path.

Reading file descriptor 0 directly (`readFileSync(0, 'utf-8')`) is the
documented Node.js idiom and works on every platform. No behavior change
on Unix — same syscall path, same semantics.
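
As a small illustration of the fd-based read (the wrapper name is hypothetical; `readFileSync` accepting a file descriptor is standard Node.js behavior):

```typescript
import { readFileSync } from "node:fs";

// Read an entire stream by file descriptor. fd 0 is stdin on every platform,
// so no '/dev/stdin' path lookup is involved.
function readAllFromFd(fd: number = 0): string {
  return readFileSync(fd, "utf-8");
}
```

Called with no argument it reads piped stdin; taking the descriptor as a parameter also lets a test pass in an ordinary file's descriptor.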

Repro on Windows before the fix:
  echo "test" | gbrain put my-page
  ENOENT: no such file or directory, open '/dev/stdin'

After: round-trip put/search/delete works on Windows Git Bash.

* v0.40.6.1 feat: llama-server reranker — local Qwen3 / self-hosted ZE via llama.cpp

Adds local reranker support so users can point gbrain's reranker call at their
own llama.cpp server instead of ZeroEntropy's hosted API. One new recipe
(`llama-server-reranker`), a `path?: string` + `default_timeout_ms?: number`
extension on `RerankerTouchpoint`, env passthrough wiring, budget-tracker
`FREE_LOCAL_RERANK_PROVIDERS` set so `--max-cost` callers don't TX2 hard-fail on
local rerank, and a doctor-probe divergence fix (probe and live search now read
the same `search.reranker.model` path via `loadSearchModeConfig` + `resolveSearchMode`).

ZE-hosted users are unchanged. Voyage / Cohere / vLLM rerankers stay out of
scope — different wire shapes need adapter hooks designed against their actual
shapes in a follow-up plan.

Verification:
- `bun run verify` (typecheck + 13 pre-checks): clean
- `bun run check:all` (15 historical checks): clean
- 107/107 expect() calls pass across 5 affected test files
- /codex review against the full diff: GATE PASS (caught one [P2] /v1 path
  doubling bug pre-merge; fixed by changing recipe path to leaf `/rerank`)
- Claude adversarial subagent: 7 net-new findings filed as v0.40.7+ TODOs
  (none currently exploitable; hardening for future contributor traps)

Test surface (107 cases, 5 files):
- test/ai/rerank.test.ts: path override (exact URL match), default_timeout_ms
  honored, empty models[] accepts any id, ZE regression
- test/ai/recipe-llama-server-reranker.test.ts: recipe shape regression guard
  + base_url + path concat assertion (codex-caught /v1/v1/ regression)
- test/search-mode.test.ts: timeout precedence chain (per-call > config >
  recipe > bundle), ZE no-recipe-default regression, unknown provider fallthrough
- test/models-doctor-reranker.test.ts: divergence-fix helper across DB-plane
  read, mode default, disabled, override, DB-error graceful fallback
- test/core/budget/budget-tracker.test.ts: free-local rerank pricing + arbitrary
  model id + chat-kind TX2 hard-fail preserved

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: post-ship documentation sync

* docs: index docs/ai-providers/ in llms.txt (zeroentropy + llama-server-reranker)

The hand-curated llms-config.ts doc map never included docs/ai-providers/, so
both zeroentropy.md (since v0.35.0.0) and the new llama-server-reranker.md were
invisible to the AI-facing llms.txt / llms-full.txt index. Adds an "AI providers"
section with both. Marked includeInFull: false (setup walkthroughs belong in the
index but would push the single-fetch bundle past FULL_SIZE_BUDGET) — same
treatment CHANGELOG.md gets.

Caught by the /ship document-release subagent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: recipe-aware embedding-provider check for local providers

doctor --remediation-plan and autopilot both judged the embedding
provider with a hosted-only key check, so a brain on ollama: or
llama-server: was reported "blocked" on a missing API key it never
needed, contradicting doctor --json's 100%-coverage health.

Extract a shared embeddingProviderConfigured() helper into
brain-score-recommendations.ts: empty auth_env.required (local
providers) is configured with no key; hosted providers check their
OWN required key. Both producers (doctor, autopilot) call it,
killing the DRY violation that caused the bug. Hosted brains with a
missing key still block.
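
A minimal sketch of the rule described above, assuming a trimmed-down recipe shape. Only `auth_env.required` comes from the commit message; the interface, the function name, and the sample recipes are hypothetical.

```typescript
// Hypothetical, trimmed-down recipe shape: only auth_env.required is taken
// from the commit message; the rest of the real recipe type is omitted.
interface RecipeLike {
  id: string;
  auth_env: { required: string[] };
}

// A provider counts as configured when every key it requires is present.
// Local providers require nothing, so they pass with no key at all.
function isEmbeddingProviderConfigured(
  recipe: RecipeLike,
  env: Record<string, string | undefined>,
): boolean {
  return recipe.auth_env.required.every((key) => Boolean(env[key]));
}

const ollamaRecipe: RecipeLike = { id: "ollama", auth_env: { required: [] } };
const openaiRecipe: RecipeLike = { id: "openai", auth_env: { required: ["OPENAI_API_KEY"] } };
```

Because `every` on an empty array is true, the local-provider case needs no special branch, and a hosted provider is checked against its own key rather than a hardcoded one.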

* fix(budget): price local embed providers at $0

A --max-cost-bounded embed/reindex job configured for ollama: or
llama-server: TX2 hard-failed with no_pricing because
lookupEmbeddingPrice has no entry for local models. Add
FREE_LOCAL_EMBED_PROVIDERS (sibling to FREE_LOCAL_RERANK_PROVIDERS)
so a pricing miss on a local-inference provider returns $0 instead
of null. lmstudio/litellm intentionally excluded.
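
The pricing fallback might be sketched like this. The provider ids come from the commit message; the price table, its single entry, and the function name are hypothetical, and the figure is a placeholder rather than a real price.

```typescript
// Provider ids taken from the commit message; everything else is illustrative.
const FREE_LOCAL_EMBED_PROVIDERS = new Set(["ollama", "llama-server"]);

const PRICE_PER_MTOK: Record<string, number> = {
  "openai:example-embedding-model": 0.02, // placeholder figure, not a real price
};

// Returns a price per million tokens, 0 for local inference, or null when the
// model is unknown (which a budget-bounded caller treats as a hard failure).
function embeddingPriceOrNull(modelId: string): number | null {
  const known = PRICE_PER_MTOK[modelId];
  if (known !== undefined) return known;
  const provider = modelId.split(":", 1)[0] ?? "";
  return FREE_LOCAL_EMBED_PROVIDERS.has(provider) ? 0 : null;
}
```

Providers left out of the set, such as lmstudio or litellm, still return null on a pricing miss, which preserves the hard failure for them.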

* feat(models): embedding reachability probe in gbrain models doctor

A down/misconfigured local embed server was invisible until first
embed. Add probeEmbeddingReachability() (mirrors the reranker probe):
a 1-input embed with a 5s abort timeout, classified via classifyError,
under a new 'embedding_reachability' touchpoint, gated on the
zero-network config probe returning ok first.

* fix: don't count config-plane voyage/google keys as configured

codex review caught a false positive: HOSTED_EMBED_KEY_CONFIG mapped
VOYAGE_API_KEY/GOOGLE_GENERATIVE_AI_API_KEY to config fields, but
buildGatewayConfig only threads openai/anthropic/zeroentropy config
keys into the gateway env. A Voyage/Google brain with the key only in
config.json would be judged "configured" and dispatch an embed.stale
job that then fails auth at the gateway. Drop those two from the map so
the producer closures resolve them by env var only, matching what the
gateway can actually use. Pinned by a regression test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(dream): route significance judge through gateway.chat for multi-provider support

Replaces the hardcoded `new Anthropic()` client in the dream-cycle synthesize
phase with a gateway-routed JudgeClient adapter. Mirrors the v0.35.5.0 pattern
that closed #952 for runThink: construction-time provider/key probe returns null
on a clear miss (cheap pre-flight); the verdict loop wraps the chat call in
try/catch for AIConfigError mid-run.

Any provider with a registered gateway recipe (Anthropic, DeepSeek, OpenRouter,
Voyage, Ollama, llama-server, etc.) is now reachable via:

    gbrain config set models.dream.synthesize_verdict <provider>:<model>

The canonical config key `models.dream.synthesize_verdict` (per PER_TASK_KEYS
in src/core/model-config.ts) is used unchanged. The exported JudgeClient
interface signature is preserved for test-seam stability.

The original community PR (#1349) shipped a custom fetch adapter that
bypassed the gateway entirely. This reworked landing routes through the
canonical seam so future provider additions automatically benefit, and a
CI guard (T7) will land in this wave to prevent the bug class from
re-opening (the same one that bit src/core/think/index.ts before v0.35.5.0).

Co-Authored-By: justemu <206393437+justemu@users.noreply.github.com>

* test(dream): synthesize-gateway-adapter unit tests + R3 parsed-verdict parity

11 cases pin the gateway-routed JudgeClient adapter from T5:

- A1: makeJudgeClient returns null on missing Anthropic key (legacy short-circuit preserved)
- A2: returns a JudgeClient when chat provider is reachable
- A3: JudgeClient.create routes through gateway.chat (via __setChatTransportForTests)
- A4: ChatResult.text → Anthropic.Message.content[0].text mapping
- A5: empty text from gateway → graceful empty-text Anthropic.Message
- A6: non-AIConfigError from gateway propagates to caller (no swallow)
- A7: AIConfigError from gateway propagates as AIConfigError (caught per-transcript in production loop)
- A8: makeJudgeClient returns null on unknown provider prefix
- A9: returns a JudgeClient for non-anthropic providers without env-probing (delegates to gateway at call time)
- R3: parsed-verdict SEMANTIC parity — gateway-routed and legacy SDK-shape JudgeClients produce same {worth_processing, reasons} given identical canned LLM text
- R3 corollary: unparseable LLM output → both paths fall through to cheap-fallback verdict

Codex flagged byte-identical-Anthropic.Message as a meaningless gate; R3 is
parsed-verdict semantic parity instead. Mirror pattern of
test/think-gateway-adapter.test.ts for cross-site consistency with the
v0.35.5.0 runThink migration.

* ci: guard against direct Anthropic SDK construction in gateway-routed files

New scripts/check-gateway-routed-no-direct-anthropic.sh greps two guarded
files (src/core/cycle/synthesize.ts and src/core/think/index.ts) for
`new Anthropic()` constructor calls and runtime imports of @anthropic-ai/sdk.
Type-only imports (`import type Anthropic from '@anthropic-ai/sdk'`) stay
allowed because both files use Anthropic.Message / .MessageCreateParamsNonStreaming
as adapter types.

Comment lines (starting with `//` or ` *`) are excluded so historical
references in JSDoc don't false-fire. Negative test in this commit's
verification confirms: injecting `new Anthropic()` into synthesize.ts
makes the guard exit 1 with a clear error pointing at the gateway adapter
pattern; reverting restores the OK state.

Wired into both `bun run verify` and `bun run check:all`. Closes the bug
class that bit synthesize.ts in PR #1349 (which would have shipped a
parallel fetch stack instead of routing through the canonical gateway).
The same class previously bit think/index.ts and was fixed structurally
in v0.35.5.0; this guard prevents either file from regressing.

Extend GUARDED_FILES in the script when migrating another file off
direct SDK construction.

* docs(put_page): point Windows / pipe-buffer users at gbrain capture --file

Extends the put_page op description (surfaced by `gbrain put --help`) with a
one-line pointer to `gbrain capture --file PATH --slug SLUG` for the file-
as-input use case. Capture (v0.39.3.0) is the canonical Windows-pipe-buffer
escape route: reads files as a Buffer first, scans the first 8KB for NUL bytes
to refuse binary content, decodes to UTF-8 only after the safety check, and
adds provenance write-through.

Lands the user-facing value the closed PR #1365 was reaching for, without
duplicating the CLI surface. Credits the original contributor.

Co-Authored-By: ecat2010 <90021101+ecat2010@users.noreply.github.com>

* test: R1+R2+R4 critical regression pins for the community-PR-wave landing

Per the wave's eng-review plan (IRON RULE — mandatory):

  R1 — get_page handler accepts calls without `content` param. Pre-wave
       PR #1365 landed its `!p.content → throw` check in the WRONG handler
       (get_page instead of put_page), which would have broken every read
       in the system. Pin: get_page MUST NOT require content + the schema
       carries no `content` or `file` param.

  R2 — put_page schema content stays `required: true`. PR #1365 also
       flipped `content` from required→optional in the schema. Pin: the
       contract stays at `required: true` + the closed PR's `file` param
       is NOT in the schema.

  R4 — Cross-platform stdin via fd 0 (PR #1325 regression pin). Source-grep
       asserts src/cli.ts uses `readFileSync(0, ...)` and NOT the legacy
       `readFileSync('/dev/stdin', ...)`. Belt-and-suspenders pattern
       assertions confirm the parseOpArgs branch shape (cliHints.stdin
       check, 5MB cap, isTTY gate) hasn't drifted.

R3 (gateway-adapter parsed-verdict parity) lives in the sibling file
test/cycle/synthesize-gateway-adapter.test.ts.

* test(e2e): update dream-synthesize no-key reason text + harden hermeticity

After T5's gateway-adapter rework, the "no API key" verdict text changed from
'no ANTHROPIC_API_KEY for significance judge' to
'no configured provider for verdict model: <model>' (broader + names the
actual model so the user sees WHICH provider failed). Update both assertions
that check the old text.

Hermeticity bug fix in the same commit: `withoutAnthropicKey` previously only
cleared the env var. After the rework, `makeJudgeClient` ALSO checks
`loadConfig().anthropic_api_key` (same hasAnthropicKey() pattern think/index.ts
uses since v0.35.5.0). If the developer running the test has the key set in
~/.gbrain/config.json, the test would behave non-deterministically. Fix:
override GBRAIN_HOME to a fresh tmpdir for the duration of the body, restore
on return (even on throw).
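
The override-and-restore pattern can be sketched as below. The helper name and the directory prefix are hypothetical; `GBRAIN_HOME` is the variable named above.

```typescript
import { mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Run `body` with GBRAIN_HOME pointed at a fresh empty directory, then put the
// previous value back. The finally block runs on both return and throw.
async function withIsolatedHome<T>(body: () => Promise<T> | T): Promise<T> {
  const previous = process.env.GBRAIN_HOME;
  const dir = mkdtempSync(join(tmpdir(), "gbrain-test-"));
  process.env.GBRAIN_HOME = dir;
  try {
    return await body();
  } finally {
    if (previous === undefined) delete process.env.GBRAIN_HOME;
    else process.env.GBRAIN_HOME = previous;
    rmSync(dir, { recursive: true, force: true });
  }
}
```

Deleting the variable when it was previously unset matters: assigning `undefined` to a `process.env` key in Node.js stores the string "undefined" instead of clearing it.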

* test(e2e): pin verdict-loop AIConfigError catch from T5 rework end-to-end

Drives runPhaseSynthesize against a real PGLite engine with the gateway
chat transport stubbed to throw AIConfigError on every call (simulates a
revoked/misconfigured provider surfacing mid-run). Asserts:

  - Phase does NOT crash; converts the throw to a per-transcript verdict
    with worth=false and reasons[0] matching "gateway error: ...".
  - status='ok' so subsequent transcripts in the loop would continue
    being judged (not visible in 1-transcript test, but the loop shape is
    proven not to abort).

Pre-rework (T5), this code path didn't exist — judgeSignificance threw
directly to runPhaseSynthesize and crashed the whole phase. Pin so a
future regression that removes the try/catch fires loudly.
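
The loop shape this test pins might look roughly like the sketch below. `AIConfigError` and `Verdict` here are local stand-ins, and both function names are hypothetical; the field names and the "gateway error: " prefix follow the commit messages.

```typescript
// Stand-ins for illustration; the real AIConfigError and verdict types live in
// the gbrain source and may carry more fields.
class AIConfigError extends Error {}

interface Verdict {
  worth_processing: boolean;
  reasons: string[];
}

// Config-class failures become a negative verdict for this one transcript;
// anything else is rethrown so unexpected bugs stay visible.
function verdictFromJudgeError(err: unknown): Verdict {
  if (err instanceof AIConfigError) {
    return { worth_processing: false, reasons: [`gateway error: ${err.message}`] };
  }
  throw err;
}

async function judgeAll(
  transcripts: string[],
  judge: (transcript: string) => Promise<Verdict>,
): Promise<Verdict[]> {
  const verdicts: Verdict[] = [];
  for (const transcript of transcripts) {
    try {
      verdicts.push(await judge(transcript));
    } catch (err) {
      // One bad call yields one verdict; the loop moves on to the next transcript.
      verdicts.push(verdictFromJudgeError(err));
    }
  }
  return verdicts;
}
```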

* docs(claude.md): annotate v0.41+ community-PR-wave changes

Two additions to the Key files section:

- src/core/cycle/synthesize.ts — appends a v0.41+ paragraph documenting
  the gateway-adapter rework (makeJudgeClient + AIConfigError catch loop +
  canonical config key + JudgeClient interface preserved + CI guard
  reference + test file references).

- scripts/check-gateway-routed-no-direct-anthropic.sh — new entry
  documenting the CI guard's contract, scope, and how to extend
  GUARDED_FILES when migrating another file off direct SDK construction.

CLAUDE.md drives /sync-gbrain and llms.txt generation; both need the
wave's annotations to land BEFORE the llms regeneration step (T10).

* docs(llms): regenerate llms.txt + llms-full.txt for v0.41+ wave

Refreshes the auto-generated llms.txt bundles to pick up the CLAUDE.md
annotations landed earlier in this wave (gateway-adapter synthesize.ts
+ check-gateway-routed-no-direct-anthropic.sh + the cherry-picked
llama-server-reranker recipe). Pinned by test/build-llms.test.ts.

* fix(providers): dynamic-width id column accommodates llama-server-reranker

v0.40.6.1 introduced `llama-server-reranker` (21 chars), which overflowed
formatRecipeTable's static 14-char PROVIDER column. When the id is longer
than the column, padEnd is a no-op — the row starts with the tier name
directly, no space delimiter. test/providers.test.ts 'each recipe appears
at most once' iterates every recipe and asserts at least one row starts
with `${id} ` or `${id}  `; with no space after `llama-server-reranker`,
the assertion fails and the recipe appears effectively missing from the
human-readable list.

Fix: compute column width dynamically as `max(14, max(id.length) + 1)` so
every id is followed by at least one space, regardless of length. Also
widens the separator rule to match. 14 stays as the floor so the existing
short-id rows (openai 6, ollama 6, anthropic 9, ...) keep their familiar
layout when llama-server-reranker isn't in the active recipe set.
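
The width rule is easy to check in isolation. A minimal sketch, with hypothetical helper names and a made-up tier value:

```typescript
const MIN_ID_COLUMN = 14; // floor from the commit message

// Width is the longest id plus one, so padEnd always leaves a trailing space.
function idColumnWidth(ids: string[]): number {
  const longest = ids.reduce((max, id) => Math.max(max, id.length), 0);
  return Math.max(MIN_ID_COLUMN, longest + 1);
}

// Pad the id to the column width, then append the next column's text.
function formatRow(id: string, tier: string, width: number): string {
  return id.padEnd(width) + tier;
}
```

With only short ids the width stays at 14; once `llama-server-reranker` (21 characters) is present it becomes 22, whereas `"llama-server-reranker".padEnd(14)` returns the id unchanged, which is the overflow the fix addresses.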

10/10 cases in test/providers.test.ts pass after the fix.

* chore: pre-landing review polish — refresh models doctor tip + file embed timeout TODO

Two pre-landing review absorptions:

- `src/commands/models.ts:154` — the help-text tip said `gbrain models doctor`
  "spends ~1 token per model" but the wave added an `embed(['probe'])` call
  AND a reranker probe. Generalize to "spends a minimal request per configured
  chat/embed/rerank surface" so the cost expectation matches reality.

- `TODOS.md` — file a follow-up to widen `default_timeout_ms` from
  RerankerTouchpoint to EmbeddingTouchpoint so `probeEmbeddingReachability`
  doesn't hardcode 5000ms while the sibling reranker probe reads the
  recipe's configured timeout. Local CPU embedding endpoints (llama-server)
  hit the same cold-start curve as Qwen3-Reranker-4B; workaround today is
  "re-run the probe" per the existing JSDoc.

Other informational findings from pre-landing review either match
established patterns (no behavioral test for `probeEmbeddingReachability`,
matching `probeRerankerReachability`), are intentional choices documented
in JSDoc (the `as unknown as Anthropic.Message` cast), or are micro-perf
in non-hot paths (autopilot's 4 sequential `getConfig` awaits per
5-minute tick). All non-blocking.

* ci: tighten gateway-routed guard against import bypass shapes + honest JSDoc

Adversarial review caught two soft spots in the wave's new contracts:

1. `scripts/check-gateway-routed-no-direct-anthropic.sh` only matched the
   default-import shape `import Anthropic from '@anthropic-ai/sdk'`. A future
   contributor (or, more realistically, a future refactor) could bypass with:
     - `import { Anthropic } from '@anthropic-ai/sdk'`
     - `import { Anthropic as A } from '@anthropic-ai/sdk'`
     - `import * as Anthropic from '@anthropic-ai/sdk'`
     - `const x = await import('@anthropic-ai/sdk')`
   Tightened the regex to match ANY value-shaped import from the SDK module
   (excluding only the explicit `import type ... from '@anthropic-ai/sdk'`
   form which the adapter's Anthropic.Message return type needs). Added a
   second grep for dynamic imports. Verified all four bypass shapes now
   trigger the guard against synthesize.ts; type-only import still passes.

2. `synthesize.ts:makeJudgeClient` JSDoc claimed the adapter "tolerates the
   array-of-blocks shape for future flexibility" — but the mapping flattens
   ONLY text blocks; `tool_use`, `tool_result`, image blocks silently
   become empty strings. Today only `judgeSignificance` calls this and it
   only sends string content, so no behavior bug. But the comment was
   marketing future flexibility the code doesn't deliver. Narrowed to call
   out the silent-drop and say to extend the mapping if a future caller
   wires non-text content through.

Both wave-scope: the CI guard was added by the wave, the JSDoc was added
by the wave's T5 rework. Adversarial review caught them before merge.
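
To make the silent drop in item 2 concrete, here is a sketch of a text-only flattening. The block type and function name are hypothetical stand-ins, not the adapter's actual code.

```typescript
// Stand-in for a content block; only the discriminant and text matter here.
interface ContentBlockLike {
  type: string;
  text?: string;
}

// Text blocks contribute their text; every other block type contributes an
// empty string. Nothing is logged or thrown, so the loss is silent.
function flattenToText(content: string | ContentBlockLike[]): string {
  if (typeof content === "string") return content;
  return content.map((block) => (block.type === "text" ? block.text ?? "" : "")).join("");
}
```

A caller that starts sending `tool_use` or image blocks through such a mapping would see them vanish from the prompt, which is why the JSDoc now says to extend the mapping first.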

* fix(models doctor): reranker probe timeout matches live search precedence chain

Codex Pass-9 adversarial review caught a probe-vs-production divergence:
production `hybridSearch` resolves reranker timeout via the full chain
(per-call > config > recipe > bundle) by going through
`loadSearchModeConfig + resolveSearchMode`, but `probeRerankerReachability`
was reading ONLY the recipe's `default_timeout_ms` — so an operator who
set `search.reranker.timeout_ms=1000` would see doctor wait 30s and report
"reachable" while production search timed out at 1s and failed open.
A higher configured timeout produces the opposite false failure (probe
gives up at 5s when production would have waited longer).

Fix: extract `resolveLiveRerankerTimeoutMs(engine)` parallel to the
existing `resolveLiveRerankerModel(engine)` — same precedence chain,
same DB-plane consistency posture. The probe now reads the SAME timeout
live search reads, on the same lookup path.

The codex P1 finding about `FREE_LOCAL_*_PROVIDERS` zero-pricing being
bypassable via redirected `LLAMA_SERVER_BASE_URL` is filed as a TODO under
community-pr-wave follow-ups — couples with the existing
FREE_LOCAL_PROVIDERS unification TODO so both close in one v0.41+ PR.

* ci(guard): handle mixed type+value imports + macOS BSD sed POSIX classes

Codex structured review [P3] caught a bypass in the freshly-tightened
gateway-routed guard:

  import { type Message, Anthropic } from '@anthropic-ai/sdk';
  new Anthropic();

The previous regex `^\s*import\s+[^t][^y]*from ...` was meant to exclude
`import type ...` but stops at the `y` in `type` inside the brace list,
silently allowing the value-import `Anthropic` through. Two fixes:

1. Replace the brittle regex-based type-exclusion with a clause-level
   parse: extract the brace-list specifiers, allow the import iff EVERY
   non-empty specifier is `type`-prefixed. Catches mixed-import bypasses
   (`{ type Foo, Bar }`) while keeping all-type braces (`{ type Foo, type Bar }`)
   passing. Default + namespace imports remain always-value-shaped.

2. Replace `\s` with POSIX `[[:space:]]` in the sed extract — macOS BSD sed
   doesn't honor `\s` in extended-regex mode (it silently no-ops the pattern
   so `specifiers` comes back empty and the script falls through to the
   default/namespace branch's wrong error message).

Hermetic 7-shape regression matrix now verifies every TypeScript import
shape against the expected ALLOW/BLOCK verdict; all 7 pass:
- ALLOW: `import type Anthropic from '...'`
- ALLOW: `import type { Foo } from '...'`
- ALLOW: `import { type Message, type Foo } from '...'`
- BLOCK: `import { type Message, Anthropic } from '...'`
- BLOCK: `import { Anthropic } from '...'`
- BLOCK: `import Anthropic from '...'`
- BLOCK: `import * as A from '...'`
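
The clause-level rule can be restated in TypeScript for illustration. The actual guard is a shell script built on grep and sed; the function below is a hypothetical re-expression that handles single-line imports only and ignores corner cases such as a default import literally named `type`.

```typescript
type GuardVerdict = "ALLOW" | "BLOCK";

function classifyImport(line: string): GuardVerdict {
  const stmt = line.trim();
  // Whole-statement type import: erased at compile time.
  if (/^import\s+type\s/.test(stmt)) return "ALLOW";
  // Brace list: every non-empty specifier must carry its own `type` modifier.
  const braces = stmt.match(/^import\s*\{([^}]*)\}\s*from\s/);
  if (braces) {
    const specifiers = (braces[1] ?? "")
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s.length > 0);
    return specifiers.every((s) => /^type\s/.test(s)) ? "ALLOW" : "BLOCK";
  }
  // Default and namespace imports always bring in a runtime value.
  return "BLOCK";
}
```

Checking each specifier separately is what closes the mixed-import gap: `{ type Message, Anthropic }` fails on its second specifier even though the clause begins with `type`.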

Subshell-trap fix in the same commit: the previous "exit 1 inside while-pipe"
pattern doesn't propagate to the outer `$?` because the pipe spawns a
subshell. Switched to a tmpfile-flagged sentinel so the verdict survives
the subshell boundary cleanly.

* chore: bump version and changelog (v0.41.4.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(audit-writer): route log() to file matching event ts, not real-now

CI failure surfaced a time-dependent test flake in
`test/audit/audit-writer.test.ts` "returns events from current week,
filtered by ts cutoff" (added in v0.40.4.0 PR #1300). The test pinned
synthetic `now = 2026-05-22T12:00:00Z` (ISO week 21), logged 3 events
with synthetic ts values, then called `readRecent(7, now)` expecting
to find 2 events in window.

Root cause: `log()` ignored the caller-supplied `ts` for filename
routing and ALWAYS wrote to the file matching real-time-now's ISO
week. When real CI time crossed into 2026-W22 (this Monday), the
events went to W22's file but `readRecent` walked W21 + W20 → 0 hits.

Fix:
- `log()` parses `event.ts` (when provided) and routes to the file
  matching that ts's ISO week. Falls back to real-now when ts is
  missing or unparseable.
- No behavior change for production callers — none of the 5 audit
  consumers pass `ts` explicitly (rerank-audit, audit-slug-fallback,
  content-sanity-audit, graph-signals, supervisor-audit). The writer
  stamps real-now → both ts and filename use real-now → same file
  as before.
- Sibling test "honors caller-supplied ts override" also pinned a
  fixed ts and would have broken from the opposite angle (test
  read from `computeFilename()` default = real-now). Updated to
  read from `computeFilename(new Date(fixedTs))` so it asserts the
  per-row file routing the wave now provides.
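
The routing rule in the first bullet can be sketched as follows. The ISO week computation is the standard one (weeks start on Monday and belong to the year of their Thursday); the filename pattern and both function names are hypothetical.

```typescript
// ISO 8601 week number and week-year for a date, computed in UTC.
function isoWeekOf(date: Date): { year: number; week: number } {
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const dayOfWeek = d.getUTCDay() || 7; // Monday=1 ... Sunday=7
  d.setUTCDate(d.getUTCDate() + 4 - dayOfWeek); // move to this week's Thursday
  const yearStart = Date.UTC(d.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((d.getTime() - yearStart) / 86_400_000 + 1) / 7);
  return { year: d.getUTCFullYear(), week };
}

// Route by the event's own timestamp; fall back to `now` when the timestamp
// is missing or unparseable. The filename pattern is an assumption.
function auditFileFor(eventTs: string | undefined, now: Date): string {
  const parsed = eventTs === undefined ? NaN : Date.parse(eventTs);
  const basis = Number.isNaN(parsed) ? now : new Date(parsed);
  const { year, week } = isoWeekOf(basis);
  return `audit-${year}-W${String(week).padStart(2, "0")}.jsonl`;
}
```

Under this sketch, an event stamped 2026-05-22T12:00:00Z lands in the W21 file even when `now` is already in W22, which is the situation the flaky test ran into.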

22/22 audit-writer cases pass. Production callers (5 sites) unchanged.

Pre-existing on master since v0.40.4.0; surfaced when real time
crossed into a different ISO week than the test's synthetic now.
NOT introduced by this PR (#1377 community-PR-wave) — audit-writer
files aren't touched by the wave.

---------

Co-authored-by: Tobias <34135750+tobbecokta@users.noreply.github.com>
Co-authored-by: kohai-ut <chris@tincreek.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: justemu <noreply@github.com>
Co-authored-by: justemu <206393437+justemu@users.noreply.github.com>
Co-authored-by: ecat2010 <90021101+ecat2010@users.noreply.github.com>
…mpatible fallback

The dream synthesize phase's significance judge was hardcoded to use
Anthropic's Haiku via ANTHROPIC_API_KEY, blocking users who rely on
other high-quality, cost-effective LLM providers (DeepSeek, OpenRouter,
local endpoints, etc.).

Changes:
- Rename makeHaikuClient() to makeJudgeClient() with a fallback chain:
  1. ANTHROPIC_API_KEY → Anthropic SDK (original behavior, unchanged)
  2. DEEPSEEK_API_KEY / OPENAI_API_KEY / OPENROUTER_API_KEY → OpenAI-
     compatible fetch adapter
- Add makeOpenAIClient() adapter that translates Anthropic Message API
  params to OpenAI Chat Completions format and back, so judgeSignificance
  works without modification
- Support DEEPSEEK_BASE_URL / OPENAI_BASE_URL env vars for custom endpoints
- Add 15s timeout on the fetch call to prevent hangs
- Update error message from 'no ANTHROPIC_API_KEY' to 'no configured API
  key' to reflect the multi-provider reality

Tested with DeepSeek v4 Flash: 3/5 transcripts correctly judged as
'worth processing', 2/5 correctly filtered (routine operations).
Output format unchanged — judgeSignificance consumes the same shape.
@justemu justemu force-pushed the feat/provider-agnostic-dream-judge branch from 38162f8 to ae2fb76 Compare May 26, 2026 06:48
garrytan commented Jun 8, 2026

Owner

Thanks for this contribution — and apologies for the slow triage. We did a full pass over the entire PR backlog. gbrain has moved fast, and the maintainer's larger "cathedral" rewrites have superseded a big share of community PRs: the AI gateway + recipes + user_provided_models system replaced almost all individual provider PRs; #1805 fixed the whole Postgres module-singleton class; #1542 unified the type taxonomy; #1657 the retrieval path; #1802 the doctor; and so on.

We're closing this one in that cleanup — either the fix already landed on master, it duplicates another PR or merged change, or it's outside the current merge bar. Where a closed PR carried a genuinely valuable idea, we've recorded it in docs/designs/COMMUNITY_IDEAS.md so nothing good is lost (a few may graduate into TODOs).

Please don't read the close as a judgment of the work — thank you for contributing. If you believe the underlying issue is still live on the latest master, reopen with a quick note and we'll take another look. 🙏

@garrytan garrytan closed this Jun 8, 2026
garrytan-agents pushed a commit to garrytan-agents/gbrain that referenced this pull request Jun 13, 2026
…ed dream judge (6 community PRs) (garrytan#1377)

* fix(cli): use fd 0 instead of '/dev/stdin' for cross-platform stdin reads

`readFileSync('/dev/stdin', 'utf-8')` works on Unix but fails on Windows
(Git Bash, PowerShell, cmd) with `ENOENT: no such file or directory,
open '/dev/stdin'`. Windows doesn't expose `/dev/stdin` as a filesystem
path.

Reading file descriptor 0 directly (`readFileSync(0, 'utf-8')`) is the
documented Node.js idiom and works on every platform. No behavior change
on Unix — same syscall path, same semantics.

Repro on Windows before the fix:
  echo "test" | gbrain put my-page
  ENOENT: no such file or directory, open '/dev/stdin'

After: round-trip put/search/delete works on Windows Git Bash.

* v0.40.6.1 feat: llama-server reranker — local Qwen3 / self-hosted ZE via llama.cpp

Adds local reranker support so users can point gbrain's reranker call at their
own llama.cpp server instead of ZeroEntropy's hosted API. One new recipe
(`llama-server-reranker`), a `path?: string` + `default_timeout_ms?: number`
extension on `RerankerTouchpoint`, env passthrough wiring, budget-tracker
`FREE_LOCAL_RERANK_PROVIDERS` set so `--max-cost` callers don't TX2 hard-fail on
local rerank, and a doctor-probe divergence fix (probe and live search now read
the same `search.reranker.model` path via `loadSearchModeConfig` + `resolveSearchMode`).

ZE-hosted users are unchanged. Voyage / Cohere / vLLM rerankers stay out of
scope — different wire shapes need adapter hooks designed against their actual
shapes in a follow-up plan.

Verification:
- `bun run verify` (typecheck + 13 pre-checks): clean
- `bun run check:all` (15 historical checks): clean
- 107/107 expect() calls pass across 5 affected test files
- /codex review against the full diff: GATE PASS (caught one [P2] /v1 path
  doubling bug pre-merge; fixed by changing recipe path to leaf `/rerank`)
- Claude adversarial subagent: 7 net-new findings filed as v0.40.7+ TODOs
  (none currently exploitable; hardening for future contributor traps)

Test surface (107 cases, 5 files):
- test/ai/rerank.test.ts: path override (exact URL match), default_timeout_ms
  honored, empty models[] accepts any id, ZE regression
- test/ai/recipe-llama-server-reranker.test.ts: recipe shape regression guard
  + base_url + path concat assertion (codex-caught /v1/v1/ regression)
- test/search-mode.test.ts: timeout precedence chain (per-call > config >
  recipe > bundle), ZE no-recipe-default regression, unknown provider fallthrough
- test/models-doctor-reranker.test.ts: divergence-fix helper across DB-plane
  read, mode default, disabled, override, DB-error graceful fallback
- test/core/budget/budget-tracker.test.ts: free-local rerank pricing + arbitrary
  model id + chat-kind TX2 hard-fail preserved

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: post-ship documentation sync

* docs: index docs/ai-providers/ in llms.txt (zeroentropy + llama-server-reranker)

The hand-curated llms-config.ts doc map never included docs/ai-providers/, so
both zeroentropy.md (since v0.35.0.0) and the new llama-server-reranker.md were
invisible to the AI-facing llms.txt / llms-full.txt index. Adds an "AI providers"
section with both. Marked includeInFull: false (setup walkthroughs belong in the
index but would push the single-fetch bundle past FULL_SIZE_BUDGET) — same
treatment CHANGELOG.md gets.

Caught by the /ship document-release subagent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix: recipe-aware embedding-provider check for local providers

doctor --remediation-plan and autopilot both judged the embedding
provider with a hosted-only key check, so a brain on ollama: or
llama-server: was reported "blocked" on a missing API key it never
needed, contradicting doctor --json's 100%-coverage health.

Extract a shared embeddingProviderConfigured() helper into
brain-score-recommendations.ts: empty auth_env.required (local
providers) is configured with no key; hosted providers check their
OWN required key. Both producers (doctor, autopilot) call it,
killing the DRY violation that caused the bug. Hosted brains with a
missing key still block.
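
A minimal sketch of the rule described above, assuming a simplified recipe shape and an injected `env` map. Only the helper name and `auth_env.required` come from the commit message; the signature is an assumption.

```typescript
// Hypothetical, simplified recipe shape; the real type lives in gbrain.
interface RecipeAuth {
  auth_env: { required: string[] };
}

// Returns true when the provider needs no key (local inference) or when
// every key the recipe itself requires is present and non-blank in `env`.
function embeddingProviderConfigured(
  recipe: RecipeAuth,
  env: Record<string, string | undefined>,
): boolean {
  const required = recipe.auth_env.required;
  if (required.length === 0) return true; // local provider: nothing to check
  return required.every((key) => (env[key] ?? "").trim() !== "");
}

// Illustrative recipes: a local provider and a hosted one.
const localRecipe: RecipeAuth = { auth_env: { required: [] } };
const hostedRecipe: RecipeAuth = { auth_env: { required: ["OPENAI_API_KEY"] } };
```

With one helper shared by both producers, a local brain is never reported as blocked on a key it does not use, while a hosted brain with a missing key still is.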

* fix(budget): price local embed providers at $0

A --max-cost-bounded embed/reindex job configured for ollama: or
llama-server: TX2 hard-failed with no_pricing because
lookupEmbeddingPrice has no entry for local models. Add
FREE_LOCAL_EMBED_PROVIDERS (sibling to FREE_LOCAL_RERANK_PROVIDERS)
so a pricing miss on a local-inference provider returns $0 instead
of null. lmstudio/litellm intentionally excluded.
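
The pricing fallback can be sketched as follows. The provider ids come from the commit message; the price table, the placeholder model id, and the function signature are assumptions for illustration.

```typescript
// Local-inference providers priced at $0 (ids from the commit message).
const FREE_LOCAL_EMBED_PROVIDERS = new Set<string>(["ollama", "llama-server"]);

// Placeholder price table: USD per million tokens for a made-up hosted model.
const HOSTED_EMBED_PRICES: Record<string, number> = {
  "hosted-example:embed-model": 0.02,
};

// Returns a price, 0 for local inference, or null when the model is unknown
// (a null is what makes a --max-cost job hard-fail with no_pricing).
function lookupEmbeddingPriceSketch(modelId: string): number | null {
  const known = HOSTED_EMBED_PRICES[modelId];
  if (known !== undefined) return known;
  const provider = modelId.split(":")[0] ?? "";
  return FREE_LOCAL_EMBED_PROVIDERS.has(provider) ? 0 : null;
}
```

Providers left out of the set, such as lmstudio and litellm, keep returning null on a pricing miss, so a bounded job on them still fails closed.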

* feat(models): embedding reachability probe in gbrain models doctor

A down/misconfigured local embed server was invisible until first
embed. Add probeEmbeddingReachability() (mirrors the reranker probe):
a 1-input embed with a 5s abort timeout, classified via classifyError,
under a new 'embedding_reachability' touchpoint, gated on the
zero-network config probe returning ok first.
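
A rough sketch of a one-input probe with an abort timeout. The `embed` parameter is an injected stand-in for the real gateway call, and the result shape is an assumption; the actual probe classifies failures via `classifyError`, which is not reproduced here.

```typescript
type ProbeResult = { ok: true } | { ok: false; reason: string };

// `embed` is a hypothetical stand-in for the gateway call; it must honor
// the AbortSignal for the timeout to take effect.
async function probeReachabilitySketch(
  embed: (inputs: string[], signal: AbortSignal) => Promise<number[][]>,
  timeoutMs: number = 5000,
): Promise<ProbeResult> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const vectors = await embed(["probe"], controller.signal);
    return vectors.length === 1
      ? { ok: true }
      : { ok: false, reason: "unexpected response shape" };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    const reason = controller.signal.aborted
      ? `timeout after ${timeoutMs}ms`
      : message;
    return { ok: false, reason };
  } finally {
    clearTimeout(timer); // never leave the timer pending after the probe
  }
}
```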

* fix: don't count config-plane voyage/google keys as configured

codex review caught a false positive: HOSTED_EMBED_KEY_CONFIG mapped
VOYAGE_API_KEY/GOOGLE_GENERATIVE_AI_API_KEY to config fields, but
buildGatewayConfig only threads openai/anthropic/zeroentropy config
keys into the gateway env. A Voyage/Google brain with the key only in
config.json would be judged "configured" and dispatch an embed.stale
job that then fails auth at the gateway. Drop those two from the map so
the producer closures resolve them by env var only, matching what the
gateway can actually use. Pinned by a regression test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* feat(dream): route significance judge through gateway.chat for multi-provider support

Replaces the hardcoded `new Anthropic()` client in the dream-cycle synthesize
phase with a gateway-routed JudgeClient adapter. Mirrors the v0.35.5.0 pattern
that closed garrytan#952 for runThink: construction-time provider/key probe returns null
on a clear miss (cheap pre-flight); the verdict loop wraps the chat call in
try/catch for AIConfigError mid-run.

Any provider with a registered gateway recipe (Anthropic, DeepSeek, OpenRouter,
Voyage, Ollama, llama-server, etc.) is now reachable via:

    gbrain config set models.dream.synthesize_verdict <provider>:<model>

The canonical config key `models.dream.synthesize_verdict` (per PER_TASK_KEYS
in src/core/model-config.ts) is used unchanged. The exported JudgeClient
interface signature is preserved for test-seam stability.

The original community PR (garrytan#1349) shipped a custom fetch adapter that
bypassed the gateway entirely. This reworked landing routes through the
canonical seam so future provider additions automatically benefit, and a
CI guard (T7) will land in this wave to prevent the bug class from
re-opening (the same one that bit src/core/think/index.ts before v0.35.5.0).
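
The adapter idea can be sketched as follows, under stated assumptions: the types are minimal local stand-ins for `Anthropic.Message` and the gateway's `ChatResult`, and `ChatFn` and `makeGatewayJudgeClient` are hypothetical names, not gbrain's actual signatures.

```typescript
// Minimal local stand-ins (assumptions) for the SDK and gateway types.
interface ChatResult { text: string }
interface MessageLike { content: Array<{ type: "text"; text: string }> }
interface JudgeParams {
  model: string;
  max_tokens: number;
  messages: Array<{ role: "user" | "assistant"; content: string }>;
}
interface JudgeClient { create(params: JudgeParams): Promise<MessageLike> }

// Hypothetical shape of the gateway chat entry point.
type ChatFn = (req: { model: string; prompt: string; maxTokens: number }) => Promise<ChatResult>;

// Wraps a gateway chat function so callers written against the
// Anthropic-style `create()` keep working unchanged.
function makeGatewayJudgeClient(chat: ChatFn): JudgeClient {
  return {
    async create(params: JudgeParams): Promise<MessageLike> {
      const prompt = params.messages.map((m) => m.content).join("\n\n");
      const result = await chat({
        model: params.model,
        prompt,
        maxTokens: params.max_tokens,
      });
      // Empty gateway text still yields a well-formed single text block.
      return { content: [{ type: "text", text: result.text }] };
    },
  };
}
```

Because the caller only ever sees the `JudgeClient` interface, tests can swap in a stubbed chat function without touching the judging logic.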

Co-Authored-By: justemu <206393437+justemu@users.noreply.github.com>

* test(dream): synthesize-gateway-adapter unit tests + R3 parsed-verdict parity

11 cases pin the gateway-routed JudgeClient adapter from T5:

- A1: makeJudgeClient returns null on missing Anthropic key (legacy short-circuit preserved)
- A2: returns a JudgeClient when chat provider is reachable
- A3: JudgeClient.create routes through gateway.chat (via __setChatTransportForTests)
- A4: ChatResult.text → Anthropic.Message.content[0].text mapping
- A5: empty text from gateway → graceful empty-text Anthropic.Message
- A6: non-AIConfigError from gateway propagates to caller (no swallow)
- A7: AIConfigError from gateway propagates as AIConfigError (caught per-transcript in production loop)
- A8: makeJudgeClient returns null on unknown provider prefix
- A9: returns a JudgeClient for non-anthropic providers without env-probing (delegates to gateway at call time)
- R3: parsed-verdict SEMANTIC parity — gateway-routed and legacy SDK-shape JudgeClients produce same {worth_processing, reasons} given identical canned LLM text
- R3 corollary: unparseable LLM output → both paths fall through to cheap-fallback verdict

Codex flagged byte-identical-Anthropic.Message as a meaningless gate; R3 is
parsed-verdict semantic parity instead. Mirrors the pattern of
test/think-gateway-adapter.test.ts for cross-site consistency with the
v0.35.5.0 runThink migration.

* ci: guard against direct Anthropic SDK construction in gateway-routed files

New scripts/check-gateway-routed-no-direct-anthropic.sh greps two guarded
files (src/core/cycle/synthesize.ts and src/core/think/index.ts) for
`new Anthropic()` constructor calls and runtime imports of @anthropic-ai/sdk.
Type-only imports (`import type Anthropic from '@anthropic-ai/sdk'`) stay
allowed because both files use Anthropic.Message / .MessageCreateParamsNonStreaming
as adapter types.

Comment lines (starting with `//` or ` *`) are excluded so historical
references in JSDoc don't false-fire. Negative test in this commit's
verification confirms: injecting `new Anthropic()` into synthesize.ts
makes the guard exit 1 with a clear error pointing at the gateway adapter
pattern; reverting restores the OK state.

Wired into both `bun run verify` and `bun run check:all`. Closes the bug
class that bit synthesize.ts in PR garrytan#1349 (which would have shipped a
parallel fetch stack instead of routing through the canonical gateway).
The same class previously bit think/index.ts and was fixed structurally
in v0.35.5.0; this guard prevents either file from regressing.

Extend GUARDED_FILES in the script when migrating another file off
direct SDK construction.

* docs(put_page): point Windows / pipe-buffer users at gbrain capture --file

Extends the put_page op description (surfaced by `gbrain put --help`) with a
one-line pointer to `gbrain capture --file PATH --slug SLUG` for the file-
as-input use case. Capture (v0.39.3.0) is the canonical Windows-pipe-buffer
escape route: reads files as a Buffer first, scans the first 8KB for NUL bytes
to refuse binary content, decodes to UTF-8 only after the safety check, and
adds provenance write-through.

Lands the user-facing value the closed PR garrytan#1365 was reaching for, without
duplicating the CLI surface. Credits the original contributor.

Co-Authored-By: ecat2010 <90021101+ecat2010@users.noreply.github.com>

* test: R1+R2+R4 critical regression pins for the community-PR-wave landing

Per the wave's eng-review plan (IRON RULE — mandatory):

  R1 — get_page handler accepts calls without `content` param. Pre-wave
       PR garrytan#1365 landed its `!p.content → throw` check in the WRONG handler
       (get_page instead of put_page), which would have broken every read
       in the system. Pin: get_page MUST NOT require content + the schema
       carries no `content` or `file` param.

  R2 — put_page schema content stays `required: true`. PR garrytan#1365 also
       flipped `content` from required→optional in the schema. Pin: the
       contract stays at `required: true` + the closed PR's `file` param
       is NOT in the schema.

  R4 — Cross-platform stdin via fd 0 (PR garrytan#1325 regression pin). Source-grep
       asserts src/cli.ts uses `readFileSync(0, ...)` and NOT the legacy
       `readFileSync('/dev/stdin', ...)`. Belt-and-suspenders pattern
       assertions confirm the parseOpArgs branch shape (cliHints.stdin
       check, 5MB cap, isTTY gate) hasn't drifted.

R3 (gateway-adapter parsed-verdict parity) lives in the sibling file
test/cycle/synthesize-gateway-adapter.test.ts.

* test(e2e): update dream-synthesize no-key reason text + harden hermeticity

After T5's gateway-adapter rework, the "no API key" verdict text changed from
'no ANTHROPIC_API_KEY for significance judge' to
'no configured provider for verdict model: <model>' (broader + names the
actual model so the user sees WHICH provider failed). Update both assertions
that check the old text.

Hermeticity bug fix in the same commit: `withoutAnthropicKey` previously only
cleared the env var. After the rework, `makeJudgeClient` ALSO checks
`loadConfig().anthropic_api_key` (same hasAnthropicKey() pattern think/index.ts
uses since v0.35.5.0). If the developer running the test has the key set in
~/.gbrain/config.json, the test would behave non-deterministically. Fix:
override GBRAIN_HOME to a fresh tmpdir for the duration of the body, restore
on return (even on throw).
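
The override-and-restore pattern can be sketched with a generic helper. `withEnvOverride` is a hypothetical name, and the environment is passed in as a plain map so the sketch stays self-contained.

```typescript
type EnvMap = Record<string, string | undefined>;

// Runs `body` with env[key] temporarily set (or removed when `value` is
// undefined), then restores the previous state even if `body` throws.
async function withEnvOverride(
  env: EnvMap,
  key: string,
  value: string | undefined,
  body: () => Promise<void>,
): Promise<void> {
  const hadKey = Object.prototype.hasOwnProperty.call(env, key);
  const previous = env[key];
  if (value === undefined) delete env[key];
  else env[key] = value;
  try {
    await body();
  } finally {
    // Delete rather than assign undefined: on process.env an assignment
    // would store the string "undefined".
    if (hadKey) env[key] = previous;
    else delete env[key];
  }
}
```

In a real test the map would be `process.env`, the key `GBRAIN_HOME`, and the value a directory created with `mkdtempSync`, so a key in the developer's own config file cannot leak into the run.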

* test(e2e): pin verdict-loop AIConfigError catch from T5 rework end-to-end

Drives runPhaseSynthesize against a real PGLite engine with the gateway
chat transport stubbed to throw AIConfigError on every call (simulates a
revoked/misconfigured provider surfacing mid-run). Asserts:

  - Phase does NOT crash; converts the throw to a per-transcript verdict
    with worth=false and reasons[0] matching "gateway error: ...".
  - status='ok' so subsequent transcripts in the loop would continue
    being judged (not visible in 1-transcript test, but the loop shape is
    proven not to abort).

Pre-rework (T5), this code path didn't exist — judgeSignificance threw
directly to runPhaseSynthesize and crashed the whole phase. Pin so a
future regression that removes the try/catch fires loudly.
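
The loop shape being pinned can be sketched as follows. `AIConfigError` here is a local stand-in class, the verdict field names follow the `{worth_processing, reasons}` shape mentioned elsewhere in this wave, and `judgeAll` is a hypothetical name.

```typescript
// Local stand-in for gbrain's AIConfigError (assumption).
class AIConfigError extends Error {}

interface Verdict { worth_processing: boolean; reasons: string[] }

// Judges each transcript independently. A provider/config failure becomes
// a negative verdict for that transcript instead of aborting the loop.
async function judgeAll(
  transcripts: string[],
  judge: (transcript: string) => Promise<Verdict>,
): Promise<Verdict[]> {
  const verdicts: Verdict[] = [];
  for (const transcript of transcripts) {
    try {
      verdicts.push(await judge(transcript));
    } catch (err) {
      // Assumption: only AIConfigError is converted; other errors propagate.
      if (!(err instanceof AIConfigError)) throw err;
      verdicts.push({
        worth_processing: false,
        reasons: [`gateway error: ${err.message}`],
      });
    }
  }
  return verdicts;
}
```

A multi-transcript stub makes the continuation visible, which the single-transcript E2E test can only imply.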

* docs(claude.md): annotate v0.41+ community-PR-wave changes

Two additions to the Key files section:

- src/core/cycle/synthesize.ts — appends a v0.41+ paragraph documenting
  the gateway-adapter rework (makeJudgeClient + AIConfigError catch loop +
  canonical config key + JudgeClient interface preserved + CI guard
  reference + test file references).

- scripts/check-gateway-routed-no-direct-anthropic.sh — new entry
  documenting the CI guard's contract, scope, and how to extend
  GUARDED_FILES when migrating another file off direct SDK construction.

CLAUDE.md drives /sync-gbrain and llms.txt generation; both need the
wave's annotations to land BEFORE the llms regeneration step (T10).

* docs(llms): regenerate llms.txt + llms-full.txt for v0.41+ wave

Refreshes the auto-generated llms.txt bundles to pick up the CLAUDE.md
annotations landed earlier in this wave (gateway-adapter synthesize.ts
+ check-gateway-routed-no-direct-anthropic.sh + the cherry-picked
llama-server-reranker recipe). Pinned by test/build-llms.test.ts.

* fix(providers): dynamic-width id column accommodates llama-server-reranker

v0.40.6.1 introduced `llama-server-reranker` (21 chars), which overflowed
formatRecipeTable's static 14-char PROVIDER column. When the id is longer
than the column, padEnd is a no-op — the row starts with the tier name
directly, no space delimiter. test/providers.test.ts 'each recipe appears
at most once' iterates every recipe and asserts at least one row starts
with `${id} ` or `${id}  `; with no space after `llama-server-reranker`,
the assertion fails and the recipe appears effectively missing from the
human-readable list.

Fix: compute column width dynamically as `max(14, max(id.length) + 1)` so
every id is followed by at least one space, regardless of length. Also
widens the separator rule to match. 14 stays as the floor so the existing
short-id rows (openai 6, ollama 6, anthropic 9, ...) keep their familiar
layout when llama-server-reranker isn't in the active recipe set.
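
The width rule can be sketched in a few lines. `idColumnWidth` and `formatRow` are hypothetical names; only the formula and the 14-character floor come from the description above.

```typescript
const MIN_ID_WIDTH = 14; // floor from the commit message

// Width of the id column: at least 14, and always one wider than the
// longest id so a space separates it from the next column.
function idColumnWidth(ids: string[]): number {
  const longest = ids.reduce((max, id) => Math.max(max, id.length), 0);
  return Math.max(MIN_ID_WIDTH, longest + 1);
}

// Hypothetical row formatter: padded id column followed by a tier label.
function formatRow(id: string, tier: string, width: number): string {
  return id.padEnd(width) + tier;
}
```

With a static width of 14, `"llama-server-reranker".padEnd(14)` returns the id unchanged, which is how the delimiter went missing.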

10/10 cases in test/providers.test.ts pass after the fix.

* chore: pre-landing review polish — refresh models doctor tip + file embed timeout TODO

Two pre-landing review absorptions:

- `src/commands/models.ts:154` — the help-text tip said `gbrain models doctor`
  "spends ~1 token per model" but the wave added an `embed(['probe'])` call
  AND a reranker probe. Generalize to "spends a minimal request per configured
  chat/embed/rerank surface" so the cost expectation matches reality.

- `TODOS.md` — file a follow-up to widen `default_timeout_ms` from
  RerankerTouchpoint to EmbeddingTouchpoint so `probeEmbeddingReachability`
  doesn't hardcode 5000ms while the sibling reranker probe reads the
  recipe's configured timeout. Local CPU embedding endpoints (llama-server)
  hit the same cold-start curve as Qwen3-Reranker-4B; workaround today is
  "re-run the probe" per the existing JSDoc.

Other informational findings from pre-landing review either match
established patterns (no behavioral test for `probeEmbeddingReachability`,
matching `probeRerankerReachability`), are intentional choices documented
in JSDoc (the `as unknown as Anthropic.Message` cast), or are micro-perf
in non-hot paths (autopilot's 4 sequential `getConfig` awaits per
5-minute tick). All non-blocking.

* ci: tighten gateway-routed guard against import bypass shapes + honest JSDoc

Adversarial review caught two soft spots in the wave's new contracts:

1. `scripts/check-gateway-routed-no-direct-anthropic.sh` only matched the
   default-import shape `import Anthropic from '@anthropic-ai/sdk'`. A future
   contributor (or, more realistically, a future refactor) could bypass with:
     - `import { Anthropic } from '@anthropic-ai/sdk'`
     - `import { Anthropic as A } from '@anthropic-ai/sdk'`
     - `import * as Anthropic from '@anthropic-ai/sdk'`
     - `const x = await import('@anthropic-ai/sdk')`
   Tightened the regex to match ANY value-shaped import from the SDK module
   (excluding only the explicit `import type ... from '@anthropic-ai/sdk'`
   form which the adapter's Anthropic.Message return type needs). Added a
   second grep for dynamic imports. Verified all four bypass shapes now
   trigger the guard against synthesize.ts; type-only import still passes.

2. `synthesize.ts:makeJudgeClient` JSDoc claimed the adapter "tolerates the
   array-of-blocks shape for future flexibility" — but the mapping flattens
   ONLY text blocks; `tool_use`, `tool_result`, image blocks silently
   become empty strings. Today only `judgeSignificance` calls this and it
   only sends string content, so no behavior bug. But the comment was
   marketing future flexibility the code doesn't deliver. Narrowed to call
   out the silent-drop and say to extend the mapping if a future caller
   wires non-text content through.

Both wave-scope: the CI guard was added by the wave, the JSDoc was added
by the wave's T5 rework. Adversarial review caught them before merge.

* fix(models doctor): reranker probe timeout matches live search precedence chain

Codex Pass-9 adversarial review caught a probe-vs-production divergence:
production `hybridSearch` resolves reranker timeout via the full chain
(per-call > config > recipe > bundle) by going through
`loadSearchModeConfig + resolveSearchMode`, but `probeRerankerReachability`
was reading ONLY the recipe's `default_timeout_ms`. An operator who
set `search.reranker.timeout_ms=1000` would see doctor wait 30s and report
"reachable" while production search timed out at 1s and failed open.
A higher configured timeout produces the opposite false failure (probe
gives up at 5s when production would have waited longer).

Fix: extract `resolveLiveRerankerTimeoutMs(engine)` parallel to the
existing `resolveLiveRerankerModel(engine)` — same precedence chain,
same DB-plane consistency posture. The probe now reads the SAME timeout
live search reads, on the same lookup path.
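
The precedence chain itself is small enough to sketch. The layer names and the `resolveRerankerTimeoutMs` function are illustrative assumptions; the real resolution goes through `loadSearchModeConfig` and `resolveSearchMode`.

```typescript
// Each layer may or may not supply a timeout; names are illustrative.
interface TimeoutLayers {
  perCall?: number; // explicit argument on a single search call
  config?: number;  // e.g. search.reranker.timeout_ms
  recipe?: number;  // e.g. the recipe's default_timeout_ms
  bundle: number;   // mode-bundle default, always present
}

// First defined layer wins: per-call > config > recipe > bundle.
function resolveRerankerTimeoutMs(layers: TimeoutLayers): number {
  return layers.perCall ?? layers.config ?? layers.recipe ?? layers.bundle;
}
```

Using `??` rather than `||` keeps an explicitly configured `0` from silently falling through to a lower-priority layer.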

The codex P1 finding about `FREE_LOCAL_*_PROVIDERS` zero-pricing being
bypassable via redirected `LLAMA_SERVER_BASE_URL` is filed as a TODO under
community-pr-wave follow-ups — couples with the existing
FREE_LOCAL_PROVIDERS unification TODO so both close in one v0.41+ PR.

* ci(guard): handle mixed type+value imports + macOS BSD sed POSIX classes

Codex structured review [P3] caught a bypass in the freshly-tightened
gateway-routed guard:

  import { type Message, Anthropic } from '@anthropic-ai/sdk';
  new Anthropic();

The previous regex `^\s*import\s+[^t][^y]*from ...` was meant to exclude
`import type ...` but stops at the `y` in `type` inside the brace list,
silently allowing the value-import `Anthropic` through. Two fixes:

1. Replace the brittle regex-based type-exclusion with a clause-level
   parse: extract the brace-list specifiers, allow the import iff EVERY
   non-empty specifier is `type`-prefixed. Catches mixed-import bypasses
   (`{ type Foo, Bar }`) while keeping all-type braces (`{ type Foo, type Bar }`)
   passing. Default + namespace imports remain always-value-shaped.

2. Replace `\s` with POSIX `[[:space:]]` in the sed extract — macOS BSD sed
   doesn't honor `\s` in extended-regex mode (it silently no-ops the pattern
   so `specifiers` comes back empty and the script falls through to the
   default/namespace branch's wrong error message).

Hermetic 7-shape regression matrix now verifies every TypeScript import
shape against the expected ALLOW/BLOCK verdict; all 7 pass:
- ALLOW: `import type Anthropic from '...'`
- ALLOW: `import type { Foo } from '...'`
- ALLOW: `import { type Message, type Foo } from '...'`
- BLOCK: `import { type Message, Anthropic } from '...'`
- BLOCK: `import { Anthropic } from '...'`
- BLOCK: `import Anthropic from '...'`
- BLOCK: `import * as A from '...'`

Subshell-trap fix in the same commit: the previous "exit 1 inside while-pipe"
pattern doesn't propagate to the outer `$?` because the pipe spawns a
subshell. Switched to a tmpfile-flagged sentinel so the verdict survives
the subshell boundary cleanly.
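
The clause-level rule can be re-expressed in TypeScript for illustration. This is a sketch of the idea only, not the shell script: `classifySdkImport` is a hypothetical name and it handles single-line import statements.

```typescript
type GuardVerdict = "ALLOW" | "BLOCK";

// Classifies one single-line import statement from the SDK module.
function classifySdkImport(line: string): GuardVerdict {
  const stmt = line.trim();
  // `import type ...` is erased at compile time, so it is always allowed.
  if (/^import\s+type\s/.test(stmt)) return "ALLOW";
  const braces = /^import\s*\{([^}]*)\}\s*from\b/.exec(stmt);
  if (braces) {
    const specifiers = (braces[1] ?? "")
      .split(",")
      .map((s) => s.trim())
      .filter((s) => s !== "");
    // Allowed only if every non-empty specifier has an inline `type` modifier.
    return specifiers.every((s) => /^type\s/.test(s)) ? "ALLOW" : "BLOCK";
  }
  // Default and namespace imports always bind a runtime value.
  return "BLOCK";
}
```

Checking each specifier separately is what closes the mixed-import gap: one value specifier anywhere in the brace list is enough to block.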

* chore: bump version and changelog (v0.41.4.0)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(audit-writer): route log() to file matching event ts, not real-now

CI failure surfaced a time-dependent test flake in
`test/audit/audit-writer.test.ts` "returns events from current week,
filtered by ts cutoff" (added in v0.40.4.0 PR garrytan#1300). The test pinned
synthetic `now = 2026-05-22T12:00:00Z` (ISO week 21), logged 3 events
with synthetic ts values, then called `readRecent(7, now)` expecting
to find 2 events in window.

Root cause: `log()` ignored the caller-supplied `ts` for filename
routing and ALWAYS wrote to the file matching real-time-now's ISO
week. When real CI time crossed into 2026-W22 (this Monday), the
events went to W22's file but `readRecent` walked W21 + W20 → 0 hits.

Fix:
- `log()` parses `event.ts` (when provided) and routes to the file
  matching that ts's ISO week. Falls back to real-now when ts is
  missing or unparseable.
- No behavior change for production callers — none of the 5 audit
  consumers pass `ts` explicitly (rerank-audit, audit-slug-fallback,
  content-sanity-audit, graph-signals, supervisor-audit). The writer
  stamps real-now → both ts and filename use real-now → same file
  as before.
- Sibling test "honors caller-supplied ts override" also pinned a
  fixed ts and would have broken from the opposite angle (test
  read from `computeFilename()` default = real-now). Updated to
  read from `computeFilename(new Date(fixedTs))` so it asserts the
  per-row file routing the wave now provides.
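
The routing rule can be sketched as follows. The ISO week computation is standard; the `audit-<week>.jsonl` filename scheme and the `auditFileFor` name are assumptions, since the real naming is not shown here.

```typescript
// ISO-8601 week label (e.g. "2026-W21") for a timestamp, computed in UTC.
function isoWeekLabel(date: Date): string {
  const d = new Date(Date.UTC(date.getUTCFullYear(), date.getUTCMonth(), date.getUTCDate()));
  const dayOfWeek = d.getUTCDay() || 7; // Monday=1 ... Sunday=7
  // Shift to the Thursday of the same ISO week; its year is the ISO year.
  d.setUTCDate(d.getUTCDate() + 4 - dayOfWeek);
  const yearStart = Date.UTC(d.getUTCFullYear(), 0, 1);
  const week = Math.ceil(((d.getTime() - yearStart) / 86400000 + 1) / 7);
  return `${d.getUTCFullYear()}-W${String(week).padStart(2, "0")}`;
}

// Hypothetical filename scheme: route by the event's own ts when it parses,
// otherwise fall back to `now`.
function auditFileFor(eventTs: string | undefined, now: Date): string {
  const parsed = eventTs === undefined ? NaN : Date.parse(eventTs);
  const when = Number.isNaN(parsed) ? now : new Date(parsed);
  return `audit-${isoWeekLabel(when)}.jsonl`;
}
```

Passing `now` explicitly keeps the function deterministic under test, which is the property the original flake lacked.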

22/22 audit-writer cases pass. Production callers (5 sites) unchanged.

Pre-existing on master since v0.40.4.0; surfaced when real time
crossed into a different ISO week than the test's synthetic now.
NOT introduced by this PR (garrytan#1377 community-PR-wave) — audit-writer
files aren't touched by the wave.

---------

Co-authored-by: Tobias <34135750+tobbecokta@users.noreply.github.com>
Co-authored-by: kohai-ut <chris@tincreek.com>
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Co-authored-by: justemu <noreply@github.com>
Co-authored-by: justemu <206393437+justemu@users.noreply.github.com>
Co-authored-by: ecat2010 <90021101+ecat2010@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Dream synthesize significance judge hardcodes ANTHROPIC_API_KEY — no fallback for other LLM providers
