v0.42.13.0 fix(search): archive/ content findable by default, demoted not hard-excluded (#1777) #1797

Merged: garrytan merged 6 commits into master from garrytan/fix-issue-1777 on Jun 3, 2026
garrytan (Owner) commented Jun 3, 2026

What & why

Closes #1777. Content committed under archive/ was being synced, embedded, and graphed, then silently hard-excluded from every default search. You could search an exact phrase you knew was in an archived page, get nothing, and wrongly conclude the page didn't exist. A hardcoded DEFAULT_HARD_EXCLUDES list dropped the whole archive/ subtree unless the caller passed include_slug_prefixes.
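The pre-fix behavior described above can be illustrated with a minimal in-memory sketch. It is not the gbrain implementation: the list contents beyond the four prefixes named in this PR, the `Page` shape, and the `oldDefaultSearch` helper are assumptions made for illustration, and the real filter runs in SQL.

```typescript
// Hypothetical reconstruction of the OLD policy, for illustration only.
// The PR confirms archive/ was in the list; order and data shapes are assumed.
const OLD_DEFAULT_HARD_EXCLUDES: string[] = ["test/", "archive/", "attachments/", ".raw/"];

interface Page {
  slug: string;
  text: string;
}

// Assumed semantics: a prefix passed in includeSlugPrefixes is lifted out of
// the hard-exclude list; everything else under an excluded prefix is dropped.
function oldDefaultSearch(
  pages: Page[],
  phrase: string,
  includeSlugPrefixes: string[] = [],
): Page[] {
  const excludes = OLD_DEFAULT_HARD_EXCLUDES.filter(
    (prefix) => !includeSlugPrefixes.includes(prefix),
  );
  return pages.filter(
    (p) =>
      !excludes.some((prefix) => p.slug.startsWith(prefix)) &&
      p.text.includes(phrase),
  );
}
```

With a page at `archive/2019-notes` containing the phrase, `oldDefaultSearch(pages, phrase)` returns nothing, while passing `["archive/"]` as the third argument returns the page. That asymmetry is the opt-in findability the PR removes.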

Guiding principle the bug violated: if it's committed to the brain, it should be findable. Findability should be opt-out, not opt-in.

The fix

  • Demote, not exclude. archive/ moves out of DEFAULT_HARD_EXCLUDES (now test/, attachments/, .raw/ only) into DEFAULT_SOURCE_BOOSTS at 0.5 — findable by default, ranked below curated content. The demote is a prior applied at the SQL/fusion layer; the cross-encoder reranker can still promote a genuinely best-match archive page that survives into the rerank window.
  • Make withholding observable. A new gbrain doctor check, hidden_by_search_policy (local + remote/thin-client paths), counts chunked pages withheld by each active exclude prefix in one SQL query. It reuses the canonical resolveHardExcludes and buildVisibilityClause plus the now-exported escapeLikePattern, so the count can't drift from what search filters. The check reports ok for intentional default excludes (with a prescriptive, agent-readable message) and warn only when a non-default GBRAIN_SEARCH_EXCLUDE prefix hides pages.
  • Cache correctness. KNOBS_HASH_VERSION 8→9 — the exclude policy isn't in the cache key, so the bump invalidates archive-excluded query_cache rows on upgrade (one-time global cold-miss, refills within TTL).
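The demote and the cache bump from the list above can be sketched together. The constant names and the values 0.5 and 9 come from this PR; the `Record` shape, the multiplicative boost, the in-memory ranking (the PR applies the prior at the SQL/fusion layer), and the `cacheKey` format are assumptions for illustration.

```typescript
// Sketch only: data shapes and the multiplicative form are assumed.
const DEFAULT_HARD_EXCLUDES: string[] = ["test/", "attachments/", ".raw/"];
const DEFAULT_SOURCE_BOOSTS: Record<string, number> = { "archive/": 0.5 };
const KNOBS_HASH_VERSION = 9; // was 8 before this PR

interface Hit {
  slug: string;
  score: number;
}

// Return the boost of the first matching prefix, or 1.0 for curated content.
function boostFor(slug: string): number {
  for (const [prefix, boost] of Object.entries(DEFAULT_SOURCE_BOOSTS)) {
    if (slug.startsWith(prefix)) return boost;
  }
  return 1.0;
}

// Hard excludes still drop hits; boosted prefixes stay in and are rescored.
function rankWithDemotion(hits: Hit[]): Hit[] {
  return hits
    .filter((h) => !DEFAULT_HARD_EXCLUDES.some((p) => h.slug.startsWith(p)))
    .map((h) => ({ ...h, score: h.score * boostFor(h.slug) }))
    .sort((a, b) => b.score - a.score);
}

// Hypothetical cache key: the exclude policy is not part of the key, so only
// the version prefix distinguishes rows written under the old policy.
function cacheKey(query: string, version: number = KNOBS_HASH_VERSION): string {
  return `v${version}:${query.trim().toLowerCase()}`;
}
```

Under these assumptions, an `archive/` hit with raw score 0.9 is rescored to 0.45 and ranks below a curated hit at 0.6, while a `test/` hit is still removed. A key built with version 8 never equals one built with version 9, so rows cached under the old policy are not served after the upgrade.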

Acceptance criteria (#1777)

  • ✅ A synced archive/ page is returned by search by default (ranked, demoted), no caller flag.
  • test/ / attachments/ / .raw/ remain excluded.
  • gbrain doctor reports how many pages are hidden from default search by prefix policy.
  • Per-query "withheld" envelope signal scoped out (doctor check covers observability without an MCP/CLI return-shape change).
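The doctor criterion can be illustrated with a small sketch of counting withheld pages per prefix. The real check runs as one SQL query through resolveHardExcludes, buildVisibilityClause, and escapeLikePattern. The helper names below, the escaping rule, and the in-memory counting are hypothetical stand-ins, not the gbrain code.

```typescript
// Hypothetical stand-ins for illustration; not the gbrain implementation.
const DEFAULT_EXCLUDES: string[] = ["test/", "attachments/", ".raw/"];

// Assumed escaping rule: backslash-escape the LIKE wildcards % and _ (and the
// backslash itself) so a prefix such as "my_dir/" matches literally.
function escapeLikePatternSketch(value: string): string {
  return value.replace(/[\\%_]/g, (ch) => `\\${ch}`);
}

// Build the LIKE pattern a prefix filter would use, e.g. "test/%".
function likePrefixPattern(prefix: string): string {
  return `${escapeLikePatternSketch(prefix)}%`;
}

// Count how many slugs each active exclude prefix withholds.
function countHiddenByPrefix(slugs: string[], excludes: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const prefix of excludes) {
    counts[prefix] = slugs.filter((s) => s.startsWith(prefix)).length;
  }
  return counts;
}

// "warn" only when a non-default prefix actually hides at least one page.
function policyStatus(counts: Record<string, number>): "ok" | "warn" {
  const nonDefaultHides = Object.entries(counts).some(
    ([prefix, n]) => !DEFAULT_EXCLUDES.includes(prefix) && n > 0,
  );
  return nonDefaultHides ? "warn" : "ok";
}
```

Keeping the count and the search filter on the same prefix list is the point of the design: if both read one resolved list, the reported number cannot disagree with what search actually hides.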

Testing

  • bun run verify: 29/29 (incl. typecheck).
  • All changed/new test files: 202/202 (sql-ranking, search-mode, knobs-hash-reranker, cross-modal-phase1, search-alias-resolved-boost, new doctor-hidden-by-search-policy, e2e search-exclude).
  • Full unit suite: every failure accounted for (env-keyless artifacts pass clean; the 5 KNOBS_HASH version pins updated 8→9).
  • Real Postgres (seeded container smoke): archive findable + demoted below curated, test/ hidden, the new doctor SQL runs clean on both engines.

Note

Reviewed via /plan-eng-review (CLEARED) + two /codex rounds (PASS). Benign side-effect documented in the CHANGELOG: archive/: 0.5 reclassifies archive pages in the contradictions probe's source-tier breakdown from other to bulk.

🤖 Generated with Claude Code

garrytan and others added 6 commits June 2, 2026 21:35
…1777)

Move archive/ out of DEFAULT_HARD_EXCLUDES into a 0.5 source-boost demote so
archived historical content is findable by default, ranked below curated
content. Add a hidden_by_search_policy doctor check so the surviving exclude
policy (test/, attachments/, .raw/) is auditable. Bump KNOBS_HASH_VERSION 8->9
so the policy change invalidates archive-excluded query_cache rows.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…1777)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…1777

# Conflicts:
#	CHANGELOG.md
#	CLAUDE.md
#	VERSION
#	llms-full.txt
#	package.json
…1777

# Conflicts:
#	CHANGELOG.md
#	VERSION
#	package.json
@garrytan garrytan merged commit bea2d3e into master Jun 3, 2026
21 checks passed
mgunnin added a commit to mgunnin/gbrain that referenced this pull request Jun 3, 2026
* upstream/master:
  v0.42.23.0 feat(jobs): --nice scheduling-priority flag for jobs work/supervisor (garrytan#1815) (garrytan#1820)
  v0.42.22.0 fix(minions): supervisor progress watchdog + worker DB self-defense — alive-but-wedged worker self-heals (garrytan#1801) (garrytan#1824)
  v0.42.21.0 fix(postgres): module-singleton ownership — canonical landing for the dream-cycle "connect() has not been called" class (garrytan#1404/garrytan#1471/garrytan#1619) (garrytan#1805)
  v0.42.20.0 fix: reliability wave — PGLite capture lock-pin + Postgres reconnect race + search embed-hang (garrytan#1762 garrytan#1745 garrytan#1775) (garrytan#1810)
  v0.42.19.0 fix(skillopt): close the last gap in the AI SDK v6 tool-loop fix (write-capture mapper + regression test) (garrytan#1809)
  v0.42.18.0 fix: sync orphan-pileup watchdog (garrytan#1633) + links-lag µs stamp (garrytan#1768) (garrytan#1807)
  v0.42.17.0 fix(sync): resumable incremental sync — killed mid-import no longer loses progress (garrytan#1794) (garrytan#1808)
  v0.42.16.0 feat(doctor): brain health as a solved problem — cause-ranked doctor + OOM-loop line + auto-drain + pool-reap (garrytan#1685) (garrytan#1802)
  v0.42.15.0 fix: decouple CLI primary output from process.stdout.isTTY (garrytan#1784) (garrytan#1806)
  v0.42.14.0 fix(zero-config): code-* readiness signal + init embedding-key validation + lock self-heal (garrytan#1780) (garrytan#1804)
  v0.42.13.0 fix(search): archive/ content findable by default, demoted not hard-excluded (garrytan#1777) (garrytan#1797)
  v0.42.12.0 feat: self-upgrading gbrain — invocation-riding update check + opt-in auto-upgrade (garrytan#1798)
  v0.42.11.0 feat(skillopt): held-out eval gate, honest receipts, ENFORCE + ablation opts (garrytan#1759)
  v0.42.10.0 feat(extract): opt-in global-basename wikilink resolution (closes garrytan#972) (garrytan#1388)
Development

Successfully merging this pull request may close these issues.

archive/ content is ingested but silently hard-excluded from all search by default (in-repo should mean findable)