search: per-file cap + histogram fallback for search_content #495

Merged: esengine merged 1 commit into main from feat/489-search-content-per-file-cap on May 9, 2026
esengine (Owner) commented on May 9, 2026
Summary

search_content had one truncation knob: a byte budget on ctx.maxListBytes. A single popular symbol could match 200+ lines in one big file and consume the entire budget before the walk reached any other file. Callers saw a wall of App.tsx:NNN: ... lines and no hint that matches also existed in other files.

This PR adds three layers of distribution-preserving behavior:

  1. Per-file cap. MAX_HITS_PER_FILE = 30; overflow is summarized with [rel: N more matches in this file — re-grep with a tighter pattern or use read_file to see them].
  2. Histogram fallback. Once printed bytes cross 80 % of the byte budget, remaining files switch to rel: N matches form. A one-line notice marks the flip.
  3. summary_only: true arg. Skip line content entirely; return just the histogram. Cheap one-shot for "where does this exist at all" before drilling in with a targeted read_file.
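As a rough illustration of the per-file cap in item 1, the sketch below formats one file's hits and appends the overflow footer. Only `MAX_HITS_PER_FILE = 30` and the footer wording come from this PR; the `Hit` shape and the `formatFileHits` helper are hypothetical names chosen for the example, not the actual implementation.

```typescript
// Hypothetical sketch: only the constant and the footer text are taken from the PR.
const MAX_HITS_PER_FILE = 30;

interface Hit {
  line: number; // 1-based line number of the match
  text: string; // content of the matching line
}

// Format up to MAX_HITS_PER_FILE hits for one file, then summarize the rest.
function formatFileHits(rel: string, hits: Hit[]): string[] {
  const shown = hits.slice(0, MAX_HITS_PER_FILE);
  const out = shown.map((h) => `${rel}:${h.line}: ${h.text}`);
  const overflow = hits.length - shown.length;
  if (overflow > 0) {
    // Footer only appears when the file actually exceeded the cap.
    out.push(
      `[${rel}: ${overflow} more matches in this file — re-grep with a tighter pattern or use read_file to see them]`,
    );
  }
  return out;
}
```

With 45 hits in one file, this would emit 30 hit lines plus one footer reporting 15 more matches; a file with 30 or fewer hits gets no footer.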

The existing byte-budget cap stays as the safety net.
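The histogram fallback can be pictured with a small sketch like the one below. It is an assumption-laden illustration rather than the merged code: `renderWithHistogramFallback`, `FileMatches`, and the notice text are hypothetical, "crossing" 80 % is read as greater-than-or-equal, and bytes are approximated by string length. The per-file cap and the final hard byte-budget cut-off are left out to keep the flip logic visible.

```typescript
// Hypothetical sketch of the 80 % flip; names and notice wording are assumptions.
interface FileMatches {
  rel: string;    // path relative to the search root
  hits: string[]; // already-formatted "rel:line: text" lines
}

const HISTOGRAM_FLIP_RATIO = 0.8;
const FLIP_NOTICE =
  "[byte budget nearly exhausted: remaining files shown as match counts only]";

function renderWithHistogramFallback(
  files: FileMatches[],
  maxListBytes: number,
): string[] {
  const out: string[] = [];
  const flipAt = maxListBytes * HISTOGRAM_FLIP_RATIO;
  let printed = 0; // approximated with string length, not true UTF-8 bytes
  let histogramMode = false;
  let noticeShown = false;

  for (const file of files) {
    if (histogramMode) {
      // Emit the one-line notice just before the first summarized file.
      if (!noticeShown) {
        out.push(FLIP_NOTICE);
        noticeShown = true;
      }
      out.push(`${file.rel}: ${file.hits.length} matches`);
      continue;
    }
    for (const hit of file.hits) {
      out.push(hit);
      printed += hit.length + 1; // +1 for the newline
    }
    // The check runs only after a file completes, so a file is never split.
    if (printed >= flipAt) histogramMode = true;
  }
  return out;
}
```

Checking the threshold only between files means a file is printed either in full (subject to the per-file cap) or as a count, never half of each.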

Closes #489

Test plan

  • tests/filesystem-tools.test.ts — four new cases under a per-file cap and histogram fallback describe block: hits-cap + footer, no-footer when under cap, summary_only shape, and the 80 % flip-to-summary trigger.
  • npm run verify (full suite, 2295 tests).

A single popular symbol could match 200+ lines in a big file (App.tsx,
loop.ts) and eat the entire byte budget before the walk reached any
other file. Truncation marker fired, every other matching file got
zero coverage, and the caller saw a wall of one file's hits with no
hint that distribution was wider than that.

Three knobs:

1. MAX_HITS_PER_FILE = 30. Beyond that, the per-file output ends with
   "[rel: N more matches in this file — re-grep with a tighter
   pattern or use read_file to see them]". One bullhorn file can no
   longer dominate the byte budget.

2. After each file completes, if printed bytes have crossed 80% of
   the byte budget, remaining files switch to histogram form
   ("rel: N matches"). A one-line notice marks the flip so the
   caller knows distribution from this point on is summary-only.

3. New `summary_only:true` arg skips line content entirely and
   returns just the histogram. Useful for "where does this exist at
   all" before drilling in with a targeted read_file.
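
To make knob 3 concrete, here is a minimal sketch of how a `summary_only` flag could change the output shape. Only the argument name and the "rel: N matches" form come from this commit; the in-memory `FileContents` map, the plain substring match, and this `searchContent` signature are assumptions made for the example (the real tool walks the filesystem and may match differently).

```typescript
// Hypothetical sketch: an in-memory stand-in for the real filesystem walk.
interface SearchContentArgs {
  pattern: string;        // assumed to be a plain substring here
  summary_only?: boolean; // when true, return only the per-file histogram
}

type FileContents = Record<string, string>; // relative path -> file text

function searchContent(files: FileContents, args: SearchContentArgs): string[] {
  const out: string[] = [];
  for (const rel of Object.keys(files)) {
    const matched: string[] = [];
    (files[rel] ?? "").split("\n").forEach((text, i) => {
      if (text.indexOf(args.pattern) !== -1) {
        matched.push(`${rel}:${i + 1}: ${text}`);
      }
    });
    if (matched.length === 0) continue; // files without matches are omitted
    if (args.summary_only) {
      out.push(`${rel}: ${matched.length} matches`); // histogram form
    } else {
      out.push(...matched); // full line content
    }
  }
  return out;
}
```

A caller might run once with `summary_only: true` to see which files contain the symbol, then follow up with a targeted read_file on the few that matter.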

Closes #489
esengine merged commit 3f6fb51 into main on May 9, 2026
3 checks passed
esengine deleted the feat/489-search-content-per-file-cap branch on May 9, 2026 04:50
ChasLui pushed a commit to ChasLui/DeepSeek-Reasonix that referenced this pull request May 23, 2026
…sengine#495)

Development

Successfully merging this pull request may close these issues.

tools: search_content has no per-file cap, high-frequency hits drown the result
