explore: searchContent Tier 1 sort buries the definition-dense file behind unrelated small files

## Problem

`Explorer.searchContent` (`src/explore.zig:1509`) is the function `handleSearch` calls in production. Its Tier 1 (trigram candidates) sorts the candidate file list by **file content length ascending** at `src/explore.zig:1590-1598`:

```zig
const SortCtx = struct {
    contents: *const std.StringHashMap([]const u8),
    pub fn lessThan(ctx: @This(), a: []const u8, b: []const u8) bool {
        const a_len = if (ctx.contents.get(a)) |c| c.len else std.math.maxInt(usize);
        const b_len = if (ctx.contents.get(b)) |c| c.len else std.math.maxInt(usize);
        return a_len < b_len;
    }
};
```

It then applies a per-file cap of `max(1, max_results / estimated_total)` (line 1601). With many candidates, integer division makes `max_results / estimated_total` zero, so the cap is `1`. The small unrelated files, which the sort has moved to the front, contribute one hit each, fill `result_list` up to `max_results`, and trigger the early return at line 1607 *before* the larger canonical file is ever scanned.
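To make the starvation concrete, here is a minimal standalone model of the Tier 1 loop described above. It is a sketch, not codedb's code: `Candidate`, `tier1Current`, and the per-file hit counts are hypothetical, and it assumes `estimated_total` equals the number of candidates. With 8 small files and one larger file holding 4 matching lines, `max_results = 5` gives a cap of `max(1, 5 / 9) = 1`, so five small files fill the result list and the loop stops before it reaches the larger file.

```zig
const std = @import("std");

// Hypothetical stand-in for one Tier 1 candidate (not codedb's real type):
// its content length in bytes and how many of its lines match the query.
const Candidate = struct {
    path: []const u8,
    content_len: usize,
    hit_lines: usize,
};

// Current ordering: shortest content first (std.mem.sort is stable).
fn byContentLenAsc(_: void, a: Candidate, b: Candidate) bool {
    return a.content_len < b.content_len;
}

// Models the current loop: length sort, per-file cap, early return once
// `max_results` lines are collected. Assumes estimated_total == cands.len.
// Writes each contributing path into `out` and returns how many there are.
fn tier1Current(cands: []Candidate, max_results: usize, out: [][]const u8) usize {
    if (cands.len == 0) return 0;
    std.mem.sort(Candidate, cands, {}, byContentLenAsc);
    // e.g. 5 / 9 == 0 in integer division, so the cap becomes 1
    const max_per_file = @max(1, max_results / cands.len);
    var collected: usize = 0;
    var files: usize = 0;
    for (cands) |c| {
        if (collected >= max_results) break; // the early return
        const take = @min(c.hit_lines, max_per_file, max_results - collected);
        if (take == 0) continue;
        collected += take;
        out[files] = c.path;
        files += 1;
    }
    return files;
}

test "length-first order never reaches the dense file" {
    const small = [_][]const u8{
        "small_0.zig", "small_1.zig", "small_2.zig", "small_3.zig",
        "small_4.zig", "small_5.zig", "small_6.zig", "small_7.zig",
    };
    var cands: [9]Candidate = undefined;
    cands[0] = .{ .path = "canonical.zig", .content_len = 150, .hit_lines = 4 };
    for (small, 1..) |p, i| cands[i] = .{ .path = p, .content_len = 29, .hit_lines = 1 };

    var out: [9][]const u8 = undefined;
    const n = tier1Current(&cands, 5, &out);
    for (out[0..n]) |p| try std.testing.expect(!std.mem.eql(u8, p, "canonical.zig"));
}
```

Raising `max_results` in the model raises the cap and lets the loop reach the larger file, which matches the eval observation that `explore.zig` only appears at `max_results=50`.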

Concrete reproduction (eval, codedb 0.2.5805): `codedb_search(query="trigram_index", max_results=20)` returns hits from `adversarial_tests.zig` (2 occurrences) and `main.zig` but **does not** return `src/explore.zig`, which has 15 occurrences and is the definition site of the symbol. With `max_results=50`, `explore.zig` finally appears but is then re-truncated to 5 lines by `handleSearch`'s per-file cap.

The post-search frequency rerank at `src/explore.zig:1681-1693` operates only on the lines already collected, so it cannot recover a file that was never read.

## Failing Test

```zig
test "issue-427: searchContent Tier 1 sort starves the definition-dense file" {
    var arena = std.heap.ArenaAllocator.init(testing.allocator);
    defer arena.deinit();
    var explorer = Explorer.init(arena.allocator());

    const small_count: usize = 8;
    var i: usize = 0;
    while (i < small_count) : (i += 1) {
        var path_buf: [32]u8 = undefined;
        const path = try std.fmt.bufPrint(&path_buf, "small_{d}.zig", .{i});
        try explorer.indexFile(path, "fn s() void { _ = widgetX; }\n");
    }
    const canonical_content =
        "fn canonical() void {\n" ++
        "    _ = widgetX;\n" ++
        "    _ = widgetX;\n" ++
        "    _ = widgetX;\n" ++
        "    _ = widgetX;\n" ++
        "    // padding line ...\n" ++
        "    // padding line ...\n" ++
        "    // padding line ...\n" ++
        "    _ = 0;\n" ++
        "}\n";
    try explorer.indexFile("canonical.zig", canonical_content);

    const results = try explorer.searchContent("widgetX", testing.allocator, 5);
    // ...defer free...

    var found_canonical = false;
    for (results) |r| {
        if (std.mem.eql(u8, r.path, "canonical.zig")) found_canonical = true;
    }
    try testing.expect(found_canonical);
}
```

Failing test on branch `issue-427-failing-test` (commit `7b7495e`).

```
$ zig build test 2>&1 | rg "issue-427"
error: 'tests.test.issue-427: searchContent Tier 1 sort starves the definition-dense file' failed
       /Users/.../src/tests.zig:10515: try testing.expect(found_canonical);
```

## Expected

The file with the most occurrences of the term (`canonical.zig` in the test, the definition site `src/explore.zig` in the eval) should appear in the result set. The Tier 1 ordering should not silently exclude the canonical file in favor of unrelated small files just because their content is shorter.

## Fix

Two complementary changes in `Explorer.searchContent` (`src/explore.zig:1587-1693`):

1. **Replace the file-length sort with a relevance-first order.** Prefer files that the word index identifies as having the term in a symbol-definition context, then by total per-file occurrences if available. Effort: small (~10 lines).

2. **Aggregate per-file occurrence counts before truncating to `max_results`.** Run `searchInContent` for every candidate (still bounded by the per-file cap), collect counts into a file-keyed map, then sort the result list by `(per_file_total desc, per_line_count desc, path asc, line_num asc)`. Drop the early-return at line 1607 in favor of post-aggregation truncation. Effort: medium.
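A minimal sketch of change 2, assuming every candidate has already been scanned into a flat slice of match records. `Hit`, `aggregate`, `hitLessThan`, and `rankAndTruncate` are hypothetical names, not existing codedb functions. What it illustrates is the ordering of steps: per-file totals are computed over all collected hits first, the sort key is `(per_file_total desc, per_line_count desc, path asc, line_num asc)`, and truncation to `max_results` happens only at the end, so no file is dropped before it has been counted.

```zig
const std = @import("std");

// Hypothetical match record: one matching line from one candidate file.
const Hit = struct {
    path: []const u8,
    line_num: u32,
    per_line_count: u32, // occurrences of the term on this line
    per_file_total: u32 = 0, // filled in by aggregate()
};

// Sum per-line counts per file, then stamp each hit with its file's total.
fn aggregate(allocator: std.mem.Allocator, hits: []Hit) !void {
    var totals = std.StringHashMap(u32).init(allocator);
    defer totals.deinit();
    for (hits) |h| {
        const gop = try totals.getOrPut(h.path);
        if (!gop.found_existing) gop.value_ptr.* = 0;
        gop.value_ptr.* += h.per_line_count;
    }
    for (hits) |*h| h.per_file_total = totals.get(h.path).?;
}

// (per_file_total desc, per_line_count desc, path asc, line_num asc)
fn hitLessThan(_: void, a: Hit, b: Hit) bool {
    if (a.per_file_total != b.per_file_total) return a.per_file_total > b.per_file_total;
    if (a.per_line_count != b.per_line_count) return a.per_line_count > b.per_line_count;
    return switch (std.mem.order(u8, a.path, b.path)) {
        .lt => true,
        .gt => false,
        .eq => a.line_num < b.line_num,
    };
}

// Aggregate, sort, and only then cut to `max_results`.
fn rankAndTruncate(allocator: std.mem.Allocator, hits: []Hit, max_results: usize) ![]Hit {
    try aggregate(allocator, hits);
    std.mem.sort(Hit, hits, {}, hitLessThan);
    return hits[0..@min(hits.len, max_results)];
}

test "aggregate-then-truncate keeps the dense file" {
    var hits = [_]Hit{
        .{ .path = "small_1.zig", .line_num = 1, .per_line_count = 1 },
        .{ .path = "small_0.zig", .line_num = 1, .per_line_count = 1 },
        .{ .path = "canonical.zig", .line_num = 5, .per_line_count = 1 },
        .{ .path = "canonical.zig", .line_num = 2, .per_line_count = 1 },
        .{ .path = "canonical.zig", .line_num = 4, .per_line_count = 1 },
        .{ .path = "canonical.zig", .line_num = 3, .per_line_count = 1 },
        .{ .path = "small_2.zig", .line_num = 1, .per_line_count = 1 },
    };
    const top = try rankAndTruncate(std.testing.allocator, &hits, 5);
    try std.testing.expectEqual(@as(usize, 5), top.len);
    try std.testing.expectEqualStrings("canonical.zig", top[0].path);
    try std.testing.expectEqual(@as(u32, 2), top[0].line_num);
    try std.testing.expectEqualStrings("small_0.zig", top[4].path);
}
```

Because truncation is a single slice at the end, there is no counterpart to the early return at line 1607; the trade-off is that every candidate is scanned even when `max_results` is small.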

A simpler intermediate stop-gap: when `max_per_file == 1` and there are more candidates than `max_results`, skip the length sort and use word-index hit count per file as the primary key. This alone would fix the reported case.
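The stop-gap can be sketched as a comparator whose primary key switches when the cap collapses to 1. This assumes a per-file hit count is available from the word index; here it is only a hypothetical `word_hits` field on a hypothetical `Candidate`, and `sortTier1` is not an existing codedb function. Ties fall back to the current shortest-first order, so behavior is unchanged whenever the counts agree.

```zig
const std = @import("std");

// Hypothetical candidate; `word_hits` stands in for a per-file count
// that is assumed to come from the word index.
const Candidate = struct {
    path: []const u8,
    content_len: usize,
    word_hits: usize,
};

const Order = enum { length_asc, word_hits_desc };

fn lessThan(order: Order, a: Candidate, b: Candidate) bool {
    return switch (order) {
        .length_asc => a.content_len < b.content_len,
        // Tie-break on length keeps the old cheap-first order among equals.
        .word_hits_desc => if (a.word_hits != b.word_hits)
            a.word_hits > b.word_hits
        else
            a.content_len < b.content_len,
    };
}

// Use word-index hits as the primary key only when the cap is 1 and the
// candidates outnumber `max_results`; otherwise keep the length sort.
fn sortTier1(cands: []Candidate, max_results: usize) Order {
    if (cands.len == 0) return .length_asc;
    const max_per_file = @max(1, max_results / cands.len);
    const order: Order = if (max_per_file == 1 and cands.len > max_results)
        .word_hits_desc
    else
        .length_asc;
    std.mem.sort(Candidate, cands, order, lessThan);
    return order;
}

test "overflowing candidate set sorts by word-index hits" {
    var cands = [_]Candidate{
        .{ .path = "small_0.zig", .content_len = 29, .word_hits = 1 },
        .{ .path = "small_1.zig", .content_len = 29, .word_hits = 1 },
        .{ .path = "small_2.zig", .content_len = 29, .word_hits = 1 },
        .{ .path = "canonical.zig", .content_len = 150, .word_hits = 4 },
    };
    const order = sortTier1(&cands, 2);
    try std.testing.expect(order == .word_hits_desc);
    try std.testing.expectEqualStrings("canonical.zig", cands[0].path);
}
```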

The repo already has `searchContentRanked` (BM25) at `src/explore.zig:1703` which does proper document-level ranking — `handleSearch` could be opted onto that path for queries with multiple word-tokens, leaving `searchContent` as the substring-match fast path with a fixed Tier 1 sort.
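The routing idea could look roughly like this in `handleSearch`, as a sketch only. The tokenization rule (maximal runs of ASCII letters, digits, and `_`) and the names `countWordTokens`, `choosePath`, and `SearchPath` are assumptions; the actual calls into `searchContentRanked` and `searchContent` are left out.

```zig
const std = @import("std");

const SearchPath = enum { substring_fast_path, bm25_ranked };

// Counts maximal runs of [A-Za-z0-9_] in the query.
fn countWordTokens(query: []const u8) usize {
    var count: usize = 0;
    var in_word = false;
    for (query) |ch| {
        const is_word = std.ascii.isAlphanumeric(ch) or ch == '_';
        if (is_word and !in_word) count += 1;
        in_word = is_word;
    }
    return count;
}

// Multi-token queries go to document-level (BM25) ranking; a single
// identifier keeps the substring fast path with the fixed Tier 1 sort.
fn choosePath(query: []const u8) SearchPath {
    return if (countWordTokens(query) > 1) .bm25_ranked else .substring_fast_path;
}

test "single identifier stays on the fast path" {
    try std.testing.expect(choosePath("trigram_index") == .substring_fast_path);
    try std.testing.expect(choosePath("trigram index lookup") == .bm25_ranked);
}
```

Under this rule the reported query `trigram_index` is a single token and would stay on `searchContent`, so the Tier 1 fix is still needed for that case.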

## Eval context

Found by an automated codedb evaluation against codedb 0.2.5805. Filed alongside #425 (substring leakage in callers) and #426 (non-code files in callers).
