
rank: mention-dense tooling files saturate past the path prior (occurrence count needs a pre-boost cap) #598

@justrach

Problem

The tooling-path prior (#546, ×0.5 for bench/scripts/website/install) multiplies the raw per-line occurrence count, so a mention-dense line shrugs it off. A bench script that repeats a term six times on one line scores 6.0 × 0.5 = 3.0 and beats an implementation line with two mentions (2.0). Live: codedb search capture returned benchmarks/search-shootout/shootout.py in all top 8 slots despite the prior.
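
A minimal Zig sketch of the arithmetic above, assuming Zig 0.11 or later. The function names, their signatures, and the prefix-based path check are hypothetical simplifications, not codedb's actual code: the base score is just the line's occurrence count, and the #546 prior is a final ×0.5.

```zig
const std = @import("std");

// Hypothetical tooling-path check: a path counts as tooling if it starts
// with one of these directories (the real #546 rule may differ).
const tooling_prefixes = [_][]const u8{ "bench/", "benchmarks/", "scripts/", "website/", "install/" };

fn isToolingPath(path: []const u8) bool {
    for (tooling_prefixes) |prefix| {
        if (std.mem.startsWith(u8, path, prefix)) return true;
    }
    return false;
}

// Hypothetical pre-fix scoring: raw occurrence count, then the ×0.5 prior.
// Nothing bounds the base, so a dense line keeps most of its score.
fn preFixScore(path: []const u8, occurrences: u32) f32 {
    var score: f32 = @floatFromInt(occurrences);
    if (isToolingPath(path)) score *= 0.5;
    return score;
}

pub fn main() void {
    const bench = preFixScore("benchmarks/search-shootout/shootout.py", 6);
    const impl = preFixScore("src/owner.zig", 2);
    std.debug.print("bench={d} impl={d}\n", .{ bench, impl });
}
```

The dense bench line scores 3.0 against the implementation line's 2.0, which is the inversion seen in the live search.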

A naive total-score cap (like the doc-language penalty's @min(score, 1) × 0.5) would destroy eponymy: codedb search install must still rank install/install.sh first. The fix is to cap only the OCCURRENCE BASE for tooling paths, BEFORE the stem/symbol boosts are applied. Density then can't dominate, and the +15 stem boost still wins for eponymous lookups.
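
The two cap placements can be compared with a short worked example. Write c for the per-line occurrence count and b for the additive boosts. As assumptions not stated in the issue: install/install.sh has c = 1 on its best line and receives only the +15 stem boost, and the competing non-tooling implementation line scores 2.0.

```latex
% c = per-line occurrence count, b = additive boosts (e.g. +15 stem), tooling prior = 0.5
\begin{aligned}
s_{\text{naive}}  &= \min(c + b,\, 1) \times 0.5 \\
s_{\text{capped}} &= \bigl(\min(c,\, 2) + b\bigr) \times 0.5 \\[4pt]
s_{\text{naive}}(\texttt{install.sh})   &= \min(1 + 15,\, 1) \times 0.5 = 0.5 \;<\; 2.0 \\
s_{\text{capped}}(\texttt{install.sh})  &= \bigl(\min(1,\, 2) + 15\bigr) \times 0.5 = 8.0 \;>\; 2.0 \\
s_{\text{capped}}(\texttt{shootout.py}) &= \bigl(\min(6,\, 2) + 0\bigr) \times 0.5 = 1.0 \;<\; 2.0
\end{aligned}
```

Under the naive cap every tooling line tops out at 0.5, so no boost can lift install/install.sh above an ordinary two-mention line. Capping only c bounds density while leaving the stem boost intact.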

Failing Test

Test in src/test_search.zig (to be named after this issue once assigned): a dense bench/shootout.py (6 mentions per line) must rank below src/owner.zig (2 mentions), and install/install.sh must still rank first for the query install. It fails red on release/0.2.5825, with the bench file ranked first.

Fix

In rerankSignalScore: compute tooling-ness early; for tooling paths, apply score = @min(score, 2.0) before the boost section; keep the existing ×0.5 at the end.
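
One way the fix could look, as a Zig sketch assuming Zig 0.11 or later. The Candidate struct, isToolingPath, and this cut-down rerankSignalScore signature are hypothetical; the real function takes more signals and applies more boosts. Only the ordering is the point: decide tooling-ness first, cap the occurrence base at 2.0, apply boosts, then the unchanged ×0.5.

```zig
const std = @import("std");

// Hypothetical, simplified input: only the fields needed to show the cap ordering.
const Candidate = struct {
    path: []const u8,
    occurrences: u32, // raw per-line count of query-term mentions
    stem_matches_query: bool, // file stem equals the query (eponymous hit)
};

// Hypothetical tooling-path check (prefix match on bench/scripts/website/install dirs).
const tooling_prefixes = [_][]const u8{ "bench/", "benchmarks/", "scripts/", "website/", "install/" };

fn isToolingPath(path: []const u8) bool {
    for (tooling_prefixes) |prefix| {
        if (std.mem.startsWith(u8, path, prefix)) return true;
    }
    return false;
}

fn rerankSignalScore(cand: Candidate) f32 {
    // 1. Decide tooling-ness before any boosts are applied.
    const tooling = isToolingPath(cand.path);

    var score: f32 = @floatFromInt(cand.occurrences);

    // 2. Cap only the occurrence base for tooling paths, so line density
    //    cannot outweigh the prior.
    if (tooling) score = @min(score, 2.0);

    // 3. Boost section (only the +15 stem boost is modelled in this sketch).
    if (cand.stem_matches_query) score += 15.0;

    // 4. Existing #546 prior, unchanged, applied last.
    if (tooling) score *= 0.5;
    return score;
}

pub fn main() void {
    const bench = rerankSignalScore(.{ .path = "bench/shootout.py", .occurrences = 6, .stem_matches_query = false });
    const owner = rerankSignalScore(.{ .path = "src/owner.zig", .occurrences = 2, .stem_matches_query = false });
    const install = rerankSignalScore(.{ .path = "install/install.sh", .occurrences = 1, .stem_matches_query = true });
    std.debug.print("bench={d} owner={d} install={d}\n", .{ bench, owner, install });
}
```

With these inputs the dense bench line scores 1.0, below src/owner.zig's 2.0, while install/install.sh scores (1 + 15) × 0.5 = 8.0 and stays on top, matching both conditions of the failing test. The install.sh occurrence count of 1 is an assumed value; any count keeps it at 7.5 or above.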

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working), priority:p2 (Medium priority)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests