Improve literal search recall and exact filename ranking #363

@justrach

Summary

While using codedb against a healthy local index of blackfloofie/codegraff, three query behaviors made source navigation less reliable than expected:

  1. Literal phrase search missed a confirmed Rust source match.
  2. Fuzzy filename search ranked an exact filename match below unrelated lib.rs files.
  3. Batched sub-operations with missing required arguments returned per-operation errors; this is user error, but clearer validation/examples could make the failure easier to recover from.

Environment

  • Target project indexed locally: blackfloofie/codegraff
  • codedb status during repro:
    seq: 950
    files: 942
    outlines: 942
    contents_cached: 942
    trigram_index: mmap+overlay (935 files)
    scan: ready
    
  • Source definition confirmed at crates/forge_app/src/app.rs:47:
    pub struct ForgeApp<S> {

Reproduction

Literal phrase search recall

  1. Confirm the source line exists:

    • codedb_read path=crates/forge_app/src/app.rs line_start=47 line_end=55
    • Result includes crates/forge_app/src/app.rs:47 with pub struct ForgeApp<S> {.
  2. Search for the identifier:

    • codedb_search query=ForgeApp
    • Result includes crates/forge_app/src/app.rs:47.
  3. Search for the literal phrase:

    • codedb_search query="pub struct ForgeApp" regex=false
    • Result returned only architecture.md:232 and missed the Rust source match at crates/forge_app/src/app.rs:47.
  4. Search with regex:

    • codedb_search query="pub\\s+struct\\s+ForgeApp" regex=true
    • Result correctly included crates/forge_app/src/app.rs:47.
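The mismatch in steps 3 and 4 can be checked independently of codedb. The sketch below confirms that the query is a contiguous substring of the source line, and that every trigram of the query also occurs in that line. The `trigrams` helper is a hypothetical model of a trigram prefilter, not codedb's actual implementation, so it only shows that such a filter has no reason to discard the file; it does not locate the bug.

```python
def trigrams(text: str) -> set[str]:
    """Return the set of all 3-character windows in text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


# Line confirmed at crates/forge_app/src/app.rs:47 in the reproduction.
source_line = "pub struct ForgeApp<S> {"
query = "pub struct ForgeApp"

# The phrase is a contiguous, case-exact substring of the source line.
is_substring = query in source_line

# Hypothetical prefilter model: a file stays a candidate only if it
# contains every trigram of the query. Nothing is missing here.
missing = trigrams(query) - trigrams(source_line)

print(is_substring, sorted(missing))
```

Under these assumptions, a literal search should treat app.rs as a candidate and then confirm the match with a plain substring check.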

Fuzzy filename ranking

  1. Run fuzzy file search:
    • codedb_find query=cli.rs max_results=5
  2. Observed result order:
    1. crates/forge_ci/src/lib.rs
    2. crates/forge_fs/src/lib.rs
    3. crates/forge_app/src/lib.rs
    4. crates/forge_api/src/lib.rs
    5. crates/forge_main/src/cli.rs
    
  3. The exact filename match crates/forge_main/src/cli.rs was present, but ranked fifth. The CLI definitions in that file start at crates/forge_main/src/cli.rs:15.
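One possible way to get the expected ordering is to sort on an exact-basename flag before the fuzzy score. The sketch below is illustrative only: `SequenceMatcher` stands in for whatever fuzzy scorer codedb uses, and `rank` and `sort_key` are hypothetical names.

```python
from difflib import SequenceMatcher
from pathlib import PurePosixPath


def sort_key(query: str, path: str) -> tuple[bool, float]:
    """Sort key: exact basename match first, then fuzzy similarity."""
    basename = PurePosixPath(path).name
    exact = basename.lower() == query.lower()
    # Stand-in fuzzy score in [0, 1]; not codedb's real scorer.
    fuzzy = SequenceMatcher(None, query.lower(), path.lower()).ratio()
    return (exact, fuzzy)


def rank(query: str, paths: list[str]) -> list[str]:
    """Return paths ordered best-first for the given query."""
    return sorted(paths, key=lambda p: sort_key(query, p), reverse=True)


# The five paths from the observed result order above.
observed = [
    "crates/forge_ci/src/lib.rs",
    "crates/forge_fs/src/lib.rs",
    "crates/forge_app/src/lib.rs",
    "crates/forge_api/src/lib.rs",
    "crates/forge_main/src/cli.rs",
]

ranked = rank("cli.rs", observed)
print(ranked[0])
```

Because the exact flag is the first element of the key, any path whose basename equals the query outranks every non-exact path regardless of its fuzzy score.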

Batched query validation UX

Calling a batch with sub-operations that omitted required fields returned per-operation errors like:

--- [1] codedb_outline ---
error: missing 'path' argument
--- [2] codedb_search ---
error: missing 'query' argument

This is technically correct, but a bundle-level schema hint or examples of valid sub-operation arguments might make recovery faster.
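As a rough illustration of what a richer error could look like, the sketch below validates each sub-operation against a table of required arguments and appends the required fields plus one valid example. The batch shape (`tool` and `args` keys), the `required:` and `example:` lines, and the table contents are assumptions; only the two required arguments are taken from the errors above.

```python
# Required arguments inferred from the two error messages above.
REQUIRED_ARGS = {
    "codedb_outline": ["path"],
    "codedb_search": ["query"],
}

# Illustrative examples; the exact invocation syntax is an assumption.
EXAMPLES = {
    "codedb_outline": "codedb_outline path=crates/forge_app/src/app.rs",
    "codedb_search": "codedb_search query=ForgeApp",
}


def validate_batch(ops: list[dict]) -> list[str]:
    """Return one report block per sub-operation with missing arguments."""
    report = []
    for index, op in enumerate(ops, start=1):
        tool = op.get("tool", "")
        args = op.get("args", {})
        missing = [a for a in REQUIRED_ARGS.get(tool, []) if a not in args]
        if not missing:
            continue
        lines = [f"--- [{index}] {tool} ---"]
        lines += [f"error: missing '{a}' argument" for a in missing]
        lines.append(f"required: {', '.join(REQUIRED_ARGS[tool])}")
        lines.append(f"example: {EXAMPLES[tool]}")
        report.append("\n".join(lines))
    return report


batch = [
    {"tool": "codedb_outline", "args": {}},
    {"tool": "codedb_search", "args": {}},
]
report = validate_batch(batch)
print("\n".join(report))
```

The existing per-operation error line is kept verbatim, so anything that already parses it would continue to work.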

Expected behavior

  • Literal phrase search for pub struct ForgeApp should find crates/forge_app/src/app.rs:47 because the phrase is a contiguous substring of pub struct ForgeApp<S> {.
  • Fuzzy file search for cli.rs should heavily boost exact basename matches so crates/forge_main/src/cli.rs ranks first or near-first.
  • Batched operation argument errors should remain precise, but ideally include enough context to quickly correct the malformed sub-operation.

Actual behavior

  • Identifier search and regex search found the Rust source definition, but literal phrase search did not.
  • Fuzzy file search found the exact cli.rs path only after several unrelated lib.rs files.
  • Batch errors were accurate, but sparse.

Impact

Agents can still work around these issues by preferring symbol lookup, regex search, and direct reads, but literal search recall and exact filename ranking are common navigation paths. Improving these would reduce redundant queries and make codedb more dependable for codebase exploration.
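Until literal recall is fixed, the regex workaround can be derived mechanically from the literal phrase. The helper below is a hypothetical client-side sketch, not part of codedb: it escapes each whitespace-separated token and joins them with `\s+`, which reproduces the pattern used in the regex step of the reproduction.

```python
import re


def literal_to_regex(phrase: str) -> str:
    """Turn a literal phrase into a whitespace-tolerant regex pattern."""
    tokens = phrase.split()
    # Escape each token so characters such as '<' or '(' stay literal.
    return r"\s+".join(re.escape(token) for token in tokens)


pattern = literal_to_regex("pub struct ForgeApp")
source_line = "pub struct ForgeApp<S> {"  # crates/forge_app/src/app.rs:47
match = re.search(pattern, source_line)

print(pattern)
```

The resulting pattern can then be passed to codedb_search with regex=true, as in the reproduction steps.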

Metadata

Labels

  • bug (Something isn't working)
  • priority:p2 (Medium priority)
  • status:backlog (Work item has not been started)
