Problem Statement
There is a discrepancy between the CLI backfill logic and the live bot pipeline:
- CLI Backfill: Indexes Title + Body + Comments.
- Live Bot: Indexes Title + Body ONLY.
This leads to inconsistent search results depending on how an item was indexed: an issue backfilled via the CLI has richer context than one processed by the bot in real time.
Proposed Solution
- Create a shared function (e.g., in internal/utils/text) to generate the content string for embedding.
- Refactor internal/steps/indexer.go (Bot) to use this function.
- Refactor cmd/simili/commands/index.go (CLI) to use this function.
- The shared function MUST include comments for both Issues and PRs to ensure rich context for similarity search.
Feature Scope