Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

search: apply phrase boosting to more queries#64207

Merged
jtibshirani merged 2 commits into
mainfrom
jtibs/phrase-boost
Aug 1, 2024
Merged

search: apply phrase boosting to more queries#64207
jtibshirani merged 2 commits into
mainfrom
jtibs/phrase-boost

Conversation

@jtibshirani

@jtibshirani jtibshirani commented Aug 1, 2024

Copy link
Copy Markdown
Contributor

To ease the transition from older search semantics to keyword search, we boost matches on phrases. For example if you search for // The vararg opts parameter can include, we ensure the match in the method comment is highest, even though there were no explicit quotes in the query.

We decided to limit this boosting to searches with 3 or more terms. With 2 terms, it's possible there are a ton of phrase matches, which can drown out high quality 'AND' matches. However, we've seen a few dogfood examples with fewer terms where boosting would have been really useful.

This PR removes the 3 term restriction on boosting. To mitigate noise, it updates the phrase matching to only match on word boundaries. It also switches to matching on the original query string instead of reconstructing the phrase from terms. That lets us match even when there's special characters, like return "disabled".

Relates to SPLF-168

Test plan

Adapted unit tests. Lots of manual testing:

  • invalid ModelID
  • return "disabled"
  • func TestClient_do(
  • // The vararg opts parameter can include functions to configure the
  • test server -- this still works quite well, it's a bit fragile but restricting the matches to word boundaries helped reduce noise
  • bytes buffer

@cla-bot cla-bot Bot added the cla-signed label Aug 1, 2024
@github-actions github-actions Bot added team/product-platform team/search-platform Issues owned by the search platform team labels Aug 1, 2024
@jtibshirani jtibshirani requested a review from a team August 1, 2024 10:06
// Remove predicates from the original query to keep just the pattern string.
terms := strings.Fields(originalQuery)
filteredTerms := make([]string, 0)
for _, term := range terms {

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeking input on this part -- too fragile or reasonable invariant that all predicates will include a colon?

@stefanhengl stefanhengl Aug 1, 2024

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use basic.Pattern instead of filtering manually? Predicates should all be part of basic.Parameters, no?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this on Zoom. Using basic is not a good choice because it doesn't contain all the information we need for proper phrase matching, like for example quotes.

@jtibshirani jtibshirani merged commit c966d94 into main Aug 1, 2024
@jtibshirani jtibshirani deleted the jtibs/phrase-boost branch August 1, 2024 13:26

@keegancsmith keegancsmith left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

jtibshirani added a commit that referenced this pull request Aug 6, 2024
Follow up to #64207. In our old search semantics, quotes were
interpreted literally. So a query like `"sourcegraph"` would match only
strings like `fmt.Println("sourcegraph")`. Now, both single and double
quotes are used for escaping, and mean that the contents should be
searched exactly.

This PR makes sure to boost matches on quoted terms in result ranking.
This way, users familiar with the old syntax are more likely to find
what they're after.

## Test plan

Adapted unit tests. Re-tested all queries from #64207 manually, plus
these ones:
* `'sourcegraph'`
* `"sourcegraph"`
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla-signed team/product-platform team/search-platform Issues owned by the search platform team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants