This repository was archived by the owner on Sep 30, 2024. It is now read-only.
search: apply phrase boosting to more queries#64207
Merged
Merged
Conversation
jtibshirani
commented
Aug 1, 2024
| // Remove predicates from the original query to keep just the pattern string. | ||
| terms := strings.Fields(originalQuery) | ||
| filteredTerms := make([]string, 0) | ||
| for _, term := range terms { |
Contributor
Author
There was a problem hiding this comment.
Seeking input on this part -- too fragile or reasonable invariant that all predicates will include a colon?
Member
There was a problem hiding this comment.
Can you use basic.Pattern instead of filtering manually? Predicates should all be part of basic.Parameters, no?
Member
There was a problem hiding this comment.
We discussed this on Zoom. Using basic is not a good choice because it doesn't contain all the information we need for proper phrase matching, like for example quotes.
stefanhengl
approved these changes
Aug 1, 2024
jtibshirani
added a commit
that referenced
this pull request
Aug 6, 2024
Follow up to #64207. In our old search semantics, quotes were interpreted literally. So a query like `"sourcegraph"` would match only strings like `fmt.Println("sourcegraph")`. Now, both single and double quotes are used for escaping, and mean that the contents should be searched exactly. This PR makes sure to boost matches on quoted terms in result ranking. This way, users familiar with the old syntax are more likely to find what they're after. ## Test plan Adapted unit tests. Re-tested all queries from #64207 manually, plus these ones: * `'sourcegraph'` * `"sourcegraph"`
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
To ease the transition from older search semantics to keyword search, we boost matches on phrases. For example if you search for
// The vararg opts parameter can include, we ensure the match in the method comment is highest, even though there were no explicit quotes in the query.We decided to limit this boosting to searches with 3 or more terms. With 2 terms, it's possible there are a ton of phrase matches, which can drown out high quality 'AND' matches. However, we've seen a few dogfood examples with fewer terms where boosting would have been really useful.
This PR removes the 3 term restriction on boosting. To mitigate noise, it updates the phrase matching to only match on word boundaries. It also switches to matching on the original query string instead of reconstructing the phrase from terms. That lets us match even when there's special characters, like
return "disabled".Relates to SPLF-168
Test plan
Adapted unit tests. Lots of manual testing:
invalid ModelIDreturn "disabled"func TestClient_do(// The vararg opts parameter can include functions to configure thetest server-- this still works quite well, it's a bit fragile but restricting the matches to word boundaries helped reduce noisebytes buffer