Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

Search: boost matches on quoted terms#64298

Merged
jtibshirani merged 1 commit into
mainfrom
jtibs/phrase-boost
Aug 6, 2024
Merged

Search: boost matches on quoted terms#64298
jtibshirani merged 1 commit into
mainfrom
jtibs/phrase-boost

Conversation

@jtibshirani

Copy link
Copy Markdown
Contributor

Follow up to #64207. In our old search semantics, quotes were interpreted literally. So a query like "sourcegraph" would match only strings like fmt.Println("sourcegraph"). Now, both single and double quotes are used for escaping, and mean that the contents should be searched exactly.

This PR makes sure to boost matches on quoted terms in result ranking. This way, users familiar with the old syntax are more likely to find what they're after.

Test plan

Adapted unit tests. Re-tested all queries from #64207 manually, plus these ones:

  • 'sourcegraph'
  • "sourcegraph"

@cla-bot cla-bot Bot added the cla-signed label Aug 6, 2024
@github-actions github-actions Bot added team/product-platform team/search-platform Issues owned by the search platform team labels Aug 6, 2024
@jtibshirani

Copy link
Copy Markdown
Contributor Author

Some notes:

  • To get this to work, I needed to remove the word-boundary anchoring. This regressed our classic queries test server and bytes buffer a bit, but the correct result is still in second place, so I feel this is a good trade-off.
  • This change correctly doesn't touch search semantics, because all lines that match 'sourcegraph' literally must also match the term sourcegraph. However we do highlight boosted match, which is a bit weird but also feels okay.
Screenshot 2024-08-06 at 3 02 00 PM

@jtibshirani jtibshirani requested a review from a team August 6, 2024 12:12
autogold.Expect(`(or "(^|\\b)\\* int func\\(($|\\b)" (and "*" "int" "func("))`).Equal(t, test("* int func(", SearchTypeKeyword))
autogold.Expect(`(or "(^|\\b)\"foo bar\" bas qux($|\\b)" (and "foo bar" "bas" "qux"))`).Equal(t, test(`"foo bar" bas qux`, SearchTypeKeyword))
autogold.Expect(`(or "(^|\\b)foo 'bar bas' qux($|\\b)" (and "foo" "bar bas" "qux"))`).Equal(t, test(`foo 'bar bas' qux`, SearchTypeKeyword))
autogold.Expect(`(or "foo bar bas" (and "foo" "bar" "bas"))`).Equal(t, test("foo bar bas", SearchTypeKeyword))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, having the word boundaries seemed like nice property. Does this mean the boundaries are not as important as I thought?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They did make a small positive difference in my tests. But after looking at more examples, it was not as critical as I thought. For example, our classic queries test server and bytes buffer still show the correct result from golang/go in second place. So it's nice to keep things simple and remove it.

Comment on lines +722 to +723
Value: query,
Annotation: Annotation{Labels: Boost | Literal | Standard},

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!

@jtibshirani jtibshirani merged commit 958afb0 into main Aug 6, 2024
@jtibshirani jtibshirani deleted the jtibs/phrase-boost branch August 6, 2024 12:55
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla-signed team/product-platform team/search-platform Issues owned by the search platform team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants