Search: add repo:has.topic() predicate#48875
Conversation
22fcd89 to
e5d9b6a
Compare
Bundle size report 📦
Look at the Statoscope report for a full comparison between the commits c7c225f and 46e0f62 or learn more. Open explanation
|
e5d9b6a to
acb4b68
Compare
acb4b68 to
901f491
Compare
901f491 to
681eafb
Compare
|
I will be out for most of tomorrow, and backend integration tests aren't yet passing, but I wanted to get this open for review ASAP given branch cut is looming. I will fix the integration tests once I get back. |
|
Codenotify: Notifying subscribers in CODENOTIFY files for diff 46e0f62...c7c225f.
|
jtibshirani
left a comment
There was a problem hiding this comment.
This is looking good to me, but I still need to review the database logic. To check -- do you want to ship this with the upcoming release (so we should be speedy with reviews!)
One overall question: am I understanding right that we'll now have both has.tag and has.topic predicates? This feels a little confusing, will we eventually remove has.tag?
Yes! We have quite a few customers asking for this, so I'd really like to not make them wait until June 🙂
Yes, that is correct. I agree this is a little confusing. I would love to do a little bit of consolidation of our query language, and the kvp predicates could use some careful design. |
4526636 to
de20269
Compare
|
Fixed backend integration tests. The problem is I added topics to the repos in |
I added a few comments to the code in an attempt to help clarify what's going on there |
jtibshirani
left a comment
There was a problem hiding this comment.
This looks good to me. I gave my best shot at a review, but I'm still getting up to speed on our SQL standards and approach to database migrations.
I left a couple questions but didn't see anything blocking.
| // the repo's metadata. This is designed to work with the | ||
| // idx_repo_github_topics index. | ||
| // | ||
| // We use the unusual `jsonb_build_array` and `jsonb_build_object` |
There was a problem hiding this comment.
I was wondering if we ever use prepared statements to optimize common queries and prevent SQL injections?
There was a problem hiding this comment.
There are two parts to this answer:
For query optimization, our Postgres driver automatically prepares and caches prepared statements. I haven't actually looked into how well that cache is utilized -- this would be worth looking into. For something like listing repos though, explicitly preparing all variations of the query would likely become untenable given the number of different options we support (all of which change the shape of the query, making the prepared statement unusable).
For SQL injections, sqlf.Sprintf() does actually protect against this. It is not the same as fmt.Sprintf(). It maintains the separation of query text and query variables which we then pass separately to Postgres, which handles the rest.
My comment here was just about not being able to construct a JSONB object safely with the string syntax. SELECT jsonb_build_object('Name', $1::text); is valid, SELECT '{"Name": "$1"}'::jsonb; is not. I didn't want to do something like q := "SELECT '{\"Name": \"" + topic +" \"}'::jsonb" in the application code because that's super vulnerable.
| @@ -0,0 +1,4 @@ | |||
| -- Undo the changes made in the up migration | |||
|
|
|||
| DROP INDEX IF EXISTS idx_repo_github_topics; | |||
There was a problem hiding this comment.
In terms of testing -- am I understanding right that CI automatically runs basic checks against migrations (like for idempotency, running them before starting unit tests, etc.)
There was a problem hiding this comment.
Yes! We test that all migrations run, they are idempotent, and they can be re-applied after migrating down. All unit tests run against a fully migrated database. I don't know of any good ways to test "does this index apply to my query" other than doing a manual test on a realistic instance and ensuring the EXPLAIN output looks right -- would be interested in solutions here 🙂
This adds first-class support for searching repos by GitHub topic.
I added automated ingestion of GitHub topics in https://github.com/sourcegraph/sourcegraph/pull/47635. This PR add support for searching repos using this information.
Overview of this PR:
repo:has.topic()predicate, which exposes this via the search languageThis is an alternative to the planned implementation where we ingest topics as KVPs then query them with
repo:has.tag(topic). I decided to switch course for a few reasons:repo:has.tag()is confusing because it suggests git tags.repo:has.topic()has a meaning that is more obvious.repo_kvptable consistent with therepotable is surprisingly difficult, and there were a lot of edge cases like conflicting keys that would be difficult to solve in a maintainable way.Test plan
I added a handful of unit tests, database tests, and integration tests to exercise the new behavior. I manually tested that the added index works well on sourcegraph.com.