Search backend: remove plan expansion of file:contains.content()#39501
Conversation
* refactor and adding tests * update schema for new policy * remove PasswordPolicy from default dev config * add more tests, fix bugs * Add PasswordPolicy changes to frontend * add PasswordPolicy jsoncontext for frontend * add PasswordPolicy to schema * create security helper functions, code re-use * dedup validatePassword functionality * remove duplicate auth.passwordPolicy * always use minPasswordLen * fix build, add types, fix tests * add experimental passwordpolicy back to schema * refactor, dedup password requirement check * add more tests * refactor, run linters * add deprecation notices * always return a GenericPasswordPolicy * remove conf.go move to general conf * remove interface type * deprecate PasswordPolicy * fix tests, refactor * run pretty * serialize json as frontend expects * Apply suggestions from code review Co-authored-by: Thorsten Ball <mrnugget@gmail.com> * Apply suggestions from code review Co-authored-by: Thorsten Ball <mrnugget@gmail.com> * change test to be table driven Co-authored-by: Thorsten Ball <mrnugget@gmail.com>
…n server-side execution (#38921) * add migration * add sql for migration * add sql to squashed.sql * specify batch change id in createBatchSpecFromRaw * pass batch change in frontend * add comment * arrange methods properly * remove trailing lines * revert arrngement * update * fix failing test * fix failing tests * add test for resolver * add test for service * add test for service * more test for service * set field name to bigint * remove comment * update the update query * remove confusing comment * rename variables * rename BastchChangeID variable * add test for unauthorised user creating a batch spec from raw * update db schema * remove trailing line * remove comment * prettier things
This prevents serving 3.42.0 by default, which is broken for the preview/apply workflows.
For permission testing I wrote this command to very conveniently switch between users. This is kind of like the already existing testproxy, except much more convenient to use as well as being targetted for auth testing rather than http-header testing. I'm unsure of how to document this further so people are aware of it. Alternatively I think it is also useful to maybe spin up by default in our enterprise env. I'll leave both of those for future PRs. For now I will advertise in #dev-chat and #dev-experience. Here is the example output to give you a feel for what it does: $ go run ./dev/internal/cmd/auth-proxy-http-header https://docs.sourcegraph.com/admin/auth#http-authentication-proxies "auth.providers": [ { "type": "http-header", "usernameHeader": "X-Forwarded-User", "emailHeader": "X-Forwarded-Email" } ] Visit http://127.0.0.1:10810 for keegan keegan@sourcegraph.com Visit http://127.0.0.1:10811 for user1 keegan+user1@sourcegraph.com Visit http://127.0.0.1:10812 for user2 keegan+user2@sourcegraph.com Visit http://127.0.0.1:10813 for user3 keegan+user3@sourcegraph.com Visit http://127.0.0.1:10814 for user4 keegan+user4@sourcegraph.com Visit http://127.0.0.1:10815 for user5 keegan+user5@sourcegraph.com Test Plan: Ran locally
…information (#39250) This PR adds support for reading the [GitHub style code owner specs](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners) from repositories to infer code ownership information automatically. To do that, we add a new dependency: https://github.com/hmarr/codeowners (MIT license so I don't see any legal issues). Here's a bullet point recap of what changes we can find here: - I've added a new `codeownership.Ruleset` type that currently is a proxy to the `codeowners.Ruleset` (from the new dependency). This allows us to expose a nice API but also will make it easier to later add other ways to get the code ownership mapping: E.g. we may want to fetch data rules from our postgres database instead. - We have a new method to create a ruleset based on querying the gitserver for a specific repo at a specific revision. This will download a static number of blobs that are [allowed code owner locations](https://docs.github.com/en/repositories/managing-your-repositorys-settings-and-features/customizing-your-repository/about-code-owners#codeowners-file-location) (The plugin also [supports GitLab style code owners](hmarr/codeowners@f72d282) so I've added the [`.gitlab` path](https://docs.gitlab.com/ee/user/project/code_owners.html#set-up-code-owners)) - In the filter job, we create a new `map` so that we only have to create the `codeowners.Ruleset` for a specific repo once per search. So if one repo returns 100 results, we only have one ruleset. - When a code ownership mapping was found, we test the result paths against the mapping. If the owners contain the owner we need to include, we keep the result, otherwise we drop it. - I added a new test for the filter job. - I also migrated the filter job test to use `autogold` while at it. - I added a feature flag to gate out the work.
Instrumentation showed it is not being used anymore.
This PR makes two functional improvements to the RepoInfo client method and a semantic one. The functional improvements are: 1. Limit length of map to max number of shards For reasons unknown now, we were making an unnecesarily high number of requests since the `shards` map being created as a factor of number of repos. But it should only ever be the total number of shards of gitserver currently running. 2. Do not hide response body for failed HTTP reqs Currently, if the API server returns anythign but a 200 OK, we return the HTTP status, but swallow the HTTP body. This will contain important information that will be helpful to understand the error, which is not possible as of now. The semantic improvement is: 1. Add comment and rename shard -> repoInfoReq I spent an unnecessarily high amount of time trying to "fix" this code for what I thought was not the optimal way of collecting and sending the API requests, until I had the aha moment and finally understood why we're doing it this way. As a result, added a comment to clarify this as well as renamed shard to repoInfoReq as it doesn't look like that's the right name here and confused me somewhat more when I was reading this code.
… database (#39483) Previously every test case was run in the same test function which led to every consequent test case use the database with some state changed from every previous test case. As all the test cases deal with similar tables in the database, this can lead to unexpected behaviour or false positives, which possibility is now eliminated as every test runs against the clean database.
phabricator: Simply repo create We no longer need to perform a remote call from repo-updater in order to create a Phabricator repo, we can simply call the DB directly. This removes the need for an internal handler in frontend to talk to the DB as well as the client code to call this handler.
There was a problem hiding this comment.
Note: this is a change in behavior. Previously, we would search for files containing after_success on the HEAD commit, which would return .travis.yml, then we would search for all diffs on that file. Now, we search for diffs where the file contains after_success in the commit that is being searched. after_success was added in the last commit that touched .travis.yml, which is why this now returns 1 result rather than 10.
add a flaky test exception
rvantonder
left a comment
There was a problem hiding this comment.
Nice built-in treatment for text search :-) I played around locally, all looks great
57c6355 to
2740885
Compare
|
Codenotify: Notifying subscribers in OWNERS files for diff ee0908d...fb601f6.
|
Co-authored-by: Rijnard van Tonder <rvantonder@gmail.com>
|
Not notifying subscribers because the number of notifying subscribers (20) has exceeded the threshold (15). |
This modifies the evaluation of the
file:contains.content()predicate to no longer expand ahead of time. There are very few cases where this would work correctly in the past because the number of files we expanded into was enormous, and they all had to be scoped in the query by a repo, which made for extremely complexORqueries that would just cause stack overflows when we tried to process them.There are two cases where we support
file:contains.content():Text search is implemented in a very efficient manner. Basically, for a user input like
file:contains(abc) def, we execute the search as if the user searched forabc and def, then we filter out the ranges that matchedabc(but keep any that matchdef). This lets us to take full advantage of our existing, efficientAND/ORmachinery.Diff search is implemented in an extremely inefficient manner. For each result that comes through, we execute an unindexed search on the files matched in the diff at the matched commit and ensure that they contain all the patterns specified by the
file:contains.content()predicate. This is slow, but diff search is also slow, and I expect that thefile:contains.content()feature for diff search is hardly used, if at all, so I think it's fine. I don't want to put the effort into supporting this natively in diff search right now.Stacked on https://github.com/sourcegraph/sourcegraph/pull/39383
This is the last predicate that used the query expansion machinery, so I will remove that in the next PR.
Test plan
Added tests, backend integration test changes reflect changes in behavior.