fix: allow more than 65535 repos in deny list #61580
Conversation
camdencheek
left a comment
There was a problem hiding this comment.
Nice! This looks good to me. Left a few comments, but nothing critical. Happy to provide a followup review if you'd like, but not necessary.
| // Based on https://stackoverflow.com/a/59051095 | ||
| n := 0 | ||
| for _, x := range points { | ||
| // If there is no RepoId on a point, then we can't filter by repo permissions and default to allowing access. |
There was a problem hiding this comment.
This is a change in behavior, right? Previously we were checking repo_id NOT IN (...), but if repo_id is NULL, Postgres evaluates NULL not in (1,2,3) as NULL, which I think would mean this was previously excluded
There was a problem hiding this comment.
Example of what I'm talking about:
localhost sourcegraph@sourcegraph=# create table test (id int);
CREATE TABLE
Time: 22.762 ms
localhost sourcegraph@sourcegraph=# insert into test values (1), (2), (3);
INSERT 0 3
Time: 1.803 ms
localhost sourcegraph@sourcegraph=# select * from test;
┌────┐
│ id │
├────┤
│ 1 │
│ 2 │
│ 3 │
└────┘
(3 rows)
Time: 0.826 ms
localhost sourcegraph@sourcegraph=# select * from test where NULL in (5,6,7);
┌────┐
│ id │
├────┤
└────┘
(0 rows)
Time: 0.935 ms
There was a problem hiding this comment.
I need to break this down myself (see below) to make sure I understand it right, and got to the same conclusion: The original code prevents users from seeing records where the ID is null. Notable difference on my side is that I see not in instead of in. The query result is the same though.
This however means that no one (not even admins) would be allowed to see those records, because their list of denied repos will just be empty and the check for NULL not in () still yields false.
Users with no denied repos can see all data points, because the predicate is not added to their query:
if len(excludedRepoIDs) > 0 {
preds = append(preds, sqlf.Sprintf("sp.repo_id not in (%s)", sqlf.Join(excludedRepoIDs, ",")))
}We have this original query
select iv.title, ivs.label, i.query, isrt.recording_time, rn.name, coalesce(sp.value, 0) as value, sp.capture
from %s isrt
join insight_series i on i.id = isrt.insight_series_id
join insight_view_series ivs ON i.id = ivs.insight_series_id
join insight_view iv ON ivs.insight_view_id = iv.id
left outer join %s sp on sp.series_id = i.series_id and sp.time = isrt.recording_time
left outer join repo_names rn on sp.repo_name_id = rn.id
where iv.unique_id = %s and %s
order by iv.title, isrt.recording_time, ivs.label, sp.capture;that we can condense down to
select *
from %s isrt
left outer join %s sp
where %sThe first %s is some table, the second is the series points table, and the third are all the predicates (including denied repos and regexes).
The denied repos part is essentially sp.repo_id not in (%s) where %s is a list of repo ids.
Filled in the query above looks like this:
select *
from %s isrt
left outer join %s sp
where sp.repo_id not in (%s)We can further simplify this to
select *
from TABLE t
where t.id not in (%s)If id is in the list it should be blocked. The case we're discussing here is if the record should be blocked when t.id is NULL.
Here's a fiddle I used, where I get the same results as you:
There was a problem hiding this comment.
TIL about db-fiddle! This all sounds right to me. Thanks for the detailed thought process
| - Fixed an issue where repositories with a name ending in `.git` failed to clone. [#60627](https://github.com/sourcegraph/sourcegraph/pull/60627) | ||
| - Fixed an issue where Sourcegraph could lose track of repositories on gitserver, leaving behind unnecessary data and inconsistent clone status in the UI. [#60627](https://github.com/sourcegraph/sourcegraph/pull/60627) | ||
| - The "Commits" button in repository and folder pages links to commits in the current revision instead of in the default branch. [#61408](https://github.com/sourcegraph/sourcegraph/pull/61408) | ||
| - Fixed an issue where code insights queries would fail if there are more than 65535 restricted repositories and regular expressions. [#61580](https://github.com/sourcegraph/sourcegraph/pull/61580) |
There was a problem hiding this comment.
Thank you for the changelog entry 🙂
| // Based on https://stackoverflow.com/a/59051095 | ||
| n := 0 | ||
| for _, x := range points { | ||
| // If there is no RepoId on a point, then we can't filter by repo permissions and default to allowing access. |
There was a problem hiding this comment.
TIL about db-fiddle! This all sounds right to me. Thanks for the detailed thought process
|
FYI, "Closes issue #NUMBER" does not auto-close that issue, but |
Closes issue https://github.com/sourcegraph/sourcegraph/issues/61353
Closes PR https://github.com/sourcegraph/sourcegraph/pull/61525
This PR fixes the error
extended protocol limited to 65535 parametersby not sending a large list of predicates via an SQL query, but filtering out repos that a user should not have access to in the application layer.The new filter runs with O(n) time complexity which should be equal or better than the previous SQL query. There is however more network traffic, as we now transfer all the repos that a user does not have access to from the database to the application.
Test plan