Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

fix: allow more than 65535 repos in deny list #61580

Merged
camdencheek merged 11 commits into
mainfrom
bahrmichael/61353-3
Apr 5, 2024
Merged

fix: allow more than 65535 repos in deny list #61580
camdencheek merged 11 commits into
mainfrom
bahrmichael/61353-3

Conversation

@bahrmichael

@bahrmichael bahrmichael commented Apr 4, 2024

Copy link
Copy Markdown
Contributor

Closes issue https://github.com/sourcegraph/sourcegraph/issues/61353

Closes PR https://github.com/sourcegraph/sourcegraph/pull/61525

This PR fixes the error extended protocol limited to 65535 parameters by not sending a large list of predicates via an SQL query, but filtering out repos that a user should not have access to in the application layer.

The new filter runs with O(n) time complexity which should be equal or better than the previous SQL query. There is however more network traffic, as we now transfer all the repos that a user does not have access to from the database to the application.

Test plan

  • Added new unit and benchmarking tests
  • Previous use cases are covered by existing test cases

@cla-bot cla-bot Bot added the cla-signed label Apr 4, 2024
@bahrmichael bahrmichael requested a review from camdencheek April 4, 2024 14:55
@bahrmichael bahrmichael self-assigned this Apr 4, 2024

@camdencheek camdencheek left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This looks good to me. Left a few comments, but nothing critical. Happy to provide a followup review if you'd like, but not necessary.

Comment thread internal/insights/store/store.go Outdated
Comment thread internal/insights/store/store_test.go Outdated
Comment thread internal/insights/store/store.go Outdated
Comment thread internal/insights/store/store.go Outdated
Comment thread internal/insights/store/store.go
Comment thread internal/insights/store/store.go Outdated
// Based on https://stackoverflow.com/a/59051095
n := 0
for _, x := range points {
// If there is no RepoId on a point, then we can't filter by repo permissions and default to allowing access.

@camdencheek camdencheek Apr 4, 2024

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a change in behavior, right? Previously we were checking repo_id NOT IN (...), but if repo_id is NULL, Postgres evaluates NULL not in (1,2,3) as NULL, which I think would mean this was previously excluded

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example of what I'm talking about:

localhost sourcegraph@sourcegraph=# create table test (id int);
CREATE TABLE
Time: 22.762 ms
localhost sourcegraph@sourcegraph=# insert into test values (1), (2), (3);
INSERT 0 3
Time: 1.803 ms
localhost sourcegraph@sourcegraph=# select * from test;
┌────┐
│ id │
├────┤
│  1 │
│  2 │
│  3 │
└────┘
(3 rows)

Time: 0.826 ms
localhost sourcegraph@sourcegraph=# select * from test where NULL in (5,6,7);
┌────┐
│ id │
├────┤
└────┘
(0 rows)

Time: 0.935 ms

@bahrmichael bahrmichael Apr 5, 2024

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to break this down myself (see below) to make sure I understand it right, and got to the same conclusion: The original code prevents users from seeing records where the ID is null. Notable difference on my side is that I see not in instead of in. The query result is the same though.

This however means that no one (not even admins) would be allowed to see those records, because their list of denied repos will just be empty and the check for NULL not in () still yields false.

Users with no denied repos can see all data points, because the predicate is not added to their query:

if len(excludedRepoIDs) > 0 {
    preds = append(preds, sqlf.Sprintf("sp.repo_id not in (%s)", sqlf.Join(excludedRepoIDs, ",")))
}

We have this original query

select iv.title, ivs.label, i.query, isrt.recording_time, rn.name, coalesce(sp.value, 0) as value, sp.capture
from %s isrt
    join insight_series i on i.id = isrt.insight_series_id
    join insight_view_series ivs ON i.id = ivs.insight_series_id
    join insight_view iv ON ivs.insight_view_id = iv.id
    left outer join %s sp on sp.series_id = i.series_id and sp.time = isrt.recording_time
    left outer join repo_names rn on sp.repo_name_id = rn.id
	where iv.unique_id = %s and %s
    order by iv.title, isrt.recording_time, ivs.label, sp.capture;

that we can condense down to

select *
from %s isrt
    left outer join %s sp
	where %s

The first %s is some table, the second is the series points table, and the third are all the predicates (including denied repos and regexes).

The denied repos part is essentially sp.repo_id not in (%s) where %s is a list of repo ids.

Filled in the query above looks like this:

select *
from %s isrt
    left outer join %s sp
	where sp.repo_id not in (%s)

We can further simplify this to

select *
from TABLE t
	where t.id not in (%s)

If id is in the list it should be blocked. The case we're discussing here is if the record should be blocked when t.id is NULL.

Here's a fiddle I used, where I get the same results as you:

https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/12982

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL about db-fiddle! This all sounds right to me. Thanks for the detailed thought process

Comment thread CHANGELOG.md
- Fixed an issue where repositories with a name ending in `.git` failed to clone. [#60627](https://github.com/sourcegraph/sourcegraph/pull/60627)
- Fixed an issue where Sourcegraph could lose track of repositories on gitserver, leaving behind unnecessary data and inconsistent clone status in the UI. [#60627](https://github.com/sourcegraph/sourcegraph/pull/60627)
- The "Commits" button in repository and folder pages links to commits in the current revision instead of in the default branch. [#61408](https://github.com/sourcegraph/sourcegraph/pull/61408)
- Fixed an issue where code insights queries would fail if there are more than 65535 restricted repositories and regular expressions. [#61580](https://github.com/sourcegraph/sourcegraph/pull/61580)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the changelog entry 🙂

Comment thread internal/insights/store/store.go Outdated
// Based on https://stackoverflow.com/a/59051095
n := 0
for _, x := range points {
// If there is no RepoId on a point, then we can't filter by repo permissions and default to allowing access.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL about db-fiddle! This all sounds right to me. Thanks for the detailed thought process

@camdencheek

Copy link
Copy Markdown
Member

FYI, "Closes issue #NUMBER" does not auto-close that issue, but Closes #NUMBER does. GitHub auto-close syntax is kinda picky 🙂

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants