Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

github: add ability to exclude repositories based on size & stars#58377

Merged
mrnugget merged 4 commits into
mainfrom
mrn/exclude-size-stars
Nov 16, 2023
Merged

github: add ability to exclude repositories based on size & stars#58377
mrnugget merged 4 commits into
mainfrom
mrn/exclude-size-stars

Conversation

@mrnugget

@mrnugget mrnugget commented Nov 16, 2023

Copy link
Copy Markdown
Contributor

This adds the ability to exclude GitHub repositories based their stars or size.

Main goal: add exclude: [{"size": "> 1GB", "stars": "< 100"}] to our configuration on dotcom to free up disk space there.

It does that by extending the exclude schema of the GitHub code host connection in the following ways:

  • It adds stars that accepts a conditional expression. Examples: < 500, >100, <= 2000, >=3333
  • It adds size that accepts a conditional expression and multiple size units (see code, docs will follow). Examples: >= 500 MB, <= 4000 KiB, > 1GB, <22 MiB
  • It allows multiple attributes on an exlude which then get ANDed together. But for now this only works for size/stars. I want to clean this up in a follow-up PR but want to get this one out so I can update the code host connection on dotcom ASAP.

Limitations

  • Doesn't allow negative values
  • Doesn't allow decimal point in values
  • Doesn't support all units

Example config

exclude

Test plan

  • Manually tested this in my local instance
  • Lots of tests

@cla-bot cla-bot Bot added the cla-signed label Nov 16, 2023
@github-actions github-actions Bot added the team/source Tickets under the purview of Source - the one Source to graph it all label Nov 16, 2023
@mrnugget mrnugget force-pushed the mrn/exclude-size-stars branch from 8b6f084 to b7bb2f9 Compare November 16, 2023 14:23
@mrnugget mrnugget changed the base branch from main to mrn/persist-disk-usage November 16, 2023 14:23
@mrnugget mrnugget force-pushed the mrn/exclude-size-stars branch from b7bb2f9 to 2bf6403 Compare November 16, 2023 14:28
@mrnugget mrnugget force-pushed the mrn/persist-disk-usage branch from f3540b7 to 8a67c37 Compare November 16, 2023 14:31
@mrnugget mrnugget force-pushed the mrn/exclude-size-stars branch 2 times, most recently from 17a675e to 6b556cb Compare November 16, 2023 15:00
Base automatically changed from mrn/persist-disk-usage to main November 16, 2023 15:10
@mrnugget mrnugget force-pushed the mrn/exclude-size-stars branch from 6b556cb to e598ed5 Compare November 16, 2023 15:26
@mrnugget mrnugget marked this pull request as ready for review November 16, 2023 15:33
@mrnugget mrnugget requested a review from a team November 16, 2023 15:33
@sourcegraph-bot

sourcegraph-bot commented Nov 16, 2023

Copy link
Copy Markdown
Contributor

Codenotify: Notifying subscribers in CODENOTIFY files for diff fcc502b...ae67759.

Notify File(s)
@eseliger internal/extsvc/github/BUILD.bazel
internal/extsvc/github/common.go
internal/repos/BUILD.bazel
internal/repos/exclude.go
internal/repos/exclude_test.go
internal/repos/github.go
internal/repos/github_test.go
internal/repos/sources_test.go

@mrnugget mrnugget enabled auto-merge (squash) November 16, 2023 17:45
Comment thread internal/bytesize/bytesize.go
Comment thread internal/bytesize/bytesize.go
Comment thread schema/github.schema.json
Comment thread internal/repos/exclude.go
Comment on lines +161 to +164
size, err := bytesize.Parse(sizeMatch[2])
if err != nil {
return nil, err
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we special case 0 here, or make it return a pointer to signify absence of data, in case some repo doesn't have it or github messes up?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I forgot that, thanks! Well, if I understood it correctly: this does not parse the size of the repo, but what's in the rule. That is already covered by the regex.

But I changed the evaluation to ensure that repos with size == 0 will not get excluded, because that's a special case. And turns out that even empty repositories have 1 from what I've seen.

@mrnugget mrnugget merged commit 8f0c040 into main Nov 16, 2023
@mrnugget mrnugget deleted the mrn/exclude-size-stars branch November 16, 2023 18:03
Comment thread internal/repos/exclude.go
Comment on lines +173 to +187
const (
opLess operator = "<"
opLessOrEqual operator = "<="
opGreater operator = ">"
opGreaterOrEqual operator = ">="
)

func newOperator(input string) (operator, error) {
if input != "<" && input != "<=" && input != ">" && input != ">=" {
return "", errors.Newf("invalid operator %q", input)
}
return operator(input), nil
}

func (o operator) Eval(left, right int) bool {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, there is a great library https://github.com/antonmedv/expr I've used in Cloud tooling that allows you to write expressions like this with variables, custom functions etc, though it might make sense to roll your own to keep the functionality specific

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, yeah that's a full-on evaluation framework. I think I'll stick with my little fucntion here :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expanding the thinking a bit more: a full evaluation framework you can use a struct as a variable, e.g. Repo with properties about it, which would enable crazy things like "only include repos that start with foo and have label bar and have over 100 stars but less than 150 stars, and also has an odd number of characters in the repo name, and is >0MB and <100MB in size"... in case anyone ever asks for that 😁

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla-signed team/source Tickets under the purview of Source - the one Source to graph it all

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants