Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

fix(code hosts): Use more deterministic API endpoints for GitHub code host connections#63445

Merged
pjlast merged 6 commits into
mainfrom
petrilast/src-423-the-github-public-repositoryquery-option-needs-to-be
Jul 2, 2024
Merged

fix(code hosts): Use more deterministic API endpoints for GitHub code host connections#63445
pjlast merged 6 commits into
mainfrom
petrilast/src-423-the-github-public-repositoryquery-option-needs-to-be

Conversation

@pjlast

@pjlast pjlast commented Jun 24, 2024

Copy link
Copy Markdown
Contributor

Sourcegraph currently uses this GitHub API endpoint to fetch public repositories on an instance. However, this endpoint is a lot more limited than the rest of the repository API endpoints. For example, it doesn't return the archive status of a repo, its visibility, or any tags the repo might have.

The aim of this PR is to make additional GraphQL API requests to fetch additional information for the repositories we fetch via this endpoint.

Other approaches we have tried is to use the Search API to fetch public repositories, but pagination on the search API proves to be unreliable.

Additionally, this PR also changes the "internal" repositoryQuery to use a more reliable API instead of the search API

Test plan

Updated recorded tests

Changelog

  • The "internal" repositoryQuery for GitHub code host connections now use a more deterministic API that's less susceptible to missing repositories
  • The "public" repositoryQuery for GitHub code host connections now make additional requests to fetch missing repository details, like topics. This fixes an issue where repos added by the "public" repositoryQuery would have missing repo details

@cla-bot cla-bot Bot added the cla-signed label Jun 24, 2024
@github-actions github-actions Bot added team/product-platform team/source Tickets under the purview of Source - the one Source to graph it all labels Jun 24, 2024
@pjlast pjlast changed the title List internal repos by paginating through orgs Use GitHub GraphQL API to fetch missing info for public repos Jun 25, 2024
@pjlast pjlast marked this pull request as ready for review July 1, 2024 10:06
@pjlast pjlast requested a review from a team July 1, 2024 10:06
@pjlast pjlast changed the title Use GitHub GraphQL API to fetch missing info for public repos fix(code hosts): Use more deterministic API endpoints for GitHub code host connections Jul 1, 2024

@eseliger eseliger left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

GitHub can't render the diffs for the recordings, I trust that the changes there make sense

Comment thread internal/repos/github.go
defer cancel()

go func() {
s.listPublicArchivedRepos(archivedReposCtx, archivedReposChan)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this function should be unused now and can be removed

Comment thread internal/repos/github.go
Comment on lines +753 to +754
// The ListPublicRepositories endpoint returns incomplete information,
// so we make additional calls to get the full information of each repo.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish the method above would make that clearer by returning a separate type that doesn't have the fields we use - that way it would be impossible to forget to do this extra step anywhere.

but this isn't blocking.

Comment on lines +177 to +178
if countArchived != 2 {
t.Errorf("unexpected archived repo count, wanted: 2, but got: %d", countArchived)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did this change?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our ghe.sgdev.org instance changed since the previous recording

@pjlast pjlast merged commit 9b31f4e into main Jul 2, 2024
@pjlast pjlast deleted the petrilast/src-423-the-github-public-repositoryquery-option-needs-to-be branch July 2, 2024 14:25
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

cla-signed team/product-platform team/source Tickets under the purview of Source - the one Source to graph it all

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants