Skip to content
This repository was archived by the owner on Sep 30, 2024. It is now read-only.

feat: incomplete datapoints can now resolve the affected repositories#62756

Merged
bahrmichael merged 13 commits into
mainfrom
bahrmichael/62578
May 27, 2024
Merged

feat: incomplete datapoints can now resolve the affected repositories#62756
bahrmichael merged 13 commits into
mainfrom
bahrmichael/62578

Conversation

@bahrmichael

@bahrmichael bahrmichael commented May 17, 2024

Copy link
Copy Markdown
Contributor

Closes https://github.com/sourcegraph/sourcegraph/issues/62578

For https://github.com/sourcegraph/sourcegraph/issues/62295

Previously with our GraphQL api, you couldn't figure out which repositories caused incomplete datapoints. With this change you can now provide an argument to the incompleteDatapoints to not aggregate points for repositories, and then resolve the repositories for each datapoint.

This PR is needed to help debug incomplete datapoints in Code Insights. When customers create Code Insights for a large number of repositories, it's hard to understand how big the impact of incomplete datapoints is, and which repositories those issues are coming from. If you don't have access to the logs it's basically impossible to isolate problematic repositories.

Queries work as before, when you don't add the aggregateRepositories=false parameter or resolve the repository.
Screenshot 2024-05-24 at 12 22 58

When you add the aggregateRepositories=false parameter and resolve the repository, you get individual datapoints for each repository that had a problem.

Screenshot 2024-05-24 at 12 22 34

If you set aggregateRepositories=true and attempt to resolve the repository, it will be null.
Screenshot 2024-05-24 at 12 22 43

Test plan

  • Existing code paths are covered by CI
  • I will add more tests if this approach is accepted

@bahrmichael

Copy link
Copy Markdown
Contributor Author

The license check is a known issue: https://sourcegraph.slack.com/archives/C04MYFW01NV/p1715937672950199

Comment thread cmd/frontend/graphqlbackend/insights.graphql Outdated
Comment thread internal/insights/store/store.go Outdated
By default, incomplete datapoints are aggregated across all repositories.
Setting this to false will allow resolving the repository.
"""
aggregateRepositories: Boolean = true

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: now that repositories is an array, do we still need this parameter? If a client doesn't care about the repository list, they can just exclude that from the list of fields in their query. Excluding this also removes the (documented but still maybe surprising) dependency between the repositories field and this argument

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! Thank you for the reminder. I was able to clean it up, and things seem to work as expected. Since I can't find any problematic to the store method, it should be good as long as CI passes.

Comment on lines +880 to +882
if repoId.Valid {
mappedRepoIds[i] = int(repoId.Int64)
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: the DB schema says repo_id is nullable, but that's kinda surprising to me. Do you understand why that is?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found https://github.com/sourcegraph/sourcegraph/pull/45282 which inserts null here and mentions global queries. The repoId and repoName should be available though, based on the types that this incomplete insert runs on. I haven't found any places where the repoId and repoName on RecordSeriesPointArgs are not set. Maybe it's to reduce the number of inserts for global queries?

In the backend documentation it sounds like there should also be repo information no matter if it's global or not.

https://github.com/sourcegraph/sourcegraph/blob/d4a6b274037c67e3b76250b2c67f7c80df34da51/doc/dev/background-information/insights/backend.md?plain=1#L144-L145

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find!

Not blocking, just thinking out loud to try to understand this better. What is a global code insight? It kinda makes sense that a global job wouldn't have a repo ID because it's running against everything, but when would we do that? Maybe there's a special case for an insight that only runs against public repositories, so we know that all users can view all the data, and don't need to keep track of which repo the points are for?

@bahrmichael bahrmichael enabled auto-merge (squash) May 27, 2024 10:27
@bahrmichael bahrmichael changed the title feat: incomplete datapoints can now resolve the affected repository feat: incomplete datapoints can now resolve the affected repositories May 27, 2024
@bahrmichael bahrmichael merged commit ff9ef6f into main May 27, 2024
@bahrmichael bahrmichael deleted the bahrmichael/62578 branch May 27, 2024 10:32
bahrmichael referenced this pull request in sourcegraph/docs May 31, 2024
This PR updates the documentation to explain how users can use a new
GraphQL field introduced with
https://github.com/sourcegraph/sourcegraph/pull/62756 to identify
repositories that cause incomplete datapoints.

For https://github.com/sourcegraph/sourcegraph/issues/62295

## Pull Request approval

Although pull request approval is not enforced for this repository in
order to reduce friction, merging without a review will generate a
ticket for the docs team to review your changes. So if possible, have
your pull request approved before merging.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Code insights: Extend GraphQL schema to expose repositories that cause incomplete datapoints

2 participants