feat: incomplete datapoints can now resolve the affected repositories #62756
The license check is a known issue: https://sourcegraph.slack.com/archives/C04MYFW01NV/p1715937672950199
By default, incomplete datapoints are aggregated across all repositories.
Setting this to false will allow resolving the repository.
"""
aggregateRepositories: Boolean = true
Q: now that repositories is an array, do we still need this parameter? If a client doesn't care about the repository list, they can just exclude that from the list of fields in their query. Excluding this also removes the (documented but still maybe surprising) dependency between the repositories field and this argument.
Yes! Thank you for the reminder. I was able to clean it up, and things seem to work as expected. Since I can't find any problematic calls to the store method, it should be good as long as CI passes.
if repoId.Valid {
	mappedRepoIds[i] = int(repoId.Int64)
}
Q: the DB schema says repo_id is nullable, but that's kinda surprising to me. Do you understand why that is?
I found https://github.com/sourcegraph/sourcegraph/pull/45282 which inserts null here and mentions global queries. The repoId and repoName should be available though, based on the types that this incomplete insert runs on. I haven't found any places where the repoId and repoName on RecordSeriesPointArgs are not set. Maybe it's to reduce the number of inserts for global queries?
In the backend documentation it sounds like there should also be repo information no matter if it's global or not.
Good find!
Not blocking, just thinking out loud to try to understand this better. What is a global code insight? It kinda makes sense that a global job wouldn't have a repo ID because it's running against everything, but when would we do that? Maybe there's a special case for an insight that only runs against public repositories, so we know that all users can view all the data, and don't need to keep track of which repo the points are for?
This PR updates the documentation to explain how users can use a new GraphQL field introduced with https://github.com/sourcegraph/sourcegraph/pull/62756 to identify repositories that cause incomplete datapoints. For https://github.com/sourcegraph/sourcegraph/issues/62295
Closes https://github.com/sourcegraph/sourcegraph/issues/62578
For https://github.com/sourcegraph/sourcegraph/issues/62295
Previously with our GraphQL API, you couldn't figure out which repositories caused incomplete datapoints. With this change you can now provide an argument to `incompleteDatapoints` to not aggregate points for repositories, and then resolve the repositories for each datapoint.

This PR is needed to help debug incomplete datapoints in Code Insights. When customers create Code Insights for a large number of repositories, it's hard to understand how big the impact of incomplete datapoints is, and which repositories those issues are coming from. If you don't have access to the logs it's basically impossible to isolate problematic repositories.
Queries work as before when you don't add the `aggregateRepositories=false` parameter or resolve the repository.

When you add the `aggregateRepositories=false` parameter and resolve the repository, you get individual datapoints for each repository that had a problem.

If you set `aggregateRepositories=true` and attempt to resolve the repository, it will be null.

## Test plan