Code monitors: fix a bunch of stuff#57546
Conversation
This removes the feature flag `cc-repo-aware-monitors` that has been enabled by default for more than a year.
| userID, orgID, err := graphqlbackend.UnmarshalNamespaceToIDs(args.Monitor.Namespace) | ||
| if err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
Just moved this up and out of the transaction to make it more obvious what needed to be in the transaction
| return err | ||
| } | ||
|
|
||
| if featureflag.FromContext(ctx).GetBoolOr("cc-repo-aware-monitors", true) { |
There was a problem hiding this comment.
This feature flag has been enabled by default for over a year
| // On subsequent runs, this allows us to treat all new repos or sets of args as something new that should | ||
| // be searched from the beginning. | ||
| func Snapshot(ctx context.Context, logger log.Logger, db database.DB, query string, monitorID int64) error { | ||
| func Snapshot(ctx context.Context, logger log.Logger, db database.DB, query string) (map[api.RepoID][]string, error) { |
There was a problem hiding this comment.
This is the biggest change. Rather than Snapshot saving the results of the snapshot in the database, it returns the set of commitIDs resolved for each searched repo. This makes it possible to run the "UpsertLastSearched" in a transaction even though Snapshot itself can't be run in a transaction itself because search jobs will use the transaction concurrently. It also, conveniently, makes it easier to test.
| if db.Handle().InTransaction() { | ||
| return nil, errors.New("Snapshot cannot be run in a transaction") | ||
| } |
There was a problem hiding this comment.
Rather than return a super cryptic error when the transaction is used concurrently (e.g. bad connection * conn busy), just fail up front if someone tries to run this in a transaction.
| // HACK(camdencheek): limit the concurrency of the commit search job | ||
| // because the db passed into this function might actually be a transaction | ||
| // and transactions cannot be used concurrently. | ||
| planJob = limitConcurrency(planJob) |
There was a problem hiding this comment.
Now that we explicitly don't support using a transaction here, it's fine to run the jobs for multiple repos concurrently so we can get rid of this hack.
| func gqlURL(queryName string) (string, error) { | ||
| u, err := url.Parse(internalapi.Client.URL) | ||
| if err != nil { | ||
| return "", err | ||
| } | ||
| u.Path = "/.internal/graphql" | ||
| u.RawQuery = queryName | ||
| return u.String(), nil |
| // NOTE(camdencheek): we want to always save the "last searched" commits | ||
| // because if we stream results, the user will get a notification for them | ||
| // whether or not there was an error and forcing a re-search will cause the | ||
| // user to get repeated notifications for the same commits. This makes code | ||
| // monitors look very broken, and should be avoided. |
There was a problem hiding this comment.
This is the primary tradeoff. Previously, we would retry searches that failed for whatever reason. This just skips commits that failed, ensuring we always make forward progress and don't notify users multiple times about the same commit. This means it's possible to miss some commits, but I think that's a fair trade for not spamming users and looking super broken, especially since many types of errors are not temporary.
A real solution here would probably look like retrying on error, but since we stream the results upwards, that's a lot more work than I'm willing to put in right now.
| it := repos.Iterator(ctx, j.RepoOpts) | ||
|
|
||
| p := pool.New().WithContext(ctx).WithMaxGoroutines(j.Concurrency).WithFirstError() | ||
| p := pool.New().WithContext(ctx).WithMaxGoroutines(4).WithFirstError() |
There was a problem hiding this comment.
now that concurrency doesn't need to be configurable, just inline the default
There was a problem hiding this comment.
For my own learning, is there a reason behind the number 4 for the concurrency value?
f57d7ea to
eb29c01
Compare
eb29c01 to
e2554ef
Compare
|
Codenotify: Notifying subscribers in CODENOTIFY files for diff f17dd8b...e2554ef.
|
- Fixes an issue where creating a monitor will fail due to concurrent transaction usage - Fixes an issue where a timeout will cause a monitor to continually get re-run - Cleans up some old feature flags - Cleans up some dead code (cherry picked from commit afd3156)
Code monitors: fix a bunch of stuff (#57546) - Fixes an issue where creating a monitor will fail due to concurrent transaction usage - Fixes an issue where a timeout will cause a monitor to continually get re-run - Cleans up some old feature flags - Cleans up some dead code (cherry picked from commit afd3156) Co-authored-by: Camden Cheek <camden@ccheek.com>
I apologize for the multi-focused PR, but a few of these fixes bled into each other.
Comments inline.
Test plan
Manually tested that monitors still work. Existing tests cover standard cases, and I updated the test that checks that a failure logs a message.