feat: Add scheduler option and introduce Round Robin scheduler#545
feat: Add scheduler option and introduce Round Robin scheduler#545hermanschaaf merged 15 commits intomainfrom
Conversation
⏱️ Benchmark resultsComparing with f6818cb
|
| for _, relation := range resource.Table.Relations { | ||
| relation := relation | ||
| if err := p.tableSems[depth].Acquire(ctx, 1); err != nil { | ||
| // This means context was cancelled |
There was a problem hiding this comment.
An opportunity to use goto (add a label in L196 where it wg.Wait()s and the fn ends) was missed here... (could also add a label to the outer loop and do break myLabel)
| p.resolveTableRoundRobin(ctx, table, cl, nil, resolvedResources, 1) | ||
| // Round Robin currently uses the DFS algorithm to resolve the tables, but this | ||
| // may change in the future. | ||
| p.resolveTableDfs(ctx, table, cl, nil, resolvedResources, 1) |
| @@ -0,0 +1,120 @@ | |||
| package source | |||
There was a problem hiding this comment.
Everything in this file was moved from scheduler_dfs.go with no changes
Yes, correct. I'm working on the assumption that the same rate limits will probably be applied to a table and its children (this might not always be true) |
candiduslynx
left a comment
There was a problem hiding this comment.
several comments to be considered+ a couple of nits
candiduslynx
left a comment
There was a problem hiding this comment.
only several more-or-less-nit comments
yevgenypats
left a comment
There was a problem hiding this comment.
Looks good but I think there is a bug in the interleave logic which skips some clients in some cases.
| } | ||
| c++ | ||
| if !addedNew { | ||
| break |
There was a problem hiding this comment.
Im not sure this will work. what if one table has less clients then the others? then it wont go to add others.
There was a problem hiding this comment.
The code was written to take that case into account, so I think it should work. The algorithm is roughly like this:
set counter to 1
add first client for all tables with >= 1 clients
set counter to 2
add second client for all tables with >= 2 clients
...
set counter to N
add Nth client for all tables with >= N clients
set counter to N+1
stop because no tables are found to have >= N+1 clients
I believe it handles all cases, but I've extracted this into a function now and added a test for it, so we can be sure.
🤖 I have created a release *beep* *boop* --- ## [1.19.0](v1.18.0...v1.19.0) (2023-01-05) ### Features * Add scheduler option and introduce Round Robin scheduler ([#545](#545)) ([d89a911](d89a911)) * Add unwrap option to transformations ([#573](#573)) ([a17ee4b](a17ee4b)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
This adds an optional Scheduler option to source configs. It still defaults to DFS, our current scheduler. However, for cases like GCP, alternative scheduling algorithms may be beneficial for large deployments (as explained in cloudquery/cloudquery#5389). For this reason, this change also introduces a new (currently opt-in only) scheduler called
round-robin.A new benchmark scenario has been added to showcase when round robin really shines. In this scenario, strict global rate limits mean that waiting for each table to finish all of its clients is not the most efficient schedule. Instead, a schedule that interleaves the clients for all tables (as round robin does) performs near optimally, and the benchmarks show this difference:
In the
BenchmarkTablesWithRateLimitingcase, Round Robin completes in just over 1s, DFS completes in 35s.Notes for Reviewers
targetResources/sfrom the benchmarks in this PR. They were useful once, but with rate limiting scenarios it's becoming too difficult to maintain accuracy.