Skip to content

feat: Add scheduler option and introduce Round Robin scheduler#545

Merged
hermanschaaf merged 15 commits intomainfrom
round-robin
Jan 4, 2023
Merged

feat: Add scheduler option and introduce Round Robin scheduler#545
hermanschaaf merged 15 commits intomainfrom
round-robin

Conversation

@hermanschaaf
Copy link
Copy Markdown
Contributor

This adds an optional Scheduler option to source configs. It still defaults to DFS, our current scheduler. However, for cases like GCP, alternative scheduling algorithms may be beneficial for large deployments (as explained in cloudquery/cloudquery#5389). For this reason, this change also introduces a new (currently opt-in only) scheduler called round-robin.

A new benchmark scenario has been added to showcase when round robin really shines. In this scenario, strict global rate limits mean that waiting for each table to finish all of its clients is not the most efficient schedule. Instead, a schedule that interleaves the clients for all tables (as round robin does) performs near optimally, and the benchmarks show this difference:

BenchmarkDefaultConcurrencyDFS
BenchmarkDefaultConcurrencyDFS-8              	       1	     10835 resources/s	73788592 B/op	 1057929 allocs/op
BenchmarkDefaultConcurrencyRoundRobin
BenchmarkDefaultConcurrencyRoundRobin-8       	       2	     11276 resources/s	72121956 B/op	 1054439 allocs/op
BenchmarkTablesWithChildrenDFS
BenchmarkTablesWithChildrenDFS-8              	       1	     31900 resources/s	465282072 B/op	 6859524 allocs/op
BenchmarkTablesWithChildrenRoundRobin
BenchmarkTablesWithChildrenRoundRobin-8       	       1	     33276 resources/s	463475144 B/op	 6856392 allocs/op
BenchmarkTablesWithRateLimitingDFS
BenchmarkTablesWithRateLimitingDFS-8          	       1	        28.30 resources/s	 9098456 B/op	  132129 allocs/op
BenchmarkTablesWithRateLimitingRoundRobin
BenchmarkTablesWithRateLimitingRoundRobin-8   	       1	       821.5 resources/s	 8213856 B/op	  112748 allocs/op
PASS

In the BenchmarkTablesWithRateLimiting case, Round Robin completes in just over 1s, DFS completes in 35s.

Notes for Reviewers

  • No functional changes to the DFS scheduler were made in this PR, but some functions were moved to a different file so they can be re-used by Round Robin.
  • I'm also removing targetResources/s from the benchmarks in this PR. They were useful once, but with rate limiting scenarios it's becoming too difficult to maintain accuracy.

@github-actions
Copy link
Copy Markdown

github-actions bot commented Dec 30, 2022

⏱️ Benchmark results

Comparing with f6818cb

  • DefaultConcurrencyDFS-2 resources/s: 11,474
  • DefaultConcurrencyRoundRobin-2 resources/s: 12,402
  • Glob-2 ns/op: 158.2 ⬇️ 22.95% decrease vs. f6818cb
  • TablesWithChildrenDFS-2 resources/s: 30,937
  • TablesWithChildrenRoundRobin-2 resources/s: 29,841
  • TablesWithRateLimitingDFS-2 resources/s: 28.4
  • TablesWithRateLimitingRoundRobin-2 resources/s: 831.9
  • BufferedScanner-2 ns/op: 10.11 ⬇️ 18.89% decrease vs. f6818cb
  • LogReader-2 ns/op: 30.54 ⬇️ 24.10% decrease vs. f6818cb

for _, relation := range resource.Table.Relations {
relation := relation
if err := p.tableSems[depth].Acquire(ctx, 1); err != nil {
// This means context was cancelled
Copy link
Copy Markdown
Member

@disq disq Dec 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An opportunity to use goto (add a label in L196 where it wg.Wait()s and the fn ends) was missed here... (could also add a label to the outer loop and do break myLabel)

p.resolveTableRoundRobin(ctx, table, cl, nil, resolvedResources, 1)
// Round Robin currently uses the DFS algorithm to resolve the tables, but this
// may change in the future.
p.resolveTableDfs(ctx, table, cl, nil, resolvedResources, 1)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@@ -0,0 +1,120 @@
package source
Copy link
Copy Markdown
Contributor Author

@hermanschaaf hermanschaaf Dec 30, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything in this file was moved from scheduler_dfs.go with no changes

Copy link
Copy Markdown
Member

@erezrokah erezrokah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great 🚀 To confirm, this only does round robin at the top level correct?
(Not that we need to do it for relations, just wondering)

@hermanschaaf
Copy link
Copy Markdown
Contributor Author

hermanschaaf commented Jan 2, 2023

@erezrokah To confirm, this only does round robin at the top level correct?

Yes, correct. I'm working on the assumption that the same rate limits will probably be applied to a table and its children (this might not always be true)

Copy link
Copy Markdown
Contributor

@candiduslynx candiduslynx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

several comments to be considered+ a couple of nits

Copy link
Copy Markdown
Contributor

@candiduslynx candiduslynx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only several more-or-less-nit comments

Copy link
Copy Markdown
Contributor

@yevgenypats yevgenypats left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good but I think there is a bug in the interleave logic which skips some clients in some cases.

}
c++
if !addedNew {
break
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure this will work. what if one table has less clients then the others? then it wont go to add others.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code was written to take that case into account, so I think it should work. The algorithm is roughly like this:

set counter to 1
add first client for all tables with >= 1 clients
set counter to 2
add second client for all tables with >= 2 clients
...
set counter to N
add Nth client for all tables with >= N clients
set counter to N+1
stop because no tables are found to have >= N+1 clients

I believe it handles all cases, but I've extracted this into a function now and added a test for it, so we can be sure.

@hermanschaaf hermanschaaf merged commit d89a911 into main Jan 4, 2023
@hermanschaaf hermanschaaf deleted the round-robin branch January 4, 2023 15:10
kodiakhq bot pushed a commit that referenced this pull request Jan 5, 2023
🤖 I have created a release *beep* *boop*
---


## [1.19.0](v1.18.0...v1.19.0) (2023-01-05)


### Features

* Add scheduler option and introduce Round Robin scheduler ([#545](#545)) ([d89a911](d89a911))
* Add unwrap option to transformations ([#573](#573)) ([a17ee4b](a17ee4b))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants