Skip to content

feat: Resolve table relations in parallel#416

Merged
kodiakhq[bot] merged 2 commits intomainfrom
fetch-child-tables-concurrently
Nov 21, 2022
Merged

feat: Resolve table relations in parallel#416
kodiakhq[bot] merged 2 commits intomainfrom
fetch-child-tables-concurrently

Conversation

@hermanschaaf
Copy link
Copy Markdown
Contributor

This changes the resolver for table relations to work concurrently, like parent tables. To do so safely and without the risk of deadlock, we instantiate one semaphore per depth level. (The size of the semaphore is decreased logarithmically for each depth.)

This change will only improve performance for tables with child relations.

To compare the performance before and after, I used the benchmarks in #415

Before

goos: darwin
goarch: arm64
pkg: github.com/cloudquery/plugin-sdk/plugins
BenchmarkDefaultConcurrency-8                     	       1	     11957 resources/s	     12626 targetResources/s	73597280 B/op	 1032425 allocs/op
BenchmarkTablesWithChildrenDefaultConcurrency-8   	       1	       545.9 resources/s	     40606 targetResources/s	461622584 B/op	 6690105 allocs/op
PASS
ok  	github.com/cloudquery/plugin-sdk/plugins	150.596s

After

goos: darwin
goarch: arm64
pkg: github.com/cloudquery/plugin-sdk/plugins
BenchmarkDefaultConcurrency-8                     	       1	     11373 resources/s	     12626 targetResources/s	73692608 B/op	 1033614 allocs/op
BenchmarkTablesWithChildrenDefaultConcurrency-8   	       1	     30162 resources/s	     40606 targetResources/s	464285672 B/op	 6697508 allocs/op
PASS
ok  	github.com/cloudquery/plugin-sdk/plugins	6.241s

Analysis

This change focuses on the BenchmarkTablesWithChildrenDefaultConcurrency case. The change improves resources/s from 545.9 to 30162, an improvement of about 55x. Memory and allocs are mostly the same.

The small downward change in resources/s in the BenchmarkDefaultConcurrency case is likely due to the few additional allocs needed.

Closes #358

@yevgenypats
Copy link
Copy Markdown
Contributor

@hermanschaaf Really awesome tests! Im still reviewing and playing with it as quite a lot to intake but posting some thoughts in the meanwhile. Maybe add golang.org/x/perf/cmd/benchstat to also compare those in CI and catch if anything change the performance significantly as otherwise we would forget to run it

@yevgenypats
Copy link
Copy Markdown
Contributor

some tests! Im still reviewing and playing with it as quite a lot to intake but posting some thoughts in the meanwhile. Maybe add golang.org/x/perf/cmd/benchstat to also compare those in CI and catch if anything change the performance significantly as otherwise we would forget to run it

comment on the wrong thread :)

Copy link
Copy Markdown
Contributor

@yevgenypats yevgenypats left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@kodiakhq kodiakhq bot merged commit aadbde9 into main Nov 21, 2022
@kodiakhq kodiakhq bot deleted the fetch-child-tables-concurrently branch November 21, 2022 15:47
hermanschaaf added a commit that referenced this pull request Nov 22, 2022
kodiakhq bot pushed a commit that referenced this pull request Nov 22, 2022
Reverts #416

Nothing wrong with it as far as we know, but we want to release it separately from #421
kodiakhq bot pushed a commit that referenced this pull request Nov 23, 2022
This changes the resolver for table relations to work concurrently, like parent tables. To do so safely and without the risk of deadlock, we instantiate one semaphore per depth level. (The size of the semaphore is decreased logarithmically for each depth.)

This change will only improve performance for tables with child relations.

To compare the performance before and after, I used the benchmarks in #415

## Before
```
goos: darwin
goarch: arm64
pkg: github.com/cloudquery/plugin-sdk/plugins
BenchmarkDefaultConcurrency-8                     	       1	     11957 resources/s	     12626 targetResources/s	73597280 B/op	 1032425 allocs/op
BenchmarkTablesWithChildrenDefaultConcurrency-8   	       1	       545.9 resources/s	     40606 targetResources/s	461622584 B/op	 6690105 allocs/op
PASS
ok  	github.com/cloudquery/plugin-sdk/plugins	150.596s
```

## After
```
goos: darwin
goarch: arm64
pkg: github.com/cloudquery/plugin-sdk/plugins
BenchmarkDefaultConcurrency-8                     	       1	     11373 resources/s	     12626 targetResources/s	73692608 B/op	 1033614 allocs/op
BenchmarkTablesWithChildrenDefaultConcurrency-8   	       1	     30162 resources/s	     40606 targetResources/s	464285672 B/op	 6697508 allocs/op
PASS
ok  	github.com/cloudquery/plugin-sdk/plugins	6.241s
```

## Analysis

This change focuses on the `BenchmarkTablesWithChildrenDefaultConcurrency` case. The change improves `resources/s` from 545.9 to 30162, an improvement of about 55x. Memory and allocs are mostly the same.

The small downward change in `resources/s` in the `BenchmarkDefaultConcurrency` case is likely due to the few additional allocs needed. 

Closes #358
Copy of #416 after it was reverted to schedule its release for another time.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Resolve relation tables in parallel

2 participants