feat: Add Metrics and improve scheduler with DFS #318
Conversation
erezrokah
left a comment
This looks good. I added a few comments, but I don't think any of them are blocking.
	EndTime time.Time
}

func (s *TableClientMetrics) Equal(other *TableClientMetrics) bool {
Is there a reason not to use https://pkg.go.dev/gotest.tools/assert#DeepEqual in tests? Or https://github.com/google/go-cmp?
If we need the Equal signature we can wrap a call to go-cmp
Same for the other Equal.
If the reason is to skip StartTime and EndTime looks like we can do it via google/go-cmp#143 (comment)
DeepEqual is super slow and can have unpredictable results depending on what you are trying to achieve, so we want to define `Equal` for every type. DeepEqual is usually used for testing purposes when you don't control, or don't have access to, some external struct.
My concern is that someone adds a field to the struct and then forgets to add it to the Equal.
i.e. We'll have to maintain this implementation. Not sure the performance hit is something that can really slow us down.
Where is the Equal function used?
We use it in tests, but what is the issue with maintaining this, versus using a library that is meant for comparing structs that are not under the author's control?
If you look at SourceMetrics, the Equal function is not that simple (it compares the map), but it ensures it will work, and I can also add tests to make sure we don't forget to update it.
If it's only used in the tests I would say we don't need it as a part of the struct and just do the equality in the test (via library or a helper function).
Again this is not blocking for the PR, just seems there's already existing code we can use to compare structs in tests, so we don't need to re-implement it
+1 to what @erezrokah said; not a blocker, but I'd vote for go-cmp in tests rather than maintain equality operators if we don't need them in our (non-test) code.
Co-authored-by: Erez Rokah <erezrokah@users.noreply.github.com>
🤖 I have created a release *beep* *boop*

## [0.13.15](v0.13.14...v0.13.15) (2022-10-30)

### Features

* Add Metrics and improve scheduler with DFS ([#318](#318)) ([2d7a83b](2d7a83b))

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Should be merged after this SDK PR cloudquery/plugin-sdk#318
hermanschaaf
left a comment
I read through, and it all looks good to me; the concurrency also looks good, though I'd like to tweak it a bit further once we have benchmarks in place in the coming weeks 👍
	defer wg.Done()
	defer p.tableSem.Release(1)
nit: I think the ordering of the defers needs to be swapped here to match the order of the Acquire and wg.Add operations (since defers get executed in reverse order).
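A small self-contained illustration of the LIFO point (the function and step names are illustrative, not the real plugin code): with the defers swapped, wg.Done runs before the semaphore release, unwinding in the reverse of the acquisition order.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveTable records the order of acquisition and cleanup steps.
// Defers run LIFO, so to unwind in the reverse of the acquisition
// order (Acquire, then wg.Add), the Release defer must be written
// first so that wg.Done executes first.
func resolveTable() (order []string) {
	order = append(order, "tableSem.Acquire", "wg.Add")
	defer func() { order = append(order, "tableSem.Release") }() // runs last
	defer func() { order = append(order, "wg.Done") }()          // runs first
	order = append(order, "resolve table")
	return
}

func main() {
	fmt.Println(strings.Join(resolveTable(), " -> "))
	// tableSem.Acquire -> wg.Add -> resolve table -> wg.Done -> tableSem.Release
}
```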
	p.metrics.initWithClients(table, clients)
	for _, client := range clients {
		client := client
		if err := p.tableSem.Acquire(ctx, 1); err != nil {
Just a passing comment, but since the semaphore is acquired inside the clients loop, technically the tableSem is more like a tableClientSem - so if you have multiple accounts and set table concurrency to 1, only one account will be resolved at a time. Not necessarily a bad thing, but maybe just worth keeping in mind / documenting.
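A stdlib-only sketch of the behavior being described, using a buffered channel in place of the weighted semaphore and hypothetical account names: because the acquire happens inside the clients loop, capacity 1 serializes the clients of a single table, which is why the name tableClientSem fits better.

```go
package main

import (
	"fmt"
	"sync"
)

// resolveTableClients sketches the loop under review. The semaphore is
// acquired per (table, client) pair rather than per table: with
// concurrency 1, the clients below are resolved strictly one at a time.
func resolveTableClients(clients []string, concurrency int) []string {
	sem := make(chan struct{}, concurrency) // stand-in for the weighted semaphore
	var wg sync.WaitGroup
	var mu sync.Mutex
	var resolved []string
	for _, client := range clients {
		client := client  // capture loop variable (pre-Go 1.22 idiom)
		sem <- struct{}{} // acquired per client, not per table
		wg.Add(1)
		go func() {
			defer func() { <-sem }() // release runs last (LIFO)
			defer wg.Done()
			mu.Lock()
			resolved = append(resolved, client)
			mu.Unlock()
		}()
	}
	wg.Wait()
	return resolved
}

func main() {
	// Hypothetical accounts; the real code iterates multiplexed clients.
	got := resolveTableClients([]string{"account-a", "account-b", "account-c"}, 1)
	fmt.Println(len(got)) // all three resolved, just serially
}
```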
This adds a type system to the CloudQuery SDK. This is mandatory to support multiple destinations. It also uncovered quite a few bugs where we were sending random data over the wire without any validation, apart from maybe when we hit the database and then failed batches altogether. For the CloudQuery type system I relied heavily on https://github.com/jackc/pgtype and kept the license and copyright in its own package `cqtype`. This is a continuation of PR #298, which I split into this one and #318.

Co-authored-by: Erez Rokah <erezrokah@users.noreply.github.com>
Co-authored-by: Herman Schaaf <hermanschaaf@gmail.com>
This is instead of #3176.

SDK PRs: cloudquery/plugin-sdk#318, cloudquery/plugin-sdk#320

Previous related CloudQuery PRs: #3286

Co-authored-by: Herman Schaaf <hermanschaaf@gmail.com>
Co-authored-by: Erez Rokah <erezrokah@users.noreply.github.com>
Trying to split PR #298 into smaller bits.
Metrics:
Scheduler:
…`schema` to `plugins`, so I added that in the same PR. Currently the user will specify only one variable, `concurrency`, and the scheduler will decide how to split it between levels. For simplicity I kept it the same way as before, with concurrency only for the first level. Concurrent DFS will make sure there are no deadlocks and that memory is always kept at O(goroutines) and O(h) (where h is the tree height).
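The scheduling shape described above can be sketched with the stdlib only (a buffered channel stands in for the semaphore; table names and helper functions are hypothetical, not the SDK's API): top-level tables fan out up to `concurrency` goroutines, and each subtree is walked depth-first on its own goroutine, so live state stays O(goroutines) across branches and O(h) down any one branch.

```go
package main

import (
	"fmt"
	"sync"
)

// Table is a hypothetical stand-in for a schema table with child relations.
type Table struct {
	Name      string
	Relations []*Table
}

// resolve walks one table's subtree depth-first on a single goroutine,
// so per-branch memory is O(h), where h is the tree height.
func resolve(t *Table, out chan<- string) {
	out <- t.Name
	for _, rel := range t.Relations {
		resolve(rel, out) // DFS: finish the subtree before the next sibling
	}
}

// resolveAll fans out top-level tables, bounded by the single
// user-facing concurrency knob, so live state stays O(goroutines).
func resolveAll(tables []*Table, concurrency int) []string {
	sem := make(chan struct{}, concurrency) // stand-in for the real semaphore
	out := make(chan string, 64)            // large enough for this sketch
	var wg sync.WaitGroup
	for _, t := range tables {
		t := t
		sem <- struct{}{}
		wg.Add(1)
		go func() {
			defer func() { <-sem }()
			defer wg.Done()
			resolve(t, out)
		}()
	}
	wg.Wait()
	close(out)
	var names []string
	for name := range out {
		names = append(names, name)
	}
	return names
}

func main() {
	tables := []*Table{
		{Name: "parent_a", Relations: []*Table{{Name: "child_a1"}}},
		{Name: "parent_b"},
	}
	fmt.Println(len(resolveAll(tables, 2))) // 3 tables resolved
}
```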