Skip to content

feat: Metrics and Type system#298

Closed
yevgenypats wants to merge 35 commits intomainfrom
feat/stats
Closed

feat: Metrics and Type system#298
yevgenypats wants to merge 35 commits intomainfrom
feat/stats

Conversation

@yevgenypats
Copy link
Copy Markdown
Contributor

@yevgenypats yevgenypats commented Oct 16, 2022

While fixing the PostgreSQL performance bugs found quite a few issues and rabbit holes we need to take care of (This is the quick fix around order of columns which should be good for now - cloudquery/cloudquery#2887).

Few change are required to write protocol:

  • First message should contain tables schema (just like in migration) as this can be useful for writes to improve performance and in general it is not expensive as every message after that can contain only the values (without columns) - this should also save some bandwidth and solve the ordering bug out of the box.
  • we have an initial start of stats, metrics but this started to go clunky so I already implement a Stats/Metrics API which clients can now query on their schedule and also we can push those in the future to various places like Prometheus and so on.
  • While working on stats, I saw that the Resolver function is in the incorrect place so I had to move it to plugins as otherwise things like stats/metrics wont be possible to implement.

Because those are major changes in the protocol including incompatible protobuf changes, I changed the application level to 2.

Also found number of concurrency issues and graceful shutdown not working that is fixed in this PR.

Another big missing piece I found that is in this PR:

  • Our typing system: Is quite broken: There is no standard interface for types, verification is all around the place. In general it should look something like this - https://github.com/jackc/pgx/tree/master/pgtype but for cqtypes. I ended up reusing pgtype for our use-cases as this library is battle tested and kept the license and the copyright.

Also, Implemented concurrent dfs scheduler to make sure memory is limited by O(h) (h is height of the tree/tables).

@github-actions github-actions bot added the feat label Oct 16, 2022
@yevgenypats yevgenypats marked this pull request as draft October 16, 2022 20:58
@github-actions github-actions bot added feat and removed feat labels Oct 16, 2022
yevgenypats added a commit to cloudquery/cloudquery that referenced this pull request Oct 22, 2022
yevgenypats added a commit to cloudquery/cloudquery that referenced this pull request Oct 22, 2022
@yevgenypats yevgenypats mentioned this pull request Oct 27, 2022
5 tasks
@yevgenypats yevgenypats marked this pull request as ready for review October 28, 2022 06:53
@github-actions github-actions bot added feat and removed feat labels Oct 28, 2022
@yevgenypats yevgenypats requested a review from disq October 28, 2022 09:13
@yevgenypats yevgenypats changed the title feat: Add stats feat: Metrics and Type system Oct 28, 2022
@github-actions github-actions bot added feat and removed feat labels Oct 28, 2022
yevgenypats added a commit that referenced this pull request Oct 30, 2022
Trying to split this PR:
#298 into smaller bits.

Metrics:

- This introduce Metrics as stateful struct as it should be (instead of
prop drilling) + GetMetrics Protobuf API

Scheduler:

- I believed I've hit a bug/deadlock in our current scheduler and also
for metrics to work I needed to move the scheduler from `schema` to
`plugins` so I added that in the same PR.

Currently the user will specify only one variable `concurrency` and the
scheduler will decide on how to split it between levels. For simplicity
I kept it the same way as before with concurrency for only the first
level.

Concurrent DFS will make sure there are no deadlocks and memory is
always kept at O(goroutines) and o(h) (where h is height).

Co-authored-by: Erez Rokah <erezrokah@users.noreply.github.com>
@yevgenypats
Copy link
Copy Markdown
Contributor Author

Closing as this will continue:

#320
#318

yevgenypats added a commit that referenced this pull request Oct 31, 2022
This add a type system to CloudQuery SDK.

This is mandatory to support multiple destinations.

Also, this found quite a few bugs where we were sending random stuff
over the wire without any validation apart from maybe when we were
hitting the database and then failing batches all together.

In CloudQuery type system I used heavily https://github.com/jackc/pgtype
and kept the license and the copyright in it's own package `cqtype`.

This is a continue of this PR
#298 where I split it into
this one and #318

Co-authored-by: Erez Rokah <erezrokah@users.noreply.github.com>
Co-authored-by: Herman Schaaf <hermanschaaf@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant