base repository: cloudquery/plugin-sdk
base: 28ccca7
head repository: cloudquery/plugin-sdk
compare: d51e172
- 5 commits
- 21 files changed
- 4 contributors
Commits on Oct 2, 2024
-
chore(deps): Update module github.com/cloudquery/plugin-sdk/v4 to v4.64.1 (#1917)

This PR contains the following updates:

| Package | Type | Update | Change |
|---|---|---|---|
| [github.com/cloudquery/plugin-sdk/v4](https://togithub.com/cloudquery/plugin-sdk) | require | patch | `v4.64.0` -> `v4.64.1` |

---

### Release Notes

<details>
<summary>cloudquery/plugin-sdk (github.com/cloudquery/plugin-sdk/v4)</summary>

### [`v4.64.1`](https://togithub.com/cloudquery/plugin-sdk/releases/tag/v4.64.1)

[Compare Source](https://togithub.com/cloudquery/plugin-sdk/compare/v4.64.0...v4.64.1)

##### Bug Fixes

- Error handling in StreamingBatchWriter ([#1913](https://togithub.com/cloudquery/plugin-sdk/issues/1913)) ([d852119](https://togithub.com/cloudquery/plugin-sdk/commit/d8521194dee50d93d74a7156ed607d442ab1db45))

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

- [ ] If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate).
Commit 00b9d9a
Commits on Oct 3, 2024
-
fix: Revert "fix: Error handling in StreamingBatchWriter" (#1918)
Reverts #1913. This broke some stuff, so reverting it to unblock SDK changes: cloudquery/cloudquery#19312 (comment)
Commit 38b4bfd
-
feat: Implement RandomQueue scheduler strategy (#1914)
This PR implements a new Scheduler Strategy based on a _Concurrent Random Queue_. It is based on @erezrokah's Priority Queue Scheduler Strategy.

## How does it work

This is hopefully a much simpler scheduling strategy. It doesn't have any semaphores; it just uses the existing concurrency setting.

Table resolvers (and their relations) get `Push`ed into a work queue, and `concurrency` workers `Pull` from this queue, but they pull a random element from it.

## Why it should work better

**The key benefit of this strategy is this:**

- Assumption 1: most slow syncs are actually slow because of rate limits, not because of I/O limits or too much data.
- Assumption 2: the meaty part of the sync is syncing relations, because each child table has a resolver per parent.
- Benefit: because the likelihood of picking up a child resolver of a given table is uniformly distributed across the `int32` range, all relation API calls are evenly spread throughout the sync, thus optimally minimising rate limits!

## Does it work better?

Still working on results. Notably AWS & Azure yield mixed results; still have to look into why.

### GCP

**Before**

```
$ cli sync .
Loading spec(s) from .
Starting sync for: gcp (grpc@localhost:7777) -> [postgresql (cloudquery/postgresql@v8.0.7)]
Sync completed successfully. Resources: 25799, Errors: 0, Warnings: 0, Time: 2m23s
```

UPDATE: GCP is moving to Round Robin strategy, and it's much faster with this strategy:

```
$ cli sync .
Loading spec(s) from .
Starting sync for: gcp (grpc@localhost:7777) -> [postgresql (cloudquery/postgresql@v8.0.7)]
Sync completed successfully. Resources: 26355, Errors: 0, Warnings: 0, Time: 40s
```

**After**

```
$ cli sync .
Loading spec(s) from .
Starting sync for: gcp (grpc@localhost:7777) -> [postgresql (cloudquery/postgresql@v8.0.7)]
Sync completed successfully. Resources: 26186, Errors: 0, Warnings: 0, Time: 34s
```

**Result: 76.22% reduction in time, or 3.21 times faster.**

**Result against Round Robin: 15% reduction in time, or 0.18 times faster (probably within margin of error)**

### BigQuery

**Before**

```
$ cli sync bigquery_to_postgresql.yaml
Loading spec(s) from bigquery_to_postgresql.yaml
Starting sync for: bigquery (cloudquery/bigquery@v1.7.0) -> [postgresql (cloudquery/postgresql@v8.6.0)]
Sync completed successfully. Resources: 26139, Errors: 0, Warnings: 0, Time: 2m7s
```

**After**

```
$ cli sync bigquery_to_postgresql.yaml
Loading spec(s) from bigquery_to_postgresql.yaml
Starting sync for: bigquery (cloudquery/bigquery@v1.7.0) -> [postgresql (cloudquery/postgresql@v8.6.0)]
Sync completed successfully. Resources: 26139, Errors: 0, Warnings: 0, Time: 1m26s
```

**Result: 32.28% reduction in time, or 0.48 times faster**

### SentinelOne

**Before** (it was already quite fast due to previous merged improvement)

```
$ cli sync .
Loading spec(s) from .
Starting sync for: sentinelone (grpc@localhost:7777) -> [postgresql (cloudquery/postgresql@v8.5.5)]
Sync completed successfully. Resources: 1295, Errors: 0, Warnings: 0, Time: 15s
```

**After**

```
$ cli sync .
Loading spec(s) from .
Starting sync for: sentinelone (grpc@localhost:7777) -> [postgresql (cloudquery/postgresql@v8.5.5)]
Sync completed successfully. Resources: 1295, Errors: 0, Warnings: 0, Time: 8s
```

**Result: 46.67% reduction in time, or 0.875 times faster**

## How to test

- Add a `go.mod` replace for sdk: `replace github.com/cloudquery/plugin-sdk/v4 => github.com/cloudquery/plugin-sdk/v4 v4.63.1-0.20241002131015-243705c940c6` (check last commit on this PR)
- Run source plugin via grpc locally; make sure to configure the scheduler strategy to `scheduler.StrategyRandomQueue`.

## How scary is it to merge

- This scheduler strategy is not used by any plugins by default, so in principle this should be safe to merge.

---------

Co-authored-by: erezrokah <erezrokah@users.noreply.github.com>
Commit af8ac87
Commits on Oct 4, 2024
-
fix(tests): WriterTestSuite.handleNulls should not overwrite columns (#1920)
Commit 08e18e2
-
chore(main): Release v4.65.0 (#1919)
🤖 I have created a release *beep* *boop*

---

## [4.65.0](v4.64.1...v4.65.0) (2024-10-04)

### Features

* Implement RandomQueue scheduler strategy ([#1914](#1914)) ([af8ac87](af8ac87))

### Bug Fixes

* Revert "fix: Error handling in StreamingBatchWriter" ([#1918](#1918)) ([38b4bfd](38b4bfd))
* **tests:** WriterTestSuite.handleNulls should not overwrite columns ([#1920](#1920)) ([08e18e2](08e18e2))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Commit d51e172
You can try running this command locally to see the comparison on your machine:
git diff 28ccca7...d51e172