
Comparing changes

base repository: cloudquery/plugin-sdk
base: v4.64.1
head repository: cloudquery/plugin-sdk
compare: v4.65.0
  • 5 commits
  • 21 files changed
  • 4 contributors

Commits on Oct 2, 2024

  1. chore(deps): Update module github.com/cloudquery/plugin-sdk/v4 to v4.64.1 (#1917)
    
    This PR contains the following updates:
    
    | Package | Type | Update | Change |
    |---|---|---|---|
    | [github.com/cloudquery/plugin-sdk/v4](https://togithub.com/cloudquery/plugin-sdk) | require | patch | `v4.64.0` -> `v4.64.1` |
    
    ---
    
    ### Release Notes
    
    <details>
    <summary>cloudquery/plugin-sdk (github.com/cloudquery/plugin-sdk/v4)</summary>
    
    ### [`v4.64.1`](https://togithub.com/cloudquery/plugin-sdk/releases/tag/v4.64.1)
    
    [Compare Source](https://togithub.com/cloudquery/plugin-sdk/compare/v4.64.0...v4.64.1)
    
    ##### Bug Fixes
    
    -   Error handling in StreamingBatchWriter ([#1913](https://togithub.com/cloudquery/plugin-sdk/issues/1913)) ([d852119](https://togithub.com/cloudquery/plugin-sdk/commit/d8521194dee50d93d74a7156ed607d442ab1db45))
    
    </details>
    
    ---
    
    ### Configuration
    
    📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).
    
    🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.
    
    ♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
    
    🔕 **Ignore**: Close this PR and you won't be reminded about this update again.
    
    ---
    
     - [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check this box
    
    ---
    
    This PR has been generated by [Renovate Bot](https://togithub.com/renovatebot/renovate).
    cq-bot authored Oct 2, 2024
    00b9d9a View commit details

Commits on Oct 3, 2024

  1. fix: Revert "fix: Error handling in StreamingBatchWriter" (#1918)

    Reverts #1913
    
    This broke some stuff, so reverting it to unblock SDK changes: cloudquery/cloudquery#19312 (comment)
    erezrokah authored Oct 3, 2024
    38b4bfd View commit details
  2. feat: Implement RandomQueue scheduler strategy (#1914)

    This PR implements a new Scheduler Strategy based on a _Concurrent
    Random Queue_. It is based on @erezrokah's Priority Queue Scheduler
    Strategy.
    
    ## How does it work
    
    This is hopefully a much simpler scheduling strategy. It doesn't have
    any semaphores; it just uses the existing concurrency setting.
    
    Table resolvers (and their relations) get `Push`ed into a work queue,
    and `concurrency` workers `Pull` from this queue, but they pull a random
    element from it.
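    
    The queue described above can be sketched in a few lines of Go. This
    is a minimal illustration, not the SDK's implementation: the
    `RandomQueue` type, `NewRandomQueue`, and `drain` are hypothetical
    names, and the workers here simply stop when the queue is empty,
    whereas a real scheduler also has to wait for in-flight resolvers
    that may still push child work.
    
    ```go
    package main

    import (
    	"fmt"
    	"math/rand"
    	"sync"
    )

    // RandomQueue is a hypothetical concurrency-safe work queue whose Pull
    // removes and returns a uniformly random element.
    type RandomQueue[T any] struct {
    	mu    sync.Mutex
    	items []T
    	rng   *rand.Rand
    }

    // NewRandomQueue creates an empty queue with a seeded random source.
    func NewRandomQueue[T any](seed int64) *RandomQueue[T] {
    	return &RandomQueue[T]{rng: rand.New(rand.NewSource(seed))}
    }

    // Push adds an item to the queue.
    func (q *RandomQueue[T]) Push(item T) {
    	q.mu.Lock()
    	defer q.mu.Unlock()
    	q.items = append(q.items, item)
    }

    // Pull removes a random item; ok is false when the queue is empty.
    func (q *RandomQueue[T]) Pull() (item T, ok bool) {
    	q.mu.Lock()
    	defer q.mu.Unlock()
    	if len(q.items) == 0 {
    		return item, false
    	}
    	i := q.rng.Intn(len(q.items))
    	item = q.items[i]
    	// Swap the chosen slot with the last one so removal is O(1).
    	last := len(q.items) - 1
    	q.items[i] = q.items[last]
    	q.items = q.items[:last]
    	return item, true
    }

    // drain starts `concurrency` workers that pull until the queue is empty
    // and returns the items in the order they were picked up.
    func drain(q *RandomQueue[string], concurrency int) []string {
    	var (
    		wg   sync.WaitGroup
    		mu   sync.Mutex
    		done []string
    	)
    	for w := 0; w < concurrency; w++ {
    		wg.Add(1)
    		go func() {
    			defer wg.Done()
    			for {
    				item, ok := q.Pull()
    				if !ok {
    					return
    				}
    				mu.Lock()
    				done = append(done, item)
    				mu.Unlock()
    			}
    		}()
    	}
    	wg.Wait()
    	return done
    }

    func main() {
    	q := NewRandomQueue[string](1)
    	for _, table := range []string{"parent_a", "parent_b"} {
    		for i := 0; i < 3; i++ {
    			q.Push(fmt.Sprintf("%s/child_%d", table, i))
    		}
    	}
    	fmt.Println(drain(q, 2))
    }
    ```
    
    There are no semaphores in this sketch: the only knob is the number
    of workers, which mirrors the existing `concurrency` setting.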
    
    ## Why it should work better
    
    **The key benefit of this strategy is this:**
    - Assumption 1: most slow syncs are actually slow because of rate
    limits, not because of I/O limits or too much data.
    - Assumption 2: the meaty part of the sync is syncing relations, because
    each child table has a resolver per parent.
    - Benefit: because the likelihood of picking up a child resolver of a
    given table is uniformly distributed across the `int32` range, all
    relation API calls are evenly spread throughout the sync, thus optimally
    minimising rate limits!
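    
    The spreading effect can be illustrated with a small simulation. This
    is a sketch under simplifying assumptions, not a benchmark: it models
    a single consumer, the table-by-table baseline is hypothetical (it is
    not how the SDK's other strategies order work), and the helper names
    are made up for the example. It compares the longest run of
    consecutive work items that hit the same table when work is taken in
    insertion order versus in random order.
    
    ```go
    package main

    import (
    	"fmt"
    	"math/rand"
    )

    // buildWork lists one work item per child resolver call, grouped table
    // by table. Each item is labelled with the table it belongs to.
    func buildWork(numTables, childrenPerTable int) []string {
    	work := make([]string, 0, numTables*childrenPerTable)
    	for t := 0; t < numTables; t++ {
    		for c := 0; c < childrenPerTable; c++ {
    			work = append(work, fmt.Sprintf("table_%d", t))
    		}
    	}
    	return work
    }

    // randomOrder returns a shuffled copy of work, standing in for the
    // order in which a random queue would hand items out.
    func randomOrder(work []string, seed int64) []string {
    	out := append([]string(nil), work...)
    	rng := rand.New(rand.NewSource(seed))
    	rng.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
    	return out
    }

    // longestRun returns the length of the longest stretch of consecutive
    // items that belong to the same table.
    func longestRun(tables []string) int {
    	best, cur := 0, 0
    	for i, t := range tables {
    		if i > 0 && t == tables[i-1] {
    			cur++
    		} else {
    			cur = 1
    		}
    		if cur > best {
    			best = cur
    		}
    	}
    	return best
    }

    func main() {
    	work := buildWork(5, 20)
    	fmt.Println("insertion order, longest same-table run:", longestRun(work))
    	fmt.Println("random order, longest same-table run:   ", longestRun(randomOrder(work, 1)))
    }
    ```
    
    A shorter run means fewer back-to-back calls against the same API,
    which is the property the rate-limit argument relies on.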
    
    ## Does it work better?
    
    Still working on results. Notably AWS & Azure yield mixed results; still
    have to look into why.
    
    ### GCP
    
    **Before**
    
    ```
    $ cli sync .
    Loading spec(s) from .
    Starting sync for: gcp (grpc@localhost:7777) -> [postgresql (cloudquery/postgresql@v8.0.7)]
    Sync completed successfully. Resources: 25799, Errors: 0, Warnings: 0, Time: 2m23s
    ```
    
    UPDATE: GCP is moving to the Round Robin strategy, which is already
    much faster than the baseline above:
    
    ```
    $ cli sync .
    Loading spec(s) from .
    Starting sync for: gcp (grpc@localhost:7777) -> [postgresql (cloudquery/postgresql@v8.0.7)]
    Sync completed successfully. Resources: 26355, Errors: 0, Warnings: 0, Time: 40s
    ```
    
    **After**
    
    ```
    $ cli sync .
    Loading spec(s) from .
    Starting sync for: gcp (grpc@localhost:7777) -> [postgresql (cloudquery/postgresql@v8.0.7)]
    Sync completed successfully. Resources: 26186, Errors: 0, Warnings: 0, Time: 34s
    ```
    
    **Result: 76.22% reduction in time (2m23s to 34s), or 3.21 times faster (4.21x the speed).**
    **Result against Round Robin: 15% reduction in time (40s to 34s), or
    0.18 times faster (1.18x the speed; probably within margin of error)**
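    
    The percentages and "times faster" figures in these results can be
    reproduced with a few lines of Go. This is a sketch: the helper names
    are hypothetical, and it assumes "times faster" means
    `(before - after) / after`, which matches the numbers reported for
    GCP (143s before, 34s after).
    
    ```go
    package main

    import "fmt"

    // percentReduction returns how much shorter the new duration is,
    // as a percentage of the old duration. Durations are in seconds.
    func percentReduction(before, after float64) float64 {
    	return 100 * (before - after) / before
    }

    // speedRatio returns how many times the old duration fits into the
    // new one (for example 2.0 means the sync runs at twice the speed).
    func speedRatio(before, after float64) float64 {
    	return before / after
    }

    // timesFaster returns the extra speed relative to the old run,
    // i.e. speedRatio minus one.
    func timesFaster(before, after float64) float64 {
    	return (before - after) / after
    }

    func main() {
    	before, after := 143.0, 34.0 // GCP: 2m23s before, 34s after
    	fmt.Printf("%.2f%% reduction, %.2f times faster, %.2fx the speed\n",
    		percentReduction(before, after),
    		timesFaster(before, after),
    		speedRatio(before, after))
    	// Output: 76.22% reduction, 3.21 times faster, 4.21x the speed
    }
    ```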
    
    ### BigQuery
    
    **Before**
    
    ```
    $ cli sync bigquery_to_postgresql.yaml
    Loading spec(s) from bigquery_to_postgresql.yaml
    Starting sync for: bigquery (cloudquery/bigquery@v1.7.0) -> [postgresql (cloudquery/postgresql@v8.6.0)]
    Sync completed successfully. Resources: 26139, Errors: 0, Warnings: 0, Time: 2m7s
    ```
    
    **After**
    
    ```
    $ cli sync bigquery_to_postgresql.yaml
    Loading spec(s) from bigquery_to_postgresql.yaml
    Starting sync for: bigquery (cloudquery/bigquery@v1.7.0) -> [postgresql (cloudquery/postgresql@v8.6.0)]
    Sync completed successfully. Resources: 26139, Errors: 0, Warnings: 0, Time: 1m26s
    ```
    
    **Result: 32.28% reduction in time (2m7s to 1m26s), or 0.48 times faster (1.48x the speed)**
    
    ### SentinelOne
    
    **Before** (it was already quite fast due to a previously merged
    improvement)
    
    ```
    $ cli sync .
    Loading spec(s) from .
    Starting sync for: sentinelone (grpc@localhost:7777) -> [postgresql (cloudquery/postgresql@v8.5.5)]
    Sync completed successfully. Resources: 1295, Errors: 0, Warnings: 0, Time: 15s
    ```
    
    **After**
    
    ```
    $ cli sync .
    Loading spec(s) from .
    Starting sync for: sentinelone (grpc@localhost:7777) -> [postgresql (cloudquery/postgresql@v8.5.5)]
    Sync completed successfully. Resources: 1295, Errors: 0, Warnings: 0, Time: 8s
    ```
    
    **Result: 46.67% reduction in time (15s to 8s), or 0.875 times faster (1.875x the speed)**
    
    ## How to test
    
    - Add a `go.mod` replace for sdk: `replace
    github.com/cloudquery/plugin-sdk/v4 =>
    github.com/cloudquery/plugin-sdk/v4
    v4.63.1-0.20241002131015-243705c940c6` (check last commit on this PR)
    - Run source plugin via grpc locally; make sure to configure the
    scheduler strategy to `scheduler.StrategyRandomQueue`.
    
    ## How scary is it to merge
    
    - This scheduler strategy is not used by any plugins by default, so in
    principle this should be safe to merge.
    
    ---------
    
    Co-authored-by: erezrokah <erezrokah@users.noreply.github.com>
    marianogappa and erezrokah authored Oct 3, 2024
    af8ac87 View commit details

Commits on Oct 4, 2024

  1. fix(tests): WriterTestSuite.handleNulls should not overwrite columns (#1920)

    08e18e2 View commit details
  2. chore(main): Release v4.65.0 (#1919)

    🤖 I have created a release *beep* *boop*
    ---
    
    
    ## [4.65.0](v4.64.1...v4.65.0) (2024-10-04)
    
    
    ### Features
    
    * Implement RandomQueue scheduler strategy ([#1914](#1914)) ([af8ac87](af8ac87))
    
    
    ### Bug Fixes
    
    * Revert "fix: Error handling in StreamingBatchWriter" ([#1918](#1918)) ([38b4bfd](38b4bfd))
    * **tests:** WriterTestSuite.handleNulls should not overwrite columns ([#1920](#1920)) ([08e18e2](08e18e2))
    
    ---
    This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
    cq-bot authored Oct 4, 2024
    d51e172 View commit details