Conversation
This PR has the following changes to source plugin(s) tables:
|
plugins/source/hackernews/resources/services/items/items_fetch.go
Outdated
Show resolved
Hide resolved
erezrokah
left a comment
There was a problem hiding this comment.
Looks good, added a non blocking comment.
plugins/source/hackernews/resources/services/items/items_fetch.go
Outdated
Show resolved
Hide resolved
|
@yevgenypats @erezrokah @disq This is ready for another review 🙇 Still depends on cloudquery/plugin-sdk#569 which is why the tests are currently not passing, but I greatly simplified the resolver code using @erezrokah's suggestion to do it in batches. The resulting code is slightly less efficient, but still close enough that I think it's better to keep it this simple. |
erezrokah
left a comment
There was a problem hiding this comment.
Looks great, added a comment for your consideration (non blocking)
| // The important thing is that the state backend does not ensure that the cursor | ||
| // is strictly increasing--it is the responsibility of the resolver to ensure this. | ||
| for cursor < maxID { | ||
| endID := cursor + 1000 |
There was a problem hiding this comment.
| endID := cursor + 1000 | |
| endID := cursor + c.Spec.ItemConcurrency |
If we do this we won't have to g.SetLimit(c.Spec.ItemConcurrency). Also we keep the batches as small as possible so lower chances of a batch to fail
There was a problem hiding this comment.
I tried this, but it has a pretty big impact on performance, even after playing with different values for ItemConcurrency (on my machine: ~500 items/s falls to ~300 items/s). I think let's leave it at 1000 for now
A state store allows us to store cursors for #413. This first iteration introduces the interface and implements it for a local JSON storage backend. After some back and forth I decided to call it `backend`, similar to Terraform. Alternatives considered were `state` (confusing because it's not a single state) and `storage` (confusing because it's similar to `destination`). Since the interface is a generic key-value store, we lose the ability to know whether a table is incremental or not, and which columns are used as cursor keys if it is. I'm proposing that we add some fields to the `Table` definition to be able to at least document these aspects, but it's not enforced. See cloudquery/cloudquery#6336 for an example implementation using this new functionality.
yevgenypats
left a comment
There was a problem hiding this comment.
Looks great! Good idea on this plugin for cursor testing and I think can be a great post for Show HN :)
Maybe move https://github.com/hermanschaaf/hackernews to cloudquery/hackernews so anyone in the team can contribute easily if needed?
|
@yevgenypats Sure, I can migrate it to the CloudQuery org, just wasn't sure whether we want to be the maintainers of it? (Personally I'm fine with being the maintainter 😃 ) |
🤖 I have created a release *beep* *boop* --- ## 1.0.0 (2023-01-10) ### Features * Hacker News source plugin ([#6336](#6336)) ([183299e](183299e)) ### Bug Fixes * **deps:** Update module github.com/cloudquery/plugin-sdk to v1.24.1 ([#6553](#6553)) ([392b848](392b848)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
🤖 I have created a release *beep* *boop* --- ## [2.1.0](cli-v2.0.31...cli-v2.1.0) (2023-01-10) ### Features * Hacker News source plugin ([#6336](#6336)) ([183299e](183299e)) ### Bug Fixes * **deps:** Update module github.com/cloudquery/plugin-sdk to v1.21.0 ([#6382](#6382)) ([5baea40](5baea40)) * **deps:** Update module github.com/cloudquery/plugin-sdk to v1.22.0 ([#6516](#6516)) ([b7e4e73](b7e4e73)) * **deps:** Update module github.com/cloudquery/plugin-sdk to v1.23.0 ([#6522](#6522)) ([ce24f1d](ce24f1d)) * **deps:** Update module github.com/cloudquery/plugin-sdk to v1.24.1 ([#6553](#6553)) ([392b848](392b848)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
This adds a Hacker News source plugin that is able to download all items from the Hacker News API. It can be useful, but is mainly meant to be for instructional purposes, as this is the first example of a plugin making use of the state backend for incremental syncing being introduced in cloudquery/plugin-sdk#569 (this PR depends on that one).