Skip to content

feat: Hacker News source plugin#6336

Merged
kodiakhq[bot] merged 17 commits intomainfrom
hackernews
Jan 9, 2023
Merged

feat: Hacker News source plugin#6336
kodiakhq[bot] merged 17 commits intomainfrom
hackernews

Conversation

@hermanschaaf
Copy link
Copy Markdown
Contributor

This adds a Hacker News source plugin that is able to download all items from the Hacker News API. It can be useful, but is mainly meant to be for instructional purposes, as this is the first example of a plugin making use of the state backend for incremental syncing being introduced in cloudquery/plugin-sdk#569 (this PR depends on that one).

@github-actions
Copy link
Copy Markdown

github-actions bot commented Jan 4, 2023

This PR has the following changes to source plugin(s) tables:

  • Table hackernews_items was added

Copy link
Copy Markdown
Member

@erezrokah erezrokah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, added a non blocking comment.

@hermanschaaf
Copy link
Copy Markdown
Contributor Author

@yevgenypats @erezrokah @disq This is ready for another review 🙇 Still depends on cloudquery/plugin-sdk#569 which is why the tests are currently not passing, but I greatly simplified the resolver code using @erezrokah's suggestion to do it in batches. The resulting code is slightly less efficient, but still close enough that I think it's better to keep it this simple.

Copy link
Copy Markdown
Member

@erezrokah erezrokah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, added a comment for your consideration (non blocking)

// The important thing is that the state backend does not ensure that the cursor
// is strictly increasing--it is the responsibility of the resolver to ensure this.
for cursor < maxID {
endID := cursor + 1000
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
endID := cursor + 1000
endID := cursor + c.Spec.ItemConcurrency

If we do this we won't have to g.SetLimit(c.Spec.ItemConcurrency). Also we keep the batches as small as possible so lower chances of a batch to fail

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried this, but it has a pretty big impact on performance, even after playing with different values for ItemConcurrency (on my machine: ~500 items/s falls to ~300 items/s). I think let's leave it at 1000 for now

kodiakhq bot pushed a commit to cloudquery/plugin-sdk that referenced this pull request Jan 9, 2023
A state store allows us to store cursors for #413. This first iteration introduces the interface and implements it for a local JSON storage backend. After some back and forth I decided to call it `backend`, similar to Terraform. Alternatives considered were `state` (confusing because it's not a single state) and `storage` (confusing because it's similar to `destination`). 

Since the interface is a generic key-value store, we lose the ability to know whether a table is incremental or not, and which columns are used as cursor keys if it is. I'm proposing that we add some fields to the `Table` definition to be able to at least document these aspects, but it's not enforced.

See cloudquery/cloudquery#6336 for an example implementation using this new functionality.
Copy link
Copy Markdown
Contributor

@yevgenypats yevgenypats left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Good idea on this plugin for cursor testing and I think can be a great post for Show HN :)

Maybe move https://github.com/hermanschaaf/hackernews to cloudquery/hackernews so anyone in the team can contribute easily if needed?

@hermanschaaf
Copy link
Copy Markdown
Contributor Author

hermanschaaf commented Jan 9, 2023

@yevgenypats Sure, I can migrate it to the CloudQuery org, just wasn't sure whether we want to be the maintainers of it? (Personally I'm fine with being the maintainter 😃 )

@hermanschaaf hermanschaaf added the automerge Automatically merge once required checks pass label Jan 9, 2023
@kodiakhq kodiakhq bot merged commit 183299e into main Jan 9, 2023
@kodiakhq kodiakhq bot deleted the hackernews branch January 9, 2023 20:12
kodiakhq bot pushed a commit that referenced this pull request Jan 10, 2023
🤖 I have created a release *beep* *boop*
---


## 1.0.0 (2023-01-10)


### Features

* Hacker News source plugin ([#6336](#6336)) ([183299e](183299e))


### Bug Fixes

* **deps:** Update module github.com/cloudquery/plugin-sdk to v1.24.1 ([#6553](#6553)) ([392b848](392b848))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
kodiakhq bot pushed a commit that referenced this pull request Jan 10, 2023
🤖 I have created a release *beep* *boop*
---


## [2.1.0](cli-v2.0.31...cli-v2.1.0) (2023-01-10)


### Features

* Hacker News source plugin ([#6336](#6336)) ([183299e](183299e))


### Bug Fixes

* **deps:** Update module github.com/cloudquery/plugin-sdk to v1.21.0 ([#6382](#6382)) ([5baea40](5baea40))
* **deps:** Update module github.com/cloudquery/plugin-sdk to v1.22.0 ([#6516](#6516)) ([b7e4e73](b7e4e73))
* **deps:** Update module github.com/cloudquery/plugin-sdk to v1.23.0 ([#6522](#6522)) ([ce24f1d](ce24f1d))
* **deps:** Update module github.com/cloudquery/plugin-sdk to v1.24.1 ([#6553](#6553)) ([392b848](392b848))

---
This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

automerge Automatically merge once required checks pass

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants