feat: Improve destination Read API #542

hermanschaaf · 2022-12-28T17:16:45Z

This adds new options to the Read API to:

limit the number of records returned
order the returned records by a set of columns
select only rows that match a given value
select only a subset of columns

I have chosen not to support filtering using operators (like greater than, less than) as an option for now, as it's a complicated and we don't have a strong need for it. We may want to add it later to make it feature-complete.

This update allows us to fix cloudquery/cloudquery#6069 (will be done in another PR to the CloudQuery repo) and should also allow for cursor-based syncing (#413). This is not implemented as part of this PR though.

hermanschaaf · 2022-12-28T17:19:14Z

Here is a draft PR for the cloudquery repo to show what usage would look like: cloudquery/cloudquery#6107

hermanschaaf · 2022-12-28T17:19:46Z

~~I see the unit tests are failing: will mark this as a draft for now and look into it tomorrow~~ Updated

github-actions · 2022-12-29T10:49:13Z

⏱️ Benchmark results

Comparing with dcd8834

DefaultConcurrency-2 resources/s: 11,633 ⬇️ 0.46% decrease vs. dcd8834
Glob-2 ns/op: 156.9 ⬆️ 6.37% increase vs. dcd8834
TablesWithChildrenDefaultConcurrency-2 resources/s: 32,679 ⬆️ 4.65% increase vs. dcd8834
BufferedScanner-2 ns/op: 10.1 ⬆️ 7.15% increase vs. dcd8834
LogReader-2 ns/op: 30.76 ⬆️ 0.78% increase vs. dcd8834

…to update-read-api

yevgenypats · 2022-12-29T12:25:54Z

I like the interface but have a few questions/thought:

Do we really need the LessThan for all types? also for some types Im not sure there is such a think, like for json or for bytea. can we maybe have additional api for destinations called ReadOne which gets and ID? This way it will be consistent? and can also serve for the cursor based thing?

hermanschaaf · 2022-12-29T12:56:11Z

@yevgenypats Indeed, the meaning of LessThan is unclear for some types, but I don't foresee us using those any time soon, because just like with a JSON database column, sorting on it would be ill-advised. I'm not sure I want to necessarily panic there though. In these cases any kind of ordering is okay, as long as its consistent.

A single ReadOne function, though simpler, will not be enough. I see your point and it could work for the testing case in isolation, because we already know the IDs we want to read (because we just wrote them). But then the test wouldn't be able to assert that the correct number of records were inserted. Maybe we can live with this.

However, when we get to cursor-based syncing, we don't know the ID we're looking for up-front. Let's take a few examples:

message history for the Slack plugin: here we need to fetch the latest ts (timestamp) that was inserted into the database in a previous sync. The only information we have before reading is the source name. There is no one ID that we can look up by, but we can get the latest timestamp by issuing an order by ts desc limit 1 read query.
service stats for the Fastly plugin: here we need to know the latest start_time for a given service that was inserted in a previous sync. So we'll need to issue an where service_id = $1 order by start_time limit 1 query (aside: this made me realize that we will need some filtering capabilities soon after all)
stories for a hypothetical Hacker News plugin: here we need to know the latest story ID that was inserted into the database. We'd want to issue a order by id desc limit 1 read query.

The important thing is that we have OrderBy. To implement OrderBy for the memory db and file types like CSV and JSON, we need LessThan on the types. For databases we can rely on the built-in ORDER BY functionality.

erezrokah

Added a few comments, but I think I'm missing something regarding the reason this is required for testing purposes.
Could we update the equality check in the tests to ignore the order of the results instead?

I get that we might need this for a future feature (curser based fetching), but since the read is only used in tests, maybe we could find a solution that doesn't require adding functionality to non test related code?

As for cursor based fetching, maybe we could add the order by functionality once we implement that feature, as we might discover more requirements once we try to implement cursor based fetching.

Approving the code change regardless of ⬆️

erezrokah · 2023-01-01T10:03:54Z

Makefile

 .PHONY: lint
 lint:
-	golangci-lint run
+	golangci-lint run --config .golangci.yml


[non blocking] I don't think we need to specify the configuration here as golangci-lint looks for it by default https://golangci-lint.run/usage/configuration/#config-file

erezrokah · 2023-01-01T10:05:00Z

internal/memdb/memdb.go

+		}
+		sortedRes = append(sortedRes, row)
+	}
+	sort.Slice(sortedRes, func(i, j int) bool {


Suggested change

sort.Slice(sortedRes, func(i, j int) bool {

sort.SliceStable(sortedRes, func(i, j int) bool {

erezrokah · 2023-01-01T10:13:09Z

plugins/destination/plugin_testing.go


-	resourcesRead, err := p.readAll(ctx, tables[0], sourceName)
+	cqIDIndex := table.Columns.Index(schema.CqIDColumn.Name)
+	sort.Slice(resources, func(i, j int) bool {


Suggested change

sort.Slice(resources, func(i, j int) bool {

sort.SliceStable(resources, func(i, j int) bool {

hermanschaaf · 2023-01-02T08:36:11Z

@erezrokah Yeah, I was working on this with the eye on it being the first step towards cursor-based syncing (rather than doing it all in one or two massive PRs). However after seeing how it works out, I think we'll probably not go with this Read API approach and use a different interface to achieve cursor-based syncing.

I'll simplify this PR to focus on fixing the tests and mostly try and leave the Read API untouched.

Extracted from #542 to allow us to fix cloudquery/cloudquery#6069 Before this change, running cockroachDB tests: ``` Jan 2 11:03:08.760000 INF Query args=["testAppendSource474bca19-69ba-4a5a-8af0-3d890198a243"] commandTag="SELECT 2" module=pgx pid=1570743 sql="SELECT * FROM \"cq_test_write_append\" WHERE _cq_source_name = $1 order by _cq_sync_time asc" --- FAIL: TestPgPlugin (0.36s) --- FAIL: TestPgPlugin/TestWriteOverwrite (0.22s) client_test.go:20: expected first resource diff: value at 2 is(d1ebd2a374f14556852c4d5b23608201) TypeUUID while other is(3da5b2eafc814ecb973bcee770bb70e1) TypeUUID value at 3 is(d03dd7f8781a4dbb93aaebefb6aaba87) TypeUUID while other is(f16b7d3a0f85441daf30da5d2d8afb26) TypeUUID value at 7 is(758c419b5a1b4a46ba45499b402c6aaf) TypeUUID while other is(0f252e92137b4f55a856cbf97d43d098) TypeUUID FAIL FAIL github.com/cloudquery/cloudquery/plugins/destination/postgresql/client 0.640s ``` After this change: ``` ok github.com/cloudquery/cloudquery/plugins/destination/postgresql/client 0.662s ```

Improve destination Read API

b15558e

hermanschaaf requested a review from yevgenypats as a code owner December 28, 2022 17:16

github-actions bot added feat and removed feat labels Dec 28, 2022

hermanschaaf mentioned this pull request Dec 28, 2022

fix: Update destinations to use new Read API cloudquery/cloudquery#6107

Closed

hermanschaaf marked this pull request as draft December 28, 2022 17:19

Make test deterministic

0677d25

hermanschaaf added 2 commits December 29, 2022 11:11

Add tests for LessThan

9900b5e

Format file

0873435

hermanschaaf marked this pull request as ready for review December 29, 2022 11:28

hermanschaaf added 5 commits December 29, 2022 11:28

Merge branch 'main' into update-read-api

7c4c982

Use byte comparison for inet and add test

4a3a8b8

Merge branch 'update-read-api' of github.com:cloudquery/plugin-sdk in…

96ade2d

…to update-read-api

Add test for timestamp LessThan

7df85c0

Linting

09825cb

Add Filter option for exact matches

422eb37

github-actions bot added feat and removed feat labels Dec 29, 2022

Lint

a8a754c

erezrokah approved these changes Jan 1, 2023

View reviewed changes

hermanschaaf mentioned this pull request Jan 2, 2023

fix(testing): Sort results before comparison in tests #561

Merged

hermanschaaf closed this Jan 2, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Improve destination Read API #542

feat: Improve destination Read API #542

Uh oh!

hermanschaaf commented Dec 28, 2022 •

edited

Loading

Uh oh!

hermanschaaf commented Dec 28, 2022

Uh oh!

hermanschaaf commented Dec 28, 2022 •

edited

Loading

Uh oh!

github-actions bot commented Dec 29, 2022 •

edited

Loading

Uh oh!

yevgenypats commented Dec 29, 2022

Uh oh!

hermanschaaf commented Dec 29, 2022

Uh oh!

erezrokah left a comment

Uh oh!

erezrokah Jan 1, 2023

Uh oh!

erezrokah Jan 1, 2023

Uh oh!

erezrokah Jan 1, 2023

Uh oh!

hermanschaaf commented Jan 2, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

	sort.Slice(sortedRes, func(i, j int) bool {
	sort.SliceStable(sortedRes, func(i, j int) bool {

	sort.Slice(resources, func(i, j int) bool {
	sort.SliceStable(resources, func(i, j int) bool {

feat: Improve destination Read API #542

feat: Improve destination Read API #542

Uh oh!

Conversation

hermanschaaf commented Dec 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hermanschaaf commented Dec 28, 2022

Uh oh!

hermanschaaf commented Dec 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Dec 29, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

⏱️ Benchmark results

Uh oh!

yevgenypats commented Dec 29, 2022

Uh oh!

hermanschaaf commented Dec 29, 2022

Uh oh!

erezrokah left a comment

Choose a reason for hiding this comment

Uh oh!

erezrokah Jan 1, 2023

Choose a reason for hiding this comment

Uh oh!

erezrokah Jan 1, 2023

Choose a reason for hiding this comment

Uh oh!

erezrokah Jan 1, 2023

Choose a reason for hiding this comment

Uh oh!

hermanschaaf commented Jan 2, 2023

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

hermanschaaf commented Dec 28, 2022 •

edited

Loading

hermanschaaf commented Dec 28, 2022 •

edited

Loading

github-actions bot commented Dec 29, 2022 •

edited

Loading