
feat: New destination file plugin #4902

Closed

yevgenypats wants to merge 19 commits into main from feat/file_plugin

Conversation

Contributor

@yevgenypats yevgenypats commented Nov 22, 2022

Fixes #4983

New destination plugin `file` will support multiple formats (CSV, JSON) with multiple storage backends such as local, S3, GCS, and Azure Blob Storage.

This is ready now for initial review.

@yevgenypats yevgenypats changed the title Feat/file plugin feat: New destination file plugin Nov 22, 2022
@yevgenypats yevgenypats marked this pull request as ready for review November 22, 2022 21:45
@yevgenypats yevgenypats requested review from a team, disq, hermanschaaf and shimonp21 and removed request for a team and shimonp21 November 22, 2022 21:45
Member

@disq disq left a comment

Good start, a few nits and remarks.

We should consider adding in-line gzip and combined-file support once this is reviewed/merged/refactored, before that blog post though :-) As you'll soon see, a high number of tables will be an issue when used with data lakes.
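The in-line gzip idea can be sketched with the standard library: wrap the per-table writer in a `gzip.Writer` before the CSV/JSON encoder writes to it. A minimal round-trip sketch (function names are illustrative, not from this PR):

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// gzipped compresses p in memory; in the plugin the gzip.Writer would
// wrap the destination file's writer so each table file is written compressed.
func gzipped(p []byte) ([]byte, error) {
	var buf bytes.Buffer
	zw := gzip.NewWriter(&buf)
	if _, err := zw.Write(p); err != nil {
		return nil, err
	}
	// Close flushes the gzip footer; without it the stream is truncated.
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// gunzip reverses gzipped, as a reader of the destination files would.
func gunzip(p []byte) ([]byte, error) {
	zr, err := gzip.NewReader(bytes.NewReader(p))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

func main() {
	z, err := gzipped([]byte(`{"table":"t1"}`))
	if err != nil {
		panic(err)
	}
	raw, err := gunzip(z)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw)) // {"table":"t1"}
}
```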

case BackendTypeS3:
awsCfg, err := config.LoadDefaultConfig(ctx)
if err != nil {
return nil, fmt.Errorf("unable to load AWS SDK config, %w", err)
Member

Suggested change
return nil, fmt.Errorf("unable to load AWS SDK config, %w", err)
return nil, fmt.Errorf("unable to load AWS SDK config: %w", err)

The convention for error wrapping is `wrapper text: %w`.

var err error
c.gcpStorageClient, err = storage.NewClient(ctx)
if err != nil {
return nil, fmt.Errorf("failed to create GCP storage client %w", err)
Member

Suggested change
return nil, fmt.Errorf("failed to create GCP storage client %w", err)
return nil, fmt.Errorf("failed to create GCP storage client: %w", err)

c.awsUploader = manager.NewUploader(awsClient)
c.awsDownloader = manager.NewDownloader(awsClient)

if _, err := c.awsUploader.Upload(ctx, &s3.PutObjectInput{
Member

Test objects shouldn't be part of a "serious" release but they can stay for now.

defer f.Close()

for r := range resources {
b, err := json.Marshal(r)
Member

In the future a drop-in encoding/json replacement (e.g. fastjson) could be used here.
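One way to keep that door open without touching call sites is a package-level seam for the encoder. A sketch (note that `fastjson` itself exposes a different API, so a real swap would need an adapter with the same `Marshal` signature):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// marshal is a seam for the JSON encoder: keeping it as a package-level
// variable lets a drop-in replacement with the same signature be swapped
// in later without changing any call site.
var marshal = json.Marshal

func main() {
	// Call sites use marshal, never json.Marshal directly.
	b, err := marshal(map[string]int{"rows": 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // {"rows":3}
}
```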

"github.com/cloudquery/plugin-sdk/schema"
)

const maxJsonSize = 1024 * 1024 * 20
Member

This is only used for the read/scanner path, yet the name doesn't suggest that. (The const definition could also move inside the read method.)

Contributor

+1

github.com/rs/zerolog v1.28.0
)

replace github.com/cloudquery/plugin-sdk => ../../../../plugin-sdk-split
Member

This needs to be removed before merge

Contributor

+1
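For reference, the `replace` directive above pins the SDK to a local checkout, which only resolves on the author's machine; before merge the module must go back to resolving the published SDK. A sketch of the cleanup (paths as in the PR):

```
# go.mod: delete the local-development override
replace github.com/cloudquery/plugin-sdk => ../../../../plugin-sdk-split

# equivalently, from the module directory:
#   go mod edit -dropreplace=github.com/cloudquery/plugin-sdk
#   go mod tidy
```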

)

const (
sentryDSN = "https://79d5e237dafe45e1a4ec0785bc528280@o1396617.ingest.sentry.io/4504083471335424"
Member

Consider replacing this before release (belongs to the CSV dest)

Contributor Author

yevgenypats commented Nov 23, 2022

Closes: #4983

@@ -0,0 +1,124 @@
package azure_blob_storage
Contributor Author

can we keep the directory as azure_blob_storage instead of azure-blob-storage?

Contributor

Fixed

kodiakhq bot pushed a commit to cloudquery/plugin-sdk that referenced this pull request Nov 24, 2022
Added another option needed for destination testing. Needed by this PR: cloudquery/cloudquery#4902

Also, added a `FlattenTables` function for `Tables` type
kodiakhq bot pushed a commit that referenced this pull request Nov 24, 2022
This is a Snowflake destination plugin.

It supports streaming directly to Snowflake (via local CSV uploads, since streaming inserts don't support all Snowflake data types as of right now).

The streaming (or pseudo-streaming) approach is useful for getting started and testing locally, but for production usage the most performant and cheapest way at scale is a standard data pipeline that first uploads CSV/JSON to remote storage (S3, GCS, Azure Blob...) and then loads it into Snowflake via a periodic job or Snowpipe.

For the latter, this PR needs to be implemented and tested #4902

Known issue: migrations are slow because we currently don't use any batching to get all the tables. I have an idea on how to solve it, but may do it in a follow-up PR.
Member

@erezrokah erezrokah left a comment

@@ -0,0 +1,2 @@
file
cq_csv_output No newline at end of file
Member

Suggested change
cq_csv_output
cq_file_output

Comment on lines +1 to +14
variables:
component: destination/csv
binary: csv

project_name: plugins/destination/csv

monorepo:
tag_prefix: plugins-destination-csv-
dir: plugins/destination/csv

includes:
- from_file:
# Relative to the directory Go Releaser is run from (which is the root of the repository)
path: ./plugins/.goreleaser.yaml No newline at end of file
Member

Suggested change
variables:
component: destination/csv
binary: csv
project_name: plugins/destination/csv
monorepo:
tag_prefix: plugins-destination-csv-
dir: plugins/destination/csv
includes:
- from_file:
# Relative to the directory Go Releaser is run from (which is the root of the repository)
path: ./plugins/.goreleaser.yaml
variables:
component: destination/file
binary: file
project_name: plugins/destination/file
monorepo:
tag_prefix: plugins-destination-file-
dir: plugins/destination/file
includes:
- from_file:
# Relative to the directory Go Releaser is run from (which is the root of the repository)
path: ./plugins/.goreleaser.yaml

@@ -0,0 +1,2 @@
# Changelog
Member

You don't need to add this, it will be created by Release Please

Comment on lines +14 to +22
kind: destination
spec:
name: "csv"
path: "cloudquery/csv"
version: "v1.0.1" # latest version of csv plugin
write_mode: "append" # CSV only supports 'append' mode

spec:
directory: './cq_csv_output'
Member

Suggested change
kind: destination
spec:
name: "csv"
path: "cloudquery/csv"
version: "v1.0.1" # latest version of csv plugin
write_mode: "append" # CSV only supports 'append' mode
spec:
directory: './cq_csv_output'
kind: destination
spec:
name: "file"
path: "cloudquery/file"
version: "v1.0.0" # latest version of file plugin
write_mode: "append" # File plugin only supports 'append' mode
spec:
directory: './cq_file_output'

directory: './cq_csv_output'
```

## CSV Spec
Member

Suggested change
## CSV Spec
## File Spec


## CSV Spec

This is the (nested) spec used by the CSV destination Plugin.
Member

Suggested change
This is the (nested) spec used by the CSV destination Plugin.
This is the (nested) spec used by the File destination Plugin.


This is the (nested) spec used by the CSV destination Plugin.

- `directory` (string) (optional, defaults to `./cq_csv_output`)
Member

Suggested change
- `directory` (string) (optional, defaults to `./cq_csv_output`)
- `directory` (string) (optional, defaults to `./cq_file_output`)


- `directory` (string) (optional, defaults to `./cq_csv_output`)

Directory where all CSV files will be written. A CSV file will be created per table.
Member

Suggested change
Directory where all CSV files will be written. A CSV file will be created per table.
Directory where all files will be written. A file will be created per table.


if _, err := c.awsUploader.Upload(ctx, &s3.PutObjectInput{
Bucket: aws.String(c.baseDir),
Key: aws.String(c.path + "/cq-test-file"),
Contributor

should it be here?

return v.String()
}

func (*Client) transformCSVMacaddrArray(v *schema.MacaddrArray) interface{} {
Contributor

can it be just a single func?

func (*Client) transform(v fmt.Stringer) any {
	return v.String()
}

(Even better: make the return value a string in the signature?)

func (*Client) DeleteStale(ctx context.Context, tables schema.Tables, sourceName string, syncTime time.Time) error {
return fmt.Errorf("csv destination doesn't support overwrite-delete-stale mode. please use append mode")
Contributor

Suggested change
return fmt.Errorf("csv destination doesn't support overwrite-delete-stale mode. please use append mode")
return fmt.Errorf("csv destination doesn't support overwrite-delete-stale mode, use append mode instead")

"github.com/cloudquery/plugin-sdk/schema"
)

const maxJsonSize = 1024 * 1024 * 20
Contributor

+1

case FormatTypeCSV:
go func() {
defer wg.Done()
c.writeCSVResource(ctx, t.Name, workers[t.Name].writeChan)
Contributor

don't do a lookup, just do

worker := &worker{writeChan: make(chan any)}
workers[t.Name] = worker

above and then just use worker

}

for r := range res {
workers[r.TableName].writeChan <- r.Data
Contributor

@candiduslynx candiduslynx Dec 17, 2022

shouldn't it be done in parallel?
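The two review points here (create each worker once into a local variable instead of re-looking it up, and keep per-table writes concurrent) can be sketched together. This is an illustrative reduction, not the plugin's code: each worker goroutine just counts rows where the real one would write to a file, so dispatch stays a simple fan-out while tables are written in parallel.

```go
package main

import (
	"fmt"
	"sync"
)

// resource mirrors the shape being dispatched: a row tagged with its table.
type resource struct {
	TableName string
	Data      any
}

// writePerTable spawns one writer goroutine per table; the dispatch loop
// routes each resource to its table's channel.
func writePerTable(tables []string, res []resource) map[string]int {
	type worker struct{ writeChan chan any }
	counts := make(map[string]int, len(tables))
	var mu sync.Mutex
	workers := make(map[string]*worker, len(tables))
	var wg sync.WaitGroup
	for _, name := range tables {
		// Create the worker once and reuse the local variable,
		// avoiding a second map lookup when starting the goroutine.
		w := &worker{writeChan: make(chan any)}
		workers[name] = w
		wg.Add(1)
		go func(name string, w *worker) {
			defer wg.Done()
			for range w.writeChan {
				mu.Lock()
				counts[name]++ // stand-in for writing the row to the table's file
				mu.Unlock()
			}
		}(name, w)
	}
	// Dispatch is sequential, but each table's writes proceed concurrently.
	for _, r := range res {
		workers[r.TableName].writeChan <- r.Data
	}
	for _, w := range workers {
		close(w.writeChan)
	}
	wg.Wait()
	return counts
}

func main() {
	counts := writePerTable(
		[]string{"t1", "t2"},
		[]resource{{"t1", 1}, {"t2", 2}, {"t1", 3}},
	)
	fmt.Println(counts["t1"], counts["t2"]) // 2 1
}
```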

github.com/rs/zerolog v1.28.0
)

replace github.com/cloudquery/plugin-sdk => ../../../../plugin-sdk-split
Contributor

+1


@yevgenypats
Contributor Author

Closing. Done here: #6096

@hermanschaaf hermanschaaf deleted the feat/file_plugin branch November 21, 2023 08:48


Development

Successfully merging this pull request may close these issues.

plugin: File destination plugin

5 participants