Conversation
erezrokah
left a comment
There was a problem hiding this comment.
Looks good (didn't test). So the TDLR for this PR is that we use Apache arrow type system instead of ours (extending types they don't support)?
A walkthrough will be great, and also seeing them accept some of our extended types as contributions
| case schema.TypeIntArray: | ||
| typ = arrow.ListOf(arrow.PrimitiveTypes.Int64) | ||
| case schema.TypeTimestamp: | ||
| typ = arrow.FixedWidthTypes.Timestamp_us |
There was a problem hiding this comment.
Maybe add a comment that us means micros seconds (and not US timezone)
There was a problem hiding this comment.
arrow type is named badly, should've been arrow.FixedWidthTypes.TimestampMicros
internal/cqarrow/mac.go
Outdated
| return b.Unmarshal(dec) | ||
| } | ||
|
|
||
| // UUIDArray is a simple array which is a FixedSizeBinary(16) |
There was a problem hiding this comment.
Same here: It looks like all the comments in this file that refer to FixedSizeBinary or UUID need to be updated (they were copied from UUID)
This should unblock cloudquery/filetypes#87 Add supports for timestamp to decode arrow string and for byte to decode base64 (backward compat is saved)
Co-authored-by: Herman Schaaf <hermanschaaf@gmail.com>
Co-authored-by: Kemal <223029+disq@users.noreply.github.com>
Co-authored-by: Herman Schaaf <hermanschaaf@gmail.com>
Co-authored-by: Herman Schaaf <hermanschaaf@gmail.com>
cded2e3 to
2ce1b28
Compare
| "github.com/google/go-cmp/cmp" | ||
| ) | ||
|
|
||
| func DiffSchema(sc, o *arrow.Schema) (string, bool) { |
There was a problem hiding this comment.
I think this is some dead code that can be removed?
| if diff := cmp.Diff(arrowSchema.String(), expecetdSchema.String()); diff != "" { | ||
| t.Errorf(diff) | ||
| } | ||
| // if diff, _ := DiffSchema(arrowSchema, expecetdSchema); diff != "" { |
There was a problem hiding this comment.
(The only use of DiffSchema is here and it's commented out)
| b.AppendNull() | ||
| return | ||
| } | ||
| bytes, err := json.Marshal(v) |
There was a problem hiding this comment.
I'm wondering if we need to use json.Encoder here as well, like we had to in plugin-sdk (https://github.com/cloudquery/plugin-sdk/pull/714/files)?
hermanschaaf
left a comment
There was a problem hiding this comment.
LGTM - I tested it against the old json filetypes plugin for a couple of sources and the output was the byte-for-byte the same 👍 We may want to consider using json.Encoder but I think it's not a blocker
🤖 I have created a release *beep* *boop* --- ## [1.5.0](v1.4.2...v1.5.0) (2023-03-06) ### Features * Arrow migration ([#87](#87)) ([135f022](135f022)) ### Bug Fixes * **deps:** Update golang.org/x/xerrors digest to 04be3eb ([#78](#78)) ([5d47135](5d47135)) * **deps:** Update module github.com/apache/thrift to v0.18.0 ([#83](#83)) ([bce51fa](bce51fa)) * **deps:** Update module github.com/cloudquery/plugin-sdk to v1.38.1 ([#72](#72)) ([0eadcfd](0eadcfd)) * **deps:** Update module github.com/cloudquery/plugin-sdk to v1.38.2 ([#74](#74)) ([90bdbba](90bdbba)) * **deps:** Update module github.com/cloudquery/plugin-sdk to v1.39.0 ([#75](#75)) ([b65db2e](b65db2e)) * **deps:** Update module github.com/cloudquery/plugin-sdk to v1.39.1 ([#76](#76)) ([c03025e](c03025e)) * **deps:** Update module github.com/cloudquery/plugin-sdk to v1.40.0 ([#77](#77)) ([7ec1df0](7ec1df0)) * **deps:** Update module github.com/cloudquery/plugin-sdk to v1.41.0 ([#86](#86)) ([5b89f33](5b89f33)) * **deps:** Update module github.com/cloudquery/plugin-sdk to v1.42.0 ([#88](#88)) ([99da3b2](99da3b2)) * **deps:** Update module github.com/klauspost/compress to v1.16.0 ([#84](#84)) ([a9c9da3](a9c9da3)) * **deps:** Update module github.com/pierrec/lz4/v4 to v4.1.17 ([#79](#79)) ([fbc42d4](fbc42d4)) * **deps:** Update module github.com/stretchr/testify to v1.8.2 ([#80](#80)) ([b080141](b080141)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Blocked by cloudquery/plugin-sdk#724
but ready for initial review and discussion. I can do a short walkthrough for anyone up for review.
Apache arrow fork is here (We use it until this is merged): https://github.com/cloudquery/arrow/tree/feat_extension_builder.
Some more notes: