[Go] Extension Builder Interface #34453

@yevgenypats

Describe the enhancement requested

(This was also sent to the mailing list and discussed briefly with @zeroshade.) Below is a copy of what was sent to the mailing list; a PR will follow shortly:

Hopefully this is the right place to ask. As some background, I'm Yevgeny Pats, Founder @ CloudQuery. We are very interested in migrating our protocol and Go type system to Apache Arrow. Extensions are a critical part of this for us, so I have the following questions about whether this is a usage problem on my end or something that is not yet available. I'll give an example in Go here, but I believe the same issue exists in all libraries/languages.

Here is a public GitHub gist.

What are the problems:

  • The problems are around the abstraction for extension types. While I understand that the underlying storage needs to be supported in the library, there is no way for an extension to provide its own builder, which means the user needs to know how the extension type stores its values inside the binary storage. This creates a leaky abstraction and the need for various helper functions like UUIDToBinary.
  • The other direction is fine, as you can have methods like ToUUID on top of the extension array. But this creates an asymmetry in the abstraction.
  • Because we don't control the builder for extensions, the problem creeps into other places like JSON and CSV, where we can't control marshalling (in the way we can for all built-in types). So for extensions that use a binary type as the underlying storage, JSON and CSV values will always be encoded as base64, which is not very useful (think of a UUID, IP address, or MAC address).

The main point is that I think the right abstraction for extensions should provide all the APIs (type, array, builder) just like built-in types; otherwise the abstraction is incomplete or "leaky". Of course we can still have limitations, such as requiring the custom builder to use a known underlying storage (so that it works over IPC), but the extension can still control various other aspects such as marshalling, unmarshalling, building, and so on.

Hopefully this gives enough context, but I would be happy to elaborate.

Thanks,
Yevgeny

Component(s)

Go
