GH-43624: [Go] Add JSON/UUID extension types, extend arrow -> parquet logical type mapping#43679
GH-43624: [Go] Add JSON/UUID extension types, extend arrow -> parquet logical type mapping#43679joellubi merged 21 commits intoapache:mainfrom
Conversation
|
Thanks for the thorough review @pitrou. The UUID extension was moved from a previously |
pitrou
left a comment
There was a problem hiding this comment.
With my very basic knowledge of Go, this looks ok to me. I'd rather @zeroshade vet it though :-)
go/arrow/extensions/json.go
Outdated
|
|
||
| // GetOneForMarshal implements arrow.Array. | ||
| func (a *JSONArray) GetOneForMarshal(i int) interface{} { | ||
| return a.Value(i) |
There was a problem hiding this comment.
should this instead use ValueJSON and return the json.RawMessage instead?
There was a problem hiding this comment.
Yes that makes more sense!
go/arrow/extensions/json_test.go
Outdated
| expectedJSON := strings.ReplaceAll(` | ||
| {"json":"foobar"} | ||
| {"json":null} | ||
| {"json":{"foo":"bar"}} | ||
| {"json":42} | ||
| {"json":true} | ||
| {"json":[1,true,"3",null,{"five":5}]}`, | ||
| "\t", "") + "\n" // strip indentation, add trailing newline | ||
| require.Equal(t, expectedJSON, buf.String()) |
There was a problem hiding this comment.
you could probably use JSONEq instead of needing to do the ReplaceAll
There was a problem hiding this comment.
RecordToJSON returns jsonlines rather than json itself, but it is still cleaner to do a line-by-line JSONEq comparison. I just added that.
| *array.ExtensionBuilder | ||
| } | ||
|
|
||
| func NewUUIDBuilder(mem memory.Allocator) *UUIDBuilder { |
There was a problem hiding this comment.
I must have missed that, just added one!
zeroshade
left a comment
There was a problem hiding this comment.
Just a couple nits to address first, otherwise I'm good with this
|
After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit d47b305. There were no benchmark performance regressions. 🎉 The full Conbench report has more details. It also includes information about 13 possible false positives for unstable benchmarks that are known to sometimes produce them. |
#### Summary Goes with cloudquery/plugin-pb-go#445 Interesting bit is https://arrow.apache.org/blog/2024/10/23/arrow-go-18.0.0-release/#canonical-extension-types that we might be able to drop our UUID and JSON own extension types, but probably better to do that separately Had to do 3ae4778 to get the tests working, followed apache/arrow#43679 as a reference ---
Rationale for this change
JSONextension type implementation.What changes are included in this PR?
UUIDTypefrominternaltoarrow/extensionsJSONcanonical extension typeCustomParquetTypeinterface so extension types can specify their targetLogicalTypein ParquetfieldToNodeto split upPrimitiveNodetype mapping for leaves fromGroupNodecompositionLogicalTypeto use only value receiversAre these changes tested?
Yes
Are there any user-facing changes?
UUIDandJSONextension types are available to end users.CustomParquetTypeinterface to control Parquet conversion without needing to fork or upstream the change.