changefeedccl: support arrays in avro encoding#64251
changefeedccl: support arrays in avro encoding#64251craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
stevendanna
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB and @miretskiy)
a discussion (no related file):
It'll be nice to make the list of unsupported types even smaller.
I've held off on approving since it looks like there are some related test failures but have left some relatively minor inline comments.
pkg/ccl/changefeedccl/avro.go, line 173 at r1 (raw file):
func typeToAvroSchema(typ *types.T, reuseMap bool) (*avroSchemaField, error) { schema := &avroSchemaField{ typ: typ,
naming: A bit longer but sqlType might be more clear here
pkg/ccl/changefeedccl/avro.go, line 461 at r1 (raw file):
) case types.ArrayFamily: itemSchema, err := typeToAvroSchema(typ.InternalType.ArrayContents, false /*reuse map*/)
Might be useful to comment here on why we can't reuse the map
pkg/ccl/changefeedccl/avro.go, line 485 at r1 (raw file):
}, func(x interface{}) (tree.Datum, error) { datumArr := tree.NewDArray(itemSchema.typ)
Interesting, looks like we don't have a NewDArrayWithCapacity or anything like that.
pkg/ccl/changefeedccl/avro_test.go, line 731 at r1 (raw file):
func BenchmarkEncodeIntArray(b *testing.B) { benchmarkEncodeType(b, types.IntArray, randEncDatumRow(types.IntArray))
Thanks for adding this benchmark! I noticed that there are some subtests in TestAvroSchema that might be potentials for updating as well, such as type_goldens and value_goldens?
pkg/ccl/changefeedccl/encoder_test.go, line 413 at r1 (raw file):
sqlDB := sqlutils.MakeSQLRunner(db) sqlDB.Exec(t, `CREATE TABLE foo (a INT PRIMARY KEY, b INT[])`) var ts1 string
Is ts1 used for anything? If not perhaps we can remove it and use sqlDB.Exec below without the returning clause.
miretskiy
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB and @stevendanna)
pkg/ccl/changefeedccl/avro.go, line 461 at r1 (raw file):
Previously, stevendanna (Steven Danna) wrote…
Might be useful to comment here on why we can't reuse the map
It is sad that we can't reuse the map. I wonder though what the benchmark says. Arrays might not be a big enough deal to warrant this, but one idea is to have a pool of maps allocated for each type. Again, not sure if worth it, but could be benchmarked.
pkg/ccl/changefeedccl/avro_test.go, line 731 at r1 (raw file):
Previously, stevendanna (Steven Danna) wrote…
Thanks for adding this benchmark! I noticed that there are some subtests in TestAvroSchema that might be potentials for updating as well, such as type_goldens and value_goldens?
What are the numbers here? Allocations?
HonoreDB
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB and @stevendanna)
pkg/ccl/changefeedccl/avro.go, line 173 at r1 (raw file):
Previously, stevendanna (Steven Danna) wrote…
naming: A bit longer but sqlType might be more clear here
typ gets used a lot in existing code for values of type *types.T, so I'd rather keep in line with that.
pkg/ccl/changefeedccl/avro.go, line 461 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
It is sad that we can't reuse the map. I wonder though what the benchmark says. Arrays might not be a big enough deal to warrant this, but one idea is to have a pool of maps allocated for each type. Again, not sure if worth it, but could be benchmarked.
Added a code comment above.
pkg/ccl/changefeedccl/avro.go, line 485 at r1 (raw file):
Previously, stevendanna (Steven Danna) wrote…
Interesting, looks like we don't have a NewDArrayWithCapacity or anything like that.
I'm sure there's a way, but this function isn't used outside of tests so no need to optimize it for large arrays.
pkg/ccl/changefeedccl/avro_test.go, line 731 at r1 (raw file):
Previously, miretskiy (Yevgeniy Miretskiy) wrote…
What are the numbers here? Allocations?
The benchmark gives 13 allocs/op, for a random distribution of array sizes from 0 to 9 (whatever weird distribution is given by for i := 0; i < rng.Intn(10); i++). Seems acceptable for now?
Added arrays to the golden values.
pkg/ccl/changefeedccl/encoder_test.go, line 413 at r1 (raw file):
Previously, stevendanna (Steven Danna) wrote…
Is ts1 used for anything? If not perhaps we can remove it and use sqlDB.Exec below without the returning clause.
Done.
d6c6726 to
4a8a6f9
Compare
|
No urgency, but I assert that this is ready for re-review. Note that I've changed the schema format slightly to work around a difference between confluent_schema_registry and goavro's schema parsing...the new format is terser and works with all existing types, so I'm not too bothered, but still opened an Avro issue. |
miretskiy
left a comment
There was a problem hiding this comment.
Reviewed 1 of 4 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB and @stevendanna)
|
bors r+ |
|
Build failed: |
Arrays are one of a few remaining types we don't support in avro changefeeds. This serializes them. Sadly can't use @miretskiy's new optimization since we need to load multiple maps with the same key into an array. Release note (bug fix): Changefeeds on tables with array columns support avro
|
bors r+ |
|
Build succeeded: |
Arrays are one of a few remaining types we don't support
in avro changefeeds. This serializes them.
Sadly can't use @miretskiy's new optimization since we
need to load multiple maps with the same key into an array.
Release note (bug fix): Changefeeds on tables with array columns support avro