Skip to content

changefeedccl: support arrays in avro encoding#64251

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
HonoreDB:avro_array
May 4, 2021
Merged

changefeedccl: support arrays in avro encoding#64251
craig[bot] merged 1 commit intocockroachdb:masterfrom
HonoreDB:avro_array

Conversation

@HonoreDB
Copy link
Copy Markdown
Contributor

Arrays are one of a few remaining types we don't support
in avro changefeeds. This serializes them.
Sadly can't use @miretskiy's new optimization since we
need to load multiple maps with the same key into an array.

Release note (bug fix): Changefeeds on tables with array columns support avro

@HonoreDB HonoreDB requested a review from miretskiy April 27, 2021 01:12
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB and @miretskiy)

a discussion (no related file):
It'll be nice to make the list of unsupported types even smaller.

I've held off on approving since it looks like there are some related test failures but have left some relatively minor inline comments.



pkg/ccl/changefeedccl/avro.go, line 173 at r1 (raw file):

func typeToAvroSchema(typ *types.T, reuseMap bool) (*avroSchemaField, error) {
	schema := &avroSchemaField{
		typ: typ,

naming: A bit longer but sqlType might be more clear here


pkg/ccl/changefeedccl/avro.go, line 461 at r1 (raw file):

		)
	case types.ArrayFamily:
		itemSchema, err := typeToAvroSchema(typ.InternalType.ArrayContents, false /*reuse map*/)

Might be useful to comment here on why we can't reuse the map


pkg/ccl/changefeedccl/avro.go, line 485 at r1 (raw file):

			},
			func(x interface{}) (tree.Datum, error) {
				datumArr := tree.NewDArray(itemSchema.typ)

Interesting, looks like we don't have a NewDArrayWithCapacity or anything like that.


pkg/ccl/changefeedccl/avro_test.go, line 731 at r1 (raw file):

func BenchmarkEncodeIntArray(b *testing.B) {
	benchmarkEncodeType(b, types.IntArray, randEncDatumRow(types.IntArray))

Thanks for adding this benchmark! I noticed that there are some subtests in TestAvroSchema that might be potentials for updating as well, such as type_goldens and value_goldens?


pkg/ccl/changefeedccl/encoder_test.go, line 413 at r1 (raw file):

		sqlDB := sqlutils.MakeSQLRunner(db)
		sqlDB.Exec(t, `CREATE TABLE foo (a INT PRIMARY KEY, b INT[])`)
		var ts1 string

Is ts1 used for anything? If not perhaps we can remove it and use sqlDB.Exec below without the returning clause.

@miretskiy miretskiy requested a review from stevendanna April 27, 2021 11:29
Copy link
Copy Markdown
Contributor

@miretskiy miretskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB and @stevendanna)


pkg/ccl/changefeedccl/avro.go, line 461 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

Might be useful to comment here on why we can't reuse the map

It is sad that we can't reuse the map. I wonder though what the benchmark says. Arrays might not be a big enough deal to warrant this, but one idea is to have a pool of maps allocated for each type. Again, not sure if worth it, but could be benchmarked.


pkg/ccl/changefeedccl/avro_test.go, line 731 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

Thanks for adding this benchmark! I noticed that there are some subtests in TestAvroSchema that might be potentials for updating as well, such as type_goldens and value_goldens?

What are the numbers here? Allocations?

Copy link
Copy Markdown
Contributor Author

@HonoreDB HonoreDB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB and @stevendanna)


pkg/ccl/changefeedccl/avro.go, line 173 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

naming: A bit longer but sqlType might be more clear here

typ gets used a lot in existing code for values of type *types.T, so I'd rather keep in line with that.


pkg/ccl/changefeedccl/avro.go, line 461 at r1 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

It is sad that we can't reuse the map. I wonder though what the benchmark says. Arrays might not be a big enough deal to warrant this, but one idea is to have a pool of maps allocated for each type. Again, not sure if worth it, but could be benchmarked.

Added a code comment above.


pkg/ccl/changefeedccl/avro.go, line 485 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

Interesting, looks like we don't have a NewDArrayWithCapacity or anything like that.

I'm sure there's a way, but this function isn't used outside of tests so no need to optimize it for large arrays.


pkg/ccl/changefeedccl/avro_test.go, line 731 at r1 (raw file):

Previously, miretskiy (Yevgeniy Miretskiy) wrote…

What are the numbers here? Allocations?

The benchmark gives 13 allocs/op, for a random distribution of array sizes from 0 to 9 (whatever weird distribution is given by for i := 0; i < rng.Intn(10); i++). Seems acceptable for now?

Added arrays to the golden values.


pkg/ccl/changefeedccl/encoder_test.go, line 413 at r1 (raw file):

Previously, stevendanna (Steven Danna) wrote…

Is ts1 used for anything? If not perhaps we can remove it and use sqlDB.Exec below without the returning clause.

Done.

@HonoreDB HonoreDB force-pushed the avro_array branch 4 times, most recently from d6c6726 to 4a8a6f9 Compare April 29, 2021 20:18
@HonoreDB
Copy link
Copy Markdown
Contributor Author

HonoreDB commented Apr 30, 2021

No urgency, but I assert that this is ready for re-review. Note that I've changed the schema format slightly to work around a difference between confluent_schema_registry and goavro's schema parsing...the new format is terser and works with all existing types, so I'm not too bothered, but still opened an Avro issue.

Copy link
Copy Markdown
Contributor

@miretskiy miretskiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 4 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @HonoreDB and @stevendanna)

@HonoreDB
Copy link
Copy Markdown
Contributor Author

HonoreDB commented May 4, 2021

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented May 4, 2021

Build failed:

Arrays are one of a few remaining types we don't support
in avro changefeeds. This serializes them.
Sadly can't use @miretskiy's new optimization since we
need to load multiple maps with the same key into an array.

Release note (bug fix): Changefeeds on tables with array columns support avro
@HonoreDB
Copy link
Copy Markdown
Contributor Author

HonoreDB commented May 4, 2021

bors r+

@craig craig bot merged commit 181dd6f into cockroachdb:master May 4, 2021
@craig
Copy link
Copy Markdown
Contributor

craig bot commented May 4, 2021

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants