streamingccl: improvements to the random stream test client#59441
streamingccl: improvements to the random stream test client#59441craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
da8ca88 to
645dcec
Compare
pbardea
left a comment
There was a problem hiding this comment.
Updated with what we talked about offline to track in the PR.
Reviewed 3 of 14 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @adityamaru)
pkg/ccl/changefeedccl/cdctest/validator.go, line 38 at r1 (raw file):
// StreamClientValidatorWrapper wraps a Validator and exposes additional methods // used by stream ingestion to check for correctness. type StreamClientValidatorWrapper interface {
Making a note, although I think this is addressed in the follow up PR:
Would it make sense to move this somewhere within streamingccl? If we do, does it make sense for this to just be a struct that embeds cdctest.Validator?
pkg/ccl/streamingccl/streamclient/random_stream_client.go, line 216 at r1 (raw file):
} //// Generate namespace entry.
nit: double comment
pkg/ccl/streamingccl/streamclient/random_stream_client.go, line 295 at r1 (raw file):
systemKVs = systemKVs[1:] } else { // Generate a duplicate KVEvent, and update its timestamp to now().
CDC may emit exact duplicate of keys, don't we want to keep the timestamp the same as the original event?
pkg/ccl/streamingccl/streamingest/stream_ingestion_processor.go, line 73 at r1 (raw file):
// partitionEvent augments a normal event with the partition it came from. type partitionEvent struct {
As discussed offline, I think it makes sense to only wrap the event with the partition it came from when needed, and extend the interceptor interface to accept the partition addr.
6ab8ffa to
7ae44ad
Compare
adityamaru
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @pbardea)
pkg/ccl/changefeedccl/cdctest/validator.go, line 38 at r1 (raw file):
Previously, pbardea (Paul Bardea) wrote…
Making a note, although I think this is addressed in the follow up PR:
Would it make sense to move this somewhere within
streamingccl? If we do, does it make sense for this to just be a struct that embedscdctest.Validator?
Yep, I'll be sure to move it in the next PR.
pkg/ccl/streamingccl/streamclient/random_stream_client.go, line 216 at r1 (raw file):
Previously, pbardea (Paul Bardea) wrote…
nit: double comment
done.
pkg/ccl/streamingccl/streamclient/random_stream_client.go, line 295 at r1 (raw file):
Previously, pbardea (Paul Bardea) wrote…
CDC may emit exact duplicate of keys, don't we want to keep the timestamp the same as the original event?
done, also set the key to nil every time we encounter a resolved ts to ensure we don't emit a duplicate with a ts less than the resolved ts.
pkg/ccl/streamingccl/streamingest/stream_ingestion_processor.go, line 73 at r1 (raw file):
Previously, pbardea (Paul Bardea) wrote…
As discussed offline, I think it makes sense to only wrap the event with the partition it came from when needed, and extend the interceptor interface to accept the partition addr.
Done.
7ae44ad to
37fcaf5
Compare
This change improves on the random stream client to allow for better testing of the various components of the stream ingestion job. Specifically: - Adds support for specifying number of partitions. For simplicity, a partition generates KVs for a particular table span. - Generates system KVs (descriptor and namespace) KVs, as the first two KVs on the partition stream. I played around with the idea of having a separate "system" and "table data" partition, but the code and tests became more convoluted, compared to the current approach. - Hookup the CDC orderValidator to the random stream client's output. This gives us some guarantees that the data being generated is semantically correct. - Maintain an in-memory copy of all the streamed events, that can be efficiently queried. This allows us to compare the ingested KVs to the streamed KVs and gain more confidence in our pipeline. Release note: None
37fcaf5 to
8247919
Compare
|
TFTR! bors r=pbardea |
|
Build succeeded: |
This change improves on the random stream client to allow for better
testing of the various components of the stream ingestion job.
Specifically:
Adds support for specifying number of partitions. For simplicity,
a partition generates KVs for a particular table span.
Generates system KVs (descriptor and namespace) KVs, as the first two
KVs on the partition stream. I played around with the idea of having a
separate "system" and "table data" partition, but the code and tests
became more convoluted, compared to the current approach.
Hookup the CDC orderValidator to the random stream client's output.
This gives us some guarantees that the data being generated is
semantically correct.
Maintain an in-memory copy of all the streamed events, that can be
efficiently queried. This allows us to compare the ingested KVs to the
streamed KVs and gain more confidence in our pipeline.
Infroms: #59175
Release note: None