Skip to content

changefeedccl: prevent deadlock in TestChangefeedKafkaMessageTooLarge#107752

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
jayshrivastava:deflake-kafka-message-too-large
Jul 28, 2023
Merged

changefeedccl: prevent deadlock in TestChangefeedKafkaMessageTooLarge#107752
craig[bot] merged 1 commit intocockroachdb:masterfrom
jayshrivastava:deflake-kafka-message-too-large

Conversation

@jayshrivastava
Copy link
Copy Markdown
Contributor

@jayshrivastava jayshrivastava commented Jul 27, 2023

Previously, this test would deadlock due to kafka retrying messages too many times. These messages are stored in a buffer of size 1024 created by the CDC testing infra:

feedCh := make(chan *sarama.ProducerMessage, 1024)

The test asserts that 2000 messages pass through the buffer. When the test finishes, it stops reading from the buffer. The problem is that due to retries, there may be more messages sent to the buffer than that are read out of the buffer. Even after the 2000 messages are read and the test is shutting down, the sink may be blocked trying to put resolved messages (plus retries) in the buffer. If this happens, the changefeed resumer (same goroutine as the kafka sink) gets blocked and does not terminate when the job is cancelled at the end of the test.

This change caps the number of retries at 200 for this test, so there should be no more than 200 extra messages plus a few resolved messages during this test. This is far less than the buffer size of 1024.

See detailed explanation in #107591.

Fixes: #107591
Epic: none
Release note: None

@jayshrivastava jayshrivastava requested a review from a team as a code owner July 27, 2023 20:38
@jayshrivastava jayshrivastava requested review from HonoreDB and removed request for a team July 27, 2023 20:38
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@jayshrivastava jayshrivastava force-pushed the deflake-kafka-message-too-large branch from ff789aa to 284f2b2 Compare July 27, 2023 20:39
@jayshrivastava jayshrivastava force-pushed the deflake-kafka-message-too-large branch from 284f2b2 to fa37eaf Compare July 27, 2023 20:49
@jayshrivastava jayshrivastava force-pushed the deflake-kafka-message-too-large branch from fa37eaf to a8f825f Compare July 27, 2023 21:01
Previously, this test would deadlock due to kafka retrying messages
too many times. These messages are stored in a buffer
of size 1024 created by the CDC testing infra: https://github.com/cockroachdb/cockroach/blob/5c3f96d38cdc3a2d953ca3ffb1e39e97d7e5110e/pkg/ccl/changefeedccl/testfeed_test.go#L1819

The test asserts that 2000 messages pass through the buffer.
When the test finishes, it stops reading from the buffer. The problem
is that due to retries, there may be more messages sent to the buffer
than that are read out of the buffer. Even after the 2000 messages
are read and the test is shutting down, the sink may be blocked
trying to put resolved messages (plus retries) in the buffer.
If this happens, the changefeed resumer (same goroutine as the kafka sink)
gets blocked and does not terminate when the job is cancelled
at the end of the test.

This change caps the number of retries at 200 for this test, so there
should be no more than 200 extra messages plus a few resolved messages
during this test. This is far less than the buffer size of 1024.

See detailed explanation in cockroachdb#107591.

Fixes: cockroachdb#107591
Epic: none
Release note: None
@jayshrivastava jayshrivastava force-pushed the deflake-kafka-message-too-large branch from a8f825f to 4f1e340 Compare July 27, 2023 21:02
@jayshrivastava
Copy link
Copy Markdown
Contributor Author

bors r=miretskiy

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 28, 2023

This PR was included in a batch that was canceled, it will be automatically retried

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jul 28, 2023

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

changefeedccl: flake in TestChangefeedKafkaMessageTooLarge

3 participants