cdc: potentially dropping events during initial scan #123371
Labels: A-cdc, C-bug, C-technical-advisory, GA-blocker, O-support, P-1, T-cdc, branch-release-22.2, branch-release-23.1, branch-release-23.2, branch-release-24.1, v23.1.22, v23.1.23, v23.2.5, v23.2.6, v24.1.1
Description
Originally, when a job resumed, every span's initial resolved timestamp was zero: initial resolved timestamps are set to the initial high-water mark, which remains zero until the initial scan completes. However, since afb95b1, we reload checkpoint timestamps instead of setting them all to zero at the start. PR #102717, which introduced a mechanism to reduce duplicate messages by reloading job progress upon resuming, greatly increased the likelihood of this bug. The result can be dropped events, because during initial scans we now sometimes construct a new frontier with all spans forwarded to `initialResolved`:

cockroach/pkg/ccl/changefeedccl/changefeed_processors.go, lines 528 to 535 in df19639

```go
for _, watch := range ca.spec.Watches {
	if initialHighWater.IsEmpty() || watch.InitialResolved.Less(initialHighWater) {
		initialHighWater = watch.InitialResolved
	}
	spans = append(spans, watch.Span)
}
ca.frontier, err = makeSchemaChangeFrontier(initialHighWater, spans...)
```
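It may help to see how the loop above behaves once some spans carry reloaded checkpoint timestamps while others are still zero. The following is a minimal, self-contained Go model of that loop only, not the actual changefeed code. The `ts` type is a hypothetical stand-in for `hlc.Timestamp` (assumed to compare by wall time only and to be "empty" when zero), and `watch` and `initialHighWater` are hypothetical names. Because an empty running value is always overwritten by the next watch's timestamp, the result is a true minimum only when every `InitialResolved` is non-zero. With a mix of zero and non-zero values, the result depends on the order of the watches.

```go
package main

import "fmt"

// ts is a hypothetical stand-in for hlc.Timestamp: only a wall time,
// where the zero value means "empty".
type ts int64

func (t ts) IsEmpty() bool  { return t == 0 }
func (t ts) Less(o ts) bool { return t < o }

// watch is a hypothetical stand-in for a span watch in the processor spec.
type watch struct {
	Span            string
	InitialResolved ts
}

// initialHighWater mirrors the loop from changefeed_processors.go: it keeps
// the smallest InitialResolved seen so far, except that an empty (zero)
// running value is always replaced by the next watch's timestamp.
func initialHighWater(watches []watch) ts {
	var hw ts
	for _, w := range watches {
		if hw.IsEmpty() || w.InitialResolved.Less(hw) {
			hw = w.InitialResolved
		}
	}
	return hw
}

func main() {
	// Span "b" was restored from a checkpoint at t=10; span "a" is still at zero.
	fmt.Println(initialHighWater([]watch{{"a", 0}, {"b", 10}})) // 10
	fmt.Println(initialHighWater([]watch{{"b", 10}, {"a", 0}})) // 0
	// With every span at zero, the result is zero, as before afb95b1.
	fmt.Println(initialHighWater([]watch{{"a", 0}, {"b", 0}})) // 0
}
```

In the first case the zero-valued span "a" is effectively ignored, so a frontier built from this value would start every span, including "a", at t=10.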
Jira issue: CRDB-38432