rfc: Change Data Capture #25229
Conversation
|
There are a number of open questions and a good amount of polish left, but this is ready for initial feedback. I'm particularly looking for feedback on the "Guide-level explanation", since that's what we need to nail down first for the prototype. |
|
Thanks for writing this up. I like the fact that the proposal is comprehensive yet no more complex than strictly necessary for the given scope. I do not see major remaining points of discussion (although I'm not yet primed enough on the technical topics to be a suitable in-depth reviewer). From my POV, it's a good proposal already as-is. I have looked at the proposed SQL syntax only superficially. I decided to refrain from scrutinizing it and engaging in a fine-tuning discussion about the syntax, because this is a new feature with unclear precedent in other SQL engines (i.e. no clear example to follow), and the first iteration will give us experience with how to improve the UX in later iterations. Meanwhile, I didn't find anything sticking out that would compel immediate discussion anyway. I would just like to suggest avoiding the introduction of new ways to specify values, and sticking to the existing scalar expression format if at all possible; see my suggestion below. Also one unanswered question, see below. Reviewed 1 of 1 files at r1. docs/RFCS/20180501_change_data_capture.md, line 34 at r1 (raw file):
I'll restate my opinion here that what you are defining in this RFC is not jobs but DB-level background services (or "tasks" or "processes"). A job conceptually has a finite, initially bounded amount of work to do, and thus a predictable end time. Changefeeds do not have any of this. There's to my knowledge no historical precedent for extending the term "job" in that way and I predict it will cause concerns. As always, terminology is important. Conflating many notions behind the same words means that it becomes hard to understand what we're talking about in conversations. docs/RFCS/20180501_change_data_capture.md, line 56 at r1 (raw file):
Can you clarify which order is meant here? I think it's in order of MVCC timestamps. Also, if you do emit in MVCC order and multiple KV pairs have the same timestamp, do you want to guarantee the partial order, like MVCC timestamp, then key? docs/RFCS/20180501_change_data_capture.md, line 103 at r1 (raw file):
I'd suggest that if you want to give names to jobs, that this becomes a separate, orthogonal project (would also benefit other types of jobs). docs/RFCS/20180501_change_data_capture.md, line 135 at r1 (raw file):
I'd recommend the value part of each option/value pair to follow SQL expression syntax. For examples quote words like SQL string literals. This will allow users to compute the values from SQL expressions (and possibly later placeholders). docs/RFCS/20180501_change_data_capture.md, line 154 at r1 (raw file):
The RFC could provide a super-short intro to Avro to provide a little more context to the example. docs/RFCS/20180501_change_data_capture.md, line 236 at r1 (raw file):
This change is a symptom of the "job" concept being ill-suited for this. The price of breaking the abstraction is not trivial, because this change also requires updates to the web UI and possibly other tools. docs/RFCS/20180501_change_data_capture.md, line 385 at r1 (raw file):
I'd suggest having truncate trigger a function in the changefeed that will create changefeed events for all the rows currently in the truncated table. This can be done fully asynchronously since these rows will not change in KV after TRUNCATE completes (the truncate generates a new table ID). docs/RFCS/20180501_change_data_capture.md, line 473 at r1 (raw file):
How to dump/restore or backup/restore all currently configured changefeeds. I'd argue that changefeeds are much more like database objects than jobs. I'd imagine them to have descriptors in the schema, and be subject to the same shared code path as other descriptors. Including backup/restore, support in Comments from Reviewable |
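The suggestion above, that option/value pairs follow SQL expression syntax with quoted string literals, could be sketched roughly as follows. The statement shape and option names here are hypothetical, since the RFC's syntax was still in flux:

```sql
-- Hypothetical sketch: option values written as ordinary SQL string
-- literals rather than bare words, so they can later be computed by
-- scalar expressions or supplied via placeholders.
CREATE CHANGEFEED FOR TABLE orders
  INTO 'kafka://broker:9092'
  WITH format = 'avro', envelope = 'key_only';
```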
|
Review status: all files reviewed at latest revision, 8 unresolved discussions. docs/RFCS/20180501_change_data_capture.md, line 94 at r1 (raw file):
Summary of our discussion this morning:
In that discussion we also touched on whether to offer PAUSE CHANGEFEED in addition to PAUSE JOB. I don't have a firm opinion on that. We probably want to, again for consistency in the docs. For this too, you can desugar in the parser.
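As a sketch of the desugaring idea (the job ID and statement forms are illustrative, not settled syntax):

```sql
-- Hypothetical: the parser rewrites the changefeed-specific statement
-- into the generic job statement, so the two behave identically.
PAUSE CHANGEFEED 12345;
-- desugars in the parser to:
PAUSE JOB 12345;
```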
Comments from Reviewable |
|
Reviewed 1 of 1 files at r1. docs/RFCS/20180501_change_data_capture.md, line 62 at r1 (raw file):
This doesn't meet my definition of "in order". What good is this guarantee if you still have to remember that you've seen V3 so you don't overwrite it with the replayed V2? docs/RFCS/20180501_change_data_capture.md, line 77 at r1 (raw file):
You mean the timestamp of the version of the row, or could the record emitted for row version T1 contain some later timestamp T2 because it is known that another version has committed after T2? docs/RFCS/20180501_change_data_capture.md, line 81 at r1 (raw file):
How does this interact with the at-least-once replays? Can we guarantee that records will never be replayed across a timestamp closure boundary, or do we just guarantee that any record T1 that is received after a closed timestamp T2 is a duplicate of some record that was emitted before the timestamp was closed? docs/RFCS/20180501_change_data_capture.md, line 91 at r1 (raw file):
Is this syntax your own invention or is it following the example of some existing system? Postgres has docs/RFCS/20180501_change_data_capture.md, line 127 at r1 (raw file):
For ease of testing, we may want to consider a docs/RFCS/20180501_change_data_capture.md, line 172 at r1 (raw file):
ALTER shouldn't leave the job in a paused state. If we can't alter a running job then that should either be an error or it should do a PAUSE/ALTER/UNPAUSE sequence. docs/RFCS/20180501_change_data_capture.md, line 190 at r1 (raw file):
The syntax above uses FROM x TO y syntax. It would be nice to have syntactic sugar for creating a changefeed for a partition instead of having to duplicate the partition definition there. docs/RFCS/20180501_change_data_capture.md, line 385 at r1 (raw file): Previously, knz (kena) wrote…
This needs user feedback; emitting a changefeed event for every existing row is very expensive and I'm not sure that's ever what someone would want. I'd expect the user to truncate the Elasticsearch table (for example) at the same time if they're going to truncate. I'd prefer the SQL Server behavior (disallow truncation of tables with changefeeds) unless we have more confirmation of what people expect here. docs/RFCS/20180501_change_data_capture.md, line 391 at r1 (raw file):
I'm fine with making changefeeds admin-only for v1 since they have a potentially large impact on the cluster, but eventually we should allow non-admin users to create changefeeds, and admins should be able to revoke those users' permissions (breaking/cancelling the feeds) docs/RFCS/20180501_change_data_capture.md, line 405 at r1 (raw file):
We could keep the configuration complexity out of the docs/RFCS/20180501_change_data_capture.md, line 470 at r1 (raw file):
Only the final value should be visible. Comments from Reviewable |
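The partition sugar suggested above might look like the following; this is hypothetical syntax, and `north_america` is an assumed partition name:

```sql
-- As proposed: the partition's bounds must be restated.
CREATE CHANGEFEED FOR TABLE orders FROM x TO y
  INTO 'kafka://broker:9092';

-- Hypothetical sugar: reference the existing partition by name
-- instead of duplicating its definition.
CREATE CHANGEFEED FOR TABLE orders PARTITION north_america
  INTO 'kafka://broker:9092';
```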
|
@jordanlewis and/or @arjunravinarayan can you look at the distsql part of this please? I made it intentionally very simple for this first version of CDC, but I’d still like more eyes to make sure I didn’t miss something. Thanks for the reviews so far! I’ll do a big pass to address comments after all the initial feedback is in. |
|
Review status: all files reviewed at latest revision, 19 unresolved discussions. docs/RFCS/20180501_change_data_capture.md, line 56 at r1 (raw file):
Yes, that's correct.
Dan can correct me if I'm wrong, but I think the idea is that no two mvcc values will ever have the same timestamp at the same key. If two values do have the same timestamp then they necessarily will be at different keys. In this situation, no ordering is defined. I think this is what is implied by "Cross-row or cross-table order guarantees are not given." docs/RFCS/20180501_change_data_capture.md, line 77 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
The former. The record will contain the timestamp of the corresponding MVCC value. docs/RFCS/20180501_change_data_capture.md, line 103 at r1 (raw file): Previously, knz (kena) wrote…
I agree that it should be a separate discussion. I also agree that it's something we should do. docs/RFCS/20180501_change_data_capture.md, line 154 at r1 (raw file): Previously, knz (kena) wrote…
+1, I think that would be helpful. docs/RFCS/20180501_change_data_capture.md, line 222 at r1 (raw file):
We'll want to test that this case gracefully fails without affecting foreground traffic. Can you also explain what happens if a sink falls so far behind that it can no longer spill anything else to disk? This may not necessarily imply that it has fallen behind by more than the gc threshold, so it isn't necessarily a failure state. However, there won't be any backpressure, so instead, we'll need to fall back to lots of catch-up scans after employing some kind of FIFO policy for on-disk buffered records. I think this means we'll want to tear down our change feed until the sink is able to catch back up. docs/RFCS/20180501_change_data_capture.md, line 295 at r1 (raw file):
I would change docs/RFCS/20180501_change_data_capture.md, line 311 at r1 (raw file):
Raft snapshots will also make this a complicated case to handle. Comments from Reviewable |
|
Reviewed 1 of 1 files at r1. docs/RFCS/20180501_change_data_capture.md, line 39 at r1 (raw file):
Maybe say that a current timestamp is chosen and that then the values at that timestamp are emitted. docs/RFCS/20180501_change_data_capture.md, line 56 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Rows are spread out over multiple KV pairs potentially, when there are column families. Say you have a row and first a write to docs/RFCS/20180501_change_data_capture.md, line 62 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
I'm also curious whether these semantics are workable for the consumers. docs/RFCS/20180501_change_data_capture.md, line 68 at r1 (raw file):
May need to be updated to docs/RFCS/20180501_change_data_capture.md, line 97 at r1 (raw file):
What if we used a URL here?
That avoids a bunch of Kafka-specific stuff that would otherwise become SQL syntax (?). docs/RFCS/20180501_change_data_capture.md, line 127 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
I like this idea. docs/RFCS/20180501_change_data_capture.md, line 140 at r1 (raw file):
The docs/RFCS/20180501_change_data_capture.md, line 275 at r1 (raw file):
multiple docs/RFCS/20180501_change_data_capture.md, line 289 at r1 (raw file):
GDPR, though? The processor may already be outside of the required jurisdiction? I think this rules out the option of filtering on the columns you mention above. docs/RFCS/20180501_change_data_capture.md, line 311 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
More so than now? The leaseholder may not be the Raft leader and in particular may also receive a snapshot. docs/RFCS/20180501_change_data_capture.md, line 323 at r1 (raw file):
Ranges are 64 MB right now, so I assume your aggregator will have some limit on the parallelism and will chug through over time. But if we support larger ranges (which sounds like the plan), you may run into the situation in which you're buffering significant amounts of Raft log before you get to send it. Also, consider elaborating on the format in which the catch-up read is carried out. Using the KV interface naively would be pretty inefficient. You're going to want to stream out an SST or something like that. BACKUP/RESTORE will have many of these same problems once range sizes become too large to fit into RAM comfortably. The way I think this is reasonably addressed at the moment is to change the command that creates an SSTable to a streaming one, without changing the KV API. Instead, the response returns an endpoint at which the replica can be reached to stream out the SST. docs/RFCS/20180501_change_data_capture.md, line 325 at r1 (raw file):
May need update w.r.t column families. docs/RFCS/20180501_change_data_capture.md, line 328 at r1 (raw file):
What do the close notifications guarantee in this PR? If an intent is written at time T, can the timestamp be closed out before it is resolved? Those are the follower reads style close notifications. If you want to make sure that no intents are below the closed out timestamp, you need something that doesn't exist today. I get the impression that you don't want to build something new here, which I support. But I don't see how consumers would use this without building that state-massaging layer that takes into account the close notifications and open intents themselves, individually. That also seems unfortunate. The PR should treat the interplay between intents, records, and closed timestamps in more detail. docs/RFCS/20180501_change_data_capture.md, line 329 at r1 (raw file):
It's probably going to be Escaping problem in the markdown here. docs/RFCS/20180501_change_data_capture.md, line 330 at r1 (raw file):
... will be emitted. docs/RFCS/20180501_change_data_capture.md, line 385 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
What Ben said. Comments from Reviewable |
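The URL idea floated in this post could collapse the sink configuration into a single connection string, e.g. (the URI scheme and query parameters here are hypothetical):

```sql
-- Hypothetical: sink-specific knobs ride in the URI query string,
-- keeping Kafka-specific vocabulary out of the SQL grammar itself.
CREATE CHANGEFEED FOR TABLE orders
  INTO 'kafka://broker:9092?topic_prefix=cdc_&acks=all';
```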
|
Review status: all files reviewed at latest revision, 33 unresolved discussions. docs/RFCS/20180501_change_data_capture.md, line 97 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
+1, I'm into this idea. docs/RFCS/20180501_change_data_capture.md, line 203 at r1 (raw file):
We might want to consider making this variable based on total disk space, as well. If the cluster setting was set to 10 GB but there was only 1 GB left, it would be better to stall the changefeed than to let this disk buffer fill the disks completely. docs/RFCS/20180501_change_data_capture.md, line 248 at r1 (raw file):
Why 2 processors, if they're always going to be 1:1? It's more overhead (not much, but some) - I think you might be better served by a single processor and some helper functions. I think that would also fix your progress-information pass-back problem? Although, what's this coordinator you speak of? Is that something above the DistSQL flow, or in it? docs/RFCS/20180501_change_data_capture.md, line 282 at r1 (raw file):
What's this? I don't see any mention of it elsewhere. docs/RFCS/20180501_change_data_capture.md, line 385 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
We also have Comments from Reviewable |
|
Review status: all files reviewed at latest revision, 36 unresolved discussions. docs/RFCS/20180501_change_data_capture.md, line 127 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
+1 docs/RFCS/20180501_change_data_capture.md, line 203 at r1 (raw file):
Is the setting for the memory buffer size or the disk buffer size? If the former, like most of our other memory-related settings, we'll probably want this to be a command-line flag so it can be customized for nodes running on different hardware. If the latter, it may need to be a command-line parameter on each store. docs/RFCS/20180501_change_data_capture.md, line 222 at r1 (raw file):
This situation seems under-specified. Before we hit the gc threshold, what do we do when the in-memory/on-disk buffers fill up? It's different from the case above of an unavailable sink in that we can continue making progress so we wouldn't want to just stall. docs/RFCS/20180501_change_data_capture.md, line 367 at r1 (raw file):
Does this allow for manual re-partitioning as well? If so, how does changing the partitioning scheme of a live changefeed affect cockroach? Comments from Reviewable |
|
Reviewed 1 of 1 files at r1. Comments from Reviewable |
|
Thanks for the comments everyone! This doesn't address everything, but I'm flying today and wanted to get out what I had. Review status: 0 of 1 files reviewed at latest revision, 39 unresolved discussions. docs/RFCS/20180501_change_data_capture.md, line 34 at r1 (raw file): Previously, knz (kena) wrote…
Done. I avoided picking a name for the category because I think it's important but I don't yet have an opinion between services vs tasks vs processes vs something else. Do you? docs/RFCS/20180501_change_data_capture.md, line 39 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. docs/RFCS/20180501_change_data_capture.md, line 56 at r1 (raw file):
As nathan mentions, this is intentionally not done to reduce our buffering.
Good point. We need the per-row ordering to make the "easy case easy" (e.g. Elasticsearch with no buffering), so I think we have to buffer in the ChangeAggregator processor when a multi-family table is watched. We'll already have to build this for the pre-StreamState, so it's a performance hit but shouldn't be much more complexity. Added a note. docs/RFCS/20180501_change_data_capture.md, line 62 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Agreed that this is a confused usage of ordering, but we're pretty limited by what the sinks support. They tend to have limited exactly-once guarantees on the consumer side, so my current thought is there's not much we'd gain by doing the extra work to make it exactly-once on the producer side, when that's even possible. I think I see a way to do it using Kafka transactions, but a) I'll have to run some experiments to see what the perf overhead is and b) dunno how it will work on other sinks, on some of them it may be impossible to do exactly-once. docs/RFCS/20180501_change_data_capture.md, line 68 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
I think we need the row guarantees for usability (added this above), so this stays the same. docs/RFCS/20180501_change_data_capture.md, line 77 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Yeah, the former. Don't know what I was getting at with the previous wording docs/RFCS/20180501_change_data_capture.md, line 81 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
We can guarantee it if we synchronously update the job table before emitting a close notification. I added a note to the implementation section. docs/RFCS/20180501_change_data_capture.md, line 91 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
I hadn't even looked at the logical decoding syntax, since it is such a different model. The PUBLICATION/SUBSCRIPTION division could be a nice way to give users control over how to coalesce a large number of feeds, which is something I mention in the unresolved questions. I dislike inventing our own syntax, and it does seem like we could plug into CREATE PUBLICATION/CREATE SUBSCRIPTION as-is. My biggest hesitation is that we'd have a different docs/RFCS/20180501_change_data_capture.md, line 97 at r1 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
I like this too, though I think we should only use query parameters in the URI for things specific to the sink (so envelope and column_whitelist would remain docs/RFCS/20180501_change_data_capture.md, line 127 at r1 (raw file): Previously, a-robinson (Alex Robinson) wrote…
I also like this, the way I've written the unit tests in the prototype is brittle. Done. docs/RFCS/20180501_change_data_capture.md, line 135 at r1 (raw file): Previously, knz (kena) wrote…
Ah, yes. I intended this, just constructed the examples wrong. docs/RFCS/20180501_change_data_capture.md, line 140 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. docs/RFCS/20180501_change_data_capture.md, line 154 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. docs/RFCS/20180501_change_data_capture.md, line 172 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Unless it was paused before, in which case it's surprising for ALTER to also unpause, etc. I think we should do this, but there are enough UX edge cases that it's probably too low-level to detail in the RFC. docs/RFCS/20180501_change_data_capture.md, line 190 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Oops. I intended for there to be one, but omitted it. docs/RFCS/20180501_change_data_capture.md, line 236 at r1 (raw file): Previously, knz (kena) wrote…
We don't want to duplicate the system.jobs infrastructure but changed this to stop calling it a "job". docs/RFCS/20180501_change_data_capture.md, line 248 at r1 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
Cool, this sort of thing is exactly why we looped you in; I'd been assuming that the overhead is low. I was doing this in the interest of composable processors. As for the progress, IMPORT has taught us that it's better to do it from one place if possible. The coordinator is the job coordinator; it sets up the distsql flow and restarts it from the last checkpoint on error or node death. docs/RFCS/20180501_change_data_capture.md, line 275 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. docs/RFCS/20180501_change_data_capture.md, line 282 at r1 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
Renamed to match the terminology I've been using elsewhere. docs/RFCS/20180501_change_data_capture.md, line 289 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
We currently don't have anything that causes a DistSQL flow to place processors in a certain region. Added a note above to mention there is work to be done here once distsql supports it docs/RFCS/20180501_change_data_capture.md, line 295 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. docs/RFCS/20180501_change_data_capture.md, line 311 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
+1 I think we have to solve this regardless docs/RFCS/20180501_change_data_capture.md, line 325 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done docs/RFCS/20180501_change_data_capture.md, line 328 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
I've written this assuming that the docs/RFCS/20180501_change_data_capture.md, line 329 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. docs/RFCS/20180501_change_data_capture.md, line 330 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. docs/RFCS/20180501_change_data_capture.md, line 367 at r1 (raw file): Previously, a-robinson (Alex Robinson) wrote…
Done. docs/RFCS/20180501_change_data_capture.md, line 385 at r1 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
I was initially hesitant to let the presence of a changefeed on a table affect which DDL statements can be run on it, but after more thought nothing else seems to make sense. Because of our row ordering guarantee, the delete backfill could cause a tremendous amount of buffering. And relying on a schema change topic is fiddly and error-prone. In the absence of customer feedback otherwise, I'm taking Ben's suggestion and modeling our behavior after SQL Server. docs/RFCS/20180501_change_data_capture.md, line 391 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. docs/RFCS/20180501_change_data_capture.md, line 405 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
This is really an artifact of us having the equivalent of the "connector" inside cockroach, which we're forced to do because of our distributed nature. docs/RFCS/20180501_change_data_capture.md, line 470 at r1 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. Comments from Reviewable |
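For reference, the Postgres model discussed above splits the feature into two catalog objects. This is actual PostgreSQL 10 logical replication syntax, shown here only for comparison with the proposed changefeed statements:

```sql
-- On the publishing cluster:
CREATE PUBLICATION cdc_pub FOR TABLE orders;

-- On the subscribing cluster:
CREATE SUBSCRIPTION cdc_sub
  CONNECTION 'host=publisher dbname=mydb'
  PUBLICATION cdc_pub;
```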
|
No further comment at this time. Nice RFC so far. Reviewed 1 of 1 files at r2. docs/RFCS/20180501_change_data_capture.md, line 34 at r1 (raw file): Previously, danhhz (Daniel Harrison) wrote…
No opinion yet. docs/RFCS/20180501_change_data_capture.md, line 209 at r2 (raw file):
For UX and ease of analyzing/testing/troubleshooting I would recommend splitting the key column into a table key prefix and per-table key. Comments from Reviewable |
|
docs/RFCS/20180501_change_data_capture.md, line 248 at r2 (raw file):
How difficult would it be to stall the gc in such a case? cc @tschottdorf Comments from Reviewable |
|
Review status: all files reviewed at latest revision, 37 unresolved discussions. docs/RFCS/20180501_change_data_capture.md, line 248 at r1 (raw file): Previously, danhhz (Daniel Harrison) wrote…
I don't believe the overhead is very much, but also refactoring processors is not difficult, so I wouldn't preemptively start with two processors. We can always split them out into several processors later when we work on materialized views. You should feel free to keep this part as simple as possible, which to me sounds like 1 processor for now. docs/RFCS/20180501_change_data_capture.md, line 260 at r2 (raw file):
nit: incrementally updated materialized views. docs/RFCS/20180501_change_data_capture.md, line 300 at r2 (raw file):
It would be great if you could provide some additional insight as to how much better your life would be if DistSQL had fault tolerance and resilience built in, as that would help us organize our longer term roadmap. From this reading it sounds like you can work around it without too much trouble, which means we can continue to deprioritize it and kick that can down the road. Is that true? docs/RFCS/20180501_change_data_capture.md, line 313 at r2 (raw file):
One thing that comes up here is that lineage-based fault tolerance in DistSQL can be used to provide this (if you detect that you are now a remote processor, you can ask your parent to recreate their subflow from scratch, and then decommission you). This movement use-case also ties into the comment above on DistSQL fault tolerance. docs/RFCS/20180501_change_data_capture.md, line 316 at r2 (raw file):
@jordanlewis I believe this should not be that hard. Do we have an issue for tracking this piece of work? It might make for a good intern project (it's got the trifecta of being not too difficult, interesting, and not currently on the critical path for the next release). docs/RFCS/20180501_change_data_capture.md, line 483 at r2 (raw file):
What is the information necessary to reconstruct transactions? Is it just the HLC timestamp? Or is it the HLC timestamp along with the node on which the transaction master intent was? Comments from Reviewable |
|
Review status: all files reviewed at latest revision, 41 unresolved discussions. docs/RFCS/20180501_change_data_capture.md, line 316 at r2 (raw file): Previously, arjunravinarayan (Arjun Narayan) wrote…
Seems like a good idea to me. I don't think this is tracked. We probably need this for partitioned data as well, although maybe it happens automatically already by virtue of data locality? Comments from Reviewable |
|
Review status: all files reviewed at latest revision, 40 unresolved discussions. docs/RFCS/20180501_change_data_capture.md, line 311 at r1 (raw file):
Yeah, that's a good point. When we see a Raft snapshot, we won't know what changed because we don't have access to the individual Raft entries that constitute the delta between the Raft index before the snapshot and after. I don't think we can do much more than return an error on the change feed and force another catch-up scan. The downstream receiver can then use its most recent closed timestamp to bound this catch-up scan. docs/RFCS/20180501_change_data_capture.md, line 328 at r1 (raw file): Previously, danhhz (Daniel Harrison) wrote…
I talked to Dan about this in-person. I was under the impression that we still needed this "state-massaging layer" that buffered intents and only output committed KVs. The layer would also be in charge of translating follower reads style close timestamps to a "resolved timestamp". You say you're in support of not building anything new here, but it's not clear whether you think there's a way to avoid building this kind of translation/buffering layer. docs/RFCS/20180501_change_data_capture.md, line 248 at r2 (raw file): Previously, arjunravinarayan (Arjun Narayan) wrote…
This is something that we could do without too much issue, but I don't think we should actually expose this option. A general theme in this RFC is that CDC performance should not affect live traffic. With an option like this, a single CDC sink going down could have dramatic effects on the health of a user's entire database. On top of this, I don't think the dependency would be easy to explain, so I doubt many users would understand the risks until it started causing issues in Cockroach. We will want to document that if a user doesn't want to lose data in a CDC stream that's down for over 24 hours, they'll want to increase their GC TTL, but I don't think we should do anything more fancy than that. TL;DR: I think it would make it way too easy for users to shoot themselves in the foot. Comments from Reviewable |
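For context, the GC TTL mentioned here is a per-zone knob. In CockroachDB's (later) zone-configuration syntax, raising it to 48 hours for a watched table looks like this; the table name is illustrative:

```sql
-- Raise the GC TTL on the watched table to 48 hours, so a changefeed
-- sink can be down that long without losing history. The trade-off
-- is increased MVCC garbage retention on that table.
ALTER TABLE orders CONFIGURE ZONE USING gc.ttlseconds = 172800;
```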
|
Thanks for all the reviews, everyone! I think this is pretty much ready to go. PTAL Review status: 0 of 1 files reviewed at latest revision, 40 unresolved discussions, some commit checks pending. docs/RFCS/20180501_change_data_capture.md, line 62 at r1 (raw file): Previously, danhhz (Daniel Harrison) wrote…
I had a conversation with Nathan about this and realized that it's not possible to do what I was initially thinking. I've updated this with the new ordering guarantee docs/RFCS/20180501_change_data_capture.md, line 81 at r1 (raw file): Previously, danhhz (Daniel Harrison) wrote…
Turns out we can't guarantee that without kafka transactions. Updated this to what we can actually guarantee docs/RFCS/20180501_change_data_capture.md, line 91 at r1 (raw file): Previously, danhhz (Daniel Harrison) wrote…
I have gone on a journey and discovered that logical decoding and logical replication are different things! I've updated this with a discussion of why we invented our own syntax, though I don't have strong feelings on this, so if anyone has input I'm all ears. Also switched docs/RFCS/20180501_change_data_capture.md, line 94 at r1 (raw file): Previously, knz (kena) wrote…
Done, except I've moved column whitelist to the unresolved section because we don't yet have the distsql infrastructure to make it useful for gdpr and I need to cut scope where I can docs/RFCS/20180501_change_data_capture.md, line 103 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Cool. In that case, I'm going to remove this TODO docs/RFCS/20180501_change_data_capture.md, line 203 at r1 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
That's a cool idea, though I'm inclined to follow precedent of the existing docs/RFCS/20180501_change_data_capture.md, line 203 at r1 (raw file): Previously, a-robinson (Alex Robinson) wrote…
I added both to mirror the ones we have for sql processing, but perhaps this is overkill. docs/RFCS/20180501_change_data_capture.md, line 222 at r1 (raw file): Previously, a-robinson (Alex Robinson) wrote…
Reworked this (and split the details into the reference-level explanation). I still don't feel like the story here is totally figured out tbh docs/RFCS/20180501_change_data_capture.md, line 222 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Reworked this (and split the details into the reference-level explanation). I still don't feel like the story here is totally figured out tbh docs/RFCS/20180501_change_data_capture.md, line 248 at r1 (raw file): Previously, arjunravinarayan (Arjun Narayan) wrote…
"Refactoring processors is not difficult" seems to understate the overhead and the backwards-compatibility debt of changing the processor spec protos. However, if I don't get this exactly right the first time, we end up in the same place anyway. I defer to y'alls judgement; 1 processor it is. docs/RFCS/20180501_change_data_capture.md, line 311 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. docs/RFCS/20180501_change_data_capture.md, line 323 at r1 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
I added some detail about the catch-up scan. Mentioned larger range sizes, but not sure that anything more complicated than reusing ExportRequest is going to make it in 2.1 docs/RFCS/20180501_change_data_capture.md, line 328 at r1 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Nathan explained offline that our latest thinking has been to push the open intent tracking and the catch-up scan downstream of the docs/RFCS/20180501_change_data_capture.md, line 473 at r1 (raw file): Previously, knz (kena) wrote…
It's not clear to me yet how backup/restore of changefeeds should work. I've left this in unresolved questions docs/RFCS/20180501_change_data_capture.md, line 209 at r2 (raw file): Previously, knz (kena) wrote…
This key/value is the encoded key/value (with json or avro) not our raw key values. Good call on the table name, that seems useful docs/RFCS/20180501_change_data_capture.md, line 248 at r2 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Agreed. It's way too easy for this to surprise the user in a bad way. I do think it's reasonable for this to be an option in the user's production runbook if they won't be able to bring the sink back in time (and we should document that) docs/RFCS/20180501_change_data_capture.md, line 260 at r2 (raw file): Previously, arjunravinarayan (Arjun Narayan) wrote…
Done. docs/RFCS/20180501_change_data_capture.md, line 300 at r2 (raw file): Previously, arjunravinarayan (Arjun Narayan) wrote…
Happy to talk about this in detail offline, but the summary is we can work around it but we'll have bad tail latencies until it's fixed. But this is certainly not the only thing affecting tail latencies, so I don't know yet if it will be the bottleneck. So far, I don't think there's distsql work to be done for cdc in 2.1 but we'll probably want to improve this at some point docs/RFCS/20180501_change_data_capture.md, line 313 at r2 (raw file): Previously, arjunravinarayan (Arjun Narayan) wrote…
Yeah, seems related. Is that something you're planning on building? docs/RFCS/20180501_change_data_capture.md, line 316 at r2 (raw file): Previously, jordanlewis (Jordan Lewis) wrote…
Yeah, it probably happens to do the right thing now, but we'll need better guarantees (and tests) if we want users to rely on this docs/RFCS/20180501_change_data_capture.md, line 483 at r2 (raw file): Previously, arjunravinarayan (Arjun Narayan) wrote…
Timestamp plus close notifications. Check out the guide-level explanation and let me know if it doesn't make sense Comments from Reviewable |
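To make the "timestamp plus close notifications" idea concrete: the user-facing resolved timestamp can be thought of as the minimum of the close notifications received for each watched span. The sketch below is a hedged illustration only; `spanFrontier`, `Forward`, and `Frontier` are invented names for this example, not CockroachDB's actual API.

```go
package main

import "fmt"

// spanFrontier tracks, per watched span, the highest closed timestamp seen.
// All names here are illustrative assumptions for this sketch.
type spanFrontier struct {
	closed map[string]int64 // span -> highest closed timestamp seen
}

func newSpanFrontier(spans ...string) *spanFrontier {
	f := &spanFrontier{closed: make(map[string]int64)}
	for _, s := range spans {
		f.closed[s] = 0
	}
	return f
}

// Forward records a close notification for one span.
func (f *spanFrontier) Forward(span string, ts int64) {
	if ts > f.closed[span] {
		f.closed[span] = ts
	}
}

// Frontier returns the resolved timestamp: every watched span has closed up
// to at least this time, so no earlier changes can still arrive.
func (f *spanFrontier) Frontier() int64 {
	min := int64(-1)
	for _, ts := range f.closed {
		if min == -1 || ts < min {
			min = ts
		}
	}
	return min
}

func main() {
	f := newSpanFrontier("a-b", "b-c")
	f.Forward("a-b", 10)
	fmt.Println(f.Frontier()) // prints 0: span b-c has not closed past 0 yet
	f.Forward("b-c", 7)
	fmt.Println(f.Frontier()) // prints 7: the minimum across all spans
}
```

The key property is that the resolved timestamp only moves forward once the slowest span catches up, which is what makes it safe to surface to users.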
|
Reviewed 1 of 1 files at r3. docs/RFCS/20180501_change_data_capture.md, line 40 at r3 (raw file):
There's a practical consideration here. Say a user enables changefeeds over a very, very large table. So there are, say, millions of rows at that timestamp, and the CDC sink is not particularly fast. This initial scan may take a long time, perhaps more than a day. In the meantime, the database is perhaps active with lots of new data but also data deletions. How do you propose to track these changes while the backscan at the initial timestamp is running? In particular, how do you ensure that none of these intermediate events are lost when the GC window for them expires? docs/RFCS/20180501_change_data_capture.md, line 161 at r3 (raw file):
"The" docs/RFCS/20180501_change_data_capture.md, line 291 at r3 (raw file):
"failures should be relatively infrequent" - I'd like to challenge this: spencer has been harping on the "node failure should be considered a common occurrence" story line sufficiently long that it ostensibly clashes with the assumption here. docs/RFCS/20180501_change_data_capture.md, line 559 at r3 (raw file):
I'm hesitant to say "this RFC is good to go" when there are unresolved questions still :) Comments from Reviewable |
|
Review status: all files reviewed at latest revision, 33 unresolved discussions, all commit checks successful. docs/RFCS/20180501_change_data_capture.md, line 107 at r3 (raw file):
This introduces two new (non-reserved, I assume) keywords. I'm fine with the new
docs/RFCS/20180501_change_data_capture.md, line 192 at r3 (raw file):
Remove this TODO unless you're planning to do it. Comments from Reviewable |
|
Review status: all files reviewed at latest revision, 35 unresolved discussions, all commit checks successful. docs/RFCS/20180501_change_data_capture.md, line 107 at r3 (raw file): Previously, bdarnell (Ben Darnell) wrote…
This reminds me that SQL has a keyword that's already dedicated to this pattern: INTO Comments from Reviewable |
Follower reads are consistent reads at historical timestamps from follower replicas. They make the non-leader replicas in a range suitable sources for historical reads. The key enabling technology is the propagation of **closed timestamp heartbeats** from the range leaseholder to followers. The closed timestamp heartbeat (CT heartbeat) is more than just a timestamp. It is a set of conditions that, if met, guarantee that a follower replica has all state necessary to satisfy reads at or before the CT. Consistent historical reads are useful for analytics queries and in particular allow such queries to be carried out more efficiently and, with appropriate configuration, away from foreground traffic. But historical reads are also key to a proposal for [reference-like tables](cockroachdb#26301) aimed at cutting down on foreign key check latencies particularly in geo-distributed clusters; they help recover a reasonably recent consistent snapshot of a cluster after a loss of quorum; and they are one of the ingredients for [Change Data Capture](cockroachdb#25229). Release note: None
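The "set of conditions" framing of the CT heartbeat can be sketched as a simple predicate: a follower may serve a read only if the read timestamp is at or below the closed timestamp and the follower has applied enough of the replicated log. All type and function names below (`ctHeartbeat`, `canServeFollowerRead`, `minAppliedIndex`) are illustrative assumptions, not CockroachDB's actual types.

```go
package main

import "fmt"

// ctHeartbeat is a hypothetical rendering of a closed timestamp heartbeat:
// a timestamp plus the condition a follower must satisfy to use it.
type ctHeartbeat struct {
	closedTS        int64  // no new writes will commit at or below this timestamp
	minAppliedIndex uint64 // follower must have applied the log at least this far
}

// canServeFollowerRead reports whether a follower that has applied the
// replicated log up to appliedIndex may serve a consistent read at readTS.
func canServeFollowerRead(hb ctHeartbeat, appliedIndex uint64, readTS int64) bool {
	// Both heartbeat conditions must hold: the read is at or below the closed
	// timestamp, and the follower's state machine is caught up enough.
	return readTS <= hb.closedTS && appliedIndex >= hb.minAppliedIndex
}

func main() {
	hb := ctHeartbeat{closedTS: 100, minAppliedIndex: 42}
	fmt.Println(canServeFollowerRead(hb, 50, 90))  // true
	fmt.Println(canServeFollowerRead(hb, 40, 90))  // false: log not caught up
	fmt.Println(canServeFollowerRead(hb, 50, 110)) // false: read above closed ts
}
```

This is why the heartbeat must carry more than a bare timestamp: a lagging follower could otherwise serve a "closed" read from a state machine that has not yet applied the writes below that timestamp.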
|
I think I've responded to all the comments, but Reviewable is acting up again, so please ping me if I missed one. @nvanbenschoten and @tschottdorf, mind taking another pass? I was planning on waiting for your LGTMs too before merging The user facing timestamp is a "resolved" timestamp under our new nomenclature, so I changed it here to avoid confusing our future selves. I also removed the use of Review status: 0 of 1 files reviewed at latest revision, 35 unresolved discussions, some commit checks pending. docs/RFCS/20180501_change_data_capture.md, line 40 at r3 (raw file): Previously, knz (kena) wrote…
There's not much we can do besides buffer them, which brings its own problems. I've added a note about this in the docs/RFCS/20180501_change_data_capture.md, line 107 at r3 (raw file): Previously, knz (kena) wrote…
INTO sgtm docs/RFCS/20180501_change_data_capture.md, line 161 at r3 (raw file): Previously, knz (kena) wrote…
Done. docs/RFCS/20180501_change_data_capture.md, line 192 at r3 (raw file): Previously, bdarnell (Ben Darnell) wrote…
Done. docs/RFCS/20180501_change_data_capture.md, line 291 at r3 (raw file): Previously, knz (kena) wrote…
I think the conclusion here is still okay, but I reworded it a bit. docs/RFCS/20180501_change_data_capture.md, line 559 at r3 (raw file): Previously, knz (kena) wrote…
All of these fall under "not initially necessary and in the interest of cutting scope, we aren't thinking about them until at least 2.1" Comments from Reviewable |
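The "buffer them" answer to the large-table backscan question above has a simple shape: while the initial scan runs, live change events can only be accumulated (in memory or spilled to disk), and once the scan completes they are flushed in arrival order. This is a hedged sketch with invented names; a real implementation would also have to bound the buffer, which is the "brings its own problems" caveat.

```go
package main

import "fmt"

// changeBuffer is an illustrative sketch: events arriving during the initial
// backscan are held, then flushed in order once the scan completes.
type changeBuffer struct {
	scanDone bool
	pending  []string // events held while the backscan is in progress
	emitted  []string // events delivered to the sink, in order
}

func (b *changeBuffer) onEvent(ev string) {
	if !b.scanDone {
		// Backscan still running: we cannot emit yet, only hold the event.
		b.pending = append(b.pending, ev)
		return
	}
	b.emitted = append(b.emitted, ev)
}

func (b *changeBuffer) onScanComplete() {
	b.scanDone = true
	// Flush everything held during the scan, preserving arrival order.
	b.emitted = append(b.emitted, b.pending...)
	b.pending = nil
}

func main() {
	var b changeBuffer
	b.onEvent("update k1") // arrives mid-scan: buffered
	b.onEvent("delete k2") // arrives mid-scan: buffered
	b.onScanComplete()
	b.onEvent("update k3") // scan done: emitted directly
	fmt.Println(b.emitted) // prints [update k1 delete k2 update k3]
}
```

Note that buffering does not by itself answer the GC-window concern: the upstream source must still retain (or the buffer must durably capture) every event until it is safely delivered.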
|
Reviewed 1 of 1 files at r4. docs/RFCS/20180501_change_data_capture.md, line 248 at r2 (raw file): Previously, danhhz (Daniel Harrison) wrote…
Share Nathan and Dan's opinions; we could stall GC, but definitely let's not. docs/RFCS/20180501_change_data_capture.md, line 483 at r2 (raw file): Previously, danhhz (Daniel Harrison) wrote…
You need the transaction IDs in all committed values or you're already toast during the catch-up scan (which may also be required during regular operation of a change feed). docs/RFCS/20180501_change_data_capture.md, line 43 at r4 (raw file):
Perhaps clarify to "at or after". docs/RFCS/20180501_change_data_capture.md, line 108 at r4 (raw file):
Probably unrelated question, but docs/RFCS/20180501_change_data_capture.md, line 126 at r4 (raw file):
This shouldn't be Kafka specific (since otherwise it belongs in the query string). Probably just s/Kafka//? docs/RFCS/20180501_change_data_capture.md, line 157 at r4 (raw file):
s/stop/remove/, docs/RFCS/20180501_change_data_capture.md, line 179 at r4 (raw file):
Link to the repo (unless this is a different toy example called MovR). docs/RFCS/20180501_change_data_capture.md, line 185 at r4 (raw file):
authoritative docs/RFCS/20180501_change_data_capture.md, line 237 at r4 (raw file):
"and are sharded" docs/RFCS/20180501_change_data_capture.md, line 343 at r4 (raw file):
Not to let that hold back the merge, but that TODO should be figured out. docs/RFCS/20180501_change_data_capture.md, line 412 at r4 (raw file):
RangeFeed here and elsewhere in this section? docs/RFCS/20180501_change_data_capture.md, line 420 at r4 (raw file):
The range feed will simply stream intents upstream (which means these intents may or may not commit or move or whatever), and the distributed architecture has to deduplicate/track/resolve them, correct? I feel that that should be made really clear at the beginning of the section to set the stage (even though an individual section is dedicated to it below). Apologies if I missed it somewhere up there. docs/RFCS/20180501_change_data_capture.md, line 444 at r4 (raw file):
bootstrapped Comments from Reviewable |
|
Review status: docs/RFCS/20180501_change_data_capture.md, line 412 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
+1, I'm making this change in https://github.com/cockroachdb/cockroach/blob/master/docs/RFCS/20170613_change_feeds_storage_primitive.md as well, so they won't get out of sync. docs/RFCS/20180501_change_data_capture.md, line 420 at r4 (raw file):
Yes, that's correct. Comments from Reviewable |
Change Data Capture (CDC) provides efficient, distributed, row-level change subscriptions. CockroachDB is an excellent system of record, but no technology exists in a vacuum. Users would like to keep their data mirrored in full-text indexes, analytics engines, and big data pipelines, among others. A pull model can currently be used to do this, but it’s inefficient, overly manual, and doesn’t scale; the push model described in this RFC addresses these limitations. Release note: None
|
Thanks for the reviews, everyone! Merging on green 🙌 Review status: docs/RFCS/20180501_change_data_capture.md, line 483 at r2 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
This is talking about reconstructing them from the user's perspective, which is just grouping by timestamp (which is not perfectly accurate if two transactions have exactly the same hlc timestamp) docs/RFCS/20180501_change_data_capture.md, line 43 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. docs/RFCS/20180501_change_data_capture.md, line 108 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
tbh I'm never entirely sure until I make the grammar change and try it, but I don't think so. They don't need to be quoted when making the partition docs/RFCS/20180501_change_data_capture.md, line 126 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. docs/RFCS/20180501_change_data_capture.md, line 157 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. docs/RFCS/20180501_change_data_capture.md, line 179 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
looks like the repo isn't public yet. in the meantime, added "fictional" docs/RFCS/20180501_change_data_capture.md, line 185 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. docs/RFCS/20180501_change_data_capture.md, line 237 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. docs/RFCS/20180501_change_data_capture.md, line 343 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Oops. I did check on this and just forgot to remove the TODO docs/RFCS/20180501_change_data_capture.md, line 412 at r4 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Good call. Done docs/RFCS/20180501_change_data_capture.md, line 420 at r4 (raw file): Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Done. docs/RFCS/20180501_change_data_capture.md, line 444 at r4 (raw file): Previously, tschottdorf (Tobias Schottdorf) wrote…
Done. Comments from Reviewable |
|
bors r=bdarnell,nvanbenschoten,tschottdorf |
25229: rfc: Change Data Capture r=bdarnell,nvanbenschoten,tschottdorf a=danhhz Change Data Capture (CDC) provides efficient, distributed, row-level change subscriptions. CockroachDB is an excellent system of record, but no technology exists in a vacuum. Users would like to keep their data mirrored in full-text indexes, analytics engines, and big data pipelines, among others. A pull model can currently be used to do this, but it’s inefficient, overly manual, and doesn’t scale; the push model described in this RFC addresses these limitations. Co-authored-by: Daniel Harrison <daniel.harrison@gmail.com>
Build succeeded |
|
Review status: docs/RFCS/20180501_change_data_capture.md, line 483 at r2 (raw file): Previously, danhhz (Daniel Harrison) wrote…
Ack. But you're reconstructing sets of transactional writes; if that feature ever becomes relevant, I'd make sure to call out the distinction, because users will likely need to worry about the difference in some use cases. Comments from Reviewable |
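The grouping-by-timestamp approach discussed above (reconstructing approximate per-transaction batches from the user's perspective) is straightforward to sketch. The event type and function names below are illustrative only; as noted in the thread, two transactions that commit at exactly the same hlc timestamp would be merged into one group, so this is an approximation, not true transaction reconstruction.

```go
package main

import "fmt"

// changeEvent is an illustrative stand-in for a row change emitted by a
// changefeed: a key plus its commit timestamp.
type changeEvent struct {
	key string
	ts  int64 // commit timestamp
}

// groupByTimestamp buckets emitted row changes by commit timestamp,
// approximating per-transaction batches on the consumer side.
func groupByTimestamp(events []changeEvent) map[int64][]string {
	groups := make(map[int64][]string)
	for _, ev := range events {
		groups[ev.ts] = append(groups[ev.ts], ev.key)
	}
	return groups
}

func main() {
	events := []changeEvent{{"a", 1}, {"b", 1}, {"c", 2}}
	groups := groupByTimestamp(events)
	fmt.Println(len(groups), groups[1], groups[2]) // prints 2 [a b] [c]
}
```

A consumer would typically only apply such a group once the resolved timestamp passes the group's timestamp, guaranteeing no more events at that time can arrive.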
|
@danhhz overall looks good, although since we don't have Kafka in our stack it's not useful for us until another sink (or sinks) is implemented. I have a question regarding multi-provider deployments. We started with a single-region k8s cluster (US - GCP GKE) with CockroachDB on it, but we plan to extend this to another region/provider (EU - AWS EKS). Is it possible to capture changes of one partition and propagate them to a service located as close as possible? That is, where does the change data capture process run: is it executed close to the region so that traffic does not cross between regions? In this case it would be EU - AWS with Amazon SQS as a sink and US - GCP GKE with Google Pub/Sub as a sink (our use case does not require ordering, just distinguishing between old and new values). Would this be enough to achieve what I explained? Thanks for making this happen! /cc @kannanlakshmi |
26362: RFC: follower reads r=bdarnell,nvanbenschoten a=tschottdorf NB: this is extracted from #21056; please don't add new commentary on the tech note there. ---- Follower reads are consistent reads at historical timestamps from follower replicas. They make the non-leader replicas in a range suitable sources for historical reads. The key enabling technology is the propagation of **closed timestamp heartbeats** from the range leaseholder to followers. The closed timestamp heartbeat (CT heartbeat) is more than just a timestamp. It is a set of conditions, that if met, guarantee that a follower replica has all state necessary to satisfy reads at or before the CT. Consistent historical reads are useful for analytics queries and in particular allow such queries to be carried out more efficiently and, with appropriate configuration, away from foreground traffic. But historical reads are also key to a proposal for [reference-like tables](#26301) aimed at cutting down on foreign key check latencies particularly in geo-distributed clusters; they help recover a reasonably recent consistent snapshot of a cluster after a loss of quorum; and they are one of the ingredients for [Change Data Capture](#25229). Release note: None 27699: storage: fix stopper race in compactor r=petermattis a=tschottdorf Starting workers without a surrounding task is unfortunately often not the right thing to do when the worker accesses other state that might become invalidated once the stopper begins to stop. In this particular case, the compactor might end up accessing the engine even though it had already been closed. I wasn't able to repro this failure in the first place, but pretty sure this: Fixes #27232. Release note: None 27704: issues: fix email fallback r=petermattis a=tschottdorf This was not my email address. Release note: None Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>