cdc,opt: optimizer support for changefeeds #31214

@danhhz

Related to #30723.

Parts of how this will work are still underspecified, so feel free to punt it back to me. I just wanted to make sure it was on your radar early in the cycle.

The enterprise version of changefeeds is being released in 2.1. The CDC RFC is a pretty good place to start for background.

In 2.2, we're planning to release the non-enterprise version of changefeeds. The exact syntax and output are still being worked out, but a user will issue a statement to follow a table (or a set of tables, a partition, or a range of primary keys on a table), and all changes will be streamed back inline until the client closes the statement or the feed ends for another reason, such as the table being dropped. (This is all very similar to RethinkDB feeds, if you're familiar with them.) Something like the following:

$ CHANGEFEED FOR TABLE foo;
table | key | value
----------------------------
foo   | [1] | {a: 1, b: bar}
foo   | [2] | {a: 2, b: baz}
foo   | [1] | {a: 1, b: qux}
<continues until cancelled>

In RethinkDB, feeds compose with the normal query filters and so on. If we decide we want to do the same (e.g. SELECT * FROM [CHANGEFEED FOR TABLE foo, bar] WHERE value->b = 'qux'), then I assume we'll need first-class opt support for changefeeds. RethinkDB feeds are vocally loved by their community, so I think it's worth looking at what they got right.
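To make the filtering idea concrete, here is a minimal sketch of a per-row filter over an unbounded stream. It is not how changefeeds or opt are implemented; the `Change` type, the `Filter` function, and the channel-based feed are all hypothetical names invented for illustration, with fields mirroring the table/key/value columns in the example output.

```go
package main

import "fmt"

// Change is a hypothetical row emitted by a feed (not a real CockroachDB type).
type Change struct {
	Table string
	Key   int
	Value map[string]string
}

// Filter forwards only the changes that satisfy pred. It handles one row at
// a time and never buffers the stream, so it also works on a feed that
// never ends.
func Filter(in <-chan Change, pred func(Change) bool) <-chan Change {
	out := make(chan Change)
	go func() {
		defer close(out)
		for c := range in {
			if pred(c) {
				out <- c
			}
		}
	}()
	return out
}

func main() {
	feed := make(chan Change)
	go func() {
		// A real feed would stay open until cancelled; this one is closed
		// only so that the example terminates.
		defer close(feed)
		feed <- Change{"foo", 1, map[string]string{"a": "1", "b": "bar"}}
		feed <- Change{"foo", 2, map[string]string{"a": "2", "b": "baz"}}
		feed <- Change{"foo", 1, map[string]string{"a": "1", "b": "qux"}}
	}()

	// Rough analogue of: WHERE value->b = 'qux'
	matches := Filter(feed, func(c Change) bool { return c.Value["b"] == "qux" })
	for c := range matches {
		fmt.Printf("%s | [%d] | %v\n", c.Table, c.Key, c.Value)
	}
}
```

A filter like this is safe on an infinite input because each output row depends on exactly one input row.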

Thoughts:

  • Since the feed continues forever, a query using something like GROUP BY or ORDER BY could never return a result. Presumably we'll have to disallow them.
  • CHANGEFEEDs work via a sql.PlanHook so the implementation can stay in the ccl tree. Not sure how this plays with opt.
  • CHANGEFEEDs currently do their own physical planning because it's a little different between the enterprise and the non-enterprise versions. It may also become a little more complicated in 2.2, since one of the features we're thinking about adding will tack on a new processor to the end of the logical plan. IIUC, opt is going to be doing some of our planning work in the future. I don't particularly want to be doing my own physical planning in changefeeds, but for anything else (opt) to do it, we may have to invent some ccl hooks.
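The first point above, disallowing operators that need all of their input before emitting anything, could be sketched as a check over a plan tree. This is only an illustration under stated assumptions: `Op`, `Node`, `Validate`, and `ErrBlockingOverFeed` are hypothetical and do not correspond to opt's actual operator or expression types.

```go
package main

import (
	"errors"
	"fmt"
)

// Op is a hypothetical operator kind, not one of opt's real operators.
type Op int

const (
	OpChangefeed Op = iota // unbounded source: never finishes
	OpScan                 // ordinary bounded table scan
	OpFilter
	OpProject
	OpGroupBy
	OpOrderBy
)

// Node is a hypothetical plan node with zero or more inputs.
type Node struct {
	Op     Op
	Inputs []*Node
}

// ErrBlockingOverFeed is returned when a blocking operator sits above a feed.
var ErrBlockingOverFeed = errors.New("blocking operator over an unbounded changefeed")

// unbounded reports whether n is, or reads from, a source that never ends.
func unbounded(n *Node) bool {
	if n.Op == OpChangefeed {
		return true
	}
	for _, in := range n.Inputs {
		if unbounded(in) {
			return true
		}
	}
	return false
}

// blocking reports whether op must consume all of its input before it can
// emit its first row.
func blocking(op Op) bool {
	return op == OpGroupBy || op == OpOrderBy
}

// Validate rejects any plan in which a blocking operator reads, directly or
// indirectly, from a changefeed.
func Validate(n *Node) error {
	if blocking(n.Op) && unbounded(n) {
		return ErrBlockingOverFeed
	}
	for _, in := range n.Inputs {
		if err := Validate(in); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	feed := &Node{Op: OpChangefeed}
	filtered := &Node{Op: OpFilter, Inputs: []*Node{feed}}
	sorted := &Node{Op: OpOrderBy, Inputs: []*Node{filtered}}

	fmt.Println("filter over feed:", Validate(filtered))
	fmt.Println("order by over feed:", Validate(sorted))
}
```

The sketch treats every GROUP BY and ORDER BY as blocking. Whether any restricted form of them could be allowed over a feed is not something the issue addresses.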

Jira issue: CRDB-4800

Labels: A-cdc (Change Data Capture), C-enhancement (Solution expected to add code/behavior + preserve backward-compat; pg compat issues are exception), T-cdc
