Skip to content

spanconfig: introduce the spanconfig.SQLWatcher#69661

Closed
arulajmani wants to merge 1 commit intocockroachdb:masterfrom
arulajmani:spanconfigs-sql-watcher
Closed

spanconfig: introduce the spanconfig.SQLWatcher#69661
arulajmani wants to merge 1 commit intocockroachdb:masterfrom
arulajmani:spanconfigs-sql-watcher

Conversation

@arulajmani
Copy link
Copy Markdown
Collaborator

@arulajmani arulajmani commented Aug 31, 2021

This patch introduces a few more span config interfaces/structs, namely
the SQLReconciler, SQLWatcher, and an eventBuffer helper struct for the
SQLWatcher.

The SQLReconciler is responsible for reconciling SQL descriptors and
their zone configurations to span configurations. It simply constructs
the implied span configuration state and is agnostic to the actual span
configuration state in KV.

The SQLWatcher is intended to incrementally process zone configuration
updates in the SQL layer. It does so by establishing a rangefeed on
system.zones and system.descriptors from a given checkpoint. It buffers
events until the next checkpoint. The list of buffered descriptor IDs,
along with the new checkpoint, are passed to a handler callback.

The SQLWatcher uses an eventBuffer helper struct to buffer rangefeed
updates. This thing maintains the notion of a checkpoint timestamp,
which is minimum timestamp for a {zones, descriptor} event.

This flow doesn't quite work end to end yet because rangefeeds don't
expose checkpoint events yet. These interfaces aren't hooked up to
anything yet either.

References #67679

Release justification: non-production code changes
Release note: None

@arulajmani arulajmani requested review from a team as code owners August 31, 2021 18:29
@arulajmani arulajmani requested a review from a team August 31, 2021 18:29
@arulajmani arulajmani requested a review from a team as a code owner August 31, 2021 18:29
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@arulajmani arulajmani marked this pull request as draft August 31, 2021 18:31
@arulajmani arulajmani removed request for a team August 31, 2021 18:31
@arulajmani arulajmani force-pushed the spanconfigs-sql-watcher branch from 92a3a47 to f02635e Compare August 31, 2021 18:51
@rytaft
Copy link
Copy Markdown
Collaborator

rytaft commented Aug 31, 2021


pkg/sql/opt/optbuilder/mutation_builder.go, line 654 at r1 (raw file):

// for any that are computed and do not yet have values provided by the input
// expression. New columns are synthesized for any missing columns using the
// computed column expression	.

nit: stray whitespace

@arulajmani arulajmani force-pushed the spanconfigs-sql-watcher branch 5 times, most recently from ddae485 to 8840e09 Compare September 2, 2021 03:20
@arulajmani arulajmani marked this pull request as ready for review September 2, 2021 03:21
Copy link
Copy Markdown
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the general structure here is coming together and this is ready for some eyes. I have everything in a single commit right now but I can break apart the rangefeed setup and full reconciliation into separate commits if it'll make things easier to review.

I still need to add a configuration that runs the data driven tests for secondary tenants as some of the stuff around named zones is tenant specific.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @irfansharif, @nvanbenschoten, and @RichardJCai)

@arulajmani arulajmani changed the title [WIP] spanconfigs: introduce the spanconfig.SQLWatcher spanconfig: introduce the spanconfig.SQLWatcher Sep 2, 2021
@arulajmani arulajmani force-pushed the spanconfigs-sql-watcher branch from 8840e09 to c22533f Compare September 2, 2021 03:43
@arulajmani
Copy link
Copy Markdown
Collaborator Author

Friendly ping on this one now that we're all back from breather week @irfansharif @ajwerner @nvanbenschoten

@irfansharif
Copy link
Copy Markdown
Contributor

Will take a look today, catching up.

Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have a clear vision on how checkpointing is going to fit in. I was sort of expecting the event API to involve timestamps and batches of updates in some form or another.

Reviewed 4 of 23 files at r1, 5 of 23 files at r2, 1 of 1 files at r3, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @arulajmani, @irfansharif, @nvanbenschoten, and @RichardJCai)


-- commits, line 10 at r3:
uber-nit: "if any at all"?


-- commits, line 13 at r3:
abiilty


.github/CODEOWNERS, line 96 at r3 (raw file):

/pkg/ccl/backupccl/          @cockroachdb/bulk-prs
/pkg/ccl/importccl/          @cockroachdb/bulk-prs
/pkg/ccl/spanconfigccl/      @cockroachdb/multiregion

🤔


pkg/ccl/spanconfigccl/testdata/databases, line 42 at r3 (raw file):

constraints: []
voterconstraints: []
leasepreferences: []

these might be easier to read if you omitted empty values.


pkg/ccl/spanconfigccl/testdata/full_reconciliation_named_zones_deleted, line 44 at r3 (raw file):


rangeminbytes: 134217728
rangemaxbytes: 536870912

you know what, what if you formatted it differently so it's like a list of configuration literals and then a mapping from spans to those literals. This here is too hard to visually differentiate.


pkg/spanconfig/spanconfig.go, line 37 at r3 (raw file):

// SQLWatcher emits SQL span configuration updates.
type SQLWatcher interface {

I think this would be better phrased as:

SQLWatcher emits span configs that might come from updates made to zone configurations at the SQL layer.

pkg/spanconfig/spanconfig.go, line 38 at r3 (raw file):

// SQLWatcher emits SQL span configuration updates.
type SQLWatcher interface {
	WatchForSQLUpdates(ctx context.Context) (<-chan Update, error)

This needs commentary. How do these two methods interact? (I imagine they don't). I'm surprised to not see timestamps anywhere on here. Please fill out the description of the semantics here.


pkg/spanconfig/spanconfig.go, line 61 at r3 (raw file):

	Entry roachpb.SpanConfigEntry
	// Deleted is true if the span config entry has been deleted.
	Deleted bool

It feels like this might be a bit cleaner as

type Update struct {
    Span    roachpb.Span
    Config  roachpb.SpancConfig
    Deleted bool
}

pkg/spanconfig/spanconfigsqlwatcher/sql_watcher.go, line 312 at r3 (raw file):

Quoted 10 lines of code…
	if err := sql.DescsTxn(
		ctx,
		s.execCfg,
		func(ctx context.Context, txn *kv.Txn, descsCol *descs.Collection) error {
			var err error
			updates, err = s.handleIDUpdate(ctx, id, txn, descsCol)
			return err
		}); err != nil {
		return err
	}

extreme nit: I've taken to wrapping these like:

	if err := sql.DescsTxn(ctx, s.execCfg, func(
		ctx context.Context, txn *kv.Txn, descsCol *descs.Collection,
	) (err error) {
 		updates, err = s.handleIDUpdate(ctx, id, txn, descsCol)
 		return err
 	}); err != nil {
		return err
 	}

however, while I'm here, this feels like the perfect seam for an interface boundary. I think that I would have done something like:

type Reconciler interface {
     Reconcile(context.Context, descpb.IDs) ([]spanconfig.Update, error)
}

That should make it much easier to test this logic, which is super stateful in terms of events and what not and that logic which is really a function of the database. Consider splitting these two areas apart.


pkg/sql/catalog/descs/collection.go, line 286 at r3 (raw file):

// the transaction inside the database referenced by the given database ID. It
// first checks the collections cached descriptors before defaulting to a key-value scan.
func (tc *Collection) GetAllTableDescriptorsInDatabase(

Note that it does not hydrate the descriptors. The alternative here is to fetch the IDs and then the descriptors. I was thinking just today that I wished namespace had type information.

Copy link
Copy Markdown
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phew, that was hefty! Thanks for powering through, I'm glad to see these pieces coming together. This is in much better shape than our earlier attempts -- separating out a Translator was the right abstraction to carve out and the datadriven tests are much more readable. I've left some comments below for both subsystems ({Watcher,Translator}), but I think it would also make sense separating them out into individual PRs as they're mostly independent and a lot of code. I also think we're pretty close to finish line, so I'll defer to you -- I'm cool with working through with just this one PR.

Reviewed 2 of 23 files at r1, 1 of 23 files at r2, 7 of 38 files at r5, 1 of 3 files at r6, 9 of 9 files at r7, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, @nvanbenschoten, and @RichardJCai)


-- commits, line 4 at r7:
Could delete this first paragraph.


-- commits, line 8 at r7:
Something's off with "IDs corresponding to descriptors and zones to .."


-- commits, line 15 at r7:
s/to do so/timestamp.


-- commits, line 21 at r7:
Typo: whcih


pkg/ccl/spanconfigccl/testdata/full_translate, line 131 at r7 (raw file):

                               num_voters=5
-----------------------
/Table/4{5-6}                  ttl_seconds=7200

Is this expected?


pkg/spanconfig/spanconfig.go, line 42 at r4 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

We're using the startTS as the initial timestamp to establish the rangefeeds over system.zones and system.descriptor. I'm not super familiar with this stuff, but looks like this eventually ends up here: https://github.com/arulajmani/cockroach/blob/c4de73f72136e774ec887f3cbc0dda6c37d60dab/pkg/kv/kvserver/replica_rangefeed.go#L171. This comment seems to suggest that the startTS is exclusive. Is my understanding correct?

I think we should be okay if this is exclusive because the resolvedTS we use when checkpointing guarantees that no events less than or equal to the resolvedTS will be observed (from https://github.com/arulajmani/cockroach/blob/30ab2e1ea8a7c862270bcad56bddcda02bab95c9/pkg/roachpb/api.pb.go#L7157).

Hm, I think it's an inclusive timestamp. See:

// Mark the data as occurring at the initial timestamp, which is the
// timestamp at which it was read.
v.Value.Timestamp = f.initialTimestamp

if err := f.client.Scan(ctx, f.span, f.initialTimestamp, scan); err != nil {


pkg/spanconfig/spanconfig.go, line 39 at r7 (raw file):

}

// SQLWatcher can be used to watch for events on system.zones and

We probably want some commentary here around how these timestamps are expected to be stitched together ("do a full translate, get that timestamp, use it for the incremental watcher"). Maybe even pulling that interface definition up above here.


pkg/spanconfig/spanconfig.go, line 56 at r7 (raw file):

	// to WatchForSQLUpdates. The previous checkpointTS above is the same as
	// startTS the very first time handle is called.
	WatchForSQLUpdates(ctx context.Context, startTS hlc.Timestamp, handle SQLWatcherHandleFunc) error

I'd prefer just inlining the definition of SQLWatcherHandleFunc given the commentary above that refers to arguments/types in the handler function type. It obscures things a bit in this current form.


pkg/spanconfig/spanconfig.go, line 60 at r7 (raw file):

	// Close closes the rangefeeds and asynchronous tasks created by the
	// SQLWatcher and waits for them to shut down. Close is idempotent.
	Close()

We don't need a Close, certainly not on the interface itself. The concrete impl of the SQLWatcher is going to be instantiated using a Stopper. It suffices to use stopper.AddCloser(<whatever>) within WatchForSQLUpdates.

type SQLWatcher struct {
        // ...
	cancel    context.CancelFunc
	stopped   chan struct{}
        // ...
}

These fields can go away; I don't think we need them given we already have a stopper in context within WatchForSQLUpdates. I think {zones,descriptors}RF can also go away.


pkg/spanconfig/spanconfig.go, line 65 at r7 (raw file):

// SQLWatcherHandleFunc is the type of the handler function expected by the
// SQLWatcher.
type SQLWatcherHandleFunc func(ids []SQLWatcherEvent, checkpointTS hlc.Timestamp) error

Rename "ids".


pkg/spanconfig/spanconfig.go, line 71 at r7 (raw file):

type SQLWatcherEvent struct {
	// ID of the descriptor/zone that has changed.
	ID descpb.ID

We could just embed these fields, they'd still be accessible via


pkg/spanconfig/spanconfig.go, line 85 at r7 (raw file):

	// Translate generates the implied span configuration state given a list of
	// {descriptor, named zone} IDs. No entry is returned for an ID if it doesn't
	// exist or has been dropped.

Why is this taking just IDs? Don't we want it to take an ID+DescriptorType tuple? Perhaps we should rename that type to something less tied to just SQLWatcher.

This would simplify the concrete impl -- we'd only need to retrieve the set of child descriptors when we're dealing with a database -- the whole "affected IDs" business would all go away.


pkg/spanconfig/spanconfig.go, line 85 at r7 (raw file):

	// Translate generates the implied span configuration state given a list of
	// {descriptor, named zone} IDs. No entry is returned for an ID if it doesn't
	// exist or has been dropped.

Silently dropping updates for deleted tables/descriptors feels like a surprising caveat; I'm curious to hear how @nvanbenschoten feels but I think we can do better without giving up any readability.

We could tighten the interface to expect descriptors that definitely exist (checked txn-ally at the caller), and error out if we get the unexpected. Having the caller verify that the IDs exist would mean either threading in a txn as part of the interface, or have each concrete impl of the interface instantiated around a new txn (we probably want the former).

Speaking of dropped descriptors - don't we need to be checking for that explicitly ... somewhere? We want to issue KVAccessor deletes over those span configs, right? I don't think silently skipping over those IDs helps us, that again sounds like something we'd want to be doing at the caller ("generate a deletion spanconfig.Update for a given table ID"). Spelling how we expect that flow to work would help quite a bit.


pkg/spanconfig/spanconfig.go, line 86 at r7 (raw file):

	// {descriptor, named zone} IDs. No entry is returned for an ID if it doesn't
	// exist or has been dropped.
	Translate(ctx context.Context, ids descpb.IDs) ([]roachpb.SpanConfigEntry, error)

Talk a bit about the fact that the method recurses into child schema objects. I feel all this would be a lot clearer+cleaner if we coalesce these two methods into just one. FullTranslate is just Translate(RANGE DEFAULT). I understand why we've separated it here -- we expect the reconciler to FullTranslate first, take the timestamp, set up the SQLWatcher, and then use SQLWatcherEvents and Translate each one. But I think we're letting the caller expand our interface, without any real benefit. I would much rather we reduce the vocabulary exposed in this package to just "translate", instead of throwing "full translate" into the mix.


pkg/spanconfig/spanconfigsqlwatcher/sql_watcher.go, line 69 at r7 (raw file):

// WatchForSQLUpdates is part of the spanconfig.SQLWatcher interface.
func (s *SQLWatcher) WatchForSQLUpdates(

The implementation here is a bit channel heavy, which comes with all the usual downsides of channels. I think there's some room to simplify -- it seems we're using channels to glue together the callback friendly rangefeed library and this high-level API, which also uses callbacks. With a bit of massaging, I think we can get to the following:

var eb buffer
for _, tbl := range []{system.descriptors, system.zones} {
    onevent := func(event) { eb.record(event) }
    oncheckpoint := func(checkpoint) {
        events, timestamp := eb.checkpoint(tbl) // subsumes "flush"
        handle(events, timestamp)
    }
    s.watch(startts, onevent, oncheckpoint)
}

I'll skip over leaving comments for the event buffer impl, I'm curious to see if we're able to simplify with the suggestions here.


pkg/spanconfig/spanconfigsqlwatcher/sql_watcher.go, line 70 at r7 (raw file):

// WatchForSQLUpdates is part of the spanconfig.SQLWatcher interface.
func (s *SQLWatcher) WatchForSQLUpdates(
	ctx context.Context, timestamp hlc.Timestamp, handle spanconfig.SQLWatcherHandleFunc,

s/timestamp/startTS for parity with the interface definition.


pkg/spanconfig/spanconfigsqlwatcher/sql_watcher.go, line 75 at r7 (raw file):

	err := s.watchForDescriptorUpdates(ctx, timestamp, updatesCh)
	if err != nil {
		log.Warningf(ctx, "error establishing rangefeed over system.descritpors %v", err)

Return a wrapped error with this message instead of logging it here. Also, typo for "descritpors". [nit] For both this and below, we could use the following pattern to de-scope err:

if err := s.watchwhatever(); err != nil {
  return errors.Wrap("whatever", err)
}

pkg/spanconfig/spanconfigsqlwatcher/sql_watcher.go, line 169 at r7 (raw file):

		}
		if !value.IsPresent() {
			return

Is this possible?


pkg/spanconfig/spanconfigsqlwatcher/sql_watcher.go, line 184 at r7 (raw file):

		}
		if descriptor.Union == nil {
			return

What does it mean if we hit this codepath?


pkg/spanconfig/spanconfigsqlwatcher/sql_watcher_test.go, line 150 at r7 (raw file):

		require.NoError(t, err)

		<-done

I think we want a SucceedsSoon or something here instead of blocking indefinitely -- this test will take forever to fail if there's a bug.


pkg/spanconfig/spanconfigsqltranslator/sql_translator.go, line 34 at r7 (raw file):

// SQLTranslator is the concrete implementation of spanconfig.SQLTranslator.
type SQLTranslator struct {

I'll take a closer look at the impl below after resolving some of the comments above. By threading in the type of the ID we're translating, and by expecting the ID to definitively exist, I think we can simplify a lot of what's below.

Copy link
Copy Markdown
Collaborator Author

@arulajmani arulajmani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was a quick review turnaround, thanks for powering through this! Like you mentioned, if I were to start from scratch, I'd probably split this into 2 (or maybe 3!) separate PRs. I'm happy to split it now and reference this PR so that some of the discussion can be linked back.

I've replied to some of your high level comments in-line. Let's hash those out on this PR? For the rest I'll address them as I split this into individual PRs.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, @irfansharif, @nvanbenschoten, and @RichardJCai)


pkg/ccl/spanconfigccl/datadriven_test.go, line 92 at r6 (raw file):

Previously, irfansharif (irfan sharif) wrote…

This is all boilerplate code that already exists in sqlutils.RowsToDataDrivenOutput, lets use that instead.

TIL! Thanks for the pointer


pkg/ccl/spanconfigccl/testdata/databases, line 42 at r3 (raw file):

I think it's much clearer to have all these tests simply operate with just one field (num_replicas) and ensure that various spans have that one field correctly set. It's just as expressive as what you're doing, and a lot more concise.

Note that what I have here is basically a superset of what you're proposing. I'd rather not limit how these tests can be written (now, and in the future) unless there is a compelling reason to do it.


pkg/ccl/spanconfigccl/testdata/databases, line 31 at r6 (raw file):

Previously, irfansharif (irfan sharif) wrote…

We can make these datadriven tests a lot more ergonomic by skipping these raw IDs altogether. Something like:

translate table=t1
translate database=db
# or translate desc=<name>

In our go test could we would then look up the relevant descriptor ID from system.namespace. It simplifies these datadriven tests to the minimum we care about: "what does a given descriptor translate to with respect to span configs". Right now readers need to scan above to make sure the ID is for a specific descriptor, which is a bit cumbersome.

That's a fair point. I'll make the switch.


pkg/ccl/spanconfigccl/testdata/databases, line 35 at r6 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Drop these "------" separators, lets inline all the diffed fields into a single line. It could read something like:

/Table/5{3-4}                  num_replicas=7 num_voters=5
/Table/5{7-8}                  num_replicas=7 num_voters=7

It'll help with the readability.

How strongly do you feel about this? I personally find what I have here more readable, but I'm happy to change it if you have a strong objection :)


pkg/spanconfig/spanconfig.go, line 39 at r7 (raw file):

Previously, irfansharif (irfan sharif) wrote…

We probably want some commentary here around how these timestamps are expected to be stitched together ("do a full translate, get that timestamp, use it for the incremental watcher"). Maybe even pulling that interface definition up above here.

I agree, but I think this belongs on the Reconciler interface from the sketch I posted on #zone-configs a few weeks ago. I think I had it there, but if I didn't, I'll be sure to add it when the PR that actually stitches it together comes out.


pkg/spanconfig/spanconfig.go, line 60 at r7 (raw file):

Previously, irfansharif (irfan sharif) wrote…

We don't need a Close, certainly not on the interface itself. The concrete impl of the SQLWatcher is going to be instantiated using a Stopper. It suffices to use stopper.AddCloser(<whatever>) within WatchForSQLUpdates.

type SQLWatcher struct {
        // ...
	cancel    context.CancelFunc
	stopped   chan struct{}
        // ...
}

These fields can go away; I don't think we need them given we already have a stopper in context within WatchForSQLUpdates. I think {zones,descriptors}RF can also go away.

Isn't the stopper in the context of the entire node? If so, I was imagining a case where you'd want to stop watching for SQLUpdates (shut down rangefeeds and the async task we create). I think this would be useful if there is an error in the execution of WatchForSQLUpdates. We don't want to shut the node down -- we probably want to handle that error somehow or bounce the job to another node.


pkg/spanconfig/spanconfig.go, line 85 at r7 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Silently dropping updates for deleted tables/descriptors feels like a surprising caveat; I'm curious to hear how @nvanbenschoten feels but I think we can do better without giving up any readability.

We could tighten the interface to expect descriptors that definitely exist (checked txn-ally at the caller), and error out if we get the unexpected. Having the caller verify that the IDs exist would mean either threading in a txn as part of the interface, or have each concrete impl of the interface instantiated around a new txn (we probably want the former).

Speaking of dropped descriptors - don't we need to be checking for that explicitly ... somewhere? We want to issue KVAccessor deletes over those span configs, right? I don't think silently skipping over those IDs helps us, that again sounds like something we'd want to be doing at the caller ("generate a deletion spanconfig.Update for a given table ID"). Spelling how we expect that flow to work would help quite a bit.

I talked to @ajwerner about this after our conversation the other week and he encouraged me to embed the transaction inside the interface instead of passing it through interface boundaries (unless we absolutely need it for correctness). He mentioned that this is something we try to do as much as possible in other places as well (though I don't think I have an example of this offhand).

I personally prefer this approach as well. Scoping the transaction to the interface makes it easy to reason about. For example, in this case we have control over the fact that this transaction is entirely read only. That's not necessarily true (or obvious) if this thing expects a transaction.

I also don't think it's much of a stretch to say that "if an ID doesn't exist there is no span config entry associated with it", especially if it's spelled out clearly.


pkg/spanconfig/spanconfig.go, line 85 at r7 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Why is this taking just IDs? Don't we want it to take an ID+DescriptorType tuple? Perhaps we should rename that type to something less tied to just SQLWatcher.

This would simplify the concrete impl -- we'd only need to retrieve the set of child descriptors when we're dealing with a database -- the whole "affected IDs" business would all go away.

I don't think threading in both the ID and descriptor type tuple here serves much benefit -- the translator can infer this information when it looks up the descriptor. It's also worth noting that the ID here can refer to a named zone and we currently don't have a vocabulary to talk about descriptor types in the context of named zones which don't have descriptors associated with them -- and that's a can that I'd rather not open. This is why the system.zones rangefeed populates its event with a catalog.Any type -- it doesn't know what the event is for, but even if it did, it doesn't have (or need) the vocabulary to express this if it is for a named zone.

Having this interface work on just IDs also makes it much more test friendly IMO.


pkg/spanconfig/spanconfig.go, line 86 at r7 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Talk a bit about the fact that the method recurses into child schema objects. I feel all this would be a lot clearer+cleaner if we coalesce these two methods into just one. FullTranslate is just Translate(RANGE DEFAULT). I understand why we've separated it here -- we expect the reconciler to FullTranslate first, take the timestamp, set up the SQLWatcher, and then use SQLWatcherEvents and Translate each one. But I think we're letting the caller expand our interface, without any real benefit. I would much rather we reduce the vocabulary exposed in this package to just "translate", instead of throwing "full translate" into the mix.

I'm looking at this a bit differently -- I think the fact that Translate(RANGE DEFAULT) generates the span configuration state for the entire tenant is an implementation detail that shouldn't leak above this interface. I see some benefit to this. Maybe FullTranslate isn't the most descriptive name, which is why you feel some vocabulary is being expanded. What if we called this something like TranslateEntireSQLZoneConfigState? Would you prefer that more?

Copy link
Copy Markdown
Contributor

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of our discussions here are veering into me-being-pedantic territory, so let's fight it out during our pod today. I do think they're all worth doing obviously, but likely I'm missing things.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, @irfansharif, @nvanbenschoten, and @RichardJCai)


pkg/ccl/spanconfigccl/testdata/databases, line 42 at r3 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

I think it's much clearer to have all these tests simply operate with just one field (num_replicas) and ensure that various spans have that one field correctly set. It's just as expressive as what you're doing, and a lot more concise.

Note that what I have here is basically a superset of what you're proposing. I'd rather not limit how these tests can be written (now, and in the future) unless there is a compelling reason to do it.

¯_(ツ)_/¯


pkg/ccl/spanconfigccl/testdata/databases, line 35 at r6 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

How strongly do you feel about this? I personally find what I have here more readable, but I'm happy to change it if you have a strong objection :)

Strongly, Translate's gig is to translate a descriptor into its constituent spans and configs -- it's much easier to see which spans are adjacent to one another and where the gaps are when there are no line breaks in between.


pkg/spanconfig/spanconfig.go, line 39 at r7 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

I agree, but I think this belongs on the Reconciler interface from the sketch I posted on #zone-configs a few weeks ago. I think I had it there, but if I didn't, I'll be sure to add it when the PR that actually stitches it together comes out.

Ah, ok. I'll encourage you to fold in that commentary into these PRs cause it's necessary information to digest when reviewing these interfaces -- it would've helped me a bit.


pkg/spanconfig/spanconfig.go, line 60 at r7 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

Isn't the stopper in the context of the entire node? If so, I was imagining a case where you'd want to stop watching for SQLUpdates (shut down rangefeeds and the async task we create). I think this would be useful if there is an error in the execution of WatchForSQLUpdates. We don't want to shut the node down -- we probably want to handle that error somehow or bounce the job to another node.

Ah, gotcha, I don't think I was seeing us stop these watchers as part of regular control flow but that makes sense. Even still, I prefer to not add this kind of complexity until we actually need it. Let's strip it out until then.


pkg/spanconfig/spanconfig.go, line 85 at r7 (raw file):

the translator can infer this information when it looks up the descriptor

But my point is -- what if the descriptor doesn't exist? I don't really understand your hesitation around descriptor types for these named zones -- they're pseudo table IDs so it seems coherent to emit a table type (which we're emitting anyway? through the watcher over system.descriptors?).

I really feel the control flow of the Translate code would be a bit clearer if we switched on the type ("if named zone or table, translate as is. if database, iterate of contained tables and translate each") instead of accumulating an opaque "affected IDs" list. All of this is a follow-on from trying to work on this area earlier when supporting translation of named-zones: https://github.com/cockroachdb/cockroach/pull/68491/files#diff-1abf71b395edbb668dda7a9c7c86b971d546f90987b4c61884668999b8dad570


pkg/spanconfig/spanconfig.go, line 85 at r7 (raw file):
I wish I was there for the discussion, and happy to re-hash in person today. I do think this is adding complexity that we can avoid, and is making for a more lax interface when I think it's easier to work with the stricter ones.

Speaking of dropped descriptors - don't we need to be checking for that explicitly ... somewhere? We want to issue KVAccessor deletes over those span configs, right? I don't think silently skipping over those IDs helps us, that again sounds like something we'd want to be doing at the caller ("generate a deletion spanconfig.Update for a given table ID"). Spelling how we expect that flow to work would help quite a bit.

This is the complexity that I'm imagining a tighter interface would help ratify.


pkg/spanconfig/spanconfig.go, line 86 at r7 (raw file):

Previously, arulajmani (Arul Ajmani) wrote…

I'm looking at this a bit differently -- I think the fact that Translate(RANGE DEFAULT) generates the span configuration state for the entire tenant is an implementation detail that shouldn't leak above this interface. I see some benefit to this. Maybe FullTranslate isn't the most descriptive name, which is why you feel some vocabulary is being expanded. What if we called this something like TranslateEntireSQLZoneConfigState? Would you prefer that more?

I'm seeing it as Translate(RootNamespaceID), which, because it's "root", is the ancestor for everything. In the same way Translate(DatabaseID) would denormalize it to all its descendent tables/indexes/partitions. I don't think this is leaking anything in the way you're describing -- the inheritance feel entirely coherent to me, hence my recommendation.

irfansharif added a commit to irfansharif/cockroach that referenced this pull request Oct 6, 2021
In both cockroachdb#69614 and cockroachdb#69661 we find ourselves reaching for a thin,
memory-bounded buffer to sit on top a rangefeed. We want to be able to
accumulate raw rangefeed events, and when checkpointed, flush out a
timestamp sorted, de-duplicated batch of events covered by said
checkpoint. Package rangefeedbuffer introduces such a datastructure.

If we're past the configured memory limit before having received a
checkpoint, we have little we can do -- since we receive checkpoints and
rangefeed values on the same stream, we're waiting for a future
checkpoint to flush out the accumulated memory. Given this, we simply
error out to the caller (expecting them to re-establish the rangefeed).

Release note: None

Co-authored-by: Arul Ajmani <arula@cockroachlabs.com>
arulajmani added a commit to arulajmani/cockroach that referenced this pull request Oct 7, 2021
This patch adds a `OnFrontierAdvance` option to the rangefeed libary.
This option allows users of this library to install a callback which
is called whenever the frontier is advanced.

The frontier is used to track the minimum resolvedTS accross all ranges
that the rangefeed span breaks down to. The intention is to allow users
of this library to be able to use this option to checkpoint the
rangefeed progress. The plan is to use it as such for cockroachdb#69661 and
 cockroachdb#69614.

Release note: None
arulajmani added a commit to arulajmani/cockroach that referenced this pull request Oct 7, 2021
This patch adds a `OnFrontierAdvance` option to the rangefeed libary.
This option allows users of this library to install a callback which
is called whenever the frontier is advanced (with the new frontier
timestamp).

The frontier is used to track the minimum resolvedTS accross all ranges
that the rangefeed span breaks down to. The intention is to allow users
of this library to be able to use this option to checkpoint the
rangefeed progress. The plan is to use it as such for cockroachdb#69661 and
 cockroachdb#69614.

Release note: None
arulajmani added a commit to arulajmani/cockroach that referenced this pull request Oct 9, 2021
This patch introduces the spanconfig.SQLTranslator which translates
zone configurations to span configurations.

The zone config and span config protos look similar, but they differ
from each other in a couple of ways:
- zone configurations correspond to {descriptor, zone} IDs whereas span
configurations correspond to keyspans.
- zone configurations have a notion of inheritance which span
configurations do not. Instead, they're fully hydrated and flattened
out.

When the SQLTranslator is given a {zone,descriptor} ID it generates
span configurations for all objects under the given ID in the zone
configuration hierarchy.

The SQLTranslator fills in any missing fields by following up the
inheritance chain to fully hydrate span configs. It also accounts
for subzones (and subzone spans) when constructing (span, span config)
tuples.

Split out from cockroachdb#69661.

Release note: None
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to find some time today (before tomorrow) to check this code out and play with it.

Reviewed all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @arulajmani, @irfansharif, @nvanbenschoten, and @RichardJCai)


pkg/ccl/spanconfigccl/testdata/databases, line 35 at r6 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Strongly, Translate's gig is to translate a descriptor into its constituent spans and configs -- it's much easier to see which spans are adjacent to one another and where the gaps are when there are no line breaks in between.

One trick to avoid all of this formatting pain (and trade it for a different pain) is to use yaml. I could see this as:

"/Table/5{3-4}": {num_replicas: 7, num_voters: 5}
"/Table/5{7-8}": {num_replicas: 7, num_voters: 7}

pkg/spanconfig/spanconfig.go, line 41 at r7 (raw file):

// SQLWatcher can be used to watch for events on system.zones and
// system.descriptors.
type SQLWatcher interface {

The more I look at this thing, the less I feel like it should be an interface. Is it ever going to have more than one implementation? This feels like the driver of this whole system but it's not very important to mock. And then, in that way, I don't think we really need to expose the handler func or the events in the public API.

I need to step back and play with this code a bit.


pkg/spanconfig/spanconfig.go, line 65 at r7 (raw file):

// SQLWatcherHandleFunc is the type of the handler function expected by the
// SQLWatcher.
type SQLWatcherHandleFunc func(ids []SQLWatcherEvent, checkpointTS hlc.Timestamp) error

it seems to me like the implementation of this thing is going to do a heck of a lot of RPCs and what not. It should take a context and respond to cancellation.


pkg/spanconfig/spanconfigsqlwatcher/sql_watcher.go, line 109 at r7 (raw file):

		// TODO(arul): Should we add a timer here to accumulate updates for some
		// amount of time before calling the handler?
		default:

this smells, how about instead we pass the context into the flush and handle calls?


pkg/spanconfig/spanconfigsqlwatcher/zones_decoder_test.go, line 97 at r3 (raw file):

		require.NoError(t, err)

		// system.zones has 2 column families, so for every entry that was upsurted

this makes me so sad.

irfansharif added a commit to irfansharif/cockroach that referenced this pull request Oct 12, 2021
In both cockroachdb#69614 and cockroachdb#69661 we find ourselves reaching for a thin,
memory-bounded buffer to sit on top a rangefeed. We want to be able to
accumulate raw rangefeed events, and when checkpointed, flush out a
timestamp sorted, de-duplicated batch of events covered by said
checkpoint. Package rangefeedbuffer introduces such a datastructure.

If we're past the configured memory limit before having received a
checkpoint, we have little we can do -- since we receive checkpoints and
rangefeed values on the same stream, we're waiting for a future
checkpoint to flush out the accumulated memory. Given this, we simply
error out to the caller (expecting them to re-establish the rangefeed).

Release note: None

Co-authored-by: Arul Ajmani <arula@cockroachlabs.com>
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Oct 12, 2021
In both cockroachdb#69614 and cockroachdb#69661 we find ourselves reaching for a thin,
memory-bounded buffer to sit on top a rangefeed. We want to be able to
accumulate raw rangefeed events, and when the rangefeed frontier is
bumped, flush everything out en-masse in timestamp sorted order. Package
rangefeedbuffer introduces such a datastructure.

If we're past the configured memory limit before having observed a
frontier bump, we have little recourse -- since we receive checkpoints and
rangefeed values on the same stream, we're waiting for a future
checkpoint to flush out the accumulated memory. Given this, we simply
error out to the caller (expecting them to re-establish the rangefeed).

Release note: None

Co-authored-by: Arul Ajmani <arula@cockroachlabs.com>
arulajmani added a commit to arulajmani/cockroach that referenced this pull request Oct 13, 2021
This patch adds a `OnFrontierAdvance` option to the rangefeed libary.
This option allows users of this library to install a callback which
is called whenever the frontier is advanced (with the new frontier
timestamp).

The frontier is used to track the minimum resolvedTS accross all ranges
that the rangefeed span breaks down to. The intention is to allow users
of this library to be able to use this option to checkpoint the
rangefeed progress. The plan is to use it as such for cockroachdb#69661 and
 cockroachdb#69614.

Release note: None
irfansharif added a commit to irfansharif/cockroach that referenced this pull request Oct 14, 2021
In both cockroachdb#69614 and cockroachdb#69661 we find ourselves reaching for a thin,
memory-bounded buffer to sit on top a rangefeed. We want to be able to
accumulate raw rangefeed events, and when the rangefeed frontier is
bumped, flush everything out en-masse in timestamp sorted order. Package
rangefeedbuffer introduces such a datastructure.

If we're past the configured memory limit before having observed a
frontier bump, we have little recourse -- since we receive checkpoints and
rangefeed values on the same stream, we're waiting for a future
checkpoint to flush out the accumulated memory. Given this, we simply
error out to the caller (expecting them to re-establish the rangefeed).

Release note: None

Co-authored-by: Arul Ajmani <arula@cockroachlabs.com>
arulajmani added a commit to arulajmani/cockroach that referenced this pull request Oct 14, 2021
This patch adds a `OnFrontierAdvance` option to the rangefeed libary.
This option allows users of this library to install a callback which
is called whenever the frontier is advanced (with the new frontier
timestamp).

The frontier is used to track the minimum resolvedTS accross all ranges
that the rangefeed span breaks down to. The intention is to allow users
of this library to be able to use this option to checkpoint the
rangefeed progress. The plan is to use it as such for cockroachdb#69661 and
 cockroachdb#69614.

Release note: None
@irfansharif irfansharif changed the title spanconfig: introduce the spanconfig.SQL{Watcher, Translator} spanconfig: introduce the spanconfig.SQLWatcher Oct 14, 2021
craig bot pushed a commit that referenced this pull request Oct 14, 2021
71256: kvclient/rangefeed: make frontier advances observable using an option r=arulajmani a=arulajmani

This patch adds a `OnFrontierAdvance` option to the rangefeed libary.
This option allows users of this library to install a callback which
is called whenever the frontier is advanced.

The frontier is used to track the minimum resolvedTS accross all ranges
that the rangefeed span breaks down to. The intention is to allow users
of this library to be able to use this option to checkpoint the
rangefeed progress. The plan is to use it as such for #69661 and
 #69614.

Release note: None

Co-authored-by: arulajmani <arulajmani@gmail.com>
The SQLWatcher is intended to incrementally watch for events on
system.zones and system.descriptors. It establishes rangefeeds on these
tables from a given checkpoint to do so. It buffers events until the
next checkpoint. These buffered events, along with the new checkpoint
timestamp, are passed to the handler callback.

The SQLWatcher uses an eventBuffer helper struct to buffer rangefeed
updates. This thing maintains the notion of a checkpoint timestamp,
whcih is the minimum timestamp for {zone,descriptor} checkpoint
events.

These interfaces aren't hooked up to interact with eachother yet.
That's forthcoming in a future patch.

References cockroachdb#67679

Release note: None
@arulajmani
Copy link
Copy Markdown
Collaborator Author

@irfansharif I rebased this thing from the SQLTranslator PR like we spoke about earlier. I think there's still a few comments left in here that I might need to address and I need to switch out the event_buffer to use the more generalized library you merged in. I'll send out a new SQLWatcher PR with all of this done very soon.

Hopefully nothing major will change in the interface definitions here, so if you end up starting on the Reconciler before I send out the cleaned up SQLWatcher PR hopefully this rebased PR can help you for your SQLWatcher needs.

@irfansharif
Copy link
Copy Markdown
Contributor

(backlog grooming) Superseded by #71968 (merged) + #71359 (merged) and #71994 (unmerged).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants