Skip to content

migration: introduce various other bells and whistles...#57650

Merged
craig[bot] merged 10 commits intocockroachdb:masterfrom
irfansharif:201207.migration-bells-whistles
Dec 18, 2020
Merged

migration: introduce various other bells and whistles...#57650
craig[bot] merged 10 commits intocockroachdb:masterfrom
irfansharif:201207.migration-bells-whistles

Conversation

@irfansharif
Copy link
Copy Markdown
Contributor

...to pave the way for real migrations

This PR cannibalizes most of our prototype in #57445 and shimmies in a few
interfaces to make it all unit-testable. While here, we:

  • parallelize the execution of the EveryNode primitive
  • introduce the IterateRangeDescriptors primitive
  • re-write the migration registration process for better ergonomics
  • introduce a more structured Migration type, to annotate it with migration
    specific metadata (it also paves the way for logging the progress of an
    ongoing migration to a system table).

See individual commits for details.

@irfansharif irfansharif requested a review from a team as a code owner December 7, 2020 19:38
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

Copy link
Copy Markdown
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not done yet, but flushing comments in case I won't be able to revisit for the rest of the day.

Reviewed 1 of 1 files at r1, 1 of 1 files at r2, 5 of 5 files at r3, 1 of 1 files at r4, 1 of 1 files at r5, 3 of 3 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, @tbg, and @TheSamHuang)


pkg/migration/helper_test.go, line 42 at r7 (raw file):

TestingNewCluster

Does this have to be exported?

@tbg tbg self-requested a review December 9, 2020 14:55
Copy link
Copy Markdown
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 7 files at r7, 2 of 2 files at r8, 4 of 4 files at r9.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @TheSamHuang)


pkg/migration/client_test.go, line 59 at r9 (raw file):

		// TODO(irfansharif): We always seem to include a second copy of r1's
		// desc. Unsure why.

Hmm, let's figure this out. Is it because of meta1 somehow?


pkg/migration/helper.go, line 124 at r8 (raw file):

				}
				client := serverpb.NewMigrationClient(conn)
				if err := fn(ctx, client); err != nil {

return fn(ctx, client)?


pkg/migration/helper_test.go, line 92 at r7 (raw file):

	t.Run("with-node-down", func(t *testing.T) {
		// Down a node mid-way through execution. We expect EveryNode to error

This isn't half-way through, right? I would've expected you to down the node in the EveryNode closure.


pkg/migration/helper_test.go, line 184 at r7 (raw file):

type testClusterImpl struct {
	ns   nodes
	err  error

// returned from ???


pkg/migration/helper_test.go, line 236 at r7 (raw file):

}

func (t *testClusterImpl) downNode(id int) {

Hmm, this is sort of clunky. Now we're not testing (clusterImpl).nodes at all but it kind of makes it seem like we did. Can we factor out the meat in that method and call into it here? Then we could mark id as dead in our fake liveness instance instead and the error would occur naturally in the real code that emits it.


pkg/migration/util.go, line 38 at r7 (raw file):

// and if not, what changed. The textual diff is only to be used for logging
// purposes.
func (ns nodes) identical(other nodes) (ok bool, diff string) {

Can this take a *redact.StringBuilder instead of returning a string?


pkg/migration/util.go, line 41 at r7 (raw file):

	a, b := ns, other
	if len(a) != len(b) {
		if len(a) < len(b) {

This isn't quite right, it could have been two nodes added, one removed, etc.


pkg/migration/util.go, line 47 at r7 (raw file):

		}
		return false, diff
	}

Could something like this be easier? (It has newline problems when there are multiple mismatches, but that could be fixed)

type ent struct { count int; n node, epochChanged bool }
m := map[NodeID]ent{}
for _, n := range ns {
  m[n.id] = ent{count:1, n: n, epochChanged: false}
}
for _, n := range other {
  e, ok := m[n.id]
  e.count--
  if ok && e.n.epoch != n.epoch {
     e.epochChanged = true
  }
  m[n.id] = e
}

for _, e := range m {
  if e.epochChanged {
    buf.Fprint("n%d: epoch changed", n.id)
    continue
  }
  if e.count > 0 {
    buf.Fprint("n%d: node decommissioned", n.id)
  } else if e.count < 0 {
    buf.Fprint("n%d: node joined cluster", n.id)
  }
}

return len(buf.Bytes()) == 0
}

Copy link
Copy Markdown
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more flushing. I did look at everything! Just need to figure out the double descriptor.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @TheSamHuang)

@irfansharif irfansharif force-pushed the 201207.migration-bells-whistles branch from d6cad63 to d8f862f Compare December 11, 2020 03:19
Makes for better logging.

Release note: None
We can separate out the `Helper`, `Migration`, and various utilities
into their own files. We'll add tests for individual components in
future commits; the physical separation here sets the foundation for
doing so (prototyped in cockroachdb#57445). This commit is purely code movement.

Release note: None
It's clearer to talk explicitly in terms of causality.

Release note: None
We re-define what the Migration type is to be able to annotate it a
description. We'll later use this description when populating the
`system.migrations` table (originally prototyped in cockroachdb#57445).

Release note: None
We make it a bit more ergonomic (this revision was originally
prototyped in cockroachdb#57445).

Release note: None
@irfansharif irfansharif force-pushed the 201207.migration-bells-whistles branch from d8f862f to b3f7aa4 Compare December 11, 2020 03:21
Copy link
Copy Markdown
Contributor Author

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed the blemishes, PTAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @tbg, and @TheSamHuang)


pkg/migration/client_test.go, line 59 at r9 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Hmm, let's figure this out. Is it because of meta1 somehow?

Yup, I cargo-culted from

return txn.Iterate(ctx, keys.MetaMin, keys.MetaMax, pageSize,
but that's wrong too.


pkg/migration/helper.go, line 124 at r8 (raw file):

Previously, tbg (Tobias Grieger) wrote…

return fn(ctx, client)?

Done.


pkg/migration/helper_test.go, line 42 at r7 (raw file):

Previously, tbg (Tobias Grieger) wrote…
TestingNewCluster

Does this have to be exported?

yup, we use this in the migration_test files. We needed those tests to sit outside pkg/migration to avoid circular dependencies when using test server (pkg/migration -> pkg/testutils/serverutils -> pkg/server -> pkg/migration).


pkg/migration/helper_test.go, line 92 at r7 (raw file):

Previously, tbg (Tobias Grieger) wrote…

This isn't half-way through, right? I would've expected you to down the node in the EveryNode closure.

Done.


pkg/migration/helper_test.go, line 184 at r7 (raw file):

Previously, tbg (Tobias Grieger) wrote…

// returned from ???

Done.


pkg/migration/helper_test.go, line 236 at r7 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Hmm, this is sort of clunky. Now we're not testing (clusterImpl).nodes at all but it kind of makes it seem like we did. Can we factor out the meat in that method and call into it here? Then we could mark id as dead in our fake liveness instance instead and the error would occur naturally in the real code that emits it.

Figured out a way to mock the deadness at the liveness level instead.


pkg/migration/util.go, line 38 at r7 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Can this take a *redact.StringBuilder instead of returning a string?

Yea, but I like keeping the signature as is.


pkg/migration/util.go, line 41 at r7 (raw file):

Previously, tbg (Tobias Grieger) wrote…

This isn't quite right, it could have been two nodes added, one removed, etc.

Done.


pkg/migration/util.go, line 47 at r7 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Could something like this be easier? (It has newline problems when there are multiple mismatches, but that could be fixed)

type ent struct { count int; n node, epochChanged bool }
m := map[NodeID]ent{}
for _, n := range ns {
  m[n.id] = ent{count:1, n: n, epochChanged: false}
}
for _, n := range other {
  e, ok := m[n.id]
  e.count--
  if ok && e.n.epoch != n.epoch {
     e.epochChanged = true
  }
  m[n.id] = e
}

for _, e := range m {
  if e.epochChanged {
    buf.Fprint("n%d: epoch changed", n.id)
    continue
  }
  if e.count > 0 {
    buf.Fprint("n%d: node decommissioned", n.id)
  } else if e.count < 0 {
    buf.Fprint("n%d: node joined cluster", n.id)
  }
}

return len(buf.Bytes()) == 0
}
</blockquote></details>

Ooo, like it. Done.


<!-- Sent from Reviewable.io -->

irfansharif added a commit to irfansharif/cockroach that referenced this pull request Dec 11, 2020
When scanning the KV keyspace for all range descriptors, we want to scan
from Meta2Prefix to MetaMax (see meta2RangeIter for other usages of
this). In our decomm status codepaths we were erroneously starting off
our scan at MetaMin, so including meta1 in our scans as well. This has
the effect of iterating over the range descriptor for r1 twice (first
observed in cockroachdb#57650 where we cargo-culted these bounds).

Release note (bug fix): When displaying replica counts during the
decommissioning process, previously we were overcounting the
replicas displayed for r1. This is no longer the case.
craig bot pushed a commit that referenced this pull request Dec 11, 2020
57812: server: fix decomm status replica overcounting for r1 r=irfansharif a=irfansharif

When scanning the KV keyspace for all range descriptors, we want to scan
from Meta2Prefix to MetaMax (see meta2RangeIter for other usages of
this). In our decomm status codepaths we were erroneously starting off
our scan at MetaMin, so including meta1 in our scans as well. This has
the effect of iterating over the range descriptor for r1 twice (first
observed in #57650 where we cargo-culted these bounds).

Release note (bug fix): When displaying replica counts during the
decommissioning process, previously we were overcounting the
replicas displayed for r1. This is no longer the case.

Co-authored-by: irfan sharif <irfanmahmoudsharif@gmail.com>
@irfansharif
Copy link
Copy Markdown
Contributor Author

Bump, @tbg+@knz.

@irfansharif irfansharif requested a review from tbg December 17, 2020 14:04
Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think r14 and r15 should be squashed: they look like they're moving code from one file to another, but in r14 the code is duplicated. (Or is my reading wrong?) I'd recommend trying to build at every commit individually.

Reviewed 1 of 1 files at r1, 11 of 11 files at r10, 1 of 1 files at r11, 5 of 5 files at r12, 1 of 1 files at r13, 1 of 1 files at r14, 3 of 3 files at r15, 7 of 7 files at r16, 2 of 2 files at r17, 4 of 4 files at r18.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @tbg, and @TheSamHuang)


pkg/migration/helper.go, line 121 at r17 (raw file):

			id := node.id // copy out of the loop variable
			grp.GoCtx(func(ctx context.Context) error {
				conn, err := h.c.dial(ctx, id)

now that's too much concurrency. I think this needs a concurrency token pool.


pkg/migration/util.go, line 47 at r7 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Ooo, like it. Done.

I don't see it done. Or maybe you forgot to push something?


pkg/server/migration.go, line 43 at r11 (raw file):

	ctx context.Context, req *serverpb.ValidateTargetClusterVersionRequest,
) (*serverpb.ValidateTargetClusterVersionResponse, error) {
	ctx, span := m.server.AnnotateCtxWithSpan(ctx, "validate-cv")

here and below: can you expand -cv into -cluster-version. These strings are cheap and it will make the logs easier to read.

@tbg tbg requested a review from knz December 17, 2020 15:38
Copy link
Copy Markdown
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no further comments, ready whenever @knz is.

Reviewed 11 of 11 files at r10, 1 of 1 files at r11, 5 of 5 files at r12, 1 of 1 files at r13, 1 of 1 files at r14, 3 of 3 files at r15, 7 of 7 files at r16, 2 of 2 files at r17, 4 of 4 files at r18.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @TheSamHuang)


pkg/migration/helper.go, line 121 at r17 (raw file):

Previously, knz (kena) wrote…

now that's too much concurrency. I think this needs a concurrency token pool.

🤔 good point,so I guess we need a magic constant. 50 sounds fine? We're only paying a grpc request per goroutine here.


pkg/migration/util.go, line 38 at r7 (raw file):

Previously, irfansharif (irfan sharif) wrote…

Yea, but I like keeping the signature as is.

Works for me if it works for @knz, though I don't understand why you "like" this more than taking a redact.StringBuilder.

@irfansharif irfansharif force-pushed the 201207.migration-bells-whistles branch from b3f7aa4 to 11a2e5f Compare December 17, 2020 21:50
Copy link
Copy Markdown
Contributor Author

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r14 shows up as the intended comment-only commit for me, and r15 something else. Probably some reviewable funkiness? Also did try building+linting at each commit individually.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @tbg, and @TheSamHuang)


pkg/migration/helper.go, line 121 at r17 (raw file):

Previously, tbg (Tobias Grieger) wrote…

🤔 good point,so I guess we need a magic constant. 50 sounds fine? We're only paying a grpc request per goroutine here.

Done. We're coming full circle using the quotapool I worked on way back when: #15802.


pkg/migration/util.go, line 47 at r7 (raw file):

Previously, knz (kena) wrote…

I don't see it done. Or maybe you forgot to push something?

It is done, it shows up between r15-r16 for me on reviewable, and here on github (commit 7).


pkg/server/migration.go, line 43 at r11 (raw file):

Previously, knz (kena) wrote…

here and below: can you expand -cv into -cluster-version. These strings are cheap and it will make the logs easier to read.

Done.

Copy link
Copy Markdown
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: after comments!

Reviewed 4 of 4 files at r19, 4 of 4 files at r20, 2 of 2 files at r21.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @TheSamHuang)


pkg/migration/helper.go, line 124 at r19 (raw file):

			id := node.id // copy out of the loop variable
			grp.GoCtx(func(ctx context.Context) error {
				alloc, err := qp.Acquire(ctx, 1)

This is still unbounded concurrency (the unbounded now sits on Acquire instead of in the RPC call). I think you want to acquire before calling GoCtx and keep releasing inside (note the usual "local in a loop" footgun). The other way to make this work is to spawn 25 goroutines upfront and have them consume from a channel, but I like about the current approach that it starts only the goroutines it needs.

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 4 of 4 files at r20, 1 of 2 files at r21.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @irfansharif, @knz, and @TheSamHuang)


pkg/migration/util.go, line 92 at r16 (raw file):

// SafeFormat implements redact.SafeFormatter.
func (ns nodes) SafeFormat(s redact.SafePrinter, _ rune) {
	s.SafeString(redact.SafeString(ns.String()))

No that really does not work for me, like, at all.
The very point of the design of the API is to ensure that future changes around the code do not run the risk of introducing unsafe strings. Your usage of SafeString() here is completely at odds with that. You just break all the protections.

The desired idiom is the other way around:

  1. implement the pretty-printing logic in SafeFormat
  2. only then implement String() as return redact.StringWithoutMarkers(self)

@irfansharif irfansharif force-pushed the 201207.migration-bells-whistles branch from 11a2e5f to cf9254f Compare December 18, 2020 13:58
Copy link
Copy Markdown
Contributor Author

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @knz, @tbg, and @TheSamHuang)


pkg/migration/helper.go, line 124 at r19 (raw file):

Previously, tbg (Tobias Grieger) wrote…

This is still unbounded concurrency (the unbounded now sits on Acquire instead of in the RPC call). I think you want to acquire before calling GoCtx and keep releasing inside (note the usual "local in a loop" footgun). The other way to make this work is to spawn 25 goroutines upfront and have them consume from a channel, but I like about the current approach that it starts only the goroutines it needs.

gah, good catch. Done.


pkg/migration/util.go, line 92 at r16 (raw file):

Previously, knz (kena) wrote…

No that really does not work for me, like, at all.
The very point of the design of the API is to ensure that future changes around the code do not run the risk of introducing unsafe strings. Your usage of SafeString() here is completely at odds with that. You just break all the protections.

The desired idiom is the other way around:

  1. implement the pretty-printing logic in SafeFormat
  2. only then implement String() as return redact.StringWithoutMarkers(self)

Doh, done. That makes a lot more sense. I'll admit (as is glaringly obvious), this was done entirely without thinking.

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 7 files at r22, 4 of 4 files at r24, 2 of 2 files at r25.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif, @knz, @tbg, and @TheSamHuang)


pkg/migration/helper.go, line 150 at r24 (raw file):

		if ok, diff := ns.identical(curNodes); !ok {
			log.Infof(ctx, "[%s], retrying", redact.Safe(diff))
log.Infof(ctx, "[%s], retrying", diff)

pkg/migration/util.go, line 38 at r22 (raw file):

// and if not, what changed (in terms of cluster membership operations and epoch
// changes). The textual diff is only to be used for logging purposes.
func (ns nodes) identical(other nodes) (ok bool, diff string) {

diff redact.RedactableString


pkg/migration/util.go, line 62 at r22 (raw file):

	for id, e := range m {
		if e.epochChanged {
			diffs = append(diffs, fmt.Sprintf("n%d's epoch changed", id))

redact.Sprintf(...) here


pkg/migration/util.go, line 65 at r22 (raw file):

		}
		if e.count > 0 {
			diffs = append(diffs, fmt.Sprintf("n%d was decommissioned", id))

ditto


pkg/migration/util.go, line 68 at r22 (raw file):

		}
		if e.count < 0 {
			diffs = append(diffs, fmt.Sprintf("n%d joined the cluster", id))

ditto


pkg/migration/util.go, line 72 at r22 (raw file):

	}

	sort.Strings(diffs)

hold on for a bit and you'll have redact.SortStrings here


pkg/migration/util.go, line 82 at r22 (raw file):

// SafeFormat implements redact.SafeFormatter.
func (ns nodes) SafeFormat(s redact.SafePrinter, _ rune) {
	var b strings.Builder

ok try this:

s.SafeString("n{")
if len(ns) > 0 {
   s.Printf("%d", ns[0].id)
   for _, node  := range ns[1:] {
       s.Printf(",%d", node.id)
   }
}
s.SafeString("}")  // or b.SafeRune('}')

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif, @knz, @tbg, and @TheSamHuang)


pkg/migration/util.go, line 73 at r22 (raw file):

	sort.Strings(diffs)
	return len(diffs) == 0, strings.Join(diffs, ", ")

I'd also suggest bluntly returning the slice here, instead of joining it (and sorting it)
let the caller decide what it wants

In the particular log call that uses this today, we don't even need to sort!

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif, @knz, @tbg, and @TheSamHuang)


pkg/migration/util.go, line 72 at r22 (raw file):

Previously, knz (kena) wrote…

hold on for a bit and you'll have redact.SortStrings here

cockroachdb/redact#13

@irfansharif irfansharif force-pushed the 201207.migration-bells-whistles branch from cf9254f to e876d9b Compare December 18, 2020 15:02
Copy link
Copy Markdown
Contributor Author

@irfansharif irfansharif left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif, @knz, @tbg, and @TheSamHuang)


pkg/migration/helper.go, line 150 at r24 (raw file):

Previously, knz (kena) wrote…
log.Infof(ctx, "[%s], retrying", diff)

Done.


pkg/migration/util.go, line 38 at r7 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Works for me if it works for @knz, though I don't understand why you "like" this more than taking a redact.StringBuilder.

I've now changed it as originally suggested, but I'll take the podium here anyway. I think my preference came from wanting my base utility functions (like this contextual diff generator) to not need to know or care about "redactability" whatsoever. A function signature with the named return type diff string to me is ever-so-slightly more understandable than seeing our redaction package being plumbed into it. As for the point at which we're logging the diff, I'm happy to mark the string as redact.Safe by hand, as opposed to plumbing a redact.<type> all throughout up until that point.

Though I'll admit I'm only now realizing this is at odds with us wanting all logging to be safe by construction/non-redaction to be something opted out of.


pkg/migration/util.go, line 72 at r22 (raw file):

Previously, knz (kena) wrote…

cockroachdb/redact#13

I'll skip the string sorting in this PR, you're right that we don't actually need it.


pkg/migration/util.go, line 73 at r22 (raw file):

Previously, knz (kena) wrote…

I'd also suggest bluntly returning the slice here, instead of joining it (and sorting it)
let the caller decide what it wants

In the particular log call that uses this today, we don't even need to sort!

Done. Thanks for walking me through this 🙌


pkg/migration/util.go, line 82 at r22 (raw file):

Previously, knz (kena) wrote…

ok try this:

s.SafeString("n{")
if len(ns) > 0 {
   s.Printf("%d", ns[0].id)
   for _, node  := range ns[1:] {
       s.Printf(",%d", node.id)
   }
}
s.SafeString("}")  // or b.SafeRune('}')

</blockquote></details>

Done.


<!-- Sent from Reviewable.io -->

Copy link
Copy Markdown
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 2 of 8 files at r26, 4 of 4 files at r28, 2 of 2 files at r29.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @irfansharif, @knz, @tbg, and @TheSamHuang)


pkg/migration/util.go, line 85 at r27 (raw file):

		}
	}
	s.SafeString("}") // or b.SafeRune('}')

you get to choose which of the two you want, no need to keep the comment

To facilitate testing `Helper` in isolation, we introduce a `cluster`
interface that we'll mock out in tests. It's through this interface that
the migration infrastructure will able to dial out to a specific node,
grab hold of a kv.DB instance, and retrieve the current cluster
membership.

Part of diff also downgrades `RequiredNodes` from being a first class
primitive, instead tucking it away for internal usage only. Given
retrieving the cluster membership made no guarantees about new nodes
being added to the cluster, it's entirely possible for that to happen
concurrently with it. Appropriate usage then entailed wrapping it under
a stabilizing loop, like we do so in `EveryNode`. This tells us there's
no need to expose it directly to migration authors.

Release note: None
It's not currently wired up to anything. We'll use it in future PRs to
send out `Migrate` requests to the entire keyspace. This was originally
prototyped in cockroachdb#57445. See the inline comments and the RFC (cockroachdb#48843) for
the motivation here.

Release note: None
@irfansharif irfansharif force-pushed the 201207.migration-bells-whistles branch from e876d9b to a0c5160 Compare December 18, 2020 15:32
@irfansharif
Copy link
Copy Markdown
Contributor Author

phew, TFTRs!

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Dec 18, 2020

Build failed:

@irfansharif
Copy link
Copy Markdown
Contributor Author

Looks like acceptance/reset-quorum timed out here. @TheSamHuang, mind trying it out locally to see what's up?

bors retry

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Dec 18, 2020

Build succeeded:

@craig craig bot merged commit 05da184 into cockroachdb:master Dec 18, 2020
@irfansharif irfansharif deleted the 201207.migration-bells-whistles branch December 18, 2020 17:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants