[wip] storage: add closed timestamp subsystem #27595
tbg wants to merge 17 commits into cockroachdb:master
Conversation
2b24931 to 5bd7b9b
nvb
left a comment
I'm happy to see this pulled out of the storage package. It all should be able to stand on its own, and this is a big step towards that.
Reviewed 16 of 20 files at r1, 3 of 4 files at r2.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/storage/ct/ct.go, line 72 at r2 (raw file):
```go
// A Notifyee is a sink for closed timestamp updates.
type Notifyee interface {
	Notify(roachpb.NodeID) chan<- ctpb.Entry
```
Add a comment here. Also, describe the special handling of nodeID=0, which you exploit later on.
pkg/storage/ct/ct.go, line 85 at r2 (raw file):
```go
// EpochT is a wrapper around int64 which helps avoid confusing it with other
// int64s used for different purposes.
type EpochT struct {
```
`type EpochT int64`? Or better `type Epoch int64` with no getters or setters. Same with `LAIT`? Maybe I'm missing why these structs are important.
pkg/storage/ct/ct.go, line 134 at r2 (raw file):
```go
type PeerRegistry interface {
	// Request asks the peer node to produce an update for the supplied RangeID.
	Request(roachpb.NodeID, roachpb.RangeID)
```
Are all of these methods asynchronous? It's a little strange to see none of them return errors.
pkg/storage/ct/ct.go, line 146 at r2 (raw file):
```go
// extra constraints that the local node is live for the returned timestamp at
// the given epoch.
type LiveClockFn func() (liveNow hlc.Timestamp, liveEpoch int64, _ error)
```
Can we use EpochT here? Or is this matching an existing function?
pkg/storage/ct/ctpb/entry.proto, line 29 at r2 (raw file):
```protobuf
int64 epoch = 1;
util.hlc.Timestamp closed_timestamp = 2 [(gogoproto.nullable) = false];
bool full = 3;
```
Add comments to these fields. Specifically, I'm not 100% sure what the semantics of full are.
pkg/storage/ct/ctstorage/storage.go, line 56 at r2 (raw file):
```go
// an ill-timed crash, and so historic information is invaluable.
//
// TODO(tschottdorf): this shouldn't be an interface but a concrete impl, the
```
+1, this sounds like a good idea.
pkg/storage/ct/provider/provider.go, line 15 at r2 (raw file):
```go
// permissions and limitations under the License.

package provider
```
It is a little strange that some of the ct subpackages are prefixed with ct while others aren't. I'd make that uniform.
I'm also wondering if ct is a little too concise. Based on other packages in the storage package (e.g. idalloc, rditer, tscache) it seems like closedts might be a better fit because it gives a bit more insight when using it.
pkg/storage/ct/provider/provider.go, line 137 at r2 (raw file):
```go
// Notify implements ct.Notifyee.
func (p *Provider) Notify(nodeID roachpb.NodeID) chan<- ctpb.Entry {
```
Do we expect this to be called once per store or once per replica. I think the former, which is fine. If it's the latter though, I worry about the number of goroutines we'll have to create here.
pkg/storage/ct/provider/provider.go, line 156 at r2 (raw file):
```go
p.mu.Lock()
for i = range p.mu.subscribers {
	if p.mu.subscribers == nil {
```
What's going on here? We're iterating over a slice and then looking at whether that slice is nil? Is this even reachable? Did you mean p.mu.subscribers[i] == nil? Is this to allow for re-use within the slice?
pkg/storage/ct/provider/provider.go, line 220 at r2 (raw file):
```go
var ok bool
// TODO(tschottdorf): add and use VisitDescending.
p.cfg.Storage.VisitAscending(nodeID, func(entry ctpb.Entry) bool {
```
It seems like this lookup could be more efficient. It's surprising to me that we'd have to iterate over all historical states. This is going to be on the hot path of reads, so I think we want the index structure here to be optimized around this lookup. I don't have any great suggestions at the moment though.
tbg
left a comment
First round of comments addressed (with the exception of package renames, for which I'm not sure we've come up with something good yet). Is it time to fill in the missing code or are there any bigger reorgs that you're considering suggesting?
Reviewable status:
complete! 0 of 0 LGTMs obtained
pkg/storage/ct/ct.go, line 72 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Add a comment here. Also, describe the special handling of `nodeID=0`, which you exploit later on.
Done.
pkg/storage/ct/ct.go, line 85 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
`type EpochT int64`? Or better `type Epoch int64` with no getters or setters. Same with `LAIT`? Maybe I'm missing why these structs are important.
You're right, I forgot how this worked. I thought that without the struct you could still use an Epoch as an MLAI in too many places, but it's not true (ok, https://play.golang.com/p/ONjx2KoSyGn now runs, but I think that's fine as we wouldn't ever pass consts in production code). Done.
pkg/storage/ct/ct.go, line 134 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Are all of these methods asynchronous? It's a little strange to see none of them return errors.
I tried to clarify the comments but yes, basically everything is asynchronous here. Calling Request is just saying "hey, next time you talk to that node, please send it these rangeIDs so that it'll emit updates for them".
pkg/storage/ct/ct.go, line 146 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Can we use `EpochT` here? Or is this matching an existing function?
This line predated the introduction of EpochT, so I forgot to add it. Changed.
pkg/storage/ct/ctpb/entry.proto, line 29 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Add comments to these fields. Specifically, I'm not 100% sure what the semantics of `full` are.
Done.
pkg/storage/ct/ctstorage/storage.go, line 56 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
+1, this sounds like a good idea.
Now that I'm looking at it again, I'd like to wait until I've written tests. For introducing persistent storage we wouldn't write another impl of this interface (that's what the comment said), but I may still want to unit test the MultiStorage and use the interface in that context.
pkg/storage/ct/provider/provider.go, line 15 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
It is a little strange that some of the `ct` subpackages are prefixed with `ct` while others aren't. I'd make that uniform.
I'm also wondering if `ct` is a little too concise. Based on other packages in the `storage` package (e.g. `idalloc`, `rditer`, `tscache`) it seems like `closedts` might be a better fit because it gives a bit more insight when using it.
./storage/closedts
./storage/closedts/closedtstransport
Like that? I like closedts for the root package, but then using it as a prefix for the others seems too much. OTOH, it seems awkward to keep using ct there. WDYT?
pkg/storage/ct/provider/provider.go, line 137 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Do we expect this to be called once per store or once per replica. I think the former, which is fine. If it's the latter though, I worry about the number of goroutines we'll have to create here.
It's once per node, so the number of goroutines is at most the number of nodes in the cluster.
pkg/storage/ct/provider/provider.go, line 156 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What's going on here? We're iterating over a slice and then looking at whether that slice is nil? Is this even reachable? Did you mean `p.mu.subscribers[i] == nil`? Is this to allow for re-use within the slice?
Yeah, this is definitely the kind of bug that people write unit tests for. Fixed.
pkg/storage/ct/provider/provider.go, line 220 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
It seems like this lookup could be more efficient. It's surprising to me that we'd have to iterate over all historical states. This is going to be on the hot path of reads, so I think we want the index structure here to be optimized around this lookup. I don't have any great suggestions at the moment though.
I think that VisitDescending (see the TODO above which I just took care of) addresses this sufficiently as the first bucket you visit will typically be the one that allows serving the request already. Besides, I don't think we'll have more than two buckets now and potentially a handful more (below 10) later, at which point any type of fancy data structure is not likely to do better.
nvb
left a comment
Is it time to fill in the missing code or are there any bigger reorgs that you're considering suggesting?
You can begin filling in the missing code. I don't have any large reorg suggestions. Thanks for structuring this well!
Reviewed 2 of 11 files at r4, 2 of 8 files at r5.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/storage/ct/ctpb/entry.proto, line 29 at r2 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
Done.
The description still doesn't give a lot of insight into what "full" means. Does it mean that the mlai map includes all ranges on the store that's sending the entry?
pkg/storage/ct/ctstorage/storage.go, line 56 at r2 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
Now that I'm looking at it again, I'd like to wait once I've written tests. For introducing persistent storage we'd not write another impl of this interface (that's what the comment said) but I may still want to unit test the MultiStorage and use the interface in that context.
Ack.
pkg/storage/ct/provider/provider.go, line 15 at r2 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
./storage/closedts
./storage/closedts/closedtstransport
Like that? I like `closedts` for the root package, but then using it as a prefix for the others seems too much. OTOH, it seems awkward to keep using `ct` there. WDYT?
./storage/closedts
./storage/closedts/transport
Is what I would suggest. You can always import ./storage/closedts/transport as cttransport.
tbg
left a comment
No functional changes in latest push, just a bunch of renaming.
Reviewable status:
complete! 0 of 0 LGTMs obtained
pkg/storage/ct/ctpb/entry.proto, line 29 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
The description still doesn't give a lot of insight into what "full" means. Does it mean that the mlai map includes all ranges on the store that's sending the entry?
Yep, basically. But most importantly, it allows the receiver to assume that "no new MLAI is equal to previously seen MLAI". I expanded the comment.
pkg/storage/ct/provider/provider.go, line 15 at r2 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
./storage/closedts
./storage/closedts/transport
Is what I would suggest. You can always import `./storage/closedts/transport` as `cttransport`.
Done. I renamed config -> container in the process. ctpb remains (since we always use that naming scheme for the protobuf packages).
nvb
left a comment
Reviewed 8 of 19 files at r10, 5 of 22 files at r11, 4 of 4 files at r13.
Reviewable status: complete! 0 of 0 LGTMs obtained
pkg/base/config.go, line 284 at r13 (raw file):
```go
// GetClientTLSConfig returns the client TLS config, initializing it if needed.
// If Insecure is true, return a nil config, otherwise ask the certificate
// manager for a TLS config using certs for the container.User.
```
There were a number of s/config/container replacements in this commit that I don't think you meant to make.
pkg/storage/closedts/closedts.go, line 19 at r13 (raw file):
```go
//
// The following diagram illustrates how these components fit together. In
// running operation, the components are grouped in a ctconfig.Container
```
s/ctconfig/container
pkg/storage/closedts/container/container.go, line 63 at r13 (raw file):
```go
func NewContainer(cfg Config) *Container {
	storage := storage.NewMultiStorage(func() storage.SingleStorage {
		return storage.NewMemStorage(5*closedts.TargetDuration.Get(&cfg.Settings.SV), 2)
```
Pull these numbers out into constants.
pkg/storage/closedts/container/container.go, line 123 at r13 (raw file):
```go
func DialerAdapter(dialer *nodedialer.Dialer) closedts.Dialer {

	return (*dialerAdapter)(dialer)
```
Extra line.
pkg/storage/closedts/container/container_test.go, line 48 at r13 (raw file):
```go
}

func TestContainer(t *testing.T) {
```
Are you planning on adding to this test?
pkg/storage/closedts/provider/provider.go, line 88 at r13 (raw file):
```go
defer close(ch)

t := timeutil.NewTimer()
```
You can just put this on the stack. Also, don't you need to defer t.Stop()?
pkg/storage/closedts/provider/provider.go, line 129 at r13 (raw file):
```go
func (p *Provider) processNotification(nodeID roachpb.NodeID, entry ctpb.Entry) {
	p.cfg.Storage.Add(nodeID, entry)
	if nodeID == 0 {
```
I'd pull 0 into a constant. const localNode = 0 or something.
pkg/storage/closedts/provider/provider.go, line 141 at r13 (raw file):
```go
func (p *Provider) Notify(nodeID roachpb.NodeID) chan<- ctpb.Entry {
	ch := make(chan ctpb.Entry)
	ctx := context.TODO()
```
Should this be Background or should we modify the interface to pass a context?
pkg/storage/closedts/provider/provider.go, line 154 at r13 (raw file):
```go
// Subscribe implements closedts.Producer.
func (p *Provider) Subscribe(ctx context.Context, ch chan<- ctpb.Entry) {
	var i int
```
nit: consider pulling everything that uses i into a tighter scope.
pkg/storage/closedts/provider/provider.go, line 189 at r13 (raw file):
```go
for {
	p.cond.L.Lock()
	p.cond.Wait()
```
What happens if the provider's ct-closer task exits before we get here? Do we deadlock? Do we need to drain all subscriptions before allowing ct-closer to finish?
pkg/storage/closedts/storage/storage.go, line 80 at r13 (raw file):
```go
// MultiStorage implements the closedts.Storage interface.
type MultiStorage struct {
	f func() SingleStorage
```
What is `f`? A constructor? I'd at least leave a comment mentioning that, or give this a better name.
pkg/storage/closedts/storage/storage_test.go, line 26 at r13 (raw file):
```go
)

func ExampleSingleStorage() {
```
These examples are really nice!
pkg/storage/ct/ctpb/entry.proto, line 29 at r2 (raw file):
Previously, tschottdorf (Tobias Schottdorf) wrote…
Yep, basically. But most importantly, it allows the receiver to assume that "no new MLAI is equal to previously seen MLAI". I expanded the comment.
Thanks.
tbg
left a comment
I added rudimentary testing for the transport package. Hopefully doing the same for provider and container tomorrow and then I'll hook things up.
Reviewable status:
complete! 0 of 0 LGTMs obtained
pkg/base/config.go, line 284 at r13 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
There were a number of `s/config/container` replacements in this commit that I don't think you meant to make.
Done.
pkg/storage/closedts/closedts.go, line 19 at r13 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
s/ctconfig/container
Done.
pkg/storage/closedts/container/container.go, line 63 at r13 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Pull these numbers out into constants.
Done. I think they'll require some tweaking, but for now it'll do.
pkg/storage/closedts/container/container.go, line 123 at r13 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Extra line.
Done.
pkg/storage/closedts/container/container_test.go, line 48 at r13 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Are you planning on adding to this test?
Yes, this is just a stub.
pkg/storage/closedts/provider/provider.go, line 88 at r13 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
You can just put this on the stack. Also, don't you need to defer `t.Stop()`?
Done.
pkg/storage/closedts/provider/provider.go, line 129 at r13 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I'd pull `0` into a constant. `const localNode = 0` or something.
I actually backed out this special casing. Was more trouble than it's worth.
pkg/storage/closedts/provider/provider.go, line 141 at r13 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Should this be Background or should we modify the interface to pass a context?
I'll use context.Background for now.
pkg/storage/closedts/provider/provider.go, line 154 at r13 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: consider pulling everything that uses `i` into a tighter scope.
It's used all the way through, and I don't want to take a reference into something that is protected by the mutex.
pkg/storage/closedts/provider/provider.go, line 189 at r13 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What happens if the provider's `ct-closer` task exits before we get here? Do we deadlock? Do we need to drain all subscriptions before allowing `ct-closer` to finish?
Good point, done (though I have little confidence in the correctness until it's tested).
pkg/storage/closedts/storage/storage.go, line 80 at r13 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What is `f`? A constructor? I'd at least leave a comment mentioning that, or give this a better name.
Done.
pkg/storage/closedts/storage/storage_test.go, line 26 at r13 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
These examples are really nice!
Thanks! I like this format too.
Added end-to-end-ish tests (…)

@nvanbenschoten this PR is getting... pretty large. I'm almost done writing tests for …
This commit introduces the basic closed timestamp subsystem. The system
is far from complete (as a starter, it's 100% untested, so it definitely
has lots of bugs and won't work properly at all), but the point of this
PR is to get buy-in on the overall architectural approach, so that
shortcomings can be addressed before they are cemented in more code.
Great care has been taken to make each individual component self-
contained and cleanly interfaced to the outside world, which should aid
greatly in unit testing them in isolation. The package layout relative
to `ct` is the following:
.: slim root package containing only the most basic CT interfaces
./ctpb: protobuf definitions and service
./ctconfig: fat package pulling in everything needed to actually run
./ctstorage: storage for per-node closed timestamp state
./minprop: minimum proposal timestamp tracker
./provider: the Provider.
Most of these packages depend on `ct` (more precisely, on the interfaces
they need to use), but in turn don't depend on each other. I'm open to
changing the name of individual packages (for example, prefixing them
all with `ct`).
As an entry point for review, the following diagram together with the
`ctconfig` and `ct` packages should be instructive in getting a bird's
eye view of how the subsystem fits into CockroachDB: All of the
components are grouped in a Container (intended as a pass-around
per-instance Singleton) which is passed down to the Stores via the
StoreConfig. Replicas proposing commands talk to the Tracker; replicas
trying to serve follower reads talk to the Provider, which receives
closed timestamp updates for the local node and its peers.
```
Node 1 | Node 2
|
+---------+ Close +-----------+ | +-----------+
| Tracker |<--------| | | | |
+-----+---+ | +-------+ | | | +-------+ | CanServe
^ | |Storage| | | | |Storage| |<---------+
| | --------+ | | | +-------+ | |
|Track | | | | | +----+----+
| | Provider | | | Provider | | Follower|
| +-----------+ | +-----------+ | Replica |
| ^ ^ +----+----+
| |Subscribe |Notify |
| | | |
+---------+ | Request | |
|Proposing| Refresh +---+----+ <----- +----+-----+ Request |
| Replica |<--------| Server | | Clients |<----------+
+---------+ +--------+ -----> +----------+ EnsureClient
CT
```
Release note: None
jeez, these shook out a bunch of bugs.
nvb
left a comment
Reviewed 5 of 20 files at r18, 2 of 7 files at r23, 4 of 14 files at r24, 1 of 4 files at r25, 2 of 3 files at r28, 6 of 8 files at r29, 5 of 6 files at r30.
Reviewable status: complete! 1 of 0 LGTMs obtained
pkg/storage/closedts/provider/provider_test.go, line 48 at r30 (raw file):
```go
st := cluster.MakeTestingClusterSettings()
// We'll only unleash the closer loop only when the test is basically done, and
```
nit: two "only"s
pkg/storage/closedts/container/container_test.go, line 130 at r30 (raw file):
```go
}

func TestTwoNodes(t *testing.T) {
```
Great test!
pkg/storage/closedts/transport/server.go, line 57 at r30 (raw file):
```go
ch := make(chan ctpb.Entry, 10) // TODO: X*closedts.CloseFraction*TargetInterval
```
Did you mean to address this now?
tbg
left a comment
TFTR! Closing for #27922 (which doesn't have the internal churn that confuses the stress tester).
Reviewable status:
complete! 1 of 0 LGTMs obtained
pkg/storage/closedts/container/container_test.go, line 130 at r30 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Great test!
Done.
pkg/storage/closedts/transport/server.go, line 57 at r30 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Did you mean to address this now?
No, but I made this a proper TODO.
27922: storage: add closed timestamp subsystem r=nvanbenschoten a=tschottdorf

NB: This is a reopen of #27595 which needs a fix to the CI stress tester to merge before it can pass CI (and I don't want to wait for that). No need to review.

----

This commit introduces the basic closed timestamp subsystem. It does not hook it up into the CockroachDB node yet.

Great care has been taken to make each individual component self-contained and cleanly interfaced to the outside world. The package layout relative to `closedts` is the following:

.: slim root package containing only the most basic CT interfaces
./ctpb: protobuf definitions and service
./config: fat package pulling in everything needed to actually run
./storage: storage for per-node closed timestamp state
./minprop: minimum proposal timestamp tracker
./provider: the Provider, centralizing all closed timestamp information.

Most of these packages depend on `closedts` (more precisely, on the interfaces they need to use), but in turn don't depend on each other.

The following diagram together with the `ctconfig` and `ct` packages should be instructive in getting a bird's eye view of how the subsystem fits into CockroachDB: All of the components are grouped in a Container (intended as a pass-around per-instance Singleton) which is passed down to the Stores via the StoreConfig. Replicas proposing commands talk to the Tracker; replicas trying to serve follower reads talk to the Provider, which receives closed timestamp updates for the local node and its peers.

```
Node 1 | Node 2
|
+---------+ Close +-----------+ | +-----------+
| Tracker |<--------| | | | |
+-----+---+ | +-------+ | | | +-------+ | CanServe
^ | |Storage| | | | |Storage| |<---------+
| | --------+ | | | +-------+ | |
|Track | | | | | +----+----+
| | Provider | | | Provider | | Follower|
| +-----------+ | +-----------+ | Replica |
| ^ ^ +----+----+
| |Subscribe |Notify |
| | | |
+---------+ | Request | |
|Proposing| Refresh +---+----+ <----- +----+-----+ Request |
| Replica |<--------| Server | | Clients |<----------+
+---------+ +--------+ -----> +----------+ EnsureClient
CT
```

Release note: None

Co-authored-by: Tobias Schottdorf <tobias.schottdorf@gmail.com>