Conversation
storage/range_test.go
Outdated
|
Could you clean up the following outdated comments in our codebase while you're at it? |
proto/internal.go
Outdated
|
LGTM |
There was a problem hiding this comment.
Is it difficult to make a merge transactional just because it's hard to make it efficient? Because a merge is really just a read followed by a write.
There was a problem hiding this comment.
RocksDB merges are way more involved than just a read followed by a write, they are really going for high performance here.
A Merge usually will go to memory and not touch the actual key. As more merge requests add up to it, RocksDB will (in-memory) pre-merge those merges (still not touching the actual key), assuming associativity. Only when you actually want to know what the value is (or probably automatically after some time when there's a convenient moment to do so) RocksDB will merge into the stored key and write the update.
There was a problem hiding this comment.
Ah, interesting. And yet it seems difficult to have properly ordered time series data without transactions -- assuming these values are even replicated without transactions.
There was a problem hiding this comment.
Every command is replicated, including non-transactional changes.
Time series data is always ordered because every piece of data is paired with a timestamp. The individual samples can be written out-of-order, and that ordering can be easily corrected by the client when the data is read. There are concerns regarding dropped writes, but monitoring systems are designed to account for this possibility.
Add the InternalMerge command, which is replacing the anticipated (but never implemented) AccumulateTS command. InternalMerge is similar to a put in that it writes a single value to a key - however, the supplied value is *merged* into the existing value instead of replacing it. Merging is not transactional, and thus it is only exposed internally for usage in certain operations which require a high write performance (such as the accumulation of time series data).
a96360a to
122f55d
Compare
There was a problem hiding this comment.
s/not transactional/requires context-specific implementation and is not transactional/
|
LGTM |
We have been seeing long startup times which disappear spontaneously. During a
restart of the beta cluster, the following goroutine was observed, which suggests
that we were spending a lot of time GCing replicas on startup.
engine ??:0 _Cfunc_DBIterNext(cockroachdb#324, cockroachdb#323, 0, 0, 0, 0, 0, 0, 0)
engine rocksdb.go:1135 (*rocksDBIterator).Next(cockroachdb#235)
storage replica_data_iter.go:104 (*ReplicaDataIterator).Next(cockroachdb#316)
storage store.go:1748 (*Store).destroyReplicaData(#109, cockroachdb#317, 0, 0)
storage store.go:841 (*Store).Start.func2(0x101b, cockroachdb#300, 0x36, 0x40, cockroachdb#301, 0x36, 0x40, cockroachdb#315, 0x3, 0x4, ...)
storage store.go:734 IterateRangeDescriptors.func1(cockroachdb#306, 0x40, 0x41, cockroachdb#307, 0x92, 0x92, cockroachdb#341, 0, 0x186c, 0x4000, ...)
engine mvcc.go:1593 MVCCIterate(cockroachdb#329, #68, #47, #81, cockroachdb#232, 0x9, 0x10, cockroachdb#233, 0xb, 0x10, ...)
storage store.go:738 IterateRangeDescriptors(cockroachdb#330, cockroachdb#196, #47, #81, cockroachdb#195, #179, #110)
storage store.go:867 (*Store).Start(#109, cockroachdb#330, cockroachdb#196, #179, #185, 0x1)
server node.go:405 (*Node).initStores(#78, cockroachdb#330, cockroachdb#196, #98, 0x1, 0x1, #179, 0, #55)
server node.go:330 (*Node).start(#78, cockroachdb#330, cockroachdb#196, #42, #129, #98, 0x1, 0x1, 0, 0, ...)
server server.go:431 (*Server).Start(#5, cockroachdb#330, cockroachdb#196, #95, 0x1)
cli start.go:368 runStart(#34, #178, 0, 0x9, 0, 0)
cobra command.go:599 (*Command).execute(#34, #177, 0x9, 0x9, #34, #177)
cobra command.go:689 (*Command).ExecuteC(#33, #70, cockroachdb#343, #72)
cobra command.go:648 (*Command).Execute(#33, #71, cockroachdb#343)
cli cli.go:96 Run(#64, 0xa, 0xa, 0, 0)
main main.go:37 main()
We have been seeing long startup times which disappear spontaneously. During a
restart of the beta cluster, the following goroutine was observed, which suggests
that we were spending a lot of time GCing replicas on startup.
engine ??:0 _Cfunc_DBIterNext(cockroachdb#324, cockroachdb#323, 0, 0, 0, 0, 0, 0, 0)
engine rocksdb.go:1135 (*rocksDBIterator).Next(cockroachdb#235)
storage replica_data_iter.go:104 (*ReplicaDataIterator).Next(cockroachdb#316)
storage store.go:1748 (*Store).destroyReplicaData(#109, cockroachdb#317, 0, 0)
storage store.go:841 (*Store).Start.func2(0x101b, cockroachdb#300, 0x36, 0x40, cockroachdb#301, 0x36, 0x40, cockroachdb#315, 0x3, 0x4, ...)
storage store.go:734 IterateRangeDescriptors.func1(cockroachdb#306, 0x40, 0x41, cockroachdb#307, 0x92, 0x92, cockroachdb#341, 0, 0x186c, 0x4000, ...)
engine mvcc.go:1593 MVCCIterate(cockroachdb#329, #68, #47, #81, cockroachdb#232, 0x9, 0x10, cockroachdb#233, 0xb, 0x10, ...)
storage store.go:738 IterateRangeDescriptors(cockroachdb#330, cockroachdb#196, #47, #81, cockroachdb#195, #179, #110)
storage store.go:867 (*Store).Start(#109, cockroachdb#330, cockroachdb#196, #179, #185, 0x1)
server node.go:405 (*Node).initStores(#78, cockroachdb#330, cockroachdb#196, #98, 0x1, 0x1, #179, 0, #55)
server node.go:330 (*Node).start(#78, cockroachdb#330, cockroachdb#196, #42, #129, #98, 0x1, 0x1, 0, 0, ...)
server server.go:431 (*Server).Start(#5, cockroachdb#330, cockroachdb#196, #95, 0x1)
cli start.go:368 runStart(#34, #178, 0, 0x9, 0, 0)
cobra command.go:599 (*Command).execute(#34, #177, 0x9, 0x9, #34, #177)
cobra command.go:689 (*Command).ExecuteC(#33, #70, cockroachdb#343, #72)
cobra command.go:648 (*Command).Execute(#33, #71, cockroachdb#343)
cli cli.go:96 Run(#64, 0xa, 0xa, 0, 0)
main main.go:37 main()
Add the InternalMerge command, which is replacing the anticipated (but never
implemented) AccumulateTS command. InternalMerge is similar to a put in that it
writes a single value to a key - however, the supplied value is merged into
the existing value instead of replacing it.
Merging is not transactional, and thus it is only exposed internally for usage
in certain operations which require a high write performance (such as the
accumulation of time series data).