testcluster: Introduce a HybridManualClock and wire it into TestCluster#59059
Conversation
e0131bd to
42a7b9d
Compare
cd270ba to
4b2b221
Compare
tbg
left a comment
There was a problem hiding this comment.
Reviewed 10 of 10 files at r1.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @lunevalex)
pkg/kv/kvserver/store.go, line 807 at r1 (raw file):
// use the TestCluster dont have an easy way to get at the config. // TODO(lunevalex): Remove the LeaseExpiration on StoreConfig, once multiTestContext is removed. func (s *Store) LeaseExpiration() int64 {
I would expose the StoreConfig instead.
pkg/server/testserver.go, line 1367 at r1 (raw file):
// an expiration based lease. // // Calling this multiple times is undefined (but see
This should work fine ,does it not? I understand you're just cargo culting but let's take a moment to find out why the old code is so weird.
I think you wanted to mentionTestCluster.ScratchRange in the comment.
pkg/testutils/testcluster/testcluster.go, line 861 at r1 (raw file):
func (tc *TestCluster) ScratchRange(t testing.TB) roachpb.Key { if tc.scratchRangeKey != nil { return tc.scratchRangeKey
I never quite understood what this caching was for. I guess ScratchRange() is not idempotent, let's find out why and revisit.
pkg/util/hlc/hlc.go, line 148 at r1 (raw file):
// Increment atomically increments the hybrid manual clock's timestamp. func (m *HybridManualClock) Increment(incr int64) {
Btw nothing prevents this clock from going backwards, which, maybe, is fine? The HLC is going to tick the logicals instead. The real question is which parts of the system don't work with a clock whose walltime does not move forward. Maybe instead of my current approach of calling this situation bad, it is in fact the code that we have in the system that uses hlc timestamps to measure durations that is bad, such as this:
cockroach/pkg/kv/kvserver/store_pool.go
Lines 751 to 767 in 789ea8c
If we wanted to fix this, we'd need to separate out two clock sources: one clock source that powers the hlc (and which may just never move and that's fine), and another one that we use to measure durations. In practice, that just means using timeutil.Now(), etc and never using the hlc timestamps for setting backoffs, i.e. this would just use timeutil.Now():
cockroach/pkg/kv/kvserver/store_pool.go
Lines 493 to 495 in 789ea8c
Of course this messes with how this time source can be mocked in tests, so in the long run it does make more sense to remove timeutil (while continuing to prohibit direct references to time.Now) and instead force use of a more locally scoped clock for this purpose.
Filed #59239
|
A note about durations when wall clock does not move: we can still compute a duration, just adding more digits of precision at the end, with the logical clock value (i.e. make the logical part pico/femtoseconds) |
lunevalex
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)
pkg/kv/kvserver/store.go, line 807 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I would expose the
StoreConfiginstead.
OK, was not sure if it was private for a reason. store,go has quite a bit of code that exposes things from the StoreConfig in this manner. I will expose the Config and the do a follow up to clean up the other places where we exposed fields from the config in this manner for testing.
pkg/server/testserver.go, line 1367 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
This should work fine ,does it not? I understand you're just cargo culting but let's take a moment to find out why the old code is so weird.
I think you wanted to mention
TestCluster.ScratchRangein the comment.
I think it's there because of the split, I tested it locally and calling split on the same key twice works just fine. replica_command has a guard against that If the range starts at the splitKey, we treat the AdminSplit as a no-op and return success instead of throwing an error.. I think given all of this I can clean up the language and get rid of the caching in TestCluster, i.e. the comment below.
pkg/testutils/testcluster/testcluster.go, line 861 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I never quite understood what this caching was for. I guess
ScratchRange()is not idempotent, let's find out why and revisit.
ScratchRange issues a split on the range, so the caching is the guard to protect us from calling the split a second time, which based on the comment above it not really needed.
Makes progress on cockroachdb#8299 This commit introduces a new type of ManualClock a HybridManualClock. and wires it into the TestCluster. This clock follows the physical wall time of a regular clock, but allows the developer to move it forward. This is needed to be able to test functionality around lease expiration and other time based mechanisms. To verify that the clock is usefull in tests, a single test in client_merge_tests.go is converted to use TestCluster with a HybridManualClock. To make the test possible we also need a simple way to create a range with an expiration based lease. This is done through the new TestCluster.ScratchRangeWithExpirationLease function. Release note: None
4b2b221 to
634286a
Compare
tbg
left a comment
There was a problem hiding this comment.
Reviewed 1 of 5 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @lunevalex, and @tbg)
pkg/testutils/testcluster/testcluster_test.go, line 351 at r2 (raw file):
require.NotNil(t, lease.Expiration) // Verify idempotence of ScratchRangeWithExpirationLease
.
|
TFTR! bors r+ |
|
Build succeeded: |
…s private A lot of tests required things from StoreConfig, which was previously private on the Store. Now that cockroachdb#59059 exposed the StoreConfig, we can clean up some of this and remove the extra getters added to extract things from the config for example NodeLiveness, Raft settings etc. Release note: None
59196: ui: show correct zone config r=dhartunian a=nkodali Previously on the table detail page in DB Console, the default zone config for the db was shown, rather than any table level settings. Additionally, constraints and lease preferences were not being serialized properly, displaying as "...Object object..." This fixes both of those bugs. The display for the zone configuration statement is additionally updated to show the actual SQL statement to replicate the zone config in the SQL shell. Previously an invalid statement "CONFIGURE ZONE USING..." was displayed. Resolves #57896. See also: https://github.com/cockroachlabs/support/issues/737. See also: https://github.com/cockroachlabs/support/issues/727. Release note (bug fix): Fixed a bug introduced in v20.1 in DB Console where incorrect zone configuration values were shown on the table details page and constraints and lease preferences were not displayed. Release note (ui change): Updates the table details page to show table specific zone configuration values when set, show constraints and lease preferences, and display a valid statement to re-configure zone configuration for that table. 59265: kvserver: cleanup store.go to remove functions exposed because cfg was private r=tbg a=lunevalex A lot of tests required things from StoreConfig, which was previously private on the Store. Now that #59059 exposed the StoreConfig, we can clean up some of this and remove the extra getters added to extract things from the config for example NodeLiveness, Raft settings etc. Release note: None 59280: kvserver: replace multiTestContext with TestCluster/TestServer in node_liveness_test r=tbg a=lunevalex Makes progress on #8299 multiTestContext is a legacy construct that is deprecated in favor of running tests via TestCluster. This is one PR out of many to remove the usage of multiTestContext in the node_liveness_test test cases. Release note: None Co-authored-by: Namrata Kodali <namrata@cockroachlabs.com> Co-authored-by: Alex Lunev <alexl@cockroachlabs.com>
Makes progress on #8299
This commit introduces a new type of ManualClock a HybridManualClock.
and wires it into the TestCluster. This clock follows the physical
wall time of a regular clock, but allows the developer to move it
forward. This is needed to be able to test functionality around
lease expiration and other time based mechanisms.
To verify that the clock is usefull in tests, a single test in
client_merge_tests.go is converted to use TestCluster with a
HybridManualClock. To make the test possible we also need a
simple way to create a range with an expiration based lease. This
is done through the new TestCluster.ScratchRangeWithExpirationLease
function.
Release note: None