Skip to content

testcluster: Introduce a HybridManualClock and wire it into TestCluster#59059

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
lunevalex:alex/mtc-hybrid-manual-clock
Jan 21, 2021
Merged

testcluster: Introduce a HybridManualClock and wire it into TestCluster#59059
craig[bot] merged 1 commit intocockroachdb:masterfrom
lunevalex:alex/mtc-hybrid-manual-clock

Conversation

@lunevalex
Copy link
Copy Markdown

Makes progress on #8299

This commit introduces a new type of ManualClock a HybridManualClock.
and wires it into the TestCluster. This clock follows the physical
wall time of a regular clock, but allows the developer to move it
forward. This is needed to be able to test functionality around
lease expiration and other time based mechanisms.

To verify that the clock is usefull in tests, a single test in
client_merge_tests.go is converted to use TestCluster with a
HybridManualClock. To make the test possible we also need a
simple way to create a range with an expiration based lease. This
is done through the new TestCluster.ScratchRangeWithExpirationLease
function.

Release note: None

@lunevalex lunevalex requested a review from a team as a code owner January 15, 2021 18:13
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@lunevalex lunevalex force-pushed the alex/mtc-hybrid-manual-clock branch from e0131bd to 42a7b9d Compare January 15, 2021 18:41
@lunevalex lunevalex requested review from andreimatei and tbg January 15, 2021 18:42
@lunevalex lunevalex force-pushed the alex/mtc-hybrid-manual-clock branch 4 times, most recently from cd270ba to 4b2b221 Compare January 20, 2021 06:05
Copy link
Copy Markdown
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 10 of 10 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @lunevalex)


pkg/kv/kvserver/store.go, line 807 at r1 (raw file):

// use the TestCluster dont have an easy way to get at the config.
// TODO(lunevalex): Remove the LeaseExpiration on StoreConfig, once multiTestContext is removed.
func (s *Store) LeaseExpiration() int64 {

I would expose the StoreConfig instead.


pkg/server/testserver.go, line 1367 at r1 (raw file):

// an expiration based lease.
//
// Calling this multiple times is undefined (but see

This should work fine ,does it not? I understand you're just cargo culting but let's take a moment to find out why the old code is so weird.

I think you wanted to mentionTestCluster.ScratchRange in the comment.


pkg/testutils/testcluster/testcluster.go, line 861 at r1 (raw file):

func (tc *TestCluster) ScratchRange(t testing.TB) roachpb.Key {
	if tc.scratchRangeKey != nil {
		return tc.scratchRangeKey

I never quite understood what this caching was for. I guess ScratchRange() is not idempotent, let's find out why and revisit.


pkg/util/hlc/hlc.go, line 148 at r1 (raw file):

// Increment atomically increments the hybrid manual clock's timestamp.
func (m *HybridManualClock) Increment(incr int64) {

Btw nothing prevents this clock from going backwards, which, maybe, is fine? The HLC is going to tick the logicals instead. The real question is which parts of the system don't work with a clock whose walltime does not move forward. Maybe instead of my current approach of calling this situation bad, it is in fact the code that we have in the system that uses hlc timestamps to measure durations that is bad, such as this:

case throttleDeclined:
timeout := DeclinedReservationsTimeout.Get(&sp.st.SV)
detail.throttledUntil = sp.clock.PhysicalTime().Add(timeout)
if log.V(2) {
ctx := sp.AnnotateCtx(context.TODO())
log.Infof(ctx, "snapshot declined (%s), s%d will be throttled for %s until %s",
why, storeID, timeout, detail.throttledUntil)
}
case throttleFailed:
timeout := FailedReservationsTimeout.Get(&sp.st.SV)
detail.throttledUntil = sp.clock.PhysicalTime().Add(timeout)
if log.V(2) {
ctx := sp.AnnotateCtx(context.TODO())
log.Infof(ctx, "snapshot failed (%s), s%d will be throttled for %s until %s",
why, storeID, timeout, detail.throttledUntil)
}
}

If we wanted to fix this, we'd need to separate out two clock sources: one clock source that powers the hlc (and which may just never move and that's fine), and another one that we use to measure durations. In practice, that just means using timeutil.Now(), etc and never using the hlc timestamps for setting backoffs, i.e. this would just use timeutil.Now():

// NB: We use clock.Now().GoTime() instead of clock.PhysicalTime() is order to
// take clock signals from remote nodes into consideration.
now := sp.clock.Now().GoTime()

Of course this messes with how this time source can be mocked in tests, so in the long run it does make more sense to remove timeutil (while continuing to prohibit direct references to time.Now) and instead force use of a more locally scoped clock for this purpose.

Filed #59239

@knz
Copy link
Copy Markdown
Contributor

knz commented Jan 21, 2021

A note about durations when wall clock does not move: we can still compute a duration, just adding more digits of precision at the end, with the logical clock value (i.e. make the logical part pico/femtoseconds)

Copy link
Copy Markdown
Author

@lunevalex lunevalex left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @tbg)


pkg/kv/kvserver/store.go, line 807 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I would expose the StoreConfig instead.

OK, was not sure if it was private for a reason. store,go has quite a bit of code that exposes things from the StoreConfig in this manner. I will expose the Config and the do a follow up to clean up the other places where we exposed fields from the config in this manner for testing.


pkg/server/testserver.go, line 1367 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

This should work fine ,does it not? I understand you're just cargo culting but let's take a moment to find out why the old code is so weird.

I think you wanted to mentionTestCluster.ScratchRange in the comment.

I think it's there because of the split, I tested it locally and calling split on the same key twice works just fine. replica_command has a guard against that If the range starts at the splitKey, we treat the AdminSplit as a no-op and return success instead of throwing an error.. I think given all of this I can clean up the language and get rid of the caching in TestCluster, i.e. the comment below.


pkg/testutils/testcluster/testcluster.go, line 861 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I never quite understood what this caching was for. I guess ScratchRange() is not idempotent, let's find out why and revisit.

ScratchRange issues a split on the range, so the caching is the guard to protect us from calling the split a second time, which based on the comment above it not really needed.

Makes progress on cockroachdb#8299

This commit introduces a new type of ManualClock a HybridManualClock.
and wires it into the TestCluster.  This clock follows the physical
wall time of a regular clock, but allows the developer to move it
forward. This is needed to be able to test functionality around
lease expiration and other time based mechanisms.

To verify that the clock is usefull in tests, a single test in
client_merge_tests.go is converted to use TestCluster with a
HybridManualClock. To make the test possible we also need a
simple way to create a range with an expiration based lease. This
is done through the new TestCluster.ScratchRangeWithExpirationLease
function.

Release note: None
@lunevalex lunevalex force-pushed the alex/mtc-hybrid-manual-clock branch from 4b2b221 to 634286a Compare January 21, 2021 16:02
Copy link
Copy Markdown
Member

@tbg tbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 1 of 5 files at r2.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @lunevalex, and @tbg)


pkg/testutils/testcluster/testcluster_test.go, line 351 at r2 (raw file):

	require.NotNil(t, lease.Expiration)

	// Verify idempotence of ScratchRangeWithExpirationLease

.

@tbg tbg self-requested a review January 21, 2021 16:42
@lunevalex
Copy link
Copy Markdown
Author

TFTR!

bors r+

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Jan 21, 2021

Build succeeded:

@craig craig bot merged commit 3b6a17d into cockroachdb:master Jan 21, 2021
@lunevalex lunevalex deleted the alex/mtc-hybrid-manual-clock branch January 21, 2021 19:40
lunevalex pushed a commit to lunevalex/cockroach that referenced this pull request Jan 21, 2021
…s private

A lot of tests required things from StoreConfig, which was previously
private on the Store. Now that cockroachdb#59059 exposed the StoreConfig,
we can clean up some of this and remove the extra getters added
to extract things from the config for example NodeLiveness, Raft settings etc.

Release note: None
craig bot pushed a commit that referenced this pull request Jan 28, 2021
59196: ui: show correct zone config r=dhartunian a=nkodali

Previously on the table detail page in DB Console,
the default zone config for the db was shown, rather
than any table level settings. Additionally,
constraints and lease preferences were not being
serialized properly, displaying as
"...Object object..." This fixes both of those bugs.

The display for the zone configuration statement is
additionally updated to show the actual SQL statement
to replicate the zone config in the SQL shell.
Previously an invalid statement "CONFIGURE ZONE USING..."
was displayed.

Resolves #57896.
See also: https://github.com/cockroachlabs/support/issues/737.
See also: https://github.com/cockroachlabs/support/issues/727.

Release note (bug fix): Fixed a bug introduced in v20.1 in
DB Console where incorrect zone configuration values were
shown on the table details page and constraints and lease
preferences were not displayed.

Release note (ui change): Updates the table details page
to show table specific zone configuration values when set,
show constraints and lease preferences, and display
a valid statement to re-configure zone configuration
for that table.

59265: kvserver: cleanup store.go to remove functions exposed because cfg was private r=tbg a=lunevalex

A lot of tests required things from StoreConfig, which was previously
private on the Store. Now that #59059 exposed the StoreConfig,
we can clean up some of this and remove the extra getters added
to extract things from the config for example NodeLiveness, Raft settings etc.

Release note: None

59280: kvserver: replace multiTestContext with TestCluster/TestServer in node_liveness_test r=tbg a=lunevalex

Makes progress on #8299

multiTestContext is a legacy construct that is deprecated in favor of running
tests via TestCluster. This is one PR out of many to remove the usage of
multiTestContext in the node_liveness_test test cases.

Release note: None

Co-authored-by: Namrata Kodali <namrata@cockroachlabs.com>
Co-authored-by: Alex Lunev <alexl@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants