Skip to content

sql: lease acquisition of OFFLINE descs may starve bulk operations#62611

Merged
craig[bot] merged 1 commit intocockroachdb:masterfrom
fqazi:leaseStarvation
Mar 30, 2021
Merged

sql: lease acquisition of OFFLINE descs may starve bulk operations#62611
craig[bot] merged 1 commit intocockroachdb:masterfrom
fqazi:leaseStarvation

Conversation

@fqazi
Copy link
Copy Markdown
Collaborator

@fqazi fqazi commented Mar 25, 2021

Fixes: #61798

Previously, offline descriptors would never have their leases
cached and they would be released once the reference count hit zero.
This was inadequate because when attempting to online these tables
again the lease acquisition could be pushed back by other operations,
leading to starvation / live locks. To address this, this patch will
allow the leases of offline descriptors to be cached.

Release note (bug fix): Lease acquisitions of descriptor in a
offline state may starve out bulk operations (backup / restore)

@fqazi fqazi requested a review from a team March 25, 2021 17:56
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@fqazi fqazi force-pushed the leaseStarvation branch 2 times, most recently from c6992cb to 8db656d Compare March 25, 2021 21:55
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @fqazi)


pkg/sql/catalog/lease/lease_test.go, line 2726 at r1 (raw file):

}

func TestOfflineLeaseRefresh(t *testing.T) {

a comment sketching the test and what it demonstrates would be nice.


pkg/sql/catalog/lease/lease_test.go, line 2750 at r1 (raw file):

	params := base.TestServerArgs{Knobs: base.TestingKnobs{Store: knobs}}

	//ReplicaRequestFilter

detritus


pkg/sql/catalog/lease/lease_test.go, line 2812 at r1 (raw file):

				// Select from an unrelated table
				_, err = s.InternalExecutor().(sqlutil.InternalExecutor).ExecEx(ctx, "inline-exec", txn, sessiondata.InternalExecutorOverride{User: security.RootUserName()}, "insert into d1.t2 values (10);")

nit: wrap and what not


pkg/sql/catalog/lease/lease_test.go, line 2823 at r1 (raw file):

	for notify := range waitForTxn {
		close(notify)
		mu.RLock()

I think the locking here would bite you if the transaction above restarted. I feel like it can be simplified.


pkg/sql/catalog/lease/lease_test.go, line 2828 at r1 (raw file):

			// leasing out the table again
			_, err = conn.Query("select  * from d1.t1")
			if err == nil {

this conditional seems off.

@fqazi fqazi force-pushed the leaseStarvation branch 2 times, most recently from 6112eb1 to dd97e90 Compare March 26, 2021 15:15
@fqazi fqazi requested a review from ajwerner March 26, 2021 15:15
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably should not make 21.1.0 but rather 21.1.1 as it is not new.

It seems like the test times out under stress. Run it on your gceworker with a shorter timeout to shake out these flakes.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @fqazi)

@fqazi fqazi force-pushed the leaseStarvation branch from dd97e90 to 7d5543f Compare March 29, 2021 19:54
@fqazi fqazi requested a review from ajwerner March 29, 2021 19:54
@fqazi fqazi force-pushed the leaseStarvation branch from 7d5543f to 21bba87 Compare March 29, 2021 22:10
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 3 of 3 files at r2.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @fqazi)


pkg/sql/catalog/lease/lease.go, line 1432 at r2 (raw file):

				}
			}
			return descVersion.Descriptor, descVersion.expiration, nil

nit: roll thing into a helper to attach the commentary. Also note that this is to match behavior which existed before we leased offline descriptors.

@fqazi fqazi force-pushed the leaseStarvation branch 3 times, most recently from c4e55b6 to 21c01ec Compare March 30, 2021 17:04
Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs a bazel-generate

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @fqazi)

@fqazi fqazi force-pushed the leaseStarvation branch from 21c01ec to eaf73a6 Compare March 30, 2021 18:19
@fqazi
Copy link
Copy Markdown
Collaborator Author

fqazi commented Mar 30, 2021

needs a bazel-generate

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajwerner and @fqazi)

Thanks @ajwerner , didn't realize what was needed

Fixes: cockroachdb#61798

Previously, offline descriptors would never have their leases
cached and they would be released once the reference count hit zero.
This was inadequate because when attempting to online these tables
again the lease acquisition could be pushed back by other operations,
leading to starvation / live locks. To address this, this patch will
allow the leases of offline descriptors to be cached.

Release note (bug fix): Lease acquisitions of descriptor in a
offline state may starve out bulk operations (backup / restore)
@fqazi fqazi force-pushed the leaseStarvation branch from eaf73a6 to 198fa1a Compare March 30, 2021 19:46
@fqazi
Copy link
Copy Markdown
Collaborator Author

fqazi commented Mar 30, 2021

bors r=ajwerner

@craig
Copy link
Copy Markdown
Contributor

craig bot commented Mar 30, 2021

Build succeeded:

@craig craig bot merged commit fed2225 into cockroachdb:master Mar 30, 2021
@knz
Copy link
Copy Markdown
Contributor

knz commented Apr 13, 2021

@fqazi is this a candidate for a backport?

@knz
Copy link
Copy Markdown
Contributor

knz commented Apr 13, 2021

oh nvm please disregard my last comment, I see the changes have been reverted

@rafiss rafiss added this to the 21.1 milestone Apr 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sql/catalog/lease: high priority reads in lease acquisition of OFFLINE descs may starve bulk operations

5 participants