jobs: prevent a deadlock during upgrades#100579
jobs: prevent a deadlock during upgrades#100579craig[bot] merged 3 commits intocockroachdb:masterfrom
Conversation
|
It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR? 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
|
The CI failure seems unrelated to this change (maybe_stressrace - see internal discussion). The jobs package passes a stress test on my machine. |
c51d193 to
add193d
Compare
yuzefovich
left a comment
There was a problem hiding this comment.
I believe at some point we already had a code like this (i.e. disabling DistSQL either with old upgrade manager (I think the upgrades have been changed recently, right?) or at about preStart of the SQL server), but I couldn't quickly find it right now. Do you recall something like this?
Reviewed 2 of 2 files at r1, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @adityamaru)
add193d to
383d50b
Compare
|
i might add a test to check the override mechanism on its own |
|
@yuzefovich could you have another look? and @rafiss I'd like to know if you support this general direction. |
yuzefovich
left a comment
There was a problem hiding this comment.
but I agree another pair of eyes would be good here.
Reviewed 3 of 3 files at r2, all commit messages.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @adityamaru, @knz, and @rafiss)
pkg/sql/internal.go line 1527 at r2 (raw file):
} // SessionDataOverride returns a function that can be used to override some
nit: perhaps s/returns a/is a/g.
rafiss
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @adityamaru and @knz)
pkg/sql/internal.go line 1529 at r2 (raw file):
// SessionDataOverride returns a function that can be used to override some // fields in the session data. type SessionDataOverride = func(sd *sessiondata.SessionData)
maybe we could add some commentary on how this differs from sessiondata.InternalExecutorOverride.
some details worth mentioning: this SessionDataOverride one is used when creating the isql.DB, the sessiondata.InternalExecutorOverride is used when invoking the internal executor and it overrides SessionDataOverride
383d50b to
0709809
Compare
0709809 to
9446118
Compare
knz
left a comment
There was a problem hiding this comment.
Also added unit testing for the new mechanism.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @adityamaru, @rafiss, and @yuzefovich)
pkg/sql/internal.go line 1527 at r2 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: perhaps
s/returns a/is a/g.
Done.
pkg/sql/internal.go line 1529 at r2 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
maybe we could add some commentary on how this differs from
sessiondata.InternalExecutorOverride.some details worth mentioning: this
SessionDataOverrideone is used when creating the isql.DB, thesessiondata.InternalExecutorOverrideis used when invoking the internal executor and it overridesSessionDataOverride
Good idea, done.
9446118 to
e5113fa
Compare
yuzefovich
left a comment
There was a problem hiding this comment.
Reviewed 9 of 9 files at r3, 4 of 4 files at r4, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @adityamaru, @knz, and @rafiss)
pkg/sql/internal_test.go line 636 at r4 (raw file):
row, err := txn.QueryRow(ctx, "test", txn.KV(), "show default_int_size") require.NoError(t, err) assert.Equal(t, "'0'", row[0].String())
nit: why use both require and assert? The former could handle the latter's cases too I think.
knz
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @adityamaru, @rafiss, and @yuzefovich)
pkg/sql/internal_test.go line 636 at r4 (raw file):
Previously, yuzefovich (Yahor Yuzefovich) wrote…
nit: why use both
requireandassert? The former could handle the latter's cases too I think.
require does log.Fatal under the hood; so if there are multiple errors the first one will mask errors that came afterwards.
With assert, if there are multiple failures I get to see them all side-by-side.
e5113fa to
6cf1f5f
Compare
|
bors r=yuzefovich |
|
This PR was included in a batch that timed out, it will be automatically retried |
Release note: None
Release note: None
Prior to this patch, if multiple SQL instances were started side-by-side but behind on migrations, they would deadlock on performing their migrations because distsql on each instance would be unable to reach other other instances. More generally, we're finding it undesirable for the jobs subsystem to operate on system.jobs / job_info using distributed queries. This patch fixes it by disabling query distribution during job operations. Release note: None
6cf1f5f to
9d4a2a1
Compare
|
Canceled. |
|
bors r=yuzefovich |
|
Build succeeded: |
|
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from b776994 to blathers/backport-release-23.1-100579: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
Needed for #100436.
Informs #100578
Epic: None
Prior to this patch, if multiple SQL instances were started
side-by-side but behind on migrations, they would deadlock on
performing their migrations because distsql on each instance would be
unable to reach other other instances.
More generally, we're finding it undesirable for the jobs subsystem to
operate on system.jobs / job_info using distributed queries.
This patch fixes it by disabling query distribution during job
operations.
The testing here is implicit when the test TestServerControllerMultiNodeTenantStartup is stressed - this used to deadlock under stress without this patch.