jobs: introduce system.job_info table and helpers#84866
jobs: introduce system.job_info table and helpers#84866craig[bot] merged 3 commits intocockroachdb:masterfrom
Conversation
postamar
left a comment
There was a problem hiding this comment.
I feel like there must be some issue or RFC or other documentation providing context as to what purpose this table will serve, could you provide a link please? I'm not sure where to look, I've looked in the RFCs folder and the Jobs board to no avail.
Reviewed 10 of 10 files at r1, 1 of 1 files at r2, 2 of 2 files at r3, 41 of 41 files at r4, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @dt, @samiskin, and @stevendanna)
pkg/jobs/registry.go line 975 at r2 (raw file):
} if nDeletedInfos, err = r.ex.Exec( ctx, "gc-jobs", nil /* txn */, infoStmt, toDelete,
Shouldn't this be in the same transaction as above? We don't have an explicit foreign key but I'm slightly worried about logical referential integrity, if that makes sense.
pkg/jobs/registry.go line 977 at r2 (raw file):
ctx, "gc-jobs", nil /* txn */, infoStmt, toDelete, ); err != nil { return false, 0, errors.Wrap(err, "deleting old jobs")
nit: is there any value in giving this query and error message wrapper distinct string literals, versus the system.jobs query?
pkg/jobs/registry.go line 682 at r3 (raw file):
// IterateJobInfo iterates though the info records for a given job and info key // prefix. // TODO(dt): use pagination in query.
Shouldn't this be tracked in an issue which is marked as a GA-blocker? It seems important enough. Please ignore if this is the case already.
pkg/jobs/registry.go line 744 at r3 (raw file):
ctx, "job-info-write", txn, sessiondata.InternalExecutorOverride{User: username.NodeUserName()}, "DELETE FROM system.job_info WHERE job_id = $1 AND info_key = $2",
Why is written in the primary key at all, if there's effectively a 1:1 relationship between it and the (job_id, info_key) pair? This deserves commentary, otherwise one might wonder: why not simply rely on the crdb_internal_mvcc_timestamp column instead?
pkg/jobs/registry.go line 755 at r3 (raw file):
ctx, "job-info-write", txn, sessiondata.InternalExecutorOverride{User: username.NodeUserName()}, `INSERT INTO system.job_info (job_id, info_key, written, value) VALUES ($1, $2, now(), $3)`,
nit: is the inclusion of written deliberate here? I feel it's fine to remove and rely on its default value, it's a common-enough pattern.
pkg/sql/catalog/systemschema/system.go line 682 at r1 (raw file):
CREATE TABLE system.job_info ( job_id INT8 NOT NULL, info_key BYTES NOT NULL,
I'm missing context here on what this is. Can you please explain, either in the code or in an issue? Am I correct in guessing that this table is a deliberately unstructured key-value store for each job?
pkg/upgrade/upgrades/system_job_info_test.go line 28 at r1 (raw file):
) func TestSystemJobInfoMigration(t *testing.T) {
nit: FYI we call them "upgrades" now, "migration" is so 2021.
pkg/upgrade/upgrades/system_job_info_test.go line 41 at r1 (raw file):
tc := testcluster.StartTestCluster(t, 1, base.TestClusterArgs{ ServerArgs: base.TestServerArgs{ Settings: settings,
Is this necessary at all? I'm not seeing this pattern in other upgrade tests.
pkg/upgrade/upgrades/system_job_info_test.go line 45 at r1 (raw file):
Server: &server.TestingKnobs{ DisableAutomaticVersionUpgrade: make(chan struct{}), BinaryVersionOverride: clusterversion.ByKey(clusterversion.SystemPrivilegesTable - 1),
Shouldn't the key here be CreateSystemJobInfoTable instead of SystemPrivilegesTable? See also elsewhere in this test definition. I'm guessing that's why the test is failing presently.
pkg/jobs/registry_test.go
Outdated
| ctx := context.Background() | ||
| defer s.Stopper().Stop(ctx) | ||
|
|
||
| //db := sqlutils.MakeSQLRunner(kv) |
| info_key BYTES NOT NULL, | ||
| written TIMESTAMPTZ NOT NULL DEFAULT now(), | ||
| value BYTES, | ||
| CONSTRAINT "primary" PRIMARY KEY (job_id, info_key, written DESC), |
There was a problem hiding this comment.
why is written part of the primary key, given that the accessor helpers (in commit 3) look up by (job_id, info_key)?
I guess I'll need to read the implementation to see how we handle the case when there are >1 rows with the same (job_id, info_key) but different 'written' time.
|
Thanks for this @ajwerner. This helps clear up a lot of the questions I had regarding the |
|
Frankly I'd prefer we didn't pass naked |
dt
left a comment
There was a problem hiding this comment.
I very much don't want a central library handing out ImportInfo / RestoreInfo / SchemaChangeInfo / etc; we shouldn't need to edit the core jobs code dependency root every time a leaf job changes a field in its persisted data (we do today and it's not good). I want a "I want to store this thing and I'll ask for it later" API where the library is agnostic to what we're storing.
That said, I could see switching from []byte to proto.Message or a message job.InfoRecord { types.Any payload = 1} wrapper proto or something (though I've started to worry about types.Any and refactoring changing persisted type names more lately).
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @postamar, @samiskin, and @stevendanna)
pkg/jobs/registry.go line 975 at r2 (raw file):
Previously, postamar (Marius Posta) wrote…
Shouldn't this be in the same transaction as above? We don't have an explicit foreign key but I'm slightly worried about logical referential integrity, if that makes sense.
Great question!
I wanted an FK but ultimately we can't have one, since we want to allow job creators to stage info rows before creating a job row, and indeed even to do so in separate transactions, before the txn that creates the job row. Allowing this -- that info rows can be created even without a job row -- is pretty important since we want to do as little as possible in that job-creating transaction since that causes job. So we've accepted that there may be at least temporary inconsistency between these, and indeed more than that -- a job creator could crash after some of its staging, with some info rows and then fail to create a job and not clean them up. So we might need a recon cleanup loop of last resort anyway (though not urgently -- a couple orphaned info rows are pretty harmless). Given that, I'm inclined not to pile them into the same txn here., as long as we delete from the jobs first, then the infos (which we do), just to minimize the risk of something in the info deletion blocking job deletion.
pkg/jobs/registry.go line 977 at r2 (raw file):
Previously, postamar (Marius Posta) wrote…
nit: is there any value in giving this query and error message wrapper distinct string literals, versus the
system.jobsquery?
I think our errors library usually gets us the stack anyway so I don't think so?
pkg/jobs/registry.go line 682 at r3 (raw file):
Previously, postamar (Marius Posta) wrote…
Shouldn't this be tracked in an issue which is marked as a GA-blocker? It seems important enough. Please ignore if this is the case already.
Only an issue if/once it has a production caller who is iterating a non-trivial number of rows, so I'd say it isn't a blocker until then -- i.e. if initial callers just persist 2 info rows for each job (payload and progress) then this can wait.
pkg/jobs/registry.go line 744 at r3 (raw file):
Previously, postamar (Marius Posta) wrote…
Why is
writtenin the primary key at all, if there's effectively a 1:1 relationship between it and the(job_id, info_key)pair? This deserves commentary, otherwise one might wonder: why not simply rely on thecrdb_internal_mvcc_timestampcolumn instead?
As you found/mentioned above, this is mentioned in the design doc but we're doing our own mvcc since we know these job info (logical) rows get lots of "in-place" updates in long-running jobs so we're splitting that into separate crdb table (physical) rows, since KV and SQL don't allow splitting between kv mvcc revisions of a single sql row, but we want to be able to split between revisions of a job row.
pkg/jobs/registry.go line 755 at r3 (raw file):
Previously, postamar (Marius Posta) wrote…
nit: is the inclusion of
writtendeliberate here? I feel it's fine to remove and rely on its default value, it's a common-enough pattern.
Personal preference: I like to see what I'm writing and not need to pull up the schema to find the default defs.
92d0130 to
6e4e97a
Compare
dt
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @baoalvin1, @postamar, @samiskin, @stevendanna, and @Xiang-Gu)
pkg/sql/catalog/systemschema/system.go line 685 at r1 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
why is
writtenpart of the primary key, given that the accessor helpers (in commit 3) look up by (job_id, info_key)?I guess I'll need to read the implementation to see how we handle the case when there are >1 rows with the same (job_id, info_key) but different 'written' time.
I added a comment inline here summarizing the reasoning explored in the RFC.
d278530 to
453309b
Compare
dt
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @baoalvin1, @dt, @postamar, @samiskin, @stevendanna, and @Xiang-Gu)
pkg/sql/catalog/systemschema/system.go line 682 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
I'm missing context here on what this is. Can you please explain, either in the code or in an issue? Am I correct in guessing that this table is a deliberately unstructured key-value store for each job?
The RFC linked in the PR now is pretty explicit about the fact being arbitrary keys is deliberate.
pkg/jobs/registry_test.go line 1155 at r3 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
remove this line?
Done.
pkg/jobs/registry_test.go line 1172 at r3 (raw file):
Previously, Xiang-Gu (Xiang Gu) wrote…
nit: "is now found with "
Done.
pkg/upgrade/upgrades/system_job_info_test.go line 41 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
Is this necessary at all? I'm not seeing this pattern in other upgrade tests.
I think all the upgrade tests do this, so they can initialize the version to before they run.
pkg/upgrade/upgrades/system_job_info_test.go line 45 at r1 (raw file):
Previously, postamar (Marius Posta) wrote…
Shouldn't the key here be
CreateSystemJobInfoTableinstead ofSystemPrivilegesTable? See also elsewhere in this test definition. I'm guessing that's why the test is failing presently.
The start should be before the migration so that the test can assert that running the migration actually advances from before to after states as expected; at the time I wrote this SystemPrivilegesTable was the most recent version so that's what it used as its before, but now I makes more sense to be V22_2.
postamar
left a comment
There was a problem hiding this comment.
I understand that you're still fighting CI test failures, but this seems close enough.
I very much don't want a central library handing out ImportInfo ... [...] That said, I could see switching from []byte to proto.Message
Fair enough. Also, the []byte -> proto.Message doesn't need to happen now, if ever.
f01f634 to
1fd3669
Compare
Think that's the last of them, for this rebase at least! |
912bfa9 to
7849028
Compare
|
Bump; this is ready for another (last? maybe?) round of reviews |
pkg/upgrade/upgrades/precondition_before_starting_an_upgrade_external_test.go
Show resolved
Hide resolved
| // GetJobInfo fetches the latest info record for the given job and infoKey. | ||
| func (r *Registry) GetJobInfo( |
There was a problem hiding this comment.
One thing that's sort of nice about the job writing APIs are that they observe the session claim if there is one (which there is on the job object during execution) and will fail if the job has been preempted. We'll lose this in this PR. I also am rapidly falling out of love with these methods on the registry which take a txn. Can I nudge you towards making a new structure which holds onto a reference to the job and the txn and then has methods to read and write these keys?
maybe like:
func (j *Job) InfoStorage(txn *kv.Txn) InfoStorage
type InfoStorage struct {
j *Job
txn *kv.Txn
}
func (is InfoStore) Get(key []byte) (value []byte, exists bool, _ error) There was a problem hiding this comment.
We discussed this offline a bit, and I while I'm not opposed to adding wrapper for interacting with Infos that closes over what txn to use and/or adds lease-held assertions, I feel like that could easily be done in a follow-up change, and I'd rather land this one with just the bare-bones system table and raw accessors, which could then be the basis for the proposed wrapper in a follow-up?
msbutler
left a comment
There was a problem hiding this comment.
Hope no one slips another system table in before you merge :D
ajwerner
left a comment
There was a problem hiding this comment.
I'm cool with the wait-and-see approach
stevendanna
left a comment
There was a problem hiding this comment.
Thanks for working on this.
It will be nice to start playing with this.
I think we will want to think about the semantics of updating a job_info record with or without the lease, but I'm happy to leave that to a follow up if it ends up being needed, as you already indicated.
| fn func(infoKey []byte, value []byte) error, | ||
| txn *kv.Txn, | ||
| ) (retErr error) { | ||
| // TODO(dt): verify this predicate hits the index. |
There was a problem hiding this comment.
I'm not seeing a secondary index for it to hit. For the primary index, it looks like the job_id predicate does allow for a limited scan on the job_id but it doesn't do anything with the info_key (which I think is probably fine for now):
> EXPLAIN SELECT info_key, value FROM job_info WHERE job_id = 98 AND substring(info_key for 3) = 'foo' ORDER BY info_key ASC, written DESC;
info
------------------------------------------------------------------------------------
distribution: local
vectorized: true
• filter
│ estimated row count: 1
│ filter: substring(info_key, 1, 3) = '\x666f6f'
│
└── • scan
estimated row count: 3 (0.10% of the table; stats collected 5 seconds ago)
table: job_info@primary
spans: [/98 - /98]
There was a problem hiding this comment.
the primary (only) index is the one I want it to hit, but I wanted that span to be tighter -- in theory the optimizer should be able to know to constrain it to /98/foo, not just /98
Release note: none. Epic: none.
Release note: none.
Release note: none.
|
TFTRs! bors r+ |
|
Build succeeded: |
See commits.
Design: #82638
Epic: None.