Skip to content

jobs: introduce system.job_info table and helpers#84866

Merged
craig[bot] merged 3 commits intocockroachdb:masterfrom
dt:job-info
Nov 29, 2022
Merged

jobs: introduce system.job_info table and helpers#84866
craig[bot] merged 3 commits intocockroachdb:masterfrom
dt:job-info

Conversation

@dt
Copy link
Copy Markdown
Contributor

@dt dt commented Jul 21, 2022

See commits.

Design: #82638

Epic: None.

@dt dt requested review from a team and adityamaru July 21, 2022 18:51
@dt dt requested review from a team as code owners July 21, 2022 18:51
@dt dt requested a review from a team July 21, 2022 18:51
@dt dt requested a review from a team as a code owner July 21, 2022 18:51
@dt dt requested a review from a team July 21, 2022 18:51
@dt dt requested a review from a team as a code owner July 21, 2022 18:51
@cockroach-teamcity
Copy link
Copy Markdown
Member

This change is Reviewable

@shermanCRL shermanCRL requested review from a team, samiskin and stevendanna and removed request for a team July 21, 2022 18:58
Copy link
Copy Markdown

@postamar postamar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there must be some issue or RFC or other documentation providing context as to what purpose this table will serve, could you provide a link please? I'm not sure where to look, I've looked in the RFCs folder and the Jobs board to no avail.

Reviewed 10 of 10 files at r1, 1 of 1 files at r2, 2 of 2 files at r3, 41 of 41 files at r4, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @dt, @samiskin, and @stevendanna)


pkg/jobs/registry.go line 975 at r2 (raw file):

		}
		if nDeletedInfos, err = r.ex.Exec(
			ctx, "gc-jobs", nil /* txn */, infoStmt, toDelete,

Shouldn't this be in the same transaction as above? We don't have an explicit foreign key but I'm slightly worried about logical referential integrity, if that makes sense.


pkg/jobs/registry.go line 977 at r2 (raw file):

			ctx, "gc-jobs", nil /* txn */, infoStmt, toDelete,
		); err != nil {
			return false, 0, errors.Wrap(err, "deleting old jobs")

nit: is there any value in giving this query and error message wrapper distinct string literals, versus the system.jobs query?


pkg/jobs/registry.go line 682 at r3 (raw file):

// IterateJobInfo iterates though the info records for a given job and info key
// prefix.
// TODO(dt): use pagination in query.

Shouldn't this be tracked in an issue which is marked as a GA-blocker? It seems important enough. Please ignore if this is the case already.


pkg/jobs/registry.go line 744 at r3 (raw file):

		ctx, "job-info-write", txn,
		sessiondata.InternalExecutorOverride{User: username.NodeUserName()},
		"DELETE FROM system.job_info WHERE job_id = $1 AND info_key = $2",

Why is written in the primary key at all, if there's effectively a 1:1 relationship between it and the (job_id, info_key) pair? This deserves commentary, otherwise one might wonder: why not simply rely on the crdb_internal_mvcc_timestamp column instead?


pkg/jobs/registry.go line 755 at r3 (raw file):

		ctx, "job-info-write", txn,
		sessiondata.InternalExecutorOverride{User: username.NodeUserName()},
		`INSERT INTO system.job_info (job_id, info_key, written, value) VALUES ($1, $2, now(), $3)`,

nit: is the inclusion of written deliberate here? I feel it's fine to remove and rely on its default value, it's a common-enough pattern.


pkg/sql/catalog/systemschema/system.go line 682 at r1 (raw file):

CREATE TABLE system.job_info (
	job_id INT8 NOT NULL,
	info_key BYTES NOT NULL,

I'm missing context here on what this is. Can you please explain, either in the code or in an issue? Am I correct in guessing that this table is a deliberately unstructured key-value store for each job?


pkg/upgrade/upgrades/system_job_info_test.go line 28 at r1 (raw file):

)

func TestSystemJobInfoMigration(t *testing.T) {

nit: FYI we call them "upgrades" now, "migration" is so 2021.


pkg/upgrade/upgrades/system_job_info_test.go line 41 at r1 (raw file):

	tc := testcluster.StartTestCluster(t, 1, base.TestClusterArgs{
		ServerArgs: base.TestServerArgs{
			Settings: settings,

Is this necessary at all? I'm not seeing this pattern in other upgrade tests.


pkg/upgrade/upgrades/system_job_info_test.go line 45 at r1 (raw file):

				Server: &server.TestingKnobs{
					DisableAutomaticVersionUpgrade: make(chan struct{}),
					BinaryVersionOverride:          clusterversion.ByKey(clusterversion.SystemPrivilegesTable - 1),

Shouldn't the key here be CreateSystemJobInfoTable instead of SystemPrivilegesTable? See also elsewhere in this test definition. I'm guessing that's why the test is failing presently.

ctx := context.Background()
defer s.Stopper().Stop(ctx)

//db := sqlutils.MakeSQLRunner(kv)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this line?

info_key BYTES NOT NULL,
written TIMESTAMPTZ NOT NULL DEFAULT now(),
value BYTES,
CONSTRAINT "primary" PRIMARY KEY (job_id, info_key, written DESC),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is written part of the primary key, given that the accessor helpers (in commit 3) look up by (job_id, info_key)?

I guess I'll need to read the implementation to see how we handle the case when there are >1 rows with the same (job_id, info_key) but different 'written' time.

@ajwerner
Copy link
Copy Markdown
Contributor

@postamar #82638

@postamar
Copy link
Copy Markdown

Thanks for this @ajwerner. This helps clear up a lot of the questions I had regarding the written and info_key columns, please interpret my review comments accordlngly.

@postamar
Copy link
Copy Markdown

Frankly I'd prefer we didn't pass naked []bytes to and from the accessors like GetJobInfo and instead had strongly-typed accessors for each job (sub) type, ideally relying on some specific protobuf message definitions. These would, for the time being, be placeholder empty messages. You probably want protobufs anyway, to guarantee encoding stability across cluster versions and because it's easy to design these messages in a way that allows straightforward definition of prefixes.

Copy link
Copy Markdown
Contributor Author

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I very much don't want a central library handing out ImportInfo / RestoreInfo / SchemaChangeInfo / etc; we shouldn't need to edit the core jobs code dependency root every time a leaf job changes a field in its persisted data (we do today and it's not good). I want a "I want to store this thing and I'll ask for it later" API where the library is agnostic to what we're storing.

That said, I could see switching from []byte to proto.Message or a message job.InfoRecord { types.Any payload = 1} wrapper proto or something (though I've started to worry about types.Any and refactoring changing persisted type names more lately).

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @postamar, @samiskin, and @stevendanna)


pkg/jobs/registry.go line 975 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

Shouldn't this be in the same transaction as above? We don't have an explicit foreign key but I'm slightly worried about logical referential integrity, if that makes sense.

Great question!

I wanted an FK but ultimately we can't have one, since we want to allow job creators to stage info rows before creating a job row, and indeed even to do so in separate transactions, before the txn that creates the job row. Allowing this -- that info rows can be created even without a job row -- is pretty important since we want to do as little as possible in that job-creating transaction since that causes job. So we've accepted that there may be at least temporary inconsistency between these, and indeed more than that -- a job creator could crash after some of its staging, with some info rows and then fail to create a job and not clean them up. So we might need a recon cleanup loop of last resort anyway (though not urgently -- a couple orphaned info rows are pretty harmless). Given that, I'm inclined not to pile them into the same txn here., as long as we delete from the jobs first, then the infos (which we do), just to minimize the risk of something in the info deletion blocking job deletion.


pkg/jobs/registry.go line 977 at r2 (raw file):

Previously, postamar (Marius Posta) wrote…

nit: is there any value in giving this query and error message wrapper distinct string literals, versus the system.jobs query?

I think our errors library usually gets us the stack anyway so I don't think so?


pkg/jobs/registry.go line 682 at r3 (raw file):

Previously, postamar (Marius Posta) wrote…

Shouldn't this be tracked in an issue which is marked as a GA-blocker? It seems important enough. Please ignore if this is the case already.

Only an issue if/once it has a production caller who is iterating a non-trivial number of rows, so I'd say it isn't a blocker until then -- i.e. if initial callers just persist 2 info rows for each job (payload and progress) then this can wait.


pkg/jobs/registry.go line 744 at r3 (raw file):

Previously, postamar (Marius Posta) wrote…

Why is written in the primary key at all, if there's effectively a 1:1 relationship between it and the (job_id, info_key) pair? This deserves commentary, otherwise one might wonder: why not simply rely on the crdb_internal_mvcc_timestamp column instead?

As you found/mentioned above, this is mentioned in the design doc but we're doing our own mvcc since we know these job info (logical) rows get lots of "in-place" updates in long-running jobs so we're splitting that into separate crdb table (physical) rows, since KV and SQL don't allow splitting between kv mvcc revisions of a single sql row, but we want to be able to split between revisions of a job row.


pkg/jobs/registry.go line 755 at r3 (raw file):

Previously, postamar (Marius Posta) wrote…

nit: is the inclusion of written deliberate here? I feel it's fine to remove and rely on its default value, it's a common-enough pattern.

Personal preference: I like to see what I'm writing and not need to pull up the schema to find the default defs.

@dt dt requested a review from a team as a code owner October 12, 2022 03:23
@baoalvin1 baoalvin1 self-requested a review October 24, 2022 13:56
@dt dt force-pushed the job-info branch 2 times, most recently from 92d0130 to 6e4e97a Compare November 1, 2022 16:18
Copy link
Copy Markdown
Contributor Author

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @baoalvin1, @postamar, @samiskin, @stevendanna, and @Xiang-Gu)


pkg/sql/catalog/systemschema/system.go line 685 at r1 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

why is written part of the primary key, given that the accessor helpers (in commit 3) look up by (job_id, info_key)?

I guess I'll need to read the implementation to see how we handle the case when there are >1 rows with the same (job_id, info_key) but different 'written' time.

I added a comment inline here summarizing the reasoning explored in the RFC.

@dt dt force-pushed the job-info branch 2 times, most recently from d278530 to 453309b Compare November 9, 2022 10:49
Copy link
Copy Markdown
Contributor Author

@dt dt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @baoalvin1, @dt, @postamar, @samiskin, @stevendanna, and @Xiang-Gu)


pkg/sql/catalog/systemschema/system.go line 682 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

I'm missing context here on what this is. Can you please explain, either in the code or in an issue? Am I correct in guessing that this table is a deliberately unstructured key-value store for each job?

The RFC linked in the PR now is pretty explicit about the fact being arbitrary keys is deliberate.


pkg/jobs/registry_test.go line 1155 at r3 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

remove this line?

Done.


pkg/jobs/registry_test.go line 1172 at r3 (raw file):

Previously, Xiang-Gu (Xiang Gu) wrote…

nit: "is now found with "

Done.


pkg/upgrade/upgrades/system_job_info_test.go line 41 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

Is this necessary at all? I'm not seeing this pattern in other upgrade tests.

I think all the upgrade tests do this, so they can initialize the version to before they run.


pkg/upgrade/upgrades/system_job_info_test.go line 45 at r1 (raw file):

Previously, postamar (Marius Posta) wrote…

Shouldn't the key here be CreateSystemJobInfoTable instead of SystemPrivilegesTable? See also elsewhere in this test definition. I'm guessing that's why the test is failing presently.

The start should be before the migration so that the test can assert that running the migration actually advances from before to after states as expected; at the time I wrote this SystemPrivilegesTable was the most recent version so that's what it used as its before, but now I makes more sense to be V22_2.

Copy link
Copy Markdown

@postamar postamar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that you're still fighting CI test failures, but this seems close enough.

I very much don't want a central library handing out ImportInfo ... [...] That said, I could see switching from []byte to proto.Message

Fair enough. Also, the []byte -> proto.Message doesn't need to happen now, if ever.

@dt dt force-pushed the job-info branch 4 times, most recently from f01f634 to 1fd3669 Compare November 10, 2022 13:55
@dt
Copy link
Copy Markdown
Contributor Author

dt commented Nov 10, 2022

still fighting CI test failures

Think that's the last of them, for this rebase at least!

@dt dt force-pushed the job-info branch 4 times, most recently from 912bfa9 to 7849028 Compare November 10, 2022 17:30
@rafiss rafiss removed the request for review from a team November 15, 2022 16:21
@dt
Copy link
Copy Markdown
Contributor Author

dt commented Nov 16, 2022

Bump; this is ready for another (last? maybe?) round of reviews

Comment on lines +657 to +671
// GetJobInfo fetches the latest info record for the given job and infoKey.
func (r *Registry) GetJobInfo(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing that's sort of nice about the job writing APIs are that they observe the session claim if there is one (which there is on the job object during execution) and will fail if the job has been preempted. We'll lose this in this PR. I also am rapidly falling out of love with these methods on the registry which take a txn. Can I nudge you towards making a new structure which holds onto a reference to the job and the txn and then has methods to read and write these keys?

maybe like:

func (j *Job) InfoStorage(txn *kv.Txn) InfoStorage

type InfoStorage struct {
    j *Job
    txn *kv.Txn
}

func (is InfoStore) Get(key []byte) (value []byte, exists bool, _ error) 

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We discussed this offline a bit, and I while I'm not opposed to adding wrapper for interacting with Infos that closes over what txn to use and/or adds lease-held assertions, I feel like that could easily be done in a follow-up change, and I'd rather land this one with just the bare-bones system table and raw accessors, which could then be the basis for the proposed wrapper in a follow-up?

Copy link
Copy Markdown
Collaborator

@msbutler msbutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hope no one slips another system table in before you merge :D

Copy link
Copy Markdown
Contributor

@ajwerner ajwerner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm cool with the wait-and-see approach

Copy link
Copy Markdown
Collaborator

@stevendanna stevendanna left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this.

It will be nice to start playing with this.

I think we will want to think about the semantics of updating a job_info record with or without the lease, but I'm happy to leave that to a follow up if it ends up being needed, as you already indicated.

fn func(infoKey []byte, value []byte) error,
txn *kv.Txn,
) (retErr error) {
// TODO(dt): verify this predicate hits the index.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not seeing a secondary index for it to hit. For the primary index, it looks like the job_id predicate does allow for a limited scan on the job_id but it doesn't do anything with the info_key (which I think is probably fine for now):

> EXPLAIN SELECT info_key, value       FROM job_info WHERE job_id = 98 AND substring(info_key for 3) = 'foo' ORDER BY info_key ASC, written DESC;
                                        info
------------------------------------------------------------------------------------
  distribution: local
  vectorized: true

  • filter
  │ estimated row count: 1
  │ filter: substring(info_key, 1, 3) = '\x666f6f'
  │
  └── • scan
        estimated row count: 3 (0.10% of the table; stats collected 5 seconds ago)
        table: job_info@primary
        spans: [/98 - /98]

Copy link
Copy Markdown
Contributor Author

@dt dt Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the primary (only) index is the one I want it to hit, but I wanted that span to be tighter -- in theory the optimizer should be able to know to constrain it to /98/foo, not just /98

@dt
Copy link
Copy Markdown
Contributor Author

dt commented Nov 29, 2022

TFTRs!

bors r+

@craig craig bot merged commit edc6fda into cockroachdb:master Nov 29, 2022
@craig
Copy link
Copy Markdown
Contributor

craig bot commented Nov 29, 2022

Build succeeded:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants