
sql: force current tenant prefix on read ResumeSpans #73831

Merged: craig[bot] merged 1 commit into cockroachdb:master from dt:spans on Jan 19, 2022

@dt dt commented Dec 15, 2021

This changes the only two places where ResumeSpansList is read to force
every read span to have the current tenant's prefix, adding or updating
the existing prefix as needed. This ensures that even if the resume
spans actually stored in the job record value have a prefix
corresponding to some other tenant ID from when that job was backed up,
when read by the new tenant ID those spans instead cover the spans in
that new tenant's prefix.

A future change could update the persisted spans to not include a tenant
prefix in the first place. However, such a change would need to take some
care with backwards compatibility: older nodes, which predate this
change, would still expect the spans they read to have the right
tenant prefix. This change thus paves the way for that future change,
though it becomes mostly an aesthetic choice at that point, rather than
important to correctness of rekeyed jobs, as this change on its own
means the persisted prefix, if any, is now largely irrelevant.

Fixes #73801.

Release note: none.
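
As a rough illustration of the "force the prefix on read" idea described above, here is a minimal, self-contained Go sketch. It is not the code from this PR: the `Span` type, the helper names, and the toy key encoding (a marker byte followed by an 8-byte tenant ID, with no prefix for the system tenant) are all assumptions made for the example, and it further assumes un-prefixed keys never start with the marker byte.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// tenantPrefixByte marks a tenant-scoped key in this toy encoding (assumption).
const tenantPrefixByte = 0xfe

// prefixLen is the marker byte plus an 8-byte big-endian tenant ID (assumption).
const prefixLen = 9

// systemTenantID is the tenant whose keys carry no prefix in this sketch.
const systemTenantID = 1

// Span is a hypothetical stand-in for a key span [Key, EndKey).
type Span struct {
	Key, EndKey []byte
}

// makeTenantPrefix returns the toy prefix for a tenant: empty for the system
// tenant, otherwise the marker byte followed by the ID as 8 big-endian bytes.
func makeTenantPrefix(id uint64) []byte {
	if id == systemTenantID {
		return nil
	}
	p := make([]byte, prefixLen)
	p[0] = tenantPrefixByte
	binary.BigEndian.PutUint64(p[1:], id)
	return p
}

// stripTenantPrefix drops a tenant prefix if the key has one and returns the
// key unchanged otherwise.
func stripTenantPrefix(key []byte) []byte {
	if len(key) >= prefixLen && key[0] == tenantPrefixByte {
		return key[prefixLen:]
	}
	return key
}

// forceTenantPrefix returns a copy of key carrying exactly the prefix of the
// given tenant, replacing any other tenant's prefix or adding a missing one.
func forceTenantPrefix(key []byte, id uint64) []byte {
	out := append([]byte{}, makeTenantPrefix(id)...)
	return append(out, stripTenantPrefix(key)...)
}

// rekeySpans applies forceTenantPrefix to both ends of every span, the way a
// reader of persisted resume spans might before using them.
func rekeySpans(spans []Span, id uint64) []Span {
	out := make([]Span, len(spans))
	for i, sp := range spans {
		out[i] = Span{
			Key:    forceTenantPrefix(sp.Key, id),
			EndKey: forceTenantPrefix(sp.EndKey, id),
		}
	}
	return out
}

func main() {
	// Spans persisted by tenant 10 and by a writer that stored no prefix.
	stored := []Span{
		{Key: append(makeTenantPrefix(10), 'a'), EndKey: append(makeTenantPrefix(10), 'b')},
		{Key: []byte("c"), EndKey: []byte("d")},
	}
	// Tenant 20 reads them back; every span now lies in tenant 20's keyspace.
	for _, sp := range rekeySpans(stored, 20) {
		fmt.Printf("%x - %x\n", sp.Key, sp.EndKey)
	}
}
```

Because the rewrite happens only when spans are read, nothing about the persisted format changes in this sketch, which mirrors the backwards-compatibility argument made in the description.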

@dt dt requested review from ajwerner and stevendanna December 15, 2021 03:05
@dt dt requested a review from a team as a code owner December 15, 2021 03:05
@dt dt marked this pull request as draft December 15, 2021 03:05


dt commented Dec 15, 2021

I still need to come up with some tests here, but wanted to float this for early feedback, specifically on the "we can just do it on read" thesis. I like that it dodges the backwards-compatibility question for now: we don't change what we write at all, and we can still read what old nodes write. It also makes the later migration to writing no-prefix spans easier, since we could just version gate to be sure everyone has this patch, rather than changing what we write immediately, which would need a version gate, dual field writes, and fallback reads until then.

@ajwerner ajwerner left a comment

I think the thesis seems sound. I've got a PR inbound for the progress on the declarative schema changer which makes sure not to write the prefix.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @stevendanna)

@stevendanna stevendanna left a comment

I suppose this might be a little hard to test since we currently disable cross-tenant cluster restores. Perhaps we could write some unit tests for now and add end-to-end tests when we fix the other issues blocking the backup/restore combinations that would hit this path.

We could use the customRestoreFunc to do this translation once on restore, but this seems like a better step along the road of removing tenant prefixes from the data in these jobs altogether.

@dt dt force-pushed the spans branch 2 times, most recently from 6f937ff to ed37a58 Compare December 22, 2021 06:53
@dt dt marked this pull request as ready for review December 22, 2021 13:09

dt commented Dec 22, 2021

Okay, after a little detour to make this easier to test, there is now a logic test that shows what happens if we resume a backfill with resume spans that belong to a different tenant or to no tenant: the backfill runs and validates successfully. That test file runs in both the system tenant and guest tenants, so I think this covers all our cases.


dt commented Jan 14, 2022

friendly ping


dt commented Jan 19, 2022

TFTRs!

bors r+


craig bot commented Jan 19, 2022

Build succeeded:

@craig craig bot merged commit 5aefc07 into cockroachdb:master Jan 19, 2022
@dt dt deleted the spans branch January 19, 2022 14:03
stevendanna added a commit to stevendanna/cockroach that referenced this pull request Apr 5, 2022
This removes a prohibition for cluster restores with mismatched tenant
IDs since we believe they are now correct as of cockroachdb#73831

This allows users to take a cluster backup in a tenant and restore it
into another tenant.

The new tenant_settings table needs special care since it may exist in
the source tenant but not the target tenant when the source tenant is
the system tenant.

In this change, we throw an error in the case of a non-empty
tenant_settings table being restored into a non-system tenant. This is
a bit user-unfriendly since we detect this error rather late in the
restore process.

Release note: None
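
The tenant_settings rule in the commit message above can be sketched as a small guard. This is only an illustration under stated assumptions: `checkTenantSettingsRestore`, its parameters, and the error value are hypothetical names, not the restore code itself, and the system tenant is assumed to have ID 1.

```go
package main

import (
	"errors"
	"fmt"
)

// systemTenantID is assumed to identify the system tenant in this sketch.
const systemTenantID = 1

// errTenantSettingsRestore is a hypothetical error for the rejected case.
var errTenantSettingsRestore = errors.New(
	"cannot restore non-empty tenant_settings table into a non-system tenant")

// checkTenantSettingsRestore (hypothetical) rejects a restore only when the
// backed-up tenant_settings table has rows and the target is not the system
// tenant; every other combination is allowed.
func checkTenantSettingsRestore(targetTenantID uint64, tenantSettingsRows int) error {
	if targetTenantID != systemTenantID && tenantSettingsRows > 0 {
		return errTenantSettingsRestore
	}
	return nil
}

func main() {
	fmt.Println(checkTenantSettingsRestore(systemTenantID, 3)) // system tenant target
	fmt.Println(checkTenantSettingsRestore(20, 0))             // empty table
	fmt.Println(checkTenantSettingsRestore(20, 3))             // rejected case
}
```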
craig bot pushed a commit that referenced this pull request Apr 5, 2022
…79409 #79427 #79428 #79433 #79444

76312: kvserver, batcheval: pin Engine state during read-only command evaluation r=aayushshah15 a=aayushshah15

This commit makes it such that we eagerly pin the engine state of the `Reader`
created during the evaluation of read-only requests.

Generally, reads will hold latches throughout the course of their evaluation
(particularly, while they do their `MVCCScan`). Mainly, this commit paves the
way for us to move to a world where we avoid holding latches during the
MVCCScan. Additionally, it also lets us make MVCC garbage collection latchless
as described in #55293.

There are a few notable changes in this patch:

1. Pinning the engine state eagerly runs into #70974. To resolve this, the
closed timestamp of the `Replica` is now captured at the time the `EvalContext`
is created, and not during the command evaluation of
`QueryResolvedTimestampRequest`.

2. `EvalContext` now has an `ImmutableEvalContext` embedded into it. The
`ImmutableEvalContext` is supposed to encapsulate state that must not change
after the `EvalContext` is created. The closed timestamp of the replica is part
of the `ImmutableEvalContext`.

3. `Replica` no longer fully implements the `EvalContext` interface. Instead,
it implements everything but `GetClosedTimestamp()` (which is implemented by
`ImmutableEvalContext` instead).

Relates to #55293
Resolves #55461
Resolves #70974

Release note: None


78652: sql: implement to_reg* builtins r=otan a=e-mbrown

Resolves #77838
This commit implements the `to_regclass`, `to_regnamespace`, `to_regproc`,
`to_regprocedure`, `to_regrole`, and `to_regtype` builtins.

Release note (<category, see below>): The `to_regclass`, `to_regnamespace`, `to_regproc`,
`to_regprocedure`, `to_regrole`, and `to_regtype` builtin functions are now supported,
improving compatibility with PostgreSQL.

79022: server/status: add running non-idle jobs metric r=darinpp a=darinpp

Previously serverless was using the sql jobs running metric to determine
if a tenant process is idle and can be shut down. With the introduction
of continuously running jobs this isn't a good indicator anymore. A
recent addition is per-job metrics that show running or idle. The auto
scaler doesn't care about the individual jobs and only cares about the
total number of jobs that are running but haven't reported as being idle.
The pull rate is also very high, so retrieving all the individual
running/idle metrics for each job type isn't optimal. So this PR adds a
single metric that just aggregates and tracks the total count of jobs
running and not idle.

Release justification: Bug fixes and low-risk updates to new functionality
Release note: None

Will be re-based once #79021 is merged

79157: cli: tweak slow decommission message r=knz a=cameronnunez

Release note: None

79313: opt: do not push LIMIT into the scan of a virtual table r=msirek a=msirek

Fixes #78578

Previously, a LIMIT operation could be pushed into the scan of a virtual
table with an ORDER BY clause.              

This was inadequate because in-order scans of virtual indexes aren't
supported. When an index that should provide the order requested by a
query is used, a sort is actually produced under the covers:
```
EXPLAIN(vec)
SELECT oid, typname FROM pg_type ORDER BY OID;
               info
----------------------------------
  │
  └ Node 1
    └ *colexec.sortOp
      └ *sql.planNodeToRowSource

```
Functions `CanLimitFilteredScan` and `GenerateLimitedScans` are modified
to avoid pushing LIMIT operations into ordered scans of virtual indexes. 

Release justification: Low risk fix for incorrect results in queries
involving virtual system tables.

Release note (bug fix): LIMIT queries with an ORDER BY clause that scan
the index of a virtual system table, such as `pg_type`, could
previously return incorrect results. This is corrected by teaching the
optimizer that LIMIT operations cannot be pushed into ordered scans of
virtual indexes.


79346: ccl/sqlproxyccl: add rebalancer queue for connection rebalancing r=JeffSwenson a=jaylim-crl

#### ccl/sqlproxyccl: add rebalancer queue for rebalance requests 

This commit adds a rebalancer queue implementation to the balancer component.
The queue will be used for rebalance requests for the connection migration
work. This is done to ensure a centralized location that invokes the
TransferConnection method on the connection handles. Doing this also enables
us to limit the number of concurrent transfers within the proxy.

Release note: None

#### ccl/sqlproxyccl: run rebalancer queue processor in the background 

The previous commit added a rebalancer queue. This commit connects the queue to
the balancer, and runs the queue processor in the background. By default,
we allow up to 100 concurrent transfers at any point in time, and each transfer
will be retried up to 3 times.

Release note: None

Jira issue: CRDB-14727
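
The limits described for the rebalancer queue (a cap on concurrent transfers plus a bounded number of tries per transfer) follow a common Go pattern, sketched below. This is not the sqlproxyccl implementation: `transferFunc`, `runTransfers`, and `failNTimes` are hypothetical names, and treating "retried up to 3 times" as three attempts in total is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// transferFunc is a hypothetical stand-in for one connection transfer.
type transferFunc func() error

// errTransient is the failure returned by the fake transfers below.
var errTransient = errors.New("transient transfer failure")

// runTransfers runs every transfer with at most maxConcurrent in flight and
// tries each one up to maxAttempts times. It returns how many transfers
// eventually succeeded.
func runTransfers(transfers []transferFunc, maxConcurrent, maxAttempts int) int {
	sem := make(chan struct{}, maxConcurrent) // counting semaphore
	var wg sync.WaitGroup
	var succeeded int64
	for _, t := range transfers {
		wg.Add(1)
		sem <- struct{}{} // blocks while maxConcurrent transfers are running
		go func(t transferFunc) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when done
			for attempt := 0; attempt < maxAttempts; attempt++ {
				if err := t(); err == nil {
					atomic.AddInt64(&succeeded, 1)
					return
				}
			}
		}(t)
	}
	wg.Wait()
	return int(atomic.LoadInt64(&succeeded))
}

// failNTimes returns a fake transfer that fails its first n calls and then
// succeeds. Each returned transfer is only ever called from one goroutine.
func failNTimes(n int) transferFunc {
	calls := 0
	return func() error {
		calls++
		if calls <= n {
			return errTransient
		}
		return nil
	}
}

func main() {
	transfers := []transferFunc{failNTimes(0), failNTimes(2), failNTimes(5)}
	// Same limits as the defaults quoted above: 100 concurrent, 3 tries each.
	fmt.Println("succeeded:", runTransfers(transfers, 100, 3))
}
```

A buffered channel is used as the semaphore here because it keeps the sketch dependency-free; a worker pool reading from a queue would enforce the same bound.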

79362: kv: remove stale comment in processOneChange r=nvanbenschoten a=nvanbenschoten

The comment was added in 2fb56bd and hasn't been accurate since 5178559.

Jira issue: CRDB-14753

79368: ccl/sqlproxyccl: include DRAINING pods in the directory cache r=JeffSwenson a=jaylim-crl

Previously, #67452 removed DRAINING pods from the directory cache. This commit
adds that back. The connector will now need to filter for RUNNING pods manually
before invoking the balancer. This is needed so that we could track DRAINING
pods, and wait until 60 seconds have elapsed before transferring connections
away from them. To support that, we also update the Pod's proto definition to
include a StateTimestamp field to represent the timestamp at which the state
field was last updated.

The plan is to have a polling mechanism every X seconds to check DRAINING pods,
and use that information to start migrating connections.

Release note: None

Jira issue: CRDB-14759

79386: colexec: remove redundant benchmarks r=yuzefovich a=yuzefovich

This commit finishes the transition of some of the benchmarks in the
colexec package started in the 22.1 cycle.

Fixes: #75106.

Release note: None

Jira issue: CRDB-14783

79409: sql: refactor deps tests to use bazel r=yuzefovich a=yuzefovich

This commit refactors most `VerifyNoImports` dependency tests in the sql
folder to use the newly introduced bazel test utilities.

Release note: None

Jira issue: CRDB-14814

79427: backupccl: allow cluster restore from different tenant r=dt a=stevendanna

This removes a prohibition for cluster restores with mismatched tenant
IDs since we believe they are now correct as of #73831

This allows users to take a cluster backup in a tenant and restore it
into another tenant.

The new tenant_settings table needs special care since it may exist in
the source tenant but not the target tenant when the source tenant is
the system tenant.

In this change, we throw an error in the case of a non-empty
tenant_settings table being restored into a non-system tenant. This is
a bit user-unfriendly since we detect this error rather late in the
restore process.

Release note: None

Jira issue: CRDB-14844

79428: backupccl: Refactor encryption utility functions into their own file. r=benbardin a=benbardin

Release note: None

Jira issue: CRDB-14845

79433: sql: use new ALTER TENANT syntax in tests r=stetvendanna a=rafiss

Release note: None

79444: roachtest: warmup follower-reads for fixed duration, not fixed number of ops r=nvanbenschoten a=nvanbenschoten

Fixes #78596.

This change switches the warmup phase of the follower-read roachtest
suite from running a fixed number of operations (100) to running for a
fixed duration (15s). This should ensure that the single-region variant
of the test is given sufficient time to warm up follower reads immediately
after one of its nodes is restarted.

Before this change, the single-region variant was only being given about
500ms after startup to catch up on the closed timestamp, which made the
test flaky.

Release justification: testing only

Co-authored-by: Aayush Shah <aayush.shah15@gmail.com>
Co-authored-by: e-mbrown <ebsonari@gmail.com>
Co-authored-by: Darin Peshev <darinp@gmail.com>
Co-authored-by: Cameron Nunez <cameron@cockroachlabs.com>
Co-authored-by: Mark Sirek <sirek@cockroachlabs.com>
Co-authored-by: Jay <jay@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Yahor Yuzefovich <yahor@cockroachlabs.com>
Co-authored-by: Steven Danna <danna@cockroachlabs.com>
Co-authored-by: Ben Bardin <bardin@cockroachlabs.com>
Co-authored-by: Rafi Shamim <rafi@cockroachlabs.com>
blathers-crl bot pushed a commit that referenced this pull request Apr 5, 2022
This removes a prohibition for cluster restores with mismatched tenant
IDs since we believe they are now correct as of #73831

This allows users to take a cluster backup in a tenant and restore it
into another tenant.

The new tenant_settings table needs special care since it may exist in
the source tenant but not the target tenant when the source tenant is
the system tenant.

In this change, we throw an error in the case of a non-empty
tenant_settings table being restored into a non-system tenant. This is
a bit user-unfriendly since we detect this error rather late in the
restore process.

Release note: None
Linked issue that merging this pull request may close:

backfiller: persisted progress spans include tenant prefixes

5 participants