tenantcapabilitieswatcher: make the watcher react faster #112094

Closed

knz wants to merge 3 commits into cockroachdb:master from knz:20231010-faster-tenant-watcher
Conversation

@knz (Contributor) commented Oct 10, 2023

Needed for #111637.
Epic: CRDB-26691.

Prior to this patch, the tenant info watcher would only react to
changes to `system.tenants` upon rangefeed cache flushes, which could
be (in the default configuration) up to 3 seconds after the change is
committed.

This commit accelerates the behavior by processing updates as soon as
the change is observed in the rangefeed.

This new behavior is similar to the way that cluster settings changes
take effect immediately in the settings watcher (pkg/settingswatcher).
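The difference can be sketched roughly as follows. This is a minimal, self-contained Go sketch; the `event` and `cache` types and the method names are illustrative stand-ins, not CockroachDB's actual rangefeed or watcher API:

```go
package main

import "fmt"

// event is a stand-in for a decoded rangefeed event on system.tenants.
type event struct {
	tenantID int
	info     string
}

// cache is a stand-in for the watcher's in-memory tenant cache.
type cache struct {
	entries map[int]string
}

// applyImmediately models the new behavior: each event updates the
// cache as soon as it is observed, rather than waiting for a periodic
// rangefeed-cache flush (up to ~3s in the default configuration).
func (c *cache) applyImmediately(ev event) {
	c.entries[ev.tenantID] = ev.info
}

func main() {
	c := &cache{entries: map[int]string{}}
	c.applyImmediately(event{tenantID: 10, info: "service=shared"})
	fmt.Println(c.entries[10])
}
```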

@knz knz requested a review from stevendanna October 10, 2023 14:12
@knz knz requested a review from a team as a code owner October 10, 2023 14:12
@cockroach-teamcity (Member)

This change is Reviewable

@stevendanna (Collaborator) left a comment

I need to double-check, but I believe that the semantics of the translateEvent callback are that you are subject to the redelivery of duplicates. So I think we need to store the event's timestamp in our cache and skip updates from any older events to prevent regressing the state of a given tenant.
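The timestamp guard described here could look roughly like the following sketch. The `ts` type stands in for an MVCC timestamp (`hlc.Timestamp` in CockroachDB); the structure and names are illustrative, not the actual patch:

```go
package main

import "fmt"

// ts is a stand-in for an MVCC timestamp (hlc.Timestamp in CockroachDB).
type ts int64

type tenantEntry struct {
	info     string
	updateTS ts // timestamp of the event that produced this entry
}

type cache struct {
	entries map[int]tenantEntry
}

// maybeApply skips events that are not newer than the cached entry, so
// redelivered duplicates cannot regress a tenant's state. It reports
// whether the event was applied.
func (c *cache) maybeApply(id int, info string, eventTS ts) bool {
	if cur, ok := c.entries[id]; ok && eventTS <= cur.updateTS {
		return false // stale or duplicate event: ignore
	}
	c.entries[id] = tenantEntry{info: info, updateTS: eventTS}
	return true
}

func main() {
	c := &cache{entries: map[int]tenantEntry{}}
	fmt.Println(c.maybeApply(10, "state=add", 5))  // first event: applied
	fmt.Println(c.maybeApply(10, "state=add", 5))  // duplicate: skipped
	fmt.Println(c.maybeApply(10, "state=drop", 7)) // newer event: applied
}
```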

@knz (Contributor, Author) commented Oct 10, 2023

good point. let me pick that up.

@knz knz force-pushed the 20231010-faster-tenant-watcher branch from 601d6cc to a5022fe on October 10, 2023 15:06
Prior to this patch, the tenant info watcher would only react to
changes to `system.tenants` upon rangefeed cache flushes, which could
be (in default config) up to 3 seconds after the change is committed.

This commit accelerates the behavior by processing updates as soon as
the change is observed in the rangefeed.

This new behavior is similar to the way that cluster settings changes
are effected immediately in the settings watcher (pkg/settingswatcher).

Release note: None
@knz knz force-pushed the 20231010-faster-tenant-watcher branch from a5022fe to 76b1930 on October 10, 2023 17:12
@knz knz requested a review from stevendanna October 10, 2023 17:12
@stevendanna (Collaborator) left a comment

Overall this looks good to me. One consequence of doing this is that rangefeed restarts will cause more than one update for those watching for updates, such as the server orchestrator. But as we discussed, those systems should be able to deal with many such updates per second.

A potential follow-up: we previously added some code that restarts the capability watcher to avoid waiting out the 3 seconds (see TestingRestart in this file). With this change, I think that should largely be unnecessary.

Reviewed 1 of 1 files at r1, 7 of 7 files at r4.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/testdata/rangefeed_errors line 106 at r5 (raw file):

get-capabilities ten=10
----
ten=10 name=foo state=add service=shared cap={can_admin_unsplit:true}

Is this test telling us that we won't subsequently update our cache on the tenant deletion event if it happens during an error restart? Is that something we need to handle?

stevendanna added a commit to stevendanna/cockroach that referenced this pull request Oct 16, 2023
In cockroachdb#112001 we introduced a bug and an unintended behaviour change.

The bug is that if we receive a notification of a state change from
none to shared when the server is still shutting down, that state
change will be ignored. Namely, the following can happen:

1. ALTER VIRTUAL CLUSTER a STOP SERVICE
2. Watcher gets notification of shutdown and notifies virtual
   cluster's SQL server.
3. Tenant "a" starts shutdown but does not fully complete it
4. ALTER VIRTUAL CLUSTER a START SERVICE SHARED
5. Watcher notifies the server orchestrator; but, since the SQL server has
   not finished stopping from the previous stop request, it appears as if
   it is already started.
6. Tenant "a" finishes shutdown.
7. Server orchestrator never again tries to start the virtual cluster.

The newly added test reveals this under stress.

The behaviour change is that previously, if a SQL server for a virtual
cluster failed to start up, it would be restarted.

Here, we fix both of these by re-introducing a periodic polling of the
service state.  Unlike the previous polling, we poll the watcher state
so we are not generating a SQL query every second.

Further, since we are now calling the tenantcapabilities watcher's
GetAllTenants method every second in addition to on every update, I've
moved where we allocate the list of all tenants to our handle-update
call.

An alternative here would be to revert cockroachdb#112001 completely. I think
there are still advantages to using the watcher: not generating a SQL
query on every node once per second, and more responsive server startup
after the integration of cockroachdb#112094.

Fixes cockroachdb#112077

Release note: None
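The re-introduced polling described above can be sketched as a per-tick decision that reads only the watcher's cached state (no SQL query). The `serviceMode` type and `pollOnce` function are illustrative names, not the actual orchestrator code:

```go
package main

import "fmt"

// serviceMode is a stand-in for a tenant's desired service mode as
// reported by the capability watcher's cache.
type serviceMode int

const (
	serviceNone serviceMode = iota
	serviceShared
)

// pollOnce models one tick of the polling loop: compare the mode the
// watcher reports with whether a SQL server is actually running, and
// report whether the orchestrator must (re)start one. This catches the
// case where a start notification was missed because the previous
// shutdown had not yet completed.
func pollOnce(wantMode serviceMode, serverRunning bool) (startNeeded bool) {
	return wantMode == serviceShared && !serverRunning
}

func main() {
	// Steps 6-7 of the bug above: shutdown finished after the missed
	// notification; the next poll tick catches it and starts the server.
	fmt.Println(pollOnce(serviceShared, false))
	fmt.Println(pollOnce(serviceShared, true))
}
```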
stevendanna added a commit to stevendanna/cockroach that referenced this pull request Oct 16, 2023
stevendanna added a commit to stevendanna/cockroach that referenced this pull request Oct 17, 2023
@stevendanna stevendanna self-assigned this Oct 17, 2023
craig bot pushed a commit that referenced this pull request Nov 21, 2023
114022: obsservice: export statement insights to obs service r=maryliag a=maryliag

When an Insight is detected, we store it in memory and
also export it to the Observability Service.

This behaviour is controlled by a new cluster setting
`sql.insights.export.enabled`. If the value is disabled, no
data is exported.

Epic: CC-25978

Running YCSB for 10min (with Export Enabled and Disabled):

| Export | Type | ops(total) | ops/sec(cum) | avg(ms) | p50(ms) | p95(ms) | p99(ms) | pMax(ms) |
|--|--|--|--|--|--|--|--|--|
| Disabled | Read | 11779980 | 19633.3 | 0.9 | 0.7 | 2.2 | 3.7 | 192.9 |
| Enabled | Read | 12470462 | 20784.1 | 0.9 | 0.7 | 2.0 | 3.4 | 201.3 |
| Disabled | Update | 619139 | 1031.9 | 1.8 | 1.6 | 3.8 | 6.3 | 159.4 |
| Enabled | Update | 656124 | 1093.5 | 1.7 | 1.4 | 3.5 | 5.8 | 121.6 |
| Disabled | Result | 12399119 | 20665.2 | 1.0 | 0.8 | 2.4 | 3.9 | 192.9 |
| Enabled | Total | 13126586 | 21877.6 | 0.9 | 0.7 | 2.2 | 3.7 | 201.3 |


Running the Insights Workload (with Export Enabled and Disabled), which consists mostly of statements that will be detected as having an Insight:

| Export | ops(total) | ops/sec(cum) | avg(ms) | p50(ms) | p95(ms) | p99(ms) | pMax(ms) |
|--|--|--|--|--|--|--|--|
|Disabled| 5948 | 9.9 | 2013.6 | 1811.9 | 5100.3 | 6442.5 | 10200.5 |
|Enabled| 5906 | 9.8 | 2025.1 | 1744.8 | 5100.3 | 6174.0 | 9663.7 |

Release note: None

114719: tenantcapabilitieswatcher: make the watcher react faster r=yuzefovich a=stevendanna

Needed for #111637.
Epic: CRDB-26691

Supersedes #112094

Prior to this patch, the tenant info watcher would only react to
changes to `system.tenants` upon rangefeed cache flushes, which could
be (in default config) up to 3 seconds after the change is committed.

This commit accelerates the behavior by processing updates as soon as
the rangefeed observes the change.

This new behavior is similar to the way that cluster settings changes
are processed immediately in the settings
watcher (pkg/settingswatcher).

In order to handle deletions that occur during errors that aren't
automatically retried inside the rangefeed library (and are instead
retried by the watcher resulting in a new full scan), we emit any
scan-generated rangefeed events at their scan timestamp, allowing us a
means of clearing any stale data from the cache.

Release note: None
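The scan-timestamp mechanism described above can be sketched as follows: rows produced by a restart scan are installed at the scan timestamp, and any cached entry the scan did not see with an older timestamp must have been deleted during the outage and is evicted. The types and names here are illustrative, not the actual watcher code:

```go
package main

import "fmt"

// ts is a stand-in for an MVCC timestamp.
type ts int64

// cache maps tenant IDs to the timestamp of their last update.
type cache struct {
	entries map[int]ts
}

// applyFullScan installs the scanned rows at scanTS and evicts any
// entry not seen by the scan whose timestamp is older than scanTS:
// such entries must correspond to rows deleted while the rangefeed
// was being retried by the watcher.
func (c *cache) applyFullScan(scanned map[int]bool, scanTS ts) {
	for id := range scanned {
		c.entries[id] = scanTS
	}
	for id, updateTS := range c.entries {
		if !scanned[id] && updateTS < scanTS {
			delete(c.entries, id) // stale: deleted during the outage
		}
	}
}

func main() {
	c := &cache{entries: map[int]ts{10: 3, 11: 4}}
	// Tenant 11 was dropped while the rangefeed was restarting; the
	// restart scan at ts=9 only sees tenant 10.
	c.applyFullScan(map[int]bool{10: true}, 9)
	_, ok := c.entries[11]
	fmt.Println(len(c.entries), ok)
}
```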

114743: server, ui: show internal stats with cluster setting enabled r=maryliag a=maryliag

Previously, when the cluster setting `sql.stats.response.show_internal.enabled` was set to true, we were still trying to use the activity tables to return data to the UI, but those tables don't have data from internal stats.

This commit adds the proper check to not use the Activity tables if that setting is enabled.

Also, since enabling that setting means the user wants to see the internal stats, this commit changes the UI so that it returns the internal stats; otherwise it could be confusing to select the top 100 but have the UI show only 40, for example.

Fixes #114460

https://www.loom.com/share/8f3279ae4fd441999c24c68fb0fa8b14

Release note (bug fix): SQL Activity now properly shows internal statements when the cluster setting `sql.stats.response.show_internal.enabled` is set to true.

114746: storage: correct terminology in log line for disk stalls r=jbowens a=itsbilal

Previously, we'd use the term "file write stall" in the log line for a disk stall, rather than "disk stall", which is what metrics and documentation refer to this event as. Furthermore, "write stalls" are a different, unrelated kind of stall in Pebble.

This change addresses this by changing the wording to refer to this event as a "disk stall" in the log.

Epic: none

Release note (ops change): Updates the error message in case of stalled disks to use the appropriate term for that event ("disk stall"), which is also what metrics/dashboards refer to that event as.

Co-authored-by: maryliag <marylia@cockroachlabs.com>
Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: Steven Danna <danna@cockroachlabs.com>
Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>
@stevendanna (Collaborator)

Merged as part of #114719
