tenantcapabilitieswatcher: make the watcher react faster#112094
tenantcapabilitieswatcher: make the watcher react faster#112094knz wants to merge 3 commits intocockroachdb:masterfrom
Conversation
Release note: None
stevendanna
left a comment
There was a problem hiding this comment.
I need to double-check, but I believe that the semantics of the translateEvent callback is that you are subject to the redelivery of duplicates. So I think we need to store the event's timestamp in our cache and skip updates from any older events to prevent regressing the state of a given tenant.
|
good point. let me pick that up. |
Release note: None
601d6cc to
a5022fe
Compare
Prior to this patch, the tenant info watcher would only react to changes to `system.tenants` upon rangefeed cache flushes, which could be (in default config) up to 3 seconds after the change is committed. This commit accelerates the behavior by processing updates as soon as the change is observed in the rangefeed. This new behavior is similar to the way that cluster settings changes are effected immediately in the settings watcher (pkg/settingswatcher). Release note: None
a5022fe to
76b1930
Compare
stevendanna
left a comment
There was a problem hiding this comment.
Overall this looks good to me. One consequence of doing this is that changefeed restarts will cause >1 update for those watching for updates such as the server orchestrator. But as we discussed, those systems should be able to deal with many such updates per second.
A potential follow-up is that we added some code that restarts the capability watcher to avoid waiting out the 3 seconds (see TestingRestart in this file). With this change, I think that should largely be unnecessary.
Reviewed 1 of 1 files at r1, 7 of 7 files at r4.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @knz)
pkg/multitenant/tenantcapabilities/tenantcapabilitieswatcher/testdata/rangefeed_errors line 106 at r5 (raw file):
get-capabilities ten=10 ---- ten=10 name=foo state=add service=shared cap={can_admin_unsplit:true}
Is this test telling us that we won't subsequently update our cache on the tenant deletion event if it happens during an error restart? Is that something we need to handle?
In cockroachdb#112001 we introduced a bug and an unintended behaviour change. The bug is that if we receive a notification of a state change from none to shared when the server is still shutting down, that state change will be ignored. Namely, the following can happen: 1. ALTER VIRTUAL CLUSTER a STOP SERVICE 2. Watcher gets notification of shutdown and notifies virtual cluster's SQL server. 3. Tenant "a" starts shutdown but does not fully complete it 4. ALTER VIRTUAL CLUSTER a START SERVICE SHARED 5. Watcher notifies the server orchestrator; but, since the SQL server has not finished stopping from the previous stop request, it appears as if it is already started. 6. Tenant "a" finishes shutdown. 7. Server orchestrator never again tries to start the virtual cluster. The newly added test reveals this under stress. The behaviour change is that previously if a SQL server for a virtual cluster failed to start up, it would previously be restarted. Here, we fix both of these by re-introducing a periodic polling of the service state. Unlike the previous polling, we poll the watcher state so we are not generating a SQL query every second. Further, since we are now calling the tenantcapabailities watcher GetAllTenants method every second in addition to on every update, I've moved where we allocate the list of all tenants to our handle update call. An alternative here would be to revert cockroachdb#112001 completely. I think there are still advantage to using the watcher: not generating a SQL query on every node once per second and more responsive server startup after the integration of cockroachdb#112094. Fixes cockroachdb#112077 Release note: None
In cockroachdb#112001 we introduced a bug and an unintended behaviour change. The bug is that if we receive a notification of a state change from none to shared when the server is still shutting down, that state change will be ignored. Namely, the following can happen: 1. ALTER VIRTUAL CLUSTER a STOP SERVICE 2. Watcher gets notification of shutdown and notifies virtual cluster's SQL server. 3. Tenant "a" starts shutdown but does not fully complete it 4. ALTER VIRTUAL CLUSTER a START SERVICE SHARED 5. Watcher notifies the server orchestrator; but, since the SQL server has not finished stopping from the previous stop request, it appears as if it is already started. 6. Tenant "a" finishes shutdown. 7. Server orchestrator never again tries to start the virtual cluster. The newly added test reveals this under stress. The behaviour change is that previously if a SQL server for a virtual cluster failed to start up, it would previously be restarted. Here, we fix both of these by re-introducing a periodic polling of the service state. Unlike the previous polling, we poll the watcher state so we are not generating a SQL query every second. Further, since we are now calling the tenantcapabailities watcher GetAllTenants method every second in addition to on every update, I've moved where we allocate the list of all tenants to our handle update call. An alternative here would be to revert cockroachdb#112001 completely. I think there are still advantage to using the watcher: not generating a SQL query on every node once per second and more responsive server startup after the integration of cockroachdb#112094. Fixes cockroachdb#112077 Release note: None
In cockroachdb#112001 we introduced a bug and an unintended behaviour change. The bug is that if we receive a notification of a state change from none to shared when the server is still shutting down, that state change will be ignored. Namely, the following can happen: 1. ALTER VIRTUAL CLUSTER a STOP SERVICE 2. Watcher gets notification of shutdown and notifies virtual cluster's SQL server. 3. Tenant "a" starts shutdown but does not fully complete it 4. ALTER VIRTUAL CLUSTER a START SERVICE SHARED 5. Watcher notifies the server orchestrator; but, since the SQL server has not finished stopping from the previous stop request, it appears as if it is already started. 6. Tenant "a" finishes shutdown. 7. Server orchestrator never again tries to start the virtual cluster. The newly added test reveals this under stress. The behaviour change is that previously if a SQL server for a virtual cluster failed to start up, it would previously be restarted. Here, we fix both of these by re-introducing a periodic polling of the service state. Unlike the previous polling, we poll the watcher state so we are not generating a SQL query every second. Further, since we are now calling the tenantcapabailities watcher GetAllTenants method every second in addition to on every update, I've moved where we allocate the list of all tenants to our handle update call. An alternative here would be to revert cockroachdb#112001 completely. I think there are still advantage to using the watcher: not generating a SQL query on every node once per second and more responsive server startup after the integration of cockroachdb#112094. Fixes cockroachdb#112077 Release note: None
114022: obsservice: export statement insights to obs service r=maryliag a=maryliag When an Insights is detected, we store it in memory and also export it to Observability Service. This behaviour is controlled by a new cluster setting `sql.insights.export.enabled`. If the value is disabled, no data is exported. Epic: CC-25978 Running YCSB for 10min (with Export Enable and Disabled): | Export | Type | ops(total) | ops/sec(cum) | avg(ms) | p50(ms) | p95(ms) | p99(ms) | pMax(ms) | |--|--|--|--|--|--|--|--|--| | Disabled | Read | 11779980 | 19633.3 | 0.9 | 0.7 | 2.2 | 3.7 | 192.9 | | Enabled | Read | 12470462 | 20784.1 | 0.9 | 0.7 | 2.0 | 3.4 | 201.3 | | Disabled | Update | 619139 | 1031.9 | 1.8 | 1.6 | 3.8 | 6.3 | 159.4 | | Enabled | Update | 656124 | 1093.5 | 1.7 | 1.4 | 3.5 | 5.8 | 121.6 | | Disabled | Result | 12399119 | 20665.2 | 1.0 | 0.8 | 2.4 | 3.9 | 192.9 | | Enabled | Total | 13126586 | 21877.6 | 0.9 | 0.7 | 2.2 | 3.7 | 201.3 | Running the Insights Workload (with Export Enable and Disabled) which is mostly statements that will be detected as having an Insight | Export | ops(total) | ops/sec(cum) | avg(ms) | p50(ms) | p95(ms) | p99(ms) | pMax(ms) | |--|--|--|--|--|--|--|--| |Disabled| 5948 | 9.9 | 2013.6 | 1811.9 | 5100.3 | 6442.5 | 10200.5 | |Enabled| 5906 | 9.8 | 2025.1 | 1744.8 | 5100.3 | 6174.0 | 9663.7 | Release note: None 114719: tenantcapabilitieswatcher: make the watcher react faster r=yuzefovich a=stevendanna Needed for #111637. Epic: CRDB-26691 Superceeds #112094 Prior to this patch, the tenant info watcher would only react to changes to `system.tenants` upon rangefeed cache flushes, which could be (in default config) up to 3 seconds after the change is committed. This commit accelerates the behavior by processing updates as soon as the rangefeed observes the change. This new behavior is similar to the way that cluster settings changes are processed immediately in the settings watcher (pkg/settingswatcher). In order to handle deletions that occur during errors that aren't automatically retried inside the rangefeed library (and are instead retried by the watcher resulting in a new full scan), we emit any scan-generated rangefeed events at their scan timestamp, allowing us a means of clearing any stale data from the cache. Release note: None 114743: server, ui: show internal stats with custer setting enabled r=maryliag a=maryliag Previously, when the cluster setting `sql.stats.response.show_internal.enabled` was set to true, we were still trying to use the activity tables to return data to the UI, but that tables doesn't have data from internal stats. This commit does that proper check to not use the Activity tables if that value is enabled. Also, since enabling that setting means the user wants to see the internal stats, this commit also changes the UI so that it returns the internal stats, oterhwise it might be confusing selecting top 100 but the UI only showing 40 for example. Fixes #114460 https://www.loom.com/share/8f3279ae4fd441999c24c68fb0fa8b14 Release note (bug fix): SQL Activity properly showing internal statements when cluster setting `sql.stats.response.show_internal.enabled` is set to true. 114746: storage: correct terminology in log line for disk stalls r=jbowens a=itsbilal Previously, we'd use the term "file write stall" in the log line for a disk stall, and not use the term "disk stall" which is what metrics and documentation refers to this event as. Furthermore, "write stalls" are a different, unrelated kind of stall in Pebble. This change addresses this by changing the wording to refer to this event as a "disk stall" in the log. Epic: none Release note (ops change): Updates the error message in case of stalled disks to use the appropriate term for that event ("disk stall"), which is also what metrics/dashboards refer to that event as. Co-authored-by: maryliag <marylia@cockroachlabs.com> Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net> Co-authored-by: Steven Danna <danna@cockroachlabs.com> Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>
|
Merged as part of #114719 |
Needed for #111637.
Epic: CRDB-26691.
Prior to this patch, the tenant info watcher would only react to
changes to
system.tenantsupon rangefeed cache flushes, which couldbe (in default config) up to 3 seconds after the change is committed.
This commit accelerates the behavior by processing updates as soon as
the change is observed in the rangefeed.
This new behavior is similar to the way that cluster settings changes
are effected immediately in the settings watcher (pkg/settingswatcher).