-
Notifications
You must be signed in to change notification settings - Fork 4.1k
sql,server: statements and combined statements endpoints errors #71245
Copy link
Copy link
Closed
Labels
A-sql-observabilityRelated to observability of the SQL layerRelated to observability of the SQL layerC-enhancementSolution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)branch-release-21.2Used to mark GA and release blockers, technical advisories, and bugs for 21.2Used to mark GA and release blockers, technical advisories, and bugs for 21.2release-blockerIndicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
Metadata
Metadata
Assignees
Labels
A-sql-observabilityRelated to observability of the SQL layerRelated to observability of the SQL layerC-enhancementSolution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)branch-release-21.2Used to mark GA and release blockers, technical advisories, and bugs for 21.2Used to mark GA and release blockers, technical advisories, and bugs for 21.2release-blockerIndicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked.
To date we have observed several errors from the /statments and /combined-statements endpoints.
Issue 1: /Statements response too large for gRPC
Currently, /Statements, returns all cluster-wide in-memory statement stats.
Since the virtual statement tables call /Statements, this is also an issue for the combined stats API.
The virtual tables calling /Statement are particularly wasteful as only one of the
statementsortransactionsportion of the response is used, and the other part discarded. We should at least work to remove the unnecessary collection and return of statement / transaction stats when not necessary (see this issue for more context).We have seen reported issues where this might be the cause: https://github.com/cockroachlabs/support/issues/1263.
Potential solutions:
Issue 2: Full scans on system stats tables might be causing timeouts
Another issue we are seeing are timeouts. One cause for this error may be due to the fact that the virtual stats tables (
crdb_internal.statement_statisticsandcrdb_internal.transaction_statistics) performs a full-scan on the system stats tables (system.statement_statisticsandsystem.transaction_statistics). After we have been flushing stats for a while, these tables can grow large enough that a full scan might be causing timeouts to occur.Possible solution to both problems
@Azhng proposed a solution that might address both of the problems above.
In our sqlstats subsystem, we currently flush in-memory stats to the system tables on an hourly interval, or when they grow too large. Instead of buffering cluster-wide in-memory sql stats and merging them with the system tables, we can:
This avoids the full table scans, and the buffering of in-memory stats, removing the necessity for the virtual tables entirely. Some performance testing is necessary here, as we'd be flushing much more often here than the current hourly interval.
Epic CRDB-11997
gz#10996