sql: refactor SQL stats collection to its own package#66011
sql: refactor SQL stats collection to its own package#66011craig[bot] merged 1 commit intocockroachdb:masterfrom
Conversation
a9d68b2 to
808dd0b
Compare
ajwerner
left a comment
There was a problem hiding this comment.
This brings a smile to my heart! Looks like a wonderful change! Please accept my drive-by comments.
| // reporting. As we move to persisted SQL stats, we would no longer need this | ||
| // since we can directly report from our system table. So delete the code | ||
| // related to reported SQL stats once we have finished the migration. | ||
| func NewSQLStatsLocalProvider( |
There was a problem hiding this comment.
nit: I could see just calling this New :)
|
|
||
| const invalidStmtID = 0 | ||
|
|
||
| // appStats holds per-application statistics. |
There was a problem hiding this comment.
One thing that's interesting about this is that it doesn't really have much of anything to do with an app. It just happens to be the case that we bucket by app. Maybe something like ssmemstorage? I think abstracting this a bit might let you make the telemetry stuff feel less like a hack.
Either way, any cleanup in that direction should follow in a separate commit and maybe even a separate PR. This is already great progress.
pkg/sql/sqlstats/ssprovider.go
Outdated
| // RecordStatement records statistics for a statement. | ||
| RecordStatement( | ||
| ctx context.Context, | ||
| statement string, |
There was a problem hiding this comment.
nit: this is begging for a struct
pkg/sql/sqlstats/ssprovider.go
Outdated
| type TransactionWriter interface { | ||
| RecordTransaction( | ||
| ctx context.Context, | ||
| key TransactionFingerprintID, |
There was a problem hiding this comment.
nit: this is also begging for a struct
pkg/sql/sqlstats/ssprovider.go
Outdated
| type Provider interface { | ||
| Storage | ||
|
|
||
| Start( |
There was a problem hiding this comment.
nit: this probably doesn't need to be wrapped
Azhng
left a comment
There was a problem hiding this comment.
Review guide:
sql/sqlstats package is created:
sql/sqlstats/ssprovider.godefines the main interface of the subsystemsql/sqlstats/cluster_settings.godefines all the cluster settings for the subsystems.
sql/app_stats.go: this file is split into sql/sqlstats/sslocal/app_stats.go and sql/sqlstats/sslocal/sql_stats.go. Most of the changes are code movements and removing dependencies on the sql package. A few utility functions in that file that were related to scrubbing sensitive information were moved to sql/exec_util.go since it still relies on the functionalities provided by the sql pacakge.
sql/sqlstats/sslocal is the package of a node-local in-memory implementation of ssprovider.go interface.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @Azhng)
pkg/sql/sqlstats/sslocal/sslocal_provider.go, line 33 at r1 (raw file):
Good point. Created another issue to track the refactoring of the telemetry refactoring: #66047. I'll address this in another PR.
To make sure I understand correctly:
a lot of the iteration business logic
Do you mean the parts where we implement merging two appStats together by iterating through all of its entries? Or do you mean the new sqlstats.Reader interface?
I'm not convinced the txnStats concept even needs to leak into this layer.
Currently the code that implements sqlstats.StatsCollector interface interacts with appStats through two methods, namely: getStatsForStmt() and getStatsForTxnWithKey(). The interface implementation code then directly manipulates the data returned by these two methods. Are you suggesting that these data manipulation logic should be behind another layer of interface? e..g sslocal_internal
maryliag
left a comment
There was a problem hiding this comment.
Reviewed 11 of 30 files at r1, 2 of 7 files at r2.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @ajwerner and @Azhng)
docs/generated/settings/settings.html, line 84 at r2 (raw file):
<tr><td><code>sql.log.slow_query.internal_queries.enabled</code></td><td>boolean</td><td><code>false</code></td><td>when set to true, internal queries which exceed the slow query log threshold are logged to a separate log. Must have the slow query log enabled for this setting to have any effect.</td></tr> <tr><td><code>sql.log.slow_query.latency_threshold</code></td><td>duration</td><td><code>0s</code></td><td>when set to non-zero, log statements whose service latency exceeds the threshold to a secondary logger on each node</td></tr> <tr><td><code>sql.metrics.max_mem_app_reported_stmt_fingerprints</code></td><td>integer</td><td><code>100000</code></td><td>the maximum number of reported statement fingerprints stored in memory</td></tr>
didn't we remove app on a previous PR? Any reason why adding it back?
docs/generated/settings/settings-for-tenants.txt, line 82 at r2 (raw file):
sql.log.slow_query.internal_queries.enabled boolean false when set to true, internal queries which exceed the slow query log threshold are logged to a separate log. Must have the slow query log enabled for this setting to have any effect. sql.log.slow_query.latency_threshold duration 0s when set to non-zero, log statements whose service latency exceeds the threshold to a secondary logger on each node sql.metrics.max_mem_app_reported_stmt_fingerprints integer 100000 the maximum number of reported statement fingerprints stored in memory
same as the other, any reason to add app back?
pkg/sql/exec_util.go, line 2472 at r2 (raw file):
} // Utility functions related scrubbing sensitive information on SQL Stats.
nit: ...related to scrubbing...
pkg/sql/exec_util.go, line 2474 at r2 (raw file):
// Utility functions related scrubbing sensitive information on SQL Stats. // quantizeCounts ensures that the counts are bucketed into "simple" values.
what "simple" means here?
pkg/sql/sqlstats/cluster_settings.go, line 24 at r2 (raw file):
).WithPublic() // TxnStatsNumStmtIDsToRecord limits the number of statementIDs stored for in
nit: stored in
pkg/sql/sqlstats/cluster_settings.go, line 57 at r2 (raw file):
).WithPublic() // SampleLogicalPlans specifies whether we periodically a logical plan for
nit: missing a word here. we periodically (sample ?) a logical plan...
Azhng
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @Azhng, and @maryliag)
docs/generated/settings/settings.html, line 84 at r2 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
didn't we remove
appon a previous PR? Any reason why adding it back?
Ah I must have missed this during my rebase. Fixed.
docs/generated/settings/settings-for-tenants.txt, line 82 at r2 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
same as the other, any reason to add
appback?
Done.
pkg/sql/exec_util.go, line 2474 at r2 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
what "simple" means here?
Hmm I copied this function from the old app_stats.go. I'm not exactly sure the meaning of "simple" here, but I think this function is trying to round to Count field in the StatementStatistics to the order of magnitude of 10s, (e.g. 425 -> 100, 3453 -> 1000, 67542 -> 10000) and recompute the statistics for the purpose of anonymity. I updated the comments to address this.
ajwerner
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @maryliag)
pkg/sql/sqlstats/sslocal/sslocal_provider.go, line 33 at r1 (raw file):
Do you mean the parts where we implement merging two appStats together by iterating through all of its entries? Or do you mean the new sqlstats.Reader interface?
Both, I think.
Are you suggesting that these data manipulation logic should be behind another layer of interface?
Yes. Let's look at a few different methods here:
IterateStatementStats: it iteratesappStatsand then internally iterates the keys (with optional sorting), what ifappStats` implemented:
IterateStatementStats(_ context.Context, sortedKeys bool, visitor sqlstats.StatementVisitor) error
I think that that whole method would become:
// IterateStatementStats implements sqlstats.Provider interface.
func (s *sqlStats) IterateStatementStats(
ctx context.Context, options *sqlstats.IteratorOptions, visitor sqlstats.StatementVisitor,
) error {
appNames := s.getAppNames(options.SortedAppNames)
for _, appName := range appNames {
appStats := s.getStatsForApplication(appName)
wrapped := func(stats *roachpb.CollectedStatementStatistics) error {
stats.appName = appName
return visitor(stats)
}
if err := appStats.IterateStatementStats(ctx, options.SortedKeys, wrapped); err != nil {
return err
}
}
return nil
}
Most of the reader implementations are like this, they either don't at all, or just barely depend on context that exists outside of the appStats struct.
I guess what I'm pitching is that that the appStats struct is very loosely coupled to the sqlStats struct. It also doesn't know that it has anything to do with the concept of an app, the word "app" doesn't appear in any variables or fields. Can we pull it apart and remove the word "app" from it entirely. It doesn't need to be an interface; start with a concrete struct, maybe StatsContainer? Then for the reportedProvider give that thing an interface like:
type AppStatAggregator interface {
AddAppStats(appName string, stats *StatsContainer)
}
The long and the short of it is, what happens if you try to move *appStats to a different package from *sqlStats and remove the word "app". I think nice things will fall out.
Azhng
left a comment
There was a problem hiding this comment.
@ajwerner I added another commit here to move the appStats to its own package (ssmemstorage) and renamed it to StatsContainer.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @maryliag)
maryliag
left a comment
There was a problem hiding this comment.
it's a little confusing understanding your separate commits, since you add things on the first (e.g. dumpStmtStats, getUnscrubbedTxnStats) and add remove it on second commit. Can you do some clean up on them to have just the final changes?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @maryliag)
ajwerner
left a comment
There was a problem hiding this comment.
Commit history cleanup notwithstanding, this feels much cleaner. How do you feel about it?
Reviewed 1 of 10 files at r5.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @Azhng and @maryliag)
pkg/sql/sqlstats/sslocal/sql_stats.go, line 143 at r5 (raw file):
if target != nil { appStatsCopy = make(map[string]*ssmemstorage.StatsContainer, len(s.mu.apps))
Why are we copying them out instead of just adding them to the target in the loop? It seems like doing it in-line would allow you to get rid of the shallow clone.
pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go, line 62 at r5 (raw file):
// StatsContainer holds per-application statistics. type StatsContainer struct {
nit: I don't think you need Stats as a prefix given the package name.
Azhng
left a comment
There was a problem hiding this comment.
Can you do some clean up on them to have just the final changes?
It was added initially because it was a simple code movement. During the refactor of the second commit that is moved again to a different file. I'm not exactly sure how I should clean that one up, so I ended up just squashing two commits into one. Since it was more confusing to look at two commits separately.
How do you feel about it?
Yeah this definitely feels a lot cleaner and we can test things more thoroughly with the new interfaces at the subsystem level once we get to implement the persisted SQL stats. I'm also following the same pattern for the upcoming index usage stats.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @Azhng, and @maryliag)
pkg/sql/sqlstats/sslocal/sql_stats.go, line 143 at r5 (raw file):
Previously, ajwerner wrote…
Why are we copying them out instead of just adding them to the target in the loop? It seems like doing it in-line would allow you to get rid of the shallow clone.
I think this was done to avoid deadlock, since resetAndMaybeDumpStats manages the sqlStats lock manually and getStatsForApplication call also lock onto sqlStats.
I refactored getStatsForApplication into a two functions so this can be called from a locked context.
pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go, line 62 at r5 (raw file):
Previously, ajwerner wrote…
nit: I don't think you need
Statsas a prefix given the package name.
Done.
ajwerner
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @Azhng, and @maryliag)
pkg/sql/sqlstats/sslocal/sql_stats.go, line 143 at r5 (raw file):
Previously, Azhng (Archer Zhang) wrote…
I think this was done to avoid deadlock, since
resetAndMaybeDumpStatsmanages thesqlStatslock manually andgetStatsForApplicationcall also lock ontosqlStats.I refactored
getStatsForApplicationinto a two functions so this can be called from a locked context.
If target == s then I see how target would be locked here but then I'm more confused about what we're doing. If target != s then where are we locking target?
9d682e0 to
e359a8e
Compare
Azhng
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @Azhng, and @maryliag)
pkg/sql/sqlstats/sslocal/sql_stats.go, line 143 at r5 (raw file):
Previously, ajwerner wrote…
If
target == sthen I see howtargetwould be locked here but then I'm more confused about what we're doing. Iftarget != sthen where are we lockingtarget?
Ah, oops. I think I confused myself. Yes, you are correct. target needs to be locked here. Updated.
8dad627 to
5db685a
Compare
|
Hmm seems like I made bazel very upset |
def9d8f to
03cf065
Compare
|
Ah seems like Bazel is happy now. |
5d9acf1 to
6a8101c
Compare
ajwerner
left a comment
There was a problem hiding this comment.
Nice work! Just one more set of small suggestions from me.
Reviewed 1 of 7 files at r2, 1 of 3 files at r4, 8 of 16 files at r6.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @Azhng, and @maryliag)
pkg/sql/sqlstats/ssprovider.go, line 31 at r6 (raw file):
type StatementWriter interface { // RecordStatement records statistics for a statement. RecordStatement(ctx context.Context, key *roachpb.StatementStatisticsKey, value *RecordedStmtStats) (roachpb.StmtID, error)
Why pointers for RecordedStmtStats and RecordedTxnStats below? I don't think they need to escape and thus incur allocations.
pkg/sql/sqlstats/sslocal/sslocal_provider.go, line 36 at r6 (raw file):
// TODO(azhng): reportedSQLStats is an unfortunate hack to implement telemetry // reporting. As we move to persisted SQL stats, we would no longer need this // since we can directly report from our system table. So delete the code
nit: extra space (also hopefully this comment can go).
pkg/sql/sqlstats/sslocal/sslocal_provider.go, line 47 at r6 (raw file):
resetInterval *settings.DurationSetting, reportedProvider sqlstats.Provider, ) sqlstats.Provider {
I think it'd be somewhere between fine and preferable for this to return a struct which you export (i.e. sqlStats to SQLStats). That way you can implement more interfaces on it if need be. Also, I think that you can make it so that the last argument here takes a more properly typed argument and avoid this whole type assertion/panic thing.
Now that you've reduced the interactions during dumping data to just this snippet in resetAndMaybeDumpStats:
targetStats := target.getStatsForApplication(appName)
// Container.Add() manages locks for itself, so we don't need to guard it
// with locks.
lastErr := targetStats.Add(ctx, statsContainer)What if you do:
type Sink interface {
AddAppStats(ctx context.Context, appName string, stats *ssmemstorage.Container) error
}which can be implemented on *SQLStats
Azhng
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @ajwerner, @Azhng, and @maryliag)
pkg/sql/sqlstats/ssprovider.go, line 31 at r6 (raw file):
Previously, ajwerner wrote…
Why pointers for
RecordedStmtStatsandRecordedTxnStatsbelow? I don't think they need to escape and thus incur allocations.
Both roachpb.StatementStatisticsKey and RecordedStmtStats are pretty big structs. I had pointers here to avoid copying. Does Golang optimize the copying away if I pass a struct literal to this function?
pkg/sql/sqlstats/sslocal/sslocal_provider.go, line 36 at r6 (raw file):
Previously, ajwerner wrote…
nit: extra space (also hopefully this comment can go).
Removed the TODO. We have a tracking issue now for telemetry reporting.
pkg/sql/sqlstats/sslocal/sslocal_provider.go, line 47 at r6 (raw file):
Previously, ajwerner wrote…
I think it'd be somewhere between fine and preferable for this to return a struct which you export (i.e.
sqlStatstoSQLStats). That way you can implement more interfaces on it if need be. Also, I think that you can make it so that the last argument here takes a more properly typed argument and avoid this whole type assertion/panic thing.Now that you've reduced the interactions during dumping data to just this snippet in
resetAndMaybeDumpStats:targetStats := target.getStatsForApplication(appName) // Container.Add() manages locks for itself, so we don't need to guard it // with locks. lastErr := targetStats.Add(ctx, statsContainer)What if you do:
type Sink interface { AddAppStats(ctx context.Context, appName string, stats *ssmemstorage.Container) error }which can be implemented on
*SQLStats
Done. It's definitely a lot cleaner now.
ajwerner
left a comment
There was a problem hiding this comment.
I certainly won't block on the struct vs. pointer discussion. My intuition is that I'd rather spend the CPU cycles copying something onto the stack than allocating a struct I know is both immutable and has a short lifetime (also also does not make its way very far down the stack).
Reviewed 4 of 30 files at r1, 2 of 4 files at r3, 1 of 10 files at r5, 6 of 16 files at r6, 3 of 3 files at r7.
Reviewable status:complete! 1 of 0 LGTMs obtained (waiting on @Azhng and @maryliag)
pkg/sql/sqlstats/ssprovider.go, line 31 at r6 (raw file):
Previously, Azhng (Archer Zhang) wrote…
Both
roachpb.StatementStatisticsKeyandRecordedStmtStatsare pretty big structs. I had pointers here to avoid copying. Does Golang optimize the copying away if I pass a struct literal to this function?
I think I care less about the copying than the allocation but don't care that much. The copying is exactly the same as when we had it as an argument to functions. Also, the fact that it's large makes it a larger allocation. Inside the lower layers of the call there's mid-stack inlining which can reduce the copying.
Azhng
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @Azhng, and @maryliag)
pkg/sql/sqlstats/ssprovider.go, line 31 at r6 (raw file):
Previously, ajwerner wrote…
I think I care less about the copying than the allocation but don't care that much. The copying is exactly the same as when we had it as an argument to functions. Also, the fact that it's large makes it a larger allocation. Inside the lower layers of the call there's mid-stack inlining which can reduce the copying.
I see. Sounds good, removed pointer here.
aec5583 to
6d0647f
Compare
maryliag
left a comment
There was a problem hiding this comment.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ajwerner and @Azhng)
pkg/sql/crdb_internal.go, line 963 at r8 (raw file):
tree.NewDInt(tree.DInt(stat.Stats.FirstAttemptCount)), // first_attempt_count tree.NewDInt(tree.DInt(stat.Stats.MaxRetries)), // max_retries errString, // last_error
nit: align the comment with the others
pkg/sql/crdb_internal.go, line 1072 at r8 (raw file):
tree.NewDString(stat.App), // application_name tree.NewDString(strconv.FormatUint(uint64(txnID), 10)), // key stmtIDsDatum, // statement_ids
nit: align comment
pkg/sql/crdb_internal.go, line 1134 at r8 (raw file):
nodeID, _ := p.execCfg.NodeID.OptionalNodeID() // zero if not available appTxnStatsVisitor := func(appName string, stat *roachpb.TxnStats) error {
nit: stats
pkg/sql/sqlstats/cluster_settings.go, line 66 at r8 (raw file):
// LogicalPlanCollectionPeriod specifies the interval between collections of // logical plans for each fingerpirnt.
nit: fingerprint
pkg/sql/sqlstats/cluster_settings.go, line 75 at r8 (raw file):
// MaxMemSQLStatsStmtFingerprints specifies the maximum of unique statement // fingerprint we store in memory.
nit: fingerprints
pkg/sql/sqlstats/cluster_settings.go, line 83 at r8 (raw file):
// MaxMemSQLStatsTxnFingerprints specifies the maximum of unique transaction // fingerprint we store in memory.
nit: fingerprints
pkg/sql/sqlstats/cluster_settings.go, line 91 at r8 (raw file):
// MaxMemReportedSQLStatsStmtFingerprints specifies the maximum of unique statement // fingerprint we store in memory.
nit: fingerprints
pkg/sql/sqlstats/cluster_settings.go, line 99 at r8 (raw file):
// MaxMemReportedSQLStatsTxnFingerprints specifies the maximum of unique transaction // fingerprint we store in memory.
nit: fingerprints
pkg/sql/sqlstats/sslocal/sql_stats.go, line 49 at r8 (raw file):
mon *mon.BytesMonitor // lastReset is the time at which the app containers were reset.
nit: is the last time
pkg/sql/sqlstats/sslocal/sql_stats.go, line 122 at r8 (raw file):
// resetAndMaybeDumpStats clears all the stored per-app, per-statement and // per-transaction statistics. If target s not nil, then the stats in s will be
nit: target s is not nil
pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go, line 128 at r8 (raw file):
} // IterateStatementStats iterators through the stored statement statistics
nit: iterates through
pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go, line 187 at r8 (raw file):
} // IterateTransactionStats iterators through the stored transaction statistics
nit: iterates
pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go, line 208 at r8 (raw file):
// pass nil for the statementIDs, as they are only set when a key is // constructed. txnStats, _, _ := s.getStatsForTxnWithKey(txnKey, nil /* stmtIDs */, false /* createIfNonexistent */)
stmtIDs or stmtFingerprintIDs?
pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go, line 232 at r8 (raw file):
} // IterateAggregatedTransactionStats iterators through the stored aggregated
nit: iterates
pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go, line 290 at r8 (raw file):
collectedStats := &roachpb.CollectedTransactionStatistics{ StatementIDs: txnStats.statementIDs,
you have several references to statementIDs on this PR, make sure you aligned with the changes from previous PR and use statement fingerprint id
pkg/sql/sqlstats/ssmemstorage/ss_mem_storage.go, line 324 at r8 (raw file):
type stmtStats struct { // ID is the statementID constructed using the stmtKey fields. ID roachpb.StmtID
same comment as before, check the use of stmtID
Azhng
left a comment
There was a problem hiding this comment.
Rebased from master to pick up renaming PR.
Reviewable status:
complete! 1 of 0 LGTMs obtained (waiting on @ajwerner, @Azhng, and @maryliag)
pkg/sql/crdb_internal.go, line 963 at r8 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
nit: align the comment with the others
Hmm this is formatted by the auto fomatter. I think we have similar formatting from before.
pkg/sql/crdb_internal.go, line 1072 at r8 (raw file):
Previously, maryliag (Marylia Gutierrez) wrote…
nit: align comment
ditto: auto formatter issue
Previously, SQL stats collection is implemented within the sql package. However, this is not ideal because it introduces a high-level coupling between the SQLServer and rest of subsystems within CRDB. As we move to persisted SQL stats, the SQL stats collection will experience an increase of complexity. This commit moves SQL stats collection into its own subsystem and introduced intra-subsystem interfaces to decouple the various components within the subsystem. Release note: None Closes cockroachdb#65699
maryliag
left a comment
There was a problem hiding this comment.
Reviewed 3 of 7 files at r8, 11 of 11 files at r9.
Reviewable status:complete! 2 of 0 LGTMs obtained (waiting on @ajwerner and @Azhng)
|
TFTR! bors r=ajwerner,maryliag |
|
Build succeeded: |
Previously, SQL stats collection is implemented within the sql
package. However, this is not ideal because it introduces a
high-level coupling between the SQLServer and rest of subsystems
within CRDB. As we move to persisted SQL stats, the SQL stats
collection will experience an increase of complexity.
This commit moves SQL stats collection into its own subsystem and
introduced intra-subsystem interfaces to decouple the various
components within the subsystem.
Release note: None
Closes #65699