feat: Adds statistics measurement for compact-throughput #26754
devanbenz merged 11 commits into master-1.x
Conversation
This PR is a work in progress that will calculate and return the compact-throughput limiter usage as a percentage.
Regarding the algorithm to be used, the expected output is the following:
```
❯ curl -s -XPOST 'http://localhost:8086/debug/vars' | grep "compact-throughput-usage"
"stats": {"compact-throughput-usage":42.1},
```
where 42.1 is the percentage in use.
cmd/influxd/run/server.go
Outdated
```go
	return fmt.Errorf("open points writer: %s", err)
}

s.Monitor.WithLimiter(s.TSDBStore.EngineOptions.CompactionThroughputLimiter)
```
I think I'm going to change this to WithCompactThroughputLimiter or something. Thinking about whether, in the future, we want to add other limiters to the metrics collection for /debug/vars. 🤔
davidby-influx left a comment:
This seems only to cover Burst and not the long term limit.
cmd/influxd/run/server.go
Outdated
```go
	return fmt.Errorf("open points writer: %s", err)
}

s.Monitor.WithLimiter(s.TSDBStore.EngineOptions.CompactionThroughputLimiter)
```
```go
m.RegisterDiagnosticsClient("network", &network{})
m.RegisterDiagnosticsClient("system", &system{})

if m.Limiter != nil {
```
Do we need a lock around the accesses to m.limiter?
Or are we assuming Open is safely single-threaded?
I believe Open should be single-threaded. It's called during Server.Open:
```go
for _, service := range s.Services {
	if err := service.Open(); err != nil {
		return fmt.Errorf("open service: %s", err)
	}
}
```
influxdb/cmd/influxd/run/server.go, line 425 in d64fc9e
Which is called from within Main.Run (lines 81 to 83 in d64fc9e).
```go
m.DeregisterDiagnosticsClient("network")
m.DeregisterDiagnosticsClient("system")
m.DeregisterDiagnosticsClient("stats")
m.DeregisterDiagnosticsClient("config")
```
Was this missing from a previous PR?
```go
// available = current tokens in the rate limiter bucket (can be negative when in debt)
// burst = maximum tokens the bucket can hold
// usage percentage = ((burst - available) / burst) * 100
func (s *stats) CompactThroughputUsage() float64 {
```
This only calculates the burst percentage (maximum tokens that can be requested). But we also need the overall percentage used of the limit over time (Limiter.Limit), don't we?
I can likely generate two fields:
- compact-throughput-burst-usage
- compact-throughput-usage

and return the usage of our burst limit and our standard limit?
I think the burst limiter is much less interesting; it's going to be highly variable moment to moment, and what I think Support wants is to answer the question: are we, on average, running up against our compaction throughput limit? Check with the FR author on the requirements.
Added a comment on the FR.
The complex part of this would be tracking a time window that's meaningful for grabbing our bytes per second. Running curl and getting this data at any moment in time would likely be meaningless without a moving average as far as I'm aware.
Will see what Andrew's input is
I believe the correct computation on this is:
```go
100*(1-rate.Limit(l.Tokens())/l.Limit())
```
See this Go Playground for an example, and how token debt is handled.
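The Playground link isn't reproduced here. As a stand-in, here is a self-contained sketch using a minimal token bucket (an illustrative stand-in, not the real `rate.Limiter` type) to show how the suggested formula behaves when the bucket goes into token debt:

```go
package main

import "fmt"

// bucket is a minimal token-bucket sketch standing in for rate.Limiter:
// limit is the sustained refill rate (tokens per second), burst the bucket
// capacity, and tokens the current balance, which may go negative ("debt")
// when reservations outpace the refill, just as rate.Limiter allows.
type bucket struct {
	limit  float64
	burst  float64
	tokens float64
}

func newBucket(limit, burst float64) *bucket {
	return &bucket{limit: limit, burst: burst, tokens: burst}
}

// reserve consumes n tokens immediately, allowing the bucket to go into debt.
func (b *bucket) reserve(n float64) { b.tokens -= n }

// usage mirrors the suggested computation 100*(1 - Tokens()/Limit()).
// When tokens are negative, the result exceeds 100%.
func (b *bucket) usage() float64 {
	return 100 * (1 - b.tokens/b.limit)
}

func main() {
	b := newBucket(1000, 1000) // e.g. 1000 B/s sustained, burst of 1000
	b.reserve(800)             // tokens: 1000 -> 200
	b.reserve(800)             // tokens: 200 -> -600 (debt)
	fmt.Printf("usage: %.1f%%\n", b.usage()) // prints "usage: 160.0%"
}
```

This illustrates why the reported percentage can climb above 100% while the limiter is paying off debt.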
Are we wanting the percentage to go over 100%, or be limited to 100%? The limiter should keep the actual usage more or less at or under 100%.
I think it would be okay to go over 100%.
I'm confused about how going above 100% should be interpreted. Is there an EAR or feature request that drove this feature? Maybe if I see that it will make more sense to me.
gwossum left a comment:
The truncation to 2 decimal places could be improved. Also have a question about going over 100% on the usage.
monitor/stats.go
Outdated
```go
i := fmt.Sprintf("%.2f", compactThroughputUsage)
compactThroughputUsageTrunc, err := strconv.ParseFloat(i, 2)
if err != nil {
	return nil, err
}
```
Rounding the percentage to 2 decimal places is an excellent idea! The less significant digits of a floating point number are usually noise. This is not the best way to round it, though. Rounding to 2 decimal places would be:
```go
compactThroughputUsageTrunc := math.Round(compactThroughputUsage * 100.0) / 100.0
```
This is less work and memory allocation than converting to a string and back. It also allows more control over how we round than using fmt.Sprintf. For instance, math.Round could be changed to math.Ceil or math.Floor if we wanted to change the rounding direction.
gwossum left a comment:
LGTM. The floating point calculations could be further refined, but it's overkill for what we're doing here.
This PR adds the following as an available `/debug/vars` field.
```
"stats" : { "compact-throughput-usage-percentage" : <percentage used> }
```
This will show the current compaction throughput usage with respect to the available limiter tokens, as defined by the Go rate limiter.
https://pkg.go.dev/golang.org/x/time/rate#Limiter.Limit
https://pkg.go.dev/golang.org/x/time/rate#Limiter.Tokens
The algorithm for finding our usage is the following:
```
percentage = 100 * (1 - (tokens) / limit)
```
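Putting the pieces together, here is an illustrative sketch of how the reported field could be assembled from that formula plus the 2-decimal rounding suggested in review (the helper name and sample numbers are assumptions, not the actual monitor/stats.go code):

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// usagePercent applies percentage = 100 * (1 - tokens/limit), then rounds
// to 2 decimal places as suggested in review. (Hypothetical helper.)
func usagePercent(tokens, limit float64) float64 {
	p := 100 * (1 - tokens/limit)
	return math.Round(p*100) / 100
}

func main() {
	// Suppose the limiter allows 48 MB/s and currently holds 27.8 MB of tokens.
	stats := map[string]float64{
		"compact-throughput-usage-percentage": usagePercent(27.8e6, 48e6),
	}
	out, _ := json.Marshal(map[string]any{"stats": stats})
	fmt.Println(string(out))
	// prints {"stats":{"compact-throughput-usage-percentage":42.08}}
}
```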
(cherry picked from commit 879e34a)