Conversation
shanson7
left a comment
There was a problem hiding this comment.
Would it be possible to add to the tests that this function doesn't modify the inputs?
| if len(totals) == 1 { | ||
| totalsSerie = totals[0] | ||
| } else if len(totals) == len(series) { | ||
| sort.Slice(series, func(i, j int) bool { |
There was a problem hiding this comment.
Maybe a comment about why there are sorted.
| return nil, err | ||
| } | ||
|
|
||
| var outSeries []models.Series |
There was a problem hiding this comment.
Could you do a
defer cache[Req{}] = append(cache[Req{}], outSeries...)
here instead of all the individual cache steps?
There was a problem hiding this comment.
very nice, I like it
There was a problem hiding this comment.
defer args are evaluated at call time, so this won't work as expected.
edit: looks like i'm wrong: https://play.golang.org/p/t8RZ1v8fLXT
There was a problem hiding this comment.
You forgot to call foo() in your snippet.
see https://play.golang.org/p/Qh4rp1p2q73
expr/func_aspercent.go
Outdated
| serie1.Target = fmt.Sprintf("asPercent(%s,%s)", serie1.Target, serie2.Target) | ||
| serie1.Tags = map[string]string{"name": serie1.Target} | ||
| for i := range serie1.Datapoints { | ||
| serie1.Datapoints[i].Val = computeAsPercent(serie1.Datapoints[i].Val, serie2.Datapoints[i].Val) |
There was a problem hiding this comment.
I think this violates https://github.com/grafana/metrictank/blob/master/expr/NOTES#L35
expr/func_aspercent.go
Outdated
| serie1.Target = fmt.Sprintf("asPercent(%s,%s)", serie1.Target, serie2.Target) | ||
| serie1.Tags = map[string]string{"name": serie1.Target} | ||
| for i := range serie1.Datapoints { | ||
| serie1.Datapoints[i].Val = computeAsPercent(serie1.Datapoints[i].Val, serie2.Datapoints[i].Val) |
There was a problem hiding this comment.
I think this violates https://github.com/grafana/metrictank/blob/master/expr/NOTES#L35
| } else { | ||
| totalVal = s.totalFloat | ||
| } | ||
| serie.Datapoints[i].Val = computeAsPercent(serie.Datapoints[i].Val, totalVal) |
There was a problem hiding this comment.
I think this violates https://github.com/grafana/metrictank/blob/master/expr/NOTES#L35
There was a problem hiding this comment.
fixed all and added a test
expr/func_aspercent_test.go
Outdated
| // Test if original series was modified | ||
| for i, orig := range originalSeries { | ||
| inSerie := in[i] | ||
| if orig.Target != inSerie.Target { |
There was a problem hiding this comment.
Could you use something like
Line 186 in 0428430
There was a problem hiding this comment.
No, because the original series does get modified by metrictank (a "name" tag gets added). Makes sense to compare relevant values only.
There was a problem hiding this comment.
Scratch that. the real problem was that reflect.DeepEqual(math.NaN(), math.NaN()) == false, which is not the case when comparing series
expr/func_aspercent_test.go
Outdated
| originalSeries[i].Interval = serie.Interval | ||
| originalSeries[i].QueryPatt = serie.QueryPatt | ||
| originalSeries[i].Target = serie.Target | ||
| originalSeries[i].Datapoints = getCopy(serie.Datapoints) |
There was a problem hiding this comment.
maybe just
originalSeries[i] = serie
originalSeries[i].Datapoints = getCopy(serie.Datapoints)
Gets all the things like tags etc.
expr/func_aspercent.go
Outdated
| totalsSerie = totals[0] | ||
| } else if len(totals) == len(series) { | ||
| // Sorted to match the input series with the total series based on Target. | ||
| // Mimicks Graphite's implementation |
There was a problem hiding this comment.
Nit, but 'Mimic' the common modern spelling
expr/func_aspercent.go
Outdated
|
|
||
| func copyDatapoints(serie *models.Series) { | ||
| out := pointSlicePool.Get().([]schema.Point) | ||
| for _, p := range serie.Datapoints { |
There was a problem hiding this comment.
out = append(out, serie.Datapoints...)
|
Note: this requires Go version 1.10+ because it uses math.Round() @shanson7 any outstanding comments? |
|
Any chance this can get merged to master? |
…an optional argument
|
do you plan to make more changes or dyou consider this complete? |
|
This is good to go. I just rebased it to make the merge cleaner. And I didn't use much, just added an additional argument type (that you originally made). It would be a lot of work to reuse the old commits with little to no payoff imo, since I rewrote most of it. |
| for _, argExp = range argsExp { | ||
| if pos >= len(e.args) { | ||
| break // no more args specified. we're done. | ||
| } |
There was a problem hiding this comment.
the removal of cutoff here is this safe?
There was a problem hiding this comment.
Yes. Before it assumed that series arguments can only be non-optional arguments and in the beginning. i.e function(serie, int, string, opt=string).
With my change the following is possible:
function(serie, int, serie, opt=serie)
Like before, this part only consumes series, and just skips over everything else.
There was a problem hiding this comment.
looks good :) the argument consumption/iterating code is probably my least favorite code of MT. FWIW.
if you can handle this, you can handle anything else 👯♂️
| totalSeriesLists := groupSeriesByKey(totals, s.nodes, &keys) | ||
| totalSeries = getTotalSeries(totalSeriesLists) | ||
| } else { | ||
| return nil, errors.New("total must be None or a seriesList") |
There was a problem hiding this comment.
seems like it's fairly easy to trigger this by specifying a serieslist pattern that doesn't match any series.
maybe in that case we can provide a better error message? (basically above at if len(totals) == 0 { maybe directly return an error ? what does graphite do in this case?
There was a problem hiding this comment.
I'm not sure what you mean here. This error message gets triggered if they pass in a number. In the case where they pass in a nodes argument, only a series or nothing should be passed in for the total argument. First branch is triggered if neither is passed in. Second branch is triggered if a series is passed in. Which leaves the case where a number is passed in (i.e math.IsNaN(s.totalFloat) = false)
Graphite throws that exact error message is that case.
There was a problem hiding this comment.
if you specify a serieslist pattern that doesn't match any series, then I believe this will happen:
if s.totalSeries != nil {
totals, err = s.totalSeries.Exec(cache)
if err != nil {
return nil, err
}
if len(totals) == 0 {
totals = nil <---- this right here
}
}
There was a problem hiding this comment.
FWIW, Graphite crashes if you do that.
Let me see if I can handle it in some clean way
There was a problem hiding this comment.
Ok, so this was just unnecessary:
if len(totals) == 0 {
totals = nil
}
So I removed it. Now if a series returns empty, it does not assume that there were no arguments
| return context | ||
| } | ||
|
|
||
| func (s *FuncAsPercent) Exec(cache map[Req][]models.Series) ([]models.Series, error) { |
There was a problem hiding this comment.
this function is rather complex. can we split it up in smaller functions?
expr/func_aspercent.go
Outdated
| return outSeries, err | ||
| } | ||
|
|
||
| func (s *FuncAsPercent) execWithNodes(series []models.Series, totals []models.Series) ([]models.Series, error) { |
There was a problem hiding this comment.
series, totals []models.Series. also the other function
expr/func_aspercent.go
Outdated
| } | ||
| } | ||
|
|
||
| func deepCopySerieElements(serie *models.Series) { |
There was a problem hiding this comment.
this API is a bit strange. would it not make more sense to return a new series that is a deep copy?
even though that is slightly more work, seems worth it.
in fact, maybe add a Copy() method to the serie type in the models package
There was a problem hiding this comment.
also looking at sumSeries, the copying/newly-allocating seems a bit too eager.
we should only need to allocate point slices:
- for the totals slice: when we need to sum up values and we need a place to store the totals (not when totals results in a single series, then we can just read from it)
- for each output series.(if it is different from the input slice)
There was a problem hiding this comment.
The problem with # 1 is that later I modify the total series in some cases such as this (line 99):
// No series for total series
if _, ok := metaSeries[key]; !ok {
serie2 := totalSeries[key]
serie2.QueryPatt = fmt.Sprintf("asPercent(MISSING,%s)", serie2.QueryPatt)
serie2.Target = fmt.Sprintf("asPercent(MISSING,%s)", serie2.Target)
serie2.Tags = map[string]string{"name": serie2.Target}
for i := range serie2.Datapoints {
serie2.Datapoints[i].Val = math.NaN()
}
outSeries = append(outSeries, serie2)
continue
}
I guess I could make a copy right there, not sure which one would be better.
There was a problem hiding this comment.
I'll be mostly afk until Thursday.
Please read expr/NOTES carefully if you have not already done so. All this complexity is to make sure we never overwrite data in the memory AggMetric, chunk cache etc.
Some of these fields are by value so harmless but the tags map is interesting as well as it might open an avenue to modify the tags in the MemoryIdx. Not sure if we currently take that into account everywhere or maybe we already have a provision for that. On phone so can't check right now
There was a problem hiding this comment.
Ok, well I added a Series.Copy(emptyDatapoints []schema.Point) function (the argument is there so that pointSlicePool can be used if needed). And I only copy the series if I modify values that shouldn't be modified.
Hopefully that's sufficient and what you were looking for 😃
|
some unit tests for ArgIn would be nice. |
| {Val: float64(199) * 100, Ts: 30}, | ||
| {Val: float64(29) / 2 * 100, Ts: 40}, | ||
| {Val: float64(80) / 3.0 * 100, Ts: 50}, | ||
| {Val: float64(250) / 4 * 100, Ts: 60}, |
There was a problem hiding this comment.
shouldn't out2 be the same as in the previous function? because both cases have asPercent(d,a)
There was a problem hiding this comment.
No, because when totals is a seriesList with len(totals) == len(series), the series first get sorted by tag before getting matched. So, in this test case it would be asPercent(d,c) and asPercent(b,a). This behavior is same in Graphite
| totalVal = totalsSerie.Datapoints[i].Val | ||
| } else { | ||
| totalVal = s.totalFloat | ||
| } |
There was a problem hiding this comment.
isn't totalFloat pretty much guaranteed to be NaN here?
There was a problem hiding this comment.
No, because totalFloat is still a valid option here. (the else part of the if statement right before this)
| for i := range serie.Datapoints { | ||
| var totalVal float64 | ||
| if len(totalsSerie.Datapoints) > i { | ||
| totalVal = totalsSerie.Datapoints[i].Val |
There was a problem hiding this comment.
we may want to always just assume this branch is taken and just write this code line directly without the if/else.
that way if we ever have a bug where len(totalSerie.Datapoints) != len(serie.Datapoints) we can troubleshoot it instead of trying to hide such error case and returning incorrect data.
There was a problem hiding this comment.
this is more of a check of whether there is a totalSeries at all. I'll change it to > 0, that way we get an index out of range exception if there's a bug.
expr/func_aspercent.go
Outdated
| sort.Slice(totals, func(i, j int) bool { | ||
| return totals[i].Target < totals[j].Target | ||
| }) | ||
| for i, serie1 := range series { |
There was a problem hiding this comment.
lots of duplication here wrt the similar code block further down.
can't we do all the if/else stuff here to just set up the right totals and series variables,
and then do the same processing at the end irrespective of which scenario it was?
| } else if totals != nil { | ||
| totalSeriesLists := groupSeriesByKey(totals, s.nodes, &keys) | ||
| totalSeries = getTotalSeries(totalSeriesLists) | ||
| } |
There was a problem hiding this comment.
- we compute totals even for keys we won't need (e.g. keys not in the input series)
- for each missing case we repeatedly get a slice and fill it with NaN's. we could reuse the same slice in this case. my suggestion would be declare a
var nones []schema.Point. then whenever you need it, if it's nil, instantiate it. if it's not, just reuse it.
There was a problem hiding this comment.
I don't think reusing a nones slice would play nicely with the pointSlicePool, would it?
There was a problem hiding this comment.
oh right, because at the end the same slice would get added to the pool multiple times, which is bad. so nevermind that then.
There was a problem hiding this comment.
On phone now
best would be to just add the slice to the cache thing once I think (that way we can reuse them)
|
hi @stivenbb my first pass of review comments is now done :p let me know when you've addressed everything or if you have any questions. |
|
@Dieterbe ok, I think I've addressed all the requested changes. Let me know if I missed something |
|
|
math.Round |
|
I think the last things to do here are :
|
* easier to follow code * consistency with execWitNodes * bugfix: pool-obtained sumseries should be recycled later
|
Ok, just cherry picked. Will work on not computing totals when unnecessary. |
|
oh i realized my code |
|
cassandra test seems to be failing after I cherry-picked... Do you know what that could be about? |
|
looks like a flakey test. rerunning tests should fix it. but circleCI is not letting me. can you retrigger on https://circleci.com/workflow-run/ba5e3453-48a0-4731-9c34-22381695eb63 ? if not, your next push will. |
|
Ok, should be good to go now. LMK if my change looks good. |
|
great work @stivenbb, thank you very much for your work on this. |
Native implementation of asPercent() Graphite function. (http://graphite.readthedocs.io/en/latest/functions.html#graphite.render.functions.asPercent)
Added a new argument type
ArgInthat allows multiple other argument types. This was necessary for the total argument. Some of the code borrowed from an abandoned PR: #672In terms of speed improvement:
On average, the native implementation was 11x faster, median was 4x faster, p95 was 14x faster, p99 was 14x faster and max was over 10x faster