use pointslicepool for /getdata cluster-fan out requests by Dieterbe · Pull Request #1921 · grafana/metrictank

Dieterbe · 2020-10-14T17:11:58Z

Note: this builds on top of #1923; review/merge that first.

When handling a user query, fanning it out, and reading back the /getdata responses from peers, query nodes will now decode the incoming point slices into slices taken from the pointslicepool instead of freshly allocated ones. This should reduce memory usage. See #962.

For review purposes: the generated code is identical, except that it calls pointslicepool.GetMin instead of make.
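The sketch below illustrates what that one-line difference means for a decoder. It is not the metrictank code. The wire format (a uint32 length prefix followed by 12-byte value/timestamp pairs), the `Point` type, and the `slicePool` type with its `GetMin`/`Put` methods are all hypothetical stand-ins that only mimic the shape of `pointslicepool.GetMin`. The decoding loop is the same either way. Only the origin of the destination slice changes.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"math"
)

// Point is a hypothetical stand-in for schema.Point.
type Point struct {
	Val float64
	Ts  uint32
}

// slicePool is a hypothetical, simplified pool backed by a free list.
// It is NOT metrictank's pointslicepool; it only mimics GetMin/Put.
type slicePool struct {
	free [][]Point
}

// GetMin returns an empty slice with capacity >= minCap, reusing a pooled
// slice when one is large enough and allocating a new one otherwise.
func (p *slicePool) GetMin(minCap int) []Point {
	for i := len(p.free) - 1; i >= 0; i-- {
		if cap(p.free[i]) >= minCap {
			s := p.free[i]
			p.free = append(p.free[:i], p.free[i+1:]...)
			return s[:0]
		}
	}
	return make([]Point, 0, minCap)
}

// Put hands a slice back to the pool for later reuse.
func (p *slicePool) Put(s []Point) { p.free = append(p.free, s[:0]) }

const pointSize = 12 // 8 bytes float64 value + 4 bytes uint32 timestamp

// encode writes a length prefix followed by the points (hypothetical format).
func encode(pts []Point) []byte {
	buf := make([]byte, 4+len(pts)*pointSize)
	binary.BigEndian.PutUint32(buf, uint32(len(pts)))
	for i, pt := range pts {
		off := 4 + i*pointSize
		binary.BigEndian.PutUint64(buf[off:], math.Float64bits(pt.Val))
		binary.BigEndian.PutUint32(buf[off+8:], pt.Ts)
	}
	return buf
}

// decode reads the points into a pooled slice when pool != nil,
// or into a freshly allocated slice otherwise.
func decode(b []byte, pool *slicePool) ([]Point, error) {
	if len(b) < 4 {
		return nil, errors.New("short buffer")
	}
	n := int(binary.BigEndian.Uint32(b))
	b = b[4:]
	if len(b) < n*pointSize {
		return nil, errors.New("truncated points")
	}
	var out []Point
	if pool != nil {
		out = pool.GetMin(n) // after: reuse a slice from the pool
	} else {
		out = make([]Point, 0, n) // before: always allocate fresh
	}
	for i := 0; i < n; i++ {
		off := i * pointSize
		out = append(out, Point{
			Val: math.Float64frombits(binary.BigEndian.Uint64(b[off:])),
			Ts:  binary.BigEndian.Uint32(b[off+8:]),
		})
	}
	return out, nil
}

func main() {
	pool := &slicePool{}
	pool.Put(make([]Point, 0, 8)) // a slice released by an earlier request
	pts, err := decode(encode([]Point{{1.5, 10}, {2.5, 20}}), pool)
	if err != nil {
		panic(err)
	}
	// The decoded slice reuses the pooled backing array (capacity 8).
	fmt.Println(pts, cap(pts), len(pool.free))
}
```

The decoder allocates nothing for the points themselves whenever the pool already holds a large enough slice. That is where the memory saving on the query node would come from.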

Dieterbe · 2020-10-14T22:26:05Z

NOTE: I'm still working on some tests to confirm the improvements.

Dieterbe · 2020-10-15T16:15:19Z

I spun up docker-cluster-query with 4 query pods.
All mt read/write processes and q0 (query node 0) run master.
q1 runs master + stats (#1922)
q2 runs master + stats + nudge fix (#1923)
q3 runs master + stats + nudge fix + /getdata pointslicepool (this branch)

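I ran mt-fakemetrics backfill --kafka-mdm-addr localhost:9092 --offset 48h --period 10s --speedup 100 --mpo 1000 and then ran the following script, which sends the same /render query to all 4 query nodes concurrently, at 25 requests/s each: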

cat ./even-load.sh
#!/bin/bash

q0() {
echo 'GET http://localhost:6060/render?target=some.id.of.a.metric.123*&from=-48h' | vegeta attack -rate 25 | vegeta report
}
q1() {
echo 'GET http://localhost:6061/render?target=some.id.of.a.metric.123*&from=-48h' | vegeta attack -rate 25 | vegeta report
}
q2() {
echo 'GET http://localhost:6062/render?target=some.id.of.a.metric.123*&from=-48h' | vegeta attack -rate 25 | vegeta report
}
q3() {
echo 'GET http://localhost:6063/render?target=some.id.of.a.metric.123*&from=-48h' | vegeta attack -rate 25 | vegeta report
}

q0 &
q1 &
q2 &
q3

I ran it for more than an hour.
Clearly q3 wins in terms of memory usage, and q2 also does better than q1 and q0.
Note the candidate hit stats of the pointslicepool in the lower left, showing the effect of these PRs in action.

The dashboard is here.

Dieterbe · 2020-10-15T16:26:18Z

Here are the 3 different PSP (pointslicepool) behaviors.
The first one suffers from the nudging bug, which the second one fixes. But even then we only do half as many gets on the pool as puts, because upon decoding we always allocate fresh slices.
Only for q3 do we take slices out of the pool at the same rate as we put them in, with the occasional miss, which I presume is due to GC.

This should help reduce memory usage in query nodes when they decode the /getdata responses coming back from their peers.
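The get/put imbalance described above can be reproduced with a toy model. This is a hypothetical sketch, not metrictank code, and it assumes a simplified workload: each request reads one series locally (its slice comes from the pool) and decodes one series from a peer's /getdata response, and both slices are returned to the pool when the request is done. If decoding allocates fresh slices, only the local read does a get, so gets are half of puts. If decoding also pulls from the pool, gets and puts match. The free-list pool here never drops entries, so it shows no GC-related misses. The real pool may be backed by sync.Pool (an assumption in this sketch), whose items can be removed at any time, which would fit the occasional miss seen for q3.

```go
package main

import "fmt"

// Point is a hypothetical stand-in for schema.Point.
type Point struct {
	Val float64
	Ts  uint32
}

// countingPool is a hypothetical free-list pool that counts its traffic.
type countingPool struct {
	free                     [][]Point
	gets, hits, misses, puts int
}

// GetMin reuses a pooled slice with enough capacity (hit) or allocates (miss).
func (p *countingPool) GetMin(minCap int) []Point {
	p.gets++
	for i := len(p.free) - 1; i >= 0; i-- {
		if cap(p.free[i]) >= minCap {
			s := p.free[i]
			p.free = append(p.free[:i], p.free[i+1:]...)
			p.hits++
			return s[:0]
		}
	}
	p.misses++
	return make([]Point, 0, minCap)
}

// Put returns a slice to the pool.
func (p *countingPool) Put(s []Point) {
	p.puts++
	p.free = append(p.free, s[:0])
}

// simulate runs n requests. Each one takes a slice for a local series from
// the pool, gets a slice for a decoded remote series either from the pool
// or via make, and finally puts both slices back.
func simulate(n, points int, decodeFromPool bool) *countingPool {
	p := &countingPool{}
	for i := 0; i < n; i++ {
		local := p.GetMin(points)
		var remote []Point
		if decodeFromPool {
			remote = p.GetMin(points)
		} else {
			remote = make([]Point, 0, points)
		}
		// ... points would be filled in and processed here ...
		p.Put(local)
		p.Put(remote)
	}
	return p
}

func main() {
	for _, fromPool := range []bool{false, true} {
		p := simulate(100, 1000, fromPool)
		fmt.Printf("decodeFromPool=%v gets=%d puts=%d hits=%d misses=%d pooled=%d\n",
			fromPool, p.gets, p.puts, p.hits, p.misses, len(p.free))
	}
}
```

With fresh allocation, every request returns one more slice than it takes, so this toy pool ends up holding 101 slices after 100 requests (100 gets vs 200 puts). With pooled decoding it settles at 2 slices, with 200 gets and 200 puts.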

robert-milan

Why do we need to manually maintain the Series msgp encoding/decoding etc.?

Dieterbe · 2020-10-21T13:05:33Z

Because we use custom code to use the slice pool. Perhaps it can be done more cleanly with the msgp extension system, but I didn't want to spend too much time on it.

robert-milan

LGTM

Dieterbe force-pushed the getdata-use-pointslicepool branch from 1242387 to de35bcf on October 15, 2020 08:16

Dieterbe added 3 commits October 15, 2020 23:27

give api/models access to the pointslicepool

88825a9

make Series msgp encoding manually maintained

4129317

make []schema.Point msgp decoding use the pointslicepool

ec83e22

this should help reduce memory usage in query nodes when they decode /getdata requests coming back

Dieterbe force-pushed the getdata-use-pointslicepool branch from de35bcf to ec83e22 on October 15, 2020 21:28

Dieterbe requested a review from robert-milan October 15, 2020 21:28

Dieterbe mentioned this pull request Oct 15, 2020

Getdata use pointslicepool and getdata return to pool #1924

Merged

Dieterbe added this to the sprint-18 milestone Oct 16, 2020

robert-milan reviewed Oct 20, 2020

robert-milan approved these changes Oct 21, 2020

robert-milan merged commit 8f037dd into master Oct 21, 2020

robert-milan deleted the getdata-use-pointslicepool branch October 21, 2020 13:08

Dieterbe mentioned this pull request Oct 26, 2020

pointSlicePool issues #962

Closed

robert-milan merged 3 commits into master from getdata-use-pointslicepool
