Skip to content
This repository was archived by the owner on Aug 23, 2023. It is now read-only.

use pointslicepool for /getdata cluster-fan out requests#1921

Merged
robert-milan merged 3 commits intomasterfrom
getdata-use-pointslicepool
Oct 21, 2020
Merged

use pointslicepool for /getdata cluster-fan out requests#1921
robert-milan merged 3 commits intomasterfrom
getdata-use-pointslicepool

Conversation

@Dieterbe
Copy link
Copy Markdown
Contributor

@Dieterbe Dieterbe commented Oct 14, 2020

Note: this builds on top of #1923, review/merge that first.

when handling a user query, fanning out said query, and reading back the /getdata responses from peers, this will use the pointslicepool to decode incoming point slices into and should reduce memory usage. see #962

for review purposes, the generated code is identical except for the calls to pointslicepool.GetMin instead of make

@Dieterbe
Copy link
Copy Markdown
Contributor Author

NOTE: i'm still working on some tests to confirm improvements

@Dieterbe Dieterbe force-pushed the getdata-use-pointslicepool branch from 1242387 to de35bcf Compare October 15, 2020 08:16
@Dieterbe
Copy link
Copy Markdown
Contributor Author

Dieterbe commented Oct 15, 2020

i spun up docker-cluster-query with 4 query pods.
all mt read/write processes and q0 (query node 0) run master
q1 runs master + stats (#1922)
q2 runs master + stats + nudge fix (#1923)
q3 runs master + stats + nudge fix + /getdata pointslicepool (this branch)

I ran mt-fakemetrics backfill --kafka-mdm-addr localhost:9092 --offset 48h --period 10s --speedup 100 --mpo 1000 and then

cat ./even-load.sh
#!/bin/bash

q0() {
echo 'GET http://localhost:6060/render?target=some.id.of.a.metric.123*&from=-48h' | vegeta attack -rate 25 | vegeta report
}
q1() {
echo 'GET http://localhost:6061/render?target=some.id.of.a.metric.123*&from=-48h' | vegeta attack -rate 25 | vegeta report
}
q2() {
echo 'GET http://localhost:6062/render?target=some.id.of.a.metric.123*&from=-48h' | vegeta attack -rate 25 | vegeta report
}
q3() {
echo 'GET http://localhost:6063/render?target=some.id.of.a.metric.123*&from=-48h' | vegeta attack -rate 25 | vegeta report
}

q0 &
q1 &
q2 &
q3

I ran it for more than an hour.
clearly q3 wins in terms of memory usage. and q2 also does better than q1 and q0.
note the candidate hit stats of the pointslicepool lower left, showing the effect of these PR's in action.
comparison

dashboard is here

@Dieterbe
Copy link
Copy Markdown
Contributor Author

here are the 3 different PSP behaviors
first one suffers from the nudging bug, which the 2nd one fixes. but we only do half the gets on the pool as puts because upon decoding we always allocate fresh.
only for q3 do we pull the same rate in and out of the pool. with the occassional miss which i presume is due to GC
psp

Copy link
Copy Markdown
Contributor

@robert-milan robert-milan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to manually maintain Series msgp encoding / decoding etc... ?

@Dieterbe
Copy link
Copy Markdown
Contributor Author

because we use custom code to use the slice pool. perhaps it can be done cleaner with the msgp extension system but i didn't want to spend too much time on it.

Copy link
Copy Markdown
Contributor

@robert-milan robert-milan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@robert-milan robert-milan merged commit 8f037dd into master Oct 21, 2020
@robert-milan robert-milan deleted the getdata-use-pointslicepool branch October 21, 2020 13:08
@Dieterbe Dieterbe mentioned this pull request Oct 26, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants