Labels: Reduce memory usage for common labels #16588
Open
prymitive wants to merge 4 commits into prometheus:main
Since labels_stringlabels.go is now the default implementation, we should rename the label files to make this more obvious.
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
These functions are specific to the default labels implementation (formerly known as stringlabels).
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
stringlabels stores all time series labels as a single string using this format:
`<length><name><length><value>[<length><name><length><value> ...]`
So a label set for my_metric{job="foo", instance="bar", env="prod", blank=""} would be encoded as:
`[8]__name__[9]my_metric[3]job[3]foo[8]instance[3]bar[3]env[4]prod[5]blank[0]`
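A minimal Go sketch of this layout, assuming each length is written as an unsigned varint via `encoding/binary` (so any string shorter than 128 bytes gets a one-byte prefix, matching the example above). The `label` type and the `encode`/`decode` functions are hypothetical illustrations, not the Prometheus `labels` API.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// label is a hypothetical name/value pair used only for this sketch.
type label struct{ Name, Value string }

// appendString writes <length><bytes> for a single string.
func appendString(buf []byte, s string) []byte {
	buf = binary.AppendUvarint(buf, uint64(len(s)))
	return append(buf, s...)
}

// encode packs every label as <len><name><len><value> into one string.
func encode(ls []label) string {
	var buf []byte
	for _, l := range ls {
		buf = appendString(buf, l.Name)
		buf = appendString(buf, l.Value)
	}
	return string(buf)
}

// decode walks the packed string and rebuilds the label pairs.
func decode(data string) []label {
	var out []label
	b := []byte(data)
	readString := func() string {
		n, k := binary.Uvarint(b)
		s := string(b[k : k+int(n)])
		b = b[k+int(n):]
		return s
	}
	for len(b) > 0 {
		name := readString()
		value := readString()
		out = append(out, label{name, value})
	}
	return out
}

func main() {
	ls := []label{
		{"__name__", "my_metric"}, {"job", "foo"}, {"instance", "bar"},
		{"env", "prod"}, {"blank", ""},
	}
	enc := encode(ls)
	fmt.Println(len(enc))             // 56
	fmt.Println(decode(enc)[1].Value) // foo
}
```

Every length here fits in one byte, so the cost is 10 prefix bytes plus 46 bytes of text.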
This is a huge improvement over the 'classic' labels implementation, which stores every label name and value as a separate string. There is still some room for improvement, though, since some strings appear far more often than others. For example, `__name__` is present in the label set of every time series we store in HEAD, costing 1+8=9 bytes each time. Since `__name__` is a well-known string, we can store it as a single byte in the encoded string instead of repeating it in full.

To store strings shortened to a single byte, we need to signal this to the reader of the encoded string. For that we rely on the fact that zero-length strings are rare and generally not stored on time series. A string whose encoded length is zero now signals a mapped value: to learn its true value, the reader reads the next byte, which is an index into a static mapping. That mapping must include the empty string, so that empty strings can still be encoded under this scheme.
Example of our mapping (minimal version):
```
0: ""
1: "__name__"
2: "instance"
3: "job"
```
With that mapping our example label set would be encoded as:
`[0]1[9]my_metric[0]3[3]foo[0]2[3]bar[3]env[4]prod[5]blank[0]0`
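That is 41 bytes instead of 56: mapping `__name__`, `instance` and `job` saves 16 bytes, while the empty value of `blank` now takes one extra byte (`[0]0` instead of `[0]`).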
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
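The mapped encoding described above can be sketched in Go as follows. This assumes varint lengths as in plain stringlabels and hard-codes the minimal example mapping; `commonSymbols`, `appendSymbol` and `readSymbol` are hypothetical names, not the PR's actual code. Because the index is a single byte, a table like this can address at most 256 entries.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// label is a hypothetical name/value pair used only for this sketch.
type label struct{ Name, Value string }

// commonSymbols is the minimal example mapping; index 0 must be the empty
// string so that empty values remain encodable.
var commonSymbols = []string{"", "__name__", "instance", "job"}

// symbolIndex is the reverse lookup built from commonSymbols.
var symbolIndex = func() map[string]byte {
	m := make(map[string]byte, len(commonSymbols))
	for i, s := range commonSymbols {
		m[s] = byte(i)
	}
	return m
}()

// appendSymbol writes [0]<index> for a mapped string, else <length><bytes>.
func appendSymbol(buf []byte, s string) []byte {
	if idx, ok := symbolIndex[s]; ok {
		return append(buf, 0, idx)
	}
	buf = binary.AppendUvarint(buf, uint64(len(s)))
	return append(buf, s...)
}

// readSymbol reverses appendSymbol: a zero length means the next byte is an
// index into commonSymbols.
func readSymbol(b []byte) (string, []byte) {
	n, k := binary.Uvarint(b)
	b = b[k:]
	if n == 0 {
		return commonSymbols[b[0]], b[1:]
	}
	return string(b[:int(n)]), b[int(n):]
}

func encode(ls []label) string {
	var buf []byte
	for _, l := range ls {
		buf = appendSymbol(buf, l.Name)
		buf = appendSymbol(buf, l.Value)
	}
	return string(buf)
}

func decode(data string) []label {
	var out []label
	b := []byte(data)
	for len(b) > 0 {
		var name, value string
		name, b = readSymbol(b)
		value, b = readSymbol(b)
		out = append(out, label{name, value})
	}
	return out
}

func main() {
	ls := []label{
		{"__name__", "my_metric"}, {"job", "foo"}, {"instance", "bar"},
		{"env", "prod"}, {"blank", ""},
	}
	enc := encode(ls)
	fmt.Println(len(enc))            // 41
	fmt.Println(decode(enc)[4].Name) // blank
}
```

Note that the empty string is always looked up in the table, so it costs two bytes rather than one; this is the trade-off the scheme accepts because empty values are rare.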
This populates, on startup, the static mapping of strings that are stored as a single byte. We use the last TSDB block as the source of data: we iterate the index for each label and count how many time series reference each label pair. We need to call mapCommonLabelSymbols() once TSDB has opened all blocks, but before we start to replay the WAL and populate the HEAD. There doesn't seem to be a way to do this right now, so this adds a hook we can use for it.
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
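A hedged Go sketch of the selection step, assuming the per-pair series counts have already been read from the last block's index (for example, from postings lengths). `pairCount`, `buildMapping` and the rank-by-series-count heuristic are hypothetical and not necessarily what mapCommonLabelSymbols() does.

```go
package main

import (
	"fmt"
	"sort"
)

// pairCount is a hypothetical summary of one label pair in the block index:
// Series is how many series reference name="value".
type pairCount struct {
	Name, Value string
	Series      int
}

// maxSymbols is the most entries a one-byte index can address.
const maxSymbols = 256

// buildMapping ranks label names and values by how many series reference them
// and returns the most common ones, with "" pinned at index 0.
func buildMapping(pairs []pairCount, maxEntries int) []string {
	if maxEntries > maxSymbols {
		maxEntries = maxSymbols
	}
	refs := map[string]int{}
	for _, p := range pairs {
		refs[p.Name] += p.Series
		refs[p.Value] += p.Series
	}

	candidates := make([]string, 0, len(refs))
	for s := range refs {
		// A mapped string always costs 2 bytes ([0]<index>), so strings
		// shorter than 2 bytes gain nothing; "" is added separately below.
		if len(s) >= 2 {
			candidates = append(candidates, s)
		}
	}
	sort.Slice(candidates, func(i, j int) bool {
		a, b := candidates[i], candidates[j]
		if refs[a] != refs[b] {
			return refs[a] > refs[b] // most referenced first
		}
		return a < b // deterministic tie-break
	})

	mapping := []string{""} // index 0 must remain the empty string
	for _, s := range candidates {
		if len(mapping) >= maxEntries {
			break
		}
		mapping = append(mapping, s)
	}
	return mapping
}

func main() {
	pairs := []pairCount{
		{"__name__", "up", 100},
		{"__name__", "my_metric", 40},
		{"job", "node", 120},
		{"instance", "host-a", 70},
		{"instance", "host-b", 70},
		{"env", "prod", 140},
	}
	fmt.Printf("%q\n", buildMapping(pairs, 4)) // ["" "__name__" "env" "instance"]
}
```

Ranking by raw series count is the simplest reading of the commit message; weighting each string by the bytes it would save (series count times `len(s) - 1`) would be a reasonable alternative.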
Member
Hello from the bug scrub! Our apologies for the long delay. Please rebase this PR so that it's reviewable again, and ping @bboreham and @krajorama in this PR when ready.
This is a cleaned-up version of #15988, implemented behind a new build tag, `toplabels`.