Introduce TSDB changes for appending metadata to the WAL#10972
Introduce TSDB changes for appending metadata to the WAL#10972codesome merged 43 commits intoprometheus:mainfrom
Conversation
6256e91 to
7667d65
Compare
codesome
left a comment
There was a problem hiding this comment.
This is a first pass re-review of some parts. Mostly nits. I am left with the checkpoint code and the unit tests to review. I think we can remove it out of the draft.
storage/interface.go
Outdated
| // MetadataAppender provides an interface for associating metadata to stored series. | ||
| type MetadataAppender interface { | ||
| // AppendMetadata appends a metadata entry for the given series and labels. | ||
| // A series reference number is returned which can be used to modify the | ||
| // metadata of the given series in the same or later transactions. | ||
| // Returned reference numbers are ephemeral and may be rejected in calls | ||
| // to AppendMetadata() at any point. If the series does not exist, AppendMetadata | ||
| // returns an error. | ||
| // If the reference is 0 it must not be used for caching. | ||
| AppendMetadata(ref SeriesRef, l labels.Labels, m metadata.Metadata) (SeriesRef, error) |
There was a problem hiding this comment.
How about this? Because we are not appending metadata and storing a history of it. We are just updating it in-place.
| // MetadataAppender provides an interface for associating metadata to stored series. | |
| type MetadataAppender interface { | |
| // AppendMetadata appends a metadata entry for the given series and labels. | |
| // A series reference number is returned which can be used to modify the | |
| // metadata of the given series in the same or later transactions. | |
| // Returned reference numbers are ephemeral and may be rejected in calls | |
| // to AppendMetadata() at any point. If the series does not exist, AppendMetadata | |
| // returns an error. | |
| // If the reference is 0 it must not be used for caching. | |
| AppendMetadata(ref SeriesRef, l labels.Labels, m metadata.Metadata) (SeriesRef, error) | |
| // MetadataUpdater provides an interface for associating metadata to stored series. | |
| type MetadataUpdater interface { | |
| // UpdateMetadata updates the metadata entry for the given series and labels. | |
| // A series reference number is returned which can be used to modify the | |
| // metadata of the given series in the same or later transactions. | |
| // Returned reference numbers are ephemeral and may be rejected in calls | |
| // to AppendMetadata() at any point. If the series does not exist, AppendMetadata | |
| // returns an error. | |
| // If the reference is 0 it must not be used for caching. | |
| UpdateMetadata(ref SeriesRef, l labels.Labels, m metadata.Metadata) (SeriesRef, error) |
There was a problem hiding this comment.
Hmm, there's the same argument to be made about Exemplars as well right? I can get behind this change if you feel strongly about it, but (personally) I prefer how the current name works to extend the Appender interface.
There was a problem hiding this comment.
You actually do append exemplars and don't update it in-place. You can have exemplars over time, not just the last one.
There was a problem hiding this comment.
I thought about it but couldn't think of a better name, so let's go with your idea ^^
There was a problem hiding this comment.
FYI FWIW the same name is also used in the pre-existing code on the remote package, for the current implementation of sending metadata via the QueueManager
https://github.com/prometheus/prometheus/blob/main/storage/remote/metadata_watcher.go#L28-L31
codesome
left a comment
There was a problem hiding this comment.
Nice work! While reading the test in db_test.go I realised that the behaviour of AppendMetadata when we use it to update the metadata of the same series more than once in the same transaction can be non-deterministic (like we can tell how it works, for a user it is non-deterministic). Should we document that?
Like if a series s1 had m1 meta already. And then in a transaction we do 3 updates m3 m2 m1 in order in the same transaction (i.e. before commit), then the meta m2 will be attached to s1 and not m1, while the user would expect m1.
| repl := 0 | ||
| for _, m := range metadata { | ||
| if keep(m.Ref) { | ||
| latestMetadataMap[m.Ref] = m |
There was a problem hiding this comment.
Note for future: once we have plugged in the scraping, we should run prombench on it and see if there are memory usage spikes after compactions when this checkpoint runs.
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
When we're setting metadata entries in the scrapeCace, we're using the p.Help(), p.Unit(), p.Type() helpers, which retrieve the series name and use it as the cache key. When checking for cache entries though, we used p.Series() as the key, which included the metric name _with_ its labels. That meant that we were never actually hitting the cache. We're fixing this by utiling the __name__ internal label for correctly getting the cache entries after they've been set by setHelp(), setType() or setUnit(). Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
…dMetadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
…a replay Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
…taUpdater/UpdateMetadata Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
742c916 to
83fe62c
Compare
codesome
left a comment
There was a problem hiding this comment.
Last set of comments on the tests
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
Signed-off-by: Paschalis Tsilias <paschalist0@gmail.com>
|
I merged this PR as an enabler for the scraping and remote-write part. This in itself adds very minimal overhead to the TSDB (additional field in the series struct) until the scraping is connected to it and TSDB starts storing metadata. This can be reverted easily if required. |
This is a rehash of the recent PR #10312 for appending metadata to the WAL, but focusing on the TSDB-related changes only.
It includes a new storage format in the WAL, the changes to the Appender interface in the Storage package, stubbing all the implementations, as well testing that the encoding/decoding functionality works as intended.