Proposal
Metadata WAL records are a key feature required to track type, unit and help per series, both in memory and in the WAL. This allows the Remote Write transport to export metadata and the Metadata API to serve it for longer than the scrape cache does. It also provides some persistence across restarts. This feature is also a first step towards full metadata persistence in blocks.
Unfortunately, early benchmarks show that enabling this feature increases RSS by ~14% and heap size by ~30%.
Some increase is expected, as this literally adds a metadata record with 3 strings into the WAL for every time series Prometheus ingests; e.g. 10M series means at least 10M metadata records. The cost occurs whenever those WAL segments are mmapped for encoding and decoding, e.g.:
- Writing those records during scrape commit
- Reading those records for PRW 2.0
- Reading those records on every WAL checkpoint (compaction).
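To get a feel for the raw WAL volume involved, here is a back-of-envelope sketch in Go. The field layout and the per-record overhead constant are assumptions for illustration, not the actual Prometheus record format:

```go
package main

import "fmt"

// metadataRecordSize is a rough estimate of one metadata WAL record:
// three strings plus an assumed fixed overhead for the series ref and
// varint-encoded string lengths. Not the real record format.
func metadataRecordSize(typ, unit, help string) int {
	const perRecordOverhead = 16 // series ref + length prefixes, assumed
	return perRecordOverhead + len(typ) + len(unit) + len(help)
}

// totalMetadataBytes estimates the WAL bytes added for n series that
// share the same metadata strings.
func totalMetadataBytes(n int, typ, unit, help string) int {
	return n * metadataRecordSize(typ, unit, help)
}

func main() {
	// 10M series with modest metadata strings.
	total := totalMetadataBytes(10_000_000,
		"counter", "seconds", "Total time spent handling requests.")
	fmt.Printf("~%d MiB of metadata records\n", total/(1<<20))
}
```

Even with short strings this lands in the hundreds of MiB for 10M series, and that volume is touched again on every checkpoint and replay, which matches the scale of the observed overhead.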
What's worse, if the series or target disappears for some period, we have to add the same metadata records to the WAL all over again.
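The re-emission semantics can be illustrated with a small dedup sketch: metadata only needs to be re-appended when it changes or when the series' state is dropped. The `Metadata` struct, `metaDedup` type and method names here are hypothetical stand-ins, not the actual TSDB types:

```go
package main

import "fmt"

// Metadata is a simplified stand-in for per-series metadata.
type Metadata struct {
	Type, Unit, Help string
}

// metaDedup remembers the last metadata written to the WAL per series
// ref, so unchanged metadata is not appended again.
type metaDedup struct {
	seen map[uint64]Metadata
}

func newMetaDedup() *metaDedup {
	return &metaDedup{seen: map[uint64]Metadata{}}
}

// ShouldAppend reports whether a metadata record for this series needs
// to go into the WAL, and marks it as written if so.
func (d *metaDedup) ShouldAppend(ref uint64, m Metadata) bool {
	if prev, ok := d.seen[ref]; ok && prev == m {
		return false
	}
	d.seen[ref] = m
	return true
}

// Forget drops state when a series is garbage-collected; on its next
// appearance the metadata must be re-emitted (the repeat cost above).
func (d *metaDedup) Forget(ref uint64) {
	delete(d.seen, ref)
}

func main() {
	d := newMetaDedup()
	m := Metadata{"counter", "seconds", "request time"}
	fmt.Println(d.ShouldAppend(1, m)) // first write: append
	fmt.Println(d.ShouldAppend(1, m)) // unchanged: skip
	d.Forget(1)                       // series GC'd
	fmt.Println(d.ShouldAppend(1, m)) // re-appears: append again
}
```

The sketch shows why churny targets are expensive: every `Forget`/reappear cycle forces the same strings back into the WAL.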
Some cost is ultimately expected, but 15-30% is a bit too much. This issue is to either:
A) Track and optimize some of this cost
B) Consider alternatives, e.g. dropping this feature, building something best-effort for help, and implementing only type and unit.
This explains the alloc spikes during compaction and replay, but I still cannot track down where the constant alloc increase comes from; nothing trivial shows up in the profiles (profile is available here).
Additionally, there are other challenges with these metadata semantics, such as atomicity and keeping a metadata record in the WAL for every living series. Let's see if we can make it a bit cheaper first.