Description
Proposal
Prometheus now accepts remote write data, but it can't reliably ingest data that is more than an hour old. If an outage or network partition keeps a downstream Prometheus from pushing for more than an hour, data will likely be lost even after the issue is resolved. Cortex and Thanos, which use TSDB, have the same limitation.
We should solve this in TSDB. A very interesting trade-off was presented at the storage working group: when ingesting data that falls outside the current head block, we don't need to make it immediately available for querying. Instead, we could write it to a log and compact it in the background before making it queryable. I think it's a fair trade-off that wouldn't use too many extra resources. I'm curious what others think!
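
To make the trade-off concrete, here is a rough sketch of the idea in Go. Everything in it is hypothetical and heavily simplified: `Store`, `Append`, `Compact`, and `Query` are names invented for illustration and are not TSDB APIs, the "log" is an in-memory slice rather than a durable file, and compaction is triggered by hand instead of running in the background. It only shows the routing: samples older than the head's minimum time go to a log and become queryable after compaction.

```go
package main

import (
	"fmt"
	"sort"
)

// Sample is a single (timestamp, value) pair; timestamps are in milliseconds.
type Sample struct {
	T int64
	V float64
}

// Store is a hypothetical stand-in for a TSDB, not the real Prometheus type.
type Store struct {
	headMinT  int64    // oldest timestamp the head block accepts (assumed fixed here)
	head      []Sample // in-range samples, queryable immediately
	oldLog    []Sample // too-old samples: recorded, but not yet queryable
	compacted []Sample // too-old samples after compaction, queryable
}

// NewStore returns a Store whose head accepts samples with T >= headMinT.
func NewStore(headMinT int64) *Store {
	return &Store{headMinT: headMinT}
}

// Append routes a sample to the head if it is in range, otherwise to the log.
// Nothing is rejected for being too old.
func (s *Store) Append(sm Sample) {
	if sm.T >= s.headMinT {
		s.head = append(s.head, sm)
		return
	}
	s.oldLog = append(s.oldLog, sm)
}

// Compact moves logged samples into the queryable set, sorted by timestamp.
// In the proposal this would happen in the background; here it is explicit.
func (s *Store) Compact() {
	s.compacted = append(s.compacted, s.oldLog...)
	sort.Slice(s.compacted, func(i, j int) bool {
		return s.compacted[i].T < s.compacted[j].T
	})
	s.oldLog = nil
}

// Query returns queryable samples with minT <= T <= maxT, sorted by timestamp.
// Samples still sitting in the log are deliberately invisible.
func (s *Store) Query(minT, maxT int64) []Sample {
	var out []Sample
	for _, src := range [][]Sample{s.compacted, s.head} {
		for _, sm := range src {
			if sm.T >= minT && sm.T <= maxT {
				out = append(out, sm)
			}
		}
	}
	sort.Slice(out, func(i, j int) bool { return out[i].T < out[j].T })
	return out
}

func main() {
	s := NewStore(1000)
	s.Append(Sample{T: 1200, V: 1}) // in range: goes to the head
	s.Append(Sample{T: 300, V: 2})  // too old: goes to the log
	s.Append(Sample{T: 100, V: 3})  // too old: goes to the log

	fmt.Println(len(s.Query(0, 2000))) // 1: only the head sample is visible
	s.Compact()
	fmt.Println(len(s.Query(0, 2000))) // 3: old samples are visible after compaction
}
```

The cost in this sketch is the delay between accepting an old sample and being able to query it, which is the trade-off described above.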