Closed
Proposal
Background
OTLP ingestion includes out-of-order samples. As a result, exemplars might also arrive out of order. Today this leads to errors being returned to clients and to data being discarded (code).
Proposal
Support out-of-order exemplars. Implementing this should be as easy as modifying the linked list of exemplars for the series (code). We still insert the exemplar in the next available slot in the circular buffer as today (code), but then change the order of the existing exemplars. Changing the order means traversing from the oldest exemplar of the series, finding the insertion point for the exemplar being inserted, and then mutating the linked list elements around that point.
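The idea above can be sketched roughly as follows. This is a minimal, simplified model and not the actual Prometheus storage code: the names `storage`, `entry`, `seriesIndex`, `add`, `unlink` and `timestamps` are hypothetical, series are keyed by a plain string, duplicate detection is ignored, and locking is left out. The exemplar always lands in the next ring buffer slot; only the `next` links are adjusted to keep each series ordered by timestamp.

```go
package main

import "fmt"

const none = -1 // marks the end of a series' linked list

// entry is one ring buffer slot; next points at the slot holding the
// next-newer exemplar of the same series.
type entry struct {
	series string
	ts     int64
	next   int
	used   bool
}

// seriesIndex tracks both ends of one series' linked list.
type seriesIndex struct{ oldest, newest int }

type storage struct {
	buf   []entry
	next  int // slot the next exemplar is written to
	index map[string]*seriesIndex
}

func newStorage(n int) *storage {
	return &storage{buf: make([]entry, n), index: map[string]*seriesIndex{}}
}

// unlink removes slot i from its series' list before the slot is reused.
// With out-of-order inserts the evicted slot is no longer guaranteed to be
// the list head, so its predecessor may have to be searched for.
func (s *storage) unlink(i int) {
	e := s.buf[i]
	idx := s.index[e.series]
	if idx.oldest == i {
		idx.oldest = e.next
	} else {
		p := idx.oldest
		for s.buf[p].next != i {
			p = s.buf[p].next
		}
		s.buf[p].next = e.next
		if idx.newest == i {
			idx.newest = p
		}
	}
	if idx.oldest == none {
		delete(s.index, e.series)
	}
}

// add writes the exemplar into the next slot, then fixes up the links.
func (s *storage) add(series string, ts int64) {
	i := s.next
	if s.buf[i].used {
		s.unlink(i)
	}
	s.buf[i] = entry{series: series, ts: ts, next: none, used: true}
	s.next = (s.next + 1) % len(s.buf)

	idx, ok := s.index[series]
	switch {
	case !ok: // first exemplar of this series
		s.index[series] = &seriesIndex{oldest: i, newest: i}
	case ts >= s.buf[idx.newest].ts: // in-order fast path
		s.buf[idx.newest].next = i
		idx.newest = i
	case ts < s.buf[idx.oldest].ts: // older than everything stored
		s.buf[i].next = idx.oldest
		idx.oldest = i
	default: // walk from the oldest exemplar to the insertion point
		p := idx.oldest
		for s.buf[s.buf[p].next].ts <= ts {
			p = s.buf[p].next
		}
		s.buf[i].next = s.buf[p].next
		s.buf[p].next = i
	}
}

// timestamps follows the links from oldest to newest.
func (s *storage) timestamps(series string) []int64 {
	var out []int64
	if idx, ok := s.index[series]; ok {
		for i := idx.oldest; i != none; i = s.buf[i].next {
			out = append(out, s.buf[i].ts)
		}
	}
	return out
}

func main() {
	s := newStorage(4)
	for _, ts := range []int64{10, 30, 20} {
		s.add("a", ts)
	}
	fmt.Println(s.timestamps("a"))
}
```

One thing the sketch surfaces: once links can be reordered, the slot being overwritten is not necessarily the head of its series' list, so eviction may also need a traversal to find the predecessor.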
Side effects/considerations
- During querying, traversal of the ring buffer will now happen in multiple directions (code). The next hop might be forward in the buffer, or backwards if an out-of-order exemplar was inserted. This less predictable access pattern might make CPU caches less effective.
- Insertion and querying both hold a global lock, so traversing the linked list from its beginning means holding that lock for longer. There are multiple ways to account for this:
  - make the linked list a doubly linked list, so that we can start from the most recent exemplar. It's more likely that an out-of-order exemplar is closer to now. This should reduce the number of hops we make to find the insertion spot. The tradeoff is increased memory usage for the circular buffer.
  - insert the exemplar in the circular buffer while holding the write lock, hold the read lock while traversing the linked list and finding the right insertion point, and finally reacquire the write lock to correct the linked list. While holding the read lock queries can still flow, but they won't be able to query for the exemplar. This comes with the tradeoff of added lock contention and code complexity.
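To illustrate the doubly linked list option, here is a rough sketch of searching for the insertion point from the newest exemplar backwards. It is deliberately reduced to a single series backed by a growing slice, with no ring buffer eviction and no locking; `list`, `node`, `insert`, `forward` and the returned hop count are hypothetical names used only for this illustration.

```go
package main

import "fmt"

const none = -1 // marks a missing neighbour

// node carries links in both directions; the extra prev index is the
// memory cost mentioned above.
type node struct {
	ts         int64
	prev, next int
}

type list struct {
	nodes          []node
	oldest, newest int
}

func newList() *list { return &list{oldest: none, newest: none} }

// insert stores the node, then walks backwards from the newest node until
// it finds one that is not newer than ts. It returns the number of hops.
func (l *list) insert(ts int64) int {
	i := len(l.nodes)
	l.nodes = append(l.nodes, node{ts: ts, prev: none, next: none})

	hops := 0
	p := l.newest
	for p != none && l.nodes[p].ts > ts {
		p = l.nodes[p].prev
		hops++
	}

	if p == none { // new node becomes the oldest
		l.nodes[i].next = l.oldest
		if l.oldest != none {
			l.nodes[l.oldest].prev = i
		} else {
			l.newest = i
		}
		l.oldest = i
		return hops
	}

	n := l.nodes[p].next // splice between p and its successor
	l.nodes[i].prev, l.nodes[i].next = p, n
	l.nodes[p].next = i
	if n != none {
		l.nodes[n].prev = i
	} else {
		l.newest = i
	}
	return hops
}

// forward returns timestamps from oldest to newest.
func (l *list) forward() []int64 {
	var out []int64
	for i := l.oldest; i != none; i = l.nodes[i].next {
		out = append(out, l.nodes[i].ts)
	}
	return out
}

func main() {
	l := newList()
	for _, ts := range []int64{10, 20, 30, 40} {
		l.insert(ts)
	}
	hops := l.insert(35) // slightly late exemplar
	fmt.Println(hops, l.forward())
}
```

In this model an exemplar that is only slightly late costs a hop or two, whereas a search from the oldest end would have to pass over almost the whole list. A very old exemplar still costs a full traversal in either direction.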