feat(s2n-quic-dc): Switch to FIFO-based path secret eviction #2477
Conversation
```rust
let mut map = self.0.lock().unwrap_or_else(|e| e.into_inner());
map.retain(|e| filter(&*e))
```
Do we worry about the lock getting held for a while here, since it scans the entire table? (Same for the IdMap.)
Hm, so doing some measurements I suspect the answer is yes :(
A 500k-element HashMap of (u64, Arc) takes ~2.67ms on average to retain in my local measurements (where the closure just loads the value behind the Arc). That's probably too long for a global lockout, even once a minute. Let me think about some options -- maybe we can thread a linked list through the Arcs, or maintain key Vecs that exist only for the cleaner...
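For context, the measurement above can be reproduced with a toy microbenchmark along these lines -- a minimal sketch using a plain std `HashMap<u64, Arc<u64>>` as a stand-in for the real table (`build_map` and the payload type are hypothetical, not the crate's actual types):

```rust
use std::collections::HashMap;
use std::sync::Arc;
use std::time::Instant;

/// Build a map shaped roughly like the path secret table: `n` entries
/// keyed by u64, each holding an Arc'd payload.
fn build_map(n: u64) -> HashMap<u64, Arc<u64>> {
    (0..n).map(|i| (i, Arc::new(i))).collect()
}

fn main() {
    let mut map = build_map(500_000);

    let start = Instant::now();
    // retain() walks every bucket while the (hypothetical) global lock
    // would be held; the closure just loads the value behind the Arc,
    // mirroring the cheapest possible eviction check.
    map.retain(|_, v| **v % 2 == 0);
    println!("retain over 500k entries took {:?}", start.elapsed());
}
```

The elapsed time is the window during which every other user of the map would be locked out.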
OK, I think I found a path forward. We no longer call retain(...) on the ID or address maps -- in fact, the locks there are now fully self-contained (no closures are exposed in the API). The FIFO queue lock is instead used to ensure we've visited all entries in the map, since the FIFO queue should always contain every Entry.
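The shape of that layout can be sketched roughly as follows. This is a simplified illustration, not the crate's actual types: `Entry`, `State`, and the method names are hypothetical, and the real code keeps an address map alongside the ID map with the same discipline.

```rust
use std::collections::{HashMap, VecDeque};
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for the real path secret entry.
struct Entry {
    id: u64,
}

/// The maps' locks are self-contained (no user closures run under them);
/// the FIFO holds every Entry ever inserted, so the cleaner can visit all
/// entries without a whole-table map scan under a map lock.
struct State {
    ids: Mutex<HashMap<u64, Arc<Entry>>>,
    fifo: Mutex<VecDeque<Arc<Entry>>>,
}

impl State {
    fn new() -> Self {
        State {
            ids: Mutex::new(HashMap::new()),
            fifo: Mutex::new(VecDeque::new()),
        }
    }

    fn insert(&self, entry: Arc<Entry>) {
        // Push to the FIFO first, so the cleaner always sees every entry.
        self.fifo.lock().unwrap().push_back(entry.clone());
        self.ids.lock().unwrap().insert(entry.id, entry);
    }

    /// Evict oldest entries until at most `cap` remain. Each map lock is
    /// held only for a single operation, never for a full scan.
    fn evict_oldest_down_to(&self, cap: usize) {
        loop {
            let victim = {
                let mut fifo = self.fifo.lock().unwrap();
                if fifo.len() <= cap {
                    return;
                }
                fifo.pop_front()
            };
            if let Some(victim) = victim {
                self.ids.lock().unwrap().remove(&victim.id);
            }
        }
    }
}

fn main() {
    let state = State::new();
    for i in 0..5 {
        state.insert(Arc::new(Entry { id: i }));
    }
    state.evict_oldest_down_to(2);
    println!("entries left: {}", state.ids.lock().unwrap().len());
}
```

The key property: eviction order is insertion order (FIFO), and no lock is ever held across the whole table.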
That likely does impact handshake latency / the s2n-quic handshake thread event loop by potentially ~5-10 milliseconds as we scan the list, but given that our dc.complete.latency is ~6ms @ p50, I think that's acceptable -- it will likely increase the tail but not be a major effect. Scanning a Vec is also hopefully faster than a map scan in the typical case -- hashbrown's iterators are not super cheap.
We still preserve the fixed-size nature, albeit at (potentially) greater memory cost, since hashbrown will resize up to 2x the actual max entry count due to having tombstones in the table. After that final growth we will incur somewhat expensive table-wide rehashing periodically, but this is sufficiently fast to be acceptable in practice.
Release Summary:
Resolved issues:
n/a
Description of changes:
We still preserve the fixed-size nature, albeit at (potentially) greater memory cost, since hashbrown will resize up to 2x the actual max entry count due to having tombstones in the table.
After that final growth, we will periodically incur somewhat expensive table-wide rehashing, which impacts both the dataplane (usage of keys) and the "control" plane (new handshakes). However, the rehashing should be quite fast in practice -- on a toy workload keeping a sliding window of 500k u64 values in a map, rehashing with the std SipHash impl occupied <0.05% of CPU. It should also occur at most once every ~500k inserts, since the map has 2 × 500k slots and so very rarely actually needs the rehash.
I suspect we can reduce any tail latency impact long-term by e.g. sharding the map or rehashing incrementally, but in the meantime this PR significantly improves the correctness property of when an entry gets evicted -- we now require 500k handshakes before an entry can be evicted. Even with a CPU core fully occupied by handshakes (1000/second), that puts a minimum bound of around 500 seconds, already not terrible; in practice we expect a far lower handshake rate. Future improvements can further reduce the likelihood of incorrect evictions by pruning already-gone entries from the FIFO where possible.
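The lower bound above is simple arithmetic; a sketch (the function name is illustrative, not from the crate):

```rust
/// Seconds before the oldest entry can possibly be evicted, given the
/// table capacity and a sustained handshake rate.
fn min_eviction_secs(capacity: u64, handshakes_per_sec: u64) -> u64 {
    capacity / handshakes_per_sec
}

fn main() {
    // The worst case discussed above: one CPU core saturated at
    // ~1000 handshakes/second against a 500k-entry table.
    println!("{}s", min_eviction_secs(500_000, 1_000)); // prints "500s"
}
```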
No lock acquired on the dataplane (get for an already-handshaked peer) is held for more than a single map operation (e.g., lookup or removal), so in the common case locks should be very short-lived (microseconds at most). As noted above, we can and rarely will hit rehashing, but avoiding that entirely would take a lot of work (a custom map impl) that we don't want to block these improvements on.
Call-outs:
Memory usage is pretty likely not optimal with the 2x larger-than-needed maps, but it's actually plausible this is a memory improvement over the fixed-size maps, since we no longer store keys separately -- that cuts ~8MB (16 × 500k) for the key map + ~15MB (32 × 500k) for the SocketAddr map. We do add ~4MB per map for the size doubling, but that's significantly smaller than the gains from cutting keys :) Further improvement here seems plausible -- we don't actually need the 2x coefficient, that's just happenstance from what hashbrown gives us out of the box. But maybe not needed (yet).
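The savings quoted above work out as follows -- a back-of-envelope sketch assuming 500k entries, 16-byte keys for the ID map, and ~32 bytes per SocketAddr (the helper is illustrative, not from the crate):

```rust
/// Megabytes spent storing `n` separately-stored keys of `key_bytes`
/// each -- the cost eliminated by no longer duplicating keys.
fn key_storage_mb(n: u64, key_bytes: u64) -> u64 {
    n * key_bytes / 1_000_000
}

fn main() {
    println!("ID map keys cut:   ~{} MB", key_storage_mb(500_000, 16));
    println!("addr map keys cut: ~{} MB", key_storage_mb(500_000, 32));
}
```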
Testing:
Mostly covered by existing tests.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.