fix: bitswap lock contention under high load #817
Conversation
Codecov Report
Attention: Patch coverage is
@@ Coverage Diff @@
## main #817 +/- ##
==========================================
- Coverage 60.49% 60.48% -0.01%
==========================================
Files 244 244
Lines 31079 31100 +21
==========================================
+ Hits 18800 18810 +10
- Misses 10603 10615 +12
+ Partials 1676 1675 -1
lidel
left a comment
For posterity, the staging box looks really promising: during the window when this was deployed to box 02, it was in significantly better shape than kubo 0.32.1 (box 01):
HTTP success rate is higher too:
EOD for me, but I'll do more tests tomorrow morning and see if any questions arise. Some quick ones inline.
Force-pushed b8730a5 to c200764
Force-pushed c200764 to bee11dc
Force-pushed d9d1313 to 8a27e39
LGTM, this is such an improvement that we should ship it as a patch release next week.
For posterity: based on our (Shipyard) staging tests the impact on high load providers is significant.
Below is a sample from HTTP gateway processing ~80 requests per second (mirrored organic cache-miss from ipfs.io). "01" is latest Kubo (0.33.0) without this fix, and "02" is with this fix (0.33.1):
Summary
Fix runaway goroutine creation under high load. Under high load, goroutines are created faster than they can complete, and the more goroutines that exist, the slower they complete. This creates a positive feedback loop that ends in OOM. The fix dynamically adjusts message send scheduling to avoid the runaway condition.
Description of Lock Contention under High Load
The peermanager acquires the peermanager mutex, does its bookkeeping, and then acquires the messagequeue mutex for each peer to put wants/cancels on that peer's message queue. Nothing blocks indefinitely, but all session goroutines wait on the peermanager mutex.
The messagequeue event loop for each peer is always running in a separate goroutine, waking up every time new data is added to the message queue. The messagequeue acquires the messagequeue mutex to check the amount of pending work and send a message if there is enough work.
The frequent lock/unlock of each messagequeue mutex delays each session goroutine from adding items to messagequeues, as they wait to acquire each peer's messagequeue mutex to enqueue a message. These delays cause the peermanager mutex to be held longer by each goroutine. When there are a sufficient number of peers and want requests, goroutines end up waiting on the peermanager mutex for longer, on average, than it takes for an additional request to arrive and start another goroutine. This leads to a positive feedback loop where the number of goroutines increases until their number alone is sufficient to cause OOM.
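The locking pattern described above can be sketched roughly as follows. This is a minimal illustration only: the type names PeerManager and MessageQueue mirror the real boxo types, but the fields, method names, and signatures here are hypothetical.

```go
// Hypothetical sketch of the nested locking described above, not the
// actual boxo implementation.
package main

import (
	"fmt"
	"sync"
)

type MessageQueue struct {
	mu      sync.Mutex
	pending []string // wants/cancels awaiting send
}

// AddWants contends with the queue's event loop for mq.mu.
func (mq *MessageQueue) AddWants(wants []string) {
	mq.mu.Lock()
	defer mq.mu.Unlock()
	mq.pending = append(mq.pending, wants...)
}

type PeerManager struct {
	mu     sync.Mutex
	queues map[string]*MessageQueue
}

// SendWants is called from every session goroutine. While pm.mu is
// held, it also takes each peer's messagequeue mutex in turn, so any
// delay acquiring those mutexes extends how long pm.mu is held.
func (pm *PeerManager) SendWants(peers []string, wants []string) {
	pm.mu.Lock()
	defer pm.mu.Unlock()
	for _, p := range peers {
		if mq, ok := pm.queues[p]; ok {
			mq.AddWants(wants)
		}
	}
}

func main() {
	pm := &PeerManager{queues: map[string]*MessageQueue{"peer1": {}}}
	pm.SendWants([]string{"peer1"}, []string{"want-a", "want-b"})
	fmt.Println(len(pm.queues["peer1"].pending)) // 2
}
```

The key point is the lock nesting: every messagequeue mutex acquisition happens while the single peermanager mutex is held, so contention on the former inflates hold time on the latter.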
How this PR Fixes this
This PR avoids waking up the messagequeue event loop on every item added to the message queue, thus avoiding the high-frequency messagequeue mutex lock/unlock. Instead, the event loop wakes up after a delay, sends the accumulated work, then goes back to sleep for another delay. During the delay, wants and cancels are accumulated. This allows the session goroutines to add items to message queues without contending with the messagequeue event loop for the messagequeue mutex.
The delay dynamically adjusts, between 20ms and 1 second, based on the number of peers. The delay per peer is configurable, with a default of 1/8 millisecond (125µs).