Use sync.Pool to remove channel contention #2025
Merged
+50
−23
This PR adds sync.Pool direct handoff, in place of channel dispatch, to both regular and worker threads, building on the semaphore-based FIFO approach in fix/latency. The pool eliminates the remaining channel contention and provides significant tail latency improvements, particularly for workers.

By using sync.Pool, we achieve lock-free, per-P dispatch, reducing contention dramatically. The dispatcher tries the pool first and falls back to the semaphore implementation when no idle worker is available.
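To make the dispatch path concrete, here is a minimal, hypothetical sketch of that two-tier handoff. It is not the actual FrankenPHP code: the type and function names (request, worker, dispatch, markIdle), the maxConcurrency constant and the fallback queue are illustrative only, and the semaphore path is heavily simplified.

```go
package dispatch

import (
	"context"
	"sync"

	"golang.org/x/sync/semaphore"
)

// request and worker are illustrative stand-ins for the real FrankenPHP types.
type request struct{ payload []byte }

type worker struct {
	// requests is a direct-handoff channel owned by a single worker, so a
	// send here never contends with dispatches to other workers.
	requests chan *request
}

const maxConcurrency = 120

var (
	// idleWorkers is the lock-free, per-P fast path. The runtime may drop
	// its entries at any GC cycle, so it can never be the only path.
	idleWorkers sync.Pool

	// slots and fallback stand in for the semaphore-based FIFO path from
	// fix/latency, heavily simplified for this sketch.
	slots    = semaphore.NewWeighted(maxConcurrency)
	fallback = make(chan *request, maxConcurrency)
)

// markIdle is called by a worker that has finished a request and is ready
// for another direct handoff.
func markIdle(w *worker) {
	idleWorkers.Put(w)
}

// dispatch hands the request straight to an idle worker when the pool has
// one, and otherwise falls back to the semaphore-guarded queue.
func dispatch(ctx context.Context, r *request) error {
	if v := idleWorkers.Get(); v != nil {
		v.(*worker).requests <- r // direct handoff, no shared channel
		return nil
	}
	if err := slots.Acquire(ctx, 1); err != nil {
		return err
	}
	// Simplification: in a real dispatcher the slot would be held until the
	// request completes, not just until it is enqueued.
	defer slots.Release(1)
	fallback <- r
	return nil
}
```

Because the runtime is free to empty a sync.Pool at any GC cycle, the pool can only ever be a fast path; the semaphore-based route has to remain correct on its own, which is why it stays in place as the fallback.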
Benchmark Results
Workers (8 threads, 500 connections, 15s)
Regular Threads (8 threads, 500 connections, 15s)
Low Load (10 connections, 1 thread, 15s) - Regular Threads Only, included to show there is no difference
Thread Affinity Tradeoff
Workers previously had a low-index affinity and would target the same threads under low load (t[0], t[1], etc.). This minimised resource initialisation when frameworks lazily created resources (e.g. database connections). The new behaviour in this PR relies on sync.Pool, which has per-P (processor) locality, distributing requests across multiple threads even under low load.

I think this is actually a good thing, as the previous behaviour is actively dangerous in production scenarios.
Consider a scenario with 120 worker threads for an I/O-heavy workload (this is actually our advice in multiple issues). Under normal load, perhaps only t[0-80] are active, and thus only 80 database connections are open. On Black Friday the load spikes and we activate t[101] for the first time, exceeding a database connection limit of 100 and causing cascading failures at peak load.
This is the worst possible time to discover resource limits.
With sync.Pool, normal load eventually cycles through all 120 workers, ensuring no surprises under a spike. It’s worth noting that with per-P locality there’s a high probability you’ll get the same worker on the same connection, keeping the various caches warm (CPU L1/L2/L3, etc.). This is probably better than the low-index affinity for cache efficiency.
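As a rough illustration of that behaviour, here is a toy experiment (not FrankenPHP code; the worker IDs and counters are invented for the demo). It seeds a sync.Pool with a handful of simulated worker IDs and fires concurrent requests at it. On a multi-core machine the hit counts typically show traffic spread across all of the workers rather than pinned to the first few, while consecutive requests on the same P tend to reuse the same ID.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

func main() {
	const workers = 8
	var pool sync.Pool

	// Seed the pool with one entry per simulated worker thread.
	for id := 0; id < workers; id++ {
		pool.Put(id)
	}

	var hits [workers]atomic.Int64
	var wg sync.WaitGroup

	// Fire a burst of concurrent "requests"; each one grabs whichever
	// worker its P finds in the pool, records it, then returns it.
	for req := 0; req < 10000; req++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if v := pool.Get(); v != nil {
				id := v.(int)
				hits[id].Add(1) // this "worker" handled the request
				pool.Put(id)    // worker becomes idle again
			}
		}()
	}
	wg.Wait()

	for id := 0; id < workers; id++ {
		fmt.Printf("worker %d handled %d requests\n", id, hits[id].Load())
	}
}
```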
Further improvements
Further improvements will result in diminishing returns. Based on eBPF profiling of the pool implementation under load, the remaining overhead is well-distributed:
Syscall profile:
Off-CPU profile shows time is now spent primarily on:
The Go-side dispatch optimisation in this PR has eliminated the primary bottleneck (futex contention on channels). The remaining time is spent on productive work (PHP execution) and unavoidable overhead (file I/O, timing). Future optimisation would need to focus on PHP internals, which are outside the scope of FrankenPHP’s Go dispatcher.