@withinboredom withinboredom commented Nov 23, 2025

This PR adds sync.Pool-based direct handoff to both regular and worker threads, building on the semaphore-based FIFO approach in fix/latency. The pool eliminates the remaining channel contention and provides significant tail latency improvements, particularly for workers.

By using sync.Pool, we get lock-free, per-P (per-processor) dispatch, which dramatically reduces contention. The dispatcher tries the pool first and falls back to the semaphore implementation when no idle thread is cached; a sketch of this shape is shown below.
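A minimal sketch of the pool-first, semaphore-fallback shape described above. This is illustrative only: identifiers like `idleThreads`, `thread`, `park`, and `dispatchViaSemaphore` are hypothetical names, not the actual FrankenPHP ones.

```go
// Hypothetical sketch: pool-first dispatch with a semaphore fallback.
// sync.Pool keeps parked threads in per-P caches, so a Get/Put pair on the
// same P avoids any global lock.
package dispatch

import "sync"

type request struct{ /* request payload elided */ }

type thread struct {
	work chan *request // handoff channel owned by the idle thread
}

var idleThreads sync.Pool // holds *thread values parked by idle threads

// dispatch hands r directly to a pooled idle thread when one is cached
// locally, and otherwise falls back to the semaphore-based FIFO path
// from fix/latency (not shown here).
func dispatch(r *request) {
	if t, ok := idleThreads.Get().(*thread); ok {
		t.work <- r // direct handoff on the fast path
		return
	}
	dispatchViaSemaphore(r)
}

// park registers the calling thread as idle. Pool entries may be dropped
// by the GC, which is fine: the fallback path still reaches the thread.
func (t *thread) park() {
	idleThreads.Put(t)
}

func dispatchViaSemaphore(r *request) { /* existing slow path */ }
```

The fast path here assumes the parked thread is already blocked receiving on its own `work` channel, so the send is an immediate handoff rather than a queue operation; the real implementation may wire this differently.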

Benchmark Results

Workers (8 threads, 500 connections, 15s)

| Configuration | Req/sec | P50 | P75 | P90 | P99 | Threads | Global RQ |
|---|---|---|---|---|---|---|---|
| latency branch | 78,161 | 6.03ms | 8.08ms | 11.19ms | 18.38ms | 45 | 145 |
| pool (this PR) | 76,259 | 6.06ms | 7.45ms | 9.03ms | 12.99ms | 46 | 217 |
| Improvement | -2.4% | +0.5% | -7.8% | -19.3% | -29.3% | +1 | +49.7% |

Regular Threads (8 threads, 500 connections, 15s)

| Configuration | Req/sec | P50 | P75 | P90 | P99 | Threads | Global RQ |
|---|---|---|---|---|---|---|---|
| latency branch | 42,096 | 11.46ms | 12.35ms | 13.35ms | 17.18ms | 62 | 3 |
| pool (this PR) | 42,592 | 11.40ms | 12.26ms | 13.27ms | 15.95ms | 67 | 11 |
| Improvement | +1.2% | -0.5% | -0.7% | -0.6% | -7.2% | +5 | +267% |

Low Load (10 connections, 1 thread, 15s) - regular threads only, included to show there is no low-load regression

| Configuration | Req/sec | P50 | P99 |
|---|---|---|---|
| baseline (main) | 38,354 | 222µs | 650µs |
| latency branch | 37,977 | 225µs | 657µs |
| pool (this PR) | 38,584 | 222µs | 643µs |
| Improvement (vs latency branch) | +1.6% | -1.3% | -2.1% |

Thread Affinity Tradeoff

Workers previously had a low-index affinity and would target the same threads under low load (t[0], t[1], etc.). This minimised resource initialisation when frameworks lazily create resources (e.g. database connections). The new behaviour in this PR relies on sync.Pool's per-P (per-processor) locality, which distributes requests across multiple threads even under low load.

I think this is actually a good thing, as the previous behaviour is actively dangerous in production scenarios.

Consider a scenario with 120 worker threads for an I/O-heavy workload (this is actually our advice in multiple issues). Under normal load, perhaps only t[0]-t[80] are ever active, so only ~80 database connections are ever opened. Then on Black Friday the load spikes, t[101] activates for the first time, and the connection count blows past a database limit of 100, causing cascading failures during peak load.

This is the worst possible time to discover resource limits.

With sync.Pool, a normal load eventually cycles through all 120 workers, so there are no surprises when load rises. It's also worth noting that with per-P locality there's a high probability that consecutive requests on the same connection land on the same worker, keeping the various caches (CPU L1/L2/L3, etc.) warm. This is probably better for cache efficiency than the old low-index affinity.
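A tiny, self-contained illustration of that per-P locality claim (not FrankenPHP code): a Put followed by a Get on the same P usually returns the very same object, which is why the same worker tends to be reused for back-to-back requests, while GC and migration still spread work across all workers over time.

```go
// Illustrative only: shows sync.Pool's per-P fast path handing back the
// object that was just Put, assuming no GC or goroutine migration occurs
// in between.
package main

import (
	"fmt"
	"sync"
)

type worker struct{ id int }

func main() {
	var idle sync.Pool

	w := &worker{id: 42}
	idle.Put(w)

	// The Get typically hits the current P's private slot first, so it
	// usually returns the exact worker we just parked.
	got, _ := idle.Get().(*worker)
	fmt.Println(got == w) // usually prints: true
}
```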

Further Improvements

Further improvements will result in diminishing returns. Based on eBPF profiling of the pool implementation under load, the remaining overhead is well-distributed:

Syscall profile:

  • futex: 902K calls, 211s total (significantly reduced from baseline's 1.1M+ calls)
  • sched_yield: 58K calls, 5.3s (intentional scheduler balancing, kept from earlier work)
  • File I/O (openat/newfstatat/close): ~2.8M operations, 11s (PHP script execution)
  • nanosleep: 177K calls, 47s (timing operations)

Off-CPU profile shows time is now spent primarily on:

  • PHP memory allocation (_emalloc_192) and hash operations (zend_hash_index_find)
  • Go work-stealing and scheduling (normal runtime overhead)

The Go-side dispatch optimisation in this PR has eliminated the primary bottleneck (futex contention on channels). The remaining time is spent on productive work (PHP execution) and unavoidable overhead (file I/O, timing). Future optimisation would need to focus on PHP internals, which are outside the scope of FrankenPHP’s Go dispatcher.

  • All benchmarks were run with significant tracing enabled, so the absolute numbers may be skewed compared to real workloads.

Signed-off-by: Robert Landers <landers.robert@gmail.com>
@withinboredom withinboredom (Member Author) commented:

merging to get tests to run 😂

@withinboredom withinboredom merged commit 9822b38 into fix/latency Nov 23, 2025
@withinboredom withinboredom deleted the pools branch November 23, 2025 19:39