proposal: sync: add GetLocal() method to sync.Pool in order to return only P-local objects #65104

@valyala

Proposal Details

Currently sync.Pool.Get() works in the following way:

  1. It returns a P-local object if there is one.
  2. Otherwise, it steals an object from other P workers.
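
For reference, here is a minimal sketch of how calling code typically uses the existing API. The names `bufPool` and `render` are made up for illustration. Whether `Get()` serves the object from the P-local cache, steals it from another P, or falls back to `New` is invisible to the caller.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool hands out reusable *bytes.Buffer values.
// New is called only when Get finds nothing to reuse.
var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// render builds a greeting using a pooled buffer and returns the buffer
// to the pool on the same goroutine that obtained it.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // a reused buffer may still hold old data
	defer bufPool.Put(buf)

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies the bytes, so reuse is safe
}

func main() {
	fmt.Println(render("gopher")) // prints "hello, gopher"
}
```

This Get/Put-on-the-same-goroutine shape is the pattern the rest of the proposal is concerned with.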

This logic works well on systems with a small number of CPU cores, but it may slow down significantly on systems with many CPU cores, for the following reasons:

  • Stealing an object from other P workers requires reading memory that is missing from the local CPU cache. Such a read may be multiple orders of magnitude slower than a read from the local CPU cache. The CPU must then remove the stolen object from the foreign P queue. This requires writing to foreign memory, which in turn requires a potentially very slow inter-CPU cache flush/update for the modified memory location.
  • When the stolen object is returned from sync.Pool.Get(), its contents may still be missing from the local CPU cache, so further work with this object may be much slower than work with a P-local object.
  • The probability of inter-CPU object ping-pong between different P caches in sync.Pool increases with GOMAXPROCS (which defaults to the number of CPU cores), since all the P-local caches will hold at least one object only after all the P workers have simultaneously executed the same CPU-bound code between Get() and Put() calls. In other words, the sync.Pool inefficiency grows with the number of CPU cores.

It may seem that the way to go is to disallow stealing objects from other P workers. But this would slow down some valid use cases for sync.Pool, where Pool.Get() and Pool.Put() are called for the same pool from different sets of goroutines running on different sets of Ps. Disallowing stealing in this case would result in excess memory allocations on the Pool.Get() caller side, since the P-local cache would always be empty.

Given these facts, it would be great to add a GetLocal() method to sync.Pool, which would return nil if no object is found in the P-local cache, without attempting to steal an object from other P caches. The GetLocal() method could then be used instead of Get() in performance-critical code that, most of the time, calls Put() for the object on the same goroutine (and P) that obtained it from the pool.
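
To make the intended semantics concrete, here is a toy, single-threaded model. It is not the real sync.Pool implementation, and `shardedPool` and its explicit `shard` argument are hypothetical: the shard index stands in for the P id, which ordinary Go code cannot observe. The sketch only contrasts a stealing `Get` with a non-stealing `GetLocal`.

```go
package main

import "fmt"

// shardedPool is a toy model of a pool with one cache per P.
// It is NOT safe for concurrent use and omits GC-driven cleanup.
type shardedPool struct {
	shards [][]any
}

func newShardedPool(n int) *shardedPool {
	return &shardedPool{shards: make([][]any, n)}
}

// Put stores x in the cache of the given shard only.
func (p *shardedPool) Put(shard int, x any) {
	p.shards[shard] = append(p.shards[shard], x)
}

// GetLocal returns an object from the given shard's cache,
// or nil if that cache is empty. It never looks at other shards.
func (p *shardedPool) GetLocal(shard int) any {
	s := p.shards[shard]
	if len(s) == 0 {
		return nil
	}
	x := s[len(s)-1]
	s[len(s)-1] = nil // drop the reference held by the backing array
	p.shards[shard] = s[:len(s)-1]
	return x
}

// Get tries the local shard first and then steals from the others.
func (p *shardedPool) Get(shard int) any {
	if x := p.GetLocal(shard); x != nil {
		return x
	}
	for i := range p.shards {
		if i == shard {
			continue
		}
		if x := p.GetLocal(i); x != nil {
			return x
		}
	}
	return nil
}

func main() {
	p := newShardedPool(4)
	p.Put(0, "buf")

	fmt.Println(p.GetLocal(1)) // <nil>: shard 1 is empty and nothing is stolen
	fmt.Println(p.Get(1))      // buf: stolen from shard 0
	fmt.Println(p.Get(1))      // <nil>: the pool is now empty
}
```

In this model a caller that receives nil from `GetLocal` would allocate a fresh object itself. The first call in `main` also shows the cost described above for workloads that Put and Get on different Ps: the local cache is always empty, so every `GetLocal` call ends in an allocation.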

Update: the current implementation of sync.Pool.Put() already puts the object only into P-local storage, but this isn't documented and may change in the future, so it may be worth adding a PutLocal() method as well in order to make the API more consistent.

Summary

  • The proposal doesn't change the behaviour or the performance of existing code.
  • The proposal allows developers to improve sync.Pool scalability and performance on systems with a high number of CPU cores by switching from Pool.Get to Pool.GetLocal in CPU-bound code where the object is returned to the pool on the same goroutine that obtained it.
