This blog entry focuses specifically on latency imposed by frame buffer queues on GPUs, which are commonly set to a max of 3 or 4 frames. There’s other forms of latency – input latency, network latency – and I won’t be diving into these here.
In the context of a game engine pipeline, the majority of latency is a function of filled queues. If queues are not filled, or are on average only partially filled, then latency is a measure of how many frames of data are currently contained in that queue. In a typical game engine there is always a multi-frame GPU command buffer queue, and its default depth is usually three (3) frames. If that queue is full, then minimum latency at that moment is 3 frames (50ms at 60hz).
That sounds bad, yes. But it doesn’t mean that’s the effective latency. Most of the time, games can have a command buffer queue that holds 3 frames and rarely (if ever) use it. Let’s take a look at how this becomes a very interesting and complicated problem.
How to Measure Effective GPU Latency
If you mix a fast CPU with a slow GPU, it’s going to cause latency to spike toward the worst case.
- When a GPU is running very fast in any pipeline model, it will drain the buffers as fast as the CPU can create them. No frames in queue means no latency.
- When a GPU is running slower than the CPU then the CPU will keep pushing frames into the GPU’s queue until it maxes out the latency debt.
Latency = maxQueuedFrames * GpuRenderTimePerFrame
An important take-away here is that actual latency is all about the ratio and balance of CPU and GPU workloads. This is why latency will be fine on one PC and then horrible on another PC with slightly different specs.
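The formula above can be sketched as a tiny helper (the function name `worst_case_latency_ms` is mine, not from any engine API):

```python
def worst_case_latency_ms(max_queued_frames, gpu_frame_time_ms):
    """Worst-case queue latency: every slot in the command buffer
    queue holds one frame's worth of GPU render time."""
    return max_queued_frames * gpu_frame_time_ms

# A full 3-deep queue at 60hz (16.7 ms per frame) ≈ 50 ms of latency.
full_queue = worst_case_latency_ms(3, 1000 / 60)

# An empty queue means no latency, no matter how slow the GPU is.
empty_queue = worst_case_latency_ms(0, 1000 / 60)
```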
How VSYNC Makes a Mess of Things
An interesting thing happens when we consider the effect of turning on vsync. Vsync creates what I call an artificial bottleneck on the GPU. For example, if you have a game running 200fps on average then the worst-case GPU latency equation may look like this:
Latency Equation: maxQueuedFrames(4) * GpuRenderTimePerFrame(1000ms/200fps = 5ms)
Simplifies to: 4 * 5ms = 20ms MAX LATENCY
Twenty milliseconds isn’t great but it’s not bad either. Most devs and players will hardly notice. After we turn on traditional VSYNC (60hz):
maxQueuedFrames(4) * GpuRenderTimePerFrame(1000ms/60fps = 16.7ms)
4 * 16.7ms = 67ms MAX LATENCY
67ms. OUCH.
Worse, if you were getting 200fps before vsync then you know for sure that your CPU is pushing frames to the GPU way faster than 60fps, ensuring that the latency queue is always maxed out. Turning on vsync often leads to an immediate and permanent latency spike. This is why in a vast majority of game engines (Unity included) latency feels great when you have vsync turned off, but can suddenly become a quagmire when you flip vsync on.
Of course the first thing we all do at this point is set the maxQueuedFrames to 3 instead of 4. This shaves off 16.7ms and depending on the game might just be enough to get the game shipped to customers. Three cheers for cheap workarounds!
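The before/after math, including the cheap workaround, all falls out of the same formula (a sketch; the helper name is illustrative):

```python
def max_latency_ms(max_queued_frames, fps):
    # Worst case: the queue is full and every frame takes 1/fps to flip.
    return max_queued_frames * 1000 / fps

no_vsync   = max_latency_ms(4, 200)  # → 20.0 ms: not great, not terrible
with_vsync = max_latency_ms(4, 60)   # ≈ 66.7 ms: vsync caps the drain rate
shrunk     = max_latency_ms(3, 60)   # → 50.0 ms: the "set it to 3" workaround
```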
There are many factors to consider when taking latency measurements and making latency assumptions:
- Vsync is never enabled in the Unity Editor, and GPU command buffer queues are reduced to 1 or 2 in the Editor
- conclusion: latency measurements taken from Play Mode in the Editor are useless
- Unity Fixed-function Pipeline classically leaned toward CPU-heavy workloads, and was comparatively light on GPU workloads (aka ‘mobile friendly’)
- conclusion: latency was rarely an issue because the GPU was waiting for the CPU
- Unity running HDRP and URP workloads is more likely to bottleneck on the GPU, causing queues to fill and latencies to increase
- conclusion: latency has become a more widespread issue
A Sinister Timing Scenario
The workloads don’t even need to be off by much. Take for example the following parameters and notice that the CPU thread is running just a wee bit faster than the GPU:
| Clock | Rate |
| --- | --- |
| Target Framerate (VSync) | 60.0hz (16.7 ms) |
| Main Thread (CPU) | 60.25hz (16.6 ms) |
| Graphics Thread (GPU) | 60.0hz (16.7 ms) [vsync locked] |
Given the above performance profile, the CPU will slowly out-pace the GPU, generating one extra frame of content every 4 seconds (240 frames). The GPU queue will keep growing until it hits the max (3 frames after 12 seconds). Suddenly we’re experiencing worst-case latency even though the CPU is running ahead of the GPU and VSYNC by only a fraction of a millisecond, and there is no perceptible drop in framerate to clue us in to the cause. This is why it’s so important to have benchmarking tools that measure both average framerate over time and average GPU back buffer queue fill rate over time.
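Assuming the surplus rate stays steady, the fill-up timeline in this scenario is simple arithmetic (the helper is hypothetical):

```python
def time_to_fill_queue_s(cpu_hz, gpu_hz, max_queue):
    """Seconds until the GPU queue hits its max, given that the CPU
    thread outpaces the vsync-locked GPU by (cpu_hz - gpu_hz) frames/sec."""
    surplus_fps = cpu_hz - gpu_hz
    if surplus_fps <= 0:
        return float("inf")  # CPU never outruns the GPU; queue stays drained
    return max_queue / surplus_fps

# The scenario from the table: CPU at 60.25hz, GPU vsync-locked at 60hz.
one_extra_frame = time_to_fill_queue_s(60.25, 60.0, 1)  # → 4.0 seconds
queue_maxed_out = time_to_fill_queue_s(60.25, 60.0, 3)  # → 12.0 seconds
```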
This hyper-sensitivity is also the reason we can’t just hope to solve this problem with cracker-jack timing tricks. We need something better – we need a way to compensate for all the situations when unknowns happen and CPU/GPU timing becomes skewed.
Solving Vsync-imposed GPU Latency
The popular strategy is to try to pace the CPU to the GPU, so that the CPU only feeds the GPU frames when the GPU’s buffer is nearing empty. This is a very complicated way of effectively reducing the GPU’s command buffer queue to 2. So if you want to go this route, save yourself some time and just shorten your GPU command buffer queue whenever vsync is enabled. If your game never has performance wobble on the CPU or GPU, this will work well.
The better way to look at the problem of vsync-imposed GPU latency is to consider that we’re losing time by slide-showing old data to the user.

After just two laggy frames on the GPU, the next frame in the queue is now over 40ms late from the CPU, and the GPU can never catch up because of vsync. But wait – notice that at the point Frame 3 is being flipped onto the user’s display Frame 4 is already rendered by the GPU. Wouldn’t it be cool if we could give Frame 3 a pass, and flip right to Frame 4?
Turns out we can, with a little creativity.
First, use the Per-Frame Vsync Control flag widely available in almost every modern GPU API. It allows marking arbitrary frames as vsync enabled vs disabled. Next, make sure the main thread (CPU) is using a Scheduled Fill model, and that its schedule is set to match the vsync rate reported by the GPU. The animated illustration earlier shows a Scheduled Fill main thread.
If you aren’t familiar with Scheduled Fill, you can read up on it here. The quick summary is that the Main Thread (CPU) implements its own simulated vsync timer that throttles the rate at which it pushes new frames to the GPU. If the main thread is using an opportunistic fill model, then as soon as the GPU discards frames, the CPU will opportunistically re-fill that queue with new frames. I call this ‘CPU Over-Submission.’ The GPU will be constantly saddled with work, and will end up submitting all frames as vsync disabled. The deltaTime behavior of the CPU will be all over the place. It would be effectively the same as a classic vsync-disabled flip policy.
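As a rough sketch of the Scheduled Fill idea (the `submit_frame` callback is a stand-in for whatever actually builds and pushes a frame’s command buffer):

```python
import time

def scheduled_fill_loop(vsync_hz, submit_frame, frame_count):
    """Minimal Scheduled Fill sketch: the CPU throttles itself to the
    display's vsync rate instead of opportunistically refilling the
    GPU queue the moment a slot opens up."""
    period = 1.0 / vsync_hz
    next_deadline = time.perf_counter()
    for i in range(frame_count):
        submit_frame(i)              # build + push one frame's commands
        next_deadline += period      # fixed schedule, anchored to vsync
        remaining = next_deadline - time.perf_counter()
        if remaining > 0:
            time.sleep(remaining)    # wait out the rest of this vsync slot
        # if we ran late, don't sleep -- the fixed schedule absorbs the
        # overrun instead of letting drift accumulate frame over frame
```

Because the deadline advances by a fixed period rather than “now + period,” a single slow frame doesn’t skew every subsequent submission time.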
With Scheduled Fill combined with Per-Frame Vsync Control, we can predict the behavior of the three performance profile situations:
| Performance Profile | Predicted Behavior |
| --- | --- |
| GPU Bottleneck | CPU blocks due to full GPU queue, and falls into classic variable DeltaTime operation. GPU falls back to screen tearing mode to improve GPU-bound performance until the queue is drained. |
| CPU Bottleneck | Nothing special here… low framerate and also low latency (which is expected for any engine pipeline) |
| No Bottlenecks | GPU should operate almost entirely without screen tearing, except in edge cases where cadence between CPU and GPU becomes out-of-sync. In practice, no more than 1 of these per minute should be observable. |
As it happens, screen tearing is a pretty excellent way to cope with the occasional GPU performance bottleneck. Screen tearing only becomes evident to the eye when it occurs repeatedly. A one-off tear is nearly imperceptible (as refresh rates increase to and beyond 120hz, noticeability decreases further). It’s a low-impact alternative to a comparatively high-impact latency problem.
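The per-frame decision implied by the table above might look something like this (a hypothetical policy sketch, not a real GPU API call):

```python
def choose_flip_mode(queued_frames, max_queue):
    """Per-frame vsync decision: a backed-up queue means the GPU is the
    bottleneck, so flip immediately (accepting a one-off tear) to drain
    it; otherwise flip on vsync for tear-free presentation."""
    return "vsync" if queued_frames < max_queue else "immediate"

healthy = choose_flip_mode(1, 3)  # → "vsync": queue has headroom
backed_up = choose_flip_mode(3, 3)  # → "immediate": tear once to catch up
```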
It would be extra cool to be able to retroactively change the vsync flag for an already submitted, but not yet processed, command buffer. Modifying submitted command buffers directly is too risky, but what could be provided instead is an async-friendly (lock-free) override toggle that the GPU samples at the point it executes logic for the next flip. If any GPU API/driver authors are reading: I’ve wanted that feature for 10 years!
Wrapping It All Up: Now You Can Always Enable Vsync
The old rule-of-thumb for vsync is that vsync should be used any time the GPU is not a bottleneck, and vsync should be turned off any time the GPU is a bottleneck. Doing so at runtime requires that the GPU be aware of whether or not it’s the bottleneck. Given the assumption of a Scheduled Fill Main Thread, the GPU can finally make its own judgement about when it’s a bottleneck or not based on the state of its queue.
The beauty is that it solves the question of when to enable vsync. Once we can allow our GPU to decide when to use vsync on a per-frame basis, there’s no longer much need to wonder if we should enable or disable vsync globally. Just turn vsync on and if the GPU becomes a bottleneck, it will automatically switch to vsync-disabled behavior to help keep pace.
The only requirement is that the Main Thread run on a schedule that matches the current hardware vsync. These days that’s pretty easy, as all modern hardware sports high-precision, core-coherent timing mechanisms suitable for matching the vsync timing reported by your device driver to very high accuracy. And to be honest, in my experience even vanilla millisecond resolution is fine enough to maintain perfectly smooth rendering at 120hz (higher accuracy will benefit higher refresh rates).
The kryptonite for Scheduled Fill is when the actual timing of the device’s vsync is unknown. There’s no good solution in this scenario. In that case you pretty much need to fall back to a double buffer on the GPU, force-disable vsync entirely, or accept that latency will probably be an issue. The upside is that such lack of info is extremely rare these days, since most streaming media services also depend on precise knowledge of device refresh rates in order to play videos.


























