Solving GPU Vsync/Swap Latency in Game Engines

This blog entry focuses specifically on latency imposed by frame buffer queues on GPUs, which are commonly set to a max of 3 or 4 frames. There are other forms of latency – input latency, network latency – and I won’t be diving into those here.

In the context of a game engine pipeline, the majority of latency will be a function of filled queues. If queues are not filled, or are on average only partially filled, then latency is a measure of how many frames of data are currently contained in that queue. In a typical game engine, there is always a multi-frame GPU command buffer queue, and its default depth is usually three (3) frames. If that queue is full, then minimum latency at that moment is 3 frames (50ms at 60hz).

That sounds bad, yes. But it doesn’t mean that’s the effective latency. Most of the time, a game can have a command buffer queue that holds 3 frames and rarely (if ever) fill it. Let’s take a look at how this becomes a very interesting and complicated problem.

How to Measure Effective GPU Latency


  • When a GPU is running very fast in any pipeline model, it will drain the buffers as fast as the CPU can create them. No frames in queue means no latency.
  • When a GPU is running slower than the CPU then the CPU will keep pushing frames into the GPU’s queue until it maxes out the latency debt.
    • Latency = maxQueuedFrames * GpuRenderTimePerFrame

An important take-away here is that actual latency is all about the ratio and balance of CPU and GPU workloads. This is why latency will be fine on one PC and then horrible on another PC with slightly different specs. If you mix a fast CPU with a slow GPU, it’s going to cause latency to spike toward the worst case.

How VSYNC Makes a Mess of Things

An interesting thing happens when we consider the effect of turning on vsync. Vsync creates what I call an artificial bottleneck on the GPU. For example, if you have a game running 200fps on average then the worst-case GPU latency equation may look like this:

Latency Equation:
  maxQueuedFrames(4) * GpuRenderTimePerFrame(1000ms/200fps = 5ms)

Simplifies to:
  4 * 5ms = 20ms MAX LATENCY

Twenty milliseconds isn’t great but it’s not bad either. Most devs and players will hardly notice. After we turn on traditional VSYNC (60hz):

  maxQueuedFrames(4) * GpuRenderTimePerFrame(1000ms/60fps = 16.7ms)

  4 * 16.7ms = 67ms MAX LATENCY

67ms. OUCH.

Worse, if you were getting 200fps before vsync then you know for sure that your CPU is pushing frames to the GPU way faster than 60fps, ensuring that the latency queue is always maxed out. Turning on vsync often leads to an immediate and permanent latency spike. This is why in a vast majority of game engines (Unity included) latency feels great when you have vsync turned off, but can suddenly become a quagmire when you flip vsync on.

Of course the first thing we all do at this point is set the maxQueuedFrames to 3 instead of 4. This shaves off 16.7ms and depending on the game might just be enough to get the game shipped to customers. Three cheers for cheap workarounds!
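The latency equation above is simple enough to play with directly. Here’s a minimal sketch in plain Python (the function name is mine) reproducing the numbers from both scenarios, plus the cheap workaround:

```python
def worst_case_latency_ms(max_queued_frames, gpu_frame_ms):
    # Latency = maxQueuedFrames * GpuRenderTimePerFrame
    return max_queued_frames * gpu_frame_ms

# uncapped at 200fps: a queue of 4 costs 20ms
print(worst_case_latency_ms(4, 1000 / 200))               # → 20.0

# vsync at 60hz: the same queue now costs ~67ms
print(round(worst_case_latency_ms(4, 1000 / 60), 1))      # → 66.7

# the cheap workaround: drop the queue to 3
print(round(worst_case_latency_ms(3, 1000 / 60), 1))      # → 50.0
```

Note that vsync didn’t change the queue depth at all; it only inflated the per-frame term, and the queue multiplied the damage.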

There are many factors to consider when taking latency measurements and making latency assumptions:

  • Vsync is never enabled in the Unity Editor, and GPU command buffer queues are reduced to 1 or 2 in the Editor
    • conclusion: latency measurements taken from Play Mode in the Editor are useless
  • Unity Fixed-function Pipeline classically leaned toward CPU-heavy workloads, and was comparatively light on GPU workloads (aka ‘mobile friendly’)
    • conclusion: latency was rarely an issue because the GPU was waiting for the CPU
  • Unity running HDRP and URP workloads is more likely to have bottlenecks on the GPU, causing queues to fill and latencies to increase

Ergo, latency has become a more widespread issue.

A Sinister Timing Scenario

The workloads don’t even need to be off by much. Take for example the following parameters and notice that the CPU thread is running just a wee bit faster than the GPU:

Target Framerate (VSync)   60.0hz   (16.7 ms)
Main Thread (CPU)          60.25hz  (16.6 ms)
Graphics Thread (GPU)      60.0hz   (16.7 ms) [vsync locked]

The GPU average framerate is running just a hair below the CPU’s.

Given the above performance profile, the CPU will slowly out-pace the GPU, generating an additional frame of content every 4 seconds (240 frames). The GPU queue will keep growing until it hits the max (3 frames after 12 seconds). Suddenly we’re experiencing worst-case latency even though the CPU is running ahead of the GPU and VSYNC by only a fraction of a millisecond, and there is no perceptible drop in framerate to clue us in that anything is wrong. This is why it’s so important to have benchmarking tools that measure both average framerate over time, and average GPU back buffer queue fill over time.
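The drift arithmetic above fits in a couple of lines. This sketch (plain Python, function name is mine) computes how long it takes the CPU to get a full queue ahead of the GPU:

```python
def seconds_until_queue_full(cpu_hz, gpu_hz, max_queued_frames):
    # the CPU banks (cpu_hz - gpu_hz) surplus frames per second;
    # the queue maxes out once that surplus equals the queue depth
    surplus_fps = cpu_hz - gpu_hz
    if surplus_fps <= 0:
        return float("inf")  # the GPU keeps up; the queue never fills
    return max_queued_frames / surplus_fps

# the sinister scenario: CPU at 60.25hz vs a 60hz vsync-locked GPU
print(seconds_until_queue_full(60.25, 60.0, 3))   # → 12.0
```

A mere quarter-frame-per-second of surplus is enough to guarantee worst-case latency within seconds.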

This hyper-sensitivity is also the reason we can’t just hope to solve this problem with cracker-jack timing tricks. We need something better – we need a way to compensate for all the situations when unknowns happen and CPU/GPU timing becomes skewed.

Solving Vsync-imposed GPU Latency

The popular strategy is to try to pace the CPU to the GPU, so that the CPU only feeds the GPU frames when the GPU’s buffer is nearing empty. This is a very complicated way of effectively reducing the GPU’s command buffer queue to 2. So if you want to go this route, save yourself some time and just shorten your GPU command buffer queue whenever vsync is enabled. If your game never has performance wobble on the CPU or GPU, this will work well.

The better way to look at the problem of vsync-imposed GPU latency is to consider that we’re losing time by slide-showing old data to the user.

Hopefully this timeline helps visualize the Vsync Slideshow problem.

After just two laggy frames on the GPU, the next frame in the queue is now over 40ms late from the CPU, and the GPU can never catch up because of vsync. But wait – notice that at the point Frame 3 is being flipped onto the user’s display Frame 4 is already rendered by the GPU. Wouldn’t it be cool if we could give Frame 3 a pass, and flip right to Frame 4?

Turns out we can, with a little creativity.

First, use the Per-Frame Vsync Control flag feature widely available in almost every modern GPU API. It allows marking arbitrary frames as vsync enabled vs disabled. Next, make sure the main thread (CPU) is using a Scheduled Fill model, and that its schedule is set to match the vsync rate reported by the GPU. The animated illustration earlier shows a Scheduled Fill main thread.

If you aren’t familiar with Scheduled Fill, you can read up on it here. The quick summary is that the Main Thread (CPU) implements its own simulated vsync timer that throttles the rate at which it pushes new frames to the GPU. If the main thread is using an opportunistic fill model, then as soon as the GPU discards frames, the CPU will opportunistically re-fill that queue with new frames. I call this ‘CPU Over-Submission.’ The GPU will be constantly saddled with work, and will end up submitting all frames as vsync disabled. The deltaTime behavior of the CPU will be all over the place. It would be effectively the same as a classic vsync-disabled flip policy.
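The core of a Scheduled Fill main loop is tiny: instead of submitting whenever the GPU queue has room, the CPU sleeps until its own simulated vsync tick. A rough sketch in plain Python (standing in for the engine’s main thread; the `submit_frame` callback is hypothetical):

```python
import time

def scheduled_fill_loop(vsync_hz, submit_frame, frame_count):
    # the CPU's simulated vsync: submit exactly one frame per tick,
    # regardless of how much room the GPU queue currently has
    period = 1.0 / vsync_hz
    next_tick = time.monotonic()
    for _ in range(frame_count):
        submit_frame()
        next_tick += period
        remaining = next_tick - time.monotonic()
        if remaining > 0:
            time.sleep(remaining)  # throttle: wait for the next scheduled slot
```

Note the accumulating `next_tick += period` rather than a plain `sleep(period)`: the schedule stays drift-free even when `submit_frame` takes a variable amount of time.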

With Scheduled Fill combined with Per-Frame Vsync Control, we can predict the behavior of the three performance profile situations:

  • GPU Bottleneck – CPU blocks due to full GPU queue, and falls into classic variable DeltaTime operation. GPU falls back to screen tearing mode to improve GPU-bound performance until the queue is drained.
  • CPU Bottleneck – Nothing special here… low framerate and also low latency (which is expected for any engine pipeline).
  • No Bottlenecks – GPU should operate almost entirely without screen tearing, except in edge-case situations where the cadence between CPU and GPU becomes out-of-sync. In practice, no more than one of these per minute should be observable.

As it happens, screen tearing is a pretty excellent way to cope with the occasional GPU performance bottleneck. Screen tearing only becomes evident to the eye when it occurs repeatedly. A one-off tear is nearly imperceptible (as refresh rates increase to and beyond 120hz, noticeability decreases further). It’s a low-impact alternative to a comparatively high-impact latency problem.

It would be extra cool to be able to retroactively change the vsync flag for an already submitted, but not yet processed, command buffer. Modifying submitted command buffers directly is too risky, but what could be provided instead is an async-friendly (lock-free) override toggle that the GPU can sample at the point it executes logic for the next flip. If any GPU API/driver authors are reading: I’ve wanted that feature for 10 years!

Wrapping it all up: Now you can Always Enable Vsync

The old rule-of-thumb for vsync is that vsync should be used any time the GPU is not a bottleneck, and vsync should be turned off any time the GPU is a bottleneck. Doing so at runtime requires that the GPU be aware of whether or not it’s the bottleneck. Given the assumption of a Scheduled Fill Main Thread, the GPU can finally make its own judgement about when it’s a bottleneck or not based on the state of its queue.

The beauty is that it solves the question of when to enable vsync. Once we can allow our GPU to decide when to use vsync on a per-frame basis, there’s no longer much need to wonder if we should enable or disable vsync globally. Just turn vsync on and if the GPU becomes a bottleneck, it will automatically switch to vsync-disabled behavior to help keep pace.
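The per-flip decision itself is trivial once the GPU can see its own queue state. A sketch of the policy in plain Python (the exact threshold is my assumption; a real driver might prefer "nearly full"):

```python
def use_vsync_for_next_flip(queued_frames, max_queued_frames):
    # a full queue means the GPU is the bottleneck: tear this one
    # frame to drain the backlog, otherwise sync as normal
    return queued_frames < max_queued_frames

print(use_vsync_for_next_flip(1, 3))   # → True  (healthy queue: sync)
print(use_vsync_for_next_flip(3, 3))   # → False (queue full: tear to catch up)
```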

The only requirement is that the Main Thread run according to a schedule that matches the current hardware vsync. And these days that’s pretty easy, as all modern hardware sports high-precision core-coherent timing mechanisms suitable for matching the vsync timing reported by your device driver to very high accuracy. To be honest, in my experience even vanilla millisecond resolution is fine enough to maintain perfectly smooth rendering at 120hz (higher accuracy will benefit higher refresh rates).

The kryptonite for Scheduled Fill is when the actual timing of the device’s vsync is unknown. There’s literally no good solution in this scenario. In such a case you pretty much need to fall back to a double buffer on the GPU, force-disable vsync entirely, or accept that latency will probably be an issue. The upside is that such a lack of info is extremely rare these days, since most streaming media services also depend on having precise knowledge of device refresh rates in order to play videos.

Turning off Stable Fit Shadowing

Today I realized that I’d totally busted shadow behavior in Unity with one of my Orthographic Camera Tips, relating to some very interesting and unexpected behavior in Unity’s Stable Fit shadow mode (the default shadowing mode in the built-in render pipeline). I updated the entry accordingly. The quick gist is to use Close Fit shadows for isometric gameplay projects, because Stable Fit depends on a concept of camera depth that doesn’t exist in orthographic projections.

If you have Stable Fit enabled then this is the kind of behavior you’ll observe when changing the near plane of your orthographic camera. Pay close attention to the shadows under the sphere and along the wall edges
Edit -> Project Settings -> Close Fit – for the Orthographic Win!

Orthographic Camera Tips for Unity3D

During the process of accomplishing some modest goals in isometric gameplay, I’ve taken notes on mistakes and oversights I made while rigging my first orthographic cameras.

(shameless SEO paragraph) The orthographic camera is a staple of Isometric Gameplay Environment, such as both realtime (RTS) and especially turn based (TBS) strategy games. In this entry, I will explain how I implemented a flexible solution to controlling the camera, within the context of the Unity3D 2019 engine. (/end SEO)

Tip #1: Use X/Z coordinates for landscaping

The classic view of an isometric map has X and Y axes for the map surface, and Z as the Up Vector. This stems from isometrics’ long history of depicting 2D landscapes, where 2D coordinate systems naturally use X/Y mnemonics. Contrast this with most 3D engines, and Unity specifically, which treat the X and Z vectors as the map surface, and Y as the Up Vector.

Why does it matter? Because the majority of GameObjects, models, and Unity components (most notably the Physics components) default to the assumption that the Up Vector is the Y vector. More importantly, Unity has provided named constants for the Cardinal Vectors for us — Right (X), Up (Y), Forward (Z) — and if we re-orient the entire world to suit a paradigm where Z=Up, those named constants become confusing/misleading. We can remap the Physics Up Vector (aka gravity), and we can re-orient models at runtime using extra transforms (less efficient but not a show-stopper), but we cannot redefine Unity’s built-in constants for Vector3.up, Vector3.right, and Vector3.forward.

For this reason I strongly suggest orienting your world map structure such that XZ are the landscape coordinates and Y is UP. This will save you a ton of headaches later on. It’s fine to have XY data in your map assets, but do convert it to XZ data so that you can operate on it using the named constant vectors, and orient your models in an expected way.
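The XY-to-XZ conversion itself is a one-liner. A sketch in plain Python (the function name is mine), mapping a 2D map coordinate into the Y-is-up world convention:

```python
def map_xy_to_world_xz(x, y, height=0.0):
    # 2D map coordinates (x, y) become world (x, height, z):
    # the map's y axis lands on the world's z axis, and world y is UP
    return (x, height, y)

print(map_xy_to_world_xz(3, 7))   # → (3, 0.0, 7)
```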

Tip #2: Disable the Near Plane Clipping

Size and Clipping Planes define the Orthographic Frustum. For best results, the Near Plane should be set to a very large negative number.

Go ahead and set that Near Clipping Plane to -1000 (the default value is usually 0.1 or so), or even -100000. If you don’t do this, your camera is just going to be a total pain to work with, and it’s going to trick you into doing some bad things to work around seemingly nonsensical object clipping behavior.

The technical rationale: The visible area of the screen (camera plane) is determined according to the size parameter, and that combined with the Near/Far clipping planes determine the frustum. The way the frustum works is pretty similar to perspective cameras, meaning that objects are drawn or clipped according to depth — which is a problem because Orthographic views do not have a valid concept of depth.

The only truly valid clipping for an orthographic view is defined by the rectangle of the screen on which the image is being projected. This clipping area is defined by the Size parameter alone. If an object, orthographically projected, lives within that space, it should be drawn. If it lives outside that space, it should be clipped. This is the intended operation of an orthographic projection.

Therefore, I personally disable both near and far clipping planes (i.e., set them to very large magnitudes, e.g. -100000 and 100000), though there could be some advantage to setting the far clipping plane lower if you happen to be implementing an oblique orthographic view where the camera is set low on the horizon. Maybe. It’s a hard sell.

Tip #3: Turn off Stable Fit Shadow Projection

(I have only verified this option using the classic Built-In Render Pipeline – behavior for Universal Render Pipeline surely differs)

Edit -> Project Settings -> Close Fit – for the Orthographic Win!
If you have Stable Fit enabled then this is the kind of behavior you’ll observe when changing the near plane of your orthographic camera. Pay close attention to the shadows under the sphere and along the wall edges

Personally, I would recommend turning off Stable Fit shadows for any style of game that isn’t a first-person shooter or VR headset game. It’s especially bad for orthographic cameras because the cascade selection is based on a dodgy heuristic involving the near and far clipping planes, and disabling the near plane on our Orthographic Camera totally breaks that heuristic.

(note: in my opinion Stable Fit is an outright cheat tailored to the FOV settings used for FPS and VR, where the near plane is almost always near 0 and the far plane is typically not so far from the near plane – it should probably be disabled for most third-person gameplay contexts as well as anything using orthographic perspectives. Why is it even the default setting?)

Secondly, disable cascades. Shadow cascades don’t work at all in orthographic cameras, since those depend on camera depth measurements, and orthographic cameras (surprise!) have no true concept of depth. If the cascades did do anything, it would be some nonsense behavior where shadows would cascade near the edges of the screen.

Finally, set Shadow Distance to a large value, like 10000. Again, none of these “camera depth” or “camera distance” concepts make sense in the context of an orthographic projection.

Tip #4: Editor Orthographic Scene View

Yeah, I know, this one’s going to be obvious for most folks. It took me a while to figure it out and, interestingly, Unity Editor does a much better job of switching between perspective and orthographic camera modes than the GameObject camera does. It even has this snazzy zoom transition thing! That’s probably worth a blog article to itself because you’ll notice it’s actually non-trivial to implement a similar behavior using the Camera GameObjects within the game itself.

(hint: I assume it animates the FOV of the perspective camera before hard-switching to the orthographic camera – combined with an additional calculation to determine the orthographic size parameter as a function of the perspective camera’s distance from an object, etc. – but I have not verified it)

Tip #5: Move and Orient your Camera in 3D

Re-printed from Wikipedia / SharkD / CC BY-SA
(https://creativecommons.org/licenses/by-sa/3.0)

If you come from a 3D gameplay or Unity background, this may feel obvious. But if you come from 2D/isometric gaming and art backgrounds, then you may have been thinking of orthographic projections in terms of the 2D angular components that form an orthographic illustration. What I mean by that is these 2D angles as illustrated by Wikipedia’s topics on Orthographic Projection (side).

Now is a good time to change that paradigm.

It’s true that you can develop Isometric gameplay within the context of what people have traditionally called 2.5D – a pseudo 3D environment where almost everything is 2D except a few specific bits and pieces needed to sell the illusion. This approach has its merits if you are developing your own 2.5D orthographic engine with a single fixed-function camera projection pipeline. We are not. We are using Unity, and Unity is all about full-3D environments, with full 3D models. These depend on 3D world space coordinates for our camera.

As a specific example, the modern physically-based material pipeline has specular and reflection features that depend on the camera’s full 3D position. If you don’t make an effort to handle your camera as though it exists in depth-honoring 3-dimensional space then you’ll likely have to avoid these effects entirely and use only Unity’s non-reflective legacy materials, such as Legacy Diffuse. While I’m sure it’s possible to develop a 2.5D isometric game in Unity, likely using 2D sprites instead of 3D models, this would be an exceptionally challenging task and is well outside the scope of today’s tips.

It can be helpful to have world axis orientation lines similar to those in the photo above. I made a lightweight component script to help me visualize those lines (extra helpful when paired with a tiled floor texture):

void Update() {
    // draw guide lines from the world origin along each axis,
    // color-matched to Unity's gizmo convention: X red, Y green, Z blue
    Debug.DrawRay(Vector3.zero, -Vector3.right   * 15, Color.red,   0.5f);
    Debug.DrawRay(Vector3.zero,  Vector3.up      * 15, Color.green, 0.5f);
    Debug.DrawRay(Vector3.zero, -Vector3.forward * 15, Color.blue,  0.5f);
}

Tip #6: Verify Your Camera Position in the Editor View!

Is your orthographic camera really where you think it is? If you’re basing it on what you see in the Game View (Camera Preview), then the answer is very likely “probably not”.

A perspective camera gives you depth perspective by which to help clue you in about the position of the camera within a world. If the camera is too close or too far from an object, well, you clearly can tell. The orthographic camera is not so fortunate: depth does not really exist and so there are no good clues about how far the camera is from the objects in view. In fact, given any specific view on your screen, there are literally an infinite number of possible camera orientations along a vector which will produce the exact same Camera View result. This can fool you into thinking you have a sensible position for your camera when, in fact, your camera could be a million miles away. Or it could be at 0,0,0.
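This ambiguity is easy to demonstrate numerically. An orthographic screen position is just the camera-to-point offset measured along the camera’s right and up axes; any motion along the view direction is discarded. A sketch in plain Python (names are mine):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def ortho_screen_pos(point, cam_pos, cam_right, cam_up):
    # project the camera-to-point offset onto the camera's right/up axes;
    # the component along the view direction never appears in the result
    offset = tuple(p - c for p, c in zip(point, cam_pos))
    return (dot(offset, cam_right), dot(offset, cam_up))

right, up = (1, 0, 0), (0, 1, 0)  # a camera looking down +Z
point = (2, 3, 5)

# same screen position whether the camera is 1 unit away or a million
print(ortho_screen_pos(point, (0, 0, 4), right, up))           # → (2, 3)
print(ortho_screen_pos(point, (0, 0, -1_000_000), right, up))  # → (2, 3)
```

Every camera position along the view vector collapses to the same image, which is exactly why the Camera Preview can’t tell you where your camera really is.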

Just for a bit of mind-warping awesome, consider for a moment the fact that you could — if so inclined — implement an entire Isometric Game without ever changing the height of your camera (Y=0 when following the Y=Up convention). To demonstrate this phenomenon, here’s an image of what might be a typical initial camera setup for an isometric view, pay attention to the Position of the camera:

Right panel: a side view along the y-axis (green), with the frustum visualized by the white lines.
Observe the camera position in the Scene view on the right panel –
it’s nonsense, but the Game View and Camera Preview kinda look OK.

As you can see, the game view looks nearly identical even though the cameras are in vastly different positions. It’s through this illusion that you can think you’ve rigged an orthographic camera correctly, only to fail spectacularly when trying to execute a pan or spin around a variable pivot point on your isometric landscape.

So I strongly suggest paying close attention to your camera’s position and behavior inside Unity’s Editor View, at least until the general framework of your camera controls is rigged. Don’t just rely on what you see in that Camera Preview. It can save you time.

Tip #7: Orthographic isn’t a simplification, and some closing words

These tips are helpful for avoiding some pitfalls but are still a long way from rigging the more advanced features of a good isometric camera. Isometric views have lots of advantages in terms of gameplay mechanics but are, generally speaking, an added layer of complexity from a game development perspective — unless you decide to limit your game to a handful of fixed-angle views and limited camera movement.

Unity3D compounds that problem, since it is built from the ground up in a manner that limits us from taking advantage of certain “orthographic hacks” that might otherwise be available to a custom 2.5D style game engine. If you are thinking of utilizing an orthographic camera as a means to simplify your game — for example making it “less 3D” and thus less mathematically intimidating — then probably don’t do it. Orthographics in Unity3D become the worst of both worlds: you have to rig everything up as though it’s full 3D perspective, and you have to then filter it through an unorthodox reprojection matrix that hides visual information from you about the state of your world.

It’s my experience so far that robust isometric view game development requires an even stronger grasp of 3D math and methodologies than games built on 3D perspective views. Sure, Orthographics offer gameplay and artistic advantages! But be sure to weigh those vs. the added complexity of rigging cameras that can play by all the weird rules of depth-aliasing.

Rotation Matrices and Looking at a Thing the Easy Way

(part two of two)
(Part 1 – Dots, Crosses, and Looking at a Thing the Hard Way)

In Part One of this series, I explained the process by which I learned that Dot Product and Cross Product are a lousy way to implement a LookAt() or LookRotate() function. The Dot Product works well enough in 2D and for vectors which have some orthogonality to them, but is a poor tool for calculating the angle between two free vectors in three-space. The Cross Product works well for setting up a perpendicular axis, which can be used in AngleAxis(), but that axis is mathematically generated, with no regard for how it might warp or skew unrelated orientations while rotating an object toward the target (famously, UP becomes corrupted, causing an object to twist as it rotates to face another object).

Not to be deterred, I dug deeper into the heart of things, and decided to try my hand at setting up a Rotation Matrix. As it turns out, starting here would have been easier. Much easier. Alas, I’ve been trying to take more top-down approaches to problem solving lately, and (in theory) matrices are lower level than quat rotations. I feel like this is a debatable perspective. Moving on….

What is a Rotation Matrix?

There are plenty of mathematical definitions, and the value of these is likely nil outside of discovering optimization paths for complex series of transformations (or re-inventing some very crude mathematical wheels, which I refuse to do unless being paid to do so). So let’s do the simple awesome game developer definition…

A Unity-friendly column-major identity matrix.

First, a rotation matrix is composed of the three independent XYZ axes, arranged by COLUMN in Unity and OpenGL, and by ROW nearly everywhere else. This is a huge gotcha for many folks, but not something I’m going to spend much time on here.

To the right you can see the values for Vector3.right [1,0,0], Vector3.up [0,1,0], and Vector3.forward [0,0,1] plugged into the X, Y, and Z columns. This forms the Identity Matrix. These axes are the same as the vector normals of the rotated object. In the case of the Identity Matrix no rotation occurs: the normals of the object match the world axes.

To further illustrate the relationship of the Rotation Matrix to the normals/axes of the object in question, I threw together this quick script and attached it to an object in my scene:

void Update() {
    var mat = Matrix4x4.Rotate(gameObject.transform.localRotation);
    var r = mat.GetColumn(0);
    var u = mat.GetColumn(1);
    var f = mat.GetColumn(2);

    // these lines will match precisely the axis arrows
    // drawn by Unity Editor.

    Debug.DrawRay(Vector3.zero, r, Color.red);
    Debug.DrawRay(Vector3.zero, u, Color.green);
    Debug.DrawRay(Vector3.zero, f, Color.blue);
}
These three arrows are the vectors that make up a Rotation Matrix.

This gives you a cool little manual/physical exercise, where you can take an object and point it at another object of interest using the Unity Editor, and then inspect the resulting values. And what we will notice – and at risk of stating the obvious – is that the Forward Vector always points directly toward the object. The question is, how do we determine the other vectors?

Making a LookRotation Matrix

As it turns out, a Rotation Matrix is an easy and ideal way to create an object’s rotation oriented according to any vector of our choosing. We already have the Forward Vector (Z), since we know exactly what we want our object to be facing: toward the target! The trick is choosing the other two vectors.

The short answer is that as long as the other two vectors form right angles with our Forward Vector, then the Forward Vector will aim true.

A rotation matrix with an Oblique angle along Green (UP) and Blue (Forward) vectors.

If we build a matrix of vectors that do not form right angles, then the actual Forward Vector of our rotated object will be skewed away from our intended target.

If this is confusing, then take a quick look at the image on the right. I set up a rotation matrix for this camera that is not well-formed – the green (up) and blue (forward) vectors are oblique; they do not form a right angle. Because of this, the true orientation of the camera, as shown by Unity Editor’s gizmos (arrows), is skewed away from the intended Forward vector. Indeed, Unity Editor’s gizmos show up as a set of right-angle vectors that also describe the exact same rotation. This is what we call “aliasing” — for any given orientation in three-space there are infinite Rotation Matrix possibilities that can produce that orientation.

You could even say that the Quaternion itself utilizes this redundant feature of rotation, thus compressing nine values into four values. A quaternion encodes a subset of fixed-angle possibilities into its four values, and disregards the billions of other oblique permutations. There’s a lot of trickery of imaginary math involved to accomplish that, but that is more-or-less a practical explanation of how a Quat does what it does.

I’d like a Triple Right-Angle With Cheese

As we can see, the golden rule for building a well-formed Rotation Matrix is to make sure all three vectors form right-angles with each other. More specifically, if we ensure all vectors are at right-angles, then it also ensures that each cardinal of our object is going to point exactly in the direction we specify.

And the rule for building right-angles? Perpendiculars, of course. Hello again, old friend…

… Sir Cross Product.
(a recent knighthood was bestowed somewhere upon a fictional land which happened to be described predominantly by right triangles)

ALERT! the following table may depend on your game engine’s coordinate system. I’ve shown the cross products as they apply in Unity’s left-handed coordinate system:

Expressed as up, forward, right…          Expressed as XYZ…
RIGHT   = CROSS(up, forward)              X = cross(Y, Z)
UP      = CROSS(forward, right)           Y = cross(Z, X)
FORWARD = CROSS(right, up)                Z = cross(X, Y)

Notice that each perpendicular is formed from the two vectors heading clockwise, if you were to draw X/Y/Z on the face of a clock.
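The cyclic relationships in the table are easy to sanity-check numerically. The component formula for the cross product is the same regardless of engine handedness (handedness only changes the geometric interpretation), so a plain Python check applies:

```python
def cross(a, b):
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by,
            az * bx - ax * bz,
            ax * by - ay * bx)

X, Y, Z = (1, 0, 0), (0, 1, 0), (0, 0, 1)  # right, up, forward

print(cross(Y, Z) == X)  # RIGHT   = CROSS(up, forward)   → True
print(cross(Z, X) == Y)  # UP      = CROSS(forward, right) → True
print(cross(X, Y) == Z)  # FORWARD = CROSS(right, up)      → True
```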

We have a Forward Vector (Z), cool. So let’s start typing out what our Right Vector (X) will be…

var forward = (target - gameObject.transform.position).normalized;
var right = Vector3.Cross(??, forward);   // crap, what's Y here?

What is the Up Vector (Y) here? So here’s the thing, it depends on what rotation axis we want to lock. Typically, the goal is to lock the Up Vector so that it matches the original object’s UP, where up is a vector that points through the top of a box, or through a model’s “head”, or which orients a camera so that the camera image matches the world’s concept of “up”. 99% of the time, this should be the same as the constant Vector3.Up. And that is our answer:

// note that object_cardinal_up is not the same as gameObject.transform.up!
// the cardinal up vector for the object is usually a constant
// (unless the object's model is not well-formed) and should
// usually be Vector3.Up.

var object_cardinal_up = Vector3.up;
var right = Vector3.Cross(object_cardinal_up, forward);

Next up, calculate the Up Vector (Y) for the matrix, which is in no way related to the Up vector of the object. Indeed, it is simply the cross product (right-angled) of the two vectors we already have:

var up = Vector3.Cross(forward, right);

Finally, assign them to a matrix and apply the rotation:

var mat = Matrix4x4.identity;
mat.SetColumn(0, right);
mat.SetColumn(1, up);
mat.SetColumn(2, forward);

// Unity gotcha #677: remember to always assign to
// localRotation, unless you have a really well-
// understood (aka "good") reason not to.

gameObject.transform.localRotation = mat.rotation;

Boom, Done. We just implemented our own LookAt() / LookRotation() ! When I put it all together, it looks like this:

void LookAt(Vector3 target) {
    var object_up = Vector3.up;
    var forward = (target - gameObject.transform.position).normalized;

    var right    = Vector3.Cross(object_up, forward).normalized;
    var up       = Vector3.Cross(forward, right).normalized;

    var mat = Matrix4x4.identity;
    mat.SetColumn(0, right);
    mat.SetColumn(1, up);
    mat.SetColumn(2, forward);

    gameObject.transform.localRotation = mat.rotation;
}
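To convince myself the construction is sound without loading the editor, the same math ports to a few lines of plain Python (the helper names are mine), and we can check that the resulting basis is orthonormal and that forward actually aims at the target:

```python
import math

def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def dot(a, b): return sum(x * y for x, y in zip(a, b))

def norm(v):
    m = math.sqrt(dot(v, v))
    return tuple(x / m for x in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def look_rotation_basis(eye, target, world_up=(0.0, 1.0, 0.0)):
    # same recipe as the C# LookAt above: forward toward the target,
    # then two cross products to complete a right-angled basis
    forward = norm(sub(target, eye))
    right = norm(cross(world_up, forward))
    up = norm(cross(forward, right))
    return right, up, forward

r, u, f = look_rotation_basis((0, 0, 0), (3, 1, 2))
# all three pairwise dot products should be ~0 (mutually perpendicular)
print(abs(dot(r, u)) < 1e-9, abs(dot(u, f)) < 1e-9, abs(dot(f, r)) < 1e-9)
```

(Note the degenerate case: if forward is parallel to world_up, the first cross product collapses to zero, which is the same failure mode the C# version inherits.)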

Verification Time

It’s not enough to just test this code in a static scene with some pre-set angles. There are too many ways I could have messed up my math such that it just happens to work when one axis is aligned with another. You see this a lot when people post answers on Unity support forums, where an answer will classically only work given some specific orientation of objects in some specific scene. Maybe the camera has to be facing forward, or maybe the object has to be at 0,0,0, etc. So to verify my snippet for the scope of this blog, I made a sample that includes:

  • a moving camera
  • a moving target
And this is the final result of my custom hand-crafted LookAt() function,
following a sphere being kicked around by some cheap physics…

Finally – for a more robust test, I could have moved the camera along multiple axes to verify behavior in all eight octants of Cartesian space. In this case I opted not to, because the visual result of that test looks silly since I didn’t model out a bottom area for my game board.

Some Useful Link(s)

There are lots of math sites on the internet, and a lot of them basically suck. I’ve linked here the ones that helped me better understand this problem and how to solve it intelligently, with gameplay constraints in mind.

https://www.continuummechanics.org/rotationmatrix.html <– all around great

http://www.euclideanspace.com/maths/algebra/vectors/angleBetween/index.htm <– nice clear diagrams, don’t trust the code so much

Dots, Crosses, and Looking at a Thing the Hard Way

(part one of two)
(Part 2 – Rotation Matrices and Looking at a Thing the Easy Way)

In my game world I have a camera, and as it turns out, I often need it to look at something important. If you want to orient something so that it’s pointing at another object, how would you do that?

In Unity3D, the problem is solved for us:

// make the gameObject point at 0,0,0
gameObject.transform.LookAt(Vector3.zero);

But I’m going to put my “for the sake of academics” hat on (it looks almost brand new… maybe I should wear it out and about a little more often). How would one solve this problem without calling LookAt() or Quaternion.LookRotation()?

Dot Product to Find the Angle

As plenty of online materials will tell you, we use the DOT PRODUCT to find the angle between two vectors. Specifically, we take the arc-cosine of the dot product of the normalized vectors. Or in a software engineering context:

// C-style pseudo-syntax
float angle_rads = arccos(
    dot(
        normalized(camera_position), 
        normalized(target_position)
    )
);

// written in Unity 3D it looks like:
var angle = Mathf.Acos(
    Vector3.Dot(
        camera.transform.position.normalized,
        target.transform.position.normalized
    )
) * Mathf.Rad2Deg;
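The same formula, sketched in dependency-free Python (the `angle_between_deg` helper is mine, not a Unity API). One practical hedge worth baking in: floating-point error can push the dot product a hair past ±1.0, which makes acos() throw a domain error, so clamp it first:

```python
import math

def angle_between_deg(a, b):
    # normalize both vectors, then take the arc-cosine of their dot product
    na = math.sqrt(sum(c * c for c in a))
    nb = math.sqrt(sum(c * c for c in b))
    d = sum(x * y for x, y in zip(a, b)) / (na * nb)
    # clamp: rounding error can push |d| slightly past 1.0
    d = max(-1.0, min(1.0, d))
    return math.degrees(math.acos(d))

print(angle_between_deg((1, 0, 0), (0, 1, 0)))  # ~90.0
print(angle_between_deg((1, 0, 0), (1, 1, 0)))  # ~45.0
```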

OK, great… except the dot product is entirely the wrong way to solve this problem.

It turns out that the dot product is notoriously fickle and probably the wrong tool for the vast majority of angular problems. It is useful for determining the quadrant in which an angle lives – e.g., whether the angle between two vectors is acute or obtuse, positive or negative. It is also useful for decomposing vectors into their axis-aligned components and for solving 2D angular problems (such as projecting a light source or reflection onto a plane – a plane is 2D). But for the purpose of calculating a precise angle in three-space? The dot product is a whole lotta “meh”.

You can feasibly make the dot product work for this problem by decomposing the vectors into their axis-aligned components, taking several individual Euler readings, and applying rotations in a deterministic order. Tons of ill-advised forum posts and StackExchange answers are barking up this tree. I’m interested in something more rooted. More clever. Less “meh.”
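To make the “wrong tool” claim concrete, here’s a dependency-free Python sketch (helper name mine): two targets 30 degrees to the left and right of forward produce identical acos-of-dot results, so the single number can’t tell you which way to turn, let alone around which axis:

```python
import math

def angle_deg(a, b):
    # assumes a and b are already unit length
    d = sum(x * y for x, y in zip(a, b))
    return math.degrees(math.acos(max(-1.0, min(1.0, d))))

forward = (0.0, 0.0, 1.0)
s30, c30 = math.sin(math.radians(30)), math.cos(math.radians(30))
left30  = (-s30, 0.0, c30)   # 30 degrees to the left of forward
right30 = ( s30, 0.0, c30)   # 30 degrees to the right

# identical results: the unsigned angle survives, the direction is lost
assert abs(angle_deg(forward, left30) - 30.0) < 1e-9
assert angle_deg(forward, left30) == angle_deg(forward, right30)
```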

AngleAxis to the Rescue?

The first thing I learned while tackling 3D gamedev: When dealing in three-space, XYZ angle values are not the tool you should be using. Of course not, you might say! Use Quaternions!

Yes, though that’s only half the answer. What also matters is how you create those Quaternions.

The thing with Quats is that they’re really just a useful internal representation for concatenating angles. Euler angles fail mainly because they run into aliasing issues at the 0-, 180-, and 360-degree positions – it’s this aliasing that causes gimbal lock and also makes it very cumbersome to interpolate between two orientations. Quats work around that restriction nicely from a SIMD-enabled computer science perspective (4 floats == one SIMD word). But Quats still have plenty of issues when rigging gameplay if all you’re doing is invoking Quaternion.Euler(x,y,z). As a result, I changed my paradigm and now try to use Unity’s Quaternion.AngleAxis-based orientations to solve three-space orientation problems.
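For the curious, an angle/axis pair maps to a quaternion via a simple formula. This is a plain-Python sketch of the standard unit-quaternion convention – my assumption about what Quaternion.AngleAxis computes internally, not Unity’s actual source:

```python
import math

def angle_axis(angle_deg, axis):
    # q = (axis * sin(angle/2), cos(angle/2)), with the axis normalized;
    # components ordered (x, y, z, w) to mirror Unity's layout
    x, y, z = axis
    m = math.sqrt(x * x + y * y + z * z)
    h = math.radians(angle_deg) / 2.0
    s = math.sin(h) / m
    return (x * s, y * s, z * s, math.cos(h))

q = angle_axis(90, (0, 1, 0))   # 90 degrees around world up
```

For a 90-degree turn around Y, both the y and w components come out to sin(45°) = cos(45°) ≈ 0.707, which is a handy smoke test.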

The typical use case for AngleAxis is to imagine a line through the object you need to rotate, perpendicular to the plane you want it to rotate in; that line becomes your rotation axis. The common case is that an object has already been rotated, and you need to spin it around its new orientation. To spin it like a top, imagine an axis cut through it like so:

The typical way an axis is calculated is by taking the original UP or FORWARD axis of the object and transforming it the same way the object was transformed. The following snippet can be attached to the Update() method for the above cylinder:

// tilt our game object - almost always use localRotation!
gameObject.transform.localRotation = Quaternion.Euler(0,15,40);

// re-orient the UP axis to match the object's orientation
// (the resulting axis tilts along with the object)
var axis = gameObject.transform.rotation * Vector3.up;

// remember, quats need to be multiplied in REVERSE ORDER.
gameObject.transform.localRotation = Quaternion.AngleAxis(spinAngle, axis) * gameObject.transform.localRotation;

// animate spinAngle, cuz it's fun.
spinAngle += Time.deltaTime * 100.0f;

The end result. I could watch this all day.

So now the question on my mind – can we use this to rotate a CAMERA to look at a TARGET?

Time to Doodle some Triangles

At risk of over-simplifying: in order to use AngleAxis, we need an ANGLE and an AXIS. In theory, we already have the tool to get our angle: the arccos of the dot product of the normalized positions (explained above). So how do we find a suitable axis?

The first step in most (maybe all?) of these kinds of math problems is to try and build either triangles or parallelograms out of the scene. From there, a whole array of proofs (aka, “math shortcuts”) may become available to help solve any given three-space problem.

In the case of our world camera, we don’t need to care about what the camera is looking at currently. All that matters is where it’s looking when rotation=0,0,0 and what it will look at relative to its current position. Smooth interpolation between current and target orientations can be handled as a separate problem later. When reset to rot=0,0,0, a camera is facing along the forward vector. So let’s draw our camera and then augment some triangles along that forward vector just to get a better idea what’s happening:

What I’ve doodled here is a right-angle triangle between the forward vector of our camera, and the thing we want our camera to look at. What I want to be able to do is create a virtual axis through the camera on which I can spin the camera, such that it will eventually cast its gaze upon the point of interest. For this purpose I drew a new doodle from the perspective and forward views:

If I spin the camera around this perpendicular axis, it will eventually point toward the Point of Interest. So what I want is the perpendicular to the vector that traces the path from the camera to the point of interest – and the tool for calculating perpendiculars is the Cross Product.
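The defining property we’re leaning on here is easy to check in a few lines of dependency-free Python (vectors chosen arbitrarily for illustration): the cross product of two vectors is perpendicular to both, so it dots to zero against each input:

```python
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

forward = (0.0, 0.0, 1.0)
to_poi  = (0.6, 0.8, 0.0)   # hypothetical camera-to-point-of-interest direction
axis = cross(forward, to_poi)

# perpendicular to both inputs
assert dot(axis, forward) == 0.0
assert dot(axis, to_poi) == 0.0
```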

Cross Product to Find the Perpendicular

The inputs to this tool will be the Forward vector and the position of the camera relative to the point of interest, defined as (camera_position - target_position):

var relative_position = camera.transform.position - target_pos;
var cross = Vector3.Cross(-Vector3.forward, relative_position.normalized);

// and now the earlier dot-product calculation along the same vectors:
var angle = Mathf.Acos(
    Vector3.Dot(-Vector3.forward, relative_position.normalized)
) * Mathf.Rad2Deg;

// Finally, using AngleAxis to glue these together
camera.transform.localRotation = Quaternion.AngleAxis(angle, cross.normalized);
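To see exactly what this recipe does (and doesn’t) buy us, here’s the whole thing replayed in dependency-free Python using Rodrigues’ rotation formula, with hypothetical camera and target positions. The single AngleAxis spin really does land the forward vector on the target – but watch what happens to UP:

```python
import math

def normalize(v):
    m = math.sqrt(sum(c * c for c in v))
    return tuple(c / m for c in v)

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rotate(v, k, a):
    # Rodrigues: v' = v*cos(a) + (k x v)*sin(a) + k*(k . v)*(1 - cos(a))
    c, s = math.cos(a), math.sin(a)
    kv, kdv = cross(k, v), dot(k, v)
    return tuple(v[i]*c + kv[i]*s + k[i]*kdv*(1 - c) for i in range(3))

cam, target = (0.0, 2.0, -5.0), (1.0, 0.0, 3.0)   # hypothetical positions

rel = normalize(tuple(c - t for c, t in zip(cam, target)))
neg_fwd = (0.0, 0.0, -1.0)
axis = normalize(cross(neg_fwd, rel))   # degenerate if camera already faces target!
angle = math.acos(max(-1.0, min(1.0, dot(neg_fwd, rel))))

# the rotation does aim the camera: local +Z lands on the camera-to-target direction
new_forward = rotate((0.0, 0.0, 1.0), axis, angle)
to_target = normalize(tuple(t - c for t, c in zip(target, cam)))
assert all(abs(a - b) < 1e-9 for a, b in zip(new_forward, to_target))

# ...but nothing constrains the roll: the rotated UP vector drifts away
# from world up, which is the misbehavior visible in the test scene below
new_up = rotate((0.0, 1.0, 0.0), axis, angle)
assert abs(new_up[1] - 1.0) > 1e-3   # no longer (0, 1, 0)
```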

As an added benefit, I discovered a handful of forum and StackExchange answers that mention this very process. Could it be that I’ve done a good thing?

Let’s apply the code to a simple test scene that rotates the camera around a quad, and see what we get:

Well, that looks like shit.
The green lines in the right-hand Scene view are the cross product, visualized.

So what went wrong? The short answer is that the cross product is quite limited as a tool for calculating the axis for AngleAxis, the dot product is a shitty tool for calculating the angle between vectors, and combining them creates a synergy of unrelenting headaches.

The long answer is that the cross product may encounter a failure condition any time you have a coordinate system with three degrees of freedom, because the third degree – in our case the spin (roll) of the camera – becomes part of the equation when solving for the target. The math just doesn’t care that the UP vector is being tossed on its head (literally, in this case). All it cares about is that the target is the focus of the rotation. On top of that, the dot product fails in various perpendicular-axis situations, causing more things to flip on their head, or out of view, depending.

A popular workaround online is to recalculate the UP vector using the cross product of the newly oriented RIGHT (X) and FORWARD (Z) vectors – basically spinning the camera back toward UP after the initial AngleAxis corrupted it. But that doesn’t fix the problems with the dot product, and working around those within the context of three degrees of freedom is very cumbersome… ugh.

At this point I decided this is going nowhere, fast, and that I need to take a step back and look at this problem from a different ang… perspective. (sorry, pun, today is not your day.)

Onto Part 2 – Rotation Matrices and Looking at a Thing the Easy Way

Swapping Unity Editor Shortcuts Profile On Player Focus

An excellent fix for Unity Editor’s “going rogue” problem when developing desktop PC games with keyboard bindings.

Source Download: SwapEditorShortcutsOnPlayerFocus (GIST)

Unity still has its roots in mobile game development, and it reminds you of this fact in peculiar ways. One of the first things I did in my Unity project was bind CTRL to affect mouse behavior (for click-and-drag movement) and bind 'Z' to change the view zoom level. As I played my test game I discovered, to my horror, that the Unity Editor was executing “Undo” actions in my scene even though my Game window was selected. Fortunately, changes to the scene are transient while the Player is running, and everything goes back to normal when play stops. Though apparently you may not be so lucky if you type CTRL+N, which may (or may not – I didn’t confirm this) wipe your scene from existence.

This sort of keyboard binding is pretty standard for PC games, but Unity is totally incapable of dealing with it out of the box. And little wonder: it’s an entirely alien way to handle user input in mobile games. You simply never think about keyboard-control conflicts within a game when you’re a mobile games developer. I’m a desktop PC developer (at this moment!), so I need a workaround or fix.

I evaluated a handful of suggested workarounds and one GIST’d script meant to solve the issue. None were to my satisfaction, so I wrote my own.

Usage

In order to use this gist, you will need to create a Player profile in your Unity Shortcuts, via menu item Edit->Shortcuts:

(I strongly suggest leaving CTRL+TAB bound, to allow for quick nav between player and editor)

… and then, when running the game in the Player, you should see this console output as the Player gains and loses focus:

Enjoy!