Because 10-bit HDR is a hack and a technical nightmare, and no longer worth the performance optimization over 16-bit float.
The only reason sRGB and 10-bit HDR formats exist is to improve the color fidelity of fixed-precision data formats. sRGB redistributes the precision of 8-bit integers non-linearly, spending more code values where human vision is most sensitive to detail. 10-bit HDR transfer functions extend the same trick across a much wider luminance range.
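The precision redistribution sRGB performs is just a nonlinear transfer function applied before quantization. Here's a minimal sketch of the standard sRGB encode/decode pair (per IEC 61966-2-1; function names are mine):

```python
def linear_to_srgb(c):
    # Nonlinear encode applied before quantizing to 8 bits: it reshapes
    # the value distribution so quantization error lands where vision
    # is least sensitive to it.
    if c <= 0.0031308:
        return 12.92 * c
    return 1.055 * c ** (1.0 / 2.4) - 0.055

def srgb_to_linear(c):
    # Inverse decode, the step a GPU performs when sampling an sRGB texture.
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4
```

Every sRGB texture fetch and render-target write pays for one of these conversions; a float pipeline simply stores linear values and skips both.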
The float-16 precision format provides roughly 14 bits of precision for numbers in the range -2 to 2. The 15th bit provides diminishing precision for the remainder of representable values. The 16th bit provides sign.
The 15th and 16th bits could be considered wasted information in the context of a GPU performing texture lookups. This would be the obvious assumption when viewed from the standpoint of current-gen GPU development. My personal feeling is that non-unit float values could prove to be useful and exciting in clever ways, once we make them available to asset creators and engineers.
(note that unit values are in the range of 0.0 to 1.0, and are commonly – and incorrectly – referred to as ‘normalized’ values)
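You can see the precision falloff directly by inspecting the gap between adjacent representable float16 values at different magnitudes (a quick demo, assuming NumPy as a dependency):

```python
import numpy as np

# Spacing between adjacent representable float16 values at several magnitudes.
# Inside the unit-ish range the steps are tiny; far outside it they get huge.
for v in [0.5, 1.0, 2.0, 1024.0]:
    step = np.spacing(np.float16(v))
    print(f"float16 step near {v}: {float(step)}")
```

Near 1.0 the step is about 0.001 (plenty for color data), while near 1024 adjacent values are a whole unit apart.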
Thinking Constructively
These are the things needed for a full 16-bit floating point pipeline:
an image format that supports 16-bit float
wide adoption of 16-bit float by popular image editors not named Photoshop
a willingness to accept an increasingly small GPU memory performance hit for the greater good
Once these things are in place, we can finally say Goodbye to:
sRGB — a thing that to this day hardly anyone actually understands anyway
(we all know secretly most of us just keep trying various sRGB toggles until something looks right)
10-bit HDR surfaces
maddeningly increasing numbers of sRGB <-> HDR resolve steps
Imagine a World Where Color Space Only Matters for Output to Display
Given the traits above, color space considerations would disappear almost entirely. Color space is something that needs to be determined only at the point a finalized image is being sent to “the real world” — aka the display which physically faces a human consumer.
In some rare situations the color space may matter for post processing effects, but even in this case it should have little or no impact on the rest of the render pipeline, and certainly no impact on asset formats, asset creation, and asset management.
This is a future I hope comes sooner, rather than later.
This blog entry focuses specifically on latency imposed by frame buffer queues on GPUs, which are commonly set to a max of 3 or 4 frames. There are other forms of latency – input latency, network latency – and I won’t be diving into those here.
In the context of a game engine pipeline, the majority of latency will be a function of filled queues. If queues are not filled, or are on average only partially filled, then the latency is a measure of how many frames of data are currently contained in that queue. In the case of a typical game engine, there is always a multi-frame GPU command buffer queue, and its default depth is usually three (3) frames. If that queue is full, then the minimum latency at that moment is 3 frames (50ms at 60hz).
That sounds bad, yes. But it doesn’t mean that’s the effective latency. Most of the time, games have a command buffer queue that can hold 3 frames and rarely (if ever) fill it. Let’s take a look at how this becomes a very interesting and complicated problem.
How to Measure Effective GPU Latency
When a GPU is running very fast in any pipeline model, it will drain the buffers as fast as the CPU can create them. No frames in queue means no latency.
When a GPU is running slower than the CPU then the CPU will keep pushing frames into the GPU’s queue until it maxes out the latency debt.
Latency = maxQueuedFrames * GpuRenderTimePerFrame
An important take-away here is that actual latency is all about the ratio and balance of CPU and GPU workloads. This is why latency will be fine on one PC and then horrible on another PC with slightly different specs. If you mix a fast CPU with a slow GPU, it’s going to cause latency to spike toward the worst case.
How VSYNC Makes a Mess of Things
An interesting thing happens when we consider the effect of turning on vsync. Vsync creates what I call an artificial bottleneck on the GPU. For example, if you have a game running at 200fps on average, then the worst-case GPU latency equation may look like this:
Latency Equation:
maxQueuedFrames(4) * GpuRenderTimePerFrame(1000ms/200fps = 5ms)
Simplifies to:
4 * 5ms = 20ms MAX LATENCY
Twenty milliseconds isn’t great but it’s not bad either. Most devs and players will hardly notice. After we turn on traditional VSYNC (60hz), the GPU can never flip faster than the refresh interval, so the same equation becomes:
maxQueuedFrames(4) * GpuRenderTimePerFrame(16.7ms) = 66.8ms MAX LATENCY
Worse, if you were getting 200fps before vsync then you know for sure that your CPU is pushing frames to the GPU way faster than 60fps, ensuring that the latency queue is always maxed out. Turning on vsync often leads to an immediate and permanent latency spike. This is why in a vast majority of game engines (Unity included) latency feels great when you have vsync turned off, but can suddenly become a quagmire when you flip vsync on.
Of course the first thing we all do at this point is set the maxQueuedFrames to 3 instead of 4. This shaves off 16.7ms and depending on the game might just be enough to get the game shipped to customers. Three cheers for cheap workarounds!
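The arithmetic above generalizes into a tiny helper (a toy model, with a function name of my own invention; the clamp encodes the fact that a vsynced GPU can never complete a flip faster than the refresh interval):

```python
def worst_case_latency_ms(max_queued_frames, gpu_frame_ms, vsync_hz=None):
    # With vsync on, the effective GPU frame time is clamped up to the
    # refresh interval, even if the GPU could render much faster.
    if vsync_hz is not None:
        gpu_frame_ms = max(gpu_frame_ms, 1000.0 / vsync_hz)
    return max_queued_frames * gpu_frame_ms

print(worst_case_latency_ms(4, 5.0))               # the 200fps example: 20.0 ms
print(worst_case_latency_ms(4, 5.0, vsync_hz=60))  # vsync on: ~66.7 ms
print(worst_case_latency_ms(3, 5.0, vsync_hz=60))  # queue trimmed to 3: ~50 ms
```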
There are many factors to consider when taking latency measurements and making latency assumptions:
Vsync is never enabled in the Unity Editor, and GPU command buffer queues are reduced to 1 or 2 in the Editor
conclusion: latency measurements taken from Play Mode in the Editor are useless
Unity Fixed-function Pipeline classically leaned toward CPU-heavy workloads, and was comparatively light on GPU workloads (aka ‘mobile friendly’)
conclusion: latency was rarely an issue because the GPU was waiting for the CPU
Unity running HDRP and URP workloads is more likely to have bottlenecks on the GPU, causing queues to fill and latencies to increase
Ergo, latency has become a more widespread issue.
A Sinister Timing Scenario
The workloads don’t even need to be off by much. Take for example the following parameters and notice that the CPU thread is running just a wee bit faster than the GPU:
Target Framerate (VSync)
60.0hz (16.7 ms)
Main Thread (CPU)
60.25hz (16.6 ms)
Graphics Thread (GPU)
60.0hz (16.7 ms) [vsync locked]
The GPU average framerate is running just below the target.
Given the above performance profile, the CPU will slowly out-pace the GPU, generating an additional frame of content every 4 seconds (240 frames). The GPU queue will keep growing until it hits the max (3 frames after 12 seconds). Suddenly we’re experiencing worst-case latency even though the CPU is running ahead of the GPU and VSYNC by only a fraction of a millisecond, and there is no perceptible drop in framerate to clue us in that a full queue is the cause. This is why it’s so important to have benchmarking tools that measure both average framerate over time and average GPU back buffer queue fill over time.
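A toy event simulation makes the hyper-sensitivity concrete: a surplus of only 0.25fps on the CPU side fills the queue within a handful of seconds (the exact fill time depends on how you count momentary occupancy, so treat the numbers as illustrative; the function name is mine):

```python
def time_until_queue_full(cpu_hz, gpu_hz, max_queue, limit_s=60.0):
    """Simulate a CPU submitting frames and a vsync-locked GPU draining them.

    Returns the time in seconds at which the queue first hits its cap,
    or None if it never does within limit_s.
    """
    cpu_t, gpu_t, queue = 0.0, 1.0 / gpu_hz, 0
    while cpu_t < limit_s:
        if cpu_t <= gpu_t:            # CPU submits the next frame
            queue += 1
            if queue >= max_queue:
                return cpu_t
            cpu_t += 1.0 / cpu_hz
        else:                         # vsync tick: GPU consumes one frame
            if queue > 0:
                queue -= 1
            gpu_t += 1.0 / gpu_hz
    return None

print(time_until_queue_full(60.25, 60.0, 3))  # fills after only a few seconds
print(time_until_queue_full(60.0, 60.25, 3))  # GPU faster: None, never fills
```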
This hyper-sensitivity is also the reason we can’t just hope to solve this problem with cracker-jack timing tricks. We need something better – we need a way to compensate for all the situations when unknowns happen and CPU/GPU timing becomes skewed.
Solving Vsync-imposed GPU Latency
The popular strategy is to try and pace the CPU to the GPU, so that the CPU only feeds the GPU frames when the GPU’s buffer is nearing empty. This is a very complicated method of effectively reducing the GPU’s command buffer queue to 2. So if you want to go this route, save yourself some time and just shorten your GPU command buffer queue whenever vsync is enabled. If your game never has performance wobble on the CPU or GPU, this will work well.
The better way to look at the problem of vsync-imposed GPU latency is to consider that we’re losing time by slide-showing old data to the user.
Hopefully this timeline helps visualize the Vsync Slideshow problem.
After just two laggy frames on the GPU, the next frame in the queue is now over 40ms late from the CPU, and the GPU can never catch up because of vsync. But wait – notice that at the point Frame 3 is being flipped onto the user’s display Frame 4 is already rendered by the GPU. Wouldn’t it be cool if we could give Frame 3 a pass, and flip right to Frame 4?
Turns out we can, with a little creativity.
First, use the Per-Frame Vsync Control flag feature widely available in almost every modern GPU API. It allows marking arbitrary frames as vsync enabled vs. disabled. Next, make sure the main thread (CPU) is using a Scheduled Fill model, and that its schedule is set to match the vsync rate reported by the GPU. The animated illustration earlier shows a Scheduled Fill main thread.
If you aren’t familiar with Scheduled Fill, you can read up on it here. The quick summary is that the Main Thread (CPU) implements its own simulated vsync timer that throttles the rate at which it pushes new frames to the GPU. By contrast, if the main thread uses an opportunistic fill model, then as soon as the GPU discards frames, the CPU will opportunistically re-fill that queue with new ones. I call this ‘CPU Over-Submission.’ The GPU will be constantly saddled with work and will end up submitting all frames as vsync disabled, and the deltaTime behavior of the CPU will be all over the place; it would be effectively the same as a classic vsync-disabled flip policy.
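The core of a Scheduled Fill main thread is just a self-paced timer loop (a sketch under my own naming; a real engine would pace its whole simulation step this way, not a callback):

```python
import time

def scheduled_fill_loop(vsync_hz, submit_frame, frame_count):
    """Sketch of a Scheduled Fill main thread: pace frame submission to a
    simulated vsync timer instead of blocking on the GPU queue draining."""
    period = 1.0 / vsync_hz
    next_edge = time.perf_counter() + period
    for _ in range(frame_count):
        submit_frame()                    # build and push one frame of commands
        remaining = next_edge - time.perf_counter()
        if remaining > 0:
            time.sleep(remaining)         # wait for the next virtual vsync edge
            next_edge += period
        else:
            # We hitched past an edge: resynchronize the schedule rather than
            # bursting frames to catch up.
            next_edge = time.perf_counter() + period
```

The important property is that the CPU's cadence comes from its own clock, matched to the reported vsync rate, so it never depends on observing the GPU's actual vsync signal.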
With Scheduled Fill combined with Per-Frame Vsync Control, we can predict the behavior of the three performance profile situations:
GPU Bottleneck
CPU blocks due to full GPU queue, and falls into classic variable DeltaTime operation. GPU falls back to screen tearing mode to improve GPU-bound performance until queue is drained.
CPU Bottleneck
Nothing special here… low framerate and also low latency (which is expected for any engine pipeline)
No Bottlenecks
GPU should operate almost entirely without screen tearing, except in edge-case situations where cadence between CPU and GPU become out-of-sync. In practice, no more than 1 of these per minute should be observable.
As it happens, screen tearing is a pretty excellent way to cope with the occasional GPU performance bottleneck. Screen tearing only becomes evident to the eye when it occurs repeatedly. A one-off tear is nearly imperceptible (as refresh rates increase to and beyond 120hz, noticeability decreases further). It’s a low-impact alternative to a comparatively high-impact latency problem.
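The per-frame decision the GPU would make can be stated in a couple of lines (a hypothetical policy sketch; "immediate" here means a tearing, non-vsynced flip):

```python
def choose_flip_mode(queued_frames, max_queue):
    """Pick the flip policy for the next frame based on queue pressure.

    A backed-up queue means the GPU is the bottleneck: flip immediately,
    accepting a one-off tear, to drain the backlog. Otherwise honor vsync.
    """
    return "immediate" if queued_frames >= max_queue - 1 else "vsync"
```

The threshold of max_queue - 1 is an assumption on my part: switching one frame before the queue is completely full gives the GPU headroom to drain before the CPU would block.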
It would be extra cool to be able to retroactively change the vsync flag for an already submitted, but not yet processed, command buffer. Modifying submitted command buffers directly is too risky, but what could be provided instead is an async-friendly (lock-free) override toggle that the GPU samples at the point it executes logic for the next flip. If any GPU API/driver authors are reading: I’ve wanted that feature for 10 years!
Wrapping it all up: Now you can Always Enable Vsync
The old rule-of-thumb for vsync is that vsync should be used any time the GPU is not a bottleneck, and vsync should be turned off any time the GPU is a bottleneck. Doing so at runtime requires that the GPU be aware of whether or not it’s the bottleneck. Given the assumption of a Scheduled Fill Main Thread, the GPU can finally make its own judgement about when it’s a bottleneck or not based on the state of its queue.
The beauty is that it solves the question of when to enable vsync. Once we can allow our GPU to decide when to use vsync on a per-frame basis, there’s no longer much need to wonder if we should enable or disable vsync globally. Just turn vsync on and if the GPU becomes a bottleneck, it will automatically switch to vsync-disabled behavior to help keep pace.
The only requirement is that the Main Thread run according to a schedule that matches the current hardware vsync. And these days that’s pretty easy, as all modern hardware sports high-precision, core-coherent timing mechanisms suitable for matching the vsync timing reported by your device driver to very high accuracy. And to be honest, in my experience even vanilla millisecond resolution is fine enough to maintain perfectly smooth rendering at 120hz (higher accuracy will benefit higher refresh rates).
The kryptonite for Scheduled Fill is when the actual timing of the device’s vsync is unknown. There’s no good solution in this scenario: you pretty much need to fall back to a double-buffer on the GPU, force-disable vsync entirely, or accept that latency will probably be an issue. The upside is that such a lack of info is extremely rare these days, since most streaming media services also depend on precise knowledge of device refresh rates in order to play videos.
This post was inspired by the Unity blog posted here, and represents actual work and research I’ve done for a variety of game and multimedia projects.
There are two issues endemic to the majority of flexible-framerate game engines (that is, game engines that surface deltaTime rather than forcing the developer to ensure a certain FPS is always met, or slowing down gameplay when performance drops). These are:
wobbly DeltaTime values even as framerates are consistent
significant amount of added latency
Fixes for the first problem, such as the one proposed in the Unity Blog, tend to either hurt performance or increase latency or both (in various worst cases). Can we fix this wobbly deltaTime without sacrificing performance?
Yes, we can.
First – Why does DeltaTime wobble?
Modern game engines almost always experience wobbling of true DeltaTime even when VSYNC is enabled. In the simplest model, the wobbles are due to the fact that engines run expensive operations (Physics, AI, and Garbage Collection) on infrequent intervals. But in practice there tend to be lots of other reasons for high variance in the time needed to update a given frame vs. the previous one (setting up dynamic mesh data, texture updates, etc). As a result, a simple timeline ends up looking like this:
A hypothetical timeline of a wait-first CPU thread running intermittent Physics and Garbage Collection logic. Notice that Frame 3 took too long and caused a frame drop, but that the AVERAGE of Frames 2+3 combined is still less than the 8ms requirement.
One option is to throw more threads at the problem, but threading is technically non-trivial for many tasks, and as a general rule any thread added to a game engine pipeline should be entirely lock-free (architected using semaphores and atomics only). Threads can also add extra latency. Some things, like Garbage Collection, are mostly synchronous activities that block all managed CPU threads, making GC increasingly costly as more managed threads are added to the system.
All things considered, it’s just not realistic to keep every frame exactly within the confines of 8.3ms or even 16.7ms– and likely won’t be realistic for the foreseeable future. Therefore, engine developers have looked for lightweight ways for game engine pipelines to gracefully handle such hitching around those VSYNC intervals, without dropping the following frame.
How can we make it better? Let’s take a look at a series of hypothetical main thread timelines, starting with what I expect to be Unity 2019’s model, followed by a simplistic “wait” model as proposed for Unity 2020, and finally a debt-based model… and weigh the pros and cons of each.
Opportunistic Queue Fill, Context-Unaware
In this design, the main thread waits only when the GPU thread queue fills. Three frames of data are burst immediately to the GPU’s empty queue, and from there the main thread will opportunistically fill the queue whenever there’s room. Because the main thread is aware of neither local time nor GPU sync, and because there are additional latencies in the pipeline between the main thread and GPU, the pattern of waits becomes non-deterministic. If you’re lucky and your game has very little pipeline noise, the timing will be consistent. If you’re not so lucky, it’ll be all over the map, with DeltaTime variance from 2 to 15ms on any given frame.
The main advantage of this model is that it’s perfect for games that don’t use VSYNC and have framerates flying all over the place. You can get a huge framerate boost from that 3-frame queue in an engine with a lot of clunky managed-language overhead and housekeeping like Unity. This is why I call this model the “benchmark model”: it maximizes total work performed and makes benchmarks look great, but totally sucks for the VSYNC-enabled gamer experience.
Opportunistic Queue Fill, Vsync-Aware
Once again in this model, the main thread waits only when the GPU queue is filled. Three frames of data are burst immediately into the queue, and the pipeline becomes lock-step with a lagging or vsync-paced GPU, maintaining a consistent latency of approx. 3 frames. The main advantage this system has over the unaware variety is that the DeltaTime will be very consistent since it’s effectively tied to vsync. Additionally, it can reduce latency slightly by synchronizing with (maxQueuedFrames-1) rather than (maxQueuedFrames).
This model requires listening to the GPU’s actual vsync signal in order to operate robustly. This is a high-risk requirement, as some devices and drivers do not always provide efficient access to such signals, or may even lie about them (DirectX/OpenGL, famously). I’ve seen workarounds which attempt to poll command buffer statuses or implement other heuristics.
This model has no ability to compensate for the occasional long frames illustrated at the top of the blog. If the Main Thread takes an unusually long time, it immediately falls back on wobbly DeltaTime for that frame. In fact, if you have any bottleneck of the Main Thread/CPU then this model will behave identically to the Opportunistic Fill Context Unaware model!
This model is very prone to syncopation issues when a game’s framerate is significantly below the nominal vsync target. For example, if a game is running at 35 to 45fps on a 60hz display, this model will fall into a very noticeable hitch-hitch-hitch pattern. The context-unaware version tends to have less of an issue: it’s so inconsistent that there’s no pattern in the hitching, and so it is less obvious to the human brain (our brains love to notice patterns). Engines using this model tend to lock entire games to lowest-common-denominator framerates when they can’t meet performance requirements for the higher one (30fps when targeting 60hz displays, for example).
Implementing an adaptive fixed framerate in this model is very challenging because of what I call the self-fulfilling prophecy problem: attempts to artificially throttle the framerate introduce a problem wherein you can’t easily know when it’s safe to revert back to an unthrottled framerate. In some cases, the logic will be confused by its own attempts to throttle itself, and will get stuck running at the lowest allowed throttle setting indefinitely (select AAA games have actually shipped with this exact issue over the course of history). The rabbit hole of caveats and workarounds needed to solve that problem deserves its own blog, or maybe better, just a short tweet saying “yea, don’t even bother trying.”
All that said — based on my experience, I would say this model is extremely popular and is probably the most widely used model for AA and AAA games on consoles. Note that most of the time such engines have to limit the GPU queue size to 2 frames when enabling VSYNC, for latency reasons. I’ve even seen some that jumped through impressive technical hoops to chunk things into smaller units than whole frames, in an attempt to retain triple buffer advantages without incurring the full brunt of latency. That micromanagement of individual portions of a scene gets very tricky very fast, and must always be carefully tailored to the architecture of a specific engine and the type of scene/level content being pushed through it. It’s a good strategy for porting finished games to consoles, and should be avoided for general-purpose middleware engines like Unity.
Scheduled Fill, Debt-Based
This model provides an additional level of abstraction away from GPU timing / vsync.
In this model, the main thread tracks the expected completion time for each frame, called the Virtual Timer Edge. This allows it to compensate for slow frames by utilizing an abstracted concept of DeltaTime which is remarkably similar to Unity’s own pre-existing Time.fixedTime implementation. If the framerate hitches, latency will increase only for the 2 or 3 subsequent frames, until the debt is repaid and the queue is returned to an empty state. During that time, DeltaTime will continue to return a consistent monotick value. Hitching above a certain threshold causes the system to fall back on the classic model of using variable DeltaTime to respond to dropped frames.
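Here’s a minimal sketch of the debt-tracking idea (class name and threshold are my own invention, not Unity API):

```python
class VirtualTimer:
    """Debt-based DeltaTime: report a fixed per-frame delta while real frame
    completion times drift around the Virtual Timer Edge, falling back to
    variable delta once the accumulated debt passes a threshold."""

    def __init__(self, vsync_interval, max_debt_frames=3):
        self.interval = vsync_interval
        self.max_debt = max_debt_frames * vsync_interval
        self.edge = 0.0   # expected completion time of the current frame

    def delta_time(self, now):
        self.edge += self.interval
        debt = now - self.edge            # how far real time runs ahead of schedule
        if debt <= self.max_debt:
            return self.interval          # consistent "monotick" DeltaTime
        # Framerate bankruptcy: cancel the debt and report the real elapsed
        # time for this frame, reverting to classic variable DeltaTime.
        late = now - (self.edge - self.interval)
        self.edge = now
        return late
```

A small hitch leaves DeltaTime untouched while the debt is quietly repaid over the following frames; only a hitch beyond the threshold surfaces as a variable delta.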
The caveat with this system is that you need to know your GPU’s vsync rate ahead of time. VSYNC timing tends to be pretty well documented and accessible across most devices. The actual physical vsync signal and GPU buffer status may be hard to get at, but knowing whether the cadence is 60hz or 59.94hz or 120hz is usually readily available information.
If your main thread’s (CPU) virtual timing out-paces the GPU, then your main thread is going to end up doing exactly what we saw in the Opportunistic Fill Vsync-Aware model: it will fill the GPU with work and then wait, causing spikes in latency. For that reason, I like to take that value and round it down a bit, just enough that the CPU virtual timing interval is maybe 0.001ms longer than the expected GPU interval. It’s better to hitch a frame every few minutes than to fill the GPU’s queue and incur constant high latency.
Implementing adaptive fixed-framerate behavior in this model is easier than for the Opportunistic Fill models. It’s at least plausible here, but I still consider it non-trivial. It can be done most effectively when targeting very specific hardware profiles — when porting and shipping a finished title for a specific console. For PCs, your best bet is to fall back on classic wobbly DeltaTime. And at some point, the burden is on the player to upgrade their PC or ensure a clean running environment.
Debt-Payback Economic Analogy
Yo, economics! The concept of debt-payback is that you take a loan against the time window allotted to the next frame, with the intent of paying that time back. In this way, a one-off slow frame can be handled without hitching the framerate or the game’s DeltaTime.
Opportunistic Fillers can also be viewed through the same lens of banking and economics: they are essentially pre-paid debt. The main thread incurs the debt immediately by filling the GPU frame queue. A frame queue is like a prepaid credit card: you pay in a few frames as you start the game engine, and then cash them out only when needed – when a frame hitches or runs slow. This is a bad design because whatever time you’ve accrued on your Time-Debt Credit Card is surfaced to the user as LATENCY.
So what we’re doing is changing our paradigm: instead of treating the queue as a pre-paid credit card, we instead look at it as a proper extension of real credit. We take a loan out on time that we must pay back later. There is a caveat: if the debt grows too big then the debt needs to be cancelled (framerate bankruptcy?) and the system reverts back to variable DeltaTime logic.
Next Steps – Latency
The Scheduled Fill Model solves our wobbly DeltaTime problem, and does so without introducing complex inter-thread or inter-hardware communication dependencies between the CPU and the GPU’s VSYNC signal, without introducing hitching problems, and while also making it feasible to implement framerate throttling. It even gives us some tools to help tweak past common GPU frame queue latency problems.
As it turns out, we can do a lot better on latency, especially now that we have a Main Thread that’s operating on a Scheduled Fill Model.
Today I realized that I’d totally busted shadow behavior in Unity with one of my Orthographic Camera Tips, relating to some very interesting and unexpected behavior in Unity’s Stable Fit shadow mode (the default shadow projection mode in the Built-in Render Pipeline). I updated the entry accordingly. The quick gist is to use Close Fit shadows for isometric gameplay projects, because Stable Fit depends on a concept of camera depth that doesn’t exist in orthographic projections.
During the process of accomplishing some modest goals in isometric gameplay, I’ve taken notes on mistakes and oversights I made while rigging my first orthographic cameras.
(shameless SEO paragraph) The orthographic camera is a staple of isometric gameplay environments, such as both realtime (RTS) and especially turn-based (TBS) strategy games. In this entry, I will explain how I implemented a flexible solution for controlling the camera, within the context of the Unity3D 2019 engine. (/end SEO)
Tip #1: Use X/Z coordinates for landscaping
The classic view of an isometric map has the X and Y axes for the map surface, and Z as the Up Vector. This stems from the long history of isometrics being used primarily to depict 2D landscapes, and 2D coordinate systems naturally use X/Y mnemonics. Contrast this with most 3D engines, and Unity specifically, which treat the X and Z vectors as the map surface, and Y as the Up Vector.
Why does it matter? Because the majority of GameObjects, models, and Unity components (most notably the Physics components) default to the assumption that the Up Vector is the Y vector. More importantly, Unity has provided named constants for the cardinal vectors for us — right (X), up (Y), forward (Z) — and if we re-orient the entire world to suit a paradigm where Z=Up, those named constants become confusing/misleading. We can remap the Physics Up Vector (aka gravity), and we can re-orient models at runtime using extra transforms (less efficient but not a show-stopper), but we cannot redefine Unity’s built-in constants Vector3.up, Vector3.right, and Vector3.forward.
For this reason I strongly suggest orienting your world map structure such that XZ are the landscape coordinates and Y is UP. This will save you a ton of headaches later on. It’s fine to have XY data in your map assets, but do convert it to XZ data so that you can operate on it using the named constant vectors, and orient your models in an expected way.
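The conversion from 2D asset data is trivial (a hypothetical helper, shown here as pseudocode-style Python; in Unity C# it would return a Vector3):

```python
def xy_map_to_world(x, y, height=0.0):
    # 2D map assets use X/Y for the landscape; Unity world space wants the
    # landscape on the X/Z plane with Y as the Up Vector, so the asset's
    # Y coordinate becomes the world Z coordinate.
    return (x, height, y)
```

Run it once at asset-load time and everything downstream can use the engine's native axis conventions.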
Tip #2: Disable the Near Plane Clipping
Size and Clipping Planes define the Orthographic Frustum. For best results, the Near Plane should be set to a very large negative number.
Go ahead and set that Near Clipping Plane to -1000 (the default value is usually 0.1 or so), or even -100000. If you don’t do this, your camera’s just going to be a total pain to work with, and it’s going to trick you into doing some bad things to try and work around seemingly nonsensical object clipping behavior.
The technical rationale: the visible area of the screen (camera plane) is determined by the Size parameter, and that, combined with the Near/Far clipping planes, determines the frustum. The frustum works much like it does for perspective cameras, meaning that objects are drawn or clipped according to depth — which is a problem, because orthographic views do not have a valid concept of depth.
The only truly valid clipping for an orthographic view is defined by the rectangle of the screen on which the image is being projected. This clipping area is defined by the Size parameter alone. If an object, orthographically projected, lives within that space, it should be drawn. If it lives outside that space, it should be clipped. This is the intended operation of an orthographic projection.
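In other words, orthographic visibility is a pure 2D rectangle test in camera space. A sketch, assuming Unity's convention that Size is the half-height of the view:

```python
def in_ortho_view(cam_space_point, size, aspect):
    # Depth (z) is deliberately ignored: an orthographic projection clips
    # only against the screen rectangle defined by Size and aspect ratio.
    x, y, _z = cam_space_point
    return abs(x) <= size * aspect and abs(y) <= size
```

Any depth-based culling on top of this is an artifact of reusing the perspective frustum machinery, which is exactly why pushing the near/far planes out of the way restores the expected behavior.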
Therefore, I personally disable both near and far clipping planes (i.e., set them very large: near to -100000 and far to 100000), though there could be some advantage to setting the far clipping plane lower if you happen to be implementing an oblique orthographic view where the camera is set low on the horizon. Maybe. It’s a hard sell.
Tip #3: Turn off Stable Fit Shadow Projection
(I have only verified this option using the classic Built-In Render Pipeline – behavior for Universal Render Pipeline surely differs)
Edit -> Project Settings -> Close Fit – for the Orthographic Win!
If you have Stable Fit enabled then this is the kind of behavior you’ll observe when changing the near plane of your orthographic camera. Pay close attention to the shadows under the sphere and along the wall edges
Personally, I would recommend turning off Stable Fit shadows for any style of game that isn’t a first-person shooter or VR headset game. It’s especially bad for orthographic cameras because the cascade selection is based on a dodgy heuristic involving the near and far clipping planes, and our disabling of the near plane on our orthographic camera totally breaks that heuristic.
(note: in my opinion, Stable Fit is an outright cheat tailored to the FOV settings used for FPS and VR, where the near plane is almost always near 0 and the far plane is typically not so far from the near plane – it should probably be disabled for most third-person gameplay contexts, as well as anything using orthographic projections. Why is it even the default setting?)
Secondly, disable cascades. Shadow cascades don’t work at all with orthographic cameras, since they depend on camera depth measurements, and orthographic cameras (surprise!) have no true concept of depth. If the cascades did do anything, it would be some nonsense behavior where shadows would cascade near the edges of the screen.
Finally, set Shadow Distance to a large value, like 10000. Again, none of these “camera depth” or “camera distance” concepts make sense in the context of an orthographic projection.
Tip #4: Editor Orthographic Scene View
Yeah, I know, this one’s going to be obvious for most folks. It took me a while to figure out, and, interestingly, the Unity Editor does a much better job of switching between perspective and orthographic camera modes than the GameObject camera does. It even has this snazzy zoom transition! That’s probably worth a blog article in itself, because you’ll notice it’s actually non-trivial to implement similar behavior using Camera GameObjects within the game itself.
(hint: I assume it animates the FOV of the perspective camera before hard-switching to the orthographic camera – combined with additional calculation to determine the orthographic Size parameter as a function of the perspective camera’s distance from an object, etc. – but I have not verified it)
Tip #5: Think in Full 3D
If you come from a 3D gameplay or Unity background, this may feel obvious. But if you come from 2D/isometric gaming and art backgrounds, then you may have been thinking of orthographic projections in terms of the 2D angular components that form an orthographic illustration. What I mean by that is these 2D angles, as illustrated by Wikipedia’s topics on Orthographic Projection (side).
Now is a good time to change that paradigm.
It’s true that you can develop isometric gameplay within the context of what people have traditionally called 2.5D – a pseudo-3D environment where almost everything is 2D except a few specific bits and pieces needed to sell the illusion. This approach has its merits if you are developing your own 2.5D orthographic engine with a single fixed-function camera projection pipeline. We are not. We are using Unity, and Unity is all about full-3D environments with full 3D models, which depend on 3D world-space coordinates for our camera.
As a specific example, the modern physically-based material pipeline has specular and reflection features that depend on the camera’s full 3D position. If you don’t make an effort to handle your camera as though it exists in depth-honoring 3-dimensional space then you’ll likely have to avoid these effects entirely and use only Unity’s non-reflective legacy materials, such as Legacy Diffuse. While I’m sure it’s possible to develop a 2.5D isometric game in Unity, likely using 2D sprites instead of 3D models, this would be an exceptionally challenging task and is well outside the scope of today’s tips.
It can be helpful to have world axis orientation lines similar to those in the photo above. I made a lightweight component script to help me visualize those lines (extra helpful when paired with a tiled floor texture).
Tip #6: Verify Your Camera Position in the Editor View!
Is your orthographic camera really where you think it is? If you’re judging by what you see in the Game View (Camera Preview), then the answer is very likely “no”.
A perspective camera gives you depth cues that help you judge where the camera sits in the world. If the camera is too close to or too far from an object, you can clearly tell. An orthographic camera is not so fortunate: depth effectively doesn’t exist, so there are no good clues about how far the camera is from the objects in view. In fact, for any specific view on your screen, there are literally an infinite number of camera positions along the view vector that will produce the exact same Camera View result. This can fool you into thinking you have a sensible position for your camera when, in fact, your camera could be a million miles away. Or it could be at 0,0,0.
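A quick sketch of why (plain Python rather than Unity, names mine): an orthographic projection computes the screen position as the dot of the camera-relative position against the right and up axes, so the distance along the forward axis never enters the result.

```python
def ortho_project(point, cam_pos, right, up):
    """Orthographic screen coords: depth along the forward axis never appears."""
    rel = [p - c for p, c in zip(point, cam_pos)]
    return (sum(r * a for r, a in zip(rel, right)),
            sum(r * a for r, a in zip(rel, up)))

# camera basis for a view looking straight down -Z (hypothetical setup)
right, up = (1, 0, 0), (0, 1, 0)
point = (3.0, 2.0, -5.0)

near = ortho_project(point, (0, 0, 0), right, up)
far = ortho_project(point, (0, 0, 1_000_000), right, up)  # a million units away
print(near, far)  # identical screen coordinates
```

Slide the camera anywhere along its own forward vector and the image doesn’t change by a single pixel.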
Just for a bit of mind-warping awesome, consider for a moment the fact that you could — if so inclined — implement an entire Isometric Game without ever changing the height of your camera (Y=0 when following the Y=Up convention). To demonstrate this phenomenon, here’s an image of what might be a typical initial camera setup for an isometric view, pay attention to the Position of the camera:
right panel: y-axis (green) view from the side, with the frustum visualized by the white lines
Observe the camera position in the Scene view on the right panel – it’s nonsense, but the Game View and Camera Preview kinda look OK.
As you can see, the game view looks nearly identical even though the cameras are in vastly different positions. It’s through this illusion that you can think you’ve rigged an orthographic camera correctly but then fail spectacularly when trying to execute a pan or spin around a variable pivot point on your isometric landscape.
So I strongly suggest paying close attention to your camera’s position and behavior inside Unity’s Editor View, at least until the general framework of your camera controls is rigged. Don’t just rely on what you see in that Camera Preview. It can save you time.
Tip #7: Orthographic isn’t a simplification, and some closing words
These tips are helpful for avoiding some pitfalls but are still a long way from rigging the more advanced features of a good isometric camera. Isometric views have lots of advantages in terms of gameplay mechanics but are, generally speaking, an added layer of complexity from a game development perspective — unless you decide to limit your game to a handful of fixed-angle views and limited camera movement.
Unity3D compounds that problem, since it is built from the ground up in a manner that limits us from taking advantage of certain “orthographic hacks” that might otherwise be available to a custom 2.5D style game engine. If you are thinking of utilizing an orthographic camera as a means to simplify your game — for example making it “less 3D” and thus less mathematically intimidating — then probably don’t do it. Orthographics in Unity3D become the worst of both worlds: you have to rig everything up as though it’s full 3D perspective, and you have to then filter it through an unorthodox reprojection matrix that hides visual information from you about the state of your world.
It’s my experience so far that robust isometric view game development requires an even stronger grasp of 3D math and methodologies than games built on 3D perspective views. Sure, Orthographics offer gameplay and artistic advantages! But be sure to weigh those vs. the added complexity of rigging cameras that can play by all the weird rules of depth-aliasing.
(one of the many tools in the toolbox when I need to throw together some programmer art)
I started with one spiral fill
Then I created a new empty layer and added a more different spiral fill
Made some programmer art for my blog, using Paint dot Net. This is mainly worth documenting because you can implement this sort of effect procedurally using shaders, for psychedelic and potentially seizure-inducing particle effects. Procedural wave interference is good for various material textures and can also be a cool patterning trick for swarms of particles themselves.
In the case of Paint dot Net, an ideal way to implement wave interference is using the following tools:
Spiral Gradient Fill – Repeat Reflected mode.
Multiple Layers … and the main magic …
Layer Property “XOR”
Open Layer Properties and change the blend mode to XOR. Oh so fancy.
Finally, I merged the layers and applied a Gaussian Blur to soften the contrast.
The Finished Product. Now if only it had a suitable purpose…
Polar Coordinate Space
In case you’re wondering how to make a spiral procedurally, the answer is to use the Polar Coordinate Space.
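Here’s a minimal sketch of that trick in plain Python (names and constants are mine; a shader version would be the same math run per pixel): convert each point to polar (radius, angle), then offset the angle by the radius before applying a reflected-repeat wave. The radius offset is what winds the gradient into a spiral.

```python
import math

def spiral_value(x, y, turns=5.0):
    """Grayscale 0..1: a reflected-repeat wave over (angle + radius)."""
    r = math.hypot(x, y)
    theta = math.atan2(y, x)
    # offsetting the angle by the radius is what winds the spiral
    t = (theta / (2 * math.pi) + r * turns) % 1.0
    return 1.0 - abs(2.0 * t - 1.0)  # triangle wave: 0 -> 1 -> 0 ("Repeat Reflected")

# sample a tiny 5x5 grid centered on the origin (coords in -1..1)
img = [[round(spiral_value((col - 2) / 2, (row - 2) / 2), 2) for col in range(5)]
       for row in range(5)]
for row_vals in img:
    print(row_vals)
```

XOR-ing two of these fields (with different centers or turn counts) gives the interference pattern from the screenshots.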
In Part One of this series, I explained the process by which I learned that Dot Product and Cross Product are a lousy way to implement a LookAt() or LookRotate() function. The Dot Product works well enough in 2D and for vectors which have some orthogonality to them, but is a poor tool for the purpose of calculating the angle between two free vectors in three-space. The cross product works well for setting up a perpendicular axis, which can be used in AngleAxis(), but the axis is mathematically generated, and has no care about how it might warp or skew unrelated orientations while rotating an object toward the target (famously, UP becomes corrupted, causing an object to twist as it rotates to face another object).
Not to be deterred, I dug deeper into the heart of things, and decided to try my hand at setting up a Rotation Matrix. As it turns out, starting here would have been easier. Much easier. Alas, I’ve been trying to take more top-down approaches to problem solving lately, and (in theory) matrices are lower level than quat rotations. I feel like this is a debatable perspective. Moving on….
What is a Rotation Matrix?
There are plenty of mathematical definitions, and their value is likely nigh-zero outside of discovering paths for optimization of complex series of transformations (or re-inventing some very crude mathematical wheels, which I refuse to do unless paid to). So let’s do the simple awesome game developer definition…
A Unity-friendly column-major identity matrix.
First, a rotation matrix is composed of the three independent XYZ axes, arranged by COLUMN in Unity and OpenGL, and by ROW nearly everywhere else. This is a huge gotcha for many folks, but not something I’m going to spend much time on here.
To the right you can see values for Vector3.Right [1,0,0], Vector3.Up [0,1,0], and Vector3.Forward [0,0,1] plugged into the X, Y, and Z columns. This forms the Identity Matrix. These axes are the same as the vector normals of the rotated object. In the case of the Identity Matrix no rotation occurs. The normals of the object match the world axis.
To further illustrate the relationship of the Rotation Matrix to the normals/axes of the object in question, I threw together this quick script and attached it to an object in my scene:
void Update() {
    var mat = Matrix4x4.Rotate(gameObject.transform.localRotation);
    var r = mat.GetColumn(0);
    var u = mat.GetColumn(1);
    var f = mat.GetColumn(2);
    // these lines will match precisely the axis arrows
    // drawn by Unity Editor.
    Debug.DrawRay(Vector3.zero, r, Color.red);
    Debug.DrawRay(Vector3.zero, u, Color.green);
    Debug.DrawRay(Vector3.zero, f, Color.blue);
}
These three arrows are the vectors that make up a Rotation Matrix.
This gives you a cool little manual/physical exercise, where you can take an object and point it at another object of interest using the Unity Editor, and then inspect the resulting values. And what we will notice – and at risk of stating the obvious – is that the Forward Vector always points directly toward the object. The question is, how do we determine the other vectors?
Making a LookRotation Matrix
As it turns out, a Rotation Matrix is an easy and ideal way to create an object’s rotation oriented according to any vector of our choosing. We already have the Forward Vector (Z), since we know exactly what we want our object to be facing: toward the target! The trick is choosing the other two vectors.
The short answer is that as long as the other two vectors form right angles with our Forward Vector, then the Forward Vector will aim true.
A rotation matrix with an Oblique angle along Green (UP) and Blue (Forward) vectors.
If we build a matrix of vectors that do not form right angles, then the actual Forward Vector of our rotated object will be skewed away from our intended target.
If this is confusing, then take a quick look to the image on the right. I set up a rotation matrix for this camera that is not well-formed – the green (up) and blue (forward) vectors are oblique, they do not form a right angle. Because of this, the true orientation of the camera, as shown by Unity Editor’s gizmos (arrows), is skewed away from the intended Forward vector. Indeed, Unity Editor’s gizmos show up as a set of right-angle vectors that also describe the exact same rotation. This is what we call “aliasing” — for any given orientation in three-space there are infinite Rotation Matrix possibilities that can produce that orientation.
You could even say that the Quaternion itself utilizes this redundant feature of rotation, thus compressing nine values into four values. A quaternion encodes a subset of fixed-angle possibilities into its four values, and disregards the billions of other oblique permutations. There’s a lot of trickery of imaginary math involved to accomplish that, but that is more-or-less a practical explanation of how a Quat does what it does.
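To see the decompression direction of that trick, here’s the standard unit-quaternion-to-rotation-matrix expansion in plain Python (an illustrative sketch, not Unity’s internal code): four numbers unpacked back into a full nine-element orthonormal matrix.

```python
import math

def quat_to_matrix(w, x, y, z):
    """Expand a unit quaternion (w, x, y, z) into a 3x3 rotation matrix."""
    return [
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ]

# 90 degrees about Z: should carry +X onto +Y
h = math.radians(90) / 2
R = quat_to_matrix(math.cos(h), 0.0, 0.0, math.sin(h))
rotated_x = [row[0] for row in R]  # first column = where (1,0,0) lands
print([round(v, 6) + 0.0 for v in rotated_x])  # (+0.0 normalizes any -0.0)
```

Nine values out, four values in: the oblique permutations are simply unrepresentable, which is the point.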
I’d like a Triple Right-Angle With Cheese
As we can see, the golden rule for building a well-formed Rotation Matrix is to make sure all three vectors form right-angles with each other. More specifically, if we ensure all vectors are at right-angles, then it also ensures that each cardinal of our object is going to point exactly in the direction we specify.
And the rule for building right-angles? Perpendiculars, of course. Hello again, old friend…
… Sir Cross Product. (a recent knighthood was bestowed somewhere upon a fictional land which happened to be described predominantly by right triangles)
ALERT! The following table may depend on your game engine’s coordinate system. I’ve shown the cross products according to Unity’s left-handed coordinate system (Unity’s Vector3.Cross follows the left-hand rule):
Expressed as up, forward, right… | Expressed as XYZ…
RIGHT = CROSS(up, forward) | X = cross(Y,Z)
UP = CROSS(forward, right) | Y = cross(Z,X)
FORWARD = CROSS(right, up) | Z = cross(X,Y)
notice that each axis is formed from the cross of the other two, taken in clockwise order, as if you were to draw x/y/z on the face of a clock.
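The table is easy to sanity-check with the standard algebraic cross-product formula (a plain-Python sketch; function name mine):

```python
def cross(a, b):
    """Component formula for the 3D cross product of tuples a and b."""
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

RIGHT, UP, FORWARD = (1, 0, 0), (0, 1, 0), (0, 0, 1)

print(cross(UP, FORWARD))    # → (1, 0, 0)  RIGHT
print(cross(FORWARD, RIGHT)) # → (0, 1, 0)  UP
print(cross(RIGHT, UP))      # → (0, 0, 1)  FORWARD
```

The component formula is handedness-agnostic; the handedness lives in how your engine interprets the axes.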
We have a Forward Vector (Z), cool. So let’s start typing out what our Right Vector (X) will be…
var forward = (target - gameObject.transform.position).normalized;
var right = Vector3.Cross(??, forward); // crap, what's Y here?
What is the Up Vector (Y) here? So here’s the thing: it depends on what rotation axis we want to lock. Typically, the goal is to lock the Up Vector so that it matches the original object’s UP, where up is a vector that points through the top of a box, or through a model’s “head”, or which orients a camera so that the camera image matches the world’s concept of “up”. 99% of the time, this should be the same as the constant Vector3.up. And that is our answer:
// note that object_cardinal_up is not the same as gameObject.transform.up!
// the cardinal up vector for the object is usually a constant
// (unless the object's model is not well-formed) and should
// usually be Vector3.Up.
var object_cardinal_up = Vector3.up;
var right = Vector3.Cross(object_cardinal_up, forward);
Next up, calculate the Up Vector (Y) for the matrix, which is in no way related to the Up vector of the object. Indeed, it is simply the cross product (right-angled) of the two vectors we already have:
var up = Vector3.Cross(forward, right);
Finally, assign them to a matrix and apply the rotation:
var mat = Matrix4x4.identity;
mat.SetColumn(0, right);
mat.SetColumn(1, up);
mat.SetColumn(2, forward);
// Unity gotcha #677: remember to always assign to
// localRotation, unless you have a really well-
// understood (aka "good") reason not to.
gameObject.transform.localRotation = mat.rotation;
Boom, Done. We just implemented our own LookAt() / LookRotation() ! When I put it all together, it looks like this:
void LookAt(Vector3 target) {
    var object_up = Vector3.up;
    var forward = (target - gameObject.transform.position).normalized;
    var right = Vector3.Cross(object_up, forward).normalized;
    var up = Vector3.Cross(forward, right).normalized;
    var mat = Matrix4x4.identity;
    mat.SetColumn(0, right);
    mat.SetColumn(1, up);
    mat.SetColumn(2, forward);
    gameObject.transform.localRotation = mat.rotation;
}
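The same construction can be sanity-checked outside Unity. Here’s a plain-Python sketch of the matrix build (helper names mine) that verifies the three columns really do form right angles with each other:

```python
import math

def normalize(v):
    m = math.sqrt(sum(c * c for c in v))
    return tuple(c / m for c in v)

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def look_rotation(eye, target, world_up=(0.0, 1.0, 0.0)):
    """Returns the three matrix columns: right (X), up (Y), forward (Z)."""
    forward = normalize(tuple(t - e for t, e in zip(target, eye)))
    right = normalize(cross(world_up, forward))
    up = cross(forward, right)  # cross of two unit perpendiculars: already unit length
    return right, up, forward

right, up, forward = look_rotation((0, 2, -5), (3, 0, 4))
# the three columns should be mutually perpendicular (dots of zero)
print(round(dot(right, up), 9), round(dot(up, forward), 9), round(dot(forward, right), 9))
```

Feed those three columns into any column-major matrix and the forward column aims dead at the target, with no roll.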
Verification Time
It’s not enough to just test this code in a static scene with some pre-set angles. There are too many ways I could have messed up my math such that it just happens to work when one axis is aligned with another. You see this a lot when people post answers on Unity Support Forums, where an answer will classically only work given some specific orientation of objects in some specific scene. Maybe the camera has to be facing forward, or maybe the object has to be at 0,0,0, etc. So to verify my snippet for the scope of this blog, I made a sample that includes:
a moving camera
a moving target
And this is the final result of my custom hand-crafted LookAt() function, following a sphere being kicked around by some cheap physics…
Finally – for a more robust test, I could have moved the camera along multiple axes to verify behavior in all four quadrants of Cartesian Space. In this case I opted not to, because the visual result of that test looks silly since I didn’t model out a bottom area to my game board.
Some Useful Link(s)
Lots of math sites on the internet. A lot of them basically suck. I’ve linked here the ones that helped me better understand this problem and how to solve it intelligently and with gameplay constraints in mind.
In my game world I have a camera, and as it turns out, I often need it to look at something important. If you want to orient something so that it’s pointing at another object, how would you do that?
In Unity3D, the problem is solved for us:
// make the gameObject point at 0,0,0
gameObject.transform.LookAt(Vector3.zero);
But I’m going to put my “for the sake of academics” hat on (it looks almost brand new… maybe I should wear it out and about a little more often). How would one solve this problem without calling LookAt() or Quaternion.LookRotation()?
Dot Product to Find the Angle
As plenty of online materials will tell you, we use the DOT PRODUCT to find the angle between two vectors. Specifically we find the arc-cosine of the dot product of the vector normals. Or in software engineering context:
// C-style pseudo-syntax
float angle_rads = arccos(
dot(
normalized(camera_position),
normalized(target_position)
)
);
// written in Unity 3D it looks like:
var angle = Mathf.Acos(
Vector3.Dot(
camera.transform.position.normalized,
target.transform.position.normalized
)
) * Mathf.Rad2Deg;
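The formula itself is easy to verify in plain Python (a sketch, names mine); it really does return the angle between two vectors:

```python
import math

def angle_between(a, b):
    """Angle in degrees via arccos of the dot of the vector normals."""
    d = sum(x * y for x, y in zip(a, b))
    mag = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    # clamp guards against floating-point drift just outside [-1, 1]
    return math.degrees(math.acos(max(-1.0, min(1.0, d / mag))))

print(angle_between((1, 0, 0), (0, 1, 0)))  # → 90.0
print(angle_between((1, 0, 0), (1, 1, 0)))  # → ~45.0
```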
OK, great, except… the dot product is entirely the wrong way to solve this problem.
It turns out that the dot product is notoriously fickle and probably the wrong tool for the vast majority of angular problems. It is useful for determining the quadrant in which an angle lives – e.g., whether two vectors are acute, obtuse, positive, or negative. It is also useful for decomposing vectors into their axis-aligned components and solving 2D angular problems (such as projecting a lightsource or reflection onto a plane – the plane is 2D). But for the purpose of calculating a precise angle in three-space? A dot product is a whole lotta “meh”.
You can feasibly make the dot product work for this problem by decomposing the vectors into their axis-aligned components, taking several individual Euler readings, and applying rotations in a deterministic order. Tons of ill-advised forum posts and StackExchange answers are barking up this tree. I’m interested in something more rooted. More clever. Less “meh.”
AngleAxis to the Rescue?
The first thing I learned while tackling 3D gamedev: When dealing in three-space, XYZ angle values are not the tool you should be using. Of course not, you might say! Use Quaternions!
Yes, that’s only half the answer tho. What also matters is how you create those Quaternions.
The thing with Quats is that they’re really just a useful internal representation when concatenating angles. Euler angles fail mainly because they run into axis-aliasing issues at the 0, 180, and 360 degree positions – it’s this aliasing that causes gimbal lock and also makes it very cumbersome to interpolate between two orientations. Quats work around that restriction nicely from a SIMD-enabled Computer Science perspective (4 floats == SIMD word). But Quats still have plenty of issues when rigging gameplay, if all you’re doing is invoking Quaternion.Euler(x,y,z). As a result, I changed my paradigm and try to use Unity’s Quaternion.AngleAxis based orientations to solve three-space orientation problems.
The typical use case for AngleAxis is to imagine a line through the object you need to rotate, perpendicular to the target orientation, and then that line becomes your rotation axis. The common case is that an object has been rotated, and you need to spin it around its new orientation. To spin it like a top, imagine an axis cut through it like so:
Default Orientation
Rotated somewhat on the Z (forward) Axis.
Our imaginary rotated UP axis
The typical way an Axis is calculated is by taking the original UP or FORWARD axis for the object and transforming that in the same way the object was transformed. The following snippet can be attached to the Update() script for the above cylinder:
// tilt our game object - almost always use localRotation!
gameObject.transform.localRotation = Quaternion.Euler(0,15,40);
// re-orient the UP axis to match the object's orientation
// (the resulting axis looks like the tilted UP axis pictured above)
var axis = gameObject.transform.rotation * Vector3.up;
// remember, quats need to be multiplied in REVERSE ORDER.
gameObject.transform.localRotation = Quaternion.AngleAxis(spinAngle, axis) * gameObject.transform.localRotation;
// animate spinAngle, cuz it's fun.
spinAngle += Time.deltaTime * 100.0f;
The end result. I could watch this all day.
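For the curious, AngleAxis is essentially the Rodrigues rotation formula. Here’s a plain-Python sketch of it (function name mine) spinning +X ninety degrees around the +Y axis; note it uses the textbook right-hand convention, so mind your engine’s handedness when comparing signs:

```python
import math

def rotate_about_axis(v, axis, degrees):
    """Rodrigues' formula: v*cos(t) + (k x v)*sin(t) + k*(k . v)*(1 - cos(t))."""
    t = math.radians(degrees)
    k = axis  # must be unit length
    kxv = (k[1]*v[2] - k[2]*v[1], k[2]*v[0] - k[0]*v[2], k[0]*v[1] - k[1]*v[0])
    kdv = sum(a * b for a, b in zip(k, v))
    return tuple(v[i] * math.cos(t) + kxv[i] * math.sin(t) + k[i] * kdv * (1 - math.cos(t))
                 for i in range(3))

spun = rotate_about_axis((1, 0, 0), (0, 1, 0), 90)
print([round(c, 6) for c in spun])  # → [0.0, 0.0, -1.0]
```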
So now the question on my mind – can we use this to rotate a CAMERA to look at a TARGET?
Time to Doodle some Triangles
At risk of over-simplifying: In order to use AngleAxis, we need an ANGLE and an AXIS. In theory, we already have the tool we need to get our angle: the arccos of the dot product of the position normals (explained above). So how do we find a suitable axis?
The first step in most (maybe all?) of these kinds of math problems is to try and build either triangles or parallelograms out of the scene. From there, a whole array of proofs (aka, “math shortcuts”) may become available to help solve any given three-space problem.
In the case of our world camera, we don’t need to care about what the camera is looking at currently. All that matters is where it’s looking when rotation=0,0,0 and what it will look at relative to its current position. Smooth interpolation between current and target orientations can be handled as a separate problem later. When reset to rot=0,0,0, a camera is facing along the forward vector. So let’s draw our camera and then augment some triangles along that forward vector just to get a better idea what’s happening:
Fig 1: The camera, facing forward, over a playing field (floor quad)
Fig 2: A triangle stretching from the point of interest to the camera, oriented along the forward vector
Fig 3: And this is the angle we’re looking for…
What I’ve doodled here is a right-angle triangle between the forward vector of our camera, and the thing we want our camera to look at. What I want to be able to do is create a virtual axis through the camera on which I can spin the camera, such that it will eventually cast its gaze upon the point of interest. For this purpose I drew a new doodle from the perspective and forward views:
If I spin this perpendicular axis, the camera will eventually point toward the Point of Interest. And the tool for getting perpendiculars is the Cross Product.
So it looks like what I want is the perpendicular to the vector that traces the path from the camera to the point of interest. The tool for calculating perpendiculars is the Cross Product.
Cross Product to Find the Perpendicular
The inputs to this tool will be the Forward Vector and the position of the camera relative to the point of interest, defined as (camera_position - target_position):
var relative_position = camera.transform.position - target_pos;
var cross = Vector3.Cross(-Vector3.forward, relative_position.normalized);
// and now earlier dot-product calculation along same vectors:
var angle = Mathf.Acos(
Vector3.Dot(-Vector3.forward, relative_position.normalized)
) * Mathf.Rad2Deg;
// Finally, using AngleAxis to glue these together
camera.transform.localRotation = Quaternion.AngleAxis(angle, cross.normalized);
As an added benefit, I discovered a handful of forum/StackExchange answers that mention this very process. Could it be that I’ve done a good thing?
Let’s apply the code to a simple test scene that rotates the camera around a quad, and see what we get:
well that looks like shit. The green lines in the right-hand Scene view are the cross product visualized.
So what went wrong? Short answer is, the cross product as a tool for calculating the axis of AngleAxis is quite limited, the dot product is a shitty tool for calculating the angle between vectors, and combining them creates a synergy of unrelenting headaches.
The long answer is that cross product may encounter a failure condition any time you have a coordinate system with three degrees of freedom, because the third degree — in our case the spin of the camera — becomes part of the equation in solving for the target. The math just doesn’t care that the UP vector is being tossed on its head (literally, in this case). All it cares is that the target is the focus of the rotation. Added to that, the dot product fails at various perpendicular axis situations, causing more things to flip on their head, or out of view, depending.
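This failure is reproducible with plain numbers. A self-contained Python sketch (helper names mine): build the shortest-arc AngleAxis rotation toward an off-axis target, apply it to world UP, and compare against the UP a well-formed look-rotation would produce. The two visibly disagree, which on screen reads as a rolled camera.

```python
import math

def norm(v):
    m = math.sqrt(sum(c * c for c in v))
    return tuple(c / m for c in v)

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def rotate(v, k, t):
    """Rodrigues rotation of v about unit axis k by t radians."""
    kxv = cross(k, v)
    kdv = dot(k, v)
    return tuple(v[i] * math.cos(t) + kxv[i] * math.sin(t) + k[i] * kdv * (1 - math.cos(t))
                 for i in range(3))

forward, world_up = (0, 0, 1), (0, 1, 0)
target_dir = norm((1, 1, 1))  # an off-axis point of interest

# the AngleAxis approach: shortest-arc rotation carrying forward onto target_dir
axis = norm(cross(forward, target_dir))
angle = math.acos(dot(forward, target_dir))
aa_up = rotate(world_up, axis, angle)  # where UP lands after that rotation

# where a well-formed look-rotation says UP should be
right = norm(cross(world_up, target_dir))
look_up = cross(target_dir, right)

roll_error = max(abs(a - b) for a, b in zip(aa_up, look_up))
print(round(roll_error, 3))  # → 0.197 : the camera has visibly rolled
```

The forward vector does land on the target; it’s UP that gets quietly tossed on its head.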
A popular workaround online is to recalculate the UP vector using the cross between the newly oriented RIGHT (x) and FORWARD (z) vectors… basically spinning the camera back toward UP after the initial AngleAxis corrupted it. But that doesn’t fix the problems with the dot product. Working around those within the context of three degrees of freedom is very cumbersome… ugh.
At this point I decided this is going nowhere, fast, and that I need to take a step back and look at this problem from a different ang… perspective.(sorry, pun, today is not your day.)
If we interviewed a carpenter in the same way we interview software engineers these days, I imagine it would be something like this…
Interviewee: Normally I’d use a sawhorse and blocks, some safety equipment– gloves, goggles– start with a broad-cut with a circular saw and follow up with a Jig, add some pre-drilled holes, fasteners, glues, followed by…
Interviewer: Sounds good, but for this interview, we have this pre-cut wood, and this pocket knife. And a stapler. Prove to me you can build a house. You have 40 mins.
Interviewee: …
(thought bubble)
Interviewer: Just, you know, do the best you can / show your process.