<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Eric's Blog]]></title><description><![CDATA[- Game - Engine - Tool  - Math -]]></description><link>https://lxjk.github.io</link><generator>RSS for Node</generator><lastBuildDate>Tue, 10 Mar 2020 01:33:58 GMT</lastBuildDate><atom:link href="https://lxjk.github.io/rss/" rel="self" type="application/rss+xml"/><ttl>60</ttl><item><title><![CDATA[Translate GLSL to SPIR-V for Vulkan at Runtime]]></title><description><![CDATA[<div id="preamble">
<div class="sectionbody">
<div id="toc" class="toc">
<div id="toctitle" class="title">Table of Contents</div>
<ul class="sectlevel1">
<li><a href="#_get_glslang_libs">Get Glslang Libs</a></li>
<li><a href="#_use_glslang">Use Glslang</a></li>
<li><a href="#_build_glslang_libs">Build Glslang Libs</a></li>
</ul>
</div>
<div class="paragraph">
<p>As I&#8217;m porting my game engine from OpenGL to Vulkan, I needed to translate existing GLSL shaders (with changes for Vulkan) to SPIR-V. The recommended way is to use the offline tool glslangValidator, provided in the Vulkan SDK (<a href="https://vulkan.lunarg.com/doc/sdk/1.1.92.1/windows/spirv_toolchain.html">more info here</a>), and then simply load the converted SPIR-V shader in your program. However, during development it is really handy to translate shaders at runtime, so you don&#8217;t need to run the offline tool every time you change a shader. Moreover, you can support shader hot reload, so that you can edit GLSL shaders while the game is running.</p>
</div>
<div class="paragraph">
<p>Of course, you could also build a process that monitors shader changes and runs the tools automatically, but that is a topic for another time.</p>
</div>
<div class="paragraph">
<p>To translate GLSL at runtime, you need to set up the Glslang libraries and link them into your program.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_get_glslang_libs">Get Glslang Libs</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Glslang ships with the Vulkan SDK under <strong>$(VK_SDK_PATH)\glslang</strong>, but no prebuilt binaries are provided. The easiest way to get binaries is to download them from the Glslang GitHub releases page:</p>
</div>
<div class="paragraph">
<p><a href="https://github.com/KhronosGroup/glslang/releases/tag/master-tot" class="bare">https://github.com/KhronosGroup/glslang/releases/tag/master-tot</a></p>
</div>
<div class="paragraph">
<p>Use the link above to download the latest build, which includes both debug and release libraries. However, if your compiler version does not match the one these libraries were built with (for example, yours is newer), then unfortunately you will have to build the binaries yourself. We will cover that in a later section.</p>
</div>
<div class="paragraph">
<p>Once downloaded, set up the include path and library path in your project, then add all libraries under glslang/lib to the linker dependencies.</p>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_use_glslang">Use Glslang</h2>
<div class="sectionbody">
<div class="paragraph">
<p>There is an example in the Vulkan SDK: <strong>$(VK_SDK_PATH)\Samples\API-Samples\utils\util.cpp</strong>. Search for "glslang" to see how it is used. The following code is essentially the same as the sample:</p>
</div>
<div class="listingblock">
<div class="content">
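<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include &lt;cstdio&gt;            // puts, fflush
#include &lt;vector&gt;
#include &lt;vulkan/vulkan.hpp&gt;  // vk::ShaderStageFlagBits
#include "glslang/SPIRV/GlslangToSpv.h"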

struct SpirvHelper
{
	static void Init() {
		glslang::InitializeProcess();
	}

	static void Finalize() {
		glslang::FinalizeProcess();
	}

	static void InitResources(TBuiltInResource &amp;Resources) {
		Resources.maxLights = 32;
		Resources.maxClipPlanes = 6;
		Resources.maxTextureUnits = 32;
		Resources.maxTextureCoords = 32;
		Resources.maxVertexAttribs = 64;
		Resources.maxVertexUniformComponents = 4096;
		Resources.maxVaryingFloats = 64;
		Resources.maxVertexTextureImageUnits = 32;
		Resources.maxCombinedTextureImageUnits = 80;
		Resources.maxTextureImageUnits = 32;
		Resources.maxFragmentUniformComponents = 4096;
		Resources.maxDrawBuffers = 32;
		Resources.maxVertexUniformVectors = 128;
		Resources.maxVaryingVectors = 8;
		Resources.maxFragmentUniformVectors = 16;
		Resources.maxVertexOutputVectors = 16;
		Resources.maxFragmentInputVectors = 15;
		Resources.minProgramTexelOffset = -8;
		Resources.maxProgramTexelOffset = 7;
		Resources.maxClipDistances = 8;
		Resources.maxComputeWorkGroupCountX = 65535;
		Resources.maxComputeWorkGroupCountY = 65535;
		Resources.maxComputeWorkGroupCountZ = 65535;
		Resources.maxComputeWorkGroupSizeX = 1024;
		Resources.maxComputeWorkGroupSizeY = 1024;
		Resources.maxComputeWorkGroupSizeZ = 64;
		Resources.maxComputeUniformComponents = 1024;
		Resources.maxComputeTextureImageUnits = 16;
		Resources.maxComputeImageUniforms = 8;
		Resources.maxComputeAtomicCounters = 8;
		Resources.maxComputeAtomicCounterBuffers = 1;
		Resources.maxVaryingComponents = 60;
		Resources.maxVertexOutputComponents = 64;
		Resources.maxGeometryInputComponents = 64;
		Resources.maxGeometryOutputComponents = 128;
		Resources.maxFragmentInputComponents = 128;
		Resources.maxImageUnits = 8;
		Resources.maxCombinedImageUnitsAndFragmentOutputs = 8;
		Resources.maxCombinedShaderOutputResources = 8;
		Resources.maxImageSamples = 0;
		Resources.maxVertexImageUniforms = 0;
		Resources.maxTessControlImageUniforms = 0;
		Resources.maxTessEvaluationImageUniforms = 0;
		Resources.maxGeometryImageUniforms = 0;
		Resources.maxFragmentImageUniforms = 8;
		Resources.maxCombinedImageUniforms = 8;
		Resources.maxGeometryTextureImageUnits = 16;
		Resources.maxGeometryOutputVertices = 256;
		Resources.maxGeometryTotalOutputComponents = 1024;
		Resources.maxGeometryUniformComponents = 1024;
		Resources.maxGeometryVaryingComponents = 64;
		Resources.maxTessControlInputComponents = 128;
		Resources.maxTessControlOutputComponents = 128;
		Resources.maxTessControlTextureImageUnits = 16;
		Resources.maxTessControlUniformComponents = 1024;
		Resources.maxTessControlTotalOutputComponents = 4096;
		Resources.maxTessEvaluationInputComponents = 128;
		Resources.maxTessEvaluationOutputComponents = 128;
		Resources.maxTessEvaluationTextureImageUnits = 16;
		Resources.maxTessEvaluationUniformComponents = 1024;
		Resources.maxTessPatchComponents = 120;
		Resources.maxPatchVertices = 32;
		Resources.maxTessGenLevel = 64;
		Resources.maxViewports = 16;
		Resources.maxVertexAtomicCounters = 0;
		Resources.maxTessControlAtomicCounters = 0;
		Resources.maxTessEvaluationAtomicCounters = 0;
		Resources.maxGeometryAtomicCounters = 0;
		Resources.maxFragmentAtomicCounters = 8;
		Resources.maxCombinedAtomicCounters = 8;
		Resources.maxAtomicCounterBindings = 1;
		Resources.maxVertexAtomicCounterBuffers = 0;
		Resources.maxTessControlAtomicCounterBuffers = 0;
		Resources.maxTessEvaluationAtomicCounterBuffers = 0;
		Resources.maxGeometryAtomicCounterBuffers = 0;
		Resources.maxFragmentAtomicCounterBuffers = 1;
		Resources.maxCombinedAtomicCounterBuffers = 1;
		Resources.maxAtomicCounterBufferSize = 16384;
		Resources.maxTransformFeedbackBuffers = 4;
		Resources.maxTransformFeedbackInterleavedComponents = 64;
		Resources.maxCullDistances = 8;
		Resources.maxCombinedClipAndCullDistances = 8;
		Resources.maxSamples = 4;
		Resources.maxMeshOutputVerticesNV = 256;
		Resources.maxMeshOutputPrimitivesNV = 512;
		Resources.maxMeshWorkGroupSizeX_NV = 32;
		Resources.maxMeshWorkGroupSizeY_NV = 1;
		Resources.maxMeshWorkGroupSizeZ_NV = 1;
		Resources.maxTaskWorkGroupSizeX_NV = 32;
		Resources.maxTaskWorkGroupSizeY_NV = 1;
		Resources.maxTaskWorkGroupSizeZ_NV = 1;
		Resources.maxMeshViewCountNV = 4;
		Resources.limits.nonInductiveForLoops = 1;
		Resources.limits.whileLoops = 1;
		Resources.limits.doWhileLoops = 1;
		Resources.limits.generalUniformIndexing = 1;
		Resources.limits.generalAttributeMatrixVectorIndexing = 1;
		Resources.limits.generalVaryingIndexing = 1;
		Resources.limits.generalSamplerIndexing = 1;
		Resources.limits.generalVariableIndexing = 1;
		Resources.limits.generalConstantMatrixVectorIndexing = 1;
	}

	static EShLanguage FindLanguage(const vk::ShaderStageFlagBits shader_type) {
		switch (shader_type) {
		case vk::ShaderStageFlagBits::eVertex:
			return EShLangVertex;
		case vk::ShaderStageFlagBits::eTessellationControl:
			return EShLangTessControl;
		case vk::ShaderStageFlagBits::eTessellationEvaluation:
			return EShLangTessEvaluation;
		case vk::ShaderStageFlagBits::eGeometry:
			return EShLangGeometry;
		case vk::ShaderStageFlagBits::eFragment:
			return EShLangFragment;
		case vk::ShaderStageFlagBits::eCompute:
			return EShLangCompute;
		default:
			return EShLangVertex;
		}
	}

	static bool GLSLtoSPV(const vk::ShaderStageFlagBits shader_type, const char *pshader, std::vector&lt;unsigned int&gt; &amp;spirv) {
		EShLanguage stage = FindLanguage(shader_type);
		glslang::TShader shader(stage);
		glslang::TProgram program;
		const char *shaderStrings[1];
		TBuiltInResource Resources = {};
		InitResources(Resources);

		// Enable SPIR-V and Vulkan rules when parsing GLSL
		EShMessages messages = (EShMessages)(EShMsgSpvRules | EShMsgVulkanRules);

		shaderStrings[0] = pshader;
		shader.setStrings(shaderStrings, 1);

		// 100 = default GLSL version if the shader has no #version; false = not forward compatible
		if (!shader.parse(&amp;Resources, 100, false, messages)) {
			puts(shader.getInfoLog());
			puts(shader.getInfoDebugLog());
			return false;  // something didn't work
		}

		program.addShader(&amp;shader);

		//
		// Program-level processing...
		//

		if (!program.link(messages)) {
			// link errors are reported in the program's log, not the shader's
			puts(program.getInfoLog());
			puts(program.getInfoDebugLog());
			fflush(stdout);
			return false;
		}

		glslang::GlslangToSpv(*program.getIntermediate(stage), spirv);
		return true;
	}
};</code></pre>
</div>
</div>
<div class="paragraph">
<p>Then when you actually use it:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">void InitVulkan() {
	// ...

	// init glslang
	SpirvHelper::Init();
}

void ShutdownVulkan() {
	// ...

	// shut down glslang
	SpirvHelper::Finalize();
}

bool LoadShader(vk::ShaderStageFlagBits stage, const char* shaderCode) {

	std::vector&lt;unsigned int&gt; shaderCodeSpirV;
	bool success = SpirvHelper::GLSLtoSPV(stage, shaderCode, shaderCodeSpirV);

	// ...
}</code></pre>
</div>
</div>
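<div class="paragraph">
<p>Before handing the result to Vulkan, a quick sanity check on the output can catch problems early. The sketch below is not part of glslang or the SDK sample, and the helper names are hypothetical. It relies on only two facts from the SPIR-V and Vulkan specifications. First, every SPIR-V module starts with a five-word header whose first word is the magic number 0x07230203. Second, VkShaderModuleCreateInfo::codeSize is measured in bytes, not in 32-bit words.</p>
</div>

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helpers, not part of glslang or Vulkan.
static_assert(sizeof(unsigned int) == 4, "SPIR-V words are 32 bits");

// The SPIR-V magic number is the first word of every module.
constexpr uint32_t kSpirvMagic = 0x07230203u;
// The SPIR-V header is five words: magic, version, generator, bound, schema.
constexpr size_t kSpirvHeaderWords = 5;

// Cheap check that a buffer produced by GLSLtoSPV looks like a SPIR-V module.
inline bool LooksLikeSpirv(const std::vector<unsigned int>& spirv)
{
	return spirv.size() >= kSpirvHeaderWords && spirv[0] == kSpirvMagic;
}

// Shader module creation expects the code size in bytes, while the
// vector holds 32-bit words, so multiply by the word size.
inline size_t SpirvByteSize(const std::vector<unsigned int>& spirv)
{
	return spirv.size() * sizeof(unsigned int);
}
```

<div class="paragraph">
<p>For example, assuming the Vulkan-Hpp constructor that takes flags, code size, and a code pointer, something like vk::ShaderModuleCreateInfo({}, SpirvByteSize(spirv), spirv.data()) would pass the size in the unit Vulkan expects.</p>
</div>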
<div class="paragraph">
<p>If this compiles and links successfully, congratulations, you are done!</p>
</div>
<div class="paragraph">
<p>If instead you get an error similar to the following, you need to build the glslang libraries yourself, so let&#8217;s keep going.</p>
</div>
<div class="listingblock">
<div class="content">
<pre>Error	LNK2038	mismatch detected for '_MSC_VER': value '1800' doesn't match value '1900' in xxx.obj</pre>
</div>
</div>
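<div class="paragraph">
<p>The two numbers in this error are values of the compiler&#8217;s _MSC_VER macro: 1800 is Visual Studio 2013 and 1900 is Visual Studio 2015. To see which value your own project is built with, you can drop a hypothetical helper like the following into any source file and print its result at startup:</p>
</div>

```cpp
// Hypothetical helper: returns the _MSC_VER value this translation unit was
// compiled with, or -1 when _MSC_VER is not defined (non-Microsoft compilers).
// For example: 1800 = Visual Studio 2013, 1900 = Visual Studio 2015.
inline int CompilerMsvcVersion()
{
#ifdef _MSC_VER
	return _MSC_VER;
#else
	return -1;
#endif
}
```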
</div>
</div>
<div class="sect1">
<h2 id="_build_glslang_libs">Build Glslang Libs</h2>
<div class="sectionbody">
<div class="paragraph">
<p>First, install CMake and Python 3.x; see the details at <a href="https://github.com/KhronosGroup/glslang/blob/master/README.md" class="bare">https://github.com/KhronosGroup/glslang/blob/master/README.md</a></p>
</div>
<div class="paragraph">
<p>Then use CMake to generate the Glslang projects. Here the source code path is <strong>$(VK_SDK_PATH)\glslang</strong>, and we generate the project into <strong>$(VK_SDK_PATH)\glslang\build</strong>.
Make sure you select the correct target platform, especially if you are building for x64. If you are using cmake-gui, click "Configure" and select the options as follows.</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/glsl2spirv/001.png" alt="001.png" width="504">
</div>
</div>
<div class="paragraph">
<p>Now you can generate the project. If you get an error similar to the following:</p>
</div>
<div class="listingblock">
<div class="content">
<pre>  Could NOT find PythonInterp: Found unsuitable version "1.4", but required is at least "3"</pre>
</div>
</div>
<div class="paragraph">
<p>it means CMake found another version of Python installed, and you need to point CMake to the correct one.
If you are using cmake-gui, change the Python path as follows:</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/glsl2spirv/002.png" alt="002.png" width="660">
</div>
</div>
<div class="paragraph">
<p>Generate again, and the project should now be generated correctly. Open the generated solution and build "ALL_BUILD".</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/glsl2spirv/003.png" alt="003.png" width="351">
</div>
</div>
<div class="paragraph">
<p>Then copy all the libraries from the following paths (and their Release counterparts, of course):</p>
</div>
<div class="listingblock">
<div class="content">
<pre>$(VK_SDK_PATH)\glslang\build\External\spirv-tools\source\Debug
$(VK_SDK_PATH)\glslang\build\External\spirv-tools\source\opt\Debug
$(VK_SDK_PATH)\glslang\build\glslang\Debug
$(VK_SDK_PATH)\glslang\build\glslang\OSDependent\Windows\Debug
$(VK_SDK_PATH)\glslang\build\hlsl\Debug
$(VK_SDK_PATH)\glslang\build\OGLCompilersDLL\Debug
$(VK_SDK_PATH)\glslang\build\SPIRV\Debug</pre>
</div>
</div>
<div class="paragraph">
<p>After all that effort, you now have the glslang libraries you need. Compile your program again and it should be up and running!</p>
</div>
</div>
</div>]]></description><link>https://lxjk.github.io/2020/03/10/Translate-GLSL-to-SPIRV-for-Vulkan-at-Runtime.html</link><guid isPermaLink="true">https://lxjk.github.io/2020/03/10/Translate-GLSL-to-SPIRV-for-Vulkan-at-Runtime.html</guid><category><![CDATA[Vulkan]]></category><dc:creator><![CDATA[Eric Zhang]]></dc:creator><pubDate>Tue, 10 Mar 2020 00:00:00 GMT</pubDate></item><item><title><![CDATA[Fast 4x4 Matrix Inverse Algorithm Using SSE SIMD]]></title><description><![CDATA[<div id="preamble">
<div class="sectionbody">
<div id="toc" class="toc">
<div id="toctitle" class="title">Table of Contents</div>
<ul class="sectlevel2">
<li><a href="#__">Transform Matrix</a></li>
<li><a href="#___2">General Matrix Inverse</a></li>
<li><a href="#___3">Appendix 1</a></li>
<li><a href="#___4">Appendix 2</a></li>
</ul>
</div>
<div class="paragraph">
<p><a href="https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD-Explained.html">English Version</a></p>
</div>
<div class="paragraph">
<p>Before we start, consider whether the matrix you actually need to invert is really a "general" matrix.</p>
</div>
<div class="paragraph">
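<p>When I was writing the math library for my own game engine, I looked into the problem of matrix inversion. In games and 3D applications, an object&#8217;s transform is stored in a 4x4 matrix. In this article, a matrix built from the three components position, rotation, and scale is called a "transform matrix". A transform matrix can be inverted twice as fast as a general matrix. The first half of this article covers transform matrices. The second half explains how to compute the inverse of a general 4x4 matrix with SIMD instructions. Finally, I compare the performance of this algorithm with commonly used math libraries such as UE4, Eigen, and DirectX Math.</p>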
</div>
<div class="paragraph">
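<p>All matrices in this article are row-major. Row-major makes the data layout easier to explain and allows direct comparison with other math libraries. The inversion procedure is the same for row-major and column-major matrices (because \(A^{-1}=((A^{T})^{-1})^{T}\)). If, like me, you prefer column-major, a column-major version of the source code is also provided in the appendix.</p>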
</div>
</div>
</div>
<div class="sect2">
<h3 id="__">Transform Matrix</h3>
<div class="paragraph">
<p>A transform matrix is defined as follows:</p>
</div>
<div class="stemblock">
<div class="content">
\[M=\left( \begin{matrix} a\vec{X} &amp; 0 \\ b\vec{Y} &amp; 0 \\ c\vec{Z} &amp; 0 \\ \vec{T} &amp; 1 \\ \end{matrix} \right) = \left( \begin{matrix} aX_0 &amp; aX_1 &amp; aX_2 &amp; 0 \\ bY_0 &amp; bY_1 &amp; bY_2 &amp; 0 \\ cZ_0 &amp; cZ_1 &amp; cZ_2 &amp; 0 \\ T_0 &amp; T_1 &amp; T_2 &amp; 1 \\ \end{matrix} \right)\]
</div>
</div>
<div class="paragraph">
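<p>The first three components of the 4th row are the position \(\vec{T}\). The upper-left 3x3 sub-matrix is the scale-rotation matrix, and each of its three rows is a scaled rotation axis. That is, \(\vec{X}\cdot\vec{Y}=\vec{X}\cdot\vec{Z}=\vec{Y}\cdot\vec{Z}=0\), \(\left|\vec{X}\right|=\left|\vec{Y}\right|=\left|\vec{Z}\right|=1\), and the scale is \((a,b,c)\).</p>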
</div>
<div class="paragraph">
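<p>Almost all matrices used in games take this form. For example, think of \(M\) as the transform from local to world space, with \(\vec{X}\), \(\vec{Y}\), \(\vec{Z}\) as the axes of the local coordinate system. To transform a position \(\vec{P}(P_0,P_1,P_2)\) from local to world coordinates, we use the following formula:</p>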
</div>
<div class="stemblock">
<div class="content">
\[\vec{P'}=P_0a\vec{X}+P_1b\vec{Y}+P_2c\vec{Z}+\vec{T}\]
</div>
</div>
<div class="paragraph">
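<p>This is the same as multiplying the extended 4-component vector \(\vec{P}(P_0,P_1,P_2,1)\) by the matrix \(M\). So what does the inverse \(M^{-1}\) mean? In this example, it is the transform from world to local space: multiplying the extended world position \(\vec{P'}\) by \(M^{-1}\) should give back the local position \(\vec{P}\). Without a matrix, how would we transform a world-space position \(\vec{P'}\) to local coordinates? First subtract the local origin (that is, \(\vec{T}\)), then take the dot product with each local axis, and finally apply the inverse scale:</p>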
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
\vec{P}&amp;=(\frac{1}{a}(\vec{P'}-\vec{T})\cdot\vec{X},\frac{1}{b}(\vec{P'}-\vec{T})\cdot\vec{Y},\frac{1}{c}(\vec{P'}-\vec{T})\cdot\vec{Z})\\
&amp;=(\frac{1}{a}\vec{P'}\cdot\vec{X},\frac{1}{b}\vec{P'}\cdot\vec{Y},\frac{1}{c}\vec{P'}\cdot\vec{Z})-(\frac{1}{a}\vec{T}\cdot\vec{X},\frac{1}{b}\vec{T}\cdot\vec{Y},\frac{1}{c}\vec{T}\cdot\vec{Z})
\end{align*}\]
</div>
</div>
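<div class="paragraph">
<p>This derivation can be checked numerically with plain scalar code. In the sketch below, Vec3 and the function names are hypothetical. It maps a local point to world space with the formula for \(\vec{P'}\) above, then maps it back with the dot-product form. It assumes \(\vec{X}\), \(\vec{Y}\), \(\vec{Z}\) are orthonormal.</p>
</div>

```cpp
#include <array>

using Vec3 = std::array<float, 3>; // hypothetical 3D vector type

inline float Dot(const Vec3& u, const Vec3& v)
{
	return u[0] * v[0] + u[1] * v[1] + u[2] * v[2];
}

// Local -> world: P' = P0*a*X + P1*b*Y + P2*c*Z + T, with scale = (a, b, c)
inline Vec3 LocalToWorld(const Vec3& p, const Vec3& X, const Vec3& Y, const Vec3& Z,
                         const Vec3& scale, const Vec3& T)
{
	Vec3 r;
	for (int i = 0; i < 3; ++i)
		r[i] = p[0] * scale[0] * X[i] + p[1] * scale[1] * Y[i] + p[2] * scale[2] * Z[i] + T[i];
	return r;
}

// World -> local: subtract the origin T, project onto each axis, undo the scale.
// Only valid when X, Y, Z are mutually orthogonal unit vectors.
inline Vec3 WorldToLocal(const Vec3& pw, const Vec3& X, const Vec3& Y, const Vec3& Z,
                         const Vec3& scale, const Vec3& T)
{
	Vec3 d = {pw[0] - T[0], pw[1] - T[1], pw[2] - T[2]};
	return {Dot(d, X) / scale[0], Dot(d, Y) / scale[1], Dot(d, Z) / scale[2]};
}
```

<div class="paragraph">
<p>For example, take \(\vec{X}=(0,1,0)\), \(\vec{Y}=(-1,0,0)\), \(\vec{Z}=(0,0,1)\), scale \((2,3,4)\) and \(\vec{T}=(10,20,30)\). The local point \((1,2,3)\) then maps to \((4,22,42)\) and back again.</p>
</div>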
<div class="paragraph">
<p>With this formula, we can write the inverse directly:</p>
</div>
<div class="stemblock">
<div class="content">
\[M^{-1}=\left( \begin{matrix} \frac{1}{a}\vec{X} &amp; \frac{1}{b}\vec{Y} &amp; \frac{1}{c}\vec{Z} &amp; \vec{0} \\ -\vec{T}\cdot\frac{1}{a}\vec{X} &amp; -\vec{T}\cdot\frac{1}{b}\vec{Y} &amp; -\vec{T}\cdot\frac{1}{c}\vec{Z} &amp; 1 \\ \end{matrix} \right) = \left( \begin{matrix} \frac{1}{a}X_0 &amp; \frac{1}{b}Y_0 &amp; \frac{1}{c}Z_0 &amp; 0 \\ \frac{1}{a}X_1 &amp; \frac{1}{b}Y_1 &amp; \frac{1}{c}Z_1 &amp; 0 \\ \frac{1}{a}X_2 &amp; \frac{1}{b}Y_2 &amp; \frac{1}{c}Z_2 &amp; 0 \\ -\vec{T}\cdot\frac{1}{a}\vec{X} &amp; -\vec{T}\cdot\frac{1}{b}\vec{Y} &amp; -\vec{T}\cdot\frac{1}{c}\vec{Z} &amp; 1 \\ \end{matrix} \right)\]
</div>
</div>
<div class="paragraph">
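<p>The first three rows of a transform matrix \(M\) have the nice property of being mutually orthogonal, so it is easy to verify that \(MM^{-1}=I\). To build the inverse, transpose the original 3x3 rotation sub-matrix and apply the inverse scale. Then compute the dot products of the position vector with the inverse-scaled axes.</p>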
</div>
<div class="paragraph">
<p>If we fold the scale into the axis vectors (that is, \(\left|\vec{X}\right|=a\), \(\left|\vec{Y}\right|=b\), \(\left|\vec{Z}\right|=c\)), we get the general form below.</p>
</div>
<div class="stemblock">
<div class="content">
\[M=\left( \begin{matrix} \vec{X} &amp; 0 \\ \vec{Y} &amp; 0 \\ \vec{Z} &amp; 0 \\ \vec{T} &amp; 1 \\ \end{matrix} \right), M^{-1}=\left( \begin{matrix} \frac{1}{{\left|\vec{X}\right|}^{2}}\vec{X} &amp; \frac{1}{{\left|\vec{Y}\right|}^{2}}\vec{Y} &amp; \frac{1}{{\left|\vec{Z}\right|}^{2}}\vec{Z} &amp; \vec{0} \\ -\vec{T}\cdot\frac{1}{{\left|\vec{X}\right|}^{2}}\vec{X} &amp; -\vec{T}\cdot\frac{1}{{\left|\vec{Y}\right|}^{2}}\vec{Y} &amp; -\vec{T}\cdot\frac{1}{{\left|\vec{Z}\right|}^{2}}\vec{Z} &amp; 1 \\ \end{matrix} \right)\]
</div>
</div>
<div class="paragraph">
<p>Note the inverse-scale part: the denominator is the squared scale, not the scale itself. This is good news, because no square root is needed. If all scales are 1, the formula becomes even simpler:</p>
</div>
<div class="stemblock">
<div class="content">
\[M^{-1}=\left( \begin{matrix} \vec{X} &amp; \vec{Y} &amp; \vec{Z} &amp; \vec{0} \\ -\vec{T}\cdot\vec{X} &amp; -\vec{T}\cdot\vec{Y} &amp; -\vec{T}\cdot\vec{Z} &amp; 1 \\ \end{matrix} \right)\]
</div>
</div>
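<div class="paragraph">
<p>As a scalar reference for the general form (scale folded into the axes), the sketch below builds \(M^{-1}\) straight from the formula. The upper-left part is the transposed 3x3 divided by the squared axis lengths, and the last row holds the negated dot products with \(\vec{T}\). The Mat4 alias and the function names are hypothetical, and \(M\) is assumed to be a row-major transform matrix with non-zero scale. It can also serve as a test oracle for the SIMD versions below.</p>
</div>

```cpp
#include <array>

using Mat4 = std::array<std::array<float, 4>, 4>; // hypothetical: row-major, m[row][col]

// Scalar reference for the transform-matrix inverse with scale folded into the axes.
// Assumes rows 0..2 are mutually orthogonal non-zero axes X, Y, Z,
// column 3 is (0, 0, 0, 1), and row 3 holds the translation T.
inline Mat4 TransformInverseScalar(const Mat4& m)
{
	Mat4 r{}; // zero-initialized, so r[0..2][3] stay 0
	for (int axis = 0; axis < 3; ++axis)
	{
		// |axis|^2: no square root needed
		float sizeSqr = m[axis][0] * m[axis][0] + m[axis][1] * m[axis][1] + m[axis][2] * m[axis][2];
		float rcp = 1.0f / sizeSqr;
		// column 'axis' of the result is the axis row scaled by 1/|axis|^2 (the transpose)
		for (int i = 0; i < 3; ++i)
			r[i][axis] = m[axis][i] * rcp;
		// last row: -T . (axis / |axis|^2)
		r[3][axis] = -(m[3][0] * m[axis][0] + m[3][1] * m[axis][1] + m[3][2] * m[axis][2]) * rcp;
	}
	r[3][3] = 1.0f;
	return r;
}

// Plain row-major 4x4 product, used to check M * M^-1 = I
inline Mat4 Mul(const Mat4& a, const Mat4& b)
{
	Mat4 r{};
	for (int i = 0; i < 4; ++i)
		for (int j = 0; j < 4; ++j)
			for (int k = 0; k < 4; ++k)
				r[i][j] += a[i][k] * b[k][j];
	return r;
}
```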
<div class="paragraph">
<p>Enough theory; let&#8217;s move on to the source code. First, the matrix definition:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include &lt;immintrin.h&gt; // __m128 and the SSE/SSE3/SSE4.1 intrinsics used in this article

__declspec(align(16)) struct Matrix4
{
public:
	union
	{
		float m[4][4];
		__m128 mVec[4];
	};
};</code></pre>
</div>
</div>
<div class="paragraph">
<p>Before using intrinsics, we define a few macros for shuffle and swizzle. They make the following source code easier to read, and they let special cases of shuffle use faster instructions.</p>
</div>
<div class="paragraph">
<p>(Thanks to <strong>Stefan Kaps</strong> for suggesting the _mm_shuffle_epi32 instruction!)</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#define MakeShuffleMask(x,y,z,w)           (x | (y&lt;&lt;2) | (z&lt;&lt;4) | (w&lt;&lt;6))

// vec(0, 1, 2, 3) -&gt; (vec[x], vec[y], vec[z], vec[w])
#define VecSwizzleMask(vec, mask)          _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(vec), mask))
#define VecSwizzle(vec, x, y, z, w)        VecSwizzleMask(vec, MakeShuffleMask(x,y,z,w))
#define VecSwizzle1(vec, x)                VecSwizzleMask(vec, MakeShuffleMask(x,x,x,x))
// special swizzle
#define VecSwizzle_0022(vec)               _mm_moveldup_ps(vec)
#define VecSwizzle_1133(vec)               _mm_movehdup_ps(vec)

// return (vec1[x], vec1[y], vec2[z], vec2[w])
#define VecShuffle(vec1, vec2, x,y,z,w)    _mm_shuffle_ps(vec1, vec2, MakeShuffleMask(x,y,z,w))
// special shuffle
#define VecShuffle_0101(vec1, vec2)        _mm_movelh_ps(vec1, vec2)
#define VecShuffle_2323(vec1, vec2)        _mm_movehl_ps(vec2, vec1)</code></pre>
</div>
</div>
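<div class="paragraph">
<p>To see what the macros produce, the small check below compares MakeShuffleMask with the standard _MM_SHUFFLE macro from xmmintrin.h, which takes its lane indices in the opposite order. A usage example can then read lanes back with _mm_storeu_ps. The sketch uses SSE/SSE2 only, so it should build for any x86-64 target. The macros are copied from above, and StoreLanes is a hypothetical helper.</p>
</div>

```cpp
#include <emmintrin.h> // SSE2: _mm_shuffle_epi32, _mm_castps_si128, _mm_castsi128_ps
#include <xmmintrin.h> // SSE: _mm_shuffle_ps, _mm_storeu_ps, _MM_SHUFFLE

#define MakeShuffleMask(x,y,z,w)           (x | (y<<2) | (z<<4) | (w<<6))
#define VecSwizzleMask(vec, mask)          _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(vec), mask))
#define VecSwizzle(vec, x, y, z, w)        VecSwizzleMask(vec, MakeShuffleMask(x,y,z,w))
#define VecShuffle(vec1, vec2, x,y,z,w)    _mm_shuffle_ps(vec1, vec2, MakeShuffleMask(x,y,z,w))

// MakeShuffleMask lists lanes low-to-high; _MM_SHUFFLE lists them high-to-low.
static_assert(MakeShuffleMask(0,2,0,3) == _MM_SHUFFLE(3,0,2,0), "same 8-bit immediate");

// Hypothetical helper: store the four lanes of v into out[0..3], lane 0 first.
inline void StoreLanes(__m128 v, float out[4])
{
	_mm_storeu_ps(out, v);
}
```

<div class="paragraph">
<p>For example, with a = (0, 1, 2, 3) and b = (10, 11, 12, 13), VecShuffle(a, b, 0,2,0,3) yields (0, 2, 10, 13), and VecSwizzle(a, 3,2,1,0) reverses a.</p>
</div>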
<div class="paragraph">
<p>The following function computes the inverse of a transform matrix with scale 1:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">// Requires this matrix to be transform matrix, NoScale version requires this matrix be of scale 1
inline Matrix4 GetTransformInverseNoScale(const Matrix4&amp; inM)
{
	Matrix4 r;

	// transpose 3x3, we know m03 = m13 = m23 = 0
	__m128 t0 = VecShuffle_0101(inM.mVec[0], inM.mVec[1]); // 00, 01, 10, 11
	__m128 t1 = VecShuffle_2323(inM.mVec[0], inM.mVec[1]); // 02, 03, 12, 13
	r.mVec[0] = VecShuffle(t0, inM.mVec[2], 0,2,0,3); // 00, 10, 20, 23(=0)
	r.mVec[1] = VecShuffle(t0, inM.mVec[2], 1,3,1,3); // 01, 11, 21, 23(=0)
	r.mVec[2] = VecShuffle(t1, inM.mVec[2], 0,2,2,3); // 02, 12, 22, 23(=0)

	// last line
	r.mVec[3] =                       _mm_mul_ps(r.mVec[0], VecSwizzle1(inM.mVec[3], 0));
	r.mVec[3] = _mm_add_ps(r.mVec[3], _mm_mul_ps(r.mVec[1], VecSwizzle1(inM.mVec[3], 1)));
	r.mVec[3] = _mm_add_ps(r.mVec[3], _mm_mul_ps(r.mVec[2], VecSwizzle1(inM.mVec[3], 2)));
	r.mVec[3] = _mm_sub_ps(_mm_setr_ps(0.f, 0.f, 0.f, 1.f), r.mVec[3]);

	return r;
}</code></pre>
</div>
</div>
<div class="paragraph">
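<p>This is the fastest function: all it needs is a transpose and a few dot products. Adding scale costs extra time for the division, but it is still fast. There is a small trick for computing the squared scales. Since we transpose the 3x3 rotation matrix anyway, we can postpone the squared-scale computation and use the transposed result to compute the squared scales of all three axes at once.</p>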
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#define SMALL_NUMBER		(1.e-8f)

// Requires this matrix to be transform matrix
inline Matrix4 GetTransformInverse(const Matrix4&amp; inM)
{
	Matrix4 r;

	// transpose 3x3, we know m03 = m13 = m23 = 0
	__m128 t0 = VecShuffle_0101(inM.mVec[0], inM.mVec[1]); // 00, 01, 10, 11
	__m128 t1 = VecShuffle_2323(inM.mVec[0], inM.mVec[1]); // 02, 03, 12, 13
	r.mVec[0] = VecShuffle(t0, inM.mVec[2], 0,2,0,3); // 00, 10, 20, 23(=0)
	r.mVec[1] = VecShuffle(t0, inM.mVec[2], 1,3,1,3); // 01, 11, 21, 23(=0)
	r.mVec[2] = VecShuffle(t1, inM.mVec[2], 0,2,2,3); // 02, 12, 22, 23(=0)

	// (SizeSqr(mVec[0]), SizeSqr(mVec[1]), SizeSqr(mVec[2]), 0)
	__m128 sizeSqr;
	sizeSqr =                     _mm_mul_ps(r.mVec[0], r.mVec[0]);
	sizeSqr = _mm_add_ps(sizeSqr, _mm_mul_ps(r.mVec[1], r.mVec[1]));
	sizeSqr = _mm_add_ps(sizeSqr, _mm_mul_ps(r.mVec[2], r.mVec[2]));

	// optional test to avoid divide by 0
	__m128 one = _mm_set1_ps(1.f);
	// for each component, if(sizeSqr &lt; SMALL_NUMBER) sizeSqr = 1;
	__m128 rSizeSqr = _mm_blendv_ps(
		_mm_div_ps(one, sizeSqr),
		one,
		_mm_cmplt_ps(sizeSqr, _mm_set1_ps(SMALL_NUMBER))
		);

	r.mVec[0] = _mm_mul_ps(r.mVec[0], rSizeSqr);
	r.mVec[1] = _mm_mul_ps(r.mVec[1], rSizeSqr);
	r.mVec[2] = _mm_mul_ps(r.mVec[2], rSizeSqr);

	// last line
	r.mVec[3] =                       _mm_mul_ps(r.mVec[0], VecSwizzle1(inM.mVec[3], 0));
	r.mVec[3] = _mm_add_ps(r.mVec[3], _mm_mul_ps(r.mVec[1], VecSwizzle1(inM.mVec[3], 1)));
	r.mVec[3] = _mm_add_ps(r.mVec[3], _mm_mul_ps(r.mVec[2], VecSwizzle1(inM.mVec[3], 2)));
	r.mVec[3] = _mm_sub_ps(_mm_setr_ps(0.f, 0.f, 0.f, 1.f), r.mVec[3]);

	return r;
}</code></pre>
</div>
</div>
<div class="paragraph">
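<p>The first and last parts of this function are exactly the same as in the NoScale version. In between, we compute the squared scales. There is also a test that avoids dividing by a number close to 0, although it is not strictly necessary. Note that _mm_blendv_ps requires SSE4.1.</p>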
</div>
</div>
<div class="sect2">
<h3 id="___2">General Matrix Inverse</h3>
<div class="paragraph">
<p>Computing the inverse of a general matrix is considerably harder. For details of the theory used below, refer to the English Wikipedia pages:
<a href="https://en.wikipedia.org/wiki/Invertible_matrix">Invertible Matrix</a>, <a href="https://en.wikipedia.org/wiki/Adjugate_matrix">Adjugate Matrix</a>, <a href="https://en.wikipedia.org/wiki/Determinant#Relation_to_eigenvalues_and_trace">Determinant</a>, <a href="https://en.wikipedia.org/wiki/Trace_(linear_algebra)">Trace</a>.</p>
</div>
<div class="paragraph">
<p>Some of these will be introduced later. The block matrix method used in the explanation below is the same as in Intel&#8217;s <a href="https://software.intel.com/en-us/articles/optimized-matrix-library-for-use-with-the-intel-pentiumr-4-processors-sse2-instructions/">Optimized Matrix Library</a>.</p>
</div>
<div class="paragraph">
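<p>A 4x4 matrix can be partitioned into four 2x2 sub-matrices. 2x2 matrices have two advantages. First, their inverse and determinant are easy to compute. Second, all of their data fits into a single 128-bit vector register, which enables fast computation.</p>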
</div>
<div class="stemblock">
<div class="content">
\[M=\left( \begin{matrix} A &amp; B \\ C &amp; D \\ \end{matrix} \right)=\left( \begin{matrix} A_0 &amp; A_1 &amp; B_0 &amp; B_1 \\ A_2 &amp; A_3 &amp; B_2 &amp; B_3 \\ C_0 &amp; C_1 &amp; D_0 &amp; D_1 \\ C_2 &amp; C_3 &amp; D_2 &amp; D_3 \\ \end{matrix} \right)\]
</div>
</div>
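<div class="paragraph">
<p>When each row is held in one __m128, each 2x2 block can be gathered with a single instruction. _mm_movelh_ps combines the low halves of two rows and _mm_movehl_ps combines the high halves, which is what VecShuffle_0101 and VecShuffle_2323 above wrap. Below is a minimal SSE-only sketch that extracts \(A\), \(B\), \(C\), \(D\) in the \(A_0, A_1, A_2, A_3\) order shown above. Blocks2x2 and SplitBlocks are hypothetical names.</p>
</div>

```cpp
#include <xmmintrin.h> // SSE: _mm_movelh_ps, _mm_movehl_ps

// A 2x2 block packed into one register as (X0, X1, X2, X3) = | X0 X1 |
//                                                            | X2 X3 |
struct Blocks2x2 { __m128 A, B, C, D; };

// rows[i] holds row i of a row-major 4x4 matrix.
inline Blocks2x2 SplitBlocks(const __m128 rows[4])
{
	Blocks2x2 b;
	b.A = _mm_movelh_ps(rows[0], rows[1]); // (m00, m01, m10, m11)
	b.B = _mm_movehl_ps(rows[1], rows[0]); // (m02, m03, m12, m13)
	b.C = _mm_movelh_ps(rows[2], rows[3]); // (m20, m21, m30, m31)
	b.D = _mm_movehl_ps(rows[3], rows[2]); // (m22, m23, m32, m33)
	return b;
}
```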
<div class="paragraph">
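<p>To derive the formulas below, we make some assumptions: the sub-matrices \(A\) and \(D\) are invertible, and \(C\) and \(D\) commute (\(CD=DC\)). (Thanks to <strong>wychmaster</strong> for pointing this out.) These are quite strong assumptions, but they make the derivation easier. The appendix proves that the result holds even without them.</p>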
</div>
<div class="paragraph">
<p>The inverse of a block matrix is given by the following formulas:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
{\left( \begin{matrix} A &amp; B \\ C &amp; D \\ \end{matrix} \right)}^{-1}&amp;=\left( \begin{matrix} A^{-1}+A^{-1}B(D-CA^{-1}B)^{-1}CA^{-1} &amp; -A^{-1}B(D-CA^{-1}B)^{-1} \\ -(D-CA^{-1}B)^{-1}CA^{-1} &amp; (D-CA^{-1}B)^{-1} \\ \end{matrix} \right)\\
&amp;=\left( \begin{matrix} (A-BD^{-1}C)^{-1} &amp; -(A-BD^{-1}C)^{-1}BD^{-1} \\ -D^{-1}C(A-BD^{-1}C)^{-1} &amp; D^{-1}+D^{-1}C(A-BD^{-1}C)^{-1}BD^{-1} \\ \end{matrix} \right)
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>What we actually use is a matrix that combines the second row of the first formula with the first row of the second:</p>
</div>
<div class="stemblock">
<div class="content">
\[{\left( \begin{matrix} A &amp; B \\ C &amp; D \\ \end{matrix} \right)}^{-1}=\left( \begin{matrix} (A-BD^{-1}C)^{-1} &amp; -(A-BD^{-1}C)^{-1}BD^{-1} \\ -(D-CA^{-1}B)^{-1}CA^{-1} &amp; (D-CA^{-1}B)^{-1} \\ \end{matrix} \right)\]
</div>
</div>
<div class="paragraph">
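<p>At first glance this approach may seem pointless. For example, with the first formula we only need to compute two 2x2 inverses (\(A^{-1}\) and \((D-CA^{-1} B)^{-1}\)), so why bother mixing in the second formula? Because, with the right derivation, it leads to a simpler form. The two formulas describe exactly the same matrix, so either one can be used.</p>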
</div>
<div class="paragraph">
<p>Next, a few definitions. The adjugate of a matrix \(A\) is defined by \(A\operatorname{adj}(A)=\left|A\right|I\), where \(\left|A\right|\) is the determinant of \(A\). In this article, the adjugate is abbreviated as \(A^{\#}=\operatorname{adj}(A)\). Since \(A^{-1}=\frac{1}{\left|A\right|}A^{\#}\), computing an inverse reduces to computing an adjugate. The adjugate of a 2x2 matrix is:</p>
</div>
<div class="stemblock">
<div class="content">
\[A^{\#}={\left( \begin{matrix} A_0 &amp; A_1 \\ A_2 &amp; A_3 \\ \end{matrix} \right)}^{\#}=\left( \begin{matrix} A_3 &amp; -A_1 \\ -A_2 &amp; A_0 \\ \end{matrix} \right)\]
</div>
</div>
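<div class="paragraph">
<p>Here is a scalar sketch of the 2x2 determinant and adjugate in the same packed order. Mat2, Det2, Adj2, and Mul2 are hypothetical names. The sketch can be used to confirm the defining identity \(AA^{\#}=\left|A\right|I\):</p>
</div>

```cpp
#include <array>

// Hypothetical 2x2 type stored as (A0, A1, A2, A3) = | A0 A1 |
//                                                    | A2 A3 |
using Mat2 = std::array<float, 4>;

// |A| = A0*A3 - A1*A2
inline float Det2(const Mat2& a) { return a[0] * a[3] - a[1] * a[2]; }

// A# = |  A3 -A1 |
//      | -A2  A0 |
inline Mat2 Adj2(const Mat2& a) { return {a[3], -a[1], -a[2], a[0]}; }

// Row-major 2x2 product A*B
inline Mat2 Mul2(const Mat2& a, const Mat2& b)
{
	return {a[0] * b[0] + a[1] * b[2], a[0] * b[1] + a[1] * b[3],
	        a[2] * b[0] + a[3] * b[2], a[2] * b[1] + a[3] * b[3]};
}
```

<div class="paragraph">
<p>For example, for \(A=(3,1,4,2)\) we get \(\left|A\right|=2\), \(A^{\#}=(2,-1,-4,3)\) and \(AA^{\#}=2I\).</p>
</div>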
<div class="paragraph">
<p>Properties of the 2x2 adjugate: \((AB)^{\#}=B^{\#}A^{\#}\), \((A^{\#})^{\#}=A\), \((cA)^{\#}=cA^{\#}\).</p>
</div>
<div class="paragraph">
<p>For 2x2 determinants, we use the following properties: \(\left|A\right|={A_0}{A_3}-{A_1}{A_2}\), \(\left|-A\right|=\left|A\right|\), \(\left|AB\right|=\left|A\right|\left|B\right|\), \(\left|A+B\right|=\left|A\right| + \left|B\right| + \operatorname{tr}(A^{\#}{B})\).</p>
</div>
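<div class="paragraph">
<p>The less familiar identities \(\left|A+B\right|=\left|A\right|+\left|B\right|+\operatorname{tr}(A^{\#}B)\) and \(\left|AB\right|=\left|A\right|\left|B\right|\) are easy to spot-check numerically. The helpers below are hypothetical, and they repeat the scalar 2x2 code so that the snippet stands alone:</p>
</div>

```cpp
#include <array>

using Mat2 = std::array<float, 4>; // hypothetical: (A0, A1, A2, A3), row-major 2x2

inline float Det2(const Mat2& a) { return a[0] * a[3] - a[1] * a[2]; }
inline Mat2 Adj2(const Mat2& a) { return {a[3], -a[1], -a[2], a[0]}; }
inline float Trace2(const Mat2& a) { return a[0] + a[3]; }

inline Mat2 Add2(const Mat2& a, const Mat2& b)
{
	return {a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]};
}

inline Mat2 Mul2(const Mat2& a, const Mat2& b)
{
	return {a[0] * b[0] + a[1] * b[2], a[0] * b[1] + a[1] * b[3],
	        a[2] * b[0] + a[3] * b[2], a[2] * b[1] + a[3] * b[3]};
}
```

<div class="paragraph">
<p>With \(A=(3,1,4,2)\) and \(B=(1,2,0,5)\): \(\left|A+B\right|=16=2+5+9\) and \(\left|AB\right|=10=2\cdot5\).</p>
</div>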
<div class="paragraph">
<p>Properties of the trace: \(\operatorname{tr}(AB)=\operatorname{tr}(BA)\), \(\operatorname{tr}(-A)=-\operatorname{tr}(A)\).</p>
</div>
<div class="paragraph">
<p>Finally, a property of the determinant of the block matrix \(M={\left( \begin{matrix} A &amp; B \\ C &amp; D \\ \end{matrix} \right)}\) (the last equality relies on the assumption \(CD=DC\)):</p>
</div>
<div class="stemblock">
<div class="content">
\[\left|M\right|=\left|A\right|\left|D-CA^{-1}B\right|=\left|D\right|\left|A-BD^{-1}C\right|=\left|AD-BC\right|\]
</div>
</div>
<div class="paragraph">
<p>Only the properties used in the derivation are listed here; see the Wikipedia pages above for details.</p>
</div>
<div class="paragraph">
<p>Writing \(M^{-1}={\left( \begin{matrix} A &amp; B \\ C &amp; D \\ \end{matrix} \right)}^{-1}={\left( \begin{matrix} X &amp; Y \\ Z &amp; W \\ \end{matrix} \right)}\), let us start with the top-left block.</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
X&amp;=(A-BD^{-1}C)^{-1}\\
&amp;=\frac{1}{\left|A-BD^{-1}C\right|}(A-\frac{1}{\left|D\right|}BD^{\#}C)^{\#}\\
&amp;=\frac{1}{\left|D\right|\left|A-BD^{-1}C\right|}(\left|D\right|A-BD^{\#}C)^{\#}\\
&amp;=\frac{1}{\left|M\right|}(\left|D\right|A-B(D^{\#}C))^{\#}
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>In the same way, the bottom-right block is:</p>
</div>
<div class="stemblock">
<div class="content">
\[W=(D-CA^{-1}B)^{-1}=\frac{1}{\left|M\right|}(\left|A\right|D-C(A^{\#}B))^{\#}\]
</div>
</div>
<div class="paragraph">
<p>The reason \(D^{\#}C\) and \(A^{\#}B\) are enclosed in parentheses will become clear later.</p>
</div>
<div class="paragraph">
<p>Next, using the result for the top-left block \(X\), we derive the top-right block.</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
Y&amp;=-(A-BD^{-1}C)^{-1}BD^{-1}\\
&amp;=-\frac{1}{\left|M\right|\left|D\right|}(\left|D\right|A-B(D^{\#}C))^{\#}(BD^{\#})\\
&amp;=-\frac{1}{\left|M\right|\left|D\right|}(\left|D\right|A-B(D^{\#}C))^{\#}(DB^{\#})^{\#}\\
&amp;=-\frac{1}{\left|M\right|\left|D\right|}(\left|D\right|DB^{\#}A-DB^{\#}B(D^{\#}C))^{\#}\\
&amp;=-\frac{1}{\left|M\right|\left|D\right|}(\left|D\right|D(A^{\#}B)^{\#}-\left|D\right|\left|B\right|C)^{\#}\\
&amp;=\frac{1}{\left|M\right|}(\left|B\right|C-D(A^{\#}B)^{\#})^{\#}
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>In the same way, the bottom-left block is:</p>
</div>
<div class="stemblock">
<div class="content">
\[Z=-(D-CA^{-1}B)^{-1}CA^{-1}=\frac{1}{\left|M\right|}(\left|C\right|B-A(D^{\#}C)^{\#})^{\#}\]
</div>
</div>
<div class="paragraph">
<p>In the top-right formula, \(B^{\#}A\) is rewritten as \((A^{\#}B)^{\#}\) so that the result of \(A^{\#}B\) can be reused. Combining the four formulas:</p>
</div>
<div class="stemblock">
<div class="content">
\[M^{-1}={\left( \begin{matrix} A &amp; B \\ C &amp; D \\ \end{matrix} \right)}^{-1}=\frac{1}{\left|M\right|}{\left( \begin{matrix} (\left|D\right|A-B(D^{\#}C))^{\#} &amp; (\left|B\right|C-D(A^{\#}B)^{\#})^{\#} \\ (\left|C\right|B-A(D^{\#}C)^{\#})^{\#} &amp; (\left|A\right|D-C(A^{\#}B))^{\#} \\ \end{matrix} \right)}\]
</div>
</div>
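<div class="paragraph">
<p>Before vectorizing, the combined formula can be checked against a plain scalar reference. The sketch below is not the SSE implementation from this article. Mat2, Mat4, BlockInverse, and the other helpers are hypothetical, and each 2x2 block uses the same \((X_0,X_1,X_2,X_3)\) order. \(A^{\#}B\) and \(D^{\#}C\) are computed once and reused, and \(\left|M\right|\) comes from the 4x4 determinant formula derived later in this section. Consistent with the appendix, the code does not require \(CD=DC\).</p>
</div>

```cpp
#include <array>

using Mat2 = std::array<float, 4>;                // (X0, X1, X2, X3) = | X0 X1 ; X2 X3 |
using Mat4 = std::array<std::array<float, 4>, 4>; // row-major, m[row][col]

inline float Det2(const Mat2& a) { return a[0] * a[3] - a[1] * a[2]; }
inline Mat2 Adj2(const Mat2& a) { return {a[3], -a[1], -a[2], a[0]}; }
inline float Trace2(const Mat2& a) { return a[0] + a[3]; }
inline Mat2 Mul2(const Mat2& a, const Mat2& b)
{
	return {a[0] * b[0] + a[1] * b[2], a[0] * b[1] + a[1] * b[3],
	        a[2] * b[0] + a[3] * b[2], a[2] * b[1] + a[3] * b[3]};
}
// s*a - b
inline Mat2 ScaleSub2(float s, const Mat2& a, const Mat2& b)
{
	return {s * a[0] - b[0], s * a[1] - b[1], s * a[2] - b[2], s * a[3] - b[3]};
}

// M^-1 = 1/|M| * | (|D|A - B(D#C))#    (|B|C - D(A#B)#)# |
//                | (|C|B - A(D#C)#)#   (|A|D - C(A#B))#  |
// Returns false (leaving 'out' untouched) when |M| is exactly 0.
inline bool BlockInverse(const Mat4& m, Mat4& out)
{
	Mat2 A = {m[0][0], m[0][1], m[1][0], m[1][1]};
	Mat2 B = {m[0][2], m[0][3], m[1][2], m[1][3]};
	Mat2 C = {m[2][0], m[2][1], m[3][0], m[3][1]};
	Mat2 D = {m[2][2], m[2][3], m[3][2], m[3][3]};

	float detA = Det2(A), detB = Det2(B), detC = Det2(C), detD = Det2(D);
	Mat2 AadjB = Mul2(Adj2(A), B); // A#B, reused twice
	Mat2 DadjC = Mul2(Adj2(D), C); // D#C, reused twice

	// |M| = |A||D| + |B||C| - tr((A#B)(D#C))
	float detM = detA * detD + detB * detC - Trace2(Mul2(AadjB, DadjC));
	if (detM == 0.0f)
		return false;
	float rDetM = 1.0f / detM;

	Mat2 X = Adj2(ScaleSub2(detD, A, Mul2(B, DadjC)));       // top-left
	Mat2 Y = Adj2(ScaleSub2(detB, C, Mul2(D, Adj2(AadjB)))); // top-right
	Mat2 Z = Adj2(ScaleSub2(detC, B, Mul2(A, Adj2(DadjC)))); // bottom-left
	Mat2 W = Adj2(ScaleSub2(detA, D, Mul2(C, AadjB)));       // bottom-right

	// scatter the four blocks back into the 4x4 result, scaled by 1/|M|
	const Mat2* blocks[2][2] = {{&X, &Y}, {&Z, &W}};
	for (int br = 0; br < 2; ++br)
		for (int bc = 0; bc < 2; ++bc)
			for (int i = 0; i < 2; ++i)
				for (int j = 0; j < 2; ++j)
					out[br * 2 + i][bc * 2 + j] = (*blocks[br][bc])[i * 2 + j] * rDetM;
	return true;
}
```

<div class="paragraph">
<p>For instance, take the matrix with rows (2,0,1,3), (1,1,0,2), (0,3,1,1), (4,1,2,0). It has \(\left|M\right|=-32\), and its blocks \(C\) and \(D\) do not commute. Multiplying it by the result of BlockInverse still gives the identity up to rounding.</p>
</div>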
<div class="paragraph">
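<p>If you have read this far, it should be clear which functions we need: 2x2 matrix multiplication and multiplication with an adjugate, that is, \(AB\), \(A^{\#}B\), and \(AB^{\#}\). The 2x2 adjugate was shown earlier, but here it takes fewer instructions to compute it together with the multiplication. We just expand the result and rearrange the order, for example:</p>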
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
A^{\#}B&amp;={\left( \begin{matrix} A_3 &amp; -A_1 \\ -A_2 &amp; A_0 \\ \end{matrix} \right)}{\left( \begin{matrix} B_0 &amp; B_1 \\ B_2 &amp; B_3 \\ \end{matrix} \right)}\\
&amp;={\left( \begin{matrix} {A_3}{B_0}-{A_1}{B_2} &amp; {A_3}{B_1}-{A_1}{B_3} \\ {A_0}{B_2}-{A_2}{B_0} &amp; {A_0}{B_3}-{A_2}{B_1} \\ \end{matrix} \right)}
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Here is the source code for these three functions:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">// for row major matrix
// we use __m128 to represent 2x2 matrix as A = | A0  A1 |
//                                              | A2  A3 |
// 2x2 row major Matrix multiply A*B
__forceinline __m128 Mat2Mul(__m128 vec1, __m128 vec2)
{
	return
		_mm_add_ps(_mm_mul_ps(                     vec1, VecSwizzle(vec2, 0,3,0,3)),
		           _mm_mul_ps(VecSwizzle(vec1, 1,0,3,2), VecSwizzle(vec2, 2,1,2,1)));
}
// 2x2 row major Matrix adjugate multiply (A#)*B
__forceinline __m128 Mat2AdjMul(__m128 vec1, __m128 vec2)
{
	return
		_mm_sub_ps(_mm_mul_ps(VecSwizzle(vec1, 3,3,0,0), vec2),
		           _mm_mul_ps(VecSwizzle(vec1, 1,1,2,2), VecSwizzle(vec2, 2,3,0,1)));

}
// 2x2 row major Matrix multiply adjugate A*(B#)
__forceinline __m128 Mat2MulAdj(__m128 vec1, __m128 vec2)
{
	return
		_mm_sub_ps(_mm_mul_ps(                     vec1, VecSwizzle(vec2, 3,0,3,0)),
		           _mm_mul_ps(VecSwizzle(vec1, 1,0,3,2), VecSwizzle(vec2, 2,1,2,1)));
}</code></pre>
</div>
</div>
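<div class="paragraph">
<p>The swizzle indices above are easy to get wrong, so a quick check against hand-computed values can help. The sketch below assumes an x86 target with SSE2 and defines its own MakeShuffleMask and VecSwizzle macros on top of _mm_shuffle_epi32. These are stand-ins for the helper macros defined earlier in the article and may differ from them in detail. __forceinline (MSVC-specific) is replaced by inline so the sketch also builds with GCC and Clang.</p>
</div>

```cpp
#include <emmintrin.h> // SSE2: _mm_shuffle_epi32, _mm_castps_si128, _mm_castsi128_ps

// Stand-in swizzle helpers (assumed): result lane i takes source lane given by the i-th index
#define MakeShuffleMask(x,y,z,w) ((x) | ((y) << 2) | ((z) << 4) | ((w) << 6))
#define VecSwizzle(vec, x,y,z,w) \
    _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(vec), MakeShuffleMask(x,y,z,w)))

// Row-major 2x2 helpers, same swizzles as the listing above
inline __m128 Mat2Mul(__m128 vec1, __m128 vec2)
{
    return _mm_add_ps(_mm_mul_ps(vec1, VecSwizzle(vec2, 0,3,0,3)),
                      _mm_mul_ps(VecSwizzle(vec1, 1,0,3,2), VecSwizzle(vec2, 2,1,2,1)));
}
inline __m128 Mat2AdjMul(__m128 vec1, __m128 vec2)
{
    return _mm_sub_ps(_mm_mul_ps(VecSwizzle(vec1, 3,3,0,0), vec2),
                      _mm_mul_ps(VecSwizzle(vec1, 1,1,2,2), VecSwizzle(vec2, 2,3,0,1)));
}
inline __m128 Mat2MulAdj(__m128 vec1, __m128 vec2)
{
    return _mm_sub_ps(_mm_mul_ps(vec1, VecSwizzle(vec2, 3,0,3,0)),
                      _mm_mul_ps(VecSwizzle(vec1, 1,0,3,2), VecSwizzle(vec2, 2,1,2,1)));
}

// Copy the lanes out so they can be compared on the scalar side
inline void StoreLanes(__m128 v, float out[4]) { _mm_storeu_ps(out, v); }
```

<div class="paragraph">
<p>With A = _mm_setr_ps(1, 2, 3, 4) and B = _mm_setr_ps(5, 6, 7, 8), the three functions should produce (19, 22, 43, 50), (6, 8, -8, -10), and (-6, 4, -4, 2), matching the term-by-term expansion.</p>
</div>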
<div class="paragraph">
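<p>There is one more trick here. After computing a 2x2 block such as \(\left|D\right|A-B(D^{\#}C)\), we would normally take its adjugate \(X=(\left|D\right|A-B(D^{\#}C))^{\#}\) right away. Instead, it is more efficient to defer that adjugate and apply it all at once when writing the final data into the 4x4 result matrix. You can see this in the last part of the inverse function.</p>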
</div>
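<div class="paragraph">
<p>Deferring the adjugate is cheap because, for a 2x2 block, the adjugate is only a lane permutation plus two sign flips, and both fold into work the function does anyway: the sign flips ride along with the multiply by \(1/\left|M\right|\), and the permutation merges into the shuffle that assembles the output rows. The scalar sketch below spells out that store step for a single block. Adjugate2 and AdjugateOnStore are hypothetical names.</p>
</div>

```cpp
#include <array>

// Row-major 2x2 block: | m[0] m[1] |
//                      | m[2] m[3] |
using Mat2 = std::array<float, 4>;

// Textbook 2x2 adjugate: |  m3 -m1 |
//                        | -m2  m0 |
Mat2 Adjugate2(const Mat2& m) { return { m[3], -m[1], -m[2], m[0] }; }

// Store step of GetInverse for one block:
// 1) multiply lanes by (1, -1, -1, 1) * invDet (the rDetM vector shared by all four blocks)
// 2) read lanes (3, 1) for the first row and (2, 0) for the second row,
//    which is what VecShuffle(X_, Y_, 3,1,3,1) and VecShuffle(X_, Y_, 2,0,2,0) pick for X
Mat2 AdjugateOnStore(const Mat2& m, float invDet)
{
    const Mat2 sign = { 1.f, -1.f, -1.f, 1.f };
    Mat2 s{};
    for (int i = 0; i < 4; ++i)
        s[i] = m[i] * sign[i] * invDet;
    return { s[3], s[1], s[2], s[0] };
}
```

<div class="paragraph">
<p>AdjugateOnStore(m, k) equals Adjugate2(m) scaled by k. Since \((X^{\#})^{\#}=X\) for any 2x2 matrix, applying it to the stored X_ recovers the block \(X/\left|M\right|\) of the inverse.</p>
</div>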
<div class="paragraph">
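<p>The last remaining piece is the determinant. The 2x2 determinant is easy; the 4x4 determinant is the hard part. Recall the determinant property described earlier:</p>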
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
\left|M\right|&amp;=\left|AD-BC\right|\\
&amp;=\left|AD\right|+\left|-BC\right|+\operatorname{tr}((AD)^{\#}(-BC))\\
&amp;=\left|A\right|\left|D\right|+\left|B\right|\left|C\right|-\operatorname{tr}(D^{\#}A^{\#}BC)\\
&amp;=\left|A\right|\left|D\right|+\left|B\right|\left|C\right|-\operatorname{tr}((A^{\#}B)(D^{\#}C))
\end{align*}\]
</div>
</div>
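<div class="paragraph">
<p>The 2x2 property used in the second line, \(\left|A+B\right|=\left|A\right|+\left|B\right|+\operatorname{tr}(A^{\#}B)\), is easy to spot-check numerically. A minimal sketch follows; the helper names are hypothetical, and double is used so that small integer inputs stay exact.</p>
</div>

```cpp
#include <array>

using Mat2 = std::array<double, 4>; // row major | m0 m1 |
                                    //           | m2 m3 |

double Det2(const Mat2& m) { return m[0]*m[3] - m[1]*m[2]; }
double Trace2(const Mat2& m) { return m[0] + m[3]; }

Mat2 Add2(const Mat2& a, const Mat2& b)
{
    return { a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3] };
}

// (A#)*B with A# = | A3 -A1 | / | -A2 A0 |
Mat2 AdjMul2(const Mat2& a, const Mat2& b)
{
    return { a[3]*b[0] - a[1]*b[2], a[3]*b[1] - a[1]*b[3],
             a[0]*b[2] - a[2]*b[0], a[0]*b[3] - a[2]*b[1] };
}

// Left and right sides of |A+B| = |A| + |B| + tr(A#B)
double LhsDetSum(const Mat2& a, const Mat2& b) { return Det2(Add2(a, b)); }
double RhsDetSum(const Mat2& a, const Mat2& b)
{
    return Det2(a) + Det2(b) + Trace2(AdjMul2(a, b));
}
```

<div class="paragraph">
<p>For \(A=(1,2,3,4)\) and \(B=(5,6,7,8)\), both sides evaluate to -8.</p>
</div>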
<div class="paragraph">
<p>The matrices \(A^{\#}B\) and \(D^{\#}C\) in this formula have already been computed. Expanding the trace of a 2x2 matrix product gives:</p>
</div>
<div class="stemblock">
<div class="content">
\[\operatorname{tr}(AB)={A_0}{B_0}+{A_1}{B_2}+{A_2}{B_1}+{A_3}{B_3}\]
</div>
</div>
<div class="paragraph">
<p>This takes only a few instructions: a shuffle followed by a dot product.</p>
</div>
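<div class="paragraph">
<p>Putting the pieces together, here is a scalar sketch of the block determinant formula with the four-term trace expansion. It is written for readability, not speed, and Mat2, Mat4, and the function names are hypothetical. In the SIMD version, the (0,2,1,3) swizzle of D#C plus the horizontal add play the role of TraceOfProduct.</p>
</div>

```cpp
#include <array>

using Mat2 = std::array<double, 4>;                // row major | m0 m1 | / | m2 m3 |
using Mat4 = std::array<std::array<double, 4>, 4>; // row major, m[row][col]

double Det2(const Mat2& m) { return m[0]*m[3] - m[1]*m[2]; }

// (A#)*B, expanded directly
Mat2 AdjMul2(const Mat2& a, const Mat2& b)
{
    return { a[3]*b[0] - a[1]*b[2], a[3]*b[1] - a[1]*b[3],
             a[0]*b[2] - a[2]*b[0], a[0]*b[3] - a[2]*b[1] };
}

// tr(XY) = X0*Y0 + X1*Y2 + X2*Y1 + X3*Y3
// (in SIMD: multiply X by Y swizzled as (0,2,1,3), then sum the four lanes)
double TraceOfProduct(const Mat2& x, const Mat2& y)
{
    return x[0]*y[0] + x[1]*y[2] + x[2]*y[1] + x[3]*y[3];
}

// |M| = |A||D| + |B||C| - tr((A#B)(D#C))
double BlockDet4(const Mat4& m)
{
    const Mat2 A = { m[0][0], m[0][1], m[1][0], m[1][1] };
    const Mat2 B = { m[0][2], m[0][3], m[1][2], m[1][3] };
    const Mat2 C = { m[2][0], m[2][1], m[3][0], m[3][1] };
    const Mat2 D = { m[2][2], m[2][3], m[3][2], m[3][3] };
    return Det2(A) * Det2(D) + Det2(B) * Det2(C)
         - TraceOfProduct(AdjMul2(A, B), AdjMul2(D, C));
}
```

<div class="paragraph">
<p>For example, BlockDet4 returns -1 for the permutation matrix \(M'\) used in Appendix 1, and -32 for the matrix with rows (2,0,1,3), (1,1,0,2), (0,3,1,1), (4,1,2,0), where \(CD\neq DC\). That is the same value a cofactor expansion gives.</p>
</div>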
<div class="paragraph">
<p>With all the pieces of the puzzle in place, the 4x4 inverse function looks like this:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">// Inverse function is the same no matter column major or row major
// this version treats it as row major
inline Matrix4 GetInverse(const Matrix4&amp; inM)
{
	// use block matrix method
	// A is a matrix, then i(A) or iA means inverse of A, A# (or A_ in code) means adjugate of A, |A| (or detA in code) is determinant, tr(A) is trace

	// sub matrices
	__m128 A = VecShuffle_0101(inM.mVec[0], inM.mVec[1]);
	__m128 B = VecShuffle_2323(inM.mVec[0], inM.mVec[1]);
	__m128 C = VecShuffle_0101(inM.mVec[2], inM.mVec[3]);
	__m128 D = VecShuffle_2323(inM.mVec[2], inM.mVec[3]);

#if 0
	__m128 detA = _mm_set1_ps(inM.m[0][0] * inM.m[1][1] - inM.m[0][1] * inM.m[1][0]);
	__m128 detB = _mm_set1_ps(inM.m[0][2] * inM.m[1][3] - inM.m[0][3] * inM.m[1][2]);
	__m128 detC = _mm_set1_ps(inM.m[2][0] * inM.m[3][1] - inM.m[2][1] * inM.m[3][0]);
	__m128 detD = _mm_set1_ps(inM.m[2][2] * inM.m[3][3] - inM.m[2][3] * inM.m[3][2]);
#else
	// determinant as (|A| |B| |C| |D|)
	__m128 detSub = _mm_sub_ps(
		_mm_mul_ps(VecShuffle(inM.mVec[0], inM.mVec[2], 0,2,0,2), VecShuffle(inM.mVec[1], inM.mVec[3], 1,3,1,3)),
		_mm_mul_ps(VecShuffle(inM.mVec[0], inM.mVec[2], 1,3,1,3), VecShuffle(inM.mVec[1], inM.mVec[3], 0,2,0,2))
	);
	__m128 detA = VecSwizzle1(detSub, 0);
	__m128 detB = VecSwizzle1(detSub, 1);
	__m128 detC = VecSwizzle1(detSub, 2);
	__m128 detD = VecSwizzle1(detSub, 3);
#endif

	// let iM = 1/|M| * | X  Y |
	//                  | Z  W |

	// D#C
	__m128 D_C = Mat2AdjMul(D, C);
	// A#B
	__m128 A_B = Mat2AdjMul(A, B);
	// X# = |D|A - B(D#C)
	__m128 X_ = _mm_sub_ps(_mm_mul_ps(detD, A), Mat2Mul(B, D_C));
	// W# = |A|D - C(A#B)
	__m128 W_ = _mm_sub_ps(_mm_mul_ps(detA, D), Mat2Mul(C, A_B));

	// |M| = |A|*|D| + ... (continue later)
	__m128 detM = _mm_mul_ps(detA, detD);

	// Y# = |B|C - D(A#B)#
	__m128 Y_ = _mm_sub_ps(_mm_mul_ps(detB, C), Mat2MulAdj(D, A_B));
	// Z# = |C|B - A(D#C)#
	__m128 Z_ = _mm_sub_ps(_mm_mul_ps(detC, B), Mat2MulAdj(A, D_C));

	// |M| = |A|*|D| + |B|*|C| ... (continue later)
	detM = _mm_add_ps(detM, _mm_mul_ps(detB, detC));

	// tr((A#B)(D#C))
	__m128 tr = _mm_mul_ps(A_B, VecSwizzle(D_C, 0,2,1,3));
	tr = _mm_hadd_ps(tr, tr);
	tr = _mm_hadd_ps(tr, tr);
	// |M| = |A|*|D| + |B|*|C| - tr((A#B)(D#C))
	detM = _mm_sub_ps(detM, tr);

	const __m128 adjSignMask = _mm_setr_ps(1.f, -1.f, -1.f, 1.f);
	// (1/|M|, -1/|M|, -1/|M|, 1/|M|)
	__m128 rDetM = _mm_div_ps(adjSignMask, detM);

	X_ = _mm_mul_ps(X_, rDetM);
	Y_ = _mm_mul_ps(Y_, rDetM);
	Z_ = _mm_mul_ps(Z_, rDetM);
	W_ = _mm_mul_ps(W_, rDetM);

	Matrix4 r;

	// apply adjugate and store, here we combine adjugate shuffle and store shuffle
	r.mVec[0] = VecShuffle(X_, Y_, 3,1,3,1);
	r.mVec[1] = VecShuffle(X_, Y_, 2,0,2,0);
	r.mVec[2] = VecShuffle(Z_, W_, 3,1,3,1);
	r.mVec[3] = VecShuffle(Z_, W_, 2,0,2,0);

	return r;
}</code></pre>
</div>
</div>
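<div class="paragraph">
<p>When testing the SIMD code, it helps to have a plain scalar version of the same block formula to compare against. The sketch below mirrors the structure of GetInverse (X_, Y_, Z_, W_ hold the deferred adjugates, as in the comments above), but it is not the article's implementation. The helper names are hypothetical and, like the SIMD version, it does not check for a zero determinant.</p>
</div>

```cpp
#include <array>

using Mat2 = std::array<double, 4>;                // row major | m0 m1 | / | m2 m3 |
using Mat4 = std::array<std::array<double, 4>, 4>; // row major, m[row][col]

double Det2(const Mat2& m) { return m[0]*m[3] - m[1]*m[2]; }
Mat2 Adj2(const Mat2& m) { return { m[3], -m[1], -m[2], m[0] }; }
Mat2 Mul2(const Mat2& a, const Mat2& b)
{
    return { a[0]*b[0] + a[1]*b[2], a[0]*b[1] + a[1]*b[3],
             a[2]*b[0] + a[3]*b[2], a[2]*b[1] + a[3]*b[3] };
}
Mat2 Sub2(const Mat2& a, const Mat2& b) { return { a[0]-b[0], a[1]-b[1], a[2]-b[2], a[3]-b[3] }; }
Mat2 Scale2(double s, const Mat2& m) { return { s*m[0], s*m[1], s*m[2], s*m[3] }; }

// Scalar version of the block formula; assumes |M| != 0 (no check, like the SIMD code)
Mat4 GetInverseScalar(const Mat4& m)
{
    const Mat2 A = { m[0][0], m[0][1], m[1][0], m[1][1] };
    const Mat2 B = { m[0][2], m[0][3], m[1][2], m[1][3] };
    const Mat2 C = { m[2][0], m[2][1], m[3][0], m[3][1] };
    const Mat2 D = { m[2][2], m[2][3], m[3][2], m[3][3] };

    const Mat2 D_C = Mul2(Adj2(D), C); // D#C
    const Mat2 A_B = Mul2(Adj2(A), B); // A#B

    // Deferred adjugates of the four blocks of M#
    const Mat2 X_ = Sub2(Scale2(Det2(D), A), Mul2(B, D_C));       // |D|A - B(D#C)
    const Mat2 Y_ = Sub2(Scale2(Det2(B), C), Mul2(D, Adj2(A_B))); // |B|C - D(A#B)#
    const Mat2 Z_ = Sub2(Scale2(Det2(C), B), Mul2(A, Adj2(D_C))); // |C|B - A(D#C)#
    const Mat2 W_ = Sub2(Scale2(Det2(A), D), Mul2(C, A_B));       // |A|D - C(A#B)

    // |M| = |A||D| + |B||C| - tr((A#B)(D#C))
    const double detM = Det2(A) * Det2(D) + Det2(B) * Det2(C)
        - (A_B[0]*D_C[0] + A_B[1]*D_C[2] + A_B[2]*D_C[1] + A_B[3]*D_C[3]);
    const double rDetM = 1.0 / detM;

    // Apply the deferred adjugate and scale by 1/|M|
    const Mat2 X = Scale2(rDetM, Adj2(X_)), Y = Scale2(rDetM, Adj2(Y_));
    const Mat2 Z = Scale2(rDetM, Adj2(Z_)), W = Scale2(rDetM, Adj2(W_));

    return {{ { X[0], X[1], Y[0], Y[1] },
              { X[2], X[3], Y[2], Y[3] },
              { Z[0], Z[1], W[0], W[1] },
              { Z[2], Z[3], W[2], W[3] } }};
}

// Plain 4x4 product, handy for checking that M * inverse(M) is the identity
Mat4 Mul4(const Mat4& a, const Mat4& b)
{
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}
```

<div class="paragraph">
<p>A typical check is to load the same matrix into both versions and compare the outputs element by element, or to confirm that Mul4(m, GetInverseScalar(m)) is the identity.</p>
</div>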
<div class="paragraph">
<p>As a bonus, this function also shows how to compute the 4x4 determinant and the 4x4 adjugate matrix.</p>
</div>
<div class="paragraph">
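<p>For the determinants of the submatrices, I wrote a version that computes all four at once, but on my CPU it was faster to compute them separately and load each one into a vector register with _mm_set1_ps. I think the reason is that computing them together is no shortcut: the four results afterwards have to be split into separate registers with four shuffles anyway. When you use this in practice, measure both and pick the faster one.</p>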
</div>
<div class="paragraph">
<p>(<strong>Edit</strong>: On a newer CPU (Coffee Lake), computing them together turned out to be 20% faster than computing them separately.)</p>
</div>
<div class="paragraph">
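<p>Also, for the trace I use two _mm_hadd_ps instructions to add up the four lanes of the vector register, which leaves the sum in all four lanes. There are other ways to do this, but in my tests their performance was about the same, so I went with the one that uses the fewest instructions. Again, check the performance before choosing a method.</p>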
</div>
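<div class="paragraph">
<p>For reference, here is one of the alternatives to the two _mm_hadd_ps instructions: a shuffle-and-add reduction that needs only SSE (_mm_hadd_ps requires SSE3) and likewise leaves the sum in all four lanes. It is a sketch for an x86 target, and HorizontalSumBroadcast is a hypothetical name. Whether it beats _mm_hadd_ps depends on the CPU, so measure it as suggested above.</p>
</div>

```cpp
#include <xmmintrin.h> // SSE: _mm_shuffle_ps, _mm_add_ps, _MM_SHUFFLE

// Sum the four lanes of v and broadcast the sum to every lane.
// Note: _MM_SHUFFLE(z, y, x, w) lists source lanes from lane 3 down to lane 0.
inline __m128 HorizontalSumBroadcast(__m128 v)
{
    // v + (v1, v0, v3, v2) = (v0+v1, v0+v1, v2+v3, v2+v3)
    const __m128 t = _mm_add_ps(v, _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1)));
    // t + (t2, t3, t0, t1): every lane now holds (v0+v1) + (v2+v3)
    return _mm_add_ps(t, _mm_shuffle_ps(t, t, _MM_SHUFFLE(1, 0, 3, 2)));
}
```

<div class="paragraph">
<p>Used for the trace, it would replace the two _mm_hadd_ps lines, taking the product of A_B and the swizzled D_C as its input.</p>
</div>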
<div class="paragraph">
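<p>So how does it actually perform? The numbers below were measured in August 2017 on Intel Haswell. Each computation was run 10 million times with cycles counted by the __rdtsc instruction, and each method was tested 5 times and averaged.</p>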
</div>
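<div class="paragraph">
<p>A minimal sketch of such a measurement loop follows. It assumes GCC or Clang (x86intrin.h) or MSVC (intrin.h) on x86, and AverageTicksPerCall is a hypothetical helper, not the harness behind the figure. On CPUs with an invariant TSC, __rdtsc counts ticks at a constant reference rate, which can differ from actual core clock cycles when turbo or power management changes the clock.</p>
</div>

```cpp
#include <cstdint>
#if defined(_MSC_VER)
#include <intrin.h>     // __rdtsc on MSVC
#else
#include <x86intrin.h>  // __rdtsc on GCC / Clang
#endif

// Average TSC ticks per call of fn(i), over `iterations` calls repeated `runs` times
// (the measurements above used 10 million iterations and 5 runs).
template <typename Fn>
double AverageTicksPerCall(Fn fn, int iterations, int runs)
{
    double total = 0.0;
    for (int r = 0; r < runs; ++r)
    {
        const std::uint64_t start = __rdtsc();
        for (int i = 0; i < iterations; ++i)
            fn(i);
        total += static_cast<double>(__rdtsc() - start);
    }
    return total / (static_cast<double>(iterations) * runs);
}
```

<div class="paragraph">
<p>In practice, make the measured function depend on the loop index and write its result somewhere the compiler cannot discard (for example a volatile variable). Otherwise the optimizer may hoist or remove the work being timed.</p>
</div>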
<div class="imageblock" style="text-align: center">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/matrixinverse/fig1.jpg" alt="fig1.jpg" width="600">
</div>
<div class="title">Figure 1</div>
</div>
<div class="paragraph">
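<p>The first three columns are the three functions introduced here. The SIMD version of the general inverse takes less than half (44%) the time of the float version. If the matrix is known to be a transform matrix, that drops below a quarter (21%). The more you know about your input, the less work the machine has to do.</p>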
</div>
<div class="paragraph">
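<p>Finally, consider this question: do you actually need the inverse matrix? If all you want is to apply the inverse transform to a position or direction (and you do not need to keep the inverse matrix for other computations), write an inverse-transform function instead. It is faster than computing the inverse. I hope this article helps you decide which function to use or write, and how to get better performance out of it.</p>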
</div>
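<div class="paragraph">
<p>A minimal sketch of that idea for the common case of a rigid transform (pure rotation plus translation), assuming the column-vector convention \(p_{world}=Rp_{local}+t\). The RigidTransform struct and function names are hypothetical. Because the inverse of an orthonormal \(R\) is \(R^{T}\), the inverse transform is just a subtraction and a transposed multiply, with no matrix inverse at all. With scale, or any non-orthonormal \(R\), this shortcut does not apply as written.</p>
</div>

```cpp
#include <array>

using Vec3 = std::array<float, 3>;

// Hypothetical rigid transform: world = R * local + t (column vectors), R orthonormal (no scale)
struct RigidTransform
{
    float r[3][3]; // rotation, row major: r[row][col]
    Vec3 t;        // translation
};

Vec3 TransformPoint(const RigidTransform& x, const Vec3& p)
{
    Vec3 out{};
    for (int i = 0; i < 3; ++i)
        out[i] = x.r[i][0]*p[0] + x.r[i][1]*p[1] + x.r[i][2]*p[2] + x.t[i];
    return out;
}

// Directions ignore translation: local = R^T * dir (R^T is read from the columns of R)
Vec3 InverseTransformDirection(const RigidTransform& x, const Vec3& v)
{
    Vec3 out{};
    for (int i = 0; i < 3; ++i)
        out[i] = x.r[0][i]*v[0] + x.r[1][i]*v[1] + x.r[2][i]*v[2];
    return out;
}

// Points: local = R^T * (world - t); no 4x4 inverse is ever formed
Vec3 InverseTransformPoint(const RigidTransform& x, const Vec3& p)
{
    return InverseTransformDirection(x, { p[0] - x.t[0], p[1] - x.t[1], p[2] - x.t[2] });
}
```

<div class="paragraph">
<p>For example, with a 90 degree rotation about Z and \(t=(1,2,3)\), TransformPoint maps (1,0,0) to (1,3,3), and InverseTransformPoint maps (1,3,3) back to (1,0,0).</p>
</div>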
</div>
<div class="sect2">
<h3 id="___3">付録その一</h3>
<div class="paragraph">
<p>One task remains: proving that this method holds even without those assumptions. First, recall what we assumed:</p>
</div>
<div class="stemblock">
<div class="content">
\[M=\left( \begin{matrix} A &amp; B \\ C &amp; D \\ \end{matrix} \right)=\left( \begin{matrix} A_0 &amp; A_1 &amp; B_0 &amp; B_1 \\ A_2 &amp; A_3 &amp; B_2 &amp; B_3 \\ C_0 &amp; C_1 &amp; D_0 &amp; D_1 \\ C_2 &amp; C_3 &amp; D_2 &amp; D_3 \\ \end{matrix} \right)\]
</div>
</div>
<div class="paragraph">
<p>We assumed that the submatrices \(A\) and \(D\) are invertible and that \(C\) and \(D\) commute (\(CD=DC\)).</p>
</div>
<div class="paragraph">
<p>Now consider the following example:</p>
</div>
<div class="stemblock">
<div class="content">
\[M'=\left( \begin{matrix} 1 &amp; 0 &amp; 0 &amp; 0 \\ 0 &amp; 0 &amp; 1 &amp; 0 \\ 0 &amp; 1 &amp; 0 &amp; 0 \\ 0 &amp; 0 &amp; 0 &amp; 1 \\ \end{matrix} \right)\]
</div>
</div>
<div class="paragraph">
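<p>None of the conditions we assumed holds here, yet \(M'\) is invertible (it is its own inverse, \((M')^{-1}=M'\)). If you compute the inverse of \(M'\) with the method above, you surprisingly get the correct answer. This is not a coincidence. Below we prove that the computation holds for any invertible 4x4 matrix.</p>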
</div>
<div class="paragraph">
<p>These are the formulas the computation uses:</p>
</div>
<div class="stemblock">
<div class="content">
\[M^{-1}={\left( \begin{matrix} A &amp; B \\ C &amp; D \\ \end{matrix} \right)}^{-1}=\frac{1}{\left|M\right|}{\left( \begin{matrix} (\left|D\right|A-B(D^{\#}C))^{\#} &amp; (\left|B\right|C-D(A^{\#}B)^{\#})^{\#} \\ (\left|C\right|B-A(D^{\#}C)^{\#})^{\#} &amp; (\left|A\right|D-C(A^{\#}B))^{\#} \\ \end{matrix} \right)}\]
</div>
</div>
<div class="stemblock">
<div class="content">
\[\left|M\right|=\left|A\right|\left|D\right|+\left|B\right|\left|C\right|-\operatorname{tr}((A^{\#}B)(D^{\#}C))\]
</div>
</div>
<div class="paragraph">
<p>Since \(M^{-1}=\frac{1}{\left|M\right|}M^{\#}\) for any invertible matrix, we start by proving the following expression for the adjugate:</p>
</div>
<div class="stemblock">
<div class="content">
\[M^{\#}={\left( \begin{matrix} X &amp; Y \\ Z &amp; W \\ \end{matrix} \right)}={\left( \begin{matrix} (\left|D\right|A-B(D^{\#}C))^{\#} &amp; (\left|B\right|C-D(A^{\#}B)^{\#})^{\#} \\ (\left|C\right|B-A(D^{\#}C)^{\#})^{\#} &amp; (\left|A\right|D-C(A^{\#}B))^{\#} \\ \end{matrix} \right)}\]
</div>
</div>
<div class="paragraph">
<p>Let us start with the upper-left block \(X=(\left|D\right|A-B(D^{\#}C))^{\#}\).</p>
</div>
<div class="paragraph">
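<p>The adjugate of \(M\) is the transpose of its cofactor matrix \(C\) (\(M^{\#}=C^{T}\)), where the cofactor matrix is defined as \(C=((-1)^{i+j} M_{ij})\) and \(M_{ij}\) is the determinant of the submatrix obtained by removing row i and column j from \(M\) (the (i,j)-minor). Therefore \(M^{\#}= ((-1)^{j+i}M_{ji})\). Keep the <strong>transpose</strong> in mind.</p>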
</div>
<div class="paragraph">
<p>For details, see the English Wikipedia article on the adjugate matrix.</p>
</div>
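<div class="paragraph">
<p>The definition translates directly into a brute-force reference, which is useful for spot-checking the block formulas. This sketch (function names hypothetical) computes each entry of \(M^{\#}\) from a 3x3 minor. Note the swapped indices in Minor(m, j, i): that is the transpose in the definition.</p>
</div>

```cpp
#include <array>

using Mat4 = std::array<std::array<double, 4>, 4>; // row major, m[row][col]

// Determinant of the 3x3 submatrix of m with row skipRow and column skipCol removed
double Minor(const Mat4& m, int skipRow, int skipCol)
{
    double s[3][3];
    for (int i = 0, si = 0; i < 4; ++i)
    {
        if (i == skipRow) continue;
        for (int j = 0, sj = 0; j < 4; ++j)
        {
            if (j == skipCol) continue;
            s[si][sj++] = m[i][j];
        }
        ++si;
    }
    return s[0][0] * (s[1][1]*s[2][2] - s[1][2]*s[2][1])
         - s[0][1] * (s[1][0]*s[2][2] - s[1][2]*s[2][0])
         + s[0][2] * (s[1][0]*s[2][1] - s[1][1]*s[2][0]);
}

// Adjugate by definition: M#[i][j] = (-1)^(i+j) * M_ji (minor taken at (j, i))
Mat4 AdjugateByCofactors(const Mat4& m)
{
    Mat4 adj{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            adj[i][j] = ((i + j) % 2 == 0 ? 1.0 : -1.0) * Minor(m, j, i);
    return adj;
}
```

<div class="paragraph">
<p>For the permutation matrix \(M'\) above, it returns \(-M'\), consistent with \(\left|M'\right|=-1\) and \((M')^{-1}=M'\).</p>
</div>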
<div class="stemblock">
<div class="content">
\[\begin{align*}
X&amp;={\left( \begin{matrix} \left| \begin{matrix} A_3 &amp; B_2 &amp; B_3 \\ C_1 &amp; D_0 &amp; D_1 \\ C_3 &amp; D_2 &amp; D_3 \end{matrix} \right| &amp; -\left| \begin{matrix} A_1 &amp; B_0 &amp; B_1 \\ C_1 &amp; D_0 &amp; D_1 \\ C_3 &amp; D_2 &amp; D_3 \end{matrix} \right| \\ -\left| \begin{matrix} A_2 &amp; B_2 &amp; B_3 \\ C_0 &amp; D_0 &amp; D_1 \\ C_2 &amp; D_2 &amp; D_3 \end{matrix} \right| &amp; \left| \begin{matrix} A_0 &amp; B_0 &amp; B_1 \\ C_0 &amp; D_0 &amp; D_1 \\ C_2 &amp; D_2 &amp; D_3 \end{matrix} \right| \\ \end{matrix} \right)}\\
&amp;={\left( \begin{matrix} A_3\left|D\right|-B_2(D_3C_1-D_1C_3) + B_3(D_2C_1-D_0C_3) &amp; -(A_1\left|D\right|-B_0(D_3C_1-D_1C_3) + B_1(D_2C_1-D_0C_3)) \\ -(A_2\left|D\right|-B_2(D_3C_0-D_1C_2) + B_3(D_2C_0-D_0C_2)) &amp; A_0\left|D\right|-B_0(D_3C_0-D_1C_2) + B_1(D_2C_0-D_0C_2) \\ \end{matrix} \right)}
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Using the result</p>
</div>
<div class="stemblock">
<div class="content">
\[D^{\#}C={\left( \begin{matrix}{} {D_3}{C_0}-{D_1}{C_2} &amp;{D_3}{C_1}-{D_1}{C_3} \\ {D_0}{C_2}-{D_2}{C_0} &amp; {D_0}{C_3}-{D_2}{C_1} \\ \end{matrix} \right)}\]
</div>
</div>
<div class="paragraph">
<p>we get</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
X&amp;={\left( \begin{matrix} A_3\left|D\right|-B_2{(D^{\#}C)}_1 - B_3{(D^{\#}C)}_3 &amp; -(A_1\left|D\right|-B_0{(D^{\#}C)}_1 - B_1{(D^{\#}C)}_3) \\ -(A_2\left|D\right|-B_2{(D^{\#}C)}_0 - B_3{(D^{\#}C)}_2) &amp; A_0\left|D\right|-B_0{(D^{\#}C)}_0 - B_1{(D^{\#}C)}_2 \\ \end{matrix} \right)} \\
&amp;={\left( \begin{matrix} A_0\left|D\right|-B_0{(D^{\#}C)}_0 - B_1{(D^{\#}C)}_2  &amp; A_1\left|D\right|-B_0{(D^{\#}C)}_1 - B_1{(D^{\#}C)}_3 \\ A_2\left|D\right|-B_2{(D^{\#}C)}_0 - B_3{(D^{\#}C)}_2 &amp; A_3\left|D\right|-B_2{(D^{\#}C)}_1 - B_3{(D^{\#}C)}_3 \\ \end{matrix} \right)}^{\#} \\
&amp;=(\left|D\right|A-B(D^{\#}C))^{\#}
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>The other blocks \(Y\), \(Z\), and \(W\) can be proved in the same way.</p>
</div>
<div class="paragraph">
<p>Next, we prove the determinant formula:</p>
</div>
<div class="stemblock">
<div class="content">
\[\left|M\right|=\left|A\right|\left|D\right|+\left|B\right|\left|C\right|-\operatorname{tr}((A^{\#}B)(D^{\#}C))\]
</div>
</div>
<div class="paragraph">
<p>Again, start from the left-hand side, using cofactor expansion along the first row:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
\left|M\right|&amp;=A_0 \left| \begin{matrix} A_3 &amp; B_2 &amp; B_3 \\ C_1 &amp; D_0 &amp; D_1 \\ C_3 &amp; D_2 &amp; D_3 \end{matrix} \right| - A_1 \left| \begin{matrix} A_2 &amp; B_2 &amp; B_3 \\ C_0 &amp; D_0 &amp; D_1 \\ C_2 &amp; D_2 &amp; D_3 \end{matrix} \right| + B_0 \left| \begin{matrix} A_2 &amp; A_3 &amp; B_3 \\ C_0 &amp; C_1 &amp; D_1 \\ C_2 &amp; C_3 &amp; D_3 \end{matrix} \right| - B_1 \left| \begin{matrix} A_2 &amp; A_3 &amp; B_2 \\ C_0 &amp; C_1 &amp; D_0 \\ C_2 &amp; C_3 &amp; D_2 \end{matrix} \right| \\
&amp;= A_0(A_3\left|D\right|-B_2(D_3C_1-D_1C_3) + B_3(D_2C_1-D_0C_3)) - A_1(A_2\left|D\right|-B_2(D_3C_0-D_1C_2) + B_3(D_2C_0-D_0C_2)) \\
&amp;+B_0(B_3\left|C\right|+A_2(D_3C_1-D_1C_3) - A_3(D_3C_0-D_1C_2)) - B_1(B_2\left|C\right|+A_2(D_2C_1-D_0C_3) - A_3(D_2C_0-D_0C_2)) \\
&amp;= \left|A\right|\left|D\right| + \left|B\right|\left|C\right|  \\
&amp;- ({A_3}{B_0}-{A_1}{B_2})({D_3}{C_0}-{D_1}{C_2}) - ({A_3}{B_1}-{A_1}{B_3})({D_0}{C_2}-{D_2}{C_0}) \\
&amp;- ({A_0}{B_2}-{A_2}{B_0})({D_3}{C_1}-{D_1}{C_3}) - ({A_0}{B_3}-{A_2}{B_1})({D_0}{C_3}-{D_2}{C_1})
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Using the results</p>
</div>
<div class="stemblock">
<div class="content">
\[A^{\#}B={\left( \begin{matrix}{} {A_3}{B_0}-{A_1}{B_2} &amp;{A_3}{B_1}-{A_1}{B_3} \\ {A_0}{B_2}-{A_2}{B_0} &amp; {A_0}{B_3}-{A_2}{B_1} \\ \end{matrix} \right)}\]
</div>
</div>
<div class="stemblock">
<div class="content">
\[D^{\#}C={\left( \begin{matrix}{} {D_3}{C_0}-{D_1}{C_2} &amp;{D_3}{C_1}-{D_1}{C_3} \\ {D_0}{C_2}-{D_2}{C_0} &amp; {D_0}{C_3}-{D_2}{C_1} \\ \end{matrix} \right)}\]
</div>
</div>
<div class="paragraph">
<p>we get</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
\left|M\right|&amp;= \left|A\right|\left|D\right| + \left|B\right|\left|C\right|- ({(A^{\#}B)}_0{(D^{\#}C)}_0 + {(A^{\#}B)}_1{(D^{\#}C)}_2 + {(A^{\#}B)}_2{(D^{\#}C)}_1 + {(A^{\#}B)}_3{(D^{\#}C)}_3) \\
&amp;=\left|A\right|\left|D\right|+\left|B\right|\left|C\right|-\operatorname{tr}((A^{\#}B)(D^{\#}C))
\end{align*}\]
</div>
</div>
<div class="paragraph">
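<p>This proves that the computation holds for any invertible 4x4 matrix. As for why, I believe it comes down to special properties of 2x2 matrices. I also suspect there is a simpler proof, so if you know one, I would love to hear about it.</p>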
</div>
</div>
<div class="sect2">
<h3 id="___4">付録その二</h3>
<div class="paragraph">
<p>Here is the column-major version. The first two functions, for transform matrices, are exactly the same, so only the function for general matrices is listed here.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">// for column major matrix
// we use __m128 to represent 2x2 matrix as A = | A0  A2 |
//                                              | A1  A3 |
// 2x2 column major Matrix multiply A*B
__forceinline __m128 Mat2Mul(__m128 vec1, __m128 vec2)
{
	return
		_mm_add_ps(_mm_mul_ps(                     vec1, VecSwizzle(vec2, 0,0,3,3)),
		           _mm_mul_ps(VecSwizzle(vec1, 2,3,0,1), VecSwizzle(vec2, 1,1,2,2)));
}
// 2x2 column major Matrix adjugate multiply (A#)*B
__forceinline __m128 Mat2AdjMul(__m128 vec1, __m128 vec2)
{
	return
		_mm_sub_ps(_mm_mul_ps(VecSwizzle(vec1, 3,0,3,0), vec2),
		           _mm_mul_ps(VecSwizzle(vec1, 2,1,2,1), VecSwizzle(vec2, 1,0,3,2)));

}
// 2x2 column major Matrix multiply adjugate A*(B#)
__forceinline __m128 Mat2MulAdj(__m128 vec1, __m128 vec2)
{
	return
		_mm_sub_ps(_mm_mul_ps(                     vec1, VecSwizzle(vec2, 3,3,0,0)),
		           _mm_mul_ps(VecSwizzle(vec1, 2,3,0,1), VecSwizzle(vec2, 1,1,2,2)));
}

// Inverse function is the same no matter column major or row major
// this version treats it as column major
inline Matrix4 GetInverse(const Matrix4&amp; inM)
{
	// use block matrix method
	// A is a matrix, then i(A) or iA means inverse of A, A# (or A_ in code) means adjugate of A, |A| (or detA in code) is determinant, tr(A) is trace

	// sub matrices
	__m128 A = VecShuffle_0101(inM.mVec[0], inM.mVec[1]);
	__m128 C = VecShuffle_2323(inM.mVec[0], inM.mVec[1]);
	__m128 B = VecShuffle_0101(inM.mVec[2], inM.mVec[3]);
	__m128 D = VecShuffle_2323(inM.mVec[2], inM.mVec[3]);

#if 0
	__m128 detA = _mm_set1_ps(inM.m[0][0] * inM.m[1][1] - inM.m[0][1] * inM.m[1][0]);
	__m128 detC = _mm_set1_ps(inM.m[0][2] * inM.m[1][3] - inM.m[0][3] * inM.m[1][2]);
	__m128 detB = _mm_set1_ps(inM.m[2][0] * inM.m[3][1] - inM.m[2][1] * inM.m[3][0]);
	__m128 detD = _mm_set1_ps(inM.m[2][2] * inM.m[3][3] - inM.m[2][3] * inM.m[3][2]);
#else
	// determinant as (|A| |C| |B| |D|)
	__m128 detSub = _mm_sub_ps(
		_mm_mul_ps(VecShuffle(inM.mVec[0], inM.mVec[2], 0,2,0,2), VecShuffle(inM.mVec[1], inM.mVec[3], 1,3,1,3)),
		_mm_mul_ps(VecShuffle(inM.mVec[0], inM.mVec[2], 1,3,1,3), VecShuffle(inM.mVec[1], inM.mVec[3], 0,2,0,2))
		);
	__m128 detA = VecSwizzle1(detSub, 0);
	__m128 detC = VecSwizzle1(detSub, 1);
	__m128 detB = VecSwizzle1(detSub, 2);
	__m128 detD = VecSwizzle1(detSub, 3);
#endif

	// let iM = 1/|M| * | X  Y |
	//                  | Z  W |

	// D#C
	__m128 D_C = Mat2AdjMul(D, C);
	// A#B
	__m128 A_B = Mat2AdjMul(A, B);
	// X# = |D|A - B(D#C)
	__m128 X_ = _mm_sub_ps(_mm_mul_ps(detD, A), Mat2Mul(B, D_C));
	// W# = |A|D - C(A#B)
	__m128 W_ = _mm_sub_ps(_mm_mul_ps(detA, D), Mat2Mul(C, A_B));

	// |M| = |A|*|D| + ... (continue later)
	__m128 detM = _mm_mul_ps(detA, detD);

	// Y# = |B|C - D(A#B)#
	__m128 Y_ = _mm_sub_ps(_mm_mul_ps(detB, C), Mat2MulAdj(D, A_B));
	// Z# = |C|B - A(D#C)#
	__m128 Z_ = _mm_sub_ps(_mm_mul_ps(detC, B), Mat2MulAdj(A, D_C));

	// |M| = |A|*|D| + |B|*|C| ... (continue later)
	detM = _mm_add_ps(detM, _mm_mul_ps(detB, detC));

	// tr((A#B)(D#C))
	__m128 tr = _mm_mul_ps(A_B, VecSwizzle(D_C, 0,2,1,3));
	tr = _mm_hadd_ps(tr, tr);
	tr = _mm_hadd_ps(tr, tr);
	// |M| = |A|*|D| + |B|*|C| - tr((A#B)(D#C))
	detM = _mm_sub_ps(detM, tr);

	const __m128 adjSignMask = _mm_setr_ps(1.f, -1.f, -1.f, 1.f);
	// (1/|M|, -1/|M|, -1/|M|, 1/|M|)
	__m128 rDetM = _mm_div_ps(adjSignMask, detM);

	X_ = _mm_mul_ps(X_, rDetM);
	Y_ = _mm_mul_ps(Y_, rDetM);
	Z_ = _mm_mul_ps(Z_, rDetM);
	W_ = _mm_mul_ps(W_, rDetM);

	Matrix4 r;

	// apply adjugate and store, here we combine adjugate shuffle and store shuffle
	r.mVec[0] = VecShuffle(X_, Z_, 3,1,3,1);
	r.mVec[1] = VecShuffle(X_, Z_, 2,0,2,0);
	r.mVec[2] = VecShuffle(Y_, W_, 3,1,3,1);
	r.mVec[3] = VecShuffle(Y_, W_, 2,0,2,0);

	return r;
}</code></pre>
</div>
</div>
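<div class="paragraph">
<p>The comment that the inverse is the same for both layouts follows from \((M^{T})^{-1}=(M^{-1})^{T}\). A column-major array read as row major is \(M^{T}\), so a row-major inverse routine applied to it returns \((M^{-1})^{T}\) in row-major order, which is exactly \(M^{-1}\) in column-major order. Here is a small 2x2 sketch of that fact (Inverse2RowMajor is a hypothetical helper):</p>
</div>

```cpp
#include <array>

using Mat2 = std::array<double, 4>;

// Row-major 2x2 inverse: | m0 m1 |^-1 = 1/det * |  m3 -m1 |
//                        | m2 m3 |              | -m2  m0 |
Mat2 Inverse2RowMajor(const Mat2& m)
{
    const double r = 1.0 / (m[0]*m[3] - m[1]*m[2]);
    return { m[3]*r, -m[1]*r, -m[2]*r, m[0]*r };
}
```

<div class="paragraph">
<p>For \(M=\left( \begin{matrix} 1 &amp; 2 \\ 3 &amp; 4 \\ \end{matrix} \right)\), the row-major input (1, 2, 3, 4) gives (-2, 1, 1.5, -0.5), and the column-major input (1, 3, 2, 4) gives (-2, 1.5, 1, -0.5), which is the same inverse stored column major. The same argument holds at 4x4, so the row-major GetInverse also works on column-major data; the version above just names the blocks and 2x2 layouts in column-major terms.</p>
</div>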
</div>]]></description><link>https://lxjk.github.io/2020/02/07/Fast-4x4-Matrix-Inverse-with-SSE-SIMD-Explained-JP.html</link><guid isPermaLink="true">https://lxjk.github.io/2020/02/07/Fast-4x4-Matrix-Inverse-with-SSE-SIMD-Explained-JP.html</guid><category><![CDATA[Math]]></category><category><![CDATA[SSE]]></category><category><![CDATA[Japanese]]></category><dc:creator><![CDATA[Eric Zhang]]></dc:creator><pubDate>Fri, 07 Feb 2020 00:00:00 GMT</pubDate></item><item><title><![CDATA[How to Make Tools in UE4]]></title><description><![CDATA[<div id="preamble">
<div class="sectionbody">
<div class="paragraph">
<p><strong><em>This article is based on Unreal 4.17 code base, tested in Unreal 4.23.</em></strong></p>
</div>
<div class="paragraph">
<p>This is a step-by-step tutorial on writing tools for your Unreal project. I assume you are already familiar with Unreal. This is NOT a tutorial on SLATE code; that deserves a tutorial of its own, and Unreal already contains plenty of SLATE examples. That said, this tutorial does include some basic SLATE code to build UI widgets, and I will try to show a different use case in each example.</p>
</div>
<div class="paragraph">
<p>The example project is available at <a href="https://github.com/lxjk/ToolExample" class="bare">https://github.com/lxjk/ToolExample</a> . Right-click "<strong>ToolExample.uproject</strong>" and choose "Switch Unreal Engine version" to associate it with your engine.</p>
</div>
<div id="toc" class="toc">
<div id="toctitle" class="title">Table of Contents</div>
<ul class="sectlevel1">
<li><a href="#_setup_editor_module">Setup Editor Module</a>
<ul class="sectlevel2">
<li><a href="#_iexamplemoduleinterface_h">IExampleModuleInterface.h</a></li>
<li><a href="#_toolexmampleeditor_build_cs">ToolExmampleEditor.Build.cs</a></li>
<li><a href="#_toolexampleeditor_h_toolexampleeditor_cpp">ToolExampleEditor.h &amp; ToolExampleEditor.cpp</a></li>
<li><a href="#_toolexampleeditor_target_cs">ToolExampleEditor.Target.cs</a></li>
<li><a href="#_toolexample_uproject">ToolExample.uproject</a></li>
</ul>
</li>
<li><a href="#_add_custom_menu">Add Custom Menu</a></li>
<li><a href="#_advanced_menu">Advanced Menu</a></li>
<li><a href="#_create_a_tab_window">Create a Tab (Window)</a></li>
<li><a href="#_customize_details_panel">Customize Details Panel</a></li>
<li><a href="#_custom_data_type">Custom Data Type</a>
<ul class="sectlevel2">
<li><a href="#_new_custom_data">New Custom Data</a></li>
<li><a href="#_import_custom_data">Import Custom Data</a></li>
<li><a href="#_reimport">Reimport</a></li>
</ul>
</li>
<li><a href="#_custom_editor_mode">Custom Editor Mode</a>
<ul class="sectlevel2">
<li><a href="#_setup_editor_mode">Setup Editor Mode</a></li>
<li><a href="#_render_and_click">Render and Click</a></li>
<li><a href="#_use_transform_widget">Use Transform Widget</a></li>
<li><a href="#_key_input_support_right_click_menu_and_others">Key input support, right click menu, and others</a></li>
</ul>
</li>
<li><a href="#_custom_project_settings">Custom Project Settings</a></li>
<li><a href="#_tricks">Tricks</a>
<ul class="sectlevel2">
<li><a href="#_use_widget_reflector">Use Widget Reflector</a></li>
<li><a href="#_is_my_tool_running_in_the_editor_or_game">Is my tool running in the editor or game?</a></li>
<li><a href="#_useful_uproperty_meta_marker">Useful UPROPERTY() meta marker</a></li>
<li><a href="#_make_custom_animation_blueprint_node">Make custom Animation Blueprint Node</a></li>
<li><a href="#_debug_draw_tricks">Debug Draw Tricks</a></li>
<li><a href="#_other_tricks_for_editor_mode">Other Tricks for Editor Mode</a></li>
</ul>
</li>
</ul>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_setup_editor_module">Setup Editor Module</h2>
<div class="sectionbody">
<div class="paragraph">
<p>To make proper tools in Unreal, it is almost essential to set up a custom editor module first. This gives you an entry point for your custom tools and ensures your tool code is only included when running in the editor.</p>
</div>
<div class="paragraph">
<p>Here we create a new ToolExample project.</p>
</div>
<div class="paragraph">
<p>First we want to create a "ToolExampleEditor" folder and add the following files. This will be our new editor module.</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/001.png" alt="001.png" width="331">
</div>
</div>
<div class="sect2">
<h3 id="_iexamplemoduleinterface_h">IExampleModuleInterface.h</h3>
<div class="paragraph">
<p>In this header, we first define <strong>IExampleModuleListenerInterface</strong>, a convenient interface for receiving events when our module starts up or shuts down. Almost all of our later tools will implement this interface.</p>
</div>
<div class="paragraph">
<p>Then we define <strong>IExampleModuleInterface</strong>. This is not strictly necessary if you only have one editor module, but if you have more than one, it handles event broadcasting for you.
A module is required to inherit from <strong>IModuleInterface</strong>, so our interface inherits from the same class.</p>
</div>
<div class="listingblock">
<div class="title">IExampleModuleInterface.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include "ModuleManager.h"

class IExampleModuleListenerInterface
{
public:
    virtual void OnStartupModule() {};
    virtual void OnShutdownModule() {};
};

class IExampleModuleInterface : public IModuleInterface
{
public:
    void StartupModule() override
    {
        if (!IsRunningCommandlet())
        {
            AddModuleListeners();
            for (int32 i = 0; i &lt; ModuleListeners.Num(); ++i)
            {
                ModuleListeners[i]-&gt;OnStartupModule();
            }
        }
    }

    void ShutdownModule() override
    {
        for (int32 i = 0; i &lt; ModuleListeners.Num(); ++i)
        {
            ModuleListeners[i]-&gt;OnShutdownModule();
        }
    }

    virtual void AddModuleListeners() {};

protected:
    TArray&lt;TSharedRef&lt;IExampleModuleListenerInterface&gt;&gt; ModuleListeners;
};</code></pre>
</div>
</div>
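<div class="paragraph">
<p>Stripped of Unreal types, the listener part is a plain observer pattern: the derived module fills ModuleListeners in AddModuleListeners(), and the base class forwards startup and shutdown to every listener. The standard C++ sketch below shows that flow in isolation. It uses std::shared_ptr in place of TSharedRef, omits the IsRunningCommandlet() check, and all class names in it are hypothetical.</p>
</div>

```cpp
#include <memory>
#include <vector>

// Plain C++ analogue of IExampleModuleListenerInterface
class ModuleListener
{
public:
    virtual ~ModuleListener() = default;
    virtual void OnStartupModule() {}
    virtual void OnShutdownModule() {}
};

// Analogue of IExampleModuleInterface: derived modules register listeners,
// the base class broadcasts startup and shutdown to all of them
class ExampleModule
{
public:
    virtual ~ExampleModule() = default;

    void StartupModule()
    {
        AddModuleListeners();
        for (auto& listener : ModuleListeners)
            listener->OnStartupModule();
    }

    void ShutdownModule()
    {
        for (auto& listener : ModuleListeners)
            listener->OnShutdownModule();
    }

    virtual void AddModuleListeners() {}

protected:
    std::vector<std::shared_ptr<ModuleListener>> ModuleListeners;
};

// Hypothetical tool and module used only to exercise the pattern
struct CountingTool : ModuleListener
{
    int started = 0;
    int stopped = 0;
    void OnStartupModule() override { ++started; }
    void OnShutdownModule() override { ++stopped; }
};

struct EditorModule : ExampleModule
{
    std::shared_ptr<CountingTool> tool = std::make_shared<CountingTool>();
    void AddModuleListeners() override { ModuleListeners.push_back(tool); }
};
```

<div class="paragraph">
<p>Calling StartupModule() and then ShutdownModule() on an EditorModule invokes each callback of its CountingTool exactly once, which is the same sequence the Unreal module goes through.</p>
</div>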
</div>
<div class="sect2">
<h3 id="_toolexmampleeditor_build_cs">ToolExmampleEditor.Build.cs</h3>
<div class="paragraph">
<p>You can copy this file from ToolExample.Build.cs. We add commonly used modules to the dependency lists. Note that we add the "ToolExample" module here as well.</p>
</div>
<div class="listingblock">
<div class="title">ToolExmampleEditor.Build.cs</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">PublicDependencyModuleNames.AddRange(
            new string[] {
                "Core",
                "Engine",
                "CoreUObject",
                "InputCore",
                "LevelEditor",
                "Slate",
                "EditorStyle",
                "AssetTools",
                "EditorWidgets",
                "UnrealEd",
                "BlueprintGraph",
                "AnimGraph",
                "ComponentVisualizers",
                "ToolExample"
        }
        );


PrivateDependencyModuleNames.AddRange(
            new string[]
            {
                "Core",
                "CoreUObject",
                "Engine",
                "AppFramework",
                "SlateCore",
                "AnimGraph",
                "UnrealEd",
                "KismetWidgets",
                "MainFrame",
                "PropertyEditor",
                "ComponentVisualizers",
                "ToolExample"
            }
            );</code></pre>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_toolexampleeditor_h_toolexampleeditor_cpp">ToolExampleEditor.h &amp; ToolExampleEditor.cpp</h3>
<div class="paragraph">
<p>Here we define the actual module class, which implements the <strong>IExampleModuleInterface</strong> defined above. We also include the headers needed by the following sections. Make sure the module name you use to get the module (in <strong>Get</strong> and <strong>IsAvailable</strong>) is the same as the one you pass to the <strong>IMPLEMENT_GAME_MODULE</strong> macro.</p>
</div>
<div class="listingblock">
<div class="title">ToolExampleEditor.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include "UnrealEd.h"
#include "SlateBasics.h"
#include "SlateExtras.h"
#include "Editor/LevelEditor/Public/LevelEditor.h"
#include "Editor/PropertyEditor/Public/PropertyEditing.h"
#include "IAssetTypeActions.h"
#include "IExampleModuleInterface.h"

class FToolExampleEditor : public IExampleModuleInterface
{
public:
    /** IModuleInterface implementation */
    virtual void StartupModule() override;
    virtual void ShutdownModule() override;

    virtual void AddModuleListeners() override;

    static inline FToolExampleEditor&amp; Get()
    {
        return FModuleManager::LoadModuleChecked&lt; FToolExampleEditor &gt;("ToolExampleEditor");
    }

    static inline bool IsAvailable()
    {
        return FModuleManager::Get().IsModuleLoaded("ToolExampleEditor");
    }
};</code></pre>
</div>
</div>
<div class="listingblock">
<div class="title">ToolExampleEditor.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include "ToolExampleEditor.h"
#include "IExampleModuleInterface.h"

IMPLEMENT_GAME_MODULE(FToolExampleEditor, ToolExampleEditor)

void FToolExampleEditor::AddModuleListeners()
{
    // add tools later
}

void FToolExampleEditor::StartupModule()
{
    IExampleModuleInterface::StartupModule();
}

void FToolExampleEditor::ShutdownModule()
{
    IExampleModuleInterface::ShutdownModule();
}</code></pre>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_toolexampleeditor_target_cs">ToolExampleEditor.Target.cs</h3>
<div class="paragraph">
<p>We need to modify this file so that our module is loaded in the editor (don&#8217;t change ToolExample.Target.cs). Add the following:</p>
</div>
<div class="listingblock">
<div class="title">ToolExampleEditor.Target.cs</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">ExtraModuleNames.AddRange( new string[] { "ToolExampleEditor" });</code></pre>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_toolexample_uproject">ToolExample.uproject</h3>
<div class="paragraph">
<p>Similarly, we need to register our module in the project file. Add the following entry to the "Modules" array:</p>
</div>
<div class="listingblock">
<div class="title">ToolExample.uproject</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">{
    "Name": "ToolExampleEditor",
    "Type": "Editor",
    "LoadingPhase": "PostEngineInit",
    "AdditionalDependencies": [
        "Engine"
    ]
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Now the editor module should be set up properly.</p>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_add_custom_menu">Add Custom Menu</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Next we are going to add a custom menu, so we can add widgets to it that run a command or open a window.</p>
</div>
<div class="paragraph">
<p>First we need to add menu-extension functions to our editor module <strong>ToolExampleEditor</strong>:</p>
</div>
<div class="listingblock">
<div class="title">ToolExampleEditor.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">public:
    void AddMenuExtension(const FMenuExtensionDelegate &amp;extensionDelegate, FName extensionHook, const TSharedPtr&lt;FUICommandList&gt; &amp;CommandList = NULL, EExtensionHook::Position position = EExtensionHook::Before);
    TSharedRef&lt;FWorkspaceItem&gt; GetMenuRoot() { return MenuRoot; };

protected:
    TSharedPtr&lt;FExtensibilityManager&gt; LevelEditorMenuExtensibilityManager;
    TSharedPtr&lt;FExtender&gt; MenuExtender;

    static TSharedRef&lt;FWorkspaceItem&gt; MenuRoot;

    void MakePulldownMenu(FMenuBarBuilder &amp;menuBuilder);
    void FillPulldownMenu(FMenuBuilder &amp;menuBuilder);</code></pre>
</div>
</div>
<div class="paragraph">
<p>In the cpp file, define <strong>MenuRoot</strong> and implement all the functions. Here we add a menu called "Example" and create 2 sections, "Section 1" and "Section 2", with extension hook names "Section_1" and "Section_2".</p>
</div>
<div class="listingblock">
<div class="title">ToolExampleEditor.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">TSharedRef&lt;FWorkspaceItem&gt; FToolExampleEditor::MenuRoot = FWorkspaceItem::NewGroup(FText::FromString("Menu Root"));


void FToolExampleEditor::AddMenuExtension(const FMenuExtensionDelegate &amp;extensionDelegate, FName extensionHook, const TSharedPtr&lt;FUICommandList&gt; &amp;CommandList, EExtensionHook::Position position)
{
    MenuExtender-&gt;AddMenuExtension(extensionHook, position, CommandList, extensionDelegate);
}

void FToolExampleEditor::MakePulldownMenu(FMenuBarBuilder &amp;menuBuilder)
{
    menuBuilder.AddPullDownMenu(
        FText::FromString("Example"),
        FText::FromString("Open the Example menu"),
        FNewMenuDelegate::CreateRaw(this, &amp;FToolExampleEditor::FillPulldownMenu),
        "Example",
        FName(TEXT("ExampleMenu"))
    );
}

void FToolExampleEditor::FillPulldownMenu(FMenuBuilder &amp;menuBuilder)
{
    // just a frame for tools to fill in
    menuBuilder.BeginSection("ExampleSection", FText::FromString("Section 1"));
    menuBuilder.AddMenuSeparator(FName("Section_1"));
    menuBuilder.EndSection();

    menuBuilder.BeginSection("ExampleSection", FText::FromString("Section 2"));
    menuBuilder.AddMenuSeparator(FName("Section_2"));
    menuBuilder.EndSection();
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Finally, in <strong>StartupModule</strong> we add the following before calling the parent function. We add our menu after the "Window" menu.</p>
</div>
<div class="listingblock">
<div class="title">ToolExampleEditor.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">void FToolExampleEditor::StartupModule()
{
    if (!IsRunningCommandlet())
    {
        FLevelEditorModule&amp; LevelEditorModule = FModuleManager::LoadModuleChecked&lt;FLevelEditorModule&gt;("LevelEditor");
        LevelEditorMenuExtensibilityManager = LevelEditorModule.GetMenuExtensibilityManager();
        MenuExtender = MakeShareable(new FExtender);
        MenuExtender-&gt;AddMenuBarExtension("Window", EExtensionHook::After, NULL, FMenuBarExtensionDelegate::CreateRaw(this, &amp;FToolExampleEditor::MakePulldownMenu));
        LevelEditorMenuExtensibilityManager-&gt;AddExtender(MenuExtender);
    }
    IExampleModuleInterface::StartupModule();
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Now if you run the editor, you should see the custom menu with its two sections.</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/002.png" alt="002.png" width="329">
</div>
</div>
<div class="paragraph">
<p>Next we add our first tool and register it with our menu. First, add two new files:</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/003.png" alt="003.png" width="190">
</div>
</div>
<div class="paragraph">
<p>This class inherits from <strong>IExampleModuleListenerInterface</strong>, and we add a function to create the menu entry. We also add an <strong>FUICommandList</strong>, which maps each menu command to the function it executes. Finally, we add our only menu function, <strong>MenuCommand1</strong>, which is called when the user clicks the menu item.</p>
</div>
<div class="listingblock">
<div class="title">MenuTool.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include "ToolExampleEditor/IExampleModuleInterface.h"

class MenuTool : public IExampleModuleListenerInterface, public TSharedFromThis&lt;MenuTool&gt;
{
public:
    virtual ~MenuTool() {}

    virtual void OnStartupModule() override;
    virtual void OnShutdownModule() override;

    void MakeMenuEntry(FMenuBuilder &amp;menuBuilder);

protected:
    TSharedPtr&lt;FUICommandList&gt; CommandList;

    void MapCommands();

    // UI Command functions
    void MenuCommand1();
};</code></pre>
</div>
</div>
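<div class="paragraph">
<p>The idea behind <strong>FUICommandList</strong> is that each command is described once (label, tooltip, input gesture), and a command list binds it to an execute action plus an optional "can execute" check. The standard C++ sketch below models only that binding step. <strong>CommandList</strong>, <strong>MapAction</strong> and <strong>TryExecute</strong> here are hypothetical stand-ins, not the engine classes. Treating an empty check as "always allowed" mirrors how the example passes a default <strong>FCanExecuteAction()</strong>, but that equivalence is an assumption.</p>
</div>

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical, engine-free model of a UI command list.
class CommandList {
public:
    using Action = std::function<void()>;
    using CanExecute = std::function<bool()>;

    // Bind a command id to an action; an empty CanExecute means "always allowed".
    void MapAction(const std::string& CommandId, Action Execute, CanExecute Can = {}) {
        Bindings[CommandId] = Binding{std::move(Execute), std::move(Can)};
    }

    // Run the bound action if the command is mapped and currently allowed.
    bool TryExecute(const std::string& CommandId) const {
        auto It = Bindings.find(CommandId);
        if (It == Bindings.end()) {
            return false;
        }
        const Binding& B = It->second;
        if (B.Can && !B.Can()) {
            return false;
        }
        B.Execute();
        return true;
    }

private:
    struct Binding {
        Action Execute;
        CanExecute Can;
    };
    std::map<std::string, Binding> Bindings;
};
```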
<div class="paragraph">
<p>On the cpp side, there is a lot more to do. First, we need to define <strong>LOCTEXT_NAMESPACE</strong> at the beginning of the file and undefine it at the end; this is required to use the <strong>UI_COMMAND</strong> macro.
Then we fill in each command: create an <strong>FUICommandInfo</strong> member for each command in the command class (<strong>MenuToolCommands</strong>), and fill in its <strong>RegisterCommands</strong> function using the <strong>UI_COMMAND</strong> macro. Next, in the <strong>MapCommands</strong> function, map each command info to a function. And of course, define the command function <strong>MenuTool::MenuCommand1</strong>.</p>
</div>
<div class="paragraph">
<p>In <strong>OnStartupModule</strong>, we create the command list, register the commands, map them, and then register a menu extension. In this case we want our item in "Section 1". <strong>MakeMenuEntry</strong> will be called when Unreal builds the menu, and in it we simply add <strong>MenuCommand1</strong> to the menu.</p>
</div>
<div class="paragraph">
<p>In <strong>OnShutdownModule</strong>, we unregister the commands.</p>
</div>
<div class="listingblock">
<div class="title">MenuTool.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include "ToolExampleEditor/ToolExampleEditor.h"
#include "MenuTool.h"

#define LOCTEXT_NAMESPACE "MenuTool"

class MenuToolCommands : public TCommands&lt;MenuToolCommands&gt;
{
public:

    MenuToolCommands()
        : TCommands&lt;MenuToolCommands&gt;(
        TEXT("MenuTool"), // Context name for fast lookup
        FText::FromString("Example Menu tool"), // Context name for displaying
        NAME_None,   // No parent context
        FEditorStyle::GetStyleSetName() // Icon Style Set
        )
    {
    }

    virtual void RegisterCommands() override
    {
        UI_COMMAND(MenuCommand1, "Menu Command 1", "Test Menu Command 1.", EUserInterfaceActionType::Button, FInputGesture());

    }

public:
    TSharedPtr&lt;FUICommandInfo&gt; MenuCommand1;
};

void MenuTool::MapCommands()
{
    const auto&amp; Commands = MenuToolCommands::Get();

    CommandList-&gt;MapAction(
        Commands.MenuCommand1,
        FExecuteAction::CreateSP(this, &amp;MenuTool::MenuCommand1),
        FCanExecuteAction());
}

void MenuTool::OnStartupModule()
{
    CommandList = MakeShareable(new FUICommandList);
    MenuToolCommands::Register();
    MapCommands();
    FToolExampleEditor::Get().AddMenuExtension(
        FMenuExtensionDelegate::CreateRaw(this, &amp;MenuTool::MakeMenuEntry),
        FName("Section_1"),
        CommandList);
}

void MenuTool::OnShutdownModule()
{
    MenuToolCommands::Unregister();
}

void MenuTool::MakeMenuEntry(FMenuBuilder &amp;menuBuilder)
{
    menuBuilder.AddMenuEntry(MenuToolCommands::Get().MenuCommand1);
}

void MenuTool::MenuCommand1()
{
    UE_LOG(LogClass, Log, TEXT("clicked MenuCommand1"));
}

#undef LOCTEXT_NAMESPACE</code></pre>
</div>
</div>
<div class="paragraph">
<p>When this is all done, remember to add this tool as a listener to the editor module in <strong>FToolExampleEditor::AddModuleListeners</strong>:</p>
</div>
<div class="listingblock">
<div class="title">ToolExampleEditor.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">ModuleListeners.Add(MakeShareable(new MenuTool));</code></pre>
</div>
</div>
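<div class="paragraph">
<p>Assuming, as the <strong>AddModuleListeners</strong> step suggests, that the editor module forwards its own startup and shutdown to every object in <strong>ModuleListeners</strong>, the pattern looks roughly like the plain C++ sketch below. <strong>Listener</strong>, <strong>EditorModule</strong> and <strong>LoggingTool</strong> are hypothetical names. Shutting listeners down in reverse registration order is a design choice of this sketch rather than something the example interface is known to do.</p>
</div>

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical listener interface, mirroring OnStartupModule/OnShutdownModule.
struct Listener {
    virtual ~Listener() = default;
    virtual void OnStartupModule() = 0;
    virtual void OnShutdownModule() = 0;
};

// Hypothetical module that owns its listeners and forwards lifecycle calls.
class EditorModule {
public:
    void AddModuleListener(std::shared_ptr<Listener> L) { Listeners.push_back(std::move(L)); }

    void StartupModule() {
        for (auto& L : Listeners) {
            L->OnStartupModule();
        }
    }

    // Shut down in reverse order so the most recently added tool goes first.
    void ShutdownModule() {
        for (auto It = Listeners.rbegin(); It != Listeners.rend(); ++It) {
            (*It)->OnShutdownModule();
        }
        Listeners.clear();
    }

private:
    std::vector<std::shared_ptr<Listener>> Listeners;
};

// A toy tool that records the calls it receives into a shared log.
struct LoggingTool : Listener {
    LoggingTool(std::string InName, std::vector<std::string>& InLog)
        : Name(std::move(InName)), Log(InLog) {}
    void OnStartupModule() override { Log.push_back(Name + ":startup"); }
    void OnShutdownModule() override { Log.push_back(Name + ":shutdown"); }

    std::string Name;
    std::vector<std::string>& Log;
};
```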
<div class="paragraph">
<p>Now if you build the project, you should see your menu item in the menu. If you click on it, it will print "clicked MenuCommand1" to the output log.</p>
</div>
<div class="paragraph">
<p>You now have a basic framework for tools: you can run anything you want from a menu click.</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/004.png" alt="004.png" width="236">
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_advanced_menu">Advanced Menu</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Before we move on to windows, let&#8217;s extend the menu functionality a bit, since there is a lot more you can do.</p>
</div>
<div class="paragraph">
<p>First, if you have a lot of items, it is a good idea to put them in a sub menu. Let&#8217;s make two more commands, <strong>MenuCommand2</strong> and <strong>MenuCommand3</strong>. You can search for <strong>MenuCommand1</strong> and add the two new commands everywhere it appears, except in <strong>MakeMenuEntry</strong>, where we will add the sub menu instead.</p>
</div>
<div class="paragraph">
<p>In <strong>MenuTool</strong>, we add a function for the sub menu:</p>
</div>
<div class="listingblock">
<div class="title">MenuTool.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">void MakeSubMenu(FMenuBuilder &amp;menuBuilder);</code></pre>
</div>
</div>
<div class="listingblock">
<div class="title">MenuTool.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">void MenuTool::MakeSubMenu(FMenuBuilder &amp;menuBuilder)
{
    menuBuilder.AddMenuEntry(MenuToolCommands::Get().MenuCommand2);
    menuBuilder.AddMenuEntry(MenuToolCommands::Get().MenuCommand3);
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Then we call <strong>AddSubMenu</strong> in <strong>MenuTool::MakeMenuEntry</strong> after MenuCommand1 is added, so the sub menu appears after it.</p>
</div>
<div class="listingblock">
<div class="title">MenuTool.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">void MenuTool::MakeMenuEntry(FMenuBuilder &amp;menuBuilder)
{
    ...
    menuBuilder.AddSubMenu(
        FText::FromString("Sub Menu"),
        FText::FromString("This is example sub menu"),
        FNewMenuDelegate::CreateSP(this, &amp;MenuTool::MakeSubMenu)
    );
}</code></pre>
</div>
</div>
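<div class="paragraph">
<p><strong>AddSubMenu</strong> receives a delegate rather than a list of entries, so the sub menu content is produced by calling <strong>MakeSubMenu</strong> with a fresh menu builder. The sketch below models that in standard C++ with a builder that stores the callback and only runs it when the sub menu is opened. <strong>MenuBuilder</strong>, <strong>MenuItem</strong> and <strong>Open</strong> are hypothetical. The assumption that the callback runs on demand should be checked against your engine version.</p>
</div>

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical builder types; not the engine's FMenuBuilder.
struct MenuBuilder;
using NewMenuDelegate = std::function<void(MenuBuilder&)>;

// A plain entry has an empty SubMenuBuilder; a sub menu stores its callback.
struct MenuItem {
    std::string Label;
    NewMenuDelegate SubMenuBuilder;
};

struct MenuBuilder {
    std::vector<MenuItem> Items;

    void AddMenuEntry(const std::string& Label) { Items.push_back({Label, {}}); }

    // Store the callback only; the sub menu content is not built yet.
    void AddSubMenu(const std::string& Label, NewMenuDelegate Builder) {
        Items.push_back({Label, std::move(Builder)});
    }
};

// Build a sub menu's content on demand, e.g. when the user hovers over it.
MenuBuilder Open(const MenuItem& Item) {
    MenuBuilder Sub;
    if (Item.SubMenuBuilder) {
        Item.SubMenuBuilder(Sub);
    }
    return Sub;
}
```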
<div class="paragraph">
<p>Now you should see a sub menu like the following:</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/005.png" alt="005.png" width="269">
</div>
</div>
<div class="paragraph">
<p>Not only can you add simple menu items, you can add any widget to the menu. We will make a small tool where you type a tag into a text box and click a button to add it to the selected actors.</p>
</div>
<div class="paragraph">
<p>I&#8217;m not going to go into detail on each function used here; search for them in the Unreal Engine source and you should find plenty of use cases.</p>
</div>
<div class="paragraph">
<p>First we add the needed member and functions. Note that this time we use a custom widget, so we don&#8217;t need to change the command list. Because <strong>AddTag</strong> will be bound to a button click, its return type has to be <strong>FReply</strong>.</p>
</div>
<div class="listingblock">
<div class="title">MenuTool.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">FName TagToAdd;

FReply AddTag();
FText GetTagToAddText() const;
void OnTagToAddTextCommited(const FText&amp; InText, ETextCommit::Type CommitInfo);</code></pre>
</div>
</div>
<div class="paragraph">
<p>Then we fill in those functions. When you commit text in the text box, we trim it and save it to <strong>TagToAdd</strong>. When you click the button, we iterate over all selected actors and add the tag to any actor that doesn&#8217;t already have it. We wrap this in a transaction so it supports undo; to use transactions, we need to include "ScopedTransaction.h".</p>
</div>
<div class="listingblock">
<div class="title">MenuTool.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">FReply MenuTool::AddTag()
{
    if (!TagToAdd.IsNone())
    {
        const FScopedTransaction Transaction(FText::FromString("Add Tag"));
        for (FSelectionIterator It(GEditor-&gt;GetSelectedActorIterator()); It; ++It)
        {
            AActor* Actor = static_cast&lt;AActor*&gt;(*It);
            if (!Actor-&gt;Tags.Contains(TagToAdd))
            {
                Actor-&gt;Modify();
                Actor-&gt;Tags.Add(TagToAdd);
            }
        }
    }
    return FReply::Handled();
}

FText MenuTool::GetTagToAddText() const
{
    return FText::FromName(TagToAdd);
}

void MenuTool::OnTagToAddTextCommited(const FText&amp; InText, ETextCommit::Type CommitInfo)
{
    FString str = InText.ToString();
    TagToAdd = FName(*str.Trim());
}</code></pre>
</div>
</div>
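<div class="paragraph">
<p>Stripped of the editor types, <strong>AddTag</strong> and <strong>OnTagToAddTextCommited</strong> boil down to three steps: trim the typed text, ignore an empty tag, and append the tag only to selected actors that do not already have it. The standard C++ sketch below models just that logic. <strong>FakeActor</strong>, <strong>TrimWhitespace</strong> and <strong>AddTagToSelection</strong> are hypothetical. There is no undo here, which is what the <strong>FScopedTransaction</strong> and <strong>Modify()</strong> calls add in the editor, and trimming both ends of the string is a choice of this sketch.</p>
</div>

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical stand-in for an actor with a tag list.
struct FakeActor {
    std::vector<std::string> Tags;
};

// Remove leading and trailing whitespace.
std::string TrimWhitespace(const std::string& In) {
    auto IsSpace = [](unsigned char C) { return std::isspace(C) != 0; };
    auto Begin = std::find_if_not(In.begin(), In.end(), IsSpace);
    auto End = std::find_if_not(In.rbegin(), In.rend(), IsSpace).base();
    return Begin < End ? std::string(Begin, End) : std::string();
}

// Add the trimmed tag to every selected actor that does not have it yet.
// Returns how many actors were changed.
int AddTagToSelection(const std::string& RawTag, const std::vector<FakeActor*>& Selection) {
    const std::string Tag = TrimWhitespace(RawTag);
    if (Tag.empty()) {
        return 0; // plays the role of the TagToAdd.IsNone() check
    }
    int Changed = 0;
    for (FakeActor* Actor : Selection) {
        if (std::find(Actor->Tags.begin(), Actor->Tags.end(), Tag) == Actor->Tags.end()) {
            Actor->Tags.push_back(Tag);
            ++Changed;
        }
    }
    return Changed;
}
```

<div class="paragraph">
<p>One difference worth keeping in mind: <strong>FName</strong> comparisons in Unreal are case-insensitive, while the <strong>std::string</strong> comparison in this sketch is case-sensitive.</p>
</div>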
<div class="paragraph">
<p>Then in <strong>MenuTool::MakeMenuEntry</strong>, we create the widget and add it to the menu. Again, I will not go into the details of the Slate code.</p>
</div>
<div class="listingblock">
<div class="title">MenuTool.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">void MenuTool::MakeMenuEntry(FMenuBuilder &amp;menuBuilder)
{
    ...
    TSharedRef&lt;SWidget&gt; AddTagWidget =
        SNew(SHorizontalBox)
        + SHorizontalBox::Slot()
        .AutoWidth()
        .VAlign(VAlign_Center)
        [
            SNew(SEditableTextBox)
            .MinDesiredWidth(50)
            .Text(this, &amp;MenuTool::GetTagToAddText)
            .OnTextCommitted(this, &amp;MenuTool::OnTagToAddTextCommited)
        ]
        + SHorizontalBox::Slot()
        .AutoWidth()
        .Padding(5, 0, 0, 0)
        .VAlign(VAlign_Center)
        [
            SNew(SButton)
            .Text(FText::FromString("Add Tag"))
            .OnClicked(this, &amp;MenuTool::AddTag)
        ];

    menuBuilder.AddWidget(AddTagWidget, FText::FromString(""));
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Now you have a more complex tool sitting in the menu, and you can use it to add tags to actors:</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/006.png" alt="006.png" width="174">
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_create_a_tab_window">Create a Tab (Window)</h2>
<div class="sectionbody">
<div class="paragraph">
<p>While we can do a lot in the menu, a window is still more convenient and flexible. In Unreal it is called a "tab". Because opening a tab from a menu is very common for tools, we will first make a base class for it.</p>
</div>
<div class="paragraph">
<p>Add a new file:</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/007.png" alt="007.png" width="217">
</div>
</div>
<div class="paragraph">
<p>The base class also inherits from <strong>IExampleModuleListenerInterface</strong>. In <strong>OnStartupModule</strong> we register a tab spawner, and we unregister it in <strong>OnShutdownModule</strong>. Then in <strong>MakeMenuEntry</strong>, we let <strong>FGlobalTabmanager</strong> populate the menu entry that opens this tab.
We leave the <strong>SpawnTab</strong> function to be overridden by child classes to set the proper widget.</p>
</div>
<div class="listingblock">
<div class="title">ExampleTabToolBase.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include "ToolExampleEditor/ToolExampleEditor.h"
#include "ToolExampleEditor/IExampleModuleInterface.h"
#include "TabManager.h"
#include "SDockTab.h"

class FExampleTabToolBase : public IExampleModuleListenerInterface, public TSharedFromThis&lt; FExampleTabToolBase &gt;
{
public:
    // IExampleModuleListenerInterface
    virtual void OnStartupModule() override
    {
        Initialize();
        FGlobalTabmanager::Get()-&gt;RegisterNomadTabSpawner(TabName, FOnSpawnTab::CreateRaw(this, &amp;FExampleTabToolBase::SpawnTab))
            .SetGroup(FToolExampleEditor::Get().GetMenuRoot())
            .SetDisplayName(TabDisplayName)
            .SetTooltipText(ToolTipText);
    };

    virtual void OnShutdownModule() override
    {
        FGlobalTabmanager::Get()-&gt;UnregisterNomadTabSpawner(TabName);
    };

    // In this function set TabName/TabDisplayName/ToolTipText
    virtual void Initialize() {};
    virtual TSharedRef&lt;SDockTab&gt; SpawnTab(const FSpawnTabArgs&amp; TabSpawnArgs) { return SNew(SDockTab); };

    virtual void MakeMenuEntry(FMenuBuilder &amp;menuBuilder)
    {
        FGlobalTabmanager::Get()-&gt;PopulateTabSpawnerMenu(menuBuilder, TabName);
    };

protected:
    FName TabName;
    FText TabDisplayName;
    FText ToolTipText;
};</code></pre>
</div>
</div>
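<div class="paragraph">
<p><strong>FExampleTabToolBase</strong> follows a template-method pattern. <strong>OnStartupModule</strong> calls the virtual <strong>Initialize</strong> so the subclass can fill in <strong>TabName</strong>, then registers a spawner under that name, and <strong>OnShutdownModule</strong> removes it again. The engine-free sketch below reproduces that flow, with a hypothetical <strong>TabRegistry</strong> standing in for <strong>FGlobalTabmanager</strong> and a string standing in for the spawned widget. None of these names are Unreal API.</p>
</div>

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry: tab name -> function that creates the tab content.
struct TabRegistry {
    std::map<std::string, std::function<std::string()>> Spawners;

    void Register(const std::string& Name, std::function<std::string()> Spawner) {
        Spawners[Name] = std::move(Spawner);
    }
    void Unregister(const std::string& Name) { Spawners.erase(Name); }

    // Returns the spawned content, or an empty string if nothing is registered.
    std::string Spawn(const std::string& Name) const {
        auto It = Spawners.find(Name);
        return It == Spawners.end() ? std::string() : It->second();
    }
};

// Hypothetical base class mirroring the Initialize/SpawnTab hooks.
class TabToolBase {
public:
    explicit TabToolBase(TabRegistry& InRegistry) : Registry(InRegistry) {}
    virtual ~TabToolBase() = default;

    void OnStartupModule() {
        Initialize(); // the subclass sets TabName here
        Registry.Register(TabName, [this] { return SpawnTab(); });
    }
    void OnShutdownModule() { Registry.Unregister(TabName); }

protected:
    virtual void Initialize() {}
    virtual std::string SpawnTab() { return "empty tab"; }
    std::string TabName;

private:
    TabRegistry& Registry;
};

class ExampleTabTool : public TabToolBase {
public:
    using TabToolBase::TabToolBase;

protected:
    void Initialize() override { TabName = "TabTool"; }
    std::string SpawnTab() override { return "This is a tab example."; }
};
```

<div class="paragraph">
<p>Calling <strong>Initialize</strong> from <strong>OnStartupModule</strong> rather than from a constructor matters: inside a base class constructor, a virtual call would not reach the subclass override.</p>
</div>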
<div class="paragraph">
<p>Now we add the files for the tab tool. In addition to the tool class itself, we also need a custom panel widget class for the tab content.</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/008.png" alt="008.png" width="218">
</div>
</div>
<div class="paragraph">
<p>Let&#8217;s look at the <strong>TabTool</strong> class first. It inherits from <strong>FExampleTabToolBase</strong>, defined above.</p>
</div>
<div class="paragraph">
<p>We set the tab name, display name and tooltip in the <strong>Initialize</strong> function, and prepare the panel in the <strong>SpawnTab</strong> function. Note that we pass the tool object itself as a parameter when creating the panel. This is not necessary, but it serves as an example of how to pass an object to a widget.</p>
</div>
<div class="paragraph">
<p>This tab tool is added to "Section 2" of the custom menu.</p>
</div>
<div class="listingblock">
<div class="title">TabTool.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include "ToolExampleEditor/ExampleTabToolBase.h"

class TabTool : public FExampleTabToolBase
{
public:
    virtual ~TabTool() {}
    virtual void OnStartupModule() override;
    virtual void OnShutdownModule() override;
    virtual void Initialize() override;
    virtual TSharedRef&lt;SDockTab&gt; SpawnTab(const FSpawnTabArgs&amp; TabSpawnArgs) override;
};</code></pre>
</div>
</div>
<div class="listingblock">
<div class="title">TabTool.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include "ToolExampleEditor/ToolExampleEditor.h"
#include "TabToolPanel.h"
#include "TabTool.h"

void TabTool::OnStartupModule()
{
    FExampleTabToolBase::OnStartupModule();
    FToolExampleEditor::Get().AddMenuExtension(FMenuExtensionDelegate::CreateRaw(this, &amp;TabTool::MakeMenuEntry), FName("Section_2"));
}

void TabTool::OnShutdownModule()
{
    FExampleTabToolBase::OnShutdownModule();
}

void TabTool::Initialize()
{
    TabName = "TabTool";
    TabDisplayName = FText::FromString("Tab Tool");
    ToolTipText = FText::FromString("Tab Tool Window");
}

TSharedRef&lt;SDockTab&gt; TabTool::SpawnTab(const FSpawnTabArgs&amp; TabSpawnArgs)
{
    TSharedRef&lt;SDockTab&gt; SpawnedTab = SNew(SDockTab)
        .TabRole(ETabRole::NomadTab)
        [
            SNew(TabToolPanel)
            .Tool(SharedThis(this))
        ];

    return SpawnedTab;
}</code></pre>
</div>
</div>
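<div class="paragraph">
<p>The panel stores the tool as a <strong>TWeakPtr</strong>, so the tab does not keep the tool alive by itself, and it checks <strong>IsValid()</strong> before use. The same idea with the standard library&#8217;s <strong>std::weak_ptr</strong> is sketched below. <strong>Tool</strong>, <strong>Panel</strong> and <strong>Describe</strong> are hypothetical, and the sketch only illustrates weak-reference semantics, not Slate. In Unreal code you would typically call <strong>Pin()</strong> on the <strong>TWeakPtr</strong> to get a temporary shared pointer, much like <strong>lock()</strong> here.</p>
</div>

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical tool object owned elsewhere (by the module in the real example).
struct Tool {
    std::string Name = "Tab Tool";
};

// Hypothetical panel that observes the tool without owning it.
class Panel {
public:
    explicit Panel(std::weak_ptr<Tool> InTool) : WeakTool(std::move(InTool)) {}

    // Lock the weak pointer for the duration of the call; fall back if the tool is gone.
    std::string Describe() const {
        if (std::shared_ptr<Tool> Pinned = WeakTool.lock()) {
            return "Panel for " + Pinned->Name;
        }
        return "Tool no longer available";
    }

private:
    std::weak_ptr<Tool> WeakTool;
};
```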
<div class="paragraph">
<p>Now for the panel:</p>
</div>
<div class="paragraph">
<p>In the <strong>Construct</strong> function we build the Slate widget in <strong>ChildSlot</strong>. Here I add a scroll box containing a grey border, which in turn contains a text block.</p>
</div>
<div class="listingblock">
<div class="title">TabToolPanel.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include "SDockTab.h"
#include "SDockableTab.h"
#include "SDockTabStack.h"
#include "SlateApplication.h"
#include "TabTool.h"

class TabToolPanel : public SCompoundWidget
{
    SLATE_BEGIN_ARGS(TabToolPanel)
    {}
    SLATE_ARGUMENT(TWeakPtr&lt;class TabTool&gt;, Tool)
    SLATE_END_ARGS()

    void Construct(const FArguments&amp; InArgs);

protected:
    TWeakPtr&lt;TabTool&gt; tool;
};</code></pre>
</div>
</div>
<div class="listingblock">
<div class="title">TabToolPanel.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include "ToolExampleEditor/ToolExampleEditor.h"
#include "TabToolPanel.h"

void TabToolPanel::Construct(const FArguments&amp; InArgs)
{
    tool = InArgs._Tool;
    if (tool.IsValid())
    {
        // do anything you need from tool object
    }

    ChildSlot
    [
        SNew(SScrollBox)
        + SScrollBox::Slot()
        .VAlign(VAlign_Top)
        .Padding(5)
        [
            SNew(SBorder)
            .BorderBackgroundColor(FColor(192, 192, 192, 255))
            .Padding(15.0f)
            [
                SNew(STextBlock)
                .Text(FText::FromString(TEXT("This is a tab example.")))
            ]
        ]
    ];
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Finally, remember to add this tool to the editor module in <strong>FToolExampleEditor::AddModuleListeners</strong>:</p>
</div>
<div class="listingblock">
<div class="title">ToolExampleEditor.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">ModuleListeners.Add(MakeShareable(new TabTool));</code></pre>
</div>
</div>
<div class="paragraph">
<p>Now you can see the tab tool in our custom menu:</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/009.png" alt="009.png" width="231">
</div>
</div>
<div class="paragraph">
<p>When you click on it, it opens a window that you can dock anywhere, just like a regular Unreal tab.</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/010.png" alt="010.png" width="436">
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_customize_details_panel">Customize Details Panel</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Another commonly used feature is to customize the details panel for any UObject.</p>
</div>
<div class="paragraph">
<p>To show how it works, we will first create an Actor class in our game module "ToolExample". Add the following file:</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/011.png" alt="011.png" width="235">
</div>
</div>
<div class="paragraph">
<p>In this class, we add two booleans in the "Options" category and an integer in the "Test" category. Remember to add "<strong>TOOLEXAMPLE_API</strong>" in front of the class name to export it from the game module; otherwise we cannot use it in the editor module.</p>
</div>
<div class="listingblock">
<div class="title">ExampleActor.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#pragma once
#include "GameFramework/Actor.h"
#include "ExampleActor.generated.h"

UCLASS()
class TOOLEXAMPLE_API AExampleActor : public AActor
{
    GENERATED_BODY()
public:
    UPROPERTY(EditAnywhere, Category = "Options")
    bool bOption1 = false;

    UPROPERTY(EditAnywhere, Category = "Options")
    bool bOption2 = false;

    UPROPERTY(EditAnywhere, Category = "Test")
    int32 testInt = 0;
};</code></pre>
</div>
</div>
<div class="paragraph">
<p>Now if we load up Unreal and drag an "ExampleActor" into the level, you should see the following in the details panel:</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/012.png" alt="012.png" width="269">
</div>
</div>
<div class="paragraph">
<p>Suppose we want option 1 and option 2 to be mutually exclusive: you can have both unchecked or one of them checked, but you cannot have both checked. We want to customize the details panel so that if the user checks one of them, the other is automatically unchecked.</p>
</div>
<div class="paragraph">
<p>Add the following files to the editor module "ToolExampleEditor":</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/013.png" alt="013.png" width="244">
</div>
</div>
<div class="paragraph">
<p>The details customization implements the <strong>IDetailCustomization</strong> interface. In its main entry point, the <strong>CustomizeDetails</strong> function, we first hide the original option 1 and option 2 properties (you can comment out those two lines to see the difference). Then we add our custom widget. Here the "RadioButton" is purely a visual style; it has nothing to do with the mutually exclusive logic. You can implement the same logic with other visuals, such as regular check boxes or buttons.</p>
</div>
<div class="paragraph">
<p>In the check box widget functions, <strong>IsModeRadioChecked</strong> and <strong>OnModeRadioChanged</strong>, we add the extra parameters "actor" and "optionIndex", so we can pass in the edited object and specify the option when we construct the widget.</p>
</div>
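<div class="paragraph">
<p>Writing <strong>.IsChecked(this, &amp;FExampleActorDetails::IsModeRadioChecked, actor, 1)</strong> binds the member function together with extra payload arguments, so Slate can later call it without knowing about <strong>actor</strong> or the option index. In standard C++ the closest analogue is <strong>std::bind</strong> or a capturing lambda. The hypothetical <strong>Details</strong> and <strong>Options</strong> types below show both, purely to illustrate payload binding.</p>
</div>

```cpp
#include <cassert>
#include <functional>

// Hypothetical stand-ins for the actor and the details customization.
struct Options {
    bool bOption1 = false;
    bool bOption2 = false;
};

struct Details {
    bool IsChecked(Options* Actor, int OptionIndex) const {
        return OptionIndex == 1 ? Actor->bOption1 : Actor->bOption2;
    }
    void OnChanged(bool bChecked, Options* Actor, int OptionIndex) {
        if (OptionIndex == 1)
            Actor->bOption1 = bChecked;
        else
            Actor->bOption2 = bChecked;
    }
};

// Payload (Actor, OptionIndex) is bound up front; the caller supplies nothing.
std::function<bool()> MakeIsChecked(Details& D, Options* Actor, int OptionIndex) {
    return std::bind(&Details::IsChecked, &D, Actor, OptionIndex);
}

// Same idea with a lambda; the caller supplies only the new check state.
std::function<void(bool)> MakeOnChanged(Details& D, Options* Actor, int OptionIndex) {
    return [&D, Actor, OptionIndex](bool bChecked) { D.OnChanged(bChecked, Actor, OptionIndex); };
}
```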
<div class="listingblock">
<div class="title">ExampleActorDetails.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#pragma once
#include "IDetailCustomization.h"

class AExampleActor;

class FExampleActorDetails : public IDetailCustomization
{
public:
    /** Makes a new instance of this detail layout class for a specific detail view requesting it */
    static TSharedRef&lt;IDetailCustomization&gt; MakeInstance();

    /** IDetailCustomization interface */
    virtual void CustomizeDetails(IDetailLayoutBuilder&amp; DetailLayout) override;

protected:
    // widget functions
    ECheckBoxState IsModeRadioChecked(AExampleActor* actor, int optionIndex) const;
    void OnModeRadioChanged(ECheckBoxState CheckType, AExampleActor* actor, int optionIndex);
};</code></pre>
</div>
</div>
<div class="listingblock">
<div class="title">ExampleActorDetails.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include "ToolExampleEditor/ToolExampleEditor.h"
#include "ExampleActorDetails.h"
#include "DetailsCustomization/ExampleActor.h"

TSharedRef&lt;IDetailCustomization&gt; FExampleActorDetails::MakeInstance()
{
    return MakeShareable(new FExampleActorDetails);
}

void FExampleActorDetails::CustomizeDetails(IDetailLayoutBuilder&amp; DetailLayout)
{
    TArray&lt;TWeakObjectPtr&lt;UObject&gt;&gt; Objects;
    DetailLayout.GetObjectsBeingCustomized(Objects);
    if (Objects.Num() != 1)
    {
        // skip customization if select more than one objects
        return;
    }
    AExampleActor* actor = (AExampleActor*)Objects[0].Get();

    // hide original property
    DetailLayout.HideProperty(DetailLayout.GetProperty(GET_MEMBER_NAME_CHECKED(AExampleActor, bOption1)));
    DetailLayout.HideProperty(DetailLayout.GetProperty(GET_MEMBER_NAME_CHECKED(AExampleActor, bOption2)));

    // add custom widget to "Options" category
    IDetailCategoryBuilder&amp; OptionsCategory = DetailLayout.EditCategory("Options", FText::FromString(""), ECategoryPriority::Important);
    OptionsCategory.AddCustomRow(FText::FromString("Options"))
                .WholeRowContent()
                [
                    SNew(SHorizontalBox)
                    + SHorizontalBox::Slot()
                    .AutoWidth()
                    .VAlign(VAlign_Center)
                    [
                        SNew(SCheckBox)
                        .Style(FEditorStyle::Get(), "RadioButton")
                        .IsChecked(this, &amp;FExampleActorDetails::IsModeRadioChecked, actor, 1)
                        .OnCheckStateChanged(this, &amp;FExampleActorDetails::OnModeRadioChanged, actor, 1)
                        [
                            SNew(STextBlock).Text(FText::FromString("Option 1"))
                        ]
                    ]
                    + SHorizontalBox::Slot()
                    .AutoWidth()
                    .Padding(10.f, 0.f, 0.f, 0.f)
                    .VAlign(VAlign_Center)
                    [
                        SNew(SCheckBox)
                        .Style(FEditorStyle::Get(), "RadioButton")
                        .IsChecked(this, &amp;FExampleActorDetails::IsModeRadioChecked, actor, 2)
                        .OnCheckStateChanged(this, &amp;FExampleActorDetails::OnModeRadioChanged, actor, 2)
                        [
                            SNew(STextBlock).Text(FText::FromString("Option 2"))
                        ]
                    ]
                ];
}

ECheckBoxState FExampleActorDetails::IsModeRadioChecked(AExampleActor* actor, int optionIndex) const
{
    bool bFlag = false;
    if (actor)
    {
        if (optionIndex == 1)
            bFlag = actor-&gt;bOption1;
        else if (optionIndex == 2)
            bFlag = actor-&gt;bOption2;
    }
    return bFlag ? ECheckBoxState::Checked : ECheckBoxState::Unchecked;
}

void FExampleActorDetails::OnModeRadioChanged(ECheckBoxState CheckType, AExampleActor* actor, int optionIndex)
{
    bool bFlag = (CheckType == ECheckBoxState::Checked);
    if (actor)
    {
        actor-&gt;Modify();
        if (bFlag)
        {
            // clear all options first
            actor-&gt;bOption1 = false;
            actor-&gt;bOption2 = false;
        }
        if (optionIndex == 1)
            actor-&gt;bOption1 = bFlag;
        else if (optionIndex == 2)
            actor-&gt;bOption2 = bFlag;
    }
}</code></pre>
</div>
</div>
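<div class="paragraph">
<p>The exclusivity rule lives entirely in <strong>OnModeRadioChanged</strong>: checking an option first clears every option and then sets the chosen one, while unchecking clears only that option. The sketch below isolates that rule as a plain C++ function over a hypothetical <strong>ExclusiveOptions</strong> struct, which makes the invariant (never both checked) easy to test outside the editor.</p>
</div>

```cpp
#include <cassert>

// Hypothetical stand-in for the two boolean properties on AExampleActor.
struct ExclusiveOptions {
    bool bOption1 = false;
    bool bOption2 = false;
};

// Same rule as OnModeRadioChanged, without the editor types.
void SetOption(ExclusiveOptions& O, int OptionIndex, bool bChecked) {
    if (bChecked) {
        // clear all options first so at most one can end up checked
        O.bOption1 = false;
        O.bOption2 = false;
    }
    if (OptionIndex == 1)
        O.bOption1 = bChecked;
    else if (OptionIndex == 2)
        O.bOption2 = bChecked;
}
```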
<div class="paragraph">
<p>Then we need to register the layout in <strong>FToolExampleEditor::StartupModule</strong> and unregister it in <strong>FToolExampleEditor::ShutdownModule</strong>:</p>
</div>
<div class="listingblock">
<div class="title">ToolExampleEditor.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include "DetailsCustomization/ExampleActor.h"
#include "DetailsCustomization/ExampleActorDetails.h"

void FToolExampleEditor::StartupModule()
{
    ...

    // register custom layouts
    {
        static FName PropertyEditor("PropertyEditor");
        FPropertyEditorModule&amp; PropertyModule = FModuleManager::GetModuleChecked&lt;FPropertyEditorModule&gt;(PropertyEditor);
        PropertyModule.RegisterCustomClassLayout(AExampleActor::StaticClass()-&gt;GetFName(), FOnGetDetailCustomizationInstance::CreateStatic(&amp;FExampleActorDetails::MakeInstance));
    }

    IExampleModuleInterface::StartupModule();
}

void FToolExampleEditor::ShutdownModule()
{
    // unregister custom layouts
    if (FModuleManager::Get().IsModuleLoaded("PropertyEditor"))
    {
        FPropertyEditorModule&amp; PropertyModule = FModuleManager::GetModuleChecked&lt;FPropertyEditorModule&gt;("PropertyEditor");
        PropertyModule.UnregisterCustomClassLayout(AExampleActor::StaticClass()-&gt;GetFName());
    }

    IExampleModuleInterface::ShutdownModule();
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Now you should see the customized details panel:</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/014.png" alt="014.png" width="271">
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_custom_data_type">Custom Data Type</h2>
<div class="sectionbody">
<div class="sect2">
<h3 id="_new_custom_data">New Custom Data</h3>
<div class="paragraph">
<p>For simple data, you can just inherit from the <strong>UDataAsset</strong> class; then you can create your data object in the Unreal content browser: Add New → Miscellaneous → Data Asset.</p>
</div>
<div class="paragraph">
<p>If you want to add your data to a custom category, you need to do a bit more work.</p>
</div>
<div class="paragraph">
<p>First we need to create a custom data type in the game module (ToolExample). We will make a simple one with a single editable property.</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/015.png" alt="015.png" width="236">
</div>
</div>
<div class="paragraph">
<p>We also add "SourceFilePath", which will be used in later sections.</p>
</div>
<div class="listingblock">
<div class="title">ExampleData.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#pragma once
#include "ExampleData.generated.h"

UCLASS(Blueprintable)
class TOOLEXAMPLE_API UExampleData : public UObject
{
    GENERATED_BODY()

public:
    UPROPERTY(EditAnywhere, Category = "Properties")
    FString ExampleString;

#if WITH_EDITORONLY_DATA
    UPROPERTY(Category = SourceAsset, VisibleAnywhere)
    FString SourceFilePath;
#endif
};</code></pre>
</div>
</div>
<div class="paragraph">
<p>Then in editor module, add the following files:</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/016.png" alt="016.png" width="380">
</div>
</div>
<div class="paragraph">
<p>We first make the factory:</p>
</div>
<div class="listingblock">
<div class="title">ExampleDataFactory.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#pragma once
#include "UnrealEd.h"
#include "ExampleDataFactory.generated.h"

UCLASS()
class UExampleDataFactory : public UFactory
{
    GENERATED_UCLASS_BODY()
public:
    virtual UObject* FactoryCreateNew(UClass* Class, UObject* InParent, FName Name, EObjectFlags Flags, UObject* Context, FFeedbackContext* Warn) override;
};</code></pre>
</div>
</div>
<div class="listingblock">
<div class="title">ExampleDataFactory.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include "ToolExampleEditor/ToolExampleEditor.h"
#include "ExampleDataFactory.h"
#include "CustomDataType/ExampleData.h"

UExampleDataFactory::UExampleDataFactory(const FObjectInitializer&amp; ObjectInitializer) : Super(ObjectInitializer)
{
    SupportedClass = UExampleData::StaticClass();
    bCreateNew = true;
    bEditAfterNew = true;
}

UObject* UExampleDataFactory::FactoryCreateNew(UClass* Class, UObject* InParent, FName Name, EObjectFlags Flags, UObject* Context, FFeedbackContext* Warn)
{
    UExampleData* NewObjectAsset = NewObject&lt;UExampleData&gt;(InParent, Class, Name, Flags | RF_Transactional);
    return NewObjectAsset;
}</code></pre>
</div>
</div>
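<div class="paragraph">
<p>A factory advertises which class it produces (<strong>SupportedClass</strong>) and whether it can create new assets from scratch (<strong>bCreateNew</strong>), and the editor calls <strong>FactoryCreateNew</strong> when you create that asset type from the content browser. The standard C++ sketch below models that contract with hypothetical <strong>AssetFactory</strong>, <strong>ExampleDataFactory</strong> and <strong>CreateAsset</strong> names. It is not <strong>UFactory</strong>&#8217;s real interface, and matching on a class name string is a simplification.</p>
</div>

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical asset base and concrete data type.
struct Asset {
    virtual ~Asset() = default;
    std::string Name;
};
struct ExampleData : Asset {
    std::string ExampleString;
};

// Hypothetical factory contract, loosely mirroring SupportedClass/bCreateNew/FactoryCreateNew.
struct AssetFactory {
    virtual ~AssetFactory() = default;
    virtual std::string SupportedClass() const = 0;
    virtual bool CanCreateNew() const = 0;
    virtual std::unique_ptr<Asset> CreateNew(const std::string& Name) const = 0;
};

struct ExampleDataFactory : AssetFactory {
    std::string SupportedClass() const override { return "ExampleData"; }
    bool CanCreateNew() const override { return true; }
    std::unique_ptr<Asset> CreateNew(const std::string& Name) const override {
        auto Data = std::make_unique<ExampleData>();
        Data->Name = Name;
        return Data;
    }
};

// Find a factory that can create ClassName from scratch and use it.
std::unique_ptr<Asset> CreateAsset(const std::vector<const AssetFactory*>& Factories,
                                   const std::string& ClassName, const std::string& Name) {
    for (const AssetFactory* F : Factories) {
        if (F->CanCreateNew() && F->SupportedClass() == ClassName) {
            return F->CreateNew(Name);
        }
    }
    return nullptr;
}
```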
<div class="paragraph">
<p>Next we create the asset type actions; here we pass in the asset category through the constructor.</p>
</div>
<div class="listingblock">
<div class="title">ExampleDataTypeActions.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#pragma once
#include "AssetTypeActions_Base.h"

class FExampleDataTypeActions : public FAssetTypeActions_Base
{
public:
    FExampleDataTypeActions(EAssetTypeCategories::Type InAssetCategory);

    // IAssetTypeActions interface
    virtual FText GetName() const override;
    virtual FColor GetTypeColor() const override;
    virtual UClass* GetSupportedClass() const override;
    virtual uint32 GetCategories() override;
    // End of IAssetTypeActions interface

private:
    EAssetTypeCategories::Type MyAssetCategory;
};</code></pre>
</div>
</div>
<div class="listingblock">
<div class="title">ExampleDataTypeActions.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include "ToolExampleEditor/ToolExampleEditor.h"
#include "ExampleDataTypeActions.h"
#include "CustomDataType/ExampleData.h"

FExampleDataTypeActions::FExampleDataTypeActions(EAssetTypeCategories::Type InAssetCategory)
    : MyAssetCategory(InAssetCategory)
{
}

FText FExampleDataTypeActions::GetName() const
{
    return FText::FromString("Example Data");
}

FColor FExampleDataTypeActions::GetTypeColor() const
{
    return FColor(230, 205, 165);
}

UClass* FExampleDataTypeActions::GetSupportedClass() const
{
    return UExampleData::StaticClass();
}

uint32 FExampleDataTypeActions::GetCategories()
{
    return MyAssetCategory;
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Finally, we need to register the type actions in the editor module. We add an array <strong>CreatedAssetTypeActions</strong> to keep every type action we register, so we can unregister them properly when the module is unloaded:</p>
</div>
<div class="listingblock">
<div class="title">ToolExampleEditor.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">class FToolExampleEditor : public IExampleModuleInterface
{
    ...
    TArray&lt;TSharedPtr&lt;IAssetTypeActions&gt;&gt; CreatedAssetTypeActions;
};</code></pre>
</div>
</div>
<div class="paragraph">
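<p>In the <strong>StartupModule</strong> function, we create a new "<strong>Example</strong>" category and use it to register our type action. In <strong>ShutdownModule</strong>, we unregister everything we registered, but only if the AssetTools module is still loaded.</p>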
</div>
<div class="listingblock">
<div class="title">ToolExampleEditor.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include "CustomDataType/ExampleDataTypeActions.h"

void FToolExampleEditor::StartupModule()
{
    ...

    // register custom types:
    {
        IAssetTools&amp; AssetTools = FModuleManager::LoadModuleChecked&lt;FAssetToolsModule&gt;("AssetTools").Get();
        // add custom category
        EAssetTypeCategories::Type ExampleCategory = AssetTools.RegisterAdvancedAssetCategory(FName(TEXT("Example")), FText::FromString("Example"));
        // register our custom asset with example category
        TSharedPtr&lt;IAssetTypeActions&gt; Action = MakeShareable(new FExampleDataTypeActions(ExampleCategory));
        AssetTools.RegisterAssetTypeActions(Action.ToSharedRef());
        // save it here so we can unregister it later
        CreatedAssetTypeActions.Add(Action);
    }

    IExampleModuleInterface::StartupModule();
}

void FToolExampleEditor::ShutdownModule()
{
    ...

    // Unregister all the asset types that we registered
    if (FModuleManager::Get().IsModuleLoaded("AssetTools"))
    {
        IAssetTools&amp; AssetTools = FModuleManager::GetModuleChecked&lt;FAssetToolsModule&gt;("AssetTools").Get();
        for (int32 i = 0; i &lt; CreatedAssetTypeActions.Num(); ++i)
        {
            AssetTools.UnregisterAssetTypeActions(CreatedAssetTypeActions[i].ToSharedRef());
        }
    }
    CreatedAssetTypeActions.Empty();

    IExampleModuleInterface::ShutdownModule();
}</code></pre>
</div>
</div>
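<div class="paragraph">
<p>The registration bookkeeping above follows a general pattern: remember exactly what you registered, then undo exactly that on shutdown. Below is a minimal plain C++ sketch of the idea, outside of Unreal. <strong>AssetToolsRegistry</strong>, <strong>AssetTypeActions</strong> and <strong>ExampleEditorModule</strong> are hypothetical stand-ins rather than engine types, and <strong>std::shared_ptr</strong> plays the role of <strong>TSharedPtr</strong>.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include &lt;algorithm&gt;
#include &lt;cassert&gt;
#include &lt;memory&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical stand-in for IAssetTypeActions.
struct AssetTypeActions
{
    std::string Name;
};

// Hypothetical stand-in for the AssetTools registry, which may also hold other modules' actions.
class AssetToolsRegistry
{
public:
    void Register(const std::shared_ptr&lt;AssetTypeActions&gt;&amp; Action) { Actions.push_back(Action); }
    void Unregister(const std::shared_ptr&lt;AssetTypeActions&gt;&amp; Action)
    {
        Actions.erase(std::remove(Actions.begin(), Actions.end(), Action), Actions.end());
    }
    std::size_t Count() const { return Actions.size(); }

private:
    std::vector&lt;std::shared_ptr&lt;AssetTypeActions&gt;&gt; Actions;
};

// Module that remembers its own registrations, like CreatedAssetTypeActions.
class ExampleEditorModule
{
public:
    explicit ExampleEditorModule(AssetToolsRegistry&amp; InRegistry) : Registry(InRegistry) {}

    void Startup()
    {
        auto Action = std::make_shared&lt;AssetTypeActions&gt;();
        Action-&gt;Name = "Example Data";
        Registry.Register(Action);
        Created.push_back(Action); // keep it so Shutdown can undo exactly this registration
    }

    void Shutdown()
    {
        for (const auto&amp; Action : Created)
            Registry.Unregister(Action);
        Created.clear();
    }

private:
    AssetToolsRegistry&amp; Registry;
    std::vector&lt;std::shared_ptr&lt;AssetTypeActions&gt;&gt; Created;
};</code></pre>
</div>
</div>
<div class="paragraph">
<p>Shutting the module down removes only its own entries and leaves other modules' registrations in place. This is why the module keeps its own list instead of clearing the whole registry.</p>
</div>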
<div class="paragraph">
<p>Now you will see your data type in the proper category.</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/017.png" alt="017.png" width="380">
</div>
</div>
</div>
<div class="sect2">
<h3 id="_import_custom_data">Import Custom Data</h3>
<div class="paragraph">
<p>With all the work above in place, we can now import our data from a file, the same way you can drag and drop a PNG file to create a texture. In this case we will import a text file with the extension ".xmp" into Unreal and simply assign the file&#8217;s text to the "ExampleString" property.</p>
</div>
<div class="paragraph">
<p>To make import work, we actually have to disable the ability to create a new data asset from scratch. Modify the factory class as follows:</p>
</div>
<div class="listingblock">
<div class="title">ExampleDataFactory.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">class UExampleDataFactory : public UFactory
{
    ...

    virtual UObject* FactoryCreateText(UClass* InClass, UObject* InParent, FName InName, EObjectFlags Flags, UObject* Context, const TCHAR* Type, const TCHAR*&amp; Buffer, const TCHAR* BufferEnd, FFeedbackContext* Warn) override;
    virtual bool FactoryCanImport(const FString&amp; Filename) override;

    // helper function
    static void MakeExampleDataFromText(class UExampleData* Data, const TCHAR*&amp; Buffer, const TCHAR* BufferEnd);
};</code></pre>
</div>
</div>
<div class="listingblock">
<div class="title">ExampleDataFactory.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">UExampleDataFactory::UExampleDataFactory(const FObjectInitializer&amp; ObjectInitializer) : Super(ObjectInitializer)
{
    Formats.Add(TEXT("xmp;Example Data"));
    SupportedClass = UExampleData::StaticClass();
    bCreateNew = false; // turned off for import
    bEditAfterNew = false; // turned off for import
    bEditorImport = true;
    bText = true;
}


UObject* UExampleDataFactory::FactoryCreateText(UClass* InClass, UObject* InParent, FName InName, EObjectFlags Flags, UObject* Context, const TCHAR* Type, const TCHAR*&amp; Buffer, const TCHAR* BufferEnd, FFeedbackContext* Warn)
{
    FEditorDelegates::OnAssetPreImport.Broadcast(this, InClass, InParent, InName, Type);

    // if class type or extension doesn't match, return
    if (InClass != UExampleData::StaticClass() ||
        FCString::Stricmp(Type, TEXT("xmp")) != 0)
        return nullptr;

    UExampleData* Data = CastChecked&lt;UExampleData&gt;(NewObject&lt;UExampleData&gt;(InParent, InName, Flags));
    MakeExampleDataFromText(Data, Buffer, BufferEnd);

    // save the source file path
    Data-&gt;SourceFilePath = UAssetImportData::SanitizeImportFilename(CurrentFilename, Data-&gt;GetOutermost());

    FEditorDelegates::OnAssetPostImport.Broadcast(this, Data);

    return Data;
}

bool UExampleDataFactory::FactoryCanImport(const FString&amp; Filename)
{
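    // compare case-insensitively, matching the Stricmp check in FactoryCreateText
    return FPaths::GetExtension(Filename).Equals(TEXT("xmp"), ESearchCase::IgnoreCase);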
}

void UExampleDataFactory::MakeExampleDataFromText(class UExampleData* Data, const TCHAR*&amp; Buffer, const TCHAR* BufferEnd)
{
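    // use the [Buffer, BufferEnd) range rather than relying on a null terminator
    Data-&gt;ExampleString = FString(static_cast&lt;int32&gt;(BufferEnd - Buffer), Buffer);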
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Note that we changed <strong>bCreateNew</strong> and <strong>bEditAfterNew</strong> to false. We set "<strong>SourceFilePath</strong>" so we can reimport later. If you want to import a binary file, set <strong>bText = false</strong> and override the <strong>FactoryCreateBinary</strong> function instead.</p>
</div>
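<div class="paragraph">
<p>The factory performs two small pieces of logic: an extension test, and turning the [Buffer, BufferEnd) range into a string. Both can be sketched in plain C++ as shown below. This sketch assumes you want a case-insensitive ".xmp" match. <strong>GetExtension</strong>, <strong>EqualsIgnoreCase</strong>, <strong>CanImportExampleFile</strong> and <strong>MakeExampleStringFromText</strong> are hypothetical helpers, not Unreal functions. Inside Unreal you would use <strong>FPaths::GetExtension</strong> and <strong>FString</strong> instead.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include &lt;cassert&gt;
#include &lt;cctype&gt;
#include &lt;string&gt;

// Returns the text after the last '.' in the file name part of Path, or "" if there is none.
std::string GetExtension(const std::string&amp; Path)
{
    const std::size_t Slash = Path.find_last_of("/\\");
    const std::size_t Dot = Path.find_last_of('.');
    if (Dot == std::string::npos || (Slash != std::string::npos &amp;&amp; Dot &lt; Slash))
        return "";
    return Path.substr(Dot + 1);
}

// Case-insensitive comparison, similar in spirit to FCString::Stricmp(A, B) == 0.
bool EqualsIgnoreCase(const std::string&amp; A, const std::string&amp; B)
{
    if (A.size() != B.size())
        return false;
    for (std::size_t I = 0; I &lt; A.size(); ++I)
    {
        if (std::tolower(static_cast&lt;unsigned char&gt;(A[I])) != std::tolower(static_cast&lt;unsigned char&gt;(B[I])))
            return false;
    }
    return true;
}

// Accept ".xmp" regardless of case, so "DATA.XMP" is importable too.
bool CanImportExampleFile(const std::string&amp; Filename)
{
    return EqualsIgnoreCase(GetExtension(Filename), "xmp");
}

// Build the property value from the [Buffer, BufferEnd) range instead of relying on a null terminator.
std::string MakeExampleStringFromText(const char* Buffer, const char* BufferEnd)
{
    return std::string(Buffer, BufferEnd);
}</code></pre>
</div>
</div>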
<div class="paragraph">
<p>Now you can drag &amp; drop an .xmp file and have its content imported automatically.</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/018.png" alt="018.png" width="789">
</div>
</div>
<div class="paragraph">
<p>If you want a custom editor for the data, you can follow the "Customize Details Panel" section to create a custom widget. Alternatively, you can override the <strong>OpenAssetEditor</strong> function in <strong>ExampleDataTypeActions</strong> to create a completely different editor. We won&#8217;t cover that here; search for "<strong>OpenAssetEditor</strong>" in the Unreal Engine source for examples.</p>
</div>
</div>
<div class="sect2">
<h3 id="_reimport">Reimport</h3>
<div class="paragraph">
<p>To reimport a file, we need to implement a separate factory class that also implements <strong>FReimportHandler</strong>. The implementation should be straightforward.</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/019.png" alt="019.png" width="299">
</div>
</div>
<div class="listingblock">
<div class="title">ReimportExampleDataFactory.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#pragma once
#include "ExampleDataFactory.h"
#include "ReimportExampleDataFactory.generated.h"

UCLASS()
class UReimportExampleDataFactory : public UExampleDataFactory, public FReimportHandler
{
    GENERATED_BODY()

    // Begin FReimportHandler interface
    virtual bool CanReimport(UObject* Obj, TArray&lt;FString&gt;&amp; OutFilenames) override;
    virtual void SetReimportPaths(UObject* Obj, const TArray&lt;FString&gt;&amp; NewReimportPaths) override;
    virtual EReimportResult::Type Reimport(UObject* Obj) override;
    // End FReimportHandler interface
};</code></pre>
</div>
</div>
<div class="listingblock">
<div class="title">ReimportExampleDataFactory.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include "ToolExampleEditor/ToolExampleEditor.h"
#include "ReimportExampleDataFactory.h"
#include "ExampleDataFactory.h"
#include "CustomDataType/ExampleData.h"

bool UReimportExampleDataFactory::CanReimport(UObject* Obj, TArray&lt;FString&gt;&amp; OutFilenames)
{
    UExampleData* ExampleData = Cast&lt;UExampleData&gt;(Obj);
    if (ExampleData)
    {
        OutFilenames.Add(UAssetImportData::ResolveImportFilename(ExampleData-&gt;SourceFilePath, ExampleData-&gt;GetOutermost()));
        return true;
    }
    return false;
}

void UReimportExampleDataFactory::SetReimportPaths(UObject* Obj, const TArray&lt;FString&gt;&amp; NewReimportPaths)
{
    UExampleData* ExampleData = Cast&lt;UExampleData&gt;(Obj);
    if (ExampleData &amp;&amp; ensure(NewReimportPaths.Num() == 1))
    {
        ExampleData-&gt;SourceFilePath = UAssetImportData::SanitizeImportFilename(NewReimportPaths[0], ExampleData-&gt;GetOutermost());
    }
}

EReimportResult::Type UReimportExampleDataFactory::Reimport(UObject* Obj)
{
    UExampleData* ExampleData = Cast&lt;UExampleData&gt;(Obj);
    if (!ExampleData)
    {
        return EReimportResult::Failed;
    }

    const FString Filename = UAssetImportData::ResolveImportFilename(ExampleData-&gt;SourceFilePath, ExampleData-&gt;GetOutermost());
    if (!FPaths::GetExtension(Filename).Equals(TEXT("xmp"), ESearchCase::IgnoreCase))
    {
        return EReimportResult::Failed;
    }

    CurrentFilename = Filename;
    FString Data;
    if (!FFileHelper::LoadFileToString(Data, *CurrentFilename))
    {
        // the source file could not be read, so report failure instead of success
        return EReimportResult::Failed;
    }

    const TCHAR* Ptr = *Data;
    ExampleData-&gt;Modify();
    ExampleData-&gt;MarkPackageDirty();

    UExampleDataFactory::MakeExampleDataFromText(ExampleData, Ptr, Ptr + Data.Len());

    // save the source file path
    ExampleData-&gt;SourceFilePath = UAssetImportData::SanitizeImportFilename(CurrentFilename, ExampleData-&gt;GetOutermost());

    return EReimportResult::Succeeded;
}</code></pre>
</div>
</div>
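<div class="paragraph">
<p>The control flow of <strong>Reimport</strong> can be summarized in four steps. First, find the recorded source file. Second, reject it if it has the wrong extension. Third, read it. Only then, fourth, overwrite the asset&#8217;s data. The standalone sketch below shows that flow using the C++ standard library. <strong>ExampleData</strong> and <strong>ReimportExampleData</strong> are hypothetical stand-ins, and plain file I/O takes the place of <strong>FFileHelper::LoadFileToString</strong>. If the read fails, the function returns Failed and leaves the asset untouched.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include &lt;cassert&gt;
#include &lt;cctype&gt;
#include &lt;fstream&gt;
#include &lt;sstream&gt;
#include &lt;string&gt;

enum class ReimportResult { Succeeded, Failed };

// Hypothetical stand-in for UExampleData.
struct ExampleData
{
    std::string ExampleString;
    std::string SourceFilePath;
};

// Re-reads the recorded source file; on any failure the asset is left unchanged.
ReimportResult ReimportExampleData(ExampleData&amp; Data)
{
    std::string Lower = Data.SourceFilePath;
    for (char&amp; C : Lower)
        C = static_cast&lt;char&gt;(std::tolower(static_cast&lt;unsigned char&gt;(C)));

    const std::string Ext = ".xmp";
    if (Lower.size() &lt; Ext.size() || Lower.compare(Lower.size() - Ext.size(), Ext.size(), Ext) != 0)
        return ReimportResult::Failed; // wrong file type

    std::ifstream File(Data.SourceFilePath, std::ios::binary);
    if (!File)
        return ReimportResult::Failed; // missing or unreadable file

    std::ostringstream Contents;
    Contents &lt;&lt; File.rdbuf();
    Data.ExampleString = Contents.str(); // only modify the asset after a successful read
    return ReimportResult::Succeeded;
}</code></pre>
</div>
</div>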
<div class="paragraph">
<p>And just for fun, let&#8217;s add "<strong>Reimport</strong>" to the right-click menu of this asset. This also shows how to add more actions for a specific asset type. Modify the <strong>ExampleDataTypeActions</strong> class:</p>
</div>
<div class="listingblock">
<div class="title">ExampleDataTypeActions.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">class FExampleDataTypeActions : public FAssetTypeActions_Base
{
public:
    ...
    virtual bool HasActions(const TArray&lt;UObject*&gt;&amp; InObjects) const override { return true; }
    virtual void GetActions(const TArray&lt;UObject*&gt;&amp; InObjects, FMenuBuilder&amp; MenuBuilder) override;

    void ExecuteReimport(TArray&lt;TWeakObjectPtr&lt;UExampleData&gt;&gt; Objects);
};</code></pre>
</div>
</div>
<div class="listingblock">
<div class="title">ExampleDataTypeActions.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">void FExampleDataTypeActions::GetActions(const TArray&lt;UObject*&gt;&amp; InObjects, FMenuBuilder&amp; MenuBuilder)
{
    auto ExampleDataImports = GetTypedWeakObjectPtrs&lt;UExampleData&gt;(InObjects);

    MenuBuilder.AddMenuEntry(
        FText::FromString("Reimport"),
        FText::FromString("Reimports example data."),
        FSlateIcon(),
        FUIAction(
            FExecuteAction::CreateSP(this, &amp;FExampleDataTypeActions::ExecuteReimport, ExampleDataImports),
            FCanExecuteAction()
        )
    );
}

void FExampleDataTypeActions::ExecuteReimport(TArray&lt;TWeakObjectPtr&lt;UExampleData&gt;&gt; Objects)
{
    for (auto ObjIt = Objects.CreateConstIterator(); ObjIt; ++ObjIt)
    {
        auto Object = (*ObjIt).Get();
        if (Object)
        {
            FReimportManager::Instance()-&gt;Reimport(Object, /*bAskForNewFileIfMissing=*/true);
        }
    }
}</code></pre>
</div>
</div>
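<div class="paragraph">
<p><strong>GetActions</strong> captures weak pointers when the menu is built. <strong>ExecuteReimport</strong> then skips any object that was destroyed before the user clicked. The same idea can be expressed in standard C++ with <strong>std::weak_ptr</strong> in place of <strong>TWeakObjectPtr</strong>. In the sketch below, <strong>MakeWeakList</strong> and <strong>ExecuteOnLiveObjects</strong> are hypothetical names, and recording the object&#8217;s name stands in for the actual reimport call.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include &lt;cassert&gt;
#include &lt;memory&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical stand-in for UExampleData.
struct ExampleData
{
    std::string Name;
};

// Capture weak references when the menu is built, like GetTypedWeakObjectPtrs.
std::vector&lt;std::weak_ptr&lt;ExampleData&gt;&gt; MakeWeakList(const std::vector&lt;std::shared_ptr&lt;ExampleData&gt;&gt;&amp; Objects)
{
    return std::vector&lt;std::weak_ptr&lt;ExampleData&gt;&gt;(Objects.begin(), Objects.end());
}

// Run the action later, skipping objects destroyed in the meantime.
int ExecuteOnLiveObjects(const std::vector&lt;std::weak_ptr&lt;ExampleData&gt;&gt;&amp; Objects, std::vector&lt;std::string&gt;&amp; OutVisited)
{
    int Count = 0;
    for (const auto&amp; Weak : Objects)
    {
        if (const auto Object = Weak.lock())
        {
            OutVisited.push_back(Object-&gt;Name); // stands in for the per-object reimport call
            ++Count;
        }
    }
    return Count;
}</code></pre>
</div>
</div>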
<div class="paragraph">
<p>Now you can reimport your custom files.</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/020.png" alt="020.png" width="405">
</div>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_custom_editor_mode">Custom Editor Mode</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Editor Mode is probably the most powerful tool framework in Unreal. An editor mode receives all user input and can react to it. It can render to the viewport. It can also monitor any change in the scene and receive Undo/Redo events. Remember how you can enter a mode and paint foliage over objects? You can do the same kind of thing in a custom editor mode. Editor Mode also has a dedicated section in the UI layout, and you can customize the widget there as well.</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/021.png" alt="021.png" width="480">
</div>
</div>
<div class="paragraph">
<p>As an example, we will create an editor mode for a simple task. We have an actor "ExampleTargetPoint" that inherits from "TargetPoint" and holds a list of locations. In this editor mode we want to visualize those points. You can create or delete points, and you can move them around just like normal objects. Note that this is not the best way to implement this functionality, since you can use MakeEditWidget in UPROPERTY to do it easily. Instead, it demonstrates how to set up an editor mode and what you can potentially do with one.</p>
</div>
<div class="sect2">
<h3 id="_setup_editor_mode">Setup Editor Mode</h3>
<div class="paragraph">
<p>First we need to create an icon for our editor mode. We make a 40x40 PNG file at \Content\EditorResources\IconExampleEditorMode.png.</p>
</div>
<div class="paragraph">
<p>Then add the following files to the editor module:</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/022.png" alt="022.png" width="251">
</div>
</div>
<div class="paragraph">
<p><strong>SExampleEdModeWidget</strong> is the widget we use in the "Modes" panel. For now we will just create a simple one. We also include a commonly used utility function to get the EdMode object.</p>
</div>
<div class="listingblock">
<div class="title">SExampleEdModeWidget.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#pragma once
#include "SlateApplication.h"

class SExampleEdModeWidget : public SCompoundWidget
{
public:
    SLATE_BEGIN_ARGS(SExampleEdModeWidget) {}
    SLATE_END_ARGS();

    void Construct(const FArguments&amp; InArgs);

    // Util Functions
    class FExampleEdMode* GetEdMode() const;
};</code></pre>
</div>
</div>
<div class="listingblock">
<div class="title">SExampleEdModeWidget.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include "ToolExampleEditor/ToolExampleEditor.h"
#include "ExampleEdMode.h"
#include "SExampleEdModeWidget.h"

void SExampleEdModeWidget::Construct(const FArguments&amp; InArgs)
{
    ChildSlot
    [
        SNew(SScrollBox)
        + SScrollBox::Slot()
        .VAlign(VAlign_Top)
        .Padding(5.f)
        [
            SNew(STextBlock)
            .Text(FText::FromString(TEXT("This is an editor mode example.")))
        ]
    ];
}

FExampleEdMode* SExampleEdModeWidget::GetEdMode() const
{
    return (FExampleEdMode*)GLevelEditorModeTools().GetActiveMode(FExampleEdMode::EM_Example);
}</code></pre>
</div>
</div>
<div class="paragraph">
<p><strong>ExampleEdModeToolkit</strong> is a middle layer between EdMode and its widget:</p>
</div>
<div class="listingblock">
<div class="title">ExampleEdModeToolkit.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#pragma once
#include "BaseToolkit.h"
#include "ExampleEdMode.h"
#include "SExampleEdModeWidget.h"

class FExampleEdModeToolkit: public FModeToolkit
{
public:
    FExampleEdModeToolkit()
    {
        SAssignNew(ExampleEdModeWidget, SExampleEdModeWidget);
    }

    /** IToolkit interface */
    virtual FName GetToolkitFName() const override { return FName("ExampleEdMode"); }
    virtual FText GetBaseToolkitName() const override { return NSLOCTEXT("ExampleEdModeToolkit", "DisplayName", "Example Editor Mode"); }
    virtual class FEdMode* GetEditorMode() const override { return GLevelEditorModeTools().GetActiveMode(FExampleEdMode::EM_Example); }
    virtual TSharedPtr&lt;class SWidget&gt; GetInlineContent() const override { return ExampleEdModeWidget; }

private:
    TSharedPtr&lt;SExampleEdModeWidget&gt; ExampleEdModeWidget;
};</code></pre>
</div>
</div>
<div class="paragraph">
<p>Next is the main class, <strong>ExampleEdMode</strong>. Since we are only setting it up here, we leave it mostly empty. For now it only defines its ID and creates the toolkit object. We will fill it in heavily in the next section.</p>
</div>
<div class="listingblock">
<div class="title">ExampleEdMode.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#pragma once
#include "EditorModes.h"

class FExampleEdMode : public FEdMode
{
public:
    const static FEditorModeID EM_Example;

    // FEdMode interface
    virtual void Enter() override;
    virtual void Exit() override;
};</code></pre>
</div>
</div>
<div class="listingblock">
<div class="title">ExampleEdMode.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include "ToolExampleEditor/ToolExampleEditor.h"
#include "Editor/UnrealEd/Public/Toolkits/ToolkitManager.h"
#include "ScopedTransaction.h"
#include "ExampleEdModeToolkit.h"
#include "ExampleEdMode.h"

const FEditorModeID FExampleEdMode::EM_Example(TEXT("EM_Example"));

void FExampleEdMode::Enter()
{
    FEdMode::Enter();

    if (!Toolkit.IsValid())
    {
        Toolkit = MakeShareable(new FExampleEdModeToolkit);
        Toolkit-&gt;Init(Owner-&gt;GetToolkitHost());
    }
}

void FExampleEdMode::Exit()
{
    FToolkitManager::Get().CloseToolkit(Toolkit.ToSharedRef());
    Toolkit.Reset();

    FEdMode::Exit();
}</code></pre>
</div>
</div>
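<div class="paragraph">
<p><strong>Enter</strong> and <strong>Exit</strong> give the toolkit the same lifetime as the active mode. The toolkit is created lazily the first time the mode is entered and released when the mode is left. Below is a minimal standard C++ sketch of that lifecycle. <strong>ExampleMode</strong> and <strong>Toolkit</strong> are hypothetical types, <strong>std::unique_ptr</strong> stands in for the shared toolkit pointer, and no toolkit host is involved.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include &lt;cassert&gt;
#include &lt;memory&gt;

// Hypothetical toolkit that counts live instances so the lifecycle is observable.
struct Toolkit
{
    static int LiveCount;
    Toolkit() { ++LiveCount; }
    ~Toolkit() { --LiveCount; }
};
int Toolkit::LiveCount = 0;

// Hypothetical mode mirroring FExampleEdMode::Enter/Exit.
class ExampleMode
{
public:
    void Enter()
    {
        if (!ModeToolkit) // create lazily; entering twice does not create a second toolkit
            ModeToolkit = std::make_unique&lt;Toolkit&gt;();
    }

    void Exit()
    {
        ModeToolkit.reset(); // close and release the toolkit when leaving the mode
    }

    bool HasToolkit() const { return ModeToolkit != nullptr; }

private:
    std::unique_ptr&lt;Toolkit&gt; ModeToolkit;
};</code></pre>
</div>
</div>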
<div class="paragraph">
<p>As with other tools, we need a tool class to handle registration. Here we need to register both the editor mode and its icon.</p>
</div>
<div class="listingblock">
<div class="title">ExampleEdModeTool.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#pragma once
#include "ToolExampleEditor/ExampleTabToolBase.h"

class ExampleEdModeTool : public FExampleTabToolBase
{
public:
    virtual void OnStartupModule() override;
    virtual void OnShutdownModule() override;

    virtual ~ExampleEdModeTool() {}
private:
    static TSharedPtr&lt; class FSlateStyleSet &gt; StyleSet;

    void RegisterStyleSet();
    void UnregisterStyleSet();

    void RegisterEditorMode();
    void UnregisterEditorMode();
};</code></pre>
</div>
</div>
<div class="listingblock">
<div class="title">ExampleEdModeTool.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include "ToolExampleEditor/ToolExampleEditor.h"
#include "ExampleEdModeTool.h"
#include "ExampleEdMode.h"

#define IMAGE_BRUSH(RelativePath, ...) FSlateImageBrush(StyleSet-&gt;RootToContentDir(RelativePath, TEXT(".png")), __VA_ARGS__)

TSharedPtr&lt; FSlateStyleSet &gt; ExampleEdModeTool::StyleSet = nullptr;

void ExampleEdModeTool::OnStartupModule()
{
    RegisterStyleSet();
    RegisterEditorMode();
}

void ExampleEdModeTool::OnShutdownModule()
{
    UnregisterStyleSet();
    UnregisterEditorMode();
}

void ExampleEdModeTool::RegisterStyleSet()
{
    // Const icon sizes
    const FVector2D Icon20x20(20.0f, 20.0f);
    const FVector2D Icon40x40(40.0f, 40.0f);

    // Only register once
    if (StyleSet.IsValid())
    {
        return;
    }

    StyleSet = MakeShareable(new FSlateStyleSet("ExampleEdModeToolStyle"));
    StyleSet-&gt;SetContentRoot(FPaths::GameDir() / TEXT("Content/EditorResources"));
    StyleSet-&gt;SetCoreContentRoot(FPaths::GameDir() / TEXT("Content/EditorResources"));

    // Example editor mode icons
    {
        StyleSet-&gt;Set("ExampleEdMode", new IMAGE_BRUSH(TEXT("IconExampleEditorMode"), Icon40x40));
        StyleSet-&gt;Set("ExampleEdMode.Small", new IMAGE_BRUSH(TEXT("IconExampleEditorMode"), Icon20x20));
    }

    FSlateStyleRegistry::RegisterSlateStyle(*StyleSet.Get());
}

void ExampleEdModeTool::UnregisterStyleSet()
{
    if (StyleSet.IsValid())
    {
        FSlateStyleRegistry::UnRegisterSlateStyle(*StyleSet.Get());
        ensure(StyleSet.IsUnique());
        StyleSet.Reset();
    }
}

void ExampleEdModeTool::RegisterEditorMode()
{
    FEditorModeRegistry::Get().RegisterMode&lt;FExampleEdMode&gt;(
        FExampleEdMode::EM_Example,
        FText::FromString("Example Editor Mode"),
        FSlateIcon(StyleSet-&gt;GetStyleSetName(), "ExampleEdMode", "ExampleEdMode.Small"),
        /*bVisible=*/true, /*PriorityOrder=*/500
        );
}

void ExampleEdModeTool::UnregisterEditorMode()
{
    FEditorModeRegistry::Get().UnregisterMode(FExampleEdMode::EM_Example);
}

#undef IMAGE_BRUSH</code></pre>
</div>
</div>
<div class="paragraph">
<p>Finally, as usual, we add the tool to the editor module in <strong>FToolExampleEditor::AddModuleListeners</strong>:</p>
</div>
<div class="listingblock">
<div class="title">ToolExampleEditor.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">ModuleListeners.Add(MakeShareable(new ExampleEdModeTool));</code></pre>
</div>
</div>
<div class="paragraph">
<p>Now you should see our custom editor mode show up in the "Modes" panel.</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/023.png" alt="023.png" width="457">
</div>
</div>
</div>
<div class="sect2">
<h3 id="_render_and_click">Render and Click</h3>
<div class="paragraph">
<p>With the basic framework ready, we can start implementing the tool logic. First we make the <strong>ExampleTargetPoint</strong> class in the game module. This actor holds the points data, and it is what our tool will operate on. Again, remember to export the class with <strong>TOOLEXAMPLE_API</strong>.</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/024.png" alt="024.png" width="222">
</div>
</div>
<div class="listingblock">
<div class="title">ExampleTargetPoint.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#pragma once
#include "Engine/TargetPoint.h"
#include "ExampleTargetPoint.generated.h"

UCLASS()
class TOOLEXAMPLE_API AExampleTargetPoint : public ATargetPoint
{
    GENERATED_BODY()

public:
    UPROPERTY(EditAnywhere, Category = "Points")
    TArray&lt;FVector&gt; Points;
};</code></pre>
</div>
</div>
<div class="paragraph">
<p>Now we modify <strong>ExampleEdMode</strong> to add functions that add a point, remove a point, and select a point. We also save the current selection in member variables. For the actor, we use a weak object pointer so that we handle the case where the actor is removed.</p>
</div>
<div class="paragraph">
<p>Adding a point is only allowed when exactly one <strong>ExampleTargetPoint</strong> actor is selected in the editor. Removing a point simply removes the currently selected point, if there is one. When you select a point, we deselect all actors and select the actor that owns that point.</p>
</div>
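<div class="paragraph">
<p>The selection rules above combine a weakly referenced actor with an index that must stay in range. They can be sketched in standard C++ as follows. <strong>TargetPoint</strong>, <strong>Selection</strong> and <strong>RemoveSelectedPoint</strong> are hypothetical stand-ins for the actor, the <strong>currentSelectedTarget</strong>/<strong>currentSelectedIndex</strong> pair, and <strong>RemovePoint</strong>, respectively. <strong>std::weak_ptr</strong> replaces <strong>TWeakObjectPtr</strong>, and points are plain ints to keep the sketch short.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include &lt;cassert&gt;
#include &lt;memory&gt;
#include &lt;vector&gt;

// Hypothetical actor holding a list of points.
struct TargetPoint
{
    std::vector&lt;int&gt; Points;
};

// Selected actor plus point index; weak so a deleted actor is not kept alive.
struct Selection
{
    std::weak_ptr&lt;TargetPoint&gt; Target;
    int Index = -1;

    // Valid only if the actor still exists and the index is in range.
    bool IsValid() const
    {
        const auto Actor = Target.lock();
        return Actor &amp;&amp; Index &gt;= 0 &amp;&amp; Index &lt; static_cast&lt;int&gt;(Actor-&gt;Points.size());
    }
};

// Removes the selected point and clears the selection, like RemovePoint + SelectPoint(nullptr, -1).
bool RemoveSelectedPoint(Selection&amp; Sel)
{
    if (!Sel.IsValid())
        return false;
    const auto Actor = Sel.Target.lock();
    Actor-&gt;Points.erase(Actor-&gt;Points.begin() + Sel.Index);
    Sel.Target.reset();
    Sel.Index = -1;
    return true;
}</code></pre>
</div>
</div>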
<div class="paragraph">
<p>Note that we create an <strong>FScopedTransaction</strong> and call the <strong>Modify()</strong> function whenever we change data that needs to be saved. This ensures that undo/redo is handled properly.</p>
</div>
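<div class="paragraph">
<p>To see why both pieces matter, here is a deliberately simplified model in standard C++. In this model, the scoped object opens a transaction for its lifetime, and <strong>Modify</strong> snapshots the object before its first change so that undo can restore it. This is not how the Unreal transaction system is implemented. <strong>UndoBuffer</strong>, <strong>ScopedTransaction</strong> and <strong>TargetPoint</strong> are hypothetical, and the buffer tracks only one object per transaction.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include &lt;cassert&gt;
#include &lt;string&gt;
#include &lt;vector&gt;

// Hypothetical actor state: points are plain ints to keep the sketch small.
struct TargetPoint
{
    std::vector&lt;int&gt; Points;
};

// Simplified undo buffer: each transaction stores one snapshot taken by Modify().
class UndoBuffer
{
public:
    void Begin(const std::string&amp; Description)
    {
        Pending = Entry{Description, nullptr, {}};
        Active = true;
    }

    // Record the object's state before the first change in the current transaction.
    void Modify(TargetPoint&amp; Object)
    {
        if (Active &amp;&amp; Pending.Object == nullptr)
        {
            Pending.Object = &amp;Object;
            Pending.Before = Object;
        }
    }

    void End()
    {
        if (Active &amp;&amp; Pending.Object != nullptr) // nothing recorded means nothing to undo
            History.push_back(Pending);
        Active = false;
    }

    bool Undo()
    {
        if (History.empty())
            return false;
        *History.back().Object = History.back().Before; // restore the pre-transaction state
        History.pop_back();
        return true;
    }

private:
    struct Entry
    {
        std::string Description;
        TargetPoint* Object;
        TargetPoint Before;
    };

    Entry Pending{};
    bool Active = false;
    std::vector&lt;Entry&gt; History;
};

// RAII scope, similar in spirit to FScopedTransaction.
class ScopedTransaction
{
public:
    ScopedTransaction(UndoBuffer&amp; InBuffer, const std::string&amp; Description) : Buffer(InBuffer) { Buffer.Begin(Description); }
    ~ScopedTransaction() { Buffer.End(); }

private:
    UndoBuffer&amp; Buffer;
};</code></pre>
</div>
</div>
<div class="paragraph">
<p>In this model, a change made inside the scope without calling <strong>Modify</strong> is never recorded and cannot be undone. This roughly mirrors what happens when you forget <strong>Modify()</strong> on an actor.</p>
</div>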
<div class="listingblock">
<div class="title">ExampleEdMode.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">...
class AExampleTargetPoint;

class FExampleEdMode : public FEdMode
{
public:
    ...
    void AddPoint();
    bool CanAddPoint() const;
    void RemovePoint();
    bool CanRemovePoint() const;
    bool HasValidSelection() const;
    void SelectPoint(AExampleTargetPoint* actor, int32 index);

    TWeakObjectPtr&lt;AExampleTargetPoint&gt; currentSelectedTarget;
    int32 currentSelectedIndex = -1;
};</code></pre>
</div>
</div>
<div class="listingblock">
<div class="title">ExampleEdMode.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">void FExampleEdMode::Enter()
{
    ...

    // reset
    currentSelectedTarget = nullptr;
    currentSelectedIndex = -1;
}

AExampleTargetPoint* GetSelectedTargetPointActor()
{
    TArray&lt;UObject*&gt; selectedObjects;
    GEditor-&gt;GetSelectedActors()-&gt;GetSelectedObjects(selectedObjects);
    if (selectedObjects.Num() == 1)
    {
        return Cast&lt;AExampleTargetPoint&gt;(selectedObjects[0]);
    }
    return nullptr;
}

void FExampleEdMode::AddPoint()
{
    AExampleTargetPoint* actor = GetSelectedTargetPointActor();
    if (actor)
    {
        const FScopedTransaction Transaction(FText::FromString("Add Point"));

        // add new point, slightly in front of camera
        FEditorViewportClient* client = (FEditorViewportClient*)GEditor-&gt;GetActiveViewport()-&gt;GetClient();
        FVector newPoint = client-&gt;GetViewLocation() + client-&gt;GetViewRotation().Vector() * 50.f;
        actor-&gt;Modify();
        actor-&gt;Points.Add(newPoint);
        // auto select this new point
        SelectPoint(actor, actor-&gt;Points.Num() - 1);
    }
}

bool FExampleEdMode::CanAddPoint() const
{
    return GetSelectedTargetPointActor() != nullptr;
}

void FExampleEdMode::RemovePoint()
{
    if (HasValidSelection())
    {
        const FScopedTransaction Transaction(FText::FromString("Remove Point"));

        currentSelectedTarget-&gt;Modify();
        currentSelectedTarget-&gt;Points.RemoveAt(currentSelectedIndex);
        // deselect the point
        SelectPoint(nullptr, -1);
    }
}

bool FExampleEdMode::CanRemovePoint() const
{
    return HasValidSelection();
}

bool FExampleEdMode::HasValidSelection() const
{
    return currentSelectedTarget.IsValid() &amp;&amp; currentSelectedIndex &gt;= 0 &amp;&amp; currentSelectedIndex &lt; currentSelectedTarget-&gt;Points.Num();
}

void FExampleEdMode::SelectPoint(AExampleTargetPoint* actor, int32 index)
{
    currentSelectedTarget = actor;
    currentSelectedIndex = index;

    // select this actor only
    if (currentSelectedTarget.IsValid())
    {
        GEditor-&gt;SelectNone(true, true);
        GEditor-&gt;SelectActor(currentSelectedTarget.Get(), true, true);
    }
}</code></pre>
</div>
</div>
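<div class="paragraph">
<p><strong>AddPoint</strong> places the new point 50 units along the camera&#8217;s view direction. Assuming Unreal&#8217;s X-forward, Y-right, Z-up convention, the unit direction that <strong>FRotator::Vector()</strong> returns for pitch P and yaw Y is (cos P cos Y, cos P sin Y, sin P). The standalone sketch below reproduces that arithmetic. It uses a hypothetical <strong>Vec3</strong> type and takes angles in degrees.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include &lt;cassert&gt;
#include &lt;cmath&gt;

// Hypothetical minimal vector type.
struct Vec3
{
    double X, Y, Z;
};

// Unit view direction for pitch/yaw in degrees, assuming X forward, Y right, Z up.
Vec3 DirectionFromPitchYaw(double PitchDeg, double YawDeg)
{
    const double DegToRad = 3.14159265358979323846 / 180.0;
    const double P = PitchDeg * DegToRad;
    const double Y = YawDeg * DegToRad;
    return {std::cos(P) * std::cos(Y), std::cos(P) * std::sin(Y), std::sin(P)};
}

// Location Distance units in front of the camera, like ViewLocation + ViewRotation.Vector() * 50.
Vec3 PointInFrontOfCamera(const Vec3&amp; CameraLocation, double PitchDeg, double YawDeg, double Distance)
{
    const Vec3 Dir = DirectionFromPitchYaw(PitchDeg, YawDeg);
    return {CameraLocation.X + Dir.X * Distance,
            CameraLocation.Y + Dir.Y * Distance,
            CameraLocation.Z + Dir.Z * Distance};
}</code></pre>
</div>
</div>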
<div class="paragraph">
<p>Now that the functionality is ready, we still need to hook it up to the UI. Modify <strong>SExampleEdModeWidget</strong> to add "Add" and "Remove" buttons. We check "CanAddPoint" and "CanRemovePoint" to determine whether each button should be enabled.</p>
</div>
<div class="listingblock">
<div class="title">SExampleEdModeWidget.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">class SExampleEdModeWidget : public SCompoundWidget
{
public:
    ...
    FReply OnAddPoint();
    bool CanAddPoint() const;
    FReply OnRemovePoint();
    bool CanRemovePoint() const;
};</code></pre>
</div>
</div>
<div class="listingblock">
<div class="title">SExampleEdModeWidget.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">void SExampleEdModeWidget::Construct(const FArguments&amp; InArgs)
{
    ChildSlot
    [
        SNew(SScrollBox)
        + SScrollBox::Slot()
        .VAlign(VAlign_Top)
        .Padding(5.f)
        [
            SNew(SVerticalBox)
            + SVerticalBox::Slot()
            .AutoHeight()
            .Padding(0.f, 5.f, 0.f, 0.f)
            [
                SNew(STextBlock)
                .Text(FText::FromString(TEXT("This is an editor mode example.")))
            ]
            + SVerticalBox::Slot()
            .AutoHeight()
            .Padding(0.f, 5.f, 0.f, 0.f)
            [
                SNew(SHorizontalBox)
                + SHorizontalBox::Slot()
                .AutoWidth()
                .Padding(2, 0, 0, 0)
                .VAlign(VAlign_Center)
                [
                    SNew(SButton)
                    .Text(FText::FromString("Add"))
                    .OnClicked(this, &amp;SExampleEdModeWidget::OnAddPoint)
                    .IsEnabled(this, &amp;SExampleEdModeWidget::CanAddPoint)
                ]
                + SHorizontalBox::Slot()
                .AutoWidth()
                .VAlign(VAlign_Center)
                .Padding(0, 0, 2, 0)
                [
                    SNew(SButton)
                    .Text(FText::FromString("Remove"))
                    .OnClicked(this, &amp;SExampleEdModeWidget::OnRemovePoint)
                    .IsEnabled(this, &amp;SExampleEdModeWidget::CanRemovePoint)
                ]
            ]
        ]
    ];
}

FReply SExampleEdModeWidget::OnAddPoint()
{
    GetEdMode()-&gt;AddPoint();
    return FReply::Handled();
}

bool SExampleEdModeWidget::CanAddPoint() const
{
    return GetEdMode()-&gt;CanAddPoint();
}

FReply SExampleEdModeWidget::OnRemovePoint()
{
    GetEdMode()-&gt;RemovePoint();
    return FReply::Handled();
}

bool SExampleEdModeWidget::CanRemovePoint() const
{
    return GetEdMode()-&gt;CanRemovePoint();
}</code></pre>
</div>
</div>
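<div class="paragraph">
<p>Binding <strong>.IsEnabled(this, &amp;SExampleEdModeWidget::CanAddPoint)</strong> makes the enabled state a callback that is re-evaluated, rather than a value set once at construction. Below is a rough standard C++ model of that behavior. It uses a hypothetical <strong>Button</strong> class, with <strong>std::function</strong> in place of Slate attributes and delegates.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include &lt;cassert&gt;
#include &lt;functional&gt;
#include &lt;string&gt;
#include &lt;utility&gt;

// Hypothetical button whose enabled state is a callback, evaluated each time it is needed.
class Button
{
public:
    Button(std::string InLabel, std::function&lt;void()&gt; InOnClicked, std::function&lt;bool()&gt; InIsEnabled)
        : Label(std::move(InLabel)), OnClicked(std::move(InOnClicked)), IsEnabled(std::move(InIsEnabled))
    {
    }

    const std::string&amp; GetLabel() const { return Label; }
    bool Enabled() const { return IsEnabled(); }

    // Returns true if the click was handled; a disabled button ignores it.
    bool Click()
    {
        if (!IsEnabled())
            return false;
        OnClicked();
        return true;
    }

private:
    std::string Label;
    std::function&lt;void()&gt; OnClicked;
    std::function&lt;bool()&gt; IsEnabled;
};</code></pre>
</div>
</div>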
<div class="paragraph">
<p>Now if you launch the editor, you should be able to drag in an "Example Target Point", switch to our editor mode, select that target point, and add new points from the editor mode UI. However, the points are not visualized in the viewport yet, and you cannot click to select a point. We will work on that next.</p>
</div>
<div class="paragraph">
<p>To be able to click in the editor and select something, we need to define a HitProxy struct. When we render the points, we render them with this hit proxy, which carries some data. Then, when we get the click event, we can retrieve that data from the proxy and know what we clicked on.</p>
</div>
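<div class="paragraph">
<p>Conceptually, hit testing works in two parts. While drawing, the renderer records which proxy (if any) was active for each pixel. When the user clicks, the editor looks that pixel up. The standard C++ model below is a simplification; Unreal actually renders proxy IDs into a separate hit proxy buffer rather than keeping a map. <strong>PointProxy</strong> and <strong>HitProxyCanvas</strong> are hypothetical. The model also shows why the proxy must be cleared after each point: otherwise anything drawn afterwards would become clickable as that point.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include &lt;cassert&gt;
#include &lt;map&gt;
#include &lt;memory&gt;
#include &lt;utility&gt;

// Data attached to a clickable point, like HExamplePointProxy's RefObject and Index.
struct PointProxy
{
    int ActorId;
    int Index;
};

// Hypothetical hit-proxy buffer: remembers which proxy (if any) was active when each pixel was drawn.
class HitProxyCanvas
{
public:
    void SetHitProxy(std::shared_ptr&lt;PointProxy&gt; Proxy) { Current = std::move(Proxy); }

    void DrawPoint(int X, int Y)
    {
        const std::pair&lt;int, int&gt; Key(X, Y);
        if (Current)
            Pixels[Key] = Current;
        else
            Pixels.erase(Key); // drawn with no proxy: this pixel is no longer clickable
    }

    // What a click at (X, Y) hits, or nullptr if nothing clickable was drawn there.
    std::shared_ptr&lt;PointProxy&gt; HitTest(int X, int Y) const
    {
        const auto It = Pixels.find(std::make_pair(X, Y));
        if (It == Pixels.end())
            return nullptr;
        return It-&gt;second;
    }

private:
    std::shared_ptr&lt;PointProxy&gt; Current;
    std::map&lt;std::pair&lt;int, int&gt;, std::shared_ptr&lt;PointProxy&gt;&gt; Pixels;
};</code></pre>
</div>
</div>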
<div class="paragraph">
<p>Back in <strong>ExampleEdMode</strong>, we define <strong>HExamplePointProxy</strong> with a reference object (the ExampleTargetPoint actor) and the point index, and we add <strong>Render</strong> and <strong>HandleClick</strong> overrides.</p>
</div>
<div class="listingblock">
<div class="title">ExampleEdMode.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">struct HExamplePointProxy : public HHitProxy
{
    DECLARE_HIT_PROXY();

    HExamplePointProxy(UObject* InRefObject, int32 InIndex)
        : HHitProxy(HPP_UI), RefObject(InRefObject), Index(InIndex)
    {}

    UObject* RefObject;
    int32 Index;
};

class FExampleEdMode : public FEdMode
{
public:
    ...
    virtual void Render(const FSceneView* View, FViewport* Viewport, FPrimitiveDrawInterface* PDI) override;
    virtual bool HandleClick(FEditorViewportClient* InViewportClient, HHitProxy *HitProxy, const FViewportClick &amp;Click) override;
};</code></pre>
</div>
</div>
<div class="paragraph">
<p>Then in the cpp file, we use the <strong>IMPLEMENT_HIT_PROXY</strong> macro to implement the proxy. In <strong>Render</strong>, we simply loop through all <strong>ExampleTargetPoint</strong> actors and draw all of their points (plus a line from each point to the actor itself), choosing a different color for the currently selected point. We set a hit proxy for each point before drawing it and clear it immediately afterwards (this is important so the proxy doesn&#8217;t leak into other draws). In <strong>HandleClick</strong>, we test the hit proxy and select the point if we have a valid hit. We don&#8217;t check the mouse button here, so you can select with a left, right, or middle click.</p>
</div>
<div class="listingblock">
<div class="title">ExampleEdMode.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">IMPLEMENT_HIT_PROXY(HExamplePointProxy, HHitProxy);
...

void FExampleEdMode::Render(const FSceneView* View, FViewport* Viewport, FPrimitiveDrawInterface* PDI)
{
    const FColor normalColor(200, 200, 200);
    const FColor selectedColor(255, 128, 0);

    UWorld* World = GetWorld();
    for (TActorIterator&lt;AExampleTargetPoint&gt; It(World); It; ++It)
    {
        AExampleTargetPoint* actor = (*It);
        if (actor)
        {
            FVector actorLoc = actor-&gt;GetActorLocation();
            for (int i = 0; i &lt; actor-&gt;Points.Num(); ++i)
            {
                bool bSelected = (actor == currentSelectedTarget &amp;&amp; i == currentSelectedIndex);
                const FColor&amp; color = bSelected ? selectedColor : normalColor;
                // set hit proxy and draw
                PDI-&gt;SetHitProxy(new HExamplePointProxy(actor, i));
                PDI-&gt;DrawPoint(actor-&gt;Points[i], color, 15.f, SDPG_Foreground);
                PDI-&gt;DrawLine(actor-&gt;Points[i], actorLoc, color, SDPG_Foreground);
                PDI-&gt;SetHitProxy(NULL);
            }
        }
    }

    FEdMode::Render(View, Viewport, PDI);
}

bool FExampleEdMode::HandleClick(FEditorViewportClient* InViewportClient, HHitProxy *HitProxy, const FViewportClick &amp;Click)
{
    bool isHandled = false;

    if (HitProxy)
    {
        if (HitProxy-&gt;IsA(HExamplePointProxy::StaticGetType()))
        {
            isHandled = true;
            HExamplePointProxy* examplePointProxy = (HExamplePointProxy*)HitProxy;
            AExampleTargetPoint* actor = Cast&lt;AExampleTargetPoint&gt;(examplePointProxy-&gt;RefObject);
            int32 index = examplePointProxy-&gt;Index;
            if (actor &amp;&amp; index &gt;= 0 &amp;&amp; index &lt; actor-&gt;Points.Num())
            {
                SelectPoint(actor, index);
            }
        }
    }

    return isHandled;
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>With all of these you can start adding/removing points in the editor:</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/025.png" alt="025.png" width="1325">
</div>
</div>
</div>
<div class="sect2">
<h3 id="_use_transform_widget">Use Transform Widget</h3>
<div class="paragraph">
<p>The next goal is to be able to move points around in the editor, just like moving any other actor. Back in <strong>ExampleEdMode</strong>, this time we need to add support for a custom transform widget and handle the <strong>InputDelta</strong> event. In the <strong>InputDelta</strong> function we don&#8217;t use <strong>FScopedTransaction</strong>, because undo/redo is already handled for this function. We still need to call <strong>Modify()</strong>, though.</p>
</div>
<div class="listingblock">
<div class="title">ExampleEdMode.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">...
class FExampleEdMode : public FEdMode
{
public:
    ...
    virtual bool InputDelta(FEditorViewportClient* InViewportClient, FViewport* InViewport, FVector&amp; InDrag, FRotator&amp; InRot, FVector&amp; InScale) override;
    virtual bool ShowModeWidgets() const override;
    virtual bool ShouldDrawWidget() const override;
    virtual bool UsesTransformWidget() const override;
    virtual FVector GetWidgetLocation() const override;
};</code></pre>
</div>
</div>
<div class="listingblock">
<div class="title">ExampleEdMode.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">bool FExampleEdMode::InputDelta(FEditorViewportClient* InViewportClient, FViewport* InViewport, FVector&amp; InDrag, FRotator&amp; InRot, FVector&amp; InScale)
{
    if (InViewportClient-&gt;GetCurrentWidgetAxis() == EAxisList::None)
    {
        return false;
    }

    if (HasValidSelection())
    {
        if (!InDrag.IsZero())
        {
            currentSelectedTarget-&gt;Modify();
            currentSelectedTarget-&gt;Points[currentSelectedIndex] += InDrag;
        }
        return true;
    }

    return false;
}

bool FExampleEdMode::ShowModeWidgets() const
{
    return true;
}

bool FExampleEdMode::ShouldDrawWidget() const
{
    return true;
}

bool FExampleEdMode::UsesTransformWidget() const
{
    return true;
}

FVector FExampleEdMode::GetWidgetLocation() const
{
    if (HasValidSelection())
    {
        return currentSelectedTarget-&gt;Points[currentSelectedIndex];
    }
    return FEdMode::GetWidgetLocation();
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Now you should have a transform widget to move your points around:</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/026.png" alt="026.png" width="475">
</div>
</div>
<div class="paragraph">
<p>If you want the widget to use a custom orientation (for example, to align it with a local space instead of world space), <strong>FEdMode</strong> also provides the following virtual functions, which let your mode supply the coordinate system used for drawing the widget and for interpreting input:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">virtual bool GetCustomDrawingCoordinateSystem(FMatrix&amp; InMatrix, void* InData) override;
virtual bool GetCustomInputCoordinateSystem(FMatrix&amp; InMatrix, void* InData) override;</code></pre>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_key_input_support_right_click_menu_and_others">Key input support, right click menu, and others</h3>
<div class="paragraph">
<p>Next we will add some other common features. When a point is selected, pressing the Delete key should remove it. We also want a context menu to appear when you right-click a point, showing the point index and an option to delete it.</p>
</div>
<div class="paragraph">
<p>Remember that in the "Menu Tool" tutorial we needed a UI command list to make a menu; here we will do the same thing. We also override the <strong>InputKey</strong> function to handle input. We could simply call functions based on which key is pressed. However, the same functionality is also available in the menu, so we route the input through the UI command list instead (when we define the UI commands, we pass in a key via <strong>FInputGesture</strong>).</p>
</div>
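<div class="paragraph">
<p>The benefit of routing keys through the command list is that a keyboard shortcut and a menu entry share one execute action and one can-execute check. The standalone C++ sketch below illustrates that design only. <code>CommandList</code>, <code>MapAction</code>, <code>ProcessKey</code> and <code>IsEnabled</code> are hypothetical names that loosely mirror the roles of <strong>FUICommandList</strong>, <strong>MapAction</strong> and <strong>ProcessCommandBindings</strong>; they do not reproduce the Slate signatures.</p>
</div>

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a UI command list: each binding pairs a key with an action
// and a predicate that decides whether the action is currently allowed.
class CommandList
{
public:
    void MapAction(std::string Key, std::function<void()> Execute, std::function<bool()> CanExecute)
    {
        assert(Execute && CanExecute);
        Bindings.push_back(Binding{std::move(Key), std::move(Execute), std::move(CanExecute)});
    }

    // Keyboard path: returns true if a bound action exists for the key and was allowed to run.
    bool ProcessKey(const std::string& Key) const
    {
        for (const Binding& B : Bindings)
        {
            if (B.Key == Key && B.CanExecute())
            {
                B.Execute();
                return true;
            }
        }
        return false;
    }

    // Menu path: a menu entry queries the same predicate to decide whether it is enabled.
    bool IsEnabled(const std::string& Key) const
    {
        for (const Binding& B : Bindings)
            if (B.Key == Key)
                return B.CanExecute();
        return false;
    }

private:
    struct Binding
    {
        std::string Key;
        std::function<void()> Execute;
        std::function<bool()> CanExecute;
    };
    std::vector<Binding> Bindings;
};
```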
<div class="paragraph">
<p>Finally, we will modify the <strong>HandleClick</strong> function to generate a context menu when we right-click a point.</p>
</div>
<div class="listingblock">
<div class="title">ExampleEdMode.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">...
class FExampleEdMode : public FEdMode
{
public:
    ...
    FExampleEdMode();
    ~FExampleEdMode();

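    virtual bool InputKey(FEditorViewportClient* ViewportClient, FViewport* Viewport, FKey Key, EInputEvent Event) override;
    virtual bool HandleClick(FEditorViewportClient* InViewportClient, HHitProxy *HitProxy, const FViewportClick &amp;Click) override;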

    TSharedPtr&lt;FUICommandList&gt; ExampleEdModeActions;
    void MapCommands();
    TSharedPtr&lt;SWidget&gt; GenerateContextMenu(FEditorViewportClient* InViewportClient) const;
};</code></pre>
</div>
</div>
<div class="listingblock">
<div class="title">ExampleEdMode.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">class ExampleEditorCommands : public TCommands&lt;ExampleEditorCommands&gt;
{
public:
    ExampleEditorCommands() : TCommands &lt;ExampleEditorCommands&gt;
        (
            "ExampleEditor",    // Context name for fast lookup
            FText::FromString(TEXT("Example Editor")),  // context name for displaying
            NAME_None,  // Parent
            FEditorStyle::GetStyleSetName()
            )
    {
    }

#define LOCTEXT_NAMESPACE ""
    virtual void RegisterCommands() override
    {
        UI_COMMAND(DeletePoint, "Delete Point", "Delete the currently selected point.", EUserInterfaceActionType::Button, FInputGesture(EKeys::Delete));
    }
#undef LOCTEXT_NAMESPACE

public:
    TSharedPtr&lt;FUICommandInfo&gt; DeletePoint;
};


FExampleEdMode::FExampleEdMode()
{
    ExampleEditorCommands::Register();
    ExampleEdModeActions = MakeShareable(new FUICommandList);
}

FExampleEdMode::~FExampleEdMode()
{
    ExampleEditorCommands::Unregister();
}

void FExampleEdMode::MapCommands()
{
    const auto&amp; Commands = ExampleEditorCommands::Get();

    ExampleEdModeActions-&gt;MapAction(
        Commands.DeletePoint,
        FExecuteAction::CreateSP(this, &amp;FExampleEdMode::RemovePoint),
        FCanExecuteAction::CreateSP(this, &amp;FExampleEdMode::CanRemovePoint));
}

bool FExampleEdMode::InputKey(FEditorViewportClient* ViewportClient, FViewport* Viewport, FKey Key, EInputEvent Event)
{
    bool isHandled = false;

    if (!isHandled &amp;&amp; Event == IE_Pressed)
    {
        isHandled = ExampleEdModeActions-&gt;ProcessCommandBindings(Key, FSlateApplication::Get().GetModifierKeys(), false);
    }

    return isHandled;
}

TSharedPtr&lt;SWidget&gt; FExampleEdMode::GenerateContextMenu(FEditorViewportClient* InViewportClient) const
{
    FMenuBuilder MenuBuilder(true, NULL);

    MenuBuilder.PushCommandList(ExampleEdModeActions.ToSharedRef());
    MenuBuilder.BeginSection("Example Section");
    if (HasValidSelection())
    {
        // add label for point index
        TSharedRef&lt;SWidget&gt; LabelWidget =
            SNew(STextBlock)
            .Text(FText::FromString(FString::FromInt(currentSelectedIndex)))
            .ColorAndOpacity(FLinearColor::Green);
        MenuBuilder.AddWidget(LabelWidget, FText::FromString(TEXT("Point Index: ")));
        MenuBuilder.AddMenuSeparator();
        // add delete point entry
        MenuBuilder.AddMenuEntry(ExampleEditorCommands::Get().DeletePoint);
    }
    MenuBuilder.EndSection();
    MenuBuilder.PopCommandList();

    TSharedPtr&lt;SWidget&gt; MenuWidget = MenuBuilder.MakeWidget();
    return MenuWidget;
}


bool FExampleEdMode::HandleClick(FEditorViewportClient* InViewportClient, HHitProxy *HitProxy, const FViewportClick &amp;Click)
{
    ...

    if (HitProxy &amp;&amp; isHandled &amp;&amp; Click.GetKey() == EKeys::RightMouseButton)
    {
        TSharedPtr&lt;SWidget&gt; MenuWidget = GenerateContextMenu(InViewportClient);
        if (MenuWidget.IsValid())
        {
            FSlateApplication::Get().PushMenu(
                Owner-&gt;GetToolkitHost()-&gt;GetParentWidget(),
                FWidgetPath(),
                MenuWidget.ToSharedRef(),
                FSlateApplication::Get().GetCursorPos(),
                FPopupTransitionEffect(FPopupTransitionEffect::ContextMenu));
        }
    }

    return isHandled;
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>The following is the result:</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/027.png" alt="027.png" width="463">
</div>
</div>
<div class="paragraph">
<p>There are other virtual functions from FEdMode that can be very helpful. I&#8217;ll list some of them here:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">    virtual void Tick(FEditorViewportClient* ViewportClient, float DeltaTime) override;
    virtual bool CapturedMouseMove(FEditorViewportClient* InViewportClient, FViewport* InViewport, int32 InMouseX, int32 InMouseY) override;
    virtual bool StartTracking(FEditorViewportClient* InViewportClient, FViewport* InViewport) override;
    virtual bool EndTracking(FEditorViewportClient* InViewportClient, FViewport* InViewport) override;
    virtual bool HandleClick(FEditorViewportClient* InViewportClient, HHitProxy *HitProxy, const FViewportClick &amp;Click) override;
    virtual void PostUndo() override;
    virtual void ActorsDuplicatedNotify(TArray&lt;AActor*&gt;&amp; PreDuplicateSelection, TArray&lt;AActor*&gt;&amp; PostDuplicateSelection, bool bOffsetLocations) override;
    virtual void ActorMoveNotify() override;
    virtual void ActorSelectionChangeNotify() override;
    virtual void MapChangeNotify() override;
    virtual void SelectionChanged() override;</code></pre>
</div>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_custom_project_settings">Custom Project Settings</h2>
<div class="sectionbody">
<div class="paragraph">
<p>Remember that you can go to Edit → Project Settings in the Unreal editor to change various game and editor settings? You can add your own custom settings to this window as well.</p>
</div>
<div class="paragraph">
<p>First we create a settings object. In this example we create it in the editor module. You can create it in the game module as well; just remember to export it with the proper macro.
In the UCLASS macro, we need to specify which .ini file to write to. You can use an existing .ini file such as "Game" or "Editor". In this case we want this setting to be per user and not shared through source control, so we create a new ini file.
For each UPROPERTY that you want to include in the settings, mark it with "<strong>config</strong>".</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/028.png" alt="028.png" width="213">
</div>
</div>
<div class="listingblock">
<div class="title">ExampleSettings.h</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#pragma once
#include "ExampleSettings.generated.h"

UCLASS(config = EditorUserSettings, defaultconfig)
class UExampleSettings : public UObject
{
    GENERATED_BODY()

public:
    UPROPERTY(EditAnywhere, config, Category = Test)
    bool bTest = false;
};</code></pre>
</div>
</div>
<div class="listingblock">
<div class="title">ToolExampleEditor.cpp</div>
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">...
#include "ISettingsModule.h"
#include "Developer/Settings/Public/ISettingsContainer.h"
#include "CustomProjectSettings/ExampleSettings.h"

void FToolExampleEditor::StartupModule()
{
    ...
    // register settings:
    {
        ISettingsModule* SettingsModule = FModuleManager::GetModulePtr&lt;ISettingsModule&gt;("Settings");
        if (SettingsModule)
        {
            TSharedPtr&lt;ISettingsContainer&gt; ProjectSettingsContainer = SettingsModule-&gt;GetContainer("Project");
            ProjectSettingsContainer-&gt;DescribeCategory("ExampleCategory", FText::FromString("Example Category"), FText::FromString("Example settings description text here"));

            SettingsModule-&gt;RegisterSettings("Project", "ExampleCategory", "ExampleSettings",
                FText::FromString("Example Settings"),
                FText::FromString("Configure Example Settings"),
                GetMutableDefault&lt;UExampleSettings&gt;()
            );
        }
    }

    IExampleModuleInterface::StartupModule();
}

void FToolExampleEditor::ShutdownModule()
{
    ...
    // unregister settings
    ISettingsModule* SettingsModule = FModuleManager::GetModulePtr&lt;ISettingsModule&gt;("Settings");
    if (SettingsModule)
    {
        SettingsModule-&gt;UnregisterSettings("Project", "ExampleCategory", "ExampleSettings");
    }

    IExampleModuleInterface::ShutdownModule();
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>Now you should see your custom settings in the "Project Settings" window. When you change them, you should see DefaultEditorUserSettings.ini created in \ToolExample\Config.</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/029.png" alt="029.png" width="1060">
</div>
</div>
<div class="paragraph">
<p>To access these settings, do the following:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">const UExampleSettings* ExampleSettings = GetDefault&lt;UExampleSettings&gt;();
if (ExampleSettings &amp;&amp; ExampleSettings-&gt;bTest)
{
    // do something
}</code></pre>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_tricks">Tricks</h2>
<div class="sectionbody">
<div class="sect2">
<h3 id="_use_widget_reflector">Use Widget Reflector</h3>
<div class="paragraph">
<p>The best way to learn SLATE and Unreal tools is to use the Widget Reflector. Go to Window → Developer Tools → Widget Reflector to launch it. Click "Pick Live Widget", mouse over the widget you want to inspect, then hit "ESC" to freeze.</p>
</div>
<div class="paragraph">
<p>For example, we can mouse over our editor mode widget and see its structure in the reflector window. You can click on the source file link, and it will take you to the exact place where that widget is constructed. This is a powerful tool for debugging your widgets or learning how Unreal builds its own.</p>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/030.png" alt="030.png" width="1466">
</div>
</div>
</div>
<div class="sect2">
<h3 id="_is_my_tool_running_in_the_editor_or_game">Is my tool running in the editor or game?</h3>
<div class="paragraph">
<p>There are three states your tool can be running in:</p>
</div>
<div class="olist arabic">
<ol class="arabic">
<li>
<p>Editor: game not started, you can do all normal editing.</p>
</li>
<li>
<p>Game: game started, cannot do any editing.</p>
</li>
<li>
<p>Simulate: either hit “Simulate”, or hit “Play” and then “Eject”; the game is started and you can do limited editing.</p>
</li>
</ol>
</div>
<div class="paragraph">
<p>Here is how you can determine which state you are in:</p>
</div>
<table class="tableblock frame-all grid-all spread">
<colgroup>
<col style="width: 25%;">
<col style="width: 25%;">
<col style="width: 25%;">
<col style="width: 25%;">
</colgroup>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"></td>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Editor</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Game</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Simulate</strong></p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>FApp::IsGame()</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">false</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">true</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">true</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Cast&lt;UEditorEngine&gt;(GEngine)-&gt;bIsSimulatingInEditor</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">false</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">false</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">true</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p>Note: this does NOT work inside SLATE calls (any UI tick, for example), because those run in the SLATE world.</p>
</div>
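<div class="paragraph">
<p>The table can be folded into a small helper. The sketch below is hypothetical: the enum and function names are not part of Unreal. It takes the two flags as plain bools so it compiles on its own. In an editor build, you would pass <strong>FApp::IsGame()</strong> and the <strong>bIsSimulatingInEditor</strong> flag, after checking that the cast to <strong>UEditorEngine</strong> succeeded. The note about SLATE calls still applies.</p>
</div>

```cpp
#include <cassert>

// The three states from the table above (hypothetical names).
enum class EToolState
{
    Editor,   // game not started, normal editing
    Game,     // game started, no editing
    Simulate  // "Simulate", or "Play" then "Eject": limited editing
};

// Maps the two flags from the table to a state. Hypothetical helper: in Unreal the inputs
// would be FApp::IsGame() and Cast<UEditorEngine>(GEngine)->bIsSimulatingInEditor
// (with a null check on the cast); here they are plain bools for testing.
EToolState ClassifyToolState(bool bIsGame, bool bIsSimulatingInEditor)
{
    if (!bIsGame)
    {
        return EToolState::Editor;
    }
    return bIsSimulatingInEditor ? EToolState::Simulate : EToolState::Game;
}
```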
</div>
<div class="sect2">
<h3 id="_useful_uproperty_meta_marker">Useful UPROPERTY() meta marker</h3>
<div class="ulist">
<ul>
<li>
<p><strong>MakeEditWidget</strong>: If you just need to visualize a point in the level and be able to drag it around, this is the quick way to do it. It works for FVector or FTransform, and it works with TArray of those as well.<br>
example: UPROPERTY(meta = (MakeEditWidget = true))</p>
</li>
<li>
<p><strong>DisplayName, ToolTip</strong>: Useful if you want a display name different from the variable name, or if you want to add a mouse-over tooltip. There are plenty of examples in the Unreal code base.</p>
</li>
<li>
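<p><strong>ClampMin, ClampMax, UIMin, UIMax</strong>: You can restrict the range of values for this field. ClampMin/ClampMax clamp any value that is entered, while UIMin/UIMax only limit the range you can drag the slider through.<br>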
example: UPROPERTY(meta = (ClampMin = "0", ClampMax = "180"))</p>
</li>
<li>
<p><strong>EditCondition</strong>: You can specify a bool to determine whether this field is editable.<br>
example: UPROPERTY(meta = (EditCondition = "bIsThisFieldEnabled"))</p>
</li>
</ul>
</div>
<div class="paragraph">
<p>For a complete list, search for <strong>ObjectMacros.h</strong> in Unreal code base.</p>
</div>
</div>
<div class="sect2">
<h3 id="_make_custom_animation_blueprint_node">Make custom Animation Blueprint Node</h3>
<div class="paragraph">
<p>To make a custom Animation Blueprint Node, you first need to inherit from <strong>FAnimNode_Base</strong> in the game module; this class processes the animation pose at runtime.</p>
</div>
<div class="paragraph">
<p>Then, in the editor module, inherit from the <strong>UAnimGraphNode_Base</strong> class and define how you want the node to appear in the editor.</p>
</div>
</div>
<div class="sect2">
<h3 id="_debug_draw_tricks">Debug Draw Tricks</h3>
<div class="ulist">
<ul>
<li>
<p>Easy way to draw circle/box/sphere<br>
FPrimitiveDrawInterface only provides basic draw methods (DrawSprite, DrawPoint, DrawLine, DrawMesh). However, Unreal already has a collection of “advanced” draw methods for its own use. They are defined in “PrimitiveDrawingUtils.cpp” and declared in “SceneManagement.h”; check out “PrimitiveDrawingUtils.cpp” for details. The necessary headers should already be included, so you can just call “DrawCircle” or “DrawBox”.</p>
</li>
<li>
<p>Draw point with world space size<br>
The default <strong>FPrimitiveDrawInterface::DrawPoint</strong> function only draws points with a screen-space size, but sometimes you want to give a point a world-space size. Here’s how you can do it:</p>
</li>
</ul>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">void DrawPointWS (
    FPrimitiveDrawInterface* PDI,
    const FVector&amp; Position,
    const FLinearColor&amp; Color,
    float PointSize,
    uint8 DepthPriorityGroup,
    bool bScreenSpaceSize
)
{
    float ScaledPointSize = PointSize;
    if (!bScreenSpaceSize)
    {
        FVector PositionVS = PDI-&gt;View-&gt;ViewMatrices.GetViewMatrix().TransformPosition(Position);
        float factor = FMath::Max(FMath::Abs(PositionVS.Z), 0.001f);
        ScaledPointSize /= factor;
        ScaledPointSize *= PDI-&gt;View-&gt;ViewRect.Width();
    }
    PDI-&gt;DrawPoint(Position, Color, ScaledPointSize, DepthPriorityGroup);
}</code></pre>
</div>
</div>
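<div class="paragraph">
<p>The core of <strong>DrawPointWS</strong> is one line of arithmetic: divide the requested size by the point&#8217;s view-space depth and multiply by the viewport width. The standalone C++ sketch below isolates that step so it can be checked by hand. The function name is hypothetical, and like the original it does not take the field of view into account.</p>
</div>

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Mirrors the scaling step of DrawPointWS: converts a world-space point size into the size
// passed to DrawPoint, given the point's view-space depth and the viewport width in pixels.
float ScalePointSizeForDepth(float PointSize, float ViewSpaceDepth, float ViewWidth)
{
    // Same guard as the original: avoid dividing by zero for points on the camera plane.
    const float Depth = std::max(std::fabs(ViewSpaceDepth), 0.001f);
    return PointSize / Depth * ViewWidth;
}
```

<div class="paragraph">
<p>For example, a size of 10 at depth 100 in a 1000-pixel-wide view gives 10 / 100 × 1000 = 100; doubling the depth to 200 halves it to 50.</p>
</div>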
</div>
<div class="sect2">
<h3 id="_other_tricks_for_editor_mode">Other Tricks for Editor Mode</h3>
<div class="ulist">
<ul>
<li>
<p>It is quite common to need a viewport client, and not every function has one passed in. Here is a call that gets it from anywhere:</p>
</li>
</ul>
</div>
<div class="listingblock">
<div class="content">
<pre>FEditorViewportClient* client = (FEditorViewportClient*)GEditor-&gt;GetActiveViewport()-&gt;GetClient();</pre>
</div>
</div>
<div class="ulist">
<ul>
<li>
<p>It is also common to want to refresh rendering for all viewports after the user makes an edit in your tool. Use the following call:</p>
</li>
</ul>
</div>
<div class="listingblock">
<div class="content">
<pre>GEditor-&gt;RedrawAllViewports(true);</pre>
</div>
</div>
<div class="ulist">
<ul>
<li>
<p>If the Editor Mode is not responding or is lagging behind, make sure "Realtime" is checked in the viewport.</p>
</li>
</ul>
</div>
<div class="imageblock" style="text-align: left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/ue4tools/031.png" alt="031.png" width="487">
</div>
</div>
</div>
</div>
</div>]]></description><link>https://lxjk.github.io/2019/10/01/How-to-Make-Tools-in-U-E.html</link><guid isPermaLink="true">https://lxjk.github.io/2019/10/01/How-to-Make-Tools-in-U-E.html</guid><category><![CDATA[UE4]]></category><category><![CDATA[Unreal]]></category><category><![CDATA[Tools]]></category><dc:creator><![CDATA[Eric Zhang]]></dc:creator><pubDate>Tue, 01 Oct 2019 00:00:00 GMT</pubDate></item><item><title><![CDATA[Improve Tile-based Light Culling with Spherical-sliced Cone]]></title><description><![CDATA[<div id="preamble">
<div class="sectionbody">
<div id="toc" class="toc">
<div id="toctitle" class="title">Table of Contents</div>
<ul class="sectlevel2">
<li><a href="#_the_problem_of_sphere_frustum_test">The Problem of Sphere-Frustum Test</a></li>
<li><a href="#_cone_test">Cone Test</a></li>
<li><a href="#_spherical_sliced_cone_test">Spherical-sliced Cone Test</a></li>
<li><a href="#_extend_to_clustered_light_culling">Extend to Clustered Light Culling</a></li>
</ul>
</div>
<div class="paragraph">
<p>Tile-based methods are used in both deferred and forward rendering. Since lighting calculations are expensive, one of the main goals in improving tile-based methods is to provide more accurate and efficient light culling. This article presents a new method for light culling using a spherical-sliced cone, which greatly reduces the false positives introduced by the traditional sphere-frustum test. Furthermore, this method extends naturally to clustered light culling.</p>
</div>
<div class="paragraph">
<p>I assume you have a basic idea of how tile-based deferred or forward rendering works; in this article I will only talk about the light culling phase. Example shader code in this article is in GLSL.</p>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_the_problem_of_sphere_frustum_test">The Problem of Sphere-Frustum Test</h3>
<div class="paragraph">
<p>Let’s do a quick overview of how tile-based light culling normally works. We first divide the pixels into tiles (usually 16x16) and run a compute shader that calculates the min and max depth of each tile by sampling the depth buffer. After this step, each tile can be viewed as a small frustum. Figure 1 shows a top view of the tiles and frustums in this setup. On the CPU side we build a list of bounding spheres of the visible lights and send it to the GPU. Then we run another compute shader that does a sphere-frustum test for every light against each tile. If a light passes the test, its index is added to this tile’s light list for shading in a later stage.</p>
</div>
<div class="imageblock" style="text-align: center">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/sphericalslicedcone/fig1.png" alt="fig1.png" width="600">
</div>
<div class="title">Figure 1 top view of tile based shading.</div>
</div>
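<div class="paragraph">
<p>The per-tile depth reduction can be sketched on the CPU in plain C++ as follows. This only illustrates what the compute shader computes; it walks the whole buffer serially instead of using one thread group per tile. It assumes a row-major buffer of linear view-space depth. <code>TileDepthBounds</code> and <code>ComputeTileDepthBounds</code> are hypothetical names.</p>
</div>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

struct TileDepthBounds
{
    float MinDepth;
    float MaxDepth;
};

// Computes the min/max depth of each TileSize x TileSize tile of a row-major depth buffer.
// Tiles on the right/bottom edge are partial when the size is not a multiple of TileSize.
std::vector<TileDepthBounds> ComputeTileDepthBounds(const std::vector<float>& Depth,
                                                    int Width, int Height, int TileSize = 16)
{
    assert(TileSize > 0 && Depth.size() == static_cast<std::size_t>(Width) * Height);
    const int TilesX = (Width + TileSize - 1) / TileSize;
    const int TilesY = (Height + TileSize - 1) / TileSize;
    std::vector<TileDepthBounds> Tiles(
        static_cast<std::size_t>(TilesX) * TilesY,
        TileDepthBounds{std::numeric_limits<float>::max(), std::numeric_limits<float>::lowest()});

    for (int y = 0; y < Height; ++y)
    {
        for (int x = 0; x < Width; ++x)
        {
            const float D = Depth[static_cast<std::size_t>(y) * Width + x];
            TileDepthBounds& T = Tiles[static_cast<std::size_t>(y / TileSize) * TilesX + x / TileSize];
            T.MinDepth = std::min(T.MinDepth, D);
            T.MaxDepth = std::max(T.MaxDepth, D);
        }
    }
    return Tiles;
}
```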
<div class="paragraph">
<p>A sphere-frustum test basically calculates the signed distance from the sphere center to each of the 6 planes of the frustum. If any of these signed distances is larger than the sphere radius, the sphere is completely outside that plane and fails the test; otherwise it passes. In practice, the near/far planes of the tile frustum are parallel to the view near/far planes. So we can simply compare the light depth with the tile min/max depth, and only calculate signed distances for the 4 side planes. The shader code looks like this:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-glsl" data-lang="glsl">if (lightDepth - lightRadius &lt;= tileMaxDepth &amp;&amp;
    lightDepth + lightRadius &gt;= tileMinDepth)
{
    for (int i = 0; i &lt; 4; ++i)
    {
        // test 4 side planes
    }
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>The sphere-frustum test introduces false positives, where a sphere passes the test even though it does not intersect the frustum. In Figure 2, the green zone is the actual intersection area, while the red zone is the area that also passes the test: the false positives. As you can see from the figure, the false positive area grows as the sphere radius gets larger or the frustum gets smaller. That’s why the sphere-frustum test is acceptable for view frustum culling but not for tile-based light culling: light radii are usually huge compared to a tile frustum. The cost of a false positive here is also quite high (an extra light calculation in a later stage), so it is a problem we need to deal with.</p>
</div>
<div class="imageblock" style="text-align: center">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/sphericalslicedcone/fig2.png" alt="fig2.png" width="800">
</div>
<div class="title">Figure 2 (a) and (b) top view of sphere-frustum test.</div>
</div>
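<div class="paragraph">
<p>To make the false positive concrete, here is a standalone C++ sketch of the sphere-frustum test described above, applied to a single tile. It assumes a view space with the camera at the origin looking down +z. An OpenGL-style view space looks down -z instead, which only flips signs. The tile is described by its x/y extent on the z = 1 plane. All type and function names are hypothetical.</p>
</div>

```cpp
#include <cassert>
#include <cmath>

struct Vec3
{
    float x, y, z;
};

float Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vec3 Normalize(const Vec3& v)
{
    const float Len = std::sqrt(Dot(v, v));
    return {v.x / Len, v.y / Len, v.z / Len};
}

// Tile frustum: 4 side planes through the origin (outward unit normals) plus a depth range.
struct TileFrustum
{
    Vec3 SidePlanes[4];
    float MinDepth;
    float MaxDepth;
};

TileFrustum MakeTileFrustum(float X0, float X1, float Y0, float Y1, float MinDepth, float MaxDepth)
{
    TileFrustum T;
    T.SidePlanes[0] = Normalize({-1.f, 0.f, X0}); // left:   outside when x < X0 * z
    T.SidePlanes[1] = Normalize({1.f, 0.f, -X1}); // right:  outside when x > X1 * z
    T.SidePlanes[2] = Normalize({0.f, -1.f, Y0}); // bottom: outside when y < Y0 * z
    T.SidePlanes[3] = Normalize({0.f, 1.f, -Y1}); // top:    outside when y > Y1 * z
    T.MinDepth = MinDepth;
    T.MaxDepth = MaxDepth;
    return T;
}

// Depth range check for near/far, then signed distance against each of the 4 side planes.
bool SphereTileFrustumTest(const Vec3& Center, float Radius, const TileFrustum& Tile)
{
    if (Center.z - Radius > Tile.MaxDepth || Center.z + Radius < Tile.MinDepth)
        return false;
    for (const Vec3& N : Tile.SidePlanes)
    {
        if (Dot(N, Center) > Radius) // sphere completely outside this plane
            return false;
    }
    return true;
}
```

<div class="paragraph">
<p>Take a tile spanning [0, 0.1] × [0, 0.1] on the z = 1 plane with depth [10, 11]. A sphere centered at (1.5, 1.5, 10.5) with radius 0.5 is only about 0.45 outside both the right and the top plane, so it passes every plane test. Yet its distance to the corner edge where those two planes meet is about 0.63, larger than its radius. This is exactly the kind of false positive described above.</p>
</div>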
</div>
<div class="sect2">
<h3 id="_cone_test">Cone Test</h3>
<div class="paragraph">
<p>To reduce the false positives, we will tackle this problem in two steps. In step 1 we focus on improving the tests against the 4 side planes of the frustum; in step 2, in the next section, we will improve the test against the near/far planes.</p>
</div>
<div class="paragraph">
<p>The sphere-frustum test performs better when the frustum is big and the sphere is small; the cone test is the complete opposite. It performs better when the frustum is small and the sphere is big, which fits this situation perfectly. To do cone culling, we make a cone from the camera origin that contains the whole tile frustum. For each light, we make a cone that contains the light’s bounding sphere. Then we simply test whether the two cones overlap. We keep the same near/far plane test for now and will improve it later. We don’t need to send any extra data to the shader, since the cones are easy to calculate on the fly.</p>
</div>
<div class="imageblock" style="text-align: center">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/sphericalslicedcone/fig3.png" alt="fig3.png" width="600">
</div>
<div class="title">Figure 3 front view of sphere-frustum test and cone test.</div>
</div>
<div class="paragraph">
<p>Figure 3 shows the front view of the sphere-frustum test and the cone test. The green zone is the actual intersection area, the red zone is the false positive area for the sphere-frustum test, and the blue zone is the false positive area for the cone test. You can get a sense of how the false positives of the cone test decrease as the light radius increases.</p>
</div>
<div class="imageblock" style="text-align: center">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/sphericalslicedcone/fig4.png" alt="fig4.png" width="1000">
</div>
<div class="title">Figure 4 top view of cone test.</div>
</div>
<div class="paragraph">
<p>Let’s look at the example in Figure 4. First we need to make a cone for the tile (marked in green). The tile cone center vector can simply be the normalized average of the 4 side vectors that make up the tile frustum. The half angle is the maximum angle between the center vector and the 4 side vectors. Since we don’t really need the angle itself, we calculate its sine and cosine instead. The side vectors are assumed to be normalized, so each dot product is a cosine:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-glsl" data-lang="glsl">vec3 tileCenterVec = normalize(sides[0] + sides[1] + sides[2] + sides[3]);
float tileCos = min(min(min(dot(tileCenterVec, sides[0]), dot(tileCenterVec, sides[1])), dot(tileCenterVec, sides[2])), dot(tileCenterVec, sides[3]));
float tileSin = sqrt(1 - tileCos * tileCos);</code></pre>
</div>
</div>
<div class="paragraph">
<p>Note that the half angle of a cone cannot exceed 90 degrees, so both its sine and cosine are non-negative.</p>
</div>
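<div class="paragraph">
<p>To sanity-check the tile cone on the CPU, here is a minimal C++ port of the GLSL above. It is a sketch under stated assumptions: <code>Vec3</code>, <code>TileCone</code> and <code>makeTileCone</code> are hypothetical helpers, and the four side vectors are assumed to be unit-length view-space directions along the edges of the tile frustum.</p>
</div>

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical minimal 3D vector type for this sketch.
struct Vec3 { float x, y, z; };

inline Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 normalize(Vec3 v) {
    float len = std::sqrt(dot(v, v));
    assert(len > 0.0f);
    return {v.x / len, v.y / len, v.z / len};
}

// Cone around the tile frustum: unit axis plus cosine/sine of the half angle.
struct TileCone { Vec3 center; float cosHalf; float sinHalf; };

// sides: four unit vectors along the edges of the tile frustum (view space).
TileCone makeTileCone(const Vec3 sides[4]) {
    Vec3 center = normalize(sides[0] + sides[1] + sides[2] + sides[3]);
    // The largest angle to any side vector has the smallest cosine.
    float c = std::min(std::min(dot(center, sides[0]), dot(center, sides[1])),
                       std::min(dot(center, sides[2]), dot(center, sides[3])));
    // The half angle is below 90 degrees, so the sine is non-negative.
    float s = std::sqrt(std::max(0.0f, 1.0f - c * c));
    return {center, c, s};
}
```

For a tile that is symmetric around +z, the center vector comes out as (0, 0, 1), and the cosine of the half angle equals the z component of any side vector.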
<div class="paragraph">
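<p>For each light, we need to make a cone around its bounding sphere. If we transform the light’s bounding sphere into view space, the center vector of the cone is the normalized vector from the camera (the origin) to the light position. Since the cone’s surface is tangent to the sphere, the light radius and the light distance form the opposite side and the hypotenuse of a right triangle, so the sine of the half angle is the light radius divided by the light’s distance to the camera.</p>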
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-glsl" data-lang="glsl">// get lightPos and lightRadius in view space
float lightDistSqr = dot(lightPos, lightPos);
float lightDist = sqrt(lightDistSqr);
vec3 lightCenterVec = lightPos / lightDist;
float lightSin = clamp(lightRadius / lightDist, 0.0, 1.0);
float lightCos = sqrt(1 - lightSin * lightSin);</code></pre>
</div>
</div>
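<div class="paragraph">
<p>A matching C++ sketch of the light cone, again with hypothetical names (<code>LightCone</code>, <code>makeLightCone</code>). It assumes the light position is already in view space with the camera at the origin, and that the light center is not exactly at the camera.</p>
</div>

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Cone from the camera origin that just contains the light's bounding sphere.
struct LightCone {
    Vec3 center;    // unit vector from the camera to the light
    float dist;     // distance from the camera to the light center
    float sinHalf;  // sine of the half angle, clamped to 1 when the camera is inside
    float cosHalf;
};

LightCone makeLightCone(Vec3 lightPos, float lightRadius) {
    float dist = std::sqrt(dot(lightPos, lightPos));
    assert(dist > 0.0f);
    Vec3 center = {lightPos.x / dist, lightPos.y / dist, lightPos.z / dist};
    // The cone is tangent to the sphere: sin(half angle) = radius / distance.
    float s = std::min(std::max(lightRadius / dist, 0.0f), 1.0f);
    float c = std::sqrt(1.0f - s * s);
    return {center, dist, s, c};
}
```

A light 10 units away with radius 5 gives a 30 degree half angle; a camera inside the light gives a sine of 1 and a cosine of 0.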
<div class="paragraph">
<p>Here we clamp the sine to handle the case where the camera is inside a light. In that case the light passes the cone test for every tile (though it can still fail the near/far plane test), which we handle explicitly in the next step.
Now that we have both cones, we just need to compare the angle between the two cone center vectors with the sum of both cone half angles: the cones overlap when the former is no larger than the latter. Since cosine decreases monotonically from 0 to 180 degrees, we can compare cosines instead, using the trigonometric identity \(\cos{(A+B)} = \cos{A}\cos{B} - \sin{A}\sin{B}\).</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-glsl" data-lang="glsl">float lightTileCos = dot(lightCenterVec, tileCenterVec);
float lightTileSin = sqrt(1 - lightTileCos * lightTileCos);
// sum angle = light cone half angle + tile cone half angle
float sumCos = (lightRadius &gt; lightDist) ? -1.0 : (tileCos * lightCos - tileSin * lightSin);

if (lightTileCos &gt;= sumCos &amp;&amp;
    lightDepth - lightRadius &lt;= tileMaxDepth &amp;&amp;
    lightDepth + lightRadius &gt;= tileMinDepth)
{
    // light intersect this tile
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>If the camera is inside a light, we set cosine of sum angle to be -1, so it will always pass the cone test. For near/far plane we do the same depth check as sphere-frustum test.</p>
</div>
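<div class="paragraph">
<p>The overlap part of the test only needs the cosines and sines computed so far. Here is a minimal C++ sketch of it (the hypothetical <code>conesOverlap</code>; the near/far check is omitted): the cones overlap when <code>lightTileCos</code> is at least the cosine of the summed half angles, and a camera inside the light forces that cosine to -1, as in the GLSL.</p>
</div>

```cpp
#include <cassert>
#include <cmath>

// lightTileCos: dot product of the two unit cone axes.
// tileCos/tileSin, lightCos/lightSin: cosine and sine of each cone's half angle.
// cameraInsideLight: when true, every tile passes (sumCos = -1), matching the GLSL.
bool conesOverlap(float lightTileCos, float tileCos, float tileSin,
                  float lightCos, float lightSin, bool cameraInsideLight) {
    // cos(A + B) = cos A cos B - sin A sin B
    float sumCos = cameraInsideLight ? -1.0f : (tileCos * lightCos - tileSin * lightSin);
    // Angle between axes <= A + B  is equivalent to  cos(angle) >= cos(A + B) on [0, 180].
    return lightTileCos >= sumCos;
}
```

With two 30 degree half angles, axes 50 degrees apart overlap and axes 70 degrees apart do not.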
<div class="paragraph">
<p>How does the cone test do? First, let’s test a single light. The results are shown in Figure 5; in (b) and (c), a tile is tinted red if it passes light culling. The sphere-frustum test produces a big square-like result, which matches the false positive area we discussed above, while the cone test gives something closer to our goal.</p>
</div>
<div class="imageblock" style="text-align: center">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/sphericalslicedcone/fig5.png" alt="fig5.png" width="900">
</div>
<div class="title">Figure 5 (a) normal rendering; (b) tiles passed sphere-frustum test; (c) tiles passed cone test.</div>
</div>
<div class="paragraph">
<p>Next we test performance. We placed 1024 random lights in the Crytek Sponza scene, rendered at 1280x720 on an NVIDIA GeForce GTX 760M. Here are the results:</p>
</div>
<table class="tableblock frame-all grid-all" style="width: 80%;">
<colgroup>
<col style="width: 43.75%;">
<col style="width: 18.75%;">
<col style="width: 18.75%;">
<col style="width: 18.75%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"></th>
<th class="tableblock halign-left valign-top">Lighting Time</th>
<th class="tableblock halign-left valign-top">Step Improvement</th>
<th class="tableblock halign-left valign-top">Accumulated Improvement</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Sphere-Frustum Test</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">5.55 ms</p></td>
<td class="tableblock halign-left valign-top"></td>
<td class="tableblock halign-left valign-top"></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Cone Test</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">5.30 ms</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">4.50%</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">4.50%</p></td>
</tr>
</tbody>
</table>
<div class="paragraph">
<p>The result is better, but not dramatically so. Remember that we have not yet changed the near/far plane test; we tackle that next.</p>
</div>
<div class="imageblock" style="text-align: center">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/sphericalslicedcone/fig6.png" alt="fig6.png" width="800">
</div>
<div class="title">Figure 6. 1024 random lights in Crytek Sponza scene.</div>
</div>
</div>
<div class="sect2">
<h3 id="_spherical_sliced_cone_test">Spherical-sliced Cone Test</h3>
<div class="paragraph">
<p>Figure 7 (a) illustrates the problem with the near/far plane test. The light on the left passes both the cone test and the near/far plane test, but it clearly does not intersect the tile (marked in green).</p>
</div>
<div class="paragraph">
<p>The good news is that with the cone setup, we can refine the light range per tile. However, we need to change the values we compare against: instead of the tile min/max depth, we use the tile min/max distance to the camera. This also means the earlier compute shader must calculate the min/max distance to the camera per pixel instead of the min/max depth. The reason for this change is that calculating the min/max distance to the camera of a light within a tile is much easier than calculating its min/max depth. This change also gives the method its name, “Spherical-sliced Cone”: visually, we slice each cone with two spheres whose radii are the min/max distances to the camera.</p>
</div>
<div class="imageblock" style="text-align: center">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/sphericalslicedcone/fig7.png" alt="fig7.png" width="1200">
</div>
<div class="title">Figure 7 (a) false positive example of near/far plane test; (b) Spherical-sliced Cone test.</div>
</div>
<div class="paragraph">
<p>Figure 7 (b) shows how to calculate the min/max light tile distance. Basically, we are looking for the vector in the tile cone that is closest to the light sphere center. In the example above, we get this vector by rotating the tile cone center vector around the origin, towards the light cone center vector, by the tile cone half angle. The “Sum Angle” is the angle between the tile cone center vector and the light cone center vector; its cosine is <code>lightTileCos</code>, which we already used in the cone test in the previous section (not to be confused with <code>sumCos</code>, the cosine of the sum of the two half angles). The “Diff Angle” is the “Sum Angle” minus the tile cone half angle, which we use to calculate the min/max light tile distance.</p>
</div>
<div class="paragraph">
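<p>One special case is when the light sphere center is inside the tile cone, which gives a negative “Diff Angle”. In this case we simply clamp it to 0: since the light cone center vector is inside the tile cone, it IS the closest vector we are looking for. A few more trigonometric identities: \(\sin{(A-B)} = \sin{A}\cos{B} - \cos{A}\sin{B}\); \(\cos{(A-B)} = \cos{A}\cos{B} + \sin{A}\sin{B}\). With light distance \(d\), light radius \(r\) and diff angle \(D\), the light center lies at distance \(d\sin{D}\) from the closest vector and projects to \(d\cos{D}\) along it, so that vector enters and leaves the light sphere at distances \(d\cos{D} \mp \sqrt{r^2 - d^2\sin^2{D}}\).</p>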
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-glsl" data-lang="glsl">// diff angle = angle between light and tile cone centers - tile cone half angle
// clamp to handle the case when light center is within tile cone
float diffSin = clamp(lightTileSin * tileCos - lightTileCos * tileSin, 0.0, 1.0);
float diffCos = (diffSin == 0.0) ? 1.0 : lightTileCos * tileCos + lightTileSin * tileSin;
float lightTileDistOffset = sqrt(lightRadius * lightRadius - lightDistSqr * diffSin * diffSin);
float lightTileDistBase = lightDist * diffCos;

if (lightTileCos &gt;= sumCos &amp;&amp;
    lightTileDistBase - lightTileDistOffset &lt;= maxTileDist &amp;&amp;
    lightTileDistBase + lightTileDistOffset &gt;= minTileDist)
{
    // light intersect this tile
}</code></pre>
</div>
</div>
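<div class="paragraph">
<p>The same light tile distance range, as a C++ sketch under the GLSL’s assumptions (view space, camera at the origin, cone test already passed). <code>DistRange</code> and <code>lightTileDistRange</code> are hypothetical names. Unlike the GLSL, it guards the square root with <code>max(0, ...)</code> as an extra precaution; when the cone test passes, the diff angle is no larger than the light cone half angle, so the argument is already non-negative.</p>
</div>

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Distance interval (from the camera) of the light sphere along the tile-cone
// vector that is closest to the light center.
struct DistRange { float minDist; float maxDist; };

// lightTileCos: dot of the light and tile cone axes; tileCos/tileSin: tile half angle.
DistRange lightTileDistRange(float lightDist, float lightRadius,
                             float lightTileCos, float tileCos, float tileSin) {
    assert(lightDist > 0.0f);
    float lightTileSin = std::sqrt(std::max(0.0f, 1.0f - lightTileCos * lightTileCos));
    // diff angle = (angle between cone axes) - tile half angle; clamped at 0 when
    // the light center is inside the tile cone.
    float diffSin = std::min(std::max(lightTileSin * tileCos - lightTileCos * tileSin, 0.0f), 1.0f);
    float diffCos = (diffSin == 0.0f) ? 1.0f : lightTileCos * tileCos + lightTileSin * tileSin;
    // The vector passes lightDist * diffSin from the light center: half chord by Pythagoras.
    float offsetSqr = lightRadius * lightRadius - lightDist * lightDist * diffSin * diffSin;
    float offset = std::sqrt(std::max(0.0f, offsetSqr)); // guard added in this sketch
    float base = lightDist * diffCos; // projection of the light center onto the vector
    return {base - offset, base + offset};
}
```

A light then passes the spherical-sliced cone test when the cone test passes and this range overlaps the tile’s min/max distance range.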
<div class="paragraph">
<p>Compared with the previous version, this keeps the cone test comparison but replaces the near/far plane test with the light tile distance comparison. How does the spherical-sliced cone test do? As shown in Figure 8 (d), in the single light visualization it removes the false positives introduced by the depth comparison. For performance, we get an 11.70% improvement over the cone test and a 15.68% improvement over the sphere-frustum test.</p>
</div>
<div class="imageblock" style="text-align: center">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/sphericalslicedcone/fig8.png" alt="fig8.png" width="1200">
</div>
<div class="title">Figure 8 (a) normal rendering; (b) tiles passed sphere-frustum test; (c) tiles passed cone test; (d) tiles passed spherical-sliced cone test.</div>
</div>
<table class="tableblock frame-all grid-all" style="width: 80%;">
<colgroup>
<col style="width: 43.75%;">
<col style="width: 18.75%;">
<col style="width: 18.75%;">
<col style="width: 18.75%;">
</colgroup>
<thead>
<tr>
<th class="tableblock halign-left valign-top"></th>
<th class="tableblock halign-left valign-top">Lighting Time</th>
<th class="tableblock halign-left valign-top">Step Improvement</th>
<th class="tableblock halign-left valign-top">Accumulated Improvement</th>
</tr>
</thead>
<tbody>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Sphere-Frustum Test</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">5.55 ms</p></td>
<td class="tableblock halign-left valign-top"></td>
<td class="tableblock halign-left valign-top"></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Cone Test</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">5.30 ms</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">4.50%</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">4.50%</p></td>
</tr>
<tr>
<td class="tableblock halign-left valign-top"><p class="tableblock"><strong>Spherical-sliced Cone Test</strong></p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">4.68 ms</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">11.70%</p></td>
<td class="tableblock halign-left valign-top"><p class="tableblock">15.68%</p></td>
</tr>
</tbody>
</table>
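<div class="paragraph">
<p>The improvement columns are consistent with the relative reduction in lighting time, (before minus after) divided by before, where Accumulated Improvement is measured against the sphere-frustum baseline. A small C++ check, with hypothetical helper names, that reproduces the table’s percentages from the listed timings:</p>
</div>

```cpp
#include <cassert>
#include <cmath>

// Relative reduction in lighting time, in percent: (before - after) / before * 100.
double improvementPercent(double beforeMs, double afterMs) {
    assert(beforeMs > 0.0);
    return (beforeMs - afterMs) / beforeMs * 100.0;
}

// Round to two decimals, the precision used in the tables.
double round2(double x) { return std::round(x * 100.0) / 100.0; }
```

For example, the accumulated 15.68% is <code>improvementPercent(5.55, 4.68)</code>, not the sum 4.50% + 11.70%.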
<div class="paragraph">
<p>Here is the complete shader code so far:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="highlight"><code class="language-glsl" data-lang="glsl">// calculate tile cone
vec3 tileCenterVec = normalize(sides[0] + sides[1] + sides[2] + sides[3]);
float tileCos = min(min(min(dot(tileCenterVec, sides[0]), dot(tileCenterVec, sides[1])), dot(tileCenterVec, sides[2])), dot(tileCenterVec, sides[3]));
float tileSin = sqrt(1 - tileCos * tileCos);

// loop through light list
for (uint lightIdx = 0; lightIdx &lt; lightCount; ++lightIdx)
{
    // get lightPos and lightRadius in view space
    float lightDistSqr = dot(lightPos, lightPos);
    float lightDist = sqrt(lightDistSqr);
    vec3 lightCenterVec = lightPos / lightDist;
    float lightSin = clamp(lightRadius / lightDist, 0.0, 1.0);
    float lightCos = sqrt(1 - lightSin * lightSin);

    float lightTileCos = dot(lightCenterVec, tileCenterVec);
    float lightTileSin = sqrt(1 - lightTileCos * lightTileCos);
    // sum angle = light cone half angle + tile cone half angle
    float sumCos = (lightRadius &gt; lightDist) ? -1.0 : (tileCos * lightCos - tileSin * lightSin);

    // diff angle = angle between light and tile cone centers - tile cone half angle
    // clamp to handle the case when light center is within tile cone
    float diffSin = clamp(lightTileSin * tileCos - lightTileCos * tileSin, 0.0, 1.0);
    float diffCos = (diffSin == 0.0) ? 1.0 : lightTileCos * tileCos + lightTileSin * tileSin;
    float lightTileDistOffset = sqrt(lightRadius * lightRadius - lightDistSqr * diffSin * diffSin);
    float lightTileDistBase = lightDist * diffCos;

    // compare against tile min/max distance to camera (not depth)
    if (lightTileCos &gt;= sumCos &amp;&amp;
        lightTileDistBase - lightTileDistOffset &lt;= maxTileDist &amp;&amp;
        lightTileDistBase + lightTileDistOffset &gt;= minTileDist)
    {
        // light intersect this tile
    }
}</code></pre>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_extend_to_clustered_light_culling">Extend to Clustered Light Culling</h3>
<div class="paragraph">
<p>Since we calculate the light range per tile, it is natural to extend this method to clustered light culling, which is useful for rendering translucent objects. As in the common cluster setup, when we build the light list we record the farthest light and use it as the far bound for the clusters, but instead of the overall max light depth, we use the overall max light distance to the camera. Figure 9 shows the difference between the two setups.</p>
</div>
<div class="imageblock" style="text-align: center">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/sphericalslicedcone/fig9.png" alt="fig9.png" width="1200">
</div>
<div class="title">Figure 9 cluster setup with 4 clusters per tile; (a) common cluster setup; (b) cluster setup with Spherical-sliced Cone.</div>
</div>
<div class="paragraph">
<p>Also, instead of a global max distance, we calculate a max distance per tile, which is the smaller of the overall max light distance and the max tile distance. We do not run light culling per cluster; we still run it once per tile. With spherical-sliced cone culling, simply comparing the min/max light tile distance with each cluster’s min/max distance gives the light-cluster intersection result for all clusters in the tile.</p>
</div>
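<div class="paragraph">
<p>A sketch of building a per-light cluster mask from the light tile distance range. It assumes clusters evenly split a per-tile interval whose near and far bounds are passed in (for example, the per-tile max distance described above as the far bound); <code>clusterMask</code> and its parameters are hypothetical.</p>
</div>

```cpp
#include <cassert>
#include <cstdint>

// Bit i is set if the light's distance range [lightMinDist, lightMaxDist]
// overlaps cluster i. Clusters evenly split [clusterNear, clusterFar].
std::uint32_t clusterMask(float lightMinDist, float lightMaxDist,
                          float clusterNear, float clusterFar, int clusterCount) {
    assert(clusterCount > 0 && clusterCount <= 32 && clusterFar > clusterNear);
    float step = (clusterFar - clusterNear) / clusterCount;
    std::uint32_t mask = 0;
    for (int i = 0; i < clusterCount; ++i) {
        float lo = clusterNear + step * i;  // cluster i near distance
        float hi = lo + step;               // cluster i far distance
        if (lightMinDist <= hi && lightMaxDist >= lo)
            mask |= 1u << i;
    }
    return mask;
}
```

With 4 clusters over distances 0 to 16, a light spanning distances 5 to 9 touches clusters 1 and 2, giving mask 0b0110.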
<div class="paragraph">
<p>To store this information, we use one bit per cluster to mark whether a light intersects that cluster of the tile. If the maximum number of visible lights is no more than 65535 and we have no more than 16 clusters per tile, we can pack each light-tile intersection into one uint32 (16 bits for the light index, 16 bits for the cluster mask). If we have no more than 32 clusters per tile, we can use two uint32 values instead, one for the light index and one for the cluster mask. This way we still have a list of lights per tile rather than a list of lights per cluster.</p>
</div>
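<div class="paragraph">
<p>A sketch of the single-uint32 layout for up to 16 clusters per tile. The field placement (cluster mask in the high 16 bits, light index in the low 16 bits) and the helper names are assumptions for illustration; the text only specifies 16 bits for each field.</p>
</div>

```cpp
#include <cassert>
#include <cstdint>

// Pack a light index (low 16 bits) and a 16-bit cluster mask (high 16 bits).
std::uint32_t packLightEntry(std::uint32_t lightIndex, std::uint32_t clusterMask) {
    assert(lightIndex <= 0xFFFFu && clusterMask <= 0xFFFFu);
    return (clusterMask << 16) | lightIndex;
}

std::uint32_t unpackLightIndex(std::uint32_t entry) { return entry & 0xFFFFu; }
std::uint32_t unpackClusterMask(std::uint32_t entry) { return entry >> 16; }

// True if the packed light affects cluster `cluster` (0..15) of its tile.
bool lightAffectsCluster(std::uint32_t entry, unsigned cluster) {
    return ((unpackClusterMask(entry) >> cluster) & 1u) != 0;
}
```

When shading a pixel, the shader would walk the tile’s light list and skip entries whose bit for the pixel’s cluster is clear.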
<div class="paragraph">
<p>There are many ways to set up clusters within a tile; here we use an even distribution for simplicity. One more trick concerns the last, farthest cluster. If the tile’s geometry distance range (max tile distance minus min tile distance) is smaller than the range of the farthest cluster, as in the second tile from the left in Figure 9 (b), we can use the tile distance range to define the last cluster, and set up the other clusters in this tile normally, starting from the min tile distance to the camera. This gives better culling results for rendering opaque geometry, which makes up the majority of the scene. The opposite case is the second tile from the right in Figure 9 (b), where the tile geometry distance range is larger than the farthest cluster’s range. There we leave the cluster setup as it is, because it culls lights for the front-most geometry, while the other setup (or plain tile-based culling) would not.</p>
</div>
</div>]]></description><link>https://lxjk.github.io/2018/03/25/Improve-Tile-based-Light-Culling-with-Spherical-sliced-Cone.html</link><guid isPermaLink="true">https://lxjk.github.io/2018/03/25/Improve-Tile-based-Light-Culling-with-Spherical-sliced-Cone.html</guid><category><![CDATA[Graphics]]></category><dc:creator><![CDATA[Eric Zhang]]></dc:creator><pubDate>Sun, 25 Mar 2018 00:00:00 GMT</pubDate></item><item><title><![CDATA[Stop Using Normal Matrix]]></title><description><![CDATA[<div class="paragraph">
<p>For rendering, I used to calculate a normal matrix to transform vertex normals from model space to world space or view space. The normal matrix is defined as the inverse transpose of the upper-left 3x3 part of the model matrix, as described in <a href="http://www.lighthouse3d.com/tutorials/glsl-12-tutorial/the-normal-matrix/">this article</a>. Of course, matrix inversion is not a cheap operation (I discuss matrix inversion in more detail <a href="https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD-Explained.html">here</a>), and I realized that I don’t need to calculate the inverse transpose at all if the model matrix is made of translation, rotation and scale, which is the case for most matrices.</p>
</div>
<div class="imageblock" style="text-align: center;float: right">
<div class="content">
<img src="http://www.lighthouse3d.com/wp-content/uploads/2011/03/normalmat2.gif" alt="normalmat2.gif" width="200">
</div>
<div class="title">Figure 2</div>
</div>
<div class="imageblock" style="text-align: center;float: right">
<div class="content">
<img src="http://www.lighthouse3d.com/wp-content/uploads/2011/03/normalmat1.gif" alt="normalmat1.gif" width="200">
</div>
<div class="title">Figure 1</div>
</div>
<div class="paragraph">
<p>Let’s revisit the problem: why can’t we just use the model matrix to transform the normal? If the matrix has uniform scale, there is no problem. However, if the matrix has non-uniform scale, the normal multiplied by the matrix is no longer perpendicular to the tangent.</p>
</div>
<div class="paragraph">
<p>Now we describe the problem in math terms. Because we only transform direction, we can ignore the translation for convenience and use 3x3 matrix. The matrices used here are row major. Let our model matrix</p>
</div>
<div class="stemblock">
<div class="content">
\[M=\left( \begin{matrix} a\vec{X} \\ b\vec{Y} \\ c\vec{Z} \\ \end{matrix} \right) = \left( \begin{matrix} aX_0 &amp; aX_1 &amp; aX_2 \\ bY_0 &amp; bY_1 &amp; bY_2 \\ cZ_0 &amp; cZ_1 &amp; cZ_2 \\ \end{matrix} \right)\]
</div>
</div>
<div class="paragraph">
<p>be made of rotation (axes \(\vec{X}\), \(\vec{Y}\) and \(\vec{Z}\), where \(\vec{X}\cdot\vec{Y}=\vec{X}\cdot\vec{Z}=\vec{Y}\cdot\vec{Z}=0\), and \(\left|\vec{X}\right|=\left|\vec{Y}\right|=\left|\vec{Z}\right|=1\)) and scale \((a,b,c)\). We also have a tangent \(\vec{T}=(T_0,T_1,T_2)\) and a normal \(\vec{N}=(N_0,N_1,N_2)\) such that \(\vec{T}\cdot\vec{N}=0\). After the transform, the tangent becomes \(\vec{T'}=\vec{T}M=a{T_0}\vec{X}+b{T_1}\vec{Y}+c{T_2}\vec{Z}\), and we need to find a normal \(\vec{N'}\) such that \(\vec{T'}\cdot\vec{N'}=0\).</p>
</div>
<div class="paragraph">
<p>Since \(\vec{X}\), \(\vec{Y}\) and \(\vec{Z}\) are mutually perpendicular unit axes, if we write \(\vec{N'}={N'_0}\vec{X} + {N'_1}\vec{Y} + {N'_2}\vec{Z}\), then \(\vec{X}\cdot\vec{N'}={N'_0}\) (and likewise for \(\vec{Y}\) and \(\vec{Z}\)), so we can expand the dot product:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
\vec{T'}\cdot\vec{N'}&amp;=(a{T_0}\vec{X}+b{T_1}\vec{Y}+c{T_2}\vec{Z})\cdot\vec{N'}\\
&amp;=a{T_0}\vec{X}\cdot\vec{N'}+b{T_1}\vec{Y}\cdot\vec{N'}+c{T_2}\vec{Z}\cdot\vec{N'}\\
&amp;=a{T_0}{N'_0}+b{T_1}{N'_1}+c{T_2}{N'_2}
\end{align*}\]
</div>
</div>
<div class="paragraph">
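<p>Doesn’t this look familiar? We already know that the original tangent and normal satisfy \(\vec{T}\cdot\vec{N}={T_0}{N_0} + {T_1}{N_1} + {T_2}{N_2}=0\). So if we choose \({N'_0}=\frac{N_0}{a}\), \({N'_1}=\frac{N_1}{b}\) and \({N'_2}=\frac{N_2}{c}\), the expansion above reduces to \(\vec{T}\cdot\vec{N}=0\). Since the rows of \(M\) are \(a\vec{X}\), \(b\vec{Y}\) and \(c\vec{Z}\), we can also write this “transformed” normal in terms of \(M\):</p>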
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
\vec{N'}&amp;=\frac{N_0}{a}\vec{X} + \frac{N_1}{b}\vec{Y} + \frac{N_2}{c}\vec{Z}\\
&amp;=(\frac{N_0}{a^2}, \frac{N_1}{b^2}, \frac{N_2}{c^2})M
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>This means that instead of calculating the inverse transpose of the model matrix and sending a 3x3 matrix to the shader, you can simply calculate the reciprocal of each squared scale and send a vector3, use it to rescale the normal, and multiply by the model matrix you already have. That’s it. Because the calculation is cheap, you can even avoid sending extra data and compute everything in the vertex shader (3 dot products to get the squared scales from the model matrix rows, then 3 reciprocals and multiplies).</p>
</div>
<div class="paragraph">
<p>Of course, the resulting normal needs to be re-normalized, but that is true whichever method you use. Moreover, since you have to re-normalize the normal in the pixel shader anyway (after interpolation it may no longer be of unit length), you don’t need to do anything extra in the vertex shader.</p>
</div>
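<div class="paragraph">
<p>A minimal C++ sketch of this method, assuming a row-major 3x3 model matrix whose rows are \(a\vec{X}\), \(b\vec{Y}\), \(c\vec{Z}\) and row vectors multiplied on the left. Because the squared row lengths are exactly \(a^2\), \(b^2\), \(c^2\), dividing the normal’s components by them and multiplying by \(M\) gives \(\frac{N_0}{a}\vec{X} + \frac{N_1}{b}\vec{Y} + \frac{N_2}{c}\vec{Z}\). <code>Vec3</code>, <code>Mat3</code>, <code>mul</code> and <code>transformNormal</code> are hypothetical names.</p>
</div>

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Row-major 3x3: row i is a scaled axis; axes are mutually perpendicular.
struct Mat3 { Vec3 row[3]; };

// Row vector times row-major matrix: v.x * row0 + v.y * row1 + v.z * row2.
Vec3 mul(Vec3 v, const Mat3& m) {
    return {v.x * m.row[0].x + v.y * m.row[1].x + v.z * m.row[2].x,
            v.x * m.row[0].y + v.y * m.row[1].y + v.z * m.row[2].y,
            v.x * m.row[0].z + v.y * m.row[1].z + v.z * m.row[2].z};
}

// Transform a normal without an inverse transpose: divide each component by
// the squared length of its axis row (a^2, b^2, c^2), multiply by M, normalize.
Vec3 transformNormal(Vec3 n, const Mat3& m) {
    Vec3 scaled = {n.x / dot(m.row[0], m.row[0]),
                   n.y / dot(m.row[1], m.row[1]),
                   n.z / dot(m.row[2], m.row[2])};
    Vec3 r = mul(scaled, m);
    float len = std::sqrt(dot(r, r));
    assert(len > 0.0f);
    return {r.x / len, r.y / len, r.z / len};
}
```

Under a rotation combined with a non-uniform scale, the result stays perpendicular to the transformed tangent, while naively multiplying the normal by \(M\) does not.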
<div class="paragraph">
<p>Now how does the normal matrix handle the same problem? The inverse of our 3x3 model matrix is</p>
</div>
<div class="stemblock">
<div class="content">
\[M^{-1}=\left( \begin{matrix} \frac{1}{a}\vec{X} &amp; \frac{1}{b}\vec{Y} &amp; \frac{1}{c}\vec{Z} \\ \end{matrix} \right) = \left( \begin{matrix} \frac{1}{a}X_0 &amp; \frac{1}{b}Y_0 &amp; \frac{1}{c}Z_0 \\ \frac{1}{a}X_1 &amp; \frac{1}{b}Y_1 &amp; \frac{1}{c}Z_1 \\ \frac{1}{a}X_2 &amp; \frac{1}{b}Y_2 &amp; \frac{1}{c}Z_2 \\ \end{matrix} \right)\]
</div>
</div>
<div class="paragraph">
<p>It should be easy to confirm \(MM^{-1}=I\). More about matrix inverse can be found <a href="https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD-Explained.html">here</a>.</p>
</div>
<div class="paragraph">
<p>The normal matrix is \(M'={(M^{-1})}^{T}=\left( \begin{matrix} \frac{1}{a}\vec{X} \\ \frac{1}{b}\vec{Y} \\ \frac{1}{c}\vec{Z} \\ \end{matrix} \right)\).</p>
</div>
<div class="paragraph">
<p>The transformed normal is \(\vec{N'}=\vec{N}M'=\frac{N_0}{a}\vec{X} + \frac{N_1}{b}\vec{Y} + \frac{N_2}{c}\vec{Z}\).</p>
</div>
<div class="paragraph">
<p>Well, we got the same result.</p>
</div>
<div class="paragraph">
<p>One caveat: this method only works if the matrix axes are perpendicular to each other, that is \(\vec{X}\cdot\vec{Y}=\vec{X}\cdot\vec{Z}=\vec{Y}\cdot\vec{Z}=0\). If your matrix is made of translation, rotation and scale, this always holds. However, if your coordinate system does not satisfy this (for example, a matrix with shear), you need to fall back to the normal matrix. The proof and explanation for the normal matrix in the general case can be found in <a href="http://www.lighthouse3d.com/tutorials/glsl-12-tutorial/the-normal-matrix/">this original article</a>.</p>
</div>]]></description><link>https://lxjk.github.io/2017/10/01/Stop-Using-Normal-Matrix.html</link><guid isPermaLink="true">https://lxjk.github.io/2017/10/01/Stop-Using-Normal-Matrix.html</guid><category><![CDATA[Math]]></category><category><![CDATA[Graphics]]></category><dc:creator><![CDATA[Eric Zhang]]></dc:creator><pubDate>Sun, 01 Oct 2017 00:00:00 GMT</pubDate></item><item><title><![CDATA[Fast 4x4 Matrix Inverse with SSE SIMD, Explained]]></title><description><![CDATA[<div id="preamble">
<div class="sectionbody">
<div id="toc" class="toc">
<div id="toctitle" class="title">Table of Contents</div>
<ul class="sectlevel2">
<li><a href="#_transform_matrix_inverse">Transform Matrix Inverse</a></li>
<li><a href="#_general_matrix_inverse">General Matrix Inverse</a></li>
<li><a href="#_appendix_1">Appendix 1</a></li>
<li><a href="#_appendix_2">Appendix 2</a></li>
</ul>
</div>
<div class="paragraph">
<p>Before we start, think about this question: do we really need the inverse of a general matrix?</p>
</div>
<div class="paragraph">
<p>I came across this problem when writing a math library for my game engine. In a game or 3D application, we use 4x4 matrices for object transforms, which combine 3D translation, rotation and scale. If most of your matrices are used as transform matrices, their special structure gives us a fast route for calculating their inverse. In fact, the transform matrix inverse costs only about 50% of the optimized general matrix inverse. In the first half of this post we will talk about transform matrices. In the second half we will dive in and explain the SIMD version of the general 4x4 matrix inverse, and compare the performance of our method with commonly used math libraries from UE4, Eigen and DirectX Math.</p>
</div>
<div class="paragraph">
<p>The matrices used in this post are row major. This is mainly because (1) it is easier to demonstrate and visualize with the matrix data layout, and (2) it is easier to compare with other math libraries. The same matrix inverse function works for both row major and column major, because \(A^{-1}=((A^{T})^{-1})^{T}\) (inverting is the same as transposing, inverting, then transposing again). However, if you are a column major person like myself, I have a full column major version for you in <a href="#Appendix">[Appendix]</a>.</p>
</div>
</div>
</div>
<div class="sect2">
<h3 id="_transform_matrix_inverse">Transform Matrix Inverse</h3>
<div class="paragraph">
<p>The transform matrix we are talking about here is defined as following:</p>
</div>
<div class="stemblock">
<div class="content">
\[M=\left( \begin{matrix} a\vec{X} &amp; 0 \\ b\vec{Y} &amp; 0 \\ c\vec{Z} &amp; 0 \\ \vec{T} &amp; 1 \\ \end{matrix} \right) = \left( \begin{matrix} aX_0 &amp; aX_1 &amp; aX_2 &amp; 0 \\ bY_0 &amp; bY_1 &amp; bY_2 &amp; 0 \\ cZ_0 &amp; cZ_1 &amp; cZ_2 &amp; 0 \\ T_0 &amp; T_1 &amp; T_2 &amp; 1 \\ \end{matrix} \right)\]
</div>
</div>
<div class="paragraph">
<p>The first 3 components of the last row are the translation \(\vec{T}\). The top-left 3x3 sub-matrix is the scaled rotation matrix, with each row a scaled axis. We have \(\vec{X}\cdot\vec{Y}=\vec{X}\cdot\vec{Z}=\vec{Y}\cdot\vec{Z}=0\), and \(\left|\vec{X}\right|=\left|\vec{Y}\right|=\left|\vec{Z}\right|=1\). The scale is \((a,b,c)\).</p>
</div>
<div class="paragraph">
<p>Most matrices in the game are of this form. For example, \(M\) represents a local to world transform, \(\vec{X}\), \(\vec{Y}\), \(\vec{Z}\) are your local space axes. If you have a point \(\vec{P}(P_0,P_1,P_2)\), and you want to transform it from local space to world space, you do this:</p>
</div>
<div class="stemblock">
<div class="content">
\[\vec{P'}=P_0a\vec{X}+P_1b\vec{Y}+P_2c\vec{Z}+\vec{T}\]
</div>
</div>
<div class="paragraph">
<p>This is the same as extending \(\vec{P}\) to a 4-component vector \(\vec{P}(P_0,P_1,P_2,1)\) and multiplying by matrix \(M\). Now what does the inverse matrix \(M^{-1}\) mean? In this case it represents a world to local transform, so if we multiply \(\vec{P'}\) by \(M^{-1}\), we should get \(\vec{P}\) back. How do we transform the point \(\vec{P'}\) from world space back into local space? We subtract the local space origin (the translation \(\vec{T}\)), dot the result with each axis to get its local space coordinate, and rescale it:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
\vec{P}&amp;=(\frac{1}{a}(\vec{P'}-\vec{T})\cdot\vec{X},\frac{1}{b}(\vec{P'}-\vec{T})\cdot\vec{Y},\frac{1}{c}(\vec{P'}-\vec{T})\cdot\vec{Z})\\
&amp;=(\frac{1}{a}\vec{P'}\cdot\vec{X},\frac{1}{b}\vec{P'}\cdot\vec{Y},\frac{1}{c}\vec{P'}\cdot\vec{Z})-(\frac{1}{a}\vec{T}\cdot\vec{X},\frac{1}{b}\vec{T}\cdot\vec{Y},\frac{1}{c}\vec{T}\cdot\vec{Z})
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>With this, we can actually directly write the form of the inverse of our matrix.</p>
</div>
<div class="stemblock">
<div class="content">
\[M^{-1}=\left( \begin{matrix} \frac{1}{a}\vec{X} &amp; \frac{1}{b}\vec{Y} &amp; \frac{1}{c}\vec{Z} &amp; \vec{0} \\ -\vec{T}\cdot\frac{1}{a}\vec{X} &amp; -\vec{T}\cdot\frac{1}{b}\vec{Y} &amp; -\vec{T}\cdot\frac{1}{c}\vec{Z} &amp; 1 \\ \end{matrix} \right) = \left( \begin{matrix} \frac{1}{a}X_0 &amp; \frac{1}{b}Y_0 &amp; \frac{1}{c}Z_0 &amp; 0 \\ \frac{1}{a}X_1 &amp; \frac{1}{b}Y_1 &amp; \frac{1}{c}Z_1 &amp; 0 \\ \frac{1}{a}X_2 &amp; \frac{1}{b}Y_2 &amp; \frac{1}{c}Z_2 &amp; 0 \\ -\vec{T}\cdot\frac{1}{a}\vec{X} &amp; -\vec{T}\cdot\frac{1}{b}\vec{Y} &amp; -\vec{T}\cdot\frac{1}{c}\vec{Z} &amp; 1 \\ \end{matrix} \right)\]
</div>
</div>
<div class="paragraph">
<p>The nice thing about the transform matrix \(M\) is that its first 3 rows are perpendicular to each other. Its inverse is basically the transposed 3x3 rotation part, rescaled, with the translation part replaced by the negated dot products of \(\vec{T}\) with the 3 rescaled axes. It should be easy to confirm \(MM^{-1}=I\).</p>
</div>
<div class="paragraph">
<p>Now let’s bake in the scale (so \(\left|\vec{X}\right|=a\),\(\left|\vec{Y}\right|=b\),\(\left|\vec{Z}\right|=c\)) and get a more generic form.</p>
</div>
<div class="stemblock">
<div class="content">
\[M=\left( \begin{matrix} \vec{X} &amp; 0 \\ \vec{Y} &amp; 0 \\ \vec{Z} &amp; 0 \\ \vec{T} &amp; 1 \\ \end{matrix} \right), M^{-1}=\left( \begin{matrix} \frac{1}{{\left|\vec{X}\right|}^{2}}\vec{X} &amp; \frac{1}{{\left|\vec{Y}\right|}^{2}}\vec{Y} &amp; \frac{1}{{\left|\vec{Z}\right|}^{2}}\vec{Z} &amp; \vec{0} \\ -\vec{T}\cdot\frac{1}{{\left|\vec{X}\right|}^{2}}\vec{X} &amp; -\vec{T}\cdot\frac{1}{{\left|\vec{Y}\right|}^{2}}\vec{Y} &amp; -\vec{T}\cdot\frac{1}{{\left|\vec{Z}\right|}^{2}}\vec{Z} &amp; 1 \\ \end{matrix} \right)\]
</div>
</div>
<div class="paragraph">
<p>Notice that for rescaling, we divide by the squared length of each scaled axis instead of its length, which is good news for the implementation, since no square root is needed. And if our transform has unit scale, which is also common in games, the result becomes even simpler.</p>
</div>
<div class="stemblock">
<div class="content">
\[M^{-1}=\left( \begin{matrix} \vec{X} &amp; \vec{Y} &amp; \vec{Z} &amp; \vec{0} \\ -\vec{T}\cdot\vec{X} &amp; -\vec{T}\cdot\vec{Y} &amp; -\vec{T}\cdot\vec{Z} &amp; 1 \\ \end{matrix} \right)\]
</div>
</div>
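<div class="paragraph">
<p>Before any SIMD, a plain scalar reference of the generic form can be handy for validating optimized versions. This is a sketch, assuming the same row-major layout and row-vector convention as this post; <code>Mat4</code>, <code>transformInverse</code> and <code>mul</code> are hypothetical names. Each column of the upper 3x3 is an axis divided by its squared length, and the last row holds the negated dot products with \(\vec{T}\).</p>
</div>

```cpp
#include <cassert>
#include <cmath>

// Row-major 4x4; rows 0-2 are scaled axes, row 3 is the translation.
struct Mat4 { float m[4][4]; };

// Requires rows 0-2 to be mutually perpendicular and column 3 to be (0, 0, 0, 1).
Mat4 transformInverse(const Mat4& in) {
    Mat4 r = {};
    for (int i = 0; i < 3; ++i) {
        const float* axis = in.m[i];
        float sizeSqr = axis[0] * axis[0] + axis[1] * axis[1] + axis[2] * axis[2];
        assert(sizeSqr > 0.0f);
        // Column i of the upper 3x3: axis i divided by its squared length.
        for (int j = 0; j < 3; ++j) r.m[j][i] = axis[j] / sizeSqr;
        // Last row: -T dot (axis i / |axis i|^2).
        r.m[3][i] = -(in.m[3][0] * r.m[0][i] + in.m[3][1] * r.m[1][i] + in.m[3][2] * r.m[2][i]);
    }
    r.m[3][3] = 1.0f;
    return r;
}

// Plain 4x4 product, used to check M * inverse(M) == I.
Mat4 mul(const Mat4& a, const Mat4& b) {
    Mat4 r = {};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k) r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}
```

With unit-length axes the squared lengths are 1, and this reduces to the unit-scale form above.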
<div class="paragraph">
<p>Alright, enough theory, let’s see some code. This is our matrix definition.</p>
</div>
<div class="listingblock">
<div class="content">
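<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#include &lt;pmmintrin.h&gt; // SSE3 intrinsics (also pulls in the SSE/SSE2 headers for __m128)

__declspec(align(16)) struct Matrix4 // MSVC syntax; alignas(16) is the portable C++11 spelling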
{
public:
	union
	{
		float m[4][4];
		__m128 mVec[4];
	};
};</code></pre>
</div>
</div>
<div class="paragraph">
<p>Before we jump into intrinsics, I would like to define a few shuffle/swizzle macros; hopefully they make the code easier to read. We also make use of faster instructions for special shuffles.</p>
</div>
<div class="paragraph">
<p>(Thank you <strong>Stefan Kaps</strong> for pointing out single register shuffle instruction _mm_shuffle_epi32!)</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#define MakeShuffleMask(x,y,z,w)           (x | (y&lt;&lt;2) | (z&lt;&lt;4) | (w&lt;&lt;6))

// vec(0, 1, 2, 3) -&gt; (vec[x], vec[y], vec[z], vec[w])
#define VecSwizzleMask(vec, mask)          _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(vec), mask))
#define VecSwizzle(vec, x, y, z, w)        VecSwizzleMask(vec, MakeShuffleMask(x,y,z,w))
#define VecSwizzle1(vec, x)                VecSwizzleMask(vec, MakeShuffleMask(x,x,x,x))
// special swizzle
#define VecSwizzle_0022(vec)               _mm_moveldup_ps(vec)
#define VecSwizzle_1133(vec)               _mm_movehdup_ps(vec)

// return (vec1[x], vec1[y], vec2[z], vec2[w])
#define VecShuffle(vec1, vec2, x,y,z,w)    _mm_shuffle_ps(vec1, vec2, MakeShuffleMask(x,y,z,w))
// special shuffle
#define VecShuffle_0101(vec1, vec2)        _mm_movelh_ps(vec1, vec2)
#define VecShuffle_2323(vec1, vec2)        _mm_movehl_ps(vec2, vec1)</code></pre>
</div>
</div>
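<div class="paragraph">
<p><code>MakeShuffleMask</code> packs two bits per destination lane, with lane 0 in the lowest bits, so <code>MakeShuffleMask(x,y,z,w)</code> equals the standard <code>_MM_SHUFFLE(w,z,y,x)</code> from xmmintrin.h (arguments reversed). A constexpr sketch with hypothetical names, needing no intrinsics, to check mask values at compile time:</p>
</div>

```cpp
#include <cassert>

// Same bit layout as the MakeShuffleMask macro: 2 bits per destination lane.
constexpr int makeShuffleMask(int x, int y, int z, int w) {
    return x | (y << 2) | (z << 4) | (w << 6);
}

// Source lane index selected for destination lane `lane` (0..3).
constexpr int shuffleSource(int mask, int lane) { return (mask >> (2 * lane)) & 3; }

// Identity swizzle (0,1,2,3) and a broadcast of lane 3.
static_assert(makeShuffleMask(0, 1, 2, 3) == 0xE4, "identity mask");
static_assert(makeShuffleMask(3, 3, 3, 3) == 0xFF, "broadcast lane 3");
```

For instance, <code>VecSwizzle(v, 1,0,3,2)</code>, which swaps neighboring lanes, uses mask 0xB1.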
<div class="paragraph">
<p>Here is our first function, which inverts a transform matrix without scaling (unit scale only).</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">// Requires this matrix to be transform matrix, NoScale version requires this matrix be of scale 1
inline Matrix4 GetTransformInverseNoScale(const Matrix4&amp; inM)
{
	Matrix4 r;

	// transpose 3x3, we know m03 = m13 = m23 = 0
	__m128 t0 = VecShuffle_0101(inM.mVec[0], inM.mVec[1]); // 00, 01, 10, 11
	__m128 t1 = VecShuffle_2323(inM.mVec[0], inM.mVec[1]); // 02, 03, 12, 13
	r.mVec[0] = VecShuffle(t0, inM.mVec[2], 0,2,0,3); // 00, 10, 20, 23(=0)
	r.mVec[1] = VecShuffle(t0, inM.mVec[2], 1,3,1,3); // 01, 11, 21, 23(=0)
	r.mVec[2] = VecShuffle(t1, inM.mVec[2], 0,2,2,3); // 02, 12, 22, 23(=0)

	// last line
	r.mVec[3] =                       _mm_mul_ps(r.mVec[0], VecSwizzle1(inM.mVec[3], 0));
	r.mVec[3] = _mm_add_ps(r.mVec[3], _mm_mul_ps(r.mVec[1], VecSwizzle1(inM.mVec[3], 1)));
	r.mVec[3] = _mm_add_ps(r.mVec[3], _mm_mul_ps(r.mVec[2], VecSwizzle1(inM.mVec[3], 2)));
	r.mVec[3] = _mm_sub_ps(_mm_setr_ps(0.f, 0.f, 0.f, 1.f), r.mVec[3]);

	return r;
}</code></pre>
</div>
</div>
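<div class="paragraph">
<p>When checking the shuffles, it can be handy to have the same math in plain floats. The sketch below assumes a hypothetical Mat4 holding float m[4][4] in the same row-major layout (rows 0 to 2 are unit axes, row 3 is the translation, points are row vectors). It transposes the 3x3 part and sets the new translation to \(-(t \cdot \mathrm{axis}_j)\) for each axis \(j\), which is what the last four lines of the SIMD version compute.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">// Scalar sketch of GetTransformInverseNoScale for cross-checking (hypothetical Mat4, not the engine's Matrix4).
// Row major, row vectors: rows 0..2 are unit-length orthogonal axes, row 3 is the translation.
struct Mat4 { float m[4][4]; };

constexpr Mat4 InverseNoScaleRef(Mat4 in)
{
	Mat4 r{};                                   // zero-initialized, so r.m[0..2][3] are already 0
	for (int i = 0; i != 3; ++i)
		for (int j = 0; j != 3; ++j)
			r.m[i][j] = in.m[j][i];             // transpose the 3x3 rotation
	for (int j = 0; j != 3; ++j)                // new translation: -(t dot axis j)
		r.m[3][j] = -(in.m[3][0] * in.m[j][0] + in.m[3][1] * in.m[j][1] + in.m[3][2] * in.m[j][2]);
	r.m[3][3] = 1.f;
	return r;
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>With a 90-degree rotation about Z, i.e. axes (0, 1, 0), (-1, 0, 0), (0, 0, 1), and translation (5, 6, 7), the translation row of the inverse comes out as (-6, 5, -7, 1).</p>
</div>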
<div class="paragraph">
<p>Very straightforward. This is the fastest function you can have: it only does a transpose and a few dot products. If we add in scale, rescaling takes a little more time, but it is still pretty fast. There is a little trick for calculating the squared size of each axis: since we need to transpose the 3x3 rotation part anyway, we compute the squared sizes after the transpose, which gives us all 3 axes in one go.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">#define SMALL_NUMBER		(1.e-8f)

// Requires this matrix to be a transform matrix (orthogonal axes, scale allowed, no shear)
inline Matrix4 GetTransformInverse(const Matrix4&amp; inM)
{
	Matrix4 r;

	// transpose 3x3, we know m03 = m13 = m23 = 0
	__m128 t0 = VecShuffle_0101(inM.mVec[0], inM.mVec[1]); // 00, 01, 10, 11
	__m128 t1 = VecShuffle_2323(inM.mVec[0], inM.mVec[1]); // 02, 03, 12, 13
	r.mVec[0] = VecShuffle(t0, inM.mVec[2], 0,2,0,3); // 00, 10, 20, 23(=0)
	r.mVec[1] = VecShuffle(t0, inM.mVec[2], 1,3,1,3); // 01, 11, 21, 23(=0)
	r.mVec[2] = VecShuffle(t1, inM.mVec[2], 0,2,2,3); // 02, 12, 22, 23(=0)

	// (SizeSqr(mVec[0]), SizeSqr(mVec[1]), SizeSqr(mVec[2]), 0)
	__m128 sizeSqr;
	sizeSqr =                     _mm_mul_ps(r.mVec[0], r.mVec[0]);
	sizeSqr = _mm_add_ps(sizeSqr, _mm_mul_ps(r.mVec[1], r.mVec[1]));
	sizeSqr = _mm_add_ps(sizeSqr, _mm_mul_ps(r.mVec[2], r.mVec[2]));

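	// test to avoid divide by 0 (_mm_blendv_ps needs SSE4.1); it also maps lane 3, where sizeSqr is
	// always 0, to 1 instead of 1/0 = inf, which would otherwise put 0 * inf = NaN in the 4th column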
	__m128 one = _mm_set1_ps(1.f);
	// for each component, if(sizeSqr &lt; SMALL_NUMBER) sizeSqr = 1;
	__m128 rSizeSqr = _mm_blendv_ps(
		_mm_div_ps(one, sizeSqr),
		one,
		_mm_cmplt_ps(sizeSqr, _mm_set1_ps(SMALL_NUMBER))
		);

	r.mVec[0] = _mm_mul_ps(r.mVec[0], rSizeSqr);
	r.mVec[1] = _mm_mul_ps(r.mVec[1], rSizeSqr);
	r.mVec[2] = _mm_mul_ps(r.mVec[2], rSizeSqr);

	// last line
	r.mVec[3] =                       _mm_mul_ps(r.mVec[0], VecSwizzle1(inM.mVec[3], 0));
	r.mVec[3] = _mm_add_ps(r.mVec[3], _mm_mul_ps(r.mVec[1], VecSwizzle1(inM.mVec[3], 1)));
	r.mVec[3] = _mm_add_ps(r.mVec[3], _mm_mul_ps(r.mVec[2], VecSwizzle1(inM.mVec[3], 2)));
	r.mVec[3] = _mm_sub_ps(_mm_setr_ps(0.f, 0.f, 0.f, 1.f), r.mVec[3]);

	return r;
}</code></pre>
</div>
</div>
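<div class="paragraph">
<p>The scaled version can be sketched the same way, again with a hypothetical scalar Mat4 in the same layout (orthogonal, possibly scaled axes in rows 0 to 2, translation in row 3). For brevity this sketch leaves out the small-number guard, so it assumes every axis has a non-zero length. Column \(i\) of the transposed 3x3 part is divided by the squared length of axis \(i\), which is what multiplying each row by rSizeSqr does in the SIMD code.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">// Scalar sketch of GetTransformInverse for cross-checking (hypothetical Mat4, not the engine's Matrix4).
// Rows 0..2 are orthogonal axes that may be scaled, row 3 is the translation.
// No small-number guard: every axis must have a non-zero length.
struct Mat4 { float m[4][4]; };

constexpr Mat4 InverseScaledRef(Mat4 in)
{
	Mat4 r{};
	for (int i = 0; i != 3; ++i)
	{
		// 1 / squared length of axis i (row i of the input)
		float rSizeSqr = 1.f / (in.m[i][0] * in.m[i][0] + in.m[i][1] * in.m[i][1] + in.m[i][2] * in.m[i][2]);
		for (int j = 0; j != 3; ++j)
			r.m[j][i] = in.m[i][j] * rSizeSqr;  // transpose, then scale column i
	}
	for (int j = 0; j != 3; ++j)                // new translation: -(t.x * r[0] + t.y * r[1] + t.z * r[2])
		r.m[3][j] = -(in.m[3][0] * r.m[0][j] + in.m[3][1] * r.m[1][j] + in.m[3][2] * r.m[2][j]);
	r.m[3][3] = 1.f;
	return r;
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>For axes (0, 2, 0), (-4, 0, 0), (0, 0, 1) and translation (5, 6, 7), the result maps the world point (5, 8, 7) back to (1, 0, 0), and its translation row is (-3, 1.25, -7, 1).</p>
</div>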
<div class="paragraph">
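<p>Notice that the top and bottom of the function are exactly the same as in the NoScale version. In the middle we calculate the squared sizes and divide by them, guarded by a divide-by-small-number test. If you remove that test, make sure the 4th lane is handled some other way: its squared size is always 0, and 1/0 = inf would turn the 4th column of the result into NaN.</p>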
</div>
</div>
<div class="sect2">
<h3 id="_general_matrix_inverse">General Matrix Inverse</h3>
<div class="paragraph">
<p>For a general matrix, things get more complicated. You can find most of the theory in the following Wikipedia pages:
<a href="https://en.wikipedia.org/wiki/Invertible_matrix">Invertible Matrix</a>, <a href="https://en.wikipedia.org/wiki/Adjugate_matrix">Adjugate Matrix</a>, <a href="https://en.wikipedia.org/wiki/Determinant#Relation_to_eigenvalues_and_trace">Determinant</a>, <a href="https://en.wikipedia.org/wiki/Trace_(linear_algebra)">Trace</a>.</p>
</div>
<div class="paragraph">
<p>We will introduce the relevant concepts as we go. The method is based on the same block-matrix approach Intel used for its <a href="https://software.intel.com/en-us/articles/optimized-matrix-library-for-use-with-the-intel-pentiumr-4-processors-sse2-instructions/">Optimized Matrix Library</a>.</p>
</div>
<div class="paragraph">
<p>A 4x4 matrix can be described as four 2x2 submatrices. The good thing about 2x2 matrices is not only that their inverse and determinant are easy to calculate, but also that each one fits in a single vector register, so these calculations can be done very fast.</p>
</div>
<div class="stemblock">
<div class="content">
\[M=\left( \begin{matrix} A &amp; B \\ C &amp; D \\ \end{matrix} \right)=\left( \begin{matrix} A_0 &amp; A_1 &amp; B_0 &amp; B_1 \\ A_2 &amp; A_3 &amp; B_2 &amp; B_3 \\ C_0 &amp; C_1 &amp; D_0 &amp; D_1 \\ C_2 &amp; C_3 &amp; D_2 &amp; D_3 \\ \end{matrix} \right)\]
</div>
</div>
<div class="paragraph">
<p>For the following derivation, we assume that submatrices \(A\) and \(D\) are invertible and that \(C\) and \(D\) commute (\(CD=DC\)) (credit to <strong>wychmaster</strong> for pointing out the assumptions). These are rather strong assumptions, but they help us derive the final form we use for calculation. Later, in the appendix, we will prove that the result still holds for 4x4 matrices even if none of these assumptions is true.</p>
</div>
<div class="paragraph">
<p>The block-wise inverse of a matrix is given by the following:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
{\left( \begin{matrix} A &amp; B \\ C &amp; D \\ \end{matrix} \right)}^{-1}&amp;=\left( \begin{matrix} A^{-1}+A^{-1}B(D-CA^{-1}B)^{-1}CA^{-1} &amp; -A^{-1}B(D-CA^{-1}B)^{-1} \\ -(D-CA^{-1}B)^{-1}CA^{-1} &amp; (D-CA^{-1}B)^{-1} \\ \end{matrix} \right)\\
&amp;=\left( \begin{matrix} (A-BD^{-1}C)^{-1} &amp; -(A-BD^{-1}C)^{-1}BD^{-1} \\ -D^{-1}C(A-BD^{-1}C)^{-1} &amp; D^{-1}+D^{-1}C(A-BD^{-1}C)^{-1}BD^{-1} \\ \end{matrix} \right)
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>We actually use a mix of these two forms: the 1st row from the second form, and the 2nd row from the first form.</p>
</div>
<div class="stemblock">
<div class="content">
\[{\left( \begin{matrix} A &amp; B \\ C &amp; D \\ \end{matrix} \right)}^{-1}=\left( \begin{matrix} (A-BD^{-1}C)^{-1} &amp; -(A-BD^{-1}C)^{-1}BD^{-1} \\ -(D-CA^{-1}B)^{-1}CA^{-1} &amp; (D-CA^{-1}B)^{-1} \\ \end{matrix} \right)\]
</div>
</div>
<div class="paragraph">
<p>This choice might not seem obvious. Take the first form, for example: it seems we only need to calculate two 2x2 matrix inverses, \(A^{-1}\) and \((D-CA^{-1} B)^{-1}\). However, the mixed form can be simplified further with the right derivation. Since the corresponding submatrices of the two forms are equal, it doesn’t matter which form you work your math on, so we simply pick the easier row from each form.</p>
</div>
<div class="paragraph">
<p>Before we start the derivation, we need to introduce some concepts. The adjugate of a matrix \(A\) satisfies \(A\operatorname{adj}(A)=\left|A\right|I\), where \(\left|A\right|\) is the determinant of \(A\). For convenience, in this post we denote the adjugate matrix as \(A^{\#}=\operatorname{adj}(A)\). So we can turn an inverse calculation into an adjugate calculation with \(A^{-1}=\frac{1}{\left|A\right|}A^{\#}\). The adjugate of a 2x2 matrix is:</p>
</div>
<div class="stemblock">
<div class="content">
\[A^{\#}={\left( \begin{matrix} A_0 &amp; A_1 \\ A_2 &amp; A_3 \\ \end{matrix} \right)}^{\#}=\left( \begin{matrix} A_3 &amp; -A_1 \\ -A_2 &amp; A_0 \\ \end{matrix} \right)\]
</div>
</div>
<div class="paragraph">
<p>The adjugate of a 2x2 matrix has the following properties: \((AB)^{\#}=B^{\#}A^{\#}\), \((A^{\#})^{\#}=A\), \((cA)^{\#}=cA^{\#}\).</p>
</div>
<div class="paragraph">
<p>For the determinant of a 2x2 matrix, we will use the following properties: \(\left|A\right|={A_0}{A_3}-{A_1}{A_2}\), \(\left|-A\right|=\left|A\right|\), \(\left|AB\right|=\left|A\right|\left|B\right|\), \(\left|A+B\right|=\left|A\right| + \left|B\right| + \operatorname{tr}(A^{\#}{B})\).</p>
</div>
<div class="paragraph">
<p>For the trace of a matrix, we have \(\operatorname{tr}(AB)=\operatorname{tr}(BA)\) and \(\operatorname{tr}(-A)=-\operatorname{tr}(A)\).</p>
</div>
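<div class="paragraph">
<p>These 2x2 identities are easy to spot-check with concrete numbers. The sketch below uses a hypothetical Mat2 struct holding \((A_0, A_1, A_2, A_3)\) in row-major order, the same layout the SIMD code uses later; it is plain scalar code meant only to make the identities tangible.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">// Hypothetical 2x2 row-major matrix: | m[0]  m[1] |
//                                    | m[2]  m[3] |
struct Mat2 { float m[4]; };

constexpr Mat2 Add(Mat2 a, Mat2 b) { return Mat2{{ a.m[0] + b.m[0], a.m[1] + b.m[1], a.m[2] + b.m[2], a.m[3] + b.m[3] }}; }

constexpr Mat2 Mul(Mat2 a, Mat2 b)
{
	return Mat2{{ a.m[0] * b.m[0] + a.m[1] * b.m[2], a.m[0] * b.m[1] + a.m[1] * b.m[3],
	              a.m[2] * b.m[0] + a.m[3] * b.m[2], a.m[2] * b.m[1] + a.m[3] * b.m[3] }};
}

// Adjugate: swap the diagonal, negate the off-diagonal.
constexpr Mat2 Adj(Mat2 a) { return Mat2{{ a.m[3], -a.m[1], -a.m[2], a.m[0] }}; }

constexpr float Det(Mat2 a) { return a.m[0] * a.m[3] - a.m[1] * a.m[2]; }
constexpr float Trace(Mat2 a) { return a.m[0] + a.m[3]; }</code></pre>
</div>
</div>
<div class="paragraph">
<p>With \(A=(1,2,3,4)\) and \(B=(5,6,7,8)\) we get \(\left|A\right|=\left|B\right|=-2\), \(\operatorname{tr}(A^{\#}B)=-4\), and \(\left|A+B\right|=-8=-2-2-4\), as the sum property predicts.</p>
</div>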
<div class="paragraph">
<p>Finally, for our block matrix \(M={\left( \begin{matrix} A &amp; B \\ C &amp; D \\ \end{matrix} \right)}\), the determinant is (the last form relies on \(CD=DC\)):</p>
</div>
<div class="stemblock">
<div class="content">
\[\left|M\right|=\left|A\right|\left|D-CA^{-1}B\right|=\left|D\right|\left|A-BD^{-1}C\right|=\left|AD-BC\right|\]
</div>
</div>
<div class="paragraph">
<p>I only listed the properties needed for the derivation. If you are not familiar with these concepts, or want to know more about them, take a look at the Wikipedia pages above.</p>
</div>
<div class="paragraph">
<p>Let \(M^{-1}={\left( \begin{matrix} A &amp; B \\ C &amp; D \\ \end{matrix} \right)}^{-1}={\left( \begin{matrix} X &amp; Y \\ Z &amp; W \\ \end{matrix} \right)}\). Let’s start with the top-left corner.</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
X&amp;=(A-BD^{-1}C)^{-1}\\
&amp;=\frac{1}{\left|A-BD^{-1}C\right|}(A-\frac{1}{\left|D\right|}BD^{\#}C)^{\#}\\
&amp;=\frac{1}{\left|D\right|\left|A-BD^{-1}C\right|}(\left|D\right|A-BD^{\#}C)^{\#}\\
&amp;=\frac{1}{\left|M\right|}(\left|D\right|A-B(D^{\#}C))^{\#}
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Similarly we can derive the bottom right corner:</p>
</div>
<div class="stemblock">
<div class="content">
\[W=(D-CA^{-1}B)^{-1}=\frac{1}{\left|M\right|}(\left|A\right|D-C(A^{\#}B))^{\#}\]
</div>
</div>
<div class="paragraph">
<p>Notice that we put parentheses around \(D^{\#}C\) and \(A^{\#}B\); you will see why soon.</p>
</div>
<div class="paragraph">
<p>Now let’s do the top-right corner, making use of the result for the top-left corner \(X\):</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
Y&amp;=-(A-BD^{-1}C)^{-1}BD^{-1}\\
&amp;=-\frac{1}{\left|M\right|\left|D\right|}(\left|D\right|A-B(D^{\#}C))^{\#}(BD^{\#})\\
&amp;=-\frac{1}{\left|M\right|\left|D\right|}(\left|D\right|A-B(D^{\#}C))^{\#}(DB^{\#})^{\#}\\
&amp;=-\frac{1}{\left|M\right|\left|D\right|}(\left|D\right|DB^{\#}A-DB^{\#}B(D^{\#}C))^{\#}\\
&amp;=-\frac{1}{\left|M\right|\left|D\right|}(\left|D\right|D(A^{\#}B)^{\#}-\left|D\right|\left|B\right|C)^{\#}\\
&amp;=\frac{1}{\left|M\right|}(\left|B\right|C-D(A^{\#}B)^{\#})^{\#}
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Similarly we can derive the bottom left corner:</p>
</div>
<div class="stemblock">
<div class="content">
\[Z=-(D-CA^{-1}B)^{-1}CA^{-1}=\frac{1}{\left|M\right|}(\left|C\right|B-A(D^{\#}C)^{\#})^{\#}\]
</div>
</div>
<div class="paragraph">
<p>In \(Y\) we also rewrote \(B^{\#}A\) as \((A^{\#}B)^{\#}\), so we can reuse the result of \(A^{\#}B\); \(Z\) likewise uses \((D^{\#}C)^{\#}\) to reuse \(D^{\#}C\). Putting them together:</p>
</div>
<div class="stemblock">
<div class="content">
\[M^{-1}={\left( \begin{matrix} A &amp; B \\ C &amp; D \\ \end{matrix} \right)}^{-1}=\frac{1}{\left|M\right|}{\left( \begin{matrix} (\left|D\right|A-B(D^{\#}C))^{\#} &amp; (\left|B\right|C-D(A^{\#}B)^{\#})^{\#} \\ (\left|C\right|B-A(D^{\#}C)^{\#})^{\#} &amp; (\left|A\right|D-C(A^{\#}B))^{\#} \\ \end{matrix} \right)}\]
</div>
</div>
<div class="paragraph">
<p>Now it is clear what calculations we need: 2x2 matrix multiplication, plus multiplication with an adjugate on either side: \(AB\), \(A^{\#}B\) and \(AB^{\#}\). We already know how to compute an adjugate, but here it can be folded into the multiplication so we don’t waste instructions. Just expand the product and rearrange the terms, for example:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
A^{\#}B&amp;={\left( \begin{matrix} A_3 &amp; -A_1 \\ -A_2 &amp; A_0 \\ \end{matrix} \right)}{\left( \begin{matrix} B_0 &amp; B_1 \\ B_2 &amp; B_3 \\ \end{matrix} \right)}\\
&amp;={\left( \begin{matrix} {A_3}{B_0}-{A_1}{B_2} &amp;{A_3}{B_1}-{A_1}{B_3} \\ {A_0}{B_2}-{A_2}{B_0} &amp; {A_0}{B_3}-{A_2}{B_1} \\ \end{matrix} \right)}
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Here’s the code for these three functions:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">// for row major matrix
// we use __m128 to represent 2x2 matrix as A = | A0  A1 |
//                                              | A2  A3 |
// 2x2 row major Matrix multiply A*B
__forceinline __m128 Mat2Mul(__m128 vec1, __m128 vec2)
{
	return
		_mm_add_ps(_mm_mul_ps(                     vec1, VecSwizzle(vec2, 0,3,0,3)),
		           _mm_mul_ps(VecSwizzle(vec1, 1,0,3,2), VecSwizzle(vec2, 2,1,2,1)));
}
// 2x2 row major Matrix adjugate multiply (A#)*B
__forceinline __m128 Mat2AdjMul(__m128 vec1, __m128 vec2)
{
	return
		_mm_sub_ps(_mm_mul_ps(VecSwizzle(vec1, 3,3,0,0), vec2),
		           _mm_mul_ps(VecSwizzle(vec1, 1,1,2,2), VecSwizzle(vec2, 2,3,0,1)));

}
// 2x2 row major Matrix multiply adjugate A*(B#)
__forceinline __m128 Mat2MulAdj(__m128 vec1, __m128 vec2)
{
	return
		_mm_sub_ps(_mm_mul_ps(                     vec1, VecSwizzle(vec2, 3,0,3,0)),
		           _mm_mul_ps(VecSwizzle(vec1, 1,0,3,2), VecSwizzle(vec2, 2,1,2,1)));
}</code></pre>
</div>
</div>
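<div class="paragraph">
<p>To convince yourself that these swizzle patterns really compute \(AB\), \(A^{\#}B\) and \(AB^{\#}\), you can replay them in plain floats. In this sketch V4, Sw, Mul, Add and Sub are hypothetical lane-wise stand-ins for __m128, VecSwizzle, _mm_mul_ps, _mm_add_ps and _mm_sub_ps; the three Ref functions copy the lane patterns of the row-major helpers above one for one.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">// Plain-float replay of the row-major 2x2 helpers (hypothetical stand-ins, no SIMD).
struct V4 { float v[4]; };

constexpr V4 Sw(V4 a, int x, int y, int z, int w) { return V4{{ a.v[x], a.v[y], a.v[z], a.v[w] }}; }
constexpr V4 Mul(V4 a, V4 b) { return V4{{ a.v[0] * b.v[0], a.v[1] * b.v[1], a.v[2] * b.v[2], a.v[3] * b.v[3] }}; }
constexpr V4 Add(V4 a, V4 b) { return V4{{ a.v[0] + b.v[0], a.v[1] + b.v[1], a.v[2] + b.v[2], a.v[3] + b.v[3] }}; }
constexpr V4 Sub(V4 a, V4 b) { return V4{{ a.v[0] - b.v[0], a.v[1] - b.v[1], a.v[2] - b.v[2], a.v[3] - b.v[3] }}; }

// A*B, same lanes as Mat2Mul
constexpr V4 Mat2MulRef(V4 vec1, V4 vec2)
{
	return Add(Mul(vec1, Sw(vec2, 0,3,0,3)), Mul(Sw(vec1, 1,0,3,2), Sw(vec2, 2,1,2,1)));
}
// (A#)*B, same lanes as Mat2AdjMul
constexpr V4 Mat2AdjMulRef(V4 vec1, V4 vec2)
{
	return Sub(Mul(Sw(vec1, 3,3,0,0), vec2), Mul(Sw(vec1, 1,1,2,2), Sw(vec2, 2,3,0,1)));
}
// A*(B#), same lanes as Mat2MulAdj
constexpr V4 Mat2MulAdjRef(V4 vec1, V4 vec2)
{
	return Sub(Mul(vec1, Sw(vec2, 3,0,3,0)), Mul(Sw(vec1, 1,0,3,2), Sw(vec2, 2,1,2,1)));
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>With \(A=(1,2,3,4)\) and \(B=(5,6,7,8)\) these return \(AB=(19,22,43,50)\), \(A^{\#}B=(6,8,-8,-10)\) and \(AB^{\#}=(-6,4,-4,2)\), the same values you get by multiplying the matrices out by hand.</p>
</div>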
<div class="paragraph">
<p>Another trick: after we calculate a 2x2 submatrix such as \(\left|D\right|A-B(D^{\#}C)\), the final adjugate needed to get \(X=\frac{1}{\left|M\right|}(\left|D\right|A-B(D^{\#}C))^{\#}\) can be combined with storing the 2x2 submatrices into the resulting 4x4 matrix. You can see this at the end of the function.</p>
</div>
<div class="paragraph">
<p>The only thing left is the determinant. A 2x2 determinant is easy; the real problem is the determinant of the whole 4x4 matrix. Remember the determinant properties we gave above:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
\left|M\right|&amp;=\left|AD-BC\right|\\
&amp;=\left|AD\right|+\left|-BC\right|+\operatorname{tr}((AD)^{\#}(-BC))\\
&amp;=\left|A\right|\left|D\right|+\left|B\right|\left|C\right|-\operatorname{tr}(D^{\#}A^{\#}BC)\\
&amp;=\left|A\right|\left|D\right|+\left|B\right|\left|C\right|-\operatorname{tr}((A^{\#}B)(D^{\#}C))
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>This is good: we need to calculate all the submatrix determinants and the matrices \(A^{\#}B\) and \(D^{\#}C\) anyway. And if you expand the trace of a 2x2 matrix product:</p>
</div>
<div class="stemblock">
<div class="content">
\[\operatorname{tr}(AB)={A_0}{B_0}+{A_1}{B_2}+{A_2}{B_1}+{A_3}{B_3}\]
</div>
</div>
<div class="paragraph">
<p>This is just a shuffle and a dot product, which is easy enough to translate into instructions.</p>
</div>
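<div class="paragraph">
<p>Before putting it into SIMD, the determinant formula can be checked against a plain cofactor (Laplace) expansion. The sketch below is scalar C++ with hypothetical Mat2/Mat4 structs in row-major order; DetBlock evaluates \(\left|A\right|\left|D\right|+\left|B\right|\left|C\right|-\operatorname{tr}((A^{\#}B)(D^{\#}C))\) using the trace expansion above, and DetLaplace expands along the first row.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">// Hypothetical scalar types, row major.
struct Mat2 { float m[4]; };
struct Mat4 { float m[4][4]; };

constexpr float Det2(Mat2 a) { return a.m[0] * a.m[3] - a.m[1] * a.m[2]; }

// (A#)*B written out
constexpr Mat2 AdjMul(Mat2 a, Mat2 b)
{
	return Mat2{{ a.m[3] * b.m[0] - a.m[1] * b.m[2], a.m[3] * b.m[1] - a.m[1] * b.m[3],
	              a.m[0] * b.m[2] - a.m[2] * b.m[0], a.m[0] * b.m[3] - a.m[2] * b.m[1] }};
}

// tr(XY) = X0 Y0 + X1 Y2 + X2 Y1 + X3 Y3
constexpr float TraceMul(Mat2 x, Mat2 y) { return x.m[0] * y.m[0] + x.m[1] * y.m[2] + x.m[2] * y.m[1] + x.m[3] * y.m[3]; }

// 2x2 block whose top-left element is M[r][c]
constexpr Mat2 Block(Mat4 M, int r, int c) { return Mat2{{ M.m[r][c], M.m[r][c + 1], M.m[r + 1][c], M.m[r + 1][c + 1] }}; }

// |M| = |A||D| + |B||C| - tr((A#B)(D#C))
constexpr float DetBlock(Mat4 M)
{
	return Det2(Block(M, 0, 0)) * Det2(Block(M, 2, 2)) + Det2(Block(M, 0, 2)) * Det2(Block(M, 2, 0))
	     - TraceMul(AdjMul(Block(M, 0, 0), Block(M, 0, 2)), AdjMul(Block(M, 2, 2), Block(M, 2, 0)));
}

// Reference: Laplace expansion along row 0, using 3x3 minors of rows 1..3.
constexpr float Det3(float a, float b, float c, float d, float e, float f, float g, float h, float i)
{
	return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g);
}
constexpr float Minor0(Mat4 M, int c0, int c1, int c2)
{
	return Det3(M.m[1][c0], M.m[1][c1], M.m[1][c2], M.m[2][c0], M.m[2][c1], M.m[2][c2], M.m[3][c0], M.m[3][c1], M.m[3][c2]);
}
constexpr float DetLaplace(Mat4 M)
{
	return M.m[0][0] * Minor0(M, 1, 2, 3) - M.m[0][1] * Minor0(M, 0, 2, 3)
	     + M.m[0][2] * Minor0(M, 0, 1, 3) - M.m[0][3] * Minor0(M, 0, 1, 2);
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>For the permutation matrix \(M'\) from Appendix 1 both functions return -1, even though \(A\) and \(D\) are singular there, and for the matrix with rows (2, 1, 0, 3), (1, 3, 2, 0), (0, 1, 4, 1), (5, 0, 1, 2) both give -136. With small integer entries every intermediate value is exact in float, so the two results match exactly.</p>
</div>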
<div class="paragraph">
<p>Now that all the pieces are ready, here is our function for the general 4x4 matrix inverse:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">// Inverse function is the same no matter column major or row major
// this version treats it as row major
inline Matrix4 GetInverse(const Matrix4&amp; inM)
{
	// use block matrix method
	// A is a matrix, then i(A) or iA means inverse of A, A# (or A_ in code) means adjugate of A, |A| (or detA in code) is determinant, tr(A) is trace

	// sub matrices
	__m128 A = VecShuffle_0101(inM.mVec[0], inM.mVec[1]);
	__m128 B = VecShuffle_2323(inM.mVec[0], inM.mVec[1]);
	__m128 C = VecShuffle_0101(inM.mVec[2], inM.mVec[3]);
	__m128 D = VecShuffle_2323(inM.mVec[2], inM.mVec[3]);

#if 0
	__m128 detA = _mm_set1_ps(inM.m[0][0] * inM.m[1][1] - inM.m[0][1] * inM.m[1][0]);
	__m128 detB = _mm_set1_ps(inM.m[0][2] * inM.m[1][3] - inM.m[0][3] * inM.m[1][2]);
	__m128 detC = _mm_set1_ps(inM.m[2][0] * inM.m[3][1] - inM.m[2][1] * inM.m[3][0]);
	__m128 detD = _mm_set1_ps(inM.m[2][2] * inM.m[3][3] - inM.m[2][3] * inM.m[3][2]);
#else
	// determinant as (|A| |B| |C| |D|)
	__m128 detSub = _mm_sub_ps(
		_mm_mul_ps(VecShuffle(inM.mVec[0], inM.mVec[2], 0,2,0,2), VecShuffle(inM.mVec[1], inM.mVec[3], 1,3,1,3)),
		_mm_mul_ps(VecShuffle(inM.mVec[0], inM.mVec[2], 1,3,1,3), VecShuffle(inM.mVec[1], inM.mVec[3], 0,2,0,2))
	);
	__m128 detA = VecSwizzle1(detSub, 0);
	__m128 detB = VecSwizzle1(detSub, 1);
	__m128 detC = VecSwizzle1(detSub, 2);
	__m128 detD = VecSwizzle1(detSub, 3);
#endif

	// let iM = 1/|M| * | X  Y |
	//                  | Z  W |

	// D#C
	__m128 D_C = Mat2AdjMul(D, C);
	// A#B
	__m128 A_B = Mat2AdjMul(A, B);
	// X# = |D|A - B(D#C)
	__m128 X_ = _mm_sub_ps(_mm_mul_ps(detD, A), Mat2Mul(B, D_C));
	// W# = |A|D - C(A#B)
	__m128 W_ = _mm_sub_ps(_mm_mul_ps(detA, D), Mat2Mul(C, A_B));

	// |M| = |A|*|D| + ... (continue later)
	__m128 detM = _mm_mul_ps(detA, detD);

	// Y# = |B|C - D(A#B)#
	__m128 Y_ = _mm_sub_ps(_mm_mul_ps(detB, C), Mat2MulAdj(D, A_B));
	// Z# = |C|B - A(D#C)#
	__m128 Z_ = _mm_sub_ps(_mm_mul_ps(detC, B), Mat2MulAdj(A, D_C));

	// |M| = |A|*|D| + |B|*|C| ... (continue later)
	detM = _mm_add_ps(detM, _mm_mul_ps(detB, detC));

	// tr((A#B)(D#C))
	__m128 tr = _mm_mul_ps(A_B, VecSwizzle(D_C, 0,2,1,3));
	tr = _mm_hadd_ps(tr, tr);
	tr = _mm_hadd_ps(tr, tr);
	// |M| = |A|*|D| + |B|*|C| - tr((A#B)(D#C))
	detM = _mm_sub_ps(detM, tr);

	const __m128 adjSignMask = _mm_setr_ps(1.f, -1.f, -1.f, 1.f);
	// (1/|M|, -1/|M|, -1/|M|, 1/|M|)
	__m128 rDetM = _mm_div_ps(adjSignMask, detM);

	X_ = _mm_mul_ps(X_, rDetM);
	Y_ = _mm_mul_ps(Y_, rDetM);
	Z_ = _mm_mul_ps(Z_, rDetM);
	W_ = _mm_mul_ps(W_, rDetM);

	Matrix4 r;

	// apply adjugate and store, here we combine adjugate shuffle and store shuffle
	r.mVec[0] = VecShuffle(X_, Y_, 3,1,3,1);
	r.mVec[1] = VecShuffle(X_, Y_, 2,0,2,0);
	r.mVec[2] = VecShuffle(Z_, W_, 3,1,3,1);
	r.mVec[3] = VecShuffle(Z_, W_, 2,0,2,0);

	return r;
}</code></pre>
</div>
</div>
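<div class="paragraph">
<p>If you want a reference to test GetInverse against, the same block formula can be written with plain floats. This is a sketch under its own assumptions, not the author’s benchmarked code: Mat2, Mat4 and all helper names are hypothetical, matrices are row major, and like GetInverse it does not check for a zero determinant. PutAdj folds the final adjugate into the store, just as the shuffles at the end of GetInverse do; Mul4 and IsIdentity exist only to verify the result.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">// Scalar sketch of the block-matrix inverse (hypothetical types, row major, no singularity check).
struct Mat2 { float m[4]; };
struct Mat4 { float m[4][4]; };

constexpr float Det(Mat2 a) { return a.m[0] * a.m[3] - a.m[1] * a.m[2]; }
constexpr Mat2 Scale(float s, Mat2 a) { return Mat2{{ s * a.m[0], s * a.m[1], s * a.m[2], s * a.m[3] }}; }
constexpr Mat2 Sub(Mat2 a, Mat2 b) { return Mat2{{ a.m[0] - b.m[0], a.m[1] - b.m[1], a.m[2] - b.m[2], a.m[3] - b.m[3] }}; }
// A*B, (A#)*B and A*(B#)
constexpr Mat2 Mul(Mat2 a, Mat2 b)
{
	return Mat2{{ a.m[0] * b.m[0] + a.m[1] * b.m[2], a.m[0] * b.m[1] + a.m[1] * b.m[3],
	              a.m[2] * b.m[0] + a.m[3] * b.m[2], a.m[2] * b.m[1] + a.m[3] * b.m[3] }};
}
constexpr Mat2 AdjMul(Mat2 a, Mat2 b)
{
	return Mat2{{ a.m[3] * b.m[0] - a.m[1] * b.m[2], a.m[3] * b.m[1] - a.m[1] * b.m[3],
	              a.m[0] * b.m[2] - a.m[2] * b.m[0], a.m[0] * b.m[3] - a.m[2] * b.m[1] }};
}
constexpr Mat2 MulAdj(Mat2 a, Mat2 b)
{
	return Mat2{{ a.m[0] * b.m[3] - a.m[1] * b.m[2], a.m[1] * b.m[0] - a.m[0] * b.m[1],
	              a.m[2] * b.m[3] - a.m[3] * b.m[2], a.m[3] * b.m[0] - a.m[2] * b.m[1] }};
}
constexpr Mat2 Block(Mat4 M, int r, int c) { return Mat2{{ M.m[r][c], M.m[r][c + 1], M.m[r + 1][c], M.m[r + 1][c + 1] }}; }

// Stores s * adj(x) into the 2x2 block of r starting at (row, col).
constexpr Mat4 PutAdj(Mat4 r, int row, int col, Mat2 x, float s)
{
	r.m[row][col]     =  x.m[3] * s;  r.m[row][col + 1]     = -x.m[1] * s;
	r.m[row + 1][col] = -x.m[2] * s;  r.m[row + 1][col + 1] =  x.m[0] * s;
	return r;
}

constexpr Mat4 InverseBlock(Mat4 M)
{
	Mat2 A = Block(M, 0, 0), B = Block(M, 0, 2), C = Block(M, 2, 0), D = Block(M, 2, 2);
	Mat2 D_C = AdjMul(D, C), A_B = AdjMul(A, B);
	Mat2 X_ = Sub(Scale(Det(D), A), Mul(B, D_C));     // X# = |D|A - B(D#C)
	Mat2 W_ = Sub(Scale(Det(A), D), Mul(C, A_B));     // W# = |A|D - C(A#B)
	Mat2 Y_ = Sub(Scale(Det(B), C), MulAdj(D, A_B));  // Y# = |B|C - D(A#B)#
	Mat2 Z_ = Sub(Scale(Det(C), B), MulAdj(A, D_C));  // Z# = |C|B - A(D#C)#
	// |M| = |A||D| + |B||C| - tr((A#B)(D#C))
	float detM = Det(A) * Det(D) + Det(B) * Det(C)
	           - (A_B.m[0] * D_C.m[0] + A_B.m[1] * D_C.m[2] + A_B.m[2] * D_C.m[1] + A_B.m[3] * D_C.m[3]);
	float rDetM = 1.f / detM;
	Mat4 r{};
	r = PutAdj(r, 0, 0, X_, rDetM);   // top left
	r = PutAdj(r, 0, 2, Y_, rDetM);   // top right
	r = PutAdj(r, 2, 0, Z_, rDetM);   // bottom left
	r = PutAdj(r, 2, 2, W_, rDetM);   // bottom right
	return r;
}

// Verification helpers
constexpr Mat4 Mul4(Mat4 a, Mat4 b)
{
	Mat4 r{};
	for (int i = 0; i != 4; ++i)
		for (int j = 0; j != 4; ++j)
			for (int k = 0; k != 4; ++k)
				r.m[i][j] += a.m[i][k] * b.m[k][j];
	return r;
}
constexpr bool IsIdentity(Mat4 a)
{
	for (int i = 0; i != 4; ++i)
		for (int j = 0; j != 4; ++j)
			if (a.m[i][j] != (i == j ? 1.f : 0.f))
				return false;
	return true;
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>For example, for the matrix with rows (0, 1, 2, 0), (1, 0, 0, 1), (1, 1, 0, 0), (0, 0, 1, 1), where \(C\) and \(D\) are singular, InverseBlock returns rows (1, 2, -1, -2), (-1, -2, 2, 2), (1, 1, -1, -1), (-1, -1, 1, 2), and multiplying the two with Mul4 gives the identity. The permutation matrix \(M'\) from Appendix 1 comes back unchanged.</p>
</div>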
<div class="paragraph">
<p>As by-products, this function also gives you optimized ways to calculate the determinant and the adjugate of a 4x4 matrix. There are two things I want to discuss a little more.</p>
</div>
<div class="paragraph">
<p>When calculating the submatrix determinants, I also wrote a version that calculates all 4 determinants in one go. However, calculating them separately and loading them into the vector unit with _mm_set1_ps proved to be faster on my CPU. My guess is that since we need them separated anyway, calculating them together still costs 4 shuffles to split them out, which is not worth the effort, but I’m not sure. You should test the performance of both versions.</p>
</div>
<div class="paragraph">
<p>(<strong>Edit</strong>: on my new CPU (Coffee Lake), the second method (4 determinants in one go) is 20% faster than the first method.)</p>
</div>
<div class="paragraph">
<p>Also, when calculating the trace, I use two _mm_hadd_ps to sum up the 4 components so that the result ends up in all 4 components. There are many ways to do the same thing. In my tests they yielded similar performance, so I chose the one with the fewest instructions. Again, this could differ on other target platforms, so you should test them.</p>
</div>
<div class="paragraph">
<p>So how do our functions perform? The following measurements and comparison were done in August 2017. We use __rdtsc to count cycles. For each test we loop 10 million times and measure the average cycle count. We ran 5 groups of tests, and here are the results on Intel Haswell:</p>
</div>
<div class="imageblock" style="text-align: center">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/matrixinverse/fig1.jpg" alt="fig1.jpg" width="600">
</div>
<div class="title">Figure 1</div>
</div>
<div class="paragraph">
<p>The first three columns are our 3 functions. The SIMD version of the general 4x4 matrix inverse costs less than half (44%) of the float version. And if you know the matrix is a transform matrix, the inverse costs less than a quarter (21%) of the float version. The more information you have as a programmer, the less work the machine needs to do.</p>
</div>
<div class="paragraph">
<p>Think about that question again: do we really need to invert a matrix? If we are using a transform matrix and all we do is inverse-transform a point or vector temporarily (so there is no need to keep the inverse matrix for other calculations), write an inverse transform function instead; it is faster than computing the inverse matrix and then transforming. Hopefully this will help you choose which function to write or use, and how to make it fast.</p>
</div>
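<div class="paragraph">
<p>As an example of such a function, inverse-transforming a point only needs a subtraction and three dot products. The sketch below is scalar and hypothetical (Vec3, Transform and InverseTransformPoint are not engine types); it assumes the layout used above, three orthogonal, possibly scaled axis rows plus a translation row, and it skips the small-number guard.</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">// Hypothetical scalar types: three orthogonal (possibly scaled) axis rows plus a translation row,
// with points as row vectors: world = x * axis[0] + y * axis[1] + z * axis[2] + translation.
struct Vec3 { float x, y, z; };
struct Transform { Vec3 axis[3]; Vec3 translation; };

constexpr float Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Inverse-transforms a point without building the inverse matrix:
// remove the translation, then project onto each axis and divide by the squared length of that axis.
constexpr Vec3 InverseTransformPoint(Transform t, Vec3 p)
{
	Vec3 d{ p.x - t.translation.x, p.y - t.translation.y, p.z - t.translation.z };
	return Vec3{ Dot(d, t.axis[0]) / Dot(t.axis[0], t.axis[0]),
	             Dot(d, t.axis[1]) / Dot(t.axis[1], t.axis[1]),
	             Dot(d, t.axis[2]) / Dot(t.axis[2], t.axis[2]) };
}</code></pre>
</div>
</div>
<div class="paragraph">
<p>With axes (0, 2, 0), (-4, 0, 0), (0, 0, 1) and translation (5, 6, 7), the world point (4, 7, 10) maps back to (0.5, 0.25, 3). For a direction vector, skip the translation subtraction; if the axes are known to have unit length, the divisions can be dropped as well.</p>
</div>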
</div>
<div class="sect2">
<h3 id="_appendix_1">Appendix 1</h3>
<div class="paragraph">
<p>We have one more thing to do: prove that this method is valid regardless of the assumptions we made before the derivation. Let’s look back at what we assumed:</p>
</div>
<div class="stemblock">
<div class="content">
\[M=\left( \begin{matrix} A &amp; B \\ C &amp; D \\ \end{matrix} \right)=\left( \begin{matrix} A_0 &amp; A_1 &amp; B_0 &amp; B_1 \\ A_2 &amp; A_3 &amp; B_2 &amp; B_3 \\ C_0 &amp; C_1 &amp; D_0 &amp; D_1 \\ C_2 &amp; C_3 &amp; D_2 &amp; D_3 \\ \end{matrix} \right)\]
</div>
</div>
<div class="paragraph">
<p>We assumed that submatrices \(A\) and \(D\) are invertible and that \(C\) and \(D\) commute (\(CD=DC\)).</p>
</div>
<div class="paragraph">
<p>Consider this example:</p>
</div>
<div class="stemblock">
<div class="content">
\[M'=\left( \begin{matrix} 1 &amp; 0 &amp; 0 &amp; 0 \\ 0 &amp; 0 &amp; 1 &amp; 0 \\ 0 &amp; 1 &amp; 0 &amp; 0 \\ 0 &amp; 0 &amp; 0 &amp; 1 \\ \end{matrix} \right)\]
</div>
</div>
<div class="paragraph">
<p>Clearly none of our assumptions holds (\(A\) and \(D\) are both singular, and \(CD\neq DC\)), but \(M'\) is invertible: it is its own inverse, \((M')^{-1}=M'\). Surprisingly, if you use the above method to calculate the inverse of \(M'\), you do get the correct result. Now we need to prove that our calculation holds for any invertible 4x4 matrix, without the above assumptions.
Here’s our final form for calculation:</p>
</div>
<div class="stemblock">
<div class="content">
\[M^{-1}={\left( \begin{matrix} A &amp; B \\ C &amp; D \\ \end{matrix} \right)}^{-1}=\frac{1}{\left|M\right|}{\left( \begin{matrix} (\left|D\right|A-B(D^{\#}C))^{\#} &amp; (\left|B\right|C-D(A^{\#}B)^{\#})^{\#} \\ (\left|C\right|B-A(D^{\#}C)^{\#})^{\#} &amp; (\left|A\right|D-C(A^{\#}B))^{\#} \\ \end{matrix} \right)}\]
</div>
</div>
<div class="stemblock">
<div class="content">
\[\left|M\right|=\left|A\right|\left|D\right|+\left|B\right|\left|C\right|-\operatorname{tr}((A^{\#}B)(D^{\#}C))\]
</div>
</div>
<div class="paragraph">
<p>Recall that \(M^{-1}=\frac{1}{\left|M\right|}M^{\#}\), so here we are going to prove</p>
</div>
<div class="stemblock">
<div class="content">
\[M^{\#}={\left( \begin{matrix} X &amp; Y \\ Z &amp; W \\ \end{matrix} \right)}={\left( \begin{matrix} (\left|D\right|A-B(D^{\#}C))^{\#} &amp; (\left|B\right|C-D(A^{\#}B)^{\#})^{\#} \\ (\left|C\right|B-A(D^{\#}C)^{\#})^{\#} &amp; (\left|A\right|D-C(A^{\#}B))^{\#} \\ \end{matrix} \right)}\]
</div>
</div>
<div class="paragraph">
<p>Let’s start by proving the top-left submatrix \(X=(\left|D\right|A-B(D^{\#}C))^{\#}\).</p>
</div>
<div class="paragraph">
<p>The adjugate of \(M\) is the transpose of its cofactor matrix, \(M^{\#}=\operatorname{cof}(M)^{T}\), where \(\operatorname{cof}(M)=((-1)^{i+j} M_{ij})\) and \(M_{ij}\) is the (i,j) minor of \(M\), i.e. the determinant of the 3x3 matrix left after deleting row i and column j. Thus \(M^{\#}= ((-1)^{j+i}M_{ji})\). Remember the <strong>TRANSPOSE</strong> here!
For details, see the Adjugate Matrix Wikipedia page.</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
X&amp;={\left( \begin{matrix} \left| \begin{matrix} A_3 &amp; B_2 &amp; B_3 \\ C_1 &amp; D_0 &amp; D_1 \\ C_3 &amp; D_2 &amp; D_3 \end{matrix} \right| &amp; -\left| \begin{matrix} A_1 &amp; B_0 &amp; B_1 \\ C_1 &amp; D_0 &amp; D_1 \\ C_3 &amp; D_2 &amp; D_3 \end{matrix} \right| \\ -\left| \begin{matrix} A_2 &amp; B_2 &amp; B_3 \\ C_0 &amp; D_0 &amp; D_1 \\ C_2 &amp; D_2 &amp; D_3 \end{matrix} \right| &amp; \left| \begin{matrix} A_0 &amp; B_0 &amp; B_1 \\ C_0 &amp; D_0 &amp; D_1 \\ C_2 &amp; D_2 &amp; D_3 \end{matrix} \right| \\ \end{matrix} \right)}\\
&amp;={\left( \begin{matrix} A_3\left|D\right|-B_2(D_3C_1-D_1C_3) + B_3(D_2C_1-D_0C_3) &amp; -(A_1\left|D\right|-B_0(D_3C_1-D_1C_3) + B_1(D_2C_1-D_0C_3)) \\ -(A_2\left|D\right|-B_2(D_3C_0-D_1C_2) + B_3(D_2C_0-D_0C_2)) &amp; A_0\left|D\right|-B_0(D_3C_0-D_1C_2) + B_1(D_2C_0-D_0C_2) \\ \end{matrix} \right)}
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Remember</p>
</div>
<div class="stemblock">
<div class="content">
\[D^{\#}C={\left( \begin{matrix}{} {D_3}{C_0}-{D_1}{C_2} &amp;{D_3}{C_1}-{D_1}{C_3} \\ {D_0}{C_2}-{D_2}{C_0} &amp; {D_0}{C_3}-{D_2}{C_1} \\ \end{matrix} \right)}\]
</div>
</div>
<div class="paragraph">
<p>We have</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
X&amp;={\left( \begin{matrix} A_3\left|D\right|-B_2{(D^{\#}C)}_1 - B_3{(D^{\#}C)}_3 &amp; -(A_1\left|D\right|-B_0{(D^{\#}C)}_1 - B_1{(D^{\#}C)}_3) \\ -(A_2\left|D\right|-B_2{(D^{\#}C)}_0 - B_3{(D^{\#}C)}_2) &amp; A_0\left|D\right|-B_0{(D^{\#}C)}_0 - B_1{(D^{\#}C)}_2 \\ \end{matrix} \right)} \\
&amp;={\left( \begin{matrix} A_0\left|D\right|-B_0{(D^{\#}C)}_0 - B_1{(D^{\#}C)}_2  &amp; A_1\left|D\right|-B_0{(D^{\#}C)}_1 - B_1{(D^{\#}C)}_3 \\ A_2\left|D\right|-B_2{(D^{\#}C)}_0 - B_3{(D^{\#}C)}_2 &amp; A_3\left|D\right|-B_2{(D^{\#}C)}_1 - B_3{(D^{\#}C)}_3 \\ \end{matrix} \right)}^{\#} \\
&amp;=(\left|D\right|A-B(D^{\#}C))^{\#}
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Similarly, we can prove the other submatrices \(Y\), \(Z\) and \(W\).</p>
</div>
<div class="paragraph">
<p>Now we need to prove the determinant form</p>
</div>
<div class="stemblock">
<div class="content">
\[\left|M\right|=\left|A\right|\left|D\right|+\left|B\right|\left|C\right|-\operatorname{tr}((A^{\#}B)(D^{\#}C))\]
</div>
</div>
<div class="paragraph">
<p>Again, we start from the left-hand side:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
\left|M\right|&amp;=A_0 \left| \begin{matrix} A_3 &amp; B_2 &amp; B_3 \\ C_1 &amp; D_0 &amp; D_1 \\ C_3 &amp; D_2 &amp; D_3 \end{matrix} \right| - A_1 \left| \begin{matrix} A_2 &amp; B_2 &amp; B_3 \\ C_0 &amp; D_0 &amp; D_1 \\ C_2 &amp; D_2 &amp; D_3 \end{matrix} \right| + B_0 \left| \begin{matrix} A_2 &amp; A_3 &amp; B_3 \\ C_0 &amp; C_1 &amp; D_1 \\ C_2 &amp; C_3 &amp; D_3 \end{matrix} \right| - B_1 \left| \begin{matrix} A_2 &amp; A_3 &amp; B_2 \\ C_0 &amp; C_1 &amp; D_0 \\ C_2 &amp; C_3 &amp; D_2 \end{matrix} \right| \\
&amp;= A_0(A_3\left|D\right|-B_2(D_3C_1-D_1C_3) + B_3(D_2C_1-D_0C_3)) - A_1(A_2\left|D\right|-B_2(D_3C_0-D_1C_2) + B_3(D_2C_0-D_0C_2)) \\
&amp;+B_0(B_3\left|C\right|+A_2(D_3C_1-D_1C_3) - A_3(D_3C_0-D_1C_2)) - B_1(B_2\left|C\right|+A_2(D_2C_1-D_0C_3) - A_3(D_2C_0-D_0C_2)) \\
&amp;= \left|A\right|\left|D\right| + \left|B\right|\left|C\right|  \\
&amp;- ({A_3}{B_0}-{A_1}{B_2})({D_3}{C_0}-{D_1}{C_2}) - ({A_3}{B_1}-{A_1}{B_3})({D_0}{C_2}-{D_2}{C_0}) \\
&amp;- ({A_0}{B_2}-{A_2}{B_0})({D_3}{C_1}-{D_1}{C_3}) - ({A_0}{B_3}-{A_2}{B_1})({D_0}{C_3}-{D_2}{C_1})
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Remember</p>
</div>
<div class="stemblock">
<div class="content">
\[A^{\#}B={\left( \begin{matrix}{} {A_3}{B_0}-{A_1}{B_2} &amp;{A_3}{B_1}-{A_1}{B_3} \\ {A_0}{B_2}-{A_2}{B_0} &amp; {A_0}{B_3}-{A_2}{B_1} \\ \end{matrix} \right)}\]
</div>
</div>
<div class="stemblock">
<div class="content">
\[D^{\#}C={\left( \begin{matrix}{} {D_3}{C_0}-{D_1}{C_2} &amp;{D_3}{C_1}-{D_1}{C_3} \\ {D_0}{C_2}-{D_2}{C_0} &amp; {D_0}{C_3}-{D_2}{C_1} \\ \end{matrix} \right)}\]
</div>
</div>
<div class="paragraph">
<p>We have</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
\left|M\right|&amp;= \left|A\right|\left|D\right| + \left|B\right|\left|C\right|- ({(A^{\#}B)}_0{(D^{\#}C)}_0 + {(A^{\#}B)}_1{(D^{\#}C)}_2 + {(A^{\#}B)}_2{(D^{\#}C)}_1 + {(A^{\#}B)}_3{(D^{\#}C)}_3) \\
&amp;=\left|A\right|\left|D\right|+\left|B\right|\left|C\right|-\operatorname{tr}((A^{\#}B)(D^{\#}C))
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>We have proved that the result of the derivation holds for any invertible 4x4 matrix. Why is this the case? I think it is due to special properties of 2x2 matrices. That said, I believe there must be a more elegant way to derive the same result; if you know one, please leave a comment below!</p>
</div>
</div>
<div class="sect2">
<h3 id="_appendix_2">Appendix 2</h3>
<div class="paragraph">
<p>This section covers column-major matrices. The first two functions, for transform matrices, are exactly the same in column major. Here are the general matrix inverse and its helper functions:</p>
</div>
<div class="listingblock">
<div class="content">
<pre class="prettyprint highlight"><code class="language-cpp" data-lang="cpp">// for column major matrix
// we use __m128 to represent 2x2 matrix as A = | A0  A2 |
//                                              | A1  A3 |
// 2x2 column major Matrix multiply A*B
__forceinline __m128 Mat2Mul(__m128 vec1, __m128 vec2)
{
	return
		_mm_add_ps(_mm_mul_ps(                     vec1, VecSwizzle(vec2, 0,0,3,3)),
		           _mm_mul_ps(VecSwizzle(vec1, 2,3,0,1), VecSwizzle(vec2, 1,1,2,2)));
}
// 2x2 column major Matrix adjugate multiply (A#)*B
__forceinline __m128 Mat2AdjMul(__m128 vec1, __m128 vec2)
{
	return
		_mm_sub_ps(_mm_mul_ps(VecSwizzle(vec1, 3,0,3,0), vec2),
		           _mm_mul_ps(VecSwizzle(vec1, 2,1,2,1), VecSwizzle(vec2, 1,0,3,2)));

}
// 2x2 column major Matrix multiply adjugate A*(B#)
__forceinline __m128 Mat2MulAdj(__m128 vec1, __m128 vec2)
{
	return
		_mm_sub_ps(_mm_mul_ps(                     vec1, VecSwizzle(vec2, 3,3,0,0)),
		           _mm_mul_ps(VecSwizzle(vec1, 2,3,0,1), VecSwizzle(vec2, 1,1,2,2)));
}

// Inverse function is the same no matter column major or row major
// this version treats it as column major
inline Matrix4 GetInverse(const Matrix4&amp; inM)
{
	// use block matrix method
	// A is a matrix, then i(A) or iA means inverse of A, A# (or A_ in code) means adjugate of A, |A| (or detA in code) is determinant, tr(A) is trace

	// sub matrices
	__m128 A = VecShuffle_0101(inM.mVec[0], inM.mVec[1]);
	__m128 C = VecShuffle_2323(inM.mVec[0], inM.mVec[1]);
	__m128 B = VecShuffle_0101(inM.mVec[2], inM.mVec[3]);
	__m128 D = VecShuffle_2323(inM.mVec[2], inM.mVec[3]);

#if 0
	__m128 detA = _mm_set1_ps(inM.m[0][0] * inM.m[1][1] - inM.m[0][1] * inM.m[1][0]);
	__m128 detC = _mm_set1_ps(inM.m[0][2] * inM.m[1][3] - inM.m[0][3] * inM.m[1][2]);
	__m128 detB = _mm_set1_ps(inM.m[2][0] * inM.m[3][1] - inM.m[2][1] * inM.m[3][0]);
	__m128 detD = _mm_set1_ps(inM.m[2][2] * inM.m[3][3] - inM.m[2][3] * inM.m[3][2]);
#else
	// determinant as (|A| |C| |B| |D|)
	__m128 detSub = _mm_sub_ps(
		_mm_mul_ps(VecShuffle(inM.mVec[0], inM.mVec[2], 0,2,0,2), VecShuffle(inM.mVec[1], inM.mVec[3], 1,3,1,3)),
		_mm_mul_ps(VecShuffle(inM.mVec[0], inM.mVec[2], 1,3,1,3), VecShuffle(inM.mVec[1], inM.mVec[3], 0,2,0,2))
		);
	__m128 detA = VecSwizzle1(detSub, 0);
	__m128 detC = VecSwizzle1(detSub, 1);
	__m128 detB = VecSwizzle1(detSub, 2);
	__m128 detD = VecSwizzle1(detSub, 3);
#endif

	// let iM = 1/|M| * | X  Y |
	//                  | Z  W |

	// D#C
	__m128 D_C = Mat2AdjMul(D, C);
	// A#B
	__m128 A_B = Mat2AdjMul(A, B);
	// X# = |D|A - B(D#C)
	__m128 X_ = _mm_sub_ps(_mm_mul_ps(detD, A), Mat2Mul(B, D_C));
	// W# = |A|D - C(A#B)
	__m128 W_ = _mm_sub_ps(_mm_mul_ps(detA, D), Mat2Mul(C, A_B));

	// |M| = |A|*|D| + ... (continue later)
	__m128 detM = _mm_mul_ps(detA, detD);

	// Y# = |B|C - D(A#B)#
	__m128 Y_ = _mm_sub_ps(_mm_mul_ps(detB, C), Mat2MulAdj(D, A_B));
	// Z# = |C|B - A(D#C)#
	__m128 Z_ = _mm_sub_ps(_mm_mul_ps(detC, B), Mat2MulAdj(A, D_C));

	// |M| = |A|*|D| + |B|*|C| ... (continue later)
	detM = _mm_add_ps(detM, _mm_mul_ps(detB, detC));

	// tr((A#B)(D#C))
	__m128 tr = _mm_mul_ps(A_B, VecSwizzle(D_C, 0,2,1,3));
	tr = _mm_hadd_ps(tr, tr);
	tr = _mm_hadd_ps(tr, tr);
	// |M| = |A|*|D| + |B|*|C| - tr((A#B)(D#C))
	detM = _mm_sub_ps(detM, tr);

	const __m128 adjSignMask = _mm_setr_ps(1.f, -1.f, -1.f, 1.f);
	// (1/|M|, -1/|M|, -1/|M|, 1/|M|)
	__m128 rDetM = _mm_div_ps(adjSignMask, detM);

	X_ = _mm_mul_ps(X_, rDetM);
	Y_ = _mm_mul_ps(Y_, rDetM);
	Z_ = _mm_mul_ps(Z_, rDetM);
	W_ = _mm_mul_ps(W_, rDetM);

	Matrix4 r;

	// apply adjugate and store, here we combine adjugate shuffle and store shuffle
	r.mVec[0] = VecShuffle(X_, Z_, 3,1,3,1);
	r.mVec[1] = VecShuffle(X_, Z_, 2,0,2,0);
	r.mVec[2] = VecShuffle(Y_, W_, 3,1,3,1);
	r.mVec[3] = VecShuffle(Y_, W_, 2,0,2,0);

	return r;
}</code></pre>
</div>
</div>
</div>]]></description><link>https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD-Explained.html</link><guid isPermaLink="true">https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD-Explained.html</guid><category><![CDATA[Math]]></category><category><![CDATA[SSE]]></category><dc:creator><![CDATA[Eric Zhang]]></dc:creator><pubDate>Sun, 03 Sep 2017 00:00:00 GMT</pubDate></item><item><title><![CDATA[Conversion between View Space Linear and Screen Space Linear]]></title><description><![CDATA[<div class="paragraph">
<p>Some background on this problem: when doing screen space reflection, we need to shoot a ray from every pixel to find the reflection color. However, we don’t want to march the ray in view space. For example, if a ray points mostly into or out of the screen, we may march a large distance in view space but move only a single pixel in screen space; since all the information is stored in pixels, we waste time sampling the same pixel over and over. Conversely, if the ray points along the X/Y axes in view space, we might skip pixels if we march too fast in view space. So in general it is a better idea to march the ray in screen space instead.</p>
</div>
<div class="paragraph">
<p>Now the problem becomes: if we march to a point in screen space somewhere between the start pixel and the end pixel, what is the corresponding point in view space? Of course we can use the projection matrix to un-project it back to view space, but since we already know the start and end points in view space, we can do better than that. Let’s define the problem more specifically:</p>
</div>
<div class="paragraph">
<p>We know two points in view space, \(A_v\) and \(B_v\), whose projections in screen space are \(A_s\) and \(B_s\). In screen space, given a linear interpolation ratio \(r_s\) and a point \({P_s}={A_s}(1-{r_s}) + {{B_s}{r_s}}\), what is its corresponding point \(P_v\) in view space, and what is the ratio \(r_v\) such that \({P_v}={A_v}(1-{r_v})+{B_v}{r_v}\)?</p>
</div>
<div class="paragraph">
<p>We will use OpenGL coordinates (right-handed, Y up), so \(A_v\) and \(B_v\) have negative Z values.</p>
</div>
<div class="imageblock" style="text-align: center">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/linearconversion/fig1.png" alt="fig1.png" width="400">
</div>
<div class="title">Figure 1</div>
</div>
<div class="paragraph">
<p>If you wonder why \(r_s\) and \(r_v\) are different, take a look at figure 1: \({r_v}=\frac{\left|AP\right|}{\left|AB\right|}\), while \({r_s}=\frac{\left|{A_1}{P_1}\right|}{\left|{A_1}{B_1}\right|}\), and they are the same only if \(AB\) is parallel to \({A_1}{B_1}\).</p>
</div>
<div class="paragraph">
<p>First we want to simplify the problem a little. Recall how we project a point from view space to screen space: with projection matrix \(M\) and window size \(S\), for any point \(Q_v\) in view space we project it into clip space (ranging from -1 to 1): \({Q_c}=\frac{1}{(M{Q_v}).w}{M{Q_v}}\), then remap it to screen space: \({Q_s}=\frac{1}{2}({Q_c} + 1)S\). Here \(S\) is constant, so screen space is just a linear (affine) remap of clip space, which means the linear ratio doesn’t change from clip space to screen space: \({P_c}={A_c}(1-{r_s})+{B_c}{r_s}\). We can therefore ignore the screen space remap and work in clip space.</p>
</div>
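<div class="paragraph">
<p>The remap argument is easy to check numerically. Below is a minimal C++ sketch, assuming an OpenGL-style symmetric perspective matrix (only \(M_{00}\), \(M_{11}\) and the Z row are non-zero) and a window size in pixels; the <code>Projection</code>, <code>ClipToScreen</code> and <code>Lerp</code> names are hypothetical, not from any engine. Interpolating two projected points in screen space, or interpolating in clip space and then remapping, should land on the same pixel, because the remap is affine.</p>
</div>

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical helper: OpenGL-style symmetric perspective projection where only
// M00, M11 and the Z row are non-zero, and clip-space w = -view.z.
struct Projection {
    float m00, m11, n, f;

    // View space -> clip space after the divide by w (range -1..1).
    Vec3 ToClip(const Vec3& v) const {
        assert(v.z < 0.0f);  // the point must be in front of the camera
        float invW = 1.0f / -v.z;
        float zc = -(f + n) / (f - n) * v.z - 2.0f * f * n / (f - n);
        return { m00 * v.x * invW, m11 * v.y * invW, zc * invW };
    }
};

// Clip space -> screen space, Qs = 0.5 * (Qc + 1) * S, applied to X and Y only.
Vec3 ClipToScreen(const Vec3& c, float width, float height) {
    return { 0.5f * (c.x + 1.0f) * width, 0.5f * (c.y + 1.0f) * height, c.z };
}

// Component-wise linear interpolation a * (1 - r) + b * r.
Vec3 Lerp(const Vec3& a, const Vec3& b, float r) {
    return { a.x * (1.0f - r) + b.x * r, a.y * (1.0f - r) + b.y * r,
             a.z * (1.0f - r) + b.z * r };
}
```

<div class="paragraph">
<p>For example, with <code>Projection{1, 1, 0.1, 100}</code>, <code>ToClip</code> maps a point at \(z=-0.1\) to clip Z \(-1\) and a point at \(z=-100\) to \(1\), as expected for the near and far planes.</p>
</div>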
<div class="paragraph">
<p>Now take a look at figure 1 again. Since \(A{B_2}\) is parallel to \({A_1}{B_1}\), we have \({r_s}=\frac{\left|{A_1}{P_1}\right|}{\left|{A_1}{B_1}\right|}=\frac{\left|A{P_2}\right|}{\left|A{B_2}\right|}\). This means the choice of near and far planes doesn’t affect the linear ratio at all, because they are all parallel to each other. So we can choose a convenient pair of near and far planes to simplify the problem, for example let \(A_v\) lie on the near plane and \(B_v\) on the far plane: \(n=-{A_v}.z\), \(f=-{B_v}.z\) (or the other way around if \(B_v\) is closer to the camera; it doesn’t affect the result). Then in clip space we have</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
{A_c}.z&amp;=-1\\
{B_c}.z&amp;=1\\
{P_c}.z&amp;=({A_c}.z)(1-{r_s})+({B_c}.z){r_s}=2{r_s}-1
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>With these special Z values in clip space, if we find out \({P_v}.z\) in view space, we can get the view space linear ratio \({r_v}=\frac{{P_v}.z-{A_v}.z}{{B_v}.z-{A_v}.z}\).</p>
</div>
<div class="paragraph">
<p>Recall how the perspective projection matrix is built:</p>
</div>
<div class="stemblock">
<div class="content">
\[M=\left( \begin{matrix} M_{00} &amp; 0 &amp; 0 &amp; 0 \\ 0 &amp; M_{11} &amp; 0 &amp; 0 \\ 0 &amp; 0 &amp; -\frac{f+n}{f-n} &amp; -\frac{2fn}{f-n} \\ 0 &amp; 0 &amp; -1 &amp; 0 \\ \end{matrix} \right) = \left( \begin{matrix} M_{00} &amp; 0 &amp; 0 &amp; 0 \\ 0 &amp; M_{11} &amp; 0 &amp; 0 \\ 0 &amp; 0 &amp; \frac{{A_v}.z+{B_v}.z}{{A_v}.z-{B_v}.z} &amp; -\frac{2({A_v}.z)({B_v}.z)}{{A_v}.z-{B_v}.z} \\ 0 &amp; 0 &amp; -1 &amp; 0 \\ \end{matrix} \right)\]
</div>
</div>
<div class="paragraph">
<p>If we look only at the Z value of \(P_c\):</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
{P_c}.z&amp;=\frac{1}{(M{P_v}).w}(M{P_v}).z\\
&amp;=\frac{1}{-{P_v}.z}(\frac{{A_v}.z+{B_v}.z}{{A_v}.z-{B_v}.z}({P_v}.z)-\frac{2({A_v}.z)({B_v}.z)}{{A_v}.z-{B_v}.z})=2{r_s}-1
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Rearranging this equation, we can solve for \({P_v}.z\):</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
({A_v}.z+{B_v}.z)({P_v}.z)-2({A_v}.z)({B_v}.z)&amp;=-(2{r_s}-1)({A_v}.z-{B_v}.z)({P_v}.z)\\
2(({B_v}.z)(1-{r_s})+({A_v}.z){r_s})({P_v}.z)&amp;=2({A_v}.z)({B_v}.z)
\end{align*}\]
</div>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
{P_v}.z&amp;=\frac{({A_v}.z)({B_v}.z)}{({B_v}.z)(1-{r_s})+({A_v}.z){r_s}}\\
&amp;=\frac{1}{\frac{1}{{A_v}.z}(1-{r_s})+\frac{1}{{B_v}.z}{r_s}}
\end{align*}\]
</div>
</div>
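<div class="paragraph">
<p>The closed form translates directly into code. Here is a minimal C++ sketch with illustrative function names, assuming OpenGL conventions so both depths are negative. As a sanity check, <code>ViewZByUnprojection</code> solves the clip-space Z equation from the previous step directly, using \(n=-{A_v}.z\) and \(f=-{B_v}.z\) as chosen above; the two should agree for every \(r_s\) in \([0,1]\).</p>
</div>

```cpp
#include <cassert>
#include <cmath>

// View-space Z of the point at screen-space ratio rs between A and B:
// the reciprocal of a linear interpolation of the reciprocal depths.
float ViewZFromScreenRatio(float aZ, float bZ, float rs) {
    assert(aZ < 0.0f && bZ < 0.0f);  // OpenGL: visible points have negative Z
    return 1.0f / ((1.0f - rs) / aZ + rs / bZ);
}

// Brute-force reference: with n = -aZ and f = -bZ the clip-space Z of P is
// 2*rs - 1. Solve (m22 * z + m23) / (-z) = zc for z, i.e. z = -m23 / (m22 + zc).
float ViewZByUnprojection(float aZ, float bZ, float rs) {
    float n = -aZ, f = -bZ;
    float m22 = -(f + n) / (f - n);
    float m23 = -2.0f * f * n / (f - n);
    float zc = 2.0f * rs - 1.0f;
    return -m23 / (m22 + zc);
}
```

<div class="paragraph">
<p>For example, with \({A_v}.z=-1\), \({B_v}.z=-3\) and \(r_s=0.5\), both functions give \({P_v}.z=-1.5\), not the \(-2\) that plain linear interpolation of depth would suggest.</p>
</div>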
<div class="paragraph">
<p>When I got this result I was shocked. It turns out that for any point \(P\) in screen space (or clip space) with a linear interpolation ratio \(r_s\) between two points \(A\) and \(B\), its linear depth \({P_v}.z\) is the reciprocal of a linear interpolation, with the same ratio, of the reciprocal linear depths of \(A\) and \(B\). It seems too good to be true, but look back at why view space linear differs from screen space linear: we need to divide by the W value after multiplying by the projection matrix (multiplying by the projection matrix itself won’t change the linear ratio), which effectively multiplies by \(\frac{1}{-{P_v}.z}\). So it makes sense that in screen space, linear interpolation operates on \(\frac{1}{{P_v}.z}\) instead, but it is still an amazing result.</p>
</div>
<div class="paragraph">
<p>Now we can calculate the view space linear ratio:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
{r_v}&amp;=\frac{{P_v}.z-{A_v}.z}{{B_v}.z-{A_v}.z}\\
&amp;=\frac{({A_v}.z){r_s}}{({B_v}.z)(1-{r_s})+({A_v}.z){r_s}}\\
&amp;=\frac{\frac{1}{{B_v}.z}{r_s}}{\frac{1}{{A_v}.z}(1-{r_s})+\frac{1}{{B_v}.z}{r_s}}
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Conversely, if we have the view space linear ratio \(r_v\), we can calculate the screen space linear ratio:</p>
</div>
<div class="stemblock">
<div class="content">
\[{r_s}=\frac{({B_v}.z){r_v}}{({A_v}.z)(1-{r_v})+({B_v}.z){r_v}}\]
</div>
</div>
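<div class="paragraph">
<p>Both conversions can be sketched in C++ as follows, again assuming OpenGL conventions (negative view-space Z) and using hypothetical function names. One plausible use, not necessarily the author’s implementation, is a screen space ray march: step \(r_s\) uniformly in pixels, then convert to \(r_v\) to recover the view-space position for the depth test.</p>
</div>

```cpp
#include <cassert>
#include <cmath>

// Screen-space ratio -> view-space ratio; aZ, bZ are the view-space Z of A and B.
float ViewRatioFromScreenRatio(float aZ, float bZ, float rs) {
    assert(aZ < 0.0f && bZ < 0.0f);
    return (rs / bZ) / ((1.0f - rs) / aZ + rs / bZ);
}

// View-space ratio -> screen-space ratio (inverse of the function above).
float ScreenRatioFromViewRatio(float aZ, float bZ, float rv) {
    assert(aZ < 0.0f && bZ < 0.0f);
    return (bZ * rv) / (aZ * (1.0f - rv) + bZ * rv);
}
```

<div class="paragraph">
<p>With \({A_v}.z=-1\), \({B_v}.z=-3\) and \(r_s=0.5\), the sketch gives \(r_v=0.25\), and converting \(0.25\) back gives \(0.5\): the screen-space midpoint sits only a quarter of the way from \(A_v\) to \(B_v\) in view space.</p>
</div>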
<div class="imageblock" style="text-align: center">
<div class="content">
<img src="https://upload.wikimedia.org/wikipedia/commons/5/57/Perspective_correct_texture_mapping.jpg" alt="Perspective correct texture mapping.jpg" width="600">
</div>
<div class="title">Figure 2</div>
</div>
<div class="paragraph">
<p>As I was writing this post, I realized another classic use case of this conversion: <a href="https://en.wikipedia.org/wiki/Texture_mapping#Affine_texture_mapping">texture mapping</a>. If you simply use screen space interpolated UVs to read a texture, you get perspective distortion (this is called affine texture mapping). To fix this you need to convert the linear ratio into view space, which is exactly what we are doing here. You should be able to derive on your own the same formula the wiki page gives for fixing UVs: \({u_α}=\frac{(1-α)\frac{u_0}{z_0} + α\frac{u_1}{z_1}}{(1-α)\frac{1}{z_0} + α\frac{1}{z_1}}\), where \(α\) is the screen space ratio between two endpoints with texture coordinates \(u_0\) and \(u_1\) and linear depths \(z_0\) and \(z_1\). We don’t usually think about perspective-correct UVs because modern hardware already does all the hard work for us; however, when we need to deal with screen space and view space, this conversion comes in handy.</p>
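<div class="paragraph">
<p>A minimal C++ sketch of that UV formula, with hypothetical function names: interpolate \(\frac{u}{z}\) and \(\frac{1}{z}\) linearly in screen space, then divide. The affine version is included only for comparison.</p>
</div>

```cpp
#include <cassert>
#include <cmath>

// Perspective-correct interpolation of attribute u at screen-space ratio alpha
// between endpoints (u0, z0) and (u1, z1), where z is linear depth.
float PerspectiveCorrectLerp(float u0, float z0, float u1, float z1, float alpha) {
    assert(z0 != 0.0f && z1 != 0.0f);
    float num = (1.0f - alpha) * u0 / z0 + alpha * u1 / z1;
    float den = (1.0f - alpha) / z0 + alpha / z1;
    return num / den;
}

// Affine (screen-space linear) interpolation, which produces the distortion.
float AffineLerp(float u0, float u1, float alpha) {
    return (1.0f - alpha) * u0 + alpha * u1;
}
```

<div class="paragraph">
<p>For \(u_0=0\), \(z_0=1\), \(u_1=1\), \(z_1=3\) at \(α=0.5\), the perspective-correct result is \(0.25\) while the affine one is \(0.5\), which matches the \(r_v=0.25\) obtained from the ratio conversion above.</p>
</div>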
</div>]]></description><link>https://lxjk.github.io/2017/06/10/Conversion-between-View-Space-Linear-and-Screen-Space-Linear.html</link><guid isPermaLink="true">https://lxjk.github.io/2017/06/10/Conversion-between-View-Space-Linear-and-Screen-Space-Linear.html</guid><category><![CDATA[Math]]></category><category><![CDATA[Graphics]]></category><dc:creator><![CDATA[Eric Zhang]]></dc:creator><pubDate>Sat, 10 Jun 2017 00:00:00 GMT</pubDate></item><item><title><![CDATA[Calculate Minimal Bounding Sphere of Frustum]]></title><description><![CDATA[<div class="paragraph">
<p>I came across this problem when fixing shadow shimmering in Cascaded Shadow Maps, but it could be used in many other cases. To describe what we are going to do more specifically: given a frustum with near plane \(n\), far plane \(f\), and field of view angle \(fov\), we need to calculate the center \(C\) and radius \(R\) of its minimal bounding sphere. In the calculation we will use a right-handed, Y-up coordinate system (forward is the -Z axis) for 3D, and a similar coordinate system (forward is the -Y axis) for 2D. For convenience we denote the half FOV angle as \(α=\frac{fov}{2}\).</p>
</div>
<div class="paragraph">
<p>Let’s start with the 2D situation. First consider the extreme case shown in figure 1 (a), where the near plane is very close to the far plane, that is \(n=f\). Obviously the minimal bounding sphere sits at the center of the far plane, and its radius is half the far plane width:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
C&amp;=(0, -f)\\
R&amp;=f\tanα
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Consider another extreme case, shown in figure 1 (b), where the near plane is \(0\), that is \(n=0\). Here we are actually calculating the circumscribed circle of an isosceles triangle. Since the apex at the origin lies on the circle, the center of our bounding sphere sits on the Y axis at \(C=(0,-R)\). I’m not going to calculate the radius here, since we don’t need it for the following calculation, but it should be easy to figure out if you are interested.</p>
</div>
<div class="openblock float-group">
<div class="content">
<div class="imageblock left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/frustum/fig1a.png" alt="fig1a.png" width="300">
</div>
<div class="title">Figure 1 (a)</div>
</div>
<div class="imageblock left">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/frustum/fig1b.png" alt="fig1b.png" width="300">
</div>
<div class="title">Figure 1 (b)</div>
</div>
</div>
</div>
<div class="imageblock" style="text-align: center">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/frustum/fig2.png" alt="fig2.png" width="400">
</div>
<div class="title">Figure 2</div>
</div>
<div class="paragraph">
<p>Now let’s go back to the normal case shown in figure 2. Intuitively, the minimal bounding sphere should sit somewhere between the near plane center \({N_0}(0,-n)\) and the far plane center \({F_0}(0,-f)\), and it should have the same distance to all 4 vertices of the frustum, that is \(\left|\vec{C{N_1}}\right|=\left|\vec{C{N_2}}\right|=\left|\vec{C{F_1}}\right|=\left|\vec{C{F_2}}\right|=R\). Can we actually find such a point? Since the frustum is symmetric about the Y axis, if we find a point \(C\) on the Y axis such that \(\left|\vec{C{N_1}}\right|=\left|\vec{C{F_1}}\right|=R\), it is guaranteed that \(\left|\vec{C{N_2}}\right|=\left|\vec{C{F_2}}\right|=R\) as well. Now we just need to calculate that point.</p>
</div>
<div class="paragraph">
<p>Recall that the half FOV angle is denoted \(α\). We have \(\left|\vec{{N_0}{N_1}}\right|=n\tanα\), \(\left|\vec{{F_0}{F_1}}\right|=f\tanα\), and \(\left|\vec{{N_0}{F_0}}\right|=f-n\). Denote \(x=\left|\vec{C{N_0}}\right|\); then \(\left|\vec{C{F_0}}\right|=f-n-x\), and once we solve for \(x\) we can find the radius \(R\):</p>
</div>
<div class="stemblock">
<div class="content">
\[\left|\vec{{C}{N_0}}\right|^{2}+\left|\vec{{N_0}{N_1}}\right|^{2}=\left|\vec{{C}{N_1}}\right|^{2}=R^{2}=\left|\vec{{C}{F_1}}\right|^{2}=\left|\vec{{C}{F_0}}\right|^{2}+\left|\vec{{F_0}{F_1}}\right|^{2}\\
\begin{align*}
x^{2}+n^{2}\tan^{2}α&amp;=(f-n-x)^{2}+f^{2}\tan^{2}α\\
x^{2}+n^{2}\tan^{2}α&amp;=(f-n)^{2}-2(f-n)x+x^{2}+f^{2}\tan^{2}α\\
x&amp;=\frac{1}{2}((f-n)+(f+n)\tan^{2}α)\\
\end{align*}\]
</div>
</div>
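<div class="paragraph">
<p>A quick numeric check of this solution, as a C++ sketch with a hypothetical function name: compute \(x\), then confirm that \(C\) is equidistant from \(N_1\) and \(F_1\). The angle is assumed to be in radians, and no clamping is applied yet.</p>
</div>

```cpp
#include <cassert>
#include <cmath>

// x = |C N0|: distance from the near plane center to the sphere center for the
// 2D frustum with near n, far f and half FOV alpha (radians). This is the
// unclamped solution in which both N1 and F1 lie on the circle.
float CenterOffsetFromNear(float n, float f, float alpha) {
    assert(0.0f <= n && n <= f);
    float t2 = std::tan(alpha) * std::tan(alpha);
    return 0.5f * ((f - n) + (f + n) * t2);
}
```

<div class="paragraph">
<p>For \(n=1\), \(f=10\) and \(α=30^{\circ}\), this gives \(x\approx6.33\), and both \(\left|\vec{C{N_1}}\right|\) and \(\left|\vec{C{F_1}}\right|\) come out to about \(6.36\).</p>
</div>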
<div class="paragraph">
<p>Now we can get the radius:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
R&amp;=\left|\vec{{C}{N_1}}\right|=\sqrt{\left|\vec{{C}{N_0}}\right|^{2}+\left|\vec{{N_0}{N_1}}\right|^{2}}\\
&amp;=\sqrt{x^{2}+n^{2}\tan^{2}α}\\
&amp;=\frac{1}{2}\sqrt{(f-n)^{2}+2(f^{2}+n^{2})\tan^{2}α+(f+n)^{2}\tan^{4}α}
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>And the center will be</p>
</div>
<div class="stemblock">
<div class="content">
\[C=(0,-(n+x))=(0, -\frac{1}{2}(f+n)(1+\tan^{2}α))\]
</div>
</div>
<div class="paragraph">
<p>However, there is a catch. As shown in figure 3, if the near plane is close to the far plane, we might end up with a bounding sphere larger than we need. Sphere \(C\) is the sphere we calculated, but the sphere centered at \(C'={F_0}\) is the minimal one. This is because our calculation forces all 4 vertices onto the sphere. In the normal case this gives the minimal sphere, but when the near plane is close to the far plane, it is better to ignore the near plane vertices, since they are already inside the sphere determined by the far plane alone.</p>
</div>
<div class="imageblock" style="text-align: center">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/frustum/fig3.png" alt="fig3.png" width="400">
</div>
<div class="title">Figure 3</div>
</div>
<div class="paragraph">
<p>When should we give up on the near plane? Take another look at the sphere center we calculated: we got a larger sphere because the center lies farther along the -Y axis than the far plane. So we can simply make sure the bounding sphere center stays within the frustum: if it is farther than the far plane, clamp it to the far plane. The condition for clamping is</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
-\frac{1}{2}(f+n)(1+\tan^{2}α)&amp;\leqslant-f\\
\tan^{2}α&amp;\geqslant\frac{f-n}{f+n}
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>To summarize our result:</p>
</div>
<div class="paragraph">
<p>If \(\tan^{2}α\geqslant\frac{f-n}{f+n}\)</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
C&amp;=(0, -f)\\
R&amp;=f\tanα
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Else</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
C&amp;=(0, -\frac{1}{2}(f+n)(1+\tan^{2}α))\\
R&amp;=\frac{1}{2}\sqrt{(f-n)^{2}+2(f^{2}+n^{2})\tan^{2}α+(f+n)^{2}\tan^{4}α}
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>We solved the 2D situation; going into 3D is actually no more difficult. As shown in figure 4, we just need to work on the 2D frustum defined by \({N_1}{N_3}{F_1}{F_3}\). Similarly, by the symmetry of the frustum, if we find a bounding sphere such that \(N_1\) and \(F_1\) are on the sphere, it is guaranteed that all 8 vertices of the frustum will be on the sphere. The only extra thing we need to do here is calculate \(\left|\vec{{N_0}{N_1}}\right|\) and \(\left|\vec{{F_0}{F_1}}\right|\), which depends on which axis is the major axis of your field of view. Taking the X axis as the major axis for example, let \(w\) be the viewport width and \(h\) the viewport height; then the X and Y coordinates of \(N_1\) in the near plane are \({N_1}=(-n\tanα,n\frac{h}{w}\tanα)\), so \(\left|\vec{{N_0}{N_1}}\right|=n\sqrt{1+\frac{h^{2}}{w^{2}}}\tanα\), and similarly \(\left|\vec{{F_0}{F_1}}\right|=f\sqrt{1+\frac{h^{2}}{w^{2}}}\tanα\).</p>
</div>
<div class="imageblock" style="text-align: center">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/frustum/fig4.png" alt="fig4.png" width="500">
</div>
<div class="title">Figure 4</div>
</div>
<div class="paragraph">
<p>Here is our conclusion. For a 3D frustum with viewport width \(w\), height \(h\), near plane \(n\), far plane \(f\), and X axis field of view angle \(fov\), let \(k=\sqrt{1+\frac{h^{2}}{w^{2}}}\tan{\frac{fov}{2}}\); then the minimal bounding sphere is:</p>
</div>
<div class="paragraph">
<p>If \(k^{2}\geqslant\frac{f-n}{f+n}\)</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
C&amp;=(0,0,-f)\\
R&amp;=fk
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Else</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
C&amp;=(0,0,-\frac{1}{2}(f+n)(1+k^{2}))\\
R&amp;=\frac{1}{2}\sqrt{(f-n)^{2}+2(f^{2}+n^{2})k^{2}+(f+n)^{2}k^{4}}
\end{align*}\]
</div>
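<div class="paragraph">
<p>Putting the 3D result into code, a minimal C++ sketch could look like the following. It assumes, as the text does, that X is the major FOV axis and that \(fov\) is in radians; for a Y-major FOV, swapping \(w\) and \(h\) gives the corresponding \(k\). The <code>Sphere</code> struct and function name are hypothetical, and the center is expressed in the camera’s local coordinates.</p>
</div>

```cpp
#include <cassert>
#include <cmath>

// Hypothetical result type: center in view space (camera looks down -Z).
struct Sphere {
    float cx, cy, cz;
    float radius;
};

// Minimal bounding sphere of a symmetric perspective frustum.
// n, f: near/far distances (0 <= n <= f, f > 0); fovX: full X field of view
// in radians (X assumed to be the major axis); w, h: viewport size.
Sphere FrustumBoundingSphere(float n, float f, float fovX, float w, float h) {
    assert(0.0f <= n && n <= f && f > 0.0f && w > 0.0f && h > 0.0f);
    float k = std::sqrt(1.0f + (h * h) / (w * w)) * std::tan(0.5f * fovX);
    float k2 = k * k;
    if (k2 >= (f - n) / (f + n)) {
        // Center would fall beyond the far plane: clamp it to F0.
        return { 0.0f, 0.0f, -f, f * k };
    }
    float fm = f - n, fp = f + n;
    float r = 0.5f * std::sqrt(fm * fm + 2.0f * (f * f + n * n) * k2 + fp * fp * k2 * k2);
    return { 0.0f, 0.0f, -0.5f * fp * (1.0f + k2), r };
}
```

<div class="paragraph">
<p>For a long, narrow frustum (\(n=1\), \(f=100\), \(fov=60^{\circ}\), 16:9) the center lands between the planes at roughly \(z\approx-72.7\); for a short, wide one (\(n=1\), \(f=2\), \(fov=120^{\circ}\)) the clamp kicks in and the center is exactly \(F_0\).</p>
</div>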
</div>]]></description><link>https://lxjk.github.io/2017/04/15/Calculate-Minimal-Bounding-Sphere-of-Frustum.html</link><guid isPermaLink="true">https://lxjk.github.io/2017/04/15/Calculate-Minimal-Bounding-Sphere-of-Frustum.html</guid><category><![CDATA[Math]]></category><category><![CDATA[Graphics]]></category><dc:creator><![CDATA[Eric Zhang]]></dc:creator><pubDate>Sat, 15 Apr 2017 00:00:00 GMT</pubDate></item><item><title><![CDATA[A Different Way to Understand Quaternion and Rotation]]></title><description><![CDATA[<div class="sect2">
<h3 id="_before_we_start">Before We Start</h3>
<div class="paragraph">
<p>Quaternions are widely used in game engines to represent 3D rotation. As a game engineer you might be using quaternions explicitly or implicitly in your daily work, but do you really understand what is going on under the hood when you call “rotate a vector” or “combine two rotations”? Why is rotating a vector \(\vec{v}\) by quaternion \(q\) calculated by a “sandwich” multiplication, \(q\vec{v}q^{-1}\)? Why is rotating by quaternion \(q_1\) and then \(q_2\) written in reversed order, \({q_2}{q_1}\), and can you visualize the resulting rotation axis and angle?</p>
</div>
<div class="paragraph">
<p>Understanding quaternions also leads to using them more efficiently. For example, one common situation in game development is that we need an object to face the opposite direction. What we would usually do is get the normal or forward vector, negate it, build a rotation out of it, and assign the rotation to the object. Later in this article we will see how much calculation this process takes. However, with an understanding of quaternions, we only need to do \(q=(q.y,-q.x,q.w,-q.z)\), and I will show you why.</p>
</div>
<div class="paragraph">
<p>In this article, I will try to avoid touching the algebraic structure of quaternions, or having to imagine a 4-dimensional hypersphere. I will start with a special rotation operation, the flip, and use it to visualize quaternions in a more accessible and geometric way. This article is split into 2 parts. In Part 1 we will talk about the idea of quaternions, and understand and visualize how they rotate a vector and how rotations compose. In Part 2 we will talk about how to make use of our understanding from Part 1, and how quaternions are used in game engines compared with rotation matrices and Euler angles.</p>
</div>
<div class="paragraph">
<p>I assume you are comfortable with 3D math (vector dot product and cross product) and basic trigonometry.</p>
</div>
<div id="toc" class="toc">
<div id="toctitle" class="title">Table of Contents</div>
<ul class="sectlevel2">
<li><a href="#_before_we_start">Before We Start</a></li>
<li><a href="#_part_1_theory">Part 1. Theory</a>
<ul class="sectlevel2">
<li><a href="#_quaternion_definition">Quaternion Definition</a></li>
<li><a href="#_rotation_and_flip">Rotation and Flip</a></li>
<li><a href="#_quaternion_and_flip">Quaternion and Flip</a></li>
<li><a href="#_flip_composition">Flip Composition</a></li>
<li><a href="#_flip_vector">Flip Vector</a></li>
<li><a href="#_rotate_vector">Rotate Vector</a></li>
<li><a href="#_rotation_composition">Rotation Composition</a></li>
<li><a href="#_summary_of_part_1">Summary of Part 1</a></li>
</ul>
</li>
<li><a href="#_part_2_application">Part 2. Application</a>
<ul class="sectlevel2">
<li><a href="#_calculation_of_vector_rotation">Calculation of Vector Rotation</a></li>
<li><a href="#_world_rotation_and_local_rotation">World Rotation and Local Rotation</a></li>
<li><a href="#_rotation_along_x_y_z_axis">Rotation along X/Y/Z Axis</a></li>
<li><a href="#_euler_angles_to_quaternion">Euler Angles to Quaternion</a></li>
<li><a href="#_quaternion_and_rotation_matrix">Quaternion and Rotation Matrix</a></li>
<li><a href="#_quaternion_to_euler_angles">Quaternion to Euler Angles</a></li>
<li><a href="#_summary_of_part_2">Summary of Part 2</a></li>
</ul>
</li>
<li><a href="#_appendix">Appendix</a>
<ul class="sectlevel2">
<li><a href="#_derive_quaternion_multiplication">Derive Quaternion Multiplication</a></li>
</ul>
</li>
</ul>
</div>
</div>
<div class="sect1">
<h2 id="_part_1_theory">Part 1. Theory</h2>
<div class="sectionbody">
<div class="sect2">
<h3 id="_quaternion_definition">Quaternion Definition</h3>
<div class="paragraph">
<p>A quaternion is a 4-tuple denoted as \(q=(x,y,z,w)\). The length of a quaternion is defined as \(\left|q\right| =\sqrt{x^{2}+y^{2}+z^{2}+w^{2}}\), just as you would expect from a 4D vector.</p>
</div>
<div class="paragraph">
<p>In order to represent 3D rotation, we have a constraint on the quaternions we use. But before that I want to introduce Euler’s rotation theorem:</p>
</div>
<div class="paragraph">
<p><strong><em>Any rotation in 3D space is equivalent to a single rotation of angle \(θ\) along some axis \(\vec{v}\).</em></strong></p>
</div>
<div class="paragraph">
<p>We can use a quaternion to describe this angle-axis rotation: \(q=(\sin\frac{θ}{2}\vec{v}.x,\sin\frac{θ}{2}\vec{v}.y,\sin\frac{θ}{2}\vec{v}.z,\cos\frac{θ}{2})\), or in a more compact form \(q=(\sin\frac{θ}{2}\vec{v},\cos\frac{θ}{2})\). We call this the vector form of a quaternion, and we will use it throughout this article. You might wonder why we are using \(\frac{θ}{2}\) rather than \(θ\) directly. I will explain that in a later section.</p>
</div>
<div class="paragraph">
<p>It is easy to see that the length of this quaternion is \(\left|q\right|=\sqrt{\sin^{2}\frac{θ}{2}\left|\vec{v}\right|^{2}+\cos^{2}\frac{θ}{2}}=1\) (remember that the axis \(\vec{v}\) is a unit vector, \(\left|\vec{v}\right|=1\)). We call a quaternion with length \(\left|q\right|=1\) a unit quaternion. So we can rewrite Euler’s rotation theorem in quaternion terms:</p>
</div>
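<div class="paragraph">
<p>For reference, here is a minimal C++ sketch of the vector form, using the \((x,y,z,w)\) layout of this article; <code>Quat</code>, <code>Vec3</code>, <code>Length</code> and <code>FromAxisAngle</code> are hypothetical names, and \(θ\) is assumed to be in radians. Whatever \(θ\) is, the result has length 1 as long as the axis is a unit vector.</p>
</div>

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; };  // vector part (x, y, z), scalar part w

// Length of a quaternion, exactly as for a 4D vector.
float Length(const Quat& q) {
    return std::sqrt(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w);
}

// Vector form q = (sin(theta/2) * v, cos(theta/2)); v must be a unit vector.
Quat FromAxisAngle(const Vec3& v, float theta) {
    assert(std::fabs(v.x * v.x + v.y * v.y + v.z * v.z - 1.0f) < 1e-4f);
    float s = std::sin(0.5f * theta), c = std::cos(0.5f * theta);
    return { s * v.x, s * v.y, s * v.z, c };
}
```

<div class="paragraph">
<p>For example, a \(90^{\circ}\) rotation about Z gives \((0,0,\frac{\sqrt{2}}{2},\frac{\sqrt{2}}{2})\).</p>
</div>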
<div class="paragraph">
<p><strong><em>Any 3D rotation is equivalent to a unit quaternion \(q\) with \(\left|q\right|=1\).</em></strong></p>
</div>
<hr>
<div class="sidebarblock">
<div class="content">
<div class="title">Side Note</div>
<div class="paragraph">
<p>This claim actually has 2 sides. Let me go into a little more detail in mathematical terms:
(1). For any 3D rotation equivalent to a rotation angle \(θ\) along axis \(\vec{v}\), there exists a unit quaternion \(q=(\sin\frac{θ}{2}\vec{v},\cos\frac{θ}{2})\) to describe this rotation.
(2). For any unit quaternion \(q=(x,y,z,w)\) with \(w\neq\pm1\), it describes a rotation of angle \(θ=2\cos^{-1}w\) along axis \(\vec{v}=\frac{1}{\sqrt{1-w^{2}}}(x,y,z)\). If \(w=\pm1\), then \(x=y=z=0\) and \(q\) is the identity rotation, with an arbitrary axis.</p>
</div>
</div>
</div>
<hr>
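<div class="paragraph">
<p>Direction (2) can be sketched in C++ as below (hypothetical names, radians). The sketch clamps \(w\) before calling <code>acos</code> to guard against rounding, and since the axis is undefined when \(w=\pm1\), it returns an arbitrary fallback axis in that case; that fallback is my assumption, not part of the claim above. The recovered axis also loses precision as \(w\) approaches \(\pm1\).</p>
</div>

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; };

// Unit quaternion -> (axis, angle): theta = 2 acos(w), axis = (x, y, z) / sqrt(1 - w^2).
void ToAxisAngle(const Quat& q, Vec3& axis, float& theta) {
    assert(std::fabs(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w - 1.0f) < 1e-4f);
    float w = std::fmax(-1.0f, std::fmin(1.0f, q.w));  // guard acos against rounding
    theta = 2.0f * std::acos(w);
    float s = std::sqrt(1.0f - w * w);
    if (s < 1e-6f) {
        // w = +-1: identity rotation, any axis works; pick X arbitrarily.
        axis = Vec3{1.0f, 0.0f, 0.0f};
        return;
    }
    axis = Vec3{q.x / s, q.y / s, q.z / s};
}
```
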
<div class="paragraph">
<p>From now on, any quaternion \(q\) used in this article is by default a unit quaternion, and we will use \(q\) to describe rotations.</p>
</div>
</div>
<div class="sect2">
<h3 id="_rotation_and_flip">Rotation and Flip</h3>
<div class="paragraph">
<p>Now let’s forget quaternions for a minute and focus on the nature of rotations. This part is the key to understanding quaternion calculation in an easier way.</p>
</div>
<div class="paragraph">
<p><strong><em>Any 3D rotation can be composed of 2 flips along some axes.</em></strong></p>
</div>
<div class="paragraph">
<p>The reason we want to break down a rotation into flips is that flips are much easier to think about and calculate than general 3D rotations. We will start from flips and build our way up to understanding rotation.
Here is a loose proof of this idea. We define counter-clockwise as the positive direction of rotation. First consider a special case: a rotation \(q\) that rotates \(+90^{\circ}\) along axis Z. I claim this rotation is the same as 2 flips along axes \(\vec{a}\) and \(\vec{b}\), both on the XY plane, where the angle from \(\vec{a}\) to \(\vec{b}\) is \(+45^{\circ}\).</p>
</div>
<div class="imageblock" style="text-align: center;float: right">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/quaternions/fig1_b.png" alt="fig1 b.png" width="300">
</div>
<div class="title">Figure 1 (b)</div>
</div>
<div class="imageblock" style="text-align: center;float: right">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/quaternions/fig1.png" alt="fig1.png" width="300">
</div>
<div class="title">Figure 1 (a)</div>
</div>
<div class="paragraph">
<p>We demonstrate this in Figure 1. For any vector \(\vec{v}\), the result of this rotation is \(\vec{v''}\), which is the same as flipping \(\vec{v}\) along axis \(\vec{a}\) to get \(\vec{v'}\), and then flipping \(\vec{v'}\) along axis \(\vec{b}\) to get \(\vec{v''}\).</p>
</div>
<div class="paragraph">
<p>It doesn’t matter where \(\vec{a}\) and \(\vec{b}\) are on the XY plane, but the order must be kept. If we choose \(\vec{b}\) by rotating \(\vec{a}\) along axis Z by \(+45^{\circ}\) in the positive direction defined above, then we must flip along \(\vec{a}\) first and then along \(\vec{b}\) to get our target rotation. The order and the sign of the angle are important: flipping along \(\vec{b}\) first and then along \(\vec{a}\) gives a different result (a \(-90^{\circ}\) rotation).</p>
</div>
<div class="paragraph">
<p>It’s not hard to generalize to a rotation of any angle \(θ\) along Z axis. And in this case, the angle from \(\vec{a}\) to \(\vec{b}\) is \(\frac{θ}{2}\).</p>
</div>
<div class="paragraph">
<p>What if the axis is not the Z axis but an arbitrary unit vector \(\vec{u}\)? It turns out to be very straightforward: \(\vec{a}\) and \(\vec{b}\) are no longer on the XY plane but on the plane through the origin perpendicular to \(\vec{u}\), as in Figure 2.</p>
</div>
<div class="imageblock" style="text-align: center">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/quaternions/fig2.png" alt="fig2.png" width="400">
</div>
<div class="title">Figure 2</div>
</div>
<div class="paragraph">
<p>Now we can rewrite our flip composition rule in a more specific form:</p>
</div>
<div class="paragraph">
<p><strong><em>Any 3D rotation equivalent to rotating angle \(θ\) along axis \(\vec{v}\) can be represented as a sequence of 2 flips along axis \(\vec{a}\) and \(\vec{b}\), such that \(\vec{a}·\vec{v}=0\), \(\vec{b}·\vec{v}=0\) and the angle from \(\vec{a}\) to \(\vec{b}\): \(&lt;\vec{a},\vec{b}&gt;=\frac{θ}{2}\).</em></strong></p>
</div>
<div class="paragraph">
<p>This representation means if we fully understand flip, which is easier to visualize, we can fully understand rotation and quaternions, since any quaternion can be broken down to flips.</p>
</div>
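<div class="paragraph">
<p>To make flips concrete, here is a minimal C++ sketch. It uses the standard identity that a flip (a \(180^{\circ}\) rotation) of \(\vec{v}\) about a unit axis \(\vec{a}\) keeps the component along \(\vec{a}\) and negates the rest, giving \(2(\vec{a}·\vec{v})\vec{a}-\vec{v}\); that identity and the function names are my additions, not derived in this section. Flipping along \(\vec{a}=(1,0,0)\) and then along \(\vec{b}=(\cos45^{\circ},\sin45^{\circ},0)\) should rotate any vector by \(+90^{\circ}\) about Z.</p>
</div>

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

float Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Flip (180-degree rotation) of v about unit axis a: keep the component along a,
// negate the component perpendicular to a, i.e. v' = 2 (a . v) a - v.
Vec3 Flip(const Vec3& v, const Vec3& a) {
    assert(std::fabs(Dot(a, a) - 1.0f) < 1e-4f);
    float d = 2.0f * Dot(a, v);
    return { d * a.x - v.x, d * a.y - v.y, d * a.z - v.z };
}
```

<div class="paragraph">
<p>For \(\vec{v}=(0.3,-0.5,0.7)\), <code>Flip(Flip(v, a), b)</code> gives \((0.5,0.3,0.7)\), a \(+90^{\circ}\) rotation about Z, while the reversed order gives \((-0.5,-0.3,0.7)\), a \(-90^{\circ}\) rotation.</p>
</div>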
</div>
<div class="sect2">
<h3 id="_quaternion_and_flip">Quaternion and Flip</h3>
<div class="paragraph">
<p>Now let’s recall the quaternion vector form \(q=(\sin⁡\frac{θ}{2}\vec{v},\cos⁡\frac{θ}{2})\). With the discussion of flips above, you can almost immediately see why we are using \(\frac{θ}{2}\) here.</p>
</div>
<div class="paragraph">
<p>Think about flips again. A flip along axis \(\vec{a}\) is also a \(180^{\circ}\) rotation along axis \(\vec{a}\). So this flip can be represented in quaternion terms as</p>
</div>
<div class="stemblock">
<div class="content">
\[q_a=(\sin⁡\frac{180^{\circ}}{2}\vec{a},\cos⁡\frac{180^{\circ}}{2})=(\vec{a},0)\]
</div>
</div>
<div class="paragraph">
<p>From now on we will use quaternion to represent flip. Actually any unit quaternion with \(q.w=0\) is a flip along axis \((q.x,q.y,q.z)\).</p>
</div>
</div>
<div class="sect2">
<h3 id="_flip_composition">Flip Composition</h3>
<div class="paragraph">
<p>Here we need to introduce the multiplication of general quaternions. Let \(q_1=(\vec{v_1},w_1)\), \(q_2=(\vec{v_2},w_2)\); then</p>
</div>
<div class="stemblock">
<div class="content">
\[{q_1}{q_2}=(\vec{v_1},w_1)(\vec{v_2},w_2)=(w_1\vec{v_2} + w_2\vec{v_1} + \vec{v_1}×\vec{v_2}, {w_1}{w_2}-\vec{v_1}·\vec{v_2})\]
</div>
</div>
<div class="paragraph">
<p>Note that \(q_1\) and \(q_2\) here are not necessarily unit quaternions, so even though I’m using the vector form, there’s no need for \(\sin\frac{θ}{2}\) and \(\cos\frac{θ}{2}\) as we used for unit quaternions. It’s hard to explain this definition without introducing the algebraic structure of quaternions, so I will skip that. If you are interested in how it is derived, the quaternion <a href="https://en.wikipedia.org/wiki/Quaternion#Definition">Wiki page</a> has a very straightforward introduction.</p>
</div>
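<div class="paragraph">
<p>The general product can be written as a small C++ function. This is a minimal sketch in the \((x,y,z,w)\) layout with hypothetical names. Note that it is not commutative: for the unit axes, \((1,0,0,0)(0,1,0,0)=(0,0,1,0)\) but \((0,1,0,0)(1,0,0,0)=(0,0,-1,0)\).</p>
</div>

```cpp
#include <cassert>

struct Quat { float x, y, z, w; };  // vector part (x, y, z), scalar part w

// General quaternion product in vector form:
// q1 q2 = (w1 v2 + w2 v1 + v1 x v2, w1 w2 - v1 . v2)
Quat Mul(const Quat& q1, const Quat& q2) {
    return {
        q1.w * q2.x + q2.w * q1.x + (q1.y * q2.z - q1.z * q2.y),
        q1.w * q2.y + q2.w * q1.y + (q1.z * q2.x - q1.x * q2.z),
        q1.w * q2.z + q2.w * q1.z + (q1.x * q2.y - q1.y * q2.x),
        q1.w * q2.w - (q1.x * q2.x + q1.y * q2.y + q1.z * q2.z)
    };
}
```
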
<div class="paragraph">
<p>We are not going to use this general quaternion multiplication in Part 1. Here we only need to know a simpler form, the multiplication of flips. Let \(q_a=(\vec{a},0)\), \(q_b=(\vec{b},0)\) then</p>
</div>
<div class="stemblock">
<div class="content">
\[{q_a}{q_b}=(\vec{a},0)(\vec{b},0)=(\vec{a}×\vec{b},-\vec{a}·\vec{b})\]
</div>
</div>
<div class="paragraph">
<p>It follows directly from the general form by setting \(w_1=w_2=0\), and it is the only multiplication we will use in Part 1.</p>
</div>
<div class="paragraph">
<p>With flip multiplication defined, we can rewrite our flip composition rule again:</p>
</div>
<div class="paragraph">
<p><strong><em>Any 3D rotation \(q=(\sin⁡\frac{θ}{2}\vec{v},\cos⁡\frac{θ}{2})\) can be represented as a sequence of 2 flips \(q_a=(\vec{a},0)\) and \(q_b=(\vec{b},0)\), such that</em></strong></p>
</div>
<div class="stemblock">
<div class="content">
\[q=-{q_b}{q_a}\]
</div>
</div>
<div class="paragraph">
<p><strong><em>where \(\vec{a}·\vec{v}=0\), \(\vec{b}·\vec{v}=0\) and the angle from \(\vec{a}\) to \(\vec{b}\): \(&lt;\vec{a},\vec{b}&gt;=\frac{θ}{2}\).</em></strong></p>
</div>
<div class="paragraph">
<p>You might wonder why it is not \(q= {q_a}{q_b}\) instead. We will show where the order and the negative sign come from in the proof.</p>
</div>
<div class="paragraph">
<p>\(\vec{a}·\vec{b}=\cos&lt;\vec{a},\vec{b}&gt;\left|\vec{a}\right|\left|\vec{b}\right|=\cos\frac{θ}{2}\). Since \(\vec{a}·\vec{v}=0\), \(\vec{b}·\vec{v}=0\) and \(\left|\vec{v}\right|=1\), we have \(\vec{a}×\vec{b}=\sin&lt;\vec{a},\vec{b}&gt;\left|\vec{a}\right|\left|\vec{b}\right|\vec{v}=\sin\frac{θ}{2}\vec{v}\).</p>
</div>
<div class="paragraph">
<p>If you are not sure about the direction of the cross product, see Figure 2.</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
q&amp;=(\sin⁡\frac{θ}{2}\vec{v},\cos⁡\frac{θ}{2})\\
&amp;=(\vec{a}×\vec{b},\vec{a}·\vec{b})\\
&amp;=-(-\vec{a}×\vec{b},-\vec{a}·\vec{b})\\
&amp;=-(\vec{b}×\vec{a},-\vec{a}·\vec{b})\\
&amp;=-{q_b}{q_a}
\end{align*}\]
</div>
</div>
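<div class="paragraph">
<p>The identity \(q=-{q_b}{q_a}\) can also be checked numerically. In this minimal C++ sketch (hypothetical names), <code>MulFlips</code> is the flip product \((\vec{a}×\vec{b},-\vec{a}·\vec{b})\) defined above, and <code>RotationFromFlips</code> negates \({q_b}{q_a}\). With \(\vec{a}=(1,0,0)\) and \(\vec{b}=(\cos45^{\circ},\sin45^{\circ},0)\) it should return \((0,0,\sin45^{\circ},\cos45^{\circ})\), the \(90^{\circ}\) rotation about Z.</p>
</div>

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Quat { float x, y, z, w; };

// Flip product (a, 0)(b, 0) = (a x b, -a . b) for pure quaternions.
Quat MulFlips(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x,
             -(a.x * b.x + a.y * b.y + a.z * b.z) };
}

// Rotation equivalent to "flip along a, then flip along b": q = -(q_b q_a).
// Products are read right to left, so q_a (applied first) is on the right.
Quat RotationFromFlips(const Vec3& a, const Vec3& b) {
    assert(std::fabs(a.x * a.x + a.y * a.y + a.z * a.z - 1.0f) < 1e-4f);
    assert(std::fabs(b.x * b.x + b.y * b.y + b.z * b.z - 1.0f) < 1e-4f);
    Quat p = MulFlips(b, a);
    return { -p.x, -p.y, -p.z, -p.w };
}
```
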
<div class="paragraph">
<p>Here you can also clearly see why we are using \(\sin⁡\frac{θ}{2}\) and \(\cos⁡\frac{θ}{2}\) in quaternions.</p>
</div>
<div class="paragraph">
<p>One thing I need to mention here is the negation of a quaternion. \(q=(\sin⁡\frac{θ}{2}\vec{v},\cos⁡\frac{θ}{2})\), then</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
{-q}&amp;=(-\sin⁡\frac{θ}{2}\vec{v},-\cos⁡\frac{θ}{2})\\
&amp;=(-\sin⁡\frac{2π-θ}{2}\vec{v},\cos⁡\frac{2π-θ}{2})\\
&amp;=(\sin⁡\frac{-(2π-θ)}{2}\vec{v},\cos⁡\frac{-(2π-θ)}{2})\\
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Recall that \(\sin θ=\sin(π-θ)\) and \(-\cos θ=\cos(π-θ)\), and also that \(-\sin θ=\sin(-θ)\) and \(\cos θ=\cos(-θ)\).</p>
</div>
<div class="paragraph">
<p>This shows that \(-q\) is a rotation along axis \(\vec{v}\) of angle \(-(2π-θ)\), which is exactly the same rotation as \(q\). For example, if \(θ=90^{\circ}\) then \(-(2π-θ)=-270^{\circ}\): rotating \(90^{\circ}\) along axis \(\vec{v}\) is the same as rotating \(270^{\circ}\) in the opposite direction along the same axis.</p>
</div>
<div class="paragraph">
<p>The fact that \(q\) and \(-q\) represent the same rotation is usually called double cover. However, in our calculations I don’t want you to simply treat \(q\) and \(-q\) as the same. They are different in quaternion space, even though they map to the same 3D rotation, so the negative sign in the flip composition needs to be there.</p>
</div>
<div class="paragraph">
<p>The order of the product in \(q=-{q_b}{q_a}\) is important: it means flipping along \(\vec{a}\) first and then along \(\vec{b}\). In general, a product of unit quaternions needs to be “read” from right to left when we think about the order in which the rotations are applied.</p>
</div>
<hr>
<div class="sidebarblock">
<div class="content">
<div class="title">Side Note</div>
<div class="paragraph">
<p>We can however get rid of the negative sign by choosing \(\vec{a}\) and \(\vec{b}\) differently.</p>
</div>
<div class="paragraph">
<p><em>Any 3D rotation \(q=(\sin⁡\frac{θ}{2}\vec{v},\cos⁡\frac{θ}{2})\) can be represented as a sequence of 2 flips \(q_a=(\vec{a},0)\) and \(q_b=(\vec{b},0)\), such that
\(q={q_b}{q_a}\)
where \(\vec{a}·\vec{v}=0\), \(\vec{b}·\vec{v}=0\) and the angle from \(\vec{a}\) to \(\vec{b}\): \(&lt;\vec{a},\vec{b}&gt;=\frac{θ}{2}-π\).</em></p>
</div>
<div class="paragraph">
<p>It becomes harder to visualize \(\vec{a}\) and \(\vec{b}\) if we go this way, and the negative sign does not really introduce a lot of difficulties, so we will stick with that negative sign in this article.</p>
</div>
</div>
</div>
<hr>
</div>
<div class="sect2">
<h3 id="_flip_vector">Flip Vector</h3>
<div class="paragraph">
<p>Given a flip \(q_a=(\vec{a},0)\) and a vector \(\vec{v}\), we are ready to calculate \(\vec{v'}\), the result of flipping \(\vec{v}\).</p>
</div>
<div class="imageblock" style="text-align: center">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/quaternions/fig3.png" alt="fig3.png" width="400">
</div>
<div class="title">Figure 3</div>
</div>
<div class="paragraph">
<p>By the definition of a flip, \(\vec{v}\), \(\vec{a}\) and \(\vec{v'}\) lie in the same plane, and the angles are equal: \(&lt;\vec{v},\vec{a}&gt;=&lt;\vec{a},\vec{v'}&gt;\).</p>
</div>
<div class="paragraph">
<p>Now treat \(\vec{v}\) and \(\vec{v'}\) as the axes of flips \(q_v=(\vec{v},0)\) and \(q_v'=(\vec{v'},0)\) (assuming \(\vec{v}\) is a unit vector, so \(\vec{v'}\) is one too). By our flip composition rule, flipping along axis \(\vec{v}\) and then \(\vec{a}\) should give us the same rotation as flipping along axis \(\vec{a}\) and then \(\vec{v'}\).</p>
</div>
<div class="paragraph">
<p>We can actually calculate the resulting rotation. Let \(&lt;\vec{v},\vec{a}&gt;=&lt;\vec{a},\vec{v'}&gt;=\frac{θ}{2}\) and \(\vec{u}=\frac{\vec{v}×\vec{a}}{\left|\vec{v}×\vec{a}\right|}=\frac{\vec{a}×\vec{v'}}{\left|\vec{a}×\vec{v'}\right|}\). Then the resulting rotation has angle \(θ\) along axis \(\vec{u}\).</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
q&amp;=(\sin\frac{θ}{2}\vec{u},\cos\frac{θ}{2})\\
&amp;=-{q_a}{q_v}\\
&amp;=-{q_v'}{q_a}
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>This gives \({q_v'}{q_a}={q_a}{q_v}\).</p>
</div>
<div class="paragraph">
<p>(Here \(\left|\vec{v}×\vec{a}\right|=\left|\vec{a}×\vec{v'}\right|=\sin\frac{θ}{2}\). If you are not sure what’s going on here, go back to <a href="#_flip_composition">Flip Composition</a> and read the proof.)</p>
</div>
<div class="paragraph">
<p>Now we need to introduce the inverse of a quaternion. The inverse of \(q\) is denoted as \(q^{-1}\), such that \(qq^{-1}=q^{-1}q=(\vec{0},1)\).</p>
</div>
<div class="paragraph">
<p>\(I=(\vec{0},1)\) is called the identity quaternion, and it means no rotation at all. You can think of it as \(I=(\sin 0\,\vec{v},\cos 0)\), a rotation of \(0^{\circ}\) along any axis \(\vec{v}\). We haven’t gone into quaternion multiplication or rotation composition yet, but it’s not hard to see that \(qI=Iq=q\) for any quaternion \(q\).</p>
</div>
<div class="paragraph">
<p>For unit quaternions, the idea of the inverse is that if you apply a rotation and then its inverse, the result is no rotation at all. The same holds if you apply the inverse first and then the original rotation.</p>
</div>
<div class="paragraph">
<p>For any unit quaternion \(q=(\sin\frac{θ}{2}\vec{v},\cos\frac{θ}{2})\), the inverse is \(q^{-1}=(-\sin\frac{θ}{2}\vec{v},\cos\frac{θ}{2})\). You can understand this in two ways: either \(q^{-1}=(\sin\frac{θ}{2}(-\vec{v}),\cos\frac{θ}{2})\) or \(q^{-1}=(\sin\frac{-θ}{2}\vec{v},\cos\frac{-θ}{2})\). That is, \(q^{-1}\) is either a rotation of angle \(θ\) along axis \(-\vec{v}\), or a rotation of angle \(-θ\) along axis \(\vec{v}\). Either way, it cancels out the original rotation.</p>
</div>
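<div class="paragraph">
<p>Both readings of the inverse can be checked with a few lines of C++. This is a minimal sketch, assuming \((x,y,z,w)\) storage, angles in radians and a unit axis; <code>fromAxisAngle</code> and <code>inverseUnit</code> are hypothetical names. The inverse is formed by negating the vector part, which only works for unit quaternions.</p>
</div>

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };
struct Quat { double x, y, z, w; };  // vector part (x, y, z), scalar part w

// (v1,w1)(v2,w2) = (w1 v2 + w2 v1 + v1 x v2, w1 w2 - v1 . v2)
Quat mul(Quat a, Quat b) {
    return {a.w * b.x + b.w * a.x + (a.y * b.z - a.z * b.y),
            a.w * b.y + b.w * a.y + (a.z * b.x - a.x * b.z),
            a.w * b.z + b.w * a.z + (a.x * b.y - a.y * b.x),
            a.w * b.w - (a.x * b.x + a.y * b.y + a.z * b.z)};
}

// Unit quaternion (sin(θ/2) v, cos(θ/2)) for angle theta along unit axis v.
Quat fromAxisAngle(Vec3 v, double theta) {
    const double s = std::sin(theta / 2.0);
    return {s * v.x, s * v.y, s * v.z, std::cos(theta / 2.0)};
}

// Inverse of a unit quaternion: negate the vector part, keep w.
Quat inverseUnit(Quat q) { return {-q.x, -q.y, -q.z, q.w}; }
```

<div class="paragraph">
<p>For instance, <code>inverseUnit(fromAxisAngle(v, t))</code> matches both <code>fromAxisAngle(-v, t)</code> and <code>fromAxisAngle(v, -t)</code>, and multiplying it by the original quaternion on either side gives \((\vec{0},1)\) up to rounding.</p>
</div>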
<div class="paragraph">
<p>Here is a quick proof for the case of a flip; you can try extending it to a general unit quaternion. If \(q_a=(\vec{a},0)\) and \(q_a^{-1}=(-\vec{a},0)\), we have</p>
</div>
<div class="stemblock">
<div class="content">
\[{q_a}{q_a^{-1}}=(\vec{a}×(-\vec{a}),-(\vec{a}·(-\vec{a})))=(\vec{0},1)\]
</div>
</div>
<div class="paragraph">
<p>(Make sure you understand the difference between \(q^{-1}\) and \(-q\). If you are not sure, reread the part about quaternion negation in <a href="#_flip_composition">Flip Composition</a>.)</p>
</div>
<div class="paragraph">
<p>Now go back to our earlier result for flipping a vector, \({q_v'}{q_a}={q_a}{q_v}\). Multiplying both sides on the right by the inverse flip \(q_a^{-1}\), the equation becomes</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
{q_v'}{q_a}{q_a^{-1}}&amp;={q_a}{q_v}{q_a^{-1}}\\
q_v'&amp;={q_a}{q_v}{q_a^{-1}}
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>This gives us a way to calculate the result of a flip. Since we only need the vector part of the result, we can write this as</p>
</div>
<div class="stemblock">
<div class="content">
\[\vec{v'}={q_a}\vec{v}{q_a^{-1}}\]
</div>
</div>
<div class="paragraph">
<p>When we put a vector \(\vec{v}\) into a quaternion multiplication, we are implicitly treating it as the axis of a flip and stuffing it into the quaternion \((\vec{v},0)\). This is where the “sandwich” multiplication form comes from, although so far we have only shown it for flips. In the next section we will prove that the same result holds for any rotation.</p>
</div>
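<div class="paragraph">
<p>The flip formula can be written as a short C++ sketch, assuming \((x,y,z,w)\) storage and a unit axis \(\vec{a}\); <code>flipVector</code> is a hypothetical name. It evaluates \({q_a}\vec{v}{q_a^{-1}}\) literally. Expanding the products by hand gives \(2(\vec{a}·\vec{v})\vec{a}-\vec{v}\): the component of \(\vec{v}\) along \(\vec{a}\) is kept and the perpendicular component is negated, which is exactly a \(180^{\circ}\) rotation along \(\vec{a}\).</p>
</div>

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };
struct Quat { double x, y, z, w; };  // vector part (x, y, z), scalar part w

// (v1,w1)(v2,w2) = (w1 v2 + w2 v1 + v1 x v2, w1 w2 - v1 . v2)
Quat mul(Quat a, Quat b) {
    return {a.w * b.x + b.w * a.x + (a.y * b.z - a.z * b.y),
            a.w * b.y + b.w * a.y + (a.z * b.x - a.x * b.z),
            a.w * b.z + b.w * a.z + (a.x * b.y - a.y * b.x),
            a.w * b.w - (a.x * b.x + a.y * b.y + a.z * b.z)};
}

// Flip v along unit axis a with the sandwich product q_a (v,0) q_a^{-1}.
Vec3 flipVector(Vec3 a, Vec3 v) {
    const Quat qa{a.x, a.y, a.z, 0.0};
    const Quat qaInv{-a.x, -a.y, -a.z, 0.0};  // inverse of the flip (a, 0)
    const Quat r = mul(mul(qa, Quat{v.x, v.y, v.z, 0.0}), qaInv);
    return {r.x, r.y, r.z};  // the scalar part r.w is 0
}
```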
</div>
<div class="sect2">
<h3 id="_rotate_vector">Rotate Vector</h3>
<div class="paragraph">
<p>We know any 3D rotation \(q\) can be broken down into 2 flips \(q= -{q_b}{q_a}\), which means flipping along \(\vec{a}\) first and then \(\vec{b}\). So for a vector \(\vec{v}\), we apply the first flip and get</p>
</div>
<div class="stemblock">
<div class="content">
\[\vec{v'}={q_a}\vec{v}{q_a^{-1}}\]
</div>
</div>
<div class="paragraph">
<p>Then we apply the second flip to \(\vec{v'}\) and get</p>
</div>
<div class="stemblock">
<div class="content">
\[\vec{v''}={q_b}\vec{v'}{q_b^{-1}}\]
</div>
</div>
<div class="paragraph">
<p>So the final result is</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
\vec{v''}&amp;={q_b}{q_a}\vec{v}{q_a^{-1}}{q_b^{-1}}\\
&amp;=({q_b}{q_a})\vec{v}({q_b}{q_a})^{-1}\\
&amp;=(-q)\vec{v}(-q^{-1})\\
&amp;=q\vec{v}q^{-1}\\
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Here you can see why \(q= -{q_b}{q_a}\) needs to be in this order.</p>
</div>
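<div class="paragraph">
<p>To see the two-flip argument in numbers, this C++ sketch (hypothetical helper names, \((x,y,z,w)\) storage and unit axes assumed) flips a vector along \(\vec{a}\) and then \(\vec{b}\) with two sandwich products, and compares the result with a single sandwich by \(q=-{q_b}{q_a}\). It also rotates by \(-q\), which, per the double cover, should give the same vector.</p>
</div>

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };
struct Quat { double x, y, z, w; };  // vector part (x, y, z), scalar part w

// (v1,w1)(v2,w2) = (w1 v2 + w2 v1 + v1 x v2, w1 w2 - v1 . v2)
Quat mul(Quat a, Quat b) {
    return {a.w * b.x + b.w * a.x + (a.y * b.z - a.z * b.y),
            a.w * b.y + b.w * a.y + (a.z * b.x - a.x * b.z),
            a.w * b.z + b.w * a.z + (a.x * b.y - a.y * b.x),
            a.w * b.w - (a.x * b.x + a.y * b.y + a.z * b.z)};
}

Quat neg(Quat q) { return {-q.x, -q.y, -q.z, -q.w}; }

// Inverse of a unit quaternion (this covers flips (a, 0) with |a| = 1).
Quat inverseUnit(Quat q) { return {-q.x, -q.y, -q.z, q.w}; }

// Sandwich product q (v,0) q^{-1}, keeping only the vector part.
Vec3 sandwich(Quat q, Vec3 v) {
    const Quat r = mul(mul(q, Quat{v.x, v.y, v.z, 0.0}), inverseUnit(q));
    return {r.x, r.y, r.z};
}
```

<div class="paragraph">
<p>With flips <code>qa</code> and <code>qb</code>, <code>sandwich(qb, sandwich(qa, v))</code>, <code>sandwich(neg(mul(qb, qa)), v)</code> and <code>sandwich(mul(qb, qa), v)</code> all agree up to rounding, while <code>sandwich(neg(mul(qa, qb)), v)</code> generally does not.</p>
</div>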
<div class="paragraph">
<p>One thing we still need to prove is \({q_a^{-1}}{q_b^{-1}}=({q_b}{q_a})^{-1}\), which we used in the second step:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
{q_a^{-1}}{q_b^{-1}}&amp;=(-\vec{a},0)(-\vec{b},0)\\
&amp;=((-\vec{a})×(-\vec{b}),-(-\vec{a})·(-\vec{b}))\\
&amp;=(\vec{a}×\vec{b},-\vec{a}·\vec{b})\\
&amp;=(-\vec{b}×\vec{a},-\vec{b}·\vec{a})\\
&amp;=({q_b}{q_a})^{-1}
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>At this point, we have fully explained how to rotate a vector using a quaternion.</p>
</div>
</div>
<div class="sect2">
<h3 id="_rotation_composition">Rotation Composition</h3>
<div class="paragraph">
<p>Given rotations \(q_1\) and \(q_2\), if we rotate a vector \(\vec{v}\) by \(q_1\) first and then by \(q_2\), the formula from the previous section gives</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
\vec{v'}&amp;={q_1}\vec{v}{q_1^{-1}}\\
\vec{v''}&amp;={q_2}\vec{v'}{q_2^{-1}}\\
&amp;={q_2}{q_1}\vec{v}{q_1^{-1}}{q_2^{-1}}\\
&amp;=({q_2}{q_1})\vec{v}({q_2}{q_1})^{-1}\\
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>This is the same as applying the combined rotation \(q={q_2}{q_1}\). Be careful about the multiplication order.</p>
</div>
<div class="paragraph">
<p>Again we need to prove \({q_1^{-1}}{q_2^{-1}}=({q_2}{q_1})^{-1}\), but we will do this later. This equation is actually very easy to understand in geometric terms. The combined rotation \(q={q_2}{q_1}\) applies \(q_1\) first and then \(q_2\). To undo it, which means applying the inverse \(q^{-1}=({q_2}{q_1})^{-1}\), we need to undo \(q_2\) first and then undo \(q_1\), which is effectively \(q_1^{-1}q_2^{-1}\).</p>
</div>
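<div class="paragraph">
<p>Both statements can be checked numerically with the following C++ sketch: rotating by \(q_1\) and then \(q_2\) matches a single rotation by \({q_2}{q_1}\), and \({q_1^{-1}}{q_2^{-1}}\) equals the inverse of \({q_2}{q_1}\). Helper names are hypothetical, and quaternions are assumed to be unit length and stored as \((x,y,z,w)\).</p>
</div>

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };
struct Quat { double x, y, z, w; };  // vector part (x, y, z), scalar part w

// (v1,w1)(v2,w2) = (w1 v2 + w2 v1 + v1 x v2, w1 w2 - v1 . v2)
Quat mul(Quat a, Quat b) {
    return {a.w * b.x + b.w * a.x + (a.y * b.z - a.z * b.y),
            a.w * b.y + b.w * a.y + (a.z * b.x - a.x * b.z),
            a.w * b.z + b.w * a.z + (a.x * b.y - a.y * b.x),
            a.w * b.w - (a.x * b.x + a.y * b.y + a.z * b.z)};
}

// Unit quaternion for angle theta (radians) along unit axis v.
Quat fromAxisAngle(Vec3 v, double theta) {
    const double s = std::sin(theta / 2.0);
    return {s * v.x, s * v.y, s * v.z, std::cos(theta / 2.0)};
}

// Inverse of a unit quaternion.
Quat inverseUnit(Quat q) { return {-q.x, -q.y, -q.z, q.w}; }

// Rotate v by unit quaternion q: vector part of q (v,0) q^{-1}.
Vec3 sandwich(Quat q, Vec3 v) {
    const Quat r = mul(mul(q, Quat{v.x, v.y, v.z, 0.0}), inverseUnit(q));
    return {r.x, r.y, r.z};
}
```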
<div class="paragraph">
<p>What does it really mean to combine 2 rotations? Can we visualize the rotation axis and angle of the result? By converting rotations to flips, we actually can.</p>
</div>
<div class="paragraph">
<p>Let \(q_1=(\sin\frac{θ_1}{2}\vec{v_1},\cos\frac{θ_1}{2})\) and \(q_2=(\sin\frac{θ_2}{2}\vec{v_2},\cos\frac{θ_2}{2})\). We need to choose a special flip breakdown such that they share one flip: \(q_1=-{q_c}{q_a}\), \(q_2=-{q_b}{q_c}\).</p>
</div>
<div class="paragraph">
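<p>Can we find such a breakdown? Remember that the rule of flip composition requires the flip axes to be perpendicular to the rotation axis. So we need \(\vec{c}·\vec{v_1}=0\) and \(\vec{c}·\vec{v_2}=0\), and we can choose \(\vec{c}=\frac{\vec{v_1}×\vec{v_2}}{\left|\vec{v_1}×\vec{v_2}\right|}\). (This requires \(\vec{v_1}\) and \(\vec{v_2}\) not to be parallel; if they are, any \(\vec{c}\) perpendicular to \(\vec{v_1}\) works.)</p>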
</div>
<div class="paragraph">
<p>Based on \(\vec{c}\) we can find out the other two axes: rotate \(\vec{c}\) along axis \(\vec{v_1}\) by angle \(-\frac{θ_1}{2}\) results in \(\vec{a}\); rotate \(\vec{c}\) along axis \(\vec{v_2}\) by angle \(\frac{θ_2}{2}\) results in \(\vec{b}\). This process is demonstrated in Figure 4.</p>
</div>
<div class="paragraph">
<p>Now we have \(\vec{a}·\vec{v_1}=0\), \(\vec{c}·\vec{v_1}=0\), \(&lt;\vec{a},\vec{c}&gt;=\frac{θ_1}{2}\) and \(\vec{c}·\vec{v_2}=0\), \(\vec{b}·\vec{v_2}=0\), \(&lt;\vec{c},\vec{b}&gt;=\frac{θ_2}{2}\), so our breakdown \(q_1=-{q_c}{q_a}\), \(q_2=-{q_b}{q_c}\) is valid. The combined rotation can be written as</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
q&amp;={q_2}{q_1}\\
&amp;=(-{q_b}{q_c})(-{q_c}{q_a})\\
&amp;={q_b}({q_c}{q_c}){q_a}\\
&amp;=-{q_b}{q_a}\\
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Here we used the fact that</p>
</div>
<div class="stemblock">
<div class="content">
\[{q_c}{q_c}=(\vec{c},0)(\vec{c},0)=(\vec{c}×\vec{c},-(\vec{c}·\vec{c}))=(\vec{0},-1)=-I\]
</div>
</div>
<div class="paragraph">
<p>This shows that the combined rotation is composed of the flips \(q_a\) and \(q_b\), which tells us that it is a rotation of angle \(2&lt;\vec{a},\vec{b}&gt;\) along axis \(\vec{u}=\frac{\vec{a}×\vec{b}}{\left|\vec{a}×\vec{b}\right|}\).</p>
</div>
<div class="imageblock" style="text-align: center">
<div class="content">
<img src="https://github.com/lxjk/lxjk.github.io/raw/master/images/quaternions/fig4.png" alt="fig4.png" width="400">
</div>
<div class="title">Figure 4</div>
</div>
<div class="paragraph">
<p>In Figure 4, the blue plane is spanned by \(\vec{v_1}\) and \(\vec{v_2}\), and \(\vec{c}\) is perpendicular to that plane.
The orange plane is spanned by \(\vec{a}\) and \(\vec{b}\), and the resulting rotation axis \(\vec{u}\) is perpendicular to that plane.</p>
</div>
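<div class="paragraph">
<p>The construction can be followed step by step in code. This C++ sketch assumes \((x,y,z,w)\) storage, unit axes, angles in radians and non-parallel \(\vec{v_1}\), \(\vec{v_2}\); <code>combineViaFlips</code> and the other helpers are hypothetical names. It computes \(\vec{c}\), rotates it to obtain \(\vec{a}\) and \(\vec{b}\), and returns \(-{q_b}{q_a}\), which should equal \({q_2}{q_1}\).</p>
</div>

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };
struct Quat { double x, y, z, w; };  // vector part (x, y, z), scalar part w

// (v1,w1)(v2,w2) = (w1 v2 + w2 v1 + v1 x v2, w1 w2 - v1 . v2)
Quat mul(Quat a, Quat b) {
    return {a.w * b.x + b.w * a.x + (a.y * b.z - a.z * b.y),
            a.w * b.y + b.w * a.y + (a.z * b.x - a.x * b.z),
            a.w * b.z + b.w * a.z + (a.x * b.y - a.y * b.x),
            a.w * b.w - (a.x * b.x + a.y * b.y + a.z * b.z)};
}

Quat fromAxisAngle(Vec3 v, double theta) {
    const double s = std::sin(theta / 2.0);
    return {s * v.x, s * v.y, s * v.z, std::cos(theta / 2.0)};
}

// Rotate p by unit quaternion q: vector part of q (p,0) q^{-1}.
Vec3 rotate(Quat q, Vec3 p) {
    const Quat r = mul(mul(q, Quat{p.x, p.y, p.z, 0.0}), Quat{-q.x, -q.y, -q.z, q.w});
    return {r.x, r.y, r.z};
}

// Combine q1 (theta1 along v1) and q2 (theta2 along v2) through a shared flip c.
Quat combineViaFlips(Vec3 v1, double theta1, Vec3 v2, double theta2) {
    Vec3 c{v1.y * v2.z - v1.z * v2.y, v1.z * v2.x - v1.x * v2.z, v1.x * v2.y - v1.y * v2.x};
    const double len = std::sqrt(c.x * c.x + c.y * c.y + c.z * c.z);  // non-zero if not parallel
    c = {c.x / len, c.y / len, c.z / len};
    const Vec3 a = rotate(fromAxisAngle(v1, -theta1 / 2.0), c);  // so that q1 = -q_c q_a
    const Vec3 b = rotate(fromAxisAngle(v2, theta2 / 2.0), c);   // so that q2 = -q_b q_c
    const Quat p = mul(Quat{b.x, b.y, b.z, 0.0}, Quat{a.x, a.y, a.z, 0.0});
    return {-p.x, -p.y, -p.z, -p.w};                              // q = -q_b q_a
}
```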
<div class="paragraph">
<p>With the same method, let’s prove the thing we left out:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
{q_1^{-1}}{q_2^{-1}}&amp;=(-{q_c}{q_a})^{-1}(-{q_b}{q_c})^{-1}\\
&amp;={q_a^{-1}}{q_c^{-1}}{q_c^{-1}}{q_b^{-1}}\\
&amp;=-{q_a^{-1}}{q_b^{-1}}\\
&amp;=(-{q_b}{q_a})^{-1}\\
&amp;=({q_b}{q_c}{q_c}{q_a})^{-1}\\
&amp;=({q_2}{q_1})^{-1}\\
\end{align*}\]
</div>
</div>
</div>
<div class="sect2">
<h3 id="_summary_of_part_1">Summary of Part 1</h3>
<div class="paragraph">
<p>In Part 1, we covered the definition of quaternion \(q=(x,y,z,w)\), the vector form of quaternion \(q=(\vec{v},w)\), unit quaternion \(q=(\sin⁡\frac{θ}{2}\vec{v},\cos⁡\frac{θ}{2})\) and how it is used to represent a rotation.</p>
</div>
<div class="paragraph">
<p>We also talked about the negation of a quaternion \(-q\) and its double-cover property, the inverse of a quaternion \(q^{-1}\), and the identity quaternion \(I=(\vec{0},1)\).</p>
</div>
<div class="paragraph">
<p>We used quaternions to represent a flip \(q_a=(\vec{a},0)\) and derived the rule of flip composition \(q=-{q_b}{q_a}\). Based on this rule, we visualized and proved how a quaternion rotates a vector by \(\vec{v'}=q\vec{v}q^{-1}\) and how rotations are composed by \(q={q_2}{q_1}\).</p>
</div>
<div class="paragraph">
<p>We briefly touched on quaternion multiplication, and we proved an important equation, \({q_1^{-1}}{q_2^{-1}}=({q_2}{q_1})^{-1}\).</p>
</div>
<div class="paragraph">
<p>Hopefully you now have a clear way to think in quaternions before we head to the application part. Although I’m not going to discuss the algebraic structure of quaternions, studying it definitely helps deepen your understanding. If you are interested, the quaternion <a href="https://en.wikipedia.org/wiki/Quaternion#Definition">Wiki page</a> is a good resource.</p>
</div>
<div class="paragraph">
<p>Wikipedia also provides a good way to visualize a quaternion in 4D <a href="https://en.wikipedia.org/wiki/Quaternions_and_spatial_rotation#Quaternion_rotation_operations">here</a>. Our idea of breaking rotations down into 2 flips essentially means that all unit quaternions in 4D space can be generated by the unit quaternions in the 3D subspace \(\left\{q.w=0\right\}\), and every unit quaternion in this subspace is a flip.</p>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_part_2_application">Part 2. Application</h2>
<div class="sectionbody">
<div class="paragraph">
<p>In Part 2 we will talk about using quaternions to solve real problems in programming. I will use the general vector form \(q=(\vec{v},w)\) even for unit quaternions, instead of \(q=(\sin\frac{θ}{2}\vec{v},\cos\frac{θ}{2})\), since it is closer to the actual data format.</p>
</div>
<div class="paragraph">
<p>Recall the definition of general quaternion multiplication we mentioned in Part 1. Let \(q_1=(\vec{v_1},w_1)\), \(q_2=(\vec{v_2},w_2)\) then</p>
</div>
<div class="stemblock">
<div class="content">
\[{q_1}{q_2}=(\vec{v_1},w_1)(\vec{v_2},w_2)=(w_1\vec{v_2} + w_2\vec{v_1} + \vec{v_1}×\vec{v_2}, {w_1}{w_2}-\vec{v_1}·\vec{v_2})\]
</div>
</div>
<div class="paragraph">
<p>We will be using this a lot in the following sections.</p>
</div>
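<div class="paragraph">
<p>As a reference point, here is a direct C++ translation of this formula. It is only a sketch: the \((x,y,z,w)\) struct layout and the operator overload are assumptions, not something the article prescribes.</p>
</div>

```cpp
// Quaternion stored as vector part (x, y, z) and scalar part w.
struct Quat { double x, y, z, w; };

// (v1,w1)(v2,w2) = (w1 v2 + w2 v1 + v1 x v2, w1 w2 - v1 . v2)
Quat operator*(Quat a, Quat b) {
    return {
        a.w * b.x + b.w * a.x + (a.y * b.z - a.z * b.y),  // x
        a.w * b.y + b.w * a.y + (a.z * b.x - a.x * b.z),  // y
        a.w * b.z + b.w * a.z + (a.x * b.y - a.y * b.x),  // z
        a.w * b.w - (a.x * b.x + a.y * b.y + a.z * b.z),  // w
    };
}
```

<div class="paragraph">
<p>The product is not commutative because of the cross product term: with \(i=((1,0,0),0)\) and \(j=((0,1,0),0)\), <code>i * j</code> gives \(((0,0,1),0)\) while <code>j * i</code> gives \(((0,0,-1),0)\), and <code>i * i</code> gives \(((0,0,0),-1)\).</p>
</div>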
<div class="paragraph">
<p>The coordinate system we use is Z up and right-handed.</p>
</div>
<div class="sect2">
<h3 id="_calculation_of_vector_rotation">Calculation of Vector Rotation</h3>
<div class="paragraph">
<p>In this section we will derive the formula that most game engines use to rotate a vector with a quaternion. Given a rotation \(q=(\vec{v},w)\) and a vector \(\vec{p}\), the rotation result is</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
\vec{p'}&amp;=q\vec{p}q^{-1}\\
&amp;=(\vec{v},w)(\vec{p},0)(-\vec{v},w)\\
&amp;=(w\vec{p}+\vec{v}×\vec{p},-\vec{v}·\vec{p})(-\vec{v},w)\\
&amp;=((\vec{v}·\vec{p})\vec{v}+w^{2}\vec{p}+2w(\vec{v}×\vec{p})+\vec{v}×(\vec{v}×\vec{p}),0)\\
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Since we only want the vector part</p>
</div>
<div class="stemblock">
<div class="content">
\[\vec{p'}=(\vec{v}·\vec{p})\vec{v}+w^{2}\vec{p}+2w(\vec{v}×\vec{p})+\vec{v}×(\vec{v}×\vec{p})\]
</div>
</div>
<div class="paragraph">
<p>Here we need to use the following equation of cross product to simplify the result</p>
</div>
<div class="stemblock">
<div class="content">
\[\vec{a}×(\vec{b}×\vec{c})=(\vec{a}·\vec{c})\vec{b}-(\vec{a}·\vec{b})\vec{c}\]
</div>
</div>
<div class="paragraph">
<p>So in our case</p>
</div>
<div class="stemblock">
<div class="content">
\[\vec{v}×(\vec{v}×\vec{p})=(\vec{v}·\vec{p})\vec{v}-(\vec{v}·\vec{v})\vec{p}=(\vec{v}·\vec{p})\vec{v}-\left|\vec{v}\right|^{2}\vec{p}\]
</div>
</div>
<div class="paragraph">
<p>Remember that \(q\) is a unit quaternion, so \(\left|\vec{v}\right|^{2}+w^{2}=1\). We have</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
\vec{v}×(\vec{v}×\vec{p})&amp;=(\vec{v}·\vec{p})\vec{v}+w^{2}\vec{p}-\vec{p}\\
(\vec{v}·\vec{p})\vec{v}+w^{2}\vec{p}&amp;=\vec{v}×(\vec{v}×\vec{p})+\vec{p}\\
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Now we can simplify our rotation result to get rid of the dot product</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
\vec{p'}&amp;=\vec{p}+2w(\vec{v}×\vec{p})+2\vec{v}×(\vec{v}×\vec{p})\\
&amp;=\vec{p}+2(\vec{v}×(\vec{v}×\vec{p}+w\vec{p}))
\end{align*}\]
</div>
</div>
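<div class="paragraph">
<p>The final formula needs only two cross products and no full quaternion products. Below is a C++ sketch of it, assuming a unit quaternion stored as \((x,y,z,w)\); <code>rotateFast</code> and <code>rotateSandwich</code> are hypothetical names, and the plain sandwich product is kept only for comparison.</p>
</div>

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };
struct Quat { double x, y, z, w; };  // vector part v = (x, y, z), scalar part w

Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// p' = p + 2 (v x (v x p + w p)), valid for a unit quaternion q = (v, w).
Vec3 rotateFast(Quat q, Vec3 p) {
    const Vec3 v{q.x, q.y, q.z};
    const Vec3 vxp = cross(v, p);
    const Vec3 t = cross(v, Vec3{vxp.x + q.w * p.x, vxp.y + q.w * p.y, vxp.z + q.w * p.z});
    return {p.x + 2.0 * t.x, p.y + 2.0 * t.y, p.z + 2.0 * t.z};
}

// Reference: the full sandwich product q (p,0) q^{-1}.
Quat mul(Quat a, Quat b) {
    return {a.w * b.x + b.w * a.x + (a.y * b.z - a.z * b.y),
            a.w * b.y + b.w * a.y + (a.z * b.x - a.x * b.z),
            a.w * b.z + b.w * a.z + (a.x * b.y - a.y * b.x),
            a.w * b.w - (a.x * b.x + a.y * b.y + a.z * b.z)};
}

Vec3 rotateSandwich(Quat q, Vec3 p) {
    const Quat r = mul(mul(q, Quat{p.x, p.y, p.z, 0.0}), Quat{-q.x, -q.y, -q.z, q.w});
    return {r.x, r.y, r.z};
}
```

<div class="paragraph">
<p>For example, with \(q=((0,0,\sqrt{0.5}),\sqrt{0.5})\) (a \(90^{\circ}\) rotation along Z), <code>rotateFast(q, {1, 0, 0})</code> returns approximately \((0,1,0)\). Both functions assume \(q\) is normalized; if it has drifted from unit length, the results will be off.</p>
</div>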
</div>
<div class="sect2">
<h3 id="_world_rotation_and_local_rotation">World Rotation and Local Rotation</h3>
<div class="paragraph">
<p>Let’s look at rotation composition again. The combined rotation \(q={q_2}{q_1}\) means rotating by \(q_1\) first and then by \(q_2\). This right-to-left order only holds when \(q_2\) is a world rotation, in other words, when the rotation axis \(\vec{v_2}\) of \(q_2\) is in world space. What if \(q_2\) is a local rotation, meaning its rotation axis \(\vec{v_2}\) is in the local space after the \(q_1\) rotation?</p>
</div>
<div class="paragraph">
<p>As an example of a local rotation, imagine yourself lying on the ground facing up, and then flipping over to face the ground. What you just did is a \(180^{\circ}\) local rotation along the Z axis. The rotation axis is not the world Z axis (which points up) but your local Z axis.</p>
</div>
<div class="paragraph">
<p>Suppose we have an object with rotation \({q_1}=(\vec{v_1},{w_1})\), and we want to apply a local rotation \({q_{2L}}=(\vec{v_2},{w_2})\). We can convert the local rotation \(q_{2L}\) to a world rotation \(q_{2W}\) by converting its rotation axis into world space. Since \(\vec{v_2}\) is in the local space of \(q_1\), converting it into world space means rotating it by \(q_1\), so the world space rotation axis is \(\vec{v_{2W}}={q_1}\vec{v_2}{q_1}^{-1}\).</p>
</div>
<div class="paragraph">
<p>(Technically the rotation axis is \(\frac{\vec{v_2}}{\left|\vec{v_2}\right|}\), but since the rotation angle is the same in local and world space, \(\left|\vec{v_2}\right|=\left|\vec{v_{2W}}\right|=\sin\frac{θ}{2}\), so we can just use \(\vec{v_2}\) in the calculation.)</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
{q_{2W}}&amp;=(\vec{v_{2W}},{w_2})\\
&amp;=(\vec{v_{2W}},0)+(\vec{0},{w_2})\\
&amp;={q_1}(\vec{v_2},0){q_1}^{-1}+{q_1}(\vec{0},{w_2}){q_1}^{-1}\\
&amp;={q_1}(\vec{v_2},{w_2}){q_1}^{-1}\\
&amp;={q_1}{q_{2L}}{q_1}^{-1}\\
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>This equation tells us that to convert a local rotation to a world rotation, we can use the same “sandwich” multiplication as for rotating a vector: \({q_{2W}}={q_1}{q_{2L}}{q_1}^{-1}\). It also makes sense in geometric terms: if we undo \(q_1\), local space and world space become the same, so we can apply \(q_{2L}\) and then apply \(q_1\) again to get the world rotation we want.</p>
</div>
<div class="paragraph">
<p>One step here still needs a proof: the sandwich product leaves the scalar part \((\vec{0},{w_2})\) unchanged.</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
{q_1}(\vec{0},{w_2}){q_1}^{-1}&amp;=(\vec{v_1},{w_1})(\vec{0},{w_2})(-\vec{v_1},{w_1})\\
&amp;=({w_2}\vec{v_1},{w_1}{w_2})(-\vec{v_1},{w_1})\\
&amp;=(\vec{0},{w_2}(\left|\vec{v_1}\right|^{2}+{w_1}^{2}))\\
&amp;=(\vec{0},{w_2})\\
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>With the world rotation \(q_{2W}\) calculated, the combined rotation is</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
q&amp;={q_{2W}}{q_1}\\
&amp;={q_1}{q_{2L}}{q_1}^{-1}{q_1}\\
&amp;={q_1}{q_{2L}}\\
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>This means that when we rotate by \(q_1\) and then by \(q_2\): if \(q_2\) is in world space, the combined rotation is \(q={q_2}{q_1}\) (right to left); if \(q_2\) is in the local space of \(q_1\), the combined rotation is \(q={q_1}{q_2}\) (left to right).</p>
</div>
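<div class="paragraph">
<p>In code the difference is only the operand order. This C++ sketch assumes \((x,y,z,w)\) storage and angles in radians; <code>applyWorld</code> and <code>applyLocal</code> are hypothetical names. As a check, take an object yawed \(90^{\circ}\) along world Z: its local X axis then points along world Y, so a local rotation along X should equal the same world rotation along Y.</p>
</div>

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };
struct Quat { double x, y, z, w; };  // vector part (x, y, z), scalar part w

// (v1,w1)(v2,w2) = (w1 v2 + w2 v1 + v1 x v2, w1 w2 - v1 . v2)
Quat mul(Quat a, Quat b) {
    return {a.w * b.x + b.w * a.x + (a.y * b.z - a.z * b.y),
            a.w * b.y + b.w * a.y + (a.z * b.x - a.x * b.z),
            a.w * b.z + b.w * a.z + (a.x * b.y - a.y * b.x),
            a.w * b.w - (a.x * b.x + a.y * b.y + a.z * b.z)};
}

Quat fromAxisAngle(Vec3 v, double theta) {
    const double s = std::sin(theta / 2.0);
    return {s * v.x, s * v.y, s * v.z, std::cos(theta / 2.0)};
}

// Apply rotation r to an object whose current rotation is q.
Quat applyWorld(Quat q, Quat r) { return mul(r, q); }  // r's axis is in world space
Quat applyLocal(Quat q, Quat r) { return mul(q, r); }  // r's axis is in q's local space
```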
</div>
<div class="sect2">
<h3 id="_rotation_along_x_y_z_axis">Rotation along X/Y/Z Axis</h3>
<div class="paragraph">
<p>We can now go back to the problem I mentioned at the very beginning: we need an object to face the opposite direction. More precisely, we have an object with rotation \(q=((x,y,z),w)\), and we want to flip it along its local Z axis, that is, rotate it \(180^{\circ}\) along its local Z axis. This extra rotation is \(q'=((0,0,\sin\frac{180^{\circ}}{2}),\cos\frac{180^{\circ}}{2})=((0,0,1),0)\). Based on the local rotation composition we proved in the previous section, the result is</p>
</div>
<div class="stemblock">
<div class="content">
\[q_Z=qq'=((x,y,z),w)((0,0,1),0)=((y,-x,w),-z)\]
</div>
</div>
<div class="paragraph">
<p>If we generalize the angle to \(θ\), so that \(q'=((0,0,\sin\frac{θ}{2}),\cos\frac{θ}{2})\), the result is:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
q_{(Z,θ)}=qq'&amp;=((x,y,z),w)((0,0,\sin\frac{θ}{2}),\cos\frac{θ}{2})\\
&amp;=((x,y,z),w)(((0,0,0),1)\cos\frac{θ}{2}+((0,0,1),0)\sin\frac{θ}{2})\\
&amp;=((x,y,z),w)\cos\frac{θ}{2}+((y,-x,w),-z)\sin\frac{θ}{2}\\
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>If we want to flip along world Z axis instead, we just need to change the multiplication order:</p>
</div>
<div class="stemblock">
<div class="content">
\[q_Z=q'q=((0,0,1),0)((x,y,z),w)=((-y,x,w),-z)\]
</div>
</div>
<div class="paragraph">
<p>We can use the same method to generalize the angle to \(θ\), and let \(q'=((0,0,\sin\frac{θ}{2}),\cos\frac{θ}{2})\),</p>
</div>
<div class="stemblock">
<div class="content">
\[q_{(Z,θ)}=q'q=((x,y,z),w)\cos\frac{θ}{2}+((-y,x,w),-z)\sin\frac{θ}{2}\]
</div>
</div>
<div class="paragraph">
<p>It is easy to extend these results to the X and Y axes. The results are summarized below, with each quaternion written as \((x,y,z,w)\).</p>
</div>
<div class="paragraph">
<p>Flip along local axis:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
q_X&amp;=(w,z,-y,-x)\\
q_Y&amp;=(-z,w,x,-y)\\
q_Z&amp;=(y,-x,w,-z)\\
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Rotate \(θ\) along local axis:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
q_{(X,θ)}=(x,y,z,w)\cos\frac{θ}{2}+(w,z,-y,-x)\sin\frac{θ}{2}\\
q_{(Y,θ)}=(x,y,z,w)\cos\frac{θ}{2}+(-z,w,x,-y)\sin\frac{θ}{2}\\
q_{(Z,θ)}=(x,y,z,w)\cos\frac{θ}{2}+(y,-x,w,-z)\sin\frac{θ}{2}\\
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Flip along world axis:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
q_X&amp;=(w,-z,y,-x)\\
q_Y&amp;=(z,w,-x,-y)\\
q_Z&amp;=(-y,x,w,-z)\\
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Rotate \(θ\) along world axis:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
q_{(X,θ)}=(x,y,z,w)\cos\frac{θ}{2}+(w,-z,y,-x)\sin\frac{θ}{2}\\
q_{(Y,θ)}=(x,y,z,w)\cos\frac{θ}{2}+(z,w,-x,-y)\sin\frac{θ}{2}\\
q_{(Z,θ)}=(x,y,z,w)\cos\frac{θ}{2}+(-y,x,w,-z)\sin\frac{θ}{2}\\
\end{align*}\]
</div>
</div>
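<div class="paragraph">
<p>These shortcuts translate directly into code. Below is a C++ sketch with hypothetical function names, assuming \((x,y,z,w)\) storage and angles in radians. The general product <code>mul</code> is included only as a reference for comparing against \(qq'\) (local) and \(q'q\) (world).</p>
</div>

```cpp
#include <cmath>

struct Quat { double x, y, z, w; };  // vector part (x, y, z), scalar part w

// Reference product, used only to check the shortcuts.
Quat mul(Quat a, Quat b) {
    return {a.w * b.x + b.w * a.x + (a.y * b.z - a.z * b.y),
            a.w * b.y + b.w * a.y + (a.z * b.x - a.x * b.z),
            a.w * b.z + b.w * a.z + (a.x * b.y - a.y * b.x),
            a.w * b.w - (a.x * b.x + a.y * b.y + a.z * b.z)};
}

// q cos(θ/2) + k sin(θ/2), where k is one of the shuffled quaternions above.
static Quat blend(Quat q, Quat k, double theta) {
    const double c = std::cos(theta / 2.0), s = std::sin(theta / 2.0);
    return {q.x * c + k.x * s, q.y * c + k.y * s, q.z * c + k.z * s, q.w * c + k.w * s};
}

// Rotate by theta along the object's local axes (q q').
Quat rotateLocalX(Quat q, double t) { return blend(q, {q.w, q.z, -q.y, -q.x}, t); }
Quat rotateLocalY(Quat q, double t) { return blend(q, {-q.z, q.w, q.x, -q.y}, t); }
Quat rotateLocalZ(Quat q, double t) { return blend(q, {q.y, -q.x, q.w, -q.z}, t); }

// Rotate by theta along the world axes (q' q).
Quat rotateWorldX(Quat q, double t) { return blend(q, {q.w, -q.z, q.y, -q.x}, t); }
Quat rotateWorldY(Quat q, double t) { return blend(q, {q.z, q.w, -q.x, -q.y}, t); }
Quat rotateWorldZ(Quat q, double t) { return blend(q, {-q.y, q.x, q.w, -q.z}, t); }
```

<div class="paragraph">
<p>With \(θ=π\), <code>rotateLocalZ(q, t)</code> reduces to the flip \((y,-x,w,-z)\) up to rounding, which is the “face the opposite direction” case from the beginning of this section.</p>
</div>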
</div>
<div class="sect2">
<h3 id="_euler_angles_to_quaternion">Euler Angles to Quaternion</h3>
<div class="paragraph">
<p>A quaternion is an instruction for a rotation: rotate by angle \(θ\) along axis \(\vec{v}\). Euler angles are a sequence of 3 instructions: rotate by the yaw angle along the world Z axis, then by the pitch angle along the local Y axis, then by the roll angle along the local X axis.</p>
</div>
<div class="paragraph">
<p>It is then natural to see how Euler angles can be converted to a quaternion. If we use \(Y,P,R\) for the yaw, pitch and roll angles, these 3 rotations can be written as quaternions \(q_Y=(0,0,\sin\frac{Y}{2},\cos\frac{Y}{2})\), \(q_P=(0,\sin\frac{P}{2},0,\cos\frac{P}{2})\), \(q_R=(\sin\frac{R}{2},0,0,\cos\frac{R}{2})\). Since pitch and roll are local rotations, the combined rotation is</p>
</div>
<div class="stemblock">
<div class="content">
\[q={q_Y}{q_P}{q_R}=(0,0,\sin\frac{Y}{2},\cos\frac{Y}{2})(0,\sin\frac{P}{2},0,\cos\frac{P}{2})(\sin\frac{R}{2},0,0,\cos\frac{R}{2})\]
</div>
</div>
<div class="paragraph">
<p>Solving this we have the conversion from Euler angles to quaternion.</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
x&amp;=\sin\frac{R}{2}\cos\frac{P}{2}\cos\frac{Y}{2}-\cos\frac{R}{2}\sin\frac{P}{2}\sin\frac{Y}{2}\\
y&amp;=\cos\frac{R}{2}\sin\frac{P}{2}\cos\frac{Y}{2}+\sin\frac{R}{2}\cos\frac{P}{2}\sin\frac{Y}{2}\\
z&amp;=\cos\frac{R}{2}\cos\frac{P}{2}\sin\frac{Y}{2}-\sin\frac{R}{2}\sin\frac{P}{2}\cos\frac{Y}{2}\\
w&amp;=\cos\frac{R}{2}\cos\frac{P}{2}\cos\frac{Y}{2}+\sin\frac{R}{2}\sin\frac{P}{2}\sin\frac{Y}{2}\\
\end{align*}\]
</div>
</div>
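<div class="paragraph">
<p>A C++ sketch of this conversion follows, assuming angles in radians and \((x,y,z,w)\) storage; <code>eulerToQuat</code> is a hypothetical name. The general product is included so the closed form can be compared against \({q_Y}{q_P}{q_R}\) built from single-axis quaternions.</p>
</div>

```cpp
#include <cmath>

struct Quat { double x, y, z, w; };  // vector part (x, y, z), scalar part w

// Yaw along world Z, then pitch along local Y, then roll along local X (radians).
Quat eulerToQuat(double yaw, double pitch, double roll) {
    const double cy = std::cos(yaw / 2.0), sy = std::sin(yaw / 2.0);
    const double cp = std::cos(pitch / 2.0), sp = std::sin(pitch / 2.0);
    const double cr = std::cos(roll / 2.0), sr = std::sin(roll / 2.0);
    return {sr * cp * cy - cr * sp * sy,
            cr * sp * cy + sr * cp * sy,
            cr * cp * sy - sr * sp * cy,
            cr * cp * cy + sr * sp * sy};
}

// General product, used to check the closed form against q_Y q_P q_R.
Quat mul(Quat a, Quat b) {
    return {a.w * b.x + b.w * a.x + (a.y * b.z - a.z * b.y),
            a.w * b.y + b.w * a.y + (a.z * b.x - a.x * b.z),
            a.w * b.z + b.w * a.z + (a.x * b.y - a.y * b.x),
            a.w * b.w - (a.x * b.x + a.y * b.y + a.z * b.z)};
}
```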
<div class="paragraph">
<p>Converting a quaternion to Euler angles, however, is tricky. Rather than trying to obtain the conversion directly, it is easier to convert the quaternion to a rotation matrix first and then convert the rotation matrix to Euler angles. We will talk about this after the next section.</p>
</div>
</div>
<div class="sect2">
<h3 id="_quaternion_and_rotation_matrix">Quaternion and Rotation Matrix</h3>
<div class="paragraph">
<p>If a quaternion is one instruction and Euler angles are 3 instructions, then a rotation matrix stores the rotation result directly. Remember that the rows of the rotation matrix are the X, Y and Z axes after the rotation, which means that given a rotation \(q=(x,y,z,w)\), its corresponding rotation matrix is</p>
</div>
<div class="stemblock">
<div class="content">
\[M=\left[ \begin{array}{} \vec{X'} \\ \vec{Y'} \\ \vec{Z'} \end{array} \right]=\left[ \begin{array}{} q\vec{X}q^{-1} \\ q\vec{Y}q^{-1} \\ q\vec{Z}q^{-1} \end{array}  \right]\]
</div>
</div>
<div class="paragraph">
<p>By calculating the rotation result of the 3 axes, we get the conversion from quaternion to rotation matrix</p>
</div>
<div class="stemblock">
<div class="content">
\[M = \left[ \begin{array}{} 1-2y^{2}-2z^{2} &amp; 2xy+2zw &amp; 2xz-2yw \\ 2xy-2zw &amp; 1-2x^{2}-2z^{2} &amp; 2yz+2xw \\ 2xz+2yw &amp; 2yz-2xw &amp; 1-2x^{2}-2y^{2} \\ \end{array} \right]\]
</div>
</div>
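<div class="paragraph">
<p>In C++ the conversion might look like the sketch below, assuming \((x,y,z,w)\) storage; <code>Mat3</code>, <code>normalized</code> and <code>toMatrix</code> are hypothetical names. Rows hold the rotated axes as in the text, with 0-based indices, so \(M_{ij}\) is <code>m[i-1][j-1]</code>. Because the formula relies on \(q\) being unit length, the sketch also includes a normalization helper.</p>
</div>

```cpp
#include <cmath>

struct Quat { double x, y, z, w; };  // vector part (x, y, z), scalar part w
struct Mat3 { double m[3][3]; };     // row i is the rotated axis X', Y' or Z'

// The conversion assumes a unit quaternion, so renormalize to absorb rounding drift.
Quat normalized(Quat q) {
    const double n = std::sqrt(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w);
    return {q.x / n, q.y / n, q.z / n, q.w / n};
}

Mat3 toMatrix(Quat q) {
    const double x = q.x, y = q.y, z = q.z, w = q.w;
    return {{{1 - 2 * y * y - 2 * z * z, 2 * x * y + 2 * z * w, 2 * x * z - 2 * y * w},
             {2 * x * y - 2 * z * w, 1 - 2 * x * x - 2 * z * z, 2 * y * z + 2 * x * w},
             {2 * x * z + 2 * y * w, 2 * y * z - 2 * x * w, 1 - 2 * x * x - 2 * y * y}}};
}
```

<div class="paragraph">
<p>For a \(90^{\circ}\) rotation along Z, <code>toMatrix(normalized({0, 0, 1, 1}))</code> gives a first row of \((0,1,0)\) and a second row of \((-1,0,0)\): X is rotated onto Y, and Y onto \(-X\).</p>
</div>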
<div class="paragraph">
<p>To convert from rotation matrix to quaternion, we can sum up diagonal elements of the matrix and get</p>
</div>
<div class="stemblock">
<div class="content">
\[M_{11}+M_{22}+M_{33}=3-4x^{2}-4y^{2}-4z^{2}\]
</div>
</div>
<div class="paragraph">
<p>Remember as a unit quaternion \(x^{2}+y^{2}+z^{2}+w^{2}=1\),</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
M_{11}+M_{22}+M_{33}&amp;= 4w^{2}-1\\
w&amp;=\frac{1}{2}\sqrt{M_{11}+M_{22}+M_{33}+1}
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Similarly, we can obtain \(x,y,z\), although the square roots only give their magnitudes, not their signs:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
M_{11}-M_{22}-M_{33}&amp;= 4x^{2}-1\\
M_{22}-M_{33}-M_{11}&amp;= 4y^{2}-1\\
M_{33}-M_{11}-M_{22}&amp;= 4z^{2}-1\\
x&amp;=\frac{1}{2}\sqrt{M_{11}-M_{22}-M_{33}+1}\\
y&amp;=\frac{1}{2}\sqrt{M_{22}-M_{33}-M_{11}+1}\\
z&amp;=\frac{1}{2}\sqrt{M_{33}-M_{11}-M_{22}+1}
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>We can avoid calculating 4 square roots by reusing the element we have already calculated. Say we calculate \(w=\frac{1}{2}\sqrt{M_{11}+M_{22}+M_{33}+1}\) first; then we can get \(x,y,z\) from the off-diagonal elements, which also gives them the correct signs relative to \(w\):</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
x&amp;=\frac{1}{4w}(M_{23}-M_{32})\\
y&amp;=\frac{1}{4w}(M_{31}-M_{13})\\
z&amp;=\frac{1}{4w}(M_{12}-M_{21})
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>You need to be careful if \(w\) is close to 0 (that is, if \(M_{11}+M_{22}+M_{33}+1\) is close to 0, which you can check before taking the square root). In this case you want to calculate one of \(x,y,z\) first instead. You can simply choose the one with the largest absolute value, and calculate the other 3 elements in a similar fashion.</p>
</div>
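<div class="paragraph">
<p>Below is a C++ sketch of this strategy, assuming unit quaternions in \((x,y,z,w)\) order and the row layout used in the text (0-based, so \(M_{ij}\) is <code>m[i-1][j-1]</code>); the names are hypothetical. Instead of testing \(w\) against a threshold, it always picks whichever of \(w,x,y,z\) has the largest magnitude, using the diagonal sums above. Since the four squares add up to 1, that component is at least \(\frac{1}{2}\), so the divisor \(4c\) is at least 2. The off-diagonal formulas for the \(x\), \(y\) and \(z\) branches follow from the same matrix, for example \(M_{12}+M_{21}=4xy\).</p>
</div>

```cpp
#include <cmath>

struct Quat { double x, y, z, w; };  // vector part (x, y, z), scalar part w
struct Mat3 { double m[3][3]; };     // row i holds the rotated axis; M_ij is m[i-1][j-1]

// Quaternion to rotation matrix, as in the text (q must be unit length).
Mat3 toMatrix(Quat q) {
    const double x = q.x, y = q.y, z = q.z, w = q.w;
    return {{{1 - 2 * y * y - 2 * z * z, 2 * x * y + 2 * z * w, 2 * x * z - 2 * y * w},
             {2 * x * y - 2 * z * w, 1 - 2 * x * x - 2 * z * z, 2 * y * z + 2 * x * w},
             {2 * x * z + 2 * y * w, 2 * y * z - 2 * x * w, 1 - 2 * x * x - 2 * y * y}}};
}

// Rotation matrix to quaternion, branching on the largest of |w|, |x|, |y|, |z|.
Quat fromMatrix(Mat3 M) {
    const double m11 = M.m[0][0], m12 = M.m[0][1], m13 = M.m[0][2];
    const double m21 = M.m[1][0], m22 = M.m[1][1], m23 = M.m[1][2];
    const double m31 = M.m[2][0], m32 = M.m[2][1], m33 = M.m[2][2];
    // Each of these equals 4c^2 - 1 for c = w, x, y, z respectively.
    const double fw = m11 + m22 + m33;
    const double fx = m11 - m22 - m33;
    const double fy = m22 - m33 - m11;
    const double fz = m33 - m11 - m22;
    Quat q{};
    if (fw >= fx && fw >= fy && fw >= fz) {
        q.w = 0.5 * std::sqrt(fw + 1.0);
        const double k = 0.25 / q.w;  // 1 / (4w)
        q.x = (m23 - m32) * k;
        q.y = (m31 - m13) * k;
        q.z = (m12 - m21) * k;
    } else if (fx >= fy && fx >= fz) {
        q.x = 0.5 * std::sqrt(fx + 1.0);
        const double k = 0.25 / q.x;  // 1 / (4x)
        q.w = (m23 - m32) * k;
        q.y = (m12 + m21) * k;
        q.z = (m13 + m31) * k;
    } else if (fy >= fz) {
        q.y = 0.5 * std::sqrt(fy + 1.0);
        const double k = 0.25 / q.y;  // 1 / (4y)
        q.w = (m31 - m13) * k;
        q.x = (m12 + m21) * k;
        q.z = (m23 + m32) * k;
    } else {
        q.z = 0.5 * std::sqrt(fz + 1.0);
        const double k = 0.25 / q.z;  // 1 / (4z)
        q.w = (m12 - m21) * k;
        q.x = (m13 + m31) * k;
        q.y = (m23 + m32) * k;
    }
    return q;
}
```

<div class="paragraph">
<p>Because the chosen component always comes out positive, a round trip through the matrix may return \(-q\) instead of \(q\). Per the double cover, both represent the same rotation.</p>
</div>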
</div>
<div class="sect2">
<h3 id="_quaternion_to_euler_angles">Quaternion to Euler Angles</h3>
<div class="paragraph">
<p>Before we try to convert a quaternion to Euler angles, let’s review how Euler angles can be converted to a rotation matrix. Since Euler angles are 3 instructions, they can be viewed as 3 rotation matrices:</p>
</div>
<div class="stemblock">
<div class="content">
\[M_R = \left[ \begin{array}{} 1 &amp; 0 &amp; 0 \\ 0 &amp; \cos{R} &amp; \sin{R} \\ 0 &amp; -\sin{R} &amp; \cos{R} \\ \end{array} \right],
M_P = \left[ \begin{array}{} \cos{P} &amp; 0 &amp; -\sin{P} \\ 0 &amp; 1 &amp; 0 \\ \sin{P} &amp; 0 &amp; \cos{P} \\ \end{array} \right],
M_Y = \left[ \begin{array}{} \cos{Y} &amp; \sin{Y} &amp; 0 \\ -\sin{Y} &amp; \cos{Y} &amp; 0 \\ 0 &amp; 0 &amp; 1 \\ \end{array} \right]\]
</div>
</div>
<div class="paragraph">
<p>The result rotation matrix is</p>
</div>
<div class="stemblock">
<div class="content">
\[M={M_R}{M_P}{M_Y}=\left[ \begin{array}{} \cos{P}\cos{Y} &amp; \cos{P}\sin{Y} &amp; -\sin{P} \\ \sin{R}\sin{P}\cos{Y}-\cos{R}\sin{Y} &amp; \sin{R}\sin{P}\sin{Y}+\cos{R}\cos{Y} &amp; \sin{R}\cos{P} \\ \cos{R}\sin{P}\cos{Y}+\sin{R}\sin{Y} &amp; \cos{R}\sin{P}\sin{Y}-\sin{R}\cos{Y} &amp; \cos{R}\cos{P} \\ \end{array} \right]\]
</div>
</div>
<div class="paragraph">
<p>You can also derive this by converting the Euler angles to a quaternion and then the quaternion to a rotation matrix; after applying the trigonometric double-angle formulas, you should get the same result.</p>
</div>
<div class="paragraph">
<p>Put this result side by side with our quaternion to rotation matrix conversion, which is repeated here for reference:</p>
</div>
<div class="stemblock">
<div class="content">
\[M = \left[ \begin{array}{} 1-2y^{2}-2z^{2} &amp; 2xy+2zw &amp; 2xz-2yw \\ 2xy-2zw &amp; 1-2x^{2}-2z^{2} &amp; 2yz+2xw \\ 2xz+2yw &amp; 2yz-2xw &amp; 1-2x^{2}-2y^{2} \\ \end{array} \right]\]
</div>
</div>
<div class="paragraph">
<p>You can easily spot this:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
\cos{P}\cos{Y}&amp;=1-2y^{2}-2z^{2}\\
\cos{P}\sin{Y}&amp;=2xy+2zw\\
-\sin{P}&amp;=2xz-2yw\\
\sin{R}\cos{P}&amp;=2yz+2xw\\
\cos{R}\cos{P}&amp;=1-2x^{2}-2y^{2}\\
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>Now we can write down the conversion from quaternion to Euler angles</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
P&amp;=\mathrm{asin}(-2xz+2yw)\\
Y&amp;=\mathrm{atan2}(2xy+2zw,1-2y^{2}-2z^{2})\\
R&amp;=\mathrm{atan2}(2yz+2xw,1-2x^{2}-2y^{2})\\
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>However, we still have a problem when the pitch is near \(90^{\circ}\) or \(-90^{\circ}\). This is called a singularity, and it is explained in more detail on <a href="http://www.euclideanspace.com/maths/geometry/rotations/conversions/quaternionToEuler/">this website</a>. In this case \(\cos P=0,\sin P=1\) or \(\cos P=0,\sin P=-1\), and the rotation matrix becomes:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
M_{90^{\circ}}&amp;=\left[ \begin{array}{} 0 &amp; 0 &amp; -1 \\ \sin{R}\cos{Y}-\cos{R}\sin{Y} &amp; \sin{R}\sin{Y}+\cos{R}\cos{Y} &amp; 0 \\ \cos{R}\cos{Y}+\sin{R}\sin{Y} &amp; \cos{R}\sin{Y}-\sin{R}\cos{Y} &amp; 0 \\ \end{array} \right] = \left[ \begin{array}{} 0 &amp; 0 &amp; -1 \\ \sin(R-Y) &amp; \cos(R-Y) &amp; 0 \\ \cos(R-Y) &amp; -\sin(R-Y) &amp; 0 \\ \end{array} \right]\\
M_{-90^{\circ}}&amp;=\left[ \begin{array}{} 0 &amp; 0 &amp; 1 \\ -\sin{R}\cos{Y}-\cos{R}\sin{Y} &amp; -\sin{R}\sin{Y}+\cos{R}\cos{Y} &amp; 0 \\ -\cos{R}\cos{Y}+\sin{R}\sin{Y} &amp; -\cos{R}\sin{Y}-\sin{R}\cos{Y} &amp; 0 \\ \end{array} \right] = \left[ \begin{array}{} 0 &amp; 0 &amp; 1 \\ -\sin(R+Y) &amp; \cos(R+Y) &amp; 0 \\ -\cos(R+Y) &amp; -\sin(R+Y) &amp; 0 \\ \end{array} \right]\\
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>The formulas we used to calculate yaw and roll become \(\mathrm{atan2}(0,0)\), which is undefined, so yaw and roll cannot be recovered this way.</p>
</div>
<div class="paragraph">
<p>We need to take a different approach. Recall the conversion from Euler angles to quaternion:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
x&amp;=\sin\frac{R}{2}\cos\frac{P}{2}\cos\frac{Y}{2}-\cos\frac{R}{2}\sin\frac{P}{2}\sin\frac{Y}{2}\\
y&amp;=\cos\frac{R}{2}\sin\frac{P}{2}\cos\frac{Y}{2}+\sin\frac{R}{2}\cos\frac{P}{2}\sin\frac{Y}{2}\\
z&amp;=\cos\frac{R}{2}\cos\frac{P}{2}\sin\frac{Y}{2}-\sin\frac{R}{2}\sin\frac{P}{2}\cos\frac{Y}{2}\\
w&amp;=\cos\frac{R}{2}\cos\frac{P}{2}\cos\frac{Y}{2}+\sin\frac{R}{2}\sin\frac{P}{2}\sin\frac{Y}{2}\\
\end{align*}\]
</div>
</div>
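<div class="paragraph">
<p>For reference, the same formulas as a minimal C++ sketch, assuming angles in radians and a hypothetical <code>Quat</code> struct with <code>x, y, z, w</code> members; <code>EulerToQuat</code> is an illustrative name, not an existing API.</p>
</div>

```cpp
#include <cassert>
#include <cmath>

struct Quat { double x, y, z, w; };  // hypothetical quaternion layout

// Euler angles (radians) to quaternion: roll R about X, pitch P about Y, yaw Y about Z.
Quat EulerToQuat(double roll, double pitch, double yaw) {
  const double sr = std::sin(roll * 0.5), cr = std::cos(roll * 0.5);
  const double sp = std::sin(pitch * 0.5), cp = std::cos(pitch * 0.5);
  const double sy = std::sin(yaw * 0.5), cy = std::cos(yaw * 0.5);
  Quat q;
  q.x = sr * cp * cy - cr * sp * sy;
  q.y = cr * sp * cy + sr * cp * sy;
  q.z = cr * cp * sy - sr * sp * cy;
  q.w = cr * cp * cy + sr * sp * sy;
  // Sanity check: the product of three unit rotations is a unit quaternion.
  assert(std::fabs(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w - 1.0) < 1e-9);
  return q;
}
```

<div class="paragraph">
<p>For example, with \(P=90^{\circ}\), \((R,Y)=(30^{\circ},0^{\circ})\) and \((50^{\circ},20^{\circ})\) produce the same quaternion, because only \(R-Y\) survives. This is the singularity seen in the matrix above.</p>
</div>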
<div class="paragraph">
<p>If \(P=90^{\circ}\), then \(x=-z=\frac{\sqrt{2}}{2}\sin\frac{R-Y}{2}\) and \(y=w=\frac{\sqrt{2}}{2}\cos\frac{R-Y}{2}\), so we have</p>
</div>
<div class="stemblock">
<div class="content">
\[R-Y=2\mathrm{atan2}(x,w)\]
</div>
</div>
<div class="paragraph">
<p>Similarly, if \(P=-90^{\circ}\), then \(x=z=\frac{\sqrt{2}}{2}\sin\frac{R+Y}{2}\) and \(-y=w=\frac{\sqrt{2}}{2}\cos\frac{R+Y}{2}\), so we have</p>
</div>
<div class="stemblock">
<div class="content">
\[R+Y=2\mathrm{atan2}(x,w)\]
</div>
</div>
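<div class="paragraph">
<p>The substitution behind both results is short. At \(P=90^{\circ}\) we have \(\cos\frac{P}{2}=\sin\frac{P}{2}=\frac{\sqrt{2}}{2}\), so the angle-difference identities collapse each component, for example</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
x&amp;=\frac{\sqrt{2}}{2}\left(\sin\frac{R}{2}\cos\frac{Y}{2}-\cos\frac{R}{2}\sin\frac{Y}{2}\right)=\frac{\sqrt{2}}{2}\sin\frac{R-Y}{2}\\
w&amp;=\frac{\sqrt{2}}{2}\left(\cos\frac{R}{2}\cos\frac{Y}{2}+\sin\frac{R}{2}\sin\frac{Y}{2}\right)=\frac{\sqrt{2}}{2}\cos\frac{R-Y}{2}\\
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>At \(P=-90^{\circ}\), \(\sin\frac{P}{2}=-\frac{\sqrt{2}}{2}\) flips the sign of the terms containing it, giving \(\sin\frac{R+Y}{2}\) and \(\cos\frac{R+Y}{2}\) instead. Using \(\mathrm{atan2}(x,w)\) rather than \(\mathrm{atan}(x/w)\) keeps the correct quadrant.</p>
</div>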
<div class="paragraph">
<p>Imagine an airplane facing straight up or down: yaw and roll then rotate it about the same axis. We can simply set yaw to zero and calculate only roll. So if \(P≈±90^{\circ}\), then</p>
</div>
<div class="stemblock">
<div class="content">
\[Y=0, R=2\mathrm{atan2}(x,w)\]
</div>
</div>
<div class="paragraph">
<p>Finally, since \(\sin{P}=-2xz+2yw\), we only need to check whether \(-2xz+2yw≈±1\) to detect that pitch is near \(±90^{\circ}\).</p>
</div>
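<div class="paragraph">
<p>Putting the pieces together, one possible C++ sketch of the full conversion follows. The threshold value (<code>0.999999</code>) and snapping pitch to exactly \(±90^{\circ}\) in the singular branch are assumptions, as are the <code>Quat</code>, <code>Euler</code> and <code>QuatToEuler</code> names; tune the threshold for your precision needs.</p>
</div>

```cpp
#include <cassert>
#include <cmath>

struct Quat { double x, y, z, w; };         // hypothetical unit quaternion
struct Euler { double roll, pitch, yaw; };  // radians

// Quaternion to Euler angles, handling the pitch singularity as described above.
// threshold is an assumed tolerance on |sin(P)|, not a value from the text.
Euler QuatToEuler(const Quat& q, double threshold = 0.999999) {
  assert(std::fabs(q.x * q.x + q.y * q.y + q.z * q.z + q.w * q.w - 1.0) < 1e-6);
  const double sinP = -2.0 * q.x * q.z + 2.0 * q.y * q.w;
  const double halfPi = std::acos(0.0);
  Euler e;
  if (std::fabs(sinP) >= threshold) {
    // Pitch near +-90 degrees: yaw and roll share an axis, so set yaw to zero
    // and put the whole remaining rotation into roll. Snapping pitch is an assumption.
    e.pitch = std::copysign(halfPi, sinP);
    e.yaw = 0.0;
    e.roll = 2.0 * std::atan2(q.x, q.w);
  } else {
    e.pitch = std::asin(sinP);
    e.yaw = std::atan2(2.0 * q.x * q.y + 2.0 * q.z * q.w,
                       1.0 - 2.0 * q.y * q.y - 2.0 * q.z * q.z);
    e.roll = std::atan2(2.0 * q.y * q.z + 2.0 * q.x * q.w,
                        1.0 - 2.0 * q.x * q.x - 2.0 * q.y * q.y);
  }
  return e;
}
```

<div class="paragraph">
<p>In the singular branch \(2\mathrm{atan2}(x,w)\) lies in \((-360^{\circ},360^{\circ}]\), so wrap roll into \((-180^{\circ},180^{\circ}]\) if your engine expects that range.</p>
</div>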
</div>
<div class="sect2">
<h3 id="_summary_of_part_2">Summary of Part 2</h3>
<div class="paragraph">
<p>In Part 2 we talked about the different multiplication orders for combining world rotations and local rotations.</p>
</div>
<div class="paragraph">
<p>We derived the formula for rotating a vector by a quaternion, and found a quick way to apply a rotation about the X, Y or Z axis.</p>
</div>
<div class="paragraph">
<p>We discussed conversions between quaternions, Euler angles and rotation matrices.</p>
</div>
<div class="paragraph">
<p>As you can see, whenever you get or set Euler angles in a game engine, you are converting from or to a quaternion, which involves trigonometric calculations. Avoid these conversions when you can work with quaternions directly.</p>
</div>
<div class="paragraph">
<p>Also, some systems use left-handed or Y-up coordinates, or a different Euler angle convention. Make sure you understand the system you are using, since the conversion between quaternions and Euler angles will be very different.</p>
</div>
</div>
</div>
</div>
<div class="sect1">
<h2 id="_appendix">Appendix</h2>
<div class="sectionbody">
<div class="sect2">
<h3 id="_derive_quaternion_multiplication">Derive Quaternion Multiplication</h3>
<div class="paragraph">
<p>We can actually derive the general quaternion multiplication from the special flip breakdown \(q_1=-{q_c}{q_a}\), \(q_2=-{q_b}{q_c}\) that we used to visualize the result of rotation composition. That is, if we define flip multiplication \({q_a}{q_b}=(\vec{a},0)(\vec{b},0)=(\vec{a}×\vec{b},-\vec{a}·\vec{b})\) directly, we can prove what the general quaternion product \({q_2}{q_1}=(\sin\frac{θ_2}{2}\vec{v_2},\cos\frac{θ_2}{2})(\sin\frac{θ_1}{2}\vec{v_1},\cos\frac{θ_1}{2})\) looks like. If you don’t remember this, see the <a href="#_rotation_composition">Rotation Composition</a> section in Part 1.</p>
</div>
<div class="paragraph">
<p>Here are some equations we will be using:</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
\vec{a}×(\vec{b}×\vec{c})&amp;=(\vec{a}·\vec{c})\vec{b}-(\vec{a}·\vec{b})\vec{c}\\
(\vec{a}×\vec{b})·(\vec{c}×\vec{d})&amp;=(\vec{a}·\vec{c})(\vec{b}·\vec{d})-(\vec{a}·\vec{d})(\vec{b}·\vec{c})\\
(\vec{a}×\vec{b})×(\vec{a}×\vec{c})&amp;=(\vec{a}·(\vec{b}×\vec{c}))\vec{a}
\end{align*}\]
</div>
</div>
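<div class="paragraph">
<p>These identities are easy to spot-check numerically. The C++ sketch below (hypothetical <code>Vec3</code> helpers, double precision, an assumed tolerance of <code>1e-9</code>) evaluates both sides of each identity for given vectors; it is a sanity check, not a proof.</p>
</div>

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(double s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
double Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 Cross(Vec3 a, Vec3 b) {
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
bool Near(double a, double b, double eps = 1e-9) { return std::fabs(a - b) < eps; }
bool Near(Vec3 a, Vec3 b, double eps = 1e-9) {
  return Near(a.x, b.x, eps) && Near(a.y, b.y, eps) && Near(a.z, b.z, eps);
}

// Evaluates both sides of the three identities for one set of vectors.
bool CheckIdentities(Vec3 a, Vec3 b, Vec3 c, Vec3 d) {
  // a x (b x c) = (a.c) b - (a.b) c
  const bool tripleProduct = Near(Cross(a, Cross(b, c)), Dot(a, c) * b - Dot(a, b) * c);
  // (a x b).(c x d) = (a.c)(b.d) - (a.d)(b.c)
  const bool binetCauchy =
      Near(Dot(Cross(a, b), Cross(c, d)), Dot(a, c) * Dot(b, d) - Dot(a, d) * Dot(b, c));
  // (a x b) x (a x c) = (a.(b x c)) a
  const bool sharedFactor = Near(Cross(Cross(a, b), Cross(a, c)), Dot(a, Cross(b, c)) * a);
  return tripleProduct && binetCauchy && sharedFactor;
}
```

<div class="paragraph">
<p>A typical use is <code>assert(CheckIdentities(a, b, c, d))</code> on a few arbitrary vectors.</p>
</div>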
<div class="paragraph">
<p>Recall that we chose the flip breakdown axis \(\vec{c}=\frac{\vec{v_1}×\vec{v_2}}{\left|\vec{v_1}×\vec{v_2}\right|}\).</p>
</div>
<div class="paragraph">
<p>Since \(\vec{c}\) is perpendicular to \(\vec{v_1}\), rotating \(\vec{c}\) about axis \(\vec{v_1}\) by angle \(-\frac{θ_1}{2}\) gives</p>
</div>
<div class="stemblock">
<div class="content">
\[\vec{a}=\cos\frac{-θ_1}{2}\vec{c} + \sin\frac{-θ_1}{2}(\vec{v_1}×\vec{c})=\frac{1}{\left|\vec{v_1}×\vec{v_2}\right|}(\cos\frac{θ_1}{2}(\vec{v_1}×\vec{v_2}) - \sin\frac{θ_1}{2}(\vec{v_1}×(\vec{v_1}×\vec{v_2})))\]
</div>
</div>
<div class="paragraph">
<p>Since \(\vec{c}\) is perpendicular to \(\vec{v_2}\) as well, rotating \(\vec{c}\) about axis \(\vec{v_2}\) by angle \(\frac{θ_2}{2}\) gives</p>
</div>
<div class="stemblock">
<div class="content">
\[\vec{b}=\cos\frac{θ_2}{2}\vec{c} + \sin\frac{θ_2}{2}(\vec{v_2}×\vec{c})=\frac{1}{\left|\vec{v_1}×\vec{v_2}\right|}(\cos\frac{θ_2}{2}(\vec{v_1}×\vec{v_2}) + \sin\frac{θ_2}{2}(\vec{v_2}×(\vec{v_1}×\vec{v_2})))\]
</div>
</div>
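<div class="paragraph">
<p>The construction of \(\vec{a}\) and \(\vec{b}\) can be checked numerically. The C++ sketch below assumes unit, non-parallel axes and stores a quaternion as (vector, scalar) with the Hamilton product, so that flip multiplication matches \((\vec{a},0)(\vec{b},0)=(\vec{a}×\vec{b},-\vec{a}·\vec{b})\); all type and function names are hypothetical.</p>
</div>

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(double s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
double Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 Cross(Vec3 a, Vec3 b) {
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
double Length(Vec3 v) { return std::sqrt(Dot(v, v)); }

// Quaternion as (vector part, scalar part); Hamilton product.
struct Quat { Vec3 v; double w; };
Quat operator*(Quat p, Quat q) {
  return {p.w * q.v + q.w * p.v + Cross(p.v, q.v), p.w * q.w - Dot(p.v, q.v)};
}
Quat operator-(Quat q) { return {-1.0 * q.v, -q.w}; }

// Flip axes for unit rotation axes v1, v2 and angles t1, t2, built as in the text.
struct FlipAxes { Vec3 a, b, c; };
FlipAxes MakeFlipAxes(Vec3 v1, double t1, Vec3 v2, double t2) {
  const Vec3 n = Cross(v1, v2);
  assert(Length(n) > 1e-9);  // v1 and v2 must not be parallel
  const Vec3 c = (1.0 / Length(n)) * n;
  // c is perpendicular to both axes, so each rotation needs only two terms.
  const Vec3 a = std::cos(t1 / 2) * c - std::sin(t1 / 2) * Cross(v1, c);  // -t1/2 about v1
  const Vec3 b = std::cos(t2 / 2) * c + std::sin(t2 / 2) * Cross(v2, c);  // +t2/2 about v2
  return {a, b, c};
}
```

<div class="paragraph">
<p>With these axes, \(-q_cq_a\) and \(-q_bq_c\) reproduce \(q_1\) and \(q_2\) up to rounding error.</p>
</div>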
<div class="paragraph">
<p>Then we have</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
\vec{a}·\vec{b}&amp;=\frac{1}{{\left|\vec{v_1}×\vec{v_2}\right|}^{2}}(\cos\frac{θ_1}{2}\cos\frac{θ_2}{2}{\left|\vec{v_1}×\vec{v_2}\right|}^{2} - \sin\frac{θ_1}{2}\sin\frac{θ_2}{2}((\vec{v_1}×(\vec{v_1}×\vec{v_2}))·(\vec{v_2}×(\vec{v_1}×\vec{v_2}))))\\
&amp;=\frac{1}{{\left|\vec{v_1}×\vec{v_2}\right|}^{2}}(\cos\frac{θ_1}{2}\cos\frac{θ_2}{2}{\left|\vec{v_1}×\vec{v_2}\right|}^{2} - \sin\frac{θ_1}{2}\sin\frac{θ_2}{2}(\vec{v_1}·\vec{v_2}){\left|\vec{v_1}×\vec{v_2}\right|}^{2})\\
&amp;=\cos\frac{θ_1}{2}\cos\frac{θ_2}{2} - \sin\frac{θ_1}{2}\sin\frac{θ_2}{2}(\vec{v_1}·\vec{v_2})
\end{align*}\]
</div>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
\vec{a}×\vec{b}&amp;=\frac{1}{{\left|\vec{v_1}×\vec{v_2}\right|}^{2}}(\cos\frac{θ_1}{2}\sin\frac{θ_2}{2}((\vec{v_1}×\vec{v_2})×(\vec{v_2}×(\vec{v_1}×\vec{v_2})))\\
&amp;- \sin\frac{θ_1}{2}\cos\frac{θ_2}{2}((\vec{v_1}×(\vec{v_1}×\vec{v_2}))×(\vec{v_1}×\vec{v_2}))\\
&amp;- \sin\frac{θ_1}{2}\sin\frac{θ_2}{2}((\vec{v_1}×(\vec{v_1}×\vec{v_2}))×(\vec{v_2}×(\vec{v_1}×\vec{v_2}))))\\
&amp;=\frac{1}{{\left|\vec{v_1}×\vec{v_2}\right|}^{2}}(\cos\frac{θ_1}{2}\sin\frac{θ_2}{2}{\left|\vec{v_1}×\vec{v_2}\right|}^{2}\vec{v_2} + \sin\frac{θ_1}{2}\cos\frac{θ_2}{2}{\left|\vec{v_1}×\vec{v_2}\right|}^{2}\vec{v_1} - \sin\frac{θ_1}{2}\sin\frac{θ_2}{2}{\left|\vec{v_1}×\vec{v_2}\right|}^{2}(\vec{v_1}×\vec{v_2}))\\
&amp;=\cos\frac{θ_1}{2}\sin\frac{θ_2}{2}\vec{v_2} + \sin\frac{θ_1}{2}\cos\frac{θ_2}{2}\vec{v_1} - \sin\frac{θ_1}{2}\sin\frac{θ_2}{2}(\vec{v_1}×\vec{v_2})
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>From the previous proof of rotation composition we know \(q={q_2}{q_1}=-{q_b}{q_a}\), that is</p>
</div>
<div class="stemblock">
<div class="content">
\[\begin{align*}
q&amp;=(\vec{a}×\vec{b},\vec{a}·\vec{b})\\
&amp;=(\cos\frac{θ_1}{2}(\sin\frac{θ_2}{2}\vec{v_2}) + \cos\frac{θ_2}{2}(\sin\frac{θ_1}{2}\vec{v_1}) - (\sin\frac{θ_1}{2}\vec{v_1})×(\sin\frac{θ_2}{2}\vec{v_2}), \cos\frac{θ_1}{2}\cos\frac{θ_2}{2} - (\sin\frac{θ_1}{2}\vec{v_1})·(\sin\frac{θ_2}{2}\vec{v_2}))
\end{align*}\]
</div>
</div>
<div class="paragraph">
<p>which is the definition of the quaternion multiplication \({q_2}{q_1}=(\sin\frac{θ_2}{2}\vec{v_2},\cos\frac{θ_2}{2})(\sin\frac{θ_1}{2}\vec{v_1},\cos\frac{θ_1}{2})\).</p>
</div>
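<div class="paragraph">
<p>The whole appendix result can also be spot-checked in code. This C++ sketch (hypothetical names; unit, non-parallel axes; Hamilton product on (vector, scalar) quaternions) builds \(-q_bq_a\) from the flip axes and compares it with the closed form above and with the direct product \(q_2q_1\), i.e. rotating by \(q_1\) first.</p>
</div>

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };
Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
Vec3 operator*(double s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }
double Dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 Cross(Vec3 a, Vec3 b) {
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}
double Length(Vec3 v) { return std::sqrt(Dot(v, v)); }

// Quaternion as (vector part, scalar part); Hamilton product.
struct Quat { Vec3 v; double w; };
Quat operator*(Quat p, Quat q) {
  return {p.w * q.v + q.w * p.v + Cross(p.v, q.v), p.w * q.w - Dot(p.v, q.v)};
}

// Closed form derived above, for unit axes v1, v2 and angles t1, t2.
Quat ClosedForm(Vec3 v1, double t1, Vec3 v2, double t2) {
  const double s1 = std::sin(t1 / 2), c1 = std::cos(t1 / 2);
  const double s2 = std::sin(t2 / 2), c2 = std::cos(t2 / 2);
  return {c1 * s2 * v2 + c2 * s1 * v1 - s1 * s2 * Cross(v1, v2),
          c1 * c2 - s1 * s2 * Dot(v1, v2)};
}

// Same rotation via the flip breakdown: q = -q_b q_a = (a x b, a . b).
Quat ViaFlips(Vec3 v1, double t1, Vec3 v2, double t2) {
  const Vec3 n = Cross(v1, v2);
  assert(Length(n) > 1e-9);  // the derivation needs non-parallel axes
  const Vec3 c = (1.0 / Length(n)) * n;
  const Vec3 a = std::cos(t1 / 2) * c - std::sin(t1 / 2) * Cross(v1, c);
  const Vec3 b = std::cos(t2 / 2) * c + std::sin(t2 / 2) * Cross(v2, c);
  const Quat p = Quat{b, 0.0} * Quat{a, 0.0};
  return {-1.0 * p.v, -p.w};
}
```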
</div>
</div>
</div>]]></description><link>https://lxjk.github.io/2016/10/29/A-Different-Way-to-Understand-Quaternion-and-Rotation.html</link><guid isPermaLink="true">https://lxjk.github.io/2016/10/29/A-Different-Way-to-Understand-Quaternion-and-Rotation.html</guid><category><![CDATA[Math]]></category><dc:creator><![CDATA[Eric Zhang]]></dc:creator><pubDate>Sat, 29 Oct 2016 00:00:00 GMT</pubDate></item></channel></rss>