DirectCompute tutorial for Unity: Kernels and thread groups
2016-05-15 02:14
scrawk, Computeshader, DirectCompute, dx11
The last post in this tutorial series was just a bit of an introduction, but from here on it's all about the code. Today I will be going over the core concepts for writing compute shaders in Unity. At the heart of a compute shader is the kernel. This is the entry point into the shader and acts like the Main function in other programming languages. I will also cover the tiling of threads by the GPU. These tiles are also known as blocks or thread groups; DirectCompute officially refers to them as thread groups.
To create a compute shader in Unity, go to the project panel, click create->compute shader, and then double-click the shader to open it in MonoDevelop for editing. Paste the following code into the newly created compute shader.
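A minimal sketch of such a starting shader, assuming the kernel name CSMain1 and the 4-thread group size that the rest of this post refers to:

```hlsl
// Declares CSMain1 as a kernel, i.e. an entry point the script can dispatch.
#pragma kernel CSMain1

// Each thread group is 4 threads wide in x and 1 thread in y and z.
[numthreads(4,1,1)]
void CSMain1()
{
    // Intentionally empty: the kernel runs but produces no output.
}
```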
script. Name it KernelExample and paste in the following code.
Settings->Player and then tick the “Use Direct3D 11” box. You can now run the scene. The shader will do nothing but there should also be no errors.
In the script you will see the “Dispatch” function called. This is responsible for running the shader. Notice that the first argument is a 0. This is the id of the kernel that you want to run. In the shader you will see the line “#pragma kernel CSMain1”. This defines which function in the shader is the kernel, as you may have many functions (and even many kernels) in one shader. There must be a function with the name CSMain1 in the shader or the shader will not compile.
Now notice the “[numthreads(4,1,1)]” line. This tells the GPU how many threads of the kernel to run per group. The 3 numbers relate to the x, y and z dimensions. A thread group can have up to 3 dimensions, and in this example we are just running a 1-dimensional group with a width of 4 threads. That means we are running a total of 4 threads, and each thread will run a copy of the kernel. This is why GPUs are so fast: they can run thousands of threads at a time.
Now let's get the kernel to actually do something. Change the shader to this…
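A sketch of what the changed shader might look like. The buffer name outputBuffer is hypothetical, and the script is assumed to create a 4-element integer buffer and bind it to the kernel under that name:

```hlsl
#pragma kernel CSMain1

// Read/write buffer supplied by the script (hypothetical name).
RWStructuredBuffer<int> outputBuffer;

[numthreads(4,1,1)]
void CSMain1(int3 threadID : SV_GroupThreadID)
{
    // threadID.x is this thread's index inside its group: 0, 1, 2 or 3.
    outputBuffer[threadID.x] = threadID.x;
}
```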
called when you are finished with it. Notice this argument added to the CSMain1 function: “int3 threadID : SV_GroupThreadID”. This is a request to the GPU to pass the thread id into the kernel when it is run. We then write the thread id into the buffer, and since we have told the GPU we are running 4 threads, the id ranges from 0 to 3, as we see from the printout.
Now those 4 threads make up what's called a thread group. In this case we are running 1 group of 4 threads, but you can run multiple groups of threads. Let's run 2 groups instead of 1. Change the shader's kernel to this…
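One way the two-group kernel could look, assuming the same hypothetical outputBuffer (now holding 8 integers) and a script that dispatches 2 groups in x:

```hlsl
#pragma kernel CSMain1

// Hypothetical 8-element buffer bound from the script.
RWStructuredBuffer<int> outputBuffer;

[numthreads(4,1,1)]
void CSMain1(int3 threadID : SV_GroupThreadID, int3 groupID : SV_GroupID)
{
    // 4 is the group width from numthreads, so group 0 fills
    // slots 0..3 and group 1 fills slots 4..7.
    int index = threadID.x + groupID.x * 4;

    // Each group writes its local ids, so the buffer ends up as
    // 0,1,2,3,0,1,2,3 when 2 groups are dispatched.
    outputBuffer[index] = threadID.x;
}
```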
in this case we are running 1 dimension of 2 groups. We have also had to change the kernel by adding the argument “int3 groupID : SV_GroupID”. This is a request to the GPU to pass in the group id when the kernel is run. We need this because we are now writing out 8 values: 2 groups of 4 threads. We now need the thread's position in the buffer, and the formula for this is the thread id plus the group id times the number of threads per group ( threadID.x + groupID.x*4 ). This is a bit awkward to write. Surely the GPU knows the thread's position? Yes it does. Change the shader's kernel to this and rerun the scene.
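A sketch of the simplified kernel, with the argument named dispatchID as in the text; outputBuffer remains a hypothetical name and 2 groups are still assumed to be dispatched:

```hlsl
#pragma kernel CSMain1

// Hypothetical 8-element buffer bound from the script.
RWStructuredBuffer<int> outputBuffer;

[numthreads(4,1,1)]
void CSMain1(int3 dispatchID : SV_DispatchThreadID)
{
    // dispatchID.x already equals threadID.x + groupID.x * 4,
    // so with 2 groups the buffer ends up as 0,1,2,3,4,5,6,7.
    outputBuffer[dispatchID.x] = dispatchID.x;
}
```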
: SV_DispatchThreadID”. This is the same number our formula gave us, except now the GPU is calculating it for us. This is the thread's position across all the groups of threads.
So far these have all been in 1 dimension. Let's step things up a bit and move to 2 dimensions, and instead of rewriting the kernel let's just add another one to the shader. It's not uncommon to have a kernel for each dimension in a shader, each performing the same algorithm. First add this code to the shader below the previous code so there are two kernels in the shader.
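A sketch of a second, 2-dimensional kernel. The kernel name CSMain2 and the index formula come from the text; the 4x4 group size, the 2x2 dispatch and the buffer name outputBuffer2 are assumptions, chosen so that the x dimension has 4 * 2 = 8 threads:

```hlsl
#pragma kernel CSMain2

// Hypothetical buffer of 8 * 8 = 64 integers bound from the script.
RWStructuredBuffer<int> outputBuffer2;

// 4 x 4 threads per group; the script is assumed to dispatch 2 x 2 groups.
[numthreads(4,4,1)]
void CSMain2(int3 dispatchID : SV_DispatchThreadID)
{
    // 8 = 4 threads per group in x * 2 groups in x.
    int id = dispatchID.x + dispatchID.y * 8;

    // Each thread writes its flattened index: 0 through 63.
    outputBuffer2[id] = id;
}
```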
are outputting from this line: “int id = dispatchID.x + dispatchID.y * 8”. The dispatch id is the thread's position in the groups of threads for each dimension. We now have 2 dimensions, so we need the thread's global position in the buffer, and this is just the dispatch x id plus the dispatch y id times the total number of threads in the first dimension (4 * 2). This is a concept you will have to be familiar with when working with compute shaders. The reason is that buffers are always 1-dimensional, and when working in higher dimensions you need to calculate which index in the buffer the result should be written to.
The same theory applies when working with 3 dimensions, but as it gets fiddly I will only demonstrate up to 2 dimensions. You just need to know that in 3 dimensions the buffer position is calculated as “int id = dispatchID.x + dispatchID.y * groupSizeX + dispatchID.z * groupSizeX * groupSizeY”, where the group size is the number of groups times the number of threads for that dimension.
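For completeness, a sketch of the 3-dimensional formula in a kernel. The kernel name CSMain3, the buffer name, the 4x4x4 group size and the 2x2x2 dispatch are all assumptions made for illustration:

```hlsl
#pragma kernel CSMain3

// Hypothetical buffer of 8 * 8 * 8 = 512 integers bound from the script.
RWStructuredBuffer<int> outputBuffer3;

// Threads per group times number of groups, per dimension (assumed 4 * 2).
static const int groupSizeX = 4 * 2;
static const int groupSizeY = 4 * 2;

[numthreads(4,4,4)]
void CSMain3(int3 dispatchID : SV_DispatchThreadID)
{
    // Flatten the 3D position into a 1D buffer index.
    int id = dispatchID.x
           + dispatchID.y * groupSizeX
           + dispatchID.z * groupSizeX * groupSizeY;

    outputBuffer3[id] = id;
}
```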
You should also have an understanding of how the semantics work. Take for example this kernel argument…
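As a sketch of how semantics attach to kernel arguments: the part after the colon is the semantic, which tells the GPU what value to pass in, while the type and name before it are yours to choose. The kernel name SemanticsExample and the buffer are hypothetical; the four semantics are the standard compute shader system values:

```hlsl
#pragma kernel SemanticsExample

// Hypothetical output buffer bound from the script.
RWStructuredBuffer<uint> outputBuffer;

[numthreads(4,1,1)]
void SemanticsExample(
    uint3 groupID       : SV_GroupID,          // which group this thread is in
    uint3 groupThreadID : SV_GroupThreadID,    // position inside that group
    uint3 dispatchID    : SV_DispatchThreadID, // groupID * numthreads + groupThreadID
    uint  groupIndex    : SV_GroupIndex)       // groupThreadID flattened to one number
{
    // Rebuilds the dispatch id by hand; the value equals dispatchID.x.
    outputBuffer[dispatchID.x] = groupID.x * 4 + groupThreadID.x;
}
```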
Since we now have two kernels in the shader, we also need to tell the GPU which kernel we want to run when we make the dispatch call. Each kernel is given an id in the order they appear. Our first kernel would be id 0 and the next is id 1. When the number of kernels in a shader becomes larger this can become a bit confusing, and it's easy to set the wrong id. We can solve this by asking the shader for the kernel's id by name. This line here, “int kernel = shader.FindKernel("CSMain2");”, gets the id of the kernel “CSMain2”. We then use this id when setting the buffer and making the dispatch call.
About now you may be thinking that this concept of groups of threads is a bit confusing. Why can't I just use one group of threads? Well, you can, but just know that there is a reason that threads are arranged into groups by the GPU. For a start, a thread group is limited in the number of threads it can have (defined by the line “[numthreads(x,y,z)]” in the shader). This limit is currently 1024 but may change with new hardware. For example, you can have a maximum of “numthreads(1024,1,1)” for 1D, “numthreads(32,32,1)” for 2D and so on. You can, however, dispatch a very large number of groups (Direct3D 11 allows up to 65,535 per dimension), and as you will often be processing data with millions of elements, the concept of thread groups is essential. Threads in a group can also share memory, and this can be used to make dramatic performance gains for certain algorithms, but I will cover that in a future post.
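To illustrate how many groups cover a large data set, here is a sketch of a kernel that uses the maximum 1D group size. The kernel name ScaleValues, the buffer values and the uniform count are hypothetical. The script is assumed to dispatch enough groups to cover every element, rounding up; for 1,000,000 elements that is 977 groups of 1024 threads:

```hlsl
#pragma kernel ScaleValues

// Hypothetical data buffer and element count, both set from the script.
RWStructuredBuffer<float> values;
uint count;

// 1024 threads is the per-group limit mentioned above.
[numthreads(1024,1,1)]
void ScaleValues(uint3 dispatchID : SV_DispatchThreadID)
{
    // The last group may run past the end of the data, so skip those threads.
    if (dispatchID.x >= count)
        return;

    values[dispatchID.x] *= 2.0;
}
```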
Well, I think that about covers kernels and thread groups. There is just one more thing I want to cover: how to pass uniforms into your shader. This works the same as in Cg shaders, but there is no uniform keyword. For the most part this is relatively simple, but there are a few gotchas so I will briefly go over it.
For example if you want to pass in a float you need this line in the shader…
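A sketch of the shader side, assuming a uniform named myFloat and a hypothetical kernel that simply copies it into a buffer. On the script side the value would be supplied by name, for example with the SetFloat method of Unity's ComputeShader class:

```hlsl
#pragma kernel FillWithFloat

// A uniform is just a global variable; no 'uniform' keyword is written.
float myFloat;

// Hypothetical 4-element output buffer bound from the script.
RWStructuredBuffer<float> outputBuffer;

[numthreads(4,1,1)]
void FillWithFloat(uint3 dispatchID : SV_DispatchThreadID)
{
    // Every thread sees the same value of myFloat.
    outputBuffer[dispatchID.x] = myFloat;
}
```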
Now here's where it gets tricky. You can pass in arrays of values. Note that this first example won't work; I will explain why. You need this line in your shader…
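A sketch of the packing issue that most plausibly sits behind this gotcha, assuming standard HLSL constant buffer packing rules. The names myFloats, myPackedFloats and CopyFloats are hypothetical. Each element of an array in a constant buffer starts on its own 16-byte register, so four floats uploaded back to back would not line up with the four elements of a float array, whereas they do fit a single float4:

```hlsl
#pragma kernel CopyFloats

// Occupies four 16-byte registers, one per element, with padding between
// elements. It is NOT four tightly packed floats.
float myFloats[4];

// Occupies one 16-byte register: four tightly packed floats.
float4 myPackedFloats;

// Hypothetical 4-element output buffer bound from the script.
RWStructuredBuffer<float> outputBuffer;

[numthreads(4,1,1)]
void CopyFloats(uint3 dispatchID : SV_DispatchThreadID)
{
    // Reads component 0..3 of the float4, one component per thread.
    outputBuffer[dispatchID.x] = myPackedFloats[dispatchID.x];
}
```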
I think that about covers it for today. The next part will cover how to use textures in your compute shaders. You can also download the project file for the kernel example. It's rather basic, but it's there if you need it. I will be adding to the same project file for each tutorial I do.
Project Files (Unity 5.4).