
DirectCompute tutorial for Unity: Kernels and thread groups

2016-05-15 02:14


scrawk
Compute shader, DirectCompute, dx11

<- Previous: Introduction

The last post of this tutorial series was just a bit of an introduction, but from here on it's all about the code. Today I will be going over the core concepts for writing compute shaders in Unity. At the heart of a compute shader is the kernel. This is the entry point into the shader and acts like the Main function in other programming languages. I will also cover the tiling of threads by the GPU. These tiles are also known as blocks or thread groups. DirectCompute officially refers to these tiles as thread groups.

To create a compute shader in Unity, go to the Project panel, click Create->Compute Shader, then double-click the shader to open it in MonoDevelop for editing. Paste the following code into the newly created compute shader.

This is the bare minimum of content for a compute shader. It will of course do nothing, but it will serve as a good starting point. A compute shader has to be run from a script in Unity, so we will need one of those as well. Go to the Project panel and click Create->C# Script. Name it KernelExample and paste in the following code.

Now drag the script onto any game object and then attach the compute shader to the script's shader attribute. The shader will now run in the Start function when the scene is run. Before you run the scene, however, you need to enable dx11 in Unity. Go to Edit->Project Settings->Player and then tick the “Use Direct3D 11” box. You can now run the scene. The shader will do nothing, but there should also be no errors.

In the script you will see the “Dispatch” function called. This is responsible for running the shader. Notice the first argument is a 0. This is the id of the kernel that you want to run. In the shader you will see the line “#pragma kernel CSMain1”. This defines which function in the shader is the kernel, as you may have many functions (and even many kernels) in one shader. There must be a function with the name CSMain1 in the shader or the shader will not compile.

Now notice the “[numthreads(4,1,1)]” line. This tells the GPU how many threads of the kernel to run per group. The 3 numbers relate to each dimension. A thread group can have up to 3 dimensions, and in this example we are just running a 1 dimensional group with a width of 4 threads. That means we are running a total of 4 threads, and each thread will run a copy of the kernel. This is why GPUs are so fast. They can run thousands of threads at a time.

Now let's get the kernel to actually do something. Change the shader to this…

and the script's Start function to this…

Now run the scene and you should see the numbers 0, 1, 2 and 3 printed out. Don't worry too much about the buffer for now. I will cover buffers in detail in the future, but just know that a buffer is a place to store data and that it needs to have its release function called when you are finished with it. Notice the argument added to the CSMain1 function: “int3 threadID : SV_GroupThreadID”. This is a request to the GPU to pass the thread id into the kernel when it is run. We are then writing the thread id into the buffer, and since we have told the GPU we are running 4 threads, the id ranges from 0 to 3, as we see from the printout.
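The behaviour described here can be imitated on the CPU. The following Python sketch is not shader code and is not the tutorial's listing; it only models what the text describes: one group of 4 threads, each handed its own group thread id and writing it into a buffer. The names `kernel` and `run_group` are made up for illustration.

```python
# CPU-side model of a single thread group. This is NOT shader code.
# Assumption: kernel and run_group are illustrative names only.

NUM_THREADS_X = 4  # mirrors [numthreads(4,1,1)]


def kernel(group_thread_id, buffer):
    """What each thread does: write its own id within the group."""
    buffer[group_thread_id] = group_thread_id


def run_group(buffer):
    """Run the kernel once per thread in the group.

    A real GPU runs these threads in parallel. A plain loop gives the
    same result here because every thread writes to a different slot.
    """
    for group_thread_id in range(NUM_THREADS_X):
        kernel(group_thread_id, buffer)


buffer = [None] * NUM_THREADS_X
run_group(buffer)
print(buffer)  # [0, 1, 2, 3]
```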

Those 4 threads make up what's called a thread group. In this case we are running 1 group of 4 threads, but you can run multiple groups of threads. Let's run 2 groups instead of 1. Change the shader's kernel to this…

and the script's Start function to this…

Now run the scene and you should have 0-3 printed out twice. Notice the change to the Dispatch function. The last three arguments (the 2,1,1) are the number of groups we want to run. Just like the number of threads, groups can go up to 3 dimensions, and in this case we are running 2 groups in 1 dimension. We have also had to change the kernel by adding the argument “int3 groupID : SV_GroupID”. This is a request to the GPU to pass in the group id when the kernel is run. We need this because we are now writing out 8 values, 2 groups of 4 threads. Each thread now needs its position in the buffer, and the formula for this is the thread id plus the group id times the number of threads per group ( threadID.x + groupID.x*4 ). This is a bit awkward to write. Surely the GPU knows the thread's position? Yes, it does. Change the shader's kernel to this and rerun the scene.
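To make the formula concrete, here is a small Python model of the dispatch (again a CPU-side sketch, not shader code). It assumes the kernel writes the group thread id at the position threadID.x + groupID.x*4, as the formula in the text suggests; `dispatch` and `kernel` are hypothetical names.

```python
# CPU-side model of dispatching several 1D thread groups. NOT shader code.
# Assumption: the kernel writes the group thread id at the computed position.

NUM_THREADS_X = 4  # mirrors [numthreads(4,1,1)]


def kernel(thread_id, group_id, buffer):
    # Position in the buffer: thread id plus group id times threads per group.
    index = thread_id + group_id * NUM_THREADS_X
    buffer[index] = thread_id


def dispatch(num_groups_x):
    """Run num_groups_x groups of NUM_THREADS_X threads each."""
    buffer = [None] * (num_groups_x * NUM_THREADS_X)
    for group_id in range(num_groups_x):
        for thread_id in range(NUM_THREADS_X):
            kernel(thread_id, group_id, buffer)
    return buffer


result = dispatch(2)
print(result)  # [0, 1, 2, 3, 0, 1, 2, 3]
```

Without the group id term, both groups would write to slots 0 to 3 and the second half of the buffer would stay empty.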

The results should be the same: two sets of 0-3 printed. Notice that the group id argument has been replaced with “int3 dispatchID : SV_DispatchThreadID”. This is the same number our formula gave us, except now the GPU is calculating it for us. It is the thread's position across all the groups of threads.
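The relationship between the three ids can be checked with a few lines of Python. This is a hedged CPU-side model, not shader code: it assumes a 1D dispatch of 2 groups of 4 threads and a kernel that still writes the group thread id, which is what the unchanged output implies. The function name `dispatch_thread_id` is made up.

```python
# CPU-side model of how the dispatch thread id relates to the other ids.
# NOT shader code; dispatch_thread_id is an illustrative name.

NUM_THREADS_X = 4
NUM_GROUPS_X = 2


def dispatch_thread_id(group_id, group_thread_id, num_threads):
    """Position of a thread across all groups in one dimension."""
    return group_id * num_threads + group_thread_id


buffer = [None] * (NUM_GROUPS_X * NUM_THREADS_X)
positions = []
for group_id in range(NUM_GROUPS_X):
    for thread_id in range(NUM_THREADS_X):
        position = dispatch_thread_id(group_id, thread_id, NUM_THREADS_X)
        positions.append(position)
        buffer[position] = thread_id  # value written is the group thread id

print(positions)  # [0, 1, 2, 3, 4, 5, 6, 7]
print(buffer)     # [0, 1, 2, 3, 0, 1, 2, 3]
```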

So far these have all been in 1 dimension. Let's step things up a bit and move to 2 dimensions, and instead of rewriting the kernel let's just add another one to the shader. It's not uncommon to have a kernel for each dimension in a shader, all performing the same algorithm. First add this code to the shader below the previous code so there are two kernels in the shader.

and the script to this…

Run the scene and you will see a row printed from 0 to 7, the next row from 8 to 15, and so on up to 63. Why from 0 to 63? We now have 4 2D groups of threads (2 by 2), and each group is 4 by 4, so has 16 threads. That gives us 64 threads in total. Notice what value we are outputting from this line: “int id = dispatchID.x + dispatchID.y * 8”. The dispatch id is the thread's position across the groups of threads in each dimension. We now have 2 dimensions, so we need the thread's global position in the buffer, and this is just the dispatch x id plus the dispatch y id times the total number of threads in the first dimension (4 * 2). This is a concept you will have to be familiar with when working with compute shaders. The reason is that buffers are always 1 dimensional, so when working in higher dimensions you need to calculate the index in the buffer that the result should be written to.
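The 2D indexing can be modelled on the CPU as follows. This Python sketch is not the tutorial's kernel; it assumes 2 by 2 groups of 4 by 4 threads and that each thread writes its flattened index into the buffer, which matches the printed rows described above. All names are illustrative.

```python
# CPU-side model of a 2D dispatch flattened into a 1D buffer. NOT shader code.

THREADS = (4, 4)  # mirrors [numthreads(4,4,1)]
GROUPS = (2, 2)   # number of groups dispatched in x and y
WIDTH = THREADS[0] * GROUPS[0]   # total threads in the first dimension: 8
HEIGHT = THREADS[1] * GROUPS[1]  # total threads in the second dimension: 8


def dispatch_2d():
    buffer = [None] * (WIDTH * HEIGHT)
    for gy in range(GROUPS[1]):
        for gx in range(GROUPS[0]):
            for ty in range(THREADS[1]):
                for tx in range(THREADS[0]):
                    # Dispatch id per dimension: group offset plus thread id.
                    dx = gx * THREADS[0] + tx
                    dy = gy * THREADS[1] + ty
                    # Same formula as the text: x + y * 8.
                    index = dx + dy * WIDTH
                    buffer[index] = index
    return buffer


buffer = dispatch_2d()
rows = [buffer[r * WIDTH:(r + 1) * WIDTH] for r in range(HEIGHT)]
for row in rows:
    print(row)
```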

The same theory applies when working with 3 dimensions, but as it gets fiddly I will only demonstrate up to 2 dimensions. You just need to know that in 3 dimensions the buffer position is calculated as “int id = dispatchID.x + dispatchID.y * groupSizeX + dispatchID.z * groupSizeX * groupSizeY”, where the group size is the number of groups times the number of threads per group for that dimension (that is, the total number of threads along that dimension).
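As a quick check of the 3D formula, the Python sketch below flattens every (x, y, z) position and confirms that each buffer slot is hit exactly once. The sizes (2 groups of 4 threads in each dimension) are an assumed example, and `flat_index` is a made-up name.

```python
# CPU-side check of the 3D flattening formula. NOT shader code.
# Assumption: 2 groups of 4 threads in each dimension, so 8 threads per axis.


def flat_index(x, y, z, size_x, size_y):
    """x + y * sizeX + z * sizeX * sizeY, as in the text."""
    return x + y * size_x + z * size_x * size_y


SIZE_X = SIZE_Y = SIZE_Z = 2 * 4  # groups times threads per group

indices = [
    flat_index(x, y, z, SIZE_X, SIZE_Y)
    for z in range(SIZE_Z)
    for y in range(SIZE_Y)
    for x in range(SIZE_X)
]

# Every slot from 0 to 511 is produced exactly once, in order.
print(indices == list(range(SIZE_X * SIZE_Y * SIZE_Z)))  # True
print(flat_index(1, 2, 3, SIZE_X, SIZE_Y))  # 209
```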

You should also have an understanding of how the semantics work. Take for example this kernel argument…

SV_DispatchThreadID is the semantic and tells the GPU what value it should pass in for this argument. The name of the argument does not matter. You can call it what you want. For example this argument works the same as above.

Also the variable type can be changed. For example…

See that the int3 has been changed to an int. This is fine if you are only working with 1 dimension. You could also just use an int2 for 2 dimensions, and you could use an unsigned int (uint) instead of an int if you choose.

Since we now have two kernels in the shader, we also need to tell the GPU which kernel we want to run when we make the dispatch call. Each kernel is given an id in the order they appear. Our first kernel would be id 0 and the next is id 1. When the number of kernels in a shader becomes larger this can become a bit confusing, and it's easy to set the wrong id. We can solve this by asking the shader for the kernel's id by name. This line here, “int kernel = shader.FindKernel("CSMain2");”, gets the id of the kernel “CSMain2”. We then use this id when setting the buffer and making the dispatch call.
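The idea of looking a kernel up by name rather than hard-coding its position can be modelled in Python. This is only an illustration of the concept, not how Unity implements FindKernel; it assumes ids follow the order of the “#pragma kernel” lines, as described above, and `kernel_ids` and `find_kernel` are hypothetical helpers.

```python
# Conceptual model of name-based kernel lookup. NOT Unity's implementation.
# Assumption: kernel ids follow the order of the #pragma kernel lines.

SHADER_SOURCE = """
#pragma kernel CSMain1
#pragma kernel CSMain2
"""


def kernel_ids(source):
    """Map each declared kernel name to its id, in declaration order."""
    ids = {}
    for line in source.splitlines():
        parts = line.split()
        if len(parts) >= 3 and parts[0] == "#pragma" and parts[1] == "kernel":
            ids[parts[2]] = len(ids)
    return ids


def find_kernel(source, name):
    """Return the id for a kernel name, or raise KeyError if it is missing."""
    return kernel_ids(source)[name]


print(find_kernel(SHADER_SOURCE, "CSMain1"))  # 0
print(find_kernel(SHADER_SOURCE, "CSMain2"))  # 1
```

Looking the id up by name means the script keeps working if kernels are later added or reordered in the shader.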

About now you may be thinking that this concept of groups of threads is a bit confusing. Why can't I just use one group of threads? Well, you can, but there is a reason that threads are arranged into groups by the GPU. For a start, a thread group is limited in the number of threads it can have (defined by the line “[numthreads(x,y,z)]” in the shader). This limit is currently 1024 threads per group (the product x*y*z) but may change with new hardware. For example, you can have a maximum of “numthreads(1024,1,1)” for 1D, “numthreads(32,32,1)” for 2D and so on. You can however have any number of groups of threads, and as you will often be processing data with millions of elements the concept of thread groups is essential. Threads in a group can also share memory, and this can be used to make dramatic performance gains for certain algorithms, but I will cover that in a future post.
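The arithmetic behind this is straightforward, and a short Python sketch shows it. It uses the 1024-thread limit quoted in the text; the helper names `threads_per_group` and `groups_needed` are made up for illustration.

```python
# Arithmetic for sizing thread groups. Plain Python, not shader code.

MAX_THREADS_PER_GROUP = 1024  # the limit quoted in the text


def threads_per_group(x, y, z):
    """Total threads in a group for [numthreads(x,y,z)]."""
    total = x * y * z
    if total > MAX_THREADS_PER_GROUP:
        raise ValueError(f"{total} threads exceeds {MAX_THREADS_PER_GROUP}")
    return total


def groups_needed(element_count, threads_x):
    """Smallest number of 1D groups that covers every element."""
    return (element_count + threads_x - 1) // threads_x


print(threads_per_group(32, 32, 1))     # 1024
print(groups_needed(1_000_000, 1024))   # 977
```

When the element count is not an exact multiple of the group size, the last group contains spare threads, so a kernel would need to skip any position past the end of the data.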

Well, I think that about covers kernels and thread groups. There is just one more thing I want to cover: how to pass uniforms into your shader. This works the same as in Cg shaders, but there is no uniform keyword. For the most part this is relatively simple, but there are a few gotchas, so I will briefly go over it.

For example if you want to pass in a float you need this line in the shader…

and this line in your script…

To set a vector you need this in the shader…

and this in the script…

You can only pass in a Vector4 from the script but your uniform can be a float, float2, float3 or float4. It will be filled with the appropriate values.
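As an illustration only, the Python sketch below models one reading of “filled with the appropriate values”: the uniform receives the leading components of the Vector4 (x, then y, and so on). That reading is an assumption, and `fill_uniform` is a made-up name rather than a Unity call.

```python
# Model of a Vector4 filling a smaller uniform. NOT Unity's implementation.
# Assumption: the uniform takes the leading components (x, y, z, w in order).


def fill_uniform(vector4, component_count):
    """Return the components a float, float2, float3 or float4 would receive."""
    if len(vector4) != 4:
        raise ValueError("expected exactly 4 components")
    if not 1 <= component_count <= 4:
        raise ValueError("component_count must be between 1 and 4")
    return tuple(vector4[:component_count])


vec = (1.0, 2.0, 3.0, 4.0)
print(fill_uniform(vec, 1))  # (1.0,)
print(fill_uniform(vec, 3))  # (1.0, 2.0, 3.0)
```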

Now here's where it gets tricky. You can pass in arrays of values. Note that this first example won't work. I will explain why. You need this line in your shader…

and this in your script…

Now this won't work. Whether this is by design or a bug in Unity I don't know. You need to use vectors as uniforms for this to work. In your shader…

and your script…

This works. You can also use a float2 or float3. Just not a single float. You can also have arrays of vectors. In your shader…

and your script…

So here we have an array of two float4s, and it is set from an array of 8 floats in the script. The same principles apply when setting matrices. In your shader…
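The mapping from a flat array of floats to an array of float4s can be pictured with a short Python model. It assumes consecutive floats fill each float4 in order, which the text implies but does not spell out; `pack_float4_array` is a hypothetical helper, not part of Unity.

```python
# Model of a flat float array viewed as an array of float4s. NOT Unity code.
# Assumption: consecutive floats fill each float4 in order.


def pack_float4_array(values):
    """Group a flat list of floats into 4-component tuples."""
    if len(values) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    return [tuple(values[i:i + 4]) for i in range(0, len(values), 4)]


flat = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
packed = pack_float4_array(flat)
print(packed)  # [(1.0, 2.0, 3.0, 4.0), (5.0, 6.0, 7.0, 8.0)]
```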

and your script…

And of course you can have arrays of matrices. In your shader…

and your script…

This same logic does not seem to apply to float2x2 or float3x3. Again, whether this is a bug or design I don’t know.

I think that about covers it for today. The next part will cover how to use textures in your compute shaders. You can also download the project file for the kernel example. It's rather basic, but it's there if you need it. I will be adding to the same project file for each tutorial I do.

Project Files (Unity 5.4).
