CUDA: Nested Kernel Launches on the GPU, Compiling and Running
2017-11-28 21:59
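This program comes from the Chinese translated edition of CUDA C Programming (《CUDA C语言编程中文译文版》). If it infringes any copyright, please contact me and it will be removed. It is shared here for learning and discussion only.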
The program:
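#include <stdio.h>
#include <stdlib.h>        // atoi, exit
#include <cuda_runtime.h>

// The original includes "../common/common.h" (the book's sample-code helper
// header) for the CHECK macro. The inline definition below is an assumed
// equivalent so that this file compiles on its own.
#define CHECK(call)                                                      \
    do {                                                                 \
        const cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                       \
            fprintf(stderr, "Error: %s:%d, code: %d, reason: %s\n",      \
                    __FILE__, __LINE__, (int)err_,                       \
                    cudaGetErrorString(err_));                           \
            exit(1);                                                     \
        }                                                                \
    } while (0)

/*
 * A simple example of nested kernel launches from the GPU. Each thread displays
 * its information when execution begins; thread 0 then launches the next
 * nesting level and prints a diagnostic line. Device-side launches are
 * asynchronous, so that line is printed right after the launch, not after the
 * child grid completes.
 */
__global__ void nestedHelloWorld(int const iSize, int iDepth)
{
    int tid = threadIdx.x;
    printf("Recursion=%d: Hello World from thread %d block %d\n",
           iDepth, tid, blockIdx.x);

    // condition to stop recursive execution
    if (iSize == 1) return;

    // reduce block size to half
    int nthreads = iSize >> 1;

    // thread 0 launches child grid recursively
    if (tid == 0 && nthreads > 0)
    {
        nestedHelloWorld<<<1, nthreads>>>(nthreads, ++iDepth);
        printf("-------> nested execution depth: %d\n", iDepth);
    }
}

int main(int argc, char **argv)
{
    int size = 8;
    int blocksize = 8;   // initial block size
    int igrid = 1;

    if (argc > 1)
    {
        igrid = atoi(argv[1]);
        size = igrid * blocksize;
    }

    dim3 block(blocksize, 1);
    dim3 grid((size + block.x - 1) / block.x, 1);
    printf("%s Execution Configuration: grid %d block %d\n",
           argv[0], grid.x, block.x);

    nestedHelloWorld<<<grid, block>>>(block.x, 0);

    CHECK(cudaGetLastError());       // catch launch-configuration errors
    CHECK(cudaDeviceSynchronize());  // wait for the parent grid and all child grids
    CHECK(cudaDeviceReset());
    return 0;
}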
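Compile and run. Dynamic parallelism requires a GPU of compute capability 3.5 or higher (hence -arch=sm_35), relocatable device code (-rdc=true), and linking against the device runtime library (-lcudadevrt):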
-bash-4.1$ nvcc -o a nestedHelloWorld.cu -arch=sm_35 -rdc=true -lcudadevrt
-bash-4.1$ ./a 2
./a Execution Configuration: grid 2 block 8
Recursion=0: Hello World from thread 0 block 1
Recursion=0: Hello World from thread 1 block 1
Recursion=0: Hello World from thread 2 block 1
Recursion=0: Hello World from thread 3 block 1
Recursion=0: Hello World from thread 4 block 1
Recursion=0: Hello World from thread 5 block 1
Recursion=0: Hello World from thread 6 block 1
Recursion=0: Hello World from thread 7 block 1
Recursion=0: Hello World from thread 0 block 0
Recursion=0: Hello World from thread 1 block 0
Recursion=0: Hello World from thread 2 block 0
Recursion=0: Hello World from thread 3 block 0
Recursion=0: Hello World from thread 4 block 0
Recursion=0: Hello World from thread 5 block 0
Recursion=0: Hello World from thread 6 block 0
Recursion=0: Hello World from thread 7 block 0
-------> nested execution depth: 1
-------> nested execution depth: 1
Recursion=1: Hello World from thread 0 block 0
Recursion=1: Hello World from thread 1 block 0
Recursion=1: Hello World from thread 2 block 0
Recursion=1: Hello World from thread 3 block 0
Recursion=1: Hello World from thread 0 block 0
Recursion=1: Hello World from thread 1 block 0
Recursion=1: Hello World from thread 2 block 0
Recursion=1: Hello World from thread 3 block 0
-------> nested execution depth: 2
-------> nested execution depth: 2
Recursion=2: Hello World from thread 0 block 0
Recursion=2: Hello World from thread 1 block 0
Recursion=2: Hello World from thread 0 block 0
Recursion=2: Hello World from thread 1 block 0
-------> nested execution depth: 3
-------> nested execution depth: 3
Recursion=3: Hello World from thread 0 block 0
Recursion=3: Hello World from thread 0 block 0
-bash-4.1$