CUDAExample-0-asyncAPI
2015-10-10 18:17
Tags: CUDAExample
This sample illustrates the use of CUDA events both for GPU timing and for overlapping CPU and GPU execution. Events are inserted into a stream of CUDA calls. Since CUDA stream calls are asynchronous, the CPU can perform computations while the GPU is executing (including DMA memcopies between the host and device). The CPU can query CUDA events to determine whether the GPU has completed its tasks.
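The sample mainly shows that the GPU and CPU can execute at the same time: while the GPU is working, the CPU is working too. In the example, an addition is launched on the GPU, the CPU increments a counter in the meantime, and the program then prints the GPU time and the CPU count.
Querying the GPU hardware information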
    int devID;
    cudaDeviceProp deviceProps; // struct variable that holds the device properties

    // This will pick the best possible CUDA capable device
    devID = findCudaDevice(argc, (const char **)argv); // find a suitable device; only one is selected

    // get device name
    checkCudaErrors(cudaGetDeviceProperties(&deviceProps, devID));
    printf("CUDA device [%s]\n", deviceProps.name);
Output:
CUDA device [GeForce GTX 980]
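The function findCudaDevice() comes from the header file helper_cuda.h. It selects the device to use: a device specified by the user takes priority, and if the user did not specify one, it picks the device with the highest Gflops (billions of floating-point operations per second). It returns the device ID.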
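The selection logic described above can be approximated with plain CUDA runtime calls. What follows is a minimal sketch, not the actual helper_cuda.h implementation: the function name pickDevice and its parameter requestedDev are hypothetical, and the score (multiprocessor count times clock rate) is only a rough stand-in for the helper's Gflops estimate, which also takes the number of cores per multiprocessor into account. Error checking is mostly omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of helper_cuda.h): returns the device to use,
// or -1 if no CUDA device is available.
// requestedDev < 0 means "the user did not specify a device".
int pickDevice(int requestedDev)
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        return -1; // no CUDA-capable device available
    }

    // Prefer the device the user asked for, if that index exists.
    if (requestedDev >= 0 && requestedDev < count) {
        return requestedDev;
    }

    // Otherwise pick the device with the highest rough throughput score.
    int best = 0;
    long long bestScore = -1;
    for (int dev = 0; dev < count; ++dev) {
        int smCount = 0;
        int clockKHz = 0;
        cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, dev);
        cudaDeviceGetAttribute(&clockKHz, cudaDevAttrClockRate, dev);

        // Assumption: SM count * clock rate is a crude proxy for peak Gflops.
        long long score = (long long)smCount * clockKHz;
        if (score > bestScore) {
            bestScore = score;
            best = dev;
        }
    }
    return best;
}

int main()
{
    int devID = pickDevice(-1); // -1: no user preference
    if (devID < 0) {
        printf("No CUDA device found\n");
        return 1;
    }
    cudaSetDevice(devID);

    cudaDeviceProp props;
    cudaGetDeviceProperties(&props, devID);
    printf("CUDA device [%s]\n", props.name);
    return 0;
}
```

On a machine with a single GPU the loop has only one candidate, so the sketch simply returns device 0.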
Main code
    // create cuda event handles
    cudaEvent_t start, stop;
    checkCudaErrors(cudaEventCreate(&start));
    checkCudaErrors(cudaEventCreate(&stop));

    StopWatchInterface *timer = NULL;
    sdkCreateTimer(&timer);
    sdkResetTimer(&timer);

    checkCudaErrors(cudaDeviceSynchronize());
    float gpu_time = 0.0f;

    // asynchronously issue work to the GPU (all to stream 0)
    sdkStartTimer(&timer);
    cudaEventRecord(start, 0);
    cudaMemcpyAsync(d_a, a, nbytes, cudaMemcpyHostToDevice, 0);
    increment_kernel<<<blocks, threads, 0, 0>>>(d_a, value); // kernel that performs the addition in parallel on the GPU
    cudaMemcpyAsync(a, d_a, nbytes, cudaMemcpyDeviceToHost, 0);
    cudaEventRecord(stop, 0);
    sdkStopTimer(&timer);

    // have CPU do some work while waiting for stage 1 to finish
    unsigned long int counter = 0;
    while (cudaEventQuery(stop) == cudaErrorNotReady) // the CPU counts until the GPU finishes
    {
        counter++;
    }

    checkCudaErrors(cudaEventElapsedTime(&gpu_time, start, stop));

    // print the cpu and gpu times
    printf("time spent executing by the GPU: %.2f\n", gpu_time);
    printf("time spent by CPU in CUDA calls: %.2f\n", sdkGetTimerValue(&timer));
    printf("CPU executed %lu iterations while waiting for GPU to finish\n", counter);
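The excerpt above leaves out the kernel and the memory setup. The following self-contained sketch fills those in under stated assumptions: the kernel body, the problem size n, the increment value, and the block size of 512 threads are guesses made for illustration and are not taken from the post. The names a, d_a, nbytes, value, blocks and threads follow the excerpt. The host buffer is allocated with cudaMallocHost because cudaMemcpyAsync only overlaps with host execution when the host memory is page-locked. The SDK timer and checkCudaErrors are dropped so that the sketch depends only on the CUDA runtime.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Assumed kernel body: each thread adds inc_value to one element.
// There is no bounds check, so n must be a multiple of the block size.
__global__ void increment_kernel(int *g_data, int inc_value)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    g_data[idx] = g_data[idx] + inc_value;
}

int main()
{
    const int n = 16 * 1024 * 1024;        // assumed problem size (multiple of 512)
    const size_t nbytes = n * sizeof(int);
    const int value = 26;                  // assumed increment

    int *a = NULL;    // host buffer (page-locked)
    int *d_a = NULL;  // device buffer
    cudaMallocHost((void **)&a, nbytes);   // pinned memory enables truly async copies
    cudaMalloc((void **)&d_a, nbytes);
    memset(a, 0, nbytes);

    dim3 threads(512);                     // assumed block size
    dim3 blocks(n / threads.x);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Queue copy-in, kernel and copy-out on stream 0; these calls return
    // to the CPU before the GPU has finished the work.
    cudaEventRecord(start, 0);
    cudaMemcpyAsync(d_a, a, nbytes, cudaMemcpyHostToDevice, 0);
    increment_kernel<<<blocks, threads, 0, 0>>>(d_a, value);
    cudaMemcpyAsync(a, d_a, nbytes, cudaMemcpyDeviceToHost, 0);
    cudaEventRecord(stop, 0);

    // The CPU keeps counting until the stop event has been reached.
    unsigned long counter = 0;
    while (cudaEventQuery(stop) == cudaErrorNotReady) {
        counter++;
    }

    float gpu_time = 0.0f;                 // milliseconds between the two events
    cudaEventElapsedTime(&gpu_time, start, stop);

    // Every element started at 0, so each one should now equal value.
    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (a[i] != value) {
            ok = false;
            break;
        }
    }

    printf("GPU time: %.2f ms, CPU iterations: %lu, result %s\n",
           gpu_time, counter, ok ? "correct" : "WRONG");

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFreeHost(a);
    cudaFree(d_a);
    return ok ? 0 : 1;
}
```

The iteration count depends on the hardware and on how long the copies and the kernel take, so it will differ from run to run and from the numbers quoted in this post.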
Output
CUDA device [GeForce GTX 980]
time spent executing by the GPU: 12.40
time spent by CPU in CUDA calls: 0.04
CPU executed 2439 iterations while waiting for GPU to finish
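The output shows that the CPU and GPU can work at the same time: the CPU spent only 0.04 ms issuing the CUDA calls and then completed 2439 loop iterations while the GPU took 12.40 ms to finish (cudaEventElapsedTime and sdkGetTimerValue both report milliseconds).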
End