
CUDAExample-0-asyncAPI

2015-10-10 18:17
Tags: CUDAExample

This sample illustrates the use of CUDA events both for GPU timing and for overlapping CPU and GPU execution. Events are inserted into a stream of CUDA calls. Since CUDA stream calls are asynchronous, the CPU can perform computations while the GPU is executing (including DMA memory copies between the host and the device). The CPU can query CUDA events to determine whether the GPU has completed its tasks.

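The sample mainly demonstrates that the GPU and the CPU can execute at the same time: while the GPU is working, the CPU keeps working too. The example first launches an addition on the GPU, counts on the CPU in the meantime, and then prints the GPU time and the CPU count.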

Querying the GPU hardware information

int devID;
cudaDeviceProp deviceProps; // struct that receives the device's hardware properties

// This will pick the best possible CUDA capable device
devID = findCudaDevice(argc, (const char **)argv);   // finds a suitable device; only one is selected

// get device name
checkCudaErrors(cudaGetDeviceProperties(&deviceProps, devID));
printf("CUDA device [%s]\n", deviceProps.name);


Output:

CUDA device [GeForce GTX 980]

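The function findCudaDevice() comes from the header helper_cuda.h. It selects the device: a device specified by the user takes priority; if the user specifies none, it picks the device with the highest Gflops (billions of floating-point operations per second) and returns its device ID.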
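
The fallback rule described above (pick the device with the highest estimated throughput) can be sketched without the samples' helper headers. This is only an illustration: pickFastestDevice is a hypothetical name, and the score used here (SM count times clock rate) is an assumed rough proxy, not necessarily the estimate that helper_cuda.h computes.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not part of helper_cuda.h): returns the ID of the
// device with the largest "SM count x clock rate" product, or -1 if no
// device can be queried. The product is an assumed proxy for throughput.
int pickFastestDevice()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0)
        return -1;

    int best = -1;
    long long bestScore = -1;
    for (int dev = 0; dev < count; ++dev) {
        int smCount = 0, clockKHz = 0;
        if (cudaDeviceGetAttribute(&smCount, cudaDevAttrMultiProcessorCount, dev) != cudaSuccess ||
            cudaDeviceGetAttribute(&clockKHz, cudaDevAttrClockRate, dev) != cudaSuccess)
            continue;  // skip devices whose attributes cannot be read

        long long score = (long long)smCount * clockKHz;
        if (score > bestScore) {
            bestScore = score;
            best = dev;
        }
    }
    return best;
}
```

A program would typically call cudaSetDevice() with the returned ID once at startup, after checking that the ID is not -1.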

Main code

// create cuda event handles
cudaEvent_t start, stop;
checkCudaErrors(cudaEventCreate(&start));
checkCudaErrors(cudaEventCreate(&stop));

StopWatchInterface *timer = NULL;
sdkCreateTimer(&timer);
sdkResetTimer(&timer);

checkCudaErrors(cudaDeviceSynchronize());
float gpu_time = 0.0f;

// asynchronously issue work to the GPU (all to stream 0)
sdkStartTimer(&timer);
cudaEventRecord(start, 0);
cudaMemcpyAsync(d_a, a, nbytes, cudaMemcpyHostToDevice, 0);
increment_kernel<<<blocks, threads, 0, 0>>>(d_a, value); // kernel that performs the addition in parallel on the GPU
cudaMemcpyAsync(a, d_a, nbytes, cudaMemcpyDeviceToHost, 0);
cudaEventRecord(stop, 0);
sdkStopTimer(&timer);

// have CPU do some work while waiting for stage 1 to finish
unsigned long int counter=0;

while (cudaEventQuery(stop) == cudaErrorNotReady) // the CPU keeps counting until the GPU has finished
{
    counter++;
}

checkCudaErrors(cudaEventElapsedTime(&gpu_time, start, stop));

// print the cpu and gpu times (both values are in milliseconds)
printf("time spent executing by the GPU: %.2f\n", gpu_time);
printf("time spent by CPU in CUDA calls: %.2f\n", sdkGetTimerValue(&timer));
printf("CPU executed %lu iterations while waiting for GPU to finish\n", counter);
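
The excerpt above does not show the kernel, the memory allocation, or the cleanup. The following is a minimal sketch of how those pieces could fit together without the samples' helper headers. Everything not present in the excerpt is an assumption made for illustration: the CHECK macro (a stand-in for checkCudaErrors), the AsyncResult struct, the runAsyncIncrement function, the extra bounds parameter n in the kernel, and the block size of 256. The host buffer is allocated with cudaMallocHost because cudaMemcpyAsync generally needs page-locked host memory to overlap with CPU execution.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Assumed stand-in for checkCudaErrors: print the error and exit.
#define CHECK(call)                                               \
    do {                                                          \
        cudaError_t err_ = (call);                                \
        if (err_ != cudaSuccess) {                                \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,    \
                    cudaGetErrorString(err_));                    \
            exit(EXIT_FAILURE);                                   \
        }                                                         \
    } while (0)

// Each thread adds inc_value to one element. The bounds check (an addition
// in this sketch) allows n that is not a multiple of the block size.
__global__ void increment_kernel(int *g_data, int inc_value, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        g_data[idx] += inc_value;
}

struct AsyncResult {
    float gpu_ms;           // time between the start and stop events, in ms
    unsigned long counter;  // CPU loop iterations while the GPU was busy
    bool correct;           // true if every element equals value afterwards
};

// Hypothetical driver; n must be positive.
AsyncResult runAsyncIncrement(int n, int value)
{
    size_t nbytes = (size_t)n * sizeof(int);
    int *a = nullptr, *d_a = nullptr;
    CHECK(cudaMallocHost((void **)&a, nbytes));  // page-locked host memory
    CHECK(cudaMalloc((void **)&d_a, nbytes));
    for (int i = 0; i < n; ++i) a[i] = 0;

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;

    // Work is queued on stream 0 and executed by the GPU in issue order.
    CHECK(cudaEventRecord(start, 0));
    CHECK(cudaMemcpyAsync(d_a, a, nbytes, cudaMemcpyHostToDevice, 0));
    increment_kernel<<<blocks, threads, 0, 0>>>(d_a, value, n);
    CHECK(cudaGetLastError());  // reports launch-configuration errors
    CHECK(cudaMemcpyAsync(a, d_a, nbytes, cudaMemcpyDeviceToHost, 0));
    CHECK(cudaEventRecord(stop, 0));

    AsyncResult r = {0.0f, 0, true};
    cudaError_t status;
    while ((status = cudaEventQuery(stop)) == cudaErrorNotReady)
        r.counter++;  // CPU work that overlaps with the GPU
    CHECK(status);    // any status other than cudaSuccess is a real error

    CHECK(cudaEventElapsedTime(&r.gpu_ms, start, stop));
    for (int i = 0; i < n; ++i)
        if (a[i] != value) { r.correct = false; break; }

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFreeHost(a));
    CHECK(cudaFree(d_a));
    return r;
}
```

Calling runAsyncIncrement(16 * 1024 * 1024, 26) from main() and printing the three fields would give output of the same kind as the listing above. Unlike the original loop, this version checks the final status of cudaEventQuery, so an error other than cudaErrorNotReady does not silently end the wait.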


Output

CUDA device [GeForce GTX 980]

time spent executing by the GPU: 12.40

time spent by CPU in CUDA calls: 0.04

CPU executed 2439 iterations while waiting for GPU to finish

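The output shows that the CPU and the GPU can work at the same time: issuing the CUDA calls took the CPU only 0.04 ms, and it then completed 2439 loop iterations while the GPU spent 12.40 ms on the two copies and the kernel.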
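
Polling with cudaEventQuery keeps a CPU core busy, which is useful only when the CPU has work to do in the loop. When it has none, a blocking wait is a common alternative. The sketch below is one way to write it; waitForGpu is a hypothetical helper, and the choice of the cudaEventBlockingSync flag (which lets the waiting host thread block instead of spin) is an assumption about what the caller wants.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: blocks the calling host thread until all work
// queued on `stream` so far has finished, and returns the resulting status.
cudaError_t waitForGpu(cudaStream_t stream)
{
    cudaEvent_t done;
    // Blocking sync: the host thread waits without busy-polling.
    // Timing is disabled because this event is only used for waiting.
    cudaError_t err = cudaEventCreateWithFlags(
        &done, cudaEventBlockingSync | cudaEventDisableTiming);
    if (err != cudaSuccess)
        return err;

    err = cudaEventRecord(done, stream);
    if (err == cudaSuccess)
        err = cudaEventSynchronize(done);  // returns once the event has completed

    cudaEventDestroy(done);
    return err;
}
```

In the sample's setting, waitForGpu(0) could take the place of the counting loop; the start and stop events would still be needed for cudaEventElapsedTime, since the event created here has timing disabled.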

End