Integral image with CUDA
2015-09-20 21:29
Blelloch:
For an input array of size n, the scan algorithm has a computational complexity of O(n) and consists of two phases: the reduce phase (also called the up-sweep phase) and the down-sweep phase. We can visualize the reduce phase as building a binary tree (Figure 1): each level halves the number of nodes and performs one addition per node.
Since the operations are performed in place in shared memory, the tree is not an actual data structure; it only helps to explain the algorithm.
In the down-sweep phase, we traverse the tree from the root to the leaves and use the partial sums computed in the reduce phase to obtain the scanned array. Note that the last element is set to zero at the start of this phase, and that zero propagates to the beginning of the array, which makes the result an exclusive scan.
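The two phases described above can be sketched sequentially on the CPU. This is a minimal illustration, not the CUDPP implementation: the function name `blelloch_exclusive_scan` is hypothetical, and the sketch assumes the array length is a power of two. In a CUDA kernel, the iterations of each inner loop would be carried out by parallel threads working on shared memory.

```cpp
#include <cstddef>
#include <vector>

// Sequential sketch of the Blelloch exclusive scan (hypothetical helper).
// Assumption: data.size() is a power of two (or zero).
void blelloch_exclusive_scan(std::vector<float>& data) {
    const std::size_t n = data.size();
    if (n == 0) return;

    // Up-sweep (reduce): each level halves the number of active nodes
    // and performs one addition per node, in place.
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            data[i] += data[i - stride];
        }
    }

    // The root currently holds the total sum; replace it with zero.
    data[n - 1] = 0.0f;

    // Down-sweep: walk from the root to the leaves. Each node passes its
    // value to the left child and (its value + old left child) to the right.
    for (std::size_t stride = n / 2; stride >= 1; stride /= 2) {
        for (std::size_t i = 2 * stride - 1; i < n; i += 2 * stride) {
            const float left = data[i - stride];
            data[i - stride] = data[i];
            data[i] += left;
        }
    }
}
```

For example, scanning {3, 1, 7, 0, 4, 1, 6, 3} in place yields {0, 3, 4, 11, 11, 15, 16, 22}: every output element is the sum of all inputs strictly before it.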
Computing the integral image with CUDA (CUDPP):
// Create the CUDPP library handle.
CUDPPHandle cudpp_lib;
CUDPP_SAFE_CALL( cudppCreate(&cudpp_lib) );

// Set up CUDPP multiscan plans for computing the integral image:
// one for the image rows and one for the rows of the transposed image.
CUDPPHandle mscan_plan, mscan_tr_plan;
CUDPPConfiguration cudpp_conf;
cudpp_conf.op = CUDPP_ADD;
cudpp_conf.datatype = CUDPP_FLOAT;
cudpp_conf.algorithm = CUDPP_SCAN;
// Forward inclusive scan: each output element includes its own input value.
cudpp_conf.options = CUDPP_OPTION_FORWARD | CUDPP_OPTION_INCLUSIVE;
CUDPP_SAFE_CALL( cudppPlan(cudpp_lib, &mscan_plan, cudpp_conf,
                           img_width, img_height, gray_img_pitch / sizeof(float)) );
CUDPP_SAFE_CALL( cudppPlan(cudpp_lib, &mscan_tr_plan, cudpp_conf,
                           img_height, img_width, int_img_tr_pitch / sizeof(float)) );

// Step 1: prefix-sum every row of the grayscale image.
CUDPP_SAFE_CALL( cudppMultiScan(mscan_plan, d_int_img, d_gray_img, img_width, img_height) );

// Step 2: transpose so that the columns become rows.
transposeGPU(d_int_img_tr, int_img_tr_pitch,
             d_int_img, int_img_pitch,
             img_width, img_height);

// Step 3: prefix-sum every row of the transposed image (the original columns).
CUDPP_SAFE_CALL( cudppMultiScan(mscan_tr_plan, d_int_img_tr2, d_int_img_tr, img_height, img_width) );

// Step 4: transpose back; d_int_img now holds the integral image.
transposeGPU(d_int_img, int_img_pitch,
             d_int_img_tr2, int_img_tr_pitch,
             img_height, img_width);

// Release the plans and the library handle.
CUDPP_SAFE_CALL( cudppDestroyPlan(mscan_plan) );
CUDPP_SAFE_CALL( cudppDestroyPlan(mscan_tr_plan) );
CUDPP_SAFE_CALL( cudppDestroy( cudpp_lib ) );

// Free the input and temporary device buffers (d_int_img, the result, is kept).
CUDA_SAFE_CALL( cudaFree(d_rgb_img) );
CUDA_SAFE_CALL( cudaFree(d_gray_img) );
CUDA_SAFE_CALL( cudaFree(d_int_img_tr) );
CUDA_SAFE_CALL( cudaFree(d_int_img_tr2) );
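To check the GPU result, it may help to have a small CPU reference that follows the same four steps: scan the rows, transpose, scan the rows again, transpose back. The sketch below is only an illustration under stated assumptions: the helper names `scan_rows`, `transpose`, and `integral_image` are hypothetical, and the image is assumed to be a tightly packed row-major float array with no pitch padding.

```cpp
#include <cstddef>
#include <vector>

// Inclusive prefix sum along each row of a rows x cols row-major matrix.
void scan_rows(std::vector<float>& m, std::size_t rows, std::size_t cols) {
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 1; c < cols; ++c) {
            m[r * cols + c] += m[r * cols + c - 1];
        }
    }
}

// Returns the cols x rows transpose of a rows x cols row-major matrix.
std::vector<float> transpose(const std::vector<float>& m,
                             std::size_t rows, std::size_t cols) {
    std::vector<float> t(m.size());
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            t[c * rows + r] = m[r * cols + c];
        }
    }
    return t;
}

// Integral image via row scan, transpose, row scan, transpose back.
// Assumption: img holds width * height floats, row-major, unpadded.
std::vector<float> integral_image(const std::vector<float>& img,
                                  std::size_t width, std::size_t height) {
    std::vector<float> out = img;
    scan_rows(out, height, width);        // sums along each image row
    out = transpose(out, height, width);  // now width rows, height columns
    scan_rows(out, width, height);        // sums along the original columns
    return transpose(out, width, height); // back to height rows, width columns
}
```

For a 3-wide, 2-high image {1, 2, 3, 4, 5, 6}, this yields {1, 3, 6, 5, 12, 21}; the bottom-right entry is the sum of all pixels.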