Python and Image Processing, Part 2

2016-09-27 23:34

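1. np.argmin and np.argmax

Analysis: min and max are Python built-in functions, while np.argmin and np.argmax are functions of the NumPy library (also available as ndarray methods). NumPy provides np.max and np.min as well.

np.argmax() and np.argmin() return the index at which the maximum or minimum value occurs. np.argsort() returns, for each position of the sorted data, the index that element had in the original array.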
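The behaviour described above can be checked with a small sketch. The array values are arbitrary illustration data, and `kind="stable"` is chosen here only so that ties keep their original order.

```python
import numpy as np

a = np.array([3, 1, 4, 1, 5, 9, 2, 6])

i_min = int(np.argmin(a))   # index of the first occurrence of the minimum
i_max = int(np.argmax(a))   # index of the first occurrence of the maximum

# argsort returns the original indices in sorted order
order = np.argsort(a, kind="stable")
sorted_a = a[order]

# For a 2-D array, argmax without an axis works on the flattened array;
# unravel_index converts the flat index back to (row, column).
m = np.array([[1, 7], [9, 2]])
flat = int(np.argmax(m))
row, col = (int(v) for v in np.unravel_index(flat, m.shape))
per_column = np.argmax(m, axis=0)   # row index of the maximum in each column
```

Indexing the array with the result of argsort, as in `a[order]`, is the usual way to reorder one array by the sort order of another.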

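2. Vision-based measurement

(1) Stereo vision: stereo vision is the ability to perceive the distance and shape of objects when a scene is observed with two eyes. Its main applications include autonomous navigation systems for mobile robots, aerial and remote-sensing survey, and industrial automation systems. [1][2]

(2) Photogrammetry: photogrammetry is the discipline that studies how to acquire image information about a measured object by photographic means and analyze it geometrically and physically, so as to provide various kinds of data about the nature of the photographed object.

(3) Binocular stereo vision: binocular stereo vision uses two cameras to capture the same scene and reconstructs the 3D scene from the two views, possibly going as far as a full 3D display.

(4) X-ray applications: X-rays are commonly used for fluoroscopic examination in medicine and for flaw detection in industry. They can be detected with ionization meters, scintillation counters, and photographic emulsion film. X-ray diffraction has become an important means of studying crystal structure, morphology, and various defects.

(5) Lidar: a radar that operates in the infrared and visible bands and uses a laser as its working beam is called lidar. The basic principle is to transmit a probe signal (a laser beam) toward the target, compare the signal reflected back from the target (the target echo) with the transmitted signal, and process the result to obtain information about the target, such as its range, bearing, altitude, velocity, attitude, and even shape, so that targets such as aircraft and missiles can be detected, tracked, and identified.

Note: a camera is a mapping between 3D space and a 2D image. This mapping is determined by the camera's geometric model, that is, by what are usually called the camera parameters, which are matrices describing the specific properties of the camera mapping. The process of solving for these parameters is called camera calibration.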
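As an illustration of a camera acting as a 3D-to-2D mapping, here is a minimal pinhole-projection sketch. The intrinsic matrix `K` (focal lengths 800, principal point (320, 240)) and the identity pose are made-up values for the example, not parameters from the original post; a real calibration would estimate them.

```python
import numpy as np

# Hypothetical intrinsic parameters (assumed for illustration only)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Extrinsic parameters: camera at the world origin, no rotation
R = np.eye(3)
t = np.zeros((3, 1))

# 3x4 projection matrix P = K [R | t]
P = K @ np.hstack([R, t])

def project(P, X):
    """Project a 3D point X = (x, y, z) to pixel coordinates (u, v)."""
    Xh = np.append(np.asarray(X, dtype=float), 1.0)  # homogeneous coordinates
    x = P @ Xh
    return x[:2] / x[2]                              # perspective division

uv = project(P, [1.0, 0.0, 2.0])
```

Calibration is the inverse problem: given many known 3D points and their observed pixels, solve for the entries of `K`, `R`, and `t`.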

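3. Motion estimation and motion compensation

Analysis:

(1) Motion estimation: the basic idea is to divide each frame of an image sequence into many non-overlapping macroblocks and to assume that all pixels within a macroblock share the same displacement. For each macroblock, a given search range in the reference frame is searched, according to some matching criterion, for the block most similar to the current block; this is the matching block. The relative displacement between the matching block and the current block is the motion vector. In video compression, storing only the motion vectors and the residual data is enough to fully recover the current block. Motion estimation is mainly used in video coding and video processing.

(2) Motion compensation: motion compensation is a way of describing the difference between adjacent frames (adjacent in coding order; the two frames need not be adjacent in display order). Specifically, it describes how each small block of the previous frame (previous in coding order, not necessarily earlier than the current frame in display order) moves to some position in the current frame. Video compression schemes and video codecs commonly use it to reduce temporal redundancy in a video sequence, and it can also be used for deinterlacing.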
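The block-matching idea can be sketched as a full search over a small window. This is a simplified illustration, not any particular codec's algorithm: the sum of absolute differences (SAD) is assumed as the matching criterion, and the function name, block size, and search range are hypothetical choices.

```python
import numpy as np

def block_match(cur, ref, top, left, block=8, search=4):
    """Full-search block matching for one block, using SAD as the criterion.

    Returns the motion vector (dy, dx) pointing from the current block to
    its best match in the reference frame, and the SAD of that match.
    """
    target = cur[top:top + block, left:left + block].astype(np.int32)
    best_sad, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            # Skip candidates that fall outside the reference frame
            if y < 0 or x < 0 or y + block > ref.shape[0] or x + block > ref.shape[1]:
                continue
            cand = ref[y:y + block, x:x + block].astype(np.int32)
            sad = int(np.abs(target - cand).sum())
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (dy, dx)
    return best_mv, best_sad

# Synthetic test: the current frame is the reference shifted 2 rows down
# and 3 columns left, so the match lies 2 rows up and 3 columns right.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
cur = np.roll(ref, shift=(2, -3), axis=(0, 1))

mv, sad = block_match(cur, ref, top=8, left=8)

# Motion compensation: predict the block from the reference frame and
# keep only the residual (all zeros here, because the match is exact).
dy, dx = mv
predicted = ref[8 + dy:16 + dy, 8 + dx:16 + dx].astype(np.int32)
residual = cur[8:16, 8:16].astype(np.int32) - predicted
```

Storing `mv` and `residual` is enough to rebuild the current block as `predicted + residual`, which is the point made in item (1).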

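4. Image features [3]

Analysis:

(1) HOG features: the Histogram of Oriented Gradient (HOG) feature is a feature descriptor used for object detection in computer vision and image processing. It builds the feature by computing and accumulating histograms of gradient orientations over local regions of the image. HOG features combined with an SVM classifier have been widely used in image recognition and have been especially successful in pedestrian detection.

(2) LBP features: the Local Binary Pattern (LBP) is an operator that describes the local texture of an image. Its notable advantages include gray-scale invariance and, in its rotation-invariant variants, rotation invariance.

(3) Haar features: Haar features fall into four categories: edge features, line features, center features, and diagonal features, which are combined into feature templates. A template contains white and black rectangles, and its feature value is defined as the sum of the pixels under the white rectangles minus the sum of the pixels under the black rectangles. The Haar feature value reflects how the gray level of the image changes.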
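The Haar feature value defined in item (3) can be sketched with an integral image (summed-area table), which makes every rectangle sum a four-lookup operation. The function names and the left-white/right-black layout are assumptions made for this example.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with an extra zero row and column at the top-left."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = img.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii

def rect_sum(ii, y, x, h, w):
    """Sum of the h-by-w rectangle whose top-left pixel is (y, x)."""
    return int(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])

def haar_edge_feature(ii, y, x, h, w):
    """Two-rectangle edge feature: white (left half) minus black (right half)."""
    half = w // 2
    white = rect_sum(ii, y, x, h, half)
    black = rect_sum(ii, y, x + half, h, half)
    return white - black

# Tiny test image: bright left half, dark right half
img = np.zeros((4, 4), dtype=np.uint8)
img[:, :2] = 10

ii = integral_image(img)
value = haar_edge_feature(ii, 0, 0, 4, 4)
```

A large positive or negative value indicates a strong vertical edge inside the window; a value near zero indicates a flat region.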

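5. faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3, minSize=(10,10))

Analysis:

(1) scaleFactor: the scale step between successive detection passes, i.e. how much the image is shrunk at each scale. A larger value makes it easier to miss objects but speeds up detection; a smaller value makes detection finer and more accurate but slower.

(2) minNeighbors: the larger the value, the stricter the condition for accepting a detection; the smaller the value, the looser the condition.

(3) minSize: the minimum object size, in pixels (width × height), so that objects fall within the size range of the detector; smaller objects are ignored.

(4) maxSize: the maximum object size, in pixels (width × height), so that objects fall within the size range of the detector; larger objects are ignored.

Note: the method returns a sequence of rectangles (in the Python binding, typically an N×4 NumPy array). Each entry has four values: the x and y coordinates of the top-left corner of the face, followed by the width and height of the face region.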

6. NumPy mathematical functions [5]

Analysis: common mathematical functions provided by NumPy include sin, cos, exp, all, alltrue, any, apply_along_axis, argmax, argmin, argsort, average, bincount, ceil, clip, conj, conjugate, corrcoef, cov, cross, cumprod, cumsum, diff, dot, floor, inner, inv, lexsort, max, maximum, mean, median, min, minimum, nonzero, outer, prod, real, round, sometrue, sort, std, sum, trace, transpose, var, vdot, and vectorize. (alltrue and sometrue are older aliases of all and any, and inv lives in numpy.linalg.)

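7. OTSU (maximum between-class variance method)

Analysis:

OTSU (the maximum between-class variance method) is an algorithm that computes the optimal threshold of a grayscale image. It works as follows:

(1) First compute the histogram of the grayscale image and normalize it, which gives the probability with which each gray level from 0 to 255 occurs in the image. That is, if gray level $i$ occurs $n_{i}$ times and the image contains $N$ pixels in total, the probability of that gray level is $P_{i} = n_{i} / N$.

(2) A threshold $k$ splits the grayscale image into two classes, $A$ and $B$, and it is easy to obtain the probability of occurrence and the mean gray level of each class.

(3) Compute the between-class variance of classes $A$ and $B$. The between-class variance reaches its maximum at the optimal threshold $K$; in other words, the threshold at which the between-class variance is largest is the optimal threshold of the grayscale image.

Note: common threshold segmentation algorithms include Otsu, maximum entropy, the iterative method, adaptive thresholding, manual thresholding, and basic global thresholding.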
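The three Otsu steps above can be sketched directly in NumPy. This is a minimal illustration assuming an 8-bit grayscale input; the function name is hypothetical, and the between-class variance is written in the common form $(\mu_T\,\omega(k) - \mu(k))^2 / (\omega(k)(1-\omega(k)))$, where $\omega(k)$ is the probability of class $A$ and $\mu(k)$ its cumulative mean.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the gray level k that maximizes the between-class variance."""
    # Step 1: normalized histogram, P_i = n_i / N
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()

    # Step 2: class probability and cumulative mean for every candidate k
    omega = np.cumsum(p)                    # probability of class A (levels <= k)
    mu = np.cumsum(p * np.arange(256))      # cumulative mean up to level k
    mu_total = mu[-1]

    # Step 3: between-class variance; thresholds with an empty class get 0
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_total * omega - mu) ** 2 / denom
    sigma_b2[denom == 0] = 0.0
    return int(np.argmax(sigma_b2))

# Synthetic bimodal image: half the pixels at level 50, half at level 200
gray = np.array([[50] * 8 + [200] * 8] * 4, dtype=np.uint8)
k = otsu_threshold(gray)
binary = np.where(gray > k, 255, 0).astype(np.uint8)
```

With OpenCV available, passing the `cv2.THRESH_OTSU` flag to `cv2.threshold` should give a comparable threshold without writing the loop by hand.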

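8. Video input

import cv2

# Set up video capture
cap = cv2.VideoCapture(0)
while True:
    ret, im = cap.read()
    if not ret:  # stop if no frame could be read
        break
    cv2.imshow('video test', im)
    key = cv2.waitKey(10)
    if key == 27:  # ESC
        break
    if key == ord(' '):  # space bar
        cv2.imwrite('vid_result.jpg', im)
# Release the camera and close the window
cap.release()
cv2.destroyAllWindows()

Note: if only one camera is connected to the computer, its id is 0. Pressing the ESC key exits the application, and pressing the space bar saves the current video frame. [6]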

9. VLFeat [7]

Analysis: an open-source computer vision library that implements algorithms such as SIFT, MSER, k-means, hierarchical k-means, agglomerative information bottleneck, and quick shift.

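10. HSV

Analysis:

In the HSV model, the color parameters are hue (H), saturation (S), and value (V), as follows:

(1) H carries the color information, i.e. the position of the color in the spectrum. It is expressed as an angle: red, green, and blue are 120 degrees apart, and complementary colors are 180 degrees apart.

(2) Saturation S is a ratio ranging from 0 to 1; it is the ratio between the purity of the selected color and the maximum purity of that color. When S=0, only gray levels remain.

(3) V indicates how bright the color is, ranging from 0 to 1. Note that it has no direct relationship with light intensity.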
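The angular layout of the hue channel can be checked with Python's standard `colorsys` module, which returns H, S, and V all in the range 0 to 1. The helper name `hsv_degrees` is an assumption for this sketch.

```python
import colorsys

def hsv_degrees(r, g, b):
    """Convert RGB in [0, 1] to (hue in degrees, saturation, value)."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return round(h * 360), s, v

red = hsv_degrees(1.0, 0.0, 0.0)
green = hsv_degrees(0.0, 1.0, 0.0)
blue = hsv_degrees(0.0, 0.0, 1.0)
cyan = hsv_degrees(0.0, 1.0, 1.0)    # complementary to red
gray = hsv_degrees(0.5, 0.5, 0.5)    # no color: saturation is 0
```

Scaling conventions differ between libraries. For 8-bit images OpenCV stores H as half the angle (0 to 179) and S and V in 0 to 255, so thresholds taken from a 0 to 360 color picker need rescaling before use there.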

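11. Attributes of the NumPy ndarray object

Analysis:

(1) ndarray.ndim: the number of dimensions of the array (i.e. the number of axes), equal to its rank. The most common case is a two-dimensional array (a matrix).

(2) ndarray.shape: the dimensions of the array, given as a tuple of integers with the size of the array along each dimension. For a two-dimensional array, it holds the number of rows and the number of columns. The length of the shape tuple is the number of dimensions, i.e. the ndim attribute.

(3) ndarray.size: the total number of elements in the array, equal to the product of the elements of shape.

(4) ndarray.dtype: an object describing the type of the elements in the array. A dtype can be created or specified using standard Python types.

(5) ndarray.itemsize: the size in bytes of each element of the array. For example, an array with elements of type float64 has an itemsize of 8 (64/8), and one with elements of type int32 has an itemsize of 4 (32/8).

(6) ndarray.data: the buffer containing the actual elements of the array. Elements are normally accessed through indexing, so this attribute is rarely needed.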
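The attributes listed above can be inspected on a small array. The shape and dtypes below are arbitrary illustration values.

```python
import numpy as np

a = np.zeros((3, 4), dtype=np.float64)

ndim = a.ndim            # number of axes
shape = a.shape          # size along each axis
size = a.size            # total number of elements = 3 * 4
dtype = a.dtype          # element type
itemsize = a.itemsize    # bytes per element
nbytes = a.nbytes        # total bytes = size * itemsize

# Converting the element type changes itemsize but not shape or size
b = a.astype(np.int32)
```

For an image loaded through OpenCV's Python binding, `shape` is usually (height, width, channels) and `dtype` is usually uint8, which is a quick sanity check after reading a file.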

12. cv2.inRange

Analysis: cv2.inRange(src, lowerb, upperb[, dst]) → dst

(1) src: first input array.

(2) lowerb: inclusive lower boundary array or a scalar.

(3) upperb: inclusive upper boundary array or a scalar.

(4) dst: output array of the same size as src and CV_8U type.

Note: Checks if array elements lie between the elements of two other arrays.
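To make the semantics concrete, here is a NumPy-only sketch that mimics the documented behaviour: inclusive bounds, with a multi-channel pixel passing only if every channel is in range. It is an illustration under those assumptions, not OpenCV's implementation, and `in_range` is a hypothetical name.

```python
import numpy as np

def in_range(src, lowerb, upperb):
    """Return a uint8 mask: 255 where lowerb <= src <= upperb, else 0."""
    src = np.asarray(src)
    inside = (src >= np.asarray(lowerb)) & (src <= np.asarray(upperb))
    if inside.ndim == 3:
        # A pixel passes only if all of its channels are within bounds
        inside = inside.all(axis=2)
    return inside.astype(np.uint8) * 255

# One row with two 3-channel pixels (values are arbitrary)
hsv = np.array([[[30, 200, 200], [100, 50, 50]]], dtype=np.uint8)
mask = in_range(hsv, [20, 100, 100], [40, 255, 255])
```

The resulting single-channel mask has the same height and width as the input, which is why it can be passed straight to functions that accept a mask argument.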

13. cv2.resize

Analysis: cv2.resize(src, dsize[, dst[, fx[, fy[, interpolation]]]]) → dst

(1) dsize: output image size; either dsize or both fx and fy must be non-zero. If it equals zero, it is computed as:

dsize = Size(round(fx*src.cols), round(fy*src.rows)).

(2) dst: output image; it has the size dsize (when it is non-zero) or the size computed from src.size(), fx, and fy; the type of dst is the same as of src.

(3) fx: scale factor along the horizontal axis; when it equals 0, it is computed as:

(double)dsize.width/src.cols.

(4) fy: scale factor along the vertical axis; when it equals 0, it is computed as:

(double)dsize.height/src.rows.

14. cv2.getStructuringElement

Analysis: cv2.getStructuringElement(shape, ksize[, anchor]) → retval

(1) shape: Element shape that could be one of the following: MORPH_RECT, MORPH_ELLIPSE, MORPH_CROSS, CV_SHAPE_CUSTOM.

(2) ksize: Size of the structuring element.

Note: defining the structuring element is at the core of morphological processing.

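15. cv2.GaussianBlur (Gaussian blur)

Analysis: cv2.GaussianBlur(src, ksize, sigmaX[, dst[, sigmaY[, borderType]]]) → dst

(1) ksize: Gaussian kernel size. ksize.width and ksize.height can differ but they both must be positive and odd. Or, they can be zeros, and then they are computed from sigma.

(2) sigmaX: Gaussian kernel standard deviation in X direction.

(3) sigmaY: Gaussian kernel standard deviation in Y direction; if sigmaY is zero, it is set to be equal to sigmaX; if both sigmas are zeros, they are computed from ksize.width and ksize.height.

Note: for a Gaussian template, we need to specify the height and width of the Gaussian kernel (both odd) and the standard deviations along the $x$ and $y$ directions. If only the $x$ value is given, the $y$ value is set equal to it; if both are 0, the function computes them itself from the kernel size. A Gaussian kernel is effective at removing Gaussian noise from an image.

Note: filtering operations are either linear (box filter, mean filter, Gaussian filter) or non-linear (median filter, bilateral filter).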
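The relationship between kernel size and standard deviation can be sketched by building the kernel by hand. The fallback `sigma = 0.3*((ksize-1)*0.5 - 1) + 0.8` is the formula documented for cv2.getGaussianKernel when sigma is not positive; the function name here is hypothetical, and OpenCV may use precomputed coefficients for small kernels, so exact values can differ slightly.

```python
import numpy as np

def gaussian_kernel_1d(ksize, sigma=0.0):
    """Normalized 1-D Gaussian kernel with an odd, positive size."""
    if ksize <= 0 or ksize % 2 == 0:
        raise ValueError("ksize must be positive and odd")
    if sigma <= 0:
        # Derive sigma from the kernel size when it is not given
        sigma = 0.3 * ((ksize - 1) * 0.5 - 1) + 0.8
    x = np.arange(ksize) - (ksize - 1) / 2.0     # offsets from the center
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()                            # weights sum to 1

# Separable 2-D kernel: outer product of the vertical and horizontal kernels
kx = gaussian_kernel_1d(5, 1.0)
ky = gaussian_kernel_1d(5, 1.0)
kernel2d = np.outer(ky, kx)

# Kernel built only from the size (sigma derived automatically)
auto = gaussian_kernel_1d(5)
```

Because the 2-D Gaussian is separable, blurring with `kx` along rows and then `ky` along columns gives the same result as convolving with `kernel2d`, at lower cost.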

16. cv2.bitwise_and

Analysis: cv2.bitwise_and(src1, src2[, dst[, mask]]) → dst

(1) src1: first input array or a scalar.

(2) src2: second input array or a scalar.

(3) mask: optional operation mask, 8-bit single channel array, that specifies elements of the output array to be changed.

Note: the related functions are bitwise_not (NOT), bitwise_xor (XOR), bitwise_or (OR), and bitwise_and (AND).
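A common use of the mask argument is `bitwise_and(img, img, mask=mask)`, which keeps pixels where the mask is non-zero. The NumPy sketch below imitates that pattern under the assumption that masked-out output pixels are zero (as when no pre-filled dst is supplied); `bitwise_and_masked` is a hypothetical helper, not an OpenCV function.

```python
import numpy as np

def bitwise_and_masked(src1, src2, mask):
    """AND two arrays, then zero every pixel where mask == 0."""
    out = np.bitwise_and(src1, src2)
    keep = mask != 0
    if out.ndim == 3:
        keep = keep[:, :, np.newaxis]    # broadcast the mask over channels
    return np.where(keep, out, 0).astype(src1.dtype)

# One row with two BGR pixels; the mask keeps only the first pixel
img = np.array([[[10, 20, 30], [40, 50, 60]]], dtype=np.uint8)
mask = np.array([[255, 0]], dtype=np.uint8)
kept = bitwise_and_masked(img, img, mask)

# The other bitwise operations on 8-bit values
x = np.array([0b00001111], dtype=np.uint8)
y = np.array([0b00111100], dtype=np.uint8)
and_xy = np.bitwise_and(x, y)
or_xy = np.bitwise_or(x, y)
xor_xy = np.bitwise_xor(x, y)
not_x = np.bitwise_not(x)
```

Combined with a mask produced by a range check or a threshold, this is the usual way to cut a colored region out of an image.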

17. OpenCV modules

Analysis:

(1) Core functionality: a compact module defining basic data structures, including the dense multi-dimensional array Mat and basic functions used by all other modules.

(2) Image processing: an image processing module that includes linear and non-linear image filtering, geometrical image transformations (resize, affine and perspective warping, generic table-based remapping), color space conversion, histograms, and so on.

(3) video: a video analysis module that includes motion estimation, background subtraction, and object tracking algorithms.

(4) calib3d: basic multiple-view geometry algorithms, single and stereo camera calibration, object pose estimation, stereo correspondence algorithms, and elements of 3D reconstruction.

(5) features2d: salient feature detectors, descriptors, and descriptor matchers.

(6) objdetect: detection of objects and instances of the predefined classes (for example, faces, eyes, mugs, people, cars, and so on).

(7) highgui: an easy-to-use interface to simple UI capabilities.

(8) videoio: an easy-to-use interface to video capturing and video codecs.

(9) gpu: GPU-accelerated algorithms from different OpenCV modules.

(10) other modules: some other helper modules, such as FLANN and Google test wrappers, Python bindings, and others.

The main modules are as follows:

(1) core: Core functionality
(2) imgproc: Image processing
(3) imgcodecs: Image file reading and writing
(4) videoio: Media I/O
(5) highgui: High-level GUI
(6) video: Video Analysis
(7) calib3d: Camera Calibration and 3D Reconstruction
(8) features2d: 2D Features Framework
(9) objdetect: Object Detection
(10) ml: Machine Learning
(11) flann: Clustering and Search in Multi-Dimensional Spaces
(12) photo: Computational Photography
(13) stitching: Images stitching
(14) cudaarithm: Operations on Matrices
(15) cudabgsegm: Background Segmentation
(16) cudacodec: Video Encoding/Decoding
(17) cudafeatures2d: Feature Detection and Description
(18) cudafilters: Image Filtering
(19) cudaimgproc: Image Processing
(20) cudalegacy: Legacy support
(21) cudaobjdetect: Object Detection
(22) cudaoptflow: Optical Flow
(23) cudastereo: Stereo Correspondence
(24) cudawarping: Image Warping
(25) cudev: Device layer
(26) shape: Shape Distance and Matching
(27) superres: Super Resolution
(28) videostab: Video Stabilization
(29) viz: 3D Visualizer

The extra modules are as follows:

(1) aruco: ArUco Marker Detection
(2) bgsegm: Improved Background-Foreground Segmentation Methods
(3) bioinspired: Biologically inspired vision models and derivated tools
(4) ccalib: Custom Calibration Pattern for 3D reconstruction
(5) cvv: GUI for Interactive Visual Debugging of Computer Vision Programs
(6) datasets: Framework for working with different datasets
(7) dnn: Deep Neural Network module
(8) dpm: Deformable Part-based Models
(9) face: Face Recognition
(10) fuzzy: Image processing based on fuzzy mathematics
(11) hdf: Hierarchical Data Format I/O routines
(12) line_descriptor: Binary descriptors for lines extracted from an image
(13) matlab: MATLAB Bridge
(14) optflow: Optical Flow Algorithms
(15) plot: Plot function for Mat data
(16) reg: Image Registration
(17) rgbd: RGB-Depth Processing
(18) saliency: Saliency API
(19) sfm: Structure From Motion
(20) stereo: Stereo Correspondance Algorithms
(21) structured_light: Structured Light API
(22) surface_matching: Surface Matching
(23) text: Scene Text Detection and Recognition
(24) tracking: Tracking API
(25) xfeatures2d: Extra 2D Features Framework
(26) ximgproc: Extended Image Processing
(27) xobjdetect: Extended object detection
(28) xphoto: Additional photo processing algorithms

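18. OpenCV data types [8]

Analysis:

(1) IplImage is derived from CvMat, and CvMat is derived from CvArr.

(2) IplImage is the basic structure used to encode images.

(3) Mat is a dense multi-dimensional data array that can be used to handle common multi-dimensional data such as matrices and images.

(4) The Mat and IplImage data types can be converted to each other.

Note: six functions commonly used for image operations (I/O) are the image loading function (cvLoadImage), the window creation function (cvNamedWindow), the image display function (cvShowImage), the image saving function (cvSaveImage), the image release function (cvReleaseImage), and the image conversion function (cvGetImage).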

19. Installing the Image Watch plug-in [9][10][11]

Analysis: Provides a watch window for visualizing in-memory images (bitmaps) when debugging native C++ code. Reference [11] includes an example, shown below:

// Test application for the Visual Studio Image Watch Debugger extension

#include <iostream>                      // std::cout
#include <opencv2/core/core.hpp>         // cv::Mat
#include <opencv2/highgui/highgui.hpp>   // cv::imread()
#include <opencv2/imgproc/imgproc.hpp>   // cv::Canny

using namespace std;
using namespace cv;

void help() {
    cout
        << "----------------------------------------------------" << endl
        << "This is a test program for the Image Watch Debugger " << endl
        << "plug-in for Visual Studio. The program loads an     " << endl
        << "image from a file and runs the Canny edge detector. " << endl
        << "No output is displayed or written to disk."           << endl
        << "Usage:"                                               << endl
        << "image-watch-demo inputimage"                          << endl
        << "----------------------------------------------------" << endl
        << endl;
}

int main(int argc, char* argv[]) {
    help();

    if (argc != 2) {
        cout << "Wrong number of parameters" << endl;
        return -1;
    }

    cout << "Loading input image: " << argv[1] << endl;
    Mat input;
    input = imread(argv[1], CV_LOAD_IMAGE_COLOR);

    cout << "Detecting edges in input image" << endl;
    Mat edges;
    Canny(input, edges, 10, 10);

    return 0;
}

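Analysis:

(1) argc is the number of command-line arguments, and argv stores all of them. Note that argv[0] holds the name under which the program was invoked (typically its full path when run from Visual Studio).

(2) To set input arguments (such as argv[1]) in VS 2013: Project -> Properties -> Configuration Properties -> Debugging -> Command Arguments.

(3) Specifying the color format of the loaded image: convert the input image to 3 channels (CV_LOAD_IMAGE_COLOR), to a single channel (CV_LOAD_IMAGE_GRAYSCALE), or load it as is (CV_LOAD_IMAGE_UNCHANGED).

(4) void Canny(InputArray image, OutputArray edges, double threshold1, double threshold2, int apertureSize=3, bool L2gradient=false). Here threshold1 is the first threshold for the hysteresis procedure and threshold2 is the second threshold for the hysteresis procedure.

(5) In essence, Image Watch visualizes the pixel values held in memory.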

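20. How image matrices are stored in memory [13]

Analysis:

(1) Grayscale image: pixels are stored row by row, one value per pixel.

(2) BGR image: pixels are stored row by row, and each pixel occupies three consecutive values in blue, green, red order.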

21. Common OpenCV operations

Analysis:

(1) imread(argv[1], CV_LOAD_IMAGE_COLOR);

(2) namedWindow( "Display window", WINDOW_AUTOSIZE );

(3) imshow( "Display window", image );

(4) waitKey(0);

(5) cvtColor( image, gray_image, CV_BGR2GRAY );

(6) imwrite("alpha.png", mat);

(7) IplImage* img = cvLoadImage("greatwave.png", 1); Mat mtx(img);

(8) Mat E = Mat::eye(4, 4, CV_64F);

(9) Mat O = Mat::ones(2, 2, CV_32F);

(10) Mat Z = Mat::zeros(3,3, CV_8UC1);

(11) double t = (double)getTickCount(); t = ((double)getTickCount() - t)/getTickFrequency();

(12) img.channels()

(13) img.depth()

(14) cv2.split(image)

(15) cv2.merge([B,G,R])

(16) cv2.warpAffine(image,M,(image.shape[1],image.shape[0])) # M = np.float32([[1,0,25],[0,1,50]])

(17) cv2.getRotationMatrix2D(center,45,0.75)

(18) cv2.flip(image,1) # horizontal flip

(19) cv2.flip(image,0) # vertical flip

(20) cv2.flip(image,-1) # horizontal and vertical flip

(21) cv2.calcHist([image],[0],None,[256],[0,256]) # histogram of a grayscale image

(22) chans = cv2.split(image); colors = ("b","g","r"); for (chan,color) in zip(chans,colors): hist = cv2.calcHist([chan],[0],None,[256],[0,256]) # histogram of a color image

(23) cv2.equalizeHist(image) # histogram equalization of a grayscale image

Note: cv2.imread('lena.jpg',0) loads a color image in grayscale mode.

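22. Closing and opening

Analysis: cv2.morphologyEx(src, op, kernel[, dst[, anchor[, iterations[, borderType[, borderValue]]]]]) → dst

(1) MORPH_OPEN: an opening operation

(2) MORPH_CLOSE: a closing operation

(3) MORPH_TOPHAT: top hat

(4) MORPH_BLACKHAT: black hat

Note: opening and closing are not reversible; applying an opening followed by a closing does not restore the original image. Opening generally smooths the contour of an object, breaks narrow necks, and removes thin protrusions. Closing also smooths parts of the contour but, unlike opening, it usually bridges narrow breaks and long thin gaps, removes small holes, and fills breaks in the contour.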
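The effects described in the note can be reproduced with a small binary-morphology sketch in NumPy. It assumes 0/1 images and a square k×k structuring element, and the function names are hypothetical; it is meant to show the idea, not to replace cv2.morphologyEx.

```python
import numpy as np

def erode(img, k=3):
    """Binary erosion: a pixel stays 1 only if its whole k x k window is 1."""
    p = np.pad(img, k // 2, mode="constant", constant_values=1)
    out = np.ones_like(img)
    for dy in range(k):
        for dx in range(k):
            out &= p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def dilate(img, k=3):
    """Binary dilation: a pixel becomes 1 if any pixel in its window is 1."""
    p = np.pad(img, k // 2, mode="constant", constant_values=0)
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out |= p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def opening(img, k=3):
    return dilate(erode(img, k), k)    # erosion followed by dilation

def closing(img, k=3):
    return erode(dilate(img, k), k)    # dilation followed by erosion

# Opening removes an isolated noise pixel but keeps the 3x3 square
noisy = np.zeros((7, 7), dtype=np.uint8)
noisy[2:5, 2:5] = 1
noisy[0, 6] = 1
opened = opening(noisy)

# Closing fills a one-pixel hole inside a 5x5 square
holed = np.zeros((9, 9), dtype=np.uint8)
holed[2:7, 2:7] = 1
holed[4, 4] = 0
closed = closing(holed)
```

Applying `closing(opened)` returns the clean square, not `noisy`, which illustrates why an opening followed by a closing does not recover the original image.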

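23. Fixed-threshold binarization [14]

Analysis: cv2.threshold(src, thresh, maxval, type[, dst]) → retval, dst

(1) thresh: threshold value.

(2) maxval: maximum value to use with the THRESH_BINARY thresholding types.

(3) type: thresholding type.

Note: with the THRESH_BINARY type, pixels whose gray value is greater than thresh are set to maxval (usually 255) and all other pixels are set to 0. Besides fixed-threshold binarization, there are also adaptive binarization based on the arithmetic mean and adaptive binarization based on the Gaussian-weighted mean.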
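The fixed-threshold rule amounts to a single comparison per pixel. Below is a NumPy sketch of the binary case only; `threshold_binary` is a hypothetical name and the sample values are arbitrary.

```python
import numpy as np

def threshold_binary(src, thresh, maxval=255):
    """Set pixels strictly greater than thresh to maxval and the rest to 0."""
    return np.where(src > thresh, maxval, 0).astype(src.dtype)

src = np.array([[0, 127, 128, 255]], dtype=np.uint8)
dst = threshold_binary(src, 127)
```

Note the strict comparison: a pixel exactly equal to the threshold falls into the zero class.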

24. cv2.absdiff

Analysis: cv2.absdiff(src1, src2[, dst]) → dst

Note: computes the per-element absolute difference between two images.
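One reason to use a dedicated absolute-difference function is that plain subtraction of unsigned 8-bit arrays wraps around. The NumPy sketch below shows the pitfall and a safe alternative; the sample values are arbitrary.

```python
import numpy as np

a = np.array([10, 200], dtype=np.uint8)
b = np.array([30, 50], dtype=np.uint8)

# Naive subtraction wraps modulo 256: 10 - 30 becomes 236
wrapped = a - b

# Widen to a signed type first, then take the absolute value
diff = np.abs(a.astype(np.int16) - b.astype(np.int16)).astype(np.uint8)
```

In frame differencing (for example, simple motion detection) the wrapped values would show up as false bright pixels, so the absolute difference is the quantity to threshold.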

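25. Extracting a submatrix of a 3D matrix

Analysis: the intersection of rows $i$ to $j$ and columns $m$ to $n$ is newImage = image[$i:j, m:n$], where image has size $M\times N\times K$. As with any Python slice, the end indices are exclusive, so the result covers rows $i$ through $j-1$ and columns $m$ through $n-1$ and keeps all $K$ channels.

Note: BLUE = [255,0,0]; RED = [0,0,255]; GREEN = [0,255,0]; BLACK = [0,0,0]; WHITE = [255,255,255] (in OpenCV's BGR channel order).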
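A short sketch of the slicing rule, using an all-black test image with arbitrary dimensions. It also shows that a slice is a view onto the original array, so a `.copy()` is needed for an independent region.

```python
import numpy as np

image = np.zeros((6, 8, 3), dtype=np.uint8)   # M=6 rows, N=8 columns, K=3 channels
BLUE = [255, 0, 0]                             # BGR order

i, j, m, n = 1, 4, 2, 6
new_image = image[i:j, m:n]      # rows 1..3, columns 2..5, all channels
new_image[:] = BLUE              # a slice is a view: this paints the original too

roi_copy = image[i:j, m:n].copy()   # independent copy of the same region
roi_copy[:] = 0                     # does not affect image
```

Painting through the view is a convenient way to fill a rectangular region of an image with a color.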

References:

[1] Stereo vision: http://baike.baidu.com/link?url=fZNDBeFeWGgBUO_iVmfdk3IL0FEH91IBYdUxHKqBfgxG_j2IrkyrT-0J99xadheDB6arQqh7YUBezrUhyee6EQTFP6JE0jPunjDOJlyTjkGQZ25aSpNdkIrnGB-tYUQF

[2] The mysteries of human perception: http://amuseum.cdstm.cn/AMuseum/perceptive/index.htm

[3] Three key tools for image feature extraction: HOG, LBP, and Haar features: http://www.open-open.com/lib/view/open1440832074794.html

[4] Tuning the parameters of Haar-feature-based AdaBoost cascade classifiers for object recognition: http://blog.csdn.net/shadow_guo/article/details/44114421

[5] NumPy tutorial: mathematical functions and basic statistical functions: http://blog.csdn.net/pipisorry/article/details/41214819

[6] Code for seven common threshold segmentation methods: http://www.cnblogs.com/skyseraph/archive/2010/12/21/1913058.html

[7] VLFeat: http://www.vlfeat.org/doc.html

[8] OpenCV basic data types: http://blog.csdn.net/abcjennifer/article/details/7629349

[9] Image Watch: https://visualstudiogallery.msdn.microsoft.com/e682d542-7ef3-402c-b857-bbfba714f78d

[10] IMAGE WATCH HELP: http://research.microsoft.com/en-us/um/redmond/groups/ivm/imagewatchhelp/imagewatchhelp.htm

[11] Image Watch: viewing in-memory images in the Visual Studio debugger: http://docs.opencv.org/2.4/doc/tutorials/introduction/windows_visual_studio_image_watch/windows_visual_studio_image_watch.html#windows-visual-studio-image-watch

[12] Debugging code in VS 2013: http://jingyan.baidu.com/season/48337

[13] How CvMat matrices are stored in memory in OpenCV: http://blog.sina.com.cn/s/blog_74f32c400101b140.html

[14] Adaptive image binarization with Python OpenCV: http://blog.csdn.net/deerlux/article/details/48477219
