卷积神经网络CNN-高级
2017-08-28 09:52
381 查看
卷积神经网络CNN-高级
AlexNet:现代神经网络起源
结构源自ImageNet比赛。1. AlexNet结构
在卷积的时候,我们会依据这个公式来提取特征图: 【img_size - filter_size】/stride +1 = new_feture_size
INPUT:224×224×3(RGB图像),实际上会经过预处理变为227×227×3的大小;
CONV1:使用的96个大小规格为11×11的卷积核,stride为4,pad为0,进行特征提取,(ps:图上之所以看起来是48个是由于采用了2个GPU服务器处理,每一个服务器上承担了48个),得到两个55×55×96大小的特征图,([227-11] / 4 + 1 )= 55;
MAX POOL1:使用3×3大小的filters,stride为2,进行最大池化,得到27×27×96池化后的特征图;
NORM1:归一化层;
CONV2:使用的256个大小规格为5×5的卷积核,stride为1,pad为2,进行特征提取,得到两个27×27×128大小的特征图;
MAX POOL2:使用3×3大小的filters,stride为2,进行最大池化,得到13×13×256池化后的特征图;
NORM2:归一化层;
CONV3:使用的384个大小规格为3×3的卷积核,stride为1,pad为1,得到13×13×384个特征图;
CONV4:使用的384个大小规格为3×3的卷积核,stride为1,pad为1,得到13×13×384个特征图;
CONV5:使用的256个大小规格为3×3的卷积核,stride为1,pad为1,得到13×13×256个特征图;
MAX POOL3:使用3×3大小的filters,stride为2,进行最大池化,得到6×6×256池化后的特征图;
FC6:4096个神经元;
FC7:4096个神经元;
FC8:1000个神经元。
2. VGG:AlexNet增强版
AlexNet和VGG结构对比
Group表示用多个卷积层代替一个卷积层。
VGG结构
VGG作用
结构简单,与AlexNet结构类似;性能优异,较AlexNet提升明显,与GoogleNet、ResNet表现接近;
选择最多的基本模型,方便进行结构的优化设计。
3. GoogleNet:多维度识别
结构发展变化
结构存在的问题:直接使用3×3等大小的卷积核,会使得结构参数变得很多;
使用1×1卷积核的好处:会用少的数据量,将数据降维,减少特征图的“厚度”。
结构的细节
全卷积结构FCN
一般的神经网络:卷积层+全连接层;全卷积网络:没有全连接层。(全连接层需要的参数很多)
特点:
输入图片大小无限制;
空间信息有丢失;
参数更少,表达能力更强。
4. ResNet:机器超越人类识别
结构特征
问题:为什么ResNet有效?
1.前向计算:低层卷积网络高层卷积网络信息融合;层数越深,模型的表力越强;
2.反向计算:导数传递更直接,越过模型,直达各层;
5. DeepFace:结构化图片的特殊处理
人脸识别:通过观察人脸确定对应的身份,在应用中跟多的是确认(verification)。人脸识别数据的特点
结构化:所有人脸,组成相似,理论上能够实现对齐;差异化:相同位置,形貌不同;
局部卷积
每个卷积核固定某个区域,不移动;不同区域之间不共享卷积核;
卷积核参数由固定区域数据确定。
全局部卷积连接的缺陷
预处理:大量对准,对准要求高,原始信息可能丢失;卷积参数数量很大,模型收敛难度大,需要大量数据;
模型可扩展性差,基本限于人脸计算。
6. U-Net:图片生成网络
通过卷积神经网络生成特殊类型的图片,图片所有pixel需要生成,多目标回归。VGG U-Net
卷积-逆卷积,池化-反池化
反池化记住原有位置,而不是resize;
逆卷积
实质:有学习能力的上采样。其生成图片具有更好的连贯性,具有更好的空间表达能力。
图片分割图生成
7. VGG实例:
以下是一个VGG16的TensorFlow模型。需要用到的工具函数—
utils.py:
import skimage import skimage.io import skimage.transform import numpy as np # synset = [l.strip() for l in open('synset.txt').readlines()] # returns image of shape [224, 224, 3] # [height, width, depth] def load_image(path): # load image img = skimage.io.imread(path) img = img / 255.0 assert (0 <= img).all() and (img <= 1.0).all() # print "Original Image Shape: ", img.shape # we crop image from center short_edge = min(img.shape[:2]) yy = int((img.shape[0] - short_edge) / 2) xx = int((img.shape[1] - short_edge) / 2) crop_img = img[yy: yy + short_edge, xx: xx + short_edge] # resize to 224, 224 resized_img = skimage.transform.resize(crop_img, (224, 224)) return resized_img # returns the top1 string def print_prob(prob, file_path): synset = [l.strip() for l in open(file_path).readlines()] # print prob pred = np.argsort(prob)[::-1] # Get top1 label top1 = synset[pred[0]] print(("Top1: ", top1, prob[pred[0]])) # Get top5 label top5 = [(synset[pred[i]], prob[pred[i]]) for i in range(5)] print(("Top5: ", top5)) return top1 def load_image2(path, height=None, width=None): # load image img = skimage.io.imread(path) img = img / 255.0 if height is not None and width is not None: ny = height nx = width elif height is not None: ny = height nx = img.shape[1] * ny / img.shape[0] elif width is not None: nx = width ny = img.shape[0] * nx / img.shape[1] else: ny = img.shape[0] nx = img.shape[1] return skimage.transform.resize(img, (ny, nx)) def test(): img = skimage.io.imread("./test_data/starry_night.jpg") ny = 300 nx = img.shape[1] * ny / img.shape[0] img = skimage.transform.resize(img, (ny, nx)) skimage.io.imsave("./test_data/test/output.jpg", img) if __name__ == "__main__": test()
VGG16模型结构—
vgg16.py:
import inspect import os import numpy as np import tensorflow as tf import time VGG_MEAN = [103.939, 116.779, 123.68] class Vgg16: def __init__(self, vgg16_npy_path=None): if vgg16_npy_path is None: path = inspect.getfile(Vgg16) path = os.path.abspath(os.path.join(path, os.pardir)) path = os.path.join(path, "vgg16.npy") vgg16_npy_path = path print(path) self.data_dict = np.load(vgg16_npy_path, encoding='latin1').item() print("npy file loaded") def build(self, rgb): """ load variable from npy to build the VGG :param rgb: rgb image [batch, height, width, 3] values scaled [0, 1] """ start_time = time.time() print("build model started") rgb_scaled = rgb * 255.0 # Convert RGB to BGR red, green, blue = tf.split(axis=3, num_or_size_splits=3, value=rgb_scaled) assert red.get_shape().as_list()[1:] == [224, 224, 1] assert green.get_shape().as_list()[1:] == [224, 224, 1] assert blue.get_shape().as_list()[1:] == [224, 224, 1] bgr = tf.concat(axis=3, values=[ blue - VGG_MEAN[0], green - VGG_MEAN[1], red - VGG_MEAN[2], ]) assert bgr.get_shape().as_list()[1:] == [224, 224, 3] self.conv1_1 = self.conv_layer(bgr, "conv1_1") self.conv1_2 = self.conv_layer(self.conv1_1, "conv1_2") self.pool1 = self.max_pool(self.conv1_2, 'pool1') self.conv2_1 = self.conv_layer(self.pool1, "conv2_1") self.conv2_2 = self.conv_layer(self.conv2_1, "conv2_2") self.pool2 = self.max_pool(self.conv2_2, 'pool2') self.conv3_1 = self.conv_layer(self.pool2, "conv3_1") self.conv3_2 = self.conv_layer(self.conv3_1, "conv3_2") self.conv3_3 = self.conv_layer(self.conv3_2, "conv3_3") self.pool3 = self.max_pool(self.conv3_3, 'pool3') self.conv4_1 = self.conv_layer(self.pool3, "conv4_1") self.conv4_2 = self.conv_layer(self.conv4_1, "conv4_2") self.conv4_3 = self.conv_layer(self.conv4_2, "conv4_3") self.pool4 = self.max_pool(self.conv4_3, 'pool4') self.conv5_1 = self.conv_layer(self.pool4, "conv5_1") self.conv5_2 = self.conv_layer(self.conv5_1, "conv5_2") self.conv5_3 = self.conv_layer(self.conv5_2, "conv5_3") self.pool5 = self.max_pool(self.conv5_3, 'pool5') self.fc6 = self.fc_layer(self.pool5, "fc6") assert self.fc6.get_shape().as_list()[1:] == [4096] self.relu6 = tf.nn.relu(self.fc6) self.fc7 = self.fc_layer(self.relu6, "fc7") self.relu7 = tf.nn.relu(self.fc7) self.fc8 = self.fc_layer(self.relu7, "fc8") self.prob = tf.nn.softmax(self.fc8, name="prob") self.data_dict = None print(("build model finished: %ds" % (time.time() - start_time))) def avg_pool(self, bottom, name): return tf.nn.avg_pool(bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name=name) def max_pool(self, bottom, name): return tf.nn.max_pool(bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name=name) def conv_layer(self, bottom, name): with tf.variable_scope(name): filt = self.get_conv_filter(name) conv = tf.nn.conv2d(bottom, filt, [1, 1, 1, 1], padding='SAME') conv_biases = self.get_bias(name) bias = tf.nn.bias_add(conv, conv_biases) relu = tf.nn.relu(bias) return relu def fc_layer(self, bottom, name): with tf.variable_scope(name): shape = bottom.get_shape().as_list() dim = 1 for d in shape[1:]: dim *= d x = tf.reshape(bottom, [-1, dim]) weights = self.get_fc_weight(name) biases = self.get_bias(name) # Fully connected layer. Note that the '+' operation automatically # broadcasts the biases. fc = tf.nn.bias_add(tf.matmul(x, weights), biases) return fc def get_conv_filter(self, name): return tf.constant(self.data_dict[name][0], name="filter") def get_bias(self, name): return tf.constant(self.data_dict[name][1], name="biases") def get_fc_weight(self, name): return tf.constant(self.data_dict[name][0], name="weights")
导入训练好的VGG16模型参数进行训练—
vgg16_test.py:
import numpy as np import tensorflow as tf # import matplotlib.pyplot as plt import matplotlib.image as mpimg import skimage import vgg16 import utils img1 = utils.load_image("./test_data/dog.png") print img1.shape batch = img1.reshape((1, 224, 224, 3)) #plot the image # imgshow1=plt.imshow(img1) # with tf.Session(config=tf.ConfigProto(gpu_options=(tf.GPUOptions(per_process_gpu_memory_fraction=0.7)))) as sess: with tf.device('/cpu:0'): with tf.Session() as sess: images = tf.placeholder("float", [1, 224, 224, 3]) feed_dict = {images: batch} vgg = vgg16.Vgg16() with tf.name_scope("content_vgg"): vgg.build(images) prob = sess.run(vgg.prob, feed_dict=feed_dict) top5 = np.argsort(prob[0])[-1:-6:-1] for n, label in enumerate(top5): print label pool1 = sess.run(vgg.pool1, feed_dict=feed_dict) print pool1.shape conv3_3=sess.run(vgg.conv3_3, feed_dict=feed_dict) print conv3_3.shape #now let's plot the model filters vgg = vgg16.Vgg16() #get the saved parameter dict keys print vgg.data_dict.keys() #show the first conv layer filter_conv1=vgg.get_conv_filter("conv1_1") print 'filter_conv1', filter_conv1.shape tf.Print(filter_conv1[:,:,:,:5],[filter_conv1[:,:,:,:5]]) filter_conv3=vgg.get_conv_filter("conv3_3") print 'filter_conv3', filter_conv3.shape tf.Print(filter_conv3[:,:,:3,:5],[filter_conv3[:,:,:3,:5]])
相关文章推荐
- Deep Learning-TensorFlow (10) CNN卷积神经网络_ TFLearn 快速搭建深度学习模型
- 吴恩达deeplearning之CNN—卷积神经网络入门
- 深度学习笔记二-CNN(卷积神经网络)是什么?
- Deep Learning-TensorFlow (12) CNN卷积神经网络_ Network in Network 学习笔记
- 卷积神经网络CNN
- Matlab编程之——卷积神经网络CNN代码解析
- CNN(卷积神经网络)、RNN(循环神经网络)、DNN(深度神经网络)的内部网络结构有什么区别?
- TensorFlow:Chap6笔记总结(卷积神经网络CNN)
- Deep Learning模型之:CNN卷积神经网络(一)深度解析CNN
- 深度学习之卷积神经网络CNN及tensorflow代码实现示例
- 卷积神经网络(CNN)和Tensorflow初探——MNIST
- 深度学习|卷积神经网络(CNN)介绍(前篇)
- CNN笔记:通俗理解卷积神经网络
- (Convolutional Neural Networks)CNN-卷积神经网络学习
- (5)Deep Learning模型之:CNN卷积神经网络的全面详细概述之一
- 卷积神经网络(二):卷积神经网络CNN的BP算法
- CNN(卷积神经网络)、RNN(循环神经网络)、DNN(深度神经网络)
- 深度学习(DL)与卷积神经网络(CNN)学习笔记随笔-03-基于Python的LeNet之LR
- Deep Learning论文笔记之(五)CNN卷积神经网络代码理解
- 卷积神经网络(CNN)