
Convolutional Neural Networks (CNN): Advanced

2017-08-28 09:52


AlexNet: The Origin of Modern Neural Networks

The architecture came out of the ImageNet competition.

1. The AlexNet Architecture



During convolution, the size of the output feature map follows this formula: new_feature_size = (img_size + 2 × pad − filter_size) / stride + 1; with pad = 0 this reduces to (img_size − filter_size) / stride + 1.
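
As a quick sanity check, the formula can be wrapped in a small helper (a minimal sketch; the function name is our own):

def conv_output_size(img_size, filter_size, stride, pad=0):
    # output side length of a convolution or pooling layer
    return (img_size + 2 * pad - filter_size) // stride + 1

print(conv_output_size(227, 11, 4))  # CONV1: (227 - 11) / 4 + 1 = 55
print(conv_output_size(55, 3, 2))    # MAX POOL1: (55 - 3) / 2 + 1 = 27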

INPUT: a 224×224×3 RGB image; in practice it is preprocessed to 227×227×3;

CONV1: 96 kernels of size 11×11, stride 4, pad 0, extracting features. (The figure appears to show 48 kernels per path because the network was split across two GPUs, each carrying 48.) The result is a 55×55×96 feature map, computed as two 55×55×48 halves: (227 − 11) / 4 + 1 = 55;

MAX POOL1: max pooling with 3×3 filters, stride 2, producing a 27×27×96 feature map;

NORM1: a normalization (LRN) layer;

CONV2: 256 kernels of size 5×5, stride 1, pad 2, producing a 27×27×256 feature map (two 27×27×128 halves, one per GPU);

MAX POOL2: max pooling with 3×3 filters, stride 2, producing a 13×13×256 feature map;

NORM2: a normalization layer;

CONV3: 384 kernels of size 3×3, stride 1, pad 1, producing a 13×13×384 feature map;

CONV4: 384 kernels of size 3×3, stride 1, pad 1, producing a 13×13×384 feature map;

CONV5: 256 kernels of size 3×3, stride 1, pad 1, producing a 13×13×256 feature map;

MAX POOL3: max pooling with 3×3 filters, stride 2, producing a 6×6×256 feature map;

FC6: 4096 neurons;

FC7: 4096 neurons;

FC8: 1000 neurons, one per ImageNet class.
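
Running the helper defined above over the whole stack reproduces every spatial size in the list (a sketch; the layer tuples are transcribed from the list):

# (name, filter_size, stride, pad) for each size-changing layer, input 227
layers = [
    ("CONV1",     11, 4, 0),  # -> 55
    ("MAX POOL1",  3, 2, 0),  # -> 27
    ("CONV2",      5, 1, 2),  # -> 27
    ("MAX POOL2",  3, 2, 0),  # -> 13
    ("CONV3",      3, 1, 1),  # -> 13
    ("CONV4",      3, 1, 1),  # -> 13
    ("CONV5",      3, 1, 1),  # -> 13
    ("MAX POOL3",  3, 2, 0),  # -> 6
]
size = 227
for name, f, s, p in layers:
    size = conv_output_size(size, f, s, p)
    print(name, size)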

2. VGG: An Enhanced AlexNet

Comparison of the AlexNet and VGG architectures



Here "Group" means replacing a single convolutional layer with a stack of several convolutional layers; the parameter count below shows why this pays off.
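
Two stacked 3×3 convolutions cover the same 5×5 receptive field as a single 5×5 convolution but need fewer weights (a minimal sketch; the channel count C is an assumed example value):

# kernel parameters only (biases ignored), C input channels and C output channels
C = 256
one_5x5 = 5 * 5 * C * C        # a single 5x5 layer: 1,638,400
two_3x3 = 2 * (3 * 3 * C * C)  # two stacked 3x3 layers, same receptive field: 1,179,648
print(one_5x5, two_3x3)        # the stack saves about 28% of the parameters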

The VGG architecture



Why VGG matters

Simple structure, similar to AlexNet;

Excellent performance: a clear improvement over AlexNet, close to GoogLeNet and ResNet;

The most frequently chosen base model, convenient as a starting point for optimizing architectures.

3. GoogLeNet: Multi-Dimensional Recognition



Evolution of the architecture



Problem with the naive structure: directly applying 3×3 (and larger) kernels to thick feature maps makes the parameter count very large;

Benefit of 1×1 kernels: with few parameters they reduce the dimensionality of the data, shrinking the "thickness" (channel count) of the feature maps. The sketch below makes the saving concrete.
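
A bottleneck in the spirit of Inception (a sketch; all channel counts are assumed example values):

# 3x3 conv from 256 to 256 channels, directly vs. through a 1x1 bottleneck
direct = 3 * 3 * 256 * 256                        # 589,824 parameters
bottleneck = 1*1*256*64 + 3*3*64*64 + 1*1*64*256  # 69,632 parameters
# a 1x1 squeezes 256 channels to 64, the 3x3 runs on the thin map,
# and a final 1x1 expands back to 256 channels
print(direct, bottleneck)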

Details of the structure



Fully Convolutional Networks (FCN)

A typical neural network: convolutional layers + fully connected layers;

A fully convolutional network has no fully connected layers (which account for a large share of the parameters); a sketch follows the list below.

Characteristics:

No restriction on the input image size;

Some spatial information is lost;

Fewer parameters, stronger expressive power.
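
One common way to drop the FC layers is to replace them with 1×1 convolutions plus global average pooling; a minimal TF1-style sketch with assumed shapes and names:

import tensorflow as tf

# feature map from the last conv block; batch and spatial sizes stay unconstrained
feat = tf.placeholder(tf.float32, [None, None, None, 512])

# the FC layer becomes a 1x1 conv: 512 -> 1000 class scores at every position
w = tf.get_variable("fcn_w", [1, 1, 512, 1000])
scores = tf.nn.conv2d(feat, w, strides=[1, 1, 1, 1], padding="SAME")

# global average pooling collapses the grid to one score per class,
# which is why the input size is unrestricted
logits = tf.reduce_mean(scores, axis=[1, 2])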

4. ResNet: Machines Surpass Human Recognition

Structural features



Question: why does ResNet work?

1. Forward pass: information from low-level and high-level convolutional layers is fused; the deeper the network, the stronger the model's expressive power;

2. Backward pass: gradients propagate more directly, skipping over the stacked layers through the shortcuts and reaching every layer. A sketch of one block follows.
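
A single residual block in the same TF1 style as the VGG code later in this post (a sketch; names and shapes are our own):

import tensorflow as tf

def residual_block(x, channels, name):
    # y = x + F(x): two 3x3 convs whose output is added back onto the input
    with tf.variable_scope(name):
        w1 = tf.get_variable("w1", [3, 3, channels, channels])
        w2 = tf.get_variable("w2", [3, 3, channels, channels])
        h = tf.nn.relu(tf.nn.conv2d(x, w1, [1, 1, 1, 1], padding="SAME"))
        h = tf.nn.conv2d(h, w2, [1, 1, 1, 1], padding="SAME")
        # the identity shortcut gives gradients a direct path through the addition
        return tf.nn.relu(x + h)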

5. DeepFace: Special Handling of Structured Images

Face recognition: determining a person's identity from a face image; in applications, the more common task is verification.

Characteristics of face-recognition data

Structured: all faces share the same composition, so in theory they can be aligned;

Differentiated: at the same (aligned) position, appearance differs from face to face.

Local convolution

Each kernel is fixed to one region and does not slide;

Kernels are not shared across regions;

Each kernel's parameters are learned from the data of its own region (see the sketch after this list).
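
A minimal numpy sketch of local (unshared) convolution over non-overlapping patches; all sizes and names are assumed for illustration:

import numpy as np

def local_conv(img, kernels, patch):
    # every output position has a private kernel: no sliding, no weight sharing
    gh, gw = img.shape[0] // patch, img.shape[1] // patch
    out = np.zeros((gh, gw))
    for i in range(gh):
        for j in range(gw):
            region = img[i*patch:(i+1)*patch, j*patch:(j+1)*patch]
            out[i, j] = np.sum(region * kernels[i, j])  # kernel (i, j) sees only this region
    return out

img = np.random.rand(28, 28)
kernels = np.random.rand(7, 7, 4, 4)      # one 4x4 kernel per region: 49x the weights of one shared kernel
print(local_conv(img, kernels, 4).shape)  # (7, 7)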

Drawbacks of fully locally connected convolution

Preprocessing: extensive alignment is required, the alignment demands are high, and original information may be lost;

The number of convolution parameters is very large, so the model is hard to converge and needs a lot of data;

Poor extensibility: the model is essentially limited to face computation.

6. U-Net: An Image-Generation Network

A convolutional network that generates special types of images: every pixel of the output has to be generated, so this is a multi-target regression problem.

VGG U-Net



Convolution vs. deconvolution, pooling vs. unpooling

Unpooling

Remembers the original positions of the pooled maxima rather than simply resizing;

Deconvolution (transposed convolution)

In essence, upsampling with learnable weights: the generated images are more coherent and have better spatial expressiveness. A sketch follows.
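
A TF1-style sketch of a transposed convolution doubling the resolution (all shapes are assumed example values):

import tensorflow as tf

x = tf.placeholder(tf.float32, [1, 56, 56, 128])  # low-resolution feature map

# conv2d_transpose kernels are laid out as [height, width, out_channels, in_channels]
w = tf.get_variable("deconv_w", [4, 4, 64, 128])

# stride 2 doubles the spatial resolution: 56x56 -> 112x112
y = tf.nn.conv2d_transpose(x, w, output_shape=[1, 112, 112, 64],
                           strides=[1, 2, 2, 1], padding="SAME")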



Generating segmentation maps



7. A VGG Example

Below is a TensorFlow implementation of the VGG16 model.

Utility functions needed:
utils.py


import skimage
import skimage.io
import skimage.transform
import numpy as np

# synset = [l.strip() for l in open('synset.txt').readlines()]

# returns image of shape [224, 224, 3]
# [height, width, depth]
def load_image(path):
    # load image, scale pixel values to [0, 1]
    img = skimage.io.imread(path)
    img = img / 255.0
    assert (0 <= img).all() and (img <= 1.0).all()
    # print("Original Image Shape: ", img.shape)
    # we crop the image from the center to a square
    short_edge = min(img.shape[:2])
    yy = int((img.shape[0] - short_edge) / 2)
    xx = int((img.shape[1] - short_edge) / 2)
    crop_img = img[yy: yy + short_edge, xx: xx + short_edge]
    # resize to 224, 224
    resized_img = skimage.transform.resize(crop_img, (224, 224))
    return resized_img

# returns the top1 string
def print_prob(prob, file_path):
    synset = [l.strip() for l in open(file_path).readlines()]

    # sort class indices by descending probability
    pred = np.argsort(prob)[::-1]

    # Get top1 label
    top1 = synset[pred[0]]
    print(("Top1: ", top1, prob[pred[0]]))
    # Get top5 label
    top5 = [(synset[pred[i]], prob[pred[i]]) for i in range(5)]
    print(("Top5: ", top5))
    return top1

def load_image2(path, height=None, width=None):
    # load image, scale pixel values to [0, 1]
    img = skimage.io.imread(path)
    img = img / 255.0
    if height is not None and width is not None:
        ny = height
        nx = width
    elif height is not None:
        ny = height
        # int() keeps the aspect-ratio computation integral under Python 3
        nx = int(img.shape[1] * ny / img.shape[0])
    elif width is not None:
        nx = width
        ny = int(img.shape[0] * nx / img.shape[1])
    else:
        ny = img.shape[0]
        nx = img.shape[1]
    return skimage.transform.resize(img, (ny, nx))

def test():
    img = skimage.io.imread("./test_data/starry_night.jpg")
    ny = 300
    nx = int(img.shape[1] * ny / img.shape[0])
    img = skimage.transform.resize(img, (ny, nx))
    skimage.io.imsave("./test_data/test/output.jpg", img)

if __name__ == "__main__":
    test()


The VGG16 model definition:
vgg16.py


import inspect
import os

import numpy as np
import tensorflow as tf
import time

# BGR channel means of the ImageNet training set, subtracted during preprocessing
VGG_MEAN = [103.939, 116.779, 123.68]

class Vgg16:
    def __init__(self, vgg16_npy_path=None):
        if vgg16_npy_path is None:
            path = inspect.getfile(Vgg16)
            path = os.path.abspath(os.path.join(path, os.pardir))
            path = os.path.join(path, "vgg16.npy")
            vgg16_npy_path = path
            print(path)

        # allow_pickle is required by newer numpy versions to load the pickled dict
        self.data_dict = np.load(vgg16_npy_path, encoding='latin1', allow_pickle=True).item()
        print("npy file loaded")

    def build(self, rgb):
        """
        load variables from npy to build the VGG
        :param rgb: rgb image [batch, height, width, 3] values scaled [0, 1]
        """

        start_time = time.time()
        print("build model started")
        rgb_scaled = rgb * 255.0

        # Convert RGB to BGR
        red, green, blue = tf.split(axis=3, num_or_size_splits=3, value=rgb_scaled)
        assert red.get_shape().as_list()[1:] == [224, 224, 1]
        assert green.get_shape().as_list()[1:] == [224, 224, 1]
        assert blue.get_shape().as_list()[1:] == [224, 224, 1]
        bgr = tf.concat(axis=3, values=[
            blue - VGG_MEAN[0],
            green - VGG_MEAN[1],
            red - VGG_MEAN[2],
        ])
        assert bgr.get_shape().as_list()[1:] == [224, 224, 3]

        self.conv1_1 = self.conv_layer(bgr, "conv1_1")
        self.conv1_2 = self.conv_layer(self.conv1_1, "conv1_2")
        self.pool1 = self.max_pool(self.conv1_2, 'pool1')

        self.conv2_1 = self.conv_layer(self.pool1, "conv2_1")
        self.conv2_2 = self.conv_layer(self.conv2_1, "conv2_2")
        self.pool2 = self.max_pool(self.conv2_2, 'pool2')

        self.conv3_1 = self.conv_layer(self.pool2, "conv3_1")
        self.conv3_2 = self.conv_layer(self.conv3_1, "conv3_2")
        self.conv3_3 = self.conv_layer(self.conv3_2, "conv3_3")
        self.pool3 = self.max_pool(self.conv3_3, 'pool3')

        self.conv4_1 = self.conv_layer(self.pool3, "conv4_1")
        self.conv4_2 = self.conv_layer(self.conv4_1, "conv4_2")
        self.conv4_3 = self.conv_layer(self.conv4_2, "conv4_3")
        self.pool4 = self.max_pool(self.conv4_3, 'pool4')

        self.conv5_1 = self.conv_layer(self.pool4, "conv5_1")
        self.conv5_2 = self.conv_layer(self.conv5_1, "conv5_2")
        self.conv5_3 = self.conv_layer(self.conv5_2, "conv5_3")
        self.pool5 = self.max_pool(self.conv5_3, 'pool5')

        self.fc6 = self.fc_layer(self.pool5, "fc6")
        assert self.fc6.get_shape().as_list()[1:] == [4096]
        self.relu6 = tf.nn.relu(self.fc6)

        self.fc7 = self.fc_layer(self.relu6, "fc7")
        self.relu7 = tf.nn.relu(self.fc7)

        self.fc8 = self.fc_layer(self.relu7, "fc8")

        self.prob = tf.nn.softmax(self.fc8, name="prob")

        # free the weights dict once the graph holds all constants
        self.data_dict = None
        print(("build model finished: %ds" % (time.time() - start_time)))

    def avg_pool(self, bottom, name):
        return tf.nn.avg_pool(bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name=name)

    def max_pool(self, bottom, name):
        return tf.nn.max_pool(bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name=name)

    def conv_layer(self, bottom, name):
        with tf.variable_scope(name):
            filt = self.get_conv_filter(name)

            conv = tf.nn.conv2d(bottom, filt, [1, 1, 1, 1], padding='SAME')

            conv_biases = self.get_bias(name)
            bias = tf.nn.bias_add(conv, conv_biases)

            relu = tf.nn.relu(bias)
            return relu

    def fc_layer(self, bottom, name):
        with tf.variable_scope(name):
            shape = bottom.get_shape().as_list()
            dim = 1
            for d in shape[1:]:
                dim *= d
            x = tf.reshape(bottom, [-1, dim])

            weights = self.get_fc_weight(name)
            biases = self.get_bias(name)

            # Fully connected layer. Note that the '+' operation automatically
            # broadcasts the biases.
            fc = tf.nn.bias_add(tf.matmul(x, weights), biases)

            return fc

    def get_conv_filter(self, name):
        return tf.constant(self.data_dict[name][0], name="filter")

    def get_bias(self, name):
        return tf.constant(self.data_dict[name][1], name="biases")

    def get_fc_weight(self, name):
        return tf.constant(self.data_dict[name][0], name="weights")


Loading the pretrained VGG16 parameters and running the model:
vgg16_test.py


import numpy as np
import tensorflow as tf
# import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import skimage
import vgg16
import utils

img1 = utils.load_image("./test_data/dog.png")

print(img1.shape)

batch = img1.reshape((1, 224, 224, 3))

# plot the image
# imgshow1 = plt.imshow(img1)

# with tf.Session(config=tf.ConfigProto(gpu_options=(tf.GPUOptions(per_process_gpu_memory_fraction=0.7)))) as sess:
with tf.device('/cpu:0'):
    with tf.Session() as sess:
        images = tf.placeholder("float", [1, 224, 224, 3])
        feed_dict = {images: batch}

        vgg = vgg16.Vgg16()
        with tf.name_scope("content_vgg"):
            vgg.build(images)

        prob = sess.run(vgg.prob, feed_dict=feed_dict)
        top5 = np.argsort(prob[0])[-1:-6:-1]
        for n, label in enumerate(top5):
            print(label)
        pool1 = sess.run(vgg.pool1, feed_dict=feed_dict)
        print(pool1.shape)
        conv3_3 = sess.run(vgg.conv3_3, feed_dict=feed_dict)
        print(conv3_3.shape)

# now let's look at the model filters; a fresh instance is needed because
# build() released the first instance's data_dict
vgg = vgg16.Vgg16()

# get the saved parameter dict keys
print(vgg.data_dict.keys())

# show the first conv layer's kernel; data_dict[name][0] holds it as a numpy
# array, so it can be printed directly
filter_conv1 = vgg.data_dict["conv1_1"][0]
print('filter_conv1', filter_conv1.shape)
print(filter_conv1[:, :, :, :5])

filter_conv3 = vgg.data_dict["conv3_3"][0]
print('filter_conv3', filter_conv3.shape)
print(filter_conv3[:, :, :3, :5])