TensorFlow: A Deep Dive into Fast Neural Style
2016-10-09 14:15
reference: http://hacker.duanshishi.com/?p=1639
Preface
The previous posts covered models commonly used in Computer Vision. Over the coming weeks I will focus on TensorFlow applications in Computer Vision, mainly by analyzing the relevant papers and source code. Today we look in detail at fast neural style. An earlier post analyzed neural style, the origin of this line of work, but that approach is hard to use in practice. Why? Each run requires a content image and a style image, and the output is produced by minimizing the content loss and style loss from scratch. This is very expensive, and there is no way to save a model for a given style, so generating every image amounts to training a model. Fast neural style, in contrast, saves a trained model for a given style and then simply transforms any content image with it. The paper also points out another application of image transformation: Super-Resolution, using deep learning to convert low-resolution images into high-resolution ones, which is now deployed at many large internet companies, especially video sites.
How the Paper Works
A few months ago I wrote about neural style (TensorFlow之深入理解Neural Style). A Neural Algorithm of Artistic Style builds a multi-layer convolutional network and generates an image combining content and style by minimizing the defined content loss and style loss, which is very interesting. Perceptual Losses for Real-Time Style Transfer and Super-Resolution replaces the per-pixel loss with a perceptual loss computed by a pre-trained VGG model, simplifying the original loss computation, and adds a transform network that directly generates the styled version of a content image. How does it work? See the figure below.
The network consists of two parts: an image transformation network and a loss network. The image transformation network is a deep residual convolutional network that transforms the input (content) image directly into a styled image. The loss network has fixed parameters; its structure matches the network in A Neural Algorithm of Artistic Style, but its weights are never updated. It is used only to compute the content loss and the style loss, which is what the paper calls the perceptual loss. The authors' reasoning: a convolutional model pre-trained for image classification has already learned perceptual and semantic information, so the loss network can serve purely for loss computation. Unlike A Neural Algorithm of Artistic Style, the parameters being updated are those of the transform network in front. End to end: the input image passes through the transform network to produce the transformed image, the corresponding losses are computed against the loss network, and training minimizes that loss to update the transform network. Simple, isn't it?
The losses are computed much as in the earlier work. The content (feature reconstruction) loss, where $\phi_j$ is layer $j$ of the loss network with feature maps of shape $C_j \times H_j \times W_j$:

$\ell_{feat}^{\phi,j}(\hat{y}, y) = \frac{1}{C_j H_j W_j} \|\phi_j(\hat{y}) - \phi_j(y)\|_2^2$

The style loss:

$\ell_{style}^{\phi,j}(\hat{y}, y) = \|G_j^{\phi}(\hat{y}) - G_j^{\phi}(y)\|_F^2$

The Gram matrix used in the style loss:

$G_j^{\phi}(x)_{c,c'} = \frac{1}{C_j H_j W_j} \sum_{h=1}^{H_j} \sum_{w=1}^{W_j} \phi_j(x)_{h,w,c}\, \phi_j(x)_{h,w,c'}$
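A minimal NumPy sketch of the content loss, style loss, and Gram matrix described above. The random arrays are hypothetical stand-ins for real VGG activations; `phi_y` plays the role of φ_j(y):

```python
import numpy as np

def gram(phi):
    # phi: (H, W, C) feature map -> (C, C) Gram matrix, normalized by C*H*W.
    h, w, c = phi.shape
    f = phi.reshape(h * w, c)
    return f.T @ f / (c * h * w)

def content_loss(phi_yhat, phi_y):
    # Squared L2 distance between feature maps, normalized by C*H*W.
    return np.sum((phi_yhat - phi_y) ** 2) / phi_y.size

def style_loss(phi_yhat, phi_y):
    # Squared Frobenius distance between Gram matrices; the two inputs may
    # have different H and W, because gram() always returns a (C, C) matrix.
    return np.sum((gram(phi_yhat) - gram(phi_y)) ** 2)

rng = np.random.default_rng(0)
phi_y = rng.standard_normal((8, 8, 16))       # stand-in for phi_j(y)
phi_yhat = rng.standard_normal((12, 10, 16))  # different spatial size is fine
print(content_loss(phi_y, phi_y))        # 0.0
print(style_loss(phi_yhat, phi_y) > 0)   # True
```

Note that the style loss accepts feature maps of different spatial sizes, which is exactly the property the Gram matrix buys us below.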
The Gram matrix is the key ingredient: it guarantees that the loss between y^hat and y can be computed even when their feature maps differ in spatial shape, because the Gram matrix is always C×C. For the full explanation see this part of the paper, which says it better than I can:
If you have followed this far, you basically understand how this paper achieves fast neural style. To summarize:
The transform network is a deep residual network that converts the input image into an image with the target style; its parameters are updated during training.
The loss network has the same structure as in the earlier paper and is used only to compute the content loss and the style loss; note that its parameters are not updated.
The Gram matrix makes the loss easy to compute even when the transformed image and the style target produce feature maps of different shapes in the loss network.
Fast Neural Style on TensorFlow
The code is based on https://github.com/OlavHN/fast-neural-style, but as published it does not run; the cause is roughly a local_variables-related change after a TensorFlow update, see this issue: https://github.com/tensorflow/tensorflow/issues/1045#issuecomment-239789244. That project also puts everything in one file, which is a bit messy, so I separated the training code from the code that generates the styled image and put the result in my own GitHub repo, neural_style_tensorflow. Basic requirements:
Python 2.7.x
TensorFlow r0.10
VGG-19 model
COCO dataset
Transform Network architecture
```python
import tensorflow as tf

def conv2d(x, input_filters, output_filters, kernel, strides, padding='SAME'):
    with tf.variable_scope('conv') as scope:
        shape = [kernel, kernel, input_filters, output_filters]
        weight = tf.Variable(tf.truncated_normal(shape, stddev=0.1), name='weight')
        convolved = tf.nn.conv2d(x, weight, strides=[1, strides, strides, 1],
                                 padding=padding, name='conv')
        normalized = batch_norm(convolved, output_filters)
        return normalized

def conv2d_transpose(x, input_filters, output_filters, kernel, strides, padding='SAME'):
    with tf.variable_scope('conv_transpose') as scope:
        shape = [kernel, kernel, output_filters, input_filters]
        weight = tf.Variable(tf.truncated_normal(shape, stddev=0.1), name='weight')
        batch_size = tf.shape(x)[0]
        height = tf.shape(x)[1] * strides
        width = tf.shape(x)[2] * strides
        output_shape = tf.pack([batch_size, height, width, output_filters])
        convolved = tf.nn.conv2d_transpose(x, weight, output_shape,
                                           strides=[1, strides, strides, 1],
                                           padding=padding, name='conv_transpose')
        normalized = batch_norm(convolved, output_filters)
        return normalized

def batch_norm(x, size):
    batch_mean, batch_var = tf.nn.moments(x, [0, 1, 2], keep_dims=True)
    beta = tf.Variable(tf.zeros([size]), name='beta')
    scale = tf.Variable(tf.ones([size]), name='scale')
    epsilon = 1e-3
    return tf.nn.batch_normalization(x, batch_mean, batch_var, beta, scale, epsilon, name='batch')

def residual(x, filters, kernel, strides, padding='SAME'):
    with tf.variable_scope('residual') as scope:
        conv1 = conv2d(x, filters, filters, kernel, strides, padding=padding)
        conv2 = conv2d(tf.nn.relu(conv1), filters, filters, kernel, strides, padding=padding)
        residual = x + conv2
        return residual

def net(image):
    with tf.variable_scope('conv1'):
        conv1 = tf.nn.relu(conv2d(image, 3, 32, 9, 1))
    with tf.variable_scope('conv2'):
        conv2 = tf.nn.relu(conv2d(conv1, 32, 64, 3, 2))
    with tf.variable_scope('conv3'):
        conv3 = tf.nn.relu(conv2d(conv2, 64, 128, 3, 2))
    with tf.variable_scope('res1'):
        res1 = residual(conv3, 128, 3, 1)
    with tf.variable_scope('res2'):
        res2 = residual(res1, 128, 3, 1)
    with tf.variable_scope('res3'):
        res3 = residual(res2, 128, 3, 1)
    with tf.variable_scope('res4'):
        res4 = residual(res3, 128, 3, 1)
    with tf.variable_scope('res5'):
        res5 = residual(res4, 128, 3, 1)
    with tf.variable_scope('deconv1'):
        deconv1 = tf.nn.relu(conv2d_transpose(res5, 128, 64, 3, 2))
    with tf.variable_scope('deconv2'):
        deconv2 = tf.nn.relu(conv2d_transpose(deconv1, 64, 32, 3, 2))
    with tf.variable_scope('deconv3'):
        deconv3 = tf.nn.tanh(conv2d_transpose(deconv2, 32, 3, 9, 1))
    y = deconv3 * 127.5
    return y
```
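To see what this network does to spatial resolution, here is a small shape walkthrough: the stride-2 convolutions halve H and W, the five residual blocks keep them fixed (the code above uses 'SAME' padding), and the stride-2 transposed convolutions double them back. The 256×256 input is an assumed example size, not fixed by the code:

```python
def transform_net_shapes(h, w):
    # (layer name, H, W, channels) after each stage of the transform network.
    shapes = [("input", h, w, 3)]
    shapes.append(("conv1 9x9/1", h, w, 32))
    h, w = h // 2, w // 2
    shapes.append(("conv2 3x3/2", h, w, 64))
    h, w = h // 2, w // 2
    shapes.append(("conv3 3x3/2", h, w, 128))
    for i in range(1, 6):
        shapes.append(("res%d 3x3/1" % i, h, w, 128))  # residual blocks keep shape
    h, w = h * 2, w * 2
    shapes.append(("deconv1 3x3/2", h, w, 64))
    h, w = h * 2, w * 2
    shapes.append(("deconv2 3x3/2", h, w, 32))
    shapes.append(("deconv3 9x9/1 + tanh", h, w, 3))
    return shapes

for name, h, w, c in transform_net_shapes(256, 256):
    print("%-22s %dx%dx%d" % (name, h, w, c))
```

The output has the same H×W×3 shape as the input, and the final `tanh * 127.5` squashes pixel values into roughly [-127.5, 127.5].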
Training on the COCO dataset with a deep residual network lets us train a deeper model without sacrificing performance. The loss network is computed by a pre-trained VGG; its structure:
```python
import tensorflow as tf
import numpy as np
import scipy.io
from scipy import misc

def net(data_path, input_image):
    layers = (
        'conv1_1', 'relu1_1', 'conv1_2', 'relu1_2', 'pool1',
        'conv2_1', 'relu2_1', 'conv2_2', 'relu2_2', 'pool2',
        'conv3_1', 'relu3_1', 'conv3_2', 'relu3_2', 'conv3_3',
        'relu3_3', 'conv3_4', 'relu3_4', 'pool3',
        'conv4_1', 'relu4_1', 'conv4_2', 'relu4_2', 'conv4_3',
        'relu4_3', 'conv4_4', 'relu4_4', 'pool4',
        'conv5_1', 'relu5_1', 'conv5_2', 'relu5_2', 'conv5_3',
        'relu5_3', 'conv5_4', 'relu5_4'
    )
    data = scipy.io.loadmat(data_path)
    mean = data['normalization'][0][0][0]
    mean_pixel = np.mean(mean, axis=(0, 1))
    weights = data['layers'][0]
    net = {}
    current = input_image
    for i, name in enumerate(layers):
        kind = name[:4]
        if kind == 'conv':
            kernels, bias = weights[i][0][0][0][0]
            # matconvnet: weights are [width, height, in_channels, out_channels]
            # tensorflow: weights are [height, width, in_channels, out_channels]
            kernels = np.transpose(kernels, (1, 0, 2, 3))
            bias = bias.reshape(-1)
            current = _conv_layer(current, kernels, bias, name=name)
        elif kind == 'relu':
            current = tf.nn.relu(current, name=name)
        elif kind == 'pool':
            current = _pool_layer(current, name=name)
        net[name] = current
    assert len(net) == len(layers)
    return net, mean_pixel

def _conv_layer(input, weights, bias, name=None):
    conv = tf.nn.conv2d(input, tf.constant(weights), strides=(1, 1, 1, 1),
                        padding='SAME', name=name)
    return tf.nn.bias_add(conv, bias)

def _pool_layer(input, name=None):
    return tf.nn.max_pool(input, ksize=(1, 2, 2, 1), strides=(1, 2, 2, 1),
                          padding='SAME', name=name)

def preprocess(image, mean_pixel):
    return image - mean_pixel

def unprocess(image, mean_pixel):
    return image + mean_pixel
```
Content Loss:
```python
def compute_content_loss(content_layers, net):
    content_loss = 0
    # tf.app.flags.DEFINE_string("CONTENT_LAYERS", "relu4_2", "Which VGG layer to extract content loss from")
    for layer in content_layers:
        generated_images, content_images = tf.split(0, 2, net[layer])
        size = tf.size(generated_images)
        content_loss += tf.nn.l2_loss(generated_images - content_images) / tf.to_float(size)
    content_loss = content_loss / len(content_layers)
    return content_loss
```
Style Loss:
```python
def compute_style_loss(style_features_t, style_layers, net):
    style_loss = 0
    for style_gram, layer in zip(style_features_t, style_layers):
        generated_images, _ = tf.split(0, 2, net[layer])
        size = tf.size(generated_images)
        for style_image in style_gram:
            style_loss += tf.nn.l2_loss(
                tf.reduce_sum(gram(generated_images) - style_image, 0)) / tf.to_float(size)
    style_loss = style_loss / len(style_layers)
    return style_loss
```
gram:
```python
def gram(layer):
    # FLAGS.BATCH_SIZE is defined via tf.app.flags elsewhere in the project.
    shape = tf.shape(layer)
    num_images = shape[0]
    num_filters = shape[3]
    size = tf.size(layer)
    filters = tf.reshape(layer, tf.pack([num_images, -1, num_filters]))
    grams = tf.batch_matmul(filters, filters, adj_x=True) / tf.to_float(size / FLAGS.BATCH_SIZE)
    return grams
```
In main() of train_fast_neural_style.py, the line `net, _ = vgg.net(FLAGS.VGG_PATH, tf.concat(0, [generated, images]))` is a little puzzling at first. My instinct was to feed `generated` and `images` to vgg.net separately and then compute the content and style losses downstream. Instead, `generated` and the original content `images` are first concatenated along axis 0 (see tf.concat), pushed through the network in a single forward pass, and the per-batch outputs are then recovered with tf.split. This works because a CNN computes each output value only from a spatial neighborhood within the same image: with shared weights, elements of the batch never interact (padding is applied per image, so even edge pixels are unaffected by neighbors in the batch), and the split outputs are exactly what two separate forward passes would produce. A handy trick worth reusing in future code.
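The concat-then-split trick can be checked directly. Below is a toy NumPy stride-1 'SAME' convolution (not the repo's code): convolving the concatenated batch and splitting afterwards gives exactly the same result as convolving each batch separately:

```python
import numpy as np

def conv2d_same(x, k):
    # x: (N, H, W, Cin), k: (kh, kw, Cin, Cout); stride 1, zero 'SAME' padding.
    n, h, w, cin = x.shape
    kh, kw, _, cout = k.shape
    xp = np.pad(x, ((0, 0), (kh // 2, kh // 2), (kw // 2, kw // 2), (0, 0)))
    out = np.zeros((n, h, w, cout))
    for i in range(h):
        for j in range(w):
            patch = xp[:, i:i + kh, j:j + kw, :]  # (N, kh, kw, Cin)
            out[:, i, j, :] = np.tensordot(patch, k, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
generated = rng.standard_normal((2, 8, 8, 3))  # stand-in for the styled batch
images = rng.standard_normal((2, 8, 8, 3))     # stand-in for the content batch
k = rng.standard_normal((3, 3, 3, 4))

# One joint pass over the concatenated batch, then split back apart.
joint = conv2d_same(np.concatenate([generated, images], axis=0), k)
split_gen, split_img = joint[:2], joint[2:]
print(np.allclose(split_gen, conv2d_same(generated, k)))  # True
print(np.allclose(split_img, conv2d_same(images, k)))     # True
```

This exactness relies on the loss network containing only convolutions, ReLUs, and pooling; a layer that mixes batch statistics (e.g. batch normalization computed over the whole batch) would couple the two halves.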
The images generated by the code are not great yet. I suspect the relative weighting of the content and style losses, or possibly too few epochs; I will tune the weights later and see whether the results improve.