Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge - Paper and TensorFlow Source Code Walkthrough (2)
2017-08-12 22:03
Source code
  build_model
    build_image_embeddings
    build_seq_embeddings
    build_model
  Set up learning rate
  Set up the training ops
  Run training
Source code
build_model
build_image_embeddings
After the image and caption inputs have been set up, this part converts each image into a fixed-size tensor. As the paper mentions, a deep network that has already been trained on a very large dataset is used as a feature extractor, with its parameters left unchanged. The images are first fed through the Inception v3 network to obtain its output:
inception_output = image_embedding.inception_v3(
    self.images,
    trainable=self.train_inception,
    is_training=self.is_training())
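Since the pretrained weights are reused rather than learned from scratch, they must be restored from an Inception v3 checkpoint before training begins; this is what model.init_fn (passed to the training loop at the end of this post) is for. Below is a minimal sketch of such a restore function; the scope name "InceptionV3" and the checkpoint path are assumptions for illustration:

# Collect the Inception variables by their variable-scope prefix.
inception_variables = tf.get_collection(
    tf.GraphKeys.GLOBAL_VARIABLES, scope="InceptionV3")
inception_saver = tf.train.Saver(inception_variables)

def restore_inception_fn(sess):
    # Load the pretrained weights from a checkpoint (path is a placeholder).
    inception_saver.restore(sess, "/path/to/inception_v3.ckpt")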
Let's take a quick look at the Inception v3 model, introduced in "Rethinking the Inception Architecture for Computer Vision". The inception_v3 function provided in the slim package directly builds the model described in that paper.
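For reference, this is roughly how the slim version is called on its own; a minimal sketch assuming TF 1.x with tf.contrib.slim available (for captioning, the im2txt wrapper used above returns a pooled feature vector instead of these classification logits):

import tensorflow as tf
from tensorflow.contrib.slim.nets import inception

slim = tf.contrib.slim

# Inception v3 expects 299x299 RGB inputs.
images = tf.placeholder(tf.float32, [None, 299, 299, 3])
with slim.arg_scope(inception.inception_v3_arg_scope()):
    logits, end_points = inception.inception_v3(
        images, num_classes=1001, is_training=False)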
Map inception output into embedding space.
Here the Inception output is used directly as the image feature and passed through a fully connected layer to obtain the image embedding.
with tf.variable_scope("image_embedding") as scope:
    image_embeddings = tf.contrib.layers.fully_connected(
        inputs=inception_output,
        num_outputs=self.config.embedding_size,
        activation_fn=None,
        weights_initializer=self.initializer,
        biases_initializer=None,
        scope=scope)
build_seq_embeddings
With the image embeddings built, the next step is to build the word embeddings, i.e. to map each word of the caption to a fixed-length vector. The embedding for each word is looked up in the embedding_map matrix using the word indices in self.input_seqs:
seq_embeddings = tf.nn.embedding_lookup(embedding_map, self.input_seqs)
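For reference, embedding_map is an ordinary trainable variable of shape [vocab_size, embedding_size]. A sketch of how it can be created, following the names used elsewhere in the model (the scope name and CPU placement are assumptions):

with tf.variable_scope("seq_embedding"), tf.device("/cpu:0"):
    # One embedding_size vector per vocabulary word, learned jointly.
    embedding_map = tf.get_variable(
        name="map",
        shape=[self.config.vocab_size, self.config.embedding_size],
        initializer=self.initializer)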
build_model
First the LSTM cell is created:
lstm_cell = tf.contrib.rnn.BasicLSTMCell(
    num_units=self.config.num_lstm_units, state_is_tuple=True)
The image embedding is fed in at the -1 step to initialize the LSTM state, and the word embeddings are then fed in one step at a time, producing sequence_length outputs in lstm_outputs. The -1 step is sketched below.
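A minimal sketch of this initialization, following the names used above: the image embedding is run through the cell once from the zero state, and only the resulting state is kept.

# Feed the image embedding as the "-1" input; discard the output, keep the state.
zero_state = lstm_cell.zero_state(
    batch_size=self.image_embeddings.get_shape()[0], dtype=tf.float32)
_, initial_state = lstm_cell(self.image_embeddings, zero_state)

Starting from initial_state, the word embeddings are then unrolled with dynamic_rnn: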
lstm_outputs, _ = tf.nn.dynamic_rnn(
    cell=lstm_cell,
    inputs=self.seq_embeddings,
    sequence_length=sequence_length,
    initial_state=initial_state,
    dtype=tf.float32,
    scope=lstm_scope)
A fully connected layer is added on top of the LSTM outputs to produce a score for every word in the vocabulary, which is then passed through a softmax:
logits = tf.contrib.layers.fully_connected(
    inputs=lstm_outputs,
    num_outputs=self.config.vocab_size,
    activation_fn=None,
    weights_initializer=self.initializer,
    scope=logits_scope)
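The softmax itself appears inside the loss: the logits are compared against the target words with a sparse softmax cross-entropy, masked so that padding positions do not contribute. A minimal sketch, assuming self.target_seqs holds the target word indices, self.input_mask is the 0/1 padding mask, and logits has already been flattened to [total_words, vocab_size]:

targets = tf.reshape(self.target_seqs, [-1])
weights = tf.to_float(tf.reshape(self.input_mask, [-1]))

# Per-word cross-entropy, averaged over the non-padding positions only.
losses = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=targets, logits=logits)
batch_loss = tf.div(tf.reduce_sum(tf.multiply(losses, weights)),
                    tf.reduce_sum(weights),
                    name="batch_loss")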
Set up learning rate
tf.train.exponential_decay(
    learning_rate,
    global_step,
    decay_steps=decay_steps,
    decay_rate=training_config.learning_rate_decay_factor,
    staircase=True)
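optimize_loss below takes a learning_rate_decay_fn callback of the form (learning_rate, global_step) -> decayed_rate, so the decay above is typically wrapped in a small function, with decay_steps computed from the number of batches in one decay period. A sketch; the config fields num_examples_per_epoch, batch_size, and num_epochs_per_decay are assumed names:

num_batches_per_epoch = (training_config.num_examples_per_epoch /
                         model_config.batch_size)
decay_steps = int(num_batches_per_epoch *
                  training_config.num_epochs_per_decay)

def learning_rate_decay_fn(learning_rate, global_step):
    # Wraps the exponential decay so optimize_loss can apply it each step.
    return tf.train.exponential_decay(
        learning_rate,
        global_step,
        decay_steps=decay_steps,
        decay_rate=training_config.learning_rate_decay_factor,
        staircase=True)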
Set up the training ops.
train_op = tf.contrib.layers.optimize_loss(
    loss=model.total_loss,
    global_step=model.global_step,
    learning_rate=learning_rate,
    optimizer=training_config.optimizer,
    clip_gradients=training_config.clip_gradients,
    learning_rate_decay_fn=learning_rate_decay_fn)
Run training.
tf.contrib.slim.learning.train(
    train_op,
    train_dir,
    log_every_n_steps=FLAGS.log_every_n_steps,
    graph=g,
    global_step=model.global_step,
    number_of_steps=FLAGS.number_of_steps,
    init_fn=model.init_fn,
    saver=saver)