
"Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge" paper and TensorFlow source code walkthrough (2)

2017-08-12 22:03
Source code
  build_model
    build_image_embeddings
    build_seq_embeddings
    build_model
  Set up learning rate
  Set up the training ops
  Run training

Source code

build_model

build_image_embeddings

After the image and caption inputs have been set up, this part converts each image into a fixed-size tensor. As the paper describes, a deep network pretrained on a very large dataset is used directly as a feature extractor, with its parameters left unchanged.

First, the images are fed through the Inception v3 network to obtain its output:

inception_output = image_embedding.inception_v3(
    self.images,
    trainable=self.train_inception,
    is_training=self.is_training())


Let us first take a look at the Inception v3 model, introduced in "Rethinking the Inception Architecture for Computer Vision". The inception_v3 function provided by the slim package returns the model described in that paper directly.
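As a rough illustration of how that network can serve as a frozen feature extractor, here is a minimal sketch built on slim's inception_v3_base and inception_v3_arg_scope; it is a simplified stand-in for what im2txt's image_embedding module wraps, not its exact code:

import tensorflow as tf
from tensorflow.contrib.slim.nets import inception

slim = tf.contrib.slim

# Batch of preprocessed images; Inception v3 expects 299x299 RGB inputs.
images = tf.placeholder(tf.float32, shape=[None, 299, 299, 3])

with slim.arg_scope(inception.inception_v3_arg_scope()):
  # inception_v3_base returns the last convolutional feature map,
  # before the classification head.
  net, end_points = inception.inception_v3_base(images)

# Pool the 8x8x2048 feature map down to one 2048-d vector per image.
inception_output = tf.reduce_mean(net, axis=[1, 2])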

Map inception output into embedding space.

Here the Inception output is used directly as the image feature and is passed through a single fully connected layer to produce the image embedding.

with tf.variable_scope("image_embedding") as scope:
  image_embeddings = tf.contrib.layers.fully_connected(
      inputs=inception_output,
      num_outputs=self.config.embedding_size,
      activation_fn=None,
      weights_initializer=self.initializer,
      biases_initializer=None,
      scope=scope)


build_seq_embeddings

With the image embeddings built, the next step is to build the word embeddings, that is, to map each word to a fixed-length vector.

Each word's embedding is looked up in the embedding_map matrix using the indices in self.input_seqs:

seq_embeddings = tf.nn.embedding_lookup(embedding_map, self.input_seqs)
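The embedding_map used here is just a trainable lookup table with one row per vocabulary word. A minimal sketch of how such a table can be created, reusing the names from the surrounding code (the variable scope is an assumption):

with tf.variable_scope("seq_embedding"):
  # One row per vocabulary word, each of size embedding_size.
  embedding_map = tf.get_variable(
      name="map",
      shape=[self.config.vocab_size, self.config.embedding_size],
      initializer=self.initializer)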


build_model

First, create the LSTM cell:

lstm_cell = tf.contrib.rnn.BasicLSTMCell(
    num_units=self.config.num_lstm_units, state_is_tuple=True)


The image embedding is fed in at "step -1" to initialize the LSTM; after that, the word embeddings are fed in one step at a time, producing sequence_length outputs. The initial_state used below comes from that step -1.
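A minimal sketch of that initialization, following the structure of im2txt's build_model (the variable scope and batch-size handling are assumptions):

with tf.variable_scope("lstm", initializer=self.initializer) as lstm_scope:
  # Run the image embedding through the cell once; the resulting state
  # is the starting state for the caption words.
  zero_state = lstm_cell.zero_state(
      batch_size=self.image_embeddings.get_shape()[0], dtype=tf.float32)
  _, initial_state = lstm_cell(self.image_embeddings, zero_state)

With initial_state in hand, tf.nn.dynamic_rnn unrolls the word inputs: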

lstm_outputs, _ = tf.nn.dynamic_rnn(cell=lstm_cell,
                                    inputs=self.seq_embeddings,
                                    sequence_length=sequence_length,
                                    initial_state=initial_state,
                                    dtype=tf.float32,
                                    scope=lstm_scope)


A fully connected layer then maps each LSTM output to a score for every word in the vocabulary; a softmax over these scores gives the word probabilities:

logits = tf.contrib.layers.fully_connected(
    inputs=lstm_outputs,
    num_outputs=self.config.vocab_size,
    activation_fn=None,
    weights_initializer=self.initializer,
    scope=logits_scope)
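These logits feed the masked softmax cross-entropy that becomes the model.total_loss minimized below. A sketch in the spirit of im2txt (in the actual code lstm_outputs is first reshaped to [batch * time, output_size] so the logits are 2-D; target_seqs and input_mask, the shifted captions and their padding mask, are assumed model inputs):

# Flatten targets and padding mask to line up with the stacked logits.
targets = tf.reshape(self.target_seqs, [-1])
weights = tf.to_float(tf.reshape(self.input_mask, [-1]))

# Per-word cross-entropy, with padding positions masked out of the average.
losses = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=targets, logits=logits)
batch_loss = tf.div(tf.reduce_sum(tf.multiply(losses, weights)),
                    tf.reduce_sum(weights),
                    name="batch_loss")
tf.losses.add_loss(batch_loss)
total_loss = tf.losses.get_total_loss()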


Set up learning rate

The exponential decay is wrapped in a function that is later handed to the optimizer:

def _learning_rate_decay_fn(learning_rate, global_step):
  return tf.train.exponential_decay(
      learning_rate,
      global_step,
      decay_steps=decay_steps,
      decay_rate=training_config.learning_rate_decay_factor,
      staircase=True)

learning_rate_decay_fn = _learning_rate_decay_fn
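The function above still refers to learning_rate and decay_steps, which come from the training configuration. A sketch of that surrounding setup (model_config holding the batch size is an assumption):

learning_rate = tf.constant(training_config.initial_learning_rate)
num_batches_per_epoch = (training_config.num_examples_per_epoch //
                         model_config.batch_size)
decay_steps = int(num_batches_per_epoch *
                  training_config.num_epochs_per_decay)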


Set up the training ops.

train_op = tf.contrib.layers.optimize_loss(
    loss=model.total_loss,
    global_step=model.global_step,
    learning_rate=learning_rate,
    optimizer=training_config.optimizer,
    clip_gradients=training_config.clip_gradients,
    learning_rate_decay_fn=learning_rate_decay_fn)


Run training.

tf.contrib.slim.learning.train(
    train_op,
    train_dir,
    log_every_n_steps=FLAGS.log_every_n_steps,
    graph=g,
    global_step=model.global_step,
    number_of_steps=FLAGS.number_of_steps,
    init_fn=model.init_fn,
    saver=saver)
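model.init_fn is what brings in the pretrained Inception v3 weights mentioned at the start: slim.learning.train calls it once after variable initialization, and it restores only the Inception variables from the downloaded checkpoint. A minimal sketch (the scope name and checkpoint attribute are assumptions):

# Saver restricted to the Inception v3 variables.
inception_variables = tf.get_collection(
    tf.GraphKeys.GLOBAL_VARIABLES, scope="InceptionV3")
inception_saver = tf.train.Saver(inception_variables)

def restore_fn(sess):
  # Invoked by slim.learning.train before training starts.
  inception_saver.restore(sess, self.config.inception_checkpoint_file)

self.init_fn = restore_fn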