
GPU-Accelerated NLP Tasks (Theano + CUDA)

2015-12-05 18:51

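I had previously been studying CNNs and came across Yoon Kim's (2014) paper on text classification with a CNN. The network is structurally simple and performs well, but the paper does not report training times, which makes that worth looking into.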

from theano import function, config, shared
import theano.tensor as T
import numpy
import time

vlen = 10 * 30 * 768  # 10 x #cores x # threads per core
iters = 1000

rng = numpy.random.RandomState(22)
# Keep the data in a shared variable so that it can live on the GPU.
x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
f = function([], T.exp(x))
print(f.maker.fgraph.toposort())
t0 = time.time()
for i in range(iters):
    r = f()
t1 = time.time()
print("Looping %d times took %f seconds" % (iters, t1 - t0))
print("Result is %s" % (r,))
# A plain (CPU) Elemwise op in the compiled graph means the GPU was not used.
if numpy.any([isinstance(node.op, T.Elemwise) for node in f.maker.fgraph.toposort()]):
    print('Used the cpu')
else:
    print('Used the gpu')

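Save the code above as check1.py (the file name used in the commands below) and run it as shown. The output tells you whether the GPU is working; if the GPU run fails, a likely cause is an incorrect path configuration.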

$ THEANO_FLAGS=mode=FAST_RUN,device=cpu,floatX=float32 python check1.py
[Elemwise{exp,no_inplace}(<TensorType(float32, vector)>)]
Looping 1000 times took 3.06635117531 seconds
Result is [ 1.23178029  1.61879337  1.52278066 ...,  2.20771813  2.29967761
1.62323284]
Used the cpu

$ THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python check1.py
Using gpu device 0: GeForce GTX 580
[GpuElemwise{exp,no_inplace}(<CudaNdarrayType(float32, vector)>), HostFromGpu(GpuElemwise{exp,no_inplace}.0)]
Looping 1000 times took 0.638810873032 seconds
Result is [ 1.23178029  1.61879349  1.52278066 ...,  2.20771813  2.29967761
1.62323296]
Used the gpu
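
As a quick sanity check on the two runs above, the speedup can be computed directly from the reported loop times. The figures are the ones printed above; the variable names are only for illustration.

```python
# Loop times in seconds, copied from the two runs above.
cpu_seconds = 3.06635117531
gpu_seconds = 0.638810873032

speedup = cpu_seconds / gpu_seconds
print("GPU speedup on this micro-benchmark: %.1fx" % speedup)
```

On this element-wise exp benchmark the GTX 580 comes out roughly 4.8x faster than the CPU. A full training run can behave quite differently, so this number is only a check that the GPU path is active.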

Since Nvidia GPUs are currently optimized mainly for 32-bit floating-point computation, the data and variables in the code need to be set to float32.
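
The reason this takes explicit changes is that NumPy defaults to float64. The short sketch below uses only NumPy (no Theano), with throwaway variable names, to show the default dtype, two ways of getting float32, and how mixing the two types promotes the result back to float64.

```python
import numpy as np

# NumPy creates float64 arrays by default.
a = np.random.rand(4)
print(a.dtype)           # float64

# Either request float32 at creation time ...
w = np.zeros((3, 2), dtype='float32')
# ... or convert an existing array.
b = np.asarray(a, dtype='float32')
print(w.dtype, b.dtype)  # float32 float32

# Combining a float32 array with a float64 array gives float64 again,
# which is how float64 tends to creep back into a computation.
mixed = b + a
print(mixed.dtype)       # float64
```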

The specific code changes are as follows:

(1) process_data.py

line 55, W = np.zeros(shape=(vocab_size+1, k), dtype='float32')
line 56, W[0] = np.zeros(k, dtype='float32')

After making the change, run the following command to obtain the word vector (float32) for each word.

python process_data.py GoogleNews-vectors-negative300.bin
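
To make the change concrete, here is a minimal sketch of how such an embedding matrix could be built in float32. The function name build_embedding_matrix and the toy word_vecs dict are hypothetical stand-ins, not the actual contents of process_data.py; only the np.zeros call with dtype='float32' mirrors the change above.

```python
import numpy as np

def build_embedding_matrix(word_vecs, k):
    """Stack word vectors into a (vocab_size + 1, k) float32 matrix.

    Row 0 stays all zeros; words are assigned rows 1..vocab_size.
    """
    vocab_size = len(word_vecs)
    W = np.zeros(shape=(vocab_size + 1, k), dtype='float32')
    word_idx_map = {}
    for i, word in enumerate(sorted(word_vecs), start=1):
        W[i] = word_vecs[word]  # float64 values are cast to float32 on assignment
        word_idx_map[word] = i
    return W, word_idx_map

# Toy vectors (float64 by default) standing in for the word2vec embeddings.
word_vecs = {
    "good": np.array([0.1, 0.2, 0.3]),
    "bad": np.array([-0.1, -0.2, -0.3]),
}
W, word_idx_map = build_embedding_matrix(word_vecs, k=3)
print(W.dtype, W.shape)
```

Because W is allocated as float32 up front, every vector written into it is stored as float32 regardless of the dtype it was loaded with.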

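(2) conv_net_sentence.py

Add allow_input_downcast=True to each theano.function call. With this flag, values passed as inputs when the function is called (for example float64 arrays) are silently downcast to the dtype of the corresponding input variable, here float32.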

line 82, set_zero = theano.function([zero_vec_tensor], updates=[(Words, T.set_subtensor(Words[0,:], zero_vec_tensor))], allow_input_downcast=True)
line 131, val_model = theano.function([index], classifier.errors(y),
            givens={
                x: val_set_x[index * batch_size: (index + 1) * batch_size],
                y: val_set_y[index * batch_size: (index + 1) * batch_size]}, allow_input_downcast=True)
line 137, test_model = theano.function([index], classifier.errors(y),
            givens={
                x: train_set_x[index * batch_size: (index + 1) * batch_size],
                y: train_set_y[index * batch_size: (index + 1) * batch_size]}, allow_input_downcast=True)
line 141, train_model = theano.function([index], cost, updates=grad_updates,
            givens={
                x: train_set_x[index*batch_size:(index+1)*batch_size],
                y: train_set_y[index*batch_size:(index+1)*batch_size]}, allow_input_downcast=True)
line 155, test_model_all = theano.function([x,y], test_error, allow_input_downcast=True)
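
allow_input_downcast concerns the values handed to a compiled function when it is called. The NumPy-only sketch below (it does not call Theano, and the array names are made up) shows what such a downcast amounts to numerically, and why it is opt-in: some precision is lost.

```python
import numpy as np

# A float64 batch, as NumPy would produce by default.
batch64 = np.array([0.1, 1.0 / 3.0, 16777217.0])

# The downcast that has to happen before the data fits a float32 input.
batch32 = batch64.astype('float32')
print(batch32.dtype)

# The round trip is not exact: float32 keeps only about 7 significant digits.
max_error = np.abs(batch64 - batch32.astype('float64')).max()
print(max_error > 0)

# 16777217 = 2**24 + 1 cannot be represented in float32 and rounds to 2**24.
print(batch32[2] == np.float32(16777216.0))
```

For word embeddings and network weights this loss is normally harmless, which is presumably why enabling the flag everywhere is acceptable here.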

(3) Run the program

THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,warn_float64=raise python conv_net_sentence.py -static -word2vec
THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,warn_float64=raise python conv_net_sentence.py -nonstatic -word2vec
THEANO_FLAGS=mode=FAST_RUN,device=gpu0,floatX=float32,warn_float64=raise python conv_net_sentence.py -nonstatic -rand
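
The same flags can also be set from inside Python, as long as that happens before theano is first imported, since Theano reads THEANO_FLAGS at import time. A minimal sketch: the helper name make_theano_flags is my own, and the theano import is left commented out so that the snippet runs without Theano or a GPU.

```python
import os

def make_theano_flags(**flags):
    """Join keyword arguments into a THEANO_FLAGS string: key=value,key=value."""
    return ",".join("%s=%s" % (key, value) for key, value in sorted(flags.items()))

flags = make_theano_flags(mode="FAST_RUN", device="gpu0",
                          floatX="float32", warn_float64="raise")
os.environ["THEANO_FLAGS"] = flags  # must be set before `import theano`
print(flags)

# import theano  # would now pick up the flags set above
```

Setting warn_float64=raise makes Theano raise an error whenever a float64 variable is created in the graph, which is a convenient way to find any spot where float64 still slips in.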

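(4) The result is striking: training ran about 20x faster.

This was my first time running on a GPU. If I have overlooked anything in the steps above, corrections are very welcome.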

References:

1. Theano configuration: http://deeplearning.net/software/theano/library/config.html

2. Installing Theano + CUDA on Ubuntu: http://www.linuxidc.com/Linux/2014-10/107503.htm
