Word2vec Multithreading (TensorFlow)

2015-12-16 20:17
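workers = []
# Start opts.concurrent_steps threads, each running the training loop.
# (xrange is Python 2; in Python 3 it is range.)
for _ in xrange(opts.concurrent_steps):
    t = threading.Thread(target=self._train_thread_body)
    t.start()
    workers.append(t)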

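word2vec.py uses multiple threads.

It is commonly said that multithreading in Python is effectively single-threaded: CPython's memory management is not thread-safe, so by design the GIL (global interpreter lock) lets only one thread run Python code at a time.

Here, however, each thread calls into C++ code, which can run without holding the GIL, so multithreading is still effective.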
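
The same effect can be seen without TensorFlow. Below is a minimal sketch that assumes CPython, whose standard hashlib module releases the GIL while hashing large buffers, so the worker threads are able to overlap. The names digest_worker and hash_in_threads are made up for this illustration and have nothing to do with word2vec.py.

```python
import hashlib
import threading


def digest_worker(data, results, index):
    # The hashing happens in C; for large inputs CPython releases the GIL
    # during update(), so other threads can run at the same time.
    h = hashlib.sha256()
    h.update(data)
    results[index] = h.hexdigest()


def hash_in_threads(data, num_threads=4):
    # Same start/append pattern as the workers list in word2vec.py.
    results = [None] * num_threads
    workers = []
    for i in range(num_threads):
        t = threading.Thread(target=digest_worker, args=(data, results, i))
        t.start()
        workers.append(t)
    for t in workers:
        t.join()
    return results


payload = b"x" * (8 * 1024 * 1024)  # 8 MiB, large enough for the GIL to be released
digests = hash_in_threads(payload)
```

How much speedup this gives depends on the machine and the number of cores; if the worker did its arithmetic in pure Python instead, the threads would take turns holding the GIL.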

 
 

The Skipgram operator class used by Word2vec, for its part, resolves conflicts between threads accessing its state with a lock:

// mu_ protects all of the state declared below.
mutex mu_;
random::PhiloxRandom philox_ GUARDED_BY(mu_);
random::SimplePhilox rng_ GUARDED_BY(mu_);
int32 current_epoch_ GUARDED_BY(mu_) = -1;
int64 total_words_processed_ GUARDED_BY(mu_) = 0;
int32 example_pos_ GUARDED_BY(mu_);
int32 label_pos_ GUARDED_BY(mu_);
int32 label_limit_ GUARDED_BY(mu_);

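My impression is that, because of this lock, the operator's own work is still serialized: only one thread at a time can be inside it.

The batch computation that follows it runs in parallel.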
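
A rough Python sketch of that split, under stated assumptions: BatchSource and worker are hypothetical names, and this is not how the TensorFlow operator is implemented. Only the shared cursor is touched under the lock; whatever is done with a batch happens after the lock is released.

```python
import threading
from collections import Counter


class BatchSource:
    """Hypothetical stand-in for the operator: a cursor guarded by a lock."""

    def __init__(self, corpus, batch_size):
        self._corpus = corpus
        self._batch_size = batch_size
        self._mu = threading.Lock()
        self._pos = 0  # guarded by self._mu

    def next_batch(self):
        # Serialized section: only one thread at a time advances the cursor.
        with self._mu:
            batch = []
            for _ in range(self._batch_size):
                batch.append(self._corpus[self._pos])
                self._pos = (self._pos + 1) % len(self._corpus)
            return batch


def worker(source, num_batches, out):
    for _ in range(num_batches):
        batch = source.next_batch()
        # Stand-in for the per-batch computation; it runs outside the lock.
        out.extend(batch)


source = BatchSource(list(range(10)), batch_size=4)
outputs = [[] for _ in range(3)]
threads = [threading.Thread(target=worker, args=(source, 5, out))
           for out in outputs]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Count how often each corpus item was handed out across all threads.
seen = Counter(x for out in outputs for x in out)
```

Because the cursor is only advanced under the lock, no item is skipped or handed out twice by accident. In this pure-Python sketch the work outside the lock still contends for the GIL; the parallelism described above comes from the computation running in C++.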

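def _train_thread_body(self):
    # Remember which epoch was in progress when this thread started.
    initial_epoch, = self._session.run([self._epoch])
    while True:
        # Run one training step and fetch the current epoch with it.
        _, epoch = self._session.run([self._train, self._epoch])
        # Stop as soon as the epoch has advanced.
        if epoch != initial_epoch:
            break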
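
The stopping rule of this loop can be tried out without a TensorFlow session. In the sketch below, EpochCounter is a hypothetical stand-in for the session: a lock-guarded step counter whose epoch advances every steps_per_epoch steps. It only mimics the control flow and trains nothing.

```python
import threading


class EpochCounter:
    """Hypothetical stand-in for the session: a lock-guarded step counter."""

    def __init__(self, steps_per_epoch):
        self._lock = threading.Lock()
        self._steps = 0  # guarded by self._lock
        self._steps_per_epoch = steps_per_epoch

    def current_epoch(self):
        with self._lock:
            return self._steps // self._steps_per_epoch

    def train_step(self):
        # One fake training step; returns the epoch after the step.
        with self._lock:
            self._steps += 1
            return self._steps // self._steps_per_epoch

    def total_steps(self):
        with self._lock:
            return self._steps


def train_thread_body(counter):
    initial_epoch = counter.current_epoch()
    while True:
        epoch = counter.train_step()
        if epoch != initial_epoch:
            break


counter = EpochCounter(steps_per_epoch=1000)
workers = []
for _ in range(4):
    t = threading.Thread(target=train_thread_body, args=(counter,))
    t.start()
    workers.append(t)
for t in workers:
    t.join()
```

Each worker notices the epoch change only after its own next step, so the total number of steps can run past the epoch boundary, and a worker that starts late may continue into the following epoch.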

 
 

# self._epoch, which the training threads fetch, is one of the skipgram op's outputs.
(words, counts, words_per_epoch, self._epoch, self._words, examples,
 labels) = word2vec.skipgram(filename=opts.train_data,
                             batch_size=opts.batch_size,
                             window_size=opts.window_size,
                             min_count=opts.min_count,
                             subsample=opts.subsample)

The threading lock only affects Python code. If your thread is waiting for disk I/O or if it is calling C functions (e.g. via math library) you can ignore the GIL.

You may be able to use the async pattern to get around threading limits. Can you supply more information about what your program actually does?

I have issues with the technical accuracy of the video linked. David Beazley has done many well respected talks about the GIL at various Pycons. You can find them on pyvideo.org.

 
 

Source: <https://www.reddit.com/r/Python/comments/3s0vg9/is_my_multithreaded_python_program_doomed/>