
Python jieba Word Segmentation (2): Keyword Extraction

2016-07-18 20:37
The text used for keyword extraction is the first ten chapters of the novel Perfect World (完美世界).

I merged the first ten chapters into a single file beforehand.
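The merge step is not shown in this post. A minimal sketch of one way to do it, assuming the chapters are separate text files: the function name `merge_chapters` and the chapter file names are hypothetical. It copies raw bytes, so it makes no assumption about the text encoding.

```python
from pathlib import Path


def merge_chapters(chapter_paths, merged_path):
    """Concatenate the chapter files, in the given order, into one file."""
    with open(merged_path, "wb") as out:
        for path in chapter_paths:
            out.write(Path(path).read_bytes())
            out.write(b"\n")  # keep one chapter from running into the next


# Hypothetical usage: chapter1.txt ... chapter10.txt merged into he.txt
# merge_chapters(["chapter%d.txt" % i for i in range(1, 11)], "he.txt")
```

The chapters are written in the order they are passed in, so the list should be built in reading order.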

Then I call the keyword-extraction function directly:

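import sys
sys.path.append('../')  # only needed if jieba sits in the parent directory instead of being installed

import jieba
import jieba.analyse  # the keyword-extraction module

data_path = "C:\\Users\\wangyuguang\\Desktop\\work_data\\profect_world\\"
topK = 10           # number of keywords to return
withWeight = False  # set to True to also get each keyword's weight

# he.txt is the file that holds the merged chapters
with open(data_path + "he" + ".txt", 'rb') as f:
    content = f.read()  # raw bytes; jieba decodes them itself (it tries UTF-8, then GBK)
# print(content)

# call the keyword-extraction function directly
tags = jieba.analyse.extract_tags(content, topK=topK, withWeight=withWeight)

if withWeight:
    # with withWeight=True each item is a (keyword, weight) pair
    for tag in tags:
        print("tag: %s\t\t weight: %f" % (tag[0], tag[1]))
else:
    print(",".join(tags))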
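For context, `jieba.analyse.extract_tags` ranks words by TF-IDF: how often a word appears in the text, multiplied by an inverse document frequency taken from a table that ships with jieba. The sketch below illustrates that idea on a word list that has already been segmented. It is not jieba's implementation; the function name `tfidf_keywords` and the IDF values are made up for illustration.

```python
from collections import Counter


def tfidf_keywords(words, idf, default_idf, top_k=10):
    """Rank words by term frequency times inverse document frequency."""
    # Skip single-character tokens, which are rarely useful keywords.
    counts = Counter(w for w in words if len(w) >= 2)
    total = sum(counts.values())
    scores = {w: (n / total) * idf.get(w, default_idf) for w, n in counts.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy input: an already-segmented word list and hypothetical IDF values.
words = ["石村", "石村", "一群", "的", "石昊"]
idf = {"石村": 5.0, "一群": 1.0, "石昊": 8.0}
print(tfidf_keywords(words, idf, default_idf=3.0, top_k=2))  # ['石村', '石昊']
```

A frequent but common word such as "一群" scores low because its IDF is small, while a rarer name can rank high even with fewer occurrences.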


Keyword results:

Building prefix dict from the default dictionary ...
Loading model from cache c:\users\wangyuguang\appdata\local\temp\jieba.cache
Loading model cost 0.386 seconds.
Prefix dict has been built succesfully.
小不点,孩子,族长,石云峰,石村,凶禽,青鳞鹰,凶兽,一群,石昊
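Most of these keywords are character and place names from the novel, but "一群" ("a group of") is a generic quantity word. One simple way to drop such words is to filter the result afterwards, sketched below; the helper name `drop_generic` and the choice of words to block are assumptions, not part of the original script. jieba also provides `jieba.analyse.set_stop_words(path)` for supplying a stop-word file before extraction.

```python
def drop_generic(tags, generic_words):
    """Return the tags in their original order, without the listed generic words."""
    blocked = set(generic_words)
    return [t for t in tags if t not in blocked]


# The keyword list printed by the script above.
tags = "小不点,孩子,族长,石云峰,石村,凶禽,青鳞鹰,凶兽,一群,石昊".split(",")
print(",".join(drop_generic(tags, ["一群"])))
```

Filtering after extraction leaves fewer than `topK` keywords; filtering with a stop-word file before extraction lets the next-ranked word take the freed slot.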