
A prefix tree (trie) with auto-completion, implemented in Python

2016-12-23 10:59
1. The code

# Written for Python 3.
import json
import sys


class TrieTree:
    """Prefix tree (trie) that returns every stored entry beginning with a prefix."""

    # Bookkeeping keys stored in each node alongside its child tokens.
    META_KEYS = ("##cnt", "##terminal", "##wordTag")

    def __init__(self, is_debug=1, is_sentence=0):
        self.tree = {}
        self.is_debug = is_debug
        # is_sentence=1: tokens are words; is_sentence=0: tokens are single characters.
        self.is_sentence = is_sentence
        self.prefix_list = []

    def addFromFile(self, filePath):
        with open(filePath, encoding="utf-8") as f:
            for line in f:
                # Fields are separated by "#"; the first field is the key to index.
                line_list = line.strip().strip("#").split("#")
                main_word = line_list[0].strip().split()
                if not self.is_sentence:
                    # Character mode: drop the spaces that separate the characters.
                    sub_word_list = [u.replace(" ", "") for u in line_list]
                else:
                    sub_word_list = line_list

                target_dict = self.tree
                for i, w in enumerate(main_word):
                    if i > 0:
                        # Descend into the node created for the previous token.
                        target_dict = target_dict[main_word[i - 1]]
                    if w not in target_dict:
                        target_dict[w] = {"##cnt": 1, "##terminal": [], "##wordTag": 0}
                    else:
                        target_dict[w]["##cnt"] += 1
                    if i == len(main_word) - 1:
                        # Last token: store the full entries and mark the end of an entry.
                        target_dict[w]["##terminal"].extend(sub_word_list)
                        target_dict[w]["##wordTag"] = 1
        if self.is_debug:
            # Dump the whole tree so it can be inspected.
            with open("./debug.json", "w", encoding="utf-8") as out:
                out.write(json.dumps(self.tree, indent=2, ensure_ascii=False) + "\n")

    def searchPrefix(self, prefix_string):
        self.prefix_list = []
        target_dict = self.tree
        if not self.tree:
            return self.prefix_list
        if self.is_sentence:
            prefix_string = prefix_string.strip().split()
        # Walk down the tree one token (word or character) at a time.
        for w in prefix_string:
            if w in self.META_KEYS or w not in target_dict:
                return self.prefix_list
            target_dict = target_dict[w]

        def deepSearch(node):
            # Collect the entries that end at this node, then those in every subtree.
            # The root has no "##terminal" key, hence .get().
            self.prefix_list.extend(node.get("##terminal", []))
            for k in node:
                if k not in self.META_KEYS:
                    deepSearch(node[k])

        deepSearch(target_dict)
        return self.prefix_list


if __name__ == "__main__":
    trie = TrieTree(is_debug=1, is_sentence=1)
    trie.addFromFile(sys.argv[1])
    while True:
        raw = input("Please input:")
        print(trie.searchPrefix(raw))


2. Test data: paste the English sentences below into a file named sent.d.

Hi, my name is Steve.#

It’s nice to meet you.#

It’s a pleasure to meet you I’m Jack.#

What do you do for a living.#

I work at a restaurant.#

I work at a bank.#

I work in a software company.#

I’m a dentist.#

What is your name.#

What was that again.#

Excuse me.#

Pardon me.#

Are you ready?#

Are you free now?#

Are you Mr. Murthy?#

Are you angry with me?#

Are you afraid of them?#

Are you tired?#

Are you married?#

Are you employed?#

Are you interested in that?#

Are you awake?#

Are you aware of that?#

Are you a relative of Mr. Mohan?#

Are you not well?#

Are they your relatives?#

Are they from abroad?#

Are the shops open?#

Are you satisfied now?#

Are you joking?#

3. Running the test

Run the following in a Linux shell:

python trieTree.py sent.d

You can then type a prefix made up of whole words and get back every sentence that starts with it.
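
To make the expected behaviour concrete without the interactive prompt, here is a minimal sketch of the same word-level lookup. It is not the TrieTree class itself: the helper names build_trie and complete, and the node layout that keeps only a "##terminal" list, are simplifications assumed for illustration, and only four of the sentences from sent.d are loaded.

```python
def build_trie(sentences):
    """Insert each sentence word by word; store the sentence at its last word."""
    root = {}
    for sentence in sentences:
        node = root
        for word in sentence.split():
            node = node.setdefault(word, {"##terminal": []})
        if node is not root:
            node["##terminal"].append(sentence)
    return root


def complete(root, prefix):
    """Return every stored sentence whose leading words equal the prefix words."""
    node = root
    for word in prefix.split():
        if word == "##terminal" or word not in node:
            return []
        node = node[word]
    # Collect everything stored at or below the node the prefix leads to.
    found = []
    stack = [node]
    while stack:
        current = stack.pop()
        found.extend(current.get("##terminal", []))
        stack.extend(child for key, child in current.items() if key != "##terminal")
    return found


sentences = [
    "Are you ready?",
    "Are you free now?",
    "Are they from abroad?",
    "Excuse me.",
]
trie = build_trie(sentences)
print(sorted(complete(trie, "Are you")))  # ['Are you free now?', 'Are you ready?']
print(complete(trie, "Are they"))         # ['Are they from abroad?']
print(complete(trie, "you"))              # [] because no sentence starts with "you"
```

The last query returns nothing because the lookup always starts at the root, so a word in the middle of a sentence is never matched.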

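Note: you may have a question at this point, because this algorithm can only search by prefix. Taking the example in section 2, entering "Are" returns only the sentences that start with "Are", and entering "Are you" returns only the sentences that start with "Are you". What if I want every sentence that contains the word "shops"? This is where the "suffix tree" comes in. Despite the name, it is not a separate data structure here: every word-level suffix of every sentence is simply inserted into one prefix tree. For example, take the sentence: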

Are you a lucky dog?

All the suffixes of this sentence are:

Are you a lucky dog?

you a lucky dog?

a lucky dog?

lucky dog?

dog?

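Once every suffix of every sentence has been inserted into the prefix tree, looking up all the sentences that contain a given word becomes an ordinary prefix query.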
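
The suffix idea can be sketched as follows. This is an illustrative sketch rather than part of the original script: the function names index_suffixes and sentences_containing and the "##terminal"-only node layout are assumptions, and words are matched as whole whitespace-separated tokens, so punctuation stays attached ("dog?" is a different token from "dog").

```python
def index_suffixes(sentences):
    """Insert every word-level suffix of every sentence into one trie."""
    root = {}
    for sentence in sentences:
        words = sentence.split()
        for start in range(len(words)):
            node = root
            for word in words[start:]:
                node = node.setdefault(word, {"##terminal": []})
            # The suffix ends here; remember which full sentence it came from.
            if sentence not in node["##terminal"]:
                node["##terminal"].append(sentence)
    return root


def sentences_containing(root, phrase):
    """Return the sentences that contain the phrase as a run of whole words."""
    node = root
    for word in phrase.split():
        if word == "##terminal" or word not in node:
            return []
        node = node[word]
    # Every sentence stored at or below this node has a suffix starting with the phrase.
    found = []
    stack = [node]
    while stack:
        current = stack.pop()
        for sentence in current.get("##terminal", []):
            if sentence not in found:
                found.append(sentence)
        stack.extend(child for key, child in current.items() if key != "##terminal")
    return sorted(found)


sentences = ["Are the shops open?", "Are you a lucky dog?", "Are you ready?"]
suffix_trie = index_suffixes(sentences)
print(sentences_containing(suffix_trie, "shops"))  # ['Are the shops open?']
print(sentences_containing(suffix_trie, "you a"))  # ['Are you a lucky dog?']
print(sentences_containing(suffix_trie, "you"))    # ['Are you a lucky dog?', 'Are you ready?']
```

The price is space: a sentence of n words contributes n suffixes, so it can add up to n(n+1)/2 nodes to the tree instead of n.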