A prefix tree (trie) with autocompletion, implemented in Python
2016-12-23 10:59
1. The code
# trieTree.py (Python 3)
import json
import sys

# Bookkeeping keys stored in every node next to its child words.
META_KEYS = ("##terminal", "##cnt", "##wordTag")


class TrieTree:
    def __init__(self, is_debug=1, is_sentence=0):
        self.tree = {}
        self.is_debug = is_debug
        self.is_sentence = is_sentence
        self.prefix_list = []

    def addFromFile(self, filePath):
        with open(filePath, encoding="utf-8") as f:
            for line in f:
                # Fields are separated by "#"; the first field is the key to index.
                line_list = line.strip().strip("#").split("#")
                main_word = line_list[0].strip().split()
                if not self.is_sentence:
                    sub_word_list = [u.replace(" ", "") for u in line_list]
                else:
                    sub_word_list = line_list
                target_dict = self.tree
                for i, w in enumerate(main_word):
                    if w not in target_dict:
                        target_dict[w] = {"##cnt": 1, "##terminal": [], "##wordTag": 0}
                    else:
                        target_dict[w]["##cnt"] += 1
                    if i == len(main_word) - 1:
                        # The last token marks the end of a complete entry.
                        target_dict[w]["##terminal"].extend(sub_word_list)
                        target_dict[w]["##wordTag"] = 1
                    target_dict = target_dict[w]
        if self.is_debug:
            # Dump the whole tree so its structure can be inspected.
            context = json.dumps(self.tree, indent=2, ensure_ascii=False)
            with open("./debug.json", "w", encoding="utf-8") as out:
                out.write(context + "\n")

    def searchPrefix(self, prefix_string):
        self.prefix_list = []
        target_dict = self.tree
        if not self.tree:
            return self.prefix_list
        if self.is_sentence:
            # split() tolerates repeated or trailing spaces in the query.
            prefix_string = prefix_string.split()
        if not prefix_string:
            return self.prefix_list
        # Walk down the tree one token at a time.
        for w in prefix_string:
            if w in META_KEYS or w not in target_dict:
                return self.prefix_list
            target_dict = target_dict[w]

        def deepSearch(node):
            # Collect every entry stored at or below this node.
            self.prefix_list.extend(node["##terminal"])
            for k in node:
                if k not in META_KEYS:
                    deepSearch(node[k])

        deepSearch(target_dict)
        return self.prefix_list


if __name__ == "__main__":
    trie = TrieTree(is_debug=1, is_sentence=1)
    trie.addFromFile(sys.argv[1])
    while 1:
        raw = input("Please input:")
        print(trie.searchPrefix(raw))
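To make the node layout easier to picture, here is a minimal sketch that inserts two of the test sentences into a plain nested dict and prints the result. It reuses the `##cnt`, `##terminal` and `##wordTag` keys from the code above; the helper name `insert_sentence` is made up for this illustration and is not part of the original script.

```python
import json


def insert_sentence(tree, sentence):
    """Insert one sentence word by word, one nested dict per word."""
    node = tree
    words = sentence.split()
    for i, word in enumerate(words):
        # Create the child node on first sight, then count the visit.
        child = node.setdefault(word, {"##cnt": 0, "##terminal": [], "##wordTag": 0})
        child["##cnt"] += 1
        if i == len(words) - 1:
            # The last word stores the full sentence and is flagged as an end.
            child["##terminal"].append(sentence)
            child["##wordTag"] = 1
        node = child
    return tree


tree = {}
insert_sentence(tree, "Are you ready?")
insert_sentence(tree, "Are you tired?")
print(json.dumps(tree, indent=2))
```

Both sentences share the `Are` and `you` nodes, so those nodes end up with a `##cnt` of 2, while `ready?` and `tired?` are separate leaves that each hold one full sentence.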
2. The test data: paste the English sentences below into a file named sent.d. Each line ends with "#".
Hi, my name is Steve.#
It’s nice to meet you.#
It’s a pleasure to meet you I’m Jack.#
What do you do for a living.#
I work at a restaurant.#
I work at a bank.#
I work in a software company.#
I’m a dentist.#
What is your name.#
What was that again.#
Excuse me.#
Pardon me.#
Are you ready?#
Are you free now?#
Are you Mr. Murthy?#
Are you angry with me?#
Are you afraid of them?#
Are you tired?#
Are you married?#
Are you employed?#
Are you interested in that?#
Are you awake?#
Are you aware of that?#
Are you a relative of Mr. Mohan?#
Are you not well?#
Are they your relatives?#
Are they from abroad?#
Are the shops open?#
Are you satisfied now?#
Are you joking?#
3. Running the test
In a Linux shell, run:
python trieTree.py sent.d
You can then type a prefix made up of complete words to query the tree.
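You may wonder about one limitation: this algorithm can only search by prefix. With the data from section 2, entering Are returns only the sentences that start with Are, and entering Are you returns only the sentences that start with Are you. What if I want every sentence that contains the word shops? This is where a "suffix tree" helps. Despite the name, it is not a separate data structure: you simply insert every word-level suffix of every sentence into one prefix tree. For example, take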
Are you a lucky dog?
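All the suffixes of this sentence are:
Are you a lucky dog?
you a lucky dog?
a lucky dog?
lucky dog?
dog?
Once all the suffixes of every sentence are in the prefix tree, finding every sentence that contains a given word becomes a plain prefix lookup.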
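The idea can be sketched as follows. This is only an illustration under my own assumptions, not the author's code: the function names and the `##sentences` key are invented here, and each node keeps the set of sentences whose suffixes pass through it, which trades memory for a simpler lookup.

```python
def word_suffixes(sentence):
    """Return every word-level suffix of a sentence, longest first."""
    words = sentence.split()
    return [" ".join(words[i:]) for i in range(len(words))]


def build_suffix_trie(sentences):
    """Insert all word-level suffixes of all sentences into one trie."""
    root = {}
    for sentence in sentences:
        for suffix in word_suffixes(sentence):
            node = root
            for word in suffix.split():
                node = node.setdefault(word, {})
                # Remember which full sentence this path came from.
                node.setdefault("##sentences", set()).add(sentence)
    return root


def find_containing(root, phrase):
    """Return the sentences that contain the phrase as consecutive words."""
    node = root
    for word in phrase.split():
        if word not in node:
            return []
        node = node[word]
    return sorted(node.get("##sentences", ()))


sentences = ["Are the shops open?", "Are you ready?", "Are you a lucky dog?"]
root = build_suffix_trie(sentences)
print(find_containing(root, "shops"))
print(find_containing(root, "you a"))
```

Words are split on whitespace only, so punctuation stays attached: the last word of the example sentence is stored as `dog?`, and a query for `dog` without the question mark finds nothing.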