
Filtering Sensitive Words with the AC Multi-Pattern Matching Algorithm: An Example

2013-05-06 19:04
Excerpts and reposts of this article must credit the source: http://blog.csdn.net/shadowsick/article/details/8891939
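Many application projects need to filter sensitive words. A hand-written loop that checks each word in turn is tolerable for a small word list, but it struggles once the data grows large. For that case I recommend the AC (Aho-Corasick) multi-pattern matching algorithm.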
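For comparison, the hand-written traversal mentioned above might look roughly like the sketch below. `NaiveFilter` and `findAll` are hypothetical names invented for this illustration and are not part of the utility class in this post. The sketch scans the whole text once per keyword, so its cost grows with the number of keywords times the text length, which is why it falls behind on large word lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical baseline for illustration only; not part of the original post's code.
class NaiveFilter {

    // Checks every keyword against the text separately.
    // Each contains() call rescans the text, so the total work is roughly
    // (number of keywords) x (text length).
    static List<String> findAll(List<String> keywords, String text) {
        List<String> hits = new ArrayList<>();
        for (String keyword : keywords) {
            if (!keyword.isEmpty() && text.contains(keyword)) {
                hits.add(keyword);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList("or", "world", "33", "dd", "test");
        // prints [or, world]
        System.out.println(findAll(keywords, "hello world, how are you!"));
    }
}
```

An Aho-Corasick automaton avoids the repeated scans by walking the text once while tracking all keywords at the same time.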

First, let's write a utility class:

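// Assumed imports: the post does not list them. The Apache Commons Lang and
// org.arabidopsis.ahocorasick packages are guesses based on the calls used
// below; adjust them to match the copies in your project.
import java.util.Iterator;
import java.util.List;

import org.apache.commons.lang.StringUtils;
import org.arabidopsis.ahocorasick.AhoCorasick;
import org.arabidopsis.ahocorasick.SearchResult;

/**
 * Utility class that finds sensitive words using AC multi-pattern matching.
 *
 * @author shadow
 * @email 124010356@qq.com
 * @create 2012.04.28
 */
public class AcUtilImpl implements AcUtil {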

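    /** Splits filters with StringUtils.split(filters, regex), then searches word for every entry. */
    public String contrast(String filters, String word, String regex) {

        if (null == filters || "".equals(filters) || null == word
                || "".equals(word))
            return "";

        AhoCorasick ac = new AhoCorasick();
        String[] strings = StringUtils.split(filters, regex);
        for (String string : strings)
            ac.add(string.getBytes(), string); // keyword bytes, plus the value reported on a match
        ac.prepare(); // finish building the automaton before searching
        return matching(ac, word);
    }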

    /** Same as above, with the sensitive words supplied as an array. */
    public String contrast(String[] filters, String word) {

        if (null == filters || filters.length == 0 || null == word
                || "".equals(word))
            return "";

        AhoCorasick ac = new AhoCorasick();
        for (String filter : filters) {
            ac.add(filter.getBytes(), filter);
        }
        ac.prepare(); // finish building the automaton before searching
        return matching(ac, word);
    }

    /** Same as above, with the sensitive words supplied as a list. */
    public String contrast(List<String> filters, String word) {

        if (null == filters || filters.isEmpty() || null == word
                || "".equals(word))
            return "";

        AhoCorasick ac = new AhoCorasick();
        for (String filter : filters) {
            ac.add(filter.getBytes(), filter);
        }
        ac.prepare(); // finish building the automaton before searching
        return matching(ac, word);
    }

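    /** Runs the search and joins the outputs of every match with commas. */
    private String matching(AhoCorasick ac, String word) {
        StringBuilder buffer = new StringBuilder();
        Iterator<?> iterator = ac.search(word.getBytes());
        while (iterator.hasNext()) {
            SearchResult result = (SearchResult) iterator.next();
            // getOutputs() is appended via its toString(), e.g. "[or]"
            buffer.append(result.getOutputs()).append(",");
        }
        // Drop the trailing comma, or return "" when nothing matched
        return buffer.length() > 0 ? buffer.substring(0, buffer.length() - 1)
                : "";
    }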

    public static void main(String[] args) {
        String filters = "or,world,33,dd,test"; // sensitive words, comma-separated
        String word = "hello world, how are you!"; // text to check
        String regex = ",";
        String result = new AcUtilImpl().contrast(filters, word, regex);
        System.out.println(result);
    }
}


Then run the main function to test it. The result is:

[or],[world]

This library performs very well and matches accurately. Just download the AhoCorasick class and put it in your project.
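To give a feel for what such a class does internally, here is a minimal sketch of an Aho-Corasick automaton. It is an illustration written for this post's example, not the code of the downloaded AhoCorasick class: the name `SimpleAhoCorasick` is hypothetical, it works on `char` values rather than bytes, and it returns the matched keywords as a plain list. The method names `add`, `prepare`, and `search` only mirror the calls used above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Illustrative sketch only; not the downloaded AhoCorasick class.
class SimpleAhoCorasick {

    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        final List<String> outputs = new ArrayList<>();
        Node fail; // state to fall back to when no transition matches
    }

    private final Node root = new Node();

    // Inserts one keyword into the trie.
    void add(String keyword) {
        if (keyword.isEmpty()) {
            return;
        }
        Node node = root;
        for (char c : keyword.toCharArray()) {
            node = node.next.computeIfAbsent(c, k -> new Node());
        }
        node.outputs.add(keyword);
    }

    // Computes failure links breadth-first; call once after the last add().
    void prepare() {
        Queue<Node> queue = new ArrayDeque<>();
        root.fail = root;
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node node = queue.remove();
            for (Map.Entry<Character, Node> entry : node.next.entrySet()) {
                char c = entry.getKey();
                Node child = entry.getValue();
                Node f = node.fail;
                while (f != root && !f.next.containsKey(c)) {
                    f = f.fail;
                }
                Node target = f.next.get(c);
                child.fail = (target != null) ? target : root;
                // Keywords ending at the fallback state also end here.
                child.outputs.addAll(child.fail.outputs);
                queue.add(child);
            }
        }
    }

    // Walks the text once and collects every keyword occurrence in order.
    List<String> search(String text) {
        List<String> hits = new ArrayList<>();
        Node node = root;
        for (char c : text.toCharArray()) {
            while (node != root && !node.next.containsKey(c)) {
                node = node.fail;
            }
            Node target = node.next.get(c);
            node = (target != null) ? target : root;
            hits.addAll(node.outputs);
        }
        return hits;
    }

    public static void main(String[] args) {
        SimpleAhoCorasick ac = new SimpleAhoCorasick();
        for (String keyword : new String[] {"or", "world", "33", "dd", "test"}) {
            ac.add(keyword);
        }
        ac.prepare();
        // prints [or, world]
        System.out.println(ac.search("hello world, how are you!"));
    }
}
```

The text is scanned only once no matter how many keywords were added, which is the property that makes this approach suit large word lists.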