您的位置:首页 > 编程语言 > Java开发

【算法】计算一篇文章的单词数(C、Java语言实现)

2016-03-29 08:34 519 查看
1. C语言:一个字符一个字符的读取

(有空再贴出来)

2.Java语言:按行读取,并用正则分割成多个单词,再用MapReduce并行计算单词数 (我使用的是ieda,有些地方跟eclipse有点区别)

/**
* 按流读取文件 (通过read.readLine()获取一行)
* @param path
* @return
* @throws FileNotFoundException
*/
public BufferedReader openFile(final String path) throws FileNotFoundException {
BufferedReader reader = new BufferedReader(new FileReader(path));

return reader;
}


/**
* 采用Hash计算单词数
* @param line
* @return
*/
public void hash(final HashMap<String, Integer> hashMap, final String line) {
// 不能分割b2c,it's这类单词
String[] words = line.split("[^a-z]+");

for (String word : words) {
// 去除空格、空行
if (word.length() > 0) {
if (hashMap.containsKey(word) == false) {
hashMap.put(word, 1);
}
}
}
}


/**
* 计算单词个数
* @param hashMap
* @return
*/
public Integer computeWordCount(final HashMap<String, Integer> hashMap) {
return hashMap.size();
}


测试用例:

public static void main(String args[]) throws IOException {
String path = Paths.get(PROJECT_ROOT_DIR, "src/main/resources/articles/test.txt").toString();
BufferedReader reader = openFile(path);

HashMap<String, Integer> hashMap = new HashMap<>();
String line;
int wordCount;

while((line = reader.readLine()) != null) {
hash(hashMap, line);
}

wordCount = computeWordCount(hashMap);
System.out.println(wordCount);
}
内容来自用户分享和网络整理,不保证内容的准确性,如有侵权内容,可联系管理员处理 点击这里给我发消息
标签: