Counting the occurrences of each English word in a document with Spark
2016-12-15 10:19
Sample English document
My father was a self-taught mandolin player. He was one of the best string instrument players in our town. He could not read music, but if he heard a tune a few times, he could play it. When he was younger, he was a member of a small country music b A A A A A A A A A A A A A A A A A B B B B B B B B BB B B BB B B B B C C C C C C C C C C C D D D D D D D D D D D
Program: count how many times each word occurs in the document
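/**
 * Created by hbin on 2016/12/9.
 */
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.*;
import scala.Tuple2;

/**
 * The RDD (resilient distributed dataset) is Spark's core data abstraction.
 * An RDD is a distributed collection of elements. Every operation on data in Spark
 * comes down to creating RDDs, transforming existing RDDs, or calling actions on
 * RDDs to compute a result. Spark automatically distributes the data in an RDD
 * across the cluster and parallelizes the operations.
 */
public class BasicMap {
    public static void main(String[] args) throws Exception {
        // No master is set here; it has to be supplied externally
        // (for example via spark-submit --master or -Dspark.master).
        SparkConf sparkConf = new SparkConf().setAppName("JavaSparkPi");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);
        JavaRDD<String> input = jsc.textFile("E:\\sparkProject\\log.txt");

        // flatMap: split each line into words.
        // Returning Iterable matches the Spark 1.x FlatMapFunction;
        // from Spark 2.0 on, call() returns an Iterator instead.
        JavaRDD<String> words = input.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterable<String> call(String s) throws Exception {
                return Arrays.asList(s.split(" "));
            }
        });

        // mapToPair: turn each word into a (word, 1) pair.
        JavaPairRDD<String, Integer> result = words.mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String s) throws Exception {
                return new Tuple2<>(s, 1);
            }
        }).reduceByKey(new Function2<Integer, Integer, Integer>() { // merge the values that share a key
            @Override
            public Integer call(Integer a, Integer b) throws Exception {
                return a + b; // same key: add the corresponding values
            }
        });

        // collect() is the action that triggers the computation and brings the pairs to the driver.
        System.out.println("result=" + result.collect());
        jsc.stop();
    }
}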
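To make the three steps easier to follow without a cluster, here is a minimal local sketch of the same flatMap, mapToPair and reduceByKey logic using plain Java streams. It is only an illustration of what the pipeline computes, not how Spark executes it; `LocalWordCount` and `countWords` are hypothetical names that do not appear in the original program.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Local, single-JVM illustration of flatMap -> mapToPair -> reduceByKey.
// LocalWordCount and countWords are hypothetical names, not part of Spark.
class LocalWordCount {

    static Map<String, Integer> countWords(List<String> lines) {
        return lines.stream()
                // flatMap step: every line becomes a stream of space-separated tokens
                .flatMap(line -> Arrays.stream(line.split(" ")))
                // mapToPair + reduceByKey steps: key = token, value = 1,
                // values with the same key are merged with Integer::sum
                .collect(Collectors.toMap(w -> w, w -> 1, Integer::sum, TreeMap::new));
    }

    public static void main(String[] args) {
        // prints {a=2, b=2, c=1}
        System.out.println(countWords(Arrays.asList("a b a", "b c")));
    }
}
```

A `TreeMap` is used here only so that the printed order is stable; the Spark version makes no promise about the order of the collected pairs.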
Execution result:
result=[(tune,1), (play,1), (string,1), (younger,,1), (He,2), (not,1), (country,1), (few,1), (heard,1), (small,1), (players,1), (town.,1), (if,1), (it.,1), (B,14), (a,5), (BB,2), (was,4), (b,1), (one,1), (A,17), (When,1), (could,2), (our,1), (best,1), (,1),
(he,4), (in,1), (member,1), (music,,1), (self-taught,1), (of,2), (music,1), (father,1), (times,,1), (mandolin,1), (read,1), (C,11), (player.,1), (My,1), (but,1), (instrument,1), (D,11), (the,1)]
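The result shows the limits of splitting on a single space: "music," and "music" are counted as different words, as are "He" and "he", and there is an empty token (probably from an empty line or a leading space, although the input does not say). One possible refinement, shown as a sketch, is to normalize each line before counting. `WordTokenizer` and `tokenize` are hypothetical names, and the choice to keep hyphens and apostrophes inside words is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper (not part of the original program): splits a line into
// lowercase words and discards punctuation and empty tokens.
class WordTokenizer {

    static List<String> tokenize(String line) {
        List<String> words = new ArrayList<>();
        // Split on every run of characters that is not a lowercase letter,
        // a digit, an apostrophe or a hyphen (assumed to be part of a word).
        for (String token : line.toLowerCase(Locale.ROOT).split("[^a-z0-9'-]+")) {
            if (!token.isEmpty()) { // drop the empty token produced by a leading separator or empty line
                words.add(token);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        // prints [he, could, not, read, music, but, he, was, self-taught]
        System.out.println(tokenize("He could not read music, but he was self-taught."));
    }
}
```

In the Spark 1.x program above, the body of the flatMap function could then return `WordTokenizer.tokenize(s)` in place of `Arrays.asList(s.split(" "))`, so that "He" and "he" are counted together and the empty token disappears.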