Spark Learning 21: Spark's groupByKey

2017-08-24 13:39
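1. The code

The program below groups RDD elements in two ways: with groupBy and a key function, and with keyBy followed by groupByKey.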

```java
package groupByKey;

import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.SparkSession;

public class GroupByKey {
    public static void main(String[] args) {

        // Run locally with as many worker threads as there are cores.
        SparkSession spark = SparkSession.builder()
                .appName("lcc_java_read_hbase_register_to_table")
                .master("local[*]")
                .getOrCreate();

        JavaSparkContext sc = new JavaSparkContext(spark.sparkContext());
        List<Integer> datas = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);

        // groupBy with an anonymous Function: its return value becomes the key,
        // "偶数" (even) or "奇数" (odd).
        JavaRDD<Integer> dataRDD = sc.parallelize(datas);
        JavaPairRDD<Object, Iterable<Integer>> javapairrdd = dataRDD.groupBy(new Function<Integer, Object>() {
            @Override
            public Object call(Integer v1) throws Exception {
                return (v1 % 2 == 0) ? "偶数" : "奇数";
            }
        });

        javapairrdd.collect().forEach(System.out::println);

        // The same grouping written as one chained expression.
        sc.parallelize(datas)
                .groupBy(new Function<Integer, Object>() {
                    @Override
                    public Object call(Integer v1) throws Exception {
                        return (v1 % 2 == 0) ? "偶数" : "奇数";
                    }
                })
                .collect()
                .forEach(System.out::println);

        List<String> datas2 = Arrays.asList("dog", "tiger", "lion", "cat", "spider", "eagle");
        JavaRDD<String> dataRDD2 = sc.parallelize(datas2);

        // keyBy turns each word into a (length, word) pair,
        // then groupByKey collects all words that share a length.
        sc.parallelize(datas2)
                .keyBy(v1 -> v1.length())
                .groupByKey()
                .collect()
                .forEach(System.out::println);

        // groupBy with a key function returning mixed key types: words shorter
        // than 5 characters share the key "过滤" ("filtered"), longer words are
        // keyed by their length (an Integer), hence the Object key type.
        JavaPairRDD<Object, Iterable<String>> javapairrdd2 = dataRDD2.groupBy(new Function<String, Object>() {
            @Override
            public Object call(String v1) throws Exception {
                // Runs inside a task once per element; in local mode it prints to this console.
                System.out.println("=======v1======" + v1);
                if (v1.length() < 5) {
                    return "过滤";
                } else {
                    return v1.length();
                }
            }
        });

        javapairrdd2.collect().forEach(System.out::println);

        // Closing the JavaSparkContext also stops the underlying SparkContext.
        sc.close();
    }
}
```
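Spark's RDD.groupBy(f) computes the key f(x) for every element and then calls groupByKey() on the resulting pairs (the RDD source implements it as a map to (f(t), t) followed by groupByKey), so the parity grouping done with groupBy and the length grouping done with keyBy plus groupByKey follow the same pattern. The sketch below is a single-machine model in plain Java with no Spark dependency; the class GroupBySketch and its keyBy/groupByKey/groupBy helpers are hypothetical names chosen for illustration, not Spark API, and the model ignores partitioning, shuffling and lazy evaluation.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical single-machine model of the Spark calls; this is NOT Spark code.
class GroupBySketch {

    // Models rdd.keyBy(f): pair every element with the key computed from it.
    static <T, K> List<Map.Entry<K, T>> keyBy(List<T> data, Function<T, K> f) {
        List<Map.Entry<K, T>> pairs = new ArrayList<>();
        for (T t : data) {
            pairs.add(new AbstractMap.SimpleEntry<>(f.apply(t), t));
        }
        return pairs;
    }

    // Models pairRdd.groupByKey(): gather all values that share a key into one list.
    static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> pairs) {
        Map<K, List<V>> groups = new LinkedHashMap<>();
        for (Map.Entry<K, V> p : pairs) {
            groups.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return groups;
    }

    // Models rdd.groupBy(f) as keyBy(f) followed by groupByKey().
    static <T, K> Map<K, List<T>> groupBy(List<T> data, Function<T, K> f) {
        return groupByKey(keyBy(data, f));
    }

    public static void main(String[] args) {
        List<Integer> datas = Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9);
        // Same key function as the Spark example: "偶数" = even, "奇数" = odd.
        Map<Object, List<Integer>> parity =
                groupBy(datas, v1 -> (v1 % 2 == 0) ? "偶数" : "奇数");
        System.out.println(parity); // {奇数=[1, 3, 5, 7, 9], 偶数=[2, 4, 6, 8]}
    }
}
```

The LinkedHashMap keeps groups in first-seen order only to make the sketch predictable; Spark's collect() returns groups in partition order, which is why the real output above can list them differently.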


The output is as follows (偶数 = even, 奇数 = odd, 过滤 = "filtered"):

(偶数,[2, 4, 6, 8])
(奇数,[1, 3, 5, 7, 9])

(偶数,[2, 4, 6, 8])
(奇数,[1, 3, 5, 7, 9])

(4,[lion])
(6,[spider])
(3,[dog, cat])
(5,[tiger, eagle])
=======v1======cat
=======v1======spider
=======v1======eagle
=======v1======dog
=======v1======tiger
=======v1======lion
(6,[spider])
(过滤,[dog, lion, cat])
(5,[tiger, eagle])
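In the last example the key function returns either the String "过滤" or an Integer length, which is why the key type is Object: grouping only needs keys with consistent equals() and hashCode(), so keys of different types can sit side by side. The "=======v1======" lines most likely appear in a scrambled order because local[*] runs the key function in several tasks at once. The plain-Java sketch below (hypothetical class MixedKeySketch, no Spark involved) reproduces the same three groups with Collectors.groupingBy; it models only the grouping result, not Spark's execution order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical local model of the mixed-key groupBy example; not Spark code.
class MixedKeySketch {

    // Same logic as the Spark call(): short words share the key "过滤" ("filtered"),
    // longer words are keyed by their length (boxed to Integer).
    static final Function<String, Object> KEY =
            v1 -> v1.length() < 5 ? "过滤" : v1.length();

    static Map<Object, List<String>> group(List<String> words) {
        // groupingBy compares keys with equals()/hashCode(), so a String key and
        // Integer keys can live in the same map.
        return words.stream().collect(Collectors.groupingBy(KEY));
    }

    public static void main(String[] args) {
        List<String> datas2 = Arrays.asList("dog", "tiger", "lion", "cat", "spider", "eagle");
        Map<Object, List<String>> groups = group(datas2);
        System.out.println(groups.get("过滤")); // [dog, lion, cat]
        System.out.println(groups.get(5));     // [tiger, eagle]
        System.out.println(groups.get(6));     // [spider]
    }
}
```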

