Spark SQL syntax: reading Parquet, load, save
2016-01-08 15:58
[hadoop@node1 spark-1.5.2-bin-hadoop2.6]$ cd examples/src/main/resources/
[hadoop@node1 resources]$ file users.parquet
users.parquet: Par archive data
[hadoop@node1 resources]$ strings users.parquet|more
PAR1
Alyssa
example.avro.User
name%
favorite_color%
favorite_numbers
array
name
favorite_color
favorite_numbers
array
avro.schema
{"type":"record","name":"User","namespace":"example.avro","fields":[{"name":"name","type":"string"},{"name":"favorite_color","type":["string","null"]},{"name":"favorite_numbers","type":{"type":"array","items":"int"}}]}
parquet-mr version 1.4.3
PAR1

-- Read Parquet, save as Parquet
scala> val df = sqlContext.read.load("hdfs://node1:8020/test/input/users.parquet")
df: org.apache.spark.sql.DataFrame = [name: string, favorite_color: string, favorite_numbers: array<int>]

scala> df.select("name", "favorite_color").write.save("namesAndFavColors.parquet")

[hadoop@node1 resources]$ hadoop fs -ls /user/hadoop/namesAndFavColors.parquet
15/12/15 10:13:58 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 4 items
-rw-r--r-- 1 hadoop supergroup   0 2015-12-15 10:13 /user/hadoop/namesAndFavColors.parquet/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 303 2015-12-15 10:13 /user/hadoop/namesAndFavColors.parquet/_common_metadata
-rw-r--r-- 1 hadoop supergroup 537 2015-12-15 10:13 /user/hadoop/namesAndFavColors.parquet/_metadata
-rw-r--r-- 1 hadoop supergroup 549 2015-12-15 10:13 /user/hadoop/namesAndFavColors.parquet/part-r-00000-1523bee5-95b8-497c-b2d2-924a06eace33.gz.parquet

-- Read JSON, save as Parquet
scala> val df = sqlContext.read.format("json").load("hdfs://node1:8020/test/input/people.json")
df: org.apache.spark.sql.DataFrame = [age: bigint, name: string]

scala> df.select("name", "age").write.format("parquet").save("namesAndAges.parquet")

[hadoop@node1 ~]$ hadoop fs -ls /user/hadoop/namesAndAges.parquet
Found 5 items
-rw-r--r-- 1 hadoop supergroup   0 2015-12-15 10:31 /user/hadoop/namesAndAges.parquet/_SUCCESS
-rw-r--r-- 1 hadoop supergroup 277 2015-12-15 10:31 /user/hadoop/namesAndAges.parquet/_common_metadata
-rw-r--r-- 1 hadoop supergroup 750 2015-12-15 10:31 /user/hadoop/namesAndAges.parquet/_metadata
-rw-r--r-- 1 hadoop supergroup 537 2015-12-15 10:31 /user/hadoop/namesAndAges.parquet/part-r-00000-d9c21326-ae90-437d-a952-46524e22ca2e.gz.parquet
-rw-r--r-- 1 hadoop supergroup 531 2015-12-15 10:31 /user/hadoop/namesAndAges.parquet/part-r-00001-d9c21326-ae90-437d-a952-46524e22ca2e.gz.parquet
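
The session above calls load() and save() without format() in the first case. That works because the default data source is Parquet unless spark.sql.sources.default has been changed. As a follow-up, here is a minimal sketch of reading the saved namesAndAges.parquet back and querying it with SQL. It assumes the same spark-shell session (so `sqlContext` already exists) and the Spark 1.x API. The table name `people`, the age filter, and the output path `adults.parquet` are made up for illustration and do not come from the original post.

```scala
import org.apache.spark.sql.SaveMode

// Assumption: run inside spark-shell, where `sqlContext` is predefined,
// after namesAndAges.parquet has been written as shown above.
val reloaded = sqlContext.read.parquet("namesAndAges.parquet")
reloaded.printSchema()

// Register the DataFrame as a temporary table so it can be queried with SQL.
// "people" is a hypothetical table name chosen for this sketch.
reloaded.registerTempTable("people")
val adults = sqlContext.sql("SELECT name, age FROM people WHERE age >= 18")
adults.show()

// By default save() raises an error if the target path already exists.
// SaveMode.Overwrite replaces the existing output instead.
// "adults.parquet" is a hypothetical output path.
adults.write.mode(SaveMode.Overwrite).format("parquet").save("adults.parquet")
```

Without an explicit mode, running the same save() a second time against an existing path fails, so choosing SaveMode.Overwrite (or SaveMode.Append) is worth considering when re-running the examples.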