Apache CarbonData Quick Start Guide
2017-08-25 11:20
How to Use it?
CarbonData is a columnar storage file format for Apache Hadoop, developed and open-sourced by Huawei. It supports indexing, compression, and encoding/decoding. Its goal is to serve multiple workloads from a single copy of the data and to enable faster interactive queries.
Follow the steps in CarbonData-Quick Start.
Put the *.csv file into HDFS, like:
$ cd carbondata
$ # create a sample.csv file (sample data is shown below)
$ hdfs dfs -put sample.csv hdfs://presto00:9000/carbon/sample.csv
Start spark, like:
$ ./sbin/start-master.sh
$ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://presto00:7077
Start spark-shell, like:
$ ./bin/spark-shell --jars ../carbondata-1.2.0/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.3.jar --executor-memory 6G
Note:
--executor-memory 6G sets the Java heap space; if the data to load is not big, you can omit it.
Execute the following in the Scala shell, like:
$ import org.apache.spark.sql.SparkSession
$ import org.apache.spark.sql.CarbonSession._
$ val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://presto00:9000/carbon/db")
$ carbon.sql("CREATE TABLE IF NOT EXISTS test(id string, name string, city string, age Int) STORED BY 'carbondata'")
$ carbon.sql("LOAD DATA INPATH 'hdfs://presto00:9000/carbon/sample.csv' INTO TABLE test options('DELIMITER'=',', 'FILEHEADER'='id,name,city,age')")
Note:
1. /carbon/db is the HDFS store path where tables are stored.
2. CREATE TABLE defines the columns and their types.
3. 'DELIMITER'=',' or 'DELIMITER'='\t' specifies the separator of the data in the *.csv file.
4. The LOAD DATA options depend on whether the *.csv file has a header line. If it does, like:
id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
run:
$ carbon.sql("LOAD DATA INPATH 'hdfs://presto00:9000/carbon/sample.csv' INTO TABLE test")
If the *.csv file has no header line, like:
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35
run:
$ carbon.sql("LOAD DATA INPATH 'hdfs://presto00:9000/carbon/sample.csv' INTO TABLE test options('FILEHEADER'='id,name,city,age')")
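After the data is loaded, it is worth verifying it with a query. A minimal sketch of what such a check might look like, assuming the `carbon` session and the `test` table created above (run inside the same spark-shell session; the queries below are illustrative, not from the original guide):

```scala
// Query the loaded table; `carbon` is the CarbonSession created earlier.
// Column names follow the CREATE TABLE statement above.
carbon.sql("SELECT * FROM test WHERE city = 'shenzhen'").show()

// Aggregations work the same way, e.g. average age per city:
carbon.sql("SELECT city, avg(age) FROM test GROUP BY city").show()
```

If the load succeeded, the first query should list the rows whose city is shenzhen.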
More Usage
If the file is split by '\t', like:
1 david shenzhen 31
2 eason shenzhen 27
3 jarry wuhan 35
must run:
$ carbon.sql("CREATE TABLE IF NOT EXISTS test(id string, name string, age Int) STORED BY 'carbondata'")
$ carbon.sql("LOAD DATA INPATH 'hdfs://presto00:9000/carbon/sample.csv' INTO TABLE test options('DELIMITER'='\t','FILEHEADER'='id,name,city,age')")
Note:
CREATE TABLE does not need to contain all the columns, but in LOAD DATA you must give the complete header info; see the Programming Guide for more.
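To see which columns the table actually has after such a partial CREATE TABLE, you can describe it. A small sketch, assuming the spark-shell session and the `test` table above (this check is an illustration, not part of the original guide):

```scala
// Show the schema of the table declared with only id, name and age.
// The `city` field from the CSV is not part of the table because it
// was not declared in CREATE TABLE, even though FILEHEADER lists it.
carbon.sql("DESCRIBE test").show()
```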
For any questions, you can leave a comment below.