
Apache CarbonData快速入门指南

2017-08-25 11:20

How to Use it?

CarbonData is a columnar storage file format for Apache Hadoop that was developed and open-sourced by Huawei. It supports indexing, compression, and encoding/decoding, and its goal is to serve multiple workloads from a single copy of the data while enabling faster interactive queries.


Follow the steps in the CarbonData Quick Start guide.

Put the *.csv file into HDFS, like:

$ cd carbondata
$ # create a sample.csv file, for example with the rows shown in the notes below
$ # then put it into HDFS, like 'hdfs://presto00:9000/carbon/sample.csv':
$ hdfs dfs -put sample.csv hdfs://presto00:9000/carbon/sample.csv


Start a Spark standalone master and worker, like:

$ ./sbin/start-master.sh
$ ./bin/spark-class org.apache.spark.deploy.worker.Worker spark://presto00:7077


Start spark-shell, like:

$ ./bin/spark-shell --jars ../carbondata-1.2.0/carbondata_2.11-1.2.0-SNAPSHOT-shade-hadoop2.7.3.jar --executor-memory 6G


Note:
--executor-memory 6G
sets the Java heap space for the executor; if the data to load is not big, you can omit it.

Then execute the following Scala in spark-shell, like:

scala> import org.apache.spark.sql.SparkSession
scala> import org.apache.spark.sql.CarbonSession._
scala> val carbon = SparkSession.builder().config(sc.getConf).getOrCreateCarbonSession("hdfs://presto00:9000/carbon/db")
scala> carbon.sql("CREATE TABLE IF NOT EXISTS test(id string, name string, city string, age Int) STORED BY 'carbondata'")
scala> carbon.sql("LOAD DATA INPATH 'hdfs://presto00:9000/carbon/sample.csv' INTO TABLE test options('DELIMITER'=',', 'FILEHEADER'='id,name,city,age')")


Note:

1. /carbon/db is the HDFS store path where the tables are stored.

2. CREATE TABLE defines the columns and their types.

3. 'DELIMITER'=',' or 'DELIMITER'='\t' specifies the field separator of the data in the *.csv file.

4. The LOAD DATA options depend on the header of the csv file. If the file itself contains a header line, like:

id,name,city,age
1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35

run:

scala> carbon.sql("LOAD DATA INPATH 'hdfs://presto00:9000/carbon/sample.csv' INTO TABLE test")


If the file contains only the data rows, like:

1,david,shenzhen,31
2,eason,shenzhen,27
3,jarry,wuhan,35

run:

scala> carbon.sql("LOAD DATA INPATH 'hdfs://presto00:9000/carbon/sample.csv' INTO TABLE test options('FILEHEADER'='id,name,city,age')")


More Usage

For a file whose fields are split by '\t', like:

1 david shenzhen 31
2 eason shenzhen 27
3 jarry wuhan 35

you must run:

scala> carbon.sql("CREATE TABLE IF NOT EXISTS test(id string, name string, age Int) STORED BY 'carbondata'")
scala> carbon.sql("LOAD DATA INPATH 'hdfs://presto00:9000/carbon/sample.csv' INTO TABLE test options('DELIMITER'='\t','FILEHEADER'='id,name,city,age')")


Note: CREATE TABLE does not need to contain all of the columns, but the FILEHEADER given to LOAD DATA must list every column in the file; see the Programming Guide for more details.
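
To check which columns the table actually has after these commands, an ordinary Spark SQL DESCRIBE works in the same session (a small sketch, not part of the original guide):

scala> // inspect the columns the table was created with
scala> carbon.sql("DESCRIBE test").show()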

For any questions, please leave a comment below.