
Importing HDFS Data into HBase with Spark

2015-12-26 15:32
Author: FuRenjie

Runtime environment for this program: Spark + HDFS + HBase + YARN

The HBase table is named table, with column family fam and column col.
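To make the table layout concrete, here is a minimal, HBase-free sketch of how the program below turns one comma-separated input line into a single cell. The `Cell` case class and `CellMappingSketch` object are hypothetical names used only for illustration, and the sample line is made up; only the `"row"` and `"value"` prefixes and the names `fam` and `col` come from the program itself.

```scala
// Hypothetical, HBase-free model of one cell written by the job.
final case class Cell(rowKey: String, family: String, qualifier: String, value: String)

object CellMappingSketch {
  // Mirrors the job's logic: split on commas, then use the first field
  // for both the row key and the stored value.
  def toCell(line: String): Cell = {
    val fields = line.split(",")
    Cell("row" + fields(0), "fam", "col", "value" + fields(0))
  }

  def main(args: Array[String]): Unit = {
    // The sample input line is an assumption, not data from the original post.
    println(toCell("1001,foo,bar"))
  }
}
```

Note that only the first field of each line is used, so two input lines that share a first field map to the same row key and the later write overwrites the earlier one.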

Step 1: The code

object inputHbase:
<code class="hljs scala has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">
// The package matches the --class argument used in the submit script (Step 4).
package com.wylog.hbase

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{HTable, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by Chensy on 15-8-10.
 */
object inputHbase {
  /**
   * HBase table: table, column family: fam, column: col
   */
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
    val sc = new SparkContext(conf)
    // args(0) is the HDFS input path; every line is split on commas.
    val readFile = sc.textFile(args(0)).map(x => x.split(","))
    val tableName = "table"

    readFile.foreachPartition { x =>
      // Create the HBase configuration and table on the executor, once per partition.
      val myConf = HBaseConfiguration.create()
      myConf.set("hbase.zookeeper.quorum", "172.23.27.45,172.23.27.46,172.23.27.47")
      myConf.set("hbase.zookeeper.property.clientPort", "2181")
      myConf.set("hbase.defaults.for.version.skip", "true")
      myConf.set("hbase.master", "172.23.27.39:60000")
      myConf.set("hbase.cluster.distributed", "true")
      myConf.set("hbase.rootdir", "hdfs://cdh5-test/hbase")

      val myTable = new HTable(myConf, TableName.valueOf(tableName))
      try {
        // Turn auto-flush off. If it stays on, every single Put is committed
        // immediately, which is the main reason imports run slowly.
        myTable.setAutoFlush(false, false)
        // Set the write buffer size. Once the buffer grows past this value, HBase
        // commits it automatically. Experiment with the size; for large data
        // volumes 5 MB is usually enough.
        myTable.setWriteBufferSize(5 * 1024 * 1024)

        x.foreach { y =>
          val p = new Put(Bytes.toBytes("row" + y(0)))
          p.add(Bytes.toBytes("fam"), Bytes.toBytes("col"), Bytes.toBytes("value" + y(0)))
          myTable.put(p)
        }
        // Call flushCommits() at the end of every partition. Otherwise, whatever
        // is left in a buffer smaller than the size set above is never
        // committed, and that data is lost.
        myTable.flushCommits()
      } finally {
        // Release the table's resources even if a write fails.
        myTable.close()
      }
    }
    // Stop the SparkContext before exiting.
    sc.stop()
    System.exit(0)
  }
}
</code>
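The comments in the code above describe a client-side write buffer: with auto-flush off, writes accumulate until the buffer exceeds the configured size, and whatever remains at the end has to be flushed explicitly. The following is a small, HBase-free sketch of that behavior. `BufferedWriter` and `BufferedWriterSketch` are hypothetical names, the record and buffer sizes are arbitrary, and the class is a simplified model rather than HBase's actual implementation.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a client-side write buffer; this is NOT HBase code.
final class BufferedWriter(bufferSizeBytes: Long) {
  private val pending = ArrayBuffer.empty[Array[Byte]]
  private var pendingBytes = 0L
  // Everything that has actually been "sent to the server".
  val committed = ArrayBuffer.empty[Array[Byte]]

  // Buffer one record; commit automatically once the buffer exceeds its limit.
  def put(record: Array[Byte]): Unit = {
    pending += record
    pendingBytes += record.length
    if (pendingBytes > bufferSizeBytes) flushCommits()
  }

  // Send whatever is still buffered, regardless of how small it is.
  def flushCommits(): Unit = {
    committed ++= pending
    pending.clear()
    pendingBytes = 0L
  }
}

object BufferedWriterSketch {
  // Writes `n` records of 4 bytes each and returns how many were committed.
  def committedCount(n: Int, bufferSizeBytes: Long, finalFlush: Boolean): Int = {
    val writer = new BufferedWriter(bufferSizeBytes)
    (1 to n).foreach(_ => writer.put(Array[Byte](1, 2, 3, 4)))
    if (finalFlush) writer.flushCommits()
    writer.committed.size
  }

  def main(args: Array[String]): Unit = {
    println(committedCount(n = 10, bufferSizeBytes = 15, finalFlush = false)) // prints 8
    println(committedCount(n = 10, bufferSizeBytes = 15, finalFlush = true))  // prints 10
  }
}
```

In this model, ten 4-byte records with a 15-byte buffer trigger two automatic commits of four records each; the last two records reach the "server" only when the final flush runs, which is the situation the closing `flushCommits()` call in the job guards against.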


Step 2: Package the jar and upload it to HDFS
<code class="hljs avrasm has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"># Packaging with IDEA is not covered here; the resulting jar is inputHbase.jar
hadoop fs -put inputHbase.jar /xxx/spark/streaming</code>


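Step 3: Add the required jars

Create a shared library directory and keep all the required jar files in it, so they are easy to add to the submit command.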
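One convenience of a shared directory is that the comma-separated list expected by `--jars` can be generated instead of typed by hand. The helper below is a hypothetical sketch, not part of the original setup: `JarListSketch` and `jarsArgument` are made-up names, and it assumes every `.jar` file directly inside the directory should be shipped with the job.

```scala
import java.io.File

object JarListSketch {
  // Builds a comma-separated list of absolute paths for every .jar file
  // directly inside `dir`, in sorted order. Returns "" if there are none.
  def jarsArgument(dir: File): String = {
    val files = Option(dir.listFiles()).getOrElse(Array.empty[File])
    files
      .filter(f => f.isFile && f.getName.endsWith(".jar"))
      .map(_.getAbsolutePath)
      .sorted
      .mkString(",")
  }

  def main(args: Array[String]): Unit = {
    // The directory comes from the command line; it defaults to the current directory.
    val dir = new File(args.headOption.getOrElse("."))
    println(jarsArgument(dir))
  }
}
```

The printed string contains no spaces or line breaks, so it can be passed as the single argument that `--jars` expects.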



Step 4: Write the submit script: submit-yarn-inputHbase.sh
<code class="hljs haml has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;">[root@JXQ-23-27-38 streaming]# vim submit-yarn-inputHbase.sh

cd $SPARK_HOME

#pwd

# Note: the --jars value must be one comma-separated argument with no spaces or line breaks.
./bin/spark-submit --name inputHbase \
                   --class com.wylog.hbase.inputHbase \
                   --master yarn-cluster \
                   --num-executors 8 \
                   --executor-memory 4g \
                   --executor-cores 8 \
                   --driver-memory 4g \
                   --driver-cores 4 \
                   --jars /root/spark/streaming/public_lib/hbase-client-0.98.6-cdh5.3.2.jar,/root/spark/streaming/public_lib/hbase-server-0.98.6-cdh5.3.2.jar,/root/spark/streaming/public_lib/hbase-protocol-0.98.6-cdh5.3.2.jar,/root/spark/streaming/public_lib/htrace-core-2.04.jar \
                   hdfs://cdh5-test/xxx/spark/streaming/inputHbase.jar \
                   hdfs://cdh5-test/data/notify-server/172.17.88.88/notify-server2_detail.log.*</code>
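As a rough sanity check on the resource flags in the script above, the arithmetic can be written out. This is an illustrative sketch only: `SubmitResources` and `SubmitResourcesSketch` are hypothetical names, and the task count assumes Spark's default of one core per task (`spark.task.cpus = 1`).

```scala
// The values mirror the flags in the submit script; the case class itself is illustrative.
final case class SubmitResources(numExecutors: Int, executorCores: Int, executorMemoryGb: Int)

object SubmitResourcesSketch {
  // Upper bound on tasks running at once, assuming one core per task.
  def maxConcurrentTasks(r: SubmitResources): Int = r.numExecutors * r.executorCores

  // Total executor heap requested, excluding YARN's per-container memory overhead.
  def totalExecutorMemoryGb(r: SubmitResources): Int = r.numExecutors * r.executorMemoryGb

  def main(args: Array[String]): Unit = {
    val r = SubmitResources(numExecutors = 8, executorCores = 8, executorMemoryGb = 4)
    println(s"max concurrent tasks: ${maxConcurrentTasks(r)}")          // 64
    println(s"total executor memory: ${totalExecutorMemoryGb(r)} GB")   // 32 GB
  }
}
```

Because the job opens one table per partition, up to 64 partitions may be writing to HBase at the same time under these settings, each with its own 5 MB write buffer.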


References:
http://www.cloudera.com/content/cloudera/zh-CN/documentation/core/v5-3-x/topics/admin_hbase_import.html#concept_asc_ctz_wp_unique_1