Hadoop (4): Configuring Azure Blob Storage for a Local HBase Cluster
2016-09-05 21:19
Starting from the Hadoop cluster created in HDP 2.4 Installation (5): Cluster and Component Installation, this post modifies the default configuration so that HBase stores its data in Azure Blob Storage.
Contents:
Overview
Configuration
Verification
FAQ
Overview:
hadoop-azure provides integration between Hadoop and Azure Blob Storage. It requires the hadoop-azure.jar package, which ships with HDP 2.4 by default, as shown in the figure below:
Once configured, all data read and written is stored in the Azure Blob Storage account.
Multiple Azure Blob Storage accounts can be configured, and the module implements the standard Hadoop FileSystem interface.
File system paths are referenced with URLs using the wasb scheme.
Tested on both Linux and Windows, and tested at scale.
Azure Blob Storage involves three concepts:
Storage Account: all access is done through a storage account.
Container: a container is a grouping of multiple blobs. A storage account may have multiple containers. In Hadoop, an entire file system hierarchy is stored in a single container. It is also possible to configure multiple containers, effectively presenting multiple file systems that can be referenced using distinct URLs.
Blob: a file of any type and size. In Hadoop, files are stored in blobs. The internal implementation also uses blobs to persist the file system hierarchy and other metadata.
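To make the mapping concrete, here is a small sketch of how the three pieces combine into a wasb URL, using the account and container names from this post (the helper function itself is just an illustration, not part of hadoop-azure):

```shell
#!/bin/sh
# Build a wasb URL from its parts:
#   wasb://<container>@<account>.<blob-endpoint>/<path>
# The endpoint below is the China Azure blob endpoint used in this post.
wasb_url() {
  container=$1; account=$2; path=$3
  printf 'wasb://%s@%s.blob.core.chinacloudapi.cn/%s\n' \
    "$container" "$account" "${path#/}"
}

wasb_url localhbase localhbase /hbase/data/default
# prints: wasb://localhbase@localhbase.blob.core.chinacloudapi.cn/hbase/data/default
```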
Configuration:
Create a blob storage account in the China Azure portal (https://manage.windowsazure.cn); in this example it is named localhbase, as shown in the figure below.
Configure the Azure blob storage access credentials (account key) and switch the default file system in the local Hadoop core-site.xml file, as follows:
<property>
  <name>fs.defaultFS</name>
  <value>wasb://localhbase@localhbase.blob.core.chinacloudapi.cn</value>
</property>
<property>
  <name>fs.azure.account.key.localhbase.blob.core.chinacloudapi.cn</name>
  <value>YOUR ACCESS KEY</value>
</property>
In most Hadoop clusters the core-site.xml file is world-readable. For better security, the account key can be stored encrypted and decrypted at runtime by a user-supplied program. The configuration for that scenario (an optional, security-motivated setup) is:
<property>
  <name>fs.azure.account.keyprovider.localhbase.blob.core.chinacloudapi.cn</name>
  <value>org.apache.hadoop.fs.azure.ShellDecryptionKeyProvider</value>
</property>
<property>
  <name>fs.azure.account.key.localhbase.blob.core.chinacloudapi.cn</name>
  <value>YOUR ENCRYPTED ACCESS KEY</value>
</property>
<property>
  <name>fs.azure.shellkeyprovider.script</name>
  <value>PATH TO DECRYPTION PROGRAM</value>
</property>
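ShellDecryptionKeyProvider runs the configured script with the encrypted key as its single argument and reads the decrypted key from the script's stdout. A minimal sketch of such a decryption program is below; the passphrase-file location and the choice of openssl cipher are my own assumptions (hadoop-azure does not mandate any particular encryption scheme):

```shell
#!/bin/sh
# Hypothetical decryption helper for fs.azure.shellkeyprovider.script.
# Invoked by Hadoop as:  <script> <encrypted-account-key>
# and expected to print the decrypted key to stdout.

# Passphrase file path is an illustrative assumption; requires OpenSSL 1.1.1+.
PASS_FILE="${PASS_FILE:-/etc/hadoop/conf/wasb.pass}"

decrypt_key() {
  # base64-decode and AES-decrypt the encrypted key, print the plain key
  printf '%s' "$1" | \
    openssl enc -d -aes-256-cbc -a -A -pbkdf2 -pass "file:$PASS_FILE"
}

# When Hadoop invokes the script, $1 is the encrypted account key.
if [ $# -gt 0 ]; then
  decrypt_key "$1"
fi
```

The stored value for fs.azure.account.key... would then be the matching `openssl enc -aes-256-cbc -a -A -pbkdf2` output of the real account key.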
The Azure Blob Storage interface for Hadoop supports two kinds of blobs: block blobs and page blobs. Block blobs are the default kind of blob and are good for most big-data use cases, such as input data for Hive, Pig, and analytical MapReduce jobs.
Page blob handling in hadoop-azure was introduced to support HBase log files. Page blobs can be written any number of times, whereas a block blob can only be appended to 50,000 times before it runs out of blocks and writes fail. That does not work for HBase logs, so page blob support was introduced to overcome the limitation.
Page blobs can be up to 1 TB in size, larger than the 200 GB maximum for block blobs.
To have the files you create be page blobs, set the configuration variable fs.azure.page.blob.dir to a comma-separated list of folder names:
<property>
  <name>fs.azure.page.blob.dir</name>
  <value>/hbase/WALs,/hbase/oldWALs,/mapreducestaging,/hbase/MasterProcWALs,/atshistory,/tezstaging,/ams/hbase</value>
</property>
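Conceptually, every new file whose path falls under one of the listed folders becomes a page blob, and everything else stays a block blob. The sketch below illustrates that matching logic; it is my own simplification for explanation, not the actual hadoop-azure source:

```shell
#!/bin/sh
# Rough illustration of fs.azure.page.blob.dir matching: a path is a
# page-blob path if it equals, or sits beneath, one of the listed folders.
PAGE_BLOB_DIRS="/hbase/WALs,/hbase/oldWALs,/mapreducestaging,/hbase/MasterProcWALs,/atshistory,/tezstaging,/ams/hbase"

is_page_blob_path() {
  path=$1
  old_ifs=$IFS; IFS=','
  for d in $PAGE_BLOB_DIRS; do        # iterate the comma-separated list
    case $path in
      "$d"|"$d"/*) IFS=$old_ifs; return 0 ;;
    esac
  done
  IFS=$old_ifs
  return 1
}

is_page_blob_path /hbase/WALs/region1/0001.log && echo "page blob"
is_page_blob_path /hbase/data/default/mytable  || echo "block blob"
```

This is why HBase write-ahead log directories (WALs, oldWALs, MasterProcWALs) appear in the list while table data directories do not.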
Verification:
All of the parameters above are configured in Ambari; restart the services that depend on them.
Command: hdfs dfs -ls /hbase/data/default — as shown in the figure below, there is no data yet.
Follow HBase (3): Importing Azure HDInsight HBase Table Data into Local HBase to import the test table data; when it finishes, the result looks like the figure below:
Command: ./hbase hbck -repair -ignorePreCheckPermission
Command: hbase shell
Inspect the data; if it matches the figure below, everything is OK.
Finally, verify the data with the query tool we developed ourselves, shown below; the tool's development is covered in the next chapter.
Reference: https://hadoop.apache.org/docs/current/hadoop-azure/index.html
FAQ
Do not place the Ambari Metrics Collector on the same machine as a RegionServer.
HA must be configured before changing the data directory to wasb.
Add the following to the Hadoop core-site.xml, otherwise the MapReduce2 component will fail to start (note that impl is lowercase):
<property>
  <name>fs.AbstractFileSystem.wasb.impl</name>
  <value>org.apache.hadoop.fs.azure.Wasb</value>
</property>
On a locally built cluster: after configuring HA, changing the cluster FS to wasb, and copying the original HBase cluster's physical file directories to the newly created blob storage, inserting data into an indexed table through Phoenix failed. Fix it by adding the following to hbase-site.xml:
<property>
  <name>hbase.regionserver.wal.codec</name>
  <value>org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec</value>
</property>