Choosing a Method for Importing Data into HBase
2016-04-12 10:55
Choosing the Right Import Method
If the data is already in an HBase table:
- To move the data from one HBase cluster to another, take a snapshot and use either the clone_snapshot or the ExportSnapshot utility, or use the CopyTable utility.
- To move the data from one HBase cluster to another without downtime on either cluster, use replication.
- To migrate data between HBase versions that are not wire compatible, such as from CDH 4 to CDH 5, see Importing HBase Data From CDH 4 to CDH 5.
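
As a rough illustration of the first option, the sketch below assembles the command lines for ExportSnapshot and CopyTable as argument lists. It assumes a snapshot has already been taken (for example with `snapshot 'mytable', 'mysnap'` in the HBase shell). The table name, snapshot name, destination HDFS URL, and ZooKeeper quorum are placeholders, and the helper functions are hypothetical names introduced only for this example.

```python
import shlex


def export_snapshot_cmd(snapshot, dest_root, mappers=4):
    """Build an ExportSnapshot command that copies a snapshot to another cluster.

    dest_root is the HBase root directory on the destination cluster's HDFS
    (placeholder value in the example below).
    """
    return [
        "hbase", "org.apache.hadoop.hbase.snapshot.ExportSnapshot",
        "-snapshot", snapshot,
        "-copy-to", dest_root,
        "-mappers", str(mappers),
    ]


def copy_table_cmd(table, peer_zk, new_name=None):
    """Build a CopyTable command that copies a live table to a peer cluster.

    peer_zk has the form "zk1,zk2,zk3:2181:/hbase" (quorum:port:znode parent).
    """
    cmd = [
        "hbase", "org.apache.hadoop.hbase.mapreduce.CopyTable",
        "--peer.adr=" + peer_zk,
    ]
    if new_name is not None:
        cmd.append("--new.name=" + new_name)
    cmd.append(table)
    return cmd


if __name__ == "__main__":
    # Print the commands instead of running them; all values are placeholders.
    for cmd in (
        export_snapshot_cmd("mysnap", "hdfs://dest-nn:8020/hbase", mappers=8),
        copy_table_cmd("mytable", "zk1,zk2,zk3:2181:/hbase"),
    ):
        print(" ".join(shlex.quote(part) for part in cmd))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; on a real cluster the lists could be passed to `subprocess.run` after checking them against the installed HBase version.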
If the data currently exists outside HBase:
- If possible, write the data to HFile format, and use a BulkLoad to import it into HBase. The data is immediately available to HBase and you can bypass the normal write path, increasing efficiency.
- If you prefer not to use bulk loads, and you are using a tool such as Pig, you can use it to import your data.
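
One possible shape of the bulk-load route is sketched below. It uses ImportTsv to produce HFiles, which is only one way of writing HFile format and is not named in the text above, followed by LoadIncrementalHFiles to hand the files to HBase. The class names are those shipped with HBase 1.x releases (later releases relocate the loader class), and the table name, column mapping, and HDFS paths are placeholders. The helper functions are hypothetical.

```python
def import_tsv_cmd(table, input_dir, hfile_dir, columns):
    """Build an ImportTsv command that writes HFiles instead of issuing puts.

    columns maps TSV fields to HBase columns, e.g.
    ["HBASE_ROW_KEY", "cf:name", "cf:age"].
    """
    return [
        "hbase", "org.apache.hadoop.hbase.mapreduce.ImportTsv",
        "-Dimporttsv.columns=" + ",".join(columns),
        "-Dimporttsv.bulk.output=" + hfile_dir,
        table, input_dir,
    ]


def load_hfiles_cmd(hfile_dir, table):
    """Build the command that moves the generated HFiles into the table."""
    return [
        "hbase", "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles",
        hfile_dir, table,
    ]


if __name__ == "__main__":
    # Placeholder paths and names; the two steps run in this order.
    steps = [
        import_tsv_cmd("mytable", "/data/in", "/data/hfiles",
                       ["HBASE_ROW_KEY", "cf:name", "cf:age"]),
        load_hfiles_cmd("/data/hfiles", "mytable"),
    ]
    for step in steps:
        print(" ".join(step))
```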
If you need to stream live data to HBase instead of importing it in bulk:
- Write a Java client using the Java API, or use the Apache Thrift Proxy API to write a client in a language supported by Thrift.
- Stream data directly into HBase using the REST Proxy API in conjunction with an HTTP client such as wget or curl.
- Use Flume or Spark.
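
To make the REST option more concrete, here is a minimal sketch that builds (but does not send) a single-cell write request for the HBase REST server. In the JSON representation, row keys, column names, and values are base64-encoded. The base URL, table, row, column, and value are placeholders, and `build_put_request` is a hypothetical helper written for this example.

```python
import base64
import json
import urllib.parse
import urllib.request


def b64(text):
    """Base64-encode a string, as the REST JSON representation expects."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def build_put_request(base_url, table, row, column, value):
    """Build a PUT request that writes one cell; column is "family:qualifier"."""
    body = {
        "Row": [
            {
                "key": b64(row),
                "Cell": [{"column": b64(column), "$": b64(value)}],
            }
        ]
    }
    url = "{}/{}/{}".format(
        base_url.rstrip("/"), table, urllib.parse.quote(row, safe="")
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )


if __name__ == "__main__":
    # Placeholder host and port; adjust to wherever the REST server listens.
    req = build_put_request("http://rest-host:8080", "mytable", "row1", "cf:greeting", "hello")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

The same request could be issued with curl by passing the printed JSON as the body of a PUT with a `Content-Type: application/json` header.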
Most likely, at least one of these methods works in your situation. If not, you can use MapReduce directly. Test the most feasible methods with a subset of your data to determine which one is optimal.
Source: http://www.cloudera.com/documentation/enterprise/5-4-x/topics/admin_hbase_import.html