
Sqoop Study Notes --- Incremental Import of Data into HBase

2016-11-16 10:02
English version (from the Sqoop User Guide):

  

Sqoop provides an incremental import mode which can be used to retrieve only rows newer than some previously-imported set of rows.

Argument               Description
--check-column (col)   Specifies the column to be examined when determining which rows to import. (The column should not be of type CHAR/NCHAR/VARCHAR/NVARCHAR/LONGVARCHAR/LONGNVARCHAR.)
--incremental (mode)   Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified.
--last-value (value)   Specifies the maximum value of the check column from the previous import.
Sqoop supports two types of incremental imports: append and lastmodified. You can use the --incremental argument to specify the type of incremental import to perform.
You should specify append mode when importing a table where new rows are continually being added with increasing row id values. You specify the column containing the row's id with --check-column. Sqoop imports rows where the check column has a value greater than the one specified with --last-value.
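To make the append rule concrete, here is a small shell illustration that applies it to a local CSV file instead of a database table. It is a sketch, not how Sqoop works internally: Sqoop turns the rule into a condition in the SQL query it sends to the source database. The function name simulate_append and the file layout (numeric id in the first field) are hypothetical.

```shell
#!/bin/sh
# simulate_append FILE LAST_VALUE   (hypothetical helper, not part of Sqoop)
#   FILE: CSV whose first field is the numeric check column (e.g. id).
#   Prints the rows append mode would import (check column > LAST_VALUE),
#   then the largest check-column value seen, i.e. the next --last-value.
simulate_append() {
  awk -F, -v last="$2" '
    BEGIN { max = last + 0 }
    ($1 + 0) > (last + 0) { print; if ($1 + 0 > max) max = $1 + 0 }
    END { print "next --last-value: " max }
  ' "$1"
}

# Usage: the previous import stopped at id 2, so only id 3 is new.
rows=$(mktemp)
printf '1,alice\n2,bob\n3,carol\n' > "$rows"
simulate_append "$rows" 2
# prints: 3,carol
#         next --last-value: 3
rm -f "$rows"
```

Because the comparison is strictly "greater than", a row whose id equals the previous --last-value is never imported twice.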
An alternate table update strategy supported by Sqoop is called lastmodified mode. You should use this when rows of the source table may be updated, and each such update will set the value of a last-modified column to the current timestamp. Rows where the check column holds a timestamp more recent than the timestamp specified with --last-value are imported.
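The lastmodified rule, as worded in the passage above, can be illustrated the same way. This is a hedged sketch on a local CSV file, assuming a hypothetical layout with the last-modified timestamp in the second field; real Sqoop builds its own SQL condition against a date/timestamp column, and its exact boundary handling is not modeled here.

```shell
#!/bin/sh
# simulate_lastmodified FILE LAST_VALUE   (hypothetical helper, not part of Sqoop)
#   FILE: CSV whose second field is a last-modified timestamp written as
#   "YYYY-MM-DD HH:MM:SS". That fixed-width format sorts the same way as
#   a string and as a point in time, so a string comparison is enough here.
#   Prints the rows whose timestamp is strictly newer than LAST_VALUE.
simulate_lastmodified() {
  # Appending "" forces awk to compare as strings, not numbers.
  awk -F, -v last="$2" '($2 "") > (last "") { print }' "$1"
}

# Usage: row 2 was updated and row 3 was inserted after the previous
# import, which recorded 2016-11-10 00:00:00 as its last value.
rows=$(mktemp)
printf '1,2016-11-01 08:00:00\n2,2016-11-15 09:30:00\n3,2016-11-16 10:00:00\n' > "$rows"
simulate_lastmodified "$rows" '2016-11-10 00:00:00'
# prints: 2,2016-11-15 09:30:00
#         3,2016-11-16 10:00:00
rm -f "$rows"
```

Unlike append mode, an old row (id 2 here) is selected again as soon as its timestamp moves past the last value.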
At the end of an incremental import, the value which should be specified as --last-value for a subsequent import is printed to the screen. When running a subsequent import, you should specify --last-value in this way to ensure you import only the new or updated data. This is handled automatically by creating an incremental import as a saved job, which is the preferred mechanism for performing a recurring incremental import. See the section on saved jobs in the Sqoop User Guide for more information.
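A saved job for the HBase import in this post might look like the sketch below. The sqoop job --create / --exec / --list / --show subcommands are standard Sqoop 1 commands. The job name incr_to_hbase is hypothetical, the connection values are the placeholders used in this post, and the run wrapper is an assumption added so the commands are only printed unless RUN=1 is set. -P makes Sqoop prompt for the password. By default Sqoop does not store passwords in its metastore, so a prompt on --exec is expected.

```shell
#!/bin/sh
# Dry-run wrapper (hypothetical): prints each command unless RUN=1,
# so the sketch can be read or tested without a Sqoop installation.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

JOB=incr_to_hbase   # hypothetical job name

# Create the saved job once. Everything after the bare "--" is an
# ordinary import command; --last-value 0 is only the starting point.
run sqoop job --create "$JOB" -- import \
  --connect jdbc:mysql://ip:port/db --table tablename \
  --hbase-table namespace:tablename --column-family columnfamily \
  --hbase-create-table --username username -P \
  --incremental append --check-column id --last-value 0

# Each execution imports only the new rows; after a successful run
# Sqoop saves the new last value back into the job definition.
run sqoop job --exec "$JOB"

# List saved jobs and show this job's stored parameters.
run sqoop job --list
run sqoop job --show "$JOB"
```

Running sqoop job --exec on a schedule (for example from cron) then replaces manually copying the printed --last-value into each new command.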

Translation: ==================================

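The passage above is actually not hard to understand. Incremental import is controlled by three parameters.

First parameter:

  --check-column (col): the column used to track increments. It should preferably not be a string type; typical choices are a time column or an id column.

Second parameter:

  --incremental (mode): selects the incremental mode. There are two choices: append and lastmodified.

Third parameter:

  --last-value (value): the value of the check column (the first parameter) at which the previous import stopped. Only rows whose check column is greater than this value (for lastmodified, more recent than it) are imported; for example, with --last-value 0, every row whose id is greater than 0 is imported.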
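The three parameters always travel together, and --incremental accepts only the two modes listed above. The hypothetical helper below (not part of Sqoop) assembles them into one argument string and rejects an unknown mode or an empty check column. It is a sketch for illustration only.

```shell
#!/bin/sh
# incr_args MODE COLUMN LAST_VALUE   (hypothetical helper, not part of Sqoop)
# Prints the three incremental-import arguments on one line.
# Fails with a message on stderr for an unknown mode or an empty column.
incr_args() {
  case "$1" in
    append|lastmodified) ;;
    *) echo "invalid --incremental mode: $1" >&2; return 1 ;;
  esac
  if [ -z "$2" ]; then
    echo "--check-column must not be empty" >&2
    return 1
  fi
  printf '%s\n' "--incremental $1 --check-column $2 --last-value $3"
}

incr_args append id 0
# prints: --incremental append --check-column id --last-value 0
```

The printed string is meant for display; a lastmodified timestamp contains a space, so when building a real command pass each value as its own quoted argument instead of word-splitting this output.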

Combined with the remaining options, the full command is:

     sqoop import --connect jdbc:mysql://ip:port/db --table tablename \
         --hbase-table namespace:tablename --column-family columnfamily --hbase-create-table \
         --username 'username' --password 'password' \
         --incremental append --check-column 'id' --last-value 0
