您的位置：首页 > 其它

Phoenix二级索引那些事儿（下）

2015-11-06 15:04 411 查看

http://www.icaijing.com/hot/article4940159/

Phoenix二级索引那些事儿（下）

作者：中兴大数据| 发表时间：2015-7-30 03:31:18

索引配置

公共配置

hbase-site.xml

hbase.regionserver.wal.codec

org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec

全局索引配置

hbase-site.xml配置项

支持（HBase0.98.4+ and Phoenix 4.3.1+ only）

hbase.region.server.rpc.scheduler.factory.class

org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory

Factory to create thePhoenix RPC Scheduler that uses separate queues for index and metadataupdates

hbase.rpc.controllerfactory.class

org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory

Factory to create thePhoenix RPC Scheduler that uses separate queues for index and metadataupdates

局部索引配置

hbase-site.xml配置项

hbase.master.loadbalancer.class

org.apache.phoenix.hbase.index.balancer.IndexLoadBalancer

hbase.coprocessor.master.classes

org.apache.phoenix.hbase.index.master.IndexMasterObserver

hbase.coprocessor.regionserver.classes

org.apache.hadoop.hbase.regionserver.LocalIndexMerger

索引配置调优

hbase-site.xml配置项

index.builder.threads.max

Default: 10

根据主表更新建立索引表更新的线程数目

调高这个值可以克服读取Region的row state的瓶颈，如果调的太高，HRegion又会遇到处理太多并发scan requests的瓶颈以及 generalthread-swapping 障碍.

index.builder.threads.keepalivetime

Default: 60

index builder线程池里的线程过期之后存活的时间

超过这个存活时间，未使用会立马被释放，核心线程是不会被保留的．如果从负载角度考虑，是可以手动去释放线程

index.writer.threads.max

Default: 10

将index update写入index table的线程数目

应该大致对应index table的数目

index.writer.threads.keepalivetime

Default: 60

类似index.builder.threads.keepalivetime

hbase.htable.threads.max

Default: 2,147,483,647

索引表可以使用的写线程最大数目

增加这个值会提高索引更新的并发量，提升全局吞吐

hbase.htable.threads.keepalivetime

Default: 60

类似index.builder.threads.keepalivetime

index.tablefactory.cache.size

Default: 10

放入缓存的索引表的数量

增加这个值，可以确保写index时不需要重新创建indexHTable，但是值越大，memory压力越大．

org.apache.phoenix.regionserver.index.priority.min

Default: 1000

Value to specify to bottom (inclusive) of therange in which index priority may lie.

org.apache.phoenix.regionserver.index.priority.max

Default: 1050

Value to specify to top (exclusive) of therange in which index priority may lie.

Higher priorites within the index min/max rangedo not means updates are processed sooner.

org.apache.phoenix.regionserver.index.handler.count

Default: 30

Number of threads to use when serving indexwrite requests for global index maintenance.

Though the actual number of threads is dictatedby the Max(number of call queues, handler count), where the number of callqueues is determined by standard HBase configuration. To further tune thequeues, you can adjust the standard rpc queue length parameters
(currently,there are no special knobs for the index queues), specificallyipc.server.max.callqueue.length and ipc.server.callqueue.handler.factor. Seethe HBase Reference Guide for more details.

其它功能

Phoenix子查询

IN和Not In的子查询

例子

SELECT ItemName

FROM Items

WHERE ItemID IN

(SELECT ItemID

FROM Orders

WHERE Date >= to_date('2013/09/02'));

Exists和Not Exists的子查询

例子

SELECT ItemName

FROM Items i

WHERE EXISTS

(SELECT *

FROM Orders

WHERE Date >= to_date('2013/09/02')

AND ItemID = i.ItemID);

半连接、反连接、join

例子

SELECTd.dept_id,e.dept_id,e.name FROM DEPT d JOIN EMPL e ON e.dept_id = d.dept_id;

JOIN支持：

INNER

LEFT OUTER

RIGHT

比较运算

例子

SELECT ID, Name

FROM Contest

WHERE Score >

(SELECT avg(Score)

FROM Contest)

ORDER BY ScoreDESC;

ANY/SOME/ALL运算

例子

SELECT OrderID

FROM Orders

WHERE quantity>= ANY

(SELECT max(quantity)

FROM Orders

GROUP BY ItemID);

相关子查询

例子

SELECT PatentID,Title

FROM Patents p

WHERE FileDate , LIMIT

SELECT title,author, isbn, description

FROM library

WHEREpublished_date > 2010

AND (title, author,isbn) > (?, ?, ?)

ORDER BY title,author, isbn

LIMIT 20

Phoenix 统计收集

统计收集有助于提升query性能。

命令：

UPDATE STATISTICSmy_table

等效于

UPDATE STATISTICSmy_table ALL

如果只收集index或者column

UPDATE STATISTICSmy_table INDEX

UPDATE STATISTICSmy_table COLUMNS

参数配置

phoenix.stats.guidepost.width默认104857600

phoenix.stats.guidepost.per.region

phoenix.stats.updateFrequency默认900000 (15 mins)

phoenix.stats.minUpdateFrequency默认7.5 mins

phoenix.stats.useCurrentTime 默认true

Phoenix 常用语法

Commands

Other Grammar

Phoenix 常用函数

Aggregate Functions

String Functions

Time and Date Functions

Numeric Functions

Array Functions

Other Functions

Phoenix 数据类型

Phoenix 使用任意时间戳

在Property里面设置属性 \"CurrentSCN\"。ts是一个long。

Properties props = new Properties();

props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts));

Connection conn = DriverManager.connect(myUrl, props);

conn.createStatement().execute(\"UPSERT INTO myTable VALUES ('a')\");

conn.commit();

相当于:

myTable.put(Bytes.toBytes('a'),ts);

Phoenix 性能提升

1、加盐：

加盐可以将数据存入多个region里，从而提升读写性能。

CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SALT_BUCKETS=42

如果有16台region server，每台server有4核CPU，则SALT_BUCKETS 设置为32~64之间。即如果集群总的CPU核数为N，则SALT_BUCKETS为 0.5N ~ N 之间。

加盐后的注意事项：

sequential scan 返回的结果可能不是自然排序的，如果sequential scan使用了LIMIT语句，将与不加盐的情况不一样。

Spit point：If no split points are specified for the table, the salted table would be pre-split on salt bytes boundaries to ensure load distribution among region servers even during the initial phase of the table. If users are to provide split points manually,
users need to include a salt byte in the split points they provide.

Row Key 排序：Pre-spliting also ensures that all entries in the region server all starts with the same salt byte, and therefore are stored in a sorted manner. When doing a parallel scan across all region servers, we can take advantage of this properties to perform
a merge sort of the client side. The resulting scan would still be return sequentially as if it is from a normal table。实际上是改写了Row Key，添加了一个prefix：new_row_key = (++index % BUCKETS_NUMBER) + original_key。数据存储到 Buckects_Number 个Bucket中，每个Bucket的Prefix 相同，在query的时候，同时在各个Bucket进行。

2、split

如果不想通过加盐来分区，可以自己手动设置分区的方法。这样可以不引入额外的byte，或者改变row key的顺序，例子

CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SPLIT ON ('CS','EU','NA')

3、使用多个列族

CREATE TABLE TEST (MYKEY VARCHAR NOT NULL PRIMARY KEY, A.COL1 VARCHAR, A.COL2 VARCHAR, B.COL3 VARCHAR)

4、使用压缩

CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) COMPRESSION='GZ'

5、使用二级索引

6、优化集群

7、优化phoenix 参数

Phoenix 跳过SCAN

The List for SkipScanFilter forthe above query would be [ [ [ a - b ], [ d - e ] ], [ 1, 2 ] ] where [ [ a - b], [ d - e ] ] is the range for KEY1and [ 1, 2 ] keys for KEY2.

例子

SELECT * from T

WHERE ((KEY1>='a' AND KEY1 'c' AND KEY1

中兴大数据, 微软, 例子, 中兴

内容来自用户分享和网络整理，不保证内容的准确性，如有侵权内容，可联系管理员处理

标签：

相关文章推荐

新的分享

章节导航