Phoenix二级索引那些事儿(下)
2015-11-06 15:04
411 查看
http://www.icaijing.com/hot/article4940159/
Phoenix二级索引那些事儿(下)
作者:中兴大数据| 发表时间:2015-7-30 03:31:18索引配置 公共配置 hbase-site.xml hbase.regionserver.wal.codec org.apache.hadoop.hbase.regionserver.wal.IndexedWALEditCodec 全局索引配置 hbase-site.xml配置项 支持(HBase0.98.4+ and Phoenix 4.3.1+ only) hbase.region.server.rpc.scheduler.factory.class org.apache.hadoop.hbase.ipc.PhoenixRpcSchedulerFactory Factory to create thePhoenix RPC Scheduler that uses separate queues for index and metadataupdates hbase.rpc.controllerfactory.class org.apache.hadoop.hbase.ipc.controller.ServerRpcControllerFactory Factory to create thePhoenix RPC Scheduler that uses separate queues for index and metadataupdates 局部索引配置 hbase-site.xml配置项 hbase.master.loadbalancer.class org.apache.phoenix.hbase.index.balancer.IndexLoadBalancer hbase.coprocessor.master.classes org.apache.phoenix.hbase.index.master.IndexMasterObserver hbase.coprocessor.regionserver.classes org.apache.hadoop.hbase.regionserver.LocalIndexMerger 索引配置调优 hbase-site.xml配置项 index.builder.threads.max Default: 10 根据主表更新建立索引表更新的线程数目 调高这个值可以克服读取Region的row state的瓶颈,如果调的太高,HRegion又会遇到处理太多并发scan requests的瓶颈以及 generalthread-swapping 障碍. index.builder.threads.keepalivetime Default: 60 index builder线程池里的线程过期之后存活的时间 超过这个存活时间,未使用会立马被释放,核心线程是不会被保留的.如果从负载角度考虑,是可以手动去释放线程 index.writer.threads.max Default: 10 将index update写入index table的线程数目 应该大致对应index table的数目 index.writer.threads.keepalivetime Default: 60 类似index.builder.threads.keepalivetime hbase.htable.threads.max Default: 2,147,483,647 索引表可以使用的写线程最大数目 增加这个值会提高索引更新的并发量,提升全局吞吐 hbase.htable.threads.keepalivetime Default: 60 类似index.builder.threads.keepalivetime index.tablefactory.cache.size Default: 10 放入缓存的索引表的数量 增加这个值,可以确保写index时不需要重新创建indexHTable,但是值越大,memory压力越大. org.apache.phoenix.regionserver.index.priority.min Default: 1000 Value to specify to bottom (inclusive) of therange in which index priority may lie. org.apache.phoenix.regionserver.index.priority.max Default: 1050 Value to specify to top (exclusive) of therange in which index priority may lie. Higher priorites within the index min/max rangedo not means updates are processed sooner. org.apache.phoenix.regionserver.index.handler.count Default: 30 Number of threads to use when serving indexwrite requests for global index maintenance. Though the actual number of threads is dictatedby the Max(number of call queues, handler count), where the number of callqueues is determined by standard HBase configuration. To further tune thequeues, you can adjust the standard rpc queue length parameters (currently,there are no special knobs for the index queues), specificallyipc.server.max.callqueue.length and ipc.server.callqueue.handler.factor. Seethe HBase Reference Guide for more details. 其它功能 Phoenix子查询 IN和Not In的子查询 例子 SELECT ItemName FROM Items WHERE ItemID IN (SELECT ItemID FROM Orders WHERE Date >= to_date('2013/09/02')); Exists和Not Exists的子查询 例子 SELECT ItemName FROM Items i WHERE EXISTS (SELECT * FROM Orders WHERE Date >= to_date('2013/09/02') AND ItemID = i.ItemID); 半连接、反连接、join 例子 SELECTd.dept_id,e.dept_id,e.name FROM DEPT d JOIN EMPL e ON e.dept_id = d.dept_id; JOIN支持: INNER LEFT OUTER RIGHT 比较运算 例子 SELECT ID, Name FROM Contest WHERE Score > (SELECT avg(Score) FROM Contest) ORDER BY ScoreDESC; ANY/SOME/ALL运算 例子 SELECT OrderID FROM Orders WHERE quantity>= ANY (SELECT max(quantity) FROM Orders GROUP BY ItemID); 相关子查询 例子 SELECT PatentID,Title FROM Patents p WHERE FileDate , LIMIT SELECT title,author, isbn, description FROM library WHEREpublished_date > 2010 AND (title, author,isbn) > (?, ?, ?) ORDER BY title,author, isbn LIMIT 20 Phoenix 统计收集 统计收集有助于提升query性能。 命令: UPDATE STATISTICSmy_table 等效于 UPDATE STATISTICSmy_table ALL 如果只收集index或者column UPDATE STATISTICSmy_table INDEX UPDATE STATISTICSmy_table COLUMNS 参数配置 phoenix.stats.guidepost.width默认104857600 phoenix.stats.guidepost.per.region phoenix.stats.updateFrequency默认900000 (15 mins) phoenix.stats.minUpdateFrequency默认7.5 mins phoenix.stats.useCurrentTime 默认true Phoenix 常用语法 Commands Other Grammar Phoenix 常用函数 Aggregate Functions String Functions Time and Date Functions Numeric Functions Array Functions Other Functions Phoenix 数据类型 Phoenix 使用任意时间戳 在Property里面设置属性 \"CurrentSCN\"。ts是一个long。 Properties props = new Properties(); props.setProperty(PhoenixRuntime.CURRENT_SCN_ATTRIB, Long.toString(ts)); Connection conn = DriverManager.connect(myUrl, props); conn.createStatement().execute(\"UPSERT INTO myTable VALUES ('a')\"); conn.commit(); 相当于: myTable.put(Bytes.toBytes('a'),ts); Phoenix 性能提升 1、加盐: 加盐可以将数据存入多个region里,从而提升读写性能。 CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SALT_BUCKETS=42 如果有16台region server,每台server有4核CPU,则SALT_BUCKETS 设置为32~64之间。即如果集群总的CPU核数为N,则SALT_BUCKETS为 0.5N ~ N 之间。 加盐后的注意事项: sequential scan 返回的结果可能不是自然排序的,如果sequential scan使用了LIMIT语句,将与不加盐的情况不一样。 Spit point:If no split points are specified for the table, the salted table would be pre-split on salt bytes boundaries to ensure load distribution among region servers even during the initial phase of the table. If users are to provide split points manually, users need to include a salt byte in the split points they provide. Row Key 排序:Pre-spliting also ensures that all entries in the region server all starts with the same salt byte, and therefore are stored in a sorted manner. When doing a parallel scan across all region servers, we can take advantage of this properties to perform a merge sort of the client side. The resulting scan would still be return sequentially as if it is from a normal table。实际上是改写了Row Key,添加了一个prefix:new_row_key = (++index % BUCKETS_NUMBER) + original_key。数据存储到 Buckects_Number 个Bucket中,每个Bucket的Prefix 相同,在query的时候,同时在各个Bucket进行。 2、split 如果不想通过加盐来分区,可以自己手动设置分区的方法。这样可以不引入额外的byte,或者改变row key的顺序,例子 CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) SPLIT ON ('CS','EU','NA') 3、使用多个列族 CREATE TABLE TEST (MYKEY VARCHAR NOT NULL PRIMARY KEY, A.COL1 VARCHAR, A.COL2 VARCHAR, B.COL3 VARCHAR) 4、使用压缩 CREATE TABLE TEST (HOST VARCHAR NOT NULL PRIMARY KEY, DESCRIPTION VARCHAR) COMPRESSION='GZ' 5、使用二级索引 6、优化集群 7、优化phoenix 参数 Phoenix 跳过SCAN The List for SkipScanFilter forthe above query would be [ [ [ a - b ], [ d - e ] ], [ 1, 2 ] ] where [ [ a - b], [ d - e ] ] is the range for KEY1and [ 1, 2 ] keys for KEY2. 例子 SELECT * from T WHERE ((KEY1>='a' AND KEY1 'c' AND KEY1 |
相关文章推荐
- Amoeba实现读写分离
- 正则表达式规则
- Sqoop安装配置及数据导入导出
- 【JAVA学习——eclipse编译器】
- 使用docker安装部署Spark集群来训练CNN(含Python实例)
- poj 2418 Hardwood Species 字典树
- 怎么获取textview上触摸点的字符或者附近的字符?
- 大规模高并发网站架构scala方案
- 10款轻量级JavaScript代码高亮插件
- Struts2文件上传
- js实现裁剪头像上传编辑器
- php编写TCP服务端和客户端程序
- Hive动态分区
- Unity Water Shader
- ceph OSD 故障记录
- PHP parse_str()函数
- 《Java并发编程从入门到精通》显示锁Lock和ReentrantLock
- 2015/11/6用Python写游戏,pygame入门(6):控制大量的对象
- Sublime console installation instructions install Package Control
- 平台如如何实现类似windows的右键菜单