Oracle Basics --- Index Options

2015-08-11 09:41
###4.4 Options

####4.4.1 Unique Indexes

A unique index guarantees that every indexed value is unique.

Differences between primary keys, unique keys, and unique indexes

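The terms "index" and "key" are often used interchangeably, but they are different things. An index is a physical structure stored in the database, whereas a key is a purely logical concept: it represents an integrity constraint created to enforce a business rule. The two are commonly confused because the database uses indexes to enforce integrity constraints.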

Primary key constraint versus unique index:

SQL> create table test(id int, name varchar2(20), constraint pk_id primary key(id));

Table created.

SQL> select constraint_name, constraint_type from user_constraints where table_name = 'TEST';

CONSTRAINT_NAME                C
------------------------------ -
PK_ID                          P

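#In table test the ID column is declared as the primary key, and Oracle automatically creates a unique index with the same name as the constraint: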

SQL> select index_name, index_type,uniqueness from user_indexes where table_name = 'TEST';

INDEX_NAME           INDEX_TYPE           UNIQUENES
-------------------- -------------------- ---------
PK_ID                NORMAL               UNIQUE

#If we now try to create another index on the ID column, unique or non-unique, Oracle raises an error because the column is already indexed:

SQL> create unique index ind_test_uk on test(id);
create unique index ind_test_uk on test(id)
*
ERROR at line 1:
ORA-01408: such column list already indexed

SQL> create index ind_test_uk on test(id);
create index ind_test_uk on test(id)
*
ERROR at line 1:
ORA-01408: such column list already indexed


Unique key constraint versus unique index:

SQL> drop table test purge;

Table dropped.

SQL> create table test(
2  id int,
3  name varchar2(20),
4  constraint uk_test unique(id));

Table created.

SQL> select constraint_name, constraint_type
from user_constraints
where table_name = 'TEST';

CONSTRAINT_NAME                C
------------------------------ -
UK_TEST                        U

SQL> select index_name, index_type, uniqueness
2  from user_indexes
3  where table_name = 'TEST';

INDEX_NAME           INDEX_TYPE           UNIQUENES
-------------------- -------------------- ---------
UK_TEST              NORMAL               UNIQUE

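#Oracle again creates a unique index with the same name automatically, and it likewise does not allow another unique or non-unique index on this column.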


A primary key constraint requires the column to be NOT NULL. Does a unique key constraint require the same?

SQL> insert into test values(1, 'sally');

1 row created.

SQL> insert into test values(null, 'Tony');

1 row created.

SQL> commit;

Commit complete.

SQL> select * from test;

        ID NAME
---------- --------------------
         1 sally
           Tony

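#The experiment shows that:
A unique key constraint does not require the column to be NOT NULL.
A unique index does not require the column to be NOT NULL either.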
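
One way to see why the NULL row is accepted: an Oracle B-tree index stores no entry for a row whose indexed columns are all NULL, so the uniqueness check never sees that row. The toy class below is a simplified model of this behaviour for a single-column unique index. `UniqueIndex` and its `insert` method are hypothetical names used only for illustration; they are not Oracle internals.

```python
class UniqueIndex:
    """Toy model of a single-column unique index (illustrative only)."""

    def __init__(self):
        self.keys = set()

    def insert(self, key):
        # A NULL key (None here) gets no index entry, so it cannot collide.
        if key is None:
            return
        if key in self.keys:
            raise ValueError(f"unique constraint violated for key {key!r}")
        self.keys.add(key)


index = UniqueIndex()
index.insert(1)        # like: insert into test values(1, 'sally')
index.insert(None)     # like: insert into test values(null, 'Tony')
index.insert(None)     # a second NULL is accepted as well

try:
    index.insert(1)    # a duplicate non-NULL value is rejected
except ValueError as err:
    print(err)
```

In this model only non-NULL values ever take part in the uniqueness check, which matches the result of the experiment above.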


If a primary key or unique key constraint is disabled, is the unique index that Oracle created automatically affected?

SQL> drop table test purge;

Table dropped.

SQL> create table test(
2  id int,
3  name varchar2(20),
4  constraint uk_test unique(id));

Table created.

SQL> select index_name, index_type, uniqueness, status from user_indexes where table_name = 'TEST';

INDEX_NAME           INDEX_TYPE           UNIQUENES STATUS
-------------------- -------------------- --------- --------
UK_TEST              NORMAL               UNIQUE    VALID

SQL> alter table test disable constraint uk_test;

Table altered.

SQL> select index_name, index_type, uniqueness, status from user_indexes where table_name = 'TEST';

no rows selected

SQL> alter table test enable constraint uk_test;

Table altered.

SQL> select index_name, index_type, uniqueness, status from user_indexes where table_name = 'TEST';

INDEX_NAME           INDEX_TYPE           UNIQUENES STATUS
-------------------- -------------------- --------- --------
UK_TEST              NORMAL               UNIQUE    VALID

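#When a primary key or unique key constraint is disabled, Oracle drops the unique index it created implicitly (the query returns no rows); re-enabling the constraint creates the index again.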


What happens if the unique index is created first and the primary key or unique key constraint is added afterwards?

SQL> drop table test purge;

Table dropped.

SQL> create table test(
2  id int,
3  name varchar(20));

Table created.

SQL> create unique index idx_test_id on test(id);

Index created.

SQL> select index_name, index_type, uniqueness, status
2  from user_indexes
3  where table_name = 'TEST';

INDEX_NAME           INDEX_TYPE           UNIQUENES STATUS
-------------------- -------------------- --------- --------
IDX_TEST_ID          NORMAL               UNIQUE    VALID

SQL> alter table test add constraint uk_test unique(id);

Table altered.

SQL> select index_name, index_type, uniqueness, status
2  from user_indexes
3  where table_name = 'TEST';

INDEX_NAME           INDEX_TYPE           UNIQUENES STATUS
-------------------- -------------------- --------- --------
IDX_TEST_ID          NORMAL               UNIQUE    VALID

SQL> select constraint_name, constraint_type
2  from user_constraints
3  where table_name = 'TEST';

CONSTRAINT_NAME                C
------------------------------ -
UK_TEST                        U

SQL> alter table test disable constraint uk_test;

Table altered.

SQL> select constraint_name, constraint_type, status
2  from user_constraints
3  where table_name = 'TEST';

CONSTRAINT_NAME                C STATUS
------------------------------ - --------
UK_TEST                        U DISABLED

SQL> select index_name, index_type, uniqueness, status
2  from user_indexes
3  where table_name = 'TEST';

INDEX_NAME           INDEX_TYPE           UNIQUENES STATUS
-------------------- -------------------- --------- --------
IDX_TEST_ID          NORMAL               UNIQUE    VALID

#When the constraint is built on an index that already existed, disabling the constraint leaves that index in place and VALID.


####4.4.2 Reverse Key Indexes

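A "reverse key index" stores the bytes of each index value in reverse order. This reduces activity at particular hot spots in the index. If many users process data in the same order, the leading parts of the key values being processed at any given moment are very close together, so a large amount of activity hits the same region of the index structure. By indexing the byte-reversed key values, a reverse key index spreads that activity across the index structure.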

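A reverse key index is also a B-tree index, but it physically reverses the bytes of each index key while keeping the column order. For example, if the index key is 20 and a standard B-tree index stores its two bytes as C1,15 in hexadecimal, a reverse key index stores the bytes as 15,C1.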
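
The byte values in this example can be reproduced with a small sketch. It assumes Oracle's documented NUMBER layout for positive integers (an exponent byte of 0xC1 plus the base-100 exponent, followed by each base-100 digit plus 1) and ignores negative numbers and fractions. The function names `oracle_number_bytes`, `reverse_key` and `show` are hypothetical helpers, not Oracle APIs.

```python
def oracle_number_bytes(n):
    """Bytes of a positive integer in Oracle's NUMBER format (simplified)."""
    if n <= 0:
        raise ValueError("this sketch covers positive integers only")
    digits = []                    # base-100 digits, most significant first
    while n:
        n, d = divmod(n, 100)
        digits.insert(0, d)
    exponent = len(digits) - 1     # power of 100 of the leading digit
    while digits[-1] == 0:         # trailing zero digits are not stored
        digits.pop()
    return bytes([0xC1 + exponent] + [d + 1 for d in digits])


def reverse_key(key):
    """Reverse the byte order, as a reverse key index does for a column value."""
    return key[::-1]


def show(key):
    """Format bytes as comma-separated hexadecimal pairs."""
    return ",".join(f"{b:02X}" for b in key)


for value in (20, 21):
    key = oracle_number_bytes(value)
    print(value, show(key), "->", show(reverse_key(key)))
# 20 C1,15 -> 15,C1
# 21 C1,16 -> 16,C1
```

The normal keys for 20 and 21 share the leading byte C1 and differ only in the last byte, whereas the reversed keys differ in their first byte, which is what moves them apart in the index.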

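Reverse key indexes mainly address contention on leaf blocks. The problem is more pronounced in RAC, where several instances may repeatedly modify the same block. For example, in a table whose rows are inserted in primary key order, one instance adds record 20 and another adds record 21; the keys for these two values sit side by side in the same index leaf block.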

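In a reverse key index, the reversed byte order distributes inserts across all the leaf blocks of the index. In the example above, keys 20 and 21 would be adjacent in a standard index, but in a reverse key index they are stored apart. The I/O for inserts of sequential keys is therefore spread more evenly.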

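Because the entries are no longer stored in column value order, a reverse key index rules out index range scans in some cases where they would otherwise be possible. For example, if a user queries for records with an ID greater than 20, the database cannot start at the block containing that ID and walk forward; it has to read all the leaf blocks.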
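
The loss of range scans can be illustrated by sorting the same IDs once by their normal key bytes and once by their reversed key bytes. This is a sketch under the same simplifying assumption as before (Oracle's NUMBER layout for positive integers only); `number_bytes` and `is_contiguous` are hypothetical helper names, and the sorted lists stand in for the order of entries in the leaf blocks.

```python
def number_bytes(n):
    """Simplified Oracle NUMBER encoding for a positive integer."""
    digits = []
    while n:
        n, d = divmod(n, 100)
        digits.insert(0, d)
    exponent = len(digits) - 1
    while digits[-1] == 0:         # trailing zero digits are not stored
        digits.pop()
    return bytes([0xC1 + exponent] + [d + 1 for d in digits])


def is_contiguous(ordered_ids, predicate):
    """True if the ids matching predicate occupy one unbroken run."""
    positions = [i for i, v in enumerate(ordered_ids) if predicate(v)]
    return positions == list(range(positions[0], positions[-1] + 1))


ids = range(1, 1001)
normal_index = sorted(ids, key=number_bytes)                      # standard B-tree order
reverse_index = sorted(ids, key=lambda n: number_bytes(n)[::-1])  # reverse key order


def wanted(v):
    """The range predicate ID > 20."""
    return v > 20


print(is_contiguous(normal_index, wanted))    # one run: a range scan is possible
print(is_contiguous(reverse_index, wanted))   # scattered: every leaf must be checked
print(reverse_index[:11])                     # the first entries in reverse key order
```

In the normal order the rows with ID greater than 20 form a single run starting right after 20. In the reverse key order they are interleaved with the non-matching rows, so there is no single starting point to scan from.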

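This kind of index is designed to eliminate index hot spots caused by inserts. It helps insert performance, but the benefit is limited, because the database can no longer use index range scans for range predicates.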

Sometimes, using a reverse-key index can make an OLTP Oracle Real Application Clusters application faster. For example, keeping the index of mail messages in an e-mail application: some users keep old messages, and the index must maintain pointers to these as well as to the most recent.

SQL> create table t(
2  a number,
3  b varchar(10),
4  c date );

Table created.

SQL> begin
2  for i in 1..1000 loop
3  insert into t values(i, 'Test', sysdate);
4  end loop;
5  commit;
6  end;
7  /

SQL> create index ind_t_rev on t(a,b,c) reverse;

Index created.

SQL> select index_name, index_type from user_indexes where table_name = 'T';

INDEX_NAME                     INDEX_TYPE
------------------------------ ---------------------------
IND_T_REV                      NORMAL/REV

#Using the reverse key arrangement eliminates the ability to run an index range scanning query on the index.
Because lexically adjacent keys are not stored next to each other in a reverse-key index,
only fetch-by-key or full-index (table) scans can be performed.

SQL> select * from t where a = 1000;

------------------------------------------------------------------------------
| Id  | Operation        | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT |           |     1 |    29 |     2   (0)| 00:00:01 |
|*  1 |  INDEX RANGE SCAN| IND_T_REV |     1 |    29 |     2   (0)| 00:00:01 |
------------------------------------------------------------------------------

SQL> select count(*) from t where trunc(c) = to_date('24-JUL-15','DD-MON-YY');

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |     1 |     9 |     3   (0)| 00:00:01 |
|   1 |  SORT AGGREGATE    |      |     1 |     9 |            |          |
|*  2 |   TABLE ACCESS FULL| T    |  1000 |  9000 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------

#Remove the reverse attribute
SQL> alter index ind_t_rev rebuild noreverse;

Index altered.

SQL> select index_name, index_type from user_indexes where table_name = 'T';

INDEX_NAME                     INDEX_TYPE
------------------------------ ---------------------------
IND_T_REV                      NORMAL


####4.4.3 Composite Indexes

Oracle lets you create composite indexes, that is, indexes that contain two or more columns.

With the rule-based optimizer (RBO), a composite index is used only when its leading column appears in the WHERE clause of the SQL statement.

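With the cost-based optimizer (CBO) before Oracle9i, a composite index can be used only when its leading column appears in the WHERE clause, and even then it is not guaranteed: the optimizer compares the estimated cost of using the index with the cost of a full table scan and picks the cheaper access path (see tests 1 and 2 below).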

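Starting with Oracle9i, Oracle introduced a new index access method, the index skip scan, which is available only to the cost-based optimizer (CBO). With it, Oracle can use a composite index even when the WHERE clause does not reference the leading column, provided the skip scan is estimated to cost less than the other access paths (see test 3 below).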
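
The idea behind an index skip scan can be sketched on a sorted list of (object_type, object_name) pairs: the composite index is treated as one small sub-index per distinct leading value, and each sub-index is probed for the wanted second column. The data and the `skip_scan` function below are hypothetical and only model the access pattern; they are not how Oracle implements it.

```python
from bisect import bisect_left, bisect_right


def skip_scan(index, name):
    """Find entries with the given second column in an index sorted on (type, name)."""
    leading_values = [entry[0] for entry in index]
    hits, probes, pos = [], 0, 0
    while pos < len(index):
        leading = leading_values[pos]
        probes += 1
        # Probe the sub-index that belongs to this leading value.
        lo = bisect_left(index, (leading, name))
        hi = bisect_right(index, (leading, name))
        hits.extend(index[lo:hi])
        # Skip straight to the first entry with the next leading value.
        pos = bisect_right(leading_values, leading)
    return hits, probes


# Hypothetical index entries, kept in index order (sorted on both columns).
index = sorted([
    ("JOB", "PURGE_LOG"),
    ("SYNONYM", "DBA_TAB_COLS"),
    ("SYNONYM", "USER_TABLES"),
    ("TABLE", "T"),
    ("VIEW", "DBA_TABLES"),
    ("VIEW", "DBA_TAB_COLS"),
])

hits, probes = skip_scan(index, "DBA_TAB_COLS")
print(hits)     # the SYNONYM and VIEW entries named DBA_TAB_COLS
print(probes)   # one probe per distinct object_type: 4
```

The number of probes grows with the number of distinct leading values, which is why a skip scan tends to pay off when the leading column has few distinct values relative to the number of rows.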

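The optimizer sometimes makes the wrong choice: however "smart" it is, it does not know the data distribution in a table as well as the people who write the SQL. In those cases a hint can steer the optimizer toward a better choice (see test 4 below).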

Create the test table T

#Create table T
SQL> create table t as select * from all_objects;

Table created.

#Data distribution
SQL> select object_type, count(*) from t group by object_type;

OBJECT_TYPE           COUNT(*)
------------------- ----------
EDITION                      1
INDEX PARTITION            512
TABLE SUBPARTITION          32
CONSUMER GROUP               2
SEQUENCE                   245
SYNONYM                  27889
JOB                         15
......

SQL> select count(*) from t;

COUNT(*)
----------
74051

#Create the composite index
SQL> create index indx_t on t(object_type,object_name);

Index created.

SQL> select INDEX_NAME, INDEX_type from user_indexes where table_name = 'T';

INDEX_NAME      INDEX_TYPE
--------------- ---------------
INDX_T          NORMAL

SQL> select index_name, table_name, column_name from user_ind_columns where TABLE_NAME = 'T';

INDEX_NAME      TABLE_NAME      COLUMN_NAME
--------------- --------------- ------------------------------
INDX_T          T               OBJECT_TYPE
INDX_T          T               OBJECT_NAME

SQL> analyze table t compute statistics
2  for table
3  for all indexes
4  for all indexed columns
5  /

Table analyzed.


Test 1: the leading column of the composite index is used and only a few rows are accessed

SQL> set autotrace traceonly

SQL> select * from t where object_type='JOB';

15 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 723869532

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |    15 |  1500 |    14   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      |    15 |  1500 |    14   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | INDX_T |    15 |       |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OBJECT_TYPE"='JOB')

Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
16  consistent gets
0  physical reads
0  redo size
2980  bytes sent via SQL*Net to client
524  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
15  rows processed
#As expected, because the leading column of the composite index is used and only a few rows are accessed, Oracle sensibly chooses an index scan.


Test 2: the leading column of the composite index is used, but a large share of the table is accessed

SQL> select * from t where object_type='SYNONYM';

27889 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      | 27889 |  2723K|   297   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T    | 27889 |  2723K|   297   (1)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("OBJECT_TYPE"='SYNONYM')

Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
2894  consistent gets
0  physical reads
0  redo size
1381826  bytes sent via SQL*Net to client
20973  bytes received via SQL*Net from client
1861  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
27889  rows processed

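# Even though the leading column of the composite index is used, Oracle chooses a full table scan instead of the index, because a large share of the table is accessed and the optimizer estimates the full scan to be cheaper. Is that really the case? Let us add a hint to force the index and see: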

SQL> select /*+ index(T indx_t) */ * from t where object_type = 'SYNONYM';

27889 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 723869532

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        | 27889 |  2723K| 20012   (1)| 00:04:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      | 27889 |  2723K| 20012   (1)| 00:04:01 |
|*  2 |   INDEX RANGE SCAN          | INDX_T | 27889 |       |   173   (0)| 00:00:03 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OBJECT_TYPE"='SYNONYM')

Statistics
----------------------------------------------------------
0  recursive calls
0  db block gets
24661  consistent gets
0  physical reads
0  redo size
3236139  bytes sent via SQL*Net to client
20973  bytes received via SQL*Net from client
1861  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
27889  rows processed
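#These results show that when a large amount of data is accessed, using the index really does cost more. The consistent gets in the Statistics section make this clear: the index plan performs 24661 logical reads, about 8.5 times the 2894 of the full table scan. Oracle was right to choose the full table scan over the index scan.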


Test 3: the WHERE clause does not reference the leading column of the index

SQL> select * from t where object_name = 'DBA_TAB_COLS';

Execution Plan
----------------------------------------------------------
Plan hash value: 2722864248

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     2 |   200 |    43   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      |     2 |   200 |    43   (0)| 00:00:01 |
|*  2 |   INDEX SKIP SCAN           | INDX_T |     2 |       |    41   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OBJECT_NAME"='DBA_TAB_COLS')
filter("OBJECT_NAME"='DBA_TAB_COLS')

Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
35  consistent gets
0  physical reads
0  redo size
1753  bytes sent via SQL*Net to client
524  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
2  rows processed

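Only 2 rows are returned, so even though the leading column is not used, Oracle correctly chooses an index skip scan. Now let us see what the statement costs without the index skip scan: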

SQL> select /*+ no_index(t indx_t)*/ * from t where object_name = 'DBA_TAB_COLS';

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     2 |   200 |   296   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T    |     2 |   200 |   296   (1)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("OBJECT_NAME"='DBA_TAB_COLS')

Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
1060  consistent gets
0  physical reads
0  redo size
1747  bytes sent via SQL*Net to client
524  bytes received via SQL*Net from client
2  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
2  rows processed

#Without the index the statement needs a full table scan with 1060 consistent gets, against 35 for the index skip scan.


Test 4: the optimizer chooses not to use the index

SQL> select * from t where object_name like 'DE%';

101 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   267 | 26700 |   296   (1)| 00:00:04 |
|*  1 |  TABLE ACCESS FULL| T    |   267 | 26700 |   296   (1)| 00:00:04 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

1 - filter("OBJECT_NAME" LIKE 'DE%')

Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
1065  consistent gets
0  physical reads
0  redo size
8012  bytes sent via SQL*Net to client
590  bytes received via SQL*Net from client
8  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
101  rows processed

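#This time only 101 rows are selected, a very small fraction of the 74051 rows in table T, yet Oracle still chooses a full table scan, which takes 1065 logical reads. What happens if we force the index?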

SQL> select /*+ index(t indx_t)*/ * from t where object_name like 'DE%';

101 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 2722864248

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |   267 | 26700 |   455   (0)| 00:00:06 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T      |   267 | 26700 |   455   (0)| 00:00:06 |
|*  2 |   INDEX SKIP SCAN           | INDX_T |   267 |       |   265   (0)| 00:00:04 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - access("OBJECT_NAME" LIKE 'DE%')
filter("OBJECT_NAME" LIKE 'DE%')

Statistics
----------------------------------------------------------
1  recursive calls
0  db block gets
119  consistent gets
0  physical reads
0  redo size
11862  bytes sent via SQL*Net to client
590  bytes received via SQL*Net from client
8  SQL*Net roundtrips to/from client
0  sorts (memory)
0  sorts (disk)
101  rows processed

#With the hint we force Oracle to use the index (INDEX SKIP SCAN), which takes 119 logical reads, far fewer than the 1065 of the full table scan.

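#This shows that the optimizer sometimes makes the wrong choice: however "smart" it is, it does not know the data distribution in a table as well as the people who write the SQL. In such cases a hint can help the optimizer make a better choice.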
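
The consistent gets reported by autotrace in tests 2, 3 and 4 can be put side by side with a few lines of arithmetic. The numbers are copied from the Statistics sections above; `compare` is a hypothetical helper that only divides one figure by the other and says nothing about elapsed time.

```python
# Consistent gets copied from the autotrace output of tests 2, 3 and 4.
consistent_gets = {
    "test 2: object_type = 'SYNONYM'": {"full scan": 2894, "index": 24661},
    "test 3: object_name = 'DBA_TAB_COLS'": {"full scan": 1060, "index": 35},
    "test 4: object_name like 'DE%'": {"full scan": 1065, "index": 119},
}


def compare(gets):
    """Return the cheaper access path and how many times cheaper it is."""
    cheaper = min(gets, key=gets.get)
    dearer = max(gets, key=gets.get)
    return cheaper, round(gets[dearer] / gets[cheaper], 1)


for test, gets in consistent_gets.items():
    cheaper, factor = compare(gets)
    print(f"{test}: {cheaper} wins by a factor of {factor}")
```

Measured in logical reads, the full scan wins test 2 by a factor of about 8.5, while the index wins test 3 by about 30 and test 4 by about 9. Test 4 is the case where the optimizer's own choice was the more expensive one.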


####4.4.4 Function-Based Indexes

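Function-based indexes are one effective way to speed up queries. Applying a function or any other operation to a column in a predicate can prevent an ordinary index on that column from being used and lead to a full table scan, for example: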

SQL> select employee_id, first_name from employees where substr(first_name,1,2) = 'Sa';

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |     1 |    11 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMPLOYEES |     1 |    11 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------


Such queries are common in customer-service systems, however. We can create a function-based index on the substr expression:

SQL> create index emp_fname_substr on employees(substr(first_name, 1, 2));

SQL> select index_name, index_type from user_indexes where table_name = 'EMPLOYEES';

INDEX_NAME                     INDEX_TYPE
------------------------------ ---------------------------
EMP_LAST_NAME_IDX              NORMAL
EMP_PHONE_IX                   NORMAL
EMP_FNAME_SUBSTR               FUNCTION-BASED NORMAL

SQL> select employee_id, first_name from employees where substr(first_name,1,2) = 'Sa';

------------------------------------------------------------------------------------------------
| Id  | Operation                   | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |                  |     1 |    14 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMPLOYEES        |     1 |    14 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | EMP_FNAME_SUBSTR |     1 |       |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------


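With the index in place, the query above uses the function-based index and the execution plan shows an INDEX RANGE SCAN.
The index only helps that exact expression, however. Consider the following query: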

SQL> select employee_id, first_name from employees where substr(first_name,1,1) = 'S';

13 rows selected.

Execution Plan
----------------------------------------------------------
Plan hash value: 1445457117

-------------------------------------------------------------------------------
| Id  | Operation         | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |     1 |    11 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMPLOYEES |     1 |    11 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------


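The execution plan is still TABLE ACCESS FULL, because a function-based index can be used only when the expression in the query matches the indexed expression, and substr(first_name,1,1) is not the indexed substr(first_name,1,2). Such indexes therefore demand careful planning and maintenance. Note that adding an ordinary index to a table can be risky, because it may change the execution plans of many queries. A function-based index carries much less of that risk, because Oracle considers it only for queries that use the matching expression.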
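
Why the second query cannot use the index can be modelled with a dictionary keyed by the value of the indexed expression. This is a rough sketch: the employee rows are hypothetical sample data, `build_index` is an illustrative helper, and Oracle decides by matching the expression in the query against the index definition rather than by looking at stored values.

```python
employees = [            # hypothetical (employee_id, first_name) rows
    (100, "Sarah"),
    (101, "Sam"),
    (102, "Steven"),
    (103, "Sarath"),
    (104, "Peter"),
]


def build_index(rows, expression):
    """Map each value of expression(first_name) to the matching employee ids."""
    index = {}
    for emp_id, first_name in rows:
        index.setdefault(expression(first_name), []).append(emp_id)
    return index


def substr_1_2(name):
    """Python equivalent of substr(first_name, 1, 2)."""
    return name[:2]


fname_index = build_index(employees, substr_1_2)

# where substr(first_name,1,2) = 'Sa': a direct lookup in the index.
print(fname_index.get("Sa"))          # [100, 101, 103]

# where substr(first_name,1,1) = 'S': no entry is keyed by this expression,
# so the rows have to be found by scanning the whole table.
print(fname_index.get("S"))           # None
print([emp_id for emp_id, name in employees if name[:1] == "S"])
```

The index holds precomputed values of one specific expression, so a predicate written with a different expression has nothing to look up.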

####4.4.5 Compressed Indexes

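Index key compression is a feature introduced in Oracle 8i. It compresses repeated key values in an index or index-organized table to save storage space. Non-unique indexes and unique indexes with at least two columns can be compressed. Bitmap indexes cannot be compressed.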

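A few terms used with key compression are easy to mix up. Compression works by splitting each key into two parts: the grouping piece, also called the prefix, and the unique piece, also called the suffix. The grouping piece is the part that is compressed, because it is shared by several unique pieces. If the key does not provide a unique piece, Oracle uses the rowid to identify the entry uniquely. Only the leaf blocks of a B-tree index are compressed; branch blocks are not. Compression is done within a single block and never spans blocks: a grouping piece (prefix) and its unique pieces (suffixes) are stored in the same index block.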

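How are the prefix and the suffix divided? By default the prefix length is the number of index columns minus 1 for a unique index, and the number of index columns for a non-unique index. The prefix length can also be set explicitly: the maximum is the number of index columns for a non-unique index, and the number of index columns minus 1 for a unique index. For example, suppose a unique index has three columns:
Default: prefix (column1,column2) suffix (column3)
Given the key values (1,2,3),(1,2,4),(1,2,7),(1,3,5),(1,3,4),(1,4,4), the repeated prefixes (1,2) and (1,3) are each stored only once.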
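
The grouping in this example can be sketched as follows. The sketch only models the idea of storing each prefix once within a block; `compress` and `stored_values` are hypothetical helpers, and Oracle's real block format is more involved.

```python
from itertools import groupby


def compress(keys, prefix_len):
    """Group sorted keys so that each distinct prefix is stored once."""
    ordered = sorted(keys)
    return [
        (prefix, [key[prefix_len:] for key in group])
        for prefix, group in groupby(ordered, key=lambda key: key[:prefix_len])
    ]


def stored_values(compressed):
    """Count the column values that remain after compression."""
    return sum(
        len(prefix) + sum(len(suffix) for suffix in suffixes)
        for prefix, suffixes in compressed
    )


keys = [(1, 2, 3), (1, 2, 4), (1, 2, 7), (1, 3, 5), (1, 3, 4), (1, 4, 4)]

compressed = compress(keys, prefix_len=2)
for prefix, suffixes in compressed:
    print(prefix, suffixes)
# (1, 2) [(3,), (4,), (7,)]
# (1, 3) [(4,), (5,)]
# (1, 4) [(4,)]

print(sum(len(key) for key in keys), "->", stored_values(compressed))   # 18 -> 12
```

The six keys hold 18 column values uncompressed and 12 after the shared prefixes are factored out. A prefix that occurs only once, such as (1,4), saves nothing.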

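Key compression suits indexes whose key values repeat heavily; only then does it shrink the keys and save storage. After compression an index block holds more keys, so index full scans and index fast full scans do less I/O, but CPU load goes up, and the overall effect depends on whether the I/O saving or the extra CPU cost dominates. I do not think key compression always improves performance; its main value lies in saving storage and reducing I/O time.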

SQL> create table objects1 as select object_id, object_name from dba_objects;

SQL> create table objects2 as select 100 object_id, object_name from dba_objects;

SQL> create table objects3 as select object_id, object_name from dba_objects;

SQL> create index objects1_idx on objects1(object_id) compress 1;

Index created.

SQL> create index objects2_inx on objects2(object_id) compress 1;

Index created.

SQL> create index objects3_inx on objects3(object_id);

Index created.

SQL> select index_name, compression, leaf_blocks
from user_indexes
where index_name in ('OBJECTS1_IDX','OBJECTS2_INX','OBJECTS3_INX');

INDEX_NAME                     COMPRESS LEAF_BLOCKS
------------------------------ -------- -----------
OBJECTS1_IDX                   ENABLED          230
OBJECTS2_INX                   ENABLED          116
OBJECTS3_INX                   DISABLED         167

SQL> select object_id,object_name from objects1 where object_id = 100;

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |     1 |    29 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| OBJECTS1     |     1 |    29 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | OBJECTS1_IDX |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

SQL> select object_id,object_name from objects2 where object_id = 100;

------------------------------------------------------------------------------
| Id  | Operation         | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |          | 75203 |  1982K|    98   (2)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| OBJECTS2 | 75203 |  1982K|    98   (2)| 00:00:02 |
------------------------------------------------------------------------------

SQL> select object_id,object_name from objects3 where object_id = 100;

--------------------------------------------------------------------------------------------
| Id  | Operation                   | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |              |     1 |    29 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| OBJECTS3     |     1 |    29 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | OBJECTS3_INX |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------


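For objects1 and objects3, object_id is unique, so there is nothing to compress: the compressed index on objects1 takes 230 leaf blocks, more than the 167 of the uncompressed index on objects3, so compressing is worse than not compressing. In objects2 every object_id has the same value, and the effect of compression is obvious (116 leaf blocks).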

Besides compressing an index when it is created, you can also compress or decompress it when rebuilding it.

SQL> alter index objects1_idx rebuild nocompress;
Index altered.
SQL> alter index objects1_idx rebuild compress;
Index altered.


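Note: compression itself introduces some storage overhead. In many cases the space it saves exceeds that overhead, so the total storage shrinks.
The number after compress is the prefix length, that is, the number of leading columns to compress.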

####4.4.6 Ascending and Descending Indexes

The DESC keyword on the CREATE INDEX statement is no longer ignored. It specifies that the index should be created in descending order. Indexes on character data are created in descending order of the character values in the database character set. Neither this, nor the ASC keyword, may be specified for a domain index. DESC cannot be specified for a bitmapped index. For example:

SELECT * FROM Cities ORDER BY city_id DESC;
# would benefit from an index like this:
CREATE INDEX c_id_desc ON Cities(city_id DESC);

SELECT MAX(miles) FROM Flights;
# would benefit from an index like this:
CREATE INDEX f_miles_desc ON Flights(miles DESC);

SELECT * FROM Flights WHERE dest_airport = 'LAX'
ORDER BY arrive_time DESC;
# would benefit from an index like this:
CREATE INDEX arrival_time_desc ON Flights(dest_airport, arrive_time DESC);

SQL> create table t_objects as
2  select object_name, object_id, created, owner
3  from all_objects;

SQL> select count(*) from t_objects;

COUNT(*)
----------
74101

#Create an ascending index
SQL> create index t_idx_1 on t_objects (object_name, owner);   ---the usual index.

SQL> select index_name, index_type from user_indexes where table_name = 'T_OBJECTS';

INDEX_NAME           INDEX_TYPE
-------------------- ---------------
T_IDX_1              NORMAL

SQL> select index_name, table_name, column_name, descend from user_ind_columns where index_name = 'T_IDX_1';

INDEX_NAME           TABLE_NAME           COLUMN_NAME          DESC
-------------------- -------------------- -------------------- ----
T_IDX_1              T_OBJECTS            OBJECT_NAME          ASC
T_IDX_1              T_OBJECTS            OWNER                ASC

#the database does not use descending indexes until you first analyze the index and the table on which the index is defined
SQL> select * from t_objects
2  where object_name between 'Y' and 'Z'
3  order by object_name asc, owner asc;

--------------------------------------------------------------------------------
| Id  | Operation          | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |           |  1004 | 43172 |   141   (2)| 00:00:02 |
|   1 |  SORT ORDER BY     |           |  1004 | 43172 |   141   (2)| 00:00:02 |
|*  2 |   TABLE ACCESS FULL| T_OBJECTS |  1004 | 43172 |   140   (1)| 00:00:02 |
--------------------------------------------------------------------------------

SQL> analyze table t_objects
2  compute statistics
3  for all columns
4  for all indexes;

Table analyzed.

SQL> select * from t_objects
where object_name between 'Y' and 'Z'
order by object_name asc, owner asc;

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |    82 |  3280 |    43   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T_OBJECTS |    82 |  3280 |    43   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_IDX_1   |    82 |       |     3   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

SQL> select * from t_objects
where object_name between 'Y' and 'Z'
order by object_name desc, owner desc;

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |    82 |  3280 |    43   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | T_OBJECTS |    82 |  3280 |    43   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN DESCENDING| T_IDX_1   |    82 |       |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------
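
The two plans above read the same ascending index T_IDX_1, once forward (INDEX RANGE SCAN) and once backward (INDEX RANGE SCAN DESCENDING). The sketch below models an index as a sorted list to show why one index serves both orderings. The rows are hypothetical, and the mixed-direction case at the end is an addition of this sketch, not something the transcript tests.

```python
rows = [                       # hypothetical (object_name, owner) pairs
    ("YEAR", "SYS"),
    ("YEAR", "HR"),
    ("ZONE", "SYS"),
    ("YARD", "HR"),
]

asc_index = sorted(rows)       # models an index on (object_name asc, owner asc)

forward = asc_index            # order by object_name asc, owner asc
backward = asc_index[::-1]     # order by object_name desc, owner desc

# Reading the ascending index backward gives the fully descending order.
print(backward == sorted(rows, reverse=True))

# order by object_name asc, owner desc: two stable sorts, minor key first.
mixed = sorted(sorted(rows, key=lambda r: r[1], reverse=True), key=lambda r: r[0])
print(mixed)
print(mixed in (forward, backward))
```

Neither direction of the ascending index yields the mixed ordering, which is the kind of query where declaring individual columns DESC in the index can avoid a sort.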

#Create a descending index
SQL> create index t_inx_1 on t_objects(object_name desc, owner desc);

Index created.

SQL> select index_name, table_name, column_name, descend from user_ind_columns where index_name = 'T_INX_1';

INDEX_NAME           TABLE_NAME           COLUMN_NAME          DESC
-------------------- -------------------- -------------------- ----
T_INX_1              T_OBJECTS            SYS_NC00005$         DESC
T_INX_1              T_OBJECTS            SYS_NC00006$         DESC

SQL> select * from t_objects
where object_name between 'Y' and 'Z'
order by object_name asc, owner asc;

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |    82 |  3280 |    29   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID | T_OBJECTS |    82 |  3280 |    29   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN DESCENDING| T_INX_1   |    47 |       |     5   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

SQL> select * from t_objects
where object_name between 'Y' and 'Z'
order by object_name desc, owner desc;

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |    82 |  3280 |    29   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T_OBJECTS |    82 |  3280 |    29   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | T_INX_1   |    47 |       |     5   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------