Big Table and Small Table Join strategy in Oracle

2013-07-22 17:45
The optimizer uses nested loop joins when joining a small number of rows, with a good driving condition between the two tables. You drive from the outer loop to the inner loop, so the order of tables in the execution plan is important.

The outer loop is the driving row source. It produces a set of rows for driving the join condition. The row source can be a table accessed using an index scan or a full table scan. Also, the rows can be produced from any other operation. For example, the output from a nested loop join can be used as a row source for another nested loop join.

The inner loop is iterated for every row returned from the outer loop, ideally by an index scan. If the access path for the inner loop is not dependent on the outer loop, then you can end up with a Cartesian product; for every iteration of the outer loop, the inner loop produces the same set of rows. Therefore, you should use other join methods when two independent row sources are joined together.
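The outer/inner loop mechanics described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not Oracle's implementation: the dictionary standing in for a B-tree index, the function names, and the miniature tables are all assumptions made for the example.

```python
from collections import defaultdict


def build_index(rows, key):
    """Map each key value to the rows holding it (a stand-in for a B-tree index)."""
    index = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return index


def nested_loops_join(outer_rows, inner_index, join_key):
    """For every outer (driving) row, probe the inner row source through its index."""
    joined = []
    probes = 0
    for outer in outer_rows:  # outer loop: the driving row source
        probes += 1
        # inner loop: one index probe per driving row
        for inner in inner_index.get(outer[join_key], []):
            joined.append((outer, inner))
    return joined, probes


# Hypothetical miniature tables: 100 rows and 10 rows, joined on c1
big = [{"c1": n} for n in range(1, 101)]
small = [{"c1": n} for n in range(1, 11)]

rows_small_drives, probes_small_drives = nested_loops_join(small, build_index(big, "c1"), "c1")
rows_big_drives, probes_big_drives = nested_loops_join(big, build_index(small, "c1"), "c1")

print(len(rows_small_drives), probes_small_drives)  # 10 joined rows, 10 probes
print(len(rows_big_drives), probes_big_drives)      # 10 joined rows, 100 probes
```

Both join orders return the same 10 rows, but the number of inner probes equals the number of driving rows, which is why the choice of driving row source matters.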

The following noworkload system statistics will be used:

SELECT
  PNAME,
  PVAL1
FROM
  SYS.AUX_STATS$
ORDER BY
  PNAME;

PNAME                PVAL1
--------------- ----------
CPUSPEED
CPUSPEEDNW      2116.57559
DSTART
DSTOP
FLAGS                    1
IOSEEKTIM               10
IOTFRSPEED            4096
MAXTHR
MBRC
MREADTIM
SLAVETHR
SREADTIM
STATUS
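With noworkload statistics, SREADTIM and MREADTIM are empty, and the optimizer is generally described as synthesizing them from IOSEEKTIM and IOTFRSPEED. The sketch below works through that arithmetic. The 8 KB block size and the multiblock read count of 8 are assumptions, since the article does not state either value.

```python
def noworkload_read_times(ioseektim_ms, iotfrspeed_bytes_per_ms, block_size, mbrc):
    """Derive single-block and multiblock read times (ms) from noworkload statistics."""
    sreadtim = ioseektim_ms + block_size / iotfrspeed_bytes_per_ms
    mreadtim = ioseektim_ms + mbrc * block_size / iotfrspeed_bytes_per_ms
    return sreadtim, mreadtim


# IOSEEKTIM=10 and IOTFRSPEED=4096 come from the listing above;
# block_size=8192 and mbrc=8 are assumed values, not taken from the article.
sreadtim, mreadtim = noworkload_read_times(10, 4096, block_size=8192, mbrc=8)
print(sreadtim, mreadtim)  # 12.0 26.0
```

Under these assumptions a multiblock read of 8 blocks is costed at a little more than twice a single-block read, which influences how full table scans compare with index access.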


First the tables. We will start simple, with one table (T3) having 100 rows and another table (T4) having 10 rows:

CREATE TABLE T3 (
  C1 NUMBER,
  C2 NUMBER,
  C3 NUMBER,
  C4 VARCHAR2(20),
  PADDING VARCHAR2(200));

CREATE TABLE T4 (
  C1 NUMBER,
  C2 NUMBER,
  C3 NUMBER,
  C4 VARCHAR2(20),
  PADDING VARCHAR2(200));

INSERT INTO
  T3
SELECT
  ROWNUM C1,
  1000000-ROWNUM C2,
  MOD(ROWNUM-1,1000) C3,
  TO_CHAR(SYSDATE+MOD(ROWNUM-1,10000),'DAY') C4,
  LPAD(' ',200,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=100;

INSERT INTO
  T4
SELECT
  ROWNUM C1,
  1000000-ROWNUM C2,
  MOD(ROWNUM-1,1000) C3,
  TO_CHAR(SYSDATE+MOD(ROWNUM-1,10000),'DAY') C4,
  LPAD(' ',200,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=10;

COMMIT;

CREATE INDEX IND_T3_C1 ON T3(C1);
CREATE INDEX IND_T3_C2 ON T3(C2);
CREATE INDEX IND_T3_C3 ON T3(C3);

CREATE INDEX IND_T4_C1 ON T4(C1);
CREATE INDEX IND_T4_C2 ON T4(C2);
CREATE INDEX IND_T4_C3 ON T4(C3);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T3',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T4',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL)


We are able to easily produce an example where the “smallest” table is selected as the driving table (note that I had to add a hint to specify a nested loop join in several of these examples):

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT /*+ USE_NL(T3 T4) */
  T3.C1,
  T3.C2,
  T3.C3,
  T3.C4,
  T4.C1,
  T4.C2,
  T4.C3,
  T4.C4
FROM
  T3,
  T4
WHERE
  T3.C1=T4.C1;

Execution Plan
----------------------------------------------------------
Plan hash value: 567778651

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |    10 |   420 |    13   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                |           |       |       |            |          |
|   2 |   NESTED LOOPS               |           |    10 |   420 |    13   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL         | T4        |    10 |   210 |     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN          | IND_T3_C1 |     1 |       |     0   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T3        |     1 |    21 |     1   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T3"."C1"="T4"."C1")


If we stop at that point, we could declare quite simply that the optimizer selects the smaller table as the driving table. But wait a minute, take a look at this example where the optimizer selected the larger table as the driving table:

SELECT /*+ USE_NL(T3 T4) */
  T3.C1,
  T3.C2,
  T3.C3,
  T3.C4,
  T4.C1,
  T4.C2,
  T4.C3,
  T4.C4
FROM
  T3,
  T4
WHERE
  T3.C1=T4.C1
  AND T3.C2=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 4214127300

-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |     1 |    42 |     3   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                 |           |       |       |            |          |
|   2 |   NESTED LOOPS                |           |     1 |    42 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T3        |     1 |    21 |     2   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | IND_T3_C2 |     1 |       |     1   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN           | IND_T4_C1 |     1 |       |     0   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T4        |     1 |    21 |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T3"."C2"=1)
   5 - access("T3"."C1"="T4"."C1")
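The totals in the two plans above are consistent with the usual textbook approximation for a nested loops join: the cost of producing the driving rows plus one inner access per driving row. The sketch below applies that approximation to the rounded figures printed in the plans. The function name is made up for the example, and the optimizer's real arithmetic uses unrounded values and additional terms, so the match will not be exact for every plan.

```python
def nested_loops_cost(outer_cost, outer_rows, inner_cost_per_probe):
    """Approximate nested loops cost: driving row source + one inner access per driving row."""
    return outer_cost + outer_rows * inner_cost_per_probe


# T4 drives: full scan of T4 (cost 3, 10 rows), then T3 by index (cost 1 per probe)
plan_t4_drives = nested_loops_cost(outer_cost=3, outer_rows=10, inner_cost_per_probe=1)

# T3 drives: T3 via IND_T3_C2 (cost 2, 1 row), then T4 by index (cost 1 per probe)
plan_t3_drives = nested_loops_cost(outer_cost=2, outer_rows=1, inner_cost_per_probe=1)

print(plan_t4_drives, plan_t3_drives)  # 13 3
```

Once the predicate on T3.C2 reduces T3 to a single estimated row, driving from T3 needs only one probe into T4, which is cheaper than scanning T4 and probing T3 ten times.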


The above execution plans were generated on 11.2.0.2, which sometimes differs a bit from older Oracle Database release versions when nested loops joins are used (note the two nested loops joins); however, we are able to hint the optimizer to generate the older style nested loops join:

SELECT /*+ USE_NL(T3 T4) OPTIMIZER_FEATURES_ENABLE('10.2.0.4') */
  T3.C1,
  T3.C2,
  T3.C3,
  T3.C4,
  T4.C1,
  T4.C2,
  T4.C3,
  T4.C4
FROM
  T3,
  T4
WHERE
  T3.C1=T4.C1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2465588182

-----------------------------------------------------------------------------------------
| Id  | Operation                   | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |           |    10 |   420 |    13   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T3        |     1 |    21 |     1   (0)| 00:00:01 |
|   2 |   NESTED LOOPS              |           |    10 |   420 |    13   (0)| 00:00:01 |
|   3 |    TABLE ACCESS FULL        | T4        |    10 |   210 |     3   (0)| 00:00:01 |
|*  4 |    INDEX RANGE SCAN         | IND_T3_C1 |     1 |       |     0   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T3"."C1"="T4"."C1")

SELECT /*+ USE_NL(T3 T4) OPTIMIZER_FEATURES_ENABLE('10.2.0.4') */
  T3.C1,
  T3.C2,
  T3.C3,
  T3.C4,
  T4.C1,
  T4.C2,
  T4.C3,
  T4.C4
FROM
  T3,
  T4
WHERE
  T3.C1=T4.C1
  AND T3.C2=1;

Execution Plan
----------------------------------------------------------
Plan hash value: 3446668716

-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |     1 |    42 |     3   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID  | T4        |     1 |    21 |     1   (0)| 00:00:01 |
|   2 |   NESTED LOOPS                |           |     1 |    42 |     3   (0)| 00:00:01 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T3        |     1 |    21 |     2   (0)| 00:00:01 |
|*  4 |     INDEX RANGE SCAN          | IND_T3_C2 |     1 |       |     1   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN           | IND_T4_C1 |     1 |       |     0   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T3"."C2"=1)
   5 - access("T3"."C1"="T4"."C1")


We found one case where the larger table was selected as the driving table, so the books and articles that simply state absolutely that the smallest table will be the driving table are not completely correct. Maybe the larger table is only selected as the driving table when both tables are small? Let’s test that theory by creating a couple more tables:

SET AUTOTRACE OFF

CREATE TABLE T1 (
  C1 NUMBER,
  C2 NUMBER,
  C3 NUMBER,
  C4 VARCHAR2(20),
  PADDING VARCHAR2(200));

CREATE TABLE T2 (
  C1 NUMBER,
  C2 NUMBER,
  C3 NUMBER,
  C4 VARCHAR2(20),
  PADDING VARCHAR2(200));

INSERT INTO
  T1
SELECT
  ROWNUM C1,
  1000000-ROWNUM C2,
  MOD(ROWNUM-1,1000) C3,
  TO_CHAR(SYSDATE+MOD(ROWNUM-1,10000),'DAY') C4,
  LPAD(' ',200,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=1000000;

INSERT INTO
  T2
SELECT
  ROWNUM C1,
  1000000-ROWNUM C2,
  MOD(ROWNUM-1,1000) C3,
  TO_CHAR(SYSDATE+MOD(ROWNUM-1,10000),'DAY') C4,
  LPAD(' ',200,'A') PADDING
FROM
  DUAL
CONNECT BY
  LEVEL<=100000;

COMMIT;

CREATE INDEX IND_T1_C1 ON T1(C1);
CREATE INDEX IND_T1_C2 ON T1(C2);
CREATE INDEX IND_T1_C3 ON T1(C3);

CREATE INDEX IND_T2_C1 ON T2(C1);
CREATE INDEX IND_T2_C2 ON T2(C2);
CREATE INDEX IND_T2_C3 ON T2(C3);

EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T1',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL)
EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'T2',CASCADE=>TRUE,ESTIMATE_PERCENT=>NULL)


The above script created table T1 with 1,000,000 rows and table T2 with 100,000 rows. We will now use queries that are similar to those that were used with the 100 and 10 row tables.

The smaller table (T2) as the driving table:

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT /*+ USE_NL(T1 T2) */
  T1.C1,
  T1.C2,
  T1.C3,
  T1.C4,
  T2.C1,
  T2.C2,
  T2.C3,
  T2.C4
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1;

Execution Plan
----------------------------------------------------------
Plan hash value: 2610346857

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |   100K|  4687K|   300K  (1)| 01:00:12 |
|   1 |  NESTED LOOPS                |           |       |       |            |          |
|   2 |   NESTED LOOPS               |           |   100K|  4687K|   300K  (1)| 01:00:12 |
|   3 |    TABLE ACCESS FULL         | T2        |   100K|  2343K|   889   (1)| 00:00:11 |
|*  4 |    INDEX RANGE SCAN          | IND_T1_C1 |     1 |       |     2   (0)| 00:00:01 |
|   5 |   TABLE ACCESS BY INDEX ROWID| T1        |     1 |    24 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."C1"="T2"."C1")


The larger table (T1) as the driving table:

SELECT /*+ USE_NL(T1 T2) */
  T1.C1,
  T1.C2,
  T1.C3,
  T1.C4,
  T2.C1,
  T2.C2,
  T2.C3,
  T2.C4
FROM
  T1,
  T2
WHERE
  T1.C1=T2.C1
  AND T1.C2 BETWEEN 1 AND 10000;

Execution Plan
----------------------------------------------------------
Plan hash value: 2331401024

-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           | 10001 |   468K| 11353   (1)| 00:02:17 |
|   1 |  NESTED LOOPS                 |           |       |       |            |          |
|   2 |   NESTED LOOPS                |           | 10001 |   468K| 11353   (1)| 00:02:17 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1        | 10001 |   234K|   348   (0)| 00:00:05 |
|*  4 |     INDEX RANGE SCAN          | IND_T1_C2 | 10001 |       |    25   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN           | IND_T2_C1 |     1 |       |     1   (0)| 00:00:01 |
|   6 |   TABLE ACCESS BY INDEX ROWID | T2        |     1 |    24 |     2   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."C2">=1 AND "T1"."C2"<=10000)
   5 - access("T1"."C1"="T2"."C1")


So, what is happening? Is it simply the case that it is the expected number of rows that will be returned from each table that determines which table will be the driving table? Let’s test:

SET AUTOTRACE OFF

SELECT
  COUNT(*)
FROM
  T1
WHERE
  T1.C1 BETWEEN 890000 AND 1000000;

  COUNT(*)
----------
    110001

SELECT
  COUNT(*)
FROM
  T2
WHERE
  T2.C2 BETWEEN 900000 AND 1000000;

  COUNT(*)
----------
    100000
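The two counts follow directly from the formulas used to populate the tables (C1 = ROWNUM, C2 = 1000000 - ROWNUM). As a quick cross-check outside the database, the sketch below regenerates just those two columns in Python; the helper name is invented for the example.

```python
def count_between(values, low, high):
    """Count values in the inclusive range [low, high], like SQL BETWEEN."""
    return sum(1 for v in values if low <= v <= high)


t1_c1 = range(1, 1_000_001)                          # T1.C1 = ROWNUM, 1,000,000 rows
t2_c2 = (1_000_000 - n for n in range(1, 100_001))   # T2.C2 = 1000000 - ROWNUM, 100,000 rows

t1_matches = count_between(t1_c1, 890_000, 1_000_000)
t2_matches = count_between(t2_c2, 900_000, 1_000_000)

print(t1_matches, t2_matches)  # 110001 100000
```

T2.C2 only ever takes values from 900000 to 999999, so the predicate on T2.C2 selects every row of T2.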


The above shows that if we specify T1.C1 BETWEEN 890000 AND 1000000 in the WHERE clause there will be 110,001 rows from the larger table that match the criteria. If we specify T2.C2 BETWEEN 900000 AND 1000000 in the WHERE clause there will be 100,000 rows from the smaller table that match the criteria. If we execute the following query, which table will be the driving table, the 10 times larger T1 table where we are retrieving 110,001 rows or the smaller T2 table where we are retrieving 100,000 rows?

SET AUTOTRACE TRACEONLY EXPLAIN

SELECT /*+ USE_NL(T1 T2) */
  T1.C1,
  T1.C2,
  T1.C3,
  T1.C4,
  T2.C1,
  T2.C2,
  T2.C3,
  T2.C4
FROM
  T1,
  T2
WHERE
  T1.C3=T2.C3
  AND T1.C1 BETWEEN 890000 AND 1000000
  AND T2.C2 BETWEEN 900000 AND 1000000;


This is the result that I received, which seems to demonstrate that it is not just the size of the tables, nor is it the number of expected rows to be returned from the tables, that determines which table will be the driving table:

-------------------------------------------------------------------------------------------
| Id  | Operation                     | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |           |    11M|   503M|    11M  (1)| 37:03:27 |
|   1 |  NESTED LOOPS                 |           |       |       |            |          |
|   2 |   NESTED LOOPS                |           |    11M|   503M|    11M  (1)| 37:03:27 |
|   3 |    TABLE ACCESS BY INDEX ROWID| T1        |   110K|  2578K|  3799   (1)| 00:00:46 |
|*  4 |     INDEX RANGE SCAN          | IND_T1_C1 |   110K|       |   248   (1)| 00:00:03 |
|*  5 |    INDEX RANGE SCAN           | IND_T2_C3 |   100 |       |     1   (0)| 00:00:01 |
|*  6 |   TABLE ACCESS BY INDEX ROWID | T2        |   100 |  2400 |   101   (0)| 00:00:02 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - access("T1"."C1">=890000 AND "T1"."C1"<=1000000)
   5 - access("T1"."C3"="T2"."C3")
   6 - filter("T2"."C2">=900000 AND "T2"."C2"<=1000000)


110,001 rows from T1 is still somewhat close in number to the 100,000 rows from T2, so let’s try an experiment selecting 992,701 rows from T1:

SELECT /*+ USE_NL(T1 T2) */
  T1.C1,
  T1.C2,
  T1.C3,
  T1.C4,
  T2.C1,
  T2.C2,
  T2.C3,
  T2.C4
FROM
  T1,
  T2
WHERE
  T1.C3=T2.C3
  AND T1.C1 BETWEEN 7300 AND 1000000
  AND T2.C2 BETWEEN 900000 AND 1000000;

Execution Plan
----------------------------------------------------------
Plan hash value: 3718770616

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |    99M|  4544M|   100M  (1)|334:20:23 |
|   1 |  NESTED LOOPS                |           |       |       |            |          |
|   2 |   NESTED LOOPS               |           |    99M|  4544M|   100M  (1)|334:20:23 |
|*  3 |    TABLE ACCESS FULL         | T1        |   992K|    22M|  8835   (1)| 00:01:47 |
|*  4 |    INDEX RANGE SCAN          | IND_T2_C3 |   100 |       |     1   (0)| 00:00:01 |
|*  5 |   TABLE ACCESS BY INDEX ROWID| T2        |   100 |  2400 |   101   (0)| 00:00:02 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("T1"."C1">=7300 AND "T1"."C1"<=1000000)
   4 - access("T1"."C3"="T2"."C3")
   5 - filter("T2"."C2">=900000 AND "T2"."C2"<=1000000)


As shown above, table T1 is still the driving table in the nested loops join. Let’s test retrieving 993,001 rows from T1:

SELECT /*+ USE_NL(T1 T2) */
  T1.C1,
  T1.C2,
  T1.C3,
  T1.C4,
  T2.C1,
  T2.C2,
  T2.C3,
  T2.C4
FROM
  T1,
  T2
WHERE
  T1.C3=T2.C3
  AND T1.C1 BETWEEN 7000 AND 1000000
  AND T2.C2 BETWEEN 900000 AND 1000000;

------------------------------------------------------------------------------------------
| Id  | Operation                    | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |           |    99M|  4545M|   100M  (1)|334:26:13 |
|   1 |  NESTED LOOPS                |           |       |       |            |          |
|   2 |   NESTED LOOPS               |           |    99M|  4545M|   100M  (1)|334:26:13 |
|*  3 |    TABLE ACCESS FULL         | T2        |   100K|  2343K|   889   (1)| 00:00:11 |
|*  4 |    INDEX RANGE SCAN          | IND_T1_C3 |  1000 |       |     3   (0)| 00:00:01 |
|*  5 |   TABLE ACCESS BY INDEX ROWID| T1        |   993 | 23832 |  1003   (0)| 00:00:13 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   3 - filter("T2"."C2">=900000 AND "T2"."C2"<=1000000)
   4 - access("T1"."C3"="T2"."C3")
   5 - filter("T1"."C1">=7000 AND "T1"."C1"<=1000000)


As shown above, table T2 is now the driving table for the nested loops join. So, there must be other factors beyond table (or better worded row source) size and the number of rows that will be retrieved from the tables. You might be wondering if the CLUSTERING_FACTOR of the indexes also plays a role in determining which table is the driving table:

SET AUTOTRACE OFF

SELECT
  TABLE_NAME,
  INDEX_NAME,
  CLUSTERING_FACTOR,
  NUM_ROWS
FROM
  USER_INDEXES
WHERE
  TABLE_NAME IN ('T1','T2')
ORDER BY
  TABLE_NAME,
  INDEX_NAME;

TABLE_NAME INDEX_NAME CLUSTERING_FACTOR   NUM_ROWS
---------- ---------- ----------------- ----------
T1         IND_T1_C1              32259    1000000
T1         IND_T1_C2              32259    1000000
T1         IND_T1_C3            1000000    1000000
T2         IND_T2_C1               3226     100000
T2         IND_T2_C2               3226     100000
T2         IND_T2_C3             100000     100000


I suggested (without checking) in the OTN thread that the CLUSTERING_FACTOR of the index on column C2 would be higher than the CLUSTERING_FACTOR of the index on column C1 because of the reverse (descending) order in which the C2 column values were inserted into the tables. Surprisingly (at least to me), the optimizer set the CLUSTERING_FACTOR of the C1 and C2 indexes to be the same values, and set the CLUSTERING_FACTOR of the C3 index to be the same as the number of rows in the table. Maybe one of the readers of this blog article can explain what happened to the CLUSTERING_FACTOR.
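One possible explanation, offered as a sketch rather than a statement about Oracle internals: the clustering factor is commonly described as the number of times the table block changes while walking the index entries in key order. The Python model below applies that definition to the three column patterns used for T2. It assumes rows are stored in insertion order with exactly 31 rows per block (a figure chosen because 100,000 rows would then occupy 3,226 blocks) and that entries with equal keys are ordered by row position.

```python
def clustering_factor(column_values, rows_per_block):
    """Count table block changes while walking index entries in (key, row position) order."""
    entries = sorted((value, position) for position, value in enumerate(column_values))
    factor = 0
    previous_block = None
    for _, position in entries:
        block = position // rows_per_block  # block holding this row (assumed layout)
        if block != previous_block:
            factor += 1
            previous_block = block
    return factor


ROWS = 100_000        # size of T2
ROWS_PER_BLOCK = 31   # assumption, not a value reported in the article

c1 = [n for n in range(1, ROWS + 1)]                # ROWNUM: ascending
c2 = [1_000_000 - n for n in range(1, ROWS + 1)]    # 1000000-ROWNUM: descending
c3 = [(n - 1) % 1000 for n in range(1, ROWS + 1)]   # MOD(ROWNUM-1,1000): repeating cycle

cf_c1 = clustering_factor(c1, ROWS_PER_BLOCK)
cf_c2 = clustering_factor(c2, ROWS_PER_BLOCK)
cf_c3 = clustering_factor(c3, ROWS_PER_BLOCK)
print(cf_c1, cf_c2, cf_c3)  # 3226 3226 100000
```

In this model a descending column is just as well clustered as an ascending one, because walking the index visits the table blocks in reverse order but still finishes one block before moving to the next. The C3 values repeat every 1,000 rows, so consecutive index entries almost never point to the same block and the clustering factor approaches the row count.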

So, the answer to the OP’s question is not as simple as “the Smaller Table is the Driving Table” or “the Larger Table is the Driving Table”, but that there are other factors involved. I think that it might be time for a third read through of the book “Cost-Based Oracle Fundamentals”. In the meantime, anyone care to share more insight (yes we could look inside a 10053 trace file, but there must be at least one other tip that can be provided without referencing such a trace file)?
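One check that needs no trace file is to compare the estimated cost of each join order using only the rounded figures printed in the last two execution plans. The sketch below uses the common approximation of driving cost plus one inner access per driving row. It assumes the T1 full scan cost of 8835 and the per-probe cost of 101 also apply when 993,001 rows are selected, which the article does not show, and the names are invented for the example.

```python
def nested_loops_cost(outer_cost, outer_rows, inner_cost_per_probe):
    """Approximate nested loops cost: driving row source + one inner access per driving row."""
    return outer_cost + outer_rows * inner_cost_per_probe


# T2 drives: full scan of T2 (cost 889, 100,000 rows), 1003 per probe into T1 via IND_T1_C3
COST_T2_DRIVES = nested_loops_cost(889, 100_000, 1003)


def cost_t1_drives(t1_rows):
    # T1 drives: full scan of T1 (cost 8835), 101 per probe into T2 via IND_T2_C3
    return nested_loops_cost(8835, t1_rows, 101)


chosen = {}
for t1_rows in (992_701, 993_001):
    cost_t1 = cost_t1_drives(t1_rows)
    chosen[t1_rows] = "T1" if cost_t1 < COST_T2_DRIVES else "T2"
    print(t1_rows, cost_t1, COST_T2_DRIVES, chosen[t1_rows])
# 992701 100271636 100300889 T1
# 993001 100301936 100300889 T2
```

Under these assumptions the estimated cost of driving from T1 overtakes the fixed cost of driving from T2 somewhere between 992,701 and 993,001 rows, which matches the point where the plans switch. That suggests the driving table is simply whichever join order the optimizer estimates to be cheapest, with table size and row counts acting only as inputs to that estimate.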