<rt id="bn8ez"></rt>
<label id="bn8ez"></label>

  • <span id="bn8ez"></span>

    <label id="bn8ez"><meter id="bn8ez"></meter></label>

    Decode360's Blog

    業(yè)精于勤而荒于嬉 QQ:150355677 MSN:decode360@hotmail.com

      BlogJava :: 首頁 :: 新隨筆 :: 聯(lián)系 ::  :: 管理 ::
      302 隨筆 :: 26 文章 :: 82 評論 :: 0 Trackbacks

    ??? Tom的這篇文章介紹了Oracle在不同的情況下,使用RBO和CBO對執(zhí)行計劃的影響。以及在CBO下,對于TABLE的不同大小而產生的變化。在最后還介紹了一下使用現有執(zhí)行計劃對于條件過濾之后記錄數的多少而形成的不同情況,也回答了我以前的一些困擾。但是具體到底多少記錄占比會使CBO發(fā)生變化還是不得而知。另外也可以看一下關于DBMS_STATS包的應用方法,這個包還是很有用的,就是比較復雜一些。

    On Joins and Query Plans ?
    ??? By Tom Kyte

    Our technologist optimizes joins and explains changing plans.

    I know some of the rules about joins and that join 2 here is better than 1. Are there other rules about joins that I should know, particularly which are efficient to use and which are not?

    				1) SELECT distinct D.deptno, D.dname
        FROM dept D, emp E
            WHERE E.deptno = D.deptno
            ORDER BY D.deptno;
    
    2) SELECT D.deptno, D.dname
        FROM dept D
            WHERE EXISTS
                     (
                     SELECT 1
                     FROM emp E
                     WHERE E.deptno = D.deptno
                     )
            ORDER BY D.deptno;
    
    		

    In general, you should phrase the queries in the manner that says it best. If one set of joins were particularly efficient to use in all cases, Oracle would not have implemented the rest of them!

    In general, you use a join when you need data from more than one table in the ultimate SELECT list. Here you need data only from DEPT, so it is unlikely that you would consider a join. You would instead consider using either WHERE EXISTS or WHERE IN. (The cost-based optimizer [CBO] sees them as more or less equivalent—you might, too.)

    I would write the query as either

    				select deptno, dname
      from dept
     where exists 
       ( select NULL 
           from emp 
          where emp.deptno 
             = dept.deptno )
     order by deptno;
    
    		

    or

    				select deptno, dname
      from dept
     where deptno in 
    ( select deptno 
        from emp )
     order by deptno;
    
    		

    In both cases, the optimizer would employ either a semi join, which you cannot specify but the optimizer may perform (it stops joining after the first hit), or an index probe using NESTED LOOPS into EMP to test for row existence.

    This example demonstrates, in general, what happens. When the optimizer detects that DEPT is small and EMP is large, it index-probes the large EMP table for each row in DEPT. On the other hand, if it deems DEPT to be large, the optimizer will do a pure semi join in bulk, with no indexes. I'm using a copy of the EMP and DEPT tables for the test:

    				SQL> create table emp as
      2  select * from scott.emp;
    Table created.
     
    SQL> create table dept as
      2  select * from scott.dept;
    Table created.
     
    SQL> create index emp_deptno_idx
      2  on emp(deptno);
    Index created.
    
    		

    Then I set the first representative statistics. I use SET_TABLE_STATS to make EMP appear large, with 1,000,000 rows, and DEPT appear small, with only 100 rows. Additionally, I tell the optimizer about the index I envision having on EMP(DEPTNO). Because Oracle Database 10g Release 2 computes statistics on an index creation by default, I start by removing any existing statistics on this index and putting in place statistics that might be more representative:

    				
    						SQL> begin
      2   dbms_stats.set_table_stats
      3   ( user,
      4    'EMP',
      5    numrows => 1000000,
      6    numblks => 100000 );
      7   dbms_stats.delete_index_stats
      8   ( user,
      9    'EMP_DEPTNO_IDX' );
     10   dbms_stats.set_index_stats
     11   ( user,
     12    'EMP_DEPTNO_IDX',
     13     numrows => 1000000,
     14     numdist => 10000,
     15     numlblks =>10000 );
     16   dbms_stats.set_column_stats
     17   ( user,
     18    'EMP',
     19    'DEPTNO',
     20     DISTCNT => 10000 );
     21   dbms_stats.set_table_stats
     22   ( user,
     23    'DEPT',
     24     numrows=> 100,
     25     numblks => 100 );
     26  end;
     27  /
    				
    		

    Now I'm ready to see what the optimizer decides to do. I use AUTOTRACE to review the query plans generated for two queries, either of which would be a correct way to ask this question. I'm showing two queries to demonstrate that when developers use the CBO, they don't have to decide on the best way to ask a question—the optimizer recognizes certain constructs and picks the best path possible. This is in contrast to the old rule-based optimizer (RBO), which would view WHERE IN and WHERE EXISTS very differently.

    When using the CBO with a large EMP and a small DEPT, Oracle Database might use a plan such as the one in Listing 1. The optimizer opts to read every row in DEPT and then performs an index range scan—the index probe into the EMP table—to see if the row exists in the other table. Because I have relatively few index probes to perform (about 100 is what the optimizer thinks) and because the EMP table is large, the optimizer uses this plan in both the WHERE IN and WHERE EXISTS cases.

    Code Listing 1: CBO plan with large EMP and small DEPT

    				
    						SQL> set autotrace traceonly explain
    SQL> select deptno, dname
      2    from dept where deptno  in
      3    ( select deptno
      4        from emp )
      5   order by deptno;
     
    Execution Plan
    -----------------------------------------------------------
    Plan hash value: 3383088615
     
    -----------------------------------------------------------
    |  Id |	Operation	| Name	        |   Rows | Bytes  |
    -----------------------------------------------------------
    |   0 |SELECT STATEMENT	|		|   100	 |  3500  |
    |   1 |SORT ORDER BY	|		|   100	 |  3500  |
    |   2 |NESTED LOOPS SEMI|		|   100	 |  3500  | 
    |   3 |TABLE ACCESS FULL| DEPT	        |   100	 |  2200  | 
    |*  4 |INDEX RANGE SCAN	| EMP_DEPTNO_IDX|  1000K |    12M |
    -----------------------------------------------------------
     
    Predicate Information (identified by operation id):
    -----------------------------------------------------------
     
       4 - access("DEPTNO"="DEPTNO")
     
    SQL> select deptno, dname
      2    from dept where exists
      3    ( select null
      4        from emp
      5       where emp.deptno =
      6            dept.deptno )
      7   order by deptno;
     
    Execution Plan
    ------------------------------------------------------------
    Plan hash value: 3383088615
     
    ------------------------------------------------------------
    |  Id |	Operation	 | Name	         |   Rows | Bytes  |
    ------------------------------------------------------------
    |   0 |	SELECT STATEMENT |	         |   100  |  3500  |
    |   1 | SORT ORDER BY	 |		 |   100  |  3500  |
    |   2 | NESTED LOOPS SEMI|		 |   100  |  3500  |
    |   3 | TABLE ACCESS FULL| DEPT	         |   100  |  2200  |
    |*  4 | INDEX RANGE SCAN | EMP_DEPTNO_IDX|  1000K |    12M |
    ------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ------------------------------------------------------------
     
       4 - access("EMP"."DEPTNO"="DEPT"."DEPTNO")
    				
    		

    Compare that with the RBO plan, shown in Listing 2. When using the RBO, the developer must ask, "What size will EMP and DEPT be?" and, depending on the answer, choose either WHERE IN or WHERE EXISTS to get an "optimal" plan. The RBO comes up with two different plans based solely on how the question was asked, not on how large the amount of data was. This is one of the major advantages of the CBO over the RBO: Developers don't have to know how to write the best query in all cases—which is not to say that they don't need to learn SQL! They just do not have to figure out minutiae such as where they should use WHERE IN versus WHERE EXISTS.

    Code Listing 2: RBO plan

    				
    						SQL> select /*+ RULE */
      2    deptno, dname
      3    from dept where deptno  in
      4    ( select deptno
      5        from emp )
      6   order by deptno;
     
    Execution Plan
    -------------------------------------
    Plan hash value: 1932208647
     
    -------------------------------------
    |  Id |	Operation	 | Name	    |
    -------------------------------------
    |   0 | SELECT STATEMENT |	    |
    |   1 | MERGE JOIN       |	    |
    |   2 | SORT JOIN	 |	    |
    |   3 | TABLE ACCESS FULL| DEPT     |
    |*  4 | SORT JOIN	 |	    |
    |   5 | VIEW		 | VW_NSO_1 |			
    |   6 | SORT UNIQUE	 |          |
    |   7 | TABLE ACCESS FULL| EMP	    |
    -------------------------------------
     
    Predicate Information (identified by operation id):
    -------------------------------------
     
       4 - access("DEPTNO"="$nso_col_1")
           filter("DEPTNO"="$nso_col_1")
     
    Note
    ----------
       - rule based optimizer used (consider using cbo)
     
    SQL> select /*+ RULE */
      2    deptno, dname
      3    from dept where exists
      4    ( select null
      5        from emp
      6       where emp.deptno =
      7            dept.deptno )
      8   order by deptno;
     
    Execution Plan
    ----------------------------------------
    Plan hash value: 4109416194
     
    ----------------------------------------
    | Id | Operation	| Name	       |
    ----------------------------------------
    |  0 | SELECT STATEMENT	|	       |
    |  1 | SORT ORDER BY	|	       |
    |* 2 | FILTER		|	       |
    |  3 | TABLE ACCESS FULL| DEPT 	       |
    |* 4 | INDEX RANGE SCAN	| EMP_DEPTNO_ID|
    ----------------------------------------
     
    Predicate Information (identified by operation id):
    ----------------------------------------
     
       2 - filter( EXISTS (SELECT 0 FROM "EMP" "EMP" WHERE
                    "EMP"."DEPTNO"=:B1))
       4 - access("EMP"."DEPTNO"=:B1)
    				
    		

    Now, back to the CBO. As the amount of data in DEPT increases—and the optimizer decides that it will have to perform significantly more index probes into the EMP table—the query plans change. They go from using NESTED LOOPS and INDEX RANGE SCANS to more-efficient bulk operations. To see this, I tell the optimizer that DEPT is much larger now

    				SQL> begin
      2     dbms_stats.set_table_stats
      3     ( user,
      4      'DEPT',
      5       numrows=> 100000,
      6       numblks => 10000 );
      7  end;
      8  /
    
    		

    and then rerun the queries and review the resulting plans, as shown in Listing 3. As you can see, the optimizer, in both cases, now chooses to perform full scans and a nice big hash semi join—a join that stops after the first match—instead of using an index probe, because it would have to use the index 100,000 times.

    Code Listing 3: CBO plan with larger DEPT

    				
    						SQL> set autotrace traceonly explain
    SQL> select deptno, dname
      2    from dept where deptno  in
      3    ( select deptno
      4        from emp )
      5   order by deptno;
     
    Execution Plan
    -----------------------------------------------------------------
    Plan hash value: 3127359958
    
    -----------------------------------------------------------------
    | Id |	Operation	    | Name	     |	 Rows |  Bytes	|	
    -----------------------------------------------------------------
    |  0 |	SELECT STATEMENT    |		     |   100K |  3417K	|	
    |  1 |  SORT ORDER BY	    |		     |   100K |  3417K	|
    |* 2 |  HASH JOIN SEMI	    |		     |   100K |  3417K	|
    |  3 |  TABLE ACCESS FULL   | DEPT	     |   100K |  2148K	|	
    |  4 |  INDEX FAST FULL SCAN| EMP_DEPTNO_IDX |   1000K|    12M	|
    -----------------------------------------------------------------
    Predicate Information (identified by operation id):
    -----------------------------------------------------------------
     
       2 - access("DEPTNO"="DEPTNO")
     
    SQL> select deptno, dname
      2    from dept where exists
      3    ( select null
      4        from emp
      5       where emp.deptno =
      6            dept.deptno )
      7   order by deptno;
     
    Execution Plan
    ---------------------------------------------------------------
    Plan hash value: 3127359958
     
    ---------------------------------------------------------------
    | Id |	Operation	    | Name	    |	Rows |   Bytes|	
    ---------------------------------------------------------------
    |  0 |	SELECT STATEMENT    |		    |   100K |  3417K |	
    |  1 |  SORT ORDER BY	    |		    |   100K |  3417K | 
    |* 2 |  HASH JOIN SEMI	    |		    |   100K |  3417K | 
    |  3 |  TABLE ACCESS FULL   | DEPT	    |   100K |  2148K |	
    |  4 |  INDEX FAST FULL SCAN| EMP_DEPTNO_IDX|   1000K|    12M |	
    ---------------------------------------------------------------
     
    Predicate Information (identified by operation id):
    ---------------------------------------------------------------
     
       2 - access("EMP"."DEPTNO"="DEPT"."DEPTNO")
    				
    		

    Another important thing to note about these examples is that the RBO never performed a semi join, as the CBO did. The RBO has a very limited set of access paths and join operations available—it simply doesn't know how to do a semi join and won't use that join operation. Instead it performs operations such as the SORT DISTINCT it did in Listing 2, which is much less efficient than the cost-based plan in this case. Another reason to abandon the RBO in favor of the CBO!

    Why Does My Plan Change?

    I have noticed that gathering statistics against tables where none of the data has changed can cause the query plans against those tables to change. For example, one day I gather statistics, run some queries, and save the query plans in a report. The next day, I gather statistics (but I have not modified any of the data in any table) and I discover that the plans are different. What could cause that? The new query plans were better, but I would like to understand how this could be.

    This change is likely due to the default value of the METHOD_OPT parameter used with DBMS_STATS. In Oracle Database 10g, the METHOD_OPT parameter defaults to a SIZE AUTO. After you ran a query, the database remembered the predicates and updated a dictionary table, SYS.COL_USAGE$. Then, the next time you ran DBMS_STATS to gather statistics on these tables, DBMS_STATS queried that table to find out what columns should have histograms collected automatically, based on past query workload. It looked at your predicates and said, "Hmm, these columns are candidates for histograms based on the queries the end users have been running."

    You can see this easily with a relatively small example. I start out by creating a small table with some skewed data in a column named ID and gathering the default set of statistics on it. The data in the ID column is such that for the values 0 through 4, about 20 percent of the table will be retrieved, but for the value 99, only a single row will be returned. I'm using this exaggerated example just to see the plans change:

    				SQL> create table t
      2  as
      3  select mod(rownum,5) id, a.*
      4    from all_objects a;
    Table created.
    SQL> update t
      2     set id = 99
      3   where rownum = 1;
    1 row updated.
     
    SQL> create index t_idx on t(id);
    Index created.
     
    SQL> begin
      2   dbms_stats.gather_table_stats
      3   ( user, 'T' );
      4  end;
      5  /
    
    SQL> select column_name, count(*)
      2    from user_tab_histograms
      3   where table_name = 'T'
      4     and column_name = 'ID'
      5   group by column_name;
     
    COLUMN_NAME   COUNT(*)
    -----------   ------------
    ID                  2
    
    		

    So, right now, the ID column doesn't have complete histograms, by default—just two buckets, as shown by the query against USER_TAB_HISTOGRAMS. To understand the skewed nature of the data, I need a histogram with more than just two buckets; the two entries in USER_TAB_HISTOGRAMS tell the optimizer only the high and low values right now.

    The optimizer knows the high value (99), the low value (0), the number of distinct values (6 in this case), and the number of rows in my table T (50,119 when I tested). Given those facts, the optimizer will believe that WHERE ID = <value> will return about 50,119/6, or 8,353 rows. Sure enough, when I query with ID=1 or ID=99, I observe the results in Listing 4.

    Code Listing 4: CBO plans when ID=1 and ID=99

    				SQL> set autotrace traceonly explain
    SQL> select *
      2    from t
      3   where id = 1;
     
    ----------------------------------------------------------------------------
    |  Id |	Operation	 | Name	|  Rows	|  Bytes|   Cost (%CPU)	|  Time	   |
    ----------------------------------------------------------------------------
    |   0 |	SELECT STATEMENT |	|  8353	|   783K|   163	(2)	| 00:00:02 |
    |*  1 | TABLE ACCESS FULL| T	|  8353	|   783K|   163	(2)	| 00:00:02 |
    ----------------------------------------------------------------------------
     
    SQL> select *
      2    from t
      3   where id = 99;
     
    Execution Plan
    ----------------------------------------------------------------------------
    Plan hash value: 1601196873
     
    ----------------------------------------------------------------------------
    |  Id |	Operation	 | Name	|   Rows |  Bytes|  Cost (%CPU)	| Time	   |
    ----------------------------------------------------------------------------
    |   0 |	SELECT STATEMENT |	|   8353 |   783K|   163 (2)	| 00:00:02 |
    |*  1 | TABLE ACCESS FULL| T	|   8353 |   783K|   163 (2)	| 00:00:02 |
    ----------------------------------------------------------------------------
    
    		

    Now, I immediately gather statistics, using the same command as before:

    				SQL> begin
      2   dbms_stats.gather_table_stats
      3   ( user, 'T' );
      4  end;
      5  /
    SQL> select column_name, count(*)
      2    from user_tab_histograms
      3   where table_name = 'T'
      4     and column_name = 'ID'
      5   group by column_name;
     
    COLUMN_NAME   COUNT(*)
    ------------  -----------
    ID                  5
    
    		

    Note, however, that I have more than two buckets for my histograms. DBMS_STATS, using the AUTO setting, gathered more information here. If I were to query the SYS.COL_USAGE$ table right now, I would discover that a new row was added, indicating that I have some queries in my system that use equality predicates against this particular database column. That is the "magic" that caused DBMS_STATS to change how it gathered statistics and caused a dramatic change in my resulting query plans, as shown in Listing 5.

    Code Listing 5: New CBO plans when ID=1 and ID=99

    				
    						
    								SQL> select *
      2    from t;
      3   where id = 1;
     
    Execution Plan
    ----------------------------------------------------------------------------
    Plan hash value: 1601196873
    
    ----------------------------------------------------------------------------
    | Id |	Operation	 | Name	|  Rows	|  Bytes|  Cost (%CPU)	| Time	   |
    ----------------------------------------------------------------------------
    |  0 |	SELECT STATEMENT |	|  10260|   961K|   164	(2)	| 00:00:02 |
    |* 1 |  TABLE ACCESS FULL| T	|  10260|   961K|   164	(2)	| 00:00:02 |
    ----------------------------------------------------------------------------
     
    SQL> select *
      2    from t
      3   where id = 99;
     
    Execution Plan
    ----------------------------------------------------------------------------------------
    Plan hash value: 470836197
     
    ----------------------------------------------------------------------------------------
    | Id |  Operation	           | Name    | Rows |  Bytes|  Cost (%CPU)  |  Time    |
    ----------------------------------------------------------------------------------------
    |  0 |  SELECT STATEMENT	   |	     | 1    |    96	|   2	(0) | 00:00:01 |
    |  1 |  TABLE ACCESS BY INDEX ROWID|	T    | 1    |    96	|   2	(0) | 00:00:01 |
    |* 2 |  INDEX RANGE SCAN	   |	T_IDX| 1    |   	|   1	(0) | 00:00:01 |
    ----------------------------------------------------------------------------------------
    				
    		

    Note how the cardinalities are very different—8,353 changed to either 10,260 or 1. The resulting change in estimated cardinality led to a change in the overall cost of the first query plan and resulted in an entirely different (but better) query plan in the second case.

    This is a significant fact for a DBA to be aware of. As the queries submitted by your end users change over time, the AUTO feature of statistics gathering could change what statistics are gathered over time as well. Unless you understand this feature, enabled by default in Oracle Database 10g, it can look as if inexplicable magic is happening in your database.


    Tom Kyte (thomas.kyte@oracle.com) has worked for Oracle since 1993. He is a vice president in the Oracle Public Sector group and the author of Expert Oracle Database Architecture: 9 i and 10 g Programming Techniques and Solutions (Apress, 2005) and Effective Oracle by Design (Oracle Press, 2003), among others.




    -The End-

    posted on 2009-03-28 21:48 decode360-3 閱讀(239) 評論(0)  編輯  收藏 所屬分類: Oracle
    主站蜘蛛池模板: 爱丫爱丫影院在线观看免费| 色欲aⅴ亚洲情无码AV| 中文字幕视频免费在线观看| 精品日韩99亚洲的在线发布| 日韩免费无码一区二区三区| 亚洲人成网站在线观看播放| a级毛片免费播放| 亚洲av无码一区二区三区网站| 亚洲人成网男女大片在线播放| 99精品一区二区免费视频| 91在线亚洲精品专区| 免费无码专区毛片高潮喷水| 免费大黄网站在线观看| 美女尿口扒开图片免费| 亚洲成A人片77777国产| 久久九九免费高清视频| 浮力影院第一页小视频国产在线观看免费 | 亚洲精品二区国产综合野狼| 精品无码一级毛片免费视频观看 | 亚洲国产成人爱av在线播放| 国产V片在线播放免费无码| 亚洲av不卡一区二区三区| 99爱免费观看视频在线| 91丁香亚洲综合社区| 凹凸精品视频分类国产品免费| 一进一出60分钟免费视频| 亚洲福利视频导航| 三年片在线观看免费大全| 日本久久久久亚洲中字幕| 你好老叔电影观看免费| 久久精品国产亚洲AV无码娇色| 日本免费xxxx| 日韩在线视频线视频免费网站| 亚洲综合国产一区二区三区| 亚洲成人在线免费观看| 免费夜色污私人影院网站电影| 亚洲av无码乱码国产精品| 成年大片免费视频| 中文字幕不卡高清免费| 亚洲男人的天堂久久精品| 久久久久一级精品亚洲国产成人综合AV区|